Dear Dropboxers,
would it be possible to see an example for large file download, equivalent to https://www.dropboxforum.com/hc/en-us/community/posts/205544836-python-upload-big-file-example for the upload?
Thanks.
It is already implemented in the Java SDK.
(It is also implemented in the API v1 Python client, but I can't recommend using that as it's deprecated.)
If you wanted to implement it manually, or modify the Python SDK, here's a sample of what it would look like in curl for reference:
curl -X POST https://content.dropboxapi.com/2/files/download \
  --header "Authorization: Bearer ACCESS_TOKEN" \
  --header "Dropbox-API-Arg: {\"path\": \"/test.txt\"}" \
  --header "Range: bytes=0-2"
That would download just the first 3 bytes of the file at /test.txt.
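If you wanted to do the same from Python without touching the SDK, a minimal sketch using the `requests` library might look like this. The helper function and its name are illustrative, and ACCESS_TOKEN is a placeholder:

```python
# Sketch: build the headers for a ranged /2/files/download call, mirroring
# the curl example above. range_headers is a hypothetical helper name.
import json

def range_headers(access_token, path, start, end):
    """Headers for downloading bytes start..end (inclusive) of a Dropbox file."""
    return {
        "Authorization": "Bearer " + access_token,
        "Dropbox-API-Arg": json.dumps({"path": path}),
        "Range": "bytes=%d-%d" % (start, end),
    }

# Then, with the requests library:
# resp = requests.post("https://content.dropboxapi.com/2/files/download",
#                      headers=range_headers(ACCESS_TOKEN, "/test.txt", 0, 2))
# resp.content would be the first 3 bytes of /test.txt.
```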
Hi Olaf, for downloading, you generally just need a simple call to the files_download method. There's a sample here:
https://stackoverflow.com/documentation/dropbox-api/408/downloading-a-file/1350/downloading-a-file-using-the-dropbox-python-library#t=201607262010533665443
Are you running into issues downloading large files?
Hi Gregory,
thanks. I am working with very large files, up to several tens of GB, and it would be nice to be able to parcel them into chunks, as we do for the upload. The motivation is the same: to be able to monitor progress and retry in case of failure. I cannot find the right tools for that in the API. Are they not provided, or am I looking in the wrong place?
I see, thanks for the additional context! The API itself does actually support Range Retrieval Requests which can be used to download files in pieces, but this functionality unfortunately isn't currently exposed in the API v2 Python SDK. We'll consider this a feature request for that though.
Is it available in some other programming language?
OK, thank you, this helps.
I have the same problem: I want to download a very large file in a streaming fashion with the v2 API, as I used to do with the v1 "get_file()" functionality. The idea is to use it for a "tar" restore from backup, and pulling it all into memory à la "files_download()" would get ugly fast.
Any hope that this will make it back into the exposed Python API?
Thanks!
Hi Matt, it sounds like your request may actually be slightly different than what was being discussed on this thread. We were talking about downloading files in distinct chunks (similar to the chunked upload), but it sounds like you want to be able to stream the download as desired, like you currently do with the get_file method.
I believe the files_download method does already work the same way as that though. It returns a requests.models.Response object on which you can call iter_content to iterate over the content, streaming it off the connection.
That would look something like:
metadata, res = dbx.files_download(path)
for data in res.iter_content(10):
    print(data)
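For a large restore you'd typically stream straight to disk rather than print. A minimal sketch, assuming `res` is the requests-style response returned by files_download (the helper name and 4 MiB chunk size are illustrative):

```python
# Sketch: write a files_download response to a local file chunk by chunk,
# so the whole file never has to fit in memory. stream_to_file is a
# hypothetical helper name.
CHUNK = 4 * 1024 * 1024  # 4 MiB per read; adjust as needed

def stream_to_file(res, local_path, chunk_size=CHUNK):
    """Write a requests-style streaming response to disk; return bytes written."""
    written = 0
    with open(local_path, "wb") as f:
        for data in res.iter_content(chunk_size):
            f.write(data)
            written += len(data)
    return written

# metadata, res = dbx.files_download("/big.tar")
# stream_to_file(res, "/tmp/big.tar")
```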
Hope this helps!
Excellent! I will give that a try - thanks!
Hello!
This post is a bit old, but was the incremental download implemented in the v2 Python SDK?
I'm having trouble managing large accounts (many files and large files), so I'm developing some tools in Python. Downloading large files (>20 GB) with the desktop application takes ages and offers no control, and even in the browser there are many interruptions or aborted transfers. So the idea is to have total control over exactly what is being downloaded, and to be able to restart from the last successful chunk as needed.
I'm already able to upload large files using the files_upload_session_start/append/finish methods.
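For reference, that session-based upload flow might look roughly like this. This is only a sketch under my assumptions (chunk size, helper name, and error handling are all illustrative, and the SDK calls used are files_upload_session_start/append_v2/finish):

```python
# Sketch of a chunked upload via an upload session. chunked_upload and
# CHUNK are illustrative names, not part of the SDK.
import os

CHUNK = 8 * 1024 * 1024  # 8 MiB per request

def chunked_upload(dbx, local_path, remote_path, chunk_size=CHUNK):
    """Upload local_path to remote_path one chunk at a time."""
    from dropbox.files import UploadSessionCursor, CommitInfo

    size = os.path.getsize(local_path)
    with open(local_path, "rb") as f:
        result = dbx.files_upload_session_start(f.read(chunk_size))
        cursor = UploadSessionCursor(session_id=result.session_id,
                                     offset=f.tell())
        commit = CommitInfo(path=remote_path)
        if f.tell() >= size:  # whole file fit in the first chunk
            return dbx.files_upload_session_finish(b"", cursor, commit)
        while size - f.tell() > chunk_size:
            dbx.files_upload_session_append_v2(f.read(chunk_size), cursor)
            cursor.offset = f.tell()
        # the last (possibly partial) chunk commits the file
        return dbx.files_upload_session_finish(f.read(chunk_size),
                                               cursor, commit)
```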
Thanks and regards.
@MarceloC No, unfortunately support for Range requests still hasn't been implemented in the Dropbox API v2 Python SDK.
Hi Greg,
Thank you for your reply!
Although I'm just starting with Python and still don't know much about its libraries or object structure, I tried to implement your suggestion and got the partial download working! :slight_smile:
It is just a workaround and very FAR from a proper or elegant solution, but if anyone just wants to make it work, this is what I did... (use at your own risk!)
As I could not even find where the request_json_string method (where the headers are set) is called from, so as to pass in proper parameters, I just added two global variables to the dropbox module and created a method to set them from the main program.
At the beginning of the file dropbox.py (back it up before changing it!!) I added the module-level global variables (they may be inserted in the first few lines of the file):
Download_Range_Start = -1
Download_Range_End = -1
Just before the request_json_string method definition (line 389 of the original dropbox.py file) I added the method to set the variables:
def set_download_range(self, Start, End):
    global Download_Range_Start, Download_Range_End
    Download_Range_Start = Start
    Download_Range_End = End
And inside the request_json_string method I added two lines (the 3rd and 4th lines below; around line 433 of the original dropbox.py file):
...
elif route_style == self._ROUTE_STYLE_DOWNLOAD:
    headers['Dropbox-API-Arg'] = request_json_arg
    if (Download_Range_Start >= 0) and (Download_Range_End >= Download_Range_Start):
        headers['Range'] = 'bytes=' + str(Download_Range_Start) + '-' + str(Download_Range_End)
    stream = True
elif route_style == self._ROUTE_STYLE_UPLOAD:
...
To use it, just call the following in your main program before downloading the file (dbx is the main Dropbox object):
dbx.set_download_range(5,10)
This will make the download fetch the 6th through the 11th byte of the file (offsets are 0-based).
As it uses global variables, be careful: this change will affect all downloads done after it, and may persist as long as the library stays loaded in the same process (and could affect other threads in that process, so it is not recommended for multi-threaded apps).
To return to the original condition (no range) use:
dbx.set_download_range(-1,-1)
The Start position must be inside the file, to prevent an exception. The End position may be after the file end.
This approach is different from the one for upload, which is based on a session (appending chunks) controlled by the Dropbox server. Here it is up to you to create and control a "session" on the client side and merge the chunks.
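That client-side loop could be sketched roughly like this. Everything here is illustrative: fetch(start, end) stands in for whatever ranged call you use (for instance the set_download_range hack above followed by a download call, or a raw HTTP Range request), and the retry count and chunk size are arbitrary:

```python
# Sketch of a client-side "download session": fetch the file in fixed-size
# chunks, retrying each failed chunk, then merge the pieces.
# fetch(start, end) is assumed to return the bytes in that inclusive range.
def download_in_chunks(fetch, total_size, chunk_size, retries=3):
    """Fetch bytes [0, total_size) in chunk_size pieces, with per-chunk retries."""
    parts = []
    offset = 0
    while offset < total_size:
        end = min(offset + chunk_size, total_size) - 1  # inclusive end offset
        for attempt in range(retries):
            try:
                parts.append(fetch(offset, end))
                break
            except Exception:
                if attempt == retries - 1:
                    raise  # give up after the last retry
        offset = end + 1
    return b"".join(parts)
```

In a real tool you would write each chunk to disk and record the last successful offset, so a restart can resume instead of beginning from byte 0.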
Well, if someone wants to improve this and do a more elegant implementation by adding parameters to the download methods, you're welcome to do so.
*I only tested this with the files_download_to_file method, but it will probably work with files_download as well.