I'm writing a file-transfer service that uses the Dropbox API via the Python SDK (8.0.0 release). Most of the time it works, but a couple of times I've encountered errors that I'd like to handle gracefully. I'm looking to see how others might have handled them, and also for some potential reasons why they occur.
Namely, I've received two errors that I'd like to handle:
1. Timeout errors when transferring larger chunks (here 140MB, although the API states 150MB is the maximum). The timeout has not yet occurred when I reduce the chunk size (to, say, 100MB), but obviously I need something more robust than a heuristic. The Python requests library has timeout kwargs, but I don't see a way to pass those through as extra kwargs via the dropbox library. Is there any way to control that? StackOverflow suggests there's an OS-level setting I could change to alter the default timeout (I would rather not do that...).
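One workaround I've been considering, though I haven't verified that the 8.0.0 SDK constructor actually accepts a custom `requests.Session` via a `session` keyword, is a `Session` subclass that injects a default timeout into every request:

```python
import requests


class TimeoutSession(requests.Session):
    """A requests.Session that applies a default timeout to every
    request unless the caller passes one explicitly."""

    def __init__(self, timeout=120):
        super(TimeoutSession, self).__init__()
        self.default_timeout = timeout

    def request(self, method, url, **kwargs):
        # Inject the default timeout only if none was given.
        kwargs.setdefault('timeout', self.default_timeout)
        return super(TimeoutSession, self).request(method, url, **kwargs)


# Hypothetical usage -- ONLY valid if dropbox.Dropbox takes a `session` kwarg:
# client = dropbox.dropbox.Dropbox(token, session=TimeoutSession(timeout=300))
```

This is just a sketch; if the SDK doesn't expose the session, it obviously won't help.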
2. The more common error I've received (even on very small chunks of 10MB) is a connection reset "error", e.g. from my logger:
2017-08-17 15:50:49,218:ERROR:<class 'requests.exceptions.ConnectionError'>
2017-08-17 15:50:49,218:ERROR:('Connection aborted.', error(104, 'Connection reset by peer'))
It's so hit-or-miss that it's difficult to debug, especially since I'm not sure what happens to the upload session (see below).
Now that the problems are described, here is a minimal working snippet (before adding any exception handling):
import os
import dropbox

DEFAULT_CHUNK_SIZE = 100 * 1024 * 1024  # 100 MB

token = 'api token goes here'
client = dropbox.dropbox.Dropbox(token)

local_filepath = '/home/foo/bar/baz.txt'
file_size = os.path.getsize(local_filepath)
path_in_dropbox = '/%s' % os.path.basename(local_filepath)

file_obj = open(local_filepath, 'rb')  # binary mode: the API expects bytes
if file_size <= DEFAULT_CHUNK_SIZE:
    client.files_upload(file_obj.read(), path_in_dropbox)
else:
    i = 1
    session_start_result = client.files_upload_session_start(file_obj.read(DEFAULT_CHUNK_SIZE))
    cursor = dropbox.files.UploadSessionCursor(session_id=session_start_result.session_id,
                                               offset=file_obj.tell())
    commit = dropbox.files.CommitInfo(path=path_in_dropbox)
    while file_obj.tell() < file_size:
        print 'Sending chunk %s' % i
        if (file_size - file_obj.tell()) <= DEFAULT_CHUNK_SIZE:
            print 'Finishing transfer and committing'
            client.files_upload_session_finish(file_obj.read(DEFAULT_CHUNK_SIZE), cursor, commit)
        else:
            print 'Before append, cursor.offset=%d, file_obj is at %d' % (cursor.offset, file_obj.tell())
            client.files_upload_session_append_v2(file_obj.read(DEFAULT_CHUNK_SIZE), cursor)
            cursor.offset = file_obj.tell()
        i += 1
file_obj.close()
For the case of the connection reset (item 2), my main question hinges on what happens to the upload session if the ConnectionError is raised. Is the session "preserved" or do I need to start a new one? If the former, I figure I can do something like the following:
<...snip...>
try:
    current_offset = file_obj.tell()
    client.files_upload_session_append_v2(file_obj.read(DEFAULT_CHUNK_SIZE), cursor)
except requests.exceptions.ConnectionError as ex:
    file_obj.seek(current_offset)
    cursor.offset = current_offset
<...snip...>
The idea here is that if the current chunk fails due to the reset error, I catch the exception and ensure that both the cursor (dropbox.files.UploadSessionCursor) and the file object get reset to the previous offset so the chunk can be tried again. This assumes that the failed call to files_upload_session_append_v2 moves the pointer in the file object (file_obj); I'm not sure whether it does, but the seek seems harmless even if it doesn't. Is there a better/more graceful way to get the chunk to retry?
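For what it's worth, one pattern I've been considering is wrapping that seek-and-reset logic in a generic retry helper with exponential backoff, so each chunk gets several attempts before the whole transfer fails. A sketch (all names here are my own, nothing Dropbox-specific):

```python
import time


def retry_on(exc_types, max_tries=5, base_delay=1.0):
    """Decorator: retry the wrapped callable when it raises one of
    exc_types, sleeping base_delay * 2**attempt between attempts."""
    def decorator(fn):
        def wrapper(*args, **kwargs):
            for attempt in range(max_tries):
                try:
                    return fn(*args, **kwargs)
                except exc_types:
                    if attempt == max_tries - 1:
                        raise  # out of retries; surface the error
                    time.sleep(base_delay * (2 ** attempt))
        return wrapper
    return decorator


# Hypothetical chunk sender: rewinds file and cursor before each attempt.
# @retry_on((requests.exceptions.ConnectionError,))
# def send_chunk(client, file_obj, cursor, chunk_size):
#     file_obj.seek(cursor.offset)
#     client.files_upload_session_append_v2(file_obj.read(chunk_size), cursor)
#     cursor.offset = file_obj.tell()
```

This keeps the retry policy in one place, though it still hinges on the session itself surviving the ConnectionError.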