I'm trying to use the API to scan my directories for .XLS files and read them. I've attempted this in two ways:
using '/search':

    import requests

    def search_xls(headers):
        # headers carries the bearer token, e.g. {"Authorization": "Bearer <token>"}
        url = "https://api.dropboxapi.com/2/files/search"
        data = {
            "path": "",  # the root folder is the empty string, not "/"
            "query": ".XLS",
            "start": 0,
            "max_results": 1000,
            "mode": {".tag": "filename"},
        }
        response = requests.post(url, headers=headers, json=data)
        result = response.json()
        paths = [m["metadata"]["path_display"] for m in result["matches"]]
        while result["more"]:
            data["start"] = result["start"]  # resume from the server-supplied offset
            response = requests.post(url, headers=headers, json=data)
            result = response.json()
            paths.extend(m["metadata"]["path_display"] for m in result["matches"])
        return set(p for p in paths if ".XLS" in p)
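As far as I can tell, the pagination loop itself can't drop results as long as the server reports 'more' and 'start' correctly. Here is a minimal, self-contained sketch of that same start/more loop against a stubbed fetch function (collect_paged and the stub pages are hypothetical, standing in for the requests.post call):

```python
def collect_paged(fetch, start=0):
    """Drain a start/more-style paged endpoint like '/search'.

    fetch(start) stands in for the HTTP call and returns a dict with
    "matches", "more", and "start" keys shaped like the '/search' response.
    """
    paths = []
    while True:
        page = fetch(start)
        paths.extend(m["metadata"]["path_display"] for m in page["matches"])
        if not page["more"]:
            return paths
        start = page["start"]  # resume at the server-supplied offset

# Stub: two one-result "pages" keyed by the start offset.
pages = {
    0: {"matches": [{"metadata": {"path_display": "/a/one.XLS"}}], "more": True, "start": 1},
    1: {"matches": [{"metadata": {"path_display": "/b/two.XLS"}}], "more": False, "start": 2},
}
print(collect_paged(lambda s: pages[s]))  # ['/a/one.XLS', '/b/two.XLS']
```

So if anything is missing, it should be missing from the server's result stream, not from the client-side loop.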
and using '/list_folder' and '/list_folder/continue':

    def list_xls(headers):
        url = "https://api.dropboxapi.com/2/files/list_folder"
        data = {"path": "", "recursive": True}  # again, root is "" rather than "/"
        response = requests.post(url, headers=headers, json=data)
        result = response.json()
        paths = [e["path_display"] for e in result["entries"] if ".XLS" in e["path_display"]]
        while result["has_more"]:
            response = requests.post(url + "/continue", headers=headers,
                                     json={"cursor": result["cursor"]})
            result = response.json()
            paths.extend(e["path_display"] for e in result["entries"] if ".XLS" in e["path_display"])
        return set(paths)

For some reason that I can't discern from diffing the results, the search option is not exhaustive. Most of the files it misses are "recent", in that they were added to the directory (or a subfolder) within the last six months, but that doesn't hold for all of them, and it doesn't seem to be a known issue with the API. '/search' is much easier to parallelize and I would prefer to use it, but I need to know that it is exhaustive.
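The diffing itself is just set arithmetic over the two returned path sets. A minimal sketch of how the missing files fall out (diff_results and the sample paths are hypothetical):

```python
def diff_results(search_paths, list_paths):
    """Compare the path sets returned by the two approaches above."""
    missing = list_paths - search_paths  # found by list_folder but not by search
    extra = search_paths - list_paths    # found only by search (expected empty)
    return missing, extra

# Hypothetical sample data: one "recent" file absent from the search results.
search_paths = {"/reports/2019/q1.XLS", "/reports/2019/q2.XLS"}
list_paths = {"/reports/2019/q1.XLS", "/reports/2019/q2.XLS", "/reports/2020/q1.XLS"}
missing, extra = diff_results(search_paths, list_paths)
print(sorted(missing), sorted(extra))  # ['/reports/2020/q1.XLS'] []
```

In my runs, missing is non-empty and extra is empty, which is what makes me doubt that '/search' is exhaustive.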