I'm trying to use the API to scan my directories for .XLS files and read them. I've attempted this in two ways:
using '/search':

    import requests

    def search_xls(headers):
        # headers carries the bearer token, e.g. {"Authorization": "Bearer <token>"}
        url = "https://api.dropboxapi.com/2/files/search"
        data = {
            "path": "",  # the root folder is the empty string, not "/"
            "query": ".XLS",
            "start": 0,
            "max_results": 1000,
            "mode": {".tag": "filename"},
        }
        response = requests.post(url, headers=headers, json=data)
        result = response.json()
        paths = [m["metadata"]["path_display"] for m in result["matches"]]
        while result["more"]:
            data["start"] = result["start"]  # resume from the server-supplied offset
            response = requests.post(url, headers=headers, json=data)
            result = response.json()
            paths.extend(m["metadata"]["path_display"] for m in result["matches"])
        return set(p for p in paths if ".XLS" in p)
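As far as I can tell, the pagination loop itself can't drop results as long as the server reports 'more' and 'start' correctly. Here is a minimal, self-contained sketch of that same start/more loop against a stubbed fetch function (collect_paged and the stub pages are hypothetical, standing in for the requests.post call):

```python
def collect_paged(fetch, start=0):
    """Drain a start/more-style paged endpoint like '/search'.

    fetch(start) stands in for the HTTP call and returns a dict with
    "matches", "more", and "start" keys shaped like the '/search' response.
    """
    paths = []
    while True:
        page = fetch(start)
        paths.extend(m["metadata"]["path_display"] for m in page["matches"])
        if not page["more"]:
            return paths
        start = page["start"]  # resume at the server-supplied offset

# Stub: two one-result "pages" keyed by the start offset.
pages = {
    0: {"matches": [{"metadata": {"path_display": "/a/one.XLS"}}], "more": True, "start": 1},
    1: {"matches": [{"metadata": {"path_display": "/b/two.XLS"}}], "more": False, "start": 2},
}
print(collect_paged(lambda s: pages[s]))  # ['/a/one.XLS', '/b/two.XLS']
```

So if anything is missing, it should be missing from the server's result stream, not from the client-side loop.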
and using '/list_folder' and '/list_folder/continue':

    def list_xls(headers):
        url = "https://api.dropboxapi.com/2/files/list_folder"
        data = {"path": "", "recursive": True}  # again, root is "" rather than "/"
        response = requests.post(url, headers=headers, json=data)
        result = response.json()
        paths = [e["path_display"] for e in result["entries"] if ".XLS" in e["path_display"]]
        while result["has_more"]:
            response = requests.post(url + "/continue", headers=headers,
                                     json={"cursor": result["cursor"]})
            result = response.json()
            paths.extend(e["path_display"] for e in result["entries"] if ".XLS" in e["path_display"])
        return set(paths)

For some reason that I can't discern from diffing the results, the search option is not exhaustive. Most of the files it misses are "recent", in that they were added to the directory (or a subfolder) within the last six months, but that doesn't hold for all of them, and it doesn't seem to be a known issue with the API. '/search' is much easier to parallelize and I would prefer to use it, but I need to know that it is exhaustive.
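The diffing itself is just set arithmetic over the two returned path sets. A minimal sketch of how the missing files fall out (diff_results and the sample paths are hypothetical):

```python
def diff_results(search_paths, list_paths):
    """Compare the path sets returned by the two approaches above."""
    missing = list_paths - search_paths  # found by list_folder but not by search
    extra = search_paths - list_paths    # found only by search (expected empty)
    return missing, extra

# Hypothetical sample data: one "recent" file absent from the search results.
search_paths = {"/reports/2019/q1.XLS", "/reports/2019/q2.XLS"}
list_paths = {"/reports/2019/q1.XLS", "/reports/2019/q2.XLS", "/reports/2020/q1.XLS"}
missing, extra = diff_results(search_paths, list_paths)
print(sorted(missing), sorted(extra))  # ['/reports/2020/q1.XLS'] []
```

In my runs, missing is non-empty and extra is empty, which is what makes me doubt that '/search' is exhaustive.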