Searching for a specific file type

Hi there,

I'll ask for some patience and forgiveness in advance. I'm about 2 weeks into Python development, so I'm likely missing some obvious approaches - please don't assume a lot of knowledge on my part if you can help.

Objective: I've been tasked with 'crawling' through folders on Dropbox via the API to look for certain image types (specific file extensions only - *.dco for reference - as I won't know the file names), and then extracting the path and filename (and then 'do stuff'). I have already completed this code locally (in other words, if the files are on my computer it works fine) - but now it needs to work in Dropbox as well because the data sets will be quite large. I cannot assume that the files in the folders will be the types I want - hence I need to search by filename extension.

I have access to the Dropbox account and authorization sorted. I've created a folder and put in some temp files (the PDFs and JPEGs provided by Dropbox for testing). I can query the folder and return a list of results via files_list_folder.

I can get a list of files, extensions and paths via the code below - however, the issue I am having is that I cannot filter the data based on file extension, and the rudimentary methods I am using are not working.

While I can get a list of files, and even 'copy' them to another list, I cannot find a way to parse the list to give me the path_lower (the directory and filename), which will give me the extension (i.e. find all *.jpg's). I must be missing something in the way the data is constructed (I understand it's instance/object based). I have been assuming I'm not hitting on the correct keyword combinations to extract the data, so I'm looking for some help in identifying where I'm going wrong. Thanks in advance!

import dropbox
from dropbox import Dropbox

my_client = Dropbox(token)
# positional arguments: path, recursive, include_media_info
folderfile_list = my_client.files_list_folder('', True, True)

#this gives me a nice list of items - however I don't seem to be able to *do* anything with it
for item in folderfile_list.entries:
    if isinstance(item, dropbox.files.FileMetadata):
        name = item.name
        fileID = item.id
        fileHash = item.content_hash
        path = item.path_lower
        print(name, path)

#This does return the search results I want - but it's not iterable - so I don't seem to be able to do anything with it
files_search = my_client.files_search('', '*.pdf')
print(files_search)

type(files_search)
Out[325]: <class 'dropbox.files.SearchResult'>


#this returns nothing
for files in folderfile_list.entries:
    if files.path_lower == '*.jpg':
        print("yes")

#this returns nothing
for item in folderfile_list.entries:
    if item.path_lower == '*.jpg':
        print("I got it")
    else:
        print("still nothing")

#this also doesn't work
import fnmatch
pattern ='*.jpg'
matching = fnmatch.filter(folderfile_list.entries, pattern)
print(matching)

fname = []
for i in folderfile_list.entries:
    fname.append(i)
print(fname[1])

import fnmatch
pattern ='*.jpg'
matching = fnmatch.filter(fname, pattern)
print(matching)

#this did work - however I cannot find a file TYPE with this - I can find a specific file name, but not a file extension
#In other words, if I change this to *.pdf, it does not get a 'happy' result
for files in fname:
    if files.path_lower == '/test folder/strategy-session-hotel.pdf':
        print("happy")
        print(files.path_lower)
    else:
        print('unhappy')

Comments

Bin2

I believe I have solved my own problem - in case anyone else needs it. It's not pretty - but it works.

import fnmatch

spot=[]
# dbx is the Dropbox client (called my_client in the original post)
holder=dbx.files_list_folder('/Test Folder')
print(holder)
for files in holder.entries:
  spot.append(files.path_lower)

print(spot)

pattern = '*.jpg'
matching = fnmatch.filter(spot, pattern)
print(matching)

['/test folder/az-car-rental.jpg', '/test folder/il-car-rental.jpg', '/test folder/car-rental-invoice.jpg', '/test folder/dinner-receipt.jpg', '/test folder/lunch-receipt.jpg', '/test folder/meal-receipt.jpg', '/test folder/meetup-dinner.jpg', '/test folder/team-offsite-lunch.jpg', '/test folder/training-airfare.jpg', '/test folder/training-hotel-invoice.jpg', '/test folder/travel-meal.jpg']

Greg-DB

I'm glad to hear you already got this working. You have the right idea in that you can call files_list_folder to list the contents of a folder, and then check the Metadata.path_lower (or Metadata.name) for the returned entries to see if the file extension is one you're interested in.

Note though that you should also implement files_list_folder_continue to make sure you can receive all of the entries. Check out the files_list_folder documentation for more information.
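
As a rough sketch of that pagination loop: the helper name list_all_entries below is hypothetical (it is not part of the SDK), and it assumes `client` is an authorized Dropbox client like the `dbx` object above. It keeps calling files_list_folder_continue with the latest cursor for as long as has_more is set on the result.

```python
def list_all_entries(client, path=""):
    """Collect every entry under `path`, following pagination cursors.

    `client` is assumed to be an authorized dropbox.Dropbox instance;
    this helper is illustrative and not part of the Dropbox SDK.
    """
    result = client.files_list_folder(path, recursive=True)
    entries = list(result.entries)
    # has_more stays True while the server still holds further pages.
    while result.has_more:
        result = client.files_list_folder_continue(result.cursor)
        entries.extend(result.entries)
    return entries
```

You could then call something like list_all_entries(dbx, '/Test Folder') and run the extension check over the combined list instead of over a single page of results.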

Also, one alternative for your file extension check may be to use the 'endswith' method like this:

files.path_lower.endswith(".jpg")
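
If you need to check more than one extension, a minimal sketch could look like the following. The function name filter_by_extension and its default extensions are made up for illustration; the only things it relies on are the path_lower attribute of the returned entries and the fact that Python's str.endswith accepts a tuple of suffixes.

```python
def filter_by_extension(entries, extensions=(".jpg", ".jpeg")):
    """Return the entries whose path_lower ends with one of `extensions`.

    Hypothetical helper for illustration; `entries` is assumed to be a
    list of metadata objects such as files_list_folder(...).entries.
    """
    # str.endswith accepts a tuple, so all extensions are tested in one call.
    wanted = tuple(ext.lower() for ext in extensions)
    return [
        entry
        for entry in entries
        # path_lower can be missing (None) for some entries, so fall back to "".
        if (entry.path_lower or "").endswith(wanted)
    ]
```

Because this returns the metadata objects rather than plain strings, you still have access to fields such as name and path_lower on each match. Note that a folder whose name happens to end in ".jpg" would also match, so you may want to combine this with the isinstance check against dropbox.files.FileMetadata from the original post.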