I am working on an AWS serverless app that queries a specific Dropbox folder tree for daily PDF uploads. My process and config/code are below. I _think_ I understand how the API endpoint is supposed to work, but the results I am seeing do not match what I expect, so the most likely explanation is that I do not actually understand how it works.
My App:
My app is simple. I watch a Dropbox folder for daily PDF uploads and, at the end of the day, download and merge all new PDFs into a single PDF. I am using the Node.js Dropbox package here: https://www.npmjs.com/package/dropbox-v2-api
I have no indication that the NodeJS package is not working as it should.
On a given day there are between 150 and 200 PDFs, each anywhere from a couple of MB up to 500 MB. I'm not having any issues with the size of the PDFs. That part works great.
The Process:
- At 2:00 AM every morning I call the get_latest_cursor endpoint and store the cursor.
- At 3:00 PM every afternoon I call /files/list_folder/continue passing the stored cursor
- My config has:
  - recursive = true
  - include_deleted = false
  - limit = 2000
What I expect to see is a list of all files added to the folder tree since the 2:00 AM cursor, excluding entries with the ".tag" : "deleted" property.
What I am seeing is that ".tag" : "deleted" entries are included in the results. So whereas my result set should be around 400 files (the PDFs plus supporting JPG and PSD files), I am seeing about 900 entries because all of the deleted files are included even though I am explicitly excluding them.
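As a stopgap, I could drop the deleted entries on my side after the call returns. This is only a minimal sketch: `excludeDeleted` and `onlyPdfs` are hypothetical helper names (not part of dropbox-v2-api), the sample entries are made up, and it assumes every entry carries a ".tag" property and file entries carry a "name".

```javascript
// Hypothetical helpers, not part of dropbox-v2-api.
// Assumes each entry has a ".tag" of "file", "folder", or "deleted".
function excludeDeleted(entries) {
  return entries.filter((entry) => entry['.tag'] !== 'deleted');
}

// Keep only file entries whose name ends in ".pdf" (case-insensitive).
function onlyPdfs(entries) {
  return entries.filter(
    (entry) => entry['.tag'] === 'file' && /\.pdf$/i.test(entry.name)
  );
}

// Made-up sample data for illustration only.
const sample = [
  { '.tag': 'file', name: 'proof-01.pdf' },
  { '.tag': 'deleted', name: 'old-proof.pdf' },
  { '.tag': 'file', name: 'artwork.psd' },
  { '.tag': 'folder', name: 'jobs' }
];

console.log(excludeDeleted(sample).length);       // 3
console.log(onlyPdfs(sample).map((e) => e.name)); // [ 'proof-01.pdf' ]
```

That would work around the problem, but it does not explain why include_deleted = false is not doing this for me.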
/**
 * Get latest Dropbox cursor
 * @param event
 * @param callback
 * @returns {*}
 */
module.exports.getLatestCursor = (event, callback) => {
    const
        s3 = getAwsS3()
        , dropbox = getDropbox();
    console.log('[index.js][getLatestCursor] STEP 01 -- Get Latest Cursor')
    dropbox({
        resource: 'files/list_folder/get_latest_cursor',
        parameters: {
            path : process.env.DROPBOX_WATCH_FOLDER,
            recursive : true,
            include_deleted : false,
            include_non_downloadable_files : false,
            include_media_info : false,
            limit : 2000
        }
    }, (err, result, response) => {
        if (err) { return console.log(err); }
        console.log('[index.js][getLatestCursor] STEP 02 -- Prepare Latest Cursor', JSON.stringify(response))
        const params = {
            Bucket : process.env.S3_BUCKET_NAME,
            Key : `cursor/${process.env.CURSOR_FILENAME}`,
            Body : Buffer.from(JSON.stringify(response.body)),
            ACL : 'private'
        };
        s3.upload(params, (err, data) => {
            console.log('[index.js][getLatestCursor] STEP 03 -- Save Latest Token to S3', data)
            if (err) throw err;
            callback(null, data)
        });
    });
};
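For clarity, this is how I understand the stored cursor has to be read back. It is a minimal sketch with the S3 calls left out: `serializeCursor` and `parseCursor` are hypothetical helper names, the cursor value is a placeholder, and it assumes the saved object is the JSON body of get_latest_cursor and that the S3 client hands the object body back as a Buffer.

```javascript
// Sketch of the cursor round trip through S3, without the S3 calls.
// Assumes the saved body looks like { cursor: '...' }.
function serializeCursor(responseBody) {
  // Same shape as the Body passed to s3.upload above
  return Buffer.from(JSON.stringify(responseBody));
}

function parseCursor(body) {
  // The body is raw bytes, so decode and parse it before reading .cursor
  return JSON.parse(body.toString('utf8')).cursor;
}

const stored = serializeCursor({ cursor: 'EXAMPLE_CURSOR' }); // placeholder value

console.log(stored.cursor);       // undefined (a Buffer has no cursor property)
console.log(parseCursor(stored)); // EXAMPLE_CURSOR
```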
Then, my call to list files:
/**
 * Get file list
 * @param event
 * @param callback
 * @returns {*}
 */
module.exports.getFileList = (event, callback) => {
    const
        bucket = process.env.S3_BUCKET_NAME
        , prefix = 'cursor'
        , filename = process.env.CURSOR_FILENAME
    const s3 = getAwsS3();
    const params = {
        Bucket: bucket,
        Key: `${prefix}/${filename}`
    }
    s3.getObject(params, (err, data) => {
        if (err) {
            console.error(err);
            throw err;
        }
        console.log('[index.js][getFileList]', data)
        let response;
        const
            dropbox = getDropbox()
            // data.Body is a Buffer holding the JSON saved by getLatestCursor
            , cursor = JSON.parse(data.Body.toString()).cursor
            , params = {
                resource: 'files/list_folder/continue',
                parameters: {
                    cursor : cursor
                }
            };
        dropbox(params, (err, result, response) => {
            if (err) {
                console.error(err);
                throw err;
            }
            console.log('[index.js][getFileList]', result)
            let iter = 0
                , _debug_downloads = []
                , _debug_all_hr = []
                , _debug_all_lr = []
                , entries = []
                , downloadables = []
            if (result && typeof result.entries !== 'undefined') {
                entries = result.entries;
                saveToS3Bucket(`debug/${formatDate(false, true)}/${kTS}/entries.json`, JSON.stringify(entries));
                entries = entries.map((entry, i) => {
                    // Process the entries
                });
                // Storing results for debugging. Ignore this. It works fine.
                saveToS3Bucket(`debug/${formatDate(false, true)}/${kTS}/proofs.csv`, _debug_all_lr.join("\r\n"));
                saveToS3Bucket(`debug/${formatDate(false, true)}/${kTS}/artwork.csv`, _debug_all_hr.join("\r\n"));
                saveToS3Bucket(`debug/${formatDate(false, true)}/${kTS}/downloadables.csv`, _debug_downloads.join("\r\n"));
            }
            // process result set.
        });
    });
};
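One thing the code above does not do is follow has_more, so it only ever reads the first page of results. A rough sketch of how the paging could look is below. `collectEntries` and `listContinueWith` are hypothetical helper names, and the sketch assumes each list_folder/continue result has the shape { entries, cursor, has_more }.

```javascript
// Hypothetical helper: gathers entries across all pages.
// `listContinue(cursor)` is an injected function assumed to resolve to
// { entries, cursor, has_more }.
async function collectEntries(listContinue, startCursor) {
  const all = [];
  let cursor = startCursor;
  let hasMore = true;
  while (hasMore) {
    const page = await listContinue(cursor);
    all.push(...page.entries);
    cursor = page.cursor;    // each page returns the cursor for the next call
    hasMore = page.has_more; // stop once there are no more pages
  }
  return { entries: all, cursor };
}

// Hypothetical adapter: wraps the callback-style dropbox() call used above
// in a Promise so it can be passed to collectEntries.
const listContinueWith = (dropbox) => (cursor) =>
  new Promise((resolve, reject) => {
    dropbox(
      { resource: 'files/list_folder/continue', parameters: { cursor } },
      (err, result) => (err ? reject(err) : resolve(result))
    );
  });

// Possible usage inside getFileList, once the cursor has been read from S3:
// collectEntries(listContinueWith(getDropbox()), cursor)
//   .then(({ entries }) => { /* process the result set */ });
```

Injecting the list function keeps the paging loop testable without calling Dropbox.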
My questions are:
- Why are the deleted files being included? They should not be, should they?
- Am I using the cursor and the list_folder/continue correctly?
Thanks in advance.