Working with large folders in Apps Script
I have a folder with over 50,000 files in it. It's so big that even getting all the file IDs causes a 6-minute timeout; in fact, I cannot even count the number of files. This is a script I made to count them. Apologies if it doesn't run as-is; I modified it for posting to make it easier to read.
// This requires the Drive API to be turned on in Advanced Google Services.
function countFilesInFolder(folder) {
  var query = 'trashed = false and "' + folder.getId() + '" in parents';
  var counter = 0;
  var pageToken;
  do {
    var filesInFolder = Drive.Files.list({q: query, maxResults: 100, pageToken: pageToken});
    counter += filesInFolder.items.length;
    pageToken = filesInFolder.nextPageToken;
    Logger.log('filesInFolder: ' + counter); // logged per page; see note below
  } while (pageToken);
  Logger.log('filesInFolder: ' + counter);
}
The first Logger.log is there because the script always times out before reaching the second one. It gets to about 13,000 files before the timeout.
So, my questions:
Is there a better way of counting the files in a folder?
If I want to work with a huge folder and delete certain files from it, is there a decent way to do it? Currently I have created a second folder, and for each file I either delete it if needed or move it to the second folder. This way I am slowly emptying the huge folder, deleting the files I don't want as I go and moving the files I want to keep. When I hit the 6-minute limit, I rerun the script and continue. This is amazingly slow, by the way: I have been sitting here for 4 hours already and the job is only partly done. (A sketch of this approach is shown below.)
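For reference, here is a minimal sketch of that delete-or-move pass, as an illustration rather than a definitive fix. shouldDelete(), SOURCE_FOLDER_ID, and DEST_FOLDER_ID are hypothetical placeholders you would supply. Because every processed file is either trashed or moved out of the folder, it drops out of the query, so each rerun can simply start from the first page again and no page token needs to survive the timeout.
// Minimal sketch, assuming shouldDelete(), SOURCE_FOLDER_ID and DEST_FOLDER_ID
// are placeholders you replace with your own logic and folder IDs.
function cleanupPass() {
  var query = 'trashed = false and "SOURCE_FOLDER_ID" in parents';
  var dest = DriveApp.getFolderById('DEST_FOLDER_ID');
  var start = Date.now();
  var resp;
  do {
    // Always fetch the first page: processed files no longer match the query.
    resp = Drive.Files.list({q: query, maxResults: 100, fields: 'items(id,title),nextPageToken'});
    resp.items.forEach(function (item) {
      if (shouldDelete(item)) {
        DriveApp.getFileById(item.id).setTrashed(true); // delete unwanted files
      } else {
        DriveApp.getFileById(item.id).moveTo(dest);     // keep: move to the second folder
      }
    });
  } while (resp.items.length > 0 && Date.now() - start < 5 * 60 * 1000); // stop before the 6-minute cap
}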
Solution 1:[1]
There are two optimizations that go a long way toward listing files in a folder faster. The first is to bump up the value of maxResults. Google's limit for the v2 API appears to be 1000. This is not documented, but it makes sense, since it matches the v3 API, which does have a documented limit. Note that you probably won't get 1000 results per page (I found myself limited to about 460) unless you also apply the second optimization.
The second optimization lets you hit the 1000-files-per-page limit and makes requests return faster: use the fields option to limit which fields are included in the response to only those you actually need.
function countFiles(folderId){
let query = 'trashed = false and "' + folderId +'" in parents';
let count = 0;
let pageToken;
do {
const resp = Drive.Files.list({
maxResults: 10000, // huge number meaning "as many as possible"
pageToken: pageToken,
q: query,
fields: 'items/id,nextPageToken', // get as few fields as possible
});
Logger.log("got %d items", resp.items.length);
count += resp.items.length;
pageToken = resp.nextPageToken;
} while(pageToken);
return count;
}
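A quick usage sketch, with a hypothetical driver function; YOUR_FOLDER_ID is a placeholder for a real folder ID.
// Hypothetical driver for countFiles(); replace YOUR_FOLDER_ID.
function logFolderCount() {
  var total = countFiles('YOUR_FOLDER_ID');
  Logger.log('total files: %s', total);
}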
Solution 2:[2]
Apps Script has its limitations.
Depending on the type of account you have, your script's maximum runtime is:
- 6 minutes for consumer accounts, the Google Workspace free edition, and Google Workspace Basic;
- 30 minutes for Google Workspace Business, Enterprise, and Education accounts.
However, since you want to retrieve such a big number of files and you are already using the Drive Advanced Service, the best solution is to use the Drive API itself.
A good approach is to start with the Drive API v3 Node.js Quickstart and then adapt it to your needs, along the lines of the sketch below.
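As a starting point, here is a minimal sketch of paging through a folder with the official googleapis Node.js client. It assumes auth is an authorized client obtained by following the Quickstart, and folderId is a placeholder you supply; outside Apps Script there is no 6-minute cap, so the loop can run to completion.
// Minimal sketch, assuming `auth` is an authorized client from the Quickstart.
const {google} = require('googleapis');

async function countFiles(auth, folderId) {
  const drive = google.drive({version: 'v3', auth});
  let count = 0;
  let pageToken;
  do {
    const res = await drive.files.list({
      q: `trashed = false and '${folderId}' in parents`,
      pageSize: 1000,                      // v3's documented maximum
      fields: 'nextPageToken, files(id)',  // keep the payload small
      pageToken,
    });
    count += res.data.files.length;
    pageToken = res.data.nextPageToken;
  } while (pageToken);
  return count;
}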
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source
---|---
Solution 1 | 9072997
Solution 2 | ale13