How to check for new files in a folder in Python
I am trying to create a script that will be executed every 10 minutes. On each run it has to check whether there are new files in a specific folder on my computer and, if so, run some functions on each new file to extract some values. These values will be written to an Excel file. The problem is that every time the script runs, the variables holding the paths to all the files are generated again, so the program goes over all the files instead of only the new ones. How can I handle this problem? Thanks
Solution 1:[1]
Start by initializing variables:
import os
savedSet = set()
mypath = … #YOUR PATH HERE
At the end of each cycle, save a set of (file name, creation time) tuples, optionally with the file size, to savedSet. On each cycle, when retrieving files, do the following:
-Retrieve a set of file names
nameSet = set()
for file in os.listdir(mypath):
    fullpath = os.path.join(mypath, file)
    if os.path.isfile(fullpath):
        nameSet.add(file)
-Create tuples
retrievedSet = set()
for name in nameSet:
    stat = os.stat(os.path.join(mypath, name))
    time = stat.st_ctime  # creation time on Windows; last metadata change on Unix
    #size = stat.st_size  If you add this, you will be able to detect file size changes as well.
    #Also consider using st_mtime to detect last time modified
    retrievedSet.add((name, time))
-Compare set with saved set to find new files
newSet=retrievedSet-savedSet
-Compare set with saved set to find removed files
deletedSet=savedSet-retrievedSet
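-Run your functions on files with names from newSet
-Update saved set (store everything that was retrieved, not only the new entries, or older files will be reported as new again on the next cycle)
savedSet = retrievedSet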
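Because the script in the question is started fresh every 10 minutes, a set held in memory is lost between runs. Below is a minimal sketch of one way to keep it on disk, assuming a JSON state file stored outside the watched folder. The helper names load_saved, scan and run_cycle and the state file itself are hypothetical and not part of the answer above.

```python
import json
import os


def load_saved(state_path):
    """Return the saved set of (name, ctime) tuples, or an empty set on first run."""
    if not os.path.exists(state_path):
        return set()
    with open(state_path, "r", encoding="utf-8") as fh:
        # JSON stores tuples as lists, so convert them back.
        return {tuple(item) for item in json.load(fh)}


def scan(folder):
    """Build the set of (name, ctime) tuples for regular files in folder."""
    found = set()
    for entry in os.scandir(folder):
        if entry.is_file():
            found.add((entry.name, entry.stat().st_ctime))
    return found


def run_cycle(folder, state_path):
    """Return (new, deleted) entries since the last run and update the state file."""
    saved = load_saved(state_path)
    current = scan(folder)
    new, deleted = current - saved, saved - current
    with open(state_path, "w", encoding="utf-8") as fh:
        json.dump(sorted(current), fh)
    return new, deleted
```

Each scheduled run would call run_cycle once and process only the names in the first returned set. Keeping the state file outside the watched folder avoids it being reported as a new file itself.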
Solution 2:[2]
import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class MyHandler(FileSystemEventHandler):
    def on_any_event(self, event):
        print(event.event_type, event.src_path)

    def on_created(self, event):
        print("on_created", event.src_path)
        print(event.src_path.strip())
        # Raw string: in a normal string "\t" would be read as a tab character
        if event.src_path.strip() == r".\test.xml":
            print("Execute your logic here!")

event_handler = MyHandler()
observer = Observer()
observer.schedule(event_handler, path='.', recursive=False)
observer.start()
try:
    while True:
        time.sleep(1)  # sleep instead of busy-waiting on an empty loop
except KeyboardInterrupt:
    observer.stop()
observer.join()
- pip install watchdog
- Create a scheduled task for this script in the Task scheduler and monitor the folder where the file will be created.
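Comparing event.src_path against a hard-coded relative path only works when the path is reported in exactly that form, with that separator. A small sketch of a more tolerant check that compares only the file name; is_target is a hypothetical helper, not part of watchdog, and could be called from on_created with event.src_path.

```python
import ntpath


def is_target(src_path, wanted_name="test.xml"):
    """Return True when src_path points at a file named wanted_name.

    ntpath.basename splits on both backslashes and forward slashes, so the
    check gives the same answer for Windows-style and POSIX-style paths.
    """
    return ntpath.basename(src_path.strip()) == wanted_name
```

One caveat of this sketch: on POSIX systems a backslash is a legal character in a file name, so a file literally named with a backslash would be split as if it were a path.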
Solution 3:[3]
import operator
import os, time

path = os.getcwd()  #or you can assign the return value of your
                    #function (the updated path as per your question)
                    #which operates on the file 'new_file' to this variable.

def mostRecentFile(path):
    all_files = os.listdir(path)
    file_times = dict()
    for file in all_files:
        # age in seconds, measured from the file's st_ctime
        file_times[file] = time.time() - os.stat(os.path.join(path, file)).st_ctime
    # smallest age first, so the first item is the newest file
    return sorted(file_times.items(), key=operator.itemgetter(1))[0][0]

new_file = mostRecentFile(path)
The code returns only one file, which is the newest in the directory (as per your requirement). The variable new_file holds the file name returned by the function mostRecentFile, which is the file most recently created in the directory given by the variable path. You can change how the path is supplied: use the current working directory or pass the desired directory. Given your requirement, I think you want the current directory, and that is what the code uses.
I have considered creation time by using st_ctime. Note that st_ctime is the creation time only on Windows; on Unix it is the time of the last metadata change. You can use the modification time by replacing st_ctime with st_mtime.
You can pass this newly created file new_file to your function, and assign the new path that is generated by this function to the variable path.
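Returning a single newest file can miss files when several arrive between two runs. A minimal sketch of a variant that returns every file modified after a given moment; the function name files_newer_than and the parameter since are hypothetical, and the sketch uses st_mtime, not st_ctime.

```python
import os


def files_newer_than(folder, since):
    """Return names of regular files in folder modified after `since`.

    `since` is a POSIX timestamp, for example the time.time() value
    recorded at the end of the previous run. Results are oldest first.
    """
    recent = []
    for entry in os.scandir(folder):
        if entry.is_file():
            mtime = entry.stat().st_mtime
            if mtime > since:
                recent.append((mtime, entry.name))
    return [name for _, name in sorted(recent)]
```

For a script that is restarted every 10 minutes, the timestamp of the previous run would have to be stored somewhere between runs, for instance in a small text file.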
Solution 4:[4]
First, run this script once in your directory to create a "files.csv" file:
import os
import pandas as pd
list_of_files=os.listdir()
list_of_files.append('files.csv')
pd.DataFrame({'files':list_of_files}).to_csv('files.csv')
Then, in your main script, add this:
import pandas as pd
import os
files = pd.read_csv('files.csv')
list_of_files = os.listdir()
if len(files.files) != len(list_of_files):
    #do what you want
    #save your excel with the name sample.xlsx
    #append your excel into list of files and get the set so you will not have the sample.xlsx twice if run again
    list_of_files.append('sample.xlsx')
    list_of_files = list(set(list_of_files))
    #save again the current list of files
    pd.DataFrame({'files': list_of_files}).to_csv('files.csv')
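Comparing only the lengths shows that something changed, but not which files are new, and it misses the case where one file was added and another removed. A sketch of the same idea with a set difference, assuming the same one-column layout with a "files" header. It swaps pandas for the standard csv module to stay dependency-free, and the helper names read_recorded, write_recorded and find_new_files are hypothetical.

```python
import csv
import os


def read_recorded(csv_path):
    """Read previously recorded file names from a one-column CSV with a 'files' header."""
    if not os.path.exists(csv_path):
        return set()
    with open(csv_path, newline="", encoding="utf-8") as fh:
        return {row["files"] for row in csv.DictReader(fh)}


def write_recorded(csv_path, names):
    """Overwrite the CSV with the given file names."""
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["files"])
        writer.writerows([name] for name in sorted(names))


def find_new_files(folder, csv_path):
    """Return names present in folder but not yet recorded, then record the listing."""
    current = set(os.listdir(folder))
    # Ignore the record file itself in case it lives in the watched folder.
    current.discard(os.path.basename(csv_path))
    new = current - read_recorded(csv_path)
    write_recorded(csv_path, current)
    return new
```

The returned set can then be passed file by file to the functions that extract the values.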
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Dorijan Cirkveni |
Solution 2 | user3349907 |
Solution 3 | |
Solution 4 | Billy Bonaros |