How to Create a Program That Searches for Values from a .txt or Any Text Document in Specific Folders
I'm relatively new to programming and want to create a program that solves a problem I frequently run into.
So here's the background to my short story: I was on a website that hosted many files (we're talking about 500 to 1,000 small files). I thought, "Oh sweet! I want all of these on my hard drive so I know I have access to them... even though I'm probably not going to use them either way." I downloaded all the files from the site, but when I looked at the properties of my destination folder, I ran into a problem. Say the site had 500 files; my computer had only 499. Just my luck. I wanted to find out which pesky file had slipped past me and download that one specifically. What I didn't want to do was delete all the files and try my luck downloading everything from the website again. The site gave no indication of which files I had already downloaded, so I was completely in the dark. I could copy each file name with Ctrl+C and paste it into the file manager's search bar with Ctrl+V, but repeating that 500 times would be tedious.
Now, here's what I want to do: take all of the file names from the website (each downloaded file in my drive has the same name as on the site) and put them in a simple .txt document or something similar. The website shows other, unwanted text alongside the names I need; if it's not possible to extract just the names from the site, I'm OK with entering them manually via copy and paste. Then I want the computer to take each name in the document and search for it in a specific folder. Note that the actual files are in subfolders within the root folder I choose, so the program has to search every subfolder of that root. If a name in the document does not exist as a file anywhere in there, the program should output it. This should repeat for every name in the document, so the final output lists all the names that were not found.
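A minimal Python sketch of the program described above, assuming the names are stored one per line in a plain UTF-8 text file; the function name `find_missing` and the example paths are hypothetical. It collects every file name under the root folder with `os.walk`, which descends into all subfolders, and then reports the listed names that were not found, in the order they appear in the document.

```python
import os


def find_missing(names_file, root_folder):
    """Return the names listed in names_file that do not exist as a file
    anywhere under root_folder (subfolders included)."""
    # Collect every file name under the root folder, at any depth
    on_disk = set()
    for _dirpath, _subdirs, files in os.walk(root_folder):
        on_disk.update(files)

    # Read one name per line, ignoring blank lines and surrounding spaces
    with open(names_file, encoding="utf-8") as f:
        wanted = [line.strip() for line in f if line.strip()]

    # Keep the document's order so the output is easy to compare
    return [name for name in wanted if name not in on_disk]
```

For example, `print(*find_missing("file_names.txt", r"C:\Downloads\site_files"), sep="\n")` would print one missing name per line (both paths are placeholders). Only bare file names are compared, so a file counts as present if it sits in any subfolder, and the comparison is case-sensitive even though Windows file names are not.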
Conclusion: You probably see what I'm trying to do by now; if you don't, tell me what I need to elaborate on. I really don't care how this program is made (what language or software); I just want something that works, since I don't know how to create it myself.
Thanks for reading and any response is appreciated!
Dhanwanth P :)
Solution 1:[1]
No worries; I found a solution by myself using Excel (God, it's powerful!).
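Basically, I copied and pasted the names from the website into a sheet, then used a filter to show only the values ending in .wav. Then I used Power Query's "From Folder" option to get a list of the names of all the files in the folder. Finally, I compared the two lists with a formula like this one, filled down next to the website names (here column B holds the folder's file names and D2 is the first website name):
=IF(COUNTIF(B:B,D2),"OK","MISSING")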
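For readers who want the same logic outside Excel, here is a rough Python equivalent of the spreadsheet steps: keep only the pasted lines that end in .wav, then mark each one OK or MISSING against the folder's file names. It is a sketch, not the author's method; the function name `label_files` and the sample text are made up. Like COUNTIF, the match ignores letter case.

```python
def label_files(pasted_text, folder_names):
    """Keep only the .wav entries from text copied off the website and
    mark each one "OK" or "MISSING" depending on the folder's file names."""
    # Lower-cased lookup set: plays the role of the folder-name column
    present = {name.lower() for name in folder_names}
    rows = []
    for line in pasted_text.splitlines():
        name = line.strip()
        # Same idea as the spreadsheet filter: skip anything not ending in .wav
        if not name.lower().endswith(".wav"):
            continue
        # Same idea as the COUNTIF test: is the name in the folder list?
        rows.append((name, "OK" if name.lower() in present else "MISSING"))
    return rows


# Hypothetical pasted text: file names mixed with other page text
website_text = """Download all
kick_01.wav
Preview
snare_02.wav
"""
for name, status in label_files(website_text, ["kick_01.wav"]):
    print(name, status)
```

With the sample data this prints `kick_01.wav OK` followed by `snare_02.wav MISSING`; in practice the folder names would come from the folder listing rather than a hard-coded list.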
If you need more elaboration, I'd be happy to help; just reply to this. There might be an easier way, but I personally liked how straightforward this was. You only need Microsoft Excel!
EDIT:
I used these two videos, which go over Power Query and the COUNTIF function:
How to Get the List of File Names in a Folder in Excel (without VBA): https://www.youtube.com/watch?v=OSCPVBWOqwc
How to Compare Two Excel Sheets (and find the differences): https://www.youtube.com/watch?v=8Ou_wfzcKKk
Solution 2:[2]
Here's a solution in Python, in case you'd like to explore that route.
As you described, all the files from the website are listed in an Excel file, 'website_files.xlsx',
and all the downloaded files are saved in a folder, 'downloaded_wav'. The script works regardless of whether the files are saved in the root directory or in subfolders.
Then I ran the Python script below to look for the missing file:
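```python
import os

import pandas as pd  # reading .xlsx files also needs the openpyxl package

path_folder = 'C:\\Users\\Admin\\Downloads\\downloaded_wav'

# Collect the names of all downloaded files, including those in subfolders
downloaded_files = set()
d, m = 0, 0  # d = files found on disk, m = missing files
for path_name, subfolders, files in os.walk(path_folder):  # include all subfolders
    for file in files:
        d += 1
        downloaded_files.add(file)

# The first row of the sheet is read as a header; the file names are
# assumed to be in the first column
df = pd.read_excel('website_files.xlsx')
for file in df.iloc[:, 0]:
    if file not in downloaded_files:
        print('MISSING', file)
        m += 1

print(len(df), 'files on website')
print(d, 'files downloaded')
print(m, 'missing file(s) found')
```
Output:
```
MISSING OLIVER_snare_disco_mixready_hybrid.wav
3 files on website
2 files downloaded
1 missing file(s) found
```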
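If the order of the output does not matter, the same check can be written more compactly with `pathlib` and a set difference. This is a sketch with a hypothetical function name, `missing_files`; it takes the website names as any iterable of strings (for example, the first column of the spreadsheet), so it can be tried without an Excel file. `Path.rglob("*")` visits every subfolder, like `os.walk` in the script above.

```python
from pathlib import Path


def missing_files(website_names, folder):
    """Return, sorted, the website names that have no matching file
    anywhere under folder (subfolders included)."""
    # rglob("*") yields files and directories at every depth; keep files only
    downloaded = {p.name for p in Path(folder).rglob("*") if p.is_file()}
    # Set difference: names listed on the website but not found on disk
    return sorted(set(website_names) - downloaded)
```

Because sets hold no duplicates, a name listed twice on the website is reported at most once, and the result comes back sorted rather than in spreadsheet order.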
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Stephen Ostermiller |
Solution 2 | |