Rename files that match strings in a txt file
I am trying to rename multiple files according to matches in a .txt file. My files are:
GCF_000698265.1_ASM69826v1_genomic.gff.gz
GCF_000785125.1_ASM78512v1_genomic.gff.gz
GCF_000934565.1_ASM93456v1_genomic.gff.gz
GCF_000963495.1_ASM96349v1_genomic.gff.gz
My tab-separated txt file looks like this:
GCF_000698265.1_ASM69826v1 Pseudomonas_str1
GCF_000785125.1_ASM78512v1 Pseudomonas_str2
GCF_000934565.1_ASM93456v1 Pseudomonas_str3
GCF_000963495.1_ASM96349v1 Pseudomonas_str4
So, for filenames that match the first column of the file, I want to replace the matching part with the second column. I was trying to understand how to do it by piping mv and awk, but I got lost. My desired output looks like this:
Pseudomonas_str1_genomic.gff.gz
Pseudomonas_str2_genomic.gff.gz
Pseudomonas_str3_genomic.gff.gz
Pseudomonas_str4_genomic.gff.gz
Can anybody help with this? I hope I was clear, and thanks a lot!
Solution 1:[1]
Using sed and bash, assuming the txt file is named 'rename.txt':
sed 's/^/mv /' rename.txt | bash
Using awk:
awk '{system("mv " $1 " " $2)}' rename.txt
The key here is to insert "mv " at the beginning of each line and execute the result.
This last solution does not need sed or awk, just bash:
while read old new; do mv "$old" "$new"; done < rename.txt
Update
Based on Alberto's updated question, here are the changes:
Using sed:
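sed 's/[[:space:]]\{1,\}/_genomic.gff.gz /;s/^/mv /;s/$/_genomic.gff.gz/' rename.txt | bash
Note: The first expression appends "_genomic.gff.gz" to the old name, because the files on disk carry that suffix while the first column of rename.txt does not. The ;s/$/_genomic.gff.gz/ expression says: find the end of the line and append "_genomic.gff.gz" to it. This works only if the lines have no trailing spaces.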
Using awk:
awk '{system("mv " $1 "_genomic.gff.gz " $2 "_genomic.gff.gz")}' rename.txt
Using Bash:
while read old new; do mv "${old}_genomic.gff.gz" "${new}_genomic.gff.gz"; done < rename.txt
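Since the files on disk carry the "_genomic.gff.gz" suffix while the first column of rename.txt does not, it may be worth checking the mapping before renaming anything. The following is only a sketch in Python, not part of the original answer: the function name plan_renames and the sample data are made up for illustration, and it assumes names without spaces, like the one-liners above.

```python
SUFFIX = "_genomic.gff.gz"  # suffix taken from the question's file names


def plan_renames(mapping_lines, existing_names, suffix=SUFFIX):
    """Pair each mapping line with a file name; report keys without a file."""
    existing = set(existing_names)
    planned, missing = [], []
    for line in mapping_lines:
        fields = line.split()  # splits on tabs or spaces, drops trailing blanks
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        key, label = fields
        old, new = key + suffix, label + suffix
        if old in existing:
            planned.append((old, new))
        else:
            missing.append(key)
    return planned, missing


# Illustrative data only; in practice the lines would come from rename.txt
# and the names from os.listdir(".").
lines = ["GCF_000698265.1_ASM69826v1\tPseudomonas_str1\n",
         "GCF_000785125.1_ASM78512v1\tPseudomonas_str2\n"]
files = ["GCF_000698265.1_ASM69826v1_genomic.gff.gz"]
planned, missing = plan_renames(lines, files)
print(planned)
print(missing)
```

Nothing is renamed here; the function only returns the pairs that would be renamed and the keys for which no file was found, so a typo in rename.txt shows up before any mv runs.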
Solution 2:[2]
was trying to understand how to do it piping mv and awk
You might use AWK to prepare a series of commands that you then feed to bash as standard input. Be warned that your case
file1.txt cat.txt
file2.txt dog.txt
file3.txt fish.txt
file4.txt mouse.txt
is special in that there are no spaces in the filenames. If spaces are prohibited in names, you can simply prepend mv to each line. For example, if the file is named renaming.txt, you might do
awk '{print "mv " $0}' renaming.txt | bash
However, this will fail if there is a space in any name. If spaces are allowed, I suggest using Python (which is likely installed if you use a Linux machine) in the following way: create a file renamer.py with the following content
import os

with open("renaming.txt", "r") as f:
    for line in f:
        # drop the trailing newline, then split at the TAB character
        src, dst = line.rstrip().split("\t")
        os.rename(src, dst)
where renaming.txt is the name of the file with 2 tab-separated columns holding the current name and the desired name. Then use it as follows:
python renamer.py
How it works: it opens renaming.txt for reading (r); for each line it strips trailing whitespace (newlines) and splits the line at the TAB character. The 1st part goes to src, the 2nd to dst, which are then passed to the os.rename function.
You might choose another language for this, preferably one that has functions for managing files, as this will make developing code for this task easier.
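In the question as updated, the first column is only the leading part of each file name, so a literal os.rename(src, dst) on the two columns would not find the files. One possible adaptation is sketched below. The function name rename_by_prefix and its arguments are hypothetical; it assumes a tab-separated mapping file and keeps whatever follows the matched prefix (here "_genomic.gff.gz").

```python
import os


def rename_by_prefix(directory, mapping_path):
    """Rename files in `directory` whose names start with a key from the
    tab-separated mapping file; the rest of the name is kept unchanged."""
    with open(mapping_path, "r") as f:
        # keep only lines that actually contain a TAB, split at the first one
        pairs = [line.rstrip("\n").split("\t", 1) for line in f if "\t" in line]
    renamed = []
    for name in sorted(os.listdir(directory)):
        for key, label in pairs:
            if name.startswith(key):
                new_name = label + name[len(key):]
                src = os.path.join(directory, name)
                dst = os.path.join(directory, new_name)
                if not os.path.exists(dst):  # never overwrite an existing file
                    os.rename(src, dst)
                    renamed.append((name, new_name))
                break  # at most one key is applied per file
    return renamed
```

It could be called as rename_by_prefix(".", "renaming.txt"). Because only the first TAB splits a line, spaces in either name are harmless. Note that plain prefix matching would also match a longer name that merely begins with a key, so the keys should be as specific as those in the question.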
Solution 3:[3]
So I created a synthetic test set that is somewhat sizable and intentionally made only 1/7th of the entries match. There are no duplicates anywhere, since the synthetic file names are all based on a unique list of primes, and the files are in shuffled order.
 254923 19113991 19113991 test_rename_output_2b.txt
 254923 15545069 15545069 test_need_to_rename_2.txt
1784459 53025088 53025088 test_ref_lookup_2.txt
2294305 87684148 87684148 total
# gawk profile, created Thu May 12 03:57:36 2022
# Rule(s)
      1  FNR == NR { # 1
1784459      do {
1784459          __[$!_]
1784459          getline
             } while (FNR == NR)
         }
      1  FNR != NR { # 1
 254923      do {
 254923          if ($!_ in __) {
                     printf("gmv -vn \47%s\47 "\
                            "\47%s\47 ;\n",$!!NF,$NF)
                 }
             } while (getline)
         }
The advantage of this solution is that its output is already formatted for direct renaming with the mv command (sample output):
gmv -vn 'file522111333101.txt' 'newname_799042B2ED_.txt' ;
gmv -vn 'file2011113799793759.txt' 'newname_72518EBA3BC5F_.txt' ;
gmv -vn 'file476743673269.txt' 'newname_6F002325B5_.txt' ;
gmv -vn 'file7979798079897989.txt' 'newname_1C599585EE8185_.txt' ;
gmv -vn 'file211031042203.txt' 'newname_31226E289B_.txt' ;
gmv -vn 'file172888842428207.txt' 'newname_9D3DD209DF2F_.txt' ;
To play it safe, I've added the don't-overwrite (aka no-clobber, aka -n) flag to all the renaming commands, which one can send directly to something lightweight like dash to execute without further filename manipulation.
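Wrapping each name in hard-coded single quotes works as long as no name itself contains a single quote. If that cannot be ruled out, the quoting can be delegated to a library. The sketch below is an illustration in Python rather than awk, using shlex.quote from the standard library; the function name mv_commands and the sample names are made up.

```python
import shlex


def mv_commands(pairs, mv="mv"):
    """Yield one shell-safe no-clobber rename command per (old, new) pair."""
    for old, new in pairs:
        # "--" ends option parsing, so names starting with "-" are handled too
        yield f"{mv} -n -- {shlex.quote(old)} {shlex.quote(new)}"


# Illustrative names only: one contains a space, one a single quote.
for cmd in mv_commands([("file 1.txt", "cat.txt"), ("it's.txt", "dog.txt")]):
    print(cmd)
```

The printed lines can be reviewed first and then piped to a shell, in the same way as the gmv commands above.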
Performance is acceptable, I suppose: mawk2 took 1.218 secs to complete all steps (inclusive of writing the final output file to disk).
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | Daweo |
Solution 3 | RARE Kpop Manifesto |