Rename files that match strings in a txt file

I am trying to rename multiple files according to their match in a .txt file. My files are:

GCF_000698265.1_ASM69826v1_genomic.gff.gz
GCF_000785125.1_ASM78512v1_genomic.gff.gz
GCF_000934565.1_ASM93456v1_genomic.gff.gz
GCF_000963495.1_ASM96349v1_genomic.gff.gz

My tab-separated txt file looks like this:

GCF_000698265.1_ASM69826v1  Pseudomonas_str1
GCF_000785125.1_ASM78512v1  Pseudomonas_str2
GCF_000934565.1_ASM93456v1  Pseudomonas_str3
GCF_000963495.1_ASM96349v1  Pseudomonas_str4

So, for filenames that match the first column of the file, I want to rename the file using the second column. I was trying to understand how to do it by piping mv and awk, but I got lost. I would like the output to look like this:

Pseudomonas_str1_genomic.gff.gz
Pseudomonas_str2_genomic.gff.gz
Pseudomonas_str3_genomic.gff.gz
Pseudomonas_str4_genomic.gff.gz

Can anybody help with this? I hope I was clear, and thanks a lot!



Solution 1:[1]

Using sed and bash, assuming the txt file is named 'rename.txt':

sed 's/^/mv /' rename.txt | bash

Using awk:

awk '{system("mv " $1 " " $2)}' rename.txt

The key here is to insert "mv " at the beginning of each line and execute the result.

This last solution needs neither sed nor awk, just a bash loop around mv (the -r flag stops read from interpreting backslashes):

while read -r old new; do mv "$old" "$new"; done < rename.txt

Update

Based on Alberto's updated question, here are the changes:

Using sed:

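sed 's/[[:space:]][[:space:]]*/_genomic.gff.gz /;s/^/mv /;s/$/_genomic.gff.gz/' rename.txt | bash

Note: The first expression replaces the whitespace between the two columns with "_genomic.gff.gz ", so the old name gets its suffix. The ;s/$/_genomic.gff.gz/ expression says: find the end of the line and append "_genomic.gff.gz" to it, so the new name gets the suffix too. This will work only if you don't have leading or trailing spaces in each line.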

Using awk:

awk '{system("mv " $1 "_genomic.gff.gz " $2 "_genomic.gff.gz")}' rename.txt

Using Bash:

while read -r old new; do mv "${old}_genomic.gff.gz" "${new}_genomic.gff.gz"; done < rename.txt
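
For the file names in the question, the first column of rename.txt is only a prefix of the real file name, so whatever follows the prefix has to be carried over to the new name. A minimal Python sketch of that mapping is below. The function name plan_renames is an assumption made for illustration, and it only builds a list of (old, new) pairs instead of touching any file, so the plan can be checked first.

```python
def plan_renames(mapping_lines, filenames):
    """Return (old_name, new_name) pairs for files whose name starts with a key.

    mapping_lines: iterable of "old_prefix<whitespace>new_prefix" lines.
    filenames: iterable of existing file names to check.
    """
    mapping = {}
    for line in mapping_lines:
        fields = line.split()
        if len(fields) == 2:  # skip blank or malformed lines
            old_prefix, new_prefix = fields
            mapping[old_prefix] = new_prefix

    plan = []
    for name in filenames:
        for old_prefix, new_prefix in mapping.items():
            if name.startswith(old_prefix):
                # Keep whatever follows the prefix, e.g. "_genomic.gff.gz".
                plan.append((name, new_prefix + name[len(old_prefix):]))
                break  # first matching prefix wins
    return plan
```

One way to use it would be to pass the open rename.txt file and the result of os.listdir("."), print the pairs, and only then call os.rename(old, new) on each pair.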

Solution 2:[2]

"was trying to understand how to do it piping mv and awk"

You might use AWK to prepare a series of commands which you then feed to bash as standard input. Be warned that your case

file1.txt   cat.txt  
file2.txt   dog.txt
file3.txt   fish.txt
file4.txt   mouse.txt

is specific in that there are no spaces in the filenames. If spaces are prohibited in names, then you can simply prepend mv to each line. For example, if said file is named renaming.txt, you might do

awk '{print "mv " $0}' renaming.txt | bash

However, this will fail if there is a space in any name. If spaces are allowed, then I suggest using Python (which is likely installed if you use a Linux machine) in the following way: create a file renamer.py with the following content

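import os

# renaming.txt holds one "current name<TAB>desired name" pair per line
with open("renaming.txt", "r") as f:
    for line in f:
        # drop trailing whitespace (the newline), then split at the TAB
        src, dst = line.rstrip().split("\t")
        os.rename(src, dst)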

where renaming.txt is the name of the file with 2 tab-separated columns holding the current name and the desired name. Then use it as follows

python renamer.py

How it works: it opens renaming.txt for reading (r); for each line it strips trailing whitespace (including the newline) and splits the line at the TAB character. The 1st part goes to src and the 2nd to dst, which are then passed to the os.rename function.
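
A slightly more defensive variant of the same idea is sketched below. It is only an illustration: the function name rename_from_table and its directory and dry_run parameters are assumptions that do not appear in the script above. It skips blank lines, skips missing sources, refuses to overwrite an existing target, and by default only reports what it would do.

```python
import os


def rename_from_table(table_path, directory=".", dry_run=True):
    """Rename files listed in a two-column, tab-separated table.

    Returns the list of (src, dst) pairs that were (or would be) renamed.
    """
    done = []
    with open(table_path, "r") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line.strip():
                continue  # ignore blank lines
            src, dst = line.split("\t")  # exactly two columns expected
            src_path = os.path.join(directory, src)
            dst_path = os.path.join(directory, dst)
            if not os.path.exists(src_path):
                continue  # nothing to rename
            if os.path.exists(dst_path):
                continue  # never overwrite an existing file
            if not dry_run:
                os.rename(src_path, dst_path)
            done.append((src, dst))
    return done
```

Calling rename_from_table("renaming.txt") lists the planned renames; passing dry_run=False performs them.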

You might choose another language for this, preferably one that has functions for managing files, as this will make developing code for this task easier.

Solution 3:[3]

So I created a synthetic test set that's somewhat sizable, and intentionally made only 1/7th of the names match. There are no duplicates anywhere, since the synthetic file names are all based on a unique list of primes, and the files are also in shuffled order.

  254923  19113991  19113991  test_rename_output_2b.txt
  254923  15545069  15545069  test_need_to_rename_2.txt
 1784459  53025088  53025088  test_ref_lookup_2.txt
 2294305  87684148  87684148  total

# gawk profile, created Thu May 12 03:57:36 2022

    # Rule(s)

     1  FNR == NR { # 1
1784459     do {
1784459         __[$!_]
1784459         getline
        } while (FNR == NR)
    }

     1  FNR != NR { # 1
254923      do {
254923          if ($!_ in __) { 
                    printf("gmv -vn \47%s\47 "\
                                    "\47%s\47 ;\n",$!!NF,$NF)
            }
        } while (getline)
    }
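
Stripped of the profiler counts and the terse field references, the profile above appears to do a two-pass lookup: the first field of every line in the first file is stored as a key, and only lines of the second file whose first field is among those keys produce a command. A rough Python reading of that logic is sketched here; the function and argument names are hypothetical, and it yields (old, new) pairs rather than printing commands.

```python
def matching_renames(reference_lines, rename_lines):
    """Yield (old, new) for rename lines whose first field is in the reference set."""
    # First pass: collect the first field of every reference line.
    known = set()
    for line in reference_lines:
        fields = line.split()
        if fields:
            known.add(fields[0])
    # Second pass: keep only rows whose first field was seen in the first pass.
    for line in rename_lines:
        fields = line.split()
        if fields and fields[0] in known:
            # First field is the current name, last field is the new name.
            yield fields[0], fields[-1]
```

A set gives constant-time membership tests, which plays the same role as the awk associative array here.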

The advantage of this solution is that its output is already pre-formatted for direct renaming using the mv command (sample output):

gmv -vn 'file522111333101.txt' 'newname_799042B2ED_.txt' ;
gmv -vn 'file2011113799793759.txt' 'newname_72518EBA3BC5F_.txt' ;
gmv -vn 'file476743673269.txt' 'newname_6F002325B5_.txt' ;
gmv -vn 'file7979798079897989.txt' 'newname_1C599585EE8185_.txt' ;
gmv -vn 'file211031042203.txt' 'newname_31226E289B_.txt' ;
gmv -vn 'file172888842428207.txt' 'newname_9D3DD209DF2F_.txt' ;

To play it safe, I've added the don't-overwrite (aka no-clobber, aka -n) flag to all the renaming commands, so the output can be sent directly into something lightweight like dash for execution without further filename manipulation.
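
One caveat with wrapping names in literal single quotes is that a name which itself contains a single quote would break the generated command. As a hedged illustration, Python's shlex.quote handles that case. The helper name mv_command is made up here, and it emits plain mv -n instead of the gmv -vn used above.

```python
import shlex


def mv_command(old, new):
    """Build a no-clobber mv command line with both names shell-quoted."""
    # "--" stops option parsing, so names starting with "-" are treated as files.
    return "mv -n -- {} {}".format(shlex.quote(old), shlex.quote(new))
```

Names made only of safe characters are left unquoted by shlex.quote, while names with spaces or quotes are wrapped so that a POSIX shell reads them back as a single argument.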

Performance is acceptable I suppose -

  • mawk2 took 1.218 secs to complete all steps (inclusive of writing final output file to disk).

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Daweo
Solution 3 RARE Kpop Manifesto