Subprocess loop in Python

I have a folder with a few hundred .bed files that I want to loop over to extract fasta sequences. In the terminal, my command is:

twoBitToFa -bed=PA2_03_2bit.bed -udcDir=. https://hgdownload.cse.ucsc.edu/goldenpath/hg38/bigZips/hg38.2bit stdout > PA2_03.fa

This works for the single bed file, but I'd rather not do this several hundred times.

I'm new to subprocesses and python, but this seems like it might be an option. I am open to other options.

So far I have:


import os
path_of_the_directory= '/home/2bit_L1_beds'
for filename in os.listdir(path_of_the_directory):
    f = os.path.join(path_of_the_directory,filename)
    if os.path.isfile(f):
        print(f)

which outputs the path to each file in the directory. To add the subprocess call, I tried:

import subprocess
import sys
import os
from subprocess import Popen, PIPE

path_of_the_directory= '/home/2bit_L1_beds'
for filename in os.listdir(path_of_the_directory):
    f = os.path.join(path_of_the_directory,filename)
    if os.path.isfile(f):
        result = subprocess.run([twoBitToFa -bed=f -udcDir=. https://hgdownload.cse.ucsc.edu/goldenpath/hg38/bigZips/hg38.2bit stdout > f.fa],
                                capture_output=True, text=True)
        print(stdout)
        print(f)

I am getting "invalid syntax" and would appreciate some help! My goal is to have 1 .fa file output for each .bed file input.



Solution 1:[1]

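The "invalid syntax" error comes from the unquoted command inside the list: every argument has to be a Python string. Consider subprocess.Popen (which you import but do not use) and pass the command and its arguments as a list of strings. You can also use cwd to run the command in the directory that holds the files, so that relative file names resolve. The code below assumes every input file name ends with _2bit.bed, which is replaced with .fa for the output.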

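import os
from subprocess import Popen, PIPE

path_of_the_directory = "/home/mrsmeta/Axiotl/2bit_L1_beds"
url = "https://hgdownload.cse.ucsc.edu/goldenpath/hg38/bigZips/hg38.2bit"

for f in os.listdir(path_of_the_directory):
    # os.listdir returns bare names, so join with the directory before testing
    full_path = os.path.join(path_of_the_directory, f)
    if os.path.isfile(full_path) and f.endswith("_2bit.bed"):
        print(f)

        # ">" is shell syntax and would be passed to twoBitToFa as a literal
        # argument, so send stdout to an open file handle instead
        cmd = ["twoBitToFa", f"-bed={f}", "-udcDir=.", url, "stdout"]
        out_path = os.path.join(path_of_the_directory, f.replace("_2bit.bed", ".fa"))

        with open(out_path, "w") as fa:
            # Popen is imported by name above, so call it directly
            result = Popen(cmd, stdout=fa, stderr=PIPE, cwd=path_of_the_directory)
            _, error = result.communicate()

        # Report anything twoBitToFa wrote to stderr on failure
        if result.returncode != 0:
            print(error.decode())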
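
As an alternative sketch, subprocess.run can do the same job with less bookkeeping: it waits for the command to finish, and check=True raises an exception if twoBitToFa exits with a non-zero status. The helper names below (fasta_path_for, build_command, convert_all) are made up for illustration, and the sketch assumes twoBitToFa is on PATH and that every input file ends with _2bit.bed, as in the question.

```python
import subprocess
from pathlib import Path

URL = "https://hgdownload.cse.ucsc.edu/goldenpath/hg38/bigZips/hg38.2bit"
SUFFIX = "_2bit.bed"  # assumed naming convention for the input files


def fasta_path_for(bed_path: Path) -> Path:
    """Map e.g. PA2_03_2bit.bed to PA2_03.fa in the same directory."""
    return bed_path.with_name(bed_path.name[: -len(SUFFIX)] + ".fa")


def build_command(bed_path: Path, url: str = URL) -> list:
    """Return the argument list; no shell is involved, so '>' does not appear."""
    return ["twoBitToFa", f"-bed={bed_path.name}", "-udcDir=.", url, "stdout"]


def convert_all(directory: str) -> None:
    """Run twoBitToFa once per *_2bit.bed file, writing one .fa per input."""
    for bed_path in sorted(Path(directory).glob("*" + SUFFIX)):
        with open(fasta_path_for(bed_path), "w") as fa:
            # stdout=fa plays the role of "> file" in the terminal command
            subprocess.run(
                build_command(bed_path),
                stdout=fa,
                stderr=subprocess.PIPE,
                text=True,
                cwd=directory,  # so the bare file name in -bed= resolves
                check=True,     # raise CalledProcessError on a non-zero exit
            )
```

Calling convert_all("/home/2bit_L1_beds") should then write one .fa file next to each .bed file. Because check=True stops at the first failing file, wrap the subprocess.run call in try/except subprocess.CalledProcessError if you would rather log the error and continue.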

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Parfait