It’s been a long time since I’ve had the bandwidth to write up a code snippet here. This morning I had not quite enough time between Zoom meetings to tackle something more involved, so here goes!
In this case I needed to find ~200 sequence (fasta) files for a student in my lab. They were split across several sequencing runs, and for various logistical reasons it was getting a bit tedious to find the location of each sequence file. To solve the problem I wrote a short Python script to wrap the Linux locate command and copy all the files to a new directory where they could be exported.
First, I created a text file “files2find.txt” with text uniquely matching each file that I needed to find. One of the great things about locate is that it doesn’t need to match the full file name: a pattern without wildcards matches any path that contains it.
head files2find.txt
151117_PAL_Sterivex_1
151126_PAL_Sterivex_2
151202_PAL_Sterivex_3
151213_PAL_Sterivex_4
151225_PAL_Sterivex_5
151230_PAL_Sterivex_6
160106_PAL_Sterivex_7
160118_PAL_Sterivex_9
160120_PAL_Sterivex_10
160128_PAL_Sterivex_11
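To make the partial-matching idea concrete, here is a small sketch that imitates it in plain Python, without calling locate at all. The paths in example_paths are made up for illustration, and read_patterns and substring_matches are hypothetical helper names, not part of the script in this post. It also shows why each search string needs to be unique: a string ending in "_1" will also pick up a file ending in "_10".

```python
def read_patterns(lines):
    """Return stripped, non-empty search strings from an iterable of lines."""
    return [line.strip() for line in lines if line.strip()]


def substring_matches(pattern, paths):
    """Imitate a wildcard-free locate pattern: keep paths containing it."""
    return [p for p in paths if pattern in p]


# Hypothetical paths, invented for this example only.
example_paths = [
    '/data/run1/151117_PAL_Sterivex_1.exp.fasta',
    '/data/run1/151117_PAL_Sterivex_1.fastq',
    '/data/run1/151117_PAL_Sterivex_10.exp.fasta',
    '/data/run2/151126_PAL_Sterivex_2.exp.fasta',
]

# A partial name matches every path that contains it, including "_10".
loose = substring_matches('151117_PAL_Sterivex_1', example_paths)

# Adding the trailing "." narrows the match to sample 1 only.
tight = substring_matches('151117_PAL_Sterivex_1.', example_paths)
```

Skipping blank lines in read_patterns is a deliberate choice here, since an empty search string would match every path.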
Then the wrapper:
import subprocess
import shutil

with open('files2find.txt') as file_in:
    for line in file_in:
        line = line.rstrip()

        ## Here we use the subprocess module to run the locate command, capturing
        ## standard out.

        temp = subprocess.Popen('locate ' + line,
                                shell = True,
                                executable = '/bin/bash',
                                stdout = subprocess.PIPE)

        ## The communicate method for object temp returns a tuple. First object
        ## in the tuple is standard out.

        locations = temp.communicate()[0]
        locations = locations.decode().split('\n')

        ## Thank you internet for this one-liner, Python one-liners always throw
        ## me for a loop (no pun intended). Here we search all items in the locations
        ## list for a specific suffix that identifies files that we actually want.
        ## In this case our final analysis files contain "exp.fasta". Of course if
        ## you're certain of the full file name you could just use locate on that and
        ## omit this step.

        fastas = [i for i in locations if 'exp.fasta' in i]
        path = '/path/to/where/you/want/files/'
        found = set()

        ## Use the shutil library to copy found files to a new directory "path".
        ## Copied files are added to the set "found" to avoid being copied more than
        ## once, if they exist in multiple locations on your computer.

        for fasta in fastas:
            file_name = fasta.split('/')[-1]
            if file_name not in found:
                shutil.copyfile(fasta, path + file_name)
                found.add(file_name)

        ## In the event that no files are found report that here.

        if len(fastas) == 0:
            print(line, 'not found')
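For anyone who wants to reuse this, here is one possible variation, offered as a sketch rather than a replacement. It splits the same steps into small functions and passes the search string to locate as a list argument instead of through a shell, so odd characters in a search string can't be interpreted by bash. The function names (locate, select_files, copy_unique, main) are my own inventions for this example, the text=True argument assumes Python 3.7 or later, and select_files checks only the file name rather than the whole path, which is a small change from the script above.

```python
import shutil
import subprocess
from pathlib import Path


def locate(pattern):
    """Run locate on one search string and return matching paths as a list."""
    result = subprocess.run(['locate', pattern],
                            stdout=subprocess.PIPE,
                            text=True)
    return result.stdout.splitlines()


def select_files(locations, suffix='exp.fasta'):
    """Keep only paths whose file name contains the wanted suffix."""
    return [p for p in locations if suffix in Path(p).name]


def copy_unique(paths, dest_dir, found=None):
    """Copy paths into dest_dir, skipping file names that were already copied."""
    found = set() if found is None else found
    dest_dir = Path(dest_dir)
    for p in paths:
        name = Path(p).name
        if name not in found:
            shutil.copyfile(p, dest_dir / name)
            found.add(name)
    return found


def main(list_file, dest_dir):
    """Look up every search string in list_file and copy the hits to dest_dir."""
    found = set()
    with open(list_file) as file_in:
        for line in file_in:
            pattern = line.strip()
            if not pattern:
                continue  # skip blank lines
            fastas = select_files(locate(pattern))
            if not fastas:
                print(pattern, 'not found')
            copy_unique(fastas, dest_dir, found)
```

With the files from this post, the call would be main('files2find.txt', '/path/to/where/you/want/files/'). Keeping a single found set across all search strings means a file name is copied only once for the whole run, not just once per search string.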
