Comments on: Parsing blast xml output

By: Ben

Ben — Wed, 30 Aug 2023 06:31:01 +0000

Since I am reading this today, and some might be tempted to use this in 2023: be careful when using .strip(foo) and .rstrip(bar).

They do not remove foo and bar as a prefix or suffix. The argument is treated as a set of characters, and every character belonging to that set is removed from both ends of the string (or only from the tail when using rstrip).

Example:
'''
>>> s = "id_seed"
>>> s.strip("…").strip("…")
'seed'
'''

…and the "id_" is lost!
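To make the pitfall concrete, here is a minimal sketch comparing the character-set behaviour of strip with an exact prefix/suffix removal. The helper name strip_affixes is made up for this illustration and is not part of the original script.

```python
def strip_affixes(text, prefix="", suffix=""):
    """Remove one exact prefix and one exact suffix, if present.

    Hypothetical helper for illustration only.
    """
    if prefix and text.startswith(prefix):
        text = text[len(prefix):]
    if suffix and text.endswith(suffix):
        text = text[:-len(suffix)]
    return text


s = "id_seed"

# strip() treats its argument as a set of characters: any of 'i', 'd', '_'
# is removed from BOTH ends, so the trailing 'd' disappears as well.
char_set_result = s.strip("id_")

# Exact removal only touches the literal prefix "id_".
exact_result = strip_affixes(s, prefix="id_")

print(char_set_result)  # see
print(exact_result)     # seed
```

On Python 3.9 and later, the built-in str.removeprefix and str.removesuffix do the same exact removal without a helper.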

By: Jeff

Jeff — Fri, 19 Apr 2013 18:45:21 +0000

I just learned a great trick that makes this a whole lot easier. Python has a module (gzip) that allows gzipped files to be read as text files, e.g.

import gzip

# 'rt' opens the archive in text mode, so each line is a str on Python 3
with gzip.open('really_big_file.xml.gz', 'rt') as xml:
    for line in xml:
        pass  # do some stuff with each line

This allows massive BLAST output files to stay compressed at about 1/5 of their inflated size. I've placed an updated script here. The updated script also very niftily creates a FASTA file of the hits, as amino acid sequences if blastx was used.
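As a rough sketch of how the gzip trick can be combined with a streaming XML parser, the snippet below writes a tiny gzipped file and reads the hit ids back without decompressing it on disk. The sample XML is a simplified stand-in, not real BLAST output, and the function name iter_hit_ids and the file name sample.xml.gz are assumptions made for this example; it is not the updated script mentioned above.

```python
import gzip
import os
import tempfile
import xml.etree.ElementTree as ET

# Simplified stand-in for BLAST XML; real output nests hits much deeper.
SAMPLE = """<?xml version="1.0"?>
<BlastOutput>
  <Hit><Hit_id>hit_1</Hit_id><Hit_def>example protein A</Hit_def></Hit>
  <Hit><Hit_id>hit_2</Hit_id><Hit_def>example protein B</Hit_def></Hit>
</BlastOutput>
"""


def iter_hit_ids(path):
    """Yield the text of <Hit_id> for every <Hit> in a gzipped XML file."""
    # iterparse wants bytes, so the archive is opened in binary mode here.
    with gzip.open(path, "rb") as handle:
        for _event, elem in ET.iterparse(handle, events=("end",)):
            if elem.tag == "Hit":
                yield elem.findtext("Hit_id")
                elem.clear()  # free the finished subtree to keep memory low


# Write the sample to a temporary gzipped file, then stream it back.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "sample.xml.gz")
with gzip.open(path, "wt", encoding="utf-8") as out:
    out.write(SAMPLE)

hit_ids = list(iter_hit_ids(path))
print(hit_ids)  # ['hit_1', 'hit_2']
```

Streaming with iterparse keeps memory use roughly constant per hit, which matters when the uncompressed file is several times larger than the archive.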