Alignment and phylogenetic inference with hmmalign and RAxML-ng

RAxML is one of the most popular programs around for phylogenetic inference via maximum likelihood. Similarly, hmmalign within HMMER 3 is a popular way to align amino acid sequences against HMMs from Pfam or created de novo. Combine the two and you have an excellent method for constructing phylogenetic trees. But gluing the two together isn’t exactly seamless and novice users might be deterred by a couple of unexpected hurdles. Recently, I helped a student develop a workflow which I’m posting here.

First, define some variables just to make the bash commands a bit cleaner. REF refers to the name of the Pfam hmm that we’re aligning against (Bac_rhodopsin.hmm in this case), while QUERY is the sequence file to be aligned (hop and bop gene products, plus a dinoflagellate rhodopsin as outgroup).

REF=Bac_rhodopsin
QUERY=uniprot_hop_bop_reviewed

Now, align and convert the alignment to fasta format (required by RAxML-ng).

hmmalign --amino -o $QUERY.sto $REF.hmm $QUERY.fasta
seqmagick convert $QUERY.sto $QUERY.align.fasta

Test which model is best for these data. Here we get LG+G4+F.

modeltest-ng -i $QUERY.align.fasta -d aa -p 8

Check your alignment!

raxml-ng --check --msa $QUERY.align.fasta --model LG+G4+F --prefix $QUERY

Oooh… I bet it failed. Exciting! In this case (using sequences from Uniprot) the long sequence descriptions are incompatible with RAxML-ng. Let’s do a little Python to clean that up.

from Bio import SeqIO

with open('uniprot_hop_bop_reviewed.align.clean.fasta', 'w') as clean_fasta:
    for record in SeqIO.parse('uniprot_hop_bop_reviewed.align.fasta', 'fasta'):
        record.description = ''
        SeqIO.write(record, clean_fasta, 'fasta')

Check again…

raxml-ng --check --msa $QUERY.align.clean.fasta --model LG+G4+F --prefix $QUERY

If everything is kosher go ahead and fire up your phylogenetic inference. Here I’ve limited bootstrapping to 100 trees. If you have the time/resources do more.

raxml-ng --all --msa $QUERY.align.clean.fasta --model LG+G4+F --prefix $QUERY --bs-trees 100

Superimpose the bootstrap support values on the best ML tree.

raxml-ng --support --tree $QUERY.raxml.bestTree --bs-trees $QUERY.raxml.bootstraps

And here’s our creation as rendered by Archaeopteryx. Some day I’ll create a tree that is visually appealing, but today is not that day. But you get the point.

6188 Total Views 10 Views Today
This entry was posted in Computer tutorials and tagged , , . Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *

WordPress Anti Spam by WP-SpamShield