Exercise 3: Multiple Alignment and Phylogeny

A phylogenetics exercise:
What is the difference between an ortholog and a paralog?

Make a multiple alignment and a phylogenetic tree of these four protein sequences using ClustalX:

    STN2_HUMAN   Q93045
    STN2_MOUSE   P55821
    STN1_RAT     P13668
    ST2A_XENLA   Q09001

First, retrieve the protein sequences from the NCBI Protein database using the Entrez web server:
http://www.ncbi.nlm.nih.gov
Remember to choose Protein from the pull-down menu. Set the display format to FASTA and copy all four protein sequences into a single text file in multiple FASTA format (use a plain-text editor such as Notepad).
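If you would rather script the retrieval, NCBI's E-utilities `efetch` service can return several protein records as one multi-FASTA response. The Python sketch below is an optional alternative to the web form, not part of the original exercise: it only builds the request URL (actually fetching it needs network access, e.g. via `urllib.request.urlopen`) and shows how to split and re-join multi-FASTA text so you can check that your file holds exactly four records. That NCBI Protein resolves these four accessions through `efetch` is an assumption worth checking on your first run.

```python
from urllib.parse import urlencode

# NCBI E-utilities efetch endpoint (optional scripted route; the web form works too).
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

ACCESSIONS = ["Q93045", "P55821", "P13668", "Q09001"]


def efetch_url(accessions):
    """Build an efetch URL asking for all records as one multi-FASTA text."""
    params = {
        "db": "protein",
        "id": ",".join(accessions),
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


def parse_fasta(text):
    """Split multi-FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def to_fasta(records, width=60):
    """Write (header, sequence) pairs back out as multi-FASTA text."""
    out = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

For example, `len(parse_fasta(open("seqs.fasta").read()))` should be 4 for the finished file (the file name is hypothetical).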

Now, either download and install ClustalX on your computer:
http://www-igbmc.u-strasbg.fr/BioInfo/ClustalX
or use a web-based Clustal server:

    http://www.ebi.ac.uk/clustalw
    http://www.cmbi.kun.nl/bioinf/tools/clustalw.shtml
    http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_clustalw.html

One advantage of most of the web-based tools is that they immediately produce a graphical phylogenetic tree. If you use a local copy of ClustalX, you have to load the text tree file into separate tree-drawing software. I usually use the Phylodendron website:
http://iubio.bio.indiana.edu/treeapp/treeprint-form.html
but there are many stand-alone programs for Windows (and Mac) computers, e.g. TreeView.
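ClustalX writes its trees as plain text in Newick (PHYLIP) format: nested parentheses, branch lengths after colons, and a final semicolon. To see what the tree-drawing programs are actually reading, here is a minimal Python sketch that parses such a string and prints it as an indented outline. It assumes simple labels without spaces or quotes, discards square-bracket comments, and is no substitute for a real viewer such as TreeView.

```python
import re


def parse_newick(text):
    """Parse a Newick tree into nested (name, branch_length, children) tuples."""
    # Drop [bracketed comments] and all whitespace (assumes labels contain no spaces).
    s = "".join(re.sub(r"\[[^\]]*\]", "", text).split())
    pos = 0

    def read_token():
        # Read characters up to the next Newick delimiter.
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in ",():;":
            pos += 1
        return s[start:pos]

    def parse_node():
        nonlocal pos
        children = []
        if s[pos] == "(":
            pos += 1  # consume "("
            children.append(parse_node())
            while s[pos] == ",":
                pos += 1  # consume ","
                children.append(parse_node())
            if s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
        name = read_token()
        length = None
        if pos < len(s) and s[pos] == ":":
            pos += 1  # consume ":"
            length = float(read_token())
        return (name, length, children)

    return parse_node()


def leaves(node):
    """Return leaf names in the order they appear in the file."""
    name, _, children = node
    if not children:
        return [name]
    return [leaf for child in children for leaf in leaves(child)]


def render(node, depth=0):
    """Return an indented outline; unnamed internal nodes print as '+'."""
    name, length, children = node
    label = name or "+"
    if length is not None:
        label += f" ({length:g})"
    lines = ["  " * depth + label]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines
```

Usage: `print("\n".join(render(parse_newick(open("tree.ph").read()))))`, where `tree.ph` stands in for whatever tree file your ClustalX run produced.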

Once you have made the alignment and the tree, you will immediately see something strange: the rat gene clusters away from the other three. How can this be? Mouse is clearly more closely related to rat than to Xenopus (the African clawed frog).
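ClustalX draws its tree by neighbor-joining from a matrix of pairwise distances between the aligned sequences, so a misplaced sequence usually shows up first as a row of unusually large distances. The Python sketch below computes simple p-distances (the fraction of differing residues, skipping gapped columns) and each sequence's nearest neighbor. The ten-residue "alignment" is made up purely for illustration and is not the real data from this exercise; ClustalX's own distances may also apply corrections that this sketch leaves out.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing residues, skipping columns with a gap in either sequence."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(alignment):
    """Symmetric dict-of-dicts of p-distances for {name: aligned_sequence}."""
    d = {name: {name: 0.0} for name in alignment}
    for m, n in combinations(alignment, 2):
        d[m][n] = d[n][m] = p_distance(alignment[m], alignment[n])
    return d


def nearest(d, name):
    """Name of the sequence closest to `name`."""
    return min((o for o in d if o != name), key=lambda o: d[name][o])


# Made-up toy alignment for illustration only (not the real sequences).
TOY = {
    "human": "MAKTAEELLK",
    "mouse": "MAKTAEELLR",
    "frog": "MAKSAEEL-R",
    "rat": "MGRTSEDIVK",
}

if __name__ == "__main__":
    d = distance_matrix(TOY)
    for name in TOY:
        row = "  ".join(f"{d[name][o]:.2f}" for o in TOY)
        print(f"{name:6s} {row}   nearest: {nearest(d, name)}")
```

In this toy case every distance from "rat" is far larger than the distances among the other three, which is the same symptom you see in the real tree.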

The best solution to this kind of question is to add more sequences to your analysis. Go to the NCBI BLAST web page and, using each of the 4 sequences as a query, collect the sequences of the top 5 hits (in FASTA format).
http://www.ncbi.nlm.nih.gov/BLAST
[You should immediately realize that something is up when you look at the pattern of overlaps between the various lists.]
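Comparing the BLAST hit lists is just set intersection. Here is a minimal Python sketch that uses made-up hit identifiers in place of the real accession numbers you collect: queries whose lists share hits probably belong to the same group of sequences, while a query whose list shares nothing with the others deserves a closer look.

```python
from itertools import combinations


def shared_hits(hit_lists):
    """Map each pair of queries to the set of hits their lists have in common."""
    return {
        (q1, q2): set(hit_lists[q1]) & set(hit_lists[q2])
        for q1, q2 in combinations(hit_lists, 2)
    }


def loners(hit_lists):
    """Queries whose hit list shares nothing with any other query's list."""
    shared = shared_hits(hit_lists)
    return [q for q in hit_lists
            if not any(hits for pair, hits in shared.items() if q in pair)]


# Made-up hit IDs standing in for the accession numbers you collect.
HITS = {
    "human": ["h1", "h2", "h3", "h4", "h5"],
    "mouse": ["h1", "h2", "h3", "h6", "h7"],
    "frog": ["h2", "h3", "h4", "h8", "h9"],
    "rat": ["r1", "r2", "r3", "r4", "r5"],
}
```

With your real lists, replace `HITS` with the accessions from each BLAST report; the toy data only shows what a fully disjoint list looks like.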

Now rebuild the multiple alignment with all of the sequences (removing the obvious duplicates).
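Merging the BLAST hit lists will pull in the same record more than once. The hypothetical `dedupe` helper below, a Python sketch that works on (header, sequence) pairs, keeps the first record for each accession, taking the accession to be the first word of the FASTA header (an assumption about how your headers look). Dropping exact sequence duplicates is optional and off by default, because orthologs from closely related species can have identical protein sequences; turn it on to catch one protein listed under two accession numbers, then check what it removed.

```python
def dedupe(records, by_sequence=False):
    """Keep the first (header, sequence) record per accession.

    The accession is taken to be the first word of the FASTA header.
    With by_sequence=True, later records with an identical sequence are
    dropped too (case-insensitive), even if their accession differs.
    """
    seen_ids, seen_seqs, kept = set(), set(), []
    for header, seq in records:
        words = header.split()
        acc = words[0] if words else header
        key = seq.upper()
        if acc in seen_ids or (by_sequence and key in seen_seqs):
            continue  # duplicate: skip it
        seen_ids.add(acc)
        seen_seqs.add(key)
        kept.append((header, seq))
    return kept
```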

Your resulting tree should help you figure out what is going on here. Just because a group of sequences has enough similarity to make an alignment does not mean that they are all orthologs of the same gene.

You should now begin to realize how important it is to establish that a pair of genes is truly orthologous before generalizing from the function of one to the other.


Stuart Brown - RCR
Last modified: Mon Oct 28 12:07:48 EST 2002