              
|
PROTEIN PHYLOGENETICS
PAUP4.0*
can be used for an impressive range of analytical methods involving DNA
alignments. This,
unfortunately is not the case for estimating protein phylogenies. Only
protein parsimony analyses can be performed in PAUP (a very simple distance
method based on straight similarity can be performed as well but it is
not recommended, particularly for divergent sequences).
For sophisticated distance
and maximum likelihood analyses we need to use alternative programs. The
following practical aims at making you familiar with three sets of programs
that can be used for protein distance and protein maximum likelihood analyses,
they include:
|
For more information
on these programs and many others, this is an excellent web page
written and maintained by Joe
Felsenstein, packed with useful information on programs for
phylogeny. All three programs can read the PHYLIP
format simplifying their use in parallel.
PHYLIP
and PUZZLE have similar and
straight forward menu-driven options whereas PROTML makes use of
a command line interface.
PHYLIP
is one of the most comprehensive packages for phylogenetic analyses,
it is developed by Joe Felsenstein.
It contains a broad range of methods for DNA (parsimony, distance
and maximum likelihood) and protein analyses including protein distances
and a protein parsimony method.
|
You will be using
during these exercises several programs available in the PHYLIP
package in an effort to infer protein distance trees and perform
bootstrapping. For protein
maximum likelihood analyses we shall
use the program PUZZLE4.0 and PROTML2.2, the later being from the
MOLPHY package.
In
addition, PUZZLE can also be used in conjunction with PHYLIP to
perform more complex distance estimates. Of particular interest
in PUZZLE is the broad range of protein evolutionary models with
several matrices (e.g. PAM, JTT and BLOSUM62) and the possibility
to incorporate a correction for rate heterogeneity between sites
using a discreet gamma shape parameter with the possibility of assuming
a fraction of constant sites.
|
The dataset
you will be using has 6 taxa and 760 aligned amino acids positions.
It is a subset of the alignment of the largest subunit of the RNA
polymerase II that we have analyzed in detail to investigate the
phylogeny of Microsporidia (as discussed during the lectures - see
also Hirt et al. 1999 for more details). It is in the PHYLIP format
which can be used by all the programs you shall be using during
these practicals. There are also two additional files to be used
for user-defined tree comparisons (i.e. the Kishino-Hasegawa test)
and a command line file for the use of PUZZLEBOOT.
|
1)
PHYLIP programs.
PHYLIP3.52 programs can
be run on several platforms including UNIX
and Macintosh machines. Programs can be used to perform parsimony,
distance and maximum
likelihood analyses of both DNA and protein datasets. There are also
two programs for bootstrapping and calculating majority-rule consensus
trees. For DNA analyses PAUP has essentially superceded PHYLIP in terms
of the diversity and complexity of DNA evolution models. PHYLIP however
is still very useful for protein analyses. During these practicals you
will be using the following programs from PHYLIP:
|
PROTDIST:
calculate pairwise distances from protein alignments, it allows
the calculation of distances with several mutational data matrices
including the PAM matrix.
|
NEIGHBOR:
calculate a neighbor-joining (NJ)
tree from a distance matrix. It
does not assume a molecular clock. NJ trees are heuristic estimates
of minimum evolution trees but no
alternative trees are compared (NJ is an algorithm and does not
have an objective function), a unique tree is always produced without
any idea of the quality of the tree. Its main advantage is its speed
of execution, which might be considered to be a significant advantage
when numerous taxa have to analyzed.
|
FITCH:
calculate a least-square tree from a distance matrix. It does not
assume a molecular clock. If the distance are additive or close
to be additive the program should find the best tree. The program
conducts a search of tree-space and the best tree which optimizes
the difference between calculated distances and the inferred distances
on the tree is selected.
|
Click to enlarge.
|
Click to enlarge
|
Click to enlarge
|
Bootstrapping
is recommended in an effort to obtain some information on support for
nodes within the tree. If you analyze few taxa, say less than 40, it
is advisable to use the FITCH program described below. If the distances
are additive (or nearly additive), the NJ method can identify the same
tree as FITCH (see below).
|
SEQBOOT:
produces bootstrap replicates from DNA or protein alignments.
It is simply a method of producing many resampled datasets from
the original one. The bootstrapped data is then used by
one of the distance (or parsimony or likelihood) calculation programs.
|
CONSENSE:
calculate a majority-rule consensus tree, is used in conjunction
withSEQBOOT, PROTDIST and any tree inference program such as FITCH
or NEIGHBOR.
|
SEQBOOT
and CONSENSE can also be used in conjunction with PUZZLE, (see
PUZZLEBOOT information), to allow more complex models to be used
to estimate protein pairwise distances.
|
|

Click to enlarge
|

Click to enlarge
|

Click to enlarge
|
First you have to download
the files to be used during the exercises
|
- inf6.760
a PHYLIP file containing the protein alignment (6 taxa, 760 aligned
amino acids).
|
-
utrees.5 a PHYLIP file containing 5 trees that
were obtained with PROTML. Will be used to perform Kishino-Hasegawa
tests with PROTML and PUZZLE (see later exercises). |
-
puzzle.cmds a command line file to instruct
PUZZLE (using PUZZLEBOOT) to perform protein distance bootstrapping
with the models available in PUZZLE. |
<< Prev | Next
>> |