Quick Start Tutorial

Introduction

Starting PAUP*

1 Examine the data file

2 Execute the data file

Logging results

1 Start logging

2 Stop logging

Summarizing the data

Managing the data

1 Exclude characters

2 Delete taxa

Defining assumptions

1 Add character weights

2 Set character types

3 Save current assumptions

4 Recall assumptions

Searching for Trees

1 Define optimality criterion

2 Define search strategy

Printing trees

1 Display trees

2 Describe trees

3 Print low-resolution trees

4 Print high-resolution trees

Saving results

Setting the optimality criterion to distance

1 Set the optimality criterion

2 Display distances

3 Build a neighbor joining tree

4 Build a least squares tree

Setting the optimality criterion to likelihood

1 Set the optimality criterion

2 Evaluate the parsimony tree

3 Set likelihood model parameters

4 Start the tree search

Submitting commands in batch file

Moving on

References

The following hands-on tutorial provides a very brief overview of the basic usage of PAUP* 4.0. The tutorial will take you step by step through an analysis of one of the sample data files included on your distribution disk and also available on the World Wide Web at http://paup.csit.fsu.edu/data/primate-mtDNA-interleaved.nex . This tutorial was designed for people with no prior experience using PAUP*. If you are already familiar with PAUP*, you will probably wish to skip it. We assume that users are familiar with basic phylogenetic terminology and with operating-system-specific issues. As you become more experienced with PAUP* 4.0, you will discover that there are many alternative ways to execute the operations described below. We have chosen not to describe all the possibilities in this tutorial; however, we encourage you to explore other menu and command-line options as time permits.

 

The Windows interface is almost entirely command-line driven. Some menu functions are available in the Windows interface; however, these functions mostly include file and edit operations. This tutorial will use both menu options and command-line syntax to demonstrate the different environments under which PAUP* may be run.

 

Throughout this tutorial we follow several typographical conventions. First, menus, menu items, and items contained in dialog boxes or elsewhere on the screen are given in a bold sans-serif font. For example, the text File > Open means click "File" in the main menu and then select "Open" from the menu items under "File." Second, text that is intended to be typed by the user at the command-line prompt or into a dialog box is given in a plain fixed-width font. For example, the instructions "Type: weights 2:1stpos" mean that everything after "Type:" should be entered exactly as it appears. Finally, interface-specific instructions are offset and bulleted, whereas all other text pertains to all of the PAUP* interfaces.


Starting PAUP*

1 Examine the data file

  • Double-click the PAUP application icon.
  • PAUP* will automatically launch the open dialog box when it is first started.
  • Select the file named primate-mtDNA-interleaved.nex in the Sample NEXUS data folder.
  • Open the file in PAUP*'s editor by changing the initial mode from Execute to Edit, then click Open/Execute.

 

In the editor, scroll through the sample file. Notice that the file is divided into blocks of text, delimited by the words "begin" and "end". The word following "begin" defines the block type. In this example, the following types of blocks are used: taxa, characters, assumptions, and paup. There are, however, numerous other NEXUS block types. In fact, one of the advantages of the NEXUS format is that applications simply skip over blocks that they do not recognize. For a more detailed discussion of the NEXUS format, see Maddison et al. (1997). For this example, you will not need to modify the original sample file.
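The block structure described above can be illustrated with a short Python sketch. It is not part of PAUP*: the function name list_nexus_blocks and the toy file contents are assumptions made purely for illustration, and the pattern matching is deliberately simplified (it strips bracketed comments but does not handle quoted tokens or nested comments, which a real NEXUS reader would need to do).

```python
import re

def list_nexus_blocks(text):
    """Return the block types named after each 'begin' keyword, in file order.

    Simplified sketch: [bracketed] comments are removed first, and the
    'begin' keyword is matched case-insensitively.
    """
    without_comments = re.sub(r"\[[^\]]*\]", "", text)
    names = re.findall(r"\bbegin\s+(\w+)\s*;", without_comments, flags=re.IGNORECASE)
    return [name.lower() for name in names]

# Toy file contents (hypothetical; this is NOT the actual sample file).
toy_nexus = """#NEXUS
[a comment mentioning begin fake; which is ignored]
Begin taxa;
  dimensions ntax=2;
  taxlabels A B;
end;
begin characters;
  dimensions nchar=3;
  format datatype=dna;
  matrix
    A ACG
    B ACT
  ;
end;
begin paup;
  cstatus;
end;
"""

print(list_nexus_blocks(toy_nexus))  # ['taxa', 'characters', 'paup']
```

A program that only understands some block types could use such a list to decide which blocks to process and which to skip.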

2 Execute the data file

Close the sample file and do the following:

 

  • Select File > Execute "primate-mtDNA-interleaved.nex"

 

After executing the sample file, PAUP* will display comments and some general information about the data. For this example, the source of the data set is given, followed by a section reporting the dimensions of the data matrix, the type of data, etc. No analyses have been conducted yet; PAUP* has simply processed the data and is now waiting to be told what to do next.


Logging results

Ordinarily, you will want to log the results of a PAUP* session to a disk file to have a record of the results of your analyses.

 

1 Start logging

  • Select File > Log Output to Disk...

 

 

  • Under Filename: type practice.log and click Start

 

2 Stop logging

Logging can be started and stopped at any time during your PAUP* session. To stop logging, do the following:

 

  • Select File > Log Output to Disk...

Summarizing the data

Now that the data matrix has been processed, you can use PAUP* to obtain basic summary information about the data set. To start, you will display information about the characters included in the sample data set.

 

  • Type: cstatus;
  • Click Execute.

 

PAUP* will display a summary of the current character status (i.e., types, weights, etc.). Remember, if logging was turned on, the summary information displayed to your screen will also be saved to the log file. You may also choose to display a summary of the taxa (tstatus), the entire data matrix (showmatrix), and more.


Managing the data

PAUP* provides several ways to restrict analyses to a subset of the taxa and characters included in a data matrix. For example, the sample data set includes protein-coding and non-coding regions of primate mitochondrial DNA. Suppose we wish to analyze only the coding regions of the data. The characters belonging to these regions have already been identified in the sample file using the charset command. Character sets simplify certain procedures by allowing you to refer to a group of characters by a single name. You will start by excluding all characters in the data set except for the coding regions.

 

1 Exclude characters

  • Type: include coding/only;

 

2 Delete taxa

You will also restrict your analyses to five species of hominoids plus three other primate species used as outgroup taxa; all other taxa will be deleted. The five hominoids (Homo sapiens, Pan, Gorilla, Pongo, and Hylobates) have already been identified in the sample file using the taxset command. In the same way that charset allows you to refer to a group of characters by a single name, taxset allows you to refer to a group of taxa by a single name.

 

  • Type: undelete hominoids lemur_catta macaca_fuscata saimiri_sciureus/only;

 

Notice that spaces in taxon names must be replaced with an "_" (underscore character) or the name must be enclosed in single quotes when entered at the command line. Also, PAUP* does not pay attention to character case in taxon labels. Finally, be aware that when you exclude characters or delete taxa using the exclude and delete commands respectively (or the menu equivalents), you do not actually modify the data file. That is, the next time you execute the sample data set, all of the characters and taxa will be included.
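The label-matching rules just described can be sketched in a few lines of Python. This is only an illustration of the rules as stated in this tutorial, not PAUP*'s actual implementation; the function names normalize_taxon_label and same_taxon are hypothetical.

```python
def normalize_taxon_label(label):
    """Map a typed label to a canonical form for comparison.

    Assumptions for this sketch: a label wrapped in single quotes keeps
    its spaces as typed, an unquoted label uses "_" to stand for a space,
    and letter case is ignored in both cases.
    """
    label = label.strip()
    if len(label) >= 2 and label[0] == "'" and label[-1] == "'":
        name = label[1:-1]               # quoted: drop the quotes only
    else:
        name = label.replace("_", " ")   # unquoted: underscore means space
    return name.casefold()

def same_taxon(typed, stored):
    """True if a typed label refers to the stored taxon label."""
    return normalize_taxon_label(typed) == normalize_taxon_label(stored)

print(same_taxon("lemur_catta", "Lemur catta"))           # True
print(same_taxon("'Macaca fuscata'", "MACACA_FUSCATA"))   # True
print(same_taxon("Pan", "Pongo"))                         # False
```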


Defining assumptions

Before you begin an analysis, there is a good chance that you know something about the characters in your data matrix that suggests they should be differentially weighted. For example, we know that substitutions at first codon positions generally occur less frequently than substitutions at third positions. The simple explanation for this is that substitutions at first codon positions usually result in an amino acid substitution, whereas third-position changes can often occur without changing the amino acid translation. You will incorporate this information into the following analysis by applying a higher weight to substitutions occurring at first codon positions. Codon positions have already been identified in the sample file using the charset command.

1 Add character weights

  • Type: weights 2:1stpos;

2 Set character types

By default, PAUP* considers all transformation costs to be equal. In this section, you will invoke a character type that assigns a higher cost to transversions than to transitions. More specifically, we will assume that transversions (changes between a purine, A or G, and a pyrimidine, C or T) cost twice as much as transitions (changes from a purine to a purine or from a pyrimidine to a pyrimidine). One way to incorporate this assumption into the analysis is to set up a transition/transversion "step matrix". Such a step matrix has already been defined in the sample file. To apply the transformation cost to all of the characters currently being considered, do the following:

 

  • Type: ctype 2_1:all;
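To make the 2:1 transversion/transition assumption concrete, the following Python sketch builds the corresponding cost table for the four nucleotides. It illustrates the idea of a step matrix only; it does not reproduce the definition stored in the sample file, and the names step_cost and step_matrix are chosen here for illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def step_cost(from_base, to_base, transition_cost=1, transversion_cost=2):
    """Cost of changing one nucleotide into another under a 2:1 scheme."""
    for base in (from_base, to_base):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"unknown base: {base!r}")
    if from_base == to_base:
        return 0
    pair = {from_base, to_base}
    # Both bases in the same chemical class -> transition, otherwise transversion.
    is_transition = pair <= PURINES or pair <= PYRIMIDINES
    return transition_cost if is_transition else transversion_cost

bases = "ACGT"
step_matrix = {a: {b: step_cost(a, b) for b in bases} for a in bases}

print("  ", list(bases))
for a in bases:
    print(a, [step_matrix[a][b] for b in bases])
# A [0, 2, 1, 2]
# C [2, 0, 2, 1]
# G [1, 2, 0, 2]
# T [2, 1, 2, 0]
```

Under weighted parsimony, a character's weight multiplies these costs, so a change at a character of weight 2 would contribute twice the tabulated cost to the tree length.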

 

3 Save current assumptions

Up to this point you have excluded characters, deleted taxa, weighted characters, and defined character transformation types. If for some reason you had to abandon your analyses and close PAUP*, you would have to select all of the menu options or repeat the commands previously entered to get back to where you are now. One way to avoid this potentially time-consuming task is to save your assumptions to a file that can be recalled at a later time.

 

  • Type: saveassum file=tutorial.dat;

4 Recall assumptions

Restart PAUP* and execute the file primate-mtDNA-interleaved.nex as you did in the beginning of the tutorial. Do the following to recall the previous set of assumptions:

 

  • Select File > Open... and select tutorial.dat
  • Change the Initial mode from Edit to Execute and click Execute

 

You should now be back where you left off. To be sure the assumptions are in effect, issue the command cstatus from the command line. You should get the following output:

 

Character-status summary:
  Current optimality criterion = parsimony
  205 characters are excluded
  Of the remaining 693 included characters:
    All characters are of user-defined type "2 1"
    462 characters have weight 1
    231 characters have weight 2
    296 characters are constant
    155 variable characters are parsimony-uninformative
    Number of (included) parsimony-informative characters = 242
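A quick way to confirm that the summary is internally consistent is to check that the counts add up. The Python sketch below uses only the numbers printed in the output above; the dictionary keys and the function name check_cstatus are made up for this illustration.

```python
# Counts copied from the cstatus summary shown above.
summary = {
    "excluded": 205,
    "included": 693,
    "weight_1": 462,
    "weight_2": 231,
    "constant": 296,
    "uninformative": 155,
    "informative": 242,
}

def check_cstatus(s):
    """Return named consistency checks as booleans."""
    return {
        # Every included character has weight 1 or weight 2.
        "weights_cover_included": s["weight_1"] + s["weight_2"] == s["included"],
        # Every included character is constant, uninformative, or informative.
        "classes_cover_included":
            s["constant"] + s["uninformative"] + s["informative"] == s["included"],
        # Weight 2 was applied to one of three codon positions.
        "weight_2_is_one_third": 3 * s["weight_2"] == s["included"],
    }

total_characters = summary["excluded"] + summary["included"]
print(total_characters)  # 898
print(check_cstatus(summary))
```

All three checks hold: the 231 characters of weight 2 are exactly one third of the 693 included characters, as expected when only first codon positions of the coding regions were given weight 2.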


Searching for Trees

1 Define optimality criterion

PAUP* 4.0 has the advantage of being able to analyze data using several different optimality criteria: parsimony, likelihood, and distance. Several chapters in this manual and a plethora of published literature are devoted to comparing the performance of optimality criteria. Rather than spend time here discussing the relative merits of the available criteria, we will just say that each has its strengths and limitations. To begin with, you will use the default criterion, maximum parsimony, to search for optimal trees. Later in this tutorial you will search under the other criteria.

 

  • Type: set criterion=parsimony;

 

2 Define search strategy

PAUP* provides two basic classes of methods for searching for optimal trees: exact and heuristic. Exact methods are guaranteed to find the optimal tree(s) but may require prohibitive amounts of computer time for medium to large-sized data sets. Heuristic methods do not guarantee optimality but generally require far less computer time. Even though the current data set is relatively small, you will start by conducting a heuristic search.
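The reason exact methods become prohibitive is that the number of possible trees grows extremely quickly with the number of taxa. As a rough illustration, the Python sketch below counts the distinct unrooted, fully bifurcating trees for n labeled taxa using the standard formula (2n-5)!!, the product of the odd numbers from 1 to 2n-5. The function name count_unrooted_trees is chosen here for illustration; the sketch says nothing about how PAUP* actually organizes its searches.

```python
def count_unrooted_trees(n_taxa):
    """Number of unrooted bifurcating trees for n_taxa labeled tips.

    Uses (2n - 5)!! = 3 * 5 * ... * (2n - 5), valid for n_taxa >= 3.
    """
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for odd in range(3, 2 * n_taxa - 4, 2):
        count *= odd
    return count

for n in (4, 8, 10, 20):
    print(n, count_unrooted_trees(n))
```

For the 8 taxa retained in this tutorial there are 10395 possible unrooted trees, which is small enough for an exact search; the count for 20 taxa already has 21 digits.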

 

  • Type: hsearch addseq=random;

 


Once the search is started, PAUP* will display general information about the options and assumptions being used during the search. If you were logging results, this information would be saved to the log file. When the search completes, PAUP* will display general information about the results of the search.

 

 

  • Click Close to dismiss the search status dialog box

Printing trees

1 Display trees

According to the output on your screen, there is a single tree currently in memory. To display the tree do the following:

 

  • Type: showtrees;

2 Describe trees

The showtrees command draws a simple picture of the branching order of the taxa.

 

 


Suppose, for example, that you want to know something about the branch lengths of the tree. To get a more detailed picture of the tree, do the following:

 

  • Type: describetrees 1/plot=phylogram brlens=yes;

 


3 Print low-resolution trees

  • One way to get a quick paper copy of the tree shown on your display is to print the contents of the display buffer. Printing the display buffer, however, will print the tree as well as everything else output to the screen before the tree was displayed. Therefore, we recommend that you first clear the contents of the display buffer and then display the tree again using the showtrees or describetrees command.
  • Select File > Print Display Buffer…

 

/------------------------------------ Lemur catta
|
|                                                /---- Homo sapiens
|                                                |
|                                     /---------14 /------- Pan
|                                     |          \13
|                              /-----15            \--------- Gorilla
18                             |      |
|                /------------16      \-------------- Pongo
|                |             |
+---------------17             \-------------- Hylobates
|                |
|                \-------------------- Macaca fuscata
|
\----------------------------- Saimiri sciureus