
Tree of Life

Module by: Quoclinh Nguyen

In this activity you will work in groups of two to use bioinformatics methods to build phylogenetic trees.

Part I: A rRNA-based tree of life

In this part and the next, you will use some standard bioinformatics software to build phylogenetic trees. In this part you will build a tree from the DNA sequences corresponding to the 18S ribosomal RNA. As we discussed in class, the ribosomal RNAs make a good sequence for building such trees since they are relatively conserved across all of life. This process will consist of two steps, plus an optional third:

  1. Use the program ClustalX to do a multisequence alignment of the DNA sequences and to build a phylogenetic tree. (ClustalX is freely available on the WWW.)
  2. Use the program drawtree to produce a nice unrooted tree diagram that summarizes your results. (drawtree is part of the freely available Phylip toolkit for comparing and manipulating DNA and protein sequence data.)
  3. (Optional) Use the program drawgram to draw a rooted phylogenetic tree based on the rRNA alignment.

You will be working with 10 DNA sequences coding for the 18S rRNAs in the following organisms (the names have been deliberately chosen to be a bit obscure):

1. C. Elegans
2. Drosophila
3. Homo Sapiens
4. Musculus
5. Norvegius
6. Cerevisiae
7. Pombe
8. Xenopus
9. Zea Mays
10. Arabidopsis

Step 1: Go to the Activity 13 folder which will be in the My Documents folder on the computer desktop. Double click on the clustalx icon to start ClustalX. A window should open on your desktop.

Step 2: Input the 10 DNA sequences into ClustalX. For the first sequence you will click on File::Load Sequences and then select the first DNA sequence. All 10 DNA sequences are in the rRNA sub-folder and have file names like Pombe rRNA. After loading the first sequence, you should see a sequence appearing in the ClustalX window. For the next 9 sequences, you will click on File::Append Sequences. In the end, you should have all 10 sequences appearing in the ClustalX window.

Step 3: Run the alignment by clicking Alignment::Do Complete Alignment. Before the alignment begins, the program will ask you for two output file names. You can just click Align since you won’t be using these output files. The alignment may take a few minutes to complete; progress will be displayed at the bottom of the window.

Step 4: Once the alignment is complete, use the scroll bar to look at the overall alignment. Since each DNA base is highlighted with a different color, it’s easy to see where the alignment is good. An asterisk (*) is printed at the top of the alignment where all sequences agree on a particular base location, and a hyphen (-) is put in where a gap has been inserted by the program to make the alignment work well. Approximately what fraction of the overall rRNA sequence appears to have a good alignment (i.e. at least 7 of the sequences agree on the bases)?
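If you would like to check your eyeball estimate from Step 4, the same count can be sketched in a few lines of Python. This is only an illustration: the toy alignment and the function names are made up here, and how you get your aligned sequences into a list of equal-length strings is left open.

```python
from collections import Counter

def column_agreement(alignment):
    """For each column, count how many sequences share the most common base.

    Gap characters ('-') never count as agreeing with anything.
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    counts = []
    for column in zip(*alignment):
        bases = [c for c in column if c != "-"]
        counts.append(Counter(bases).most_common(1)[0][1] if bases else 0)
    return counts

def conservation_line(alignment):
    """Put '*' where every sequence has the same base, a space elsewhere."""
    n = len(alignment)
    return "".join("*" if k == n else " " for k in column_agreement(alignment))

def fraction_well_aligned(alignment, min_agree):
    """Fraction of columns where at least min_agree sequences agree."""
    counts = column_agreement(alignment)
    return sum(k >= min_agree for k in counts) / len(counts)

# Toy alignment of four short sequences (hypothetical data).
toy = [
    "ACGT-A",
    "ACGTTA",
    "ACCTTA",
    "AC-TTA",
]
print(conservation_line(toy))
print(fraction_well_aligned(toy, min_agree=3))
```

For the 10 rRNA sequences you would call fraction_well_aligned with min_agree=7, matching the "at least 7 of the sequences agree" rule in Step 4.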

Step 5: Now save to disk the tree representation of this alignment by clicking Trees::Draw N-J Tree. When asked for a filename, be sure that the directory is rRNA and set the filename to rRNA.phy.
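"N-J" stands for neighbor joining, a method that builds a tree from a table of pairwise distances by repeatedly joining the pair of nodes that looks closest once each node's overall distance to everything else is taken into account. The sketch below shows the textbook version of the method on a small made-up distance table. It is not ClustalX's own code, which may differ in details such as distance corrections and gap handling, and the function name and input layout are chosen here for illustration only.

```python
from itertools import combinations

def neighbor_joining(names, dist):
    """Build an unrooted tree by textbook neighbor joining.

    names: list of at least three taxon labels.
    dist:  dict mapping a pair (a, b) of labels to their distance.
    Returns the tree as a Newick string with branch lengths.
    """
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    nodes = list(names)
    d = {frozenset(pair): value for pair, value in dist.items()}

    def get(a, b):
        return d[frozenset((a, b))]

    while len(nodes) > 3:
        n = len(nodes)
        # Total distance from each node to all the others.
        totals = {a: sum(get(a, b) for b in nodes if b != a) for a in nodes}
        # Join the pair that minimises the neighbor-joining criterion.
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * get(*p) - totals[p[0]] - totals[p[1]],
        )
        dij = get(i, j)
        # Branch lengths from i and j to the new internal node.
        li = dij / 2 + (totals[i] - totals[j]) / (2 * (n - 2))
        lj = dij - li
        joined = f"({i}:{li:.4f},{j}:{lj:.4f})"
        rest = [k for k in nodes if k not in (i, j)]
        # Distances from the new internal node to every remaining node.
        for k in rest:
            d[frozenset((joined, k))] = (get(i, k) + get(j, k) - dij) / 2
        nodes = rest + [joined]

    # Three nodes remain: connect them to one central node.
    a, b, c = nodes
    la = (get(a, b) + get(a, c) - get(b, c)) / 2
    lb = (get(a, b) + get(b, c) - get(a, c)) / 2
    lc = (get(a, c) + get(b, c) - get(a, b)) / 2
    return f"({a}:{la:.4f},{b}:{lb:.4f},{c}:{lc:.4f});"

# Hypothetical distances between five taxa labelled a to e.
example = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7,
    ("d", "e"): 3,
}
tree = neighbor_joining(["a", "b", "c", "d", "e"], example)
print(tree)
```

In this toy example a and b are joined first, and d and e end up next to each other. The result is an unrooted tree, which is why Step 6 draws it with drawtree rather than as a rooted diagram.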

Step 6: Now you will plot the tree arising from this alignment. Without closing the ClustalX program (just in case you need to go back to it), open the Activity 13 folder and double click on drawtree, which will open a window. When you are asked for the filename of the input data, type rRNA/rRNA.phy. Now you will specify a few settings:

  • a - Specify that the plot be produced as a Windows image file: type P, then P, then 3.
  • b - Specify that the labels be placed radially on the tree plot: type L, then R.

Now type Y for the plot; a new window should appear showing you a nicely drawn phylogenetic tree. (If you don’t get a new window or the tree looks crazy, for example the lines go outside the plot, check with the instructor.) If the plot looks reasonable, click File::Plot in the plot window. The file produced will be called plotfile and it will be in the Activity 13 folder. Move this file to the rRNA folder and rename it rRNA.pcx. You can now double click on this file and see it in the Windows Photo Editor, from which you can print it. (Alternatively, you can insert it as an image into a PowerPoint slide.) One way or another, print out two copies of the resulting tree and save them to answer the questions in Part IV.

Step 7 [Optional]: If you’d like to see a rooted tree diagram of your alignment results, you can follow the exact same procedure as in Step 6, using the program drawgram instead of drawtree.

Part II: Phylogeny from a Protein Sequence

In this part you will follow almost the exact same procedure as in Part I, but will be using protein sequences, rather than rRNA sequences, to build your tree. (ClustalX is smart enough to automatically recognize that you are now working with protein sequences.) You will be aligning and tree-building from a set of sequences from different species of a protein chosen from the NIH’s HomoloGene database of homologous (related) genes among eukaryotes. This protein is Ca2+/calmodulin-dependent protein kinase (abbreviated CaMk in this write-up). Note that for some of the species, the protein function is inferred by comparison with similar protein sequences in other species where it has been biochemically characterized. You will be working with 11 CaMk sequences; the names are a lot less obscure than in Part I:

1. Boar
2. C. Elegans
3. Chicken
4. Ferret
5. Frog
6. Human
7. Mouse
8. Rabbit
9. Rat
10. Sponge
11. Zebrafish
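How might a program tell DNA from protein without being told? One simple heuristic, shown here as an illustration and not as a description of ClustalX's internals, is to check what fraction of the letters could be nucleotide codes. The function name, the 85% threshold and the two example sequences are assumptions made for this sketch.

```python
def guess_sequence_type(sequence, threshold=0.85):
    """Guess whether a sequence is nucleic acid or protein.

    If at least `threshold` of the letters are nucleotide codes
    (A, C, G, T, U or N) the sequence is called DNA/RNA; otherwise
    it is called protein. The threshold is an assumed value.
    """
    letters = [c for c in sequence.upper() if c.isalpha()]
    if not letters:
        raise ValueError("sequence contains no letters")
    nucleotide_like = sum(c in "ACGTUN" for c in letters)
    if nucleotide_like / len(letters) >= threshold:
        return "DNA/RNA"
    return "protein"

# Two short made-up sequences.
print(guess_sequence_type("ACGTACGTNN"))
print(guess_sequence_type("MKVLAAGIVGLLLAQPS"))
```

A heuristic like this works because proteins use 20 amino-acid letters, so only a minority of the letters in a typical protein sequence happen to be A, C, G, T, U or N.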

Step 1: Go to the Activity 13 folder which will be in the My Documents folder on the computer desktop. Double click on the clustalx icon to start ClustalX. A window should open on your desktop.

Step 2: Input the 11 protein sequences into ClustalX. For the first sequence you will click on File::Load Sequences and then select the first protein sequence. All 11 protein sequences are in the CaMk sub-folder and have file names like Boar CaMk. After loading the first sequence, you should see a sequence appearing in the ClustalX window. For the next 10 sequences, you will click on File::Append Sequences. In the end, you should have all 11 sequences appearing in the ClustalX window.

Step 3: Run the alignment by clicking the Alignment::Do Complete Alignment button. Before the alignment begins, the program will ask you for two output file names. You can just click Align since you won’t be using these output files.

Step 4: Now save to disk the tree representation of this alignment by clicking Trees::Draw N-J Tree. When asked for a filename, be sure that the directory is CaMk and set the filename to CaMk.phy.
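Before plotting, it can be worth confirming that the saved tree file really lists every organism. The sketch below assumes the file holds a tree in Newick (parenthesis) notation, the tree format that drawtree reads, and that labels contain no spaces or quotes. The function name and the toy tree string are made up for illustration.

```python
import re

def newick_leaves(newick):
    """Return the leaf labels of a Newick tree string.

    A leaf label is a name that directly follows '(' or ','.
    Labels on internal nodes follow ')' and are therefore skipped.
    """
    return re.findall(r"[(,]\s*([^():,;\s]+)", newick)

# Toy tree (hypothetical); a real file would be read with
# open(path).read() and passed to newick_leaves in the same way.
toy_tree = "(Frog:0.21,(Human:0.02,Mouse:0.03):0.10,\nZebrafish:0.30);"
leaves = newick_leaves(toy_tree)
print(len(leaves), sorted(leaves))
```

For the CaMk tree you would expect 11 leaf labels, and 10 for the rRNA tree from Part I.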

Step 5: Now you will plot the tree arising from this alignment. Without closing the ClustalX program (just in case you need to go back to it), open the Activity 13 folder and double click on drawtree, which will open a window. When you are asked for the filename of the input data, type CaMk/CaMk.phy. Now you will specify a few settings:

  • a - Specify that the plot be produced as a Windows image file: type P, then P, then 3.
  • b - Specify that the labels be placed radially on the tree plot: type L, then R.

Now type Y for the plot; a new window should appear showing you a nicely drawn phylogenetic tree. If the plot looks reasonable, click File::Plot in the plot window. The file produced will be called plotfile and it will be in the Activity 13 folder. Move this file to the CaMk folder and rename it CaMk.pcx. You can now double click on this file and see it in the Windows Photo Editor, from which you can print it. (Alternatively, you can insert it as an image into a PowerPoint slide.) One way or another, print out two copies of the resulting tree and save them to answer the questions in Part IV.

Part IV: Analysis and Questions

Exercise 1

1. Using the internet (i.e. google.com searches), identify each of the organisms on the phylogenetic tree you printed out in Part I by their common name (e.g. Rabbit). Next, circle the sets of nodes on your tree that are similar organisms (e.g. plants, yeast, etc.). Looking at the organisms listed on your tree, does your tree seem reasonable? [Optional: compare your tree to the locations of the organisms on the Tree of Life website or the following protein-based tree: http://www.tarweed.com/pgr/PGR98-058.figure1.jpeg]

Exercise 2

2. [Optional] If you chose to create a rooted tree (Step 7), label the tree produced just as you labeled the unrooted tree in Exercise 1. Does your rooted tree look like a reasonable evolutionary tree? (If not, ask the instructor for a quick overview of the challenges to properly “rooting” an unrooted tree.)

Exercise 3

3. Now look at the phylogenetic tree you created using the CaMk protein sequences. Does the tree seem to reasonably represent the actual evolutionary distances between all selected organisms? (Look both at the big picture, i.e. the distances between the obviously very different organisms, and at individual pairs of organisms. How do your results compare with those from Part I?)

Exercise 4

4. What hypotheses would you propose to explain any unexpected features of the tree you produced in Part II? Some concepts to stimulate your thinking about this:

  a. The “higher” eukaryotes, especially mammals, often have many versions of a given protein, all somewhat specialized relative to the ancestral protein.
  b. Sometimes the optimal protein sequence to perform a particular task can emerge from two different protein precursors (so-called convergent evolution).
  c. Building the optimal phylogenetic tree is actually a very hard computational task (technically it is “NP-hard” under the usual optimality criteria), so all practical software tree-builders use approximate methods to estimate the optimal tree.
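To get a feel for why tree-builders cannot simply try every possible tree, you can count the candidates. For n labelled organisms, the number of distinct unrooted trees in which every internal node has exactly three branches is the product of the odd numbers from 3 up to 2n - 5. The short sketch below (the function name is chosen here for illustration) evaluates that product for the tree sizes used in this activity.

```python
def unrooted_tree_count(n_taxa):
    """Number of unrooted, fully bifurcating trees on n_taxa labelled leaves.

    Equals 3 * 5 * 7 * ... * (2 * n_taxa - 5), and 1 when n_taxa is 3.
    """
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    count = 1
    for odd in range(3, 2 * n_taxa - 4, 2):
        count *= odd
    return count

for n in (4, 10, 11, 20):
    print(n, unrooted_tree_count(n))
```

Already for the 10 rRNA sequences there are 2,027,025 possible trees, and for the 11 CaMk sequences 34,459,425; the count grows faster than exponentially as organisms are added.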
