Teaching activities

www.scienceinschool.org 28 Science in School Issue 17 : Winter 2010

Bioinformatics with pen and paper: building a phylogenetic tree

Bioinformatics is usually done with a powerful computer. With help from Cleopatra Kozlowski, however, you can investigate our primate ancestry – armed with nothing but a pen and paper.

Image courtesy of hometowncd / iStockphoto

As a result of recent technological advances, it is relatively quick and easy to determine a DNA or protein sequence. These sequences by themselves, of course, tell us very little: GAATCCA, for example. We need to know what those sequences mean. Which proteins are encoded by that DNA sequence? Does the sequence encode a protein at all? What effect does a small change in the DNA sequence have on the structure of the encoded protein? What function does that protein have in the cell? And, of course, what can our DNA sequence tell us about our evolutionary history?

These and other important biological questions can be tackled with bioinformatics: essentially, by comparing DNA or protein sequences – for example, by comparing newly discovered sequences with sequences for which we already have a lot of information (perhaps they have a similar function?) or by comparing similar sequences in different species.

REVIEW
When we think of bioinformatics, we probably imagine huge computers and sequencing machines, but the methods of this new science can be presented through simple classroom activities carried out with pencil and paper, as Cleopatra Kozlowski does in this article.

The author challenges us to build the family tree of humans and other primates on the basis of the genetic differences between short (fake) DNA sequences. The proposed activity can be used profitably (and enjoyably) in secondary schools to address some tricky biology topics, such as the use of molecular clocks in the study of evolution.

The article is aimed at science teachers, who will find useful comprehension exercises at the end of the text; students can also use the questions to deepen their understanding of the topic. The quoted web references provide further information and resources.

Giulia Realdon, Italy
Bioinformatics is, of course, normally done with the aid of a powerful computer. However, it is all too easy to let a computer do all the work without understanding the underlying principles involved. For this reason, these activities are designed to be done on paper, so that students understand how bioinformatic analysis works.
This article presents one of a group of four activities. The two introductory activities (‘Gene finding’ and ‘Mutations’) and the concluding activity (‘Mobile DNA’) can be downloaded from the website of the European Learning Laboratory for the Life Sciences (ELLS) [w1]. All the tables required for students to complete this activity, together with the step-by-step procedure and answers to the comprehension questions, can be downloaded from the Science in School website [w2].
Constructing a phylogenetic tree

The accumulation of mutations causes DNA sequences to change over generations. The following activity demonstrates how this can be used to deduce evolutionary relationships between organisms. It takes about 90 min and requires nothing but a pen and the tables, which can be downloaded from the Science in School website [w2].
Introduction
Think about how you would classify diverse animals. Traditionally, physical differences between organisms were used to deduce evolutionary relationships between them: for example, whether an organism has a backbone, or whether it has wings. This may cause problems, however. For example, birds, bats and insects all have wings, but are they closely related? How do you measure how recently the organisms diverged from a common ancestor?
We know from DNA sequencing studies that DNA mutations occur randomly at a very slow rate and are passed from parents to offspring. Thus, if you assume that all organisms have a common ancestor, you can use the differences in homologous sequences to measure how long it has been since the organisms diverged. In other words, the longer the time since two species diverged from a common ancestor, the more different their DNA sequences will be.
Figure 1: The Indo-European language tree. Note that although Indian, Germanic, Romance and many other European languages belong to this family, Finnish, Estonian and Hungarian do not: they belong to the Uralic language group. Data source: http://www.linguatics.com/indoeuropean_languages.htm

Homologous sequences are defined as those sequences in two organisms that have a common origin. In reality, we have no proof that any two sequences are homologous (we were not there to watch the DNA changing over time), but if they are sufficiently similar, we often assume that they are ‘homologues’. To know how similar two sequences are, you need to align them correctly (but this is not part of this activity).
Note that different regions of the DNA – coding and non-coding regions – evolve at different speeds. In general, coding regions evolve more slowly, because a mutation that changes a protein is generally more costly to the organism: the organism is less likely to survive and leave offspring. This is discussed in the ‘Mobile DNA’ activity.

To illustrate the concept of homology, you can use the example of philology – the study of the evolution of languages. In fact, there are many parallels between the methods used to study the evolution of languages and of organisms.

Using the differences between fragments of DNA sequences is a bit like comparing a word that means the same thing in different languages, to see how closely those languages are related.

Table 1: The word for ‘cat’ in various languages (note that Basque, Estonian and Finnish are not Indo-European)
Armenian gatz
Basque katu
Dutch kat
English cat
Estonian kass
Finnish kissa
Icelandic kottur
Italian gatto
Norwegian katt
Polish kot
Portuguese gato
Russian kot
Spanish gato
Swedish katt

You can see that the words for ‘cat’ in Italian, Spanish and Portuguese are almost the same: gatto, gato and gato. In both Swedish and Norwegian, the word is ‘katt’, but in Finnish it is different: ‘kissa’. Although Finland, like Sweden and Norway, is a Nordic country, the Finnish word for ‘cat’ is more similar to the Estonian word, ‘kass’. In fact, the two languages are closely related. So you can learn a little about language relationships by studying how words have changed over time.

Haeckel’s tree of life from The Evolution of Man (1879). Public domain image; image source: Wikimedia Commons
Constructing a phylogenetic tree of primates
In this activity, we will construct a phylogenetic tree using five homologous DNA sequences from primates. Because the sequences have been made up, we cannot deduce any real estimates of genetic distance; creating a meaningful phylogenetic tree from real data would require far longer sequences. Nonetheless, the fictional sequences (in Table 2) have been chosen to give a reasonably accurate picture of primate relationships.
Note: all the tables required for students to complete this activity can be downloaded from the Science in School website [w2].

1. Count the number of differences between each pair of sequences, and record it in Table 4. This is easy to do if you compare each sequence side by side. For example, Neanderthals and humans differ at three nucleotides in the sequence (Table 3a), whereas chimpanzees and gorillas differ at 11 positions (Table 3b).
Comparison tables for all the pairs of species, and the completed table of sequence differences (Table 4), can be downloaded from the Science in School website [w2].
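Once students have counted by hand, teachers who want to check the completed Table 4 could let a short program do the same counting. The Python sketch below is our own illustration, not part of the original activity: it assumes the sequences are already aligned and of equal length, as in Table 2, and the names `SEQUENCES`, `count_differences` and `difference_table` are hypothetical.

```python
# Sequences copied from Table 2 (fictional, 46 nucleotides each, already aligned).
SEQUENCES = {
    "Neanderthal": "TGGTCCTGCAGTCCTCTCCTGGCGCCCCGGGCGCGAGCGGTTGTCC",
    "Human":       "TGGTCCTGCTGTCCTCTCCTGGCGCCCTGGGCGCGAGCGGATGTCC",
    "Chimpanzee":  "TGATCCTGCAGTCCTCTTCTGGCGCCCTGGGCGCGTGCGGTTGTCC",
    "Gorilla":     "TGGACCTGCAGTCATCTTCTGCCCGCCCGAGCGCTTGCCGATGTCC",
    "Orangutan":   "ACAACCTGCACTCCTATTCTGCCGAGCCGGGCGCGTGGCAAAGTCC",
}


def count_differences(a, b):
    """Count the positions at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def difference_table(seqs):
    """Return a symmetric table of pairwise differences (the full Table 4)."""
    names = list(seqs)
    return {(p, q): count_differences(seqs[p], seqs[q]) for p in names for q in names}


if __name__ == "__main__":
    table = difference_table(SEQUENCES)
    names = list(SEQUENCES)
    # Header row uses the first five letters of each name to keep columns narrow.
    print("".ljust(12) + "".join(name[:5].rjust(7) for name in names))
    for p in names:
        print(p.ljust(12) + "".join(str(table[p, q]).rjust(7) for q in names))
```

The comparison is position by position, which is only meaningful because the sequences in this activity need no alignment; real sequences with insertions or deletions would have to be aligned first.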
The number of nucleotide differences between two sequences divided by the total number of nucleotides in each sequence (in this case, 46) gives the proportional distance between the two sequences.

2. Consider the two species with the most similar sequences: Neanderthal and human. In Table 5, record the number of nucleotide differences (3) and the proportional difference (3/46 = 0.065).
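The proportional distance is simply the number of differences divided by the sequence length. A minimal Python sketch, assuming two aligned sequences of equal length (the function name `proportional_distance` is our own, not from the activity):

```python
# Neanderthal and human sequences, copied from Table 2 (46 nucleotides each).
NEANDERTHAL = "TGGTCCTGCAGTCCTCTCCTGGCGCCCCGGGCGCGAGCGGTTGTCC"
HUMAN = "TGGTCCTGCTGTCCTCTCCTGGCGCCCTGGGCGCGAGCGGATGTCC"


def proportional_distance(a, b):
    """Number of differing positions divided by the sequence length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for x, y in zip(a, b) if x != y)
    return differences / len(a)


p = proportional_distance(NEANDERTHAL, HUMAN)
print(f"{p:.3f}")  # 3/46, printed as 0.065
```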
The ‘average sequence’ of two species is assumed to be their ancestor. In this exercise, we do not directly calculate the average sequence of, for example, Neanderthals and humans, but rather the evolutionary distance between the Neanderthal/human ancestor and all the other primates in the group.
Table 4: Sequence differences between primates

               Neanderthal   Human   Chimpanzee   Gorilla   Orangutan
Neanderthal         0           3
Human               3           0
Chimpanzee                               0           11
Gorilla                                 11            0
Orangutan                                                        0
Table 5: Evolutionary distances between primate ancestors and primates

                                                           Differences   Proportional difference
Neanderthal and human                                           3          3/46 = 0.065
Neanderthal / human and chimpanzee
Neanderthal / human / chimpanzee and gorilla
Neanderthal / human / chimpanzee / gorilla and orangutan
Table 6a: Sequence differences between the Neanderthal/human ancestor and other primates

                      Neanderthal/human    Chimpanzee         Gorilla             Orangutan
Neanderthal/human            0             (4+5)/2 = 4.5      (11+12)/2 = 11.5
Chimpanzee             (4+5)/2 = 4.5             0
Gorilla                (11+12)/2 = 11.5                              0
Orangutan                                                                              0
Table 3a: A comparison of Neanderthal and human sequences
Neanderthal TGGTCCTGCAGTCCTCTCCTGGCGCCCCGGGCGCGAGCGGTTGTCC
Human TGGTCCTGCTGTCCTCTCCTGGCGCCCTGGGCGCGAGCGGATGTCC
Table 3b: A comparison of chimpanzee and gorilla sequences
Chimpanzee TGATCCTGCAGTCCTCTTCTGGCGCCCTGGGCGCGTGCGGTTGTCC
Gorilla TGGACCTGCAGTCATCTTCTGCCCGCCCGAGCGCTTGCCGATGTCC
Table 2: Five DNA sequences from primates
Primate Sequence
Neanderthal (n) TGGTCCTGCAGTCCTCTCCTGGCGCCCCGGGCGCGAGCGGTTGTCC
Human (h) TGGTCCTGCTGTCCTCTCCTGGCGCCCTGGGCGCGAGCGGATGTCC
Chimpanzee (c) TGATCCTGCAGTCCTCTTCTGGCGCCCTGGGCGCGTGCGGTTGTCC
Gorilla (g) TGGACCTGCAGTCATCTTCTGCCCGCCCGAGCGCTTGCCGATGTCC
Orangutan (o) ACAACCTGCACTCCTATTCTGCCGAGCCGGGCGCGTGGCAAAGTCC
3. Calculate the distance between the average sequence of the Neanderthals and humans and each of the other primate species, and enter the data in Table 6a.
There are four differences between Neanderthal and chimpanzee, and five differences between human and chimpanzee. Thus the average distance between Neanderthal/human and chimpanzee is 4.5.
There are 11 differences between Neanderthal and gorilla, and 12 differences between human and gorilla. Thus the average distance between Neanderthal/human and gorilla is 11.5.

4. As before, these distances can be turned into proportional differences by dividing by the number of nucleotides in each sequence (46). Calculate the proportional distances between the average sequence of the Neanderthals/humans and the other primate species. Enter the figures in Table 5. For chimpanzees, the proportional distance from the Neanderthal/human ancestor is 4.5/46 = 0.098.
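Steps 3 and 4 amount to averaging two counts and then dividing by 46. A short Python sketch using the Table 4 values quoted above; the dictionary `DIFFERENCES` and the function `distance_from_nh_ancestor` are illustrative names, not part of the activity:

```python
LENGTH = 46  # nucleotides per sequence (Table 2)

# Pairwise differences from Table 4 that involve Neanderthals or humans.
DIFFERENCES = {
    ("Neanderthal", "Chimpanzee"): 4,
    ("Human", "Chimpanzee"): 5,
    ("Neanderthal", "Gorilla"): 11,
    ("Human", "Gorilla"): 12,
}


def distance_from_nh_ancestor(species):
    """Average the Neanderthal and human differences to `species` (step 3)."""
    return (DIFFERENCES["Neanderthal", species] + DIFFERENCES["Human", species]) / 2


for species in ("Chimpanzee", "Gorilla"):
    d = distance_from_nh_ancestor(species)
    # Step 4: divide by the sequence length to get a proportional difference.
    print(f"{species}: {d} differences, proportion {d / LENGTH:.3f}")
```

For the chimpanzee this prints 4.5 differences and a proportion of 0.098, the value to enter in Table 5.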
Using Table 5, you can begin to construct the evolutionary tree.

5. Connect Neanderthals and humans with a line. The branch length should correspond to how long it took for humans and Neanderthals to diverge from their common ancestor. Let us assume that it would take 20 million years for every single nucleotide in this particular DNA sequence to change. Thus for the DNA sequence to change by 0.065, it would take 0.065 * 20 million = 1.3 million years. The branch should, therefore, measure 1.3 million years on the time scale (see Figure 2).
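Under the activity's assumption of 20 million years for every nucleotide to change, converting a proportional difference into a branch length is a single multiplication. A sketch in Python (the constant and function names are hypothetical):

```python
# The activity's assumption: every nucleotide in this sequence would take
# 20 million years to change, so a proportional difference p corresponds to
# p * 20 million years.
MILLION_YEARS_PER_FULL_CHANGE = 20


def branch_length_million_years(proportional_difference):
    """Convert a proportional difference into millions of years."""
    return proportional_difference * MILLION_YEARS_PER_FULL_CHANGE


print(f"{branch_length_million_years(0.065):.1f} million years")  # 1.3 million years
```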
6. To calculate how long ago the ancestor of chimpanzees diverged from the ancestor of humans (the branch length), add up the proportional differences in Table 5. Remember that the proportional distance between the Neanderthal/human ancestor and the chimpanzee was 0.098. Thus the time since chimpanzees, humans and Neanderthals diverged from a common ancestor is:

(0.065 + 0.098) * 20 million = 0.163 * 20 million = 3.26 million, i.e. about 3.3 million years ago.
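Step 6 adds up the proportional differences in Table 5 before converting them to time, so each successive divergence date is a running total. A sketch using Python's `itertools.accumulate`, assuming the Table 5 proportions are listed in the order in which the species were joined (`divergence_times` is an illustrative name):

```python
from itertools import accumulate

MILLION_YEARS_PER_FULL_CHANGE = 20  # the activity's assumption


def divergence_times(proportions):
    """Running totals of Table 5 proportions, converted to million years (step 6)."""
    return [total * MILLION_YEARS_PER_FULL_CHANGE for total in accumulate(proportions)]


# First two rows of Table 5: Neanderthal/human, then the chimpanzee.
print([round(t, 2) for t in divergence_times([0.065, 0.098])])  # [1.3, 3.26]
```

With only the first two rows filled in, this reproduces the 1.3 and 3.26 million years of steps 5 and 6; adding the gorilla and orangutan rows extends the list.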
7. Continue the calculations. Repeat steps 3 to 6 to calculate how long ago the Neanderthal/human/chimpanzee ancestor diverged from the gorilla and from the orangutan. Then calculate how long ago the Neanderthal/human/chimpanzee/gorilla ancestor diverged from the orangutan. Enter the results in Table 5.
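Steps 3 to 7 can be combined into one loop that joins the species in the order of Table 5. This is a sketch under stated assumptions rather than the authors’ procedure: it assumes that, after each join, the new group's distance to each remaining species is the plain average of its two parts' distances (the same averaging as in step 3, which is the rule used by the WPGMA clustering method), and it keeps the activity's fixed joining order instead of searching for the closest pair. All names are our own. Treat its output as a cross-check only; the step-by-step procedure [w2] remains the reference.

```python
LENGTH = 46                         # nucleotides per sequence (Table 2)
MILLION_YEARS_PER_FULL_CHANGE = 20  # the activity's assumption

# Table 4: pairwise differences, keyed by unordered pairs of names.
DIFFERENCES = {
    frozenset({"Neanderthal", "Human"}): 3,
    frozenset({"Neanderthal", "Chimpanzee"}): 4,
    frozenset({"Neanderthal", "Gorilla"}): 11,
    frozenset({"Neanderthal", "Orangutan"}): 16,
    frozenset({"Human", "Chimpanzee"}): 5,
    frozenset({"Human", "Gorilla"}): 12,
    frozenset({"Human", "Orangutan"}): 17,
    frozenset({"Chimpanzee", "Gorilla"}): 11,
    frozenset({"Chimpanzee", "Orangutan"}): 14,
    frozenset({"Gorilla", "Orangutan"}): 14,
}


def build_table5(order):
    """Join species one at a time in `order` and fill in the rows of Table 5.

    Assumption: a newly joined group's distance to each remaining species is
    the average of its two parts' distances (WPGMA-style averaging).
    Returns rows of (label, differences, proportion, million years ago).
    """
    dist = dict(DIFFERENCES)
    group, remaining = order[0], list(order[1:])
    rows, running_total = [], 0.0
    while remaining:
        species = remaining.pop(0)
        d = dist[frozenset({group, species})]
        proportion = d / LENGTH
        running_total += proportion  # step 6: add up the proportions
        rows.append((f"{group} and {species}", d, proportion,
                     running_total * MILLION_YEARS_PER_FULL_CHANGE))
        merged = f"{group} / {species}"
        for other in remaining:  # step 3: average the two parts' distances
            dist[frozenset({merged, other})] = (
                dist[frozenset({group, other})] + dist[frozenset({species, other})]
            ) / 2
        group = merged
    return rows


ORDER = ["Neanderthal", "Human", "Chimpanzee", "Gorilla", "Orangutan"]
for label, d, proportion, mya in build_table5(ORDER):
    print(f"{label:<55} {d:>7.3f} {proportion:>7.3f} {mya:>6.1f} million years ago")
```

The first two rows agree with steps 2 to 6 (3 differences, then 4.5); the gorilla and orangutan rows depend on the averaging assumption above.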
If you need help, you can download the step-by-step procedure from the Science in School website [w2].

8. Use the completed Table 5 to finish the phylogenetic tree, as shown in Figure 3.
Questions

Below are some questions you could use to test your students’ understanding of the activity. Answers can be downloaded from the Science in School website [w2].
1. In your phylogenetic tree, how many years ago did gorillas and humans diverge from a common ancestor? What about orangutans and humans?
2. Can you find out if these and the other estimates in your tree are correct?
3. Why may phylogenetic trees constructed using different regions of the DNA look different?
4. What regions of DNA should you use to compare organisms that are closely related?
5. What kind of genes should you use to compare organisms that are evolutionarily distant from each other?
6. What should you do if you are comparing two sequences, but one of them has gaps due to deletions (or insertions in the other sequence)?
7. Can you think of reasons why this method of simply comparing the number of differences between the nucleotides may not work if you are comparing organisms that are very different? Remember that we are assuming it takes 20 million years for every nucleotide in a sequence to mutate.
Figure 2: Incomplete phylogenetic tree, showing Neanderthal, human and chimpanzee (branch labels 0.065 and 0.098; time axis in million years, from 12.5 to 0). Image courtesy of Nicola Graf; images courtesy of room101, Tempelmeister / pixelio.de
8. Can you think of other reasons why it may not be so good to use this method to calculate evolutionary distances? What simplifications have we made?
9. Can you think of reasons why, if you are studying more distant organisms, it is better to compare amino acid sequences than DNA sequences?

In this exercise, we have concentrated on working out when the five primate species diverged from each other (the scale of the tree). Often, however, we do not even know the order in which the species diverged from one another (the shape of the tree). How do we know, for example, that humans and chimpanzees are more closely related than gorillas and chimpanzees are? If the latter were true, how would the sequence differences (Table 4) differ?
Acknowledgement

This activity was developed in a special collaboration between the European Learning Laboratory for the Life Sciences (ELLS) [w1] and the European Molecular Biology Laboratory’s E-STAR Fellows to develop teaching resources for schools. Cleopatra Kozlowski was supported by an E-STAR fellowship funded by the European Commission’s Framework Programme 6 Marie Curie Host Fellowship for Early Stage Research Training, under contract number MEST-CT-2004-504640.
Figure 3: Complete phylogenetic tree, showing Neanderthal, human, chimpanzee, gorilla and orangutan (branch labels 0.065, 0.098, 0.245 and 0.317; time axis in million years, from 12.5 to 0)

Web references

w1 – The European Learning Laboratory for the Life Sciences (ELLS) is an education facility which brings secondary-school teachers into the research lab for a unique hands-on encounter with state-of-the-art molecular biology techniques. ELLS also gives scientists a chance to work with teachers, helping to bridge the widening gap between research and schools. The activity described in this article was designed as a teaching resource for ELLS’ professional development programme for European teachers. For more information about ELLS, see: www.embl.org/ells

w2 – Download all the tables required for students to complete this activity, together with the step-by-step procedure and answers to the comprehension questions, from the Science in School website: www.scienceinschool.org/2010/issue17/bioinformatics#resources
Resources

The website of the US National Center for Biotechnology Information (NCBI) offers an introduction to phylogenetics. See: www.ncbi.nlm.nih.gov/About/primer/phylo.html

To learn more about using protein sequences to establish phylogenetic trees, see: http://users.rcn.com/jkimball.ma.ultranet/BiologyPages or use the direct link: http://tinyurl.com/2wqp7nq

To learn about how a group of scientists recreated the new tree of life, tracing the course of evolution, see:
Hodge R (2006) A new tree of life. Science in School 2: 17-19. www.scienceinschool.org/2006/issue2/tree

The Interactive Tree Of Life is an online tool for the display and manipulation of phylogenetic trees. To learn more, see: http://itol.embl.de

To browse other evolution-related articles in Science in School, see: www.scienceinschool.org/evolution
Image courtesy of Nicola Graf; images courtesy of room101, Tempelmeister, Stephan Franz Xaver Dietl, Stephan Hahnel / pixelio.de; image courtesy of Stephan Franz Xaver Dietl / pixelio.de