BMI/CS 776 Lecture 2
Colin Dewey
2007.01.25
Today
• The biology of nucleic acids
• Trees
• Homology forests
What is life?
• living cell: membrane with genetic material
• organism: one or more connected cells
membrane chromosome
genome
Central dogma
• Complicated regulation at each step
• Regulatory cycles
DNA -> RNA (transcription), RNA -> Protein (translation)
DNA Composition
Deoxyadenosine monophosphate
polynucleotide
DNA bases
DNA structure
The importance of pairing
• Complementation -> Replication
“the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material” - Watson & Crick (1953)
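A minimal Python sketch of why complementation suggests a copying mechanism: each strand determines its partner, so complementing twice recovers the original. The function name `complement_strand` and the convention of writing every strand 5'->3' are assumptions made for illustration.

```python
# Watson-Crick pairing rules for the four DNA bases.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement_strand(template):
    """Return the strand that pairs with `template`, both written 5'->3'."""
    # Paired strands are antiparallel, so read the template in reverse.
    return "".join(PAIR[base] for base in reversed(template))

original = "ATGCCG"
partner = complement_strand(original)    # strand synthesized on the template
recovered = complement_strand(partner)   # copying the copy restores the original
```

Here `partner` plays the role of the newly synthesized strand, and `recovered == original` is the property that makes pairing sufficient for replication.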
DNA Replication
RNA
• Differences
  • Ribose sugar, not deoxyribose
  • Uracil (U) instead of thymine
  • Single stranded
• Enzymatic activity -> “RNA World”
[Figure: adenosine monophosphate]
Important Enzymes
• RNA Polymerase: RNA->RNA, DNA->RNA
• Primase: DNA->RNA
• Reverse transcriptase: RNA->DNA
• Telomerase: RNA->DNA
Mutation
• Substitutions
• Insertions
• Deletions
• Rearrangements
Base mispairing
• Base not paired with complement
• Causes: replication error, radiation damage, chemical mutagen
• CpG: highly mutable in animals
• Causes substitution mutations
Replication Slippage
• Strand separation during replication
• Re-pairing at wrong place
• Repair results in insertion or deletion
• Common in repetitive regions
Recombination - start
• Interaction of highly similar regions of DNA
• Formation of Holliday junction
• Junction migration
Recombination - end
• Junction resolved
• Possible outcomes
  • No crossing over
  • Crossing over
  • Gene conversion
  • No gene conversion
Recombination results
• Many types of mutations can occur due to recombination:
  • inversion
  • insertion
  • deletion
  • chromosome fissions/fusions
Chromosome breaks/joins
• Breakage
  • Double-stranded cut
  • Causes: radiation damage, endonucleases
• Joining
  • Ligase
• Mutations: inversions, transpositions, fusions, fissions
Mutation fixation
• Whether or not a mutation becomes frequent in population depends on natural selection and random drift
• Multi-cellular organisms: mutations must occur in germline to have evolutionary effect
• Key distinction between probability of mutation and probability of fixation
Mutation summary
class          causes
substitution   base mispairing
insertion      base mispairing, recombination
deletion       base mispairing, recombination
rearrangement  recombination, chromosome breaks/joins
Homology
• Common ancestry
• Characters
  • morphological
  • molecular
• Richard Owen: “the same organ in different animals under every variety of form and function”
Nucleotide homology
• What is an evolutionary character?
  • Position in DNA or RNA
• Single-stranded characters
  • Properties: position, state
  • x is a “copy” of y if x was initially base-paired with y during template-dependent synthesis
Double-stranded case
• double-stranded character x
  • comprised of x+ and x- (single-stranded)
  • properties: position, state, orientation (+ or -)
• x (ds) is a copy of y (ss) if x+ or x- is a copy of y
• y (ss) is a copy of x (ds) if y is a copy of x+ or x-
• x (ds) is a copy of y (ds) if x+ or x- is a copy of y+ or y-
Mutation
• Changes character states or positions, not relationships
• Repair after damage: can create new relationships if template-driven
Nucleotide homology
• x is “derived” from y if there exists a sequence x_1, x_2, ..., x_T s.t. y = x_1, x = x_T, and x_{i+1} is a copy of x_i for each i
• x is “homologous” to y if there existed a character z s.t. both x and y are derived from z
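The two definitions above can be sketched in Python as follows. This is a minimal illustration, assuming the copy relation is stored in a hypothetical dictionary `copied_from` that maps each character to the single character it is a copy of, and that the relation has no cycles. All character names are made up.

```python
def ancestry(x, copied_from):
    """Return x followed by every character that x is derived from."""
    chain = [x]
    while chain[-1] in copied_from:
        chain.append(copied_from[chain[-1]])
    return chain

def is_derived(x, y, copied_from):
    """x is derived from y if a chain of copies leads from y to x (T = 1 allows x == y)."""
    return y in ancestry(x, copied_from)

def is_homologous(x, y, copied_from):
    """x and y are homologous if both are derived from some character z."""
    return bool(set(ancestry(x, copied_from)) & set(ancestry(y, copied_from)))

# Hypothetical copy history: z was copied to a1 and b1, a1 was copied to a2.
copied_from = {"a1": "z", "a2": "a1", "b1": "z", "c1": "w"}
```

With this history, `a2` is derived from `z`, `a2` and `b1` are homologous through `z`, and `c1` is homologous to neither because its chain ends at `w`.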
Refinements of homology
• Not all homology relationships are equal
• Fitch: orthology, paralogy, xenology
• Each has different biological implications
Xenology
• Result of horizontal transfer
[Figure: horizontal transfer example with characters Xanc, XA1, XA2, XB, XC]
Refinements of Homology
[Figure: ancestor, species A, and species B with ancestral characters Xanc, Yanc, Zanc and descendants XA1, XA2, XB, YA, YB, ZA, ZB1, ZB2]
Fitch, 1970:
• Orthologous: diverged from LCA (last common ancestor) due to a speciation event
• Paralogous: diverged from LCA due to a duplication event
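Fitch's distinction can be sketched in Python by looking up the event at the last common ancestor. This is only an illustration: the tree encoding (`parent` and `event` dictionaries), the internal node name `XA`, and the placement of the duplication in species A are assumptions, since the figure itself is not reproduced here.

```python
def lineage(node, parent):
    """Return `node` followed by its successive ancestors up to the root."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def last_common_ancestor(x, y, parent):
    """Return the first node shared by the lineages of x and y."""
    seen = set(lineage(x, parent))
    for node in lineage(y, parent):
        if node in seen:
            return node
    raise ValueError("no common ancestor: characters are not homologous")

def relation(x, y, parent, event):
    """Name the relation by the event at the LCA of two distinct leaves."""
    kind = event[last_common_ancestor(x, y, parent)]
    return {"speciation": "orthologous", "duplication": "paralogous"}[kind]

# Assumed history: Xanc splits at a speciation; XA then duplicates in species A.
parent = {"XA": "Xanc", "XB": "Xanc", "XA1": "XA", "XA2": "XA"}
event = {"Xanc": "speciation", "XA": "duplication"}
```

Under these assumptions `relation("XA1", "XB", parent, event)` gives "orthologous" and `relation("XA1", "XA2", parent, event)` gives "paralogous". Xenology is not modeled in this sketch.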
Directed Duplications
• Duplication is directed if removing one of A or A’ from G’ does not give G
• Examples
  • retrotransposition
  • segmental duplications
• Evolutionary consequence: source more likely to retain ancestral role
[Figure: G and G’ with copies A and A’, labeled source and target]
Undirected Duplications
• Duplication is undirected if removal of A from G’ gives G and removal of A’ from G’ gives G
• Examples
  • tandem duplication
  • whole genome/chromosome duplication
• Evolutionary consequence: both copies under very similar evolutionary pressures
[Figure: two examples of G giving rise to G’ with copies A and A’]
Topoorthology
• Characters x and x’ are topoorthologous if they are orthologous and neither is derived from the target of a directed duplication since the time of the last common ancestor of x and x’
• “topo” = position: topoorthologs are more likely to have similar genomic contexts
Monotopoorthology
• Characters x and x’ are monotopoorthologous if they are topoorthologous and neither is derived from an undirected duplication since the time of the last common ancestor of x and x’.
• Only transitive subrelation of homology
• One-to-one relation
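The nested definitions (orthologous, topoorthologous, monotopoorthologous) can be sketched in Python as successive filters on the events since the last common ancestor. Everything about the encoding is an assumption for illustration: `origin` records the kind of event that produced each node, the internal nodes `XA` and `ZB` are hypothetical names, and the duplication placements are one history consistent with the character names used in this lecture.

```python
def lineage(node, parent):
    """Return `node` followed by its successive ancestors up to the root."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def classify(x, y, parent, origin):
    """Classify two distinct leaves, neither an ancestor of the other."""
    up_x, up_y = lineage(x, parent), lineage(y, parent)
    common = next((n for n in up_x if n in up_y), None)
    if common is None:
        return "not homologous"
    ix, iy = up_x.index(common), up_y.index(common)
    if ix == 0 or iy == 0:
        raise ValueError("expected two distinct leaves")
    # The event at the last common ancestor produced its two child lineages.
    if origin[up_x[ix - 1]] != "speciation" or origin[up_y[iy - 1]] != "speciation":
        return "paralogous"
    # Events on both lineages since the last common ancestor.
    kinds = {origin[n] for n in up_x[:ix] + up_y[:iy]}
    if "directed_target" in kinds:
        return "orthologous"          # orthologous but not topoorthologous
    if "undirected" in kinds:
        return "topoorthologous"      # topoorthologous but not monotopoorthologous
    return "monotopoorthologous"

parent = {"XA": "Xanc", "XB": "Xanc", "XA1": "XA", "XA2": "XA",
          "YA": "Yanc", "YB": "Yanc",
          "ZA": "Zanc", "ZB": "Zanc", "ZB1": "ZB", "ZB2": "ZB"}
origin = {"XA": "speciation", "XB": "speciation",
          "XA1": "undirected", "XA2": "undirected",
          "YA": "speciation", "YB": "speciation",
          "ZA": "speciation", "ZB": "speciation",
          "ZB1": "directed_source", "ZB2": "directed_target"}
```

With this assumed history, (YA, YB) and (ZA, ZB1) come out monotopoorthologous, (XA1, XB) is topoorthologous only, and (ZA, ZB2) is orthologous only because ZB2 is the target of the directed duplication.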
Refinements of Orthology
[Figure: ancestor, species A, and species B with characters Xanc, Yanc, Zanc, XA1, XA2, XB, YA, YB, ZA, ZB1, ZB2; a directed duplication (source, target) and an undirected duplication are marked]
Topoorthologs: (XA1, XB), (XA2, XB)
Monotopoorthologs: (YA, YB), (ZA, ZB1)
Trees in biology
Darwin’s first tree (1837)
Graph basics
• Graph: vertices (V), edges (E)
• Path, Cycle, Connected
• Degree, Leaf
• Forest, Tree, Binary Tree
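A short Python sketch of these terms, assuming an undirected graph given as a vertex list and an edge list (the function names are illustrative, not from the lecture). It uses the fact that a graph is a forest exactly when |E| = |V| minus the number of connected components, and that a tree is a connected forest.

```python
def count_components(vertices, edges):
    """Count connected components with a small union-find."""
    root = {v: v for v in vertices}

    def find(v):
        while root[v] != v:
            root[v] = root[root[v]]   # path halving
            v = root[v]
        return v

    count = len(root)
    for u, w in edges:
        ru, rw = find(u), find(w)
        if ru != rw:
            root[ru] = rw
            count -= 1
    return count

def is_forest(vertices, edges):
    """A graph has no cycle iff |E| = |V| - (number of components)."""
    return len(edges) == len(vertices) - count_components(vertices, edges)

def is_tree(vertices, edges):
    """A tree is a connected forest."""
    return is_forest(vertices, edges) and count_components(vertices, edges) == 1

def leaves(vertices, edges):
    """Leaves are the vertices of degree one."""
    degree = {v: 0 for v in vertices}
    for u, w in edges:
        degree[u] += 1
        degree[w] += 1
    return {v for v, d in degree.items() if d == 1}

V = ["a", "b", "c", "d"]
E = [("a", "b"), ("b", "c"), ("b", "d")]
```

The example graph is a star centered on `b`: it is a tree with leaves `a`, `c`, and `d`. Adding the edge `("c", "d")` creates a cycle, and adding an isolated vertex leaves a forest that is no longer a tree.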
Phylogenetic X-trees
• Phylogenetic X-tree
• Weighted Phylogenetic X-tree
• Binary Phylogenetic X-tree
• Rooted Phylogenetic X-tree
• Binary Rooted Phylogenetic X-tree
Dissimilarity Maps
• Dissimilarity maps
  • Connection between dissimilarity maps and weighted phylogenetic X-trees
• Metrics
  • Tree metrics <-> weighted phylogenetic X-trees
• Four point condition <-> tree metric
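A minimal Python check of the four point condition, assuming its usual statement: for all i, j, k, l (not necessarily distinct), d(i,j) + d(k,l) <= max(d(i,k) + d(j,l), d(i,l) + d(j,k)). The helper names and the example distances are assumptions for illustration. The example comes from a hypothetical four-leaf tree with pendant edges a=1, b=2, c=3, d=4 and an internal edge of length 5 separating {a, b} from {c, d}.

```python
from itertools import product

def symmetric_map(pairs, labels):
    """Build a symmetric dissimilarity map with a zero diagonal."""
    d = {x: {y: 0.0 for y in labels} for x in labels}
    for (x, y), value in pairs.items():
        d[x][y] = d[y][x] = float(value)
    return d

def satisfies_four_point(d, labels, tol=1e-9):
    """Check d(i,j) + d(k,l) <= max(d(i,k) + d(j,l), d(i,l) + d(j,k)) for all i, j, k, l."""
    for i, j, k, l in product(labels, repeat=4):
        lhs = d[i][j] + d[k][l]
        rhs = max(d[i][k] + d[j][l], d[i][l] + d[j][k])
        if lhs > rhs + tol:
            return False
    return True

labels = ["a", "b", "c", "d"]
# Path lengths in the hypothetical tree described above.
tree_like = symmetric_map({("a", "b"): 3, ("c", "d"): 7, ("a", "c"): 9,
                           ("a", "d"): 10, ("b", "c"): 10, ("b", "d"): 11}, labels)
# Same map with d(a, c) inflated, so it no longer comes from a tree.
perturbed = symmetric_map({("a", "b"): 3, ("c", "d"): 7, ("a", "c"): 12,
                           ("a", "d"): 10, ("b", "c"): 10, ("b", "d"): 11}, labels)
```

For `tree_like` the three pairwise sums are 10, 20, and 20, so the two largest agree and the check passes. For `perturbed` they are 10, 23, and 20, and the check fails. The brute-force loop is quartic in the number of labels, which is fine for small examples.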
Is there a species tree?
• Not really!
• Untree-like mechanisms
  • Hybrid speciation
  • Horizontal transfer (xenology)
  • Incomplete lineage sorting
Incomplete lineage sorting
Pollard et al., 2006, PLoS Genetics
Nucleotide trees
• Relationships between nucleotides are trees!
  • nucleotide position has one parent
• Exceptions (mispairing)
  • heteroduplex DNA from recombination
  • replication slippage
Homology forest
time to common ancestry of a set of related genes [Hartl and Clark, 1997].
Exceptions to the nucleotide tree model occur due to mispairing events in which one nucleotide is paired with neither its parent nor its child (Section 1.1). A double-stranded position with a mispairing of this type thus has two parents. Recombination and replication slippage are two mechanisms that can cause such mispairings. However, these exceptions are not important from an evolutionary standpoint. In the case of recombination, we may assume that either no heteroduplex DNA (which contains these mispairings) is formed, or that heteroduplex DNA is always repaired by excising one strand of the region, rather than just those positions that have incorrect complementary bases. Replication slippage can be similarly explained away by assuming excision of one strand of the mispaired region, and replacement via the opposite strand. Although these assumptions are violated, the sequences that result from these assumed scenarios are indistinguishable from those that occur naturally.
Homology forests
We now formally define the notion of a homology forest, which represents all evolutionary relationships between nucleotide positions. We use the notation $\sigma^a_i$ to denote the $i$th element of a sequence $\sigma^a = \sigma^a_1, \ldots, \sigma^a_n$ of length $n$. Let $\bar{\sigma}^a = \bar{\sigma}^a_1, \ldots, \bar{\sigma}^a_n$ denote the complement of $\sigma^a$, where $\bar{\sigma}^a_i$ is the character with state complementary to $\sigma^a_i$. That is, $\sigma^a_i$ and $\bar{\sigma}^a_i$ are double-stranded DNA characters with identical position and orientations of $+$ and $-$, respectively. By a set of sequence characters $S = \{\sigma^1, \ldots, \sigma^k\}$ we mean the set of $n_1 + n_2 + \cdots + n_k$ sequence characters that form the sequences $\sigma^1, \sigma^2, \ldots, \sigma^k$ of lengths $n_1, n_2, \ldots, n_k$, respectively. Lastly, for $S = \{\sigma^1, \ldots, \sigma^k\}$, let $\bar{S} = \{\bar{\sigma}^1, \ldots, \bar{\sigma}^k\}$.
Definition 1.15 Given a set of sequence characters $S$, a homology forest, $F$, is a forest with leaves labeled by $S \cup \bar{S}$ and with at most one of $\sigma^a_i$ and $\bar{\sigma}^a_i$ used as a leaf label, $\forall \sigma^a_i \in S$. A phylogenetic X-tree in $F$ represents the evolutionary history of a set of homologous sequence characters, $X$.
Given a set of sequences, the multiple alignment problem consists of constructing the homology forest for those sequences. In order to do so we need to model the types of events described in Section 1.
Notes
Multiple alignment
• Given set S of sequences
• Multiple alignment = construction of homology forest on S
• Further: annotated homology forest
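One way to read the second bullet, sketched in Python under the assumption that the input is a conventional gapped alignment: each column lists the characters that sit in the same tree of the forest. The function name and the 0-based (sequence, position) labels are hypothetical. The sketch recovers only the leaf set of each tree, not its topology, strand orientation, or event annotations.

```python
def columns_to_leaf_sets(rows):
    """Map each alignment column to the (sequence, position) characters it aligns."""
    next_position = [0] * len(rows)   # next unread position in each sequence
    leaf_sets = []
    for column in zip(*rows):
        members = set()
        for seq, symbol in enumerate(column):
            if symbol != "-":         # a gap contributes no character
                members.add((seq, next_position[seq]))
                next_position[seq] += 1
        leaf_sets.append(members)
    return leaf_sets

# Toy alignment of three sequences: ACG, ATG, and ACTG.
rows = ["AC-G",
        "A-TG",
        "ACTG"]
leaf_sets = columns_to_leaf_sets(rows)
```

Each set in `leaf_sets` is the leaf label set X of one phylogenetic X-tree. For instance, the second column groups position 1 of the first sequence with position 1 of the third.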
Annotated Trees of Life
• Alignment trees are annotated with duplications (undirected or directed), speciations, and horizontal transfers
• Alignment trees:
[Figure: alignment trees over characters X1, W38, Y42, Y30, X3, Z15, Z67, X87, Z9, W2, Y4, X10, Z23; legend: speciation, directed duplication, undirected duplication, horizontal transfer]
Next time
• Topic: “Modeling nucleotide evolution”
• Reading: Text 8.1-8.3