Date post: | 01-Jun-2015 |
Category: |
Technology |
Upload: | jonathan-eisen |
TIGRTIGRTIGRTIGR
“Nothing in biology makes sense except in the light of evolution.”
T. H. Dobzhansky (1973)
Topics of Discussion
• Introduction to phylogenomics
• Uses of evolutionary analysis in genomics
  – Selection of species
  – Functional prediction
  – Gene duplication
  – Gene loss
  – Genome rearrangements
  – Lateral transfer
  – Uncultured species
  – Specialization
Phylogenomic Analysis
Phylogenomics involves combining evolutionary reconstructions of genes, proteins, pathways, and species with analysis of complete genome sequences.
Uses of Phylogenomics
• Selection of species
• Functional prediction
• Gene duplication
• Intragenomic movement
• Gene loss
• Lateral transfer
• Genome rearrangements
• Uncultured species
Strain Selection and Evolution
• Increasing phylogenetic representation
• Determining relatedness to model organism
• Understanding major evolutionary transitions
• Identifying taxa with unusual (high or low) rates of evolution
• Identifying source of DNA from uncultured species
• Species naming and type strains (e.g., see Ward et al. 2001)
Evolutionary Diversity Still Poorly Represented in Complete Genomes
[Figure: tree with Bacteria and Archaea labeled]
S. pombe Genome Analysis: Eukaryotes vs. Prokaryotes
[Figure: tree of Bacteria, Archaea, and Eukaryotes. Eukaryote labels: Giardia, Trichomonas, Naegleria, Trypanosoma, Euglena, Plasmodium, Tetrahymena, Phytophthora, Arabidopsis, Chlamydomonas, Dictyostelium, Humans, Fly, Worm, Encephalitozoon, S. cerevisiae, S. pombe]
Single vs. Multi-celled
[Figure: eukaryote tree. Taxon labels: Giardia, Trichomonas, Naegleria, Trypanosoma, Euglena, Plasmodium, Tetrahymena, Phytophthora, Arabidopsis, Chlamydomonas, Dictyostelium, Humans, Fly, Worm, Encephalitozoon, S. cerevisiae, S. pombe. Group labels: Plants, Fungi, Animals, Parabasalia, Diplomonads, Microsporidia, Dictyostelia, Heterokonts, Ciliates, Apicomplexa, Kinetoplastids, Euglenas, Acrasidae]
Predicting Function
• Identification of motifs
• Homology/similarity based methods
  – Highest hit, top hit, HMMs, threading
• Evolutionary methods
  – Phylogenetic trees
  – Ds/Dn
  – Phylogenetic profiles
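As a rough illustration of the phylogenetic-profile idea listed above, the sketch below compares presence/absence patterns of gene families across genomes and reports pairs with similar patterns. The genome names, family names, data, and the Jaccard threshold are all hypothetical; real analyses would draw profiles from homology searches across many genomes and would likely use a more careful statistic.

```python
from itertools import combinations

# Hypothetical presence/absence data: 1 = family detected in that genome.
# Genome and family names are invented for illustration only.
GENOMES = ["genome_A", "genome_B", "genome_C", "genome_D", "genome_E"]
PROFILES = {
    "family_1": (1, 0, 1, 1, 0),
    "family_2": (1, 0, 1, 1, 0),
    "family_3": (0, 1, 0, 0, 1),
    "family_4": (1, 1, 1, 0, 0),
}


def jaccard(p, q):
    """Jaccard similarity of two presence/absence vectors."""
    both = sum(1 for a, b in zip(p, q) if a and b)
    either = sum(1 for a, b in zip(p, q) if a or b)
    return both / either if either else 0.0


def similar_pairs(profiles, threshold=0.75):
    """Return (family, family, similarity) for pairs at or above the threshold."""
    hits = []
    for (name1, p1), (name2, p2) in combinations(sorted(profiles.items()), 2):
        score = jaccard(p1, p2)
        if score >= threshold:
            hits.append((name1, name2, score))
    return hits


if __name__ == "__main__":
    for name1, name2, score in similar_pairs(PROFILES):
        print(f"{name1} and {name2} share a profile (Jaccard = {score:.2f})")
```

Families with matching profiles are candidates for a functional link, but shared ancestry among the genomes sampled can produce the same signal, so a match is a hypothesis rather than a prediction.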
[Figure: phylogenetic tree of the MutS protein family, panels A-D, with sequences from Bacteria, Archaea, and Eukaryotes. Subfamily labels and the functions annotated on the final panel:]
MSH4: Segregation & Crossover
MSH5: Segregation & Crossover
MutS1: All MMR (Bacteria)
MSH1: MMR in Mitochondria
MSH3: MMR of Large Loops in Nucleus
MSH6: MMR of Mismatches and Small Loops in Nucleus
MSH2: All MMR in Nucleus
MutS2: no function labeled
rRNA and Uncultured Microbes
Evolutionary Rate Variation
Why Duplications Are Useful to Identify
• Allows division into orthologs and paralogs
• Improves functional predictions
• Helps identify mechanisms of duplication
• Can be used to study mutation processes in different parts of a genome
• Lineage-specific duplications may indicate species-specific adaptations
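One way to picture how a gene tree separates lineage-specific duplicates from other homologs is sketched below. It assumes a rooted gene tree written as nested tuples with leaves named "species|gene"; that encoding, the species names, and the example tree are hypothetical. A clade whose leaves all come from a single species is reported as a candidate lineage-specific duplication.

```python
def species_of(leaf):
    """Extract the species part of a 'species|gene' leaf label."""
    return leaf.split("|")[0]


def leaves(node):
    """Collect all leaf labels under a node (a leaf is a plain string)."""
    if isinstance(node, str):
        return [node]
    collected = []
    for child in node:
        collected.extend(leaves(child))
    return collected


def lineage_specific_clades(node):
    """Return maximal clades (as sorted leaf lists) drawn from a single species."""
    if isinstance(node, str):
        return []  # a single gene is not a duplication
    tips = leaves(node)
    if len({species_of(tip) for tip in tips}) == 1:
        return [sorted(tips)]
    found = []
    for child in node:
        found.extend(lineage_specific_clades(child))
    return found


# Hypothetical rooted gene tree for three species (spA, spB, spC).
TREE = (
    (("spA|g1", "spA|g2"), "spB|g1"),
    ("spC|g1", ("spA|g3", "spC|g2")),
)

if __name__ == "__main__":
    print(lineage_specific_clades(TREE))
```

In this toy tree only spA|g1 and spA|g2 group together to the exclusion of other species. The result depends on a reliable, correctly rooted tree; gene loss or poor sampling of related genomes can make an older duplication look lineage-specific.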
Lineage Specific Duplications in Wolbachia wMel
[Table with one column, Annotation. Distinct annotations, each listed once; many appear in multiple rows, most often "hypothetical protein" and "conserved hypothetical protein":]
ankyrin repeat domain protein
conserved domain protein
conserved hypothetical protein
conserved hypothetical protein, FRAMESHIFT
conserved hypothetical protein, POINT MUTATION
conserved hypothetical protein, degenerate
conserved hypothetical protein, interruption-C
conserved hypothetical protein, truncated
conserved hypothetical protein, truncation
DNA mismatch repair protein MutL (mutL)
DNA repair protein RadC, putative
DNA repair protein RadC, putative, truncation
DNA repair protein RadC, truncation
DnaJ domain protein
exopolysaccharide synthesis protein ExoD-related protein
HNH endonuclease family protein
hypothetical protein
major facilitator family transporter
membrane protein, putative
MutL family protein
Na+/H+ antiporter family protein
Na+/H+ antiporter, putative
permease, putative
portal protein, FRAMESHIFT
prophage LambdaW1, DNA methylase
prophage LambdaW1, terminase large subunit, putative
prophage LambdaW2, ankyrin repeat domain protein
prophage LambdaW2, baseplate assembly protein J, putative
prophage LambdaW2, baseplate assembly protein V, putative, FRAMESHIFT
prophage LambdaW2, baseplate assembly protein W, putative
prophage LambdaW2, minor tail protein Z, putative, FRAMESHIFT
prophage LambdaW2, site-specific recombinase, resolvase family
prophage LambdaW4, ankyrin repeat domain protein
prophage LambdaW4, DNA methylase
prophage LambdaW4, portal protein, FRAMESHIFT
prophage LambdaW4, terminase large subunit, putative
prophage LambdaW5, ankyrin repeat domain protein
prophage LambdaW5, baseplate assembly protein J, putative, FRAMESHIFT
prophage LambdaW5, baseplate assembly protein V, putative
prophage LambdaW5, baseplate assembly protein W, putative
prophage LambdaW5, minor tail protein Z, putative, degenerate, FRAMESHIFT
prophage LambdaW5, site-specific recombinase, resolvase family
regulatory protein RepA, putative
reverse transcriptase, putative
sodium/alanine symporter family protein
TenA/THI-4 family protein
transcriptional regulator
transcriptional regulator, putative
translation elongation factor Tu (tuf)
transposase, degenerate
transposase, IS4 family
transposase, IS5 family, interruption-N
transposase, IS5 family, truncation
transposase, putative, degenerate
type IV secretion system protein VirB4, putative
UDP-N-acetylglucosamine pyrophosphorylase-related protein
MutL Duplication in Wolbachia wMel
ORF01096 DNA mismatch repair protein MutL (mutL)
ORF00446 MutL family protein
Older Duplication of UVDE
[Figure: phylogenetic tree, scale bar 0.1. Leaf labels:]
Schizosaccharomyces pombe GP139
Neurospora crassa PIRS55262S552
Clostridium perfringens GP18145
Bacillus subtilis SPP45864YWJD
Bacillus cereus GP6759487embCAB
B BACAN 01914 UV endonuclease
Bacillus halodurans OMNINTL01BH
B BACAN 01459 UV endonuclease
Deinococcus radiodurans GP61167
Nostoc sp. PCC 7120 GP17130610d
X-files
Eisen et al. 2000. Genome Biology 1(6): 11.1-11.9
Also see Tillier and Collins. 2000. Nature Genetics 26(2): 195-7 and Suyama and Bork. 2001. Trends in Genetics 17: 10-13.
C. trachomatis vs C. pneumoniae Dot Plot
[Figure: dot plot of C. trachomatis MoPn against C. pneumoniae AR39, with Origin and Terminus marked]
Read et al. 2000
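The X-shaped pattern that gives the X-files their name can be illustrated with a small sketch. It assumes ortholog pairs are given as gene start coordinates measured from the replication origin in two circular genomes; the function names, tolerance, and example coordinates are hypothetical. Pairs near the main diagonal keep the same relative position, while pairs near the anti-diagonal sit at the mirror-image distance from the origin, as expected after inversions that are symmetric around the origin.

```python
def classify_pair(x, y, len_a, len_b, tol=0.05):
    """Classify one ortholog pair by where it falls in a dot plot.

    x, y are start coordinates measured from the origin in genomes A and B.
    Positions are compared as fractions of genome length.
    """
    frac_a = x / len_a
    frac_b = y / len_b
    if abs(frac_a - frac_b) <= tol:
        return "diagonal"        # same relative position (checked first)
    if abs(frac_a - (1.0 - frac_b)) <= tol:
        return "anti-diagonal"   # mirror image across the origin
    return "off-axis"


def x_pattern_summary(pairs, len_a, len_b, tol=0.05):
    """Count ortholog pairs in each dot-plot category."""
    counts = {"diagonal": 0, "anti-diagonal": 0, "off-axis": 0}
    for x, y in pairs:
        counts[classify_pair(x, y, len_a, len_b, tol)] += 1
    return counts


# Hypothetical ortholog coordinates for genomes of length 1000 and 2000.
PAIRS = [(100, 200), (300, 1400), (800, 400), (500, 200), (900, 1800)]

if __name__ == "__main__":
    print(x_pattern_summary(PAIRS, len_a=1000, len_b=2000))
```

A genome pair in which both the diagonal and anti-diagonal counts are high, with few off-axis points, would show the X shape in a dot plot. Pairs near the midpoint satisfy both tests and are counted as diagonal here, which is an arbitrary choice.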
StrpB vs. StrpA All
[Figure: scatter plot; one axis spans 0 to 2500, the other 13621300 to 13623100]
StrpB vs. StrpA: Orthologs
[Figure: scatter plot; one axis spans 0 to 2500, the other 13621300 to 13623100]
Most ‘Evidence’ for Gene Transfer has Alternative Explanations
Observation | Other Causes | Always Occurs
Unusual distribution | Sampling bias | Not if recipient already has gene.
Unusual GC/codons | Selection | Not if donor/recipient similar. Not if it occurred long ago.
High hit to "distant" species | Selection; rate variation; gene loss | Usually.
Incongruent trees | Bad trees; missed paralogs | Usually.
Correlation of above with neighbors | Selection | Only if genes keep order after transfer.
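To make the "unusual GC" observation concrete, here is a minimal sketch that flags genes whose GC fraction lies far from the mean of the gene set. The gene names, sequences, and z-score cutoff are hypothetical. As the table stresses, an outlier found this way has other possible causes, such as selection, and a transfer from a donor with similar composition, or one that happened long ago, would not be flagged at all.

```python
from statistics import mean, pstdev


def gc_fraction(seq):
    """Fraction of G and C among the A, C, G, T bases of a sequence."""
    seq = seq.upper()
    gc = sum(seq.count(base) for base in "GC")
    acgt = sum(seq.count(base) for base in "ACGT")
    return gc / acgt if acgt else 0.0


def gc_outliers(genes, z_cutoff=2.0):
    """Return names of genes whose GC fraction is >= z_cutoff SDs from the mean."""
    values = {name: gc_fraction(seq) for name, seq in genes.items()}
    mu = mean(values.values())
    sigma = pstdev(values.values())
    if sigma == 0:
        return []  # no variation, so nothing can be an outlier
    return sorted(
        name for name, value in values.items()
        if abs(value - mu) / sigma >= z_cutoff
    )


# Hypothetical toy sequences; real genes are far longer.
GENES = {
    "gene_1": "ATGCATGCAT",
    "gene_2": "ATGCGCATAT",
    "gene_3": "GCATGCATGC",
    "gene_4": "ATATGCGCAT",
    "gene_5": "GCGCATATAT",
    "gene_6": "ATGCGCGCAT",
    "gene_7": "TAGCTAGCTA",
    "gene_8": "GCGCGCGCGC",
}

if __name__ == "__main__":
    print(gc_outliers(GENES))
```

Only gene_8, which is entirely G and C, exceeds the cutoff in this toy set. Treat such flags as a starting point for tree-based analysis, not as evidence of transfer on their own.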
Steps in Lateral Gene Transfer
[Figure: diagram with steps numbered 1, 2, 3-5, and 6, and labels A, B, C, D]
Mitochondrial Genome Integration into A. thaliana chrII
Lin et al., 1999
Number of pBVTs Depends on # of Genomes Analyzed
[Figure: bar chart. Axis label: Number of protein sets, with ticks from 0 to 1800. Other labels: 1, 2, 3, 4, 5, Other; Fruit fly, C. elegans, Arabidopsis, Yeast, Parasites]
Salzberg et al. 2001
Trees Don’t Support Transfer II
Beja O, et al., Science (2000) 289: 1902-6; Nature (2001) 411: 786-789
Puf Operons from Uncultured Bacteria
Puf Operons vs. Cultured Species
Alternative Phylogenetic Anchors
[Figure: tree with leaf labels Chlorobium tepidum, Cytophaga hutchinsonii, Prevotella ruminicola, Bacteroides fragilis, Porphyromonas gingivalis, MBBAD68TR, MBBAD65TR]
Acknowledgements
• Outside TIGR
–A. Stoltzfus
–H. Ochman
–D. Bryant
–W. F. Doolittle
–M. Eisen
–M-I Benito
• $$$:
–NSF
–NIH
–ONR
–DOE
–NEB
B. anthracis lineage specific duplications
ORF04205 molybdopterin biosynthesis protein MoeA (moeA)
ORF05907 molybdopterin biosynthesis protein MoeA (moeA)
ORF02636 molybdopterin biosynthesis protein MoeA (moeA)
ORF04204 molybdopterin biosynthesis protein MoeB, putative
ORF05908 molybdopterin biosynthesis protein MoeB, putative
ORF02634 molybdopterin biosynthesis protein MoeB, putative
ORF05904 molybdopterin converting factor, subunit 1 (moaD)
ORF02639 molybdopterin converting factor, subunit 1 (moaD)
ORF04206 molybdopterin converting factor, subunit 2 (moaE)
ORF05905 molybdopterin converting factor, subunit 2 (moaE)
ORF02638 molybdopterin converting factor, subunit 2 (moaE)
Based on Read et al. submitted
C. pneumoniae Paralogs by Position
C. pneumoniae Paralogs - Lineage Specific