(bacterial) Genome Evolution: added value of genomes

Date post: 16-Jan-2016
Upload: mercury
Description: Bioinformatics and Evolutionary Genomics: Genome Evolution (I) and Genomics Context for function prediction. (bacterial) Genome Evolution: added value of genomes. How does gene content evolve? How does gene order evolve? (PowerPoint presentation)
Page 1: (bacterial) Genome Evolution: added value of genomes

Bioinformatics and Evolutionary Genomics

Genome Evolution (I) and Genomics Context for function prediction

(bacterial) Genome Evolution: added value of genomes

• How does gene content evolve?
• How does gene order evolve?
• How important are various evolutionary dynamics of genes on a genomic scale (e.g. gene fusion, gene loss, gene duplication): moving from anecdotes to trends

Functionally associated proteins leave evolutionary traces of their relation in genomes

Genomic context / in silico interaction prediction

Gene order evolution:

- Establish orthologous relations between pairs of genomes (e.g. with the Smith-Waterman (S-W) best bidirectional hit approach).
- Put them in a dotplot and color each point by the relative direction of transcription (green for the same relative direction, red for the opposite direction).
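The ortholog-and-dotplot procedure can be sketched in Python as follows. This is a minimal sketch under stated assumptions. The alignment scores (for example Smith-Waterman scores) are taken as already computed and passed in as a dictionary. Each gene is described by a hypothetical `(position, strand)` tuple. The function names are illustrative only and are not taken from any particular tool.

```python
from typing import Dict, List, Tuple

# Hypothetical gene description: (position along the chromosome, strand +1/-1).
Gene = Tuple[int, int]
Scores = Dict[Tuple[str, str], float]  # (gene in A, gene in B) -> alignment score


def best_hits(scores: Scores) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Best-scoring partner of every gene, from A to B and from B to A.

    Ties are broken in favour of the first pair encountered.
    """
    best_ab: Dict[str, Tuple[str, float]] = {}
    best_ba: Dict[str, Tuple[str, float]] = {}
    for (a, b), s in scores.items():
        if a not in best_ab or s > best_ab[a][1]:
            best_ab[a] = (b, s)
        if b not in best_ba or s > best_ba[b][1]:
            best_ba[b] = (a, s)
    return ({a: b for a, (b, _) in best_ab.items()},
            {b: a for b, (a, _) in best_ba.items()})


def bidirectional_best_hits(scores: Scores) -> List[Tuple[str, str]]:
    """Putative orthologs: a's best hit is b AND b's best hit is a."""
    ab, ba = best_hits(scores)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)


def dotplot_points(pairs: List[Tuple[str, str]],
                   genes_a: Dict[str, Gene],
                   genes_b: Dict[str, Gene]) -> List[Tuple[int, int, str]]:
    """One (x, y, colour) point per ortholog pair: green if both genes are
    transcribed in the same relative direction, red if in opposite ones."""
    points = []
    for a, b in pairs:
        (xa, strand_a), (xb, strand_b) = genes_a[a], genes_b[b]
        points.append((xa, xb, "green" if strand_a == strand_b else "red"))
    return points


if __name__ == "__main__":
    scores = {("a1", "b1"): 50, ("a1", "b2"): 20, ("a2", "b2"): 40,
              ("a3", "b2"): 45, ("a3", "b3"): 10}
    genes_a = {"a1": (1, +1), "a2": (2, +1), "a3": (3, -1)}
    genes_b = {"b1": (1, +1), "b2": (2, +1), "b3": (3, -1)}
    pairs = bidirectional_best_hits(scores)
    print(pairs)                                    # [('a1', 'b1'), ('a3', 'b2')]
    print(dotplot_points(pairs, genes_a, genes_b))  # [(1, 1, 'green'), (3, 2, 'red')]
```

In the toy data, `a2` is dropped because its best hit `b2` prefers `a3`. This reciprocity requirement is what makes bidirectional best hits stricter than one-way best hits.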

Evolution of genome organization:

- In prokaryotes, genome inversions centered around the origin/terminus of replication are a major source of genome rearrangements.

- This suggests that both replication forks are in close contact -> comparative genome analysis provides support for a hypothesis about genome replication: “and a close proximity of the forks would increase the probability of reciprocal recombination or transposition between sequences at the two forks. That the forks are near each other is also consistent with the 'replication factory' model based on immunolocalization of components of the replication machinery in Bacillus subtilis” (Tillier and Collins, 2000, Nat. Genet.)

Gene order evolves rapidly

But …

Differential retention of divergent / convergent gene pairs suggests that conservation implies a functional association

Operons

Gene Order Evolution

Conserved gene order

• i.e. genes that are present over ‘sufficiently large’ evolutionary distances in the same gene cluster

• Contributes many reliable predictions
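One way to make the conserved-gene-order criterion concrete is a minimal sketch that counts in how many genomes two orthologous groups (OGs, e.g. COGs) are direct neighbours. Several things are assumptions here: each genome is given as the ordered list of OG identifiers along a linear chromosome, and the `min_genomes` threshold is a hypothetical, crude stand-in for "sufficiently large evolutionary distances". A real method would weigh how distantly related the supporting genomes are.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple


def neighbour_pairs(order: List[str]) -> Set[FrozenSet[str]]:
    """Unordered pairs of orthologous groups (OGs) that are adjacent in one
    genome. The chromosome is treated as linear, so the first and last genes
    are not counted as neighbours."""
    return {frozenset(p) for p in zip(order, order[1:]) if p[0] != p[1]}


def conserved_pairs(genomes: Dict[str, List[str]],
                    min_genomes: int = 2) -> Dict[Tuple[str, ...], List[str]]:
    """OG pairs that are neighbours in at least `min_genomes` genomes.

    Counting genomes is a crude stand-in for requiring conservation over
    'sufficiently large' evolutionary distances; it ignores how related
    the supporting genomes are.
    """
    support: Dict[FrozenSet[str], Set[str]] = defaultdict(set)
    for name, order in genomes.items():
        for pair in neighbour_pairs(order):
            support[pair].add(name)
    return {tuple(sorted(pair)): sorted(names)
            for pair, names in support.items() if len(names) >= min_genomes}


if __name__ == "__main__":
    genomes = {"g1": ["A", "B", "C", "D"],
               "g2": ["B", "A", "X", "C"],
               "g3": ["C", "D", "A", "B"]}
    for pair, names in sorted(conserved_pairs(genomes).items()):
        print(pair, names)
```

In this hypothetical example, A and B stay neighbours in all three genomes, even though their order flips in g2. C and D are neighbours in g1 and g3. Those two pairs are the predicted functional associations.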

Conserved gene order

NB1: predicting operons is not trivial; in fact, conserved gene order or functional association is a major clue

NB2: using ‘only’ operons, without requiring conservation, results in much less reliable function prediction

Comparison to pathways: conservation implies a functional association

[Figure: number of COGs (log scale, 1 to 10000) and average metabolic distance (0 to 6), plotted against the number of co-occurrences in operons (0 to 30)]

Conserved gene order: an example from metabolism of propionyl-CoA

[Figure labels: “query”, “target”]

Conserved gene order: an example from metabolism of propionyl-CoA

Biochemical assays confirm the function of members of COG0346 as a DL-methylmalonyl-CoA racemase

Gene Fusion

• “Rare” (especially in prokaryotes): ~3000 linked COGs in STRING v6 (~180 genomes)

• But what about domain recombination?

[Figure label: Fusion]

Gene fusion

• i.e. the orthologs of two genes are fused into one polypeptide in another organism

• A very reliable indicator of functional interaction, partly because it is a relatively infrequent evolutionary event
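Gene-fusion detection can be sketched as follows. Two genes of one genome are linked when both align to the same protein of another genome in (nearly) non-overlapping regions, so that protein looks like a fusion of the two. This is a minimal sketch with several assumptions. The hit tuples and their coordinates are a hypothetical input format. The `max_overlap` tolerance is an arbitrary illustrative value. The sketch does not check orthology or filter promiscuous domains, although the domain-recombination question above shows that such filtering matters in practice.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical hit format: (query gene in genome 1, target protein in
# genome 2, start and end of the aligned region on the target protein).
Hit = Tuple[str, str, int, int]


def fusion_links(hits: List[Hit], max_overlap: int = 10) -> Set[Tuple[str, str]]:
    """Pairs of genome-1 genes whose homologs appear fused in genome 2.

    Two different query genes are linked when they align to the same target
    protein in (nearly) non-overlapping regions, i.e. the target looks like a
    fusion of the two.
    """
    by_target: Dict[str, List[Tuple[str, int, int]]] = defaultdict(list)
    for query, target, start, end in hits:
        by_target[target].append((query, start, end))

    links: Set[Tuple[str, str]] = set()
    for matches in by_target.values():
        for i, (q1, s1, e1) in enumerate(matches):
            for q2, s2, e2 in matches[i + 1:]:
                overlap = min(e1, e2) - max(s1, s2)  # negative if disjoint
                if q1 != q2 and overlap <= max_overlap:
                    links.add((min(q1, q2), max(q1, q2)))
    return links


if __name__ == "__main__":
    hits = [("geneA", "target1", 1, 250),
            ("geneB", "target1", 255, 450),   # other half of target1
            ("geneC", "target2", 1, 300),
            ("geneA", "target2", 10, 290)]    # same region: no fusion
    print(fusion_links(hits))  # {('geneA', 'geneB')}
```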

Gene fusion: an example

Gene Content Evolution

What about HGT?
Genome trees based on gene content:

[Figure: gene contents of Escherichia coli and Haemophilus influenzae, divided into shared genes and species-specific genes]

Genome trees based on gene content

Presence / absence matrix:

       OG1  OG2  OG3  OG4  …
sp1     1    1    0    1   …
sp2     0    1    0    0   …
sp3     0    0    1    1   …
…

$$\mathrm{dist}(sp_A, sp_B) = 1 - \frac{\#\,\text{shared OGs}(sp_A, sp_B)}{\text{weighted average genome size}(sp_A, sp_B)}$$

Similarity matrix s (the fraction in the formula above):

s      sp1   sp2   sp3   sp4   …
sp1     1    0.2   0.4   0.2   …
sp2           1    0.9   0.1   …
sp3                 1    0.3   …
sp4                       1    …
…

Distance matrix d = 1 − s:

d      sp1   sp2   sp3   sp4
sp1     0
sp2    0.8    0
sp3    0.6   0.1    0
sp4    0.8   0.9   0.7    0

Neighbor joining
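The distance step of this pipeline, from a presence/absence matrix to pairwise gene-content distances, can be sketched as below. This is a minimal sketch, assuming that genome size means the number of OGs present in the matrix. The slide does not specify the weighting in "weighted average genome size", so the plain mean of the two genome sizes is used as a stand-in. The neighbor-joining step is left out; a library routine such as Biopython's `Bio.Phylo.TreeConstruction.DistanceTreeConstructor` could build the tree from the resulting distances.

```python
from typing import Dict, List, Tuple


def content_distances(matrix: Dict[str, List[int]]) -> Dict[Tuple[str, str], float]:
    """dist(A, B) = 1 - shared OGs / average genome size, for every species pair.

    `matrix` maps a species to its presence (1) / absence (0) row over the same
    ordered list of orthologous groups. ASSUMPTION: genome size is the number of
    OGs present, and the 'weighted average' is taken as the plain mean of the two.
    """
    names = list(matrix)
    dist: Dict[Tuple[str, str], float] = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            row_a, row_b = matrix[a], matrix[b]
            shared = sum(1 for x, y in zip(row_a, row_b) if x and y)
            mean_size = (sum(row_a) + sum(row_b)) / 2
            # Two empty genomes share nothing; treat them as maximally distant.
            dist[(a, b)] = 1.0 if mean_size == 0 else 1 - shared / mean_size
    return dist


if __name__ == "__main__":
    matrix = {"sp1": [1, 1, 0, 1],
              "sp2": [0, 1, 0, 0],
              "sp3": [0, 0, 1, 1]}
    for pair, d in content_distances(matrix).items():
        print(pair, round(d, 2))
    # ('sp1', 'sp2') 0.5
    # ('sp1', 'sp3') 0.6
    # ('sp2', 'sp3') 1.0
```

Normalising by genome size keeps small genomes from looking distant merely because they have few genes. The choice of weighting changes the numbers, so treat the mean here as one option among several.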

Genome trees based on gene content are remarkably similar to the consensus on the Tree of Life (ToL)

[Figure: gene-content genome tree of C. pneumoniae, C. trachomatis, M. tuberculosis, M. pneumoniae, M. genitalium, B. subtilis, T. pallidum, T. maritima, B. burgdorferi, P. horikoshii, M. thermoautotrophicum, A. fulgidus, M. jannaschii, S. cerevisiae, C. elegans, A. aeolicus, E. coli, H. influenzae, R. prowazekii, H. pylori 26695, Synechocystis sp., H. pylori J99 and A. pernix. Labelled groups: Proteobacteria, Eukarya, Euryarchaeota, Spirochaetales. Branch support values range from 69 to 100; scale bar 0.1.]

Reconstruction of Gene Content

[Figure labels: Deletion, Gain]

Ancestral Genome Reconstruction of LUCA: patchy gene distributions

Parsimony

• Attach a cost (g) to HGT / independent gain, expressed relative to the cost of a loss; find the scenario with the lowest total cost
• At g = 1.5: 733 genes in LUCA
• At g = 2: 956 genes in LUCA
• Evolution is not parsimonious: is this a minimal estimate?
• Why not use gene trees?
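The cost-based reconstruction can be sketched with Sankoff's small-parsimony algorithm. A loss costs 1 and a gain (HGT or independent origin) costs g. This is a minimal sketch and is not necessarily the procedure behind the 733 / 956 estimates. The nested-tuple tree format, the species names and the patchy gene pattern are all hypothetical. Ties at the root are resolved as "absent" by assumption.

```python
from typing import Dict, Union

# A tree is either a leaf name or a tuple of subtrees (hypothetical format).
Tree = Union[str, tuple]
INF = float("inf")


def step_cost(parent: int, child: int, g: float, loss: float) -> float:
    """Cost of the change parent -> child along one branch."""
    if parent == child:
        return 0.0
    return g if child == 1 else loss  # 0 -> 1 is a gain, 1 -> 0 a loss


def sankoff(tree: Tree, present: Dict[str, int], g: float,
            loss: float = 1.0) -> Dict[int, float]:
    """Minimal cost of the subtree for each state (0 absent, 1 present) at its root."""
    if isinstance(tree, str):
        return {s: (0.0 if s == present[tree] else INF) for s in (0, 1)}
    costs = {0: 0.0, 1: 0.0}
    for child in tree:
        child_costs = sankoff(child, present, g, loss)
        for parent in (0, 1):
            costs[parent] += min(child_costs[c] + step_cost(parent, c, g, loss)
                                 for c in (0, 1))
    return costs


def in_ancestor(tree: Tree, present: Dict[str, int], g: float) -> bool:
    """Is the gene inferred at the root? Ties are resolved as 'absent'."""
    costs = sankoff(tree, present, g)
    return costs[1] < costs[0]


if __name__ == "__main__":
    tree = (("bac1", "bac2"), ("arc1", "euk1"))
    patchy = {"bac1": 1, "bac2": 0, "arc1": 1, "euk1": 0}
    print(sankoff(tree, patchy, g=1.5))  # {0: 3.0, 1: 2.0}: two losses win
    print(sankoff(tree, patchy, g=0.5))  # {0: 1.0, 1: 2.0}: two gains win
    print(in_ancestor(tree, patchy, g=1.5), in_ancestor(tree, patchy, g=0.5))  # True False
```

The same patchy gene is placed in the ancestor at g = 1.5 but not at g = 0.5. Summing `in_ancestor` over all gene families would give an ancestral gene count, which is why that count depends on the chosen g.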

Nice results, e.g. nucleotide biosynthesis

Another attempt to reconstruct the genome of LUCA

• Over 1000 gene families, more than 90% of which are also functionally characterized.

• A fairly complex genome similar to those of free-living prokaryotes, with a variety of functional capabilities including metabolic transformation, information processing, membrane/transport proteins and complex regulation.

Presence / absence of genes

Gene content co-evolution (the easy case: few genomes)

Genomes share genes for phenotypes they have in common

Differences in gene content reflect differences in phenotypic potentialities

Qualitative differential genome analysis:

- Find “pathogen-specific” proteins that can serve as drug targets

- Relate the differences between genomes to the differences in their phenotypes
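The "pathogen-specific" search is, at its core, set arithmetic on gene content: keep the orthologous groups present in every genome with the phenotype, and drop those found in any genome without it. This is a minimal sketch. The genome names and OG identifiers are hypothetical toy data. The result is only a candidate list; shared content also depends on how closely related the genomes are.

```python
from typing import Dict, Iterable, Set


def specific_to(contents: Dict[str, Set[str]],
                with_trait: Iterable[str],
                without_trait: Iterable[str]) -> Set[str]:
    """OGs present in every genome with the trait and in none without it.

    `with_trait` must name at least one genome.
    """
    shared = set.intersection(*(contents[n] for n in with_trait))
    excluded = set().union(*(contents[n] for n in without_trait))
    return shared - excluded


if __name__ == "__main__":
    contents = {"pathogenA": {"OG1", "OG2", "OG3", "OG4"},
                "pathogenB": {"OG1", "OG3", "OG4", "OG6"},
                "nonpathogen": {"OG1", "OG2", "OG5"}}
    # Two-way comparison: one pathogen against one non-pathogen.
    print(sorted(specific_to(contents, ["pathogenB"], ["nonpathogen"])))
    # ['OG3', 'OG4', 'OG6']
    # Three-way comparison: requiring both pathogens narrows the list.
    print(sorted(specific_to(contents, ["pathogenA", "pathogenB"], ["nonpathogen"])))
    # ['OG3', 'OG4']
```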

Three-way comparisons

Huynen et al., 1998, FEBS Lett

Convergence in functional classes of gene content in small intracellular bacterial parasites

Zomorodipour & Andersson FEBS Letters 1999

Although we can qualitatively interpret the variations in shared gene content in terms of the phenotypes of the species, quantitatively these variations depend on the relative phylogenetic positions of the species: the closer two species are, the larger the fraction of their genes they share.

Presence / absence of genes

[Figure labels: L. innocua (non-pathogen), L. monocytogenes (pathogen)]

Occurrence of genes

[Figure labels: L. innocua (non-pathogenic), L. monocytogenes (pathogenic)]

Genes involved in pathogenicity

Generalization: phylogenetic profiles / co-occurrence

          species 1  species 2  species 3  species 4  species 5  …
Gene 1:       1          0          1          1          0      1
Gene 2:       1          1          0          0          1      0
Gene 3:       0          1          0          0          1      0
....

Co-occurrence of genes across genomes

• i.e. two genes have the same presence / absence pattern over multiple genomes:

• AKA phylogenetic profiles

• NB: absence can only be established in complete genomes

• Correction for the phylogenetic signal is needed → use evolutionary events (e.g. gene gains and losses) rather than raw presence / absence
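Comparing phylogenetic profiles can be sketched with the three toy profiles from the generalization slide. Each gene becomes a 0/1 vector over genomes, and every pair of genes is ranked by the Hamming distance between their vectors. This is a minimal sketch, assuming Hamming distance as the similarity measure; published methods use other scores. The phylogenetic correction recommended above is not applied here.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def hamming(p: List[int], q: List[int]) -> int:
    """Number of genomes in which exactly one of the two genes is present."""
    return sum(a != b for a, b in zip(p, q))


def rank_by_profile(profiles: Dict[str, List[int]]) -> List[Tuple[int, str, str]]:
    """All gene pairs, most similar presence/absence profile first."""
    return sorted((hamming(profiles[a], profiles[b]), a, b)
                  for a, b in combinations(sorted(profiles), 2))


if __name__ == "__main__":
    profiles = {"Gene 1": [1, 0, 1, 1, 0, 1],
                "Gene 2": [1, 1, 0, 0, 1, 0],
                "Gene 3": [0, 1, 0, 0, 1, 0]}
    for dist, a, b in rank_by_profile(profiles):
        print(a, b, dist)
    # Gene 2 Gene 3 1
    # Gene 1 Gene 2 5
    # Gene 1 Gene 3 6
```

Gene 2 and Gene 3 differ in only one genome, so they are the best candidates for a functional link. Without a phylogenetic correction, however, closely related genomes count as independent observations and inflate such matches.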

Predicting the function of frataxin, a disease-gene protein with unknown function, using co-occurrence of genes across genomes

• Friedreich’s ataxia
• No (homolog with) known function

[Figure: phylogenetic profiles across A. aeolicus, Synechocystis, B. subtilis, M. genitalium, M. tuberculosis, D. radiodurans, R. prowazekii, C. crescentus, M. loti, N. meningitidis, X. fastidiosa, P. aeruginosa, Buchnera, V. cholerae, H. influenzae, P. multocida, E. coli, A. pernix, M. jannaschii, A. thaliana, S. cerevisiae, C. jejuni, C. albicans, S. pombe, H. sapiens, C. elegans, H. pylori and D. melanogaster. Genes shown: cyaY / Yfh1, hscB / Jac1, hscA / Ssq1, Nfu1, iscA / Isa1-2, fdx / Yah1, Arh1, RnaM, IscR, Hyp, iscS / Nfs1, iscU / Isu1-2, Atm1.]

Frataxin has co-evolved with hscA and hscB, indicating that it plays a role in iron-sulfur cluster assembly

Iron-Sulfur (2Fe-2S) cluster in the Rieske protein

Prediction:

Confirmation:

Functionally associated proteins leave evolutionary traces of their relation in genomes

Genomic context / in silico interaction prediction

Evolutionary rate

Chen & Dokholyan, TiG 2006

Co-evolution: mirrortree

Pazos & Valencia, PEDS 2001
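The core mirrortree comparison can be sketched as follows. For two protein families with one member per species, take the distances between every pair of species in each family, then compute the Pearson correlation between the two sets of distances. This is a minimal sketch. The distance dictionaries are hypothetical inputs and are assumed to come from each family's alignment. Part of any correlation reflects the shared species tree rather than an interaction, so a high value is only suggestive.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Dist = Dict[Tuple[str, str], float]  # pairwise distances keyed by species pair


def lookup(d: Dist, a: str, b: str) -> float:
    """Distance between a and b, whichever order the key was stored in."""
    return d[(a, b)] if (a, b) in d else d[(b, a)]


def mirrortree_r(dist_p: Dist, dist_q: Dist, species: List[str]) -> float:
    """Pearson correlation between the two families' inter-species distances."""
    pairs = list(combinations(species, 2))
    x = [lookup(dist_p, a, b) for a, b in pairs]
    y = [lookup(dist_q, a, b) for a, b in pairs]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    species = ["s1", "s2", "s3", "s4"]
    p = dict(zip(combinations(species, 2), [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]))
    q = {pair: 2 * d for pair, d in p.items()}    # same tree shape, scaled
    r = {pair: 7.0 - d for pair, d in p.items()}  # inverted pattern
    print(round(mirrortree_r(p, q, species), 3))  # 1.0
    print(round(mirrortree_r(p, r, species), 3))  # -1.0
```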

Co-evolution: mirrortree

[Figure: fraction of pairs in the same KEGG map (y-axis, 0 to 1) versus score (x-axis, 0 to 1) for Fusion, Gene Order and Co-occurrence]

Integrating genomic context scores into one single score (post-hoc)

• Compare each individual method against an independent benchmark (KEGG), and find “equivalency”
• Multiply the probabilities that two proteins are not interacting and subtract the product from 1; naive Bayesian, i.e. assuming independence
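The two integration steps can be sketched as follows. First, each method's raw score is calibrated against the benchmark (the "equivalency" step). Then the calibrated probabilities are combined as one minus the product of the probabilities of no interaction. This is a minimal sketch. The threshold and accuracy tables are invented for illustration. The lookup-table calibration is an assumption about how the "equivalency" could be implemented, not a description of any specific database's scoring.

```python
from bisect import bisect_right
from math import prod
from typing import List, Sequence


def calibrate(raw: float, thresholds: Sequence[float], accuracy: Sequence[float]) -> float:
    """Map a method's raw score to its benchmark accuracy.

    `thresholds` is ascending; `accuracy[i]` is the (hypothetical) fraction of
    pairs in the same KEGG map among pairs scoring >= thresholds[i].
    """
    i = bisect_right(thresholds, raw) - 1
    return accuracy[i] if i >= 0 else 0.0


def combine(probabilities: List[float]) -> float:
    """Naive Bayes-style integration: 1 - product of P(not interacting)."""
    return 1 - prod(1 - p for p in probabilities)


if __name__ == "__main__":
    gene_order = calibrate(0.6, [0.0, 0.5, 0.8], [0.1, 0.4, 0.7])     # 0.4
    co_occurrence = calibrate(0.9, [0.0, 0.5, 0.8], [0.1, 0.5, 0.7])  # 0.7
    print(round(combine([gene_order, co_occurrence]), 2))  # 0.82
```

The independence assumption is what justifies multiplying the probabilities. When two methods share evidence, for example gene order and fusion in related genomes, the combined score will be overconfident.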

