Post on 03-Jan-2016
Codon Bias and Regulation of Translation among Bacteria and Phages
Thesis defense of Marc BAILLY-BECHET
Advisor: Massimo VERGASSOLA
Institut Pasteur, Dept Genomes & Genetics, Unit « In Silico » Genetics
Summary
Introduction to the bacterial translation system and the codon bias
Structuring of bacterial chromosomes by codon bias domains
Why tRNAs in phages?
Translation processes in prokaryotic cells
Transfer RNA
tRNAs are the small RNAs that link an amino acid to the peptide sequence
They have a special palindromic structure
They are amino acid specific AND codon « specific » (wobble)
They differ greatly in number in the cell (from ~100 to ~5000 for a given amino acid)
Degeneracy of the genetic code
Differential usage of synonymous codons at the genome scale
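To make "differential usage of synonymous codons" concrete, here is a minimal sketch that counts, for each amino acid, how often each of its synonymous codons appears in a coding sequence. The codon table excerpt, the function name `synonymous_codon_frequencies` and the toy sequence are illustrative assumptions, not material from the talk.

```python
from collections import Counter

# Excerpt of the standard genetic code (assumption: only these
# amino acids are needed for the illustration).
CODON_TO_AA = {
    "GCT": "Ala", "GCC": "Ala", "GCA": "Ala", "GCG": "Ala",
    "AAA": "Lys", "AAG": "Lys",
    "ATG": "Met",
}


def synonymous_codon_frequencies(cds):
    """Return {amino acid: {codon: frequency among its synonymous codons}}."""
    # Split the sequence into complete codons, ignoring a trailing partial one.
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TO_AA)

    # Total number of codons observed for each amino acid.
    totals = Counter()
    for codon, n in counts.items():
        totals[CODON_TO_AA[codon]] += n

    freqs = {}
    for codon, n in counts.items():
        aa = CODON_TO_AA[codon]
        freqs.setdefault(aa, {})[codon] = n / totals[aa]
    return freqs


toy_cds = "ATGGCTGCTGCCAAAAAGAAA"  # hypothetical sequence
usage = synonymous_codon_frequencies(toy_cds)
```

In this toy sequence GCT is used twice as often as GCC for alanine, which is the kind of imbalance the term codon bias refers to.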
Causes of the codon bias
Non-selective causes of the codon bias:
– Mutation biases (e.g. towards high/low G+C)
– Strand bias on the chromosome (GT bias)
Selective causes of the codon bias:
– Translation efficiency
– Translation accuracy
– Codon-anticodon selection?
– Codon robustness?
tRNA concentration correlates with codon bias
Dong et al. (1996) J. Mol. Biol. 260:649
Codon bias domains over bacterial chromosomes
Motivations of the project
Aim: clustering the genes of an organism according to their codon bias
Biological interests:
– Functional analysis of the groups of genes
– Role of codon bias in the chromosome structuration
– Comparison of the genome organization between species
– Inference of some codon bias causes from the classification
Previous results
Methods: correspondence analysis
2 main sub-groups of genes identified in multiple organisms:
– Highly expressed
– Horizontal transfer genes
Methodological difficulties:
– Choice of the number of groups
– Choice of the distance
Kunst et al. (1997), Nature 390:249
Key idea about the method: the optimization criterion
Each group is defined by the probability distribution of codon usage generated by the genes it contains
A good classification is one which maximizes the gain of information on these probability distributions, relative to a uniform prior distribution
$$\max \sum_{\text{groups}} \sum_{\text{amino acids}} D_{KL}^{*}\left(P_{\text{prior}} \,\|\, P_{\text{post}}\right)$$
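The criterion can be sketched numerically as follows. This is only an illustration under stated assumptions: the plain Kullback-Leibler divergence stands in for $D_{KL}^{*}$ (whose exact definition is not given in the slides), a pseudocount of 1 is added to avoid zero probabilities, and the function names and codon counts are hypothetical.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def group_information_gain(codon_counts_by_aa, pseudocount=1.0):
    """Sum over amino acids of D(uniform prior || codon usage of the group).

    codon_counts_by_aa maps an amino acid to the counts of its synonymous
    codons, pooled over all genes of the group.
    """
    gain = 0.0
    for counts in codon_counts_by_aa.values():
        k = len(counts)
        total = sum(counts) + pseudocount * k
        post = [(c + pseudocount) / total for c in counts]
        prior = [1.0 / k] * k
        gain += kl_divergence(prior, post)
    return gain


def classification_score(groups):
    """Objective to maximize: total information gain over all groups."""
    return sum(group_information_gain(g) for g in groups)


# Hypothetical pooled codon counts for two groups of genes.
biased = {"Ala": [90, 5, 3, 2], "Lys": [80, 20]}
flat = {"Ala": [25, 25, 25, 25], "Lys": [50, 50]}
score_biased = group_information_gain(biased)
score_flat = group_information_gain(flat)
total_score = classification_score([biased, flat])
```

A group whose codon usage is uniform contributes nothing to the score, while a strongly biased group contributes a positive gain, so the objective rewards partitions that separate genes with distinct codon preferences.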
The clustering algorithm
[Figure: clustering algorithm diagram; labels: N, N-1, Threshold C = 40]
Key idea about the method: selection of the number of groups
The right number of groups is the one maximizing the average stability of gene attribution inside the groups, relative to the expected stability in absence of structure (random case)
$$b_{gs} = \frac{L(g \in C_s)}{\sum_{s'} L(g \in C_{s'})}$$
$$\max \left( \prod_{s=1}^{S} \langle b_{gs} \rangle_{C(s)} \right)^{1/S}$$
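A minimal sketch of how such a stability score could be computed. It assumes that the score is the geometric mean, over the S groups, of the average $b_{gs}$ of the genes assigned to each group; this reading of the slide formula is an assumption, as are the function names and the toy log-likelihoods. The comparison with the random case is not included.

```python
import math


def membership_stability(log_likelihoods):
    """b_gs for one gene: likelihood of each group, normalized over all groups."""
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_likelihoods)
    weights = [math.exp(x - m) for x in log_likelihoods]
    z = sum(weights)
    return [w / z for w in weights]


def clustering_stability(gene_log_likelihoods, assignments, n_groups):
    """Geometric mean over groups of the mean b_gs of their assigned genes."""
    sums = [0.0] * n_groups
    sizes = [0] * n_groups
    for logl, s in zip(gene_log_likelihoods, assignments):
        sums[s] += membership_stability(logl)[s]
        sizes[s] += 1
    # Assumes every group contains at least one gene.
    means = [sums[s] / sizes[s] for s in range(n_groups)]
    return math.exp(sum(math.log(m) for m in means) / n_groups)


# Hypothetical log-likelihoods of three genes under two groups.
logl = [[-10.0, -20.0], [-12.0, -25.0], [-30.0, -11.0]]
assign = [0, 0, 1]
stability = clustering_stability(logl, assign, 2)
b_first_gene = membership_stability(logl[0])
```

With well-separated likelihoods, as in this toy input, every gene is attributed almost unambiguously and the score is close to 1; ambiguous attributions pull it down.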
Number of groups and clustering significance
Codon usage inside the groups
Tests of the algorithm
Gene function is correlated with codon bias
1. Highly expressed genes, translation and ribosomal proteins: COG J (9/22).
2. Unknown genes, pathogenicity islands and horizontally transferred genes: COG - (17/19).
3. Metabolism (synthesis & transport): COG C (4/6), E (7/4) and F (7).
4. Membrane and carbohydrate metabolism genes: COG G (6) and M (3/3).
5. B. subtilis only -- Motility genes: COG N (5).
Anabolic genes are grouped on the lagging strand
Collisions between the replication and transcription machineries
Mirkin & Mirkin (2005) Mol Cell Biol. 25(3): 888
Anabolic genes are usually transcribed when no replication occurs
=> being on the lagging strand is not counter-selected.
Codon bias domains
Group by group analysis: influence of the GC%
Group 2 GC=35.8%
Group 4 GC=47%
Acknowledgements (I)
Frank Kunst and all the GMP Team
Why tRNAs in phages?
What’s a phage?
Motivations of the project
Understanding the presence of tRNAs inside bacteriophages
– Correlation to the host or phage codon bias?
– Differences between lytic and temperate phages?
– Selection acting on tRNA acquisition and implications for phage evolution?
Acquisition of tRNA sequences by bacteriophages
Lysogenic phages are known to insert in microbial genomes in tRNA sequences
=> Imprecise excision could explain the acquisition of tRNA sequences
Lytic phages cause liberation of the host genetic material after cell lysis
=> Acquisition of tRNA sequences in the surrounding media or neighbour hosts
Data
Beginning:
– 200 DNA phage genomes, 23 hosts, 240 tRNAs
Taken out:
– Non-sequenced hosts
– Phage genomes without tRNAs
– tRNAs inserted in prophagic regions
– Phages having tRNAs their host does not have
Final dataset:
– 37 phages, 15 hosts, 169 tRNAs (6 duplicates, 1 triplet)
tRNA distribution in phages
Correlation of host and phage codon bias
< R > = 0.77 ± 0.27 real data
< R > = 0.38 ± 0.42 phage-random host
=> Codon usage is correlated between the host and the phage
< R > = 0.83 ± 0.14 real data - Temperate
< R > = 0.61 ± 0.39 real data - Lytic
=> The correlations are higher in temperate phages
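A minimal sketch of the kind of correlation being averaged here, assuming R is a Pearson correlation between the codon frequency vectors of a phage and of a host (the slides do not spell out the exact definition). The frequency vectors below are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical frequencies of the same four codons in a host and two phages.
host = [0.4, 0.3, 0.2, 0.1]
phage_similar = [0.35, 0.3, 0.25, 0.1]
phage_opposite = [0.1, 0.2, 0.3, 0.4]

r_similar = pearson_r(host, phage_similar)
r_opposite = pearson_r(host, phage_opposite)
```

Averaging such coefficients over all phage-host pairs, and again over pairs where each phage is matched with a random host, would give the two kinds of figures compared on the slide.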
Phage codon frequency distribution is related to tRNA content
<Nc> = 49.9 <Nc> = 52.9
First conclusions
Lytic phages have a codon usage less similar to the one of their hosts when compared to temperate phages
Lytic phages have more tRNAs than temperate ones
Codon usage is more biased in lytic phages than in temperate ones
Both seem to have tRNAs corresponding to the codons they use more
Random uptake hypothesis
tRNA content of host matches codon bias
Codon bias of phage matches that of the host
=> No need for the phage to have tRNAs!
Random uptake hypothesis: the tRNA content of a phage should be proportional to its host tRNA content, and so would be indirectly correlated to the codon bias of the phage
Statistical tests of the random uptake hypothesis
Significance for high values of <f>: p = 0.68
– No specific enrichment in tRNAs for the phage high frequency codons
Significance for high values of <∆f>: p < 0.0007
– Significant enrichment in tRNAs for the codons the phage uses more than its host
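$$\langle f \rangle = \frac{1}{N_\alpha} \sum_{k=1}^{N_\alpha} f_\alpha(k)$$
$$\langle \Delta f \rangle = \frac{1}{N_\alpha} \sum_{k=1}^{N_\alpha} \left( f_\alpha(k) - f_\beta(k) \right)$$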
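The two statistics can be computed as below. The Monte Carlo p-value is only a sketch of one possible null model (tRNAs drawn uniformly at random among candidate codons); the null actually used in the talk is not described in the slides. All names and frequencies are hypothetical.

```python
import random


def mean_f(phage_freq, trna_codons):
    """<f>: mean phage frequency of the codons matched by the phage tRNAs."""
    return sum(phage_freq[k] for k in trna_codons) / len(trna_codons)


def mean_delta_f(phage_freq, host_freq, trna_codons):
    """<Delta f>: mean (phage - host) frequency difference for those codons."""
    diffs = [phage_freq[k] - host_freq[k] for k in trna_codons]
    return sum(diffs) / len(diffs)


def random_uptake_p_value(stat, observed, candidates, n_draws=2000, seed=0):
    """Fraction of random tRNA sets scoring at least as high as the observed one."""
    rng = random.Random(seed)
    size = len(observed)
    obs = stat(observed)
    hits = sum(stat(rng.sample(candidates, size)) >= obs for _ in range(n_draws))
    return (hits + 1) / (n_draws + 1)


# Hypothetical codon frequencies in a phage (alpha) and its host (beta).
phage = {"AAA": 0.05, "AAG": 0.01, "GCT": 0.04, "GCC": 0.01, "GAA": 0.03, "GAG": 0.02}
host = {"AAA": 0.02, "AAG": 0.03, "GCT": 0.02, "GCC": 0.03, "GAA": 0.03, "GAG": 0.02}
trnas = ["AAA", "GCT"]  # codons matched by the phage tRNAs (hypothetical)

f_obs = mean_f(phage, trnas)
df_obs = mean_delta_f(phage, host, trnas)
p_df = random_uptake_p_value(
    lambda codons: mean_delta_f(phage, host, codons), trnas, list(phage)
)
```

In this toy case the phage carries tRNAs for exactly the two codons it uses more than its host, so few random draws reach the observed <∆f> and the p-value is small.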
Modelling of the acquisition and loss processes
Contributions to $P_{\alpha\beta,x}(n, t+dt)$:
– Gain, from $P_{\alpha\beta,x}(n-1, t)$: $r H_{\beta,x}$
– Loss, from $P_{\alpha\beta,x}(n+1, t)$: $(n+1)$
– No change, from $P_{\alpha\beta,x}(n, t)$: $1 - r H_{\beta,x} - n$
Inference of the parameters by maximum likelihood
Likelihood of the real data, given the model:
$$L(r) = \prod_{\alpha,\beta,x} P(N_\alpha)$$
Maximum likelihood: $r = 0.060$
[Figure: probability $P(n)$, most probable value marked]
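A minimal sketch of this inference for the random excision model, assuming each observed tRNA count follows the Poisson steady state with mean $rH$ given in the supplementary master equation slide. The `(n, H)` pairs are hypothetical, and the closed-form estimate is a property of the Poisson likelihood rather than something stated in the talk.

```python
import math


def poisson_log_pmf(n, mean):
    """Log-probability of observing n under a Poisson law of the given mean."""
    return n * math.log(mean) - mean - math.lgamma(n + 1)


def log_likelihood(r, observations):
    """log L(r) = sum over (n, H) pairs of log Poisson(n; r * H)."""
    return sum(poisson_log_pmf(n, r * h) for n, h in observations)


def fit_r(observations):
    """Maximum likelihood estimate: setting d(log L)/dr = 0 gives sum(n) / sum(H)."""
    total_n = sum(n for n, _ in observations)
    total_h = sum(h for _, h in observations)
    return total_n / total_h


# Hypothetical data: (tRNA copies in the phage, copies in the host).
obs = [(0, 4), (1, 6), (0, 2), (1, 8), (0, 5)]
r_hat = fit_r(obs)
ll_hat = log_likelihood(r_hat, obs)
```

For models with selection terms no such closed form is expected in general, and the same `log_likelihood` idea would be maximized numerically over the parameters.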
Evolutionary processes tested
Selection based on:
– Frequency of usage of the corresponding codon in the phage genome (+)
– Frequency of usage of the corresponding codon in the host genome (-)
– Difference of codon usage frequencies between phage and host genome (+)
Duplication of tRNA on the phage genome
Master equation model results
Selection based on the phage frequency of codon usage is not significant (p = 0.15)
Selection based on the rarity of the codon in the host genome is slightly significant (p = 0.018 before Bonferroni correction)
Selection based on the difference of frequencies of codon usage between phage and host is highly significant ($p < 2 \times 10^{-7}$)
The tRNA duplication hypothesis has to be rejected
Adaptive selection of tRNAs?
Selection relative to the phage codon usage only could lead to a static tRNA content, and could be non-optimal after a host change
Selection relative to the host codon usage only does not take into account the quick phage sequence evolution
Selection needs to take both into account to be adaptive and give rise to a useful tRNA content
Conclusions
Translational selection is a strong pressure acting on phage tRNA content
tRNA content among phages is optimized to compensate for differences between host and phage codon usage
This pressure is more important in lytic phages
Acknowledgements (II)
Massimo Vergassola
Eduardo Rocha
The committee members
Yves Charon
Guillaume Cambray
Aymeric Fouquier d’Herouel
All the family and friends who came today!
Supp. Mat. Part 1
Codon probability distributions
Tests of the algorithm (II)
High CAI genes share the same codon bias:
– 32/59 in group 1 of B. subtilis
– 33/33 in group 1 of E. coli
Genes in the same operon or pathway tend to belong to the same group
Transcription and translation
From Miller et al., 1970, Science 169:392
Translation regulation and synchronization by tRNA recycling
Gene 1 Gene 2 Gene 3
Recycling phenomenon analysis
On average, tRNA recycling should not increase translation speed
Recycling could induce a coupling between close ribosomes, allowing for protein synthesis synchronization
Synthetases are the limiting factor, as in most cases they prevent a tRNA used by one ribosome from being re-employed by another close one
Supp. Mat. Part 2
Phage codon frequency distribution is related to tRNA content
Master equation model (I)
Random excision:
$$\frac{\partial P(n, t)}{\partial t} = (rH)\,P(n-1, t) + (n+1)\,P(n+1, t) - (rH + n)\,P(n, t)$$
$$\lim_{t \to \infty} P(n, t) = \frac{(rH)^n}{n!}\, e^{-rH}$$
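As a short check that the Poisson law is indeed stationary for the random excision equation, write $\lambda = rH$ and $P(n) = \lambda^n e^{-\lambda} / n!$ (the shorthand $\lambda$ is introduced here only for convenience). Then

$$\lambda\, P(n-1) = \frac{\lambda^{n} e^{-\lambda}}{(n-1)!} = n\, P(n), \qquad (n+1)\, P(n+1) = \frac{\lambda^{n+1} e^{-\lambda}}{n!} = \lambda\, P(n),$$

so that the right-hand side of the master equation becomes

$$n\, P(n) + \lambda\, P(n) - (\lambda + n)\, P(n) = 0 .$$

The mean number of tRNAs at steady state is therefore $rH$, proportional to the host content, which is the random uptake hypothesis.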
Modelling of the acquisition and loss processes (II)
Contributions to $P_{\alpha\beta,x}(n, t+dt)$:
– Gain, from $P_{\alpha\beta,x}(n-1, t)$: $r H_{\beta,x}$
– Loss, from $P_{\alpha\beta,x}(n+1, t)$: $(n+1)\, e^{-s f_{\alpha,x}}$
– No change, from $P_{\alpha\beta,x}(n, t)$: $1 - r H_{\beta,x} - n\, e^{-s f_{\alpha,x}}$
Master equation models (II)
Random excision:
$$\frac{\partial P(n, t)}{\partial t} = (rH)\,P(n-1, t) + (n+1)\,P(n+1, t) - (rH + n)\,P(n, t)$$
Random excision + selective loss:
$$\frac{\partial P(n, t)}{\partial t} = (rH)\,P(n-1, t) + (n+1)\,e^{-s\Delta f}\,P(n+1, t) - \left(rH + n\,e^{-s\Delta f}\right) P(n, t)$$
Random excision + selective loss + random copy:
$$\frac{\partial P(n, t)}{\partial t} = \left(rH + (n-1)c\right) P(n-1, t) + (n+1)\,e^{-s\Delta f}\,P(n+1, t) - \left(rH + n\left(e^{-s\Delta f} + c\right)\right) P(n, t)$$
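The three models can be explored numerically with a simple explicit Euler integration of the master equation on a truncated state space. This is a sketch under stated assumptions: the parameter values, the truncation at `n_max`, the time step and the function name are all illustrative choices, not values from the talk.

```python
import math


def integrate_master_equation(rH, loss_factor, c=0.0, n_max=30, dt=0.01, steps=4000):
    """Euler integration of dP(n)/dt, starting from a phage without tRNA (n = 0).

    Gain rate from n to n+1: rH + n*c (random uptake plus random copy).
    Loss rate from n to n-1: n * loss_factor, with loss_factor = exp(-s * delta_f).
    """
    p = [0.0] * (n_max + 1)
    p[0] = 1.0
    for _ in range(steps):
        new = p[:]
        for n in range(n_max + 1):
            # No gain out of the last state, so probability stays in [0, n_max].
            gain_out = (rH + n * c) if n < n_max else 0.0
            loss_out = n * loss_factor
            flow_in = 0.0
            if n > 0:
                flow_in += (rH + (n - 1) * c) * p[n - 1]
            if n < n_max:
                flow_in += (n + 1) * loss_factor * p[n + 1]
            new[n] = p[n] + dt * (flow_in - (gain_out + loss_out) * p[n])
        p = new
    return p


# Hypothetical parameters for the "random excision + selective loss" model.
rH, s, delta_f = 0.5, 10.0, 0.05
p_stationary = integrate_master_equation(rH, math.exp(-s * delta_f))
mean_n = sum(n * pn for n, pn in enumerate(p_stationary))
```

Without random copy (`c = 0`) this is a birth-death process with constant gain and linear loss, whose stationary law is Poisson with mean $rH\,e^{s\Delta f}$: a codon the phage uses more than its host ($\Delta f > 0$) ends up with more tRNA copies than random uptake alone would give.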
Selection is significant even relative to random hosts