© Genetics of Bacterial Genomes http://www.pasteur.fr/recherche/unites/REG [email protected]
Symplectic biology: the cell as a living computer
Authors
in silico: Gang Fang, Etienne Larsabal, Géraldine Pascal, Eduardo Rocha, Ivan Moszer, Claudine Médigue
in vivo / in vitro: Agnieszka Sekowska, Anne Marie Gilles, Octavian Barzu
Collective: Stanislas Noria
Background
Physics: matter, energy, time
Biology: physics + information, coding, control...
Arithmetic: strings of whole numbers, recursion, coding…
Computing: arithmetic + program + machine...
Information Transfer
Just as building a machine requires a book of plans, building a cell requires a book of recipes
This requires turning the text of the recipe into something concrete: this is what transfers « information »
In a cell, information transfer is managed by the genetic program
What is Life?
Three processes are needed for Life:
Information transfer (Living Computers?) => the goal of genomics is to decipher the blueprint of the “read-only” memory of the machine
Metabolism (internal organisation) and compartmentalisation (general structure): the driving force for a coupling between the structure of the genome and the structure of the cell
What is computing?
Two processes are needed for computing:
A read/write machine
A program on a physical support (typically a tape, which illustrates the sequential string of symbols that makes up the program), split (in practice) into two entities:
Program (providing the goal)
Data (providing the context)
The machine is distinct from the program
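The separation between machine, program and data can be illustrated with a minimal Turing-style read/write machine. This is an illustrative sketch, not taken from the slides: `run_machine`, the transition-table format and the `COMPLEMENT` program are hypothetical. The same `run_machine` function (the machine) executes any transition table (the program) on any tape (the data).

```python
from typing import Dict, Tuple

# A program is a transition table: (state, symbol read) -> (symbol to write, move, next state).
# Moves are -1 (left) or +1 (right); the state "halt" stops the machine.
Program = Dict[Tuple[str, str], Tuple[str, int, str]]


def run_machine(program: Program, tape: str, state: str = "start",
                blank: str = "_", max_steps: int = 10_000) -> str:
    """Generic read/write machine: it executes any program without 'knowing' what it does."""
    cells = dict(enumerate(tape))  # sparse tape, extended on demand
    head = 0
    steps = 0
    while state != "halt":
        if steps >= max_steps:
            raise RuntimeError("machine did not halt within max_steps")
        symbol = cells.get(head, blank)
        write, move, state = program[(state, symbol)]
        cells[head] = write
        head += move
        steps += 1
    if not cells:
        return ""
    # Read back the visited stretch of tape, trimming blanks at both ends.
    return "".join(cells.get(i, blank) for i in range(min(cells), max(cells) + 1)).strip(blank)


# Hypothetical example program: write the complement of a DNA string.
COMPLEMENT: Program = {
    ("start", "A"): ("T", +1, "start"),
    ("start", "T"): ("A", +1, "start"),
    ("start", "G"): ("C", +1, "start"),
    ("start", "C"): ("G", +1, "start"),
    ("start", "_"): ("_", +1, "halt"),
}

print(run_machine(COMPLEMENT, "ATGC"))  # TACG
```

Swapping `COMPLEMENT` for another table changes what is computed without touching `run_machine`, which is the point the slide makes about the machine being distinct from the program.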
Cells as computers
Genomics rests on an alphabetic metaphor: a text written with a four-letter alphabet, acting as a program
Conjecture: do cells behave as computers?
Genetic engineering, viruses, horizontal gene transfer and the cloning of animal cells
all point to a separation between the Machine and the Data + Program
If the machine has not only to behave as a computer but also has to construct the machine itself, one must find an image of the machine somewhere in the machine (J. von Neumann)
Is there a map of the cell in the chromosome?
A. Danchin The Delphic Boat. What genomes tell us (2003) Harvard University Press
Genome organisation
Is gene order in chromosomes random?
At first sight, despite differing DNA management processes, little is conserved, and genes transferred from other organisms are distributed throughout genomes
However, groups of genes such as operons or pathogenicity islands tend to cluster in specific places, and they code for proteins with common functions
First question: how are repeats generated, and where are they located in the genome sequence?
Caveat: repeats are meaningful
Remember also:
This clock has a minute minute hand
Repeats in bacteria
Abscissa: position of the first occurrence of the repeat; ordinate: position of the second occurrence
Points near the diagonal: the two copies of the repeat are located close to each other
DNA management: Repeats in genomes
[Figure: dot plots of repeats (position of the first copy vs position of the second copy) in Escherichia coli, Haemophilus influenzae, Methanococcus jannaschii, Mycoplasma genitalium, Mycoplasma pneumoniae, Helicobacter pylori, Bacillus subtilis and Methanobacterium thermoautotrophicum. Panel annotations: NR = 397, NT = 283; NR = 170, NT = 54; NR = 204, NT = 111; NR = 139, NT = 82; NR = 260, NT = 187; NR = 552, NT = 250; NR = 183, NT = 75; NR = 280, NT = 137.]
E. Rocha, A. Viari & A. Danchin Analysis of long repeats in bacterial genomes reveals alternative evolutionary mechanisms in Bacillus subtilis and other competent prokaryotes. Mol. Biol. Evol. (1999) 16: 1219-1230
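The coordinates plotted in such dot plots can be collected, in the simplest case, by indexing every k-mer of the sequence and reporting pairs of positions that share the same k-mer. The sketch below is an illustrative exact-match approach with a hypothetical `k` parameter and hypothetical function names; it is not the method of Rocha, Viari & Danchin, whose analysis dealt with long, possibly inexact repeats and with both strands.

```python
from collections import defaultdict
from itertools import combinations
from typing import List, Tuple


def repeat_pairs(seq: str, k: int) -> List[Tuple[int, int]]:
    """(first, second) start positions of every pair of identical k-mers.

    Only exact, direct (same-strand) repeats are found; a repeat longer than k
    is reported once per shifted k-mer it contains.
    """
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    pairs = []
    for occ in positions.values():
        pairs.extend(combinations(occ, 2))  # occ is increasing, so first < second
    return sorted(pairs)


def near_diagonal(pairs: List[Tuple[int, int]], max_gap: int) -> List[Tuple[int, int]]:
    """Pairs whose two copies are close to each other (points near the diagonal)."""
    return [(a, b) for a, b in pairs if b - a <= max_gap]


seq = "ACGTTGCA" + "GATTACAGATTACA" + "CCGG"
print(repeat_pairs(seq, 7))  # [(6, 13), (7, 14), (8, 15)]
```

Each returned pair is one point of a dot plot (abscissa: first copy, ordinate: second copy); here the 8-nucleotide repeat CAGATTAC shows up as three consecutive 7-mer hits, all at the same distance from the diagonal.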
Genome organisation
The genome organisation is so rigid that the overall result of selection pressure on DNA is visible in the genome text, which is full of « flexible patterns of class A »
The period 10-11.5
The genome of Helicobacter pylori displays a period of 11 over regions spanning 60 nucleotides
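A period of this kind can be detected, for instance, by counting how often a dinucleotide recurs at each distance d. The sketch below is an illustration under assumptions, not the method behind the slide: the `AA` motif, the function names and the `lo` cutoff are hypothetical choices. An excess of pairs at d of about 11 over neighbouring distances, within windows of about 60 nucleotides as in H. pylori, would reveal the period.

```python
from typing import Dict


def distance_profile(seq: str, motif: str = "AA", max_d: int = 30) -> Dict[int, int]:
    """For d = 1..max_d, count pairs of motif occurrences whose start positions are exactly d apart."""
    starts = {i for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i)}
    return {d: sum(1 for i in starts if i + d in starts) for d in range(1, max_d + 1)}


def best_period(counts: Dict[int, int], lo: int = 5) -> int:
    """Distance >= lo with the highest count (very short distances mostly reflect runs such as AAAA)."""
    return max((d for d in counts if d >= lo), key=lambda d: counts[d])


# Synthetic test sequence (hypothetical): one AA every 11 nucleotides.
unit = "AA" + "GCGCGCGCG"          # 11 nucleotides
profile = distance_profile(unit * 6)
print(profile[11], best_period(profile))  # 5 11
```

On real sequences the profile would be computed in sliding windows and compared with a shuffled-sequence baseline, since raw counts also depend on local base composition.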
Class A flexible patterns
The period 10-11.5 is explained by the presence of omnipresent patterns: the class A flexible patterns
Class A flexible patterns are ubiquitous
A universal rule: class A flexible patterns
The flexible nature of these patterns allows DNA to accommodate supercoiling or local bending
Genome organisation
The genome organisation is so rigid that the overall result of selection pressure on DNA is visible in the genome text, where the constraints of replication show up as differences between the leading and the lagging strands
E. Rocha
Different “Operating Systems”?
[Figure: circular chromosome maps (0°, 90°, 180°, 270°; Ori and Ter marked) showing CDS density and leading-strand CDS density. Genes on the leading strand: Escherichia coli 55%, Treponema pallidum 65%, Bacillus subtilis 75%, Thermoanaerobacter tengcongensis 87%. Updated from Kunst et al., Nature, 1997.]
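The percentage of genes on the leading strand can be computed from gene coordinates once the origin and terminus are known: a gene is on the leading strand when it is transcribed in the same direction as the replication fork that copies it. A minimal sketch, assuming a circular chromosome, '+' genes transcribed towards increasing coordinates, and one fork moving clockwise from Ori to Ter while the other moves anticlockwise; the function names and toy coordinates are hypothetical.

```python
from typing import Iterable, Tuple


def on_leading_strand(pos: int, strand: str, ori: int, ter: int, length: int) -> bool:
    """True if a gene at `pos` on `strand` ('+' or '-') is transcribed in the
    direction in which its replication fork moves.

    Assumptions: circular chromosome, '+' genes transcribed towards increasing
    coordinates, one fork moving clockwise from ori to ter, the other anticlockwise.
    """
    in_clockwise_replichore = (pos - ori) % length < (ter - ori) % length
    return (strand == "+") == in_clockwise_replichore


def leading_fraction(genes: Iterable[Tuple[int, str]], ori: int, ter: int, length: int) -> float:
    """Fraction of (position, strand) genes lying on the leading strand."""
    genes = list(genes)
    return sum(on_leading_strand(p, s, ori, ter, length) for p, s in genes) / len(genes)


# Hypothetical 1 Mb chromosome with Ori at 0 and Ter at 500 kb.
toy = [(100_000, "+"), (600_000, "-"), (200_000, "-"), (700_000, "+")]
print(leading_fraction(toy, ori=0, ter=500_000, length=1_000_000))  # 0.5
```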
To lag or to lead...
Choosing an origin of replication arbitrarily, together with a property of the strand (base composition, codon composition, codon usage, amino acid composition of the encoded protein…), one can use discriminant analysis to test whether that property separates the leading from the lagging strand.
E. Rocha, A. Danchin & A. Viari Universal replication biases in bacteria. Mol. Microbiol. (1999) 32: 11-16
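The discriminant-analysis test can be sketched for a single descriptor. Assumptions, none of them from the slides: each gene carries one numeric descriptor (hypothetically a composition skew), the terminus is placed diametrically opposite each candidate origin, and the discriminant is the one-dimensional linear one with equal priors, whose decision threshold lies halfway between the two class means. The published analysis used multivariate descriptors (bases, codons, amino acids, dinucleotides). Because moving the candidate origin by half a genome simply swaps the two labels, this sketch scores both positions identically.

```python
from statistics import mean
from typing import Dict, List, Tuple

# (position, strand '+'/'-', descriptor value); the descriptor is hypothetical,
# e.g. a per-gene composition skew.
Gene = Tuple[int, str, float]


def is_leading(pos: int, strand: str, ori: int, length: int) -> bool:
    # Assumption: the terminus sits diametrically opposite the candidate origin.
    in_clockwise_replichore = (pos - ori) % length < length / 2
    return (strand == "+") == in_clockwise_replichore


def accuracy_for_origin(genes: List[Gene], ori: int, length: int) -> float:
    """Resubstitution accuracy of a one-descriptor linear discriminant
    (equal priors: threshold halfway between the two class means)."""
    labels = [is_leading(p, s, ori, length) for p, s, _ in genes]
    lead = [x for (_, _, x), lab in zip(genes, labels) if lab]
    lag = [x for (_, _, x), lab in zip(genes, labels) if not lab]
    if not lead or not lag:
        return float("nan")  # one class is empty: no discriminant can be built
    m_lead, m_lag = mean(lead), mean(lag)
    threshold = (m_lead + m_lag) / 2
    # Predict "leading" on the side of the threshold where the leading mean lies.
    predictions = [(x > threshold) == (m_lead > m_lag) for _, _, x in genes]
    return sum(p == lab for p, lab in zip(predictions, labels)) / len(genes)


def accuracy_profile(genes: List[Gene], length: int, step: int) -> Dict[int, float]:
    """Accuracy for each candidate origin position 0, step, 2*step, ..."""
    return {ori: accuracy_for_origin(genes, ori, length) for ori in range(0, length, step)}


# Toy data (hypothetical): true origin at 0, descriptor +1 on leading genes, -1 on lagging ones.
LENGTH = 1000
toy_genes = []
for i, pos in enumerate(range(50, LENGTH, 100)):
    strand = "+" if i % 2 == 0 else "-"
    toy_genes.append((pos, strand, 1.0 if is_leading(pos, strand, 0, LENGTH) else -1.0))
print(accuracy_profile(toy_genes, LENGTH, 250))
```

Accuracy peaks when the candidate origin coincides with the true one (and, by symmetry, half a genome away), mirroring the accuracy-versus-position curves on the next slide.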
To lag or to lead, that is the question
[Figure: accuracy of the discriminant analysis (y axis) as a function of the position (%) of the chosen origin along the chromosome (x axis), for Bacillus subtilis, Borrelia burgdorferi, Chlamydia trachomatis, Escherichia coli, Haemophilus influenzae, Helicobacter pylori, Methanobacterium thermoautotrophicum, Mycobacterium tuberculosis and Treponema pallidum; descriptors: bases, amino acids, codons and dinucleotides.]
Visible even in proteins…
Essentiality in B. subtilis
[Figure: percentage (0 to 100%) of genes on the leading and lagging strands for essential and non-essential genes, each split into highly expressed and non-highly expressed genes.]
Gene persistence
Some of the genes missing from the list of persistent genes have diverged considerably. To assess the contribution of this effect, we measured for each pair of genomes the correlation between the similarity of orthologous pairs and that of the 16S rRNA. As expected, the correlations were high. For example (Figure A), 38% (resp. 48%) of B. subtilis (resp. E. coli) persistent genes showed a correlation coefficient >0.9 between the sequence similarity of the pair of orthologs and that of the 16S rRNA. In contrast, some genes (Figure B) evolve erratically. This may be due to horizontal gene transfer, to local adaptations leading to a faster or slower evolutionary pace, or simply to wrong assignments of orthology. The latter can be a significant problem, especially in large protein families. However, genes presenting such an erratic pattern are rare in the persistent set.
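The correlation described above pairs, for each genome pair, the similarity of a gene's orthologs with the similarity of the 16S rRNA. A minimal sketch, assuming similarities have already been computed and arranged as one value per genome pair; the data layout, function names and values are hypothetical, and Pearson's coefficient is assumed as the correlation measure.

```python
from math import sqrt
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fraction_tracking_16s(ortholog_sim: Dict[str, Sequence[float]],
                          rrna_sim: Sequence[float], cutoff: float = 0.9) -> float:
    """Fraction of genes whose ortholog similarity, across genome pairs,
    correlates with the 16S rRNA similarity above `cutoff`."""
    good = sum(pearson(sims, rrna_sim) > cutoff for sims in ortholog_sim.values())
    return good / len(ortholog_sim)


# Hypothetical similarities for four genome pairs.
rrna = [0.99, 0.95, 0.90, 0.85]
orthologs = {
    "geneA": [0.90, 0.80, 0.70, 0.60],   # tracks the 16S rRNA
    "geneB": [0.50, 0.90, 0.40, 0.80],   # erratic
}
print(fraction_tracking_16s(orthologs, rrna))  # 0.5
```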
Replication-transcription conflicts
Transcription may proceed in the direction opposite to the movement of the replication fork. This will abort transcription, leading to truncated mRNA. If translated, truncated mRNA may lead to truncated proteins, which become dominant negative if they are part of complexes…
E.P.C. Rocha & A. Danchin Essentiality, not expressiveness, drives gene-strand bias in bacteria. Nature Genetics (2003) 34: 377-378
When polymerases collide
Co-oriented: DNAP deceleration; end of transcription
Consequences: 1. Replication slow-down 2. Loss of transcripts
Head-on: arrest of RNAP & DNAP; transcription abortion
Consequences: 1. Aborted transcripts 2. Truncated essential proteins
E. Rocha
Distribution of highly expressed genes
Highly expressed genes cluster near the origin in fast-growing bacteria
[Figure: percentage (0 to 70%) of highly expressed genes located in the origin, middle and terminus regions of the chromosome (Ori to Ter) for C. crescentus, M. tuberculosis, E. coli and B. subtilis, grouped into fast growers and slow growers.]
E. Rocha
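Assigning genes to origin, middle and terminus regions can be sketched by splitting each replichore into equal thirds. The equal-thirds split, the coordinate conventions and the function names are assumptions; the slide does not say how the regions were defined.

```python
from collections import Counter
from typing import Dict, Iterable


def region(pos: int, ori: int, ter: int, length: int) -> str:
    """'origin', 'middle' or 'terminus' third of the replichore containing `pos`.

    Assumes a circular chromosome; each replichore runs from ori to ter,
    one clockwise and one anticlockwise.
    """
    cw_gene = (pos - ori) % length
    cw_ter = (ter - ori) % length
    if cw_gene <= cw_ter:                                 # clockwise replichore
        frac = cw_gene / cw_ter
    else:                                                 # anticlockwise replichore
        frac = ((ori - pos) % length) / (length - cw_ter)
    if frac < 1 / 3:
        return "origin"
    if frac < 2 / 3:
        return "middle"
    return "terminus"


def region_percentages(positions: Iterable[int], ori: int, ter: int, length: int) -> Dict[str, float]:
    """Percentage of the given gene positions falling in each region."""
    counts = Counter(region(p, ori, ter, length) for p in positions)
    total = sum(counts.values())
    return {r: 100 * counts[r] / total for r in ("origin", "middle", "terminus")}


# Hypothetical positions of highly expressed genes on a 1000-unit chromosome.
print(region_percentages([100, 900, 250, 450], ori=0, ter=500, length=1000))
# {'origin': 50.0, 'middle': 25.0, 'terminus': 25.0}
```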
Symplectic biology: The Delphic Boat
Genes do not operate in isolation. Proteins are parts of complexes, as parts are in an engine. It is important to understand their relationships, like those between the planks that make up a boat
The Delphic Boat: Harvard University Press, February 2003
Gene vicinity: synteny
C. Médigue
Multivariate Analyses
In contrast to standard genetics, genomics analyses large collections of genes and gene products.
Multivariate analyses try to extract information by reducing the number of relevant descriptors of the objects of interest.
Principal Component Analysis uses data centred on the average and a simple Euclidean distance (identity metric); it is the reference method.
Correspondence Analysis belongs to the same family, but it uses the χ² measure as a distance. This allows the user not only to work with highly heterogeneous objects but also to work simultaneously on the space of objects and on the space of descriptors.
Independent Component Analysis exploits the non-Gaussian character of the values associated with descriptors.
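The χ² distance of Correspondence Analysis compares row profiles (each row divided by its total) and weights each descriptor by the inverse of its overall frequency, which is why objects of very different sizes can be compared. A minimal sketch of that distance alone, not of the full analysis; the function name and the toy table are hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def chi2_distance(table: List[Sequence[float]], i: int, k: int) -> float:
    """Chi-square distance between rows i and k of a non-negative table
    (e.g. genes x codon counts), as used in correspondence analysis."""
    total = sum(sum(row) for row in table)
    n_cols = len(table[0])
    # Column masses: share of each descriptor in the whole table.
    col_mass = [sum(row[j] for row in table) / total for j in range(n_cols)]
    row_i, row_k = sum(table[i]), sum(table[k])
    d2 = 0.0
    for j in range(n_cols):
        if col_mass[j] == 0:
            continue  # a descriptor absent from the whole table carries no information
        # Compare row profiles (counts divided by row totals), weighted by 1 / column mass.
        d2 += (table[i][j] / row_i - table[k][j] / row_k) ** 2 / col_mass[j]
    return sqrt(d2)


# Hypothetical genes x codons counts: gene 1 has the same profile as gene 0 but is half as long.
counts = [[2, 4, 6], [1, 2, 3], [3, 1, 0]]
print(chi2_distance(counts, 0, 1))  # 0.0
```

A Euclidean distance on the raw counts would separate genes 0 and 1 simply because one is longer; the χ² distance treats them as identical in composition.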
Neighbourhood: distribution of amino acids in the proteome
G. Pascal
Bias in amino acid distribution
Universal biases in protein amino acid composition
First axis: separates Integral Inner Membrane Proteins (IIMP) from the rest; driven by the opposition between charged and large hydrophobic residues
Second axis: separates proteins according to an opposition driven by the G+C content of the first codon position
Third axis: separates proteins by their content in aromatic amino acids; enriched in orphan proteins
The “gluons”
There is a bias oriented by aromatic residues in all genomes
Among proteins of the same size, this bias opposes ribosomal proteins to orphan proteins
Hypothesis: orphans are “self”-specific proteins that stabilise complexes; they act as “gluons”
Temperature-dependent biases in protein amino acid composition
The amino acid composition of proteins depends heavily on phylogeny => organisms related to each other must be compared
The general trend of the amino acid composition bias is to avoid some amino acids at higher temperatures
Mesophilic bacteria belong to at least two different classes (in a 5-cluster analysis)
Biases are always dominated by the IIMP clustering
Temperature-dependent amino acid biases
Codon usage biases
20 amino acids, 61 codons
Study of the genes in the codon space, using Correspondence Analysis (χ² measure)
At least three classes of genes, including one corresponding to horizontal transfer
C. Médigue, T. Rouxel, P. Vigier, A. Hénaut & A. Danchin Evidence for horizontal gene transfer in Escherichia coli speciation. J. Mol. Biol. (1991) 222: 851-856
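Placing genes in the codon space starts with one row of codon counts per gene; stacking such rows gives the genes × codons table on which Correspondence Analysis operates. A minimal sketch of that first step only, assuming in-frame coding sequences and the stop codons of the standard code; the function name is hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict

# Stop codons of the standard code; the bacterial code (NCBI table 11) has the same ones.
# Mycoplasma read TGA as tryptophan and would need a different set.
STOP_CODONS = {"TAA", "TAG", "TGA"}
SENSE_CODONS = [c for c in ("".join(p) for p in product("TCAG", repeat=3)) if c not in STOP_CODONS]


def codon_counts(cds: str) -> Dict[str, int]:
    """Counts of the 61 sense codons in an in-frame coding sequence.

    Stop codons, codons containing ambiguous bases and an incomplete trailing
    codon are ignored.
    """
    cds = cds.upper()
    seen = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    return {codon: seen[codon] for codon in SENSE_CODONS}


row = codon_counts("ATGAAAAAGTAA")
print(len(SENSE_CODONS), sum(row.values()))  # 61 3
```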
Gene exchange
Class I: core metabolism of the cell
Class II: genes expressed at a high level under exponential growth conditions
Class III: horizontally exchanged genes
Codon usage, organisation and evolution of the B. subtilis genome (Moszer, 1998)
Correspondence analysis
Classification: highly expressed; atypical / HGT; others
The cell organizers
It is too early to understand the selection pressures that organise the cell architecture. However, at least in bacteria, gases and highly reactive chemical radicals probably play a major role. Most of the corresponding genes are still unknown….
Selection pressure for organisation: oxido-reduction
Sulfur undergoes oxido-reduction reactions with oxidation states ranging from -2 to +6
Incorporation of sulfur into metabolism usually requires reduction to the gaseous form H₂S
H₂S is highly reactive, in particular towards dioxygen => these two gases, despite their diffusion properties, must be kept apart as much as possible
Sulfur scavenging is energy-costly => sulfur-containing molecules have to be recycled
A. Sekowska, H-F. Kung & A. Danchin Sulfur metabolism in Escherichia coli and related bacteria, facts and fiction. J. Mol. Microbiol. Biotechnol. (2000) 2: 145-177
Sulfur metabolism: an unexpected organiser of the cell’s architecture
• Sulfur metabolism-related proteins are more acidic (average pI 6.5) than bulk proteins (richer in Asp and Glu); they are poor in serine residues
• They are significantly poor in sulfur-containing amino acids
• Their genes are very poor in the codons ATA, AGA and TCA
• There are almost no class III (horizontal transfer) genes in this class (only 2 in 150 genes)
• => sulfur-metabolism genes are ancestral and may form a core structure for the E. coli genome
Proximity in the chromosome: Sulphur islands
E.P.C. Rocha, A. Sekowska & A. Danchin Sulfur islands in the Escherichia coli genome: markers of the cell's architecture? FEBS Lett. (2000) 476: 8-11
The error catastrophe
Sequence similarity leads to functional inference
Because pre-existing structures are recruited, there is often no obvious link between a structure and a function (a book can serve as a paperweight)
Hence annotation errors propagate: ykrS (mtnA), annotated as « translation factor », is a component of sulfur metabolism!
A. Sekowska, V. Dénervaud, H. Ashida, K. Michoud, D. Haas, A. Yokota & A. Danchin Bacterial variations on the methionine salvage pathway. BMC Microbiol. (2004) 4: 9
A new metabolic pathway
A. Sekowska
Just so story: proximity in the genome
Escherichia coli: cmk (mssA) rpsA
Bacillus subtilis: cmk ypfD (no rpsA!!!)
Haemophilus influenzae: cmk rpsA
Sinorhizobium meliloti: cmk rpsA
The pyrimidine diphosphate paradox
OMP → UMP → UDP → UTP → CTP
In order to make deoxyribonucleotides, the cell uses ribonucleoside diphosphates, not triphosphates:
NDP → dNDP → dNTP (NDR: ribonucleoside diphosphate reductase; NDK: nucleoside diphosphate kinase)
And here is the paradox: this pathway makes no CDP!!!
How is the paradox resolved?
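[Diagram: OMP → UMP → UDP → UTP → CTP; CTP is incorporated into mRNA; mRNA degradation by RNases releases CMP, which Cmk phosphorylates to CDP, while PNPase releases CDP directly; CDP → dCDP → DNA.]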
Phylogenetic neighbours: the S1 box
• rpsA codes for ribosomal protein S1, which contains the S1 box (PROSITE PS50126). Many other proteins contain a similar box: polynucleotide phosphorylase, RNases E, G and R, RNA helicases, etc.
• Protein RegB of bacteriophage T4, associated with S1, cuts mRNA at GAGG motifs.
• S1 is a subunit of bacteriophage Qβ replicase…
=> All this points to a function for S1 in RNA metabolism
Codon usage bias neighbours
Gene: comment
bla, cat, dicB, lpp, ompA: long mRNA turnover
pyrF: pyrimidine metabolism
hflB, ftsH, mrsACFlpp: cell architecture
nusA, pcnB, metY, pnp, rna, rnb, rnc, rne/ams, rng, rph: RNA maturation and turnover
trxA: oxido-reduction, subunit of T7 replicase, needed for synthesis of deoxyribonucleotides
Protein complexes: the Degradosome
Components: PNPase, Poly(A) polymerase, RNase E, S1, polyphosphate kinase, enolase
Functions: mRNA degradation; CDP for de novo DNA synthesis; GDP for the recycling of GTP for carbohydrate secretion
GDP + PEP → GTP (NDK + PYK)
Just so story: the cmk rpsA operon
Escherichia coli: cmk (mssA) rpsA
Conclusion: the function of the cmk rpsA operon is to make CDP for DNA synthesis
mssA was discovered as a suppressor of smbA (pyrH), itself a suppressor of mukB (MukB: a myosin-like protein involved in chromosome segregation) => DNA synthesis is involved in the function.
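Selection pressure for compartmentalisation: a dangerous intermediate
[Diagram: OMP → UMP → UDP → UTP → CTP; UDP → dUDP → dUTP; dUTP can be incorporated into DNA or hydrolysed to dUMP + PPi; dUMP → dTMP → dTDP → dTTP → DNA.]
Uridylate kinase (UMK) is encoded by pyrH (smbA). No CDP: no DNA…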
S. Noria & A. Danchin Just so genome stories: what does my neighbor tell me? International Congress Series 1246, Elsevier Science (2002) 3-13
In conclusion: UMK must be compartmentalised
S. Landais, P. Gounon, C. Laurent-Winter, J.C. Mazié, A. Danchin, O. Barzu & H. Sakamoto Immunochemical analysis of UMP kinase from Escherichia coli. J. Bacteriol. (1999) 181: 833-840
A prediction: ribosome recycling and UTP
Escherichia coli: pyrH frr
Bacillus subtilis: pyrH frr
Photorhabdus luminescens: pyrH frr
This organisation is conserved in most Gram+ and Gram- bacteria. Why?
Ribosome recycling and UTP
frr codes for the ribosome recycling factor, which allows 70S ribosomes to split into 30S and 50S subunits. In polycistronic operons, the 70S ribosome can go on from one gene to the next without recycling (this requires formylation of the first methionine). At the end of the message, the ribosomes must recycle. This happens in a context where transcripts form stem-loops ending with a poly(U) sequence.
Conjecture: does UTP control the activity of Frr? Remember that one cannot speak of « concentrations » of molecules in a cell: 1 micromolar would mean 600 molecules. There are 20,000 ribosomes, therefore 1 mM means only 30 individual molecules in the immediate vicinity of each ribosome...
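The figures above follow from Avogadro's number once a cell volume is fixed. The slide does not state the volume; the sketch below assumes about 1 femtolitre (10⁻¹⁵ L), a typical E. coli value, and takes the ribosome count from the slide. The names are hypothetical.

```python
AVOGADRO = 6.022e23  # molecules per mole


def molecules(concentration_molar: float, volume_litres: float) -> float:
    """Number of molecules at a given molar concentration in a given volume."""
    return concentration_molar * volume_litres * AVOGADRO


CELL_VOLUME_L = 1e-15   # assumption: about 1 femtolitre, a typical E. coli cell
RIBOSOMES = 20_000      # figure given on the slide

per_cell_1uM = molecules(1e-6, CELL_VOLUME_L)                 # about 600 molecules
per_ribosome_1mM = molecules(1e-3, CELL_VOLUME_L) / RIBOSOMES  # about 30 molecules
print(round(per_cell_1uM), round(per_ribosome_1mM))  # 602 30
```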
Transcription termination
UUUUUUUUUU
At Rho-independent transcription termination sites, the messenger RNA ends with a run of U. This must lower the local availability of UTP….
This suggests Frr as a drug target, with UTP analogues as leads...
A preconceived ideology
Mathematics
Physics
Chemistry
Biology
Sociology
Molecular Biology
Structural Biology
Genetics and Genomics