© Genetics of Bacterial Genomes http://www.pasteur.fr/recherche/unites/REG [email protected]

Symplectic biology: the cell as a living computer

Authors

in silico: Gang Fang, Etienne Larsabal, Géraldine Pascal, Eduardo Rocha, Ivan Moszer, Claudine Médigue

in vivo / in vitro: Agnieszka Sekowska, Anne Marie Gilles, Octavian Barzu

Collective: Stanislas Noria

Background

Physics: matter, energy, time
Biology: Physics + information, coding, control...
Arithmetic: strings of whole numbers, recursion, coding…
Computing: Arithmetic + program + machine...

Information Transfer

As with building a machine, one needs a recipe book to build a cell

This requires turning the text of the recipe into something concrete: this transfers “information”

In a cell, information transfer is managed by the genetic program

What is Life?

Three processes are needed for Life:

Information transfer (Living Computers?) => the goal of genomics is to decipher the blueprint of the “read-only” memory of the machine

Driving force for a coupling between the genome structure and the structure of the cell:

Metabolism (Internal organisation)
Compartmentalisation (General structure)

What is computing?

Two processes are needed for computing:

A read/write machine

A program on a physical support (typically, a tape illustrates the sequential string of symbols that makes up the program), split (in practice) into two entities:

Program (providing the goal)
Data (providing the context)

The machine is distinct from the program

Cells as computers

Genomics rests on an alphabetic metaphor, that of a text written with a four-letter alphabet, acting as a program

Conjecture: do cells behave as computers?

Genetic engineering
Viruses
Horizontal gene transfer
Cloning animal cells

all point to a separation between
Machine
Data + Program

If the machine must not only behave as a computer but also construct the machine itself, one must find an image of the machine somewhere in the machine (J. von Neumann)

Is there a map of the cell in the chromosome?

A. Danchin The Delphic Boat. What genomes tell us (2003) Harvard University Press

Genome organisation

Is the gene order random in the chromosomes?

At first sight, despite different DNA management processes, not much is conserved, and genes transferred from other organisms are distributed throughout genomes

However, groups of genes such as operons or pathogenicity islands tend to cluster in specific places, and they code for proteins with common functions

First question: how are repeats generated, and where are they located in the genome sequence?

Caveat: Repeats are meaningful

Remember also:

This clock has a minute minute hand

Repeats in bacteria

Abscissa: first occurrence of the repeat
Ordinate: second position of the repeat

Diagonal: repeats are located near each other
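The coordinates of such a dot plot can be sketched as follows. This is only an illustration under simplifying assumptions: it reports exact, same-strand repeats of a fixed word length k, whereas the method actually used for the long repeats shown in the slides is not described here. The function name `repeat_pairs` and the toy sequence are hypothetical.

```python
from collections import defaultdict


def repeat_pairs(seq: str, k: int) -> list[tuple[int, int]]:
    """Return (first, second) start positions of every exact repeat of length k.

    Each pair is one point of the dot plot: x = earlier occurrence,
    y = later occurrence. Only direct (same-strand) repeats are reported.
    """
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)

    pairs = []
    for starts in positions.values():
        # Every pair of occurrences of the same word is a repeat.
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                pairs.append((starts[a], starts[b]))
    return sorted(pairs)


# Toy sequence (hypothetical): "GATTACA" occurs at positions 0 and 12.
toy = "GATTACACCCCCGATTACA"
points = repeat_pairs(toy, 7)
```

Points with a small difference between the two coordinates fall close to the diagonal, which is how nearby repeats show up in the plot.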

DNA management: Repeats in genomes

[Figure: dot plots of repeat positions (first occurrence against second occurrence) for Escherichia coli, Haemophilus influenzae, Methanococcus jannaschii, Mycoplasma genitalium, Mycoplasma pneumoniae, Helicobacter pylori, Bacillus subtilis and Methanobacterium thermoautotrophicum. Each panel carries an NR and an NT value; the extracted values are NR = 397, NT = 283; NR = 170, NT = 54; NR = 204, NT = 111; NR = 139, NT = 82; NR = 260, NT = 187; NR = 552, NT = 250; NR = 183, NT = 75; NR = 280, NT = 137. Their assignment to panels is not recoverable from the transcript.]

E. Rocha, A. Viari & A. Danchin Analysis of long repeats in bacterial genomes reveals alternative evolutionary mechanisms in Bacillus subtilis and other competent prokaryotes. Mol. Biol. Evol. (1999) 16: 1219-1230

Genome organisation

The genome organisation is so rigid that the overall result of selection pressure on DNA is visible in the genome text, which is full of “flexible patterns of class A”

The period 10-11.5

The genome of Helicobacter pylori displays a period of 11 over regions spanning 60 nucleotides
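One simple way to look for such a period is to count how often a short motif recurs at each distance. The sketch below is an assumption-laden illustration, not the analysis behind the slide: the motif ("AA"), the function name `lag_histogram` and the synthetic sequence are all hypothetical choices, and the 60-nucleotide window is taken from the text above.

```python
from collections import Counter


def lag_histogram(seq: str, motif: str, max_lag: int) -> Counter:
    """Count, for each distance d <= max_lag, the pairs of motif starts d apart."""
    starts = [i for i in range(len(seq) - len(motif) + 1)
              if seq.startswith(motif, i)]
    hist = Counter()
    for a, p in enumerate(starts):
        for q in starts[a + 1:]:
            d = q - p
            if d > max_lag:
                break  # starts are sorted, so later ones are even farther
            hist[d] += 1
    return hist


# Synthetic sequence (hypothetical): "AA" planted every 11 nucleotides.
synthetic = ("AA" + "CGCGCGCGC") * 20
hist = lag_histogram(synthetic, "AA", max_lag=60)
dominant_lag = max(hist, key=hist.get)
```

On a real genome the histogram would be noisy, and a period would appear as peaks at multiples of the period rather than as the only non-zero entries.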

Class A flexible patterns

The period 10-11.5 is explained by the presence of omnipresent patterns: the class A flexible patterns

Class A flexible patterns are ubiquitous

The period 10-11.5 is explained by the presence of omnipresent patterns: the class A flexible patterns

A universal rule: class A flexible patterns

The flexible nature of the patterns permits DNA to accommodate superturns or local bending

Genome organisation

The genome organisation is so rigid that the overall result of selection pressure on DNA is visible in the genome text, where the constraints of replication are visible in the leading and the lagging strand

Different “Operating Systems”?

[Figure (E. Rocha): circular chromosome maps from origin (Ori) to terminus (Ter), showing CDS density and leading-strand CDS density. Escherichia coli: 55% leading; Treponema pallidum: 65% leading; Bacillus subtilis: 75% leading; Thermoanaerobacter tengcongensis: 87% leading. Updated from Kunst et al., Nature, 97.]

To lag or to lead...

Choosing arbitrarily an origin of replication and a property of the strand (base composition, codon composition, codon usage, amino acid composition of the coded protein…), one can use discriminant analysis to see whether the hypothesis of a difference between leading and lagging strands holds.

E. Rocha, A. Danchin & A. Viari Universal replication biases in bacteria. Mol. Microbiol. (1999) 32: 11-16

To lag or to lead, that is the question

[Figure: accuracy (y-axis) against position (%) (x-axis), with one curve each for bases, amino acids, codons and dinucleotides. Panels: Bacillus subtilis, Borrelia burgdorferi, Chlamydia trachomatis, Escherichia coli, Haemophilus influenzae, Helicobacter pylori, Methanobacterium thermoautotrophicum, Mycobacterium tuberculosis, Treponema pallidum.]

Visible even in proteins…

Essentiality in B. subtilis

[Figure: proportion (0% to 100%) of genes on the leading and lagging strands, for essential genes and non-essential genes, each split into highly expressed and non-highly expressed.]

Gene persistence

Some of the genes missing from the list of persistent genes have diverged considerably. To assess the contribution of this effect, we measured, for each pair of genomes, the correlation between the similarity of orthologous pairs and that of the 16S rRNA. As expected, the correlations were high. For example (Figure A), 38% (resp. 48%) of B. subtilis (resp. E. coli) persistent genes showed a correlation coefficient >0.9 between the sequence similarity of the pair of orthologs and the 16S. In contrast, some genes (Figure B) evolve in an erratic way. This may be due to horizontal gene transfer, local adaptations leading to a faster or slower evolutionary pace, or simply wrong assignments of orthology. The latter can be a significant problem, especially in large protein families. However, the genes presenting such an erratic pattern are rare in the persistent set.
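The correlation test described above can be sketched as follows, assuming a plain Pearson coefficient (the text does not say which coefficient was used). The two lists hold made-up values, one per genome pair, and are not data from the slides; the names `pearson`, `rrna_16s_similarity` and `ortholog_similarity` are illustrative.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values, one entry per genome pair (not data from the slides).
rrna_16s_similarity = [0.99, 0.95, 0.90, 0.85, 0.80]
ortholog_similarity = [0.92, 0.81, 0.70, 0.62, 0.50]

r = pearson(rrna_16s_similarity, ortholog_similarity)
tracks_phylogeny = r > 0.9  # threshold quoted in the text
```

A gene whose ortholog similarity follows the 16S similarity across genome pairs passes the 0.9 threshold; an erratic gene would give a much lower coefficient.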

Replication-transcription conflicts

Transcription may proceed in the direction opposite to the movement of the replication fork
This will abort transcription, leading to truncated mRNA
If translated, truncated mRNA may lead to truncated proteins; these will become dominant negative if they are part of complexes…

E.P.C. Rocha & A. Danchin Essentiality, not expressiveness, drives gene-strand bias in bacteria. Nature Genetics (2003) 34 : 377-378
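Whether a gene is co-oriented with the replication fork (leading strand) or faces it head-on (lagging strand) follows from its position and strand. The sketch below assumes a circular chromosome replicated bidirectionally from a single origin to a single terminus; the function name `on_leading_strand` and the toy coordinates are hypothetical.

```python
def on_leading_strand(start: int, strand: str, ori: int, ter: int,
                      genome_len: int) -> bool:
    """True if a gene is transcribed in the direction of fork movement.

    Assumes a circular chromosome with bidirectional replication from
    `ori` to `ter`; `strand` is '+' (increasing coordinates) or '-'.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # Replichore in which the fork travels towards increasing coordinates.
    clockwise = (start - ori) % genome_len < (ter - ori) % genome_len
    return (strand == "+") == clockwise


# Toy chromosome (hypothetical): 1000 bp, origin at 0, terminus at 500.
genes = [("geneA", 100, "+"), ("geneB", 100, "-"),
         ("geneC", 700, "-"), ("geneD", 700, "+")]
leading = {name: on_leading_strand(pos, strand, ori=0, ter=500, genome_len=1000)
           for name, pos, strand in genes}
```

Counting the `True` entries over all genes of a genome gives a leading-strand percentage of the kind compared between species in this deck.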

When polymerases collide

[Figure (E. Rocha): co-oriented versus head-on collisions between DNA polymerase (DNAP) and RNA polymerase (RNAP). Labels: DNAP deceleration; end of transcription; arrest of RNAP & DNAP; transcription abortion.]

Consequences:
1. Replication slow-down
2. Loss of transcripts

Consequences:
1. Aborted transcripts
2. Truncated essential proteins

Distribution of highly expressed genes

Highly expressed genes cluster near the origin in fast-growing bacteria

[Figure (E. Rocha): percentages (0% to 70%) by chromosome region (Origin, Middle, Terminus; Ori to Ter) for C. crescentus, M. tuberculosis, E. coli and B. subtilis, grouped into fast growers and slow growers.]

Symplectic biology: The Delphic Boat

Genes do not operate in isolation
Proteins are part of complexes, as are parts in an engine
It is important to understand their relationships, like those between the planks that make a boat

The Delphic Boat: Harvard University Press, February 2003

Gene vicinity: synteny

C. Médigue

Multivariate Analyses

In contrast to standard genetics, genomics analyses large collections of genes and gene products.

Multivariate analyses try to extract information by reducing the number of relevant descriptors of the objects of interest.

Principal Component Analysis uses the centered average and a simple distance (identity); it is the reference method.

Correspondence Analysis belongs to the same family, but it uses the χ2 measure as a distance. This allows the user not only to work with highly heterogeneous objects but also to work simultaneously on the space of objects and on the space of descriptors.

Independent Component Analysis uses the non-Gaussian character of the values associated with descriptors
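The χ2 distance that Correspondence Analysis relies on can be illustrated on a small table of counts. This is a minimal sketch of the usual textbook definition (distance between row profiles, each column weighted by the inverse of its mass), assuming every column has a non-zero total; the function name `chi2_distance` and the example counts are hypothetical.

```python
def chi2_distance(table: list[list[float]], i: int, j: int) -> float:
    """Chi-square distance between the profiles of rows i and j.

    Assumes every column of `table` has a non-zero total.
    """
    grand_total = sum(sum(row) for row in table)
    col_mass = [sum(col) / grand_total for col in zip(*table)]
    row_i_total = sum(table[i])
    row_j_total = sum(table[j])
    squared = 0.0
    for n_i, n_j, mass in zip(table[i], table[j], col_mass):
        # Compare relative frequencies, down-weighting frequent columns.
        squared += (n_i / row_i_total - n_j / row_j_total) ** 2 / mass
    return squared ** 0.5


# Hypothetical counts: rows are objects, columns are descriptors.
counts = [
    [10, 20, 30],   # object A
    [20, 40, 60],   # object B: twice as large, same profile as A
    [30, 20, 10],   # object C: different profile
]
d_ab = chi2_distance(counts, 0, 1)
d_ac = chi2_distance(counts, 0, 2)
```

Objects A and B differ in size but not in profile, so their distance is zero. This is why the measure copes with heterogeneous objects such as genes of very different lengths.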

Neighbourhood: distribution of amino acids in the proteome

G. Pascal

Bias in amino acid distribution

Universal biases in protein amino acid composition

First axis: separates Integral Inner Membrane Proteins (IIMP) from the rest; driven by opposition between charged and large hydrophobic residues

Second axis: separates proteins according to an opposition driven by the G+C content of the first codon base

Third axis: separates proteins by their content in aromatic amino acids; enriched in orphan proteins

The “gluons”

There is an aromatic-residue-oriented bias in all genomes

Among proteins of the same size, this opposes ribosomal proteins to orphan proteins

Hypothesis: orphans are “self”-specific proteins that stabilise complexes; they act as “gluons”

Temperature-dependent biases in protein amino acid composition

The amino acid composition of proteins depends heavily on the phylogeny => need to compare organisms related to each other

The general trend of amino acid composition bias is to avoid some amino acids at higher temperatures

Mesophilic bacteria belong to at least two different classes (in a 5-cluster analysis)

Biases are always dominated by the IIMP clustering

Temperature-dependent amino acid biases

Codon usage biases

20 amino acids, 61 codons
Study of the genes in the codon space, using Correspondence Analysis (χ2 measure)
At least three classes of genes, including one corresponding to horizontal transfer

C. Médigue, T. Rouxel, P. Vigier, A. Hénaut & A. Danchin. Evidence for horizontal gene transfer in Escherichia coli speciation. J. Mol. Biol. (1991) 222 pp. 851-856
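Placing a gene "in the codon space" means describing it by a vector of 61 codon frequencies, one per sense codon. The sketch below assumes the standard genetic code (three stop codons) and an in-frame coding sequence; the function name `codon_frequencies` and the mini-gene are hypothetical. Vectors of this kind, one per gene, would then form the table submitted to Correspondence Analysis.

```python
from collections import Counter
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code
SENSE_CODONS = sorted("".join(c) for c in product("ACGT", repeat=3)
                      if "".join(c) not in STOP_CODONS)
_SENSE_SET = set(SENSE_CODONS)


def codon_frequencies(cds: str) -> dict[str, float]:
    """Relative frequency of each of the 61 sense codons in one gene."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = (cds[i:i + 3] for i in range(0, usable, 3))
    counts = Counter(c for c in codons if c in _SENSE_SET)
    total = sum(counts.values())
    return {c: (counts[c] / total if total else 0.0) for c in SENSE_CODONS}


# Hypothetical mini-gene: Met-Lys-Lys-stop.
vector = codon_frequencies("ATGAAAAAGTAA")
```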

Gene exchange

[Figure: three labelled groups of genes: core metabolism of the cell; genes expressed at a high level under exponential growth conditions; horizontally exchanged genes.]

Class I: core metabolism

Class II: high expression in exponential growth

Class III: horizontal transfer

Codon usage, organisation and evolution of the B. subtilis genome

(Moszer, 98)

Correspondence analysis

Classification

Highly expressed
Atypical / HGT
Others

The cell organizers

It is too early to understand the selection pressures that organize the cell architecture. However, at least in bacteria, gases and highly reactive chemical radicals probably play a major role. Most of the corresponding genes are still unknown…

Selection pressure for organisation: oxido-reduction

Sulfur undergoes oxido-reduction reactions spanning oxidation states from -2 to +6
Incorporation of sulfur into metabolism usually requires reduction to the gaseous form H2S
H2S is highly reactive, in particular towards dioxygen
=> These two gases, despite their diffusion properties, must be kept separate as much as possible
Sulfur scavenging is energy-costly
=> Sulfur-containing molecules have to be recycled

A. Sekowska, H-F. Kung & A. Danchin Sulfur metabolism in Escherichia coli and related bacteria, facts and fiction. J. Mol. Microbiol. Biotechnol. (2000) 2: 145-177

Sulfur metabolism: an unexpected organiser of the cell's architecture

• Sulfur metabolism-related proteins are more acidic (average pI 6.5) than bulk proteins (richer in Asp and Glu), and they are poor in serine residues

• They are significantly poor in sulfur-containing amino acids

• Their genes are very poor in codons ATA, AGA and TCA

• There are almost no class III (horizontal transfer) genes in the class (only 2 in 150 genes)

• => sulfur-metabolism genes are ancestral and may form a core structure for the E. coli genome

Proximity in the chromosome
Sulphur islands

E.P.C. Rocha, A. Sekowska & A. Danchin Sulfur islands in the Escherichia coli genome: markers of the cell's architecture? FEBS Lett. (2000) 476: 8-11

The error catastrophe

Similarity in sequence leads to functional inference

Because of recruitment of pre-existing structures, there is often no obvious link between a structure and a function (the book-paperweight)

Hence a propagation of annotation errors

ykrS (mtnA), annotated as “translation factor”, is a component of sulfur metabolism!

A Sekowska, V Dénervaud, H Ashida, K Michoud, D Haas, A Yokota, A Danchin Bacterial variations on the methionine salvage pathway BMC Microbiol (2004) 4: 9

A new metabolic pathway

A. Sekowska

Just so story: proximity in the genome

Escherichia coli: cmk (mssA) rpsA
Bacillus subtilis: cmk ypfD (no rpsA !!!)
Haemophilus influenzae: cmk rpsA
Sinorhizobium meliloti: cmk rpsA

The pyrimidine diphosphate paradox

OMP → UMP → UDP → UTP → CTP

In order to make deoxyribonucleotides, the cell uses ribonucleoside diphosphates, not triphosphates

NDP → dNDP → dNTP (enzymes: NDR, then NDK)

And here is the paradox: no CDP !!! (CTP is made directly from UTP)

How is the paradox resolved?

[Diagram: OMP → UMP → UDP → UTP → CTP → mRNA; RNases degrade mRNA to CMP, which Cmk converts to CDP; PNPase yields CDP from mRNA; CDP → dCDP → DNA.]

Phylogenetic neighbours: the S1 box

• rpsA codes for ribosomal protein S1. It contains the S1 box (PROSITE PS50126). Many other proteins contain a similar box: polynucleotide phosphorylase, RNases E, G and R, RNA helicases etc.
• protein RegB of bacteriophage T4, associated with S1, cuts mRNA at GAGG motifs.
• S1 is a subunit of bacteriophage Qβ replicase…

=> All this points to a function for S1 in RNA metabolism

Codon usage bias neighbours

Gene: Comment
bla, cat, dicB, lpp, ompA: long mRNA turnover
pyrF: pyrimidine metabolism
hflB, ftsH, mrsACF, lpp: cell architecture
nusA, pcnB, metY, pnp, rna, rnb, rnc, rne/ams, rng, rph: RNA maturation and turnover
trxA: oxido-reduction, subunit of T7 replicase, needed for synthesis of deoxyribonucleotides

Protein complexes: the Degradosome

PNPase
PolyA polymerase
RNase E
S1

Polyphosphate kinase

Enolase

mRNA degradation

CDP for de novo DNA synthesis

GDP recycling of GTP for carbohydrate secretion

GDP + PEP → GTP (NDK + PYK)

Just so story: the cmk rpsA operon

Escherichia coli: cmk (mssA) rpsA

Conclusion:
The function of the cmk rpsA operon is to make CDP for DNA synthesis

mssA was discovered as a suppressor of smbA (pyrH), itself a suppressor of MukB, a myosin-like protein involved in chromosome segregation
=> DNA synthesis is involved in the function.

Selection pressure for compartmentalisation: a dangerous intermediate

[Diagram: OMP → UMP → UDP → UTP → CTP, with uridylate kinase (UMK), pyrH (smbA), at the UMP → UDP step; UDP → dUDP → dUTP; dUTP → DNA, or dUTP → dUMP + PPi → dTMP → dTDP → dTTP → DNA.]

No CDP: no DNA…

S. Noria & A. Danchin Just so genome stories: what does my neighbor tell me? International Congress Series 1246 Elsevier Science (2002) 3-13

In conclusion:

UMK must be compartmentalised

S. Landais, P. Gounon, C. Laurent-Winter, J.C. Mazié, A. Danchin, O. Barzu & H. Sakamoto Immunochemical analysis of UMP kinase from Escherichia coli. J. Bacteriol. (1999) 181: 833-840

A prediction: ribosome recycling and UTP

Escherichia coli: pyrH frr
Bacillus subtilis: pyrH frr
Photorhabdus luminescens: pyrH frr

This organisation is conserved in most Gram+ and Gram- bacteria. Why?

Ribosome recycling and UTP

frr codes for the ribosome recycling factor, which allows 70S ribosomes to split into 30S and 50S subunits. In polycistronic operons, the 70S ribosome can go on from one gene to the next without recycling (this requires formylation of the first methionine). At the end of the message, the ribosomes must recycle. This happens in a context where transcripts form stem-loops, ending with a polyU sequence.

Conjecture: is UTP controlling the activity of Frr? Remember that one cannot speak of “concentrations” of molecules in a cell. 1 micromolar would mean 600 molecules per cell. There are 20,000 ribosomes, therefore 1 mM means only 30 individual molecules in the immediate vicinity of each ribosome...
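The orders of magnitude quoted above can be checked with a few lines of arithmetic. The cell volume is not given in the slides; the sketch assumes 1 femtolitre, a typical order of magnitude for a bacterial cell, because that value reproduces the figure of about 600 molecules at 1 micromolar. The names below are illustrative.

```python
AVOGADRO = 6.02214076e23       # molecules per mole
CELL_VOLUME_L = 1e-15          # assumed cell volume: 1 femtolitre
RIBOSOMES_PER_CELL = 20_000    # figure quoted in the text


def molecules_per_cell(molar: float, volume_l: float = CELL_VOLUME_L) -> float:
    """Number of molecules in one cell at a given molar concentration."""
    return molar * volume_l * AVOGADRO


at_1_uM = molecules_per_cell(1e-6)              # about 600 molecules
at_1_mM = molecules_per_cell(1e-3)              # about 600,000 molecules
per_ribosome = at_1_mM / RIBOSOMES_PER_CELL     # about 30 molecules
```

With a different assumed volume the counts scale proportionally, but the point stands: at these scales one is counting individual molecules rather than measuring a bulk concentration.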

Transcription termination

UUUUUUUUUU

At Rho-independent sites for termination of transcription, the messenger RNA ends with runs of U. This must lower the local availability of UTP…

This suggests Frr as a drug target, with analogs of UTP as leads...

A preconceived ideology

Mathematics

Physics

Chemistry

Biology

Sociology

Molecular Biology

Structural Biology

Genetics and Genomics

