Whole-genome biophysics, mutations, evolution, chromatin, … Konstantin Zeldovich x62354, LRB 1004.


In the previous lecture: protein structures and sequences are largely determined by physical chemistry

Ab initio paradigm: sequence + physics = structure (+function, hopefully)

Today: Are physical and chemical constraints discernible at the whole-proteome / whole-genome level?

- Constraints on amino acid usage in prokaryotes
  - Thermostability
  - Metabolic cost of protein synthesis
- Mutational robustness of proteins
- Evolution of protein stability
- The genetic code is nonrandom
- Large-scale structure of chromatin, 3C-like methods (J. Dekker lab)

Temperature ranges of modern life

Psychro-, meso-, thermo-, hyperthermophilic bacteria/archaea

-10°C (Antarctic ice, permafrost in Siberia and Canada): Colwellia spp., Psychrobacter spp.

+110°C (deep-sea hydrothermal vents, hot springs): Pyrococcus spp., Methanococcus spp.

Cold-blooded animals:
- Notothenia spp. (Antarctic fish): -1.8°C habitat, dies of overheating at +6°C (about 43°F)
- Desert iguana: up to +60°C

Simplest eukaryotes: up to ~60°C (nematode from hot springs)

Very few complete genomes!

>250 sequenced genomes

Is habitat temperature reflected in the genomes?

• What is presumably related to thermostability?
– G+C in DNA increases with temperature (wrong)
  • DNA stabilization by pairing
– Fraction of charges (DEKR) in proteins increases
  • Hydrophobic interactions weaken with temperature
– Fraction of polar residues decreases
  • ?

Existing knowledge

Limitations of the previous work: based on a few (dozen) individual proteins, or a limited number (~20) of completely sequenced genomes

Here: high-throughput analysis, 204 genomes

Zeldovich, Berezovsky, Shakhnovich, PLOS CB 2007

IVYWREL, or LIVEWYR

T_opt = 937 F_IVYWREL - 335 (°C), R = 0.93, RMSD of T_opt = 8.9°C
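As an illustration of how the linear fit above could be applied, the following minimal sketch computes the IVYWREL fraction of a proteome and plugs it into the quoted regression. The function names and the toy sequences are hypothetical; only the coefficients 937 and -335 come from the slide.

```python
from typing import Iterable

IVYWREL = set("IVYWREL")
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def ivywrel_fraction(proteins: Iterable[str]) -> float:
    """Fraction of I, V, Y, W, R, E, L among all standard residues of a proteome."""
    hits = 0
    total = 0
    for seq in proteins:
        for aa in seq.upper():
            if aa in STANDARD_AA:  # skip X, *, gaps and other non-standard symbols
                total += 1
                if aa in IVYWREL:
                    hits += 1
    if total == 0:
        raise ValueError("no standard amino acids found")
    return hits / total


def predicted_topt(f_ivywrel: float) -> float:
    """Linear fit quoted on the slide: T_opt = 937 * F - 335, in degrees Celsius."""
    return 937.0 * f_ivywrel - 335.0


# Hypothetical toy "proteome"; a real analysis would read all proteins of a genome.
toy_proteome = ["MKVLIYEERW", "GASTLLVVEE"]
f = ivywrel_fraction(toy_proteome)
print(f"F_IVYWREL = {f:.3f}, predicted T_opt = {predicted_topt(f):.1f} C")
```

With an RMSD of 8.9°C, the prediction is only meaningful as a coarse estimate, and a toy input of two short peptides is far outside the regime where the fit was obtained.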

86 genomes

Zeldovich, Berezovsky, Shakhnovich, PLOS CB 2007

Genomic DNA: any temperature, any GC content

Base pairing is not the bottleneck of thermal adaptation.

204 genomes

DNA adaptation via codon bias

[Figure panels: fraction of A+G; autocorrelation function of A, G]

Fractions of A, G nucleotides are changing with temperature

Thermal adaptation of proteins and DNA are independent processes.

Metabolic cost of protein synthesis

Akashi and Gojobori, PNAS 99:3695 (2002)

Starting from the same basic precursors, some amino acids are easy to synthesize, some are hard, and require more energy

Hypothesis: Energy (ATP) is the limiting factor in a.a. synthesis and thus survival.

Thus, highly expressed proteins must be made of “cheap” amino acids

A.a. cost can be deduced from pathway maps

Protein expression can be either measured, or inferred from codon usage (codon adaptation index)
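To make the hypothesis concrete, here is a minimal sketch of how the mean synthesis cost per residue could be computed for a protein once a per-amino-acid cost table is available. The cost values below are HYPOTHETICAL placeholders, not the values of Akashi and Gojobori; the function name and the two toy sequences are likewise assumptions for illustration.

```python
from typing import Mapping


def mean_cost_per_residue(seq: str, cost: Mapping[str, float]) -> float:
    """Average synthesis cost per residue; residues missing from `cost` are skipped."""
    costs = [cost[aa] for aa in seq.upper() if aa in cost]
    if not costs:
        raise ValueError("no residues with a known cost")
    return sum(costs) / len(costs)


# HYPOTHETICAL costs in arbitrary units, for illustration only.
# Real values would be deduced from metabolic pathway maps.
toy_cost = {"G": 1.0, "A": 1.0, "L": 3.0, "W": 7.0}

# Hypothetical sequences standing in for a highly and a weakly expressed protein.
proteins = {"highly_expressed": "GAGALGAA", "weakly_expressed": "WLLWAGLW"}
for name, seq in proteins.items():
    print(name, round(mean_cost_per_residue(seq, toy_cost), 2))
```

In a real test of the hypothesis, this per-protein cost would be correlated against measured or inferred expression across the whole proteome.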

Highly expressed proteins are “cheaper”

Akashi and Gojobori, PNAS 99:3695 (2002)

MCU rationale:
- Synonymous codons are used with different frequencies (codon bias)
- For some reason (translation efficiency?), codon bias is correlated with expression
- MCU can be calibrated using a few genes with known expression levels

Kanaya et al, Gene 238:143 (1999)
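The first step behind any codon-bias measure is tabulating how often each synonymous codon is used. The sketch below does only that, for a single coding sequence, using the standard genetic code; it is not the MCU or codon adaptation index calculation itself, and the function name and toy sequence are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (last base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_codon_fractions(cds: str) -> dict:
    """For each amino acid, the fraction of its occurrences encoded by each codon."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing incomplete codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    per_aa = defaultdict(dict)
    for codon, n in counts.items():
        aa = CODON_TABLE.get(codon)  # None for codons with ambiguous bases
        if aa is not None and aa != "*":
            per_aa[aa][codon] = n
    return {
        aa: {codon: n / sum(codons.values()) for codon, n in codons.items()}
        for aa, codons in per_aa.items()
    }


# Hypothetical toy coding sequence: Leu x4 (CTG x3, CTA x1), Lys x2 (AAA, AAG).
print(synonymous_codon_fractions("CTGCTGCTGCTAAAAAAG"))
```

A bias measure would then compare these fractions between a calibration set of genes with known expression and the gene of interest.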

Nowadays, direct measurements of expression are available (PROJECT!)

Possible effects of mutations

DNA
- Exon, nonsynonymous -> see “Protein”
- Exon, synonymous -> normally neutral
- Introns, regulatory sequences -> ???
  - Altered protein expression, localization, alternative splicing, …
- RNA coding regions -> changes in RNA structure/function
- Chromatin structure?

Protein (non-synonymous)
- Change of stability
- Possible misfolding or aggregation (-> neurodegenerative diseases)
- Altered interaction(s) with other protein(s) or small molecule(s)
- Altered function

Change of thermodynamic stability is among the easiest to comprehend.

Mutational robustness of proteins

Stability change upon mutation (ΔΔG): average = 1 kcal/mol (destabilizing), variance = 3 (kcal/mol)²
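As a rough illustration of what these two numbers imply, the sketch below assumes that the stability changes are normally distributed (an assumption made here for simplicity, not a statement from the slide) and evaluates the fraction of mutations above a given threshold. The function name is hypothetical.

```python
import math

MEAN = 1.0      # kcal/mol, average stability change quoted on the slide
VARIANCE = 3.0  # (kcal/mol)^2, variance quoted on the slide


def fraction_above(threshold: float, mean: float = MEAN, variance: float = VARIANCE) -> float:
    """P(ddG > threshold), ASSUMING a normal distribution with the given mean and variance."""
    z = (threshold - mean) / math.sqrt(variance)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Fraction of destabilizing mutations (ddG > 0) under the Gaussian assumption.
print(f"{fraction_above(0.0):.2f}")
```

Under this assumption roughly 70% of random mutations are destabilizing, which is why mutations act as a net drift toward lower stability.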

ProTherm database http://gibk26.bse.kyutech.ac.jp/jouhou/protherm/protherm.html

~2000 mutations, thermal & chemical unfolding

Kumar et al, NAR 2006
Zeldovich et al, PNAS 2007

ΔΔG prediction servers and tools

• FoldX
• PoPMuSiC
• MUPro
• CUPSAT
• Eris

(They are all trained on highly overlapping datasets, including ProTherm.)

More servers listed at http://www.gen2phen.org/wiki/protein-level-predictions-4-stability-changes-prediction

Can we translate this to the organism level?

- Mutations in a protein change its stability ΔG; they occur at cell replication
- Magnitudes of ΔΔG can be measured or modeled
- Proteins must be stable for the function to exist and evolve
- Essential proteins must be stable (ΔG<0) in a viable organism
- Γ essential proteins per genome (~300 in bacteria)
- For simplicity:
  - No epistasis, all proteins equally essential
  - Locally flat, two-level fitness landscape (life or death)
  - Asexual replication

Mutations shuffle stability back and forth

(Protein) evolution is a diffusion process in the Γ-dimensional space of stabilities of the cell’s essential proteins

Zeldovich, Chen, Shakhnovich PNAS 2007

… back in 1930

[Figure: a point at distance r from the origin; two axes labeled “??”; fitness scale running from low to high]

Diffusion in the space of “characters”

Single fitness peak at origin

Fitness w=w(r)

n-dimensional hyperspheres of constant fitness

Compensatory mutations and epistasis

Soft selection

Axes poorly quantified!!!

Hartl, Taubes 1998
Poon, Otto 2000

2D example: R.A. Fisher, 1930

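[Figure: 2D space of protein stabilities with boundaries ΔG1=0 and ΔG2=0. Regions: viable phenotypes; lethal phenotypes (unstable proteins, ΔG>0); impossible genotypes (too stable proteins). An arrow marks a mutation.]

“Characters” are protein stabilities, Γ=2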

(… skipping the math – analytic solution exists – please ask if interested)

Replication of the viable organisms must compensate for death due to the flux across the ΔG=0 absorbing boundary

replication
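The picture above (mutations as random steps in stability space, death at the ΔG=0 boundary, replication restoring the population) can be sketched as a small stochastic simulation. This is a toy model under stated assumptions, not the analytic solution mentioned on the slide: one mutation per organism per generation, Gaussian stability changes with mean 1 and variance 3 kcal/mol, mutations that would make a protein more stable than a hypothetical lower limit `g_min` simply do not occur, and the population is resampled to constant size. All names and parameter values are illustrative.

```python
import random


def simulate(n_org=200, n_prot=2, generations=200, g_min=-12.0,
             mean=1.0, var=3.0, seed=0):
    """Toy diffusion of protein stabilities with an absorbing boundary at dG = 0."""
    rng = random.Random(seed)
    sd = var ** 0.5
    # Every organism starts with all proteins halfway between g_min and 0.
    pop = [[g_min / 2.0] * n_prot for _ in range(n_org)]
    for _ in range(generations):
        survivors = []
        for org in pop:
            org = org[:]                      # work on a copy
            k = rng.randrange(n_prot)         # mutate one random protein
            new_g = org[k] + rng.gauss(mean, sd)
            if new_g >= 0.0:
                continue                      # unstable essential protein: lethal
            if new_g >= g_min:
                org[k] = new_g                # accept; "too stable" proposals are ignored
            survivors.append(org)
        if not survivors:
            raise RuntimeError("population went extinct")
        # Replication of survivors compensates for deaths (constant population size).
        pop = [rng.choice(survivors)[:] for _ in range(n_org)]
    return pop


final = simulate()
all_g = [g for org in final for g in org]
print(f"mean dG = {sum(all_g) / len(all_g):.2f} kcal/mol over {len(all_g)} proteins")
```

A histogram of `all_g` from a longer run with more proteins would be the simulated counterpart of the stability distribution that the theory predicts.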

Prediction: universal distribution of ΔG of all proteins

Line: theory; histogram: ProTherm database, ~200 proteins

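[Equation not recoverable from the extracted text: an expression for P(E) in terms of E, h and D, containing an exponential and a sine]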

Zeldovich, Chen, Shakhnovich PNAS 2007

Genetic code links genomes and proteomes

Information-theoretical viewpoint: is this 64->20 mapping in any way optimal?

Freeland & Hurst, J. Mol. Evol. 47:238 (1998)

Hypothesis: the genetic code minimizes the effect of DNA mutations on protein structure

mean-square change of a.a. polarity upon point mutation

1,000,000 realizations of the code (64->20)
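The comparison described above can be sketched in a few lines. This is a simplified illustration with several assumptions that differ from, or are not confirmed by, the slide: the Kyte-Doolittle hydropathy scale is swapped in as a stand-in for the polarity measure, random codes are generated by permuting the 20 amino acids among the existing synonymous codon blocks with stop codons kept fixed, substitutions to or from stop codons are ignored, all single-base changes are weighted equally, and only 1,000 random codes are sampled instead of 1,000,000.

```python
import random
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Kyte-Doolittle hydropathy, used here as a stand-in for the polarity scale.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def mean_square_change(code, scale):
    """Mean-square property change over all single-base substitutions between sense codons."""
    total, n = 0.0, 0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                aa2 = code[codon[:pos] + base + codon[pos + 1:]]
                if aa2 == "*":
                    continue
                total += (scale[aa] - scale[aa2]) ** 2
                n += 1
    return total / n


def random_code(rng):
    """Permute the 20 amino acids among the synonymous blocks; stop codons stay put."""
    aas = sorted(set(AMINO_ACIDS) - {"*"})
    shuffled = aas[:]
    rng.shuffle(shuffled)
    relabel = dict(zip(aas, shuffled))
    relabel["*"] = "*"
    return {codon: relabel[aa] for codon, aa in CODE.items()}


rng = random.Random(0)
natural = mean_square_change(CODE, KD)
samples = [mean_square_change(random_code(rng), KD) for _ in range(1000)]
better = sum(s < natural for s in samples)
print(f"natural code: {natural:.2f}; random codes with a lower value: {better} of {len(samples)}")
```

The fraction of random codes that beat the natural one is the quantity of interest; its value depends on the property scale and on how the random codes are drawn.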

Large-scale structure of chromatin

Dekker et al, Science 2002
Lieberman-Aiden et al, Science 2009

On a small scale, chromatin is tightly packed (nucleosomes, 10- and 30-nm fibers)

Large-scale structure?

Chromosome Conformation Capture (3C, 5C, Hi-C, …)

Formaldehyde crosslink -> digest -> ligate -> uncrosslink

fragments can be counted by qPCR or deep sequencing

Result: “contact map” of the chromosome: which part is spatially close to which

3D structure can then be inferred, a la NMR structures of proteins (distance constraints)

DNA looping and long-range transcriptional control

Sequence determinants of the contacts?? (PROJECT!)

Dekker, TiBS 28:277 (2003)

Murine beta-globin locus, 130kb

Whole chromosome as a polymer?

Lieberman-Aiden et al, Science 2009

Two loci separated by s bp: probability of contact?

Theory:
- P(s) ~ s^(-3/2): random polymer coil
- P(s) ~ s^(-1): fractal (“crumpled”) globule: compact, without knots
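To compare a measured contact map with these scaling laws, one can average the contact frequency over all pairs at each separation s and fit the slope on a log-log scale. The sketch below is a minimal pure-Python version with hypothetical function names; it uses a synthetic map in place of real 3C/Hi-C data and ignores the normalization and binning steps that real data would need.

```python
import math


def contact_probability(cmap):
    """Average contact frequency at each separation s = |i - j| >= 1 (in bins)."""
    n = len(cmap)
    return {s: sum(cmap[i][i + s] for i in range(n - s)) / (n - s)
            for s in range(1, n)}


def loglog_slope(ps, s_min=1, s_max=None):
    """Least-squares slope of log P(s) versus log s, skipping zero entries."""
    pts = [(math.log(s), math.log(p)) for s, p in ps.items()
           if p > 0 and s >= s_min and (s_max is None or s <= s_max)]
    if len(pts) < 2:
        raise ValueError("need at least two points to fit a slope")
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    return sxy / sxx


# Synthetic contact map that decays exactly as 1/s, for illustration only.
n = 200
toy_map = [[0.0 if i == j else abs(i - j) ** -1.0 for j in range(n)] for i in range(n)]
slope = loglog_slope(contact_probability(toy_map))
print(f"fitted exponent: {slope:.2f}")
```

A fitted slope near -1 would point toward the fractal globule, and one near -3/2 toward a random coil; with real data the fit should be restricted to the range of s where a power law is actually observed, which is what `s_min` and `s_max` are for.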
