In the previous lecture:
Protein structures and sequences are largely determined by physical chemistry
Ab initio paradigm: sequence + physics = structure (+ function, hopefully)
Today:
Are physical and chemical constraints discernible at the whole-proteome / whole-genome level?
- Constraints on amino acid usage in prokaryotes
  - Thermostability
  - Metabolic cost of protein synthesis
- Mutational robustness of proteins
- Evolution of protein stability
- The genetic code is nonrandom
- Large-scale structure of chromatin, 3C-like methods (J. Dekker lab)
Temperature ranges of modern life
Psychro-, meso-, thermo-, hyperthermophilic bacteria/archaea
-10°C (Antarctic ice, permafrost in Siberia and Canada): Colwellia spp, Psychrobacter spp
+110°C (deep sea hydrothermal vents, hot springs): Pyrococcus spp, Methanococcus spp
Cold-blooded animals:
Notothenia spp., Antarctic fish: -1.8°C habitat, dies of overheating at +6°C (about 43°F)
Desert iguana: up to +60°C
Simplest eukaryotes: up to ~60°C (nematode from hot springs)
Very few complete genomes!
>250 sequenced genomes
Is habitat temperature reflected in the genomes?
• What is presumably related to thermostability?
– G+C in DNA increases with temperature (wrong)
  • DNA stabilization by pairing
– Fraction of charges (DEKR) in proteins increases
  • Hydrophobic interactions weaken with temperature
– Fraction of polar residues decreases
  • ?
Existing knowledge
Limitations of the previous work: based on a few (dozen) individual proteins, or a limited number (~20) of completely sequenced genomes
Here: high-throughput analysis, 204 genomes
Zeldovich, Berezovsky, Shakhnovich, PLOS CB 2007
IVYWREL, or LIVEWYR
$T_{opt} = 937\,F_{IVYWREL} - 335$ (°C), R = 0.93, RMSD of $T_{opt}$ = 8.9°C
86 genomes
Zeldovich, Berezovsky, Shakhnovich, PLOS CB 2007
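The linear fit above can be applied to any set of protein sequences. The following is a minimal sketch, assuming the proteome is available as a list of one-letter amino acid strings; the function names and the toy sequences are hypothetical and only illustrate the arithmetic of the fit quoted on the slide.

```python
# Residues counted in the IVYWREL predictor, and the 20 standard amino acids.
IVYWREL = set("IVYWREL")
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def ivywrel_fraction(proteins):
    """Fraction of I, V, Y, W, R, E, L among all standard residues."""
    total = 0
    hits = 0
    for seq in proteins:
        for aa in seq.upper():
            if aa in STANDARD_AA:  # skip X, gaps, stop symbols, etc.
                total += 1
                if aa in IVYWREL:
                    hits += 1
    if total == 0:
        raise ValueError("no standard amino acids found")
    return hits / total


def predicted_topt(fraction):
    """Linear fit quoted on the slide: Topt = 937 * F - 335 (degrees C)."""
    return 937.0 * fraction - 335.0


# Toy input (hypothetical sequences, not a real proteome).
toy_proteome = ["MKVLIYEWRL", "GASTNQDHCP"]
f = ivywrel_fraction(toy_proteome)
print(f"F_IVYWREL = {f:.3f}, predicted Topt = {predicted_topt(f):.1f} C")
```

The fit was obtained on whole proteomes, so a prediction from one or two sequences, as in the toy input, is not meaningful by itself; the quoted RMSD of 8.9°C applies to genome-wide fractions.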
Genomic DNA: any temperature, any GC content
Base pairing is not the bottleneck of thermal adaptation.
204 genomes
DNA adaptation via codon bias
[Figure panels: fraction of A+G; autocorrelation function of A, G]
Fractions of A, G nucleotides are changing with temperature
Thermal adaptation of proteins and DNA are independent processes.
Metabolic cost of protein synthesis
Akashi and Gojobori, PNAS 99:3695 (2002)
Starting from the same basic precursors, some amino acids are easy to synthesize, some are hard, and require more energy
Hypothesis: energy (ATP) is the limiting factor in a.a. synthesis and thus survival.
Thus, highly expressed proteins must be made of “cheap” amino acids
A.a. cost can be deduced from pathway maps
Protein expression can be either measured, or inferred from codon usage (codon adaptation index)
Highly expressed proteins are “cheaper”
Akashi and Gojobori, PNAS 99:3695 (2002)
MCU rationale:
- Synonymous codons are used with different frequencies (codon bias)
- For some reason (translation efficiency?), codon bias is correlated with expression
- MCU can be calibrated using a few genes with known expression levels
Kanaya et al, Gene 238:143 (1999)
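Codon bias, the raw ingredient of measures such as MCU, can be tabulated directly from a coding sequence. The sketch below is not the MCU calibration of Kanaya et al; it only computes, for each codon, its share among the synonymous codons of the same amino acid. The function name and the example sequence are hypothetical; the codon table is the standard genetic code.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base slowest).
BASES = "TCAG"
AA_STRING = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AA_STRING)
}


def synonymous_codon_usage(cds):
    """Share of each observed codon among codons encoding the same amino acid."""
    cds = cds.upper().replace("U", "T")
    counts = Counter()
    for i in range(0, len(cds) - 2, 3):  # read in frame, ignore a trailing partial codon
        codon = cds[i:i + 3]
        if codon in CODON_TABLE:         # skip codons with ambiguous bases
            counts[codon] += 1
    per_aa = defaultdict(int)
    for codon, n in counts.items():
        per_aa[CODON_TABLE[codon]] += n
    return {codon: n / per_aa[CODON_TABLE[codon]] for codon, n in counts.items()}


# Hypothetical toy coding sequence: Met-Leu-Leu-Leu-Lys-Lys-stop.
usage = synonymous_codon_usage("ATGCTGCTGCTTAAAAAGTAA")
for codon in sorted(usage):
    print(codon, CODON_TABLE[codon], round(usage[codon], 3))
```

Comparing such tables between genes with known high and low expression is one simple way to see which codons are preferred in highly expressed genes.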
Nowadays, direct measurements of expression are available (PROJECT!)
Possible effects of mutations
DNA
- Exon, nonsynonymous -> see “protein”
- Exon, synonymous -> normally neutral
- Introns, regulatory sequences -> ???
  - Altered protein expression, localization, alternative splicing, …
- RNA coding regions -> changes in RNA structure/function
- Chromatin structure?
Protein (non-synonymous)
- Change of stability
- Possible misfolding or aggregation (-> neurodegenerative diseases)
- Altered interaction(s) with other protein(s) or small molecule(s)
- Altered function
Change of thermodynamic stability is among the easiest to comprehend.
Mutational robustness of proteins
ΔΔG of point mutations: average = 1 kcal/mol (destabilizing), variance = 3 (kcal/mol)²
ProTherm database http://gibk26.bse.kyutech.ac.jp/jouhou/protherm/protherm.html
~2000 mutations, thermal & chemical unfolding
Kumar et al, NAR 2006
Zeldovich et al, PNAS 2007
ΔΔG prediction servers and tools
• FoldX
• PoPMuSiC
• MUPro
• CUPSAT
• Eris
(they are all trained on highly overlapping datasets, including ProTherm)
More servers listed at http://www.gen2phen.org/wiki/protein-level-predictions-4-stability-changes-prediction
Can we translate this to the organism level?
- Mutations in a protein change its stability ΔG and occur at cell replication
- Magnitudes of ΔΔG can be measured or modeled
- Proteins must be stable for the function to exist and evolve
- Essential proteins must be stable (ΔG<0) in a viable organism
- Γ essential proteins per genome (~300 in bacteria)
- For simplicity:
  - No epistasis, all proteins equally essential
  - Locally flat, two-level fitness landscape (life or death)
  - Asexual replication
Mutations shuffle stability back and forth
(Protein) evolution is a diffusion process in the Γ-dimensional space of stabilities of the cell’s essential proteins
Zeldovich, Chen, Shakhnovich PNAS 2007
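The diffusion picture can be explored numerically. What follows is a minimal sketch under stated assumptions, not the analytic treatment of the cited paper: each organism carries a list of protein stabilities, every replication adds one mutation with a Gaussian stability change (mean 1 kcal/mol and variance 3 (kcal/mol)², the ProTherm figures quoted earlier), organisms with any stability at or above zero die, and survivors are resampled to keep the population size constant. The lower stability limit `dg_min`, the starting stability `dg_start`, the population size, the reflecting treatment of the lower limit, and all names are hypothetical choices.

```python
import random


def evolve_population(gamma=2, pop_size=500, generations=300,
                      mean_ddg=1.0, var_ddg=3.0,
                      dg_min=-10.0, dg_start=-5.0, seed=1):
    """Random walk of protein stabilities with a lethal boundary at zero.

    gamma is the number of essential proteins per organism (hypothetical
    default of 2, matching a two-dimensional picture).
    """
    rng = random.Random(seed)
    sd = var_ddg ** 0.5
    population = [[dg_start] * gamma for _ in range(pop_size)]
    for _ in range(generations):
        survivors = []
        for organism in population:
            child = list(organism)
            i = rng.randrange(gamma)            # one mutation per replication
            new_dg = child[i] + rng.gauss(mean_ddg, sd)
            if new_dg >= dg_min:                # assumption: too-stable values are rejected
                child[i] = new_dg
            if all(dg < 0 for dg in child):     # life or death: every protein must be stable
                survivors.append(child)
        if not survivors:
            raise RuntimeError("population went extinct")
        # Replication of survivors compensates for deaths at the boundary.
        population = [list(rng.choice(survivors)) for _ in range(pop_size)]
    return population


final = evolve_population()
all_dg = [dg for organism in final for dg in organism]
print("mean stability of surviving proteins:", round(sum(all_dg) / len(all_dg), 2))
```

A histogram of `all_dg` gives the simulated stationary distribution of stabilities, which can be compared qualitatively with the predicted one; the shape depends on the hypothetical parameters chosen here.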
… back in 1930
[Figure: distance r from the fitness peak; fitness scale from low to high]
Diffusion in the space of “characters”
Single fitness peak at origin
Fitness w=w(r)
n-dimensional hyperspheres of constant fitness
Compensatory mutations and epistasis
Soft selection
Axes poorly quantified!!!
Hartl, Taubes 1998
Poon, Otto 2000
2D example R.A. Fisher 1930
[Figure: plane of two protein stabilities with boundaries ΔG1=0 and ΔG2=0]
viable phenotypes
lethal phenotypes: unstable proteins, ΔG>0
impossible genotypes: too stable proteins
mutation
“Characters” are protein stabilities, Γ=2
(… skipping the math – analytic solution exists – please ask if interested)
Replication of the viable organisms must compensate for death due to the flux across the ΔG=0 absorbing boundary
replication
Prediction: universal distribution of ΔG of all proteins
Line: theory; histogram: ProTherm database, ~200 proteins
[Equation: predicted stability distribution P(E), an exponential factor in hE/2D multiplied by a sine term]
Zeldovich, Chen, Shakhnovich PNAS 2007
Genetic code links genomes and proteomes
Information-theoretical viewpoint: is this 64->20 mapping in any way optimal?
Freeland & Hurst, J. Mol. Evol. 47:238 (1998)
Hypothesis: the genetic code minimizes the effect of DNA mutations on protein structure
mean-square change of a.a. polarity upon point mutation
1,000,000 realizations of the code (64->20)
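The randomization test can be sketched in a few lines. This is an illustration under stated assumptions rather than a reproduction of Freeland & Hurst: the Kyte-Doolittle hydropathy scale is swapped in for the polarity measure, random codes are generated by permuting the 20 amino acids among the existing synonymous codon blocks with stop codons fixed, all single-base substitutions are weighted equally, and far fewer than 1,000,000 codes are sampled. All function names are hypothetical.

```python
import random
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base slowest).
BASES = "TCAG"
AA_STRING = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AA_STRING)
}

# Kyte-Doolittle hydropathy, used here as a stand-in property scale (assumption).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def code_cost(code, scale):
    """Mean squared property change over single-base substitutions between sense codons."""
    total, n = 0.0, 0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant_aa = code[codon[:pos] + base + codon[pos + 1:]]
                if mutant_aa == "*":     # mutations to stop codons are ignored
                    continue
                total += (scale[aa] - scale[mutant_aa]) ** 2
                n += 1
    return total / n


def shuffled_code(code, rng):
    """Permute amino acids among synonymous codon blocks; stop codons stay in place."""
    aas = sorted(set(code.values()) - {"*"})
    permuted = aas[:]
    rng.shuffle(permuted)
    mapping = dict(zip(aas, permuted))
    mapping["*"] = "*"
    return {codon: mapping[aa] for codon, aa in code.items()}


def fraction_of_better_codes(n_codes=1000, seed=0):
    """Cost of the standard code and the share of random codes with lower cost."""
    rng = random.Random(seed)
    reference = code_cost(STANDARD_CODE, HYDROPATHY)
    better = sum(
        code_cost(shuffled_code(STANDARD_CODE, rng), HYDROPATHY) < reference
        for _ in range(n_codes)
    )
    return reference, better / n_codes


reference, fraction = fraction_of_better_codes(n_codes=200)
print("standard code cost:", round(reference, 2))
print("fraction of random codes with lower cost:", fraction)
```

A small fraction of better random codes would support the hypothesis that the code buffers the effect of point mutations; the number obtained depends on the property scale and on how mutations are weighted.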
Large-scale structure of chromatin
Dekker et al, Science 2002
Lieberman-Aiden et al, Science 2009
On a small scale, chromatin is tightly packed (nucleosomes, 10- and 30-nm fibers)
Large-scale structure?
Chromosome Conformation Capture (3C, 5C, HiC, …)
Formaldehyde crosslink -> digest -> ligate -> uncrosslink
fragments can be counted by qPCR or deep sequencing
Result: “contact map” of the chromosome: which part is spatially close to which
3D structure can then be inferred, a la NMR structures of proteins (distance constraints)
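Building a contact map from counted ligation products amounts to binning pairs of genomic coordinates. The sketch below assumes the ligation products have already been mapped to pairs of positions within one region; the function name, the bin size, and the example pairs are hypothetical, and no normalization for fragment length or distance-dependent background is attempted.

```python
def contact_map(read_pairs, region_start, region_end, bin_size):
    """Symmetric matrix of contact counts between fixed-size genomic bins."""
    n_bins = -(-(region_end - region_start) // bin_size)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for a, b in read_pairs:
        inside = (region_start <= a < region_end
                  and region_start <= b < region_end)
        if not inside:
            continue                      # ignore pairs falling outside the region
        i = (a - region_start) // bin_size
        j = (b - region_start) // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1             # keep the map symmetric
    return matrix


# Hypothetical ligation products in a 130 kb region, binned at 10 kb.
pairs = [(5_000, 95_000), (7_000, 99_000), (12_000, 15_000)]
cmap = contact_map(pairs, region_start=0, region_end=130_000, bin_size=10_000)
print(len(cmap), "bins; contacts between bins 0 and 9:", cmap[0][9])
```

Frequent contacts between bins that are far apart along the sequence are the signature of looping; turning counts into distance constraints for 3D modeling requires a further calibration step that is not shown here.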
DNA looping and long-range transcriptional control
Sequence determinants of the contacts?? (PROJECT!)
Dekker, TiBS 28:277 (2003)
Murine beta-globin locus, 130kb