Ramy K. Aziz
San Diego State University & Cairo University
Rocky 2009, Dec 10 2009
Nature’s most successful genes?
• What is prevalence? For an object x,
– Ubiquity (number of sets to which x belongs)
– Abundance (“average” frequency of x in a set)
@sets = (genomes, metagenomes, biomes)
• What to count (PEG / EGT / function / family)?
• How to count, and where (genomes / MGs)?
– Gene length matters: frequency / gene length
– Metagenome size matters: relative abundance
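The two prevalence measures and the two normalizations above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used in the study: the function names, the toy counts, and the reading of “average” as a plain mean over all sets (counting zero where a gene is absent) are all hypothetical.

```python
from collections import defaultdict

def ubiquity(counts_by_set):
    """Number of sets in which each object occurs at least once."""
    ubiq = defaultdict(int)
    for counts in counts_by_set.values():
        for obj, n in counts.items():
            if n > 0:
                ubiq[obj] += 1
    return dict(ubiq)

def abundance(counts_by_set, gene_length, set_size):
    """Mean normalized frequency of each object across all sets.

    Each raw hit count is divided by the gene length and by the size of
    the set (e.g. number of sequences in a metagenome). Assumption: the
    mean is taken over ALL sets, so absence contributes zero.
    """
    totals = defaultdict(float)
    for set_name, counts in counts_by_set.items():
        for obj, n in counts.items():
            totals[obj] += n / gene_length[obj] / set_size[set_name]
    n_sets = len(counts_by_set)
    return {obj: total / n_sets for obj, total in totals.items()}

# Hypothetical toy data: hit counts per metagenome.
counts_by_set = {
    "mg_A": {"transposase": 30, "rubisco": 5},
    "mg_B": {"transposase": 10},
    "mg_C": {"transposase": 20, "rubisco": 0},
}
gene_length = {"transposase": 1000, "rubisco": 1500}   # hypothetical lengths (bp)
set_size = {"mg_A": 1000, "mg_B": 500, "mg_C": 2000}   # hypothetical sequence counts

if __name__ == "__main__":
    print(ubiquity(counts_by_set))
    print(abundance(counts_by_set, gene_length, set_size))
```

In this toy example the transposase appears in all three sets while RuBisCO appears in one, so the two genes differ in ubiquity regardless of how abundant each is within any single set.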
Spelling out the question: half the way to the answer
• Current knowledge: RuBisCO (ribulose-1,5-bisphosphate carboxylase/oxygenase) is the enzyme with the highest copy number (mass?) in ecosystems. However, its gene is neither the most ubiquitous nor the most abundant.
• Any guesses? (An enzyme? A transcription factor? A transporter? DNA metabolism? Carbohydrate metabolism?)
– Guess 1:
– Guess 2:
– Guess 3:
And the winner is …
[Figure: gene ubiquity plots]
• Metagenomes … 187 sets; 6 million sequences (Pearson corr. 0.524)
• Gene ubiquity in genomes (2,137) (Pearson corr. 0.645)
• Plot annotations: eco-essentiality; life essentials; fertility; habitat-specific
• Labeled functions:
– Transposase
– ABC transporter, ATP-binding
– Glycosyltransferase
– ABC transporter, permease
– Two-component sensor/regulator
– tRNA synthetases
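The slides quote Pearson correlations (0.524 and 0.645) but do not say which pairs of variables were compared. As a reminder of what the statistic measures, here is a minimal sketch of the standard Pearson coefficient; the function name and the idea of feeding it two per-gene ubiquity vectors are assumptions for illustration only.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

if __name__ == "__main__":
    # Hypothetical per-gene ubiquity values in two kinds of sets.
    ubiquity_in_genomes = [2000, 1500, 900, 400, 100]
    ubiquity_in_metagenomes = [180, 120, 150, 60, 20]
    print(pearson(ubiquity_in_genomes, ubiquity_in_metagenomes))
```

A value near 1 would mean that genes ranking high in one kind of set also rank high in the other; moderate values such as those on the slide indicate a positive but far from perfect relationship.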
(How/Why) Does it matter?
• Current annotations suck! Improvement needed.
• Transposases no longer ‘junk hypothetical proteins’; their quorum dictates attention!
• The ‘selfish’ transposase genes must be offering their hosts some advantage.
• If rRNA is used to track genomes’ vertical history, transposases can track ‘horizontal’ history.
• Cheaters (always?) win…
• Transposases shall inherit the earth?
• This study could not have been possible without…
Rob Edwards & Mya Breitbart
And:
• Forest Rohwer, Liz Dinsdale, Anca Segall, Peter Salamon, & the Math group
• NSF funding (PhAnToMe grant)
Acknowledgment