Central Dogma: Information storage in biological molecules
DNA -> RNA -> Protein
replication (DNA -> DNA), transcription (DNA -> RNA), translation (RNA -> Protein)
DNA: deoxyribonucleic acid
Phosphate-sugar (deoxyribose) backbone
4 nitrogenous bases
Pyrimidines (C, T) and purines (A, G)
Blackburn and Gait, Nucleic Acids in Chemistry and Biology, Oxford University Press, New York, 1996.
T-A base pair: 2 H bonds
C-G base pair: 3 H bonds
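The pairing rules above can be written down in a few lines. This is only an illustrative sketch: the function names are made up for this example, and it assumes an unambiguous A/C/G/T sequence with perfect Watson-Crick pairing.

```python
# Watson-Crick partners and hydrogen-bond counts per base pair (T-A: 2, C-G: 3).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
H_BONDS = {"A": 2, "T": 2, "C": 3, "G": 3}


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


def duplex_h_bonds(strand: str) -> int:
    """Count hydrogen bonds between a strand and its perfect complement."""
    return sum(H_BONDS[base] for base in strand.upper())


print(reverse_complement("ATGC"))  # GCAT
print(duplex_h_bonds("ATGC"))      # 2 + 2 + 3 + 3 = 10
```

Because C-G pairs carry three hydrogen bonds, a GC-rich duplex scores higher here than an AT-rich one of the same length.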
RNA: ribonucleic acid
Phosphate-sugar (ribose) backbone
4 bases: A, G, C, but U instead of T
Single-stranded
Ribose carries an OH at the 2' position where deoxyribose has an H
Types of RNA:
mRNA: holds the Message transcribed from DNA; will be translated into a protein
rRNA: a component of the Ribosome
tRNA: helps Transfer the message from base pairs to protein
Note that rRNA and tRNA function in the cell as RNA molecules and are never themselves translated into proteins
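As a rough illustration of the DNA -> mRNA -> protein flow, the sketch below transcribes a coding-strand sequence (T becomes U) and translates it with the standard genetic code. The function names and the example sequence are invented for this sketch; it assumes translation starts at the first base and ignores start-codon selection, introns, and alternative genetic codes.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order; '*' is a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon).replace("T", "U"): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """mRNA carries the coding-strand sequence with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons from the first base until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


mrna = transcribe("ATGGCCTGGTAA")  # made-up example sequence
print(mrna)             # AUGGCCUGGUAA
print(translate(mrna))  # MAW
```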
RNA secondary structure
especially important for:
rRNA, tRNA
Chastain, M. and Tinoco Jr., I., (1991) Prog. Nucleic Acid Res. Mol. Biol. 41, 131-177.
How do you sequence an entire genome?
Genomic DNA sheared to 3 kb
Clone library
Insert ends sequenced to 8X coverage
Computer assembly of sequence reads
Finishing and closure, using PCR to close gaps and verify assembly
First complete genome sequence of a free-living organism:
1995 Haemophilus influenzae
1,830,137 base pairs (1.8 Mbp), 1743 genes
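To get a feel for what "8X coverage" means for a genome of this size, coverage can be estimated as (number of reads x read length) / genome length. The sketch below is a back-of-the-envelope calculation, not a description of how the H. influenzae project was planned. The 500 bp read length is an assumed value that does not come from the slides, and the uncovered-fraction estimate uses the idealized Poisson (Lander-Waterman) model of uniformly random reads.

```python
import math


def reads_for_coverage(genome_bp: int, coverage: float, read_bp: int) -> int:
    """Smallest read count N with N * read_bp / genome_bp >= coverage."""
    return math.ceil(coverage * genome_bp / read_bp)


def expected_uncovered_fraction(coverage: float) -> float:
    """Poisson estimate of the fraction of bases hit by no read: e^(-coverage)."""
    return math.exp(-coverage)


GENOME_BP = 1_830_137  # H. influenzae genome size quoted above
READ_BP = 500          # ASSUMPTION: illustrative read length, not from the slides

n_reads = reads_for_coverage(GENOME_BP, 8, READ_BP)
print(n_reads)  # 29283 reads under these assumptions
print(expected_uncovered_fraction(8) * GENOME_BP)  # expected bases left uncovered
```

Even at 8X, the idealized model leaves a few hundred bases uncovered, which is one reason a finishing and gap-closure step follows assembly.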
Since 1995 there has been an explosion in the number of completed genomes
http://www.genomesonline.org/
[Figure: genome project counts from genomesonline.org; value pairs recovered from the figure: 147, 463; 18, 26; 27, 414]
Why?
Advances in sequencing technology—major sequencing centers have enough capacity to complete a bacterial genome in a day!
Bacteria: 405 completed, 994 ongoing
Archaea: 31 completed, 64 ongoing
Eukaryotes: 44 completed, 631 ongoing
Metagenome projects: 62
2004
http://www.genomesonline.org/
Environmental Genomics
100s of liters of water
Concentrate on filter
Extract high-molecular-weight (HMW) DNA
Clone into BAC or fosmid
Idea: to look at DNA directly from the environment
One way: clone really large pieces
Large insert vectors
BAC: bacterial artificial chromosome
Can clone DNA fragments of 100 to 300 kb (average, 150 kb) in Escherichia coli cells. Based on the naturally occurring F-factor plasmid of E. coli.
Fosmid/Cosmid: artificially constructed cloning vector containing the cos site of phage lambda. Cosmids can be packaged in lambda phage particles for infection into E. coli; this permits cloning of larger DNA fragments (up to 45 kb) than can be introduced into bacterial hosts in plasmid vectors.
YAC: yeast artificial chromosome
Can clone DNA fragments of up to 1000 kb (average, 150 kb) in yeast cells. Issues with insert stability, high rates of chimerism, and difficulty in purifying vector DNA.
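Insert size determines how many clones a library needs. A common textbook estimate (the Clarke-Carbon formula) is N = ln(1 - P) / ln(1 - insert/genome), where P is the desired probability that any given locus is represented. The sketch below applies it to the insert sizes quoted above. The 1.83 Mbp genome size and P = 0.99 are illustrative assumptions, and the formula treats a single genome with random, equal-sized inserts, so it will underestimate what a mixed environmental sample needs.

```python
import math


def clones_needed(genome_bp: float, insert_bp: float, probability: float = 0.99) -> int:
    """Clarke-Carbon estimate: N = ln(1 - P) / ln(1 - insert/genome)."""
    fraction = insert_bp / genome_bp
    return math.ceil(math.log(1 - probability) / math.log(1 - fraction))


GENOME_BP = 1.83e6  # ASSUMPTION: a single genome of roughly 1.8 Mbp
for vector, insert_bp in [("cosmid/fosmid", 45_000), ("BAC", 150_000)]:
    print(vector, clones_needed(GENOME_BP, insert_bp))
```

Larger inserts mean fewer clones to pick and screen, which is part of the appeal of BACs for environmental libraries.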
Fosmid Library Construction
CopyControl(TM) System (Epicentre Technologies)
Allows maintenance of cell stock at low vector copy number, and induction to high copy number when needed
Can be used for any vector type—plasmid, BAC, fosmid
Beja et al 2000 Environmental Microbiology 2: 516-529
Products of an environmental BAC library from California coastal waters
Can screen BAC/fosmid libraries multiple ways:
Sequence the ends of each BAC/fosmid
Probe with gene of interest (rRNA or functional gene), then sequence the entire fosmid to see what else is there
PCR the pooled library with primers for gene of interest, narrow down which fosmid gave the positive band, then sequence the entire fosmid to see what else is there
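One way to "narrow down which fosmid gave the positive band" is to pool clones by plate row and by plate column, run PCR on each pool, and look at where positive rows and columns intersect. The slides do not say which pooling scheme was used, so the 96-well row/column layout and all names below are assumptions made purely for illustration.

```python
from itertools import product

ROWS = "ABCDEFGH"         # ASSUMPTION: one 96-well plate, rows A-H
COLS = list(range(1, 13))  # columns 1-12


def pcr_positive_pools(positive_wells):
    """Simulate PCR on row pools and column pools: a pool is positive
    if it contains at least one clone carrying the gene of interest."""
    rows = {well[0] for well in positive_wells}
    cols = {int(well[1:]) for well in positive_wells}
    return rows, cols


def candidate_wells(rows, cols):
    """Wells lying at the intersection of a positive row and a positive column."""
    return {f"{r}{c}" for r, c in product(rows, cols)}


# 20 pooled reactions instead of 96 individual ones.
print(len(ROWS) + len(COLS))

rows, cols = pcr_positive_pools({"C7"})
print(candidate_wells(rows, cols))  # {'C7'}

# Two true positives give four candidates, so individual retesting is needed.
rows, cols = pcr_positive_pools({"B3", "F10"})
print(sorted(candidate_wells(rows, cols)))
```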
Expression and activity of rhodopsin from environmental BAC
Beja et al 2000 Science
Comparison of environmental BACs to genomes of cultured organisms
Beja et al 2002 Nature 415: 630-633
Genomics in the Environment: a shotgun approach
Science, April 2, 2004
http://www.sorcerer2expedition.org/main.htm
Genomics in the Environment
Applied the whole-genome shotgun sequencing technique to 200 L of surface seawater
1.045 billion bases sequenced
1800 microbial species estimated to exist in the sample, including 148 novel phylotypes
1.2 million previously unknown genes
12 microbial genomes partially assembled
Whole genome sequencing
Genomic DNA sheared to 3 kb
Clone library
Insert ends sequenced to 8X coverage
Computer assembly of sequence reads
Finishing and closure, using PCR to close gaps and verify assembly
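The "computer assembly of sequence reads" step can be illustrated with a toy greedy overlap merger: repeatedly join the two sequences with the longest suffix-prefix overlap. This is a teaching sketch only, not the assembler used in any of the projects cited here. It assumes error-free reads from one strand and ignores repeats, reverse complements, and paired-end information, all of which real assemblers must handle.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Merge the best-overlapping pair until no overlap >= min_len remains."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Made-up reads sampled from the toy sequence ATGGCGTACGTTAGC.
reads = ["ATGGCGTA", "CGTACGTT", "ACGTTAGC"]
print(greedy_assemble(reads))  # ['ATGGCGTACGTTAGC']
```

When reads do not overlap, the function returns several contigs, which corresponds to the gaps that the finishing step closes by PCR.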
[Figure: phylogenetic tree of Prochlorococcus and marine Synechococcus strains and environmental sequences (scale bar 0.1). Labeled groups: marine Synechococcus; high B/A, low-light-adapted Prochlorococcus; low B/A, high-light-adapted Prochlorococcus (clades I and II). Strains shown include MED4, MIT9515, NATL2A, PAC1, NATL1A, MIT9211, SS120, MIT9303, MIT9313, WH6501, WH8102, WH7805, and WH8101.]
Venter et al 2004, Science
Comparison of MED4 with environmental scaffolds
High degree of synteny between MED4 and environmental Prochlorococcus scaffolds
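Variation at the nt and aa level between MED4 and environmental Prochlorococcus scaffolds
[Table: % identity between MED4 and the environmental scaffold Pro. SAR-1 at the nt and aa level for rbcL, glnA, and idiA. Values recovered from the slide: 96, 83, 91, 91, 100, 87; their assignment to genes and columns did not survive extraction.]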
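Percent identity at the nt and aa level can be computed from a pairwise alignment by counting matching columns. The sketch below is one simple convention (matches divided by ungapped aligned columns); other tools use different denominators. The sequences are invented for illustration and are not taken from MED4 or the environmental scaffolds.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned, non-gap columns where both sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap
        compared += 1
        matches += a == b
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned gene fragments differing at two synonymous sites.
nt_a = "ATGGCTAAAGGT"
nt_b = "ATGGCAAAGGGT"
# Both fragments encode the same peptide (Met-Ala-Lys-Gly).
aa_a = "MAKG"
aa_b = "MAKG"

print(round(percent_identity(nt_a, nt_b), 1))  # 83.3
print(percent_identity(aa_a, aa_b))            # 100.0
```

Synonymous substitutions change the nucleotide sequence without changing the protein, which is why aa identity can be higher than nt identity for the same gene.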