Date posted: 22-Dec-2015
Reminder: Class on Friday, Discussion of Li et al.
Proposal/Projects
CAMERA feedback?
Eukaryotes
Large
Have organelles
Diploid (mostly)
Linear chromosomes
Lower % coding
Genes have introns
Genomes: How Big?
Organism          Genome Size   # of Genes
H. influenzae     1.8 Mb        1,700
E. coli           4.7 Mb        4,400
Yeast             12 Mb         6,300
Diatom (Thaps)    34 Mb         11,000
Fruit Fly         180 Mb        13,600
Fugu              400 Mb        30,000
Human             3000 Mb       30,000
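To make the trend in the table easier to see, gene density (genes per Mb) can be computed directly from the listed values. This is a minimal sketch; the dictionary simply restates the table, and the function name `genes_per_mb` is made up for illustration.

```python
# Genome size (Mb) and approximate gene count, restated from the table.
GENOMES = {
    "H. influenzae": (1.8, 1700),
    "E. coli": (4.7, 4400),
    "Yeast": (12, 6300),
    "Diatom (Thaps)": (34, 11000),
    "Fruit Fly": (180, 13600),
    "Fugu": (400, 30000),
    "Human": (3000, 30000),
}


def genes_per_mb(size_mb, n_genes):
    """Average number of genes per megabase of genome."""
    return n_genes / size_mb


if __name__ == "__main__":
    for name, (size_mb, n_genes) in GENOMES.items():
        print(f"{name:16s} {genes_per_mb(size_mb, n_genes):8.1f} genes/Mb")
```

Density falls from roughly a thousand genes per Mb in the bacteria to about ten per Mb in human, which is the "lower % coding" point in numeric form.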
http://www.genomesize.com/ Gregory, 2004 Paleobiology 30:179-202
1 pg ~= 1 billion base pairs (1000 Mbp).
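The rule of thumb above converts genome sizes reported as DNA mass (C-values, in pg) into base pairs. A small sketch of the conversion follows; the function names are illustrative, and the default factor is the slide's approximation (a more precise factor often quoted is about 978 Mbp per pg, which can be passed in instead).

```python
MBP_PER_PG_APPROX = 1000.0  # the slide's rule of thumb: 1 pg ~= 1000 Mbp


def c_value_to_mbp(picograms, mbp_per_pg=MBP_PER_PG_APPROX):
    """Convert a C-value in picograms to an approximate genome size in Mbp."""
    return picograms * mbp_per_pg


def mbp_to_c_value(mbp, mbp_per_pg=MBP_PER_PG_APPROX):
    """Convert a genome size in Mbp to an approximate C-value in picograms."""
    return mbp / mbp_per_pg


if __name__ == "__main__":
    # A 3 pg genome is roughly human-sized under the rule of thumb.
    print(c_value_to_mbp(3.0), "Mbp")
    # The 34 Mb diatom genome corresponds to a very small C-value.
    print(mbp_to_c_value(34), "pg")
```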
Eukaryotic genomes are big
What does this mean for sequencing?
Strategies are similar
  Low coverage of large insert library (BACs, fosmids)
  Higher coverage of small insert library
Finishing is harder
  Often additional mapping tools (RE maps, optical maps) are employed to map scaffolds to chromosomes
Genomes are released in “versions” (Thaps 3.0)
Publications are often based on draft versions
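To get a feel for what "coverage" costs as genomes get bigger, the sketch below estimates how many reads a target coverage requires and what fraction of the genome is expected to be hit at least once. The assumptions are mine, not the slides': reads land uniformly at random (the idealized Lander-Waterman model), and the 700 bp read length and 10x target are illustrative values only.

```python
import math


def reads_needed(genome_bp, coverage, read_len_bp):
    """Number of reads required so that total read bases = coverage * genome size."""
    return math.ceil(coverage * genome_bp / read_len_bp)


def fraction_covered(coverage):
    """Expected fraction of the genome covered by at least one read,
    assuming reads are placed uniformly at random (idealized model)."""
    return 1.0 - math.exp(-coverage)


if __name__ == "__main__":
    # Genome sizes from the slides; read length and coverage are assumed values.
    for name, size_mb in [("Diatom (Thaps)", 34), ("Human", 3000)]:
        n = reads_needed(size_mb * 1_000_000, coverage=10, read_len_bp=700)
        print(f"{name}: about {n:,} reads for 10x coverage")
    print(f"Expected fraction covered at 1x: {fraction_covered(1):.2f}")
```

Because the read count scales linearly with genome size, the same strategy that works for a bacterium becomes far more expensive for a eukaryote, even before repeats and finishing are considered.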
Where are draft versions in GenBank?
Model organisms have their own web sites
  YeastDB
  WormDB
  FlyBase
Eukaryotic genomes are diploid
What does this mean for sequencing?
Finishing is harder
  Will never get a 100% consensus
  Instead identify “high quality discrepancies”
  What is the sequence in the released genome?
  How to find where the SNPs are?
T. pseudonana: 0.75% of nuclear genome is polymorphic
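One way to picture a "high quality discrepancy" is a position where the aligned reads disagree and each alternative is supported by several high-quality base calls, so the disagreement is unlikely to be sequencing error. The sketch below illustrates that idea only; it is not the method used by any particular genome project, and the thresholds `min_qual` and `min_support` are hypothetical.

```python
from collections import Counter


def high_quality_discrepancy(column, min_qual=30, min_support=2):
    """Inspect one alignment column.

    column: list of (base, phred_quality) pairs, one per read covering the site.
    Returns the sorted list of alleles if at least two different bases each have
    min_support or more calls with quality >= min_qual; otherwise returns None.
    """
    counts = Counter(base for base, qual in column if qual >= min_qual)
    alleles = sorted(base for base, n in counts.items() if n >= min_support)
    return alleles if len(alleles) >= 2 else None


if __name__ == "__main__":
    # Two well-supported alleles: a candidate heterozygous site.
    het_site = [("A", 40), ("A", 38), ("G", 35), ("G", 41), ("A", 10)]
    # A single low-quality mismatch: more likely a sequencing error.
    error_site = [("C", 40), ("C", 39), ("C", 37), ("T", 8)]
    print(high_quality_discrepancy(het_site))
    print(high_quality_discrepancy(error_site))
```

A released consensus still has to pick one base at such sites, which is why the question of what sequence ends up in the released genome matters.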
Eukaryotic genomes are arranged in linear chromosomes
Finishing is harder
  Need to use additional maps to decide if contigs should be joined or belong on their own chromosomes
Additional mechanisms of gene duplication available/common
Eukaryotic genomes have low % coding
Finishing is harder
  Much of the noncoding DNA is made up of “selfish DNA”
  Repeats make assembly problematic
  Thaps: 2% of genome is retrotransposons
Mammalian cells: less than 1% of genomic DNA is coding
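A toy illustration of why repeats trouble assemblers: any stretch of sequence that occurs more than once gives reads that could be placed in several locations. The sketch below flags k-mers that appear more than once in a sequence. The function name and the example sequence are invented for illustration, and real repeat detection is considerably more involved.

```python
from collections import Counter


def repeated_kmers(seq, k):
    """Return a dict of k-mers that occur more than once in seq, with their counts."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {kmer: n for kmer, n in counts.items() if n > 1}


if __name__ == "__main__":
    # A made-up sequence containing the same 8-base element twice.
    element = "GATTACAG"
    seq = "CCCT" + element + "TTGGCA" + element + "ACCA"
    for kmer, n in repeated_kmers(seq, k=8).items():
        print(kmer, "occurs", n, "times")
```

A read that falls entirely within the repeated element cannot be assigned to one copy or the other without extra information such as paired ends or a physical map.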
Eukaryotic gene structure
Gene finding in eukaryotic genomes
Relies on both signal sequences and coding statistics
  Signals: promoters, start and stop codons, splice sites, poly A sites
  These are all relatively weak signals
  Need to combine with codon statistics
Organism-specific training set is crucial
  Generated from cDNA library sequenced in conjunction with genome project
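The combination of signals and codon statistics can be sketched in miniature: scan for start and stop codons (the signal), then score each candidate by how well its codon usage matches a training set (the statistics). This is a deliberately simplified toy under stated assumptions. It searches the forward strand only, ignores introns and splice sites entirely, and uses a uniform 1/64 background. All function names and the tiny training set are invented for illustration.

```python
import math
from collections import Counter
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}
ALL_CODONS = ["".join(p) for p in product("ACGT", repeat=3)]


def find_orfs(seq, min_codons=3):
    """Signal step: return (start, end) for ATG...stop ORFs on the forward strand.
    end is exclusive and includes the stop codon."""
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


def codon_frequencies(training_cds, pseudocount=1.0):
    """Statistics step: estimate codon frequencies from known coding sequences
    (standing in for an organism-specific training set)."""
    counts = Counter()
    for cds in training_cds:
        for i in range(0, len(cds) - 2, 3):
            counts[cds[i:i + 3]] += 1
    total = sum(counts[c] for c in ALL_CODONS) + pseudocount * len(ALL_CODONS)
    return {c: (counts[c] + pseudocount) / total for c in ALL_CODONS}


def coding_score(orf_seq, freqs):
    """Sum of log-odds per codon: trained frequency vs. a uniform 1/64 background."""
    background = 1.0 / 64
    score = 0.0
    for i in range(0, len(orf_seq) - 2, 3):
        codon = orf_seq[i:i + 3]
        score += math.log(freqs.get(codon, background) / background)
    return score


if __name__ == "__main__":
    training = ["ATGAAAAAATAA", "ATGAAATTTTAG"]  # made-up training sequences
    freqs = codon_frequencies(training)
    genome = "CCATGAAATTTTAGGG"
    for start, end in find_orfs(genome):
        print(start, end, round(coding_score(genome[start:end], freqs), 2))
```

The scores are only as good as the training set: frequencies learned from one organism can rank real genes from a different organism poorly, which is why an organism-specific training set matters.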
Implications for environmental genomics
Need even more sequencing to get adequate coverage
For any given piece of DNA, likely to have fewer genes than if it were prokaryotic in origin
The current state of gene finding and of the genomes available for comparison means gene finders likely have very poor performance on DNA of unknown origin