Today

Date post: 06-Jan-2016
Upload: shasta
Transcript
Page 1: Today

Today

• Please read…

Science 291: 1304-1315

Page 2: Today

Human Genome Project Dissenters

My Brush with Greatness?

• 1992: Two years into the HGP, two of the project’s biggest critics were…

– Sydney Brenner: believed that the HGP should focus on human EST collections, and sequence the genome of a simple vertebrate (Fugu).

– Craig Venter: believed that the clone-by-clone approach was not the most efficient way to proceed, and suggested that shotgun approaches, even a whole-genome approach, were feasible.

…they were both right.

Page 3: Today

Sydney Brenner

2002 Nobel Prize in Physiology or Medicine

Sydney Brenner and John E. Sulston, Britain

H. Robert Horvitz, United States

– for discoveries concerning how genes regulate organ development and a process of programmed cell death.

Page 4: Today

Expressed Sequence Tags (ESTs)

End-sequenced cDNAs (complementary DNA)

cDNA: synthetic DNA reverse-transcribed from an mRNA template,

– through the action of an RNA-dependent DNA polymerase called reverse transcriptase.
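As a rough illustration of what reverse transcription produces, the sketch below builds the first-strand cDNA for a short mRNA string. The function name `first_strand_cdna` and the toy sequence are made up for this example; it models only base pairing, not the enzyme or the EST end-sequencing step.

```python
# Complement of each RNA base in the DNA strand that reverse transcriptase builds.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A"}


def first_strand_cdna(mrna: str) -> str:
    """Return the first-strand cDNA (written 5'->3') for an mRNA written 5'->3'.

    Hypothetical helper: the cDNA is the reverse complement of the mRNA,
    with T in place of U.
    """
    return "".join(COMPLEMENT[base] for base in reversed(mrna.upper()))


if __name__ == "__main__":
    # Toy mRNA, not taken from the slides.
    print(first_strand_cdna("AUGGCCUAA"))
```

An EST would then be a single sequencing read taken from one end of such a cDNA clone.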

Online Primer: est.html

Brenner was right….

Page 5: Today

Still Sequencing cDNAs,

- first and easiest look into any genome,

- useful in understanding genomic sequence (gene finding),

- helps determine splice site variants,

- shorter than genomic clones, fits in plasmids,

- etc.

Page 6: Today

…tissue specific ESTs are very useful.

Used for microarrays…

…an array of DNA that can be hybridized with probes to study patterns of gene expression.

Page 7: Today

Whole Genome Assembly

• 1995: 1.8 Mbp Haemophilus influenzae genome sequenced,

• 1996 onward: Mycoplasma, E. coli and others*,

• 1999: Chromosome 2 of Arabidopsis,

• 2000: Drosophila (120 Mbp) genome,

…Human, Mosquito, etc…

• Lots of genomes, several applications...

*WGA of bacterial, viral populations...

Venter was right….

J. Craig Venter

Page 8: Today
Page 9: Today

• 1 year, 120 megabases,

• Assembly algorithms could generate accurate genomic sequences,

• Interim assemblies (or mapping) were not necessary.

24 MARCH 2000 VOL 287 SCIENCE

Page 10: Today

Big Biology

Page 11: Today

Think About This…

…the plasmid library construction is the first critical step in WGA sequencing,

– “if the DNA libraries are not uniform in size, non-chimeric, and do not randomly represent the genome, then the subsequent steps cannot accurately reconstruct the genome sequence.”

– “We used automated high-throughput DNA sequencing and the computational infrastructure to enable efficient tracking of enormous amounts of sequence information (27.3 million sequence reads; 14.9 billion bp of sequence).”
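The two figures in that quote can be turned into a mean read length and a rough depth of coverage. The sketch below does the arithmetic; the genome size is an assumption added for illustration (about 2.9 billion bp) and does not come from the slides.

```python
# Figures quoted on the slide.
SEQUENCE_READS = 27.3e6   # 27.3 million sequence reads
TOTAL_BP = 14.9e9         # 14.9 billion bp of sequence

# Assumption, not from the slides: an approximate human genome size.
ASSUMED_GENOME_SIZE_BP = 2.9e9

mean_read_length = TOTAL_BP / SEQUENCE_READS
sequence_coverage = TOTAL_BP / ASSUMED_GENOME_SIZE_BP

if __name__ == "__main__":
    print(f"mean read length: {mean_read_length:.0f} bp")
    print(f"sequence coverage: {sequence_coverage:.1f}x (under the assumed genome size)")
```

The mean read length comes out at roughly 546 bp, and the coverage at a little over 5x under the assumed genome size.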

Page 12: Today

Whose DNA?

• 21 enrolled donors,

– age, sex, ethnographic group,

– one African-American,

– one Asian-Chinese,

– one Hispanic-Mexican,

– two Caucasians*.

Page 13: Today

Whose, Mostly?

J. Craig Venter

Page 14: Today
Page 15: Today

8 September 1999 - 25 June 2000

543 bp average sequence read

…back to humans…

What to know? Individuals, libraries, sequence coverage, clone coverage, other?

Page 16: Today
Page 17: Today

WGA Outline

Online Primer: snps.html

Page 18: Today

DNA in sized libraries… DNA sequence in mate-pairs… (cartoons)

– insert in vector, with sequencing primers at both ends,

– sequenced ends: ~543 bp, e.g. 5’- actgtacgtgtagctgaca… - 3’ and 5’- tagcgtagttattttgc… - 3’,

– unsequenced insert: ~ known size.

Page 19: Today

8 September 1999 - 25 June 2000

543 bp average sequence read

…back to humans…

What to know? Individuals, libraries, sequence coverage, clone coverage, other?

Page 20: Today

Whole Genome Assembly

What does Shredder do? Why?

1. Screener,

2. Overlapper,

3. Unitigger/Discriminator,

4. Scaffolder,

5. Repeat Resolver.

Page 21: Today

Screener

...finds and “masks” microsatellite repeats, known repeated regions and ribosomal DNA, etc.

– “masked” regions not used to make contigs,

– “marks” the rest for overlapping.

read:

atgacttacttactgcatatttatttatttatttatttatttatttatttatttatttatttatttatttatttatttgacgtgtacgtgtacgtgtagctgtacgtgtacgtgacgggccgcattatcgtgatgctacgtgtacgttatatctgatcgtgcatgtga

masked:

atgacttacttactgcatatttatttatttatttatttatttatttatttatttatttatttatttatttatttatttgacgtgtacgtgtacgtgtagctgtacgtgtacgtgacgggccgcattatcgtgatgctacgtgtacgttatatctgatcgtgcatgtga

marked:

atgacttacttactgcatatttatttatttatttatttatttatttatttatttatttatttatttatttatttatttgacgtgtacgtgtacgtgtagctgtacgtgtacgtgacgggccgcattatcgtgatgctacgtgtacgttatatctgatcgtgcatgtga
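To make the masking step concrete, here is a minimal sketch that masks microsatellite runs in the example read with a regular expression. It is not the Screener itself: the function name, the `N` mask character, and the thresholds (repeat unit of up to 6 bp, at least 5 copies) are assumptions chosen for illustration, and it does not handle known repeat families or ribosomal DNA.

```python
import re

# Example read copied from the Screener slide.
READ = (
    "atgacttacttactgcatatttatttatttatttatttatttatttatttatttatttatttatttatttatttatttgacgtgtacgtgtacgtgtagctgtacgtgtacgtgacgggccgcattatcgtgatgctacgtgtacgttatatctgatcgtgcatgtga"
)


def mask_microsatellites(read: str, max_unit: int = 6, min_copies: int = 5,
                         mask_char: str = "N") -> str:
    """Replace tandem repeats of a short unit with mask_char (hypothetical helper).

    A run qualifies when a unit of 1..max_unit bases occurs at least
    min_copies times in a row. Masked bases keep their positions, so the
    read length is unchanged.
    """
    pattern = re.compile(
        r"([acgt]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1), re.IGNORECASE
    )
    return pattern.sub(lambda m: mask_char * len(m.group(0)), read)


if __name__ == "__main__":
    print(mask_microsatellites(READ))
```

Running it replaces the long tetranucleotide run near the start of the read with `N`s and leaves the remaining sequence available for overlapping.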

Page 22: Today

Overlapper

...looks for end-to-end overlaps of at least 40 bp with no more than 6% differences in match,

What’s the significance? ...a one in 10^17 event.

<--tactgtacgtagctgtgatgttcctcggatatagcgggcatatttattacgctattgtacgtgt-3’

5’- gttcctcggatatagcgggcatatttattacgctattgtacgtgtaaagtatcgt-->

> 40 bp, < 6% mismatch

…given perfect randomness.
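The overlap criterion can be sketched as a brute-force check on the two reads drawn above. This is an illustrative simplification, not the actual Overlapper: it compares the end of one read with the start of the other without gaps, treats both reads as written 5' to 3' on the same strand, and the function name is made up.

```python
# The two reads from the slide, both taken here as 5'->3' strings (assumption).
READ_A = "tactgtacgtagctgtgatgttcctcggatatagcgggcatatttattacgctattgtacgtgt"
READ_B = "gttcctcggatatagcgggcatatttattacgctattgtacgtgtaaagtatcgt"


def best_end_overlap(a: str, b: str, min_len: int = 40,
                     max_mismatch: float = 0.06):
    """Find the longest overlap between the end of `a` and the start of `b`.

    Returns (overlap_length, mismatches), or None if no ungapped overlap of
    at least min_len bases has a mismatch fraction of at most max_mismatch.
    """
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        mismatches = sum(x != y for x, y in zip(a[-length:], b[:length]))
        if mismatches <= max_mismatch * length:
            return length, mismatches
    return None


if __name__ == "__main__":
    print(best_end_overlap(READ_A, READ_B))
```

Comparing every pair of reads this way is quadratic in the number of reads; with tens of millions of reads a production overlapper would need some form of indexing, which this sketch leaves out.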

Page 23: Today

Good News

... uniquely assembled contigs (unitigs) are readily identifiable,

– all of the assembled sequences match over all of the known sequence,

- and -

...are consistent with an 8x sequence coverage.

Page 24: Today

Whole Genome Assembly

What does Shredder do? Why?

1. Screener,

2. Overlapper,

3. Unitigger/Discriminator,

4. Scaffolder,

5. Repeat Resolver.

Page 25: Today

Unitigs

...contig cluster is consistent with expected size (+8),

...no dissimilar sequences between any members.

But(t):

...the Screener doesn’t include all of the “low frequency” level repeats,

...so, a majority of the Overlapper outputs turned out to be bogus.

Page 26: Today

What Now?

– “over-collapsed” assemblies are identified and broken down into unitigs when possible...

– …these “too-large” contig sets are sent to the Unitigger/Discriminator.

Page 27: Today

...over-collapsed.

...in a world where real data matches expected data, each locus would have 8X coverage,

...if there are genomic repeats, then sequences would be “over-represented”, on average, 8 more per repeat, per contig.

Unitigger ...differentiates between a true overlap and an overlap that includes more than one locus.
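The coverage argument above can be written as a small calculation: if a unique locus is expected at 8X, a contig whose depth is a multiple of that probably merges several copies of a repeat. The sketch below is a deliberately crude stand-in for that reasoning, not the Unitigger's actual statistic; the function names and the 1.5x cut-off are hypothetical.

```python
EXPECTED_DEPTH = 8.0  # expected coverage of a unique locus, from the slides


def estimated_copies(observed_depth: float,
                     expected_depth: float = EXPECTED_DEPTH) -> int:
    """Estimate how many genomic copies were collapsed into one contig."""
    return max(1, round(observed_depth / expected_depth))


def is_over_collapsed(observed_depth: float,
                      expected_depth: float = EXPECTED_DEPTH,
                      tolerance: float = 1.5) -> bool:
    """Flag a contig whose depth is well above expectation.

    `tolerance` is an arbitrary illustrative cut-off, not a value from the slides.
    """
    return observed_depth > tolerance * expected_depth


if __name__ == "__main__":
    for depth in (7.0, 9.0, 16.0, 25.0):
        print(depth, estimated_copies(depth), is_over_collapsed(depth))
```

Depth alone cannot separate a repeat from random fluctuation in a short contig, which is one reason the Discriminator also looks at sequence outside the overlap region.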

Page 28: Today

Discriminator

...parses the “over-collapsed” contig by using sequence outside of the overlap region

Page 29: Today

Discriminator

...may yield u-unitigs.

Unitigger/Discriminator Output: correctly assembled contigs covering 73.6% of the genome.

Page 30: Today

Scaffolder

...contigs the contigs,

– uses mate-pair information: two or more consistent mate-pair matches yield 1 in 10^10 odds of being chance.
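The "two or more mate pairs" rule can be sketched as a count of links between contigs. This is a simplified illustration under stated assumptions: each mate pair is reduced to the pair of contig names its two reads landed in, "consistent" is taken to mean only "links the same two contigs", and read orientation and insert-size checks are left out. The function name and contig labels are hypothetical.

```python
from collections import Counter


def supported_links(mate_pairs, min_support: int = 2):
    """Return contig pairs joined by at least min_support mate pairs.

    mate_pairs: iterable of (contig_of_read_1, contig_of_read_2) tuples.
    Pairs whose two reads fall in the same contig carry no scaffolding
    information and are ignored.
    """
    counts = Counter(
        tuple(sorted(pair)) for pair in mate_pairs if pair[0] != pair[1]
    )
    return {link: n for link, n in counts.items() if n >= min_support}


if __name__ == "__main__":
    # Toy placements, not data from the slides.
    pairs = [("c1", "c2"), ("c2", "c1"), ("c2", "c3"), ("c1", "c2"), ("c3", "c3")]
    print(supported_links(pairs))
```

In the toy input only the `c1`/`c2` link survives: `c2`/`c3` is supported by a single mate pair, and the `c3`/`c3` pair lies within one contig.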

Page 31: Today

Repeat Resolver

...most of the remaining gaps were due to repeats.

“Rocks”

Use “low Discriminator Value” contig sets to fill gaps,

- find two or more mate pairs with unambiguous matches in the scaffold near the gap (2 kb, 10 kb or 50 kb) (1 in 10^7),

“Stones”

- find mate pair matches 2 kb, 10 kb, and 50 kb from gap, place the mate in the gap, check to see if it’s consistent with other “placed” sequences.

confirm matches

Page 33: Today

If that Doesn’t Work

...find a mate-pair that spans the gap, and sequence it,

Sequence Walking

...make sequencing primer from BES...

Page 34: Today

Wednesday

• Questions about WGA,

• CSA,

• Comparisons,

• Quality Control, etc.

