Next Generation Sequencing. Itai Sharon, November 11th, 2009. Introduction to Bioinformatics.




2

Sequencing the Human Genome

[Figure: log10(price) vs. year, 2000-2010]

• 2001: Human Genome Project, 2.7G$, 11 years
• 2001: Celera, 100M$, 3 years
• 2007: 454, 1M$, 3 months
• 2008: ABI SOLiD, 60K$, 2 weeks
• 2009: Illumina, Helicos, 40-50K$
• 2010: 5K$, a few days?
• 2012: 100$, <24 hrs?

3

In this Talk:

• Sequencing 1.0: Sanger
• Assembly
• Next generation sequencing (NGS)
• NGS applications
• Future directions

Genome Sequencing

• Goal: determine the order of nucleotides across a genome
• Problem: current DNA sequencing methods can handle only short stretches of DNA at once (<1-2 Kbp)
• Solution: sequence short pieces, then use computers to assemble them

4

Genome Sequencing

5

[Figure: from genome to sequenced genome]

Genome: ACGTGACCGGTACTGGTAACGTACACCTACGTGACCGGTACTGGTAACGTACGCCTACGTGACCGGTACTGGTAACGTATACACGTGACCGGTACTGGTAACGTACACCTACGTGACCGGTACTGGTAACGTACGCCTACGTGACCGGTACTGGTAACGTATACCTCT...

Short fragments of DNA: AC..GC, TT..TC, CG..CA, TG..GT, TC..CC, GA..GC, TG..AC, ...

Short DNA sequences: ACGTGGTAA CGTATACAC TAGGCCATA GTAATGGCG CACCCTTAG TGGCGTATA CATA…

Sequenced genome: ACGTGGTAATGGCGTATACACCCTTAGGCCATA
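To make the "sequence, then assemble by computer" idea concrete, here is a minimal sketch of a greedy overlap assembler run on the six complete short sequences from this slide (the truncated "CATA…" read is left out). The function names `overlap` and `greedy_assemble` and the minimum overlap of 3 bases are assumptions made for illustration; the slides do not say which assembly algorithm was used, and real assemblers must also handle sequencing errors, repeats and reverse complements.

```python
from itertools import permutations
from typing import List


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the two sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_len, best_a, best_b = 0, None, None
        for a, b in permutations(contigs, 2):
            k = overlap(a, b, min_len)
            if k > best_len:
                best_len, best_a, best_b = k, a, b
        if best_len == 0:
            break  # nothing overlaps any more: the rest are separate contigs
        contigs.remove(best_a)
        contigs.remove(best_b)
        contigs.append(best_a + best_b[best_len:])  # merge, keeping the overlap once
    return contigs


# The six complete reads shown on the slide.
reads = ["ACGTGGTAA", "CGTATACAC", "TAGGCCATA",
         "GTAATGGCG", "CACCCTTAG", "TGGCGTATA"]
contigs = greedy_assemble(reads)
print(contigs)
```

On these reads the sketch ends with a single contig equal to the sequenced genome shown on the slide, ACGTGGTAATGGCGTATACACCCTTAGGCCATA. This toy example has no repeats, which is exactly the case where a greedy strategy is safe.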

Sanger Sequencing

6

Sanger Sequencing

7

Sanger Sequencing

• Advantages
– Long reads (~900 bp)
– Suitable for small projects

• Disadvantages
– Low throughput
– Expensive

8

Assembly

9

Cut DNA into larger pieces (2 Kbp, 15 Kbp) and sequence both ends of each piece (Fleischmann et al., 1994)

[Figure: 2 Kbp mates and 15 Kbp mates linking contig 1 and contig 2; each mate is a ~500 bp read, and the two reads of a pair are separated by ~(length - 1,000) bp]

• Resolving repeats
• Better assembly of contigs, gap length estimation
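The arithmetic behind the mate-pair figure can be sketched as follows. The first function reproduces the ~(length - 1,000) figure from the slide for two ~500 bp end reads. The second shows one plausible way a mate pair that links two contigs gives a gap length estimate. The function names and the example distances (4,000 and 6,500) are hypothetical, and read orientation and insert-size variation are ignored.

```python
def unsequenced_span(insert_len: int, read_len: int = 500) -> int:
    """Bases between the two end reads of one piece: ~(length - 1,000)."""
    return insert_len - 2 * read_len


def estimate_gap(insert_len: int, to_contig1_end: int, from_contig2_start: int) -> int:
    """Estimate the gap between two contigs linked by one mate pair.

    to_contig1_end:     bases from the start of the first mate to the end of contig 1
    from_contig2_start: bases from the start of contig 2 to the end of the second mate
    """
    return insert_len - to_contig1_end - from_contig2_start


print(unsequenced_span(2_000))    # 2 Kbp mates
print(unsequenced_span(15_000))   # 15 Kbp mates
print(estimate_gap(15_000, 4_000, 6_500))  # hypothetical placement of one 15 Kbp pair
```

In practice many mate pairs span the same gap, and their estimates would be averaged because the piece length is only approximately known.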

Assembly: How Much DNA?

10

Input → Output (Lander and Waterman, 1988)

• Low coverage: a few pieces to assemble → many contigs, many gaps
• High coverage: many pieces to assemble → a few contigs, a few gaps
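The relation between coverage and the number of contigs can be sketched with the standard Lander and Waterman (1988) estimates: with N reads of length L from a genome of length G, coverage is c = N·L/G, the expected fraction of uncovered bases is e^(-c), and the expected number of contigs is N·e^(-c·σ), where σ = 1 - T/L and T is the minimum overlap needed to join two reads. The function name, the 1.8 Mbp genome (the H. influenzae size quoted elsewhere in this talk), the 500 bp read length and the zero minimum overlap are choices made for this illustration only.

```python
import math


def lander_waterman(genome_len: int, read_len: int, n_reads: int,
                    min_overlap: int = 0) -> dict:
    """Expected coverage statistics under the Lander-Waterman model."""
    coverage = n_reads * read_len / genome_len
    sigma = 1 - min_overlap / read_len
    return {
        "coverage": coverage,
        "fraction_uncovered": math.exp(-coverage),
        "expected_contigs": n_reads * math.exp(-coverage * sigma),
    }


GENOME = 1_800_000  # 1.8 Mbp, illustrative
READ = 500          # ~500 bp reads, illustrative

for n_reads in (3_600, 28_800):  # 1x and 8x coverage
    stats = lander_waterman(GENOME, READ, n_reads)
    print(f"{stats['coverage']:.0f}x coverage: "
          f"{stats['expected_contigs']:.0f} contigs expected, "
          f"{stats['fraction_uncovered']:.2%} of bases uncovered")
```

Under these assumptions, 1x coverage leaves over a thousand contigs and more than a third of the genome unsequenced, while 8x coverage leaves around ten contigs and almost no uncovered bases.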

Sanger Sequencing

11

[Figure: timeline, 1980-2000s]

• 1982: lambda virus, DNA stretches up to 30-40 Kbp (Sanger et al.)
• 1994: H. influenzae, 1.8 Mbp (Fleischmann et al.)
• 2001: H. sapiens, D. melanogaster, 3 Gbp (Venter et al.)
• 2007: Global Ocean Sampling Expedition, ~3,000 organisms, 7 Gbp (Venter et al.)

12

Next Generation Sequencing: Why Now?

13

High Parallelism is Achieved in Polony Sequencing

[Figure panels: Polony, Sanger]

14

Generation of Polony array: DNA Beads (454, SOLiD)

DNA Beads are generated using Emulsion PCR

15

Generation of Polony array: DNA Beads (454, SOLiD)

DNA Beads are placed in wells

16

Generation of Polony array: Bridge-PCR (Solexa)

DNA fragments are attached to array and used as PCR templates

17

Sequencing: Pyrosequencing (454)

Complementary strand elongation: DNA Polymerase

18

Sequencing: Fluorescently labeled Nucleotides (Solexa)

Complementary strand elongation: DNA Polymerase

19

Sequencing: Fluorescently Labeled Nucleotides (ABI SOLiD)

Complementary strand elongation: DNA Ligase

20

Sequencing: Fluorescently Labeled Nucleotides (ABI SOLiD)

5 reading frames, each position is read twice
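The "each position is read twice" statement can be checked with a small counting sketch. It assumes an idealized scheme: five primer rounds, each shifted by one base relative to the previous one, and ligation probes that advance 5 bases per cycle while interrogating the first two bases of each step. The function name and parameters are hypothetical, and position 0 stands for the last (known) base of the adapter. Real runs differ in details such as the number of cycles per primer.

```python
from collections import Counter


def interrogation_counts(read_len: int, n_primers: int = 5, step: int = 5) -> Counter:
    """Count how many times each read position (1..read_len) is interrogated."""
    counts = Counter()
    for offset in range(n_primers):       # primer rounds n, n-1, ..., n-4
        start = 1 - offset                # first base interrogated in this round
        while start <= read_len:
            for pos in (start, start + 1):    # two interrogated bases per probe
                if 1 <= pos <= read_len:
                    counts[pos] += 1
            start += step                 # next ligation cycle, 5 bases further
    return counts


counts = interrogation_counts(25)
print(sorted(set(counts.values())))  # distinct per-position read counts
```

Under this model every adjacent pair of bases is interrogated exactly once across the five rounds, so every position in the read belongs to two interrogated pairs.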

21

Single Molecule Sequencing: HeliScope

22

Technology Summary

Platform   Read length   Sequencing technology   Throughput (per run)   Cost (per 1 Mbp)*
Sanger     ~800 bp       Sanger                  400 Kbp                500$
454        ~400 bp       Polony                  500 Mbp                60$
Solexa     75 bp         Polony                  20 Gbp                 2$
SOLiD      75 bp         Polony                  60 Gbp                 2$
Helicos    30-35 bp      Single molecule         25 Gbp                 1$

*Source: Shendure & Ji, Nat Biotech, 2008
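As a rough worked example of what these figures imply, the sketch below computes the number of runs and the raw sequencing cost needed to produce one genome's worth of bases for a 3 Gbp genome (the H. sapiens size quoted in this talk). The throughput and cost values are copied from the table. The 1x coverage default and the names `PLATFORMS` and `genome_cost` are assumptions for illustration. Real projects need many-fold coverage, so actual costs scale up accordingly.

```python
import math

GENOME_KBP = 3_000_000  # 3 Gbp genome, expressed in Kbp

# platform: (throughput per run in Kbp, cost per Mbp in $), from the table
PLATFORMS = {
    "Sanger":  (400, 500),
    "454":     (500_000, 60),
    "Solexa":  (20_000_000, 2),
    "SOLiD":   (60_000_000, 2),
    "Helicos": (25_000_000, 1),
}


def genome_cost(platform: str, coverage: float = 1.0):
    """Return (runs needed, cost in $) to sequence the genome at a given coverage."""
    throughput_kbp, cost_per_mbp = PLATFORMS[platform]
    total_kbp = GENOME_KBP * coverage
    runs = math.ceil(total_kbp / throughput_kbp)
    cost = total_kbp / 1_000 * cost_per_mbp
    return runs, cost


for name in PLATFORMS:
    runs, cost = genome_cost(name)
    print(f"{name:8s} runs: {runs:5d}   cost: {cost:>12,.0f}$")
```

At 1x coverage this gives 7,500 Sanger runs and 1.5M$, against a single run and a few thousand dollars on the short-read platforms.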

23

What, When and Why

• Sanger: small projects (less than 1 Mbp)

• 454: de novo sequencing, metagenomics

• Solexa, SOLiD, HeliScope:
– Gene expression, protein-DNA interactions
– Resequencing

24

Applications

25

Applications

26

Where Do We Go from Here?

• Higher throughput, longer reads (Pacific BioSciences)

• Computational bottleneck

• Shift to sequencing-based technologies

• Will it help to cure cancer?