Lecture 4: DNA Sequencing in the Genomics Era

Sandy Simon

Genomics Research Fellow, Department of Biology

August 28, 2015

What is Genomics?

Study and analysis of all of the DNA contained in an organism

Comprehensive blueprint for what makes each individual unique

Powerful method for studying the integrated functions of an organism and even of a whole community

http://www.nature.com

Impacts of Genomics

Genomics is revolutionizing the study of living systems:

Human health

Agriculture

Environment

Fundamental biology

Huge economic payoffs from genomics research:

$3.8 billion investment in human genome sequencing has yielded $796 billion in economic development and 310,000 jobs in the United States¹

http://cisncancer.org

¹Tripp and Grueber 2011. Economic Impact of the Human Genome Project. Battelle Memorial Institute.

PCR and the Molecular Revolution

PCR: Polymerase Chain Reaction

Invented by Kary Mullis in 1983

Exponential amplification of a specific sequence of DNA

Most important molecular marker techniques involve PCR

Components: primers, nucleotides, template, thermostable polymerase

http://www.dnalc.org/ddnalc/resources/pcr.html
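A quick way to see what "exponential amplification" means is to compute the copy number cycle by cycle. The Python sketch below is illustrative only: it assumes a constant per-cycle efficiency (1.0 = perfect doubling) and ignores the plateau real reactions reach as primers and nucleotides run out. The function name `pcr_copies` is hypothetical.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Expected number of target copies after a number of PCR cycles.

    Assumes each cycle multiplies the copy number by (1 + efficiency),
    i.e. perfect doubling when efficiency == 1.0. Plateau effects from
    reagent depletion are not modeled.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # A single template molecule after 30 cycles of ideal doubling
    print(f"{pcr_copies(1, 30):,.0f}")
    # The same reaction at 90% efficiency per cycle
    print(f"{pcr_copies(1, 30, 0.9):.3e}")
```

With ideal doubling, one template becomes 2^30 (about 1.07 billion) copies after 30 cycles.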

Molecular Markers

Molecular markers provide a closer link between phenotype and genotype

“Anonymous” molecular markers (RFLP, RAPD, AFLP, and GBS): no knowledge of the underlying sequence polymorphism or its location in the genome

“Sequence-Tagged” markers, like microsatellites or SNPs, are derived from defined locations in the genome

Often reveal higher levels of polymorphism than allozymes and morphological markers

Allow studies of neutral variation in natural populations

Anonymous and Sequence-Tagged Markers

Anonymous markers often have short “primer” sequences (e.g., 10 bp primer sequences in RAPD)

Randomly amplify portions of the genome

Sequence-Tagged markers have longer primers (e.g., 20 bp for microsatellite primers)

[Figure: example sequences]

10 bp RAPD-style primer and its complement:
AGTTCAGAGT
TCAAGTCTCA

Template sequence (primer-binding region in uppercase):
agctggactacctctacgtcagcTGAGACTTGAACTCTGAACT

Two microsatellite alleles (lowercase ct repeats between the same flanking sequences):
ATGCTGAGGTCGCTTAGCAGctctctctctctctctctctcctctctctctctctGGATCCTGAATGCTGACTG
ATGCTGAGGTCGCTTAGCAGctctctctctctctGGATCCTGAATGCTGACTG
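A microsatellite is scored by the length of the PCR product between two fixed primer sites, so the two alleles on the slide above can be told apart by a simple length calculation. This Python sketch assumes the forward primer is the 20 bp sequence at the start of each allele and uses the shared downstream flank as a stand-in for the reverse primer site (a real assay would search for the reverse complement of the reverse primer). `amplicon_length` and the constant names are hypothetical.

```python
FORWARD_PRIMER = "ATGCTGAGGTCGCTTAGCAG"  # 20 bp primer site from the slide
REVERSE_FLANK = "GGATCCTGAATGCTGACTG"    # shared downstream flank (assumed reverse primer site)


def amplicon_length(locus: str, fwd: str = FORWARD_PRIMER, rev: str = REVERSE_FLANK) -> int:
    """Length of the product from the start of fwd through the end of rev.

    Matching is case-insensitive because the repeat region is written in
    lowercase on the slide. Raises ValueError if either site is missing.
    """
    seq = locus.upper()
    start = seq.find(fwd.upper())
    if start == -1:
        raise ValueError("forward primer site not found")
    end = seq.find(rev.upper(), start + len(fwd))
    if end == -1:
        raise ValueError("reverse primer site not found")
    return end + len(rev) - start


allele_long = "ATGCTGAGGTCGCTTAGCAGctctctctctctctctctctcctctctctctctctGGATCCTGAATGCTGACTG"
allele_short = "ATGCTGAGGTCGCTTAGCAGctctctctctctctGGATCCTGAATGCTGACTG"

for name, allele in [("long", allele_long), ("short", allele_short)]:
    print(name, amplicon_length(allele), "bp")
```

Because both alleles share the same flanks, the size difference between products equals the difference in repeat-region length, which is what fragment-size genotyping measures.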

DNA Sequencing

Direct determination of sequence of bases at a location in the genome

Shotgun versus PCR sequencing

Dye terminators (Sanger) and capillaries revolutionized DNA sequencing

Modern sequencing methods (sequencing by synthesis, pyrosequencing) have catapulted sequencing into the realm of population genetics

SNPs

A Single Nucleotide Polymorphism (SNP) is a single-base mutation in DNA.

The most common source of genetic polymorphism (e.g., 90% of all human DNA polymorphisms).

SNPs are identified by screening a sample of individuals (usually 16 to 48) from the study population

Once identified, SNPs are assayed in populations using high-throughput methods
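Screening a sample of individuals for SNPs amounts to looking for variable columns among sequences from the same locus. A minimal Python sketch, assuming the sequences are already aligned to equal length (real pipelines align reads to a reference and use base-quality models); the sample sequences and the name `find_snps` are hypothetical.

```python
from collections import Counter


def find_snps(aligned: list[str]) -> list[tuple[int, dict[str, int]]]:
    """Return (position, allele counts) for every variable column.

    Assumes all sequences are aligned and of equal length; gaps ('-') and
    ambiguous calls ('N') are ignored when deciding whether a column varies.
    Positions are 0-based.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    snps = []
    for pos, column in enumerate(zip(*aligned)):
        counts = Counter(base for base in column if base in "ACGT")
        if len(counts) > 1:  # two or more different bases observed
            snps.append((pos, dict(counts)))
    return snps


# Hypothetical sequences from four individuals at the same locus
samples = [
    "AGTTCAGAGT",
    "AGTTCAGAGT",
    "AGTCCAGAGT",
    "AGTTCAGTGT",
]

if __name__ == "__main__":
    for pos, counts in find_snps(samples):
        print(pos, counts)
```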

If nucleotides occur randomly in a genome, which sequence should occur more frequently?

AGTTCAGAGT

AGTTCAGAGTAACTGATGCT

What is the expected probability of each sequence occurring once?

How many times would each sequence be expected to occur by chance in a 100 Mb genome?

AGTTCAGAGT

What is the expected probability of each sequence occurring once?

What is the sample space for the first position? {A, T, G, C}

Probability of “A” at that position? 1/4

Probability of “A” at position 1, “G” at position 2, “T” at position 3, etc.?

$P(\text{AGTTCAGAGT}) = \left(\tfrac{1}{4}\right)^{10} = 0.25^{10} \approx 9.54 \times 10^{-7}$

AGTTCAGAGTAACTGATGCT

$P(\text{AGTTCAGAGTAACTGATGCT}) = \left(\tfrac{1}{4}\right)^{20} = 0.25^{20} \approx 9.09 \times 10^{-13}$

AGTTCAGAGT

How many times would each sequence be expected to occur in a 100 Mb ($10^{8}$ bp) genome?

$9.54 \times 10^{-7} \times 10^{8} \approx 95.4$

AGTTCAGAGTAACTGATGCT

$9.09 \times 10^{-13} \times 10^{8} \approx 9.09 \times 10^{-5}$

Why is this calculation wrong?
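The arithmetic on the two slides above can be reproduced, and checked against a simulated genome, with a short Python sketch. It makes exactly the assumptions the slides invite you to question: independent positions and equal base frequencies (0.25 each). It counts one strand only and uses (L - k + 1) start positions, which for a 100 Mb genome differs negligibly from multiplying by 10^8. All function names are hypothetical.

```python
import random


def kmer_probability(kmer: str) -> float:
    """Probability of an exact k-mer at one position, assuming independent
    positions and equal base frequencies (p = 0.25 for each of A, C, G, T)."""
    return 0.25 ** len(kmer)


def expected_count(kmer: str, genome_length: int) -> float:
    """Expected occurrences on one strand: number of start positions
    (genome_length - k + 1) times the per-position probability."""
    return (genome_length - len(kmer) + 1) * kmer_probability(kmer)


def count_overlapping(genome: str, kmer: str) -> int:
    """Count occurrences of kmer in genome, allowing overlaps."""
    count, start = 0, genome.find(kmer)
    while start != -1:
        count += 1
        start = genome.find(kmer, start + 1)
    return count


if __name__ == "__main__":
    genome_size = 100_000_000  # 100 Mb
    for kmer in ("AGTTCAGAGT", "AGTTCAGAGTAACTGATGCT"):
        print(kmer, f"p = {kmer_probability(kmer):.3g}",
              f"expected = {expected_count(kmer, genome_size):.3g}")

    # Sanity check on a small random genome (1 Mb) with a shorter 6-mer
    random.seed(1)
    small = "".join(random.choices("ACGT", k=1_000_000))
    print("observed:", count_overlapping(small, "AGTTCA"),
          "expected:", round(expected_count("AGTTCA", len(small))))
```

For a truly random 1 Mb sequence, the observed 6-mer count should land near the expectation of roughly 244; real genomes, with their biased base composition and repeats, need not.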

www.wv-inbre.net/bioinformatics/slides/IST444Genomicsequencing.ppt

www.wv-inbre.net/bioinformatics/slides/IST444Genomicsequencing.ppt

Automated DNA Sequence Readouts

www.wv-inbre.net/bioinformatics/slides/IST444Genomicsequencing.ppt

Capillary Sequencers

www.wv-inbre.net/bioinformatics/slides/IST444Genomicsequencing.ppt

Revolutionized DNA sequencing by enabling multiple samples to be analyzed in parallel

www.wv-inbre.net/bioinformatics/slides/IST444Genomicsequencing.ppt

www.wv-inbre.net/bioinformatics/slides/IST444Genomicsequencing.ppt

Human Genome Project Sequencing Strategy

Clone-based physical mapping

Digest genome and make Bacterial Artificial Chromosomes (BACs, 150,000 bp each)

Digest BACs to create fingerprints

Organize BACs to form contigs

Select BAC clones for sequencing

Shear BACs and shotgun clone

Sequence clones and assemble overlaps

www.wv-inbre.net/bioinformatics/slides/IST444Genomicsequencing.ppt

J. Craig Venter and Shotgun Sequencing

Proposed a whole-genome shotgun sequencing method to NIH in 1991. Proposal rejected.

Sets up The Institute for Genomic Research (TIGR) in 1992 (private and non-profit)

TIGR publishes the first complete genome sequence of a free-living organism in 1995 (Haemophilus influenzae)

Forms Celera Genomics in 1998 to sequence the human genome in three years (private, for-profit)

The Sequence of the Human Genome is published in Science, February 2001

Venter departs Celera, 2002

www.wv-inbre.net/bioinformatics/slides/IST444Genomicsequencing.ppt

Shotgun Sequencing Strategy

Whole-genome shotgun sequencing of five individuals with 5 to 100 fold coverage

Computer assembles overlapping sequences to form contigs

Contigs are assembled into scaffolds

Scaffolds are mapped to the genome by two or more Sequence Tagged Site (STS) markers

www.wv-inbre.net/bioinformatics/slides/IST444Genomicsequencing.ppt
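The step "computer assembles overlapping sequences to form contigs" can be illustrated with a toy greedy assembler: repeatedly merge the two reads with the longest exact suffix-prefix overlap. This Python sketch is a teaching aid only; production assemblers use overlap-layout-consensus or de Bruijn graph approaches and tolerate sequencing errors. The reads and function names are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b
    (at least min_len), or 0 if there is none."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Toy greedy assembler: repeatedly merge the pair of contigs with the
    largest exact overlap until no overlap >= min_len remains."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are fully contained in another read
    contigs = [c for c in contigs if not any(c != o and c in o for o in contigs)]
    while True:
        best = (0, -1, -1)  # (overlap length, left index, right index)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        if olen == 0:
            return contigs
        merged = contigs[i] + contigs[j][olen:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


if __name__ == "__main__":
    reads = ["AGTTCAGAG", "CAGAGTAACT", "TAACTGATGCT"]
    print(greedy_assemble(reads))
```

Greedy merging breaks down when a repeat is longer than the overlaps it relies on, one reason repetitive sequence makes eukaryotic assembly hard.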

www.wv-inbre.net/bioinformatics/slides/IST444Genomicsequencing.ppt

NextGen Challenge: Sequence Assembly

New sequencing technologies produce billions of short sequence fragments (reads) that must be assembled to yield useful information about the target genome

Eukaryotic genomes are very large and complex

Billions of bases of DNA

Repetitive sequence

Polymorphisms
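Coverage (the "5 to 100 fold" figure in the shotgun strategy) and the number of short reads needed can be estimated with the idealized Lander–Waterman model, in which reads land uniformly at random so per-base depth is Poisson distributed. This Python sketch assumes that model; the 3.1 Gb genome size, 150 bp read length, and function names are illustrative assumptions, and real data deviate because of repeats and coverage bias.

```python
import math


def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average sequencing depth c = N * L / G."""
    return num_reads * read_length / genome_size


def fraction_uncovered(coverage: float) -> float:
    """Expected fraction of bases with zero reads under the Poisson
    (Lander-Waterman) model: P(0 reads) = e^(-c)."""
    return math.exp(-coverage)


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Number of reads required to reach a target mean coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


if __name__ == "__main__":
    # Illustrative example: 3.1 Gb genome, 150 bp reads, 30x target
    genome, read_len = 3_100_000_000, 150
    n = reads_needed(30, read_len, genome)
    print(f"{n:,} reads of {read_len} bp")
    for c in (1, 5, 10, 30):
        print(f"{c:>2}x coverage -> {fraction_uncovered(c):.2e} of bases uncovered")
```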

Computing

128 processors available in clusters

128 GB RAM, 24-processor server for Next-Gen sequence assembly

Currently ~200 TB of redundant storage

WVU HPC Cluster: 2300-node high-capacity cluster with up to 500 GB of RAM

STARS Server (WV-INBRE): 1.5 TB of RAM

Sequencing Technology

Next Generation Sequencing

Illumina MiSeq/HiSeq

Ion Torrent

Third Generation Sequencing

PacBio Sequencing

Nanopore Technology

Comparison of HT-NGS Sequencing Platforms

Roche (454) FLX: pyrosequencing; emulsion PCR; read length 400–800 bases; 0.40–0.60 Gb per run; run time 10 h; approx. $700K per machine

Illumina HiSeq: synthesis by reversible dye terminators; bridge PCR; read length 100–250 bases; 200–300 Gb per run; run time 6 days; approx. $700K per machine

Illumina MiSeq: synthesis by reversible dye terminators; bridge PCR; read length 100–300 bases; 1.5–7.5 Gb per run; run time 10–20 h; approx. $125K per machine

ABI SOLiD: sequencing by ligation; emulsion PCR; read length 50–100 bases; 100–200 Gb per run; run time 6 days; approx. $1M per machine

Ion Torrent/Ion Proton: semiconductor sequencing; emulsion PCR; read length 100–400 bp; 0.1–35 Gb per run; run time 2 h; approx. $50K–$150K per machine
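One way to read the platform comparison is to ask how many runs each instrument would need to sequence a human genome to 30x mean coverage. The Python sketch below assumes a ~3.1 Gb genome and the upper end of each throughput range, and ignores duplicate reads, quality filtering, and run-to-run variation; the dictionary and function names are hypothetical.

```python
import math

# Upper-end throughput per run (Gb), taken from the platform comparison
THROUGHPUT_GB = {
    "Roche (454) FLX": 0.60,
    "Illumina HiSeq": 300.0,
    "Illumina MiSeq": 7.5,
    "ABI SOLiD": 200.0,
    "Ion Torrent/Ion Proton": 35.0,
}


def runs_for_coverage(throughput_gb: float, genome_gb: float, coverage: float) -> int:
    """Whole runs needed to generate genome_gb * coverage gigabases.

    A small tolerance absorbs floating-point error so that an exact
    multiple (e.g. 93 / 0.6 = 155) is not rounded up to an extra run.
    """
    required_gb = genome_gb * coverage
    return math.ceil(required_gb / throughput_gb - 1e-9)


if __name__ == "__main__":
    # Assumption: ~3.1 Gb human genome sequenced to 30x mean coverage
    for platform, gb in THROUGHPUT_GB.items():
        print(f"{platform:<24} {runs_for_coverage(gb, 3.1, 30):>4} run(s)")
```

At these figures a single HiSeq run covers the target, a MiSeq would need 13 runs, and the 454 FLX about 155.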

Library Preparation

TruSeq DNA and Small RNA Sample Preparation Kits

How does the MiSeq Work?

Option 1: Flashy Illumina video: https://www.youtube.com/watch?v=womKfikWlxM

Or Option 2: my potentially boring explanation

How Does the MiSeq Work?

1. Library Preparation

2. Cluster Generation

3. Sequencing

4. Data Analysis

Cluster Generation

Bind single DNA molecules to surface

Amplify on surface

~1000 molecules per ~1 µm cluster

Hybridize Fragment & Extend

[Figure labels: adapter sequence; 3’ extension]

Surface of flow cell coated with a lawn of oligo pairs

Single DNA library molecules are hybridized to the primer lawn

Bound libraries are then extended by polymerases

Denature Double-Stranded DNA

[Figure labels: newly synthesized strand; original template (discard)]

Double-stranded molecule is denatured

Original template washed away

Newly synthesized strand is covalently attached to flow cell surface

Bridge Amplification

Single-stranded molecule flips over and forms a bridge by hybridizing to an adjacent, complementary primer

Hybridized primer is extended by polymerases

Bridge Amplification

Double-stranded bridge is formed

Denature Double-Stranded Bridge

Double-stranded bridge is denatured

Result: two copies of covalently bound single-stranded templates

Bridge Amplification

Single-stranded molecules flip over to hybridize to adjacent primers

Hybridized primer is extended by polymerase

Bridge Amplification

Bridge amplification cycle repeated until multiple bridges are formed
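Cluster generation yields ~1000 molecules per cluster. Assuming each bridge amplification cycle ideally doubles the number of strands (real efficiency is lower, and the instrument's actual cycle count is not given here), the number of cycles needed is a base-2 logarithm, as in this Python sketch with a hypothetical helper.

```python
import math


def cycles_to_reach(target_molecules: int, start_molecules: int = 1) -> int:
    """Minimum number of ideal doubling cycles for start_molecules to reach
    at least target_molecules (assumes every strand is copied each cycle)."""
    if target_molecules <= start_molecules:
        return 0
    return math.ceil(math.log2(target_molecules / start_molecules))


if __name__ == "__main__":
    # ~1000 molecules per cluster, starting from one bound template
    n = cycles_to_reach(1000)
    print(n, "cycles ->", 2 ** n, "molecules")
```

Ten ideal doublings give 1024 molecules, the same order of magnitude as a cluster.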

Linearization

dsDNA bridges are denatured

Reverse Strand Cleavage

Reverse strands cleaved and washed away, leaving a cluster with forward strands only

Blocking

Free 3’ ends are blocked to prevent unwanted DNA priming

Read 1 Primer Hybridization

Sequencing primer is hybridized to adapter sequence

MiSeq Sequencing Workflow

1. Library Preparation

2. Cluster Generation

3. Sequencing

4. Data Analysis

Sequencing by Synthesis

Add 4 Fl-NTPs + polymerase

Incorporated Fl-NTP imaged

Terminator & fluorescent dye cleaved from Fl-NTP

Repeat × 36–151 cycles
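The sequencing-by-synthesis cycle (add Fl-NTPs, image, cleave, repeat 36–151 times) can be modeled as a loop that reads out one complementary base per cycle. A toy Python sketch, assuming a perfect reaction with no phasing or errors and a template string written in the order the polymerase copies it; the names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def sequence_by_synthesis(template: str, cycles: int) -> str:
    """Toy model of cyclic reversible-terminator sequencing.

    Each cycle the polymerase adds exactly one labeled nucleotide
    complementary to the next template base (the terminator blocks further
    extension), that base is recorded as if imaged, and the dye and
    terminator are then cleaved so the next cycle can proceed.
    """
    read = []
    for cycle in range(min(cycles, len(template))):
        incorporated = COMPLEMENT[template[cycle]]  # one base per cycle
        read.append(incorporated)                   # imaging step
        # cleavage step: nothing further to model before the next cycle
    return "".join(read)


if __name__ == "__main__":
    print(sequence_by_synthesis("TCAAGTCTCA", 10))
```

For example, the template TCAAGTCTCA yields the read AGTTCAGAGT, the 10 bp example sequence used earlier in this lecture.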

Sequencing

[Figure panels: A image, C image, T image, G image]

Clusters are imaged using LED and filter combinations specific for each fluorescently labeled nucleotide

After imaging is complete for one section (tile), the flow cell is moved to the next tile and the process is repeated

Imaging for the 1st cycle takes ~3 min., including focusing routines
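Each cycle produces four images (A, C, G, T), so calling a base for one cluster reduces to comparing its intensity across the four channels. The Python sketch below is a simplified, hypothetical base caller: it picks the brightest channel and reports brightest / (brightest + second brightest) as a rough confidence ratio. The intensities are invented, and real instrument software also corrects for channel cross-talk and phasing.

```python
def call_bases(cycle_intensities: list[dict[str, float]]) -> tuple[str, list[float]]:
    """Simplified base caller for one cluster.

    For each cycle, call the channel with the highest intensity and report a
    confidence ratio: brightest / (brightest + second brightest). A ratio
    near 1.0 means one clear signal; near 0.5 means two channels are almost
    tied. Illustration only.
    """
    bases, ratios = [], []
    for channels in cycle_intensities:
        ranked = sorted(channels.items(), key=lambda kv: kv[1], reverse=True)
        (best_base, best), (_, second) = ranked[0], ranked[1]
        total = best + second
        bases.append(best_base)
        ratios.append(best / total if total > 0 else 0.0)
    return "".join(bases), ratios


# Hypothetical intensities from the A, C, G, T images over three cycles
cycles = [
    {"A": 950.0, "C": 40.0, "G": 35.0, "T": 60.0},
    {"A": 55.0, "C": 30.0, "G": 48.0, "T": 870.0},
    {"A": 410.0, "C": 25.0, "G": 380.0, "T": 50.0},
]
seq, conf = call_bases(cycles)
print(seq, [round(r, 2) for r in conf])
```

In the third cycle the A and G channels are nearly tied, so the call is made but flagged by a ratio close to 0.5.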

Ion Torrent Technology

Ion Torrent Platforms

Ion PGM: 10 Mb to 1 Gb capacity per run

50 to 200 bp reads

$500 per run

Ion Proton

30 Gb capacity per run

200 to 400 bp reads

$1000 per run
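The per-run figures for the two Ion Torrent platforms translate into very different costs per gigabase. A small Python sketch that divides the listed per-run cost by run capacity (instrument purchase excluded); using the upper end of the PGM capacity range is an assumption, and `cost_per_gb` is a hypothetical helper.

```python
def cost_per_gb(run_cost_usd: float, output_gb: float) -> float:
    """Listed per-run cost divided by run output (instrument price excluded)."""
    return run_cost_usd / output_gb


# Figures from the slide, using the upper end of the PGM capacity range
platforms = {
    "Ion PGM": (500.0, 1.0),
    "Ion Proton": (1000.0, 30.0),
}

if __name__ == "__main__":
    for name, (cost, gb) in platforms.items():
        print(f"{name}: ${cost_per_gb(cost, gb):,.2f} per Gb")
```

The Proton works out to roughly $33 per Gb versus $500 per Gb for a full 1 Gb PGM run; at the PGM's 10 Mb low end the figure would be $50,000 per Gb.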

PacBio Sequencing

Longest reads currently available (up to 10 kb with strobing)

Very high error rates

Effective in “hybrid assemblies” combining an accurate technology (Illumina) with long reads

Nanopore sequencing

Structure of Protein Nanopore

https://www.iths.org/sites/www.iths.org/files/eventmedia/ITHS_ThirdGenerationSequencers.pdf

The Future: nanopore sequencing

Supposedly will sequence a human genome in one day

Single-strand sequencing

Reads in the hundreds-of-kb size range

Applications of NextGen Sequencing

Whole-genome de novo sequencing

Genome resequencing: discovery of polymorphisms among and within individuals

Identification of disease determinants

Diagnosis

Transcriptome sequencing/gene expression

Metagenomics

Population genetics/marker analyses

Genotyping by Sequencing

New sequencing methods generate tens of millions of short sequences per run

Combine restriction digests with sequencing and pooling to genotype thousands of markers covering the genome at very high density

http://www.maizegenetics.net/images/stories/GBS_CSSA_101102sem.pdf

Generate tens of thousands of markers for <$100 per sample

[Figure labels: presence-absence polymorphism; SNP]
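The restriction-digest half of genotyping by sequencing can be previewed in silico: find the enzyme's recognition sites and cut the sequence into fragments, only some of which end up sequenced. The slide does not name an enzyme; this Python sketch assumes ApeKI (recognition site GCWGC, cut G^CWGC), which published maize GBS protocols use. The toy sequence and the function name `digest` are hypothetical.

```python
import re


def digest(sequence: str, site_pattern: str = "GC[AT]GC", cut_offset: int = 1) -> list[str]:
    """In-silico restriction digest of one strand.

    site_pattern is a regex for the recognition site; the default GC[AT]GC
    (GCWGC) corresponds to ApeKI. cut_offset is where the top strand is cut,
    counted from the start of the site (G^CWGC -> 1). A zero-width lookahead
    lets overlapping sites be found.
    """
    cuts = [m.start() + cut_offset
            for m in re.finditer(f"(?={site_pattern})", sequence.upper())]
    fragments, prev = [], 0
    for cut in cuts:
        fragments.append(sequence[prev:cut])
        prev = cut
    fragments.append(sequence[prev:])
    return fragments


if __name__ == "__main__":
    toy = "AAAGCAGCTTTTGCTGCAAA"  # hypothetical sequence with two sites
    frags = digest(toy)
    print(frags, [len(f) for f in frags])
```

Because every individual is cut at the same conserved sites, the same fragments are sequenced across samples, which is what lets pooled reads be compared marker by marker.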

Genotyping by Sequencing Cost Example

http://www.maizegenetics.net/gbs-overview

