
2011 jeroen vanhoudt_ngs

Date post: 08-Feb-2017
Upload: diana-mm
Next-generation DNA sequencing technologies – theory & practice
Transcript
Page 1: 2011 jeroen vanhoudt_ngs

Next-generation DNA sequencing technologies – theory & practice

Page 2: 2011 jeroen vanhoudt_ngs

Next-Generation sequencing (NGS) technologies – overview

NGS targeted re-sequencing – fishing out the regions of interest

NGS workflow: data collection and processing – the exome sequencing pipeline

Outline

Page 3: 2011 jeroen vanhoudt_ngs

PART I: NGS technologies

Next-Generation sequencing (NGS) technologies – overview

Page 4: 2011 jeroen vanhoudt_ngs

The automated Sanger method is considered a ‘first-generation’ technology, and newer methods are referred to as next-generation sequencing (NGS).

DNA Sequencing – the next generation

Page 5: 2011 jeroen vanhoudt_ngs

1953 Discovery of DNA double helix structure
1977
◦ A. Maxam and W. Gilbert: "DNA seq by chemical degradation"
◦ F. Sanger: "DNA sequencing with chain-terminating inhibitors"
1984 DNA sequence of the Epstein-Barr virus, 170 kb
1987 Applied Biosystems - first automated sequencer
1991 Sequencing of human genome in Venter's lab
1996 P. Nyrén and M. Ronaghi - pyrosequencing
2001 A draft sequence of the human genome
2003 Human genome completed
2004 454 Life Sciences markets first NGS machine

Landmarks in DNA sequencing

Page 6: 2011 jeroen vanhoudt_ngs
Page 7: 2011 jeroen vanhoudt_ngs

Random genome sequencing
• 25 Mb
• 300k reads
• 110 bp

Sanger sequencing
• Targeted
• 700-1000 bp

DNA Sequencing – the next generation

Page 8: 2011 jeroen vanhoudt_ngs

The newer technologies constitute various strategies that rely on a combination of
◦ Library/template preparation
◦ Sequencing and imaging

DNA Sequencing – the next generation

Page 9: 2011 jeroen vanhoudt_ngs

Commercially available technologies
◦ Roche – 454: GS FLX Titanium, GS Junior
◦ Illumina: HiSeq 2000, MiSeq
◦ Life: SOLiD 5500xl, Ion Torrent
◦ Helicos BioSciences – HeliScope
◦ Pacific Biosciences – PacBio RS

DNA Sequencing – the next generation

Page 10: 2011 jeroen vanhoudt_ngs

DNA Sequencing – the next generation

Page 11: 2011 jeroen vanhoudt_ngs

Produce a non-biased source of nucleic acid material from the genome

Template preparation: STEP1

Page 13: 2011 jeroen vanhoudt_ngs

Produce a non-biased source of nucleic acid material from the genome

Current methods:
◦ Randomly break genomic DNA into smaller sizes
◦ Ligate adaptors
◦ Attach or immobilize the template to a solid surface or support
◦ The spatially separated template sites allow thousands to billions of sequencing reactions to be performed simultaneously

Template preparation

Page 14: 2011 jeroen vanhoudt_ngs

Clonal amplification
◦ Roche – 454
◦ Illumina – HiSeq
◦ Life – SOLiD

Single molecule sequencing
◦ Helicos BioSciences – HeliScope
◦ Pacific Biosciences – PacBio RS

Template preparation

Page 15: 2011 jeroen vanhoudt_ngs

In solution – emulsion PCR (emPCR)
◦ Roche – 454
◦ Life – SOLiD

Solid phase – Bridge PCR
◦ Illumina – HiSeq

Template preparation: Clonal amplification

Page 16: 2011 jeroen vanhoudt_ngs

Template preparation: Clonal amplification - emPCR

Page 17: 2011 jeroen vanhoudt_ngs

Sequencing

SOLiD 454

Page 18: 2011 jeroen vanhoudt_ngs

Pyrosequencing

Picotitre plate Pyrosequencing

Page 19: 2011 jeroen vanhoudt_ngs

Pyrosequencing

Page 20: 2011 jeroen vanhoudt_ngs

Sequencing by ligation

Page 21: 2011 jeroen vanhoudt_ngs

Sequencing by ligation

Page 22: 2011 jeroen vanhoudt_ngs

Sequencing by ligation

Page 23: 2011 jeroen vanhoudt_ngs

Template preparation: Clonal amplification – Bridge PCR

Page 24: 2011 jeroen vanhoudt_ngs

Template preparation: Single molecule templates

HeliScope PacBio

Page 25: 2011 jeroen vanhoudt_ngs

HiSeq HeliScope

Page 26: 2011 jeroen vanhoudt_ngs

The major advance offered by NGS is the ability to cheaply produce an enormous volume of data

The arrival of NGS technologies in the marketplace has changed the way we think about scientific approaches in basic, applied and clinical research

DNA Sequencing – the next generation

Page 27: 2011 jeroen vanhoudt_ngs

PART II: NGS targeted resequencing

fishing out the regions of interest

Page 28: 2011 jeroen vanhoudt_ngs

The beginning

Random genome sequencing

??? ???

Sanger sequencing
• Targeted
• 700-1000 bp

Page 29: 2011 jeroen vanhoudt_ngs

◦ Library/template preparation
◦ Library enrichment for target
◦ Sequencing and imaging

DNA Sequencing – the next generation

Page 30: 2011 jeroen vanhoudt_ngs

Target enrichment strategies

Random genome sequencing

Hybrid Capture

PCR based

Sanger sequencing

Page 31: 2011 jeroen vanhoudt_ngs

Target enrichment strategies

Page 32: 2011 jeroen vanhoudt_ngs

Target enrichment strategies

Page 33: 2011 jeroen vanhoudt_ngs

Target enrichment strategies

Page 34: 2011 jeroen vanhoudt_ngs

Target enrichment strategies: MIP

Page 35: 2011 jeroen vanhoudt_ngs

Hybrid Capture

In solution
• Agilent
• Nimblegen
• ...

Solid phase
• Agilent
• Nimblegen
• Febit
• ...

Page 36: 2011 jeroen vanhoudt_ngs

Hybrid Capture

In solution
• Relatively cheap
• High throughput is possible
• Small amounts of DNA sufficient

Solid phase
• Straightforward method
• Flexible
• Higher amounts of DNA

Page 37: 2011 jeroen vanhoudt_ngs

Target enrichment strategies

Page 38: 2011 jeroen vanhoudt_ngs

PCR based approaches

• Uniplex
• Multiplex
• Fluidigm
• Raindance
• Multiplicon

• Long-range PCR products
• Raindance

Page 39: 2011 jeroen vanhoudt_ngs

PCR based approaches: Raindance

Page 40: 2011 jeroen vanhoudt_ngs

PCR based approaches: Fluidigm
• 48.48 Access Array

Page 41: 2011 jeroen vanhoudt_ngs

PCR based approaches: Fluidigm
• 48.48 Access Array

Page 42: 2011 jeroen vanhoudt_ngs

PCR based approaches: Fluidigm
• 48.48 Access Array

Page 43: 2011 jeroen vanhoudt_ngs

Target enrichment strategies

Page 44: 2011 jeroen vanhoudt_ngs

PART III: NGS workflow

data collection and processing – the exome sequencing pipeline

Page 45: 2011 jeroen vanhoudt_ngs

The human genome
◦ Genome = 3 Gb
◦ Exome = 30 Mb
◦ 180 000 exons

Protein coding genes
◦ constitute only approximately 1% of the human genome
◦ It is estimated that 85% of the mutations with large effects on disease-related traits can be found in exons or splice sites

Whole Exome Sequencing
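
The "approximately 1%" figure follows directly from the genome and exome sizes on this slide. A minimal sketch of that arithmetic, using only the slide's numbers (the variable names are illustrative):

```python
# Figures as given on the slide.
GENOME_BP = 3_000_000_000  # genome = 3 Gb
EXOME_BP = 30_000_000      # exome = 30 Mb
N_EXONS = 180_000          # 180 000 exons

# Share of the genome covered by the exome.
exome_fraction = EXOME_BP / GENOME_BP

# Average exon length implied by the slide's numbers.
mean_exon_bp = EXOME_BP / N_EXONS

print(f"Exome as a fraction of the genome: {exome_fraction:.1%}")
print(f"Implied mean exon length: {mean_exon_bp:.0f} bp")
```

The implied mean exon length is only an average derived from the slide's totals, not a measured value.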

Page 46: 2011 jeroen vanhoudt_ngs

gDNA 3 Gb

Exome 38 Mb

NGS

Exome sequencing

Page 47: 2011 jeroen vanhoudt_ngs

                    1/01/2010   1/08/2010   1/01/2011
exome capture            1100         860         300
Seq - 2.5 Gbases         5900        2600        1000
total cost               7000        3460        1300

The past, present & future
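
The chart values on this slide can be read as three series (exome capture, sequencing of 2.5 Gbases, total cost) at three dates. That reading is an assumption, though it is supported by capture plus sequencing adding up to the total at each date. The sketch below checks that consistency and computes the relative cost drop; the slide does not state a currency, so none is assumed.

```python
# Chart values, read as three series over three dates (assumed reading).
dates = ["1/01/2010", "1/08/2010", "1/01/2011"]
capture = [1100, 860, 300]       # exome capture
sequencing = [5900, 2600, 1000]  # Seq - 2.5 Gbases
total = [7000, 3460, 1300]       # total cost

# Consistency check: the two components should sum to the total.
consistent = all(c + s == t for c, s, t in zip(capture, sequencing, total))

# Relative drop in total cost between the first and last date.
total_drop = 1 - total[-1] / total[0]

# Share of the total taken by the capture step at each date.
capture_share = [c / t for c, t in zip(capture, total)]

print("components sum to total:", consistent)
print(f"total cost drop {dates[0]} -> {dates[-1]}: {total_drop:.0%}")
for d, share in zip(dates, capture_share):
    print(f"{d}: capture is {share:.0%} of total")
```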

Page 48: 2011 jeroen vanhoudt_ngs

HiSeq specifications:
◦ 2 flow cells
◦ 16 lanes (8 per flow cell)
◦ 200-300 Gbases per flow cell
◦ 10 days for a single run

Exome throughput
◦ 96 @ 60x coverage per run
◦ 3000 @ 60x coverage per year

Exome sequencing capacity
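
The throughput figures can be cross-checked against the instrument specifications. The sketch below is a back-of-the-envelope calculation, not a vendor formula. It assumes the 38 Mb target size from the "Exome 38Mb" slide, and the "implied usable fraction" is derived here rather than stated in the slides.

```python
# Figures from the slide.
FLOW_CELLS = 2
LANES = 16
GB_PER_FLOW_CELL = (200, 300)  # Gbases, low and high estimate
RUN_DAYS = 10
EXOMES_PER_RUN = 96
COVERAGE = 60
EXOMES_PER_YEAR = 3000

# Assumption: target size taken from the "Exome 38Mb" slide.
EXOME_TARGET_MB = 38

exomes_per_lane = EXOMES_PER_RUN / LANES

# Bases that must land on target for 60x coverage of one exome.
on_target_gb = COVERAGE * EXOME_TARGET_MB / 1000

for gb in GB_PER_FLOW_CELL:
    run_gb = gb * FLOW_CELLS
    raw_per_exome_gb = run_gb / EXOMES_PER_RUN
    usable = on_target_gb / raw_per_exome_gb
    print(f"{run_gb} Gb/run: {raw_per_exome_gb:.2f} Gb raw per exome, "
          f"implied usable fraction {usable:.0%}")

# Runs needed for the yearly figure versus back-to-back runs in a year.
runs_needed = EXOMES_PER_YEAR / EXOMES_PER_RUN
max_runs = 365 / RUN_DAYS

print(f"exomes per lane: {exomes_per_lane:.0f}")
print(f"runs needed per year: {runs_needed:.2f} (max back-to-back: {max_runs:.1f})")
```

Under these assumptions the yearly figure of 3000 exomes corresponds to about 31 runs, which fits within the roughly 36 runs possible at 10 days each.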

Page 49: 2011 jeroen vanhoudt_ngs

Data processing workflow

Data formatting & QC

Mapping & QC

Variant calling

Variant annotation

Variant filtering/comparison

Page 50: 2011 jeroen vanhoudt_ngs

Data processing

Page 51: 2011 jeroen vanhoudt_ngs
Page 52: 2011 jeroen vanhoudt_ngs

DATA STORAGE

DATA GENERATION

DATA PROCESSING

REPORTING & VALIDATION

RESULTS INTERPRETATION

Page 53: 2011 jeroen vanhoudt_ngs

Prepare sample library

Perform exome capture

Perform sequencing

DATA GENERATION

Page 56: 2011 jeroen vanhoudt_ngs

DATA STORAGE
◦ Sequence data: 10-15 Gb / exome

DATA GENERATION, DATA PROCESSING
◦ Image processing
◦ Base calling

Page 57: 2011 jeroen vanhoudt_ngs
Page 58: 2011 jeroen vanhoudt_ngs

NGS data processing: overview

1. Mapping
2. Duplicate marking
3. Local realignment
4. Base quality recalibration
5. Analysis-ready mapped reads
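
Step 2, duplicate marking, can be illustrated with a toy sketch. It assumes single-end reads and treats reads that share chromosome, 5' position and strand as copies of one original fragment, keeping the one with the highest summed base quality. The `Read` type and its fields are hypothetical; production duplicate markers also handle read pairs and clipped alignments, which this sketch ignores.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal representation of a mapped read."""
    name: str
    chrom: str
    pos: int           # mapped 5' position
    strand: str        # "+" or "-"
    base_quals: tuple  # per-base Phred quality scores


def mark_duplicates(reads):
    """Return the names of reads flagged as duplicates."""
    # Group reads that map to the same place in the same orientation.
    groups = defaultdict(list)
    for read in reads:
        groups[(read.chrom, read.pos, read.strand)].append(read)

    duplicates = set()
    for group in groups.values():
        # Keep the read with the highest summed base quality.
        best = max(group, key=lambda r: sum(r.base_quals))
        duplicates.update(r.name for r in group if r is not best)
    return duplicates


# Made-up example reads.
reads = [
    Read("r1", "chr1", 100, "+", (30, 30, 30)),
    Read("r2", "chr1", 100, "+", (35, 35, 35)),  # same site, better quality
    Read("r3", "chr1", 100, "-", (20, 20, 20)),  # other strand, kept
    Read("r4", "chr2", 100, "+", (30, 30, 30)),  # other chromosome, kept
]
print(sorted(mark_duplicates(reads)))
```

Here only `r1` is flagged, because `r2` maps to the same position and strand with a higher quality sum.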

Page 59: 2011 jeroen vanhoudt_ngs

DATA STORAGE
◦ Sequence data: 10-15 Gb / exome

DATA GENERATION, DATA PROCESSING
◦ Image processing
◦ Base calling
◦ QC sequencing
◦ Mapping sequences
◦ QC capture exp

Page 60: 2011 jeroen vanhoudt_ngs

QC NGS

Mapping

QC HC

DATA PROCESSING

Page 62: 2011 jeroen vanhoudt_ngs

DATA STORAGE
◦ Sequence data: 10-15 Gb / exome
◦ Mapping results: 5 Gb / exome

DATA GENERATION, DATA PROCESSING
◦ Image processing
◦ Base calling
◦ QC sequencing
◦ Mapping sequences
◦ QC capture exp
◦ Variant calling
◦ Variant annotation

Page 63: 2011 jeroen vanhoudt_ngs
Page 64: 2011 jeroen vanhoudt_ngs
Page 65: 2011 jeroen vanhoudt_ngs
Page 66: 2011 jeroen vanhoudt_ngs

DATA STORAGE
◦ Sequence data: 10-15 Gb / exome
◦ Mapping results: 5 Gb / exome
◦ Variant calls: 100 Mb / exome

DATA GENERATION, DATA PROCESSING
◦ Image processing
◦ Base calling
◦ QC sequencing
◦ Mapping sequences
◦ QC capture exp
◦ Variant calling
◦ Variant annotation
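
The per-exome storage figures add up quickly at the yearly throughput quoted earlier (3000 exomes). The rough sketch below assumes that all three data types are kept for every exome, that the slide's "Gb" means gigabytes, and that 1 TB = 1000 GB. None of these assumptions is stated in the slides.

```python
# Per-exome storage from the slide (assumed to be gigabytes).
SEQ_GB = (10, 15)    # sequence data, low and high estimate
MAPPING_GB = 5       # mapping results
VARIANTS_GB = 0.1    # variant calls (100 Mb)

# Yearly throughput from the exome sequencing capacity slide.
EXOMES_PER_YEAR = 3000


def per_exome_gb(seq_gb):
    """Total storage for one exome if every data type is retained."""
    return seq_gb + MAPPING_GB + VARIANTS_GB


low_gb, high_gb = (per_exome_gb(s) for s in SEQ_GB)

# Convert the yearly total to terabytes (1 TB = 1000 GB assumed).
yearly_tb = (low_gb * EXOMES_PER_YEAR / 1000,
             high_gb * EXOMES_PER_YEAR / 1000)

print(f"per exome: {low_gb:.1f}-{high_gb:.1f} GB")
print(f"per year:  {yearly_tb[0]:.1f}-{yearly_tb[1]:.1f} TB")
```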

Page 67: 2011 jeroen vanhoudt_ngs

SNPs vs Indels

[Bar chart; axis 0 to 1 200 000; categories: INDEL, SNP]

Page 68: 2011 jeroen vanhoudt_ngs

exonic vs non-exonic

[Chart; axis 0 to 1 000 000; categories: stopgain SNV, nonsynonymous SNV, nonframeshift insertion, nonframeshift deletion, non-coding, frameshift insertion, frameshift deletion]

Page 69: 2011 jeroen vanhoudt_ngs

Exonic

[Chart; axis 0 to 20 000; categories: synonymous SNV, stoploss SNV, stopgain SNV, nonsynonymous SNV, nonframeshift insertion, nonframeshift deletion, frameshift insertion, frameshift deletion]

Page 70: 2011 jeroen vanhoudt_ngs

Exonic

[Chart; axis 0 to 500; categories: stoploss SNV, stopgain SNV, nonframeshift insertion, nonframeshift deletion, frameshift insertion, frameshift deletion]
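
The exonic categories in these chart legends follow from the genetic code and from indel length. As an illustration only, the sketch below classifies a single-codon substitution with the standard codon table and classifies an indel by whether its length change is a multiple of three. The function names are hypothetical, and a real annotator works from transcript models rather than isolated codons.

```python
from itertools import product

# Standard genetic code, codons ordered over the bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def classify_snv(ref_codon, alt_codon):
    """Classify a codon change caused by a single-nucleotide variant."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous SNV"
    if alt_aa == "*":
        return "stopgain SNV"
    if ref_aa == "*":
        return "stoploss SNV"
    return "nonsynonymous SNV"


def classify_indel(ref, alt):
    """Classify a coding indel by its effect on the reading frame."""
    diff = len(alt) - len(ref)
    if diff == 0:
        raise ValueError("not an indel: alleles have equal length")
    kind = "insertion" if diff > 0 else "deletion"
    frame = "nonframeshift" if diff % 3 == 0 else "frameshift"
    return f"{frame} {kind}"


print(classify_snv("CTT", "CTC"))   # Leu -> Leu
print(classify_snv("TGG", "TGA"))   # Trp -> stop
print(classify_snv("TAA", "CAA"))   # stop -> Gln
print(classify_snv("GAG", "GTG"))   # Glu -> Val
print(classify_indel("A", "ATGC"))  # three bases inserted
print(classify_indel("ATG", "A"))   # two bases deleted
```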

Page 71: 2011 jeroen vanhoudt_ngs

DATA STORAGE
◦ Sequence data: 10-15 Gb / exome
◦ Mapping results: 5 Gb / exome
◦ Variant calls: 100 Mb / exome

DATA GENERATION, DATA PROCESSING
◦ Image processing
◦ Base calling
◦ QC sequencing
◦ Mapping sequences
◦ QC capture exp
◦ Variant calling
◦ Variant annotation
◦ Database: known variants, public & private
◦ Variant filtering
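
Filtering against databases of known variants amounts to a set difference on variant keys. In the minimal sketch below, the `Variant` type, the function name and all coordinates are made up for illustration. It also assumes that calls and database entries use the same coordinate and allele representation.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical variant key: position plus reference and alternate allele."""
    chrom: str
    pos: int
    ref: str
    alt: str


def filter_novel(calls, *known_sets):
    """Keep only the calls that appear in none of the known-variant sets."""
    known = set().union(*known_sets)
    return [v for v in calls if v not in known]


# Made-up toy data: one public and one private (in-house) database.
public_db = {Variant("chr1", 1000, "A", "G")}
private_db = {Variant("chr2", 2000, "C", "T")}

calls = [
    Variant("chr1", 1000, "A", "G"),  # in the public database
    Variant("chr2", 2000, "C", "T"),  # in the private database
    Variant("chr3", 3000, "G", "A"),  # in neither, kept as candidate
]

candidates = filter_novel(calls, public_db, private_db)
print(candidates)
```

Matching on the full (chrom, pos, ref, alt) key rather than position alone avoids discarding a new allele at a site where a different allele is already known.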

Page 72: 2011 jeroen vanhoudt_ngs
Page 73: 2011 jeroen vanhoudt_ngs

DATA STORAGE
◦ Sequence data: 10-15 Gb / exome
◦ Mapping results: 5 Gb / exome
◦ Variant calls: 100 Mb / exome

DATA GENERATION, DATA PROCESSING
◦ Image processing
◦ Base calling
◦ QC sequencing
◦ Mapping sequences
◦ QC capture exp
◦ Variant calling
◦ Variant annotation
◦ Database: known variants, public & private
◦ Variant filtering

REPORTING & VALIDATION
◦ Validated variants in candidate genes

RESULTS INTERPRETATION

