Date post: 11-Oct-2020
DepthOfCoverage Genetics for Dummies 2017 NGS II Illumina Sequencing Robert Kraaij Department of Internal Medicine [email protected]
Transcript
Page 1: DepthOfCoverage Genetics for Dummies 2017 slides R… · Phred Score Incorrect base Accuracy 10 1 in 10 90 % 20 1 in 100 99 % 30 1 in 1000 99.9 % 40 1 in 10000 99.99 % 50 1 in 100000

DepthOfCoverage Genetics for Dummies 2017

NGS II – Illumina Sequencing

Robert Kraaij

Department of Internal Medicine

[email protected]

Page 2:

• Data Analysis

• Applications

• Example: Exome Sequencing

Overview

Page 3:

Things to be addressed

NGS: many short reads that might contain errors

data analysis will handle these reads and errors

Page 4:

• Data Analysis

• Applications

• Example: Exome Sequencing

Overview

Page 5:

cBot

flowcell

bridgePCR

HiSeq2000

Illumina Sequencing

Page 6:

Per Cycle Imaging

Page 7:

G A T C

Per Cycle Imaging

Page 8:

G

good quality

G

poor quality

Per Cycle Base Calling

Page 9:

Phred Score    Incorrect base    Accuracy
10             1 in 10           90 %
20             1 in 100          99 %
30             1 in 1000         99.9 %
40             1 in 10000        99.99 %
50             1 in 100000       99.999 %

Phred scores 0 to 93 are encoded as ASCII characters 33 to 126, so each score is a single character

Quality Scoring
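The table follows from the usual Phred definition, Q = -10 * log10(P), where P is the probability that the base call is wrong. The sketch below reproduces the table rows and the single-character encoding. The function names are made up for illustration, and the offset of 33 is the one implied by the "ASCII 33 to 126" line on the slide.

```python
import math


def phred_to_error_prob(q):
    """Probability that a base call with Phred score q is incorrect."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred score for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


def phred_to_char(q, offset=33):
    """Encode a Phred score as one ASCII character (offset 33 assumed)."""
    return chr(q + offset)


def char_to_phred(c, offset=33):
    """Decode one ASCII quality character back to a Phred score."""
    return ord(c) - offset


# Reproduce the rows of the slide's table.
for q in (10, 20, 30, 40, 50):
    p = phred_to_error_prob(q)
    print(q, f"1 in {round(1 / p)}", f"{100 * (1 - p):.3f} %")
```

With this encoding, score 0 becomes `!` and score 93 becomes `~`, which is why a quality string has exactly one character per base.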

Page 10:

@SEQ_ID
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTC
+SEQ_ID
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>

FASTQ File
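A FASTQ record is four lines: an `@` header, the bases, a `+` separator, and one quality character per base. A minimal sketch of reading one record follows. The helper name `parse_fastq_record` is hypothetical, the quality offset of 33 is assumed, and the example uses only the first six bases and quality characters of the record on the slide.

```python
def parse_fastq_record(lines):
    """Parse four FASTQ lines into (read_id, sequence, phred_scores)."""
    header, sequence, separator, quality = (line.rstrip("\n") for line in lines)
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality string differ in length")
    # Assumed encoding: Phred score = ASCII code - 33.
    scores = [ord(c) - 33 for c in quality]
    return header[1:], sequence, scores


# First six bases and quality characters of the slide's record.
record = ["@SEQ_ID", "GATTTG", "+SEQ_ID", "!''*(("]
read_id, seq, scores = parse_fastq_record(record)
print(read_id, seq, scores)
```

The leading `!` decodes to a score of 0, so the first base of this read carries essentially no confidence.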

Page 11:

T A C G G T A C T T G C A T A

G A T T A C G G T A C T T G C A T A G C T

Alignment or Mapping of Reads

REFERENCE GENOME (HG19)

chromosome + position + strand

sample.bam

Page 12:

Run QC and filtering

sample.bam

Page 13:

sample.bam

• both reads

• quality scores

• chromosome

• position

• quality flag

• duplicate flag

• off target flag

sorted BAM file

Page 14:

Coverage

T A C G G T A C T T G C A T A

G A T T A C G G T A C T T G C A T A G C T

A C G G T A C T T G C A T A G

G A T T A C G G T A C T T G C

G G T A C T T G C A T A G C T

T T A C G G T A C T T G C A T

5x coverage

Page 15:

Mean Coverage

mean coverage = bases on target / size of target
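As a worked example of this ratio, the sketch below applies it to the toy alignment on the coverage slide: five reads of 15 bases each, all falling on a 21-base reference. The function name is illustrative only.

```python
def mean_coverage(bases_on_target, target_size):
    """Mean coverage = sequenced bases that fall on the target / target size."""
    if target_size <= 0:
        raise ValueError("target size must be positive")
    return bases_on_target / target_size


# Toy alignment from the coverage slide: 5 reads x 15 bases on a 21-base reference.
bases_on_target = 5 * 15
print(round(mean_coverage(bases_on_target, 21), 2))
```

The mean (about 3.6x here) is lower than the 5x seen in the middle of the alignment, because the ends of the reference are covered by fewer reads.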

Page 16:

% of Bases Above a Certain Threshold

T A C G G T A C T T G C A T A

G A T T A C G G T A C T T G C A T A G C T

A C G G T A C T T G C A T A G

G A T T A C G G T A C T T G C

G G T A C T T G C A T A G C T

T T A C G G T A C T T G C A T

5x 5x 4x 1x
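The per-base depths in this picture can be computed by counting, for each reference position, how many reads overlap it. The sketch below does that for the five reads shown. The 0-based start positions are my reading of where each read matches the 21-base reference, and "above a threshold" is taken to mean "at or above", which is an assumption.

```python
def per_base_depth(target_size, reads):
    """Depth at every target position; reads are (start, length), 0-based."""
    depth = [0] * target_size
    for start, length in reads:
        for pos in range(max(start, 0), min(start + length, target_size)):
            depth[pos] += 1
    return depth


def fraction_at_or_above(depth, threshold):
    """Fraction of target positions covered by at least `threshold` reads."""
    return sum(d >= threshold for d in depth) / len(depth)


# Start positions inferred from the slide's alignment (an assumption).
reads = [(3, 15), (4, 15), (0, 15), (6, 15), (2, 15)]
depth = per_base_depth(21, reads)
print(depth)
print(f"{100 * fraction_at_or_above(depth, 4):.1f} % of bases at 4x or better")
```

Reporting the percentage of bases at or above a threshold complements the mean, since a good mean can hide poorly covered positions at the edges.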

Page 17:

Variant Calling

T A C G G T G C T T G C A T A

G A T T A C G G T A C T T G C A T A G C T

A C G G T G C T T G C A T A G

G A T T A C G G T G C T T G C

G G T G C T T G C A T A G C T

T T A C G G T G C T T G C A T

G = homozygous alternative

- - - - G A T T A C G G T G C

C G G T G C T T G C A T A G C

T G C A T A G C T - - - - - -

A T T A C G G T G C T T G C A

Page 18:

Variant Calling

T A C G G T G C T T G C A T A

G A T T A C G G T A C T T G C A T A G C T

A C G G T G C T T G C A T A G

G A T T A C G G T A C T T G C

G G T G C T T G C A T A G C T

T T A C G G T A C T T G C A T

A/G = heterozygous

- - - - G A T T A C G G T A C

C G G T G C T T G C A T A G C

T G C A T A G C T - - - - - -

A T T A C G G T G C T T G C A
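The two calls shown so far (all reads G = homozygous alternative, a mix of A and G = heterozygous) can be imitated by counting the bases observed at one position. This is only a toy illustration: the 20 % minimum allele fraction is an arbitrary assumption, and real callers such as HaplotypeCaller use statistical models rather than a fixed cutoff.

```python
from collections import Counter


def naive_genotype(ref_base, observed_bases, min_fraction=0.2):
    """Toy genotype call from the bases that reads show at one position."""
    counts = Counter(observed_bases)
    total = sum(counts.values())
    if total == 0:
        return "no call"
    # Keep only alleles supported by a reasonable share of the reads.
    alleles = sorted(b for b, n in counts.items() if n / total >= min_fraction)
    if alleles == [ref_base]:
        return "homozygous reference"
    if len(alleles) == 1:
        return f"{alleles[0]} = homozygous alternative"
    if len(alleles) == 2 and ref_base in alleles:
        return "/".join(alleles) + " = heterozygous"
    return "ambiguous"


print(naive_genotype("A", "GGGGGGGG"))  # every read shows G
print(naive_genotype("A", "GGAGAAGG"))  # reads split between A and G
```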

Page 19:

Variant Calling

T A C G G T G C T T G C A T A

G A T T A C G G T A C T T G C A T A G C T

A C G G T G C T T G C A T A G

G A T T A C G G T A C T T G C

A/G = heterozygous?

Page 20:

Variant Calling

T A C G G T G C T T G C A T A

G A T T A C G G T A C T T G C A T A G C T

A C G G T G C T T G C A T A G

G A T T A C G G T A C T T G C

G

sequencing quality

good poor
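With only a few reads, one poor-quality base can change the call, so base qualities are usually taken into account before counting alleles. A minimal sketch of that idea follows. The Phred values and the cutoff of 20 are hypothetical, chosen only to mirror the "good" and "poor" G on the slide.

```python
from collections import Counter


def filter_by_quality(bases, qualities, min_phred=20):
    """Keep only base calls whose Phred score is at least min_phred."""
    return [b for b, q in zip(bases, qualities) if q >= min_phred]


# Three reads cover the position: two show G, one shows A.
bases = ["G", "G", "A"]
quals = [35, 8, 36]  # hypothetical scores: one good G, one poor G, one good A

kept = filter_by_quality(bases, quals)
print(Counter(bases))  # before filtering: G looks like the majority allele
print(Counter(kept))   # after filtering: one G and one A remain
```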

Page 21:

sample.vcf

• chromosome

• position

• quality

• annotations

VCF File
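In a VCF data line these items sit in tab-separated columns: CHROM, POS, ID, REF, ALT, QUAL, FILTER and INFO, with annotations stored as `key=value` pairs in INFO. A minimal parsing sketch is shown below. The example line is invented for illustration and does not come from the course data.

```python
def parse_vcf_line(line):
    """Parse the eight fixed columns of one VCF data line into a dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line has at least 8 tab-separated columns")
    chrom, pos, var_id, ref, alt, qual, filt, info = fields[:8]
    annotations = {}
    if info != ".":
        for item in info.split(";"):
            key, _, value = item.partition("=")
            # Flags such as "DB" carry no value; store them as True.
            annotations[key] = value if value else True
    return {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "alt": alt.split(","),
        "qual": None if qual == "." else float(qual),
        "filter": filt,
        "info": annotations,
    }


# Invented example line: an A>G variant on chromosome 1.
example = "1\t12345\t.\tA\tG\t50.0\tPASS\tDP=5;DB"
variant = parse_vcf_line(example)
print(variant)
```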

Page 22:

Variant Calling

T A C G G T G C T T G C A T A

G A T T A C G G T A C T T G C A T A G C T

A C G G T G C T T G C A T A G T A G

G A T T A C G G T A C T T G C

G G T G C T T G C A T A G C T

- G A T T A C G G T A C T T G C A T

deletion = heterozygous

- - - - G A T T A C G G T A C

C G G T G C T T G C A T A G C

T G C A T A G C T - - - - - -

- G A T T A C G G T G C T T G C A

Page 23:

Paired-End Sequencing

2 x 100 bp

Page 24:

Variant Calling: Mate Pairs

normal

400 bp

deletion

800 bp

insertion

200 bp
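The logic of this slide is a comparison of the mapped distance between the two mates with the expected fragment size: pairs that map further apart than expected suggest a deletion in the sample, and pairs that map closer together suggest an insertion. A sketch using the slide's numbers follows. The tolerance of 100 bp is a hypothetical value, since real fragment sizes vary around the expected 400 bp.

```python
def classify_pair(mapped_distance, expected=400, tolerance=100):
    """Classify a read pair by the distance between its mates on the reference."""
    if mapped_distance > expected + tolerance:
        return "deletion"   # mates map further apart than the fragment size
    if mapped_distance < expected - tolerance:
        return "insertion"  # mates map closer together than the fragment size
    return "normal"


for distance in (400, 800, 200):
    print(distance, "bp ->", classify_pair(distance))
```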

Page 25:

Variant Calling: Mate Pairs

normal

400 bp

translocation

Page 26:

Variant Calling: Split Reads

genome

800 bp

mRNA (cDNA)

Page 27:

• Data Analysis

• Applications

• Example: Exome Sequencing

Overview

Page 28:

Applications

• Re-sequencing full genome SNPs and indels

• Re-sequencing mate pairs structural variations

• Re-sequencing regional SNPs and indels

• Sequencing de novo assembly

• RNAseq

• ChIPseq

• …seq

Page 29:

www.illumina.com

Page 30:

Example:

Exome Sequencing

Page 31:

funding by NGI-NCHA, NWO, BBMRI

n > 3,000 samples of random set from RS-I

start May 2011; Nimblegen

part of “CHARGE-S” effort:

>5,000 exomes across 4 cohorts

Framingham, CHS, ARIC, Rotterdam Study

Expand with exome variants array?

CHARGE

Exome Sequencing

Page 32:

Exome vs Full Genome

exon exon exon genome 3 Gb

exome ~30 Mb
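The size difference translates directly into sequencing effort. The sketch below works out the exome's share of the genome and a rough number of read pairs needed for a given mean coverage. The 50x target is a hypothetical figure, the 2 x 100 bp read length is the paired-end format used elsewhere in these slides, and the estimate assumes every sequenced base lands on target, which real captures do not achieve.

```python
import math

GENOME_SIZE = 3_000_000_000  # 3 Gb, from the slide
EXOME_SIZE = 30_000_000      # ~30 Mb, from the slide


def read_pairs_needed(target_size, mean_coverage, read_length=100):
    """Read pairs needed for a mean coverage, assuming all bases are on target."""
    bases_needed = target_size * mean_coverage
    return math.ceil(bases_needed / (2 * read_length))


print(f"exome is {100 * EXOME_SIZE / GENOME_SIZE:.0f} % of the genome")
print("exome  @ 50x:", read_pairs_needed(EXOME_SIZE, 50), "read pairs")
print("genome @ 50x:", read_pairs_needed(GENOME_SIZE, 50), "read pairs")
```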

Page 33:

Exome Sequencing Workflow

DNA isolation -> Library preparation -> Exome capture -> Sequencing -> Data analysis

Page 34:

+

+

Exome

capture

Page 35:

Nimblegen SeqCap EZ v2 Capture

• CCDS (Sept 2009)

• miRBase (v14, Sept 2009)

• RefSeq (Jan 2010)

• 2,100,000 probes

• 30,246 coding genes

• 329,028 exons

• 710 miRNAs

• 36.5 Mb primary target

• 44.1 Mb capture target

Page 36:

Illumina TruSeq V3 2x100 PE Sequencing

Page 37:

Data analysis: BWA-GATK pipeline

Demultiplexing
• BclToFastQ (CASAVA)
• Chastity Filter

Alignment
• BWA (paired)
• SortSam, MarkDuplicates (Picard)

Processing
• BaseQualityScoreRecalibration, IndelRealignment (GATK)

Variant-Calling
• HaplotypeCaller
• VQSR
• VarEval

Analysis
• ANNOVAR, VCFtools
• PlinkSeq, SKAT, R
• Spotfire

Page 38:

Sample QC and Variant QC

Page 39:

RSX-2 Samples were sequenced to ~54x Mean Coverage

Average Mean Depth of Coverage across the 44Mb SeqCap Exome

[y-axis: Percentage of 44Mb covered 10x or better]

Page 40:

Mean Depth of Coverage by Flowcell

[y-axis: Mean Depth of Coverage]

Flowcell Number (Roughly Chronological Order)

Page 41:

Freemix Values by Flowcell

[y-axis: Estimated Freemix Values]

Flowcell Number (Roughly Chronological Order)

Page 42:

Determining Heterozygous Concordance versus 550k genotyping arrays

[y-axis: Heterozygous Concordance]

Flowcell Number (Roughly Chronological Order)

Page 43:

Comparing Concordance versus Freemix reveals cutoff around 13% correction

[y-axis: Heterozygous Concordance]

Estimated Freemix Values

Page 44:

Sample QC and Variant QC

Page 45:

Number of Detected SNPs per Sample by Flowcell

Flowcell Number (Roughly Chronological Order)

Page 46:

Heterozygous to Homozygous ratio per Sample by Flowcell

Flowcell Number (Roughly Chronological Order)

Page 47:

Transition to Transversion Ratio

[Figure labels: purines, pyrimidines, transition, transversion]
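A transition swaps a purine for a purine (A, G) or a pyrimidine for a pyrimidine (C, T); any purine-to-pyrimidine change or the reverse is a transversion. The sketch below computes the ratio from a list of (reference, alternative) base pairs. The function names and the example SNP list are invented for illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True if ref -> alt stays within purines or within pyrimidines."""
    pair = {ref, alt}
    return ref != alt and (pair <= PURINES or pair <= PYRIMIDINES)


def ti_tv_ratio(snps):
    """Transition to transversion ratio for (ref, alt) single-base changes."""
    changes = [(r, a) for r, a in snps if r != a]
    transitions = sum(is_transition(r, a) for r, a in changes)
    transversions = len(changes) - transitions
    return transitions / transversions if transversions else float("inf")


# Invented example: three transitions and two transversions.
snps = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("T", "G")]
print(ti_tv_ratio(snps))
```

Used as a QC metric, the ratio is compared across samples or flowcells, and a sample that deviates strongly from the rest deserves a closer look.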

Page 48:

Transition to Transversion Ratio per Sample by Flowcell

Flowcell Number (Roughly Chronological Order)

Page 49:

QC and filtering results

Page 50:
Page 51:

Things to Remember

NGS: many short reads that might contain errors

coverage indicates the number of independent reads that cover a base, needed to analyse a genome

FASTQ file: sequence + quality scores

BAM file: aligned reads

VCF file: called variants + annotation

