
Sequencing Errors and Biases Biological Sequence Analysis BNFO 691/602 Spring 2013 Mark Reimers.

Date post: 23-Dec-2015
Page 1: Sequencing Errors and Biases Biological Sequence Analysis BNFO 691/602 Spring 2013 Mark Reimers.

Sequencing Errors and Biases

Biological Sequence Analysis

BNFO 691/602 Spring 2013

Mark Reimers

Page 2

Outline

• Sequencing errors
• Initiation biases
• Quantification biases
• Are biases consistent across samples?
• Compensating biases

Page 3

Types of mismatches in Illumina data are profoundly asymmetric and biased

From uniquely mapped tags with a single mismatch

Courtesy Thierry-Mieg
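The tally behind a mismatch-type plot like this one can be sketched as follows. This is a minimal illustration, not the procedure used for the slide: the function name, the input format (ungapped reference/read string pairs on the same strand), and the toy data are all assumptions.

```python
from collections import Counter


def substitution_counts(pairs):
    """Tally substitution types (reference base > read base).

    `pairs` is an iterable of (reference, read) strings of equal length,
    assumed to be ungapped alignments on the same strand (an assumption
    of this sketch). Only pairs with exactly one mismatch are counted.
    """
    counts = Counter()
    for ref, read in pairs:
        if len(ref) != len(read):
            raise ValueError("reference and read must have equal length")
        mismatches = [(r, q) for r, q in zip(ref, read) if r != q]
        if len(mismatches) == 1:
            r, q = mismatches[0]
            counts[f"{r}>{q}"] += 1
    return counts


# Toy data, invented for illustration only.
toy_pairs = [
    ("ACGTAC", "ACGTCC"),  # single mismatch: A>C
    ("ACGTAC", "ACGGAC"),  # single mismatch: T>G
    ("GGATCA", "GGCTCA"),  # single mismatch: A>C
    ("GGATCA", "GGATCA"),  # no mismatch, skipped
    ("TTTTTT", "TATTAT"),  # two mismatches, skipped
]
counts = substitution_counts(toy_pairs)
```

If errors were symmetric, complementary types such as A>C and C>A would appear at similar rates; a strongly uneven table is what the slide describes.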

Page 4

Position of single mismatch in uniquely mapped tags

Courtesy Thierry-Mieg
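A histogram of mismatch position along the read could be computed along these lines. This is a sketch under assumptions: reads are given in sequencing orientation as ungapped (reference, read) pairs, positions are 0-based, and the names and toy data are hypothetical.

```python
def mismatch_position_histogram(pairs, read_length):
    """Count, per 0-based read position, the reads whose single mismatch falls there.

    `pairs` holds (reference, read) strings, each of length `read_length`,
    assumed to be in sequencing orientation (an assumption of this sketch).
    Reads with zero or more than one mismatch are ignored.
    """
    hist = [0] * read_length
    for ref, read in pairs:
        if len(ref) != read_length or len(read) != read_length:
            raise ValueError("every sequence must have length read_length")
        positions = [i for i, (r, q) in enumerate(zip(ref, read)) if r != q]
        if len(positions) == 1:
            hist[positions[0]] += 1
    return hist


# Toy data, invented for illustration only.
toy_pairs = [
    ("ACGTAC", "ACGTCC"),  # mismatch at position 4
    ("ACGTAC", "ACGGAC"),  # mismatch at position 3
    ("GGATCA", "GGCTCA"),  # mismatch at position 2
    ("GGATCA", "GGATCA"),  # no mismatch, skipped
]
hist = mismatch_position_histogram(toy_pairs, read_length=6)
```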

Page 5

Initiation Biases

Page 6

Nucleotide frequencies versus position for stringently mapped reads.

Hansen K D et al. Nucl. Acids Res. 2010;38:e131-e131

© The Author(s) 2010. Published by Oxford University Press.
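Per-position nucleotide frequencies of the kind plotted here can be computed with a simple tally. The sketch below is illustrative only and is not the analysis of Hansen et al.; the function name, the input format (read sequences oriented from their first sequenced base), and the toy reads are assumptions.

```python
from collections import Counter


def positional_base_frequencies(reads, n_positions):
    """Return, for each of the first `n_positions` read positions,
    a dict mapping A/C/G/T to its observed frequency at that position.

    Frequencies are relative to all bases seen at the position,
    so ambiguous calls such as N lower the A/C/G/T totals.
    """
    counters = [Counter() for _ in range(n_positions)]
    for read in reads:
        for i, base in enumerate(read[:n_positions]):
            counters[i][base] += 1
    freqs = []
    for counter in counters:
        total = sum(counter.values())
        freqs.append(
            {b: (counter[b] / total if total else 0.0) for b in "ACGT"}
        )
    return freqs


# Toy reads, invented for illustration only.
toy_reads = ["ACGT", "AGGT", "ACTT", "CCGT"]
freqs = positional_base_frequencies(toy_reads, n_positions=4)
```

With no initiation bias, the frequencies at the first few positions would match those further along the read.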

Page 7

Start Position Bias is Visible in MT-RNA

Page 8

Start Position Bias is Consistent Across Samples

Counts per start site in lane 1 vs lane 2 (Marioni et al., Genome Res. 2008)
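One way to quantify this consistency is to count reads per start site in each lane and correlate the two count vectors. The sketch below assumes each lane is given as a list of mapped start coordinates; the names and toy data are hypothetical, and Pearson correlation is an illustrative choice rather than the statistic used on the slide.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Toy start coordinates per lane, invented for illustration only.
lane1_starts = [100, 100, 100, 105, 105, 110, 120, 120, 120, 120]
lane2_starts = [100, 100, 105, 110, 110, 120, 120, 120]

lane1_counts = Counter(lane1_starts)
lane2_counts = Counter(lane2_starts)

# Use the union of start sites so that sites seen in only one lane count as 0.
sites = sorted(set(lane1_counts) | set(lane2_counts))
x = [lane1_counts[s] for s in sites]
y = [lane2_counts[s] for s in sites]
r = pearson(x, y)
```

On real data the counts are heavily skewed, so a log transform or a rank correlation may be more informative than raw Pearson.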

Page 9

Quantification Biases

Page 10

Consistent Technology-Specific Biases

(a) 25-kb region of chromosome 11 amplified by three long-range PCR products (red rectangles). (b) A heat-map colored matrix displays the correlation of coverage depth across 260 kb of sequence between four samples by three technologies. From Harismendy et al., Genome Biology 2009

Page 11

Quantitative Biases

• Not all regions are represented equally
• GC-rich regions are represented more
• Independent of GC, some chromosome regions are represented more: euchromatin bias
• Sequence initiation site biases
• ‘Mappability’ biases: some regions won’t have any uniquely mapped tags
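The mappability point can be illustrated with a toy calculation: a position is uniquely mappable for read length k only if the k-mer starting there occurs once in the reference. The sketch below makes simplifying assumptions (exact matches only, forward strand only, a short invented reference); real mappability tracks also consider the reverse complement and allowed mismatches.

```python
from collections import Counter


def unique_kmer_mask(reference, k):
    """Return one boolean per k-mer start position in `reference`:
    True if the k-mer starting there occurs exactly once.

    Simplification (an assumption of this sketch): exact matching on the
    forward strand only.
    """
    kmers = [reference[i:i + k] for i in range(len(reference) - k + 1)]
    counts = Counter(kmers)
    return [counts[kmer] == 1 for kmer in kmers]


# Toy reference, invented for illustration; "ACGT" occurs twice.
reference = "ACGTACGTTTGCA"
mask = unique_kmer_mask(reference, k=4)
uniquely_mappable_fraction = sum(mask) / len(mask)
```

Positions where the mask is False can never receive a uniquely mapped tag of that length, however highly the region is represented in the sample.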

Page 12

GC Bias

• Density of reads depends strongly on GC content of regions

• Most bias seems to come from PCR reaction

• Newer techniques show less bias, but it is still strong

Plot: number of reads in 1 kb region versus GC content (%) of 1 kb region

From Dohm et al 2008
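The two quantities on the plot axes, GC content and read count per window, can be computed as in the sketch below. This is an illustration and not the analysis of Dohm et al.: the function names, the assignment of each read to the window containing its start coordinate, and the toy data (with a 10 bp window instead of 1 kb) are all assumptions.

```python
def gc_percent(seq):
    """GC content in percent, ignoring non-ACGT characters; None if no ACGT."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        return None
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def reads_per_window_vs_gc(reference, read_starts, window=1000):
    """Return a list of (GC percent, read count) per non-overlapping window.

    Each read is assigned to the window containing its 0-based start
    coordinate (an assumption of this sketch). A trailing partial window
    is dropped.
    """
    n_windows = len(reference) // window
    counts = [0] * n_windows
    for start in read_starts:
        w = start // window
        if 0 <= w < n_windows:
            counts[w] += 1
    return [
        (gc_percent(reference[i * window:(i + 1) * window]), counts[i])
        for i in range(n_windows)
    ]


# Toy data, invented for illustration: an AT-only window, then a GC-only window.
toy_reference = "AT" * 5 + "GC" * 5
toy_read_starts = [0, 3, 12, 15, 18]
points = reads_per_window_vs_gc(toy_reference, toy_read_starts, window=10)
```

Plotting the resulting pairs for real 1 kb windows would give a scatter of the kind shown on the slide.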

Page 13

GC Bias depends on temperature

• Aird et al (Genome Biology 2011) systematically tested the effects of various conditions on GC bias

• They provided protocols that reduce GC bias but don’t eliminate it

NB. Log scale

Page 14

Even Best Protocols have Bias

• GC bias in Illumina reads from a 400-bp fragment library amplified using the standard PCR protocol (Phusion HF, short denaturation) on a fast-ramping thermocycler (red squares), Phusion HF with long denaturation and 2 M betaine (black triangles), AccuPrime Taq HiFi with long denaturation and primer extension at 65°C (blue diamonds) or 60°C (purple diamonds)

From Aird et al Genome Biology 2011

Page 15

Biases Are NOT Consistent

• The plot on the left shows log-fold changes between RPKM values from two biological replicates (NA11918, NA12761), from the data of Montgomery et al, Nature 2010

• From Hansen et al 2012
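For reference, RPKM (reads per kilobase of transcript per million mapped reads) and the log-fold change between two samples can be computed as sketched below. The formula is the standard RPKM definition; the function names, the choice of log base 2, the handling of zero values, and the toy numbers are assumptions of this sketch, not taken from Hansen et al.

```python
import math


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene length per million mapped reads."""
    return 1e9 * read_count / (total_mapped_reads * gene_length_bp)


def log2_fold_change(rpkm_a, rpkm_b):
    """log2(rpkm_b / rpkm_a); returns None if either value is zero.

    Returning None for zeros is a choice of this sketch; analyses often
    add a small pseudocount instead.
    """
    if rpkm_a <= 0 or rpkm_b <= 0:
        return None
    return math.log2(rpkm_b / rpkm_a)


# Toy numbers, invented for illustration only: one gene in two replicates.
gene_length_bp = 2000
rep1 = rpkm(read_count=500, gene_length_bp=gene_length_bp,
            total_mapped_reads=10_000_000)
rep2 = rpkm(read_count=1000, gene_length_bp=gene_length_bp,
            total_mapped_reads=10_000_000)
lfc = log2_fold_change(rep1, rep2)
```

Between biological replicates most log-fold changes should scatter around zero; a systematic trend against a covariate such as GC content indicates a sample-specific bias.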

