
MacManes Evolution2014 trimming talk

Date post: 08-Jul-2015
Upload: matthew-macmanes
Description:
This is a talk I presented at the Evolution meeting in Raleigh, NC in June 2014. It describes the work to date establishing optimal trimming for mRNAseq data.
Transcript
Page 1: MacManes Evolution2014 trimming talk

Optimal Trimming of mRNA sequence data

Matthew MacManes, University of New Hampshire

Twitter: @PeroMHC [email protected]

Page 2: MacManes Evolution2014 trimming talk

Quality trimming of NGS data

• Universal practice

[Figure: probability of nucleotide error (y-axis, 0.0 to 0.4) against nucleotide position (x-axis, 0 to 150)]

Page 3: MacManes Evolution2014 trimming talk

Quality trimming of NGS data

[Figure: probability of nucleotide error (y-axis, 0.0 to 0.4) against nucleotide position (x-axis, 0 to 150), labeled Phred=5]

Page 4: MacManes Evolution2014 trimming talk

Quality trimming of NGS data

[Figure: probability of nucleotide error (y-axis, 0.0 to 0.4) against nucleotide position (x-axis, 0 to 150), labeled Phred=10]

Page 5: MacManes Evolution2014 trimming talk

Quality trimming of NGS data

[Figure: probability of nucleotide error (y-axis, 0.0 to 0.4) against nucleotide position (x-axis, 0 to 150), labeled Phred=20]
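
The Phred labels on these plots relate to the error-probability axis through the standard Phred definition, Q = -10 log10(P). A minimal sketch of that conversion follows; the function name `phred_to_error_prob` is hypothetical and chosen here for illustration, not taken from the talk.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score Q to the probability that the base call is wrong.

    Uses the standard definition Q = -10 * log10(P), so P = 10 ** (-Q / 10).
    """
    return 10 ** (-q / 10)


# Print the error probability for each trimming threshold named in the talk.
for q in (2, 5, 10, 20):
    print(f"Phred={q}: P(error) = {phred_to_error_prob(q):.3f}")
```

Under this definition, Phred=5 corresponds to roughly a 1-in-3 chance of error, Phred=10 to 1 in 10, and Phred=20 to 1 in 100.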

Page 6: MacManes Evolution2014 trimming talk

Trimming Experiment

• Two Illumina datasets, adapter trimmed.

• Subsampled to 10M, 20M, 50M, 75M, 100M PE reads.

• Trimmed at Phred 0, 2, 5, 10, 20

• Assembled using Trinity and SOAPdenovo-Trans

• Developed metrics for evaluating transcriptome assemblies.

MacManes, Frontiers in Genetics 2014
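
To make "trimmed at Phred X" concrete, here is a deliberately simplified sketch of threshold-based trimming from the 3' end of a read. It is an illustration only: the slides do not say which trimming tool or algorithm was used, so the rule below (drop trailing bases whose score is below the threshold), the Phred+33 quality encoding, and the names `trim_3prime` and `decode_phred33` are all assumptions.

```python
def decode_phred33(qual_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(c) - 33 for c in qual_string]


def trim_3prime(seq, quals, threshold):
    """Drop bases from the 3' end while their Phred score is below `threshold`.

    Simplified rule for illustration; real trimmers often use sliding windows.
    """
    end = len(seq)
    while end > 0 and quals[end - 1] < threshold:
        end -= 1
    return seq[:end], quals[:end]


# Hypothetical read: seven high-quality bases, then scores of 20, 10, and 2.
seq = "ACGTACGTAC"
quals = decode_phred33("IIIIIII5+#")

for threshold in (0, 2, 5, 10, 20):
    trimmed, _ = trim_3prime(seq, quals, threshold)
    print(f"Phred={threshold}: kept {len(trimmed)} of {len(seq)} bases")
```

With this rule a threshold of 0 never removes anything, which is why it behaves like the "No Trim" condition, and higher thresholds keep progressively less of the read.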

Page 7: MacManes Evolution2014 trimming talk

Quality trimming reduces error

[Figure: number of nucleotide errors per Mb of assembly (y-axis, 1000 to 1800) at each trimming level (No Trim, Phred=2, Phred=5, Phred=10, Phred=20), for the 10M, 20M, 50M, 75M, and 100M read subsamples]

MacManes, Frontiers in Genetics 2014

Page 8: MacManes Evolution2014 trimming talk

Quality trimming reduces error

[Figure, two panels: number of nucleotide errors per Mb of assembly at each trimming level (No Trim, Phred=2, Phred=5, Phred=10, Phred=20). Panel 1: SOAP10M and SOAP20M (y-axis, 4000 to 7000). Panel 2: 10M, 20M, 50M, 75M, and 100M (y-axis, 1000 to 1800).]

Page 9: MacManes Evolution2014 trimming talk

Quality trimming reduces BLAST hits

[Figure: percent difference in number of unique BLAST hits (y-axis, −5 to 1) at each trimming level (No Trim, Phred=2, Phred=5, Phred=10, Phred=20), for the 10M, 20M, 50M, 75M, and 100M read subsamples]

MacManes, Frontiers in Genetics 2014

Page 10: MacManes Evolution2014 trimming talk

Quality trimming reduces BLAST hits

[Figure, two panels: percent difference in number of unique BLAST hits at each trimming level (No Trim, Phred=2, Phred=5, Phred=10, Phred=20). Panel 1: 10M, 20M, 50M, 75M, and 100M (y-axis, −5 to 1). Panel 2: SOAP10M and SOAP20M (y-axis, −6 to 0).]

Page 11: MacManes Evolution2014 trimming talk

Quality trimming reduces complete CDS

[Figure: percent difference in number of complete CDS (y-axis, −15 to 0) at each trimming level (No Trim, Phred=2, Phred=5, Phred=10, Phred=20), for the 10M, 20M, 50M, 75M, and 100M read subsamples]

MacManes, Frontiers in Genetics 2014

Page 12: MacManes Evolution2014 trimming talk

Quality trimming reduces complete CDS

[Figure, two panels: percent difference in number of complete CDS (y-axis, −15 to 0) at each trimming level (No Trim, Phred=2, Phred=5, Phred=10, Phred=20). Panel 1: 10M, 20M, 50M, 75M, and 100M. Panel 2: SOAP10M and SOAP20M.]

Page 13: MacManes Evolution2014 trimming talk

Summary

• Trimming does reduce assembly error, but at the cost of content & contiguity.

• Proposed guidelines.

1. To maximize assembly content and contiguity ➠ Trim at 0 or 2

2. If concerned about error ➠ Trim at Phred=5

3. Usually probably never trim at Phred ≥ 10

MacManes, Frontiers in Genetics 2014
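
The three guidelines above can be written down as a small decision helper. This is only a restatement of the slide in code, not part of any published tool; the function names `suggest_trim_thresholds` and `is_discouraged` and the boolean parameter are hypothetical.

```python
def suggest_trim_thresholds(concerned_about_error):
    """Return the Phred trimming threshold(s) suggested by the guidelines.

    Guideline 1: to maximize content and contiguity, trim at 0 or 2.
    Guideline 2: if concerned about error, trim at Phred=5.
    """
    if concerned_about_error:
        return (5,)
    return (0, 2)


def is_discouraged(threshold):
    """Guideline 3: trimming at Phred >= 10 is generally not recommended."""
    return threshold >= 10


print(suggest_trim_thresholds(concerned_about_error=False))
print(suggest_trim_thresholds(concerned_about_error=True))
print(is_discouraged(20))
```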

Page 14: MacManes Evolution2014 trimming talk

Questions? @PeroMHC

[email protected]

