
Base Calling Error Toleration in Reference Based Assembly

Hadi Gharibi
Email: [email protected]
Sharif University of Technology
Max Planck Institute for Molecular Genetics
May 2015

Date post: 19-Jan-2017
Upload: hadi-gharibi
How Base Calling Error Can Be Tolerated in Next Generation Sequencing (NGS)

Importance

Challenges

Our Hypothesis

Our Approach

• Deal with Large Amount of Data
• Impact on Sequencing Data Analysis Time and Accuracy

Researchers have developed many base calling algorithms; however, none has resolved the tradeoff between accuracy and time complexity.

• Required Accuracy
• Sequencing Data Analysis Execution Time

Base Calling Error Is Compensated in Downstream Sequencing Steps

• Massive Data
• Diverse Algorithms

Importance: Base Calling Translates Noisy Intensity Data Into Reads

© EMBO Conference, 2014 [1]
© Illumina Inc., 2011 [2]

Figure: Image Processing → Intensity → Base Calling → Read → Assembling → Genome

Challenge: Base Calling Errors Are Always Compared

© C. Ye, 2014 [3]

Figure: Error rate per sequencing cycle for several base callers on the PhiX174 test data. The more accurate callers are slower than the others. [3]

Fundamental Question:

How Much Accuracy Is Required?

Our Approach: Analytical Assumptions and Method

Assumptions
• Random Genome
• Single Variations
• Mismatches << Read Length
• Uniform Substitution Error
• Equally Likely Base Errors

Method
• Variant Calling for Re-sequencing
• Derive Variant Calling Errors

Analytical Results: Base Calling Error Is Tolerated by Mapping Mismatch

Figure: Variant Calling Error Vs. Base Calling Error

• Random Genome
• Mismatches = {2, 5, 7, 9}
• Genome Size ~ 4 Mbp
• Read Length = 30 bp
• Variation Rate = 0.01
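
To give a feel for how a mismatch budget can absorb base calling errors, the following is a minimal sketch, not the analysis behind the figure. It assumes independent substitution errors at a fixed per-base rate and ignores true variants, which would also use up part of the budget. The function name `prob_read_rejected` and the error rate 0.05 are illustrative assumptions.

```python
from math import comb


def prob_read_rejected(read_length: int, max_mismatches: int, error_rate: float) -> float:
    """Probability that a read carries more base calling errors than the
    mapper's mismatch budget, assuming independent errors per base."""
    # Binomial probability of 0..max_mismatches errors in the read.
    p_within_budget = sum(
        comb(read_length, k) * error_rate**k * (1 - error_rate) ** (read_length - k)
        for k in range(max_mismatches + 1)
    )
    return 1.0 - p_within_budget


# Slide parameters: 30 bp reads, mismatch budgets {2, 5, 7, 9}.
# The per-base error rate of 0.05 is an arbitrary illustrative value.
for budget in (2, 5, 7, 9):
    print(budget, prob_read_rejected(30, budget, 0.05))
```

Under these assumptions, a larger mismatch budget lets more erroneous reads still map, which is the intuition the slide title expresses.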

Simulation Method and Setup

Method
• Generate Target Genome
• Simulate Reads [4]
• Add Base Calling Error
• Call Variants
• Calculate Variant Calling Error

Setup
© GemSIM, 2013 [4]
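
The five method steps can be sketched end to end as a toy simulation. This is a simplified illustration under stated assumptions, not the setup used in the talk: it does not use GemSIM or SOAP, the "mapper" simply keeps a read at its true position when it is within the mismatch budget, variants are called by majority vote, and the names `simulate` and `substitute` and all default parameter values are hypothetical.

```python
import random
from collections import Counter

BASES = "ACGT"


def substitute(base, rng):
    """Replace a base by one of the three others, each equally likely."""
    return rng.choice([b for b in BASES if b != base])


def simulate(genome_size=2000, read_length=30, coverage=20,
             variation_rate=0.01, error_rate=0.02, max_mismatches=3, seed=0):
    rng = random.Random(seed)
    # Step 1: random reference plus a target genome with single-base variants.
    reference = [rng.choice(BASES) for _ in range(genome_size)]
    target = [substitute(b, rng) if rng.random() < variation_rate else b
              for b in reference]
    pileup = [Counter() for _ in range(genome_size)]
    n_reads = coverage * genome_size // read_length
    for _ in range(n_reads):
        # Step 2: sample a single-end read from the target genome.
        start = rng.randrange(genome_size - read_length + 1)
        # Step 3: add uniform substitution (base calling) errors.
        read = [substitute(b, rng) if rng.random() < error_rate else b
                for b in target[start:start + read_length]]
        # Toy mapper (assumption): accept the read at its true position
        # only if it differs from the reference in at most max_mismatches bases.
        window = reference[start:start + read_length]
        if sum(r != g for r, g in zip(read, window)) <= max_mismatches:
            for offset, base in enumerate(read):
                pileup[start + offset][base] += 1
    # Step 4: call each position by majority vote; uncovered positions
    # fall back to the reference base.
    wrong = 0
    for pos in range(genome_size):
        call = pileup[pos].most_common(1)[0][0] if pileup[pos] else reference[pos]
        # Step 5: count positions where the call disagrees with the target.
        wrong += call != target[pos]
    return wrong / genome_size


for rate in (0.0, 0.02, 0.05, 0.1):
    print(rate, simulate(error_rate=rate))
```

Because reads are placed at their true positions, this sketch cannot reproduce repeat-region effects; a real mapper would be needed for that.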

Simulation Results: Simulation Verifies Analysis Predictions

• E. coli Genome [5]
• Mismatches = {3, 4, 5}
• Genome Size ~ 4 Mbp
• Read Length = 30 bp
• Variation Rate ~ 0.01
• Single-end Shotgun Run
• Map with SOAP [6]

Figure: Variant Calling Error Vs. Base Calling Error

© NCBI, 2014 [5]
© G. BGI, 2008 [6]

Simulation Results: Random Genome Obviates Repeat Region Effect

• Genome Sizes ~ 4 Mbp
• Mismatches = 3
• Read Length = 30 bp
• Variation Rate ~ 0.01
• Single-end Shotgun Run
• Map with SOAP [6]

Figure: Random Genome Vs. E-Coli Genome

© G. BGI, 2008[6]


Conclusion

Simulation Results
• Confirm the Hypothesis
• Genome Repeat Regions Impair Accuracy

Analytical Results
• Confirm the Hypothesis
• Higher Mismatches May Not Obey

Next Steps

Simulation Steps
• Genome Having More Repeat Regions
• Develop Mapper with Higher Mismatches

Analytical Steps
• Genome Structure
• Paired-end Shotgun Sequencing
• Erasure Base Calling Error
• Other Variant Types

References
[1] EMBO Conference, “Human Evolution in the Genomic Era: Origins, Populations, and Phenotypes,” 2014. [Online]. Available: events.embo.org/14-human-evo
[2] Illumina Inc., “Theory of Operation, HCS 1.4/RTA 1.12,” 2011.
[3] C. Ye, C. Hsiao, and H. Corrada Bravo, “BlindCall: ultra-fast base-calling of high-throughput sequencing data by blind deconvolution,” Bioinformatics, 30(9), 1214–1219, 2014.
[4] GemSIM, “Gemsim,” 2013. [Online]. Available: http://sourceforge.net/projects/gemsim
[5] NCBI, “Escherichia coli O157:H7 str. Sakai DNA, complete genome - Nucleotide - NCBI,” 2014. [Online]. Available: http://www.ncbi.nlm.nih.gov/nuccore/47118301?report=fasta
[6] G. BGI, “SOAP: Short oligonucleotide analysis package,” 2008. [Online]. Available: http://soap.genomics.org.cn
[7] C. Ledergerber and C. Dessimoz, “Base-calling for next-generation sequencing platforms,” Briefings in Bioinformatics, 2011.


Acknowledgement

Thank You for Your Patience, Time and Attention.


Thank You Very Much

