Bioinformatics tools for viral quasispecies reconstruction from next-generation sequencing data and
vaccine optimization
PD: Ion Măndoiu, UConn
Co-PDs: Mazhar Khan, UConn; Rachel O’Neill, UConn; Alex Zelikovsky, GSU
Outline
• Background & aims of the project
• Bioinformatics tools for quasispecies spectrum reconstruction from NGS reads
• Experimental validation on IBV data
• Summary and ongoing work
Infectious Bronchitis Virus (IBV)
• Group 3 coronavirus
• Biggest single cause of economic loss in US poultry farms
– Young chickens: coughing, tracheal rales, dyspnea
– Broiler chickens: reduced growth rate
– Layers: egg production drops 5-50%; eggs are thin-shelled with watery albumen
• Worldwide distribution, with dozens of serotypes in circulation
– Co-infection with multiple serotypes is not uncommon, creating conditions for recombination
[Figures: IBV-infected embryo vs. normal embryo; IBV-infected egg defects]
IBV Vaccination
• Broadly used, most commonly with attenuated live vaccine
• Short-lived protection
• Layers need to be re-vaccinated multiple times during their lifespan
• Vaccines might undergo selection in vivo and regain virulence [Hilt, Jackwood, and McKinley 2008]
RNA Virus Replication
High mutation rate (~10^-4)
Lauring & Andino, PLoS Pathogens 2011
Quasispecies identified by cloning and Sanger sequencing in both IBV-infected poultry and commercial vaccines [Jackwood, Hilt, and Callison 2003; Hilt, Jackwood, and McKinley 2008]
Evolution of IBV
How Are Quasispecies Contributing to Virus Persistence and Evolution?
• Variants differ in
– Virulence
– Ability to escape immune response
– Resistance to antiviral therapies
– Tissue tropism
Lauring & Andino, PLoS Pathogens 2011
Project Aims
• Develop bioinformatics tools for accurate reconstruction of quasispecies sequences and their frequencies from next-generation reads
• Study quasispecies persistence and evolution of IBV in commercial layer flocks following vaccination
• Use results of this study to optimize vaccine development and vaccination protocols
Next Generation Sequencing
http://www.economist.com/node/16349358
• Roche/454 FLX Titanium: 400-600 million bases/run, read length up to 1,000 bp
• Illumina HiSeq 2000: up to 6 billion PE reads/run, 35-100 bp read length
• SOLiD 4/5500: 1.4-2.4 billion PE reads/run, 35-50 bp read length
• Ion Torrent PGM: 1-10M reads/run, read length up to 400 bp
Shotgun vs. Amplicon Reads
• Shotgun reads: starting positions distributed ~uniformly
• Amplicon reads: reads have predefined start/end positions covering fixed overlapping windows
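The difference between the two read types can be illustrated with a small simulation. This is a minimal sketch, not part of ViSpA or VirA: the function names and the genome, read, and overlap lengths are all hypothetical values chosen for illustration.

```python
import random


def shotgun_reads(genome_len, read_len, n_reads, rng):
    """Sample (start, end) intervals whose start positions are uniform over the genome."""
    last_start = genome_len - read_len
    starts = [rng.randrange(last_start + 1) for _ in range(n_reads)]
    return [(s, s + read_len) for s in starts]


def amplicon_windows(genome_len, amplicon_len, overlap):
    """Tile the genome with fixed windows; consecutive windows share at least `overlap` bases.

    Assumes overlap < amplicon_len <= genome_len.
    """
    step = amplicon_len - overlap
    windows = []
    start = 0
    while start + amplicon_len < genome_len:
        windows.append((start, start + amplicon_len))
        start += step
    # The last window is placed flush with the end of the genome.
    windows.append((genome_len - amplicon_len, genome_len))
    return windows


if __name__ == "__main__":
    rng = random.Random(0)
    print(shotgun_reads(1000, 100, 5, rng))   # arbitrary start positions
    print(amplicon_windows(1000, 400, 100))   # fixed, overlapping windows
```

Every amplicon read falls into one of a handful of predefined windows, whereas shotgun reads can start anywhere, which is why the two data types call for different read graphs.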
Reconstruction from Shotgun Reads: ViSpA
Input: shotgun reads
Read Error Correction → Read Alignment → Preprocessing of Aligned Reads → Read Graph Construction → Contig Assembly → Frequency Estimation
Output: quasispecies sequences w/ frequencies
User Specified Parameters: (A) Number of mismatches (B) Mutation rate
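The last stage of the flow assigns a frequency to each assembled sequence. One common way to make such an estimate is expectation-maximization over read-to-sequence compatibility. The sketch below is a generic illustration under that assumption, not ViSpA's actual code; `em_frequencies` and the `compat` input format are hypothetical.

```python
def em_frequencies(compat, n_candidates, n_iter=200):
    """Estimate candidate frequencies by expectation-maximization.

    compat[r] lists the indices of the candidate sequences that read r is
    consistent with (hypothetical input format). A read compatible with
    several candidates is split among them in proportion to the current
    frequency estimates.
    """
    freq = [1.0 / n_candidates] * n_candidates
    for _ in range(n_iter):
        counts = [0.0] * n_candidates
        for cands in compat:
            total = sum(freq[c] for c in cands)
            if total == 0:
                continue  # read matches no candidate with nonzero frequency
            for c in cands:
                counts[c] += freq[c] / total  # E-step: fractional assignment
        n = sum(counts)
        freq = [x / n for x in counts]        # M-step: renormalize
    return freq


if __name__ == "__main__":
    # 6 reads unique to candidate 0, 2 unique to candidate 1, 2 ambiguous.
    reads = [[0]] * 6 + [[1]] * 2 + [[0, 1]] * 2
    print(em_frequencies(reads, 2))
```

In this toy input the ambiguous reads end up split 3:1, matching the ratio of uniquely assigned reads.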
Reconstruction from Amplicon Reads: VirA
Inputs: reference in FASTA format; error-corrected read data in SAM/BAM format
Estimate Amplicons → Amplicon Read Graph → Max-Bandwidth Paths → Frequency Estimation
Output: viral population variants with frequencies
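A max-bandwidth (widest) path is the source-to-sink path whose weakest edge is as strong as possible. The sketch below shows the general technique with a Dijkstra-style search. It assumes a directed graph given as an adjacency dictionary with positive edge capacities; how VirA actually defines vertices and capacities in the amplicon read graph is not stated here, so the toy graph and all names are hypothetical.

```python
import heapq


def max_bandwidth_path(graph, source, sink):
    """Return (bottleneck, path) for the widest path from source to sink.

    graph maps each node to a list of (neighbor, capacity) pairs.
    Returns (0, []) if the sink is unreachable.
    """
    best = {source: float("inf")}   # best known bottleneck per node
    parent = {}
    heap = [(-float("inf"), source)]  # max-heap via negated bandwidth
    while heap:
        neg_bw, node = heapq.heappop(heap)
        bw = -neg_bw
        if bw < best.get(node, 0):
            continue  # stale heap entry
        if node == sink:
            break
        for nxt, cap in graph.get(node, []):
            cand = min(bw, cap)  # bottleneck of the path extended by this edge
            if cand > best.get(nxt, 0):
                best[nxt] = cand
                parent[nxt] = node
                heapq.heappush(heap, (-cand, nxt))
    if sink not in best:
        return 0, []
    path = [sink]
    while path[-1] != source:
        path.append(parent[path[-1]])
    return best[sink], path[::-1]


if __name__ == "__main__":
    toy = {
        "s": [("a", 10), ("b", 4)],
        "a": [("t", 3), ("b", 8)],
        "b": [("t", 7)],
    }
    print(max_bandwidth_path(toy, "s", "t"))
```

In the toy graph the direct routes through `a` or `b` are limited to 3 and 4, while the detour `s → a → b → t` has bottleneck 7.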
Amplicon Sequencing Challenges
• Multiple reads from consecutive amplicons may match over their overlap
• Distinct quasispecies may be indistinguishable in an amplicon interval
IBV Genome
Rev. Bras. Cienc. Avic. vol.12 no.2 Campinas Apr./June 2010
RT-PCR of S1 using redesigned primers
Experiment 1
• 10-clone pool (C1 20%, C2 20%, C3 15%, C4 15%, C5 10%, C6 10%, C7 4%, C8 4%, C9 1%, C10 1%) → 454 reads → assembled quasispecies PV1, PV2, PV3, …, PVk
• M42 sample → 454 reads → assembled quasispecies V1, V2, V3, …, Vn
• M42 sample → 53 plasmid clones
Evaluated Reconstruction Flows
Reads Statistics & Coverage
Number of reads per sample:
Sample            Uncorrected   SAET corrected   ShoRAH corrected   KEC corrected
M42 isolate       53062         53062            50858              48945
M42 clone pool    21040         21040            19439              17122
Reads Validation
• How well we predicted the Sanger clones
• How good our predictions are
• Average prediction error
Neighbor-Joining Tree for M42 Sanger Clones & ViSpA Quasispecies
Experiment 2
Reads Statistics & Coverage
Number of reads per sample:
Sample         Uncorrected   SAET corrected   ShoRAH corrected   KEC corrected
M41 Vaccine    92113         92113            87883              85311
Field #1       38502         38502            33685              32521
Field #2       132513        132513           123370             111686
Field #3       76906         76906            71408              64507
Field #4       44467         44467            41653              37295
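The read counts above are easier to compare as the fraction of reads each error corrector retains. The following sketch computes that fraction from the Experiment 2 numbers; the `retention` helper and the dictionary layout are illustrative choices, not part of any of the tools.

```python
# Read counts copied from the Experiment 2 table.
COUNTS = {
    "M41 Vaccine": {"Uncorrected": 92113, "SAET": 92113, "ShoRAH": 87883, "KEC": 85311},
    "Field #1": {"Uncorrected": 38502, "SAET": 38502, "ShoRAH": 33685, "KEC": 32521},
    "Field #2": {"Uncorrected": 132513, "SAET": 132513, "ShoRAH": 123370, "KEC": 111686},
    "Field #3": {"Uncorrected": 76906, "SAET": 76906, "ShoRAH": 71408, "KEC": 64507},
    "Field #4": {"Uncorrected": 44467, "SAET": 44467, "ShoRAH": 41653, "KEC": 37295},
}


def retention(counts):
    """Return, per sample, the fraction of uncorrected reads kept by each corrector."""
    out = {}
    for sample, row in counts.items():
        base = row["Uncorrected"]
        out[sample] = {
            tool: n / base for tool, n in row.items() if tool != "Uncorrected"
        }
    return out


if __name__ == "__main__":
    for sample, row in retention(COUNTS).items():
        print(sample, {tool: f"{frac:.1%}" for tool, frac in row.items()})
```

In every sample SAET keeps all reads, while ShoRAH and KEC discard progressively more.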
Neighbor-Joining Tree for Sanger clones and ViSpA Reconstructed Sequences
Summary
• Developed software tools for quasispecies reconstruction from both shotgun and amplicon next-generation reads
– Code and executables freely available at http://alla.cs.gsu.edu/~software/VISPA/vispa.html and http://alan.cs.gsu.edu/vira/
– ViSpA plugin developed for users of Ion Torrent, available on the Ion Community
• Experimental results on both simulated and real data show improved accuracy tradeoffs compared to previous methods
• Tools are applicable to quasispecies studies of other viruses
Ongoing Work
• Deployment of ViSpA and VirA on Galaxy servers maintained at UConn and GSU
• Tool validation on Ion Torrent reads
• Comparison of shotgun and amplicon based reconstruction methods
• Combining long and short read technologies
• Quasispecies persistence studies using longitudinal sampling
Tool Validation for Ion Torrent reads
• Shotgun IBV reads generated using 316 Ion chip
– 2,384,007 reads (1,177,740 after SAET correction)
– mean length 203.58 bp
• ViSpA results
– 23 quasispecies with estimated frequency > 0.5%, 2,200 total
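A rough sense of sequencing depth for this run follows from the usual estimate: mean coverage ≈ (number of reads × mean read length) / genome length. The sketch below applies it to the numbers above. The genome length of about 27,600 bases is an assumption (an approximate IBV genome size that is not given on the slides), and the slides do not say whether the mean length refers to reads before or after correction, so the results are order-of-magnitude estimates only.

```python
def mean_coverage(n_reads, mean_read_len, genome_len):
    """Average number of reads covering each genome position."""
    return n_reads * mean_read_len / genome_len


# Assumption: approximate IBV genome length in bases (not from the slides).
ASSUMED_GENOME_LEN = 27_600

raw = mean_coverage(2_384_007, 203.58, ASSUMED_GENOME_LEN)
corrected = mean_coverage(1_177_740, 203.58, ASSUMED_GENOME_LEN)

if __name__ == "__main__":
    print(f"before correction: ~{raw:,.0f}x")
    print(f"after SAET correction: ~{corrected:,.0f}x")
```

Under these assumptions the depth stays in the thousands even after roughly half the reads are discarded.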
Longitudinal Sampling
Amplicon / shotgun sequencing
Contributors
University of Connecticut: Rachel O’Neill, Ph.D.; Mazhar Khan, Ph.D.; Hongjun Wang, Ph.D.; Craig Obergfell; Andrew Bligh
Bassam Tork; Ekaterina Nenastyeva; Alex Artyomenko; Serghei Mangul; Nicholas Mancuso; Alexander Zelikovsky
University of Maryland: Irina Astrovskaya, Ph.D.