High Throughput Plasmid Sequencing with Illumina and CLC bio
Ajay Athavale Monsanto Company 05 June 2012
6/4/2012 1 Monsanto Company Confidential
Monsanto’s Transgenic Plant Pipeline: Sequencing Thousands of Plasmids per Year
Hypothesis → Nomination → Cloning → Sequencing → Plant Transformation → Seed Generation → Efficacy Testing
Increasing Clone Complexity Required New Solutions for Plasmid Sequencing & Analysis
• Next-generation sequencing (NGS) offers significant savings and reduced processing time
• Illumina was shown to be the optimal NGS platform for plasmid sequencing
• Scalable and accurate finishing tools were needed for Illumina sequence reads
[Figure: Clone Complexity & Sequencing Costs. Percentage of clones by complexity (Simple, Moderate, Complex) and total cost per year, 2007 to 2011, with 2012 shown separately for Sanger (2012-Sgr) and Illumina (2012-Ilmn). Y-axis: % of clones.]
Requirements for HTP Plasmid Finishing Platform
Performance, Accuracy & Scalability
• Support for structurally challenging plasmids
• Support for multiple sequencing platforms
• Hybrid assemblies (across platforms)
• User-configurable, flexible parameters for:
   • Read trimming
   • Mutation detection (SNPs)
   • Insertion & deletion detection (DIPs)
Usability & User Interface
• Easily edit/manipulate assemblies
• Execution from both UNIX and GUI environments
• User-friendly visualization tools
• Integration with in-house corporate research data systems
• Customization of finishing workflows
CLC bio was chosen as the next-gen finishing tool based on these criteria
Demonstrated identical finishing results for structurally complex plasmids using Illumina and CLC bio
Observed sequences from Sanger + CONSED were compared with those from Illumina + CLC bio for each plasmid:
Plasmid | Known complex sequences
Plasmid A | None
Plasmid B | Mixed population
Plasmid C | Homopolymer tracts
Plasmid D | Homopolymer tracts, tandem repeats
Plasmid E | Homopolymer tracts, inverted repeats
Plasmid F | Homopolymer tracts, inverted repeats, tandem repeats
Plasmid G | Large insertion
Plasmid H | Large deletion
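Homopolymer tracts appear among the known complex sequences in most of the test plasmids. As a hedged illustration that is not part of the workflow described in this deck, a short Python helper can flag such runs in a reference sequence so reviewers know where to look closely; the function name `find_homopolymers` and the default minimum run length of 8 are illustrative assumptions.

```python
import re


# Hypothetical helper: flags runs of a single base (A, C, G or T) that are at
# least `min_len` long. The default threshold of 8 is an illustrative assumption.
def find_homopolymers(seq: str, min_len: int = 8) -> list[tuple[int, int, str]]:
    """Return (start, end, base) for each homopolymer run; 0-based, end exclusive."""
    runs = []
    # ([ACGT])\1* matches one base followed by any number of repeats of it;
    # characters outside ACGT (e.g. N) are simply never matched.
    for m in re.finditer(r"([ACGT])\1*", seq.upper()):
        if m.end() - m.start() >= min_len:
            runs.append((m.start(), m.end(), m.group(1)))
    return runs


if __name__ == "__main__":
    print(find_homopolymers("ACGT" + "A" * 10 + "GC"))  # [(4, 14, 'A')]
```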
Overview of the Current Sequencing & Finishing Process with Illumina and CLC bio
• Colony Propagation: Glycerol receipt
• DNA Prep: Plasmid DNA isolation → Clone re-array & DNA normalization
• Library Prep & Sequencing: Nextera library prep → Library quantification & pooling → MiSeq sequencing
• Assembly & Analysis: Assembly & analysis in CLC bio → Review sequence → Advance perfect clones
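The pooling step combines many Nextera libraries on one MiSeq run. The deck does not state how libraries were normalized, so the sketch below is an assumption: it shows the standard arithmetic for equimolar pooling, converting each library's concentration from ng/µL to nM with the common ~660 g/mol per base-pair approximation, then computing the volume that supplies the same number of femtomoles. The function names and example library names are hypothetical.

```python
# Sketch of equimolar pooling arithmetic (an assumption, not the documented
# protocol). Uses the common ~660 g/mol per base-pair approximation for dsDNA.
AVG_BP_MASS = 660.0  # g/mol per base pair (approximation)


def ng_per_ul_to_nm(conc_ng_ul: float, avg_fragment_bp: float) -> float:
    """Convert a dsDNA library concentration from ng/uL to nM."""
    # ng/uL equals mg/L; dividing by molar mass (g/mol) gives mmol/L,
    # and multiplying by 1e6 converts mmol/L to nmol/L.
    return conc_ng_ul / (AVG_BP_MASS * avg_fragment_bp) * 1e6


def equimolar_volumes(libraries: dict[str, tuple[float, float]],
                      fmol_each: float) -> dict[str, float]:
    """Volume (uL) of each library that contributes `fmol_each` femtomoles.

    `libraries` maps a hypothetical library name to (ng/uL, average fragment bp).
    Since 1 nM == 1 fmol/uL, volume = fmol / nM.
    """
    return {name: fmol_each / ng_per_ul_to_nm(conc, size)
            for name, (conc, size) in libraries.items()}


if __name__ == "__main__":
    vols = equimolar_volumes({"construct_001": (3.3, 500),
                              "construct_002": (6.6, 500)}, fmol_each=20.0)
    print({k: round(v, 2) for k, v in vols.items()})
    # {'construct_001': 2.0, 'construct_002': 1.0}
```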
Automated Assembly & Analysis Workflow in CLC bio streamlines processing
• The CLC bio Genomics Server and command-line tools enable efficient parallel processing of large batches of constructs
• For non-UNIX users, the same tools can easily be run from the GUI (CLC Genomics Workbench)
Assembly & Analysis Workflow: Prepare config file → Import Illumina data → Quality trimming → Read mapping → Coverage analysis → SNP detection → DIP detection → Review assemblies
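Quality trimming removes low-quality bases, typically at read ends, before mapping. The deck does not specify the trimming algorithm or its settings, so the sketch below is a generic illustration of Mott-style trimming, assuming Phred+33 quality strings as produced by current Illumina instruments: each base scores `limit` minus its error probability, and the highest-scoring contiguous segment is kept. The `limit` of 0.05 and all function names are illustrative assumptions, not CLC bio internals.

```python
# Sketch of Mott-style quality trimming (generic illustration; not a
# description of CLC bio's implementation).

def phred33_to_quals(qual: str) -> list[int]:
    """Decode a Phred+33 FASTQ quality string."""
    return [ord(c) - 33 for c in qual]


def mott_trim(quals: list[int], limit: float = 0.05) -> tuple[int, int]:
    """Return (start, end) of the best-scoring segment, end exclusive; (0, 0) if none."""
    best_start = best_end = 0
    best_score = run_score = 0.0
    run_start = 0
    for i, q in enumerate(quals):
        run_score += limit - 10 ** (-q / 10)  # limit minus error probability
        if run_score <= 0:
            # The segment so far is not worth keeping; restart after this base.
            run_score = 0.0
            run_start = i + 1
        elif run_score > best_score:
            best_score = run_score
            best_start, best_end = run_start, i + 1
    return best_start, best_end


def trim_read(seq: str, qual: str, limit: float = 0.05) -> str:
    """Keep only the highest-scoring stretch of the read."""
    start, end = mott_trim(phred33_to_quals(qual), limit)
    return seq[start:end]


if __name__ == "__main__":
    # In Phred+33, '#' encodes Q2 and '?' encodes Q30.
    print(trim_read("AAGGGTT", "##???##"))  # GGG
```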
Identification of low-coverage regions with the “Contig Analysis” tool
Easy identification of low-coverage areas
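The Contig Analysis tool's internals are not described here; the hedged sketch below only illustrates the idea of flagging low-coverage areas. It assumes per-base read depth is already available as a list and reports runs below a threshold; `low_coverage_regions`, `min_depth` and `min_len` are hypothetical names. Positions are treated as linear, so a low-coverage region spanning the origin of a circular plasmid would be reported as two pieces.

```python
# Hypothetical helper: report runs of positions whose read depth falls below
# a threshold. Treats the sequence as linear (no wrap-around at the origin).

def low_coverage_regions(depth: list[int], min_depth: int,
                         min_len: int = 1) -> list[tuple[int, int]]:
    """Return (start, end) runs where depth < min_depth; 0-based, end exclusive."""
    regions = []
    start = None
    for i, d in enumerate(depth):
        if d < min_depth:
            if start is None:
                start = i  # entering a low-coverage run
        elif start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None  # leaving the run
    if start is not None and len(depth) - start >= min_len:
        regions.append((start, len(depth)))  # run reaches the end of the sequence
    return regions


if __name__ == "__main__":
    depth = [50, 50, 3, 2, 50, 1, 1, 1]
    print(low_coverage_regions(depth, min_depth=10))             # [(2, 4), (5, 8)]
    print(low_coverage_regions(depth, min_depth=10, min_len=3))  # [(5, 8)]
```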
The “Primer Creator” tool helps streamline gap closure
• Design primers around annotated regions
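Primer Creator's parameters are not described in this deck. As a hedged sketch of the underlying idea only, the code below scans windows on either side of a gap for 20-mers whose GC content and approximate melting temperature fall in a target range. The function names, window size and thresholds are illustrative assumptions, and the Tm estimate uses the simple GC-based formula Tm = 64.9 + 41(G+C - 16.4)/N rather than a nearest-neighbor model, so it is only a rough guide.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    return (seq.count("G") + seq.count("C")) / len(seq)


def basic_tm(primer: str) -> float:
    """Rough Tm (deg C) from the basic GC formula; not a nearest-neighbor model."""
    gc = primer.count("G") + primer.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(primer)


def flank_primers(template, gap_start, gap_end, length=20, window=150,
                  tm_range=(50.0, 65.0), gc_range=(0.4, 0.6)):
    """Hypothetical helper: candidate primers reading into the gap [gap_start, gap_end).

    Forward primers come from the upstream flank; reverse primers come from the
    downstream flank and are returned reverse-complemented, with their start.
    """
    def acceptable(p):
        return (tm_range[0] <= basic_tm(p) <= tm_range[1]
                and gc_range[0] <= gc_fraction(p) <= gc_range[1])

    forward = []
    for s in range(max(0, gap_start - window), gap_start - length + 1):
        p = template[s:s + length]
        if acceptable(p):
            forward.append((s, p))
    reverse = []
    for s in range(gap_end, min(len(template), gap_end + window) - length + 1):
        p = template[s:s + length]
        if acceptable(p):
            reverse.append((s, revcomp(p)))
    return forward, reverse


if __name__ == "__main__":
    good = "GCGCGCGCGCATATATATAT"  # 50% GC, basic Tm about 51.8 C
    template = "A" * 50 + good + "N" * 30 + good + "A" * 50
    fwd, rev = flank_primers(template, gap_start=70, gap_end=100)
    print(fwd[-1], rev[0])
```

Scoring both flanks independently keeps the sketch simple; a real design step would also check primer pairs for dimers and secondary structure.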
Simplified variant detection with the “SNP Detection” tool
Workflow-specific optimization of SNP-calling parameters is critical for accuracy!
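SNP calling comes down to thresholding the base calls that pile up at each reference position. The sketch below is a hedged, simplified illustration of why parameter choices matter; the parameter names and defaults (`min_cov`, `min_qual`, `min_freq`) are hypothetical and are not CLC bio's settings. For a clonal plasmid, a true change should be supported by nearly all reads, so an intermediate frequency such as 0.6 may instead indicate a mixed population like Plasmid B; a threshold tuned for one workflow can hide or expose that signal.

```python
from collections import Counter


# Illustrative pileup-based SNP call at one reference position. Parameter
# names and defaults are hypothetical stand-ins for tunable settings.
def call_snp(ref_base: str, calls: list[tuple[str, int]],
             min_cov: int = 10, min_qual: int = 20, min_freq: float = 0.35):
    """Return (alt_base, frequency) if a variant passes all filters, else None.

    `calls` holds (base, Phred quality) for every read covering the position.
    """
    bases = [b.upper() for b, q in calls if q >= min_qual]  # drop low-quality calls
    if len(bases) < min_cov:
        return None  # too little coverage to call anything
    counts = Counter(bases)
    alts = [(b, n) for b, n in counts.items() if b != ref_base.upper()]
    if not alts:
        return None
    alt, n = max(alts, key=lambda bn: bn[1])  # most frequent non-reference base
    freq = n / len(bases)
    return (alt, freq) if freq >= min_freq else None


if __name__ == "__main__":
    pileup = [("A", 30)] * 8 + [("G", 30)] * 12
    print(call_snp("A", pileup))  # ('G', 0.6)
```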
“DIP Detection” for Deletions & Insertions
Manual review is still required, as the algorithm does not always detect large insertions and deletions, which can lead to misassembly.
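One common reason mapping-based indel calling misses large insertions and deletions is that a read spanning a large event often cannot be aligned end to end, so the aligner soft-clips the unaligned part instead of reporting an I or D operation. The hedged sketch below assumes SAM-style alignments given as (0-based start, CIGAR string) pairs; it reports small indels directly from the CIGAR and flags reference positions where many reads are soft-clipped, as candidates for manual review. The function names and thresholds are hypothetical, and this is not how CLC bio's DIP Detection works internally.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_events(ref_start: int, cigar: str) -> list[tuple[str, int, int]]:
    """List (kind, ref_pos, length) for insertions (I), deletions (D) and soft clips (S).

    `ref_start` is the 0-based reference position of the first aligned base
    (SAM's POS field is 1-based, so subtract 1 when reading real files).
    """
    events = []
    pos = ref_start
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":
            pos += n  # consumes reference and read
        elif op in "DN":
            if op == "D":
                events.append(("D", pos, n))
            pos += n  # consumes reference only
        elif op in "IS":
            events.append((op, pos, n))  # consumes read only
        # H and P consume neither reference nor read
    return events


def clip_hotspots(alignments, min_clip: int = 10, min_reads: int = 3) -> list[int]:
    """Hypothetical helper: positions where >= min_reads reads have a long soft clip."""
    counts = Counter()
    for start, cigar in alignments:
        for kind, pos, n in cigar_events(start, cigar):
            if kind == "S" and n >= min_clip:
                counts[pos] += 1
    return sorted(p for p, c in counts.items() if c >= min_reads)


if __name__ == "__main__":
    print(cigar_events(100, "10M2I5M3D10M"))  # [('I', 110, 2), ('D', 115, 3)]
    reads = [(100, "20S30M"), (70, "30M20S"), (100, "15S35M")]
    print(clip_hotspots(reads))  # [100]
```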
The de novo assembly tool helps resolve large insertions or deletions!
De novo assembly used to resolve indels
• Challenging plasmids can typically be resolved using the de novo assembly workflow
• De novo assembly is easier than manually tearing and joining contigs, and typically just as effective
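The deck does not describe how CLC bio's de novo assembler works. To illustrate why assembling reads without a reference can recover sequence that reference mapping misses, the hedged sketch below implements a toy greedy overlap assembler: it repeatedly merges the pair of sequences with the longest suffix-prefix overlap. Real short-read assemblers (often de Bruijn graph based) are far more robust to sequencing errors and repeats; the names and the `min_overlap` default are illustrative assumptions, and a circular plasmid would come out as a linear contig whose ends overlap.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that is a prefix of `b` (>= min_len), else 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of the overlap in `a`
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Toy greedy assembler: merge the best-overlapping pair until none overlap."""
    contigs = list(reads)
    while True:
        contigs = list(dict.fromkeys(contigs))  # drop exact duplicates, keep order
        # Drop sequences fully contained in another one.
        contigs = [c for c in contigs if not any(c != o and c in o for o in contigs)]
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_overlap)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    reads = ["GATTACAC", "ACACCGTT", "CGTTAGC"]
    print(greedy_assemble(reads))  # ['GATTACACCGTTAGC']
```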
CLC bio Meets Requirements for Monsanto HTP Plasmid Finishing
Performance, Accuracy & Scalability
• Support for structurally challenging plasmids
• Support for multiple sequencing platforms
• Hybrid assemblies (across platforms)
• User-configurable, flexible parameters for:
   • Read trimming
   • Mutation detection (SNPs)
   • Insertion & deletion detection (DIPs)
Usability & User Interface
• Easily edit/manipulate assemblies
• Execution from both UNIX and GUI environments
• User-friendly visualization tools
• Integration with in-house corporate research data systems
• Customization of finishing workflows
Additional CLC bio capabilities are also used across Monsanto
• In silico cloning & plasmid design
• Sanger sequence review/assembly
• Plasmid sequencing & finishing via Sanger & Illumina
• Genome assembly for microbes and plants
• DNA and Protein alignment for research teams
• … and much more!
Acknowledgements
• CLC bio – Jannick Bendtsen – Henrik Sandmann – Joe Salvatore
• Monsanto – Todd Michael – Dan Ader – Amber Ford – Susan Johnson – Cynthia LaBanca – Kim Lawry – Jing Lu – Karen Martin – Tim Mitsky – Stacie Norton