KUVEMPU UNIVERSITY
DEPARTMENT OF POST GRADUATE STUDIES AND RESEARCH IN
BIOTECHNOLOGY AND BIOINFORMATICS
Submitted by: Hithesh Kumar C.K
2nd Year M.Sc. – Biotechnology
Under the guidance of: Dr. Y.L. Ramachandra M.Sc., Ph.D.,
Professor, Dept. of Biotechnology and Bioinformatics,
Kuvempu University
OUTLINE
● SEQUENCING.
● HISTORY OF SEQUENCING.
● INTRODUCTION.
● PRINCIPLE.
● PROTOCOL OF NEXT GENERATION SEQUENCING.
● SEQUENCING EQUIPMENT.
● APPLICATIONS.
● LIMITATIONS.
● CONCLUSION.
SEQUENCING
•DNA sequencing is the process of determining the precise order of nucleotides within a DNA molecule.
…ACGTGACTGAGGACCGTGCGACTGAGACTGACTGGGTCTAGCTAGACTACGTTTTATATATATATACGTCGTCGTACTGATGACTAGATTACAGACTGATTTAGATACCTGACTGATTTTAAAAAAATATT…
HISTORY
"I started from the conviction that, if different DNA species exhibited different biological activities, there should also exist chemically demonstrable differences between deoxyribonucleic acids." Erwin Chargaff
Frederick Sanger
• 1953 : Discovery of DNA structure by Watson and Crick
• 1973 : First sequence of 24 bp published
• 1977 : Maxam-Gilbert and Sanger sequencing method published
• 1980 : Nobel Prize : Wally Gilbert and Fred Sanger
• 1982 : GenBank started
• 1983 : Development of PCR
• 1987 : 1st automated sequencer : Applied Biosystems Prism 373
• 1990 : The first NGS technologies started at Lynx Therapeutics
• 1996 : Capillary sequencer : ABI 310
• 1998 : Genome of Caenorhabditis elegans sequenced
Watson and Crick
• 2003 : Human genome sequenced
• 2005 : 1st 454 Life Sciences Next Generation Sequencing system : GS 20 System
• 2006 : 1st Solexa Next Generation Sequencer : Genome Analyzer
• 2007 : 1st Applied Biosystems Next Generation Sequencer : SOLiD
• 2009 : 1st Helicos single molecule sequencer : Helicos Genetic Analyser System
• 2011 : 1st Ion Torrent Next Generation Sequencer : PGM
• 2011 : 1st Pacific Biosciences single molecule sequencer : PacBio RS Systems
• 2012 : Oxford Nanopore Technologies demonstrates ultra long single molecule reads
WHAT IS NGS
•High-throughput sequencing
•Lower cost
•Less time
•Parallel sequencing process
•Sequences thousands to millions of DNA fragments at once
PRINCIPLES
•Fragmentation and tagging of genomic/cDNA fragments: the adapter tags provide universal priming sites, allowing complex genomes to be amplified with common PCR primers.
•Template immobilization: DNA is separated into single strands and captured onto beads (one DNA molecule per bead).
•Clonal amplification: solid-phase amplification.
•Sequencing and imaging: cyclic reversible termination (CRT) reaction. A fluorescently labelled reversible terminator is imaged as each dNTP is added, and then cleaved to allow incorporation of the next base.
•NGS was introduced by 454 Life Sciences, based on a sequencing-by-synthesis technique called pyrosequencing.
•Library preparation:
  • Fragment the genomic DNA.
  • Repair and phosphorylate the ends of the DNA strands.
  • Tailing.
  • Ligate index adapters.
  • Denature and amplify to obtain the final product.
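The fragmentation and adapter-ligation steps above can be pictured with a short Python sketch. This is only a toy model: the adapter sequences, the fragment-size range, and the function names are hypothetical placeholders and do not correspond to any real kit or protocol.

```python
import random

# Hypothetical adapter sequences, for illustration only (not from a real kit).
ADAPTER_5 = "AAGGTTCC"
ADAPTER_3 = "GGAACCTT"

def fragment(genome, min_len=20, max_len=40, seed=0):
    """Cut a DNA string into consecutive fragments of random length."""
    rng = random.Random(seed)
    fragments, pos = [], 0
    while pos < len(genome):
        size = rng.randint(min_len, max_len)
        fragments.append(genome[pos:pos + size])  # last piece may be shorter
        pos += size
    return fragments

def ligate_adapters(fragments):
    """Attach the same adapter pair to both ends of every fragment."""
    return [ADAPTER_5 + f + ADAPTER_3 for f in fragments]

genome = ("ACGTGACTGAGGACCGTGCGACTGAGACTGACTGGGTCTAGCTAGACTACG"
          "TTTTATATATATATACGTCGTCGTACTGATGACTAG")
library = ligate_adapters(fragment(genome))
for molecule in library:
    print(molecule)
```

Because every library molecule now carries the same end sequences, a single primer pair matching the adapters would be enough to amplify all of them, which is the point of the tagging step.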
SEQUENCING BY SYNTHESIS (Illumina Genome Analyzer or HiSeq)
•Flow Cell: several samples can be loaded onto the eight-lane flow cell for simultaneous analysis on the sequencing system.
•Preparation of Genomic DNA Sample: Randomly fragment genomic DNA and ligate adapters to both ends of the fragments.
•Attach DNA to surface: Bind single-stranded fragments randomly to the inside surface of the flow cell channels.
•Bridge Amplification: Add unlabelled nucleotides and enzyme to initiate solid-phase bridge amplification.
•Fragments Become Double Stranded: The enzyme incorporates nucleotides to build double-stranded bridges on the solid-phase substrate.
•Denaturing the Double-stranded molecules: denaturing leaves single-stranded templates anchored to the substrate.
•Complete Amplification: Several million dense clusters of double-stranded DNA are generated in each channel of the flow cell.
•Determine First Base: The first sequencing cycle begins by adding four labelled reversible terminators, primers, and DNA polymerase.
•Image First Base: After laser excitation, the emitted fluorescence from each cluster is captured and the first base is identified.
•Determine Second Base: The next cycle repeats the incorporation of four labelled reversible terminators, primers, and DNA polymerase.
•Image Second Chemistry Cycle: After laser excitation, the image is captured as before, and the identity of the second base is recorded.
•Sequencing Over Multiple Cycles: The sequencing cycles are repeated to determine the sequence of bases in a fragment, one base at a time.
•Align Data: The data are aligned and compared to a reference, and sequencing differences are identified.
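The cycle-by-cycle base determination and the final comparison to a reference can be sketched in Python as below. This is a simplified model, not the instrument's software: it assumes an error-free read, a template written in the 3'→5' direction so that the read comes out 5'→3', and a read position on the reference that is already known (real aligners must search for it). All names are illustrative.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def sequence_by_synthesis(template, cycles):
    """Read one base per cycle from a single cluster.

    In each cycle the reversible terminator that pairs with the next
    template base is incorporated and "imaged"; unblocking then lets the
    next cycle proceed.
    """
    read = []
    for cycle in range(min(cycles, len(template))):
        incorporated = COMPLEMENT[template[cycle]]  # base added this cycle
        read.append(incorporated)                   # recorded from the image
    return "".join(read)

def find_differences(read, reference, offset=0):
    """List (position, reference base, read base) mismatches, without gaps."""
    window = reference[offset:offset + len(read)]
    return [(offset + i, ref_base, read_base)
            for i, (ref_base, read_base) in enumerate(zip(window, read))
            if ref_base != read_base]

reference = "ACGTGACTGAGG"
sample = "ACGTGTCTGAGG"  # hypothetical sample with one A>T change
template = "".join(COMPLEMENT[b] for b in sample)  # strand on the flow cell

read = sequence_by_synthesis(template, cycles=12)
print(read)                               # ACGTGTCTGAGG
print(find_differences(read, reference))  # [(5, 'A', 'T')]
```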
Pyrosequencing output
Runs of bases produce higher peaks – for instance, the sequence for (a) is GGCCCTTG. Sample (c) comes from a heterozygous individual (hence the heights in multiples of ½).
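How peak heights translate into a sequence can be illustrated with a small Python sketch. The flow order "GCTA" and the signal values are assumptions chosen so that the result matches the GGCCCTTG example; real instruments use their own dispensation order and signal calibration. The half-height peaks of a heterozygous sample are not handled here.

```python
from itertools import cycle

def decode_flowgram(signals, flow_order="GCTA"):
    """Turn per-flow peak heights into a base sequence.

    Each peak height is rounded to the nearest whole number, which is
    taken as the length of the run of that nucleotide.
    """
    bases = []
    for nucleotide, signal in zip(cycle(flow_order), signals):
        run_length = round(signal)
        bases.append(nucleotide * run_length)
    return "".join(bases)

# Hypothetical peak heights for flows G, C, T, A, G
print(decode_flowgram([2.1, 2.9, 2.0, 0.1, 1.0]))  # GGCCCTTG
```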
SEMICONDUCTOR SEQUENCING (Ion Torrent)
Workflow: Library preparation → Emulsion PCR → Semiconductor sequencing
•Ion Torrent introduced a semiconductor-based detection system.
•This method of sequencing is based on the detection of hydrogen ions that are released during the polymerisation of DNA.
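The link between base incorporation and hydrogen-ion release can be sketched as follows. This is a toy model under stated assumptions: one hydrogen ion is counted per incorporated base, the flow order "TACG" is a hypothetical choice, and the input is the strand being synthesised rather than its template. It does not model real signal noise.

```python
from itertools import cycle, islice

def simulate_flows(new_strand, flow_order="TACG", max_flows=100):
    """Return (nucleotide, ions released) for each nucleotide flow.

    A flow releases one ion per base incorporated, so a run of identical
    bases gives a proportionally larger signal and a mismatched flow
    gives zero.
    """
    signals, pos = [], 0
    for nucleotide in islice(cycle(flow_order), max_flows):
        if pos >= len(new_strand):
            break  # strand fully synthesised
        released = 0
        while pos < len(new_strand) and new_strand[pos] == nucleotide:
            released += 1
            pos += 1
        signals.append((nucleotide, released))
    return signals

for nucleotide, ions in simulate_flows("TTAGC"):
    print(nucleotide, ions)
```

For the strand TTAGC the first flow (T) releases two ions, which is why homopolymer runs appear as stronger signals in this kind of detection.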
Computing requirements for NGS
• 44-node dedicated cluster
• 128 GB RAM, 24-processor server for next-gen sequence assembly
• Currently ~125 TB of redundant storage
• WVU HPC cluster: >1000 nodes with up to 512 GB RAM
Sanger (old-gen) sequencing vs. now-gen sequencing
•Whole genome
  Sanger: Human (early drafts), model organisms, bacteria, viruses and mitochondria (chloroplast), low coverage
  Now-gen: New human (!), individual genome, 1,000 normal, 25,000 cancer matched control pairs, rare samples
•RNA
  Sanger: cDNA clones, ESTs, full-length insert cDNAs, other RNAs
  Now-gen: RNA-Seq: digitization of transcriptome, alternative splicing events, miRNA
•Communities
  Sanger: Environmental sampling, 16S rRNA populations, ocean sampling
  Now-gen: Human microbiome, deep environmental sequencing, Bar-Seq
•Other
  Now-gen: Epigenome, rearrangements, ChIP-Seq
LIMITATIONS OF SANGER SEQUENCING
•Low throughput.
•Inconsistent base quality.
•Expensive.
•Not quantitative.
LIMITATIONS OF NGS
•The increased throughput of NGS reactions comes at the cost of read length, as the most readily available sequencing platforms (Illumina, Roche, SOLiD) offer shorter average read lengths (30–400 bp) than conventional Sanger-based methods (500 bp–1 kb).
•Ironically, one of the key limitations of NGS also serves as its greatest strength: the high volume of data generated. NGS reactions generate huge sequence data sets in the range of megabases (millions of bases) to gigabases (billions of bases).
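The trade-off between short reads and data volume can be made concrete with a little arithmetic. The sketch below assumes a genome of roughly 3.2 billion bases (an approximate human genome size), 30-fold coverage, and 100 bp reads; these figures are illustrative assumptions, not values from the slides.

```python
def sequencing_load(genome_size, coverage, read_length):
    """Return (total bases to sequence, number of reads needed)."""
    total_bases = genome_size * coverage
    reads_needed = -(-total_bases // read_length)  # ceiling division
    return total_bases, reads_needed

# Illustrative assumptions: ~3.2 Gb genome, 30x coverage, 100 bp reads
total_bases, reads_needed = sequencing_load(3_200_000_000, 30, 100)
print(f"{total_bases / 1e9:.0f} gigabases")  # 96 gigabases
print(f"{reads_needed:,} reads")             # 960,000,000 reads
```

Shorter reads mean more reads for the same coverage, which is one reason short-read platforms produce such large data sets.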