The Final Frontier: Data Analysis

Jean Jasinski, Ph.D., Field Application Scientist

Sept. 27, 2017

For Research Use Only. Not for use in diagnostic procedures. PR7000_0953

Final Frontier: Data Analysis

Agenda

Introduction

SureDesign (eArray)

Microarray Data Analysis

NGS Data Analysis

Cartagenia (Alissa)


Standard Disclaimer

Except for GenetiSureDX and Cartagenia Bench, all products are for Research Use Only (RUO).


Precision Medicine Needs Precision Genomics

High resolution, accuracy and sensitivity

Key Technologies

▪ Next Generation Sequencing

▪ Microarrays

▪ Digital PCR

▪ qPCR

▪ Oligonucleotide FISH


Puzzled by Options?


SureDesign and eArray

Create and View Custom and Catalog Designs

• Gene Expression microarrays
• miRNA microarrays
• RNA-Seq targeted capture
• Mutagenesis (QuikChangeHT)


SureDesign and eArray

• Web-based tools

• Same login for both tools

• Must use institutional email for account

• Create custom designs

• Customize catalog designs

• Download designs

• Order designs (trigger quote)

• Free to use


Microarray Data Analysis Tools

• Agilent CGH and CGH+SNP arrays (human and nonhuman)
• True two-color analysis
• Copy Number
• LOH and UPD (CGH+SNP)
• Suppress, classify, edit, annotate aberrations
• Report generation
• Free

CytogenomicsDX

• GenetisureDX array analysis
• FDA-cleared
• Free

• Gene Expression arrays
• miRNA arrays
• Exon and Exon Splicing Arrays
• Copy Number
• Clustering, GEO, GO, GSA
• Pathway Analysis
• Multiple vendor arrays
• License fee

MPP (Mass Profiler Pro)

• Metabolomics and proteomics from Mass Spec data
• License fee


OneSight

Seeing is Knowing

The OneSight cfDNA solution allows labs to study the (aneu)ploidy status of the DNA found in the cell-free fraction of a biopsy sample from low-pass whole genome sequencing data.

Key features:

• Visualization tools: detailed views of the (aneu)ploidy status of each chromosome
  ▪ All chromosomes
• Automation tools: define classification rules for marking loci for review
• Reference sets: define normal samples in the study population
• Excluded regions: remove recurrent technical noise and biologically irrelevant loci in the data

OneSight

Turnkey solution

• Compatible with any common NGS library prep kit
• Compatible with the most common NGS sequencing platforms
• Upload raw NGS data
• Select analysis pipeline and reference set
• Visually inspect chromosome plots

OneSight

Visual plots

[Figure: example chromosome plots with panels Normal, Segmental aberration, Trisomy, and Complex aberrations]

- Developed by Cartagenia, a part of Agilent Technologies, leveraging the company’s expertise with software solutions for genetics labs
- Proven SaaS approach and technology platform
- Workflow efficiency, traceability & versioning, and automation
- Setup fee and per sample analysis

SureCall Alignment to Mutation

• NGS data analysis tool for biologists
• Accepts fastq or bam files
• Generates vcf (4.2) and pdf or text mutation reports
• Human (hg19) DNA analysis only
• Free to Agilent Target Enrichment customers (HaloPlex, SureSelect, OneSeq)
• Runs on local computer

SureCall 4.0 New Features

Support SureSelect XTHS Data Analysis

• Adds Molecular Barcode (MBC) analysis for SureSelect XTHS
• Improves MBC analysis flexibility
• Index hopping control, including optical duplicate removal and an ‘estimated index hopping frequency’ parameter
• Additional QC metrics and plots for HS analysis

Introducing Translocation Detection

• New algorithm module
• New visualization

Overall Software Improvement

• Checks for an internet connection when a job is submitted. If no connection is available, a pop-up message warns the user that annotation results will be affected.
• Now allows re-annotation to update an analyzed sample or to finish a job that failed due to network issues
• Provides a link out to ExAC (Exome Aggregation Consortium, hosted by the Broad Institute) from Triage View
• Improved login dialog
• Better installer that checks system/hardware compatibility first
• Supports VCF v4.2 format, which includes all variant types (SNPs, indels, CNVs, translocations, etc.) from a sample
• QC report improvements (e.g., the report includes SureCall version, Design ID, and Genome Build)

Choose one of the four analysis types available in SureCall

Analysis type | Description | Result
Single Sample Analysis | For individual samples | SNPs, indels, translocations
Pair Analysis | To determine copy number changes (use a normal reference); to determine somatic mutations in tumor-normal samples | SNPs and indels; CNVs; somatic mutations
Trio Analysis | For trios, typically mother, father and child | SNPs and indels; de novo mutations
OneSeq Analysis | For simultaneous detection of genome-wide copy number changes, cnLOH, SNP and indel mutations | CNVs, cnLOH, SNPs and indels

SureCall – Support of HaloPlexHS and XTHS molecular barcodes

What are Molecular Barcodes (MBC)?

• Also known as Unique Molecular Identifiers (UMI) or Random Molecular Tags (RMT)

• The goal is for each original DNA fragment, within the same sample, to be attached to a unique sequence barcode

• Although similarly named, these are not the same as a sample barcode/index, which allows multiple samples to be run on a single sequencing run

• Molecular barcodes are a string of totally random nucleotides (such as NNNNNNN), partially degenerate nucleotides (such as NNNRNYN), or defined nucleotides (when template molecules are limited)

• Agilent uses a 10-base MBC

[Figure: library fragment showing the DNA insert, adaptor, sample index, and molecular barcode]
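As a rough illustration of the barcode designs listed above, the number of distinct sequences a pattern can encode follows from the IUPAC degeneracy of each position. The sketch below is not Agilent code; the helper name `barcode_diversity` is hypothetical and is introduced only for this example.

```python
# Count the distinct sequences an IUPAC barcode pattern can encode.
# Illustrative helper only; not part of any Agilent tool.
IUPAC_DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def barcode_diversity(pattern: str) -> int:
    """Multiply the number of bases allowed at each position."""
    total = 1
    for code in pattern.upper():
        total *= IUPAC_DEGENERACY[code]
    return total


print(barcode_diversity("NNNNNNN"))  # 4**7
print(barcode_diversity("NNNRNYN"))  # 4**5 * 2 * 2
print(barcode_diversity("N" * 10))   # fully random 10-base barcode, 4**10
```

A fully random 10-base barcode gives about one million possible sequences, which is why two different original fragments at the same position rarely share a barcode by chance.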


Why are Molecular Barcodes Useful?

In capture-based technology (SureSelectHS):

• Able to identify original DNA fragments even with bias from fragmentation methods
• With deep sequencing, able to use duplicate reads for error correction

In amplicon-based technology (HaloPlexHS):

• De-duplication – ability to distinguish original DNA fragments from PCR duplicates

In both:

• Accurate low allele frequency variant calling
• Calling of copy number changes
• Correction of errors introduced by PCR and sequencing

De-duplication – Capture without MBC

When you de-duplicate reads that have the same start and stop point, all will be removed (discarded) except for one read.

[Figure: reads aligned to the reference genome over the exon of interest]

De-duplication – Capture with Molecular Barcodes

[Figure: reads aligned to the reference genome over the exon of interest]

When you ‘de-duplicate’ using molecular barcodes, the reads that have the same start and stop point are not removed but are merged together to create consensus reads. This way, errors introduced by PCR or sequencing are removed.
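The contrast between position-only de-duplication and barcode-aware merging can be sketched in a few lines. This is a toy model under stated assumptions: reads are plain tuples with made-up coordinates, the consensus is a simple per-base majority vote, and the function names are hypothetical. It is not the method used by SureCall or LocatIt.

```python
from collections import Counter, defaultdict

# Each read: (chrom, start, end, mbc, sequence). Toy data for illustration.
reads = [
    ("chr1", 100, 104, "AAAC", "ACGT"),
    ("chr1", 100, 104, "AAAC", "ACGT"),
    ("chr1", 100, 104, "AAAC", "ACTT"),  # PCR or sequencing error
    ("chr1", 100, 104, "GGTA", "ACGA"),  # a different original molecule
]


def positional_dedup(reads):
    """Keep the first read per (chrom, start, end); discard the rest."""
    kept = {}
    for chrom, start, end, _mbc, seq in reads:
        kept.setdefault((chrom, start, end), seq)
    return kept


def mbc_consensus(reads):
    """Merge reads sharing position and MBC by per-base majority vote."""
    families = defaultdict(list)
    for chrom, start, end, mbc, seq in reads:
        families[(chrom, start, end, mbc)].append(seq)
    return {
        key: "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))
        for key, seqs in families.items()
    }


print(positional_dedup(reads))
print(mbc_consensus(reads))
```

Position-only de-duplication keeps one read and loses the second molecule; the barcode-aware version keeps both molecules and votes out the single-read error.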


De-duplication – Amplicon without MBC

[Figure: amplicon reads aligned to the reference genome over the exon of interest]

When using amplicon technology, de-duplication really isn’t possible: because of the nature of the amplicons (all reads from an amplicon share the same start and stop points), the majority of the sequencing data would be lost.

De-duplication – Amplicon with MBC

[Figure: amplicon reads aligned to the reference genome over the exon of interest]

When using amplicon technology with molecular barcodes, it becomes possible to ‘de-duplicate’ and identify the unique molecules of DNA. The reads that have the same molecular barcode can then be used to create consensus reads and remove errors created by the library prep or sequencing processes.

Low Allele Frequency Variants (<3%)

• Low allele frequency variants are difficult to detect by conventional NGS methods
• Relatively high error rate of sequencers

Sequencer | Error rate | Error type
Illumina MiniSeq & NextSeq | <1% | Substitutions
Illumina MiSeq & HiSeq | 0.1% | Substitutions
Ion Torrent PGM, Proton & S5 | 1% | Indels & homopolymers
PacBio | 13% single pass; ≤1% circular consensus read | Indels
Oxford Nanopore MinION | 12% | Indels

Adapted from Goodwin et al. (2016) Nature Reviews Genetics 17:333-351
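A short worked calculation shows why the error rates in the table matter. At a given depth, the expected number of reads carrying a true variant is depth times allele frequency, and the expected number of reads carrying an error at that base is depth times error rate. The depth of 4000 and the helper name are assumptions chosen for illustration.

```python
def expected_reads(depth: int, fraction: float) -> float:
    """Expected number of reads showing an event that occurs at `fraction`."""
    return depth * fraction


depth = 4000  # assumed sequencing depth for this illustration

variant_reads = expected_reads(depth, 0.001)    # true variant at 0.1% allele frequency
error_reads_low = expected_reads(depth, 0.001)  # 0.1% error rate (MiSeq & HiSeq row)
error_reads_high = expected_reads(depth, 0.01)  # 1% error rate (Ion Torrent row)

print(f"variant reads:          {variant_reads:.1f}")
print(f"error reads at 0.1%:    {error_reads_low:.1f}")
print(f"error reads at 1%:      {error_reads_high:.1f}")
```

With raw reads alone, a 0.1% variant is supported by about as many reads as sequencing errors produce at the same base, which is the motivation for consensus-based error correction.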


Detecting low allele frequency variants and DNA Inputs

Step | Copies remaining
Input | 4000
End repair & A tail | 3900
Ligation | 2500
Hybridisation | 1750
Capture | 1250
Clean up | 1000
Library | 900

Perfect world (0.1% allele frequency)

• 4 reads to create a consensus, therefore:
• 4000x coverage would be sufficient = 4000 original copies of the genome (2000 cells)
• 12 ng of DNA input required

In reality, library prep is inherently inefficient.

Conclusion: To detect low allele frequency variants, higher DNA inputs are required.

Adapted from: https://cofactorgenomics.com/heterogenous-dna-sequencing-lower-limits-minor-allele-frequency-sensitivity/
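The input arithmetic above can be reproduced, and extended to the lossy case, with a small calculation. Two assumptions are made here and are not stated on the slide: a mass of about 3 pg per haploid genome copy (chosen because it reproduces the 12 ng figure), and the idea that the 4000-to-900 funnel applies as a single overall yield. The result is an illustration of scale, not a recommended input amount.

```python
PG_PER_HAPLOID_GENOME = 3.0  # assumption: ~3 pg per copy, reproduces the 12 ng above


def copies_to_ng(copies: float) -> float:
    """Convert a number of haploid genome copies to nanograms of DNA."""
    return copies * PG_PER_HAPLOID_GENOME / 1000.0


min_variant_molecules = 4   # 4 reads to create a consensus
variant_one_in = 1000       # 0.1% allele frequency = 1 in 1000 copies
ideal_copies = min_variant_molecules * variant_one_in

library_yield = 900 / 4000  # fraction of input copies that reach the library
required_copies = ideal_copies / library_yield

print(f"ideal input:    {ideal_copies} copies = {copies_to_ng(ideal_copies):.1f} ng")
print(f"overall yield:  {library_yield:.1%}")
print(f"with losses:    {required_copies:.0f} copies = {copies_to_ng(required_copies):.1f} ng")
```

Under these assumptions, keeping 4000 unique copies in the final library would take several times the perfect-world input.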


Analysis Pipelines other than SureCall

• For customers with established bioinformatics pipelines, Agilent provides two separate Java programs in the Agilent Genomics NextGen Toolkit (AGeNT) that can be integrated into your pipelines: SureCallTrimmer and LocatIt

• SureCallTrimmer is called before alignment and handles adapter trimming (on both ends), trims low quality bases, and masks enzyme footprints. SureCallTrimmer is important for HaloPlex and HaloPlexHS data not processed in SureCall

• MBC reads are found in a third fastq file

• Generation of consensus reads occurs after alignment by examining all reads that align to the same location (chr, start, stop) and share the same molecular barcode

• LocatIt handles MBC after alignment: consensus reads, filtering based on MBC, optical deduplication, etc. It requires the bam file and the MBC fastq

• Tools for bioinformaticians capable of developing and debugging pipelines
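To make the data flow concrete, the sketch below pairs barcodes from a separate FASTQ stream with aligned reads by read name and builds the (chr, start, stop, MBC) grouping key described above. It is a plain-Python illustration, not LocatIt: the record layout, read names, and coordinates are invented, and a real pipeline would read the alignments from the bam file.

```python
import io


def read_mbc_fastq(handle):
    """Map read name -> MBC sequence from a FASTQ stream (4 lines per record)."""
    mbc_by_read = {}
    while True:
        header = handle.readline().strip()
        if not header:
            break
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line
        handle.readline()  # quality line
        name = header[1:].split()[0]  # drop '@' and any comment field
        mbc_by_read[name] = sequence
    return mbc_by_read


# Hypothetical MBC FASTQ content; real files come from the sequencer.
mbc_fastq = io.StringIO(
    "@read1 2:N:0:1\nACGTACGTAC\n+\nIIIIIIIIII\n"
    "@read2 2:N:0:1\nTTGCATGCAA\n+\nIIIIIIIIII\n"
)
mbc_by_read = read_mbc_fastq(mbc_fastq)

# Aligned reads as (name, chrom, start, stop); hypothetical values.
aligned = [("read1", "chr7", 5000, 5150), ("read2", "chr7", 5000, 5150)]
family_keys = {
    name: (chrom, start, stop, mbc_by_read[name])
    for name, chrom, start, stop in aligned
}
print(family_keys)
```

Here the two reads align to the same location but carry different barcodes, so they fall into different families and would not be merged.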


Other Types of NGS Analyses: Non-human or Other Type of Sequencing (RNA-, small RNA-, Methyl-, meDIP-, or ChIP-Seq)

• SureCall performs DNA analysis for human (hg19) data only

• StrandNGS can align DNA, RNA, and small RNA using its own aligner or accept BAM or SAM inputs

• Workflows for DNA-Seq, RNA-Seq, small RNA-Seq, Methyl-Seq, MeDIP-Seq, and ChIP-Seq using algorithms specific to experiment type

• Powerful QC tools

• Extensive filtering options

• Pathway, GO analysis, clustering

• StrandNGS pipelines now available

• License fee


Enabling clinical analysis of genomic data

▪ Enables the interpretation, reporting, and sharing of genomic variants

▪ Manage increasing volumes of data and reduce turnaround time

▪ Draft clinical grade lab reports (FDA Class 1 Medical Device)

▪ Analyzed CGH and NGS data accepted as input

▪ Rebranded as Alissa Interpret


How Cartagenia Works

Software as a Service

• Scalable
• Secure
• Cost effective

Content is key! Knowledge Integration:

• Over 100 public and private data sources
• Institution specific repositories
• Sharing across private and public consortia
• Partnerships (Alamut, HGMD, OncoMD, CollabRx, N-of-1…)

Setting and Adopting Standards

• Adapting to diagnostic standards
• ISO9001 and ISO13485 certified
• Registered as Medical Device in US, Canada and Europe

Support

• A fully-serviced solution
• Adapted to your needs, specialization and deadlines

Benefits of Cartagenia Bench

Efficient

• Productivity through Automation
• Standardization
• Knowledge Integration

Easy to use

• Co-designed with you and your peers
• Integrated with lab and hospital IT

Robust

• Validation
• Versioning
• Security & control
• High quality support

Clinical grade

• ISO Certification
• Class I medical device

Agilent Alissa Vision – from raw data to report

Make your work flow with Agilent Alissa Clinical Informatics for NGS

✓ One single platform from raw reads to lab reports

✓ Comprehensive QC metrics at your fingertips

✓ Alissa Interpret is Class I medical device (CGH and NGS)

✓ Alissa Align & Call (RUO future release)


Bonus Content

Index Hopping


Index Hopping (Illumina Sequencers)

• Incorrect assignment of reads to a different sample
• Occurs in multiplexed samples
• Frequency is higher on patterned flow cells (ExAmp chemistry) but still occurs in bridge amplification chemistry
• Multiple causes (index contamination, sample contamination, postcapture PCR mispriming, excess adapters, overclustering)
• Detection best done during demultiplexing when data from all samples is available

Illumina’s recommendations: https://www.illumina.com/science/education/minimizing-index-hopping.html

Observed index hopping rate using XTHS: HiSeq4000 vs. HiSeq2500

▪ We see an average hop rate** of 2.9% with HiSeq4000 (newer patterned flowcell).
▪ On HiSeq2500 (older non-patterned flowcell), we see an average hopping rate of 0.1-0.2%.

**: Hop rate = hopped reads / total reads

Libraries are prepared and enriched individually, so the hopping observed has occurred at the sequencing level.

[Figure: bar chart of index hopping rate (y-axis 0.0% to 3.5%) for HiSeq4000 and HiSeq2500]

[Figure: library construct P5-MBC-Insert1-Index1-P7 acquiring Index2 to become P5-MBC-Insert1-Index2-P7]

October 23, 2017

What does this mean for your application*?

▪ For pooling of samples from the same germline application using XT low input or SureSelect XT, XT2
  ▪ Assuming alleles below 5% frequency are not called
  ▪ Customers should not be concerned about index hopping

▪ For pooling of samples from the same somatic application using XTHS
  ▪ Variant calls with >5% allele frequency are likely not due to index hopping
  ▪ Variant calls with <5% allele frequency might be impacted by index hopping

▪ For heterogeneous pooling across applications, or of samples across species, single cell, microbiome, viral, RNA expression, etc.
  ▪ Variant calls are possibly impacted by index hopping
  ▪ Consider index hopping risks when determining what samples to pool for sequencing

*: The index misassignment discussed here is limited to hopping at the sequencing level; HiSeq 2500 data suggest that other sources of misassignment, such as index purity, are insignificant by comparison.

Index Hopping Physical Corrections

• Use one sample (exome) per lane

• Do not use precapture pooling as PCR of multiplexed samples may misprime and cause index hopping

• Pool libraries right before sequencing and sequence pooled library as soon as possible

• Freeze pooled libraries at -20°C

• Remove as many free adapters and PCR primers as possible; perform a second bead cleanup if you see a small-molecular-weight blip

• If the sample barcode is composed of dual indexes, do not use all possible combinations of indices, so that illegal combinations can be detected and removed
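The dual-index recommendation can be illustrated with a minimal filter: reads whose index pair is not on the sample sheet are flagged instead of being assigned. The index sequences, sample names, and function name below are hypothetical, and production demultiplexers also tolerate mismatches, which this sketch does not.

```python
from collections import Counter

# Hypothetical sample sheet: only these (index 1, index 2) pairs were used.
expected_pairs = {
    ("ATCACG", "TATAGC"): "sampleA",
    ("CGATGT", "ATAGAG"): "sampleB",
}

# Index pairs observed on four reads; the third mixes sampleA's index 1
# with sampleB's index 2, a combination that was never loaded.
observed = [
    ("ATCACG", "TATAGC"),
    ("CGATGT", "ATAGAG"),
    ("ATCACG", "ATAGAG"),
    ("ATCACG", "TATAGC"),
]


def split_by_legality(observed_pairs, expected_pairs):
    """Count reads per sample and count reads with unexpected index pairs."""
    assigned = Counter()
    illegal = Counter()
    for pair in observed_pairs:
        if pair in expected_pairs:
            assigned[expected_pairs[pair]] += 1
        else:
            illegal[pair] += 1
    return assigned, illegal


assigned, illegal = split_by_legality(observed, expected_pairs)
print(dict(assigned))
print(dict(illegal))
```

If every combination of the two index sets had been used, the mixed pair would have matched a real sample and the hop would have gone unnoticed.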


XTHS molecular barcode thresholding - Bioinformatically remove hopped reads

[Figure: fragments with multiple reads (same MBC) versus fragments with a single read; all colors denote molecular barcodes; good reads and hopped reads are marked]

1. The vast majority of hopped reads have just 1 read, regardless of sequencing depth.
2. One way to minimize the impact of hopped reads is to remove all single reads (MBC thresholding). There is no error correction utilizing MBC with these reads anyway.
3. This will work well for low allele frequency applications where error correction with MBC is needed.
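The thresholding idea can be sketched as a family-size filter: count the reads in each (position, MBC) family and drop families below a minimum size. This is an illustration of the concept only; the function name, tuple layout, and barcodes are hypothetical, and SureCall's implementation is not shown in the slides.

```python
from collections import Counter


def mbc_threshold(family_keys, min_reads=2):
    """Keep reads whose MBC family has at least `min_reads` members.

    min_reads=2 corresponds to the MBC2+ setting; min_reads=1 keeps everything.
    """
    sizes = Counter(family_keys)
    return [key for key in family_keys if sizes[key] >= min_reads]


# One family key per read: (chrom, start, stop, mbc). Toy data.
reads = [
    ("chr1", 100, 250, "AAACCCGGGT"),
    ("chr1", 100, 250, "AAACCCGGGT"),
    ("chr1", 100, 250, "AAACCCGGGT"),
    ("chr1", 100, 250, "TTTGGGCCCA"),  # singleton: possibly a hopped read
]

kept = mbc_threshold(reads)
print(len(reads), "reads in,", len(kept), "reads kept")
```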


How effective is MBC thresholding on HiSeq4000?

▪ MBC2+ means all single MBC reads are filtered out, i.e. MBC thresholding
▪ MBC thresholding results in a 10x reduction in hop rate, from an average of 2.9% to 0.3%, close to the observed hopping level on HiSeq 2500

Each data point is the average hop rate of 2-3 HiSeq4000 runs per given sample. Data include 3-plex, 4-plex and 8-plex lane runs.

[Figure: % hop rate (hopped reads/total reads), y-axis 0 to 5, with vs. without MBC threshold: MBC(1+) vs. MBC(2+)]

Impact of MBC Thresholding on XTHS Sensitivity

Metric | Expected | HiSeq2500 MBC1+ | HiSeq4000 MBC1+ | HiSeq2500 MBC2+ | HiSeq4000 MBC2+
>2% known variants | 59 | 59 | 59 | 59 | 59
<=2% known variants* | 21 | 12 | 12 | 11 | 12
False positive (or unknown variants)** | | 57 | 105 | 24 | 24
Total sensitivity | | 88.75% | 88.75% | 87.50% | 88.75%
Specificity | | 99.93% | 99.86% | 99.97% | 99.97%
Precision (PPV) | | 55.47% | 40.34% | 74.47% | 74.74%

77kb panel, 10ng input, 10,000X sequencing depth

*: 2 of 21 have an expected frequency of 1-2%; both are detected. The remaining 19 are <=1%.

**: True variant calls are based on Genome in a Bottle. The false positive count could include unknown true variants.

Without thresholding, false positive rates are significantly higher with 4000 (low specificity).

HiSeq 4000 with MBC thresholding: comparable sensitivity and specificity to HiSeq 2500***

MBC thresholding on HiSeq 4000, while removing a significant amount of sequencing data, shows little to no negative impact on assay sensitivity.
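The sensitivity and precision rows can be recomputed from the counts in the table, which is a useful check when rebuilding it from the slide. Sensitivity is detected known variants over the 80 expected; precision is detected known variants over all calls. Specificity is not recomputed because the number of true negatives is not given.

```python
EXPECTED_VARIANTS = 59 + 21  # >2% plus <=2% known variants

# Per run: (detected known variants, false positive or unknown calls)
runs = {
    "HiSeq2500 MBC1+": (59 + 12, 57),
    "HiSeq4000 MBC1+": (59 + 12, 105),
    "HiSeq2500 MBC2+": (59 + 11, 24),
    "HiSeq4000 MBC2+": (59 + 12, 24),
}


def sensitivity(true_positives: int) -> float:
    return true_positives / EXPECTED_VARIANTS


def precision(true_positives: int, false_positives: int) -> float:
    return true_positives / (true_positives + false_positives)


for name, (tp, fp) in runs.items():
    print(f"{name}: sensitivity {sensitivity(tp):.2%}, PPV {precision(tp, fp):.2%}")
```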


MBC Thresholding for Hopped Reads in SureCall


MBC2+ is set by default


SureCall Estimated Hopping Frequency

• New parameter reduces noise generated by sample index cross-contamination
• Default setting is 0.005 (0.5%)
• Range is 0 to 0.1 (0 to 10%)
• How SureCall uses this parameter:

1) Calculate the number of variant reads that could be caused by index hopping = average coverage of each region × estimated index hopping frequency

2) Based on the read number from step 1, SureCall calculates the probability that a given variant call is real rather than noise caused by index hopping, and filters out calls attributed to hopping

• Estimate the value by comparing SureCall allele frequencies with known allele frequencies, from data for the particular sequencer used (higher for patterned flow cells), or from past experience

[Figure callout: number of variants that might be due to index hopping]
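Step 1 is a direct multiplication; step 2 depends on a probability model that the slides do not describe. The sketch below uses a Poisson tail purely as a stand-in assumption to show how an expected hopped-read count can be turned into a filter; it is not SureCall's calculation, and the function names and the coverage of 2000 are hypothetical.

```python
import math


def expected_hopped_reads(avg_coverage: float, hop_frequency: float = 0.005) -> float:
    """Step 1: average coverage of the region times estimated hopping frequency."""
    return avg_coverage * hop_frequency


def prob_at_least(k: int, lam: float) -> float:
    """Poisson tail P(X >= k) for mean lam (assumed model, for illustration)."""
    below = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - below


lam = expected_hopped_reads(2000)  # default frequency 0.005
print(f"expected hopped reads: {lam:.1f}")
print(f"P(>= 5 hopped reads):  {prob_at_least(5, lam):.4f}")
print(f"P(>= 40 hopped reads): {prob_at_least(40, lam):.2e}")
```

Under this assumed model, a variant supported by 5 reads at 2000x is well within what hopping alone could produce, while one supported by 40 reads is not.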


Optical Duplicates

These are only a problem for HiSeq 2500/MiSeq/NextSeq data. They come from large clusters being called incorrectly as two separate clusters by Illumina’s RTA software.

On a 2500: Some clusters are either too big or their shape does not conform to the model, and they get counted as 2+ clusters.

On a 4000: During amplification on the flow cell, one of the local duplicates that are part of a growing cluster breaks free and goes on to seed a new nanowell and start a cluster of its own nearby. After analysis, these two nanowells show the very same data: sequence and MBC. The geographical coordinates are close to each other.

SureCall uses geographical location (tile) for optical deduplication before MBC deduplication.
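A location-based check of this kind can be sketched from the read name, which in Illumina FASTQ headers carries flowcell, lane, tile, and x/y coordinates as colon-separated fields. The sketch assumes the two reads already share sequence and MBC, and the distance cutoff of 100 is a placeholder rather than a SureCall setting. The slides only say that SureCall uses the tile location, so the exact rule here is an assumption.

```python
def parse_location(read_name: str):
    """Return (flowcell, lane, tile, x, y) from an Illumina-style read name.

    Assumes the layout instrument:run:flowcell:lane:tile:x:y.
    """
    fields = read_name.split(":")
    return fields[2], int(fields[3]), int(fields[4]), int(fields[5]), int(fields[6])


def is_optical_duplicate(name_a: str, name_b: str, max_distance: int = 100) -> bool:
    """True when both reads lie on the same tile within max_distance in x and y."""
    loc_a = parse_location(name_a)
    loc_b = parse_location(name_b)
    if loc_a[:3] != loc_b[:3]:  # different flowcell, lane, or tile
        return False
    return (abs(loc_a[3] - loc_b[3]) <= max_distance
            and abs(loc_a[4] - loc_b[4]) <= max_distance)


# Hypothetical read names for illustration.
near = is_optical_duplicate("M1:1:FC1:1:1101:1000:2000", "M1:1:FC1:1:1101:1040:2010")
far = is_optical_duplicate("M1:1:FC1:1:1101:1000:2000", "M1:1:FC1:1:1102:1040:2010")
print(near, far)
```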
