DNBseq™ lfrWGS
Long Fragment Read Human Whole Genome Sequencing
Technical Note
Short-read sequencing technology can detect small variants such as SNPs and InDels at low cost and with high precision, but it cannot read information from long DNA fragments. Long-read sequencing technologies, such as PacBio or Oxford Nanopore, enable accurate structural variant (SV) detection, haplotyping and coverage of highly homologous regions. However, they are less accurate for small variant (SNP, InDel) detection and their cost is high, which has limited their applications. To address these limitations, we introduce lfrWGS, human whole genome sequencing based on single-tube Long Fragment Read (stLFR) technology [1]. stLFR is derived from DNA co-barcoding technology [2], which adds the same barcode sequence to the sub-fragments of each original long DNA molecule (Figure 1). DNBseq™ lfrWGS combines this cost-effective and accurate library construction method with the world's leading DNBseq™ sequencing technology. It enables high-quality variant detection, diploid phasing of human genomic regions, structural variation analysis, de novo genome assembly and other long-read applications.
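Co-barcoding means that reads sharing a barcode and mapping close together on the same chromosome most likely came from the same long molecule. The sketch below shows one common way to reconstruct putative molecules from aligned co-barcoded reads. It is an illustration only, not the lfrWGS analysis pipeline. The input format of (barcode, chromosome, position) tuples, the hypothetical `infer_molecules` function and its `max_gap` and `min_reads` thresholds are all assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Molecule:
    barcode: str
    chrom: str
    start: int  # position of the first read assigned to the molecule
    end: int    # position of the last read assigned to the molecule
    n_reads: int

    @property
    def length(self) -> int:
        # Approximate molecule length as the span between first and last read.
        return self.end - self.start


def infer_molecules(reads, max_gap=50_000, min_reads=2):
    """Group (barcode, chrom, pos) reads into putative long molecules.

    Assumption (hypothetical defaults): reads with the same barcode on the same
    chromosome belong to one molecule unless consecutive reads are more than
    `max_gap` bp apart. Molecules with fewer than `min_reads` reads are dropped.
    """
    positions_by_key = defaultdict(list)
    for barcode, chrom, pos in reads:
        positions_by_key[(barcode, chrom)].append(pos)

    molecules = []
    for (barcode, chrom), positions in positions_by_key.items():
        positions.sort()
        start = prev = positions[0]
        count = 0
        for pos in positions:
            if pos - prev > max_gap:
                # Gap too large: close the current molecule and start a new one.
                if count >= min_reads:
                    molecules.append(Molecule(barcode, chrom, start, prev, count))
                start, count = pos, 0
            count += 1
            prev = pos
        if count >= min_reads:
            molecules.append(Molecule(barcode, chrom, start, prev, count))
    return molecules
```

In this sketch, reads with the same barcode that are 200 kb apart are split into two molecules. This is how a barcode shared by more than one long molecule would show up.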
Figure 1. lfrWGS library construction and sequencing workflow. Library construction starts with inserting transposons into long genomic DNA. The transposon-integrated DNA is then hybridized onto clonally barcoded beads. After barcode ligation, adapter ligation and PCR amplification, the co-barcoded sub-fragments are ready for high-throughput sequencing on our proprietary DNBseq™ platform. DNBseq sequencers can effectively avoid the accumulation of errors and improve sequencing accuracy.
Highlights of lfrWGS:
• High-quality WGS data from as little as 1 ng of DNA.
• Haplotype contig N50 of over 30 Mb and powerful detection of structural variations, such as deletions, inversions, translocations and insertions.
• About 85% of long DNA molecules are labeled with a unique barcode.
• High accuracy and sensitivity for SNP and InDel detection.
• Effective detection of structural variations larger than 20 kb, such as inversions, translocations, deletions and insertions.
• Analysis of genome regions that are difficult to process with regular WGS, for example highly homologous and highly repetitive regions.
High-Quality Data Performance
Read long fragment information
The long DNA fragments analyzed by lfrWGS have an average length of 50-70 kb, with a maximum of up to 300 kb. Over 30 million molecule barcodes are available, so more than 85% of long DNA fragments are co-barcoded by a single unique barcode. This makes co-barcoded reads analogous to direct single-molecule sequencing, but without its high error rates and low throughput.
SNP & InDel calling
At 30x coverage, lfrWGS showed variant calling performance equivalent to that of standard short-read WGS. Both the positive predictive value (PPV) and the sensitivity of SNP detection are 0.99 or higher (Figure 3). F-measures above 0.95 are also achievable for InDel detection.
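PPV, sensitivity and F-measure are standard metrics for benchmarking a call set against a truth set, such as the GIAB high-confidence calls. The sketch below shows how the three metrics are computed. It assumes variants are matched by exact (chrom, pos, ref, alt) identity, which is a simplification. It is not the benchmarking tool behind Figure 3, because the note does not name one. The `benchmark` function is hypothetical.

```python
def benchmark(calls, truth):
    """Return (PPV, sensitivity, F-measure) for a call set versus a truth set.

    Variants are compared by exact (chrom, pos, ref, alt) identity. This is a
    simplification: dedicated benchmarking tools also normalize how
    equivalent variants are represented before comparing them.
    """
    calls, truth = set(calls), set(truth)
    tp = len(calls & truth)   # called and present in the truth set
    fp = len(calls - truth)   # called but absent from the truth set
    fn = len(truth - calls)   # in the truth set but not called
    ppv = tp / (tp + fp) if calls else 0.0
    sensitivity = tp / (tp + fn) if truth else 0.0
    # F-measure is the harmonic mean of PPV (precision) and sensitivity (recall).
    denom = ppv + sensitivity
    f_measure = 2 * ppv * sensitivity / denom if denom else 0.0
    return ppv, sensitivity, f_measure
```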
Figure 3. Small variant (SNP and InDel) calling performance on the HG001-HG005 reference samples. HG001: NA12878; HG002-HG004: Ashkenazi Jewish trio (HG002: son, HG003: father, HG004: mother); HG005: Asian (Han Chinese) son.
Figure 2. DNA fragment length distribution (A) and number of DNA molecules per barcode (B). (A) Typically, lfrWGS analyzes long fragments with an average length of 50-70 kb. (B) Starting from 1 ng of high-molecular-weight DNA, over 85% of DNA molecules are co-barcoded by a single unique barcode.
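Figure 2 summarizes two library-level quantities: fragment length and how many molecules share each barcode. The sketch below computes both. It assumes molecules have already been inferred, for example by grouping co-barcoded reads, and are supplied as hypothetical (barcode, length) pairs. Here a molecule counts as uniquely barcoded when no other molecule carries its barcode. This definition is an assumption, because the note does not state exactly how its 85% figure was derived.

```python
from collections import Counter
from statistics import mean


def molecule_stats(molecules):
    """Summarize inferred molecules given as (barcode, length_bp) pairs."""
    molecules_per_barcode = Counter(barcode for barcode, _ in molecules)
    lengths = [length for _, length in molecules]
    # A molecule is "uniquely barcoded" if no other molecule shares its barcode.
    unique = sum(1 for barcode, _ in molecules if molecules_per_barcode[barcode] == 1)
    return {
        "mean_length_bp": mean(lengths),
        "max_length_bp": max(lengths),
        "fraction_uniquely_barcoded": unique / len(molecules),
        # Histogram: number of molecules per barcode -> number of barcodes.
        "barcode_histogram": Counter(molecules_per_barcode.values()),
    }
```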
Figure 4. An ideogram of the phasing blocks on each chromosome of NA12878 sample. Phased contigs are represented by
alternating colors (blue and gray).
Figure 5. A large deletion detected by lfrWGS. The top left panel is a heat map of barcode overlap between regions: regions of high overlap are shown in dark red and regions with no overlap in beige. Arrows indicate how regions that are distant from each other on chromosome 8 show increased overlap, marking the location of the deletion. Co-barcoded reads are separated by haplotype and plotted by unique barcode (y axis) against chromosome 8 position (x axis). The heterozygous deletion is found in a single haplotype.
Phasing
To evaluate phasing performance, high-confidence variants from GIAB (NA12878) were phased using the publicly available software package HapCUT2 [3]. Figure 4 shows an ideogram of the phasing blocks on each chromosome. At 40X coverage, the phasing block N50 reaches 34 Mb, with practically all heterozygous SNPs phased. Notably, the arms of some chromosomes, such as chr5 and chr6, are almost completely phased.
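Phasing block N50 is the block length L such that blocks of length L or longer together contain at least half of the total phased sequence. The sketch below shows the standard N50 calculation. It assumes block lengths, in any consistent unit such as bp or Mb, have already been extracted from the phasing output. It is not code from HapCUT2, and the function name is illustrative.

```python
def n50(block_lengths):
    """Return the N50 of a list of phase block lengths (0 for an empty list)."""
    total = sum(block_lengths)
    covered = 0
    # Walk from the longest block down until half of the total is covered.
    for length in sorted(block_lengths, reverse=True):
        covered += length
        if 2 * covered >= total:
            return length
    return 0
```

For blocks of 10, 20, 30 and 40 Mb, the two longest blocks cover 70 of 100 Mb, so the N50 is 30 Mb.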
Structural variation detection
With phasing and co-barcoding information, lfrWGS can also be used to detect large-scale structural variations. To demonstrate the power of stLFR technology for SV detection, we examined barcode overlap data. Deletions in NA12878 previously reported by Zhang et al. [4] were also found in the lfrWGS data. Notably, as shown in Figure 5, lfrWGS detected a 150 kb heterozygous deletion on chromosome 8 of the NA12878 sample.
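A deletion joins its two flanks on the same DNA molecules. As a result, genomic bins that are far apart on the reference unexpectedly share barcodes. The sketch below illustrates this idea, which underlies the heat map in Figure 5. Several choices are illustrative assumptions, not the method used to produce Figure 5: reads are binned into fixed windows along one chromosome, overlap is scored with the Jaccard index, and `min_bin_gap` and `threshold` are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two barcode sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def distant_overlaps(bin_barcodes, min_bin_gap=2, threshold=0.2):
    """Flag pairs of genomic bins that are far apart but share many barcodes.

    bin_barcodes maps a bin index (consecutive fixed-size windows along one
    chromosome) to the set of barcodes whose reads fall in that bin. Nearby
    bins share barcodes naturally because long molecules span them, so only
    pairs at least `min_bin_gap` bins apart are tested. Both thresholds are
    hypothetical.
    """
    hits = []
    for i, j in combinations(sorted(bin_barcodes), 2):
        if j - i < min_bin_gap:
            continue
        score = jaccard(bin_barcodes[i], bin_barcodes[j])
        if score >= threshold:
            hits.append((i, j, score))
    return hits
```

In the heat map, such distant pairs appear as off-diagonal hot spots. The pair of bins they connect brackets the deleted sequence.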
Phasing metric                   Value
Heterozygous SNPs phased (%)     99
Phasing block N50 (Mb)           34.0
Short switch error rate          0.0025
Long switch error rate           0.0020
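Switch errors measure how often the phase of adjacent heterozygous variants is inverted relative to a truth set. Definitions vary between tools. The sketch below assumes one common convention: a single variant whose phase is flipped relative to both neighbors counts as one short switch error, and any other change in orientation counts as a long switch error. Normalizing by the number of adjacent variant pairs is also an assumption, because the note does not state how its rates were normalized. Both function names are hypothetical.

```python
def switch_errors(predicted, truth):
    """Count (short, long) switch errors within one phase block.

    predicted and truth give, for each heterozygous site in genomic order,
    the allele (0 or 1) placed on haplotype 1. In this simplified sketch, a
    flip at the first or last site of a block is counted as a long switch.
    """
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    # True where the predicted phase disagrees with the truth at that site.
    flipped = [p != t for p, t in zip(predicted, truth)]
    short = long_ = 0
    i = 1
    while i < len(flipped):
        if flipped[i] != flipped[i - 1]:
            # Orientation changes between sites i-1 and i.
            if i + 1 < len(flipped) and flipped[i + 1] == flipped[i - 1]:
                short += 1  # only site i is flipped: a point (short) error
                i += 2      # skip the change back at site i+1
                continue
            long_ += 1      # orientation stays changed: a long switch
        i += 1
    return short, long_


def switch_error_rates(predicted, truth):
    """Normalize switch counts by the number of adjacent site pairs."""
    short, long_ = switch_errors(predicted, truth)
    pairs = max(len(predicted) - 1, 1)
    return short / pairs, long_ / pairs
```

Haplotype labels are arbitrary, so a block that is phased entirely in the opposite orientation has no errors. Only changes in orientation along the block are counted.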
Copyright ©2019 BGI. The BGI logo is a trademark of BGI. All rights reserved. All brand and product names are trademarks or registered trademarks of their respective holders. Information, descriptions and specifications in this publication are subject to change without notice. DNBSEQ is a trademark of MGI Co., Ltd.
Published September 2019.
Request Information or Quotation
Contact your BGI account representative for the most affordable rates in the industry, to discuss how we can meet your specific project requirements, or for expert advice on experiment design, from sample to bioinformatics.
www.bgi.com
We Sequence, You Discover
All Services and Solutions are for research use only.
Perfect coverage of highly homologous regions
Spinal Muscular Atrophy (SMA) is an autosomal recessive neuromuscular disease, commonly characterized by muscle weakness, low muscle tone and a reduced ability to clear sputum. There is currently no cure for this disease. Genetic studies have confirmed that SMA is closely associated with mutations in SMN1. SMN2 is a highly homologous paralog of SMN1, and the two genes are distinguished by differences in exons 7 and 8. At the DNA level, SMN1 and SMN2 differ by only five bases, and only two of these lie in the coding region. This makes the two genes impossible to resolve with regular WGS. With its DNA co-barcoding strategy, lfrWGS enables analysis of regions that are difficult for regular WGS.
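Co-barcoding can help in paralogous regions such as SMN1/SMN2. A read that aligns equally well to both genes often shares its barcode with other reads that map uniquely in the surrounding sequence. The sketch below illustrates that idea only. It assumes a hypothetical table of unique "anchor" placements per barcode, with locus labels such as "SMN1" standing for the region around each gene copy. Real barcode-aware aligners use probabilistic models, and this is not the lfrWGS analysis pipeline.

```python
from collections import Counter


def place_ambiguous_read(barcode, candidate_loci, anchors):
    """Choose a locus for a multi-mapping read using its barcode.

    anchors maps each barcode to the loci where its other reads map uniquely
    (hypothetical input). Returns the candidate locus with the most anchor
    support, or None if there is no support or the top loci are tied.
    """
    support = Counter(
        locus for locus in anchors.get(barcode, []) if locus in candidate_loci
    )
    if not support:
        return None
    ranked = support.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # ambiguous: equal support for the two best loci
    return ranked[0][0]
```

Returning None for ties keeps the sketch conservative. A read stays unplaced rather than being assigned to a gene copy without evidence.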
Conclusion
Benefiting from co-barcoding technology, DNBseq™ lfrWGS labels more than 85% of long DNA fragments with a single unique barcode, and up to 20% of long fragments reach 300 kb in length. Importantly, this is achieved without any amplification of the initial long DNA fragments, which limits the representation bias introduced by PCR amplification.
Variant calling with lfrWGS is high-quality and reproducible. Co-barcoding also enables advanced informatics applications. These include near-complete phasing of the genome into long contigs with extremely low error rates, detection of SVs, and impressive coverage of highly homologous regions. Together, these make lfrWGS a promising technology to fill the gap between short-read and long-read sequencing.
Figure 6. Accessing regions that are difficult for regular WGS. lfrWGS (upper panels) successfully sequenced and resolved the SMN1 gene (left panels), the culprit gene in SMA, and its highly homologous counterpart SMN2 (right panels). These regions are inaccessible to regular WGS (lower panels).
References
[1] Wang, O., et al., Efficient and unique cobarcoding of second-generation sequencing reads from long DNA molecules enabling cost-effective and accurate sequencing, haplotyping, and de novo assembly. Genome Res, 2019. 29(5): p. 798-808.
[2] Peters, B.A., J. Liu, and R. Drmanac, Co-barcoded sequence reads from long DNA fragments: a cost-effective solution for "perfect genome" sequencing. Front Genet, 2014. 5: p. 466.
[3] Edge, P., V. Bafna, and V. Bansal, HapCUT2: robust and accurate haplotype assembly for diverse sequencing technologies. Genome Res, 2017. 27(5): p. 801-812.
[4] Zhang, F., et al., Haplotype phasing of whole human genomes using bead-based barcode partitioning in a single tube. Nat Biotechnol, 2017. 35(9): p. 852-857.