Post on 26-May-2021
CERGENTIS B.V. YALELAAN 62 3584 CM UTRECHT THE NETHERLANDS 0031 - (0)30 - 760 16 36 INFO@CERGENTIS.COM WWW.CERGENTIS.COM
TLA TECHNOLOGY & TARGETED COMPLETE NGS SEQUENCING
Features & applications of the human genomic DNA TLA protocol
INTRODUCTION
Cergentis’ TLA Technology:
• enables targeted complete gene sequencing.
• requires one primer pair complementary to a short locus-specific sequence.
• detects all Single Nucleotide Variants and Structural Variants.
• enables haplotyping.
• is easy to execute with standard laboratory equipment.
The original TLA protocol (Nature Biotechnology, 2014; see footnote 1) requires cells as input material. This application note describes the features and applications of the TLA protocol for isolated human genomic DNA (gDNA).
LABORATORY AND INPUT DNA REQUIREMENTS
The gDNA TLA protocol requires only standard laboratory equipment. The protocol is straightforward and can be performed in 2 days.
TLA analyses on DNA isolated with standard protocols enable the amplification and sequencing of >70 kb per primer pair. High Molecular Weight DNA results in coverage across a larger sequence per TLA amplification. Multiplex TLA amplifications can be performed to sequence multiple or larger loci.
The current protocol requires 5 µg of DNA. Smaller amounts of DNA (>10 ng) can be amplified with Whole Genome Amplification prior to a TLA analysis. TLA primers can be designed and ordered quickly from any oligonucleotide manufacturer. TLA thus enables both routine screening and the flexible targeted sequencing of individual loci in individual samples.
TLA TECHNOLOGY
1 http://www.nature.com/nbt/journal/v32/n10/full/nbt.2959.html
Figure 1. Overview of TLA-based amplification and sequencing of a locus of interest. TLA amplifications use one primer pair complementary to a short locus-specific sequence. Generated NGS sequencing coverage (i.e. the number of NGS sequencing reads) is highest in the immediate vicinity of the locus-specific sequence and declines with greater physical distance from it.
[Figure 1 labels: sequencing coverage; PCR primers; locus-specific sequence; 20 - 50 kb on either side of the locus-specific sequence; locus]
Figure 2. A summary of the TLA Technology.
First, genomic DNA is crosslinked. Crosslinking preferentially occurs between sequences in extreme physical proximity and therefore links sequences from the same locus (depicted in red).
The crosslinked DNA is fragmented, religated with a ligase enzyme and then decrosslinked. This results in TLA Template: long stretches of DNA consisting of religated DNA fragments originating from the same locus.
This template is fragmented and circularised. Stochastic variation in the folding, crosslinking and religation of DNA fragments in individual copies of a locus results in a repertoire of DNA circles that are composed of unique combinations of DNA fragments from that locus.
Circular fragments originating from the locus of interest are amplified with inverse primers complementary to a short locus-specific sequence.
As a result, the complete locus is amplified and can be sequenced using Next Generation Sequencing technologies.
In this manner the TLA Technology enables targeted hypothesis-neutral sequencing. It detects all sequence and structural variants in loci of interest, also in heterogeneous samples such as tumours.
The TLA Technology permits multiplexing. Multiple loci can be amplified in multiplex and/or in multiple individual amplifications.
COMPLETE BRCA1 AND SERPINA1 GENE SEQUENCING
Figure 3 shows the results of TLA-based targeted sequencing of the BRCA1 and SERPINA1 genes on DNA isolated from the NA12878 cell line (see footnote 2).
Four TLA amplifications were performed across the BRCA1 gene and two across the SERPINA1 gene. Libraries were prepared from the TLA amplicons with Illumina® Nextera XT™ and sequenced on the Illumina MiniSeq® with 150 bp paired-end reads. The reads were mapped with BWA-SW, and Single Nucleotide Variants (SNVs) were called with samtools mpileup. Identified variants were compared to the public NA12878 genome sequence.
2 http://www.nist.gov/mml/bbd/ppgenomeinabottle2.cfm Data release version Pedigreev0.2
[Figure 3 panels: BRCA1 (position on chr17, 41.20 to 41.28; scale bar 10 kb) and SERPINA1 (position on chr14, 94.844 to 94.856; scale bar 2 kb). Y-axes: NGS coverage depth (0 to 200) and allele frequency [%] (0 to 100).]
Table 1
                      BRCA1      BRCA1      SERPINA1   SERPINA1
                      Exons      Introns    Exons      Introns
Region size [bp]      7,362      73,827     3,687      10,259
Bases covered [%]     100        99.533     100        100
Bases >10X [%]        100        99.149     100        100
Bases >20X [%]        100        98.843     100        100
Bases >30X [%]        99.946     98.639     100        100
Bases >50X [%]        98.343     98.148     100        100
Bases >100X [%]       84.651     93.813     97.261     97.895
Min coverage          22x        0x         75x        68x
Median coverage       744.5x     435x       1017x      317x

Table 2
                      BRCA1          SERPINA1
Gene size [bp]        81,189         13,946
Total reads           3,000,000      500,000
Mapped reads          2,976,887      496,580
Mapped reads [%]      99.230         99.316
Total mapped bases    722,614,711    129,059,094
Bases on target       398,756,872    70,990,884
Bases on target [%]   55.182         55.006
Tables 1 and 2: Statistics of generated sequencing data across BRCA1 and SERPINA1.
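As a rough illustration of how statistics of this kind can be derived, the sketch below computes Table 1 style summaries from a list of per-base depths and recomputes the Table 2 percentages from the published counts. The function names, the toy depth values and the strict "greater than" reading of the "Bases >NX" rows are assumptions made for this example; the note does not describe the software used to produce the tables.

```python
from statistics import median


def coverage_summary(depths, thresholds=(10, 20, 30, 50, 100)):
    """Summarise per-base sequencing depths in the style of Table 1.

    `depths` holds one integer depth per base of the region.
    Thresholds are applied as strictly greater than (an assumption).
    """
    n = len(depths)
    summary = {
        "region_size_bp": n,
        "bases_covered_pct": 100.0 * sum(d > 0 for d in depths) / n,
        "min_coverage": min(depths),
        "median_coverage": median(depths),
    }
    for t in thresholds:
        summary[f"bases_gt_{t}x_pct"] = 100.0 * sum(d > t for d in depths) / n
    return summary


def percent(part, whole):
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


# Table 2 counts as published in this note.
brca1_on_target = percent(398_756_872, 722_614_711)
serpina1_on_target = percent(70_990_884, 129_059_094)
brca1_mapped = percent(2_976_887, 3_000_000)
serpina1_mapped = percent(496_580, 500_000)

# Hypothetical eight-base region, for illustration only.
toy_depths = [0, 5, 12, 25, 40, 60, 120, 150]
toy = coverage_summary(toy_depths)

if __name__ == "__main__":
    print(f"BRCA1 bases on target: {brca1_on_target:.3f}%")
    print(f"SERPINA1 bases on target: {serpina1_on_target:.3f}%")
    print(toy)
```

In practice the per-base depths would come from a depth report of the aligned reads rather than from a hand-written list.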
Figure 3. A) Coverage profiles generated across BRCA1 and SERPINA1. White arrows indicate the positions of the TLA primer pairs. B) Allelic ratios of all known SNVs in BRCA1 and SERPINA1 (blue = heterozygous, red = homozygous).
One AT-rich sequence of 347 bp in BRCA1 was not sequenced because it is not successfully prepped and/or sequenced with the cited combination of Illumina library prep and sequencing. Otherwise, complete sequence information was obtained across the entire human BRCA1 and SERPINA1 genes, and all previously identified SNVs (110 in BRCA1 and 27 in SERPINA1) were identified with the correct zygosity (Figure 3). Coverage statistics are shown in Tables 1 and 2.
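The zygosity call behind a plot like Figure 3B can be sketched as a simple threshold on the alternative-allele frequency at each SNV. The cut-offs below (25 to 75% for heterozygous, 90% and above for homozygous) and the function names are hypothetical values chosen for illustration; the note does not state which thresholds were applied.

```python
def allele_frequency(alt_reads, total_reads):
    """Percentage of reads at a position that carry the alternative allele."""
    if total_reads == 0:
        raise ValueError("no coverage at this position")
    return 100.0 * alt_reads / total_reads


def classify_zygosity(freq_pct, het_range=(25.0, 75.0), hom_min=90.0):
    """Classify an SNV by its alternative-allele frequency.

    The thresholds are illustrative assumptions, not values from the note.
    """
    if freq_pct >= hom_min:
        return "homozygous"
    if het_range[0] <= freq_pct <= het_range[1]:
        return "heterozygous"
    return "ambiguous"


if __name__ == "__main__":
    # Hypothetical read counts (alternative reads, total reads).
    for alt, total in [(48, 100), (99, 100), (10, 100)]:
        freq = allele_frequency(alt, total)
        print(alt, total, classify_zygosity(freq))
```

Frequencies that fall between the two bands are reported as ambiguous rather than forced into a call, which keeps low-coverage or skewed positions visible for manual review.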
COMPLETE SEQUENCING OF SMALL AND LARGE STRUCTURAL VARIANTS
TLA is well suited to completely sequencing both small and large structural changes in genes of interest. Figure 4 shows the results of a TLA analysis across the T cell receptor alpha-delta (TCRAD) locus in a sample harbouring a chr8-chr14 translocation. Figure 5 shows the results of targeted sequencing of BRCA1 and NRXN1 in patient samples (see footnotes 3 and 4).
3 Kind permission to share generated data was given by Dr. Andreas Rump of the Institut für Klinische Genetik, Dresden, Germany.
4 Kind permission to share generated data was given by Prof. Dr. Hilde Peeters of the Centre for Human Genetics, Leuven, Belgium.
[Figure 4 panels: whole-genome coverage per chromosome (chr1 to chrX; genomic position 0 to 240 Mb) and locus-specific NGS coverage depth (0 to 200) across chr8 (129.10 to 129.16 Mb) and chr14 (22.98 to 23.04 Mb); gene labels DAD1 and PVT1; scale bars 10 kb. Breakpoint sequence as printed: CATGTAAGTGATGAGAGGAGAT GAACCTTGGGGGGCA GGATAGCAACTATCAGTTAATCTGG]
Figure 4. A) Whole-genome coverage plot and B) locus-specific coverage plots generated with a TLA primer pair (white arrow) in a sample with a chr8-chr14 translocation. Peaks in coverage across the TCRAD locus and fusion partner are encircled in red. The identified breakpoint sequence is specified.
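The idea behind the whole-genome coverage plot can be sketched as follows: bin the mapped reads across the genome, then look for bins with high coverage on a chromosome other than the one carrying the primer pair. The function name, the data layout, the depth threshold and the toy values below are all assumptions for illustration; they are not taken from the analysis described in this note.

```python
def candidate_fusion_bins(bin_coverage, target_chrom, min_depth=50):
    """Return bins with high coverage outside the targeted chromosome.

    `bin_coverage` maps (chromosome, bin_start) to a mean depth.
    `min_depth` is an arbitrary illustrative threshold.
    """
    return sorted(
        (chrom, start)
        for (chrom, start), depth in bin_coverage.items()
        if chrom != target_chrom and depth >= min_depth
    )


# Hypothetical binned coverage; chr14 is assumed to carry the primer pair.
toy_bins = {
    ("chr14", 22_980_000): 180.0,
    ("chr14", 23_000_000): 150.0,
    ("chr8", 129_100_000): 1.0,
    ("chr8", 129_120_000): 120.0,
    ("chr8", 129_140_000): 95.0,
    ("chr2", 50_000_000): 0.5,
}

hits = candidate_fusion_bins(toy_bins, target_chrom="chr14")

if __name__ == "__main__":
    for chrom, start in hits:
        print(f"candidate fusion partner bin: {chrom}:{start}")
```

A coverage peak found this way only points to a candidate region; the breakpoint itself still has to be confirmed from reads that span the junction.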
[Figure 5 panels: NGS coverage depth across BRCA1 (genomic position chr17, 41.16 to 41.28 Mb; neighbouring gene labels RND2, VAT1 and IFI35) and across an NRXN1 intron (genomic position chr2, 50.93 to 50.98 Mb); Log R ratio (-0.8 to 0.2) across NRXN1; scale bars 10 kb. Breakpoint sequences as printed: ACCCCCGCCTCCCAGGTTCAGGCGATTCTCC and GAGATTTTTAAATCAGAGT ... AGATTTAA AAACAGAGATT]
Figure 5. A) IGV coverage profiles across deletions in the NRXN1 and BRCA1 genes, generated with one TLA primer pair per sample (white arrows). The position of each deletion and the sequence of the identified deletion breakpoints are shown. The breakpoint of the NRXN1 deletion contained a sequence insertion, which is shown in black. B) Log R ratios across the same deletion in the NRXN1 gene generated using an SNV microarray.
[Figure 7 panel C: NGS coverage versus position on chr17 (41,200,000 to 41,260,000) for all Illumina paired-end reads and for the reads assigned to each allele.]

Figure 7 panel D
Coverage assigned to allele 1    >0x      >=10x
exons + introns                  97.4     95.3
exons                            100      99.6

Coverage assigned to allele 2    >0x      >=10x
exons + introns                  98.1     96.4
exons                            99.9     99.4
PHASING USING SHORT READ PAIRED-END SEQUENCING
In combination with paired-end Illumina sequencing, the unique composition of the TLA reads enables phasing across large distances and the assignment of sequencing reads to their allele of origin.
Figure 7 shows the results of sequencing and phasing of the BRCA1 gene in the NA12878 cell line. The sample was sequenced using paired-end Illumina sequencing (4 million reads, 2 x 150 bp). Of the 110 known SNVs, 109 are heterozygous. All 109 heterozygous SNVs were phased using TLA data.
Once the heterozygous SNVs have been phased, each NGS read containing one of these SNVs can be assigned to the allele it derives from. Dividing the reads by allele of origin resulted in complete coverage of both alleles (apart from the AT stretch that is missed in each experiment). This shows that both alleles of the gene were captured and sequenced.
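The principle of linking SNVs through read pairs and then sorting reads by allele can be sketched in a few lines. This is a deliberately simplified toy model, not the method used to generate Figure 7: each fragment (read pair) is reduced to the alleles it shows at known heterozygous SNV positions (0 = reference, 1 = alternative), pairs of SNVs seen together vote on whether their alternative alleles sit on the same haplotype, and the votes are propagated across the locus. All names and the toy data are hypothetical, and conflicting evidence is resolved only by simple majority.

```python
from collections import defaultdict, deque


def phase_snvs(fragments):
    """Phase heterozygous SNVs from co-observations within fragments.

    Returns {position: 0 or 1}; within one connected group of SNVs, positions
    with the same value carry their alternative alleles on the same haplotype.
    Phase is only relative within each connected group.
    """
    votes = defaultdict(int)  # (pos_a, pos_b) -> same-phase minus opposite-phase votes
    for frag in fragments:
        positions = sorted(frag)
        for i, a in enumerate(positions):
            for b in positions[i + 1:]:
                votes[(a, b)] += 1 if frag[a] == frag[b] else -1

    graph = defaultdict(list)
    for (a, b), v in votes.items():
        if v != 0:  # ties carry no phase information
            graph[a].append((b, v > 0))
            graph[b].append((a, v > 0))

    phase = {}
    for start in sorted(graph):
        if start in phase:
            continue
        phase[start] = 0
        queue = deque([start])
        while queue:  # breadth-first propagation of relative phase
            a = queue.popleft()
            for b, same in graph[a]:
                if b not in phase:
                    phase[b] = phase[a] if same else 1 - phase[a]
                    queue.append(b)
    return phase


def assign_fragment(frag, phase):
    """Assign a fragment to allele 1 or 2 by majority vote; None if undecided."""
    score = 0
    for pos, allele in frag.items():
        if pos in phase:
            hap = phase[pos] if allele == 1 else 1 - phase[pos]
            score += 1 if hap == 0 else -1
    if score > 0:
        return 1
    if score < 0:
        return 2
    return None


# Hypothetical fragments covering three heterozygous SNVs.
fragments = [
    {100: 1, 200: 0},
    {200: 0, 300: 1},
    {100: 0, 200: 1},
    {200: 1, 300: 0},
    {100: 1, 300: 1},
]
phase = phase_snvs(fragments)
assignments = [assign_fragment(f, phase) for f in fragments]

if __name__ == "__main__":
    print(phase)
    print(assignments)
```

Fragments that contain no phased SNV, or that carry balanced conflicting evidence, are left unassigned rather than guessed.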
Figure 7. A) Principle of TLA-based phasing using paired-end Illumina sequencing data: SNVs found in the same read pair can be assigned to the same allele. B) Resulting phasing of heterozygous SNVs across the entire BRCA1 gene. C) Sequencing coverage and identified SNVs across the BRCA1 locus in the complete NGS data and in reads assigned to each of the individual alleles. D) Coverage statistics across each allele.
[Figure 6 axes: allele frequency [%] (5, 25, 50, 75, 95) versus genomic position on chr17 (41.21 to 41.26 Mb); one panel per mixing ratio; scale bar 10 kb.]
Figure 6. Allelic frequencies of SNVs across BRCA1 in five serial dilutions of two homozygous cell lines. The blue arrow indicates the position of the single TLA primer pair.
TARGETED DEEP SEQUENCING
TLA-based deep sequencing enables the detection of rare variants. Figure 6 shows the allelic frequency of SNVs sequenced in homozygous cell lines mixed in 95/5, 75/25, 50/50, 25/75 and 5/95 ratios, in which BRCA1 was sequenced with a single TLA primer pair.
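To make the link between mixing ratio, sequencing depth and detectability concrete, the sketch below computes the expected allele frequency in a mixture of two homozygous cell lines and the binomial probability of seeing at least a given number of variant reads. It assumes equal copy number and unbiased amplification of both cell lines and ignores sequencing error; the function names, the depth of 200 and the five-read cut-off are illustrative assumptions, not parameters from the experiment.

```python
from math import comb


def expected_alt_frequency(fraction_a, alt_in_a=True):
    """Expected alternative-allele frequency when cell line A makes up
    `fraction_a` of the mixture and the SNV is homozygous alternative in
    exactly one of the two lines."""
    return fraction_a if alt_in_a else 1.0 - fraction_a


def prob_at_least(k, depth, freq):
    """P(X >= k) for X ~ Binomial(depth, freq): the chance of observing at
    least k variant reads at the given depth and true allele frequency."""
    below = sum(
        comb(depth, i) * freq**i * (1.0 - freq) ** (depth - i) for i in range(k)
    )
    return 1.0 - below


if __name__ == "__main__":
    for fraction in (0.95, 0.75, 0.50, 0.25, 0.05):
        freq = expected_alt_frequency(fraction)
        p = prob_at_least(5, 200, freq)
        print(f"mix {fraction:.2f}: expected AF {freq:.2f}, P(>=5 alt reads at 200x) = {p:.3f}")
```

Under these assumptions, deeper coverage raises the probability that a variant present at 5% is supported by enough reads to be distinguished from noise.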
[Figure 8 panel B: NGS coverage versus position on chr17 (41,200,000 to 41,260,000) for all SMRT reads and for the reads assigned to each allele.]
CONCLUSION
TLA-based targeted sequencing presents unique opportunities for the targeted complete sequencing of genes of interest. It enables the (deep) sequencing of entire exonic and intronic regions and, as such, the detection of all Single Nucleotide Variants and Structural Variants. In combination with both short and long read Next Generation Sequencing technologies, TLA also enables the phasing of regions of interest.
The TLA protocol is easy to execute and is suited to both routine analyses and the highly flexible targeted sequencing of individual regions of interest in individual samples.
Links:
www.cergentis.com
http://www.nature.com/nbt/journal/v32/n10/full/nbt.2959.html
http://www.pacb.com/wp-content/uploads/AppNote-Targeted-Sequencing-Chromosomal-Haplotype-Assembly-Cergentis-TLA-Technology-SMRT-Sequencing.pdf
Illumina, Nextera and MiniSeq are trademarks or registered trademarks of Illumina, Inc. Pacific Biosciences, PacBio, and SMRT are trademarks of Pacific Biosciences. All other trademarks are the sole property of their respective owners. For Research Use Only. Not for use in diagnostic procedures. Information in this document is subject to change without notice. Cergentis assumes no responsibility for any errors or omissions in this document.
TLA, SMRT® SEQUENCING AND PHASING
Phasing is particularly effective in combination with long read sequencing technologies. Pacific Biosciences SMRT-based sequencing enables the sequencing of entire TLA amplicons and therefore the phasing of all sequences that occur within one TLA amplicon (Figure 8).
In this experiment a single TLA amplification of the BRCA1 gene was performed in the NA12878 cell line and sequenced using SMRT sequencing (79,323 CCS reads). Of the 109 heterozygous SNVs in BRCA1, 107 were captured and phased. Based on the phasing information, the SMRT reads could be assigned to their allele of origin. This showed that 99% of both alleles were sequenced.
Figure 8. A) Principle of TLA-based phasing using Pacific Biosciences SMRT sequencing. B) Sequencing coverage and identified SNVs across the BRCA1 locus in all data and in reads assigned to each of the individual alleles.