Fully phased, allele-level sequencing of highly polymorphic HLA genes is greatly facilitated by SMRT® Sequencing technology. In the present work, we evaluated multiple DNA barcoding strategies for multiplexing several loci from multiple individuals, using three different tagging methods. Specifically, the MHC class I genes HLA-A, -B, and -C were indexed with DNA barcodes via either tailed primers or barcoded SMRTbell™ adapters. Eight different 16-bp barcode sequences were used in symmetric and asymmetric pairings. Eight DNA-barcoded adapters in symmetric pairing were independently ligated to a pool of HLA-A, -B, and -C amplicons from eight different individuals, one individual at a time, and the libraries were pooled for sequencing on a single SMRT Cell. Amplicons generated from barcoded primers were pooled upfront for library generation. Eight symmetric barcoded primers were generated for the HLA class I genes. These primers facilitated multiplexing of 8 samples and also allowed generation of unique asymmetric pairings for simultaneous amplification from 28 reference genomic DNA samples. The data generated by all three methods were analyzed using the LAA protocol in SMRT Analysis v2.3. The resulting consensus sequences were typed using GenDx NGSengine HLA-typing software.
Evaluation of multiplexing strategies for HLA genotyping using PacBio® Sequencing technology
HLA Sequencing on PacBio® RS II
Figure 1. Allele Sequence Coverage Comparison. A: HLA-A amplified using NGSgo reagents and sequenced on the PacBio RS II. B: Traditional Sanger sequencing of exons 2 and 3.
Conclusions
• The long read lengths and high consensus accuracy of SMRT Sequencing make it well suited for analyzing HLA loci.
• In this pilot study we demonstrated that the PacBio RS II sequencing platform is capable of typing HLA class I alleles with high accuracy.
• Robust amplification performance was shown for symmetric and asymmetric barcode-labelled amplification primers.
• High-resolution typing results at the 3rd to 4th field level were obtained, concordant with pre-typing results. Typing inconsistencies were rare and occurred only in cases of unbalanced heterozygous amplification.
• Phased consensus sequences for complex HLA class I multiplex sequences over the entire length of an amplicon were obtained using NGSengine (GenDx).
Figure 2. HLA-B Diversity due to Exon Combinations. Numbers above exons denote the number of unique CDS exons, while numbers between exons denote the number of unique combinations with neighboring exons.
SMRT® Sequencing Chemistry
Figure 3. SMRT Sequencing Read-Length Distribution. Distribution of read lengths from a typical SMRT Sequencing run on a PacBio® RS II using the P5/C3 chemistry, with median read lengths >8 kb and maximum read lengths up to 30 kb. Full-length HLA genes are sequenced and correctly phased without cloning or manual curation.
Figure 4. SMRT Sequencing Consensus Concordance. Concordance of consensus sequences by average genome coverage from SMRT Sequencing using the P5/C3 chemistry. When sequencing errors are truly random, consensus accuracy depends only on having sufficient coverage.
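As a rough illustration of the coverage argument in Figure 4, the sketch below treats each read as an independent vote at a single position and computes the probability that a simple majority vote is wrong. The per-read error rate of 0.15, the majority-vote rule, and the function name are assumptions for illustration only. They are not taken from the poster, and Quiver uses a more elaborate model than a plain vote.

```python
from math import comb


def majority_error(coverage: int, per_read_error: float) -> float:
    """Probability that at least half of `coverage` independent reads
    are wrong at one position (ties are counted as errors)."""
    p = per_read_error
    first_bad = (coverage + 1) // 2  # smallest wrong-read count that is at least half
    return sum(
        comb(coverage, k) * p**k * (1 - p) ** (coverage - k)
        for k in range(first_bad, coverage + 1)
    )


if __name__ == "__main__":
    # Hypothetical per-read error rate; odd coverages avoid ties.
    for cov in (1, 3, 11, 31):
        print(cov, majority_error(cov, 0.15))
```

Under this simplified model the consensus error falls quickly as coverage grows, which is the behaviour the caption describes for random errors.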
Multiplexing Strategies Long Amplicon Analysis
Figure 8. HLA typing score for multiple library preparation strategies using NGSengine software. Three different libraries were run on the PacBio RS II system and typed using GenDx NGSengine typing software. High-resolution (3rd and 4th field) typings were obtained that are concordant with the pre-typings. The HLA-A score was 100% for all methods. The HLA-B score was 100%, except for the asymmetric set (96%) due to n=1 missed B*08 allele. The HLA-C score was 100% (unlabeled), 88% (barcode-labeled, symm), and 93% (barcode-labeled, asymm) due to n=1 missed C*03:04 allele.
Results
References
[1] Robinson, James, et al. “The IMGT/HLA database.” Nucleic Acids Research 41.D1 (2013): D1222-D1227.
[2] Chin, Chen-Shan, et al. “Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.” Nature Methods 10.6 (2013): 563-569.
[3] https://github.com/bnbowman/HlaTools
Swati Ranade1, Kevin Eng1, John Harting1, Erik Rozemuller2, Nienke Westerink2, Brett Bowman1, Lance Hepler1, Maarten T Penning2
1 Pacific Biosciences of California, Inc., Menlo Park, United States of America
2 GenDx, Utrecht, Netherlands
Figure 6. Diagram of Long Amplicon Analysis. Barcoded subreads are grouped by barcode pair and processed independently within each group. Subreads are filtered based on user-definable criteria for read quality and length. Subreads that pass all filters are aligned to each other and clustered based on the results. Each cluster is iteratively “phased” by identifying and separating subreads based on high-scoring mutations in a de novo pipeline. Each resulting sub-cluster is polished with Quiver to generate a high-quality consensus [2]. The consensus sequences are filtered to remove PCR artifacts.
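The first two steps described for Figure 6, grouping by barcode pair and filtering on quality and length, could be sketched as follows. This is a minimal illustration and not the SMRT Analysis implementation. The `Subread` type, the function name, the threshold values, and the choice to treat a barcode pair as unordered are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Subread:
    """Hypothetical record for one barcoded subread."""
    name: str
    barcode_pair: tuple  # (barcode at one end, barcode at the other end)
    sequence: str
    read_quality: float


def group_and_filter(subreads, min_length=3000, min_quality=0.80):
    """Group subreads by barcode pair, keeping only those that pass
    the (assumed) length and quality thresholds."""
    groups = defaultdict(list)
    for sr in subreads:
        if len(sr.sequence) < min_length or sr.read_quality < min_quality:
            continue  # fails a user-defined filter
        # Assumption: a molecule may be read in either orientation,
        # so the pair is sorted to make the grouping key order-independent.
        key = tuple(sorted(sr.barcode_pair))
        groups[key].append(sr)
    return dict(groups)
```

Each resulting group would then go through overlap, clustering, phasing, and consensus polishing independently, as the caption describes.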
Figure 7. Amplification performance for barcoded amplification primers for HLA-A, -B, and -C. Locus-specific amplification of HLA-A, -B, and -C was performed with unlabeled primers (Method I) and barcode-labelled amplification primers (Method II, symmetric vs. asymmetric). Amplification performance was scored for robustness (%) and balanced allele fraction as determined by Sanger SBT.
Figure 5. Multiplexing Strategies and Workflows.
A Symmetric SMRTbell™ barcodes: These barcodes are attached to the adapters. This is the least preferred barcoding method because libraries must be prepared separately before pooling.
B Symmetric primer barcodes: Primers are tagged with symmetric barcodes in both forward and reverse orientation.
C Asymmetric primer barcodes: Primers are tagged with asymmetric barcodes.
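The multiplexing capacity of the two primer barcoding schemes can be checked with a short sketch. It assumes that an asymmetric pairing is an unordered pair of two distinct barcodes, which is consistent with the 28 samples reported for eight barcodes. The barcode names and the function name are placeholders, not the actual 16-bp sequences or any PacBio tool.

```python
from itertools import combinations


def barcode_pairings(barcodes):
    """Return (symmetric, asymmetric) pairings for a list of barcodes.

    Symmetric: the same barcode on both primers.
    Asymmetric (assumed): unordered pairs of two different barcodes.
    """
    symmetric = [(b, b) for b in barcodes]
    asymmetric = list(combinations(barcodes, 2))
    return symmetric, asymmetric


barcodes = [f"bc{i}" for i in range(1, 9)]  # eight placeholder barcode names
symmetric, asymmetric = barcode_pairings(barcodes)
print(len(symmetric), len(asymmetric))  # 8 28
```

With eight barcodes this gives 8 symmetric pairings and 8 × 7 / 2 = 28 asymmetric pairings.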
Recombination events also add to diversity in HLA genes (2). Exon-only sequencing is therefore insufficient for resolving variation in new alleles, which is sometimes caused by mutations occurring outside exons 2 and 3 or outside the CDS region. Fully phased, allele-level genotyping, with phasing across exons and introns in a single read span for accurate SNP determination, is highly advantageous.
Abstract
One patient, one barcode for all HLA genes
Method 1: Barcoded SMRTbell Adapters
Method 2: Barcoded Amplification Primers
[Chart: Typing score (NGSengine), scale 0 to 100, for HLA-A, HLA-B, and HLA-C; categories: Unlabelled, Labelled (Symm), Labelled (Asymm)]
The primary cause of missed alleles with both symmetric and asymmetric barcode-labeled primers was amplification imbalance.
Figure 9. Example of HLA-A, -B, and -C typing results as generated with NGSengine. The locations of the 5' UTR and 3' UTR (blue), exons (yellow), and introns (connecting black lines) are shown. The vertical colored bars indicate heterozygous positions. Full phasing across the entire gene was obtained (bold horizontal red line). High-resolution typing without exon mismatches is demonstrated.
For Research Use Only. Not for use in diagnostic procedures. Pacific Biosciences, PacBio, SMRT, SMRTbell and Iso-Seq are trademarks of Pacific Biosciences of California, Inc. GenDx, NGSgo, NGSengine are trademarks of Genome Diagnostics b.v. All other trademarks are the property of their respective owners. 2014 Copyright GenDx, all rights reserved.
[Figure 1 panels: A. Full Length + SMRT Sequencing; B. Traditional SBT; position axis 1,000 kb to 6,000 kb]
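Unique CDS exons: exon 1: 42; exon 2: 1210; exon 3: 1591; exon 4: 168; exon 5: 36; exon 6: 3
Unique combinations of neighboring exons: exons 1-2: 1836; exons 2-3: 4237; exons 3-4: 1275; exons 4-5: 819; exons 5-6: 62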
[Chart: Amplification performance, scale 0 to 100, for HLA-A, HLA-B, and HLA-C; categories: Unlabelled, Labelled (Symm), Labelled (Asymm)]
[Chart: Balanced amplification, scale 0 to 100, for HLA-A, HLA-B, and HLA-C; categories: Unlabelled, Labelled (Symm), Labelled (Asymm)]
[Workflow diagram labels: HLA-A, HLA-B, HLA-C; Asymmetric Primers, Symmetric Primers, Symmetric Adapters; Separate by Barcode; Filter Subreads; Overlap; Cluster; Phase; Quiver; Filter Consensus Sequences; Report]