Date post: 12-Aug-2020
Characterizing haplotype diversity at the immunoglobulin heavy chain locus across human populations using novel long-read sequencing and assembly approaches

Corey T Watson 1*, Melissa Laird Smith 2*, William Gibson 2, Gintaras Deikus 2, Oscar Rodriguez 2, Maya Strahl 2, Matthew Pendleton 2, Phillip Comella 2, Lana Harshman 3, Wayne Marasco 4,5, Evan E. Eichler 3, Robert Sebra 2, Jonas Korlach 6, Andrew J. Sharp 2, Ali Bashir 2

1 Department of Biochemistry and Molecular Genetics, University of Louisville School of Medicine, Louisville, KY; 2 Icahn School of Medicine at Mount Sinai & Icahn Institute for Genomics and Multi-scale Biology, New York, NY; 3 Department of Genome Sciences, University of Washington, Seattle, WA; 4 Department of Cancer Immunology & AIDS, Dana-Farber Cancer Institute, Boston, MA; 5 Department of Medicine, Harvard Medical School, Boston, MA; 6 Pacific Biosciences, Menlo Park, CA; *Contributed Equally

Background

The human immunoglobulin (IG) gene regions are among the most structurally complex and polymorphic regions of the human genome.
→ IG loci consist of duplicated variable (V), diversity (D), joining (J), and constant (C) genes that recombine in B cells to produce an individual’s expressed antibody (Ab) repertoire (Figure 1A).
→ The IG heavy chain (IGH) locus harbors ~50-60 IGHV, 23 IGHD, 6 IGHJ, and 9 IGHC functional/ORF genes, with >250 known coding alleles (and counting!) (Figure 1A).
→ IGH is highly enriched for large, complex structural and copy number variants (SVs; CNVs) up to 75 Kb in size, including insertions, deletions, and duplications (Figure 1A).
→ Known coding single nucleotide polymorphisms (SNPs) and SVs/CNVs show considerable variation and evidence of selection between human populations (Figure 1B, 1C).
→ Extreme haplotype diversity has hindered the use of high-throughput genomic assays in the region (Figure 1D).
→ However, in instances where IGH variants have been explicitly investigated in clinical cohorts, they associate with functional phenotypes (Figure 1E).

Figure 1. (A) IMGT map of the IGH locus on chromosome 14 [1], depicting IG functional, ORF, and pseudogenes, as well as alternate structural haplotypes. (B) Alternate haplotypes in the IGHV3-30 region that contain large insertions and deletions [1,2]. (C) Inter-population variability observed for functional polymorphisms in the IGHV1-69 region [2,3]. (D) GWAS arrays have low regional SNP density and poorly represent variants in the IGHV gene cluster [4]. (E) Germline polymorphism associates with IGHV1-69 utilization in the naïve repertoire, and with variability in the broadly neutralizing Ab anti-flu response [3].

[Figure 1C panel title: IGHV1-69 CNV/SNP Frequencies. Figure 1D plot: y-axis, Number of Variants (0 to 60); x-axis, bins 106.40-50 through 107.20-30 (Mb); series: AFR IGHV Variants, Affy 6.0 SNPs, Illumina Omni1-Quad SNPs.]

Building a Diverse Set of Reference Assemblies for the Human IGH locus

Problem: A major barrier to genetic and functional studies in IGH is the current paucity of genomic data in the region.
→ The full ~1 Mb IGH V, D, and J gene region (excluding IGHC) has been sequenced only twice [2,5].
→ The current community IGH allele database, IMGT, is known to be incomplete and ethnically biased [2,4,6,7,8].
Solution: Build a comprehensive map of sequence variation in IGH based on 14 complete IGH haplotypes assembled from 7 fosmid libraries of diverse ethnic origins (Figure 2).

ID        Ethnicity    Fosmids
NA18517   Yoruban      119
NA18507   Yoruban      140
NA18956   Japanese     136
NA19240   Yoruban      169
NA18555   Chinese      155
NA12878   CEPH/Utah    149
NA19129   Yoruban      148

Figure 2. (A) Geographic locations of the 7 individuals previously sampled for fosmid library construction [9]. The 1000 Genomes Project IDs of fosmid samples, their ethnicities, and the number of clones processed per library are provided in the table (bottom left). (B) For fosmid library construction, genomic DNA from each individual was sheared and size selected; 40 kb fragments were cloned into fosmid vectors. Sanger sequences generated from the ends of ~1 million clones per library were mapped to the reference genome assembly [9], allowing for compilation of clone tiling paths across any locus of interest. We will utilize PacBio sequencing to generate a total of 14 ethnically diverse IGH reference assemblies from this fosmid resource. (C) Assembly of the initial fosmid tiling path in Yoruban NA18517 demonstrates the utility of the approach and leads to the first complete description of a 9.5 Kb deletion previously implicated in Ab repertoire gene usage variability [10].

[Figure 2B workflow labels: shear genomic DNA; select 40 Kb; circularized vector; generate end-sequences from library; reference genome. Figure 2C browser view: Yoruban tiling path, hg19 chr14:106,370,000-106,405,000, Vega protein-coding annotations IGHD1-14 through IGHD2-2. Annotations: 9.5 Kb deletion; loss of 6 functional IGHD genes; deletion mediated by flanking repeats; this deletion previously shown to influence Ab repertoire gene expression; assembly contigs of pooled non-overlapping fosmids.]

Diploid Resolution of IGH (GIAB)

Figure 3. Diploid haplotype resolution of an Ashkenazi Jewish proband from the Genome in a Bottle (GIAB) Consortium. An Ashkenazi Jewish trio was sequenced using multiple technologies, including short and long reads. A sample 20 kb interval with SNPs within the IGHV4-61 region is shown in the top panel. In the bottom panel, high-quality long-range phased SNPs allow PacBio reads to be partitioned into two distinct haplotypes via a novel algorithmic approach. These reads allow haplotype phasing of both SNPs and SVs within reads (e.g., the small deletion shown in Hap1) and the potential for de novo assembled haplotypes spanning large regions/events.

[Figure 3 panel labels: Hap 1; Hap 2; PacBio Hap1 Specific Reads; PacBio Hap2 Specific Reads; scale bars 6 kb and 20 kb.]

Development of a novel long fragment capture and sequencing assay for IGH

Problem: Resolution of IGH complexity is challenging for standard genetic approaches.
→ SNPs alone are unable to represent complex allelic and structural haplotypes.
→ Short-read NGS data may allow for variant inference, but are often inaccurate. Phased assemblies are not possible.
Solution: Develop a robust approach for assaying IGH genetic variation locus-wide with nucleotide resolution (Figure 4) that leverages longer read lengths to improve assemblies and the characterization of novel haplotype variation.

Figure 4. (A) Nimblegen SeqCap probes were designed across the entire IGH locus using all existing haplotype data, corresponding to ~1.4 Mb of unique sequence, with an estimated coverage of 94.6% of targeted bases. (B) Standard assay conditions were adapted to capture 6-8 kb fragments from a haploid hydatidiform mole sample. (C) PacBio long-read sequencing allowed for more reliable reconstruction of large structural variants between haplotypes. A large tandem duplication variant overlapping IGHV1-69 (black bar; hg19, chr14:107,125,000-107,230,000) is shown, which had previously been unresolvable with 300 bp MiSeq reads. This approach allowed for phasing of variants across the region and enabled the partitioning of reads into their respective tandem duplication blocks for improved assembly.

Outcomes & Future Directions

This work brings IGH into the modern genomics era via:
→ Resolving an expanded set of IGH haplotype maps and germline variants from a diverse set of human populations. These will allow for the discovery of novel genetic variation at this locus.
→ Development of the beta design for the first locus-wide IGH genotyping platform. This will enable de novo, diploid resolution of IGH haplotypes, including: annotated germline IGH C, J, D, V allele calls; gene copy number; and a catalogue of non-coding SNPs and SVs.
We believe this will enable many lines of novel investigation. Most importantly, by further defining the full extent of IGH diversity, we can examine its impact on antibody response in disease.

Literature Cited

1.) Lefranc, M-P, Lefranc, G. 2001. The Immunoglobulin FactsBook. Academic Press, London.
2.) Watson, CT, et al. 2013. Complete haplotype sequence of the human immunoglobulin heavy-chain variable, diversity, and joining genes and characterization of allelic and copy-number variation. Am J Hum Genet. 92:530-46.
3.) Avnir, Y, et al. 2016. IGHV1-69 polymorphism modulates anti-influenza antibody repertoires, correlates with IGHV utilization shifts and varies by ethnicity. Sci Rep. 6:20842.
4.) Watson, CT, Breden, F. 2012. The immunoglobulin heavy chain locus: genetic variation, missing data, and implications for human disease. Genes Immun. 13(5):363-73.
5.) Matsuda, F, et al. 1998. The complete nucleotide sequence of the human immunoglobulin heavy chain variable region locus. J Exp Med. 188:2151-62.
6.) Boyd, SD, et al. 2010. Individual variation in the germline Ig gene repertoire inferred from variable region gene rearrangements. J Immunol. 184(12):6986-92.
7.) Scheepers, C, et al. 2015. Ability to develop broadly neutralizing HIV-1 antibodies is not restricted by the germline Ig gene repertoire. J Immunol. 194(9):4371-8.
8.) Gadala-Maria, D, et al. 2015. Automated analysis of high-throughput B-cell sequencing data reveals a high frequency of novel immunoglobulin V gene segment alleles. Proc Natl Acad Sci U S A. 112(8):E862-70.
9.) Kidd, JM, et al. 2008. Mapping and sequencing of structural variation from eight human genomes. Nature. 453:56-64.
10.) Kidd, MJ, et al. 2016. DJ Pairing during VDJ Recombination Shows Positional Biases That Vary among Individuals with Differing IGHD Locus Immunogenotypes. J Immunol. 196(3):1158-64.

contact: [email protected]; [email protected]; [email protected]

Pacific Biosciences, PacBio, SMRT, SMRTbell and the Pacific Biosciences logo are trademarks of Pacific Biosciences of California, Inc. All other trademarks are the property of their respective owners. © 2012 Pacific Biosciences of California, Inc. All rights reserved.
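The Figure 3 caption describes partitioning PacBio reads into two haplotypes using phased SNPs, but does not spell out the algorithm. The following is only a minimal sketch of one simple way such a partition could work: a majority vote over the phased SNP alleles each read carries. The function names, the `min_margin` parameter, and the toy data are hypothetical and are not taken from the poster.

```python
from collections import Counter

def assign_haplotype(read_alleles, phased_snps, min_margin=1):
    """Assign one read to haplotype 1 or 2, or None if it is uninformative.

    read_alleles: dict mapping position -> base observed in the read
    phased_snps:  dict mapping position -> (hap1_base, hap2_base)
    min_margin:   hypothetical threshold; vote difference required to assign
    """
    votes = Counter()
    for pos, base in read_alleles.items():
        if pos not in phased_snps:
            continue  # position is not a phased SNP, so it carries no signal
        hap1_base, hap2_base = phased_snps[pos]
        if base == hap1_base:
            votes[1] += 1
        elif base == hap2_base:
            votes[2] += 1
    if abs(votes[1] - votes[2]) < min_margin:
        return None  # tie or no informative SNPs
    return 1 if votes[1] > votes[2] else 2

def partition_reads(reads, phased_snps):
    """Group read names by assigned haplotype (1, 2, or None)."""
    bins = {1: [], 2: [], None: []}
    for name, alleles in reads.items():
        bins[assign_haplotype(alleles, phased_snps)].append(name)
    return bins

# Toy data (invented for illustration only).
phased_snps = {100: ("A", "G"), 250: ("C", "T"), 900: ("G", "A")}
reads = {
    "read_a": {100: "A", 250: "C"},   # matches haplotype 1 at both SNPs
    "read_b": {250: "T", 900: "A"},   # matches haplotype 2 at both SNPs
    "read_c": {100: "A", 250: "T"},   # conflicting evidence
    "read_d": {500: "C"},             # covers no phased SNP
}
bins = partition_reads(reads, phased_snps)
print(bins)
```

Reads left in the `None` bin are exactly those that a longer read might rescue by spanning more phased SNPs, which is one reason long reads help in a region like IGH.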
Transcript

recruited to the J region. If the recombinase binds first to the 3′ D RSS, the disruptions to the utilization frequencies of D genes would likely be accompanied by disruptions to the utilization frequencies of the J genes. This was not observed and, therefore, is consistent with initial binding of the recombinase to the J RSS.

It was suggested that persistent RAG protein expression leads to successive rounds of DJ recombination, prior to V-DJ recombination (29). It also was suggested that the tendency for more 5′ D genes to be seen with more 3′ J genes (IGHJ4 to IGHJ6) could be an outcome of this process (15). The results of the current study, using a very large dataset of nonproductive VDJ rearrangements, suggest that successive rounds of rearrangement could only account for some of the biases in the rearrangements of IGHJ5 and IGHJ6 and cannot account for the overall patterns of IGHJ4 rearrangements. Our results are consistent with the existence of more complex positional biases, although variability in RSSs could also be contributing to the variability in the utilization of different genes.

There is no doubt that variations in both the heptamer and nonamer sequences of RSSs can influence recombination frequencies. This was shown in vivo (11) and in vitro (10). Opinions differ regarding the effect of spacer sequences on the frequency of recombination. Wei and Leiber (13) concluded that the sequence of the spacer had little effect on the frequency of gene recombination, whereas other investigators (12) reported that differential rates of rearrangement are associated with different spacer sequences. The crystal structure of the mouse RAG1 nonamer-binding domain bound to DNA highlights, for the mouse at least, the importance of contacts with the spacer sequence in RAG-mediated activity (30). The entire crystal structure of the mouse RAG1-RAG2 complex has now been reported (31) and appears to corroborate the view that some sequence-specific recognition of the recombinase by the nonamer-binding domain is important (30). However, differences in RSSs, including unreported variations in the RSSs, are, at most, just part of the explanation for variable recombination frequencies, because there are many examples of genes with identical RSSs that have quite different recombination frequencies (18).

The contrasting variation that was seen in the pairing of IGHJ4 and IGHJ6 with different D genes, as well as the variation in gene utilization frequencies seen in individuals with differing D genotypes, gives credence to the view that DJ recombination is subject to strong positional biases. However, the relative positions of the genes along the chromosome are insufficient to explain the observed biases. Enhancer activity was proposed to facilitate looping of the IGHV region DNA in mice, bringing distal genes into closer proximity to the other genes (19). Our data, in particular the undulating patterns observed for the proportions of each D gene rearranging with a particular J gene, suggest the possibility that multiple loops of D genes are active in the human. Such loops could provide an explanation for the increase in use of IGHD2-2 that accompanies the deletion of a distant block of IGHD genes.

Ontogeny is associated with a progressive increase in the addition of N nucleotides in DJ junctions in both the human (14) and the mouse (32). The general lack of N addition during murine fetal and neonatal development ensures that IGHD genes are the dominant influence on the repertoire of fetal and neonatal CDR3 sequences. This, together with strong pairing biases, could ensure that critical murine specificities are generated at high frequency (33). Critical specificities may also be preferentially generated in early human development, and this could explain the observation that the proportion of 5′D–3′J pairings increases progressively from human fetal to neonatal to adult rearrangements (14). These increases could possibly be achieved by chromatin remodeling leading to changes in the accessibility of certain genes. Therefore, the loss of critical D genes, as well as changes in DJ pairing frequencies that are associated with different IGHD locus immunogenotypes, could be particularly important during fetal and neonatal development. Certainly, murine IGHD gene deletions were shown to increase susceptibility to infection (34). Alterations in IGHD gene sequences can also increase the likelihood of the production of self-reactive Abs (35).

Biases in DJ pairing and differences in the pairing biases that result from genotypic variation are firmly established by the results of this study, and these biases join a growing list of processes that we now know can shape the Ab repertoire of an individual. Therefore, an inevitable outcome of immunogenotypic differences among individuals must be that particular V(D)J rearrangements and particular H and L chain pairs are represented at different frequencies in the repertoires of different individuals. Thus, the view that the formation of any particular Ab specificity is simply the result of stochastic processes is challenged again. In light of the

FIGURE 4. IGHD gene utilization frequencies in individuals with different IGHD genotypes. Proportions of rearrangements using each IGHD gene were compared among four individuals who carried complete sets of IGHD genes on both of their chromosomes, with pooled nonproductive sequence data for two individuals who carried a single copy of a deletion polymorphism involving the six contiguous IGHD genes, IGHD3-3 to IGHD2-8 (Het6) (A), and pooled nonproductive sequence data for three individuals who carried a single copy of a deletion polymorphism involving the five contiguous IGHD genes, IGHD3-22 to IGHD1-26 (Het5) (B). Proportions of IGHD genes are a proportion of the total number of rearrangements in which the IGHD gene could be identified. IGHD genes are ordered as they appear in the genome, from 5′ (left) to 3′ (right). *p < 0.00001.
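The caption defines each IGHD proportion relative to the rearrangements in which the IGHD gene could be identified. As a hedged illustration of that definition only, the sketch below computes such proportions from a list of per-rearrangement gene calls. The function name, the use of `None` for an unidentified gene, and the example calls are assumptions made for illustration and are not data from the study.

```python
from collections import Counter

def ighd_proportions(d_gene_calls):
    """Return the proportion of rearrangements using each IGHD gene.

    d_gene_calls: one entry per rearrangement; None marks a rearrangement
    whose IGHD gene could not be identified (excluded from the denominator).
    """
    identified = [gene for gene in d_gene_calls if gene is not None]
    if not identified:
        return {}
    counts = Counter(identified)
    total = len(identified)
    return {gene: n / total for gene, n in counts.items()}

# Invented example: five rearrangements, one without an identifiable D gene.
calls = ["IGHD3-3", "IGHD2-2", None, "IGHD3-3", "IGHD6-13"]
props = ighd_proportions(calls)
print(props)
```

Excluding unidentified rearrangements from the denominator means the proportions sum to 1 across identified genes, so they are comparable between individuals with different fractions of unassignable D segments.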

The Journal of Immunology 1163


[Figure 2C, genome browser view: Yoruban tiling path; scale 10 kb; hg19 chr14:106,370,000-106,405,000; Vega protein-coding annotations IGHD1-14, IGHD6-13, IGHD5-12, IGHD4-11, IGHD3-10, IGHD3-9, IGHD2-8, IGHD1-7, IGHD6-6, IGHD5-5, IGHD4-4, IGHD3-3, IGHD2-2, IGHD1-1, IGHV6-1, AB019441.33. Annotations: 9.5 Kb deletion; loss of 6 functional IGHD genes; deletion mediated by flanking repeats; this deletion previously shown to influence Ab repertoire gene expression; assembly contigs of pooled non-overlapping fosmids.]

[Figure 2B, workflow schematic: shear genomic DNA; select 40 Kb; circularized vector; generate end-sequences from library; map reads to reference genome; analyze paired reads, individual reads, and read depth. Paired-end read mapping: utilize separation and orientation of paired reads to infer structural alterations (deletion, insertion, inversion). Split-read mapping: utilize mapping of individual reads to identify breakpoints (deletion, insertion; no mapping position for novel sequence). Read depth: utilize regional depth of coverage to infer copy number (labels: normal 2x, deletion 1x, 3x, insertion).]
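The read-depth panel of the schematic pairs depth labels with copy-number states (normal 2x, deletion 1x, 3x). A minimal sketch of that idea follows, assuming that the median window depth represents the normal two-copy state; that assumption, the function name, and the example depths are illustrative only and do not describe the pipeline used for the poster.

```python
from statistics import median

def estimate_copy_number(window_depths, ploidy=2):
    """Estimate an integer copy number for each window from read depth.

    Assumes most windows are at normal copy number, so the median depth
    is taken as the baseline for `ploidy` copies (a naive assumption).
    """
    baseline = median(window_depths)
    if baseline <= 0:
        raise ValueError("baseline depth must be positive")
    return [round(ploidy * depth / baseline) for depth in window_depths]

# Invented per-window mean depths along a region.
depths = [30, 31, 29, 15, 14, 30, 46, 30]
copy_numbers = estimate_copy_number(depths)
print(copy_numbers)
```

Here the windows at roughly half the baseline depth come out as one copy and the window at roughly 1.5 times the baseline comes out as three copies. Real data would also need corrections for GC content and mappability, which matter a great deal in a duplicated region like IGH.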


www.nature.com/scientificreports/

Scientific Reports | 6:20842 | DOI: 10.1038/srep20842

IGHV1-69 is one of the most polymorphic loci within the human IGHV gene cluster (14q32.33), exhibiting both allelic and copy number (CN) variation [3,4]. There are 14 alleles known to be associated with this gene that can be differentiated by the presence of either a phenylalanine (F) or leucine (L) at amino acid position 54 (Kabat numbering) within the apex of the CDR-H2 loop. Historically, this classification refers to the 51p1-like and hv1263-like allelic groups, respectively (Supplementary Fig. 1a). In addition to coding polymorphisms, the number of IGHV1-69 germline copies per diploid human genome can vary from 2–4 (Supplementary Fig. 1b) [3,5,6], and there are 4 IGHV1-69 haplotypes with gene duplications in an earlier established American cohort [5] (Supplementary Fig. 1c).

The relevance of F/L polymorphism to HV1-69-sBnAbs is the fact that almost all of these Abs originate from the IGHV1-69 F-allelic group. The conserved CDR-H2 Phe54 is a major anchor residue making direct contact with HA, and the replacement of Phe54 by Ala54 or Leu54 (L) has been shown to dramatically reduce binding affinities [7,8]. Importantly, in this study and in two recent studies [9,10] the F/L polymorphism is shown to correlate with the frequencies of HV1-69-sBnAbs, being highest in individuals carrying F-alleles. In contrast, the predominant usage of the L-allele group in generation of non-neutralizing anti-gp41 Abs was recently demonstrated in a HIV-1 vaccination study [11]. These recent findings highlight the need to better understand how this genetic variability at the IGHV1-69 locus can modulate B cell repertoires as well as the extent to which this polymorphism varies across diverse human populations [3,5,6,12]. To address these two questions we analyzed Ab repertoires from an NIH H5N1 vaccinee cohort and samples from the 1000 Genomes Project (1KG) [13], respectively. We report the new finding that the two allele families have markedly different effects on Ab repertoire expression that is in part explained by CN variation but there are also differences in B cell expansion and somatic hypermutation. In addition, we discovered marked variance in IGHV1-69 gene duplication and CN among the different ethnic populations that will affect HV1-69-sBnAb responses to influenza vaccines and natural infections.

Results

Comparison of antibody responses to H5N1 vaccine among three IGHV1-69 genotypic groups. Individuals from a 2007 H5N1 vaccination trial were genotyped and phenotyped for IGHV1-69 CDR-H2 Phe54 F/L polymorphism (rs55891010; see Fig. 1a and methods). Their one month post-vaccination sera was competed against the anti-stem sBnAb F10 for binding to the pandemic H1CA0709 HA, which was not circulating when the serum samples were collected. Figure 1b shows a statistically significant difference in F10 blocking activity among the groups and was highest for the F/F group, followed in decreasing order by the F/L and L/L groups. The microneutralization titers (MN) for the F/F group were 1.67 and 2.29 fold higher than the mean values for F/L and L/L groups, respectively with a similar trend in their median values (Supplementary Fig. 2a). The post-vaccination hemagglutination inhibition titers (HAI) and the ELISA titers for H1CA0709 and H1CA0709 HA proteins were shown to not significantly differ from one another among the three IGHV1-69 genotypic groups (Supplementary Fig. 2b,d,e). In addition, when HAI and MN titers were compared within individuals, there was also a trend toward lower HAI/MN ratios for the F/F and F/L groups compared to the L/L individuals (Supplementary Fig. 2c). Supplementary Fig. 3 shows that stem binding activity originally boosted by H5VN04 vaccination was generally maintained within each genotypic group over the 4-year period. The similar trends observed in the analysis of the F10 competition studies, MN titers, and HAI/MN ratios supports the concept that IGHV1-69 germline polymorphism has an effect on the profile of the HA-directed Ab response, with expression from F-alleles leading to a higher Ab response to the stem domain.

Figure 1. Correlation between IGHV1-69 polymorphism and Ab response to the H5 vaccine. The pre-vaccinated sera of the 85 individuals were diluted 1/1250 and analyzed for binding activities against the anti-IGHV1-69 idiotype mAb G6. Binding activities were normalized by subtracting the G6 MSD signal with the MSD signal obtained from an isotype control, and by using a standard curve made with the IGHV1-69 F-allele-based IgG Ab D8035. (b) Post-vaccination sera (diluted 1/125) were competed with the anti-stem Ab F10 IgG for binding to H1CA0709. Cuzick’s trend test was used to further confirm that the occurrence of F-alleles increases the ability of serum to block F10 binding (L/L = 0, F/L = 1, F/F = 2). Error bars represent standard error of mean.

Effect on IGHV1-69 polymorphism on germline gene utilization and expressed HV1-69-sBnAb repertoires. To assess the role of IGH locus polymorphism on expressed IGHV1-69 germline gene repertoires ≥ 5 × 10^6 PBMCs (circa 10% B cells) were analyzed from the blood samples of 18 individuals (F/F = 4, F/L = 11, L/L = 3), collected 4 years following the H5N1 vaccine trial. The IGHV-gene frequencies from independent V(D)J rearrangements were rendered non-redundant, and IgM and IgG class determinations were made by analyzing the PCR products obtained from reverse priming with IG constant region primers. Figure 2 shows that in both the unmutated IgM (naïve) and all IgG (memory) V-segment datasets, IGHV1-69 usage was at the highest frequency in the F/F group (7.7% IgM, 3.9% IgG), intermediate frequency in the F/L group (4.7% IgM, 3% IgG), and the lowest frequency in the L/L group (1.8% IgM, 1.4% IgG). The significance of the ~3-fold difference in IGHV1-69 usage between the F/F and L/L groups was further demonstrated by noting that, in the F/F group, IGHV1-69 was the 4th and 7th most frequently used IGHV germline gene in the unmutated IgM and IgG datasets, respectively, whereas in the L/L group IGHV1-69 was ranked 18th and 23rd (data not shown). This variation in IGHV1-69 germline gene utilization was also seen for putative HV1-69-sBnAbs with the highest frequencies and correlation coefficients in individuals with F/F alleles and across the IgM B cell subset (Supplementary Fig. 4a–d). We have been able to further delineate some of these HV1-69-sBnAbs signatures through functional analyses (Supplementary Fig. 5 and text). These results demonstrate that F-allele individuals have higher levels of circulating IGHV1-69 Ab and HV1-69-sBnAb repertoires than L-allele individuals.

Differential effects of IGHV1-69 genotype on B cell expansion, somatic hypermutation (SHM) and evolution to HV1-69-sBnAb clones. We next investigated if other B cell functions were affected by IGHV1-69 genotype. Analysis of the naive and memory IGHV1-69 datasets within each individual’s repertoire revealed additional variation in clonal expansion, SHM frequency, and IgG-to-IgM ratios among each genotypic group. For example, the frequency of highly expanded IGHV1-69 clones (frequency > 1e-4) was greater for L/L than the F/L or F/F genotypic groups (Supplementary Fig. 6a). However, the clones of the F/F group, of which there were fewer highly expanded clones, were also significantly more mutated than those of the L/L group (Supplementary Fig. 6b). Additionally, we note that IGHV1-69 is unusual among V-genes in that these BCRs appear at a lower frequency in memory B-cells than in naïve B-cells (Supplementary Fig. 6c) (an approximately 40% reduction) [14]. Interestingly, this effect was strongest in individuals of the F/F genotype. These results suggest that the capacity of the IGHV1-69 B cells to undergo expansion, SHM and Ig class switching may be different among the genotypic groups.

Figure 2. Analyzing IGHV1-69 V-segment gene utilization among the three IGHV1-69 genotypic groups. (a) The frequency of IGHV1-69 IgM clones defined by unmutated V-segments (b) the frequency of IGHV1-69 IgG clones. Error bars represent standard error of mean.
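The excerpt summarizes the gap in IGHV1-69 usage between F/F and L/L individuals as roughly 3-fold. As a small worked check of that arithmetic, the sketch below recomputes the fold differences from the group frequencies quoted in the text. The dictionary layout and the function name are illustrative choices; only the percentages come from the excerpt.

```python
# IGHV1-69 usage (% of V-segment dataset) as quoted in the excerpt.
usage_percent = {
    "F/F": {"IgM": 7.7, "IgG": 3.9},
    "F/L": {"IgM": 4.7, "IgG": 3.0},
    "L/L": {"IgM": 1.8, "IgG": 1.4},
}

def fold_difference(usage, group_a, group_b):
    """Return usage[group_a] / usage[group_b] for each isotype."""
    return {
        isotype: usage[group_a][isotype] / usage[group_b][isotype]
        for isotype in usage[group_a]
    }

folds = fold_difference(usage_percent, "F/F", "L/L")
for isotype, fold in folds.items():
    print(f"{isotype}: {fold:.1f}-fold")
```

With these inputs the ratio is about 4.3-fold for unmutated IgM and about 2.8-fold for IgG, so the quoted "~3-fold" is a rounded summary across the two datasets rather than an exact figure for either one.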

Scientific Reports | 6:23876 | DOI: 10.1038/srep23876

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

Figure 1.

IGHV1-69 CNV/SNP Frequencies(C)

(E)

Diploid Resolution of IGHDevelopment of a novel long fragment capture and sequencing assay for IGH

Japanese

Chinese

Yoruban

Utah

(A)

ID Ethnicity FosmidsNA18517 Yoruban 119NA18507 Yoruban 140NA18956 Japanese 136NA19240 Yoruban 169NA18555 Chinese 155NA12878 CEPH/Utah 149NA19129 Yoruban 148

(C)

Background

Figure 2. (A) Geographic locations of the 7 individuals previously sampled for fosmidlibrary construction9. The 1000 Genomes Project IDs of fosmid samples, their ethnicities,and number of clones processed per library are provided in the table (bottom left). (B) Forfosmid library construction, genomic DNA from each individual was sheared and sizeselected; 40 kb fragments were cloned into fosmid vectors. Sanger sequences generatedfrom the ends of ~1 million clones per library were mapped to the reference genomeassembly9, allowing for compilation of clone tiling paths across any locus of interest. Wewill utilize PacBio sequencing to generate a total of 14 ethnically diverse IGH referenceassemblies from this fosmid resource. (C) Assemblies of initial fosmid tiling path in YorubanNA18517 demonstrates utility of approach, and leads to the first complete description of a9.5 Kb deletion, previously implicated in Ab repertoire gene usage variability10.

Diploid Resolution of IGH (GIAB)

[Figure 3 panels: Hap 1 and Hap 2 tracks; PacBio Hap1-specific reads and PacBio Hap2-specific reads; scale labels 6 kb and 20 kb]

Figure 3: Diploid haplotype resolution of an Ashkenazi Jewish proband from the Genome in a Bottle (GIAB) Consortium. An Ashkenazi Jewish trio was sequenced using multiple technologies, including short and long reads. A sample 20 kb interval with SNPs within the IGHV4-61 region is shown in the top panel. In the bottom panel, high-quality long-range phased SNPs allow PacBio reads to be partitioned into two distinct haplotypes via a novel algorithmic approach. These reads allow haplotype phasing of both SNPs and SVs within reads (e.g., the small deletion shown in Hap1) and the potential for de novo assembled haplotypes spanning large regions/events.
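The idea of partitioning reads by phased SNPs can be sketched as follows. The poster does not describe its algorithm, so this is only a simple vote-counting illustration, not the authors' method; the function name, the `min_margin` threshold, and the example positions and alleles are assumptions.

```python
from typing import Dict, Tuple

# Phased SNPs: reference position -> (allele on haplotype 1, allele on haplotype 2)
PhasedSnps = Dict[int, Tuple[str, str]]


def assign_haplotype(read_alleles: Dict[int, str], phased: PhasedSnps,
                     min_margin: int = 2) -> str:
    """Assign a read to "hap1" or "hap2" by counting matching phased alleles.

    read_alleles maps reference positions to the base observed in the read.
    A read stays "unassigned" unless one haplotype leads by min_margin votes.
    """
    votes1 = votes2 = 0
    for pos, base in read_alleles.items():
        if pos not in phased:
            continue  # position is not a phased heterozygous site
        hap1_allele, hap2_allele = phased[pos]
        if base == hap1_allele:
            votes1 += 1
        elif base == hap2_allele:
            votes2 += 1
        # bases matching neither allele are treated as errors and ignored
    if votes1 - votes2 >= min_margin:
        return "hap1"
    if votes2 - votes1 >= min_margin:
        return "hap2"
    return "unassigned"


# Illustrative phased sites only (not real IGHV4-61 variants).
phased_sites: PhasedSnps = {100: ("A", "G"), 250: ("C", "T"), 400: ("G", "A")}
label = assign_haplotype({100: "A", 250: "C", 400: "G"}, phased_sites)
```

Requiring a vote margin rather than a simple majority leaves reads with conflicting or sparse evidence unassigned, which is a conservative choice for error-prone long reads.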

Problem: Resolution of IGH complexity is challenging for standard genetic approaches.
→ SNPs alone are unable to represent complex allelic and structural haplotypes.
→ Short-read NGS data may allow for variant inference, but are often inaccurate. Phased assemblies are not possible.
Solution: Develop a robust approach for assaying IGH genetic variation locus-wide with nucleotide resolution (Figure 4) that leverages longer read lengths to improve assemblies and the characterization of novel haplotype variation.

[Figure 4 panels (A)-(C); panel (C) shows hg19, chr14:107,125,000-107,230,000]

Figure 4. (A) Nimblegen SeqCap probes were designed across the entire IGH locus using all existing haplotype data, corresponding to ~1.4 Mb of unique sequence, with an estimated coverage of 94.6% of targeted bases. (B) Standard assay conditions were adapted to capture 6-8 kb fragments from a haploid hydatidiform mole sample. (C) PacBio long-read sequencing allowed for more reliable reconstruction of large structural variants between haplotypes. A large tandem duplication variant overlapping IGHV1-69 (black bar) is shown, which had been previously unresolvable with 300 bp MiSeq reads. This approach allowed for phasing of variants across the region and enabled the partitioning of reads into their respective tandem duplication blocks for improved assembly.

Outcomes & Future Directions

This work brings IGH into the modern genomics era, via:
→ Resolving an expanded set of IGH haplotype maps and germline variants from a diverse set of human populations. These will allow for the discovery of novel genetic variation at this locus.
→ Development of the beta design for the first locus-wide IGH genotyping platform. This will enable de novo, diploid resolution of IGH haplotypes, including: annotated germline IGH C, J, D, V allele calls; gene copy number; and a catalogue of non-coding SNPs and SVs.
We believe this will enable many lines of novel investigation. Most importantly, by further defining the full extent of IGH diversity, we can examine the impact of this on antibody response in disease.

Literature Cited
1.) Lefranc, M-P, Lefranc, G. 2001. The Immunoglobulin FactsBook. Academic Press, London.
2.) Watson, CT, et al. 2013. Complete haplotype sequence of the human immunoglobulin heavy-chain variable, diversity, and joining genes and characterization of allelic and copy-number variation. Am J Hum Genet. 92:530-46.
3.) Avnir, Y, et al. 2016. IGHV1-69 polymorphism modulates anti-influenza antibody repertoires, correlates with IGHV utilization shifts and varies by ethnicity. Sci Rep. srep20842.
4.) Watson, CT, Breden, F. 2012. The immunoglobulin heavy chain locus: genetic variation, missing data, and implications for human disease. Genes Immun. 13(5):363-73.
5.) Matsuda, F, et al. 1998. The complete nucleotide sequence of the human immunoglobulin heavy chain variable region locus. J Exp Med. 188:2151-62.
6.) Boyd, SD, et al. 2010. Individual variation in the germline Ig gene repertoire inferred from variable region gene rearrangements. J Immunol. 184(12):6986-92.
7.) Scheepers, C, et al. 2015. Ability to develop broadly neutralizing HIV-1 antibodies is not restricted by the germline Ig gene repertoire. J Immunol. 194(9):4371-8.
8.) Gadala-Maria, D, et al. 2015. Automated analysis of high-throughput B-cell sequencing data reveals a high frequency of novel immunoglobulin V gene segment alleles. Proc Natl Acad Sci U S A. 112(8):E862-70.
9.) Kidd, JM, et al. 2008. Mapping and sequencing of structural variation from eight human genomes. Nature. 453:56–64.
10.) Kidd, MJ, et al. 2016. DJ Pairing during VDJ Recombination Shows Positional Biases That Vary among Individuals with Differing IGHD Locus Immunogenotypes. J Immunol. 196(3):1158-64.

