Enhancers Are Major Targets for MLV Vector Integration.

Suk See De Ravin1, Ling Su2, Narda Theobald1, Uimook Choi1, Janet L Macpherson3,4, Michael Poidinger3,5, Geoff Symonds3,6, Susan M Pond3,7, Andrea L Ferris8, Stephen H Hughes8, Harry L Malech1, Xiaolin Wu2

1. Laboratory of Host Defenses, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892
2. Laboratory of Molecular Technology, Frederick National Laboratory for Cancer Research, PO Box B, Frederick, MD 21702
3. Johnson and Johnson Research Pty Ltd, Sydney, Australia
4. Present address: Cell & Molecular Therapies, Royal Prince Alfred Hospital, Sydney & Sydney Medical School, University of Sydney, Sydney, Australia
5. Present address: Singapore Immunology Network (SIgN), Agency for Science, Technology and Research (A*STAR), Singapore 138648
6. Present address: Calimmune Inc, 10990 Wilshire Blvd, Suite 1050, Los Angeles, CA USA
7. Present address: United States Studies Centre, The University of Sydney, Sydney, Australia
8. HIV Drug Resistance Program, National Cancer Institute, Frederick, MD 21702

Correspondence:
Xiaolin Wu
Laboratory of Molecular Technology
Leidos Biomedical Research Inc.
Frederick National Laboratory for Cancer Research
PO Box B, Frederick, MD 21702
Tel: 301-846-7677
Fax: 301-846-6100
Email: [email protected]

Harry L Malech
Laboratory of Host Defenses
National Institute of Allergy and Infectious Diseases
National Institutes of Health
10 Center Drive, Bldg 10, Rm 5-3750
Bethesda, MD 20892-1456
Tel: 301-480-6916
Fax: 301-402-0789
Email: [email protected]

JVI Accepts, published online ahead of print on 5 February 2014. J. Virol. doi:10.1128/JVI.00011-14. Copyright © 2014, American Society for Microbiology. All Rights Reserved.

Downloaded from http://jvi.asm.org/ on April 7, 2018 by guest


Abstract

Retroviral vectors have been used in successful gene therapies. However, in some patients, insertional mutagenesis led to leukemia or myelodysplasia. Both the strong promoter/enhancer elements in the Long Terminal Repeats (LTRs) of Murine Leukemia Virus (MLV)-based vectors and the vector-specific integration site preferences played an important role in these adverse clinical events. MLV integration is known to prefer regions in or near transcription start sites (TSS). Recently, BET family proteins were shown to be the major cellular proteins responsible for targeting MLV integration. Although MLV integration sites are significantly enriched at TSS, only a small fraction of the MLV integration sites (<15%) occur in this region. To resolve this apparent discrepancy, we created a high-resolution genome-wide integration map of more than one million integration sites from CD34+ hematopoietic stem cells transduced with a clinically relevant MLV-based vector. The integration sites form ~60,000 tight clusters. These clusters comprise ~1.9% of the genome. The vast majority (87%) of the integration sites are located within histone H3K4me1 islands, a hallmark of enhancers. The majority of these clusters also have H3K27ac histone modifications, which mark active enhancers. The enhancers of some oncogenes, including LMO2, are highly preferred targets for integration without in vivo selection.

Importance

We show that active enhancer regions are the major targets for MLV integration; this means that MLV preferentially integrates in regions that are favorable for viral gene expression in a variety of cell types. The results provide insights for MLV integration target site selection and also explain the high risk of insertional mutagenesis that is associated with gene therapy trials using MLV vectors.

Introduction

Retroviral vectors are used as gene delivery tools in a broad range of cells, and for clinical gene therapy in patients, because of their high efficiency of integration and stable delivery of target genes. However, insertional activation of oncogenes has been reported in human gene therapy trials using MLV-based vectors. Five out of 20 patients who were treated for SCID-X1 in two separate studies using an MLV-based vector developed leukemia 3-5 years after treatment (1, 2). Gene transfer treatment of Wiskott-Aldrich syndrome with an MLV vector has also been associated with the development of leukemia (3). Clonal expansion of vector-modified cells and the development of myelodysplasia have also been reported in a murine retroviral gene therapy trial for chronic granulomatous disease (4). The expansion was attributed to the activation of nearby oncogenes, for example LMO2 and MECOM, by the strong enhancer/promoter elements within the long terminal repeats (LTRs) of the MLV vectors. Vector-specific integration preferences may also play an important role. Much has been learned about the integration preferences of HIV and HIV-based lentivectors and their targeting mechanism. HIV strongly prefers to integrate inside actively transcribed genes (5). The host protein LEDGF/p75, through its interactions with HIV integrase, is known to be critical for this integration site preference (6, 7).

MLV and MLV-based vectors preferentially integrate near transcription start sites (TSS) (8). However, the mechanism that underlies this preference was only recently elucidated. Several groups identified bromodomain and extraterminal (BET) proteins (BRD2, BRD3, and BRD4) as the major host factors that specifically interact with MLV integrase and mediate the preferential integration of MLV near TSS (9-12). BET proteins bind to acetylated histone tails via their bromodomains (13-15). The ET domains of BET proteins selectively bind to the C-terminal domain (CTD) of MLV integrase. Disruption of the CTD-ET interaction, or inhibition of bromodomain binding by small molecules such as JQ1 and I-BET, reduces the efficiency of MLV integration and its preference for TSS (9, 11). However, TSS and
the surrounding regions (±1 kb) of the host genome account for only a small fraction (less than 15%) of all MLV integration sites.

In recent years, there have been major advances in understanding the organization of the human genome and recognition of the importance of epigenetic modifications of chromatin, including histone modifications. In this study, we mapped more than 1 million integration sites for a clinically relevant MLV-based retroviral vector designed to treat chronic granulomatous disease (16) in human CD34+ hematopoietic stem cells and compared the integration sites to the distribution of epigenetic marks in the human genome. Our results demonstrate that histone modification H3K4me1, which marks enhancers, is present at 87% of all integration sites for the MLV vector and that active enhancers are preferred over inactive/poised enhancers. In addition, the MLV vector preferentially integrates near LMO2 without any selection, potentially exacerbating the problem of insertional mutagenesis in hematopoietic stem cells.

Materials and Methods

Transduction of human CD34+ cells. G-CSF mobilized CD34+ hematopoietic stem cells were isolated from healthy adult human volunteers by apheresis, immune-column selected (Miltenyi), and cryopreserved (NIAID IRB approved protocol 94-I-0073). For transductions, the CD34+ cells were thawed, placed into culture in X-Vivo 10 media (Lonza) supplemented with 1% human serum albumin (Baxter Healthcare Corporation), and stimulated for one day with Stem Cell Factor (SCF), FMS-like Tyrosine Kinase 3 Ligand (FLT-3L), and thrombopoietin (TPO), all at 50 ng/ml, and Interleukin-3 at 10 ng/ml (all from Peprotech). Starting on the second day of culture, the CD34+ cells were transduced with the MLV vector, MFGS-gp91 (16), daily for three days by spinoculation on retronectin-coated plates and harvested the morning after the last transduction. To provide an analysis of HIV lentivector integration sites for comparison to the MLV vector integration sites, CD34+ cells from similar healthy human volunteers were transduced with a derivative of a clinically relevant self-inactivating lentivector, Cl20 i4 EF1α hγc OPT (17, 18). Following one day of pre-stimulation in cytokines as described above, the
cells were exposed to the lentivector on two consecutive days. Upon completion of the MLV vector or HIV lentivector transductions, the CD34+ cells were harvested and washed, and genomic DNA was extracted for integration site analysis.

In vivo mouse xenograft model. CD34+ cells from four different healthy adult volunteer donors were transduced with the MLV vector as described above. For each donor, 4 × 10^6 transduced cells were transplanted into each of 6 NOD-SCIDγc- mice (4 donors, a total of 24 mice). The NOD-SCIDγc- mice were irradiated with 300 rads 2 days before transplantation of the MLV vector-transduced human CD34+ cells. Mice were analyzed 8 weeks post-transplantation, when human CD34+ cells were recovered from the bone marrow. Bone marrow (BM) cells were flushed from the femurs of each mouse into IMDM and the engraftment of the human cells was determined by flow cytometric analysis of CD45+ cells. To enrich for human cells, the bone marrow cells were cultured in a cocktail of human-specific cytokines as described above for one week, then genomic DNA was extracted for integration site analysis.

Survey of integration sites. Genomic DNA (2-10 µg) was sheared to an average size of 300-500 bp using Covaris Adaptive Focused Acoustics performed on an E220 Focused-ultrasonicator (Covaris, Woburn, MA). The sheared DNA fragments were end-repaired with the End-It DNA end-repair kit (Epicentre, Madison, WI). 3’-dA tailing was performed with Klenow DNA polymerase to add a single dA residue to the 3’ ends of the DNA fragments using the dA-Tailing kit from NEB (Ipswich, MA). A partially double-stranded linker with a 5’ T overhang was ligated to the genomic DNA fragments. Specifically, the T-linker (5’GTAATACGACTCACTATAGGGCTCCGCTTAAGGGACT3’, 5’-PO4-GTCCCTTAAGCGGAG-NH2-C3’) was ligated to the DNA fragments using T4 DNA ligase (NEB, Ipswich, MA). The first round of PCR was carried out for 30 cycles with standard PCR conditions using LTR-specific and linker-specific primers (MFGS3LTR, 5’CCTTGGGAGGGTCTCCTCTGAGT3’; MFGS5LTR, 5’ATGGCGTTACTTAAGCTAGCTTG3’; Linker-P1, 5’GTAATACGACTCACTATAGGGC3’). Nested PCR was carried out for
15 cycles with primers appropriate for sequencing on an Illumina MiSeq/HiSeq (MFGS3LTRnest, 5’AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTTGACCATGACTACCCGTCAGCGGGGGTC3’; MFGS5LTRnest, 5’AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTCAGTTGCAAACCTACAGGTGGGGTCTTTC3’; PE2_Linkernest, 5’CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCTNNNNNNAGGGCTCCGCTTAAGGGAC3’; NNNNNN stands for barcodes). HIV primers used include HIV-3LTR, 5’TGTGACTCTGGTAACTAGAGATCCCTC3’; HIV-3LTRnest, 5’AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNNNCCCTTTTAGTCAGTGTGGAAAATC3’. The PCR products were purified with an AMPure XP PCR purification kit (Beckman Coulter, Brea, CA). Sequencing was performed using 2x150 bp paired-end MiSeq sequencing kits, or 2x105 bp paired-end HiSeq sequencing kits from Illumina, following the manufacturer’s suggested protocol (Illumina, San Diego, CA). Integration site junctions were mapped to the human genome with BLAT and custom Perl scripts. Qualifying criteria for authentic integration sites include the following: 1) the sequences must retain the last 5 bp of the LTR sequence, 2) followed by >20 bp of high-quality DNA sequence with >95% match with genomic DNA starting within 3 bp of the LTR junction, and 3) the paired-end sequence representing the sheared breakpoint must match the opposite strand in the genome within 1 kb of the mapped LTR junction site. All mapped unique integration sites are listed as UCSC human genome hg19 bed files in supplemental materials (Tables S2-S8). Raw sequence files were deposited into the NCBI Sequence Read Archive (accession number PRJNA236553).

Data analysis. ChIPseq data and microarray data for CD133+ hematopoietic stem cells, CD34+ hematopoietic stem cells, and CD4+ T lymphocytes were downloaded from NCBI (ChIPseq: GSE17312 and GSE12646; microarray: GSM263935, GSM263936, GSM918288, GSM1135118, GSM1132598). ChIPseq data were analyzed with MACS (Model-based Analysis of ChIP-seq) (19) and BEDTools
(20). Microarray data were analyzed with Partek Genomics Suite software (Partek, St Louis, MO).

Previously published MLV integration data in CD4+ T cells (21) were downloaded from NCBI and mapped to the UCSC human genome build using BLAT. All data in the analyses were mapped or converted to human genome build hg19. A total of 31,982 unique integration sites from CD4+ T cells were included in the comparison. A custom Perl script was used to generate theoretical random integration sites throughout the human genome (hg19, excluding gap regions) to compare with the vector integration site data.

MLV vector integration peaks were identified using MACS software. Each integration site was treated as a tag for MACS input with the following settings: --p 0.001 --nolambda --nomodel. The cutoff p value for each peak was set at <0.001. Overlapping MLV vector integration site peaks and ChIPseq peaks were identified with BEDTools or the table browser intersect tool from the UCSC genome web server. Co-occurrence statistics were computed by randomly shuffling peaks across the genome using BEDTools, followed by a chi-square test.

The association of MLV vector integration sites/peaks with genes is not simple because many of these peaks are outside the bodies of genes and some are relatively far away from genes. However, this information is useful for comparing the MLV vector data obtained in this study to integration site data from other gene therapy trials, most of which were reported in a gene-centric format. If the summit of an MLV vector peak was within a gene, the peak was assigned to that gene. If the summit was outside a gene, the peak was assigned to the closest gene.

Results

Mapping of integration sites in transduced CD34+ cells. G-CSF mobilized peripheral blood CD34+ hematopoietic stem cells were transduced with an MLV vector (see Materials and Methods). Genomic DNA isolated from the ex vivo transduced cells was fragmented and the integration sites were selectively amplified
using linker-mediated PCR. The PCR products were sequenced using the Illumina platform as described in the Methods. We mapped 1,040,345 unique MLV vector integration sites (445,319 from the 3’LTR and 595,026 from the 5’LTR) from the ex vivo transduced human CD34+ cells. There were 2,583 integration sites for which we isolated both the 3’LTR and the 5’LTR junctions, suggesting that the infected cells had a very large and diverse set of integration sites and that most of the integration sites we isolated represent independent events. In most of the analyses, the 3’LTR and 5’LTR datasets were analyzed separately and the results compared to validate the data. As expected, the results of these separate analyses were highly consistent.
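The mapped sites reported above are those that passed the three qualifying criteria listed in Materials and Methods. As a rough illustration only, the filter can be sketched as below. This is not the authors' Perl code: the `JunctionRead` record, its field names, and the `is_authentic` function are hypothetical, and base-quality checks ("high-quality" sequence) are assumed to have been applied upstream.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class JunctionRead:
    """Hypothetical summary of one read pair; all field names are illustrative."""
    has_ltr_tail: bool        # read retains the last 5 bp of the LTR
    genomic_match_len: int    # bp of genomic sequence aligned after the LTR
    percent_identity: float   # identity of that alignment to the genome (0-100)
    junction_offset: int      # bp between the LTR end and the alignment start
    chrom: str                # chromosome of the mapped LTR junction
    junction_pos: int
    junction_strand: str      # "+" or "-"
    mate_chrom: str           # mapping of the paired-end (sheared breakpoint) read
    mate_pos: int
    mate_strand: str


def is_authentic(read: JunctionRead) -> bool:
    # Criterion 1: the last 5 bp of the LTR must be present.
    if not read.has_ltr_tail:
        return False
    # Criterion 2: >20 bp of genomic sequence with >95% identity,
    # starting within 3 bp of the LTR junction.
    if read.genomic_match_len <= 20 or read.percent_identity <= 95.0:
        return False
    if read.junction_offset > 3:
        return False
    # Criterion 3: the mate maps to the opposite strand within 1 kb
    # of the mapped LTR junction (same chromosome assumed).
    return (
        read.mate_chrom == read.chrom
        and read.mate_strand != read.junction_strand
        and abs(read.mate_pos - read.junction_pos) <= 1000
    )


good = JunctionRead(True, 60, 98.5, 0, "chr11", 33_900_000, "+",
                    "chr11", 33_900_350, "-")
print(is_authentic(good))                                  # True
print(is_authentic(replace(good, percent_identity=90.0)))  # False
```

The coordinates in the usage lines are arbitrary placeholders, not sites from the dataset.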

MLV vector integration sites are highly enriched at TSS and active promoters, yet these sites account for only a fraction of the integration sites. In a previous study, we showed that TSS were preferred targets for MLV vector integration compared to either random or HIV lentivector integration sites (8). For this study, we compared the 1 million MLV vector integration sites in CD34+ cells with ~150,000 HIV lentivector integration sites obtained from a control sample of HIV lentivector transduced human CD34+ cells using the same method. Approximately 15% of the MLV vector integration sites were found within ±1 kb of TSS in both the 3’LTR and 5’LTR MLV vector datasets (FIG 1). Only 1.3% of the computer-generated random control sites were found in these same regions. This means that there is approximately a 10-fold enrichment of MLV vector integration sites near TSS (FIG 1A). In contrast to MLV vector integration, HIV lentivector integration slightly disfavors TSS (1.1%). Because of the high density of the data, we were able to calculate integration frequency at single base-pair resolution around TSS. The results showed that MLV vector integration sites have a bimodal distribution near TSS, peaking at the +500 bp and the -500 bp positions (FIG 1B). The upstream peak is larger than the downstream peak. Of note, there is a sharp dip in MLV vector integration sites at TSS (-80 bp to +20 bp region). HIV integration is profoundly disfavored near TSS, and is more enriched downstream in the gene body than in the region upstream of TSS (FIG 1B). The histone H3K4me3 modification marks active promoters (22, 23), and we asked whether the MLV vector shows a
preference for active promoters. TSS were divided into active and inactive promoters based on whether an H3K4me3 peak was present near the TSS. In CD34+ cells, 17,600 TSS are marked by H3K4me3 and classified as active promoters, while 5,000 TSS are not marked by H3K4me3 and classified as inactive promoters (Broad Institute Human Reference Epigenome Mapping Project) (24). Active TSS/promoters account for ~15% of the total MLV vector integration sites, whereas inactive TSS/promoters account for only 0.14% of the total MLV vector integration sites. While the active TSS regions have 10 times more MLV vector integration sites than the random control, the inactive TSS regions have 2-fold fewer MLV vector integration sites than the random control. We also sorted promoters based on gene expression levels measured by microarray in CD34+ cells (GSM981288). Promoters were put into bins of 100 based on their level of expression (FIG 1C). The MLV vector integration frequencies showed a strong positive correlation with the level of gene expression (R² = 0.90).

Although MLV vector integration sites are highly enriched at TSS/active promoters, these regions account for only ~15% of the total integration sites. Only ~25% of the total integration sites are accounted for if the regions are extended to TSS ± 2.5 kb.

MLV vector integration sites form tight clusters at previously unidentified regions across the genome. Visual inspection of the high density map of the distribution of MLV vector integration sites in the genome showed tight clusters (FIG 2A). Clustering analysis, using Model-based Analysis for ChIP-Seq (MACS) software, generated ~60,000 MLV vector integration site peak regions across the genome (p<0.001), with an average peak size of 970 bp. Many of the clusters of MLV vector integration sites are not near TSS. Some are in introns or at the ends of genes; others are in intergenic regions or in gene deserts (FIG 2A). The MLV vector peaks represent only a small fraction of the human genome (55.8 Mb total for all the peaks, or 1.9% of the human genome), suggesting that MLV vector integration targets specific regions of the genome.
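The genome fraction quoted above is the total length of the peak regions divided by the genome size. A minimal sketch of that calculation follows, assuming peaks are given as half-open (start, end) intervals per chromosome; the function names are illustrative, and the 2.9 Gb denominator in the last line is an assumed round figure for the hg19 genome, not a value stated in the text.

```python
def covered_bases(intervals):
    """Total bases covered by half-open (start, end) intervals on one
    chromosome, counting overlapping regions only once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Close the previous merged block and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching interval: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def genome_fraction(peaks_by_chrom, genome_size):
    """Fraction of the genome covered by peaks (dict: chrom -> intervals)."""
    covered = sum(covered_bases(iv) for iv in peaks_by_chrom.values())
    return covered / genome_size


toy_peaks = {"chr1": [(0, 10), (5, 20)], "chr2": [(30, 40)]}
print(genome_fraction(toy_peaks, 1000))   # 0.03

# Reported total peak size over an ASSUMED genome size of ~2.9 Gb:
print(round(55.8e6 / 2.9e9, 3))           # 0.019
```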

MLV vector integration site clusters are strongly associated with active enhancers. Several studies have reported an association of epigenetic marks, including histone modifications, with retroviral integrations (25-30). We compared the distribution of the MLV vector integration sites with the distribution of epigenetic marks that have been reported for human hematopoietic stem cells (GSE12646 for CD133+ cells and GSE17312 for CD34+ cells). MLV vector integration site clusters showed the strongest association with H3K4me1, a known mark for enhancers (FIG 2B). Of the 60,754 MLV vector peaks, 54,014 (89%) overlapped with H3K4me1 peaks. There is also a good correlation of the MLV vector integration site peaks with the size and boundaries of the H3K4me1 peaks. It is common in genome studies to extend the boundary of target regions to find overlapping peaks. However, we defined overlapping peaks using a strict physical overlap of the boundaries with no extensions. There were 50,412 MLV vector peaks that had at least an 80% overlap with the corresponding H3K4me1 peaks. The total size of the overlapping regions of the H3K4me1 peaks and the MLV vector integration site peaks was 52.4 Mb out of the 55.8 Mb. The association is highly statistically significant (p=0, chi-square test). If the 60,754 peaks were randomly placed across the genome, only 7,941 of the peaks would overlap with H3K4me1 peaks and the size of the overlapped regions would be ~5 Mb (FIG 2C). H3K4me1 islands contain the vast majority of the total MLV vector integration sites (86%) (FIG 2A, 2D). In comparison, only 8.8% of the random sites and 25.6% of the HIV lentivector integration sites are found in these regions. Although H3K4me1 is an epigenetic mark for enhancers, it is also enriched near TSS/promoters (23). The H3K4me1 ChIPseq data in CD34+ cells confirmed that the majority (80%) of the promoter regions (TSS ± 1 kb) also have H3K4me1 modifications; these could represent enhancers that are proximal to the TSS.

However, despite the fact that most MLV vector integration sites overlap with H3K4me1 peaks, only a modest fraction (25%) of the H3K4me1 peaks overlap with MLV vector integration site peaks. There are two simple explanations. The first is that the MLV vector dataset is smaller than the H3K4me1 ChIPseq dataset. The H3K4me1 ChIPseq dataset has ~18 million sites (reads) whereas the MLV vector
integration dataset has ~1 million integration sites. Thus, it is possible that some MLV vector integration sites were missed; however, it is unlikely that 75% of the integration sites in H3K4me1 peaks were missed. The second explanation is that the MLV vector targets only a fraction of the H3K4me1-marked enhancers. Just as promoters can be classified as active and inactive, enhancers can also be in an active or a poised state. Based on the preference for active promoters, we propose that the MLV vector has a preference for integrating in active enhancers. Active enhancers can be distinguished from poised enhancers by the presence of both the H3K4me1 mark and the H3K27ac mark instead of the H3K4me1 mark alone (31). In CD34+ cells, approximately 31,000 H3K4me1 peaks overlap with H3K27ac peaks, denoting active enhancers. Although that is only ~25% of the H3K4me1 peaks, these peaks have ~70% of the total MLV vector integration sites, whereas the majority of the H3K4me1 peaks that represent inactive/poised enhancers account for only ~17% of the MLV vector integration sites (FIG 2D). This translates into approximately a 4-fold enrichment of MLV vector integration sites in active enhancers vs. inactive enhancers. In contrast, the matched random dataset showed no preference for active or inactive enhancers. Although the H3K4me1 mark was associated with the highest percentage of the total MLV integration sites, the H3K27ac mark had the highest level of enrichment (20-fold over random) for MLV integration sites.

In addition to histone modifications, histone variants are also important epigenetic marks. Histone H2AZ is commonly associated with enhancers and promoters (23, 32). In hematopoietic stem cells, many of the MLV vector integration site clusters overlap with H2AZ peaks. More than half (52%) of the total MLV vector integration sites are within H2AZ peaks whereas only 4.6% of the random sites are within the same regions, showing that there is an ~11-fold enrichment of MLV vector integration inside H2AZ islands (FIG 2B).

MLV vector integration also showed a positive association with several other epigenetic marks that define active chromatin, including RNAPol II, H3K9me1, H3K27me1, H4K20me1, and H3K4me3. Again, H3K4me3, a histone mark that is associated with promoters, was highly enriched for MLV (14-fold over random integration), but was associated with only ~20% of the MLV integration sites. MLV
vector integration showed a strong negative association with the repressive histone marks H3K9me3 and H3K27me3 and no association, or only a weak negative association, with H3K36me3, which marks the bodies of actively transcribed genes. These data clearly demonstrate that active enhancers are the major targets of MLV vector integration.

In contrast, HIV lentivector integration showed a moderate preference for H3K4me1 and H3K27ac marked enhancers (FIG 2B, 2D). HIV lentivector integration showed strong associations with H3K36me3, H4K20me1, H3K9me1, and H3K27me1, which mark the bodies of actively transcribed genes (FIG 2B) (23). FIG 2E shows MLV vector and HIV lentivector integration site distributions in the region (±2 kb) near peaks of three positive regulatory epigenetic marks (H3K4me1, H2AZ, H3K4me3), a mark for the bodies of active genes (H3K36me3), and a mark for repressed regions (H3K9me3).
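The fold-enrichment values quoted in this section compare the fraction of vector integration sites that fall inside the peaks of a given mark with the same fraction for the random control sites. A minimal sketch of that comparison for a single chromosome is given below; the function names are illustrative, peaks are assumed to be sorted, non-overlapping, half-open (start, end) intervals, and the site list is assumed to be non-empty. It is not the BEDTools-based workflow used in the study.

```python
from bisect import bisect_right


def fraction_in_peaks(sites, peaks):
    """Fraction of site positions that fall inside any peak.

    `peaks` must be sorted, non-overlapping, half-open (start, end)
    intervals on one chromosome; `sites` must be non-empty.
    """
    starts = [start for start, _ in peaks]
    hits = 0
    for pos in sites:
        # Index of the last peak whose start is <= pos, if any.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < peaks[i][1]:
            hits += 1
    return hits / len(sites)


def fold_enrichment(observed_fraction, random_fraction):
    """Ratio of the observed fraction to the random-control fraction."""
    return observed_fraction / random_fraction


toy_peaks = [(100, 200), (500, 600)]
vector_sites = [150, 199, 200, 550, 700]                    # 3 of 5 inside
random_sites = [10, 20, 30, 40, 50, 120, 300, 800, 900, 1000]  # 1 of 10 inside

obs = fraction_in_peaks(vector_sites, toy_peaks)
rnd = fraction_in_peaks(random_sites, toy_peaks)
print(obs, rnd)

# Percentages reported in the text for H2AZ peaks (52% vs. 4.6%):
print(round(fold_enrichment(0.52, 0.046), 1))   # 11.3
```

With the reported H2AZ percentages the ratio is about 11, matching the ~11-fold enrichment stated above.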

MLV vector integration site clusters are cell-type specific. Based on the 362 observation that the MLV vector primarily targets active enhancers, and because the 363 activity of many enhancers is cell-type specific (33), we predicted that the 364 distribution of integration sites would show strong cell-type specificity. To test this 365 hypothesis, we compared our MLV vector dataset in CD34+ cells with the published 366 MLV dataset from activated CD4+ T cells (21). Both the CD34+ dataset and the CD4+ 367 dataset had very similar global preferences, such as enrichment at TSS and 368 enhancers. There are ~120,000 and ~96,000 H3K4me1 peaks in CD34+ cells and 369 CD4+ cells, respectively, and about 1/3 of these peaks overlap. We identified CD34+ 370 cell-specific and CD4+ cell-specific peak regions by removing all the overlapping 371 peaks. We then calculated the fraction of MLV integrations from these two cell types 372 in the cell-type specific H3K4me1 peaks. Fold enrichment was calculated by 373 comparison to a random control. As shown in FIG 3A, MLV vector integrations in 374 CD34+ cells show a 7-fold enrichment in CD34+ cell-specific H3K4me1 regions 375 compared to the random dataset. In contrast, MLV vector integrations in CD34+ 376 cells show a 2-fold lower frequency than random in CD4+ cell-specific H3K4me1 377 regions. And vice-versa, the MLV integration sites identified in CD4+ cells show a 4-378

on April 7, 2018 by guest

http://jvi.asm.org/

Dow

nloaded from

Page 13: 1 Enhancers Are Major Targets for MLV Vector Integration. 1 2 3 Suk ...

13

fold enrichment in CD4+ cell-specific H3K4me1 peaks but not in CD34+ cell-specific 379 H3K4me1 peaks. Similarly, MLV integrations show cell-type specific enrichment in 380 H3K27ac peaks. 381 Not only can the cell-type preferences be detected at a global level, they are 382 also obvious in the individual clusters. For example, integration near the LMO2 383 gene, which has caused leukemia in patients (1), showed very strong cell-type 384 specificity. It was proposed that activation of LMO2 gives the cells a growth 385 advantage and is a major contributor to leukemia. However, our data show that 386 LMO2 is a preferential target for integration of the MLV vector in CD34+ cells 387 without any growth selection (FIG 3B). Out of the ~1 million MLV vector integration 388 sites, 1273 were found in this region, comprising almost 0.12% of the total 389 integration sites in CD34+ cells. Only 41 sites in the matched random control dataset 390 were found in the same region, suggesting that there is a 30-fold enrichment for 391 MLV vector integrations in LMO2 in CD34+ cells. Furthermore, there were no 392 integration sites in this region in the CD4+ cell dataset. Although the CD4+ dataset is 393 smaller (n=31,982), this difference is extremely significant (p<1e-100, a Chi-square 394 test). The histone modifications are much different in this region in CD34+ cells 395 versus CD4+ cells. There is a large enhancer region upstream of LMO2, which has 396 extensive H3K4me1 and H3K27ac marks in CD34+ cells. In CD4+ cells, the level of 397 H3K4me1 is much lower and the level of the repressive mark H3K27me3 is much 398 higher in this region. These results suggest that the chromatin around the LMO2 399 gene is in an activated state in CD34+ cells, but in a repressed state in CD4+ cells. As 400 would be expected from this difference, microarray data showed that the level of 401 LMO2 RNA was 77-fold lower in CD4+ cells than in CD34+ cells. 
Similarly, HOXA10 is a preferred target for MLV integration in CD34+ cells, but not in CD4+ cells. The level of HOXA10 RNA was 250-fold lower in CD4+ cells than in CD34+ cells, and the positive marks on the enhancer present in CD34+ cells were absent in CD4+ cells (FIG 3C). The gene ITK (IL2-induced T cell kinase) shows the opposite effect (FIG 3D). There are high levels of ITK RNA in CD4+ cells, but not in CD34+ cells (350-fold difference). There are much higher levels of the active enhancer marks H3K4me1 and H3K27ac in CD4+ cells than in CD34+ cells. The propensity of MLV to integrate


near ITK is much higher in CD4+ cells than in CD34+ cells (adjusted for differences in the sample sizes). TCF7 (FIG 3E) and IL2RG are other examples of genes whose RNA levels are higher in CD4+ cells than in CD34+ cells. In each case there is preferential integration of MLV in CD4+ cells compared to CD34+ cells.

Not surprisingly, there were some MLV integration clusters that were present in both CD4+ cells and CD34+ cells. These occur at enhancers that are active in both cell types; for example, enhancers associated with house-keeping genes. In general, enhancer elements marked by H3K4me1 and H3K27ac in both CD34+ cells and CD4+ cells have more integration sites than cell-type specific enhancers. The relative proportion of integrations in these shared clusters varies depending on the enhancer activity in the two cell types.

Comparison of MLV vector integration sites ex vivo and in vivo. To investigate the potential risk of clonal expansion associated with the MLV vector in gene therapy trials, human CD34+ cells infected with the MLV vector were injected into NOD SCID γ-immunodeficient mice to allow in vivo stable engraftment, expansion and differentiation of the transduced cells. After 8 weeks, the mice were euthanized, and CD45+ human cells were recovered from the mouse bone marrow and analyzed by flow cytometry. There was 20-60% engraftment of human cells in these mice. Integration site libraries were prepared from both ex vivo and in vivo samples (Materials and Methods). A total of 16,293 unique integration sites were mapped from the in vivo library and compared to the integration sites in the ex vivo library. The integration sites in the human cells recovered from the mice showed integration site preferences for active promoters and enhancers that were similar to the ex vivo library. Evidence for clonal outgrowth was assessed at two levels.
First, the frequency at each hotspot was calculated. Most of the hotspots observed in the in vivo library were present in the ex vivo library (r=0.65 for the integration frequencies in the same hotspots in the two datasets), suggesting that the integration sites in cells recovered after 2 months of growth in mice were similar to the sites in the founder population. The LMO2 gene was a hotspot for integration in both libraries. The relative frequencies of the integrations at some hotspots are


enriched or reduced, but there are diverse hotspots in the in vivo library and no obvious clonal expansion was observed. Clonal expansion can also be detected by measuring the relative frequency of specific unique integration sites. We can unambiguously identify clonally expanded cells that carry the same integration site because shearing the DNA prior to PCR amplification produces distinct break points in the amplified host DNA. We found no highly expanded clones after two months of growth in the mouse model. The most abundant clones were less than 0.5% of the population, based on sequence counts. For example, the integration site at chr1+26083757 in gene Man1c1 was recovered 21 different times out of a total of 5862 independent 5’LTR integration events, suggesting that at least 21 cells (0.35% of the sample) were derived from the same founder cell. This suggests that two months may not be a sufficient period of time for small differences in the growth potential of the transduced human CD34+ transplanted cells to cause a significant clonal expansion in the xenograft NOD-SCID gamma C mouse model.

Discussion:

MLV integration highly favors regions near TSS (8). However, the mechanisms that underlie that preference have remained elusive. To find host proteins that target MLV integration, Studamire et al identified multiple transcription regulators and chromatin binding proteins that interact with MLV IN using the yeast two hybrid system (12). Recently, bromodomain and extra-terminal domain (BET) proteins (including Brd2, Brd3, Brd4) were identified as the major cellular proteins that interact with MLV IN (9-11). BET proteins interact specifically with MLV IN but not HIV IN.
A bimodal tethering model was proposed, in which the C-terminal ET domain of the BET proteins binds MLV IN, and the N-terminal bromodomains target MLV integration to the TSS by interacting with acetylated H3 and H4 tails at the TSS. However, TSS regions only comprise a small fraction of the MLV vector integration sites. Our results show that active enhancers are the major targets of MLV vector integration. This model is compatible with reports that BET proteins are the major targeting proteins for MLV integration, and the model explains the vast majority of the MLV vector integration sites. Our


experiments were done with an MLV vector. However, the viral integration machinery used by the vector is identical to that used by intact MLV, and the parental virus will have the same integration site preferences and targeting pattern as the vector.

Histone acetylation plays a key role in regulating chromatin states and gene expression (34). Acetylation is generally associated with transcriptional activation. Acetylated histones are found not only at TSS/promoters, but also at active enhancers. BET proteins are known to bind acetylated histone tails (14). Recently, Zhang et al showed that Brd4 localizes to active enhancers in CD4+ T cells and that enhancer binding is lineage specific (35). These results, taken together with the interaction of the BET protein with MLV IN, explain the enrichment of MLV vector integration sites at active promoters and enhancers (model in FIG 4). Thus, BET proteins bind to MLV IN in the pre-integration complex (PIC) and target it to specific histone acetyl marks at enhancers and promoters. This mechanism allows MLV to integrate preferentially in active enhancers and ensures that the provirus is in an optimal environment for expression in a variety of cell types.

MLV integration is highly enriched near TSS because histone acetyl marks are enriched near TSS. However, our high resolution map of MLV vector integration shows that the region immediately adjacent to the TSS is not a preferred target for integration (FIG 1B). The explanation is that, although histone acetylation is enriched near the TSS of active genes, TSS are nucleosome free and there are no histone acetyl marks for BET proteins to bind.
Although it is reasonably clear that the Brd2, Brd3, and Brd4 proteins bind to acetylated residues on histone tails, exactly which modified histone residues are most tightly bound by these three Brd proteins, and the degree to which the binding sites of the various Brd proteins are influenced by other interactions, are not well understood (15).

Our results provide a model for MLV vector integration targeting, and have potential applications for genome research. MLV can infect a broad range of different cell types at high efficiency and it is easy to map millions of integration sites. MLV integration can be used to identify cell type specific


enhancers/promoters and/or to study the in vivo function of BET protein binding sites.

Our findings also provide another level of explanation for LMO2 as the target of insertional mutagenesis in the hematopoietic stem cell gene therapy trials with MLV-based vectors. In most of the leukemia cases in the SCID-X1 and Wiskott-Aldrich MLV vector gene therapy trials, LMO2 misregulation was caused by the insertion of a DNA copy of the MLV vector, which is thought to confer growth advantage by the action of enhancers and activators within the vector on the nearby gene. The CD34+ cell-specific enhancer at the large LMO2 locus makes the LMO2 locus a major hotspot for integration of the MLV vector. In a typical gene therapy trial, ~100 million vector infected CD34+ cells are injected into a patient. If each of the cells was independently infected, at the rate we observed for integrations in this region (0.12%), there would be ~120,000 infused cells that have a copy of the MLV vector integrated in the LMO2 locus, considerably increasing the risk of ectopic LMO2 expression. This cell-type specific preference for MLV vector integration into LMO2 suggests that the risk of LMO2 insertional activation might be reduced in other types of cells. Indeed, Biasco et al compared the effects of using the same MLV-based vector in two clinical trials for ADA SCID, one in which peripheral blood lymphocytes (PBL) were transduced, and the other in which CD34+ hematopoietic stem cells (HSC) were transduced (30). They found that LMO2 was a common integration site (CIS) in the HSC model, but not the PBL model. They also reported cell-type specific integration preferences at other targets, which can be explained by our findings.

Our results with the large number of unselected integration sites for an MLV vector are also valuable for the gene therapy field.
This dataset provides a baseline for the integration site preferences of an MLV vector in CD34+ cells. We have calculated the top targeted genes/regions with their normalized frequencies (supplemental Table S1). This information can be used to interpret in vivo data. For example, if we were only to look at the integration sites from our in vivo mouse model library, LMO2 would be identified as a prominent CIS. However, this does not


mean there was clonal expansion associated with LMO2 during the 2 month engraftment period in the murine xenograft model. It is simply a founder effect of the integration site preference of the vector during ex vivo transduction of CD34+ cells. When we compared the frequency of MLV vector integration sites near LMO2 in the ex vivo and in vivo data, there was no significant increase in the first 2 months of in vivo growth. Because MLV integration strongly favors active enhancers and promoters, it is possible that integration poses a smaller immediate risk of insertional mutagenesis in the CD34+ cells than previously thought because many of the key target genes (like LMO2) are already expressed at a high level. The risk of insertional mutagenesis will become greater when these cells differentiate/reprogram and need to shut down genes like LMO2. Thus, insertional mutagenesis in the clinical setting of gene transfer into CD34+ cells could be more a problem of “failing to shut down” instead of “turning on” genes like LMO2. The physiology of the specific genetic disorder being treated with an MLV vector likely also plays a role in subsequent growth pressures and selective advantages for certain cell types arising in the patient from the transduced CD34+ cells. It may also be possible to alter the risk posed by insertion at some of the known hotspots by using BET protein inhibitors such as JQ1 during the ex vivo transduction (11). Such inhibitors could reduce the frequency of MLV vector integration in the important regulatory elements, though likely at some cost to the overall transduction efficiency, unless new inhibitors that only affect integration site selectivity can be developed.
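The estimate given earlier in the Discussion, that ~120,000 infused cells would carry an integration in the LMO2 region, is the product of the cell dose and the observed integration rate. A minimal sketch of that arithmetic follows, assuming one independent integration per cell as the text does; the function and variable names are illustrative and do not come from the paper.

```python
def expected_cells_with_integration(infused_cells: int, region_rate: float) -> float:
    """Expected number of infused cells with an integration in a region.

    Assumes one independent integration per cell (the text's assumption).
    """
    if not 0.0 <= region_rate <= 1.0:
        raise ValueError("region_rate must be a fraction between 0 and 1")
    return infused_cells * region_rate


# Values quoted in the Discussion.
infused_cells = 100_000_000   # ~100 million vector-infected CD34+ cells
lmo2_rate = 0.0012            # 0.12% of integrations fall in the LMO2 region

lmo2_cells = expected_cells_with_integration(infused_cells, lmo2_rate)
print(f"expected cells with an LMO2-region integration: {lmo2_cells:,.0f}")
```

The same function can be reused with a different rate to compare cell types, provided a measured rate for that cell type is available.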
Acknowledgements

This work has been supported in part by the Intramural Program of the National Institute of Allergy and Infectious Diseases (NIAID), National Institutes of Health (NIH) under project Z01-AI-000644. This work was also funded in part with federal funds from the National Cancer Institute, National Institutes of Health, under Contract No. HHSN261200800001E, and by funding from the Intramural Program of the National Cancer Institute. The content of this publication does not necessarily


reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government. This work was supported in part by funding from Johnson & Johnson Research Pty Limited (a subsidiary of the Johnson & Johnson group of companies) in the context of a Cooperative Research and Development Agreement (CRADA AI-0167) with NIAID.

Adult healthy volunteers signed written informed consent under IRB approved NIH protocol 94-I-0073 for apheresis collection of peripheral blood mobilized CD34+ hematopoietic stem cells. NOD-SCID gamma C mouse xenograft studies of transplanted human CD34+ hematopoietic stem cells were performed under NIAID IACUC approved animal protocol LHD3E.

References

1. Hacein-Bey-Abina S, Garrigue A, Wang GP, Soulier J, Lim A, Morillon E, Clappier E, Caccavelli L, Delabesse E, Beldjord K, Asnafi V, MacIntyre E, Dal Cortivo L, Radford I, Brousse N, Sigaux F, Moshous D, Hauer J, Borkhardt A, Belohradsky BH, Wintergerst U, Velez MC, Leiva L, Sorensen R, Wulffraat N, Blanche S, Bushman FD, Fischer A, Cavazzana-Calvo M. 2008. Insertional oncogenesis in 4 patients after retrovirus-mediated gene therapy of SCID-X1. The Journal of clinical investigation 118:3132-3142.
2. Howe SJ, Mansour MR, Schwarzwaelder K, Bartholomae C, Hubank M, Kempski H, Brugman MH, Pike-Overzet K, Chatters SJ, de Ridder D, Gilmour KC, Adams S, Thornhill SI, Parsley KL, Staal FJ, Gale RE, Linch DC, Bayford J, Brown L, Quaye M, Kinnon C, Ancliff P, Webb DK, Schmidt M, von Kalle C, Gaspar HB, Thrasher AJ. 2008. Insertional mutagenesis combined with acquired somatic mutations causes leukemogenesis following gene therapy of SCID-X1 patients. The Journal of clinical investigation 118:3143-3150.
3. Krause D. 2011. Gene Therapy for Wiskott-Aldrich Syndrome: Benefits and Risks. The Hematologist 8:1.
4. Ott MG, Schmidt M, Schwarzwaelder K, Stein S, Siler U, Koehl U, Glimm H, Kuhlcke K, Schilz A, Kunkel H, Naundorf S, Brinkmann A, Deichmann A, Fischer M, Ball C, Pilz I, Dunbar C, Du Y, Jenkins NA, Copeland NG, Luthi U, Hassan M, Thrasher AJ, Hoelzer D, von Kalle C, Seger R, Grez M. 2006. Correction of X-linked chronic granulomatous disease by gene therapy, augmented by insertional activation of MDS1-EVI1, PRDM16 or SETBP1. Nature medicine 12:401-409.


5. Schroder AR, Shinn P, Chen H, Berry C, Ecker JR, Bushman F. 2002. HIV-1 integration in the human genome favors active genes and local hotspots. Cell 110:521-529.
6. Ciuffi A, Llano M, Poeschla E, Hoffmann C, Leipzig J, Shinn P, Ecker JR, Bushman F. 2005. A role for LEDGF/p75 in targeting HIV DNA integration. Nature medicine 11:1287-1289.
7. Cherepanov P, Ambrosio AL, Rahman S, Ellenberger T, Engelman A. 2005. Structural basis for the recognition between HIV-1 integrase and transcriptional coactivator p75. Proceedings of the National Academy of Sciences of the United States of America 102:17308-17313.
8. Wu X, Li Y, Crise B, Burgess SM. 2003. Transcription start regions in the human genome are favored targets for MLV integration. Science 300:1749-1751.
9. Gupta SS, Maetzig T, Maertens GN, Sharif A, Rothe M, Weidner-Glunde M, Galla M, Schambach A, Cherepanov P, Schulz TF. 2013. Bromo- and extraterminal domain chromatin regulators serve as cofactors for murine leukemia virus integration. Journal of virology 87:12721-12736.
10. De Rijck J, de Kogel C, Demeulemeester J, Vets S, El Ashkar S, Malani N, Bushman FD, Landuyt B, Husson SJ, Busschots K, Gijsbers R, Debyser Z. 2013. The BET Family of Proteins Targets Moloney Murine Leukemia Virus Integration near Transcription Start Sites. Cell reports.
11. Sharma A, Larue RC, Plumb MR, Malani N, Male F, Slaughter A, Kessl JJ, Shkriabai N, Coward E, Aiyer SS, Green PL, Wu L, Roth MJ, Bushman FD, Kvaratskhelia M. 2013. BET proteins promote efficient murine leukemia virus integration at transcription start sites. Proceedings of the National Academy of Sciences of the United States of America 110:12036-12041.
12. Studamire B, Goff SP. 2008. Host proteins interacting with the Moloney murine leukemia virus integrase: multiple transcriptional regulators and chromatin binding factors. Retrovirology 5:48.
13. Kanno T, Kanno Y, Siegel RM, Jang MK, Lenardo MJ, Ozato K. 2004. Selective recognition of acetylated histones by bromodomain proteins visualized in living cells. Molecular cell 13:33-43.
14. Nakamura Y, Umehara T, Nakano K, Jang MK, Shirouzu M, Morita S, Uda-Tochio H, Hamana H, Terada T, Adachi N, Matsumoto T, Tanaka A, Horikoshi M, Ozato K, Padmanabhan B, Yokoyama S. 2007. Crystal structure of the human BRD2 bromodomain: insights into dimerization and recognition of acetylated histone H4. The Journal of biological chemistry 282:4193-4201.
15. Hnilicova J, Hozeifi S, Stejskalova E, Duskova E, Poser I, Humpolickova J, Hof M, Stanek D. 2013. The C-terminal domain of Brd2 is important for chromatin interaction and regulation of transcription and alternative splicing. Molecular biology of the cell 24:3557-3568.
16. Kang EM, Choi U, Theobald N, Linton G, Long Priel DA, Kuhns D, Malech HL. 2010. Retrovirus gene therapy for X-linked chronic granulomatous disease can achieve stable long-term correction of oxidase activity in peripheral blood neutrophils. Blood 115:783-791.


17. Zhou S, Mody D, DeRavin SS, Hauer J, Lu T, Ma Z, Hacein-Bey Abina S, Gray JT, Greene MR, Cavazzana-Calvo M, Malech HL, Sorrentino BP. 2010. A self-inactivating lentiviral vector for SCID-X1 gene therapy that does not activate LMO2 expression in human T cells. Blood 116:900-908.
18. Throm RE, Ouma AA, Zhou S, Chandrasekaran A, Lockey T, Greene M, De Ravin SS, Moayeri M, Malech HL, Sorrentino BP, Gray JT. 2009. Efficient construction of producer cell lines for a SIN lentiviral vector for SCID-X1 gene therapy by concatemeric array transfection. Blood 113:5104-5110.
19. Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nusbaum C, Myers RM, Brown M, Li W, Liu XS. 2008. Model-based analysis of ChIP-Seq (MACS). Genome biology 9:R137.
20. Quinlan AR, Hall IM. 2010. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26:841-842.
21. Roth SL, Malani N, Bushman FD. 2011. Gammaretroviral integration into nucleosomal target DNA in vivo. Journal of virology 85:7393-7401.
22. Hon GC, Hawkins RD, Ren B. 2009. Predictive chromatin signatures in the mammalian genome. Human molecular genetics 18:R195-201.
23. Barski A, Cuddapah S, Cui K, Roh TY, Schones DE, Wang Z, Wei G, Chepelev I, Zhao K. 2007. High-resolution profiling of histone methylations in the human genome. Cell 129:823-837.
24. Bernstein BE, Stamatoyannopoulos JA, Costello JF, Ren B, Milosavljevic A, Meissner A, Kellis M, Marra MA, Beaudet AL, Ecker JR, Farnham PJ, Hirst M, Lander ES, Mikkelsen TS, Thomson JA. 2010. The NIH Roadmap Epigenomics Mapping Consortium. Nature biotechnology 28:1045-1048.
25. Wang GP, Ciuffi A, Leipzig J, Berry CC, Bushman FD. 2007. HIV integration site selection: analysis by massively parallel pyrosequencing reveals association with epigenetic modifications. Genome research 17:1186-1194.
26. Aiuti A, Biasco L, Scaramuzza S, Ferrua F, Cicalese MP, Baricordi C, Dionisio F, Calabria A, Giannelli S, Castiello MC, Bosticardo M, Evangelio C, Assanelli A, Casiraghi M, Di Nunzio S, Callegaro L, Benati C, Rizzardi P, Pellin D, Di Serio C, Schmidt M, Von Kalle C, Gardner J, Mehta N, Neduva V, Dow DJ, Galy A, Miniero R, Finocchi A, Metin A, Banerjee PP, Orange JS, Galimberti S, Valsecchi MG, Biffi A, Montini E, Villa A, Ciceri F, Roncarolo MG, Naldini L. 2013. Lentiviral hematopoietic stem cell gene therapy in patients with Wiskott-Aldrich syndrome. Science 341:1233151.
27. Biffi A, Montini E, Lorioli L, Cesani M, Fumagalli F, Plati T, Baldoli C, Martino S, Calabria A, Canale S, Benedicenti F, Vallanti G, Biasco L, Leo S, Kabbara N, Zanetti G, Rizzo WB, Mehta NA, Cicalese MP, Casiraghi M, Boelens JJ, Del Carro U, Dow DJ, Schmidt M, Assanelli A, Neduva V, Di Serio C, Stupka E, Gardner J, von Kalle C, Bordignon C, Ciceri F, Rovelli A, Roncarolo MG, Aiuti A, Sessa M, Naldini L. 2013. Lentiviral hematopoietic stem cell gene therapy benefits metachromatic leukodystrophy. Science 341:1233158.
28. Eidahl JO, Crowe BL, North JA, McKee CJ, Shkriabai N, Feng L, Plumb M, Graham RL, Gorelick RJ, Hess S, Poirier MG, Foster MP, Kvaratskhelia M.


2013. Structural basis for high-affinity binding of LEDGF PWWP to mononucleosomes. Nucleic acids research 41:3924-3936.
29. Cattoglio C, Pellin D, Rizzi E, Maruggi G, Corti G, Miselli F, Sartori D, Guffanti A, Di Serio C, Ambrosi A, De Bellis G, Mavilio F. 2010. High-definition mapping of retroviral integration sites identifies active regulatory elements in human multipotent hematopoietic progenitors. Blood 116:5507-5517.
30. Biasco L, Ambrosi A, Pellin D, Bartholomae C, Brigida I, Roncarolo MG, Di Serio C, von Kalle C, Schmidt M, Aiuti A. 2011. Integration profile of retroviral vector in gene therapy treated patients is cell-specific according to gene expression and chromatin conformation of target cell. EMBO molecular medicine 3:89-101.
31. Creyghton MP, Cheng AW, Welstead GG, Kooistra T, Carey BW, Steine EJ, Hanna J, Lodato MA, Frampton GM, Sharp PA, Boyer LA, Young RA, Jaenisch R. 2010. Histone H3K27ac separates active from poised enhancers and predicts developmental state. Proceedings of the National Academy of Sciences of the United States of America 107:21931-21936.
32. Creyghton MP, Markoulaki S, Levine SS, Hanna J, Lodato MA, Sha K, Young RA, Jaenisch R, Boyer LA. 2008. H2AZ is enriched at polycomb complex target genes in ES cells and is necessary for lineage commitment. Cell 135:649-661.
33. Ernst J, Kheradpour P, Mikkelsen TS, Shoresh N, Ward LD, Epstein CB, Zhang X, Wang L, Issner R, Coyne M, Ku M, Durham T, Kellis M, Bernstein BE. 2011. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature 473:43-49.
34. Shahbazian MD, Grunstein M. 2007. Functions of site-specific histone acetylation and deacetylation. Annual review of biochemistry 76:75-100.
35. Zhang W, Prakash C, Sum C, Gong Y, Li Y, Kwok JJ, Thiessen N, Pettersson S, Jones SJ, Knapp S, Yang H, Chin KC. 2012. Bromodomain-containing protein 4 (BRD4) regulates RNA polymerase II serine 2 phosphorylation in human CD4+ T cells. The Journal of biological chemistry 287:43137-43155.


Figure Legends

FIG 1. MLV vector integration sites are highly enriched at active TSS and promoter regions. (A) MLV vector and HIV lentivector integration frequency in the region near TSS (± 1kb) are compared to random sites and represented as fold enrichment on the left axis, and percentages of total integration sites on the right axis. (B) Mapping of integration sites from MLV vector (blue), HIV lentivector (brown), and Random (green) within 5000bp upstream and downstream of TSS. MLV integration sites peak at both the promoter region upstream of active TSS (-500bp) and the region downstream of TSS (+500bp). In contrast, no such peaks are observed near inactive TSS/promoters, which have few, if any, H3K4me3 marks (black). There is a sharp dip at the region immediately adjacent to the TSS. HIV integration is reduced near TSS and increases downstream of TSS in the gene body. (C) Promoters are sorted into bins of 100 each based on gene expression level in CD34+ cells. MLV vector integrations are counted in each bin in the promoters/TSS ± 1kb region.

FIG 2. MLV vector integration sites form tight clusters that co-localize with enhancers and promoters. (A) MLV vector integration clusters on chr11. Integration sites are represented by small ticks with colors denoting the orientation of the provirus. Red represents + strand orientation and blue represents – strand orientation. Some clusters co-localize with promoters/TSS (yellow box). Some are within introns (pink box). Others are intergenic (blue box). However, all MLV integration clusters co-localize with epigenetic marks for active enhancers and promoters including H3K4me1, H3K27ac, and H3K4me3. (B) Percentage and fold enrichment of MLV and HIV vector integration sites within peaks of specific epigenetic marks identified by ChIPseq in CD34+ cells. Fold enrichment was compared to frequencies of random integration sites. (C) Venn diagrams showing the overlap of MLV vector integration site peaks with H3K4me1 peaks or the overlap of random peaks with H3K4me1 peaks. The majority of the MLV vector integration site peaks (89%) overlap with a subset of H3K4me1 peaks. Base pairs that overlap are shown in parentheses. (D) Percentage of integration sites


associated with active/inactive enhancers identified by H3K4me1 alone or H3K4me1 and H3K27ac together. A total of 87% of MLV integration sites are within enhancers marked by both H3K4me1/H3K27ac or H3K4me1 alone. (E) Integration site distribution around the peaks of specific epigenetic marks. Integration frequencies at each base surrounding the peak summit (± 2kb) were calculated and presented as heatmaps. MLV vector integration is associated with H3K4me1, H2AZ, and H3K4me3. Integration of the HIV lentivector is associated with H3K36me3. Both MLV and HIV vectors avoid repressed regions.

FIG 3. MLV integration sites are cell-type specific. (A) MLV integration site enrichment in cell-type specific H3K4me1 and H3K27ac peaks. MLV vector integration in CD34+ cells is enriched in CD34+ cell-specific H3K4me1 peaks whereas MLV integration in CD4+ cells is enriched only in CD4+ cell-specific H3K4me1 peaks (left). A similar cell-type specific preference was seen for H3K27ac peaks (right). (B) MLV integration sites are clustered at the enhancer for the LMO2 gene in CD34+ cells, but not in CD4+ cells. In CD34+ cells, the enhancer region shows marks that are characteristic of active enhancers: high levels of H3K4me1, H3K27ac, H3K4me3 marks and a low level of the repressive mark H3K27me3. However, in CD4+ cells, the levels of the active marks are much lower and the level of the repressive mark H3K27me3 is higher. (C) CD34+ cell-specific MLV vector integration site cluster near the HOXA10 gene. No MLV integration sites were found in this region in CD4+ cells. (D) and (E) CD4+ cell-specific MLV integration site clusters near genes expressed in CD4+ cells. The CD4+ cell-specific clusters have a much smaller number of integrations compared to the CD34+ cell-specific clusters. No integration sites were found in the same window in CD34+ cells (E).

FIG 4. MLV integration targeting model. The MLV pre-integration complex (PIC) is targeted to active enhancer and promoter regions through interaction with BET proteins (BRD2, BRD3, BRD4), which interact with histone acetyl modifications in active enhancers and promoters.
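The fold-enrichment values described in the legends above (FIG 1A, FIG 2B) compare the fraction of vector integration sites that fall inside a set of peaks with the fraction of random sites that do. The following is a minimal sketch of that comparison for a single chromosome. It is not the pipeline used in the paper (the reference list includes MACS and BEDTools for peak calling and interval comparison), and the peak intervals and site positions are hypothetical toy data.

```python
from bisect import bisect_right


def fraction_in_peaks(sites, peaks):
    """Fraction of positions that fall inside any [start, end) peak.

    Assumes peaks are sorted, non-overlapping, and on one chromosome.
    """
    if not sites:
        raise ValueError("sites must not be empty")
    starts = [start for start, _ in peaks]
    hits = 0
    for pos in sites:
        # Index of the last peak whose start is <= pos, if any.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < peaks[i][1]:
            hits += 1
    return hits / len(sites)


# Hypothetical toy data, not taken from the paper.
peaks = [(100, 200), (500, 650), (900, 1000)]
vector_sites = [120, 150, 510, 640, 700, 950, 980, 990]
random_sites = [10, 150, 300, 450, 600, 750, 860, 1100]

vector_frac = fraction_in_peaks(vector_sites, peaks)
random_frac = fraction_in_peaks(random_sites, peaks)
fold = vector_frac / random_frac
print(f"vector: {vector_frac:.1%}, random: {random_frac:.1%}, fold: {fold:.1f}")
```

With real data the same calculation would be run per chromosome against ChIPseq peak calls, and a random fraction of zero would need separate handling before dividing.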


[FIG 1 image, panels A-C. (A) Fold enrichment near TSS (±1kb) and percentage near TSS (±1kb) for MLV, HIV, and Random; labels: Expected, 15%, 1.3%. (B) Integration frequency (CPM) versus position from -5000bp to +5000bp around the TSS, for MLV, HIV, Random, and MLV inactive TSS. (C) Integration frequency near TSS (CPM) versus gene bins with increasing activity, for MLV and Random; R2=0.90.]


[FIG 2, panels A-E. Images not recoverable from the extracted text; the labels below are what survives.]
Genome browser view: chr11 (hg19), approximately 9,350,000 to 9,750,000, 100 kb scale bar. Tracks: MLV 3LTR, MLV 5LTR, H3K4me1, H3K27ac, H3K4me3, RefSeq Genes (TMEM41B, IPO7, SNORA23, LOC644656, ZNF143, WEE1, SWAP70, LOC440028, SBF2-AS1).
Percentage of integration (0% to 100%) for MLV, HIV, and Rnd, split into H3K4me1, H3K4me1 plus H3K27ac, and Others.
Panel D: profiles from -2 kb to +2 kb for MLV, HIV, and Rnd around H3K4me1 (enhancers), H2AZ (enhancers/promoters), H3K4me3 (TSS/active promoters), H3K36me3 (active gene bodies), and H3K9me3 (repressed regions).
Overlap diagrams: H3K4me1 peaks versus MLV peaks (54,014; 6,740; 94,806) and H3K4me1 peaks versus Random peaks (117,123; 52,813; 7,941), with overlap sizes labeled (52 Mb) and (5 Mb).

Panel B (table rebuilt from the extracted columns; the column assignment is inferred from extraction order and from the fold values matching the percentage ratios):

Mark, annotation                MLV%    HIV%    Rnd%    MLV fold    HIV fold
H3K4me1, enhancers              86.9    30.2    10.0       9           3
H2AZ, enhancers/promoters       51.8     5.5     4.6      11           1
H3K27ac, active reg elements    46.4     9.1     2.3      20           4
H3K9me1, activation             47.7    42.6     8.1       6           5
H3K27me1, activation            32.7    33.1     8.4       4           4
RNAPolII, transcription         28.6    12.1     2.4      12           5
H3K4me3, promoters              19.9     2.1     1.4      14           1
H4K20me1, active gene bodies    19.6    45.6     7.6       3           6
H3K36me3, active gene bodies     9.2    51.8    10.2      -1           5
H3K9me3, repressed regions       2.1     4.0     7.3      -3          -2
H3K27me3, repressed regions      1.7     0.9     6.5      -4          -7
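The fold-enrichment values in FIG 2B appear consistent with a simple signed ratio: the observed percentage divided by the random percentage when a mark is enriched, and the negative of the inverse ratio when it is depleted. The figure itself does not state this convention, so the sketch below is an assumption inferred from the numbers, and the function name `signed_fold` is hypothetical. The three (MLV%, Rnd%) pairs are read from the FIG 2B values for H3K4me1, H3K27ac, and H3K27me3.

```python
def signed_fold(observed_pct: float, random_pct: float) -> float:
    """Signed fold change relative to random sites.

    Assumed convention (inferred, not stated in the source):
    ratio >= 1 is reported as-is (enrichment); ratio < 1 is
    reported as the negative inverse ratio (depletion).
    """
    if observed_pct <= 0 or random_pct <= 0:
        raise ValueError("percentages must be positive")
    ratio = observed_pct / random_pct
    return ratio if ratio >= 1 else -1.0 / ratio


# (mark, MLV %, random %) pairs taken from the FIG 2B values
rows = [
    ("H3K4me1", 86.9, 10.0),
    ("H3K27ac", 46.4, 2.3),
    ("H3K27me3", 1.7, 6.5),
]

for mark, mlv_pct, rnd_pct in rows:
    # Rounded to the nearest integer, as the figure appears to report it
    print(f"{mark}: {round(signed_fold(mlv_pct, rnd_pct))}")
```

Under this assumed convention the rounded results agree with the MLV fold values shown for these three marks, which is why the inverse-ratio reading of the negative entries seems plausible.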


[FIG 3, panels A-C. Images not recoverable from the extracted text; the labels below are what survives.]
Fold enrichment (axis 0 to 7) of CD34+ integration and CD4+ integration in CD34+ specific H3K4me1 peaks and CD4+ specific H3K4me1 peaks.
Fold enrichment (axis 0 to 16) of CD34+ integration and CD4+ integration in CD34+ specific H3K27ac peaks and CD4+ specific H3K27ac peaks.
Genome browser view, CD34+ cell-specific MLV cluster near LMO2 gene: chr11 (hg19), approximately 33,850,000 to 34,050,000, 100 kb scale bar.
Genome browser view, CD34+ cell-specific MLV cluster near HOXA gene: chr7 (hg19), approximately 27,150,000 to 27,250,000, 50 kb scale bar. RefSeq genes shown: HOXA3, HOXA4, HOXA-AS3, HOXA5, HOXA6, HOXA7, HOXA9, HOXA10-HOXA9, HOXA-AS4, HOXA10, HOXA11, HOXA11-AS, HOXA13, HOTTIP.
Tracks in both browser views: MLV CD34+ Cells, MLV CD4+ Cells, and H3K4me1, H3K27ac, H3K4me3, and H3K27me3 for CD4+ and for CD34+ cells, plus RefSeq Genes.


[FIG 3, continued, panels D-E. Images not recoverable from the extracted text; the labels below are what survives.]
Genome browser view, CD4+ cell-specific MLV clusters near ITK gene: chr5 (hg19), approximately 156,300,000 to 157,000,000, 200 kb scale bar. RefSeq genes shown: HAVCR1, FAM71B, ITK, CYFIP2, FNDC9, NIPAL4, ADAM19, SOX30, MED7.
Genome browser view, CD4+ cell-specific MLV clusters near TCF7 gene: chr5 (hg19), approximately 133,410,000 to 133,480,000, 20 kb scale bar. RefSeq gene shown: TCF7.
Tracks in both browser views: MLV CD34+ Cells, MLV CD4+ Cells, and H3K4me1, H3K27ac, H3K4me3, and H3K27me3 for CD4+ and for CD34+ cells, plus RefSeq Genes.


[FIG 4. Schematic not recoverable from the extracted text. Surviving labels: Gene, Promoter, TSS, RNAPII, Enhancer (three copies), BRD2, BRD3, BRD4, MLV PIC, Ac (acetyl) and Me (methyl) histone marks, H3K4me1, H3K27ac, H3K4me3.]
