Date post: 07-May-2020
Supplementary Information (SI)

Genomic and transcriptional landscape of P2RY8-CRLF2-positive childhood acute lymphoblastic leukemia

Cornelia Vesely,1* Christian Frech,1* Cornelia Eckert,2 Gunnar Cario,3 Astrid Mecklenbräuker,1 Udo zur Stadt,4 Karin Nebral,1 Fiona Kraler,1 Susanna Fischer,1 Andishe Attarbaschi,5 Michael Schuster,6 Christoph Bock,6 Helene Cavé,7 Arend von Stackelberg,2 Martin Schrappe,3 Martin A. Horstmann,4 Georg Mann,5 Oskar A. Haas,1,5 and Renate Panzer-Grümayer1§

1. Supplementary Methods
2. Supplementary Results
3. Supplementary References
4. Additional Files

1. Supplementary Methods

Characterization of P2RY8-CRLF2 major clone-positive ALL cases

All samples from initial diagnosis and relapse were assessed for the presence of P2RY8-CRLF2 by quantitative PCR amplification of the genomic breakpoint and by RT-PCR as described previously [1]. Cases from AT were also routinely screened by FISH, as indicated in Supplementary Table 1, using the CRLF2 Dual Color Break Apart Probe (Zytovision, Bremerhaven, Germany).

Sample preparation and high-throughput sequencing

In brief, 50 ng of double-stranded genomic DNA was enzymatically sheared to an average size of 200 bp. Libraries were then prepared with the Illumina Nextera Rapid Capture Kit (Illumina), and 100 bp paired-end sequencing was performed on an Illumina HiSeq 2000 (Illumina), with 12 samples per lane and 2 lanes per sample, to reach a coverage of at least 50x.

MLPA

MLPA was performed with the SALSA MLPA probemixes P202-B1 IKZF1 (for IKZF1, IKZF2, IKZF3 and CDKN2A/2B) and P335 ALL-IKZF1 (for IKZF1, CDKN2A/2B, EBF1, Xp-PAR-region, PAX5, ETV6, BTG1 and RB1) (MRC-Holland, Amsterdam, NL).

SNP Array analysis

The following criteria for the determination of copy number aberrations were applied: For losses, the minimum number of probes was 25 for genome-wide regions and 20 for leukemia-associated regions, whereas the minimum number of probes was 50 covering at least 50 kb for genome-wide gains, and 25 probes for leukemia-associated gains.
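The probe-count criteria above can be written as a small decision function. This is only an illustrative sketch: the function name and arguments are hypothetical and are not part of the array software the authors used.

```python
def passes_snp_array_criteria(kind, n_probes, length_kb, leukemia_associated):
    """Return True if a segment meets the minimum-evidence criteria stated above.

    kind: "loss" or "gain"
    n_probes: number of array probes supporting the segment
    length_kb: segment length in kb (only used for genome-wide gains)
    leukemia_associated: True if the segment lies in a leukemia-associated region
    """
    if kind == "loss":
        # Losses: 25 probes genome-wide, 20 in leukemia-associated regions.
        return n_probes >= (20 if leukemia_associated else 25)
    if kind == "gain":
        if leukemia_associated:
            # Leukemia-associated gains: 25 probes.
            return n_probes >= 25
        # Genome-wide gains: 50 probes covering at least 50 kb.
        return n_probes >= 50 and length_kb >= 50
    raise ValueError("kind must be 'loss' or 'gain'")
```

For example, a genome-wide gain supported by 50 probes but spanning only 40 kb would be rejected, whereas a leukemia-associated loss supported by 20 probes would be accepted.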

QC of WES

The average number of pass-filter reads per sample was 93M (range 40M-228M) (Additional File 1). The mean target coverage was 74x per sample (range 30x-126x), and on average 95% of targeted regions were covered by at least 10 reads (range 76%-98%).

WES read alignment and variant calling

Reads were aligned with BWA-MEM v0.7.8-r455 (http://arxiv.org/abs/1303.3997) to the human reference genome GRCh37 as provided by the 1000 Genomes Consortium, which includes human decoy sequences and masked pseudoautosomal regions on chromosome Y. Aligned BAM files were processed using GATK v2.8 [2] and Picard v1.118 (http://broadinstitute.github.io/picard) to mark PCR duplicates, realign reads around indels, and recalibrate base quality scores. Somatic point mutations and indels were called using MuTect v1.1.5 [3] and IndelGenotyper2 (http://www.broadinstitute.org/cancer/cga/indelocator), respectively.

WES variant annotation and filtering

Variant annotation and filtering were performed as described previously [4]. Briefly, variants were annotated with SnpEff version 3.6 [5] and dbNSFP version 2.6 [6,7]. Predicted deleteriousness of non-silent point mutations is based on a combination of PolyPhen-2 [8], SIFT [9] and SiPhy [10] predictions. Variants were excluded if they (a) had allelic (read) frequency <10%; (b) overlapped with repeats, segmental duplications, or blacklisted regions; (c) were present in at least two remission samples sequenced and processed with the same protocol; (d) corresponded to common SNPs without known medical impact (http://www.ncbi.nlm.nih.gov/variation/docs/human_variation_vcf/#common_no_known, version Aug 26th, 2014); or (e) were present either in G1K [11] or EVS [Exome Variant Server, NHLBI GO Exome Sequencing Project (ESP), Seattle, WA (URL: http://evs.gs.washington.edu/EVS/)] with population allele frequency above 1%.
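The exclusion criteria (a) to (e) can be illustrated as a simple rule check. The sketch below assumes each variant has already been annotated with the relevant flags; the `Variant` record and its field names are hypothetical and do not reflect the authors' actual data structures.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    allele_freq: float                  # allelic (read) frequency in the tumor sample
    in_problem_region: bool             # repeat, segmental duplication, or blacklisted region
    n_remission_samples: int            # remission samples (same protocol) carrying the variant
    common_snp_no_medical_impact: bool  # listed as common SNP without known medical impact
    population_af: float                # highest population allele frequency in G1K or EVS


def exclusion_reasons(v):
    """Return the letters of all exclusion criteria (a-e) that the variant meets."""
    reasons = []
    if v.allele_freq < 0.10:
        reasons.append("a")
    if v.in_problem_region:
        reasons.append("b")
    if v.n_remission_samples >= 2:
        reasons.append("c")
    if v.common_snp_no_medical_impact:
        reasons.append("d")
    if v.population_af > 0.01:
        reasons.append("e")
    return reasons


def keep_variant(v):
    """A variant is retained only if none of the exclusion criteria apply."""
    return not exclusion_reasons(v)
```

Returning the list of matched criteria, rather than a bare boolean, makes it easy to tabulate how many variants each filter removes.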

WES CNA detection and filtering

WES-based copy number aberration (CNA) detection was performed using a custom pipeline based on exomeCopy version 1.14.0 [12]. We largely followed the exomeCopy vignette but incorporated several changes that in our hands improved results on cancer genome samples. Briefly, per-exon background read depth and variance were calculated from remission samples using the exomeCopy functions countBamInGRanges and generateBackground. Exons with very high or low read depth variance in remission samples (top and bottom percentile) were excluded. In addition, to help exomeCopy deal with aneuploid samples, CNAs were not called on a per-chromosome basis but on a single “virtual” chromosome created by concatenating all autosomal chromosomes. This step was critical to allow the hidden Markov model-based segmentation algorithm to predict the correct copy number state of larger segmental aberrations that extend over a significant portion of a chromosome, in particular whole-chromosome or chromosome-arm CNAs. Sex chromosomes were processed separately for males and females, with background read depths calculated from only sex-matched remission samples. CNA calling was performed with the function exomeCopy, including log background read depth, GC content, squared GC content, and exon width in the regression model. Possible copy number states (parameter S) were set to 0:6 and the copy number for the normal state (parameter d) was set to 2. All other parameters were left at their defaults.
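The “virtual” chromosome idea can be sketched as a coordinate shift: each autosome is placed after the previous one, so that all exons share one continuous coordinate axis. The code below is a minimal Python illustration, not the authors' R pipeline; the function names and the toy chromosome lengths in the usage note are assumptions.

```python
AUTOSOMES = [str(i) for i in range(1, 23)]


def build_offsets(chrom_lengths):
    """Map each autosome present in chrom_lengths to its start offset on the virtual chromosome."""
    offsets = {}
    running = 0
    for chrom in AUTOSOMES:
        if chrom in chrom_lengths:
            offsets[chrom] = running
            running += chrom_lengths[chrom]
    return offsets


def to_virtual(exons, chrom_lengths):
    """Convert (chrom, start, end) exons to sorted (virtual_start, virtual_end, chrom) tuples.

    Exons on sex chromosomes or unknown contigs are dropped, since those are
    handled separately in the procedure described above.
    """
    offsets = build_offsets(chrom_lengths)
    virtual = [
        (start + offsets[chrom], end + offsets[chrom], chrom)
        for chrom, start, end in exons
        if chrom in offsets
    ]
    return sorted(virtual)
```

With toy lengths `{"1": 1000, "2": 500}`, an exon at positions 10-20 on chromosome 2 maps to 1010-1020 on the virtual chromosome. Keeping the original chromosome name in each tuple allows called segments to be mapped back afterwards.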

To further enrich for true-positive somatic CNAs, the following filters were applied after calling CNAs: (1) the CNA must not be found in any remission sample, including the patient-matched remission (= germline CNA filter) AND (2a) the log-odds score is >= 50 (= high-confidence CNA filter) OR (2b) the CNA is also found in at least one additional tumor sample from the same patient (= conserved CNA filter) OR (2c) the CNA is also found in at least three non-matched tumor samples (= recurrent CNA filter). CNAs from different samples were considered identical if they shared at least 30% of their exons (intersection divided by union). In addition, single-exon CNAs were considered unreliable and removed.
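The filter logic, filter (1) AND (2a OR 2b OR 2c), and the 30% exon-overlap rule can be sketched as follows. This is a hedged illustration only: each CNA is represented as a collection of exon identifiers, every entry in the comparison lists is assumed to come from a distinct sample, and copy number direction (gain or loss) is ignored because the text does not say how it enters the comparison. All function and argument names are hypothetical.

```python
def jaccard(exons_a, exons_b):
    """Fraction of shared exons: intersection divided by union."""
    a, b = set(exons_a), set(exons_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def same_cna(exons_a, exons_b, min_overlap=0.30):
    """Two CNAs count as identical if they share at least 30% of their exons."""
    return jaccard(exons_a, exons_b) >= min_overlap


def keep_cna(exons, log_odds, remission_cnas, same_patient_tumor_cnas,
             other_patient_tumor_cnas):
    """Apply the single-exon, germline, and (2a OR 2b OR 2c) filters."""
    if len(set(exons)) < 2:
        return False  # single-exon CNAs are considered unreliable
    if any(same_cna(exons, r) for r in remission_cnas):
        return False  # (1) germline CNA filter
    high_confidence = log_odds >= 50  # (2a)
    conserved = any(same_cna(exons, t) for t in same_patient_tumor_cnas)  # (2b)
    recurrent = sum(same_cna(exons, t) for t in other_patient_tumor_cnas) >= 3  # (2c)
    return high_confidence or conserved or recurrent
```

For instance, a three-exon CNA with a log-odds score of 10 is kept if a matching CNA is seen in another tumor sample of the same patient, but discarded if a matching CNA appears in any remission sample.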

Performance assessment of WES-based CNA predictions

To assess the accuracy of WES-based CNAs, we compared copy number states predicted by WES with those obtained from complementary assays, including PCR, SNP-arrays, MLPA, and FISH. Examined loci included the PAR1 region, IKZF1, CDKN2A/B, and PAX5, as these were the four loci most frequently affected by CNAs in our cohort. PAR1 copy number status was known from PCR in all sequenced samples. IKZF1 and CDKN2A/B status was assessed by MLPA in 47 samples. In addition, high-density SNP-array results were available for 20 samples, providing CN information for all four loci. Three samples (715D, 715R, and GI8R) were excluded from analysis because WES coverage signals were too noisy. A summary of CNAs detected in these four loci by each assay can be found in Additional File 4.

Based on CNAs in PAR1, IKZF1, CDKN2A/B and PAX5, we determined the overall sensitivity and specificity of our pipeline as 92% (84 of 91 known events identified) and 100% (no false-positive predictions), respectively (Supplementary Figure 1). In the vast majority of cases, WES predictions agreed closely with SNP-array results. Seven events were below the detection limit of our pipeline. For example, the PAR1 deletion in sample 961D was missed because it is present on only one of four copies of chromosome X. Three IKZF1 micro-deletions (108R, DL2R, DS10898D) were not reported because they were just above filtering thresholds. In sample 360D, the CDKN2A/B deletion was missed because exomeCopy erroneously split one small bi-allelic CDKN2A/B deletion into two segments with copy numbers 0 and 1, both of which were then too small to meet subsequent filtering cut-offs. Of note, all nine known PAX5 deletions were found by our pipeline, although some of them were quite small (e.g. only two exons in HV80R and three exons in 715RR). Visual examination of WES coverage plots confirmed coverage signals for all seven false-negative events, suggesting that pipeline sensitivity could be further improved by fine-tuning CNA calling and filtering cut-offs.
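The reported figures follow from the standard definitions of sensitivity and specificity. The short sketch below reproduces the arithmetic; the true-negative count is not given in this paragraph, so the value used for it is a placeholder assumption (with zero false positives, any positive value yields 100%).

```python
def sensitivity(tp, fn):
    """Sensitivity = TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Specificity = TN / (TN + FP)."""
    return tn / (tn + fp)


# 84 of 91 known events were identified, so 7 were missed.
sens = sensitivity(tp=84, fn=7)

# No false-positive predictions; tn=100 is a hypothetical placeholder.
spec = specificity(tn=100, fp=0)

print(f"sensitivity = {sens:.1%}, specificity = {spec:.0%}")
```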

RNA-Seq gene expression analysis

Sequencing yield (22 samples) was 25.6 million reads on average (range 12.5-33.8), of which 8.6 million reads (34%) mapped uniquely to protein-coding exons (range 3.6-13.2). Reads were mapped with GSNAP v2014-12-28 [13] (“--maxsearch=100 --npaths=1 --max-mismatches=1 --novelsplicing=0”) against GRCh37 (same reference genome as for WES) and assigned to Ensembl gene models (build 75) using HTSeq [14] (“htseq-count -f bam -t exon -s no”). Known SNPs and splice sites were extracted from dbSNP build 138 and Ensembl GRCh37 build 75, respectively. All processed samples passed internal quality control checks (base qualities, mapping rates, duplication rates, 5’-3’ coverage) performed with FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc) and RSeQC [15].

After read mapping and counting, DESeq2 [16] was used to generate normalized gene expression profiles (function ‘fpm’, robust=TRUE) and to call differentially expressed genes (nbinomWaldTest, minReplicates=5, cooksCutoff=0.7). For Supplementary Figure 8, DESeq2 normalized gene expression profiles (log2-scale, 30,940 genes) were converted into sample distances using the R function ‘dist’ (default parameters) and projected onto two dimensions using multidimensional scaling (MDS) as implemented in R ‘cmdscale’ (k=2). Comparing IKN and IKC, DESeq2 identified 262 significantly up- and 322 significantly down-regulated genes (|log2FC| >= 0.8, q <= 1.0E-4) (Additional File 5). Between IKD and IKC, 86 significantly up- and 126 significantly down-regulated genes were reported (|log2FC| >= 0.8, q <= 0.25) (Additional File 6). The top 50 up- and down-regulated genes from both comparisons (200 genes in total) were combined into a gene-sample matrix and hierarchically clustered (Figure 3A). Genes were clustered using average linkage clustering of Spearman correlation coefficients, and samples were clustered by complete linkage clustering of Euclidean distances. The clustered data matrix was then plotted with the R function ‘heatmap.2’, with expression values scaled by row.
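The significance thresholds quoted above amount to a two-sided cut on fold change combined with a cut on the adjusted p-value. The sketch below shows that selection step on an already computed results table; it does not re-implement DESeq2, and the record layout (`gene`, `log2fc`, `q`) is a hypothetical stand-in for the DESeq2 output columns.

```python
def select_de_genes(results, min_abs_log2fc=0.8, max_q=1.0e-4):
    """Split a results table into significantly up- and down-regulated genes.

    results: iterable of dicts with keys "gene", "log2fc" and "q".
    Defaults correspond to the IKN vs IKC comparison; use max_q=0.25 for
    the IKD vs IKC thresholds.
    """
    up, down = [], []
    for row in results:
        if row["q"] is None or row["q"] > max_q:
            continue  # not significant (or no adjusted p-value available)
        if row["log2fc"] >= min_abs_log2fc:
            up.append(row["gene"])
        elif row["log2fc"] <= -min_abs_log2fc:
            down.append(row["gene"])
    return up, down
```

Rows without an adjusted p-value are skipped here on the assumption that such genes were filtered out upstream; that handling is a design choice of this sketch, not something stated in the text.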

Gene set enrichment analysis was performed with the javaGSEA desktop application (v2.0.13) from the Broad Institute (http://software.broadinstitute.org/gsea) [17,18] and with gene sets obtained from MSigDB 5.0 (http://software.broadinstitute.org/gsea/msigdb) [17]. Genes were pre-ranked by DESeq2 p-values from most to least significant, considering the directionality of change (i.e. the most significantly up-regulated genes were at the top, the most significantly down-regulated genes at the bottom). Command line options for xtools.gsea.GseaPreranked included “-collapse false -mode Max_probe -norm meandiv -scoring_scheme weighted -include_only_symbols true -make_sets true -rnd_seed 149 -gui false -nperm 1000 -set_max 5000 -set_min 5”.
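One common way to obtain such a directional ranking is to score each gene as the sign of its log2 fold change multiplied by -log10 of its p-value. The text does not state the exact ranking metric used, so the sketch below is an assumption that merely produces the ordering described (most significant up-regulated genes first, most significant down-regulated genes last); the function name and input layout are hypothetical.

```python
import math


def preranked_list(results, min_p=1e-300):
    """Return (gene, score) pairs sorted from most up- to most down-regulated.

    results: iterable of (gene, log2fc, p_value) tuples.
    score = sign(log2fc) * -log10(p_value); p-values are clamped at min_p
    so that p = 0 does not cause a math domain error.
    """
    scored = []
    for gene, log2fc, p_value in results:
        sign = 1.0 if log2fc >= 0 else -1.0
        score = sign * -math.log10(max(p_value, min_p))
        scored.append((gene, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

The resulting pairs could then be written as a two-column, tab-separated ranked list file for use with the pre-ranked mode of GSEA.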

The complete RNA-Seq analysis pipeline (including alignment, QC, differential gene expression analysis, and gene set enrichment) was implemented in the workflow management platform Anduril (http://www.anduril.org) [19].

Supplementary Figure 1. Flow diagram showing the number of cases/samples analyzed by various methods for sequence and copy number alterations and transcriptional profile.

Supplementary Table 1. Experimental assays performed on each sample

UPN  WES  HD SNP array  MLPA/FISH  RNA-Seq

N7 D x
N7 R x x
DL2 D x x
DL2 R x x x x
GI13 D x x
GI13 R x x x
108 D x x/x
108 R x x/x
108 RR x x x
HV80 D x x x
HV80 R x x x x
DS10898 D x x
DS10898 R x x
B36 D x x
B36 R x x
GI8 D x x x
GI8 R x x x
92 D x x x x
92 R x x x
HV57 D x x x
HV57 R x x x
737 D x x x/x
737 R x x x/x
737 RR x x x
VS14645 D x x
VS14645 R x x
839 D x x x/x
839 R x x x/x x
BB16 D x x x
BB16 R x x
GL11356 D x
GL11356 R x
AL9890 D x
AL9890 R x x
AL9890 RR x
715 D x x x/x
715 R x x x
715 RR x x x
SE15285 D x x
SE15286 R x x
S23 D x
S23 R x x
KE17247 D x x
1060 D x x/x x
BJ17183 D x x
841 D x x
KT14158 D x x x
400 D x x x
379 D x x x
242 D x x x/x
1089 D x x/x x
948 D x x x/x
903 D x x/x
833 D x x/x
887 D x x/x x
365 D x x/x x
HW11537 D x
TL14516 D x
360 D x x/x x
769 D x x/x
506 D x x x
1066 D x x/x
961 D x x x/x x
802 D x x/x x
5755 D x
7839 D x
6603 D x
7361 D x
14197 D x
11898 D x
7118 D x
11536 D x
13906 D x
4558 D x
4868 D x

2. Supplementary Results

Supplementary Table 2. Summary of all cases with JAK/STAT pathway mutations.

UPN      Timepoint  Cohort     Gene   MAF   AA change  Impacted domain
108      D          relapsing  JAK2   0.26  R683G      Tyrosine-protein kinase, catalytic domain
108      R          relapsing  JAK2   0.51  R683G      Tyrosine-protein kinase, catalytic domain
108      RR         relapsing  JAK2   0.30  R683G      Tyrosine-protein kinase, catalytic domain
DL2      D          relapsing  JAK2   0.36  R683G      Tyrosine-protein kinase, catalytic domain
DL2      R          relapsing  JAK2   0.41  L611S      Tyrosine-protein kinase, catalytic domain
GI8      D          relapsing  JAK2   0.23  R683G      Tyrosine-protein kinase, catalytic domain
GI8      R          relapsing  JAK1*  0.72  R93H       Band 4.1 domain; FERM domain
VS14645  D          relapsing  JAK2   0.49  R683G      Tyrosine-protein kinase, catalytic domain
VS14645  R          relapsing  JAK2   0.42  R683G      Tyrosine-protein kinase, catalytic domain
B36      D          relapsing  JAK3   0.45  R657W      Tyrosine-protein kinase, catalytic domain
B36      R          relapsing  JAK3   0.48  R657W      Tyrosine-protein kinase, catalytic domain
839      D          relapsing  JAK2   0.20  R683S      Tyrosine-protein kinase, catalytic domain
839      R          relapsing  JAK2   0.95  R683S      Tyrosine-protein kinase, catalytic domain
KE17247  D          relapsing  JAK2   0.30  R683G      Tyrosine-protein kinase, catalytic domain
S23      D          relapsing  JAK2   0.25  R683G      Tyrosine-protein kinase, catalytic domain
HV80     D          relapsing  CRLF2  0.63  F232C      adjacent to Fibronectin type-III
1060     D          relapsing  IL7R   0.16  S185C      Fibronectin, type III
AL9890   D          relapsing  JAK2   0.55  R683S      Tyrosine-protein kinase, catalytic domain
HV57     D          relapsing  JAK2   0.32  R683S      Tyrosine-protein kinase, catalytic domain
BJ17183  D          relapsing  JAK2   0.22  R683T      Tyrosine-protein kinase, catalytic domain
BJ17183  D          relapsing  JAK2   0.49  V461I      Tyrosine-protein kinase, catalytic domain
GI13     R          relapsing  JAK2   0.38  R683G      Tyrosine-protein kinase, catalytic domain
SE15285  R          relapsing  JAK2   0.14  R683G      Tyrosine-protein kinase, catalytic domain
DS10898  R          relapsing  SYK    0.28  P541L      Tyrosine-protein kinase, catalytic domain
242      D          non-rel    JAK1   0.36  T901R      Tyrosine-protein kinase, catalytic domain
379      D          non-rel    JAK2   0.61  F694L      Tyrosine-protein kinase, catalytic domain
400      D          non-rel    CRLF2  0.13  F232C      adjacent to Fibronectin type-III
400      D          non-rel    JAK3   0.48  R657Q      Tyrosine-protein kinase, catalytic domain
841      D          non-rel    JAK2   0.27  I682F      Tyrosine-protein kinase, catalytic domain
1089     D          non-rel    CRLF2  0.12  F232C      adjacent to Fibronectin type-III
HW11537  D          non-rel    JAK2   0.37  R789Q      Tyrosine-protein kinase, catalytic domain
HW11537  D          non-rel    JAK2   0.35  R683S      Tyrosine-protein kinase, catalytic domain
KT14158  D          non-rel    JAK2   0.13  R683S      Tyrosine-protein kinase, catalytic domain
TL14516  D          non-rel    JAK2   0.36  R683S      Tyrosine-protein kinase, catalytic domain

MAF, mutant allelic frequency; AA change, predicted amino acid change; *, non-deleterious mutation.

Supplementary Table 3. Summary of all cases with RTK/Ras pathway mutations.

UPN      Timepoint  Cohort     Gene    MAF   AA change  Impacted domain
839      D          relapsing  PTPN11  0.29  E76K       Src Homology 2 domain
839      R          relapsing  PTPN11  0.37  E76V       Src Homology 2 domain
715      D          relapsing  KRAS    0.45  G12D       Small GTP-binding protein domain
715      R          relapsing  KRAS    0.55  G12D       Small GTP-binding protein domain
715      RR         relapsing  KRAS    0.42  G12D       Small GTP-binding protein domain
737      D          relapsing  KRAS    0.35  G12D       Small GTP-binding protein domain
737      R          relapsing  KRAS    0.47  G12D       Small GTP-binding protein domain
737      RR         relapsing  KRAS    0.39  G12D       Small GTP-binding protein domain
92       D          relapsing  KRAS    0.47  G12D       Small GTP-binding protein domain
92       R          relapsing  KRAS    0.49  G12D       Small GTP-binding protein domain
B36      D          relapsing  KRAS    0.18  G12D       Small GTP-binding protein domain
B36      R          relapsing  NRAS    0.55  G12S       Small GTP-binding protein domain
1060     D          relapsing  KRAS    0.16  G12D       Small GTP-binding protein domain
AL9890   R          relapsing  KRAS    0.31  G12V       Small GTP-binding protein domain
AL9890   RR         relapsing  KRAS    0.34  G12GP      Small GTP-binding protein domain
S23      R          relapsing  KRAS    0.33  K117N      Small GTP-binding protein domain
S23      RR         relapsing  KRAS    0.37  V14GV      Small GTP-binding protein domain
DS10898  R          relapsing  KRAS    0.25  G12R       Small GTP-binding protein domain
HV80     R          relapsing  NRAS    0.21  G12D       Small GTP-binding protein domain
HV80     R          relapsing  KRAS    0.15  A146T      Small GTP-binding protein domain
GL11356  R          relapsing  NRAS    0.25  Q61R       Small GTP-binding protein domain
HV57     R          relapsing  FLT3    0.35  D839G      Tyrosine-protein kinase, catalytic domain
903      D          non-rel    NRAS    0.36  G12A       Small GTP-binding protein domain
360      D          non-rel    KRAS    0.11  G12S       Small GTP-binding protein domain
948      D          non-rel    KRAS    0.30  G12S       Small GTP-binding protein domain
769      D          non-rel    KRAS    0.16  G13D       Small GTP-binding protein domain
360      D          non-rel    KRAS    0.41  G12D       Small GTP-binding protein domain

MAF, mutant allelic frequency; AA change, predicted amino acid change.

Supplementary Figure 2. Identification and validation of WES copy number alterations. (A) Representative examples of CNAs detected by both WES and SNP arrays. Top to bottom: deletions in PAR1 (#92D), IKZF1 (#737R), PAX5 (#715RR) and CDKN2A (#737R). Dots show normalized copy number probe intensities and exon copy number ratios as computed by ChAS (SNP array) and exomeCopy (WES), respectively. Predicted copy number states are color-coded as gray (diploid), red (gain) and blue (loss). (B) Schematic overview of all IKZF1 deletions detected by WES, SNP array, or MLPA. Gained regions are indicated in red, losses in blue. (C) Gene-specific and overall classification performance of WES CNA predictions. TP, true-positives; FP, false-positives; TN, true-negatives; FN, false-negatives. Sensitivity=TP/(TP+FN); Specificity=TN/(TN+FP); Accuracy=(TP+TN)/(TP+TN+FP+FN). Considered were all events for which, in addition to WES, copy number status was known from at least one confirmatory assay.

Supplementary Figure 3A. Genome-wide overview of CNAs detected in relapsing cases. Experimental assays used to detect the depicted CNAs in each sample are indicated to the right and include SNP-arrays (A), MLPA (M), and WES (E). If both SNP-array and WES results were available, only SNP-array results are shown. Genomic locations of selected genes that were recurrently lost or gained in this cohort are indicated at the top. Segment colors correspond to copy number states as explained in the figure legend. Abbreviations: UPD, uniparental disomy; n/a, genome-wide CNA profile from WES or SNP array not available.

Supplementary Figure 3B. Genome-wide overview of CNAs detected in non-relapsing cases. For the legend, see Supplementary Figure 3A.

Supplementary Figure 4. Genomic sequence and copy number alterations of 41 P2RY8-CRLF2-positive ALL cases according to their constitutional status. Left part, non-DS cases, right part, DS cases. Within each category, recurrently altered genes in relapsing cases (left) and non-relapsing cases (right) in columns. Non-silent, predicted deleterious sequence and copy number alterations in genes (rows) are listed according to functional groups in diagnostic (top) and relapse (bottom) samples. Mutations are marked by color codes (as indicated) to show their clonal or subclonal nature based on adjusted allelic frequency (adj. AF), predicted functional effects and conservation from diagnosis to relapse. UPN, unique patient number (columns); OG, oncogene; TS, tumor suppressor; DS, Down Syndrome; CN chr21>2, somatic gain of chromosome 21; Sex Chrs. abnorm., copy number aberration of sex chromosomes; TTR, time to relapse.

Comparison of the frequencies of CN and sequence alterations between DS and non-DS ALL cases revealed an underrepresentation of primary lesions (P=0.05) and RTK/Ras pathway alterations (P=0.047) in DS cases at initial diagnosis and of OG/TS alterations at relapse (P=0.055), with CDKN2A/B deletions being mainly responsible for this effect.

Validation of somatic mutations and deletions by Sanger sequencing

Mutations of JAK3 (exon 15) were validated using the following primers for PCR followed by Sanger sequencing.

JAK3exon15 F: 5´-taggagtttgccaaacagactcttc-3´
JAK3exon15 R: 5´-gtgagcactgagggaatgaaagt-3´

For validation of IKZF1 mutations (exon 5) we used the following published PCR primers 16.

IKAROS e4 bF: 5´-aaggagctggcaggtttagtc-3´
IKAROS e4 bR: 5´-ggttagccagcaaggacaca-3´

Supplementary Figure 5. Validation of somatic mutations in JAK3 and IKZF1. Top: Chromatograms showing the two mutations changing amino acid R657 in JAK3 in case 400 (left) and B36 (middle). Bottom: Chromatograms showing the three mutations causing amino acid change G158S in IKZF1 in case 92 (top row, left), R143Q in IKZF1 in case DL-2 (middle row, left and center) and R162Q in IKZF1 in case 365 in a minor subclone (bottom row, left).

Supplementary Figure 6. Clonal composition and stability of JAK/STAT (top) and RTK/Ras (bottom) pathway mutations. Symbols represent individual genes (see symbol code) and mutant genes are plotted according to the adjusted allelic frequency (Adj. AF). Mutations in unmatched samples (empty symbols reflect non-relapsing cases, full symbols relapsing cases) are depicted on the left side of the panel. Kinetics of mutations in matched diagnosis and relapse samples are shown on the right side.

[Figure: gene-case matrix of non-relapsing cases (FRALLE). Columns (UPN): 5755, 7839, 6603, 7361, 14197, 11898, 7118, 11536, 13906, 4558, 4868. Rows: MLPA, Female, DS, HD, iAMP21, Sex Chrs. abnorm., CN chr21, P2RY8-CRLF2, IKZF1, IKZF2, IKZF3, PAX5, ETV6, CDKN2A/B, RB, MRD risk group.]

Supplementary Figure 7. Gene-case-matrix showing CNAs of 11 additional non-relapsing P2RY8-CRLF2 ALL cases identified by MLPA. The matrix shows mutated genes (rows) from all cases (columns) at diagnosis. Blue squares signify micro-deletions, dark blue squares are biallelic deletions, red squares symbolize the P2RY8-CRLF2 fusion.

Supplementary Figure 8. Kinetics of mutations between diagnosis and relapse. Shown are the 13 remaining cases, not represented in Figure 2B. Comparison of adjusted allelic frequency (adj. AF) at initial diagnosis, indicated at the x-axis, and relapse, plotted at the y-axis, of sequence (green), copy number (blue) and P2RY8-CRLF2 (red) aberrations in selected samples. Indicated are exclusively deleterious aberrations. The dashed line indicates the border of subclonal to clonal adj. AF.

Supplementary Figure 9. Impact of IKZF1 status on clinical outcome of P2RY8-CRLF2-positive ALL cases. Kaplan-Meier estimates at 5 years showing the probability of overall survival (pOS; left) and event-free survival (pEFS; right) according to IKZF1 gene status (genomically altered ALL cases, red; wt cases, blue). Based on study design, all first events were relapses.

Supplementary Table 4. P2RY8-CRLF2+ leukemia samples harboring IKZF1 deletions and/or sequence mutations.

UPN      Time point  Group      IKZF1 del.  Region      IKZF1 mut.
HV57     Dx          other      yes         ex4-7       -
HV57     Rel         other      yes         ex4-7       -
HV80     Dx          other      yes         ex4-8       -
HV80     Rel         other      yes         ex4-8       -
108      Dx          other      yes         ex4-8       -
108      Rel         other      yes         ex4-8       -
108      RR          other      yes         ex4-8       -
GI13     Rel         other      yes         whole gene  -
DL2      Dx          other      yes         ex4-8       R143Q
DL2      Rel         other      yes         ex4-8       R143Q
GI8      Rel         other      yes         ex4-7       -
DS10898  Dx          other      yes         ex2-7       -
DS10898  Rel         other      yes         ex2-7       -
KT14158  Dx          other      yes         ex4-8       -
B36      Rel         iAMP21     yes         whole gene  -
N7       Dx          other      yes         ex4-7       -
N7       Rel         other      yes         ex4-7       -
92       Dx          iAMP21     yes         whole gene  G158S
92       Rel         iAMP21     yes         whole gene  -
737      Dx          dic(9;20)  yes         ex4-7       -
737      Rel         dic(9;20)  yes         ex4-7       -
737      RR          dic(9;20)  yes         ex4-7       -
BJ17183  Dx          other      yes         ex4-8
365      Dx          other      no          -           R162Q*
841      Dx          HD         yes         whole gene  -

In bold are IKZF1 alterations resulting in a dominant negative isoform. Group, genetic subgroup of BCP ALL; *, subclonal; Dx, diagnosis; Rel, relapse; RR, subsequent relapse. A horizontal line separates relapsing from non-relapsing cases.

Supplementary Figure 10. Distinct expression profile of samples harboring IKZF1 dominant-negative isoform and biallelic alterations. Samples harboring a dominant-negative IKZF1 isoform (IK6 deletion or sequence mutation, termed IKN) shown in red, IKZF1 deletions leading to haploinsufficiency (IKD) shown in blue, and control samples expressing IKZF1 wild-type (IKC) shown in grey. IKN samples clearly separate from the remaining ones suggesting a strong difference in their transcriptional profile.

Supplementary Figure 11. IKN signature genes enriched in published IKZF1 gene sets. GSEA enrichment in (A) top-300 up- and down-regulated genes after 48h of IK6 overexpression in mouse Ba/F3 cells reported by Ferreirós-Vidal et al. [21]; (B) top-300 up- (left) and top-150 down-regulated genes (right) in IKZF1-deleted B-ALL patients as reported in Supplementary Table 3 from Iacobucci et al. [22]; (C) 164 mouse Ikzf1-repressed genes provided in Table 1 from Schwickert et al. [23]. In each plot, black vertical lines mark the positions of gene set genes in a gene list that contains all genes expressed in IKN or IKC and that is ranked by significance of differential expression between IKN and IKC. A normalized enrichment score (NES) is calculated that reflects the degree to which a gene set is overrepresented at the top (up-regulated in IKN) or bottom (down-regulated in IKN) of the entire ranked list. NES and associated p-values are indicated for each gene set.

3. Supplementary References

1. Morak M, Attarbaschi A, Fischer S, Nassimbeni C, Grausenburger R, Bastelberger S, et al. Small sizes and indolent evolutionary dynamics challenge the potential role of P2RY8-CRLF2-harboring clones as main relapse-driving force in childhood ALL. Blood 2012 Dec 20; 120(26): 5134-5142.

2. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 2010 September 1, 2010; 20(9): 1297-1303.

3. Cibulskis K, Lawrence MS, Carter SL, Sivachenko A, Jaffe D, Sougnez C, et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat Biotech 2013 03; 31(3): 213-219.

4. Malinowska-Ozdowy K, Frech C, Schonegger A, Eckert C, Cazzaniga G, Stanulla M, et al. KRAS and CREBBP mutations: a relapse-linked malicious liaison in childhood high hyperdiploid acute lymphoblastic leukemia. Leukemia 2015 08; 29(8): 1656-1667.

5. Cingolani P, Platts A, Wang LL, Coon M, Nguyen T, Wang L, et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w(1118); iso-2; iso-3. Fly 2012 04/01;6(2):80-92.

6. Liu X, Jian X, Boerwinkle E. dbNSFP: A Lightweight Database of Human Nonsynonymous SNPs and Their Functional Predictions. Hum Mutat 2011 04/21;32(8): 894-899.

7. Liu X, Jian X, Boerwinkle E. dbNSFP v2.0: A Database of Human Non-synonymous SNVs and Their Functional Predictions and Annotations. Hum Mutat 2013 07/10; 34(9): E2393-E2402.

8. Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, et al. A method and server for predicting damaging missense mutations. Nature methods 2010; 7(4): 248-249.

9. Ng PC, Henikoff S. Predicting Deleterious Amino Acid Substitutions. Genome Res 2001; 11(5): 863-874.

10. Garber M, Guttman M, Clamp M, Zody MC, Friedman N, Xie X. Identifying novel constrained elements by exploiting biased substitution patterns. Bioinformatics 2009 05/27; 25(12): i54-i62.

11. The 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature 2010; 467(7319): 1061-1073.

12. Love MI, Myšičková A, Sun R, Kalscheuer V, Vingron M, Haas SA. Modeling Read Counts for CNV Detection in Exome Sequencing Data. Statistical Applications in Genetics and Molecular Biology 2011 11/08; 10(1): 52.

13. Wu TD, Nacu S. Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics 2010 02/10; 26(7): 873-881.

14. Anders S, Pyl PT, Huber W. HTSeq—a Python framework to work with high-throughput sequencing data. Bioinformatics 2015 09/25; 31(2): 166-169.

15. Wang L, Wang S, Li W. RSeQC: quality control of RNA-Seq experiments. Bioinformatics 2012 Aug 15; 28(16): 2184-2185.

16. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-Seq data with DESeq2. Genome Biol 2014 12/05; 15(12): 550.

17. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America 2005 09/30; 102(43): 15545-15550.

18. Mootha VK, Lindgren CM, Eriksson K-F, Subramanian A, Sihag S, Lehar J, et al. PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet 2003 07; 34(3): 267-273.

19. Ovaska K, Laakso M, Haapa-Paananen S, Louhimo R, Chen P, Aittomäki V, et al. Large-scale data integration framework provides a comprehensive view on glioblastoma multiforme. Genome Med 2010 09/07; 2(9): 65.

20. Mullighan CG, Goorha S, Radtke I, Miller CB, Coustan-Smith E, Dalton JD, et al. Genome-wide analysis of genetic alterations in acute lymphoblastic leukaemia. Nature 2007 Apr 12; 446(7137): 758-764.

21. Ferreirós-Vidal I, Carroll T, Taylor B, Terry A, Liang Z, Bruno L, et al. Genome-wide identification of Ikaros targets elucidates its contribution to mouse B-cell lineage specification and pre-B-cell differentiation. Blood 2013 Mar 7; 121(10): 1769-1782.

22. Iacobucci I, Iraci N, Messina M, Lonetti A, Chiaretti S, Valli E, et al. IKAROS deletions dictate a unique gene expression signature in patients with adult B-cell acute lymphoblastic leukemia. PLoS One 2012; 7(7): e40934.

23. Schwickert TA, Tagoh H, Gultekin S, Dakic A, Axelsson E, Minnich M, et al. Stage-specific control of early B cell development by the transcription factor Ikaros. Nature immunology 2014 03; 15(3): 283-293.

4. Additional Files

Additional File 1: WES quality control metrics. Output of Picard’s “CalculateHsMetrics” showing per-sample coverage metrics, including total read numbers, depth of target coverage, and breadth of target coverage. <Excel table “Additional File 1 - WES quality control metrics.xlsx”>

Additional File 2: List of point mutations and indels identified by WES. <Excel table “Additional File 2 - List of point mutations and indels identified by WES.xlsx”>

Additional File 3: List of CNAs identified by WES. <Excel table “Additional File 3 - List of CNAs identified by WES.xlsx”>

Additional File 4: Summary of CNAs detected in PAR1, IKZF1, CDKN2A/B, and PAX5 by WES and confirmatory assays. Confirmatory assays used to validate WES predictions include PCR (for PAR1), MLPA (IKZF1 and CDKN2A/B), FISH (PAX5), and SNP arrays (all four loci). This table forms the basis for the WES performance metrics shown in Supplementary Figure 1C. <Excel table “Additional File 4 - Summary of CNAs detected in PAR1, IKZF1, CDKN2A, and PAX5.xlsx”>

Additional File 5: Genes differentially expressed between IKN and IKC. <Excel table “Additional File 5 - Genes differentially expressed between IKN and IKC.xlsx”>

Additional File 6: Genes differentially expressed between IKD and IKC. <Excel table “Additional File 6 - Genes differentially expressed between IKD and IKC.xlsx”>

Additional File 7: Log2 fold-changes of the top-100 differentially expressed genes in the discovery and validation cohorts. <Excel table “Additional File 7 - Log2FC top-100 genes discovery vs. validation cohort.xlsx”>

Additional File 8: Results of gene set enrichment analysis (GSEA) comparing IKN with IKC and IKD with IKC. <Excel table “Additional File 8 - GSEA results.xlsx”>
