Supplementary appendix - The Lancet appendix ... that grows many classification trees. ... all...

Supplementary appendix

This appendix formed part of the original submission and has been peer reviewed. We post it as supplied by the authors.

Supplement to: Olmos D, Brewer D, Clark J, et al. Prognostic value of blood mRNA expression signatures in castration-resistant prostate cancer: a prospective, two-stage study. Lancet Oncol 2012; published online Oct 9. http://dx.doi.org/10.1016/S1470-2045(12)70372-8.

SUPPLEMENTARY METHODS

Data analysis and statistical considerations

Differential expression analysis was applied to the dataset using various groupings. A linear model was fitted for each probeset, and an estimate of the overall variance was calculated by an empirical Bayes approach¹. The linear models also included variables to account for the variance caused by plate, centre and patient cohort. A moderated t-statistic was computed for each transcript cluster, and the resulting p values were adjusted for multiple testing using Benjamini and Hochberg’s method to control the false discovery rate². Transcript clusters with an adjusted p < 0.05 were considered significantly differentially expressed between the groups compared.

Latent process decomposition (LPD) was used to classify our samples into groups, as described in detail previously³. It is an unsupervised Bayesian approach that assigns each sample a probability of belonging to a particular group. The method has two steps: the first finds the optimal number of processes (or groups) in the structure of the data, and the second assigns to each sample the probability that it belongs to each of the processes. Both a maximum likelihood and a maximum a posteriori (MAP) model using variational LPD were used in the process selection step. The expression profiles of the 500 probesets with the greatest variability across the data were scaled to a standard normal distribution and used as input. A sample was said to belong to a particular group if its probability of belonging was greater than 0.5.

Hierarchical clustering (HC) analyses were carried out by a method similar to that described previously⁴.
Briefly, HC analyses were performed on all tumour samples using the complete gene-level dataset of 17 881 probesets, with an agglomerative HC method (the hclust function in the R statistical programming language), a dissimilarity metric defined by Pearson’s correlation (1 − Pearson’s correlation), and the average agglomerative method, which combines the clusters for which the mean distance between their elements is smallest. Principal component analysis was performed with data from each probeset scaled to have unit variance.

Gene-enrichment analysis was used to test whether sets of genes defining published prognostic signatures were differentially expressed between different LPD groups and whether LPD genes were expressed in other sets. The approach used is that described in ‘Limma’⁵ and is based on the proposals of Mootha et al⁴¹. Microarray tissue expression data were obtained from http://biogps.gnf.org.

Lists of genes expressed differentially between the LPD-defined groups were uploaded into the Ingenuity pathway analysis application (Ingenuity, Mountain View, CA, USA). A score was computed for each network according to the fit of the original set of significant genes. This score is the negative logarithm of the p value, which indicates the likelihood of the focus genes in a network being found together as a result of random chance.

The random forest machine learning algorithm was used to develop a 10-probeset signature suitable for use as a test for LPD group 1 membership⁶. This algorithm is an ensemble classifier that grows many classification trees. Each tree is built using a bootstrap sample of the samples, and at each split a random subset of variables is selected as candidates. Each tree has a “vote”, and the classification with the most votes is assigned to the test sample. First, a dataset containing all features was used to determine the 10 probesets that contribute the most to a classifier.
In more detail: a ranked list of the most important features was produced for each of 100 bootstrap sample sets, and the 10 probesets with the smallest mean rank across sample sets were selected. The dataset was then reduced to the gene expression profiles of these 10 probesets and the final classifier produced. Performance was determined by 10-fold cross-validation. The number of input variables tried at each split was set to 6, the number of trees to 500, and the minimum size of the terminal nodes to 1.

The associations of LPD expression pattern groups with baseline clinical characteristics were analyzed using the chi-square test or Fisher's exact test, the Spearman rank correlation test, or the Mann-Whitney U test for categorical, ordinal or continuous variables, respectively.

Survival analyses were performed on data from CRPC patients using the time from blood draw to death (event) or last follow-up (censored). Univariable and multivariable analyses were carried out by Cox proportional hazards regression. Median overall survival (OS) and the 95% confidence intervals (CIs) for each LPD group were determined with the Kaplan–Meier method, and OS curves were compared using the log-rank test. Model selection was performed on multivariable Cox regression models using Akaike information criterion (AIC)-based bidirectional selection to drop non-contributing factors. Date cut-off for this analysis was 31/09/2010. Unless otherwise specified, all analyses were performed using R software or SPSS version 16.0 (SPSS Inc., Illinois, USA).
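The probeset selection step described above (rank the features in each bootstrap sample set, then keep those with the smallest mean rank) can be sketched as follows. This is an illustrative Python sketch, not the authors' R code: the function name, the probe names and the three toy rankings are hypothetical, and the rankings stand in for the random forest importance rankings, which are not computed here.

```python
def select_by_mean_rank(ranked_lists, n_select):
    """Pick the n_select features with the smallest mean rank.

    ranked_lists: one list per bootstrap sample set, most important first.
    Every list is assumed to contain the same features.
    """
    totals = {}
    for ranking in ranked_lists:
        for position, feature in enumerate(ranking, start=1):
            totals[feature] = totals.get(feature, 0) + position
    n_sets = len(ranked_lists)
    mean_rank = {f: total / n_sets for f, total in totals.items()}
    # Sort by mean rank, breaking ties by name for a deterministic result.
    ordered = sorted(mean_rank, key=lambda f: (mean_rank[f], f))
    return ordered[:n_select], mean_rank


# Hypothetical importance rankings from three bootstrap sample sets
# (the study used 100 sets and selected 10 probesets).
rankings = [
    ["probe_a", "probe_b", "probe_c", "probe_d"],
    ["probe_b", "probe_a", "probe_d", "probe_c"],
    ["probe_a", "probe_c", "probe_b", "probe_d"],
]
selected, mean_rank = select_by_mean_rank(rankings, n_select=2)
print(selected)
```

Averaging ranks rather than raw importance scores keeps a single bootstrap set with unusually large importance values from dominating the selection.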

Supplementary Methods References

1. Sartor MA, Tomlinson CR, Wesselkamper SC, Sivaganesan S, Leikauf GD, Medvedovic M. Intensity-based hierarchical Bayes method improves testing for differentially expressed genes in microarray experiments. BMC bioinformatics. 2006;7(1):538.
2. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B (Methodological). 1995;57(1):289-300.
3. Rogers S, Girolami M, Campbell C, Breitling R. The latent process decomposition of cDNA microarray data sets. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2005:143-56.
4. Lee YF, John M, Falconer A, Edwards S, Clark J, Flohr P, et al. A gene expression signature associated with metastatic outcome in human leiomyosarcomas. Cancer Res. 2004 Oct;64(20):7201-4.
5. Smyth GK. Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical applications in genetics and molecular biology. 2004;3(1):1027.
6. Breiman L. Random forests. Machine learning. 2001;45(1):5-32.

SUPPLEMENTARY TABLES

Supplementary table 1. TaqMan probes for selected LPD1 signature or upregulated genes

Gene symbol TaqMan probe ID

TERF2IP Hs00430292_m1

RIOK3 Hs01590882_m1

GABARAPL2 Hs00371854_m1

STOM Hs00925242_m1

CRISP2 Hs00162960_m1

SLC4A1 Hs00175592_m1

SNCA Hs01103383_m1

TFDP1 Hs00955491_gH

IFI27 Hs00271467_m1

RHAG Hs00165623_m1

MMP8 Hs00233972_m1

GYPA Hs00266777_m1

HMBS Hs00609296_g1

TMCC2 Hs01099575_m1

HBB Hs00747223_g1

18s rRNA Hs03003631_g1

Supplementary table 2. Primers used in the detection of TMPRSS2/ERG transcripts in blood

TMPRSS2/ERG transcripts by PCR

Initial 35 cycle PCR primers

ERG exon 5 primer GTAGTTCATCCCAACGGTGTCTG

TMPRSS2 exon 1 primer CAGGAGGCGGAGGCGGA

35 cycle nested PCR primers

ERG exon 5 primer CCCAACGGTGTCTGGGCTG

TMPRSS2 exon 1 primer GGAGCGCCGCCTGGAG

Supplementary Table 3. Significantly and differentially expressed genes between CRPC and AS patients. http://www.icr.ac.uk/array/array.html

Supplementary Table 4. Significantly and differentially expressed genes in LPD1 vs LPD2/3/4. http://www.icr.ac.uk/array/array.html

Supplementary Table 5. Significantly and differentially expressed genes in LPD2 vs LPD1/3/4. http://www.icr.ac.uk/array/array.html

Supplementary Table 6. Significantly and differentially expressed genes in LPD3 vs LPD1/2/4. http://www.icr.ac.uk/array/array.html

Supplementary Table 7. Significantly and differentially expressed genes in LPD4 vs LPD1-3. http://www.icr.ac.uk/array/array.html

http://www.crukdmf.icr.ac.uk/array/array-DOlmos.html
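Significance in Supplementary Tables 3 to 7 rests on the criterion given in the supplementary methods: a Benjamini and Hochberg adjusted p < 0.05. A minimal Python sketch of that adjustment is shown below for orientation. It is not the authors' code (the analyses were run in R), and the function name and the five raw p values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so that adjusted values
    # never increase as the raw p value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p values for five transcript clusters (not study data).
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
adj = benjamini_hochberg(raw)
significant = [a < 0.05 for a in adj]
print(significant)
```

In this toy input the third and fourth p values are below 0.05 before adjustment but not after it, which is the behaviour the adjustment is meant to provide.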




Table S8. Top 10 Ingenuity canonical pathways associated with the LPD1 process and significant genes

Ingenuity Canonical Pathways -log(p-value) Ratio
Signature genes associated with the pathway

iCOS-iCOSL Signaling in T Helper Cells 4.60E+00 1.90E-01
PIK3C2B, HLA-DOA, CD40LG, PRKCQ, CD3E, TRD@, PLCG1, ITPR1, PPP3CC, IKBKB, LCK, CD40, PPP3CB, PPP3R1, ZAP70, ITPR3, HLA-DOB, NFATC2, IL2RA, PIK3R2, ICOSLG, TRA@

Calcium-induced T Lymphocyte Apoptosis 4.55E+00 2.38E-01
HLA-DOA, PRKCQ, TRD@, CD3E, PLCG1, PPP3CC, ITPR1, LCK, PPP3CB, PPP3R1, ITPR3, ZAP70, NFATC2, HLA-DOB, TRA@

Role of NFAT in Regulation of the Immune Response 3.93E+00 1.50E-01
BLNK, FYN, HLA-DOA, CD3E, TRD@, MAPK1, GNB2L1, GNB5, CD79A, IKBKB, LCK, JUN, PPP3CB, PPP3R1, PIK3R2, PIK3C2B, PRKCQ, CD79B, GNA12, PLCG1, PPP3CC, ITPR1, GNAS, ITPR3, ZAP70, NFATC2, HLA-DOB, RCAN3, TRA@

CD28 Signaling in T Helper Cells 3.91E+00 1.75E-01
PIK3C2B, FYN, HLA-DOA, PRKCQ, CD3E, TRD@, CDC42, PLCG1, ITPR1, PPP3CC, IKBKB, LCK, JUN, PPP3CB, PPP3R1, ZAP70, ITPR3, HLA-DOB, NFATC2, PIK3R2, CARD11, TRA@

Methane Metabolism 3.85E+00 1.08E-01
PRDX5, EPX, CAT, MPO, SHMT2, PRDX6, PRDX2

Primary Immunodeficiency Signaling 3.71E+00 2.61E-01
BLNK, LCK, CD19, CD40LG, CD3E, CD40, ZAP70, CIITA, IGHM, IGHG1, CD79A, IGHD

Phospholipase C Signaling 3.69E+00 1.45E-01
PEBP1, BLNK, FYN, TRD@, CD3E, MAPK1, ARHGEF7, GNB2L1, GNB5, PPP1CB, RHOH, CD79A, TGM2, LCK, PPP3CB, AHNAK, PPP3R1, GPLD1, MYL4, RHOF, ADCY9, ITGB1, ARHGEF12, PRKCQ, CD79B, PLCG1, ITGA5, PPP3CC, ITPR1, MYL6B, GNAS, ZAP70, ITPR3, NFATC2, FNBP1, TRA@

Phenylalanine Metabolism 3.29E+00 9.35E-02
PRDX5, EPX, MPO, DBT, SMOX, GOT2, PRDX6, MAOA, ELOVL6, PRDX2

Pancreatic Adenocarcinoma Signaling 3.27E+00 1.74E-01
PIK3C2B, TGFBR1, JAK1, VEGFB (includes EG:7423), PA2G4, MAPK1, TFDP1, CDC42, SMAD3, SIN3A, BIRC5, RAD51, BCL2L1, CYP2E1, CDK4, GPLD1, SMAD4, PIK3R2, MMP9, E2F2

T Cell Receptor Signaling 2.93E+00 1.68E-01
FYN, PIK3C2B, PRKCQ, MAPK1, CD3E, PLCG1, BMX, PPP3CC, IKBKB, LCK, JUN, PPP3CB, PPP3R1, ZAP70, NFATC2, PIK3R2, CARD11, TRA@
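The -log(p-value) column of Table S8 can be converted back to a p value as sketched below. The sketch assumes the logarithm is base 10, which the appendix does not state; the function names are hypothetical and the code is only an illustration of the arithmetic.

```python
import math


def p_from_score(score):
    """Convert a -log10(p) score back to a p value (base 10 is assumed)."""
    return 10.0 ** (-score)


def score_from_p(p):
    """Convert a p value to a -log10(p) score (base 10 is assumed)."""
    return -math.log10(p)


# Highest and lowest scores listed in Table S8.
p_top = p_from_score(4.60)
p_tenth = p_from_score(2.93)
print(f"{p_top:.2e} {p_tenth:.2e}")
```

Under the same base-10 assumption, a p value of 0.05 corresponds to a score of about 1.3, so all ten listed pathways lie well above that level.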

[Figure: three panels, A (ERG), B (AR) and C (KLK3 (PSA)). Axis labels: log2 expression signal; LPD process (1 to 4). Legend: CRPC, AS, T/ERG RT-PCR positive, T/ERG RT-PCR negative.]

S1 (Supplementary Figure 1). Expression of prostate cancer-associated genes and TMPRSS2/ERG status by RT-PCR across the different LPD groups. A. ERG expression array data. B. AR expression array data. C. PSA expression array data.

[Figure: panel A, scatter plot of PC1 against PC2; legend: Group one, two, three, four; CRPC, Surveillance. Panel B, cluster dendrogram (hclust, "average"); vertical axis: Height; branches labelled Group B, Group C, Group A.]

S2 (Supplementary Figure 2). A. Principal component analysis (PCA) of the data. B. Hierarchical clustering of the expression data.
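The clustering in panel B uses the dissimilarity given in the supplementary methods, 1 − Pearson's correlation, with average linkage. The Python sketch below illustrates those two quantities only. It is not the authors' code (they used hclust in R), it does not build a full dendrogram, and the function names and the three toy expression profiles are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def dissimilarity(x, y):
    """1 - Pearson's correlation: 0 for identical shape, 2 for opposite."""
    return 1.0 - pearson(x, y)


def average_linkage(cluster_a, cluster_b):
    """Mean pairwise dissimilarity between the members of two clusters."""
    pairs = [(a, b) for a in cluster_a for b in cluster_b]
    return sum(dissimilarity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical expression profiles for three samples.
s1 = [1.0, 2.0, 3.0, 4.0]
s2 = [2.0, 4.0, 6.0, 8.0]  # same shape as s1, different scale
s3 = [4.0, 3.0, 2.0, 1.0]  # opposite shape to s1

d12 = dissimilarity(s1, s2)
d13 = dissimilarity(s1, s3)
linkage = average_linkage([s1, s2], [s3])
print(d12, d13, linkage)
```

Because the measure depends only on correlation, s1 and s2 are treated as identical despite their different scale, so samples group by expression pattern rather than by overall signal level.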

[Figure: Kaplan-Meier plot. Horizontal axis: Days (0 to 1000). Vertical axis: Proportion survived (0.0 to 1.0). Legend, labelled "LPD Group" in the plot: A, B, C.]

S3 (Supplementary Figure 3). Kaplan-Meier overall survival (OS) curves according to HC group A (red line), HC group B (green line) or HC group C (blue line) membership in CRPC patients. Log-rank test p-value = 0.00192.
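Each curve in this figure is a Kaplan-Meier (product-limit) estimate. The Python sketch below shows how such an estimate is built from follow-up times and event indicators. It is an illustration only: the function name and the five toy patients are hypothetical and are not study data, and confidence intervals and the log-rank test are not included.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with a death.

    times: follow-up in days.
    events: True for death, False for censored at last follow-up.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            # Multiply by the fraction of at-risk patients surviving time t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Patients who died or were censored at t leave the risk set.
        at_risk -= leaving
    return curve


# Hypothetical follow-up for five patients.
days = [100, 200, 200, 300, 400]
died = [True, True, False, True, False]
curve = kaplan_meier(days, died)
print(curve)
```

Censored patients contribute to the number at risk until their last follow-up and are then dropped without lowering the curve, which is why the estimate falls only at times of death.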

[Figure: heatmap. Row labels (gene, TaqMan probe ID), in the order shown: TMCC2 Hs01099575_m1; HMBS Hs00609296_g1; GYPA Hs00266777_m1; MMP8 Hs00233972_m1; RHAG Hs00165623_m1; IFI27 Hs00271467_m1; TFDP1 Hs00955491_gH; TERF2IP Hs00430292_m1; RIOK3 Hs01590882_m1; GABARAPL2 Hs00371854_m1; STOM Hs00925242_m1; SLC4A1 Hs00175592_m1; SNCA Hs01103383_m1; CRISP2 Hs00162960_m1; HBB Hs00747223_g1. Column annotation: LPD group (1 to 4).]

S4 (Supplementary Figure 4). Heatmap of the top 5 LPD1 upregulated genes and the 9 signature genes (annotated on the right) evaluated with TaqMan probes in the 94 patient samples. The dendrogram at the top illustrates the arrangement of patient clusters using Euclidean distance as the distance metric. The vertical dendrogram on the left illustrates the strength of gene clusters within the signature.

[Figure: scatter plot titled "Only genes found to be differentially expressed (LPD1 vs rest) in our dataset". Horizontal axis: Principal component 1 (34.03% variance). Vertical axis: Principal component 2 (11.51% variance). Legend: Group LPD1, LPD2, LPD3, LPD4; Control, CRPC, Surveillance.]

S5 (Supplementary Figure 5). Using probesets found to be significantly and differentially expressed between LPD1 and LPD2-4, we performed PCA on a dataset which included normal controls. The first principal component, accounting for 34% of variance, separated LPD1 from LPD2, LPD3 and LPD4 and the controls without cancer. The second principal component, accounting for 12% of variance, separated the prostate cancer (PrCa) samples from the controls without cancer. This suggests that blood expression may indicate both PrCa and aggressive PrCa.

