Cancer Genetic Markers of Susceptibility
Stephen J Chanock, M.D. November 29, 2006
http://cgems.cancer.gov
Mission of CGEMS
• Conduct genome-wide SNP scans in
  – Prostate cancer (1 in 8 men)
  – Breast cancer (1 in 9 women)
• Analyze and publish findings
• Rapid sequential replication studies
• Aggressive timeline
• Initial scan in nested case-control studies from
  – Prostate, Lung, Colon, Ovary (PLCO) Project
  – Nurses’ Health Study
Replication Strategy for Prostate Cancer
Initial Study: 1150 cases / 1150 controls, >500,000 Tag SNPs
Replication Study #1: 3000 cases / 3000 controls, ~24,000 SNPs
Replication Study #2: 2400 cases / 2400 controls, ~1,500 SNPs
Replication Study #3: 2500 cases / 2500 controls, 200+ New ht-SNPs
Outcome: 25-50 Loci, finely mapped haplotypes
Power of the first two phases of CGEMS
Pointwise significance 10^-7; "genome-wide" significance 0.05
[Figure: power (0 to 1) against minor allele frequency (0.10 to 0.5) for four models: Recessive GRR 2, Dominant GRR 1.5, Additive GRR 1.4, Multiplicative GRR 1.3. Continuous line: power for direct detection (r2 = 1). Dashed line: power for r2 = 0.8.]

Model            GRR   AA    Aa    aa
Recessive        2.0   1.0   1.0   2.0
Dominant         1.5   1.0   1.5   1.5
Additive         1.4   1.0   1.4   1.8
Multiplicative   1.3   1.0   1.3   1.69

Skol et al. Nat Genet (2006)
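
The genotype relative risks in the table follow from the single GRR value under each inheritance model. A minimal Python sketch that reproduces the four rows; the function name `genotype_risks` is illustrative, and the last lines assume roughly 500,000 tests to show how a Bonferroni-style correction links the pointwise threshold to the genome-wide level:

```python
from typing import Tuple


def genotype_risks(model: str, grr: float) -> Tuple[float, float, float]:
    """Relative risks for genotypes (AA, Aa, aa), with 'a' as the risk allele."""
    if model == "recessive":
        return (1.0, 1.0, grr)           # only two copies raise risk
    if model == "dominant":
        return (1.0, grr, grr)           # one copy is enough
    if model == "additive":
        return (1.0, grr, 2 * grr - 1)   # each copy adds (grr - 1)
    if model == "multiplicative":
        return (1.0, grr, grr ** 2)      # each copy multiplies by grr
    raise ValueError(f"unknown model: {model}")


for model, grr in [("recessive", 2.0), ("dominant", 1.5),
                   ("additive", 1.4), ("multiplicative", 1.3)]:
    ref, het, hom = genotype_risks(model, grr)
    print(f"{model:15s} GRR={grr}: AA={ref:.2f} Aa={het:.2f} aa={hom:.2f}")

# Assumption: about 500,000 tests, matching the size of the initial tag-SNP scan.
n_tests = 500_000
pointwise_alpha = 0.05 / n_tests  # about 1e-7
```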
CGEMS Scans
Prostate Cancer: Two Scans (Illumina)
  – 317k (available)
  – 240k (Feb 2007)
Breast Cancer: One Scan (Illumina)
  – 550k (March 2007)
Recruitment: Incidence Density Sampling
[Figure: recruitment from 1 center, with follow-up from the 1st medical examination through the end of the 1st, 2nd, 3rd, 4th and 5th periods. At successive time points, 5, 1, 2, 3 and 2 controls are selected at random among the participants indicated. The figure also tallies, per individual, the number of selections as control and as case.]
Pairs per stratum: 5 pairs of prevalent cases/controls at the 1st medical examination, then 1, 2, 3, 2 and 3 pairs of incident case/control at the end of periods 1 to 5
16 pairs of case/control
25 DNAs to type
5 periods, 6 strata
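
A minimal Python sketch of incidence density sampling as outlined above, under stated assumptions: one control per case, controls drawn with replacement from everyone not yet diagnosed at that time point, and a hypothetical cohort with 24 never-diagnosed subjects. The function and variable names are illustrative only.

```python
import random


def incidence_density_sample(diagnosis_period, n_periods, seed=0):
    """Pair each case with one control drawn from subjects still undiagnosed.

    diagnosis_period maps subject id -> period of diagnosis (0 = prevalent at
    the first examination) or None if never diagnosed during follow-up.
    Returns a list of (period, case_id, control_id) tuples.
    """
    rng = random.Random(seed)
    pairs = []
    for period in range(n_periods + 1):
        cases = sorted(s for s, d in diagnosis_period.items() if d == period)
        # Risk set: subjects not diagnosed by the end of this period.
        # A subject diagnosed later is still eligible as a control here.
        at_risk = sorted(s for s, d in diagnosis_period.items()
                         if d is None or d > period)
        for case in cases:
            pairs.append((period, case, rng.choice(at_risk)))
    return pairs


# Pairs per stratum as on the slide; the cohort itself is hypothetical.
counts = [5, 1, 2, 3, 2, 3]
diagnosis = {}
sid = 0
for period, n_cases in enumerate(counts):
    for _ in range(n_cases):
        diagnosis[sid] = period
        sid += 1
for _ in range(24):  # assumed number of never-diagnosed subjects
    diagnosis[sid] = None
    sid += 1

pairs = incidence_density_sample(diagnosis, n_periods=5)
dnas = {case for _, case, _ in pairs} | {ctrl for _, _, ctrl in pairs}
print(len(pairs), "pairs,", len(dnas), "distinct DNAs to type")
```

Because a control can later become a case, or be drawn more than once, the number of distinct DNAs can be smaller than twice the number of pairs, as in the 16 pairs and 25 DNAs of the slide.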
Aggressive Prostate Cancer
• High priority to examine early vs aggressive
• Cohort based studies (screening)
  – Bias towards early cases
• Enrich primary scan with >55% aggressive : 45% early
  – Aggressive defined as:
    • Gleason >7 +/or Stage C/D
  – Follow-up studies in cohorts
    • Comparable distributions for early/advanced
Inclusion in CGEMS from PLCO of prostate cancer patients
[Figure: timeline from 1994 through Oct 2001 to Oct 2003 for 28 521 eligible participants. Aggressive cancer: 0 at the start, 737 at the end. Non-aggressive cancer: 0 at the start, 624 at the end.]
Matching with controls was performed for 737 aggressive cases and 493 randomly selected non-aggressive cases.
Non-aggressive: stage <=2 (non-invasive) and Gleason score <=6
Aggressive: stage >=3 (invasive) and Gleason score >=7
Distribution of genotyped individuals used for the search of association

Number of times selected as controls at start of CGEMS project, by prostate cancer status during follow-up:

Status during follow-up             0       1       2     3    Total
Always negative                     0       1 082   22    1    1 105   "controls"
Diagnosed with non-aggressive C.    461     26      1     0    488
Diagnosed with aggressive C.        673     16      0     0    689
Total                               1 134   1 124   23    1    2 282

Non-aggressive plus aggressive: 1 177 cases
Dropped samples (three groups on the slide, each including 1 XX DNA): 4 unexpected dup; 2 unexpected dup; 3 failed genotype; 4 failed genotype
1 173 "controls"
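
The margins of the distribution table can be checked with a few lines of Python. This is only an arithmetic check on the counts shown; the dictionary keys are shorthand labels, not terms from the slide.

```python
# Rows: status during follow-up; columns: times selected as control (0, 1, 2, 3).
table = {
    "always negative": [0, 1082, 22, 1],
    "non-aggressive": [461, 26, 1, 0],
    "aggressive": [673, 16, 0, 0],
}

row_totals = {status: sum(counts) for status, counts in table.items()}
col_totals = [sum(col) for col in zip(*table.values())]
cases = row_totals["non-aggressive"] + row_totals["aggressive"]

print(row_totals)       # per-status totals
print(col_totals)       # totals per number of selections
print(cases, sum(col_totals))
```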
Buccal Cell DNA and Infinium™ II: ACS:CGEMS Pilot
• 23 matched blood and buccal samples
• Archived buccal samples (2001/2002 in CPS-II)
• Swish with Scope™ and store after centrifugation
• Extracted simultaneously with Autopure (Gentra)
• Target 50 ng/uL by QDNA (PicoGreen)
  – 4 outliers (0.5 ng/uL - 35 ng/uL)
• HumanHap300, Infinium™ II protocol
  – Completion 99.02%
  – Concordance 99.96%
PLCO WGS QC
• Removal of inconsistent genotypes
  – Low completion rate (<95%)
  – Duplicates: HapMap & PLCO QC samples
• Fitness for HW proportion in controls
  – Exclusion cut-off: <0.001
• Re-map SNP positions
  – Examine adjacent bps of SNPs
  – Heterogeneity in cases/controls
• Cryptic stratification
  – STRUCTURE (Pritchard)
  – Principal Component Analysis (Price Nat Gen 2006)
  – Study center (9 for PLCO)
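
A minimal Python sketch of the completion-rate and Hardy-Weinberg filters listed above. It swaps in a 1 d.f. chi-square goodness-of-fit test for the exact test used in the scan, so p-values will differ for rare alleles; the function names and the example counts are illustrative assumptions.

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 d.f.) p-value for Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes called")
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 d.f.
    return math.erfc(math.sqrt(chi2 / 2))


def passes_qc(completion_rate, control_counts,
              min_completion=0.95, hwe_cutoff=0.001):
    """Keep a SNP if completion is high enough and controls fit HW proportions."""
    if completion_rate < min_completion:
        return False
    return hwe_chi2_pvalue(*control_counts) >= hwe_cutoff


print(passes_qc(0.99, (250, 500, 250)))   # genotype counts in exact HW proportions
print(passes_qc(0.99, (500, 0, 500)))     # no heterozygotes at all
```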
Discordance rate
[Figure: discordance rate per duplicate pair in three sample sets.]
• PLCO: 49 duplicate pairs, mean discordance rate 2 × 10^-4
• CEPH-CGEMS: 74 duplicate pairs, mean discordance rate 2 × 10^-4
• CEPH-HapMap: 28 individuals (with 24 duplicates), mean discordance rate 1.4 × 10^-3
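
A minimal Python sketch of how a discordance rate between duplicate samples could be computed. It assumes genotypes are stored as strings with `None` for a missing call, that only SNPs called in both samples are compared, and that the mean is taken over pairs; none of these details are stated on the slide.

```python
def discordance_rate(calls_a, calls_b):
    """Fraction of differing genotype calls among SNPs called in both samples."""
    compared = [(a, b) for a, b in zip(calls_a, calls_b)
                if a is not None and b is not None]
    if not compared:
        raise ValueError("no SNP called in both samples")
    return sum(a != b for a, b in compared) / len(compared)


def mean_discordance(duplicate_pairs):
    """Average of the per-pair discordance rates."""
    rates = [discordance_rate(a, b) for a, b in duplicate_pairs]
    return sum(rates) / len(rates)


# Hypothetical calls for one duplicate pair at five SNPs.
sample_1 = ["AA", "AG", "GG", None, "AG"]
sample_2 = ["AA", "AG", "AG", "GG", "AG"]
print(discordance_rate(sample_1, sample_2))  # 1 mismatch out of 4 compared calls
```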
Log-log quantile plot of p-value for Hardy-Weinberg proportion
Exact test, 299 779 SNPs
[Figure: log10(p-value) against log10(quantile); observed values compared with 20 simulations.]
expected: 244, observed: 586
expected: 2600, observed: 3340
QQ plot for ~300k SNPs
[Figure: quantile against p value, both axes from 0 to 1.0.]
Log-log quantile plot for p-value for the 4 statistical tests used
307,256 SNPs
[Figure: log10(p-value) against log10(quantile). Series: Sing. Sampl. no cov; Sing. Sampl. with cov; Incid. Den. Sampl. no cov; Incid. Den. Sampl. with cov.]
Log/log quantile plot of p value (observed)
[Figure: log(p-value) against log(quantile), from 0 to -6, with an inset around -2.]
Genomic control parameter = 0.99
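
A minimal Python sketch of a genomic control parameter computed from a list of p-values. It assumes the common 1 d.f. definition (median observed chi-square divided by the median of the 1 d.f. chi-square distribution); the slide does not say which test the reported 0.99 refers to, so this is an illustration rather than the analysis actually run.

```python
from statistics import NormalDist, median

CHI2_1DF_MEDIAN = 0.454936  # median of a chi-square distribution with 1 d.f.


def genomic_control_lambda(p_values):
    """Median 1 d.f. chi-square implied by the p-values, over its null median."""
    std_normal = NormalDist()
    # A two-sided p-value corresponds to chi2 = z**2 with z = Phi^-1(p / 2).
    chi2 = [std_normal.inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Hypothetical inputs: evenly spaced p-values (no inflation) and a skewed set.
n = 1001
uniform = [(i + 0.5) / n for i in range(n)]
inflated = [p ** 2 for p in uniform]

print(round(genomic_control_lambda(uniform), 3))
print(genomic_control_lambda(inflated) > 1)
```

A value close to 1 indicates that the bulk of the test statistics follows the null distribution, which is the reading suggested by the reported 0.99.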
PLCO Recruitment Sites
Opportunity to look at geographic differences
Admixture coefficient in PLCO samples
Method: run merged PLCO data + HapMap data on STRUCTURE with 6000 SNPs having no pairwise r2 and high FST values. The population of origin of the HapMap samples is specified.
Result: reliable identification of 3 outliers. All three are control DNAs and have to be removed from subsequent analysis.
[Figure: admixture coefficients of cases and controls plotted between three poles: Asia, Africa, Europe.]
Log-log quantile plot for p-values of 101 SNPs that differentiate the populations of South and North of Europe
[Figure: log10(p-value) against log10(quantile), from 0 to -2. Series: Sing. Sampl. no cov; Sing. Sampl. with cov; Incid. Den. Sampl. no cov; Incid. Den. Sampl. with cov.]
Seldin et al. PLOS Genetics 2:1339-1351 (2006)
Lactase region (LCT)
[Figure: log10(p-value for association), from 0 to -3, against position from 136300k to 136500k, with a reference line at log10(0.05). SNPs plotted: rs2117511, rs6739713, rs1438307, rs6430585, rs9287442, rs1469996, rs4954633, rs2322659, rs2874874, rs3754690, rs3754689, rs309126, rs12478902, rs309160; rs4988235 and rs182549 are labeled separately.]
Bersaglieri et al. AJHG 74:1111-1120 (2004)
Log10 p-value of the 4 d.f. χ2 test plotted against position around the 8q24 SNP (rs1447295)* in build 35
[Figure: log10 p-value, from 0 to -5, against position in build 35 from 126000000 to 131000000.]
*Amundadottir Nat Genet 2006
*Freedman PNAS 2006
Characteristics of the SNPs demonstrating the strongest signal of association in 8q24

rsnumber    position (b.35)   MAF   Pval HW controls   completion rate
rs4242382   128586755         .14   .7604              1
rs7017300   128594450         .18   .1629              1
rs7837688   128608542         .14   .8663              .999
rs1447295   128554220         .14   .6012              1
Linkage disequilibrium (r2) with rs1447295 of the SNPs demonstrating the strongest signal of association

rs#         position (b.35)   r2 with rs1447295   p assoc
rs4242382   128586755         .94                 .00007
rs7017300   128594450         .71                 .00009
rs7837688   128608542         .84                 .00003
rs1447295   128554220         -                   .0003
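
A minimal Python sketch of the r2 measure of linkage disequilibrium between two biallelic SNPs. It assumes haplotype frequencies are already known, which in practice requires phased or estimated haplotypes; the function name and the input values are hypothetical.

```python
def ld_r2(p_ab, p_a, p_b):
    """r2 between two SNPs from haplotype and allele frequencies.

    p_ab: frequency of the haplotype carrying allele a and allele b
    p_a, p_b: frequencies of allele a (first SNP) and allele b (second SNP)
    """
    d = p_ab - p_a * p_b                       # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r2 is undefined for a monomorphic SNP")
    return d * d / denom


# Two SNPs whose minor alleles always travel together: complete LD.
print(round(ld_r2(0.14, 0.14, 0.14), 6))
# Alleles combined at random: no LD.
print(round(ld_r2(0.02, 0.10, 0.20), 6))
```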
Prostate Scan: 8q24 Region

                                              Genotype RR for indolent   Genotype RR for aggressive
rs number   Susceptibility   Allele           Heterozyg.   Homozyg.      Heterozyg.   Homozyg.
            allele           frequency
rs1447295   A                0.1              1.08         1.45          1.24         1.46
rs4242382   A                0.1              1.13         1.39          1.27         1.39
rs7017300   C                0.13             1.14         1.63          1.17         1.37
rs7837688   T                0.1              1.14         1.36          1.26         1.54

Key Findings:
1. Risk comparable to the original reports in Nat Genet and PNAS
2. Comparable risk for BPC3 (~6500 cases/controls)
3. Discovery of 1 and perhaps 2 additional loci
Value-Added Analysis: CGEMS
Opportunity to investigate
• Gene:environment
  – Covariates: BMI, smoking, serum levels
• Multi-SNP analysis
• Gene:gene interactions
• Explore pathways
• Follow-up in cohort studies in CGEMS
CGEMS: caBIG Posting
• Pre-computed analysis: no restrictions
  – Association Finding
  – Association Finding Report
  – Population Frequency Report
• Raw genotype, case/control, age (in 5 yrs), family Hx (+/-): registration
Acknowledgements
NCI: Gilles Thomas, Robert Hoover, Joseph Fraumeni, Daniela Gerhard, Kevin Jacobs, Zhaoming Wang, Meredith Yeager, Robert Welch, Richard Hayes, Sholom Wacholder, Nilanjan Chatterjee, Kai Yu, Margaret Tucker, Marianne Rivera-Silva
HSPH: David Hunter, Peter Kraft
ACS: Heather Feigelson, Carmen Rodriguez, Eugene Calle, Michael Thun