Cohort Research in the US NCI
Daehee Kang
Molecular & Genomic Epidemiology Laboratory (MGEL)
Seoul National University
Genes (mutation, polymorphisms)
Environment
Diseases
Gene-Environmental Interaction
Etiology of Diseases
Multifactorial and complex, involving several interrelated biochemical pathways under the influence of multiple genes and environmental exposures.
Standardized Cancer Incidence Ratio among Females in Korea
[Figure: rate ratio (y-axis, 0 to 2) by year (x-axis, 1995 to 2001)]
Breast 166%
Colorectum 147%
Lung 135%
Liver 132%
Stomach 111%
Cervix 86% (98% with CIS)
Source: Korean Central Cancer Registry, 2003
GENOMIC COHORT STUDY DESIGN
[Diagram labels]
• Exposure; Disease
• Exposure & Genome / Gene-environment
• Molecularly defined disease
• Plasma protein profile
• Rx / Molecularly defined disease
• Early detection of molecularly defined disease
• Outcome
• Genome / Gene-environment; Disease
Five critical components of genomic cohort research
• Goal and objectives:
– Rationale, uniqueness, competitiveness
• Recruitment
– Study design: efficiency (cost-benefit), feasibility
• Baseline & repeated survey
• Biospecimen collection
• Follow-up and case ascertainment
Cohort Studies in the US NCI
• OEEB (Occupation and Environment)
– The Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial (PLCO)
– The Agricultural Health Study (AHS)
– Shanghai Women's Health Study (SWHS)
– Formaldehyde exposed workers
• REB (Radiation)
– The U.S. Radiologic Technologists Health Study (USRT)
• NEB (Nutrition)
– ATBC study
– AARP Diet and Health Study
– Nutrition Intervention Trials in Linxian
• Others: Swedish construction, Swedish BPH patients
PLCO Trial: Study Design
• Screening Centers: 10
• Coordinating Center
• Participants: 155,000
– Screening arm (6 years)
• Prostate - PSA & DRE
• Lung - x-ray
• Colorectal - sigmoidoscopy
• Ovarian – CA125 & TVU
– Control arm: usual care
• Gender 50:50
• Age 55-74 years
• Recruitment: 1993-2001
• Screening: 1993-2006
• Follow-up to 2015
– Annual surveys
– Mortality searches
Studies in PLCO
• Does early detection reduce mortality?
• Clinical epidemiology
• Dietary risk factors
• Other behavioral risk factors
• Genetic risk factors
• Gene-environment interaction
• Blood-based early disease markers
Shanghai Women’s Health Study (SWHS)
• A large-scale, population-based cohort study
• 75,049 Chinese women
• 40 - 70 years old
• 1997 – 2000
• urban Shanghai
SWHS: selected characteristics
[Figure: age distribution; number of subjects (left axis) and % (right axis) by age group]
Age groups: 40-44, 45-49, 50-54, 55-59, 60-64, 65-70
Bar labels (as extracted): 10,679; 9,725; 7,628; 10,543; 20,918; 15,449

[Figure: schooling; number of subjects (left axis) and % (right axis) by education level]
Education levels: No education, Elementary, Middle school, High school, Over college, Unknown
Bar labels (as extracted): 13; 10,173; 20,890; 27,683; 8,147; 8,036

Employment          No. of subjects   %
Manufacturing       35,367            47
Professional        21,389            29
Services            15,493            21
Agricultural        2,054             3
Construction        357               0
Unknown             282               0
Projected number of incident cancer cases
Cancer site   2004    2008    2013
All sites     1,452   2,690   4,635
Breast        353     697     1,201
Lung          154     320     593
Liver         68      123     207
Stomach       124     209     357
Colon         133     258     451
Rectum        89      142     245
Esophagus     18      37      63
Bladder       18      46      85
Pancreas      48      95      166
NHL           29      86      146
The Nested Case-Control Study of Breast Cancer
74,942 subjects
353 breast cancer cases : 708 controls (1 : 2)
Matching variables
- Age at baseline ±2 years
- Sample collection date < 31 days
- Antibiotic use in the past week
- Previous cancer history
- Menopausal status
- Post menopausal
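The matching step above can be sketched roughly as follows. This is a minimal illustration, not the study's actual procedure: the field names (`age`, `sample_day`, `postmenopausal`) and the greedy selection are assumptions, only three of the listed matching variables are used, and the risk-set (incidence-density) sampling that a real nested case-control study needs is left out.

```python
import random

def match_controls(cases, candidates, ratio=2, seed=0):
    """Greedy 1:ratio matching without replacement (hypothetical helper).

    Each subject is a dict with assumed keys: id, age, sample_day,
    postmenopausal. Returns {case_id: [control_id, ...]}.
    """
    rng = random.Random(seed)
    available = list(candidates)   # copy so the caller's list is untouched
    rng.shuffle(available)         # avoid always picking the same controls
    matches = {}
    for case in cases:
        chosen = []
        for ctrl in available:
            if len(chosen) == ratio:
                break
            same_age = abs(ctrl["age"] - case["age"]) <= 2            # age ±2 years
            same_date = abs(ctrl["sample_day"] - case["sample_day"]) < 31  # < 31 days
            same_meno = ctrl["postmenopausal"] == case["postmenopausal"]
            if same_age and same_date and same_meno:
                chosen.append(ctrl)
        for ctrl in chosen:
            available.remove(ctrl)  # each control is used at most once
        matches[case["id"]] = [c["id"] for c in chosen]
    return matches
```

A case with fewer than `ratio` eligible controls simply gets a shorter list, so the caller should check list lengths before analysis.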
• Biospecimen (e.g., lymphocytes, DNA)
• At least 1,500 cancer cases by 2010
• Active F/U : cancer registry
• Sufficient amounts of baseline information using
standardized tools
• Intention to participate in Consortium
Five critical components of genomic cohort research
• Goal and objectives:
– Rationale, uniqueness, competitiveness
– Exposure and/or occupation: AHS, USRT
– Special groups: SWHS, AARP
– Screening efficacy: PLCO
– Preventive trials: ATBC, Linxian
Five critical components of genomic cohort research
• Recruitment & Study Design
– Sample size:
• G-E, G-G, G-G-E
– Efficiency (cost-benefit)
• Utilize already existing system: PLCO
• Set up new infrastructure: AHS
– Feasibility
– Catchment area (SWHS vs PLCO)
– Representativeness (PLCO vs SWHS)
– Response rate
SWHS: Sample collection
                                  No. of eligible   No. of         Response
                                  subjects          participants   rate (%)
Baseline survey (1997–2000)       81,170            75,221*        92.7
Specimen collection
  Blood                           74,942*           56,831         75.8
  Urine                           74,942            65,754         87.7
  Exfoliated buccal cells         18,111            8,934          49.3
Follow-up survey I (2000–2004)    74,942            74,768         99.8
Lifestyle survey                  72,983            67,163         92.0
* Of the 75,221 study participants, 279 women who did not meet the age eligibility requirements for the study were excluded, resulting in a cohort of 74,942 subjects.
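The response rates in this table are simply participants divided by eligible subjects. A short check, with the counts copied from the slide and the row pairing treated as an assumption about the original layout:

```python
# (eligible, participants) per row, as read from the SWHS table above.
counts = {
    "Baseline survey": (81_170, 75_221),
    "Blood": (74_942, 56_831),
    "Urine": (74_942, 65_754),
    "Exfoliated buccal cells": (18_111, 8_934),
    "Follow-up survey I": (74_942, 74_768),
    "Lifestyle survey": (72_983, 67_163),
}

def response_rate(eligible, participants):
    """Response rate in percent, rounded to one decimal place."""
    return round(100 * participants / eligible, 1)

rates = {name: response_rate(*pair) for name, pair in counts.items()}

for name, rate in rates.items():
    print(f"{name}: {rate}%")
```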
The EPIDEMIOLOGIC component of
Human Genomic Epidemiology
A fundamental question: how large should a study be?
What is the study trying to accomplish?
• To estimate the overall effect of a SNP or haplotype?
• To detect gene-environment and gene-gene interactions?
• To protect against false positive findings?
Sample size for 80% power to detect a range of genetic odds ratios for varying prevalences of the "at risk" genotype
Control/case ratio = 1.0
[Figure: no. of cases (= no. of controls), 0 to 3,000 (y-axis), by probability of "at risk" genotype, 0.0 to 1.0 (x-axis); curves for OR = 1.5, OR = 2.0, OR = 3.0]
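The slide does not say which formula produced these curves. As a rough sketch, the standard two-proportion approximation for an unmatched case-control study with equal numbers of cases and controls gives numbers of the same kind. The function name `cases_needed` and the two-sided 5% alpha level are assumptions here.

```python
from math import ceil, sqrt
from statistics import NormalDist

def cases_needed(p0, odds_ratio, alpha=0.05, power=0.80):
    """Approximate no. of cases (= no. of controls) for an unmatched design.

    p0 is the prevalence of the "at risk" genotype among controls.
    Uses the normal-approximation formula for comparing two proportions.
    """
    # Genotype prevalence among cases implied by the odds ratio.
    p1 = odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))
    p_bar = (p0 + p1) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p0 * (1 - p0) + p1 * (1 - p1))
    ) ** 2
    return ceil(numerator / (p1 - p0) ** 2)

for odds_ratio in (1.5, 2.0, 3.0):
    sizes = [cases_needed(p, odds_ratio) for p in (0.1, 0.3, 0.5)]
    print(f"OR={odds_ratio}: {sizes}")
```

Smaller odds ratios and rarer genotypes both push the required number of cases up sharply.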
Sample size for 80% power to test for multiplicative and additive interactions
Power = 80%, α-level = 5%, P(G) = 50%
[Figure: no. of cases (= no. of controls), 0 to 3,000 (y-axis), by prevalence of exposure, 0.0 to 1.0 (x-axis); curves for multiplicative and additive interaction]
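The slide names the two interaction scales without defining them. A minimal sketch using the usual textbook definitions, which are assumed rather than taken from the slide: the multiplicative measure is the ratio OR_GE / (OR_G × OR_E), and the additive measure is the relative excess risk due to interaction, RERI = OR_GE − OR_G − OR_E + 1.

```python
def interaction_measures(or_g, or_e, or_ge):
    """Return (multiplicative ratio, RERI) from three odds ratios.

    or_g:  genotype only, or_e: exposure only, or_ge: both,
    each relative to subjects with neither factor.
    """
    multiplicative = or_ge / (or_g * or_e)  # 1.0 means no multiplicative interaction
    reri = or_ge - or_g - or_e + 1          # 0.0 means no additive interaction
    return multiplicative, reri

# Joint effect exactly multiplicative: no interaction on the ratio scale,
# but a positive departure from additivity.
print(interaction_measures(2.0, 3.0, 6.0))
```

The same data can therefore show interaction on one scale and none on the other, which is why the required sample sizes differ between the two curves.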
Five critical components of genomic cohort research
• Baseline survey
– Sufficient amounts of baseline information
– Standardized tools
– Data collection (self-administered vs interview)
– Data entry: OCR, OMR, computer-based
– Related DB
• Repeated survey
– Frequency
– How much information to collect
Materials collection in the PLCO Trial
(X: in place; P: proposed)
Columns: Exam Cycle | Risk Factors | Usual Diet | Serum | Plasma | RBC | DNA | Viable Cells | Tumor Sample
Intervention Arm
Baseline X X X X X X
Year 1 X
Year 2 X
Year 3 X X X X X X
Year 4 X X X
Year 5 X X X X
2004-2008 P P
Non-intervention Arm
P X X X
Five critical components of genomic cohort research
• Biospecimen collection
– Types & amount
• Whole blood
• Cryopreserved WB
• Isolated lymphocyte
• EBV transformation
• Serum, plasma, packed RBC
• Sputum, nail, hair, exfoliated cells (filter paper)
• Buccal cell DNA
• Urine
• Fresh frozen tissues
– Storage: liquid nitrogen (LN2)
• Follow-up and case ascertainment
SWHS: Sample storage
Vanderbilt University (VU), Shanghai Cancer Institute (SCI), National Cancer Institute (NCI)

Samples from the first 20,000 women     VU   SCI   NCI
Plasma (4 x 2 ml/subject)               2    2     0
WBC (2 ml/subject)                      1    0     0
RBC (2 x 2 ml/subject)                  1    1     0
Urine (7 x 4 ml, 1 x 20 ml/subject)     2    3     4 x 4 ml, 1 x 20 ml

Samples from the remaining women        VU   SCI   NCI
Plasma (4 x 2 ml/subject)               1    1     2
WBC (2 x 2 ml/subject)                  0    1     1
RBC (2 x 2 ml/subject)                  1    0     1
Urine (6 x 4 ml/subject)                1    2     3
Buccal cells (2 x 2 ml/subject)         1    0     1
PLCO: Whole Blood
Non-viable cells by days to freeze
[Figure: % non-viable (y-axis, 0 to 30) by days to freeze (x-axis: Day 1, Day 2, Day 3, Day 4); series: Leukocytes, Lymphocytes]
PLCO: Cryopreserved Blood: Lymphocyte Viability (%)
CD45-gated Lymphocytes
[Figure: lymphocyte viability in % (y-axis, 40 to 100) by storage time (x-axis: 1 d, < 1 m, 6-8 m, 18-20 m); series: Fresh, Cryopreserved]
A cohort study is NOT merely an assembly or linkage of existing records, nor simply collecting and storing data.