
Concepts in Cancer Epidemiology and Etiology

PAGONA LAGIOU, DIMITRIOS TRICHOPOULOS, AND HANS-OLOV ADAMI

Epidemiology has been a powerful tool in the identification of causes of infectious diseases and the elucidation of the conditions underlying epidemic outbreaks that are frequently, but not always, of infectious etiology. Around the middle of the twentieth century, first in the United Kingdom (Doll and Hill, 1950) and later in the United States and the rest of the world (Wynder and Graham, 1950; Clemmesen and Nielsen, 1957; MacMahon, 1957), epidemiology expanded in scope by focusing also on the etiology of chronic diseases, irrespective of the nature of the causal agents. Since then, epidemiology has developed and matured to become a rich and powerful toolbox for the study of biologic phenomena in humans.

With a number of fine textbooks nowadays available to students of epidemiology (for instance Miettinen, 1985; Hennekens and Buring, 1987; Walker, 1991; MacMahon and Trichopoulos, 1996; Rothman and Greenland, 1998; Rothman, 2002; and several others), this chapter is not intended to expand on methods or quantitative considerations. For the purpose of better understanding the logic underlying cancer epidemiology, however, central concepts in epidemiology (the study of disease etiology) will be reviewed. We examine cohort and case-control studies (with special reference to studies of genetic epidemiology), we consider the impact of chance and systematic errors (confounding and bias), and we trace the process of causal reasoning. Familiarity with these concepts is essential for critical reading and understanding of the chapters on specific cancers. A glossary found at the end of the chapter provides a summary of definitions for words in italics.

ETIOLOGY

Causality

The definition of a cause should apply to all diseases, whether defined on the basis of a particular exposure, such as many infectious and occupational diseases, or documented by a constellation of clinical and/or laboratory findings (for example, malignant tumors, connective tissue disorders, or psychoses). In terms of a particular individual, exposure to a cause of a disease implies that the individual is now more likely to develop the disease, although there is no certainty that this will happen. The complexity of biological phenomena and our ignorance or limited understanding of many of the underlying processes hinder a deterministic, logically unassailable, explanation of disease causation. Hence, causation of disease can only be conceptualized in a probabilistic (stochastic) sense that involves statistical terms and procedures. For instance, while heavy smokers are much more likely to develop lung cancer than nonsmokers, most smokers never develop lung cancer and some nonsmokers do.

In epidemiology, there are several models of causality that have been applied to help clarify the role of various exposures in the etiology of disease. The causal pies presented by Rothman (1976) provide perhaps the most coherent approach to conceptualizing causality in a variety of epidemiologic settings (Rothman, 1986). Each of these pies describes a set of exposures that work together on the same pathway to cause disease (Fig. 6-1). Different exposures may occur within a short time span, or may happen decades apart. Once every exposure in a causal pie has occurred, that is, once the pie is complete, disease is, in a deterministic context, inevitable. Table 6-1 provides a summary of the attributes of the causal pie model.

Causality is rarely, if ever, characterized by a simple one-to-one correspondence between a particular exposure and a specific disease. If it were, the presence of the exposure would be both necessary and sufficient for the occurrence of the disease. By necessary we mean that the disease cannot occur without the presence of that exposure (although other exposures may be required for the occurrence of the disease). By sufficient we mean a set of exposures that inevitably produce disease. There may of course be different ways by which one could get disease, and thus sufficient causes may not be necessary.

In cancer epidemiology, the only known examples of exposures that are sufficient to cause disease refer to the genetic origin of some rare cancers due to dominant genes with complete penetrance. In this instance, the causal pie would require only one factor for the pie to be complete and this would be the way that carriers would get the specific cancer. Also rare is the existence of single factors that are in and by themselves sufficient (although not necessary) for the causation of a certain disease. Even powerful exogenous factors, such as life-long heavy smoking, and strong genetic influences, like those conveyed by dominant breast cancer genes, do not always cause disease in an individual.

Figure 6.1. The causal pie model describes a set of exposures that work together in the same pathway to cause disease. These are hypothesized ways in which a series of exposures could interact biologically over time to cause disease. This figure provides an example of sufficient causes from cancer epidemiology. Tobacco is an established component cause in many cases of oral cancer. However, tobacco use by itself is not enough for the disease to occur; in addition, oral cancer can occur among people who have never used tobacco. In a given causal pie, the complementary exposures can occur simultaneously, or many years apart. If even one of the component causes did not occur, disease would be prevented by this pathway, although a person could develop the disease by another mechanism (a different causal pie).

Table 6-1. Attributes of the causal pie

ATTRIBUTE: DESCRIPTION
Inevitability: Completion of a sufficient cause (causal pie) is synonymous with eventual occurrence (though not necessarily diagnosis) of the disease.
Causality: A component cause (piece of a causal pie) can involve presence of a detrimental exposure or absence of a preventive exposure.
Burden of disease: The amount of disease caused by a sufficient cause depends on the prevalence of all complementary component causes.
Temporality: Component causes can act far apart in time.
Interaction: Component causes in the same pie interact biologically to cause disease.
Attributable fraction: The fractions of disease cases attributable to different component causes can sum to more than 100 percent.
Disease prevention: Blocking the action of any component cause prevents completion of the respective sufficient cause and therefore prevents disease by that pathway.

Source: Rothman, 1976.
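The attributable fraction attribute in Table 6-1 can look paradoxical, so a toy calculation may help. The sketch below is illustrative only: the component causes "A" and "B", the population counts, and the helper names are all hypothetical, and it assumes a single sufficient cause that needs both A and B.

```python
# Hypothetical toy population: each person is the set of component causes present.
population = (
    [{"A", "B"}] * 10   # both components present
    + [{"A"}] * 30      # A only
    + [{"B"}] * 30      # B only
    + [set()] * 30      # neither
)

# Assumed single sufficient cause (causal pie): disease needs both A and B.
PIE = {"A", "B"}


def count_cases(people, blocked=frozenset()):
    """Count people whose remaining exposures still complete the pie."""
    return sum(1 for exposures in people if PIE <= (exposures - blocked))


def fraction_prevented(people, component):
    """Share of all cases that would not occur if `component` were blocked."""
    total = count_cases(people)
    remaining = count_cases(people, blocked={component})
    return (total - remaining) / total


fraction_a = fraction_prevented(population, "A")
fraction_b = fraction_prevented(population, "B")
print(fraction_a, fraction_b, fraction_a + fraction_b)
```

Under these assumptions every case needs both components, so blocking either one prevents all cases, and the two fractions add up to more than 100 percent.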

Certain exposures are by definition necessary (although not sufficient) for the occurrence of a particular disease. For example, chronic lead disease cannot occur in the absence of lead exposure, and a motor vehicle injury requires the involvement of a motor vehicle (MacMahon et al, 1950; Hill, 1965; Rothman, 1976; Susser, 1991; MacMahon and Trichopoulos, 1996). Again, while these represent necessary causes, there are additional cofactors that must work in concert before disease is inevitable. Most human cancers can occur via several pathways, so it is hard to define any single necessary cause. Asbestos, in relation to mesothelioma (cancer of the pleura), and human papillomavirus infection, in relation to cervical squamous cell cancer, are close to being necessary. However, cases of these cancers do arise without the exposure being documentable, either because the exposure occurred but could not be identified, or because these exposures are not necessary for all cases.

For most diseases, there is no one necessary cause. Indeed there may be numerous causal pies by which disease can occur. Such an example is illustrated in Figure 6-1, with suggested sufficient causes of oral cancer. In the first example, exposure to tobacco and alcohol over time are contributing factors (component causes) in the etiology of oral cancer. However, the oral cancer would not have occurred in the presence of a dental visit that could have treated precancerous lesions and might have prevented the disease. While smoking is a component cause in many causal pies for oral cancer, people can get oral cancer without smoking, as shown by the second causal pie in this figure.
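The logic of necessary and sufficient causes can be written down as simple set operations. The following is a minimal sketch, not a reproduction of the figure: the first pie is a simplified reading of the oral cancer example, and the second pie and all component names are hypothetical placeholders.

```python
# Hypothetical sufficient causes for one disease; component names are illustrative only.
CAUSAL_PIES = [
    {"tobacco", "alcohol", "no_dental_visit"},   # simplified reading of the first example
    {"other_exposure_1", "other_exposure_2"},    # placeholder for a pathway without tobacco
]


def disease_occurs(exposures):
    """Disease occurs once every component of at least one pie is present."""
    return any(pie <= exposures for pie in CAUSAL_PIES)


def is_necessary(component):
    """A necessary cause is a component of every sufficient cause."""
    return all(component in pie for pie in CAUSAL_PIES)


def is_sufficient(component):
    """A single factor is sufficient if it forms a complete pie by itself."""
    return {component} in CAUSAL_PIES


print(disease_occurs({"tobacco", "alcohol", "no_dental_visit"}))  # first pie is complete
print(disease_occurs({"tobacco", "alcohol"}))                     # dental visit blocks this pathway
print(is_necessary("tobacco"), is_sufficient("tobacco"))
```

In this toy setup tobacco is a component cause that is neither necessary nor sufficient, which matches the way the text describes its role in oral cancer.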

Interventional Epidemiology

How do we design a scientific study to evaluate whether a particular exposure (for example, asbestos) is a cause of a specific disease (for example, lung cancer)? To understand the most appropriate design in practice, it is useful to begin by describing the ideal scientific study. Imagine for a moment that we have access to a time machine.

In an imaginary study, we follow a group of individuals from birth to death, where everyone is exposed to asbestos, and we observe whether they develop lung cancer. We then send everyone back in a time machine, to live the exact same lives they lived, except that we completely remove asbestos from the environment so that no one is exposed. We then compare whether there are changes in the frequency of occurrence of lung cancer before and after use of the time machine. Since the same people live identical lives but for the presence/absence of asbestos, any difference in the frequency may be attributed to alterations in the exposure to asbestos, which leads to the definition of cause.

How then can we develop the time machine analogy into a realistic epidemiologic approach? We could study two groups of people who are comparable on every characteristic, except that one group had exposure and one did not. The randomized controlled trial closely approximates this goal. By randomly allocating who receives an exposure, for example treatment, and who does not, the exposure occurs only because the investigator has assigned it. For example, an investigator randomly assigns one group of people to receive vitamin E supplements (exposed), while the other group receives a placebo (unexposed). Study participants are then followed forward in time to see whether they develop cancer. Whether someone receives vitamin E then does not depend on whether or not the subject, for example, smokes, drinks, eats a high-fat diet, or has a certain genetic susceptibility.

In this way, the randomization in a trial makes the two (or more) groups, those exposed and those unexposed, comparable on other study factors that might cause the disease. Hence, the unexposed group is a proxy of what would have happened to the exposed group if they had been unexposed, that is, if we could have sent them back in the time machine. Comparability is essential in order to ascribe any changes in the frequency of disease to alterations in the exposure.
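A small simulation can illustrate why random allocation tends to balance other risk factors between the groups. This is a sketch under stated assumptions: the cohort size, the 30 percent smoking prevalence, the coin-flip allocation, and all variable names are hypothetical choices made for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical cohort: each person is a smoker with probability 0.3 (assumed value).
N_PEOPLE = 10_000
smoker = [rng.random() < 0.3 for _ in range(N_PEOPLE)]

# Random allocation: a fair coin decides who receives the exposure (for example, vitamin E).
exposed = [rng.random() < 0.5 for _ in range(N_PEOPLE)]


def smoking_prevalence(in_exposed_arm):
    """Proportion of smokers in the chosen trial arm."""
    arm = [s for s, e in zip(smoker, exposed) if e == in_exposed_arm]
    return sum(arm) / len(arm)


prev_exposed = smoking_prevalence(True)
prev_unexposed = smoking_prevalence(False)
print(round(prev_exposed, 3), round(prev_unexposed, 3))
```

Because allocation ignores smoking status, the two prevalences should differ only by chance, and the gap tends to shrink as the trial grows.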

While some researchers describe the randomized controlled trial as the gold standard of scientific studies, this design is impractical in the majority of epidemiologic situations. For one thing, most exposures we study are detrimental. If we want to study the impact of asbestos on lung cancer, we cannot ethically randomize people to live in a house with asbestos. But even for exposures that are not necessarily detrimental, randomization may be difficult or impractical. For instance, it is very difficult and expensive to randomize a large group to eat a low-fat versus a normal diet, and have everyone comply with this allocation over the course of many years. Most trials are thus only conducted for no more than a few years, an unrealistically short period to test the effect of most exposures because of the long latency between exposure and diagnosis of cancer. Furthermore, in many randomized trials, subjects become noncompliant over time; that is, people allocated to the intervention arm stop taking the intervention, and those in the original placebo or usual care arm may adopt the intervention (a phenomenon called cross-over). This diminishes the contrast between the original randomized groups, reducing the power to detect a difference in disease rates between the groups.
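The dilution caused by noncompliance and cross-over can be shown with simple arithmetic. The sketch below is a deliberate simplification: it assumes that a person who stops the intervention has the untreated risk and a person who adopts it has the treated risk, and the risks, fractions, and function name are hypothetical.

```python
def observed_risk_difference(risk_untreated, risk_treated, stop_fraction, adopt_fraction):
    """Risk difference between the arms as randomized, after noncompliance.

    stop_fraction: share of the intervention arm that stops the intervention.
    adopt_fraction: share of the comparison arm that adopts it (cross-over).
    """
    risk_intervention_arm = (1 - stop_fraction) * risk_treated + stop_fraction * risk_untreated
    risk_comparison_arm = (1 - adopt_fraction) * risk_untreated + adopt_fraction * risk_treated
    return risk_comparison_arm - risk_intervention_arm


# Hypothetical risks: 10% without the intervention, 6% with it.
full_compliance = observed_risk_difference(0.10, 0.06, 0.0, 0.0)
with_crossover = observed_risk_difference(0.10, 0.06, 0.25, 0.15)
print(full_compliance, with_crossover)
```

Under these assumptions the observed contrast equals the true contrast multiplied by one minus the two noncompliance fractions, so a smaller difference has to be detected with the same number of participants.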

Because of the limitations of the randomized controlled trial, the observational cohort and case-control designs are extensively utilized in epidemiology. As will be discussed later in the chapter, attention to both the design and analysis of these studies may allow us to approximate the standards of comparability necessary to validly evaluate the effect of an exposure on the frequency of a disease.

Observational Epidemiology

The essence of observational epidemiology is the noninterventional investigation of disease causation in human population groups. The argument is that only by studying humans is it possible to draw confident conclusions about normal or pathological processes concerning humans (MacMahon, 1979; MacMahon and Trichopoulos, 1996). In vitro studies, such as those involving cell cultures, and studies in laboratory animals are valuable. They are indeed indispensable when toxic exposures or invasive procedures like repeated biopsies are needed for the study of physiologic or pathologic processes, such as carcinogenesis. However, in vitro systems are frequently artificial, and there are physiological and metabolic differences between humans and laboratory animals that hinder interspecies analogies. These analogies are further complicated by the unavoidably limited number of animals used in laboratory studies and the relatively short life span of these animals, both of which impose the administration of high doses of suspected agents in order to generate a sufficient number of outcomes. Consequently, questionable quantitative extrapolations to humans have to be undertaken.

Even when experimental studies, such as randomized controlled trials, in humans are ethical, they are, with few exceptions, impractical because most diseases are rare and their latent period, that is, the time between exposure to a cause and the appearance of a clinical disease, is long. This makes it necessary to enroll unrealistically large numbers of compliant volunteers for a very long period (Hennekens and Buring, 1987; MacMahon, 1979; MacMahon and Trichopoulos, 1996).

Observational, that is nonexperimental, studies represent the mainstream of modern epidemiology. Such studies seek to document causal relations on the basis of associations between particular exposures and cancer or other diseases. Inference of causation on the basis of association is easy when the association is both strong and biologically credible; smoking and lung cancer, or hepatitis B virus and liver cancer, are striking examples. It becomes more difficult when the association is biologically compelling but the epidemiologic experience weak, for example, in studies of low-level ionizing radiation and leukemia or passive smoking and lung cancer. Causal interpretation also becomes problematic when the epidemiologic association is fairly convincing but the biological rationale is uncertain, as it is with respect to red meat and colorectal cancer or alcohol and breast cancer. When an epidemiologic association is weak, is derived from a study with questionable quality, and floats in a biological vacuum, inferring causation is perilous.

STUDY DESIGN

Descriptive Studies

It is possible to distinguish observational epidemiological studies into descriptive and analytic. In descriptive studies the frequency of occurrence of a disease (incidence), or of death from a disease (mortality), is estimated in a population, by routinely available time, place, and/or group characteristics. Descriptive studies are essentially exploratory and hypothesis generating. For instance, descriptive studies that documented the increasing trend of lung cancer incidence among men, but not among women, in the early part of the twentieth century pointed to tobacco smoking as a likely cause of this disease. In contrast, the objective of analytic studies is to document causation from the pattern of association in individuals between one or more exposures on the one hand, and a particular disease on the other.

Ecologic Studies

Ecologic studies in epidemiology occupy an intermediate position between descriptive and analytic investigations, in that they share many characteristics with descriptive studies, but serve etiologic objectives. In ecologic studies, the exposure and the disease under investigation are ascertained not for individuals but for groups or even whole populations (Morgenstern, 1982). Thus the prevalence of hepatitis B virus (HBV) in several populations could be correlated with the incidence of liver cancer in these populations, even though no information could be obtained as to whether any particular individual in these populations was or was not an HBV carrier and has or has not developed liver cancer. Associations from ecologic studies are viewed with skepticism, because these studies are susceptible to unidentifiable and intractable confounding as well as to several other forms of bias (Morgenstern, 1982; Greenland and Robins, 1994).

When an exposure is fairly common, for example, smoking, or even prevalence of HBV carriers, ecologic studies can provide useful evidence on the possible effects of these exposures. For instance, following the increase in tobacco consumption, the incidence of lung cancer increased sharply over time, and the incidence of primary liver cancer is higher in populations with higher prevalence of HBV. As a corollary, lack of an association in ecologic studies between a widespread exposure that has rapidly increased over time and the incidence of a disease allegedly caused by this exposure does not support a strong causal relation.
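An ecologic analysis of the kind described for HBV and liver cancer works entirely with group-level values. The sketch below correlates one prevalence and one incidence figure per population; the numbers are invented for illustration and are not real data, and the variable and function names are hypothetical.

```python
from math import sqrt

# Hypothetical group-level data: one value per population, no individual records.
hbv_prevalence_percent = [1.0, 2.0, 5.0, 8.0, 10.0]
liver_cancer_per_100k = [2.0, 3.0, 9.0, 15.0, 20.0]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r = pearson_r(hbv_prevalence_percent, liver_cancer_per_100k)
print(round(r, 3))
```

Even a very high correlation here says nothing about whether the individuals who developed liver cancer were themselves HBV carriers, which is why such associations are viewed with skepticism.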

Analytic Studies

Analytic epidemiologic investigations ascertain exposure and disease outcome in individuals and are usually distinguished into cohort and case-control studies, although there are also several variants of these prototype designs (MacMahon and Trichopoulos, 1996; Rothman and Greenland, 1998). The objective of analytic epidemiologic studies is to ascertain whether a particular exposure, such as a physical, chemical, or biological agent, and a specific cancer or other disease are unrelated (independent) or associated. An association does not necessarily indicate causation. Chance, bias, and confounding (see following) can also generate associations, and they frequently do. Causation is unlikely when there is no association observed. Even if a causal relation does exist, however, it may sometimes be difficult to document it, particularly when the association is weak, the study has limited statistical power, or the exposure is seriously misclassified.

Person-time and Study Base

The concepts of person-time and study base are fundamental to the design and analysis of epidemiologic studies. As the name implies, there are two key components in our description of the person-time, namely the number of people and the time they are followed. To illustrate this, we could ask how many brain cancer cases we would expect if we followed one million people exposed to x-rays for zero seconds. Conversely, how many cases would we expect if we followed zero people for one million years? The answer in both instances is, of course, zero. Hence, neither people nor time alone provides adequate information about the disease experience of a population, and thus both should be taken into account. Person-time is the sum of all the time contributed in a study by subjects at risk of a disease.

Theoretically, an ambitious investigator might wish to include the entire world population in an epidemiologic study during many decades. Needless to say, such a study would provide marvelous opportunities to evaluate many different exposures in relation to many diseases. Millions of person-years would be generated even within a few weeks. In real life, however, any investigator has to restrict the person-time from which information is harvested. This specified person-time is called the study base. Defining the person-time to be included in the study base may include geographic restrictions, defined time periods, and certain age limits. Personal characteristics such as gender, ethnicity, and occupation may further specify the study base. For example, the study base may consist of all British doctors who answered a questionnaire in 1951 (Doll and Hill, 1956), or of all Swedish women who were aged 50 to 74 between 1994 and 1995 (Weiderpass et al, 1999), and who generated person-time until they died or until the follow-up was completed.

Thus, the study base is simply the person-time of a population of individuals at risk of a disease under study. Defining the study base is a crucial step in the design and conduct of an epidemiologic study. There are three central considerations. One is to accommodate realistic goals with regard to feasibility and resources, as certainly no investigator is independent of time and money. A second goal is to make the study efficient. For example, it would make little sense to study the association between smoking and cancer in a population where very few are smokers. Likewise, a study of diet and prostate cancer would be inefficient among men younger than 40, since virtually no cases arise among such young people. The final challenge is to identify a study base that allows valid inferences concerning associations between exposure(s) and a particular disease, that is, a study base that does not impose intractable confounding or raise insurmountable obstacles of other biases.


Person-time is the source of any event we want to investigate, for example the occurrence of cancer. To help set a foundation for better understanding person-time in a study base, we will use the example of x-rays and risk of brain cancer (Fig. 6-2). In this population, five people have been exposed to x-rays, and another five have not been exposed and remain unexposed during the study period. While in real life the study populations are much larger, we use this elementary example to illustrate the principles.

Among the people exposed to x-rays, persons 3 and 5 were followed from the time they were exposed to x-rays till the end of the study period, a total of 5 years each. Persons 1, 2, and 4, however, developed brain cancer at the end of years 1, 4, and 2, respectively. Once these individuals develop brain cancer, they are no longer at risk of the disease, and thus no longer contribute information to the study base. The person-time among those exposed to x-rays is estimated by summing up the person-time of all the individuals while at risk for the disease, that is:

(2 persons × 5 years) + (1 person × 1 year) + (1 person × 4 years) + (1 person × 2 years) = 17 person-years

We can similarly sum up the person-time among the group of five individuals who were not exposed to x-rays:

(4 persons × 5 years) + (1 person × 2 years) = 22 person-years
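The two sums above can be checked with a short sketch. The per-person follow-up times are taken from the worked example; the dictionary layout and the function name `person_time` are illustrative choices, and the 2 years assigned to person 9 is inferred from the "(1 person × 2 years)" term rather than stated directly in the text.

```python
# Years at risk for each person in the illustrative x-ray example.
# Follow-up stops at diagnosis of brain cancer or at the end of the
# 5-year study period, whichever comes first.
exposed_years = {1: 1, 2: 4, 3: 5, 4: 2, 5: 5}
# Assumption: person 9 is the unexposed case, diagnosed after 2 years.
unexposed_years = {6: 5, 7: 5, 8: 5, 9: 2, 10: 5}


def person_time(years_at_risk):
    """Sum the time each individual spent at risk (in person-years)."""
    return sum(years_at_risk.values())


exposed_pt = person_time(exposed_years)
unexposed_pt = person_time(unexposed_years)
print(exposed_pt, unexposed_pt)  # 17 22
```

Together the two groups contribute 39 person-years, which is the study base of this small example.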

Later on, when we discuss analysis of epidemiologic studies, we will see how the

[Figure 6.2: follow-up timelines for persons 1 to 5 (exposure to x-rays) and persons 6 to 10 (no exposure to x-rays), with diagnoses of brain cancer marked; horizontal axis: time.]

Figure 6.2. Experience of a theoretical study population over time. Five individuals who were exposed to x-rays and five individuals who are unexposed are followed over time to see if they develop brain cancer. Among those exposed to x-rays, persons 3 and 5 are followed for the duration of the study period, which in this case was five years. Person 1 develops brain cancer after 1 year, person 2 develops brain cancer after 4 years, while person 4 develops cancer after 2 years. Persons 1, 2 and 4 stop contributing person-time after they develop brain cancer, since they are no longer at risk for the disease. The total person-time in the exposed group is 17.0 person-years. Similarly, we can look at the population of people unexposed to x-rays over time. Of the five people who are unexposed, only one develops brain cancer during the study period (person 9). The remaining people are followed completely for five years. The experience of these ten individuals over time, that is the person-time that the subjects contributed, is the study base.


person-time data will help us to compare disease incidence between exposed and unexposed people.

Cohort studies

The word cohort derives from the similar Latin word, which identified one of the ten divisions in a Roman legion. In epidemiology, cohorts are groups of individuals who can be followed over time. In cohort studies, individuals are classified according to their exposure and are observed for ascertainment of the frequency of disease occurrence or death in the various exposure-defined categories (Fig. 6-3A). In each category the frequency of occurrence is calculated either as risk or as incidence rate. Risk describes the proportion of those who developed the disease under study among all individuals in this category. Rate describes the number of those who developed the disease divided by the person-time during which the individuals in this category have been under observation. Cohort studies have the following defining characteristics.

Cohort studies are exposure-based. The groups to be studied are selected on the basis of exposure. In special exposure cohorts, the groups are chosen on the basis of a particular exposure. In general population cohorts, groups offering logistical advantages for follow-up are initially chosen and the individuals are classified according to their exposure status. Special exposure cohorts may be necessary when rare exposures need to be studied, such as those encountered in the occupational setting. For example, to study efficiently the effect of vinyl chloride on liver angiosarcoma, or aromatic amines on bladder cancer, epidemiologic studies have been conducted in cohorts of workers in the plastic and dyestuff manufacturing industries, respectively.

The general population cohort is appropriate when the exposure under consideration is fairly common. Classical examples of general population cohorts, in which the profession facilitated accessibility of cohort members rather than being a study factor, include the British Doctors Study and the Nurses' Health Study. The British Doctors cohort, established in 1951, consisted of more than 30 000 doctors from Great Britain. In this landmark study, Doll and colleagues prospectively followed the cohort and collected updated information on multiple exposures, particularly smoking, over several decades. Indeed, prospective data from the British doctors were among the first to demonstrate convincingly the role of tobacco in the etiology of lung cancer (Doll and Hill, 1956). More than four decades later, data from the British Doctors have continued to provide insight into the etiology of cancer (Doll et al, 2005).

Another notable cohort is the Nurses' Health Study, which began in 1976 with over 120 000 US registered nurses. This cohort was assembled initially to evaluate prospectively the effect of oral contraceptives on the risk of breast cancer (Hennekens et al, 1984). Subsequently, diet and many other exposures have been studied in relation to the risk of cancer as well as other chronic conditions (Zhang et al, 2005). Information on these diverse exposures has been collected biennially through questionnaires. Moreover, blood samples have allowed researchers to explore biomarkers and genetic factors. For example, prospective data from the Nurses' Health Study have provided insight into the role of both exogenous and endogenous estrogens in breast cancer etiology. A particular characteristic of these types of cohorts is that the individuals can be followed almost completely over time, due to their membership in groups with a high interest in health studies and registration requirements that facilitate initial contact and long-term follow-up.

Cohort studies are patently or conceptually longitudinal. The study groups are observed over a period of time to determine the frequency of disease occurrence among them. The distinction between retrospective and prospective cohort studies depends on whether the cases of disease had occurred in the cohort at the time the study began. In a retrospective cohort study, exposures and health outcomes occurred before the investigation started. These are typically assembled from pre-existing records of a


[Figure 6.3A: schematic of a cohort study showing exposed cases (x1) and the incidence rate ratio, (x1/P1)/(x0/P0).]

Figure 6.3A. A cohort study comprises individuals who are either exposed or unexposed to the factor(s) of interest. When these people are followed over time, they generate person-time. Newly diagnosed cases of a particular disease that occur while person-time is accumulated are recorded. The exposure status of a person can change. A person could be smoking high-tar cigarettes for five years, then switch to light cigarettes for fifteen years, and then quit. Consequently, each person can contribute to person-time in different exposure groups. A case is considered exposed if the disease occurred when the person who developed the disease was accumulating exposed person-time. A case is non-exposed if it occurred while the person was accumulating non-exposed person-time. The example assumes, for simplicity, zero latency. The total amount of exposed and non-exposed person-time and the number of exposed and non-exposed cases can be calculated. After that, one can determine whether more cases occurred in the exposed or non-exposed group per unit of person-time, that is, one can calculate the incidence rate ratio. This ratio will indicate whether there is a relationship between the exposure and the disease of interest.

population over time; for example, the employment histories of a factory can be linked to recorded health-outcome information of the workers. In a prospective cohort study, the relevant causes may or may not have acted and the cases of disease certainly have not yet occurred. Hence, following identification of the study cohort, the investigator must wait for the disease to appear among cohort members.

Methodologically, there are two types of cohort studies: closed or fixed cohorts, and open or dynamic cohorts. Closed cohorts are frequent in occupational epidemiology and the study of outbreaks, whereas open cohorts dominate cancer epidemiology and form the conceptual basis for most case-control studies. The key distinction between open and closed cohorts is how membership in the cohort is determined. In a closed cohort, it is determined by a membership-defining event that occurs at a point in time. For example, people who were living in Hiroshima and Nagasaki when the atomic bombs were dropped in 1945 are part of a cohort whose membership began on the date of the bombing. These subjects remain in the cohort until they die.

Open cohorts are composed of individuals who contribute person-time to the cohort only as long as they meet the criteria for a membership-defining state (Fig. 6-3A). Examples of such criteria include place of residence, age, and health status. Once individuals can no longer be characterized by the defining state(s), they cease to contribute person-time to the open cohort and are no longer members. Open cohorts are


used, for example, in cancer epidemiology studies based on registry data (Hansson et al, 1996). A person could be a member of the cohort, for example, only as long as he or she was a resident of Sweden and was not diagnosed with the cancer under study. If the person emigrated from Sweden to another country, he or she stopped contributing person-time to the cohort at that time. Similarly, if someone born outside of Sweden immigrated there later in life, he or she would begin contributing person-time to the cohort at that time. In studies based on open cohorts it is not possible to directly measure risk, otherwise referred to as cumulative incidence. Analyses are based on person-time using incidence rate measures.

As an example, assume that in a closed cohort among 5000 nonsmoking men followed for an average period of 10 years (P0 = 50 000 person-years), x0 = 25 were diagnosed with lung cancer, and among 10 000 smoking men followed for an average period of 8 years (P1 = 80 000 person-years), x1 = 600 were diagnosed with lung cancer. In this example the incidence rate among nonexposed would then be 50 per 10^5 person-years and among exposed 750 per 10^5 person-years. The relative risk (incidence rate ratio) would be 750/50, or 15. The conclusion is that there is a 15-fold increase in lung cancer occurrence from smoking.
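The arithmetic of this example can be sketched as follows. The function name `incidence_rate` is illustrative. The number of exposed cases is taken as 600, the count implied by the stated rate of 750 per 10^5 person-years over 80 000 person-years; that choice is an assumption made so that the figures agree with the stated 15-fold ratio.

```python
def incidence_rate(cases, person_years, per=100_000):
    """Incidence rate expressed per `per` person-years."""
    # Multiply before dividing so the round numbers stay exact.
    return cases * per / person_years


# Non-exposed: x0 = 25 cases over P0 = 50 000 person-years.
rate_unexposed = incidence_rate(25, 50_000)
# Exposed: x1 = 600 cases (assumed, see lead-in) over P1 = 80 000 person-years.
rate_exposed = incidence_rate(600, 80_000)

# Incidence rate ratio: (x1/P1) / (x0/P0).
rate_ratio = rate_exposed / rate_unexposed
print(rate_unexposed, rate_exposed, rate_ratio)  # 50.0 750.0 15.0
```

Because both rates are expressed per the same amount of person-time, the scaling factor cancels in the ratio.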

Case-control Studies

In case-control studies, patients diagnosed with the disease under consideration form the case series. As in cohort studies, their exposure to the factor under investigation is ascertained, for example, through questionnaires, interviews, examination of records, undertaking of laboratory tests in biological samples, and other means (Fig. 6-3B). Using the same methods, the pattern of exposure to the study factor(s) is then estimated in the population, or more strictly in the person-time from which the case series arose. This is done among control subjects selected as a sample of the study base from which the cases arose. If only two categories of exposure are relevant (exposed and unexposed), the relative risk can be estimated by dividing the odds of exposure among cases by the corresponding odds among the controls, the odds ratio.

Thus, if among 200 male patients diagnosed with lung cancer (cases), a = 150 were smokers and b = 50 nonsmokers, whereas among 300 men similar in age to the cases but without lung cancer (controls), c = 50 were smokers and d = 250 were nonsmokers, the odds ratio would be (a/b)/(c/d) = (150/50)/(50/250), or 15. This measure is a good approximation to the relative risk (or risk ratio, or rate ratio). Hence, similar to the cohort study, these data from a case-control study show a 15-fold excess of lung cancer among smokers.
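As a minimal sketch of the calculation just described, the odds ratio can be computed from the four cell counts of the example. The function name `odds_ratio` and its argument names are illustrative choices, not part of the original text.

```python
def odds_ratio(exposed_cases, unexposed_cases,
               exposed_controls, unexposed_controls):
    """Odds of exposure among cases divided by odds among controls."""
    # (a/b) / (c/d) rewritten as the cross-product a*d / (b*c).
    return (exposed_cases * unexposed_controls) / (
        unexposed_cases * exposed_controls
    )


# Cases: a = 150 smokers, b = 50 nonsmokers.
# Controls: c = 50 smokers, d = 250 nonsmokers.
estimate = odds_ratio(150, 50, 50, 250)
print(estimate)  # 15.0
```

The cross-product form avoids computing the two odds separately and gives the same value.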

There are some features of case-control studies that make this design susceptible to bias (see following). A well-designed case-control study, however, is a valid and cost-efficient approach to the study of the etiology of cancer and other conditions.

Nested case-control studies

Some case-control designs are methodologically superior to others. The best example is the nested case-control design. The definition of this study design is still somewhat ambiguous (Walker, 1991; Rothman and Greenland, 1998). A definite requirement, however, is that controls are chosen from the clearly defined person-time from which all cases have arisen. In other words, if one of the controls had developed the disease under study, he or she would have definitely been included among the cases. Defining the underlying person-time from which a series of cases (for example, lung cancer cases presenting at a referral hospital) arose can be difficult. Sampling controls from a cohort different from the one that gave rise to the cases often results in selection bias.

According to a more strict definition, the term nested case-control study is used only when the underlying cohort and the corresponding person-time have been previously enumerated and the exposure information was collected prior to the diagnosis. In other words, the controls are selected from exactly the same person-time that gave rise to


[Figure 6.3B: schematic of a case-control study showing exposed cases (x1), exposed controls (y1), unexposed controls (y0), and the odds ratio, (x1 × y0)/(x0 × y1).]

Figure 6.3B. It is not always practical or economical to evaluate the entire study person-time. The case-control study is a more efficient design. Instead of enumerating the total amount of exposed and unexposed person-time that makes up the study person-time, the ratio of exposed to unexposed person-time is estimated. This is achieved by randomly selecting people (controls) without the disease of interest from the underlying study person-time and determining their exposure status. If a sufficient number of controls are selected, without regard for their exposure status, then the exposure distribution in the controls will estimate that in the total study person-time. The exposure distribution among the controls is then compared to that among the cases of the disease of interest that have arisen in the study person-time. An estimate can then be made of the odds of exposure among the cases compared to that among controls, or the odds ratio, which is an unbiased estimator of the incidence rate ratio and so indicates whether there is an association between the disease and the exposure of interest.

the cases, the study base. Unlike the traditional case-control design, which is liable to bias due to selective participation and differences in recall, this nested design preserves the validity of a prospective cohort study. Case-control studies nested in an existing cohort are being used increasingly for cost efficiency when analysis of all cohort members requires substantial resources.

Nested case-control studies have been frequently used in occupational epidemiology (Rothman and Greenland, 1998). The occupational cohort can often be readily defined whereas abstraction of detailed exposure information from existing records requires substantial work. Hence, it is more efficient to investigate only the cases of interest and a sample from the cohort, that is, the controls. Nowadays, nested case-control studies are used routinely when exposure information is derived, often through expensive laboratory procedures, from biologic samples such as blood or blood products, tissue, urine, or nails.

One such example is a study of selenium status and breast cancer risk in the Nurses' Health Study (Hunter et al, 1990). On the basis of prior evidence that selenium intake may influence breast cancer risk and since selenium levels in toenails are a reliable source of selenium exposure over several months, the participating women were asked to provide toenail clippings in 1982. After 4 years of follow-up, there were 434 cases of breast cancer. It would have been very expensive and inefficient to get exposure information for all the 62 000 nurses who had returned toenail samples at the start of follow-up. Hence, 434 controls without breast cancer were sampled from the cohort. Using this design meant that only 868 rather than 62 000 samples had to be sent to the laboratory for selenium analyses (Hunter et al, 1990).


Matching in case-control studies

Occasionally, case-control studies are matched. This means that controls are chosen so as to match particular cases with respect to gender, age, race, or any other factor that is likely related to the disease under investigation but not intended to be analyzed in the particular study. Matching is not strictly necessary, nor does it increase the validity of results. But it improves statistical efficiency and, thus, the ability to substantiate a true association (Rothman and Greenland, 1998). What is necessary, however, is that, whenever matching has been used in the enrollment of cases and controls, the statistical analysis should accommodate the matching process. This can be done through either a matched analysis (for example, conditional modeling) or unmatched analysis with explicit control for the matching factors (proper application of unconditional modeling).

Studies of the Genetic Epidemiology of Cancer

Genetic epidemiology of cancer is considered in more detail in a distinct chapter (Chapter 4). Here, we refer briefly to such studies, to provide an integrated picture of epidemiologic designs available for the study of cancer etiology. Two main types of epidemiologic studies are used for the identification of genes predisposing to cancer: genetic linkage studies and genetic association studies.

Genetic linkage studies are generally undertaken in families with a high cancer burden and rely on the principle that two genetic loci, or a cancer and a particular locus, are linked when they are transmitted together from parent to offspring more often than expected by chance. Linkage extends over large regions of the genome and refers to a locus, rather than specific alleles in that locus, which can vary from study to study and from family to family (Teare and Barrett, 2005). Such studies have led to the identification of genes that have substantial impact on the occurrence of breast cancer and colorectal cancer, but these genes are generally rare, presumably because of natural selection pressure.

Genetic association studies can be of either cohort or, more frequently, case-control design. They are frequently undertaken in the general population, rather than in families, and are conceptually similar to traditional epidemiological investigations. The difference is, however, that instead of focusing on environmental factors, like smoking or diet, genetic association studies evaluate as "exposures" specific alleles (rather than loci) of genetic polymorphisms, usually single nucleotide polymorphisms (SNPs). The specific alleles may be etiologically related to cancer or, much more frequently, very closely linked to the truly etiological allele which may not be known. The actually investigated allele and the true etiological allele are said to be in linkage disequilibrium, that is, they are so closely linked that they tend to be inherited together. Two loci in linkage disequilibrium are obviously linked, but two linked loci may not be in linkage disequilibrium if they are sufficiently apart in the chromosome to be separated, sooner or later, by the frequent cross-over process in the meiosis phenomenon during the generation of gametes. In other words, linkage covers longer genetic regions than linkage disequilibrium (Cordell and Clayton, 2005; Teare and Barrett, 2005). The specific allele may be chosen to study because the corresponding locus is thought to be involved in the etiology of the cancer under investigation (eg, a candidate gene). Many SNPs over large parts of the genome, or even over the whole genome, may also be evaluated, with little or no prior evidence that most of them are etiologically relevant or are in linkage disequilibrium with etiologically relevant genes. In the latter situation, most statistically significant findings are likely to be false positive and special procedures are recommended to delineate which ones among the apparent associations are probably genuine (Wacholder et al, 2004). Genetic association studies have not been very successful to date in identifying genes or polymorphisms involved in cancer etiology, possibly because the respective relative risks deviate very little from the null value, but also because the tools to examine alleles across the genome simultaneously (rather than a limited number usually selected on the basis of weak prior probabilities of being truly associated) are only just becoming available.

THE ROLE OF CHANCE

,Scfore an epidemiologic'br considered true andinterpretation in causal€hance and systematic

139

that about six (L00 x 0.0625) among themwould have obtained either five heads orfive tails in a row.

It must be real:r,ed that stochastic (prob-abilistic), in contrast to deterministic, pro-cesses always have built-in uncertainty. Intheir research, all investigators want to re-duce chance-related uncertainty as muchas possible in order to allow more reliableconclusions. This can be achieved mainlyby enrolling progressively larger numbersof individuals in a study. The remaininguncertainty can always be assessed by uti-lizing statistical procedures that generate anumber of summary statistics, including thep-ualue.

The true meaning of the p-ualue, how-ever, is poorly understood and the conceptitself is widely misused. Surprisingly, thismisunderstanding and misuse is quite com-mon even in scientific research. Tradition-ally, p-ualues arc expressed as numericalfractions of 1. For example, a p-ualue of0.1. for a particular positive association (ordifference) indicates that there is a 10i"chance that such an association or a moreextreme one (or a symmetrically oppositeone-that is an inverse association) wouldappear by chance, even if there were in re-ality no association at all.

In essence , the p-ualue is interpretable assuch when only one comparison or one testis performed.

'!7hen multiple comparisons

or multiple tests are carried out the set of therespectiye p-ualues loses its collective inter-pretability. Various procedures for adjustingp-ualues according to the number of com-parisons undertaken or tests performed havebeen proposed (I7acholder et al, 2004).

A p-ualue of 0.05 or smaller isuaditionally-and indeed arbitrarily-treated in medical research as evidence thatan observed association may not have arisenby chance. For example, the proportion oflong-term smokers is found to be largeramong lung cancer patients than amongindividuals without the disease and thep-ualue for this difference is, S8I, 0.05.This implies that the probability of findinga difference of this magnirude or larger (inabsolute terms) is 5% if smoking were

CONCEPTS IN CANCER EPIDEMIOLOGY AND ETIOLOGY

association couldtherefore deserve

terms, the role oferrors should be

:The P-value'Our daily lives are full of highly unlikelywents and coincidences. At the extremes,thousands of people have become wealthy

.from lotteries; many more have died in:ltrange accidents, even though the probabil--lties for the respective events are extremely,imall+ay one in 100 000 or smaller. Thelesson is simple: Highly unlikely eventslrapp.n by chance all the time. Chance does

..not operate differently in scientific research,tttd everyday life. In science, however,-proper quantification and judgment, relyingi on sound substantive knowledge, are nec-'sBrI before considering chance as an un-likely explanation for a phenomenon.

Let us take, as an example, tossing a fair (unbiased) coin that has a 50% or 0.5 probability of turning up heads and an identical probability of turning up tails. Tossing the coin three times and getting three heads in a row is somewhat unusual but it can hardly be taken as an indication that the coin is systematically influenced (biased) toward heads. The p-value in this instance is 0.25 and is calculated by multiplying 0.5 x 0.5 x 0.5 = 0.125, and then doubling 0.125, because the symmetrically opposite outcome, three tails in a row, is as extreme as three heads in a row. Getting five heads or tails in a row generates some suspicion (p = [1/2]^5 x 2 = 0.0625). But if 100 people have tossed a fair (unbiased) coin five times each, it should be expected
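The coin arithmetic above can be reproduced in a few lines. This is a minimal sketch for a fair coin only; the function names are hypothetical, and the expected count for 100 people is simple arithmetic on the same probability rather than a figure quoted from the chapter.

```python
def two_sided_run_p(n_tosses: int) -> float:
    """P-value for n identical outcomes in a row with a fair coin.

    The one-sided probability (all heads) is 0.5 ** n; it is doubled because
    the symmetrically opposite outcome (all tails) is equally extreme.
    """
    return 2 * 0.5 ** n_tosses


def expected_runs(n_people: int, n_tosses: int) -> float:
    """Expected number of people who see n identical outcomes in a row by chance."""
    return n_people * two_sided_run_p(n_tosses)


print(two_sided_run_p(3))     # three heads (or three tails) in a row
print(two_sided_run_p(5))     # five heads (or five tails) in a row
print(expected_runs(100, 5))  # how many of 100 people would see such a run
```

The first two values match the 0.25 and 0.0625 given in the text; the third shows that among 100 people each tossing five times, about six would be expected to see such a run purely by chance.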


unrelated to lung cancer. In this situation, chance is considered unlikely to explain the association. However, small p-values, including values considerably smaller than 0.05, do not guarantee that an association (or difference) is genuine, let alone causal.

Even when the p-value is very small and was generated from a carefully conducted study, it could still be dismissed when the relevant result makes no sense (Miettinen, 1985). Hence, a statistically significant association, linked by convention to a p-value of 0.05 or less, does not necessarily imply causation. Systematic errors, generated by confounding or bias (see following), cannot always be confidently discounted in observational epidemiology. Moreover, as indicated at the end of this chapter, the existence of a genuine association that can be confidently attributed to causation does not necessarily imply that someone who developed the disease following the exposure did so because of that exposure.

A common misconception (Miettinen, 1985) is that if a p-value (for example, p = 0.03) has been properly derived, then its complement (0.97 in our example) can be interpreted as the likelihood that the respective association is indeed causal. This misconception is rarely stated explicitly in the scientific literature, but it underlies the conclusions of many epidemiologic reports that are not securely anchored in methodological principles and biomedical substance.

Lastly, it must be recognized that the p-value itself does not convey any information about the strength of the respective association. A weak association may be statistically highly significant (very small p) when the study is large, and a strong association may be statistically nonsignificant (large p) when the study is small. Hence, all p-values are inherently dependent on the study size, because statistical power, the ability to detect an association (or a difference) when it exists, increases when a study is larger (Rothman and Greenland, 1998).
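The dependence of the p-value on study size can be illustrated with hypothetical counts. The sketch below uses a pooled two-proportion z-test with a normal approximation, which is an assumption made here for simplicity (and is rough for the small study); the counts and function name are invented for illustration.

```python
import math


def two_sided_p(cases_exp, n_exp, cases_unexp, n_unexp):
    """Two-sided p-value for a difference in risks (pooled z-test, normal approximation)."""
    risk_exp = cases_exp / n_exp
    risk_unexp = cases_unexp / n_unexp
    pooled = (cases_exp + cases_unexp) / (n_exp + n_unexp)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_exp + 1 / n_unexp))
    z = (risk_exp - risk_unexp) / se
    # erfc(|z| / sqrt(2)) equals twice the upper-tail normal probability
    return math.erfc(abs(z) / math.sqrt(2))


# Weak association (relative risk 1.1) in a large hypothetical study
p_weak_large = two_sided_p(2200, 20000, 2000, 20000)

# Strong association (relative risk 3.0) in a small hypothetical study
p_strong_small = two_sided_p(6, 20, 2, 20)

print(f"RR 1.1, n = 40 000: p = {p_weak_large:.4f}")
print(f"RR 3.0, n = 40:     p = {p_strong_small:.4f}")
```

Under these assumed counts the weak association falls well below 0.05 while the strong one does not, so the p-value alone says little about the strength of an association.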

Confidence Intervals

In order to integrate information about the strength of an association (as reflected in the relative risk effect measure, described later on) and its statistical significance, the concept of confidence interval has been developed. Most common are 95% confidence intervals. With a 95% confidence interval, one can be 95% confident that the interval covers the true measure of association (for example the relative risk). But in 5 times out of 100, the true measure is not included. The confidence interval is closely linked to the p-value. The width of the confidence interval is determined primarily by the desired level of confidence and the sample size. Hence, the interval is wider if it includes the true value with 95% confidence than with, for example, 80% confidence. Likewise, smaller studies create wider confidence intervals, that is, greater uncertainty about the true value, than larger studies.
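How confidence level and sample size drive the interval width can be sketched for a risk ratio. The example assumes the common log-based (Wald-type) interval, which the chapter does not specify, and uses hypothetical counts; `risk_ratio_ci` is an illustrative name.

```python
import math
from statistics import NormalDist


def risk_ratio_ci(cases_exp, n_exp, cases_unexp, n_unexp, level=0.95):
    """Risk ratio with a log-based confidence interval (normal approximation)."""
    rr = (cases_exp / n_exp) / (cases_unexp / n_unexp)
    # Standard error of ln(RR) for cumulative-incidence data
    se_log = math.sqrt(1 / cases_exp - 1 / n_exp + 1 / cases_unexp - 1 / n_unexp)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return rr, rr * math.exp(-z * se_log), rr * math.exp(z * se_log)


# Same relative risk (1.5), different confidence levels and study sizes
small_95 = risk_ratio_ci(30, 200, 20, 200, level=0.95)
small_80 = risk_ratio_ci(30, 200, 20, 200, level=0.80)
large_95 = risk_ratio_ci(300, 2000, 200, 2000, level=0.95)

for label, (rr, lo, hi) in [("small, 95%", small_95),
                            ("small, 80%", small_80),
                            ("large, 95%", large_95)]:
    print(f"{label}: RR = {rr:.2f} ({lo:.2f} to {hi:.2f})")
```

The point estimate is identical in all three cases; only the width changes, narrowing with a lower confidence level or a larger study.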

SYSTEMATIC ERRORS

The Experimental Study

The chance-related issues apply to all types of studies, observational as well as experimental. As discussed earlier, experimental studies undertaken under optimal conditions are methodologically superior to observational studies. With randomization of exposure, complete follow-up of study subjects, and double-blind assessment of outcome, they are not as liable to the pitfalls of typical observational studies, that is, confounding and bias (Miettinen, 1985; Hennekens and Buring, 1987; MacMahon and Trichopoulos, 1996; Rothman and Greenland, 1998; Rothman, 2002). Proper evaluation of the association between a particular exposure and a specific disease presupposes that every other factor that could influence disease occurrence is either constant among subjects studied or distributed equally between exposed and unexposed subjects.

In other words, an experimental study uses random allocation of study subjects into those who will be exposed and those who will not. Thus, the two or more groups will tend to be similar in the distribution of known as well as unknown factors that may influence the results. In some studies, blinding of researchers and study subjects through the use of appropriate procedures and devices (for example, indistinguishable inert pills, the so-called placebos) may further assure that every factor that can affect disease occurrence, other than the exposure under study, is kept at about the same level between the exposed and unexposed groups.

Experimental studies aim to fulfill the Latin dictum ceteris paribus (other things being equal). However, in humans, the optimal conditions that completely eliminate confounding and bias are difficult to create even in randomized controlled trials. Moreover, as already indicated, there is no way to fully control the inherently unpredictable role of chance, except by the use of very large numbers of study subjects, an unrealistic objective in many studies.

The randomized controlled trial, with its methodological advantages, dominates experimental research in laboratory animals. In humans, however, the undertaking of experiments faces serious obstacles, the most important of which are ethical. It is obviously not acceptable to expose humans intentionally to a potentially carcinogenic agent in order to ascertain cancer causation. For this reason, most randomized controlled trials in humans have been performed to evaluate treatment effectiveness and occasionally to determine the preventive potential of vaccines, vitamins, or other supplements. In most instances, research on disease etiology has to rely either on animal models, with inherently dubious assumptions about interspecies similarities and exposure dose extrapolations, or on epidemiologic studies with an observational design.

Epidemiologic studies have indeed generated most of what is currently known about the etiology of human diseases in general, and cancer in particular. At the same time, however, epidemiologic studies have also generated conflicting results, unwarranted concern about everyday exposures, and considerable confusion over the rational ranking of public health priorities (Taubes, 1995). The problem arises because epidemiologic studies must confront not only the vagaries of chance but also the problems of systematic errors that undermine their validity.

Confounding

Confounding is the systematic error generated when another factor that causes the disease under study, or is otherwise related to it, is also related to the exposure under investigation (Fig. 6-4A). Thus, if one wishes to examine whether hepatitis C virus (HCV) causes liver cancer, hepatitis B virus (HBV) would be a likely confounder. Confounding arises because HBV causes liver cancer and carriers of HBV are more likely to also be carriers of HCV (because these two viruses are largely transmitted by the same routes). Hence, if the confounding influence of HBV is not accounted for in the design (by limiting the study to HBV-negative subjects) or in analyses of the data, then the strength of the association between HCV and liver cancer would be overestimated (Fig. 6-4A).

Figure 6-4A. Infection with hepatitis C virus (HCV), a cause of liver cancer, is (positively) confounded by hepatitis B virus (HBV) infection, another cause of liver cancer. If this confounding is disregarded, the strength of the association between HCV and liver cancer will be overestimated.

A more trivial example is the strong association between carrying matches or a cigarette lighter and developing lung cancer. Obviously, neither matches nor lighters cause lung cancer and their association to the disease is due entirely to confounding by cigarette smoking. The confounding factor, cigarette smoking, is the true cause of lung cancer and the dependence of cigarette lighting on matches or lighters generates the confounded, entirely spurious association of the latter two factors with the disease (Fig. 6-4B).

There are several ways to deal with confounding: some simple, others more complicated. They all assume that two conditions are satisfied: (1) that all the confounders have been identified or at least suspected, and (2) that the identified or suspected confounders can be adequately conceptualized and accurately measured. When the study is fairly large, it is always possible to evaluate all suspected confounders in the analysis. However, the ability to conceptualize and accurately measure all of them is frequently beyond the control of any investigator. The result is what has been termed residual confounding, that is, confounding left unaccounted for (MacMahon and Trichopoulos, 1996; Rothman and Greenland, 1998).
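One of the simpler ways to account for a measured confounder in the analysis is stratification. The sketch below applies it to the matches example using entirely hypothetical counts, and summarizes the strata with a Mantel-Haenszel risk ratio; this estimator is a standard choice assumed here, not one named in the chapter.

```python
# Hypothetical cohort, stratified by smoking (the confounder).
# Each stratum: (cases among match carriers, number of carriers,
#                cases among non-carriers, number of non-carriers)
strata = {
    "smokers":    (90, 900, 10, 100),   # 10% risk regardless of matches
    "nonsmokers": (1, 100, 9, 900),     # 1% risk regardless of matches
}


def crude_risk_ratio(strata):
    """Risk ratio ignoring the confounder (all strata collapsed)."""
    a = sum(s[0] for s in strata.values())
    n1 = sum(s[1] for s in strata.values())
    c = sum(s[2] for s in strata.values())
    n0 = sum(s[3] for s in strata.values())
    return (a / n1) / (c / n0)


def mantel_haenszel_risk_ratio(strata):
    """Risk ratio adjusted for the stratifying factor (Mantel-Haenszel weights)."""
    num = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata.values())
    den = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata.values())
    return num / den


crude = crude_risk_ratio(strata)
adjusted = mantel_haenszel_risk_ratio(strata)
print(f"crude RR = {crude:.2f}, smoking-adjusted RR = {adjusted:.2f}")
```

With these assumed counts the crude risk ratio is close to 5, yet within each smoking stratum carrying matches makes no difference, so the adjusted estimate is 1. Adjustment of this kind works only for confounders that have been identified and measured, which is exactly the limitation that leaves room for residual confounding.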

Bias

Compounding the problems of epidemiologic studies is that the data are almost never of optimal quality. Data collection relies on the recollection of exposures and their accurate reporting by study participants, laboratory procedures, or existing records. These sources are rarely perfect. For example, studies on diet rely on individuals' imperfect recall on how frequently they eat specific foods, or on serum markers of nutrients that are far from perfect indicators of long-term consumption. Such misclassification, or information bias, can influence the relative risk in any direction and, thus, entails exaggeration, underestimation, or even reversal of the true associations.


Figure 6-4B. An association between carrying matches and lung cancer would arise spuriously due to confounding by smoking, the major cause of lung cancer, unless this confounding is accounted for in the design or the analysis.

In case-control studies, the ascertainment of exposure occurs after the occurrence of disease. Therefore, this study design is particularly subject to information bias. In particular, cases may be likely to remember their exposures differently than controls, a form of information bias called recall bias. For example, a reasonable concern is that cases, or their relatives, are inclined to ruminate about the disease and identify a particular exposure as the causative agent, either for conscious or subconscious reasons. Cases may also try harder than controls to recall relatives with the disease of interest, leading to a biased estimate of the effect of the family history (Chang et al, 2006).

A well thought-out protocol, standardized procedures, and built-in quality control measures can reduce bias and allow some quantification of its potential impact. However, complete assurance that bias has been eliminated can never be achieved. In addition, the reliance of case-control studies on a control series that simultaneously has to meet criteria of compliance, comparability to the case series, statistical efficiency, and general practicality makes them susceptible to selection bias of unpredictable direction and magnitude. Such biases arise when eligible controls are not representative of the population, or more strictly the person-time, that gave rise to the cases (Wacholder et al, 1992a; Wacholder et al, 1992b; Wacholder et al, 1992c).

Assume, as in the previous example, that controls refuse to participate more often if they are smokers than if they are nonsmokers. We would then underestimate smoking in the control group and thereby overestimate both the difference between cases and controls and the excess risk. Hospital controls, neighborhood controls, and controls enrolled through searches of telephone lists have their own problems, and these have been extensively discussed (MacMahon and Trichopoulos, 1996).

In contrast to selection and information biases, issues of chance and confounding are equally relevant to cohort and case-control investigations (Hennekens and Buring, 1987; MacMahon and Trichopoulos, 1996; Rothman and Greenland, 1998).
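The selection-bias scenario described above, in which smoking controls refuse to participate more often than nonsmoking controls, can be quantified with a hypothetical case-control study. All counts and participation fractions below are assumptions chosen for illustration.

```python
def odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls):
    """Exposure odds ratio from a 2 x 2 case-control table."""
    return (exp_cases * unexp_controls) / (unexp_cases * exp_controls)


# Hypothetical cases: 600 smokers, 400 nonsmokers (all participate)
smoking_cases, nonsmoking_cases = 600, 400

# Hypothetical eligible controls: 300 smokers, 700 nonsmokers
eligible_smoking_controls, eligible_nonsmoking_controls = 300, 700

# Assumed participation: smokers take part less often than nonsmokers
participation_smokers, participation_nonsmokers = 0.5, 0.8

observed_smoking_controls = eligible_smoking_controls * participation_smokers
observed_nonsmoking_controls = eligible_nonsmoking_controls * participation_nonsmokers

true_or = odds_ratio(smoking_cases, nonsmoking_cases,
                     eligible_smoking_controls, eligible_nonsmoking_controls)
biased_or = odds_ratio(smoking_cases, nonsmoking_cases,
                       observed_smoking_controls, observed_nonsmoking_controls)

print(f"odds ratio with all eligible controls:    {true_or:.1f}")
print(f"odds ratio with participating controls:   {biased_or:.1f}")
```

Because smoking is under-represented among participating controls, the observed odds ratio (about 5.6 under these assumptions) exceeds the value that the full eligible control series would have given (3.5).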

ANALYSIS OF EPIDEMIOLOGIC STUDIES

Effect Measures

The underlying goal of epidemiology is to determine the magnitude of change in disease frequency caused by an exposure. How do we accomplish this? We could measure the cumulative incidence or incidence rate among those exposed to a factor. For example, we could observe that the incidence rate of breast cancer in a population of alcoholic women is 50/10 000 person-years. This information provides an estimate of the overall disease burden in this study base. However, we do not know how many cases would have arisen in the study base if all the women in this population had not been alcoholics. In epidemiology, the unexposed group stands in for the person-time experience of the exposed group had it not been exposed. Thus, we need to harvest information from both exposed and unexposed person-time.

There are several ways through which an association, or lack thereof, is assessed. Consider a population of women exposed to a high saturated fat diet and a group exposed to low saturated fat diets that are followed for 5 years to see if they develop breast cancer. The absolute effect of the high-fat diet would be the difference in the cumulative incidence between the two groups, or the difference in the incidence rates. Since the experience of the low saturated fat group should represent what would have happened to the high saturated fat group if they had not eaten the high saturated fat, and if the two groups are equivalent with respect to other breast cancer risk factors, the difference in risks or rates represents the excess risk or rate. These absolute-effect measures are called the risk difference and rate difference, respectively.

Although the absolute measures are easily interpreted, more common are effect measures that are taken as ratios and collectively known as the relative risk. This term includes the risk ratio, rate ratio, odds ratio, standardized mortality ratio, and standardized incidence ratio. The risk ratio is simply the cumulative incidence of disease among the exposed, divided by the cumulative incidence among the unexposed. The rate ratio is a ratio of the rates of disease among the exposed and unexposed. The odds ratio is the odds of disease among the exposed divided by the odds of disease among the unexposed. Lastly, the standardized mortality ratio or standardized incidence ratio is a ratio of the observed number of deaths or cases in a cohort, divided by the expected number of deaths or cases in the general population, usually stratified by age and gender.

A relative risk value of 1 implies that the exposure under study does not affect the incidence of the disease under consideration. Values below and above 1 indicate a negative (inverse) and a positive association, respectively. For example, a relative risk of 0.5 implies that the disease occurs only half as frequently among exposed as among unexposed individuals; the studied factor appears to be protective. In contrast, if the relative risk is 1.5, then the occurrence (usually the incidence) is 50% higher among exposed than among unexposed individuals.
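The absolute and relative effect measures defined above can be computed side by side from a simple two-group cohort. The counts below are hypothetical (30 cases among 1000 women on a high saturated fat diet and 20 among 1000 on a low saturated fat diet over 5 years), and the function names are illustrative.

```python
def effect_measures(cases_exp, n_exp, cases_unexp, n_unexp):
    """Risk difference, risk ratio, and odds ratio from cumulative-incidence data."""
    risk_exp = cases_exp / n_exp
    risk_unexp = cases_unexp / n_unexp
    odds_exp = cases_exp / (n_exp - cases_exp)
    odds_unexp = cases_unexp / (n_unexp - cases_unexp)
    return {
        "risk_difference": risk_exp - risk_unexp,  # absolute effect (excess risk)
        "risk_ratio": risk_exp / risk_unexp,       # relative effect
        "odds_ratio": odds_exp / odds_unexp,       # close to the risk ratio when disease is rare
    }


def standardized_ratio(observed, expected):
    """Standardized mortality or incidence ratio: observed over expected events."""
    return observed / expected


measures = effect_measures(30, 1000, 20, 1000)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Here the risk ratio of 1.5 corresponds to a 50% higher occurrence among the exposed, while the risk difference expresses the same contrast as one extra case per 100 women over the follow-up period.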

Studies based on follow-up of closed cohorts may be analyzed by using either cumulative incidence (risk) measures or by counting person-time and calculating incidence rate measures. Analyses based on cumulative incidence measures are only useful under certain conditions, such as no loss to follow-up, no competing risks, and unchanged exposure status throughout follow-up. In addition, study subjects should be followed for the same period of time. Whether or not these conditions are met, it is always valid to conduct analyses based on person-time, using incidence rate measures.
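The difference between the two approaches shows up as soon as follow-up is unequal. In this minimal sketch each subject is a hypothetical (years followed, developed disease) pair; the data and function names are assumptions for illustration.

```python
def incidence_rate(records):
    """Cases divided by total person-years of follow-up."""
    cases = sum(1 for _, diseased in records if diseased)
    person_years = sum(years for years, _ in records)
    return cases / person_years


def cumulative_incidence(records):
    """Cases divided by number of subjects; interpretable only when everyone
    is followed for the same period with no losses or competing risks."""
    cases = sum(1 for _, diseased in records if diseased)
    return cases / len(records)


# Hypothetical cohort with unequal follow-up (some subjects lost early or diagnosed early)
cohort = [(5.0, False), (2.5, True), (4.0, False), (1.0, True), (5.0, False)]

rate = incidence_rate(cohort)
print(f"incidence rate: {rate * 10_000:.0f} per 10 000 person-years")
print(f"naive cumulative incidence: {cumulative_incidence(cohort):.2f}")
```

The rate uses each subject's actual time at risk, so it remains valid here, whereas the naive proportion treats a subject followed for 4 years the same as one followed for 5.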

Interaction

The term interaction has been used to describe different biological and statistical concepts. Indeed, even in the epidemiologic literature, statements about interaction are often ambiguous and inadequately specified. From a biological point of view, component causes within the same sufficient cause may be thought of as interacting (Fig. 6-1). In other words, the exposures act synergistically to produce disease, since in the absence of one factor, disease will not occur by that mechanism. From an epidemiologic point of view, interaction is frequently characterized as effect-modification: that is, a factor A and factor B alone have a certain relationship with a disease, but together the factors have an effect different than that expected on the basis of the magnitude of their individual effects. The expectation of the joint effect of factors A and B can be assessed in either an additive or a multiplicative way.

We can use the example in Table 6-2 to illustrate how interaction is assessed. When a multiplicative scale is assumed, there is statistical interaction if the relative risk among those exposed to both factors A and B (that is, RR_AB) is different than the product of the two individual relative risks (that is, RR_A x RR_B). When an additive scale is assumed, there is interaction if the RR_AB is different than (RR_A + RR_B - 1). In this example, the expected relative risk for someone with both exposures is 6.0 (6.0 = [4.0 + 3.0 - 1]) under the additive-effect assumption (Table 6-2A), whereas it is 12.0 (12.0 = 4.0 x 3.0) under the multiplicative-effect assumption (Table 6-2B).

Hence, interaction between two exposures is present when the relative risk is significantly different from what is expected according to a specified scale. Thus, for those with both exposures, we would have interaction on the additive scale if the relative risk is significantly different from 6.0 (Table 6-2A), and on the multiplicative scale if the relative risk is significantly different from 12.0 (Table 6-2B). If the relative risk following exposure to both factors compared to having neither is greater than the sum (minus the reference risk of 1, which should not be counted twice) or product of the individual risks, we call this interaction superadditive or supermultiplicative, respectively. If the relative risk is significantly lower, we refer to this as either subadditive or submultiplicative.

We can illustrate the concept of interaction using data from an epidemiological study of asbestos, smoking, and lung cancer risk. The source population for the data shown in Table 6-2C is a cohort of insulation workers from the United States and Canada (Hammond et al, 1979). The exposed person-time was the experience of over 12 000 male workers with at least 20 years of asbestos exposure. The comparison person-time came externally from the experience of more than 73 000 men of similar social class.


Table 6-2. Definitions of interaction. Relative risks of developing a certain disease among subjects exposed (+) or not exposed (-) to one or both factors denoted A and B. Subjects exposed to neither of these factors comprise the reference category and their relative risk is by definition 1.0.

Table 6-2A. Statistical interaction on the additive scale with examples of subadditive and superadditive factors.

                 Factor B
Factor A     -                  +
   -         1.0 (reference)    3.0
   +         4.0                2.0   Subadditive
                                6.0   Expected under additive effects assumption
                                8.0   Superadditive

Table 6-2B. Statistical interaction on the multiplicative scale with examples of submultiplicative and supermultiplicative factors.

                 Factor B
Factor A     -                  +
   -         1.0 (reference)    3.0
   +         4.0                8.0   Submultiplicative
                                12.0  Expected under multiplicative effects assumption
                                16.0  Supermultiplicative

Table 6-2C. Effects on lung cancer risk of smoking, asbestos, and both factors.

                 Asbestos
Smoking      -                  +
   -         1.0 (reference)    5.2
   +         10.9               53.2

Source: Hammond et al, 1979.

Compared to men who had neither exposure, the relative risk of those who were smokers, but who were not exposed to asbestos occupationally, was 10.9; the relative risk of those exposed to asbestos, but who were not smokers, was 5.2. For those exposed to both asbestos and smoking, the relative risk of lung cancer was 53.2 compared to those with neither factor. In this example, there appears to be interaction on the additive scale, since the relative risk for smoking and asbestos combined (53.2) is substantially higher than the expected relative risk of 15.1 under the additive model (RR_smoking + RR_asbestos - 1 = 10.9 + 5.2 - 1). We do not, however, observe interaction on the multiplicative scale, since the relative risk for both smoking and asbestos (53.2) does not represent a significant departure from what is expected under the multiplicative-effect assumption (56.7 = RR_smoking x RR_asbestos = 10.9 x 5.2).

There are not any clear-cut guidelines on whether to assess interaction in the additive or multiplicative setting for the various disease outcomes examined in epidemiology, although both approaches are used (Brennan, 1999).
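The expected joint relative risks under the two scales follow directly from the formulas in this section. The sketch below applies them to the Table 6-2 example (4.0 and 3.0) and to the smoking and asbestos figures (10.9 and 5.2); the function name is hypothetical, and the comparison with the observed value ignores statistical significance, which the text requires for a formal claim of interaction.

```python
def expected_joint_rr(rr_a: float, rr_b: float) -> tuple[float, float]:
    """Expected relative risk for joint exposure under each no-interaction model.

    Additive:       RR_A + RR_B - 1  (the reference risk of 1 is counted once)
    Multiplicative: RR_A * RR_B
    """
    return rr_a + rr_b - 1, rr_a * rr_b


# Generic example from Table 6-2
additive, multiplicative = expected_joint_rr(4.0, 3.0)
print(f"Table 6-2: additive {additive:.1f}, multiplicative {multiplicative:.1f}")

# Smoking and asbestos (Table 6-2C), observed joint relative risk 53.2
add_sa, mult_sa = expected_joint_rr(10.9, 5.2)
observed = 53.2
print(f"Smoking and asbestos: additive {add_sa:.1f}, multiplicative {mult_sa:.1f}")
print(f"Observed / additive expectation:       {observed / add_sa:.2f}")
print(f"Observed / multiplicative expectation: {observed / mult_sa:.2f}")
```

The observed value is several times the additive expectation but close to the multiplicative one, which is the pattern the text describes as interaction on the additive scale only.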

Meta-analysis

Random variation per se in epidemiologic studies is not an insurmountable problem. Larger studies and eventually quantitative summary analyses are increasingly used. Such systematic statistical evaluations of results of several independent investigations can effectively address genuine chance-related concerns. Quantitative summary analyses have been termed meta-analyses and pooled analyses. There is no completely accepted distinction between the two terms, although meta-analysis is used more frequently when published results are combined. By contrast, in pooled analysis primary individual-level data from different studies may be made available to an investigator who undertakes the task of combining them. This facilitates the use of uniform exposure categories and statistical analyses across studies and may permit analyses that were not in the original publications, for instance, analyses of effect modification for which each initial study may have been too small to be informative.

Meta-analyses and pooled analyses have been widely and effectively used for randomized controlled trials and intervention studies, because in properly undertaken investigations of this nature confounding and bias are nonissues (Sacks et al, 1987). For observational epidemiologic studies, however, the role of meta-analysis is not universally accepted (Shapiro, 1994; Feinstein, 1995). Some investigators are concerned that no statistical summarization can effectively address problems generated by residual confounding, unidentified bias, and the way investigators choose to present their results (legitimately, but occasionally selectively or arbitrarily). Nevertheless, meta-analyses have provided important, widely accepted data, even when derived from observational data.
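How combining published results reduces the role of chance can be sketched with a fixed-effect, inverse-variance summary of relative risks. This particular method is an assumption made for illustration (the chapter does not prescribe one), and the three study results below are hypothetical.

```python
import math

# Hypothetical published results: (relative risk, lower 95% limit, upper 95% limit)
studies = [(1.4, 0.9, 2.2), (1.2, 0.8, 1.8), (1.6, 1.0, 2.6)]


def pooled_relative_risk(studies, z=1.96):
    """Fixed-effect inverse-variance summary of relative risks on the log scale."""
    weighted_sum = 0.0
    total_weight = 0.0
    for rr, lower, upper in studies:
        # Recover the standard error of ln(RR) from the reported 95% limits
        se = (math.log(upper) - math.log(lower)) / (2 * z)
        weight = 1 / se ** 2          # more precise studies count more
        weighted_sum += weight * math.log(rr)
        total_weight += weight
    log_pooled = weighted_sum / total_weight
    se_pooled = math.sqrt(1 / total_weight)
    return (math.exp(log_pooled),
            math.exp(log_pooled - z * se_pooled),
            math.exp(log_pooled + z * se_pooled))


summary_rr, summary_lower, summary_upper = pooled_relative_risk(studies)
print(f"summary RR = {summary_rr:.2f} ({summary_lower:.2f} to {summary_upper:.2f})")
```

The summary interval is narrower than that of any single study, reflecting reduced random error. The calculation does nothing, however, to remove confounding or bias shared by the constituent studies, which is the concern raised above for observational data.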

CAUSAL INFERENCE IN EPIDEMIOLOGY

General Principles

Regulatory agencies and policy makers may recommend standards, set limits, or authorize action even when the scientific evidence is weak. These decisions serve public health objectives by introducing a wide safety margin, but they should not be confused with the establishment of causation based on scientific considerations alone.

When results of an observational epidemiologic study designed to address a specific hypothesis are striking, the study is large, and there is no evidence of overt confounding or major biases, it is legitimate to attempt etiologic inferences. In contrast, interpretation becomes problematic when a weak association turns out to be statistically significant, for example, in a large but imperfect data set. Although that association could reflect a weak but genuine causal association, it might also be the result of residual confounding, subtle unidentifiable bias, or chance, perhaps following a multiple testing process.

Repeated demonstration of an association of similar direction and magnitude in several studies, undertaken by different investigators in different population groups, increases confidence in a genuine causal basis but cannot conclusively establish this. Nor do meta-analyses establish causality. These techniques essentially address the issue of chance and provide no guarantee that a particular bias, unrecognized confounding, or selective reporting has not operated in the constituent studies. It is at this stage that both biologic and epidemiologic considerations should be taken into account in interpreting the results of empirical studies.

Criteria for inferring causation from epidemiologic investigations have been proposed, over the years, by several authors, including MacMahon et al (1960), the US Surgeon General (US Department of Health, 1964), Sir Austin Bradford Hill (Hill, 1965), the IARC (1987), and others. In spite of differences in emphasis, a similar set of principles has been invoked by most authors. Sir Austin Bradford Hill (1965) advocated the nine widely used criteria listed in Table 6-3, to distinguish causal from noncausal associations.

The Hill criteria, although sensible and useful, do not separately address the inherently different issues that are posed by the results of a single study, the results of several studies, and the likelihood of causation in a certain individual. In reality, the perceived likelihood of a causal association between a particular exposure and a specific disease moves forward or backward in a continuous spectrum as research results accumulate. The evidence for causality is declared as sufficient when a particular threshold has been reached, but on occasion requires reevaluation in the light of subsequent evidence (Cole, 1997).

The IARC Classification

The International Agency for Research on Cancer (IARC) evaluates the risk of specific agents to determine if they are carcinogenic in humans. In order to come to a conclusion, the IARC has implemented its own set of criteria for evaluating the carcinogenicity of agents. After considering all the evidence, the IARC working group assigns the agent to one of five categories, summarized in Table 6-4. Group 1 indicates that there is sufficient evidence to conclude that the agent is carcinogenic to humans. A label of group 2A means that there are insufficient human data, but there is strong evidence that the agent is carcinogenic in animal models. Agents for which there is limited evidence in humans and insufficient evidence in experimental animals are assigned to group 2B. Group 3 is used when there is inadequate human and animal data to come to a conclusion. Group 4 indicates that the agent is most likely not a carcinogen in humans based on adequate evidence suggesting that it is not a carcinogen in both animal models and human studies.

Table 6-3. The Hill criteria for inferring causation

Criteria: Definition

Strength: A strong association is more likely to be causal. The measure of strength of an association is the relative risk and not statistical significance.

Consistency: An association is more likely to be causal when it is observed in different population groups.

Specificity: When an exposure is associated with a specific outcome only (for example, a cancer site or even better a particular histological type of this cancer), then it is more likely to be causal. There are exceptions, however, for example, smoking causing several forms of cancer.

Temporality: A cause should not only precede the outcome (disease), but also the timing of the exposure should be compatible with the latency period (in non-infectious diseases) or the incubation period (in infectious diseases).

Gradient: This criterion refers to the presence of an exposure-response relationship. If the frequency or intensity of the outcome increases when an exposure is more intense or lasts longer, then it is more likely that the association is causal.

Plausibility: An association is more likely to be causal when it is biologically plausible.

Coherence: A cause and effect interpretation of an association should not conflict with what is known about the natural history and biology of the disease, or its distribution in time and place.

Experimental evidence: If experimental evidence exists, then the association is more likely to be causal. Such evidence, however, is seldom available in human populations.

Analogy: The existence of an analogy (for example, if a drug causes birth defects, then another drug could also have the same effect) could strengthen the belief that an association is causal.

Source: Hill, 1965.

The Process of Causal Inference

Criteria for causality can be invoked, explicitly or implicitly, in evaluating the results of a single epidemiologic study, although, in this instance, a firm conclusion is all but impossible. In the approach introduced by Cole (1997), this situation is denoted as single study level, or level I. Criteria for causality are more frequently used for the assessment of evidence accumulated from several epidemiologic studies and other biomedical investigations. At this stage, the intellectual process is inductive, moving from the specifics to generalization (several studies level, or level II). Finally, when causation has been established at level II, then, and only then, can the cause of the disease in a particular individual be considered (specific person level, or level III). At this level, the intellectual process is deductive, moving from the general concept of disease causation to the examination of what might have caused disease in a particular individual.

The individual study (level I)

Causality can never be inferred on the basis of a single epidemiologic study, but the likelihood that an observed association is causal is strengthened when several of the following criteria are met: (1) minimal confounding; (2) minimal bias; (3) limited chance variation; (4) relatively strong association; (5) monotonic exposure-disease association, otherwise referred to as exposure-response or dose-response association; (6) internal consistency, exemplified by similarity of exposure-response patterns among various subgroups of study subjects; (7) compatibility of the temporal sequence of exposure and outcome with the known or presumed latency of the disease; and, lastly, (8) biologic plausibility, that is, a causal link between the exposure and the disease should be, at a minimum, biologically conceivable (it should not contradict physical theory or biological principles).
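Criterion (5) can be illustrated with a very small descriptive check. The sketch below only tests whether relative risks never decrease across ordered exposure categories; it is not a formal test for trend, and the function name and the figures are hypothetical.

```python
def is_monotonic_exposure_response(relative_risks):
    """Return True when relative risks never decrease across ordered exposure levels."""
    pairs = zip(relative_risks, relative_risks[1:])
    return all(later >= earlier for earlier, later in pairs)


# Hypothetical relative risks by increasing exposure category (reference = 1.0).
hypothetical_rr_by_dose = [1.0, 1.3, 1.8, 2.6]
print(is_monotonic_exposure_response(hypothetical_rr_by_dose))
```

Applying the same check within subgroups of study subjects would be one simple way to look at the internal consistency described in criterion (6).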

The general case (several studies, level II)

Establishment of the etiologic role of a particular exposure on the occurrence of a disease ideally requires strong epidemiologic evidence, an appropriate and reproducible animal model, and documentation at the molecular or cellular level of the morphological or functional pathogenetic process. Sometimes, an intended or unintended change, or natural experiment, greatly facilitates etiologic inference: This happens when, for example, an occupational group is exposed to high levels of compounds rarely encountered in other settings, a religious group avoids an exposure that is otherwise widespread, or a vaccine that creates herd immunity against a particular virus turns out to reduce the incidence of a certain form of cancer.

These conditions, however, are rarely collectively satisfied. Instead investigators have to be guided by the best available biomedical evidence in order to interpret correctly epidemiologic data from several studies. The following criteria need to be considered: (1) consistency, that is similarity (lack of heterogeneity) of results obtained by different investigators using different study designs in different populations; (2) overwhelming biomedical evidence for weak associations, whereas for strong associations reliance on powerful biomedical knowledge is less critical; (3) compatibility of exposure-response patterns across different studies exploring the exposure-disease association in different exposure ranges; (4) coherence, which requires results from analytic epidemiologic studies to be compatible with ecologic patterns and time trends, such as the increasing incidence of lung cancer over time, following the increasing use of tobacco products by the population; (5) specificity, which exists when one type of disease is consistently linked with one type of exposure rather than several exposures all being associated with a certain disease, or one type of exposure being associated with several diseases; and (6) biological analogy, which exists when a similar exposure has been shown to cause a similar disease in another species or a different form of the disease in humans. For example, viruses have been shown to cause leukemia in several animal species and at least one rare form of leukemia in humans.

None of these criteria can be considered as absolutely necessary for causal inference (a sine qua non). But the evidence for causality is strengthened when most of them are met.

Table 6-4. International Agency for Research on Cancer (IARC) classification of carcinogenicity of agents, mixtures or processes

Group 1: The agent is carcinogenic to humans

Group 2A: The agent is probably carcinogenic to humans

Group 2B: The agent is possibly carcinogenic to humans

Group 3: The agent is not classifiable in terms of its carcinogenicity

Group 4: The agent is not carcinogenic to humans

Disease in a specific person (level III)

Causality can be conclusively established between a particular exposure as an entity and a particular disease as an entity. In contrast, it is not possible to establish such a link conclusively between an exposure and a particular disease of a given individual, for example, smoking in a patient with lung cancer. It is possible, however, to infer deductively that the specific individual's illness was more likely than not caused by the specified exposure.

For this conclusion to be drawn, all the following criteria must be met (Cole, 1997): (1) The exposure under consideration, as an entity, must be an established cause of the disease under consideration, as an entity (level II). (2) The relevant exposure of the particular individual must have properties comparable (in terms of intensity, duration, associated latency, etc) to those that have been shown to cause the disease under consideration. (3) The disease of the specified person must be identical to, or within the symptomatological spectrum of, the disease that, as an entity, has been etiologically linked to the exposure. (4) The patient must not have been exposed to another established or likely cause of this disease. If the patient has been exposed to both the factor under consideration (for example, smoking) and to another causal factor (for example, asbestos), individual attribution becomes a function of several relative risks, all versus the completely unexposed: (a) relative risk of those who only had the exposure under consideration, (b) relative risk of those who had only been exposed to the other causal factor(s), and, (c) relative risk of those who have had a combination of these exposures. (5) The relative risk should be reasonably elevated (eg, 2 or more).

The last criterion stems from the fact that the relative risk comprises a baseline component equal to 1, which characterizes the unexposed, plus another component that applies only to the exposed. When the relative risk is higher than 1 but less than 2 the individual who has been exposed and has developed the disease is more likely than not to have developed the disease for reasons not entirely due to the exposure. For instance, if the risk of a light-smoking 55-year-old man to suffer a first heart attack in the next five years is 6%, and that of a same-age non-smoking man is 4% (relative risk 1.5), then only 33% of the smoker's risk (that is, 1/3 of the total 6%) can be attributed to his smoking. When the relative risk is higher than 2, a particular individual who has been exposed and has developed the disease under consideration is more likely than not to have developed the disease because of the exposure.
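The arithmetic behind this criterion can be written out as a short sketch. The share of an exposed person's risk that exceeds the baseline is (RR - 1) / RR, which passes one half exactly when the relative risk passes 2. The function name is hypothetical, and the risks are the ones from the heart attack example above.

```python
def attributable_fraction_exposed(risk_exposed, risk_unexposed):
    """Share of the exposed group's risk in excess of baseline: (RR - 1) / RR."""
    relative_risk = risk_exposed / risk_unexposed
    return (relative_risk - 1.0) / relative_risk


# Five-year risks from the example in the text: 6% for the light smoker,
# 4% for the non-smoker, giving a relative risk of 1.5.
fraction = attributable_fraction_exposed(0.06, 0.04)
print(f"attributable fraction: {fraction:.0%}")
print("more likely than not due to exposure:", fraction > 0.5)
```

With a relative risk of 1.5 the fraction is one third, so the "more likely than not" threshold is not reached. A relative risk of exactly 2 gives one half, which is why criterion (5) asks for a relative risk of 2 or more.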

CONCLUSION

Manipulation of exposures in humans, many of which may be harmful, is frequently unfeasible, unethical, or both. Therefore, epidemiologists have to base their inferences on experiments that humans subject themselves to intentionally, naturally, or even unconsciously. The study of risk for lung cancer among smokers compared with nonsmokers is one classic example of a natural experiment.

Because human life is characterized by myriad complex, often interrelated, behaviors and exposures (ranging from genetic traits and features of the intrauterine environment to growth rate; physical activity; sexual practices; use of tobacco, alcohol, and pharmaceutical compounds; dietary intake; exposure to infections, environmental pollutants, and occupational hazards; and so on), epidemiologic investigation is difficult and challenging. Given this complexity, it is not surprising that from time to time epidemiologic studies generate results that appear confusing, biologically absurd, or contradictory. However, it is reassuring that a wealth of new knowledge has been generated by epidemiologic studies over the last few decades. This knowledge now lays the scientific ground for primary prevention of many major cancers and other chronic diseases among humans globally.

A detailed study of epidemiologic methodology in any textbook (Hennekens and Buring, 1987; Miettinen, 1985; Walker, 1991; MacMahon and Trichopoulos, 1996; Rothman and Greenland, 1998; Rothman, 2002) can be fascinating and indeed necessary for those who want to pursue their own research. However, for the reader of this textbook, the general concepts introduced in this chapter should provide a sufficient basis. We have tried to convey that the sometimes esoteric theory of modern epidemiology can be condensed to a few central issues, namely (1) how to quantify and understand the impact of chance, (2) how to best harvest information on exposures and outcomes from a source population by using a cohort design, a case-control design, or variants thereof, (3) how to achieve valid results by minimizing the impact of confounding and bias, and, (4) how to address the central issue of causality in a structured way.

GLOSSARY

Cause: A factor is a cause of a certain disease when alterations in the frequency or intensity of this factor, without concomitant alterations in other factors, are followed by changes in the frequency of occurrence of the disease, after the passage of a certain time period (latency, or induction period).

Closed cohort: A closed cohort comprises a set of individuals who are followed for a defined period of time. After becoming a member of the cohort, an individual remains in the cohort until the end of the study, or development of the outcome.

Competing risks: The risk of death from a certain disease competes with the risk of death from another disease by affecting time at risk. Competing risks generally bias risk ratios, but not rate ratios, since person-time allows for different follow-up time.

Component cause: An exposure that acts in concert with other factors (component causes) to produce disease. None of these factors are sufficient in themselves to cause disease.

Confidence interval: A statistical measure that provides a range of possible values that include the true measure of association with a particular degree of certainty. For example, a 95% confidence interval provides a range of values that will include the true value 95% of the time.

Confounding: A systematic error generated when another factor, that causes the disease under study or is otherwise related with it, is also related to the exposure under investigation, without being in the pathway that links exposure under investigation with the disease under study.

Ecologic study: The study of exposure and the disease at the population level, rather than at the individual level.

Epidemiology: The nonexperimental investigation of determinants of human disease.

Experimental study: See randomized controlled trial.

Information bias: A random, or nonrandom, misclassification of information on either the exposure, outcome, or confounding variables that leads to a biased estimation of the true effect.

Loss to follow-up: The inability to follow beyond a certain point in time and thus ascertain the ultimate fate of individuals in a cohort study.

Necessary cause: A factor or exposure that is essential in the etiology of the disease and without which the disease cannot occur. For example, the human immunodeficiency virus is a necessary cause of acquired immunodeficiency syndrome, although other factors may be involved in order for the disease to occur.

Nonexperimental study: See observational study.

Observational study: A study in which the investigator cannot control the circumstances of the exposure.

Odds ratio: A relative measure of association, which is calculated as the ratio of the odds of disease among the exposed divided by the odds of disease among the unexposed.

Open cohort: A cohort of individuals whose membership changes over time, with people entering or exiting based on defining criteria.

Person-time: The sum of all time spent by each study participant at risk for a disease.

p-value: A value that indicates the likelihood of observing an association as extreme as, or more extreme than, the one found between a particular exposure and a certain disease, if there were in fact no association.

Randomized controlled trial: An experimental study design in which the researcher randomly allocates subjects to groups that will be subjected or not to a particular exposure.

Recall bias: A misclassification of an exposure, common in case-control studies, that occurs when subjects with the disease remember or report their exposures differently than those without disease.

Relative risk: A term that collectively describes the various relative measures of association, that is the risk ratio, the rate ratio, the odds ratio, and the standardized incidence or mortality ratio.

Selection bias: A systematic error that results from the process of selecting participants for the study or on account of factors that influence participation in the study. Selection bias occurs when the relationship between the exposure and the disease is different for those in the study than for those not in the study.

Study base: The person-time of a group of individuals at risk for a disease from which an investigator aims to harvest information about disease occurrence.

Sufficient cause: A minimal set of factors or exposures that inevitably produce the disease after a certain period of time.
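The glossary definitions of the odds ratio and the confidence interval can be tied together with a short sketch. The odds ratio follows the definition given above; the 95% interval uses the standard log-scale (Woolf) approximation, which is an assumption here because the chapter does not name a method. The function name and the counts are hypothetical.

```python
import math


def odds_ratio_with_ci(exposed_cases, exposed_noncases,
                       unexposed_cases, unexposed_noncases, z=1.959964):
    """Odds ratio from a 2x2 table with an approximate 95% CI on the log scale."""
    odds_exposed = exposed_cases / exposed_noncases
    odds_unexposed = unexposed_cases / unexposed_noncases
    odds_ratio = odds_exposed / odds_unexposed
    # Woolf approximation: variance of log(OR) is the sum of reciprocal cell counts.
    se_log_or = math.sqrt(
        1 / exposed_cases + 1 / exposed_noncases
        + 1 / unexposed_cases + 1 / unexposed_noncases
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, invented for illustration only.
estimate, ci_lower, ci_upper = odds_ratio_with_ci(20, 80, 10, 90)
print(f"OR {estimate:.2f} (95% CI {ci_lower:.2f} to {ci_upper:.2f})")
```

The approximation needs all four cells to be nonzero and is unreliable for small counts. As elsewhere in the chapter, the interval describes chance variation only and says nothing about confounding or bias.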

REFERENCES

Brennan P. Chapter 12: Design and analysis issues in case-control studies addressing genetic susceptibility. IARC Sci Publ 1999;148:123-32.

Chang ET, Smedby KE, Hjalgrim H, Glimelius B, Adami HO. Reliability of self-reported family history of cancer in a large case-control study of lymphoma. J Natl Cancer Inst 2006;98(1):61-68.

Clemmesen J, Nielsen A. Comparison of age-adjusted cancer incidence rates in Denmark and the United States. J Natl Cancer Inst 1957;19:989-98.

Cole P. Causality in epidemiology, health policy and law. Environmental Law Reporter 1997;27:10279-85.

Cordell HJ, Clayton DG. Genetic Epidemiology 3: Genetic association studies. Lancet 2005;366:1121-37.

Doll R, Hill AB. Smoking and lung cancer: preliminary report. Brit Med J 1950;2:739-48.

Doll R, Hill AB. Lung cancer and other causes of death in relation to smoking. Brit Med J 1956;2:1071-81.

Doll R, Peto R, Boreham J, Sutherland I. Mortality from cancer in relation to smoking: 50 years observations on British doctors. Br J Cancer 2005;92:426-29.

Feinstein AR. Meta-analysis: statistical alchemy for the 21st century. J Clinical Epidemiology 1995;48:71-79.

Greenland S, Robins J. Invited commentary: ecologic studies - biases, misconceptions, and counterexamples. Am J Epidemiol 1994;139:747-50.

Hammond EC, Selikoff IJ, Seidman H. Asbestos exposure, cigarette smoking and death rates. Ann NY Acad Sci 1979;330:473-90.

Hansson LE, Nyren O, Hsing AW, Bergstrom R, Josefsson S, Chow WH, et al. The risk of stomach cancer in patients with gastric or duodenal ulcer disease. New Engl J Med 1996;335:242-49.

Hennekens CH, Buring JE. Epidemiology in Medicine. Boston: Little, Brown, 1987.

Hennekens CH, Speizer FE, Lipnick RJ, Rosner B, Bain C, Belanger C, et al. A case-control study of oral contraceptive use and breast cancer. J Natl Cancer Inst 1984;72:39-42.

Hill AB. The environment and disease: association or causation? Proc Roy Soc Med 1965;58:295-300.

Hunter DJ, Morris JS, Stampfer MJ, Colditz GA, Speizer FE, Willett WC. A prospective study of selenium status and breast cancer risk. JAMA 1990;264:1128-31.

International Agency for Research on Cancer. IARC Monographs on the Evaluation of Carcinogenic Risks to Humans, Supplement 7, Overall Evaluations of Carcinogenicity: An Updating of IARC Monographs, Volumes 1 to 42. Lyon, 1987.

MacMahon B. Epidemiological evidence on the nature of Hodgkin's disease. Cancer 1957;10:1045-54.

MacMahon B. Strengths and limitations of epidemiology. In: The National Research Council in 1979. Current issues and studies. Washington, DC: National Academy of Sciences, 1979:91-104.

MacMahon B, Pugh TF, Ipsen J. Epidemiologic Methods. Boston: Little, Brown, 1960.

MacMahon B, Trichopoulos D. Epidemiology: Principles and Methods. Boston: Little, Brown, 1996.

Miettinen OS. Theoretical Epidemiology: Principles of Occurrence Research in Medicine. New York: Wiley, 1985.

Morgenstern H. Uses of ecologic analysis in epidemiologic research. Am J Public Health 1982;72:1336-44.

Rothman KJ. Causes. Am J Epidemiol 1976;104:587-92.

Rothman KJ. Modern Epidemiology. Boston: Little, Brown, 1986.

Rothman KJ. Epidemiology: An Introduction. New York: Oxford University Press, 2002.

Rothman KJ, Greenland S. Modern Epidemiology, 2nd Ed. Philadelphia: Lippincott-Raven, 1998.

Sacks HS, Berrier J, Reitman D. Meta-analysis of randomized controlled trials. N Engl J Med 1987;316:450-55.

Shapiro S. Meta-analysis/Shmeta-analysis. Am J Epidemiol 1994;140:771-78.

Susser M. What is a cause and how do we know one? A grammar for pragmatic epidemiology. Am J Epidemiol 1991;133:635-48.

Taubes G. Epidemiology faces its limits. Science 1995;269:164-69.

Teare DM, Barrett JH. Genetic Epidemiology 2: Genetic linkage studies. Lancet 2005;366:1036-44.

US Department of Health, Education and Welfare. Smoking and Health. Report of the Advisory Committee to the Surgeon General of the Public Health Service. Publication 1103. Washington, DC: US Government Printing Office; 1964.

Wacholder S, McLaughlin JK, Silverman DT, Mandel JS. Selection of controls in case-control studies: I. Principles. Am J Epidemiol 1992;135:1019-28.

Wacholder S, Chanock S, Garcia-Closas M, El Ghormli L, Rothman N. Assessing the probability that a positive report is false: an approach for molecular epidemiology studies. J Natl Cancer Inst 2004;96:434-42.

Wacholder S, Silverman DT, McLaughlin JK, Mandel JS. Selection of controls in case-control studies: II. Types of controls. Am J Epidemiol 1992;135:1029-41.

Wacholder S, Silverman DT, McLaughlin JK, Mandel JS. Selection of controls in case-control studies: III. Design options. Am J Epidemiol 1992;135:1042-50.

Walker AM. Observation and inference: an introduction to the methods of epidemiology. Newton Lower Falls, MA: Epidemiology Resources Inc, 1991.

Weiderpass E, Adami HO, Baron JA, Magnusson C, Bergstrom R, Lindgren A, et al. Risk of endometrial cancer following estrogen replacement with and without progestins. J Natl Cancer Inst 1999;91:1131-37.

Wynder EL, Graham EA. Tobacco smoking as a possible etiologic factor in bronchiogenic carcinoma: a study of 684 proved cases. JAMA 1950;143:329-36.

Zhang SM, Hankinson SE, Hunter DJ, Giovannucci EL, Colditz GA, Willett WC. Folate intake and risk of breast cancer characterized by hormone receptor status. Cancer Epidemiol Biomarkers Prev 2005;14:2004-8.
