Page 1: Geographically-based cancer control: Methods for targeting and evaluating the impact of screening interventions on defined populations

J Clin Epidemiol Vol. 41, No. 6, pp. 543-553, 1988. Printed in Great Britain. All rights reserved

0895-4356/88 $3.00 + 0.00 Copyright © 1988 Pergamon Press plc

GEOGRAPHICALLY-BASED CANCER CONTROL: METHODS FOR TARGETING AND EVALUATING THE IMPACT OF SCREENING INTERVENTIONS ON DEFINED POPULATIONS

JON F. KERNER,1* HOWARD ANDREWS,2 ANN ZAUBER1 and ELMER STRUENING2

1Memorial Sloan-Kettering Cancer Center, 1275 York Avenue, New York, NY 10021 and 2New York State Psychiatric Institute, 722 W. 168th St, New York, NY 10032, U.S.A.

(Received in revised form 9 December 1987)

Abstract: Successful implementation of cancer control programs depends on efficient targeting to those at highest risk of developing and dying from the disease. This study presents a methodology for targeting cancer screening on the basis of population and disease variation among small geographic areas. Techniques for quantifying the impact of targeting on the predictive value of a positive test are demonstrated, using 329 New York City health areas. Age-truncated crude incidence, late-stage incidence and mortality rates for breast, cervix, and colorectal cancer are used, with site-specific truncation points matched to the age groups appropriate for screening. Coefficient alpha was used to determine rate stability with 2, 3, 5 and 7 years of data. The stability of most small area rates was found to reach acceptable levels only with 5 and 7 years of data. Targeting into areas where breast cancer prevalence was high increased the expected predictive value of a positive test by as much as 50% when compared with areas of average prevalence. Geographic targeting will be most useful where between-area variability in prevalence is large and within-area variability is small. The implications of these results are discussed and future studies are suggested.

Small area variation; Cancer incidence; Cancer mortality; Cancer screening

INTRODUCTION

An issue of growing importance to the nation’s effort to control cancer is the identification of mechanisms by which national goals and objectives for reducing cancer incidence and mortality can be translated into programs tailored for specific populations in specific geographic regions. If the nation’s cancer programs are to avoid the pitfalls that befell the cancer control demonstration projects of the 1970s (major resource expenditures with limited evaluated impact), then national priorities must reflect an awareness of: (1) the diversity of regional cancer problems, (2) the resources needed and available to solve these problems, and (3) methods with which the effectiveness of different solutions can be assessed. One methodology that contributes to our understanding of all three issues is the analysis of the small area variation of population and disease characteristics.

This investigation was supported in part by Grant No. CCG 236, awarded by the American Cancer Society, and PHS Grant No. CA 16402, awarded by the National Cancer Institute, Department of Health and Human Services.

*Reprint requests should be addressed to: Dr Jon F. Kerner, Box 60, Division of Cancer Control, MSKCC, 1275 York Avenue, New York, NY 10021, U.S.A.

The mapping and analysis of geographic variations in disease incidence has a long history of providing etiological clues for primary prevention [1]. For cancer, the mapping of mortality statistics has been utilized in the same fashion in a variety of studies which generated etiological hypotheses, provided support or cast doubt on existing hypotheses, and suggested places and levels of aggregation for future epidemiological research [2-5].

Geographic mapping of disease variation has also been used, primarily with other diseases, in relation to health resource utilization, and as baseline data for health program planning and evaluation [6,7]. Health information about specific populations is an essential prerequisite for planning in the health care field [8].

The authors have been working to develop and evaluate different models for targeting cancer control programs on the basis of a small area analysis of census, cancer mortality, and cancer incidence data. The long term goal is to develop a series of models that will permit regional cancer control programs to be targeted on the basis of disease risk and population characteristics that have intervention implementation implications. Thus community health planners may choose to target cancer control programs using (a) census data only, (b) census and cancer mortality data, or (c) census, cancer mortality and incidence data, depending on the availability of data in a particular region, and the type of program being targeted (e.g. prevention, screening, management).

Our previous studies of small area disease variation focused on 1976-79 female breast and male colon cancer incidence rates, and selected 1980 census variables within one county of New York City [9]. Selected census data were utilized to: (1) identify census tract clusters of population characteristics that would suggest a higher risk of developing cancer, and (2) develop predicted crude cancer incidence rates for census tracts that could be cluster analyzed to identify high risk areas. In addition, observed census tract crude cancer incidence rates were cluster analyzed. Although statistically significant differences were observed between high rate and low rate clusters, the magnitudes of the inter-cluster differences in disease variation were not large enough to justify targeting cancer control programs on this basis alone.

In the more recent work reported here, we have focused on the variation in 1976-82 female breast, cervix, and colorectal (male and female combined) cancer incidence and 1977-83 cancer mortality data, and selected 1980 census variables for all five counties of New York City. The additional cancer sites, and the use of incidence and mortality rates limited to particular age categories, were selected primarily because one major focus of our work has been in utilizing this methodology to target cancer screening programs into neighborhoods where the historical rates of cancer mortality and late-stage incidence are highest. The combination of relatively high mortality rates and a relatively high incidence of late-stage disease in specific neighborhoods suggests the potential of targeting a screening program in these areas and provides a mechanism to evaluate the community-wide impact of the program on “downstaging” disease and reducing cancer mortality.

Moreover, any screening program utilizing tests of a given sensitivity and specificity can improve the predictive value of a positive test [10] if it can: (1) locate in an area where the prevalence rate of interest is higher and (2) attract the area residents most likely to develop the disease. The predictive value of a positive test is the proportion of true positives among all screening program participants who test positive.

Most models of cancer screening programs focus on sub-groups of the general population based on age and sex-specific incidence, prevalence and mortality rates of a particular cancer. While targeting programs on the basis of these individual characteristics tells us whom to screen, we demonstrate here that an equally important criterion is where to screen.

Thus, depending on the size and population density of the catchment area served, locating a cancer screening program close to the population at risk may increase the likelihood that the program will be utilized. This could be accomplished by reviewing the population characteristics and the program resources of high-risk areas, and designing and promoting screening programs to take advantage of existing community social and cultural norms and support systems to change health behaviors related to cancer screening program utilization.

If geographic targeting through small area analyses of disease risk and population characteristics is to be cost-effective, cancer control planners will require methods for quantifying:

1. The stability of small area rates over time.
2. The improvement in case detection that implementation of a geographic targeting program can be expected to yield.

This paper describes our progress to date in developing and refining such methods, utilizing census, cancer incidence, and cancer mortality data. Specifically, we examine the relationship between the number of years of data used and small area rate stability. We then estimate the improvement in the positive predictive value of a breast cancer screening program that would be obtained by targeting it into high-risk areas.


METHODS

Geocoding cancer incidence and mortality

The details of geographically coding the New York State Department of Health cancer incidence and New York City Department of Health mortality data have been previously described [9]. Briefly, all cases and deaths were geocoded to census tract of residence by computer. Approximately 80-85% of all addresses were successfully matched in this manner. Non-matched addresses were then manually searched, and the combined process resulted in a 90-95% rate of successful matching. Denominators for all small area rates were obtained from 1980 U.S. census data (Summary Tape File 3, 1980). This census year was used as an approximate midpoint for the cancer incidence (1976-1982) and cancer mortality (1977-1983) data.

Incidence, mortality, and census data were aggregated to the level of health areas, an administrative unit used by the New York City Department of Health for reporting, planning and evaluation. The aggregation process is error free because health areas are coterminous with census tracts. As a cluster of census tracts, most health areas provide a population base ranging in size from approximately 6000 to 20,000. Health areas with a total 1980 population of less than 5000 or with more than 10% of the total population institutionalized were excluded from the analysis. Of the 351 New York City health areas, 329 satisfied the selection criteria. In 1980, these health areas contained 99% of the population of New York City.
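The two exclusion criteria can be expressed as a simple filter. The following is a minimal sketch only: the health area identifiers and population counts are hypothetical, and the function name `is_eligible` is introduced here for illustration.

```python
MIN_POPULATION = 5000                # areas with fewer residents are excluded
MAX_INSTITUTIONALIZED_SHARE = 0.10   # areas above this share are excluded


def is_eligible(total_population, institutionalized_population):
    """Apply the two selection criteria described in the text."""
    if total_population < MIN_POPULATION:
        return False
    share = institutionalized_population / total_population
    return share <= MAX_INSTITUTIONALIZED_SHARE


# Hypothetical health areas: (total 1980 population, institutionalized population).
health_areas = {
    "HA-001": (12000, 300),   # kept
    "HA-002": (4200, 50),     # dropped: population below 5000
    "HA-003": (9000, 1500),   # dropped: about 17% institutionalized
}
eligible = [name for name, (pop, inst) in health_areas.items()
            if is_eligible(pop, inst)]
print(eligible)
```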

Rate calculation

We chose to use age-truncated crude rates rather than age-adjusted incidence and mortality rates. It is customary to adjust for age when comparing geographic variability in cancer rates, because of the increased risk of cancer with increasing age. However, when targeting geographic areas for screening service delivery, a concentration of age groups at higher risk for a particular cancer should be considered as part of the planning process.

Thus, for breast cancer, we calculated the incidence and mortality rates for women 35 and older, and for cervix cancer (both in situ and invasive) we chose women who were 20 and older. For male and female colorectal cancer combined, we used men and women 45 and older as the base for calculating rates. For each cancer site, an age-truncated crude incidence rate, an age-truncated/late-stage incidence rate (incident cases with regional spread and metastatic disease combined), and an age-truncated crude mortality rate were calculated.

All rates are expressed in yearly terms. Thus, the breast cancer 35 and older incidence rate for a particular health area, based on 7 years of data, was calculated by summing incident cases among women 35 and older over the 7-year period 1976-82, dividing by seven, and then dividing the result by the total number of women 35 and older in the health area in 1980.
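The calculation can be illustrated with a short sketch. The case counts and the denominator below are hypothetical, and the scaling to a rate per 100,000 is an assumption made here for readability; the text itself specifies only the averaging over years and the division by the 1980 population.

```python
def annual_rate_per_100k(yearly_cases, population_1980):
    """Average yearly age-truncated crude rate, scaled per 100,000 (assumed)."""
    if not yearly_cases:
        raise ValueError("need at least one year of case counts")
    if population_1980 <= 0:
        raise ValueError("population must be positive")
    mean_cases_per_year = sum(yearly_cases) / len(yearly_cases)
    return mean_cases_per_year / population_1980 * 100_000


# Hypothetical health area: incident breast cancer cases among women 35 and
# older, one count per year for 1976-82, and 4,000 women aged 35 and older
# in the 1980 census.
cases_1976_82 = [6, 9, 7, 8, 10, 7, 9]
rate = annual_rate_per_100k(cases_1976_82, 4000)
print(rate)
```

With these made-up inputs, 56 cases over 7 years average 8 per year, giving 200 per 100,000 women per year.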

Rate stability

A major concern in small area analysis is rate stability. To the extent that defined population risk factors for a given cancer vary more between than within small areas, the incidence rate for that cancer should vary systematically between areas. However, in any given year, the observed variability in rates between and within health areas will reflect chance factors as well as risk levels. It is clear that the contribution of chance to variability between areas will be high when the number of cases in the numerator of the rate calculation is small. This will be true when (a) the average population size of the area is small, and/or (b) the cancer represents a relatively rare event.

To reduce the contribution of chance to between- and within-area variability, the number of cases can be increased by using data from several years in the rate calculations. However, collection of data over an extended period of time introduces a second potential source of rate instability. The rate in an area will change from year to year if the level of population risk shifts due to in- and out-migration, aging, or changes in birth and/or other disease mortality rates.

In previous work [9] we proposed the use of Cronbach’s alpha, a coefficient of reliability [11], as a measure of the relative or rank-order stability of health area cancer incidence and mortality rates over time. Developed originally to test the internal consistency of items on psychological tests, the equation for Cronbach’s alpha coefficient is:

$$r_{kk} = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^2}{\sigma_t^2}\right)$$

In the present context we substitute years for test items and health areas for subjects. Thus, the sum of the variation between areas within years ($\sum \sigma_i^2$) and the variation between areas of the summed rates across all years ($\sigma_t^2$) are substituted for the variation between subjects within test items and between subjects of the summed scores across all items. The subscript k denotes the total number of years of data, and i represents the ith year ranging from 1 to k. The value of $r_{kk}$, which was calculated using the reliability program from SPSSX [12], will generally range in most applications from 0 (no stability) to 1 (perfect stability).

As applied to the problem at hand, the formula operates as follows. If rates are relatively stable, the rank order of the rate for each health area will tend to be similar from year to year; the rates will be correlated for each pair of years, and the health area rankings will be preserved when the rates are summed across all years. In a case of less stable rates, the rank order of each health area will tend not to be preserved from year to year, and the correlation of rates between pairs of years will be low.

Greater stability from year to year leads to higher between-area variance in the rates summed across years ($\sigma_t^2$) in relation to the sum of the between-area variances within years ($\sum \sigma_i^2$). Lower stability results in a decrease in the variance of the summed rates ($\sigma_t^2$), which will approach the value of the summed variances of the individual years ($\sum \sigma_i^2$). Thus, more stable rates will cause the variance ratio in the equation ($\sum \sigma_i^2 / \sigma_t^2$) to fall toward its minimum of $1/k$, and $r_{kk}$ to approach the value one. Conversely, lower stability will lead to the variance ratio approaching a value of one, which in turn will cause the value of $r_{kk}$ to approach the value zero.
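The behavior described above can be checked numerically. The sketch below is illustrative only and is not the SPSSX reliability program used in the analysis; it assumes the standard form of Cronbach's alpha, with years as items and health areas as subjects, and uses hypothetical rates for four areas. The function name `cronbach_alpha` is introduced here for illustration.

```python
from statistics import pvariance


def cronbach_alpha(rates_by_year):
    """Cronbach's alpha with years as 'items' and health areas as 'subjects'.

    rates_by_year: list of k lists; each inner list holds one rate per
    health area, with areas in the same order in every year.
    """
    k = len(rates_by_year)
    if k < 2:
        raise ValueError("need at least two years of data")
    n_areas = len(rates_by_year[0])
    if any(len(year) != n_areas for year in rates_by_year):
        raise ValueError("every year must cover the same health areas")

    # Sum of the between-area variances within each year.
    sum_within_year_var = sum(pvariance(year) for year in rates_by_year)
    # Between-area variance of the rates summed across all years.
    summed_rates = [sum(year[a] for year in rates_by_year)
                    for a in range(n_areas)]
    summed_var = pvariance(summed_rates)
    if summed_var == 0:
        raise ValueError("summed rates do not vary between areas")

    return k / (k - 1) * (1 - sum_within_year_var / summed_var)


# Hypothetical rates for four areas over two years.
stable = [[1, 2, 3, 4], [2, 3, 4, 5]]    # identical rank order in both years
unstable = [[1, 2, 3, 4], [2, 4, 1, 3]]  # year-to-year correlation is zero
alpha_stable = cronbach_alpha(stable)
alpha_unstable = cronbach_alpha(unstable)
print(alpha_stable, alpha_unstable)
```

In the first case the variance ratio is 1/2 (that is, 1/k with k = 2) and alpha equals 1; in the second the ratio is 1 and alpha equals 0.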

It should be noted that Cronbach’s alpha measures stability with respect to relative ranking and does not assess the extent to which the absolute value of the rate remains constant from year to year within small geographic areas. For the purposes of targeting services, rank order stability of rates is more relevant. The screening program planning question centers on whether certain areas generate rates that are consistently higher than those of other areas, and not whether the absolute value of each area’s rate remains constant from year to year. For applications in which there is a concern with the stability of absolute values, the reader is referred to reviews by Shrout and Fleiss [13] and Bartko and Carpenter [14].

To determine the relationship between years of data used and rate stability, we calculated reliability coefficients for nine rate/site combinations (three sites and three rates). Within each rate/site combination, 2, 3, 5 and 7 year reliability estimates were obtained. In calculating 5 and 7-year reliability estimates, only one set of unique years was available for each analysis, since only 7 years of data were available. It was possible to create two independent groups of 3-year estimates and three independent groups of 2-year estimates. We then computed 2-year and 3-year average estimates of rate stability. In all cases, groups of years were formed beginning with the earliest year available (1976 for incidence, 1977 for mortality).
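The grouping of years can be sketched as follows. This is one plausible reading of the procedure, assuming consecutive, non-overlapping windows that start at the earliest year and discard any leftover years; the function name `independent_year_groups` is hypothetical.

```python
def independent_year_groups(years, size):
    """Split years into consecutive, non-overlapping groups of `size`,
    starting from the earliest year; leftover years are dropped."""
    if size < 1:
        raise ValueError("group size must be at least 1")
    ordered = sorted(years)
    n_groups = len(ordered) // size
    return [ordered[i * size:(i + 1) * size] for i in range(n_groups)]


incidence_years = list(range(1976, 1983))  # 1976-82, seven years

two_year = independent_year_groups(incidence_years, 2)
three_year = independent_year_groups(incidence_years, 3)
five_year = independent_year_groups(incidence_years, 5)
seven_year = independent_year_groups(incidence_years, 7)
print(two_year)
print(three_year)
```

Under this reading, seven years yield three 2-year groups, two 3-year groups, and a single group each for the 5-year and 7-year estimates, which matches the counts given in the text.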

Population risk and rate stability

A second, indirect, indicator of rate stability is the tendency for small areas which share certain population risk characteristics to be similar with respect to disease rates. Thus, to the extent to which small areas sharing population risk characteristics cluster geographically, there should also be geographic clusters of areas with similar rates. In this paper we present computer-generated maps showing the geographic distribution of certain rate/site combinations, together with a representative risk factor, based on 1980 census data.

Predictive value of a positive test

We selected the predictive value of a positive test as the indicator of targeting efficacy, and we chose breast cancer as the cancer site with which to demonstrate the impact of geographic targeting. Given a group of women who have been tested by physical examination and mammography, the predictive value of a positive test will be the number of positives who truly have breast cancer (i.e. true positives) divided by the total number of women who tested positive. The formula for the predictive value of a positive test, as given by Vecchio [10], is as follows:

$$PV = \frac{pa}{pa + (1 - p)(1 - b)} \times 100$$

where PV = positive predictive value; p = prevalence of disease; a = sensitivity of the test; and b = specificity of the test.

This formula presents the number of true positives as prevalence multiplied by sensitivity (pa), and the number of false positives as the complement of prevalence multiplied by the complement of specificity, (1 - p)(1 - b). This formulation of predictive value demonstrates how PV increases with an increase in disease prevalence.
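A short sketch makes the dependence on prevalence concrete. The function below implements the formula as stated; the two prevalence values (1,000 and 2,000 per 100,000) are hypothetical, while the sensitivity of 0.85 and specificity of 0.98 are the values this study takes from HIP data.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """PV = pa / (pa + (1 - p)(1 - b)) * 100, with all inputs as proportions."""
    for name, value in (("prevalence", prevalence),
                        ("sensitivity", sensitivity),
                        ("specificity", specificity)):
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must lie between 0 and 1")
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives) * 100


# Hypothetical prevalences: an average area and an area with twice the prevalence.
pv_average = positive_predictive_value(0.01, 0.85, 0.98)
pv_targeted = positive_predictive_value(0.02, 0.85, 0.98)
print(round(pv_average, 1), round(pv_targeted, 1))
```

Doubling the prevalence from 1% to 2% raises the predictive value from about 30% to about 46%, with test sensitivity and specificity unchanged.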

We estimated breast cancer prevalence for each health area by multiplying race-specific estimates for median survival [15] by the appropriate age-truncated crude incidence rates. We then grouped the health areas of each county into prevalence quintiles. The predictive value of a positive physical exam and/or mammography was then calculated for each quintile and the entire county, based on the estimated prevalence and sensitivity (0.85) and specificity (0.98) values from HIP data (S. Shapiro, personal communication, 1985).
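The steps in this paragraph can be sketched end to end. Everything numeric below except the sensitivity and specificity is hypothetical: the ten incidence rates, the single 6-year median survival (the study used race-specific values), and the rule used to cut areas into quintiles are all assumptions made for illustration, and at least five areas are assumed.

```python
from statistics import mean


def estimate_prevalence(incidence_per_100k, median_survival_years):
    """Approximate prevalence as yearly incidence times median survival."""
    return incidence_per_100k * median_survival_years


def split_into_quintiles(values):
    """Sort descending and cut into five groups (1st = highest prevalence)."""
    ordered = sorted(values, reverse=True)
    n = len(ordered)
    bounds = [round(i * n / 5) for i in range(6)]
    return [ordered[bounds[i]:bounds[i + 1]] for i in range(5)]


def ppv(prevalence_per_100k, sensitivity=0.85, specificity=0.98):
    """Predictive value of a positive test, returned as a proportion."""
    p = prevalence_per_100k / 100_000
    true_pos = p * sensitivity
    false_pos = (1 - p) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


# Hypothetical county of ten health areas: yearly incidence per 100,000.
incidence = [250, 230, 210, 190, 170, 150, 130, 110, 90, 70]
SURVIVAL_YEARS = 6  # hypothetical single value

prevalence = [estimate_prevalence(r, SURVIVAL_YEARS) for r in incidence]
quintile_means = [mean(q) for q in split_into_quintiles(prevalence)]
quintile_ppv = [round(ppv(m), 2) for m in quintile_means]
county_ppv = round(ppv(mean(prevalence)), 2)
print(quintile_means, quintile_ppv, county_ppv)
```

The predictive value falls steadily from the first to the fifth quintile, and the first quintile exceeds the county-wide figure.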

Analysis of missing data

In calculating the age-truncated, late-stage rates, a cancer case was not included if the staging information was missing from the cancer registry. This produced an estimate of the late-stage disease rate that was adjusted for missing stage data. The overall percentage of missing stage data across all health areas was 5.4% for breast cancer and 7.2% for colon cancer.

While the percentage of missing data for both cancer sites was stable over the 7 years for which data were available, there were statistically significant differences between the mean percentages of missing stage data, when health areas were grouped by age-truncated crude rate quintiles (breast F(4,324) = 4.6, p < 0.005; colon F(4,324) = 5.6, p < 0.001) and by county (breast F(4,324) = 12.5, p < 0.001; colon F(4,324) = 7.5, p < 0.01). These results were obtained using the square root transformation to adjust for positive skew in the missing stage distribution.

Given the low percentage of missing stage data for both sites, classification of high risk areas utilizing stage-of-disease data or late-stage rates would be biased only if the proportion of actual late-stage disease among those cases in which staging information was missing in each health area differed dramatically from the proportion of late-stage disease among the cases for which staging information was reported to the population-based cancer registry.

RESULTS

The results of our analyses are presented in three parts. First, we examine the stability of breast, cervix, and colorectal cancer incidence and mortality rates over time. Second, we provide a descriptive analysis of breast cancer prevalence rates and demonstrate the impact, on the predictive value of a positive test, of targeting breast cancer screening programs into high-risk areas. Finally, we provide a graphic representation of the small area variation of cervical cancer incidence and mortality rates in New York City, and demonstrate the importance of reviewing the population characteristics of those at greatest risk of developing and/or dying from the disease.

Table 1 displays the average 2- and 3-year estimates, and a 5- and 7-year estimate of the stability coefficients of truncated crude health area cancer incidence rates, truncated crude late stage incidence rates, and truncated crude mortality rates for cancers of the female breast, cervix, and male and female colon/rectum, for the period 1976-82 for cancer incidence and 1977-83 for cancer mortality data in New York City.

Looking at the values of Cronbach’s alpha based on 7 years of data, age-truncated crude incidence rates provide the most stable estimate of cancer risk over time across health areas. For cervix and colorectal cancer, the invasive and late-stage incidence rates respectively are the next most reliable, followed by the mortality rates. However, for breast cancer, the mortality rate is somewhat more stable than the late-stage incidence rate, despite the fact that the number of late-stage incident cases per health area is between 21 and 42% larger than the number of deaths per health area.

Table 1. Two- and three-year average reliability estimates and five- and seven-year estimates for small area breast, cervix and colorectal cancer incidence and mortality rates

Site/Rate                  Average two-year   Average three-year   Five-year   Seven-year
Breast (35+)
  Incidence                0.45               0.56                 0.66        0.74
  Late-stage incidence     0.17               0.22                 0.31        0.41
  Mortality                0.23               0.27                 0.40        0.48
Cervix (20+)
  Invasive + in situ       0.58               0.64                 0.75        0.82
  Invasive only            0.29               0.37                 0.47        0.59
  Mortality                0.18               0.25                 0.32        0.47
Colorectal (45+)
  Incidence                0.52               0.62                 0.68        0.76
  Late-stage incidence     0.24               0.42                 0.49        0.57
  Mortality                0.18               0.31                 0.39        0.53

Table 2. Implications of small area variation in cancer incidence on breast cancer screening*

County area   Number     Estimated prevalence per 100,000     Range: % non-local   Positive predictive
quintiles     of areas   population (average for quintile)    disease†             value
Brooklyn                 1004.1                               57.4%                0.29
  1st         22         1600.7                               43.2-59.9            0.41
  2nd         23         1282.8                               46.7-67.4            0.36
  3rd         23          970.7                               38.6-82.6            0.29
  4th         23          670.1                               48.3-83.3            0.22
  5th         22          500.1                               41.7-81.3            0.17
Bronx                     830.9                               54.8%                0.25
  1st         11         1476.5                               40.6-54.2            0.39
  2nd         12         1066.1                               44.4-62.2            0.31
  3rd         11          674.8                               42.3-49.6            0.22
  4th         12          541.1                               38.9-74.2            0.19
  5th         11          401.2                               14.3-83.3            0.15
Manhattan                1081.1                               53.1%                0.31
  1st         15         1709.4                               34.8-57.3            0.43
  2nd         15         1294.7                               39.7-61.5            0.35
  3rd         15          915.0                               42.1-76.7            0.28
  4th         15          867.8                               37.5-64.7            0.27
  5th         16          618.8                               25.0-92.9            0.21
Queens                   1167.4                               53.9%                0.33
  1st         14         1533.1                               40.4-59.7            0.40
  2nd         15         1366.3                               43.1-61.2            0.37
  3rd         15         1227.2                               44.14.9              0.35
  4th         15          983.2                               42.4-65.4            0.30
  5th         15          751.6                               45S74.3              0.24
Richmond                 1150.3                               57.6%                0.33
  1st                    1442.7                               52.4-56.3            0.38
  2nd                    1302.7                               52.8-54.5            0.36
  3rd                    1076.5                               49.1-56.7            0.32
  4th                    1047.8                               60.4-68.9            0.31
  5th                     882.0                               61.2-64.1            0.27

*Assuming sensitivity of 0.85 and specificity of 0.98 from HIP data.
†Adjusted for stage unknown.

We can also see the impact of varying the number of years of cancer data on the stability of the health area rates over time. A question for many states, where cancer incidence registries have been recently initiated, is: how many years of data are required to obtain stable small area rate estimates? Table 1 displays the average stability of the same health area cancer incidence and mortality rates over time for a series of 2- and 3-year periods compared with a 5- and 7-year estimate for breast, cervix, and colorectal cancer incidence and mortality data.

Table 1 clearly displays the benefit, in terms of the stability of small area incidence and mortality rates over time, of increasing the number of years of data used in the analysis. The table also indicates that longer periods may be needed to provide acceptable levels of stability for age-truncated late-stage incidence and mortality rates than for age-truncated crude incidence rates. What constitutes an “acceptable” stability level will depend on the impact of different levels of rate stability on targeting effectiveness. This issue remains to be explored.

From an evaluation perspective, where one might choose to compare small area rates from an intervention population with small area rates from a control area population, the marked variation observed in 2-year reliability estimates (e.g. the average was based on values that ranged between 0.45 and 0.73 for 20+ invasive/in situ cervical cancer combined) could produce misleading estimates of intervention impact. Thus, sizable follow-up periods may be needed to make reliable estimates of intervention impact on defined populations.

Turning to the potential impact of targeting screening programs into ecologically defined high-risk areas, the greatest yield for cancer screening programs will be gained by targeting into areas where the prevalence of the disease is highest and where a high proportion of the cases are being diagnosed with late stage disease. Thus, we sought to pinpoint neighborhoods with a high prevalence of disease so that the positive predictive value of screening would be the highest, given relatively stable rates over a 7-year period in the five counties of New York City.

We see from Table 2 that the predictive value of a positive test, for breast cancer as an example, is substantially higher in those health area quintiles with the highest prevalence of overall disease.

If one takes the county positive predictive values as an estimate of the yield one can expect when one sets up a screening program without considering the variation in disease prevalence within a community, then the incremental improvement from targeting can be represented by the percentage difference between the positive predictive value for the highest prevalence areas and the value for the county as a whole. In New York City, the largest benefit is achieved in the Bronx (56%), Brooklyn (41%), and Manhattan (39%). For Queens (21%) and Richmond (15%), the anticipated benefit is less. This is probably a function of less variability between health area populations in these two counties. Where the population in a large geographic region is relatively homogeneous, small area variation within the region will be limited, and the benefits of geographically targeting screening interventions will be reduced.
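The percentages quoted above can be reproduced with a few lines of arithmetic. The sketch below takes the county-wide and highest-quintile predictive values as they appear to read in Table 2; because the table is garbled in this transcript, the pairing of values with counties is a reconstruction, and the function name is introduced here for illustration.

```python
# (county-wide PPV, highest-prevalence-quintile PPV), as read from Table 2.
# The pairing of values with counties is an assumption of this sketch.
table2_ppv = {
    "Bronx": (0.25, 0.39),
    "Brooklyn": (0.29, 0.41),
    "Manhattan": (0.31, 0.43),
    "Queens": (0.33, 0.40),
    "Richmond": (0.33, 0.38),
}


def targeting_gain_percent(county_ppv, top_quintile_ppv):
    """Percentage difference between the top quintile and the county as a whole."""
    return (top_quintile_ppv - county_ppv) / county_ppv * 100


gains = {county: round(targeting_gain_percent(whole, top))
         for county, (whole, top) in table2_ppv.items()}
print(gains)
```

The rounded results (56, 41, 39, 21 and 15%) agree with the figures given in the text, which supports this reading of the table.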

Just as one can improve the positive predictive value by targeting into high rate areas, one can also produce very low yields if one happens to set up a program and recruit a population of users from a low-rate area. This may explain why in urban areas such as New York City, the duration and consistency of community-based screening programs is so limited. When limited health resources are expended on a screening program, where the yield of detected cases is low over a 1- or 2-year period, the chances of continued financial and community support are not good. In terms of the impact on down-staging of disease, we also note that the range of non-localized disease is the smallest for the quintile with the highest prevalence rates. This is consistent with the larger number of observed cancers, and suggests that down-staging as an outcome measure may be somewhat more limited in high incidence areas.

Figures 1 and 2 display the geographic variability and clustering of cervical cancer incidence and mortality rates for women 20 years and older. Cancer control program planners who want to improve the effectiveness of Pap smears would do well to target the communities of the South Bronx, Northern Manhattan, North-Central Brooklyn, and South-Eastern Queens. The more limited consistency observed within health area mortality clusters as compared to the health area incidence clusters is in part a function of the lower level of temporal stability associated with small area mortality rates (see Table 1).

However, small area cervical cancer mortality will also vary as a function of the impact of Pap smear utilization and follow-up on the ratio of invasive to in situ disease, and as a function of the availability and utilization of state-of-the-art patient management practices in these high-risk neighborhoods. Finally, Fig. 3 graphically displays the variation in the percentage of the population in each health area that is black. A comparison of Figs 1 and 3 shows the ecological relationship between race and cervical cancer incidence; the comparison of Figs 2 and 3 shows the ecological relationship between race and cervical cancer mortality.

Beyond the etiological implications, reviewing the population characteristics of targeted high-risk neighborhoods provides cancer control program planners with important information about tailoring the structure, promotion, and evaluation of cancer control interventions to specific social and cultural characteristics of the potential user population.

DISCUSSION

Several important methodological issues must be considered when using small area incidence and mortality rates for cancer control planning purposes. Here we focus on: (1) the use of age-specific vs age-standardized rates in analyzing small area variation, and (2) the stability of rates over time.

While the use of age-standardized rates is important when investigating the relationships between suspected causal agents and disease variation in different populations, we have chosen to use age-truncated crude rates to target cancer control programs. The fact that populations vary by age must be incorporated into, not standardized out of, program planning and evaluation.

Fig. 1. New York City health areas: cervical cancer incidence rate (in situ and invasive) among women 20 and older, 1976-1982 (per 100,000 population). N = 329; mean = 64.27; SD = 31.89. Map legend categories: 8.58-36.36; 36.36+-49.14; 49.14+-66.93; 66.93+-91.56; 91.56+-195.29.

Fig. 2. New York City health areas: cervical cancer mortality rate among women 20 and older, 1977-1983 (per 100,000 population). N = 329; mean = 8.08; SD = 6.62. Map legend categories: 0.0-3.26; 3.26+-5.35; 5.35+-7.47; 7.47+-11.69; 11.69+-38.33.

Fig. 3. New York City health areas: percent black population. N = 329; mean = 27.88; SD = 31.42. Legible map legend categories: 0.0-1.46; 1.46+-6.80; 6.80+-23.56.

In choosing the categories used to group age-specific rates for cancers of the female breast (35 and older), cervix (20 and older), and male and female colon/rectum (45 and older), we selected age groupings that reflected (1) the higher rate of these cancers among younger members of a major New York City minority group (i.e. blacks) and (2) the generally accepted NCI and ACS screening guidelines for these disease sites. Thus, the selection of age categories for site-specific cancer rates should reflect both the population characteristics of the communities involved and the interventions being planned and/or evaluated.
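An age-truncated crude rate of the kind described above can be computed as the number of cases at or above the truncation age divided by the population at or above that age. The sketch below is illustrative only: the age groups and counts are hypothetical, the function name is ours, and the counts are treated as covering a single period, so no person-years adjustment is made.

```python
def age_truncated_crude_rate(cases_by_age, population_by_age, min_age, per=100_000):
    """Crude rate restricted to age groups whose lower bound is >= min_age.

    Both dictionaries map the lower bound of an age group to a count.
    """
    cases = sum(n for age, n in cases_by_age.items() if age >= min_age)
    population = sum(n for age, n in population_by_age.items() if age >= min_age)
    if population == 0:
        raise ValueError("no population at or above the truncation age")
    return per * cases / population


# Hypothetical health area (age-group lower bound -> count).
population = {0: 3000, 20: 2500, 35: 2000, 45: 1500, 65: 1000}
cases = {0: 0, 20: 1, 35: 3, 45: 6, 65: 10}

# Truncation points as used in the text: 35+ for breast, 20+ for cervix.
rate_35_plus = age_truncated_crude_rate(cases, population, min_age=35)
rate_20_plus = age_truncated_crude_rate(cases, population, min_age=20)
print(f"35+: {rate_35_plus:.1f} per 100,000; 20+: {rate_20_plus:.1f} per 100,000")
```

Unlike an age-standardized rate, this quantity rises in an area with an older population, which is the behavior wanted when the goal is to find where screening will yield the most cases.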

Another approach to incorporating small area variation in age into cancer control planning analyses would be the use of both age-standardized rates and the age profile of the population in the analytic model. Such an approach might provide greater flexibility when considering the targeting of prevention and patient management programs, in addition to cancer screening programs. When assessing the relative merits of the two approaches, it is important to evaluate both the methodological merits and the ease with which program planners and administrators can comprehend the analytical results. We will examine the benefits and costs of these two approaches when we begin to evaluate this methodology for targeting other types of cancer control programs.

With respect to the stability of rates over time, one can anticipate, and we have observed, that the relative difference in cancer incidence and mortality rates among small areas will vary in part randomly from one year to the next. The amount of this random variation will depend on the size of the area selected for analysis and the extent to which the disease of interest is a rare event. In general, as the size of the area becomes smaller, and the disease event becomes more rare, the amount of random variation can be expected to increase.
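One rough way to see why smaller areas and rarer diseases give noisier rates is to treat the case count as a Poisson variable, in which case the relative standard error of the observed rate is approximately one over the square root of the expected number of cases. The Poisson assumption is ours and is not part of the analysis reported here; the population and rate in the example are hypothetical.

```python
import math


def relative_standard_error(population, rate_per_100k, years=1):
    """Approximate relative SE of an observed rate, assuming Poisson counts."""
    expected_cases = population * years * rate_per_100k / 100_000
    if expected_cases <= 0:
        raise ValueError("expected number of cases must be positive")
    return 1.0 / math.sqrt(expected_cases)


# Hypothetical small area: 10,000 eligible women, rate of 64 per 100,000 per year.
for n_years in (2, 3, 5, 7):
    rse = relative_standard_error(10_000, 64.0, years=n_years)
    print(f"{n_years} years of data: relative SE ~ {rse:.2f}")
```

Under this assumption, pooling more years of data acts like enlarging the area: the expected count grows and the relative error shrinks with its square root.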

Cancer control planners and evaluators need a mechanism to assess the stability or consistency of rates over time. Particularly for planning purposes, if interventions are to be targeted on recent historical data, then planners will want to assess how confident they can be that relatively high rate areas in the past are likely to continue to be high rate areas in the future, when the interventions are implemented.

This paper has reviewed the utility of Cronbach’s alpha coefficient of reliability as a stability index of small area disease variation over time. In large urban areas, where there is homogeneity within small areas and high variability among them, as few as 3 years of data may be needed to reliably estimate neighborhood risk status with respect to disease development. However, our analysis of New York City health areas suggests that targeting interventions on the basis of late-stage disease and mortality rates may require 5-7 years of data.
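As a sketch of how coefficient alpha can serve as a stability index, the yearly rates can be treated as the "items" and the small areas as the "subjects" in the usual formula, alpha = k/(k-1) x (1 - sum of the yearly variances / variance of the area totals), where k is the number of years. The code below is a minimal illustration under that reading; the rates are made up, and the original analysis was run in SPSSX rather than with this function.

```python
from statistics import variance


def cronbach_alpha(rates_by_area):
    """Coefficient alpha for a list of rows (areas), each a list of yearly rates."""
    n_years = len(rates_by_area[0])
    if len(rates_by_area) < 2 or n_years < 2:
        raise ValueError("need at least two areas and two years")
    if any(len(row) != n_years for row in rates_by_area):
        raise ValueError("every area needs the same number of years")
    # Variance across areas within each year (the 'item' variances).
    year_variances = [variance(column) for column in zip(*rates_by_area)]
    # Variance across areas of the rates summed over years.
    total_variance = variance([sum(row) for row in rates_by_area])
    return (n_years / (n_years - 1)) * (1.0 - sum(year_variances) / total_variance)


# Hypothetical rates for four areas over three years.
noisy = [[10, 12, 11], [20, 18, 22], [30, 33, 29], [15, 14, 19]]
alpha_noisy = cronbach_alpha(noisy)

# Areas that keep exactly the same rate every year are perfectly stable.
stable = [[10, 10, 10], [20, 20, 20], [30, 30, 30]]
alpha_stable = cronbach_alpha(stable)
print(f"noisy areas: alpha = {alpha_noisy:.3f}; stable areas: alpha = {alpha_stable:.3f}")
```

Alpha approaches 1 when areas hold their relative ranking from year to year and falls as year-to-year noise grows relative to the differences between areas, which is why adding years of data tends to raise it.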

One question raised by Table 1 was why breast cancer late-stage rates were less reliable over time than breast cancer mortality rates, despite larger numerator values in the calculation of late-stage rates. Late-stage rates may tend to be less reliable because each late-stage rate is a function of both incidence and the percent having a given stage designation, and both of these variables are subject to random variation. However, for colorectal cancer, late-stage incidence rates were more reliable than mortality rates.

A second possible explanation of the difference between the reliability of mortality and late-stage incidence rates is that population risk characteristics, associated with site-specific late-stage incidence rates, may differ somewhat from those associated with site-specific mortality rates. To the extent that these different characteristics also differ with respect to their geographic variability, this may contribute to different reliability estimates for late-stage incidence vs mortality data. We are currently conducting a set of analyses to explore the factors which contribute to rate stability.

The question also points to the problem of using late-stage rates from population registry data, when one is concerned about the reliability of crude stage of disease classifications and the number of cases where the reported stage is unknown. To limit the impact of staging classification errors, we dichotomized the four staging categories for breast cancer into in situ/local and regional/distant. Our analysis of missing stage data indicated that bias due to non-classification is possible but unlikely, when the percentage of missing cases is relatively small.

The stability of late-stage disease rates is an important methodological problem for Phase IV defined population studies that will use stage of disease data as an “intermediate” endpoint for mortality reduction. These results suggest that Phase IV studies that plan to use late-stage disease as an “intermediate” outcome measure of screening program effectiveness may need follow-up periods as long as those projected for mortality data, and will have to carefully evaluate the small area variation in the percentage of cases that are staged unknown.

Turning to the interpretation of results obtained from ecological analyses of small area data, a major criticism has stemmed from the concern about inferring causal relationships from aggregate data to individuals. Robinson [16] showed that correlations between variables at the aggregate level may differ from correlations between the same variables at the individual level in size and in direction. Thus, it was concluded that researchers should not use aggregate data to study individual-level relationships, and that those who did risked falling prey to an “ecological fallacy” [17]. Firebaugh [18] clarified the problem of making causal inferences across levels of analysis by specifying the conditions under which an ecological fallacy will occur. As Firebaugh notes, most researchers who use aggregate-level data in lieu of individual-level data must evaluate their methodological assumptions theoretically rather than empirically because no individual-level data are available.
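Robinson's point that aggregate and individual correlations can differ even in direction can be illustrated with a small synthetic example. The data below are invented solely for this purpose: within each of three hypothetical areas the two variables move in opposite directions, while the area means rise together.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Synthetic individuals (x, y) grouped by hypothetical area.
areas = {
    "A": [(0, 10), (5, 5), (10, 0)],
    "B": [(1, 11), (6, 6), (11, 1)],
    "C": [(2, 12), (7, 7), (12, 2)],
}

# Individual-level correlation: pool every person across areas.
individuals = [pair for people in areas.values() for pair in people]
individual_r = pearson([x for x, _ in individuals], [y for _, y in individuals])

# Ecological correlation: correlate the area means.
mean_x = [mean(x for x, _ in people) for people in areas.values()]
mean_y = [mean(y for _, y in people) for people in areas.values()]
ecological_r = pearson(mean_x, mean_y)

print(f"individual-level r = {individual_r:.2f}")
print(f"ecological r = {ecological_r:.2f}")
```

In this contrived case the two coefficients have opposite signs, so an inference about individuals drawn from the area-level coefficient alone would be wrong; it says nothing about whether the area-level pattern is itself a useful guide for targeting.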

From a cancer control perspective, this issue can be viewed somewhat differently. Unlike the investigator using small area disease rates to infer individual-level etiologic hypotheses, we propose that cancer control program planners use small area cancer incidence and mortality rates to identify high-risk areas to which cancer control programs should be targeted. As such, small area variation in disease rates is not utilized as a surrogate for studying individual variation, but rather represents a meaningful phenomenon in and of itself.

The existence of stable high-risk and low-risk areas provides a significantly improved basis for targeting or evaluating the impact of limited program resources, whether or not the risk differential is a reflection of individual differences or is a reflection of both individual differences (e.g. a more elderly population) and shared area characteristics (e.g. inadequate cancer screening facilities). In this paper, the potential impact of targeting cancer screening programs into high prevalence areas has been clearly demonstrated in terms of the improvements in the predictive value of a positive screening test.

However, this does not discount the importance of determining whether small area variation in disease rates is a function of individual and/or area factors. The value of identifying the relative importance of individual differences and shared area characteristics in explaining small area disease variation lies in improving the effectiveness of a proposed intervention in a set of high-risk areas. As Figs 1-3 suggest, by tailoring the intervention design and its promotion to take into account area and population characteristics, program effectiveness can be further enhanced.

Our future reports will focus on: (1) testing models for using census and mortality data to generate predicted late-stage incidence rates for targeting in communities without a population-based incidence registry; (2) evaluating the impact of different rate reliability estimates on the accuracy of high-risk area identification over time; (3) quantifying the relationship between rate stability and the size and homogeneity of the geographic units selected for analysis; and (4) evaluating different methods for presenting small area data to enhance its usefulness in cancer control program planning.

REFERENCES

1. Broome FR. Techniques for statistical mapping via automation. In: Proceedings of the 1976 Workshop on Automated Cartography and Epidemiology. Hyattsville, Md: National Center for Health Statistics. DHEW Publication No. (PHS) 79-1254, August, 1979: 67-68.

2. Blot WJ, Fraumeni Jr JF. Studies of respiratory cancer in high risk communities. J Occup Med 1979; 21: 276-278.

3. Blair A, Fraumeni Jr JF, Mason TJ. Geographic patterns of leukemia in the United States. J Chron Dis 1980; 33: 251-260.

4. Dayal H, Chin CY, Sharrer R, Mangen J, Rosenwalke I, Schapiro S, Henly AJ, Goldberg-Alberts R, Kinman J. Ecologic correlates of cancer mortality patterns in an industrialized urban population. J Natl Cancer Inst 1984; 73: 565-574.

5. Dent O, Goulsten K. Geographic distribution and demographic correlates of colorectal cancer mortality in Sydney, New South Wales. Soc Sci Med 1984; 19: 433-439.

6. McPherson K, Wennberg JE, Ovind OB, Clifford P. Small area variation in the use of common surgical procedures: An international comparison of New England, England and Norway. N Engl J Med 1982; 307: 1310-1314.

7. Connell FA, Blide LA, Hanken MA. Clinical correlates of small area variations in population-based admission rates for diabetes. Med Care 1984; 22: 939-949.

8. Wennberg J, Gittelsohn A. Small area variations in health care delivery. Science 1973; 182: 1102-1108.

9. Kerner JF, Struening E, Pittman J, Andrews H, Sampson N, Strickman N. Small area variation in cancer incidence and mortality: A methodology for targeting cancer control programs. In: Engstrom PF, Anderson PA, Mortenson LE, Eds. Advances in Cancer Control: Epidemiology and Research. New York: Alan R. Liss Inc.; 1984: 225-234.

10. Vecchio TJ. Predictive value of a single diagnostic test in unselected populations. N Engl J Med 1966; 274: 1171-1173.

11. Cronbach LJ. Coefficient alpha and the internal structure of tests. Psychometrika 1951; 16: 297-334.

12. SPSSX User’s Guide, 2nd edn. Chicago: SPSS Inc.; 1986: 856-873.

13. Shrout PE, Fleiss JC. Intraclass correlations: Uses in assessing rater reliability. Psych Bull 1979; 86: 420-428.

14. Bartko JJ, Carpenter WT. On the methods and theory of reliability. J Nerv Ment Dis 1973; 163: 307-317.

15. Cancer Patient Survival Report Number 5. NIH Publication No. 81-882. Bethesda: USDHHS; 1986.

16. Robinson WS. Ecological correlations and the behavior of individuals. Am Soc Rev 1950; 15: 351-357.

17. Selvin HC. Durkheim’s “suicide” and problems of empirical research. Am J Soc 1958; 63: 607-619.

18. Firebaugh G. A rule for inferring individual-level relationships from aggregate data. Am Soc Rev 1978; 43: 557-572.

