
Malaria and Mortality

Author(s): Peter Newman
Source: Journal of the American Statistical Association, Vol. 72, No. 358 (Jun., 1977), pp. 257-263
Published by: American Statistical Association
Stable URL: http://www.jstor.org/stable/2286786

Malaria and Mortality PETER NEWMAN*

Controversy has arisen concerning the extent to which the near eradication of malaria affects mortality in tropical countries. Guesses of malaria's contribution to the classic fall in mortality in post-war Sri Lanka have varied from zero to 100 percent, while more serious estimates range from 21 percent (Gray 1974) to 48 percent (Newman 1965, 1970). In this article a general model is devised in which the previous estimating equations may be embedded. The Box-Cox approach to nonlinear regression analysis is then used to estimate close approximations to this model. The method yields a "best" estimate of 44 percent but also yields fairly wide margins of error.

KEY WORDS: Malaria; Mortality; Box-Cox methods; Sri Lanka.

1. INTRODUCTION

Severely endemic or epidemic malaria is a major problem for public health wherever it occurs, but modern methods of controlling the disease can reduce its incidence to insignificant levels and have done so in many countries. This process appears to have caused substantial changes in the relevant rates of population growth. For many reasons, however, the measurement of such induced changes encounters serious conceptual problems, the most important of which arise from the well-established fact that malaria tends to reduce greatly the general health and resistance to disease of any affected population. Therefore, deaths from many causes may be affected and not just those from malaria alone. As Macdonald (1951, p. 36) has remarked, a high total death rate can be "the typical cost of acquiring a firm group tolerance to heavy malarial infection."

It is not by chance that so much of the research on these problems of measurement has concentrated on the classic case of Sri Lanka (Ceylon).1 The existence of a relatively simple epidemiologic pattern of the disease together with quite satisfactory vital statistics has meant that in this, as in several other aspects of demography, Sri Lanka makes a fine laboratory in which to examine some difficult problems (Taeuber 1949). The present paper continues in this tradition and is concerned only with the effects of the near eradication of malaria on levels of nonfetal mortality, leaving the effects on fertility for treatment elsewhere.

* Peter Newman is Professor of Political Economy, The Johns Hopkins University, Baltimore, MD 21218. The author is indebted to Charles Mallar for advice on problems of statistical inference and to Carl Christ, R.H. Gray, Louis Maccini, and Jürg Niehans for helpful discussions; the comments of the referees of the earlier versions were also valuable. Asis Bandyopadhyay provided excellent research assistance.

1 Abhayaratne (1950), Barlow (1967, 1968), Cullumbine (1950), Frederiksen (1960a, 1960b, 1962, 1966a, 1966b, 1970), Gill (1940), Gray (1974), Meade (1968), Meegama (1967, 1969), Newman (1965, 1969, 1970).

The problems of inference involved are not obvious and need explicit modeling. Fortunately, it turns out that once formulated, the appropriate model can be closely approximated with the techniques originated by Box and Cox (1964). This in turn makes possible a systematic evaluation of various specifications of the ways in which malaria affects mortality, in particular those in an interesting paper by Gray (1974) and in earlier works of mine.

The main quantitative conclusions of this investigation may be summarized as follows: Neither Gray's semilogarithmic nor my linear specification can be rejected by the data; the best specification (i.e., that which maximizes the appropriate likelihood function) lies a little more than halfway between, in a sense made precise by the Box-Cox approach. These three specifications give the following answers to the question of how much of the fall in mortality from 1936-45 to 1946-60 is attributable to the near eradication of malaria during the immediate post-war period: Gray 26 percent, best 44 percent, Newman 48 percent.2 The balance of this fall in mortality was due to other improvements in health which are not individually analyzed here.

2. PHENOMENA TO BE EXPLAINED

As measured by crude death rates, the observed fall in mortality in post-war Sri Lanka was from a national average of 20.4 per thousand for 1936-45 to 11.7 for 1946-60, a fall of 43 percent. But accompanying this extraordinary improvement was a less obvious but equally remarkable phenomenon whose significance was first emphasized by Coale and Hoover (1958). For 1936-45 the unweighted mean of the 21-district crude death rates (one for each of the administrative districts into which the island was then divided)3 was 23.2 per thousand, with a variance of 40.2; the corresponding figures for 1946-60 were 11.7 and 1.6, giving a variance-to-mean ratio of 1.73 in the first period and 0.14 in the second. Therefore, one needs to explain both (i) the sharp fall in the level of the national crude death rate and (ii) the almost complete disappearance in variability of the district rates.

2 The apparently discrepant estimate of 42 percent in Newman (1965 and 1970) refers to the drop from the longer period 1930-45 to 1946-60 and is, therefore, not directly comparable with the present estimates. The divergence between Gray's estimate as given here and those presented in his original article will be fully discussed later.

3 For reasons given in Newman (1965), pp. 23-32, the official estimates of crude death rates (CDRs) by district, as published in the Annual Reports of the Registrar-General, cannot be accepted as a basis for serious work on these problems. All data used here are taken from the revised estimates contained in Part I (2) and the Appendix to Part I in that monograph, to which the reader is also referred for an appraisal of the quality of the basic vital registration data.

It is natural to be concerned whether measures of mortality more refined than CDRs are appropriate for this investigation. However, it

© Journal of the American Statistical Association, June 1977, Volume 72, Number 358

Applications Section

It has seldom been seriously asserted that the reduction of malaria accounted wholly for (i), there being wide agreement that nonmalarial factors were also important for a complete explanation. But disagreement is much sharper with (ii); Frederiksen (1960a, 1960b, 1962, 1966) and Meegama (1967, 1969) in particular have laid major emphasis on nonmalarial factors. However, Gray (1974) has demonstrated convincingly that neither changes in regional disparities in levels of nutrition (as asserted by Frederiksen) nor similar changes in levels of medical services other than the control of malaria (as asserted by Meegama) can explain this narrowing of regional death rates. Gray concludes (p. 226), "Only the control of malaria can explain the differential decline in mortality between districts in post-war Ceylon," a position which is consistent with the analysis of Newman (1965, 1970).

In the latter article it was shown (pp. 135-9) that a simple specification, in which in any period the crude death rate in any district depends linearly on the current prevalence of malaria there, can indeed explain the two observed phenomena. But it is clearly not unique in this respect, and the problem then is to sort out in a systematic way which of the competing specifications is in some sense best, with much of the difficulty arising precisely from what, in this context, can be meant by best.

3. AN EXPLANATORY MODEL

The following notation is used, though for simplicity subscripts will be omitted in the early going:

$x_i(s)$: the prevalence of malaria in district $i$ ($i = 1, 2, \ldots, 21$) during period $s$ (where $s = 1$ for 1936-45 and $s = 2$ for 1946-60), as estimated by the spleen rate averaged over 1938 through 1941 for $s = 1$ and 1950 through 1952 for $s = 2$.4

is shown in Newman (1970, pp. 139-143) that almost all of the falls in district CDRs during this period were due to falls in district age-specific mortality rates rather than to changes in district age distributions or to internal migration. In turn, the dependence of district age-specific mortality rates on malaria is shown in Newman (1965) Part III (9A) and in Gray (1974, p. 224) to have followed patterns very similar to those displayed by the district CDRs. So for simplicity it is the latter measures of mortality that are used exclusively in this paper.

4 The spleen rate is the best single measure of the prevalence of malaria, as distinct from its incidence. It is normally calculated as the percentage of all children attending school in the district concerned who had enlarged spleens at the time of the relevant survey. On problems connected with its use, both in general and for Sri Lanka in particular, see Newman (1965, pp. 32-37) and Gray (1974, p. 209); for the actual data see Newman (1965, pp. 90-93). The available information for 1936-45 consists of six annual surveys by district from February 1936 through March 1941, while the post-war data consist of eight biannual surveys from September 1946 through March 1950 and five annual surveys from March 1951 through March 1955, by which time spleen rates had been so uniformly close to zero for so long that the surveys were discontinued.

$y_i(s)$: a vector of variables $(y_{i1}(s), y_{i2}(s), \ldots, y_{ik}(s), \ldots, y_{iJ}(s))$, where $J$ is some finite but unknown integer. Each $y_{ik}(s)$ is the average value of a variable other than malarial prevalence which affects or may affect mortality in district $i$ during period $s$.5

$z_i(s)$: the average value of the crude death rate in district $i$ during period $s$.

On examination, the issue comes down to the following counterfactual question: How would mortality in the various districts have fallen if no significant change had occurred in regional levels of malarial prevalence but if all other variables affecting mortality had taken their actual values throughout?

To model this, suppose that a functional relationship uniform over all districts and both time periods existed between some invertible transform T of the crude death rates z (s) and the explanatory variables x (s) and y (s) and, furthermore, that this relationship was additively separable, so that

$$ T(z(s)) = g(x(s)) + h(y(s)), \qquad (s = 1, 2). \tag{3.1} $$

This assumption of separability is restrictive but consistent with all previous treatments of this topic. Writing z*(2) for the counterfactual death rate, i.e., that occurring with no change in malarial prevalence but with all other change as it actually was, it follows from (3.1) that

$$ T(z(1)) - T(z^*(2)) = h(y(1)) - h(y(2)), \tag{3.2} $$

which is independent of malarial prevalence. The effects of the nonmalarial factors on mortality may be decomposed, exhaustively, into:

A. Effects due to factors that in any one period affected all districts roughly the same but which varied over time. Such factors may, among others, have been those brought about by social policy (e.g., levels of education) or have been more directly economic in character (e.g., levels of income or consumption per capita6).

B. Effects due to factors that varied with the district concerned but which were roughly uniform in operation throughout both periods. Such persisting factors may have been natural in origin, such as climate, or may have been determined more by culture (e.g., ethnic composition) or policy (e.g., the provision of roads).

C. Effects due to factors that varied (not necessarily systematically) both with the district concerned and over time. An example of a possible systematic variation of this type is the provision of medical care, though the evidence cited by Gray (1974, pp. 214-9) throws doubt on the importance of this for Sri Lanka over these two periods.

D. Effects due to factors that were constant over both space and time. Although the existence of such factors is logically possible, it is hard to think of any that were in fact relevant for Sri Lanka.

5 In principle, past malarial prevalence may affect district age distributions and hence district CDRs, both through its obvious influence on the age pattern of deaths and also through its possible effects on fecundity and its well-documented effects on fetal mortality (see e.g., Macgregor and Avery (1974)). Hence $y_i(s)$ may include past values of $x_i$. But as footnote 3 pointed out, changes in age distribution in fact played little role in the relevant falls in CDRs in Sri Lanka.

6 For example the data cited in Gray (1974, pp. 219-221, especially Table 11) do not show that marked disparities in nutritional levels existed in different areas of the country during the 1940s, while there is some evidence (cited in Newman (1969)) that the national level of nutrition fluctuated considerably during the Second World War.


Because persisting factors of both types (B) and (D) are for any district common to each term on the right-hand side (rhs) of (3.2), neither type affects the model under examination here. But types (A) and (C) certainly do, so that for any district $i$ one may write

$$ h(y_i(1)) - h(y_i(2)) = k + u_i, \qquad (i = 1, 2, \ldots, 21), \tag{3.3} $$

where the constant k refers to changes in the effects due to (A) factors (i.e., those changes applying to all districts equally) and the variable ui refers to changes in the effects due to (C) factors, which may vary by district.

To make further progress, note that once the DDT spraying campaign began in Anuradhapura in November 1945 all spleen rates fell rapidly towards zero. In the first period there was abundant regional variation in the prevalence of malaria; the unweighted mean of the district spleen rates for 1938-41 was 26.9 percent and their variance 442.2, with a range from 1.4 percent for Kalutara in the southwest to 67.3 percent for Anuradhapura in the northeast. But by 1950-52 the lowest spleen rate was down to 0.2 percent (both Colombo-Negombo and Jaffna) while even the highest was only 3.5 percent (Puttalam), giving an unweighted mean of 1.8 percent and a variance of 1.0. So it is a very good approximation to put $x_i(2)$ equal to zero for each district, obtaining from (3.1) that

$$ T(z_i(2)) = g(0) + h(y_i(2)), \qquad (i = 1, 2, \ldots, 21), \tag{3.4} $$

and from (3.1), (3.3), and (3.4),

$$ T(z_i(1)) - T(z_i(2)) = g(x_i(1)) - g(0) + k + u_i, \qquad (i = 1, 2, \ldots, 21). \tag{3.5} $$
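The step from (3.1), (3.3), and (3.4) to (3.5) is a direct subtraction. Written out, using only the symbols already defined in this section, it reads roughly as follows:

```latex
\begin{align*}
T(z_i(1)) - T(z_i(2))
  &= \bigl[g(x_i(1)) + h(y_i(1))\bigr] - \bigl[g(0) + h(y_i(2))\bigr]
     && \text{by (3.1) with } s = 1 \text{ and (3.4)} \\
  &= g(x_i(1)) - g(0) + \bigl[h(y_i(1)) - h(y_i(2))\bigr] \\
  &= g(x_i(1)) - g(0) + k + u_i
     && \text{by (3.3).}
\end{align*}
```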

If for given specification of $T$ and $g$ the left-hand side (lhs) of (3.5) is regressed on the $x_i(1)$, the resulting estimate of the intercept on the $z$ axis will be an estimate of $(k + u_i)$ for some hypothetical district that had no malaria in the pre-DDT period.7 In fact no such district did exist in pure form, but there were several observations that yielded values of $x(1)$ close to zero, so such an estimate of $(k + u_i)$ involves little extrapolation beyond the range of the sample.8 Clearly, the two components of $(k + u_i)$ cannot be separately identified by this estimating procedure. In particular, such a model can neither refute nor support the proposition that for each such hypothetical malaria-free district $u_i$ was zero.

7 More precisely, it can be interpreted as an estimate of the mean value of $(k + u_i)$ over those hypothetical districts that had a spleen rate of zero and differing $u_i$'s, since there is no reason to suppose that just one district of this type would have existed nor that their $u_i$'s would have been closely correlated with malarial prevalence.

8 The three closest such observations, with 1938-41 spleen rates of 1.4, 2.4, and 2.9 percent, respectively, were Kalutara, Colombo, and Galle Districts, together comprising 31.3 percent of the national population during 1936-45. It is shown in Newman (1965, pp. 33-34) that according to criteria developed by Gabaldón (1949) all three districts were essentially free of endemic malaria, while all but Colombo were epidemic free as well, Colombo having a low but significant score on his epidemic index.

The next and final assumption in the model subsumes this last possibility in the more general postulate that for any district i the component ui was indeed a random variable with an expectation of zero. Since ui refers to the changes from Period 1 to Period 2 in the effects due to the (C) factors and not to the effects themselves, this assumption of zero expectation is perhaps reasonable. It is worth recalling that persistent area-specific factors (i.e., those of type (B)) have already been accounted for in these models as have nationwide changes of type (A).

For purposes of estimation (3.5) is now amended by the introduction of a random error term vi which in principle depends upon T and g although the notation does not reflect this. A compound stochastic term wi is then defined by

$$ w_i = u_i + v_i, \qquad (i = 1, 2, \ldots, 21). \tag{3.6} $$

In addition, a new function $G$ is defined by the formula

$$ G(x_i(1)) = g(x_i(1)) - g(0) + k, \qquad (i = 1, 2, \ldots, 21), \tag{3.7} $$

so that $G(0) = k$. Then using these definitions the stochastic version of (3.5) may be written

$$ T(z_i(1)) - T(z_i(2)) = G(x_i(1)) + w_i, \qquad (i = 1, 2, \ldots, 21). \tag{3.8} $$

The following standard assumptions are made about the random term $w_i$: (i) $E(w_i) = 0$; (ii) $E(w_i^2) = \sigma^2$, which is finite and positive; (iii) $E(w_i w_j) = 0$ for $i \neq j$; (iv) $w_i$ is normally distributed. Under these assumptions, if $T$ and $G$ are taken to be linear in parameters then least-squares regression will produce a best linear unbiased estimate of $k$, say $\hat{k}$, and the usual tests of significance may be applied. (See, e.g., Kmenta (1971, Chap. 7).)9
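As a minimal sketch of the estimation step just described, the following standard-library Python fits a straight line by ordinary least squares and reads off the intercept as the estimate of k. The function name `ols_fit` and the five data points are hypothetical illustrations, not the Sri Lanka district series, and the sketch assumes a linear G.

```python
def ols_fit(x, y):
    """Ordinary least-squares fit of y = a + b*x.

    Returns (a, b, r2): intercept, slope, and squared correlation.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


# Hypothetical data: spleen rates x_i(1) (percent) and falls T(z_i(1)) - T(z_i(2)),
# with T taken as the identity. These are NOT the figures from Newman (1965).
spleen_rate = [1.4, 10.0, 25.0, 40.0, 67.3]
fall_in_cdr = [5.0, 7.1, 11.2, 14.8, 22.0]

k_hat, slope, r2 = ols_fit(spleen_rate, fall_in_cdr)
print(f"k_hat = {k_hat:.3f}, slope = {slope:.4f}, r2 = {r2:.4f}")
```

The intercept is the quantity of interest here because, by (3.7), it estimates G(0) = k, the fall that a malaria-free district would have experienced.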

With $u_i$ assumed to be zero for each district $i$ and the estimate $\hat{k}$ of $G(0)$ available, estimates of the numbers $z_i^*(2)$ may then be obtained from (3.2)-(3.8) as

$$ z_i^*(2) = T^{-1}\bigl(T(z_i(1)) - G(0)\bigr), \qquad (i = 1, 2, \ldots, 21). \tag{3.9} $$
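A short sketch of (3.9) may help fix ideas. The helper `counterfactual_rate` and all numerical inputs below are hypothetical; the two calls only illustrate how the same formula behaves when T is the identity and when T is the base-10 logarithm.

```python
import math


def counterfactual_rate(z1, g0, transform, inverse):
    """Counterfactual death rate z*(2) = T^{-1}(T(z(1)) - G(0)), as in (3.9)."""
    return inverse(transform(z1) - g0)


def identity(v):
    return v


def antilog10(v):
    return 10.0 ** v


# Hypothetical first-period crude death rate and hypothetical intercepts G(0).
z1 = 30.0
linear_case = counterfactual_rate(z1, 4.0, identity, identity)
semilog_case = counterfactual_rate(z1, 0.15, math.log10, antilog10)

print(linear_case)   # z1 minus a fixed amount
print(semilog_case)  # z1 scaled by the fixed factor 10 ** -0.15
```

With T the identity, the nonmalarial factors remove a fixed number of deaths per thousand from every district; with T logarithmic they remove a fixed proportion, which is why the two specifications can attribute quite different shares of the fall to malaria.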

The fall in the national crude death rate $Z$ from 1936-45 to 1946-60 may be decomposed into two additive components, as follows:

$$ Z(1) - Z(2) = \Bigl\{ \sum_{i=1}^{21} p_i(1)\bigl(z_i(1) - z_i^*(2)\bigr) \Bigr\} + \Bigl\{ \sum_{i=1}^{21} p_i(1)\, z_i^*(2) - \sum_{i=1}^{21} p_i(2)\, z_i(2) \Bigr\}, \tag{3.10} $$

where $p_i(s)$ is the proportion of the national population at time $s$ that lived in district $i$. Then the fraction of the change in mortality not due to malaria is taken to be the first term in curly brackets on the rhs of (3.10), divided by the lhs, and the fraction due to malaria is thus obtained as 1 minus this fraction.10

9 Note that if in addition it were supposed that the $v_i$ are normally distributed it would follow from (iv) that the $u_i$ are also normally distributed. It is relevant to remark here that, principally because the explained variables are crude death rates, there are no problems of heteroscedasticity of the error terms in this sample.
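The decomposition (3.10) can be sketched in a few lines of Python. The function `decompose_fall` and the three-district data set are hypothetical and serve only to show the bookkeeping; the counterfactual rates are built with T equal to the identity and an assumed intercept of 4.0.

```python
def decompose_fall(p1, z1, z_star2, p2, z2):
    """Split Z(1) - Z(2) into the two bracketed terms of (3.10).

    Returns (non_malarial, malarial, malaria_share), where malaria_share is
    one minus the first term divided by the total fall.
    """
    non_malarial = sum(p * (a - b) for p, a, b in zip(p1, z1, z_star2))
    malarial = (sum(p * b for p, b in zip(p1, z_star2))
                - sum(p * c for p, c in zip(p2, z2)))
    total_fall = non_malarial + malarial
    return non_malarial, malarial, 1.0 - non_malarial / total_fall


# Hypothetical three-district illustration (not the 21 Sri Lanka districts).
p1 = [0.5, 0.3, 0.2]          # first-period population shares
p2 = [0.5, 0.3, 0.2]          # second-period population shares
z1 = [16.0, 22.0, 34.0]       # first-period crude death rates
z2 = [11.0, 12.0, 13.0]       # second-period crude death rates
k = 4.0                       # assumed intercept G(0) with T = identity
z_star2 = [z - k for z in z1]

non_malarial, malarial, malaria_share = decompose_fall(p1, z1, z_star2, p2, z2)
print(non_malarial, malarial, round(malaria_share, 3))
```

Because the first-period weights sum to one, the first term collapses to the intercept itself when T is the identity, which is the property exploited in Section 4.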

4. PROCEDURES FOR ESTIMATION

It is easy to see that each of the explicit specifications employed in past work can be embedded in the framework of Section 3. These specifications seem to be four in number, as follows:11

Newman (1965, 1970):

$$ z_i(1) - z_i(2) = a_1 + b_1 x_i(1) + w_{1i}; \tag{4.1} $$

Gray (1974):

$$ \log_{10} z_i(1) - \log_{10} z_i(2) = a_2 + b_2 x_i(1) + w_{2i}, \tag{4.2} $$

$$ \log_{10} z_i(1) - \log_{10} z_i(2) = a_3 + b_3 \log_{10} x_i(1) + w_{3i}, \tag{4.3} $$

$$ \bigl(z_i(1) - z_i(2)\bigr)/z_i(1) = a_4 + b_4 x_i(1) + w_{4i}. \tag{4.4} $$

Comparing each of (4.1)-(4.4) with (3.8) makes it obvious what the implied $T$ and $G$ functions are in each case, except for (4.4). If in that equation the stochastic term $w_{4i}$ is put at its expected value of zero, then it may be written

$$ z_i(2)/z_i(1) = (1 - a_4) - b_4 x_i(1), $$

from which

$$ \log z_i(1) - \log z_i(2) = \log\bigl[(1 - a_4) - b_4 x_i(1)\bigr]^{-1}, $$

so that here $T = \log$ (to any base) and $G(x)$ is given by the rhs; in particular $G(0) = \log (1 - a_4)^{-1}$. Using these functional forms in (3.9),

$$ z_i^*(2) = \operatorname{antilog}\bigl(\log z_i(1) - \log (1 - a_4)^{-1}\bigr) = (1 - a_4)\, z_i(1). $$

Then substituting from this into (3.10), the first term on the rhs of (3.10) becomes

$$ \sum_{i=1}^{21} p_i(1)\bigl(z_i(1) - (1 - a_4)\, z_i(1)\bigr) = a_4 Z(1), $$

and the estimated fraction of the change in $Z$ not due to malaria is $a_4 Z(1)/(Z(1) - Z(2))$.

Using data from Newman (1965), ordinary least-squares estimation of (4.1) through (4.4) yields the following numerical versions, the figure appearing below each parameter estimate being the relevant t statistic:12

$$ z_i(1) - z_i(2) = \underset{(5.89)}{4.5832} + \underset{(11.32)}{0.2581}\, x_i(1), \qquad r^2 = 0.8708, \tag{4.1'} $$

$$ \log_{10} z_i(1) - \log_{10} z_i(2) = \underset{(11.73)}{0.1671} + \underset{(10.59)}{0.004418}\, x_i(1), \qquad r^2 = 0.8552, \tag{4.2'} $$

$$ \log_{10} z_i(1) - \log_{10} z_i(2) = \underset{(1.98)}{0.0755} + \underset{(5.91)}{0.1703}\, \log_{10} x_i(1), \qquad r^2 = 0.6473, \tag{4.3'} $$

$$ \bigl(z_i(1) - z_i(2)\bigr)/z_i(1) = \underset{(19.72)}{0.3354} + \underset{(9.97)}{0.004966}\, x_i(1), \qquad r^2 = 0.8395. \tag{4.4'} $$

10 Second-period weights rather than those for the first period could be applied here, yielding a decomposition of the lhs of (3.10) that in principle is different from that given in the text. But in the present case the distributions of these two sets of weights were very close. The square root of the mean squared difference between them was only 0.002914, while the proportion of the average district was 1/21 = 0.047619. So the two decompositions in fact differ very little, and only the estimates based on first-period weights are reported here.

11 This note of hesitation is in order because it is not clear that both specifications (4.2) and (4.3) really appear in Gray (1974). The discussion in the last paragraph of Gray (1974, p. 222) contains elements of both versions, but the numerical estimates given conform much more closely to (4.2). It seems unlikely therefore that Gray actually put forward (4.3), which will turn out in any case to be a poor specification.

It is quite straightforward to apply the measurement procedures of Section 3 to these equations, for in each case it is the constant term of the equation that plays the key role. The resulting estimates of malaria's effect on the fall in mortality are as follows: (4.1), 0.477; (4.2), 0.255; (4.3), 0.628; (4.4), 0.218.13
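The role of the constant term can be illustrated with a rough recomputation. The sketch below assumes first-period weights that sum to one, so that the first term of (3.10) reduces to a closed form in the intercept, and it uses the rounded national rates of 20.4 and 11.7 per thousand quoted in Section 2. The function names are hypothetical, and the results agree with the reported estimates only to about two decimal places, presumably because the national rates are rounded.

```python
Z1, Z2 = 20.4, 11.7   # national crude death rates, 1936-45 and 1946-60 (rounded)
FALL = Z1 - Z2


def share_linear(a):
    """T = identity: first term of (3.10) is the intercept itself."""
    return 1.0 - a / FALL


def share_log10(a):
    """T = log10: z*(2) = 10**(-a) * z(1), so the first term is (1 - 10**(-a)) * Z(1)."""
    return 1.0 - (1.0 - 10.0 ** (-a)) * Z1 / FALL


def share_proportional(a):
    """Specification (4.4): first term of (3.10) is a * Z(1)."""
    return 1.0 - a * Z1 / FALL


malaria_share = {
    "(4.1)": share_linear(4.5832),
    "(4.2)": share_log10(0.1671),
    "(4.3)": share_log10(0.0755),
    "(4.4)": share_proportional(0.3354),
}

for equation, share in malaria_share.items():
    print(equation, round(share, 3))
```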

These estimates differ so widely that it is obviously essential to find out which best represents reality. Reference to equations (4.1')-(4.4') shows that all the specifications fit the data rather well (with the exception of (4.3), whose provenance is suspect anyway), so the appropriate estimate of malaria's effect can hardly be selected on that score. In any case, since the explained variable is different in each of the specifications (4.1), (4.2), and (4.4), comparison of the correlation coefficients would give quite illusory results.14 In order to make sensible comparisons between these specifications it is necessary to develop some single criterion by which they may all be judged. This last task is carried out in the next two sections.

12 Gray (1974) is not explicit about the data sources for the numerical estimates he provides. As footnote 11 indicates it is uncertain that his last equation on p. 222 is in fact (4.2'), but the correspondence seems close. His estimate of (4.4) is such that the mean of the lhs is 0.501. By least-squares regression this should be equal to the mean of the rhs, which according to Gray is 0.304 + 0.117 = 0.421. This discrepancy signals an error in his calculation procedures, apparently located in his parameter estimates rather than in the estimates of the means of the explained and explanatory variables. Naturally this error affects his corresponding computation of the effects of malaria on mortality.

13 The figure for (4.2) differs from Gray's apparent estimate of 0.210, but his measure contains so many puzzles that it is hard to accept. For example, he uses what he claims to be the antilog of $b_2 x(1)$, which seems to be inconsistent with his apparent employment of the spleen rate in natural not logarithmic form. Moreover, this antilog is reported as 0.44, which for logarithms to base 10 would require either $b_2$ or $x(1)$ to be negative; but neither is. On examination it seems that this figure of 0.44 must have been obtained as the difference between the antilog (2.05) of the mean of the lhs of his equation and the antilog (1.61) of the intercept term on the rhs. But while the mean of the lhs does equal the sum of the means of the terms on the rhs, there is no such relationship between their antilogs, which is apparently what Gray relies upon.

14 For a good elementary account of why this is so, see Rao and Miller (1971, pp. 13-18). The grounds for suspecting (4.3) are described in footnote 11.


5. A UNIFIED METHOD OF ESTIMATION

In Section 2 it was noted that in the second (post-DDT) period the crude death rates $z_i$ were quite close to uniformity across districts. This particular feature of Sri Lanka's experience may be utilized in order to make possible the use of powerful methods of nonlinear regression analysis first developed by Box and Cox (see e.g., Box and Cox (1964), Zarembka (1974)). To see why, first rewrite (3.8) as

$$ T(z_i(1)) = T(z_i(2)) + G(x_i(1)) + w_i, \qquad (i = 1, 2, \ldots, 21), \tag{5.1} $$

and then suppose as a good approximation that

$$ T(z_i(2)) = M(T) + \varepsilon_i(T), \qquad (i = 1, 2, \ldots, 21), \tag{5.2} $$

where $M(T)$ is the mean of the transformed variables $z_i(2)$, and $\varepsilon_i(T)$ is a random error term with mean zero and a small variance, whose distribution depends in principle on $T$ as well as the $z_i(2)$. In fact, application of the Kolmogorov-Smirnov one-sample test (see e.g., Siegel (1956), pp. 47-52) showed that both for $T = I$ and $T = \log_{10}$ the assumption of normality for $\varepsilon_i(T)$ is consistent with the data, even at the 90 percent level of significance; the values of the test statistic $D$ were 0.107 and 0.110, respectively. Since, as will become apparent, almost all the nonlinear cases discussed hereafter are in a sense combinations of these two polar cases, it is reasonable to combine (5.1) and (5.2) and write

$$ T(z_i(1)) = M(T) + G(x_i(1)) + \bigl(w_i + \varepsilon_i(T)\bigr), \qquad (i = 1, 2, \ldots, 21), \tag{5.3} $$

where the new compound disturbance term $(w_i + \varepsilon_i(T))$ has the same properties as the simple term $w_i$ (which itself depends on $T$ and $G$, as already noted). A regression of $T(z_i(1))$ on $G(x_i(1))$ will then produce a best linear unbiased estimate of the constant term $M(T) + G(0)$. Since $T$ is known, $M(T)$ can be calculated from the data and hence $G(0)$ estimated; the computation of the $z_i^*(2)$ proceeds as before, from (3.9).

For given T and G, this procedure is clearly inferior to that followed in Section 4. The latter involved the exact regression equation (3.8) rather than (5.3), which is derived only via the approximation (5.2). Now suppose that one does not wish to impose a prior structure on T and G but wants the observations themselves to disclose the best possibilities. Then equation (5.3) comes into its own, for unlike (3.8) it is accessible to treatment by the Box-Cox technique.

In their pioneering paper Box and Cox proposed an ingenious family of transformations. Given variables $z$ and $x$ between which an unknown but possibly nonlinear relation is assumed to hold, they proposed the transformations

$$
z^{(\lambda)} =
\begin{cases}
(z^{\lambda} - 1)/\lambda & \text{for } \lambda \neq 0, \\
\ln z & \text{for } \lambda = 0 \text{ and } z > 0,
\end{cases}
\qquad
x^{(\mu)} =
\begin{cases}
(x^{\mu} - 1)/\mu & \text{for } \mu \neq 0, \\
\ln x & \text{for } \mu = 0 \text{ and } x > 0,
\end{cases}
\tag{5.4}
$$

where $\lambda$ and $\mu$ are unrestricted real numbers and $z$ and $x$ each takes nonnegative values except when $\lambda$ or $\mu$ is zero, in which cases $z$ or $x$ must be positive. It is easily shown that when $z$ is positive $z^{(\lambda)}$ is a continuous function of $\lambda$, the expression $(z^{\lambda} - 1)/\lambda$ tending to $\ln z$ as $\lambda$ tends to zero; and of course similarly for $x^{(\mu)}$.
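The transformation in (5.4) is easy to express in code. The following is a minimal sketch; the function name `box_cox` is an illustrative choice, and the value 20.4 is used only as a convenient positive number.

```python
import math


def box_cox(value, lam):
    """Box-Cox transform: (value**lam - 1) / lam, or ln(value) when lam == 0."""
    if lam == 0:
        if value <= 0:
            raise ValueError("value must be positive when lam is zero")
        return math.log(value)
    return (value ** lam - 1.0) / lam


z = 20.4
print(box_cox(z, 1))      # z - 1, the linear case
print(box_cox(z, 0))      # ln z, the logarithmic case
print(box_cox(z, 1e-6))   # close to ln z, illustrating continuity in lam
print(box_cox(z, -1))     # 1 - 1/z, the reciprocal case
```

The near-zero call illustrates the continuity property stated in the text: as the exponent shrinks towards zero the power transform approaches the logarithm, so the linear and semilogarithmic specifications sit inside one continuous family.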

If now one postulates a linear relationship between $z^{(\lambda)}$ and $x^{(\mu)}$ so that, applied to the situation under discussion,

$$ z_i^{(\lambda)}(1) = a + b\, x_i^{(\mu)}(1) + \bigl(w_i + \varepsilon_i(T)\bigr), \qquad (i = 1, 2, \ldots, 21), \tag{5.5} $$

then in general this is a nonlinear relationship between $z$ and $x$, determined by the four parameters $a$, $b$, $\lambda$, and $\mu$, whose values are to be estimated from the data.

Special interest attaches to particular values of $\lambda$ and $\mu$. Thus if $\lambda = \mu = 1$, substitution in (5.4) and (5.5) yields

$$ z_i(1) = (1 + a - b) + b\, x_i(1) + \bigl(w_i + \varepsilon_i(T)\bigr), \qquad (i = 1, 2, \ldots, 21), \tag{5.6} $$

which is of the form (5.3), with $T = I$, $G$ linear, and $M(T) + G(0) = 1 + a - b$. In the same way, $\lambda = 0$ and $\mu = 1$ leads to

$$ \ln z_i(1) = (a - b) + b\, x_i(1) + \bigl(w_i + \varepsilon_i(T)\bigr), \qquad (i = 1, 2, \ldots, 21), \tag{5.7} $$

where $T = \ln$, $G$ is linear, and $M(T) + G(0) = (a - b)$. Apart from the use of $\ln$ rather than $\log_{10}$, which because there is a constant term in (5.7) is only a trivial change of scale for the Box-Cox method (cf. Schlesselman (1971)), (5.7) is an approximate version of (4.2). Similarly, putting $\lambda = 0$ and $\mu = 0$ produces

$$ \ln z_i(1) = a + b \ln x_i(1) + \bigl(w_i + \varepsilon_i(T)\bigr), \qquad (i = 1, 2, \ldots, 21), \tag{5.8} $$

which is entirely analogous to (4.3), with $T = \ln$ and $M(T) + G(0) = a$. It may be shown also that the case $\lambda = -1$ and $\mu = 1$ provides a good approximation to (4.4), but the demonstration is lengthy and therefore is omitted.

Following the procedures outlined at the beginning of this section one can use the Box-Cox technique, for given $\lambda$ and $\mu$, to obtain estimates of malaria's effect on mortality. In particular, by choosing the values of $\lambda$ and $\mu$ just given it should be possible to approximate the estimates obtained directly from (4.1)-(4.4) in Section 4. The closeness of the two estimates will in each case indicate how good the approximation is, based on (5.2).¹⁵ The table gives these comparative estimates and shows that the degree of approximation yielded by this use of the Box-Cox method is great enough to give some confidence that it can provide a unified framework within which to consider alternative specifications of T and G.

Comparison Between Direct and Box-Cox Estimates of Malaria's Effect on Mortality

| Equation | Corresponding Box-Cox $\lambda$ | Corresponding Box-Cox $\mu$ | Direct estimate (proportion of fall in mortality due to malaria) | Box-Cox estimate (proportion of fall in mortality due to malaria) |
|---|---|---|---|---|
| (4.1) | 1 | 1 | 0.477 | 0.514 |
| (4.2) | 0 | 1 | 0.255 | 0.304 |
| (4.3) | 0 | 0 | 0.628 | 0.769 |
| (4.4) | -1 | 1 | 0.218 | 0.150 |
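The closeness of the two columns of the table can be made explicit by differencing them. The sketch below only restates the tabulated figures; the dictionary layout and variable names are illustrative.

```python
# (direct estimate, Box-Cox estimate) for each equation, copied from the table
estimates = {
    "(4.1)": (0.477, 0.514),
    "(4.2)": (0.255, 0.304),
    "(4.3)": (0.628, 0.769),
    "(4.4)": (0.218, 0.150),
}

# Signed gap: Box-Cox estimate minus direct estimate
gaps = {eq: round(bc - direct, 3) for eq, (direct, bc) in estimates.items()}

for eq, gap in gaps.items():
    print(eq, gap)
```

The gap is smallest for (4.1) and (4.2) and largest for (4.3), and only for (4.4) does the Box-Cox estimate fall below the direct one.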

The Box-Cox equations on which the estimates of the table are based are

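(i) $\lambda = 1$, $\mu = 1$:

$$z_i(1) - 1 = \underset{(16.30)}{15.1755} + \underset{(9.69)}{0.2704}\,(x_i(1) - 1), \qquad r^2 = 0.8317$$

(ii) $\lambda = 0$, $\mu = 1$:

$$\ln z_i(1) = \underset{(66.50)}{2.8155} + \underset{(8.90)}{0.0113}\,(x_i(1) - 1), \qquad r^2 = 0.8065$$

(iii) $\lambda = 0$, $\mu = 0$:

$$\ln z_i(1) = \underset{(25.47)}{2.5538} + \underset{(5.92)}{0.1948}\,\ln x_i(1), \qquad r^2 = 0.6486$$

(iv) $\lambda = -1$, $\mu = 1$:

$$\frac{z_i(1)^{-1} - 1}{-1} = \underset{(442.87)}{0.9411} + \underset{(7.69)}{0.000490}\,(x_i(1) - 1), \qquad r^2 = 0.7568,$$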

with the t statistic given as usual in parentheses below each parameter estimate.

6. THE "BEST" ESTIMATE

A beauty of the Box-Cox approach is not merely that quite disparate specifications can be accommodated within one general class but that the value of the likelihood function which each specification of $\lambda$ and $\mu$ generates is a measure by which each of them can be ranked. Thus maximum likelihood provides a good criterion for selecting the best pair of T and G functions.

15 In the case of (4.4) the difference in the estimates also involves the additional approximation already referred to.

It can be shown that the likelihood function L is given by

$$\ln L(a, b, \lambda, \mu) = \lambda \sum_{i=1}^{21} \ln z_i(s) - \frac{21}{2} \ln \hat{\sigma}_w^2(\lambda, \mu) - \sum_{i=1}^{21} \ln z_i(s) - \frac{21}{2} \ln 2\pi - \frac{21}{2},$$

where $\hat{\sigma}_w^2$ is the estimated variance of the (normally distributed) $(w_i + \epsilon_i(T))$, which of course depends on a and b, and 21 is the sample size N. In order to maximize L with respect to a, b, $\lambda$, and $\mu$, it is necessary and sufficient to maximize a much simpler expression, namely

$$P(\lambda, \mu) = \lambda \sum_{i=1}^{21} \ln z_i(s) - \frac{21}{2} \ln \Bigl( \sum_{i=1}^{21} U_i(\lambda, \mu)^2 \Bigr),$$

where the variable part of the second term on the right-hand side is the sum of squared residuals, and for convenience one writes $U_i$ rather than $(w_i + \epsilon_i(T))$.
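The simpler expression can be evaluated for any pair of transformation parameters by fitting a and b with ordinary least squares and inserting the sum of squared residuals. The following is a minimal sketch under stated assumptions: the transform is the standard Box-Cox family, the data are synthetic (they are not the Sri Lanka observations), and the function names `concentrated_loglik` and `grid_search` are hypothetical.

```python
import math
import random


def box_cox(v, lam):
    """Box-Cox transform of a positive value v (ln v when lam == 0)."""
    return math.log(v) if lam == 0 else (v ** lam - 1.0) / lam


def concentrated_loglik(z, x, lam, mu):
    """P(lam, mu) = lam * sum(ln z_i) - (N/2) * ln(sum of squared residuals).

    The residuals come from an ordinary least squares fit of the
    transformed z on the transformed x, which supplies a and b.
    """
    n = len(z)
    zt = [box_cox(v, lam) for v in z]
    xt = [box_cox(v, mu) for v in x]
    x_bar = sum(xt) / n
    z_bar = sum(zt) / n
    sxx = sum((u - x_bar) ** 2 for u in xt)
    sxz = sum((u - x_bar) * (w - z_bar) for u, w in zip(xt, zt))
    b = sxz / sxx
    a = z_bar - b * x_bar
    ssr = sum((w - a - b * u) ** 2 for u, w in zip(xt, zt))
    return lam * sum(math.log(v) for v in z) - (n / 2.0) * math.log(ssr)


def grid_search(z, x, lams, mus):
    """Return (P, lam, mu) for the grid point with the largest P."""
    return max(
        (concentrated_loglik(z, x, lam_k, mu_k), lam_k, mu_k)
        for lam_k in lams
        for mu_k in mus
    )


# Synthetic data (an assumption for illustration): 21 points generated from
# a model that is linear after a lam = 0.5 transform of z, with mu = 1.
rng = random.Random(0)
x = [rng.uniform(1.0, 60.0) for _ in range(21)]
z = [(1.0 + 0.5 * (6.0 + 0.05 * (xi - 1.0) + rng.gauss(0.0, 0.2))) ** 2 for xi in x]

lams = [i / 10 for i in range(-10, 21)]  # -1.0, -0.9, ..., 2.0
mus = [0.0, 0.5, 1.0]
best_p, best_lam, best_mu = grid_search(z, x, lams, mus)
print(best_lam, best_mu, round(best_p, 4))
```

Because P is concentrated over a and b, the search only has to range over the two transformation parameters, which is what makes a systematic search over a grid of pairs feasible.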

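If $(\lambda^*, \mu^*)$ are the maximum likelihood estimates of $(\lambda, \mu)$, then twice the difference $P(\lambda^*, \mu^*) - P(\lambda, \mu)$ has asymptotically a chi-squared distribution with two degrees of freedom, so the appropriate critical value for $P(\lambda^*, \mu^*) - P(\lambda, \mu)$ at the confidence level $\alpha$ is $\tfrac{1}{2}\chi^2(2, \alpha)$.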
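Reading the critical value as one half of the chi-squared quantile with two degrees of freedom (the reading consistent with the values quoted in footnote 16), it can be computed in closed form, because the chi-squared distribution with two degrees of freedom has distribution function $1 - e^{-q/2}$. The sketch below relies on that identity; the function name is illustrative.

```python
import math


def half_chi2_critical_2df(level):
    """Half of the chi-squared critical value with 2 degrees of freedom.

    With 2 d.f. the CDF is 1 - exp(-q / 2), so the quantile at `level`
    is q = -2 ln(1 - level) and half of it is -ln(1 - level).
    """
    if not 0.0 < level < 1.0:
        raise ValueError("level must lie strictly between 0 and 1")
    return -math.log(1.0 - level)


for level in (0.90, 0.95, 0.99):
    print(level, round(half_chi2_critical_2df(level), 4))
```

The results are approximately 2.3026, 2.9957, and 4.6052, within about 0.0002 of the values 2.3025, 2.9955, and 4.6050 given in footnote 16, which are consistent with halving tabulated three-decimal quantiles.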

A systematic search through more than 300 pairs of values of $\lambda$ and $\mu$ produced the following equation that maximized the probability of the observations for Sri Lanka during 1936-45,

$$\frac{z_i(1)^{0.65} - 1}{0.65} = \underset{(25.50)}{7.9401} + \underset{(9.49)}{0.0886}\,(x_i(1) - 1), \qquad r^2 = 0.8259, \tag{6.1}$$

so that $\lambda = 0.65$ and $\mu = 1$ are the best estimates for the Box-Cox parameters. This specification represents in a rough sense a weighted average of (4.1), corresponding to $\lambda = 1$ and $\mu = 1$, and (4.2), corresponding to $\lambda = 0$ and $\mu = 1$, with the respective weights being 0.65 and 0.35.
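To read (6.1) on the original scale of z, the transform on the left-hand side has to be inverted. A minimal sketch, assuming the standard Box-Cox inverse and using the coefficients printed in (6.1); the x values passed in are hypothetical and chosen only to exercise the function.

```python
import math


def box_cox(v, lam):
    """Box-Cox transform of a positive value v (ln v when lam == 0)."""
    return math.log(v) if lam == 0 else (v ** lam - 1.0) / lam


def inverse_box_cox(t, lam):
    """Invert the Box-Cox transform: recover v from t = box_cox(v, lam)."""
    if lam == 0:
        return math.exp(t)
    return (1.0 + lam * t) ** (1.0 / lam)


# Coefficients and lambda as printed in (6.1); mu = 1, so x enters as x - 1.
A, B, LAM = 7.9401, 0.0886, 0.65


def fitted_z(x):
    """Fitted value of z implied by (6.1) for a given x."""
    return inverse_box_cox(A + B * (x - 1.0), LAM)


for x in (1.0, 25.0, 50.0):  # hypothetical x values, not observations
    print(x, round(fitted_z(x), 2))
```

Since the estimated slope is positive and the transform is increasing in z, the fitted z rises with x.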

Obviously (6.1) has the same form as (5.3), while a Kolmogorov-Smirnov test does not reject the hypothesis that $(w_i + \epsilon_i(T))$ is normally distributed; the value of D is 0.109. Applying the measurement procedures of Sections 3 and 5 to (6.1) produces an estimate of the proportion of the fall in mortality attributable to malaria of 0.439. It is interesting that the appropriate weighted average of the first and second Box-Cox estimates in the table gives very nearly the same result, i.e.,

$$(0.65 \times 0.514) + (0.35 \times 0.304) = 0.441.$$
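The weighted average can be checked with a few lines of arithmetic. The sketch uses only the weights and Box-Cox estimates quoted in the text; the variable names are illustrative.

```python
# Weights implied by lambda = 0.65, and the Box-Cox estimates from the table
weights = {"(4.1)": 0.65, "(4.2)": 0.35}
box_cox_estimates = {"(4.1)": 0.514, "(4.2)": 0.304}

weighted = sum(weights[eq] * box_cox_estimates[eq] for eq in weights)
gap_to_mle = weighted - 0.439  # 0.439 is the estimate obtained from (6.1)

print(round(weighted, 4), round(gap_to_mle, 4))
```

From the rounded inputs the sum is 0.3341 + 0.1064 = 0.4405; the printed 0.441 presumably reflects unrounded estimates. Either way the figure lies within about 0.002 of the 0.439 obtained from (6.1).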

Although (6.1) provides the best point estimates for the parameters $\lambda$, $\mu$, a, and b, the flatness of the likelihood surface generated by the observations for Sri Lanka means that rather different measures of malaria's effect on mortality are also consistent with the data. In particular, although using the test statistic already described leads to the rejection (at the 99 percent level) of the Box-Cox approximation of (4.3) as a plausible specification, the corresponding approximation of (4.4) is only on the margin of rejection at the 95 percent level, and neither of the Box-Cox approximations of (4.1) or (4.2) can be rejected even at the 90 percent level.¹⁶
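These rejection decisions can be reproduced from the figures reported in footnote 16. The sketch below compares each sample value of the test statistic with the three critical values; the function name and data layout are illustrative.

```python
# Critical values and sample statistics as reported in footnote 16
critical = {0.90: 2.3025, 0.95: 2.9955, 0.99: 4.6050}
sample = {"(4.1)": 0.1532, "(4.2)": 0.5175, "(4.3)": 6.7841, "(4.4)": 2.9562}


def highest_rejection_level(stat):
    """Highest level at which the statistic exceeds its critical value, or None."""
    rejected = [level for level, cutoff in critical.items() if stat > cutoff]
    return max(rejected) if rejected else None


for equation, stat in sample.items():
    print(equation, highest_rejection_level(stat))
```

The pair corresponding to (4.3) is rejected at the 99 percent level, the pair for (4.4) exceeds the 90 percent cutoff but falls just short of the 95 percent one (2.9562 against 2.9955), and the pairs for (4.1) and (4.2) are not rejected at any of the three levels.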

We must conclude, therefore, that although the best guess for malaria's share in the decline of mortality is 44 percent, the data are not sufficiently detailed to deny that this estimate may be in error fifteen or more percentage points either way.¹⁷

16 The values of the test statistic corresponding to significance levels of 90, 95, and 99 percent are 2.3025, 2.9955, and 4.6050, respectively. The actual sample values of that statistic for the various specifications were: (4.1), 0.1532; (4.2), 0.5175; (4.3), 6.7841; (4.4), 2.9562.

Strictly speaking, of course, what were tested here were not the equations (4.1)-(4.4) but the corresponding pairs of Box-Cox parameters (1, 1), (0, 1), (0, 0), and (-1, 1).

17 Since the Box-Cox estimates corresponding to (4.1) and (4.2) in the table are each a little higher than the direct estimates derived from (4.1) and (4.2), it is reasonable to suspect the MLE of being biased upwards by three or four percentage points, so that a better estimate may be 40, not 44 percent.

[Received October 1975. Revised September 1976.]

REFERENCES

Abhayaratne, D.E.R. (1950), "The Influence of Malaria on Infant Mortality in Ceylon," Ceylon Journal of Medical Science, 7, 33-54.

Barlow, Robin (1967), "The Economic Effects of Malaria Eradication," American Economic Review, Papers and Proceedings, 57, 130-48.

Barlow, Robin (1968), The Economic Effects of Malaria Eradication, Research Series No. 15, Bureau of Public Health Economics, School of Public Health, University of Michigan, Ann Arbor, Michigan.

Box, George E.P., and Cox, David R. (1964), "An Analysis of Transformations," Journal of the Royal Statistical Society, Ser. B, 26, 211-43.

Coale, Ansley J., and Hoover, Edgar M. (1958), Population Growth and Economic Development in Low-Income Countries, Princeton, New Jersey: Princeton University Press.

Cullumbine, H. (1950), "An Analysis of the Vital Statistics of Ceylon," Ceylon Journal of Medical Science, 7, 91-272.

Frederiksen, Harold F. (1960a), "Malaria Control and Population Pressure in Ceylon," Public Health Reports, 75, 865-8.

Frederiksen, Harold F. (1960b), "Determinants and Consequences of Mortality Trends in Ceylon," Public Health Reports, 76, 659-63.

Frederiksen, Harold F. (1962), "Economic and Demographic Consequences of Malaria Control in Ceylon," Indian Journal of Malariology, 16, 379-91.

Frederiksen, Harold F. (1966a), "Determinants and Consequences of Mortality and Fertility Trends," Public Health Reports, 81, 715-27.

Frederiksen, Harold F. (1966b), "Malaria Eradication and Population Growth," American Journal of Tropical Medicine and Hygiene, 15, 261-3.

Frederiksen, Harold F. (1970), "Malaria Eradication and the Fall of Mortality: A Note," Population Studies, 24, 1, 111-3.

Gabaldón, Arnaldo (1949), "The Nation-Wide Campaign Against Malaria in Venezuela," Transactions of the Royal Society of Tropical Medicine and Hygiene, 43, 113-64.

Gill, C.A. (1940), "The Influence of Malaria on Natality, with Special Reference to Ceylon," Journal of the Malaria Institute of India, 3, 201-52.

Gray, R.H. (1974), "The Decline of Mortality in Ceylon and the Demographic Effects of Malaria Control," Population Studies, 28, 2, 205-29.

Kmenta, Jan (1971), Elements of Econometrics, New York: The Macmillan Co.

Macdonald, George (1951), "Community Aspects of Immunity to Malaria," British Medical Bulletin, 8, 33-6.

Macgregor, J.D., and Avery, J.G. (1974), "Malaria Transmission and Fetal Growth," British Medical Journal, 3, 433-6.

Meade, T.W. (1968), "Medicine and Population," Public Health, 32, 3, 100-10.

Meegama, S.A. (1967), "Malaria Eradication and Its Effect on Mortality Levels," Population Studies, 21, 3, 207-27.

Meegama, S.A. (1969), "The Decline in Maternal and Infant Mortality and Its Relation to Malaria Eradication," Population Studies, 23, 2, 289-302, 305-6.

Newman, Peter (1965), Malaria Eradication and Population Growth: With Special Reference to Ceylon and British Guiana, Research Series No. 10, Bureau of Public Health Economics, School of Public Health, University of Michigan, Ann Arbor, Michigan.

Newman, Peter (1969), "Malaria Eradication and Its Effects on Mortality Levels: A Comment," Population Studies, 23, 285-8, 303-5.

Newman, Peter (1970), "Malaria Control and Population Growth," Journal of Development Studies, 6, 2, 133-58.

Rao, Potluri, and Miller, Roger (1971), Applied Econometrics, Belmont, California: Wadsworth Publishing Co.

Schlesselman, J. (1971), "Power Families: A Note on the Box and Cox Transformation," Journal of the Royal Statistical Society, Ser. B, 33, 307-11.

Siegel, Sidney (1956), Nonparametric Statistics, New York: McGraw-Hill Book Co.

Taeuber, Irene B. (1949), "Ceylon as a Demographic Laboratory: Preface to Analysis," Population Index, 15, 293-304.

Zarembka, Paul (1974), "Transformation of Variables in Econometrics," Frontiers in Econometrics, ed. P. Zarembka, New York: Academic Press.
