Measuring Consumption Compulsory readings

Date post: 20-Mar-2023

Measuring Consumption

Compulsory readings

Disclaimer

The print version of the compulsory readings has been shortened: it only provides the required sections

from the papers and books indicated as compulsory for each lecture. For the sake of brevity, most of the

acknowledgments, abstracts, forewords, annexes and appendices from the original publications are not

reproduced here.

List of compulsory readings

1. Deaton, A. and Zaidi, S. (2002). Guidelines for Constructing Consumption Aggregates for Welfare Analysis. LSMS Working Paper No. 135. Washington, DC: The World Bank.

2. Grosh, M. and Glewwe, P. (1998). Data Watch: The World Bank's Living Standards Measurement Study Household Surveys. The Journal of Economic Perspectives, 12(1), 187-196.

3. Grosh, M. and Glewwe, P. (2000). Designing Household Questionnaires for Developing Countries, Lessons from 15 years of Living Standards Measurement Study, Volume One: World Bank. (Ch. 2, 3).

4. Iarossi, G. (2006). The power of survey design - a user's guide for managing surveys, interpreting results, and influencing respondents. Washington, DC: World Bank. (Ch. 2).

5. FAO and The World Bank (2018). Food data collection in Household Consumption and Expenditure Surveys. Guidelines for low- and middle-income countries. Rome. (Sections 2, 3).

6. Smith, L. C., Dupriez, O., and Troubat, N. (2014). Assessment of the reliability and relevance of the food data collected in national household consumption and expenditure surveys. International Household Survey Network. (Section 3).

7. Oseni, G., Durazo, J., and McGee, K. (2017). The Use of Non-Standard Units for the Collection of Food Quantity. LSMS guidebook.

8. Amendola, N. and Vecchi, G. (2014). Durable goods and poverty measurement. World Bank Policy Research Working Paper No. 7105.

9. De Waal, T., Pannekoek, J., and Scholtus, S. (2011). Handbook of Statistical Data Editing and Imputation. New York: John Wiley and Sons. (Ch. 1).

10. Barnett, V. and Lewis, T. (1994). Outliers in Statistical Data. 3rd edition. J. Wiley and Sons. (Ch. 1 and 2).

11. Cowell, F. (2011). Measuring inequality. Oxford University Press. (Ch. 1 and 2).

12. Ravallion, M. (2016). The Economics of Poverty: History, Measurement, and Policy. Oxford University Press. (Ch. 3.1-3.2, 4, 5.1-5.3).

13. Glewwe, P. and Levin, M. (2005). Presenting simple descriptive statistics from household survey data. In UN, Household Sample Surveys in Developing and Transition Countries. Studies in Methods Series F No. 96.

14. Schwabish, J. A. (2014). An economist's guide to visualizing data. Journal of Economic Perspectives, 28(1), 209-34.

Living Standards Measurement Study Working Paper No. 135, May 2002

Guidelines for Constructing Consumption Aggregates for Welfare Analysis

Table of Contents

FOREWORD

ABSTRACT

ACKNOWLEDGEMENTS

1. INTRODUCTION

2. THEORY OF THE MEASUREMENT OF WELFARE
2.1 Introduction
2.2 Money metric utility
2.3 An alternative approach: welfare ratios
2.4 Income versus consumption
2.5 Durable goods
2.6 The evaluation of time and leisure
2.7 Public goods and publicly supplied goods
2.8 Farm households
2.9 Differences in tastes across people and households
Box 1. Summary of theoretical issues and recommendations

3. CONSTRUCTING THE HOUSEHOLD CONSUMPTION AGGREGATE
3.1 Introduction
3.2 Food consumption
3.3 Consumption of non-food items
3.4 Consumer durables
Box 2. Recommendations for constructing the consumption aggregate

4. ADJUSTING FOR COST OF LIVING DIFFERENCES
4.1 Introduction
4.2 Paasche price index
4.3 Calculating a Laspeyres index

5. ADJUSTING FOR HOUSEHOLD COMPOSITION
5.1 Introduction
5.2 Equivalence scales
5.3 Behavioral approach
5.4 Subjective approach
5.5 Arbitrary approach
Box 3. Adjustments for cost-of-living differences and household composition

6. METHODS OF SENSITIVITY ANALYSIS
6.1 Introduction
6.2 Stochastic dominance
6.3 Using subsets of consumption and the effects of measurement error
6.4 Sensitivity analysis with equivalence scales

REFERENCES

1. INTRODUCTION

Poverty is a complex phenomenon involving multiple dimensions of deprivation, of which the lack of

goods and services is only one. Even so, there is a good deal of consensus on the value of using a consumption

aggregate as a summary measure of living standards, itself an important component of human welfare. In

recent years, in much of the World Bank's operational work as well as in applied research, consumption

aggregates constructed from survey data have been used to measure poverty, to analyze changes in living

standards over time, and to assess the distributional impact of programs and policies.

Despite this widespread use of consumption aggregates, there is little in the way of guidelines on how

to construct consumption aggregates from survey data. Researchers and analysts interested in using

consumption as a welfare measure must often work from whatever documentation exists from earlier exercises,

and the result has been a good deal of unnecessary

replication with each analyst working afresh through the underlying theoretical and practical issues. This paper

seeks to fill the gap by providing a brief theoretical introduction followed by practical advice on how to

construct a consumption aggregate from household survey data.

We recognize that there are several distinct audiences for these guidelines, who will use different parts

of what follows, with different kinds of surveys, and for different purposes, so that it is useful to start with

something of a road map:

Audience. We hope that these guidelines will be useful, not only to those whose immediate task is to

use a survey (or surveys) to construct consumption aggregates, but also to statisticians, economists, or advisors

who are interested in why consumption aggregates might be useful and the general features of their

construction. This latter group includes those in Statistical Offices who might be considering instituting a new

consumption survey, or in modifying an old one. The arguments for and against consumption, usually in

comparison with income aggregates, come up often enough that it is useful to have guidelines on the main

arguments, and on what is involved in constructing a consumption aggregate. The first part of these guidelines,

which outlines the underlying theory, as well as the summary Boxes, will be of most interest to this group.

Issues of survey and questionnaire design are not dealt with in these guidelines but are dealt with in the

companion piece by Deaton and Grosh (1998). At the same time, we have tried to discuss most of the detailed

decisions that would have to be made by our first audience, those actually doing the calculations. There is

illustrative code in the Appendix covering much of what has to be done, and there is discussion of most of the

practical issues that have arisen over the years. But it is important that the calculations not be done

mechanically. Each survey is different from every other survey, if only in detail, and each country has its own

institutions that need to be taken into account. Constructing consumption aggregates without knowledge of the

country and its institutions will not give useful results. In consequence, analysts need to be familiar with the

theory in order to be able to make sensible decisions when a new problem presents itself, as is always the case

in practice.

Surveys: LSMS versus others? These guidelines have been prepared by and for the LSMS group in the

Bank, and the examples in the Appendix are drawn from LSMS surveys around the world. Whenever we

require a specific example, we take it from some LSMS survey, and we generally assume that some version of

LSMS protocols have been used. However, we believe that these choices should not compromise the

usefulness of the guidelines for those who are constructing consumption aggregates from other surveys. The

theory is general, and almost all of the details of the construction would have to be followed through in one

form or another using any consumption survey. It should also be noted that as the number of LSMS surveys

has grown there has been a great deal of variation in survey design, so that there are very few consumption

surveys around the world whose design would not be represented in one or more LSMS surveys. A more

serious issue is that many non-LSMS surveys will lack at least some of the information used in constructing a

comprehensive measure.

Purposes. In what follows, we generally assume that the consumption aggregate will be

used in poverty analysis, identifying the poor, and computing standard measures of poverty and inequality.

Such aggregates are also used for incidence analysis, to identify the position in the income distribution of those

who are likely to benefit or lose from some policy, such as subsidies or taxes, or the provision of a service. We

discuss the procedures that would normally be followed in constructing a consumption aggregate for such

purposes. However, we shall encounter a number of examples where procedures will have to be modified

depending on the context and purpose. For example, some of the theoretically ideal concepts are hard to

implement, and because the best is sometimes the enemy of the good, we will often recommend not trying to

implement the theoretically ideal solution. But there will always be cases where the purpose of the exercise is

compromised by such a decision, and attempts must be made. For example, it is very difficult to measure the

welfare effects of public good provision, and we recommend against the routine inclusion of such valuations in

the consumption aggregates. But if the aggregates are to be used to examine the effects of public good

provision on (for example) the regional distribution of poverty, then some attempt must be made. Again, the

theoretical framework is the ultimate guide as to what to do.

The rest of the paper is laid out as follows: The theoretical framework underlying the use of the

consumption aggregate as a welfare measure is briefly reviewed in Section 2, along with a discussion of some

of the issues concerning what such a measure should include. Specific guidelines on how to construct a

consumption based measure of welfare are then presented in Sections 3-5. The paper outlines a three-part

procedure for the construction of a consumption-based measure of individual welfare: the various steps

involved in aggregating different components of household consumption to construct a nominal consumption

aggregate are laid out in Section 3. The construction of the price index in order to adjust for differences in

prices faced by households is then reviewed in Section 4. The adjustment of the real consumption aggregate for

differences in composition between households is then presented in Section 5. Finally, Section 6 provides

examples of some of the analytic techniques that can be used to examine the robustness of the measure to

assumptions and choices made at the construction stage.

The consumption aggregates constructed in recent years from the Living Standards Measurement

Study (LSMS) surveys from eight countries (Ghana, Vietnam, Nepal, the Kyrgyz Republic, Ecuador, South

Africa, Panama, and Brazil) were reviewed for this paper (for a brief introduction to the LSMS project as well

as a description of the main survey instruments typically used in these surveys, please consult the appendix). In

none of the countries covered did we find the procedures followed to be fully in conformance with the

recommendations provided in this paper; nonetheless, these case studies provided the basis for much of the practical advice and guidelines presented in this paper. The programs used to construct the consumption

aggregates in these countries are included in the appendix as they provide useful illustrations of the general

steps involved in constructing the aggregates.

2. THEORY OF THE MEASUREMENT OF WELFARE

2.1 Introduction:

In this section, we discuss briefly the theoretical basis for the consumption-based measure of welfare

whose detailed construction is explained elsewhere in the report. Our concern here is a fairly narrow one,

focusing on an economic definition of living standards. We do not consider other important components of

welfare, such as freedom, health status, life-expectancy, or levels of education, all of which are related to

income and consumption, but which cannot be adequately captured by any simple monetary measure.

Consumption measures are limited in their scope, but are nevertheless a central component of any assessment

of living standards.

One important concept here is money metric utility, Samuelson (1974), which measures levels of living

by the money required to sustain them. We start with this in Section 2.2 below. An alternative approach, based

on Blackorby and Donaldson's (1987) concept of welfare ratios, whereby welfare is measured as multiples of a

poverty line, is presented in Section 2.3. Each of the money-metric and welfare-ratio approaches has its

strengths and weaknesses; both start from a nominal consumption aggregate, but adjust it differently. These

first subsections cover the basic ideas, and are followed by subsections on a range of theoretical issues that

repeatedly come up in practice. A fuller, and only slightly outdated, treatment is given in Deaton (1980) in one

of the earliest LSMS Working Papers (no. 7). Our treatment here skips theoretical developments that are of

limited relevance in practice given the data that are typically available, or that can be calculated. For example,

we make very little use of shadow prices, since in most of the relevant cases, it is difficult to estimate them

with any accuracy.

2.2 Money metric utility:

The starting point is the canonical consumer problem in which a household chooses the

consumption of individual goods to maximize utility within a given budget and at given prices. Consumer

preferences over goods are thought of as a system of indifference curves, each linking bundles that are equally

good, and with higher indifference curves better than lower ones. A given indifference curve corresponds to a

given level of welfare, well-being, or living-standards, so that the measurement of welfare boils down to

labeling the indifference curves, and then locating each household on an indifference curve. There are many

ways of labeling indifference curves. One possibility would be to take some reference commodity bundle and

to label indifference curves by the distance from the origin of their point of intersection with the bundle. In

Figure 1, the reference quantity vector is shown as the line $q^0$ so that the two indifference curves II and JJ are

labeled as OA and OB respectively. Instead of a reference set of quantities, we can select a reference set of

prices, and calculate the amount of money needed to reach the two indifference curves; this is Samuelson's

money metric utility. In the Figure, money metric utility is constructed by drawing the two tangents to the indifference curves, with slopes set by the reference prices, so that the costs of reaching II and JJ are OC' and

OD' in terms of one good, or OC and OD in terms of the other.

Figure 1: Two ways of labeling indifference curves

To see how this works, we introduce some notation. Write x for total expenditure, and denote

by $c(u, p)$ the cost or expenditure function, which associates with each vector of prices $p$ the minimum

cost of reaching the utility level $u$. Since the household maximizes utility, it must minimize the cost of reaching $u$,

so that

$$c(u, p) = x. \tag{2.1}$$

Denote by superscript h the household whose welfare we are measuring, and let p° denote a vector of

reference prices, the choice of which we discuss below. Money metric utility for household h, denoted $u_m^h$, is

defined by

$$u_m^h = c(u^h, p^0), \tag{2.2}$$

which is the minimum cost of reaching $u^h$ at prices $p^0$. Note that, although utility itself is to a large extent

arbitrary (we can label indifference curves any way we choose, as long as higher indifference curves are labeled

with larger values of utility), money metric utility is defined by an indifference curve and a set of prices, is

independent of the labels, and is therefore well-defined given the indifference curves.

The exact calculation of money metric utility requires knowledge of preferences. Although preferences

can be recovered from knowledge of demand functions, we typically prefer some shortcut method that, even if

approximate, does not require the estimation of behavioral relationships, with all their accompanying

assumptions, including often controversial identifying assumptions, and potential loss of credibility. The most

convenient such approximation comes from a first-order expansion of $c(u^h, p^0)$ in prices around the vector

of prices actually faced by the household, $p^h$. The derivatives of the cost function with respect to prices are

the quantities consumed, a result known as Shephard's Lemma (or Roy's Identity); see for example Deaton and

Muellbauer (1980). If we write $q^h$ for the vector of quantities consumed by the household, we can approximate

the cost function as follows

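$$c(u^h, p^0) \approx c(u^h, p^h) + (p^0 - p^h) \cdot q^h \tag{2.3}$$

where the centered "$\cdot$" indicates an inner product. Since the minimum cost of reaching $u^h$ at $p^h$ is the amount

spent, $p^h \cdot q^h$, (2.3) can be written as

$$u_m^h = c(u^h, p^0) \approx p^0 \cdot q^h \tag{2.4}$$

so that money metric utility is approximated by the household's consumption bundle priced at reference prices. This parallels

National Income Accounting Practice, in which real national product would include real consumer's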

expenditure, which is the sum over all consumers of their consumption valued at base prices, i.e. the sum of the

right hand side of (2.4) over all agents.
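
As a rough numerical check of the approximation in (2.3) and (2.4), the sketch below compares exact money metric utility with the household's bundle valued at reference prices. It assumes Cobb-Douglas preferences purely for illustration (the guidelines do not impose any functional form), and the budget shares, prices, and budget are hypothetical.

```python
import math

def cobb_douglas_demands(shares, prices, budget):
    """Utility-maximizing quantities under Cobb-Douglas preferences (illustrative assumption)."""
    return [a * budget / p for a, p in zip(shares, prices)]

def exact_money_metric(shares, prices_h, prices_0, budget):
    """Exact c(u^h, p^0) for Cobb-Douglas: x^h * prod_i (p0_i / ph_i) ** a_i."""
    return budget * math.prod(
        (p0 / ph) ** a for a, ph, p0 in zip(shares, prices_h, prices_0)
    )

def approx_money_metric(quantities, prices_0):
    """First-order approximation (2.4): the bundle q^h valued at reference prices p^0."""
    return sum(p0 * q for p0, q in zip(prices_0, quantities))

shares = [0.6, 0.4]      # hypothetical budget shares, summing to one
prices_h = [2.0, 5.0]    # hypothetical prices faced by household h
prices_0 = [2.2, 4.5]    # hypothetical reference prices
budget = 100.0           # total expenditure x^h

q_h = cobb_douglas_demands(shares, prices_h, budget)
exact = exact_money_metric(shares, prices_h, prices_0, budget)
approx = approx_money_metric(q_h, prices_0)
print(f"exact = {exact:.2f}, approximation = {approx:.2f}")
```

Because a cost function is concave in prices, the first-order approximation cannot fall below the exact value; with the modest price gaps assumed here the two differ by less than one percent.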

This equation is still not quite in convenient form for practice, since we rarely observe a complete set

of quantities for each household, and may not even have available a complete set of reference prices. The

Paasche price index comparing the price vectors $p^h$ and $p^0$ is defined as

$$P_P^h = \frac{p^h \cdot q^h}{p^0 \cdot q^h} \tag{2.5}$$

so that, from (2.4), we have

$$u_m^h \approx p^0 \cdot q^h = \frac{p^h \cdot q^h}{P_P^h} = \frac{x^h}{P_P^h} \tag{2.6}$$

so that money metric utility can be approximated by adding up all the household's expenditures, and dividing

by a Paasche index of prices.

For readers who are used to thinking about price indexes as summarizing prices at different points of

time, it is perhaps useful to add a few words of explanation about our use of the Paasche (and later Laspeyres)

labels for the price indexes used here. When we are working with a single cross-sectional household survey,

the price variation is not temporal but spatial; people who live in different parts of the country pay different

prices for comparable goods. (If we have two surveys for the same country at different times, or if the survey is

spread over months or years, the variation will be both temporal and spatial.) In industrialized countries, where

transportation is easy and inexpensive, and there are integrated distribution systems for most consumer goods,

spatial price variation is small, housing being the major exception. But in many developing countries, spatial

price differences can be large, in both relative and absolute prices, and it is important to take them into account.

In the temporal context, a Paasche price index is one whose (quantity) weights relate to the current period,

rather than the base period. In the current spatial context, the "current period" is replaced by the "household

under consideration", whose purchases are used to weight the prices it faces relative to some base or reference

prices. Perhaps the major practical point about (2.5) is that the weights for the prices differ from household to

household so that, for example, two households in the same village, buying their goods in the same markets,

and facing the same prices, will have different price indexes if they have different tastes or incomes. At first

sight, such a situation may seem hopelessly complicated. But the transparency is restored if we think of money

metric utility as (2.4), the household's consumption bundle priced at fixed prices, and if we recognize that

(2.6), the deflation of nominal expenditure by a Paasche index with household specific weights, is simply a

means of calculating the constant price total.
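
The role of household-specific weights can be illustrated with a minimal sketch of (2.5) and (2.6). The two goods, the prices, and the bundles are hypothetical: both households face the same village prices and spend the same nominal amount, yet their Paasche indexes differ because their bundles differ.

```python
def inner(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def paasche_index(prices_h, prices_0, quantities_h):
    """Equation (2.5): (p^h . q^h) / (p^0 . q^h), weighted by the household's own bundle."""
    return inner(prices_h, quantities_h) / inner(prices_0, quantities_h)

village_prices = [2.0, 5.0]    # hypothetical local prices, identical for both households
reference_prices = [2.5, 4.0]  # hypothetical reference prices p^0

bundles = {                    # hypothetical quantities of the two goods
    "household_a": [30.0, 8.0],
    "household_b": [10.0, 16.0],
}

results = {}
for name, q in bundles.items():
    x = inner(village_prices, q)                       # nominal expenditure x^h
    index = paasche_index(village_prices, reference_prices, q)
    results[name] = (x, index, x / index)              # (2.6): deflated expenditure
    print(f"{name}: x = {x:.1f}, Paasche = {index:.4f}, x / Paasche = {x / index:.1f}")
```

In each case the deflated figure equals the household's bundle valued at the reference prices, which is the constant price total in (2.4).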

Deriving total expenditure and dividing it by a price index is our basic strategy for using LSMS

consumption data to measure welfare. In practice, there are myriad adjustments and approximations to be

made, and there are cases where the conceptual framework has to be (slightly) extended. We deal with the most

important of these in the rest of this section. Before doing so, however, we must discuss a potential problem

with money metric utility, and an alternative approach.

2.3 An alternative approach: welfare ratios:

One of the important uses of measures of standard of living is to support policy, particularly policy

where distribution is an issue. In particular, much policy is conducted on the basis that transfers of money are

more valuable the lower in the distribution is the recipient. This may take the form of a focus on poverty where

the poor are given preference over the non-poor, or it may be more sophisticated, involving distributional

weights that decline as we look at people with higher standards of living. Blackorby and Donaldson (1988)

have shown that the use of money metric utility can cause difficulties in this context. To see the problem, start

by assuming that total household expenditure (or income) x is a satisfactory measure of living standards,

something that would be true if everyone faced the same prices, and everyone lived alone, or at least in

households of the same size and composition. Differences in x then correspond directly to differences

in welfare, so that policymakers who are averse to inequality can work under the assumption that increases in x

have a lower social marginal value the higher in the distribution is the recipient. But money metric utility is not

x, but a function of x. As Figure 1 makes clear, money-metric utility is higher the higher is x, so that more

money corresponds to a higher indifference curve and standard of living. But what Blackorby and Donaldson

show is that, special cases apart, money metric utility is not a concave function of x, so that the rate at which

money metric utility increases with x can be constant, decreasing, or increasing, and that, in general, which is

the case depends on the choice of the reference price vector p° . This has the effect of breaking any close link

between redistributive policy and the measurement of its effects. For example, suppose that a change in

policy-for example, a transfer policy-has the effect of transferring money from better-off to worse-off

households, so that the distribution of money income has become more equal. But because we do not know

exactly how money metric utility is linked to money, there is no guarantee that the distribution of money metric

utility has also narrowed. So we have lost the ability to monitor the distributional effects of policy, and what we

get when we try will be different at different choices of reference prices $p^0$. Since we are often forced to use

whatever prices are available to us, we may not even be able to control the outcome.

In order to avoid these problems, Blackorby and Donaldson (1987) have proposed the use of a

"welfare ratio" measure in place of money-metric utility; within the Bank, the use of welfare ratios is reviewed

by Ravallion (1998). The basic idea is to express the standard of living relative to a baseline indifference curve.

In poverty analysis, a natural (and useful) choice is the poverty indifference curve, the level of living that marks

the boundary between being poor and non-poor. The welfare ratio is then the ratio of the household's

expenditure to the expenditure required to reach the poverty indifference curve, both expressed at the prices

faced by the household. Once again, Figure 1 can serve to illustrate. If II is taken to be the poverty indifference

curve, and JJ the indifference curve we are trying to measure, then, provided the price lines are taken to

illustrate current, not reference, prices, the welfare ratio is OD/OC or (equivalently) OD'/OC'. In terms of the

cost functions, the ratio is given by

$$wr^h = \frac{c(u^h, p^h)}{c(u^z, p^h)} \tag{2.7}$$

where $u^z$ is the utility poverty-line, the utility corresponding to the poverty indifference curve.

Unlike money metric utility, which is a money measure (the minimum amount of money needed to

reach an indifference curve), the welfare ratio is a pure number: the standard of living as a multiple of the

poverty line. In practice, it is useful to convert the welfare ratio into a money measure, and again the obvious

procedure is to multiply the ratio by the poverty line, defined as the cost of obtaining poverty utility at

reference prices, $c(u^z, p^0)$. This gives the welfare ratio measure, which we denote by $u_r^h$:

$$u_r^h = \frac{c(u^h, p^h)}{c(u^z, p^h)} \times c(u^z, p^0) = \frac{x^h}{c(u^z, p^h) / c(u^z, p^0)} \tag{2.8}$$

Like the money metric utility measure, (2.8) is total expenditure x divided by a price index, in this

case the true cost of living index of $p^h$ versus $p^0$ computed at the poverty line indifference curve. This cost-of-living

price index would normally be approximated by the Laspeyres index

$$P_L^h = \frac{p^h \cdot q^z}{p^0 \cdot q^z} = \sum_i \frac{p_i^0 q_i^z}{p^0 \cdot q^z} \left( \frac{p_i^h}{p_i^0} \right) = \sum_i w_i^z \left( \frac{p_i^h}{p_i^0} \right) \tag{2.9}$$

where $q_i^z$ is the quantity of good $i$ consumed at the poverty line and the weights $w_i^z$ are the shares of the budget at

the poverty line indifference curve and prices $p^0$. Putting (2.8) and (2.9) together, we get an expression for the

money version of the welfare ratio that corresponds to (2.6) for money metric utility

$$ u_r^h \approx \frac{x^h}{P_L^h} \qquad (2.10) $$

If we compare (2.6) and (2.10), we see that money metric utility involves deflation of expenditure by a

Paasche index of prices, while the welfare ratio measure involves deflation of expenditure by a Laspeyres price

index. (The calculation of the poverty-line weights in (2.9) will be discussed in Section 4.)
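
As a minimal sketch of the Laspeyres deflation just described, the following Python fragment computes a fixed-weight index from poverty-line budget shares and price relatives, and then divides total expenditure by it. The goods, prices, shares, and function names are hypothetical and chosen only for illustration; they do not come from any survey.

```python
def laspeyres_index(poverty_shares, local_prices, reference_prices):
    """Fixed-weight index: sum over goods of w_i^z * (p_i^h / p_i^0)."""
    return sum(
        share * local_prices[good] / reference_prices[good]
        for good, share in poverty_shares.items()
    )


def welfare_ratio_measure(expenditure, poverty_shares, local_prices, reference_prices):
    """Money version of the welfare ratio: x^h divided by the Laspeyres index."""
    return expenditure / laspeyres_index(poverty_shares, local_prices, reference_prices)


# Hypothetical inputs (assumptions, not data).
reference_prices = {"rice": 10.0, "fuel": 5.0}   # p^0
poverty_shares = {"rice": 0.7, "fuel": 0.3}      # w^z, budget shares at the poverty line
local_prices = {"rice": 12.0, "fuel": 5.0}       # p^h, prices faced by household h

index = laspeyres_index(poverty_shares, local_prices, reference_prices)
u_r = welfare_ratio_measure(570.0, poverty_shares, local_prices, reference_prices)
u_r_doubled = welfare_ratio_measure(1140.0, poverty_shares, local_prices, reference_prices)

print(round(index, 4), round(u_r, 2), round(u_r_doubled, 2))
```

Because the weights are fixed at the poverty line, the index does not depend on the household's own expenditure, so doubling expenditure doubles the measure.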

In some applications, such as in comparing national price indexes at two moments of time, Paasche

and Laspeyres price indexes are close to one another, either because the two sets of weights are similar in the

two periods, or because relative prices are similar. In the current context, where we are most often interested in

comparing prices between different places, where both weights and relative prices are often quite different, the

Paasche and Laspeyres price indexes will also be different, as will therefore be money metric utility and

welfare ratio measures. On the theoretical side, the point to note is that the Laspeyres index in (2.10) is

computed at the poverty indifference curve, so that its weights (see also 2.9) are unaffected by changes in total

expenditure of household h. As a result, $u_r^h$ is proportional to $x^h$, and there is a direct link between

redistributive policy and the measurement of its effects. Welfare ratios resolve the difficulties of using money-

metric utility to monitor the outcomes of distributionally sensitive policies. On the empirical side, the Paasche

and Laspeyres indexes will be close to one another when the price relatives are close to one another over

different goods and services, or when the weights applied to them are the same at the base, in this case the

poverty line, as for other households in the survey. But there is no reason to suppose that either will be true in

cross-sectional surveys. Regional price differences are often markedly different across goods depending on

agricultural zones or distance from the ocean, and expenditure patterns differ sharply over households of

different types, or even across households that have much the same observable characteristics. In practice, as

well as in theory, the money-metric and welfare-ratio approaches are likely to give different answers.

How do we choose between the two approaches to welfare measurement? As we have presented it so

far, the balance seems to favor the welfare ratio approach. It is simpler to calculate, since the weights for the

price index are the same for everyone, and it has a straightforward theoretical link to total expenditure, which

facilitates distributional analysis. It is also clear from conversations with Bank staff that deflation of an

expenditure measure by a fixed weight Laspeyres index is a procedure that is both simple and transparent and

that could be explained and defended to policymakers. For some, those benefits are likely to be decisive.

Nevertheless, the welfare ratio approach is not without its own Achilles heel. As Blackorby and Donaldson

show, welfare ratios do not necessarily indicate welfare correctly. It is possible for a policy to make someone

better off, and yet to decrease their welfare ratio. This cannot happen for money metric utility, no matter which

set of reference prices are used in the evaluation. So while money metric utility is more problematic for

distributional calculations, the welfare ratio approach throws out at least some of the baby along with the

bathwater. Our own choice is to stick with money metric utility, and we recommend at least trying to calculate the

relevant Paasche indexes as discussed in Section 4. For the sake of transparency and simplicity,

we recommend describing money metric utility according to (2.4) where each household's bundle of goods and

services is evaluated, not at the prices they paid, but at a common set of prices. It is also worth noting that,

given the difficulties of calculating prices and price indexes in practice, as well as the much graver conceptual

and practical problems of dealing with differences in household size and composition, see Section 5, the choice

between money metric and welfare ratio utility is likely to be only one of several difficult decisions, and may

not be of paramount importance.

2.4 Income versus consumption:

Among economic measures of living standards, the main competitor to a consumption-based measure

is a measure based on income. In most industrialized countries, including the U.S., living standards and

poverty are assessed with reference to income, not consumption. This tradition is followed in much of Latin

America, where many household surveys make no attempt to collect consumption data. By contrast, most

Asian surveys, including the Indian NSS and the Indonesian SUSENAS, have always collected detailed

consumption data, and are thus closer in spirit to LSMS surveys. There are both theoretical and practical

reasons that must be considered when making the choice to use income or consumption to measure living

standards.

In the theory outlined in the previous subsection, the choice between income and consumption did not

arise because, in a single period model, there is no distinction; all income is consumed, and income and total

consumption are identical. With more than one period, the difference between income and consumption is

saving, or dissaving, so that in terms of the theory, the choice between income and consumption is tied to the

choice of the period over which we want to measure welfare. Over a long enough period of time, such as a

lifetime, and provided that we work in present value terms, the average level of consumption (including any

bequests) must equal the average level of income (including any inheritances), so that, if the concern is to

measure lifetime welfare, the choice does not matter. There is indeed a case to be made for working with a

lifetime measure. Many would argue that inequality is overstated by including the component that comes from

the variation in living standards with age. According to this view, there is no inequality if, over life, everyone

gets their turn to be relatively rich or relatively poor. But the argument for abolishing the concept of age-related

poverty is weaker; and policymakers (and their constituents) frequently show concern about child and old-age

poverty. Even so, few would argue for very short reference periods for living standards; that someone is "poor"

for a day or two is of little concern, since most people have ways of tiding themselves over such short periods.

There is more concern about seasonal poverty, especially in agricultural societies with limited or very

expensive credit availability. But most standard household surveys are not designed to capture seasonal

fluctuations in income or expenditure, and most anti-poverty policies are directed at longer term levels of

living. On balance, and for most purposes, there is widespread agreement that a year is a sensible practical

compromise for the measurement of welfare. In consequence, we must decide whether it is consumption,

income, or wealth, or some combination of all three, that permits the best measure of living standards over a

year.

The empirical literature on the relationship between income and consumption has established, for both

rich and poor countries, that consumption is not closely tied to short-term fluctuations in income, and that

consumption is smoother and less variable than income. Extreme versions of the smoothing story involve

people evening out their resources over a lifetime, something for which there is little convincing evidence. But

there is good evidence that consumers smooth out fluctuations in income in the short run, certainly over

seasons, and in most cases, over a few years. As a result, in circumstances where income fluctuates a great deal

from year to year, as in rural agriculture, the ranking of households by income will usually be much less

stable than the ranking by consumption, though exceptions can occur as discussed in Chaudhuri and Ravallion

(1994). Even limited smoothing gives consumption a practical advantage over income in the measurement of

living standards because observing consumption over a relatively short period, even a week or two, will tell us

a great deal more about annual (or even longer period) living standards than will a similar observation on

income. Although consumption has seasonal components (for example, those associated with holidays and

festivals), they are of smaller amplitude than seasonal fluctuations in income in agricultural societies. In such

communities, it is usually not possible to get a useful measure of living standards based on income without

multiple seasonal visits to the household, something that has rarely been attempted within LSMS protocols. In

seasons when people have little or no income, their consumption is financed from assets, or from credit, so that

an alternative way to measuring living standards without consumption data would be to gather data on income

and assets. But assets are typically difficult to measure accurately, so that this is not usually a practical

alternative.
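
The practical advantage of consumption can be illustrated with a deliberately stylized sketch. It assumes two hypothetical households, one with income concentrated at harvest and one with a steady wage, and it assumes perfect smoothing within the year, which is stronger than anything claimed in the text. All names and numbers are invented for illustration.

```python
def smoothed_consumption(seasonal_income):
    """Assumed behavior: annual income is spread evenly over the seasons."""
    annual = sum(seasonal_income)
    return [annual / len(seasonal_income)] * len(seasonal_income)


# Hypothetical seasonal incomes over four seasons.
farm_income = [0.0, 0.0, 0.0, 400.0]      # annual total 400, all at harvest
wage_income = [80.0, 80.0, 80.0, 80.0]    # annual total 320, steady

farm_consumption = smoothed_consumption(farm_income)
wage_consumption = smoothed_consumption(wage_income)

season = 0  # a single survey visit in a lean season
income_says_farm_better_off = farm_income[season] > wage_income[season]
consumption_says_farm_better_off = farm_consumption[season] > wage_consumption[season]
annual_says_farm_better_off = sum(farm_income) > sum(wage_income)

print(income_says_farm_better_off, consumption_says_farm_better_off, annual_says_farm_better_off)
```

In this sketch a single lean-season visit ranks the farm household below the wage household by income, while consumption observed in the same visit agrees with the annual ranking.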

There are also several reasons why it is more practical to gather consumption than income data in

most countries where an LSMS is being run. Where self-employment, including small business and agriculture,

is common, it is notoriously difficult to gather accurate income data, or indeed to separate business transactions

from consumption transactions. Income from self-employment is hard to measure in industrialized countries

too, but self-employment is rarer relative to wage income, so that, for most households, a fairly accurate picture

of household income can be obtained from only a few questions covering different types of income. In the

U.S., it costs five times as much per household to collect consumption (and other) information in the Consumer

Expenditure Survey (CEX) as it does to collect income (and other) data in the Current Population Survey

(CPS). As a result, the CPS can be much larger than the CEX, and it is the former that is used for poverty

statistics because of the greater regional and racial disaggregation that the larger sample can support. In

developing countries, the calculation of income often requires the measurement of all own-account

transactions, sometimes with multiple visits, as well as a host of assumptions about such matters as the

depreciation of tools or animals. Consumption data are expensive to collect in poor countries as in rich, but the

concepts are clearer, the protocols are well-understood, and less imputation is required. Perhaps in

consequence, there is a long tradition of successful and well-validated consumption surveys in developing

countries.

One argument that can be made for income is that it is often possible to assign particular sources of

income to particular members of the household; for example, earnings from the market can be attributed to the

individual who did the work, and pensions are typically "owned" by an identifiable member of the household.

By contrast, consumption is only occasionally measured for individual household members. While many

authors have made good use of such income data to study allocation within the household, and

to examine the effects of who "owns" the income on purchases, it should be clear that there is no very clear

link between individual welfare and individual income. Earners or pensioners share their incomes with

non-earners and non-pensioners, so that the attribution of individual welfare from individual income requires some

sort of imputation scheme, just as it does for consumption. Although we shall discuss issues of how to adjust

welfare for household size and composition in Section 5 below, we provide no guidance on how to use survey

data on either consumption or income to study the allocation of resources within the household. Such

allocations are often best studied through other measures, for example anthropometric or educational status,

though there is an extensive (though only occasionally successful) literature on using household consumption

data to make inferences about intrahousehold allocation, see Deaton (1997, Chapter 3) for a review and

discussion.

2.5 Durable goods:

Because durable goods last for several years, and because it is clearly not the purchase of durables that

is the relevant component of welfare, they require special treatment when calculating total expenditure. It is the

use of a durable good that contributes to welfare, but since use is rarely observed directly, it is typically

assumed to be proportional to the stock of the good held by the household. In consequence, when we add up

total household expenditures during the year, we add to expenditures on non-durables the annual cost of

holding the stock of each durable. This cost is estimated from a conceptual experiment in which we imagine

the household buying the durable good at the beginning of each year, and then selling it again at year's end.

The costs of doing this depend on the price at the beginning of the year, $p_t$, say, its price at the end of the

year, $p_{t+1}$, on the nominal interest rate, $r_t$, which is the cost of having money tied up in the good for the year,

and on the extent to which the durable good deteriorates during the year. Deterioration is modeled by means of

the simple assumption that the quantity of the good is subject to "radioactive decay," so that, if the household

starts off the year with the amount $S_t$, it will have an amount $(1 - \delta) S_t$ to sell back at the end of the year.

Seen from the beginning of the year, the sales at the end of the year must be deflated to put them on discounted

present value terms so that, in today's money, the discounted present cost (negative profit) of the transaction is

$$ S_t p_t - \frac{S_t (1 - \delta) p_{t+1}}{1 + r_t} \qquad (2.11) $$

so that the cost of maintaining the stock, which is what we need in order to add up total expenditure, is

approximately (provided the interest rate and depreciation rate are small)

$$ S_t p_t (r_t - \pi_t + \delta) \qquad (2.12) $$

where $\pi_t$ is the rate of inflation of the durable good price, $(p_{t+1} - p_t) / p_t$. If it is assumed that the rate of

inflation of the durable good is the same as that of other goods, the first two terms in the bracket give the real

rate of interest, so that the annual cost of holding a durable good is its current value multiplied by the

sum of the real interest rate and its rate of deterioration. This is typically referred to as "user cost" or, since it

would be the rental charge for the durable in a competitive market, as the "rental equivalent." In Section 3.4

below, we discuss how the elements of (2.12) are computed from the LSMS data.
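
To see how close the user-cost approximation is to the exact discounted cost of the buy-and-resell experiment, consider the following minimal sketch. It assumes a single hypothetical durable, with invented values for the price, inflation, interest, and depreciation rates; the function names are illustrative and are not part of any LSMS procedure.

```python
def exact_user_cost(stock, price_now, price_next, interest, depreciation):
    """Cost of buying the stock now and reselling what is left in a year,
    with the resale value discounted back at the nominal interest rate."""
    resale_value = stock * (1.0 - depreciation) * price_next / (1.0 + interest)
    return stock * price_now - resale_value


def approx_user_cost(stock, price_now, price_next, interest, depreciation):
    """Approximation: stock value times (interest - inflation + depreciation)."""
    inflation = (price_next - price_now) / price_now
    return stock * price_now * (interest - inflation + depreciation)


# Hypothetical durable: worth 1000 today, 1050 next year (5% inflation),
# 8% nominal interest, 10% annual depreciation.
exact = exact_user_cost(1.0, 1000.0, 1050.0, interest=0.08, depreciation=0.10)
approx = approx_user_cost(1.0, 1000.0, 1050.0, interest=0.08, depreciation=0.10)

print(round(exact, 2), round(approx, 2))
```

With these assumed rates the two figures differ by a few percent; the gap widens as the interest, inflation, and depreciation rates grow.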

Note that the approach based on user cost makes no allowance for the (often considerable) transactions

costs involved in buying and selling durable goods, particularly used durable goods. Such costs mean that

households cannot easily take advantage of temporarily high real interest rates by reallocating their portfolios

away from durables and holding money or other assets. Given this, it is important not to make user cost too

sensitive to market fluctuations in real interest rates, and this can be accomplished by using, not today's real

interest rate, but some average computed over a number of years.

One of the most important durable goods for many households is housing itself. Many people rent their

accommodation, in which case the "rental equivalent" is actual rent, which is gathered in the surveys and

added to the consumption total. For those who own their housing, the method for other durables can

sometimes be used, if people have some idea of what their house is worth, or the rental rate can be imputed by

observing the rental costs of similar units. In Section 3.5 below, we discuss how this is calculated from the data

gathered in LSMS surveys.

2.6 The evaluation of time and leisure:

It is often pointed out that people's levels of living depend, not only on how much they spend, but also

on the amount of leisure they have, so that using a pure consumption measure could be misleading. For

example, if two people have the same income and expenditure, but one has a two hour daily commute to get to

work, and the other none, they are not equally well off. Similarly, single-parent households with children are

likely to be short of non-market time compared with two-parent households with the same income and

expenditure. Adding in an allowance for the value of leisure or of non-market work could eliminate these

anomalies.

The theory in Section 2.2 can readily be extended to tell us what to do. In the single period model,

where work is available at a constant wage rate w, the budget constraint becomes

$$ p \cdot q = w(T - \ell) + y \qquad (2.13) $$

where T is the total time endowment, $\ell$ is time spent in leisure, and y is income that is not associated with time

in the market. Rewriting this gives

$$ p \cdot q + w\ell = wT + y \qquad (2.14) $$

so that leisure takes its place with the other goods, with price w, and the budget constraint says that

expenditures on all goods, including leisure, must be no more than "full income," defined as non-market

income plus the value of the time endowment. Leisure can then be incorporated into the welfare measure by

working not with expenditure on goods, x, but with expenditure on goods and leisure together.

This is correct as far as it goes, but if welfare measurement stops here, simply replacing expenditure

with full expenditure, a serious error will have been made. In the theory at the beginning of this section, money

metric and welfare ratio utility were measured, not by expenditures x, but by x divided by a price index. In

those situations where the prices of goods do not differ much across households, which apart perhaps from

housing is the normal situation in industrialized countries, a welfare ranking of households according to x will

be very similar to a welfare ranking according to x deflated by the price index. But once leisure is introduced,

the situation is quite different, because the price of leisure, the wage rate, differs across people. Rankings by

full expenditure are therefore very different from rankings by deflated full expenditure, where the deflator

includes the wage as one of the prices. By the failure to deflate, the welfare of high wage people is overstated,

and the welfare of low wage people understated. A high wage rate not only makes the time endowment more

valuable (which is taken into account in full income or full expenditure) but it also makes leisure more

expensive (which is not). It is incorrect to assess individual or household welfare levels using full income or

full expenditure without deflation.
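
A small numerical sketch may help fix ideas. It compares two hypothetical people who face the same goods price but different wages, first by full income (the wage times the time endowment, plus non-labor income) and then by full income deflated by a price index that includes the wage. The fixed-weight form of the deflator, the 50 percent leisure share, the reference wage, and the 16-hour endowment are all assumptions made for this example only.

```python
def full_income(wage, time_endowment, nonlabor_income):
    """Full income: value of the time endowment plus non-labor income."""
    return wage * time_endowment + nonlabor_income


def fixed_weight_deflator(wage, goods_price, ref_wage, ref_goods_price, leisure_share):
    """Assumed fixed-weight index over goods and leisure, with the wage as the price of leisure."""
    goods_share = 1.0 - leisure_share
    return goods_share * goods_price / ref_goods_price + leisure_share * wage / ref_wage


T = 16.0  # assumed daily time endowment, in hours

# Person A: high wage, no non-labor income. Person B: low wage, some non-labor income.
a_full = full_income(wage=10.0, time_endowment=T, nonlabor_income=0.0)
b_full = full_income(wage=2.0, time_endowment=T, nonlabor_income=100.0)

a_deflated = a_full / fixed_weight_deflator(10.0, 1.0, ref_wage=5.0, ref_goods_price=1.0, leisure_share=0.5)
b_deflated = b_full / fixed_weight_deflator(2.0, 1.0, ref_wage=5.0, ref_goods_price=1.0, leisure_share=0.5)

print(a_full, b_full, round(a_deflated, 2), round(b_deflated, 2))
```

Under these assumptions the high-wage person ranks first by undeflated full income and second once the wage enters the deflator, which is the kind of reversal the argument warns about.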

Suppose that the error is avoided, and a price index including the wage is constructed which is then

used to deflate full expenditures. In some circumstances, the resulting welfare measure will be better than one

based on expenditures ignoring leisure. But there are also a number of problems that cause us not to

recommend this procedure in general. The first is that the results are sensitive to the value assumed for the

time endowment, T: should this be 24 hours for each day, or should it be something less, to allow for sleep and

"minimal personal maintenance?" More serious still is the real possibility that the simple model of labor supply

that underlies the calculations may be at odds with the facts. For example, suppose that we find an adult in the

survey who does not work. According to the model, this person is voluntarily allocating resources to leisure,

and although we don't observe that person's wage (because he or she is not working) we can impute some

value based on the person's education and experience, or using the wages received by other similar people who

are working. But this person might be unemployed, and unable to find work, or maybe able to find work only

at wages that are much lower than those who are working, and whose wages we are using to value "leisure." It

adds insult to injury to class unemployed people as well-off by imputing to them a value of leisure based on

wages in a formal sector to which they have no access.

Because of these dangers, we believe that the attempt to value leisure introduces more problems than it

is likely to solve, and may compromise the integrity and general credibility of the welfare measures produced

from the survey data. Of course, we are not disputing that leisure is valuable, nor that there will be specific

cases where assigning some value to it will generate useful supplementary evidence on levels of living. Indeed,

time-use data, when available, are a valuable complement to consumption aggregates for studying welfare.

They allow us to identify those-such as people who must travel long distances to work, or women who must

combine childcare with market work-whose welfare is incorrectly assessed by their consumption alone, and

permit at least rough-and-ready corrections in circumstances where such cases are a focus of interest.

2.7 Public goods and publicly supplied goods:

Another important contribution to living standards that is ignored by private consumption is that made

by publicly provided goods, the most important of which are education and health, but which also include such

things as police, water, sanitation, justice, public parks, and national defense. The major problem with

including these is finding a set of prices (or shadow prices) that reflect what they are worth to each household.

One approach to estimating prices is to look for effects of the provision of public goods on the demand for

private goods. For example, we might be able to assess the value of a new public clinic by seeing how much

less people spend on private doctors or clinics. But it is clear that this line of investigation, although useful in

some cases, cannot work in general. If the publicly provided good is separable in preferences from private

consumption, or if part of it is separable, changes in the provision of the former (or in its separable part) will

have no effect on the latter. In consequence, there is no hope of computing the full shadow price based on

observable behavior. The other approach, which has recently become popular in the project evaluation

literature, is to ask people how much they would be prepared to pay for an additional unit of the good. Whether

such "contingent valuation" procedures yield useful numbers remains controversial among both economists

and psychologists, see Hanemann (1994) for the arguments in favor, and Diamond and Hausman (1994) for the

(much more convincing) arguments against. As with the imputation of leisure, we believe that imputations for

public goods are likely to compromise the credibility and usefulness of welfare measures in general. None of

which gainsays the fact that the documentation of who gets access to publicly provided goods and services, and

whether these people are poor or rich, remains an important element in any overall assessment of living

standards and poverty.

It should be noted that there are some cases where the necessity to make some allowance for public

goods cannot be avoided. The most obvious case is when making international comparisons where in one

country, some good (health and housing are the obvious examples) is publicly provided or subsidized, while

in the other it is obtained through the market. Even within a country, urban residents may have access to

subsidized hospitals, clinics, or "fair price" shops that are not available in the countryside. Given the

difficulties of measurement, and the variety of possible cases, it is impossible to make useful general

recommendations about how imputations might be done. It will sometimes be enough to be aware of the

problem and its implication for certain types of welfare comparisons; in other cases, it will be necessary to try

to revalue consumption at international or unsubsidized prices, even if such imputations carry a large margin of

error.

2.8 Farm households:

Many households in developing countries are not only consumers of goods and services, but also

producers. Many people have small, own-account businesses, and many more are farm-households who produce

goods, sometimes for the market, and sometimes for their own consumption. The standard approach to these

mixed entities is to split them into a consumption unit and a production unit. This can be done under the

conditions of the "separation" property, see Singh, Strauss, and Squire (1976). If markets are perfect, so that all

factors are perfectly homogeneous and can be bought and sold at fixed prices in unlimited quantities, then a

farm-household behaves exactly as if it were the sum of a farm, which maximizes profits at given market

prices, and a household, which chooses its consumption bundle so as to maximize its welfare at fixed prices

and subject to its income, including the profits from its farm. The assumptions of the separation theorem are

more obviously appropriate to the owners of an agribusiness who live in New York City than to most

subsistence farm households in developing countries, or elsewhere. Family labor is not the same as hired labor,

work may not always be available at "the" wage, and the costs of transport to and from work may reduce the

effective price of work on the home farm. All of these issues can be dealt with by suitable modifications of the

theory, but only at the cost of introducing shadow prices that are even more difficult to observe and to calculate

than the actual prices, the collection of which itself imposes considerable difficulty.

In practice, it is difficult to do better than to treat the farm and the household as

distinct units, and to value the sales from one to the other at some suitable prices. These prices are of course

not observed for the households for which they are required, but must be imputed from purchases of such

goods by other households, or from prices collected in the community questionnaire. This tends to be a very

approximate business, so that it is perhaps unreasonable to insist too strictly on abstract considerations.

Nevertheless, it is worth noting that market prices often include an element of transport and distribution costs

that should not be included when evaluating consumption from home production: "farm-gate" not "market"

prices are appropriate for imputation. It is also necessary to be careful about quality comparability: home

produce may (or may not) be of lower quality, and water from the local pond is certainly different from L'Eau

Perrier.
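
The farm-gate point can be sketched numerically. The fragment below assumes that a farm-gate price can be approximated by subtracting a transport and distribution margin from the observed market price; that subtraction, the margin itself, and all names and numbers are hypothetical simplifications for illustration rather than a prescribed imputation method.

```python
def farm_gate_price(market_price, transport_and_distribution_margin):
    """Assumed approximation: market price net of transport and distribution costs."""
    return market_price - transport_and_distribution_margin


def imputed_home_consumption(quantity, market_price, margin):
    """Value of home-produced consumption at the (approximate) farm-gate price."""
    return quantity * farm_gate_price(market_price, margin)


# Hypothetical household consuming 200 kg of its own maize in a year.
quantity_kg = 200.0
market_price = 12.0   # per kg, as observed in the nearest market
margin = 2.0          # per kg, assumed transport and distribution element

value_at_farm_gate = imputed_home_consumption(quantity_kg, market_price, margin)
value_at_market = quantity_kg * market_price

print(value_at_farm_gate, value_at_market)
```

Valuing the same quantity at the market price would overstate consumption from home production by the full amount of the margin.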

As we shall see below, imputations are typically rough and ready and subject to a good deal of

inaccuracy. In countries where a large fraction of food consumption comes from home production (see Table

3.1 for examples), imputations, and the role of the separation theorem, can generate considerable discomfort

with the resulting calculations. The methods of this paper make most sense where markets are active, and

where the standard neoclassical model is a good approximation to reality. For many non-monetized subsistence

economies, this is hardly the case. In such economies, the ratio of measurement to imputation is often quite

low, and there is a real question about whether we are "measuring" or "assuming." And even if imputations are

accurate on average, which would be assuming a great deal, they tend to be less variable than would be the

true data, so that their use tends to understate inequality and (in most cases) poverty. Money metric and

welfare-ratio measures of welfare were developed to measure living standards for households who obtain their

goods and services through the market and make the best choices that their incomes will permit given the

prices that they face. In peasant economies, this neoclassical model is often a poor approximation to reality,

and welfare measurement based on a consumption aggregate is unlikely to be either accurate or useful. Here

again, we have no useful counsel except to be aware of the issue, and sometimes to be prepared to concede

defeat.

2.9 Differences in tastes across people and households:

The theoretical framework of Section 2.2 works with a single set of preferences so that when we rank

different households according to their money metric utility, we are locating their different expenditure levels

on the same set of indifference curves. Since different people have different tastes, it is not clear why this is the

correct thing to do.

One argument is that there is little interest in evaluating any individual's welfare according to his or

her own lights, but that we need to know about the welfare of a reference person given the circumstances of the

individual. Hence, we need a reference set of preferences, as well as a reference set of prices. The answer to the

question "How well-off would John Doe be with household h's income?" is of more general interest than

allowing the idiosyncrasies of each person's tastes to affect the evaluation of his or her resources. For example,

[...] a given income [...], but we would hardly count someone as poor just because their

income did not match their greed. More seriously, altruists are not deemed to be rich because their neighbors

are rich nor, in the same circumstances, are the envious deemed to be poor.

Nevertheless, there are some taste factors that affect the translation of money into welfare for everyone,

and that are hard to ignore in assessing welfare. Health status is one; someone who has to pay

a great deal of money for life-saving surgery or simply to stay alive would not be deemed to be rich because of

such expenditure. But in practice, the most important taste-like factor that must be allowed for is household

size and composition. There is a useful analogy here with prices; prices, like needs, moderate the way in which

expenditures on each good generate welfare. If the price of rice is three times as high, 50 rupees can only buy a

third as much rice. Similarly, 50 rupees worth of rice buys only a third as much per person in a household of

three persons as in a household of one. According to this analogy, expenditure must not only be deflated by a

price index that reflects variations in the costs of goods and services, but it must also be deflated by some

measure of household size in order to assess individual welfare. Section 5 is concerned with how to construct

the appropriate measures.
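
The analogy between prices and household size can be made concrete with a minimal sketch. It assumes the simplest possible size adjustment, division by the number of household members; more refined adjustments are the subject of Section 5, and the function name and figures here are hypothetical.

```python
def real_per_capita_expenditure(expenditure, price_index, household_size):
    """Deflate expenditure by a price index and by household size (per capita assumption)."""
    return expenditure / (price_index * household_size)


# Hypothetical cases with the same nominal expenditure of 150 rupees.
three_people_base_prices = real_per_capita_expenditure(150.0, price_index=1.0, household_size=3)
one_person_triple_prices = real_per_capita_expenditure(150.0, price_index=3.0, household_size=1)
one_person_base_prices = real_per_capita_expenditure(150.0, price_index=1.0, household_size=1)

print(three_people_base_prices, one_person_triple_prices, one_person_base_prices)
```

Tripling prices and tripling household size have the same effect on the deflated measure, as in the rice example above.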

There is another issue about taste variation. This is the question of "regrettable necessities," goods and

services that yield no welfare in their own right, but that have to be purchased, for example, in order to earn

income. Work clothes or transport to work are obvious examples, and the argument is that such items should be

deducted from income rather than included in consumption. If this is not done, individuals with different

expenditures on regrettable necessities will not be correctly ranked if we rely only on their total consumption

inclusive of such expenditures. Again, the theoretical validity of such points should not blind us to the practical

difficulties. Transport to work is a regrettable necessity for someone who has little choice of where to work or

where to live, but is consumption for someone who chooses to live in a pleasant suburb. Out-of-pocket medical

expenses are a necessity for some, but a choice for others, as in [...] cosmetic medicine. It is hard to

see how guidelines could be constructed that would allow one and not the other. The issue here is essentially

the same as that facing a tax authority when deciding what expenses should be allowed as deductions against

income in the computation of income tax. While recognizing the occasional injustice, such authorities tend to

take a hard line on such deductions in order to avoid large scale abuse. Exactly the same arguments apply here.

Box 1. Summary of Theoretical Issues and Recommendations

Money Metric Utility (MMU) vs. Welfare Ratio (WR)

Issue: MMU is the amount required to sustain a level of living and requires that consumption be adjusted by a Paasche price index that reflects the prices the household faces and whose weights are different for each household. WR is an indication of how much better or worse off a household is than a reference household (say at the poverty line) and requires consumption to be adjusted by a Laspeyres price index that reflects the prices faced by the reference household but whose weights are the same for all households. The use of MMU can cause difficulties in assessing the impact of [...] policy but, on the other hand, WR does not necessarily represent welfare correctly. The latter is the more serious drawback in practice.

Recommendation: Attempt should be made to use Money Metric utility and to calculate the Paasche price indices with individual household weights.

Income vs. Consumption

Issue: Consumption is a theoretically more satisfactory measure of well-being. Income is used in industrial countries where self-employment is relatively rare so that most household income comes from a few sources, where annual income variation is low, and consumption data are relatively costly to gather. Consumption is less variable over the period of a year, much more stable than income in agricultural economies and makes it more reasonable to extrapolate from two weeks to a year for a survey household. When self-employment is common, income data is at least as expensive and as difficult to collect as are consumption data.

Recommendation: In most developing countries where LSMS and/or household expenditure surveys are available, consumption is the appropriate measure to use.

Durable Goods and Housing

Issue: A measure of use-value, not purchase, of durable goods is the right measure to include in the consumption aggregate from a welfare point of view.

Recommendation: Exclude expenditures; instead, calculate a rental equivalent / user cost for housing and durable goods owned by the household.

Time and Leisure

Issue: Households with more leisure time have a higher level of welfare than households with no leisure. However, valuing leisure for each individual is problematic. Furthermore, it is difficult to distinguish between leisure, non-market work for the household, and involuntary unemployment.

Recommendation: Omit time and leisure in the calculation of consumption.

Public Goods

Issue: Clearly presence of public goods such as hospitals and schools improves the welfare of nearby households more than that of households without good access to these services. However, estimating the value of those services is problematic. Households may choose private services even if public services are available. Contingent valuation of services that don't exist is sometimes used but is of questionable accuracy.

Recommendation: Do not include any valuation of public goods in the calculation of the household consumption aggregate.

Farm Households

Issue: It is possible to consider households as consumers separately from household businesses or farms in economies with active markets. In subsistence economies, this assumption is sometimes hard to justify; however trying to separate the producer from the consumer using estimates of farm-gate prices is the best strategy in practice. In countries where a large fraction of consumption comes from home production, and markets are less active, the evaluation of welfare becomes sensitive to difficult decisions about imputations, and should be regarded with caution.

Recommendation: Treat the farm household as a business selling to the household. Attempt to value produce at "farmgate" rather than "market" prices.

Differences in Tastes

Issue: Expenditure on regrettable necessities should, in theory, be excluded but in practice it is impossible reliably to distinguish between necessities and choices. Household size, however, is important and affects the household welfare associated with a given level of expenditure.

Recommendation: Include expenditure on items that may or may not be regrettable necessities. Adjust household expenditure to reflect household size.

3. CONSTRUCTING THE HOUSEHOLD CONSUMPTION AGGREGATE

3.1 Introduction:

Following the discussion of the basic theoretical framework implicit in using consumption as a

measure of welfare, this section provides specific guidelines that the analyst can follow to construct a nominal

consumption aggregate from a typical LSMS household survey. For the purposes of this paper, the procedures

followed in constructing the consumption aggregate from recent household surveys in the following countries

were reviewed in detail: Vietnam, Nepal, Ghana, the Kyrgyz Republic, Ecuador, South Africa, Panama, and Brazil.

One important preliminary issue should be emphasized, though it is one where it is hard to give any

very precise guidelines. This is the issue of data cleaning. In most cases, analysts who are constructing

consumption aggregates will be using a "clean" set of data that has already been subjected to the usual

consistency checks and elimination of gross outliers and coding errors. Nevertheless, experience has shown

that every new exercise reveals new problems with the data, and the construction of a consumption aggregate is

no exception. As we shall see, the construction of a consumption aggregate involves adding together a large

number of items, many but by no means all from the consumption section of the questionnaire. It is of the

greatest importance that the analyst check each of these items for the presence of "gross" outliers, typically by

graphing the data, for example using the oneway and box options in STATA. For inherently positive

quantities, it is often useful to do this in logs as well as in levels. Aggregates and sub-aggregates should

similarly be checked. Such checks often reveal, not only isolated outliers, but groups of outliers, for example if

the units have been misinterpreted for all observations in a cluster. Sometimes, outliers can clearly be attributed

to coding errors, as when the units have been misinterpreted, or where zeros have been added, and in such

cases it is routine to impute an average (or better median) value for other households in the same cluster or

region. In other cases, it is unclear whether the "outlier" is genuine or not, and the analyst must make a

judgment that balances the desirability of keeping any reasonable number against the risk of contaminating the

aggregate.
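As a rough illustration of the checks described above, the sketch below screens one item within one cluster in logs and replaces flagged values with the median of the remaining households. It is not the procedure used in any particular survey: the function names, the use of the median absolute deviation, and the 3.5 cut-off are all assumptions chosen for the example, and a flagged value should still be inspected by the analyst before it is replaced.

```python
import math
from statistics import median

def flag_gross_outliers(values, threshold=3.5):
    """Flag positive values whose log lies far from the median log.

    The distance is scaled by the median absolute deviation (MAD) of the
    logs. The 3.5 cut-off is an illustrative assumption, not a rule.
    """
    logs = [math.log(v) for v in values]
    centre = median(logs)
    mad = median(abs(x - centre) for x in logs)
    if mad == 0:
        return [False] * len(values)
    # 0.6745 rescales the MAD so that it is comparable to a standard deviation.
    return [abs(0.6745 * (x - centre) / mad) > threshold for x in logs]

def impute_cluster_median(values, flags):
    """Replace flagged values with the median of unflagged values in the cluster."""
    clean = [v for v, flagged in zip(values, flags) if not flagged]
    fill = median(clean)
    return [fill if flagged else v for v, flagged in zip(values, flags)]

# Hypothetical reported spending on one item in one cluster; the fifth value
# looks as if two zeros were added by mistake.
cluster_values = [120, 135, 110, 128, 13000, 125]
flags = flag_gross_outliers(cluster_values)
cleaned = impute_cluster_median(cluster_values, flags)
```

Working in logs keeps a single very large value from dominating the scale, which is why the text suggests inspecting inherently positive quantities in logs as well as in levels.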

In Table 3.1, the components of consumption are aggregated into four main classes: (i) food items, (ii)

non-food items, (iii) consumer durables, and (iv) housing. The relative importance of each of these classes in

the overall consumption aggregate depends on many factors, including the average level of income in the

country, prevalent tastes and norms, as well as the types of data collected in the survey. In this regard, it should

be noted that there was considerable variation in the design of questionnaires across the various countries, so

that the aggregates do not always include the same items. Nonetheless, the table is indicative of the order of

magnitude and relative importance of the sub-aggregates.

Table 3.1: Main components of the consumption aggregate

Share of consumption aggregate (per cent)

Sub-aggregate           Vietnam  Nepal  Ghana  Kyrgyz  Ecuador  S. Africa  Panama  Brazil
Survey year               [...]   1996  [...]    1996    [...]       1993   [...]   [...]

Food                       50.9   64.2   65.2    44.5     49.6       30.4    45.9    27.7
  Purchases (a)            34.1   29.0   44.4    33.4     44.3       28.2    39.8    21.0
  Home production (b)      16.8   35.2   20.8    11.1      5.3        2.2     6.1     6.7

Non-food items             28.7   19.4   28.0    22.5     29.1       45.1    45.8    32.0
  Education                 2.5    3.4    N/a     2.4      8.2        3.2     7.8     6.4
  Health                    5.7    3.2    N/a     1.0        .        1.7     0.9     4.5
  Other non-foods          20.5   12.8    N/a    19.1     20.9       40.2    37.1    21.1

Consumer durables          12.7    1.4    2.2     3.5      5.2          .     5.4   [...]

Housing                     7.7   15.1    2.5    29.6     16.0       24.5     2.8    40.2
  Rent                      5.9   12.6    1.7    17.6     12.1       15.6     2.1    31.4
  Utilities                 1.8    2.5    0.8    11.9      3.9        8.9     0.7     8.8

OVERALL                   100.0  100.0  100.0   100.0    100.0      100.0   100.0   100.0

GNP per capita ($) (c)      170    210    390     550    1,280      2,980   3,080   4,400

(a) Includes meals taken away from the home.
(b) Includes also food received from other household members, friends, and in the form of in-kind payments.
(c) GNP per capita is taken from international statistics for the same year of the survey, except for Panama where the latest available estimate is for 1996.

In general, as we would expect from Engel's law, the share of food items in the total tends to be

relatively more important the lower the level of income in the country. The share of home-production in the

food consumption aggregate tends to be higher in countries where relatively fewer transactions take place

through the market place (Nepal, Vietnam) compared to those countries where agricultural markets are

relatively well-developed (Ecuador, Panama, South Africa).

The share of consumption attributable to education and health also depends on the level of income of

the country, as well as the extent to which these services are purchased through the market, or else are provided

instead by the state at subsidized rates. A more detailed discussion of each of the main classes in the overall

consumption aggregate is taken up in the sections that follow:

3.2 Food consumption:

In principle, constructing a food consumption sub-aggregate is a straightforward aggregation exercise;

all that is needed are data on the total value of the various foods consumed in the reference period, or else on the

total quantities of different food items consumed as well as a reference set of prices at which to value them. In

practice, however, households consume food obtained from a variety of different sources, and so in computing

a measure of total food consumption to include as part of the aggregate welfare measure, it is important to

include food consumed by the household from all possible sources. In particular, this measure should include

not just (i) food purchased in the market place, including meals purchased away from home for consumption at

or away from home, but also (ii) food that is home-produced, (iii) food items received as gifts or remittances

from other households, as well as (iv) food received from employers as payment in-kind for services rendered.

In some cases where food can be and is stored over long periods of time, and where the questionnaire permits

it, "food consumed" can be distinguished from "food purchased". In principle, it is the value of the former that

should go into the consumption aggregate. A household that stocks up on cereals once every few months, and

whose purchase is caught by the survey, should not thereby be counted as well-off, nor should someone who

did not stock up in the survey period be counted as poor.

The food consumption module of most LSMS questionnaires typically contains separate sets of

questions on (a) purchased and (b) non-purchased food items. As can be seen from Table 3.1, the relative

importance of these two components in the food consumption sub-aggregate varies considerably by country: in

Nepal, home-produced food items constitute more than half of food consumption, while in South Africa they

comprise less than 10 per cent of food consumption. It is even more obvious that the extent of non-purchased

food varies within countries, particularly between rural and urban sectors, but also within rural areas according

to the level of living. As a result, failure to capture the value of consumption from home-production is likely to

overstate both poverty and inequality.

The food purchases module in LSMS questionnaires typically contains questions on purchases of a

fairly comprehensive list of food items (a) during a relatively short reference period, such as the last two weeks,

and / or (b) during a typical month in which such purchases were made. Data are often collected on the total

amount spent on purchasing each food item, and sometimes also on the quantities purchased, during the

specified reference period. Calculating the food purchases sub-aggregate involves converting all reported

expenditures on food items to a uniform reference period, say one year, and then aggregating these

expenditures across all food items purchased by the household.
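The conversion to a uniform reference period can be sketched as follows. The recall-period labels, the multipliers (26 two-week periods and 12 months per year), and the item records are assumptions made for illustration; an actual survey will define its own reference periods.

```python
# Number of reference periods in a year; the labels are hypothetical.
PERIODS_PER_YEAR = {"two_weeks": 26, "month": 12, "year": 1}

def annual_food_purchases(items):
    """Sum purchased-food spending after scaling each item to one year.

    Each item is a tuple (name, amount_spent, recall_period).
    """
    return sum(amount * PERIODS_PER_YEAR[period] for _, amount, period in items)

# Hypothetical household: rice reported for the last two weeks, oil for a
# typical month, and salt for the whole year.
household_items = [
    ("rice", 40.0, "two_weeks"),
    ("oil", 15.0, "month"),
    ("salt", 24.0, "year"),
]
annual_total = annual_food_purchases(household_items)
```

Keeping the multipliers in one table makes it easy to check that every item has been brought to the same annual basis before the items are added together.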

In surveys where information on food purchases has been collected for more than one recall period, the

question arises as to which of the two sources of information should be used. Note once again that, in these

guidelines, we are not concerned with how the data should be collected and what reference periods should be

used, but rather with the decisions that must be made by an analyst who is confronted with multiple measures

in an already completed survey. Consumption surveys, including LSMS surveys, have used several different

designs in collecting consumption data, from a single question about purchases over the last two weeks, to

multiple visits each with much shorter recall periods, to repeated visits over the year designed to capture

seasonal variations in consumption patterns. There is a large (but far from decisive) literature on the benefits and

costs of these different designs, much of which is reviewed in the context of LSMS surveys in Deaton and Grosh ([...]). If the data offer more than one measure, so that there is a choice, the analyst

should choose the alternative that is likely to provide the most accurate estimate of annual consumption for

each household, not for households on average. In perhaps the ideal (but most expensive) case, where in each

"season" the household has been visited on several occasions, estimates should be made of consumption in

each of the seasons, and the seasonal totals added to get annual consumption. In most surveys, this will not be

an option, and in the typical LSMS surveys, there is either no choice or a choice between a "last two

weeks" (or shorter period) measure, and a "usual month" measure. The literature reviewed in Deaton and

Grosh leads to a recommendation in favor of the latter over the former, at least for the present purpose. The

former tends to be biased by progressive forgetting, as well as the occasional intrusion of (especially well-

remembered) purchases from outside the period. The latter has the advantage of being closer to the concept that

we want-usual consumption is a better welfare measure than what actually happened in the last two weeks,

which could have been unusual for any number of reasons-and reduces problems with seasonality, but will

suffer from measurement error if respondents find it difficult to calculate a reasonable answer. In any case, and

whenever possible, data from very short reference periods should be avoided. Over a period of a day or two,

purchases are quite unrepresentative of consumption. Averaged over a large number of households, mean

purchases will still be accurate for mean consumption, but dispersion will be exaggerated, with consequent

exaggeration of inequality and (in normal cases) poverty. Consumption measures based on very short recall are

not suitable for the construction of consumption aggregates for welfare purposes.
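The point about very short reference periods can be illustrated with a small simulation. Everything in it is an assumption made for the sketch: households consume a steady daily amount, shop on any given day with probability 0.25, buy enough to last four days when they do, and face a poverty line of 7 units a day. Under those assumptions the mean of one-day purchases stays close to mean consumption, while dispersion and the measured poverty headcount are both inflated.

```python
import random
from statistics import mean, pstdev

def simulate_one_day_recall(n_households=20000, shop_prob=0.25, seed=0):
    """Return (true daily consumption, purchases recorded on one survey day)."""
    rng = random.Random(seed)
    # True daily consumption differs across households but is steady over time.
    true_daily = [rng.uniform(5.0, 15.0) for _ in range(n_households)]
    # A household shops on the survey day with probability shop_prob and then
    # buys enough to cover 1 / shop_prob days; otherwise it buys nothing.
    one_day = [c / shop_prob if rng.random() < shop_prob else 0.0
               for c in true_daily]
    return true_daily, one_day

def headcount(values, poverty_line):
    """Share of households measured below the poverty line."""
    return sum(v < poverty_line for v in values) / len(values)

true_daily, one_day = simulate_one_day_recall()
print("mean:", mean(true_daily), mean(one_day))
print("std dev:", pstdev(true_daily), pstdev(one_day))
print("headcount:", headcount(true_daily, 7.0), headcount(one_day, 7.0))
```

The simulation is deliberately stylised; its only purpose is to show why a measure that is accurate on average can still be a poor guide to the distribution across households.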

The total value of meals consumed outside the household (restaurants, prepared foods purchased from

the market place) should also be included in the food consumption aggregate, as should the value of meals

taken by household members at school, at work, or on vacation. Many LSMS surveys ask explicitly

about the total value of meals taken outside the home by all household members; this amount should also be

included in the food consumption aggregate. In some cases, however, it is impossible to disentangle

expenditure on some meals taken outside the home from other related (and more aggregate) non-food

expenditures such as miscellaneous schooling expenses, total expenditure on vacations, etc. reported elsewhere

in the questionnaire. This need not be cause for concern as long as these expenditures are included in the

overall aggregate household consumption measure in one form or another.

Almost all LSMS questionnaires contain a separate set of questions or module on consumption of

home-produced food items. Here it is more common to find questions only on the amount of home-produced

food items consumed in a typical month (rather than in the past 2 weeks), as well as the number of months each

food item is typically consumed in a year. Data are often collected on both the total value and quantity of

consumption of each home-produced food item. The home-production food sub-aggregate can thus be

calculated by adding the reported value of consumption of each of the home-produced food items in a manner

analogous to that followed in the case of food purchases.
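For home-produced food the annualization uses the number of months in which each item is consumed rather than a fixed multiplier. The sketch below assumes each record holds the value consumed in a typical month and the number of months per year; the function name and the example items are hypothetical.

```python
def annual_home_production(items):
    """Annual value of home-produced food consumed by one household.

    Each item is a tuple (name, value_in_typical_month, months_per_year).
    """
    total = 0.0
    for name, monthly_value, months in items:
        # A month count outside 0..12 points to a data-entry problem.
        if not 0 <= months <= 12:
            raise ValueError(f"{name}: months consumed must lie between 0 and 12")
        total += monthly_value * months
    return total

# Hypothetical household: maize eaten ten months a year, milk all year,
# mangoes only in a three-month season.
home_items = [("maize", 30.0, 10), ("milk", 12.0, 12), ("mangoes", 8.0, 3)]
home_total = annual_home_production(home_items)
```

Multiplying by months consumed, rather than by twelve, avoids overstating the value of seasonal items.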

In principle, it is possible to calculate the food home-production sub-aggregate using data on reported

quantities consumed in conjunction with prices from the food purchases section. However, as pointed out in

section 2.8 above, and to the extent possible, "farm-gate" prices should be used when imputing values to

home-produced food items. Moreover, home-produced food items consumed by the household may not be

comparable in quality to items traded in the market place. Households' own valuation of the amount they

would expect to receive (pay) if they had sold (bought) the home-produced food items that they consume are

therefore likely to be a much better approximation to their true "farm-gate" value, rather than estimates derived

using prevailing market prices from the food purchases section.

In most LSMS questionnaires, food received as payment in-kind, as well as in the form of gifts,

remittances, etc., are usually lumped together into one set of questions (usually on total value of consumption from this source), or else subsumed under the questions on home production. Consumption of food derived

from these sources should be added to the overall food aggregate, if it is not already implicitly included in the

home-produced food sub-aggregate described above.

In some cases, however, it may be that questions on consumption of home-produced food items are not

included in the questionnaires explicitly, so that data are available for consumption of purchased food items

only. In such cases, it may still be possible to use data from the agriculture section to derive an estimate of the

total value of home-produced food items. The section on crop production of most LSMS surveys typically

includes a question of the type: "How much of ..[crop].. did your household keep for consumption at home?"

as well as questions on dairy and other livestock products that the household consumed from its own

production, so this information, in conjunction with data on prices, can be used to calculate the total value of

home-produced food consumption.

For instance, in the case of the 1996 Kyrgyz Republic LSMS, consumption of home-produced crop

and animal products was calculated from the "Agro-Pastoral Activities" section of the questionnaire, because

the section on "Food Expenditure and Consumption" collected data on food purchases only. Exclusion of these

items from the food consumption aggregate would have resulted in underestimating average food consumption

by 30 per cent. Furthermore, because the share of home-produced food in rural areas was much higher than in

urban areas, the aggregate consumption measure would have resulted in seriously underestimating

the welfare of rural compared to urban households.

Because all LSMS surveys collect information on total value of the food item consumed (for both purchased

and non-purchased foods), the question of assigning monetary values does not arise. However, in

surveys where data are collected on both value as well as quantity of food items consumed, it may be that due to

interviewer error-or a variety of other reasons-we find households consuming non-zero quantities of a

particular item, but where data on the total value of consumption may be missing. In such instances, the

question arises as to what prices to use to value food consumption of these items - (i) average or median prices

calculated from the survey data for other households, (ii) prices from the price (community) questionnaire, or

else (iii) prices from some other external source?

Faced with a choice of prices, the best choice is usually the one that offers the closest approximation to

the amount actually paid. Except where there is a large choice of quality, the values reported by the household

are likely to be a better guide than market prices, if only because they record actual, not hypothetical

transactions. When such data are not available, the analyst can construct prices from the data for other households, and use the median (in preference to the mean, which is sensitive to outliers) price paid by

other households in the same cluster. When these data are not available, there is no choice but to use prices

reported by other households in the same sub-region, district, division, or province, depending on whichever is

the next higher level of aggregation for which price information is available. When making such substitutions,

great care must be exercised, particularly through checking that the prices being imputed are reasonable.
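The fallback from cluster to successively higher levels of aggregation can be sketched as below. The record layout, the level names, and the requirement of at least three observations before a median is trusted are assumptions made for the example. The function reports the level at which a price was found so that the analyst can review imputations made far from the household.

```python
from statistics import median

def impute_price(item, household, observations,
                 levels=("cluster", "district", "province"), min_obs=3):
    """Median unit value paid by other households at the narrowest usable level.

    Returns (price, level), or (None, None) if no level has enough data.
    min_obs is an illustrative assumption, not a recommended threshold.
    """
    for level in levels:
        prices = [o["unit_value"] for o in observations
                  if o["item"] == item
                  and o["household_id"] != household["household_id"]
                  and o[level] == household[level]]
        if len(prices) >= min_obs:
            return median(prices), level
    return None, None

# Hypothetical data: only two rice prices in cluster A, four in district D1.
observations = [
    {"household_id": 1, "item": "rice", "cluster": "A", "district": "D1",
     "province": "P", "unit_value": 10.0},
    {"household_id": 2, "item": "rice", "cluster": "A", "district": "D1",
     "province": "P", "unit_value": 12.0},
    {"household_id": 3, "item": "rice", "cluster": "B", "district": "D1",
     "province": "P", "unit_value": 11.0},
    {"household_id": 4, "item": "rice", "cluster": "B", "district": "D1",
     "province": "P", "unit_value": 95.0},
]
target = {"household_id": 9, "cluster": "A", "district": "D1", "province": "P"}
price, level = impute_price("rice", target, observations)
```

In the example the implausible value of 95 barely moves the district median, which is the reason for preferring the median to the mean. A lookup of this kind matches on item name only, so it cannot detect that two goods with the same label are in fact very different.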

Mechanical imputation can result in the matching of prices for goods that are in fact very different, with

catastrophic consequences for consumption aggregates. In one famous example, a survey imputed a value for

water collected by households from local wells by using the geographically nearest price for purchased water,

which in this case turned out to be imported bottled water from a French spa. By this remarkable imputation,

rural households were given living standards well in excess of their urban counterparts.

3.3 Consumption of non-food items:

LSMS questionnaires typically collect information on consumption of a wide range of non-food items.

For example, data are collected on consumption of daily-use items such as soap and cleaning supplies,

kerosene and petrol, newspapers, tobacco, stationery and supplies, recreational expenses and miscellaneous

personal care items, as well as other less frequently purchased items such as clothing, footwear, kitchen

equipment, household textiles such as sheets, curtains, bedcovers, etc., and other household use items. Data are

also collected on education and health expenditures for all household members. Expenditures on household

utilities are typically collected in the housing module, and for households that have small business enterprises,

that module can provide information on non-food items that were produced for home consumption. Finally,

these questionnaires typically also solicit information on other infrequent expenses such as legal fees and

expenses, home repair and improvements, taxes and levies, as well as expenditure on social ceremonies,

marriages, births, and funerals, etc.

The actual computation of an annual non-food consumption aggregate is straightforward. The

difficulties lie in the choice of which items to include. The choice depends not only on which data are

available, but also on the analytic objectives of the study being undertaken. However, there are a few general

issues that apply to most LSMS survey data and for the standard welfare analyses; these are taken up later in this section.

Unlike many homogeneous food items, most non-food goods are too heterogeneous to permit the

collection of information on quantities consumed-exceptions are some fuels, like kerosene or electricity, and

some transportation items-so that LSMS surveys collect data only on the value of non-foods purchased over

the reference period. Data on purchases of non-food items are often collected for different recall periods, for

example over the past 30 days, the past 3 months, or the past 12 months, depending on how frequently the

items concerned are typically purchased. Constructing the non-food aggregate thus entails converting all these

reported amounts to a uniform reference period-say one year-, and then aggregating across the various

items.

As far as singling out which non-food "expenditures" should be excluded from the consumption

aggregates, some choices are straightforward. Expenditures on taxes and levies are not part of consumption,

but a deduction from income, and should not be included in the consumption total. An apparent exception can

sometimes be argued for some local taxes, such as property taxes, that are used to provide local services, such

as schools, policing, or garbage collection. In some locations, these taxes bear no relation to services provided

and so should not be included in the consumption aggregate. But where such taxes are closely related to

services provided, households that are paying more tax are receiving more services, are better off as a result,

and [...] differences in public goods provision

between different households. Commodity taxes are included in the prices of goods, and so (correctly) find

their way into the consumption aggregate through the prices-though it is also possible to imagine using

reference prices for money metric utility that exclude commodity taxes. In any case, no special treatment is

required for commodity taxes. As we have already argued, expenditure on "regrettable necessities", such as

travel to work or work-related clothing, are best included, though business expenses related to the

operation of own-account business must be excluded. These distinctions are much more easily enunciated than

implemented; the welfare analyst faces much the same difficulties as does a tax inspector! Some surveys list as

"expenditures" items that are clearly capital account transactions, such as expenditures for a "saving club". All

purchases of financial assets, as well as repayments of debt, and interest payments should be excluded from the

consumption aggregate.
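The two steps described for non-food items, annualizing amounts reported over different recall periods and leaving out items that are not consumption, can be combined in a short sketch. The category labels, the recall-period keys, and the choice of 365/30 as the multiplier for a 30-day recall are assumptions for illustration; the exclusion list covers only the straightforward cases named in the text.

```python
# Multipliers that bring each recall period to one year (assumed labels).
RECALL_TO_ANNUAL = {"30_days": 365 / 30, "3_months": 4, "12_months": 1}

# Items treated as deductions from income, business costs, or capital
# account transactions rather than consumption (hypothetical category names).
EXCLUDED = {
    "taxes_and_levies",
    "business_expenses",
    "savings_club",
    "financial_assets",
    "debt_repayment",
    "interest_payment",
}

def annual_nonfood(records):
    """Annual non-food consumption for one household.

    Each record is a tuple (category, amount, recall_period).
    """
    return sum(amount * RECALL_TO_ANNUAL[recall]
               for category, amount, recall in records
               if category not in EXCLUDED)

# Hypothetical household records.
nonfood_records = [
    ("soap", 3.0, "30_days"),
    ("clothing", 50.0, "3_months"),
    ("kitchen_equipment", 40.0, "12_months"),
    ("taxes_and_levies", 100.0, "12_months"),
    ("savings_club", 20.0, "30_days"),
]
nonfood_total = annual_nonfood(nonfood_records)
```

Holding the exclusions in a single named set keeps the treatment of borderline items visible and easy to change when the analytic objectives differ.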

More complex is the case of "lumpy" and relatively infrequent expenditures such as marriages and

dowries, births, and funerals. While almost all households incur relatively large expenditures on these at some

stage, only a relatively small proportion of households are likely to make such expenditures during the

reference period typically covered by the survey. For instance, in the case of the LSMS survey conducted in

Pakistan (1991 PIHS), less than 2 per cent of households reported having made a dowry payment during the

past 12 months; however, such expenses constituted 20 per cent of their total annual consumption, Howes and

Zaidi (1994). Ideally, we would want to "smooth" these lumpy expenditures, spreading them over several

years, but lacking the information to do so-which might come, for example, by incorporating multi-year

reference periods for such items-we recommend leaving them out of the consumption aggregate. Note the

analogy with measurement error; [...] consumption aggregates

that include them can be thought of as "noisy" measures of the longer-run averaged totals that we would really

like to measure. In this sense, measurement error and lumpiness can be thought of together, and the techniques

we discuss in Section 6.4 below can be applied to both.

Expenditure on health is an often lumpy expenditure where a decision almost always has to be made.

One argument for exclusion is that such expenditure reflects a regrettable necessity that does nothing to

increase welfare. By including health expenditures for someone who has fallen sick, we register an increase in

welfare when, in fact, the opposite has occurred. The fundamental problem here is our inability to measure the

loss of welfare associated with being sick, and which is (presumably) ameliorated to some extent by health

expenditures. Including the latter without allowing for the former is clearly incorrect, though excluding health

expenditures altogether means that we miss the difference between two people, both of whom are sick, but only

one of whom pays for treatment. It is also true that some health expenditures-for example cosmetic

expenditures-are discretionary and welfare enhancing, and that it is difficult to separate "necessary" from

"unnecessary" expenditures, even if we could agree on which is which. It is also difficult without special

health questionnaires to get at the whole picture of health financing. Some people have insurance, so that

expenditures are only "out of pocket" expenditures which may be only a small fraction of the total, while

others have none, and may bear the whole cost. Simply adding up expenditures will not give the right answer.

Yet another approach is a pragmatic one that recognizes that measured health expenditures are a noisy

approximation to what we would ideally like to have. As we shall see in Section 6.3 below, the decision about

whether to include them in the total depends, not only on the extent of the measurement error, but also on

the elasticity of health expenditures with respect to total expenditure. The higher the elasticity, the stronger the

case for inclusion.

Table 3.2: Elasticity of Health and Education Expenditures

                                           Health Expenditures                Education Expenditures
Country          Year     Estim. elasticity  t-ratio  R-square  Estim. elasticity  t-ratio  R-square

Vietnam          92-93                 0.86     33.2      0.19               1.35     46.8      0.43
Nepal            1996                  0.75     20.9      0.15               1.65     43.5      0.48
Kyrgyz Republic  1996                  0.74     14.3      0.14               0.68     13.1      0.13
Ecuador          94-95                   --       --        --               1.38     46.6      0.37
South Africa     1993                  1.14     58.7      0.40               1.32     67.2      0.45
Panama           1997                  0.80     29.2      0.25               1.24     54.9      0.49
Brazil           96-97                 0.85     31.0      0.26               1.25     47.9      0.45

The elasticity of expenditure on health was estimated from the LSMS data from the seven countries

reviewed for this paper. With the exception of South Africa, the elasticities of health expenditures are

estimated to be relatively low (see Table 3.2), a result that should be contrasted with the estimated elasticities

for educational expenditures, which are also shown in the table. Given these numbers, and given the

measurement problems, we think that there is a relatively good case for excluding health expenditures from the

consumption aggregate. Table 3.2 also shows elasticities for educational expenditures, for which similar issues

arise as for health. Although educational expenses are not as irregular as health expenditures, they are located

at a particular point in the life-cycle, so that, even if all households paid the same for education and had the

same number of children, some would appear better-off than others simply by virtue of their age. In this sense,

educational expenditures, like health expenditures, would ideally be smoothed over life. There is also the

argument that education is an investment, not consumption, and should be included in saving, not in the

consumption aggregate. But we follow standard national income accounting practice and recommend that it be

included in the consumption aggregate.

Another important group of items to consider are items such as consumer durables and housing whose

useful life typically spans a time-period greater than the interval for which the consumption aggregate is being

constructed. As discussed in Section 2.4 above, the relevant component of the total is not the expenditure on

such items but a measure of the flow of services that they yield. How to calculate this measure of "user-cost"

for consumer durables and for housing is taken up in more detail in Sections 3.4 and 3.5 respectively.

Another group of expenditures are gifts, charitable contributions, and remittances to other households.

A case can be made for including them in the total, on the grounds that they yield as much welfare to the

transmitting household as do other consumption expenditures that could have been made with the funds.

However, their inclusion in the consumption aggregate would involve double-counting if, as one would expect,

the transfers show up in the consumption of other households. Average living standards could be increased

without limit if each household were simply encouraged to donate its income to another household, and so on;

nothing would have changed except our measure of welfare. We therefore recommend excluding gifts and

transfers, counting them as they are spent by their recipients.

Finally, there are various miscellaneous non-food items that are worth mentioning. Expenditures at

weddings and funerals are another lumpy and occasional item. In some countries, these expenditures are really

transfers (to the bride and groom, or to their parents) and should probably be treated as such and excluded

from the aggregate. Their transitoriness would lead to the same conclusion. Some households own small

enterprises which produce goods for own-consumption; such items should be treated analogously to home-

produced food, priced as well as is possible in the circumstances, and added to the total. There are also a

number of non-foods received as payment in kind; housing subsidies, transport to work, and education are

probably the most important examples. In principle, all such items should be valued and included though, as

always, there is a trade-off between comprehensiveness on the one hand, and measurement

error on the other, again see Section 6 below. Expenditures on utilities, water, gas, electricity, or telephone can

also be problematic if some households are subsidized and some are not. For example, some households may

receive high quality piped water at little or no cost, while others have to buy expensive, inconvenient, and

lower quality water from local vendors. In some cases, making accurate regional (and certainly international)

welfare comparisons will make it necessary to make corrections to (by repricing) the reported expenditures.

3.4 Consumer durables:

From the point of view of household welfare, rather than using expenditure on purchase of durable

goods during the recall period, the appropriate measure of consumption of durable goods is the value of

services that the household receives from all the durable goods in its possession over the relevant time period.

As discussed earlier in Section 2.5, the "user cost" or "rental equivalent" for durable goods is approximately:

$$S_t p_t (r_t - \pi_t + \delta) \qquad (3.1)$$

where $S_t p_t$ is the current value of the durable good, $r_t - \pi_t$ the real rate of interest, and $\delta$ the rate of

depreciation for the durable good. Although in theory, $r_t$ is the general nominal interest rate at time t, and $\pi_t$ is the

specific rate of inflation for each durable good at time t, in practice it is best to collapse the two into a single

real rate of interest, taken as an average over several years, and to use that real rate for all durable goods.

Almost all LSMS surveys collect data on the stock of durable goods currently owned by the household.

However, the amount of detailed information collected about each durable good varies quite considerably

across surveys. Therefore, depending on the type of data available, the analyst must choose between a number

of different strategies when using (3.1) to estimate the durable goods consumption sub-aggregate.

In the case of the Vietnam and Nepal LSMS surveys, the "Inventory of Durable Goods" module of the

questionnaire collected information on (i) the current value of each durable good ($S_t p_t$), (ii) the age of the

item $T$ in years, as well as (iii) the value of the item when purchased ($S_t p_{t-T}$). Using (3.1), consumption of

durable goods was then calculated as follows:

First, the depreciation rate $\delta$ for each type of durable good was calculated using:

$$\delta - \pi = 1 - \left( \frac{p_t}{p_{t-T}} \right)^{1/T} \qquad (3.2)$$

For instance, estimates of $\delta - \pi$ calculated from the survey data in Nepal ranged from 13 per cent for television sets and 17 per cent for radio-cassette players [...]. These

estimates were then used, in conjunction with data on the real rate of interest $r_t - \pi_t$ and the current value of

durable goods owned by each household $S_t p_t$, to calculate the durable goods consumption sub-aggregate. In

order to minimize the influence of any outliers in the data, the median value of the depreciation rate was used for

each of the 16 items for which data were collected (i.e. rather than using household-specific values of $\delta$).
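
The procedure described for Vietnam and Nepal can be sketched in a few lines of Python. The records, item names, and the 10 per cent nominal interest rate below are hypothetical. Because the bracket in (3.1) can be regrouped as $r_t + (\delta - \pi_t)$, the sketch adds an assumed nominal rate to the median net depreciation rate obtained from (3.2); this is one way of combining the pieces, not necessarily the exact calculation used for those surveys.

```python
from statistics import median

def net_depreciation(current_value, purchase_value, age_years):
    """Net depreciation rate (delta - pi) implied by equation (3.2)."""
    return 1.0 - (current_value / purchase_value) ** (1.0 / age_years)

# Hypothetical inventory: (household, item, current value, value when purchased, age in years)
inventory = [
    ("hh1", "tv", 60.0, 100.0, 4),
    ("hh2", "tv", 50.0, 100.0, 5),
    ("hh3", "tv", 10.0, 100.0, 2),   # implausibly fast depreciation: an outlier
    ("hh1", "fan", 16.0, 25.0, 2),
    ("hh3", "fan", 9.0, 25.0, 4),
]
NOMINAL_INTEREST = 0.10  # assumed value of r, for illustration only

# Step 1: one rate per record, then the median rate for each item type.
rates = {}
for _, item, current, purchased, age in inventory:
    rates.setdefault(item, []).append(net_depreciation(current, purchased, age))
median_rate = {item: median(values) for item, values in rates.items()}

# Step 2: user cost = current value x [r + (delta - pi)], summed within households.
durables_consumption = {}
for hh, item, current, _, _ in inventory:
    flow = current * (NOMINAL_INTEREST + median_rate[item])
    durables_consumption[hh] = durables_consumption.get(hh, 0.0) + flow
```

Using the item-level median means that the outlying television record for hh3 does not distort the rate applied to any household, including hh3 itself.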

In the case of the Ecuador and Panama data sets, information was available only on (i) current value of

durable goods owned by the household $S_t p_t$ as well as (ii) the age of the item $T$ in years. As the value of the

item when new was not available in the data sets (i.e. $S_t p_{t-T}$), (3.2) could not be used to calculate the $\delta$s;

instead, an estimate of consumption of durable goods was calculated as follows.

First, the average age for each durable good, $\bar{T}$, is calculated from the data on the purchase dates of

the goods recorded in the survey. We then estimate the average lifetime of each durable good as $2\bar{T}$ under the

assumption that purchases are uniformly distributed through time. (In some cases, for example where a good

has only recently been introduced, some other guess would have to be made.) The remaining life of each good

is then calculated as $2\bar{T} - T$; in this case, and somewhat arbitrarily, this estimate is "rounded up" to 2 years

when the estimate was less. A rough estimate of the flow of services is then derived by dividing the current

replacement value $S_t p_t$ by its expected remaining life. For these countries, the interest component in the flow of

services was ignored.
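
The remaining-life calculation can be illustrated with a minimal Python sketch. The refrigerator records are hypothetical; the two-year floor follows the rounding rule described above, and the interest component is ignored, as in the text.

```python
from statistics import mean

MIN_REMAINING_YEARS = 2.0  # the floor described in the text

def service_flow(current_value, age, mean_age):
    """Current value divided by expected remaining life (2 * mean age - age)."""
    remaining_life = max(2.0 * mean_age - age, MIN_REMAINING_YEARS)
    return current_value / remaining_life

# Hypothetical records for one item type: (current value, age in years)
refrigerators = [(300.0, 2), (120.0, 7), (200.0, 3)]
mean_age = mean(age for _, age in refrigerators)  # 4 years, so expected life is 8

flows = [service_flow(value, age, mean_age) for value, age in refrigerators]
# Remaining lives are 6, 2 (floored from 1) and 5 years.
```

Without the floor, the seven-year-old refrigerator would be assigned a remaining life of one year and an annual service flow equal to its whole current value.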

Taking logs and rearranging the terms somewhat, (3.2) can be rewritten as follows:

$$\ln(p_t) = \ln(p_{t-T}) + T \ln(1 - \delta + \pi) \qquad (3.3)$$

Thus, in cases where data are available on the current value and age of the durable good only, using (3.3),

$\delta - \pi$ can be estimated by regressing the log of the current value of the durable good on a constant and $T$ (i.e. by

assuming that the value of the durable good when new is a constant).
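
A minimal sketch of this regression, in plain Python, is given below. If the value of a good falls by the factor $(1 - \delta + \pi)$ each year, the slope of log current value on age estimates $\ln(1 - \delta + \pi)$, so the net depreciation rate is one minus the exponential of the slope. The data are synthetic and generated with a 15 per cent rate so that the result can be checked; the function name is an assumption made for illustration.

```python
import math

def estimate_net_depreciation(current_values, ages):
    """Regress log current value on a constant and age; return delta - pi."""
    logs = [math.log(v) for v in current_values]
    n = len(logs)
    mean_age = sum(ages) / n
    mean_log = sum(logs) / n
    sxx = sum((t - mean_age) ** 2 for t in ages)
    sxy = sum((t - mean_age) * (y - mean_log) for t, y in zip(ages, logs))
    slope = sxy / sxx                 # estimate of ln(1 - delta + pi)
    return 1.0 - math.exp(slope)

# Synthetic data: value when new is 100 and net depreciation is 15 per cent a year.
ages = [1, 2, 3, 4, 5]
current_values = [100.0 * 0.85 ** t for t in ages]

rate = estimate_net_depreciation(current_values, ages)
```

With real survey data the fit will not be exact, and the estimate is only as good as the assumption that all households paid roughly the same price when the good was new.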

In the LSMS survey for the Kyrgyz Republic, data were available only on the total current value of the

stock of durable goods owned by each household. In this case, (3.1) was estimated directly assuming a value of

10 per cent for $(r_t - \pi_t + \delta)$, a number that seemed reasonable given the prevailing real rate of interest and

plausible values of $\delta$. Finally, in the case of the Brazil and South Africa data sets, consumption of durable

goods was not included in the overall consumption aggregate because of unavailability of data. Whenever good

data are available on the total stock of durable goods owned by the household, we would recommend

incorporating in the overall consumption aggregate a measure of the flow of services accruing to the household

from these goods.

3.5 Housing:

Of all components of the household consumption aggregate, the housing sub-aggregate is often one of

the most problematic. The underlying principle is the same as for other consumer durables; what is required is

a measure in money terms of the flow of services that the household receives from occupying its dwelling.

Because house purchase is such a large and relatively rare expenditure, under no circumstances should

expenditures for purchase be included in the consumption aggregate. In the hypothetical case where rental

markets function perfectly and all households rent their dwellings, the rent paid is the obvious choice to include

in the consumption aggregate. Whenever such rental data are available, and provided the rents are a reasonable

approximation to market rents, they should be included in the

consumption total.

In many cases, however, households own the dwelling in which they reside and do not pay rent as

such. Others are provided with housing free of charge (or at subsidized rates) by their employer, a friend, a

relative, government, or other such entities. In many LSMS surveys, non-renter households are asked how

much it would cost them if they had to rent the dwelling in which they reside, and this "implicit rental value"

can be used in place of actual rent. Such measures must be treated with caution and carefully inspected prior to

use. Implicit rent is a hypothetical concept, perhaps to the interviewer as well as to the respondent, and the

numbers reported may not always be credible or usable. Even when people are apparently confident about their

estimates, they may do a very poor job of reporting market rents. Rents known to them may be subsidized, out

of date, or in some way unrepresentative of the general run of property in their area.

The hardest cases arise when there are data on neither actual nor imputed rent. In the case of the South

African LSMS, in addition to information on rents, data were collected on the total property value (i.e. current

sale value) of the dwelling. For households who reported property values but neither actual nor imputed rents,

the local median ratio of rents to property values was used to calculate the imputed rental. In cases where

the property value of the dwelling was also missing, a median property value per room was used in each

locality to assign a property value to the dwelling based on the total number of rooms, and the estimated

property value used to estimate its rental value.
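
One reading of this two-step fallback is sketched below in Python. The field names and figures are hypothetical, and the use of a locality-level median ratio of rent to property value is an assumption of the sketch rather than a documented detail of the South African calculation.

```python
from statistics import median

def impute_rents(dwellings):
    """Return a rent for every dwelling, imputing from property values where needed."""
    ratios, value_per_room = {}, {}
    for d in dwellings:
        loc = d["locality"]
        if d["value"] is not None:
            value_per_room.setdefault(loc, []).append(d["value"] / d["rooms"])
            if d["rent"] is not None:
                ratios.setdefault(loc, []).append(d["rent"] / d["value"])
    rents = []
    for d in dwellings:
        loc = d["locality"]
        if d["rent"] is not None:
            rents.append(d["rent"])          # reported rent is used as it stands
            continue
        value = d["value"]
        if value is None:                    # fall back on median value per room
            value = median(value_per_room[loc]) * d["rooms"]
        rents.append(median(ratios[loc]) * value)
    return rents

# Hypothetical dwellings in one locality (annual rent, sale value, number of rooms)
dwellings = [
    {"locality": "A", "rent": 1200.0, "value": 12000.0, "rooms": 3},
    {"locality": "A", "rent": 800.0,  "value": 10000.0, "rooms": 2},
    {"locality": "A", "rent": None,   "value": 20000.0, "rooms": 4},
    {"locality": "A", "rent": None,   "value": None,    "rooms": 2},
]
rents = impute_rents(dwellings)
```

A locality with no household reporting both a rent and a property value would raise a `KeyError` here; in practice a wider geographic area would have to be used for such cases.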

In the Nepal and Kyrgyz Republic LSMS data sets, hedonic housing regressions were used to impute a

value of housing consumption wherever information on rents was missing. The idea behind this approach is to

estimate an econometric model in which rents reported by a subset of the population (either actual or reported,

as the case may be) are regressed on a set of housing characteristics including, for instance, the number of

rooms and measures of quality of the dwelling such as type of roof, flooring, construction material of walls, type

of sanitation, etc. as well as regional dummies. The parameter estimates obtained from this model are then used

to calculate rents for that segment of the population for which data on rents are missing.
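
The hedonic approach can be sketched with a small least-squares routine in plain Python. Everything in the example is an assumption made for illustration: the three characteristics, the log-linear functional form, and the rents, which are generated from a known rule so that the fitted coefficients can be checked. A real application would use many more observations and characteristics, and would normally rely on a statistical package.

```python
import math

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(A[row][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for row in range(k):
            if row != col:
                factor = A[row][col] / A[col][col]
                A[row] = [a - factor * b for a, b in zip(A[row], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def design_row(rooms, good_roof, urban):
    """Constant, number of rooms, roof-quality dummy, urban dummy."""
    return [1.0, float(rooms), float(good_roof), float(urban)]

# Synthetic renters: log rent = 3.0 + 0.2*rooms + 0.3*good_roof + 0.5*urban.
true_beta = [3.0, 0.2, 0.3, 0.5]
features = [(1, 0, 0), (2, 1, 0), (3, 0, 1), (2, 1, 1), (4, 1, 0), (1, 0, 1)]
rents = [math.exp(sum(b * x for b, x in zip(true_beta, design_row(*f))))
         for f in features]

# Step 1: estimate the model on households that report a rent.
beta = ols([design_row(*f) for f in features], [math.log(r) for r in rents])

# Step 2: predict rent for a household with no rent data.
def impute_rent(rooms, good_roof, urban):
    return math.exp(sum(b * x for b, x in zip(beta, design_row(rooms, good_roof, urban))))

owner_rent = impute_rent(3, 1, 0)
```

Exponentiating the predicted log rent ignores the retransformation adjustment that a noisy regression would call for; the sketch is meant only to show the two steps.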

In cases where data on imputed rental value for non-renting households are not available, or where

such estimates are deemed to be unreliable or difficult to estimate because rental markets are thin (as is the

case, for instance, in rural areas in some countries), the hedonic regression approach can also be used to impute

rents for such households. The regression model is first estimated using rent paid by renter-households as the

dependent variable; the results of the model are then used to impute rents for the rest of the population.

Because there may be systematic differences in characteristics between renter and non-renter households, the

Heckman (1976) two-stage estimation method is also sometimes used when estimating such hedonic models,

see for example Lee and Trost (1978) and Malpezzi and Mayo (1985).

Finally, in cases where data on rental value are not available for both renters as well as non-renters, or

where the percentage of the population renting their dwelling unit is so small as to make estimation of a

hedonic housing model unfeasible, data on property values can be used to estimate the value of housing

consumption. Following an approach similar to that used for consumer durables, outlined earlier in Section 3.4,

the value of the flow of services received by the household from housing can be calculated by using an

appropriate guesstimate of the user cost per unit to derive a measure of housing consumption from the total

property or "stock value" of the dwelling. This was the approach used in the case of the Vietnam LSMS data

set.

Once again, it is necessary to warn against the mechanical application of these (and other related)

procedures. In some countries, housing and rental markets are not well enough developed to permit any serious

estimate of rental value, and attempts to repair the deficiency using data from a small number of households are

unlikely to be effective, however sophisticated the econometric technique. Even if there is information on rents

in some parts of the country, it is obviously hazardous to apply it to other areas, and econometric fixes

sometimes do no more than disguise the problem. In extreme cases, the best available solution may simply be

to exclude the housing component for all households.

Note finally that data related to expenditures on water, electricity, garbage collection, and other such

utilities and amenities are usually collected in the housing module of LSMS questionnaires. They should also

be included in the housing sub-aggregate, and in the measure of total expenditure.

Box 2. Recommendations for Constructing the Consumption Aggregate

Food Consumption

Food purchased from market: amount spent in the typical month x 12 (or number of months typically consumed)
Food that is home-produced: quantity in typical month x farm-gate price x number of months typically consumed
Food received as gift or in-kind payment: total value for a year
Meals consumed outside the home:
    Amount spent in restaurants
    Amount spent on prepared foods
    Amount spent on meals at work [here or in work-related expenditures]
    Amount spent on meals at school [here or in education expenditures]
    Amount spent on meals on vacation [here or in vacation expenditures]

Issues: Missing prices or unit values, first choice is price (unit value) reported by the household; if not available, use as a proxy the median - not mean - price paid by 'similar' households in the neighborhood, subject to checks that such prices are plausible. Check data for outliers; miscoding or misunderstanding of units for quantities causes errors in unit values.

Non-Food Consumption

Daily use items: annualize the value
Clothing and housewares: annualize the value
Health expenses should only be included if they have high income elasticity in relation to their transitory variance or measurement error.
Education expenses: Typically measured quite accurately in most surveys - our recommendation is to include them.
Work-related expenses: To the extent possible, purely work-related expenditures should be excluded. This recommendation does not include transport to work or work clothing.
Exclude taxes paid, purchase of assets, repayment of loans, expenditure on durable goods and housing, as well as other lumpy expenditures such as marriages and dowries. To the extent that local property taxes bear a relation to services rendered, we recommend their inclusion.

Durable Goods

Calculate an annual rental equivalent using an appropriate real rate of interest and median depreciation values for each item calculated across all households owning that item.

Housing

If a household pays rent, annualize the amount of rent paid. Even if the dwelling is owned by the household or received free of charge, an estimate of the annual rental equivalent must be included in the consumption aggregate. In countries where few households pay rent, rental equivalents are potentially inaccurate, and the benefits of completeness need to be weighed against the costs of error.

Weights or Raising Factors

If households interviewed in the survey had differing probability of being selected in the survey sample, household "weights" (also known as expansion or raising factors) should be included in the data. Remember to use these when deriving representative statistics for the country under consideration.

4. ADJUSTING FOR COST OF LIVING DIFFERENCES

4.1 Introduction:

In this section, we lay out some of the practical issues involved in calculating the price indexes that are

used to deflate the nominal consumption aggregate. As we saw in the theory section, the calculation of money

metric utility requires that the nominal aggregate be deflated by a Paasche price index in which the weights

vary from household to household. If the analyst prefers to work with the welfare ratio approach to

measurement, the deflator is a Laspeyres index whose weights are the same for all households. We present the

price indexes in that order, which follows our recommendation in favor of the money metric approach. We note

that these price indexes are of independent interest beyond their roles in deflating expenditures, simply for

measuring prices.

Price indexes are used to aggregate a large number of individual prices into a single number, so that

individual prices are the raw material for the indexes. In LSMS and other surveys, there are several possible

sources for the prices, see Deaton and Grosh (1998) for further discussion of how prices can be collected and

for an analysis of some of the differences between them. In brief, there are three possible sources. The first

source is the survey itself, and the reports of purchases by the households surveyed. In many (but not all)

surveys, households report both quantities and expenditures for most of the foods they purchase (three kilos of

rice for 5 rupees) as well as for a few non-food items where quantities are well-defined, fuels being the obvious

example. Dividing expenditures by quantities gives "unit values". These are affected by quality choices;

someone who buys better cuts of meat will pay more per unit, but experience shows that the spatial variation of

unit values is closely related to price variation. As a result, unit values provide good price information,

especially when averaged over households in a cluster.
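
The construction of unit values and their cluster-level summary can be sketched as follows. The purchase records and cluster labels are hypothetical, and the sketch uses the cluster median rather than the mean, which limits the influence of miscoded quantities.

```python
from statistics import median

# Hypothetical purchase records: (cluster, household, item, expenditure, quantity in kilos)
purchases = [
    ("c1", "h1", "rice", 6.0, 3.0),
    ("c1", "h2", "rice", 4.4, 2.0),
    ("c1", "h3", "rice", 24.0, 1.0),   # quantity probably miscoded (sacks, not kilos)
    ("c2", "h4", "rice", 9.0, 3.0),
    ("c2", "h5", "rice", 6.2, 2.0),
]

unit_values = {}
for cluster, _, item, spent, quantity in purchases:
    if quantity > 0:  # a unit value is undefined without a positive quantity
        unit_values.setdefault((cluster, item), []).append(spent / quantity)

# One price per cluster and item: the median of household unit values.
cluster_price = {key: median(values) for key, values in unit_values.items()}
```

In cluster c1 the mean unit value would be 9.4 because of the single miscoded record, while the median stays at 2.2.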

The second source of price information is a dedicated price questionnaire, often administered in each

cluster as part of a community questionnaire. The price questionnaire seeks to measure prices in the markets

actually patronized by survey households and in principle provides a direct measure of what we need. In

practice, there may be some compromise of data quality from the fact that the investigators do not actually

make purchases. There are also sometimes problems of locating a wide enough range of homogeneous goods in

all the relevant markets, so that it may be hard to match prices from the questionnaire with the expenditure

patterns of the households in the survey. But this is the preferred source of price information when quantities

are not collected from each household, and the only source for those goods, such as most non-food items and

food eaten away from home, where quantity observation is not possible in principle.

The third source of price data is ancillary data, for example from government price surveys. This is

typically a source of last resort. Such data are often thin on the ground, and there will often be many

households whose nearest observed price is so far away as to be irrelevant. Nevertheless, such data are

sometimes the only information available, and it is usually better to use them than to make no correction at all.

Note finally that the situation is somewhat different depending on whether we need to compute price

indexes over space or over time. In the latter case, for example when we are comparing two surveys for the

same country some years apart, there will typically be available some national consumer price index that tells us

by how much the general price level has changed between the two surveys. In the absence of spatial data on

prices, the temporal index should be used to deflate all nominal expenditures to ensure that welfare

comparisons between the two periods are not being driven by inflation.

Before turning to the details, it is useful to begin by recalling the formulas for money-metric and

welfare-ratio utilities, whereby each is expressed as total expenditure deflated by a price index. For money

metric utility, we have from (2.6) that

$$u_m^h \approx \frac{x^h}{P_P^h} \qquad (4.1)$$

where the Paasche price index in the denominator is given by

$$P_P^h = \frac{p^h \cdot q^h}{p^0 \cdot q^h} \qquad (4.2)$$

Here, the weights for the price index are the quantities consumed by the household itself and therefore differ

from one household to another. By contrast, welfare-ratio utility uses a Laspeyres index so that, from (2.10)

$$u_r^h = \frac{x^h}{P_L^h} \qquad (4.3)$$

where, if we are using the poverty line as the base, the Laspeyres is given by (2.9)

$$P_L^h = \frac{p^h \cdot q^0}{p^0 \cdot q^0} \qquad (4.4)$$

Most of past practice has been based on using Laspeyres indexes for adjustment, though not always

with weights tailored to the poverty line as in (4.4), and relatively little attention has been given to the

calculation of the Paasche index. In this section, we focus on the calculation of (4.2) and (4.4) using the data

from a typical LSMS survey.

4.2 Paasche price index:

It is useful to express (4.2) in a manner that makes it easier to see how the Paasche index could be

calculated from the type of data typically collected in an LSMS survey. Equation (4.2) can also be rewritten in

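the form:

$$P_P^h = \left[ \sum_k w_k^h \left( \frac{p_k^h}{p_k^0} \right)^{-1} \right]^{-1} \qquad (4.5)$$

where $w_k^h$ is the share of household h's budget devoted to good k. This formula can be calculated from

expenditure data and price relatives alone. The following approximation may also be used:

$$\ln P_P^h \approx \sum_k w_k^h \ln \left( \frac{p_k^h}{p_k^0} \right) \qquad (4.6)$$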

Note that these indexes involve, not only the prices faced by household h in relation to the reference

prices, but also household h's expenditure pattern, something that is not true of a Laspeyres index. The

distinction is an important one; to convert total expenditure into money metric utility, the price index must be

tailored to the household's own demand pattern, a demand pattern that varies with the household's income,

demographic composition, location, and other characteristics.
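
A short Python sketch may help to show why the household-specific weights matter. It computes the Paasche index from budget shares and price relatives, as the inverse of the share-weighted sum of inverse price relatives, together with the log approximation. The two households, their budget shares, and the prices are hypothetical; both households face the same prices and differ only in their expenditure patterns.

```python
import math

def paasche_index(shares, prices, ref_prices):
    """Household Paasche index from budget shares and price relatives, as in (4.5)."""
    return 1.0 / sum(w * ref_prices[k] / prices[k] for k, w in shares.items())

def paasche_index_log(shares, prices, ref_prices):
    """Log approximation, as in (4.6), returned in levels."""
    return math.exp(sum(w * math.log(prices[k] / ref_prices[k])
                        for k, w in shares.items()))

ref_prices = {"rice": 2.0, "fuel": 1.0}      # hypothetical reference vector p0
local_prices = {"rice": 2.4, "fuel": 1.0}    # prices faced by both households

poor = {"x": 100.0, "shares": {"rice": 0.8, "fuel": 0.2}}
rich = {"x": 400.0, "shares": {"rice": 0.4, "fuel": 0.6}}

for hh in (poor, rich):
    hh["paasche"] = paasche_index(hh["shares"], local_prices, ref_prices)
    hh["money_metric"] = hh["x"] / hh["paasche"]   # expenditure at reference prices
```

Because rice is 20 per cent dearer than in the reference vector and the poorer household spends more of its budget on rice, its index (about 1.15) is higher than that of the better-off household (about 1.07), even though the prices they face are identical.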

The reference price vector $p^0$ is inevitably selected as a matter of convenience, but should not be very

different from prices actually observed. A good choice is to take the median of the prices observed from

individual households (for foods and fuels, if unit values are collected) or from the community questionnaire

(otherwise). Especially when using the unit values from individual records, there will be some outliers, not

only for the usual reasons, but also because there are often misunderstandings about units--such as eggs being

reported in dozens instead of in units. Use of medians rather than means reduces sensitivity to such accidents.

The use of a national average price vector ensures that the money metric measures conform as closely as

possible to national income accounting practice, as well as eliminating results that might depend on a price

relative that occurs only rarely or in some particular area.

In general, even if quantities and unit values are available at the household level, this will only be the

case for a limited set of goods, typically foods and perhaps some fuels. For nonfoods, and perhaps some foods,

price relatives will come from community questionnaires or even other regional sources, and will not be

available at the household level. In such cases, we must use the price relative that seems most appropriate for

each household, in which case (4.6), for example, becomes

$$\ln P_P^h \approx \sum_{k \in F} w_k^h \ln \left( \frac{p_k^h}{p_k^0} \right) + \sum_{k \in NF} w_k^h \ln \left( \frac{p_k^c}{p_k^0} \right) \qquad (4.7)$$

where F denotes the set of goods (foods) for which we have individual household price relatives, and NF is the

set where we do not (nonfoods), and the superscript c denotes a cluster or regional price. One further

refinement is likely to be useful. Because the household level unit values are likely to be noisy, and to contain

occasional outliers, it is wise to replace the individual $p_k^h$ by their medians over households in the same PSU

or locality.

Analysts often want to use LSMS data for purposes other than deflating nominal consumption for each

household, and calculate some indicator of regional price levels, or of regional price levels at different times

throughout the survey year. This can be done using either the Paasche indexes of this subsection, or the Laspeyres

indexes discussed below. The most straightforward procedure is simply to take means (or better, medians)

within the relevant region or season of the individual Paasche indexes as calculated above. Such indexes could

be made more relevant to the poor by averaging the individual household price indexes only over those at or

below the poverty line, see the next subsection for discussion of procedures. Note that when all households

within a region R face the same prices, so that

$$\ln P_P^h \approx \sum_k w_k^h \ln \left( \frac{p_k^R}{p_k^0} \right) \qquad (4.8)$$

the average of the (log) prices is given by

$$\ln P_P^R \approx \sum_k \bar{w}_k \ln \left( \frac{p_k^R}{p_k^0} \right) \qquad (4.9)$$

so that the appropriate weights for the average index are the mean of the budget shares over (all, or

poor) households. Note that this is not the same as using the weights defined as the share of aggregate purchases in

aggregate total expenditure, weights that are typically used in computing consumer price indexes by statistical

offices. These aggregate weights effectively weight each household, not on a "democratic" basis, with one

household or individual getting equal weight, but on a "plutocratic" basis in which each household is weighted

according to its total expenditure. Because better-off households have, by definition, larger total expenditures,

the weights of plutocratic indexes are more representative of rich than of poor expenditure patterns, and that

causes problems when relative prices change in a way that affects the poor and the rich differently. For

example, if the relative price of a staple food rises, a plutocratic price index will rise by less than a democratic

price index if the staple is a necessity, and the poverty-increasing effects of the price change will be

understated.
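
The staple-food example can be made concrete with two hypothetical households. The sketch below compares the "democratic" weight (the simple mean of budget shares) with the "plutocratic" weight (the share of aggregate staple purchases in aggregate expenditure) and the price index each implies when the staple becomes 20 per cent dearer and all other prices are unchanged. All figures are invented for illustration.

```python
# Hypothetical households: total expenditure and budget share of the staple food
households = [
    {"x": 100.0, "staple_share": 0.60},   # poor household
    {"x": 900.0, "staple_share": 0.10},   # better-off household
]
STAPLE_PRICE_RELATIVE = 1.20  # staple price up 20 per cent, other prices unchanged

# Democratic weight: every household's budget share counts equally.
democratic_w = sum(h["staple_share"] for h in households) / len(households)

# Plutocratic weight: aggregate staple spending over aggregate total spending.
plutocratic_w = (sum(h["x"] * h["staple_share"] for h in households)
                 / sum(h["x"] for h in households))

def price_index(staple_weight):
    """Share-weighted average of price relatives with two goods."""
    return staple_weight * STAPLE_PRICE_RELATIVE + (1.0 - staple_weight) * 1.0

democratic_index = price_index(democratic_w)
plutocratic_index = price_index(plutocratic_w)
```

Here the democratic weight on the staple is 0.35 and the plutocratic weight is 0.15, so the plutocratic index rises by 3 per cent while the democratic index rises by 7 per cent.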

4.3 Calculating Laspeyres index:

For researchers who wish to follow the welfare-ratio rather than money-metric approach to measuring

living standards, the relevant price index is not the Paasche index (4.2), but the Laspeyres index (4.4). Because

this index uses the same weights for all households, it is typically more straightforward to calculate than is the

Paasche, though in both cases the hardest task is finding the price relatives [...]. Once

again, it is often useful to write the Laspeyres in terms of budget shares and price relatives so that,

corresponding to (4.5), we now have

$$P_L^h = \frac{p^h \cdot q^0}{p^0 \cdot q^0} = \sum_k w_k^0 \left( \frac{p_k^h}{p_k^0} \right) \qquad (4.10)$$

which corresponds to (4.4) or, alternatively, corresponding to (4.6),

$$\ln P_L^h \approx \sum_k w_k^0 \ln \left( \frac{p_k^h}{p_k^0} \right) \qquad (4.11)$$

The discussion of measuring price relatives for foods and non-foods, and of aggregation over

households goes through as before, though when we average the Laspeyres indexes, only the price relatives are

being averaged, not the weights, though the principle of averaging price indexes over households remains

unchanged.

The welfare ratio approach requires comparison of actual indifference curves with a baseline

indifference curve, here taken to be the poverty-line indifference curve, and the theory requires that the weights

for the Laspeyres index used for deflation be calculated at that indifference curve. In practice, it may not be

obvious how to do this. There are usually many households near the poverty line, though rarely many (or even

any) exactly at it, so we lack the data for the quantity or budget share weights in (4.10) and (4.11). A useful

solution to this problem is to calculate weights by averaging over the expenditure patterns of households near

the poverty line, with those closer to it given more weight than those further away. Weights with this property

are conveniently provided by a "kernel" function, here denoted $K_\tau(\cdot)$, and the weights in (4.4), (4.10) or

(4.11) are calculated from

$$\tilde{w}_k^0 = \sum_{h=1}^{H} K_\tau(x^h - z) \, w_k^h \qquad (4.12)$$

This sum is a weighted average over all households in the sample of the budget shares $w_k^h$ using the

kernel weights. There are a number of suitable choices for the kernel function, which must be positive, must

sum to one over all households, and which must be smaller the larger is the absolute difference between $x^h$

and the poverty line z. One convenient choice is the "bi-square" function

$$K_\tau(x^h - z) \propto \left[ 1 - \left( \frac{x^h - z}{\tau} \right)^2 \right]^2 \quad \text{for} \quad \left| \frac{x^h - z}{\tau} \right| \le 1 \qquad (4.13)$$

and

$$K_\tau(x^h - z) = 0, \quad \text{otherwise.} \qquad (4.14)$$

The number $\tau$ is a "bandwidth" that controls how many households are included in the sample. The

larger is $\tau$, the more households are used, which makes the average more precise, but can cause bias by

including households a long way from the poverty line. In practice, setting $\tau$ so as to include a few hundred

households around the poverty line will usually be satisfactory. These equations are also likely to work better if

$x^h$ and $z$ in (4.12) to (4.14) are replaced by their logarithms, so that distances from the poverty line are

measured proportionately, not absolutely.
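
The kernel-weighting step can be sketched as follows. The code uses the bi-square shape with distances measured in logs, rescales the kernel values so that they sum to one over households, and then forms a Laspeyres index from the resulting budget shares. The sample, the poverty line of 100, the bandwidth of 0.5 log units, and the function names are assumptions made for illustration.

```python
import math

def bisquare(u):
    """Bi-square kernel shape: (1 - u^2)^2 inside the band, zero outside."""
    return (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0

def poverty_line_shares(households, z, tau):
    """Kernel-weighted mean budget shares around the poverty line z.

    households: list of (total expenditure, {good: budget share}).
    Distances are measured in logs; tau is the bandwidth in log units.
    """
    raw = [bisquare((math.log(x) - math.log(z)) / tau) for x, _ in households]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("no households within the bandwidth")
    goods = households[0][1]
    return {k: sum(r / total * shares[k] for r, (_, shares) in zip(raw, households))
            for k in goods}

def laspeyres_index(weights, prices, ref_prices):
    """Share-weighted sum of price relatives with fixed weights."""
    return sum(w * prices[k] / ref_prices[k] for k, w in weights.items())

sample = [
    (100.0, {"food": 0.70, "nonfood": 0.30}),
    (120.0, {"food": 0.60, "nonfood": 0.40}),
    (80.0,  {"food": 0.80, "nonfood": 0.20}),
    (400.0, {"food": 0.30, "nonfood": 0.70}),  # far above the line: zero weight
]
weights = poverty_line_shares(sample, z=100.0, tau=0.5)
index = laspeyres_index(weights,
                        {"food": 1.10, "nonfood": 1.00},
                        {"food": 1.00, "nonfood": 1.00})
```

The household at four times the poverty line falls outside the band and drops out entirely, while the household exactly at the line receives the largest weight.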

Note finally, that although different price indexes will sometimes be similar, it is dangerous to assume

that this will always be true. Because of poorly developed infrastructure, relative prices sometimes vary a good

deal from one place to another, and when this is the case, price indexes are sensitive to the weights used to

construct them. Note again that the weights in the Paasche indexes are household specific weights, so that

because household level demand patterns are quite variable, the (appropriate) deflation of total expenditure by

the household level Paasche index will generally give different money metric utility rankings than will (the

inappropriate) deflation by local (e.g. Laspeyres) indexes that do not vary from household to household. Even

when price data are sparse, and only available for a few regions, it is still desirable to calculate the household-

specific indexes, not because prices vary from one household to another within the same region, but because

the weights do.
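
A small sketch may help show why household-specific weights matter even when prices are common to a region. It uses the log approximation summarized in Box 3 (a budget-share-weighted average of log price relatives); the function names, prices and budget shares are hypothetical, and this is an illustration under those assumptions rather than the authors' procedure.

```python
import math


def log_price_index(weights, prices, reference_prices):
    """Weighted average of log price relatives: sum_k w_k * ln(p_k / p0_k)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("budget shares must sum to one")
    return sum(w * math.log(p / p0)
               for w, p, p0 in zip(weights, prices, reference_prices))


def paasche_index(household_shares, prices, reference_prices):
    """Approximate Paasche index: weights are the household's own budget shares."""
    return math.exp(log_price_index(household_shares, prices, reference_prices))


def laspeyres_index(average_shares, prices, reference_prices):
    """Approximate Laspeyres index: weights are average shares for the group."""
    return math.exp(log_price_index(average_shares, prices, reference_prices))


# Two households in the same region face the same prices (good 1 costs
# twice its reference price) but have different budget shares.
reference_prices = [1.0, 1.0]
regional_prices = [2.0, 1.0]
shares_a = [0.8, 0.2]
shares_b = [0.4, 0.6]
average_shares = [(a + b) / 2.0 for a, b in zip(shares_a, shares_b)]

paasche_a = paasche_index(shares_a, regional_prices, reference_prices)
paasche_b = paasche_index(shares_b, regional_prices, reference_prices)
laspeyres = laspeyres_index(average_shares, regional_prices, reference_prices)

# Real expenditure for two households with the same nominal total of 100.
real_a = 100.0 / paasche_a
real_b = 100.0 / paasche_b
```

The single regional index deflates both households identically, while the household-specific indexes rank household B above household A in real terms, because B spends less on the good whose price is high.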

Our recommendation here follows from our original recommendation for the use of money metric

utility. Money metric utility is calculated by deflating nominal consumption expenditures by the Paasche index

(4.5) and (4.6), and that is what we recommend using. Calculation of the Laspeyres index might be marginally

more convenient-though given the other household specific calculations, constructing household specific

price indexes should pose no additional problems.

5. ADJUSTING FOR HOUSEHOLD COMPOSITION

5.1 Introduction:

Sections 3 and 4 have presented guidelines on how to use LSMS data to construct a nominal measure

of total household consumption and of how to adjust it to take into account cost-of-living differences. However,

we are ultimately interested in individual welfare, not the welfare of a household, something that is hard

to define in any very useful way. If it were possible to gather data on consumption by individual family

members, we could move directly from the data to individual welfare, but except for a few goods, such data are

not available, even conceptually-think of public goods that are shared by all household members. As it is, the

best that can be done is to adjust total household expenditure by some measure of the number of people in the

household, and to assign the resulting welfare measure to each household member as an individual.

Equivalence scales are the deflators that are used to convert household real expenditures into money

metric utility measures of individual welfare. If a household consists entirely of adults, and if they share

nothing, each consuming individually, then the obvious equivalence scale would be household size, which is

the number of people over which household expenditures are spread. Even when households consist of adults

and children, welfare is often assessed by dividing expenditures by household size, as a rough-and-ready

concession to differences in family size. However, such a correction does not allow for the fact that children

typically consume less than adults, so that deflating by household size will understate the welfare of people

who live in households with a high fraction of children.

Moreover, simply deflating household expenditures by total household size also means implicitly

ignoring any economies of scale in consumption within the household. Some goods and services consumed by

the household have a "public goods" aspect to them, whereby consumption by any one member of the

household does not necessarily reduce the amount available for consumption by another person within the

same household. Housing is an important household public good, at least up to some limit, as are durable

items like televisions, or even bicycles or cars, which can be shared by several household members at different

times. Because people can share some goods and services, the cost of being equally well-off does not rise in

proportion to the number of the people in the household. Per capita measures of expenditure thus understate

the welfare of big households relative to the living standards of small households.

In this Section we discuss equivalence scales in general and outline some of the main approaches to

their calculation. But before doing so, it is worth emphasizing that we do not recommend abandoning the use

of per capita expenditure. Twenty years ago, per capita expenditure was itself something of an innovation, and

many studies worked with total household expenditure or income without correction for household size. In the

years since, conversion to a per capita basis has become the standard procedure, and although its deficiencies are

widely understood, none of the alternatives discussed have been able to command universal assent. As a result,

no calculation of welfare or poverty profile should ever be done without the calculation of per capita

expenditure as at least one of the alternatives. In part, this recommendation reflects the burden of the past;

results are almost always compared with previous analyses for the same country, or with similar analyses for

other countries which use per capita expenditure. But it is also true that twenty years of experience with per capita

expenditure has given analysts a good working understanding of its strengths and weaknesses, when it is sound

(in most cases), and when it is likely to be misleading (for example, in comparisons of the average living

standards of children and the elderly.)

5.2 Equivalence scales:

To make welfare comparisons across households with different size and demographic composition, we

need some way of adjusting aggregate consumption measures to make them comparable across households. In

this regard, just as a price index is used in order to make comparable consumption levels of households with

different cost-of-living, equivalence scales are a way to make comparable consumption aggregates of

households with different demographic composition. While many different methods have been proposed in the

literature to calculate the exact conversion factors used in each particular set of equivalence scales, the

underlying principle is often the same: the basic idea is that various members of a household have "differing

needs" based on their age, sex, and other such demographic characteristics, and that these differing needs

should be taken into account when making welfare comparisons across households.

The costs of children relative to adults and the extent of economies of scale are of the first-order of

importance for poverty and welfare calculations. Indeed, the direction of policy can sometimes depend on

exactly how equivalence scales are defined. Larger households typically have lower per capita expenditure

levels than small households but until we know the extent of economies of scale, we do not know which group

is better off, or whether anti-poverty programs should be targeted to one or the other. Rural households are

often larger than urban households, and we are sometimes unable to compare rural with urban poverty without

an accurate estimate of the extent of economies of scale. Another frequent comparison is between children and

the elderly, and both groups have claims for public attention on grounds of poverty. Children tend to live in

larger households than do the elderly, and (obviously) live in households with a higher fraction of children. As

a result, comparisons of welfare levels between the two groups are often sensitive to what is assumed about

both child costs and about economies of scale, see the calculations in Section 6 below. Issues involving

comparison between children and the elderly have acquired a new salience in work on the transition economies

of Eastern Europe which, compared with developing countries of Africa or Asia, have relatively large elderly

populations, which receive state support through pension and health subsidies. As a result, the two groups are

in competition for welfare support, and an accurate assessment of their relative poverty has become an

important issue.

Unfortunately, there are no generally accepted methods for calculating equivalence scales, either for

the relative costs of children or for economies of scale. There are three main approaches to deriving

equivalence scales: (i) one relying on behavioral analysis to estimate equivalence scales, (ii) one using direct

questions to obtain subjective estimates, and (iii) one that simply sets scales in some reasonable, but essentially

arbitrary, way. Each of these is discussed in turn in the sections that follow. Our recommendation, apart from

the continuing use of per capita expenditure, is the arbitrary method, and we offer some suggestions for its

practical implementation.

5.3 Behavioral approach:

The behavioral approach has generated a large literature, much of which is reviewed in Deaton (1997).

While there are methods for calculating the costs of children that are relatively soundly based (though not all

would agree even with this), there are so far no satisfactory methods for estimating economies of scale. Many

of the standard methods, such as Engel's procedures for calculating both child costs and economies of scale,

are readily dismissed, see again Deaton (1997) and Deaton and Paxson (1998). One idea that seems correct,

and that can sometimes give a useful if informal notion of the extent of economies of scale, is that shared goods

within the household, or household public goods, are the root cause of economies of scale. In the simplest case,

there are two sorts of goods in the household, private goods, which are consumed by one

person only and where consumption by one person precludes consumption by another, and public goods, where

there is an unlimited amount of sharing, and where consumption by one member of the household places no

limitation on consumption by others. In this case, Dreze and Srinivasan (1997) have shown that, in a household

with only adults, the elasticity of the cost-of-living with respect to household size is the share of private goods

in total household consumption. If all goods are private, costs rise in proportion to the number of people in the

household, while if all goods are public, costs are unaffected by the number of people. This sort of argument

supports the intuitive notion that, in very poor economies with a high share of the budget devoted to food-

which is almost entirely private-the scope for economies of scale is likely to be small. In other settings, where

housing-which has a large public component-is important, economies of scale are likely to be larger.

Unfortunately, attempts to extend this sensible approach to a more formal estimation of the extent of

economies of scale have not been successful; see Deaton and Paxson (1998).

5.4 Subjective approach:

The subjective approach to setting equivalence scales has attracted increased attention in recent years.

One widely used technique is the "Leyden" method pioneered by van Praag and his associates, see van Praag

and Warnaar (1997) for a recent review. In the household survey, each household is asked to provide estimates

of the amount of income it would need so that its circumstances could be described as "very bad," "bad,"

"insufficient," "sufficient," "good," and "very good." Suppose that the answer to the "good" question by

household h is c_h. From the cross-section of results, c_h is regressed on household income y_h and family size n_h (or

numbers of adults and children) in the logarithmic form

\ln c_h = \alpha + \beta \ln n_h + \gamma \ln y_h \qquad (5.1)

This equation is used to calculate the level of income y*_h that the household would have to have in order to

name its actual income as "good." Evidently, this is given by

\ln y_h^* = \frac{\alpha}{1-\gamma} + \frac{\beta}{1-\gamma} \ln n_h \qquad (5.2)
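
The step from (5.1) to (5.2) can be spelled out as follows. The notation here is an assumption made for the illustration: α, β and γ denote the intercept and the coefficients on log household size and log income in (5.1), and y*_h is the income at which the household's answer to the "good" question equals its actual income, so that c_h = y_h = y*_h. The derivation requires γ to differ from one.

```latex
\begin{align*}
\ln y_h^* &= \alpha + \beta \ln n_h + \gamma \ln y_h^* \\
(1-\gamma)\,\ln y_h^* &= \alpha + \beta \ln n_h \\
\ln y_h^* &= \frac{\alpha}{1-\gamma} + \frac{\beta}{1-\gamma}\,\ln n_h
\end{align*}
```

The coefficient on log household size in the last line is the elasticity of this "needs" income with respect to household size.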

If y* is interpreted as a measure of needs, in that it would be regarded by a household receiving it as

"good," then the quantity β/(1 - γ) can be interpreted as the elasticity of needs to household size, and so as

(a negative) measure of economies of scale. van Praag and Warnaar report an estimate of β/(1 - γ) for the

Netherlands of 0.17, 0.50 for Poland, Greece, and Portugal, 0.33 for the US. Taken literally, these numbers

indicate very large, not to say incredible, economies of scale.

Even if we accept the general methodology, it is hard to take these estimates seriously. In particular, if

the costs of children, or more generally the costs of living together, vary from household to household, the

estimation of (5.1) will lead to downward biased estimates of β. To see this, rewrite (5.1) including the error

term as

\ln c_h = \alpha + \beta \ln n_h + \gamma \ln y_h + u_h \qquad (5.1a)

The term u_h varies from one household to the next, and represents the idiosyncratic costs of living for

that household, the amount that household needs above the average for a household with its income and size.

The trouble with this regression is that households choose their size n_h, partly through fertility, but more

importantly by adults (and some children) moving in and out. People who like living with lots of other people

will live in large households (high n_h) and will report that they need relatively little money to live in a large

household (low u_h). As a result, u_h will be negatively correlated with household size n_h, and

estimates of β will be biased downward, consistently with what van Praag and Warnaar report.

5.5 Arbitrary approach:

Given the current unreliability of either the behavioral or the subjective approach, there is much to be

said for making relatively ad hoc corrections that are likely to do better than deflating by household size. One

useful approach, detailed in National Research Council (1995), is to define the number of adult equivalents by

the formula

AE = (A + \alpha K)^{\theta} \qquad (5.3)

where A is the number of adults in the household, and K is the number of children. The parameter α is the cost

of a child relative to that of an adult, and lies somewhere between 0 and 1. The other parameter, θ, which also

lies between 0 and 1, controls the extent of economies of scale; since the elasticity of adult equivalents with

respect to "effective" size, A + αK, is θ, (1 - θ) is a measure of economies of scale. When both α and θ are

unity (the most extreme case, with no discount for children or for size), the number of adult equivalents is

simply household size, and deflation by household size is equivalent to deflating to a per capita basis. An

alternative version of (5.3) is also used, in which the first adult counts as one, and subsequent

adults are discounted, so that the A in (5.3) is replaced by 1 + β(A - 1) for some β less than unity. This is

really an alternative treatment of economies of scale so that, if this scheme is used, the parameter θ would

normally be set to unity.
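
The adult-equivalent formula is simple to compute. The sketch below assumes that (5.3) has the form AE = (A + αK)^θ, with α the relative cost of a child and θ the scale parameter; the function names and the household in the example are hypothetical.

```python
def adult_equivalents(adults, children, alpha, theta):
    """Number of adult equivalents, (A + alpha * K) ** theta.

    alpha : cost of a child relative to an adult, between 0 and 1
    theta : scale parameter, between 0 and 1 (1 means no economies of scale)
    """
    if not 0.0 <= alpha <= 1.0 or not 0.0 <= theta <= 1.0:
        raise ValueError("alpha and theta are expected to lie between 0 and 1")
    return (adults + alpha * children) ** theta


def expenditure_per_equivalent(total_expenditure, adults, children, alpha, theta):
    """Household expenditure deflated by the number of adult equivalents."""
    return total_expenditure / adult_equivalents(adults, children, alpha, theta)


# Hypothetical household: 2 adults, 3 children, total expenditure of 1000.
per_capita = expenditure_per_equivalent(1000.0, 2, 3, alpha=1.0, theta=1.0)
low_child_cost = expenditure_per_equivalent(1000.0, 2, 3, alpha=0.3, theta=0.9)
```

With both parameters at unity the result is ordinary per capita expenditure. Lowering α and θ reduces the number of equivalents, so the same household appears better off than on a per capita basis.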

A case can be made for the proposition that current best practice is to use (5.3) for the number of adult

equivalents, simply setting α and θ at sensible values. Most of the literature, as well as common sense,

suggests that children are relatively more expensive in industrialized countries (school fees, entertainment,

clothes, etc.) and relatively cheap in poorer agricultural economies. Following this, α could be set near to unity

for the US and western Europe, and perhaps as low as 0.3 for the poorest economies, numbers that are

consistent with estimates based on Rothbarth's procedure for measuring child costs, Deaton and Muellbauer

(1986) and Deaton (1997). If we think of economies of scale as coming from the existence of shared public

goods in the household, then θ will be high when most goods are private and low when a substantial fraction of

household expenditure is on shared goods, see Section 5.3 above. Since households in the poorest economies

spend as much as three-quarters of their budget on food, and since food is an essentially private good,

economies of scale must be very limited, and θ should be set at or close to 1. In richer economies, θ would be

lower, perhaps in the region of 0.75.

In Section 6 below, we argue that it is important to assess the robustness of poverty comparisons using

stochastic dominance techniques, and we sketch out a simple methodology for doing so. When the results are

not robust, for example when the comparison of poverty rates between children and the elderly is sensitive to

the choice of α and θ within the sensible range for that country, there is probably not much alternative to

facing failure squarely. Certainly the behavioral approach is unlikely to provide estimates that would be

sufficiently precise and sufficiently credible to support such fine distinctions. In such situations, it might be

better to turn to other indications of well-being, such as mortality or morbidity. When the analyst is not

concerned with situations in which everything depends on the choice of α and θ (for example, in comparing

the poverty of children and the elderly), our recommendations are straightforward. At the first round, calculate

per capita expenditure for each household by deflating the expenditure aggregate by household size. As an

alternative, and likely more accurate supplement, use the arbitrary method, with values of α and θ set

according to the level of development. In poor economies, we recommend setting α low, perhaps 0.25 or 0.33,

and setting θ high, perhaps 0.9. Children are not very costly in poor, agricultural economies, and when the

budget share of food is high, there is not much scope for economies of scale. As we move to richer economies,

children are relatively more expensive, and economies of scale larger. NRC (1995) recommends setting both

parameters to 0.75 for the US, and others have noted that the official US poverty lines are quite well

approximated by setting α to be 0.5 and θ to be unity. To some extent, these parameters are substitutes for

one another; a low α goes with a high θ, and vice versa.

For those actually constructing these measures, there is an important technical point that is discussed in

the second paragraph of Section 6.4 below: expenditure measures divided by equivalence scales need to be

normalized prior to use.

Box 3. Adjustments for Cost-of-Living Differences and Household Composition

Cost-of-Living Differences

Issue: Nominal consumption aggregate must be adjusted to take into account differences in cost-of-living in different parts of the country.

Recommendation: Use price indexes to adjust nominal consumption.

Issue: Often a variety of alternative sources for price information: (i) unit values from the survey itself, (ii) prices collected in the price (community) questionnaire, and (iii) [...]

Recommendation: Within-survey prices, supplemented by prices from the [...]

Issue: Different types of price indexes:

Paasche Index: A useful approximation in calculating the (log of the) index is to take a weighted average of (the log of) the ratio of prices faced by the household relative to a set of reference prices, where the weights of each price relative are the budget share devoted by the household to the good concerned; in practice, because prices are rarely if ever available at the household level for each and every good consumed by the household, prices obtained from the community questionnaire can be used as a proxy for the prices faced by the household for some of these goods.

Laspeyres Index: As above, the Laspeyres index can be approximated by a weighted average of (the log of) the relative prices, though in this case the weights are the average (in a democratic or plutocratic sense) budget shares devoted to the good concerned in the sub-group of interest. Once again, price relatives for a subset of goods may need to be taken from the community (or price) questionnaire instead.

Recommendation: The Paasche index is our preferred price index to use to adjust for cost-of-living differences faced by different households.

Household Composition

Issue: Household aggregate needs to be adjusted to take into account differences in size and composition amongst households.

Recommendation: Need to deflate household aggregate by appropriate measure of size/composition.

Issue: Different methods of deriving deflators, including the behavioral approach, the subjective approach, and the arbitrary approach.

Recommendation: Continue using PCE, supplemented with measures [...]; use low α and high θ in poor countries, and the reverse in richer countries.

6. METHODS OF SENSITIVITY ANALYSIS

6.1 Introduction:

Although the general procedures for calculating money metric utility are well-defined in theory, in

practice, compromises have to be made, and difficult choices have to be made between imperfect alternatives.

Is it better to add in a poorly measured component of consumption, such as imputed rent, or a component that

is lumpy and transitory, such as health expenditures, and sacrifice accuracy for an attempt at completeness?

Decisions about equivalence scales are almost always controversial, and even if we use the formulas (5.3) or

(5.4), how do we know that the results are robust to the choice of parameters that control child costs and

economies of scale? Even with perfect estimates of money metric utility, poverty analysis is subject to its own

inherent uncertainty associated with the difficulty of choosing a poverty line. Although there is much to be said

for making the best decisions one can, picking a sensible poverty line, and pressing ahead, it is

informative to examine the sensitivity of key results to alternatives. In recent years, much use has been made of

stochastic dominance analysis to examine the sensitivity of poverty measures to different poverty lines, and

this work has led to a much closer integration between poverty measurement and welfare analysis more

generally. Stochastic dominance techniques can also be useful in examining the sensitivity of poverty analyses

to the way in which money metric utility is constructed, including the choice of equivalence scale. In

this Section, we explore some of these issues.

Suppose that we have a money metric utility measure which, for the moment and to reduce notational

clutter, we denote by x. Suppose too that we are interested in the headcount ratio (HCR), the proportion of

people whose money metric utility is below the poverty line z. If F(.) is the cumulative distribution function (cdf) of x

in the population, F(z) is the fraction below z, and thus is the HCR. The sensitivity of the HCR to changes in

z, can be assessed simply by plotting the HCR as a function of z, i.e. by plotting the cdf F(z) as a function of

z. Suppose then that we have two measures of money metric utility, x1 and x2, corresponding to two different

decisions about construction. Suppose that these decisions are such that it makes sense to use the same poverty

line for both - this will be the case if both are unbiased for the true money metric utility, and neither is more

precise than the other. We discuss what happens when this is not the case in the subsections below, though it is

sometimes obvious how to adapt the poverty line in moving from one situation to the other. Then if the two

cdfs are F1(.) and F2(.), the two HCRs are F1(z) and F2(z). Plotting both of these functions against z on a

single graph shows which measure gives the higher HCR, and how the difference in HCRs depends on the choice of

poverty line z. Figure 2 illustrates the lower part of the cumulative distribution

[Figure 2: Cumulative distribution functions of two measures of welfare. Vertical axis: F(x). Horizontal axis: x, with poverty lines z_a and z_b marked. Curves: cdf of measure 1 and cdf of measure 2.]

functions for two (imaginary) measures of welfare. If the horizontal axis is thought of as the poverty line, each

line tells us the fraction of people in poverty corresponding to that poverty line. Putting the two graphs on the

same figure tells us how robust the headcount ratio will be to the choice of measure at different poverty lines.

For any low enough poverty line, below z_a, the headcount ratio will be higher for measure 2. For a choice

of poverty line between z_a and z_b, measure 1 gives the higher poverty count, reversing again above z_b. Given

some idea of the relevant poverty line, such figures tell us how the choice of measure affects the headcount.
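
The comparison of headcount ratios across poverty lines can be sketched as follows. This is an illustration with hypothetical function names and made-up data, not the authors' code; it treats the headcount ratio as the (optionally weighted) share of observations strictly below the line.

```python
def headcount_ratio(welfare, z, weights=None):
    """Share of (weighted) observations with welfare strictly below z."""
    if weights is None:
        weights = [1.0] * len(welfare)
    poor = sum(w for x, w in zip(welfare, weights) if x < z)
    return poor / sum(weights)


def headcount_curve(welfare, poverty_lines, weights=None):
    """Headcount ratio evaluated at each candidate poverty line."""
    return [headcount_ratio(welfare, z, weights) for z in poverty_lines]


# Two hypothetical welfare measures for the same ten households.
measure_1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
measure_2 = [2, 2, 3, 3, 5, 6, 8, 9, 11, 12]
poverty_lines = [2.5, 4.5, 8.5]

curve_1 = headcount_curve(measure_1, poverty_lines)
curve_2 = headcount_curve(measure_2, poverty_lines)

# Positive entries mean measure 1 gives the higher headcount at that line.
differences = [a - b for a, b in zip(curve_1, curve_2)]
```

Plotting the two curves against the poverty line, or simply inspecting the differences, shows at which lines the choice of measure matters. With survey data, the sampling weights would normally be passed through the `weights` argument.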

This rather mechanical exercise becomes more interesting when we come to construct poverty profiles,

for example for different groups, such as children and the elderly, or households in different regions. Suppose

that we have two groups G and H, and that the conditional cdfs of the two measures are now F1(. | G) and

F2(. | G) for G, with similar expressions for H. What we are typically concerned about is whether the relative

poverty rates of G and H are sensitive to the choice between the two measures, and to what extent the

conclusion depends on the choice of the poverty line. For poverty line z, and measure i, for i equal to 1 or 2, the

difference in poverty rates between the two groups is

\Delta_i(z) = F_i(z \mid G) - F_i(z \mid H). \qquad (6.1)

Plotting Δ_i(z) against z for a given i, and seeing whether it ever cuts the horizontal axis, tells us

whether the poverty ranking of the two groups is sensitive to the choice of poverty line. Plotting the two Δ_i

functions on the same graph tells us whether, at any given poverty line, the ranking is sensitive to the

construction of the utility measure, and whether that sensitivity (or lack of it) depends on the choice of poverty

line. A worked example of this kind of analysis is given in Section 6.3 below.
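
The group comparison in (6.1) can be sketched in the same way. The code below assumes (6.1) is the difference between the two groups' headcount ratios at a given poverty line; the function names and the data are hypothetical, and checking a finite grid of lines is only a rough stand-in for inspecting the whole curve.

```python
def poverty_rate(values, z):
    """Share of a group with welfare strictly below the poverty line z."""
    return sum(v < z for v in values) / len(values)


def poverty_rate_difference(group_g, group_h, z):
    """Difference in poverty rates between groups G and H at line z."""
    return poverty_rate(group_g, z) - poverty_rate(group_h, z)


def ranking_is_robust(group_g, group_h, poverty_lines):
    """True if the difference never changes sign over the grid of lines."""
    diffs = [poverty_rate_difference(group_g, group_h, z) for z in poverty_lines]
    return all(d >= 0 for d in diffs) or all(d <= 0 for d in diffs)


poverty_lines = [1.5, 2.5, 3.5, 4.5]

# Case 1: G is poorer than H at every line on the grid.
robust = ranking_is_robust([1, 2, 3, 4], [2, 3, 4, 5], poverty_lines)

# Case 2: the ranking flips as the line rises, so it is not robust.
not_robust = ranking_is_robust([1, 1, 5, 5], [2, 2, 3, 3], poverty_lines)
```

Running the same check with each welfare measure in turn shows whether the ranking of the two groups depends on how the measure was constructed.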

Sensitivity calculations for the headcount ratio involve the comparison of the cdfs of two distributions.

Similar calculations are possible for other poverty measures; for example, the sensitivity of the poverty gap

measure to the poverty line can be examined by plotting the areas under the cdfs, see Deaton (1997) for a

review of the literature and for examples. These higher order stochastic dominance comparisons can be used in

the same way as above to examine the effects of construction on higher-order poverty measures.

6.3 Using subsets of consumption and the effects of measurement error:

It is often clear from the data collection exercise or from the subsequent analysis of the data that some

components of consumers' expenditure are much better measured than others. Food is sometimes thought to be

easier to measure than non-food, if only because in households that eat from a common pot, there is a single

well-informed individual who can act as respondent. [...] for

imputed rent for owner occupiers in an economy where house tenancy is very rare. As a result, most analysts

who have had to work through an LSMS survey, writing code to make the imputations, tend to be rather

unwilling to make much use of the subsequent numbers. Whether it is better to use a subset of well-measured

expenditures to assess poverty is an important question that has been raised by Lanjouw and Lanjouw (1996).

As we have already seen, essentially the same issue arises in deciding whether or not to include an expenditure

item where there are large, occasional expenditures. Transitory expenditure around a longer run mean is

effectively the same as measurement error. In the rest of this subsection, we sketch out some results that are

useful in thinking about measurement error and transitory expenditure. While we follow the lead of Lanjouw

and Lanjouw, there are some differences in the analysis, both in methods and in results.

Before going on, it is worth noting that instrumental variable techniques for measurement error that are

standard for making imputations, or for correcting regression analysis, are of more limited use when we are

concerned with measuring poverty or inequality. The essential problem is that poverty and inequality depend

on dispersion, not means, or even conditional means. If we are trying to estimate the mean expenditure of the

population on some item, and some households have missing or implausible values, it is standard practice to

impute an estimate, often from the mean of similar households, or more generally, from a regression using

instruments, variables that are thought to be correlated with the missing information. But because such

regressions only capture a fraction of the variation in the variable, the fitted values will be less variable

than the actuals, and imputation will tend to reduce inequality and poverty (if the poverty line is low enough.)

Of course, for transitory expenditures and for measurement error, variance reduction is exactly what we want.

But imputations are likely to eliminate not only the measurement error, but also the genuine variation across

households, something that we need to preserve.
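
The loss of dispersion from regression-based imputation is easy to see in a simulation. The sketch below is an illustration under stated assumptions, not a description of any survey procedure: the data are simulated, the names are hypothetical, and it takes the extreme case in which every household's value is replaced by its fitted value from a one-variable least squares regression.

```python
import random
import statistics


def ols_line(x, y):
    """Intercept and slope of a least squares regression of y on x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def share_below(values, z):
    """Share of observations strictly below z."""
    return sum(v < z for v in values) / len(values)


rng = random.Random(0)  # fixed seed so the run is repeatable
instrument = [rng.gauss(0.0, 1.0) for _ in range(2000)]
# "True" expenditure: partly explained by the instrument, partly genuine
# household-to-household variation that the regression cannot capture.
actual = [10.0 + 2.0 * v + rng.gauss(0.0, 3.0) for v in instrument]

intercept, slope = ols_line(instrument, actual)
fitted = [intercept + slope * v for v in instrument]

var_actual = statistics.pvariance(actual)
var_fitted = statistics.pvariance(fitted)

# Headcount at a low poverty line, well below the mean of 10.
poor_actual = share_below(actual, 6.0)
poor_fitted = share_below(fitted, 6.0)
```

The fitted values have the same mean as the actual values but a smaller variance, and far fewer of them fall below the low poverty line, which is the sense in which imputation tends to reduce measured inequality and poverty.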

Start by assuming that there is a subset of total expenditure, such as food, expenditure on which is

denoted by e, and that, conditional on total expenditure, x, we have

E(e \mid x) = m(x); \quad V(e \mid x) = \sigma^2 \qquad (6.2)

The regression function m(x) can be thought of as an Engel curve, or as the true value of x when x

is measured with error, or the long-run value of x when x has a large transitory component. The poverty line in

terms of x is, as before, z, and the cdf of x is F(.), so that the headcount ratio is F(z). Suppose that instead

of defining the poor in terms of low x, we define them in terms of low e; to do so, we must select an

appropriate poverty line for e, and one obvious choice is to take the level of e on the Engel curve where total

expenditure is equal to the poverty line, i.e. m ( z ). The headcount ratio using e is then given by

P_e = F_e(m(z)) \qquad (6.3)

where F_e(.) is the cdf of e. If we assume that m(x) is monotone, and therefore invertible, it can be shown

that P_e is related to the "true" headcount ratio P_x by the approximation

P_e \approx P_x + \frac{\sigma^2}{2\, m'(z)^2} \left[ f'(z) - \frac{f(z)\, m''(z)}{m'(z)} \right] \qquad (6.4)

where f(x) is the pdf of x. (This result is closely related to those derived in a somewhat different context by

Ravallion, 1988.)

Note first that when the Engel curve fits perfectly (or there is no measurement error, or no transitory

expenditure), so that σ = 0, the two poverty counts coincide, a result that is exact. Otherwise, the two poverty

counts will diverge in a way that depends on the slope of the density of x at the poverty line, and on the

convexity or concavity of the Engel curve. When the Engel curve is linear or when we are dealing with

transitory expenditures or measurement error, the second term in brackets is zero, so that "food" poverty will

overstate "true" poverty if f'(z) > 0, which will occur if the density of x is unimodal and the poverty line is

below the mode. If this condition holds, the overstatement will be exacerbated if the Engel curve is concave,

and moderated if it is convex.

These results are a useful starting point, but are not directly practical; if we knew both x and its

component e, there would be no need to use the latter. Nevertheless, there are two immediate corollaries that

are more useful. The first is the case where m ( x) = x, so that e is just an error ridden measure of x, so that

(6.4) becomes

P_e \approx P_x + \frac{\sigma^2}{2} f'(z) \qquad (6.5)

which gives us a guide about how measurement error inflates (or deflates) the poverty measure. This formula is

also useful in practice if we have some idea of the variance of the measurement error, which, for example,

could be estimated from two error-ridden but independent measures of x. Note also that (6.5) is the basis for

the (often somewhat mysterious) result that for unimodal distributions, where f'(x) is first positive and then

negative, adding measurement error increases the head count ratio if the poverty line is below the mode, so that

f'(z) > 0, and decreases it when the poverty line is above the mode, where f'(z) < 0. Except in the very

poorest areas, we would expect the poverty line to be below the mode.
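
The effect of measurement error on the headcount can be checked numerically in a case where the exact answer is known. The sketch below assumes that the approximation takes the form P_e ≈ P_x + (σ²/2) f'(z), that true welfare x is normally distributed, and that the error is independent and normal with standard deviation σ, so that the error-ridden measure is also normal. All names and parameter values are hypothetical.

```python
import math


def normal_cdf(v, mu, s):
    """Cdf of a normal distribution with mean mu and standard deviation s."""
    return 0.5 * (1.0 + math.erf((v - mu) / (s * math.sqrt(2.0))))


def normal_pdf(v, mu, s):
    """Density of the same normal distribution."""
    return math.exp(-0.5 * ((v - mu) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))


def normal_pdf_slope(v, mu, s):
    """Derivative of the normal density, f'(v)."""
    return -(v - mu) / s ** 2 * normal_pdf(v, mu, s)


def approx_headcount(z, mu, s, sigma):
    """Second-order approximation: F(z) + (sigma^2 / 2) * f'(z)."""
    return normal_cdf(z, mu, s) + 0.5 * sigma ** 2 * normal_pdf_slope(z, mu, s)


def exact_headcount(z, mu, s, sigma):
    """Exact headcount for x + error, which is normal with variance s^2 + sigma^2."""
    return normal_cdf(z, mu, math.sqrt(s ** 2 + sigma ** 2))


mu, s, sigma = 100.0, 30.0, 10.0

# Poverty line below the mode: error raises the headcount.
true_low = normal_cdf(70.0, mu, s)
exact_low = exact_headcount(70.0, mu, s, sigma)
approx_low = approx_headcount(70.0, mu, s, sigma)

# Poverty line above the mode: error lowers the headcount.
true_high = normal_cdf(130.0, mu, s)
exact_high = exact_headcount(130.0, mu, s, sigma)
approx_high = approx_headcount(130.0, mu, s, sigma)
```

In this example the approximation tracks the exact headcount closely, and the direction of the bias switches as the poverty line moves from below the mode to above it.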

The approximation formula is also useful when considering whether or not to include a poorly

measured component in the total. To simplify, suppose that e is the noncontroversial component of the total x,

so that adding in the controversial component would, in principle, take us to the total x. Suppose that the Engel

curve for e is linear, so that the derivative m'(x) is constant, equal to $\beta$ say. To avoid confusion, rewrite its

variance around the regression line as $\sigma_e^2$, where the subscript e identifies the noncontroversial component.

From (6.4), the poverty count using the comprehensive, but noisy measure is

$$P_c \approx P_x + \frac{\sigma_c^2}{2} f'(z) \qquad (6.6)$$

where $\sigma_c^2$ is the variance of the measurement error in the comprehensive (but noisy) total; c is for comprehensive. From (6.4),

the poverty count using the non-controversial component alone is

$$P_e \approx P_x + \frac{\sigma_e^2}{2\beta^2} f'(z) \qquad (6.7)$$

Since it is normally the case that the poverty line is below the mode, we can assume that f'(z) is positive, in

which case the poverty count based on the comprehensive but noisy measure will be closer to the truth if

$$\beta < \frac{\sigma_e}{\sigma_c} \qquad (6.8)$$

Note that $\beta$ is the share of the marginal rupee devoted to the non-controversial good, and that $1 - \beta$ is

the share going to the controversial component, so that the case for inclusion of the controversial item is strong if, at

the margin, a large share of total expenditure is devoted to it, while the case is weaker the larger is the ratio of

variance in the comprehensive measure to the noncontroversial measure. This result is perhaps not surprising.

A strong link to total expenditure is a case for inclusion, while making the total noisier is a case against

inclusion. Note finally that (6.8) can be written in terms of the total-expenditure elasticity of the non-controversial

component, $\varepsilon_e$. If so, the condition for including the controversial item becomes

$$\varepsilon_e < \frac{\sigma_e / e}{\sigma_c / x} \qquad (6.9)$$

Since the share-weighted average of the controversial and noncontroversial elasticities is unity, (6.9) is a

prescription of including controversial items if their total expenditure elasticities are large, provided they do not

add too much measurement error. Of course, neither $\sigma_e$ nor $\sigma_c$ can actually be observed in practice, but the

formulas (6.8) and (6.9) tell us what to look for and what to think about when making the decision to trade off

comprehensiveness versus precision.
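As a rough illustration of how (6.6) to (6.9) fit together, the following sketch compares the two approximate biases and applies the resulting rule. The function names and all numbers are hypothetical, and in practice the two standard deviations would have to be guessed or bounded rather than observed.

```python
def bias_comprehensive(sigma_c, f_prime_z):
    """Approximate bias of the headcount from the comprehensive total, as in (6.6)."""
    return 0.5 * sigma_c ** 2 * f_prime_z


def bias_component(sigma_e, beta, f_prime_z):
    """Approximate bias when only the noncontroversial component is used, as in (6.7)."""
    return 0.5 * (sigma_e / beta) ** 2 * f_prime_z


def prefer_comprehensive(beta, sigma_e, sigma_c):
    """Condition (6.8): include the controversial item when beta < sigma_e / sigma_c."""
    return beta < sigma_e / sigma_c


def prefer_comprehensive_elasticity(elasticity_e, share_e, sigma_e, sigma_c):
    """Condition (6.9), using beta = elasticity * budget share (share = e / x)."""
    return prefer_comprehensive(elasticity_e * share_e, sigma_e, sigma_c)


# Hypothetical values: adding the controversial item doubles the noise.
sigma_e, sigma_c, slope = 5.0, 10.0, 0.0003
for beta in (0.4, 0.6):
    print(beta,
          prefer_comprehensive(beta, sigma_e, sigma_c),
          bias_comprehensive(sigma_c, slope),
          bias_component(sigma_e, beta, slope))
```

With these numbers the comprehensive total is preferred when the marginal share of the noncontroversial component is 0.4, but not when it is 0.6.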

6.4 Sensitivity analysis with equivalence scales:

Suppose that we are working with the formula (5.3) that links adult equivalents to the number of adults

A and the number of children K according to

$$AE = (A + \alpha K)^{\theta} \qquad (6.9)$$

and that we do not know $\alpha$ or $\theta$, though we may be prepared to commit to a range of values for each. Given

values for the two parameters, we can compute money metric utility values for everyone so that, armed with a

poverty line, we can calculate poverty rates for any groups. In this context, groups that we are particularly

likely to be interested in are children, adults, and the elderly, as well as other groups where households have

distinctive demographic compositions. Sensitivity analysis, over a range of values

of $\alpha$, $\theta$, and z, proceeds in very much the same way as discussed in Section 6.1 above.

However, as in Section 6.2 but in contrast to Section 6.1, we cannot simply change the parameters and

leave the poverty rate unchanged. For example, suppose that $\alpha$ is set at 1, and $\theta$ is reduced from 1 to 0.5. As a

result, the number of adult equivalents falls in every household with more than one member, so that, if the poverty

line were held constant, poverty would be decreased. But this is not what we want changes in the parameters of

the equivalence scale to do. Instead, we want to alter the relative standings of large households relative to small

households, or households with large numbers of children relative to those with none. A straightforward way to

do this is to select a particular household type as "pivot," and to choose the equivalence scale in such a way

that the money metric utility of people in such households is unaffected by changes in the parameters. Denote

the number of adults and children in the reference or pivot household by $(A_0, K_0)$; in practice this should be

chosen as the modal type, for example, a two adult and three child household. We then define money metric

utility, not as x divided by AE, but as

$$x^* = \frac{x}{(A + \alpha K)^{\theta}} \cdot \frac{(A_0 + \alpha K_0)^{\theta}}{A_0 + K_0} \qquad (6.10)$$

At any given values of $\alpha$ and $\theta$, $x^*$ is just a scaled version of x / AE; but for the reference household, $x^*$

is always equal to per capita expenditure, and is unaffected by changes in $\alpha$ and $\theta$.
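A minimal sketch of the pivot normalization in (6.10) follows. The function names are hypothetical, and the default reference household (two adults, three children) is simply the example mentioned in the text.

```python
def adult_equivalents(adults, children, alpha, theta):
    """Equivalence scale AE = (A + alpha * K) ** theta."""
    return (adults + alpha * children) ** theta


def pivot_money_metric(x, adults, children, alpha, theta,
                       ref_adults=2, ref_children=3):
    """Expenditure per equivalent adult, rescaled as in (6.10) so that the
    reference household always receives its per capita expenditure."""
    ref_scale = (adult_equivalents(ref_adults, ref_children, alpha, theta)
                 / (ref_adults + ref_children))
    return x / adult_equivalents(adults, children, alpha, theta) * ref_scale


# The reference household is unaffected by the choice of parameters...
for alpha, theta in [(1.0, 1.0), (0.5, 0.75), (0.75, 0.5)]:
    print(alpha, theta, pivot_money_metric(500.0, 2, 3, alpha, theta))

# ...while other household types move relative to it.
print(pivot_money_metric(200.0, 1, 0, alpha=1.0, theta=1.0))
print(pivot_money_metric(200.0, 1, 0, alpha=0.5, theta=0.5))
```

In this made-up example a single adult's money metric utility falls when children are made cheaper and economies of scale larger, because the normalization holds the five-person pivot household fixed.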

An alternative procedure, not pursued here but equally useful in practice, is to alter the poverty line for

use with equivalent expenditure so as to hold constant the measure of interest, for example the head count

ratio. This is straightforward to do. Begin by computing expenditure per equivalent adult,

dividing total expenditure by equivalent adults calculated using the chosen values of $\alpha$ and $\theta$. For a trial

poverty line, calculate the head count ratio, and continue adjusting until the head count ratio returns to its value

using per capita expenditure. Equivalently, the ratio of the new to the old poverty lines can be used to deflate

expenditure per equivalent, at which point the original poverty line can be used.
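The trial-and-error adjustment just described can be sketched as a bisection on the poverty line. Everything below is illustrative: the households are invented, the helper names are hypothetical, and people are weighted by household size times the sampling weight.

```python
def headcount(values, weights, line):
    """Weighted share of people whose welfare measure is at or below the line."""
    poor = sum(w for v, w in zip(values, weights) if v <= line)
    return poor / sum(weights)


def matching_line(values, weights, target, lo, hi, tol=1e-9):
    """Bisection for the lowest line whose headcount reaches `target`.

    Assumes headcount(lo) < target <= headcount(hi).
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if headcount(values, weights, mid) >= target:
            hi = mid
        else:
            lo = mid
    return hi


# Made-up households: (total expenditure, adults, children, sampling weight)
households = [(300.0, 2, 3, 1.0), (200.0, 1, 0, 1.0),
              (450.0, 2, 1, 1.0), (240.0, 2, 4, 1.0)]
alpha, theta, z = 0.5, 0.75, 70.0

person_weights = [w * (a + k) for _, a, k, w in households]
per_capita = [x / (a + k) for x, a, k, _ in households]
per_equiv = [x / (a + alpha * k) ** theta for x, a, k, _ in households]

# Headcount ratio to be held constant: the one obtained with per capita expenditure.
target = headcount(per_capita, person_weights, z)
new_line = matching_line(per_equiv, person_weights, target,
                         lo=0.0, hi=max(per_equiv))
deflator = new_line / z  # ratio of the new to the old poverty line
print(target, new_line, deflator)
```

Because the empirical headcount is a step function of the line, the search returns the lowest line at which the target ratio is reached rather than an exact root. Dividing expenditure per equivalent by `deflator` then allows the original line to be reused.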


Figures 3-5, reproduced from Deaton and Paxson (1997), show what happens to the relative poverty

of children, non-elderly adults, and the elderly in South Africa using the 1993 South African LSMS. These

calculations are done on an individual basis whereby, when money metric utility is assigned to a household, it is

assigned to each person in that household. When we are doing population calculations, such as a mean or a

measure of dispersion, the money metric utility of the household is weighted by the product of the number of

people in the household and the household's sampling weight or inflation factor. Figure 3 shows the cdfs for

the three groups, for a range of possible poverty lines, and for nine combinations of values for $\alpha$ and $\theta$.

Across these combinations, and regardless of the poverty line, non-elderly adults have a lower

headcount ratio than do children or the elderly. The poverty profile of the elderly versus that of children

depends on the values of the parameters. In the top right of the figure, where children are cheap, and

economies of scale are large, children do better than the elderly, who benefit relatively little from either

economies of scale or inexpensive children. At the bottom left of the picture, where there are no discounts for

children or for large household size, so that money metric utility is expenditure per capita, children are more likely to

be poor than the elderly at all poverty lines.
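The individual-level calculation described above, in which each person inherits the household's money metric utility and counts with the household's sampling weight, might be sketched as follows. The record layout and field names are assumptions made for this illustration.

```python
def group_headcounts(households, line, groups=("children", "adults", "elderly")):
    """Headcount ratio for each demographic group at a given poverty line.

    Each household dict holds its money metric utility ('mmu'), its sampling
    weight ('weight'), and the number of members in each group. Every member
    is assigned the household's mmu, so a household contributes
    weight * (members in group) persons to that group's totals.
    """
    poor = dict.fromkeys(groups, 0.0)
    total = dict.fromkeys(groups, 0.0)
    for hh in households:
        for g in groups:
            persons = hh["weight"] * hh[g]
            total[g] += persons
            if hh["mmu"] <= line:
                poor[g] += persons
    return {g: poor[g] / total[g] for g in groups if total[g] > 0}


# Made-up sample of three households.
sample = [
    {"mmu": 50.0, "weight": 2.0, "children": 3, "adults": 2, "elderly": 0},
    {"mmu": 120.0, "weight": 1.0, "children": 0, "adults": 1, "elderly": 2},
    {"mmu": 80.0, "weight": 1.0, "children": 1, "adults": 2, "elderly": 1},
]
rates = group_headcounts(sample, line=90.0)
print(rates)
```

Evaluating `group_headcounts` over a grid of poverty lines traces out the group-specific cdfs of the kind plotted in Figure 3.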

Figures 4 and 5 show plots of the difference between the cdf for the elderly and the cdf for children for

the same range of the poverty line, but with plots for different values of $\alpha$ and $\theta$ on the same graph. By

discarding the automatic increase in the cdf with the level of the poverty line, and looking only at differences,

these graphs permit greater focus on the differences of interest, here the elderly versus children. Figure 4 shows

the movement on Figure 3 from top right to bottom left, and shows how children become relatively poorer, and

that, in the middle configuration, with $\alpha = \theta = 0.75$, the relative poverty rates depend on the value of the

poverty line. Figure 5 shows the progress through Figure 3 from top left to bottom right, and shows a more

muddied picture. All three graphs show that the relative poverty rates of the two groups depend on the poverty line.

Figure 3: South Africa, poverty headcount ratios at various poverty lines and for various child costs and economies of scale. (Nine panels, one for each combination of alpha and theta; horizontal axis: poverty line in PEX, per equivalent expenditure.)

Figure 4: South Africa: poverty rates of the elderly and children. (Curves for alpha = theta = 0.5, alpha = theta = 0.75, and alpha = theta = 1; horizontal axis: poverty line in PEX, per equivalent expenditure.)

Figure 5: South Africa: poverty rates of the elderly and children. (Curves for alpha = 1, theta = 0.5; alpha = 0.5, theta = 1; and alpha = 0.75, theta = 0.75; horizontal axis: poverty line in PEX, per equivalent expenditure.)

What should we conclude from sensitivity analyses like these? Much of the time, the desired result

from a sensitivity analysis is to find that the results are robust, so that clear conclusions can be drawn. This will

sometimes be the case, but rarely for the analysis of equivalence scales, where we know from a large body of

work that some important results are not robust. Indeed, Deaton and Paxson show similar sensitivity of

the relative poverty rates of children and the elderly, not only for South Africa, but also for Ghana, Pakistan,

Taiwan, and Thailand, but not Ukraine. In the absence of a breakthrough in behavioral and/or subjective

methods of measuring equivalence scales, it may simply be necessary for policy to be conducted in ignorance

of the relative poverty of some groups.

This section is somewhat more speculative (as well as more technical) than the other sections in these

guidelines. Nevertheless, there are a number of general points and recommendations that should be drawn from

the analysis.

First, to the extent that the welfare measures are to be used for poverty analysis, and in particular the

calculation of headcount ratios, the first-order stochastic dominance techniques of Section 6.2 (illustrated for

equivalence scales in this Section) are easy to use and often provide useful insights. That said, these techniques

should not be used to check out the results of every controversial decision in constructing the consumption

aggregates. There are so many points where judgment calls have to be made, and they combine with one

another to produce an impossibly large number of alternatives. Decisions have to be made for better or worse.

But there are often critical decisions, of which that about equivalence scales is one, and the inclusion of a noisy

item of expenditure is often another, where we know in advance that the decision is going to matter for the

poverty analysis, and where it is important to have more information on exactly how it matters. For this,

stochastic dominance analysis is ideally suited.

Second, we have no general recommendations on how to reduce measurement error; that is largely a

question of survey design. The crucial point is always to be aware of its existence, and to ask, every time a

decision is made, whether or not that decision would be different depending on the extent of measurement

error. We hope that the formulas in Section 6.3, although no panacea, will be helpful in that enterprise.

Blackorby, Charles and David Donaldson, 1987, "Welfare ratios and distributionally sensitive cost-benefit analysis," Journal of Public Economics, 34, 265-90.

Blackorby, Charles and David Donaldson, 1988, "Money metric utility: a harmless normalization?" Journal of Economic Theory, 46, 120-29.

Chaudhuri, Shubham and Martin Ravallion, 1994, "How well do static indicators identify the chronically poor?" Journal of Public Economics, 53, 367-94.

Deaton, Angus S., 1980, "The measurement of welfare: theory and practical guidelines," LSMS Working Paper No. 7, Washington, D.C. The World Bank.

Deaton, Angus S., 1997, The analysis of household surveys: microeconometric analysis for development policy. Baltimore, Md. Johns Hopkins University Press for The World Bank.

Deaton, Angus and Margaret Grosh, 1999, Chapter 17: Consumption, in Margaret Grosh and Paul Glewwe, eds., Designing Household Survey Questionnaires for Developing Countries: Lessons from Ten Years of LSMS Experience, World Bank (forthcoming).

Deaton, Angus and John Muellbauer, 1980, Economics and consumer behavior, New York, Cambridge University Press.

Deaton, Angus S., and John Muellbauer, 1986, "On measuring child costs: with applications to poor countries," Journal of Political Economy, 94, 720-44.

Deaton, Angus S., and Christina H. Paxson, 1998, "Economies of scale, household size, and the demand for food," Journal of Political Economy, 106, 897-930.

Deaton, Angus S., and Christina H. Paxson, 1998, "Poverty among children and the elderly in developing countries," Research Program in Development Studies, Princeton University, processed.

Diamond, Peter A., and Jerry A. Hausman, 1994, "Contingent valuation: is some number better than no number," Journal of Economic Perspectives, 8, 45-64.

Dreze, Jean and P. V. Srinivasan, 1997, "Widowhood and poverty in rural India: some inferences from household survey data," Journal of Development Economics, 54, 217-34.

Grosh, Margaret, and Paul Glewwe, 1998, "The World Bank's Living Standards Measurement Study Household Surveys," Journal of Economic Perspectives, 12, Number 1, 187-196.

Hanemann, W. Michael, 1994, "Valuing the environment through contingent valuation," Journal of Economic Perspectives, 8, 19-43.

Heckman, J., 1976, "The Common Structure of Statistical Models of Truncation, Sample Selection and Limited Dependent Variables and a Simple Estimator for Such Models," Annals of Economic and Social Measurement 5:475-92.

Howes, Stephan and Salman Zaidi, 1994, "Notes on some household surveys from Pakistan in the eighties and nineties," STICERD, London School of Economics, mimeo.

Lanjouw, Jean Olson, and Peter Lanjouw, 1997, "Poverty comparisons with noncompatible data: theory and illustrations," Policy Research Working Paper, Washington, D.C. The World Bank.

Lee, L. and Trost, R., 1978, "Estimation of Some Limited Dependent Variable Models with Application to Housing Demand," Journal of Econometrics, 8, 357-382.

Malpezzi, S. and Mayo, S., 1985, "Housing Demand in Developing Countries," World Bank Staff Paper No. 733, The World Bank, Washington D.C.

National Research Council, 1995, Measuring poverty: a new approach, Washington, DC. National Academy Press.

Ravallion, Martin, 1988, "Expected poverty under risk-induced welfare variability," Economic Journal, 98, 1171-82.

Ravallion, Martin, 1998, "Poverty lines in theory and practice," LSMS Working Paper 133, Washington, D.C. The World Bank.

Samuelson, Paul A., 1974, "Complementarity: an essay on the 40th anniversary of the Hicks-Allen revolution in demand theory," Journal of Economic Literature, 15, 24-55.

Singh, Inderjit, Lyn Squire, and John Strauss, 1986, Agricultural household models: extensions and applications, Baltimore, Md. Johns Hopkins University Press for The World Bank.

van Praag, Bernard M. S. and Marcel F. Warnaar, 1997, "The cost of children and the use of demographic variables in consumer demand," Chapter 6 in Mark Rosenzweig and Oded Stark, eds., Handbook of Population and Family Economics, IA, Amsterdam, North-Holland, 241-273.

American Economic Association

Data Watch: The World Bank's Living Standards Measurement Study Household Surveys

Author(s): Margaret E. Grosh and Paul Glewwe

Source: The Journal of Economic Perspectives, Vol. 12, No. 1 (Winter, 1998), pp. 187-196

Published by: American Economic Association

Stable URL: https://www.jstor.org/stable/2646946

Accessed: 04-05-2019 13:38 UTC

This content downloaded from 151.28.174.136 on Sat, 04 May 2019 13:38:33 UTCAll use subject to https://about.jstor.org/terms

Journal of Economic Perspectives, Volume 12, Number 1, Winter 1998, Pages 187-196

Data Watch

The World Bank's Living Standards

Measurement Study Household Surveys

Margaret E. Grosh and Paul Glewwe

This section will offer a description of a data source that may be of interest to

economists. The purpose is to describe what data are available from that source or

in that subject area, what questions can be addressed because of the unique features

of the data, and how an interested reader can gain access to the data. Suggestions

for data sources that might be discussed here (or comments on past columns)

should be sent to Greg J. Duncan, Center for Urban Affairs and Policy Research,

Northwestern University, 2040 Sheridan Road, Evanston, Illinois, 60208. His e-mail

address is ([email protected]).

Introduction

The World Bank established the Living Standards Measurement Study (LSMS)

in 1980 to explore ways of improving the accuracy, timeliness and policy relevance

of household survey data collected by government statistical offices in developing

countries. The objective of LSMS surveys is to collect data on many dimensions of

household well-being that can be used to assess household welfare, understand

household behavior, and evaluate the effect of various government policies on the

living conditions of the population. This paper describes the data that have been

collected under the LSMS survey program.1

1 The World Bank has also supported other household surveys, most notably those under the Social Dimensions of Adjustment (SDA) project for Africa. This paper covers only LSMS surveys because they

are the main source of household survey data at the World Bank that are available to outside scholars.

* Margaret Grosh and Paul Glewwe are Senior Economists, Development Research Group, The World Bank, Washington, D. C.

Table 1

Information on Data Sets in LSMS Archives (a)

Country                  Data Access   Year of First    Number of Rounds   Number of Households
                         Policy (b)    Survey           Fielded to Date    in Sample

Albania                  3             1996             1                  1500
Bulgaria                 1             1995             1                  2500
Cote d'Ivoire            3             1985/86          4                  1600
Ecuador                  1             1994             2                  4500/5500
Ghana                    2/3           1987/88          2                  3200
Guyana                   3             1992/93          1                  5340
Jamaica                  2             1988             11                 2000-6000
Kyrgyzstan               1/2           1994             3                  2000
Nepal                    2             1996             1                  3373
Nicaragua                3             1993             1                  4200
Pakistan                 1             1991             1                  4800
Peru 1985                1             1985/86          1                  5120
Peru 1990                1             1990, '91, '94   3                  1500/2200/3500
Romania                  3             1994/95          continuous         36,000
Russia                   1             1992             4                  6500
South Africa             1             1993             1                  9000
Tanzania, Kagera Region  1             1991             4                  800
Tanzania, national       1             1993             1                  5200
Vietnam                  2             1992/93          2                  4800/6000

Notes: (a) Data from surveys either recently completed or currently underway in Brazil, China (Hebei and Liaoning provinces), Kazakstan, Panama, Paraguay, Tunisia and Ukraine may be added to the archives over the next two years. Similarly, data may be included from additional survey rounds in countries already shown. The reader should note that the LSMS program has also assisted surveys in other countries, but because the governments have chosen to handle all dissemination themselves (Latvia, Lithuania) or because the data we have received is incomplete or incompletely documented (Mauritania, Morocco, Bolivia, Venezuela), we do not list these in the LSMS archives.

(b) Code 1 means that no prior permission from government is required to use the data. Code 2 means that such permission is required but the track record for a timely, positive response is good. Code 3 means that, according to PRDPH's best information, a substantial proportion of data requests have been denied, left unanswered, or answered affirmatively only after substantial delays.

Since 1985, LSMS surveys have been conducted in nearly 20 developing coun-

tries, listed in Table 1. Each year, surveys are done in a few new countries, and

several countries that have already conducted surveys carry out new ones. The

World Bank provides technical assistance for LSMS surveys. Financing comes from

several sources: World Bank loans, grants from other aid agencies, and government

funds from the countries implementing the surveys. Each survey project differs in

its goals, scope, context and results: some only collect data, some also build capacity

for data collection, some even build capacity for data analysis. The cycle from plan-

This content downloaded from 151.28.174.136 on Sat, 04 May 2019 13:38:33 UTCAll use subject to https://about.jstor.org/terms

Margaret E. Grosh and Paul Glewwe 189

ning a survey, implementing it, and preparing the abstract and public use data sets

can last from 18 to 36 months, usually with 12 months used in field work. Project

costs have run from $200,000 to $3 million, with the median being about $750,000.

All this variation means that the surveys are not a strictly standardized product, but

differ among themselves. In the next section we describe a "composite prototype"

LSMS survey, though none of the surveys adheres exactly to the definition, and

several differ from it in substantial ways. After describing the typical LSMS survey,

we briefly review what has been learned from the analysis of LSMS data, discuss the

future of LSMS surveys, and explain how LSMS survey data can be obtained.

Sample Design, Data Collected and Data Quality

LSMS surveys tend to use small samples, usually on the order of 2,000 to 5,000

households. The typical approach is first to draw areas from census-based sampling

units (such as census tracts or enumeration districts), and then to draw dwellings

from a list of all dwellings within that sampling unit. LSMS surveys usually have

equal numbers of dwellings, typically 16, drawn from each sampling unit. The sam-

ples are usually representative of the country as a whole and they are large enough

to allow consideration of certain subgroups, such as rural vs. urban, or a few major

agro-climatic zones. However, the samples are rarely large enough to allow mean-

ingful analysis at the state or province level. Although larger samples would reduce

sampling errors, they would also be far more difficult to manage well and therefore

would tend to have higher non-sampling errors. Moreover, larger samples would

also greatly increase the cost of the whole survey effort.
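The two-stage draw described in this section can be sketched in a few lines. This is a simplified illustration, not the procedure of any particular LSMS survey: areas are drawn here with equal probability (actual designs may use other selection probabilities), and the frame, identifiers, and function name are invented.

```python
import random


def two_stage_sample(dwellings_by_area, n_areas, per_area=16, seed=0):
    """Draw `n_areas` sampling units, then `per_area` dwellings within each.

    `dwellings_by_area` maps an area identifier to the list of all dwellings
    in that area. Simple random sampling at both stages is an assumption
    made for this sketch.
    """
    rng = random.Random(seed)
    areas = rng.sample(sorted(dwellings_by_area), n_areas)
    return {area: rng.sample(dwellings_by_area[area], per_area) for area in areas}


# Hypothetical frame: 200 enumeration areas with 40 listed dwellings each.
frame = {
    f"area_{i:03d}": [f"area_{i:03d}/dwelling_{j:02d}" for j in range(40)]
    for i in range(200)
}
sample = two_stage_sample(frame, n_areas=125)  # 125 areas x 16 dwellings = 2,000
print(len(sample), sum(len(d) for d in sample.values()))
```

Fixing the number of dwellings per area keeps interviewer workloads equal across sampling units, which is one practical reason for this kind of design.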

In some countries, only one LSMS survey has been done, while in others mul-

tiple cross-sectional data sets are available. In a few countries, panel data sets with

two or three rounds of data have been collected, often separated by a year. For

example, two-year panels exist for Cote d'Ivoire and Ghana. Jamaica has a four-year

panel. The 1992-93 Tanzania survey (Kagera region) interviewed households four

times over a period of two years. The series of Peruvian surveys has a panel of

households in Lima for 1985, 1990 and 1991, and a nationwide panel exists for

1991 and 1994. By the end of 1998, there will be panel data from Vietnam for two

points in time, five years apart.

A standard, "full-size" LSMS survey normally uses three different kinds of ques-

tionnaires: a household questionnaire, a community questionnaire, and a price

questionnaire. A fourth type, a school or health facility questionnaire, is sometimes

used as well.

The household questionnaire collects comprehensive consumption data. There

are detailed questions on cash expenditures and on the value of food items grown

by the household or received as gifts. In addition, sufficient information is collected

on the ownership of housing and durable goods to calculate an annual use value.

A wide range of income information is also usually collected. For individuals in

formal sector jobs, most surveys contain detailed questions about wages, bonuses

and various forms of in-kind compensation. In most surveys, lengthy agriculture

and small enterprise modules are included to yield estimates of net household

income from these activities. Finally, non-labor income is recorded as well, such as

the receipt of private inter-household transfers, public transfers (in cash or in kind),

lottery winnings and interest income. Other parts of the household survey collect

information on outcomes and behaviors pertaining to education, health, migration,

fertility and nutrition. Collecting data on a variety of household characteristics from

the same households makes it possible both to describe the multiple dimensions

of living standards and to analyze relationships among different dimensions of

household welfare and behavior, such as the impact of parents' education on child

nutrition or the effect of health status on employment and school enrollment.

Information on local conditions that are common to households living near

each other is gathered in a community questionnaire. In most countries, these ques-

tionnaires have been used in rural areas, where local communities are easier to

define than in urban areas. The community questionnaire usually includes the

location and quality of nearby health facilities and schools, local agricultural con-

ditions and practices, prevailing wage rates for unskilled labor, and the condition

of local infrastructure such as roads, sources of fuel, water, electric power and means

of communication. The information is provided by interviewing community leaders

(mayors, headmen, village elders) and service providers (teachers, health workers,

agricultural extension agents) as appropriate depending on the governance struc-

ture in the country and the content of the questionnaire.

Price questionnaires are important because in many countries, prices vary con-

siderably among regions, so information on the prices that households actually face

is needed. Thus, most LSMS surveys include short questionnaires to record local

prices of commonly purchased goods, including food items, basic non-food goods,

medicines and selected agricultural inputs.

Sometimes detailed information on the availability, quality or prices of schools

or health clinics is desired. When this is the case, special facility questionnaires are

developed to supplement and/or replace those sections of the community

questionnaire.

Concerns are often expressed about the quality of data from developing coun-

tries. LSMS surveys are designed and carried out in ways that should produce high

quality data. For example, interviewers receive a full month of training prior to field

work, and supervision ratios are high: one supervisor for every two or three interviewers. The design of the household questionnaire includes explicit wording of

each question, prompts, skip patterns and detailed interviewer instructions, all of

which reduces the potential for variation among interviewers. Interviewers are in-

structed to administer the schooling, health, employment, migration and fertility

modules individually to each household member, and to avoid having one adult

provide answers for another. Data entry and editing are carried out concurrently

with field work either in local field offices or by data entry operators who accompany

the field teams with laptop computers. The data are checked with custom-designed

software programs to minimize data entry errors and to detect inconsistencies and

out-of-range values. These can usually be resolved during a second visit to the house-

hold. For in-depth discussions of LSMS data quality control procedures see Grosh

and Muñoz (1996).
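The kind of automated check mentioned above (out-of-range values and inconsistencies flagged at data entry) might look like the following sketch. It is not the LSMS software: the field names, valid ranges, and the consistency rule are all hypothetical.

```python
def check_record(record, valid_ranges):
    """Return a list of (field, problem) pairs for one person-level record."""
    problems = []
    for field, (low, high) in valid_ranges.items():
        value = record.get(field)
        if value is None:
            problems.append((field, "missing"))
        elif not low <= value <= high:
            problems.append((field, "out of range"))
    # Example of a cross-field consistency check (hypothetical rule).
    age, schooling = record.get("age"), record.get("years_schooling")
    if age is not None and schooling is not None and schooling > age:
        problems.append(("years_schooling", "inconsistent with age"))
    return problems


# Hypothetical valid ranges for three fields.
VALID_RANGES = {"age": (0, 110), "years_schooling": (0, 25), "height_cm": (30, 220)}

clean = check_record({"age": 34, "years_schooling": 9, "height_cm": 162}, VALID_RANGES)
flagged = check_record({"age": 7, "years_schooling": 12, "height_cm": 1180}, VALID_RANGES)
print(clean)
print(flagged)
```

Records that return a non-empty list would be the ones sent back for resolution during the second visit to the household.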

These procedures have produced some impressive results (Grosh and Glewwe,

1995, offer further details). Very little data is missing; for example, in four surveys

in Côte d'Ivoire, Peru and Ghana, covering 86,827 persons, only 48 persons are

missing data on their age. There are also very few missing modules. In Côte d'Ivoire,

at least one height and weight measurement was obtained for almost 90 percent of

all household members, both children and adults. Internal consistency of the data,

which can be checked by comparing answers from different adults within a house-

hold, seems quite high. There is less evidence comparing the LSMS data to external

sources, but what there is generally finds that the averages from LSMS data are very

close to the averages from other sources (Deaton and Grosh, 1997; Bhushan, 1997).

Overall, compared to other household surveys conducted in developing countries,

the quality of LSMS data is well above average.

Research Uses

LSMS data have supported work done by hundreds of researchers in recent

years, including government analysts, World Bank staff, academics, graduate stu-

dents and independent researchers. In this section we review some recent re-

search based on LSMS data.

Much of the analysis done using LSMS data has focused on documenting reg-

ularities concerning the nature of poverty in developing countries. In particular, they

have shown that the poor are: disproportionately found in rural areas; very likely to

be self-employed (particularly in small-scale agriculture); usually less likely to be un-

employed; not necessarily likely to bear the brunt of structural adjustment programs;

more likely to have stunted (low height for age) children; less likely to report suffering

an illness in the past month; less educated; and less likely to send their children to

school.2 Such analysis has been used by the World Bank and the governments of the

countries with LSMS surveys to quantify the dimensions, causes and consequences of

poverty, and to design policies to reduce it.

Information on poverty from LSMS surveys can be used to examine the distri-

bution of benefits from price subsidies and from publicly provided services. The

health, education, housing, and consumption modules include questions about

expenditures on and use of subsidized services or commodities (schools, health

clinics, electricity, public water supply, staple foods, kerosene, and more), and the

community questionnaire provides information about the availability of services

2 These findings can be found in Glewwe (1990), Glewwe and Hall (1994) and Grootaert and Kanbur (1995). They are also documented in the various World Bank Poverty Assessments, which are summarized

in World Bank (1996).

(health clinics, schools, post offices) and infrastructure (roads, electric supply, sew-

erage networks, and so on). Accompanying background information about house-

holds can then be used to show who benefits from price subsidies and publicly

provided services (Grosh, 1994; Benjamin and Deaton, 1993).

LSMS data have been used for policy analysis on topics that reflect the policy

issues in each country. In Ecuador, for example, the data were used to help refine

poverty maps, which are used to target government programs. In Jamaica, the data

were used to reformulate the benefit levels and eligibility thresholds for the food

stamp program. In Ghana, the data were used in considering whether to subsidize

kerosene. In South Africa, the data helped in evaluating the potential of programs

such as public works and old age pensions to alleviate poverty. In Bolivia, the LSMS

data helped in evaluating the benefits to workers of employment in a labor-intensive

public works scheme. In Jamaica, the data helped evaluate the targeting of the

student loan program. Most of this work is not published, and sometimes not even

formally written up in government documents, but it is influential nonetheless.

LSMS surveys have also been used to support a great deal of academic work,

including articles in professional journals, several books, and a large number of

official government and World Bank working papers and reports. Much of this

work has focused on topics that had proved difficult to handle with previously

existing data. For example, how can one determine the price elasticity of demand

for health care in developing countries, given that in such countries the public

sector dominated service delivery and provided most services for free? The de-

tailed LSMS data on travel time to health facilities and labor market information

allowed calculating price elasticities based on travel time, which found that these

elasticities can be fairly high, especially for the poor (Gertler and van der Gaag,

1990; Alderman and Lavy, 1996). This work has implications for the extent to

which the poor should be required to pay for health care; location decisions for

public health facilities; and the issue of whether to provide higher quality health

care at a higher cost to users. Another question is whether child malnutrition

leads to substantially lower school performance. The LSMS data have been used

to investigate whether malnutrition delays enrollment in school and/or reduces

school attainment (Glewwe and Jacoby, 1995). Yet another set of questions in-

volves the labor supply behavior of rural households, which is difficult to measure

because most rural workers are self-employed on family farms. The detailed LSMS

data allow estimation of labor supply functions that incorporate productivity in

self-employment and time allocation in other activities, including leisure (Jacoby,

1993; Newman and Gertler, 1994).

Table 2 lists some papers using LSMS data that have recently been published in

leading economics journals. Note that this is something of a biased sample of all the

work done using the data. It underrepresents the work done on the most recent surveys

due to the lag between data availability and publication; it overrepresents analysis done

by World Bank staff and consultants because the first countries to implement LSMS

surveys initially restricted access to the data, denying it to academic researchers; and it

underrepresents in-country and policy uses of the data because the table is based on information from the Journal of Economic Literature, which often excludes government policy papers and academic dissemination within developing countries.

Table 2

Recent Journal Articles Based on LSMS Data

Welfare and Poverty Analysis: Glewwe and Hall (1997), Grootaert (1995), Kakwani (1993a, 1993b), Lanjouw and Ravallion (1995), Newman, Jorgensen and Pradhan (1991), Sahn and Sarris (1991)

Labor Markets and Returns to Schooling: Alessie et al. (1992), Angrist and Lavy (1997), Glewwe (1996), Hoddinott (1996), Jacoby (1993), Newman and Gertler (1994), Moll (1997), Pradhan and van Soest (1997), Schaffner (1997), Stelcner et al. (1989), van der Gaag and Vijverberg (1988), Vijverberg (1993)

Pricing Policies on Health and Education: Gertler and Glewwe (1990), Gertler and Sturm (1997), Gertler and van der Gaag (1990), Selden and Wasylenko (1995)

Determinants of School Performance: Glewwe (1997), Glewwe et al. (1995), Glewwe and Jacoby (1994, 1995), Handa (1996), Jacoby (1994), Lavy (1996)

Savings and Consumption Smoothing: Cox and Jimenez (1992), Deaton (1992)

Health, Nutrition Status and Food Policies: Jacoby (1997), Lavy et al. (1996), Schultz and Tansel (1997), Strauss et al. (1993), Thomas et al. (1996)

Gender Issues: Deaton (1989), Higgins and Alderman (1997), Jacoby (1995), Thomas (1994)

Fertility: Ainsworth et al. (1996), Appleton (1996), Benefo and Schultz (1996)

The Future of LSMS Surveys

When the first LSMS surveys were started in 1985 they were considered exper-

imental. Their complexity led some survey statisticians to doubt whether they could

be implemented successfully. Their feasibility has since been demonstrated. We

expect that such surveys will be undertaken in a growing number of countries and

that additional rounds will be done in countries that have already completed them.

The trend toward increasing diversity in the surveys' objectives, questionnaires and

sample design will continue.

The World Bank's Development Research Group is currently undertaking a major evaluation of the design of LSMS surveys. Comprehensive guidelines for planning and implementing LSMS surveys are found in Grosh and Muñoz (1996). New and extremely detailed recommendations for questionnaire design should be ready in early 1998. The range of survey possibilities will be much broader, since survey designers will be able to choose different combinations of modules, and yet the resulting surveys will still incorporate the lessons from the experience of the past 10 to 15 years. This new set of tools should allow a wide range of actors to develop their own LSMS surveys, by customizing the standard prototype to fit their needs and interests.

One especially exciting technical innovation that will affect future LSMS activities

is the use of global positioning system technology based on satellite signals. This tech-

nology will make it easier to estimate distances between households and key services,

such as schools and health facilities. It should also improve the accuracy of the sample

by assisting interviewers in finding sampled households. Finally, use of global position-

ing technology will make it easier to merge LSMS data with other geographically re-

ferenced data, especially with soil, climate and topographical maps and with maps of

government provided services such as schools, clinics, roads and the like.

Obtaining LSMS Data

When the first LSMS surveys were completed in the mid- to late 1980s, the data

were often not accessible to economists outside the World Bank, since the data were

the property of the national statistical agencies of the countries carrying out the

surveys. However, many countries have now agreed to allow the World Bank's De-

velopment Research Group to distribute the data to potential users around the

world. Information is available at the LSMS web site (http://www.worldbank.org/lsms/lsmshome.html), and is available by mail from LSMS Surveys, DECRG, World

Bank, 1818 H Street NW, Washington, D.C. 20433. The e-mail address is

([email protected]). The fax number is 202-522-1153.

Those interested should begin with the broad overview of data availability,

found both on the website and in paper form as Grosh and Glewwe (1995). A

second, more detailed, level of information is contained in the Basic Information

documents for each country that summarize the questionnaire, sample design, or-

ganization of field work, contents and structure of data sets and data access rules

for each country's survey. The actual questionnaires are also available for review.

Third, there are the actual data. Whether obtained from the web site or through

the mail, data sets can usually be obtained as ASCII, SAS-portable, or STATA files.

There is a small processing fee charged if these data are sent by mail.

In some cases, it will be necessary to receive country permission before receiv-

ing the data, which involves writing to the address for that country given at the

LSMS website. When the researcher has obtained written permission to use the data

from the country, it should be sent to the LSMS mailing address, and the data will

then be made available.

* We would like to thank Jere Behrman, Brad De Long, Carlo del Ninno, Greg Duncan, Chris Grootaert, Alan Krueger, Martin Ravallion and Timothy Taylor for comments on earlier drafts. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the World Bank, its Executive Directors, or the countries they represent.


References

Ainsworth, Martha, Kathleen Beegle, and Andrew Nyamete, "The Impact of Women's Schooling on Fertility and Contraceptive Use: A Study of Fourteen Sub-Saharan African Countries," World Bank Economic Review, 1996, 10:1, 85-122.

Alderman, Harold, and Victor Lavy, "Household Responses to Public Health Services: Cost and Quality Tradeoffs," World Bank Research Observer, 1996, 11:1, 3-22.

Alessie, Rob, Paul Baker, Richard Blundell, Christopher Heady, and Costas Meghir, "The Working Behavior of Young People in Rural Côte d'Ivoire," World Bank Economic Review, 1992, 6:1, 139-54.

Angrist, Joshua, and Victor Lavy, "French Language Skills and the Return to Schooling in Morocco," Journal of Labor Economics, 1997, 15:1, S48-S76.

Appleton, Simon, "How Does Female Education Affect Fertility? A Structural Model for Côte d'Ivoire," Oxford Bulletin of Economics and Statistics, 1996, 58:1, 139-66.

Benefo, Kofi, and T. Paul Schultz, "Fertility and Child Mortality in Côte d'Ivoire and Ghana," World Bank Economic Review, 1996, 10:1, 123-58.

Benjamin, Dwayne, and Angus Deaton, "Household Welfare and the Pricing of Cocoa and Coffee in Côte d'Ivoire: Lessons from the Living Standards Surveys," World Bank Economic Review, 1993, 7:3, 293-318.

Bhushan, Indu, LSMS Surveys and Economic Demography Analysis: A Critical Review. World Bank, Washington, D.C., 1997.

Cox, Donald, and Emmanuel Jimenez, "Social Security and Private Transfers in Developing Countries: The Case of Peru," World Bank Economic Review, 1992, 6:1, 155-69.

Deaton, Angus, "Looking for Boy-Girl Discrimination in Household Expenditure Data," World Bank Economic Review, 1989, 3:1, 1-15.

Deaton, Angus, "Household Savings in LDC's: Credit Markets, Insurance and Welfare," Scandinavian Journal of Economics, 1992, 94:2, 253-73.

Deaton, Angus, and Margaret Grosh, "Measuring Consumption," in Margaret Grosh and Paul Glewwe, eds., Designing Household Survey Questionnaires for Developing Countries: Lessons from Ten Years of LSMS Experience. Draft book manuscript, Policy Research Department, The World Bank, Washington, D.C., 1997.

Gertler, Paul, and Paul Glewwe, "The Willingness to Pay for Education in Developing Countries: Evidence from Rural Peru," Journal of Public Economics, 1990, 42:2, 251-75.

Gertler, Paul, and Jacques van der Gaag, The Willingness to Pay for Medical Care: Evidence from Two Developing Countries. Baltimore: Johns Hopkins University Press, 1990.

Gertler, Paul, and Roland Sturm, "Private Health Insurance and Public Expenditures in Jamaica," Journal of Econometrics, 1997, forthcoming.

Glewwe, Paul, Improving Data on Poverty in the Third World: The World Bank's Living Standards Measurement Study. Policy, Research and External Affairs Working Paper No. 416. The World Bank, 1990.

Glewwe, Paul, "The Relevance of Standard Estimates of Rates of Return to Schooling for Education Policy: A Critical Assessment," Journal of Development Economics, 1996, 51:2, 267-90.

Glewwe, Paul, The Economics of School Quality Investments in Developing Countries: An Empirical Study of Ghana. Indianapolis: Macmillan Press, 1997, forthcoming.

Glewwe, Paul, Margaret Grosh, Hanan Jacoby, and Marlaine Lockheed, "An Eclectic Approach to Estimating the Determinants of Achievement in Jamaican Primary Education," World Bank Economic Review, 1995, 9:2, 231-58.

Glewwe, Paul, and Gillette Hall, "Poverty, Inequality and Living Standards during Unorthodox Adjustment: The Case of Peru, 1985-1990," Economic Development and Cultural Change, 1994, 42:4, 689-717.

Glewwe, Paul, and Gillette Hall, "Are Some Groups More Vulnerable to Macroeconomic Shocks Than Others? Hypothesis Tests Based on Panel Data from Peru," Journal of Development Economics, 1997, forthcoming.

Glewwe, Paul, and Hanan Jacoby, "Student Achievement and Schooling Choice in Low Income Countries: Evidence from Ghana," Journal of Human Resources, 1994, 29:3, 843-64.

Glewwe, Paul, and Hanan Jacoby, "An Economic Analysis of Delayed Primary School Enrollment in a Low Income Country: The Role of Early Childhood Nutrition," Review of Economics and Statistics, 1995, 77:1, 56-69.

Grootaert, Christiaan, "Structural Change and Poverty in Africa: A Decomposition Analysis for Côte d'Ivoire," Journal of Development Economics, 1995, 47:2, 375-401.

Grosh, Margaret E., Administering Targeted Social Programs in Latin America: From Platitudes to Practice. World Bank, Washington, D.C., 1994.

Grosh, Margaret E., and Paul Glewwe, A Guide to Living Standards Measurement Study Surveys and Their Data Sets. Living Standards Measurement Study Working Paper No. 120. World Bank, Washington, D.C., 1995.

Grosh, Margaret E., and Juan Muñoz, A Manual for Planning and Implementing the Living Standards Measurement Study Survey. Living Standards Measurement Study Working Paper No. 126. World Bank, Washington, D.C., 1996.

Handa, Ashu, "Maternal Education and Child Attainment in Jamaica: Testing the Bargaining Power Hypothesis," Oxford Bulletin of Economics and Statistics, 1996, 58:1, 119-37.

Higgins, Paul, and Harold Alderman, "Labor and Women's Nutrition: A Study of Energy Expenditure, Fertility and Nutritional Status in Ghana," Journal of Human Resources, 1997, forthcoming.

Hoddinott, John, "Wages and Unemployment in an Urban African Labour Market," Economic Journal, 1996, 106:439, 1610-1626.

Jacoby, Hanan, "Shadow Wages and Peasant Family Labour Supply: An Econometric Application to the Peruvian Sierra," Review of Economic Studies, 1993, 60:4, 903-21.

Jacoby, Hanan, "Borrowing Constraints and Progress Through School: Evidence from Peru," Review of Economics and Statistics, 1994, 76:1, 151-60.

Jacoby, Hanan, "The Economics of Polygyny in Sub-Saharan Africa: Female Productivity and the Demand for Wives in Côte d'Ivoire," Journal of Political Economy, 1995, 103:3, 938-71.

Jacoby, Hanan, "Self-Selection and the Redistributive Impact of In-Kind Transfers," Journal of Human Resources, 1997, 32:2, 233-49.

Kakwani, Nanak, "Statistical Inference in the Measurement of Poverty," Review of Economics and Statistics, 1993a, 74:4, 632-39.

Kakwani, Nanak, "Poverty and Economic Growth, with Application to Côte d'Ivoire," Review of Income and Wealth, 1993b, 39:1, 121-39.

Lanjouw, Peter, and Martin Ravallion, "Poverty and Household Size," Economic Journal, 1995, 105:433, 1415-1434.

Lavy, Victor, "School Supply Constraints and Children's Educational Outcomes in Rural Ghana," Journal of Development Economics, 1996, 51:2, 291-314.

Lavy, Victor, Duncan Thomas, John Strauss, and Philippe de Vreyer, "Quality of Health Care, Survival and Health Outcomes in Ghana," Journal of Health Economics, 1996, 15:3, 333-58.

Moll, Peter, "Primary Schooling, Cognitive Skill, and Wages in South Africa," Economica, 1997, forthcoming.

Newman, John, and Paul Gertler, "Family Productivity, Labor Supply, and Welfare in a Low-Income Country," Journal of Human Resources, 1994, 29:4, 989-1026.

Newman, John, Steen Jorgensen, and Menno Pradhan, "How Did Workers Benefit from Bolivia's Emergency Social Fund?" World Bank Economic Review, 1991, 5:2, 367-93.

Pradhan, Menno, and A.H.O. van Soest, "Household Labor Supply in Urban Areas of Bolivia," Review of Economics and Statistics, 1997, forthcoming.

Sahn, David, and Alexander Sarris, "Structural Adjustment and the Welfare of Rural Smallholders: A Comparative Analysis from Sub-Saharan Africa," World Bank Economic Review, 1991, 5:2, 259-89.

Schaffner, Julie, "Premiums to Employment in Larger Establishments: Evidence from Peru," Journal of Development Economics, 1997, forthcoming.

Schultz, T. Paul, and Aysit Tansel, "Wage and Labor Supply Effects of Illness in Côte d'Ivoire and Ghana: Instrumental Variable Estimates for Days Disabled," Journal of Development Economics, 1997, forthcoming.

Selden, Thomas, and Michael Wasylenko, "Measuring the Distributional Effects of Public Education in Peru," in D. van de Walle and K. Nead, eds., Public Spending and the Poor: Theory and Evidence. Baltimore: Johns Hopkins University Press, 1995.

Stelcner, Morton, Jacques van der Gaag, and Wim Vijverberg, "A Switching Regression Model of Public-Private Sector Wage Differentials in Peru," Journal of Human Resources, 1989, 24:3, 545-59.

Strauss, John, Paul Gertler, Omar Rahman, and Kristin Fox, "Gender and Life-Cycle Differentials in the Patterns and Determinants of Adults' Health," Journal of Human Resources, 1993, 28, 791-831.

Thomas, Duncan, "Like Father, Like Son: Like Mother, Like Daughter: Parental Resources and Child Height," Journal of Human Resources, 1994, 29:4, 950-88.

Thomas, Duncan, Victor Lavy, and John Strauss, "Public Policy and Anthropometric Outcomes in the Côte d'Ivoire," Journal of Public Economics, 1996, 61, 155-92.

van der Gaag, Jacques, and Wim Vijverberg, "A Switching Regression Model for Wage Determinants in the Public and Private Sectors of a Developing Country," Review of Economics and Statistics, 1988, 70:2, 371-81.

Vijverberg, Wim, "Educational Investments and Returns for Women and Men in Côte d'Ivoire," Journal of Human Resources, 1993, 28:4, 933-74.

World Bank, Poverty Reduction and the World Bank: Progress and Challenges in the 1990's. World Bank, Washington, D.C., 1996.


20731

May 2000

Designing Household Survey Questionnaires for Developing Countries

Lessons from 15 Years of the Living Standards Measurement Study

Volume One

Edited by Margaret Grosh and Paul Glewwe

The World Bank

Oxford


Contents

Foreword

Acknowledgments

Contributors

Volume 1

Part 1 Survey Design

1. Introduction
Margaret Grosh and Paul Glewwe

2. Making Decisions on the Overall Design of the Survey
Margaret Grosh and Paul Glewwe

3. Designing Modules and Assembling Them into Survey Questionnaires
Margaret Grosh, Paul Glewwe, and Juan Muñoz

Part 2 Core Modules

4. Metadata: Information about Each Interview and Questionnaire
Margaret Grosh and Juan Muñoz

5. Consumption
Angus Deaton and Margaret Grosh

6. Household Roster
Paul Glewwe

7. Education
Paul Glewwe

8. Health
Paul J. Gertler, Elaina Rose, and Paul Glewwe

9. Employment
Julie Anderson Schaffner

10. Anthropometry
Harold Alderman

11. Transfers and Other Nonlabor Income
Andrew McKay

12. Housing
Stephen Malpezzi

13. Community and Price Data
Elizabeth Frankenberg

Volume 2

Part 3 Additional Modules

14. Environmental Issues
Dale Whittington

15. Fertility
Indu Bhushan and Raylynn Oliver

16. Migration
Robert E. B. Lucas

17. Should the Survey Measure Total Household Income?
Andrew McKay

18. Household Enterprises
Wim P. M. Vijverberg and Donald C. Mead

19. Agriculture
Thomas Reardon and Paul Glewwe

20. Savings
Anjini Kochar

21. Credit
Kinnon Scott

22. Time Use
Andrew S. Harvey and Maria Elena Taylor

Part 4 Special Topics

23. Recommendations for Collecting Panel Data
Paul Glewwe and Hanan Jacoby

24. Intrahousehold Analysis
Nobuhiko Fuwa, Shahidur R. Khandker, Andrew D. Mason, and Tara Vishwanath

25. Qualitative Data Collection Techniques
Kimberly Chung

26. Basic Economic Models and Econometric Tools
Jere R. Behrman and Raylynn Oliver

Volume 3 Draft Questionnaire Modules

Introduction

Module for Chapter 4: Metadata
Margaret Grosh and Juan Muñoz

Module for Chapter 5: Consumption
Angus Deaton and Margaret Grosh

Module for Chapter 6: Household Roster
Paul Glewwe

Module for Chapter 7: Education
Paul Glewwe

Module for Chapter 8: Health
Paul J. Gertler, Elaina Rose, and Paul Glewwe

Module for Chapter 9: Employment
Julie Anderson Schaffner

Module for Chapter 10: Anthropometry
Harold Alderman

Module for Chapter 11: Transfers and Other Nonlabor Income
Andrew McKay

Module for Chapter 12: Housing
Stephen Malpezzi

Module for Chapter 13: Community Data
Elizabeth Frankenberg

Module for Chapter 14: Environment
Dale Whittington

Module for Chapter 15: Fertility
Indu Bhushan and Raylynn Oliver

Module for Chapter 16: Migration
Robert E. B. Lucas

Module for Chapter 18: Household Enterprise
Wim P. M. Vijverberg and Donald C. Mead

Module for Chapter 19: Agriculture
Thomas Reardon and Paul Glewwe

Module for Chapter 20: Savings
Anjini Kochar

Module for Chapter 21: Credit
Kinnon Scott

Module for Chapter 22: Time Use
Andrew S. Harvey and Maria Elena Taylor

Module for Chapter 23: Panel Data
Paul Glewwe and Hanan Jacoby

2. Making Decisions on the Overall Design of the Survey

Margaret Grosh and Paul Glewwe

Comprehensive, multitopic household surveys such as LSMS surveys usually consist of three separate questionnaires: a household questionnaire, a community questionnaire, and a price questionnaire. Each questionnaire is composed of modules, sections that collect information on a specific topic. Questionnaires and their modules can be combined in a variety of different ways to create a multitopic household survey. There is no single right way to combine modules and questionnaires into a survey; each way has advantages and disadvantages. The key is to choose a design that provides the best fit given the objectives of, and constraints on, the proposed survey.

The starting point for designing the modules and questionnaires to be used in a multitopic survey is a set of policy questions. The overall objective of each survey is to collect the data needed to answer these questions. There are five steps involved in survey design. The first step is to define the fundamental objectives of the survey and to choose an overall survey design that best fits these objectives. This is usually done not by a single individual but by a team of survey designers who consult extensively with a broad range of individuals and organizations interested in the survey. In choosing the overall design, the team must take into account several important factors including the capacity for collecting data within the country, the funding available, and the amount and quality of data available from other sources.

The second step involves deciding which modules to include in the survey, specifying the objectives of each module, and proposing an approximate length for the modules. This step is needed because a survey that attempts to include all possible modules will be too large and complex to implement.

The third step is to work out, question by question, draft questionnaires for each module that will be included in the survey. This can be done by drawing on the detailed recommendations in the chapters in Parts 2 and 3 of this book, as well as the draft modules included in Volume 3. The fourth step is to compare the modules to each other to ensure that they are consistent and well integrated, and to combine them into draft household, community, and price questionnaires (in some cases omitting the community questionnaire). The fifth and final step is to translate and field test the draft questionnaires. Translation may not be necessary in some countries; field testing is always essential and must not be done quickly or superficially. The first two steps are discussed in this chapter. The third, fourth, and fifth steps are discussed in Chapter 3.

While these five steps should ideally be done in the order given above, in reality there is likely to be substantial movement backward and forward among the various steps. Some objectives originally set out for the survey may prove impossible to achieve given the constraints. And discussion of the detailed objectives


for each module may cause the survey design team to reassess the overall objectives of the survey. In other words, it may be necessary to take one or two steps backward at some points in order to continue to move forward. This is to be expected and even encouraged. As more is learned about what can and cannot be done, survey designers are more likely to produce a survey design that meets their objectives, which may also have become more realistic. It is better to pare down the number of objectives in order to achieve some of them than to attempt to do too much and, as a result, achieve few or none of the original objectives.

The first four sections of this chapter cover the first step of the survey design process. The first section provides an overview of who should be involved in designing and assembling the questionnaires. The second section discusses the main factors that survey designers should take into account when choosing among survey design options. The third section outlines "core" elements that must be included in any LSMS or similar multitopic survey and reviews several classic survey designs, each of which supplements the core in a different way. The fourth section presents guidelines for choosing the survey design most appropriate for each of a range of different circumstances.

The fifth and final section explains the second step of the survey process: choosing the modules to include in the survey, setting objectives for each module, and setting the approximate length of the modules.

Organizing a Survey Design Team

The most important factor ensuring the success of a multitopic household survey is the involvement of the right people in the process. Designing the survey questionnaires is much too large a task for one person. Instead, a team of experts must be involved, including members of the organization implementing the survey as well as research analysts from other institutions. If the team does not contain a sufficient diversity of experts, this can have negative repercussions for the data (Box 2.1). The design team must work together with policymakers and program managers to define the overall objectives of the survey and to settle on many details at each step of the survey design process.

Box 2.1 The Importance of a Well-Rounded Design Team

An effective survey design team must include researchers and policy analysts, policymakers, and staff from the organization implementing the survey. The problems that can arise when one or more of these groups is not involved in designing the questionnaires are illustrated by the experience of an LSMS survey implemented in Jamaica. The household questionnaire for the first Jamaican Survey of Living Conditions (implemented in 1988) was designed primarily by international experts who had little knowledge of Jamaican social programs. Although the household questionnaire was largely successful in accomplishing its analytical objectives, it had two serious flaws. First, although food subsidy policies were an important issue at the time, the consumption module did not clearly distinguish expenditures on key subsidized food items from expenditures on similar nonsubsidized items. This made it more difficult to study the incidence of food subsidies. Second, the questionnaire asked respondents about their receipt of food stamps during the previous month even though food stamps were provided only every two months. This made it difficult to identify which households had received food stamps, thus hindering the study of another issue important at the time. Fortunately, these flaws were identified and corrected in the following year's household questionnaire.

Researchers and Policy Analysts

It is essential to involve researchers and policy analysts in questionnaire design. This book was written primarily by researchers and policy analysts, and much of the success of past LSMS surveys in supporting policy-relevant research is due to the fact that the surveys used questionnaires designed by people who would be actively involved in the analysis of the data. Researchers and policy analysts can ensure that the information collected in multitopic surveys is well suited for policy research.

The lead role in designing the questionnaires of an LSMS or similar multitopic household survey should be given to a small group of researchers and policy analysts who share two characteristics. First, they should know what issues are of most concern to the country's policymakers. Second, they should have experience in using data from similar surveys to analyze these issues. The group of researchers and policy analysts should include members of the national planning agency, representatives from the national statistics agency, local academic researchers, and one or more people who have helped analyze or design multitopic surveys in other countries.

The team must include individuals with extensive experience in implementing and analyzing other


household surveys in the country in question. Ideally, local researchers and policy analysts should take primary responsibility for designing the survey, because they have an intimate knowledge of the country's culture, economy, and society, and they are very familiar with existing programs and key policy issues. Local researchers and policy analysts are also likely to know about previous surveys done in the country that have covered some of the topics included in the new survey. And they will know which people and institutions should be consulted during the survey design process.

It may also be desirable to involve international researchers in the design of the questionnaire, especially in countries where local data analysts are not familiar with LSMS and other multitopic household surveys. International researchers can contribute their experience about what has and has not worked in surveys in other countries.1 Judicious use of the advice of both local and foreign experts will significantly improve survey design.

Past LSMS surveys have probably made insufficient use of the knowledge available from local researchers and policy analysts. Too often the involvement of local professionals has been limited to statisticians from the statistical agency (data producers) and thus failed to draw on the expertise of social policy researchers from the government or academia (data users). Statisticians may have only a limited knowledge of sectoral policy issues and programs. While they do have an important role to play, their input must be combined with input from data users to set priorities among the different possible objectives for policy research.

Policymakers

When defining the fundamental objectives of the survey, the team responsible for drafting questionnaires must seek extensive input from policymakers and program managers in the country being surveyed. The team's initial discussions with policymakers should focus in broad terms on the most important issues to be covered, which will determine the relative size of the different modules in the questionnaires. After this round of discussions, further discussions should be held to identify the important issues within each sector. Since drafting the module or modules for each sector requires a substantial amount of knowledge about how specific programs work, technical experts in many program agencies must be consulted. These people should be consulted before the modules are created, and they should also be shown draft modules to elicit comments.

Unfortunately, in many previous LSMS surveys the survey design team did not give enough attention to communicating or consulting with policymakers. Policymakers, who are often unfamiliar with household surveys, may find it difficult to read complicated questionnaires or to imagine what analyses the resulting data could support. One option is to show policymakers and program managers examples of the kinds of tables and analyses that could be produced using data from the questionnaire; these might be either hypothetical tables for the country of the survey or tables made in other countries using data from similar surveys. Another strategy is to show policymakers a report based on the first year's data; this is an excellent way to obtain policymakers' feedback on the design of follow-up surveys. A third strategy is simply to ask policymakers what they need to know to implement effective policies.

Data Producers

It is critical that the survey design team include staff from the organization implementing the survey. This should ensure that the questionnaires designed are workable. Often data collection can be greatly simplified by making minor changes in the layout or flow of a questionnaire, changes that do not diminish the questionnaire's analytical content. Data producers are an excellent source of suggestions for such changes. They are usually also experienced in details of designing a questionnaire, such as questionnaire formatting. For all of the above reasons, the team members from the organization implementing the survey should help design, or comment on, every draft of the questionnaire.

It is also useful for the survey design team to solicit the input of experienced field supervisors, who will notice whether the instructions to the interviewer are clear, whether the skip codes are correct, and whether the format is consistent. There is of course a natural tension between data analysts, who want comprehensive information, and field supervisors, who are likely to see all of the disadvantages but few of the advantages of administering a lengthy, complex questionnaire. Each side must be prepared to make compromises and carefully listen to the other side's point of view.


MARGARET GROSH AND PAUL GLEWWE

Factors for Deciding among Various Survey Designs

After the members of the survey design team have been selected, work can begin on designing the survey. The first task for team members is to review the factors that influence the overall design of the survey. This section discusses those factors in detail.

The appropriate design of a household survey or sequence of household surveys differs from country to country. The most important factors for determining the design of a proposed survey are: the kinds of policy issues the survey aims to address; the information available from existing surveys and other data sources; the country's institutional capacity for collecting data; and the financial and other resources available for implementing the survey, including any constraints on how these resources can be used.

Policy Issues

The design of a household survey should reflect the policy issues it is intended to address. One way to classify policy issues is in terms of their subject matter, such as health, education, employment, or housing. Another way to classify policy issues is in terms of the kinds of data used to address them. The four most common kinds of household survey analysis used to address policy issues are: simple descriptive statistics on living standards; monitoring poverty and living standards over time; describing the incidence and coverage of government programs; and measuring the impacts of policies and programs on household behavior and welfare. This subsection reviews these four types of analysis and provides a practical example of how the information needed affects the design of the survey.

SIMPLE DESCRIPTIONS OF LIVING STANDARDS. The most straightforward objective for a household survey is to describe the living standards of the population at one point in time, often with particular emphasis on the living standards of the poor. This can be done by using the data to tabulate means and frequencies of key variables. The results of these tabulations are often disseminated by the national statistical agency in the form of statistical abstracts (reports) that contain a large number of tables and a minimal amount of descriptive text. It is also possible to produce more structured descriptive analyses that supplement household survey data with information from other sources. Structural analysis of descriptive data can sometimes be used to draw conclusions about the likely impact of government policies on living standards. Examples of such analyses are the "poverty profiles" typically provided in the World Bank's poverty reports.

In both types of descriptive analysis, the range of variables used to measure living standards can vary widely; variables may be used from virtually all of the survey modules or from only a small subset of modules. In general, most of the variables included in statistical abstracts and descriptive analyses come straight from the questionnaire (for example, percentage of households that have electricity) or require only a small amount of manipulation (for example, nutritional status as derived from weight and height data). Only one "complex" variable needs to be constructed: total household consumption. Other complex constructed variables, such as total income or net wealth, are used less often in simple descriptive presentations.

MONITORING POVERTY AND LIVING STANDARDS. The descriptive analyses discussed above focus on living standards at one point in time. However, another important role of multitopic household surveys is to monitor how living standards and poverty change over time. When data are used for this purpose they must be comparable over time; for this to be the case, the data must be gathered using the same methods each time the survey is implemented. One aspect of such consistency concerns the design of the sample, which in each case must use the same definitions of basic concepts such as the distinction between urban and rural areas. A second requirement for comparability is that the questions defining variables of interest must remain the same each time the survey is administered. This is necessary because seemingly innocuous changes in the wording of questions can lead to serious comparability problems; changing the recall period for food expenditures can make it impossible to compare estimates of poverty and inequality over time.

Another issue to consider when monitoring poverty and living standards over time is the frequency with which indicators of living standards must be monitored. Indicators that are fairly stable over short periods of time, such as fertility and adult literacy, need not be measured each time the survey is done. However, indicators that can change more quickly, such as consumption expenditure, children's nutritional status, and employment status, should be measured every time the survey is implemented. Surveys that monitor poverty and living standards over time are typically fielded every year, although it is also possible to field them biannually or semiannually.

EXAMINING THE INCIDENCE AND COVERAGE OF GOVERNMENT PROGRAMS. Data from multitopic household surveys can also be used to measure the incidence and coverage of specific government programs. For example, data on the enrollment of household members in public schools are useful for investigating which children benefit from the provision of public schooling. Household survey data can also be used to study participation in government assistance programs such as food stamps, cash assistance, and school meals. Another example is descriptive statistics on purchases of subsidized food items, which can be used to examine whether the benefits of specified food subsidies vary by households' levels of income. The incidence and coverage of these different kinds of programs are easy to calculate and useful for policymakers to know.

A moderate sample size (2,000 to 5,000 households) should be sufficient to evaluate programs that affect a large proportion of the population. Evaluating programs that serve a small proportion of the population usually requires using a much larger sample of households or including a disproportionately large number of target and beneficiary households in the sample.

ESTIMATING THE IMPACT OF POLICIES ON HOUSEHOLD BEHAVIOR AND WELFARE. Policymakers are often faced with questions that can be answered only by analyzing household behavior. Policymakers may want to know how changes in commodity taxes or subsidies would affect agricultural production or the consumption of basic food items. Answering such questions requires calculating price elasticities and thus modeling households' production and consumption decisions. Such modeling requires data that go well beyond measurement of living standards indicators.

In multitopic household surveys that attempt to model household behavior, each module that collects data on a behavior of interest is usually designed to gather information that can be used to estimate the impact of several different policy changes. The chapters in Parts 2 and 3 of this book discuss each module in great detail and provide many examples of issues that require the modeling of household behavior. The following questions give an idea of the range of policy issues that can be addressed: What is the impact of charging user fees at government health clinics on the use of those clinics by adults and by children? How can the government encourage parents to enroll their children in school? What are the impacts of women's employment opportunities on their fertility? How do changes in prices brought about by structural adjustment programs affect the welfare and productivity of agricultural households?

EXAMPLE SCENARIO: THE INCLUSION OR EXCLUSION OF THE ANTHROPOMETRY MODULE. The decision about including an anthropometry module demonstrates how the analytical potential of a multitopic household survey is related to its content. (See Chapter 10 for detailed information on the collection of anthropometric data.) If an anthropometry module is not included, the survey is not useful for studying nutrition issues. However, by collecting limited anthropometric data such as the height and weight of children under five years of age, the survey will allow analysts to describe the extent and patterns of malnutrition in early childhood. If the country studied has adopted large-scale food distribution or subsidy programs, the data can also be used to assess how well these programs are targeted to undernourished children. If the survey is repeated annually or biannually, it becomes possible to monitor changes in both the nutritional status of the population and the targeting of the program. Thus the data collected from a limited anthropometry module can address, at least partially, three of the four types of policy questions outlined above.

A full version of the anthropometry module would collect data on height and weight for all household members, not just children. Such data could be used not only to gauge adult health but also to analyze the impact of government policies on household welfare and behavior. Suppose policymakers want to predict the impact of food programs on children's nutritional status. This requires estimating the determinants of child weight and height. Because heredity is so important, parental height and weight information are needed to estimate these relationships accurately; lack of data on parents' weight and height could lead to estimates that suffer from omitted variable bias and thus do not accurately show the impact of the food programs on children's nutritional status. In general, including the full version of the anthropometry module in the survey (measuring both adults and children) greatly expands the possibilities for examining the impact of government policies on household behavior and living standards.

Defining the objectives of a survey is often a less tidy process than the discussion so far has implied. Institutions, and people within institutions, may have varying objectives. Each sectoral ministry in a country is likely to be primarily interested in its own subject. The government as a whole may want the surveys to measure or monitor only a few indicators of welfare, while academics in the country's universities and other research institutions may want the surveys to yield the detailed data needed to model household behavior. If international agencies are financing the survey, they may have still another set of objectives. For example, they may wish to ensure that the data are comparable with similar data from other countries or that the data can be used to study issues of interest to the development community in general, even if these issues are not a high priority in the country of the survey. Whatever the objectives envisaged when the survey is first designed, it is likely that researchers will later use the data for other analytical purposes.

The multiple (and sometimes competing) objectives of household surveys are to be expected and even encouraged, since each of the groups with a stake in these surveys has its own legitimate priorities. The task of survey designers is to accommodate the different objectives as much as possible without compromising the quality of the survey.

Other Information Available and Its Relation to Survey Objectives

No household survey takes place in a vacuum. In most countries there are several other household surveys that have gathered or will gather information on issues that the new multitopic survey is intended to cover. The extent to which data from these sources influence the design of the new questionnaire depends on the amount and type of data available and on the objectives of the new survey.

If the main objective of the new survey is to describe various aspects of the living standards of the population, it may seem that the topics already covered in other surveys need not be included in the new multitopic survey. For example, if the only goal of the new survey is to describe living standards and recent anthropometric data are already available from another survey, it may seem reasonable to drop anthropometric measurements from the new survey. However, there are two important advantages to collecting anthropometric data in the new survey. First, collecting these data would make it possible to produce descriptive tables that show simple relationships between nutritional status, as revealed by anthropometric measurements, and other variables of interest (for example, household expenditure levels). Second, collecting anthropometric data in the new survey would ensure that the anthropometric data used the same definitions and classification schemes as other survey data, and thus could be used to draw effective comparisons. If the two surveys classified, say, education levels or rural and urban areas differently, this would make it difficult to present analyses from the two surveys side by side in ways that would be simple to interpret. Analysis based on combining results from separate surveys will usually be more difficult, and thus more prone to error, than analysis based on data that have all been collected in a single survey.

The case for collecting anthropometric data is even stronger if the purpose of a new survey is to investigate the impact of nutritional status on other socioeconomic outcomes (such as education, fertility, or labor force productivity). This objective implies that the survey must include an anthropometry module, even if recent information on nutritional status is available from other sources. To conduct these kinds of analyses, the variables of interest must all come from the same household survey.2

Although it is essential that data on key household-level variables come from the same households, it is often useful to supplement household survey information using data from a source other than a multitopic survey. In some cases, price data collected for generating consumer price indices can replace the price questionnaire typically used in LSMS and other multitopic household surveys (see Chapter 13). Other such alternative data sources are time-series data on weather and maps of soil quality and topography, all of which can be used for analyzing agricultural issues. In health and education, further possibilities arise for matching household survey data with data from other sources; some countries collect data from health clinics and schools that may be matched with the communities covered in a household survey.
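As a rough illustration of what matching household survey data to community-level data from another source involves, the sketch below joins household records to school records by a community code. Every identifier, field name, and value here is hypothetical and invented for the example; it is not drawn from any LSMS data set.

```python
# Hypothetical illustration: all identifiers, field names, and values are invented.
households = [
    {"hh_id": 1, "community": "A01", "child_in_school": True},
    {"hh_id": 2, "community": "A02", "child_in_school": False},
    {"hh_id": 3, "community": "B7", "child_in_school": True},
]

# Community-level school data assumed to come from a separate administrative source.
schools_by_community = {
    "A01": {"pupil_teacher_ratio": 38},
    "A02": {"pupil_teacher_ratio": 52},
    "B07": {"pupil_teacher_ratio": 41},
}


def match_to_communities(households, facility_data):
    """Attach community-level facility data to each household record.

    Returns the matched records and the ids of households whose
    community code has no counterpart in the facility data.
    """
    matched, unmatched = [], []
    for hh in households:
        info = facility_data.get(hh["community"])
        if info is None:
            unmatched.append(hh["hh_id"])
        else:
            matched.append({**hh, **info})
    return matched, unmatched


matched, unmatched = match_to_communities(households, schools_by_community)
print("matched households:", len(matched))
print("unmatched household ids:", unmatched)
```

In this invented data, household 3 is left unmatched because its community is coded "B7" in one source and "B07" in the other, the kind of small coding inconsistency that has to be resolved before two sources can be combined.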


However, survey designers should exercise caution when contemplating this approach. Although matching data from different sources appears simple, it is often very difficult in practice. Many of the chapters in Parts 2 and 3 of this book discuss the potential for matching household survey data to data from other sources.

An important question that often arises when planning a multitopic survey is whether such a survey can replace one or more existing surveys and thereby reduce total costs without any loss of information. A multitopic survey with an anthropometric module could replace a periodic anthropometric survey primarily intended to measure the extent of malnutrition. However, a multitopic survey cannot replace all other household surveys. Labor Force Surveys often require much larger sample sizes and more frequent data collection than would be appropriate for a multitopic survey. And specialized surveys, such as Demographic and Health Surveys and comprehensive farm management surveys, contain much more data on those topics than can usually be collected in LSMS and similar multitopic surveys. Still other surveys, such as farm surveys and small business surveys, have very different sampling frames since they are based on samples of farms or businesses rather than samples of households.

A final issue is whether survey designers should implement an entirely new survey, modify an ongoing survey, or find creative ways to analyze existing data. Two arguments support implementing a new survey. First, past surveys may not have been adequately documented, or access to their data may be restricted. Second, inter-agency rivalry, arguments concerning ownership of survey data, and coordination problems when different surveys are carried out by different agencies may make it easier to begin a new survey instead of using existing data or modifying an existing survey. On the other hand, survey designers should at least consider trying to remedy these problems so that existing surveys can be used (perhaps with some modifications) to meet the designers' data and policy objectives, thus avoiding any unnecessary duplication. Examples are given later in this chapter of countries in which an existing survey was modified to be more like an LSMS survey.

Institutional Capacity

Decisions about what kind of survey to implement also depend in part on the institutional capacity for collecting data in the country undertaking the survey. Because maintaining data quality becomes more difficult when surveys become more complex, this capacity should be carefully assessed when planning LSMS or similar multitopic surveys, which can be very complicated. In countries where the capacity to collect data is weak, it may be better to implement a limited multitopic survey yielding reliable data on a relatively small number of topics than an overly ambitious survey that could yield unreliable data on a wide variety of topics.

A survey containing 10 modules is easier to plan and implement than a survey containing 15 or 20 modules. The fewer the modules, the less time is needed by survey planners to contact different sectoral agencies and thus the less time is needed to build consensus. Also, smaller questionnaires require less time to design, and less time to carry out the fieldwork, enter the data, and manage the database. However, other steps in developing and implementing a household survey, such as planning the sample design, do not vary with the size of the questionnaire. Therefore, a survey with a questionnaire half the size of the questionnaire for a full multitopic household survey will involve substantially more than half the effort required for a full survey.

Despite the complexities of full multitopic surveys, some very successful multitopic surveys have been carried out in countries with very limited data collection capacity. Several steps can be taken to overcome the problems posed by limited capacity. For LSMS surveys, international experts have been brought in to draw the sample, draft the questionnaires and interviewer manuals, and write the data entry program. Such experts initially substitute for government agency staff, but they can also train agency staff to take their place in future surveys. It is highly recommended that countries with limited capacity for collecting survey data use such expert assistance.

In countries with weak institutional capacity, serious consideration should also be given to improving that capacity; capacity building yields long-term benefits that gradually reduce the need to use international experts to help with data collection. Genuine capacity building takes time, money, and political and managerial effort. An international sampling expert may be able to design and draw a sample for a survey in a few days, but it will usually take him or her much longer to teach local staff how to do so. Training to build capacity requires significant resources beyond those already budgeted for a survey. Whether building a country's data collection capacity is important enough to warrant committing these resources will vary from country to country. Where capacity building is deemed necessary, the survey's work plan and budget must both be significantly enlarged.

If capacity building is a goal, a program of annual (or biannual) surveys will work better than a program for a single survey or for a sequence of surveys that take place every three to five years. An annual survey usually has a permanent allocation of skilled staff, staff time, and equipment. Even when the team works only part of the year on the multitopic survey, staff have a chance every year (or every two years) to use the skills that they have acquired in managing such a survey. And as the staff of the agency develop survey management skills, the need for technical assistance from international experts should diminish. When some staff members working on the survey leave, their replacements can learn their jobs from other staff members who have worked on earlier rounds of the survey. In addition, the continuity provided by an annual survey may make it easier to improve survey quality; if one year a problem arises in data collection or initial analysis, the people who deal with the problem are likely to be involved in planning the next survey and can better address the problem in the next survey.

In contrast, a survey carried out every four or five years may require new skills, staff, and equipment each time it is implemented. In the intervening period, many of the individuals who carried out the first survey may have moved on to other jobs either inside or outside the statistical agency. Those who remain may not have been involved in planning the previous survey, and the skills of those who were involved may have deteriorated over time. Vehicles and computers used in the first survey will have been allocated to other purposes, and some may have ceased to function altogether. Most importantly, much of the institutional memory about problems and potential solutions may have been lost.

A final note of caution is needed regarding institutional capacity. Sometimes, even when a statistical agency has sufficient management and technical capacity to implement a complex multitopic survey, there may not be enough experienced supervisors, interviewers, or data entry operators. Lack of data entry operators is not a serious problem since they can be trained in a matter of weeks, and no previous experience is required. However, it takes longer to transform government staff with no household survey experience into competent interviewers. While interviewers may be trained in a month, it is not so easy to compensate for little or no interviewing experience. Experience is even more important in the case of supervisors. It may take years to overcome shortages of experienced interviewers and supervisors.

Constraints Imposed by Funding Sources

Surveys are always constrained by their funding. Most LSMS and similar multitopic household surveys receive some portion of their financing from sources other than the national budget, at least initially. As a result, they are subject to constraints associated with both the national budget and funds from external sources.3

The first and most obvious constraint imposed by the source of funding is the total amount of funds available. National budgets are often very restricted. Some external funding sources have upper limits for how much may be spent on a single project, and most have administrative procedures that grow in complexity as the size of a project increases. Also, the larger the survey budget, the more difficult it is for survey planners to justify using the money for the proposed survey rather than for some other purpose. Limitations on the size of the budget often constrain the size of the sample used in the survey and in some cases curtail the survey's analytical depth and breadth.

Another potential constraint relates to the time period over which funds may be spent. Funding agencies may stipulate that a survey project be completed in only one or two years, even though a single full-scale survey can easily take three years or more to complete: 6 to 18 months to plan, a year for fieldwork, and 6 to 18 months for data dissemination and analysis. Moreover, chances of obtaining future funding can influence whether a proposed survey is carried out only once or is the first of a series of surveys. And funding limitations can affect such other aspects of the survey as the thoroughness of the survey designers' work during the planning stage, whether the fieldwork is spread over a full year or concentrated into a period of a few weeks, and the amount of analytical work funded from the survey project's budget.
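The phase durations quoted above for a single full-scale survey can be added up to see how they compare with a funding window. The sketch below does that arithmetic; the phase ranges come from the text, while the two-year window and all names are assumptions chosen only for illustration.

```python
# Phase durations in months (low, high), using the ranges quoted in the text.
phases = {
    "planning": (6, 18),
    "fieldwork": (12, 12),
    "dissemination_and_analysis": (6, 18),
}


def total_duration(phases):
    """Return the shortest and longest total duration in months."""
    shortest = sum(low for low, _ in phases.values())
    longest = sum(high for _, high in phases.values())
    return shortest, longest


shortest, longest = total_duration(phases)

# Hypothetical funding stipulation: the project must finish within two years.
funding_window_months = 24

print(f"Survey duration: {shortest} to {longest} months")
print("Fits the window in the best case:", shortest <= funding_window_months)
print("Fits the window in the worst case:", longest <= funding_window_months)
```

Under these figures the survey takes between two and four years, so a two-year window is met only if every phase runs at its shortest length.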


Finally, many funding agencies also have rules on how survey funds can be spent. These rules may impose controls on: the percentage of funds spent in local or international currency; the balance between recurrent and investment costs; the amounts that can be spent on the salaries of local staff, survey equipment, and payment of international experts; the nationalities of such experts; and various aspects of budgeting, accounting, and procurement. Spending rules rarely influence big issues of survey design (such as survey duration, sample size, or questionnaire design), but they can affect many details of the structure and implementation of a project. Rules that prohibit the shifting of expenditures between items or between time periods may limit the ability of survey planners to deal with unanticipated problems. For example, an additional international expert may be needed quickly, but may be difficult or impossible to obtain because hiring this expert was not included in the original budget. The end result can be a delay in the survey or a reduction in quality. Another example is if an accident occurs involving a survey vehicle; fieldwork may be delayed if expenditures to repair or replace the vehicle cannot be made available promptly.

Summary

The analytical objectives of a survey, the availability of information from other sources, local institutional capacity, and constraints imposed by funding are all key factors that typically affect whether to perform a new survey and the form such a survey will take. Many other factors are also critical, including institutional inertia and rivalry and the compromises required to build a coalition to support and conduct a survey. However, it is difficult to provide general advice because these factors usually depend on the setting of the survey; survey planners must deal with these issues as best they can given the particular circumstances they face.

Classic Survey Designs

There are three basic ways to combine modules into questionnaires and combine those questionnaires into a survey or sequence of surveys: the full LSMS-type multitopic survey, the scaled-down LSMS-type survey, and the core and rotating module survey. All of these survey formats must include certain "core" components. This section outlines the core survey components, discusses the strengths and weaknesses of each of the three main survey types, and describes two other survey options.

The Core

Any LSMS-type multitopic survey must collect certain essential information about the household, its members, and the local community, including:

* A roster that lists, and collects basic information about, all household members.
* Detailed information on household consumption expenditures.
* Basic housing data such as type of dwelling, water source, type of toilet, and whether the dwelling has electricity.
* The education of all household members, including who is currently in school.
* The employment status of everyone of working age and, for those who are working, their occupation, the number of hours they worked during the previous seven days, and, if they are employees, their wage earnings.
* The receipt of money or in-kind assistance from key government or NGO programs.
* The use of social services and programs, such as government health facilities, schools, agricultural extension services, and social assistance programs.
* Basic information related to the design of the sample and the outcome of the household interview.
* Local prices of basic food and nonfood goods (unless price data are available from another source, or the country is so small and its markets so well integrated that there is very little regional price variation).

These components are referred to in this book as the essential core. In addition to the essential core, it is highly recommended that the following five types of information be collected in LSMS and similar multitopic surveys:

* Anthropometric measurements (height and weight) of children 0-5 years old (unless malnutrition is known to be negligible in the country).
* The immunization status of children 0-5 years old.
* Information on basic household assets such as durable goods, housing, land, and the capital equipment used for agricultural activities and nonagricultural household enterprises.
* Information on interhousehold transfers.
* Information on rental payments for those households that rent their dwellings.


MARGARET GROSH AND PAUL GLEWWE

In this book the set of modules formed by adding these five components to the essential core is referred to as the recommended core.

The essential core of an LSMS or similar multitopic survey collects the information needed to describe poverty and to monitor it over time. The recommended core adds some very basic child health information, along with information on assets, interhousehold transfers, and rental payments (the use of which will be explained below). Judgments about which data are part of the essential and recommended cores are based on many years of experience that World Bank staff have in using data from LSMS surveys to produce poverty profiles for a wide range of developing countries. Table 2.1 lists the components of both the essential and recommended cores of LSMS-type multitopic surveys. The paragraphs that follow describe each of these components in greater detail.

HOUSEHOLD ROSTER (ESSENTIAL). Virtually every household survey should begin by determining how many people belong to each household and collecting very basic information on each household member, including age (or date of birth), sex, nationality, relationship to the head of household, and marital status. Part A of the household roster introduced in Chapter 6 (and provided in Volume 3) collects such basic information, along with another piece of information that is less essential: questions that link each married individual to his or her spouse.

CONSUMPTION EXPENDITURES (ESSENTIAL). The experience of LSMS surveys and other household surveys strongly suggests that household consumption expenditures are the single most important indicator of household welfare that can be obtained from a household survey. (See Chapters 5 and 17 for further discussion on this point.) Chapter 5 describes how to collect data on consumption expenditures, stressing that there are no costless shortcuts for collecting such data. In some circumstances it might be possible to omit questions on the ownership of durable goods and on transfers given to other households, but the rest of the consumption module is an essential part of the core and should not be reduced further. Data on household expenditures on education, health, and housing are collected in the core elements of those modules (discussed below) and need not be included in the consumption module. Consumption in the form of in-kind payments (such as meals, clothing, or transportation) from employers is best collected in the employment module.

Table 2.1 The Essential and Recommended Cores of LSMS-Type Multitopic Surveys

Module                                 Sections used

The Essential Core

Household Roster                       All of Part A except questions 8 and 9
Consumption                            All questions except transfers given to other households (Part D) and ownership of durable goods (Part E)
Housing                                Questions A1-A7, B1-B5, C1-C3, and C13-24 of the short module
Education                              All questions in the short module
Employment                             Questions A2-A13, B1, B2, B7-B10, D3, D4, and D8-D17
Transfers and Other Nonlabor Income    All of Part B1; see text for further discussion
Health                                 Questions 10-38 of the short module
Metadata                               Household Identification and Control submodule; Questions 1-4 in Summary of Visits and Interviews submodule
Prices                                 30-40 food items and 10-20 nonfood items
Credit                                 Questions 9-14 and 21-28 of the short module (on credit obtained from NGOs or government agencies)
Agriculture                            All of Part F (use of agricultural extension services), which is the same for all modules

Additional components for the recommended core

Anthropometry                          Entire module, for children 0-5 years old
Health                                 All of Part C (immunization)
Consumption                            All of Parts D and E
Housing                                Questions C7-C12 of the short module
Household Enterprises                  Part C of the short module, questions 1-3
Agriculture                            Parts A, B, and E of the short module
Transfers and Other Nonlabor Income    Questions on income from interhousehold transfers

Source: Authors' recommendations.
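Table 2.1 can also be read as a checklist. The following is a minimal sketch, not part of the LSMS materials: the module names come from Table 2.1, while the list layout, the function name `missing_core_modules`, and the example draft questionnaire are assumptions made purely for illustration. It checks only whether a module is present, not whether the specific questions listed in the table are included.

```python
# Illustrative sketch: module names follow Table 2.1; the data layout and
# function name are assumptions, not part of any LSMS tool.
ESSENTIAL_CORE = [
    "Household Roster", "Consumption", "Housing", "Education", "Employment",
    "Transfers and Other Nonlabor Income", "Health", "Metadata", "Prices",
    "Credit", "Agriculture",
]
# Modules that appear only among the recommended-core additions.
RECOMMENDED_ADDITIONS = ["Anthropometry", "Household Enterprises"]


def missing_core_modules(draft_modules, include_recommended=True):
    """Return core modules (in Table 2.1 order) absent from a draft questionnaire."""
    required = list(ESSENTIAL_CORE)
    if include_recommended:
        required += RECOMMENDED_ADDITIONS
    present = set(draft_modules)
    return [module for module in required if module not in present]


# Hypothetical draft questionnaire for a scaled-down survey.
draft = ["Household Roster", "Consumption", "Housing", "Education",
         "Employment", "Health", "Metadata", "Prices"]
print(missing_core_modules(draft, include_recommended=False))
```

A real check would also need to compare question ranges within each module (for example, Parts D and E of the consumption module), which this sketch deliberately leaves out.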


CHAPTER 2 MAKING DECISIONS ON THE OVERALL DESIGN OF THE SURVEY

HOUSING (ESSENTIAL). Information on housing, including the type of dwelling, the construction materials used, the number of rooms, the availability of electricity, the source of drinking water, and the type of toilet, are very basic indicators of living standards. They provide analysts and policymakers with information on a household's standard of living that goes beyond consumption expenditures. Because housing information is simple to collect, it should be included in any LSMS-type survey. The long version of the housing module introduced by Chapter 12 (and presented in Volume 3) collects substantially more housing data than are necessary for the essential core, and even the short version is longer than the essential core. Only the following questions from the short version of the housing module need to be included in the essential core: A1-A7, B1-B5, and B11-B21. Even some of these questions can be omitted in some countries. The questions on heating (B18-B21) can be removed for countries with warm climates, and the questions that distinguish between wet and dry seasons (B1-B3) can be simplified in countries where this distinction is not important.

Another part of the core data set, housing-related consumption expenditures (such as expenditures on electricity, water, and cooking fuel), are most conveniently collected in the housing module rather than in the consumption module. A final useful indicator of living standards is information on the ownership of the household's dwelling. Questions C1-C3 in the short version of the housing module collect expenditure information, and questions C13-C24 collect ownership information.

EDUCATION (ESSENTIAL). Education is both a determinant and a key indicator of living standards. The short version of the education module introduced in Chapter 7 (and presented in Volume 3) comprises all of the essential core questions on education. The only questions that might be omitted are the two questions on grade repetition.

The short education module assesses education from several different angles, including school attainment, current enrollment, and education expenditures. Information on the school attainment of household members ages 5 and older is easy to collect and has many analytical uses (such as classifying households in terms of the education level of their head of the household). School enrollment among children, another key indicator of living standards, is also easy to collect. And it is usually more convenient to collect information on household education expenditures in the education module than in the consumption module.

EMPLOYMENT (ESSENTIAL). Basic employment information on household members of working age (7 and older in many countries) should be collected as part of the essential core of any LSMS-type survey. The most important source of income for poor people in developing countries is their labor; employment data, including information on unemployment, indicate how this labor is being used.

Essential employment information includes each person's occupation and the number of hours that he or she has worked in the previous seven days. While it would also be useful to gather data on the incomes of all employed household members, this is not easily done for the self-employed (see Chapter 17 for further discussion). However, income data should still be collected for employees even when such data cannot be collected for the self-employed, for two reasons. First, these partial data are useful for understanding which occupations pay well and which do not. Second, since data are already needed from employees on in-kind benefits provided by their employers (in order to calculate consumption expenditures), it would seem strange to ask about those benefits without first asking about money income.

The short version of the employment module introduced by Chapter 9 (and presented in Volume 3) collects more information than the core of an LSMS-type survey requires. The following questions from the draft employment module constitute the essential core: A2-A13, B1, B2, B7-B10, D3, D4, and D8-D17. Job-specific information (questions in Parts B or D) should be collected both for the person's main occupation and for any secondary occupation. (The main occupation is the job the respondent spent the most hours doing during the previous seven days.)

GOVERNMENT AND NGO TRANSFERS (ESSENTIAL). Many developing countries have programs that provide money or in-kind assistance to households. Some of these are government programs and some are run by nongovernmental organizations (NGOs). Examples of these programs include cash welfare payments, pensions, unemployment insurance, food stamps, food rations, school feeding programs, community soup
kitchens, scholarships, and free or subsidized textbooks. While the range of programs is very wide, there are usually only a few sizable programs in any particular country.

A key policy question that LSMS surveys can address is who benefits from these programs. However, only programs that reach a substantial fraction of the population can be studied with the relatively small sample sizes recommended for LSMS and similar multitopic surveys.

Questions about government and NGO transfer programs should not necessarily all be in the same module (a fact that makes this part of the core difficult to standardize). While questions about cash income fit best in the transfers and other nonlabor income module, questions on school feeding programs should probably be in the education module. However, Part B1 in the transfers and other nonlabor income module is a good place to start collecting this information.

SOCIAL SERVICES (ESSENTIAL). Related to programs that provide cash or in-kind assistance are programs that provide services. The most common examples of social services are public schools, public health services, agricultural extension services, credit programs, public work schemes, electricity supply, public water supply, and sewage systems. LSMS surveys and similar multitopic surveys should always collect some information on the use of social services, at least enough to measure variation in access to and utilization of such services across different socioeconomic groups.

As with direct assistance to households, the types of programs and the amount of detail needed to identify who benefits from them will vary among countries. School enrollment information is already collected in the core, as discussed above, although additional information may need to be collected on any school services that are available to some students and not to others, such as tuition waivers or afterschool programs for disadvantaged students. Information on the use of public health services is also very important; such information is collected in questions 10-38 of the short version of the health module (introduced in Chapter 8). Data on the use of agricultural extension services are collected in Part F of all versions of the agriculture module introduced in Chapter 19. Some countries have subsidized credit programs to assist poor households; information on these programs is collected in questions 9-14 and 21-28 of the short version of the credit module introduced in Chapter 21. It is possible to identify beneficiaries of public works programs by adding one or two questions to the employment module that ask whether an individual's current employment is related to such a program. Finally, information on housing-related physical infrastructure services (such as water, sanitation, and electricity) is collected in the core of the housing module, as discussed above.

METADATA (ESSENTIAL). The last type of information that must be collected in the household questionnaire of any LSMS-type survey consists of basic data on where the household fits in the sample and on the outcome of the interview. This type of information, known as "metadata," is discussed in Chapter 4. For the essential core, it is not necessary to collect all of the information covered in the metadata module. The essential metadata are the date of the interview or interviews, the identification (ID) codes for the household and its primary sampling unit (see note 4), the ID codes of the interviewer and the other team members who collected, checked, or entered the data from that household, information on whether an interview actually took place (and if not, why it did not), and perhaps some data on the ethnic group and religion of the household. This information is collected in the metadata module, on the Household Identification and Control submodule and in questions 1-4 of the Summary of Visits and Interviews submodule.

PRICES (ESSENTIAL). Price information should be collected at the level of the community (the primary sampling unit) since all households in a given community face the same prices. How to collect price information is discussed in Chapter 13. The main task is to select the items for which price data will be collected. While the exact items will vary across countries, prices should be collected for at least 30-40 of the most commonly consumed food items and 10-20 of the most commonly purchased nonfood items. In a few countries other sources of reliable price data may already exist for both urban and rural areas; if these data can be matched to the communities covered in the survey, there is no need to collect new price data. And in some small countries such as Jamaica, prices vary little among regions. In these cases, no price data need to be collected as long as national price data exist that show changes in prices over time.
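Whether prices "vary little among regions" is an empirical question. One rough way to look at it, sketched below, is to compute each item's coefficient of variation across regions from whatever price data already exist. Everything specific here is an assumption for illustration: the function name `regional_price_variation`, the made-up prices, and the 5 percent cutoff, which is arbitrary and not a recommendation from this chapter.

```python
from statistics import mean, pstdev


def regional_price_variation(prices_by_region):
    """Return each item's coefficient of variation (std / mean) across regions."""
    by_item = {}
    for region_prices in prices_by_region.values():
        for item, price in region_prices.items():
            by_item.setdefault(item, []).append(price)
    return {item: pstdev(values) / mean(values) for item, values in by_item.items()}


# Hypothetical existing price data (local currency per unit).
prices = {
    "North": {"rice (kg)": 10.0, "kerosene (litre)": 4.0},
    "South": {"rice (kg)": 10.0, "kerosene (litre)": 6.0},
}

THRESHOLD = 0.05  # arbitrary illustrative cutoff, not from the source

cv = regional_price_variation(prices)
items_needing_local_prices = [item for item, value in cv.items() if value > THRESHOLD]
print(items_needing_local_prices)
```

Items that vary substantially across regions would argue for keeping the price questionnaire; a judgment based on a handful of items or regions should of course be treated with caution.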


ANTHROPOMETRIC MEASUREMENTS (RECOMMENDED). Anthropometric data, particularly on height and weight, should be collected for children 0-5 years old in almost every LSMS or similar household survey. Stunting (low height for age) and wasting (low weight for height) are common measures of children's nutritional status; height and weight data are critical in countries where children are at risk of malnutrition. And collecting basic anthropometric data about children is simpler and more reliable than collecting other data on health status. The details of how to collect children's anthropometric data are explained in Chapter 10.

Collecting height and weight data requires some effort. The data are collected using special equipment that is bulky and troublesome to carry around to each household. One consequence of this is that another individual is often added to each survey field team. If collecting children's height and weight data were easier, such anthropometric measurements would have been classified as part of the essential core of any LSMS-type survey.

IMMUNIZATION (RECOMMENDED). Almost all LSMS and similar multitopic surveys should collect immunization records for children ages 0-5. In recent years child immunization programs have dramatically reduced the incidence of several life-threatening childhood diseases in many developing countries, significantly reducing infant and child mortality rates. However, many countries still do not have 100 percent immunization coverage. Therefore, information on the extent of coverage and on where coverage is low is important for almost any analysis of living standards. In addition, since child immunization coverage can change dramatically over a year or two, it serves as a useful indicator of changes in the provision of government services during periods of economic or social instability. Child immunization information is collected in Part C of the health module introduced by Chapter 8.

ASSETS (RECOMMENDED). Household assets include information on any consumer durable goods owned by the household, the value of owner-occupied housing, and the ownership of land and capital assets related to agricultural activities and household enterprises. There are several reasons for collecting these data. First, the possession of household durable goods such as radios, televisions, bicycles, motorcycles, and cars is a simple indicator of living standards. Second, the sum of the value of all these different household assets gives a rough (and admittedly incomplete) indicator of household wealth. Third, data on the ownership of land and on capital assets used in agricultural and nonagricultural enterprises indicate productive assets. Fourth, in some countries, particularly countries of the former Soviet Union, there is evidence that adding the consumption derived from durable goods and housing to total consumption can lead to substantial changes in the relative economic positions of different types of households.

Information on the ownership of consumer durable goods can be collected using Part E of the consumption module. Data on the value of owner-occupied housing are collected in the short version of the housing module, using (at minimum) questions C1 and C11, with C3 and C12 providing alternative valuations. A short set of questions on the assets used in household enterprises is provided in Part G of the short version of the household enterprise module; only questions 1-3 are needed. Parts A, B, and E of the short agriculture module collect a modest amount of information on households' land holdings, livestock, machinery, and other agricultural assets.

PRIVATE INTERHOUSEHOLD TRANSFERS (RECOMMENDED). Private interhousehold transfers, which are pervasive in many countries, are used by many households to cope with poverty and economic vulnerability. Transfers received are covered by the transfers and other nonlabor income module (introduced in Chapter 11) and transfers sent are covered by the consumption module (introduced in Chapter 5). At least the short versions of the private interhousehold transfer submodules should be used in virtually all surveys. Even in a relatively simple survey it may be worthwhile to use the standard version of the submodule on transfers received.

RENTAL PAYMENTS FOR HOUSING (RECOMMENDED). Estimates of the annual rental value of dwellings are needed to estimate the consumption value of housing for households that own these dwellings. In most countries such estimates can be calculated by estimating the relationship between basic housing characteristics, which are already part of the core, and the rental payments made by households that rent their
dwellings. The key piece of information needed is the rental payments of households that rent. Questions C7-C12 in the short version of the housing module collect information on rental costs.

Full LSMS-type Survey

In practice, the essential core, and even the recommended core, will tap only a small part of the potential policy uses of an LSMS-type survey. In most LSMS and other multitopic surveys, much more can and should be added to the questionnaires to gather information beyond what is collected in the core. This subsection and the two that follow it discuss different ways to add to the core by expanding modules and combining them to form a survey or sequence of surveys.

A full LSMS-type multitopic household survey can be formed by combining the short or standard versions of most of the modules in the household questionnaire with the corresponding parts of the community and price questionnaires. This produces a household survey similar in design to the original LSMS surveys first used in 1985, except that the modules presented in Volume 3 of this book (and described in Parts 2 and 3) include revisions based on 15 years of experience with LSMS and other household surveys.

Because some of the standard versions of modules presented in Parts 2 and 3 are significantly larger than versions used in the original LSMS surveys, a household questionnaire including all of the standard modules would almost certainly be too large to be practical. Thus the household questionnaire of a full LSMS-type survey needs to be trimmed, either by replacing the standard versions of some modules with their short versions or by dropping some nonessential modules.

A well designed full LSMS-type multitopic survey collects information that measures or otherwise describes:

* Household consumption.

* Household income.

* Key nonmonetary indicators of welfare such as nutritional and health status, education status, and housing conditions.

* Many aspects of household behavior, such as income-generating activities, human capital investments, fertility, and migration.

* The local economic environment (including prices and the availability of services).

* Participation in specific government programs such as food stamps programs, job training programs, and agricultural extension services.

Having all this information for a group of households makes it possible to describe many indicators of living standards, estimate the determinants of different dimensions of living standards and different types of household behavior, and estimate the relationships between dimensions of living standards and household behavior (such as the impact of children's nutritional status on their school performance).

The full LSMS household questionnaire is long and complex. In almost all cases it is too long to be completed in a single visit by an interviewer to a household. Instead, an interviewer typically visits each household twice. All of the individual-specific modules (roster, education, health, employment, and migration) are administered in the first visit, sometimes with the addition of one or two household-level modules such as housing. The interviewer makes an appointment for a second visit, usually about two weeks later, to reinterview household members who are most knowledgeable about the other household-level modules (such as consumption, agriculture, and household enterprises). To ensure that high quality data are collected and to keep the budget within reasonable limits, the samples in full LSMS-type surveys are usually relatively small, between 2,000 and 5,000 households. Samples of this size are still large enough to provide accurate information on the nation as a whole, on rural and urban areas, and on a small number of geographic regions. However, such samples are not large enough to provide accurate statistics for each state, province, department, or district in a country. Even at the national level, they cannot provide precise information on phenomena that do not pertain to most households or individuals, such as post-secondary education or participation in a program used by only a small fraction of the population. See Grosh and Munoz (1996) for a more thorough discussion of sample size and sampling issues.

In most cases it is not worthwhile to implement a full LSMS-type multitopic survey every year. Much of the analysis for which LSMS surveys are designed does not need to be repeated annually. For example, while it is important to understand the determinants of fertility, it is unlikely that these determinants change greatly from one year to the next. Sizable changes are likely to occur only over the course of several years, as
economic conditions and people's attitudes change. Another reason not to implement a full survey every year is that it is costly to administer such a comprehensive household questionnaire, and requires substantial work at each stage. Therefore, a full LSMS-type survey should be implemented only once every three to five years.

During 1985-99 the following countries implemented full-size LSMS surveys: Algeria, Brazil, Côte d'Ivoire, Ecuador, Ghana, the Kyrgyz Republic, Mauritania, Morocco, Nepal, Pakistan, Panama, Peru (in 1985-86, 1991, and 1994), Turkmenistan, and Vietnam.

Scaled-down LSMS-type Survey

A scaled-down LSMS-type survey can be constructed by omitting some modules from the household questionnaire of a full LSMS-type survey and by abridging other modules. Such a survey will still be a multitopic survey, but will cover fewer topics than a full-size survey would. Substantial reductions in the size of the household questionnaire may mean that the questionnaire can be completed in a single visit by the interviewer to the household, as compared to the two visits needed for a full LSMS-type multitopic survey.

The extent to which various modules should be reduced or eliminated will depend on which policy questions are most important in the country in question. However, there is a limit to how much the questionnaire can be cut. The essential core of an LSMS or similar multitopic survey, as described above, must remain. In addition, the elements that are added to form the recommended core (data on anthropometrics, child immunization coverage, basic household assets, interhousehold transfers, and rental payments of households that rent their dwellings) should almost always be included. The community questionnaire may or may not be included in a scaled-down survey, but the price questionnaire should always be used, except in those rare cases in which fully adequate price data already exist or price variation across regions is negligible. Overall, the analytical objectives of a scaled-down LSMS-type survey are more modest than the objectives of a full-size survey.

One common way to abridge the questionnaire is to decide not to collect the data needed to measure total household income. Not measuring total household income allows survey designers to delete most of the agriculture and household enterprise modules, retaining only questions on agricultural extension services that are part of the essential core and questions on assets that are part of the recommended core. Yet the questions on wages from the employment module and the questions on public and private transfers from the transfers and other nonlabor income module should be retained, as they are part of the essential core of any LSMS-type survey.

Other questions that can be dropped are questions on any aspects of household behavior that are of little interest to policymakers. The savings, credit, fertility, and migration modules have often been deleted from previous LSMS surveys. Because the new time-use module is quite lengthy, it is also a candidate for omission, unless data on time use are of particular interest to policymakers. If analysts aim to measure use of social services but not to estimate the determinants of demand for them, survey designers could choose to use the short, rather than the standard, versions of the health and education modules.

An alternative way to obtain a scaled-down multitopic survey is to "scale up" an existing single-topic household survey, such as a labor force or household expenditure survey. In Romania, Latvia, and Bangladesh, new modules on the use of social services and programs were added to existing household income and expenditure surveys. In Guyana, households that had been interviewed in a previous income and expenditure survey were revisited to collect information on health, education, and anthropometrics; the separate data files were later merged for purposes of analysis. In Jamaica, households from the Labor Force Survey were revisited by interviewers who administered the Survey of Living Conditions; the two data files were later merged. In Paraguay, additional modules were added directly to the Labor Force Survey questionnaire.

Scaling down the household questionnaire of a full LSMS-type survey reduces the analytical potential of data collected, especially in parts of the questionnaire that are dropped or abridged. A reduced questionnaire produces fewer descriptive statistics on many dimensions of household welfare than would be possible using a full-size survey. Data from a scaled-down questionnaire can be used to analyze only a few of the determinants of living standards. And such data reductions substantially reduce the range of analytical methods that can be used.

A scaled-down LSMS-type survey can be implemented fairly often, perhaps annually or every other
35

MARGARET GROSH AND PAUL GLEWWE

year. Such frequent implementation is desirable because one of the main uses of data from a scaled-down survey is to monitor changes in poverty and other dimensions of welfare over time. Also, the fact that a scaled-down survey collects less data on the determinants of household welfare and behavior than does a full-size survey means that implementing it frequently wastes less resources than would implementing a full LSMS-type survey every one or two years. Another advantage of a scaled-down survey is that it is easier and less expensive to carry out than a full-size survey. Finally, a scaled-down survey can be carried out using somewhat larger samples than a full LSMS-type survey because it is subject to fewer managerial and budget constraints.

Scaled-down LSMS surveys have been carried out, with World Bank support, in Albania, Azerbaijan, Bolivia, Bulgaria, Pakistan (1995/96 and 1996/97), Peru (1990), and Tanzania.

Core and Rotating Module Design

The "core and rotating module" design for a multitopic household survey is an attempt to combine the advantages of full and scaled-down LSMS-type surveys. In this design, a scaled-down LSMS-type survey forms the "core," while one or two modules are added or greatly expanded each time the survey is carried out. Modules that are added or expanded in any given year revert back to their "core" size the following year, creating a module "rotation" scheme for the modules that go beyond the core. In most cases the survey is fielded annually, although it can also be a semiannual or biannual survey. The core that is repeated each time the survey is implemented must include the essential core described above, and in almost all cases it should include everything in the recommended core. In many cases the core of a core and rotating module design should collect additional information as well, in order to provide a more detailed picture of household welfare each time the survey is implemented.

An example of how to implement this approach would be to use only the core in the first year of the survey, in order to focus on making sure that the core works well. In the second year the health module in the household questionnaire would be expanded to gather more detailed data on individuals' health status and behavior, the kinds of health care sought, and the cost and quality of that health care. In addition, a health facility questionnaire could be added (see Chapter 8 for further details on health facility questionnaires). In the third year the health module would return to its original "core" size and a new subject, such as education or savings, would be given special emphasis. Expansion of any particular module might require making some additions to other modules in the survey to ensure that the analytical potential of the data collected in the expanded module could be fully exploited. Each chapter in Parts 2 and 3 of this book explains what data are needed from other modules to complement the data collected in the module covered by that chapter.

The core and rotating module design is a hybrid of a full LSMS-type multitopic survey and a frequently implemented scaled-down LSMS-type survey. Implementing a core and rotating module survey annually would allow for the same monitoring of poverty and welfare that is possible with data from an annual scaled-down survey. In addition, in each rotation of a particular module, this kind of survey would collect the data analysts need to study the determinants of household behavior for a specific topic; in other words, data comparable to what are collected in a full-size survey. It might even be possible to use data from the scaled-down modules to study topics that are not emphasized by the survey in a particular year.

The cost and sampling implications of the core and rotating module design lie somewhere in between those of a full-size LSMS-type survey and those of a scaled-down survey. Perhaps of greatest concern in the core and rotating module design are the institutional arrangements for developing, implementing, and analyzing the special modules. While for both full-size and scaled-down LSMS-type surveys it is possible to put a lot of effort into the design of the first survey and give less attention to improving its design in subsequent years, implementing the core and rotating module design means that the questionnaire needs to be significantly modified each year, requiring much more attention from survey designers after the first year.

Indonesia's SUSENAS is a long-standing example of a core and rotating module survey design. Jamaica's Survey of Living Conditions, which began in 1988, was the first LSMS survey to adopt this approach. A new LSMS survey in Cambodia is just starting to develop such a system, as is the Bangladesh Household Expenditure Survey. (The Bangladesh Household Expenditure Survey is not usually regarded as an LSMS survey; however, it has adopted much of the LSMS methodology.)
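The rotation scheme described above (core only in the first year, then one expanded module per year, each reverting to core size afterward) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `rotation_schedule`, its parameters, and the module names in the usage line are assumptions made for the example, not part of the LSMS methodology.

```python
from typing import Dict, List, Optional


def rotation_schedule(
    rotating_modules: List[str],
    years: int,
    core_only_first_year: bool = True,
) -> Dict[int, Optional[str]]:
    """Map each survey year (1-based) to the module expanded that year.

    A value of None means that only the core is fielded in that year.
    Every module not listed for a year is assumed to be at its core size.
    """
    if not rotating_modules:
        raise ValueError("at least one rotating module is required")
    offset = 1 if core_only_first_year else 0
    schedule: Dict[int, Optional[str]] = {}
    for year in range(1, years + 1):
        if year <= offset:
            schedule[year] = None  # first year: test that the core works well
        else:
            index = (year - 1 - offset) % len(rotating_modules)
            schedule[year] = rotating_modules[index]
    return schedule


# Hypothetical five-year plan with three rotating topics.
plan = rotation_schedule(["health", "education", "savings"], years=5)
for year, module in plan.items():
    print(year, module or "core only")
```

With three rotating topics and a core-only first year, each topic comes around again every three years, which is the trade-off the text notes: detailed data on any one topic are available only in the years when its module is expanded.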


CHAPTER 2 MAKING DECISIONS ON THE OVERALL DESIGN OF THE SURVEY

Special Purpose Sample Designs

There are two other possible survey designs, both of which use special purpose samples (that is, samples that are not nationally representative). The first is a survey that samples a special population that is of particular interest for analytical or policy purposes.[5] An example of this is a sample of households within a single city that is used to study issues pertaining to that city, such as the housing market, the water supply system, or urban air pollution. Two LSMS surveys of this type have been performed: one in the Kagera region of Tanzania, focusing on areas with high prevalence of AIDS, and one in rural areas of Northeast China, focusing on the agricultural activities of rural households.

A second kind of special purpose survey is one in which the sample is drawn solely for purposes of program evaluation. In this type of survey, a group is observed both before and after the benefits of a particular service or program are made available to this group. Alternatively, the sample may be composed of two groups, one consisting of the households who benefit from the service to be evaluated (the treatment group) and the other consisting of households that are similar to the first in every respect except that they do not benefit from the service (the control group).

These special-purpose samples usually gather detailed data on the topic being studied, whether it is a specific sectoral issue (such as agriculture) or a program to be evaluated. There are so many ways to design such surveys that this book cannot hope to cover all of them. However, since special purpose surveys typically collect data on many general characteristics of the sampled households (such as size, composition, living standards, labor force status, and education), designers of this kind of survey can use the modules proposed in this book as a guide for collecting this supplemental information. The experience of past LSMS surveys has been used in designing special purpose surveys to evaluate the impact of educational reforms in El Salvador. And the Nicaragua LSMS survey included a special sub-sample designed to evaluate the impact of that country's Social Investment Fund.

Matching Circumstances and Designs

This section provides some approximate rules of thumb for choosing among the three common survey design options discussed in the previous section. These recommendations should not be thought of as rigid or beyond debate; instead, they should be thought of as a starting point for making survey design decisions. This is the case for several reasons. First, the dividing lines between the three basic survey designs are flexible, as it is possible to develop "hybrid" surveys that merge characteristics from the different survey design options. Second, individual countries may not fit neatly into the categories implicit in the rules. Third, surveys may have multiple analytical objectives. Finally, funding constraints are not explicitly considered here.

Survey planners should consider the following general "rules of thumb" when deciding what kind of survey to implement:

1. Countries with sufficient institutional capacity to implement a complex survey should use either a full LSMS-type multitopic survey every three to five years or a core and rotating module design; both options can serve a broader range of analytical objectives than can a sequence of scaled-down LSMS-type surveys.

2. If annual (or biannual) monitoring of living standards or poverty is the most important analytical objective, either a sequence of frequent scaled-down LSMS-type surveys or a core and rotating module survey should be adopted. In contrast, a full-size multitopic survey is inappropriate because cost and efficiency considerations imply that such surveys should be implemented only every three to five years.

3. No new survey is needed if the main objective is to provide periodic descriptive information (say, every three to five years) or to examine the coverage of government programs in countries where ample data are already available from other sources.

4. If the main objective is to gather periodic descriptive information or to examine the coverage of government programs, a core and rotating module design should not be chosen. Such a design would collect data much more frequently than is necessary.

5. If the main objective is to model household behavior, either a full LSMS-type survey or a core and rotating module survey should be chosen. A series of scaled-down surveys would be insufficient for modeling household behavior.

6. If the main objective is to model household behavior and very little other data are available, a full-size multitopic survey is preferable to a core and rotating module survey since the latter cannot supply detailed information on all topics until it has been in operation for several years. The core and rotating module design can be adopted after one or two full LSMS-type surveys have been carried out.

7. If the main objective is to model household behavior and a large amount of other data are available, the core and rotating module survey is preferable to a series of periodic full LSMS-type surveys because the core and rotating design allows poverty to be monitored more frequently over time.

8. If the institutional capacity in the country is limited and the survey aims either to monitor poverty and living standards annually or to provide descriptive information (including coverage of government programs) periodically, a scaled-down LSMS-type survey should be chosen. This survey may be either frequent (for annual monitoring) or periodic (for descriptive information every three to five years). The other options, full multitopic and core and rotating module, are too complex for countries with limited institutional capacity.

Table 2.2 summarizes the implications of these rules, showing which rules lead to which choices. Because countries with little institutional capacity cannot implement a full LSMS-type multitopic survey or a core and rotating module design on their own, they will not be able to collect data that are useful for analyzing household behavior unless their institutional capacity is either permanently improved or supplemented in the short run by using international experts. In addition, significant purchases of new equipment may be required in some countries.

Choosing the Modules, Defining Their Objectives, and Setting Their Size

Once the basic blueprint of the survey has been selected, survey designers must decide which modules to include in the household and community questionnaires.[6] Designers must also define specific objectives for each module and decide on each module's approximate length. The procedures for these steps are discussed in this section. Because decisions about length and objectives ultimately depend on many country-specific details, specific recommendations cannot be provided for each possible scenario. Instead, some general guidelines and procedures are provided that should prove useful for completing this step efficiently and effectively.

Two general points must be made at the outset. First, the tasks of choosing modules, defining their objectives, and setting their approximate size are all closely related and thus must be done simultaneously rather than sequentially. The type of objectives and the number of objectives have considerable implications for the size of each module; more objectives, and more complex objectives, necessitate a larger module. Second, the objectives of each module should be consistent with the overall objectives of the survey, in terms of both the analytical objectives (describing living standards, monitoring poverty and living standards, examining the coverage of government programs, estimating the impact of policies) and the specific topics in which policymakers are interested. The overall objectives of the survey already provide some information on what the objectives of many of the modules will be.

Table 2.2 Recommended Survey Designs for Different Settings

Entries are grouped by institutional capacity and by availability of other data; within each group, the recommended design is listed for each analytical objective.

Countries with sufficient institutional capacity

Availability of other data: Limited
   Describing living standards or examining program coverage: Full LSMS-type survey (Rule 1 + Rule 4)
   Monitoring living standards or poverty: Core and rotating module (Rule 1 + Rule 2)
   Modeling household behavior: Full LSMS-type survey (Rule 5 + Rule 6)

Availability of other data: Ample
   Describing living standards or examining program coverage: No new survey needed (Rule 3)
   Monitoring living standards or poverty: Core and rotating module (Rule 1 + Rule 2)
   Modeling household behavior: Core and rotating module (Rule 5 + Rule 7)

Countries with limited institutional capacity

Availability of other data: Limited
   Describing living standards or examining program coverage: Periodic scaled-down LSMS-type survey (Rule 8)
   Monitoring living standards or poverty: Frequent scaled-down LSMS-type survey (Rule 8)
   Modeling household behavior: Full LSMS-type survey (Rule 5 + Rule 6) (a)

Availability of other data: Ample
   Describing living standards or examining program coverage: No new survey needed (Rule 3)
   Monitoring living standards or poverty: Frequent scaled-down LSMS-type survey (Rule 8)
   Modeling household behavior: Core and rotating module (Rule 5 + Rule 7) (a)

a. International experts must be hired to carry out key tasks.
Source: Authors' recommendations.


Choosing modules to be included in a scaled-down LSMS-type survey

A good first step in choosing modules is to set the upper and lower limits of what can be included in a multitopic survey. The lower limit is the essential core discussed above; in almost all cases this lower limit should be expanded to include the additional elements that are in the recommended core. The upper limit will depend on country-specific circumstances such as the capacity of the statistical agency and the willingness of households to participate in lengthy interviews. It is never possible to include all of the modules in any one survey.

An important question to address relatively early when making decisions about modules is whether the survey will attempt to collect enough data to calculate total income. The advantages and disadvantages of collecting these data are discussed at length in Chapter 17. Clearly, if survey designers decide to collect the data needed to calculate total income, the agriculture and household enterprise modules need to be included in the questionnaire.[7] If designers decide not to collect income data and there is little interest in these two modules, they can be dropped, except for the questions on use of agricultural extension services that are part of the essential core and the asset questions that are part of the recommended core.

It will probably not be possible to collect total income data in a scaled-down survey because it is not feasible in a single visit to a household to collect the recommended core data plus the data from the agriculture and household enterprise modules and still have room to examine other topics. This implies that it is also difficult to collect total household income in a core and rotating module survey, except when the module featured is either the agriculture or the household enterprise module; even when one of these two modules is featured in such a survey, collecting total income data may not be feasible in some countries.

Two other specific decisions to make early in this step of the survey design process are whether to collect time-use data and whether to implement a large number of the detailed environment modules (see Chapters 22 and 14, respectively). The time-use module is very long, and as such should be thought of as an expanded module. If survey designers choose to include this module, they may have to omit several other short or standard modules. While it would be feasible to include the time-use module in a full-size LSMS-type survey, this module is probably too large and would work in a core and rotating module survey only if it were chosen as the topic emphasized in a particular year. The same circumstances apply to the environmental module. The full set of these environmental submodules is equivalent to a very large expanded module, and for this reason it is difficult to imagine the full set used in a single survey. The contingent valuation modules should be used only when specific improvements in services (such as urban water supply, urban sanitation, urban air quality, or rural water supply) are being contemplated.

Even a subset of the expanded environmental modules is likely to be equivalent to a large expanded module, especially if the water, sanitation, and fuel use modules are included. This being the case, it is feasible to include a large subset of the environmental modules in a full LSMS-type multitopic survey, but only a relatively small subset can be included in a scaled-down survey. In a core and rotating module survey a large subset of the environmental submodules can be used only if environmental topics are emphasized in that particular year; for all other years only a small subset would be feasible.

At this point, it is useful to give some general rules about how much room there is for modules in different kinds of multitopic surveys. For a full LSMS-type survey, the household questionnaire should be roughly large enough to include a mixture of about 15 standard or short modules. The number of modules that can be included in a scaled-down survey is probably closer to 8 or 10, most of which have to be short versions. A core and rotating module survey lies somewhere in between but is probably closer to the scaled-down survey if only one visit is made to the household. The modules chosen must in all cases include the components of the essential core; in almost all cases the modules should also include the additional components found in the recommended core.

Using these starting points for what is feasible, the next task is to consult with policymakers at the highest level to get a detailed idea of which topics are of greatest interest to them (if this has not already been done). Policymakers need to specify which topics are of overriding concern, which are of moderate interest, which are of minor interest, and which are of little or no interest. Expanded modules, if they exist, should be used for topics of overriding interest.[8] Standard modules should be used for items of moderate interest.


Short modules may be appropriate for items of minor interest. Items of little or no interest need not be covered in the survey unless they are part of the essential or recommended core.

The core and rotating module survey design is inherently more flexible than other classic designs; if the core and rotating survey is implemented annually, it can cover four or five topics in great detail over the same number of years by including the expanded version of one of these modules each year. Of course, survey designers still have to set priorities about which expanded module is included in the first year, which is included in the second year, and so on.

The above paragraphs provide survey designers with a scheme for generating a draft list of the modules to be included in the survey, their approximate length, and, to an extent, the objectives of each module. Needless to say, this draft list needs to be refined. This can be done by adding two new "ingredients" to the process: discussions with policymakers who specialize in particular topics or programs, and a careful reading of the chapters in Parts 2 and 3 of this book. The task is to reconcile the specific policy questions raised by these more specialized policymakers with the feasibility of collecting data to analyze them (as discussed in detail in Parts 2 and 3 of this book) given the approximate sizes of each module as specified by high-level policymakers. This process is not simple and consequently involves a certain amount of iteration.

Unfortunately, policy issues raised by specialist policymakers often require more questions than can fit into a module of the size specified in the first draft of the modules. The choice at this point is between not including many of these policy issues in the survey and expanding the module containing these questions at the expense of other modules. A third alternative is expanding the relevant module without reducing the size of any other module, but the feasibility of this option is open to question and will not become clear until a draft questionnaire is field tested.

Given this situation and the uncertainty regarding what is feasible and what is not, survey designers should use the following procedure to reconcile the specific objectives of each module with any constraints on module size. First, designers should ask policymakers who specialize in a given topic to rank the policy issues in order of importance, so that the module can collect the data needed to analyze the most important policy issues despite the inevitable constraints.

Second, for each module, survey designers should match the policy issues raised by policymakers with the data required to analyze them, as laid out in each chapter of Parts 2 and 3. One way to do this is to choose the smallest version of each module that can address all of the relevant policy issues, and remove any questions in that module that are not needed to analyze these policy issues. If the module is still too long, questions needed only to address the least important policy issues are deleted. This shorter module is checked again to see if it exceeds the provisional size limit. The general principle is that the most important policy questions are addressed first and additional issues are added until the module has reached the length that survey designers, in consultation with high-level policymakers, have set for it.

Third, after this has been done for all modules, survey designers should prepare a list of issues they think can be covered by the survey and give this list to the high-level policymakers, who will decide whether they would like to change the amount of space allocated to each module. The survey designers should tell the policymakers about the tradeoffs involved, working with them to ensure that the issues policymakers deem most important are addressed.

Ultimately, this process produces a list of modules to be included in the survey, the proposed length of each module to be included, and the specific objectives for the modules. This completes the second step of survey design. This step may need to be revisited later if results of the field test show that the questionnaires are too long or that there is room to expand the questionnaire.

Notes

The authors would like to express their gratitude to Jere Behrman, Lawrence Haddad, Courtney Harold, John Hoddinott, Alberto Martini, Raylynn Oliver, Kinnon Scott, and Salman Zaidi for comments on an earlier draft.

1. This book is designed to provide a thorough review of international experience. However, new experience and knowledge will continue to accumulate after the book has been published. Therefore, until a new book is written, any new international-level information is probably most easily obtained from international researchers.

2. If geographic areas, rather than households, are the unit of observation, it may be possible to merge data from different surveys. However, this high level of aggregation yields less precise results, raises issues of aggregation bias, and generally requires surveys with very large sample sizes.

3. A variety of external sources have been used to fund past LSMS surveys. World Bank loans have partially financed several LSMS surveys. Grants from various bilateral development agencies (especially from the United States, Scandinavian countries, and Japan) and multilateral development agencies (particularly the United Nations Development Programme and the United Nations Children's Fund) have wholly or partially financed a large share of LSMS surveys. In a few cases, grants from the World Bank research budget have supported LSMS surveys. Similar surveys, such as the World Bank's SDA surveys, RAND's Family Life Surveys, and a few other surveys in Africa, all receive a large share of their funding from external sources.

4. Most previous LSMS surveys have used two-stage sample designs. If a three-stage sample design is used, ID codes will be needed that identify both the primary and secondary sampling units of each household. An analogous comment applies to surveys that use four or more stages in their sample designs.

5. In large countries with federal systems, surveys can be performed for individual states. Such surveys usually have the same general purposes as national surveys, and have samples that are representative of the whole state.

6. Each module in the household questionnaire should also be included in the community questionnaire. See Chapter 13 for further discussion of the community questionnaire.

7. While all versions of the household enterprise module collect income information, only the standard and expanded versions of the agriculture module collect sufficient data for use in the measurement of total income.

8. A full LSMS-type survey could accommodate two and possibly three expanded versions of modules; a scaled-down survey could accommodate at most one. Volume 3 presents expanded versions of the following modules: roster, education, health, employment, migration, environment, household enterprises, and agriculture. The time-use modules introduced in Chapter 22 should also be treated as expanded modules, and the same is even more true for the full set of environmental modules introduced in Chapter 14.

References

Grosh, Margaret, and Juan Muñoz. 1996. A Manual for Planning and Implementing the Living Standards Measurement Study Survey. Living Standards Measurement Study Working Paper 126. Washington, D.C.: World Bank.
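The procedure this chapter describes for reconciling a module's objectives with its size limit (rank the policy issues, then delete the questions needed only for the least important issues until the module fits) can be sketched as a short routine. This is a minimal illustration under stated assumptions: the function name `trim_module`, the representation of questions as string identifiers, and the education example are all hypothetical, and in practice the size limit is itself renegotiated with policymakers rather than fixed.

```python
from typing import Iterable, List, Sequence, Set, Tuple

# One policy issue: a label plus the identifiers of the questions needed to analyze it.
Issue = Tuple[str, Set[str]]


def questions_needed(issues: Iterable[Issue]) -> Set[str]:
    """Union of the questions required by the given policy issues."""
    needed: Set[str] = set()
    for _, question_ids in issues:
        needed |= set(question_ids)
    return needed


def trim_module(ranked_issues: Sequence[Issue], max_questions: int) -> Tuple[List[str], List[str]]:
    """Drop the least important issues until the module fits its size limit.

    ranked_issues must be ordered from most to least important. Questions
    shared with a more important issue survive when a lower-ranked issue is
    dropped; only questions needed solely by dropped issues disappear.
    """
    kept = list(ranked_issues)
    while kept and len(questions_needed(kept)) > max_questions:
        kept.pop()  # remove the least important remaining issue
    return [name for name, _ in kept], sorted(questions_needed(kept))


# Hypothetical education module with a provisional limit of five questions.
issues = [
    ("school enrollment", {"q1", "q2", "q3"}),
    ("cost of schooling", {"q3", "q4"}),
    ("distance to school", {"q5", "q6", "q7"}),
]
kept_issues, kept_questions = trim_module(issues, max_questions=5)
print(kept_issues)
print(kept_questions)
```

The list of issues that survive trimming corresponds to the list the text says should be returned to high-level policymakers, who may then decide to reallocate space among modules.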


3 Designing Modules and Assembling Them into Survey Questionnaires

Margaret Grosh, Paul Glewwe, and Juan Muñoz

Chapter 2 outlined the five-step process that survey designers should follow to design LSMS and similar multitopic surveys. It also provided detailed recommendations on how to undertake the first two steps, which are deciding on the overall design of the survey and deciding which modules to include in the survey questionnaire. This chapter discusses the last three steps of the five-step survey design process. The first section of this chapter describes the third step: drafting each module, question by question, to ensure that it will collect the data necessary to meet the module's objectives (which were laid out in the second step). The second section guides survey designers through the fourth step: coordinating the different modules and combining them to create a consistent and comprehensive set of questionnaires. The third section explains the procedures for the last step: translating the questionnaires into local languages and conducting a field test. The fourth section discusses the formatting of the questionnaires, which is an extremely important but often neglected aspect of designing successful multitopic surveys. Survey designers should refer to the material contained in the fourth section many times during the last three steps of the survey design process.

In practice, the survey design process rarely moves smoothly and sequentially from one step to the next. Instead, survey designers often find themselves moving backward and forward among the various steps. For example, if designers encounter difficulties when drafting a specific module, they may need to reconsider and modify their original objectives for that module. Developing survey questionnaires is an iterative process, and survey designers should expect to go through at least three or four drafts of each module. It is not unusual for the different versions of the drafts to add up to a stack of paper one foot (30 centimeters) high. Each major redraft of a module or questionnaire should be reviewed by all interested parties, not only the people involved in carrying out the fieldwork (the data producers) but also policymakers (who will make decisions based on the data), members of the research community (who will analyze the data), and the staff of any agencies financing or providing technical assistance to the survey. Eventually, what should emerge from the process is a well-designed set of questionnaires for a multitopic household survey.

Producing Draft Modules

The third step in survey design, producing draft modules for the household and community questionnaires, is one of the most time-consuming steps in the process. Detailed guidance on this step is provided in the chapters in Parts 2 and 3, so the discussion here will be general and relatively brief.

Once the objectives for each module are finalized (at least tentatively), survey designers can begin to develop detailed draft modules for the household and community questionnaires. Survey designers should use the draft "prototype" modules introduced by the chapters in Parts 2 and 3 (and presented in Volume 3) as their starting point. As explained in Chapter 2, survey designers will already have decided on the policy and analytical objectives of each module. They should now choose the shortest versions of the modules that will allow for analysis of the most important of these objectives; any questions not relevant to these objectives should be removed.

If the resulting module is still too long, survey designers should remove any questions that are needed only for the analysis of the least important of the policy issues. This process should continue until the module meets the length constraint. In some cases the module may be shorter than expected, in which case a policy issue and its accompanying questions can be added. The general principle is that the most important policy issues should be addressed first, and additional ones should be included only if space allows. This approach is a good start, but much more remains to be done.

For some modules the information and guidance given in the relevant chapter in Parts 2 and 3 of the book may be incomplete. For example, the chapter may not address certain policy issues that are important in a given country or setting, in which case the designers of that survey will need to develop new modules or submodules. Even in these cases the information provided in the relevant chapter is usually a good base for developing such modules. However, if the designers intend to implement major innovations in their survey, they should seriously consider adding to the survey team a specialist with the relevant experience in both data collection and data analysis.

Once each draft module has been written out in its entirety, the next task is to verify that the design of each module reflects the economic and institutional structures of the country in question. For example, the designers need to check whether common living arrangements are reflected in the definition of the household used in the household roster and in the housing and interhousehold transfer modules. They must also review all questions and response codes and, if necessary, modify them to reflect local institutions and terminology. For example, the transfers and other nonlabor income module discussed in Chapter 11 must explicitly refer to each public transfer program by name. The consumption module will need even more work; in particular, the lists of items selected must closely reflect items consumed in the country. The agricultural module will need careful attention, as this module must reflect the country's landholding and cropping patterns.

For many of the modules, survey designers may find it useful to collect some preliminary data using qualitative techniques, which may help them determine how best to design these modules to collect quantitative data. Chapter 25 provides a detailed discussion on how to collect qualitative data. Such data can be particularly useful in countries where successful quantitative surveys have never been done for the topic to be studied.

A final general issue to consider when drafting modules is the role played by the fieldwork schedule. A prototypical full LSMS survey spreads fieldwork evenly over a 12-month period, for two reasons. First, this makes it possible to study or average out any seasonality effects. Second, and more importantly, surveys with this fieldwork schedule require a smaller number of survey field teams than do surveys that compress the fieldwork into a shorter period of time. This smaller number of teams reduces costs and allows for improved quality control. All of the interviewers can be trained together and thus to a uniform standard; in addition, the cost of training interviewers (which takes about four weeks) will be proportionately cheaper. Each interviewer will complete more interviews and thereby gain more experience. Finally, fewer computers and vehicles will be required.

Despite these advantages of a year-long survey period, many past LSMS surveys have compressed fieldwork into a period of just two or three months. This has often been done when there was pressure on the survey team to collect data for analysis as quickly as possible. In other cases interviewers may have been available for only a short period of time, or the organization funding the survey may have required that the project be completed in a relatively short amount of time. The fieldwork schedule can also be modified to accommodate analysis of certain topics. For example, analysis of some agricultural issues may require inter-


CHAPTER 3 DESIGNING MODULES AND ASSEMBLING THEM INTO SURVEY QUESTIONNAIRES

viewers to make two or more visits to each household at different times during the year.

Variations in the fieldwork plan may require changing the wording of some modules. This means that survey designers should ensure that the design of each module in the questionnaire is consistent with the fieldwork plan.

When a survey is conducted over a relatively short period, such as a few weeks or months, careful attention must be given to the wording of questions concerning events that are seasonal in nature. Will school be out of session for a large portion of the survey period? If so, the education module may need to be changed to reflect this. In particular, questions referring to school activities during the previous week, such as the number of days that a child attended school or the number of hours of homework done by the child, would clearly be inapplicable. Also, questions about water supply during both wet and dry seasons should be reviewed to ensure that they reflect the circumstances of these seasons. The largest seasonal changes may need to be made to the agricultural module. A detailed discussion of the implications of seasonality for that module is provided in Chapter 19.

More substantial changes will be required if the household is to be visited more than once at different times of the year. In such cases it may be desirable to have the interviewer administer modules for which the answers are expected to vary by season (such as the consumption, agriculture, water, or time use modules) each time he or she visits the household. In contrast, the modules that are unlikely to be affected by seasonality, such as housing, education, fertility, or migration, probably need to be administered only once. Any modules that are to be administered more than once usually need to be modified, particularly with respect to their recall periods. For example, if the interviewer makes two household visits six months apart, the consumption module should be administered in both visits and should have a recall period of six months rather than one year. Also, the water module should ask only about the particular season (wet or dry) during which the interview is to be conducted.

The guidelines given in this chapter are general, since very detailed information is provided in Parts 2 and 3 of this book. Other information on adapting LSMS questionnaires to fit local circumstances can be found in Oliver (1997), which focuses on survey design in the countries of the former Soviet Union, as well as in Ainsworth and van der Gaag (1988). A good general reference publication for developing and designing household survey questionnaires is United Nations (1985). More recent general references are in Babbie (1990), Fink (1995), and Fowler (1993); although these books focus more on developed countries, much of the material they contain is also relevant for developing countries. A final point to bear in mind is that a good deal of attention must be given to correct and consistent formatting. This is described in great detail in the fourth section of this chapter; survey designers should read that section very carefully before they begin designing any survey modules.

Integrating and Combining Modules to Create Complete Questionnaires

Once draft versions of each of the individual modules have been written, these drafts must be combined to form complete household and community questionnaires. Merely stapling the various modules together will not produce a well-designed questionnaire; much more work has to be done to ensure that the different modules fit well together. This section describes how to do this important task. It focuses primarily on making the modules of the household questionnaire consistent with each other. Similar, though less difficult, issues arise when integrating the modules of the community questionnaire; in most cases the approach to take for the community questionnaire can be inferred from the discussion of the household questionnaire. This section will also highlight particularly important points to consider when combining the household, community, and price questionnaires to form a comprehensive household survey.

Gaps and Overlaps

Survey designers must scrutinize and compare the different questionnaire modules for gaps and overlaps in the information that the modules collect. Analysts often need to combine data from different modules in the household questionnaire. Perhaps the most important example of this is the calculation of each household's total consumption, which requires information not only from the consumption module but also from the education, health, employment, and housing modules, and from the water, sanitation, or fuel modules (see Chapter 14) if they are included as separate modules in the questionnaire (as opposed to using the


MARGARET GROSH, PAUL GLEWWE, AND JUAN MUÑOZ

housing module to collect information on these topics). Likewise, income data are collected in the employment, agriculture, household enterprise, and miscellaneous income modules. It is important to check that a questionnaire includes the data needed to construct these and other complex variables.

Another example of this general issue is that survey designers often have a choice regarding the module in which to collect some kinds of information. For example, data on expenditures on fuel for cooking and heating can be collected in the consumption module, the housing module or, if it exists, the expanded fuel module. Questions on child immunization can be placed in the fertility module, the health module, or the anthropometry module. An argument can be made for choosing any of these options (see the pertinent chapters in Parts 2 and 3 of this book), but the essential point is to ensure that the information is collected at least once, and is collected twice only if there is a reason to do so.[1] Appendix 3.1 provides a list of the most common types of gaps and overlaps to check.

In cases in which information could plausibly be collected in more than one part of the questionnaire, there may be no absolute right or wrong place to collect it. Rather, survey planners must take into account who the respondent is in each module, how well the best recall period for that information matches the recall periods of modules in which it might be collected, at what point in the interview the respondent might discuss the topic most naturally, and whether the topic is a sensitive one that should therefore be addressed near the end of the questionnaire (for reasons discussed further below).

The survey designers should also examine any overlaps among the household, community, and price questionnaires. In general, the community and price questionnaires should collect information on any topic that varies only slightly from household to household within the primary sampling unit.[2] While much of the information collected in the community and price questionnaires could be collected in the household questionnaire, it is better to collect it in the community questionnaire in order to shorten the length of each household's interview. Collecting this information in the community questionnaire is also more efficient; why collect it for all households in a primary sampling unit (often 16 or 20 households) when it need be asked only once in the community questionnaire?

Some simple examples illustrate this point. The expanded water module contains questions about the price and quality of water from different potential water sources. If the primary sampling units are geographically compact, all of the households in each primary sampling unit are likely to have the same alternative water sources, implying that the water price and quality questions can be put in the community questionnaire (which should be administered in each primary sampling unit) rather than in the household questionnaire. On the other hand, if the primary sampling unit is not compact so that the households are widely dispersed, it is likely that some households will be nearest to, say, a particular spring or well while other households will be closer to other springs or wells. In such cases these questions about alternative water sources should remain in the household questionnaire.

Another example concerns the distance to schools and health facilities. In a compact primary sampling unit, the distance to the nearest school or health facility probably varies little among the households in the primary sampling unit. This means that information on the distances to schools and health facilities can be collected in the community questionnaire as opposed to the household questionnaire.

Length

The overall length of the household questionnaire must be manageable. In general, it is not feasible to include, say, the standard version of each module presented in Volume 3, even though past full LSMS surveys typically included 15 modules, many of which were similar to the standard versions in this book.

There are several reasons why using all of the standard draft modules in this book is not feasible. First, this book introduces several new modules, including the time use module and several environmental modules. Second, some of the standard draft modules, such as those on health, migration, and household enterprises, are much longer than the modules on those topics that were used in previous LSMS surveys. Finally, in some of the chapters in this book (including Chapter 18 on household enterprises and Chapter 19 on agriculture) it is argued that collecting more detailed data will greatly increase their value for analytical purposes. Thus survey designers should not combine the standard versions of all of the modules presented in the book into a single household questionnaire. Instead, the short versions should be used for


some modules, and in almost all cases at least one or two modules should be dropped.

Assessing whether a draft questionnaire is too long is not simply a matter of counting the pages or questions in it, since many questions, and sometimes even entire pages or modules, will apply only to some households. Moreover, in some cases adding questions does not lengthen the interview time because the respondent cannot avoid going through the thought process made explicit in these questions, which implies that a supposedly abbreviated set of questions will not reduce the time required to complete the interview. An example of this is the calculation of income derived from agricultural activities.

There are also several ways to implement long questionnaires that minimize the time required by (and the fatigue induced in) each survey respondent. These include conducting individual "mini-interviews" with each household member to collect all of the information needed from that individual at one time (which allows him or her to leave when questions are being asked of other household members); using the best-informed respondent for each household module; and dividing the interview into multiple visits (for example, going through all the individual-specific modules in one visit and returning on a different day to conduct the consumption module and other household-level modules). LSMS surveys use all of these techniques. Still, there is a limit to the amount of information that can be gathered from a single household.

How can survey designers determine whether a household questionnaire is too long? A rough idea of the effective length of the questionnaire in different circumstances can be obtained by calculating how many households will go through the different paths created by the skip patterns and how many questions will be asked for each possible path. An excellent example of this is provided in Chapter 18 on household enterprises, in Table 18.5.

A more precise estimate of the time required to administer a household questionnaire can be obtained when similar surveys have already been done in the country or region studied. In this case, the designers of the new survey will be able to find out how long the interviews took in the previous survey, provided that the earlier survey collected metadata along the lines suggested in Chapter 4. If such information is not available, survey designers will need to rely on the field test, which is discussed in detail in the fourth section of this chapter. If field test interviews require many hours to complete and exhaust the cooperation and patience of households, this is an indication that the questionnaire is too long. At the same time, survey designers should realize that field test interviews normally require much more time than do similar interviews during an actual survey, because interviewers have little training or experience with the questionnaire at the time of the field test. In addition, the questionnaire used in the field test is not a final draft and thus is likely to contain some problems that will slow down the interviews. A handy rule of thumb is that interviews in the actual survey take only about half of the time that they take in the field test, and sometimes even less than that.

A general goal to aim for in the actual survey is that any given respondent should not be interviewed for more than one hour on a given day. Of course, people's tolerance for being interviewed will vary from country to country, and this general guideline must be adapted to suit local conditions. Experience in LSMS surveys to date suggests that people's tolerance for long interviews is lower in urban areas than in rural areas, lower among wealthy households than among poor ones, and lower in wealthier countries than in poorer ones.

Recall Periods

The recall periods proposed for each module introduced in this book are mostly those that the authors have deemed appropriate for that particular module. This can be a problem when analysts want to combine or compare data from several modules. For example, in many LSMS surveys the employment module uses a one-week recall period. Since most adults work, this yields a large number of observations, and the period of time is short enough to yield accurate answers to such basic questions as the number of hours worked and the payments received during this recall period. In contrast, the health module uses a four-week recall period. This relatively long recall period is used because most people are not ill in any given week. The four-week recall period allows more observations of illness for a given sample size than would be obtained using a one-week recall period. Since illnesses are important events, respondents can be expected to remember many details of their episodes of illness during the past four weeks.


However, if an analyst wants to study the impact of illness on earnings or work effort, these different recall periods will complicate the analysis. The analyst cannot tell whether the illness took place before or during the period for which the earnings and hours data were collected. This could be resolved either by adding questions to the health module to specify the days during the recall period on which the respondent was ill or by making the recall periods coincide, perhaps with a compromise of two weeks for both modules (bearing in mind the disadvantages in sector-specific analyses of using a recall period different from the "ideal" one for that module). Part of the job of integrating the draft modules is to determine and judge the tradeoffs being made, either confirming that they are acceptable or altering them until a more appealing tradeoff is reached.

Nomenclature and Coding Schemes

The questionnaire should be reviewed to check that wherever similar questions are asked, the nomenclature and coding schemes are the same. This should reduce coding errors and simplify data analysis. For example, many different modules allow the respondent to choose the time unit (for example, hour, day, week, or month) that they find most convenient when responding to questions regarding time or payments over time (such as wage rates, the length of time spent gathering firewood, and the length of time covered by a payment for water). The code numbers for these time units should be the same throughout the entire questionnaire; in the draft modules presented in this book "day" is always coded as "3," "week" is always coded as "4," and so on for other units of time.

Another example concerns the migration module, the transfers received page of the transfers and other nonlabor income module, and the transfer payments page of the consumption module. All have questions about where the migrant, donor, or recipient lives. The coding scheme that categorizes this information, whether it is the type of place (capital city, other urban area, rural area, or overseas) or the name of the place, should be uniform. Likewise, several modules include questions about the relationship between two individuals. It is usually a good idea for these questions to use the same codes that are used in the household roster module to indicate the relationship of each household member to the head of the household.

A particularly important task is to coordinate the coding of items in the consumption expenditure module with items in the price questionnaire. As explained in Chapter 2, price data are needed to generate regional and temporal price indices that enable comparison of real expenditures of households interviewed in different places and at different times. This is done by matching the prices collected in the price questionnaire with the consumption expenditure information gathered in the consumption module. If the items are not well matched, this task becomes more difficult, and the resulting price indices will be less accurate. In general, the goal should be a one-to-one correspondence between the items listed in the consumption module and the prices collected in the price questionnaire. For example, if questions are asked on two or three varieties of rice or wheat in the consumption module, a price for each variety should be collected in the price questionnaire.

This should be relatively simple to do for almost all food items. Nonfood items are more difficult. It is usually not possible to obtain prices for durable goods because they often come in many varieties (for example, there are many kinds of bicycles or televisions). However, for nondurable items, prices can be obtained for well-defined examples. For example, there are many kinds of shirts, but if a specific widely purchased type of shirt can be defined, data on that type of shirt can be collected in the price questionnaire and used as an indicator of prices for all kinds of shirts. See Chapter 13 for a detailed discussion of the price questionnaire, including a list of suggested food and nonfood items to include in it.

Choosing the Order of the Modules in the Household Questionnaire

A final and very important question to address is the order of the modules in the household questionnaire.[3] It is natural and convenient to arrange the modules in the order that they will be administered, so the key issue here is the order in which the modules will be administered and how this affects the physical design of the questionnaire.

To put this issue in context, consider the traditional fieldwork plan for a full LSMS survey. Each field team works in its assigned primary sampling units (communities) twice. The first time a team arrives in a primary sampling unit, it works there for about one week. The first half of the questionnaire, most of


which usually consists of the individual-specific modules, is completed for each household. In addition, a short module is administered that asks which household members are best able to answer questions concerning the specific household-level modules (agriculture, household enterprises, consumption, and savings) that will be filled out when the team returns to the primary sampling unit about two weeks later. Figure 3.1 provides an example of such a module.

The field team works in a different primary sampling unit during the following week, while the data in the half-completed questionnaires from the first primary sampling unit are entered into a computer by a data entry operator (who does not travel with the team) using a data entry program. The data entry program checks the first half of the questionnaires for a wide range of errors and inconsistencies. (This is discussed more fully in Grosh and Muñoz 1996.) The team returns to the first primary sampling unit in the third week, administers the rest of the questionnaire (which mainly consists of household-level modules), and resolves any problems or inconsistencies found by the data entry program when the data from the first half of the questionnaire were entered.

In several recent LSMS surveys, two different procedures have been used in the fieldwork stage. One procedure is that the data entry operator travels with the field team. This option has become feasible with the advent of small laptop computers that can be powered by batteries, vehicle cigarette lighters, or solar panels. This allows the whole questionnaire to be filled out and checked using the data entry program during a single trip to the primary sampling unit. In addition, the second half of the questionnaire can be checked by the data entry program almost immediately, so that interviewers can return to the sampled households to resolve any problems detected by the program.

The other procedure, used when a scaled-down LSMS survey is being implemented, is to complete all of the interviews in a single trip to the primary sampling unit and sometimes even in single visits to each household. This procedure will have a serious disadvantage if the data entry operator does not travel with the team, because none of the data can be checked in time to return to the households to resolve problems detected by the data entry program. If the data entry operator travels with the team, there is little difference between this procedure and the former procedure, except that a full LSMS questionnaire will require more visits to each household during the sole trip to the primary sampling unit.

Given these different possible fieldwork plans, there are several basic principles about how to order the modules in the household questionnaire. The first principle is that any modules on topics that respondents might consider sensitive should be put at the end of the questionnaire. This gives the interviewer time to develop a rapport with the household members, which should increase the probability that they will answer questions on sensitive issues, and do so truthfully. It also means that if the respondent breaks off the interview in response to a sensitive question, only the data from that last module or modules are lost. Finally, by this point in the interview, any interested onlookers, such as family members and neighbors, may have wandered away, making it possible to administer the more sensitive portions of the questionnaire with greater privacy. Education, housing, migration, and in some cases health[4] are usually good topics with which to open the interview, because people generally do not mind talking about these topics. In contrast, fertility, savings, credit, and transfers and other nonlabor income are among the most sensitive topics in the household questionnaire.

A second principle concerns bounded recall periods. In past LSMS surveys in which the interviewer made two visits two weeks apart to each household, some parts of the questionnaire used bounded recall periods; in other words, questions were asked such as "How much has your household spent on rice since my last visit?" As explained in Chapter 5, using bounded recall periods can increase the accuracy of the respondents' answers. Obviously, if bounded recall periods are used in certain modules, these modules must be administered in a second visit to the household and thus be included in the second half of the questionnaire. The two modules in Volume 3 that explicitly use bounded recall periods are those on consumption (Chapter 5) and household enterprises (Chapter 18).[5]

A third consideration is the selection of respondents. As explained above, several modules (including the consumption, agriculture, household enterprises, savings, housing, and environmental modules) collect much or all of their data at the household level, which means that the questions are answered by the household member most knowledgeable about that topic. With the exception of the housing module, these


FIGURE 3.1: MODULE FOR CHOOSING RESPONDENTS TO BE INTERVIEWED IN THE SECOND HALF OF THE QUESTIONNAIRE

RESPONDENTS FOR SECOND HALF OF QUESTIONNAIRE

RESPONDENT: THE PERSON BEST INFORMED OF THE ACTIVITIES OF THE HOUSEHOLD MEMBERS
FULL NAME OF THE RESPONDENT: ________ ID CODE: ____

1. Who shops for the food for your household?
   NAME: ________ ID CODE: ____

2. Who in your household knows most about the non-food expenses of the members of your household?
   NAME: ________ ID CODE: ____

3. Who in your household knows most about the miscellaneous income and transfers received from other households?
   NAME: ________ ID CODE: ____

4. Who in your household knows most about the savings in your household?
   NAME: ________ ID CODE: ____

5. During the past 12 months has any member of your household participated in agricultural production, forestry, or raising livestock?
   YES ... 1
   NO .... 2 (>> 8)

6. Who is the person who knows most about all the agricultural and livestock activities of the members of your household?
   NAME: ________ ID CODE: ____

7. In addition to this person, who else in your household manages plots of land owned or rented in by the household? Who is responsible for plots that are rented out by the household?
   NAME: ________ ID CODE: ____

8. Over the past 12 months, has anyone in your household operated any non-agricultural enterprise that produces goods or services (for example, artisan, metalworking, tailoring, repair work, and processing and selling your outputs from your own crops, if done regularly) or has anyone in your household owned a shop or operated a trading business?
   YES ... 1
   NO .... 2 (>> NEXT MODULE)

9. What kind of enterprises does your household operate?
   PROBE TO DETERMINE INDUSTRIAL SECTOR IN WHICH ENTERPRISE OPERATES.
   For each enterprise: ENTERPRISE ID, FULL WRITTEN DESCRIPTION, CODE

10. Who is most informed about and/or in charge of day-to-day operations of the enterprise?
   For each enterprise: NAME, ID CODE
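The skip pattern in Figure 3.1 gives a compact way to illustrate the chapter's advice on estimating a questionnaire's effective length by counting how many households follow each path and how many questions each path contains. The sketch below is a minimal illustration, not the method used in Table 18.5: the function names and the household shares are hypothetical, the two filter answers (questions 5 and 8) are assumed to be independent, and each question is counted once even if it is repeated for several people or enterprises.

```python
from itertools import product

# Questions 1-5 and 8 are put to every household in Figure 3.1.
ALWAYS_ASKED = 6
# Questions 6 and 7 are asked only if question 5 is answered YES.
AGRICULTURE_FOLLOW_UPS = 2
# Questions 9 and 10 are asked only if question 8 is answered YES.
ENTERPRISE_FOLLOW_UPS = 2


def path_lengths(share_agriculture, share_enterprise):
    """Map each skip-pattern path to (share of households, questions asked).

    Keys are (farms, has_enterprise) pairs. The shares are hypothetical
    inputs, and the two answers are assumed independent for illustration.
    """
    paths = {}
    for farms, enterprise in product((True, False), repeat=2):
        share = ((share_agriculture if farms else 1 - share_agriculture)
                 * (share_enterprise if enterprise else 1 - share_enterprise))
        questions = (ALWAYS_ASKED
                     + (AGRICULTURE_FOLLOW_UPS if farms else 0)
                     + (ENTERPRISE_FOLLOW_UPS if enterprise else 0))
        paths[(farms, enterprise)] = (share, questions)
    return paths


def expected_questions(paths):
    """Average number of questions asked per household across all paths."""
    return sum(share * questions for share, questions in paths.values())


if __name__ == "__main__":
    paths = path_lengths(share_agriculture=0.6, share_enterprise=0.3)
    for (farms, enterprise), (share, questions) in paths.items():
        print(f"farms={farms}, enterprise={enterprise}: "
              f"share={share:.2f}, questions={questions}")
    print(f"expected questions per household: {expected_questions(paths):.2f}")
```

The same tabulation can be extended to a whole questionnaire by listing every filter question and its follow-ups, which shows why a simple count of pages or questions overstates the length experienced by most households.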


modules are quite lengthy. Thus for each of these modules it is usually best for the interviewer to ask which member would be the most appropriate respondent during the first visit to the household, and then make an appointment to interview that person at a later, more convenient time. In the traditional two-visit fieldwork plan, this implies that these modules, except perhaps housing, should be administered during the second visit and thus should be located in the second half of the questionnaire. However, if the team travels only once to the primary sampling unit, it is still feasible to make appointments for later in the day or for another day, which gives survey designers more flexibility in deciding where to place these modules in the questionnaire.

A fourth principle relates to the logistics of data entry. The individual-level modules include many more questions for which strict range and consistency checks can be built into the data entry program than do the modules on consumption, agriculture, and household enterprises.[6] If the whole questionnaire is completed using the traditional LSMS fieldwork plan (two visits two weeks apart), all the individual-specific modules except the credit module should be administered during the first interview. (The credit module is probably too sensitive to be administered during the first interview.) This will allow the survey team to enter the data from these modules and to detect any apparent errors or inconsistencies that could then be resolved in the second interview. If the data entry operator travels around with the interview team, the data from the interviews can be checked in a matter of hours; thus where these modules appear in the order of the household questionnaire becomes less important for the purposes of data entry.

Given these principles, and some common sense, more specific advice can be given. Each household questionnaire should have the metadata module at the very front, since much of the information that module collects (such as whether the interviewer successfully located the household, the date of interview, and the language in which the interview was conducted) becomes apparent at the very beginning of the interview. The next module should be the household roster; this must be completed before any other module because it determines who is and who is not a household member, and thus determines the people to whom all the other modules will apply. However, at least one page of the household roster, the one with the names of all the household members, is usually placed further back in the questionnaire so that the names on that page can be seen during the administration of all individual-level modules. Thus the physical placement of this page will not reflect the time during the interview when it is filled in. (For details see the discussion on the fold-out roster page in the fourth section of this chapter.)

After the household roster, it is useful to fill out the form on selecting household respondents shown in Figure 3.1; this form can be administered to the same person who answered the household roster questions (usually the head of household or the person most knowledgeable about other household members). It is useful to collect this information early because it can be used to save household members' time by interviewing them sequentially using "mini-interviews." That is, after the interviewer has administered the form that identifies the relevant respondents for the household-level modules, he or she should administer all of the modules that are clearly individual-specific (except the credit and fertility modules) to each household member, finishing all such modules with one member before interviewing another member. These are the education, health, employment, migration, and time use modules. Some household members will not need to be interviewed further, and thus their mini-interviews will consist of the interviewer administering only these modules. In contrast, other household members will also be the respondents for some of the household-level modules. For example, the respondent for the housing module can also provide answers for the questions in that module as part of his or her mini-interview. Using this method, the interviewer can obtain all of the information needed from each individual in a way that minimizes the use of respondents' time; once a respondent finishes the mini-interview he or she can leave or start some activity without further interruption.

Within this group of individual-level modules, those on education and migration should be administered first since the information they collect is not very sensitive. Some employment information can be sensitive, particularly questions concerning wages, so this should be one of the last of the individual-level modules to be completed, if not the very last. If the short health module is used, it can be put near the front. However, if the standard or expanded version is used, it should be placed toward the end of the individual-


CHAPTER 3 DESIGNING MODULES AND ASSEMBLING THEM INTO SURVEY QUESTIONNAIRES

specific modules because of the sensitive nature of some of the questions in this version (see endnote 4).

Which modules should go near the end of the questionnaire? Because the three most sensitive modules are those that collect information on savings, credit, and transfers and other nonlabor income, these three modules should probably be put at the end of the questionnaire. Another potentially sensitive topic is fertility. In countries in which fertility is particularly sensitive, it should come immediately before the savings, credit, and transfers and other nonlabor income modules.

Where should the other modules go? If the traditional "two visits two weeks apart" interview system is used, the consumption and household enterprise modules should be in the second half of the questionnaire since these modules often use a bounded recall period, namely the time since the interviewer's previous visit. Modules that are long and also need to be administered to specific respondents (the consumption, agriculture, household enterprise, and environment modules) should also be completed in the second visit. Finally, as discussed above, the housing module can be administered in the first interview because it is unlikely to contain any sensitive questions. If the two-visit system is not used, these modules can be put anywhere between the individual-specific modules and the more sensitive modules.

Finally, the goal of saving respondents' time by conducting mini-interviews with each respondent (who can leave after his or her mini-interview is finished) is complicated by the fact that the household enterprise, agriculture, miscellaneous income, and credit modules consist of a mixture of household-level and individual-level questions. For example, in the standard and expanded versions of the agriculture module, individual household members are asked whether they have worked on specific plots of land. However, these questions cannot be asked until several other questions have been asked about the different plots of land owned and rented in by the household members, and such questions would be awkward to ask in a form as simple as the one shown in Figure 3.1.

The best way to resolve this problem will depend on which modules and which versions of these modules are included in the household questionnaire, so it is difficult to provide general advice. However, one way to reduce the time burden on household members is to identify all of the people who still need to be interviewed after the "mini-interviews" are completed, as this will allow people to leave if they do not need to be interviewed further. Continuing the agriculture example, note that the form in Figure 3.1 identifies all of the household members who either manage or work on a plot of land. Household members who do not fit this description and who are not needed to complete any other household-level module can leave after their "mini-interview" is finished.

This completes discussion of the fourth step of integrating the draft modules and combining them into a complete set of questionnaires. The primary focus has been on the household questionnaire, since the community questionnaire is much smaller. (See Chapter 13 for a detailed discussion of the community questionnaire.) Designers of prospective surveys can consult the questionnaires used in previous LSMS surveys by downloading them from the LSMS website, http://worldbank.org/lsms/lsmshome.html.

Translating and Field Testing the Draft Questionnaires

After the draft modules have been combined into a complete set of household, community, and price questionnaires, they need to be translated and field tested.7 The field test is particularly important because it is the last check on the design of the questionnaires before the survey is implemented.

Translation

It may be necessary to translate the questionnaires for three reasons, each of which has different implications for the design of the survey. The most common and most important reason is that respondents may speak a range of different languages. In many countries more than one language is spoken. In these countries quality control requires that a separate questionnaire be produced for each of the major languages spoken in the country, with every question written out verbatim.

Scott and others (1988) demonstrated how this procedure greatly increases the accuracy of the data collected. They conducted an experiment designed to measure interviewer errors when the interviewer had to translate each question during the interview. For example, the interviewer may have had to use a questionnaire written in English to conduct an interview in Tagalog or Cebuano or a questionnaire written in


MARGARET GROSH, PAUL GLEWWE, AND JUAN MUNOZ

French to conduct an interview in Baoule or Dioula. The interviewers' error rates were two to four times higher when they translated questions during the interview than when they used questionnaires already written in the languages used by the respondents.

While the final versions of the questionnaires must be translated from the national (official) language to produce verbatim questionnaires in the other languages used in the country, the preliminary drafts of the questionnaire can be developed using only the national language. Ideally, the version of the questionnaire to be used for the field test should be translated into each of the languages that will have a final written version of the questionnaire. In practice, field tests are often done using only oral translations of the national language version of the questionnaire. Thus the wording in the local language interviews during the field test may not correspond exactly to the wording that will be used in the written translations of the final questionnaire. While this is an imperfect way to proceed, it is often a reasonable tradeoff given the high costs, in both time and in money, of field testing the questionnaires in each language.

After the final version of the household questionnaire has been translated into another language, the translation needs to be carefully checked. The best way to do this is to use "back translation." That is, after the questionnaire is translated from the language in which it was developed into the languages in which it will be administered, someone should translate the versions in those languages back into the original language. After this "back translation" has been accomplished, the two versions in the first language should be checked. Where there is a discrepancy in wording or meaning between the two versions, the translation should be carefully checked. A person or group of people familiar with the purpose of the questions should do the first translation. The back translation should be done by someone who was not intimately involved in designing the questionnaire. Any ambiguities and errors must be noted and corrected in the translated version rather than being "fixed" only in the back translation version.

Most previous LSMS questionnaires were printed only in the national languages of the countries studied, so multilingual interviewers had to be employed to conduct interviews in the most commonly used local languages. Occasionally a few key questions or phrases were translated into the local languages and written down in the interviewers' manual. In the case of the least common languages, local interpreters were used when none of the interviewers spoke the language. In this respect, while previous LSMS surveys have conformed to normal survey practice, they have not reached the cutting edge of quality control as defined by the World Fertility Surveys. The guidelines used in those surveys require that questionnaires be prepared in all languages used by more than 10 percent of the sample and that a minimum of 80 percent of the sample be interviewed using questionnaires written in the respondents' native language.

Future LSMS and similar multitopic surveys should make greater efforts to translate the household questionnaire into local languages. When preparing these translations, the questionnaire should always be worded in the way that the language is commonly spoken, using relatively simple terms and avoiding academic or formal language. The gap between the spoken and written languages and the difficulty of striking a balance between simplicity and precision may be greater in local languages, especially ones that are not commonly used for reading and writing. The translators should therefore be especially careful to try to find an appropriate balance.

Two examples illustrate the kind of problems that can occur. The question "¿Estuvo enferma en las últimas cuatro semanas?" literally asks, in Spanish, whether the respondent was sick in the past four weeks. However, in spoken Spanish in Chile it could be understood as a polite euphemism for asking whether a woman has had a menstrual period in the last four weeks. An even more difficult problem in wording was revealed in the field test in Nepal. Apparently the most natural Nepali phrasing for "Have you been ill?" is closer to "Have you been to the doctor?" The change in meaning from what was intended appeared in the field test several times when respondents answered "No, I couldn't afford to go," clearly an inappropriate response to the question "Have you been ill?"

The second reason why the questionnaires may need to be translated is that sometimes the international experts working on the survey design team do not speak the national language well enough to design the questionnaires in that language. This happened in the case of the Vietnam LSMS questionnaires, which were developed jointly in English and Vietnamese. In contrast, the LSMS questionnaires used in Latin American countries have been drafted only in Spanish by teams of


local and international experts who are fluent in that language. When translation is a necessary part of the development of the questionnaires, each draft of the questionnaire must be translated, which may require a substantial amount of money and can also increase the time needed for designing the questionnaires.

The third and final reason for translating questionnaires is to produce a questionnaire in one or more of the major international languages (English, Spanish, or French) in order to encourage the international research community to use these data in their policy analysis. Such translations need not be done until after the final questionnaire is developed, and back translations are not needed.

Field Testing

After draft versions of the household, community, and price questionnaires have been assembled and (if necessary) translated, they must be tested in the field. The field test is one of the most critical steps of the survey design process. The goal is to ensure that the questionnaires are capable of collecting the information that they are intended to collect. A field test should address the adequacy of the draft questionnaires at three levels:

* The Questionnaire as a Whole. Is the full range of required information collected? Is the information collected in different parts of the questionnaire consistent? Are any variables unintentionally double-counted?
* Individual Modules. Does the module collect the intended information? Have all major activities been accounted for? Are all major living arrangements, agricultural activities, and sources of in-kind and cash income accounted for? Are some questions missing? Are some questions redundant or irrelevant?
* Individual Questions. Is the wording clear? Do any questions allow for ambiguous responses? Are there multiple interpretations? Have all responses been anticipated and coded?

It is important for a field test to include households from all major socioeconomic groups. For example, a sample should include: rural and urban households; individuals employed in the formal sector, in the informal sector, and in agriculture; and farmers in each main agroecological region, in each production scheme (independent farming, renting, sharecropping, and cooperative farming), and so forth. The households should not be selected at random. Instead, different types should intentionally be included so that all of the various situations likely to be found during the survey are observed during the field test.

Experience with LSMS surveys has shown that field tests should be conducted using at least 100 households. To get enough responses for each module of the questionnaire, it may be necessary to visit additional households to conduct partial interviews in which only those modules that apply to a relatively small number of households are administered. For example, the original 100 households may not include enough pregnant women or people who have been ill in the month preceding the interview to determine whether the fertility and health modules, respectively, are well designed. In such cases survey designers should find additional households that contain pregnant women or ill people and have interviewers administer only the fertility or health module to those households.8 A field test usually takes about one month to complete: about one week for interviewer training, two to three weeks of fieldwork (interviewing), and one or two weeks to discuss the findings and finalize the questionnaires. More time is required if the final questionnaires are to be produced in more than one language, because each version of the questionnaire should be field tested.

While the full field test should cover 100 or more households, much can also be learned from preliminary smaller tests. A general rule of thumb is that about half of the problems will show up in the first 10 households interviewed. In one recent field test, international experts wrote six pages of comments about a single module after interviews were completed for only three households. Such small-scale preliminary field tests are often particularly appropriate for new or difficult modules. Yet survey designers must understand that these are precursors to a full-size field test of the whole questionnaire, and not a substitute.

The personnel involved in a field test should include the survey design team, a few experienced interviewers or field supervisors, and a few of the people consulted by the survey design team, including both policymakers and research analysts. It may also be helpful to include people with experience working on past LSMS or similar multitopic surveys. All of the participants should divide into a small number of teams, each of which includes at least one person with each kind of expertise.
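The sample-composition advice for field tests can be checked mechanically while the field test is being planned. The sketch below is only an illustration under stated assumptions: the function name `field_test_gaps`, the record layout, and the per-module threshold `MIN_PER_MODULE` are hypothetical, and only the figure of 100 households comes from the text.

```python
from collections import Counter

MIN_HOUSEHOLDS = 100   # "at least 100 households" (from the text)
MIN_PER_MODULE = 20    # hypothetical threshold; the text gives no number


def field_test_gaps(households, required_groups, modules):
    """Describe what a purposively chosen field-test sample still lacks.

    households: list of dicts with keys
        "groups"  - set of socioeconomic groups the household represents
        "modules" - set of modules that apply to the household
    """
    group_counts = Counter(g for h in households for g in h["groups"])
    module_counts = Counter(m for h in households for m in h["modules"])
    return {
        # How many more full interviews are needed to reach the minimum.
        "households_short": max(0, MIN_HOUSEHOLDS - len(households)),
        # Groups that no sampled household represents.
        "missing_groups": sorted(
            g for g in required_groups if group_counts[g] == 0
        ),
        # Modules with too few respondents, and how many partial
        # interviews (that module only) would close the gap.
        "modules_needing_partial_interviews": {
            m: MIN_PER_MODULE - module_counts[m]
            for m in modules
            if module_counts[m] < MIN_PER_MODULE
        },
    }
```

A report with a nonzero entry under `modules_needing_partial_interviews` corresponds to the situation described above, in which extra households are visited and only the fertility or health module is administered.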


There should only be a few teams involved in the field test, usually around three or four. Mechanisms should be set up to enable the teams to contact each other during the field test so that they can compare notes on the problems they encounter and the solutions they have tried. A good way to set up such mechanisms is to have all of the teams working together for the first few days, perhaps in the capital city. This means that the teams will be in contact with each other every evening during the period when the first and often biggest flaws in the draft questionnaire are uncovered. In some cases the team members can agree on modifications to the questionnaire during the field test itself, which allows these modifications to be field tested.

Each interview during the field test should include, at minimum, the respondent, the interviewer, and an analyst or senior survey specialist. During the field test it is acceptable for the analyst or survey specialist to interrupt the interview tactfully in order to refine the wording of a question or the responses coded for it. Of course, in the actual survey the interviews should be conducted in private, and the interviewers should adhere to the wording of the questionnaire.

The interviewers used in the field test should be drawn from the experienced staff of the statistical agency. They should be good interviewers, familiar with basic interviewing practices and able to distinguish between problems caused by deficiencies in the questionnaire and problems caused by their lack of familiarity with the questionnaire. The interviewers' training should focus on the purpose of the survey and the structure and format of the questionnaire. One week of training is usually sufficient, followed by two or three weeks of household interviews.

Survey planners should set aside 1-2 weeks immediately after the field test to review the field test results and debate how to modify the questionnaire in light of those results. The group involved in the field test should go through the questionnaires, module by module, and discuss any problems that arose. At this stage, the team should bear in mind that the length of time required for each interview will fall dramatically when the interviewers are well trained and have become familiar with the questionnaire. As mentioned above, the typical field test interview will be at least twice as long as the average interview in the actual survey.

The data from the field test should not be entered in the computer or examined for any analytical purposes, because in most cases the sample is both nonrandom and very small. However, the questionnaires from the field test can be used to check the performance of the data entry program.

The personal participation of all senior staff (including analysts) is fundamental for both the field test and its evaluation. The following anecdote illustrates this point. In one country, before the field test a manager in the statistics office asserted that collecting information on family assets would be impossible because respondents would fear that the information would be used for tax purposes. The module was included in the field test, and no unusual difficulties were encountered. But the manager who opposed the module did not witness the field test, and some of those who did participate in the field test did not participate in the module's evaluation. Despite the successful field experience, the module was removed from the questionnaire, largely because key decisionmakers did not fully participate in the survey design process.

Many small changes are generally made to questionnaires as a result of field testing, including changes in the wording of some questions, in questionnaire format, and in answer codes. If either the questionnaire's structure or the way in which certain variables are measured is changed substantially, all of the parts of the questionnaire that have been so modified must be tested again. This can delay the survey, but one way to reduce the probability of such a delay is to begin the field test with two or more versions of the most difficult, contentious, or important modules in the questionnaire. If one version clearly works the best, there is no need to do another field test because that version has already been field tested.

Ideally, the household, community, and price questionnaires should all be field tested at the same time. This allows the survey design team to evaluate all of the questionnaires together, taking into account the possibility that changes in one questionnaire may have implications for the design of the others. Simultaneous testing of the three questionnaires can also reduce travel costs since, like the household questionnaires, the community and price questionnaires should be tested in a variety of locations.

Regrettably, in several past LSMS surveys the survey teams neglected to field test the community and price questionnaires, concentrating solely on the household questionnaire. The community and price modules were tested late and haphazardly or, in some cases, not


tested at all. It is probably not coincidental that the users of the data from many previous LSMS surveys have often had more complaints about the community and price data than about the household data. If there is not enough staff time to test all three questionnaires at once, it is important to ensure that separate, rigorous field tests are done of the community and price questionnaires.

The health and education modules discussed in this chapter often include detailed facility questionnaires (in other words, school or health clinic questionnaires), which can be very complex (see Chapters 7 and 8 for details). It is essential to field test these facility questionnaires. During the field test the survey team should be sure to visit each type of facility covered by the facility questionnaire. For example, field testing a health facility questionnaire should involve visits to public health posts, public clinics, private doctors' offices, public hospitals, and private hospitals in both urban and rural areas. Similarly, field testing a school questionnaire should involve visits to public and private schools, primary and secondary schools, and schools in urban and rural areas. Since field testing a facility questionnaire is a major undertaking in its own right, it is probably best to conduct such a field test separately from the field tests of the other questionnaires.

Rules for Formatting Survey Questionnaires

The formatting of survey questionnaires is not a separate step in the overall survey design process. Rather, it influences how the third, fourth, and fifth steps are carried out. Good questionnaire formatting can make a tremendous difference in the quality of the data collected. This section discusses formatting in detail, making very specific recommendations about how questionnaires should be formatted.9

There is, of course, more than one way to format household survey questionnaires. Most of the benefits of good formatting come from selecting a formatting convention and following that convention consistently, rather than choosing the "best" convention from among several possible options. For example, in LSMS questionnaires uppercase and lowercase letters are used to distinguish words spoken aloud during the interview from instructions to the interviewer, but this could be done in other ways, such as using different colors or different fonts. Once a convention is selected, it is extremely important to use it consistently throughout the whole questionnaire. The convention chosen should be the one that is clearest and most likely to minimize the possibility of errors. The draft questionnaires presented in this book follow the formatting conventions explained in this section, which have been used frequently in past LSMS surveys, with successful results.

Questionnaire format is important because a good format minimizes potential interviewer and data entry errors, which improves the accuracy of the data and reduces the time needed to check the data before making them available to data analysts. The objectives underlying a given survey can occasionally have implications for formatting, so some aspects of formatting will vary from country to country. Even so, almost all of what has been learned about questionnaire format in previous LSMS surveys will be applicable to new surveys. Thus the formatting guidelines presented in this section are recommended for all LSMS and similar multitopic surveys, and for other surveys as well.

Identifiers

Every person or object for which data are collected in a survey must be uniquely identified. This usually requires two or three separate codes. The first code identifies the household. The second code identifies the person or object of interest, such as an individual household member, a household business, or a plot of land. Sometimes there is a third code, which applies, for example, to all children ever born to each woman in the household or to the assets of each business operated by the household.

Whenever possible, the identification codes for the second or third levels of observation should be preprinted on the questionnaire pages to which they pertain. For example, the individual identification code for each household member should be printed on all pages that collect data on individual household members. This ensures that the codes cannot be omitted and avoids any errors that would occur if the interviewer were to write down the wrong codes. An example of these codes appears in the far left column of Figure 3.2, which presents the short version of the education module.

The importance of adequate identifiers is so obvious that it is hard to believe mistakes can be made, but they can. In one health survey the questionnaire consisted of two sheets of paper stapled together. One contained information on the household, while the other contained information on individuals. In order


FIGURE 3.2: ILLUSTRATION OF INDIVIDUAL IDENTIFICATION AND SKIP CODES (EDUCATION MODULE SHORT VERSION)

[Grid with one row per household member, identification codes preprinted in the far left column, and one column per question. Yes/no questions are coded YES 1, NO 2; skip instructions (for example, to NEXT PERSON) follow some answer codes.]

1. Have you ever attended school?
2. Are you currently enrolled in school?
3. What is the highest grade you have completed in school? (codes for grades)
4. What is the highest diploma you have attained? (codes for diplomas)
5. Were you enrolled in school during the past 12 months?
6. In what grade are you currently enrolled in school? (codes for grades)
7. What is the highest diploma you have attained so far? (codes for diplomas)
8. Is the school you are currently enrolled in public or private? (PUBLIC 1, PRIVATE SECULAR 2, PRIVATE RELIGIOUS 3)
9. How much has your household spent on your education in the last 12 months for:
   A. Tuition and other required fees?
   B. Parent Association fees?
   C. Uniforms and other clothing?
   D. Textbooks?
   E. Other educational materials (exercise books, pens, etc.)?
   F. Meals, transportation and/or lodging?
   G. Other expenses (extra classes, optional fees)?
10. Have you ever repeated a grade of school?
11. How many times have you repeated a grade of school? (NUMBER OF REPEATED GRADES)
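The two-level identification scheme described under "Identifiers" (a household code plus a person code) can be sketched as a composite key. This is a minimal illustration, not the LSMS data entry program: the field names `hh_id`, `person_id`, and `region`, and the function `link_individuals`, are hypothetical.

```python
def link_individuals(households, individuals):
    """Attach household data to individual records using the household code.

    Returns (linked, rejected). A record is rejected when its household code
    is missing or unknown, when its person code is missing, or when the
    (household code, person code) pair has already been seen.
    """
    by_code = {h["hh_id"]: h for h in households}
    seen = set()
    linked, rejected = [], []
    for person in individuals:
        hh_id = person.get("hh_id")
        person_id = person.get("person_id")
        key = (hh_id, person_id)  # composite key: must be unique
        household = by_code.get(hh_id)
        if household is None or person_id is None or key in seen:
            rejected.append(person)
            continue
        seen.add(key)
        # Copy one household-level variable onto the individual record.
        linked.append({**person, "region": household["region"]})
    return linked, rejected
```

An individual record that carries no household code ends up in `rejected`, because nothing on the record says which household it belongs to.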


to facilitate data entry, the two pages of the questionnaire were separated. Unfortunately, the household identifier was not put on the page for individuals, making it impossible to link the two parts of the survey with each other after the data were entered.

Questionnaire Layout

The LSMS questionnaires are designed so that only one copy of the questionnaire is needed for each household. In contrast, some surveys use one household questionnaire and a separate set of individual questionnaires. This requires that household identification codes be copied perfectly onto all of the individual questionnaires. While perfection is always sought, it is rarely achieved, and separate questionnaires create the risk of improper matching. This is illustrated in the case of the Russian Longitudinal Monitoring Survey. Although care was taken to ensure accurate coding and matching, many errors were introduced. For the first round of the survey, which was held in the summer of 1992, there were 3 percent fewer individual questionnaires than had been expected given the number of household members identified in the household questionnaires. By the summer of 1993, in the third round of the survey, this discrepancy had grown to about 9.5 percent.

Putting all of the information into a single household questionnaire implies the need for a grid of some kind whenever there are two or more of a particular unit of analysis in a household. For example, a household often includes several people, may have several plots of land, and may grow several different crops. The grid typically used in LSMS surveys has questions arranged across the top and units of observation (people, plots, or crops) down the side; in other words, each question is a column and each unit of observation is a row. An example of this is shown in Figure 3.2; note that the identification codes for the units of observation (household members) are printed on the left side of the grid page.

Sometimes the interviewer must fill in the code in the first column, as in Question 2 of Figure 3.8 (which is discussed below), but this practice should be minimized to reduce the possibility of introducing errors when writing down such codes. In the grids for individuals, the lines can be differentiated by alternating shaded and unshaded blocks (as in the draft modules in Volume 3 of this book) or by using a different color for each row or block of rows. This helps an interviewer record the information on the correct line.

Exceptionally large households sometimes have so many members that there are not enough lines in the grids for all household members. In these cases a second copy of the household questionnaire will be required, and care must be taken to ensure that the right household and individual numbers are used. As explained in Chapter 4, a coding scheme is needed to distinguish between the first and second copies of the questionnaire filled out for large households. For example, the individual numbers in the second copy should be changed to start with 13 instead of 1 (assuming that the first questionnaire has room for 12 household members). This is a reasonable approach for large households, but it also introduces a potential source of error; survey designers should set the format of the grids to accommodate as many individuals as is practical. Previous LSMS questionnaires have typically had space for 12-15 individuals.

In cases where the unit of analysis is such that there is only one observation per household (for example, one dwelling per household), the questions pertaining to that unit can be arranged in a single column down the page. One problem with a single column of questions is that much of the page is left blank. To save paper, two or more columns may be put on one page, as long as it is clear that there is no horizontal relationship among the questions in the different columns. An example of this format is provided in Figure 3.3, which shows the first page of Part C of the standard housing module.

Fold-Out Roster Page

The household roster page of the household questionnaire is printed so that it extends to the left of the pages that pertain to individuals in the household. Most importantly, the names of each individual member of the household on the roster page are visible when filling out the other individual-specific pages of the household questionnaire. This has been done four different ways in LSMS surveys, as illustrated by Figure 3.4.

In the method shown in Format 1, the sheets in front of the roster are shorter than the cover, the roster, and the sheets that follow the roster. The most common method is shown in Format 2. The roster sheet is folded out to extend beyond the body of the questionnaire and its covers. In Formats 1 and 2 the roster page is placed behind all of the pages that pertain to individuals, so that the names on the household


FIGURE 3.3: ILLUSTRATION OF PRECODING (PART C OF HOUSING MODULE)

1. Is this dwelling owned by a member of your household?
   YES 1
   NO 2 (>>11)

2. How did your household obtain this dwelling?
   PRIVATIZED 1
   PURCHASED FROM A PRIVATE PERSON 2
   NEWLY BUILT 3
   COOPERATIVE ARRANGEMENT 4
   SWAPPED 5 (>>6)
   INHERITED 6 (>>6)
   OTHER 7 (>>6)

3. How much did you pay for the unit?

4. If you make installment payments for your dwelling, what is the amount of the installment?
   WRITE ZERO IF THE HOUSEHOLD DOES NOT MAKE INSTALLMENT PAYMENTS
   AMOUNT (UNITS OF CURRENCY)
   TIME UNIT

5. In what year do you expect to make your last installment payment?
   YEAR

6. Do you have legal title to the land or any document that shows ownership?
   YES 1
   NO 2

7. Do you have legal title to the dwelling or any document that shows ownership?
   YES 1
   NO 2

8. What type of title is it?
   FULL LEGAL TITLE, REGISTERED 1
   LEGAL TITLE, UNREGISTERED 2
   PURCHASE RECEIPT 3
   OTHER 4

9. Which person holds the title or document to this dwelling?
   WRITE ID CODE OF THIS PERSON FROM THE ROSTER
   1ST ID CODE:
   2ND ID CODE:

10. Could you sell this dwelling if you wanted to?
    YES 1
    NO 2 (>>13)

11. If you sold this dwelling today how much would you receive for it?
    AMOUNT (UNITS OF CURRENCY)

12. Estimate, please, the amount of money you could receive as rent if you let this dwelling to another person?
    AMOUNT (UNITS OF CURRENCY)
    TIME UNIT
    (>> QUESTION 28)

13. Do you rent this dwelling for goods, services or cash?
    YES 1
    NO 2 (>>26)

TIME UNITS: DAY 3, WEEK 4, FORTNIGHT 5, MONTH 6, QUARTER 7, HALF YEAR 8, YEAR 9

FIGURE 3.4: ROSTER ARRANGEMENTS

[Figure: diagrams of four ways of binding the household roster into the questionnaire, labeled Format 1, Format 2, Format 3, and Format 4.]

In all formats, choose binding to make the questionnaire open flat. ID codes appear on the roster and on each individual page. Lines on the roster must be aligned with the page in the questionnaire.


roster page are visible whenever individual questions are asked.

An innovation in the Kagera Health and Development Survey in Tanzania was to make the roster page a removable card, as shown in Format 3. This was useful because the survey was designed to be administered four times (every six months for two years) to the same households. The roster card was inserted into a pocket in the back of the questionnaire in the first round of the survey. When the second round started, the roster card was removed from the first questionnaire and placed in the back pocket of the second questionnaire. In this way, individuals retained the same identification codes in each round. A few follow-up questions guaranteed that individuals who moved in or out of the household or were born or died between rounds were counted appropriately. In four rounds of interviews conducted over two years for 800 households, none of the roster cards was lost. However, this success may reflect the intensive supervision carried out by the organizers of that survey, as well as the relatively small sample size. This option should probably not be used in situations with significant quality control problems.

Format 4 was used in the Tunisia questionnaire. In this format each page is oriented as "portrait" (a vertical page) rather than as "landscape" (a horizontal page) and is spiral-bound so that it opens flat. Each questionnaire page then consists of the full 11 x 17 inches of the two-page spread. The roster folds out to the left.

In all four cases the line for each individual member of the household on the roster page is aligned with the corresponding lines on the other individual-specific pages of the household questionnaire.

A final point regarding the fold-out roster page is that it may be useful to have more than one such page per questionnaire. A fold-out roster will be useful whenever there are several pages of questions for the same level of analysis and especially when there are many rows on the grid. For example, in the agricultural module one might make rosters for crops grown or for plots of land. A fold-out roster page would be particularly helpful for the household enterprise module.

Precoding

All of the potential responses to almost all of the questions in the questionnaire should be given code numbers so that the interviewer records only code numbers, as opposed to words or phrases, on the questionnaire. In most cases these response codes should be printed directly in the box where the question appears, or next to the question if there is no box around it. Where the list of codes is lengthy and applies to several questions, it should be placed in a special box on the border of each page for which it is needed. Alternatively, if a list is very long it can be printed on the back of the preceding page (making it visible when the interviewer fills out the page in question). An example of a box on a border of a page is the time unit box shown at the bottom of Figure 3.3.

In past LSMS surveys fewer than a dozen questions on the household questionnaire have required the interviewer to write down words or phrases that are given codes, usually by someone else, after the interview. Precoding allows the data to be entered into the computer straight from the completed questionnaire, thus eliminating the time-consuming and error-prone step of transcribing codes onto data entry sheets.

Precoding requires that response codes be clear, simple, and mutually exclusive, that they exhaust all likely answers, that respondents will not all provide the same response, and that none of the codes apply to only a handful of respondents. Designing adequate response codes requires extensive knowledge of the phenomenon being studied as well as careful field testing. A standard technique to ensure that the codes are mutually exclusive is to add a qualifier where more than one answer could apply, for example by asking "What was the main reason for dropping out of school?" Other standard qualifiers are "What was the first (or last, or principal) reason for ... ?" Alternatively, spaces can be provided for multiple responses, with an instruction to code all responses (up to, say, the two or three most important) that apply.

A standard technique to ensure that codes encompass all possible answers is to add an "other (specify ___)" code to questions for which an explicit enumeration of each possible response is impossible or inconvenient. In past LSMS surveys the detailed answers were almost never coded, so analysts usually put all "other" responses into a single residual category. One way to increase the probability that the information recorded in the "other (specify ___)" answers will be used at a later date is to enter it (as text) into the computer, without assigning any codes to the responses. This allows analysts to code any answers that were not precoded in the data released to the public. It also allows the designers of subsequent surveys in the same country to review the answers that were written in (especially in cases in which a significant percentage of the responses were coded "other") and to modify their coding lists accordingly. In particular, if most of the "other" responses fall into a single, well-defined category, this category should have its own code in any subsequent survey.

There is, of course, a limit to the kind of material that can be covered even by well-designed, precoded questions. But this limit may be less of a disadvantage than it first appears. Because most analyses of LSMS surveys use sophisticated quantitative techniques, it is difficult for these analysts to make use of the exploratory, qualitative information gathered in open-ended questions. So even if such questions were asked, the answers to these questions would not be used much in analysis. If it is clear that some analysts do need extensive information of an exploratory, qualitative nature, the designers of a prospective survey may wish to adopt a different data collection instrument or even a new research technique. See Chapter 25 for a thorough discussion of qualitative data collection alternatives.

Verbatim Questions with Simple Answers

All questions in LSMS surveys are written out in their entirety and are meant to be read out verbatim by the interviewer. This is done to ensure that questions are asked in a uniform way, since different wordings may elicit different responses. For example, the answers that a respondent gives to "Can you read?" and to "Can you read a newspaper or magazine?" will probably be somewhat different. Other changes may subtly alter the time period referred to, as in the change from "Have you worked since you were married?" to "Did you work after you were married?" Scott and others (1988) discuss some rigorous field experiments that compared such verbatim questionnaires with questionnaires in which the topic was given for each question but the exact wording was not. When the questionnaire that did not contain the exact wording was used, 7 to 20 times more errors occurred than when the verbatim questionnaire was used.

When choosing the wording of questions, it is important to use terms that reflect the language as it is commonly spoken. Using language that is too formal or academic will make the interview stilted and unnatural. For example, "Did you spend any time doing housework?" followed, if necessary, by "...such as cooking, mending, doing laundry, or cleaning?" is better than "Did you spend any time engaged in domestic labor, for example, preparing food, repairing clothes, cleaning clothes, or cleaning house?" It is not always easy to find terms that are simple, short, and yet precise, but that should always be the goal.

In most cases the interviewer reads the question aloud and marks the questionnaire with the code for the answer given by the respondent. For example, for the question, "Are you currently enrolled in school?" the interviewer writes down a 1 for "yes" or a 2 for "no." For some questions the response categories are part of the question, as in "Is the school you are currently enrolled in public or private?" There may also be a few questions for which the wording of respondents' answers may vary even though the meaning is the same. The best thing to do in such cases is to have the interviewer read out all of the response categories. For example, in Question 4 of Figure 3.5, after reading "Compared to your health one year ago, would you say that your health is..." the interviewer should read the responses "much better now," "somewhat better now," "about the same," "somewhat worse," and "much worse." If necessary, the interviewer can explain the differences between the various response categories. However, the reading out of response categories should be used as little as possible, because respondents may not listen to the full list before answering, which can lead to errors.

The answers to the questions must be kept simple. This means that additional filter questions are often needed. Adding enough filter questions to ensure simple answers can make the number of questions and skips seem high. Many survey designers are tempted to shorten the questionnaire or simplify the skip pattern in a way that results in complex questions and answers. This should be avoided since it will confuse some respondents and is unlikely to save time.

Survey designers yielded to this temptation in the agricultural module of the 1987-88 Ghana LSMS survey. In that module the following question was asked: "Do you or the members of your household have the right to sell all or part of their land to someone else if they wish?" The precoded answers (which were not read out to the respondents) were "Yes," "No," "Only after consulting family members who are not household members," and "Only after consulting the chief or


FIGURE 3.5: ILLUSTRATION OF CASE CONVENTIONS (HEALTH MODULE STANDARD VERSION)

1. IS THIS PERSON ANSWERING FOR HIMSELF/HERSELF?
   YES ..... 1 (>>3)
   NO ...... 2

2. COPY THE ID CODE OF THE RESPONDENT FROM THE HOUSEHOLD ROSTER
   ID CODE

3. During the last four weeks, how many days of your primary daily activities did you miss due to poor health?
   DAYS

4. Compared with your health one year ago, would you say that your health is:
   [READ OUT ANSWERS TO RESPONDENT]
   Much better now ........ 1
   Somewhat better now .... 2
   About the same ......... 3
   Somewhat worse ......... 4
   Much worse ............. 5

5. CHECK THE AGE IN YEARS OF THIS PERSON
   0-6 ............ 1
   7-14 ........... 2 (>>NEXT SECTION)
   15-39 .......... 3 (>>24)
   40 AND OVER .... 4 (>>11)

6. Did [NAME] experience diarrhea in the last 7 days?
   YES ..... 1
   NO ...... 2 (>>11)

7. Was it mixed with blood?
   YES ..... 1
   NO ...... 2

8. Was it mixed with mucus?
   YES ..... 1
   NO ...... 2

9. Was it a pale liquid?
   YES ..... 1
   NO ...... 2

10. How did you treat it?
    REDUCED FOOD OR LIQUID GIVEN TO CHILD .... 1
    GAVE SPECIAL FOODS TO CHILD .............. 2
    ORAL REHYDRATION THERAPY ................. 3
    OTHER (SPECIFY) .......................... 4
    NO TREATMENT ............................. 5
    1ST, 2ND, 3RD
    >>NEXT SECTION

[One row per household member; ID codes are printed down the left side of the grid.]


the village elders." It is not clear whether the respondents could distinguish between the simple yes answer and the yes answer qualified by the need for consultation. Thus a different formulation might have been better. The question could have been left as is, but with only simple "yes" and "no" codes. Then the interviewer could have put a second question to those who answered "yes," worded as follows: "Do you need to consult with anyone outside the household before selling the land?" The response codes would be "Yes" and "No." Then a third question would be put to those who answered "yes" to the second: "Whom must you consult?" The response codes for this question would be "family member," "village elders," and other appropriate categories. This formulation would have made the questionnaire longer in terms of the number of questions but would probably not have increased the interview time since some sort of probing probably occurred in the Ghana LSMS when the "yes" answer was given. More importantly, keeping questions and answers simple makes the interpretation of the data much clearer.

Skip Codes

Skip codes are used extensively in LSMS questionnaires. Skip codes tell the interviewer which question to proceed to after finishing the current question.

Some skip codes apply only when a particular answer is given. In such cases an arrow and the number of the question to skip to are positioned in parentheses next to or below the individual response to which the code applies. An example is given in Question 2 of Figure 3.2. If the answer to Question 2 is "yes," the interviewer should skip Questions 3, 4, and 5 and proceed to Question 6. If the answer to Question 2 is "no," the interviewer should proceed to Question 3. In Question 1 a similar construction is used, but when the answer is "no" the interviewer is instructed to skip all the remaining questions in the module for this respondent and proceed to interview the next person.

Another kind of skip instruction applies regardless of the response given to the question. When an arrow and a question number or instruction are placed in a box separate from the response codes, the skip instruction contained in the box applies regardless of what answer is given. An example of this is given in Question 10 of Figure 3.5.

There are several advantages to extensive, explicit skip codes. Interviewers do not have to make decisions themselves, nor do they need to remember complicated rules printed in the manual rather than on the questionnaire. This helps ensure that instructions will be followed uniformly. Well-placed skip codes ensure that inapplicable questions are not asked. (Asking inapplicable questions irritates respondents, wastes interview time, and confuses data analysis.) Finally, explicit skip codes imply that a "not applicable" code is almost never used in LSMS questionnaires.

One way to check skip codes is to develop a flow chart of the questions in each module. Flow charts are useful both for checking the logic of the questionnaire and for training interviewers. Figure 3.6 presents a flow chart of a typical health module used in past LSMS surveys (which differs in several notable ways from the health module presented in Volume 3). The proportions of people who answer yes at each branch are recorded based on results from several previous LSMS surveys. The numbers of individuals that would be asked each set of questions are shown on the left, assuming a base of 10,000 individuals in the sample. The flow chart makes it easy to check whether the skip patterns lead people through the module correctly. For example, it is possible to check that the question on health insurance is asked of all household members, not just of those who are ill. Analyzing the whole household in this way gives survey designers a better sense of the likely length of time it will take to complete each interview than does the number of pages or number of questions in the questionnaire, because many questions will be skipped for many individuals. (For further discussion of the length of the questionnaire see the second section of this chapter.)

Case Conventions

Everything that the interviewer should read aloud should be written in lowercase letters. Instructions to the interviewer should always be written in uppercase letters.10 Answer codes should also be written in uppercase, unless they are to be read aloud to the respondent. This makes it easy to include instructions on the questionnaire as opposed to relying on the interviewers' memory of the manual or of instructions that they were given during their training. In Figure 3.5 instructions to the interviewer are printed in Questions 1, 2, 4, and 5. These are in uppercase, as are the answer codes in Questions 1 and 5. (The answer codes in Question 4 are in lowercase because they are to be read aloud to the respondent.)
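The two kinds of skip instruction described under Skip Codes can be made concrete with a small routing sketch. The Python below is illustrative only: the `Question` class, its field names, and the `next_question` helper are assumptions introduced here and are not part of any LSMS data entry program. It encodes the Question 2 example from the text (a "yes" skips to Question 6, a "no" falls through to Question 3) and a boxed instruction that applies whatever the answer.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Union

Target = Union[int, str]  # a question number or a label such as "NEXT SECTION"


@dataclass
class Question:
    """Hypothetical representation of one precoded question."""
    number: int
    text: str
    # Skip codes that apply only when a particular answer code is given.
    skip_if: Dict[int, Target] = field(default_factory=dict)
    # Skip instruction that applies regardless of the answer (the boxed form).
    skip_always: Optional[Target] = None


def next_question(q: Question, answer_code: int) -> Target:
    """Return where the interviewer goes after recording answer_code."""
    if answer_code in q.skip_if:       # answer-specific skip code
        return q.skip_if[answer_code]
    if q.skip_always is not None:      # unconditional skip instruction
        return q.skip_always
    return q.number + 1                # default: proceed to the next question


YES, NO = 1, 2

# Question 2 as described in the text: "yes" skips Questions 3, 4, and 5.
q2 = Question(2, "(wording not shown in this section)", skip_if={YES: 6})
# A question whose boxed instruction applies to every answer.
q10 = Question(10, "How did you treat it?", skip_always="NEXT SECTION")

print(next_question(q2, YES))   # 6
print(next_question(q2, NO))    # 3
print(next_question(q10, 1))    # NEXT SECTION
```

Keeping the skip targets in the question definition, rather than in separate interviewer logic, mirrors the point made in the text: the routing rule travels with the question, so nothing has to be remembered from the manual.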


FIGURE 3.6: FLOW CHART OF HEALTH MODULE USED IN PREVIOUS LSMS SURVEYS

Individuals asked   Questions

10,000              1   Were you ill or injured in the last 4 weeks?
                        YES (10-45%)

1,000-4,500         2   How many days in the last 4 weeks did you have to stop doing your usual activities?
                    3   Was anyone consulted?
                        YES (40-80%) / NO

400-3,600           4   Who was consulted?
                    5   Where did you go for that consultation?
                    6   What was the cost of the consultation?
                    7   What means of travel did you use?
                    8   How long did it take to get to the place of consultation?
                    9   How much did you spend on travel costs?
                    10  How long did you have to wait?
                    11  Did you have to stay overnight at the clinic or hospital?
                        YES (5-8%)

20-288              12  How many nights did you stay?
                    13  How much did you have to pay?

1,000-4,500         14  Did you buy any medicines for this illness or injury?
                        YES (60-90%) / NO

600-4,050           15  How much did you spend on medicines?

10,000              16  Do you have health insurance?

                        NEXT PERSON
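The counts on the left of Figure 3.6 follow from applying each branch's range of "yes" proportions to the number of individuals who reach that branch. The short Python sketch below reproduces that arithmetic and then totals the question-asking load; the function name `branch`, the variable names, and the use of whole-number percentages are choices made here for illustration only.

```python
def branch(reach, pct_yes):
    """Apply a (low, high) percentage range to a (low, high) count range."""
    (lo_n, hi_n), (lo_p, hi_p) = reach, pct_yes
    return (lo_n * lo_p // 100, hi_n * hi_p // 100)


base = (10_000, 10_000)                  # everyone is asked Questions 1 and 16
ill = branch(base, (10, 45))             # asked Questions 2, 3, and 14
consulted = branch(ill, (40, 80))        # asked Questions 4 to 11
overnight = branch(consulted, (5, 8))    # asked Questions 12 and 13
medicines = branch(ill, (60, 90))        # asked Question 15

for label, rng in [("ill or injured", ill), ("consulted someone", consulted),
                   ("stayed overnight", overnight), ("bought medicines", medicines)]:
    print(f"{label}: {rng[0]:,} to {rng[1]:,}")

# Total number of question-askings: each block of questions times the
# number of individuals who reach it.
blocks = [(base, 2), (ill, 3), (consulted, 8), (overnight, 2), (medicines, 1)]
low_total = sum(rng[0] * n for rng, n in blocks)
high_total = sum(rng[1] * n for rng, n in blocks)
print(f"questions asked in total: {low_total:,} to {high_total:,}")
print(f"if nobody skipped anything: {16 * base[0]:,}")
```

Comparing the totals with the 160,000 askings that 16 questions for 10,000 individuals would imply without skips shows why the flow chart is a better guide to interview length than a simple count of questions.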

Enumeration of Lists

There are two methods of gathering information about long lists of items. A typical LSMS questionnaire may use either method depending on particular circumstances.

Consider the case in which one expects that a large proportion of the items on the list will apply to most households. For each item on this list a line is put in the grid and the name and code number of the item is printed on the questionnaire. This approach is used in the consumption module, as shown in Figure 3.7. Although several dozen items are included, it is expected that most households will have consumed many of them. The first question is "Has your household consumed [FOOD] during the past 12 months?" The interviewer first goes down the whole list asking this "yes or no" question. Then the interviewer returns to the first item that was consumed and asks all the follow-up questions for that item before proceeding to the next item. The complete


FIGURE 3.7: ILLUSTRATION OF CLOSE-ENDED LIST (PART B OF CONSUMPTION MODULE)

In the following questions, I want to ask about all purchases made for your household, regardless of which person made them.

1. Has your household consumed [FOOD] during the past 12 months? Please exclude from your answer any [ITEM] purchased for processing or resale in a household enterprise.
   PUT A CHECK (/) IN THE APPROPRIATE BOX FOR EACH FOOD ITEM. IF THE ANSWER TO Q.1 IS YES, ASK Q.2-13.
   NO / YES / CODE

PURCHASES SINCE LAST VISIT

2. Have the members of your household bought any [FOOD] since my last visit, that is since [DAY/DATE]?
   YES ..... 1
   NO ...... 2 (>>5)
3. How much did you pay in total?
   CURRENCY
4. How much did you buy?
   AMT, UNIT

PURCHASES TYPICAL MONTH

5. How many months in the past 12 months did your household purchase [FOOD]?
   IF NONE, WRITE ZERO
   MONTHS
6. How much do you usually spend on [FOOD] in one of the months that you purchase [FOOD]?
   CURRENCY

HOME PRODUCTION

7. How many months in the past 12 months did your household consume [FOOD] that you grew or produced at home?
   IF NONE, WRITE ZERO (>>10)
   MONTHS
8. How much did you consume in a typical month?
   AMT, UNIT
9. What was the value of the [FOOD] you consumed in a typical month from your own production?
   CURRENCY

GIFTS

10. What is the total value of the [FOOD] consumed that you received as a gift over the past 12 months?
    IF NONE, WRITE ZERO
    CURRENCY

UNIT CODES (USE CODES WITH STAR WHENEVER POSSIBLE):
   KILO* ...... 1
   GRAM* ...... 2
   POUND* ..... 3
   OUNCE* ..... 4
   LITER* ..... 5
   CUP* ....... 6
   PINT* ...... 7
   QUART* ..... 8
   GALLON* .... 9
   BUNCH ...... 10
   PECK ....... 11
   BUSHEL ..... 12
   TIN ........ 13
   PIECES ..... 14
   DOZENS ..... 15
   BOTTLES .... 16

Food items and codes (one row per item):
   Wheat (grain) ................. 1
   Wheat (flour or maida) ........ 2
   Maize (flour or grain) ........ 3
   Jawar/Bajra ................... 4
   Fine rice (basmati) ........... 5
   Coarse rice ................... 6
   Other grains/cereals .......... 7
   Gram .......................... 8
   Dal ........................... 9
   Groundnuts .................... 10
   Liquid vegetable oils (dalda) . 11
   Ghee, Desi ghee ............... 12
   Fresh milk .................... 13
   Yogurt and Lassi .............. 14
   Milk Powder ................... 15
   Baby Formula .................. 16
   Sugar (refined) ............... 17


enumeration of items consumed is done before asking the follow-up questions so that respondents will not be tempted to say that they have not consumed something in order to shorten the interview by avoiding the follow-up questions. This temptation is prevented because the enumeration is done before the respondent finds out that there will be follow-up questions on each item enumerated.

A second approach is useful when it is expected that only a few of many possible items will pertain to any one household. Consider Figure 3.8. The large grid on the right contains lines for several durable goods owned by the household, but these are not precoded. Rather, the respondent is asked, using the small grid on the left, whether the household owns certain durable goods. In this example 12 durable goods are considered, but in some cases 20-30 goods have been listed. Most households own only a few durable goods. For all durable goods owned by the household, the interviewer lists the name and the code number in the large grid to the right in Figure 3.8, and asks a series of questions about each good. If the household owns two or more of the same durable good, one line is filled out for each good owned.

Probe Questions

There are some kinds of information that respondents may accidentally not provide. In such cases the questionnaire includes instructions to the interviewer to ask further "probing" questions on the subject. An example of this is Question 9 of Figure 3.1. Suggested probing questions are usually included in the interviewers' manual and occasionally included in the questionnaire itself. Probe questions are often used to ensure that all items in a respondent-determined list have been reported to the interviewer, or to ensure that the respondent's answer is properly classified by the interviewer. Interviewers are also asked to probe for answers to questions that ask "how much ... ?" (This kind of question is commonly found in the consumption, agriculture, and household enterprise modules.) Interviewers should be thoroughly trained to ensure that they fully understand what information to probe for, and how to do so.

Because the interviewer is trained and instructed to probe for information, there should be very few answers of "don't know" and thus very few codes for "don't know" in the questionnaire. In the exceptional case when even a sound interviewing technique does not produce an answer, the interviewer is instructed (in the interviewers' manual and in training) to write "DK" (for "don't know") in the space reserved for an answer code. Such responses are given a special non-numeric code in the data entry program. The end result for analysis is much the same as having a "don't know" code for each question. However, this system has the advantage that it discourages interviewers from accepting "don't know" answers too easily, which they may be tempted to do to speed up the interview. Moreover, the special non-numeric code for such responses is glaringly obvious when the supervisor reviews the questionnaire.

Letting Respondents Choose Units

For many questions that involve payments or quantities, respondents are allowed to give their answers in whatever units they find most convenient. Examples of this are found in Figure 3.3. In Questions 4 and 12 the code of the time unit in which the respondent replies is placed in the box marked "time unit." The codes are provided in a box at the bottom of the page.

Allowing the respondent to select the time unit means that transactions are expressed in the units in which they normally occur, which may differ from household to household or from person to person. This avoids inaccuracies in conversion. For example, a person paid $510 per week will respond precisely if allowed to respond on a per-week basis. If forced to respond in terms of dollars per month, the respondent might round the figure down to $500 for ease of multiplication and calculate each month as being equivalent to four weeks. The annualized figure would thus become $24,000 instead of the $26,520 that would be reported if the respondent were allowed to report on a per-week basis and the data analyst then calculated the respondent's annual rate from that answer.

Of course, data analysis is always slightly more complicated when respondents' answers must be converted in order to arrive at annualized figures, but, since a computer can easily do this, this disadvantage is trivial. However, it is very important to ensure that, where necessary, the questionnaire explicitly asks the respondent how many times per year the payments are made. For example, a worker who reports a daily wage rate may be employed only intermittently. In this case, the questionnaire should ask the respondent how many weeks or months he or she has worked during the preceding 12 months (see Chapter 9 for details).
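The conversion that the analyst performs afterwards can be sketched in a few lines. The Python below is a hedged illustration rather than an actual LSMS routine: the time unit codes are those printed in the box at the bottom of Figure 3.3, but the periods-per-year values, the function name `annualize`, and the `periods_paid` parameter are assumptions made here. It reproduces the $26,520 and $24,000 figures from the example above.

```python
# Time unit codes as printed in the box at the bottom of Figure 3.3.
# The periods-per-year values are assumptions made for this sketch
# (for example 52 weeks, which matches the $26,520 figure in the text).
PERIODS_PER_YEAR = {
    3: 365,  # DAY
    4: 52,   # WEEK
    5: 26,   # FORTNIGHT
    6: 12,   # MONTH
    7: 4,    # QUARTER
    8: 2,    # HALF YEAR
    9: 1,    # YEAR
}


def annualize(amount, time_unit_code, periods_paid=None):
    """Convert an amount reported per time unit into an annual figure.

    periods_paid is the number of periods in the past 12 months in which
    the payment was actually made; if omitted, payment in every period
    of the year is assumed.
    """
    if time_unit_code not in PERIODS_PER_YEAR:
        raise ValueError(f"unknown time unit code: {time_unit_code}")
    if periods_paid is None:
        periods_paid = PERIODS_PER_YEAR[time_unit_code]
    return amount * periods_paid


weekly = annualize(510, 4)         # answer given per week
monthly = annualize(500 * 4, 6)    # rounded to $500 and four weeks per month
print(weekly, monthly, weekly - monthly)   # 26520 24000 2520
```

The optional `periods_paid` argument corresponds to the follow-up question recommended above: for a worker who reports a daily wage but worked only intermittently, the analyst would pass the number of days actually worked instead of assuming 365.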


FIGURE 3.8: ILLUSTRATION OF OPEN-ENDED LIST (PART E OF CONSUMPTION MODULE)

Small grid (left)

1. Does your household own any of the [...]
   DETERMINE WHICH DURABLES THE HOUSEHOLD OWNS BY ASKING Q.1. FOR EACH DURABLE OWNED, WRITE THE DESCRIPTION AND CODE IN THE SPACE PROVIDED UNDER Q.2, AND PROCEED TO ASK Q.3-7 FOR EACH ITEM.

   ITEM                        CODE   YES   NO
   Stove                       201
   Refrigerator                202
   Washing machine             203
   Sewing/knitting machine     204
   Fan                         205
   Television                  206
   Video player                207
   Tape player/CD player       208
   Camera, video camera        209
   Bicycle                     210
   Motorcycle/scooter          211
   Car or truck                212

Large grid (right), lines numbered 1 to 16

2. LIST ALL THE ITEMS OWNED BY THE HOUSEHOLD, THEN PROCEED TO ASK Q.3-7.
   DESCRIPTION, CODE
3. How many years ago did you acquire this [ITEM]?
   YEARS
4. Did you purchase it or receive it as a gift or payment for services?
   PURCHASE ............ 1
   GIFT OR PAYMENT ..... 2 (>>6)
5. How much did you pay for it?
   CURRENCY
6. How much was it worth when you received it?
   CURRENCY
7. If you wanted to sell this [ITEM] today, how much would you receive?
   CURRENCY


Table 3.1 Units of Quantity Used in Ghana, 1987-88

Unit            Code
Pound           *1
Kilogram        *2
Ton             *3
Minibag         *4
Maxibag         [illegible]
Bowl            8
American tin    9
Tree            10
Stick           11
Bundle          12
Barrel          13
Liter           *14
Gallon          [illegible]
Beer bottle     *16
Bunch           17
Nut             [illegible]
Fruit           19
Log             20
Box             21
[illegible]     22

Note: It is preferable to use the unit codes marked by (*) whenever possible.
Source: Ghana LSMS survey (1987-88).

A particular place in the questionnaire where it is useful to allow respondents to choose their own units is in the "quantities produced" questions in the agriculture module. In Ghana, for example, respondents were allowed to give answers in 22 different kinds of units (Table 3.1). A serious problem for analysts who want to convert these different quantities to a single standard unit is that only about half of the units used in this example were standardized, and some of the standardized units were local terms (such as minibag and maxibag) that would be unknown to anyone not familiar with farming in Ghana.11 In the case of standardized local units, the survey team should ensure that such terms are defined (in terms of international standardized units) in a basic information document that includes all of the information that data users will need to analyze the data.

Respondent Codes
It is sometimes useful to know who is answering a certain section of the questionnaire. In general, each household member should answer for himself or herself, but this is not always possible. For example, a household member may be away during the entire week when the field team is working in his or her community. To indicate who is providing the information, a question can be inserted that asks the interviewer whether the person is answering for himself or herself. If someone else is providing the information, the interviewer should [write in] the identification code of that person. An example of this is shown in Figure 3.5. This information is useful because a proxy respondent may give less accurate information [illegible] activity in question. For example, one household member may not know the exact salary of another. Therefore, some analysts may wish to identify any possible biases introduced by the proxy respondents or to omit their responses altogether.

Cardstock Covers
LSMS questionnaires are usually printed with cardstock covers (covers made of very thin cardboard similar to the cardboard used in file folders). In some past surveys it was decided not to use these covers because of their added cost, but this led to the problem that the front and back pages of the questionnaire occasionally came loose. Since the front page usually carries the key household identifier information and the back page sometimes contains the household roster, any such loss is likely to render the rest of the questionnaire useless. Thus cardstock covers are well worth their cost.

Identifying Sections
The household questionnaire contained in a prototypical full LSMS survey can be very bulky. The Nepal questionnaire, for example, had 70 pages. Therefore, it is useful to devise some ways to make it easy for readers to find their way around in these questionnaires. A few ideas are listed here, and there may well be more. First, it is useful to have page numbers on each page and a table of contents listing the sections (and their page numbers) at the beginning or end of the household questionnaire. Second, some inexpensive graphic techniques can be used to divide the questionnaire into smaller parts. For example, some sections of the questionnaire can be printed on different colored paper or in different colored inks, or sheets of colored paper can be inserted between major portions of the questionnaire. It is also possible to print short, dark bars at the edge of each page, with the placement of these bars on the page being the same within each module but lower down (if on the vertical edge) or further to the right (if on the bottom edge) in each


MARGARET GROSH, PAUL GLEWWE, AND JUAN MUÑOZ

successive module. Using just one or a few of these techniques will be sufficient. The questionnaire should not become too colorful or complicated.

Legibility and Spacing
There is an art to laying out the grids for a questionnaire. The lettering must be large enough to read, which is sometimes difficult to accomplish in the compact structure of the grid. Legibility is especially important, as interviews often take place under poor lighting conditions, such as outdoors at dusk or after dark in homes dimly lit with lanterns, oil lamps, or candles. The good print quality now available from laser printers helps, but poor legibility is an ongoing complaint among interviewers.

There must also be enough white (empty) space in the layout of the questionnaire. Whenever the answer will be coded later, a generous space should be allowed to write out fully the information required, such as the person's name, the name of the school attended by the respondent, and the respondent's occupation. In other places, judicious use of white space makes the questionnaire easier to read or less confusing than a questionnaire in which every page is crowded with print.

In fact, in this book, the fonts used in Volume 3 are probably too small. This is necessary for Volume 3 to show how typical questionnaire pages should appear. In an actual questionnaire, the size of the pages usually will be somewhat larger than the pages in this book, and the font size should be increased by a similar proportion.

Software for the Questionnaire Layout
Many of the most common word processing and graphics software packages are adequate for producing questionnaire page layouts, and LSMS questionnaires have been produced using several different software packages. The modules in Volume 3 (the electronic versions of which are available to readers in the CD-ROM enclosed in the volume) were produced in Microsoft Excel, for two reasons. First, Excel is widely available. Second, spreadsheet software is better than word processing software at dealing with the long horizontal format of groups of questions on a single topic that are spread across several pages. Regardless of the software used, it is now much simpler and cheaper to make revisions between the various drafts of the modules than it was in the days when graphic artists had to draw each page by hand. The computerized approach also simplifies translations, as the verbal parts can be overwritten in the local language, leaving intact the skip codes, response codes, and general format.

Appendix 3.1 Common Gaps and Overlaps
This appendix provides a list of the modules that should be checked for gaps and overlaps with respect to the information that they collect. This list is not meant to be exhaustive because household questionnaires of different configurations will be subject to different risks of gaps and overlaps and because there are so many possibilities that it is difficult to list them all. However, some of the most common and important issues are mentioned here. Many more are mentioned in the relevant chapters of this book.

Consumption
Consumption information usually comes from several different modules of the household questionnaire. See the discussion in Chapter 5 on the different components of consumption and the modules in which those components are typically collected.

Income
Information on household income is gathered in the following modules: employment, household enterprise, agriculture, and transfers and other nonlabor income. It is sometimes also collected in the housing and savings modules. It is important to review the questionnaire as a whole to make sure that it accounts for all possible sources of income. In particular, questions about income from any rental property could be placed in the transfers and other nonlabor income module, on the assets page of the savings module, or, if the income comes from renting out a portion of the household's primary dwelling, in the housing module.

Wealth
Information on household assets is collected in several modules. The housing module gathers information on the household's principal residence. The household enterprise module gathers information on equipment and land associated with each household enterprise, and on the stocks of inputs and outputs used in each enterprise. The agricultural module gathers information on land, equipment, and livestock. The savings module collects information on other properties and financial assets, and the durable goods submodule of


the consumption module collects data on the household's durable goods. Finally, the credit module gathers information on the household's liabilities.

Credit
Credit information is collected in several modules, including the modules for housing, consumption, savings, agriculture, and household businesses. There is also a separate credit module. Chapter 21 introduces the credit module and clarifies gaps and overlaps in credit.

Mortgages
Information on any mortgages that a household might hold can be gathered either in the credit module or in the housing, agriculture, and household enterprise modules.

Employment
Analysts often need to know how many hours each household member works in the household's enterprises and in its agricultural activities as well as hours worked in employment outside the household. In previous LSMS surveys, all of this information was collected in the employment module. As explained in Chapter 9 (and Chapters 18 and 19), this book recommends collecting data on household members' days and hours of work in household enterprises and agricultural activities in the household enterprise and agriculture modules, respectively, while continuing to ask about the number of hours worked in wage employment in the employment module. However, some survey designers may decide not to include the household enterprise and agriculture modules. In such cases information on the number of hours spent working on these activities must be collected in the employment module.

Vaccination
If the survey includes a fertility module, questions about vaccination should usually be placed in the fertility module so that this information can be collected not only for children who currently live in the household but also for children who have died or moved to another household. If there is no fertility module or the fertility module does not include all women of childbearing age, vaccination information on children living in the household can be collected in the anthropometry module. Another alternative is to gather this information in the health module, which includes questions on vaccinations in Part C of the standard health questionnaire.

Domestic Housework
Some previous LSMS surveys have collected information on how much time household members spend doing housework (such as cooking, cleaning, and childcare) in the employment module, usually asking only one question. If a time use module is included in a questionnaire, there is no reason to ask questions about housework in the employment module. However, because the time use module is very long, it is unlikely to be used in most LSMS-type multitopic surveys. If the time use module is not included but survey designers want to gather a small amount of information on, for example, the number of hours spent on housework during the previous seven days, one or two questions can be added to the employment module. (See Chapter 9 for further discussion of this issue.)

Notes

The authors would like to express their gratitude to Jere Behrman, Lawrence Haddad, Courtney Harold, John Hoddinott, Alberto Martini, and Raylynn Oliver for comments on an earlier draft.

1. Survey designers occasionally collect redundant information as a cross-check on other data. For example, most previous LSMS surveys have recorded both the age (in years) and the date of birth of each household member. This is done to verify the accuracy of the age variable.

2. This assumes that a two-stage sample is used. In the case of a three-stage sample, the secondary sampling unit is more pertinent. Generally, the penultimate sampling unit is the appropriate unit for collecting community data.

3. Issues concerning the order of the questions within each module are discussed in the topic-specific chapters in Parts 2 and 3 of this book. For a general discussion of ordering questions in household surveys see United Nations (1985) and Frey and Oishi (1995).

4. The short version of the health module presented in Chapter 9 does not ask for particularly sensitive information, but the standard and extended versions ask detailed questions about health status and health behavior (including drinking and smoking) that can be sensitive. If either the standard or the long module is used, health should not be one of the first modules in the questionnaire.

5. The questions in the household enterprise module that refer to "the past 14 days" can be reworded as "since my last visit" if the


second half of the questionnaire is administered two weeks after the interviewer's first visit.

6. For example, the education module asks questions such as "What grade is [..NAME..] enrolled in?" For this question, the range of acceptable values in the data set is precisely defined. Moreover, it is also related to other information such as the degree obtained and the age of the student. (For example, a six-year-old should not be in secondary school.) In the consumption module, however, a wide range of values might be found for a question such as "How much did you spend on rice in the last two weeks?" which implies that fewer consistency checks are possible.

7. This section is a slightly modified version of the discussion on translating and field testing found in Chapter 3 of Grosh and Muñoz (1996).

8. An alternative approach is to stretch the reference periods during the field test. For instance, instead of asking "Have you been ill or injured during the past 30 days?" as in the actual survey, it may be expedient to ask "Have you been ill or injured during the past 12 months?" or "When was the last time you were ill or injured?" This approach will simplify the logistics of finding enough people to try out the module but will not test very precisely whether the respondents find it difficult to recall the information, since the recall period used in the field test will be longer than the period used in the final questionnaire.

9. This section is a slightly modified version of the discussion of questionnaire formatting found in Chapter 3 of Grosh and Muñoz (1996).

10. For languages that do not have uppercase and lowercase, another way should be found to distinguish instructions from questions. It may be possible to use italics, bold, a different font, or a different color. An example of this is the LSMS survey of rural households in northeast China in 1995. Chinese characters do not have uppercase and lowercase, so two different fonts were used.

11. It is not necessary to convert quantities into standard units (for example, to convert bunches into kilos) to calculate farm income, which was the purpose of the agriculture module in the Ghana LSMS. However, as is common with such rich data sets, analysts are using the data for other purposes as well, such as calculating the total quantities of various crops that were produced.

References

Ainsworth, Martha, and Jacques van der Gaag. 1988. Guidelines for Adapting the LSMS Living Standards Questionnaires to Local Conditions. Living Standards Measurement Study Working Paper 26. Washington, D.C.: World Bank.

Babbie, Earl. 1990. Survey Research Methods. Belmont, Cal.: Wadsworth.

Fink, Arlene. 1995. The Survey Handbook. Thousand Oaks, Cal.: Sage Publications.

Fowler, Floyd. 1993. Survey Research Methods. Second ed. Newbury Park, Cal.: Sage Publications.

Frey, James, and Sabine Mertens Oishi. 1995. How to Conduct Interviews by Telephone and in Person. Thousand Oaks, Cal.: Sage Publications.

Grosh, Margaret, and Juan Muñoz. 1996. A Manual for Planning and Implementing the Living Standards Measurement Study Survey. Living Standards Measurement Study Working Paper 126. Washington, D.C.: World Bank.

Oliver, Raylynn. 1997. Model Living Standards Measurement Study Survey Questionnaire for the Countries of the Former Soviet Union. Living Standards Measurement Study Working Paper 130. Washington, D.C.: World Bank.

Scott, Christopher, Martin Vaessen, Sidiki Coulibaly, and Jane Verrall. 1988. "Verbatim Questionnaires Versus Field Translation or Schedules: An Experimental Study." International Statistical Review 56 (3): 259-78.

United Nations. 1985. "Development and Design of Survey Questionnaires." Department of Technical Cooperation for Development, National Household Survey Capability Programme, New York.

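The conversion problem raised around Table 3.1 and in note 11 can be made concrete with a small sketch. It assumes that internationally standardized units have fixed factors, while local units need crop-specific factors documented by the survey team in the basic information document. The function name `to_kilograms`, the treatment of "ton" as a metric ton, and the factors for minibag and maxibag are hypothetical and are not taken from the Ghana survey.

```python
# Kilograms per unit for internationally standardized units.
# "ton" is assumed here to mean a metric ton; the source does not say.
STANDARD_KG = {
    "kilogram": 1.0,
    "pound": 0.45359237,
    "ton": 1000.0,
}

# Hypothetical crop-specific factors for local units. In practice these
# would come from the survey's basic information document.
LOCAL_KG = {
    ("maize", "minibag"): 30.0,   # invented value, for illustration only
    ("maize", "maxibag"): 100.0,  # invented value, for illustration only
}


def to_kilograms(crop, unit, quantity):
    """Convert a reported quantity to kilograms.

    Returns None when no conversion factor is documented, so that the
    analyst can see which records cannot be converted instead of
    silently receiving a guessed value.
    """
    if unit in STANDARD_KG:
        return quantity * STANDARD_KG[unit]
    factor = LOCAL_KG.get((crop, unit))
    if factor is None:
        return None
    return quantity * factor


if __name__ == "__main__":
    print(to_kilograms("maize", "minibag", 2))
    print(to_kilograms("cassava", "bunch", 5))
```

Returning `None` for undocumented units, rather than raising an error, is one possible design choice: it lets an analyst count how many records fall outside the documented factors before deciding how to handle them.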

The Power of Survey Design

A User’s Guide for Managing Surveys, Interpreting Results, and Influencing Respondents

GIUSEPPE IAROSSI


Table of Contents

Foreword
Acknowledgments
Abbreviations and Acronyms

Chapter 1. Taking A Closer Look at Survey Implementation

Chapter 2. Survey Management: An Overview
    Overall Program Design
    Questionnaire Design, Pilot, and Data Entry Form
    Survey Firm Selection
    The Sample
    Training
    Fieldwork and Data Quality Control

Chapter 3. How Easy It Is to Ask the Wrong Question
    Practical Guidelines in Questionnaire Designs
    Question Wording
    Question Style
    Question Type
    Question Sequence
    Questionnaire Length
    Questionnaire Layout
    Translation
    Pre-Test

Chapter 4. A Practical Approach to Sampling
    Determining the Sample Size in Simple Random Sampling
    Determining the Sample Size in Stratified Sampling
    How to Carry Out Systematic Sampling
    How to Carry Out the Probability Proportional to Size Selection Method
    How to Deal with Population Frame Problems
    Impact of Mergers, Acquisitions, and Separations on Sampling Weights
    Weight Adjustments and Poststratification
    Sampling in Practice: How to Maximize the Sample Representativeness while Minimizing the Survey Cost through the Use of Poststratification

Chapter 5. Respondent’s Psychology and Survey Participation
    Factors Affecting Participation
    Training
    Practical Training Tips
    Securing Participation
    Conducting the Interview

Chapter 6. Why Data Management Is Important
    Coding
    Editing
    Electronic Data Entry
    Cleaning

References

Appendixes
    Appendix 1. Perception Questions in the Investment Climate Survey Core Questionnaire
    Appendix 2. Objective Questions Used for Parametric Estimation of Survey Firm Fixed Effect
    Appendix 3. Parametric Results of Survey Firm Fixed Effects on Objective Questions
    Appendix 4. Table of zα/2 Distribution Corresponding to Different Levels of Confidence α
    Appendix 5. Table of Random Numbers
    Appendix 6. Information Disclosed in Survey Introductions
    Appendix 7. Minimum Fieldwork Log Data

Boxes
    1.1 One Poll, Multiple Interpretations
    2.1 Criteria to Look at When Selecting a Survey Firm
    2.2 Key Actors and Their Functions in a Typical Investment Climate Survey
    2.3 Responsibilities Must be Clearly Identified in the Interview Cycle
    3.1 List of Questionnaire Problems for Pre-Test Expert Review
    4.1 The Sampling Unit in Business Surveys
    4.2 Advising a Mayor
    4.3 Why it is Important to Use Weights with Stratified Sampling
    4.4 Using SAS to Draw Samples
    6.1 How to Assign Questionnaire IDs

Chapter 2

Survey Management: An Overview

Often there is the temptation to skip on [survey] preparation in order to move to the field too rapidly. This temptation should be avoided.
Ghislaine Delaine and others, “The Social Dimensions of Adjustment Integrated Survey”

Careful planning is vital to the timely completion of any project, yet the task of planning and managing a survey is subject to everything from cultural vicissitudes to weather conditions (Warwick and Lininger 1975). Given the endless number of factors (cultural, economic, ethnic, linguistic, political, psychological, sociological, and religious) that influence the implementation of any survey, managing such a project is as much art as science. Hence, the survey manager must have experience in survey implementation and a clear understanding of the objectives of the study.1 As in all projects, the survey manager must plan, organize, lead, and control the development of the survey (Weeks 2003).

Throughout the survey process technical and organizational decisions must blend the theoretically desirable with the practically feasible (Moser and Kalton 1971). Within this realm the survey manager is responsible for the following:

• Preparing the overall survey program;
• Designing the questionnaire and data entry form;
• Conducting the pilot;
• Selecting the survey firm2 and defining the financial arrangements;
• Drawing the sample;
• Training the interviewers; and
• Monitoring the fieldwork and developing data quality control procedures.

1 We assume the survey manager to be a single individual. Although it is possible for a team of staff to take on this role, this is less desirable. Given the functional links among the key steps of any survey, there are obvious externalities that favor a single individual to be the survey manager. Furthermore, a clearly identified and experienced survey manager can ensure that the survey adequately covers policy issues of interest to the data users (Delaine and others 1991).
2 The survey firm is contracted to do the fieldwork and enter the data.

The chronological sequence and overlap of each activity as well as their functional links must be carefully synchronized. After one step is completed, going back will compromise the next step and, thus, either the timely conclusion of the survey, the accuracy of the results, or both. The survey manager is generally assisted in these tasks by a statistician and a data processing coordinator, but the manager remains responsible for overseeing the collection of accurate information in a timely manner and within budget (Delaine and others 1991). A good survey manager has the ability to anticipate possible sources of error (interviewing, wording of questions, editing, and coding) and delays (national or seasonal holidays, weather conditions, religious festivities, or sample frame inaccuracy) (Moser and Kalton 1971).

Overall Program Design

The early stages of a survey should include a careful review of the literature and talks with experts in the country. This helps conceptualize potential problems. Similarly, a review of previous survey work and discussions with local survey practitioners will help determine what approach works best, what hypotheses have been tested, and which question items are best suited for the specific survey (Warwick and Lininger 1975). This stage also includes an assessment of the survey infrastructure, a careful search for potential partners in implementing the fieldwork and sponsoring the survey initiative, and finally the design of plans for data gathering and entry, reports, presentations, and dissemination.3

Questionnaire Design, Pilot, and Data Entry Form

After the research objectives have been identified, the difficult challenge of translating them into a well-conceptualized and methodologically sound questionnaire begins (Warwick and Lininger 1975). In Investment Climate Surveys4 the core5 questionnaire represents the starting point. The development of the questionnaire starts soon after general plans have been drawn and ends just days before the start of the fieldwork. Focus groups can identify concerns and experiences of the target population, as well as evaluate questions and clarify definitions (Gower 1993). The initial questionnaire is usually revised many times.

The pilot test in the field is a critical component of questionnaire design. Similarly, the training sessions for enumerators should be considered the last step of questionnaire design, because they often help identify problems with wording and translation.

As soon as the questionnaire has been finalized, it must be immediately coded and the data entry form developed.6 A variety of data entry software programs are available, some at no charge.7 A well-designed data entry form will have two basic characteristics. First, it will have an interface that is a replica of the paper questionnaire. Second, it will include a number of built-in consistency checks to disallow invalid entries. The development of a data entry form is a delicate and complex process. A number of intricate cross-references and checks must be included, which requires a professional programmer. It remains the survey manager’s task to determine which within- and cross-question consistency checks should be embedded in the form, and how strict they should be.8 The inclusion of too many or too stringent consistency checks will make data entry almost impossible, even when there are errors that can be easily corrected. Conversely, a lax system of consistency checks will defeat the purpose of the data entry form. A delicate balance between these two alternatives must be found.

Once completed, the data entry form must be tested, if possible before the beginning of the fieldwork. Testing is of critical importance, and attempts to shortcut this step could result in delays at later stages of the survey.9 In the World Fertility Survey, more than 80 percent of all errors found at the first check were due to specification errors and programming errors (Rattenbury 1980).

3 It is good practice to address issues of data entry software and coding from the very beginning, although a more detailed discussion and implementation of these issues comes only after the questionnaire is finalized.
4 Productivity and Investment Climate Surveys, or Investment Climate Surveys, in short, are business surveys conducted by the World Bank. These surveys identify key features of the business climate that foster productivity in a way that allows regional and subregional benchmarking (World Bank 2003).
5 The core questionnaire is a set of standard questions implemented across countries to enable international benchmarking. Retrieved on June 13, 2005, from http://www.ifc.org/ifcext/economics.nsf/Content/IC-SurveyMethodology.
6 Coding a questionnaire means assigning a name to each variable in the questionnaire corresponding to each field in the data set.
7 A variety of commercially available software programs (Microsoft Access©, SPSS©, and so on) can be purchased, depending on the desired level of sophistication. Simpler but equally effective data entry programs can be downloaded for free from the U.S. Centers for Disease Control and Prevention (www.cdc.gov/epiinfo/) or the U.S. Census Bureau (www.census.gov/ipc/www/imps/index.html). Additionally, the U.K. Association for Survey Computing (ASC) has links to software that can be used for data capture and the different stages of survey implementation (http://www.asc.org.uk/Register/index.htm).
8 The complexity of the form automatically excludes the use of simple software such as Microsoft Excel©. Excel is data management software and, therefore, not appropriate for this purpose.
9 Form development and testing generally takes two to four weeks.
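The within- and cross-question consistency checks described for the data entry form can be illustrated with a minimal sketch. It is not taken from any data entry package; the field names (`age`, `birth_year`, `grade`), the survey year, and every limit below are assumptions chosen for illustration, and a real form would derive its rules from the finalized questionnaire.

```python
# All names and limits here are hypothetical, for illustration only.
SURVEY_YEAR = 2005
RANGE_CHECKS = {"age": (0, 120), "grade": (1, 12)}
FIRST_SECONDARY_GRADE = 7   # assumed first grade of secondary school
MIN_SECONDARY_AGE = 10      # assumed youngest plausible secondary student


def check_record(record):
    """Return a list of problems found in one data entry record."""
    problems = []

    # Within-question checks: each value must fall in its allowed range.
    for field, (low, high) in RANGE_CHECKS.items():
        value = record.get(field)
        if value is not None and not (low <= value <= high):
            problems.append(f"{field}: {value} is outside {low}-{high}")

    age = record.get("age")
    birth_year = record.get("birth_year")
    grade = record.get("grade")

    # Cross-question check: reported age should agree with the year of
    # birth, with one year of slack for birthdays not yet reached.
    if age is not None and birth_year is not None:
        if abs((SURVEY_YEAR - birth_year) - age) > 1:
            problems.append(f"age {age} disagrees with birth year {birth_year}")

    # Cross-question check: a very young child should not be recorded
    # as enrolled in secondary school.
    if age is not None and grade is not None:
        if grade >= FIRST_SECONDARY_GRADE and age < MIN_SECONDARY_AGE:
            problems.append(f"age {age} is too young for grade {grade}")

    return problems


if __name__ == "__main__":
    print(check_record({"age": 6, "birth_year": 1999, "grade": 8}))
```

The sketch reports problems instead of rejecting the record outright. Whether a given rule should block entry or only raise a warning is exactly the balance between overly stringent and overly lax checks that the survey manager has to decide.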

Survey Firm Selection

Depending on the intricacy of the questionnaire and the complexity of the sample elements, the selection of a survey firm is one of the most difficult and critical tasks. It affects both the timing of the survey and the quality of the data collected. The survey infrastructure is usually difficult to assess in developing countries and an informed selection usually involves evaluating a wide range of factors, from the geographic distribution of local offices to the number of personal computers owned (box 2.1). An experienced survey manager can easily infer the technical ability of a prospective firm (Grosh and Muñoz 1996) from the quality of written documents, such as survey manuals and recently implemented questionnaires, as well as from the complexity of surveys completed over the past two to three years and those planned in the near future.10

10 Opinion polls and market research surveys are much easier to administer than the typical Investment Climate Survey.

Box 2.1

Criteria to Look at When Selecting a Survey Firm

Experience

    Questionnaire
        How difficult is the content?
        How coherent is the content?
        How good is the formatting?
        How much time does the interview last?
        How are sensitive and memory questions addressed?

    Sampling
        What is the unit of observation?
        How difficult is it to interview the respondent?
        How large was the sample?
        Was the sample nationwide?

    Fieldwork
        What was the ratio of supervisors to enumerators?
        How many reinterviews were conducted?
        How good were the supervisor and enumerator manuals?
        What was the nonresponse rate due to refusal?

    Data management
        What kind of data quality assurance did they adopt?
        What type of data entry software did they use?
        How did they organize data editing and checking?

Resources

    Personnel
        How many people are on staff in relevant positions (supervisors, interviewers, data entry, programmers)?
        What is their level of education?
        What is their age range?
        How much experience do they have?
        Do staff who worked in previous complex surveys still work there?

    Equipment
        Do they have offices throughout the country?
        Do they have computer capabilities?
        What software do they use?
        Do they have their own e-mail accounts?

Client orientation
    What is their data access policy?
    What is their reputation?
    What are their business affiliations?

Source: Based on Grosh and Muñoz 1996.

Another important factor to consider in the selection process is the organization of the fieldwork. The collection of high-quality data in a timely manner depends on how well field operations are organized. Coordinating and timing the interactions of tens if not hundreds of people at different levels and stages of the survey becomes a vital and yet complex task. The way the prospective implementing agency deals with staffing, scheduling, and coordinating simultaneous activities should therefore be given the appropriate weight in the selection process (Weeks 2003). A survey in which each individual is clearly identified as a part of a team, in which all members are clear about their responsibilities and accountabilities, and in which a well-organized structure facilitates the flow of information and quickly resolves possible conflicts and doubts will definitely have a positive impact on the timing and quality of the data collection process. Key actors in a typical Investment Climate Survey and their functional relationship are shown in box 2.2.

As in all other steps, the procurement process requires a great deal of attention to details. Even when a highly recommended and seemingly well-qualified agency exists, less noticeable factors should inform the selection process:

• How unexpected problems are anticipated and addressed;
• What steps are taken to ensure quality;
• Which approach is used to handle the expected bias associated with sensitive questions;
• What strategies are adopted to elicit participation; and
• Which characteristics interviewers and supervisors have (in terms of age, education, experience, and occupation).11

The terms of reference (TOR) developed by the survey manager provide guidance on the “technical” requirements of competing proposals. Inadequate TORs have frequently been a source of error in contracting out the fieldwork (Grosh and Muñoz 1996). Thus it is preferable to follow a two-stage strategy. Initially, the TOR should indicate the project objectives and provide a copy of the draft questionnaire as well as a description of the basic minimum data quality requirements. Bidders should be left free to formulate a detailed methodology to achieve the survey objectives. Given the cultural, political, religious, and ethnic characteristics of each country, it is not advisable to apply the same


11 See chapter 5 for a more detailed treatment of the interviewer’s characteristics.


Box 2.2

Key Actors and Their Functions in a Typical Investment Climate Survey

• The survey director generally is the head of the agency in charge of the fieldwork. He or she provides professional leadership, coordinates with the survey manager on organizational and financial issues, and provides support to survey implementation especially through community awareness.

• The survey manager coordinates with the survey director on more technical aspects of the survey work. He or she helps in designing the sample, plans and supervises the field operation procedures, and contributes to the training session. He or she will also oversee the field supervisors and the data manager (Grosh and Muñoz 1996).

• The supervisors assign respondents to interviewers, coordinate their assignments, and ensure that they work efficiently. It is part of the supervisors’ responsibilities to monitor and review the quality of the fieldwork, to conduct unannounced field interviews, and to make call-backs as deemed necessary while personally visiting some respondents. Supervisors must review the quality of completed questionnaires, ensuring that interviewers’ writing is legible and skip patterns are followed. Unreasonable answers must be flagged and returned to the interviewer for correction, if necessary, through an additional visit. Finally, supervisors facilitate the exchange of information between survey manager and interviewers, make sure that all instructions from the central office are relayed to field workers, and ensure that the central office is regularly updated on the progress of data collection (Grosh and Muñoz 1996).

• Interviewers set up appointments with the sampled respondents and conduct the interviews following the rules, techniques, and protocols highlighted during the training sessions and indicated in the survey materials. They re-interview respondents, when necessary, to rectify incorrect or incomplete entries.

• The data entry manager, along with the survey manager, designs the data entry quality control protocol and oversees the development of the data entry form. He or she supervises data entry personnel and liaises with the field manager.

• Data entry staff code and key-punch electronically the questionnaires completed in the field.

[Box Figure 2.2.1. Typical Organizational Structure of Fieldwork. Organization chart showing the survey director, the survey manager, the data manager with data entry personnel, and Supervisors A, B, and C, each with interviewers I1, I2, and I3.]

Source: Author’s creation.

methodology in every country. Thus, for instance, in Indonesia it appears unnecessary to require call-backs given that standard practice calls for each form to be signed and stamped by the respondent. Once a survey firm has been selected, a second more detailed and comprehensive TOR should be agreed on among the parties.

An often-overlooked criterion in the procurement process refers to the potential measurement error associated with each type of implementing agency. The type of agency conducting the fieldwork, whether a government agency or a private survey company, can have a different effect on data accuracy depending on the kind of question asked. Sensitive questions about bribes, for instance, are consistently underreported when the interviewer is a government employee.12 Although the magnitude of the bias varies depending on the specific question, the impact of the underreporting appears to be in the order of 0.3 to 0.6 standard deviations when a government agency is conducting the survey.13 Nonetheless the survey manager should not rush to the conclusion that private survey companies are always to be preferred. As a matter of fact, the same data shows that using government officers as interviewers has a positive effect on data accuracy by reducing measurement errors for nonsensitive questions. The manager’s estimates of sales growth were more accurate14 when government officials conducted the interview. This is not surprising, because statistical officers generally are better trained and more experienced in conducting business interviews. The magnitude of the underreporting (measured in terms of standard deviations) of corruption questions when government officials conduct the interviews appears similar to the magnitude of accounting data inaccuracy when the survey is fielded by a private firm (see figure 2.1).15

12 A more detailed description of this phenomenon is presented in chapter 3, on questionnaire design, in the discussion on sensitive questions and subjective questions.

13 See appendixes 2 and 3 for a description of questions and a complete set of regressions results.

Over the years, the financial resources needed to conduct a firm-level survey in developing countries have varied. Once again a number of country-specific factors apply, each having a different impact on the survey budget: a 7-page questionnaire will be priced differently than a 20-page instrument, travel costs are unlikely to be the same in Brazil and in Eritrea, and survey experts are harder to find and more expensive in Africa than in East Asia.


14 The absolute value of the error was more than 17 percent for a private company and close to 1.5 percent for a government agency.

15 Data accuracy is measured as deviation between the manager’s reported value of sales growth last year and the same values calculated from company books.

Figure 2.1. Who Is Asking What? Reporting Differences When a Government Official, Rather Than a Private Firm, Asks Sensitive Questions

[Bar chart. Vertical axis: standard deviation, from –0.4 to 0.4. Bar labels: overreporting on sales question (private firm) and underreporting on corruption question (government agency). Plotted values: 0.3 and –0.33.]

Source: Author’s calculations.

Household survey experience shows that 70 to 90 percent of the total survey cost goes to field implementation, while personnel and travel represent the two most important cost categories (table 2.1). Particular attention must also be paid to the internal composition of these two items. Determining the appropriate salary levels across different professional categories is always problematic. The survey job requires months of intense work and it is unrealistic to assume that this can be done without appropriate incentives, particularly for the interviewers. Travel costs, including per diems, will also be a source of resentment if not appropriately estimated. This is clearly a country-specific problem. Nonetheless, accurate planning in terms of the estimated number of visits necessary to complete an interview is essential.

Survey managers must use creativity, diplomatic skills, and expertise to find a solution that is tailored to the country characteristics while being fair to all parties (Grosh and Muñoz 1996). An issue that occasionally surfaces is not only the appropriate rate of pay, but also the relative merits of paying interviewers on a piece rate or by the hour. Supporters of piece rate payment point out the strong economic incentive for field staff and the more efficient use of time. Hourly wage advocates criticize the former approach for providing an incentive to prefer quantity over quality and to “fabricate” answers (Warwick and Lininger 1975). A combination of the two approaches might be the best solution. In this case, for each completed questionnaire, a flat rate would be paid, augmented by variable components, mainly related to travel costs and per diem expenses, with a decreasing weight when the number of visits reaches a predetermined limit. It remains in the survey manager’s interest to relate the cost of the survey to the quality of the data collected, and the final rate agreed with the implementing agency should reflect this.
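The combined payment scheme can be made concrete with a small sketch. It is an illustration only: the flat rate, the per-visit allowance, the visit limit, and the reduced weight applied to visits beyond the limit are all hypothetical values chosen for the example, not figures from the text.

```python
def interviewer_pay(visits, flat_rate=20.0, per_visit_allowance=5.0,
                    visit_limit=3, weight_after_limit=0.5):
    """Pay for one completed questionnaire under a mixed scheme.

    All parameter values are hypothetical. The flat rate rewards the
    completed questionnaire; the variable part covers travel and per diem
    for each visit, at a reduced weight for visits beyond the limit.
    """
    if visits < 1:
        raise ValueError("a completed questionnaire needs at least one visit")
    full_weight_visits = min(visits, visit_limit)
    reduced_weight_visits = max(visits - visit_limit, 0)
    variable_part = per_visit_allowance * (
        full_weight_visits + weight_after_limit * reduced_weight_visits
    )
    return flat_rate + variable_part


# Pay grows with the number of visits, but more slowly past the limit.
for n_visits in (1, 3, 5):
    print(n_visits, interviewer_pay(n_visits))
```

With these assumed values, one visit pays 25, three visits pay 35, and five visits pay 40, so additional visits beyond the limit are still reimbursed but carry a weaker incentive.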

The Sample

Soon after the decision to undertake the survey has been reached, anumber of critical decisions must be taken regarding the following:

• The identification of the sample unit;
• The localization of the population list;
• The design of the sampling procedure; and
• The determination of the sample size.

Preparations to draw the sample should start at the earliest possible time given how difficult and time-consuming it is in many developing


Table 2.1

Share of Survey Cost in Household Surveys

Percentage Weight of Accounting Categories

Country        Personnel  Transport  Equipment  Consumables  Other  Sample size
Angola                63         22         10            1      4        6,000
Botswana              79         0a         10            4      7        7,000
Eritrea               64         0a         28            5      3        4,000
Kenya                 62         23          3            5      7        7,000
Lesotho               75          5          6            2     12        7,500
Madagascar            31          7         33           13     16        6,500
Malawi                32         17         24           22      5        6,000
Mozambique            61         12          3           12     11
Somalia               44         18          5            1     33        2,200
South Africa          69         24          2            4      2       30,000
Swaziland             30          4          2            1     63        4,500
Tanzania              78         13          2            1      7        3,000
Zambia                82          5          2            6      5        8,000
Overall               63         15          7            6      9        7,054

Percentage Weight of Survey Activities

Country        Preparation  Implementation  Data processing  Reporting
Angola                   -              84                6         10
Botswana                10              59               22          9
Kenya                    -              94                3          4
Lesotho                  -              73               19          9
Madagascar               0              79                3         18
Malawi                   5              63               16         16
South Africa             1              93                3          3
Swaziland               63              23                8          6
Tanzania                23              72                4          1
Zambia                   0              92                6          1
Overall                  7              81                6          6

Source: Keogh 2003.
Note: Data refers to household surveys. - = Not classified.
a. Amount included in the personnel costs.

countries to identify a reliable sampling frame. At the end of the fieldwork, the estimated weights must be adjusted to account for frame problems and nonresponse.

Training

When everything is ready for the start of the fieldwork, training should take place. No matter how complex the questionnaire is, and given the average interviewer’s quality in developing countries, training remains fundamental to ensure a consistent interpretation and implementation of questions. The survey manager, having extensive experience and a clear understanding of the analytical objective of each question, is the best person to conduct the training. In this process, training manuals are particularly useful, containing detailed information on the general purpose of the survey, instructions on the conduct of the interviews, detailed explanations of the questions, and references to the methodology for recording answers.

Fieldwork and Data Quality Control

The fieldwork is the most time-consuming part of the survey. Although the interview cycle itself must be clearly defined and responsibilities clearly identified (box 2.3), the more complex the questionnaire, the more difficult it is to estimate the exact timing of survey completion.

A host of factors influence the chronological implementation of the survey. Apart from some obvious “objective” features such as the length of the questionnaire, the size and composition of the sample, and the number of interviewers, a host of other intangible factors, some quite subtle, come into play. For example, how well a questionnaire is designed will definitely impact the timing of the interview. The appropriate use of skipping patterns and the clarity of definitions and sentences will not only speed up the interview process but also ensure accurate data. The quality of the interviewers, and more generally of the survey firm, is another factor influencing the timely completion of the survey. Interviewers with an unambiguous understanding of the questions, with experience in similar surveys, and with the ability to establish a clear relationship of trust with the respondents will foster higher cooperation and complete the interviews in a shorter period of time. Similarly, if the fieldwork is thoroughly organized, delays are minimized. The accuracy of the population list is yet another factor. If the list is up to


Box 2.3

Responsibilities Must be Clearly Identified in the Interview Cycle

[Box Figure 2.3.1. Typical Interview Cycle. Flowchart with the following labelled steps: verification of existence of establishment (existing; nonexisting or out of scope; replacement); letter of invitation; follow-up call (acceptance; nonacceptance; additional attempts by interviewer and/or senior staff); appointment; interview completed; questionnaire edited by interviewer; supervisor check (errors and omissions; rectified by interviewer); data entry. Actors labelled: central office, interviewer, survey manager, data manager.]

date, time will not be wasted in locating respondents that relocated or establishments that no longer operate. Last but not least, the predetermined level of response rate considered acceptable will also impact the duration of the survey. A survey with 50 percent of item nonresponse will no doubt be completed faster than a survey with 90 percent of all questions appropriately answered.

The beginning of the fieldwork marks the start of a number of headquarters (HQ) activities coordinated by the survey manager. As soon as interviewers are in the field, the survey manager should start planning for quality control and data cleaning. While the development of a response rate control program is relatively fast, the development of a cleaning program takes longer. The response rate control must proceed almost contemporaneously with the fieldwork and should be used to feed back instructions to the field manager about how to improve the quality of the data collection process. To achieve this efficiently, data must be sent back to HQ in batches at regular intervals. Data cleaning, on the other hand, should start during the fieldwork but can only be completed after the end of the collection process.16

One critical aspect of the survey manager’s job is to anticipate potential bottlenecks and take remedial actions before they compromise the timely completion of the whole project. No matter how many factors have been taken into account in the preparatory stage of the survey, the experienced survey manager must be on the lookout for the


Box 2.3 (continued)

The central office should be responsible for the verification of the existence of establishments and for delivering the introductory letter. The interviewer should first approach the respondent with a personal visit to secure participation. Additional coaching, if necessary, to convince reluctant respondents should be handled by the supervisor or survey director. The replacement of ineligible respondents must be carried out according to specific sampling procedures agreed on with the survey manager. The supervisor’s role is to manage the fieldwork and to check the quality of the interviews. The completed and verified questionnaire is transferred to the data entry manager whose team will code and enter it in electronic form.

Source: Author’s creation.

16 Depending on the length of the questionnaire and the degree of accuracy of the cleaning protocol, the development of the cleaning program can take from three to six weeks.

unexpected. Two useful tools are at the survey manager’s disposal: one for monitoring the design of the whole project, the other to supervise the progress of the fieldwork.

A first tool used in planning and managing the timing of a survey is the Gantt Chart (see figure 2.2). Defined as a graphic representation of the sequence and link of activities, it can be used to detect slacks and the critical path of the whole project.17 This chart is a useful tool in identifying what options are available if problems occur during the implementation of the survey. For example, if the survey is behind schedule, the following alternatives could be employed to make up time (Weeks 2003):

• Start earlier critical path activities by overlapping with predecessor activities.

• Shorten the duration of critical path activities by (1) adding resources if they are resource-driven, or (2) internalizing the loss (that is, lower quality) if not resource driven. This approach works best if employed on earlier activities.

• Move resources from noncritical to critical path activities.
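To see how slack and the critical path are identified in practice, the following sketch runs a forward and a backward pass over a small activity network. It is a minimal illustration under stated assumptions: the activity names and durations loosely follow the summary tasks of the survey schedule, but the precedence links between activities are hypothetical, since the dependencies are not spelled out in the text.

```python
from graphlib import TopologicalSorter

# activity -> (duration in days, predecessors); precedence links are assumed
ACTIVITIES = {
    "questionnaire": (64, []),
    "procurement": (43, []),
    "manual": (20, ["questionnaire"]),
    "data_entry_form": (20, ["questionnaire"]),
    "training": (10, ["manual", "procurement"]),
    "fieldwork": (85, ["training", "data_entry_form"]),
}


def schedule(activities):
    """Return (project duration, slack per activity, critical path)."""
    graph = {name: preds for name, (_, preds) in activities.items()}
    order = list(TopologicalSorter(graph).static_order())

    # Forward pass: earliest start and earliest finish of each activity.
    early_start, early_finish = {}, {}
    for name in order:
        duration, preds = activities[name]
        early_start[name] = max((early_finish[p] for p in preds), default=0)
        early_finish[name] = early_start[name] + duration
    project_duration = max(early_finish.values())

    # Backward pass: latest start that does not delay the project.
    successors = {name: [] for name in activities}
    for name, (_, preds) in activities.items():
        for pred in preds:
            successors[pred].append(name)
    late_start = {}
    for name in reversed(order):
        duration, _ = activities[name]
        late_finish = min((late_start[s] for s in successors[name]),
                          default=project_duration)
        late_start[name] = late_finish - duration

    # Slack is the delay an activity can absorb; zero slack means critical.
    slack = {name: late_start[name] - early_start[name] for name in order}
    critical_path = [name for name in order if slack[name] == 0]
    return project_duration, slack, critical_path


duration, slack, critical_path = schedule(ACTIVITIES)
print(duration, critical_path, slack)
```

Under these assumed links the project takes 179 days, the critical path runs through questionnaire design, manual, training, and fieldwork, and procurement and the data entry form carry 41 and 10 days of slack, respectively. Resources could therefore be moved from those two activities to critical ones without delaying completion.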

The second tool designed to aid field supervision is the weekly report (table 2.2). This simple form allows the survey manager to effectively monitor the progress of interviews from invitation to completion and to estimate a number of fieldwork performance indicators, such as cooperation rate, response rate, coverage rate, refusal rate, and completion rate.


17 The critical path is the series of activities that determines the duration of the project. Slack is the amount of time that an activity can be delayed without delaying the project completion date. By definition, the critical path has zero slack (Project Management Institute 2000).

Figure 2.2. Gantt Charts Illustrate Timing of Survey Activities

[Gantt chart with monthly columns starting in September 2004. The task list and durations shown in the chart are reproduced below; the bars and dependency links are not recoverable.]

ID  Task name                      Duration
 1  1. Questionnaire design        64 days
 2     Develop draft instruments   10 days
 3     Review & revise draft       10 days
 4     Pilot                       5 days
 5     Translate                   5 days
 6     Finalize                    1 day
 7  2. Procurement                 43 days
 8     Draft TOR                   1 day
 9     Receive Proposals           21 days
10     Review, Negotiate           21 days
11  3. Sample design               138 days
12     Sample design               21 days
13     Identify frame problem      25 days
14     Adjust sampling weight      2 days
15  4. Manual                      20 days
16     Develop draft manuals       15 days
17     Review and finalize         5 days
18  5. Data Entry Form             20 days
19     Develop                     15 days
20     Test & Finalize             5 days
21  6. Training                    10 days
22     Develop Training program    5 days
23     Hold training sessions      5 days
24  7. FIELD WORK                  85 days
25     Interviews                  80 days
26     Data entry operations       80 days
27  8. HQ activities               80 days
28     Response rate control       75 days
29     Cleaning                    15 days

Source: Author’s creation.

Table 2.2.

Weekly Reports Enable Managers to Monitor Progress

               Target               Out of        Non-      Total              Agreed to  Form partially  Form fully      Forms    Forms  Sample
Supervisors    sample  Refusals(a)   scope  contact(b)  sample(c)  Visited  participate       completed   completed  validated  entered    left
Supervisor 1      133            8       2           0        143       78           56              39          22         18       16     115
Supervisor 2      100            3       9           0        112       76           58              53          51         41       40      59
Supervisor 3      130            1      10           0        141       94           78              56          53         50       47      80
Supervisor 4      299            0      25           0        324      207          164             161         157        111       99     188
Supervisor 5       73            0       1           0         74       28           23              47          31         21       15      52
Supervisor 6      265            5      50           0        320      202          140             118          75         74       63     191
Total           1,000           17      97           0      1,114      685          519             474         389        315      280     685

Column group headings in the original: Nonresponse; Respondents.

Source: Author.
a. No more attempts.
b. Nonexisting, moved outside study area, wrong address.
c. Target sample + Replacements (refusals + out of scope + noncontacts).
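As an illustration of how such a weekly report can be turned into monitoring indicators, the sketch below uses the totals row of table 2.2. The text names the indicators but does not define them, so the formulas for the refusal, cooperation, and completion rates are assumptions made for this example; only the total sample (target plus replacements) and the sample left (target minus validated forms) can be checked against the table itself.

```python
from dataclasses import dataclass


@dataclass
class WeeklyReport:
    target_sample: int
    refusals: int
    out_of_scope: int
    noncontact: int
    visited: int
    agreed: int
    fully_completed: int
    validated: int

    @property
    def total_sample(self):
        # Note c of the table: target sample + replacements.
        return (self.target_sample + self.refusals
                + self.out_of_scope + self.noncontact)

    @property
    def sample_left(self):
        # Consistent with every row of the table: target minus validated forms.
        return self.target_sample - self.validated

    @property
    def refusal_rate(self):
        # Assumed definition: refusals as a share of the total sample.
        return self.refusals / self.total_sample

    @property
    def cooperation_rate(self):
        # Assumed definition: agreements as a share of visited units.
        return self.agreed / self.visited

    @property
    def completion_rate(self):
        # Assumed definition: fully completed forms over the target sample.
        return self.fully_completed / self.target_sample


# Totals row of the weekly report.
totals = WeeklyReport(target_sample=1000, refusals=17, out_of_scope=97,
                      noncontact=0, visited=685, agreed=519,
                      fully_completed=389, validated=315)

print(totals.total_sample, totals.sample_left)
print(f"refusal {totals.refusal_rate:.1%}, "
      f"cooperation {totals.cooperation_rate:.1%}, "
      f"completion {totals.completion_rate:.1%}")
```

Computing the same indicators per supervisor, rather than only for the totals, would let the survey manager spot teams that are falling behind or that show unusual refusal patterns.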

Food data collection in household consumption and expenditure surveys

Guidelines for low- and middle-income countries

Prepared by The Inter-Agency and Expert Group on Food Security, Agricultural and Rural Statistics and endorsed by the forty-ninth session of the United Nations Statistical Commission, New York, 29 March 2018

This work is a co-publication of The Food and Agriculture Organization of the United Nations and The World Bank

Required citation: FAO and The World Bank. 2018. Food data collection in Household Consumption and Expenditure Surveys. Guidelines for low- and middle-income countries. Rome. 104 pp. Licence: CC BY-NC-SA 3.0 IGO

The designations employed and the presentation of material in this information product do not imply the expression of any opinion whatsoever on the part of the Food and Agriculture Organization of the United Nations (FAO) or The World Bank concerning the legal or development status of any country, territory, city or area or of its authorities, or concerning the delimitation of its frontiers or boundaries. The mention of specific companies or products of manufacturers, whether or not these have been patented, does not imply that these have been endorsed or recommended by FAO or The World Bank in preference to others of a similar nature that are not mentioned.

The views expressed in this information product are those of the author(s) and do not necessarily reflect the views or policies of FAO or The World Bank, its Board of Executive Directors, or the governments they represent.

ISBN 978-92-5-130980-3 (FAO)

© FAO and The World Bank, 2018

Some rights reserved. This work is made available under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 IGO licence (CC BY-NC-SA 3.0 IGO; https://creativecommons.org/licenses/by-nc-sa/3.0/igo).

Under the terms of this licence, this work may be copied, redistributed and adapted for non-commercial purposes, provided that the work is appropriately cited. In any use of this work, there should be no suggestion that FAO or The World Bank endorse any specific organization, products or services. The use of the FAO or The World Bank logo is not permitted. If the work is adapted, then it must be licensed under the same or equivalent Creative Commons license and include the following disclaimer along with the required citation: “This is an adaptation of an original work by the Food and Agriculture Organization of the United Nations (FAO) and The World Bank. Views and opinions expressed in the adaptation are the sole responsibility of the author or authors of the adaptation and are not endorsed by FAO or The World Bank.” If a translation of this work is created, it must include the following disclaimer along with the required citation: “This translation was not created by the Food and Agriculture Organization of the United Nations (FAO) or The World Bank. Neither FAO nor The World Bank is responsible for the content or accuracy of this translation. The original English edition shall be the authoritative edition.”

Any mediation relating to disputes arising under the licence shall be conducted in accordance with the Arbitration Rules of the United Nations Commission on International Trade Law (UNCITRAL) as at present in force.

Third-party materials. Users wishing to reuse material from this work that is attributed to a third party, such as tables, figures or images, are responsible for determining whether permission is needed for that reuse and for obtaining permission from the copyright holder. The risk of claims resulting from infringement of any third-party-owned component in the work rests solely with the user.

Sales, rights and licensing. FAO information products are available on the FAO website (www.fao.org/publications) and can be purchased through [email protected]. Requests for commercial use should be submitted via: www.fao.org/contact-us/licence-request. Queries regarding rights and licensing should be submitted to: [email protected].

Contents

Preface
Acknowledgments
Acronyms
Executive summary

1. Introduction
1.1. Background and motivation
1.2. Objectives and audience
1.3. Emerging issues

2. Review of the evidence and summary of the main issues
2.1. Recall versus diary and length of reference period
2.2. Seasonality, number of visits
2.3. Acquisition versus consumption
2.4. Meal participation
2.5. Food away from home
2.6. List of food items
2.7. Non-standard units of measurement

3. Conclusions and recommendations
3.1. Recall versus diary and length of reference period
3.2. Seasonality, number of visits
3.3. Acquisition versus consumption
3.4. Meal participation
3.5. Food away from home
3.6. List of food items
3.7. Non-standard units of measurement

Annex 1 Food data collection in household consumption and expenditure surveys. Draft guidelines for low- and middle-income countries
Annex 2 Food data collection in household consumption and expenditure surveys. Draft guidelines for low- and middle-income countries

4. Bibliography
5. Glossary


Acronyms

BMI body mass index
COICOP Classification of Individual Consumption According to Purpose
CBN Cost of Basic Needs
CPI consumer price index
DEC dietary energy consumption
FEI Food Energy Intake
GDP gross domestic product
HCES household consumption and expenditure surveys
IAEG-AG Inter-Agency and Expert Group on Food Security, Agricultural and Rural Statistics
IFAD International Fund for Agricultural Development
ILO International Labour Organization
INEI Instituto Nacional de Estadística e Informática
OECD Organization for Economic Co-operation and Development
PoU prevalence of undernourishment
UEMOA West African Economic and Monetary Union
UNICEF United Nations Children’s Fund
UNSC United Nations Statistical Commission
UNSD United Nations Statistics Division
USDA United States Department of Agriculture
WFP World Food Programme


2. Review of the evidence and summary of the main issues

A comprehensive review of the different uses of HCES is provided in Smith, Dupriez, and Troubat (2014). Among those uses are poverty measurement, informing food security assessment, providing inputs in the compilation of food balance sheets, providing information for the planning and monitoring of nutrition interventions, informing the compilation of national accounts, and collecting data for compilation of CPI. As a result of the different uses, and the constituencies of users associated with them, the demands from the data vary, and depending on the exact nature of HCES being designed, there are going to be different sets of constraints and opportunities for repurposing. Any attempt at adjusting the design of a survey needs to take into account the analytical needs of the different users. In this document, the main uses considered in setting the criteria for guiding survey design are food security assessments, poverty measurement, and nutrition policy and programming.

Some key issues in the measurement of poverty and food security, and for monitoring nutrition interventions, that are useful for understanding the data needs connected to those uses are presented in Annex 1.9 In what follows, the document contains a summary of the literature on key choices that confront practitioners as they design and implement HCES questionnaires. Those aspects were identified as priority areas in a review conducted by Smith, Dupriez, and Troubat (2014) and by experts that participated in the consultation process convened by IAEG-AG and led by FAO and the World Bank. Several of those issues are also treated, theoretically and empirically, in a recent issue of the journal Food Policy. The volume includes case studies from a diverse set of developing and OECD countries, analyzing how different survey design options affect the quality of the data being collected and, in turn, the implications for statistical inference and policy analysis (Zezza et al., 2017).

2.1. Recall versus diary and length of reference period

Data on household food consumption (and acquisition) is commonly collected either by asking households to keep a diary over a reference period (e.g. days or weeks) or through interviews in which respondents are asked to recall consumption for a specific period (e.g. the past week or the past 30 days). A large body of evidence from research and practical experience shows that the method chosen can significantly affect the resulting estimates of consumption.

9 See Annex 1


The recall period refers to the period over which respondents are asked to recall the consumption of food items. The recall period differs from the reference period when households are interviewed multiple times during multiple visits to the household (Smith, Dupriez, and Troubat, 2014). For example, if households are interviewed about their food consumption in the last seven days over four weekly visits, the recall period is seven days and the reference period is 28 days.

In diary methods, households are generally asked to record consumption at the moment in which it takes place (e.g. at meal times or at time of purchase). However, in practice, households often fill in information about their consumption at the end of the day, or during supervised visits from enumerators (for example, for two-day recall periods if visited every other day). This blurs the line between diary and recall methods, especially when respondent illiteracy is high and supervisors support completion of the diary with visits every few days.

The choice of recall period has long been a critical element of survey design for which there has been limited agreement and evidence of best practice. Scott and Amenuvegbe (1990) suggested the “wide variations [in recall period] reflect the almost total absence of evidence for developing countries on the level of recall error and its relation to recall duration.” Similarly, Deaton and Grosh (2000) commented that “there are no definitive answers about the optimal recall period (…). In the meantime, however, surveys must be designed.”

This uncertainty is reflected in the large variation observed in the choice of recall periods across surveys. The review of 100 HCES undertaken by Smith, Dupriez, and Troubat (2014) reveals that of the 56 surveys using exclusively interview methods, 26 surveys were using multiple recall periods depending on the source of acquisition or the nature of the purchase (frequently or less frequently purchased). Of the 30 surveys using only one recall period, 13 used a recall period of seven days, four used a recall period of 14 to 15 days, two used a recall period of one month, five used the “usual month” or “usual week” approach, and the rest used a different recall period.

The “usual month” or “usual week” approach uses a recall period longer than the month (usually the past 12 months) and is aimed at capturing seasonality and other short-term fluctuations in food consumption. Households are asked to recall their average monthly or weekly consumption over the past year, sometimes by breaking this into questions about the number of months per year that they consume (or acquire) the food in question, the times per month that they acquire it in those months and the typical quantity and value on each acquisition occasion. Consequently, the recall period is the year and the reference period is meant to be the typical month within that year, although there is evidence that respondents anchor their answers in the economic conditions of the most recent month (Gibson, 2007).
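The arithmetic behind the “usual month” questions can be sketched as follows. This is a minimal illustration under assumed names and values: the function, its parameters, and the example figures for rice are hypothetical, and actual questionnaires may combine the three answers differently.

```python
def usual_month(months_per_year, times_per_month, amount_per_occasion):
    """Average monthly amount implied by the three 'usual month' answers.

    amount_per_occasion can be a quantity (e.g. kilograms) or a value
    (e.g. local currency); the result is in the same unit per month.
    """
    if not 0 <= months_per_year <= 12:
        raise ValueError("months_per_year must be between 0 and 12")
    annual_amount = months_per_year * times_per_month * amount_per_occasion
    # Spread the annual amount over the 12 months of the recall year.
    return annual_amount / 12


# Hypothetical household: buys rice in 6 months of the year,
# 4 times in each of those months, 5 kg on each occasion.
print(usual_month(months_per_year=6, times_per_month=4,
                  amount_per_occasion=5))
```

In this example the household acquires 120 kg over the year, so the usual month is 10 kg even though no rice is bought in half of the months. The smoothing of seasonal variation is the purpose of the approach, and it also shows how much the estimate depends on respondents answering three separate questions accurately.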

For recall surveys, the challenge is to choose an effective method for measuring

the concept of interest while avoiding biases resulting from two main sources:

memory decay and telescoping. A longer recall period may be desirable to better

capture items consumed infrequently and to obtain a better sense of the true

distribution of consumption over a longer time period (addressing the seasonality

of consumption). However, one common effect of longer recall periods is

14

memory decay (or “progressive forgetting” on the part of the respondent), to use

the terminology of Deaton and Grosh (2000), which can lead to under-reporting

of consumption. Scott and Amenuvegbe (1991) investigated the magnitude of

recall error in Living Standards Measurement Study style surveys in experiments

with the Ghanaian Living Standards Survey. For 13 frequently purchased items,

expenditure reported in the survey fell an average of 2.9 percent per additional

day of recall. For seven-day recall, expenditure was 87 percent of what it was for

single-day recall; after two weeks, the recall error levelled out at around 20

percent. Similarly, the Indian National Sample Survey conducted experiments on

recall periods using “last week” and “last month”, which found that food

expenditure estimates from the weekly recall were more than 20 percent higher than

in the monthly recall (NSSO, 2003).

Although a shorter recall period reduces error caused by memory decay, choosing

a short recall period introduces another set of problems. As noted by Deaton and

Grosh (2000), even under perfect recall, when the recall period is shorter than the

period used for analysis, the measure includes variance that does not reflect the

true distribution of living standards. A short recall period of one day may

eliminate bias in the mean,10 but it poorly reflects the distribution of expenditure

and consumption over a longer time period, such as a month or a year, which

generally is the key statistic of interest for household surveys. While the “usual

month” approach was advocated by Deaton and Grosh as a way to structure a

long recall period to make it more feasible for respondents to answer, while

providing analysts with a measure of more typical living standards than is

available from a short recall period, the evidence is that this method is not able to

overcome the tension between what is feasible to ask and what is desirable to

know. In particular, the “usual month” method has proven to be cognitively

burdensome, and it therefore introduces education-related inequality into the

measure of consumption inequality, takes almost twice as long to field as a fixed

recall survey over the same foods, and introduces errors on both the extensive

and intensive margins (Friedman et al, 2017).

Another strategy used in some surveys is to break the longer reference period into

a series of short, adjacent, recall periods. For example, in the Ghana Living

Standards Measurement Study, households are visited up to 10 times over a one-

month period, so that there are only short recall periods between each visit. A

similar design is present in many diary-keeping surveys for illiterate respondents,

who may be visited every day or second day over the 14 to 28-day reference

period. While there may be some novelty for a respondent being interviewed the

first time, a high frequency of repetitious interview visits is likely to induce non-

compliance, and clear evidence of that is shown in the Ghana Living Standards

Measurement Study by Schündeln (2017), who finds that data quality is highest

for the first interview and falls monotonically with each successive interview.

Thus, measured food poverty would be 13 percentage points higher if all 10

interviews of the same household are used, compared with using just the data

from the first visit.

10 Assuming the sample of households is sufficiently distributed throughout seasons.

Shorter reference periods make recall over that same

period less prone to forgetting, but they provide a poor guide to typical, long-run

living standards. This trade-off also affects studies focused on estimating average daily per

capita dietary energy consumption (DEC).11 In particular, shorter reference

periods are found to affect the variability in energy and nutritional estimates.

Using data for Myanmar collected over two monthly rounds per household

approximately six months apart, Gibson (2016) annualizes estimates of daily

calories per capita from each survey round in two ways: a naïve extrapolation that

multiplies estimates from each round by six and then adds them, and a corrected

extrapolation, which is based on the intra-year correlation in daily calorie per

capita across survey rounds of 0.45. The implications of this correction for

measures of hunger are illustrated in Figure 3. Given two distributions with the

same median calories per capita per day, the one based on naïve extrapolation of

the monthly data will have a greater dispersion compared to the adjusted one,

resulting in a greater incidence of hunger for a given threshold (2000 kcal/day in

this case).12

Figure 3

Chronic hunger overstated by naïve extrapolation from monthly calories to annual

Source: Gibson (2016).
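
One way to see why the naïve extrapolation overstates dispersion is the stylized sketch below. It is not Gibson's (2016) procedure. It assumes, purely for illustration, that daily calories per capita have the same standard deviation in every month, that every pair of months has the same correlation (set to the 0.45 reported above), and that calories are normally distributed. The mean of 2300 and the monthly standard deviation of 700 are hypothetical values.

```python
from math import sqrt
from statistics import NormalDist


def sd_naive(sd_month, rho):
    """SD of the mean of two observed months, each standing in for six months."""
    return sd_month * sqrt((1 + rho) / 2)


def sd_corrected(sd_month, rho, n_months=12):
    """SD of the mean of n_months months when every pair is correlated at rho."""
    return sd_month * sqrt((1 + (n_months - 1) * rho) / n_months)


def share_below(threshold, mean, sd):
    """Share of a normal distribution lying below the threshold."""
    return NormalDist(mean, sd).cdf(threshold)


# Hypothetical inputs: same centre for both distributions, 2000 kcal/day threshold.
MEAN_KCAL, SD_MONTH, RHO, THRESHOLD = 2300, 700, 0.45, 2000

hunger_naive = share_below(THRESHOLD, MEAN_KCAL, sd_naive(SD_MONTH, RHO))
hunger_corrected = share_below(THRESHOLD, MEAN_KCAL, sd_corrected(SD_MONTH, RHO))
```

Under these assumptions the corrected distribution is narrower than the naïve one, so a smaller share of it falls below the threshold whenever the centre of the distribution lies above that threshold.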

An analysis performed by the FAO Statistics Division using the 2010 data of the

Bangladesh Household Income and Expenditure Survey illustrates how the

variance of the per capita DEC is significantly reduced over longer observation

periods (Box 1).

11 This is a key variable in the measurement of undernourishment, as estimated by FAO and reported

in the State of Food Insecurity in the World Report series.

12 To estimate the prevalence of undernourishment (Sustainable Development Goals indicator 2.1.1),

FAO is using the minimum dietary energy requirement that depends on the age, sex structure of the

population, the fifth percentile of the body mass index distribution for adults, the height of the individuals in the population and the average of the sedentary lifestyle range for physical activity

levels in the country. This threshold is not fixed and varies from one country to the other.

Box 2

Case study: Bangladesh 2010 Household Income and Expenditure Survey

The Bangladesh Household Income and Expenditure Survey 2010 was carried out

from February 2010 to January 2011. Within those 12 months of investigation, the

survey was divided into 18 periods, each 20 days. Food consumption was collected

through a diary over a period of 14 days. Throughout the period, households were

visited frequently (from 7 to 14 visits).

The Dietary Energy Consumption (DEC) was estimated for each day of the diary. The

figure below shows the coefficient of variation (CV) of DEC obtained using different

numbers of diary days. From the plot, it can be clearly seen that the variability is highest

(CV=35 percent) when using observations from the first day only and decreases

convexly and converges at a value of CV of around 24 percent after the seventh day.

After the first week, the variability does not seem to decrease much, suggesting that a

reference period of seven days might be enough for estimating the variability of DEC.

Source: Grünberger (2017a)
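
The calculation behind such a plot can be sketched as follows, assuming daily DEC values are available for each household. The data here are simulated, not taken from the Bangladesh survey: the habitual level, its spread and the day-to-day noise are hypothetical parameters chosen only to produce a curve of similar shape.

```python
import random
from statistics import mean, pstdev


def cv_by_diary_days(diaries):
    """CV of household mean DEC using the first k diary days, for k = 1..n_days.

    diaries: list of per-household lists of daily DEC values (equal lengths).
    """
    n_days = len(diaries[0])
    cvs = []
    for k in range(1, n_days + 1):
        household_means = [mean(days[:k]) for days in diaries]
        cvs.append(pstdev(household_means) / mean(household_means))
    return cvs


# Simulated (hypothetical) data: each household has a habitual level plus
# independent day-to-day noise over a 14-day diary.
rng = random.Random(0)
diaries = []
for _ in range(2000):
    habitual = rng.gauss(2200, 500)
    diaries.append([habitual + rng.gauss(0, 600) for _ in range(14)])

cvs = cv_by_diary_days(diaries)  # cvs[0] uses day 1 only, cvs[6] uses days 1 to 7
```

Averaging over more days removes day-to-day noise but not differences in habitual consumption between households, which is why the curve flattens rather than falling to zero.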

Short recall periods can also lead to telescoping in which respondents report

consumption that has taken place outside the reference period, also causing a bias

in estimates. Several studies suggest that there cannot be one optimal recall

length, as, depending on the type of good and the frequency of consumption,

telescoping or decay may be observed (Bradburn 2010; Hurd and Rohwedder,

2009). In general, telescoping is more likely for large and infrequently purchased

or consumed items under shorter recall periods, while a longer recall period leads

to recall decay and underreporting of more common and frequent purchases

(Deaton and Grosh, 2000; Moltedo et al, 2014; Neter and Waksberg, 1964). The

Indian National Statistical Office designed an experimental survey including

three types of data collections: daily visits with direct measurement (benchmark),

seven-day recall, and 30-day recall by food group. The report of the Indian

National Statistical Office (NSSO, 2003) shows that the optimal recall period

depends on the food group and frequency of consumption. The 30-day recall

works better than the seven-day recall in measuring staple foods such as cereals and is

not inferior in measuring high-frequency items. One explanation for those

patterns is that staple foods, and other high-frequency items, lend themselves to

more accurate “rule of thumb” reporting, based on their regularity (Friedman et

al, 2017), so strictly speaking they are being “estimated” rather than “recalled”,

and with a 30-day period, the effect of any telescoping is diluted compared with a

seven-day period.

A possible solution for dealing with telescoping is a bounded recall approach,

suggested by Neter and Waksberg (1964). In that approach, a first visit to the

household by the enumerator is used to establish the bound of the recall period

for a second visit, which is when the actual interview takes place. The enumerator

can, accordingly, ask the respondent about consumption since the first interview,

reducing the likelihood of respondents reporting consumption or acquisitions

taking place outside the recall period. This approach can, however, be costly to

administer as it requires two visits to the same household, and despite having a

long history (including within the Living Standards Measurement Study

program), there have not been enough studies to provide convincing evidence

that it offers a significant advantage in terms of data quality (Deaton and Grosh,

2000; Gibson, 2005). This is, therefore, an area on which future methodological

research could usefully focus to establish whether such an advantage exists in

practice.

Diaries present an approach which in theory can deal with important shortfalls of

regular (longer) recall methods, such as telescoping and recall bias. They are in

fact the method of choice and are successfully implemented in many countries for

collecting data on food and other frequent expenditures. However, they can be

practically challenging to implement in the conditions prevalent in many low-

and middle-income countries. Diaries are far more demanding in terms of

supervision, especially with illiterate respondents, when they are implemented as

a series of short recall interviews, and as a result, become more expensive and

demand higher capacity. While a well-implemented diary is generally considered

the gold standard for measuring consumption, poorly implemented ones are often

inferior to a good recall survey. Even in the context of the United States of

America, where the set of challenges for diaries and recall may be different than

in lower income countries, evidence suggests that recall surveys might

outperform diaries (Bee, Meyer and Sullivan, 2012).

A growing body of research has shown how the diary method causes

considerable response burden and fatigue, particularly when the length of the

diary increases, ultimately affecting data quality and reliability. Studying the

Canadian Food Expenditure Survey, Ahmed, Brzozowski, and Crossley (2006)

find a decrease in reporting because of “diary exhaustion” with reporting

decreasing by 10 percent from the first to the second week of filling diaries.

Similarly, studying the United States, Stephens (2003) finds significantly higher

values in the first diary week and on the first day of each diary week relative to

the remaining days, attributable to respondent fatigue. Analyzing the 2009/10

data of Papua New Guinea, Gibson (2013) also finds that the total value of

consumption transactions declined by 4.4 percent per day during the diary-

keeping period. A large set of other studies, such as Kemsley (1961), Turner

(1961), Sudman and Ferber (1971), McWhinney and Champion (1974), and

Silberstein and Scott (1991), find similar evidence of fatigue and decay in

information collection in diaries over time.

With high levels of supervision and careful implementation, diaries can be and are

being implemented in some countries, with good results. In analyzing the

Bangladesh 2010 Household Income and Expenditure Survey, the FAO Statistics

Division reported a negligibly low decrease of DEC because of fatigue, likely

because of very good respondent supervision practices, with enumerator visits

taking place every one or two days.13 Such levels of supervision lead to a mix of

diary and interview methods that are not likely to be affordable; diaries do not,

therefore, appear to be the most suitable method for resource-constrained

statistical offices in low- and middle-income countries. Furthermore, even for

well-implemented diaries, the evidence clearly suggests that longer periods of

implementation do not add to the quality of information (they actually detract

from it) and entail higher implementation costs. Expanding mobile phone

coverage throughout the world opens possibilities for remotely assisting diary

completion (as well as recall interviews) at a fraction of the cost. This is an

emerging trend (or an established one in some high-income countries), but not an

area for which there is enough experience at scale in low-income settings for it to

be recommended as a common practice during the time these guidelines are being

formulated.

One final aspect of diary implementation is that often the analyst is presented

with data that have already been to some extent aggregated (e.g. by adding up the

7 or 14 days of data), which does not allow for detecting and correcting possible

patterns in the data, such as diary fatigue (Troubat and Grünberger, 2017). When

diaries are implemented, it is important that they are reported together with full

metadata, allowing the user to evaluate the data-collection process, including the

role of the enumerator in aiding the data-collection process, the number and

timing of supervision visits, and similar details.
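
When day-level data are available, one simple check for diary fatigue is to fit a log-linear trend of the mean reported value on the diary day. The sketch below is a minimal illustration of that idea, not the method used in the studies cited. The input series is hypothetical and constructed to decline by exactly 4.4 percent per day, the rate Gibson (2013) reports for Papua New Guinea.

```python
from math import exp, log


def daily_change_rate(daily_means):
    """Per-day proportional change from an OLS fit of log(value) on diary day."""
    days = list(range(1, len(daily_means) + 1))
    logs = [log(v) for v in daily_means]
    day_bar = sum(days) / len(days)
    log_bar = sum(logs) / len(logs)
    covariance = sum((d - day_bar) * (y - log_bar) for d, y in zip(days, logs))
    variance = sum((d - day_bar) ** 2 for d in days)
    slope = covariance / variance
    # Convert the slope on the log scale back to a proportional change per day.
    return exp(slope) - 1


# Hypothetical mean transaction values for a 14-day diary.
hypothetical_means = [100 * (1 - 0.044) ** (d - 1) for d in range(1, 15)]
rate = daily_change_rate(hypothetical_means)
```

A check of this kind is impossible once the 7 or 14 days have been summed into a single household total, which is the practical argument for releasing the disaggregated records.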

In the Living Standards Measurement Study handbook, Deaton and Grosh (2000)

provided a discussion of the issues outlined above and concluded by

recommending only changes on the margins of the Living Standards

Measurement Study status quo. Specifically, that meant using bounded recall for

purchases, coupled with a usual month question for purchases and consumption

of food from own production, plus one 12-month recall question on the value of

food gifts received by the household. Deaton and Grosh had already observed,

however, a decline in the actual use of bounded recall in the Living Standards

Measurement Study survey practice for reasons related to the added cost and

burden (for enumerators and respondents) of the additional household visit.

While pointing to the pros and cons of the usual month and of shorter recall

periods (progressive forgetting, telescoping, difference from the “true” variance)

as discussed above, in recommending the usual month approach, they also

recognized that this was based on weak and often contradicting evidence, and

mostly motivated by the desire to modify the “status quo” at the margins, in the

absence of stronger evidence in favour of a particular approach.

13 Food consumption reporting dropped on average by less than 0.1 percent per diary day.

Despite the lack of conclusive evidence lamented by Deaton and Grosh, and their

call for “every survey [to] have a budget for experimentation”, there has been a

limited number of new studies undertaken in low- and middle-income countries

focusing on those methodological questions. One that has been particularly

influential is the SHWALITA study (“Survey of Household Welfare and Labour

in Tanzania”) (Beegle et al., 2012; Gibson et al., 2015; de Weerdt et al., 2016).

New evidence has also been produced through the work reported by Backiny-

Yetna, Steele, and Yacoubou (2017) in Niger. Based on those studies and

increased practical experience, practitioners involved in living conditions surveys

have come to favour a seven-day recall period over longer reference periods.

Deaton and Grosh had already noted signs of the bounded approach falling out of

fashion with practitioners because of its higher complexity.

The SHWALITA study (Beegle et al., 2012) provides convincing evidence, from

an experimental setting, that recall interviews inquiring about “usual” monthly

food consumption underestimated household consumption expenditure when

compared to the benchmark assisted individual diary (see Figure 1), whereas the

seven-day recall was reasonably close to the benchmark. At the same time, the

usual month interviews also had the longest completion times (76 minutes

compared to just under 50 minutes for the 7- and 14-day recall), and were not

associated with a significantly smaller coefficient of variation when compared to

the shorter recall methods. In addition to the resource implications of longer

fieldwork time, the longer completion time for the usual month approach is

suggestive of a greater burden on the respondent who, with the enumerator, needs

to engage in a demanding estimation procedure to work out the response for a

typical month starting from recalling consumption episodes over a 12-month

period. Taken together, this evidence indicates that the usual month may be a

lose-lose proposition if it is less accurate and more cumbersome to implement

when compared to a seven-day recall. This is possibly the most important single

development in the evidence base since the publication of Deaton and Grosh

(2000).

Importantly, another plea made by Deaton and Grosh (2000) two decades ago

remains unanswered and just as valid today. As changing the recall period or

method leads to incomparability issues with previous surveys using other

methods, changes in survey methods over time should be accompanied by an

experimental study to make it possible to reconcile the figures produced by the

survey before and after the change in methods. Experiments, such as Beegle et al.

(2012) and Backiny-Yetna, Steele, and Yacoubou (2017) have provided good

practical examples of how changes in methods can be assessed and thus allow for

valid comparisons when methods are changed.

2.2. Seasonality, number of visits

Consumption and expenditure patterns often show seasonal variations that are

linked to the agricultural production season, cyclical events, such as floods and

droughts, or cultural events (e.g. Ramadan, Christmas), which affect food

availability, prices and customary consumption practices. The existence of

seasonality in food consumption patterns is well-established (Paxson 1992, 1993;

Alderman, 1996) but its extent depends greatly on the context.

Seasonality can be particularly important for food consumption because seasonal

variations in dietary patterns, overall quantities of food consumed, and the

consumption of particular nutrients can be pronounced, partly because of its

relationship with food production cycles (Coates et al., 2012). D’Souza and

Jolliffe (2012) find that household consumption in Afghanistan can be as much as

one third lower in the lean season compared with the post-harvest season. The

different levels of consumption, if taken at face value, would result in estimates

of the poverty headcount doubling from 23 percent in the fall to 46 percent in the

following summer (D’Souza and Jolliffe, 2012). Seasonality in food prices is a

key concern as it is found to be significant and can affect estimates of poverty

and consumption (Gilbert, Christiaensen, and Kaminski, 2016). That is of course

a major issue for surveys collecting data for the calculation of CPI.

Seasonal variations can also originate from increased expenditure during

festivals and holidays. In the United States, it has been established that

consumption is higher during holidays and summer months (Barsky and Miron,

1988). In low- and middle-income countries, expenditure can vary significantly

with holidays, festivals, and religious observances. Jolliffe and Serajuddin

(2015), using data for Jordan, note that during the period of Ramadan,

consumption levels are 11 percent greater than during other periods of the year.

The festive expenditures can be difficult to capture in surveys because it is often

difficult for survey fieldwork to operate as normal during festive periods. A few

surveys, such as the Living Standards Surveys in Viet Nam, use a special recall

module for food consumption during festive periods, with analysis typically then

spread over the consumption estimates for the households observed in the rest of

the year.

Within-year temporal variation can originate from other patterns, such as those

associated with payment schedules for wages or social assistance (Stephens,

2003). Troubat and Grünberger (2017), in studying the urban subsample of the

Household Socio-Economic Survey 2007-2008 of Mongolia, have found

cyclical variations in food acquisition levels not only between months but also

within months and weeks. Because of Independence Day, a major national

holiday occurring in December (see Figure 4, left panel), mean

consumption in that month is substantially higher than in the other months of

the year. A systematic pattern is also apparent for data collected at the end of

each month when food consumption is significantly higher (centre panel).

Households also tend to spend significantly more on food on Tuesdays and

Fridays as compared with Mondays and Wednesdays (right panel).

Figure 4

Variation in mean food consumption by month, day of the month, and day of the

week, urban Mongolia

Source: Troubat and Grünberger (2017).

Even if such patterns are difficult to generalize given the context specificity, they

remain an important example of sources of bias that should be mitigated to the

extent possible in survey design (Fielder and Mwangi, 2016). If seasonality is not

taken into account when there is marked seasonal variability in food

consumption, the use of short reference periods biases the estimates of the mean

and the standard deviation of the distribution of habitual food consumption in the

population. Recorded mean consumption may be higher or lower, depending on

the season when the data are collected, and estimates of the coefficient of

variation may be biased by the confounding effect of seasonal variation.

A survey carried out at a specific time of the year (say a season, month, or week)

misses seasonal variation in consumption and risks being unrepresentative of

typical consumption across the year, even when it manages to accurately capture

consumption over the period of data collection. Also, surveys that are not

adequately capturing the entire year pose problems for international

comparability. Comparisons of consumption data for a country conducting a

survey in the lean season and one conducting it in the harvest season are difficult

to make in the absence of elements that enable the habitual consumption levels in

both countries to be gauged. Even over time, comparisons of surveys undertaken

within the same country and during the same period of the year might be

invalidated if major events correlated with consumption patterns move in and out

of the survey implementation period. This may happen with Ramadan, whose dates

move from year to year, or when harvest periods are delayed or pushed

forward by weather events. For all those reasons, it has been recommended that

HCES should cover a full year to properly capture seasonal variations in

expenditures (ILO, 2003), although this is by no means a universal practice.

Deaton and Grosh (2000) suggest the use of a “usual month” approach to

overcome seasonal variation, but in the previous section, it was shown how the

reliability of that approach appears questionable, and is associated with longer

interview times and heavier cognitive burden on the respondents. Deaton and

Zaidi (2002) suggest that for capturing household consumption, the optimal

survey implementation and design is the one likely to provide the most precise

estimate of annual consumption for each household, not just for households on

average. Based on this objective, the ideal design is one in which households are

visited each “season” and habitual consumption is then derived as an average

over the year of seasonal consumption. For variance-based measures, those intra-

year revisits make possible corrected extrapolation, along the lines of what is

shown for estimates of hunger in Gibson (2016). A drawback of revisiting the

same households is the cost and the trade-off with overall sample size, as for any

given sample size, the survey costs increase with the number of visits.

Just over half (53 percent) of the surveys reviewed by Smith, Dupriez, and

Troubat (2014) considered seasonality by using one of two approaches. The first

approach (used in 41 percent of assessment surveys) is to distribute data

collection throughout a year by surveying subsets (usually one twelfth of the

households in the sample) in each month of the year, with subsamples

representative nationally for each quarter. This approach (which conforms to the

ILO recommendation) requires careful planning of the sampling strategy and

survey implementation, but it can ensure that the seasonal variation across space

and time is captured, at least for a “synthetic” household albeit not for any

particular household in the sample, and represents a lower burden on households

as they are visited only in one period of the year. This method can present

advantages in terms of organization when survey staff are employed to just work

on one survey, as it smooths the need for the workforce over the survey year and

can allow working with smaller teams, hence ensuring tighter supervision.
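
A minimal sketch of the allocation step in the first approach is given below, assuming the sampled units (households or clusters) are simply shuffled and spread evenly over the 12 months. The function name and the example of 300 units are hypothetical. The sketch does not handle stratification, which an actual design would need in order to keep each quarterly subsample nationally representative.

```python
import random
from collections import Counter


def allocate_to_months(unit_ids, seed=0):
    """Randomly assign sampled units to interview months 1..12 in equal shares."""
    rng = random.Random(seed)
    shuffled = list(unit_ids)
    rng.shuffle(shuffled)
    # Deal the shuffled units out to the months in turn, like cards.
    return {unit: (position % 12) + 1 for position, unit in enumerate(shuffled)}


# Hypothetical sample of 300 units, giving one twelfth (25 units) per month.
allocation = allocate_to_months(range(1, 301))
units_per_month = Counter(allocation.values())
```

Fixing the seed makes the allocation reproducible, which helps when the fieldwork schedule has to be documented in the survey metadata.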

The second approach (used in 12 percent of the surveys reviewed) is to conduct

two to four visits during the year on the same households. It was noted above

how additional visits to the same households come at a cost, pose logistical

challenges, and increase the burden on the respondents. This option can,

however, be attractive when the survey also has other objectives, such as

collecting data on agricultural activities, to the extent that the visits can be timed

around salient moments for both objectives; for example different points in the

agricultural season (e.g. post-planting and post-harvest) as is done in the Nigeria

National Panel Survey (2010–2011, 2012–2013, 2014–15) and in Enquête

Nationale sur les Conditions de vie des Ménages et l'Agriculture (2011, 2014)

conducted for Niger. Finally, three of the surveys reviewed by Smith, Dupriez,

and Troubat (2014) collected data in four visits over a 12-month period. This

approach is widely deemed to be very difficult to implement and excessively

cumbersome in terms of organization, with the burden on respondents at its

highest. Some countries implement two rounds of data collection in different

periods of the year, but on a different cross-section of households. That does

away with the added interview burden for the household, but it only allows

controlling for seasonality for the sample aggregate and not for each specific

household. Also, it does not provide an opportunity for correcting variance-based

measures for excess variability due to intra-year fluctuations.

Forty-seven percent of the surveys reviewed by Smith, Dupriez, and Troubat

(2014) do not satisfactorily account for seasonality in their design, as they are

implemented through one household visit over the span of a few months. This

approach returns data that are subject to all the biases discussed at the beginning

of this section, but it is quite common because the approach is easy to implement.

When staff are employed by different surveys implemented by the same

statistical office, the approach allows the office to move them on to the next

project when the household income and expenditure survey fieldwork is

completed. It is often also motivated by the idea that if the period of

implementation does not vary, then comparability over time would at least be

safeguarded. This is, however, a questionable assumption, as the extent and

timing of seasonal variations may not be the same from one year to the next.

Also, any unforeseen implementation issues that would delay the onset of the

fieldwork from its planned schedule would invalidate the comparability over time of

the data. These are very serious concerns that point to the need to abandon this

practice for one that takes seasonality fully into account.

A hybrid approach that could be tested, which at least partly limits the

more serious shortcomings of the “one visit over a few months” approach, would

entail complementing that one visit with a second visit on a subsample of

households. This additional subset visit could provide the information required to

correctly annualize the data from the first visit. This is, however, only a

hypothetical survey design that would need to be carefully tested before being

applied at scale, and as such, it represents at best an indication for further

research.14

2.3. Acquisition versus consumption15

Early on, food consumption data were mainly collected in HCES (in particular,

in household budget surveys) to construct the consumer price indices or to

inform national accounts. For such uses, the interest was mainly in collecting

data on food items acquired through expenditures. Obtaining food through

expenditures is now widespread throughout the world and has become the

prominent form of food acquisition in many locations, especially in urban

areas. However, in many countries, a considerable share of households obtains

some food from their own production, such as from crop cultivation, livestock

rearing, fishing and aquaculture, or hunting and gathering. It is also quite

common for households to obtain some of their food in kind, in the form of

14 An example of this design, using a 20 percent subsample who were revisited approximately five

months after the initial visit, is provided by Gibson (2001). The second visit was estimated to add

about 10 percent to total survey costs, and made it possible to partition poverty estimates into

chronic and transient components.

15 This section is based on and reproduces parts of Conforti, Grünberger, and Troubat (2017).

gifts from other households, payments from an employer, or public or private

assistance (school feeding, food assistance programs, or social or private

transfers in kind).

As HCES are increasingly used for poverty and food security analysis, the

emphasis of the surveys has shifted to also collecting data on food items

obtained through not only expenditures, but also through other sources of

acquisition. Accessibility is one of the dimensions of food and nutrition security,

as defined by FAO, which includes access to food from all possible sources. For

poverty analysis, all sources of food acquisition enter the consumption aggregate,

not only those that imply an outlay of cash. With regard to nutritional

assessments, what is actually ingested matters. Again, that implies a focus on

consumption (or more specifically, intake) of food regardless of how it was

acquired. Understanding how food systems work and evolve, what share of foods

households in different socio-economic groups acquire from different channels,

what the relative prices are for households in different locations, and what the

nutrition and welfare implications are, requires having access to food data.

For national accounts, food produced for own consumption is part of the

household final consumption expenditure. Getting information on own-account

production and consumption of food (as well as other goods) by households is,

therefore, critical, even though agriculture surveys or censuses may also provide

that information. Such food should be valued at “basic prices” of similar goods,

which can be approximated by the price of similar goods sold on a local market,

or the price declared by the household producer if he or she had sold the food

rather than consumed it. Information on food and meals acquired through in-kind

transfers is also important. Valuation should be based on actual cost if actually

purchased by the provider or production cost, both being unknown and difficult

to evaluate by the beneficiary.

Not all surveys, however, are designed to capture information on all the food

that is consumed or available for consumption in a household from all the

sources of acquisition. Three different approaches to collecting food data can be

identified, following Conforti, Grünberger, and Troubat (2017):

• Acquisition. Households report on food they acquired through

purchases, own production and in-kind transfers. Actual consumption

of the same food is not reported.

• Combination of acquisition and consumption. For purchased food, households

report on acquisition (the expenditure incurred), without

specifying the amount of food consumed. Consumption of food derived

from own production or received from transfers is reported.

• Consumption. Households report on food consumed, and on whether

that same food was purchased, own-produced or received as a transfer.

Differences in food measurement among surveys focused on acquisition and

consumption are not always clear. In principle, the difference between the two

measures is essentially a change in stocks, and food wasted in households. For

surveys based on efficient representative samples homogeneously spread across

time and space, changes in stocks should on average be close to zero. In any

given reference period, some households may build stocks while others may

consume food from stocks. However, surveys with less effective timing of

household visits may show significant differences between acquisition and

consumption (e.g. if the survey is implemented in one visit when most

households are stocking or destocking).

Smith, Alderman, and Aduayom (2006) provide a general discussion about the

difference between estimates of consumption and acquisition. Depending on the

length of survey coverage and reference period, the distribution of acquired food

is expected to have a higher variance and a higher mean than the distribution of

consumption. The variance of acquisition surveys is higher because daily food

consumption is smoother than acquisition. This difference is expected to decrease

to zero as the length of the survey period increases. During the reference period,

households can either consume from stocks (underestimating household

consumption using an acquisition survey) or build stocks (overestimating

household consumption using an acquisition survey). As a consequence,

households can have zero expenditure during a given reference period, albeit

consuming from stocks (Gibson and Kim, 2012). Acquisition surveys should be

used to approximate aggregated consumption of population groups, rather than

habitual consumption of individual households. Acquisition data are assumed to

have a higher mean than consumption because food waste, rotten stocks, or food

given to pets is already deducted from consumption estimates (Smith, Alderman, and

Aduayom, 2006). However, empirical studies suggest that the difference between

averages of food acquisition and consumption is not always positive; it can

sometimes be close to zero or even negative (Kaara and Ramasawmy, 2008;

Martirosova, 2008; Smith, Alderman, and Aduayom, 2006; Bouis, Haddad, and

Kennedy, 1992; Bouis, 1994). Conforti, Grünberger, and Troubat (2017)

analyzed 81 HCES¹⁶ conducted between 1988 and 2014 and found that the

average dietary energy consumption from surveys focusing on acquisitions was

only slightly higher than that from surveys focusing on consumption, but the

variability was, in turn, much higher (an average coefficient of variation of 76

compared to 52).
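The smoothing argument above can be illustrated with a toy household. The following is a minimal sketch in Python, not a method taken from the cited studies: the daily intake, the weekly shopping cycle and the 28-day window are hypothetical values chosen only to show why acquisition is more variable than consumption over short reference periods, and why the gap closes as the reference period lengthens.

```python
from statistics import mean, pstdev

DAILY_KCAL = 2000   # hypothetical constant daily consumption
SHOP_EVERY = 7      # hypothetical: one bulk purchase every 7 days
DAYS = 28           # hypothetical observation window

# Daily series for one household: smooth consumption, lumpy acquisition.
consumption = [DAILY_KCAL] * DAYS
acquisition = [DAILY_KCAL * SHOP_EVERY if day % SHOP_EVERY == 0 else 0
               for day in range(DAYS)]


def window_means(series, length):
    """Per-day averages over consecutive, non-overlapping reference periods."""
    return [mean(series[start:start + length])
            for start in range(0, len(series), length)]


def cv(values):
    """Coefficient of variation: population standard deviation over the mean."""
    return pstdev(values) / mean(values)


for length in (1, 7):
    print(length,
          round(cv(window_means(acquisition, length)), 2),
          round(cv(window_means(consumption, length)), 2))
```

In this sketch both series have the same mean, because stocks are fully used up within each shopping cycle. Only the variability differs, and it vanishes once the reference period covers a full cycle.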

Though the difference in the aggregate measure is not that significant, the

difference in the coefficient of variation is of real concern for FAO, which is

using the coefficient of variation derived from food data collected in HCES to

estimate the prevalence of undernourishment. FAO has developed a methodology

to overcome the issue of excess variability encountered in the food consumption

measurement (Wanner et al., 2014). Troubat and Grünberger (2017) applied this

methodology to the Household Socio-Economic Survey 2007/2008 of Mongolia,

which collected food consumption and food acquisition.¹⁷ They found that the difference in variability that exists between DEC from acquisition and DEC from consumption disappears after both distributions are corrected for excess variability (coefficient of variation decreased from 63 to 31 percent for food consumption measurement based on acquisition-type data and from 52 to 30 percent for food consumption measurement based on consumption-type data). The decision to collect acquisition or consumption data is not expected to have a large impact on the estimate of the prevalence of undernourishment or of poverty, as there is no significant effect on the mean and the impact on variability can be reduced using the control for excess variability. However, for countries with a large population and low average DEC, a small difference in kilocalories per capita per day can still affect food security and nutrition assessments.

16 Surveys analysed by the FAO food security and nutrition statistics team from 2006 to 2014, using the ADePT-FSM software developed jointly by FAO and the World Bank (Moltedo et al., 2014).

17 The latter survey measures a household’s food acquisition, and food stocks at the beginning and the end of the reference period. Combining the information of acquisition and stock variation, the household’s food consumption can also be derived from food acquisition.
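The stock adjustment mentioned in footnote 17 amounts to a simple accounting identity. The sketch below assumes that consumption equals acquisition minus the change in stocks and ignores food waste. The function name and the quantities are hypothetical and are not taken from the Mongolian survey.

```python
def consumption_from_acquisition(acquired, opening_stock, closing_stock):
    """Derive consumption as acquisition minus the change in stocks.

    All quantities are in the same unit (e.g. kg) and refer to the same
    reference period. Food waste is ignored in this simplified identity.
    """
    return acquired - (closing_stock - opening_stock)


# Hypothetical households: A builds stocks, B consumes from stocks.
household_a = consumption_from_acquisition(acquired=50, opening_stock=10, closing_stock=30)
household_b = consumption_from_acquisition(acquired=0, opening_stock=25, closing_stock=5)

print(household_a, household_b)   # individual consumption differs from acquisition
print(household_a + household_b)  # the aggregate matches total acquisition of 50
```

Household B illustrates the zero-expenditure case: nothing is acquired in the period, yet 20 units are consumed from stocks. Across the two households the stock changes cancel, which is why acquisition data are better suited to group averages than to the habitual consumption of a single household.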

In addition to those general questions, there is a more specific – but not less

important – set of risks associated with survey design that does not explicitly

take into account the difference between consumption and acquisition.

According to the review performed by Smith, Dupriez, and Troubat (2014), in

surveys based on recall interviews, it is not uncommon for questionnaires to

include poorly worded leading questions or other forms of design ambiguity

that can lead to incomplete enumeration of foods consumed. Such issues arise

when the survey design fails to properly consider that not all the food

acquired by a household is consumed during the survey reference period, and

that food can be consumed during the reference period that was acquired

earlier. Their findings are reproduced with minor changes in the remainder of

this section and summarized in Table 1.

Table 1

Completeness of enumeration of food acquisition or food consumption or both

Source: Smith, Dupriez, and Troubat (2014).

For the food data in HCES to be reliably collected, there must be full

accounting of either all acquired food intended for consumption or all food that

was consumed over the recall period. Additionally, only the food intended for

consumption (when acquisition focused) or consumed (when consumption

focused) during the reference period must be included, not any additional food.

The following exclusion and inclusion accounting errors can adversely affect

the collection of HCES food data:

(1) Acquisition surveys: rule-out leading question on consumption. If a leading

or “filter” question on consumption of each food item over the recall period is

answered “no,” collection of further data on the acquisition of the food can be

ruled out. In this case, respondents are first asked whether they consumed each

food item in the food list for a recall period of up to one year before the time of

the survey. Then, they are asked how much was purchased, consumed from own

production, or received in kind over the survey recall period for food data

collection. However, if the respondent answers “no” to the leading question and the

instructions skip to the next item, the household receives a zero for

acquisitions of the food item regardless of whether or not it was acquired. This

leads to systematic underestimation of the quantities and expenditures on food

acquired. A rule-out leading question on consumption is considered to be a

problem when the two recall periods are less than or equal to two months apart.

Note that this issue does not affect diary surveys because there is no pre-listing

of foods to rule out.

(2) Acquisition surveys: rule-out, short-recall-period leading question on

acquisition. Here, if answered “no,” a short-recall-period leading question on

acquisition of each food item rules out further data collection on the acquisition

of the food over the (longer) survey recall period. In this case, respondents are

first asked whether or not they acquired each food item over the short recall

period (e.g., two weeks). Further information is collected on the acquisitions of

the food for the longer recall period for food data collection only for those food

items that were acquired over the shorter recall period. This leads to

underestimation of mean food acquisition for the population.

(3) Acquisition surveys: rule-out leading question on food purchases. In this

case, if a respondent reports that the household did not purchase a food item,

then no further information is collected on other forms of acquisition of that

food item. As home-produced or in-kind receipts are left out, this problem also

leads to underestimation of mean food acquisition for the population.

(4) Data collected on food harvested rather than food consumed from home

production. When interviewees declare food harvested instead of food

consumed from own production or food from own production for consumption,

the quantities and expenditure on food acquired include those entering into the

households’ production stocks – not the household pantry for immediate

consumption – and are systematic overestimates of food consumed from home

production. A similar situation occurs when there are household animals, such

as poultry and pigs, that may eat some of the food that was harvested from

household food gardens (e.g., undersized tubers, and food that is deemed as

otherwise unfit for human consumption given the food availability at the time).

(5) Ambiguity about whether to report on acquisition or consumption. The

question asked to respondents does not make it clear whether they are expected

to report on their acquisitions or consumption of each food item over the recall

period. This problem leads to inaccuracies in the calculation of the mean

acquisition or consumption for the population and measures of inequality.

(6) Routine month surveys: Ambiguity about whether respondents should report

on the routine month in the recall period or only on those months in which the

food item is actually consumed. In many routine-month surveys, respondents

are first asked to report on the number of months in the past year in which each

food item was consumed. Immediately following, they are asked about the usual

or average amount per month. Some questionnaires, however, fail to specify

whether the average should be for those months in which it was consumed or

for any month in the last year. When this type of accounting error occurs, some

households may report on the former and some the latter, leading to over- or

underestimation of their consumption of any food item for which a positive

number of months was reported for the initial question.
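Two of the accounting errors listed above lend themselves to a small numerical illustration. The sketch below is written under stated assumptions: the household records, field names and quantities are hypothetical. It shows how a rule-out filter question on consumption, as in error (1), biases mean acquisition downward, and how the routine-month ambiguity in error (6) changes the implied annual quantity.

```python
from statistics import mean

# Hypothetical records: each household's true acquisition of one food item and
# its answer to a filter question on consumption during the recall period.
households = [
    {"consumed": True, "acquired_kg": 4.0},
    {"consumed": False, "acquired_kg": 6.0},  # bought for stock, not yet eaten
    {"consumed": True, "acquired_kg": 2.0},
]


def recorded_acquisition(household, rule_out_filter):
    """Quantity that ends up in the data set for one household."""
    if rule_out_filter and not household["consumed"]:
        return 0.0  # skip pattern: the acquisition questions are never asked
    return household["acquired_kg"]


true_mean = mean(recorded_acquisition(h, False) for h in households)
filtered_mean = mean(recorded_acquisition(h, True) for h in households)


def annual_quantity(months_consumed, reported_monthly_average, average_over):
    """Annual quantity implied by a routine-month answer.

    average_over is "consuming_months" or "all_months", i.e. how the
    respondent understood the question about the usual monthly amount.
    """
    if average_over == "consuming_months":
        return months_consumed * reported_monthly_average
    if average_over == "all_months":
        return 12 * reported_monthly_average
    raise ValueError("average_over must be 'consuming_months' or 'all_months'")


print(true_mean, filtered_mean)
print(annual_quantity(3, 8.0, "consuming_months"),
      annual_quantity(3, 8.0, "all_months"))
```

With these made-up figures the filter question halves the measured mean. The same routine-month answer (3 months, 8 kg per month) implies either 24 kg or 96 kg per year depending on which interpretation is applied.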

As can be seen in Table 1, 11 percent of the assessment surveys suffer from the

use of the three types of rule-out leading questions. The collection of data on

food harvested rather than food consumed from home production is a relatively

rare problem, which affects only 2 percent of the surveys. A full 14 percent of

the surveys had problems of ambiguity in what is to be reported, which likely

leads to incomplete enumeration for some households. The problem of

ambiguity in expected reporting for routine month surveys was identified in 8

percent of the surveys. Overall, 25 percent of the surveys had not met the

reliability criterion for completeness of enumeration, that is, they were affected

by some of the identified problems of incomplete enumeration. Note that the

large majority of the surveys with those types of accounting problems are

interview surveys.

2.4. Meal participation

The size of a household is only a proxy for the number of food consumers in the

household during the reference period. Per-capita measures of food consumption

should be based on the number of people who actually take part in the meals

(food partakers) in the household (Smith, Dupriez, and Troubat, 2014; Weisell

and Dop, 2012). People other than household members who could take part in the

household’s meals include employees who had their meals in the household,

guests, and visitors. The number of food partakers should exclude household

members not present in the household during the reference period.

Adjustment for food partakers may not be an issue for poverty measures, but

collecting that information is essential for the analysis of habitual per capita food

consumption and food security estimates. Indeed, nutrient inadequacy is assessed

with reference to requirements that are expressed on a per-person basis.

Household surveys collect information on total amount of food consumed by

households over a certain reference period. To convert this information to a per

capita basis, it is important to account for meal participation in the household.

The most common way to do this is to consider the number of people who

consumed the total amount of food reported by the household.

Box 3

Estimating average per capita dietary energy consumption

Based on household size and the number of partakers, per capita DEC of a household i can

be calculated in two ways. First by dividing the total number of daily calories consumed in

a household by the exact number of people who participated in the meals

$$DEC_i^{P} = \frac{\mathit{TotalDEC}_i}{\mathit{Partakers}_i}$$

or, if the above is not available, by dividing total household calories by the number of

household members

$$DEC_i^{HH} = \frac{\mathit{TotalDEC}_i}{\mathit{HHsize}_i}$$

In the latter case, per capita consumption is misestimated whenever the number of partakers

differs from household size. When food is provided also to non-household member

partakers, the total food consumption in the household increases. The household’s mean

consumption should be correctly calculated by dividing total household food consumption

by household size plus additional partakers minus absent household members. In omitting

the additional partakers from the calculation, the denominator is smaller and the

household’s mean consumption is overestimated. If absent household members are not

subtracted from household size, the denominator is higher and household’s mean

consumption is underestimated.
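The two calculations in Box 3 can be sketched as follows. This is a minimal illustration, assuming that guests and absent members are counted as full-period partakers. The function names and the figures are hypothetical and do not come from any of the surveys discussed.

```python
def dec_per_partaker(total_dec, hh_size, guests=0, absent=0):
    """Per capita DEC using partakers: household size plus guests minus absentees."""
    partakers = hh_size + guests - absent
    if partakers <= 0:
        raise ValueError("number of partakers must be positive")
    return total_dec / partakers


def dec_per_member(total_dec, hh_size):
    """Per capita DEC using household size only (fallback when partakers are unknown)."""
    return total_dec / hh_size


TOTAL_DEC = 10_000  # hypothetical kcal consumed per day in the household
HH_SIZE = 4

# One guest shares the meals: dividing by household size overestimates.
with_guest = dec_per_partaker(TOTAL_DEC, HH_SIZE, guests=1)
# One member is absent: dividing by household size underestimates.
with_absent = dec_per_partaker(TOTAL_DEC, HH_SIZE, absent=1)
by_size = dec_per_member(TOTAL_DEC, HH_SIZE)

print(with_guest, by_size, round(with_absent, 1))
```

Surveys that record the number of meals or days rather than persons would need a fractional partaker count. The function accepts non-integer values for `guests` and `absent` for that reason.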

However, there is no standard approach to capturing information on meal

participation in households, and many surveys fail to collect that information.

Smith, Dupriez, and Troubat (2014), in their survey assessment, find that only 15

of 100 surveys ask whether non-household members were present or consumed

meals in the household during the recall period. Even within this small number,

there is variability in the additional information collected, with 11 asking about

the number of visitors in the household, 10 about the number of meals they

consumed, and a handful also asking information on the length of stay, type of

meals they consumed, and age and sex of the guests. Fiedler and Mwangi (2016)

provide a thorough description of approaches to collecting information on partakers. Reviewing

17 recent surveys, they found 16 different possible questions combined in various

ways, with no two countries collecting the same information on partakers (Table

2).

Table 2

Approaches to collecting information on partakers in 17 household consumption and expenditure surveys

| Topic | Data item collected | No. of HCES collecting |
|---|---|---|
| A. Meals | 1. Usual number of meals eaten daily | 7 |
| A. Meals | 2. Type of meal eaten (breakfast, lunch, dinner, snack) | 1 |
| A. Meals | 3. Type of meal eaten away from home (breakfast, lunch, dinner, snack) | 3 |
| A. Meals | 4. Total number of meals served | 2 |
| B. Person-specific data: B1. Household members | 5. Present during the reference period? (yes/no) | 8 |
| B. Person-specific data: B1. Household members | 6. At least 1 meal eaten at home during the recall period? | 10 |
| B. Person-specific data: B1. Household members | 7. Number of days ate in the household during recall period? | 2 |
| B. Person-specific data: B1. Household members | 8. Meals eaten away from home? (yes/no) | 2 |
| B. Person-specific data: B1. Household members | 9. Number of meals away from home | 2 |
| B. Person-specific data: B1. Household members | 10. Number of days away from home | 1 |
| B. Person-specific data: B2. Non-household members/guests | 11. Were any guests present during the reference period? (yes/no) | 8 |
| B. Person-specific data: B2. Non-household members/guests | 12. Number of guests present | 7 |
| B. Person-specific data: B2. Non-household members/guests | 13. Number of days guests were present | 4 |
| B. Person-specific data: B2. Non-household members/guests | 14. Number of meals served to guests | 4 |
| B. Person-specific data: B2. Non-household members/guests | 15. Type of meals served to guests | 1 |
| B. Person-specific data: B2. Non-household members/guests | 16. Characteristics of the guests (age, gender) | 7 |

Household Consumption and Expenditure Survey data for developing more detailed estimates of meal attendance by number and type of meal, and number and level of participation

Source: Fiedler and Mwangi (2016).

Partaker correction should in theory have no impact on the overall sample mean of per capita food consumption because positive and negative deviations from the household size balance out.¹⁸ On average, the household size should be equal to the number of partakers. The multivariate analysis of 81 HCES¹⁹ that were conducted between 1988 and 2014 (Conforti, Grünberger, and Troubat, 2017) indicated no significant difference between the mean and the coefficient of variation of DEC per capita accounting for partakers, when controlling for other survey characteristics. However, there is empirical evidence to believe that not accounting for partakers distorts the distribution of per capita DEC.

In analyzing surveys from Kenya and the Philippines that collect information on partakers, Bouis, Haddad, and Kennedy (1992) and Bouis (1994) show that the relative difference between mean DEC of the first and fourth quartile is much lower when partakers are accounted for. Similarly, using an urban survey, Gibson and Rozelle (2012) show how using the roster of meal partakers lowers the apparent calorie availability of the richest quartile by 7 percent and raises the calories of the poorest, in cases in which this pattern results from a coping strategy of the poor, which is to visit their wealthier kinfolk at meal times. Those studies, therefore, provide evidence that the variability of DEC conditional on household income is lower if data are adjusted for partakers.

18 If meals consumed in another household have a corresponding entry as meals given to another household.

19 Surveys analysed by the FAO food security analysis team from 2006 to 2014, using the ADePT-FSM software developed jointly by FAO and the World Bank (Moltedo et al., 2014).
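The mechanism described above, in which the overall mean is unchanged but the gap between rich and poor narrows, can be reproduced with two made-up households. This is a minimal sketch with hypothetical figures. It assumes that the meal eaten by a poor household member at the wealthier household is recorded on both sides, as footnote 18 requires.

```python
# Hypothetical pair of households: one member of the poorer household eats
# all meals at the wealthier household during the reference period.
households = {
    "wealthier": {"total_dec": 12_000, "hh_size": 4, "guests": 1, "absent": 0},
    "poorer": {"total_dec": 6_000, "hh_size": 4, "guests": 0, "absent": 1},
}


def partakers(h):
    """Number of people who actually shared the household's food."""
    return h["hh_size"] + h["guests"] - h["absent"]


per_member = {k: h["total_dec"] / h["hh_size"] for k, h in households.items()}
per_partaker = {k: h["total_dec"] / partakers(h) for k, h in households.items()}

# Person-weighted mean: total calories over total people, under each denominator.
total_dec = sum(h["total_dec"] for h in households.values())
mean_by_size = total_dec / sum(h["hh_size"] for h in households.values())
mean_by_partakers = total_dec / sum(partakers(h) for h in households.values())

ratio_unadjusted = per_member["wealthier"] / per_member["poorer"]
ratio_adjusted = per_partaker["wealthier"] / per_partaker["poorer"]

print(mean_by_size, mean_by_partakers)
print(ratio_unadjusted, ratio_adjusted)
```

Because the guest in one household is the absentee in the other, the person-weighted mean is identical under both denominators, while the ratio between the two households falls from 2.0 to 1.2.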

Results in line with those of Bouis, Haddad, and Kennedy (1992) were confirmed

by a similar analysis conducted by FAO on food data collected in the 2010

Bangladesh survey (Grünberger, 2017b). In that survey, information was

collected daily on the number of people partaking in the meal, by gender and age

groups. The variability of DEC conditional on income was much lower once DEC

was adjusted for partakers, and the difference between household size and the

number of partakers increased monotonically with income (Figure 5). A clear

upward trend in the difference between per capita and partakers-adjusted mean

DEC can be observed between the bottom and the top decile.

Figure 5

Differences in household size and mean dietary energy consumption per capita

when adjusting for partakers

Source: Grünberger (2017b).

Researchers from the FAO Statistics Division analyzed five surveys that

collected information on partakers and found that the coefficient of variation of

DEC was systematically lower when household size was adjusted for partakers,

even though the five surveys used different approaches to collect data on

partakers (Table 3). In the 2010 Bangladesh Household Income and Expenditure

Survey, the respondents were asked to report daily on the number of people

present in the household and their demographic characteristics. In the Household

Socio-Economic Survey 2007-2008 of Mongolia, information on the number of

visitors and the number of days they stayed in the household was collected. In the

2007-08 Afghanistan National Risk and Vulnerability Assessment and a survey

conducted by Niger in 2011, respondents were asked to report on the number of

meals and number of days that visitors stayed in their house. Finally, in the

2010/11 Uganda National Household Survey, information was collected on the

number of people present in the household during the reference period. The

reference period for partakers sometimes differs from that of the food module, as was

observed for the Household Socio-Economic Survey 2007–2008 of Mongolia.

Despite the difference in approaches, in all five countries, the coefficient of variation of DEC is systematically lower when household size is adjusted for partakers. The method designed by FAO to correct for excess variability does not fully remove the excess variability that stems from the omission of partakers.²⁰ In terms of overall impact on the estimate of the prevalence of undernourishment, not correcting for partakers may lead to an over- or underestimation, as a higher per capita DEC may counterbalance the effects of the higher variability.

Table 3

Comparisons of the coefficient of variation of dietary energy consumption when

adjusting for partakers

| Survey | CV of DEC per capita without correcting for excess variability: not accounting for partakers | CV of DEC per capita without correcting for excess variability: accounting for partakers | CV of DEC per capita after correcting for excess variability: not accounting for partakers | CV of DEC per capita after correcting for excess variability: accounting for partakers |
|---|---|---|---|---|
| Afghanistan 2007/08 National Risk and Vulnerability Assessment | 0.36 | 0.35 | 0.25 | 0.24 |
| Bangladesh Household Income and Expenditure Survey | 0.32 | 0.24 | 0.26 | 0.22 |
| Household Socio-Economic Survey 2007–2008 of Mongolia | 0.48 | 0.46 | 0.32 | 0.30 |
| Niger 2011 Enquête Nationale sur les Conditions de vie des Ménages et l'Agriculture | 0.63 | 0.58 | 0.35 | 0.34 |
| Uganda 2010/11 Uganda National Household Survey | 0.63 | 0.61 | 0.38 | 0.35 |

CV = coefficient of variation.
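To make the size of the effect in Table 3 easier to read, the relative reduction in the coefficient of variation can be computed from the tabulated values. The sketch below uses only the figures reported in Table 3. The short survey labels and the helper name `pct_reduction` are illustrative choices, not part of the source.

```python
# Values from Table 3: (uncorrected, no partakers), (uncorrected, partakers),
# (corrected, no partakers), (corrected, partakers).
table3 = {
    "Afghanistan 2007/08": (0.36, 0.35, 0.25, 0.24),
    "Bangladesh 2010": (0.32, 0.24, 0.26, 0.22),
    "Mongolia 2007-2008": (0.48, 0.46, 0.32, 0.30),
    "Niger 2011": (0.63, 0.58, 0.35, 0.34),
    "Uganda 2010/11": (0.63, 0.61, 0.38, 0.35),
}


def pct_reduction(before, after):
    """Relative reduction in the coefficient of variation, in percent."""
    return 100 * (before - after) / before


for survey, (raw, raw_p, corrected, corrected_p) in table3.items():
    print(f"{survey}: "
          f"uncorrected {pct_reduction(raw, raw_p):.1f}%, "
          f"corrected {pct_reduction(corrected, corrected_p):.1f}%")
```

The reduction is largest for Bangladesh, the survey that recorded partakers daily, which is consistent with the text's point that the surveys used different approaches to collecting data on partakers.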

One must be careful not to conclude from this evidence that meal participation is

not an important issue; indeed, it still needs to be addressed.²¹ Household

socio-economic surveys that currently attempt to make those adjustments are few and

highly diverse. In several countries, the questionnaires appear to capture only a

portion of the requisite information and the results are likely subject to

considerable measurement error.

20 In principle, the FAO methodology should be able to correct for the excess variability arising from the non-adjustment for partakers; however, for two (Bangladesh and Uganda) of the five countries analysed, the coefficient of variation corrected based on DEC using household size and the coefficient of variation corrected based on DEC using partakers were found to be different.

21 This discussion is based on Fiedler and Mwangi (2016).

Furthermore, there are several reasons why it is believed that the importance of,

and need for, those adjustments is increasing; foremost is the secular, seemingly

universal trend of the growing practice of consuming food away from home. It is

noteworthy that, by implication, those studies may not provide an accurate

portrayal of the actual situation in several of the studied countries: i.e. the

findings may be false negatives regarding the importance of making adjustments

for meal participation. This is especially likely to be the case in countries where

there is greater travel away from home and where, more generally, there is a

more widespread practice of eating away from home. It is, therefore, difficult to

make any definitive assessments about the value of making adjustments for

meals, or about the feasibility or best practices of collecting the requisite

information to make the adjustments.

A review of select questionnaires of more recent household socio-economic

surveys has revealed that over the course of the past few years a number of

countries — India, Malawi, Mali, Niger, Nigeria, Uganda, and the United

Republic of Tanzania, among others — have introduced new questions to identify

who among the household’s members were meal partakers and which meals were

eaten away from home, and by whom. Several of the surveys have also inquired

about meals that households provide to non-household members. Those

modifications appear to have been motivated by concerns that food away from

home is increasing in frequency, is significantly underreported, and is distorting

the precision of food security analyses.

2.5. Food away from home

Consumption patterns are rapidly changing across the developing world, with

prepared and packaged meals and meals consumed outside the home taking an

ever-growing share of the households’ food budget. Amid rising incomes,

urbanization, women entering the labor force, and children eating at schools,

among various reasons, this trend is expected to persist as economies transition to

middle-income status (Maxwell and Slater, 2003; FAO, 2006; Popkin, 2008;

Smith, 2013; USDA, 2011). As food away from home gains importance, failing

to appropriately measure this component of food consumption and expenditure

will make comparisons of consumption patterns and poverty less and less

meaningful.

In the United States, the share of food away from home in total food expenditure

increased from 10 to 50 percent during the twentieth century. In urban China,

total expenditure on food away from home increased by 63 percent between

1995 and 2001 (Ma et al., 2006; see Figure 6). Household per capita

expenditure on food away from home rose at an average annual rate of 9.5

percent in China from 2002 to 2011, while the share of food away from home in

total food expenditure increased from 18.2 percent to 21.5 percent (You, 2014).

In Egypt and India, the prevalence of meals eaten away from home almost

doubled in less than 20 years.
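The growth figures quoted above mix total growth over a period with average annual rates, so a quick conversion helps when comparing them. The sketch below is a back-of-the-envelope calculation, not a result from the cited studies. It assumes annual compounding and counts 2002 to 2011 as nine annual steps and 1995 to 2001 as six.

```python
def cumulative_factor(annual_rate, years):
    """Total growth factor implied by a constant annual growth rate."""
    return (1 + annual_rate) ** years


def implied_annual_rate(total_growth, years):
    """Constant annual growth rate implied by total growth over a period."""
    return (1 + total_growth) ** (1 / years) - 1


# China, 2002-2011: 9.5 percent per year in per capita expenditure on food
# away from home (You, 2014), compounded over an assumed nine annual steps.
china_factor = cumulative_factor(0.095, 9)

# Urban China, 1995-2001: 63 percent total growth (Ma et al., 2006),
# spread over an assumed six annual steps.
urban_china_rate = implied_annual_rate(0.63, 6)

# Share of food away from home in total food expenditure, in percentage points.
share_change = 21.5 - 18.2

print(round(china_factor, 2), round(100 * urban_china_rate, 1), round(share_change, 1))
```

Under these assumptions, per capita expenditure on food away from home in China more than doubled over the period, while its share in the food budget rose by about 3.3 percentage points.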

Figure 6.

The rapid rise of food away from home in the United States and China

Source: Calculated by the Economic Research Service, USDA, from various data sets from

the U.S. Census Bureau and the Bureau of Labor Statistics. USDA-ERS (26/1/2016) (left

panel); Gibson (2016) (right panel).

Taking into account food away from home consumption is particularly

important for measuring calorie consumption, as food consumed outside the

home tends to be more calorie-dense than food consumed at home (Poti,

Duffey, and Popkin, 2011; Mancino, Todd, and Lin, 2009) and the amount of

food consumed away tends to increase more rapidly, in line with

increases in income.

Figure 7.

Energy intake from food away from home (%)

Sources: R. Vakis, Improving measurement of Food Away from Home (FAFH) – presentation

Food away from home has been found to contribute as much as 36 percent of

the daily energy intake among men in urban Kenya, and 59 percent among

women in urban Nigeria (Oguntona and Tella, 1999; van’t Riet et al., 2003).

Among the younger population, food away from home contributes, for example,

to 18 percent and 40 percent of the daily energy intake among Chinese children

and school-going adolescents in Benin, respectively (Liu et al., 2015; Nago et al.,

2010).

Most nationally representative household surveys have not kept up with the pace

of change in food pathways and collect very limited information on food away

from home. Smith, Dupriez, and Troubat (2014), when assessing the relevance

and reliability of their sample of 100 surveys, found that 90 percent of the

surveys consider food away from home in some form, but that most of the

approaches are “ad hoc and unsatisfactory.” For example, 25 percent of the

surveys aim to capture all related household consumption from food away from

home using just one question; one in five surveys considers multiple places of

consumption; only 35 percent take snacks explicitly into account (when most

snacking is expected to take place out of the home); and close to 50 percent of the

surveys do not include food away from home received in kind.

While it is widely recognized that food away from home is subject to

considerable measurement error, the exact amount it contributes to

underestimating consumption is unknown. However, as food away from home is

expected to increase as a proportion of total food consumed and total food

expenditure, if no changes are made in the way that information is collected, the

magnitude of that underestimation is expected to increase. As it does, it will

exacerbate the instability of estimates of food insecurity and under-nutrition

based on household consumption and expenditure surveys, as currently measured

(Tandon and Landes, 2011; D’Souza and Tandon, 2014; Smith, 2015), obfuscate

trends, and prompt more researchers to question whether even the general order

of magnitude of the estimates of global under-nutrition should be accepted

(Banerjee and Duflo, 2011). The inadequate collection of food away from home

data urgently needs to be better understood and systematically improved.

Only a few studies have analyzed the implications that failing to account for food away from home can have on food security analysis.²²,²³ In a study conducted in India, Smith (2013) argues that the great Indian calorie debate, originated by an apparent increase in undernourishment at a time of falling poverty rates, can be partly explained by inaccurate data on calorie intake because of the lack of measurement of food away from home. Similarly, Borlizzi, del Grossi, and Cafiero (2017) show in Brazil how the distribution of food consumption by income strata changes once food consumed at school is taken into account. In particular, they show that proper accounting for food received through a school feeding program targeted at the poorer strata of the population results in a more equal distribution of food consumption than previously thought. Capturing food away from home increases mean DEC, as it is an important food source, especially in urban areas. Smith (2015) shows that food away from home is positively correlated with the estimated mean dietary energy consumption. In many household consumption and expenditure surveys, food away from home is only measured in terms of monetary value. However, as meals eaten outside the home are different from meals at home (Rimmer, 2001), the conversion of monetary value into calories can be misleading if home food consumption is used as a benchmark to calculate calories from food away from home.

22 With obesity increasingly becoming a pressing health issue in some middle-income countries, the link between eating out and obesity is also drawing attention in the developing world (Bezerra and Sichieri, 2009).

23 The literature on food away from home in the developed world has a longer history in which the main focus has been on health and nutrition issues. There is widespread interest in studying the differences in the caloric and nutritional composition of the food provided by commercial outlets relative to home-made food, with the objective of understanding the health consequences of eating out (Vandevijvere et al., 2009). In particular, the high calorie concentration found in certain meals has raised concern, giving rise to a body of research devoted to understanding the link between obesity and eating out, among other health outcomes (Burns, Jackson, and Gibbons, 2002; Guthrie, Lin, and Frazao, 2002; Kant and Graubard, 2004; Binkley, Eales, and Jakanowsky, 2004). There is also interest in establishing food-based dietary guidelines to prevent obesity and related chronic diseases developed later in life (Phillips et al., 2013).
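The risk in converting monetary values into calories can be shown with a short calculation. The sketch below is illustrative only: the expenditure figures and the premium factor of 2 (meals outside the home costing twice as much per calorie as food at home) are hypothetical assumptions, not estimates from the literature cited above.

```python
def kcal_from_expenditure(expenditure, cost_per_1000_kcal):
    """Convert a monetary value into calories at a given cost per 1,000 kcal."""
    return 1000 * expenditure / cost_per_1000_kcal


# Hypothetical household: at-home food costing 60 currency units provides
# 120,000 kcal, so the at-home benchmark is 0.5 units per 1,000 kcal.
home_cost_per_1000_kcal = 60 / (120_000 / 1000)

fafh_expenditure = 20   # hypothetical spending on food away from home
fafh_premium = 2.0      # hypothetical: twice the at-home cost per calorie

kcal_home_benchmark = kcal_from_expenditure(fafh_expenditure, home_cost_per_1000_kcal)
kcal_with_premium = kcal_from_expenditure(
    fafh_expenditure, home_cost_per_1000_kcal * fafh_premium)

print(kcal_home_benchmark, kcal_with_premium)
```

With these figures the at-home benchmark credits the household with twice as many calories from food away from home as the premium-adjusted conversion. The size and even the direction of the bias in practice depend on the true relative cost per calorie, which the monetary value alone does not reveal.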

Using data for Peru, Farfán, Genoni, and Vakis (2017) evaluated the impact of

accounting for food away from home on poverty and consumption inequality

estimates. They show that from a theoretical point of view the direction of the

effect on poverty or inequality cannot be predicted ex ante. Empirically, they

demonstrate that failure to adequately capture food away from home may

generate serious biases in estimates of households’ expenditure patterns and

welfare measures and may change the underlying profile of the poor.

There is considerable evidence about a large variety of socio-economic factors

associated with eating away from home. Measurement error from neglecting food

away from home may, therefore, bias welfare analyses along those lines. In

several countries, geography, household size and composition have been shown

to be systematically related to the incidence and level of food away from home

consumption and expenditure (Meenakshi and Ray, 1999; Mihalopoulos and

Demoussis, 2001; Yen and Jones, 1997; Mutlu and Gracia, 2006; Meng et al.,

2012). Households composed mainly of the elderly have also been found to have

lower probabilities of relying on, and lower expenditure on, food away from

home (Redman, 1980; Meng et al., 2012; Liu et al., 2015).

Conceptual and practical challenges make integrating food away from home in

household surveys a complex exercise. First, a clear definition of what is meant

by food away from home is needed.

Figure 8 contains an outline of a useful way to conceptualize and measure food

away from home along with some key measurement issues to consider. Food

away from home can refer to food produced outside, regardless of whether the

food is consumed outside or inside the home. In that case, takeout meals would

be considered food away from home. Alternatively, it can refer to food consumed

outside irrespective of the origin of the food. Under that scenario, home-made

meals consumed at work or at school would be a component of food away from

home. While there is a general preference towards defining food away from

home based on the place of preparation of the food, a clear, well-defined protocol

that takes into account all the different pieces is required regardless of

the concept that is adopted. A second element to consider when collecting

information on food away from home is snacks, which in modern eating habits

are more likely to be consumed outside the home. Finally, there can be different

modes of acquisition of the food, including purchased food or food received in

kind, each of which can originate from multiple sources, such as from

commercial establishments, social programs, and other households. While a great

deal of attention has been paid to food that household members purchase and, to a

lesser extent, food (or meals) that household members receive free as part of a

social intervention (most commonly a school meal), there is evidence from China

and India that “hosted” meals provided free to friends or relatives are also an

important, distinct category (Bai et al., 2010; Fiedler 2015). In China, “hosted”

meals were found to account for nearly 50 percent of all food away from home

and to be disproportionately important for lower income groups. In India, they

accounted for 29 percent of all meals away from home, and 36 percent of all

persons with at least one meal away from home reported having at least one

hosted meal provided by another household.

Figure 8

Defining food away from home (FAFH)

Source: Smith, Lisa C. and Timothy R. Frankenberger. 2012. Typology of food away from home. TANGO International, Tucson, AZ.

Of the issues covered in the guidelines, this is probably the area in which it is most

difficult to identify one set of agreed-upon international practices. The discussion

that follows is centred on a food away from home definition based on where the

food is prepared. The inclusion of food prepared at home but eaten outside would

most likely result in double counting as the ingredients would already be


accounted for under the category food available in the household from different

sources. In addition, while for food at home, the main food preparer is likely

to be adequately informed about the food consumed by all household members,

no one individual will be in such a privileged position to report about the food

consumption patterns of other household members away from home. Food away

from home may, therefore, need to be captured at the individual level,

interviewing different respondents when possible.24

An additional question relates to what information to collect from respondents.

For nutrition and food security analysis, it is important to have information

(from within the survey or from other sources that can be integrated into the

survey) on what is eaten, which is a challenge for food away from home, as meal

content is likely unknown to the consumer.25 Options include: differentiating

between meal, drink and snacks; asking by eating occasion (breakfast, midday

snack, lunch, afternoon snack, dinner); reporting the source of preparation

(commercial, government or social program, employer, other household);

differentiating by the type of establishment (fine dining, fast-food restaurant,

street vendor); reporting day of the week (or weekday or weekend). Note that

snacks can be as important as, if not more important than, some meals in terms of both calories

and expenditure. As a result, it is very important that they are adequately captured

in the data-collection instrument.

In recognition of those challenges, several low- and middle-income countries

have in recent years adopted innovative approaches to food away from home data

collection. Experiences from India, Peru and the West African Economic

and Monetary Union are detailed in Annex 2.26 In the United States, USDA

has recently introduced the Household Food Acquisition and Purchase Survey,

which collects data at the individual level through food books. Attention to food

away from home is ensured by including a reminder in the food books to record

meals, snacks and drinks consumed at a number of different outlets, such as at

school, work, a relative’s home, or recreational sites. Kirlin and Denbaly (2017)

provide a detailed account of data collection in this survey.

With all their differences, those approaches to collecting food away from home in

recent Household Consumption and Expenditure Surveys in such different

settings have a number of aspects in common that can be useful in developing

24 In a small-scale study in an urban slum in India, Sujatha et al. (1997) interviewed husbands and wives about the men’s dietary intake and found that women were not aware of the foods consumed by their spouses outside of their home. Similarly, Gewa, Murphy, and Neumann (2007) find that mothers of rural school-aged Kenyan children missed 77 percent and 41 percent of the energy intake originating from food away from home in the food shortage and harvest seasons, respectively (where food away from home contributes 13 percent and 19 percent of daily energy intake in each season). Collecting data from children is particularly challenging and not commonly done in household surveys in low- and middle-income countries. The report recommends a proxy respondent for children mainly because this is a widely accepted approach in practice and there is no established viable alternative implemented at scale. This is a topic that warrants further research given the growing importance of food away from home, and of publicly financed feeding programs, in children’s diets.

25 The caloric and nutritional content of meals consumed away from home can be sourced through complementary data sources, either purposely run surveys (such as the survey of food establishments conducted in Lima and described in Farfán, Genoni, and Vakis, 2017) or administrative data, such as in the case of school meals.

26 See Annex 2.


international standards for data collection directed at low-income countries: all

surveys collect data at the individual level and all surveys differentiate meal types

and make explicit reference to snacks.

2.6. List of food items

The design of the list of food items included in a survey affects how food

consumption is reported by respondents (Deaton and Grosh, 2000; Gibson,

2005). As a list with more items is likely to help respondents remember their

consumption or acquisition events more accurately, a longer and more

disaggregated food list results in higher reported consumption (Beegle et al.,

2012; Carroll, Crossley, and Sabelhaus, 2015). However, very long food lists can

quickly lead to greater respondent and interviewer fatigue (Gibson, 2005). The

choice of items to be included needs also to be country specific to reflect

differences in diets. An “optimal” food list length should balance the lower

memory lapses and the lower costs and interview time associated with short lists,

with the better recall and more comprehensive reporting associated with the

longer list.

Interview-based surveys, in which food items are predefined and listed, should

include a sufficient number of food items to help respondents accurately

remember what has been acquired or consumed or both. The list should fit all

households in a given population, from poorest rural to richest urban households

consuming a very wide variety of foods. Diary-based surveys can use a

predefined list or be open-ended. The sample of surveys analyzed in Smith,

Dupriez, and Troubat (2014)27 reveals substantial variability in the number of

food items in HCES globally. The mean number of food items in diary-based

surveys in their review is far higher than in interview-based surveys (229 and

102, respectively). The number of food items varies greatly across surveys, from a

low of 19 to a high of 677.

When the number of food items recorded by respondents is not predetermined,

the aggregated list of items is usually far longer than the predefined list, which, in

turn, affects the value of reported consumption. Tucker and Bennett (1988) report

that diaries of an experimental United States survey with preprinted items lead to

higher total expenditure estimates than blank fill-in diaries, especially for older

persons (see also Tucker, 1992).

Several studies in low- and middle-income countries indicate that longer

disaggregated lists result in higher food consumption reporting than shorter

aggregated lists. In the experimental study of Beegle et al. (2012), seven-day

recall interviews with lists of 58, 17 and 11 food items were tested; the list with

58 items returned the highest consumption estimate. Jolliffe (2001) reports that

for a survey in El Salvador, a longer food item list (72 items) resulted in 20

percent higher food consumption compared to questionnaires with only 18

aggregated food categories. Food data collected in a Jamaican experimental

27 The numbers reported herein exclude the Brazil diary survey, which is a significant outlier at

5,407 items. Many of the items are, however, simply similar items named or spelled differently.


survey show 26 percent higher consumption with 119 items compared to 37 items

(Statistical Institute and Planning Institute of Jamaica, 1996).

In Indonesia’s SUSENAS survey, questionnaires with extensive item lists (218

items) showed about 7 percent higher food consumption than questionnaires that

had only 15 aggregated food categories (Pradhan, 2009). The longer list required

a much smaller than proportionate increase in time relative to the increase in the

number of items, although published figures from the study combined the food

and non-food consumption lists (52 minutes for 23 items and 82 minutes for 320

items).
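
The Indonesian figures quoted above can be put on a per-item basis with a few lines of arithmetic. The sketch below is only an illustration: it uses the four numbers reported in the text (which combine food and non-food lists), and the variable and function names are introduced here for convenience.

```python
# Figures quoted in the text for Indonesia's SUSENAS experiment (Pradhan, 2009).
# They refer to the combined food and non-food consumption lists.
short_items, short_minutes = 23, 52
long_items, long_minutes = 320, 82


def proportional_increase(old, new):
    """Relative increase from old to new (0.5 means +50 percent)."""
    return (new - old) / old


item_increase = proportional_increase(short_items, long_items)
time_increase = proportional_increase(short_minutes, long_minutes)

# Average extra interview time for each item added to the list, in minutes.
marginal_minutes_per_item = (long_minutes - short_minutes) / (long_items - short_items)

print(f"Items: +{item_increase:.0%}, interview time: +{time_increase:.0%}")
print(f"Extra time per added item: {marginal_minutes_per_item * 60:.1f} seconds")
```

On these figures the list is roughly fourteen times longer while the interview is less than 60 percent longer, which works out at about six seconds per additional item.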

Similarly, in the United Republic of Tanzania (Beegle et al. 2012), reducing the

list length by as much as 80 percent resulted in a reduction in interview times by

only 17 percent (49 and 41 minutes for the 58- and 17-item lists, respectively).

Additionally, Bradburn (2010) has noted that grouping questions (and food types)

can help to minimize the cognitive effort for respondents to recall the requested

information, leading to lower recall error. This implies, for example, that food

away from home questions should be reported in a separate group.

Friedman et al. (2017) decompose the error response in food consumption

measurement into two different components: the omission of any consumption

item, and the error in reporting the value. The shorter food list has lower

consumption incidence as compared to a diary survey benchmark for the large

majority of food groups, while for consumption value (conditional on positive

consumption) the subset list module is characterized as having the most

over-reports of consumption in comparison to the diary benchmark.

Analyses of the Food Frequency Questionnaires (FFQs) often used by

nutritionists also support the use of longer lists. Wakai (2009) finds that long

FFQs (97+ foods) display higher correlation with weighed food records (r=0.42

to 0.52), than short FFQs (< 70 food items, r=0.31 to 0.45). Similarly, Henríquez-

Sánchez (2009) finds that FFQs with >100 food items correlated more strongly

with weighed food records (r=0.52) than FFQs with fewer than 100 items (r=0.47).

A longer, more qualitatively differentiated food list is preferable.

In summary, given that the incremental interview time required for additional

items is relatively small, having a relatively long list is recommended. On the

other hand, one should not underplay the possibility that a longer interview time

may prompt enumerators to take shortcuts in interviewing (Finn and Ranchhod,

2015) and respondents to refuse to participate or terminate the interview ahead of

the required time (Deaton and Grosh, 2000). This is particularly the case when a

questionnaire has a cascading structure, as respondents are more likely to omit

some expenditure in order to skip the follow-up questions (Kreuter et al., 2011). In the

following paragraphs, some criteria that can help practitioners deal with this

complex balancing act are discussed.

A major recommendation in drawing up a food list is to align the food list with

standard international classification systems. The United Nations COICOP

provides the reference international classification for individual consumption

expenditures. It is an integral part of the System of National Accounts, intended

for use in household consumption and expenditure surveys and for the


compilation of consumer price indices, as well as international comparisons of

gross domestic product and its component expenditures through purchasing

power parities. Between 2012 and 2017, COICOP went through a major revision,

resulting in COICOP 2018.28 This version, which was endorsed by UNSC in

March 2018,29 provides greater granularity as compared to the past, thanks to the

introduction of an additional level of detail (from a three- to four-level

structure).30 FAO actively participated in the process, leading the revision of

Division 01 on food and non-alcoholic beverages, particularly taking into account

and advocating the need to ensure relevance for low- and middle-income

countries. To supplement the official structure, and to guide countries expanding

Division 01 in their national versions, FAO also developed an official annex to

COICOP, which includes 307 additional food products at the fifth level. The

annex is included in the COICOP publication.

Another source for standard classification is FoodEx2, a comprehensive food

classification developed by the European Food Safety Authority, which was

recently updated to cover global needs (European Food Safety Authority, 2014).

FoodEx2 is an additional reference classification of particular relevance when a

survey is expected to be used to conduct nutrition analysis.

An important external reference to consider when drawing up a survey food list is

food composition tables. Each item in the food list must unambiguously match

an entry in the food composition table in order to convert food

quantities into nutrient quantities (FAO/INFOODS, 2012).31 This is extremely

important if the data are to be used for nutrition and food security analysis, as

calorie and nutrient composition can only be derived if the food items can be

matched at the analysis stage to a corresponding relevant food composition table.

The conversion table should ideally be finalized before the survey goes to the

field to ensure that the breakdown of the food items is compatible and that the

availability of nutrient content data can inform the design of the survey food

list.32
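
A pre-fieldwork check of the food list against the food composition table can be sketched as follows. This is a minimal illustration under stated assumptions: the item names, table codes, and energy values are hypothetical, and a real table would carry many nutrients and edible-portion adjustments that are ignored here.

```python
# Hypothetical food composition table: kcal per 100 g of edible portion.
# Codes and values are illustrative only, not taken from any published table.
FOOD_COMPOSITION_KCAL_PER_100G = {
    "FCT001": 360.0,  # e.g. rice, milled, raw
    "FCT002": 340.0,  # e.g. wheat flour
    "FCT003": 160.0,  # e.g. cassava, fresh
}

# Hypothetical survey food list: item name -> candidate table codes.
SURVEY_FOOD_LIST = {
    "Rice": ["FCT001"],
    "Wheat flour": ["FCT002"],
    "Roots and tubers": ["FCT003", "FCT004"],  # too aggregated: two candidates
    "Other cereals (specify)": [],             # no entry at all
}


def check_food_list(food_list, composition_table):
    """Separate items with exactly one valid table entry from all the others."""
    matched, problems = {}, {}
    for item, codes in food_list.items():
        valid = [code for code in codes if code in composition_table]
        if len(codes) == 1 and len(valid) == 1:
            matched[item] = valid[0]
        else:
            problems[item] = codes
    return matched, problems


def energy_kcal(item, grams, matched, composition_table):
    """Convert a reported quantity in grams into kilocalories."""
    return grams / 100.0 * composition_table[matched[item]]


matched, problems = check_food_list(SURVEY_FOOD_LIST, FOOD_COMPOSITION_KCAL_PER_100G)
print("Unambiguous matches:", matched)
print("Items needing redesign:", sorted(problems))
```

Items that end up in `problems` are the ones for which the survey design team would need to either disaggregate the list entry or agree on a documented matching rule before going to the field.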

The length and composition of the food list should be formulated, bearing in

mind how data are supposed to be used. From a welfare perspective, it is essential

that the items representing the large majority of food expenditures are included.

From a nutrition perspective, the food items that are important sources of

nutrients in individual diets must be included; items that contribute little to the

understanding of individual nutrient intake are less important. As a result, welfare

and nutrition requirements do not necessarily correspond. Accordingly, choices

pertaining to the food list tend to be “topic oriented”, even if surveys are meant to

be designed for a wide range of users. A nutrition-oriented food list is likely to

28 COICOP 2018 can be downloaded at https://unstats.un.org/unsd/statcom/49th-session/documents/BG-Item3l-Classification-2-E.pdf

29 Official Records of the Economic and Social Council, 2018, Supplement No. 4 (E/2018/24-E/CN.3/2018/3).

30 Sixty-eight new classes at the third level and 337 new subclasses at the fourth level.

31 It is also recommended that countries invest in the development of good reference food composition tables and in keeping them up to date.

32 If a food composition table is available, it can also be used to pre-program computer-assisted personal interviewing software to perform built-in checks for excessive consumption and speed up data analysis and cleaning.


include more items than a welfare-oriented list. One common solution in HCES

is to list the most common food items consumed by the population, and include

“other, specify” items in each category in the parts of the survey in which

acquisition or consumption of additional food items can be recorded. However,

this entails additional challenges if the intention is to estimate nutrient contents,

as the matching with a food composition table becomes uncertain.33

Finally, when the objective is to collect data in order to evaluate the impact of a

nutrient fortification program, the food list should include all food items that are

directly fortified with such nutrients and their products. For example, if there is

an interest in assessing the nutritional impact of fortified wheat flour, the list of

foods should include fortified wheat flour and the products made with that type of

flour, such as bread, biscuits, and pies.

As diets evolve, food lists must be regularly updated to reflect dietary changes.

This is particularly relevant in urban areas where a wider variety of foods is

typically eaten, and processed foods34 and prepared foods form a larger share of

the diets (Popkin, Adair, and Ng, 2012) and budgets. Again, a nutrition

perspective entails a higher specificity of the food list, including and

distinguishing different levels of food processing, from minimally processed

foods such as yogurt, cheese, bread and frozen vegetables, to highly-processed

and ultra-processed foods rich in sugar, fat, and salt, which have been shown to

be associated with obesity and other diet-related diseases.35 Using data from

Brazil, for instance, a recent paper by Louzada et al. (2017) concludes that HCES

hold potential in reporting consumption of ultra-processed foods, as there is

substantial convergence between the data collected in an individual dietary intake

survey and HCES data in terms of relative energy consumption from ultra-

processed foods.

2.7. Non-standard units of measurement

To be meaningful, food quantities collected through HCES need to be

standardized; this allows for the aggregation and comparison of consumption

across food items and geographical areas, or for translation into nutrient

content.36 In many surveys, units are not standardized, especially in those

33 See Fiedler and Mwangi (2016) for a discussion on this issue.

34 Processed food is defined as: “Any food other than a raw agricultural commodity and includes any raw agricultural commodity that has been subject to processing, such as canning, cooking, freezing, dehydration, or milling” (USDA, 1946).

35 The Brazilian dietary guidelines include a classification of foods by level of processing. See, for example, Brazil, Ministry of Health (2015).

36 Estimating the quantities of foods consumed away from home poses its own set of issues, mainly concerning the paucity of data broken down by individual food items. Nevertheless, quantities can be estimated if respondents report on the foods and dishes that were consumed rather than only their total expenditures (see Smith and Subandoro (2007) for the detailed methodology to be used when this information is available).


conducted in sub-Saharan Africa where non-standard units are commonly used in

daily life (Deaton and Dupriez, 2011).

Smith and Subandoro (2007) have identified seven primary methods for

collecting information about consumption quantities, and advocate using a

combination of those methods, as one method may be the optimal solution for

certain items, but it may not be appropriate for others.

1. Metric (i.e. standard) units: Respondents report quantities in metric

units, such as kilograms or liters.

2. Monetary value: Respondents report the monetary value of the quantity

consumed.

3. Local (i.e. non-standard) units, such as piles, baskets, or bunches.

4. A count of pieces: a type of non-standard unit.

5. Volumetric equivalents: respondents demonstrate how much space the

food they consumed would take up.37

6. Linear dimensions: respondents provide linear measurements (length

and width or circumference) for the amount of food consumed. As

Smith and Subandoro (2007) point out, this method likely takes more

time to complete, as it requires physical measurement rather than a

simple vocal response.

7. Food models: respondents choose a two- or three-dimensional

depiction of a food item that best corresponds to their consumption.

Three-dimensional models can provide very accurate estimates, but it

can be costly to prepare the models and calculate their weights.

For all but the first method, additional data are required to convert the reported

information into standardized, comparable (metric) quantities. Collecting

quantities in non-standard units or restricting respondents to only reporting in

standard units involves trade-offs in accuracy and feasibility. In the sample of

HCES analyzed by Smith, Dupriez, and Troubat (2014), the most common

method employed is requiring respondents to report in a metric unit of measure.

This method is usually the easiest to administer with the lowest budget and time

cost; it also requires the least amount of post-data processing, as the units are

already comparable across items.

However, restricting respondents to using only standard units may result in

inaccurate estimates, especially when respondents are not accustomed to using

them for all of the items they consume. Reporting quantities in non-standard units

is quite common for a wide range of commodities, especially in sub-Saharan

Africa. For example, in the second wave of the Ethiopia Socioeconomic Survey

from 2013 to 2014 in which non-standard units were allowed, nearly 50 percent

of farmers chose to report their harvests in non-standard units. In the Malawi

National Panel Survey, respondents chose non-standard units about 73 percent of

the time. This strongly indicates that many respondents are more comfortable

reporting quantities in non-standard units. Accordingly, allowing them to be used

37 An advantage of bounded recall is that the initial visit to begin the recall period also allows survey teams to distribute standardized volumetric containers, such as an empty sack. This can be especially helpful in cases in which bulky root crops or plantains are dietary staples because a rural household might fill a sack several times over in the course of a week, with root crop consumption of 50 kg or more. The Papua New Guinea survey used by Gibson (2001) distributed empty 25 kg sacks and these were the preferred non-standard units for all root crops and vegetables, with local weighing trials for converting sacks (and partial sacks) into kilograms.


eases the burden on respondents in terms of memory recall and conversion

calculations, improving the accuracy of the resulting data.

Recent studies show that asking respondents to combine memory recall with

cognitive tasks, such as abstracting consumption to a “typical week or month,”

leads to less accurate self-reporting (Beegle et al., 2010). The forced conversion

from non-standard to standard units similarly combines cognitive and memory

recall; respondents must (a) have a clear understanding of what a standard unit of

the food item is (e.g. how much is a kilogram of rice); (b) estimate how many

standard units correspond to the non-standard unit they are familiar with; and (c)

use the conversion from step (b) to convert the quantity consumed into standard

units. The three stages place a cognitive burden on the respondent and can lead to

a sizable measurement error. It is also common practice for such calculations to

be conducted in-situ, often on-the-fly (as the respondent makes the calculations in

their head, perhaps prompted or assisted by an interviewer), further increasing the

likelihood for error.

Accordingly, the most important and overriding benefit of allowing respondents

to report quantities using non-standard units is that they will likely report more

accurately on consumed quantities. Results from a recent methodological study of

land area measurement support the preference for non-standard units. Carletto et

al. (2016) find that when respondents are allowed to report land area in non-

standard units – instead of being forced to convert an area to standard units – the

self-reported estimates are much more accurate. The additional costs and

challenges associated with this method broadly fall into two categories: those

associated with preparation and implementation of a survey with non-standard

units; and those associated with ensuring that non-standard unit measurements can

be converted into comparable standard units.

Before non-standard units can be used, country-specific information on common

non-standard units is needed to determine the list of allowable item-unit

combinations. Non-standard units must also be accompanied by standard unit

conversion factors. To directly compare and aggregate quantities, the data user

must convert all quantities into a common standard unit, such as kilograms.
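
The conversion step can be sketched as a lookup against the list of allowable item-unit combinations. The example below is a minimal illustration: the items, units, regions, and factors are hypothetical, and the use of a region-specific factor with a national fallback is an assumption of the sketch rather than a prescribed procedure.

```python
# Hypothetical conversion factors (kg per unit) keyed by (item, unit, region).
# A region of None holds the national fallback. All values are illustrative.
CONVERSION_KG = {
    ("maize flour", "pail", "north"): 4.8,
    ("maize flour", "pail", None): 5.0,
    ("tomato", "heap", None): 0.45,
    ("rice", "kg", None): 1.0,  # standard units pass through unchanged
}


def to_kilograms(item, unit, quantity, region=None):
    """Convert a reported quantity to kg, preferring a region-specific factor.

    Returns None when the item-unit combination is not on the allowed list,
    so the observation can be flagged rather than silently guessed.
    """
    factor = CONVERSION_KG.get((item, unit, region))
    if factor is None:
        factor = CONVERSION_KG.get((item, unit, None))
    if factor is None:
        return None
    return quantity * factor


# Each report is (item, unit, quantity, region).
reports = [
    ("maize flour", "pail", 2, "north"),
    ("maize flour", "pail", 2, "south"),  # no regional factor: national fallback
    ("tomato", "basket", 1, "north"),     # combination not on the allowed list
]
converted = [to_kilograms(*report) for report in reports]
print(converted)
```

Returning `None` for an unlisted combination keeps missing conversion factors visible in the data instead of hiding them behind an arbitrary default.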

Capéau (1995) and Capéau and Dercon (2006) suggest using econometric

techniques, which are relatively simple to implement, to compare unit values

and estimate conversion factors. The challenge with that approach is that unit

values can vary as a result of quality differences (Deaton, 1997), or from price

discounts on larger units (Attanasio and Frayne, 2006), or over time and space

because of the impact of transport and storage costs (Gibson and Kim, 2015). The

sources of variability in unit values that are unrelated to mass or volume can

result in distorted or imprecisely estimated conversion factors (Oseni, Durazo,

and McGee, 2017).
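
The intuition behind using unit values can be shown with a crude ratio calculation. This is not the econometric estimator proposed in the studies cited above; it is a simplified sketch with hypothetical purchase records, and it is exposed to exactly the quality, discount, and location effects just described.

```python
from statistics import median

# Hypothetical purchase records for one item in one market and period:
# (unit, quantity, amount paid). Values are illustrative only.
purchases = [
    ("kg", 2.0, 1.60),
    ("kg", 1.0, 0.80),
    ("kg", 5.0, 3.75),   # a bulk discount lowers this unit value
    ("heap", 1.0, 0.40),
    ("heap", 3.0, 1.20),
    ("heap", 2.0, 0.84),
]


def median_unit_value(records, unit):
    """Median of amount paid divided by quantity, for one unit of measure."""
    return median(paid / qty for u, qty, paid in records if u == unit)


# If both units price the same good, value per heap / value per kg
# approximates the number of kilograms in a heap.
kg_per_heap = median_unit_value(purchases, "heap") / median_unit_value(purchases, "kg")
print(f"Implied conversion factor: {kg_per_heap:.2f} kg per heap")
```

Using medians rather than means limits the influence of a single discounted or mispriced purchase, but it does not remove systematic differences in quality between goods sold by the heap and goods sold by the kilogram.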

An alternative method is to collect weights for each allowable item-unit

combination and use them to create conversion factors. This is a relatively

straightforward concept. However, there is a whole set of challenges involved

in properly implementing this concept. The standard weight for the same item-

unit combination can vary, even within a country. For example, Casley and

Kumar (1988) found that in Nigeria, an average “bundle” of sorghum weighed


between 26 and 49 kilograms, depending on the area. Thus, region-specific

conversion factors also need to be considered. Complicating matters further,

different levels of processing (fresh, dried or powdered) lead to different

conversion factors for the same food item.38 This method has also been recently

implemented in the context of a project of survey harmonization in the countries

belonging to the West African Economic and Monetary Union, following the

recent guidebook published by the World Bank (Oseni, Durazo, and McGee,

2017) (see Box 4 for details).

Box 4

Collection of conversion factors in the context of the West African Economic and

Monetary Union project

As part of the Regional Program to Harmonize and Modernize Living Conditions

Surveys in the West African Economic and Monetary Union, participating national statistical offices

collect consumption data that allow reporting in non-standard units of measurement.

To prepare for this, market surveys were conducted to establish conversion factors

from non-standard units, such as bowls and heaps, to standard units (kg, litre) for

commonly consumed food items. Following guidance from Oseni, Durazo, and

McGee (2017), the national statistical offices conducted market surveys ahead of the

main household surveys. In the spirit of harmonization, all countries involved in the

program followed the same procedure.

(a) Preparation phase: included a review of previous surveys and preliminary

market visits to identify products and commonly used non-standard units.

The resulting product-unit list – and options to report additional units found

– were programmed into Survey Solutions, a computer-assisted personal

interview software for data collection.

(b) Field work: was conducted in the post-harvest season to ensure wider

availability of products. Enumerators visited six markets (three rural, three

urban) in each region, using tablets to collect measurements from up to

three different vendors for each product-unit combination. They also took

photographs of each measurement which, thanks to the computer-assisted

personal interview application, were automatically linked to the

measurement data.

(c) Data analysis: included:

o Data cleaning to detect outliers and rearrange unit sizes according to

the actual weights;

o Calculation of national- and regional-level conversion factors using

the median measured weight for each product-unit combination; and

o Identification of the best photographs with actual weights closest to

each conversion factor.

The resulting library of non-standard units materials will support the main household

survey. Enumerators will use the photo albums to guide respondents in reporting food

consumption quantities and the conversion factors will be used to flag unreasonable

quantities for further verification, all ensuring greater data accuracy.
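
The data analysis step described in the box can be sketched as follows. The records, photo identifiers, and function names are hypothetical; only the use of the median measured weight per product-unit combination, at regional and national level, and the selection of the photograph closest to the conversion factor are taken from the description above.

```python
from collections import defaultdict
from statistics import median

# Hypothetical market-survey records:
# (product, unit, region, measured weight in kg, photo id).
measurements = [
    ("millet", "bowl", "north", 2.4, "p1"),
    ("millet", "bowl", "north", 2.6, "p2"),
    ("millet", "bowl", "north", 2.5, "p3"),
    ("millet", "bowl", "south", 3.0, "p4"),
    ("millet", "bowl", "south", 3.2, "p5"),
    ("millet", "bowl", "south", 9.9, "p6"),  # implausible weighing
]


def conversion_factors(records, by_region=True):
    """Median measured weight per product-unit (and, optionally, region)."""
    groups = defaultdict(list)
    for product, unit, region, kg, _photo in records:
        key = (product, unit, region) if by_region else (product, unit)
        groups[key].append(kg)
    return {key: median(weights) for key, weights in groups.items()}


def best_photo(records, product, unit, region, factor):
    """Photo whose measured weight is closest to the conversion factor."""
    candidates = [r for r in records if r[:3] == (product, unit, region)]
    return min(candidates, key=lambda r: abs(r[3] - factor))[4]


regional = conversion_factors(measurements)
national = conversion_factors(measurements, by_region=False)
south_factor = regional[("millet", "bowl", "south")]
print(regional, national)
print(best_photo(measurements, "millet", "bowl", "south", south_factor))
```

Because the median is used, the single implausible weighing in the southern markets does not distort the regional conversion factor, although in practice it would still be reviewed during data cleaning.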

38 This is similar to food crops harvested under different conditions, such as threshed, shelled, fresh,

and dried, which are proven to have a large impact on reported harvest quantities (Fermont and

Benson, 2011; Diskin 1999; Murphy, Casley, and Curry, 1991).


Unit conversion factors are often incomplete and lack supporting documentation,

which decreases the number of usable observations and makes it difficult to

cross-reference quantities or apply conversions across different datasets. Smith,

Dupriez, and Troubat (2014) pointed out that calculating metric food quantities is

feasible for only 53 percent of the surveys reviewed. Most of the difficulties

associated with this method can be addressed by ensuring that unit conversion

factor data are thoroughly collected and properly documented. When information

on the most common non-standard units used and on unit conversion factors is limited

or not available, the survey team must collect those data. This is most effectively

done by consulting with local experts and conducting a market survey prior to the

start of the regular data collection.

3. Conclusions and recommendations

This section presents the core of the guidelines, providing a set of recommended

practices for data collection. The recommendations are based on the literature,

empirical evidence, and considerations discussed in Section 2. The objective is

to promote the adoption of good practices, and encourage the abandonment of

some bad practices that are still employed in some surveys. Some of those

recommendations are straightforward and easy to implement and some are

grounded in firm empirical evidence, but some are based on balancing

incomplete pieces of evidence with practical considerations. Additional research

will be useful to reinforce the evidence base behind the entire set of

recommendations. A set of guidelines can, therefore, be extremely useful in

informing design decisions and fostering cross-country comparability in

approaches.

Such guidelines can only be seen as temporary as best practices will evolve and

depend on circumstances. As food consumption patterns evolve, statistical

systems change, and new technologies become available, survey design will

have to be adapted to stay relevant and cost-effective. It is anticipated that as

more survey methodological work is performed and new lessons are learned

from survey implementation, these guidelines will have to be periodically

revised to incorporate the new knowledge being generated and to respond to

additional or different data needs that may emerge. Furthermore, consumption

patterns change with income growth, changes in food technology and the

modernization of food systems, response rates tend to decline in higher income

economies and technology is already proceeding at a rapid pace with new

technologies relevant to survey operations entering the market every day. In

such a rapidly changing environment, the shelf life of these guidelines is

inevitably going to be limited. Accordingly, it would be desirable to revise and

update them at least every 10 years.

The overarching concern in drawing these recommendations has been to ensure

the most appropriate balance between accuracy and cost effectiveness, keeping

in mind the typical constraints facing low- and middle-income country statistical

offices. In some cases, the recommendations offered might entail a costlier

option than what is currently practiced while, in other cases, it might imply a

cost saving. The benefits should be assessed not only in terms of greater

accuracy and comparability of the data being collected, but also whether the data

can be made relevant for a wider user base (e.g. nutritionists and food security

analysts, as well as statisticians and economists).

While professionals from those four fields contributed to the preparation of the

guidelines, future revisions would benefit from the involvement of an even

broader range of scholars and experts. Food is part of our cultures and societies,

and the way in which society (and survey respondents) relate to food is mediated

by social and cultural constructs. Perhaps, anthropologists can be recruited to

help in devising questionnaires that are better able to incorporate social and

cultural aspects in data-collection processes and outcomes. Similarly,

psychologists can help in designing more effective survey approaches by

providing insights on the cognitive process behind how respondents answer

questions, when and how they shift from enumeration to estimation strategies,

and how survey design should take that into account when thinking about such

details as the recall period, the length of a list, and the sequencing of survey

modules.

One notable gap in the guidelines is the discussion of price data collection.

Prices are clearly an important element of any analysis of poverty, and are

essential for welfare comparisons across households, regions and time. Price

data collection is also a major goal of HCES when they are required to inform

CPI calculations. However, even just from a perspective of analyzing food

consumption, food prices are needed to properly value consumption quantities,

and obtaining information on prices is invaluable in cross-checking the

plausibility of reported values and quantities, especially when non-standard units

are used. Also, as food policy often relies heavily on price interventions,

measuring food prices is an important item for analysis in its own right.

However, price data collection was omitted from the guidelines because it has

implications that go well beyond food consumption, and also because it is such

an overarching topic, the view was taken that price data collection would

probably be best served by its own set of guidelines. It would have been difficult

to discuss and make recommendations about price data collection with reference

to food alone, abstracting from all the other demands on price data from other

uses and users.

It is also important to remind users that any change in method should have a

controlled comparison to “bridge” the effects of the old and new methodologies

on the resulting data. Concerns about losing comparability over time, and the

difficulty of explaining to the public the difference in estimates that come with

changes in methodology often prompt statistical offices to shy away from

changes in survey design, even when it is clear that there would be gains in

accuracy and cost savings in doing so. Building a controlled comparison in

survey planning would ease those concerns somewhat. From that standpoint, this

should be considered by statistical offices in low- and middle-income countries

and the donors and international agencies assisting them whenever the

implementation of a methodological change is being reviewed. One idea that

was put forward as part of the international discussion leading to the guidelines

was to create a global fund to support such controlled comparisons, as they are a

source of invaluable learning and hence might be considered a global public

good.

Box 5 contains a summary of priority research areas for the collection of data on

food consumption, based on gaps identified during the preparation of the

guidelines.

Box 5

A research agenda for food consumption data collection

The guidelines put forward clear recommendations based on existing evidence and

experience accumulated in survey practices over the past decades, but also recognize

that in several areas there is little sound methodological work to base

recommendations on. A list of priority areas based on the gaps that were identified

during the preparation of the document is provided below.

Food away from home is an area in which different approaches are emerging, but

there are only a limited number of experimental methodological studies available. Given

the increasing importance of this component in calories and expenditure, this is

probably the area in which the return to methodological research would be the largest. Several

national statistical offices have signaled their interest in participating in

methodological studies that would help improve the quality of the data they collect on

food away from home.

There are a number of different aspects of the choice between diary and recall, and

the length of the recall period that would benefit from more methodological studies.

Only a handful of experimental studies have been conducted on the topic in low- and

middle-income countries and a larger evidence base would be required to make the

extrapolation of results more robust. Specific questions in need of investigation in this

domain are:

• Bounding of recall. One concern with a short recall period (such as seven

days) is telescoping. “Bounding” the recall period for a household with another visit

to mark the beginning of the recall period can, in principle, help reduce telescoping

and improve the quality of the recall. While this idea has been around for many years,

it has not been formally tested and compared to unbounded recall in a low- and

middle-income country setting.

• Telephone interview aids. Telephones are increasingly used in surveys,

including in low- and middle-income countries, as the coverage of the mobile phone

network increases. One way in which phones could help in-person interviews is by

using follow-up phone calls to aid the filling of a diary, or the collection of a second

set of recall data.

• Multiple visits. One issue with seven-day recall discussed in the guidelines is

that the data are affected by “excess variability”. One way to reduce that is to perform

a second non-consecutive interview with the same households (in person or possibly via

telephone) (Gibson, 2016). This option is potentially attractive, but it has not been

tested at scale.

• Expand the evidence-base. In general, many of the conclusions in the

guidelines come from a small number of studies, with the SHWALITA dataset from

United Republic of Tanzania exerting probably an excessive influence on the current

consensus simply because of its uniqueness. Replicating more studies, in different

settings and regions with a similar set-up would help to expand the knowledge base

and provide more confidence when extrapolating results across countries.

There are other topics that might benefit from more research that the guidelines have

not touched upon. Some of them, which were also brought up during the global

consultation, include the measurement of food waste, the measurement of individual

consumption of specific population subgroups, such as children and women of

reproductive age, and the integration of different data sources. IAEG has called on

countries to consider setting up a global fund that could finance the implementation

of methodological studies and experiments to test and validate survey design options

in those domains.

Finally, as with all survey design choices, countries implementing these

recommendations should carefully evaluate the extent to which the adoption of

them may increase the burden on the respondent, how that risk can be mitigated,

and whether the return in terms of data quality is large enough to justify a

possible increase in non-response. As for the evaluation of survey costs, it is

impossible to evaluate in principle those trade-offs with any level of accuracy

and hence to be prescriptive about how to handle those survey design choices.

When implementing the guidelines in practice, however, care must be taken to

keep the overall length of the survey manageable so as not to compromise the

quality of the information collected.

This section is developed as follows: for each domain discussed in Section 2, a

summary of main findings is provided, followed by a set of recommendations.

3.1. Recall versus diary and length of

reference period

Summary

The trade-offs between diary and recall and between shorter and longer recall

periods have been highlighted in various survey experiments and analyses of

diary and recall approaches. In low-income economies, evidence suggests that

recall interviews are generally preferable to diary methods for capturing food

consumption when balancing implementation costs and the reliability of the

resulting estimates. The majority of the studies have found that food consumption

or monetary value data collected with recall interviews provides estimates that

are similar to or higher than those recorded in diaries. However, depending on the

implementation methods, diaries often show patterns of rapidly declining

consumption (and data quality) over the reference period. Lower and decreasing

consumption recorded in diaries is frequently attributed to respondents’ fatigue

and illiteracy in combination with poor supervision. Under close supervision,

diaries have proven to be reliable in several contexts and are sometimes

considered to be the gold standard, but when implemented with appropriate levels

of supervision (such as daily visits to households), they are generally costlier than

recall surveys; the detailed cost calculations by Beegle et al. (2012) suggest that

diaries are from 6 to 10 times more expensive than recall after taking into account

the fieldwork and the time-consuming coding and data entry requirements.

Recall surveys are affected by memory decay (memory loss) as the recall period

increases, and telescoping error (reporting of consumption outside of the recall

period) for shorter periods. The experimental evidence suggests that a seven-day

recall can perform as well as a diary in capturing food expenditures and their

variability. Recall periods that exceed 14 days are adversely affected by

significant memory decay, while diary fatigue already appears to be significant

after the first week. Regardless of how accurately they capture mean

consumption, surveys based on short recall periods (“snapshots”) always

overestimate the variability in habitual consumption.
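
The last point can be illustrated with a small simulation. The sketch below is not drawn from any of the studies cited: it assumes each household has a fixed habitual weekly consumption level and that observed weekly consumption fluctuates around it with independent, normally distributed noise. All parameter values are invented.

```python
import random
from statistics import mean, pvariance

rng = random.Random(0)  # fixed seed so the sketch is reproducible

N_HOUSEHOLDS = 2000
N_WEEKS = 52

snapshot = []   # consumption in the single surveyed week
habitual = []   # average weekly consumption over the full year

for _ in range(N_HOUSEHOLDS):
    # Assumed habitual (long-run) weekly consumption of the household.
    level = rng.gauss(100.0, 20.0)
    # Assumed week-to-week fluctuation around that level.
    weeks = [level + rng.gauss(0.0, 30.0) for _ in range(N_WEEKS)]
    snapshot.append(weeks[0])
    habitual.append(mean(weeks))

mean_snapshot, mean_habitual = mean(snapshot), mean(habitual)
var_snapshot, var_habitual = pvariance(snapshot), pvariance(habitual)

print(f"mean: snapshot {mean_snapshot:.1f}, habitual {mean_habitual:.1f}")
print(f"variance: snapshot {var_snapshot:.0f}, habitual {var_habitual:.0f}")
```

Under these assumptions the two means are close, while the one-week snapshot has a much larger variance, because it adds the within-household week-to-week fluctuation to the between-household differences in habitual consumption.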

For individual food items, short recall periods (such as a seven-day recall) are

affected on the one hand by underestimation of the incidence of consumption,

particularly for infrequently consumed items, and on the other by overestimation

of the value of consumption (conditional on positive consumption) because of

telescoping error. The recall error appears to be larger for less-frequently

consumed items and on short recall periods. The “usual month” consumption was

designed to deal with the conflict between a long reference period (to get

“typical” living standards) and a short recall period for feasible interviewing.

However, it has not worked as expected because “usual month” consumption

results tend to overestimate the incidence of consumption, particularly for

infrequently consumed items, and underestimate consumption values for staples.

Importantly, the “usual month” approach also imposes a significantly higher

burden on the respondent and results in longer interviews.

Bounding the recall period with an earlier visit and asking a household to recall

their consumption since the last visit of the enumerator is a possible option for

improving the quality of recall data. The evidence on the effectiveness of

bounding is limited. As this approach requires an additional visit to the

household, it is a costlier method. Accordingly, it cannot be recommended until

more research is performed to evaluate its benefits. Another approach gaining

ground recently is to complement seven-day consumption recall with data on the

last purchase in the past 30 days. The purpose of that approach is to better capture

the unit values of purchased food items. More methodological work is needed to

assess the performance of those approaches.

Recommendations

The following recommendations are provided for the choice of methods in

capturing food consumption in HCES surveys:

• Low-income countries are advised to adopt recall interviews and a

seven-day recall period, as this method provides the best balance

between accuracy and cost-effectiveness.

• Any survey using diary methods must be closely supervised to ensure

proper and consistent completion, especially in areas where illiteracy

rates are high. The reference period should not exceed 14 days.

Detailed metadata on how the diary was administered and supervised

should be distributed with the primary survey data.

• The “usual month” approach should not be used.

• Any change in the recall period or method (recall versus diary) should

be accompanied by an experimental component aimed at assessing the

impact of the change in survey estimates, which would allow for the

reconciliation of estimates before and after the change in methodology.

The studies by Backiny-Yetna, Steele, and Yacoubou (2017) and

Beegle et al. (2012) provide examples on how this can be done in

practice.

3.2. Seasonality, number of visits

Summary

Food consumption and expenditure can show systematic variation related to the

time of the year, month or week, as well as for agricultural seasons, holidays, and

festivals. Such seasonal patterns need to be considered in survey design and

analysis, as they are possible sources of significant bias and measurement error.

A survey that only captures food consumption or expenditure data in one period

of the year misses seasonal variations and is not likely to be representative of

habitual consumption throughout the year. Many surveys collect data using one

visit per household, concentrated over three to four months of fieldwork (48

percent of the surveys assessed by Smith, Dupriez, and Troubat, 2014). That

approach cannot capture seasonality effects and is, therefore, not recommended.

If it is adopted, maintaining consistency in the timing of the fieldwork is

imperative to ensure that the timing of each round does not affect comparability

in the estimates from one year to the next. It should be noted,

however, that even that provision may not be enough to ensure comparability

over time, as seasonal weather patterns and the dates of some important festivities

affecting consumption, such as Ramadan, may change from year

to year. One possible way to capture seasonality may be to implement multiple

visits on a subsample of households, but as that approach has not been tested

widely, it is not offered as a recommendation.

The only way to accurately capture habitual consumption for each household is to

survey them multiple times over the year, but this is also the most expensive

option and in practice, is difficult to implement. Data collection spread over the

year, but with only one interview per household (and using a short recall period)

results in an accurate estimate of average consumption for the population, but

excess variability around the mean (Deaton and Zaidi, 2002; Deaton and Grosh,

2000).

Recommendations

When interest is in analyzing the habitual consumption of a population through

an extended period of time (usually one year), it is recommended that seasonality

be taken into account in survey design. The two options to consider, in order of

preference are:

• Conduct one visit per household, spreading the sample over 12 months

of fieldwork. The overall sample should be stratified quarterly (e.g.

split the overall sample into 12 monthly subsamples in a manner that

allows them to be aggregated into quarterly, nationally representative

subsamples).

• Conduct two visits per household, when the timing of the visits is

scheduled to capture seasonal variations (for instance, the first visit

could be during the lean period and the second after the main harvest).
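
The first option can be sketched as a simple allocation of sample clusters to fieldwork months. This is a minimal illustration under stated assumptions, not a prescribed sampling procedure: the two strata, the cluster identifiers, and the function names are hypothetical, and the number of clusters per stratum is assumed to be a multiple of 12.

```python
import random

def allocate_to_months(clusters_by_stratum, seed=0):
    """Assign each cluster a fieldwork month (1-12), balanced within stratum."""
    rng = random.Random(seed)
    schedule = {}
    for _stratum, clusters in sorted(clusters_by_stratum.items()):
        shuffled = list(clusters)
        rng.shuffle(shuffled)  # random order within the stratum
        for position, cluster in enumerate(shuffled):
            schedule[cluster] = position % 12 + 1  # cycle through the months
    return schedule

def quarter(month):
    """Map a month (1-12) to its quarter (1-4)."""
    return (month - 1) // 3 + 1

# Hypothetical sample: two strata with 24 clusters each.
sample = {
    "urban": [f"U{i:02d}" for i in range(24)],
    "rural": [f"R{i:02d}" for i in range(24)],
}
schedule = allocate_to_months(sample)
```

Because months are assigned within each stratum, every quarter contains the same number of urban and rural clusters, so each quarterly subsample mirrors the stratification of the full sample.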

Countries should carefully consider using more than two visits because of the

higher cost and the difficulty in managing teams that are associated with more

visits. Implementing more than two visits is not impossible, but in several cases

in which it has been attempted, there have been implementation problems.

Respondents’ burden also increases more than proportionally with each

additional visit.

Regardless of the approach chosen, care should be exercised to ensure that

enumeration is equally spread throughout the days of the week and the month

and that changes in the timing of holidays, festivals, and harvests are taken into account.

3.3. Acquisition versus consumption

Summary

Surveys differ as to whether and how they capture food consumption and

acquisition. Typically, household budget surveys focus on collecting data to

construct CPIs, and as a consequence, recording food items acquired through

market purchases. However, as HCES are increasingly used for poverty and

food security analysis, emphasis has shifted towards also collecting data on

food items procured through own production, barter, gifts, and payments in

kind, which are particularly common in rural areas. Information on own

production, barter, gifts, and payments in kind is also important for national

accounts statistics mainly because food acquired through those channels is

included in the household final consumption.

Conforti, Grünberger, and Troubat (2017) classified food data-collection

practices in three approaches: (a) Acquisition: Households report food acquired

through purchases, own production, and in-kind transfers, while consumption of

food is not reported; (b) Combination of acquisition and consumption:

Households report food acquired through purchases without specifying the

amount consumed and report food consumption from own production or in-kind

transfers; and (c) Consumption: Households report food actually consumed, and

indicate whether that same food was purchased, own-produced, or received as a

transfer.

Irrespective of which of the three approaches is adopted by a survey, it is

important that information is collected on all the food that becomes available to

the household through all the possible means of acquisition. It is also important

that the survey objectives are clear on whether the information being collected is

on acquisition, consumption, or both. Regarding consumption, clarity is needed

on whether surveys refer to food intended for consumption (i.e. including food

waste) or food actually consumed, which excludes food waste. In the review by

Smith, Dupriez, and Troubat (2014), issues have emerged related to current

practices on the neglect of food received in kind, data on which were not collected in 14

percent of the surveys (4 percent did not collect information on own-produced

food).

When combining information on sources of acquisition and consumption, care

should be taken to ensure that the question wording does not lead to

incompleteness or ambiguity in enumeration. Smith, Dupriez, and Troubat (2014)

have found that 38 percent of HCES have issues with the wording of rule-out

questions, namely ambiguity on whether acquisition or consumption is being

asked.

Recommendations

• All surveys should collect data on all main modes of food acquisition,

namely:

o Purchases.

o Household’s own production.

o Received in kind. Surveys need to explicitly inquire about in-

kind sources that are otherwise likely to be missed, such as

payments for labour and participation in social programs.

Those in-kind sources can be aggregated, but care should be

taken to avoid duplicating information captured in other

sections of the survey (e.g. employment and social assistance).

If public social assistance transfers are not captured elsewhere

then it would be important to disaggregate the data.

• Surveys should be designed so that it is clear to respondents,

enumerators, and data users exactly what information is requested and

reported, whether the information required is on acquisition, or

consumption, or both.

o In the case of consumption, it should be clear whether the

questions concern food intended for consumption (including

food waste) or food actually consumed (net of food waste).

o If total amount of food purchased over the recall period is the

variable of interest, it is then recommended to add an additional

question on the amount consumed out of those purchases to

avoid mixing acquisitions from purchases with consumption

from own production and in-kind transfers.

• Surveys should exercise care to avoid possible sources of incomplete or

ambiguous enumeration commonly found in current survey practices.

o When using a filter question (30 percent of surveys assessed by

Smith, Dupriez, and Troubat (2014) have a leading or filter

question):

▪ Avoid leading or filter questions in cases in which

respondents are asked first if they consumed a food

over a certain recall period instead of details about

consumption. A negative response to the first question

results in skipping questions on quantities acquired

but not consumed during the recall period. This leads

to systematic underestimation of the quantities or

expenditure of food acquired.

▪ Avoid filter questions that focus on food purchases.

This leads to underestimation of mean food

acquisition for the population by failing to account for

food acquired through own production or in-kind

transfers.

o For consumption from own production, the question must be

worded to clearly indicate food consumed from own production

rather than all food harvested. When this distinction is not

made, the quantities or expenditure reported may include food

entering the households’ production stocks – not for immediate

consumption – and as a result, food consumed from home

production is systematically overestimated.

3.4. Meal participation

Summary

Household surveys collect information on the total amount of food to be

consumed by a household over a certain reference period. To convert this

information to a per capita basis and to conduct an analysis of the adequacy of

food consumption and nutrient intake, it is important to know exactly how many

people (partakers) consumed the total amount of food reported by the household.

For food and nutrition security analyses, it is also useful to collect data on the

physiological status of the partakers (e.g. pregnant or lactating status for women),

as that affects nutrition requirements. Furthermore, food may be shared with non-

household members, or household members may not be in the household when

the food is consumed. Neglecting those occurrences adds measurement error to

the distribution of per capita food consumption.

There are two main approaches to adjust household per capita consumption for

the number of partakers. The first approach entails counting the number of people

who shared the household’s meals and then dividing the total household

consumption by the number. That approach, however, is not very precise, as it is

not easy to account for situations in which people participate only in some meals

per day, such as employees. The second approach involves counting the number

of meals taken by each household member and non-household members over the

reference period for which food data are collected. It is more precise, but it is also

more difficult to implement. As very little methodological work has been carried

out to formally test the cost and benefits of adopting competing options for

accounting for partakers, this is an area to focus on in future research. The

recommendations provided below, therefore, are also somewhat more generic

than those provided for other areas of survey design.
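
The difference between the two approaches can be seen in a small worked example. The household below is hypothetical, and the sketch assumes three meals per day and that every meal carries the same amount of food, which is a simplification made purely for illustration.

```python
def per_person_headcount(total_kcal, n_people, n_days):
    """First approach: divide by the number of people who shared the meals."""
    return total_kcal / (n_people * n_days)

def per_person_meal_count(total_kcal, meals_eaten, meals_per_day=3):
    """Second approach: divide by the meals actually taken from the
    household's food, then scale to a full day of meals.

    Assumes all meals are of equal size (an assumption of this sketch).
    """
    total_meals = sum(meals_eaten.values())
    return total_kcal / total_meals * meals_per_day

# Hypothetical household, seven-day reference period.
total_kcal = 58_800
meals = {
    "member A": 21,   # all meals eaten at home
    "member B": 14,   # eats lunch at work every day
    "member C": 21,
    "guests": 14,     # meals served to non-household members
}

headcount = per_person_headcount(total_kcal, n_people=3, n_days=7)
meal_based = per_person_meal_count(total_kcal, meals)
```

In this example the headcount approach gives 2,800 kcal per person per day, while the meal count gives 2,520 kcal, because the meals served to guests more than offset the lunches that one member ate away from home.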

Recommendations

It is recommended that all HCES should consider adding an individual household

member-based meal module. As collecting information on individuals is

expensive and difficult, this can be implemented as part of the module collecting

information on food away from home (see below). On the other hand,

practitioners should realize that adding an individual household member-based

meal module would make it possible to eliminate other questions that are

commonly used in surveys.39

If an individual member-based meal module cannot be adopted, a less preferred

alternative would be to collect the following information at the household level

through a proxy respondent:

• How many meals does [NAME] usually take in a day?

• How many days in the past X days was [NAME] present in the

household?

• How many meals during the past X days, did [NAME] purchase or

receive, and eat away from home?

• How many meals during the past X days, did [NAME] eat at home?

• Does [NAME] get meals at school?

• How many meals were served to non-household members during the last

X days?

• Did the household host a ceremony, party or festival in the past X days,

during which a large number of meals (not just snacks) were served to

non-household members? If “yes”: How many attended?

• During the last X days, were there non-household members who stayed

one or more nights in the household as a guest? If “yes”: How many

nights did they stay? How many meals were they (summed together)

served during their stay? How many of the guests were children under 15

years old? How many were adults aged 15 and older?

• During the last X days, were any meals served to non-household

members? (Other than those served guests who stayed overnight.)

3.5. Food away from home

Summary

Rapid urbanization and economic growth are typically associated with an

increase in the consumption of food away from home in absolute terms and as a

share of calories and food expenditure. Implementing traditional HCES

questionnaires that are focused on household food consumption at home risks

underestimating food away from home and missing its growing share of

calories and expenditure as food systems change. Food

away from home consumption is particularly important because food consumed

outside the home tends to be more calorie-dense and less nutrient-dense than food

consumed at home. The amount of food consumed away from home

tends to rise with increases in income.

Failing to account for food away from home has been shown to affect measures

of poverty and inequality, including inequality in the distribution of dietary

39 This includes questions such as “How many meals are usually taken per day in your household?”,

“How many days in the past X days was [NAME] present in the household?”, “Did [NAME] eat

meals in this household in the last X days?”, “Does [NAME] get meals at school?”, “Did [NAME] consume any meals/snacks/drinks outside the household in the past X days?”. The information

collected in these questions would now be captured in an individual level module.

energy consumption. Food away from home is obtained from a variety of

sources, including restaurants, schools, places of work, and street

vendors. Survey design needs to be able to account for all of them, as they can be

of varying importance to different groups in the population. Failing to account for

food away from home affects not only the mean, but also the distribution of the

indicators of interest.

An additional challenge is that while the “main food preparer” can be expected to

be reasonably informed about food at home, it is much more difficult for that

person to respond to questions on food away from home, as they relate mostly to

meal events taking place out of his or her sight. Proxy respondents may be able to

report on which household members consumed which meals away from the

households, but they are unlikely to be informed about the cost or content of

those meals. Such information can only be reliably collected through individual

interviews.

Recommendations

Data collection on food away from home should preferably be done at the

individual level, asking the questions separately for each individual. For all

individuals who report having consumed meals outside the home, the minimum

information collected should be on the value of the meals by meal event

(breakfast, lunch, dinner, and snacks). While more research on this topic is

urgently needed, based on current knowledge, the following guidelines are

suggested for the design and implementation of a survey module for the

measurement of consumption of food away from home in recall surveys:

• The practice of collecting food away from home information with just one

question should be discontinued.

• The importance of food away from home warrants the design of a separate

module, based on a clear definition of food away from home. In

particular, surveys should be clear in identifying how to collect

information on potentially ambiguous categories of food: “food

prepared at home and consumed outside” and “food prepared outside

and consumed at home.” The latter can be integrated into the food at

home module (e.g. takeout food) provided that it is made clear to

enumerators, respondents and data users that this is the case.

• Data collection should be organized around meal events, including snacks

and drinks. At a minimum, surveys should collect information on the

value of all meals consumed during a meal event away from home

(breakfast, lunch, dinner, solid snacks or drinks). The meal events list

should be adapted to the local context.

• Considerations regarding the feasibility, costs, and accuracy should inform

the determination of which option to choose between individual

modules and the proxy respondent. Food away from home is best

collected through individual-level interviews of adults. A proxy

respondent can be used to report on children’s meals away from home

and on those of other adults. Possible variations include:

1. Proxy respondents (i.e. a household level module) can be used

to report on the number of individuals who consumed meals

away (as in block 4 of the sixty-eighth round of the India

Survey). Detailed information on the meals, such as cost and

meal content, should be collected directly from the relevant

household member, including possibly on a targeted, carefully

designed subsample.

2. Total expenditure on food away from home can be collected at

the household level using a daily food away from home record

sheet provided to a trained proxy respondent.

• Surveys should identify the most frequent place of consumption for each

meal event, such as restaurants, street vendors, work, or schools,

adapting the place of consumption categories to the local context.

• Surveys should use the same reference period for food away from home as

that used in the food consumed at home module.

• The data to estimate food away from home-related nutrient content, when

feasible, should come from other data sources integrated with the HCES,

such as a survey of food establishments (Farfán, Genoni, and Vakis,

2017) or administrative data on the content of public meals, such as

those given by schools and social programs.
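
As an illustration only, the individual-level, meal-event structure recommended above could be stored and summed to household totals along the lines of the following Python sketch. The record layout, field names, and values are hypothetical assumptions and are not taken from any survey instrument.

```python
from collections import defaultdict

# Hypothetical individual-level records: one row per person and meal event.
# Field names and values are illustrative assumptions, not a survey standard.
records = [
    {"hh_id": 1, "person_id": 1, "meal_event": "lunch", "place": "work", "value": 2.5},
    {"hh_id": 1, "person_id": 1, "meal_event": "snack", "place": "street vendor", "value": 0.5},
    {"hh_id": 1, "person_id": 2, "meal_event": "lunch", "place": "school", "value": 1.0},
    {"hh_id": 2, "person_id": 1, "meal_event": "dinner", "place": "restaurant", "value": 6.0},
]


def household_fafh_totals(rows):
    """Sum the reported value of meals away from home by household and meal event."""
    totals = defaultdict(lambda: defaultdict(float))
    for row in rows:
        totals[row["hh_id"]][row["meal_event"]] += row["value"]
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {hh: dict(events) for hh, events in totals.items()}


fafh = household_fafh_totals(records)
```

Keeping one row per person and meal event preserves the place of consumption and allows the same records to be aggregated at either the individual or the household level.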

3.6. List of food items

Summary

The following basic principles should inform the design of the HCES list of food

items:

• The number of food items should balance the fewer memory lapses,

lower costs, and shorter interview time associated with short lists against the better

recall and more comprehensive reporting associated with longer lists.

• Food items representing the large majority of food expenditure,

including nutrient-rich food items, should be included on the list.

• The description of the food items must be explicit enough to match only

one entry in the reference food composition table.

• Each food item must be exclusive (each food group should be

represented and each food item should belong to only one food group).

Adoption of a food classification system may help in meeting those criteria.

Although the adoption of a standard classification system may involve some

challenges given the country specificity of the diet, such a classification is

recommended as it allows survey harmonization in terms of methodology,

analysis and findings. For many of the basic purposes of HCES, such as for the

computation of CPI or for input into national account systems, by far, the most

widely adopted standard classification is COICOP. Harmonization eases cross-

survey comparability, ensures international comparability of inflation and

purchasing power parities, and allows for the assessment of the

comprehensiveness of the food lists.

For surveys that are not intended to be used for the calculation of CPI

or for national accounts, the COICOP list may be too extensive. There is a widely

acknowledged trade-off in the number of items to be included in a food list.

Aggregated item lists (usually about 15 items) provide lower estimates than more

detailed item lists. On the other hand, an overly detailed list of items might have a

negative effect, increasing enumerator and respondent fatigue. A universally

valid solution does not exist because the optimal quantity of items strongly

depends on regional food consumption habits. Accordingly, a food list must be

country specific, representative of the dietary and consumption habits of all

segments of a population, and capture evolving trends in dietary patterns. Useful

information about the frequency and importance of each food item’s dietary and

expenditure patterns can be drawn from previous HCES or dietary survey data

carried out in a given country.

As noted, food lists will inevitably be country-specific. Even so, some rules-of-

thumb or general guiding principles can be identified to help survey designers

determine food lists to capture food consumption and expenditure information

that is disaggregated in a way that can be useful for dietary quality analysis.

Involving nutritionists in the design of the food lists can ensure that their data

needs are properly taken into account. Fiedler and Mwangi (2016) suggest that to

meet all of those requirements, in most cases a list of 100 to 125 items is needed.

Many experts agree with that view, but it can only be seen as an indicative rule of

thumb.

Recommendations

Different classifications can be used to harmonize data and foster comparability

across surveys. These guidelines encourage survey designers to use the COICOP

system, as it is currently the basis of classification used in a wide number of

datasets, in line with the requirements specified in the System of National

Accounts 2008. Food COICOP classifications are mainly structured using the

basic food groups described in Table 4 (see footnote 40). Additional regional or global food lists

that can inform the development of a survey food list include the FoodEx2

classification (see footnote 41), which is being tested in Asian countries (European Food Safety

Authority, 2014).

40 See http://unstats.un.org/unsd/cr/registry/regcst.asp?Cl=5 for the detailed list of all COICOP

categories. See also the revisions discussed at the forty-ninth session of the United Nations Statistical

Commission, which was held in New York, 6–9 March 2018 (Technical Subgroup for the Revision of COICOP (2018)).

41 See https://www.efsa.europa.eu/en/supporting/pub/en-804 for more details on the FoodEx2

classification.

Table 4

Basic Food Groups for COICOP classifications

Basic food group: Food items in each group

Bread and cereals: Rice; Other cereals, flour, and other products; Bread; Other bakery products; Pasta products

Meat: Beef and veal; Pork; Lamb, mutton, and goat; Poultry; Other meats and meat preparations

Fish and seafood: Fresh, chilled, or frozen fish and seafood; Preserved or processed fish and seafood

Milk, cheese, and eggs: Fresh milk; Preserved milk and other milk products; Cheeses; Eggs and egg-based products

Oils and fats: Butter and margarine; Other edible oils and fats

Fruit: Fresh or chilled fruits; Frozen, preserved, or processed fruit and fruit-based products

Vegetables: Fresh or chilled vegetables other than potatoes; Fresh or chilled potatoes and other tubers; Frozen, preserved, or processed vegetables and vegetable-based products

Sugar, jam, honey, chocolate, and confectionery: Sugar; Jams, marmalades, and honey; Confectionery, chocolate, and ice cream

Food products not elsewhere classified: Salt, spices, condiments; Dessert preparations, soups, and broths; Baby food and dietary preparations

Non-alcoholic beverages: Coffee, tea, and cocoa; Mineral waters, soft drinks, fruit and vegetable juices

Alcoholic beverages: Spirits; Wine; Beer; Stimulants

Catering services: Restaurants, cafés, and others; Canteens

Data should be collected on all of the types of foods and beverages that make up

the country’s human diet. Lists should be kept up to date to take into account

changing dietary habits, while keeping in mind that products that account for

minimal budget shares can have particular nutritional values. A list of general

principles that can guide the design of a food list includes the following criteria:

• The presence of foods from all the main food groups (e.g. the 16-food

group classification on which the Household Dietary Diversity Score is

based; see Kennedy, Ballard, and Dop, 2011).

• An adequate representation of processed foods (covering all degrees of

processing, from highly processed to moderately processed).

• The inclusion of only foods and no other commodities (principle of

“food exclusivity”; e.g. the food list should not include an item

composed of food and non-food items, such as “alcohol and tobacco”).

• The list needs to include a reasonable number of individual items (the

most common ones) for each of the main food groups. An “other”

category should be added when relevant (e.g. “other fruits” or “other

vegetables”) to record the acquisition or consumption of additional food

items. It is important that such categories remain marginal as quantities

cannot be collected under those categories and food matching is

imperfect.

• Food items (other than prepared dishes) should not span multiple food

groups (e.g. avoid “eggs or milk products” as one group). Only group

food items with similar nutritional properties in one question (e.g. avoid

“Mineral water or soft drinks”). Avoid grouping different states of the

same food item with different nutritional properties (e.g. fresh or dried

fish, fresh or dried milk).

• Avoid broad categories that do not allow identifying the type of food,

such as “snacks”, “canned foods” and “baby food.”

• Food items that are the object of product-specific government-

subsidized programs should be listed individually on the food list.

• Foods that are fortified or have the potential to be the vehicle of food

fortification programs (e.g. iodized salt, fortified flour or cooking oil)

should be listed individually in the food list.

• Micronutrient (e.g. vitamin-A or iron) rich foods, such as sweet potatoes

and liver should be listed individually.

• Food lists can be built from national food composition tables or

databases to ease later food matching. Because of the importance of

having good and updated food composition tables or databases, it is also

recommended that countries invest in the update or development of

country or regional food composition tables.
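
Some of the criteria above lend themselves to simple automated checks on a draft food list. The Python sketch below is one hypothetical way to do this; the item codes, the short list of required groups, and the text heuristics (labels containing " or ", labels starting with "other") are assumptions made for illustration and do not replace review by survey designers and nutritionists.

```python
from collections import Counter

# Hypothetical draft food list: (item code, item label, food group). Illustrative only.
food_list = [
    ("101", "Rice", "Bread and cereals"),
    ("102", "Bread", "Bread and cereals"),
    ("201", "Beef and veal", "Meat"),
    ("301", "Fresh milk", "Milk, cheese, and eggs"),
    ("302", "Eggs or milk products", "Milk, cheese, and eggs"),
    ("401", "Other fruits", "Fruit"),
]
# Assumed subset of food groups that the list is expected to cover.
required_groups = {"Bread and cereals", "Meat", "Milk, cheese, and eggs", "Fruit", "Vegetables"}


def check_food_list(items, groups):
    """Return simple diagnostics for a draft food list."""
    codes = Counter(code for code, _, _ in items)
    present = {group for _, _, group in items}
    return {
        # Each item should appear once, so that it belongs to only one group.
        "duplicate_codes": sorted(c for c, n in codes.items() if n > 1),
        # Every required food group should be represented.
        "missing_groups": sorted(groups - present),
        # Labels joined by " or " may lump together foods that should be listed separately.
        "possibly_lumped": [label for _, label, _ in items if " or " in label.lower()],
        # "Other" categories are allowed but should remain marginal.
        "other_items": [label for _, label, _ in items if label.lower().startswith("other ")],
    }


report = check_food_list(food_list, required_groups)
```

The flagged items are prompts for manual review rather than errors: an "other" category or a combined label may still be justified in a given country context.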

3.7. Non-standard units of measurement

Summary

To date, there is no standard methodology applied to the collection of food

quantities in cases in which respondents (and enumerators) are less familiar with

standard measurement units. Common practice has been either to require

households to report everything in standard units, or for enumerators to estimate

the standard unit conversions on an ad hoc basis. Both approaches are

problematic and lead to inaccuracies and inconsistencies in the data reported.

Several national statistical offices maintain a library of conversion factors for

local units, but they are often incomplete, not updated, and tend not to be

consistently implemented. Units of measurement are critical for many aspects of

data collection and quality assurance. Qualitative feedback from field

practitioners, and extensive feedback from initial piloting of those methods,

suggest that allowing non-standard units of measurement can increase the

accuracy of reported quantities, primarily by reducing respondent burden. To

properly benefit from allowing non-standard unit options, reporting must be

paired with a framework for consistently converting non-standard units into

standard units, based on reliably documented conversion factors. As conversion

factors typically involve weighing and measuring, which are some of the

activities carried out in price surveys – at least for unpackaged foods, such as root

crops and vegetables – the treatment of non-standard units should also be covered

in guidelines for price surveys.

Recommendations

• The decision on whether to allow the use of non-standard units of

measurement should be addressed during the design phase. By doing

this, the tendency for units of measurement to be determined on an ad

hoc or inconsistent basis during fieldwork is reduced.

• Though non-standard units are used around the world (in countries of all

income levels) the cost-benefit ratio of incorporating them into each

survey should be evaluated, focusing particularly on their prevalence of

use. If needed, conduct a pilot survey to determine the extent to which

respondents need non-standard units: extensively, minimally, or not at

all.

• When feasible, allow households to report in both standard and non-

standard units of measure based on what they are most familiar with for

each item reported. If avoiding formal use of non-standard units still

leads to ad hoc field conversions, then even a rather limited set of

non-standard units with less of an implementation burden may still be

worthwhile.

• It is critical to establish (define or collect) conversion factors for all

non-standard units that are to be used. Additional features to improve

the accuracy of reported non-standard unit quantities, such as market

surveys to establish accurate non-standard units and conversion factors,

photo reference aids, and on-the-spot value verification using

computer-assisted personal interviews, may also be considered.

• National statistical offices and implementation partners should work

together to establish non-standard units databases that can be used

across surveys, effectively increasing the standardization of the units,

while also limiting the cost to implement them. To this end, survey

implementers should thoroughly document all non-standard units

protocols and related conversion factors and make them publicly

available.
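
To make the idea of a documented conversion framework concrete, the following Python sketch converts quantities reported in non-standard units into kilograms using a lookup table of conversion factors. The item names, unit names, and factor values are placeholders invented for illustration; real factors would come from market surveys or a national non-standard units database.

```python
# Hypothetical conversion factors: kilograms per one reported unit,
# keyed by (food item, unit). Values are placeholders, not measured factors.
conversion_factors_kg = {
    ("maize flour", "kg"): 1.0,
    ("maize flour", "tin"): 0.5,
    ("tomatoes", "heap"): 0.25,
}


def to_kilograms(item, unit, quantity, factors=conversion_factors_kg):
    """Convert a reported quantity to kilograms, or return None if no factor is documented."""
    factor = factors.get((item, unit))
    if factor is None:
        # Flag the record for follow-up instead of guessing an ad hoc conversion.
        return None
    return quantity * factor


# Reported (item, unit, quantity) triples; the last one has no documented factor.
reports = [("maize flour", "tin", 3), ("tomatoes", "heap", 2), ("tomatoes", "bunch", 1)]
converted = [to_kilograms(item, unit, qty) for item, unit, qty in reports]
```

Keying the factors by item and unit together reflects the fact that the same local unit can weigh different amounts for different foods. Returning no value for an undocumented pair keeps ad hoc conversions out of the data.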

4. Bibliography

Ahmed, N., Brzozowski, M. & Crossley, T.F. 2006. Measurement errors in

recall food consumption data. IFS Working Papers, 06(21), Institute for Fiscal

Studies.

Alderman, H. 1996. Saving and economic shocks in rural Pakistan. Journal of

Development Economics, 51(2): 343–365.

Alkire, S. & Foster, J. 2011. Counting and multidimensional poverty

measurement. Journal of Public Economics, 95(7-8): 476–487.

Attanasio, O. & Frayne, C. 2006. Do the poor pay more? Presented at the

Eighth BREAD Conference on Development Economics, Ithaca, New York, 5–6

May.

Backiny-Yetna, P., Steele, D. & Yacoubou Djima, I. 2017. The impact of

household food consumption data collection methods on poverty and inequality

measures in Niger. Food Policy, 72(17): 7–19.

Bai, J., Wahl, T.I., Lohmar, B.T. & Huang, J. 2010. Food away from home in

Beijing: effects of wealth, time and “free” meals. China Economic Review,

21(3): 432–441.

Bailey, R.L., West Jr., K.P. & Black, R.E. 2015. The epidemiology of global

micronutrient deficiencies. Annals of Nutrition and Metabolism. 66 (suppl. 2):

22–33.

Banerjee, A.V. & Duflo, E. 2011. Poor Economics: A Radical Rethink of the

Way to Fight Global Poverty. New York: Public Affairs.

Barsky, R.B. & Miron, J.A. 1988. The seasonal cycle and the business cycle.

NBER Working Paper No. 2688.

Bee, A., Meyer, B.D. & Sullivan, J.X. 2012. The validity of consumption data:

Are the consumer expenditure interview and diary surveys informative? NBER

Working Paper Series, w18308.

Beegle, K., de Weerdt, J., Friedman, J. & Gibson, J. 2012. Methods of

household consumption measurement through surveys: experimental

results from Tanzania. Journal of Development Economics, 98(1): 3–18.

Bezerra, I.N. & Sichieri, R. 2009. Eating out of home and obesity: a Brazilian

nationwide survey. Public Health Nutrition, 12(11): 2037–2043.

Binkley, J.K., Eales, J. & Jekanowski, M. 2004. The relation between dietary

change and rising US obesity. International Journal of Obesity and Related

Metabolic Disorders, 24(8): 1032–1039.

Black, R.E., Allen, L.H., Bhutta, Z.A., Caulfield, L.E., de Onis, M., Ezzati,

M., Mathers, C., & Rivera, J. 2008. Maternal and child undernutrition: global

and regional exposures and health consequences. Lancet, 371(9608): 243–260.

Borlizzi, A., del Grossi, M.E. & Cafiero, C. 2017. National food security

assessment through the analysis of food consumption data from household

budget and expenditure surveys: the case of Brazil’s Pesquisa de Orçamento

Familiares 2008/09. Food Policy, 72(C): 20–26.

Bouis, H.E. 1994. The effect of income on demand for food in poor countries:

Are our food consumption databases giving us reliable estimates? Journal of

Development Economics, 44(1): 199–226.

Bouis, H.E., Haddad, L. & Kennedy, E. 1992. Does it matter how we survey

demand for food? Evidence from Kenya and the Philippines. Food Policy,

17(5): 349–360.

Bradburn, N. 2010. Recall period in consumer expenditure surveys program.

Unpublished. National Opinion Research Center, University of Chicago.

(available at https://www.bls.gov/cex/methwrkshprecallperiod.pdf).

Brazil, Ministry of Health. 2015. Dietary Guidelines for the Brazilian

Population. Brasilia, Ministry of Health of Brazil.

Browning, M., Crossley, T.F. & Winter, J. 2014. The measurement of

household consumption expenditures. Annual Review of Economics, 6(1): 475–501.

Burns, C., Jackson, M. & Gibbons, C. 2002. Foods prepared outside the

home: association with selected nutrients and body mass index in adult

Australians. Public Health Nutrition, 5(3): 441–448.

Capéau, B. 1995. Measurement error and functional form: a proposal to

estimate prices and conversion rates from the ERHS1994. Mimeo.

Capéau, B. & Dercon, S. 2006. Prices, unit values and local measurement units

in rural surveys: an econometric approach with an application to poverty

measurement in Ethiopia. Journal of African Economies, 15(2): 181–211.

Carletto, C., Sydney, G., Murray, S. & Zezza, A. 2016. Cheaper, faster, and

more than good enough: is GPS the new gold standard in land area

measurement? Policy Research working paper series 7759, The World Bank.

Carroll, C.D., Crossley, T.F. & Sabelhaus, J. (eds.) 2015. Improving the

Measurement of Consumer Expenditures. National Bureau of Economic

Research Studies in Income and Wealth. Chicago, IL., University of Chicago.

Casley, D. & Kumar, K. 1988. The Collection, Analysis and Use of Monitoring

and Evaluation Data. Baltimore, MD, Johns Hopkins University Press for

World Bank.

Chen, S. & Ravallion, M. 2010. The developing world is poorer than we

thought, but no less successful in the fight against poverty. The Quarterly

Journal of Economics, 125(4): 1577–1625.

Coates, J., Colaiezzi, B., Fiedler, J.L., Wirth, J., Lividini, K. & Rogers, B.

2012. A program needs-driven approach to selecting dietary assessment methods

for decision-making in food fortification programs. Food and Nutrition Bulletin,

33(2): 146–156.

Conforti, P., Grünberger, K. & Troubat, N. 2017. The impact of survey

characteristics on the measurement of food consumption. Food Policy, 72(C):

43–52.

Coudouel, A., Hentschel, J.S. & Wodon, Q.D. 2002. Poverty measurement

and analysis. In J. Klugman, ed. A Sourcebook for Poverty Reduction Strategies,

vol. 2, pp. 29–69. Washington, DC: World Bank.

D’Souza, A. & Jolliffe, D. 2012. Rising food prices and coping strategies:

household-level evidence from Afghanistan. Journal of Development Studies,

48(2): 282–299.

D’Souza, A. & Tandon, S. 2014. How well do household level data

characterize food security? Evidence from the Bangladesh Integrated Household

Survey. Mimeo. Paper presented at the Inter-Agency Working Group Workshop

on Strengthening HCES for Food and Nutrition Analysis, 5–6 November,

Rome.

Deaton, A. 1997. The Analysis of Household Surveys: a Microeconometric

Approach to Development Policy. Washington DC and Baltimore, MD, The

World Bank and Johns Hopkins University Press.

Deaton, A. & Dupriez, O. 2011. Purchasing power parity exchange rates for

the global poor. American Economic Journal, 3: 137–166.

Deaton, A. & Grosh, M. 2000. Consumption. In M. Grosh and P. Glewwe, eds.

Designing household survey questionnaires for developing countries: Lessons

from 15 years of the Living Standards Measurement Surveys. Washington, DC,

World Bank.

Deaton, A. & Zaidi, S. 2002. Guidelines for Constructing Consumption

Aggregates for Welfare Analysis. Washington, DC, World Bank.

De Weerdt, J., Beegle, K., Friedman, J. & Gibson, J. 2016. The challenge of

measuring hunger through survey. Economic Development and Cultural

Change, 64(4): 727–758.

Diskin, P. 1999. Agricultural Productivity Indicators Measurement Guide.

Arlington VA, Food Security and Nutrition Monitoring (IMPACT) Project,

ISTI, for the United States Agency for International Development.

European Food Safety Authority 2014. Guidance on the EU Menu

methodology. EFSA Journal 2014, 12(12):3944.

FAO 2006. The Sixth World Food Survey. World Food Summit. Rome, Italy:

FAO.

FAO/INFOODS. 2012. Guidelines for food matching. Version 1.2. Rome, Italy:

FAO.

FAO, International Fund for Agricultural Development (IFAD), World

Food Program (WFP), UNICEF & World Health Organization (WHO).

2017. The State of Food Security and Nutrition in the World. Building resilience

for peace and food security. Rome, FAO.

Farfán, G., Genoni, M.E. & Vakis, R. 2017. You are what (and where) you

eat: capturing food away from home in welfare measures. Policy Research

working paper No. 7257. Washington, DC., World Bank.

Fermont, A. & Benson, T. 2011. Estimating yield of food crops grown by

smallholder farmers. IFPRI Discussion Paper. Washington DC.: International

Food Policy Research Institute.

Ferreira, F.H.G., Chen, S., Narayan, A., Sangraula, P., Dabalen, A.L.,

Serajuddin, U. & Yoshida, N. 2015. A global count of the extreme poor in

2012: data issues, methodology and initial results. World Bank Policy Research

Working Paper No. 7432. Washington, DC, World Bank.

Fiedler, J.L. 2013. Towards overcoming the food consumption information

gap: strengthening household consumption and expenditures surveys for food

and nutrition policymaking. Global Food Security, 2(1): 56–63.

__________2015. Household consumption and expenditures surveys:

comparing 7- and 30-day recall estimates of food consumption from India’s

2011/12 national survey sample. Paper prepared for the International Statistical

Institute’s World Congress of Statistics Panel Measurement Issues for Food

Fiedler, J.L., Carletto, C. & Dupriez, O. 2012. Still waiting for Godot?

Improving Household Consumption and Expenditures Surveys (HCES) to

enable more evidence-based nutrition policies. Food & Nutrition Bulletin, 33(3

supplement): 242–251.

Fiedler, J.L., Lividini, K., Bermudez, O.I. & Smitz, M.F. 2012. Household

consumption and expenditures surveys (HCES): a primer for food and nutrition

analysts in low-and middle-income countries. Food & Nutrition Bulletin, 33(3):

170–184.

Fiedler, J.L. & Macdonald, B. 2009. A strategic approach to the unfinished

fortification agenda: feasibility, costs, and cost-effectiveness analysis of

fortification programs in 48 countries. Food and Nutrition Bulletin, 30(4): 283–

316.

Fiedler, J.L., Martin-Prével, Y. & Moursi, M. 2013. Relative costs of 24-hour

recall and household consumption and expenditures surveys for nutrition

analysis. Food and Nutrition Bulletin, 34(3): 318–330.

Fiedler, J.L. & Mwangi, D.M. 2016. “Improving household consumption and

expenditure surveys’ food consumption metrics: developing a strategic approach

to the unfinished agenda”. IFPRI (unpublished).

Fiedler, J.L., Sanghvi, T.G. & Saunders, M.K. 2008. A review of the

micronutrient intervention cost literature: program design and policy

lessons. International Journal of Health Planning and Management, 23(4): 373–

397.

Fiedler, J.L. & Yadav, S. 2017. How can we better capture food away from

home? Lessons from India’s linking person-level meal and household-level food

data. Food Policy, 72: 81–93.

Finn, A. & Ranchhod, V. 2015. Genuine fakes: The prevalence and

implications of data fabrication in a large South Africa survey. The World Bank

Economic Review, 31(1): 129–157.

Foster, J., Seth, S., Lokshin, M. & Sajaia, Z. 2013. A Unified Approach to

Measuring Poverty and Inequality: Theory and Practice: Streamlined Analysis

with ADePT Software. Washington, DC, World Bank.

Friedman, J., Beegle, K., De Weerdt, J. & Gibson, J. 2017. Decomposing

response error in food consumption measurement: implications for survey

design from a randomized survey experiment in Tanzania. Food Policy, 72: 94–

11.

Gewa, C.A., Murphy, S.P. & Neumann, C.G. 2007. Out-of-home food intake

is often omitted from mothers' recalls of school children's intake in rural Kenya.

Journal of Nutrition, 137(9): 2154–2159.

Gibson, J. 2001. Measuring chronic poverty without a panel. Journal of

Development Economics, 65(2): 243–266.

__________2007. A guide to using prices in poverty analysis. Mimeo. The

World Bank (also available at

http://siteresources.worldbank.org/INTPA/Resources/429966-1092778639630/Prices_in_Poverty_Analysis_FINAL.pdf).

__________2013. “Two decades of poverty in Papua New

Guinea”. Presentation at Crawford School, ANU, 5 June, Canberra. (available at

https://devpolicy.crawford.anu.edu.au/sites/default/files/events/attachments/2013-06/crawford_school_seminar_gibson_two_decades_poverty_png.pdf).

__________2016. Poverty measurement: we know less than policy makers

realize. Asia & the Pacific Policy Studies, 3: 430–442.

Gibson, J., Beegle, K., De Weerdt, J. & Friedman, J. 2015. What does

variation in survey design reveal about the nature of measurement errors in

household consumption? Oxford Bulletin of Economics and Statistics, 77(3):

466–474.

Gibson, J. & Kim, B. 2012. Testing the infrequent purchases model using

direct measurement of hidden consumption from food stocks. American Journal

of Agricultural Economics, 94(1): 257–270.

__________2015. Hicksian separability does not hold over space: Implications

for the design of household surveys and price questionnaires. Journal of

Development Economics, 114: 34–40.

Gibson, J., & Rozelle, S. 2002. How elastic is calorie demand? Parametric,

nonparametric, and semiparametric results for urban Papua New

Guinea. Journal of Development Studies, 38(6): 23–46.

Gibson, J., Rozelle, S. & Huang, J. 2003. Improving estimates of inequality

and poverty from urban China's household income and expenditure survey,

Review of Income and Wealth, 49(1): 53–68.

Gibson, R.S. 2005. Principles of Nutritional Assessment. 2nd edition. New

York, Oxford University Press.

Gilbert, C.L., Christiaensen, L. & Kaminski, J. 2016. Price seasonality in

Africa: measurement and extent. World Bank Policy Research Working Paper,

7539. Washington, DC., World Bank Group.

Grosh, M. & Glewwe, P., eds. 2000. Designing Household Survey

Questionnaires for Developing Countries: Lessons from 15 Years of the Living

Standards Measurement Study. Volume 1. Washington, DC, World Bank.

Grünberger, K. 2017a. Estimating habitual food consumption with Household

Consumption and Expenditure Surveys. How long households’ food

consumption should be observed? Mimeo, 2017.

__________2017b. The impact of meal participation on food security indicators

derived from Household Consumption and Expenditure Surveys. Mimeo, 2017.

Guthrie, J.F., Lin, B.H. & Frazao, E. 2002. Role of food prepared away from

home in the American diet, 1977–78 versus 1994–96: changes and

consequences. Journal of Nutrition Education and Behavior, 34(3): 140–150.

Henríquez-Sánchez, P., Sánchez-Villegas, A., Doreste-Alonso, J., Ortiz-

Andrellucchi, A., Pfrimer, K. & Serra-Majem, L. 2009. Dietary assessment

methods for micronutrient intake: a systematic review on vitamins. British

Journal of Nutrition, 102: S10–S37.

Hurd, M. & Rohwedder, S. 2009. Methodological Innovations in Collecting

Spending Data: The HRS Consumption and Activities Mail Survey. Fiscal

Studies, 30(3–4): 435–459.

International Labour Organization (ILO) 2003. Household Income and

Expenditure Statistics, Report II, Seventeenth International Conference of

Labour Statisticians, Geneva, 24 November–3 December 2003.

Jolliffe, D. 2001. Measuring absolute and relative poverty: the sensitivity of

estimated household consumption to survey design. Journal of Economic and

Social Measurement, 27(1,2): 1–23.

Jolliffe, D., Lanjouw, P., Chen, S., Kraay, A., Meyer, C., Negre, M., Prydz,

E., Vakis, R. & Wethli, R. 2014. A measured approach to ending poverty

and boosting shared prosperity: concepts, data, and the twin goals. A World

Bank research report. Washington DC., World Bank.

Jolliffe, D. & Serajuddin, U. 2015. Estimating poverty with panel data,

comparably: an example from Jordan. Policy Research Working Paper Series,

7373. Washington, DC, World Bank.

Kaara, J. & Ramasawmy, S. 2008. Food data collected using acquisition and

consumption approaches with a seven-day recall method in Kenya's KIHBS

2005/2006. In R. Sibrian, Deriving Food Security Information from National

Household Budget Survey: Experiences, Achievements, Challenges, pp. 69–79.

Rome, FAO.

Kant, A.K. & Graubard, B.I. 2004. Eating out in America, 1987–2000: trends

and nutritional correlates. Preventive Medicine, 38(2): 243–249.

Kapteyn, A. 1994. The measurement of household cost functions: revealed

preference versus subjective measures, Discussion Paper 1994-3., Tilburg

University, Center for Economic Research.

Kemsley, W.F.F. 1961. The household expenditure enquiry of the Ministry of

Labour: variability in the 1953-54 enquiry. Journal of the Royal Statistical

Society (Applied Statistics), 10(3): 117–135.

Kennedy, G., Ballard, T. & Dop, M. 2011. Guidelines for Measuring

Household and Individual Dietary Diversity. FAO Nutrition and Consumer

Protection Division, FAO. 66 pp. (also available at http://www.fao.org/3/a-

i1983e.pdf).

Kirlin, J.A. & Denbaly, M. 2017. Lessons learned from the national household

food purchase and acquisition survey in the United States. Food Policy, 72: 62–71.

Kreuter, F., McCulloch, S., Presser, S. & Tourangeau, R. 2011. The effects of

asking filter questions in interleafed versus grouped format. Sociological

Methods and Research, 40(1): 88–104.

Liu, H., Wahl, T.I., Seale Jr., J.L. & Bai, J. 2015. Household composition,

income and food-away-from-home expenditure in urban China. Food Policy, 51:

97–103.

Louzada, M.L., Levy, R.B., Martins, A.P.B., Claro, R.M., Steele, E.M.,

Verly, Jr, E., Cafiero, C. & Monteiro, C.A. 2017. Validating the usage of

household food acquisition surveys to assess the consumption of ultra-processed

foods: evidence from Brazil. Food Policy, 72: 112–120.

Ma, H., Huang, J., Fuller, F. & Rozelle, S. 2006. Getting rich and eating out:

consumption of food away from home in urban China. Canadian Journal of

Agricultural Economics/Revue Canadienne d'Agroeconomie, 54(1): 101–119.

Mahalanobis, P.C. & Sen, S.B. 1954. On some aspects of the Indian national

sample survey. Bulletin of the International Statistical Institute, 34(2): 5–14.

Mancino, L., Todd, J. & Lin, B.H. 2009. Separating what we eat from where:

measuring the effect of food away from home on diet quality. Food Policy,

34(6): 557–562.

Martirosova, D. 2008. Food data collected using acquisition and consumption

approaches with daily diaries in Armenia’s ILCS 2004. In R. Sibrian, Deriving

Food Security Information from National Household Budget Survey:

Experiences, Achievements, Challenges, pp. 59–68. Rome, FAO.

Maxwell, S. & Slater, R. 2003. Food policy old and new. Development Policy

Review, 21(5–6): 531–553.

McWhinney, I. & Champion, H. 1974. The Canadian experience with recall

and diary methods in consumer expenditure surveys. In S. Berg, Annals of

Economic and Social Measurement, 3(2): 411–437.

Meenakshi, J.V. & Ray, R. 1999. Regional differences in India's food

expenditure pattern: a complete demand systems approach. Journal of

International Development, 11(1): 47–74.

Meng, T., Florkowski, W.J., Kolavalli, S. & Ibrahim, M. 2012. Food

expenditures and income in rural households in the northern region of Ghana.

Paper for Agricultural & Applied Economics Association, Annual Meeting, 12-

14 August, Seattle, WA.

Mihalopoulos, V.G. & Demoussis, M.P. 2001. Greek household consumption

of food away from home: a microeconometric approach. European Review of

Agricultural Economics, 28(4): 421–432.

Moltedo, A., Troubat, N., Lokshin, M. & Sajaia, Z. 2014. Analyzing Food

Security Using Household Survey Data: Streamlined Analysis with ADePT

Software. Washington, DC., World Bank.

Murphy, J., Casley, D.J. & Curry, J.J. 1991. Farmers’ estimations as a

source of production data: methodological guidelines for cereals in Africa.

World Bank Technical Paper, 132. Washington, DC, World Bank.

Mutlu, S. & Gracia, A. 2006. Spanish food expenditure away from home

(FAFH): by type of meal. Applied Economics, 38(9): 1037–1047.

Nago, E.S., Lachat, C.K., Huybregts, L., Roberfroid, D., Dossa, R.A. &

Kolsteren, P.W. 2010. Food, energy and macronutrient contribution of out-of-

home foods in school-going adolescents in Cotonou, Benin. British Journal of

Nutrition, 103(2): 281–88.

National Sample Survey Organization of India (NSSO) 2003. Results of a

pilot survey on suitability of different reference periods for measuring household

consumption. Report no. 475. New Delhi, India, Ministry of Statistics and

Program Implementation.

Neufeld, L.M. & Tolentino L. 2012. Nutritional surveillance, developing

countries. In B. Caballero, A. Prentice and L. Allen (eds). Encyclopedia of

Human Nutrition. Third edition. Elsevier Press, Oxford.

Neter, J. & Waksberg, J. 1964. A study of response errors in expenditures data

from household interviews. Journal of the American Statistical Association,

59(305): 18–55.

Oguntona, C.R. & Tella, T.O. 1999. Street foods and dietary intakes of

Nigerian urban market women. International Journal of Food Sciences and

Nutrition, 50(6): 383–390.

Oseni, G., Durazo, J. & McGee, K. 2017. The Use of Non-standard Units for

the Collection of Food Quantity: A Guidebook for Improving the Measurement

of Food Consumption and Agricultural Production in Living Standards Surveys.

Washington, DC, World Bank.

Paxson, C.H. 1992. Using weather variability to estimate the response of

savings to transitory income in Thailand. American Economic Review, 82(1):15–

33.

__________1993. Consumption and income seasonality in Thailand. Journal of

Political Economy, 101(1): 39–72.

Phillips, C.M., Dillon, C., Harrington, J.M., McCarthy, V.J., Kearney,

P.M., Fitzgerald, A.P. & Perry, I.J. 2013. Defining metabolically healthy

obesity: role of dietary and lifestyle factors. PLoS ONE, 8(10): e76188.

Popkin, B.M. 2008. The World Is Fat –The Fads, Trends, Policies, and

Products that Are Fattening the Human Race. New York: Avery-Penguin

Group.

Popkin, B.M., Adair, L.S. & Ng, S.W. 2012. Global nutrition transition and the

pandemic of obesity in developing countries. Nutrition Reviews, 70(1): 3–21.

Poti, J.M., Duffey, K.J. and Popkin, B.M. 2014. The association of fast food

consumption with poor dietary outcomes and obesity among children: Is it the

fast food or the remainder of the diet? The American Journal of Clinical

Nutrition, 99(1): 162–171.

Pradhan, M.P. 2001. Welfare analysis with a proxy consumption measure. TI

Discussion Paper, 01-92/2. Amsterdam, Tinbergen Institute.

__________2009. Welfare analysis with a proxy consumption measure:

Evidence from a repeated experiment in Indonesia, Fiscal Studies, 30(3/4): 391–

417.

Ravallion, M. 1998. Poverty lines in theory and practice, living standards

measurement study Working Paper No. 133. Washington, DC., World Bank.

Ravallion, M. & Bidani, B. 1994. How robust is a poverty profile? World

Bank Economic Review, 8(1): 75–102.

Redman, B. 1980. The impact of women's time allocation on expenditure for

meals away from home and prepared foods. American Journal of Agricultural

Economics, 62(2): 234–237.

Rimmer, D.J. 2001. An overview of food eaten outside the home in the United

Kingdom National Food Survey and the new Expenditure and Food Survey.

Public health nutrition, 4(5b): 1173–1175.

Schündeln, M. 2017. Multiple visits and data quality in household

surveys. Oxford Bulletin of Economics and Statistics, 80(2): 380–405.

Scott, C. & Amenuvegbe, B. 1990. Effect of recall duration on reporting of

household expenditures: an experimental study in Ghana. Social Dimensions of

adjustment in sub-Saharan Africa. Working Paper, 6, Washington, DC: World

Bank.

Serajuddin, U., Uematsu, H., Wieser, C., Yoshida, N. & Dabalen, A. 2015.

Data deprivation: another deprivation to end. Policy Research Working Paper;

No. WPS 7252. Washington, DC., World Bank.

Silberstein, A.R. & Scott, S. 1991. Expenditure diary surveys and their

associated errors. In P.P. Biemer, R.M. Groves, L.E. Lyberg, N.A. Mathiowetz

and S. Sudman, eds, Measurement errors in surveys. pp.303–326. Hoboken, NJ,

John Wiley & Sons, Inc.

Smith, L.C. 2013. The great Indian calorie debate: explaining rising

undernourishment during India’s rapid economic growth. IDS Working Paper

430. Brighton, UK, Institute of Development Studies.

__________2015. The great Indian calorie debate: explaining rising

undernourishment during India’s rapid economic growth. Food Policy, 50: 53–

67.

Smith, L.C., Alderman, H. & Aduayom, D. 2006. Food insecurity in sub-

Saharan Africa: new estimates from household expenditure surveys. Research

Report 146. Washington DC: International Food Policy Research Institute.

Smith L.C., Dupriez, O. and Troubat, N. 2014. Assessment of the reliability

and relevance of the food data collected in national household consumption and

expenditure surveys. IHSN Working Paper No. 8. (available at

http://www.ihsn.org/sites/default/files/resources/IHSN_WP008_EN.pdf).

Smith, L.C. & Subandoro, A. 2007. Measuring food security using

household expenditure surveys. Food Security in Practice Series. Washington

DC, International Food Policy Research Institute.

Statistical Institute and Planning Institute of Jamaica 1996. Jamaica Survey

of Living Conditions. (available at

http://microdata.worldbank.org/index.php/catalog/2340).

Stephens, M. 2003. "3rd of the month": Do social security recipients smooth

consumption between checks? American Economic Review, 93(1): 406–422.

Sudman, S. & Ferber, R. 1971. Experiments in obtaining consumer

expenditures by diary methods. Journal of the American Statistical Association,

66(336): 725–735.

Sujatha, T., Shatrugna, V., Rao, G.V.N., Reddy, G.C.K., Padmavathvi,

K.S., and Vidyasagar, P. 1997. Street food: an important source of energy for

the urban worker. Food and Nutrition Bulletin, 18(4).

Tandon, S. & Landes, M.R. 2011. Counting India’s food insecure is

complicated. USDA, Economic Research Service. Amber Waves, 9(4).

Technical Subgroup for the Revision of COICOP. 2018. Revised

classification of individual consumption according to purpose (COICOP 2018)

– Structure and explanatory notes. Background document for the forty-ninth

session of the United Nations, New York, 6–9 March. (available at

https://unstats.un.org/unsd/statcom/49th-session/documents/BG-Item3l-

Classification-2-E.pdf.)

Troubat, N. & Grünberger, K. 2017. Impact of survey design in the estimation

of habitual food consumption. The case of the 2007/08 Socio Economic Survey

of Mongolia applied to urban households. Food Policy, 72(C): 132–145.

Tucker, C. 1992. The estimation of instrument effects on data quality in the

Consumer Expenditure Diary Survey. Journal of Official Statistics, 8(1): 41–61.

Tucker, C. & Bennett, C. 1988. Procedural effects in the collection of

consumer expenditure information: the diary operation test, in Proceedings of

the Section on Survey Research Methods, pp. 256–261. Washington, DC.

American Statistical Association.

Turner, R. 1961. Inter-week variations in expenditure recorded during a two-

week survey of family expenditure. Journal of the Royal Statistical Society,

Series C (Applied Statistics), 10(3): 136–146.

United Nations Statistical Commission (UNSC) 2014. Report of the World

Bank on improving household surveys in the post-2015 development era: issues

and recommendations towards a shared agenda. United Nations Economic and

Social Council, E/CN.3/2015/10.

__________2018. UNSC. New York (https://unstats.un.org/unsd/statcom/).

United Nations Statistics Division (UNSD) 2005. Handbook on Poverty

Statistics: Concepts, Methods and Policy Use. Special Project on Poverty

Statistics, United Nations. New York (also available at

http://unstats.un.org/unsd/methods/poverty/pdf/UN_Book%20FINAL%2030%2

0Dec%2005.pdf).

United States Department of Agriculture (USDA) 1946. National Food

Guide. AIS-53, 1946.

United States Department of Agriculture, Economic Research Service 2011.

International food consumption patterns. Washington, DC. (available at

https://www.ers.usda.gov/data-products/international-food-consumption-

patterns.aspx).

Vandevijvere, S., Lachat, C., Kolsteren, P. & Van Oyen, H. 2009. Eating out

of home in Belgium: current situation and policy implications. British Journal

of Nutrition, 102(6): 921–928.

van’t Riet, H., den Hartog, A.P. & Hooftman, D.A. 2003. Determinants of

non-home-prepared food consumption in two low-income areas in

Nairobi. Nutrition, 19(11-12): 1006–1012.

Wakai, K. 2009. A review of food frequency questionnaires developed and

validated in Japan. Journal of Epidemiology, 19(1): 1–11.

Wanner, N., Cafiero, C., Troubat, N. & Conforti, P. 2014. Refinement to the

FAO methodology for estimating the prevalence of undernourishment indicator.

FAO Statistic Division. ESS working paper 14-05. Rome. 35 pp. (also available

at http://www.fao.org/3/a-i4046e.pdf).

Weisell, R. & Dop, M.C. 2012. The adult male equivalent concept and its

application to Household Consumption and Expenditures Surveys (HCES).

Food & Nutrition Bulletin, 33(2): 157–162.

Yen, S. & Jones, A.M. 1997. Household consumption of cheese: an inverse

hyperbolic sine double-hurdle model with dependent errors. American Journal

of Agricultural Economics, 79(1): 246–251.

You, J. 2014. Dietary change, nutrient transition and food security in fast-

growing China. In R. Jha, R. Gaiha and A.B. Deolalikar, eds. Handbook on

Food, pp.204–245. Cheltenham, UK, Edward Elgar Publishing.

Zezza, A., Carletto, G., Fiedler, J.L., Gennari, P. and Jolliffe, D. 2017. Food

counts. Measuring food consumption and expenditures in household

consumption and expenditure surveys (HCES). Food Policy, 72: 1–6.

IHSN, www.ihsn.org

International Household Survey Network

Assessment of the Reliability and Relevance of the Food Data Collected in National Household Consumption and Expenditure Surveys

Lisa C. Smith, Olivier Dupriez, Nathalie Troubat

IHSN Working Paper No. 008, February 2014


This report was written by:

Lisa C. Smith, TANGO International; International Household Survey Network (Consultant)

Olivier Dupriez, World Bank

Nathalie Troubat, Food and Agriculture Organization of the United Nations

Table of contents

List of figures
List of tables
Acronyms
Acknowledgements
Executive summary

1. Introduction

2. Uses and users of the food data collected in national HCES
   2.1 Measuring poverty
   2.2 Measuring food security
   2.3 Compiling food balance sheets
   2.4 Informing food-based nutrition interventions
   2.5 Calculating consumer price indices
   2.6 Informing national accounts statistics
   2.7 Meeting private sector information needs
   2.8 Summary

3. Assessment of the reliability of the food data
   3.1 Recall period for at-home food data collection
   3.2 Modes of acquisition for which at-home food data are collected
   3.3 Completeness of enumeration of foods acquired or consumed
   3.4 Comprehensiveness of the at-home food list
   3.5 Specificity of the at-home food list
   3.6 Quality of data collected on food consumed away from home
   3.7 Accounting for seasonality of food consumption patterns
   3.8 Summary

4. Assessment of the relevance of the food data
   4.1 Calculation of key indicators employed by multiple users
       4.1.1 Measuring quantities of foods consumed
       4.1.2 Calorie consumption
       4.1.3 Important measurement issues to keep in mind
   4.2 Relevance of the food data for various uses
       4.2.1 Measuring poverty
       4.2.2 Measuring food security
       4.2.3 Informing the compilation of food balance sheets
       4.2.4 Informing food-based nutrition interventions
       4.2.5 Calculating consumer price indices
       4.2.6 Informing national account statistics
       4.2.7 Meeting private sector information needs
   4.3 Summary

5. Conclusions and recommendations for improving reliability and relevance
   5.1 Recommendations for improving reliability
   5.2 Recommendations for improving relevance
   5.3 General recommendations
   5.4 Key outstanding research questions
   5.5 Conclusion

Appendix 1. Methodology of the assessment
   A1.1 Formulation of the assessment criteria
   A1.2 Assessment form development, external review and pre-testing
   A1.3 Surveys and documentation employed
   A1.4 Data analysis

Appendix 2. List of assessment surveys
Appendix 3. Basic Food Groups with list of some common food items
Appendix 4. Classification of Individual Consumption according to Purpose (COICOP) - Extract
References


List of figures

Figure 1: Regional and time distribution of the assessment surveys
Figure 2: Recall period for at-home food data collection
Figure 3: Modes of acquisition for which food data are collected
Figure 4: Comprehensiveness of the at-home food list: Food groups represented
Figure 5: Percent of surveys meeting the food list comprehensiveness criteria
Figure 6: Percent of surveys meeting the food list specificity criteria
Figure 7: Typology of food away from home
Figure 8: Quality of food consumed away from home data collection
Figure 9: Percent of assessment surveys taking seasonality into account
Figure 10: Percent of assessment surveys meeting minimum reliability criteria
Figure 11: Percent of surveys for which absolute poverty can be measured
Figure 12: Percent of surveys for which various food security indicators can be measured
Figure 13: Percent of surveys for which indicators informing food balance sheets can be measured
Figure 14: Percent of surveys for which indicators informing food-based nutritional interventions can be measured

List of tables

Table 1: Completeness of enumeration of food acquisition and/or food consumption
Table 2: Comprehensiveness and specificity of the at-home food list
Table 3: Food away from home data collection
Table 4: Indicators needs for various uses of the food data in Household Consumption and Expenditure Surveys
Table 5: Percent of surveys for which it is possible to calculate metric quantities of foods acquired or consumed (at home)
Table 6: Percent of surveys for which it is possible to calculate calorie consumption
Table 7: Collection of data on food given to non-household members (Percent of surveys)
Table 8: Comprehensiveness and specificity of survey food lists in relation to the Food Balance Sheet food groups
Table 9: Coverage and specificity of the food list by COICOP class
Table 10: Summary of results on relevance, by use and indicators needed (Percent of surveys for which indicators can be calculated)


Acronyms

AFINS   Assessing Food Insecurity
AME     Adult male equivalent
BFG     Basic Food Group
BLS     Bureau of Labor Statistics (USA)
CBN     Cost of basic needs
COICOP  Classification of Individual Consumption According to Purpose
CPI     Consumer price index
DFID    Department for International Development (UK)
EC      European Commission
FAO     Food and Agriculture Organization
FBS     Food balance sheet
FCAFH   Food consumed away from home
FCT     Food composition table
FEI     Food energy intake
GDP     Gross domestic product
HBS     Household budget survey
HCES    Household consumption or expenditure survey
HIES    Household income and expenditure survey
IHSN    International Household Survey Network
IFPRI   International Food Policy Research Institute
ILO     International Labour Organization
IMF     International Monetary Fund
IRD     Institut de Recherche pour le Développement
MDGs    Millennium Development Goals
MENA    Middle East and North Africa
NGO     Non-governmental organization
NSO     National statistics office
OECD    Organisation for Economic Co-operation and Development
PPP     Purchasing Power Parities
SNA     System of National Accounts
SUT     Supply-and-use table
UNSD    United Nations Statistics Division
UNU     United Nations University
USDA    United States Department of Agriculture
WB      World Bank
WEF     World Economic Forum
WHO     World Health Organization


1. Introduction

Most countries in the world periodically collect data on household consumption or expenditure through sample surveys. Household budget surveys (HBS) and household income and expenditure surveys (HIES) are conducted primarily to provide input to the calculation of consumer price indices (CPI) or the compilation of national accounts. In developing countries, nationally representative data on household consumption or expenditures are also obtained from various types of socio-economic or living standards surveys conducted to measure and monitor poverty or to provide data for informing poverty reduction policies. This report refers to this diverse set of surveys as household consumption and expenditure surveys (HCES).

Increasingly, statistical agencies that implement HCES disseminate the survey microdata. When well-documented HCES microdata are made easily accessible, they are extensively used by secondary analysts, often for purposes other than the ones pursued by the primary investigators. This re-purposing of data offers the potential to add much value to datasets, as it extends and diversifies the uses of the data at no cost to the data producer. Feedback provided by an enlarged community of analysts can help data producers increase the reliability and relevance of their surveys.

Considering this growing and diverse community of users, the issue of data quality takes on a new dimension. Survey design and methods differ considerably across countries, and sometimes over time within countries. To what extent do HCES provide reliable and relevant data for both their traditional purposes and for new, additional ones? And if quality issues are identified, how can they be addressed to better meet the needs of users? Data collection is expensive, and puts a high burden on respondents. It is the duty of statistical agencies that implement such surveys to maximize the return on their investments by making data as reliable and relevant as possible. And it is the role of the international statistical community to contribute to the development of guidelines and recommendations to support the improvement of these surveys.

Under the auspices of the International Household Survey Network (IHSN), the World Bank and the Food and Agriculture Organization of the United Nations (FAO) undertook a large-scale assessment of HCES conducted in low and middle income countries. This project had two key objectives. The first was to develop a method to assess the reliability and relevance of food consumption data as collected through HCES. The second one was to implement this method to conduct a large-scale assessment and report on the relevance and reliability of the data contained in a large number of surveys to identify opportunities for improvements.

The first step in developing the assessment consisted of identifying the main categories of uses and users of household food consumption data: poverty analysts, national accountants and CPI compilers, food security experts, planners of food-based nutrition interventions such as food fortification programs, and the private sector. The next step was to define the criteria for assessing reliability and relevance of the data for each user. An assessment form was then developed, which was used to compile information on the design of food consumption or expenditure survey modules from 100 countries. The reliability and relevance of each survey’s food consumption (or expenditure) module(s) were then assessed using this meta-database. Reliability refers to the capacity of the survey to provide a “true” or “accurate” measure of household consumption or expenditures. Relevance refers to the fitness of the survey data for a specific purpose.

The assessment is based purely on a review of survey questionnaires and related documentation. Clearly, the reliability and relevance of survey data also depend on the quality of the sample frame and sampling design, training and supervision of interviewers, the data entry and editing work, and the collaboration of respondents. These factors are, however, not covered in this study. Also, the assessment is limited to food consumption, although all HCES cover a broader spectrum of goods and services. The reason for focusing on this subcomponent of household consumption is that many of the newer users are primarily interested in the food data. Further, global forces are leading to changes in dietary patterns that raise new reliability and relevance issues specifically related to food. Another assessment, covering non-food household expenditures, is being undertaken separately by the World Bank and IHSN, with different partners.

The report is structured as follows. Chapter 2 provides an overview of the key uses and users of HCES food consumption data. Chapters 3 and 4 report respectively on the reliability and relevance of 100 survey questionnaires reviewed with respect to the needs of the uses and users identified in Chapter 2. Conclusions and recommendations are formulated in Chapter 5.


3. Assessment of the reliability of the food data

In this chapter, the basic reliability of the food data collected in current national HCES’s is assessed. “Reliability” is defined here as the degree to which a survey collects data on the actual or “true” food consumption and/or expenditures of households in a country’s population.

The assessment is based on the most recent HCES from each developing country for which sufficient documentation with which to conduct the assessment was available to us. Only nationally representative surveys are included in the assessment. The final set of 100 surveys thus represents a sample of recent, sufficiently documented, nationally representative surveys conducted in developing countries. Appendixes 1 and 2 contain respectively a detailed account of the implementation of the assessment and a list of the surveys.

The design of HCES surveys needs to meet certain criteria for the data to provide the reliability required by the data user. Meeting these minimum criteria, and thus ensuring that the data collected are reasonably accurate, is a concern of all users of the food data in HCES’s, from national accounts statisticians to planners of nutrition interventions.

Figure 1 reports the regional breakdown and years of data collection of the surveys. The highest number (40) is from Sub-Saharan Africa, and the lowest (5) from the Middle East and North Africa region (MENA). Overall, 70 percent of the developing countries are represented, with South Asia having the highest representation (all eight of its countries) and MENA the lowest.15 The earliest year of data collection for a survey is 1993 (Guinea-Bissau), and the latest is 2012 (Vanuatu). The majority of the surveys were administered between 2005 and 2009.

An attempt was made to identify clear, quantitative cut-offs for defining assessment criteria in order to avoid ambiguity and maintain objectivity. While these cut-offs are in many cases by necessity based on intuitive judgments rather than scientific evidence, they are intended to serve as a point of reference for prioritizing areas in need of improvement and for tracking reliability and relevance across countries and over time. In future studies it will be useful to conduct sensitivity analyses to determine the robustness of the cut-offs with respect to accurate measurement of indicators of interest.

15 Sub-Saharan Africa is represented by 85% of its countries, East Asia and the Pacific by 54%, Middle East and North Africa by 39%, Europe and Central Asia by 78%, and Latin America and the Caribbean by 55%. World Bank country and lending groups are used for regional classifications (World Bank 2012).

Figure 1: Regional and time distribution of the assessment surveys. [Bar charts showing the number of surveys by region (Sub-Saharan Africa, South Asia, East Asia and the Pacific, Europe and Central Asia, Latin America and the Caribbean, Middle East and North Africa) and by period of data collection (before 2000, 2000-2004, 2005-2009, 2010 and later).]


The assessment is based on seven areas of investigation:

• Recall period for at-home food data collection;
• Modes of food acquisition included (food purchases, food consumed from own production, and food received in kind);
• Completeness of enumeration of either food acquisition or food consumption;
• Comprehensiveness of the at-home food list;
• Specificity of the at-home food list;
• Quality of data collected on food consumed away from home; and
• Whether seasonality in food consumption patterns is taken into account.

For each area, a set of minimum criteria for basic reliability is established and then tested using the data collected on the country assessment forms. Meeting these criteria, and thus ensuring that the data collected are reasonably accurate, is a concern of all users of the food data in HCES’s, from national accounts statisticians to planners of nutrition interventions. It should be emphasized that the criteria set here are minimum criteria, and even when met, many leave ample room for improvement.
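As a rough illustration of how such a tally could be organized, the sketch below assumes that each country assessment form has already been reduced to one pass/fail flag per area. The area keys, the `summarize` function and the two example forms are hypothetical and are not taken from the assessment itself.

```python
# Hypothetical keys for the seven areas of investigation listed above.
AREAS = (
    "recall_period",
    "modes_of_acquisition",
    "completeness_of_enumeration",
    "food_list_comprehensiveness",
    "food_list_specificity",
    "food_away_from_home",
    "seasonality",
)


def summarize(forms: dict[str, dict[str, bool]]) -> tuple[dict[str, float], float]:
    """Return the percent of surveys meeting each area's minimum criteria,
    and the percent of surveys meeting the criteria in all seven areas."""
    if not forms:
        raise ValueError("no assessment forms supplied")
    n = len(forms)
    by_area = {
        area: 100.0 * sum(form[area] for form in forms.values()) / n
        for area in AREAS
    }
    all_met = 100.0 * sum(
        all(form[area] for area in AREAS) for form in forms.values()
    ) / n
    return by_area, all_met


# Made-up forms for illustration only.
forms = {
    "Survey A": {area: True for area in AREAS},
    "Survey B": {**{area: True for area in AREAS}, "seasonality": False},
}
by_area, all_met = summarize(forms)
print(by_area["seasonality"], all_met)
```

Keeping one flag per area makes it easy to report results area by area and to see which criterion most often prevents a survey from meeting the full set.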

The following methodological issues are also discussed in the chapter:

• Whether metric (or other standard) quantities of foods consumed are provided;
• Calculation of calorie consumption;
• Calculation of edible portions and the nutrient content of foods;
• Calculation of per-capita indicators and nutrient insufficiencies and the importance of collecting data on the number of food partakers; and
• Use of acquisition data to measure consumption.

3.1 Recall period for at-home food data collection

A wide variety of recall periods are used in national surveys, ranging from 1 to 365 days. The pros and cons of each are discussed, and a maximum recall period of two weeks is proposed as the minimum standard for HCES data to be considered reliable. Using this benchmark, the data collected can be considered reliable with respect to the recall period for 70 percent of the assessment surveys.

The recall period for food data collection is the amount of time over which respondents are asked to remember their food acquisitions and/or consumption.16 The longer the recall period, the more difficult it is for respondents to make accurate reports. A recall period that is too long leads to “recall error”, in which true acquisition or consumption is under-reported. On the other hand, the shorter the recall period, the more likely a respondent is to include events that occurred before the recall period. Such “telescoping error” leads to over-reporting.17 The relatively high frequency and small size of food (versus non-food) acquisitions/consumption means that recall error, and thus under-reporting, is believed to be more of an issue than telescoping error (Deaton and Grosh 2000).

There is no obvious or commonly agreed-upon number of days that a recall period should be for reliable measurement. A recall period of no more than two weeks, however, is within minimally safe limits, as confirmed by studies showing considerably lower expenditure estimates when 30 days (or one month), which is the next longest recall period in use, is employed.18 In this assessment, two weeks will therefore be considered the longest recall period to obtain reliable data. A one-week period may have an advantage over two weeks in that it is easier for respondents to remember what happened since the same day last week (for example, Monday). The day of the week can help set up a specific “memory post” at the beginning of the recall period in respondents’ minds, bounding the period.19 The exact point in time two weeks prior to the day a survey is administered is likely to be fuzzier, although a preliminary visit two weeks before the interview can help.

16 The survey’s recall period should be distinguished from its “reference period”. The latter is the total amount of time for which respondents are asked to report their food acquisitions or consumption. The only circumstance under which the recall and reference periods differ is when households are interviewed more than one time in consecutive visits. For example, households may be visited four times to ask about their food acquisitions in the last seven days, giving a recall period of seven days and a reference period of 28 days.

17 Beegle et al. (2012) provide a review of the literature on the influence of the recall period on expenditure estimates as well as recent evidence from an experiment undertaken in Tanzania.

18 For a commonly cited example, see the description of an experiment undertaken using India’s HCES in Gibson (2005).

19 A visit a week before the interview (as has been implemented in many Living Standards Measurement Study surveys), in which preliminary data are collected, helps to set this memory post.

Assessment of the Reliability and Relevance of the Food Data Collected in National Household Consumption and Expenditure Survey

Among the 100 surveys included in the assessment, 33 percent employed multiple recall periods. The period can vary by population (e.g., different for urban and rural areas), by source of acquisition (e.g., purchases versus home-produced food), and/or by type of food. In these cases, for the purposes of judging reliability the maximum recall period employed is considered. For example, if the recall period for food purchases is seven days but for home-produced food consumed it is one month, then one month is used. In a few cases multiple recall periods were employed for all foods for which data are collected, a design implemented for research purposes. For these surveys, reliability is assessed using the minimum recall period. For example, if data were collected using both a seven-day and a one-month recall period for all foods, then seven days is used for this assessment. The recall period is considered to be one day for diary surveys.20
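The rule described above for choosing the recall period used to judge reliability can be sketched in a few lines of Python. This is only an illustration: the function names, the dictionary labels and the conversion of one month to 30 days are assumptions made here, not part of the assessment's own tooling.

```python
from typing import Mapping

MAX_RELIABLE_RECALL_DAYS = 14  # two-week benchmark stated in the text


def effective_recall_days(
    periods: Mapping[str, int],
    applies_to_all_foods: bool = False,
    diary: bool = False,
) -> int:
    """Recall period (in days) used to judge a survey's reliability.

    periods: recall days keyed by a label such as source or food type.
    applies_to_all_foods: True when each period was applied to every food
        (the research-design case), so the shortest period is used.
    diary: True for diary surveys, which are treated as one-day recall.
    """
    if diary:
        return 1
    days = periods.values()
    return min(days) if applies_to_all_foods else max(days)


def meets_recall_criterion(days: int, limit: int = MAX_RELIABLE_RECALL_DAYS) -> bool:
    """True when the recall period is within the two-week benchmark."""
    return days <= limit


# Purchases recalled over 7 days, home-produced food over one month (30 days).
mixed = effective_recall_days({"purchases": 7, "own_production": 30})
# Both periods asked for all foods, as in a methodological experiment.
experiment = effective_recall_days({"short": 7, "long": 30}, applies_to_all_foods=True)
print(mixed, meets_recall_criterion(mixed))            # 30 False
print(experiment, meets_recall_criterion(experiment))  # 7 True
```

Because some "two-week" surveys actually used 15-day periods, a reader applying this sketch might pass `limit=15`; whether the assessment did so is not stated, so the default stays at 14 days.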

Figure 2 reports the percent of assessment surveys employing various recall periods. The most common is less than one week, used by 41 percent of countries. Among these surveys, the most common recall period is one day, because the large majority use the diary method. Nearly one-quarter of the countries used a recall period of one week, five percent used two weeks,21 and seven percent used one month.

A full 30 percent of the assessment surveys employed recall periods greater than two weeks.22 One-third of the surveys that did not meet the minimum reliability criterion employed a 365-day recall period in the context of the “usual month” approach. Here respondents are asked to recall their food acquisitions and/or consumption for a typical month in the last year.23 While this method has the intended advantage of obtaining estimates of usual consumption specific to each household rather than only for population groups, the length of the period over which respondents are asked to recall is unreasonably long for accurate estimation.24

20 In some cases the diaries of households without a literate member are completed by interviewers for time periods greater than one day. While as part of this assessment an attempt was made to collect information on how long this period was and for what percent of households, in most cases the information was not available in the survey documentation.

21 Some of the “two-week” recall surveys actually had 15-day periods, presumably to represent half a month.

22 Four of these surveys had an undefined recall period which could potentially extend beyond two weeks, because respondents were either asked to report on the expenditures/quantities the “last time” a food was acquired or to simply report on how often a food was acquired, with options being daily, weekly, monthly or yearly and the usual expenditure/quantity each time.

Overall, the percent of the assessment surveys for which the data collected can be considered reliable with respect to the recall period used is 70. It should be kept in mind that all of the diary surveys, having a one-day recall period, meet the criterion.
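As a quick check on the 70 percent figure, the shares reported in Figure 2 can be added up. The sketch below assumes that the 23 percent bar refers to recall periods longer than one month, which is what the 30 percent figure for periods greater than two weeks implies; the variable names are illustrative only.

```python
# Percent of assessment surveys by recall period (N = 100), as read from Figure 2.
recall_distribution = {
    "less than one week": 41.0,
    "one week": 24.0,
    "two weeks": 5.0,
    "one month": 7.0,
    "more than one month": 23.0,  # assumption: label inferred from the text
}

# Categories within the two-week benchmark.
reliable_categories = {"less than one week", "one week", "two weeks"}

reliable_share = sum(
    share for label, share in recall_distribution.items()
    if label in reliable_categories
)
not_reliable_share = sum(recall_distribution.values()) - reliable_share

print(reliable_share, not_reliable_share)  # 70.0 30.0
```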

Traditionally HCES were designed to collect data on food acquired for consumption rather than food consumed itself, thus the titles “Household Expenditure Survey” or “Household Budget Survey” (United Nations 1989). Today, more than half collect data on food consumed, whether in conjunction with food acquired or alone (see Section 3.3). Collecting food consumption data through HCES survey instruments poses new issues for reliability with respect to the recall period for data collection because of the additional cognitive burden of remembering the behaviors of more people (in fact, all household members since, hopefully, all household members eat, versus only the food acquirers) and more events (eating occasions versus food acquisition occasions). Further, in the case of interview surveys, respondents must remember the wide mix of foods that can be combined into prepared dishes, which are likely to be the focus of respondents’ memory, rather than the individual ingredients as they are acquired (Smith and Subandoro 2006). The nutritional science literature on the collection of food consumption data recommends a recall period of no more than 24 hours,25 yet, as seen in this section, the majority of surveys (nearly 60 percent) use a longer recall period. The reliability of the consumption data collected for these longer recall periods must be the subject of future research.

23 Two of the assessment countries employed a “usual week” approach, one with a recall period of six months and another with a recall period of one year.

24 For recent evidence, Beegle et al. (2011) find that usual month food expenditure estimates for a sample of households in Tanzania are considerably lower than those from a 7-day recall. The authors point out that this difference is partially due to the greater cognitive burden of reporting usual month information.

[Figure 2: Percent of assessment surveys employing various recall periods. Less than one week: 41.0; one week: 24.0; two weeks: 5.0; one month: 7.0; greater than one month: 23.0; less than or equal to two weeks (total): 70.0. Note: N=100 surveys.]

3.2 Modes of acquisition for which at-home food data are collected

Inclusion of the following three sources from which food can be acquired for at-home consumption is crucial for reliable measurement of both food acquisition and consumption using HCES:

(1) Market purchases;26
(2) Food consumed from households’ own production; and
(3) Food received in kind (wages received in kind, social transfers in kind, or gifts).

25 In their guide to measuring food consumption, Swindale and Ohri-Vachaspati (2005) write that “Information on household food consumption should be collected using the previous 24-hour period as a reference (24-hour recall). Lengthening the recall period beyond this time often results in significant error due to faulty recall” (p. 4). Ferro-Luzzi (2003) concurs that “The 24-hour recall method relies on the subject’s capacity to remember what they have eaten. As memory declines rapidly beyond one day, the recall method usually retrieves information only on the previous day’s consumption” (p. 105).

26 Barter is sometimes included as a fourth source (e.g., United Nations 2009). However, most surveys do not collect data on barter separately, instead considering it part of purchases.

For many users, it is important that information on the main possible modes of food acquisition (purchases, own production, in-kind receipts) be collected in HCES. Overall, most surveys comply with this requirement, with the most problematic mode being in-kind receipts, which are not collected in 14 percent of the surveys reviewed for the assessment. In some cases surveys do not allow specifying multiple sources for each food item, which is a problem for some uses.

Obtaining food through market purchases is now widespread throughout the world and is the predominant form of food acquisition in many locations, especially urban areas. In many countries, a considerable share of households obtain some of their food from their own production, whether from crop fields, home gardens, or orchards. This category also includes wild food gathered and consumed, fish and seafood fished or gathered, and the consumption of the meat of domestic animals reared by households. It is also quite common for households, especially developing-country households, to obtain some of their food in kind, whether in the form of gifts from other households, payments from an employer, or public or private assistance. For the purposes of this assessment, the food data collected in a survey are considered to be unreliable if any of these three sources is excluded from data collection.

Figure 3 shows the percentage of assessment surveys for which data were collected for each source, as well as for all three sources.27 All of the assessment surveys collected data on food purchases. Almost all surveys also collected data on food consumed from own production, with just four exceptions. The only source for which a noticeable percent of countries did not collect data (14 percent) is in-kind receipts of food. Overall, 85 percent of countries collected data on all three sources, leaving 15 percent not meeting the minimum reliability criteria in this area.
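The reliability rule for modes of acquisition amounts to a set-inclusion check: a survey fails if any of the three sources is not covered. A minimal sketch follows; the source labels and function names are chosen here for illustration and do not come from the assessment forms.

```python
REQUIRED_SOURCES = frozenset({"purchases", "own_production", "in_kind"})


def missing_sources(collected):
    """Required acquisition sources that the survey does not collect."""
    return REQUIRED_SOURCES - set(collected)


def meets_source_criterion(collected):
    """True when purchases, own production and in-kind receipts are all covered."""
    return not missing_sources(collected)


print(meets_source_criterion(["purchases", "own_production", "in_kind"]))  # True
print(sorted(missing_sources(["purchases", "own_production"])))            # ['in_kind']
```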

For some types of analysis, such as calculating CPIs, it is important that data be collected individually on the three food sources. The assessment found that for many surveys the data were collected in such a way that the quantities and/or expenditures on foods obtained from the three sources could not be distinguished. This was the case for 13 of the 84 surveys for which data were collected on all of the three sources. In some cases respondents were asked to report on consumption of home-produced food and in-kind receipts combined, with no distinction made between the two. In others, respondents were asked to specify only one single source for the food item acquired or consumed, with no allowance for acquisition from more than one source, thus effectively ruling out accurate enumeration by source. In still other surveys, respondents were asked to identify the source of acquisition but could choose a combined source, such as “both purchased and home produced”, again ruling out individual enumeration. Finally, two surveys gathered information on consumption by asking about consumption from purchases, home-produced food and food received in kind over the recall period (the usual sources), but also from “own stock,” which includes all food acquired before the recall period. Therefore it is not possible to break down the consumption that came from the three sources separately.

27 The analysis for this section could be undertaken only for 98 of the assessment countries. For the remaining two it was not possible to determine whether the three sources were included in the data collection from the available documentation.

Note that for surveys collecting data individually on in-kind food received, very few enumerated all of the various sources separately so as to obtain a full accounting. Data were collected specifically and individually on “gifts” for 62 percent of countries for which in-kind food received was enumerated individually, on in-kind payments from employers for 25 percent of countries, and on food assistance received for 21 percent. Four surveys from Latin America and the Caribbean collected data on food received from households’ own businesses. For countries where any of these sources are important modes of acquisition, their exclusion could lead to substantial under-reporting.

3.3 Completeness of enumeration of foods acquired or consumed

It is important for the analyst to have clarity on whether surveys are collecting data on food acquisition, consumption, or both, and for the survey to collect data according to the stated goal. Only the food intended for consumption or consumed must be included and not additional food (e.g., by not mistaking agricultural produce harvested for food consumed). The assessment found 25 percent of the surveys to be problematic in some respect, almost all of them being interview surveys (diaries appear to be largely immune to problems in this domain).

As noted in the introduction, modern HCES intend to collect data on either the foods acquired by households for consumption or directly on foods consumed. Among the assessment surveys, 41 percent collected data solely on food acquisition, 26 percent solely on food consumption, and the remaining 33 percent on both (see Table 1).28 Food acquisition data were more likely to be collected as part of diary surveys than interview surveys, whether exclusively or in conjunction with food consumption data.

28 The surveys for which acquisition (consumption) data were collected for both food purchases and in-kind receipts were classified as acquisition (consumption) surveys. Those for which both acquisition and consumption data were collected for either food purchases or in-kind receipts, or for which consumption data were collected for purchases and acquisition data for in-kind receipts or vice versa, were classified into the “both” category.

[Figure 3: Percent of assessment surveys collecting data on each source of food acquisition. Food purchases: 100.0; home-produced food consumed: 96.0; in-kind receipts of food: 86.0; all three sources: 85.0. Note: N=100 surveys.]

For the food data in HCES to be reliably collected, a full accounting of all acquired food intended for consumption or that was consumed over the recall period must take place. Additionally, only the food intended for consumption or consumed must be included and not additional food. The following exclusion and inclusion accounting errors can plague the collection of HCES food data:

(1) Acquisition surveys: Rule-out leading question on consumption. If a leading or “filter” question on consumption of each food item over the recall period is answered “no,” it rules out collection of further data on the acquisition of the food. In this case, respondents are first asked whether or not they consumed each food item in the food list for a recall period up to a year before the time of the survey. Then they are asked how much was purchased, consumed from own production, and/or received in kind over the survey recall period for food data collection. If the respondent answers “no” to the leading question, however, and the leading question recall period is the same as (or close to) the recall period for food data collection, her or his household receives a zero for acquisitions of the food item regardless of whether or not it was acquired. This leads to systematic underestimation of the quantities and/or expenditures on food acquired. A rule-out leading question on consumption is considered to be a problem when the two recall periods are less than or equal to two months apart. Note that this issue does not afflict diary surveys because there is no pre-listing of foods to rule out.

(2) Acquisition surveys: Rule-out, short-recall-period leading question on acquisition. Here, if answered “no”, a short-recall-period leading question on acquisition of each food item rules out further data collection on the acquisition of the food over the (longer) survey recall period. In this case respondents are first asked whether or not they acquired each food item over the short recall period (e.g., two weeks). Further information is collected on the acquisitions of the food for the longer recall period for food data collection only for those food items that were acquired over the shorter recall period. This leads to underestimation of mean food acquisition for the population.

(3) Acquisition surveys: Rule-out leading question on food purchases. In this case, if a respondent reports that the household did not purchase any of one food item, then no further information is collected on that food item. Since home-produced or in-kind receipts of the food are left out, this problem also leads to underestimation of mean food acquisition for the population.

(4) Data collected on food harvested rather than food consumed from home production. When this error occurs, the quantities and/or expenditures on food acquired include those entering into the households’ production stocks (not the household pantry for immediate consumption) and are systematic overestimates of food consumed from home production.

(5) Ambiguity about whether to report on acquisition or consumption. The question asked of respondents does not make it clear whether they are expected to report on their acquisitions of each food item or consumption of each food item over the recall period. This problem could pertain to food purchases, food received in kind or both (but not home-produced food consumed) and leads to inaccuracies in calculation of the mean acquisition or consumption for the population as well as measures of inequality.

(6) Routine month surveys: Ambiguity about whether respondents should report on the routine month in the recall period or only those months in which any food item is consumed. In many routine month surveys respondents are first asked to report on the number of months in the last year in which each food item was consumed. Immediately following, they are asked about the usual or average amount per month. Some questionnaires, however, fail to specify whether or not the average should be for those months in which it was consumed or for any month in the last year. When this type of accounting error occurs, some households may report on the former and some the latter, leading to over- or under-estimation of their consumption of any food item for which a positive number of months was reported for the initial question.

Table 1: Completeness of enumeration of food acquisition and/or food consumption (percent of surveys)

                                                                         Interview     Diary       All
Whether acquisition or consumption data are collected
  Acquisition                                                                 36.1      48.7      41.0
  Consumption                                                                 36.1      10.3      26.0
  Both                                                                        27.9      41.0      33.0
Problems of incomplete enumeration
  Rule-out leading question on consumption                                    13.1       0.0       8.0
  Rule-out, short-recall-period leading question on acquisition                3.3       0.0       2.0
  Rule-out leading question on food purchases                                  1.6       0.0       1.0
  Own production question refers to food harvested rather than consumed        3.3       0.0       2.0
  Ambiguity whether to report on acquisition or consumption                    6.6       5.1       6.0
  "Usual month" surveys: ambiguity whether to report consumption in
    any month or months with positive consumption                             13.1       0.0       8.0
Percent of surveys with problems of incomplete enumeration                    37.7       5.1      25.0

Note: N=100 surveys.
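The routine month ambiguity in item (6) can be made concrete with a small worked example. The household, the food item and the quantities below are hypothetical, and the helper function is a sketch of how an analyst might annualize a "usual month" report under each reading of the question.

```python
def annual_estimate(months_consumed: int, reported_monthly: float,
                    per_consumption_month: bool) -> float:
    """Annual quantity implied by a 'usual month' report.

    per_consumption_month: how the analyst reads the reported amount, either
    as an average over months with consumption (True) or over all 12 (False).
    """
    months = months_consumed if per_consumption_month else 12
    return months * reported_monthly


# Hypothetical household: eats 4 kg of an item per month in 3 months of the year.
true_annual = 3 * 4.0  # 12 kg

# Respondent reports the in-season average (4 kg); analyst assumes "any month".
over = annual_estimate(3, 4.0, per_consumption_month=False)
# Respondent reports the 12-month average (1 kg); analyst assumes in-season months.
under = annual_estimate(3, 1.0, per_consumption_month=True)

print(true_annual, over, under)  # 12.0 48.0 3.0
```

When the respondent's reading and the analyst's reading agree, both conventions recover the true 12 kg; the error arises only when the questionnaire leaves the convention unspecified.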

As can be seen in Table 1, 11 percent of the assessment surveys suffer from the use of the three types of rule-out leading questions. The collection of data on food harvested rather than food consumed from home production is a relatively rare problem from which only two percent of the surveys suffer. A full 14 percent of the surveys had problems of ambiguity in what is to be reported, which likely leads to incomplete enumeration for some households. The problem of ambiguity in expected reporting for routine month surveys was identified in eight percent of the surveys. Overall, 25 percent of the surveys did not meet the reliability criterion for completeness of enumeration, that is, they were afflicted by some of the identified problems of incomplete enumeration. Note that the large majority of the surveys that have these types of accounting problems are interview surveys.
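The six accounting problems and the two-month rule for filter questions lend themselves to a simple checklist. The sketch below is one possible encoding, not the assessment's own form: the field names are invented here, and "two months" is taken to mean 60 days. In Table 1, the 11 percent quoted above is the sum of the three rule-out rows (8 + 2 + 1) and the 14 percent is the sum of the two ambiguity rows (6 + 8); the six rows add to 27, more than the overall 25 percent, which suggests that some surveys have more than one problem.

```python
from dataclasses import dataclass, asdict

TWO_MONTHS_DAYS = 60  # assumption: "two months" taken as 60 days


def filter_question_is_problem(filter_recall_days: int, survey_recall_days: int,
                               max_gap_days: int = TWO_MONTHS_DAYS) -> bool:
    """A rule-out filter question counts as a problem when its recall period
    is within two months of the survey's recall period for food data."""
    return abs(filter_recall_days - survey_recall_days) <= max_gap_days


@dataclass
class EnumerationProblems:
    """One flag per problem of incomplete enumeration, items (1) to (6)."""
    rule_out_consumption: bool = False
    rule_out_short_recall_acquisition: bool = False
    rule_out_purchases: bool = False
    harvest_instead_of_consumption: bool = False
    ambiguous_acquisition_vs_consumption: bool = False
    ambiguous_usual_month: bool = False

    def count(self) -> int:
        return sum(asdict(self).values())

    def meets_criterion(self) -> bool:
        """True when none of the six problems is present."""
        return self.count() == 0


# A 30-day filter question ahead of a 7-day recall module is flagged.
flagged = filter_question_is_problem(filter_recall_days=30, survey_recall_days=7)
survey = EnumerationProblems(rule_out_consumption=flagged)
print(flagged, survey.meets_criterion())  # True False
```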

3.4 Comprehensiveness of the at-home food list

As diets evolve, sometimes quite rapidly, it is important for survey designers to keep up with the changes by updating the food lists. Benchmarks for reliability in this domain refer to the presence of foods from all the main food groups, an adequate representation of processed foods, and the exclusion of non-food items from the list. Overall, 72 percent of the assessment surveys met the criteria set in these three domains.

Equally important for reliable collection of food data in HCES is that data are collected on all of the types of foods and beverages that make up the modern human diet. This is especially so given that urbanization, globalization and trade openness are leading to consumption of a wider variety of foods than in the past, when populations tended to rely on foods that could be grown locally. These processes are also leading to greater consumption of processed foods (Popkin, Adair and Ng 2012), defined as “Any food other than a raw agricultural commodity and includes any raw agricultural commodity that has been subject to processing, such as canning, cooking, freezing, dehydration, or milling” (USDA 1946). Because the food modules of developing-country household expenditure surveys were originally set up to collect data on the acquisition of individual foods destined for in-home preparation (Smith 2012), this poses a challenge to countries employing the interview method of data collection to continually update their food lists.

To judge the comprehensiveness of survey food lists, a set of 14 “basic” food groups that represent the types of foods making up the contemporary human diet can be used as a starting point. The Basic Food Groups (BFGs) are listed in Table 2. Common food items in each group are listed in Appendix 3. Each survey’s food list is used to catalogue the number of food items in these groups. For interview surveys, the list is printed directly on the questionnaire. For diary surveys, the actual number of food items recorded by all sample respondents can run into the thousands, far too high for most types of data analyses. In the process of data analysis the detailed recorded items are thus categorized into broader items for inclusion in the actual data set. The food list used for this assessment is this broader list of items, or the “analytical” food list, with the rationale that it is what is eventually used for analysis. Even with this shorter food list, the mean number of food items represented in the diary surveys, at 369, is far higher than that of the interview surveys, which is 102 (see Table 2),29 reflecting the fact that the diary method makes it possible to itemize food items much more specifically. Note that the number of food items varies greatly across the assessment surveys, ranging from a low of 19 to a high of 5,407.

Figure 4 shows the percent of assessment surveys that include foods in each BFG. At 12 percent, alcoholic beverages are a group that is left out of a significant number of surveys. While alcohol is a sensitive issue in some countries with large Muslim populations, its exclusion is not limited to surveys from these countries. Also notable is that the food group “Eggs” is not represented in four percent of surveys.

29 The number of food items for the Brazil survey (a diary survey), at 5,407, is far higher than that of the survey with the next-highest number, which is 677. When Brazil is excluded from the calculation, the mean number of food items overall falls to 150 and for the diary surveys to 229.


Figure 4: Comprehensiveness of the at-home food list: Food groups represented

Note: N=96 surveys


Three criteria are combined to judge the comprehensiveness of survey food lists. The first is that all 14 BFGs must be represented by at least one food item. As can be seen in Table 2, just over 80 percent of the assessment surveys meet this criterion. The percentage rises to near 100 among the diary surveys.

The second reliability criterion relates to the percentage of foods that are processed, including prepared dishes. Five of the food groups listed in Table 2 contain only or almost only processed food items: milk and milk products; oils and fats; sugar, jam, honey, chocolate and sweets; condiments, spices and baking agents; and both beverage groups. On average, these foods alone make up roughly 30 percent of the total foods. The cut-off imposed as a reliability criterion is that at least 40 percent of food items must be processed, which allows for some processed items to be included in the other food groups (e.g., bread and other baked goods in the cereals group). The large majority of the surveys, 87 percent, meet this criterion, indicating that many countries have been updating their HCES food lists over time.

The final food list comprehensiveness reliability criterion assessed here is the “food exclusivity” of the list, that is, the food list must include only foods and no other commodities. Among the assessment countries, there are only three for which this criterion is not met (with the non-exclusive item being “alcohol and tobacco” in two cases and “tobacco and kola nuts”30 in one), leaving 97 percent of surveys meeting the criterion.

Table 2: Comprehensiveness and specificity of the at-home food list

                                                             Interview     Diary       All
Mean number of food items                                          102       369       204
  Minimum                                                           19        44        19
  Maximum                                                          411      5407      5407

Comprehensiveness of the food list (percent of surveys)
  All 14 Basic Food Groups are represented                        71.7      97.3      81.4
  At least 40% of food items are processed                        86.7      86.5      86.6
  Food items are all food-exclusive                               96.7      97.3      96.9

Specificity of the food list
(A) Minimum number of food items in each Basic Food Group a/
  Cereals (5)                                                     95.0     100.0      96.9
  Roots, tubers and plantains (5) b/                              46.7      56.8      50.5
  Pulses, nuts and seeds (5)                                      51.7      70.3      58.8
  Vegetables (10)                                                 63.3      91.9      74.2
  Fruits (10)                                                     51.7      86.5      64.9
  Meat, poultry, and offal (5)                                    80.0      97.3      86.6
  Fish and seafood (5)                                            35.0      89.2      55.7
  Milk and milk products (5)                                      61.7      94.6      74.2
  Eggs (1)                                                        93.3     100.0      95.9
  Oils and fats (5)                                               46.7      89.2      62.9
  Sugar, jam, honey, chocolate and sweets (5)                     50.0      91.9      66.0
  Condiments, spices and baking agents (10) c/                    28.3      70.3      44.3
  Non-alcoholic beverages (5)                                     65.0      97.3      77.3
  Alcoholic beverages (5)                                         35.0      78.4      51.5
(B) Minimum number of food items met in at least:
  10 food groups                                                  46.7      89.2      62.9
  11 food groups                                                  28.3      86.5      50.5
  12 food groups                                                  20.0      70.3      39.2
  13 food groups                                                  10.0      54.1      26.8
  14 (all) food groups                                             6.7      37.8      18.6
(C) Less than 5% of food items span the basic food groups         71.7      86.1      77.1

a/ The minimum number of food items for each group is given in parentheses.
b/ Includes potatoes.
c/ Includes vegetable-based stimulants (e.g., cola nuts).
Notes: N=100 for the information presented on the number of food items; N=96 for that presented on comprehensiveness and specificity.

Figure 5 summarizes the percentage of countries meeting the three assessment criteria of comprehensiveness, and the percentage meeting them all. Overall, 72 percent of the assessment surveys met all three criteria.
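The three comprehensiveness criteria can be combined into a single check on a coded food list. The following is a minimal sketch under stated assumptions: the group identifiers, the `FoodListItem` record and the choice to compute the processed share over all listed items are illustrative decisions made here, not the coding scheme of the assessment forms.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable

BASIC_FOOD_GROUPS = frozenset({
    "cereals", "roots_tubers_plantains", "pulses_nuts_seeds", "vegetables",
    "fruits", "meat_poultry_offal", "fish_seafood", "milk_products", "eggs",
    "oils_fats", "sugar_sweets", "condiments_spices",
    "non_alcoholic_beverages", "alcoholic_beverages",
})


@dataclass(frozen=True)
class FoodListItem:
    name: str
    groups: FrozenSet[str]       # basic food groups the item covers
    processed: bool = False
    food_exclusive: bool = True  # False for entries such as "alcohol and tobacco"


def comprehensiveness(items: Iterable[FoodListItem],
                      min_processed_share: float = 0.40) -> dict:
    """Evaluate the three comprehensiveness criteria for one food list."""
    items = list(items)
    represented = set()
    for item in items:
        represented |= item.groups
    processed_share = sum(i.processed for i in items) / len(items) if items else 0.0
    result = {
        "all_groups_represented": BASIC_FOOD_GROUPS <= represented,
        "enough_processed": processed_share >= min_processed_share,
        "food_exclusive": all(i.food_exclusive for i in items),
    }
    result["all_three"] = all(result.values())
    return result


# Hypothetical list: one item per group, six of the fourteen items processed.
demo = [FoodListItem(g, frozenset({g}), processed=(n < 6))
        for n, g in enumerate(sorted(BASIC_FOOD_GROUPS))]
print(comprehensiveness(demo)["all_three"])  # True
```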

3.5 Specificity of the at-home food list

There are two main aspects to food list specificity: the list needs to include a reasonable number of individual items for each of the main food groups, and non-processed food items should ideally fall into just one group.

Specificity of the food list refers to the degree of detail with which food items are classified. For an interview survey, in which foods are pre-listed, the goal is to include a sufficient number of food items to jog respondents’ memories of what has been acquired and/or consumed that is applicable to all households in a population. This population must include countries’ better-off urban households, whose members tend to eat a very wide variety of foods in a variety of forms, including raw, processed, prepared and packaged. There is an accuracy trade-off involved, however, because very long food lists can quickly lead to respondent and interviewer fatigue (Beegle et al. 2012). One way that surveys can bridge this trade-off is to list the most common food items consumed by the population and then include an “other” category where the acquisition/consumption of additional food items can be recorded. When they are present, these “other” food categories are included in the food list counts for this assessment.

30 Kola nuts are a vegetable-based stimulant.

To judge the specificity of surveys’ food lists, a first step is identifying a minimum number of food items that should be included in each of the 14 BFGs. While these numbers are somewhat arbitrary, they were chosen based on the authors’ judgment of the typical variety found within each (see Table 2, where the minimum numbers are in parentheses). The minimum ranges from one for “Eggs” to ten for “Vegetables”, “Fruits”, and “Condiments, spices and baking agents”. The table reports the percentage of assessment surveys meeting these minimums. Almost all surveys meet the minimum for the “Cereals” food group. Food groups where the minimum is met by notably low percentages of surveys are “Roots, tubers and plantains”, “Fish and seafood”, “Condiments, spices and baking agents”, and “Alcoholic beverages”. The food group “Condiments, spices and baking agents” is likely underrepresented because it is made up of more modern processed food items. Given the increased importance of these items in people’s diets (e.g., Popkin 2002) and budgets, however, especially in urban areas, it is important that they be included in food lists in their appropriate relative variety. This point is underlined by the fact that they have a much higher representation in diary than interview surveys.

There are some countries in which specific food groups are more likely to be underrepresented simply because the foods are not consumed widely among their populations. For example, “Roots, tubers and plantains” are not consumed in some countries because they cannot be grown there and are not easily imported. Very few distinct items in the “Fish and seafood” category may be appropriate for land-locked countries. Because of these inherent country-specific variations, the first assessment criterion used for judging food list specificity is that the required minimum number of food items be met for at least 10 of the 14 food groups. Sixty-three percent of countries meet this criterion (see Panel B of Table 2).

Figure (bar chart; vertical axis: percent of surveys, 0 to 100): All basic food groups represented, 81.4; At least 40 percent of food items processed, 86.6; Food items are all food-exclusive, 96.9; All three criteria, 72.2. Note: N=96 surveys.


The second and last criterion used to judge the specificity of HCES food lists relates to food items that span more than one of the basic food groups. Most prepared dishes will span these food groups because they have multiple ingredients, and this is not considered a problem. Indeed, specificity is increased when these types of food items are listed in detail. A large number of food items (other than prepared dishes) spanning food groups is an indication that a food list is not specific enough for accurate enumeration of food consumption/acquisition, however. In the case of diary surveys, it could be a reflection of a lack of instructions to diary keepers to be specific about their food consumption/acquisition, of how the food items recorded have been aggregated for analysis, or both. The second specificity criterion is that less than five percent of the food items (apart from prepared dishes) span more than one BFG. When this condition is met, the large majority of food items, 95 percent or more, fall into one and only one food group.

Note that 93 percent of the assessment surveys had at least one food item (not including prepared dishes) that spans more than one of the BFGs. Among those with any, the average percent of such items in the food list ranges from 0.45 to 26 percent. Some span just two food groups. Examples of these are:

• “Beverages” (spanning the Non-alcoholic beverages and Alcoholic beverages groups);

• “Butter, margarine and cheese” (spanning the Milk and milk products and Oils and fats groups);

• “Other milk, cheese and eggs” (spanning the Milk and milk products and Eggs groups);

• “Other meats, poultry, seafood” (spanning the Meat, poultry and offal and Fish and seafood groups); and

• “Canned fruits or vegetables” (spanning the Fruits and Vegetables groups).

Others could span a large number of food groups. These include residual “catch-all” categories such as “All other foods”, “Other food products”, and “Miscellaneous other food”, designed to catch any food expenditures that haven’t already been covered by another food group. They also include broad categories that don’t allow identification of which type of food is being referred to, such as “Snacks”, “Canned foods”, “Baby food”, “Soups”, and “Sauces”, and that may be difficult for interview respondents to recall.

Figure 6 shows that only 54 percent of surveys meet both food list specificity criteria, indicating that there is great room for improvement in this area.
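The two specificity criteria described above can be sketched as a pair of simple checks. This is only an illustrative sketch, not the assessment's actual procedure: the function names, the placeholder group labels (`group_1` to `group_14`) and the uniform minimum of three items are assumptions made for the example, since the real minimums come from Table 2.

```python
from typing import Dict, List


def meets_group_minimums(item_counts: Dict[str, int],
                         minimums: Dict[str, int],
                         required_groups: int = 10) -> bool:
    """Criterion 1: the minimum number of items is met for at least
    `required_groups` of the basic food groups."""
    groups_ok = sum(
        1 for group, minimum in minimums.items()
        if item_counts.get(group, 0) >= minimum
    )
    return groups_ok >= required_groups


def meets_spanning_limit(groups_per_item: List[int],
                         max_share: float = 0.05) -> bool:
    """Criterion 2: less than 5 percent of food items span more than one
    food group. The caller is expected to exclude prepared dishes first."""
    if not groups_per_item:
        return False
    spanning = sum(1 for n_groups in groups_per_item if n_groups > 1)
    return spanning / len(groups_per_item) < max_share


# Hypothetical food list: 14 placeholder groups, each with an assumed
# minimum of 3 items; the list meets the minimum in 10 of them.
example_minimums = {f"group_{i}": 3 for i in range(1, 15)}
example_counts = {f"group_{i}": (3 if i <= 10 else 1) for i in range(1, 15)}

# Hypothetical list of 100 items, 4 of which span two food groups.
example_groups_per_item = [1] * 96 + [2] * 4

specific_enough = (
    meets_group_minimums(example_counts, example_minimums)
    and meets_spanning_limit(example_groups_per_item)
)
```

In this sketch a food list passes only when both checks return true, mirroring the "Both" bar reported for the assessment surveys.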

3.6 Quality of data collected on food consumed away from home

The rapid urbanization and globalization that began in the last decades of the 20th century have brought with them a “nutrition transition” across the globe. Such a transition is marked by changes from traditional diets towards those higher in fat, sugar, caloric beverages in place of water and, as noted above, processed foods. Another important change that typically occurs during the nutrition transition is a rise in the consumption of food outside of the home (Maxwell and Slater 2003; FAO 2006; Popkin 2008; WHO 2002). Urbanization drives this trend by bringing together increasingly large concentrations of people in one location, making commercial eating establishments profitable. Globalization drives it by bringing with it new imported foods and advertising messages that urge people to eat them. A number of other factors support the trend towards out-of-home food consumption, including: increased incomes, which make eating more expensive, prepared foods affordable; new sources of transportation, which increase the ease with which people can travel or commute farther away from their homes; increases in the supply of prepared foods in commercial establishments such as restaurants and street stalls as the demand for prepared foods increases; and the fact that as people, especially women, begin to take on paid jobs, the time for shopping and preparing foods is more limited, making it more cost-effective to purchase cooked foods (Pingali and Kwaja 2004).

Food consumed away from home constitutes a large and increasing percentage of food expenditure in countries throughout the developing regions. It is also one of the trickiest items to capture in household surveys. While 90 percent of the surveys assessed did try to capture this item, only 42 percent meet the minimum reliability criteria.

Figure 6 (bar chart; vertical axis: percent of surveys, 0 to 100): At least 10 of the basic food groups have the minimum number of food items, 62.5; Less than 5% of food items span more than one food group, 77.1; Both, 54.2. Note: N=96 surveys.

There have been precipitous increases in the consumption of food outside of people’s homes over the last decades in both developing and developed countries (Schmidhuber and Shetty 2005; Drichoutis and Lazaridis 2005; WHO 2002). The example of the United States, for which the longest time series is available, is telling: food away increased from 10 to 49 percent of total food expenditures between 1900 and 2010 (USDA 2012a). Other evidence comes from Egypt, where the percentage of meals away from home rose from 20 to 46 between 1981 and 1998,31 and Mauritius, where (inflation-adjusted) expenditures on prepared foods rose five times between the 1960s and 1990s (Galal 2002; Mauritius Ministry of Economic Development 1997, both cited in WHO 2002). In urban China total expenditure on food away from home increased by 63 percent between 1995 and 2001 (Ma et al. 2006). And in India the percentage of households reporting consuming full meals away over a month’s period rose from 23 to 39 between 1994 and 2010 (Smith 2012). According to Schmidhuber and Shetty (2005), trends in consumption patterns associated with the nutrition transition, including increased food consumed away from home, will accelerate more in developing than in developed countries.

Taking food consumed away from home into account is particularly important for measuring calorie consumption because food consumed outside the home tends to be more calorie-dense than food consumed at home (Poti and Popkin 2011; Mancino, Todd and Lin 2009) and the amount of food consumed away tends to increase faster with increases in income (Senauer 2006; Gale and Huang 2007). The food may also contain more protein and specific micronutrients.32 Because food consumed away is a substitute for food consumed at home, the consequence of not taking it into account is a progressively more unreliable measurement of poverty and food security, possibly including incongruent trends in their indicators that send conflicting messages to policy makers (Smith 2012).

31 These percentages include meals consumed at the homes of relatives.

For background, Figure 7 gives a typology of food consumed away from home (or “FCAFH”), which delineates its various components. The overarching concept is food prepared away from home, which may be consumed either at home or away from home. Focusing on food consumed away from home, a key distinction to make is the mode of acquisition, of which there are two: purchased or received in kind. It is very important to take the latter into account as it can be a large proportion of food away for some populations.33 Another key distinction is the place of consumption. In the case of purchased food this may be a commercial establishment (such as a restaurant, bar, street stall, or mobile vendor) or a canteen or cafeteria at a school or work place. Food received in kind outside of the home may be provided by a school, an employer, through food assistance (e.g., feeding), or as a gift from another household. The latter includes food eaten as a guest at another person’s home or eaten at a commercial establishment and paid for by others. Snacks, which become an increasingly important part of the diet as nutrition transition proceeds (Popkin 2008), can make up a large proportion of FCAFH since people are less likely to convene in the home for snacks than meals.

Before the minimum reliability criteria for the data collected on FCAFH are considered, Table 3 describes features of the data collection. Data on FCAFH were not commonly collected in HCES until recently. Ninety percent of the assessment surveys collected some data on FCAFH, rising to 100 percent for the diary surveys. For interview surveys, data were considered to have been collected on FCAFH if any food item in the food list itself, the title of the section in which it is found, or a question regarding the item contains the following words (or variations on them): “Food eaten out, restaurant foods, foods eaten in restaurants and other establishments, food away from home, food eaten away, food eaten out of the home, food eaten at other people’s homes, meals eaten out, or meals away”. The diary surveys were judged partially on this same requirement, but also on whether the diary instructions or instructions to interviewers explicitly mention food consumed away from home.

32 See for example Ma et al. (2006), who show that the rapid rise of food away from home in urban China has been accompanied by a rapid increase in meat demand.

33 For example in India more households report consuming in-kind food away from home than purchased (Smith 2012).

Figure 7: Typology of food away from home. Tree diagram: food prepared away from home (meals and snacks) is divided into food consumed at home and food consumed away from home, and each branch is split into purchased and received in kind. Leaf labels include take away, grocery store, commercial establishment (restaurant, bar, street stall…), school, employer, food assistance, and another household. Source: Smith and Frankenberger (2012).

Table 3: Food away from home data collection (percent of surveys)

                                                          Interview   Diary
                                                          surveys     surveys   All
Whether any data collected on food consumed
  away from home a/                                       83.3        100.0     90.0

Detail of data collection b/
  Only one line item (e.g., "Restaurant food")            36.0          7.9     23.9
  Data collected for multiple places of consumption       14.0         35.0     23.3
  Data collected on food received in-kind                 46.0         65.0     54.4
  Data collected on specific food items                   28.0         40.0     32.9
  Snacks explicitly referred to                           26.0         35.1     29.9
  Alcoholic beverages explicitly referred to              36.0         32.4     34.5
  Data collected at the individual level                  12.0         23.7     17.0

Recall period b/
  Less than one week                                       6.0        100.0     47.8
  One week                                                48.0          0.0     26.7
  Two weeks                                               12.0          0.0      6.7
  One month                                               14.0          0.0      7.8
  Greater than one month                                  20.0          0.0     11.1

a/ N=100 surveys.
b/ Calculations are only for surveys for which any data are collected on food consumed away from home (N=90).

The detail or specificity with which data are collected is as important for reliable collection of data on food consumed away from home as it is for food consumed within the home. As noted above, the more detail with which data are collected, the better is respondents’ ability to recall and the higher is the likelihood that all of the food acquired and/or consumed will be captured given reasonable overall time limits to survey administration. While the large majority of the assessment surveys did indeed collect data on FCAFH, compared to the data collected on food consumed at home the detail with which data are collected is very poor.

In the case of interview surveys, 36 percent of those collecting any FCAFH data attempted to capture this broad component with only one line item in the entire questionnaire. Examples of these line items are:

• Food and drinks consumed outside the home

• Meals taken outside home

• Restaurant food, meal eaten at restaurant

• Cooked food and beverages consumed away from home

• Outdoor meals (breakfast, lunch, dinner).

For surveys employing this method, respondents are typically asked to report on the total expenditures of all household members on these food items over the recall period.

Data were collected for multiple places of consumption in only 23 percent of the surveys for which any FCAFH data were collected. The most common place was a restaurant, followed by bars, street stalls and educational institutions. Data were collected on in-kind receipts of food consumed outside of the home, as opposed to only purchases, for 54 percent of the surveys. Finally, detail on the types of foods and beverages consumed is scarce as well. Data were only collected on specific food items consumed away from home for 33 percent of the surveys, and very few dishes were listed, certainly not the wide variety that people are likely to eat in restaurants and other commercial establishments, especially in urban areas. Snacks and alcohol, both of which tend to be prominent in the expenses and nutrient intake of people eating food away from home, were specifically referred to in 30 and 35 percent of the surveys, respectively. The level of detail with which the data on food consumed away from home are collected tends to be greater for diary than for interview surveys. Note that data are collected at the individual (as opposed to household) level for only 17 percent of the surveys; doing so accommodates the reality that most food consumed away from home is eaten away from other family members and, most particularly, the survey respondent.

The reliability of the data collected on FCAFH is judged using three criteria:

1. Whether data are explicitly and deliberately collected on FCAFH (as defined above);

2. Whether the recall period for collection of the data is less than or equal to two weeks; and

3. Whether data are collected on in-kind receipts.

If all three of these criteria are met, the data on FCAFH are considered to be minimally reliable. Note that these criteria fall far below optimal data collection, which would entail detailed recording of the actual foods and/or meals consumed for food purchases and multiple sources of food received in kind--including from other households, food assistance, and free food received at schools and work places. Hopefully data collection will improve over the coming years, and the quality bar can be raised.

Figure 8 reports on the percentage of surveys meeting the three minimum criteria. As mentioned above, 90 percent of surveys explicitly collect data on FCAFH. Seventy-three percent have a recall period less than or equal to two weeks. Only 49 percent collect data on in-kind food received, however. Overall 42 percent of the assessment surveys satisfy the three minimum reliability criteria for the quality of data on food consumed away from home, signaling that the quality is indeed quite low at this point in time.
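As a rough illustration of the three minimum criteria, the check below flags a survey's FCAFH data as minimally reliable only when all three hold. The function and parameter names are hypothetical, and expressing the recall period in days is an assumption made for the sketch. The second helper shows one way to reconcile the 54 percent in-kind figure reported among the 90 surveys that collect any FCAFH data with the 49 percent reported when all 100 surveys form the denominator.

```python
from typing import Optional


def is_minimally_reliable(collects_fcafh: bool,
                          recall_days: Optional[int],
                          collects_in_kind: bool) -> bool:
    """Return True only if all three minimum criteria are met:
    (1) FCAFH data are explicitly collected,
    (2) the recall period is two weeks (14 days) or less,
    (3) in-kind receipts are covered."""
    if not collects_fcafh or recall_days is None:
        return False
    return recall_days <= 14 and collects_in_kind


def rebase_share(share_pct: float, n_subset: int, n_total: int) -> float:
    """Convert a percentage computed over a subset of surveys into a
    percentage of all surveys (surveys outside the subset count as 'no')."""
    return share_pct * n_subset / n_total


# Hypothetical survey: one-week recall, covers purchases and in-kind food.
one_week_with_in_kind = is_minimally_reliable(True, 7, True)

# Hypothetical survey: one-month recall fails the second criterion.
one_month_with_in_kind = is_minimally_reliable(True, 30, True)

# 54.4 percent of the 90 surveys with FCAFH data, as a share of all 100.
in_kind_share_of_all = round(rebase_share(54.4, 90, 100), 1)
```

Under this reading the two in-kind percentages describe the same surveys and differ only in the denominator used.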


3.7 Accounting for seasonality of food consumption patterns

As recognized in the report of the Seventeenth International Conference of Labour Statisticians on Household Income and Expenditure Statistics (ILO 2003), HCES should cover a full-year accounting period to take into account seasonal variations in expenditures. This is especially important in the case of food, because seasonal variations in dietary patterns, overall quantities of food consumed, and the consumption of particular nutrients can be pronounced (Coates et al. 2012a), partly due to their relationship with food production cycles.

Seasonality in food consumption patterns is captured by repeating a survey multiple times throughout a year’s period. The assessment surveys that account for seasonality in some way can be divided into two groups. The first are those for which the survey is conducted two to four times a year, either for the same households or a new sample. Twelve percent of the assessment surveys were conducted in this manner (Figure 9).34 The second method distributes data collection throughout a year by surveying sub-sets, usually one-twelfth, of households in the sample in each month of the year. This method is employed for just over forty percent of the assessment surveys. To capture differences in seasonal patterns across geographic areas within countries, survey primary sampling units should be randomly assigned to the different months. Given the information available, it was not possible to determine whether this geographical randomization was carried out for each survey.

Seasonal patterns in the consumption of many food items are highly pronounced. 53 percent of the surveys reviewed try to take this into account by conducting the survey two to four times a year (for the same households or a new sample) or by surveying sub-sets, usually one-twelfth, of households in the sample in each month of the year.

The overall minimum reliability criterion for whether seasonality is taken into account is that either one of the two methods is used. Just 53 percent of the surveys meet the criterion. The surveys that are instead undertaken over a limited time during a year’s period risk collecting data on food acquisition or consumption, and estimating indicators derived from them, that are not an accurate reflection of the overall, annual pattern in the population.
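The random assignment of primary sampling units (PSUs) to months mentioned above could look something like the following sketch. It is an assumption-laden illustration rather than a documented survey procedure: the function name, the PSU identifiers and the choice of a shuffled round-robin (which keeps the monthly workload balanced) are all hypothetical, and a real design would usually randomize within geographic strata.

```python
import random
from typing import Dict, List


def assign_psus_to_months(psu_ids: List[str],
                          seed: int = 0,
                          n_months: int = 12) -> Dict[int, List[str]]:
    """Randomly spread PSUs over the months of the survey year.

    PSUs are shuffled with a seeded generator and then dealt out in
    round-robin fashion, so each month receives roughly one-twelfth
    of the sample."""
    rng = random.Random(seed)
    shuffled = list(psu_ids)
    rng.shuffle(shuffled)
    schedule: Dict[int, List[str]] = {m: [] for m in range(1, n_months + 1)}
    for position, psu in enumerate(shuffled):
        schedule[position % n_months + 1].append(psu)
    return schedule


# Hypothetical sample of 120 PSUs: ten are fielded in each month.
example_psus = [f"PSU{i:03d}" for i in range(120)]
example_schedule = assign_psus_to_months(example_psus, seed=42)
```

Fixing the seed makes the fieldwork calendar reproducible, which helps when the assignment later needs to be documented or audited.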

One issue related to seasonality is that concerning measurement of “usual” consumption at the household (as opposed to population) level. For indicators that depend on the distribution of consumption across households rather than only means or totals (measures such as prevalences of poverty and calorie, protein and micronutrient insufficiencies), it is important that such usual consumption be captured for each survey household rather than just the sample as a whole. If data are only collected one time for each household for a “short” observation period, then usual consumption may not be captured because random shocks are included along with the real between-household inequality in consumption, leading to overestimates of population prevalences (Deaton and Grosh 2000; Murphy, Ruel and Carriquiry 2012).
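The mechanism can be illustrated with a small simulation. All numbers here are invented for the sketch (normally distributed usual consumption, an equally large transitory shock, and a threshold below the mean); they are not estimates from any survey. The point is only that adding within-household noise widens the observed distribution, so more households fall below a threshold that sits in the lower tail.

```python
import random
from typing import Tuple


def simulate_prevalence(n_households: int = 50_000,
                        mean_usual: float = 100.0,
                        sd_between: float = 20.0,
                        sd_shock: float = 20.0,
                        threshold: float = 70.0,
                        seed: int = 1) -> Tuple[float, float]:
    """Compare the share of households below a threshold when measured by
    usual consumption versus a single short-period observation."""
    rng = random.Random(seed)
    below_usual = 0
    below_observed = 0
    for _ in range(n_households):
        # True long-run ("usual") consumption of the household.
        usual = rng.gauss(mean_usual, sd_between)
        # One short observation: usual consumption plus a random shock.
        observed = usual + rng.gauss(0.0, sd_shock)
        below_usual += usual < threshold
        below_observed += observed < threshold
    return below_usual / n_households, below_observed / n_households


usual_prevalence, observed_prevalence = simulate_prevalence()
```

With these assumed parameters the normal distribution implies a prevalence of about 7 percent based on usual consumption and about 14 percent based on the single noisy observation, so the short observation period roughly doubles the estimate. If the threshold were above the mean, the same noise would instead lower the estimated prevalence.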

34 It was not possible to assess whether the specific times of year for which data were collected are appropriate for capturing seasonality for each of the surveys within the time frame of this assessment.

Figure 8 (bar chart; vertical axis: percent of surveys, 0 to 100; N=100 surveys): Data explicitly collected on food consumed away from home, 90.0; Recall period two weeks or less, 73.0; Data collected on both purchases and in-kind receipts, 49.0; All criteria, 42.0. Note: For the second and third criteria, surveys without data collected on food consumed away from home are included in the percentage.


But what defines “usual” consumption?35 To assess usual consumption, how many times should data be collected from households and for what observation or “reference” period?36 What difference will extending reference periods and conducting repeat visits actually make to estimates of poverty and nutrient insufficiencies? According to Gibson (2005), a one-time, 7 to 14 day observation period is insufficient for accurate poverty measurement. On the other hand, it is commonly agreed among nutritionists that a 24-hour observation period repeated at least twice on two nonconsecutive days is sufficient to capture usual nutrient intakes (Coates et al. 2012b). Therefore, the answers to these questions are far from clear and must be considered in future studies.

3.8 Summary

Figure 10 gives a summary of the extent to which the assessment surveys meet the minimum criteria for reliability of the food data collected. The good news is that many criteria are being met by the large majority of HCES. The criterion most often met was that data are collected on all three modes of acquisition. Other criteria that were met by large majorities of surveys are those regarding completeness of enumeration and comprehensiveness of the food list. While the majority of surveys met the criterion that the recall period for food data collection be two weeks or less, a full thirty percent did not. Only just over fifty percent of surveys met the criteria for specificity of the food list and for seasonality to be taken into account. The criterion that was met by the lowest percentage of surveys, just 42 percent, relates to the quality of data collected on food consumed away from home, a source of food that is likely to rapidly increase over the coming decades.

35 Deaton and Grosh (2000) write that a year is a “sensible” period over which to judge people’s living standards for poverty measurement. Murphy, Ruel and Carriquiry (2012) define usual nutrient intake simply as the “long-term average intake of a nutrient by an individual” (p. S236).

36 A reference period is the total period over which a household’s consumption and expenditures are observed. The reference period is longer than the recall period if households are visited multiple consecutive times.

Figure 9: Percent of assessment surveys taking seasonality into account. Bars (percent of surveys): A. One round in a limited number of months, 46.9; B. From 2-4 rounds in a year, 11.5; C. Twelve or more rounds spread throughout a year, 41.7; Seasonality taken into account (B or C), 53.1. Note: N=96 surveys.

Figure 10 (bar chart; vertical axis: percent of surveys, 0 to 100): Acquisition modes (all three), 85.0; Completeness of enumeration, 75.0; Comprehensiveness of food list, 72.2; At-home recall period < two weeks, 70.0; Specificity of food list, 54.2; Seasonality taken into account, 53.1; Quality of FCAFH* data collection, 42.0; All criteria, 12.9.

* Food consumed away from home.


References

ADePT-FSM. 2013. ADePT Food Security Module. http://www.fao.org/economic/ess/ess-fs/fs-methods/adept-fsn/en/

Alkire, Sabina and Foster, James. 2011. Counting and multidimensional poverty measurement. Journal of Public Economics, vol. 95(7-8), pp. 476-487.

Arimond, M. and M.T. Ruel. 2004. Dietary diversity is associated with child nutritional status. Journal of Nutrition (134) 2579-2585.

Barker, Mary, Ginny Chorghade, Sarah Crozier, Sam Leary and Caroline Fall. 2006. Gender differences in body mass index in rural India are determined by socio-economic factors and lifestyle. The Journal of Nutrition, No. 136, pp. 3062-3068.

Beegle, Kathleen, Joachim De Weerdt, Jed Friedman and John Gibson. 2012. Methods of household consumption measurement through surveys: experimental results from Tanzania. Journal of Development Economics 98(1): 3-18.

Berti, Peter R. 2012. Intrahousehold distribution of food: A review of the literature and discussion of the implications for food fortification programs. Food and Nutrition Bulletin (33)3: S163-S169.

Black, Robert E., Lindsay H. Allen, Zulfiqar A. Bhutta, Laura E. Caulfield, Mercedes de Onis, Majid Ezzati, Colin Mathers, Juan Rivera, for the Maternal and Child Undernutrition Study Group. 2008. The Lancet. 371: 243-260.

Bouis, H., L. Haddad and E. Kennedy. 1992. Does it matter how we survey demand for food? Evidence from Kenya and the Philippines. Food Policy 17(5): 349-360.

Cafiero, Carlo. 2012a. What do we really know about food security? Mimeo. Economic and Social Statistics. Food and Agriculture Organization of the United Nations, Rome.

Cafiero, Carlo. 2012b. Advance in hunger measurement: Traditional FAO methods and recent innovations. Mimeo. Economic and Social Statistics. Food and Agriculture Organization of the United Nations, Rome.

Caulfield, Laura E., Stephanie A. Richard, Juan A. Rivera, Philip Musgrove and Robert E. Black. 2006. Stunting, wasting and micronutrient deficiency disorders. Chapter 28 in Dean T Jamison, Joel G Breman, Anthony R Measham, George Alleyne, Mariam Claeson, David B Evans, Prabhat Jha, Anne Mills, and Philip Musgrove, eds. Disease Control Priorities in Developing Countries, 2nd edition. Washington, D.C., World Bank.

Chen, Shaohua and Martin Ravallion. 2010. The developing world is poorer than we thought, but no less successful in the fight against poverty. Quarterly Journal of Economics (125)4:1577-1625.

Chesher, A. 1997. Diet revealed?: Semiparametric estimation of nutrient intake-age relationships. J R Statist Soc 160: 389-428.

Clark, Renata. 1995. Micronutrient fortification of food: technology and quality control. Paper prepared for the Technical Consultation on Food Fortification: Technology and Quality Control. November 20-23, 1995. Rome, United Nations Food and Agriculture Organization.

Coates, Jennifer, Brooke Colaiezzi, Jack Fiedler, James Wirth, Keith Lividini, and Beatrice Rogers. 2012a. Applying dietary assessment methods for food fortification and other nutrition programs. GAIN Working Paper Series No. 4. Global Alliance for Improved Nutrition, Gerald J. and Dorothy R. Friedman School of Nutrition Science and Policy and HarvestPlus.

Coates, Jennifer, Brooke Colaiezzi, John L. Fiedler, James Wirth, Keith Lividini, and Beatrice Rogers. 2012b. A program needs-driven approach to selecting dietary assessment methods for decision-making in food fortification programs. Food and Nutrition Bulletin (33) 3(supplement): S146-S156.

Coudouel, Aline, Jesko Hentschel and Quentin Wodon. 2002. Poverty Measurement and Analysis. PRSP Sourcebook, World Bank, Washington D.C.

Deaton, A. 1997. The analysis of household surveys: A microeconometric approach to development policy. World Bank and Johns Hopkins University Press, Baltimore and London.

Deaton, A., and M. Grosh. 2000. Consumption. In Designing household survey questionnaires for developing countries: Lessons from 15 years of the Living Standards Measurement Surveys, ed. M. Grosh and P. Glewwe, 91-133. Washington, D.C.: World Bank.

Deloitte. 2011. Consumer 2020 – Reading the signs.

Dobermann, A. and R. Nelson. 2013. Opportunities and Solutions for Sustainable Food Production. Background paper for the High-Level Panel of Eminent Persons on the Post-2015 Development Agenda. Prepared by the co-chairs of the Sustainable Development Solutions Network Thematic Group on Sustainable Agriculture and Food Production.


Drichoutis, Andreas and Panagiotis Lazaridis. 2005. Food consumption issues in the 21st century. Intensive Program project: Business Management with emphasis on food industry. Agricultural University of Athens, Greece. http://works.bepress.com/andreas_drichoutis/25

Dupriez, Olivier and E. Boyko. 2010. Dissemination of Microdata Files - Principles, Procedures and Practices. IHSN Working Paper No 005.

Engle-Stone, Reina. 2012. Fortifiable food consumption in Cameroon: Comparison of FRAT, 24-hour recall, and HCES methods. Presentation at the International Scientific Symposium on Food and Nutrition Security Information: From valid measurement to effective decision-making. United Nations Food and Agriculture Organization, Rome January 17-19, 2012.

European Commission (EC), International Monetary Fund (IMF), Organisation for Economic Co-operation and Development (OECD), United Nations (UN) and the World Bank (WB). 2009. System of National Accounts 2008. ISBN 978-92-1-161522-7

Eurostat website. Statistics Explained. Accessed on January 16, 2013. http://epp.eurostat.ec.europa.eu/portal/page/portal/eurostat/home/

FAO. 2013. Guidelines for measuring household and individual dietary diversity. Food and Agriculture Organization of the United Nations, Rome.

FAO. 2012a. Food security statistics from national household surveys. ADePT Food Security Module – Input data files. FAO Statistics Division. Food and Agriculture Organization of the United Nations, Rome.

FAO. 2012b. The state of food insecurity in the world 2012. Food security indicators (on line). Food and Agriculture Organization of the United Nations, Rome. Accessed October 25, 2012. http://www.fao.org/publications/sofi/food-security-indicators/en/

FAO. 2012c. International Network of Food Data Systems: About INFOODS. Food and Agriculture Organization of the United Nations, Rome. http://www.fao.org/infoods/en/.

FAO. 2012d. FAO/INFOODS Guidelines for Food Matching. Version 1.2. Food and Agriculture Organization of the United Nations, Rome.

FAO. 2006. The double burden of malnutrition: case studies from six developing countries. FAO Food and Nutrition Paper #84. Food and Agriculture Organization of the United Nations, Rome.

FAO. 1987. The fifth world food survey. Food and Agriculture Organization of the United Nations, Rome. http://www.fao.org/economic/the-statistics-division-ess/methodology/methodology-systems/food-balance-sheets-and-the-food-consumption-survey-a-comparison-of-methodologies-and-results.

FAO, undated. Food balance sheets and the food consumption survey: A comparison of methodologies and results. Economic and Social Statistics. United Nations Food and Agriculture Organization, Rome.

FAO. 2001. Food balance sheets: A handbook. United Nations Food and Agriculture Organization, Rome.

FAO/WHO. 1994. Codex Alimentarius, Volume 4, 2nd edition.

Ferro-Luzzi, Anna. 2003. Individual food intake survey methods. In Measurement and assessment of food deprivation and undernutrition. Proceedings of International Scientific Symposium on Food Deprivation and Undernutrition, Rome June 26-28 2002. United Nations Food and Agriculture Organization, Rome.

Fiedler, John L., Calogero Carletto, and Olivier Dupriez. 2012. Still waiting for Godot? Improving Household Consumption and Expenditures Surveys (HCES) to enable more evidence-based nutrition policies. Food and Nutrition Bulletin (33)3: S242-S251.

Fiedler, John L., Yves Martin-Préval, Mourad Moursi. 2012. Towards overcoming the food consumption information gap: A nutrition analysis view of the costs of 24-hour recall and Household Consumption and Expenditure Surveys. Mimeo. HarvestPlus/IFPRI and IRD. International Food Policy Research Institute, Washington, D.C.

Fiedler, John L. 2009. Strengthening Household Income and Expenditure Surveys as a tool for designing and assessing food fortification programs. IHSN Working Paper No. 001. International Household Survey Network, Washington, D.C.

Fiedler, John L., Marc-Francois Smitz, Olivier Dupriez, and Jed Friedman. 2008. Food and Nutrition Bulletin 29(4): 306-319.

Gaiha, Raghav, Raghbendra Jha and Vani S. Kulkarni. 2009. How pervasive is eating out in India? ASARC Working Paper 2009/17. Australia South Asia Research Centre, Arndt-Corden Division of Economics. Australian National University, Canberra.

IHSN Working Paper No. 008

February 2014

65

Galal, O.M. 2002. The nutrition transition in Egypt: obesity, undernutrition and the food consump-tion context. Public Health Nutrition (5): 141-148.

Gale, Fred and Kuo Huang. 2007. Demand for food quantity and quality in China. ERS Report Sum-mary, Markets and Trade. Economic Research Service, U.S. Department of Agriculture, Wash-ington, D.C.

Gibson, R. S., and E. L. Ferguson. 1999. An interactive 24-hour recall for assessing the adequacy of iron and zinc intakes in developing countries. United States Agency for International Development and International Life Sciences Institute, Wash-ington, D.C.

Gibson, John. 2005. Statistical tools and estimation methods for poverty measures based on cross-sectional household surveys. Chapter 5 in Unit-ed Nations Statistics Division. United Nations handbook on poverty statistics. Special project on poverty statistics.

Grunberger, Klaus. Forthcoming. Reconciling Food Balance Sheet and Household Surveys. FAO.

Hammond, A. Kramer W.J., Tran J., Katz, R. and C. Walker. 2007. The Next 4 Billion. Market Size and Busi-ness Strategy at the Base of the Pyramid. Interna-tional Finance Corporation (IFC) and World Re-sources Institute (WRI)

ILO (International Labour Organization). 2003. Re-port II: Household income and expenditure sta-tistics. Seventeenth International Conference on Labour Statisticians. Geneva, 24 November-3 December 2003.

ILO/IMF/OECD/UNECE/Eurostat/The World Bank. 2004. Consumer price index manual: Theory and practice. Geneva, International Labour Of-fice

Jacobs, Krista and Daniel A. Sumner. 2002. The food balance sheets of the Food and Agriculture Orga-nization: A review of potential ways to broaden the appropriate uses of the data. Mimeo. Univer-sity of California, Davis. Davis, California.

Kapteyn, A. 1994. The measurement of household cost functions. Revealed preferences versus subjec-tive measures, Journal of Population Econom-ics, 7, pp. 333–50.

Kruse, J. 2010. Estimating Demand for Agricultural Commodities to 2050. Global Harvest Initiative (pre-publication draft)

Lividini, K., J. Fiedler and O. Bermudez. 2012. Using HCES to inform nutrition policy: The Zambia micronutrient portfolio study. Presented at the 8th International Conference on Diet and Activity

Methods. Methodological Challenges for Mea-suring the Achievements of International Poli-cies. May 14-17, Food and Agriculture Organiza-tion of the United Nations, Rome.

Ma, Henguyn, Jikun Huang, Frank Fuller and Scott Rozelle. 2006. Getting rich and eating out: con-sumption of food away from home in urban Chi-na. Canadian Journal of Agricultural Econom-ics, 54(2006): 101-119.

Mancino, Lisa, Jessica Todd and Biing-Hwan Lin. 2009. Separating what we eat from where: Mea-suring the effect of food away from home on diet quality. Food Policy 34(6): 557-562.

Ma, Henguyn, Jikun Huang, Frank Fuller and Scott Rozelle. 2006. Getting rich and eating out: con-sumption of food away from home in urban Chi-na. Canadian Journal of Agricultural Econom-ics, 54(2006): 101-119.

Mauritius Ministry of Economic Development. 1997. Household budget survey 1998/97. Central Sta-tistical Office. Ministry of Economic Develop-ment, Productivity and Regional Development. Republic of Mauritius.

Maxwell, Simon and Rachel Slater. 2003. Food policy old and new. Development Policy Review. Vol-ume 21, Issue 5-6, pp. 531-553.

Moursi, Jourad, Howarth E. Bouis, Patrick Eozenou, Christine Hotz, and J.V. Meenakshi. 2012. How do Household Consumption and Expenditure Surveys compare to 24-hour recalls in terms of macro and micronutrient intakes? Evidence from Mozambique. Presented at the 8th Inter-national Conference on Diet and Activity Meth-ods. Methodological Challenges for Measuring the Achievements of International Policies. May 14-17, Food and Agriculture Organization of the United Nations, Rome.

Murphy, Suzanna, Marie Ruel and Alicia Carriquiry. 2012. Should Household Consumption and Ex-penditures Surveys (HCES) be used for nutri-tional assessment and planning? Food and Nu-trition Bulletin (33) 3(supplement): S235-S241.

Naiken, L. 2003. FAO metholdogy for estimating the prevalence of undernourishment. In Measure-ment and assessment of food depreivation and undernutrition, United Nations Food and Agri-culture Organization, Rome.

Naska, A. V. Basdekis, and A. Trichopoulou. 2001. A preliminary assessment of the use of household budget survey data for the prediction of individ-ual food consumption. Public Health Nutrition (4): 1159-65.

Assessment of the Reliability and Revelance of the Food DataCollected in National Household Consumption and Expenditure Survey

66

Naska, A., E Oikonomou, A. Trichopoulou, K. Wagner and K. Gedrich. 2007. Estimations of daily ener-gy and nutrient availability based on nationally representative household budget survey data. The Data Food Networking (DAFNE) project. Public Health Nutrition 10(12), 1422-1429.

Neufeld, L.M. and L. Tolentino. Nutritional surveil-lance, developing countries. In B. Cabellero, A. Prentice and L. Allen (eds). Encyclopedia of Hu-man Nutrition. Third edition. Elsevier Press, Oxford.

Omar, Dary and Zo Rambeloson Jariseta. 2012. Vali-dation of dietary applications of Household Consumption and Expenditure Surveys (HCES) against a 24-hour recall method in Uganda. Food and Nutrition Bulletin (33)3(supplement): S190-198.

Pingali, Prabhu and Yasmeen Khwaja. 2004. Global-ization of Indian diets and the transformation of food supply systems. ESA Working Paper No. 04-05, February 2004. Agricultural and Devel-opment Economics Division. Food and Agricul-tural Organization of the United Nations, Rome.

Popkin, Barry M. 2008. The world is fat: the fads, trends, policies and products that are fatten-ing the human race. Avery-Penguin Press, New York.

Popkin, Barry M., Bing Lu and Fengying Zhai. 2002 (updated in 2006). Understanding the nutrition transition: measuring rapid dietary changes in transitional countries. Public Health Nutrition 5(6A): 947-953.

Popkin, Barry. 2002. Stages of the Nutrition Transi-tion: Dynamic Global Shifts Appear to be Accel-erating. (Need full reference).

Poti, J.M. and B.M. Popkin. 2011. Trends in energy in-take among US children by eating location and food source, 1977-2006. J Am Diet Assoc 111(8): 1156-64.

Ramakrishnan U. 2002. Prevalence of micronutri-ent malnutrition worldwide. Nutr Rev. 60(5 Pt 2):S46-52.

Rambeloson Jariseta, Z., O. Dary, J.L. Fiedler, and N. Franklin. 2012. Comparison of estimates of the nutrient density of the diet of women and chil-dren in Uganda by Household Consumption and Expenditures Surveys (HCES) and 24-hour recall. Food and Nutrition Bulletin 33(3): S199-S208.

Ravallion, Martin. 1998. Poverty Lines in Theory and Practice. Living Standards Measurement Study Working Paper No. 133. World Bank, Washing-ton, D.C.

Ravallion, Martin and Benu Bidani. 1994. How Robust is a Poverty Profile? World Bank Economic Re-view, Volume 8, Issue 1, pp. 75-102

Ravallion, Martin and Shaohua Chen. 2010. The Devel-oping World is Poorer than We Thought, But No Less Successful in the Fight Against Poverty. The Quarterly Journal of Economics 125(4): 1577-1625.

Rogers, Beatrice, Jennifer Coates, and Alexander Blau. 2012. Estimating individual consumption from national household consumption and expendi-ture survey data for nutrition programming deci-sions. Presented at the 8th International Confer-ence on Diet and Activity Methods. Methodologi-cal Challenges for Measuring the Achievements of International Policies. May 14-17, Food and Agriculture Organization of the United Nations, Rome.

Ruel, M. T., L. Haddad, M. Arimond, D. Gilligan, N. Mi-not, K. Simler, and X. Zhang. 2003. Diet quality and diet changes of the poor: A global research program to improve dietary quality, health, and nutrition. A proposal for a global research pro-gram (GRP24). International Food Policy Re-search Institute, Washington, D.C.

Schmidhuber, Josef and Prakash Shetty. 2005. The nutrition transition to 2030: Why developing countries are likely to bear the major burden. Food Economics (2)3/4: 150-166.

Sekula, W., M. Nelson, K. Figurska, M. Oltarzewski, R. Weisell, and L. Szponar. 2005. Comparison between household budget survey and 24-hour recall data in a nationally representative sample of Polish households. Public Health Nutrition. 8(4): 430-439.

Senauer, Ben. 2006. “The growing market for high-value food products in developing and transition countries.” Journal of Food Distribution Re-search 37(1): 22-27.

Sibrian, Ricardo, ed. 2008. Deriving food security in-formation from national household budget sur-veys; experiences, achievements, challenges. Food and Agriculture Organization of the United Nations, Rome.

Smith, Lisa C. 2012. The great Indian calorie debate: Explaining rising undernourishment during In-dia’s rapid economic growth. TANGO, Interna-tional, Tucson, AZ.

Smith, Lisa C and Timothy R. Frankenberger. 2012. Ty-pology of food away from home. TANGO, Inter-national, Tucson, AZ.

Smith, Lisa C. and Ali Subandoro. 2007. Measuring food security using household expenditure sur-

IHSN Working Paper No. 008

February 2014

67

veys. Food Security in Practice Series. Interna-tional Food Policy Research Institute, Washing-ton, D.C.

Smith, Lisa C., Harold Alderman and Dede Aduayom. 2006. Food insecurity in Sub-Saharan Africa: New estimates from household expenditure sur-veys. IFPRI Research Report #146. Internation-al Food Policy Research Institute, Washington, D.C.

Smith, Lisa C. and Ali Subandoro. 2005. Improving the Empirical Basis for Assessing Food Insecu-rity in Developing Countries: Asia. Project Final Report. Submitted to the Department for Inter-national Development of the United Kingdom and the Australian Agency for International De-velopment. International Food Policy Research Institute, Washington, D.C.

Swindale, A., and P. Ohri-Vachaspati. 2005. Measuring household food consumption: A technical guide. Food and Nutrition Technical Assistance (FAN-TA) Project and Academy for Educational De-velopment (AED), Washington, D.C. http://www.fantaproject.org/publications/householdcons.shtml.

Trichopoulou, Antonia. 2012. Dietary patterns and their socio-demographic determinants in 16 countries: Data from the DAFNE-ANEMOS da-tabank. Presentation at the International Scien-tific Symposium on Food and Nutrition Security Information: From Valid Measurement to Effec-tive Decision-Making. United Nations Food and Agriculture Organization, Rome, January 16-19 2012.

Trichopoulou, Antonia and Androniki Naska. 2001. The DAFNE Initiative: Assessment of dietary patterns across Europe using household budget survey data. Special Issue. Public Health Nutri-tion, Volume 4, Number 5(B).

United Nations. 2009. Practical guide to producing consumer price indices. United Nations, New York and Geneva.

United Nations. 1989. Household income and expendi-ture surveys: A technical study. National House-hold Survey Capability Programme. United Na-tions Department of Technical Co-operation for Development and Statistical Office. New York.

UNSD (United Nations Statistics Division). 2005. Unit-ed Nations handbook on poverty statistics. Spe-cial project on poverty statistics.

UNU, WHO, and FAO (United Nations University, World Health Organization, and Food and Ag-riculture Organization of the United Nations). 2004. Human energy requirements: Report of a joint FAO/WHO/UNU expert consultation,

Rome, October 17-24, 2001. FAO Food and Nu-trition Technical Report Series 1. FAO, Rome.

USDA. 2012a. USDA expenditures data Table 1: “Food and alcoholic beverages: Total expenditures”. http://www.ers.usda.gov/datafiles/Food_Expendi-tures/Food_Expenditures/table1.xls

USDA. 2012b. Food consumption and demand: food away from home. United States Department of Agriculture, Economic Research Service. http://www.ers.usda.gov/topics/food-choices-health/food-consumption-demand/food-away-from-home.aspx.

USDA. 1946. Title 7—Agriculture; Subtitle B -- Regula-tions Of The Department Of Agriculture; Chapter I - Agricultural Marketing Service H1 (Standards, Inspections, Marketing Practices), Department Of Agriculture; Subchapter C - Regulations And Standards Under The Agricultural Marketing Act Of 1946 And The Egg Products Inspection Act; Part 52 - Processed Fruits And Vegetables, Processed Products Thereof, And Certain Other Processed Food Products H1,Subpart - Regula-tions Governing Inspection And Certification ,Definitions.

U.S. Department of Agriculture. 2012. USDA National Nutrient Database for Standard Reference, Re-lease 25. Nutrient Data Laboratory Home Page, http://www.ars.usda.gov/ba/bhnrc/ndl

Vasdekis, V.G.S, S. Stylianou and A. Naska. 2001. Es-timation of age- and gender-specific food avail-ability from household budget survey data. Pub-lic Health Nutrition 4(5B): 1149-1151.

Vasdekis, V.G.S and A. Trichopoulou. 2000. Nonpara-metric estimation of individual food availabil-ity along with bootstrap confidence intervals in household budget surveys. Statistics Probability Lett. 46: 337-345.

Ward, K. and F. Neuman. 2012. Consumer in 2050. The rise of the EM middle class. HSBC Global Research. Global Economics.

Weisell, Robert and Marie Claude Dop. 2012. The adult male equivalent concept and its application to Household Consumption and Expenditure Sur-veys (HCES). Food and Nutrition Bulletin 33(3): S157-S162.

Welch, R. 2004. Micronutrients, agriculture, and nutri-tion: Linkages for improved health and well be-ing. USDA-ARS, U. S. Plant, Soil and Nutrition Laboratory, Ithaca, N.Y. (http://www.css.cor-nell.edu/FoodSystems/Micros%26AgriMan1ref.html)

World Bank. 2012. Country and lending groups. (Ac-cessed online November 2012). World Bank, Washington, D.C.

Assessment of the Reliability and Revelance of the Food DataCollected in National Household Consumption and Expenditure Survey

68

http://data.worldbank.org/about/country-classifica-tions/country-and-lending-groups

World Bank. 2010. Global strategy to improve agricul-tural and rural statistics: Report of the friends of the chair on agricultural statistics. World Bank, Washington, D.C.

World Economic Forum and Deloitte Touche Tohm-atsu. 2009. Sustainability for Tomorrow’s Con-sumer. The Business Case for Sustainability

World Health Organization (WHO). 2002. Global-ization, diets and noncommunicable diseases. World Health Organization, Geneva.

LSMS GUIDEBOOK
July 2017

The Use of Non-Standard Units for the Collection of Food Quantity

A Guidebook for Improving the Measurement of Food Consumption and Agricultural Production in Living Standards Surveys

Gbemisola Oseni, Josefine Durazo, and Kevin McGee
World Bank

TABLE OF CONTENTS

ACKNOWLEDGMENTS

EXECUTIVE SUMMARY

1. INTRODUCTION
    1.1 Standard vs. non-standard units
    1.2 The market survey
    1.3 The main survey
    1.4 This Guidebook

2. METHODOLOGIES FOR REPORTING CONSUMPTION AND PRODUCTION QUANTITIES
    2.1 Collecting data on food consumption
    2.2 Collecting data on agricultural production
    2.3 Importance of non-standard units in household surveys
    2.4 Non-standard units in household surveys
        Benefits of allowing reporting in non-standard units
        Challenges of allowing reporting in non-standard units
        How common are non-standard units in surveys?

3. GUIDELINES AND PROCEDURES FOR CAPTURING AND USING NON-STANDARD UNITS
    3.1 Market survey planning and preparation
        Timing of market survey
        Selection of markets to visit
        Preparation of survey materials
    3.2 Constructing the list of non-standard units
        Market survey before/independent of main survey
        Market survey after main survey
    3.3 Collecting weights for conversion factors
    3.4 Collecting reference photos
        Which item-units require photos?
        Guidelines for reference photos
        Creating and using the photo reference album
    3.5 How to use the non-standard units libraries

4. BENEFITS OF USING COMPUTER ASSISTED PERSONAL INTERVIEWING (CAPI)
    4.1 CAPI for market surveys
    4.2 CAPI for food consumption and agricultural production survey

5. CONCLUSION

REFERENCES

ANNEX I: SURVEY INSTRUMENTS
    NSU market survey: questionnaire (Nigeria)
    NSU market survey: manual (Nigeria)
    Household survey: reference photo album (Ethiopia)
    Household survey: consumption module (with NSUs)
    Household survey: training manual (excerpt)
    * Additional examples available online

ANNEX II: LIBRARY OF NON-STANDARD UNIT CONVERSION FACTORS AND REFERENCE PHOTOS
    Ethiopia: documentation and reference photographs
    Malawi: documentation and reference photographs
    Nigeria: documentation and reference photographs
    Uganda: documentation
    * All documents available online at www.worldbank.org/lsms


EXECUTIVE SUMMARY

This Guidebook is a reference for survey practitioners, providing advice on how to incorporate non-standard units (NSUs) of measurement into household surveys for the collection of food consumption and production quantities. Food consumption and agricultural production are two critical components for monitoring poverty and household well-being in low- and middle-income countries. Accurate measurement of both provides better contextual understanding and contributes to more effective policy design.

At present, there is no standard methodology for collecting food quantities. In many household surveys, respondents are forced to estimate quantities in standard or metric units, typically kilograms or liters. This method requires respondents to convert from whatever unit they actually consumed (e.g., a bowl of rice) to a standard unit. This conversion process is often an unfamiliar or difficult task for respondents and can introduce measurement error. We argue that allowing respondents to report quantities directly in NSUs places less of a burden on respondents and will ultimately improve the accuracy of the information they provide.

Despite these benefits, there are some challenges with this approach. First, these NSU quantities must still be converted into standard units for aggregation and analysis. Often, conversion factors are not readily available and must be created, a process that involves its own data-collection effort. A second challenge is that NSUs are by their nature not necessarily standardized across respondents. One person’s “bunch” of bananas could be half the size of another person’s “bunch.” Showing reference photos of “bunches” to respondents helps ensure that the unit “bunch” is more consistently understood when reported. This requires that a photo reference album also be prepared.

This Guidebook explains how to properly incorporate NSUs into data-collection activities, from establishing the list of allowable NSUs to incorporating all components into household surveys. An NSU-focused market survey is a critical part of preparing the conversion factors required for effectively using NSU data in analysis. As such, the bulk of this Guidebook focuses on implementing the market survey and on calculating conversion factors to ensure the highest-quality data when using NSUs.

Practical guidance on non-standard units, conversion factors, and reference photos

Although existing data must first be taken into consideration, establishing a baseline of properly documented NSUs will most often require conducting a market survey, whereby survey teams seek out item-unit combinations in the market to weigh and photograph. Both market outputs then become inputs to the main household survey: the reference photos are shown to respondents during interviews and the weights are used to create conversion factors that are applied to the reported NSU quantities, facilitating their use in data quality assurance and data analysis. Collectively, these components comprise what is referred to herein as the NSU library.

There are several important steps to follow in preparing the library:

1) Preparation: Plan the timing (relative to the main survey) and the locations of the market survey, prepare the necessary market-survey materials (instruments and manuals), and construct a list of item-unit combinations that will be allowed in the main survey.

2) Market survey implementation: Collect weights and reference photos, taking into account any sub-national variation.

3) Data documentation for the main survey: Using the market data, create conversion factors for the NSUs and draft clear user protocols for enumerators (in terms of reference photos) and data users (in terms of conversion factors).
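As a rough illustration of step 3, the sketch below turns market-survey weights into conversion factors. It is only a sketch under stated assumptions: the records, the field layout, and the choice of the median as the summary statistic are all hypothetical and are not taken from this Guidebook; grouping by region stands in for the sub-national variation mentioned in step 2.

```python
from collections import defaultdict
from statistics import median

# Hypothetical market-survey records: (item, unit, region, weight in kg).
# The values are invented for illustration only.
market_weights = [
    ("rice", "bowl", "north", 0.70),
    ("rice", "bowl", "north", 0.68),
    ("rice", "bowl", "north", 0.69),
    ("rice", "bowl", "south", 0.55),
    ("rice", "bowl", "south", 0.57),
    ("banana", "bunch", "north", 1.9),
    ("banana", "bunch", "north", 2.3),
]


def build_conversion_factors(records):
    """Return {(item, unit, region): kg per unit}.

    Assumption: the median of the weighed samples is used as the
    conversion factor because it is less sensitive to a single
    mis-weighed sample than the mean.
    """
    grouped = defaultdict(list)
    for item, unit, region, kg in records:
        grouped[(item, unit, region)].append(kg)
    return {key: median(weights) for key, weights in grouped.items()}


factors = build_conversion_factors(market_weights)
for key in sorted(factors):
    print(key, round(factors[key], 3))
```

Keeping the region in the key means a "bowl" of rice can carry a different factor in each region; where units do not vary sub-nationally, the same function can be called with a single region label.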


Procedures for properly implementing these steps are summarized here, and are then covered in detail throughout the Guidebook.

a. In terms of planning and preparation, a list of valid item-unit combinations should first be constructed by reviewing, updating, and supplementing as necessary any existing sources that contain information on common NSUs. Next, when planning the market survey, it is especially important to consider its timing relative to the main survey where consumption and agricultural production data will be collected. Ideally, the market survey should be conducted prior to the main survey in order to use the reference photos during the main survey. If necessary, a much smaller-scale market survey can be conducted after the main survey to collect missing weights for any unanticipated conversion factors. Finally, markets should be selected to ensure adequate coverage of NSUs in the relevant context. This is particularly important if NSUs differ across regions.

b. Following these preparatory steps and the detailed market-survey implementation guidelines herein will ensure that as many item-unit combinations as possible are collected, that the weights collected are comparable and accurate, and that the reference photos clearly demonstrate the actual size of the NSUs. Annex I contains sample survey instruments.

c. After the market survey, the information collected should be prepared for use with the main survey. A library of NSU materials should be compiled, starting with the calculation of conversion factors that can be applied to NSU consumption and production quantities collected during the main survey. These conversion factors are used to flag unreasonable quantities for further verification; when surveys are conducted using computer-assisted personal interviewing (CAPI), this can be done during the course of fieldwork. When the main survey is complete, the conversion factors can be used to calculate total consumption, analyze poverty, etc. The library should also include an album of reference photos compiled from the photos collected in the market survey. This album should be used by the enumerators conducting the main survey to provide a reference size for NSUs. Finally, the library must include documentation of how the materials were prepared and how to properly use them during the main survey. We highly recommend that the library be made publicly available for use in other surveys in order to further standardize NSU reporting across data-collection efforts. The library can be continually updated as more information is collected.
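The two uses of conversion factors described in item c (flagging unreasonable quantities and computing totals) can be sketched as follows. This is a minimal illustration, not the procedure used in any LSMS survey: the factor table, the plausibility ceiling, and the function names are hypothetical. Only the 0.689 kg-per-bowl figure echoes the example in Figure 1.

```python
# Hypothetical conversion factors in kg per unit, keyed by (item, unit).
CONVERSION_KG = {("rice", "bowl"): 0.689, ("rice", "kg"): 1.0}

# Hypothetical plausibility ceiling: kg per household per 7-day recall.
MAX_KG_PER_WEEK = {"rice": 25.0}


def to_kg(item, unit, quantity):
    """Convert a reported quantity to kg; return None if no factor exists."""
    factor = CONVERSION_KG.get((item, unit))
    return None if factor is None else quantity * factor


def check_report(item, unit, quantity):
    """Return (kg, flag); flag is None when the report needs no follow-up."""
    kg = to_kg(item, unit, quantity)
    if kg is None:
        return None, "missing conversion factor"
    if kg > MAX_KG_PER_WEEK.get(item, float("inf")):
        return kg, "implausibly large quantity"
    return kg, None


# Three reports as an enumerator might record them: (item, unit, quantity).
reports = [("rice", "bowl", 1.5), ("rice", "bowl", 150), ("rice", "heap", 2)]
results = [check_report(*report) for report in reports]

# Only unflagged reports enter the household total.
total_kg = sum(kg for kg, flag in results if kg is not None and flag is None)

for report, (kg, flag) in zip(reports, results):
    print(report, kg, flag)
print("total kg:", round(total_kg, 4))
```

In a CAPI setting the same check could run as each answer is entered, so the enumerator can verify a flagged quantity with the respondent on the spot. A "missing conversion factor" flag corresponds to the item-unit combinations that a small follow-up market survey would need to weigh.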

Annex II to this Guidebook contains libraries for Ethiopia, Nigeria, Malawi, and Uganda, and is available online. Although they are targeted for use with LSMS-ISA surveys, the libraries are intended to be used by any researchers conducting similar survey activities in these countries. The libraries should be considered living documents, to be revised and updated with each new data-collection effort given that available foods and commonly used units and quantities may vary over time. Even so, making NSU libraries publicly available for more countries will make it easier to implement surveys that allow NSUs and will therefore result in improved data collection for quantities of food consumption and agricultural production.


1. Introduction

Measuring poverty often depends on measuring food—food that is both purchased and harvested from the field. In low- and middle-income countries especially, food consumption still constitutes the largest share of total household consumption. As such, constructing a food poverty line and using it to estimate the total poverty line is the preferred methodology for measuring the share of households that are poor, which in turn is one of the most common welfare-analysis indicators for developing economies. Another important element of welfare analysis is the productivity of income-generating activities. In many low- and middle-income countries, agriculture is a major source of livelihood, and measuring agricultural productivity requires adequately measuring the quantity of agricultural output. Data on food quantity is also important for the computation of unit values for food items and crops, which in turn is critically important for monitoring and analyzing prices. Despite the importance of this information, accurately measuring both the quantity of food consumed and the quantity of agricultural output can be very challenging.

1.1 STANDARD VS. NON-STANDARD UNITS

One important aspect of collecting information on food consumption and agricultural production is the choice of units in which respondents can report quantities. Many surveys require quantities to be reported only in “standard” units such as kilograms, pounds, liters, etc. In these cases, “local” or “non-standard” units are disallowed. Forcing respondents to report only in standard units simplifies the use of the data (since aggregation/analysis of food-item consumption often requires a common unit of measure) but it can impose a significant cognitive burden on the respondent, which in turn can reduce the accuracy of the resulting data.

Many respondents in low- and middle-income countries are more comfortable reporting their food consumption and production using familiar “local” or “non-standard” units instead of standard units. Forcing respondents to convert from these familiar units into standard units during an interview is a type of cognitive task. Recent studies show that asking respondents to combine memory recall with cognitive tasks, such as abstracting consumption to a “typical week or month,” leads to less accurate self-reporting (Beegle et al., 2010).

The forced conversion from non-standard to standard units requires respondents to undergo the process depicted in Figure 1. Respondents 1) must have a good understanding of what a standard unit of a food item is (e.g., how much is a kilogram of rice), 2) must estimate how many standard units correspond to the NSU they know (e.g., how many kilograms fit into a cup of rice), and finally 3) using the conversion from 2, must calculate the quantity consumed in standard units (e.g., 1 cup of rice is about 0.5 kg, I consumed 1.5 cups of rice, so I consumed about 0.75 kg of rice). All three stages place a cognitive burden on the respondent and can lead to sizable measurement error. Allowing respondents to report consumption directly in NSUs would ease this burden and would ultimately result in more accurate reporting of their consumption.


Figure 1: Forcing NSU Conversion vs. Allowing NSUs

FORCING STANDARD UNITS: More burden on the respondent, less consistency in conversion factors
Enumerator: How many kilograms of rice did you consume in the past 7 days?
Respondent (thinking): We consumed 1.5 bowls of rice. How much rice is a kg? I am not really sure. How many kgs of rice are in a bowl? I guess about 0.5 kg. So then I guess we consumed 1.5 bowls of rice x 0.5 kg in a bowl = 0.75 kg of rice!
Respondent: We consumed 0.75 kg.

ALLOWING NSUs: Simplifies respondent’s role, conversion factors are consistent
Enumerator: How much rice did you consume in the past 7 days?
Respondent: We consumed 1.5 bowls of rice.
Enumerator: Was the bowl similar to this size, or to this size?
Respondent: This size.
Afterward: Apply collected conversion factor of 0.689 kg of rice per bowl to get 1.0335 kg of rice consumed.

Source: World Bank, LSMS Team.


While allowing NSU reporting will eliminate some burdens for the respondent, it does not mean the issues of NSU conversion disappear. Instead, it falls to the survey or research team to acquire the necessary information to take NSU quantities from respondents and convert them to common standard units (i.e., undertake steps 1, 2, and 3 mentioned above). The most critical information required to make the conversions is a list of standard-unit conversion factors for each NSU as well as for each food item. These item-unit conversion factors can be applied to NSU quantities reported by respondents to convert them into standard units (typically kilograms and liters). In an ideal world, a list of such conversion factors would already exist for the relevant country or context. However, the current reality is that conversion factors are not readily available in many low- and middle-income countries. When they are available, they are often limited in scope or poorly documented, making their applicability and reliability hard to determine. When reliable conversion factors are not available, it is up to the national statistics agency or research team to collect the weights and calculate the conversion factors required to convert non-standard units into clearly and widely measurable standard units such as kilograms or liters.
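To make the idea of an item-unit conversion factor concrete, the following is a minimal Python sketch of how reported NSU quantities might be converted to kilograms. It is an illustration only, not the Guidebook's implementation: the table name `CONVERSION_FACTORS_KG` and the function `to_kilograms` are hypothetical, and the only factor taken from the text is the 0.689 kg of rice per bowl quoted in Figure 1. The tomato value is invented.

```python
from typing import Dict, Tuple

# Hypothetical item-unit conversion factors, in kilograms per one unit.
# Only the rice/bowl value comes from Figure 1; the tomato value is made up.
CONVERSION_FACTORS_KG: Dict[Tuple[str, str], float] = {
    ("rice", "bowl"): 0.689,
    ("rice", "kg"): 1.0,
    ("tomato", "small heap"): 0.40,  # illustrative assumption
}


def to_kilograms(item: str, unit: str, quantity: float) -> float:
    """Convert a reported quantity to kilograms using an item-unit factor."""
    key = (item, unit)
    if key not in CONVERSION_FACTORS_KG:
        # A missing factor is a documentation gap, not a zero quantity.
        raise KeyError(f"No conversion factor for {item!r} reported in {unit!r}")
    return quantity * CONVERSION_FACTORS_KG[key]


# The respondent in Figure 1 reports 1.5 bowls of rice.
reported_kg = to_kilograms("rice", "bowl", 1.5)
print(round(reported_kg, 4))  # 1.0335
```

Keying the table on the (item, unit) pair rather than on the unit alone reflects the point made above: the same container holds a different weight depending on the item.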

Going about collecting the necessary information to properly incorporate NSUs into a survey and make NSU quantities usable for analysis is not a trivial process. The varied nature of NSUs introduces significant challenges to any survey or research team undertaking this task. For example, similarly named NSUs can vary significantly within countries or subnational regions. Even within the same locality, NSUs often come in more than one size (e.g. small, medium, large). The challenges are particularly significant for vaguely defined NSUs such as pieces, heaps, bunches, etc. A “heap” of tomatoes can vary dramatically in size, making it difficult to convert each respondent’s “heap” in a consistent and accurate manner. While these challenges are significant, there are no comprehensive guidelines on how to properly collect this information.

This Guidebook is meant to fill this gap by highlighting the necessary steps and best practices for collecting this information. Establishing a systematic, well-documented, and more precise set of conversion factors for non-standard units, and using it both to inform survey design and to convert reported measurements, will go a long way toward increasing the accuracy of crop-output estimates and household consumption. This in turn will allow for more informed policymaking on important development issues such as household and individual welfare as well as agricultural productivity.

1.2 THE MARKET SURVEY

There are a few methods for calculating conversion factors. One such method, suggested by Capéau (1995) and Capéau and Dercon (2006), is to compare unit prices using econometric techniques to estimate conversion factors. While this method is fairly simple to implement, it suffers from some drawbacks. Primary among these is that unit prices can vary because of factors unrelated to the actual mass or volume of an item. For example, unit prices can vary because of quality differences (Deaton 1997) or because of price discounts on larger units (Attanasio & Frayre, 2006).1 In addition, unlike conversion factors, prices can be subject to significant volatility due to market forces. These sources of variability in unit prices unrelated to mass or volume can result in distorted or imprecisely estimated conversion factors.
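The bias from bulk discounts can be seen in a small worked example. The sketch below is a deliberately simplified price-ratio calculation, not the econometric estimator of Capéau and Dercon; all prices, the 50 kg sack, and the function name `price_implied_factor` are hypothetical.

```python
def price_implied_factor(price_per_nsu: float, price_per_kg: float) -> float:
    """Conversion factor (kg per NSU) implied by a simple price ratio."""
    return price_per_nsu / price_per_kg


# Hypothetical market: a sack that truly holds 50 kg of grain.
true_weight_kg = 50.0
numeraire_price_per_kg = 100.0  # price when buying by the kilogram
sack_price_per_kg = 90.0        # assumed 10% bulk discount on sacks
sack_price = true_weight_kg * sack_price_per_kg

implied = price_implied_factor(sack_price, numeraire_price_per_kg)
bias_kg = implied - true_weight_kg
print(implied, bias_kg)  # 45.0 -5.0
```

Under these assumed prices the ratio implies a 45 kg sack, 5 kg below the true weight, even though nothing about the sack itself has changed. Directly weighing the unit avoids this source of error.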

The main alternative method is to conduct a market survey where non-standard units are sought out and directly weighed. This is a more intensive process than calculating conversion factors from unit prices, but will likely result in more accurate conversion factors. When conducting a market survey, there are certain protocols that must be followed to ensure the collected weights are accurate and usable for creating conversion factors. For example, conversion factors for vaguely defined units (especially non-container units such as pieces, heaps, bunches, etc.) are most reliable when accompanied by reference photos. These photos can be shown to the respondents to provide standardized reference sizes for a “small heap” of onions, for example. Without the photos, the “small heap” reported by the respondent could be considerably different from the “small heap” used to establish the conversion factors (see Figure 2). These reference photos must therefore be taken and collected along with the weights.

Figure 2: Vaguely Defined NSUs
(Illustrated exchange: “How much?” “One small pile.”)
Source: World Bank, LSMS Team.

1 Under this methodology, a numeraire unit price (usually per kilogram or liter) is used to compare with other units. For larger units, there may be a discount in the price per kilogram, and thus applying the numeraire unit price would underestimate the conversion factor for larger units. The reverse is also true for smaller units.

1.3 THE MAIN SURVEY

Once all the requisite information is collected for proper implementation of NSUs in a survey, the interview process becomes much less taxing on the respondents, without additional burden on the enumerators. The bottom panel of Figure 1 depicts the revised process. The respondent is only required to think about consumption or production in the unit with which she is most familiar. The enumerator simply confirms this unit using the reference photo and then records the amount in NSUs. Afterward, conversion factors are applied to the reported NSUs to arrive at the correct standard weight.

1.4 THIS GUIDEBOOK

This Guidebook serves as a reference for preparing and using non-standard units: establishing a list of valid NSUs, collecting standard weights and reference aids for NSUs (usually via a market survey), calculating conversion factors from these weights, and incorporating NSUs into household and agriculture surveys. It also provides a library of local units, conversion factors, and photographic aids for selected countries. The Guidebook is structured as follows. Section 2 provides background on non-standard units (NSUs), discusses the importance of properly quantifying household consumption and production, and offers various methods of collecting this data. The section also details the benefits and challenges of NSUs and documents their use in the Living Standards Measurement Study-Integrated Surveys on Agriculture (LSMS-ISA) implemented by the LSMS team of the World Bank. Section 3 outlines the components of a high-quality NSU library as well as the necessary procedures for conducting a market survey to collect the components of the library. Section 4 discusses some of the important benefits that are derived from utilizing CAPI survey methods to both collect and use information contained in an NSU library. Section 5 offers concluding remarks. Annex I provides a set of sample instruments for collecting and then using NSUs. Annex II is available online and provides libraries of NSU conversion factors from four countries (Ethiopia, Malawi, Nigeria, and Uganda).

THE LIVING STANDARDS MEASUREMENT STUDY–INTEGRATED SURVEYS ON AGRICULTURE (LSMS–ISA) is a household survey project to foster innovation and efficiency in statistical research on the links between agriculture and poverty reduction in the region. Recognizing that existing agricultural data in Sub-Saharan Africa suffers from inconsistent investment, institutional and sectoral isolation, and methodological weakness, the LSMS-ISA project collaborates with the national statistics offices of its eight partner countries to design and implement household surveys with a strong focus on agriculture. In each partner country, the LSMS-ISA supports multiple rounds of a nationally representative panel survey with a multi-topic approach designed to improve the understanding of the links between agriculture, socioeconomic status, and non-farm income activities. The frequency of data collection is determined on a country-by-country basis, depending on data demand and the availability of complementary funding.


2. Methodologies for Reporting Consumption and Production Quantities

Food consumption and agricultural production are two of the most important measurements in living standards surveys as well as in many other household surveys in low- and middle-income countries. Food consumption is the primary component for many measures of poverty, nutrition, and food security. Information on agricultural production provides important insights on agricultural performance as well as farm household income and own-food consumption. The critical importance of consumption and agricultural production quantities has led to the development of several different measurement methodologies. Each method has its merits and drawbacks and is not necessarily applicable in all situations. However, we argue that the use of non-standard units to measure quantity is widely applicable and strikes a fair compromise in terms of cost.

2.1 COLLECTING DATA ON FOOD CONSUMPTION

The various methods of collecting food-consumption data are the subject of a large and well-established body of literature. In general, consumption information is typically collected via respondent recall interviews or a consumption diary. For recall, respondents are asked to estimate their consumption of an item over a specified period, typically seven days. Under the diary method, respondents are asked to keep a daily diary of their consumption. Both methods require respondents to report quantities of food consumed. Though there is a broad range of survey-design issues (see Beegle et al., 2010 for a review of these issues), the discussion here will focus on collecting quantities, as this is the specific design focus of this Guidebook.

There are many methods used to collect information on the quantity of food consumed in a household survey. Smith & Subandoro (2007) discuss seven primary methods, summarized as follows:

1. Metric (i.e., standard) units: Respondents report quantities in metric units such as kilograms or grams. While this method is low cost and relatively easy to record, it can result in inaccurate estimates in some circumstances (see below).

2. Monetary value: Respondents estimate the monetary value of the quantity consumed or produced. This method requires additional collection of metric prices in order to estimate quantities, which can also be subject to significant errors.

3. Local (i.e., non-standard) units: Any unit of measurement that does not have an objective, universal metric or standard weight. This includes items such as “pail,” “basket,” or “pieces;” the latter is discussed in #4 below.

4. Count of pieces: As in #3, respondents report quantities in terms of non-standard units with which they may be more familiar. These methods ease the burden on the respondent (in terms of memory recall and conversion calculations), but can increase the cost of survey implementation.


5. Volumetric equivalents: Respondents demonstrate how much space the food they consumed would take up. Conversion factors would need to be applied to convert to metric units.

6. Linear dimensions: Respondents provide linear measurements (length and width or circumference) of the amount of food consumed. As Smith & Subandoro (2007) point out, this method likely takes more time to complete as it requires physical measurement rather than a simple vocal response.

7. Food models: Respondents choose a two- or three-dimensional depiction of a food item that best corresponds to their consumption amount. This method can provide very accurate estimates, but it can be costly to prepare the models and calculate their weights.

While one method may be optimal for certain items, it may not be feasible or appropriate for others. Smith & Subandoro (2007) advocate using a combination of these methods. This Guidebook and the accompanying library will focus on four of these methods, which are complementary and comprehensive: metric units, local units, a count of pieces, and two-dimensional depictions (i.e., reference photos). Joint use of these four methods will minimize the burden on the respondent as well as on the enumerator, although it may require additional costs beyond the main survey visit.

2.2 COLLECTING DATA ON AGRICULTURAL PRODUCTION

There are three prevailing methods for measuring agricultural production: farmer recall, whole plot harvest, and crop cutting (Sud et al., 2016). Under the recall method (as with recall for food consumption), farmers estimate how much of a particular crop they have harvested since a certain date. Both whole plot harvest and crop cutting are much more labor-intensive processes that attempt to eliminate the subjective bias or error inherent in farmer estimates. Under the crop-cutting method, a portion of a farmer’s crop is cut and measured by enumerators at the time of harvest. However, there are some potential sources of bias that arise in the crop-cutting method (Fermont & Benson, 2011). Whole plot harvest is similar to the crop-cutting method, but the output of the entire plot is cut and measured. This is considered the most accurate yield measurement, but is also extremely costly to implement on a large scale. While there are numerous issues associated with all three of these methods, they are largely beyond the scope of this Guidebook (see Fermont & Benson, 2011 and Sud et al., 2016 for a review). The focus here is on the collection of harvest quantities.

In principle, many of the consumption-quantity collection methods outlined by Smith & Subandoro (2007) are also applicable to the collection of agricultural-production quantities. However, there is one additional issue that is specific to the measurement of crop harvests. The condition of the crop (threshed, shelled, fresh, dried, etc.) can have a large impact on reported harvest quantities (Fermont & Benson, 2011; Diskin, 1999; Murphy et al., 1991). The weight difference is either due to discarding a portion of the crop via threshing, shelling, or peeling, or is the result of a change in moisture content through drying. These processes are particularly important for cereals and legumes, which are quite often processed before being used or sold. It is therefore important to ensure that when a harvested quantity is reported, the condition of the crop to which the quantity refers is also specified. When quantities are reported for various conditions, additional condition-specific conversion factors can be applied to render the quantities comparable.
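As a rough illustration of condition-specific conversion, the Python sketch below expresses harvest reports in a single reference condition before adding them up. The factor values, the choice of "dried, shelled" as the reference condition, and the names `CONDITION_FACTORS` and `to_reference_condition` are all assumptions made for this example; they are not taken from the Guidebook or its libraries.

```python
# Hypothetical factors: kg in the reference condition (dried, shelled grain)
# obtained from 1 kg of crop in the reported condition. Values are invented.
CONDITION_FACTORS = {
    ("maize", "dried, shelled"): 1.00,
    ("maize", "dried, on cob"): 0.80,
    ("maize", "fresh, on cob"): 0.50,
}


def to_reference_condition(crop: str, condition: str, quantity_kg: float) -> float:
    """Express a harvest quantity in the reference condition."""
    return quantity_kg * CONDITION_FACTORS[(crop, condition)]


# Two reports that are not comparable until the condition is accounted for.
harvest_reports = [
    ("maize", "dried, shelled", 120.0),
    ("maize", "fresh, on cob", 200.0),
]

total_kg = sum(to_reference_condition(*report) for report in harvest_reports)
print(total_kg)  # 220.0
```

Summing the raw figures would give 320 kg; under the assumed factors the comparable total is 220 kg, which shows why the crop condition needs to be recorded alongside the quantity.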

2.3 IMPORTANCE OF NON-STANDARD UNITS IN HOUSEHOLD SURVEYS

The methodological issue that is the focus of this Guidebook is the use of non-standard units in the collection of consumption and production quantities. But what exactly are “non-standard” versus “standard/metric” units? Both standard and non-standard units are commonly used in markets or by households in many countries. Standard units are universally constant, referring to a clearly defined weight and/or volume. A kilogram in Uganda is the same as a kilogram in France. Likewise, a kilogram of maize is the same weight as a kilogram of wheat. For the most part, “standard” encompasses metric units, imperial measurements, and other internationally standardized units that are easily converted into metric units. For example, the conversion between kilograms and pounds is constant regardless of region or item.

In contrast to standard units, non-standard units (NSUs) often vary greatly from item to item, region to region, and even village to village. Table 1 presents examples of some common standard units and NSUs.


Some NSUs are common across many locations. For example, throughout the world, bananas and other items are often measured in bunches. While this NSU is common, it is not standardized. The number and size of the bananas that are in a bunch are not standard; one bunch of bananas could be three times the size of another bunch. Likewise, a bunch of bananas is not equivalent to a bunch of herbs. The same is true for pieces, heaps, and other units. In addition to these common NSUs, there are also NSUs that are specific to a country or region. Table 1 includes several NSUs commonly used in Uganda (far right column). Many of these units are locally familiar containers of a standard volume; however, the weight of the contents will vary depending on the item. For example, a Nido tin of rice does not weigh the same as a Nido tin of groundnuts. The use of local units can vary significantly within a country. As an example, Table 2 presents the distribution of local units in Nigeria observed in the second wave of the General Household Survey Panel (GHS-Panel) for each of the six geopolitical zones. Only one unit (milk cup) is common to all six zones. Most units are only found in two to three zones and are rarely or never observed in others. These complexities associated with NSUs create some challenges relative to standard units. However, there are additional factors that contribute to the relative merits of the methods, detailed further below.

Table 1: Examples of Standard and Non-Standard Units

Standard | Non-standard: Common | Non-standard: Local (Uganda)
Kilograms | Sack | Jerrican
Grams | Bunch | Kimbo/Blueband tin
Liters | Heap | Nido tin
Centiliters | Piece/number | Cup/mug
Pounds | Bucket | Nice cup
Crate
Plastic basin

Source: World Bank, LSMS Team.

Table 2: Regional Variation of NSUs in Nigeria (% of all NSU Observations in Zone)

Unit | North Central | North East | North West | South East | South South | South West
Mudu | 56.7 | 62.4 | 17.4 | 0.0 | 0.7 | 0.0
Olodo | 0.0 | 0.0 | 0.0 | 0.0 | 14.0 | 0.0
Congo | 7.5 | 0.0 | 0.0 | 0.0 | 0.0 | 47.9
Paint rubber | 3.9 | 0.2 | 0.6 | 12.1 | 0.0 | 2.7
Derica | 0.4 | 1.3 | 0.2 | 2.5 | 16.1 | 13.8
Milk cup | 27.3 | 8.9 | 21.4 | 21.6 | 42.6 | 28.8
Cigarette cup | 0.5 | 0.0 | 0.1 | 60.7 | 22.3 | 0.0
Tiya | 0.0 | 26.7 | 58.5 | 0.0 | 0.0 | 0.0
Kobiowu | 2.8 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0

Note: Shaded cells = Units rarely/never observed in that zone.
Source: World Bank, LSMS Team.

2.4 NON-STANDARD UNITS IN HOUSEHOLD SURVEYS

BENEFITS OF ALLOWING REPORTING IN NON-STANDARD UNITS

There are trade-offs involved in deciding whether to allow respondents to report quantities in NSUs or to restrict respondents to reporting in only standard units. Although NSUs are subject to significant variation, there are tangible benefits to allowing respondents to report in NSUs. The most important and overriding benefit is that respondents will likely be better able to estimate quantities using NSUs. In rural areas especially, standard units may not be commonly used in markets and respondents may not regularly use standard units in their daily activities. Even though respondents may know exactly what a kilogram of sugar looks and feels like (a very common sales unit for sugar), they may not know this for cassava, maize, or other items that are not typically traded in kilograms at the household level. Likewise, many items are not generally consumed in standard units. Often, fruit is sold by the piece instead of by weight; herbs are sold by the bunch, regardless of weight variation; and homegrown fruits or vegetables are harvested and eaten without being weighed. When respondents are more familiar with NSUs for specific items, it may be too burdensome to expect them to know that item in terms of standard units.

Forcing respondents to report quantities in standard units often combines two self-reporting styles, each with its own potential for error: memory recall and cognitive reasoning. Household consumption modules typically ask respondents to recall a litany of food items eaten by numerous household members over a given period. Farming households are asked to recall and report on a variety of different crops harvested over a given period. The latter case is further complicated by the fact that key crops throughout the region (e.g., cassava, maize, and plantains) are typically harvested in small quantities on a continual basis.

Both memory recall and ad-hoc unit conversions also require mathematical calculations that, while not necessarily complicated, are prone to errors when done in the field and on the fly, even more so when considering respondent and enumerator fatigue. Combining memory recall with unit conversion increases the number of calculations required of the respondent for each value, which further increases the potential for error (as shown in Figure 1). In general, allowing respondents to report in the units they can most easily quantify simplifies memory recall and will yield estimates that are more accurate.

CHALLENGES OF ALLOWING REPORTING IN NON-STANDARD UNITS

Although there is a strong case for allowing respondents to report in NSUs, many surveys of consumption and agricultural production still restrict respondents to reporting amounts in standard units. This is primarily due to the additional cost and challenges associated with properly implementing and operationalizing NSUs in a survey. The complexity of NSUs as well as the additional steps required for their use can increase the financial and temporal burden of conducting a survey. The challenges associated with using NSUs broadly fall into two categories: (1) those associated with the preparation and implementation of the survey with NSUs, and (2) those associated with ensuring that NSU measurements can be converted into comparable standard units.

One of the first challenges that survey designers face is identifying the units to be included in the survey. When respondents are limited to reporting in standard units, compiling the code list is straightforward. However, compiling a list with non-standard units requires additional information about the common NSUs in the relevant country and/or regions. When such information is limited or not available, it will need to be collected via a market survey.

In addition to identifying the NSUs to include in the main survey, survey designers also need to ensure the clarity of the unit definitions, because some units are only vaguely defined. Some of the most common units that fall into this category are pieces, bunches, and heaps. For example, a piece of sweet potato could weigh 0.5 kilograms or 1.5 kilograms. Figure 3 below illustrates this problem: the pictured containers vary significantly in size but are all called dengu in Malawi. In order to obtain the most accurate estimates for these units, respondents should be provided with a reference frame for the quantity. One way to do this is to provide respondents with reference photos for these items. This resource can also be produced as part of the market survey and requires additional enumerator training (detailed below).

Figure 3: Wide Variety of Dengus of Tomatoes in Malawi

Source: World Bank, LSMS Team.

The final and most significant challenge in using NSUs is that they must be accompanied by standard-unit conversion factors. In their raw form, quantities in NSUs are not comparable across units. To directly compare and aggregate quantities, the data user must convert all quantities into a common standard unit such as kilograms. Converting between standard units is relatively easy since the conversions are constant and well known. However, for NSUs, the conversion is different for each unit and often for each item. Complicating matters further, the standard weight for the same item-unit combination can vary, even within a country. For example, one study in Nigeria found that an average bundle of sorghum weighed between 26 and 49 kilograms depending on the area (Casley & Kumar, 1988). When this is the case, region-specific conversion factors should be acquired.
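One way a data user might handle region-specific factors is to look up the regional value first and fall back to a national value only when no regional one exists. The sketch below assumes that design; it is not prescribed by the Guidebook. The area names, the national value of 37.5 kg, and the function `lookup_factor` are hypothetical, while the 26 kg and 49 kg figures are the two ends of the range reported for sorghum bundles above.

```python
from typing import Optional, Tuple

# Regional factors (kg per unit). Area names are placeholders; 26 and 49 kg
# are the ends of the range cited from Casley & Kumar (1988).
REGIONAL_CF_KG = {
    ("sorghum", "bundle", "Area A"): 26.0,
    ("sorghum", "bundle", "Area B"): 49.0,
}

# Hypothetical national fallback value, used only when no regional factor exists.
NATIONAL_CF_KG = {
    ("sorghum", "bundle"): 37.5,
}


def lookup_factor(item: str, unit: str, region: Optional[str] = None) -> Tuple[float, str]:
    """Return (factor, source), preferring a region-specific factor."""
    if region is not None and (item, unit, region) in REGIONAL_CF_KG:
        return REGIONAL_CF_KG[(item, unit, region)], "regional"
    return NATIONAL_CF_KG[(item, unit)], "national"


for region in ("Area A", "Area B", "Area C"):
    factor, source = lookup_factor("sorghum", "bundle", region)
    # Three reported bundles converted to kilograms.
    print(region, source, 3 * factor)
```

Returning the source of the factor alongside its value makes it possible to document, for each converted quantity, whether a regional or a national factor was used.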

HOW COMMON ARE NON-STANDARD UNITS IN SURVEYS?

In low- and middle-income countries, especially in Africa, NSUs are used quite regularly for the most important items. At markets in these countries, consumers encounter a wide variety of NSUs for their purchases. In addition, when given a choice between reporting quantities in standard or non-standard units, respondents often choose to report in NSUs. For example, in the second wave of the Ethiopia Socioeconomic Survey from 2013/2014 when NSUs were allowed, nearly 50 percent of farmers chose to report their harvests in NSUs. In the Malawi National Panel Survey, respondents chose NSUs about 73 percent of the time. This provides a strong indication that many respondents are more comfortable reporting quantities in NSUs.

While the challenges associated with including NSUs in a consumption or agricultural production survey can be significant, this Guidebook provides detailed instructions to help survey designers incorporate NSUs into their surveys.

Figure 4: Recommended Steps for Using NSUs

Prepare to use NSUs
1. Establish valid list of NSUs: create new or update existing list
2. Plan Market Survey

Conduct Market Survey
Collect national/regional weights and photos. Skip this step ONLY IF existing data is available and is current.

Create Tools for Main Survey
1. Generate conversion factors (preload into CAPI if using)
2. Compile photo reference guide
3. Write user protocols

Conduct Main Survey
See Annexed examples for including NSUs in main survey.

Finalize Data Documentation
1. Clearly document all steps and NSU/CF data sources.
2. If CFs are missing, conduct a brief follow-up survey and update final NSU and CF.

Source: World Bank, LSMS Team.


3. Guidelines and Procedures for Capturing and Using Non-Standard Units

This section lays out the steps and procedures required to collect the information needed to implement and use NSUs in household surveys. Before NSUs can be used, a resource library for NSUs will need to be prepared. This library should include (1) a list of common/allowable NSUs; (2) national or regional conversion factors for all item-unit combinations; (3) a photo reference album based on an index of NSUs; and (4) clear protocols for using conversion factors and reference photos in agricultural and household surveys. The best way to collect the information for the library is to conduct a market survey to capture reference photos and the item-unit weights used to calculate conversion factors. In countries where surveys already allow reporting in NSUs, existing data (once updated, if need be) can complement the library. Taken together, these components will help researchers to adopt the use of NSU reporting; for countries with libraries provided in this Guidebook, the cost/burden of adoption is significantly eased. These libraries should not be treated as fixed, but should instead be continually updated with new NSUs and conversion factors.

There are several important steps to follow when collecting the components for an NSU conversion-factor library (see Figure 4): 1) Preparation: plan the timing (relative to the main survey) and the locations of the market survey, prepare the necessary market-survey materials (instruments and manuals), and construct a list of item-unit combinations that will be allowed in the main survey; 2) Market survey implementation: collect weights and reference photos, taking into account any sub-national variation; and 3) Data documentation for the main survey: using the market data, create conversion factors for the NSUs and draft clear user protocols for enumerators (in terms of reference photos) and data users (in terms of conversion factors). Each of these steps is covered in detail here.
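The step from market-survey weights to conversion factors can be sketched as a simple grouping exercise. This section does not state which summary statistic the Guidebook uses, so the median below is an assumption, chosen here because it is less sensitive to a single mis-weighed observation than the mean. The measurements, region names, and the function `compute_factors` are invented for illustration.

```python
from collections import defaultdict
from statistics import median

# Hypothetical market-survey records: (item, unit, region, measured weight in kg).
measurements = [
    ("tomato", "small heap", "North", 0.38),
    ("tomato", "small heap", "North", 0.42),
    ("tomato", "small heap", "North", 0.45),
    ("tomato", "small heap", "South", 0.50),
    ("tomato", "small heap", "South", 0.75),
]


def compute_factors(rows):
    """Group weights by (item, unit, region) and summarize each group by its median."""
    grouped = defaultdict(list)
    for item, unit, region, weight_kg in rows:
        grouped[(item, unit, region)].append(weight_kg)
    return {key: median(weights) for key, weights in grouped.items()}


factors = compute_factors(measurements)
for key in sorted(factors):
    print(key, factors[key])
```

Keeping the region in the grouping key preserves sub-national variation; dropping it from the key would produce a single national factor per item-unit combination instead.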

3.1 MARKET SURVEY PLANNING AND PREPARATION

TIMING OF MARKET SURVEY

The data-collection schedule for the market survey should take into consideration the seasonal availability of items and existing data-collection schedules in each country. In some cases, market-survey data collection should be planned for two separate periods to ensure more complete coverage of seasonally available items. In general, the greatest variety of items and the greatest variety of crop conditions will be available during the harvest season, though some items may only be available during the lean season or after a secondary harvest season.


The timing of the market survey relative to the main sur-vey is important. Performing the market survey before the main survey (ex-ante or independently) has several advantages. First, reference photos can be taken and then used during the main survey, and conversion factors can be used to validate reported quantities during fieldwork, both helping to ensure more accurate NSUs estimates. Second, conducting the mar-ket survey ex-ante allows for the identification of additional NSUs that may be missing from any existing list of answer options. Identified in advance, these units can be incorporated into the unit list for the main survey.

However, there are also some drawbacks to conducting the market survey before the main survey. The primary one is that new units not included in the market survey could be reported in the main survey. When the market survey is performed after the main survey (ex-post), the unit list for the market survey can be constructed to include all item-unit (and item-unit-condition) combinations observed in the main survey, thus limiting any conversion-factor gaps.
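Building the unit list for an ex-post market survey amounts to finding the combinations reported in the main survey that still lack a conversion factor. A minimal Python sketch of that comparison follows; the item, unit, and condition names are hypothetical, as are the variable names.

```python
# Item-unit-condition combinations reported in the main survey (hypothetical).
observed_in_main_survey = {
    ("maize", "pail", "shelled"),
    ("maize", "basket", "on cob"),
    ("tomato", "small heap", "fresh"),
    ("cassava", "piece", "fresh"),
}

# Combinations that already have a conversion factor from an earlier
# market survey (hypothetical).
factors_already_collected = {
    ("maize", "pail", "shelled"),
    ("tomato", "small heap", "fresh"),
}

# The set difference is the list of combinations still to be weighed.
conversion_factor_gaps = sorted(observed_in_main_survey - factors_already_collected)
for combination in conversion_factor_gaps:
    print(combination)
```

Here the ex-post market survey would only need to seek out cassava pieces and maize baskets on the cob, which is the sense in which a limited follow-up survey can close the remaining gaps.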

Given these considerations, the ideal plan is to conduct two market surveys, one before and one after the main survey. Both market surveys need not be equally rigorous; one will likely be more comprehensive than the other. For example, the ex-ante survey could be limited, aiming to collect reference photos and weights (for conversion-factor calculations) for the most common NSUs, while the ex-post survey could comprehensively collect weights for all additional NSUs reported during the main survey. Or the ex-ante survey could be intensive, aiming to collect as many conversion-factor weights as possible (especially when conducted independently), while the ex-post survey could be limited to collecting only those unanticipated item-unit combinations. In general, the intensive version of the market survey should coincide with the season when the most items will be available in the markets.

Many household surveys already conduct market surveys as part of their fieldwork to collect current pricing information on commonly consumed items. When surveys allow NSU reporting, the market surveys could also collect actual weights of allowable NSU combinations that can be used to calculate standard-unit conversion factors. However, in such cases where the market survey is conducted in parallel with the main survey, reference photos would most likely not be available for use during interviews.

SELECTION OF MARKETS TO VISIT

Three main factors will influence the selection of markets for the survey: 1) the degree of regional diversity of units and their respective weights; 2) the relative timing of market surveys (as explained above); and 3) the types of markets frequented by sample households.

Markets should be selected to ensure adequate coverage of regional units and items. For market surveys conducted after the main survey, coverage can be assessed using the item-units observed in each stratum. The strata where the widest diversity of regional units is observed are prime candidates for the market survey. If the item-units were reported in the stratum, it is likely that measurements for that item-unit can be obtained from a market in that area. If the market survey is conducted independent of or prior to the main survey, adequate coverage must be assessed using external information on regional variations in units as well as information from the pilot survey (if conducted).
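As a rough illustration of the ex-post coverage check described above, the following Python sketch counts the distinct item-unit combinations reported in each stratum and ranks the strata by diversity. The record layout (stratum, item, unit) and the sample values are assumptions made for illustration only; they are not taken from any LSMS dataset.

```python
from collections import defaultdict

# Hypothetical main-survey records: (stratum, item, unit).
reports = [
    ("North", "maize", "esir"),
    ("North", "maize", "chinet"),
    ("North", "barley", "esir"),
    ("South", "maize", "esir"),
    ("South", "maize", "esir"),
]

def rank_strata_by_diversity(records):
    """Return (stratum, number of distinct item-units) pairs, most diverse first."""
    seen = defaultdict(set)
    for stratum, item, unit in records:
        seen[stratum].add((item, unit))
    # Sort by descending diversity, then by stratum name for a stable order.
    return sorted(((s, len(units)) for s, units in seen.items()),
                  key=lambda pair: (-pair[1], pair[0]))

ranking = rank_strata_by_diversity(reports)
```

Strata at the top of the ranking would be the first candidates for a market visit; in practice the ranking would be weighed against travel cost and the market types used by sample households.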

In many countries, households may patronize various types of markets, including local outdoor markets and small shops, supermarkets, wet markets, and wholesale markets. Markets selected for the survey should cover the full range of markets commonly used by households or farmers in the sample area.

In general, the number and dispersion of markets selected for the survey are highly dependent upon the context. For a nationwide survey in a large and diverse country like Ethiopia, it would likely be necessary to visit many markets across the country to ensure adequate coverage of NSUs, especially if the market survey is conducted during or after the main survey. However, for a market survey limited to a single region/state or community, visiting only a few markets may be sufficient.

PREPARATION OF SURVEY MATERIALS

Once markets are selected, the survey materials (survey instruments and supporting manuals) can be prepared. While these materials should be designed according to the local context, Annex I includes examples from NSU-focused market surveys that were conducted to create the LSMS conversion-factor libraries for Ethiopia, Malawi, and Nigeria. Figure 5 depicts a snapshot of the market survey questionnaire for Ethiopia.

Though these survey instruments are specific to these countries, they also serve as examples of how to prepare instruments for any country/project. Each survey can collect the following key types of data:

• Market identification details: Name, location, GPS information, type of market, etc.

3. GUIDELINES AND PROCEDURES FOR CAPTURING AND USING NON-STANDARD UNITS 12

• Survey management information: Date, time, duration of surveys; codes for enumerator, supervisors, and (when applicable) data-entry staff.

• Data on pre-identified NSUs: Weights, prices, and basic metadata for common item-unit combinations that have been previously identified.

• Data on unexpected NSUs: Teams can collect the same type of data listed above for item-unit combinations that are not pre-defined, but that are present in the market. If such NSUs are commonly used in a regional market, it is likely that household survey respondents are purchasing, and thus more readily able to report, in these quantities.

The Nigeria survey is split into two sections, allowing enumerators to more easily divide and share the data-collection work during each market visit. Detailed instructions on how to collect the data and complete the questionnaires are in the training manuals (also included in Annex I).

3.2 CONSTRUCTING THE LIST OF NON-STANDARD UNITS

The first step in preparing the NSU library is to establish the list of common NSUs that will be used in the consumption and/or production modules of the main survey. A list of common/allowable item-unit (and when applicable, item-unit-condition) combinations for NSUs should include a comprehensive set of valid NSU combinations for each crop and each food item. Where applicable, crop/food condition (e.g., corn in husk or not, peanuts shelled or unshelled, fresh vs. dried cassava) should be considered, especially for reporting harvested quantities, as the condition significantly impacts the weight-volume ratio. While it may be impossible to predetermine all the possible combinations, the library should endeavor to include combinations that represent the vast majority of options (preferably higher than 90 percent). Even when the same crops are grown and the same foods are consumed in different countries, it cannot be assumed that the same NSUs will be used in both places.2

Figure 5: Excerpt from a Market Survey Questionnaire

MODULE B: ITEM-UNIT MEASUREMENT - NON-CONTAINERS

Column headings (numbered 1 to 9 on the form):

• ITEM NAME
• ITEM CODE
• UNIT NAME
• SIZE
• Was item measured? YES = 1 (→ 7); NO = 2
• Why was item not measured? NOT FOUND IN MARKET AT THIS TIME = 1; CROP NOT COMMONLY FOUND IN THIS MARKET = 2; UNIT NOT COMMONLY FOUND IN THIS MARKET = 3; SIZE NOT COMMONLY FOUND IN THIS MARKET = 4; OTHER, SPECIFY = 5. ALL RESPONSES (→ NEXT ITEM)
• Which type of scale was used? PERSONAL DIGITAL SCALE = 1; MARKET SCALE WITH GOVERNMENT CERTIFICATION = 2; MARKET SCALE WITHOUT GOVERNMENT CERTIFICATION = 3
• Item Sample #1: Weight (KGs), Price (Birr)
• Item Sample #2: Weight (KGs), Price (Birr)

Pre-filled rows (CEREALS AND GRAINS):

| ITEM NAME | ITEM CODE | UNIT NAME | SIZE |
| BARLEY | 1 | ESIR | Small |
| BARLEY | 1 | ESIR | Medium |
| BARLEY | 1 | ESIR | Large |
| BARLEY | 1 | CHINET | Small |
| BARLEY | 1 | CHINET | Medium |
| BARLEY | 1 | CHINET | Large |
| BARLEY | 1 | SHEKIM | Small |
| BARLEY | 1 | SHEKIM | Medium |
| BARLEY | 1 | SHEKIM | Large |
| MAIZE | 2 | PIECES | Small |
| MAIZE | 2 | PIECES | Medium |
| MAIZE | 2 | PIECES | Large |
| MAIZE | 2 | ESIR | Small |
| MAIZE | 2 | ESIR | Medium |
| MAIZE | 2 | ESIR | Large |
| MAIZE | 2 | CHINET | Small |
| MAIZE | 2 | CHINET | Medium |
| MAIZE | 2 | CHINET | Large |
| MILLET | 3 | ESIR | Small |
| MILLET | 3 | ESIR | Medium |

Source: World Bank, LSMS Team.

The best practice for compiling this list would be for national statistics agencies to establish a conversion-factor library independent of any specific household survey, which can then be made available for use with any new survey within the country. Unfortunately, many low- and middle-income countries have no such source for NSU conversion factors; when they do, documentation and other supporting materials are often limited or lacking. When a conversion-factor library is available and well documented, household survey teams may choose to optimize timeline and budget constraints by using this existing resource. When such a library does not exist, or if existing conversion-factor data are limited or outdated, an NSU-focused market survey must accompany the household survey. Instructions for the market survey are discussed in detail in the next section. When implementing a market survey to fill these data gaps, the procedure for identifying the NSUs to include in the market survey will vary depending on the stage at which this step is performed.

MARKET SURVEY BEFORE/INDEPENDENT OF MAIN SURVEY

When the market survey is implemented before or independent of the main survey, the first step is to seek any information on common NSUs within the country. Identifying common NSUs and the items they apply to can be quite challenging, depending on the quality of information available to help guide selection. A best first source would be a comprehensive review of NSUs within a country. The libraries annexed in this guidebook are intended to be such a source, but there may be additional reviews available, such as Kormawa & Ogundapo (2004) in Nigeria.

2 Cross-country comparisons may be used to check the consistency of allowable item-condition combinations and to reconcile food-weight densities for common crops. This cross-country harmonization will be the focus of future work in this series.

When comprehensive reviews are not available, the next preferred source for common NSUs will be other surveys already conducted in the country of interest that have allowed quantities to be reported in NSUs. These could be either household-level surveys with consumption or agricultural components, or market surveys. For these outside sources, the survey designer must consider the comprehensiveness of the NSU list. Some surveys may include only a few of the most common NSUs and exclude less common though important ones. Likewise, the geographic coverage of the survey also needs to be taken into account. For example, surveys that only cover a small area may not contain NSUs that are common in other areas of the country. Unless the selection of NSUs is clearly documented and comprehensive, the survey designer should seek additional information.

If resources and time permit, existing NSU lists can be validated with a small pilot survey to ensure the list is comprehensive and current. The pilot survey can either be at the household or market level. Performing a household-level pilot has the advantage of capturing consumption units used by households, which may differ from the units used in market transactions. However, conducting even a limited market-level pilot survey will allow for the collection of a wide array of item-units in a single market, whereas it may take several households to acquire a comprehensive list. The pilot survey should be largely open ended, allowing respondents (either household members or market vendors) to report in the units with which they are most comfortable or in the units that are most commonly available.

Many units may be available in different sizes, such as the array of dengus shown in Figure 2. In this case, simply listing dengu in the selected unit list would not sufficiently help standardize this NSU. When there is variation, the unit list should include the possibility of different sizes (e.g. small, medium, large) and the weights and reference photos for each size should be collected. This is particularly important for units such as pieces or heaps, which are subject to within-unit weight variation. Coupling NSUs with reference photos that depict multiple measured sizes provides greater comparability across reported NSUs by standardizing the respondents’ reference points. For example, if tomatoes are scarce in only one region, what is considered a “large tomato” may be equivalent to a small one elsewhere; by providing standardized photo references, the respondent can point to “their” tomato, thus ensuring its weight is converted in a standardized way, regardless of local variation.

Note that a key benefit of conducting an ex-ante market survey and using photo references in the main survey is that much of the regional variation can be eliminated, which in turn limits the scope and burden of the market surveys to be conducted. Without the ex-ante collection (which allows for greater standardization with fewer measurements), you will need to collect and compare NSU data from markets in all regions to avoid under- or over-reporting consumption across regions with different concepts of reference sizes.

MARKET SURVEY AFTER MAIN SURVEY

For market surveys conducted after the main survey, the NSU list can be constructed based on the units observed in the data. Constructing the list ex-post can shorten the list of weights needed to exactly those necessary to make use of the data while ensuring there are no gaps in the eventual conversion-factor data.

When constructing the list of item-unit combinations, the data should be examined for combinations that are commonly observed. Invalid combinations should be excluded. Ideally, every valid item-unit as well as crop condition observed in the data should be included in the list of units for the market survey. However, if the list of observed item-units is extensive and/or resources for conducting the market survey are limited, then the item-unit list can be shortened. The most obvious method is to eliminate the least commonly observed item-units. This will depend greatly on the survey, but in general only very infrequently observed combinations should be dropped.
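One way to shorten the list, sketched below in Python, is to keep the most frequently reported item-units until they account for a chosen share of all reports. The 95 percent threshold and the sample counts are assumptions for illustration; the appropriate cut-off depends on the survey and on available resources.

```python
from collections import Counter

def shortlist_item_units(observations, min_coverage=0.95):
    """Keep the most frequent item-units until they cover min_coverage of reports.

    observations: iterable of (item, unit) pairs, one per reported quantity.
    min_coverage: assumed threshold (share of all reports), not a prescribed value.
    """
    counts = Counter(observations)
    total = sum(counts.values())
    kept, covered = [], 0
    for item_unit, n in counts.most_common():
        if covered / total >= min_coverage:
            break  # remaining item-units are the rarest ones
        kept.append(item_unit)
        covered += n
    return kept

# Hypothetical main-survey reports.
observed = ([("maize", "esir")] * 90
            + [("maize", "chinet")] * 8
            + [("maize", "sack")] * 2)
shortlist = shortlist_item_units(observed)
```

Here the rarely reported sack is dropped because the two more common units already cover 98 percent of reports. Invalid combinations should still be removed by review before a frequency rule of this kind is applied.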

The extent of regional variation in reported NSUs should also be assessed to determine at what level the market survey should be conducted. This can be done by comparing common units at various geographic levels. If at most geographic levels the item-units are similar, then a national list can be constructed. However, if there is significant variation across regions, it may be more appropriate and feasible to disaggregate the item-unit lists to the regional level.
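The comparison of unit lists across regions could be quantified in several ways. The sketch below uses the Jaccard index (shared item-units divided by all item-units reported in either region); this measure is an assumption chosen for illustration and is not prescribed by the guidebook. The region names and unit sets are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Share of item-units common to both sets (1.0 means identical lists)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical item-unit sets observed in each region.
region_units = {
    "North": {("maize", "esir"), ("maize", "chinet")},
    "South": {("maize", "esir"), ("maize", "chinet")},
    "West": {("maize", "esir"), ("maize", "sack")},
}

# Pairwise overlap between regions, keyed by (region_a, region_b).
overlaps = {
    (r1, r2): jaccard(region_units[r1], region_units[r2])
    for r1, r2 in combinations(sorted(region_units), 2)
}
```

High overlap between all pairs would support a single national list, while a region with low overlap (West in this example) would argue for a regional list.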

3.3 COLLECTING WEIGHTS FOR CONVERSION FACTORS

The two main purposes of the market survey are to collect weights in order to calculate conversion factors for consumption and production NSUs and to take reference photos of item-unit combinations for respondent interviews. Although the procedure for collecting both these items may seem straightforward, strict protocols should be followed to obtain the most accurate conversion factors and produce useful reference photos.

A dataset of national or regional conversion factors for all allowable combinations will be the main analysis component of the library. The listed conversion factors should be provided at the lowest feasible (and representative) level of regional disaggregation. The general procedure for collecting weight measurements involves (1) finding vendors who have the necessary non-standard item-unit combination, (2) properly weighing the item-unit, and (3) recording the weight of the item-unit.

Step 1: Finding item-units to weigh

Armed with the list of item-unit combinations to weigh, enumerators should seek out each of the combinations from vendors in the market. Each item-unit measurement should be taken from multiple vendors to account for any variation in vendors’ subjective assessment of what constitutes a unit amount as well as for possible enumerator error in the measurement itself. For each item-unit pair we recommend collecting measurements from three different vendors within each market if time, personnel, and budget constraints permit.
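A simple field check for the three-vendor recommendation might look like the Python sketch below, which flags item-unit pairs measured at fewer than three distinct vendors in a market. The record layout and the vendor identifiers are hypothetical.

```python
from collections import defaultdict

MIN_VENDORS = 3  # the recommendation above, resources permitting

# Hypothetical records: (market, item, unit, vendor_id, weight_kg).
measurements = [
    ("M1", "maize", "esir", "v1", 1.10),
    ("M1", "maize", "esir", "v2", 1.05),
    ("M1", "maize", "esir", "v3", 1.20),
    ("M1", "barley", "chinet", "v1", 0.40),
    ("M1", "barley", "chinet", "v1", 0.42),  # same vendor weighed twice
]

def under_sampled(records, min_vendors=MIN_VENDORS):
    """Return (market, item, unit) keys measured at fewer than min_vendors vendors."""
    vendors = defaultdict(set)
    for market, item, unit, vendor_id, _weight_kg in records:
        vendors[(market, item, unit)].add(vendor_id)
    return sorted(key for key, ids in vendors.items() if len(ids) < min_vendors)

gaps = under_sampled(measurements)
```

Counting distinct vendors rather than measurements matters here: two weighings from the same vendor do not capture variation between vendors.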

Survey teams need not limit measurements to the predetermined list of item-units. If additional item-unit pairs are found at the market, record these as well. This will be particularly beneficial when the market survey is conducted before the main survey as it will allow the new units to be incorporated into the main survey.

If an item is available in the market, every effort should be made to collect all the listed unit options for that item. The greatest challenge at this stage will likely be that some items or units are not found due to seasonal availability of the item or limited use of a unit for sales. One solution is to search for the item-unit at vendors nearby who are outside the formal market. If the item-unit is found there, the alternative location should be noted by the enumerator.

The day of the market visit could also be an important determinant of NSU availability. In many communities, there are specific days designated as “market days.” On market days, a wide array of traders and farmers will participate in the market and thus, a greater selection of items and units will likely be available. However, there will also be more activity on these days, making it harder to perform the measurements. Vendors may also be less willing to participate in the survey on a market day since they will be busier. Given this trade-off, we recommend survey teams visit the market first on a non-market day to acquire all the measurements available, and then again on a market day if any items or units are missing.

Since the focus of the survey is on consumption or production units and not specifically on market units, some container-based units may not be found at the markets. This could especially be the case for some agricultural production units used by farmers but not typically sold by vendors. Enumerators can ask vendors about the units in which they themselves purchase items from farmers, asking them to demonstrate the appropriate quantity of the item-unit pair. Alternatively, survey teams may be able to acquire such containers directly from the source (i.e., nearby households or farms). Containers may be purchased or borrowed and then brought to the market for filling and weighing.

Larger units, especially those used for measuring harvest quantities, may not be available from market vendors, but can be collected at the market scale station (further details below) or from wholesale traders in the market.

Locating a crop at the market in its various conditions will likely be difficult. Many crops will only be available in their final condition before consumption: cereals will likely be threshed; legumes will likely be shelled. In such cases, additional weights may need to be acquired by conducting some measurements at the farm-household level. A limited number of condition-specific weights can be used to create conversion factors across item-unit-condition pairs.

Step 2: Weighing the item-units

Once an item-unit is located, it must be properly weighed. When weighing a container unit, the empty container’s weight must be excluded from the measurement. Many modern scales can automatically subtract the weight of a container (the “tare” weight) from the total weight. If the scale being used does not include the option to zero out the tare weight, then the subtraction must be done manually.
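When the tare has to be subtracted manually, the arithmetic is simply net weight = gross weight minus empty-container weight. The Python sketch below adds a basic sanity check; the function name and the example weights are illustrative assumptions.

```python
def net_weight_kg(gross_kg, tare_kg):
    """Weight of the item alone: filled container minus empty container."""
    if tare_kg < 0 or gross_kg < tare_kg:
        # A gross weight below the tare usually signals a recording error.
        raise ValueError("gross weight must be at least the tare weight")
    return gross_kg - tare_kg

# Hypothetical reading: a filled container weighs 1.35 kg, the empty one 0.15 kg.
net = net_weight_kg(1.35, 0.15)
```

Recording both the gross and the tare weight on the questionnaire, rather than only the difference, would let reviewers recheck the subtraction later.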

Enumerators must be properly trained in the use of scales, including how to identify appropriate (even) surfaces on which to use the scales. Scales should be calibrated regularly during fieldwork to ensure consistency across measurements. It is important that the scale be kept clear of any other objects, including any spillage from containers. For example, any grains that fall out of a heaped container and onto the scale should be cleared before weighing.

Enumerators should not be responsible for determining the amount of the item with which a particular unit is filled. They should only weigh what vendors provide. This includes typical heaping practices for containers. If the local practice is to heap as much of an item into a container as possible, then that is what should be weighed; if leveling is common, then leveled containers should be measured. When container quantities are available heaped and level, both should be measured and noted.

While most item-units will require physical weighing of the unit, in some cases no weighing will be required. This is true for item-units that are commonly purchased prepackaged, with the weight printed on the container. Some common examples are bags of rice, tinned or canned foods (e.g., tomato sauce), snack items, etc. Note that although these item-units need not be weighed, reference photos must still be taken since respondents may not remember the weight of the package but can identify which size/shape package they consumed.

For most consumption units, collecting the weights will be fairly straightforward. However, for larger units (especially those used for production) there may be additional challenges. Heavier item-unit pairs are often beyond the maximum range of the portable scales used for the survey. When this is the case, there are two potential solutions:

• Break up the unit into a series of smaller groups that can be weighed separately. Once all the groups have been weighed, they can be added together to acquire the total weight of the item-unit. Depending on the size of the item-unit and the maximum range of the scale used, this can be a laborious and time-consuming process. Furthermore, vendors may be unwilling to open larger units (if sealed) and have them handled by enumerators.

• Make use of other scales that have a higher maximum weight. These can be either additional scales that enumerators bring with them to the market or higher-capacity scales found in the market. In many markets, there will be bulk traders or aggregators that purchase items from farmers for resale to market vendors. Since these traders deal in large quantities, they will likely have a scale that can measure these heavier weights. Making use of these market scales may be easier than breaking up the item-unit into multiple groups, but it does require an additional step: calibrating the market scale. In general, market scales may not be as advanced or accurate as the main scales used for the survey. Any error in the market scale measurement must be estimated and corrected. That can be done by selecting an item that weighs close to the maximum of the survey scale. This same item should then be weighed using both the survey scale and the market scale, and both measurements should be recorded. Comparing these two measurements will allow for error correction in the market scale’s measurement during the data-review process. Only one calibration is needed for every market scale. Since it may be impossible or at least impolite to adjust or even scrutinize the market’s scale, consider doing this after all unit measurements are collected. This procedure requires that the measurement tool also be noted: survey scales or a market scale. These scales will typically not be as precise as the smaller-capacity scales, but they are sufficiently precise for larger units. If higher-capacity market scales are not available or common in the market, then larger-capacity scales may be acquired for use by enumerators for larger units.
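Both solutions reduce to simple arithmetic during data review, sketched below in Python. The text does not specify the form of the market-scale correction; the sketch assumes the error is proportional to the load (a ratio correction), which is only one possible choice. Function names and example weights are hypothetical.

```python
def total_from_parts(part_weights_kg):
    """Solution 1: sum the separately weighed groups of one large item-unit."""
    return sum(part_weights_kg)

def correct_market_reading(market_reading_kg, ref_on_survey_kg, ref_on_market_kg):
    """Solution 2: rescale a market-scale reading using one calibration item.

    The same reference item is weighed on the survey scale and on the market
    scale. Assumes the market scale's error is proportional to the load.
    """
    if ref_on_market_kg <= 0:
        raise ValueError("calibration reading must be positive")
    return market_reading_kg * ref_on_survey_kg / ref_on_market_kg

# Hypothetical sack weighed in three parts on the survey scale.
sack_kg = total_from_parts([40.0, 38.5, 36.5])

# Hypothetical calibration: a 50.0 kg item reads 51.0 kg on the market scale,
# so a market reading of 102.0 kg is scaled down accordingly.
corrected_kg = correct_market_reading(102.0, 50.0, 51.0)
```

If several calibration items of different weights were available, an additive offset or a fitted line could be considered instead of a single ratio.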

Another potential challenge for production units is the adjustment of weights by bulk traders or aggregators. In some cases, traders will purchase an item-unit from a farmer and adjust the weight before distributing it to market vendors. For example, a farmer may bring a sack with 115 kg of wheat, but after purchasing it, the trader might adjust the weight of the sack to an even 100 kg before selling it to market vendors. The purpose of the market survey is to acquire conversion factors for units reported by farmers, so every effort should be made to weigh the item-unit the farmer brings to the market (e.g., the 115-kg sack of wheat) before it is adjusted by the trader.

Step 3: Calculating the conversion factors

Calculating conversion factors can be a complicated process. Results from the market survey should be cleaned and outliers scrutinized. If there are relatively few measurements for each item-unit, outliers can distort conversion factors substantially. If different sizes for a unit were allowed, the measurement data may require further processing. A problem may arise where classifications of small, medium, and large versions of a unit vary considerably. For example, the small size of a unit found in market X may be larger than the large version collected in market Y. These must be reconciled so that there is a standard classification of small, medium, and large within the relevant level of geographic aggregation (e.g., region, state). This can be done manually through review and comparison of the reference photos and reassignment of size. However, this can be burdensome if there are many measurements. An alternative method is to classify measurements based on their position in the distribution of measurements for that particular item-unit pair. The most basic approach is to classify observations that fall below the 33rd percentile as small, between the 33rd and 66th percentiles as medium, and above the 66th percentile as large. However, the number of sizes must be considered before applying this method. Some units may only be found in two relatively uniform sizes, in which case only small and large sizes should be assigned. If possible, a review of the photos is arguably a more comprehensive approach, or at least a verification step, to solving this problem.
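The percentile-based classification can be sketched in Python as follows. The percentile definition used here (linear interpolation between ranked values) and the treatment of values exactly at a cut-off are assumptions, since the text does not specify them; the weights are hypothetical.

```python
def percentile(sorted_values, p):
    """Linear-interpolation percentile (p in [0, 100]) of a sorted list."""
    if not sorted_values:
        raise ValueError("no measurements")
    k = (len(sorted_values) - 1) * p / 100
    lo = int(k)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (k - lo)

def classify_sizes(weights_kg):
    """Label each measurement of one item-unit pair as small, medium, or large."""
    ordered = sorted(weights_kg)
    p33, p66 = percentile(ordered, 33), percentile(ordered, 66)
    labels = []
    for w in weights_kg:
        if w < p33:
            labels.append("small")
        elif w <= p66:
            labels.append("medium")
        else:
            labels.append("large")
    return labels

# Hypothetical weights (kg) for one item-unit pair pooled across markets.
weights = [0.2, 0.5, 0.9, 0.25, 0.55, 1.0]
sizes = classify_sizes(weights)
```

For units that come in only two sizes, a single cut-off would replace the two percentiles, and in either case a review of the reference photos remains a useful check on the assigned labels.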

For some items, the additional component of condition will also need to be taken into consideration when calculating conversion factors. In most of these cases, conversion factors should be applied not only for converting to kilogram amounts, but also to render the quantities comparable to each other. For example, maize/corn can be harvested on the cob (usually fresh) or without the cob in grain form (usually dry). The kilogram conversion reported for fresh, on-the-cob maize is not directly comparable with the kilogram results for dry maize grains. To compare all reported maize conditions with each other, the conversion between maize on the cob and dry maize kernels/grains is also needed.3

Once cleaned, the measurements must be aggregated to an appropriate level. The mean or median measurement for each container unit can be used. For non-containers, the conversion factor will be item specific and should correspond to the reference photo included in the library. If there is significant regional variation, then regional-level conversion factors should be given. Otherwise, national conversion factors are adequate. The conversion-factor database should be organized so that there is a single conversion factor for each item-unit at the appropriate geographic level, though there may be some item-units not found in a particular region. Therefore, we recommend that national-level conversion factors also be provided even if there is significant regional variation. Figure 6 presents a subset of the conversion-factor library for Nigeria. In the figure, conversion factors are provided for the six geopolitical zones as well as the national average.
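The aggregation and the regional-then-national lookup can be sketched in Python as below. The sketch uses the median of pooled measurements at both levels, which is one of the options mentioned above and not necessarily how any published library was built; the record layout, region names, and weights are hypothetical.

```python
from collections import defaultdict
from statistics import median

def build_conversion_factors(measurements):
    """Aggregate cleaned weights into regional and national factors (kg per unit).

    measurements: iterable of (region, item, unit, weight_kg).
    """
    by_region = defaultdict(list)
    national = defaultdict(list)
    for region, item, unit, kg in measurements:
        by_region[(region, item, unit)].append(kg)
        national[(item, unit)].append(kg)
    regional_cf = {key: median(vals) for key, vals in by_region.items()}
    national_cf = {key: median(vals) for key, vals in national.items()}
    return regional_cf, national_cf

def to_kg(quantity, region, item, unit, regional_cf, national_cf):
    """Convert a reported quantity, falling back to the national factor."""
    factor = regional_cf.get((region, item, unit), national_cf.get((item, unit)))
    if factor is None:
        raise KeyError(f"no conversion factor for {item}/{unit}")
    return quantity * factor

# Hypothetical cleaned measurements.
data = [
    ("North", "millet", "mudu", 1.0),
    ("North", "millet", "mudu", 1.2),
    ("North", "millet", "mudu", 1.1),
    ("South", "millet", "mudu", 0.9),
]
regional, nationwide = build_conversion_factors(data)
north_kg = to_kg(2, "North", "millet", "mudu", regional, nationwide)
east_kg = to_kg(2, "East", "millet", "mudu", regional, nationwide)  # national fallback
```

The fallback mirrors the recommendation above: a household in a region with no measurement for an item-unit can still be converted using the national factor.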

3 The appropriate adjustment factors for this exercise are not part of the original set of libraries found in Annex II, but could be considered in future conversion libraries and methodological research.


3.4 COLLECTING REFERENCE PHOTOS

The second element that must be collected during the market survey is a set of non-standard unit reference photos. After the market survey, all photos should be compiled and included in the library. This index should contain photos of all allowable item-unit combinations, with each one directly linked to the measurements used in the conversion-factor list. For example, the pieces of yam in a photo should be exactly the same pieces used to calculate the conversion factors as described above, and something in the naming scheme of the photos should make this connection clear. This index of photos will be used to prepare the photo reference album.
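One possible naming scheme, sketched below in Python, encodes the same identifiers that key each weight measurement (market, item code, unit code, size, sample number) into the photo file name. The scheme itself, the field widths, and the function name are hypothetical; any convention works as long as each photo maps to exactly one measurement record.

```python
def photo_filename(market_id, item_code, unit_code, size, sample_no):
    """Build a file name that mirrors the key of the matching weight record."""
    return (f"m{market_id:03d}_i{item_code:03d}_u{unit_code:03d}"
            f"_{size.lower()}_s{sample_no}.jpg")

# Hypothetical measurement: market 12, item code 2, unit code 31, large, sample 1.
name = photo_filename(12, 2, 31, "Large", 1)
```

Zero-padded codes keep files sorted in the same order as the measurement dataset, which makes spot checks of photos against weights easier.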

Figure 6: Excerpt from a Conversion Factor Library for Nigeria

NIGERIA GHS-PANEL WAVE 3 CONVERSION FACTORS

Conversion factors are in kilograms per unit: the national average, followed by the six zones.

| ITEM CODE | ITEM NAME | UNIT CODE | UNIT DESCRIPTION | UNIT SIZE | NATIONAL AVERAGE | NORTH CENTRAL | NORTH EAST | NORTH WEST | SOUTH EAST | SOUTH SOUTH | SOUTH WEST |
|---|---|---|---|---|---|---|---|---|---|---|---|
| GRAINS AND FLOURS | | | | | | | | | | | |
| 10 | GUINEA CORN/SORGHUM | 11 | Paint rubber | | 3.612 | 3.758 | 3.612 | 3.612 | 3.768 | 3.832 | 2.828 |
| | | 12 | Milk cup | | 0.161 | 0.205 | 0.125 | 0.163 | 0.180 | 0.161 | 0.159 |
| | | 13 | Cigarette cup | | 0.205 | 0.205 | 0.205 | 0.205 | 0.215 | 0.198 | 0.205 |
| | | 14 | Tin | | 14.738 | 15.510 | 14.738 | 14.738 | 13.965 | 14.738 | 14.738 |
| | | 20 | Congo | Small | 1.000 | 1.280 | 1.000 | 1.000 | 1.000 | 1.000 | 0.720 |
| | | 21 | Congo | Large | 1.978 | 1.978 | 1.978 | 1.978 | 1.978 | 1.978 | 1.978 |
| | | 30 | Mudu | Small | 1.073 | 0.978 | 1.103 | 1.145 | 1.073 | 1.060 | 1.073 |
| | | 31 | Mudu | Large | 1.353 | 1.368 | 1.248 | 1.445 | 1.353 | 1.353 | 1.353 |
| | | 40 | Derica | Small | 0.238 | 0.238 | 0.238 | 0.238 | 0.238 | 0.138 | 0.338 |
| | | 41 | Derica | Medium | 0.639 | 0.639 | 0.612 | 0.639 | 0.639 | 0.639 | 0.745 |
| | | 42 | Derica | Large | 1.587 | 1.587 | 1.587 | 1.587 | 1.813 | 1.587 | 1.361 |
| | | 43 | Derica | Very large | 1.889 | 1.889 | 1.889 | 1.880 | 1.890 | 1.870 | 1.925 |
| | | 51 | Tiya | Medium | 1.825 | 1.825 | 1.825 | 1.825 | 1.825 | 1.825 | 1.825 |
| | | 52 | Tiya | Large | 2.650 | 2.650 | 2.650 | 2.650 | 2.650 | 2.650 | 2.650 |
| | | 60 | Kobiowu | Small | 0.595 | 0.595 | 0.595 | 0.595 | 0.595 | 0.595 | 0.595 |
| | | 61 | Kobiowu | Medium | 1.110 | 1.110 | 1.110 | 1.110 | 1.110 | 1.110 | 1.110 |
| | | 62 | Kobiowu | Large | 1.210 | 1.210 | 1.210 | 1.210 | 1.210 | 1.210 | 1.210 |
| 11 | MILLET | 11 | Paint rubber | | 3.765 | 3.672 | 3.765 | 3.765 | 3.767 | 3.805 | 3.840 |
| | | 12 | Milk cup | | 0.153 | 0.153 | 0.145 | 0.165 | 0.150 | 0.153 | 0.155 |
| | | 13 | Cigarette cup | | 0.210 | 0.210 | 0.210 | 0.210 | 0.215 | 0.205 | 0.210 |
| | | 14 | Tin | | 15.060 | 15.685 | 15.060 | 15.060 | 14.435 | 15.060 | 15.060 |
| | | 20 | Congo | Small | 0.924 | 1.160 | 0.924 | 0.924 | 0.924 | 0.924 | 0.688 |
| | | 21 | Congo | Large | 1.437 | 1.437 | 1.437 | 1.437 | 1.437 | 1.437 | 1.437 |
| | | 30 | Mudu | Small | 0.988 | 0.893 | 1.058 | 1.135 | 0.988 | 0.988 | 0.988 |
| | | 31 | Mudu | Large | 1.260 | 1.260 | 1.210 | 1.323 | 1.260 | 1.170 | 1.260 |
| | | 40 | Derica | Small | 0.243 | 0.243 | 0.243 | 0.243 | 0.243 | 0.145 | 0.340 |

Source: World Bank, LSMS Team.

WHICH ITEM-UNITS REQUIRE PHOTOS?

Ideally, each item-unit included in the survey (including pre-packed foods) will have a reference photo. Practically, surveys may wish to limit the photo book to items that represent a significant portion of the total food consumed or total food expenditures. It is essential that separate item-unit photos be taken for each non-container unit such as pieces or heaps. However, for container units (pails, plates, etc.), a single photo for each container may be sufficient since the volume of the container does not vary with the item it holds. Item-specific photos of containers are useful if the fill level (heaped/level) varies significantly across items. If units are expected to differ by region (e.g., only the North uses baskets, or the object called a pail in the West is different from the pail used in the East) then different photos must be taken in each region as well. However, for units that are relatively uniform across the survey area, only one photo need be taken.

GUIDELINES FOR REFERENCE PHOTOS

The primary purpose of these photos is to compile a reference album for use during household survey data collection. With this tool, respondents can estimate quantities in relation to the related reference size. For example, when shown a reference photo of a potato during the household survey, a respondent can say she ate three potatoes of the size shown, or consumed one potato that was about half the size of the reference potato. Additionally, the photos serve an internal purpose in the creation of the conversion factors, as they can be used for verification/validation of weight measurements collected by the market team. The photo quality, while important for both applications, is far more critical for the reference album that will be shared with respondents. When enumerators are instructed to take photos of all measurements, the research team will have multiple pictures to choose from when compiling the photo reference guide.

For the reference photos to be useful, they must depict the referenced quantities in a way that can be easily understood and interpreted by survey respondents. Regardless of the enumerators’ general familiarity with taking photos, ample time should be allotted for training enumerators on the photo requirements for this exercise. Effective and easily interpreted reference photos should adhere to these guidelines:

• Photos should be well lit so that respondents can easily see the items and differentiate between the item and its shadow or background.

• When possible, a plain background should be used for each photo. This could be a piece of paper, a sheet, or some other material. The plain background will serve to better highlight the item, especially when its color contrasts with the item color.

• Each photo should contain only one food item or one food unit. For example, a photo of shelled groundnuts should not include unshelled groundnuts or maize; a picture of pails (a unit used for various items) should not include bunches or piles of a particular food.

• For units that come in various sizes (e.g., small, medium, large), all sizes of the item-unit must be present in the same photo to help respondents differentiate between sizes. The items should always be in the same size order (i.e., left to right, ordered from small to large) in the picture. However, some units may be too large to include the size variations in a single photo. For such units, special care must be taken to ensure that the photos of the different sizes are directly comparable: they must be taken from the same angle and same distance and include the exact same reference item (positioned the same way relative to each item-unit).

Figure 7 — Correctly Photographed Sahins of Rapeseed

The near horizontal side angle shows the containers are filled in a "heaped" style, allowing for better understanding of the volume.

Source: World Bank, LSMS Team.

• A size reference item must be included in the picture to illustrate the relative size of the main objects. The item should be something that generally comes in one standard size, is easily identifiable to respondents, and could be brought to interviews by enumerators. Examples include a water or soda bottle, a writing pen, a box of matches, etc. This is a critical component of the photo. Without it, respondents may not be able to accurately judge the size of the item-unit in the photo.

Figure 8 — Correctly Photographed Tasas of Sunflower Seeds

Figure 9 — Correctly Photographed Heaps (Medebs) of Papaya

Plain white background contrasts nicely with the items pictured.

Reference item is included. It is placed next to the item for easy comparison, and is an appropriate size given the size of the item-unit.

Photo taken from the side angle shows the items stacked underneath, which helps in understanding the volume of the heap.

Source: World Bank, LSMS Team.

Figure 10 — Correctly Photographed Empty Pails (to be used as unit reference for multiple items)

Photo includes only one reference unit to avoid confusion for the respondent.

To reduce enumerator error in recording responses, items are pictured in order (small, medium, large), which is done consistently across all photos.

Source: World Bank, LSMS Team.

• The dimension or volume of the item-unit must be clear. Usually this means taking the picture from a side angle, either directly horizontal to the item, or slightly above horizontal. For some non-container units such as pieces, aerial photos (taken from directly above) may be acceptable or sometimes preferred. The key is to ensure that the volume of the item is conveyed in the photo.

Several example photos are shown in this section. Figures 7 through 10 (above) are examples of photos that follow these guidelines. Each photo has a reference object (a soda bottle in this case) and a plain background, shows sizes in the appropriate order, and is taken from an angle that allows respondents to accurately gauge the size/volume of the unit.

Figures 11 through 14 are examples of photos that were not taken correctly and will be difficult for respondents to interpret. Figure 11 shows a direct overhead view, whereby the volume of the container cannot be accurately gauged. The photo could be of a shallow plate or a very deep bucket, but it is impossible to tell from the photo. The item is also not photographed in its original container, which makes it more difficult to understand the volume.

Figure 12 features three different sizes, but the direct overhead angle may be misleading for piles of vegetables. Does the large pile have only the five pieces shown, or are there more stacked underneath? How many pieces are really in the medium pile? There is also no reference item, so it is impossible to tell if the small items are the size of golf balls or tennis balls. Finally, the items are in reverse order (large to small); assuming the other photos and the questionnaire list/label units from small to large (as is most commonly done), then photos that do not follow this pattern will increase the likelihood of enumerators incorrectly recording (transposing) the unit size of the item shown during data collection.

In Figure 13, all three sizes are included, as is a reference item. However, the background adds a lot of unnecessary distraction, and the inclusion of onions in the photo may confuse respondents.

In Figure 14, the items are also in reverse order. More problematic, though, is that the small basket (on the right) was photographed separately, using a different background, angle, and distance from the camera, both for the basket itself and for the basket in relation to the reference item. All these details make the small basket in the photo appear similar in size to (or larger than) the medium basket. This is not a useful reference for a respondent and will compromise the accuracy of the data reported.

Figure 11 — Incorrectly Photographed Mudu of Gari

Photographed in a different container. Since the mudu is not in the picture, it loses context for respondents.

Shallow pan or deep bucket? Overhead view makes it impossible to relate volume.

Photo label: White Gari, One Mudu, 1.25 kg

Source: World Bank, LSMS Team.

Figure 12 — Incorrectly Photographed Heaps of Sweet Potato

The background color is not ideal: the item blends into the background.

A reference item is missing; it is unclear if this sweet potato is longer or shorter than a common pen, for example.

Photo is large-to-small; if all the others are small-to-large, this may lead to mixing up small and large codes in responses.

Photo label: Sweet Potatoes (heap), 3.57 kg, 2.23 kg, 1.0 kg

Source: World Bank, LSMS Team.

Figure 13 — Incorrectly Photographed Heaps of Green Peppers

The items in the background are very distracting. Photos should focus on only the item in question, to avoid confusion.

Source: World Bank, LSMS Team.

Figure 14 — Incorrectly Photographed Baskets

Two different pictures are joined together, each taken from a different angle and distance. This makes it harder to relate the size of objects in the different pictures.

Not clear which basket is small, which is medium, which is large.

Source: World Bank, LSMS Team.

To comply with the guidelines discussed above, survey teams will often need to have access to a large staging area in which to take photos, especially for larger units or when more than one size of a unit is photographed at once. In compact or crowded markets, this can be a significant challenge. Enumerators may not have enough space to position the camera sufficiently far away to capture all elements in the photo. Similarly, enumerators may block passages in the market when taking the photos. This can cause disruption in the market and create animosity from vendors or market patrons. If the market is crowded or very compact, enumerators should try to find a staging area where they can take photos without much difficulty or disturbance. They should then attempt to collect as many measurements as possible from vendors near the photo staging area.

CREATING AND USING THE PHOTO REFERENCE ALBUM

Photos collected from the market survey should be scrutinized. The best photo should be selected for each item-unit combination for inclusion in the bank of reference photos. This reference tool will be used to establish a clear connection between the respondent’s reporting and the established conversion-factor database. If the size of the reference album must be limited, focus on the most commonly reported NSUs. Reference aids should be printed in color, using a durable material that will withstand fieldwork (such as cardstock or laminated paper), or should be shown on tablets if the survey is conducted using computer-assisted personal interview (CAPI) technology. The list of reference photos should be organized to match the survey sequence and thus facilitate its use in the field. Forcing enumerators to flip through a multitude of pages of photos to find a particular item or unit will waste time and result in frustration on the part of both the enumerator and the respondent. Both CAPI and paper-based surveys can benefit from printed photo reference albums, which can sometimes be more easily shared with respondents during an interview.

When administering the consumption or agricultural production questionnaire, the enumerator should allow the respondent to report quantities in the unit with which the respondent is most familiar. The enumerator should not provide the respondent with the list of allowable item-unit combinations, but should instead refer to the list to ensure that the item-unit provided by the respondent is indeed valid. If the unit the respondent gives is not listed, then the enumerator should use his or her judgement regarding its validity. After the respondent has specified the quantity in the preferred unit, the enumerator should check to see if there is a reference photo for the item-unit. If there is, the enumerator should show the photo to the respondent and verify that the pictured unit is similar to that referred to by the respondent. If applicable, the enumerator should also ask which size of the unit most closely matches. The respondent may need to re-estimate his or her consumption/production in terms of the reference photo.
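The interview sequence described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the contents of `ALLOWED` and `PHOTOS`, the file name, and the `check_report` helper are all hypothetical and are not taken from any LSMS library.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical allowable item-unit list: (item, unit) -> sizes the unit comes in.
ALLOWED = {
    ("maize", "birchiko"): ["small", "medium", "large"],
    ("teff", "kilogram"): [],  # standard unit, no size variants
}

# Hypothetical index of reference photos: (item, unit) -> photo file.
PHOTOS = {("maize", "birchiko"): "maize_birchiko.jpg"}


@dataclass
class Prompt:
    valid: bool            # is the pair on the allowable list?
    photo: Optional[str]   # reference photo to show, if one exists
    ask_size: bool         # should the enumerator ask which size matches?


def check_report(item: str, unit: str) -> Prompt:
    """Tell the enumerator what to do after the respondent names a unit."""
    key = (item.lower(), unit.lower())
    sizes = ALLOWED.get(key)
    valid = sizes is not None      # unlisted pairs are left to enumerator judgement
    photo = PHOTOS.get(key)
    ask_size = bool(photo) and bool(sizes)
    return Prompt(valid, photo, ask_size)
```

An unlisted pair is not rejected outright here; `valid` is simply `False`, which leaves the decision to the enumerator, as the text recommends.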

3.5 HOW TO USE THE NON-STANDARD UNITS LIBRARIES

Once all the necessary components are assembled into the NSU library, protocols should be drafted to provide guidance to both enumerators and data users on how to use the library. Clear protocols for using NSUs and reference photos must be provided to field teams for the primary household and agriculture surveys. Likewise, clear protocols for using the conversion factors must be provided to data users. Each NSU library should include clear, coherent, and concise documentation so that the libraries can easily be used by researchers and field teams. Incorporation of NSU materials into household and agriculture surveys will require additional preparation, which can be done by the household survey team or as a final step in completing part of the NSU library documentation, in which case each household survey team will need to evaluate the available materials in order to adapt them to its needs. This preparation includes revising the consumption (and when applicable, harvest) questionnaire sections to include NSUs and crop conditions; preparing the photo reference guides to be used by teams; and providing instructions to enumerators on how to effectively incorporate these new resources. Annex I contains examples of documents used to incorporate NSUs into the Ethiopian Socioeconomic Survey: the food consumption section of the household survey, including a code sheet for reporting NSUs; a photo reference guide to be printed, bound, and used by enumerators during their interviews; and a sample section of an enumerator training manual that provides instruction to enumerators on the use of the questionnaire and the photo reference guide. A snapshot of the Ethiopia consumption questionnaire that incorporates NSUs is shown in Figure 15.

The LSMS team has created conversion-factor libraries for Nigeria, Ethiopia, Malawi, and Uganda, with more planned for Tanzania, Niger, Mali, and Burkina Faso. They are provided online in Annex II as they were used to support LSMS data-collection efforts in each country. The process of compiling these libraries has made it possible to further refine the guidelines and best practices outlined herein.

Although some of these libraries may not have the complete set of recommended items or may have some photos that do not meet all the stated recommendations, they can still serve as valuable resources. Both partial and completed libraries can be used by researchers and fieldwork teams to help increase the accuracy of reported quantities in their own work, without incurring the significant time-cost burden required to establish a new set of conversion factors. Even so, the libraries should be considered living documents, to be revised and updated with each new data-collection effort. Available foods and commonly used units and quantities may vary over time, so even complete libraries should be reviewed and piloted prior to their use on a new project.

Researchers conducting their own fieldwork can begin by incorporating the existing lists of allowable item-unit pairs into consumption and production questionnaires, training enumerators on the proper use of photo reference aids, and incorporating the provided dataset of NSU conversion factors into interview and data entry checks. When possible, research teams should do a brief pilot test of the commonly available NSUs in their survey area, as these may change over time or vary across regions; conversion-factor data and photos would only need to be collected for any newly available combinations. For research projects focused on analysis of existing data, where the data allowed for NSU reporting but conversion factors may not be available, the LSMS libraries can help increase the number of usable observations. Annex II provides additional information and user instructions on each of the available libraries.

Figure 15 — Excerpt of a Household Survey Allowing for NSU Reporting

CONSUMPTION UNITS

UNIT          SIZE      UNIT CODE
Kilogram                1
Gram                    2
Litres                  4
Centilitres             5
Jog                     8
Melekiya                9
Birchiko      Small     31
Birchiko      Medium    32
Birchiko      Large     33
Esir          Small     61
Esir          Medium    62
Esir          Large     63
Festal        Small     71
Festal        Medium    72
Festal        Large     73

SECTION 5A: FOOD LAST 7 DAYS

Questions (one row per food item):

1. Over the past one week (7 days), did you or others in your household consume any [ITEM]? INCLUDE FOOD BOTH EATEN COMMUNALLY IN THE HOUSEHOLD AND THAT EATEN SEPARATELY BY INDIVIDUAL HOUSEHOLD MEMBERS. YES...1; NO...2 (NEXT ITEM)

2. How much in total did your household consume in the past week? SEE UNIT CODES ABOVE. Recorded as: QUANTITY, UNIT CODE

3. How much came from purchases? IF NONE RECORD 0. SEE UNIT CODES ABOVE. Recorded as: QUANTITY, UNIT CODE

FOOD ID   ITEM
          CEREALS
1         Teff
2         Wheat
3         Barley
4         Maize
5         Sorghum

Source: World Bank, LSMS Team.
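As a rough illustration of how a data user might apply a conversion-factor library to responses coded as in Figure 15, the sketch below maps (food ID, unit code) pairs to kilograms. The item and unit codes follow the excerpt, but the Birchiko factors are invented for illustration only, and the `to_kg` helper is an assumption rather than part of any LSMS file.

```python
from typing import Optional

# (food ID, unit code) -> kilograms per unit.
# Codes follow Figure 15 (1 = Teff; 1 = Kilogram, 2 = Gram, 31-33 = Birchiko).
# The Birchiko factors below are HYPOTHETICAL placeholders, not LSMS values.
CONVERSION_KG = {
    (1, 1): 1.0,     # Teff, Kilogram
    (1, 2): 0.001,   # Teff, Gram
    (1, 31): 0.25,   # Teff, Birchiko Small (hypothetical)
    (1, 32): 0.5,    # Teff, Birchiko Medium (hypothetical)
    (1, 33): 1.0,    # Teff, Birchiko Large (hypothetical)
}


def to_kg(food_id: int, unit_code: int, quantity: float) -> Optional[float]:
    """Convert a reported quantity to kilograms, or None if no factor exists."""
    factor = CONVERSION_KG.get((food_id, unit_code))
    if factor is None:
        return None  # keep the observation, but flag it for follow-up
    return quantity * factor
```

Returning `None` rather than dropping the record keeps observations without a factor visible, so they can be reviewed instead of silently lost.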


4. Benefits of Using Computer-Assisted Personal Interviewing (CAPI)

Materials collected to support the use of NSUs can be used with both paper-based and computer-based surveys. However, some aspects of the information collected for use with NSUs can be greatly enhanced when used with computer-assisted personal interviewing (CAPI). CAPI provides unique benefits when conducting a market survey, particularly with respect to the ability to directly link weight measurements with reference photos. When conducting the main food consumption or agricultural production survey, CAPI can make better use of collected reference photos as well as conversion factors (to identify outliers). Both these aspects are discussed in turn here.

4.1 CAPI FOR MARKET SURVEYS

Market surveys are ideal candidates for collection using computer-assisted personal interview (CAPI) technology. Perhaps the strongest advantage that CAPI collection has over paper is that photos can be directly linked to measurements. When conducting a market survey using paper, one must ensure that the photos can be linked to the correct weight measurement observation. One way to ensure this link is to apply a rigorous naming scheme for the photos, referencing the item-unit, the market in which it was taken, and the measurement observation it refers to (if there are multiple measurements within the same market). Renaming these photos while conducting the survey can be time consuming for enumerators and can lead to mistakes. However, when using CAPI software (such as Survey Solutions), photos can be taken immediately after recording the measurement and can be directly linked to that measurement observation. The photo is automatically named with a reference to that specific case. In addition, CAPI software can provide a prompt to enumerators to take a photo of the measured item. This can help ensure that there is at least one photo taken for every item-unit measurement collected.
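For paper-based market surveys, a naming scheme of the kind described above might look like the following sketch. The file-name pattern, the `photo_filename` and `parse_photo_filename` helpers, and the numeric market identifier are all assumptions made for illustration; the guidebook does not prescribe a particular format.

```python
import re


def _slug(text: str) -> str:
    """Lowercase and replace runs of non-alphanumeric characters with '-'."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def photo_filename(item: str, unit: str, market_id: int, obs_no: int,
                   ext: str = "jpg") -> str:
    """Build a name encoding item, unit, market, and measurement number."""
    return f"{_slug(item)}_{_slug(unit)}_m{market_id:03d}_obs{obs_no:02d}.{ext}"


_PATTERN = re.compile(
    r"^(?P<item>[a-z0-9-]+)_(?P<unit>[a-z0-9-]+)"
    r"_m(?P<market>\d+)_obs(?P<obs>\d+)\.\w+$"
)


def parse_photo_filename(name: str) -> dict:
    """Recover the linking keys from a file name built by photo_filename."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognised photo name: {name}")
    return {
        "item": match["item"],
        "unit": match["unit"],
        "market": int(match["market"]),
        "obs": int(match["obs"]),
    }
```

Because the name can be parsed back into its keys, the photo can later be matched to the weight record without relying on the enumerator's notes.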

CAPI technology also makes the collection of additional metadata much easier. For example, GPS coordinates where each specific measurement is taken (or at least the more general market location) can be automatically captured by the CAPI device. Likewise, the date and time the measurement was taken can also be automatically recorded.

Collection using CAPI also allows for on-the-fly consistency checks. Since relatively few measurements will be taken within a market, it is important to limit the potential for error when collecting weights in standard units. For example, the current measurement can be compared with previous measurements and flagged if it is significantly different. Likewise, a predetermined reasonable range for a particular item-unit can be applied. These bounds must be made flexible and must only account for the most egregious mismeasurements. For example, for very small units, any measurement over X kg would be unreasonable. These kinds of checks can identify some common errors such as reporting weights in grams instead of kilograms.
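The two checks mentioned above (a plausible range per item-unit and a comparison with earlier measurements) could be sketched as follows. The bounds, the threefold tolerance, and the flag names are hypothetical choices for illustration; in practice they would be set by the survey team for each item-unit.

```python
from statistics import median
from typing import List, Sequence, Tuple


def check_weight(weight_kg: float,
                 previous_kg: Sequence[float],
                 bounds_kg: Tuple[float, float],
                 ratio: float = 3.0) -> List[str]:
    """Return a list of warning flags for one weight measurement."""
    low, high = bounds_kg
    flags = []

    # Range check: catch only the most egregious mismeasurements.
    if not low <= weight_kg <= high:
        flags.append("outside_plausible_range")
        # A value 1000 times too large suggests grams were entered as kilograms.
        if low <= weight_kg / 1000 <= high:
            flags.append("possibly_entered_in_grams")

    # Comparison with earlier measurements of the same item-unit.
    if previous_kg:
        typical = median(previous_kg)
        if weight_kg > typical * ratio or weight_kg < typical / ratio:
            flags.append("differs_from_previous")

    return flags
```

The function only flags; it never corrects a value, so the enumerator can re-weigh the item on the spot.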

However, there is at least one potential drawback to using CAPI to conduct a market survey. In some cases, it could be more difficult to move between item-units within the listing on a CAPI survey. While conducting a market survey, the enumerators will not go item-unit by item-unit. Instead, they will move within the market collecting what item-units they see, not necessarily in order. For the CAPI program to be usable in the market setting, enumerators must be able to move easily between item-units in the list. Survey Solutions CAPI allows for such flexibility; evaluation of other software options should take this into consideration.

4.2 CAPI FOR FOOD CONSUMPTION AND AGRICULTURAL PRODUCTION SURVEYS

All the reference library resources detailed above can be used with both paper-based and computer-assisted personal interviews (CAPI). Several CAPI-based programs have capabilities that allow photo references to be incorporated into the interview, so that an enumerator can share relevant images with the respondent as an item is being discussed. In several cases, programs connect the photo directly to the item-unit combination represented, so that “selecting” the photo automatically defines the conversion factor for the item reported.

Collecting data on allowable item-unit combinations and calculating their conversion factors prior to the start of fieldwork is even more critical with CAPI. When used with CAPI, these tools can create more dynamic in situ validation checks for enumerator use. Allowable combinations can be programmed into CAPI, so that only these options can be selected for any given item. The full set of such combinations is usually far more than an enumerator can be expected to recall during an interview, so building them into the parameters of the survey reduces the number of invalid observations reported during data collection. By applying conversion factors to data as they are being collected, reporting errors can be flagged and reviewed with respondents at the time of the interview, further reducing the number of invalid observations and eliminating the need for costly follow-up visits. CAPI programs can include checks of each item, including confirmation that price per kilogram and/or total and per capita standard-unit quantities are within reason. Some CAPI programs can also generate checks and reports compiled across multiple items entered, for example a summary list of all crop harvests in kilograms, listed in order of quantity, that enumerators can review with households for on-the-spot validation, ensuring that the top-reported crops match the farmer’s expectations. When collecting data on household consumption, caloric values can be included to generate food-consumption summaries; enumerators can review these immediately with household members, checking, for example, that the average caloric intake of household members is within reason, and that the ranking of foodstuffs by caloric share of diet makes sense.
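A food-consumption summary of the kind described above could be computed along the following lines. This is a sketch only: the `consumption_summary` and `within_reason` helpers, the default bounds, and any caloric values passed in are hypothetical and would need to come from a food composition table chosen by the survey team.

```python
from typing import Dict, Iterable, List, Tuple


def consumption_summary(records: Iterable[Tuple[str, float]],
                        kcal_per_kg: Dict[str, float],
                        household_size: int,
                        days: int = 7) -> Tuple[List[str], float]:
    """Rank foods by caloric share and compute daily kcal per household member.

    records holds (item, kilograms) pairs already converted to standard units.
    """
    kcal_by_item: Dict[str, float] = {}
    for item, kg in records:
        kcal_by_item[item] = kcal_by_item.get(item, 0.0) + kg * kcal_per_kg[item]

    ranking = sorted(kcal_by_item, key=kcal_by_item.get, reverse=True)
    per_capita_daily = sum(kcal_by_item.values()) / (household_size * days)
    return ranking, per_capita_daily


def within_reason(per_capita_daily: float,
                  low: float = 500.0, high: float = 6000.0) -> bool:
    """Flag implausible intake; the default bounds are illustrative only."""
    return low <= per_capita_daily <= high
```

The ranking and the per capita figure are exactly the two items the enumerator is meant to review with the household before leaving.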

As with any survey using CAPI, it is worth emphasizing the importance of dedicating sufficient additional time and resources to ensure the CAPI program is well programmed and that all checks and validations are incorporated before fieldwork, and even before training and piloting, begins. This additional up-front time will ensure that interviews run more smoothly, save time, and produce fewer data errors.


5. Conclusion

Food consumption and agricultural production are two of the most important and commonly measured quantities for welfare analysis in low- and middle-income countries. Both are critical inputs into poverty estimates for these countries, and agricultural production is essential for estimating farmer productivity. Many strides have been made in improving several aspects of these estimates, but until recently the challenge of converting non-standard units (NSUs) has received less attention.

The usual practice has been either to limit households to reporting in standard units or to have enumerators estimate the conversion to a standard unit on an ad-hoc basis, both of which can be very problematic and lead to poor estimates. The use of NSUs can increase the accuracy of reported quantities in food-consumption and agricultural-production surveys. Reliably documented conversion factors for NSUs ensure that data robustness is not reduced by the loss of valid observations.

The objective of this Guidebook has been to provide advice to survey practitioners on incorporating non-standard units into their surveys, along with practical guidance on how to create a complete NSU library resource for countries where one does not currently exist. In addition, the Annexes to this Guidebook include sample questionnaire instruments as well as resource libraries from the LSMS-ISA project (Nigeria, Ethiopia, Malawi, and Uganda). The libraries can be of use when working on any surveys or with any survey data in the selected countries. The Annexes include the allowable item- (and condition-) unit combinations for each of the countries and all photo references collected. The Stata files containing the conversion factors are available at www.worldbank.org/lsms under Publications/Guidebooks.


REFERENCES

Attanasio, O., & Frayne, C. (2006). Do the poor pay more? Presented at: Eighth BREAD Conference on Development Economics. Ithaca, New York.

Beegle, K., De Weerdt, J., Friedman, J., & Gibson, J. (2012). Methods of household consumption measurement through surveys: Experimental results from Tanzania. Journal of Development Economics, 98(1), 3-18.

Capéau, B. (1995). Measurement error and functional form: a proposal to estimate prices and conversion rates from the ERHS1994. Mimeo.

Capéau, B., & Dercon, S. (2006). Prices, unit values and local measurement units in rural surveys: an econometric approach with an application to poverty measurement in Ethiopia. Journal of African Economies, 15(2), 181-211.

Casley, D. J. & Kumar, K. (1988). Collection, analysis and use of monitoring and evaluation data. Baltimore, MD: The Johns Hopkins University Press.

Deaton, A., (1997). The Analysis of Household Surveys: a Microeconometric Approach to Development Policy. Washington D.C. and Baltimore: The World Bank and Johns Hopkins University Press.

Deaton, A., & Dupriez, O. (2011). Spatial price differences within large countries. Manuscript, Princeton University.

Diskin, P. (1997). Agricultural Productivity Indicators Measurement Guide. Food and Nutrition Technical Assistance Project. Washington, DC: US Agency for International Development.

Fermont, A., & Benson, T. (2011). Estimating yield of food crops grown by smallholder farmers. IFPRI Discussion Paper. Washington DC: International Food Policy Research Institute.

Fiedler, J. L., Carletto, C., & Dupriez, O. (2012). Still waiting for Godot? Improving Household Consumption and Expenditures Surveys (HCES) to enable more evidence-based nutrition policies. Food & Nutrition Bulletin, 33(Supplement 2), 242S-251S.

Kormawa, P. & Ogundapo, A.T. (2004). Local weights and measures in Nigeria: A handbook of conversion factors. IITA Monograph. Ibadan, Nigeria: International Institute of Tropical Agriculture.

Murphy, J., Casley, D. J. & Curry, J. J. (1991). Farmers’ Estimations as a Source of Production Data. World Bank Technical Paper 132. Washington, DC: World Bank.

Smith, L. C., & Subandoro, A. (2007). Measuring food security using household expenditure surveys (Vol. 3). IFPRI Technical Guide. Washington DC: International Food Policy Research Institute.

Sud, U.C., Ahmad, T., Gupta, V.K., Chandra, H., Sahoo, P.M., Aditya, K., Singh, M., & Biswas, A. (2016). Research on Improving Methods for Estimating Crop Area, Yield and Production under Mixed, Repeated and Continuous Cropping. Global Strategy: Improving Agricultural and Rural Statistics, Working Paper No. 5. Rome: Food and Agriculture Organization of the United Nations.

Policy Research Working Paper 7105

Durable Goods and Poverty Measurement

Nicola Amendola
Giovanni Vecchi

Poverty Global Practice Group
November 2014

WPS7105

Public Disclosure Authorized

Abstract

The paper focuses on durable goods and their role in the measurement of living standards. The paper reviews the theoretical underpinnings of the methods available to estimate the value of the services flowing from consumer durable goods. It also provides a unified framework that encompasses the acquisition approach, the rental equivalent approach, and the user cost approach. The pros and cons of each method are discussed in the context of poverty and inequality analysis and it is argued that the user cost should receive the highest consideration.

This paper is a product of the Poverty Global Practice Group. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://econ.worldbank.org. The authors may be contacted at [email protected].

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Produced by the Research Support Team

Durable Goods and Poverty Measurement

Nicola Amendola1 and Giovanni Vecchi2,3

JEL: C46, D31, I32, O15

Keywords: Measurement of poverty; Inequality; Consumption aggregate; Income distribution

1 University of Rome “Tor Vergata”

2 University of Rome “Tor Vergata”

3 We are grateful to Lidia Ceriani, Sergio Olivieri, Marco Ranzani, Carlos Felipe Balcazar and Nobuo Yoshida for the useful comments received.

1 Introduction

When it comes to measuring inequality and poverty, the choice and definition of an appropriate welfare indicator is not a straightforward task. A number of decisions must be made, many of which are controversial, and most often the decision making process is based on well-established practices rather than on theoretical arguments (Lanjouw, 2009; Deaton and Zaidi 2002). In this paper we focus on consumer durable goods and investigate their contribution to determining the living standard of the household.

In general, long-lived items such as automobiles, appliances and furniture have a positive and significant impact on living standards. Sometimes the outlay on durable goods only claims a small fraction of disposable income, but they most often change the lifestyle of the individuals, either by saving their time, as in the case of housework appliances, or by consuming their time, as with entertainment appliances (Offer 2005). Either way, consumer durable goods clearly matter to the wellbeing of individuals and there is an increasing consensus on the fact that any welfare measure should account for them (Slesnik, 2001, Deaton and Zaidi 2002, OECD 2013).

In the first part of this paper we review the theoretical underpinnings of the methods available to deal with durables. We outline the main alternatives found in the literature and discuss their advantages and disadvantages in the context of poverty and inequality analysis. We end our review by arguing that the so-called “user cost approach” is well worth our recommendation (Diewert 2009). The underlying idea is that it is not the expenditures on consumer durables that should be included in the welfare aggregate. Rather, it is the flow of services from durables that must be valued and included in the welfare aggregate, and the user cost method estimates it simply by calculating the difference between the value of the durable at the beginning of the analytical period and its actual value at the end of the same period. While simple, this method is not naïve, and offers the further advantage (not shared by the rental equivalence approach) of being a viable solution given the information typically available in household budget surveys.
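The verbal definition just given can be illustrated with a minimal Python sketch. It implements only the simple start-minus-end difference described in this paragraph; the function names and the figures in the usage note are hypothetical, and no interest or inflation adjustment is attempted here.

```python
from typing import Iterable, Tuple


def user_cost(value_start: float, value_end: float) -> float:
    """Service flow from one durable over the period: start value minus end value."""
    return value_start - value_end


def household_durables_flow(durables: Iterable[Tuple[float, float]]) -> float:
    """Sum the user cost over (value_start, value_end) pairs for one household."""
    return sum(user_cost(start, end) for start, end in durables)
```

For instance, a household owning a bicycle valued at 100 at the start of the year and 85 at the end, and a radio valued at 40 and then 30, would contribute 15 + 10 = 25 to the welfare aggregate under this simplified reading.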

The second part of the paper is devoted to measurement issues. What is simple in theory can turn into a thorny problem in practice. In fact, the estimation of the user cost of durable goods is an exercise fraught with difficulties. Data limitation is probably the single most important obstacle to implementing the user cost method. We discuss how data limitations can be overcome, or at least dealt with, by overviewing the current practice for a number of countries all around the world.

PART I – THEORY

2 Concepts and definitions

What is a durable good, and why do durable goods require special treatment? These are two focal questions that need to be addressed before outlining the theoretical approaches available to deal with consumer durable goods and their role in the measurement of living standards.

A durable good is a consumption good that can “deliver useful services to a consumer through repeated use over an extended period of time” (Diewert, 2009 p. 447). The main characteristic of a durable good does not depend on its physical durability, a property shared by many other consumption goods, but on the fact that, like capital goods, it is productive for two or more periods. According to the System of National Accounts (SNA), the internationally agreed set of recommendations on how to compile measures of economic activity, “… the distinction is based on whether the goods can be used once only for purposes of production or consumption or whether they can be used repeatedly, or continuously. For example, coal is a highly durable good in a physical sense, but it can be burnt only once. A durable good is one that may be used repeatedly or continuously over a period of more than a year, assuming a normal or average rate of physical usage. A consumer durable is a good that may be used for purposes of consumption repeatedly or continuously over a period of a year or more.” (SNA 2008: 184). Housing is clearly a durable good, arguably among the most important ones in most consumers’ bundles. Due to its importance, however, the way rents and imputed rents are estimated for inclusion in the welfare aggregate is the object of a separate paper.4 In this paper we focus on consumer durable goods other than housing5.

The SNA definition helps to answer the second question, namely why durable goods

require special treatment when measuring living standards. The essence of the problem lies

in the inconsistency between the so-called reference period chosen for the welfare

aggregate, and the period of time during which durable goods deliver their utility to the

consumer. In theory, prior to the analysis, “we need to decide the reference period for

welfare measurement, whether someone is poor if they go without adequate consumption or

income for a week, a month, or a year” (Deaton 1997: 151). Once this choice is made, the

same reference period must be applied to all the components of the welfare aggregate, no

matter if it is pears or t-shirts, electric fans or cars. In practice, the reference period very rarely

exceeds one year; most often, it coincides with the year.6 If we assume that the

reference period is one year (or less), then it is clear that durable goods, by their very

definition, pose a problem: how to reconcile the fact that items whose economic life

extends beyond the reference period of the welfare aggregate must be part of it?

4 This follows a well-established practice according to which a welfare aggregate is constructed by putting

together four building blocks, namely (i) food consumption, (ii) non-food consumption, (iii) durable goods

and (iv) housing [Deaton and Zaidi 2002: 25].

5 According to the Classification of Individual Consumption by Purpose (COICOP) nomenclature,

consumption goods are classified as non-durable (ND), semi-durable (SD) and durable (D). The consumption

goods classified as durable belong to the following categories: furniture, furnishings, carpets and other floor

coverings, major household appliances, tools and equipment for house and garden, therapeutic appliances and

equipment, vehicles, telephone and fax equipment, audiovisual, photographic and information processing

equipment (except recording media), major durables for recreation, electrical appliances for personal care,

jewelry, clocks and watches (ILO, 2004).

6 In most LSMS questionnaires the recall period for nonfood items does not exceed one year.

The purchasing market price of a durable good is clearly an inadequate pricing concept.

This is because the purchasing market price corresponds to the value of the durable good

for its entire economic life, while what we need is the value of the use of durable goods for

a shorter period, the reference period. Unfortunately, the value of the use of a durable that

contributes to the welfare during the reference period is rarely, if ever, directly observed.

This explains why durable goods require special treatment: the expression coined in the

literature is that we need to estimate the consumption flow of durable goods, that is to

estimate the benefit accruing to the household from the ownership of durable goods,

limited to the reference period of analysis.

The impact of using the consumption flow instead of the purchasing price of the durable

depends on the purposes of the analysis. Let us start with the context of the system of

national accounts (Moulton 2004; Young 2005). The value of expenditures on consumer

durables tends to fluctuate widely over the business cycle, while the value of their services

(the consumption flow) varies more smoothly. This suggests that the latter measure

provides a better picture than the former of the changes in a nation’s economic welfare over

time and makes international comparisons more meaningful (Katz 1983: 406). While the

2008 SNA recognizes these advantages, in practice, “the SNA measures household

consumption by expenditures and acquisitions only. The repeated use of durables by

households could be recognized only by extending the production boundary by postulating

that the durables are gradually used up in hypothetical production processes whose outputs

consist of services. These services could then be recorded as being acquired by households

over a succession of time periods. However, durables are not treated in this way in the

SNA. A possible supplementary extension to the SNA to allow for such an extension of the

production boundary could usefully take place in a satellite account.” (SNA 2008: 184).

A similar issue arises in the construction of a consumer price index (CPI). As argued by

Alchian and Klein (1973), any analytically correct measure of inflation should take

account of changes in the price of durable goods. The point was, in fact, made by Irving

Fisher as far back as 1911, when he explained that to base a price index only on “services

and immediately consumable goods would be illogical” (Fisher 1911: ch. 10, X.39). The

claim that durable goods should be covered by the consumer price index has never been

disputed from a theoretical standpoint. Yet, in practice, the task of incorporating the price

dynamics of consumer durables is not straightforward. A special treatment of the prices of

durables is often necessary in order to moderate the observed volatility in measures of

inflation that incorporate changes in the price of assets (Goodhart, 2001). Even more

relevant to the present context is the fact that if the CPI is to serve as a cost-of-living index,

then the pricing concept should hinge on the cost of the use of the services of the durable

good during the reference period rather than on its purchase price (Diewert 2004, 2009).

When it comes to measuring living standards, poverty and inequality, the estimation of the

value of consumer durables is also crucial. The use of the purchase price instead of the

consumption flow leads to overestimating the effect of the economic cycle on household

welfare, to underestimating absolute poverty, and most likely to biasing the poverty profile

(Deaton and Zaidi 2002). A well-documented study on Russia, for instance, shows that the

impact on inequality can be very large: the Gini index of expenditure increases from 32

percent to 44 percent when the full purchase value of durables is included instead of its use

value (World Bank 2005: 9).

Irrespective of the angle that one may take, the estimation of the consumption flow from

durable goods stands out as a complex task. The main reason is that the value of the flow of

services of a durable depends not only on its physical deterioration rate but also on the (unobserved)

expected price of the durable7. This imputation exercise can be interpreted as a special case

of the classical problem of the measurement of capital, one of the oldest and most

contentious areas of economic theory (Hicks 1955; Hulten 1990; Downs 1986).

7 There are circumstances where the estimation of the consumption flow is a simple task. If the consumer

purchases the services of durables – for instance renting a car – then the price that s/he pays does the job as it

represents the consumption flow. Unfortunately, in most cases ownership is not separated from usage and

analysts must deal with the imputation problem.

3 Main theoretical approaches for dealing with durables

In this section we discuss the main theoretical approaches to estimate the consumption flow

from durable goods. We begin by introducing some notation. Consider a household in year

t, owning a durable good manufactured in year (t−v) and purchased in year (t−s), where

$0 \le s \le v$, and let $p^{s}_{v,t}$ denote the market price of the durable in period t. When $v = 0$, $p^{s}_{v,t}$

denotes the market price of a new durable; when $v > 0$, $p^{s}_{v,t}$ corresponds to the (second-hand)

market value in period t of a v-year old durable good. The consumption flow $CF_t$ of

the durable in period t is defined as follows:

(1) $CF_t = k^{s}_{v,t} \times p^{s}_{v,t}$

where $k^{s}_{v,t} \in \mathbb{R}_{+}$. Equation (1) expresses the current value of the flow of services ($CF_t$) for

a generic v-year old consumer durable purchased s years back in time, as a fraction ($k^{s}_{v,t}$) of

the market price ($p^{s}_{v,t}$). The coefficient $k^{s}_{v,t}$ is typically less than one, but in principle can

be greater than one.8 The main theoretical approaches to dealing with durable goods can be

described as different procedures for estimating the coefficient $k^{s}_{v,t}$. Following Diewert

(2009), we distinguish between three alternatives: (i) the acquisition approach; (ii) the

rental equivalence approach and (iii) the user cost approach. We depart from Diewert as the

focus of our analysis is limited to consumer durable goods, and in particular we do not

cover owner-occupied housing. Unlike Diewert, we are not much interested in consumer

price indices but rather in constructing a household level welfare indicator and ultimately in

evaluating the distributional impact of alternative measurement methodologies. Our

attempt, in the rest of this section, is to develop a unified approach – here expressed in

equation (1) – that encompasses the three approaches above.

8 This is likely to be the case of, say, a Picasso painting or of a vintage car.

3.1 Acquisition approach

When a durable good is purchased by a household and its entire value is attributed to the

household expenditure, we say that the durable good is treated according to the acquisition

approach (also known as “net acquisition approach”). Looking at equation (1), this

approach amounts to specifying the coefficient $k^{s}_{v,t}$ as follows:

(2) $k^{s}_{v,t}(a) = \begin{cases} 1 & \text{if } s = 0 \\ 0 & \text{if } s > 0 \end{cases}$

In equation (2), the argument “a” in $k^{s}_{v,t}(a)$ stands for “acquisition”. According to the

acquisition approach, $CF_t = 0$ if the household does not purchase the durable during the

survey year t: equation (2) says that $k^{s}_{v,t}(a) = 0$ for all items for which $s > 0$, that is for items

purchased prior to the current period t. Note that the definition does not contemplate v, that

is, it does not matter if the durable is new or used. A positive consumption flow $CF_t = p^{0}_{v,t}$

is attributed to durable goods purchased by the household when $s = 0$, that is during the

survey year. Under the assumption that the market price $p^{0}_{v,t}$ captures the current value of

all services provided by the durable over its entire economic life, then the net acquisition

approach assigns the household the entire stream of current and future productive services

of the durable in year t and zero for subsequent years.
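
As a minimal sketch of equations (1) and (2), the Python snippet below computes the consumption flow that the acquisition approach assigns to a single durable in successive survey years. The function names and the price figure are hypothetical and serve only as an illustration.

```python
def k_acquisition(s: int) -> float:
    """Coefficient k(a) of equation (2): 1 if bought in the survey year (s = 0), else 0."""
    if s < 0:
        raise ValueError("s must be non-negative")
    return 1.0 if s == 0 else 0.0


def consumption_flow(k: float, price: float) -> float:
    """Equation (1): CF_t = k * p."""
    return k * price


# Hypothetical durable with a market price of 500, observed in the purchase
# year (s = 0) and in the three following survey years (s = 1, 2, 3).
# The price is held fixed for simplicity; with k = 0 it plays no role anyway.
price = 500.0
flows = [consumption_flow(k_acquisition(s), price) for s in range(4)]
print(flows)
```

The whole value is booked in the purchase year and nothing afterwards, even though the household keeps using the good.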

The acquisition approach ignores the problem of distributing the initial cost of the durable

over the useful life of the good and allocates the entire charge to the period of purchase [ILO

2003: 419]. Further, the acquisition approach is clearly distortionary: it underestimates the

welfare of households that own used durable goods relative to households that

happened to purchase durable goods in the current year.9 When the net acquisition

approach is used for the construction of the consumption aggregate, both the level and the

budget shares of durables tend to mirror the business cycle. This is because

households tend to postpone the purchase of durables when the economy slows down, and

to increase purchases when the economy booms.

9 It is certainly true that this distortionary effect is less important if we are interested in aggregated variables, but

we cannot say that it completely vanishes (see section 4).

3.2 Rental equivalence

If rental or leasing markets for consumer durable goods exist, then the market rental prices

can be used to estimate consumption flows from durable goods. This method is known as

the rental equivalence approach [ILO 2003, Diewert 2009]. Suppose that in period t a

competitive market exists, where households can purchase the services of v-year-old

durable goods. Consumers can rent a car or a refrigerator, for example. Also assume that

households own homogeneous durable goods, that is assume that all goods are of the same

type and quality. Let $R_{v,t}$ denote the market rental value of the v-year-old durable good. If

markets are competitive and the economy is in equilibrium, then the market rental value

$R_{v,t}$ measures the consumption flow from the durable owned by the household.10 Going

back to equation (1), the rental equivalence approach specifies $k^{s}_{v,t}(r)$ as follows:

(3) $k^{s}_{v,t}(r) = \dfrac{R_{v,t}}{p^{s}_{v,t}}$

where $R_{v,t} / p^{s}_{v,t}$ is the rental ratio with respect to the market value of the durable owned by

the household, also known as the capitalization rate. In equation (3), the argument “r” in

$k^{s}_{v,t}(r)$ stands for “rental”. If we substitute (3) in (1) we obtain $CF_t = R_{v,t}$.
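
The substitution of (3) into (1) can be illustrated with a short Python sketch. It assumes, as the text does, that an observed market rent exists for a durable of the same type, quality and vintage; the function name and the figures are hypothetical.

```python
def capitalization_rate(rent: float, price: float) -> float:
    """Equation (3): k(r) = R / p, the rental ratio (capitalization rate)."""
    if price <= 0:
        raise ValueError("price must be positive")
    return rent / price


# Hypothetical v-year-old refrigerator: market value 800, annual market rent 120.
rent, price = 120.0, 800.0
k_r = capitalization_rate(rent, price)
flow = k_r * price  # equation (1) with k = k(r); algebraically equal to the rent
print(round(k_r, 4), round(flow, 2))
```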

In the rental equivalence approach, a pivotal role is clearly played by the market prices for

the services of durable goods. However, three restrictions must be introduced in order to

make the approach fully consistent: (a) one must assume the existence of a complete set of

markets for the services of the durables owned by the household; (b) markets must be

competitive and (c) the economy must be in equilibrium. If assumption (a) does not hold,

we cannot apply the method, and if either of the other two assumptions is violated the market

rental value does not necessarily reflect the household welfare gain from using the durable

in period t.

10 We also do not consider here taxes and transaction costs.

An additional concern with the rental equivalence approach is empirical in nature. The

estimate of $k^{s}_{v,t}(r)$ in equation (3) requires the availability of market rental prices in period

t for services of vintage-v durable goods. Even if, according to assumption (a), such a

market exists, this does not necessarily imply that actual transactions are available in the

sample. When markets are thin, 1) it can be very difficult to observe rental prices for all the

durables owned by the households, and 2) prices are likely to suffer from heterogeneity11.

The first condition is well-known and well documented in the literature that deals with the

measurement of the cost of shelter for homeowners (see, among others, Gillingham

1983). One solution for determining rental price equivalents for consumer durables is to ask

households what they think their durables would rent for. Perception, however, does not

always match reality. Households might have imperfect information

on current market prices especially with regard to consumption goods purchased or sold

infrequently (or never purchased or sold).

3.3 User cost

The user cost approach is based on a concept first introduced by Keynes (1936, chapter 6,

p. 53) and subsequently reformulated by Jorgenson (1963). The idea behind the approach is

highly intuitive: the user cost approach calculates the cost of purchasing the durable at the

beginning of the period, using the services of the durable during the period and then netting

off from these costs the benefit that could be obtained by selling the durable at the end of

the period (ILO 2003: 422).12

11 Note that this should not happen if markets were perfectly competitive. Hence, the empirical relevance of

point 2) may indicate a violation of the perfect competition assumption.

Let $p^{s}_{v,t}$ denote the market price of a durable produced in year t−v, but purchased in year t−s.

We assume here that the purchase took place at the beginning of period t. Similarly, let

$p^{s+1}_{v+1,t}$ denote the market price of the same durable at the end of period t. This notation

(both indices are now v+1 and s+1) reflects the fact that at the end of the period t both the

vintage of the good (v) and its purchase date have increased by one unit. The user cost

evaluated at the end of period t can be defined as follows:

(4) $UC_t = (1 + i_t)\, p^{s}_{v,t} - p^{s+1}_{v+1,t}$

where $i_t$ is the nominal interest rate in period t.

The user cost can be interpreted as the opportunity cost of owning for one period the

durable instead of selling the durable at the beginning of period t. Equation (4) can be

manipulated to show that the user cost $UC_t$ equals the sum of the net return of the durable

(in equilibrium, the net return corresponds to what the consumer would obtain by not

purchasing the asset and investing an equivalent amount of money on a competitive

financial market) plus a possible capital gain:

(5) $UC_t = \underbrace{i_t\, p^{s}_{v,t}}_{\text{returns}} + \underbrace{\left(p^{s}_{v,t} - p^{s+1}_{v+1,t}\right)}_{\text{capital gains}}$

Equation (5) shows that the user cost depends on the current price of the durable $p^{s}_{v,t}$, on

the current nominal interest rate $i_t$, but also on expected capital gains, that is on the

expected price of the durable at the end of period t. Thus, the user cost can be interpreted as

an ex-ante measure of the consumption flow. In equation (5), and in the rest of this section,

however, we do not distinguish between expected and actual prices. This amounts to

assuming that the economy is in equilibrium and that the user cost is at the same time an ex

ante and an ex post measure of the consumption flow.

12 Alternatively, the user cost can be derived by equating the price of a durable to the discounted value of the

net benefits that it is expected to generate in the future. See Jorgenson (1963) and OECD (2009).
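
The equivalence between equations (4) and (5) can be checked numerically. The sketch below is only an illustration: the function names and the figures (a durable worth 1000 at the start of the year and 850 at the end, with a 5 percent nominal interest rate) are assumptions, and expected and actual end-of-period prices are treated as identical, as in the text.

```python
def user_cost(p_begin: float, p_end: float, i: float) -> float:
    """Equation (4): UC_t = (1 + i_t) * p_begin - p_end."""
    return (1.0 + i) * p_begin - p_end


def user_cost_components(p_begin: float, p_end: float, i: float) -> tuple[float, float]:
    """Equation (5): the forgone return i_t * p_begin and the change in value p_begin - p_end."""
    return i * p_begin, p_begin - p_end


# Hypothetical figures, chosen only for illustration.
p_begin, p_end, i = 1000.0, 850.0, 0.05
uc = user_cost(p_begin, p_end, i)
forgone_return, value_change = user_cost_components(p_begin, p_end, i)
print(round(uc, 2), round(forgone_return, 2), round(value_change, 2))
```

The two components add up to the user cost of equation (4): holding the durable for the year costs the interest forgone on its initial value plus the fall in its market value.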

In equation (5), capital gains (or losses) depend on the economic depreciation of the

durable and on the monetary price dynamics of durable goods. Economic depreciation is an

expression used to describe the loss in monetary value that most capital goods experience

with age [Hulten and Wykoff 1981]. Due to economic depreciation, one unit of the

consumer durable of vintage v at the beginning of period t corresponds to $(1 - \delta_t)$ units of

a durable of vintage v+1 at the beginning of the period t, where $\delta_t \in (0,1)$ is usually referred

to as the net depreciation rate.13 At the same time, the price of one unit of a consumption

durable of age v at the end of period t is equal to $(1 + \pi_t)\, p^{s}_{v,t}$, where $\pi_t$ measures the

durable-specific inflation rate in period t. It follows that $p^{s+1}_{v+1,t} = (1 - \delta_t)(1 + \pi_t)\, p^{s}_{v,t}$.

Substituting in equation (5) we obtain:

(6) $UC_t = \left[\, i_t + 1 - (1 - \delta_t)(1 + \pi_t) \,\right] p^{s}_{v,t}$

13 Hulten and Wykoff (1995) distinguish between “deterioration” and “depreciation”. The former is a quantity

concept while the latter refers to a financial value. Deaton and Zaidi (2002: 14) prefer to use the concept, and

discuss a model for the “deterioration rate” based on the assumption that “the quantity of the good is subject

to ‘radioactive decay’, so that if the household starts off the year with the amount $S_t$, it will have an amount

$(1 - \delta_t) S_t$ to sell back at the end of the year”. This model corresponds to the geometric depreciation model

that we introduce in section 5.4.1. Hulten and Wykoff (1995), however, observe that under the depreciation

model the two concepts coincide. See also Koumanakos and Hwang (1988).

If we assume $\pi_t \delta_t \simeq 0$, equation (6) simplifies to:

(7) $UC_t = (r_t + \delta_t)\, p^{s}_{v,t}$

where $r_t = i_t - \pi_t$ is the Fisherian real interest rate. Equation (7) states that the user cost

can be obtained by applying the sum of the real interest rate and the economic depreciation

rate to the market price of a v-year old consumer durable in period t. Hence, according to

the user cost approach, the coefficient $k^{s}_{v,t}(u)$ is simply:

(8) $k^{s}_{v,t}(u) = r_t + \delta_t$

where the argument “u” in $k^{s}_{v,t}(u)$ stands for “user cost”. Equation (8) helps clarify the

relationship between the user cost and the rental equivalence approaches. The two approaches

produce the same estimate of the consumption flow if and only if $k^{s}_{v,t}(u) = k^{s}_{v,t}(r)$, i.e. if

and only if:

(9) $R_{v,t} = \left[\, r_t + \delta_t \,\right] p^{s}_{v,t}$

Equation (9) states that the market rental value of the durable in a single period must be

equal to the real return that could be obtained by selling the durable and investing the proceeds on

the capital market, plus the depreciation of the durable over the period. This is clearly an arbitrage

condition that is only satisfied in equilibrium and in the absence of any market friction

[Jorgenson 1963, Deaton and Muellbauer, 1980].14

14 Garner and Verbrugge (2007) find evidence on the empirical divergence between the two measures.
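
The step from equation (6) to equation (7) follows from expanding the bracket: $i_t + 1 - (1 - \delta_t)(1 + \pi_t) = (i_t - \pi_t) + \delta_t + \delta_t \pi_t$, so the approximation drops only the cross term $\delta_t \pi_t$. The Python sketch below compares the two expressions; the function names and parameter values are hypothetical and were chosen only to make the size of the dropped term visible.

```python
def user_cost_exact(p: float, i: float, delta: float, pi: float) -> float:
    """Equation (6): UC_t = [i + 1 - (1 - delta)(1 + pi)] * p."""
    return (i + 1.0 - (1.0 - delta) * (1.0 + pi)) * p


def user_cost_approx(p: float, i: float, delta: float, pi: float) -> float:
    """Equation (7): UC_t = (r + delta) * p, with r = i - pi."""
    return ((i - pi) + delta) * p


# Assumed values: price 1000, nominal interest 8%, depreciation 10%,
# durable-specific inflation 3%.
p, i, delta, pi = 1000.0, 0.08, 0.10, 0.03
exact = user_cost_exact(p, i, delta, pi)
approx = user_cost_approx(p, i, delta, pi)
gap = exact - approx  # the cross term delta * pi * p dropped in equation (7)
print(round(exact, 6), round(approx, 6), round(gap, 6))
```

Under the arbitrage condition (9), the value returned by `user_cost_approx` is also the market rent at which the rental equivalence and user cost approaches give the same consumption flow.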

4 Discussion

There are clear conceptual and theoretical differences between the three approaches

reviewed in the previous section. The net acquisition approach is a stock approach, while

the rental equivalence and the user cost approaches are flow approaches. If we interpret the

consumer durable as a capital asset, the net acquisition approach consists in valuing both

the present and future productive services of new “capital” by means of the purchasing

market price of the durable. One advantage of this method is that it treats durable goods

symmetrically with non-durable consumption goods. The approach is also conceptually

simple, easy to communicate and parsimonious in terms of data requirement (Diewert,

2004, 2009): its implementation only requires the market value of the durables purchased

by the household in the survey period. A further advantage is its consistency with the

prescriptions of SNA 2008 (SNA 2008). For all these reasons, most statisticians engaged in

constructing consumer price indices adopt the acquisition approach for all durable goods

(with the exception of housing) [ILO 2003: 419].

There are, however, theoretical drawbacks that cannot be ignored. While the adoption of

the acquisition approach might have relatively little impact on the SNA, especially if the

economy is on a steady state growth pattern,15 the same is not true for micro-level analyses.

By imputing the entire value of the durable to the initial (acquisition) period, the analyst

clearly misrepresents the time pattern of the welfare accruing to the household that owns

durable goods. The acquisition approach systematically overestimates the living standards

of households who decide to invest in new consumer durables, and underestimates them for

households who decide to postpone this decision or have taken this decision in the past.

While it is not easy to identify the direction and the magnitude of the bias, poverty and

inequality estimates are certainly biased in the presence of durable goods imputed by means

of the acquisition approach.

15 Diewert (2002, 2004) proves that, in a long run equilibrium, the ratio between the value of the consumption

flow estimated according to the user cost approach and the value estimated following the acquisition approach

is constant and is, in general, larger than one. The ratio approximates unity in a steady state growth pattern.

Jorgenson and Lanfeld (2006) note, however, that the distortion induced by the acquisition approach could

also affect aggregate-level analysis. In a bust phase of the business cycle households tend to postpone

investments in durables and the opposite is true in boom phases. Hence, the acquisition approach accentuates

the cycle, giving a large weight to the more volatile component (sensitive to expectations) of households’

consumption expenditure.
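
The distortion described above can be made concrete with a small Python sketch comparing two hypothetical households under the acquisition approach and under the user cost approach of equation (7). The household data, the class and function names, and the values assumed for the real interest rate and the depreciation rate are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Household:
    name: str
    nondurable_spending: float  # annual spending on food and other non-durables
    durable_value: float        # current market value of the durable owned
    years_since_purchase: int   # s in the paper's notation


def welfare_acquisition(h: Household) -> float:
    """Acquisition approach: the full value counts only if bought this year (s = 0)."""
    k = 1.0 if h.years_since_purchase == 0 else 0.0
    return h.nondurable_spending + k * h.durable_value


def welfare_user_cost(h: Household, r: float, delta: float) -> float:
    """User cost approach, equation (7): flow = (r + delta) * current market value."""
    return h.nondurable_spending + (r + delta) * h.durable_value


# Hypothetical households: B spends more on non-durables but bought its durable earlier.
a = Household("A", nondurable_spending=2000.0, durable_value=800.0, years_since_purchase=0)
b = Household("B", nondurable_spending=2400.0, durable_value=600.0, years_since_purchase=3)
r, delta = 0.03, 0.12  # assumed real interest and depreciation rates

for h in (a, b):
    print(h.name, welfare_acquisition(h), round(welfare_user_cost(h, r, delta), 2))
```

With these figures the acquisition approach ranks household A above household B only because A happened to buy its durable in the survey year, while the user cost approach reverses the ranking.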

The above discussion leads us to prefer the rental equivalence and the user cost approaches

to the acquisition approach, at least in the context of welfare analysis. This is in line with

the conclusions reached by Deaton and Zaidi (2002: 35), Diewert (2004, 2009), and more

recently by the OECD Expert Group on Micro-statistics on Household Income,

Consumption and Wealth (OECD, 2013).

As shown in equation (9), in an equilibrium position, the rental equivalence and the user

cost approach are theoretically consistent. The rental equivalence approach uses the market

evaluation (observed paid rent) as an (ex ante) estimate of the user cost. If expectations are

fulfilled, market rents can be considered as a good approximation of the user cost. There

are however two problems. Firstly, if there is uncertainty, the hypothesis of perfect

foresight may not be a plausible one16. In most situations, the rental equivalence approach

is unlikely to converge to the user cost, due to departures from the theoretical conditions

that must hold in order for equation (9) to hold true. Market rental prices for durables are

often sensitive to expectations and other cyclical factors. Secondly, we should consider

certain data issues. For most v-year old durable goods, the availability of actual market

rental prices is the exception rather than the rule. In most developing countries, markets for

durable services are thin or even non-existent, and this makes the hypothesis of perfect

foresight discussed under the first point untenable. As appealing as the solution of asking households for a

self-assessment of durable rental prices might seem, the absence of reliable market data

would not allow one to test the reliability of the answers.

16 The question has been clearly pointed out, even if in a different context, by Hulten and Wykoff (1995):

“The rental price, on the other hand, is the ex ante cost of acquiring the right to use the capital good for a

stipulated period of time. Under perfect foresight with full utilization the two concepts will tend to converge.

With uncertainty, they may not”

A combination of theoretical and data-related arguments leads us to express a preference

for the user cost approach. Further, questionnaires used in many household budget surveys

around the world contain useful information for implementing the user cost approach,

irrespective of the choice of the welfare aggregate (in particular, whether income- or

expenditure based). All in all, the user cost provides the best balance between theoretical

consistency and data requirement to properly account for durable goods in measuring

welfare at the household level.

PART II – PRACTICE

In this section we focus on implementation issues, and in particular on how to estimate the

consumption flow of consumer durable goods according to the user cost approach. The

discussion covers both data issues and estimation methods available to estimate the

consumption flow as expressed in equation (7). We start off with an overview of how

consumer durables are accounted for in living standard assessment reports around the world

(section 5). Next we focus on the practicalities of how the consumption flow from durables

can be estimated starting from the user cost approach (section 6).

5 A bird’s eye view of current practices

We begin our review by summarizing the solutions adopted in selected World Bank

Poverty Assessment Reports. We examined a sample comprising 95 reports published

between 1996 and 2014, covering 61 countries.17 We find that 43 percent of the reports fail

to account for durable goods in the construction of the welfare aggregate: consumer durable

goods are simply ignored and excluded from the welfare aggregate, no matter whether

income- or expenditure-based (Figure 1). The exclusion of consumer

durables is not necessarily a choice of the analyst. Oftentimes exclusion is a consequence of

data limitation: not all questionnaires collect suitable information on consumer durables

and their ownership. In other circumstances, durables are excluded due to the analyst’s

concern for making consistent intertemporal comparisons. This is the case, for instance, of

Turkey in 2005 or of Egypt in 2011. Among the reports that do include consumer durables,

almost one out of four follows the acquisition approach (Section 3.1).

17 We excluded from the sample another 20 reports in which it was not possible to determine, in sufficient

detail, the definition of the welfare aggregate. In most cases, our understanding is that durables were not

accounted for.

Figure 1 – Durable goods in World Bank’s selected poverty assessment reports

Source: Authors’ elaboration

Regional practices matter. The choice of the method to deal with consumer durables tends to

be common to countries belonging to the same region. The regions in which durable goods

have been included in the welfare aggregate most frequently are Central Asia and Latin

America (60%). In the MENA region, one out of two reports does account for durable

goods, and does so by means of the acquisition approach. In contrast, in Central Asia and

Latin America and the Caribbean the user cost approach is the most popular method (more than

70% of the reports considered). The user cost approach is also popular in East and South Asia (2 out

of 3 reports account for durables), but almost 6 out of 10 directly exclude durables from the

welfare aggregate. Data constraints are particularly binding in Sub-Saharan Africa: only

a few reports include durables.

Recently, an OECD expert group, chaired by Bob McCall from the Australian Bureau of

Statistics, has released a report, part of a project launched in 2011, aimed at improving the

measurement of living standards at the micro level, i.e. at the level of individuals and

households. The new framework, named “Framework for Statistics on the Distribution of

Household Income, Consumption and Wealth” (ICW Framework), builds on the

results of earlier work, the Canberra Group Handbook on Household Income

Statistics [Canberra Group, 2011], suggests a number of methodological innovations, and

in doing so gives durable goods due attention.

According to the OECD report, consumer durables are to be accounted for irrespective of

whether income or expenditure is used as the welfare measure. Beginning with income-based

measures, the case for including an estimated value of the flow of services of durables

is clearly stated: “As with owner-occupied housing, household consumer durables normally

provide their owner with services over a number of years. The economic resource flowing

to the owner is notionally the rental value of the durables less the costs such as maintenance

expenses, depreciation and interest on any loan used to purchase the items. While similar in

nature to the net value of owner-occupied housing, it is separated out because it is much

more difficult to obtain relevant data, and because on average it is likely to have less impact

on the micro data, although it may be significant for some sub-population analysis”.

[OECD 2013: 46] Later on in the report, the estimated consumption flow from durables is

clearly listed among the components of income (code I3.3, “Net value of services from

household consumer durables”) [OECD 2013: table 4.1, p. 83].

Similarly, the concept of consumption expenditure developed within the ICW framework

recognizes the need to account for durable goods: “When a household purchases a dwelling

or consumer durables, it does not normally consume them immediately. Rather, the

household can be viewed as a producing entity that invests in those items as capital

expenditure and provides a flow of services to itself as a consuming entity. In the ICW

Framework, that flow of services is included as consumption expenditure, rather than the

initial purchase of the capital items. Two such service flows are included in the detailed

framework: i) the value of housing services provided by owner-occupied housing and ii)

the value of services from household consumer durables” [OECD 2013: 105]. It is worth

noting, however, that “Any consumer durables (…) provided in kind to households as a

return for labour or for the use of the household’s property are included in income but not

in consumption expenditure” [OECD 2013: 106].

Country-level practice varies. We did not undertake an exhaustive review of

common practice in all countries, a task that goes beyond our scope here; instead we chose

a few countries, largely on the basis of how accessible their methodological notes are

online. In no particular order, here is a schematic account of our findings:

− In the US, consumer durable goods do not seem to receive any specific attention. The

issue is mentioned in Citro and Michael (1995: 245), but only in passing. As a matter of

fact, the consumption flow from durables is not part of the method used to estimate

official poverty in the US.

− In Canada, there is no official government definition, and therefore no official measure, of

poverty. The point is clearly stated in Fellegi (1997): “Once governments establish a

definition, Statistics Canada will endeavour to estimate the number of people who are

poor according to that definition. Certainly that is a task in line with its mandate and its

objective approach. In the meantime, Statistics Canada does not and cannot measure the

level of “poverty” in Canada.” That said, Murphy et al. (2010, 2012) do not mention

any special treatment reserved for consumer durables, and our understanding is that, de

facto, consumer durables do not enter the definition of total income used to measure

poverty.

− In Australia, the income aggregate does not include any estimate of the services from

consumer durables (Australian Bureau of Statistics, 2013). With regard to the

expenditure-based welfare aggregate, the Australian practice has primarily hinged on

the acquisition approach (Australian Bureau of Statistics, 2012: 18-20).

− In the UK, the income-based welfare aggregate does not include the consumption from

durables (Department for Work and Pensions, 2013).

− In India, durables are included in the welfare aggregate according to the acquisition

approach (Government of India, 2007).

Overall, in most countries, the current practice does not seem to be consistent with the

lessons stemming from Part I of this paper, where it was argued that the user cost approach

deserves the highest consideration as a way of including consumer durables in the welfare

aggregate. The recent OECD publication goes in the right direction in that it endorses the

need to estimate the consumption flow of durables by means of a flow-based depreciation

method.

6 The estimation of the consumption flow

We begin from equation (7), which we rewrite as follows:

(10) $UC_t = (i_t - \pi_t + \delta_t)\, p^{s}_{v,t}$

Equation (10) contains four variables that must be estimated:

1) $i_t$, the nominal interest rate;

2) $\pi_t$, the durable-specific yearly inflation rate. Note that $i_t$ and $\pi_t$ jointly determine the real interest rate, $r_t = i_t - \pi_t$;

3) $p^{s}_{v,t}$, the current market value of the $v$-year-old durable;

4) $\delta_t$, the economic depreciation rate.

In the rest of this section we discuss the information required to estimate each variable

separately. This will bridge the gap between theory (eq. 10) and practice.
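As a minimal sketch of how equation (10) turns into a number once its four inputs are available, the Python function below evaluates the user cost. The function name and the example figures (a 6% interest rate, 2% inflation, 10% depreciation and a market value of 500) are illustrative assumptions, not values taken from any survey.

```python
def user_cost(i_t, pi_t, delta_t, market_value):
    """Equation (10): UC_t = (i_t - pi_t + delta_t) * p^s_{v,t}.

    All rates are annual fractions (0.06 means 6%); market_value is the
    current market value of the v-year-old durable.
    """
    real_rate = i_t - pi_t  # r_t = i_t - pi_t
    return (real_rate + delta_t) * market_value


# Hypothetical inputs, chosen only for illustration.
flow = user_cost(i_t=0.06, pi_t=0.02, delta_t=0.10, market_value=500.0)
print(round(flow, 2))
```

With these inputs the yearly consumption flow is 14% of the current market value of the durable.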

6.1 The nominal interest rate

The nominal interest rate is determined in the capital market and is intended to measure the

monetary financial opportunity cost of a household that decides to purchase (or not to sell)

a durable good. In an economy with competitive markets there are many reference interest

rates. Moreover, financial assets may have different maturities and different degrees of risk.

The equilibrium interest rates reflect all this complexity and even more¹⁸. Which one,

among the many available rates, should the welfare analyst choose?

There are many defensible answers and the final choice depends on the ultimate aim of the

analysis, as well as on the institutional context. If one is willing to assume that risk aversion

and wealth are negatively correlated¹⁹, then a good choice is the average interest rate on

safe assets, e.g. the yields of government bonds. This type of information is typically

available in most countries, and regularly published either by the national statistical office

or by the central bank. A second possibility is to use the interest rates on loans, and

specifically the borrowing rate on consumer durable goods like cars or other major

durables²⁰.

18 For instance, the heterogeneity in agents’ expectations should also be taken into account. Moreover, if one allows for imperfections in financial markets, different individuals might face different interest rates on loans.

19 Arrow (1971) was the first to argue that absolute risk aversion decreases as wealth increases.

20 See Katz (1983) and Diewert (2009).

Concerning the time horizon there are two possibilities: 1) to adopt the same reference

period as the estimated consumption flow (typically, one year); 2) to cover the average

economic life of the durable good considered. Under the first alternative short-term

interest rates are used, while under the second use is made of long-term rates.

The second option is probably preferable, because long-term rates are typically less

volatile. Further, long-term rates help to mitigate the effects

of pure monetary fluctuations on inequality and poverty trends.

6.2 The inflation rate

The second variable in equation (10) is the annual rate of change of the monetary price of the

durable for which we want to measure the real financial opportunity cost. In theory, we

are interested in a price index specific to each and every durable good acquired by the

household. In practice, no such index is likely to exist. Instead, it is common practice

for analysts to rely on the general consumer price index (CPI)²¹. Accordingly, we can

assume:

(11) $\pi_t = \dfrac{CPI_t - CPI_{t-1}}{CPI_{t-1}}$

Equation (11) implies that the CPI must be available for at least two years in order to identify

the real interest rate in equation (10).
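The short Python sketch below illustrates equation (11) and the real interest rate that enters equation (10). The function names and the CPI levels (100 and 105) are hypothetical, and the general CPI is used as a stand-in for a durable-specific price index, as the text suggests is common practice.

```python
def cpi_inflation(cpi_t, cpi_prev):
    """Equation (11): yearly inflation rate from two consecutive CPI levels."""
    if cpi_prev <= 0:
        raise ValueError("CPI levels must be positive")
    return (cpi_t - cpi_prev) / cpi_prev


def real_interest_rate(i_t, pi_t):
    """Real rate as used in equation (10): r_t = i_t - pi_t."""
    return i_t - pi_t


# Hypothetical CPI levels for the survey year and the year before.
pi_t = cpi_inflation(cpi_t=105.0, cpi_prev=100.0)
r_t = real_interest_rate(i_t=0.08, pi_t=pi_t)
print(round(pi_t, 4), round(r_t, 4))
```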

21 To the extent that the analysis is focused on poverty issues, it is advisable to refer to a cost of living index

specific to poor households, that is, an index based on the consumption pattern of households belonging to the

bottom deciles of the distribution of income and/or expenditure.

6.3 The market values of durable goods

As discussed in Section 3, irrespective of the approach that we adopt, the consumption flow can be expressed as a fraction of the current market value of the durable, $p^{s}_{v,t}$. The monetary value $p^{s}_{v,t}$ reflects the specific features of the durable good and the market conditions faced by the household that owns the durable. Unfortunately, most household budget surveys do not provide adequate information on $p^{s}_{v,t}$. In this respect, it is useful to distinguish between the case when $s = 0$ (the durable good has been purchased by the household during the survey year) and $s > 0$ (the durable has been purchased prior to the survey year). Typically, household budget surveys only record the price paid by the household for a durable good acquired during the survey year (whether new or bought on the second-hand market)²², which is information that can be consistently used to estimate the market prices $p^{0}_{v,t}$. The survey questionnaire does not always report the market values of the durables purchased prior to the survey year. When this is the case, we must rely on indirect estimation methods for the market value $p^{s}_{v,t}$ based on the price paid by the household, on price changes between $t-s$ and $t$, and on estimates of the durable-specific depreciation rate.

Households are often asked to report the historical price paid $s$ years before the survey year for the purchase of the durable, which we denote by $p^{s}_{v-s,t-s}$. Let $\bar{\pi}$ denote the average yearly durable-specific inflation rate between $t-s$ and $t$. The rate $\bar{\pi}$ can be calculated by solving the equation $\prod_{i=0}^{s} (1 + \pi_{t-s+i}) = (1 + \bar{\pi})^{s}$. Keeping the age of the durable constant, we can write:

(12) $p^{s}_{v-s,t} = (1 + \bar{\pi})^{s}\, p^{s}_{v-s,t-s}$

22 Sometimes only the price paid for the most recently purchased durable good is reported.

The implementation of equation (12) requires the availability of the time series of some sort of price index $CPI_T$ for $T = t-s-1, \dots, t$. If, for instance, the durable good was purchased 10 years prior to the survey year ($s = 10$), the analyst needs the price indices for the past 11 years. Secondly, equation (12) provides an estimate of $p^{s}_{v-s,t}$, that is, of the current value of a $(v-s)$-year-old durable. However, the durable owned by the household is a $v$-year-old durable. To estimate $p^{s}_{v,t}$ we need to know the average yearly economic depreciation rate between age $v-s$ and $v$. This is what we discuss in the next section.
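The revaluation step in equation (12) can be sketched in a few lines of Python. The sketch assumes, as a simplification of ours, that the cumulative price change between the purchase year and the survey year is measured by the ratio of the general CPI at the two dates; the function names and the figures are hypothetical.

```python
def average_inflation(cpi_then, cpi_now, s):
    """Constant yearly rate pi_bar such that (1 + pi_bar)**s == cpi_now / cpi_then."""
    if s <= 0:
        raise ValueError("s must be a positive number of years")
    return (cpi_now / cpi_then) ** (1.0 / s) - 1.0


def revalue_purchase_price(price_paid, pi_bar, s):
    """Equation (12): historical price expressed in survey-year prices."""
    return (1.0 + pi_bar) ** s * price_paid


# Hypothetical example: bought 10 years ago for 200; the CPI rose from 80 to 120.
pi_bar = average_inflation(cpi_then=80.0, cpi_now=120.0, s=10)
p_now = revalue_purchase_price(price_paid=200.0, pi_bar=pi_bar, s=10)
print(round(pi_bar, 4), round(p_now, 2))
```

The result is still the value of a durable as old as the item was when purchased; depreciation over the $s$ years of ownership has to be applied separately.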

6.4 The depreciation rate

The depreciation rate measures the loss (or gain) in value that durable goods experience

with age due to their physical deterioration and market value change. The depreciation

pattern of a durable good can be represented in a general way. In order to simplify our

discussion, we shall omit the index s, i.e. we will ignore the purchasing date of the

durable²³. Thus, we can write:

(13) $p_{1,t} = (1 - \delta_1)\, p_{0,t}$

where $p_{0,t}$ is the market value of a new durable in $t$ and $p_{1,t}$ is the market value of a 1-year-old durable in $t$. The value $\delta_1 \le 1$ is the deterioration rate for the first year of life of the durable. Following the same notation, and with the same interpretation, we can also write:

(14) $p_{2,t} = (1 - \delta_2)\, p_{1,t}$

Equation (14) expresses the price of the durable good when it turns two years old, that is at the end of the second year, as a fraction of its value when it was one year old. Substituting equation (13) in equation (14) we obtain:

(15) $p_{2,t} = (1 - \delta_2)(1 - \delta_1)\, p_{0,t}$

Proceeding iteratively one obtains:

(16) $p_{v,t} = \left[ \prod_{i=1}^{v} (1 - \delta_i) \right] p_{0,t}$

23 In the rest of this section we also assume that $\delta$ is unique, for each good, at the national level. In principle, one can explore the use of different values of $\delta$ for different regions or population subgroups. To the best of our knowledge, these possibilities have not been discussed in the literature.

According to equation (16) the entire depreciation pattern of the durable good between $t-v$ and $t$ is described by the sequence $\{\delta_i\}_{i=1}^{v}$. There are, obviously, many ways of characterizing this depreciation sequence, and they all require different pieces of information to be implemented. We will here discuss the three most common models that are found in the literature²⁴: 1) the geometric depreciation model; 2) the straight line depreciation model; and 3) the “light bulb” depreciation model. Further, we will suggest a fourth depreciation model, a mixture of methods 1) and 3), which has the advantage of being parsimonious in terms of information requirements, and therefore potentially widely applicable.

24 See Hulten and Wykoff (1981, 1996) and Diewert (2003, 2004, 2009).

6.4.1 The geometric depreciation model

Under the geometric model the depreciation rate is assumed to be constant over time. In other words, a constant fraction of the value of the durable is lost every year. In equation (16) this implies that $\delta_i = \delta$ for every $i$:

(17) $p_{v,t} = (1 - \delta)^{v}\, p_{0,t}$

Equation (17) can be solved with respect to the (unique) depreciation rate $\delta$:

(18) $\delta = 1 - \left( \dfrac{p_{v,t}}{p_{0,t}} \right)^{1/v}$

In equation (18) the estimation of the depreciation rate only requires information on the

market values of homogeneous durable goods of different age. Due to its analytical

simplicity and its empirical robustness, the geometric model is one of the most popular

models²⁵.
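To make equations (17) and (18) concrete, the following Python sketch recovers the geometric rate from the prices of a new and a used durable of the same type, and then rebuilds the used price from that rate. The function names and the two prices are hypothetical.

```python
def geometric_rate(price_old, price_new, age):
    """Equation (18): delta = 1 - (p_v / p_0) ** (1 / v)."""
    if age <= 0 or price_new <= 0:
        raise ValueError("age and price_new must be positive")
    return 1.0 - (price_old / price_new) ** (1.0 / age)


def geometric_value(price_new, delta, age):
    """Equation (17): p_v = (1 - delta) ** v * p_0."""
    return (1.0 - delta) ** age * price_new


# Hypothetical prices: a new item costs 1000, a 5-year-old one sells for 590.49.
delta = geometric_rate(price_old=590.49, price_new=1000.0, age=5)
rebuilt = geometric_value(price_new=1000.0, delta=delta, age=5)
print(round(delta, 4), round(rebuilt, 2))
```

In practice the two prices would be averages over homogeneous goods of different ages, which is the information requirement noted in the text.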

25 See Hulten and Wykoff (1981, 1996), Jorgenson (1996) and Fraumeni (1997). The general conclusion of the empirical literature on the capital depreciation rate is that the geometric pattern is closer to the actual pattern than the patterns implied by the straight line and the light bulb depreciation models.

6.4.2 The straight line depreciation model

The straight line depreciation model assumes that the economic life of the durable is finite and that its value follows a linear pattern of depreciation. Let us denote by $T$ the economic life of the durable: after $T$ years its consumption flow equals zero. According to the linear depreciation model we have:

(19) $\dfrac{p_{v,t}}{p_{0,t}} = \begin{cases} \dfrac{T - v}{T} & \text{if } v \le T \\ 0 & \text{otherwise} \end{cases}$

Hence, it is straightforward to show that:

(20) $\delta_i = \begin{cases} \dfrac{1}{T - i} & \text{if } i < T \\ 1 & \text{otherwise} \end{cases}$

where $\delta_i$ is the depreciation rate for year $i$ of the economic life of the durable good. Equation (20) shows that, unlike in the geometric depreciation model, the depreciation rate here is not constant over time. This implies that in order to implement the user cost formula we need to calculate a vintage-specific depreciation rate²⁶. Also note that the implementation of the straight line depreciation model requires the knowledge of $T$, which requires an estimate of the economic life for each and every durable good owned by the households.

Deaton and Zaidi (2002) assume that the age of the durable goods is uniformly distributed. Accordingly, $2\bar{T}$ is an estimator of the maximum economic life of the durable, where $\bar{T}$ is the average age of the durable calculated from the data recorded in the survey. An alternative consists in using some outlier-resistant statistic computed over the set of the most long-lived goods in the sample: one possibility, for instance, is the 95th percentile of the sample distribution of the ages of the durable good.
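The two ways of estimating $T$ and the straight line formulas can be sketched as follows. This is an illustration under our own assumptions: the function names are hypothetical, the index in equation (20) is read as the age of the durable at the start of the year, and the percentile uses the "inclusive" interpolation rule of Python's `statistics.quantiles`, which is only one of several possible conventions.

```python
import statistics


def life_from_mean_age(ages):
    """Estimate of T as twice the mean observed age (uniform-age assumption)."""
    return 2.0 * statistics.mean(ages)


def life_from_percentile(ages, pct=95):
    """Outlier-resistant alternative: a high percentile of the observed ages."""
    cuts = statistics.quantiles(ages, n=100, method="inclusive")
    return cuts[pct - 1]


def straight_line_value_share(v, T):
    """Equation (19): share of the initial value left at age v."""
    return (T - v) / T if v <= T else 0.0


def straight_line_rate(v, T):
    """Equation (20): rate applying at age v, equal to 1 / (T - v) for v < T."""
    return 1.0 / (T - v) if v < T else 1.0


# Hypothetical ages (in years) of one type of durable reported in a survey.
ages = [1, 2, 2, 3, 4, 5, 6, 8, 9, 10]
T_mean = life_from_mean_age(ages)
T_p95 = life_from_percentile(ages)
print(T_mean, T_p95, straight_line_rate(3, T_mean))
```

The rate rises with age because the same absolute loss of value is applied to a shrinking residual value.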

26 For an application of the linear depreciation model see Diewert (2003).

6.4.3 The light bulb depreciation model

The light bulb depreciation model, also known as the “one-hoss shay” model, is the simplest among the depreciation models. The idea is that the durable maintains its efficiency and value throughout its economic life and ceases to work, like a bulb, after $T$ years. Accordingly, the consumption flow, as measured by the user cost, is constant over the entire economic life of the durable. Let us define $U_t = UC_t/(1 + i_t)$ as the user cost of the durable evaluated at the beginning of period $t$. Equation (4) allows us to write:

(21) $U_t = p^{s}_{v,t} - \dfrac{1 + \pi_t}{1 + i_t}\, p^{s}_{v+1,t}$

If $U_t = \bar{U}$ is a constant for $v < T$, then equation (21) can be rewritten as follows:

(22) $p^{s}_{v,t} = \bar{U} + \gamma_t\, p^{s}_{v+1,t}$

where $\gamma_t = (1 + \pi_t)/(1 + i_t)$. We also assume $\gamma_t < 1$, i.e. that the real interest rate is always strictly positive.

Each durable good maintains its efficiency for $T$ periods and then ceases to function. Precisely at that time, $p^{s}_{T,t} = 0$. Substituting iteratively the RHS term in equation (22) we obtain:

(23) $p^{s}_{v,t} = \bar{U} \left[ 1 + \gamma_t + \gamma_t^{2} + \dots + \gamma_t^{T-1-v} \right] = \bar{U} \left( \dfrac{1 - \gamma_t^{T-v}}{1 - \gamma_t} \right)$

By using equation (18) we obtain the final formula for the “light bulb” depreciation rates:

(23) $\delta^{t}_{v} = 1 - \left( \dfrac{1 - \gamma_t^{T-1-v}}{1 - \gamma_t^{T-v}} \right)$

It should be observed that in order to estimate the depreciation rate for the user cost formula we need to know the “duration” $T$ of the durable, i.e. its economic life, and the parameter $\gamma_t$, which depends on the period-$t$ asset-specific inflation rate $\pi_t$ and on the nominal interest rate $i_t$. If we assume that $\gamma_t$ is constant over time, the depreciation rate only varies with $v$, i.e. with the vintage of the durable²⁷.
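A small Python sketch may help to see how the light bulb rates behave when $\gamma_t$ is held constant. It implements the depreciation-rate formula above together with the value pattern $(1 - \gamma^{T-v})/(1 - \gamma^{T})$; the function names and the parameter values (2% inflation, 6% interest, a 10-year life) are assumptions made for illustration.

```python
def light_bulb_value_share(v, T, gamma):
    """Share of the new price left at age v: (1 - gamma**(T - v)) / (1 - gamma**T)."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    if v >= T:
        return 0.0
    return (1.0 - gamma ** (T - v)) / (1.0 - gamma ** T)


def light_bulb_rate(v, T, gamma):
    """Rate at age v: 1 - (1 - gamma**(T - 1 - v)) / (1 - gamma**(T - v))."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    if v >= T:
        return 1.0
    return 1.0 - (1.0 - gamma ** (T - 1 - v)) / (1.0 - gamma ** (T - v))


# Hypothetical parameters: 2% inflation, 6% nominal interest, 10-year life.
gamma = (1.0 + 0.02) / (1.0 + 0.06)
T = 10
rates = [light_bulb_rate(v, T, gamma) for v in range(T)]
print([round(r, 3) for r in rates])
```

The rates increase with the age of the durable and reach one in the last year of its economic life, when the remaining value is written off entirely.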

27 The pattern of the value of the durable can be easily calculated and is given by $p_{v,t} = \left( \dfrac{1 - \gamma_t^{T-v}}{1 - \gamma_t^{T}} \right) p_{0,t}$.

6.4.4 A mixture depreciation model

In this section we introduce a depreciation model that can be obtained as a mixture of the geometric and light bulb depreciation models. The pure geometric depreciation model assumes that the value of the durable depreciates at a fixed rate $\delta$, but the economic life of the durable never ends. In the new model, the key assumption is that durable goods depreciate at a constant rate $\delta$, but the rate goes to one when $v > T$, where $T$ denotes the maximum economic life of the durable.

The question then is how to determine the scrap value of the durable, i.e. the value of the durable at $v = T$. One solution consists in assuming that the scrap value $p_{T,t}$ is a fraction $\alpha < 1$ of the initial value $p_{0,t}$:

(24) $p_{T,t} = \alpha\, p_{0,t}$

The new parameter $\alpha$ can be interpreted as a measure of the transaction cost for the durable; the transaction cost is assumed here to be proportional to the initial value of the durable. The idea is that at the end of the economic life of the durable good the transaction costs absorb all the value of the durable. As a consequence, after $T$ years the durable cannot be sold on the market without incurring a loss²⁸. Substituting in equation (17) we obtain:

(25) $\alpha\, p_{0,t} = (1 - \delta)^{T}\, p_{0,t}$

and solving with respect to $\delta$ we can write:

(26) $\delta = 1 - \alpha^{1/T}$

28 The actualized value of the residual consumption flows from the durable good (the reservation price) is less than the transaction cost that must be paid to purchase the durable.

Equation (26) compares with equation (18). The advantage of equation (26) is that it does not require information on the current market prices of durable goods with different manufacturing years. This greatly facilitates its implementation, both when data are relatively abundant, and even more so when the survey only provides an inventory of the durable goods owned by the households. A drawback is that the estimate of the depreciation rate $\delta$ depends on the parameter $\alpha$, which is arbitrarily chosen by the analyst. It can be shown, however, that the consumption flow from the durable is quite insensitive to the choice of $\alpha$. Equation (26) can be substituted in equation (10) and the elasticity of the consumption flow with respect to $\alpha$, $\eta_{CF/\alpha}$, can be easily calculated:

(27) $\eta_{CF_t/\alpha} = \dfrac{1}{T} \left( \dfrac{r_t + 1 - 2\alpha^{1/T}}{r_t + 1 - \alpha^{1/T}} \right) = \dfrac{1}{T}\, h(\alpha)$

where $h(\alpha)$ is always less than one and is a decreasing function of $\alpha$. Hence the elasticity of the consumption flow with respect to $\alpha$ is always less than one and tends to zero as the age of the durable tends to infinity. To get a sense of the magnitude of $\eta_{CF/\alpha}$, assume a conservative scenario where $T = 2$, $\alpha = 0.10$ and $r_t = 0.05$; under this assumption the elasticity is $\eta_{CF/\alpha} = 0.28$, that is, a one percent increase in $\alpha$ is associated with a 0.28 percent increase in the consumption flow. Ceteris paribus, if $T = 25$, $\eta_{CF/\alpha} = -0.22$. Again, the consumption flow is inelastic to $\alpha$.
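The figures quoted for the elasticity can be checked numerically. The Python sketch below evaluates equation (26) and equation (27) as printed above, for the two scenarios discussed in the text; the function names are our own.

```python
def mixture_rate(alpha, T):
    """Equation (26): delta = 1 - alpha ** (1 / T)."""
    if not 0.0 < alpha < 1.0 or T <= 0:
        raise ValueError("need 0 < alpha < 1 and T > 0")
    return 1.0 - alpha ** (1.0 / T)


def elasticity_wrt_alpha(alpha, T, r):
    """Equation (27): (1 / T) * (r + 1 - 2a) / (r + 1 - a), with a = alpha ** (1 / T)."""
    a = alpha ** (1.0 / T)
    return (r + 1.0 - 2.0 * a) / (r + 1.0 - a) / T


# The two scenarios discussed in the text: alpha = 0.10, r = 0.05, T = 2 and T = 25.
for T in (2, 25):
    delta = mixture_rate(0.10, T)
    eta = elasticity_wrt_alpha(0.10, T, 0.05)
    print(T, round(delta, 3), round(eta, 2))
```

Rounded to two decimals, the elasticities are 0.28 for $T = 2$ and -0.22 for $T = 25$, matching the values reported above.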

Figure 2 – Depreciation models compared

Source: our elaboration.

Figure 2 compares the four depreciation models described in this section. On the vertical

axis, the initial value of the durable good is normalized to one. We assume that the

geometric depreciation rate is 10% and that the economic life T is equal to 25 years

(empirically, these are reasonable assumptions for many consumer durables). The dashed

line describes the pattern of the straight line model, while the blue line describes the pattern

of the light bulb model. The geometric depreciation pattern is represented by the black solid

line. The green and the red lines describe the mixture model under the assumption that

$\alpha = 5\%$ and $\alpha = 10\%$, respectively.

6.4.5 Econometric models

A depreciation rate gives the percentage change of the market value of a durable good between two subsequent years. In principle, depreciation rates can be calculated directly from the empirical age-price curves for each durable good. In practice, however, lack of adequate data prevents the analyst from drawing complete age profiles for most durable goods and, as a consequence, from estimating the required depreciation rates²⁹. It is then necessary to estimate the age-price profile by using an econometric model.

Studying the economic depreciation of capital assets, Hulten and Wykoff (1981) introduced an econometric model that encompasses all the theoretical depreciation models described in sections 6.4.1-6.4.3³⁰. Let $p_i$ be the observed market price of an asset of age $v_i$ in year $t_i$. Then, the Box-Cox model for the used asset price is:

(28) $\tilde{p}_i = \alpha + \beta \tilde{v}_i + \gamma \tilde{t}_i + u_i$

where

$\tilde{p}_i = \dfrac{p_i^{\theta_1} - 1}{\theta_1}; \quad \tilde{v}_i = \dfrac{v_i^{\theta_2} - 1}{\theta_2}; \quad \tilde{t}_i = \dfrac{t_i^{\theta_3} - 1}{\theta_3}$

and $u_i \sim N(0, \sigma^2)$. The parameter vector $\boldsymbol{\theta} = (\theta_1, \theta_2, \theta_3)$ identifies the specific functional form of the Box-Cox power transformation that corresponds to the theoretical depreciation models discussed above. In particular, for $\boldsymbol{\theta} = (0,1,1)$ equation (28) takes the semi-log form that corresponds to the geometric depreciation pattern³¹. When $\boldsymbol{\theta} = (1,1,1)$ we obtain the linear depreciation pattern, while $\boldsymbol{\theta} = (1,3,1)$ identifies the light bulb model.

29 As observed by Hulten and Wykoff (1981a, 1981b), there is also a problem of censoring. The used price sample contains only the prices of the “survived” durables, which provide a biased estimate of the average value of a specific vintage. A possible correction for censoring consists in multiplying the prices of surviving assets of each vintage by the age-dependent probabilities of survival. Jorgenson (1996) shows that this correction has a relevant impact on depreciation rate estimates. Another possible source of bias depends on the fact that the equilibrium prices in second-hand markets might be largely affected by adverse selection phenomena (Akerlof, 1970).

30 An alternative econometric model was proposed by Oliner (1993), who augmented the standard linear regression model with polynomial components for $v_i$ and $t_i$. Micro-Economic Analysis Division of Canada (2007) has carried out a variety of other estimation exercises based on survival econometric models.
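The Box-Cox transformation used in equation (28) is easy to write down in code. The Python sketch below shows the transform, including its logarithmic limit at $\theta = 0$, and the geometric rate implied by the semi-log case: when log price is linear in age with slope $\beta$, the price ratio between consecutive ages is $e^{\beta}$, so $\delta = 1 - e^{\beta}$. The function names and the value of $\beta$ are hypothetical, and no estimation step is attempted here.

```python
import math


def box_cox(x, theta):
    """Box-Cox power transform: (x**theta - 1) / theta, or log(x) when theta == 0."""
    if theta == 0:
        if x <= 0:
            raise ValueError("the log transform needs a positive x")
        return math.log(x)
    return (x ** theta - 1.0) / theta


def implied_geometric_rate(beta):
    """Semi-log case, theta = (0, 1, 1): delta = 1 - exp(beta), beta = age coefficient."""
    return 1.0 - math.exp(beta)


# theta = (0, 1, 1): price enters in logs, age and year enter linearly (shifted by one).
log_price = box_cox(590.49, 0)
linear_age = box_cox(5, 1)  # equals 5 - 1
beta = math.log(0.9)        # hypothetical estimated age coefficient
delta = implied_geometric_rate(beta)
print(round(log_price, 4), linear_age, round(delta, 4))
```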

31 See also Jorgenson (1996).

In the absence of reliable data on used asset prices, the assumption of a geometric depreciation pattern can be used to derive an alternative estimation procedure:

(29) $\delta = \dfrac{p_v - p_{v+1}}{p_v}; \quad v = 0, 1, \dots, T-1$

where $p_v$ is the vintage $v$ asset price (we omit here the temporal index $t$) and $T$ is the expected economic life of the asset. From equation (29) we derive:

(30) $T\delta = \sum_{v=0}^{T-1} \dfrac{p_v - p_{v+1}}{p_v}$

Equation (30) can be rewritten as follows:

(31) $\delta = \left( \sum_{v=0}^{T-1} \dfrac{p_v - p_{v+1}}{p_v} \right) \Big/ T = \dfrac{DBR}{T}$

where DBR is known in the literature as the “declining balance rate”. The larger is the expected service life $T$ of the durable, the higher must be the declining balance rate consistent with a given geometric depreciation rate³². The idea of the method is to estimate DBR directly for all the assets for which there is reliable information on prices and then to use these estimates to calculate, by means of eq. (31), the depreciation rate for assets for which there is only information about the expected economic life. This “second best” estimation procedure, first proposed by Hulten and Wykoff (1981a, 1981b, 1996), is the one mainly adopted by the Bureau of Economic Analysis to produce its official comprehensive table of depreciation rates³³ (Fraumeni, 1997; BEA, 2003).
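The two steps of this procedure can be sketched in Python as follows. The function names, the price path and the values DBR = 1.65 and T = 11 are hypothetical and serve only to show the arithmetic of equations (30) and (31).

```python
def declining_balance_rate(prices):
    """Sum in equations (30)-(31); prices[v] is the price of a vintage-v asset, v = 0..T."""
    return sum((prices[v] - prices[v + 1]) / prices[v] for v in range(len(prices) - 1))


def rate_from_dbr(dbr, T):
    """Equation (31): delta = DBR / T."""
    return dbr / T


# Step 1: an asset with observed prices by age (here an exactly geometric path, delta = 0.1).
prices = [1000.0 * 0.9 ** v for v in range(6)]  # vintages 0..5, so T = 5
dbr = declining_balance_rate(prices)

# Step 2: an asset for which only the expected life is known; DBR is borrowed (hypothetical).
delta_other = rate_from_dbr(1.65, 11)
print(round(dbr, 3), round(delta_other, 3))
```

On an exactly geometric price path the sum equals $T\delta$, so dividing by $T$ returns the underlying rate.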

32 Equation (31) is clearly an approximate formula. In the geometric depreciation model there is no ending date for the capital asset.

33 According to the most recent BEA estimates, the DBR values range from 0.89 to 2.27 (www.bea.gov/national/pdf/BEA_depreciation_rates.pdf).

7 Conclusions

The estimation of the value of the flow of services from durable goods is a relevant issue in many areas of economic analysis, from national accounting to price index theory, as well as in welfare analysis. In this paper the focus has been on the treatment that consumer durable goods should receive in the process of constructing a welfare aggregate. The economic literature is relatively abundant in theoretical studies focused on the economic depreciation of assets. The single most debated issue refers to the measurement of capital and investment decisions. In contrast, the economic depreciation of consumer durable goods and the impact that different approaches have on the measurement of welfare at the individual or household level have remained largely unexplored.

A general conclusion of the literature is that the user cost approach (section 3.3) is the most appropriate pricing concept to evaluate the flow of services from durable assets. We also found a broad consensus that the geometric depreciation model (section 6.4.1) is the most empirically robust as well as theoretically consistent depreciation model for capital assets. Our analysis suggests that most of the advantages that hold true for capital goods, broadly defined, also hold true for consumer durable goods. Alternative approaches, like the acquisition approach or the rental equivalence approach, imply a higher risk of affecting in undesired ways the distribution of the welfare aggregate, and more generally welfare comparisons.

The second finding, relating to the advantages of the geometric depreciation model, is more controversial for two reasons. First, the empirical evidence in favor of the geometric model is based on capital assets and not on consumer durables. Extending the same line of argument to consumer durables is not straightforward. Secondly, the lack of adequate data often prevents the analyst from estimating the geometric depreciation rate for consumer durable goods. We suggested a new depreciation model that preserves the basic structure of the geometric model, but relies on a more parsimonious set of statistical information.

The main deficiency in the literature reviewed in this paper is that it fails to assess the impact of the consumption flow from durable goods on the measurement of welfare and its distribution. We are not aware of studies that evaluate the impact on poverty and inequality measures of including (excluding) the consumption flow in (from) the welfare aggregate. Nor are we aware of studies that report the sensitivity of the welfare distribution to different methods of estimating the consumption flow. This adds uncertainty when it comes to advising on the “best method” to use in practice, and to designing guidelines for the construction of the welfare aggregate. Further research is badly needed in this area.

References

Akerlof, G. (1970), The Market for Lemons, Quarterly Journal of Economics, 488-500

Alchian, A. and B. Klein, (1973) On a Correct Measure of Inflation, Journal of Money Credit and Banking, Vol. 5, 173-191.

Amendola, N. and G. Vecchi (2010), "Setting a Poverty Line for Iraq", in Confronting Poverty in Iraq: an Analytical Report on the Living Standard of the Iraqi Population, The World Bank.

Arrow, K. J. (1971) Essays on the Theory of Risk Bearing. Chicago: Markham Publishing.

Australian Bureau of Statistics (2012), Household Expenditure Survey and Survey of Income and Housing, User Guide, Australia 2009–10, ABS, Canberra.

Australian Bureau of Statistics (2013), Household Income and Income Distribution, Australia, 2011-12. ABS, Canberra.

Bureau of Economic Analysis (BEA), (2003), Fixed Assets and Consumer Durable Goods in the United States, 1925-97, U.S. Department of Commerce

Citro, C.F. and R.T. Michael (eds.) (1995), Measuring Poverty. A New Approach. Washington DC: National Academy Press.

Deaton, A. (1997), The Analysis of Household Surveys. A Microeconometric Approach to Development Policy. The Johns Hopkins University Press.

Deaton A. and J. Muellbauer (1980), Economics and Consumer Behavior, Cambridge University Press.

Deaton A. and S. Zaidi (2002), “Guidelines for Constructing Consumption Aggregates for Welfare Analysis.” Living Standards Measurement Study Working Paper n. 135. The World Bank, Washington, DC.

Department for Work and Pensions (2013), Households Below Average Income. An analysis of the income distribution 1994/95 – 2011/12, June 2013 (United Kingdom)

Diewert, W. E. (2002) ‘‘Harmonized Indexes of Consumer Prices: Their Conceptual Foundations’’, in Swiss Journal of Economics and Statistics, Vol. 138, No. 4.

Diewert, W. E. (2003), “Measuring Capital”, NBER Working Paper, n. 9526

Diewert, W. E. (2004), “Durables and User Costs” in ILO, Consumer Price Index Manual: Theory and Practice, chapter 23, ILO/IMF/OECD/UNECE/Eurostat/World Bank.

Diewert, W. E. (2009), “Durables and Owner-Occupied Housing in a Consumer Price Index” in W. E. Diewert , J.S. Greenlees and C.R. Hulten (eds.), Price Index Concepts and Measurements, University of Chicago Press.

Fellegi, I.P. (1997), “On poverty and low income”, available here: http://www.statcan.gc.ca/pub/13f0027x/13f0027x1999001-eng.htm.

Fisher, I. (1911), The Purchasing Power of Money: Its Determination and Relation to Credit, Interest, and Crises. New York: The Macmillan Co.

Fraumeni, B. (1997) ”Measurement of Depreciation in the US. National Income and Wealth Accounts.” Survey of Current Business. US, vol 77 (7), 7 – 23

Garner, T. and R. Verbrugge (2007), “The Puzzling Divergence of U.S. Rents and User Costs, 1980-2004: Summary and Extensions”, BLS Working Papers, WP n.409

Gillingham R. (1983), “Measuring the Cost of Shelter for Homeowners: Theoretical and Empirical Considerations”, The Review of Economics and Statistics, (65)2: 254-65.

Goodhart, C. (2001), “What Weight Should be given to Asset Price in Measurement of Inflation?”, The Economic Journal, Vol. 111, 335-356.

Government of India (2007) “Poverty Estimates for 2004-05”. New Delhi.

Hicks, J.R. (1955), Capital and Growth, Oxford Clarendon Press

Hulten, C. R. (1990), “The measurement of capital”. In Fifty years of economic measurement, ed. E. R. Berndt and J. E. Triplett, 119– 58. Chicago: University of Chicago Press.

Hulten, C. R. and F. Wykoff (1981), “The Measurement of Economic Depreciation”, in C. R. Hulten (ed.), Inflation and Taxation of Income from Capital, Washington DC

Hulten, C. R. and F. Wykoff (1996), “Issues in the Measurement of Economic Depreciation: Introductory Remarks”, Economic Inquiry, 34 (1): 10– 23.

ILO (2004), Consumer price index manual: Theory and Practice. Geneva, International Labour Office, 2004

Jalava, J and Kavonius, K (2009), “Measuring The Stock of Consumer Durables and its Implications for Euro Area Savings Ratios”, Review of Income and Wealth, 55, 1.

Jorgenson, D. W. (1963), “Capital Theory and Investment Behaviour”, American Economic Review, 53: 247-259.

Jorgenson, D. W. (1973), “The economic Theory of Replacement and Depreciation”. In Econometrics and Economic Theory, ed. W. Sellekaerts, 189– 221. New York: Macmillan.

Jorgenson D. W. and S. Landefeld (2006) “Blueprint for Expanded and Integrated U.S. Accounts: Review, Assessment, and Next Steps”, in A New Architecture for the U.S. National Accounts, Dale W. Jorgenson, J. Steven Landefeld, and William D. Nordhaus (eds.), The University of Chicago Press Chicago and

Koumanakos, P. and J.C. Hwang, (1988). The Forms and Rates of Economic Depreciation, The Canadian Experience. Presented at the 50th anniversary meeting of the Conference on Research in Income and Wealth, Washington, DC, May 1988.

Katz, A. J. (1983), “Valuing the services of consumer durables”, Review of Income and Wealth, 29 (4): 405– 27.

Lanjouw, P (2009), “Constructing a Consumption Aggregate for the Purpose of Welfare Analysis: Principles, Issues and Recommendations Arising from the Case of Brazil”.

Meyer, B. D. and J.X. Sullivan (2011), “Consumption and Income Poverty over the Business Cycle”, NBER Working Paper No. 16751.

Micro-Economic Analysis Division (2007), “Depreciation Rates for the Productivity accounts”, The Canadian Productivity Review.

Moulton, B.R. (2004), “The System of National Accounts for the New Economy: What Should Change?”, Review of Income and Wealth, 50 (2): 261-278.

Murphy, B., X. Zhang, and C. Dionne (2010), “Revising Statistics Canada's Low Income Measure (LIM)”, Income Research Paper Series, Statistics Canada, Income Statistics Division.

Murphy, B., X. Zhang, and C. Dionne (2012), “Low Income in Canada: a Multi-line and Multi-index Perspective”, Income Research Paper Series, Statistics Canada, Income Statistics Division.

OECD (2009), Manual: Measuring Capital, Second Edition, OECD, Paris.

OECD (2013), OECD Framework for Statistics on the Distribution of Household Income, Consumption and Wealth, OECD Publishing.

Offer, A. (2005), The Challenge of Affluence. Self-Control and Well-Being in the United States and Britain since 1950. Oxford University Press.

Oliner, S. D. (1993), “Constant-Quality Price Change, Depreciation, and Retirement of Mainframe Computers,” in Price Measurements and Their Uses, edited by M. F. Foss, M. E. Manser, and A. H. Young. Chicago: University of Chicago Press.

Slesnik, D.T. (2000), Consumption and Social Welfare: Living Standards and Their Distribution in the United States, Cambridge University Press.

System of National Accounts (2008), Commission of the European Communities, International Monetary Fund, United Nations, World Bank, Brussels/Luxembourg, New York, Paris, Washington, DC, 1993.

Young, A.S. (2005), “Some Uncertainties in Household Consumption Expenditure Statistics”, mimeo (www.statssa.gov.za/commonwealth/presentations/Paper_young.pdf).


Handbook of Statistical Data Editing and Imputation

Wiley Handbooks in Survey Methodology

The Wiley Handbooks in Survey Methodology is a series of books that present both established techniques and cutting-edge developments in the field of survey research. The goal of each handbook is to supply a practical, one-stop reference that treats the statistical theory, formulae, and applications that, together, make up the cornerstones of a particular topic in the field. A self-contained presentation allows each volume to serve as a quick reference on ideas and methods for practitioners, while providing an accessible introduction to key concepts for students. The result is a high-quality, comprehensive collection that is sure to serve as a mainstay for novices and professionals alike.

De Waal, Pannekoek, and Scholtus: Handbook of Statistical Data Editing and Imputation

Forthcoming Wiley Handbooks in Survey Methodology

Bethlehem, Cobben, and Schouten: Handbook of Nonresponse in Household Surveys

Bethlehem and Biffignandi: Handbook of Web Surveys

Alwin: Handbook of Measurement and Reliability in the Social and Behavioral Sciences

Larsen and Winkler: Handbook of Record Linkage Methods

Johnson: Handbook of Health Survey Methods


Ton de Waal

Jeroen Pannekoek

Sander Scholtus

Statistics Netherlands

A John Wiley & Sons, Inc., Publication

Copyright 2011 John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey

Published simultaneously in Canada


Library of Congress Cataloging-in-Publication Data:

Waal, Ton de.
Handbook of statistical data editing and imputation / Ton de Waal, Jeroen Pannekoek, Sander Scholtus.

p. cm.
Includes bibliographical references and index.
ISBN 978-0-470-54280-4 (cloth)

1. Statistics--Standards. 2. Data editing. 3. Data integrity. 4. Quality control. 5. Statistical services--Evaluation. I. Pannekoek, Jeroen, 1951- II. Scholtus, Sander, 1983- III. Title.

HA29.W23 2011
001.4′22--dc22

2010018483
Printed in Singapore

oBook: 978-0-470-90484-8
eBook: 978-0-470-90483-1
10 9 8 7 6 5 4 3 2 1

Contents

PREFACE

1 INTRODUCTION TO STATISTICAL DATA EDITING AND IMPUTATION
1.1 Introduction
1.2 Statistical Data Editing and Imputation in the Statistical Process
1.3 Data, Errors, Missing Data, and Edits
1.4 Basic Methods for Statistical Data Editing and Imputation
1.5 An Edit and Imputation Strategy
References

2 METHODS FOR DEDUCTIVE CORRECTION
2.1 Introduction
2.2 Theory and Applications
2.3 Examples
2.4 Summary
References

3 AUTOMATIC EDITING OF CONTINUOUS DATA
3.1 Introduction
3.2 Automatic Error Localization of Random Errors
3.3 Aspects of the Fellegi–Holt Paradigm
3.4 Algorithms Based on the Fellegi–Holt Paradigm
3.5 Summary
3.A Appendix: Chernikova’s Algorithm
References

4 AUTOMATIC EDITING: EXTENSIONS TO CATEGORICAL DATA
4.1 Introduction
4.2 The Error Localization Problem for Mixed Data
4.3 The Fellegi–Holt Approach
4.4 A Branch-and-Bound Algorithm for Automatic Editing of Mixed Data
4.5 The Nearest-Neighbor Imputation Methodology
References

5 AUTOMATIC EDITING: EXTENSIONS TO INTEGER DATA
5.1 Introduction
5.2 An Illustration of the Error Localization Problem for Integer Data
5.3 Fourier–Motzkin Elimination in Integer Data
5.4 Error Localization in Categorical, Continuous, and Integer Data
5.5 A Heuristic Procedure
5.6 Computational Results
5.7 Discussion
References

6 SELECTIVE EDITING
6.1 Introduction
6.2 Historical Notes
6.3 Micro-selection: The Score Function Approach
6.4 Selection at the Macro-level
6.5 Interactive Editing
6.6 Summary and Conclusions
References

7 IMPUTATION
7.1 Introduction
7.2 General Issues in Applying Imputation Methods
7.3 Regression Imputation
7.4 Ratio Imputation
7.5 (Group) Mean Imputation
7.6 Hot Deck Donor Imputation
7.7 A General Imputation Model
7.8 Imputation of Longitudinal Data
7.9 Approaches to Variance Estimation with Imputed Data
7.10 Fractional Imputation
References

8 MULTIVARIATE IMPUTATION
8.1 Introduction
8.2 Multivariate Imputation Models
8.3 Maximum Likelihood Estimation in the Presence of Missing Data
8.4 Example: The Public Libraries
References

9 IMPUTATION UNDER EDIT CONSTRAINTS
9.1 Introduction
9.2 Deductive Imputation
9.3 The Ratio Hot Deck Method
9.4 Imputing from a Dirichlet Distribution
9.5 Imputing from a Singular Normal Distribution
9.6 An Imputation Approach Based on Fourier–Motzkin Elimination
9.7 A Sequential Regression Approach
9.8 Calibrated Imputation of Numerical Data Under Linear Edit Restrictions
9.9 Calibrated Hot Deck Imputation Subject to Edit Restrictions
References

10 ADJUSTMENT OF IMPUTED DATA
10.1 Introduction
10.2 Adjustment of Numerical Variables
10.3 Adjustment of Mixed Continuous and Categorical Data
References

11 PRACTICAL APPLICATIONS
11.1 Introduction
11.2 Automatic Editing of Environmental Costs
11.3 The EUREDIT Project: An Evaluation Study
11.4 Selective Editing in the Dutch Agricultural Census
References

INDEX

Preface

Collected survey data generally contain errors and missing values. In particular, the data collection stage is a potential source of errors and of nonresponse. For instance, a respondent may give a wrong answer (intentionally or not), or no answer at all (either because he does not know the answer or because he does not want to answer the question). Errors and missing values can also be introduced later on, for instance when the data are transferred from the original questionnaires to a computer system. The occurrence of nonresponse and, especially, errors in the observed data makes it necessary to carry out an extensive process of checking the collected data, and, when necessary, correcting them. This checking and correction process is referred to as “statistical data editing and imputation.”

In this book, we discuss theory and practical applications of statistical data editing and imputation, important topics for all institutes that produce survey data, such as national statistical institutes (NSIs). In fact, it has been estimated that NSIs spend approximately 40% of their resources on editing and imputing data. Any improvement in the efficiency of the editing and imputation process should therefore be highly welcomed by NSIs. Besides NSIs, other producers of statistical data, such as market researchers, also apply edit and imputation techniques.

The importance of statistical data editing and imputation for NSIs and academic researchers is reflected by the sessions on statistical data editing and imputation that are regularly organized at international conferences on statistics. The United Nations consider statistical data editing to be such an important topic that they organize a so-called work session on statistical data editing every 18 months. This work session, well-attended by experts from all over the world, addresses modern techniques for statistical data editing and imputation and discusses their practical applications.

As far as we are aware, this is the first book to treat statistical data editing in detail. Existing books treat statistical data editing only as a secondary topic, and the discussion of statistical data editing is limited to just a few pages. In this book, statistical data editing is a main topic, and we discuss both general theoretical results and practical applications.

The other main topic of this book is imputation. Since several well-known books on imputation of missing data are available, this raises the question why we deemed a further treatment of imputation in this book useful. We can give two reasons. First, in practice (in any case in practice at NSIs) statistical data editing and imputation are often applied in combination. In some cases it is even hard to point out where the detection of errors ends and where imputation of better values starts. This is, for instance, the case for the so-called Nearest-neighbor Imputation Methodology discussed in Chapter 4. It would therefore be somewhat contrived to reserve attention only to data editing and leave out a discussion of imputation methods altogether.

The second reason why we have included imputation as a topic in this book is that data often have to satisfy certain consistency rules, the so-called edit rules. In practice it is quite common that imputed values have to satisfy at least some consistency rules. However, as far as we are aware, no other book on imputation examines this particular topic. NSIs and other producers of statistical data generally apply ad hoc operations to adapt the imputation methods described in the literature so imputed data will satisfy the edit rules. We hope that our book will be a valuable guide to applying more advanced imputation methods that can take edit rules into account.

In fact a close connection exists between statistical data editing and imputation. Statistical data editing is used to detect erroneous data and imputation is used to correct these data. During these processes often the same edit rules are applied, i.e., the same consistency checks that are used to detect errors in the original data must be satisfied by the imputed data later on. We feel that this close connection between statistical data editing and imputation merits one book describing both topics.

The intended audience of this book consists of researchers at NSIs and other data producers, and students at universities. Since the overall aim of both statistical data editing and imputation is to obtain data of high statistical quality, the book obviously treats many statistical topics, and is therefore primarily of interest to students and experts in the field of (mathematical) statistics.

Some readers might be surprised to find that the book also treats topics from operations research, such as optimization algorithms, as well as some techniques from general mathematics. However, an interest in these topics arises quite naturally from the problems that we discuss. An important example is the problem of finding the erroneous values in a record of data that does not satisfy certain edit rules. This so-called error localization problem can be cast as a mathematical programming problem, which may then be solved using techniques from operations research. Another example concerns the adjustment of data to attain consistency with edit rules or with data from other sources (benchmarking). These problems too can be cast as mathematical programming problems.

Broadly speaking, applications of operations research techniques frequently occur in the chapters on statistical data editing, whereas the chapters on imputation are of a more statistical nature. Some parts of the material on statistical data editing (Chapters 2 to 6 and 10) may therefore be more appealing to students and experts in operations research or general mathematics rather than to students and experts in the field of statistics.


Acknowledgments

There are many people who have at some point contributed to this book, often without being aware of this at the time. We would like to thank all students from several Dutch universities who did an internship on statistical data editing or imputation at Statistics Netherlands, all former and current colleagues at Statistics Netherlands we had the pleasure of collaborating with on these topics, and all colleagues at universities and statistical institutes around the world we have worked with in international research projects or met at one of the work sessions on statistical data editing. It has been a privilege for us to work with, and get to know, all of you.

There are too many people we would like to thank to name all of them here, at least not without expanding this book to twice its current size. We restrict ourselves to naming a few of them. We would like to thank Jacco Daalmans, Jeffrey Hoogland, Abby Israels, Mark van der Loo, and Caren Tempelman, for collaborating with us over the years on many practical applications of statistical data editing and imputation at Statistics Netherlands. The experiences gathered from all these projects have somehow made their way into this book. We should also mention that Caren Tempelman’s Ph.D. thesis on imputation methods for restricted data, which she wrote while working at Statistics Netherlands, was a wonderful source of material for Chapter 9.

We also would like to thank co-authors of several articles that form the basis of some chapters in this book: Ronan Quere, Marco Remmerswaal, and especially Wieger Coutinho and Natalie Shlomo.

We want to thank a few people who have been more directly involved with this book. Firstly, we want to thank Natalie Shlomo again, this time for carefully identifying errors and occurrences of missing data in an early version of the manuscript. Your remarks left us with many imputations to carry out, but they have only improved the contents of this book. Any errors that remain are of course entirely our fault. We also want to thank the staff at John Wiley & Sons, especially Jackie Palmieri and Lisa Van Horn, for giving us the opportunity to write this book in the first place, and for always reminding us of approaching deadlines. Without you this book would never have been finished.

Finally, we want to thank our families for their support and love.

Ton de Waal
Jeroen Pannekoek
Sander Scholtus

The Hague, The Netherlands
August 2010

Chapter One

Introduction to Statistical Data Editing and Imputation

1.1 Introduction

It is the task of National Statistical Institutes (NSIs) and other official statistical institutes to provide high-quality statistical information on many aspects of society, as up-to-date and as accurately as possible. One of the difficulties in performing this task arises from the fact that the data sources that are used for the production of statistical output, both traditional surveys as well as administrative data, inevitably contain errors that may influence the estimates of publication figures. In order to prevent substantial bias and inconsistencies in publication figures, NSIs therefore carry out an extensive process of checking the collected data and correcting them if necessary. This process of improving the data quality by detecting and correcting errors encompasses a variety of procedures, both manual and automatic, that are referred to as statistical data editing. The effects of statistical data editing on the errors have been examined since the mid-1950s [see Nordbotten (1955)].

Besides errors in the data, another factor that complicates the task of NSIs and other statistical institutes is that data are often missing. This can be seen as a simple form of erroneous data, simple in the sense that missing values are easy to identify; estimating good values for these missing values may, however, be hard.

Handbook of Statistical Data Editing and Imputation, by T. de Waal, J. Pannekoek, and S. Scholtus. Copyright 2011 John Wiley & Sons, Inc.


Errors and missing data can arise during the measurement process. Errors arise during the measurement process when the reported values differ from the true values. A possible reason for a measurement error can be that the true value is unknown to the respondent or difficult to obtain. Another reason could be that questions are misinterpreted or misread by the respondent. An example is the so-called unity measurement error that occurs if the respondent reports in euros when it was required to report in thousands of euros. Another example is a respondent reporting his own income when asked for the household income. For business surveys, errors also occur due to differences in definitions used by the statistical office and the accounting system of the responding unit. There may, for instance, be differences in the reference period used by the business and the requested period (financial year versus calendar year is an example). After the data have been collected, they will pass through several other processes, such as keying, coding, editing, and imputation. Errors that arise during this further processing are referred to as processing errors. Note that although the purpose of editing is to correct errors, it is mentioned here also as a process that may occasionally introduce errors. This undesirable situation arises if an item value is adjusted because it appeared to be in error but it is actually correct. Missing data can arise when a respondent does not know the answer to a question or refuses to give the answer to a certain question.

Traditionally, NSIs have always put a lot of effort and resources into statistical data editing, because they considered it a prerequisite for publishing accurate statistics. In traditional survey processing, statistical data editing was mainly an interactive activity intended to correct all data in every detail. Detected errors or inconsistencies were reported and explained on a computer screen and corrected after consulting the questionnaire or contacting respondents, which are time- and labor-intensive procedures. In this book we examine more efficient statistical data editing methods.

It has long been recognized that it is not necessary to correct all data in every detail. Several studies [see, for example, Granquist (1984, 1997) and Granquist and Kovar (1997)] have shown that in general it is not necessary to remove all errors from a data set in order to obtain reliable publication figures. The main products of statistical offices are tables containing aggregate data, which are often based on samples of the population. This implies that small errors in individual records are acceptable. First, because small errors in individual records tend to cancel out when aggregated. Second, because if the data are obtained from a sample of the population, there will always be a sampling error in the published figures, even when all collected data are completely correct. In this case an error in the results caused by incorrect data is acceptable as long as it is small in comparison to the sampling error. In order to obtain data of sufficiently high quality, it is usually enough to remove only the most influential errors. The above-mentioned studies have been confirmed by many years of practical experience at several statistical offices.
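The argument that small record-level errors matter little next to sampling error can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the population, the sample size of 500, and the size of the reporting errors (up to 2% in either direction) are all hypothetical choices made for illustration and do not come from the studies cited above.

```python
import random
import statistics

random.seed(12345)  # fixed seed so the sketch is reproducible

# Hypothetical population of 10,000 turnover values (illustrative only).
population = [random.lognormvariate(5.0, 1.0) for _ in range(10_000)]

# Draw a simple random sample without replacement.
sample = random.sample(population, 500)

# Add small, symmetric reporting errors (at most +/-2%) to every sampled record.
reported = [y * (1.0 + random.uniform(-0.02, 0.02)) for y in sample]

true_sample_mean = statistics.fmean(sample)
reported_mean = statistics.fmean(reported)

# Error in the estimated mean that is caused by the reporting errors alone.
error_effect = abs(reported_mean - true_sample_mean)

# Estimated standard error of the sample mean
# (the finite population correction is ignored for simplicity).
standard_error = statistics.stdev(sample) / len(sample) ** 0.5

print(f"effect of reporting errors: {error_effect:.3f}")
print(f"sampling standard error:    {standard_error:.3f}")
```

In this setup the individual errors largely cancel in the mean, so their combined effect stays well below the sampling standard error. A single large error, such as one value reported a thousand times too high, would not cancel in this way, which is why influential errors still need attention.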

In the past, and often even in the present, too much effort was spent on correcting errors that did not have a noticeable impact on the ultimately published figures. This has been referred to as “over-editing.” Over-editing not only costs money, but also takes a considerable amount of time, making the period between data collection and publication unnecessarily long. Sometimes over-editing even becomes “creative editing”; the editing process is then continued for such a length of time that unlikely, but correct, data are “corrected.” Such unjustified alterations can be detrimental for data quality. For more about the danger of over-editing and creative editing, see, for example, Granquist (1995, 1997) and Granquist and Kovar (1997).

It has been argued that the role of statistical data editing should be broader than only error localization and correction. Granquist (1995) identifies the following main objectives:

1. Identify error sources in order to provide feedback on the entire survey process.

2. Provide information about the quality of the incoming and outgoing data.

3. Identify and treat influential errors and outliers in individual data.

4. When needed, provide complete and consistent individual data.

During the last few years, the first two goals (providing feedback on the other survey phases, such as the data collection phase, and providing information on the quality of the final results) have gained in importance. The feedback on other survey phases can be used to improve those phases and reduce the number of errors arising in these phases. In the next few years the first two goals of data editing are likely to become even more important. The main focus in this book is, however, on the latter, more traditional, two goals of statistical data editing. Statistical data editing is examined in Chapters 2 to 6.

Missing data is a well-known problem that has to be faced by basically all institutes that collect data on persons or enterprises. In the statistical literature, ample attention is hence paid to missing data. The most common solution to handle missing data in data sets is imputation, where missing values are estimated and filled in. An important problem of imputation is to preserve the statistical distribution of the data set. This is a complicated problem, especially for high-dimensional data. Chapters 7 and 8 examine this aspect of imputation.
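Why preserving the distribution is a real concern can be seen with the simplest possible imputation method. The sketch below uses overall mean imputation on a small hypothetical list of incomes; the data and variable names are assumptions made for illustration, and the method is chosen only because it makes the effect easy to see, not because it is recommended.

```python
import statistics

# Hypothetical observed incomes; None marks a missing value (illustrative data).
incomes = [12.0, 15.0, None, 30.0, None, 22.0, 18.0, None, 45.0, 27.0]

observed = [value for value in incomes if value is not None]
mean_observed = statistics.fmean(observed)

# Mean imputation: fill every missing value with the mean of the observed values.
imputed = [mean_observed if value is None else value for value in incomes]

print("mean before:", round(mean_observed, 2),
      "after:", round(statistics.fmean(imputed), 2))
print("st.dev. before:", round(statistics.stdev(observed), 2),
      "after:", round(statistics.stdev(imputed), 2))
```

The mean is unchanged, but the standard deviation of the completed data is smaller than that of the observed values, because every imputed value sits exactly at the center of the distribution. The shape of the distribution is therefore distorted even though the mean is preserved.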

At NSIs the imputation problem is further complicated owing to the existence of constraints in the form of edit restrictions, or edits for short, that have to be satisfied by the data. Examples of such edits are that the profit and the costs of an enterprise have to sum up to its turnover and that the turnover of an enterprise should be at least zero. Records that do not satisfy these edits are inconsistent and are hence considered incorrect. Details about imputation and adjustment techniques that ensure that edits are satisfied can be found in Chapters 9 and 10.
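The two example edits can be written down as simple checks on a record. The following is a minimal sketch: the field names, the edit names, and the numerical tolerance are assumptions introduced here for illustration, and real edit systems are considerably more elaborate.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]

# Each edit is a named predicate that a consistent record must satisfy.
# The two rules mirror the examples in the text; the names and the
# tolerance of 1e-6 are hypothetical.
EDITS: Dict[str, Callable[[Record], bool]] = {
    "balance": lambda r: abs(r["profit"] + r["costs"] - r["turnover"]) < 1e-6,
    "nonnegative_turnover": lambda r: r["turnover"] >= 0,
}


def failed_edits(record: Record) -> List[str]:
    """Return the names of all edits that the record violates."""
    return [name for name, rule in EDITS.items() if not rule(record)]


consistent = {"profit": 200.0, "costs": 800.0, "turnover": 1000.0}
inconsistent = {"profit": 200.0, "costs": 800.0, "turnover": 100.0}

print(failed_edits(consistent))
print(failed_edits(inconsistent))
```

Note that a failed edit only shows that the record is inconsistent. It does not say which of the fields involved is wrong, which is the error localization problem.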

The rest of this chapter is organized as follows. In Section 1.2 we examine the statistical process at NSIs and other statistical organizations, and especially the role that statistical data editing and imputation play in this process. In Section 1.3 we examine (kinds of) data, errors, missing data, and edits. Section 1.4 briefly describes the editing methods that will be explored in more detail later in this book. Finally, Section 1.5 concludes this chapter by describing a basic editing strategy.

1.2 Statistical Data Editing and Imputation in the Statistical Process

1.2.1 OVERVIEW OF THE STATISTICAL PROCESS

The processes of detecting and correcting errors and handling missing data form a part of the process of producing statistical information as practiced at NSIs. This process of producing statistical information can be broken down into a number of steps. Willeboordse (1998) distinguishes the following phases in the statistical process for business surveys:

• Setting survey objectives.

• Questionnaire design and sampling design.

• Data collection and data entry.

• Data processing and data analysis.

• Publication and data dissemination.

A similar division can be made for social surveys. Each phase itself can be subdivided into several steps.

Setting Survey Objectives. In the first phase, user groups for the statistical information under consideration are identified, user needs are assessed, available data sources are explored, potential respondents are consulted about their willingness to cooperate, the survey is embedded in the general framework for business surveys, the target population and the target variables of the intended output are specified, and the output table is designed.

Questionnaire Design and Sampling Design. In the second phase the potential usefulness of available administrative registers is determined, the frame population in the so-called Statistical Business Register is compared with the target population, the sampling frame is defined, the sampling design and estimation method are selected, and the questionnaire is designed. There is a decision process on how to collect the data: paper questionnaires, personal interviews, telephone interviews, or electronic data interchange.

Data Collection and Data Entry. In the third phase the sample is drawn, data are collected from the sampled units and entered into the computer system at the statistical office. During this phase the statistical office tries to minimize the response burden for businesses and to minimize nonresponse.

Data Processing and Data Analysis. In the fourth phase the collected data are edited, missing and erroneous data are imputed, raising weights are determined, population figures are estimated, the data are incorporated in the integration framework, and the data are analysed (for example, to adjust for seasonal effects). The process of detecting and correcting errors and handling missing data forms an important part of this phase. The bulk of the work at NSIs and other statistical agencies that collect and process data is spent on this phase, especially on statistical data editing.

Publication and Data Dissemination. The final phase includes setting out a publication and dissemination strategy, protecting the final data (both tabular data and microdata, i.e. the data of individual respondents) against disclosure of sensitive information, and lastly publication of the protected data.

1.2.2 THE EDIT AND IMPUTATION PROCESS

During statistical data editing and the imputation process, erroneous records (and erroneous values within these records) are localized and new values are estimated for the erroneous values and values missing in the data set. To edit an erroneous record, two steps have to be carried out. First, the incorrect values in such a record have to be localized. This is often called error localization. Second, after the faulty fields in an erroneous record have been identified, these faulty fields have to be imputed; that is, the values of these fields have to be replaced by better, preferably the correct, values.

For erroneous records, error localization and imputation are closely related. Often it is hard to distinguish where the error localization phase ends and where the imputation phase starts. For instance, when humans edit data, they frequently look at possible ways of imputing a record before completing the error localization phase. Another example is that the localization of erroneous values might be based on estimating values first and then determining the deviation between the observed and estimated values. The observed values that differ most from their estimated counterparts are then considered erroneous, or in any case suspicious. In this approach the detection of erroneous values and the estimation of better values are highly intertwined. A third example is that during manual review (see Chapter 6) the detection of erroneous values and the “estimation” of better values are highly intertwined. This “estimation” often simply consists of filling in correct answers obtained by recontacting the respondents. Despite the fact that error localization and imputation can be closely related, we will treat them as two separate processes throughout most of this book. This is a simplification of the edit and imputation problem, but one that has been shown to work well for most cases arising in practice.
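The second example above, localizing errors by comparing observed values with estimates, can be sketched in a few lines. Everything specific in this sketch is an assumption made for illustration: the field names, the idea that the estimates come from some earlier source such as last year's values, and the use of a relative deviation as the measure of suspicion.

```python
def rank_suspicious(observed: dict, estimated: dict) -> list:
    """Order fields by relative deviation between observed and estimated values."""
    deviations = {}
    for field, value in observed.items():
        expected = estimated[field]
        # Relative deviation; max(..., 1.0) guards against division by zero.
        deviations[field] = abs(value - expected) / max(abs(expected), 1.0)
    # Largest deviation first: these fields are the most suspicious.
    return sorted(deviations.items(), key=lambda item: item[1], reverse=True)


# Hypothetical record and hypothetical estimates for the same fields.
observed = {"turnover": 1000.0, "costs": 8200.0, "employees": 12.0}
estimated = {"turnover": 1050.0, "costs": 820.0, "employees": 11.0}

ranking = rank_suspicious(observed, estimated)
most_suspicious, deviation = ranking[0]
print(most_suspicious, round(deviation, 2))
```

The same estimates that flag the field `costs` as suspicious are also natural candidates for replacing it, which shows how detection and imputation become intertwined in this approach.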

In principle, it is not necessary to impute missing or erroneous values in order to obtain valid estimates for the target variables. Instead, one can estimate the target variables directly during an estimation phase, without imputing the missing and erroneous values first. However, this approach would in most practical cases become extremely complex and very demanding from a computational point of view. By first imputing the missing and erroneous values, a complete data set is obtained. From this complete data set, estimates can be obtained by standard estimation methods. In other words, imputation is often applied to simplify the estimation process.

1.3 Data, Errors, Missing Data, and Edits

1.3.1 KINDS OF DATA

Edit and imputation techniques can be divided into two main classes, depending on the kind of data to be edited or imputed: techniques for numerical data and techniques for categorical data. Generally, there are major differences between techniques for these kinds of data. At NSIs and other statistical institutes, numerical data occur mainly in surveys on businesses whereas categorical data occur mainly in social surveys, for instance, surveys on persons or households.

At Statistics Netherlands and other NSIs, editing of business surveys is a much bigger problem than editing of most social surveys on households and persons. The main reason is that for business surveys, generally many more edit rules (see below) are defined than for social surveys, and business surveys generally contain many more errors than social surveys. Typically, business surveys are not very large. Large and complicated business surveys may have somewhat over 100 variables and 100 edits. In a small country such as the Netherlands, the number of records in a business survey is usually a few thousand.

Population censuses form an important exception to the general rule that editing is easier for social surveys than for business surveys. Census data do not contain a high percentage of errors, but the number of edits (a few hundred), the number of variables (a few hundred), and especially the number of records can be high (several millions). Due to the sheer volume of the data, editing of data from a population census forms a major problem. The only efficient way to edit such volumes of data is to edit these data in an automatic manner whenever possible.

A recent development at NSIs is the increasing use of administrative (or register-based) data, as opposed to the more traditional data collection by means of sample surveys. The editing and imputation of administrative data for statistical purposes has certain specific features not shared by sample surveys. For instance, if data from several registers are combined, apart from the errors that are present in the individual registers, additional inconsistencies may occur between data from different registers due to matching errors or diverging metadata definitions. Because this is a relatively new topic, suitable methodology for the statistical data editing and imputation of administrative data has not yet been fully developed. We refer to Wallgren and Wallgren (2007) for an overview of current methods for register-based statistics.

1.3.2 KINDS OF ERRORS

One of the important goals of statistical data editing is the detection and correction of errors. Errors can be subdivided in several ways. A first important distinction we shall make is between systematic and random errors. A second important distinction we shall make is between influential errors and noninfluential errors. The final distinction is between outliers and nonoutliers.


Systematic Errors. A systematic error is an error that occurs frequently between responding units. This type of error can occur when a respondent misunderstands or misreads a survey question. A well-known type of systematic error is the so-called unity measure error, which is the error of, for example, reporting financial amounts in euros instead of the requested thousands of euros. Systematic errors can lead to substantial bias in aggregates. Once detected, systematic errors can easily be corrected because the underlying error mechanism is known.

Systematic errors, such as unity measure errors, can often be detected by comparing a respondent's present values with those from previous years, by comparing the responses to questionnaire variables with values of register variables, or by using subject-matter knowledge. Other systematic errors, such as transpositions of returns and costs and redundant minus signs, can be detected and corrected by systematically exploring all possible transpositions and inclusions/omissions of minus signs. Rounding errors (a class of systematic errors where balance edits, see Section 1.3.4, are violated because the values of the involved variables have been rounded) can be detected by testing whether failed balance edits can be satisfied by slightly changing the values of the involved variables. We treat systematic errors in more detail in Chapter 2.
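The two detection ideas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the factor of 1000, the tolerance band, and the maximum rounding deviation of 2 units are hypothetical choices, not values prescribed by the text.

```python
def flag_unity_measure_error(current, previous, factor=1000, tolerance=0.5):
    """Return True if `current` is roughly `factor` times `previous`.

    The accepted band is [factor * tolerance, factor / tolerance],
    i.e. 500 to 2000 for the default (assumed) settings.
    """
    if previous <= 0 or current <= 0:
        return False  # ratio undefined or not meaningful; leave to other checks
    ratio = current / previous
    return factor * tolerance <= ratio <= factor / tolerance


def correct_unity_measure_error(current, previous, factor=1000):
    """Divide by `factor` when the unity measure pattern is detected."""
    if flag_unity_measure_error(current, previous, factor):
        return current / factor
    return current


def is_rounding_error(total, parts, max_dev=2):
    """Return True if a failed balance edit total = sum(parts) can be
    satisfied by changing the values only slightly (at most `max_dev`)."""
    diff = total - sum(parts)
    return 0 < abs(diff) <= max_dev


# Last year's turnover was 1200 (thousands of euros); this year the
# respondent reports 1,250,000, presumably in euros.
corrected = correct_unity_measure_error(1_250_000, 1200)
unchanged = correct_unity_measure_error(1300, 1200)
```

Comparing with the previous year's value is only one of the options the text mentions; a register value could be passed as `previous` in the same way.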

Random Errors. Random errors are not caused by a systematic deficiency, but by accident. An example is an observed value where a respondent by mistake typed in a digit too many. In general statistics, the expectation of a random error is typically zero. In our case, however, the expectation of a random error may also differ from zero. This is, for instance, the case in the above-mentioned example.

Random errors can result in outlying values. In such a case they can be detected by outlier detection techniques or by selective editing techniques (see Chapter 6). Random errors can also be influential (see below), in which case they may again be detected by selective editing techniques. In many cases, random errors do not lead to outlying values or influential errors. In such cases, random errors can often be corrected automatically, assuming that they do lead to violated edit restrictions. Automatic editing of random errors is treated in detail in Chapters 3 to 5.

Influential Errors. Errors that have a substantial influence on publication figures are called influential errors. They may be detected by selective editing techniques (see Chapter 6).

The fact that a value has a substantial influence on publication figures does not necessarily imply that this value is erroneous. It may also be a correct value. In fact, in business surveys, influential observations are quite common, because many variables of businesses, such as turnover and costs, are often highly skewed.

Outliers. A value, or a record, is called an outlier if it is not fitted well by a model that is posited for the observed data. If a single value is an outlier, this is called a univariate outlier. If an entire record, or at least a subset consisting of several values in a record, is an outlier when the values are considered simultaneously (that is, if they do not fit the posited model well when considered simultaneously), this is called a multivariate outlier. Again we have that the mere fact that a value (or a record) is an outlier does not necessarily imply that this value (set of values) contains an error. It may also be a correct value (set of values).

Outliers are related to influential values. An influential value is often also an outlier, and vice versa. However, an outlier may also be a noninfluential value and an influential value may also be a nonoutlying value. In the statistical editing process, outliers are often detected during so-called macro-editing (see Chapter 6). In this book we do not examine general outlier detection techniques, except for a very brief discussion in Chapter 3. We refer to the literature for descriptions of these techniques [see, e.g., Rousseeuw and Leroy (1987), Barnett and Lewis (1994), Rocke and Woodruff (1996), Chambers, Hentges and Zhao (2004), and Todorov, Templ, and Filzmoser (2009)].

1.3.3 KINDS OF MISSING DATA

The occurrence of missing data implies a reduction of the effective sample size and consequently an increase in the standard error of parameter estimates. This loss of precision is often not the main problem with nonresponse. Survey organizations can anticipate the occurrence of nonresponse by oversampling, and moreover the loss of precision can be quantified when standard errors are estimated. A more problematic effect, which cannot be measured easily, is that nonresponse may result in biased estimates.

Missing data can be subdivided in several ways according to the underlying nonresponse mechanism. Whether the problem of biased estimates due to nonresponse actually occurs will depend on the nonresponse mechanism. Informally speaking, if the nonresponse mechanism does not depend on unobserved data (conditionally on the observed data), imputation may lead to unbiased estimates without making further assumptions. If the nonresponse mechanism does depend on unobserved data, then further, unverifiable assumptions are necessary to reduce bias by means of imputation. We shall now make these statements more precise.

A well-known and very often used classification of nonresponse mechanisms is: "missing completely at random" (MCAR), "missing at random" (MAR), and "not missing at random" (NMAR); see Rubin (1987), Schafer (1997), and Little and Rubin (2002).

MCAR. When missing data are MCAR, the probability that a value is missing does not depend on the value(s) of the target variable(s) to be imputed or on the values of auxiliary variables. This situation can occur when a respondent forgets to answer a question or when a random part of the data is lost while processing it. MCAR is the simplest nonresponse mechanism, because the item nonrespondents (i.e., the units that did not respond to the target variable) are similar to the item respondents (i.e., the units that did respond to the target variable). Under MCAR, the observed data may simply be regarded as a random subset of the complete data. Unfortunately, MCAR rarely occurs in practice.

More formally, a nonresponse mechanism is called MCAR if

$$P(r_j \mid y_j, x, \xi) = P(r_j \mid \xi). \tag{1.1}$$

In this notation, $r_j$ is the response indicator of target variable $y_j$, where $r_{ij} = 1$ means that record $i$ contains a response for variable $y_j$, and $r_{ij} = 0$ that the value of variable $y_j$ is missing for record $i$; $x$ is a vector of always observed auxiliary variables, and $\xi$ is a parameter of the nonresponse mechanism.

MAR. When missing data are MAR, the probability that a value is missing does depend on the values of auxiliary variables, but not on the value(s) of the target variable(s) to be imputed. Within appropriately defined groups of population units, the nonresponse mechanism is again MCAR. This situation can occur, for instance, when the nonresponse mechanism of elderly people differs from that of younger people, but within the group of elderly people and the group of younger people the probability that a value is missing does not depend on the value(s) of the target variable(s) or on the values of other auxiliary variables. Similarly, for business surveys, larger businesses may exhibit a different nonresponse mechanism than small businesses, but within each group of larger businesses, respectively small businesses, the nonresponse mechanism may be MCAR.

In more formal terms, a nonresponse mechanism is called MAR if

$$P(r_j \mid y_j, x, \xi) = P(r_j \mid x, \xi), \tag{1.2}$$

using the same notation as in (1.1).

MAR is a more complicated situation than MCAR. In the case of MAR, one needs to find appropriate groups of population units to reduce MAR to MCAR for these groups. Once these groups of population units have been found, it is simple to correct for missing data because within these groups all units may be assumed to have the same probability to respond.

In practice, one usually assumes the nonresponse mechanism to be MAR and tries to construct appropriate groups of population units. These groups are then used to correct for missing data.
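As a rough illustration of how such groups can be used, the sketch below fills each missing value with the mean of the respondents in the same group. Group-mean imputation is only one possible correction and is chosen here for brevity; the function name, the field names `size_class` and `turnover`, and the data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def impute_group_means(records, group_key, target):
    """Fill missing `target` values (None) with the mean of the
    respondents in the same group, assuming MCAR within each group."""
    observed = defaultdict(list)
    for rec in records:
        if rec[target] is not None:
            observed[rec[group_key]].append(rec[target])
    group_mean = {g: mean(vals) for g, vals in observed.items()}

    imputed = []
    for rec in records:
        new = dict(rec)  # leave the input records untouched
        if new[target] is None and new[group_key] in group_mean:
            new[target] = group_mean[new[group_key]]
        imputed.append(new)
    return imputed


# Made-up business survey: nonresponse differs by size class.
survey = [
    {"size_class": "small", "turnover": 10},
    {"size_class": "small", "turnover": 14},
    {"size_class": "small", "turnover": None},
    {"size_class": "large", "turnover": 200},
    {"size_class": "large", "turnover": None},
    {"size_class": "large", "turnover": 240},
]
completed = impute_group_means(survey, "size_class", "turnover")
```

A group without any respondent is left unimputed in this sketch, since no within-group information is available for it.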

NMAR. When missing data are NMAR, the probability that a value is missing does depend on the value(s) of the target variable(s) to be imputed, and possibly also on the values of auxiliary variables. This situation can occur, for instance, when reported values of income are more likely to be missing for persons with a high income, when the value of ethnicity is more likely to be missing for certain ethnic groups, or, for business surveys, when the probability that the value of turnover is missing depends on the value of turnover itself.


In more formal terms, a nonresponse mechanism is called NMAR if

$$P(r_j \mid y_j, x, \xi)$$

cannot be simplified, that is, if both (1.1) and (1.2) do not hold.

NMAR is the most complicated case. In order to correct for NMAR, one cannot use only the observed data. Instead, one also has to make model assumptions in order to model the dependence of the nonresponse mechanism on the value(s) of the target variable(s).

A related classification of nonresponse mechanisms is: "ignorable" and "nonignorable."

Ignorable. A nonresponse mechanism is called ignorable if it is MAR (or MCAR) and the parameters to be estimated are distinct from the parameter $\xi$. Here, distinctness means that knowledge of $\xi$ (which could be inferred from the response indicator $r_j$ that is available for all units, responding or not) is not helpful in estimating the parameters of interest. However, as noted by Little and Rubin (2002, Chapter 6), the MAR condition is more important than distinctness, because MAR alone is sufficient to make valid inference possible. If the parameters are not distinct, this can merely result in some loss of efficiency.

Nonignorable. A nonresponse mechanism is called nonignorable if the conditions for ignorability do not hold. That is, either the nonresponse mechanism is NMAR or the parameter $\xi$ is not distinct from the parameters of interest, or both.

For more details on MCAR, MAR, NMAR, and (non-)ignorability we refer to Rubin (1987), Schafer (1997), and Little and Rubin (2002).

1.3.4 EDIT RULES

Errors are most often detected by edit rules. Edit rules, or edits for short, define the admissible (or plausible) values and combinations of values of the variables in each record. Errors are detected by verifying whether the values are admissible according to the edits, that is, by checking whether the edits are violated or satisfied. An edit $e$ can be formulated as

$$e : x \in S_x,$$

with $S_x$ the set of admissible values of $x$. As we shall see below, $x$ can refer to a single variable as well as multiple variables. If $e$ is false, the edit is violated and otherwise the edit is satisfied.

Edits can be divided into hard (or fatal) edits and soft (or query) edits. Hard edits are edits that must be satisfied in order for a record to qualify as a valid record. As an example, a hard edit for a business survey specifies that the variable Total costs needs to be equal to the sum of the variables Employee costs, Capital costs, Transport costs, and Other costs. Records that violate one or more hard edits are considered to be inconsistent and it is deduced that some variable(s) in such a record must be in error. Soft edits are used to identify unlikely or deviating values that are suspected to be in error, although this is not a logical necessity. Examples are (a) an edit specifying that the yearly income of employees must be less than 10 million euros or (b) an edit specifying that the turnover per employee of a firm may not be larger than 10 times the value of the previous year. The violation of soft edits can be a trigger for further investigation of these edit failures, to either confirm or reject the suspected values.

To illustrate the kind of edits that are often applied in practice, examples of a number of typical classes of edits are given below.

Univariate Edits or Range Restrictions. An edit describing the admissible values of a single variable is called a univariate edit or a range restriction. For categorical variables, a range restriction simply verifies whether the observed category codes for the variable belong to the specified set of codes. The set of allowable values $S_x$ is

$$S_x = \{x_1, x_2, \ldots, x_C\}$$

and consists of an enumeration of the $C$ allowed codes. For instance, for the variable Sex we could have $S_x = \{0, 1\}$ and for a date variable in the conventional yyyy-mm-dd notation the set $S_x$ would consist of all allowed integer combinations describing the year, month, and day. Range restrictions for continuous variables are usually specified using inequalities. The simplest, but often encountered, range restrictions of this type are nonnegativity constraints, that is,

$$S_x = \{x \mid x \geq 0\}.$$

Examples are Age, Rent, and many of the financial variables in business surveys (costs of various types, turnover and revenues of various activities and so on). Range restrictions describing an interval as

$$S_x = \{x \mid l \leq x \leq u\}$$

are also common. Examples are setting lower ($l$) and upper ($u$) bounds on the allowable values of age, income, or working hours per week. Range restrictions can be hard edits (for instance, if $S_x$ is an enumeration of allowable codes), but they can also be soft edits if the bounds set on the allowable range are not a logical necessity (for instance, if the maximum number of weekly working hours is set to 100).
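The range restrictions above translate directly into membership tests. The following sketch is illustrative only: the helper names and the record layout are assumptions, while the three example edits (Sex in {0, 1}, nonnegative Age, at most 100 weekly working hours) are taken from the text.

```python
def in_enumeration(allowed):
    """Edit for a categorical variable: x must be one of the allowed codes."""
    allowed = frozenset(allowed)
    return lambda x: x in allowed


def in_interval(lower=None, upper=None):
    """Edit for a numerical variable: lower <= x <= upper.
    A bound of None means that side is unrestricted."""
    def check(x):
        if lower is not None and x < lower:
            return False
        if upper is not None and x > upper:
            return False
        return True
    return check


# One univariate edit per variable.
edits = {
    "sex": in_enumeration({0, 1}),           # hard edit: enumeration of codes
    "age": in_interval(lower=0),             # nonnegativity constraint
    "hours_per_week": in_interval(0, 100),   # soft edit: bound is not a logical necessity
}


def violated_edits(record, edits):
    """Return the names of the variables whose range restriction is violated."""
    return [name for name, check in edits.items() if not check(record[name])]


failures = violated_edits({"sex": 2, "age": 34, "hours_per_week": 120}, edits)
```

Whether a violation is treated as fatal or only as a trigger for review is a property of the edit (hard or soft), not of the check itself, so it is kept out of the predicate here.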

Bivariate Edits. In this case the set of admissible values of a variable $x$ depends on the value of another variable, say $y$, observed on the same unit. The set of admissible values is then the set of admissible pairs of values $(x, y)$. For instance, if $x$ is Marital status with values 0 (never married), 1 (married) and 2 (previously married) and $y$ is Age, we may have that

$$S_{xy} = \{(x, y) \mid x = 0 \wedge y < 15\} \cup \{(x, y) \mid y \geq 15\},$$

reflecting the rule that persons younger than 15 are not allowed to be married, while for persons of 15 years or more all marital states are allowed. Another example of a bivariate edit is

$$S_{xy} = \{(x, y) \mid x - y > 14\},$$

with $x$ the age, in years, of a mother and $y$ the age of her child. This example reflects the perhaps not "hard" edit that a mother must be at least 14 years older than her child. Finally, an important and often encountered class of bivariate edits is the so-called ratio edit which sets bounds on the allowable range of a ratio between two variables and is defined by

$$S_{xy} = \left\{(x, y) \;\middle|\; l \leq \frac{x}{y} \leq u\right\}.$$

A ratio edit could, for example, specify bounds on the ratio of the turnover and the number of employees of firms in a certain branch of industry. Ratio edits are often used to compare data on the same units from different sources, such as values reported in the current survey ($x$) with values for the same variables reported in last year's survey ($y$) or values of variables from a tax register with similarly defined variables from a survey.
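The three bivariate edits discussed above can be written as simple predicates on a pair of values. This is a minimal sketch: the function names are hypothetical, the bounds used in the ratio example are made up, and treating a zero denominator as a failed edit is an assumption rather than something the text specifies.

```python
def marital_status_edit(status, age):
    """Admissible pairs: never married (0) below age 15, anything from 15 on."""
    return (status == 0 and age < 15) or age >= 15


def mother_child_edit(mother_age, child_age):
    """Age difference edit, using the strict inequality x - y > 14 as printed."""
    return mother_age - child_age > 14


def ratio_edit(x, y, lower, upper):
    """Ratio edit: lower <= x / y <= upper."""
    if y == 0:
        return False  # assumption: an undefined ratio counts as a failed edit
    return lower <= x / y <= upper


# Turnover per employee with made-up bounds of 20 and 100.
plausible = ratio_edit(500, 10, lower=20, upper=100)     # ratio 50
suspicious = ratio_edit(5000, 10, lower=20, upper=100)   # ratio 500
```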

Balance Edits. Balance edits are multivariate edits that state that the admissible values of a number of variables are related by a linear equality. They occur mainly in business statistics where they are linear equations that should be satisfied according to accounting rules. Two examples are

$$\text{Profit} = \text{Turnover} - \text{Total costs} \tag{1.3}$$

and

$$\text{Total costs} = \text{Employee costs} + \text{Other costs}. \tag{1.4}$$

These rules are related because they have the variable Total costs in common. If the first rule is satisfied but the second is not, it may seem more likely that Employee costs or Other costs are in error than Total costs, because in the last case the first rule should probably also be violated. Balance edits are of great importance for editing economic surveys where there are often a large number of such edits. For instance, in the yearly structural business statistics there are typically about 100 variables with 30 or more balance edits. These interrelated systems of linear relations that the values must satisfy provide much information about possible errors and missing values.

Since balance edits describe relations between many variables, they are multivariate edits. Moreover, since they are often connected by common variables, they should be treated as a system of linear equations. It is convenient to express such a system in matrix notation. Denoting the five variables in (1.3) and (1.4) by $x_1$ (Profit), $x_2$ (Turnover), $x_3$ (Total costs), $x_4$ (Employee costs), and $x_5$ (Other costs), the system can be written as

$$\begin{pmatrix} 1 & -1 & 1 & 0 & 0 \\ 0 & 0 & -1 & 1 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix},$$

or $Ax = 0$. The admissible values of a vector $x$ subject to a system of balance edits, defined by a restriction matrix $A$, can then be written as

$$S_x = \{x \mid Ax = 0\}.$$
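Checking a record against a system of balance edits amounts to computing the residual vector of the restriction matrix applied to the record. The sketch below uses the matrix for edits (1.3) and (1.4); the function names and the example figures are hypothetical, and the tolerance argument is an added convenience for data stored as floating-point numbers.

```python
# Rows correspond to edits (1.3) and (1.4); columns to
# x1 Profit, x2 Turnover, x3 Total costs, x4 Employee costs, x5 Other costs.
A = [
    [1, -1, 1, 0, 0],   # Profit - Turnover + Total costs = 0
    [0, 0, -1, 1, 1],   # -Total costs + Employee costs + Other costs = 0
]


def balance_residuals(A, x):
    """Return the residual vector A x; all zeros means every edit is satisfied."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def failed_balance_edits(A, x, tol=0):
    """Return the row indices of the balance edits violated by record x."""
    return [i for i, r in enumerate(balance_residuals(A, x)) if abs(r) > tol]


consistent = (20, 100, 80, 50, 30)     # both edits hold
inconsistent = (20, 100, 80, 50, 25)   # component costs no longer add up

ok = failed_balance_edits(A, consistent)
bad = failed_balance_edits(A, inconsistent)
```

In the second record only the edit corresponding to (1.4) fails, which matches the reasoning in the text: Employee costs or Other costs are the more likely culprits, because an error in Total costs would probably violate (1.3) as well.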

1.4 Basic Methods for Statistical Data Editing and Imputation

In this section we have a first brief look at various methods that can be used to edit and impute data. Before we sketch these methods, we first look back to see why these methods were developed.

Computers have been used in the editing process for a long time [see, e.g., Nordbotten (1963)]. In the early years their role was, however, restricted to checking which edits were violated. Subject-matter specialists entered data into a mainframe computer. Subsequently, the computer checked whether these data satisfied all specified edits. For each record all violated edits were listed. Subject-matter specialists then used these lists to correct the records. That is, they retrieved all paper questionnaires that did not pass all edits and corrected these questionnaires. After they had corrected the data, these data were again entered into the mainframe computer, and the computer again checked whether the data satisfied all edits. This iterative process was continued until (nearly) all records passed all edits.

A major problem of this approach was that during the manual correction process the records were not checked for consistency. As a result, a record that was "corrected" could still fail one or more specified edits. Such a record hence required more correction. It was not exceptional that some records had to be corrected several times. It is therefore not surprising that editing in this way was very costly, both in terms of money as well as in terms of time. In the literature it was estimated that 25% to 40% of the total budget was spent on editing [see, e.g., Federal Committee on Statistical Methodology (1990) and Granquist and Kovar (1997)].

1.4.1 EDITING DURING THE DATA COLLECTION PHASE

The most efficient editing technique of all is no editing at all, but instead ensuring that correct data are obtained during the data collection phase. In this section we briefly discuss ways to obtain data with no or only few errors at data collection.


When one aims to collect correct data at data collection, one generally uses a computer to record the data. This is called computer-assisted data collection. With computer-assisted data collection the computer can immediately check the recorded data for violations of edits. Below we discuss four modes for computer-assisted data collection: CAPI (Computer-Assisted Personal Interviewing), CATI (Computer-Assisted Telephone Interviewing), CASI (Computer-Assisted Self-Interviewing), and CAWI (Computer-Assisted Web Interviewing). For more information on computer-assisted data collection in general, we refer to Couper et al. (1998).

When CAPI is used to collect the data, an interviewer visits the respondent and enters the answers directly into a laptop. When CATI is used to collect the data, the interview is carried out during a telephone call. When CASI or CAWI is used to collect the data, the respondent fills in an electronic questionnaire himself or herself. The difference between these two modes is that for CAWI an electronic questionnaire on the Internet has to be filled in, whereas for CASI an off-line electronic questionnaire is used. When an invalid response is given to a question or an inconsistency between two or more answers is noted during any of these data collection modes, this can be immediately reported by the computer that is used for data entry. The error can then be resolved by asking the respondent the relevant question(s) again. For CASI and CAWI, generally not all edits that could be specified are actually implemented, since the respondent may get annoyed and might refuse to complete the questionnaire when the edits keep on reporting that his/her answers are inconsistent.

Computer-assisted data collection removes the need for data entry by typists, since the data arrive at the statistical office already in digital form. This eliminates one source of potential errors. In many cases, data collected by means of CAPI, CATI, CASI or CAWI also contain fewer errors than data collected by means of paper questionnaires because random errors that affect paper questionnaires are detected and avoided at collection. For face-to-face interviewing CAPI has in fact become the standard. CAPI, CATI, CASI, and CAWI may hence seem to be ideal ways to collect data, but, unfortunately, they too have their disadvantages.

A first disadvantage of CATI and CAPI is that CATI and, especially, CAPI are very expensive. A second disadvantage of CATI and CAPI is that a prerequisite for these two data collection modes is that the respondent is able to answer the questions during the interview. For a survey on persons and households, this is often the case. The respondent often knows (good proxies of) the answers to the questions, or is able to retrieve the answers quickly. For a survey on enterprises the situation is quite different. Often it is impossible to retrieve the correct answers quickly, and often the answers are not even known by one person or one department of an enterprise. Finally, even in the exceptional case that one person knew all answers to the questions, the NSI would generally not know the identity of this person. For the above-mentioned reasons, many NSIs frequently use CAPI and CATI to collect data on persons and households but only rarely for data on enterprises.

Pilot studies and actual applications have revealed that CASI and CAWI are indeed viable data collection modes, but also that several problems arise when these modes are used. Besides IT problems, such as that the software and the Internet connection should be fast and reliable and the security of the transmitted data should be guaranteed, there are a number of practical and statistical problems. We have already mentioned the practical problem that if the edits keep on reporting that the answers are inconsistent, the respondent may refuse to fill in the rest of the questionnaire. An example of a statistical problem is that the group of people responding to a web survey may be selective, since Internet usage is not uniformly distributed over the population [see, e.g., Bethlehem (2007)].

Another important problem for CAWI and CASI is that data collected by either of these data collection modes may appear to be of higher statistical quality than data collected by means of paper questionnaires, but in fact are not. When data are collected by means of CASI and CAWI, one can enforce that the respondents supply data that satisfy built-in edits, or one can avoid balance edits by automatically calculating total amounts from the reported components. Because fewer edits are failed by the collected data, these data may appear to be of higher statistical quality. This need not be the case, however. Each edit that is built into the electronic questionnaire will be automatically satisfied by the collected data, and hence cannot be used to check for errors later on. Therefore, the collected data may appear to contain only few errors, but this might be due to a lack of relevant edits. There are indications that respondents can be less accurate when filling in an electronic questionnaire, especially if totals are computed automatically [see Børke (2008) and Hoogland and Smit (2008)].

NSIs seem to be moving toward the use of mixed-mode data collection, where data are collected by a mix of several data collection modes. This obviously has consequences for statistical data editing. Some of the potential consequences have been examined by Børke (2008), Hoogland and Smit (2008), and Van der Loo (2008).

1.4.2 MODERN EDITING METHODS

Below we briefly mention editing methods that are used in modern practice. The editing techniques are examined in detail in other chapters of this book.

Interactive Editing. Subject-matter specialists have extensive knowledge of their area of expertise. This knowledge should be used as well as possible. This aim can be achieved by providing subject-matter specialists with efficient and effective interactive data editing tools. Most interactive data editing tools applied at NSIs allow one to check the specified edits during or after data entry, and, if necessary, to correct erroneous data immediately. This is referred to as interactive or computer-assisted editing. To correct erroneous data, several approaches can be followed: The respondent can be recontacted, the respondent's data can be compared to his or her data from previous years, the respondent's data can be compared to data from similar respondents, and subject-matter knowledge of the human editor can be used. Interactive editing is nowadays a standard way to edit data. It can be used to edit both categorical and numerical data. The number of variables, edits, and records may, in principle, be high. Generally, the quality of data editing in a computer-assisted manner is considered high. Interactive editing is examined in more detail in Section 6.5.

Selective Editing. Selective editing is an umbrella term for several methods to identify the influential errors (i.e., the errors that have a substantial impact on the publication figures) and outliers (i.e., values that do not fit a model of the data well). Selective editing techniques aim to apply interactive editing to a well-chosen subset of the records, such that the limited time and resources available for interactive editing are allocated to those records where it has the most effect on the quality of the final estimates of publication figures. Selective editing techniques try to achieve this aim by splitting the data into two streams: the critical stream and the noncritical stream. The critical stream consists of records that are the most likely ones to contain influential errors; the noncritical stream consists of records that are unlikely to contain influential errors. The records in the critical stream, the critical records, are edited in a traditional interactive manner. The records in the noncritical stream, the noncritical records, are not edited in a computer-assisted manner. They may later be edited automatically. Selective editing is examined in Chapter 6.
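The split into a critical and a noncritical stream can be illustrated with a simple score per record. The score used below (sampling weight times the absolute deviation from an anticipated value, relative to an estimated total) is an assumed, hypothetical choice made for this sketch only; the section itself does not define a score function and refers to Chapter 6 for the actual techniques. The threshold, field names, and data are likewise made up.

```python
def local_score(weight, reported, anticipated, estimated_total):
    """Hypothetical score: weighted deviation from an anticipated value,
    expressed as a fraction of the estimated publication total."""
    return weight * abs(reported - anticipated) / estimated_total


def split_streams(records, threshold, estimated_total):
    """Return (critical_ids, noncritical_ids) based on the score threshold."""
    critical, noncritical = [], []
    for rec in records:
        score = local_score(
            rec["weight"], rec["reported"], rec["anticipated"], estimated_total
        )
        (critical if score > threshold else noncritical).append(rec["id"])
    return critical, noncritical


records = [
    {"id": "A", "weight": 10, "reported": 5000, "anticipated": 500},
    {"id": "B", "weight": 10, "reported": 520, "anticipated": 500},
    {"id": "C", "weight": 1, "reported": 900, "anticipated": 500},
]
critical, noncritical = split_streams(records, threshold=0.01, estimated_total=100_000)
```

Only record A ends up in the critical stream here: its deviation is large and its weight is high, so a possible error in it would noticeably move the published figure.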

Macro-editing. We distinguish between two forms of macro-editing. The first form is sometimes called the aggregation method [see, e.g., Granquist (1990)]. It formalizes and systematizes what every statistical agency does before publication: verifying whether figures to be published seem plausible. This is accomplished by comparing quantities in publication tables with the same quantities in previous publications. Only if an unusual value is observed is a micro-editing procedure applied to the individual records and fields contributing to the suspicious quantity. A second form of macro-editing is the distribution method. The available data are used to characterize the distribution of the variables. Then, all individual values are compared with the distribution. Typically, measures of location and spread are computed. Records containing values that could be considered uncommon (given the distribution) are candidates for further inspection and possibly for editing. Macro-editing, in particular the aggregation method, has always been applied at statistical offices. Macro-editing is again examined further in Chapter 6.
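The distribution method can be sketched with one measure of location and one of spread. The choice of the median and the median absolute deviation (MAD), and the cutoff of 3, are assumptions made for this illustration; the text only says that measures of location and spread are typically computed.

```python
from statistics import median


def flag_uncommon_values(values, cutoff=3.0):
    """Return the indices of values lying more than `cutoff` MADs
    from the median of all available values."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to compare against; nothing is flagged
    return [i for i, v in enumerate(values) if abs(v - med) / mad > cutoff]


reported = [10, 12, 11, 13, 12, 95, 11]
candidates = flag_uncommon_values(reported)
```

The flagged records are only candidates for further inspection: as noted earlier in the chapter, an outlying value is not necessarily an erroneous one.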

Automatic Editing. When automatic editing is applied, records are edited by a computer without human intervention. In this sense, automatic editing is the opposite of the traditional approach to the editing problem, where each record is edited manually. Automatic editing was already used in the 1960s and 1970s [see, e.g., Nordbotten (1963)]. Nevertheless, it has never become very popular. For this we point out two reasons. First, in former days, computers were too slow to edit data automatically. Second, development of a system for automatic editing was often considered too complicated and too costly by many statistical offices. In the last two decades, however, a lot of progress has been made with respect to automatic editing: Computers have become faster and algorithms have been simplified and have also become more efficient. For these reasons we pay more attention to automatic editing than to the other editing techniques in this book. Automatic editing of systematic errors is examined in Chapter 2, and automatic editing of random errors in Chapters 3 to 5.

1.4.3 IMPUTATION METHODS

To estimate a missing value, or a value that was identified as being erroneous during statistical data editing, two main approaches can be used. The first approach is manual imputation or correction, where the corresponding respondent is recontacted or subject-matter knowledge is used to obtain an estimate for the missing or erroneous value. The second approach is automated imputation, which is based on statistical estimation techniques, such as regression models. In this book, we only treat the latter approach.

In imputation, predictions from parametric or nonparametric models are derived for values that are missing or flagged as erroneous. An imputation model predicts a missing value using a function of auxiliary variables, the predictors. The auxiliary variables may be obtained from the current survey or from other sources such as historical information (the value of the missing variable in a previous period) or, increasingly important, administrative data. The most common types of imputation models are variants of regression models with parameters estimated from the observed correct data. However, especially for categorical variables, donor methods are also frequently used. Donor methods replace missing values in a record with the corresponding values from a nearby complete and valid record. Often a donor record is chosen such that it resembles as much as possible the record with missing values. Imputation methods are treated in Chapters 7 to 9 of this book.
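A donor method of the kind described above can be sketched as a nearest-neighbour search. This is a minimal illustration, not the specific method of Chapters 7 to 9: the squared Euclidean distance on unscaled matching variables, the function name, and the example records are all assumptions.

```python
def donor_impute(recipient, donors, match_vars, target):
    """Copy `target` from the complete donor record that most resembles
    `recipient` on the matching variables (squared Euclidean distance)."""
    def distance(donor):
        return sum((donor[v] - recipient[v]) ** 2 for v in match_vars)

    # Only records with an observed target value can act as donors.
    candidates = [d for d in donors if d[target] is not None]
    filled = dict(recipient)
    if candidates:
        best = min(candidates, key=distance)
        filled[target] = best[target]
    return filled


donors = [
    {"employees": 10, "turnover": 900},
    {"employees": 50, "turnover": 5200},
    {"employees": 48, "turnover": None},   # incomplete, so not a valid donor
]
recipient = {"employees": 47, "turnover": None}
completed_record = donor_impute(recipient, donors, ["employees"], "turnover")
```

With several matching variables on different scales, the variables would normally be standardized before distances are computed; that step is omitted here to keep the sketch short.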

1.5 An Edit and Imputation Strategy

Data editing is usually performed as a sequence of different detection and/or correction process steps. In this section we give a global description of an editing strategy. This description is general enough to include the situation for many data sets as special cases, and most editing strategies applied in practice will at least include a number of the elements and principles described here. The process steps can be characterized from different points of view, for instance by the type of errors they try to detect or resolve or by the methods that are used for detection or correction. Another important distinction is between automatic methods that can be executed without human intervention and interactive editing that is performed by editors.

The global editing strategy as depicted in Figure 1.1 consists of the following five steps that are clarified below.

1. Treatment of Systematic Errors. Identify and eliminate errors that are evident and easy to treat with sufficient reliability.

2. Micro-selection. Select records for interactive treatment that contain influential errors that cannot be treated automatically with sufficient reliability.

3. Automatic Editing. Apply all relevant automatic error detection and correction procedures to the (many) records that are not selected for interactive editing in step 2.

4. Interactive Editing. Apply interactive editing to the minority of the records with influential errors.

5. Macro-selection. Select records with influential errors by using methods based on outlier detection techniques and other procedures that make use of all or a large fraction of the response.

[FIGURE 1.1 Example of a process flow. Boxes: Raw data; 1. Correction of systematic errors; 2. Micro-analysis (scores), with the decision "Influential errors?" (Yes/No); 3a. Localization of random errors; 3b. Imputation of missings and errors; 3c. Adjustment of imputed values; 4. Interactive editing; 5. Macro-analysis, with the decision "Influential errors?" (Yes/No); Statistical microdata.]

We distinguish two kinds of process steps: those that localize or treat errors and those that direct the records through the different stages of the process. The processes in steps 2 and 5 are of the latter kind; they are “selectors” that do not actually treat errors, but select records for specific kinds of further processing.

Step 1. Correction of Systematic Errors. Detection and correction of systematic errors is an important first step in an editing process. It can be done automatically and reliably at virtually no cost and hence will improve both the efficiency and the quality of the editing process. It is in fact a very efficient and probably often underused correction approach. Systematic and other evident errors and algorithms that can automatically resolve these errors are described in Chapter 2 of this book.

Step 2. Micro-selection. Errors that cannot be resolved in the previous step will be taken care of either manually (by subject-matter specialists) or automatically (by specialized edit and imputation algorithms). In this step, the data are split into a critical stream and a noncritical stream, using selective editing techniques as mentioned in Section 1.4.2. The extent to which a record potentially contains influential errors can be measured by a score function [cf. Latouche and Berthelot (1992), Lawrence and McKenzie (2000), and Farwell and Rain (2000)]. This function is constructed such that records with high scores likely contain errors that have substantial effects on estimates of target parameters. For this selection step, a threshold value for the score has been set and all records with scores above this threshold are directed to manual reviewers whereas records with scores below the threshold are treated automatically. More details can be found in Chapter 6.

Apart from the score function, which looks at influential errors, another important selection criterion is imputability. For some variables, very accurate imputation models can be developed. If such a variable fails an edit, the erroneous value can safely be replaced by an imputed value, even if it is an influential value. Note that the correction of systematic errors in the previous step can also be an example of automatic treatment of influential errors, if the systematic error is an influential one.
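The score-based split of Step 2 can be sketched as follows. The particular score used here (the survey weight times the absolute difference between the reported value and an anticipated value, relative to an estimated total) is one common form and is an assumption of this example, as are the field names and the threshold; the text itself only requires that high scores indicate potentially influential errors.

```python
def score(record, estimated_total):
    """Potential influence of a record on the estimated total (assumed form)."""
    deviation = abs(record["reported"] - record["anticipated"])
    return record["weight"] * deviation / estimated_total

def micro_select(records, estimated_total, threshold):
    """Split records into a critical stream (manual) and a noncritical one."""
    critical, noncritical = [], []
    for r in records:
        stream = critical if score(r, estimated_total) > threshold else noncritical
        stream.append(r)
    return critical, noncritical

# Hypothetical records: "anticipated" could be last period's edited value.
records = [
    {"id": "A", "weight": 10, "reported": 1000, "anticipated": 100},
    {"id": "B", "weight": 10, "reported": 105, "anticipated": 100},
]
critical, noncritical = micro_select(records, estimated_total=10_000, threshold=0.05)
```

With these numbers record A scores 0.9 and goes to the manual reviewers, while record B scores 0.005 and is left to automatic treatment.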

Step 3a. Localization of erroneous values (random errors). The next three steps are automatic detection and correction procedures. In principle, they are designed to solve hard edit failures, including missing values, but they can be applied to soft edits if the soft edit is treated as a hard one. These three steps together represent the vast majority of all edit and imputation methodology. The other chapters of this book are devoted to this methodology for automatic detection and correction of erroneous and missing values.

The first step in the automatic treatment of errors is the localization of errors. Since systematic errors have already been removed, the remaining errors at this stage are random errors. Once the (hard) edits are defined and implemented, it is straightforward to check whether the values in a record are inconsistent in the sense that some of these edits are violated. It is, however, not so obvious how to decide which variables in an inconsistent record are in error. The designation of erroneous values in an inconsistent record is called the error localization problem, which is treated in Chapters 3 to 5 of this book.
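The contrast drawn above, between checking edits (easy) and deciding which value is wrong (hard), can be illustrated with a small sketch. The three edits and the variable names are hypothetical, and the repair search is a brute-force illustration for a single balance edit, not one of the localization algorithms of Chapters 3 to 5.

```python
# Hypothetical hard edits: each maps a record to True when it is satisfied.
EDITS = {
    "balance": lambda r: r["turnover"] - r["costs"] == r["profit"],
    "nonneg_turnover": lambda r: r["turnover"] >= 0,
    "nonneg_costs": lambda r: r["costs"] >= 0,
}

def violated_edits(record):
    """Checking consistency: list the names of the edits the record fails."""
    return [name for name, rule in EDITS.items() if not rule(record)]

def single_variable_repairs(record):
    """Localization: which single variable could be changed to satisfy all edits?"""
    # Value each variable would need for the balance edit to hold.
    candidates = {
        "turnover": record["profit"] + record["costs"],
        "costs": record["turnover"] - record["profit"],
        "profit": record["turnover"] - record["costs"],
    }
    repairs = {}
    for variable, value in candidates.items():
        trial = dict(record, **{variable: value})
        if not violated_edits(trial):
            repairs[variable] = value
    return repairs

record = {"turnover": 100, "costs": 60, "profit": 400}
failed = violated_edits(record)
repairs = single_variable_repairs(record)
```

Here only the balance edit fails, yet the record can be made consistent by changing either turnover (to 460) or profit (to 40); changing costs is ruled out because it would have to become negative. The edits alone do not say which of the two remaining variables is in error.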

Step 3b. Imputation. In this step, missing data are imputed in an automatic manner. The imputation method that is best suited for a particular situation will depend on the characteristics of the data set and the research goals. In Chapters 7 to 9 we examine imputation methods in detail.

Step 3c. Consistency Adjustment of Imputed Values. In most cases, the edits are not taken into account by the imputation methods; some exceptions are examined in Chapter 9. As a consequence, the imputed records are in general inconsistent with the edits. This problem can be solved by the introduction of an adjustment step in which adjustments are made to the imputed values such that the record satisfies all edits and the adjustments are as small as possible. This problem can be formulated as a linear or a quadratic programming problem and is treated in Chapter 10.
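For the special case of a single linear equality edit, the quadratic version of this adjustment problem has a closed-form solution, which the following sketch implements. It is an illustration under stated assumptions: the edit is written as sum(coeffs[k] * values[k]) = rhs, only the imputed values may change, "as small as possible" is taken to mean the smallest sum of squared changes, and inequality edits such as non-negativity are ignored (handling them would require a general quadratic programming solver). All names are hypothetical.

```python
def adjust_imputed(values, coeffs, rhs, imputed):
    """Least-squares adjustment of imputed values to satisfy one linear edit.

    Minimizes the sum of squared changes to the variables in `imputed`
    subject to sum(coeffs[k] * values[k]) == rhs; observed values are kept.
    """
    residual = sum(coeffs[k] * values[k] for k in coeffs) - rhs
    norm = sum(coeffs[k] ** 2 for k in imputed)
    if norm == 0:
        raise ValueError("no imputed variable appears in the edit")
    adjusted = dict(values)
    for k in imputed:
        adjusted[k] = values[k] - coeffs[k] * residual / norm
    return adjusted

# Hypothetical record: turnover was observed, costs and profit were imputed.
values = {"turnover": 100, "costs": 70, "profit": 40}
# Balance edit: turnover - costs - profit = 0.
coeffs = {"turnover": 1, "costs": -1, "profit": -1}
adjusted = adjust_imputed(values, coeffs, rhs=0, imputed=["costs", "profit"])
```

The imputed record violates the balance edit by 10; the adjustment spreads this equally over the two imputed values (costs 65, profit 35) and leaves the observed turnover untouched.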

Step 4. Interactive Editing. Substantial mistakes by somewhat larger enterprises that have an appreciable influence on publication aggregates and for which no accurate imputation model exists are not considered suitable for the generic procedures of automatic editing. These records are treated by subject-matter specialists in a process step called interactive editing; see Section 1.4.2 above and Chapter 6.

Step 5. Macro-selection. The steps considered so far all use micro-editing methods, that is, methods that use the data of a single record and related auxiliary information to check and correct it. Micro-editing processes can be conducted from the start of the data collection phase, as soon as records become available. In contrast, macro-selection techniques use information from other records and can only be applied if a substantial part of the data is collected or has been imputed. Macro-selection techniques are also selective editing techniques in the sense that they aim to direct the attention only to possibly influential erroneous values. Macro-editing is treated in Chapter 6 of this book.
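A minimal sketch of a macro-selection rule based on outlier detection is given below. It flags records whose value lies far from the median of all collected records, measured in units of the median absolute deviation (MAD). The scaling constant 0.6745 and the cutoff 3.5 are conventional choices for this kind of robust score and are assumptions of the example, not values prescribed by the text; the record identifiers and values are hypothetical.

```python
from statistics import median

def macro_select(values, cutoff=3.5):
    """Return the ids of records whose robust z-score exceeds `cutoff`.

    `values` maps record id -> reported value. Unlike micro-editing, the rule
    needs (most of) the response, because it compares each record with the
    distribution of all the others.
    """
    med = median(values.values())
    mad = median(abs(v - med) for v in values.values())
    if mad == 0:
        return []  # no spread to measure against; nothing can be flagged
    return [
        rid for rid, v in values.items()
        if abs(0.6745 * (v - med) / mad) > cutoff
    ]

reported = {"r1": 100, "r2": 105, "r3": 98, "r4": 102, "r5": 1000}
flagged = macro_select(reported)
```

In this example the median is 102 and the MAD is 3, so only record r5 is selected for further review.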

The process flow suggested in Figure 1.1 is just one possibility. Depending on the type of survey and the available resources and auxiliary information, the process flow can be different. Not all steps are always carried out, the order of steps may be different, and the particular methods used in each step can differ between types of surveys. For social surveys, for instance, selective editing is not very important because the contributions of individuals to a publication total do not differ much, unlike the contributions of small and large enterprises in business surveys. Consequently, there is less need for manual editing of influential records, and step 4 need not be performed. Often, in social surveys, due to a lack of hard edits, the main type of detectable error is the missing value, and process steps 3a and 3c are not performed either. For administrative data the collection of all records, or a large part of them, is often available at once. This is different from the situation for surveys where the data are collected over a period of time. For administrative data it is therefore possible to form preliminary estimates immediately and to start with macro-editing as a tool for selective editing, and a process could start with step 1, followed by step 5 and possibly by step 4 and/or step 3.

Although automatic procedures are frequently used for relatively unimportant errors, choosing the most suitable error detection and/or imputation methods is still important. If inappropriate methods are used, especially for large amounts of random errors and/or missing values, additional bias may be introduced. Furthermore, as the quality of the automatic error localization and imputation methods and models gets better, more records can be entrusted to the automatic treatment in step 3 and fewer records have to be selected for the time-consuming and costly interactive editing step.

REFERENCES

Barnett, V., and T. Lewis (1994), Outliers in Statistical Data. John Wiley & Sons, New York.

Bethlehem, J. (2007), Reducing the Bias of Web Survey Based Estimates. Discussion paper 07001, Statistics Netherlands, Voorburg (see also www.cbs.nl).

Børke, S. (2008), Using “Traditional” Control (Editing) Systems to Reveal Changes when Introducing New Data Collection Instruments. Working Paper No. 6, UN/ECE Work Session on Statistical Data Editing, Vienna.

Chambers, R., A. Hentges, and X. Zhao (2004), Robust Automatic Methods for Outlier and Error Detection. Journal of the Royal Statistical Society A 167, pp. 323–339.

Couper, M. P., R. P. Baker, J. Bethlehem, C. Z. F. Clark, J. Martin, W. L. Nichols II, and J. M. O’Reilly (eds.) (1998), Computer Assisted Survey Information Collection. John Wiley & Sons, New York.

Farwell, K., and M. Rain (2000), Some Current Approaches to Editing in the ABS. Proceedings of the Second International Conference on Establishment Surveys, Buffalo, pp. 529–538.

Federal Committee on Statistical Methodology (1990), Data Editing in Federal Statistical Agencies. Statistical Policy Working Paper 18, U.S. Office of Management and Budget, Washington, D.C.

Granquist, L. (1984), Data Editing and its Impact on the Further Processing of Statistical Data. Workshop on Statistical Computing, Budapest.

Granquist, L. (1990), A Review of Some Macro-Editing Methods for Rationalizing the Editing Process. Proceedings of the Statistics Canada Symposium, pp. 225–234.

Granquist, L. (1995), Improving the Traditional Editing Process. In: Business Survey Methods, B.G. Cox, D.A. Binder, B.N. Chinnappa, A. Christianson, M.J. Colledge, and P.S. Kott, eds. John Wiley & Sons, New York, pp. 385–401.

Granquist, L. (1997), The New View on Editing. International Statistical Review 65, pp. 381–387.

Granquist, L., and J. Kovar (1997), Editing of Survey Data: How Much Is Enough? In: Survey Measurement and Process Quality, L.E. Lyberg, P. Biemer, M. Collins, E.D. De Leeuw, C. Dippo, N. Schwartz, and D. Trewin, eds. John Wiley & Sons, New York, pp. 415–435.

Hoogland, J., and R. Smit (2008), Selective Automatic Editing of Mixed Mode Questionnaires for Structural Business Statistics. Working Paper No. 2, UN/ECE Work Session on Statistical Data Editing, Vienna.

Latouche, M., and J. M. Berthelot (1992), Use of a Score Function to Prioritize and Limit Recontacts in Editing Business Surveys. Journal of Official Statistics 8, pp. 389–400.

Lawrence, D., and R. McKenzie (2000), The General Application of Significance Editing. Journal of Official Statistics 16, pp. 243–253.

Little, R. J. A., and D. B. Rubin (2002), Statistical Analysis with Missing Data, second edition. John Wiley & Sons, New York.

Nordbotten, S. (1955), Measuring the Error of Editing the Questionnaires in a Census. Journal of the American Statistical Association 50, pp. 364–369.

Nordbotten, S. (1963), Automatic Editing of Individual Statistical Observations. In: Conference of European Statisticians Statistical Standards and Studies No. 2, United Nations, New York.

Rocke, D. M., and D. L. Woodruff (1996), Identification of Outliers in Multivariate Data. Journal of the American Statistical Association 91, pp. 1047–1061.

Rousseeuw, P. J., and M. L. Leroy (1987), Robust Regression & Outlier Detection. John Wiley & Sons, New York.

Rubin, D. B. (1987), Multiple Imputation for Non-Response in Surveys. John Wiley & Sons, New York.

Schafer, J. L. (1997), Analysis of Incomplete Multivariate Data. Chapman & Hall, London.

Todorov, V., M. Templ, and P. Filzmoser (2009), Outlier Detection in Survey Data Using Robust Methods. Working Paper No. 40, UN/ECE Work Session on Statistical Data Editing, Neuchatel.

Van der Loo, M. P. J. (2008), An Analysis of Editing Strategies for Mixed-Mode Establishment Surveys. Discussion paper 08004, Statistics Netherlands (see also www.cbs.nl).

Wallgren, A., and B. Wallgren (2007), Register-Based Statistics: Administrative Data for Statistical Purposes. John Wiley & Sons, Chichester.

Willeboordse, A. (ed.) (1998), Handbook on the Design and Implementation of Business Surveys. Office for Official Publications of the European Communities, Luxembourg.

Contents

List of Figures xiii
List of Tables xvii

1. First Principles 1
1.1 A preview of the book 3
1.2 Inequality of what? 4
1.3 Inequality measurement, justice, and poverty 7
1.4 Inequality and the social structure 13
1.5 Questions 14

2. Charting Inequality 17
2.1 Diagrams 18
2.2 Inequality measures 24
2.3 Rankings 31
2.4 From charts to analysis 36
2.5 Questions 37

3. Analysing Inequality 39
3.1 Social welfare functions 40
3.2 SWF-based inequality measures 49
3.3 Inequality and information theory 53
3.4 Building an inequality measure 61
3.5 Choosing an inequality measure 67
3.6 Summary 72
3.7 Questions 75

4. Modelling Inequality 77
4.1 The idea of a model 78
4.2 The lognormal distribution 80
4.3 The Pareto distribution 87
4.4 How good are the functional forms? 95
4.5 Questions 100

5. From Theory to Practice 103
5.1 The data 104
5.2 Computation of the inequality measures 112
5.3 Appraising the calculations 128
5.4 Shortcuts: fitting functional forms 135
5.5 Interpreting the answers 142
5.6 A sort of conclusion 148
5.7 Questions 149

A Technical Appendix 153
A.1 Overview 153
A.2 Measures and their properties 153
A.3 Functional forms of distribution 156
A.4 Interrelationships between inequality measures 163
A.5 Decomposition of inequality measures 164
A.6 Negative incomes 169
A.7 Estimation problems 170
A.8 Using the website 177

B Notes on Sources and Literature 178
B.1 Chapter 1 178
B.2 Chapter 2 180
B.3 Chapter 3 183
B.4 Chapter 4 187
B.5 Chapter 5 190
B.6 Technical Appendix 195

Bibliography 197
Index 225

1 First Principles

‘It is better to ask some of the questions than to know all of the answers.’

James Thurber (1945), The Scotty Who Knew Too Much

‘Inequality’ is in itself an awkward word, as well as one used in connection with a number of awkward social and economic problems. The difficulty is that the word can trigger quite a number of different ideas in the mind of a reader or listener, depending on his training and prejudice.

‘Inequality’ obviously suggests a departure from some idea of equality. This may be nothing more than an unemotive mathematical statement, in which case ‘equality’ just represents the fact that two or more given quantities are the same size, and ‘inequality’ merely relates to differences in these quantities. On the other hand, the term ‘equality’ evidently has compelling social overtones as a standard which it is presumably feasible for society to attain. The meaning to be attached to this is not self-explanatory. Some years ago Professors Rein and Miller revealingly interpreted this standard of equality in nine separate ways:

• One-hundred-percentism: in other words, complete horizontal equity, ‘equal treatment of equals’.

• The social minimum: here one aims to ensure that no one falls below some minimum standard of well-being.

• Equalization of lifetime income profiles: this focuses on inequality of future income prospects, rather than on the people’s current position.

• Mobility: that is, a desire to narrow the differentials and to reduce the barriers between occupational groups.

• Economic inclusion: the objective is to reduce or eliminate the feeling of exclusion from society caused by differences in incomes or some other endowment.

Measuring Inequality

• Income shares: society aims to increase the share of national income (or some other ‘cake’) enjoyed by a relatively disadvantaged group, such as the lowest tenth of income recipients.

• Lowering the ceiling: attention is directed towards limiting the share of the cake enjoyed by a relatively advantaged section of the population.

• Avoidance of income and wealth crystallization: this just means eliminating the disproportionate advantages (or disadvantages) in education, political power, social acceptability and so on that may be entailed by an advantage (or disadvantage) in the income or wealth scale.

• International yardsticks: a nation takes as its goal that it should be no more unequal than another ‘comparable’ nation.

Their list is probably not exhaustive and it may include items which you do not feel properly belong on the agenda of inequality measurement; but it serves to illustrate the diversity of views about the nature of the subject (let alone its political, moral or economic significance) which may be present in a reasoned discussion of equality and inequality. Clearly, each of these criteria of ‘equality’ would influence in its own particular way the manner in which we might define and measure inequality. Each of these potentially raises particular issues of social justice that should concern an interested observer. And if I were to try to explore just these nine suggestions with the fullness that they deserve, I should easily make this book much longer than I wish.

In order to avoid this mishap let us drastically reduce the problem by trying to set out what the essential ingredients of a Principle of Inequality Measurement should be. We shall find that these basic elements underlie a study of equality and inequality along almost any of the nine lines suggested in the brief list given above.

The ingredients are easily stated. For each ingredient it is possible to use materials of high quality, with conceptual and empirical nuances finely graded. However, in order to make rapid progress, I have introduced some cheap substitutes which I have indicated in each case in the following list:

• Specification of an individual social unit such as a single person, the nuclear family or the extended family. I shall refer casually to ‘persons’.

• Description of a particular attribute (or attributes) such as income, wealth, land-ownership or voting strength. I shall use the term ‘income’ as a loose coverall expression.

• A method of representation or aggregation of the allocation of ‘income’ among the ‘persons’ in a given population.

The list is simple and brief, but it will take virtually the whole book to deal with these fundamental ingredients, even in rudimentary terms.

1.1 A preview of the book

The final item on the list of ingredients will command much of our attention. As a quick glance ahead will reveal we shall spend quite some time looking at intuitive and formal methods of aggregation in Chapters 2 and 3. In Chapter 2 we encounter several standard measurement tools that are often used and sometimes abused. This will be a chapter of ‘ready-mades’ where we take as given the standard equipment in the literature without particular regard to its origin or the principles on which it is based. By contrast the economic analysis of Chapter 3 introduces specific distributional principles on which to base comparisons of inequality. This step, incorporating explicit criteria of social justice, is done in three main ways: social welfare analysis, the concept of distance between income distributions, and an introduction to the axiomatic approach to inequality measurement. On the basis of these principles we can appraise the tailor-made devices of Chapter 3 as well as the off-the-peg items from Chapter 2. Impatient readers who want a quick summary of most of the things one might want to know about the properties of inequality measures could try turning to page 74 for an instant answer.

Chapter 4 approaches the problem of representing and aggregating information about the income distribution from a quite different direction. It introduces the idea of modelling the income distribution rather than just taking the raw bits and pieces of information and applying inequality measures or other presentational devices to them. In particular we deal with two very useful functional forms of income distribution that are frequently encountered in the literature.

In my view the ground covered by Chapter 5 is essential for an adequate understanding of the subject matter of this book. The practical issues which are discussed there put meaning into the theoretical constructs with which you will have become acquainted in Chapters 2 to 4. This is where you will find discussion of the practical importance of the choice of income definition (ingredient 2) and of income receiver (ingredient 1); of the problems of using equivalence scales to make comparisons between heterogeneous income units and of the problems of zero values when using certain definitions of income. In Chapter 5 we shall also look at how to deal with patchy data, and how to assess the importance of inequality changes empirically.

The back end of the book contains two further items that you may find helpful. Appendix A has been used mainly to tidy away some of the more cumbersome formulas which would otherwise have cluttered the text; you may want to dip into it to check up on the precise mathematical definitions and results that are described verbally or graphically in the main text. Appendix B (Notes on Sources and Literature) has been used mainly to cover literature references which would otherwise have also cluttered the text; if you want to follow up the principal articles on a specific topic, or to track down the reference containing detailed proof of some of the key results, this is where you should turn first; it also gives you the background to the data examples found throughout the book.

Finally, a word or two about this chapter. The remainder of the chapter deals with some of the issues of principle concerning all three ingredients on the list; it provides some forward pointers to other parts of the book where theoretical niceties or empirical implementation is dealt with more fully; it also touches on some of the deeper philosophical issues that underpin an interest in the subject of measuring inequality. It is to theoretical questions about the second of the three ingredients of inequality measurement that we shall turn first.

1.2 Inequality of what?

Let us consider some of the problems of the definition of a personal attribute, such as income, that is suitable for inequality measurement. This attribute can be interpreted in a wide sense if an overall indicator of social inequality is required, or in a narrow sense if one is concerned only with inequality in the distribution of some specific attribute or talent. Let us deal first with the special questions raised by the former interpretation.

If you want to take inequality in a global sense, then it is evident that you will need a comprehensive concept of ‘income’: an index that will serve to represent generally a person’s well-being in society. There are a number of personal economic characteristics which spring to mind as candidates for such an index, for example wealth, lifetime income, weekly or monthly income. Will any of these do as an all-purpose attribute?

While we might not go as far as Anatole France in describing wealth as a ‘sacred thing’, it has an obvious attraction for us (as students of inequality). For wealth represents a person’s total immediate command over resources. Hence, for each man or woman we have an aggregate which includes the money in the bank, the value of holdings of stocks and bonds, the value of the house and the car, his ox, his ass, and everything that he has. There are two difficulties with this. First, how are these disparate possessions to be valued and aggregated in money terms? It is not clear that prices ruling in the market (where such markets exist) appropriately reflect the relative economic power inherent in these various assets. Second, there are other, less tangible assets which ought perhaps to be included in this notional command over resources, but which a conventional valuation procedure would omit.

One major example of this is a person’s occupational pension rights: having a job that entitles me to a pension upon my eventual retirement is certainly valuable, but how valuable? Such rights may not be susceptible to being cashed in like other assets so that their true worth is tricky to assess.

A second important example of such an asset is the presumed prerogative of higher future incomes accruing to those possessing greater education or training. Surely the value of these income rights should be included in the calculation of a person’s wealth just as is the value of other income-yielding assets such as stocks or bonds? To do this we need an aggregate of earnings over the entire life span. Such an aggregate, ‘lifetime income’, in conjunction with other forms of wealth appears to yield the index of personal well-being that we seek, in that it includes in a comprehensive fashion the entire set of economic opportunities enjoyed by a person. The drawbacks, however, are manifest. Since lifetime summation of actual income receipts can only be performed once the income recipient is deceased (which limits its operational usefulness), such a summation must be carried out on anticipated future incomes. Following this course we are led into the difficulty of forecasting these income prospects and of placing on them a valuation that appropriately allows for their uncertainty. Although I do not wish to assert that the complex theoretical problems associated with such lifetime aggregates are insuperable, it is expedient to turn, with an eye on Chapter 5 and practical matters, to income itself.

Income, defined as the increase in a person’s command over resources during a given time period, may seem restricted in comparison with the all-embracing nature of wealth or lifetime income. It has the obvious disadvantages that it relates only to an arbitrary time unit (such as one year) and thus that it excludes the effect of past accumulations except in so far as these are deployed in income-yielding assets. However, there are two principal offsetting merits:

• if income includes unearned income, capital gains, and ‘income in kind’ as well as earnings, then it can be claimed as a fairly comprehensive index of a person’s well-being at a given moment;

• information on personal income is generally more widely available and more readily interpretable than for wealth or lifetime income.

Furthermore, note that none of the three concepts that have been discussed completely covers the command over resources for all goods and services in society. Measures of personal wealth or income exclude ‘social wage’ elements such as the benefits received from communally enjoyed items like municipal parks, public libraries, the police, and ballistic missile systems, the interpersonal distribution of which services may only be conjectured.

In view of the difficulty inherent in finding a global index of ‘well-offness’, we may prefer to consider the narrow definition of the thing called ‘income’. Depending on the problem in hand, it can make sense to look at inequality in the endowment of some other personal attribute, such as consumption of a particular good, life expectancy, land ownership, etc. This may be applied also to publicly owned assets or publicly consumed commodities if we direct attention not to interpersonal distribution but to intercommunity distribution: for example, the inequality in the distribution of per capita energy consumption in different countries. The problems concerning ‘income’ that I now discuss apply with equal force to the wider interpretation considered in the earlier paragraphs.

It is evident from the foregoing that two key characteristics of the ‘income’ index are that it be measurable and that it be comparable among different persons. That these two characteristics are mutually independent can be demonstrated by two contrived examples. First, to show that an index might be measurable but not comparable, take the case where well-being is measured by consumption per head within families, the family rather than the individual being taken as the basic social unit. Suppose that consumption by each family in the population is known but that the number of persons is not. Then for each family, welfare is measurable up to an arbitrary change in scale, in this sense: for family A doubling its income makes it twice as well-off, trebling it makes it three times as well-off; the same holds for family B; but A’s welfare scale and B’s welfare scale cannot be compared unless we know the numbers in each family. Second, to show that an index may be interpersonally comparable, but not measurable in the conventional sense, take the case where ‘access to public services’ is used as an indicator of welfare. Consider two public services, gas and electricity supply (households may be connected to one or to both or to neither of them), and suppose that the following scale (in descending order of amenity) is generally recognized:

• access to both gas and electricity;
• access to electricity only;
• access to gas only;
• access to neither.

We can compare households’ amenities (A and B are as well-off as each other if they are both connected only to electricity), but it makes no sense to say that A is twice as well-off if it is connected to gas as well as electricity.
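The second contrived example, an index that is comparable but not measurable, can be mimicked in code by a type that supports ordering and equality but no arithmetic. This is only an illustrative sketch: the class name `Amenity` and the integer labels are assumptions, and the integers carry no meaning beyond their order.

```python
from enum import Enum
from functools import total_ordering

@total_ordering
class Amenity(Enum):
    """Ordinal welfare scale: only the ranking of the levels is meaningful."""
    NEITHER = 0
    GAS_ONLY = 1
    ELECTRICITY_ONLY = 2
    BOTH = 3

    def __lt__(self, other):
        if not isinstance(other, Amenity):
            return NotImplemented
        return self.value < other.value

household_a = Amenity.BOTH
household_b = Amenity.ELECTRICITY_ONLY

# Comparability: households can be ranked against each other.
a_better_off = household_a > household_b

# Non-measurability: "how many times as well-off" is not defined.
try:
    ratio = household_a / household_b
except TypeError:
    ratio = None
```

Ranking the two households works, while the attempt to form a ratio fails, which is exactly the sense in which the scale is comparable but not measurable.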

It is possible to make some progress in the study of inequality without measurability of the welfare index, and sometimes even without full comparability. For most of the time, however, I shall make both these assumptions, which may be unwarranted. For this implies that when I write the word ‘income’, I assume that it is so defined that adjustment has already been made for non-comparability on account of differing needs, and that fundamental differences in tastes (with regard to relative valuation of leisure and monetary income, for example) may be ruled out of consideration. We shall reconsider the problems of non-comparability in Chapter 5.

The final point in connection with the ‘income’ index that I shall mention can be described as the ‘constant amount of cake’. We shall usually talk of inequality freely as though there is some fixed total of goodies to be shared among the population. This is definitionally true for certain quantities, such as the distribution of acres of land (except perhaps in the Netherlands). However, this is evidently questionable when talking about income as conventionally defined in economics. If an arbitrary change is envisaged in the distribution of income among persons, we may reasonably expect that the size of the cake to be divided (national income) might change as a result. Or, if we try to compare inequality in a particular country’s income distribution at two points in time, it is quite likely that total income will have changed during the interim. Moreover, if the size of the cake changes, either autonomously or as a result of some redistributive action, this change in itself may modify our view of the amount of inequality that there is in society.

Having raised this important issue of the relationship between interpersonal distribution and the production of economic goods, I shall temporarily evade it by assuming that a given whole is to be shared as a number of equal or unequal parts. For some descriptions of inequality this assumption is irrelevant. However, since the size of the cake as well as its distribution is very important in social welfare theory, we shall consider the relationship between inequality and total income in Chapter 3 (particularly page 48), and examine the practical implications of a growing (or dwindling) cake in Chapter 5 (see page 143).

1.3 Inequality measurement, justice, and poverty

So what is meant by an inequality measure? In order to introduce this device, which serves as the third ‘ingredient’ mentioned previously, let us try a simple definition which roughly summarizes the common usage of the term:

• a scalar numerical representation of the interpersonal differences in income within a given population.

Now let us take this bland statement apart.


Scalar Inequality

The use of the word ‘scalar’ implies that all the different features of inequality are compressed into a single number, or a single point on a scale. Appealing arguments can be produced against the contraction of information involved in this aggregation procedure. Should we don this one-dimensional straitjacket when surely our brains are well-developed enough to cope with more than one number at a time? There are three points in reply here.

First, if we want a multi-number representation of inequality, we can easily arrange this by using a variety of indices each capturing a different characteristic of the social state, and each possessing attractive properties as a yardstick of inequality in its own right. We shall see some practical examples (in Chapters 3 and 5) where we do exactly that.

Second, however, we often want to answer a question like ‘has inequality increased or decreased?’ with a straight ‘yes’ or ‘no’. But if we make the concept of inequality multi-dimensional we greatly increase the possibility of coming up with ambiguous answers. For example, suppose we represent inequality by two numbers, each describing a different aspect of inequality of the same ‘income’ attribute. We may depict this as a point such as B in Fig. 1.1, which reveals that there is an amount I1 of type-1 inequality, and I2 of type-2 inequality. Obviously all points like C represent states of society that are more unequal than B, and points such as A represent less unequal states. But it is much harder to compare B and D or to compare B and E. If we attempt to resolve this difficulty, we will find that we are effectively using a single-number representation of inequality after all.

Third, multi-number representations of income distributions may well have their place alongside a standard scalar inequality measure. As we shall see in later chapters, even if a single agreed number scale (I1 or I2) is unavailable, or even if a collection of such scales (I1 and I2) cannot be found, we might be able to agree on an inequality ranking. This is a situation where, although you may not be able to order or to sort the income distributions uniquely (most equal at the bottom, most unequal at the top), you nevertheless find that you can arrange them in a pattern that enables you to get a fairly useful picture of what is going on. To get the idea, have a look at Fig. 1.2. We might find that over a period of time the complex changes in the relevant income distribution can be represented schematically as in the league table illustrated there: you can say that inequality went down from 1980 to 1985, and went up from 1985 to either 1990 or 1992; but you cannot say whether inequality went up or down in the early nineties. Although this method of looking at inequality is not decisive in terms of every possible comparison of distributions, it could still provide valuable information.

FIG. 1.1. Two types of inequality (horizontal axis: type-1 inequality, I1; vertical axis: type-2 inequality, I2; points A, B, C, D, and E)

FIG. 1.2. An inequality ranking (from more inequality at the top to less inequality at the bottom: 1980; then 1990 and 1992 side by side; then 1985)

Numerical Representation

What interpretation should be placed on the phrase ‘numerical representation’ in the definition of an inequality measure? The answer to this depends on whether we are interested in just the ordering properties of an inequality measure or in the actual size of the index and of changes in the index.

To see this, look at the following example. Imagine four different social states A, B, C, D, and four rival inequality measures I1, I2, I3, I4. The first column in Table 1.1 gives the values of the first measure, I1, realized in each of the four situations. Are any of the other candidates equivalent to I1? Notice that I3 has a strong claim in this regard. Not only does it rank A, B, C, D in the same order, it also shows that the percentage change in inequality in going from one state to another is the same as if we use the I1 scale. If this is true for all social states, we will call I1 and I3 cardinally equivalent. More formally, I1 and I3 are cardinally equivalent if one scale can be obtained from the other by multiplying by a positive constant and adding or subtracting another constant. In the above case, we multiply I1 by 2.4 and add on zero to get I3. Now consider I4: it ranks the four states A to D in the same order as I1, but it does not give the same percentage differences (compare the gaps between A and B and between B and C). So I1 and I4 are certainly not cardinally equivalent. However, if it is true that I1 and I4 always rank any set of social states in the same order, we will say that the two scales are ordinally equivalent.¹ Obviously cardinal equivalence entails ordinal equivalence, but not vice versa. Finally we note that I2 is not ordinally equivalent to the others, although for all we know it may be a perfectly sensible inequality measure.

Table 1.1. Four inequality scales

       I1     I2     I3     I4
A     .10    .13    .24    .12
B     .25    .26    .60    .16
C     .30    .34    .72    .20
D     .40    .10    .96    .22

Now let A be the year 1970, let B be 1960, and D be 1950. Given the question, ‘Was inequality less in 1970 than it was in 1960?’, I1 produces the same answer as any other ordinally equivalent measure (such as I3 or I4): ‘numerical representation’ simply means a ranking. But, given the question, ‘Did inequality fall more in the 1960s than it did in the 1950s?’, I1 only yields the same answer as other cardinally equivalent measures (I3 alone): here inequality needs to have the same kind of ‘numerical representation’ as temperature on a thermometer.
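The comparisons among the four scales can be checked mechanically. The following is a minimal sketch in Python using the values of Table 1.1; the helper names `same_ordering` and `affine_on_sample` are hypothetical, and the checks only cover the four listed states, whereas equivalence proper is a statement about all social states.

```python
# Values from Table 1.1 for the social states A, B, C, D (in that order).
table = {
    "I1": [0.10, 0.25, 0.30, 0.40],
    "I2": [0.13, 0.26, 0.34, 0.10],
    "I3": [0.24, 0.60, 0.72, 0.96],
    "I4": [0.12, 0.16, 0.20, 0.22],
}


def ranking(values):
    """Indices of the states sorted from least to most unequal (no ties assumed)."""
    return sorted(range(len(values)), key=lambda i: values[i])


def same_ordering(x, y):
    """True if both scales rank the listed states identically (hypothetical helper)."""
    return ranking(x) == ranking(y)


def affine_on_sample(x, y, tol=1e-9):
    """True if y = a + b*x with b > 0 fits every listed state (hypothetical helper).

    a and b are pinned down by the first two states and then checked on the rest.
    """
    b = (y[1] - y[0]) / (x[1] - x[0])
    a = y[0] - b * x[0]
    return b > 0 and all(abs(a + b * xi - yi) < tol for xi, yi in zip(x, y))


for name in ("I2", "I3", "I4"):
    print(name,
          "same ordering as I1:", same_ordering(table["I1"], table[name]),
          "| affine in I1:", affine_on_sample(table["I1"], table[name]))
```

On these four states the sketch agrees with the text: I3 passes both checks, I4 passes only the ordering check, and I2 fails both.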

Income Differences

Should any and every ‘income difference’ be reflected in a measure of inequality? The commonsense answer is ‘No’, for two basic reasons: need and merit. The first reason is the more obvious: large families and the sick need more resources than the single, healthy person to support a particular economic standard. Hence in a ‘just’ allocation, we would expect those with such greater needs to have a higher income than other people; such income differences would thus be based on a principle of justice, and should not be treated as inequalities. To cope with this difficulty one may adjust the income concept such that allowance is made for diversity of need, as mentioned in the last section; this is something which needs to be done with some care, as we will find in Chapter 5 (see the discussion on page 110).

1 A mathematical note: I1 and I4 are ordinally equivalent if one may be written as a monotonically increasing function of the other, say I1 = f(I4), where dI1/dI4 > 0. An example of such a function is log(I). I1 and I3 are cardinally equivalent if f takes the following special form: I1 = a + bI3, where b is a positive number.

The case for ignoring differences on account of merit depends on the interpretation attached to ‘equality’. One obviously rough-and-ready description of a just allocation requires equal incomes for all irrespective of personal differences other than need. However, one may argue strongly that in a just allocation higher incomes should be received by doctors, heroes, inventors, Stakhanovites, and other deserving persons. Unfortunately, in practice it is more difficult to make adjustments similar to those suggested in the case of need and, more generally, even distinguishing between income differences that do represent genuine inequalities and those that do not poses a serious problem.

Given Population

The last point about the definition of an inequality measure concerns the phrase ‘given population’ and needs to be clarified in two ways. First, when examining the population over, say, a number of years, what shall we do about the effect on measured inequality of persons who either enter or leave the population, or whose status changes in some other relevant way? The usual assumption is that as long as the overall structure of income differences stays the same (regardless of whether different personnel are now receiving those incomes), measured inequality remains unaltered. Hence the phenomenon of social mobility within or in-and-out of the population eludes the conventional method of measuring inequality, although some might argue that it is connected with inequality of opportunity.² Secondly, one is not exclusively concerned with inequality in the population as a whole. It is useful to be able to decompose this ‘laterally’ into inequality within constituent groups, differentiated regionally or demographically perhaps, and inequality between these constituent groups. Indeed, once one acknowledges basic heterogeneities within the population, such as age or sex, awkward problems of aggregation may arise, although we shall ignore them. It may also be useful to decompose inequality ‘vertically’ so that one looks at inequality within a subgroup of the rich or of the poor, for example. Hence the specification of the given population is by no means a trivial prerequisite to the application of inequality measurement.

2 Check Question 6 at the end of the chapter to see if you concur with this view.


Although the definition has made it clear that an inequality measure calls for a numerical scale, I have not suggested how this scale should be calibrated. Specific proposals for this will occupy Chapters 2 and 3, but a couple of basic points may be made here.

You may have noticed just now that the notion of justice was slipped in while income differences were being considered. In most applications of inequality analysis social justice really ought to be centre stage. That more just societies should register lower numbers on the inequality scale evidently accords with an intuitive appreciation of the term ‘inequality’. But on what should principles of distributional justice and concern for inequality be based? Economic philosophers have offered a variety of answers. This concern could be no more than the concern about the everyday risks of life: just as individuals are upset by the financial consequences of having their car stolen or missing their plane, so too they would care about the hypothetical risk of drawing a losing ticket in a lottery of life chances; this lottery could be represented by the income distribution in the UK, the USA, or wherever; nice utilitarian calculations on the balance of small-scale gains and losses become utilitarian calculations about life chances; aversion to risk translates into aversion to inequality. Or the concern could be based upon the altruistic feelings of each human towards his fellows that motivate charitable action. Or again it could be that there is a social imperative toward concern for the least advantaged (and perhaps concern about the inordinately rich) that transcends the personal twinges of altruism and envy. It could be simple concern about the possibility of social unrest. It is possible to construct a coherent justice-based theory of inequality measurement on each of these notions, although that takes us beyond the remit of this book.

However, if we can clearly specify what a just distribution is, such a state provides the zero from which we start our inequality measure. But even a well-defined principle of distributive justice is not sufficient to enable one to mark off an inequality scale unambiguously when considering diverse unequal social states. Each of the apparently contradictory scales I1 and I2 considered in Fig. 1.1 and Table 1.1 might be solidly founded on the same principle of justice, unless such a principle were extremely narrowly defined.

The other general point is that we might suppose there is a close link between an indicator of the extent of poverty and the calibration of a measure of economic inequality. This is not necessarily so, because two rather different problems are generally involved. In the case of the measurement of poverty, one is concerned primarily with that segment of the population falling below some specified ‘poverty line’; to obtain the poverty measure one may perform a simple head count of this segment, or calculate the gap between the average income of the poor and the average income of the general population, or carry out some other computation on poor people’s incomes in relation to each other and to the rest of the population. Now, in the case of inequality one generally wishes to capture the effects of income differences over a much wider range. Hence it is perfectly possible for the measured extent of poverty to be declining over time, while at the same time and in the same society measured inequality increases due to changes in income differences within the non-poor segment of the population, or because of migrations between the two groups. (If you are in doubt about this you might like to have a look at Question 5 on page 14.) Poverty will make a few guest appearances in the course of this book, but on the whole our discussion of inequality has to take a slightly different track from the measurement of poverty.
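To make the possibility of falling poverty alongside rising inequality concrete, here is a small Python sketch. Everything in it is an assumption made for illustration: the five incomes, the poverty line of 100, and the choice of the coefficient of variation (one of the measures listed in Chapter 2) as the inequality index.

```python
from statistics import mean, pstdev

POVERTY_LINE = 100  # hypothetical poverty line


def headcount_ratio(incomes, line=POVERTY_LINE):
    """Share of the population with income strictly below the poverty line."""
    return sum(1 for y in incomes if y < line) / len(incomes)


def coefficient_of_variation(incomes):
    """Standard deviation divided by the mean: one possible inequality measure."""
    return pstdev(incomes) / mean(incomes)


# Hypothetical incomes for the same five people at two dates.
earlier = [80, 90, 150, 200, 300]
later = [80, 105, 150, 200, 400]

for label, data in (("earlier", earlier), ("later", later)):
    print(label,
          "headcount:", headcount_ratio(data),
          "coefficient of variation:", round(coefficient_of_variation(data), 3))
```

In this made-up case one person crosses the poverty line, so the head count falls, while the gain at the top of the distribution raises the coefficient of variation.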

1.4 Inequality and the social structure

Finally we return to the subject of the first ingredient, namely the basic social units used in studying inequality, or the elementary particles of which we imagine society to be constituted. The definition of the social unit, whether it be a single person, a nuclear family, or an extended family, depends intrinsically upon the social context, and upon the interpretation of inequality that we impose. Although it may seem natural to adopt an individualistic approach, some other ‘collective’ unit may be more appropriate.

When economic inequality is our particular concern, the theory of the development of the distribution of income or wealth may itself influence the choice of the basic social unit. To illustrate this, consider the classical view of an economic system, the population being subdivided into distinct classes of workers, capitalists, and landowners. Each class is characterized by a particular function in the economic order and by an associated type of income: wages, profits, and rents. If, further, each is regarded as internally fairly homogeneous, then it makes sense to pursue the analysis of inequality in class terms rather than in terms of individual units.

However, so simple a model is unsuited to describing inequality in a significantly heterogeneous society, despite the potential usefulness of class analysis for other social problems. A superficial survey of the world around us reveals rich and poor workers, failed and successful capitalists, and several people whose rôles and incomes do not fit into neat slots. Hence the focus of attention in this book is principally upon individuals rather than types, whether the analysis is interpreted in terms of economic inequality or some other sense.


Thus reduced to its essentials it might appear that we are dealing with a purely formal problem, which sounds rather dull. This is not so. Although the subject matter of this book is largely technique, the techniques involved are essential for coping with the analysis of many social and economic problems in a systematic fashion; and these problems are far from dull or uninteresting.

1.5 Questions

1. In Syldavia the economists find that (annual) household consumption c is related to (annual) income y by the formula

$$c = \alpha + \beta y,$$

where $\alpha > 0$ and $0 < \beta < 1$. Because of this, they argue, inequality of consumption must be less than inequality of income. Provide an intuitive argument for this.

2. Ruritanian society consists of three groups of people: Artists, Bureaucrats and Chocolatiers. Each Artist has high income (15,000 Ruritanian Marks) with a 50 per cent probability, and low income (5,000 RM) with 50 per cent probability. Each Bureaucrat starts working life on a salary of 5,000 RM and then benefits from an annual increment of 250 RM over the 40 years of his (perfectly safe) career. Chocolatiers get a straight annual wage of 10,000 RM. Discuss the extent of inequality in Ruritania according to annual income and lifetime income concepts.

3. In Borduria the government statistical service uses an inequality index that in principle can take any value greater than or equal to 0. You want to introduce a transformed inequality index that is ordinally equivalent to the original but that will always lie between zero and 1. Which of the following will do?

$$\frac{I}{I+1}, \quad \sqrt{\frac{I}{I+1}}, \quad \frac{I}{I-1}, \quad \sqrt{I}.$$

4. Methods for analysing inequality of income could be applied to inequality of use of specific health services (Williams and Doessel 2006). What would be the principal problems of trying to apply these methods to inequality of health status?

5. After a detailed study of a small village, government experts reckon that the poverty line is 100 rupees a month. In January a joint team from the Ministry of Food and the Central Statistical Office carry out a survey of living standards in the village: the income for each villager (in rupees per month) is recorded. In April the survey team repeats the exercise. The number of villagers was exactly the same as in January, and villagers’ incomes had changed only slightly. An extract from the results is as follows:

January    April
. . .      . . .
. . .      . . .
92         92
95         92
98         101
104        104
. . .      . . .
. . .      . . .

(the dots indicate the incomes of all the other villagers for whom income did not change at all from January to April). The Ministry of Food writes a report claiming that poverty has fallen in the village; the Central Statistical Office writes a report claiming that inequality has risen in the village. Can they both be right? (See Thon 1979, 1981, 1983b for more on this.)

6. In Fantasia there is a debate about educational policy. The current situation is that there are two equal-sized groups of people, the Dark-greys who all get an income of $200, and the Light-greys who all get an income of $600, as in the top part of the accompanying diagram, labelled ‘Parents’. One group of educational experts argue that if the Fantasian government adopts policy A then the future outcome for the next generation will be as shown on the left side of the diagram, labelled ‘Children’; another group of experts argue that if policy B is adopted, the outcome for the next generation will be that on the right side of the diagram (shading is used to show whether the children come from Dark-grey families or Light-grey families). According to your view:

• Which of policies A and B would produce lower inequality of outcome?
• Which policy produces higher social mobility?
• Which policy is characterized by lower inequality of opportunity?

FIG. 1.3. Alternative policies for Fantasia (panels (a) and (b), each showing ‘Parents’ above ‘Children’; horizontal axes: income in $, from 0 to 1,400)


2 Charting Inequality

F. Scott Fitzgerald: ‘The rich are different from us.’

Ernest Hemingway: ‘Yes, they have more money.’

If society really did consist of two or three fairly homogeneous groups, economists and others could be saved a lot of trouble. We could then simply look at the division of income between landlords and peasants, among workers, capitalists, and rentiers, or any other appropriate sections. Naturally we would still be faced with such fundamental issues as how much each group should possess or receive, whether the statistics are reliable, and so on, but questions such as ‘what is the income distribution?’ could be satisfactorily met with a snappy answer ‘65 per cent to wages, 35 per cent to profits’. Of course matters are not that simple. As we have argued, we want a way of looking at inequality that reflects both the depth of poverty of the ‘have nots’ of society and the height of well-being of the ‘haves’: it is not easy to do this just by looking at the income accruing to, or the wealth possessed by, two or three groups.

So in this chapter we will look at several quite well-known ways of presenting inequality in a large heterogeneous group of people. They are all methods of appraising the sometimes quite complicated information that is contained in an income distribution, and they can be grouped under three broad headings: diagrams, inequality measures, and rankings. To make the exposition easier I shall continue to refer to ‘income distribution’, but you should bear in mind, of course, that the principles can be carried over to the distribution of any other variable that you can measure and that you think is of economic interest.


2.1 Diagrams

Putting information about income distribution into diagrammatic form is a particularly instructive way of representing some of the basic ideas about inequality. There are several useful ways of representing inequality in pictures; the four that I shall discuss are introduced in the accompanying box. Let us have a closer look at each of them.

PICTURES OF INEQUALITY

• Parade of Dwarfs
• Frequency distribution
• Lorenz curve
• Log transformation

Jan Pen’s Parade of Dwarfs is one of the most persuasive and attractive visual aids in the subject of income distribution. Suppose that everyone in the population had a height proportional to his or her income, with the person on average income being endowed with average height. Line people up in order of height and let them march past in some given time interval, let us say one hour. Then the sight that would meet our eyes is represented by the curve in Fig. 2.1.¹ The whole parade passes in the interval represented by OC. But we do not meet the person with average income until we get to the point B (when well over half the parade has gone by). Divide total income by total population: this gives average or mean income (ȳ) and is represented by the height OA. We have oversimplified Pen’s original diagram by excluding from consideration people with negative reported incomes, which would involve the curve crossing the base line towards its left-hand end. And, in order to keep the diagram on the page, we have plotted the last point of the curve (D) in a position that would be far too low in practice.

This diagram highlights the presence of any extremely large incomes and, to a certain extent, abnormally small incomes. But we may have reservations about the degree of detail that it seems to impart concerning middle income receivers. We shall see this point recur when we use this diagram to derive an inequality measure that informs us about changes in the distribution.
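The construction of the Parade can be sketched in a few lines of Python. This is an illustration only: the ten incomes are hypothetical (they are not the UK data behind Fig. 2.1), and the function names are assumptions introduced here.

```python
def parade(incomes):
    """Return (position, height) pairs for a Parade of Dwarfs.

    position: proportion of the population that has passed once this person has gone by
    height:   income divided by mean income, so a person on mean income has height 1
    """
    ordered = sorted(incomes)
    n = len(ordered)
    mean = sum(ordered) / n
    return [((i + 1) / n, y / mean) for i, y in enumerate(ordered)]


def position_of_mean(incomes):
    """Proportion of the parade with income strictly below the mean (point B)."""
    mean = sum(incomes) / len(incomes)
    return sum(1 for y in incomes if y < mean) / len(incomes)


# Hypothetical right-skewed incomes with mean 20.
sample = [4, 6, 7, 8, 9, 10, 12, 15, 25, 104]

for position, height in parade(sample):
    print(f"{position:.1f}  {height:.2f}")
print("share of parade below the mean:", position_of_mean(sample))
```

With a long upper tail, as in this sample, most of the marchers are shorter than average, which is why point B lies well past the halfway mark.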

1 Those with especially sharp eyes will see that the source is more than 20 years old. There is a good reason for using these data: see the notes on page 180.

FIG. 2.1. The Parade of Dwarfs: UK income before tax, 1984/5 (horizontal axis: proportion of population, 0.0 to 1.0; vertical axis: income, up to £35,000). Source: Economic Trends, November 1987

Frequency distributions are well-tried tools of statisticians, and are discussed here mainly for the sake of completeness and as an introduction for those unfamiliar with the concept; for a fuller account see the references cited in the notes to this chapter. An example is found in Fig. 2.2. Suppose you were looking down on a field. On one side, the axis Oy, there is a long straight fence marked off by income categories: the physical distance between any two points along the fence directly corresponds to the income differences they represent. Then, get the whole population to come into the field and line up in the strip of land marked off by the piece of fence corresponding to their income bracket. So the £10,000-to-£12,500-a-year persons stand on the shaded patch. The shape that you get will resemble the stepped line in Fig. 2.2 (called a histogram), which represents the frequency distribution. It may be that we regard this as an empirical observation of a theoretical curve which describes the income distribution, for example the smooth curve drawn in Fig. 2.2. The relationship f(y) charted by this curve is sometimes known as a density function where the scale is chosen such that the area under the curve and above the line Oy is standardized at unity.

FIG. 2.2. Frequency distribution of income (horizontal axis: income y, £0 to £50,000; vertical axis: frequency f(y); the shaded area represents 1,000,000 income-receiving units; total population is 31,415,000 units). Source: as for Fig. 2.1

The frequency distribution shows the middle income ranges more clearly. But perhaps it is not so readily apparent what is going on in the upper tail; indeed, in order to draw the figure, we have deliberately made the length of the fence much too short. (On the scale of this diagram it ought to be 100 metres at least!) This diagram and the Parade of Dwarfs are, however, intimately related; and we show this by constructing Fig. 2.3 from Fig. 2.2. The horizontal scale of each figure is identical. On the vertical scale of Fig. 2.3 we plot ‘cumulative frequency’, written F(y), which is proportional to the area under the curve and to the left of y in Fig. 2.2. If you experiment with the diagram you will see that as you increase y, F(y) usually goes up (it can never decrease), from a value of zero when you start at the lowest income received, up to a value of one for the highest income. Thus, supposing we consider y = £30,000, we plot a point in Fig. 2.3 that corresponds to the proportion of the population with £30,000 or less. And we can repeat this operation for every point on either the empirical curve or on the smooth theoretical curve.
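The step from the histogram to the cumulative frequency is a running total of the bracket counts, divided by the size of the population. A minimal Python sketch follows; the four bracket counts are hypothetical and the function name is an assumption made for illustration.

```python
from itertools import accumulate


def cumulative_frequency(counts):
    """Turn per-bracket head counts into F evaluated at each bracket's upper edge."""
    total = sum(counts)
    return [running / total for running in accumulate(counts)]


# Hypothetical bracket counts (thousands of units), lowest income bracket first.
counts = [200, 500, 250, 50]
F = cumulative_frequency(counts)
print(F)
```

Because the counts are never negative, the resulting values can stay level or rise but never fall, and the last one is always one.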

The visual relationship between Figs 2.1 and 2.3 is now obvious. As a further point of reference, the position of mean income has been drawn in at the point A in the two figures. (If you still don’t see it, try turning the page round!)

FIG. 2.3. Cumulative frequency distribution (horizontal axis: income y, £0 to £50,000; vertical axis: cumulative frequency F(y), 0 to 1). Source: as for Fig. 2.1

The Lorenz curve was introduced in 1905 as a powerful method of illustrating the inequality of wealth distribution. A simplified explanation of it runs as follows.

Once again, line up everybody in ascending order of incomes and let them parade by. Measure F(y), the proportion of people who have passed by, along the horizontal axis of Fig. 2.4. Once point C is reached everyone has gone by, so F(y) = 1. Now as each person passes, hand him his share of the ‘cake’, that is, the proportion of total income that he receives. When the parade reaches people with income y, let us suppose that a proportion Φ(y) of the cake has gone. So of course when F(y) = 0, Φ(y) is also 0 (no cake gone); and when F(y) = 1, Φ(y) is also 1 (all the cake has been handed out). Φ(y) is measured on the vertical scale in Fig. 2.4, and the graph of Φ plotted against F is the Lorenz curve. Note that it is always convex toward the point C, the reason for which is easy to see. Suppose that the first 10 per cent have filed by (F(y1) = 0.1) and you have handed out 4 per cent of the cake (Φ(y1) = 0.04); then by the time the next 10 per cent of the people go by (F(y2) = 0.2), you must have handed out at least 8 per cent of the cake (Φ(y2) = 0.08). Why? Because we arranged the parade in ascending order of cake-receivers. Notice too that if the Lorenz curve lay along OD we would have a state of perfect equality, for along that line the first 5 per cent get 5 per cent of the cake, the first 10 per cent get 10 per cent . . . and so on.

FIG. 2.4. Lorenz curve of income, Φ = L(F) (horizontal axis: proportion of population, F(y), 0.0 to 1.0; vertical axis: proportion of income, Φ(y), 0.0 to 1.0). Source: as for Fig. 2.1

The Lorenz curve incorporates some principles that are generally regarded as fundamental to the theory of inequality measurement, as we will see later in this chapter (page 34) and also in Chapter 3 (pages 46 and 62). And again there is a nice relationship with Fig. 2.1. If we plot the slope of the Lorenz curve against the cumulative population proportions, F, then we are back precisely to the Parade of Dwarfs (scaled so that mean income equals unity). Once again, to facilitate comparison, the position where we meet the person with mean income has been marked as point B, although in the Lorenz diagram we cannot represent mean income itself. Note that the mean occurs at a value of F such that the tangent to the Lorenz curve is parallel to OD.
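Both the construction of the Lorenz curve and its link to the Parade can be checked numerically. The Python sketch below is an illustration under stated assumptions: the five incomes are hypothetical, the function names are introduced here, and the curve is approximated by straight segments joining one point per person.

```python
def lorenz_points(incomes):
    """Return [(F, Phi)] for incomes taken in ascending order, starting at (0, 0)."""
    ordered = sorted(incomes)
    n, total = len(ordered), sum(ordered)
    points, running = [(0.0, 0.0)], 0.0
    for i, y in enumerate(ordered, start=1):
        running += y  # cake handed out so far
        points.append((i / n, running / total))
    return points


def segment_slopes(points):
    """Slope of each straight segment joining consecutive Lorenz points."""
    return [(p1 - p0) / (f1 - f0)
            for (f0, p0), (f1, p1) in zip(points, points[1:])]


sample = [4, 6, 10, 20, 60]  # hypothetical incomes, mean = 20
pts = lorenz_points(sample)
slopes = segment_slopes(pts)

print(pts)
print(slopes)  # each slope is that person's income divided by mean income
```

The slopes never decrease, which is the convexity noted in the text, and the segment with slope one belongs to the person on mean income.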

Logarithmic transformation. An irritating problem that arises in drawing the frequency curve of Fig. 2.2 is that we must either ignore some of the very large incomes in order to fit the diagram on the page, or put up with a diagram that obscures much of the detail in the middle and lower income ranges. We can avoid this to some extent by drawing a similar frequency distribution, but plotting the horizontal axis on a logarithmic scale as in Fig. 2.5. Equal distances along the horizontal axis correspond to equal proportionate income differences.

Again the point corresponding to mean income, ȳ, has been marked in as A. Note that the length OA equals log(ȳ) and is not the mean of the logarithms of income. This is marked in as the point A′, so that the length OA′ = log(y*) where y* is the geometric mean of the distribution. Assuming incomes are non-negative, the geometric mean, found by taking the mean of the logarithms and then transforming back to natural numbers, can never exceed the conventional arithmetic mean.
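The distinction between the two means is easy to verify. In the Python sketch below the incomes are hypothetical and the function names are assumptions; note that taking logarithms requires strictly positive incomes.

```python
import math


def arithmetic_mean(incomes):
    """Total income divided by the number of income receivers."""
    return sum(incomes) / len(incomes)


def geometric_mean(incomes):
    """Mean of the logarithms, transformed back; needs strictly positive incomes."""
    return math.exp(sum(math.log(y) for y in incomes) / len(incomes))


sample = [2_000, 5_000, 10_000, 20_000, 100_000]  # hypothetical incomes in £

y_bar = arithmetic_mean(sample)
y_star = geometric_mean(sample)
print("arithmetic mean:", y_bar)
print("geometric mean: ", round(y_star, 2))
print("log of mean vs mean of logs:", math.log(y_bar), math.log(y_star))
```

On a logarithmic axis A′ therefore lies to the left of A unless all incomes are equal, in which case the two points coincide.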

We have now seen four different ways of presenting pictorially the same facts about income distribution. Evidently each graphical technique may emphasize quite different features of the distribution: the Parade draws attention to the enormous height of the well-off; the frequency curve presents middle incomes more clearly; the logarithmic transformation captures information from each of the ‘tails’ as well as the middle, but at the same time sacrifices simplicity and ease of interpretation. This difference in emphasis is partly reflected in the inequality measures derived from the diagrams.

FIG. 2.5. Frequency distribution of income (logarithmic scale) (horizontal axis: income y on a log scale, £2,000 to £100,000; vertical axis: frequency f(y); the shaded area represents 250,000 income-receiving units; total population is 31,415,000 units). Source: as for Fig. 2.1

2.2 Inequality measures

We can use Figs 2.1 to 2.5 in order to introduce and illustrate some conventional inequality measures. A few of the more important ones that we shall encounter are listed in the accompanying box. Of course, an inequality measure, like any other tool, is to be judged by the kind of job that it does: is it suitably sensitive to changes in the pattern of distribution? Does it respond appropriately to changes in the overall scale of incomes? As we go through the items in the box we will briefly consider their principal properties (a proper job must wait until page 67, after we have considered the important analytical points introduced in Chapter 3).

INEQUALITY MEASURES

• Range R
• Relative mean deviation M
• Variance V
• Coefficient of variation c
• Gini coefficient G
• Log variance v

The Parade of Dwarfs suggests the first two of these. First, we have the range, which we define simply as the distance CD in Fig. 2.1 or:

R = ymax − ymin,

where ymax and ymin are, respectively, the maximum and minimum values of income in the parade (we may, of course, standardize by considering R/ymin or R/ȳ). Plato apparently had this concept in mind when he made the following judgement:

We maintain that if a state is to avoid the greatest plague of all (I mean civil war, though civil disintegration would be a better term), extreme poverty and wealth must not be allowed to arise in any section of the citizen-body, because both lead to both these disasters. That is why the legislator must now announce the acceptable limits of wealth and poverty. The lower limit of poverty must be the value of the holding. The legislator will use the holding as his unit of measure and allow a man to possess twice, thrice, and up to four times its value.

The Laws, 745.

The problems with the range are evident. Although it might be satisfactory in a small closed society where everyone’s income is known fairly certainly, it is clearly unsuited to large, heterogeneous societies where the ‘minimum’ and ‘maximum’ incomes can at best only be guessed. The measure will be highly sensitive to the guesses or estimates of these two extreme values. In practice one might try to get around the problem by using a related concept that is more robust: take the gap between the income of the person who appears exactly at, say, the end of the first three minutes in the Parade, and that of the person exactly at the 57th minute (the bottom 5 per cent and the top 5 per cent of the line of people) or the income gap between the people at the 6th and 54th minute (the bottom 10 per cent and the top 10 per cent of the line of people). However, even if we did that there is a more compelling reason for having doubts about the usefulness of R. Suppose we can wave a wand and bring about a society where the person at position O and the person at position C are left at the same height, but where everyone else in between was levelled to some equal, intermediate height. We would probably agree that inequality had been reduced, though not eliminated. But according to R it is just the same!
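The wand-waving argument is easy to reproduce numerically. The sketch below is illustrative only: the five incomes are hypothetical, and the levelling rule (everyone between the extremes receives the average of the middle incomes, so that total income is unchanged) is one assumed way of carrying out the thought experiment.

```python
def income_range(incomes):
    """R = y_max - y_min."""
    return max(incomes) - min(incomes)

# Hypothetical parade, ordered from poorest to richest.
before = [1_000, 3_000, 6_000, 12_000, 40_000]

# Level everyone between the two extremes to one intermediate income,
# keeping total income unchanged.
middle = before[1:-1]
level = sum(middle) / len(middle)
after = [before[0]] + [level] * len(middle) + [before[-1]]

print("before:", before, "R =", income_range(before))
print("after: ", after, "R =", income_range(after))
```

The two values of R coincide even though the second distribution is plainly more equal in the middle.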

You might be wondering whether the problem with R arises because it ignores much of the information about the distribution (it focuses just on a couple of extreme incomes). Unfortunately we shall find a similar criticism in subtle form attached to the second inequality measure that we can read off the Parade diagram, one that uses explicitly the income values of all the individuals. This is the relative mean deviation, which is defined as the average absolute distance of everyone’s income from the mean, expressed as a proportion of the mean. Take a look at the shaded portions in Fig. 2.1. These portions, which are necessarily of equal size, constitute the area between the Parade curve itself and the horizontal line representing mean income. In some sense, the larger this area, the greater is inequality. (Try drawing the Parade with more giants and more dwarfs.) It is conventional to standardize the inequality measure in unit-free terms, so let us divide by the total income (which equals area OCGA). In terms of the diagram the relative mean deviation is then:²

$$M = \frac{\text{area OAQ} + \text{area QGD}}{\text{area OCGA}}.$$

2 You are invited to check the technical appendix (p. 153) for formal definitions of this and other inequality measures.

FIG. 2.6. The Parade with partial equalization [Vertical axis: income y. Labelled points: O, A, B, C, D, G, Q.]

But now for the fatal weakness of M. Suppose you think that the stature of the dwarfs to the left of B is socially unacceptable. You arrange a reallocation of income so that everyone with incomes below the mean (to the left of point B) has exactly the same income. The modified parade then looks like Fig. 2.6. But notice that the two shaded regions in Fig. 2.6 are exactly the same area as in Fig. 2.1: so the value of M has not changed. Whatever reallocation you arrange among people to the left of B only, or among people to the right of B only, inequality according to the relative mean deviation stays the same.
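A small numerical sketch makes the same point. It uses the verbal definition of M given above (average absolute distance from the mean, as a proportion of the mean); the incomes and the helper `equalize_below_mean` are hypothetical and are introduced here only for illustration.

```python
def relative_mean_deviation(incomes):
    """Average absolute distance from the mean, as a proportion of the mean."""
    n = len(incomes)
    mean = sum(incomes) / n
    return sum(abs(y - mean) for y in incomes) / (n * mean)

def equalize_below_mean(incomes):
    """Give everyone below the mean the same income, keeping their total fixed.

    Assumes at least one income lies below the mean.
    """
    mean = sum(incomes) / len(incomes)
    below = [y for y in incomes if y < mean]
    shared = sum(below) / len(below)
    return [shared if y < mean else y for y in incomes]

# Hypothetical incomes; the mean is 12,400, so four people lie to the left of B.
before = [1_000, 3_000, 6_000, 12_000, 40_000]
after = equalize_below_mean(before)

print("M before:", relative_mean_deviation(before))
print("M after: ", relative_mean_deviation(after))
```

Both calls return the same value of M, although the dwarfs in the second parade are all of equal height.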

The relative mean deviation can be easily derived from the Lorenz curve (Fig. 2.4). From the Technical Appendix, page 156, it can be verified that M = 2[F(ȳ) − Φ(ȳ)], that is: M = 2[OB − BP]. However, a more common use of the Lorenz curve diagram is to derive the Gini coefficient, G, expressed as the ratio of the shaded area in Fig. 2.4 to the area OCD. There is a variety of equivalent ways of defining G; but perhaps the easiest definition is as the average difference between all possible pairs of incomes in the population, expressed as a proportion of total income: see pages 155 and 156 for a formal definition. The main disadvantage of G is that it places a rather curious implicit relative value on changes that may occur in different parts of the distribution. An income transfer from a relatively rich person to a person with £x less has a much greater effect on G if the two persons are near the middle rather than at either end of the parade.³ So, consider transferring £1 from a person with £10,100 to a person with £10,000. This has a much greater effect on reducing G than transferring £1 from a person with £1,100 to one with £1,000 or than transferring £1 from a person with £100,100 to a person with £100,000. This valuation may be desirable, but it is not obvious that it is desirable: this point about the valuation of transfers is discussed more fully in Chapter 3 once we have discussed social welfare explicitly.
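The dependence of G on where a transfer takes place can be illustrated with a short sketch. It assumes the standard pairwise-difference form of the Gini coefficient, the sum of |yi − yj| over all ordered pairs divided by 2n²ȳ, which may differ in presentation from the formal definition in the Technical Appendix. The ten incomes are hypothetical, with most people bunched in the middle as in Fig. 2.2.

```python
def gini(incomes):
    """Gini coefficient as the sum of |yi - yj| over all pairs, over 2 * n^2 * mean."""
    n = len(incomes)
    mean = sum(incomes) / n
    total_abs_diff = sum(abs(a - b) for a in incomes for b in incomes)
    return total_abs_diff / (2 * n * n * mean)

def gini_drop(incomes, donor, recipient, amount=1):
    """Fall in G after moving `amount` from incomes[donor] to incomes[recipient]."""
    after = list(incomes)
    after[donor] -= amount
    after[recipient] += amount
    return gini(incomes) - gini(after)

# Hypothetical population: two poor people, six in the middle, two rich.
incomes = [1_000, 1_100,
           10_000, 10_020, 10_040, 10_060, 10_080, 10_100,
           100_000, 100_100]

drop_low = gini_drop(incomes, donor=1, recipient=0)    # 1,100 -> 1,000
drop_mid = gini_drop(incomes, donor=7, recipient=2)    # 10,100 -> 10,000
drop_high = gini_drop(incomes, donor=9, recipient=8)   # 100,100 -> 100,000

print(f"low: {drop_low:.3e}  middle: {drop_mid:.3e}  high: {drop_high:.3e}")
```

In this made-up population four people stand between the two parties to the mid-range transfer and nobody stands between the parties at either end, which is why the middle transfer moves G the most.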

Other inequality measures can be derived from the Lorenz curve in Fig. 2.4. Two have been suggested in connection with the problem of measuring inequality in the distribution of power, as reflected in voting strength. First, consider the income level y0 at which half the national cake has been distributed to the parade; that is, Φ(y0) = 1/2. Then define the minimal majority inequality measure as F(y0), which is the distance OH. If Φ is reinterpreted as the proportion of seats in an elected assembly where the votes are spread unevenly among the constituencies, as reflected by the Lorenz curve, and if F is reinterpreted as a proportion of the electorate, then 1 − F(y0) represents the smallest proportion of the electorate that can secure a majority in the elected assembly. Second, we have the equal shares coefficient, defined as F(ȳ): the proportion of the population that has income ȳ or less (the distance OB), or the proportion of the population that has ‘average voting strength’ or less. Clearly, either of these measures as applied to the distribution of income or wealth is subject to essentially the same criticism as the relative mean deviation: they are insensitive to transfers among members of the Parade on the same side of the person with income y0 (in the case of the minimal majority measure) or ȳ (the equal shares coefficient): in effect they measure changes in inequality by only recording transfers between two broadly based groups.
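Both measures can be computed from a list of incomes, as in the sketch below. Two assumptions are made that are not in the original text: the Lorenz curve is taken to be piecewise linear between the observed points, so that F(y0) is found by linear interpolation, and all incomes are strictly positive. The data are hypothetical.

```python
def lorenz_points(incomes):
    """Points (F, Phi) of the Lorenz curve, starting from (0, 0)."""
    ys = sorted(incomes)
    n, total = len(ys), sum(ys)
    points, running = [(0.0, 0.0)], 0
    for k, y in enumerate(ys, start=1):
        running += y
        points.append((k / n, running / total))
    return points

def minimal_majority(incomes):
    """F(y0), where Phi(y0) = 1/2, by linear interpolation on the Lorenz curve."""
    pts = lorenz_points(incomes)
    for (f0, phi0), (f1, phi1) in zip(pts, pts[1:]):
        if phi1 >= 0.5:
            return f0 + (f1 - f0) * (0.5 - phi0) / (phi1 - phi0)

def equal_shares(incomes):
    """F(mean): proportion of the population with mean income or less."""
    mean = sum(incomes) / len(incomes)
    return sum(1 for y in incomes if y <= mean) / len(incomes)

# Hypothetical incomes, and the same incomes after equalization below the mean.
before = [1_000, 3_000, 6_000, 12_000, 40_000]
after = [5_500, 5_500, 5_500, 5_500, 40_000]

for label, data in (("before", before), ("after", after)):
    print(label, minimal_majority(data), equal_shares(data))
```

Neither measure registers the equalization among the four poorest people, which is the insensitivity described in the text.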

Now let us consider Figs 2.2 and 2.5: the frequency distribution and its log-transformation. An obvious suggestion is to measure inequality in the same way as statisticians measure dispersion of any frequency distribution. In this application, the usual method would involve measuring the distance between the individual’s income yi and mean income ȳ, squaring this, and then finding the average of the resulting quantity in the whole population. Assuming that there are n people we define the variance:

$$V = \frac{1}{n}\sum_{i=1}^{n}\left[y_i - \bar{y}\right]^2. \tag{2.1}$$

3 To see why, check the definition of G on page 155 and note the formula for the ‘Transfer Effect’ (right-hand column). Now imagine persons i and j located at two points yi and yj, a given distance x apart, along the fence described on page 19; if there are lots of other persons in the part of the field between those two points then the transfer-effect formula tells us that the impact of a transfer from i to j will be large (F(yj) − F(yi) is a large number) and vice versa.

It so happens that real-world frequency distributions of income look like that in Fig. 2.2 (with a peak in the mid-income range rather than at either end), so that two income receivers, £100 apart, have many people between them if they are located in the mid-income range, but rather few people between them if located at one end or other.

However, V is unsatisfactory in that were we simply to double everyone’s incomes (and thereby double mean income and leave the shape of the distribution essentially unchanged), V would quadruple. One way round this problem is to standardize V. Define the coefficient of variation thus:

$$c = \frac{\sqrt{V}}{\bar{y}}. \tag{2.2}$$

Another way to avoid this problem is to look at the variance in terms of the logarithms of income, that is, to apply the transformation illustrated in Fig. 2.5 before evaluating the inequality measure. In fact there are two important definitions:

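$$v = \frac{1}{n}\sum_{i=1}^{n}\left[\log\left(\frac{y_i}{\bar{y}}\right)\right]^2, \tag{2.3}$$

$$v_1 = \frac{1}{n}\sum_{i=1}^{n}\left[\log\left(\frac{y_i}{y^*}\right)\right]^2. \tag{2.4}$$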

The first of these we will call the logarithmic variance, and the second we may more properly term the variance of the logarithms of incomes. Note that v is defined relative to the logarithm of mean income; v1 is defined relative to the mean of the logarithm of income. Either definition is invariant under proportional increases in all incomes.
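The scale properties of the four measures in equations (2.1) to (2.4) can be checked with a few lines of Python. This is an illustrative sketch rather than anything taken from the original text: the incomes are hypothetical and must be strictly positive for the two logarithmic measures.

```python
import math

def variance(incomes):
    """V, equation (2.1)."""
    n = len(incomes)
    mean = sum(incomes) / n
    return sum((y - mean) ** 2 for y in incomes) / n

def coefficient_of_variation(incomes):
    """c, equation (2.2)."""
    mean = sum(incomes) / len(incomes)
    return math.sqrt(variance(incomes)) / mean

def logarithmic_variance(incomes):
    """v, equation (2.3): deviations from the log of mean income."""
    n = len(incomes)
    mean = sum(incomes) / n
    return sum(math.log(y / mean) ** 2 for y in incomes) / n

def variance_of_logarithms(incomes):
    """v1, equation (2.4): deviations from the mean of log income."""
    n = len(incomes)
    mean_log = sum(math.log(y) for y in incomes) / n
    return sum((math.log(y) - mean_log) ** 2 for y in incomes) / n

# Hypothetical incomes, then the same incomes doubled.
incomes = [2_000, 5_000, 10_000, 50_000]
doubled = [2 * y for y in incomes]

for measure in (variance, coefficient_of_variation,
                logarithmic_variance, variance_of_logarithms):
    print(measure.__name__, measure(incomes), measure(doubled))
```

Doubling every income multiplies V by four, while c, v, and v1 are unchanged up to floating-point rounding.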

We shall find that v1 has much to recommend it when we come to examine the lognormal distribution in Chapter 4. However, c, v, and v1 can be criticized more generally on grounds similar to those on which G was criticized. Consider a transfer of £1 from a person with y to a person with y − £100. How does this transfer affect these inequality measures? In the case of c, it does not matter in the slightest where in the parade this transfer is effected: so whether the transfer is from a person with £500 to a person with £400, or from a person with £100,100 to a person with £100,000, the reduction in c is exactly the same. Thus c will be particularly good at capturing inequality among high incomes, but may be of more limited use in reflecting inequality elsewhere in the distribution. In contrast to this property of c, there appears to be good reason to suggest that a measure of inequality should have the property that a transfer of the above type carried out in the low income brackets would be quantitatively more effective in reducing inequality than if the transfer were carried out in the high income brackets. The measures v and v1 appear to go some way towards meeting this objection. Taking the example of the UK in 1984/5 (illustrated in Figs 2.1 to 2.5 where we have ȳ = £7,522), a transfer of £1 from a person with £10,100 to a person with £10,000 reduces v and v1 less than a transfer of £1 from a person with £500 to a person with £400. But, unfortunately, v and v1 ‘overdo’ this effect, so to speak. For if we consider a transfer from a person with £100,100 to a person with £100,000 then inequality, as measured by v or v1, increases! This is hardly a desirable property for an inequality measure to possess, even if it does occur only at high incomes.⁴
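The perverse behaviour of v and v1 at high incomes can be reproduced with hypothetical data. The nine incomes below are invented for illustration and are not the UK 1984/5 figures; they are chosen so that £100,000 lies well above 2.72 times their mean. The sketch applies the same £1 transfer at three points in the distribution and reports the change in each measure.

```python
import math

def logarithmic_variance(incomes):
    """v, equation (2.3)."""
    n = len(incomes)
    mean = sum(incomes) / n
    return sum(math.log(y / mean) ** 2 for y in incomes) / n

def variance_of_logarithms(incomes):
    """v1, equation (2.4)."""
    n = len(incomes)
    mean_log = sum(math.log(y) for y in incomes) / n
    return sum((math.log(y) - mean_log) ** 2 for y in incomes) / n

def change_after_transfer(measure, incomes, donor, recipient, amount=1):
    """Change in the measure after moving `amount` from donor to recipient."""
    after = list(incomes)
    after[donor] -= amount
    after[recipient] += amount
    return measure(after) - measure(incomes)

# Hypothetical incomes (not the UK data used in the figures).
incomes = [400, 500, 3_000, 5_000, 7_000, 10_000, 10_100, 100_000, 100_100]

changes = {}
for measure in (logarithmic_variance, variance_of_logarithms):
    low = change_after_transfer(measure, incomes, donor=1, recipient=0)
    mid = change_after_transfer(measure, incomes, donor=6, recipient=5)
    high = change_after_transfer(measure, incomes, donor=8, recipient=7)
    changes[measure.__name__] = (low, mid, high)
    print(f"{measure.__name__}: low {low:+.2e}, mid {mid:+.2e}, high {high:+.2e}")
```

For both measures the low-income transfer produces the largest fall, the mid-range transfer a smaller fall, and the high-income transfer a small rise.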

Other statistical properties of the frequency distribution may be pressed into service as inequality indices. While these may draw attention to particular aspects of inequality (such as dispersion among the very high or very low incomes), by and large they miss the point as far as general inequality of incomes is concerned. Consider, for example, measures of skewness. For symmetric distributions (such as the Normal distribution, pictured on page 81) these measures are zero; but this zero value of the measure may be consistent with either a very high or a very low dispersion of incomes (as measured by the coefficient of variation). This does not appear to capture the essential ideas of inequality measurement.

Figure 2.2 can be used to derive an inequality measure from quite a different source. Stark (1972) argued that an appropriate practical method of measuring inequality should be based on society’s revealed judgements on the definition of poverty and riches. The method is best seen by redrawing Fig. 2.2 as Fig. 2.7. Stark’s study concentrated specifically on UK incomes, but the idea it embodies seems intuitively very appealing and could be applied more generally. The distance OP in Fig. 2.7 we will call the range of ‘low incomes’: P could have been fixed with reference to the income level at which a person becomes entitled to income support, adjusted for need, or with reference to some proportion of average income⁵ (this is very similar to the specification of a ‘poverty line’). The point R could be determined by the level at which one becomes liable to any special taxation levied on the rich, again adjusted for need.⁶ The high–low index is then the total shaded area between the curve and the horizontal axis.

The high–low index seems imaginative and practical, but it suffers from three important weaknesses. First, it is subject to exactly the same type of criticism that we levelled against M, and against the ‘minimal majority’ and ‘equal share’ measures: the measure is completely insensitive to transfers among the ‘poor’ (to the left of P), among the ‘rich’ (to the right of R), or among the ‘middle income receivers’. Second, there is an awkward dilemma concerning the behaviour of points P and R over time. Suppose one leaves them fixed in relative terms, so that OP and OR increase only at the same rate as mean or median income increases over time. Then one faces the criticism that one’s current criterion for measuring inequality is based on an arbitrary standard fixed, perhaps, a quarter of a century ago. On the other hand, suppose that OP and OR increase with year-to-year increases in some independent reference income levels (the ‘income-support’ threshold for point P and the ‘higher-rate tax’ threshold for point R): then, if the inequality measure shows a rising trend because of more people falling in the ‘low income’ category, one must face the criticism that this is just an optical illusion created by altering, for example, the definition of ‘poor’ people; some compromise between the two courses must be chosen and the results derived for a particular application treated with caution.⁷ Third, there is the point that in practice the contribution of the shaded area in the upper tail to the inequality measure would be negligible: the behaviour of the inequality measure would be driven by what happens in the lower tail (which may or may not be an acceptable feature) and would simplify effectively to whether people ‘fall in’ on the right or on the left of point P when we arrange them in the frequency distribution diagram (Figs 2.2 and 2.7). In effect the high–low inequality index would become a slightly modified poverty index.

FIG. 2.7. The high–low approach [Axes: income from £0 to £50,000 (horizontal), frequency f(y) (vertical). Points O, P, A (at mean income ȳ) and R are marked on the income axis.]

4 You will always get this trouble if the ‘poorer’ of the two persons has at least 2.72 times mean income, in this case £20,447: see the Technical Appendix, page 164.

5 In Fig. 2.7 it has been located at half median income; check Question 1 on page 37 if you are unsure about how to define the median.

6 Note that in a practical application the positions of both P and R depend on family composition. This however is a point which we are deferring until later. Figure 2.7 illustrates one type.

7 There is a further complication in the specific UK application considered by Stark. He fixed point P using the basic national assistance (later supplementary benefit) scale plus a percentage to allow for underestimation of income and income disregarded in applying for assistance (benefit); point R was fixed by the point at which one became liable for surtax. However, national assistance, supplementary benefit, and surtax are no more. Other politically or socially defined P and R points could be determined for other times and other countries; but the basic problem of comparisons over time that I have highlighted would remain. So too, of course, would problems of comparisons between countries.

The use of any one of the measures we have discussed in this section implies certain value judgements concerning the way we compare one person’s income against that of another. The detail of such judgements will be explained in the next chapter, although we have already seen a glimpse of some of the issues.

2.3 Rankings

Finally, we consider ways of looking at inequality that may lead to ambiguous results. Let me say straight away that this sort of non-decisive approach is not necessarily a bad thing. As we noted in Chapter 1 it may be helpful to know that over a particular period events have altered the income distribution in such a way that we find offsetting effects on the amount of inequality. The inequality measures that we have examined in the previous section act as ‘tie-breakers’ in such an event. Each inequality measure resolves the ambiguity in its own particular way. Just how we should resolve these ambiguities is taken up in more detail in Chapter 3.

TYPES OF RANKING

• Quantiles
• Shares

The two types of ranking on which we are going to focus are highlighted in the accompanying box. To anticipate the discussion a little I should point out that these two concepts are not really new to this chapter, because they each have a simple interpretation in terms of the pictures that we were looking at earlier. In fact I could have labelled the items in the box as Parade rankings and Lorenz rankings.

We have already encountered quantiles when we were discussing the incomes of the 3rd and 57th minute people as an alternative to the range, R (page 25). Quantiles are best interpreted using either the Parade diagram or its equivalent, the cumulative frequency distribution (Fig. 2.3). Take the Parade diagram and reproduce it in Fig. 2.8 (the parade of Fig. 2.1 is the solid curve labelled 1984/5; we will come to the other two curves in a moment). Mark the point 0.2 on the horizontal axis, and read off the corresponding income on the vertical axis: this gives the 20 per cent quantile (usually known as the first quintile just to confuse you): the income at the right-hand end of the first fifth (12 minutes) of the Parade of Dwarfs. Figure 2.8 also shows how we can do the same for the 80 per cent quantile (the top quintile). In general we specify a p-quantile, which I will write Qp, as follows. Form the Parade of Dwarfs and take the leading proportion p of the Parade (where of course 0 ≤ p ≤ 1), then Qp is the particular income level which demarcates the right-hand end of this section of the Parade.⁸

FIG. 2.8. The Parade and the quantile ranking [Axes: population proportion F(y) from 0.0 to 1.0 (horizontal), income y from £0 to £40,000 (vertical). Curves: 1981/2, 1984/5 and ‘Hypothetical’; the quantiles Q0.2 and Q0.8 are marked.]

8 A note on ‘iles’. The generic term is ‘quantile’ (which applies to any specified population proportion p), but a number of special names for particular convenient cases are in use. There is the median Q0.5, and a few standard sets such as three quartiles (Q0.25, Q0.5, Q0.75), four quintiles (Q0.2, Q0.4, Q0.6, Q0.8) or nine deciles (Q0.1, Q0.2, Q0.3, Q0.4, Q0.5, Q0.6, Q0.7, Q0.8, Q0.9); of course you can specify as many other ‘standard’ sets of quantiles as your patience and your knowledge of Latin prefixes permit.

I have avoided using the term ‘quantile group’, which is sometimes found in the literature and refers to a slice of the population demarcated by two quantiles. For example the slice of the population with incomes at least as great as Q0.1 but less than Q0.2 could be referred to as the ‘second decile group’. I have avoided the term because it could be confusing. However, you may also find references to such a slice of the population as ‘the second decile’: this usage is not just confusing, it is wrong; the quantiles are the points on the income scale, not the slices of the population that may be located between the points.

FIG. 2.9. Quantile ratios of earnings of adult men, UK 1968–2007. Source: Annual Survey of Hours and Earnings [Series: Q0.9/Q0.5, Q0.75/Q0.5, Q0.25/Q0.5 and Q0.1/Q0.5, as percentages (vertical axis, 0 to 250) against year (horizontal axis, 1965 to 2005).]

How might we use a set of quantiles to compare income distributions? We could produce something like Fig. 2.9, which shows the proportionate movements of the quantiles of the frequency distribution of earnings in the UK in recent years (the diagram has been produced by standardizing the movements of Q0.1, Q0.25, Q0.75, and Q0.9, by the median, Q0.5). We then check whether the quantiles are moving closer together or farther apart over time. But although the kind of moving apart that we see at the right-hand side of Fig. 2.9 appears to indicate greater dispersion, it is not clear that this necessarily means greater inequality: the movement of the corresponding income shares (which we discuss in a moment) could in principle be telling us a different story.⁹

However, we might also be interested in the simple quantile ranking of the distributions, which focuses on the absolute values of the quantiles, rather than quantile ratios. For example, suppose that over time all the quantiles of the distribution increase by 30 per cent as shown by the curve labelled ‘hypothetical’ in Fig. 2.8 (in the jargon we then say that according to the quantile ranking the new distribution dominates the old one). Then we might say ‘there are still lots of dwarfs about’, to which the reply might be ‘yes but at least everybody is a bit taller’. Even if we cannot be specific about whether this means that there is more or less inequality as a result, the phenomenon of a clear quantile ranking is telling us something interesting about the income distribution which we will discuss more in the next chapter. On the other hand if we were to compare 1981/2 and 1984/5 in Fig. 2.8 we would have to admit that over the three-year period the giants became a little taller (Q0.8 increased slightly), but the dwarfs became even shorter (Q0.2 decreased slightly): the 1984/5 distribution does not dominate that for 1981/2.

9 In case this is not obvious, consider a population with just 8 people in it: in year A the income distribution is (2, 3, 3, 4, 5, 6, 6, 7), and it is fairly obvious that Q0.25 = 3 and Q0.75 = 6; in year B the distribution becomes (0, 4, 4, 4, 5, 5, 5, 9) and we can see now that Q0.25 = 4 and Q0.75 = 5. Mean income and median income have remained unchanged and the quartiles have narrowed: but has inequality really gone down? The story from the shares suggests otherwise: the share of the bottom 25 per cent has actually fallen (from 5/36 to 4/36) and the share of the top 25 per cent has risen (from 13/36 to 14/36).
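The eight-person example in footnote 9 can be verified directly. In the sketch below the quantile rule (Qp is the income of person number ⌈pn⌉ in the Parade) is an assumption adopted here because it reproduces the quartiles quoted in the footnote; other conventions exist. Shares are held as exact fractions.

```python
import math
from fractions import Fraction

def quantile(incomes, p):
    """Income at the right-hand end of the leading proportion p of the Parade.

    Rule assumed here: the income of person number ceil(p * n).
    """
    ys = sorted(incomes)
    k = max(1, math.ceil(p * len(ys)))
    return ys[k - 1]

def share_of_bottom(incomes, p):
    """Income share of the poorest proportion p (assumes p * n is a whole number)."""
    ys = sorted(incomes)
    k = round(p * len(ys))
    return Fraction(sum(ys[:k]), sum(ys))

def share_of_top(incomes, p):
    """Income share of the richest proportion p."""
    return 1 - share_of_bottom(incomes, 1 - p)

year_a = [2, 3, 3, 4, 5, 6, 6, 7]
year_b = [0, 4, 4, 4, 5, 5, 5, 9]

for label, data in (("A", year_a), ("B", year_b)):
    print(label,
          "Q0.25 =", quantile(data, 0.25),
          "Q0.75 =", quantile(data, 0.75),
          "bottom 25% share =", share_of_bottom(data, 0.25),
          "top 25% share =", share_of_top(data, 0.25))
```

The quartiles move closer together between year A and year B while the bottom share falls and the top share rises, as the footnote states.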

Shares by contrast are most easily interpreted in terms of Fig. 2.4. An interesting question to ask ourselves in comparing two income distributions is: does the Lorenz curve of one lie wholly ‘inside’ that of the other (closer to the line of perfect equality)? If it does, then we would probably find substantial support for the view that the ‘inside’ curve represents a more evenly-spread distribution. To see this point take a look at Fig. 2.10, and again do an exercise similar to that which we carried out for the quantiles in Fig. 2.8: for reference let us mark in the share that would accrue to the bottom 20 per cent and to the bottom 80 per cent in distribution B (which is the distribution before tax, the same as the Lorenz curve that we had in Fig. 2.4): this yields the blobs on the vertical axis. Now suppose we look at the Lorenz curve marked A, which depicts the distribution for after tax income. As we might have expected, Fig. 2.10 shows that people in the bottom 20 per cent would have received a larger slice of the after-tax cake (curve A) than they used to get in B. So also those in the bottom 80 per cent received a larger proportionate slice of the A-cake than their proportionate slice of the B-cake (which of course is equivalent to saying that the richest 20 per cent gets a smaller proportionate slice in A than it received in B). It is clear from the figure that we could have started with any other reference population proportions and obtained the same type of answer: whatever ‘bottom proportion’ of people F(y) is selected, this group gets a larger share of the cake Φ(y) in A than in B (according to the shares ranking, A dominates B). Moreover, it so happens that whenever this kind of situation arises all the inequality measures that we have presented (except just perhaps v and v1) will indicate that inequality has gone down.

FIG. 2.10. Ranking by shares. UK 1984/5 incomes before and after tax. Source: as for Fig. 2.1 [Axes: proportion of population F(y) (horizontal), proportion of income Φ(y) (vertical), both from 0.0 to 1.0. Curves: A (after tax) and B (before tax).]
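The shares ranking lends itself to a simple mechanical check. The sketch below is an illustration under stated assumptions, not the book's own procedure: it compares Lorenz ordinates only at the population shares k/n, so both distributions must have the same number of people, and incomes are whole numbers so that shares can be held as exact fractions. All four income lists are hypothetical.

```python
from fractions import Fraction

def lorenz_ordinates(incomes):
    """Income shares Phi of the poorest 1/n, 2/n, ..., n/n of the population."""
    ys = sorted(incomes)
    total = sum(ys)
    ordinates, running = [], 0
    for y in ys:
        running += y
        ordinates.append(Fraction(running, total))
    return ordinates

def lorenz_dominates(a, b):
    """True if a's Lorenz curve is never below b's and is somewhere above it."""
    if len(a) != len(b):
        raise ValueError("this sketch compares equal-sized populations only")
    pairs = list(zip(lorenz_ordinates(a), lorenz_ordinates(b)))
    return all(x >= y for x, y in pairs) and any(x > y for x, y in pairs)

# Hypothetical before-tax and after-tax incomes for the same five people.
before_tax = [1_000, 3_000, 6_000, 12_000, 40_000]
after_tax = [3_250, 4_750, 7_000, 11_500, 32_500]

# Two hypothetical distributions whose Lorenz curves cross.
crossing_1 = [1, 1, 8, 10]
crossing_2 = [2, 2, 2, 14]

print("after tax dominates before tax:", lorenz_dominates(after_tax, before_tax))
print("crossing_1 dominates crossing_2:", lorenz_dominates(crossing_1, crossing_2))
print("crossing_2 dominates crossing_1:", lorenz_dominates(crossing_2, crossing_1))
```

When neither call returns True for a pair of distributions, the Lorenz curves intersect and the shares ranking gives no verdict.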

However, quite often this sort of neat result does not apply. If the Lorenz curves intersect, then the shares-ranking principle cannot tell us whether inequality is higher or lower, whether it has increased or decreased. Either we accept this outcome with a shrug of the shoulders, or we have to use a tie-breaker. This situation is illustrated in Fig. 2.11, which depicts the way in which the distribution of income after tax changed from 1981/2 to 1984/5. Notice that the bottom 20 per cent of the population did proportionately better in 1984/5 than in 1981/2 (see also the close-up in Fig. 2.12), whilst the bottom 80 per cent did better in 1981/2 than in 1984/5 (see also Fig. 2.13). We shall have a lot more to say about this kind of situation in Chapter 3.

FIG. 2.11. Lorenz curves crossing [Axes: proportion of population F(y) (horizontal), proportion of income Φ(y) (vertical), both from 0.0 to 1.0. Curves: 1981/2 and 1984/5.]

FIG. 2.12. Change at the bottom of the income distribution [Close-up of Fig. 2.11 for population proportions 0.0 to 0.4. Curves: 1981/2 and 1984/5.]

FIG. 2.13. Change at the top of the income distribution [Close-up of Fig. 2.11 for population proportions 0.6 to 1.0. Curves: 1981/2 and 1984/5.]

2.4 From charts to analysis

We have seen how quite a large number of ad hoc inequality measures are associated with various diagrams that chart inequality, which are themselves interlinked. But, however appealing each of these pictorial representations might be, we seem to find important reservations about any of the associated inequality measures. Perhaps the most unsatisfactory aspect of all of these indices is that the basis for using them is indeed ad hoc: the rationale for using them was based on intuition and a little graphical serendipity. What we really need is a proper theoretical basis for comparing income distributions and for deciding what constitutes a ‘good’ inequality measure.

This is where the ranking techniques that we have been considering come in particularly useful. Although they are indecisive in themselves, they yet provide a valuable introduction to the deeper analysis of inequality measurement to be found in the next chapter.

2.5 Questions

1. Explain how to represent median income in Pen’s Parade. How would you represent the upper and lower quartiles? (See footnote 8.)

2. Describe how the following would look:

(a) Pen’s Parade with negative incomes.

(b) The Lorenz curve if there were some individuals with negative incomes but mean income was still positive.

(c) The Lorenz curve if there were so many individuals with negative incomes that mean income itself was negative. (See the Technical Appendix, page 169, for more on this.)

3. DeNavas-Walt et al. (2008) present a convenient summary of United States’ income distribution data based on the Annual Social and Economic Supplement to the 2008 Current Population Survey.

(a) How would the information in their Table A-1 need to be adapted in order to produce charts similar to Fig. 2.2?

(b) Use the information in Table A-3 to construct Pen’s Parade for 1967, 1987, 2007: how does the Parade appear to have shifted over 40 years?

(c) Use the information in Table A-3 to construct the Lorenz curves for 1967, 1987, 2007: what has happened to inequality over the period? (Document is available on-line using the link on the website http://darp.lse.ac.uk/MI3)

4. Reconstruct the histogram for the UK 1984/5, before-tax income, using the file ‘ET income distribution’ on the website (see the Technical Appendix page 177 for guidance on how to use the file). Now merge adjacent pairs of intervals (so that, for example, the intervals [£0,£2000] and [£2000,£3000] become [£0,£3000]) and redraw the histogram: comment on your findings.

5. Using the same data source for the UK 1984/5, before-tax income, construct the distribution function corresponding to the histogram drawn in Question 4. Now, instead of assuming that the distribution of income follows the histogram shape, assume that within each income interval all income receivers get the mean income of that interval. Again draw the distribution function. Why does it look like a flight of steps?

6. Suppose a country’s tax and benefit system operates so that taxes payable are determined by the formula

t[y − y0]

where y is the person’s original (pre-tax) income, t is the marginal tax rate and y0 is a threshold income. Persons with incomes below y0 receive a net payment from the government (‘negative tax’). If the distribution of original income is y1, y2, . . . , yn, use the formulas given in the Technical Appendix (page 155) to write down the coefficient of variation and the Gini coefficient for after-tax income. Comment on your results.

7. Suppose the income distribution before tax is represented by a set of numbers {y(1), y(2), . . . , y(n)}, where y(1) ≤ y(2) ≤ y(3) . . . . Write down an expression for the Lorenz curve. If the tax system were to be of the form given in Question 6, what would be the Lorenz curve of disposable (after-tax) income? Will it lie above the Lorenz curve for original income? (For further discussion of the point here, see Jakobsson 1976 and Eichhorn et al. 1984.)

8. (a) Ruritania consists of six districts that are approximately of equal size in terms of population. In 2007 per-capita incomes in the six districts were:
• Rural: ($500, $500, $500);
• Urban: ($20,000, $28,284, $113,137).

What is the mean income for the Rural districts, for the Urban districts, and for the whole of Ruritania? Compute the logarithmic variance, the relative mean deviation, and the Gini coefficient for the Rural districts and the Urban districts separately and for the whole of Ruritania. (You will find that these are easily adapted from the file ‘East-West’ on the website, and you should ignore any income differences within any one district.)

(b) By 2008 the per-capita income distribution had changed as follows:
• Rural: ($499, $500, $501);
• Urban: ($21,000, $26,284, $114,137).

Rework the computations of part (a) for the 2008 data. Did inequality rise or fall between 2007 and 2008? (See the discussion on page 65 below for an explanation of this phenomenon.)

CONTENTS

List of Boxes xiv

List of Figures xviii

List of Tables xxi

Preface xxiii

Abbreviations xxvii

Notation xxx

Introduction 1

How Much Poverty Is There? 2

Why Does Poverty Exist? 4

What Can Be Done to Eliminate Poverty? 4

Road Map 6

PART ONE HISTORY OF THOUGHT

1. Origins of the Idea of a World Free of Poverty 11

1.1 PROGRESS AGAINST ABSOLUTE POVERTY OVER THE LAST TWO HUNDRED YEARS 12

1.2 PREMODERN IDEAS ABOUT POVERTY 18

Ancient Origins 18

Mercantilism 19

1.3 EARLY ANTIPOVERTY POLICIES 28

1.4 THE FIRST POVERTY ENLIGHTENMENT 35

1.5 THE TRANSITION IN THINKING IN THE NINETEENTH AND EARLY TWENTIETH CENTURIES 47

The Industrial Revolution and Poverty 47

Debates on the Poor Laws 52

A Lost Opportunity in America 54

Utilitarianism 55

The Limitations of Charity 59

Schooling Debates 60

Socialism and the Labor Movement 62

Social Research on Poverty 64

New Thinking in the Early Twentieth Century 72

2. New Thinking on Poverty after 1950 80

2.1 THE SECOND POVERTY ENLIGHTENMENT 80

New Economic Thinking Relevant to Poverty 82

Rawls’s Principles of Justice 87

The Rediscovery of Poverty in America 91

America Declares War on Poverty 94

2.2 DEBATES AND BACKLASHES 97

Poverty and Inequality in America 100

Culture of Poverty? 105

Relative and Subjective Poverty 106

The Basic-Income Movement 110

2.3 POVERTY IN THE DEVELOPING WORLD 111

Planning for Rapid Industrialization 111

The Planners’ Critics 113

The Aid Industry and Development Economics Are Born 114

Bringing Inequality in from the Cold 115

Rebalancing Development Thinking 117

Debates on the Poverty Focus 120

Better Data 122

Globalization and Poverty 123

New Millennium, New Hope, New Challenges 125

PART TWO MEASURES AND METHODS

3. Measuring Welfare 131

3.1 CONCEPTS OF WELFARE 131

Welfarism 132

Extensions and Alternatives to Welfarism 136

Capabilities 137

Social Effects on Welfare 139

Opportunities 141

A Less Ambitious Goal 143

3.2 USING HOUSEHOLD SURVEYS FOR WELFARE MEASUREMENT 143

Survey Design 149

Goods Coverage and Valuation 152

Variability and the Time Period of Measurement 153

Measurement Errors in Surveys 158

Interpersonal Comparisons of Welfare 162

3.3 ALTERNATIVE MEASURES IN THEORY AND PRACTICE 164

Real Consumption per Equivalent Single Adult 164

Predicted Welfare Based on Circumstances 175

Food Share 177

Nutritional Indicators 177

Qualitative and Mixed Methods 178

Self-Assessed Welfare 181

3.4 THREE PRINCIPLES 188

Principle 1: Strive to Be Absolutist in the Space of Welfare 188

Principle 2: Avoid Paternalism 189

Principle 3: Recognize Data Limitations 190

4. Poverty Lines 191

4.1 DEBATES ABOUT POVERTY LINES 191

4.2 OBJECTIVE POVERTY LINES 194

Basic Needs Poverty Lines 194

Updating Poverty Lines over Time 203

Revealed Preference Tests of Poverty Lines 204

The Food-Energy Intake Method 206

Relative Poverty Lines 208

Consistency versus Specificity 213

4.3 SUBJECTIVE POVERTY LINES 215

5. Poverty and Inequality Measures 219

5.1 NORMATIVE FOUNDATIONS 220

5.2 MEASURING INEQUALITY 221

5.3 MEASURING POVERTY 230

Poverty Measures 232

The Consumption Floor 238

Estimation Issues 242

Hypothesis Testing 243

Summary 244

5.4 DECOMPOSITIONS OF POVERTY MEASURES 244

Poverty Profiles 244

Changes in Parameters versus Changes in Quantities 249

Growth and Redistribution Components 251

The Sectoral Decomposition of a Change in Poverty 253

Transient versus Chronic Poverty 255

5.5 THE ROBUSTNESS OF POVERTY COMPARISONS 257

5.6 PRO-POOR GROWTH AND GROWTH INCIDENCE 264

5.7 MEASURING THE “MIDDLE CLASS” 266

Absolute and Relative Approaches 268

Vulnerability and the Middle Class 269

5.8 POVERTY AND INEQUALITY OF OPPORTUNITY 270

5.9 TARGETING AND INCIDENCE MEASURES 272

Targeting Measures 272

Behavioral Effects 277

5.10 MASHUP INDICES 279

6. Impact Evaluation 290

6.1 KNOWLEDGE GAPS 290

6.2 THREATS TO THE INTERNAL VALIDITY OF AN EVALUATION 292

Endogenous Interventions 292

Spillover Effects 294

Misspecification of the Impact Dynamics 294

Behavioral Responses to an Evaluation 294

6.3 EVALUATION METHODS IN PRACTICE 295

Social Experiments 297

Non-Experimental Methods 298

Difference-in-Difference (DD) Estimators 298

Fixed Effects Regressions 299

Instrumental Variables Estimators 301

6.4 THE EXTERNAL VALIDITY OF AN EVALUATION 304

Heterogeneity in Impacts 305

Portfolio Effects 306

General Equilibrium Effects 308

Structural Models 310

6.5 THE ETHICAL VALIDITY OF AN EVALUATION 311

PART THREE POVERTY AND POLICY

7. Dimensions of Poverty and Inequality 317

7.1 GLOBAL INEQUALITY 317

Large Income Disparities in the World 317

Inequality in the Developing World 321

7.2 POVERTY MEASURES FOR THE DEVELOPING WORLD 322

Data and Measurement 324

Absolute Poverty Measures 325

Estimates of the Consumption Floor 328

Poorest Left Behind? 329

Differing Fortunes across Regions 330

The Developing World’s Bulging Middle Class 331

7.3 POVERTY MEASURES FOR URBAN AND RURAL AREAS 336

7.4 GLOBAL MEASURES OF POVERTY 339

Dissatisfaction with Standard Poverty Measures 339

A Globally Relevant Measure of Poverty 340

Interpreting the Global Relative Poverty Measures: Two Bounds 342

Truly Global Poverty Measures 343

Differences in Weakly Relative Poverty among Developing Countries 346

Concluding Comments on Relative Poverty 349

7.5 POVERTY AND THE NON-INCOME DIMENSIONS OF WELFARE 350

The Economic Gradients in Schooling and Learning 352

The Economic Gradient in Health and Nutrition 356

Obesity 359

Socioeconomic Differences in Mortality 363

Socioeconomic Differences in Fertility 366

Family Size and Composition 368

Female-Headship and Poverty 369

Missing Women 370

The Feminization of Poverty 371

Violence and Poverty 374

8. Growth, Inequality, and Poverty 379

8.1 THEORIES OF ECONOMIC GROWTH AND DISTRIBUTIONAL CHANGE 380

Some Basic Concepts 380

Past Debates on Whether Poor People Benefit from Economic Growth 385

Development in a Segmented Economy 391

The Harris-Todaro Model 399

Labor-Market Frictions 402

Modern Growth Economics 404

Institutions and Growth 408

Factor Distribution and Growth 410

How Inequality and Poverty Can Retard Growth 413

Poverty Traps 421

8.2 EVIDENCE ON GROWTH AND DISTRIBUTIONAL CHANGES 430

The Industrial Revolution Did (Eventually) Benefit Wage Workers 431

Evidence on Distribution Post-Kuznets 433

Growth and Non-Income Dimensions of Welfare 436

Growth and Types of Inequality 437

Urbanization and Poverty 440

Progress against Absolute Poverty 441

Inequality as an Impediment to Pro-Poor Growth 445

Economic Crises and Poverty 447

8.3 EVIDENCE ON DISTRIBUTIONAL IMPEDIMENTS TO GROWTH 451

Macro Evidence That Inequality Impedes Growth 454

Evidence from Micro Studies 458

8.4 PRO-POOR GROWTH? CASE STUDIES OF CHINA, BRAZIL, AND INDIA 460

China 460

Brazil 469

India 472

9. Economy-Wide and Sectoral Policies 477

9.1 URBAN VERSUS RURAL 478

Urban–Rural Prioritization for Development 479

Growth, Poverty, and Urbanization 481

The Role of Agriculture and Rural Development 485

9.2 LAND POLICIES 485

9.3 HEALTHCARE POLICIES 487

9.4 WATER, SANITATION, AND HYGIENE 490

9.5 SCHOOLING POLICIES 493

Why Do Children from Poor Families Get Less Schooling? 494

Mass Schooling as a Policy Response 497

Banning Child Labor 499

9.6 PUBLIC INFORMATION CAMPAIGNS 499

9.7 PRICE INTERVENTIONS 503

Minimum Wages 504

Rent Controls 506

9.8 TRADE POLICIES 508

Whose Gains from Trade? 508

The Globalization Debate 511

9.9 DEVELOPMENT AID 518

External Development Assistance 518

Aid and Poverty Reduction 520

Aid and Growth 524

9.10 POLICIES, AID, AND INSTITUTIONS 529

Policy Advice and Economics 532

Conditions for Effective Aid 535

Capital Flight and Odious Debt 538

Poverty and Poor Institutions 539

Understanding Persistently Poor Institutions 543

10. Targeted Interventions 547

10.1 AN OVERVIEW OF COVERAGE 547

10.2 INCENTIVES, TARGETING, AND LEAKAGE 550

Information and Incentives 550

The BIG Idea 558

Targeting 560

Leakage 562

10.3 TARGETED TRANSFERS 565

State-Contingent Transfers Financed by Taxation 565

Unconditional Subsidies and Transfers 568

Targeted Incentives for Investing in Human Capital 571

Early Childhood Development 577

A Caveat on Service Quality 579

10.4 OTHER TARGETED POLICIES 580

Workfare 580

Training and Wage-Subsidy Schemes 582

Land-Based Targeting and Land Reforms 584

Microfinance for Poor People 585

Poor Area Development Programs 587

Conclusions: Past Progress and Future Challenges 592

PROGRESS AGAINST POVERTY 594

EXPLAINING THE TRANSITION IN THINKING 597

KNOWLEDGE CHALLENGES 599

TWO PATHS GOING FORWARD 601

References 605

Index 663

ABBREVIATIONS

ADB Asian Development Bank
AFDC Aid to Families with Dependent Children
ATET average treatment effect on the treated
BIG basic income guarantee
BM Bourguignon and Morrisson
BMI body mass index
BMR basal metabolic rate
BOT balance of trade
BPL below poverty line
BPS Biro Pusat Statistik
BR Bidani and Ravallion
BRAC Bangladesh Rural Advancement Committee
BWR benefit withdrawal rate
CBN cost of basic needs
CBR crude birth rate
CCT conditional cash transfer
CDF cumulative distribution function
CDR crude death rate
CGD Center for Global Development
CGE computable general equilibrium
COL cost of living
CPI consumer price index
CRS constant returns to scale
DD difference-in-difference
DFID Department for International Development, United Kingdom
DHS Demographic and Health Survey
ECD early childhood development
EECA Eastern Europe and Central Asia
EGS Employment Guarantee Scheme
EITC earned income tax credit
ELQ economic ladder question
FAO Food and Agriculture Organization
FATF Financial Action Task Force
FEI food-energy intake
FFE food for education
FGT Foster, Greer, and Thorbecke
FHH female-headed household
FOD first order dominance
GB Grameen Bank
GDP gross domestic product
GIC growth incidence curve
GLC generalized Lorenz curve
GTAP Global Trade Analysis Project
HDI human development index
HDR Human Development Report
HIC high-income country
HT Harris and Todaro
I2D2 International Income Distribution Database
ICP International Comparison Program
IFAD International Fund for Agricultural Development
IFI international financial institution
IFPRI International Food Policy Research Institute
IMF International Monetary Fund
IMR infant mortality rate
INOPP inequality of opportunity
ITT intent to treat
IV instrumental variable
LAC Latin America and the Caribbean
LIC low-income country
LSE London School of Economics
LSMS Living Standards Measurement Study
LTP Lewis turning point
MB marginal benefit
MC marginal cost
MDG Millennium Development Goals
MENA Middle East and North Africa
MIQ minimum income question
MLD mean log deviation
MMU money metric utility
MPC marginal propensity to consume
MPI multidimensional poverty index
MPK marginal product of capital
MPL marginal product of labor
MRS marginal rate of substitution
MTR marginal tax rate
NGO nongovernmental organization
NIT Negative Income Tax
NLMS National Longitudinal Mortality Study
NPL New Poor Laws
NREGS National Rural Employment Guarantee Scheme
NSS National Sample Survey
OECD Organization for Economic Cooperation and Development
OLS ordinary least squares
OPL Old Poor Laws
OPM official poverty measure
PA poverty assessment
PDC poverty deficit curve
PIC poverty incidence curve
PIT poor institutions trap
PMT proxy means test
PPP purchasing power parity
PSU primary sampling unit
PV present value
RCT randomized control trial
SDG Sustainable Development Goals
SNAP Supplemental Nutrition Assistance Program
SOD second order dominance
SPM supplemental poverty measure
SSA sub-Saharan Africa
SSN social safety net
SSPL social subjective poverty line
SUTVA stable unit treatment value assumption
SWF social welfare function
TA Technical assistance
TD targeting differential
TFP total factor productivity
TFR total fertility rate
TOD third order dominance
UB unemployment benefits
UK United Kingdom
US United States of America
USAID United States Agency for International Development
WDR World Development Report
WGI World Governance Indicators
WHO World Health Organization
WTO World Trade Organization

NOTATION

C Consumption
CI concentration index
CF consumption of food
ES equivalence scale
F(y) cumulative distribution function
G Gini index
H headcount index
K Capital
L Labor
L(p) Lorenz curve
m number of goods
M mean
n population size
N household size
N demographics (numbers of people by type)
Ni number or share of people/workers in sector i
NS normalized share
p percentile
P prices
PG poverty gap index
q quantities consumed
r rate of interest or correlation coefficient
s savings rate
S share of spending (or share of transfers)
se standard error
SPG squared poverty gap index
TD targeting differential
u utility
w wealth
W welfare or wage rate
Wi wage rate in sector i
WI Watts index
Y income or output
X characteristics of a person or household
y real income
Z poverty line

Note: When there is any chance of ambiguity a full stop (.) is used to denote the product of two variables.

3

Measuring Welfare

While money can’t buy happiness, it certainly lets you choose your own form of misery.

—Groucho Marx

At the foundation of most measures of poverty and inequality is a concept of individual welfare. In economics, “welfare” (or “well-being,” which is used interchangeably here) is generally equated with “utility”: a subjective assessment of all the things a person cares about. Economists have often tried to infer what those things are from behavior, and one aspect of behavior in particular: what people choose to buy and sell in markets. However, it is now recognized that this provides rather limited data for that task. Other information has been sought, such as self-perceptions of welfare and observable attainments, such as being well-nourished. A broader concept of welfare has also been sought, allowing for external evaluations of a person’s welfare that may or may not accord with their utility, defined as whatever people maximize.

Differences over how one thinks about welfare can matter greatly to the descriptive and normative claims made about poverty. The chapter begins with a review of the main conceptual issues in defining and measuring welfare. It then turns to the main issues to be aware of in empirical implementations and the various measures found in practice. As we will see, each of the methods has both strengths and weaknesses.

3.1 Concepts of Welfare

Approaches to welfare measurement differ in terms of the importance attached to the individual’s own judgments about his or her well-being. They also differ in terms of the factors they try to include within a single measure. It is very widely agreed that individual welfare depends in part on household command over commodities, but it is also widely agreed that it depends on other things as well. The debates are mainly about what other factors are relevant and how they should be weighted. There are also differences in how one thinks about the concepts of poverty as “absolute” or “relative” (recalling the discussion in chapter 2). To understand the difference between the measures found in practice and judge their relative merits one needs to understand their conceptual foundations.

Welfarism

The standard economic approach to monitoring social progress overall and assessing policies aims to rely solely on the individual welfare levels in the relevant population. Social states are judged by (and only by) individual welfare levels. (This is sometimes called “individualistic.”) This approach has its roots in classical utilitarianism (section 1.5), although it is more general, as we will see.

But what do we mean by “welfare levels”? One specific definition of welfarism says that we should strive to base welfare comparisons and public policy decisions solely on the individual utilities, defined as what people maximize in their own choices. It is clearly a big step to equate the personal objective that guides one’s choices (the utility function that represents the set of indifference curves, such as in boxes 1.4 and 3.1) with one’s “welfare” or “well-being” (interchangeable terms for the present purpose).

An important message of this version of welfarism is that, in assessing individual well-being, one should avoid making judgments that are inconsistent with the preferences that guide people’s own choices. So this version of welfarism is fundamentally opposed to paternalism: any presumption that someone else knows what is good for you even if you do not agree. Each person is presumed to be a rational actor maximizing his or her utility. This approach will include all those commodities that people choose to consume in assessing their welfare. But it does not stop there. “Utility” in economics is whatever people care about. To say that this only includes market commodities is an unjustified specialization.1 As long as markets exist and are deemed to be competitive, prevailing prices can be used for aggregating the commodities consumed and in deflating for differences in the cost-of-living to derive the welfare metric. However, this is a partial welfare metric to the extent that people also care about non-market goods.

The utility-based approach draws on a model of rational consumer choice. The essence of the approach is the idea of a utility function (recall boxes 1.4 and 1.9). This serves two distinct roles in utility-based welfarism. First, it is a convenient representation of the consumer’s preferences over her affordable consumption bundles. The consumer is presumed to be able to order those bundles from the best to the worst and pick the best among the feasible options. In this first role, the utility function is nothing more than an analytically convenient way of representing the consumer’s preferences. Box 3.1 summarizes the standard model.

1 As we will see, some critiques of welfarism have been based on that overly narrow specialization; while relevant in some cases these critiques lack more general validity.

Box 3.1 Consumer Choice

Each person is assumed to have a preference ordering over her budget set, defined as the set of all feasible consumption bundles (which can be taken to include leisure). To keep things simple, suppose that there are two goods, food and clothing, consumed in the amounts $Q_F$ and $Q_C$ with prices $P_F$ and $P_C$ (“F” for food and “C” for clothing). Total spending on the two goods is denoted $Y$, and this is held constant in this thought experiment. The affordable sets of bundles $(Q_F, Q_C)$ are those for which:

$$P_F Q_F + P_C Q_C \le Y.$$

The consumer is able to rank the affordable combinations, $(Q_F, Q_C)$, satisfying this constraint. Assuming that more of either good is better (often called the “non-satiation” assumption), we only need to consider the budget allocations that exactly absorb the available budget (since if one is at an interior point where $P_F Q_F + P_C Q_C < Y$, then it is possible to afford more of one good with no less of the other). The consumer is assumed to be rational, meaning that she picks her preferred bundle. And the economy is competitive, in that the consumer cannot alter the prices faced.

It is convenient and generally uncontentious to assume that the preference ordering can be represented by a continuous utility function, $U(Q_F, Q_C)$, which is maximized subject to the budget constraint (and the time constraint if we added a third dimension of leisure). The utility function is normally assumed to trace out strictly convex indifference curves (Box 1.4). (Recall that the indifference curve gives the locus of all the combinations of the two choice variables that attain a given level of utility.) The slope of the indifference curve is called the marginal rate of substitution (MRS), which is defined as the increment to consumption of one good that is needed to compensate for one less unit of another good, while keeping utility constant. The quantities of food and clothing that maximize utility are denoted $Q_F^*$ and $Q_C^*$. These will clearly depend on the prices, $P_F$ and $P_C$, and of course income, $Y$.

We can now characterize the consumer’s equilibrium. The key feature will be that there is no reallocation of the fixed budget between food and clothing that would make the consumer better off. Consider figure B3.1.1, which shows the budget constraint and two indifference curves. (The origin may be at positive values to allow for biological minima.) The dashed curve is clearly not the highest level of utility the consumer can reach, which is the solid curve. The consumer is indifferent between (say) A and B (even though only B exhausts the budget) but prefers the point C to both. At this point the MRS equals the relative price of food.

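Figure B3.1.1 Consumer Choice. [Figure not reproduced. Axes: food and clothing. Elements shown: the budget line, with slope equal to minus the relative price of food; indifference curves, each holding utility constant; and the points A, B, and C.]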
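The equilibrium condition in Box 3.1 can be checked numerically. The sketch below is an illustration only: it assumes a Cobb-Douglas utility function, $U = Q_F^{\alpha} Q_C^{1-\alpha}$, which the box itself does not specify, and the function names, the value of $\alpha$, and the prices and budget are all hypothetical. It compares the closed-form demands with a brute-force search along the budget line and confirms that the MRS at the chosen bundle equals the relative price of food.

```python
def utility(q_food, q_cloth, alpha=0.6):
    """Assumed Cobb-Douglas utility; alpha is the weight on food."""
    return q_food ** alpha * q_cloth ** (1.0 - alpha)


def optimal_bundle(p_food, p_cloth, income, alpha=0.6):
    """Closed-form Cobb-Douglas demands: constant budget shares."""
    return alpha * income / p_food, (1.0 - alpha) * income / p_cloth


def mrs(q_food, q_cloth, alpha=0.6):
    """Units of clothing given up per extra unit of food at constant utility."""
    return (alpha / (1.0 - alpha)) * (q_cloth / q_food)


def best_on_budget_line(p_food, p_cloth, income, alpha=0.6, steps=10000):
    """Search bundles that exhaust the budget and keep the best one."""
    best = None
    for k in range(1, steps):
        q_food = (k / steps) * income / p_food
        q_cloth = (income - p_food * q_food) / p_cloth
        u = utility(q_food, q_cloth, alpha)
        if best is None or u > best[0]:
            best = (u, q_food, q_cloth)
    return best[1], best[2]


if __name__ == "__main__":
    p_food, p_cloth, income = 2.0, 5.0, 100.0  # hypothetical prices and budget
    q_f, q_c = optimal_bundle(p_food, p_cloth, income)
    print("closed form:", q_f, q_c)
    print("grid search:", best_on_budget_line(p_food, p_cloth, income))
    print("MRS:", mrs(q_f, q_c), "relative price of food:", p_food / p_cloth)
```

Only bundles on the budget line are searched because, under non-satiation, an interior bundle can never be the best affordable one.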

Critics of this model have pointed to situations where personal choices do not appear to be rational. There is a risk here that what is seen to be “irrational” may well reflect an overly narrow view of what people care about. For example, if we ignore the fact that people (including poor people) have concerns about their relative position in society, as well as their absolute level of living, we may misunderstand their behavior, such as when they spend their scarce resources on celebrations. To give another example, people may derive utility today from knowing that they will be less poor in the future, and this may influence inter-temporal decision-making.2

In the second role, the utility function is assumed to provide sufficient information for assessing whether a person is better off over time, or after a policy change, or in determining one person’s welfare relative to another. This latter role has proved to be contentious. One critique questions whether personal preferences should be given this status in assessing welfare. Some observers have questioned whether the choices are morally sound. For example, it is sometimes argued that the decision to buy some luxury good is not ethically defensible when people (including children) are dying from poverty-related causes in the world.3

Another objection is that “utility” is not something we observe. That is true, but we can still use the idea to motivate welfare comparisons in more familiar monetary terms, in the “income space.” However, we must then be confident that our measure based on the observables is calibrated to be consistent with our concept of utility. When it is a monetary measure and it is consistent with utility it is called a money-metric of utility (or sometimes equivalent income). This can be readily defined in theory, by finding the income equivalent to utility at fixed reference prices and personal characteristics. (We return to this concept in chapter 4.)
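One common way of writing the money-metric down is sketched below. The symbols are assumptions introduced here for illustration, not the chapter’s own notation: $v(P, X, Y)$ is the maximum utility attainable at prices $P$ with income $Y$ by a person with characteristics $X$, $e(P, X, u)$ is the minimum spending needed to reach utility $u$, and $P^r$ and $X^r$ are the fixed reference prices and characteristics.

```latex
% Equivalent income Y^e: the income that, at reference prices P^r and
% reference characteristics X^r, yields the utility actually attained.
\[
  v(P^r, X^r, Y^e) = v(P, X, Y)
  \quad\Longleftrightarrow\quad
  Y^e = e\bigl(P^r, X^r, v(P, X, Y)\bigr).
\]
```

Because $P^r$ and $X^r$ are held fixed across people, $Y^e$ rises if and only if utility rises, which is what makes it a monetary measure that is consistent with utility.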

Arguably the bigger problem is heterogeneity across people in their welfare-relevant non-market characteristics. People differ in the utility they can be expected to derive from a given consumption bundle. Some people have characteristics (being elderly, or disabled, or living in a cold climate, or living in a place where public services are poor) whereby they need more of certain market goods to attain the same level of utility. Given this heterogeneity, an interpersonally comparable utility index cannot be inferred by looking only at objectively measurable demand and supply behavior. We may well be able to find a utility function consistent with that behavior, but it will not be unique; there can be many such functions, varying with personal characteristics. So the idea that one can infer utilities of heterogeneous individuals from looking solely at their demand and supply behavior is easily ruled out (as explained further in box 3.2). Given that heterogeneity in how command over commodities translates into welfare, we will inevitably need to broaden the information base for assessing welfare, beyond observed behavior in markets. This calls for information relevant to people’s welfare that economists have not traditionally favored, such as data on capabilities and also subjective well-being data. We return to these types of data.

2 This is especially evident in the idea of “rational expectations” in economics as the mean forecast given current information. It can be readily shown that such expectations are only rational for a rather special formulation of the underlying utility function; see Ravallion (1986). What are sometimes called “irrational expectations” may well be perfectly rational once one understands what the decision makers concerned care about.

3 See, e.g., Singer (2010).

Box 3.2 The Challenge of Inferring Utility from Behavior in Markets

A fundamental premise of economics is that the observed commodity demands and labor supplies of households are utility maximizing. The basic model assumes a utility function that depends on the quantities consumed of all goods and services and the leisure time left after working. (See box 3.1.) Recall that this function represents consumer preferences in that the utility function ranks commodity bundles identically with how the consumer ranks them. This utility function is then maximized subject to the budget constraint, which says that the total expenditure on commodities plus the imputed value of leisure time (time taken for leisure times the market wage rate) cannot exceed “full income” given by the value of the time endowment plus all other income (including profits from own enterprises).

If demands and supplies are consistent with this model, then we can in general solve backwards from the observed demands and supplies of a given individual to recover a utility function that is consistent with the choices made. When we come to compare people in different households, with different sizes and demographic compositions, and differences in other characteristics (such as health and disability), we must allow the possibility that these differences matter to both the observed demands and to the level of utility attained given those demands. However, we cannot expect that the ways those differences influence utility at given demands (and supplies) will be properly reflected in the observed demand and supply behavior. In general there will be a multitude of utility functions (reflecting the heterogeneous characteristics) that can support the observed behavior as an optimum. Thus, we say that the utility function is “unidentified” from observed demands and supplies alone when people differ in their welfare-relevant characteristics.

Further reading: These issues are discussed further in Pollak and Wales (1979) and Browning (1992).
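A small numerical sketch may help make the identification problem in Box 3.2 concrete. Everything in it is a hypothetical illustration rather than a method from the text: utility is assumed to be Cobb-Douglas in food and clothing, divided by household size raised to a power `theta`, and the two households, prices, and incomes are invented. Because `theta` only rescales utility for a given household, it leaves the chosen bundle unchanged, yet it can reverse the ranking of who is better off.

```python
def utility(q_food, q_cloth, household_size, theta, alpha=0.6):
    """Assumed Cobb-Douglas utility, scaled down by household_size ** theta.

    theta is a hypothetical parameter for how strongly household size
    reduces the utility obtained from a given bundle.
    """
    return (q_food ** alpha * q_cloth ** (1.0 - alpha)) / household_size ** theta


def chosen_bundle(p_food, p_cloth, income, household_size, theta, steps=1000):
    """Grid search along the budget line; returns (utility, q_food, q_cloth)."""
    best = None
    for k in range(1, steps):
        q_food = (k / steps) * income / p_food
        q_cloth = (income - p_food * q_food) / p_cloth
        u = utility(q_food, q_cloth, household_size, theta)
        if best is None or u > best[0]:
            best = (u, q_food, q_cloth)
    return best


if __name__ == "__main__":
    prices = (2.0, 5.0)  # hypothetical prices of food and clothing
    for theta in (0.0, 1.0):
        # Household A: one person, income 100. Household B: four people, income 300.
        u_a, *bundle_a = chosen_bundle(*prices, 100.0, 1, theta)
        u_b, *bundle_b = chosen_bundle(*prices, 300.0, 4, theta)
        better = "B" if u_b > u_a else "A"
        print("theta =", theta, "| A buys", bundle_a, "| B buys", bundle_b,
              "| better off:", better)
```

The bundles printed are the same for both values of `theta`, so observed purchases alone cannot tell the two utility functions apart, even though they disagree about which household is better off.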

The upshot of these observations is that there is a deep problem in implementing the welfarist agenda that we should rely on the utility derived from consumer choice in deciding whether one person is better off than another. The problem stems from likely heterogeneity in the utility obtained from given choices. It is a “deep” problem because more data on people’s actual choices will not make the problem go away. For all practical purposes, we will need to make external judgments in deciding whether one person is better off than another. Again, this is not to say we should ignore the data on choices, but rather to say that we should not kid ourselves that this is sufficient information.

In practice, policymakers, evaluators, and most applied economists today are willing to admit explicit interpersonal comparisons of utility between people or households with different characteristics when discussing policies, although typically these judgments are largely coming from outside economics. (Those economists who are not willing to admit such information are likely to be left out of informing the many important policy debates that require interpersonal comparisons.) It is generally understood that in thinking about policy, one can make interpersonal comparisons of welfare yet respect personal preferences when relevant. And it is clear that the information that can be brought to bear in making interpersonal comparisons goes beyond observable behavior in markets. A large literature has emerged on both the theory and measurement of the social welfare implications of some long-standing policy issues, built on such welfarist foundations.4

What are the alternatives to welfarism? What arguments are made for and against them? Can these alternative approaches help in identifying a reliable metric of welfare?

Extensions and Alternatives to Welfarism

Poverty assessments are sometimes based on certain elementary achievements or specific forms of deprivation, such as being able to afford to be adequately nourished. One might count how many people do not attain specific nutritional requirements for good health and normal activities. Or one might identify a list of other specific deprivations related to (say) housing conditions, water and sanitation, and ownership of consumer durables.5 No special role is assigned to consumer preferences in these approaches.

Two concerns arise. First, there is a nagging worry about the arbitrariness in deciding what dimensions matter and (when necessary) how one should value one type of commodity (such as food) against another (clothing, say). Second, there is a concern that these approaches can be overly paternalistic: experts are essentially saying that “we know better than you about what is good for you.” By ignoring preferences one may well decide that people are worse off after some policy change (say) even if the people concerned do not agree. For example, one can imagine relative price changes due to (say) an external trade reform that are unambiguously utility-increasing yet entail substitution effects that result in lower caloric intakes in the neighborhood of nutritional norms. A utility-based assessment will say there has been a gain while an observer concerned only with nutritional intakes will not.

There is scope for an intermediate position. Even if we do not think people always make the best decisions for themselves, we need not accept that someone else knows better. That would need to be justified in the specific circumstances. Nor does an acceptance of the need for normative judgments in assessing welfare open the door to unbounded paternalism. It is one thing to say that the set of things we can conceivably infer about welfare from behavior alone cannot be sufficient for deciding who is poor and who is not, or in assessing policies, and quite another to argue that preferences have no role. There are situations in which we can say something about the revealed preferences of individuals, which are respected within a structure that still recognizes the need for external judgments about who is better off.

4 Normative public finance has been a fruitful field for such work; important early expositions included Atkinson and Stiglitz (1980) and Newbery and Stern (1987). The measurement of social welfare within a welfarist framework advanced rapidly in the 1980s; important contributions included King (1983) and Jorgenson and Slesnick (1984).

5 See, e.g., Alkire and Santos (2010). Chapter 5 returns to this example.

This echoes the ideas of Rawls (1971), as discussed in chapter 2. We need some index to identify the least advantaged group, based on primary commodities. It might be enough to simply know that “more is better,” although more likely there will be trade-offs involved, requiring an assumption about preferences. Rawls recognized this, but argued that we need focus only on the preferences of the least advantaged group. It is still not an easy task to identify such preferences given heterogeneity, as discussed above. And there can be a problem of circularity: we might agree with Rawls that it is the preferences of the poor that matter, but we may well have to make an assumption about preferences before we can figure out who is poor. However, there are ways of addressing this problem, as we will see in chapter 4.

Once one recognizes the deficiencies in available data, observations on specific deprivations can have an important role within a broadly welfarist approach. We return to this point in section 3.3.

Capabilities

An alternative to both the traditional utility-based approach and the specific-deprivations approach has been proposed by Amartya Sen.6 Sen rejects “utility” as the sole metric of welfare; he also rejects the non-welfarist formulations, such as those that focus solely on specific commodity deprivations or income alone. Recalling box 2.1, Sen argues instead that “well-being” is really to do with being well, which is about being able to live long, being well-nourished, being healthy, being literate, and so on; as Sen (1987, 25) puts it, the “value of the living standard lies in the living, and not in the possessing of commodities.” In Sen’s view, what is valued intrinsically are people’s capabilities to function. “Poverty” is a lack of capability.

There has been much discussion of Sen’s proposal over the last twenty-five years.7 It has come to be seen by many observers as the main competing theoretical foundation to the welfarist approach in economics. However, it is possible to interpret the capabilities approach in a way that is consistent with welfarism once one allows for heterogeneous preferences over commodities.8 This only requires that one thinks of capabilities as the direct generators of utility. A person’s utility is determined solely by her attainable functionings, which depend in turn on income, the prices faced, and her characteristics. Box 3.3 discusses further this interpretation of the capabilities approach. Some might still reject the view that welfare can be equated with the maximand of personal choice.

6 See, inter alia, Sen (1980, 1985a, 1987, 1992).

7 There was an early debate between Sen (1979) and Ng (1981). Also see Sen (1987) and the comments therein by Kanbur and Muellbauer. For a thoughtful defense of welfarism against Sen’s critique, see Kaplow (2008, pt. 5).

8 Utility can be viewed as one of the welfare-relevant functionings—the attainment of personal satisfaction through choice. This interpretation is found in Sen (1992, ch. 3). But it is not common among advocates of the capabilities approach.

Box 3.3 A Welfarist Interpretation of Capabilities

In the standard economic model, welfare depends on the consumption of commodities but preferences over commodities may vary between people. Without loss of generality, we can think of this as a common utility function that depends on personal characteristics as well as commodities consumed.

An encompassing way of thinking about welfare is to define it as a common function of capabilities—the attainable functionings of that person. We can write this as follows:

Utility = U(Functionings).

It is assumed that the function U does not vary across people. (When using only a partial set of observed functionings in practice, this assumption need not hold.) Functionings depend in turn on commodities consumed and personal characteristics:

Functionings = f (Commodities consumed, characteristics).

Substituting this equation into the first, we are back to a more familiar form for economists:

Utility = u(Commodities consumed, characteristics).

Through economic and social interactions (including via markets), the attainable functionings of a person, and hence the utility derived, will depend on incomes, prices, and characteristics. Capabilities are enhanced by higher income but are not solely determined by income. There will exist (in general) a money metric of capability-dependent welfare.

There is a critique one sometimes hears of income-based poverty measures on the grounds that welfare, including capabilities as an interpretation of welfare, depends on more than income. This misses the mark.9 The capabilities approach points to the inadequacy of basing welfare assessments on income alone, but so too does the welfarist approach. Although measurement practices may well be deficient, the use of a monetary metric does not in principle imply that only income matters. Suppose that we agree that welfare is only about capabilities. We can still use an income-based measure of the incidence of poverty to measure “capability-poverty” as long as capabilities depend at least in part on income.10 It is reasonable to assume that more income allows one to do more things—to expand the feasible set of functionings. Then all we need to do is set the poverty line such that a person at that income level will attain the critical level of capability-welfare that is needed to not be considered poor by the capabilities approach. So the issue is not the use of income-based poverty measures—which can in principle be constructed to be perfectly consistent with the capabilities approach—but whether the poverty lines properly reflect the cost of a given level of welfare at prevailing prices. (We return to these issues in chapter 4.)

9 For example, Iceland (2013, 47) criticizes income-poverty measures on the grounds that they “overlook the core problem associated with poverty—that of capability deprivation.”

Another questionable argument one hears is that “capabilities” are observable but “utility” is not. This too misses the point. The comparison to make is not between capabilities and utility but capabilities and consumptions, which are no harder to measure. In making comparisons of people with different capabilities we will sometimes (possibly quite often) need to decide how we weight one functioning relative to another. That requires a utility function. By either approach, welfare depends on consumptions and personal characteristics (box 3.3). That is not then a difference.

Social Effects on Welfare

Yet another issue underlying differences in how people think about welfare relates to the role played by “social needs.” Economists have traditionally taken the view that a person’s welfare depends solely on her personal command over commodities (and personal and household characteristics). There is no explicit role for social context in assessing poverty. The alternative view is that people have social needs that depend on context—that poverty can stem from social exclusion. This can entail explicit exclusion from certain activities (such as being employed), but it typically means more than that—it can also arise from relative deprivation (being poor relative to others in the society one lives in) or a perceived lack of opportunity for future progress.

There has been much debate about absolute versus relative poverty. It is sometimes argued that, while absolute poverty entails objective deprivation in nutrition and health, relative poverty is only “in the mind”—a subjective, psychological state such as envy. This view has come to be seriously questioned by the evidence of biological responses to relative poverty, notably through heightened stress, as indicated by cortisol levels.11 For example, one study found significantly higher cortisol levels thirty minutes after awakening among British civil servants with lower socioeconomic status.12 Enhanced cortisol levels are found when subjects are placed under evaluative threat, meaning that they could be judged negatively by others.13

10 More precisely, we require that the welfare derived from capabilities is a continuously increasing function of income as well as other variables determining capabilities.

11 Cortisol is a hormone produced in the adrenal cortex and released at higher levels when a person is under stress.

12 See Kunz-Ebrecht et al. (2004).

13 See the meta-study by Dickerson and Kemeny (2004).


Social effects on welfare can be encompassed within both the welfarist and capabilities approach to measuring poverty. A simple but attractive formulation by Anthony Atkinson and Francois Bourguignon (2001) is to say that one is not poor if one is capable of attaining both absolute “survival needs” and minimum “social inclusion needs” for participating in social and economic activity.

The idea of relative deprivation also has relevance for how we think about welfare and measure poverty. Whether starting from a welfarist or non-welfarist position, many people would agree that relative position often matters to people’s welfare. The welfarist will say that one’s personal utility is lower at given “own-income” when one lives in a place where all others have a higher income than in a place where everyone has a lower income—one experiences a disutility of relative deprivation in the former case. (The non-welfarist will probably point instead to one’s diminished capability for participating in social and economic life in the former setting.) The weight attached to relative position can often be crucial to the conclusions one draws about poverty. Box 3.4 illustrates this point with some simple numerical examples.

Box 3.4 Which Distribution Has More Poverty?

Consider two income distributions (think of the units as dollars per day or per hour):

A: (1, 1, 1)    B: (2, 3, 10).

We can all agree B is more unequal. But everyone in B has a higher income than in A. If we think of income as the “primary good,” then Rawls’s difference principle (chapter 2) will prefer B to A; the inequality has benefited the poor. Income poverty is lower in B, but inequality is higher.

However, as we also saw in chapter 2, Rawls and Sen have emphasized that primary goods are not only commodities, but are broader, potentially including not being relatively deprived. Suppose instead that welfare is own income normalized by the mean. The normalized distributions of welfare are then:

A: (1, 1, 1)    B: (0.4, 0.6, 2).

“A” now has both lower inequality and lower welfare poverty for all poverty lines less than 1.

More generally, suppose that the welfare of person $i$ is:

$\alpha y_i + (1 - \alpha)(y_i / \bar{y}), \qquad 1 \ge \alpha \ge 0.$

Here $y_i$ is $i$’s own income while $\bar{y}$ is the mean income of the group. The welfarist assessment of which distribution has more poverty is now seen to depend on the value taken by the preference parameter $\alpha$. It is readily verified that the poorest person is better off in B if (and only if) $\alpha > 0.375$. That is an empirical question, though hardly an easy question.
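The threshold in this example can be checked directly. The following is a minimal Python sketch, not part of the original box; the function names are illustrative assumptions. It evaluates the welfare formula above for the poorest person in each distribution at several values of the preference parameter.

```python
def welfare(own_income, mean_income, alpha):
    """Welfare = alpha * own income + (1 - alpha) * (own income / mean income)."""
    return alpha * own_income + (1 - alpha) * own_income / mean_income


def poorest_welfare(distribution, alpha):
    """Welfare of the lowest-income person (welfare rises with own income here)."""
    mean_income = sum(distribution) / len(distribution)
    return welfare(min(distribution), mean_income, alpha)


A = (1, 1, 1)
B = (2, 3, 10)

for alpha in (0.0, 0.25, 0.375, 0.5, 1.0):
    w_a = poorest_welfare(A, alpha)
    w_b = poorest_welfare(B, alpha)
    print(f"alpha={alpha:.3f}  poorest in A: {w_a:.3f}  poorest in B: {w_b:.3f}")
```

With these numbers the poorest person in B has welfare 0.4 + 1.6α, against 1 in A, so B is better for that person only when α exceeds 0.375.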


Arguably the most important contribution of the capabilities approach was the explicit recognition it gave to the fact that households vary in their capacity to convert commodities into well-being. This was implicit in the mainstream welfarist approach, but in practice that approach was too often simplified to the point of ignoring heterogeneity in relevant non-income factors in welfare. That mistake is harder to make when one thinks about welfare in the space of capabilities. However, as we see later in this chapter and in chapter 5, some approaches found in practice that have claimed to be motivated by “capabilities” are seriously oversimplified in other respects.

Opportunities

The idea of “opportunities” has motivated another alternative to welfarism. The idea of inequality of opportunity (INOP) has a long history. As we learned in chapter 1, the surge of attention to inequality in the latter part of the eighteenth century was far more about inequality of opportunity than of outcomes. Since then, advocates of efforts to promote equality of opportunity have been found on both the Left and Right. Roemer (1998) argues that we need only worry about inequalities that stem from circumstances beyond an individual’s control—those things that are not traceable to the individual’s own choices (box 1.8). The classic example of a circumstance is parental education. Suppose that the son of well-educated parents mistakenly underinvests in schooling and grows up poor. An opportunities approach may deem him to be well off based on his parents’ schooling even though his income is low.

According to the supporters of the INOP approach, inequality of outcomes is fine as long as it reflects personal efforts. Efforts are taken to be choice variables that depend on circumstances. This approach implies that the welfare metric for assessing inequality and poverty should be the component of income or consumption that is attributable to circumstances. This is often retrieved by regressing income on circumstances.14

There is a continuing debate on the merits of this view. People make mistakes and chance also plays a role.15 The opportunities approach typically treats mistakes the same way as well-considered choices. While the ethical status of mistakes is reasonably clear in the opportunities approach, many observers will be willing to help those for whom past mistaken choices have caused current deprivations. It is surely unimaginable that any civilized society would do nothing about extreme, possibly life-threatening, deprivations on the grounds that they are traceable to some mistaken choices by the persons concerned or their bad luck. Inequalities stemming from choices or luck can hardly be banned from public redress.

The INOP approach in practice rests on a regression model of income on circumstances, which is then used to measure INOP.16 It is acknowledged that income also depends on effort, but it is argued that this is chosen by people themselves and is then a function of their circumstances. Thus, the regression of income on circumstances used in the INOP literature can be given a “reduced form” interpretation.17 However, it should be noted that this predicted value is not in general a utility-consistent welfarist approach, which would instead use as the welfare metric a monetary equivalent of the maximum utility attainable when effort is chosen optimally, given circumstances. This is explained in box 3.5. The implication is that the INOP approach can deem someone to be better off (worse off) when they do not agree. There are also problems in operationalizing the idea of a welfare metric that depends solely on circumstances. We return to these problems in section 3.3.

14 See, e.g., Bourguignon et al. (2007), Barros et al. (2009), and Ferreira and Gignoux (2011). Also see the discussion in Roemer (2014). Section 3.3 discusses this approach further.

15 See the discussion in Kanbur and Wagstaff (2015).

16 Examples can be found in Barros et al. (2009), Ferreira and Gignoux (2011), and Roemer (2014).

Box 3.5 Measuring INOP When Effort Matters to Welfare

Utility can be taken to be a function of income and effort, with the former entering positively and the latter negatively:

Utility = U(Income, Effort).

Income in turn depends on effort and circumstances:

Income = F(Effort, Circumstances).

Then the chosen level of effort (denoted with a *) depends on circumstances:

Effort∗ = E(Circumstances).

This is the level of effort that maximizes utility. If we substitute the last equation into the second equation for income, then we have an equation for income as a function of circumstances. When written as a regression model, the predicted value of income based on circumstances has been widely used in the INOP literature.

The person’s welfare is the maximum level of utility they can derive, which depends on their circumstances as follows:

Utility = U[F(Effort∗, Circumstances), Effort∗].

We can see here that circumstances matter to welfare in two ways, namely through income, but also independently of income, given that greater effort gives disutility. (We can always express this level of welfare in monetary units, giving an exact money-metric of utility.) So the predicted value of income based on circumstances cannot be a valid monetary measure of welfare. And the difference arises precisely because utility depends on effort, as argued in the INOP literature.

Further reading: See Ravallion (2015a) for further discussion of this problem in how INOP is measured in practice.
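A small numerical illustration may help fix the argument of box 3.5. The sketch below assumes specific functional forms that are not in the text: Income = a + c × Effort, where a and c are two hypothetical circumstance variables, and Utility = Income − Effort²/2. Under these assumptions the utility-maximizing effort equals c, and utility can be read in income units as the income that would give the same utility at zero effort.

```python
def optimal_effort(c):
    """Effort maximizing (a + c*e) - e**2 / 2; the first-order condition is c - e = 0."""
    return c


def income(a, c, effort):
    """Assumed income function: Income = a + c * Effort."""
    return a + c * effort


def utility(inc, effort):
    """Assumed utility function: income enters positively, effort negatively."""
    return inc - effort ** 2 / 2


def outcomes(a, c):
    """Return (income, utility) when effort is chosen optimally given circumstances."""
    e = optimal_effort(c)
    y = income(a, c, e)
    return y, utility(y, e)


# Two hypothetical people with different circumstances (a, c).
income_1, utility_1 = outcomes(a=1.0, c=2.0)   # income 5.0, utility 3.0
income_2, utility_2 = outcomes(a=3.5, c=1.0)   # income 4.5, utility 4.0

print("Person 1:", income_1, utility_1)
print("Person 2:", income_2, utility_2)
```

In this example, income predicted from circumstances ranks person 1 above person 2, while the utility-consistent measure ranks them the other way, because person 1 attains the higher income only through greater effort.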

17 The term “reduced form” comes from simultaneous equation models in which one (endogenous) variable Y1 is a function of another endogenous variable (Y2), which is in turn a function of exogenous variables (X). On substituting out Y2, we obtain the reduced form for Y1 as a function of X.


A Less Ambitious Goal

This tour of the conceptual issues in thinking about welfare must make one skeptical of ever coming up with an ideal fully comprehensive and yet operational measure of “welfare,” embracing everything that matters. It might well be preferable to set the more modest goal of measuring “economic welfare” and defining poverty in that more narrow dimension. It is unlikely to capture everything that matters to a person’s happiness (say). But that would be asking too much of any single measure. As long as we agree that a lack of personal command over commodities is an important dimension of social progress, we are on safe ground in measuring poverty and inequality that way. However, as we will see, even this less ambitious task still poses challenges for the analyst.

3.2 Using Household Surveys for Welfare Measurement

Household surveys are the single most important source of data for making poverty comparisons; indeed, they are the only data source that can tell us directly about the distribution of living standards in a society, such as how many households do not attain a given consumption level. Box 1.17 introduced the basic idea. Here we go into more depth, pointing to the care that must go into setting up and interpreting such data. This section surveys the main issues one should be aware of. Box 3.6 summarizes the key concepts from statistics used here.

Box 3.6 Some Key Statistical Concepts about Sample Surveys

Sample surveys are used to reduce the cost of estimating parameters of interest for the relevant population, which is the set of people you are interested in making inferences about. The analyst should have a clear idea of what the relevant population is for the context in which she is working.

A sample survey collects data on a subset (sample) of people in the population, for the purpose of drawing reliable conclusions about some key features of interest about that population. Those features are the statistics one is interested in. The sample is drawn from a sample frame, which may take the form of a listing of the population. (If you survey the entire population, then you are doing a census.)

In using a sample survey to estimate population parameters, one is typically concerned with obtaining statistically unbiased estimates, meaning that in sufficiently large samples the survey-based estimate will converge on the true population parameter. One is typically also keen to assure that the sample estimates are reasonably precise, meaning that their standard error is low relative to the parameter estimate.

An important concept is statistical independence. Two events are said to be statistically independent (or simply “independent”) if the probability of one event occurring is not altered either way by the fact that the other event has happened. We can extend the same idea to any two variables, which can be said to be independent if the probability distribution of one variable does not depend on the values taken by the other variable. (The probability distribution, or simply “distribution,” of a variable gives the probability of the variable taking each possible value.)

Two samples are independent if the fact of being selected for one of them has no bearing on the probability of being selected for the other. Independence in the selection of samples is assured by randomization. There are many ways of doing this, but the simplest is to assign a number to each potential sample point in the sample frame and draw a subset of those numbers randomly, using a random number generator. Software for drawing random samples is readily available both within existing statistical software packages (such as Stata, SPSS, or SAS) and stand-alone products (such as the Research Randomizer).
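The numbering-and-drawing procedure just described can be sketched in a few lines of Python using the standard library. This is an illustration only; the function name and the frame size are assumptions, and any of the packages mentioned above would serve equally well.

```python
import random


def draw_simple_random_sample(frame_size, sample_size, seed=None):
    """Number the sample points 1..frame_size and draw sample_size of them
    at random without replacement."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return sorted(rng.sample(range(1, frame_size + 1), sample_size))


# Hypothetical sample frame of 1,000 listed households; draw 50 of them.
selected = draw_simple_random_sample(frame_size=1000, sample_size=50, seed=1)
print(len(selected), "households selected")
```

Recording the seed alongside the sample frame allows the draw to be reproduced and audited later.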

Randomization is an important example of a sampling method. A simple random sample is just what it sounds like: one lists everyone in the sample frame and draws a single random sample, containing those who will then be approached to be interviewed. A more complex form of sampling involves stratification. Here you break the population up into well-defined subgroups (strata) and then do simple random sampling within each stratum, but at different rates. The idea is that you oversample certain types of people, such as those living in households who participated in a public program being studied.

In calculating summary statistics from the sample, one typically wants a good estimate for the population from which the sample was drawn. This requires that one weights each observation from the sample according to how many people it represents in the population. In effect, the weights allow one to convert the actual sample (however complex its design) into a simple random sample. (The inverse of the sampling rate is called the expansion factor, giving the number of people in the population represented by that sample point.) These weights are important data in their own right and should always be available to users for anything but a simple random sample (in which all sample points can be equally weighted). The weights are needed to obtain unbiased estimates of the descriptive statistics for the population. (In estimating a regression model the case for weighting is less obvious; box 5.12 returns to this case.)
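To see why the weights matter, consider a minimal Python sketch of a stratified sample in which program participants are oversampled. All numbers, names, and the poverty line are hypothetical, chosen only to illustrate the role of the expansion factor.

```python
def expansion_factor(stratum_population, stratum_sample_size):
    """Inverse of the sampling rate: people represented by each sample point."""
    return stratum_population / stratum_sample_size


def headcount(consumptions, weights, poverty_line):
    """Weighted share of people with consumption below the poverty line."""
    total = sum(weights)
    poor = sum(w for c, w in zip(consumptions, weights) if c < poverty_line)
    return poor / total


# Stratum 1: 100 program participants, 10 sampled, 8 of them below the line.
# Stratum 2: 900 other people, 10 sampled, 2 of them below the line.
participants = [0.8] * 8 + [1.5] * 2
others = [0.8] * 2 + [1.5] * 8
consumptions = participants + others

weights = [expansion_factor(100, 10)] * 10 + [expansion_factor(900, 10)] * 10
equal_weights = [1.0] * 20
poverty_line = 1.0

weighted = headcount(consumptions, weights, poverty_line)          # 0.26
unweighted = headcount(consumptions, equal_weights, poverty_line)  # 0.5

print(f"weighted: {weighted:.2f}, unweighted: {unweighted:.2f}")
```

Ignoring the weights here would nearly double the estimated poverty rate, because the oversampled stratum is poorer than the rest of the population.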

Drawing a simple random sample for a large geographic area can add to the cost of the survey, since one may well end up with a very scattered sample. And if one does not have an up-to-date census, it may not be feasible to draw a simple random sample. Cluster sampling (also called two-stage sampling) can then help. By this method, one first randomly samples clusters of households, such as villages or city blocks; these can be called the primary sampling units (PSUs). PSUs are picked with probability proportional to their size, as usually based on the latest Census. One then samples households randomly within the selected clusters, after doing a complete listing of the households in each sampled cluster. If cluster sampling has been used, it is often important to know how this was done; for example, if only one cluster was picked in each of the regions then the regional poverty map may be quite misleading. And one should be wary of having too many stages to the sampling since the precision of estimation for population parameters may fall considerably.

Units within the same cluster cannot be considered to be independent, as they may well share some common attribute (such as associated with living in the same village). An important difference between stratification and clustering is that the former typically increases the precision of your estimates from the sample while the latter reduces that precision. The estimates of the sampling variance need to be adjusted (upward) for clustering. The extent of the adjustment depends on how strongly correlated the outcomes of interest are within the clusters (often called the “intra-cluster correlation”). When one is estimating a regression model (box 1.19) the thing to focus on is the intra-cluster correlation of the regression’s error term.
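A common rule of thumb for the size of this adjustment, not stated in the box, is the approximate design effect for clusters of equal size m, namely 1 + (m − 1)ρ, where ρ is the intra-cluster correlation. The Python sketch below applies it; the function names and the numbers are illustrative assumptions, and actual survey software will compute design-based variances more carefully.

```python
import math


def design_effect(cluster_size, intra_cluster_correlation):
    """Approximate variance inflation from clustering, assuming equal-sized clusters."""
    return 1 + (cluster_size - 1) * intra_cluster_correlation


def clustered_standard_error(srs_standard_error, cluster_size, intra_cluster_correlation):
    """Scale a simple-random-sample standard error by the square root of the design effect."""
    deff = design_effect(cluster_size, intra_cluster_correlation)
    return srs_standard_error * math.sqrt(deff)


# Hypothetical survey: 10 households per PSU, intra-cluster correlation 0.1.
deff = design_effect(cluster_size=10, intra_cluster_correlation=0.1)
se = clustered_standard_error(0.02, cluster_size=10, intra_cluster_correlation=0.1)

print(f"design effect: {deff:.2f}, adjusted standard error: {se:.4f}")
```

With no intra-cluster correlation the design effect is 1 and no adjustment is needed; the adjustment grows with both the correlation and the number of households interviewed per cluster.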

One of the key design choices for a sample survey is how many households to interview in each PSU versus how many PSUs to sample. If one wants to estimate average values for the PSUs, then one clearly needs adequate samples at the PSU level. However, for a given aggregate sample size, larger samples at the PSU level reduce precision in estimating population characteristics. The choice depends on how much variance there is within PSUs and the purposes of the surveys—notably whether the study calls for estimates at the PSU level.

Errors are expected in small samples, even when random. While you cannot expect to get it exactly right in a small sample, as the sample size increases you should be getting closer to the truth. If not, then something must be wrong in the estimation method. For example, your sample might not in fact have been drawn randomly, so it is not representative of the population. When using a sample survey to measure poverty, a large-sample bias occurs if rich people refuse to participate in the survey—they are just too busy, or are never home, or maybe your interviewers cannot get past their guard dogs! So you overestimate the poverty rate. This type of problem is sometimes called survey response bias. While sampling error arises because you do not have a large sample, this form of non-sampling error does not go away as your sample size increases.

Even if everyone who is sampled randomly is available to be interviewed, measurement error is still a concern in surveys (and censuses). For example, some people (probably more the rich than the poor) may be making wild guesses at key components of their income or consumption. Large samples help average out some types of errors but not all. For example, if (as is often claimed) rich people deliberately understate their incomes in a survey, then this will persist in large samples.
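The persistence of response bias in large samples can be seen from a short calculation. The Python sketch below is illustrative only; the response rates and the true poverty rate are hypothetical. It gives the poverty rate that the survey converges to as the sample grows when the poor and non-poor respond at different rates.

```python
def respondent_poverty_rate(true_rate, response_poor, response_nonpoor):
    """Large-sample poverty rate among respondents when response rates differ
    between poor and non-poor people."""
    poor = true_rate * response_poor
    nonpoor = (1 - true_rate) * response_nonpoor
    return poor / (poor + nonpoor)


# Hypothetical case: true poverty rate 30%; 90% of the poor respond,
# but only 60% of the non-poor do.
biased = respondent_poverty_rate(0.30, response_poor=0.9, response_nonpoor=0.6)
unbiased = respondent_poverty_rate(0.30, response_poor=0.8, response_nonpoor=0.8)

print(f"differential response: {biased:.3f}, equal response: {unbiased:.3f}")
```

Because the sample size does not appear in the formula, interviewing more households does nothing to close the gap; only equal response rates (or a reweighting that corrects for the difference) recover the true rate.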


When analyzing survey data, one often uses the idea of statistical significance. This takes account of both the size of the statistic and its sample standard error, which measures the precision of the estimate. If an estimate is said to be “significant at the 5% level,” what is meant is that, if the true value were in fact zero, there would be only a 5% chance of obtaining an estimate at least as far from zero as the one observed.

Further reading: The classic treatment of sampling is found in Kish (1965). More recent introductions to the topic can be found in Iarossi (2006) and Bryman (2012). A comprehensive overview of survey methods can be found in Bethlehem (2009).

The household surveys found in practice can be classified along four dimensions:

1. The sample frame: The survey may represent a country’s population, or some more narrowly defined subset, such as residents of a region. The appropriateness of a survey’s sample frame naturally depends on the inferences one wants to draw from it.

2. The unit of observation: This can be the household itself or the individuals within the household or both. A “household” is usually defined as a group of people eating and living together. Household structures can sometimes be complex, such as in societies where polygamy is practiced or where communal living in compounds is common (such as in rural areas of the Sahel region of Africa), making it difficult to distinguish one household from another.18 Most household surveys include some data on individuals within the household, though this rarely includes their consumptions, which are typically aggregated to the household level; examples include India’s National Sample Surveys (NSS), Indonesia’s National Socio-Economic Surveys (SUSENAS), and the World Bank’s LSMS surveys. An example of a survey which collected individual food consumption data is the survey of rural households in the Philippines that was done by the International Food Policy Research Institute (IFPRI) in the 1980s.19 When there are multi-cell households (associated with different wives) complex surveys are required.20

3. The number of observations over time: A single cross-section, based on one or two interviews within a short period, is the most common. In a panel (also called longitudinal) survey, members of the same household are resurveyed over an extended period. Such surveys are harder to implement and more costly, but have some advantages (box 3.7).

4. The principal living standard indicator collected: The most common indicators of poverty used in practice are based on household consumption expenditure or household income. Some surveys collect both (such as Indonesia’s SUSENAS and the World Bank’s LSMS), but others specialize (e.g., India’s NSS does not include all income sources, while most of the household surveys available for Latin America do not include consumption). Not having both income by source and expenditure by type can be a serious limitation for certain purposes, including assessing the poverty impacts of changes in prices. (We return to this application.)

18 See Scott (1980a), UN (1989), and Rosenhouse (1990).

19 See Bouis and Haddad (1992).

20 De Vreyer et al. (2008) have developed a survey method for multi-cell households and applied this in Senegal. Also see the application to studying intergenerational inequality in Lambert et al. (2014).

Box 3.7 Panel Data and Its Applications

Most surveys entail interviewing members of one household over a short period of time (a few days or possibly in just one visit). This is a single cross-sectional survey—by far the most common form. By contrast, in a panel survey, two or more rounds of survey data are collected on the same household. There is often a reasonably long period (a year is common) between successive interviews.

With such data one can better understand poverty dynamics—the transitions into and out of poverty. Consider table B3.7.1, classifying the population into four groups, labeled (in italics).

Table B3.7.1 Poverty Dynamics

Poor in Both Years | Escaped Poverty (i.e., poor in the first year, but not in second) | Poor in first year (sum of row)
Fell into Poverty (i.e., not poor in the first year, but poor in second) | Not Poor in Either Year | Not poor in first year (sum of row)
Poor in second year (sum of column) | Not poor in second year (sum of column) | Population (sum of all four cells)

With two cross-sectional surveys one can put numbers in the row and column totals, but one has no idea about the inner four boxes (in italics). The poverty counts may even be the same at the two dates, yet that is consistent with both complete persistence (the same people are poor in both years) and complete “churning” (all those who were poor in the first year escaped poverty, while all who were not poor in the first year fell into poverty in the second). More likely the truth is somewhere between the two. Only with panel data can one complete the table, filling in the inner four boxes.
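The point can be made concrete with a minimal Python sketch that fills in the inner four cells from panel observations on the same households. The data, the poverty line, and the function name are hypothetical, chosen so that the two cases have identical poverty counts in each year.

```python
from collections import Counter


def poverty_transitions(first_year, second_year, poverty_line):
    """Cross-tabulate poverty status in two years for the same households."""
    counts = Counter()
    for y1, y2 in zip(first_year, second_year):
        counts[(y1 < poverty_line, y2 < poverty_line)] += 1
    return {
        "poor in both years": counts[(True, True)],
        "escaped poverty": counts[(True, False)],
        "fell into poverty": counts[(False, True)],
        "not poor in either year": counts[(False, False)],
    }


poverty_line = 2
first_year = [1, 1, 3, 3]          # two of four households are poor
persistence = [1, 1, 3, 3]         # the same two are poor a year later
churning = [3, 3, 1, 1]            # the other two are poor a year later

persistent_table = poverty_transitions(first_year, persistence, poverty_line)
churning_table = poverty_transitions(first_year, churning, poverty_line)

print(persistent_table)
print(churning_table)
```

Both cases show two poor households in each year, so two cross-sections could not tell them apart; the inner cells differ completely.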

Another application is in studying mobility—the movements of people up or down the income or other ladder. For example, one might study the intergenerational correlation of incomes or schooling (such as when one asks how many children of illiterate parents became literate). This type of question is clearly important to measuring and understanding inequality of opportunity in society. When studying income mobility, there are various measures that have been proposed, including the correlation coefficient between incomes at date 1 and those at date 2 and the rank correlation coefficient. Not all these measures require panel data; for example, one can ask about the respondent’s parents in a cross-sectional survey.

While panel data have advantages, they are more costly to collect since one must find the same households. In any changing population, a panel survey cannot be representative at all dates; typically it is only so for the first survey round. There can also be biases due to attrition, whereby some nonrandom subsample drops out of the panel. This would be the case if attrition is due to households with a higher propensity to migrate for work. And time-varying measurement errors can be a concern; at least some of those “off-diagonal” elements in the array above (those who moved in or out of poverty) will be measurement errors. (For example, if a household’s income was underestimated in period 1 this might be corrected in period 2.)

Three well-known examples of panel data sets are the University of Michigan’s Panel Study of Income Dynamics in the United States, the Village Level Surveys by the International Crops Research Institute for the Semi-Arid Tropics in India, and the Russia Longitudinal Monitoring Study run by the University of North Carolina over the last twenty years. Very few surveys collect individual consumption data on a longitudinal basis (an exception is the aforementioned IFPRI survey for the Philippines). A modified version of the classic panel has been used in some LSMS surveys, whereby half of each year’s sample is resurveyed the following year. This cuts the cost of forming a panel data set, while retaining some of the advantages.

There are some examples of panel data sets that have been constructed from existing data sets, rather than being designed as longitudinal surveys from the outset. One example is for China, where the samples for the cross-sectional urban and rural surveys done by the National Bureau of Statistics are not rotated every year. It is thus possible to construct panels for some periods (Chen and Ravallion 1996).

A second example is the study by Chetty et al. (2014) of intergenerational income mobility in the United States. Chetty et al. use income tax records to link children to their parents (who had typically filed for them as dependents prior to leaving home). They find that measures of income mobility have been quite stable since the 1970s. For example, they estimate that the probability of a child born into the bottom quintile of incomes rising to be in the top quintile as an adult was 0.08 for those born in 1971 versus 0.09 for those born in 1986. This seems puzzling given the rise in income inequality although (as Chetty et al. note) a large amount of that rise has been at the very top of the distribution in the United States.

Further reading: See Ashenfelter et al. (1986) on the arguments for and against collecting panel data. On using panel data to study poverty dynamics (in the context of testing the performance of a safety net), see Ravallion et al. (1995). On the measurement of mobility, see Fields (2001, chs. 6 and 7). On the implications of measurement error in panel data, see Glewwe (2012).

The most common survey used in poverty analysis is a single cross-section for a nationally representative sample, with the household as the unit of observation (though with some information obtained from specific individuals), and it includes either consumption or income data. The following are the main problems to be aware of when interpreting household consumption or income data from such a household survey.21

Survey Design

Even very large samples may give biased estimates for poverty measurement if the survey is not random, or if the data extracted from it have not been corrected for possible biases, such as due to sample stratification (box 3.6). A random sample requires that each person in the population (or, in a stratified sample, each person in a given subgroup) has an equal chance of being selected. This guarantees statistical independence, the assumption that underlies most of the results used routinely in making statistical inferences about population parameters from sample surveys. (See box 3.6 on statistical independence.)

Poor people may not be properly represented in sample surveys; for example, they may be harder to interview because they live in remote areas or are itinerant. Indeed, a household survey may miss one distinct subgroup of the poor: those who are homeless. Also, some of the surveys that have been used to measure poverty were not designed for this purpose, in that their sample frame was not intended to span the entire population. Examples include labor force surveys, for which the sample frame is typically restricted to the “economically active population,” which precludes certain subgroups of the poor. Key questions to ask about any survey are: Does the sample frame span the entire population? Is there likely to be a response bias, in that the likelihood of cooperating with the interviewer is not random (box 3.6)?

21 There are a number of other issues in survey design which I will not cover here, including questionnaire design and field organization. See Iarossi (2006), Bethlehem (2009), and Bryman (2012) for useful overviews of these issues. Also see UN (1989). The classic LSMS questionnaire design is described in Grootaert (1986) and Ainsworth and van der Gaag (1988).

Naturally selective response, whereby some types of households are less likely to participate in surveys, can be a serious concern when measuring poverty or inequality. The expectation is that it tends to be the relatively well-off who are less inclined to participate. Then we will overestimate the poverty rate unless the bias can be corrected; the implications for measuring inequality are theoretically ambiguous.22 We shall return to this issue below.

A question for survey design is whether those who agree to participate should be paid. Practice is uneven in this respect, with some surveys making payments (often modest) and others not. The use of new technologies for doing surveys (such as mobile phones) has also brought up this issue. Response rates for such surveys tend to be lower than for conventional household surveys, so it may seem attractive to find some form of compensation, such as by giving free phone time to those who agree to participate. However, there is a risk that such practices will actually make matters worse; yes, the overall response rate will rise, but the sample may well be even more biased, with less representation by the rich. This is explained further in box 3.8. We shall return to the problems of selective response when we consider measurement errors further below.

Box 3.8 The Economics of Survey Participation

Survey participation is a matter of individual choice; nobody is obliged to comply with the statistician’s randomized assignment. There is some perceived benefit from compliance (the satisfaction of doing one’s civic duty) but there is a cost as well. That cost can be expected to rise with income. For example, the opportunity cost of the time required to comply rises with income (due to a higher wage rate), while the time itself is roughly independent of income. The potential survey respondent must weigh the perceived benefits against the cost.

It seems reasonable to expect that the marginal cost (MC) of survey participation rises with participation (as measured by the time spent doing the survey); the longer you spend doing the survey, the more it starts to eat into other valued activities. We can also assume that higher income implies a higher MC of participation. The latter property can be rationalized in terms of the foregone income of time spent doing a survey, which will be higher for those with higher wage rates.

22 Notice that selective compliance is not the same as making income transfers between the rich and poor, which must change the inequality measure. With selective compliance one is moving shares of the population, which creates the ambiguity. This is explained in more technical terms in Korinek et al. (2006).


[Figure: marginal cost or benefit (vertical axis) plotted against the survey participation rate (horizontal axis), showing the marginal benefit of participation and the marginal cost of survey participation, with a higher marginal cost at higher income.]

Figure B3.8.1 The Choice to Participate in a Survey.

It is reasonable to assume that the marginal benefit (MB) of participation is not affected by participation, or at least does not rise with participation. Let us also assume that the MB does not rise with income.

Under these conditions, there is an optimal level of individual participation, equating MB with MC, as illustrated in figure B3.8.1. As income rises the desired participation falls. The rich will be less likely to comply than the poor.

A fixed fee paid to those who agree to participate will increase the probability of participation, but it can also increase the likelihood of a bias whereby the response rate falls with income. This happens if the fee is a stronger incentive for the poor to participate. So compensating survey participants may increase your sample size but reduce your ability to draw valid inferences about the distribution of income in the population from that sample.

Further reading: A more complete discussion of this topic can be found in Korinek et al. (2006). In a more elaborate model of survey participation, Korinek et al. (2006) show that under certain conditions one can get an inverted U relationship, whereby the poorest and the rich are less likely to want to participate in the survey than middle-income groups.
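The logic of box 3.8 can be illustrated numerically. The sketch below assumes, purely for illustration, a constant marginal benefit and a marginal cost that is linear in both participation and income; these functional forms, the function name, and the parameter values are hypothetical and are not taken from the text or from Korinek et al. (2006).

```python
def desired_participation(income, marginal_benefit=1.0, cost_slope=0.05):
    """Participation level at which MB equals MC, capped at full participation.

    Assumed functional forms (hypothetical, for illustration only):
      MC = cost_slope * income * p   (rises with participation p and with income)
      MB = marginal_benefit          (constant in p and in income)
    Setting MB = MC gives p = marginal_benefit / (cost_slope * income).
    """
    return min(1.0, marginal_benefit / (cost_slope * income))


incomes = [10, 20, 40, 80, 160]
levels = [desired_participation(y) for y in incomes]
for y, p in zip(incomes, levels):
    print(f"income {y:>4}: desired participation {p:.3f}")
```

Under these assumptions desired participation never rises with income, which is the pattern described in the box: the rich are less likely to comply than the poor.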

There are various methods of sampling that can help achieve a more cost-effective survey than would be possible with simple random sampling (box 3.6). Stratified random sampling, whereby different subgroups of the population have different (but known) chances of being selected but all have an equal chance in any given subgroup, can increase the precision in poverty measurement obtainable with a given number of interviews; for example, one can oversample certain regions where the poor are thought to be concentrated. Cluster sampling, by contrast, reduces precision, since the surveyed households within a given cluster cannot be considered independent (boxes 3.6 and 3.9).


* Box 3.9 The Design Effect on Standard Errors

One is often interested in having a sufficiently large sample for each primary sampling unit (PSU), such as for measuring geographic variables used to help explain poverty or even poverty rates by PSU. This calls for adequate samples at the second stage of the two-stage sampling design (box 3.6). But this leads to a further problem. For a given aggregate sample size, larger samples at the local level increase the standard error of the overall estimate of the poverty measures or other population parameters. This is called the design effect (DE). It is measured by the ratio of the actual variance (for a given variable in the specific survey design) to the variance in a simple random sample of the same size. It can be shown that the design effect is given by

DE = 1 + ρ(B – 1),

where ρ is the correlation coefficient within the PSU and B is the sample size drawn in each PSU. Study designs can thus face a trade-off between the need for precision in estimating the population parameters of interest and the ability to measure the things one is interested in at PSU level.

Further reading: The classic treatment of this topic is Kish (1965, ch. 5).
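The trade-off in box 3.9 can be made concrete with a short calculation. The sketch below applies DE = 1 + ρ(B – 1) for a fixed total sample and several choices of B; the value of ρ, the total sample size, the headcount of 0.3, and the function names are hypothetical. The standard error uses the usual simple-random-sampling formula for a proportion, scaled by the square root of DE.

```python
import math


def design_effect(rho, per_psu):
    """DE = 1 + rho * (B - 1), with B households sampled in each PSU."""
    return 1.0 + rho * (per_psu - 1)


def se_headcount(h, n, rho, per_psu):
    """Standard error of a headcount index h estimated from n households:
    the simple-random-sampling variance h(1 - h)/n multiplied by DE."""
    return math.sqrt(design_effect(rho, per_psu) * h * (1.0 - h) / n)


n_total = 2000   # hypothetical aggregate sample size
rho = 0.05       # hypothetical within-PSU correlation
for per_psu in (5, 10, 20, 40):
    de = design_effect(rho, per_psu)
    se = se_headcount(0.3, n_total, rho, per_psu)
    print(f"B={per_psu:>2}  PSUs={n_total // per_psu:>3}  DE={de:.2f}  SE={se:.4f}")
```

Holding the total sample fixed, a larger B means fewer PSUs and a larger design effect, so the overall estimate becomes less precise even as each PSU is measured better.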

The choices made in questionnaire design can matter to the measures obtained from a given sample. Qualitative research methods and pilot testing can improve survey design, to assure that the subject, phrasing, and sequencing of questions can address the hypotheses to be tested. Focus groups can be a useful qualitative tool in the design stage of a survey, such as in formulating relevant questions. Pilot testing is essential for any draft questionnaire.

One general point should be clear: one should be aware of any significant changes in survey design across the domain of the poverty comparison, such as differences in the sample frame or questionnaire. Changes in the wording of a question, or changes in the location of the same question in a survey instrument, can change the results.23

Goods Coverage and Valuation

The coverage of goods and income sources in the survey should be comprehensive, covering both food and non-food goods when measuring consumption, and all income sources when measuring income. Consumption should cover all monetary expenditures on goods and services consumed plus the monetary value of all consumption from income in kind, such as food produced on the family farm, or hunted and gathered from common property resources, and the imputed rent for owner-occupied housing.24 Similarly, the income definition should include income received in kind, although practices vary. Local market prices often provide a good guide for valuation of own-farm production or owner-occupied housing. The valuation of noncash benefits from public services is often difficult, though potentially important. For transfers in kind of market goods, prevailing prices are generally considered to be satisfactory for valuation. Non-market goods (such as free use of a public health clinic or school) present a more serious problem, and there is no widely preferred method. A separate monitoring of the use of public services by poor people will be needed.

23 See, e.g., Kilic and Sohnesen (2014), who cite other relevant evidence.

24 For further discussion of consumption and income definitions in household surveys, see UN (1989). On measuring consumption, also see Deaton and Zaidi (2002).

A common problem facing the welfare analyst using a household survey is that the survey may not be properly integrated, in that categories do not match in relevant ways across different segments of the survey. For example, to evaluate the welfare effects of a change in food staple prices in a food-producing country, it is not enough to know the budget shares of consumption at the household level; one must also know household food production. Whether a household gains or loses from a change in the price of food depends on consumption net of production; if you consume more than you produce of some good, then you will be worse off when the price of that good rises. However, it is quite common that household surveys in rural areas either do not include data on farm production, or they do not use the same commodity categorization in the consumption and production schedules. This can make those surveys virtually useless for analyzing certain policy problems such as trade reforms. (Chapter 9 discusses trade policies further.)

Variability and the Time Period of Measurement

It has long been recognized that the existence of variation over time in incomes and/or prices has implications for the definition and measurement of “real income.” Income observed over a relatively short period of time may be deceptive about economic welfare. John Hicks (1939, 176) defined a person’s “income” as “what he can consume during the week and still expect to be as well off at the end of the week as he was at the beginning.”

It is sometimes argued that what we really need to measure is “wealth.” A common economic definition is the discounted present value of all future incomes, although this is making some rather strong assumptions (box 3.10). Finding a unique definition of wealth that can be considered realistic and comprehensive has proved difficult in practice.

Box 3.10 Wealth as the Present Value of Future Income

The present value (PV) of a person’s current and future incomes is a common economic definition of “wealth.” To obtain the PV, we cannot simply add up all current and future incomes, since this would treat $1 in the future as equivalent to $1 in your hand today, which cannot be right. We need to discount future incomes. With just two periods, the present value of income is given by:

\[
W = Y_1 + \frac{Y_2}{1 + r},
\]


where \(Y_1\) and \(Y_2\) denote incomes realized at date 1 (“today”) and date 2, respectively, and \(r\) is the rate of interest, which is the rate at which future income is discounted to obtain the PV. Hence, \(r\) is also called the “discount rate,” although this term often refers to the personal discount rate, which is also called the rate of time preference, rather than a market rate of interest. To understand this formula (some version of which often appears in economics and finance) note that if you had the sum \(Y_2/(1 + r)\) at date 1 and invested it for one year you would have \(Y_2\) at date 2. We can readily generalize this formula to any longer time period. Defining the income stream as the sequence of incomes \(Y_1, Y_2, \ldots, Y_{T-1}, Y_T\) from now (date 1) to date \(T\):

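\[
W = Y_1 + \frac{Y_2}{1 + r} + \frac{Y_3}{(1 + r)^2} + \ldots + \frac{Y_T}{(1 + r)^{T-1}} = \sum_{t=1}^{T} \frac{Y_t}{(1 + r)^{t-1}}.
\]

If the stream of income is a constant \(Y\) (called an “annuity”) then, summing the geometric series, this simplifies to:

\[
W = \frac{Y(1 + r)}{r}\left[1 - \frac{1}{(1 + r)^T}\right].
\]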

All this is clearly making some strong assumptions including perfect foresight with no uncertainty. (This can be relaxed to perfect stochastic foresight, allowing for unpredictable errors, sometimes called “rational expectations.”) The method also runs into a problem when some markets are missing for some components of wealth.

There is also a practical problem in implementing this definition: we cannot obtain future incomes from a household survey today. However, we can ask about current consumption. Under certain conditions, consumption depends on wealth, not current income. Box 3.11 goes further into this topic.
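As a numerical check on the discounting in box 3.10, the following sketch computes a present value by direct summation. It assumes the timing of the two-period formula, in which income at date 1 is not discounted and income at date t is divided by (1 + r) raised to the power t – 1. Under that timing, a constant income Y over T dates has the closed form Y(1 + r)/r × [1 – 1/(1 + r)^T]. The function names and the numbers are hypothetical.

```python
def present_value(incomes, r):
    """PV of incomes received at dates 1, 2, ..., T.

    Date-1 income is not discounted; date-t income is divided by (1 + r)**(t - 1).
    """
    return sum(y / (1.0 + r) ** k for k, y in enumerate(incomes))


def annuity_value(y, r, periods):
    """Closed form for a constant income y at dates 1..periods (same timing)."""
    return y * (1.0 + r) / r * (1.0 - (1.0 + r) ** (-periods))


r = 0.05
two_period = present_value([100.0, 105.0], r)   # 100 + 105/1.05, close to 200
by_sum = present_value([50.0] * 30, r)
by_formula = annuity_value(50.0, r, 30)
print(two_period, by_sum, by_formula)
```

The summation and the closed form should agree up to floating-point rounding, which is a quick way to confirm that the exponent convention is applied consistently.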

The existence of income variability over time is one reason some analysts prefer current consumption to current income as the indicator of economic welfare. Real incomes of the poor can vary over time in predictable ways (as well as unpredictable ones). This is particularly true in underdeveloped rural economies depending on rain-fed agriculture. Under certain conditions, consumption will then reveal permanent income, as given by the return to long-term wealth. This is an implication of Milton Friedman’s (1957) Permanent Income Hypothesis (box 3.11).

However, even when those conditions do not hold, such as when credit markets work poorly, there are often ways in which people can smooth their consumption in response to changes in their incomes or the prices they face, such as by drawing on savings or turning to their friends and family.


Box 3.11 Inter-Temporal Consumption Choice and the Permanent Income Hypothesis

Consider the static consumer choice problem discussed in boxes 1.4 and 3.1, but now replace “clothing” by “consumption at date 1” and replace “food” by “consumption at date 2.” Utility depends on both, and we assume convex indifference curves reflecting the scope for substitution, similarly to box 3.1.

A pioneering model of inter-temporal consumption behavior was provided by Friedman (1957) and is called the Permanent Income Hypothesis (PIH). This was developed as an alternative to the simplest Keynesian model, which assumed that current consumption depended on current income. (Other alternatives were the relative-income hypothesis of Duesenberry [1949] and the life-cycle model of Ando and Modigliani [1963].) Friedman postulated that both income and consumption in any period had both permanent and transient components; we can write these as:

\[
Y = Y^P + Y^T, \qquad C = C^P + C^T
\]

for income (\(Y\)) and consumption (\(C\)), respectively. “Permanent income” (\(Y^P\)) is taken to be determined by long-run wealth over some time horizon; for a sufficiently distant horizon, we can simply set \(Y^P = rW\). Transient income (\(Y^T\)) fluctuates over time, and Friedman assumed that \(Y^T\) had zero mean and was uncorrelated with the other variables in his model. The key element of the PIH is that permanent consumption is directly proportional to permanent income: \(C^P = kY^P\). Friedman postulated that the coefficient \(k\) depends on factors such as the rate of interest and preferences.

The idea that current consumption reveals wealth is attractive, but it does require some strong assumptions. In particular, it assumes that the consumer has perfect foresight and access to a perfect credit market. Yet people (and no less poor people) grapple with uncertainty about the future and are exposed to uninsured risks. And markets are incomplete, meaning that not all income-generating components of wealth have prices attached to them, to enable straightforward aggregation.

While the strong implications of the PIH (such as \(C^P = kY^P\)) have not received much empirical support, a number of studies have found that consumption responds more to permanent income than transient income. There is normally some degree of consumption smoothing, even for poor people.

Further reading: For a good discussion of consumption in an inter-temporal context, see Deaton (1992). (Deaton won the 2015 Nobel prize in economics.)

This observation has two distinct implications for welfare measurement: (1) current consumption is almost certainly a better indicator than current income of the current standard of living, and (2) while current consumption is unlikely to be ideal, it is a better indicator of long-term well-being than current income, as consumption will reveal at least some information about incomes at other dates, in the past and future.

While there is a strong case for preferring consumption over income in measuring welfare, it should not be forgotten that a number of factors can make current consumption a noisy welfare indicator. Even with ideal smoothing, consumption will still (as a rule) vary over the life cycle. Thus, two households with different lifetime wealth (one “young,” the other “old”) may happen to have the same consumption at the survey date. This may be less of a problem in traditional societies where the extended family is still the norm, though that is rapidly changing.

There are other sources of noise in the relationship between current consumption and long-term standard of living. Different households may face different constraints on their opportunities for consumption smoothing. It is generally thought that the poor are far more constrained in their borrowing options than the non-poor.

There is also evidence that poor people tend to be less well-insured (box 3.12). While consumption smoothing and risk-sharing arrangements clearly do exist, how well they perform from the point of view of poor people is a moot point.25 The literature on risk and insurance in poor rural economies suggests three stylized facts: (1) income risk is pervasive; (2) household behavior is geared in part to protecting consumptions from such risk; (3) the mechanisms of doing so are both private and social, the latter comprising various informal risk-sharing arrangements among two or more households.

Box 3.12 How Well Are the Poor Insured? Evidence from Rural China

Are all rural households equally vulnerable to uninsured risks? One study addressed this question using a six-year household panel data set for rural China. This was not planned as a panel, but the sample design entailed long periods in which there was little sample rotation (box 3.7). The study used this constructed panel data set to test for systematic wealth effects on the extent of consumption insurance against income risk. Motivated by the theory of risk-sharing, the tests entailed estimating the effects of income changes on consumption, with current income treated as endogenous, after controlling for aggregate shocks through interacted village–time dummies. The study also tested for insurance against covariate risk at village level. To test for wealth effects, the study stratified the sample on the basis of household wealth per capita, and whether or not the household resides in a poor area. The method avoided the problems identified in past tests for risk-sharing, which were biased in favor of showing that there was insurance even when there was none.

The full insurance model was convincingly rejected. The lower a household’s wealth, the stronger the rejection, in that the implied marginal propensity to consume out of current income is higher for poorer households. While there are clearly arrangements for consumption insurance in these villages of southern China, they work considerably less well for the asset-poor households.

25 Relevant empirical work for developing countries includes Bhargava and Ravallion (1993), Townsend (1994), Ravallion and Chaudhuri (1997), Jalan and Ravallion (1999), and Dercon and Krishnan (2000).

Such results strengthen the case, on both equity and efficiency grounds, for public action to provide better insurance in underdeveloped rural economies. The specific form that such action should take in given circumstances is still, however, an open question. The results of this study also suggest that, unless credit and insurance options for poor families can be improved, one should not be surprised to see persistent inequality, and an inequitable growth process, in this setting.

Further reading: The study referred to is by Jalan and Ravallion (1999), which gives full details. On the sources of bias in past methods of testing for insurance, see Ravallion and Chaudhuri (1997).

Advocates of using income data in preference to consumption often point to constraints on inter-temporal choice and risk sharing. However, such constraints do not justify believing that current income is a better welfare metric than consumption. We need not presume that markets are perfect to expect that consumption will be smoothed to some extent in the face of income fluctuations. Households can save and they do have foresight.

Another argument sometimes made for preferring current income measures is that they better reflect what is called “potential consumption.” But this too is questionable. First, we can question whether potential consumption is a valid welfare indicator. A poor farmer may get a bumper harvest once in twenty years, but he can hardly be judged to be no longer poor, even in that fortunate year. Second, even if we accept that potential consumption is what we are after, income is hardly a good measure; we would surely want to know liquid wealth, and here too actual consumption may well be more revealing.

A further argument made for using current income to measure economic welfare arises when assessing the distributional implications of taxes or transfers. For current income, there is a straightforward accounting identity whereby we can add transfers and subtract taxes. That is not so for consumption, given that there could well be a savings response. We would need to figure out the likely response. However, this practical advantage of income is not as great as it may seem at first given that income in the absence of the taxes or transfers will not in general equal income minus the net transfer (gross transfer minus taxes paid). For example, households can respond through their labor supply (recall box 1.4) or private transfers may respond. Either way (whether using income or consumption), we may end up having to model behavior to properly assess the incidence and targeting of public spending.

Some of these issues have implications for survey design. Since many of the rural poor face marked seasonality, income over a whole year will better reflect living standards for agricultural households than over one quarter, say. However, interviewee recall is imperfect, and can rarely be relied upon to span the frequency of income variation. Thus, consumption will often be a better guide, as discussed above. Careful survey design can also enhance precision in estimating consumption. Better estimates can generally be obtained by adapting the period of recall to the frequency at which the type of good is purchased; recall for one week may be fine for food, while a three-month period is probably more appropriate for clothing, say. With panel data one can enhance precision in estimating typical living standards by averaging the multiple consumption or income observations over time.26
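When recall periods differ across goods, the reported amounts have to be put on a common time basis before they are added up. The sketch below shows one simple way this might be done, scaling each item to a year; the recall lengths (7 days for food, 91 days for clothing), the amounts, and the function name are hypothetical and not drawn from any particular survey.

```python
# Days covered by each item's recall period (hypothetical questionnaire design).
RECALL_DAYS = {"food": 7, "clothing": 91}


def annualized(spending, recall_days=RECALL_DAYS, days_per_year=365):
    """Scale each item's reported spending from its recall period to a full year."""
    return {item: amount * days_per_year / recall_days[item]
            for item, amount in spending.items()}


reported = {"food": 14.0, "clothing": 26.0}   # made-up amounts for one household
annual = annualized(reported)
total_annual = sum(annual.values())
print(annual, total_annual)
```

Simple scaling of this kind assumes the recall period is typical of the year, which is exactly what marked seasonality can undermine; averaging several rounds of panel data is one way of reducing that noise.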

Measurement Errors in Surveys

Systematic errors in the incomes or expenditures reported in surveys have important implications for measures of poverty and inequality based on those surveys.27 Classical measurement error in the reported incomes of sampled households leads to overestimation of standard inequality measures.28
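The effect of classical measurement error on measured inequality can be seen in a small simulation. The sketch below draws hypothetical “true” incomes from a lognormal distribution, adds zero-mean noise drawn independently of true income, and compares Gini indices. The distributions, their parameters, the sample size, and the seed are all assumptions made for illustration only.

```python
import random


def gini(values):
    """Gini index computed from the values sorted in ascending order."""
    xs = sorted(values)
    n = len(xs)
    rank_weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * rank_weighted / (n * sum(xs)) - (n + 1.0) / n


rng = random.Random(0)                      # fixed seed for reproducibility
true_income = [rng.lognormvariate(0.0, 0.5) for _ in range(20000)]
# Classical error: zero mean and drawn independently of true income.
reported_income = [y + rng.gauss(0.0, 0.4) for y in true_income]

gini_true = gini(true_income)
gini_reported = gini(reported_income)
print(gini_true, gini_reported)
```

In this setup the noise spreads reported incomes out around the true values without changing their mean, so the Gini index of reported income comes out above that of true income.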

Two types of measurement error are identified by survey statisticians. The first is “item nonresponse.” This can occur when some of the sampled households who agree to participate refuse to answer specific questions, such as on certain components of their incomes, which can be sensitive. Missing values are also likely for dwelling rents when the respondent owns the dwelling; even when surveys ask for an imputed rent, this is often unknown. Various imputation/matching methods address item nonresponse by exploiting the questions that are in fact answered, although this does not appear to be done as often as it should be in practice. Box 3.13 explains how this can be done in more detail.

* Box 3.13 Regression as a Tool for Dealing with Missing Data

The essential idea here is to statistically model the observed responses to the question for which data are missing and use that model to predict the missing responses. That can be done using a regression model; recall from box 1.19 that this gives the predicted value of a dependent variable (Y) as a linear function of one or more explanatory variables (X). The dependent variable is the response when it was given and the predictor variables are things that were answered by a larger sample, including those who did not respond.

Suppose we have a full sample of \(N\) households but only a subset of \(M\) (\(< N\)) of them responded to the income questions. Let the set of income responders be IR. We have \(K\) variables that everyone responded to. Then we can imagine running a linear regression on the \(M\) observations with complete data:

\[
Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \ldots + \beta_K X_{Ki} + \varepsilon_i \quad \text{for all } i \text{ in the set IR.}
\]

26 See, e.g., Ashenfelter et al. (1986), Ravallion (1988a), Lanjouw and Stern (1991), Chaudhuri and Ravallion (1994).

27 There is a large literature but some important contributions include Van Praag et al. (1983), Chakravarty and Eichhorn (1994), Cowell and Victoria-Feser (1996), and Chesher and Schluter (2002).

28 A classical measurement error has zero mean and is uncorrelated with the true value of that variable. In some applications of the concept, it is also uncorrelated with other relevant variables. For further discussion in the context of inequality measurement, see Chakravarty and Eichhorn (1994).


Here \(Y_i\) is the income response, \(X_{ki}\) is the \(k\)th explanatory variable (\(k = 1, \ldots, K\), with \(K < M - 1\)) and \(\varepsilon_i\) is the error term, which can include errors in the reported incomes. For example, when the missing data are the dwelling rentals (invariably missing for owner-occupiers), then the X’s should include all the dwelling and location characteristics available in the survey. Such a regression for imputing rents is often called a “hedonic regression.” The parameters \(\alpha, \beta_1, \beta_2, \ldots, \beta_K\) are fixed numbers across the sample. The values of these parameters can be estimated by the method of Ordinary Least Squares (OLS), which chooses the estimates to give the best possible fit to the data, specifically to minimize the sum of the squared errors. We can then form predicted values for the rest of the sample (those not in the set IR), using the data we have on the X’s for those observations and the parameters estimated from the above model.

An alternative to a regression is to use matching methods. These do not require a (potentially restrictive) parametric regression model of the missing variable. Instead, for each missing value one finds the most similar observation for which a response was recorded, where “similar” is defined by the predicted probability of responding (often called the “propensity score,” following Rosenbaum and Rubin 1983).

Further reading: On regression models in general, see Wooldridge (2013). See Little and Rubin (1987) for more on imputation methods. The use of matching methods in this context is an instance of their general application to problems of missing data, which includes the problem of impact evaluation; we return to this topic in chapter 6.
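The regression approach in box 3.13 can be sketched with a single explanatory variable. The example below fits OLS on the households that reported a rent and predicts the rent for those that did not. The data, the use of floor area as the only dwelling characteristic, and the function name are hypothetical; a real hedonic regression would use many dwelling and location characteristics.

```python
def fit_ols(x, y):
    """OLS intercept and slope for a regression of y on one variable x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# (floor area in square metres, monthly rent); None marks a missing rent,
# as for an owner-occupier. All numbers are made up.
households = [(30, 90.0), (45, 120.0), (60, 150.0), (80, None), (50, None)]

# Estimate the model on the responders only (the set IR in the box).
observed = [(area, rent) for area, rent in households if rent is not None]
alpha, beta = fit_ols([a for a, _ in observed], [r for _, r in observed])

# Keep reported rents; fill in predicted rents where the response is missing.
imputed = [(area, rent if rent is not None else alpha + beta * area)
           for area, rent in households]
print(alpha, beta, imputed)
```

The imputed values carry no error term, so they are smoother than real rents would be; that is one reason imputed data tend to understate the dispersion of the variable being filled in.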

Such methods are not an option for the second type of measurement error, namely “unit nonresponse.” Typically there is some proportion of sampled households that does not participate in a survey, either because they explicitly refuse to do so or nobody is at home (box 3.6). Some surveys make efforts to avoid unit non-response, using “call-backs” to non-responding households and fees paid to those who agree to be interviewed (though recall box 3.8).29 Nonetheless, the problem is practically unavoidable and non-response rates of 10% or higher are common; indeed, there are national surveys for which 30% of those sampled did not comply.30

29 On reducing bias using call-backs, see Deming (1953), Van Praag et al. (1983) and Groves (2006).

30 Scott and Steele (2004) report non-response rates for eight countries, which are as high as 26%. Holt and Elliot (1991) quote a range of 15%–30% for surveys in the United Kingdom. Philipson (1997) reports a mean non-response rate of 21% for surveys by the National Opinion Research Center in the United States.

Under certain conditions, one can correct survey data for selective compliance after the data are collected. Anton Korinek et al. (2007) propose a method for addressing this source of bias. The idea is to use the geographic distribution of survey response rates (the proportion of the original random sample for that area that agreed to be interviewed) to infer how the probability of agreeing to be interviewed varies with income and other covariates. Under certain conditions one can infer those probabilities, allowing for the fact that the measured incomes are biased by selective noncompliance.31 Once one has the estimated probabilities one can re-weight the data to correct for the problem. Korinek et al. find that the probability of being interviewed falls steadily as income rises, from 95% or higher for the poorest decile to only 50% for the richest. Thus, the observations one has on rich households need to be weighted up relative to those for poor ones. The key condition for this method to work is that one has at least someone from each income level who agrees to be interviewed.32 Box 3.14 gives a simple example of the idea.
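The re-weighting step can be sketched as follows, taking the response probabilities as given. Each respondent is weighted by the inverse of the probability of responding, so that households of a type that rarely responds count for more. The incomes, probabilities, poverty line, and function name are hypothetical, and estimating the probabilities themselves (the hard part of the method of Korinek et al.) is not attempted here.

```python
def weighted_headcount(incomes, response_probs, poverty_line):
    """Poverty rate with each respondent weighted by 1 / P(responding)."""
    weights = [1.0 / p for p in response_probs]
    poor_weight = sum(w for y, w in zip(incomes, weights) if y < poverty_line)
    return poor_weight / sum(weights)


# Made-up respondents: income and an assumed probability of having responded,
# falling with income.
incomes = [5, 8, 12, 30, 90]
probs = [0.95, 0.95, 0.90, 0.75, 0.50]
line = 10

naive = sum(y < line for y in incomes) / len(incomes)
corrected = weighted_headcount(incomes, probs, line)
print(naive, corrected)
```

Because the richer respondents stand in for more non-respondents, the corrected poverty rate is lower than the unweighted one in this example.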

* Box 3.14 Correcting for Selective Compliance in the Simple 2 x 2 Case

Statistics offices often try to correct for selective compliance (whereby certain types of people do not respond to surveys) while doing the survey. There are limits to how effectively this can be done. It is sometimes possible to correct for survey bias ex post, using the available data on response rates and the (potentially biased) survey statistics.

To illustrate, consider the case of two income groups and two areas, A and B, with known overall survey response rates of $P_A$ and $P_B$. In other words, $P_A$% of those who were randomly sampled in area A were actually interviewed. There are two income groups, “poor” and “non-poor.” Survey-based estimates are made of the proportion of people who are poor or mean income (or other stats).

Two key assumptions are made. First, it is assumed that all the poor have the same response rate ($P_P$) and all the non-poor have the same response rate ($P_N$) and that $P_P > P_N$. This is a simple behavioral model of compliance. Second, it is assumed that the probability of compliance does not vary between groups A and B independently of income.

That is enough to correct for the bias due to selective compliance in the sample survey. Consider first the data we have on the response rates, $P_A$ and $P_B$. These are the weighted means of the response rates for the poor and non-poor, with weights given by the true (but unobserved) poverty rates in areas A and B, denoted $H_A$ and $H_B$, respectively. Thus we have:

$$H_A P_P + (1 - H_A) P_N = P_A$$

$$H_B P_P + (1 - H_B) P_N = P_B.$$

Next consider the estimated poverty rates in A and B, denoted $\hat{H}_A$ and $\hat{H}_B$. These are determined by the survey response rates of the poor and non-poor, as well as the true poverty rates; more precisely we have the following formulae for the observed (but potentially biased) poverty rates:

$$\frac{H_A P_P}{H_A P_P + (1 - H_A) P_N} = \hat{H}_A$$

$$\frac{H_B P_P}{H_B P_P + (1 - H_B) P_N} = \hat{H}_B.$$

31 See Korinek et al. (2007).

32 This is called the “common support” assumption.

We have four (nonlinear) equations in four unknowns. That does not guarantee that a solution exists, but it does for all practical purposes in this case. We then solve these four equations for $P_P$, $P_N$, $H_A$, and $H_B$ as functions of the data on $P_A$, $P_B$, $\hat{H}_A$, and $\hat{H}_B$. In the special case in which there is no survey bias, so that poor and non-poor are equally likely to respond ($P_P = P_N$), we have $\hat{H}_A = H_A$ and $\hat{H}_B = H_B$.

For example, suppose that the estimated poverty rates are 55% and 31% and survey response rates are 66% and 58% for A and B, respectively. Then $P_P = 0.9$, $P_N = 0.5$, and the true (unbiased) poverty rates are $H_A = 40\%$ and $H_B = 20\%$. We can apply the same idea to other statistics, such as means.

Note on the literature: This simple 2 x 2 example is purely for expository purposes. Of course, in reality we have many income groups and areas. A more general estimation method is found in Korinek et al. (2007).
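The 2 x 2 correction in box 3.14 can be sketched in a few lines of Python. This is only an illustrative sketch, not the estimator of Korinek et al. (2007). It relies on one observation: once each observed poverty rate is multiplied by the area's response rate, the four equations become linear in the reciprocals of the two group response rates. The function names `observed` and `solve_selective_compliance` are hypothetical, chosen for this illustration.

```python
def observed(h_true, p_poor, p_nonpoor):
    """Forward model: area response rate and observed poverty rate."""
    p_area = h_true * p_poor + (1.0 - h_true) * p_nonpoor
    h_hat = h_true * p_poor / p_area
    return p_area, h_hat


def solve_selective_compliance(h_hat_a, h_hat_b, p_a, p_b):
    """Recover (P_P, P_N, H_A, H_B) from observed poverty and response rates.

    With x = 1/P_P and y = 1/P_N the system is linear:
        a_k * x + b_k * y = 1   for each area k,
    where a_k = h_hat_k * p_k is the share of the original sample that was
    interviewed and poor, and b_k = (1 - h_hat_k) * p_k is the share that
    was interviewed and non-poor.
    """
    a1, b1 = h_hat_a * p_a, (1.0 - h_hat_a) * p_a
    a2, b2 = h_hat_b * p_b, (1.0 - h_hat_b) * p_b
    det = a1 * b2 - b1 * a2
    if abs(det) < 1e-12:
        raise ValueError("the two areas carry the same information")
    x = (b2 - b1) / det  # Cramer's rule
    y = (a1 - a2) / det
    if x <= 0 or y <= 0:
        raise ValueError("no admissible solution for these inputs")
    return 1.0 / x, 1.0 / y, a1 * x, a2 * x


# Rounded figures from the box: estimated poverty 55% and 31%,
# response rates 66% and 58% for areas A and B.
p_p, p_n, h_a, h_b = solve_selective_compliance(0.55, 0.31, 0.66, 0.58)
print(round(p_p, 2), round(p_n, 2), round(h_a, 2), round(h_b, 2))
```

With these rounded inputs the solver returns values close to those quoted in the box; small discrepancies arise because the inputs are themselves rounded. The reciprocals $1/P_P$ and $1/P_N$ are the weights one would then apply to poor and non-poor respondents when correcting other statistics, such as means.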

These errors are of special concern in the context of measuring poverty and inequality, which depends on responses to questions pertaining to incomes and expenditures and those questions can sometimes be sensitive. Some analysts have argued that misreporting of incomes in household surveys justifies scaling up the income distribution so that its mean equals GDP per capita or Private Consumption per capita in the National Accounts System (NAS).33 We can call this the uniform rescaling method. This method ignores the fact that what is called “Private Consumption” in the NAS includes components of institutional consumption, as well as personal consumption, which could introduce a systematic overstatement of household welfare levels. The mismatch in what is being measured is even worse if the scaling up is to GDP per capita itself, rather than only to per capita consumption from the NAS, since GDP includes many things that are not attributable to current household incomes or expenditures. The extent of the gap between the two data sources depends on the economy. In economies with substantial subsistence agriculture and other forms of production for own consumption, it is unlikely that the national accounts system provides a more accurate portrayal of real consumption than the surveys. For example, the latter will typically include information on consumption from own production at the household level.

Income underreporting or selective compliance in surveys is a real concern in measuring poverty and inequality. However, it is unlikely that the proportionate error is constant, leaving relative inequality unchanged.34 If richer households tend to underreport more than middle-income or poorer households, then the uniform rescaling method will “over-correct” at the bottom of the distribution, leading to an underestimation of poverty incidence. It appears likely that richer households are also less likely to participate in surveys.35 As noted, this has theoretically ambiguous implications for inequality. Evidence for the United States indicates that selective compliance entails a non-negligible underestimation of overall inequality.36 Poverty measures are overestimated, but this bias is small in a neighborhood of the US poverty line. By contrast, assuming instead that the response rate is a constant (independent of income) substantially underestimates the poverty rate and does nothing to correct the bias in the inequality measure.

33 See, for instance, Bhalla (2002) and Sala-i-Martin (2006). The series of poverty measures back to 1820 used in chapter 1 from Bourguignon and Morrisson (2002) also uses this method. However, in this case the authors had no choice, since the historical survey data were mostly long lost.

34 See, e.g., Banerjee and Piketty (2005) and Korinek et al. (2006).
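A small numerical sketch may help fix ideas about uniform rescaling. The incomes, the poverty line, and the national accounts mean below are made-up numbers, and the helper names are hypothetical. The sketch shows that multiplying every income by the same factor leaves the Gini index unchanged while lowering the headcount, so the method does nothing about underreporting that rises with income.

```python
def gini(incomes):
    """Gini index from the rank-weighted sum of sorted incomes."""
    xs = sorted(incomes)
    n = len(xs)
    total = sum(xs)
    weighted = sum((rank + 1) * x for rank, x in enumerate(xs))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


def headcount(incomes, z):
    """Share of incomes strictly below the poverty line z."""
    return sum(1 for x in incomes if x < z) / len(incomes)


def rescale_uniform(incomes, target_mean):
    """Scale all incomes by one factor so that their mean equals target_mean."""
    factor = target_mean / (sum(incomes) / len(incomes))
    return [x * factor for x in incomes]


survey = [1.0, 2.0, 3.0, 4.0, 10.0]   # hypothetical survey incomes, mean 4
nas_mean = 5.0                         # hypothetical national accounts mean
poverty_line = 2.2                     # hypothetical poverty line

rescaled = rescale_uniform(survey, nas_mean)
print(gini(survey), gini(rescaled))
print(headcount(survey, poverty_line), headcount(rescaled, poverty_line))
```

In this example the second person is lifted above the line by the rescaling even though nothing has been learned about whether that person's income was actually underreported.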

The likelihood that underreporting and selective compliance lead to an underestimation of “top incomes” in surveys has led to interest in the use of supplementary data from income tax records.37 The methods typically employ Pareto’s Law for fitting the upper tail (recall box 1.21). (The figures on the income shares of the top 1% in the United States in figure 2.4 were estimated this way.) This method also has its pros and cons. Underreporting and compliance problems are undoubtedly less severe in countries (typically rich countries) where the income tax system is well developed, but the problems remain elsewhere. The measures obtained this way need not accord well with the types of real income measures preferred when using sample survey data; in particular, the income concept in the tax records need not accord with the concept that is preferred for measuring poverty or inequality and it is often difficult to identify households from income tax records, so it is not possible to adjust for differences in family size or composition. Most important in this context, the method is better suited to correcting the upper end of the distribution in countries with well-developed income tax systems. It is less relevant for measuring poverty.

Interpersonal Comparisons of Welfare

Household size and demographic composition vary across households, as do prices. These factors generate differences in well-being at given household expenditures. There are various approaches to normalize for these differences based on demand analysis, including equivalence scales, cost-of-living indices, and equivalent income measures.38 The basic idea of these methods of welfare measurement is to use demand patterns to reveal consumer preferences over market goods.39 The consumer is assumed to maximize utility, and a utility metric is derived that is consistent with observed demand behavior, relating consumption to prices, incomes, household size, and demographic composition. The resulting measure of household utility will typically vary positively with total household expenditures, and negatively with household size and the prices faced.

35 This is consistent with the findings of Korinek et al. (2007).

36 See Korinek et al. (2006).

37 See, e.g., Atkinson, Piketty, and Saez (2011) and Piketty (2014).

38 There are a number of good expositions of these topics, including Deaton and Muellbauer (1980). Empirical examples can be found in, inter alia, King (1983), Apps and Savage (1989), Jorgenson and Slesnick (1989), De Borger (1989), and Ravallion and van de Walle (1991a).

39 Though there are measures based on the distance function, or “quantity metric utility,” which do not assume that markets exist; see Deaton and Muellbauer (1980) for discussion and references. However, data on preferences are still required.

The most general formulation of this approach is the concept of “equivalent income,” defined as the minimum total expenditure required for a consumer to achieve his or her actual utility level when evaluated at predetermined reference prices and demographics fixed over all households.40 This gives an exact monetary measure of utility; indeed, it is sometimes called money-metric utility (MMU). Quite generally, equivalent income can be thought of as money expenditures (including the value of own production) normalized by the two deflators: a suitable price index (if prices vary over the domain of the poverty comparison) and an equivalence scale (since household size and composition vary). These deflators are discussed further in the next section.

There are a number of concerns that one should be aware of in all such behavioral welfare measures. A serious problem arises when access to non-market goods (environmental characteristics, public services, demographic characteristics) varies across households, as discussed in section 3.1. Consumptions of market goods only reveal preferences conditional on these non-market goods; they do not, in general, reveal unconditional preferences over both market and non-market goods. (For example, if you live in a place where there is a good quality and free public health clinic you will spend less on private health care.) A revealed set of conditional preferences over market goods may be consistent with infinitely many utility functions representing preferences over all goods (box 3.2). It is then a big step to assume that a particular utility function which can be found to support observed consumption behavior as an optimum is also the one which should be used in measuring welfare.

Household surveys of consumptions and expenditures are the most basic and widely used data for implementing consumption-based welfare indicators. A separate community survey (done at the same time as the interviews, and often by the same interviewers in the enumeration areas chosen for the survey) can provide useful supplementary data on the local prices of a range of goods and on the provision of local public services. By having the community-level data matched to the household level data one can greatly improve the accuracy and coverage of household welfare assessments.

The recall of interviewees on public services and the prices implicit in their reported quantities consumed and expenditures on various goods can also be used for these purposes. However, there are a number of issues to be aware of. Knowledge about local public services depends on usage. This can be an unreliable indicator of actual availability. Estimates of prices (often called “unit values”) can be retrieved from the survey data by dividing expenditures by quantities at the level of each commodity type. This can be useful extra data, but it needs to be handled with care since richer households will tend to purchase higher quality goods within each category. Nor can prices for non-food goods be obtained this way as the data rarely allow meaningful comparisons, so only expenditures are typically obtained in surveys.
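The calculation of unit values can be sketched as follows. The record layout and the function name `unit_values` are hypothetical. Taking the median within each area and commodity is an assumption made here to dampen the influence of outliers; it is not a method prescribed in the text, and it does not remove the quality effect noted above.

```python
from collections import defaultdict
from statistics import median


def unit_values(records):
    """Median unit value (expenditure / quantity) by (area, item).

    records: iterable of dicts with keys 'area', 'item',
             'expenditure', and 'quantity' (hypothetical layout).
    Records with a zero or negative quantity are skipped.
    """
    by_cell = defaultdict(list)
    for r in records:
        if r["quantity"] > 0:
            by_cell[(r["area"], r["item"])].append(
                r["expenditure"] / r["quantity"]
            )
    return {cell: median(values) for cell, values in by_cell.items()}


sample = [
    {"area": "north", "item": "rice", "expenditure": 10.0, "quantity": 5.0},
    {"area": "north", "item": "rice", "expenditure": 12.0, "quantity": 4.0},
    {"area": "north", "item": "rice", "expenditure": 30.0, "quantity": 6.0},
    {"area": "north", "item": "rice", "expenditure": 8.0, "quantity": 0.0},
]
print(unit_values(sample))
```

The third household pays more per unit than the other two, which could reflect either a higher local price or a higher quality of rice; the data alone cannot distinguish the two.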

40 See, e.g., King (1983). Blackorby and Donaldson (1987) discuss the relationship between such an equivalent income and the “welfare ratio” defined by the ratio of nominal expenditure to the poverty line, defined in turn as the minimum expenditure needed to attain the (fixed) reference welfare level.

4

Poverty Lines

Poverty lines have both descriptive and normative roles. The former is about making poverty comparisons over time and space. The latter is in formulating antipoverty policies. Even before there were poverty measures for descriptive purposes, there were attempts to define what constitutes a reasonable minimum income level to not be considered poor in specific settings for the purposes of policy. Indeed, the basic idea of such a “poverty line” is one of the oldest concepts in applied economics, going back to at least the eighteenth century, such as in the Speenhamland antipoverty policy (see chapter 1).

The economic interpretation of a poverty line is as the cost of attaining a given level of economic welfare or “standard of living” in different places or different dates. The dependence of cost-of-living (COL) indices and equivalence scales on the choice of the fixed reference standard is well understood (as we learned in chapter 3). The key thing about a poverty line is that the reference is for the minimum level of economic welfare needed to not be considered “poor.” That can be determined either objectively (meaning that it is set by an observer, based on data) or subjectively (meaning that it is based on what people themselves think about what constitutes poverty in the society in question). We consider objective lines in section 4.2, and turn to subjective lines in 4.3. But first we review past debates about the idea of a poverty line.

4.1 Debates about Poverty Lines

Today almost everyone has heard of the idea of a “poverty line” and has some personal concept of what standard of living it implies. Poverty lines exist, but views differ on what it means to live at the poverty line.

One thing is agreed almost everywhere today: poverty lines are not “survival lines.” It is undeniable that there exist levels of consumptions of various goods (food, clothing, and shelter) below which survival beyond short periods is threatened. However, in most societies, including some of the poorest, the notion of what constitutes “poverty” goes beyond the attainment of the absolute minimum needed for survival. The reference standard of living for defining the poverty line is almost never the lowest level of living in society.

One school of thought rejects the use of poverty lines altogether, arguing that a person with an observed standard of living ever so slightly below some “poverty line” cannot be appreciably worse off than someone slightly above it. Yet one does not have to believe that there is a jump (a discontinuity in mathematical terms) in any observable welfare indicator to justify such a line. Recall from chapter 3 that there is no escaping the need for external ethical judgments in making welfare comparisons. It is entirely defensible for an external observer to judge that there is a qualitative difference in welfare at one or more critical levels in a specific society. Poverty lines can be seen as normative social judgments, with no less validity than (say) one’s inequality aversion.

It seems that almost everyone has a concept of “poverty,” even if they do not see some discontinuity. Surveys invariably find that there is a unique level of income above which people in the specific society and time tend to think they are not “poor” but below which they are. This point is called the “social subjective poverty line” and it is an important concept that we will return to in section 4.3 of this chapter.

Setting an explicit poverty line can also help focus public attention and action on the situation of poor people. As we learned in chapter 1, the various poverty lines that emerged in England, the United States, and elsewhere around the turn of the twentieth century (in the work of Rowntree, Booth, Hunter, and others) helped many well-off people comprehend just how little some people had to live on, and this helped mobilize action to reduce poverty. Similarly, in modern times, anyone can fairly readily comprehend just how frugal the material level of living is of someone with less than $1 per day at their disposal. This sharply focuses attention on material deprivation. Even before it was feasible to count how many people were living below some poverty line, setting a line was seen to be a useful step in formulating concrete antipoverty policies, such as exemplified by the Speenhamland system in 1795.

Some observers have also worried about the judgments that are required in setting poverty lines.1 Yet in this respect poverty lines are not fundamentally different to many other ideas in applied economics. Indeed, the choice of a reference bundle of goods for setting a poverty line is no more inherently arbitrary a judgment than that of setting the reference bundle of goods for the Consumer Price Index (CPI). Yet very few of those who reject the idea of a poverty line as being “arbitrary” would also reject the use of a CPI on the same grounds. More generally, in both theory and application, all measurement of welfare (including COL indices) calls for a judgment on the reference household characteristics and prices to essentially anchor the “ruler” for measurement. This can be called the “referencing problem.”2

Another issue often debated is the extent to which poverty lines should respect the revealed preferences of poor people themselves. We already confronted this issue in the discussion of welfare measurement in chapter 3. On the presumption that poor families know best how to spend their scarce resources, we should focus on the aggregate resource constraints they face. In practice this means that we focus on their total income or expenditure rather than how much they spend on (say) calories. The same issue is confronted in discussing poverty lines. If poor people know best, then we would want the composition of the poverty bundle used to construct the line to accord with their spending behavior. This approach rules out what can be termed paternalistic poverty lines. An example of the latter would be a line that added up the cost of a list of normative “basic needs” to obtain some line Z without regard to whether people with an expenditure of around Z in that same setting would split their budget similarly.

1 For example, an undergraduate textbook on economics notes (with reference to poverty lines): “Critics argue that defining bundles of necessities is a hopeless task” (Case et al. 2012, 375).

2 See Ravallion (2012c).

This issue becomes important when there are changes in prices. A paternalistic line does not guarantee that when people at the poverty line gain (lose) from those price changes the poverty count will fall (rise). This is because the paternalistic line does not put weight on the prices that are consistent with the weights chosen by poor families themselves.

The following discussion will focus on the main alternative methods of setting poverty lines found in practice. It is worth noting at the outset that a number of the methods reviewed here have in common the idea of anchoring the monetary poverty line to an explicit non-monetary indicator of welfare. Box 4.1 explains the commonly used regression method. In various ways, the methods are trying to introduce information on some welfare indicator into the problem of making poverty comparisons.

*Box 4.1 Using a Welfare Regression to Identify the Poverty Line

A number of the methods reviewed in this chapter can be represented in the following generic form. We have a welfare indicator $W_i$ for household or person $i$ which depends on (say) log household income, $Y_i$, as well as other welfare-relevant characteristics, $X_{ki}$ ($k = 1, \ldots, K$). Let us write this as a simple linear regression model:

$$W_i = \alpha + \beta \ln(Y_i) + \gamma_1 X_{1i} + \ldots + \gamma_K X_{Ki} + \varepsilon_i \quad (i = 1, \ldots, N).$$

Here the parameter $\beta$ is taken to be positive and the error term $\varepsilon_i$ is assumed to have the usual property of zero mean given the values taken by the regressors. The literature on measuring welfare has provided a number of interpretations of $W_i$. It might denote the food share, nutritional status, or subjective welfare.

We now ask: How should we deflate money income to be a valid metric of expected welfare, denoted $E(W_i)$? The answer is clear: we find poverty lines that assure a fixed level of the welfare indicator. These are found by setting $W_i$ in the above equation to a fixed reference level, $W_z$, and then solving the following equation for the poverty line $Z_i$:

$$W_z = \alpha + \beta \ln(Z_i) + \gamma_1 X_{1i} + \ldots + \gamma_K X_{Ki}.$$

The solution is:

$$\ln(Z_i) = [W_z - \alpha - \gamma_1 X_{1i} - \ldots - \gamma_K X_{Ki}]/\beta.$$


Now we see that deflating money income by these poverty lines will assure that we have an exact money metric of the expected value of the welfare indicator, $E(W_i)$, noting that:

$$\ln(Y_i/Z_i) = [E(W_i) - W_z]/\beta.$$

Since $W_z$ is a constant and assuming that $\beta > 0$ we see that $\ln(Y_i/Z_i)$ is nothing more than a rescaled version of expected welfare.

This still leaves begging the question of how $W_z$ is set. That will still require a judgment call, although it is often easier to make that call in the “W space” than the “Y space.” For example, we might look at the stipulated nutritional requirements for good health and normal activities, or we might focus on some obvious point on a subjective scale, such as “my consumption is adequate.” Robustness to other choices should always be tested.
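The algebra of box 4.1 can be checked numerically. The sketch below takes the regression coefficients as given rather than estimating them; all parameter values, the choice of two characteristics, and the function names are hypothetical and serve only to illustrate the formulae.

```python
import math


def expected_welfare(income, xs, alpha, beta, gammas):
    """E(W_i) = alpha + beta * ln(Y_i) + sum_k gamma_k * X_ki."""
    return alpha + beta * math.log(income) + sum(
        g * x for g, x in zip(gammas, xs)
    )


def poverty_line(xs, w_z, alpha, beta, gammas):
    """Z_i solving W_z = alpha + beta * ln(Z_i) + sum_k gamma_k * X_ki."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    log_z = (w_z - alpha - sum(g * x for g, x in zip(gammas, xs))) / beta
    return math.exp(log_z)


# Hypothetical coefficients and reference welfare level.
alpha, beta, gammas = 0.5, 0.8, [-0.10, 0.05]
w_z = 4.0
xs = [5.0, 1.0]       # hypothetical characteristics of one household
income = 200.0        # hypothetical household income

z = poverty_line(xs, w_z, alpha, beta, gammas)
lhs = math.log(income / z)
rhs = (expected_welfare(income, xs, alpha, beta, gammas) - w_z) / beta
print(z, lhs, rhs)
```

The last two printed values agree, which is the statement that $\ln(Y_i/Z_i)$ is a rescaled version of expected welfare. A household whose income equals its own poverty line has expected welfare exactly $W_z$.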

4.2 Objective Poverty Lines

A theme of this chapter’s treatment of poverty lines is to view them as deflators to allow for differences in the costs of attaining a reference standard of living. The COL will typically vary across certain subgroups, such as large and small households, or those living in urban versus rural areas. Any welfare comparisons, including poverty or inequality measures, will clearly need to normalize measured nominal consumptions or incomes for these COL differences to obtain real values.3 This is widely understood in applied economics. The distinctive thing about poverty measurement is that the normalization is anchored to a reference level of living that is needed to not be considered “poor” in a specific context. Of course, any deflator for COL differences must have some (implicit or explicit) reference, so poverty measurement is not conceptually different in this respect. What differs in practice is the type of data that are often used in deciding what that “poverty reference” should be.

Basic Needs Poverty Lines

A common approach in defining a poverty line is to start by identifying certain basic consumption needs, deemed relevant to the domain of the poverty comparison: the basic needs bundle. The most important basic need is clearly the food expenditure necessary to attain the food-energy intake required to support normal activity levels. This is then augmented by an allowance for non-food goods.

3 This gives what Blackorby and Donaldson (1980, 1987) dub a “welfare ratio.”


This method can be given an economic interpretation. At a theoretical level, one can think of the problem of setting a welfarist poverty line as entailing two steps. First, a reference utility level is determined, which can be thought of as the poverty line in utility space. Second, one determines the cost of reaching that utility level in a specific context, such as “rural areas of country X.” In the space of nominal consumption, the poverty line is then the point on the consumer’s cost function (giving the minimum expenditure needed to attain any given utility level) corresponding to that reference utility.4 The basic needs bundle is one that attains the poverty level of utility at prevailing prices. In economic terms, this bundle must be found on the demand functions holding utility constant, as explained in box 4.2.

Box 4.2 An Economic Interpretation of the Poverty Bundle

Recall box 3.1. We can readily see how the amounts of food and clothing that are consumed vary as we move along the same indifference curve, only changing relative prices, as in figure B4.2.1.

Consider two poverty bundles, A and B, which attain the same level of utility (the reference level of utility needed to not be considered poor) but at a different relative price of food. Bundle A, $(Q_F^{*A}, Q_C^{*A})$, can be thought of as pertaining to rural areas where food is relatively cheap, while B is for urban areas. As is plain now, one poverty bundle cannot be right if relative prices differ. As long as there is substitutability (as in the indifference curve in figure B4.2.1), the poverty bundles must vary with prices.

The poverty line is the cost of the appropriate bundle. For A this is:

$$Z^A = \sum_{j=1}^{m} P_j Q_j^{*A}.$$

[Figure B4.2.1 Poverty Bundles with Changing Relative Prices. Axes: Food and Clothing. Labels: bundles A and B; quantities $Q_F^{*A}$ and $Q_C^{*A}$.]

4 In more formal terms, let $c(p, x, u)$ denote the minimum cost of the utility level $u$ for a household with characteristics $x$ when facing prices $p$; then $z = c(p, x, u_z)$ is the monetary poverty line corresponding to fixed utility poverty line $u_z$.


(We can write a similar equation for B.) Here the notation allows for any number of goods, $m$ (since the math liberates us from the constraint of a two-dimensional graph). The $Q_j^{*}$s are called the utility-compensated demands. These are the quantities demanded at the poverty level of utility when facing the prices $P_1, P_2, \ldots, P_m$. (They are called “utility-compensated” because they give the consumer’s demand at a given level of utility.)

When the poverty line is the above price-weighted aggregate of the utility-compensated demands, corresponding to the poverty level of utility, the line is automatically the minimum expenditure to attain that level of utility. Furthermore, if a person has an actual expenditure less than Z, then they have a level of attained utility less than the poverty level of utility.

Historical note: One of the many contributions of John Hicks to economics was the idea of the “utility-compensated demand function,” found in his 1939 volume Value and Capital. This is sometimes called the Hicksian demand function.
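The idea in box 4.2 can be illustrated with a specific utility function. The sketch below assumes Cobb-Douglas preferences over food and clothing, $u(F, C) = F^a C^{1-a}$; this functional form, the value of $a$, the prices, and the reference utility level are all assumptions made for illustration and do not come from the text. For this utility function the utility-compensated demands have a closed form, so the poverty bundle and its cost can be computed directly.

```python
def hicksian_bundle(p_food, p_cloth, u_z, a=0.6):
    """Utility-compensated demands for u(F, C) = F**a * C**(1 - a).

    The minimum cost of reaching u_z is
        u_z * (p_food / a)**a * (p_cloth / (1 - a))**(1 - a),
    and the cost-minimizing bundle spends shares a and 1 - a of it.
    """
    cost = u_z * (p_food / a) ** a * (p_cloth / (1.0 - a)) ** (1.0 - a)
    return a * cost / p_food, (1.0 - a) * cost / p_cloth


def utility(food, cloth, a=0.6):
    """Cobb-Douglas utility of a bundle (assumed functional form)."""
    return food ** a * cloth ** (1.0 - a)


def cost_of_bundle(prices, bundle):
    """Poverty line as the price-weighted sum of the bundle's quantities."""
    return sum(p * q for p, q in zip(prices, bundle))


rural_prices = (1.0, 2.0)   # hypothetical (food, clothing) prices, area A
urban_prices = (2.0, 2.0)   # hypothetical prices in area B: food is dearer
u_z = 10.0                  # hypothetical poverty level of utility

bundle_a = hicksian_bundle(*rural_prices, u_z)
bundle_b = hicksian_bundle(*urban_prices, u_z)
z_a = cost_of_bundle(rural_prices, bundle_a)
z_b = cost_of_bundle(urban_prices, bundle_b)
print(bundle_a, z_a)
print(bundle_b, z_b)
```

Both bundles yield the same utility, but the rural bundle contains more food. Costing the rural bundle at urban prices gives a figure above `z_b`, which is the sense in which a single poverty bundle cannot be right when relative prices differ.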

However, this does not get us very far in solving the problem of setting poverty lines, but merely translates it from one space (consumption) into another (utility). For many of the purposes of measurement, the welfarist framework does not include a sufficiently well-defined notion of what poverty means. As discussed in chapter 3 (section 3.1), non-welfarist approaches to drawing poverty lines can be interpreted as attempts to expand the information base used in measuring poverty to include indicators of capabilities, that is, the attainments of specific valued functionings, such as being adequately nourished to support normal activity levels.

Food-energy requirements for normal activity levels have been widely used in setting poverty lines. Requirements vary across individuals and over time for a given individual.5 Nutritionists have estimated food-energy requirements for maintaining body weight at rest, processing food, and sustaining various activity levels.6

In using these estimates in calculating a poverty line, a normative judgment must be made about activity levels. Actual activity levels may reflect poverty. It is plausible that the poorest are underweight and that their activity levels are constrained by this fact. In such a setting, incorporating existing differences in activity levels (and, indeed, weights) into subgroup poverty lines will clearly lead to a bias in the poverty comparison, in that the poverty lines need not be clearly anchored to a reasonable conception of what would constitute a fixed standard of living over the domain of that comparison.

Having set food-energy requirements, how can we find a monetary poverty line? One approach is to simply ask: what is the total expenditure (or income) of people whose average caloric intakes meet their requirements? This is the food-energy intake method. A second approach (and the main rival to the first method in practice) finds instead a combination of foods that attains the food-energy requirements, as well as other nutritional needs, and calculates the cost of that bundle, to give a food poverty line. This is then augmented for non-food needs. This is the cost-of-basic-needs (CBN) method.

5 For further discussion of the implications of variability in requirements for measuring undernutrition and poverty, see Osmani (1987), Kakwani (1989), and Dasgupta and Ray (1990).

6 The classic source is WHO (1985). FAO (2001) provides an update with more detail on, inter alia, age-specific requirements and allowances for different activity levels.

One approach to determining the bundle is to minimize the cost of achieving the food-energy requirements at given prices. A potential problem in practice is that this method can yield a composition of the diet that is alien to existing food habits, which are often well defined by traditions going back centuries. The minimum cost of the stipulated number of calories may be a good deal less than the expenditure level at which the poor typically attain that calorie level. Attaining adequate nutrition is neither the sole motive for human behavior, even for most of the poor; nor is it the sole motive in food consumption.

A better approach is to constrain the choice of the bundle in a way that is consistent with prevailing local food tastes. A simple numerical approach proceeds by first making a guess of the poverty rate (the percentage living below the poverty line). Let’s say we guess initially that the poverty rate is 30%. The bundle of goods can then be chosen to accord with the consumption pattern of (say) people living between the 25th and 35th percentiles (or a tighter interval if the sample size permits), as estimated from a household expenditure survey. The actual consumptions of this group are then scaled up or down (keeping all relativities within the bundle the same) until they achieve the stipulated food-energy and other nutritional requirements. Prices and tastes may vary within a country; a traditional food staple in one region may be alien in another. To deal with this we can use the average consumption bundle in each region of those living between the 25th and 35th percentiles nationally. Again, this is scaled up or down to reach the nutritional norms.

Having set the food bundle, its cost should ideally be estimated separately for each of the subgroups in the poverty profile. In practice the main concern is with the variation in food prices between regions and (particularly) between urban and rural areas. It has become fairly common for statistical agencies to monitor prices in both urban and rural areas, and using such data the food poverty line can be constructed. Armed with prices we can then calculate the food poverty line in each region, augment this with an allowance for non-food needs (using the methods discussed below) and calculate the poverty rate. If this turns out to be reasonably close to the first guess of 30% then one can stop. The job is done. However, if the initial guess is much higher or lower one can repeat the exercise at the new poverty line. Based on my experience, the method converges fairly quickly.7
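The iterative procedure just described can be sketched in code. This is a simplified illustration under stated assumptions, not the procedure as implemented in any actual poverty assessment: the data layout and the function name `cbn_poverty_lines` are hypothetical, a single national reference bundle is used rather than one per region, only food energy (not other nutrients) is targeted, and the non-food allowance is replaced by a placeholder in which the food poverty line is divided by an assumed food share.

```python
def cbn_poverty_lines(households, prices, kcal_per_unit, kcal_required,
                      food_share=0.6, guess=0.30, band=0.05,
                      tol=0.01, max_iter=20):
    """Iterate on the poverty rate until guess and result roughly agree.

    households: list of dicts with 'exp' (total expenditure per person),
                'region', and 'food' (dict: item -> quantity per person).
    prices:     dict: region -> dict: item -> price per unit.
    food_share: ASSUMED placeholder for the non-food allowance.
    Returns (poverty lines by region, poverty rate, scaled food bundle).
    """
    ranked = sorted(households, key=lambda h: h["exp"])
    n = len(ranked)
    items = list(kcal_per_unit)
    for _ in range(max_iter):
        # Reference group: households ranked within +/- band of the guess.
        lo = min(max(0, round((guess - band) * n)), n - 1)
        hi = min(n, max(lo + 1, round((guess + band) * n)))
        ref = ranked[lo:hi]
        # Average bundle of the reference group, scaled to the calorie norm
        # while keeping relative quantities within the bundle unchanged.
        bundle = {j: sum(h["food"].get(j, 0.0) for h in ref) / len(ref)
                  for j in items}
        kcal = sum(bundle[j] * kcal_per_unit[j] for j in items)
        if kcal <= 0:
            raise ValueError("reference group has no recorded food energy")
        scale = kcal_required / kcal
        bundle = {j: q * scale for j, q in bundle.items()}
        # Cost the bundle at each region's prices; add the non-food placeholder.
        lines = {region: sum(p[j] * bundle[j] for j in items) / food_share
                 for region, p in prices.items()}
        rate = sum(1 for h in households
                   if h["exp"] < lines[h["region"]]) / n
        if abs(rate - guess) <= tol:
            break
        guess = rate  # repeat the exercise at the new poverty rate
    return lines, rate, bundle
```

The function stops either when the implied poverty rate is within `tol` of the guess or after `max_iter` rounds; the sketch does not itself establish that the iteration converges for any given data set.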

Another approach found in practice is to rely on a local “Expert Group” to set the basic needs bundle, for both food and non-food. An example of this approach in practice is the method used to set Russia’s official poverty lines, which are described in box 4.3.

7 I developed this method for use in World Bank poverty assessments in the early 1990s. It has been used many times since then by me or World Bank colleagues and I have never heard of it failing to converge fairly quickly.


Box 4.3 Russia’s Poverty Lines

Russia’s official poverty lines are based on region-specific poverty baskets determined by local governments following the guidelines of an inter-ministry expert group, which also reviews the draft consumer baskets submitted by the local governments and provides recommendations to the federal government, which makes the final decision on the composition of the regional baskets. The expert group evaluates the nutritional composition of every regional basket as well as the composition of the non-food components.

Food baskets are defined based on nutritional requirements for calories, proteins, fats, and carbohydrates for various demographic groups. The baskets vary across sixteen geographical zones of Russia, to account for differences in caloric requirements by climatic zones and for regional differences in food consumption patterns. (The caloric requirements for adult males, for example, range from 3,030 kcal per day for the colder northern regions to 2,638 kcal per day for the warmer zones.) Norms for the consumption of proteins and carbohydrates can also vary substantially across zones. The final food poverty bundles comprise thirty-four items, which differ between zones. For example, northern zones include deer meat while the southern zones include larger shares of (relatively cheaper) fruits and vegetables. Food bundles for the zones with a predominantly Muslim population do not include pork.

Three zones for non-food goods and three zones for services/utility baskets are defined according to climatic conditions in Russia. The basket for non-food goods provides detailed quantities for six groups. These groups are similar to those used in the construction of the food basket, except that separate baskets for non-food goods are defined for elderly men and women. The service basket consists of consumption norms for seven main utilities. While the food and non-food baskets are defined at the individual level, the service baskets are defined on a per capita basis.

The non-food bundles consist of a number of personal items and some consumer durables. The non-food goods include specific items of clothing, footwear, pens, and notebooks. Goods for the household’s collective use are also included, comprising furniture (table, chair, chest of drawers, mirror, etc.), appliances (TV, refrigerator, clocks), kitchen items (plates, pots and pans, silverware), as well as towels, sheets, blankets, and pillows. Every item in the non-food bundle has an approximate usage time that varies for different age-gender groups.

The services bundle includes allowances for housing, heating, electricity, hot and cold water, gas, and transportation. (There is no allowance for health or education since by law, at least, these are free in Russia.) The norms for heating and electricity vary by zones, with larger allowances in cooler places.

Price information on the items in the poverty baskets is collected quarterly by the Russian Central Statistical Agency in 203 cities and towns of Russia for 196 food and non-food items and services. The poverty lines for every geographic zone are calculated by multiplying the quantities of the items in the baskets by the corresponding prices in an appropriate city or town within the zone.

Further reading: Russia’s poverty lines were established under guidelines developed by the Ministry of Labor and Social Development (MLSD 2000). For further detail, see Ravallion and Lokshin (2006).

Instead of identifying a complete set of both food and non-food goods comprising the poverty bundle (as in the case of Russia), a more common approach in practice is to set a food bundle, and then add an allowance for non-food spending consistent with the spending patterns of those who have attained the food poverty line. These are sometimes called Engel curve methods. By one version of this method, one first estimates the cost for each subgroup of a food bundle which achieves the stipulated food-energy intake level, and then divides this by the share of food in total expenditure of some group of households deemed likely to be poor, such as the poorest 20% in each subgroup.

A variation on this method is that proposed for the United States by Mollie Orshansky (1965), which we already heard about in chapter 2 (box 2.5). Having set the food poverty bundle to minimize the cost of attaining the predetermined nutritional needs, Orshansky then divided this cost by the average food share (of poor and non-poor) to derive the total poverty line. This became the basis of the official poverty line for the United States, which remains the official line at the time of writing (in 2014). However, there has been much dissatisfaction with this line. This is hardly surprising; indeed, it is a credit to the Orshansky method that it has survived so long. Calls have been heard for updating the method to be more relevant to current standards of living and consumption patterns. There have also been calls to embrace a broader definition of income, allowing for benefits from the government. Box 4.4 summarizes the debate and proposed recent revisions.

Box 4.4 Dissatisfaction with the Official Poverty Line for the United States and a New Measure

Recall box 2.5. Critics of the official poverty line for the United States have pointed to a number of concerns. Fox et al. (2013, 2) summarize the issues well:

The official poverty measure (OPM) understates the extent of poverty by using thresholds that are outdated and may not adjust appropriately for the needs of different types of individuals and households, in particular, families with children and the elderly. At the same time, it overstates the extent of poverty, and understates the role of government policies, by failing to take into account several important types of government benefits . . . which are not counted in cash income. Because of these (and other failings), official poverty statistics do not depict an accurate picture of poverty or the role of government policies in combating poverty.

The US Census Bureau has produced a new measure that attempts to address these concerns; the new measure is called the supplemental poverty measure (SPM). This gives a higher overall threshold, but the income aggregate is more comprehensive, including benefits received in kind (rather than cash). The net effect turns out to imply only a slightly higher overall poverty rate, which rises to 16.0% in 2012 using the SPM, from 15.1% using the OPM. The child poverty rate is lower using the SPM, with 18.0% of children deemed to live in poverty, as compared to 22.3% using the OPM. However, the incidence of poverty rises for the elderly (14.8% as compared to 9.1%) (Short 2013).

The new measure introduces a degree of relativism into US poverty measurement, which has traditionally followed an absolute approach, whereby the line is only updated for inflation. The new measures were influenced by Citro and Michael (1995) who recommended that US poverty lines should be anchored to the current median of expenditures on food, clothing, and shelter. This would clearly generate poverty lines with a positive elasticity to the mean, but the elasticity will be less than unity given that these goods tend to be necessities. However, one concern with this approach is that it is unclear why concerns about relative poverty would apply only to necessities; one might expect social inclusion needs that go beyond necessities in a country such as the United States.

An important change is that the new methodology allows a seemingly straightforward accounting of the impact of public antipoverty programs. Without those programs, the poverty rate for 2012 would rise from 16.0% to 30.5% (Fox et al. 2013). A reduction of about 14 percentage points in the poverty rate is attributed to direct interventions.

The new poverty numbers also suggest that the incidence of poverty in the United States would have risen far more in the absence of the public programs. However, the claimed poverty impact of the US programs ignores behavioral responses (the incentive effects that we heard about in box 1.4, and that have been much discussed back to at least the eighteenth century and continuing today). The calculations reported in Fox et al. subtract from the new income aggregates all receipts from the public programs and then recalculate the poverty measures. However, while we have often heard exaggerated claims about incentive effects of antipoverty programs (as noted in chapter 1), it is hard to believe that they are entirely absent. There is bound to be some displacement, such as through labor supply, at least at times and in places of low unemployment for poor men and women.

The revisions to the US poverty line are welcome, but still rather limited in capturing relative poverty, and more research is clearly needed on the impacts of public programs. Chapter 10 returns to this topic.

Further reading: Critiques of the official poverty line are found in Citro and Michael (1995) and Blank (2008). On the SPM, see Short (2011).

The Orshansky poverty line for the United States is an example of an Engel curve method in which the non-food component of the poverty line is set based on food spending behavior. Orshansky assumed a food share of one-third, so she multiplied the food poverty line by three to get the total line. Such methods do not assure that the resulting poverty lines have constant real value across the domain of the poverty comparisons being made (over space or time). Differences in the purchasing power of the resulting lines (combining food and non-food components) can emerge simply because of differences in average real consumption or income across subgroups or dates; those with a higher mean will tend to have a lower food share, which will thus lead one to use a higher poverty line. Again, an inconsistency can arise whereby a given standard of living is deemed to constitute poverty in one place but not another. With no better information, it is probably better to use a fixed food share.
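The arithmetic of the food-share method is simple enough to show directly. The sketch below uses hypothetical numbers (a food line of 100 and two invented regional food shares) to illustrate both Orshansky's factor of three and the inconsistency that arises when the food share differs across subgroups.

```python
def food_share_poverty_line(food_line, food_share):
    """Total poverty line = food poverty line divided by the food share."""
    if not 0.0 < food_share <= 1.0:
        raise ValueError("food_share must lie in (0, 1]")
    return food_line / food_share

# Orshansky's assumption: a food share of one-third, so the line is tripled.
orshansky_line = food_share_poverty_line(100.0, 1.0 / 3.0)

# Hypothetical subgroups facing the same food line but with different
# food shares: the better-off group (lower share) gets the higher line.
poorer_region_line = food_share_poverty_line(100.0, 0.60)
richer_region_line = food_share_poverty_line(100.0, 0.40)

print(orshansky_line, poorer_region_line, richer_region_line)
```

Using one fixed food share for all subgroups, as the text suggests when no better information is available, removes this source of divergence by construction.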

There are refinements to the type of method used by Orshansky, as embodied in the official US poverty lines. With a little extra effort, one can calibrate the non-food allowance to a regression model of food demand behavior. The essential idea here is to look at how much is spent on non-food goods either by households who are just capable of reaching their nutritional requirements, but choose not to do so, or by those whose actual food spending equals the food poverty line.8 The former allowance for non-food needs is arguably a lower bound to what should be considered reasonable; the logic here is that anything that someone who can afford to reach the food poverty line gives up for non-food goods must be considered a basic non-food good, though there may be other basic non-food goods. Of course, quite large sums might be spent by some households on non-food goods, even though their nutritional requirements are not being adequately met. One would not necessarily want to identify all such households as “poor.” There will also be some variation in spending patterns at any given budget level, such as due to measurement errors or random differences in tastes. Given this heterogeneity, a more reasonable approach is to ask: what is the average value of non-food spending by a household who is either just capable of reaching the food line with their total expenditure, or whose food spending matches the food poverty line? These methods can often be implemented quite easily with readily available data. Box 4.5 describes the method in greater detail.

Box 4.5 Setting the Non-Food Poverty Line Based on the Food Demand Function

Let $C^F(C)$ denote the mean level of food spending ($C^F$) by a household with total spending $C$. This relationship is assumed to have the shape of the bold curved line in figure B4.5.1. Suppose that we have set the food poverty line, which we denote $Z^F$. A seemingly reasonable lower bound to the allowance for non-food needs is the spending on non-food goods by those whose total spending is just enough to cover their basic food needs, but who choose instead to divert some money to non-food needs. This is $Z^F - C^F(Z^F)$. We can then find the total poverty line by adding $Z^F$ to this allowance for non-food needs, giving the total poverty line $Z^L = 2Z^F - C^F(Z^F)$.

[Figure B4.5.1 Setting the Non-Food Component of the Poverty Line Based on Food Demand. Axes: total spending ($C$) and food spending ($C^F$). Shown: the food demand curve $C^F(C)$ and a 45° line, with the food poverty line $Z^F$, the lower line ($Z^L$), and the upper line ($Z^U$) marked.]

Clearly this is a minimal allowance for non-food needs in that it only includes the non-food spending that displaces the stipulated basic needs for food. A more generous allowance is to look instead at the non-food spending of those whose actual food spending equals $Z^F$. This level of spending is found by inverting $C^F(C)$ at the point where $C^F = Z^F$, as in the graph above, to obtain $Z^U$; that is, $Z^U$ solves $C^F(Z^U) = Z^F$.

The fact that the upper line ($Z^U$) accords with spending behavior at the poverty line is an appealing feature, assuming that people know best how to spend their income. Against this advantage, the Engel curve is likely to shift with tastes and relative prices. And there is nothing to guarantee that those shifts will be consistent with assuring the same level of welfare is attained at the different poverty lines.

8 Ravallion (1994b) outlined this approach, which has been widely used in developing countries.
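To make the two bounds in box 4.5 concrete, here is a minimal sketch that computes the lower and upper lines from a given food demand function. The functional form and its parameters are invented for illustration (in practice the food demand function would be estimated from survey data), and the bisection solver is just one convenient way to invert the function.

```python
def lower_and_upper_lines(food_demand, z_food, tol=1e-9):
    """Return (Z_L, Z_U) for an increasing food demand function C_F(C).

    Z_L = 2*Z_F - C_F(Z_F); Z_U solves C_F(Z_U) = Z_F.
    Assumes C_F(Z_F) <= Z_F, so that Z_U lies at or above Z_F.
    """
    z_lower = 2.0 * z_food - food_demand(z_food)

    # Bracket the root of C_F(C) = Z_F, then bisect.
    lo, hi = z_food, 2.0 * z_food
    while food_demand(hi) < z_food:
        hi *= 2.0
        if hi > 1e12:
            raise ValueError("food demand never reaches the food line")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if food_demand(mid) < z_food:
            lo = mid
        else:
            hi = mid
    return z_lower, 0.5 * (lo + hi)

# Hypothetical food demand curve: increasing, with a falling food share.
def example_food_demand(total_spending):
    return 2.0 * total_spending ** 0.7

z_food = 50.0
z_lower, z_upper = lower_and_upper_lines(example_food_demand, z_food)
print(f"Z_L = {z_lower:.2f}, Z_U = {z_upper:.2f}")
```

For this particular curve the lower line falls between the food line and the upper line, which matches the ordering drawn in the figure; other demand functions need not reproduce it exactly.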

Explicit consideration of the normative functionings that should be met for someone to not be considered poor can also help in setting the non-food component (recalling the discussion of the capabilities approach in chapter 3). Indeed, something like this idea is often implicit in practice. (Although “functionings” are not mentioned explicitly, such a concept appears to be implicit in the Russian poverty lines described in box 4.3.) Conceptually, one might justify the use of a higher real poverty line in urban areas than in rural areas by an appeal to the view that the capabilities to do various things (such as participating fully in the society) should be considered in measuring living standards (and hence treated as fixed in a comparison of absolute poverty). On the other hand, the commodities needed to achieve these capabilities are relative, and so vary from place to place. To see what bearing this argument might have on the setting of poverty lines, let us assume that we are concerned with two main functionings: the first is that of being adequately nourished to maintain health, while the second is that of participating fully in the society in which one lives. Both require food, to maintain a healthy weight, and to maintain the necessary activities for participating in society. This food requirement is not particularly difficult to measure, and the real food consumption level needed to reach it is unlikely to vary much between (say) urban and rural areas.

However, that is not true of the non-food component of the poverty line, and here it could plausibly be argued that achieving the same absolute living standard requires a more generous non-food commodity bundle in urban areas. For example, achieving the same capability for participating with dignity in urban society may well require that more is spent on clothes, housing, and transport than is the case in a village. This argument would generally lead one to prefer the food-share method, where the allowance for food varies little, while that for non-food varies according to the typical food share of the poor. (In the food-energy method, by contrast, even the allowance for food will tend to be a good deal higher in urban areas.) However, if we extend the list of basic capabilities somewhat, then it ceases to be clear that we would want a more generous poverty line in urban areas. For example, if we include the capability of obtaining the help of a doctor when sick, then the cost of doing so may well be very much higher in rural areas, given the far lower density of doctors there. There is a good case for setting a higher consumption poverty line in a geographic area that is deprived in access to public goods, although this is rarely done in practice.9

Updating Poverty Lines over Time

There are two methods found in the practice of updating poverty lines over time. In the first, the method used in the base date is simply repeated at the next date. In the second method, the old line is updated for inflation over the period using the best available price index (as in America’s OPM method described in box 4.4). For some of the methods found in practice there is only one option, such as for the strongly relative lines. For the absolute lines we have been looking at in this section using the Engel curve method, the choice between these methods can be important to the results.

Putting aside data problems, if the aim is to make strictly absolute poverty comparisons over time, then the second method is generally considered preferable. The reason is that repeating the calculations used in the base date may well introduce some differences in the real value of the poverty line associated with shifts in the Engel curve, such as shifts due to changes in relative prices or tastes.

9 In the (many) developing countries where “urban bias” is severe (Lipton 1977), this should lead one to set a higher poverty line in rural areas.

There are two important caveats. First, CPIs are not always reliable for this purpose. One problem is that the standard CPI is often anchored to middle-income or urban spending patterns, which often means that it gives too low a weight on food for the purposes of updating poverty measures. It is better to use the CPI by components, especially when one has a separate food and non-food CPI. This re-weights the index to accord with spending patterns at the poverty line. In some (thankfully rare) cases the CPI data are contaminated by political manipulation.10 A further problem is that the CPI might not adequately reflect changes in the economy. When goods that were previously provided publicly without charge (or subsidized) become private goods, such as due to public-sector reforms, the prices facing consumers can rise substantially. These changes may not be reflected in the CPI.

Second, for the basic-needs lines that use the method of setting the upper poverty line discussed in box 4.5, there is an a priori argument that can be made in favor of updating over time by repeating the method used in the first date. Jean Lanjouw and Peter Lanjouw (2001) show that this updating method assures that the resulting poverty measures are robust to changes in the internal composition of the food bundles stemming from changes in the survey instrument. This robustness result is striking and makes for a compelling case for using the upper bound Engel curve method when survey comparability is a concern.
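The first caveat, on re-weighting the CPI by components, can be illustrated with a small calculation. The sketch assumes a simple fixed-weight index with only two components; the component indices and weights are hypothetical numbers chosen to show the direction of the bias when food prices rise faster than non-food prices.

```python
def reweighted_price_index(food_index, nonfood_index, food_weight):
    """Fixed-weight price index built from food and non-food components."""
    if not 0.0 <= food_weight <= 1.0:
        raise ValueError("food_weight must lie in [0, 1]")
    return food_weight * food_index + (1.0 - food_weight) * nonfood_index

def update_poverty_line(base_line, price_index):
    """Update a base-date poverty line for inflation (base index = 1.0)."""
    return base_line * price_index

# Hypothetical component indices relative to the base date.
food_index, nonfood_index = 1.30, 1.10

# Standard CPI weights (middle-income pattern) versus weights reflecting
# spending patterns near the poverty line; both weights are invented.
standard_cpi = reweighted_price_index(food_index, nonfood_index, 0.35)
poverty_cpi = reweighted_price_index(food_index, nonfood_index, 0.65)

line_standard = update_poverty_line(100.0, standard_cpi)
line_reweighted = update_poverty_line(100.0, poverty_cpi)
print(line_standard, line_reweighted)
```

With these numbers the standard weights understate the inflation relevant to the poor, so the line updated with the standard CPI falls in real terms relative to the re-weighted one.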

Revealed Preference Tests of Poverty Lines

Recall that an absolute poverty line in the welfare space requires a monetary line for each subgroup in the population that is the cost of a common (interpersonally comparable) level of welfare. Suppose we follow the economic approach of defining “welfare” by a utility function defined on commodities. An income poverty line is interpreted as the money metric of the minimum critical level of utility needed to not be poor, giving “utility consistency.”

There are then some testable implications of the utility consistency of poverty lines, drawing on Paul Samuelson’s (1938) theory of revealed preference (box 4.6). The theory can be readily used to derive testable necessary conditions for utility consistency across those groups that are deemed to share common consumption needs (a common utility function defined on commodities). All that this test requires is the set of “poverty bundles” and their prices. However, poverty lines may well reflect differing consumption needs as well as differing prices. Then the information base for testing poverty lines must be expanded; it is not sufficient to just know quantities and prices. Self-assessments of subjective economic welfare, as discussed further below, offer a promising route for testing consistency across different needs groups.

One study applied these ideas to an assessment of Russia’s official poverty lines (box 4.3).11 Russia’s striking climatic differences across regions suggest that the same consumption bundle is unlikely to yield the same utility even if relative prices do not vary. (Large regions of Russia have average annual temperatures well below freezing, while other regions have moderate northern European climates.) By implication, poverty lines should have higher value (assessed by a quantity index) in colder climates. That is what was found in the data. However, the study also found violations of revealed preference criteria that cannot be easily ascribed to the sources of needs heterogeneity invoked explicitly in setting the poverty bundles. Nor do the differences across needs groups accord with self-rated perceptions of economic welfare. The researchers conclude that there are latent utility inconsistencies in Russia’s official poverty lines and they speculate on their origin.

10 A famous example is Argentina since 2007; see The Economist (2013). The bias introduced in the official CPI makes a big difference to the poverty rate for Argentina; while the government’s estimate of the poverty rate for the urban population is 5%, the Catholic University of Argentina calculates that the poverty rate is 27%, once one corrects for the bias in the CPI.

11 See Ravallion and Lokshin (2006).

Box 4.6 Applying Samuelson’s Theory of Revealed Preferences to Poverty Bundles

Consider two groups, A and B (urban and rural areas, say), each with a poverty line, which is the cost in each group of bundles of goods specific to each group. Utility consistency requires that these two bundles yield the same utility. If needs are identical in A and B, then there is a straightforward revealed preference test. This requires that the poverty line for A is no greater than the cost of B’s bundle for a member of group A, for otherwise the bundle in B is affordable when A was chosen, implying that A is preferred. But then the two bundles cannot yield the same utility (judged by the common preferences). Similarly, the group B poverty line cannot be greater than the cost in that group of the bundle for A. If this test fails, then we can reject consistency though passing the test does not assure consistency for all possible utility functions. For example, suppose again that there are just two goods, food and clothing. Four “poverty bundles” are proposed as indicated in figure B4.6.1. Utility consistency is rejected for bundles A and B; but the test is inconclusive for C or D.

[Figure B4.6.1 Revealed Preferences. Axes: food and clothing. Plotted poverty bundles: A, B, C, and D.]

Historical note: Paul Samuelson published his clever paper on revealed preference when he was twenty-three years of age, while a student at Harvard University. He went on to be one of the most influential economists of the twentieth century.
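The test in box 4.6 needs only the bundles and the prices each group faces, so it is easy to sketch. The function name and the numbers below are hypothetical; they are not the bundles plotted in figure B4.6.1, and the sketch covers only the case of identical needs across the two groups.

```python
import numpy as np

def passes_revealed_preference_test(q_a, p_a, q_b, p_b):
    """Necessary condition for utility consistency with common needs.

    Returns False when consistency is rejected. True means only that the
    test is inconclusive, not that consistency is assured.
    """
    q_a, p_a = np.asarray(q_a, float), np.asarray(p_a, float)
    q_b, p_b = np.asarray(q_b, float), np.asarray(p_b, float)
    # A's poverty line must not exceed the cost of B's bundle at A's prices.
    a_ok = p_a @ q_a <= p_a @ q_b
    # B's poverty line must not exceed the cost of A's bundle at B's prices.
    b_ok = p_b @ q_b <= p_b @ q_a
    return bool(a_ok and b_ok)

# Two goods: (food, clothing). Prices are the same in both groups here.
prices = [1.0, 1.0]

# One bundle contains less of both goods: consistency is rejected.
rejected_pair = passes_revealed_preference_test(
    [10.0, 5.0], prices, [8.0, 4.0], prices)

# Bundles that trade food for clothing at equal cost: inconclusive.
inconclusive_pair = passes_revealed_preference_test(
    [10.0, 5.0], prices, [6.0, 9.0], prices)

print(rejected_pair, inconclusive_pair)
```

Returning a boolean keeps the asymmetry of the test visible: a failure is informative, while a pass leaves the question open.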

The Food-Energy Intake Method

This method proceeds by first fixing a food-energy intake (FEI) cut-off in calories, and then finding the consumption expenditure or income level at which a person typically attains that FEI.12 This can be estimated from a regression of calorie intake against consumption expenditures or income.13 In essence, one is defining the poverty line as the total consumption expenditure at which one can expect a person to be adequately nourished in the specific society under consideration. If the average level of FEI at a given consumption expenditure is strictly increasing in consumption, and the food-energy requirement is a single (fixed) point, then this definition will yield a unique poverty line. Notice that the method automatically includes an allowance for non-food consumption, as long as one locates the total consumption expenditure at which a person typically attains the caloric requirement.

When the aim is to measure absolute poverty using a line with constant real value, the FEI method runs into a serious problem. The relationship between FEI and consumption expenditure (or income) is unlikely to be the same across regions/sectors/dates, but will shift according to differences in affluence, tastes, activity levels, relative prices, publicly provided goods, or other variables. This is illustrated in figure 4.1, which shows how the method can be used to set separate urban and rural poverty lines. The curved lines represent the mean FEI at each level of nominal “income” (or consumption per person). Food-energy intakes tend to be higher in rural areas at given income. And there is nothing in this methodology to guarantee that these differences are the ones which would be considered relevant to poverty comparisons. For example, agricultural work tends to be more strenuous than most urban activities, and thus entails higher food-energy requirements to maintain body weight.14 If one used a higher food-energy requirement in rural areas, one could address this problem.

[Figure 4.1 Anchoring the Poverty Lines to Food-Energy Requirements. Axes: income and food-energy intake. Curves: rural and urban. Marked values: 2100 on the food-energy intake axis; zr and zu on the income axis.]
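The mechanics of the FEI method, and the way it can produce different lines for sectors with different calorie-income relationships, can be sketched as follows. The semi-log regression form, the function name, and all of the synthetic data are assumptions made for illustration; footnote 13 points to the specification issues that a real application would need to address.

```python
import numpy as np

def fei_poverty_line(expenditure, kcal, requirement=2100.0):
    """Expenditure at which fitted calorie intake equals the requirement.

    Fits kcal = a + b * ln(expenditure) by least squares (an assumed
    functional form) and inverts it at the stipulated requirement.
    """
    slope, intercept = np.polyfit(np.log(expenditure), kcal, 1)
    return float(np.exp((requirement - intercept) / slope))

# Synthetic sectors: rural households consume more calories than urban
# households at any given expenditure (all numbers are invented).
rng = np.random.default_rng(1)
n = 500
y_rural = rng.lognormal(3.0, 0.5, n)
y_urban = rng.lognormal(3.4, 0.5, n)
kcal_rural = 600.0 + 500.0 * np.log(y_rural) + rng.normal(0.0, 100.0, n)
kcal_urban = 300.0 + 500.0 * np.log(y_urban) + rng.normal(0.0, 100.0, n)

z_rural = fei_poverty_line(y_rural, kcal_rural)
z_urban = fei_poverty_line(y_urban, kcal_urban)
print(f"rural line = {z_rural:.1f}, urban line = {z_urban:.1f}")
```

Because the same calorie requirement is applied to two different fitted curves, the urban line comes out well above the rural one, whether or not that gap reflects a genuine difference in the cost of living.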

12 See Osmani (1982) and Greer and Thorbecke (1986a, b) for expositions of this method. Other examples can be found in Dandekar and Rath (1971) and Paul (1989). The method has also been used by a number of governments, including Indonesia (which we return to).

13 On the specification of such regressions, and the econometric problems that need to be considered, see Bouis and Haddad (1992).

14 See, e.g., the estimates of caloric requirements for various activities given in WHO (1985).

There are other reasons for the shift in the “calorie-income” relationship in figure 4.1. The relative price of food tends to be higher in urban areas. Nominal food prices are typically higher to compensate for transport costs from rural areas. Also, many non-food goods are cheaper in urban areas; indeed, many such goods are often not available in the countryside. Tastes appear also to change with urbanization, in favor of non-food goods. Probably most worrying is the fact that richer households will tend to buy more expensive calories; using this method of setting poverty lines will mean that one sets a higher line in richer areas, so that the lines become more like relative lines than absolute lines.

As a result, one can end up making inconsistent poverty comparisons whereby individuals, who one would deem to have the same standard of living in terms of their total real consumption, are being treated differently. Indeed, comparisons of absolute poverty across regions, sectors, or dates using the FEI method may be misleading for many purposes, as explained in box 4.7.

*Box 4.7 Pitfalls of the FEI Method of Setting Poverty Lines

Consider two households, one with higher real consumption than the other. Which will be deemed “poorer” relative to poverty lines constructed by the FEI method? The answer is not obvious, and there can be no presumption that the poorer household will be correctly identified. To see why, let real expenditure of household $i$ be $y_i$, for which $y_2 > y_1$, and let $P^k_i = c^F_i / k_i$ denote the average price paid for a calorie, where $c^F_i$ is real food expenditure by $i$, and $k_i$ is the food-energy (caloric) intake, normalized by the stipulated requirement. Then $y_i = P^k_i k_i + c^{NF}_i$, where $c^{NF}_i$ is real non-food spending. Assume that food spending increases with total spending, and that the average price of a calorie (food expenditure divided by FEI) does also ($P^k_2 > P^k_1$). Then the richer person buys more expensive sources of food energy, such as imported food-grains or restaurant meals. Furthermore (for the purpose of this example), suppose that FEIs are the same for both households ($k_1 = k_2$) and that both are undernourished, that is, food-energy requirement exceeds intake ($k_i < 1$). Then the poverty gap (the deficit from the poverty line, as derived by the FEI method) must always be higher for the less poor household. To see this, note that the poverty line implied by the FEI method is $Z_i = P^k_i + c^{NF}_i$ (since $k = 1$ at the calorific requirement). Then $Z_i - y_i = P^k_i (1 - k_i)$, which is greater for the better off household. Thus, under these conditions, the poverty line will not only be higher for the better off household, but the poverty gap falls as the standard of living falls. The same result can also be obtained if FEI is higher for the better off household, provided that the elasticity of intake with respect to expenditure is sufficiently low; the necessary and sufficient condition is that the elasticity of FEI does not exceed the product of the income elasticity of the calorie price times the proportionate shortfall of intake from requirement.

Further reading: For further discussion of the FEI method, see Ravallion (1994b, 2012c) and Ravallion and Bidani (1994).
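A numerical instance of the argument in box 4.7 may help. The two households below are hypothetical: both have the same normalized intake (80% of requirement), but the better-off household pays more per calorie and spends more on non-food goods.

```python
def total_expenditure(calorie_price, intake_ratio, nonfood):
    """y = P * k + c_NF, with intake k normalized by the requirement."""
    return calorie_price * intake_ratio + nonfood

def fei_line(calorie_price, nonfood):
    """Poverty line implied by the FEI method: Z = P + c_NF (k = 1)."""
    return calorie_price + nonfood

def fei_poverty_gap(calorie_price, intake_ratio):
    """Z - y = P * (1 - k)."""
    return calorie_price * (1.0 - intake_ratio)

k = 0.8  # same intake for both households, below the requirement

# Household 1 (poorer): cheaper calories, less non-food spending.
y1 = total_expenditure(1.0, k, 0.5)
z1 = fei_line(1.0, 0.5)
gap1 = fei_poverty_gap(1.0, k)

# Household 2 (better off): more expensive calories, more non-food spending.
y2 = total_expenditure(1.5, k, 0.8)
z2 = fei_line(1.5, 0.8)
gap2 = fei_poverty_gap(1.5, k)

print(f"y1={y1:.2f} z1={z1:.2f} gap1={gap1:.2f}")
print(f"y2={y2:.2f} z2={z2:.2f} gap2={gap2:.2f}")
```

The better-off household faces both the higher line and the larger gap, which is the perverse ranking the box warns about.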

The problems with the FEI method of setting poverty lines were identified in a study for Indonesia.15 Indonesia’s Central Bureau of Statistics (Biro Pusat Statistik: BPS) uses a version of this method for constructing its poverty lines. It proceeds by first fixing an FEI cut-off in calories, and then finding the consumption expenditure at which a person typically attains that FEI. One then counts the number of people with expenditure less than this amount. Thus, one is estimating the number of people whose total consumption expenditures would be insufficient to attain the predetermined FEI, given the prevailing relationship between FEI and total consumption across the population. The method is applied separately to each sector (urban/rural) and each date. The BPS method (or variations on it) has been used in poverty studies for other countries. The Indonesian practice is not unusual.

This method has been found to generate differentials in the poverty lines between urban and rural areas that are far in excess of the COL differential.16 The differentials over time tend also to exceed the rate of inflation. As is typically the case in developing countries, the relationship between food-energy consumption and total expenditures is very different between urban and rural areas, with higher calorific intakes at any given consumption expenditure level in rural areas. For example, as already noted, this could simply reflect a tendency for households in more affluent areas to buy more expensive calories. Differences in relative prices (food being relatively cheaper in rural areas) and tastes may also be important. For the same reasons, the relationship between calorie intake and income or consumption appears also to be shifting over time, with progressively lower FEIs at any given real expenditure level.

The difference in the food-energy and income relationship between urban and rural areas of Indonesia was so large that, at any given food-energy requirement level, the urban poverty line exceeds the rural poverty line by a magnitude which is sufficient to cause a rank reversal in the estimated headcount index of poverty between the two sectors.17

Clearly one wants the poverty lines used to properly reflect differences in the COL across the sectors or dates being compared. However, as discussed above, the food-energy method is quite unlikely to generate poverty lines which are constant in terms of real consumption or income across the sectors/dates being compared given that the relationship between FEI and consumption or income is not going to be the same across sectors/dates. In fact, the poverty lines generated by this method appear to behave more like relative lines; indeed, the BPS lines have been found to have an elasticity with respect to the mean that is close to unity.18 We turn next to consider such relative lines.

Relative Poverty Lines

A difference between the literatures for developing and developed countries is that absolute poverty considerations have dominated the former, while relative poverty has been more important in the latter.19 Much of the developed country literature has taken the view that poverty is entirely “relative.”20 The position one takes on this issue is salient to some important development debates. In particular, as we will see in chapter 8, the extent of relativism one builds into poverty measurement matters greatly to the long-standing policy debates about economic growth and poverty.

15 See Ravallion and Bidani (1994).

16 See Ravallion and Bidani (1994).

17 Similar findings were obtained for Bangladesh by Ravallion and Sen (1996) and Wodon (1997).

18 See Ravallion and Bidani (1994).

We have seen earlier that some of the methods used to set “absolute lines” are implicitly introducing relative considerations. The lines we consider now make this explicit. The most common practice in doing so is to use some proportion of the arithmetic mean or median of the distribution of consumption or income as the poverty line; for example, many studies have used a poverty line which is set at about 50% of the national median.21 Such poverty lines are known as “strongly” relative lines.22 One should not be surprised to find that such lines yield quite different poverty comparisons to fixed (absolute) lines.23 For example, the official absolute line for the United States gives a poverty rate of 15% in 2010 while if the line had been set at 50% of the median, the rate would have been 20%.24

Is there a compelling case for using poverty lines set at a constant proportion of the mean? Poverty measures are discussed in greater detail in chapter 5, but for now we need only note that almost all measures of poverty have the property that if one doubles (say) all incomes and the poverty line then the poverty measure is unchanged. So if the poverty line is set at a constant proportion of the mean, then the measure depends solely on the relative distribution of income. It might be argued that this is still a good measure of “relative poverty,” to the extent that what one is really trying to capture in this concept is the amount of inequality in the distribution. We should, however, then ask whether or not a ranking of distributions in terms of a strongly relative measure will preserve their ranking in terms of an appropriate measure of inequality. However, as we will see in chapter 5, this is not the case in general. The details of this argument must be modified somewhat if the relative poverty line is set at a constant proportion of the median, rather than the mean.25 The outcome will then depend on how the ratio of the median to the mean changes with increases in the mean (depending in turn on how the skewness in the distribution evolves). Nothing

19 There are exceptions; for example, an absolute poverty line has historically been used by the US government, though see box 4.4.

20 See, e.g., Townsend (1985), commenting on Sen (1983); also see Sen’s (1985b) reply.
21 Following Fuchs (1967); see the discussion in chapter 2. An alternative, though less common, approach is to define the poor as those who consume low amounts of certain commodities, relative to the “norm” in a particular society, as assessed by (say) the modal consumption; on this approach, see Townsend (1979) and Desai and Shah (1988).

22 Following Ravallion and Chen (2011).
23 See Atkinson (1991), who shows how poverty comparisons across countries in Europe are affected by this choice; there is substantial re-ranking when one compares poverty measures based on a constant proportion of each country’s mean income with those obtained using the same proportion applied to a constant mean across all countries. For a comparison of absolute and relative poverty lines for a developing country, see Sahota (1990).

24 These are the estimates reported in Iceland (2013).
25 As in Fuchs (1967). This has been the practice in a number of studies, particularly for developed countries; see, e.g., the work of Smeeding et al. (1990) using the Luxembourg Income Study. For an example in a developing country, see Sahota (1990).


more can be said in general, though one certainly cannot rule out the possibility that the poverty measure may turn out to be an increasing function of the mean. Again, it is unclear what significance one should attach to such a measure.

Critics of strongly relative measures in which the poverty line is a constant proportion of the mean (or median) point out that if all incomes increase by the same proportion then the measure of poverty will be unchanged. The critics argue that this is a deceptive property. It is hard to imagine that a poor person whose income has increased by (say) 100% is not less poor. Yet that is what such measures will tell us.
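The critics' point can be checked numerically. The following is a minimal sketch, assuming a made-up list of ten incomes, a headcount index as the poverty measure, a strongly relative line at 50% of the median, and an illustrative absolute line of 5.0; none of these numbers come from the text.

```python
from statistics import median


def headcount(incomes, line):
    """Share of people whose income is strictly below the poverty line."""
    return sum(1 for y in incomes if y < line) / len(incomes)


def strongly_relative_line(incomes, k=0.5):
    """Poverty line set at a constant proportion k of the median income."""
    return k * median(incomes)


# Hypothetical incomes, for illustration only.
incomes = [2.0, 3.0, 4.0, 9.0, 10.0, 12.0, 15.0, 20.0, 30.0, 60.0]
doubled = [2 * y for y in incomes]  # every income rises by 100%

absolute_line = 5.0  # fixed in real terms (illustrative value)

before_rel = headcount(incomes, strongly_relative_line(incomes))
after_rel = headcount(doubled, strongly_relative_line(doubled))
before_abs = headcount(incomes, absolute_line)
after_abs = headcount(doubled, absolute_line)

print("relative line:", before_rel, "->", after_rel)
print("absolute line:", before_abs, "->", after_abs)
```

With these numbers the relative measure does not move when all incomes double, because the line doubles too, while the measure based on the fixed line falls.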

Seemingly perverse poverty trends have been found using strongly relative measures. For example, one study found that relative poverty measures for Ireland were rising despite higher absolute real incomes for most of the poor.26 Another study found that relative poverty measures for New Zealand were deceptive in showing falling poverty despite lower absolute levels of living for the poor.27 The UNDP (2005, 334) writes, “It is clear that when economic conditions change rapidly, relative poverty measures do not always present a complete picture of the ways that economic change affects people’s lives.”

Starting from the position that our poverty comparisons must be absolute in the space of welfare (chapter 3, section 3.1) provides conceptual guidance on what a relative line in the income space would look like. We can suppose that a person’s welfare depends on both their own income and their relative income, defined as the ratio of their own income to the income of the country they live in. We can call this the “relative-income hypothesis.” We can then see why the income poverty line needed to attain a fixed level of welfare will rise with the mean; the monetary line will need to be higher to compensate for the greater relative deprivation implied by living in a richer country. However, only in the extreme case in which welfare depends on relative income alone (and not on own income at given relative income) will we get a poverty line that is a constant proportion of the mean. As long as people care about their own income, as well as relative income, the poverty line will rise with mean income but not proportionately. Box 4.8 explains this further.

Box 4.8 The Welfarist Interpretation of a Relative Poverty Line

The welfarist interpretation of a relative poverty line argues that poverty should be seen as absolute in the space of “welfare,” rather than in the consumption or income space, and that welfare depends (positively) on both own income and relative income, that is, own income relative to mean income in the country of residence. It follows that for a poverty line to be a money-metric of welfare it must be an increasing function of mean income.

To see this more clearly, suppose that welfare depends on “own income,” Y, and “relative income,” Y/M, where M is the mean for the country of residence. Under this specific form of the relative-income hypothesis, welfare is

26 See UNDP (2005, box 3), based on Nolan et al. (2005).
27 See Easton (2002).


$W = W(Y, Y/M)$.

This is taken to be smoothly non-decreasing in both $Y$ and $Y/M$. The poverty line in the income space is denoted $Z$ and is defined implicitly by:

$\bar{W} = W(Z, Z/M)$,

where $\bar{W}$ is the fixed poverty line in the welfare space.

The solution for $Z$ is then a smoothly non-decreasing function of $M$. However, only in a rather special case will it be directly proportional to the mean with the same slope everywhere, as assumed in the literature on relative poverty. It is plain that the special case is when welfare does not depend on own income, so it can be written as:

$W = V(Y/M)$.

Here $V$ is some strictly increasing function. Again fixing welfare at $\bar{W}$ and solving for the poverty line we now have:

$Z = kM$.

Here $k = V^{-1}(\bar{W})$ is the constant of proportionality in the strongly relative poverty measure.

There is another point to note: Even if we assume that people do not care about their own income at given relative income, the value of k is unlikely to be constant but will vary with other welfare-relevant factors such as the extent of inequality and how equitably public services are allocated.

The upshot of this analysis is that we can give relative poverty lines a welfarist interpretation. However, the resulting lines are not going to look like those used in the literature on strongly relative poverty except in the seemingly unlikely limiting case in which people do not care about their own income independent of their relative income.

Further reading: For further discussion, see Ravallion (2008c, 2012b).
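As a worked illustration of the box's argument, consider one specific functional form. The log-linear form and the exponents $a$ and $b$ below are assumptions introduced here for illustration; the box itself does not commit to any functional form.

```latex
% Assumed welfare function (illustrative only), with a >= 0, b >= 0, a + b > 0:
W(Y, Y/M) = Y^{a}\,(Y/M)^{b}

% Fix welfare at \bar{W} and solve W(Z, Z/M) = \bar{W} for the poverty line Z:
Z^{a+b} M^{-b} = \bar{W}
\quad\Longrightarrow\quad
Z = \bar{W}^{\,1/(a+b)}\, M^{\,b/(a+b)}

% Elasticity of the poverty line with respect to the mean:
\frac{\partial \ln Z}{\partial \ln M} = \frac{b}{a+b} \in [0, 1]

% Limiting cases:
%   b = 0 (no concern for relative income): Z = \bar{W}^{1/a}, an absolute line.
%   a = 0 (no concern for own income):      Z = kM with k = \bar{W}^{1/b},
%     which is k = V^{-1}(\bar{W}) for V(x) = x^{b}.
```

Under this assumed form the line rises with the mean whenever $b > 0$, but it is proportional to the mean only when $a = 0$, which matches the special case identified in the box.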

Recall that another justification for relative lines is found in the idea of “social inclusion” (chapter 3). However, this argument is also questionable. Consider the classic example of a social inclusion need found in Adam Smith’s description of the role of a linen shirt in eighteenth-century Europe (chapter 1).28 Since a socially acceptable linen shirt cannot cost any less for the poorest person (let alone cost zero in the limit), it

28 In more recent times, a number of studies have also pointed to the social roles played by clothing, festivals, celebrations, and communal feasts; see, e.g., Rao (2001), Banerjee and Duflo (2008), and Milanovic (2008).


simply cannot be that the relative line is a constant proportion of the mean. The analogous commodity to a linen shirt in middle- and high-income countries today may well be a cell phone, but the point remains: it is plausible that ideas about what “poverty” means in terms of real income change as economies develop, but it is not plausible that the poverty line is a constant proportion of mean income.

How then do poverty lines vary across countries? A survey of poverty lines across ninety-five countries, both developing and industrialized, reveals that the elasticity of the poverty line with respect to mean consumption is increasing in the mean. (The results are given in box 2.5.) At the mean point of the country means the elasticity is 0.66. However, among low-income countries, the elasticity is very much lower, at about zero. Among the highly industrialized countries the elasticity is close to unity.

In short, this cross-country comparison suggests that real poverty lines tend to increase with growth, but slowly for the poorest countries. Notions of absolute poverty (whereby the poverty line does not vary with overall living standards) appear to be relevant to low income countries, while “relative poverty” is more relevant to high-income countries. Furthermore, the proportionality assumption often made in the developed country literature appears to be quite reasonable for the advanced industrialized countries, though the measure obtained is very difficult to interpret in terms of conventional concepts of inequality and poverty. The use of a constant proportion of the mean is also hard to defend conceptually (box 4.8).

A new concept of “weakly relative poverty” has emerged in recent times that contains these two extremes of absolute poverty and (strongly) relative poverty as special cases. Consistently with how national poverty lines vary across countries (as in box 2.5), the key feature of these weakly relative lines is that the elasticity of the poverty line to the mean rises from zero in the poorest countries to unity in the richest (though never reaching unity). Chapter 2 (section 2.1) had introduced this idea, and box 4.9 goes into greater detail. When used to measure poverty globally, one can interpret these lines as indicating the extent of poverty when judged by the standards typical of each country, given its average consumption level.

Box 4.9 Absolute, Weakly Relative, and Strongly Relative Poverty Lines

Figure B4.9.1 plots the poverty line (for a country, say, but it could be sub-national) against mean income. Both are in real units (deflated for COL differences). The absolute line is fixed. The strongly relative line is directly proportional to the mean, so it is zero at zero mean income and rises linearly. The weakly relative line of Ravallion and Chen (2011) is also marked. This is the absolute line up to some critical income level, but then rises with the mean after that. Notice that the relative component of the weakly relative line does not go to zero at zero income. Thus, it can allow for a positive minimum cost of social inclusion in the poorest countries.

To understand the properties of a strongly relative line, note first that we can write a poverty measure in the following generic form (later boxes will also make use of this equation):

P = P (M/Z, L) ,


Figure B4.9.1 Relative Poverty Lines. [Figure not reproduced. Vertical axis: poverty line; horizontal axis: income, with 0 and y* marked; lines labeled absolute line, weakly relative, and strongly relative.]

where Z is the poverty line, M is the mean of the distribution on which poverty is measured, and L is the Lorenz curve of that distribution (one can think of this as a vector of parameters of the Lorenz curve), which summarizes all relevant information about relative inequalities. A strongly relative poverty line is set at a constant proportion of the mean, Z = k.M, where k is some constant, such as 0.5, as often used in many European studies. The measure of poverty becomes P(k, L), and depends solely on the Lorenz curve. If all incomes increase by the same proportion, then P(k, L) would remain totally unchanged; there would be no change in relative inequalities and so P(k, L) would not change. And the poverty line would simply increase by the same proportion.

Further reading: See Ravallion (2012b).
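The three schedules in box 4.9 can be sketched in a few lines of code. This is an illustration under stated assumptions: the weakly relative line is written as the larger of the absolute line and an intercept-plus-slope term, `a + k * mean_income`, and the parameter values `Z_ABS`, `A`, and `K` are invented for the example rather than taken from Ravallion and Chen (2011).

```python
def absolute_line(mean_income, z_abs):
    """Fixed in real terms, whatever the mean income."""
    return z_abs


def strongly_relative_line(mean_income, k):
    """Directly proportional to the mean: zero at zero mean income."""
    return k * mean_income


def weakly_relative_line(mean_income, z_abs, a, k):
    """Absolute line up to a critical mean, then a + k * mean (with a > 0)."""
    return max(z_abs, a + k * mean_income)


def weakly_relative_elasticity(mean_income, z_abs, a, k):
    """Elasticity of the weakly relative line with respect to the mean."""
    relative_part = a + k * mean_income
    if relative_part <= z_abs:
        return 0.0  # still on the flat (absolute) segment
    return k * mean_income / relative_part


# Illustrative parameters (assumptions); critical mean is (Z_ABS - A) / K = 2.0.
Z_ABS, A, K = 2.0, 1.0, 0.5

for m in (0.0, 1.0, 2.0, 4.0, 100.0):
    print(
        m,
        absolute_line(m, Z_ABS),
        strongly_relative_line(m, K),
        weakly_relative_line(m, Z_ABS, A, K),
        round(weakly_relative_elasticity(m, Z_ABS, A, K), 3),
    )
```

In this sketch the positive intercept `a` plays the role of the minimum cost of social inclusion, and it is what keeps the elasticity below unity at every finite mean.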

Consistency versus Specificity

For many of the purposes in making a poverty comparison (such as deciding what region or country should receive aid), the most important thing is that the poverty line yields a welfare-consistent comparison, in that the measured poverty of any person depends only on their standard of living, and not on the subgroup (such as region or ethnic group) to which they happen to belong. Consistency requires that the poverty line is the monetary equivalent of a fixed level of welfare. This is hard to be sure of empirically, but it is clear from the above discussion that many popular methods of setting poverty lines can rather easily fail this test. (Recall box 4.7 on the FEI method.) There are variations on these methods that are more likely to yield consistent poverty comparisons and are typically feasible with the available data. However, no method will ever be uncontentious, given the existence of immeasurable determinants of well-being. Recognizing that a certain amount of arbitrariness is unavoidable in defining any poverty line in practice, one should be particularly careful about how the choices made affect the ordinal poverty comparisons, for these are generally what matter most to the policy implications. Chapter 5 will return to this point.


Internal welfare consistency can be at odds with another seemingly sensible principle: poverty lines must be considered socially relevant in the specific context. If a proposed poverty line is widely seen as too frugal by the standards of society, then it will surely be rejected. Nor will a line that is too generous be easily accepted. We should not then be surprised that richer countries tend to use an implicitly higher reference welfare level for defining poverty. This point has long been recognized. For example, Tibor Scitovsky (1978, 116) noted that, among developed countries in the 1960s, richer countries tended to have higher poverty lines and he explained this as follows: “in the advanced countries, the poverty norm has long ago ceased to reflect a physiological minimum necessary for survival and has become instead a ‘minimum social standard of decency,’ the life-style that a particular society considers the minimum qualification for membership.”

Scitovsky’s observation does not apply only to the rich countries today, but in fact applies to all except the poorest countries. As is clear from the preceding discussion of the main methods used to set absolute lines, there are many free parameters that can be brought into the analysis to influence the line obtained. The stipulated food-energy requirements are similar across countries, but the food bundles that yield a given nutritional intake can vary enormously (such as in the share of calories from coarse starchy staples rather than more processed foodgrains, and the share from meat and fish). The non-food components also vary, either explicitly or implicitly (through shifts in the food demand functions). There are relativist gradients in both the food and non-food components of the national poverty lines for developing countries, though the elasticity with respect to mean consumption is higher for the non-food component.29

The judgments made in setting the various parameters of a poverty line are likely to reflect prevailing notions of what poverty means in each country. And those norms clearly go well beyond the “physiological minimum necessary for survival.” The basal metabolic rate implies a positive lower bound to the cost of nutritional requirements (for all positive food prices). The cost of the (food and non-food) goods required for social needs must also be bounded below. The poverty lines found in many poor countries are certainly frugal. Consider, for example, the average daily food bundle consumed by someone living in a neighborhood of India’s national poverty line.30 The daily food bundle per person comprised 400 grams of coarse rice and wheat, 200 grams of vegetables, pulses, edible seeds and fruit, plus modest amounts of milk, eggs, edible oil, spices, and tea. After buying such a food bundle, one would have been left with about $0.30 per day (at 1993 purchasing power parity) for non-food items.

Such a frugal line is clearly too low to be acceptable in middle-income (and certainly in rich) countries, where higher overall living standards naturally mean that higher standards are used for identifying the poor. Consider instead the daily food bundle used by one study for constructing Indonesia’s poverty line for 1990.31 This comprised 300 grams of rice, 100 grams of tubers and similar amounts of vegetables, fruits and spices as in the India example; but it also included fish and meat (about 140 grams in

29 See Ravallion et al. (2009).
30 These are the author’s calculations, as reported in World Bank (1997). The official poverty line for India in 1993 was used.
31 The study referred to is Bidani and Ravallion (1993).


all per day), and the overall diet was more varied and probably preferable by the tastes of most consumers.32 This bundle would in turn be considered too frugal for defining poverty standards in many richer countries.

The position one takes on this issue depends in part on the purpose of the poverty measures. If they are intended to be purely descriptive one might opt for specificity, choosing a line that is considered appropriate in each setting, with no claims of comparability across settings. If instead one is using the poverty measures to inform policymaking, welfare consistency will often trump the merits of specificity.

However, welfare consistency only implies a constant real poverty line if we postulate that welfare depends solely on one’s own consumption. As soon as we allow for social effects on welfare, such as due to perceptions of relative deprivation in richer countries or the costs of assuring social inclusion, the welfare-consistent poverty line will rise with average income. It seems unlikely that it would rise in direct proportion to average income, but it will demonstrate some gradient.

Are the weakly relative poverty lines described in box 4.9 necessarily welfare-consistent? They will be if the gradient in poverty lines across countries only reflects relative deprivation (in a welfarist model) or costs of social inclusion (in a capabilities-based model). However, that cannot be known since there is another possibility: richer countries may use higher reference levels of welfare in determining their poverty lines. One can think of this as a model of social norms determining national lines. National poverty lines with constant purchasing power can be thought of as providing a lower bound to the extent of global poverty; this lower bound is relevant if one assumes that the national lines only vary according to social norms. The weakly relative lines fitted to national lines can be interpreted as providing an upper bound, in which the national lines are assumed to reflect the costs of attaining a common level of welfare. The truth is no doubt somewhere between the two bounds.

4.3 Subjective Poverty Lines

We have seen that different countries tend to use different poverty lines, and richer countries tend to have higher lines. The same is true of individuals. One approach to setting poverty lines explicitly recognizes that poverty lines are inherently subjective judgments people make about what constitutes a socially acceptable minimum standard of living in a particular society. The challenge is how to go from this observation to derive a single poverty line.

One approach has been based on survey responses to the following Minimum Income Question (MIQ):33 “What income level do you personally consider to be absolutely minimal? That is to say that with less you could not make ends meet.” The answers found in survey responses tend to be an increasing function of actual income. This is a key assumption for this approach to setting poverty lines. Furthermore, the studies that have included this question have tended to find a relationship as depicted

32 Vegetarians would presumably need to be compensated for the meat and fish by similarly protein-rich foods and would then prefer this version of the Indonesian bundle over the Indian bundle described above.

33 This is paraphrased from Kapteyn et al. (1988).


Figure 4.2 The Social Subjective Poverty Line. [Figure not reproduced. Vertical axis: subjective minimum income; horizontal axis: actual income, with z* marked; a 45° line is shown.]

in figure 4.2. The point Z* in the figure can be called the social subjective poverty line (SSPL). This is an obvious candidate for a poverty line; people with income above Z* tend to feel that their income is adequate, while those below Z* tend to feel that it is not. Thus the SSPL can claim to reflect the collective understandings of what constitutes “poverty” in the specific setting, rather than using a concept imposed from outside that setting. This approach, or variations on it, has been applied in a number of European countries.34

There are likely to be other factors besides income that influence the answers to the MIQ. We can think of those answers as a function of income Y and a list of other variables, given by the vector X (similarly to box 4.1). Then Z* will also be a function of X. It is readily verified that any variable in X that shifts the curved line upward (downward) in figure 4.2 will increase (decrease) Z*. Thus we can see, for example, how the social subjective poverty line varies with household size, demographics, or location.
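The fixed-point logic behind Z* can be sketched as follows. The log-linear response function and the parameters `alpha` and `beta` are assumptions made for illustration (the text specifies only that answers increase with income and cross the 45° line from above); `alpha` stands in for the combined effect of the X variables.

```python
import math


def subjective_minimum(y, alpha, beta):
    """Assumed MIQ answer at income y: ln(answer) = alpha + beta * ln(y)."""
    return math.exp(alpha + beta * math.log(y))


def sspl_closed_form(alpha, beta):
    """Income at which the stated minimum equals actual income (0 < beta < 1)."""
    return math.exp(alpha / (1.0 - beta))


def sspl_fixed_point(alpha, beta, y0=1.0, tol=1e-12, max_iter=10_000):
    """Find Z* by iterating y -> subjective_minimum(y) until it settles."""
    y = y0
    for _ in range(max_iter):
        y_next = subjective_minimum(y, alpha, beta)
        if abs(y_next - y) < tol * max(1.0, abs(y)):
            return y_next
        y = y_next
    raise RuntimeError("fixed-point iteration did not converge")


# Illustrative parameter values (assumptions, not estimates).
alpha, beta = 2.0, 0.5
z_star = sspl_fixed_point(alpha, beta)
print(z_star, sspl_closed_form(alpha, beta))

# A variable in X that raises alpha shifts the curve up and raises Z*.
z_star_shifted = sspl_closed_form(alpha + 0.1, beta)
print(z_star_shifted)
```

Below Z* the stated minimum exceeds actual income, and above Z* it falls short of it, which is the sense in which those above the line tend to regard their income as adequate.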

Judgments are called for in deciding what variables to include in the X vector, or at least in deciding which ones should be allowed to shift the SSPL. This is a difficult but poorly understood issue. Should this include all the observable covariates of subjective welfare, or only those things that one would deem relevant to poverty lines on a priori grounds? The problem is that there are predictors of subjective welfare that are not normally considered welfare-relevant for tasks such as setting poverty lines or formulating antipoverty policies. Consider, for example, the common finding in the literature for developed economies that unemployment reduces subjective welfare at a given level of income.35 If one included this variable in the vector X, then one would conclude that the unemployed should have a higher poverty line than the employed, ceteris paribus. Most other approaches to setting a poverty line would not have this feature, and objections would surely be raised; indeed, the standard economic model in which utility depends on the commodities consumed and leisure predicts

34 See, e.g., Hagenaars (1987).
35 Examples include Clark and Oswald (1994), Winkelmann and Winkelmann (1998), and Ravallion and Lokshin (2001).


that the unemployed would be better off at given income. (They would prefer not to be unemployed but in this model that would only be for the added income and hence consumption that employment allows.) As noted in chapter 3, this standard economic model may well be incomplete, as it misses important psychic costs of unemployment. Those who argue that a welfare-consistent poverty line should not be any higher for the unemployed are implicitly arguing that their concept of “welfare” should not allow for such psychic costs. This would surely be unacceptable to those who equate “welfare” with utility or happiness.

One can postulate that only a subset of the X variables that have predictive power for survey responses on subjective minimum income should be used as shift variables for the SSPL. Those variables that are left out of the SSPL might be set at sample average values, say. In this respect the SSPL approach shares common features with all the other approaches described above, namely that external value judgments about interpersonal comparisons of welfare are required. This is unavoidable. However, the SSPL approach does allow for non-market goods, and their weights (the missing prices) are determined by the data. Thus, the method narrows the range of choices for which external value judgments are required. Of course, one must still accept that subjective questions about welfare provide a sufficiently credible signal for making such choices, after one has statistically isolated that signal from the noise that also comes with such data.

In applying the MIQ in many developing countries, one will also find that “income” is not a well-defined concept, particularly (but not only) in rural areas. It is not at all clear whether one could get sensible answers to the MIQ. In work with Menno Pradhan, I proposed a method for estimating the SSPL based on qualitative data on consumption adequacy, as given by responses to appropriate survey questions.36 Instead of asking respondents what the precise minimum consumption is that they need, one simply asks whether their current consumptions are adequate. This provides a multidimensional extension to the one-dimensional MIQ. The SSPL is the level of total spending above which respondents say (on average) that their expenditures are adequate for their needs. For empirical implementation, the probability that a sampled household will respond that its actual consumption of each type of commodity is adequate can be modeled as a nonlinear regression, called a “probit.” Under certain technical conditions, a unique solution for the subjective poverty line can then be obtained from the estimated parameters of the probit regressions for consumption adequacy.

There have been some estimates of SSPLs.37 Interestingly, the estimates to date suggest that the overall poverty rate based on the SSPL is roughly similar to that implied by objective poverty lines.38 It may well be that the choice of parameters in

36 See Pradhan and Ravallion (2000).
37 I focus on application to developing countries. The applications to date include Pradhan and Ravallion (2000) using data for Jamaica and Nepal; Ferrer-i-Carbonell and Van Praag (2001), for Russia; Taddesse and Shimeles (2005), for Ethiopia; Gustafsson et al. (2004), for urban China; Lokshin et al. (2006), for Madagascar; Bishop et al. (2006), for urban China; and Carletto and Zezza (2006), for Albania.

38 An exception to this finding is reported for the United States by de Vos and Garner (1991), where the SSPL is well above the prevailing (absolute) line, though the US line has not been updated in real


the “objective” absolute lines already approximated the expected SSPL in the specific context. However, the structure of the poverty profile has turned out to be different in some respects. While objective poverty lines for developing countries often imply that larger households are poorer, this is not typically the case in cross-sectional studies using the subjective approach, which tends to suggest greater economies of scale in consumption than normally assumed. For example, in using the economic ladder question (chapter 3) to test the welfare consistency of prevailing objective poverty lines for Russia, striking differences were revealed in the properties of the equivalence scale.39 The objective poverty lines had an elasticity of 0.8 to household size, while the subjective indicator called instead for an elasticity half this size.40
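To see what the difference between a size elasticity of 0.8 and one of 0.4 can mean for a poverty profile, consider a rough sketch. It assumes the household poverty line takes the constant-elasticity form `single_person_line * size ** elasticity`, and the line of 100 and the household budget of 300 are made-up numbers; the studies cited may use different specifications.

```python
def household_poverty_line(size, single_person_line, elasticity):
    """Household line under an assumed constant size elasticity."""
    return single_person_line * size ** elasticity


def is_poor(total_consumption, size, single_person_line, elasticity):
    """True if household consumption falls below its size-adjusted line."""
    line = household_poverty_line(size, single_person_line, elasticity)
    return total_consumption < line


# Hypothetical household of five with total consumption of 300.
for theta in (0.8, 0.4):
    line = household_poverty_line(5, 100.0, theta)
    print(theta, round(line, 1), is_poor(300.0, 5, 100.0, theta))
```

In this example the same five-person household counts as poor under the higher elasticity but not under the lower one, which is how a change in assumed economies of scale can reorder large and small households in the poverty profile.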

Subjective data have thrown new light on the debate on whether poverty is absolute or relative. One finds little credible support for the idea of a relative poverty line set at a constant proportion of the current mean income. Poverty lines calibrated to subjective welfare tend to rise with mean income but with an elasticity less than unity, suggesting that they are more like the “weakly relative poverty lines” as defined by Ravallion and Chen (2011).41

A number of papers have reported evidence of effects on subjective welfare that can be interpreted as indicative of “relative deprivation,” meaning that self-assessed well-being tends to fall as social comparators become better off, at given “own income.”42 One study reports regressions for subjective welfare in the United States that imply a particularly strong relativism, whereby own income does not matter to subjective well-being independently of income relative to the mean in the area of residence.43 The bulk of the evidence has been for relatively rich countries. The work that has been done for developing countries has been less supportive. The tests for relative deprivation effects in self-reported happiness have found rather little support for the idea and even evidence of positive external effects of higher “neighbors’ income,” rather than the negative effect predicted by the theory of relative deprivation.44

terms since the 1960s; a more current absolute line for the United States would probably be closer to the SSPL.

39 See Ravallion and Lokshin (2002).
40 Similarly, see Pradhan and Ravallion (2000), using data for Jamaica and Nepal; Bishop and Luo (2006), using data for urban China; and Rojas (2007), using data for Mexico. For a more general discussion of economies of scale in consumption in developing countries, see Lanjouw and Ravallion (1995).

41 Hagenaars and Van Praag (1985) estimated an elasticity of 0.51 for eight European countries. For the United States, Kilpatrick (1973) estimated an elasticity of about 0.6 for subjective poverty lines, and De Vos and Garner (1991) found an own-income elasticity of the US subjective poverty line of 0.43.

42 See Oswald (1997), Frank (1997), Frey and Stutzer (2002) and Clark et al. (2008). Reviewing the evidence, Frey and Stutzer assert that “there is little doubt that people compare themselves to other people and do not use absolute judgments” (2002, 412). This would seem to be overstated.

43 See Luttmer (2005).
44 See Senik (2004), Kingdon and Knight (2007), and Ravallion and Lokshin (2010) using data for Russia, South Africa, and Malawi, respectively.

5

Poverty and Inequality Measures

To recap the story so far in Part Two: We have learned about how “economic welfare” is measured. Household command over commodities is key, although it is unlikely to be sufficient information. In calibrating welfare metrics and setting poverty lines, economists have turned to two main sources of that extra information. The first source is data on attainments of certain basic functionings, such as being adequately nourished for good health and normal activities. The second is information on self-assessments of welfare, for estimating social subjective poverty lines. While there is bound to be some arbitrariness to any poverty line (as there is in other aspects of economic and social measurement), following Bowley and others, monitoring progress in reducing the number of people living below some fixed line is a justifiable approach to measuring social progress.

Applying the various tools reviewed in chapters 3 and 4, we end up with a distribution of the measures of individual economic welfare in the relevant population. Typically, this will be a measure of total household consumption or income normalized by the household-specific poverty line, interpreted as a deflator for differences in needs (associated with differences in the size or composition of the household or in the prices faced). This chapter turns to the task of aggregating the information on the distribution of the chosen measure of economic welfare into one or more summary statistics on poverty or inequality.

Poverty and inequality measures have both descriptive and normative roles. The latter has proved to be more contentious, and it is worth reviewing the issues. So the chapter begins with a discussion of the various normative foundations that have been proposed for measurement, linking back to the discussion in Part One. This will make clear that in thinking about measuring poverty it makes sense to start with a discussion of how we measure inequality. The chapter then discusses the main measures of poverty found in practice. The chapter also reviews the various tools of analysis that have been developed. These include decompositions that can be done of an aggregate poverty measure and tests for assessing the robustness of ordinal poverty comparisons (when we only need to know whether there is greater poverty in one place or at one date than another, or with and without a policy change) to the assumptions made about poverty lines and measures. One important lesson from that discussion will be the importance of considering a range of poverty lines. This leads naturally to a discussion of the size of the “middle class,” taken up later in the chapter. Measures of poverty and inequality are hardly far removed from policymaking, but there are measures that have been developed in the literature that are “hard wired” to assessing the performance of specific policies. A prominent example is the set of measures of “targeting performance,” which are reviewed later in this chapter.

All of the measures described in this chapter can be calculated from your own primary data set using standard statistical packages, such as Stata, EViews, SAS, and SPSS. Specially designed, user-friendly software products are now also available to calculate the various measures and tests in this chapter; good examples are DAD and the poverty module of ADePT.1

5.1 Normative Foundations

Recall that in classical utilitarianism (as formulated by Bentham and Mill, as discussed in chapter 1) the yardstick for social progress and policy evaluation is the arithmetic sum of utilities. This still penalizes income inequality, assuming diminishing marginal utility of income (box 1.13). In fact, we can quite generally think of a measure of income inequality as the loss of aggregate social welfare associated with that inequality.

For expository purposes, consider the following highly stylized case. Everyone has the same utility function, which depends solely on each person’s own real income. To sharpen the analysis further, suppose that this common utility function is simply the log of income. Suppose also that we can attain any distribution of a fixed total income we like by transfers. (This ignores incentive effects, which we heard about in Part One and we will return to in Part Three.) Since everyone is taken to have the same utility function, the utilitarian social welfare objective is maximized when everyone has the same income.

In this stylized set-up, any inequality of income entails a loss of social welfare. The aggregate social welfare is the maximum social welfare less the loss attributable to inequality. And the implied measure of inequality is one of those we will consider in detail later (in box 5.4), namely the Mean-Log Deviation (MLD). Thus, we have a clear (albeit highly simplified) ethical foundation for how we should go about measuring inequality in practice. In a similar fashion, an important paper by Anthony Atkinson (1970) showed how one can derive a broad class of normatively grounded inequality measures for a more general utility function for which the log of income is a special case.
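The argument in the stylized case can be written out in two lines. This is only a sketch under the assumptions stated above (n people, common utility equal to the log of income, a fixed total income); the labels \(W\) and \(W^{\max}\) are introduced here for illustration, and welfare is expressed per person rather than as a sum, which does not affect the comparison. By Jensen's inequality for the concave log function:

\[
W = \frac{1}{n}\sum_{i=1}^{n} \ln y_i \;\le\; \ln\!\left(\frac{1}{n}\sum_{i=1}^{n} y_i\right) = \ln \bar{y} = W^{\max},
\]

with equality only when every \(y_i\) equals the mean \(\bar{y}\). The welfare loss attributable to inequality is then

\[
W^{\max} - W = \frac{1}{n}\sum_{i=1}^{n} \ln\!\left(\frac{\bar{y}}{y_i}\right),
\]

which is the expression for the MLD given in box 5.4.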

One objection that can be raised is that lower levels of welfare should get higher weight; we do not just care about income inequality in society, we also care about inequality of individual welfare levels. Another way of thinking about this is that we want our measure of inequality to reflect our ethical aversion to poverty. But how should we incorporate an aversion to inequality of welfare? The literature has not helped much in addressing this question. In principle one way of answering the question is to postulate a generalized version of the utilitarian schema. Instead of insisting that social welfare equally weights individual welfare levels, we can postulate a social welfare function (SWF) that puts decreasing weight on higher levels. (We will give an example in the next section.)

1 There are also useful guidebooks on using standard software programs; see Duclos and Araar (2006) (using DAD), Haughton and Khandker (2009) (using Stata), and Foster et al. (2013) (using ADePT).

Following this approach, if we insist on the strong form of the Pareto principle (as discussed in chapter 1), then we would demand that all weights are positive. But one can relax this to allow a weaker requirement that all weights are non-negative. This allows the possibility that the weight goes to zero above some point, so that there is some level of individual welfare above which we put negligible social value to further gains. A poverty measure can be given a normative interpretation along these lines.2

Alternatively, we can follow Rawls’s (1971) (non-utilitarian) proposal that our principles of justice should focus first on the poorest stratum and (consistently with the priority given to liberty) we should make social choices that do most to raise the welfare of those in that stratum. This is the “maximin” SWF (box 2.3). A poverty measure is sometimes thought of as a way of implementing this normative approach. However, there are two qualifications. First, a conventional poverty measure does not attach any explicit weight to the typical welfare level of the poorest and may even be unaffected by changes in that level. Second, notice that Rawls clearly has in mind a relative measure, in that it is specific to the context. The idea is not that we stop as soon as everyone is above some (possibly quite austere) poverty line. Rather, once that point is reached, we then move on to consider the next most disadvantaged group. At each step, we still need to identify the most disadvantaged.

Yet a further ethical motivation for poverty measures can be devised by considering inequality of opportunity. We can think of this as combining concerns for equity and efficiency. There is an equity motivation in reducing inequality of opportunity in society, to assure a more level playing field. But it comes with an efficiency consideration, namely that we do not want to reduce inequality by bringing everyone down to the level of the poorest person. Thus, we are drawn again toward some form of “maximin”: a view that policy should maximize the welfare of the most disadvantaged group in society.3

5.2 Measuring Inequality

Most people have a reasonably clear idea about the difference between “poverty” and “inequality.” As these terms are normally defined, poverty is about absolute levels of living: how many people cannot afford certain pre-determined consumption needs. “Poverty” can be said to exist in a given society when one or more persons do not attain a level of economic well-being deemed to constitute a reasonable minimum by the standards of that society. Inequality is about the disparities in levels of living; for example, how much more is held by rich people than poor people. This section reviews the strengths and weaknesses of some common measures of inequality.

2 On this approach, see Ravallion (1994a).

3 This is argued by Roemer (1998, 2014).

In applied work, economists typically measure inequality by looking at the ratios of individual incomes to the overall mean. By this approach, the measure of inequality is unchanged if all incomes increase at the same proportionate rate. (As noted in box 1.8, that is not the only defensible concept of inequality; we return to this point.) A useful graphical tool for measuring inequality is the Lorenz curve, giving the share of total income held by the poorest p percentage of the population, ranked by household income per capita (or per equivalent single adult). Box 5.1 goes into further detail on the Lorenz curve.

Box 5.1 The Lorenz Curve, Gini Index and Distribution Function

The Lorenz curve gives (on the vertical axis) the share of total income held by (on the horizontal axis) the poorest p percentage of the population, when ranked by income (figure B5.1.1). The curve is rising throughout, with increasing slope, as shown by the bold curved line in figure B5.1.1. The 45 degree line represents perfect equality; everyone has the mean real income, \(\bar{y}\). (It is now assumed that nominal income or consumption has been normalized by a price index and equivalence scale to give “real income” denoted by the lower case y.) Intuitively, the more the Lorenz curve bends out the more inequality. If the richest person has all of the income, then the Lorenz curve runs along the horizontal axis and jumps to 100% only at the far right, so that (in a large population) the area between the diagonal and the curve is the entire area below the diagonal.

The Gini index is twice the gray-shaded area in figure B5.1.1. This is equal to half the average absolute difference between all pairs of incomes in the population, expressed as a proportion of the mean. The index lies within the interval [0,1]. When everyone has the mean income and this is positive, then the Gini index is at its lower bound of 0. When the richest person has all the income, the Gini index approaches its upper bound of 1 as the population size rises.

The Lorenz curve is related to the cumulative distribution function (CDF), p = F(y), giving the proportion of the population with income no greater than y. The slope of the CDF is called the density function, giving the proportion of the population with income y. If we invert the CDF (flipping the axes) we obtain the quantile function y(p). (For example, y(0.5) is the median.) Then the slope of the Lorenz curve at point p gives the quantile function normalized by the mean, \(y(p)/\bar{y}\), where \(\bar{y}\) is the mean.

[Figure B5.1.1 Lorenz Curve. Horizontal axis: share of population ranked by income (p), from (0,0) to 100%. Vertical axis: share of income going to the poorest p%, up to 100%. The Lorenz curve L(p) is drawn below the 45° line.]


Historical note: Max Lorenz was an American economist who developed the idea of the Lorenz curve (as it came to be known) in 1905, at the age of twenty-nine, while studying for his PhD at the University of Wisconsin–Madison. Corrado Gini was an Italian statistician who developed his famous index of inequality in 1912. Gini’s paper was published in Italian. Hugh Dalton (1920) drew the attention of English-speaking readers to Gini’s measure and its relationship to the Lorenz curve.
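The construction described in box 5.1 can be sketched in a few lines of Python. This is an illustrative sketch only: the function names `lorenz_curve` and `gini_from_lorenz` and the five incomes are assumptions made here, not part of the original text, and the data are unweighted (real survey data would need sampling weights).

```python
def lorenz_curve(incomes):
    """Return cumulative population shares p and income shares L(p)."""
    ys = sorted(incomes)              # rank people from poorest to richest
    n, total = len(ys), sum(ys)
    p, shares, running = [0.0], [0.0], 0.0
    for k, y in enumerate(ys, start=1):
        running += y
        p.append(k / n)               # share of population up to person k
        shares.append(running / total)  # share of income held by them
    return p, shares


def gini_from_lorenz(incomes):
    """Gini index as twice the area between the 45-degree line and L(p)."""
    p, shares = lorenz_curve(incomes)
    # Area under the piecewise-linear Lorenz curve (trapezoid rule).
    area_under = sum(
        (p[k] - p[k - 1]) * (shares[k] + shares[k - 1]) / 2
        for k in range(1, len(p))
    )
    return 2 * (0.5 - area_under)


# Hypothetical incomes, chosen only for illustration.
example = [0, 10, 10, 10, 10]
p, shares = lorenz_curve(example)
for pk, lk in zip(p, shares):
    print(f"poorest {pk:.0%} hold {lk:.0%} of income")
print("Gini index:", gini_from_lorenz(example))
```

With equal incomes the curve coincides with the 45 degree line and the index is zero; the more the curve bends out, the larger the shaded area and the index.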

A simple inequality measure is the gap between the richest and poorest person. But that is surely too simple, as it ignores everyone else. We would do better to take an average of all the gaps. If we divide the average gap between all pairs of incomes by twice the mean, then we will have an index that lies between 0 (everyone has the mean, so there is no inequality) and an extreme upper limit of unity (the richest person has all the income in a population of infinite size). This is the most famous inequality index, the Gini index.4 The Gini index also has a simple relationship with the Lorenz curve (boxes 5.1 and 5.2).

The Gini index is one of a number of measures that have been proposed that satisfy what is often called the transfer axiom (also called the transfer principle) as explained in box 5.3, which also discusses other desirable properties of a measure of inequality. The transfer axiom can be a powerful property for assessing whether inequality has increased or not without specifying what specific measure of inequality is being used. If Lorenz curve A lies entirely above that for B (only touching each other at the two limits) then inequality is unambiguously lower for A than B for any measure satisfying the transfer axiom, including the Gini index.5

4 For example, the US Census Bureau estimates that the Gini index for 2009 is 0.469. Mean annual household income per capita is $28,051 (averaged over 2008–2012). So the implied mean income gap between people is $26,312.

5 See Atkinson (1970) for a proof of this claim.

* Box 5.2 More on the Gini Index

We have a distribution of real income, \(y_1, y_2, \ldots, y_n\), with mean \(\bar{y}\). The absolute gap between the income of person i and that of person j is \(|y_i - y_j|\). Imagine forming the mean absolute gap (\(\Delta\)) over all the \(n^2\) pairs of incomes. One would obtain:

\[
\Delta = \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} |y_i - y_j| .
\]

(Notice that the double summation is needed because we calculate the absolute gaps between all pairs of incomes.) The Gini index, designated G, is then obtained by simply normalizing \(\Delta\) to assure that its maximum value, when the richest person has all of the income (\(n\bar{y}\)), does not exceed unity. It is readily verified that this requires that \(G = \Delta/(2\bar{y})\). When the richest person has all of the income, the upper bound of the index is \(1 - (1/n)\). This reaches unity in the limit, as n goes to infinity.

Further reading: A now classic treatment of the Gini index is found in Sen (1973). A more technical exposition is found in Sen (1976b). Also see the update to Sen (1973) by Foster and Sen (1997).
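The definition in box 5.2 translates directly into code. The sketch below is illustrative rather than definitive: the function names are assumptions made here, the data are unweighted, and the double loop is quadratic in the sample size, so it is only suitable for small examples. It also checks the upper bound of the index and reproduces the arithmetic behind the implied mean income gap quoted for the United States (twice the Gini index times the mean).

```python
def mean_absolute_gap(incomes):
    """Mean of |y_i - y_j| over all n^2 ordered pairs of incomes."""
    n = len(incomes)
    return sum(abs(yi - yj) for yi in incomes for yj in incomes) / n**2


def gini(incomes):
    """Gini index: the mean absolute gap divided by twice the mean."""
    mean = sum(incomes) / len(incomes)
    return mean_absolute_gap(incomes) / (2 * mean)


# When one person holds all the income, the index equals 1 - 1/n.
for n in (2, 10, 100):
    one_rich = [0.0] * (n - 1) + [1.0]
    print(n, gini(one_rich), 1 - 1 / n)

# Inverting G = gap / (2 * mean): figures quoted in the text for the US.
us_gini, us_mean = 0.469, 28051
implied_gap = 2 * us_gini * us_mean
print(round(implied_gap))
```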

Box 5.3 Desirable Properties of an Inequality Measure

The transfer axiom says that if one transfers a given sum of money from person A to a poorer (richer) person B without changing their ranking then inequality must fall (rise); clearly, the Lorenz curve will shift upward (downward) at least somewhere. This axiom makes a lot of sense, and it has been widely accepted. Most of the inequality measures found in practice (including the Gini index described in box 5.2) satisfy this axiom, although there are some that do not, including the variance of the log of income and the ratio of the mean to the median.

Consider the following change in distribution. Initially, in state A, the income levels (in dollars a day, say) in a society of five people are:

A: (0, 10, 10, 10, 10);

$3 is transferred from one of those with $10 to the poorest person, creating the distribution:

B: (3, 7, 10, 10, 10).

The change satisfies the transfer axiom. This is reasonable, but we should acknowledge that objections can be raised:

• The number of pairs of people with the same income has fallen in going from state A (six such pairs among the four people with $10) to B (three).

• Those who keep $10 may well feel the loss to one of their own more than the gain to the distant and distinct poor person with zero income.

• The number of people who may feel relatively deprived (poorer than someone else) has risen, from one to two.

Another widely accepted axiom is called anonymity (also called symmetry). Essentially this says that it does not matter who has which income level. There are no names attached to the incomes in the lists above. If we swapped the incomes of the poorest person in A with one of those with $10 we will obtain:

C: (10, 10, 10, 10, 0) .

Under anonymity, the measures of inequality for A and C are identical. This too can be questioned: in the real world, the person who previously had $10 will no doubt raise an objection! Telling people that inequality has not changed may not be persuasive when there has been considerable churning of incomes, with both gains and losses.

A third axiom is called scale invariance. This says that multiplying all incomes by a constant does not change the inequality measure. So the following distribution has the same inequality as A:

D: (0, 20, 20, 20, 20) .

This too can be questioned, given that the absolute gap between the richest and poorest person has doubled. People’s judgments about inequality appear often to be inconsistent with scale invariance.

A fourth axiom is replication invariance (also called population invariance), which says that the measure of inequality is unchanged when we duplicate the population, or pool identical populations. For example, the following distribution has the same inequality as A:

E: (0, 10, 10, 10, 10, 0, 10, 10, 10, 10) .

Another axiom is called decomposability. This says that the total inequality can be written as the sum of the inequality between groups plus inequality within groups. Suppose we partition A into two groups:

A1: (0, 10) and A2: (10, 10, 10).

Then group A1 has high inequality (the maximum income difference), while A2 has no inequality. Intuitively, the amount of inequality between the two groups (comparing the difference in their means) is somewhere between the amounts of inequality within each group.

An objection to decomposability is that in this set up, group memberships only have salience via the incomes of those in the groups. The groups have no special identity. However, it appears to be the case at times that group identities (such as by race or gender) matter more than this decomposition would suggest. Claims that group identities do not matter to inequality because the between-group component of an inequality decomposition is small may not fit well with perceptions on the ground.

Further reading: An important early paper on inequality measurement was Atkinson (1970). A formal treatment of the axioms of inequality measurement can be found in Cowell (2000). The discussion of the transfer axiom above draws on Kolm (1998) while that on group identities draws on Kanbur (2006). Also see Foster et al. (2013, ch. 2).
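The distributions A to E used in this box make a convenient test bed. The following sketch checks the first four axioms numerically for the Gini index; the `gini` helper is an assumed, unweighted implementation of the pairwise definition in box 5.2, written here for illustration.

```python
import math


def gini(incomes):
    """Gini index from the mean absolute gap over all pairs of incomes."""
    n = len(incomes)
    mean = sum(incomes) / n
    gaps = sum(abs(a - b) for a in incomes for b in incomes)
    return gaps / (2 * n * n * mean)


A = [0, 10, 10, 10, 10]
B = [3, 7, 10, 10, 10]     # $3 moved from one person with $10 to the poorest
C = [10, 10, 10, 10, 0]    # same incomes as A, held by different people
D = [0, 20, 20, 20, 20]    # every income in A doubled
E = A + A                  # population of A replicated

print("transfer axiom:        G(B) < G(A)  ->", gini(B) < gini(A))
print("anonymity:             G(C) == G(A) ->", math.isclose(gini(C), gini(A)))
print("scale invariance:      G(D) == G(A) ->", math.isclose(gini(D), gini(A)))
print("replication invariance: G(E) == G(A) ->", math.isclose(gini(E), gini(A)))

# The objection to scale invariance: the absolute gap has doubled.
print("richest-poorest gap in A and D:", max(A) - min(A), max(D) - min(D))
```

Passing these checks on one example does not prove that a measure satisfies an axiom in general, but a single failure is enough to show that it does not.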


The Gini index does not have all the properties that one might hope for in a measure of inequality. If we measure “global inequality” similarly to how we measured global poverty, then we would ignore all country borders, pooling all residents, and measure the inequality among them as if it were one country. This overall measure will naturally depend on the inequality between countries as well as that within them. Thus, its evolution over time will depend on growth rates in poor countries relative to rich ones (roughly speaking), as well as the things happening within countries (economic changes and policies) that affect inequality. However, if we are comparing country performance at regional or global levels, then we will want to isolate the within-country component of inequality. While there are many inequality measures and one can always calculate the average inequality index for a group of countries, only for a subset of inequality measures will that average accord with the within-country component of total inequality, implying a clean separation of the part we are interested in from the total inequality. Such an exact decomposition is known to be impossible for the popular Gini index (boxes 5.1 and 5.2).6 The MLD offers a practical solution. This is given by the (appropriately weighted) mean across sampled households of the log of the ratio of the overall mean income to individual income; see box 5.4 for more detail.

6 The exception is when the distributions of the different countries, or subgroups, being compared do not overlap, which is unlikely in general.

* Box 5.4 The Mean Log Deviation: A Simple and Elegant but Not Much Used Measure of Inequality

The Gini index has long been the most popular measure of inequality, but it is not the best by some criteria. Consider instead the MLD. For a distribution of consumption or income \(y_1, y_2, \ldots, y_n\) (all elements of which are assumed to be positive) with mean \(\bar{y}\), the MLD is simply:

\[
\mathrm{MLD} = \frac{1}{n} \sum_{i=1}^{n} \ln\!\left(\frac{\bar{y}}{y_i}\right).
\]

Like the Gini index, this measure satisfies the transfer axiom (box 5.3). However, unlike the Gini index, MLD is exactly decomposable by population subgroups. To see how, suppose now that these n individuals are assigned to N mutually exclusive groups (countries, say). Let \(y_{ij}\) denote the consumption of person i in group j containing \(n_j\) people. Then we can rewrite MLD as:

\[
\mathrm{MLD} = \ln \bar{y} - \sum_{j=1}^{N} s_j \, \frac{1}{n_j} \sum_{i=1}^{n_j} \ln y_{ij},
\]

where \(s_j = n_j/n\) is the population share of group j. We can go further and write this as:

\[
\mathrm{MLD} = \mathrm{MLD}_W + \mathrm{MLD}_B,
\]

where:

\[
\mathrm{MLD}_W = \sum_{j=1}^{N} s_j \left( \ln \bar{y}_j - \frac{1}{n_j} \sum_{i=1}^{n_j} \ln y_{ij} \right), \quad \text{and}
\]

\[
\mathrm{MLD}_B = \ln \bar{y} - \sum_{j=1}^{N} s_j \ln \bar{y}_j
\]

are the within-country and between-country components, respectively, and \(\bar{y}_j\) is the mean for group j. So total inequality is the population-weighted average of inequality within groups plus the inequality between them.

MLD also lends itself to implementing the distinction between “vertical” and “horizontal” inequality noted in box 1.8. Chapter 9 will return to this point.

Further reading: MLD is one of the “generalized entropy measures” proposed by Theil (1967). Bourguignon (1979) shows that (under mild restrictions on the properties of the inequality index) MLD is the only measure satisfying the transfer axiom that is decomposable with population weights.
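The decomposition in this box is easy to verify numerically. The sketch below is illustrative: the function names and the two small groups of incomes are assumptions made here, the data are unweighted, and all incomes must be strictly positive for the logs to be defined.

```python
import math


def mld(incomes):
    """Mean log deviation: mean of ln(mean / y_i)."""
    mean = sum(incomes) / len(incomes)
    return sum(math.log(mean / y) for y in incomes) / len(incomes)


def mld_decompose(groups):
    """Split total MLD into (within, between) for a list of income groups."""
    pooled = [y for group in groups for y in group]
    n = len(pooled)
    overall_mean = sum(pooled) / n
    within = 0.0
    between = math.log(overall_mean)
    for group in groups:
        share = len(group) / n                 # population share s_j
        group_mean = sum(group) / len(group)
        within += share * mld(group)           # s_j times MLD inside group j
        between -= share * math.log(group_mean)
    return within, between


# Two hypothetical "countries"; the numbers are for illustration only.
groups = [[1.0, 2.0, 3.0], [4.0, 8.0]]
within, between = mld_decompose(groups)
total = mld([y for group in groups for y in group])
print("within:", within, "between:", between)
print("within + between:", within + between, "total:", total)
```

The between-group term is the MLD that would result if everyone received the mean income of their own group, which is why it is non-negative.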

Following the discussion in section 5.1, we can think of the inequality measure as the loss of social welfare due to inequality. Thus, we can ask what SWF underlies each index, and see if that is ethically appealing. The SWF corresponding to the Gini index weights incomes by their rank in the distribution, with highest weight on the lowest income (box 5.5). This SWF is questionable. It is not clear what ethical justification can be given to using the rank in the distribution as the weight for incomes. A utilitarian would presumably also object that the implicit utility function does not exhibit diminishing marginal utility of income. (The weights are how the Gini index comes to satisfy the transfer axiom.) The MLD is more appealing in this respect as it is the inequality measure corresponding to the utilitarian SWF when utility is log income (box 5.5). However, we can also object to the equal weighting of welfare levels in the utilitarian objective on the grounds that this does not adequately reflect our aversion to poverty (section 5.1). This can be dealt with by a hybrid measure, combining Gini-type weights with declining marginal utility. Thus, one can generalize the MLD to incorporate the Gini-type SWF in which utilities are weighted by ranks. This simultaneously addresses both the deficiency of the Gini index (that its implicit SWF does not reflect diminishing marginal utility of income) and of the MLD (that it does not give higher weight to lower utilities). Box 5.5 goes into more detail on this measure of inequality.


* Box 5.5 Inequality and Social Welfare

Recall from section 5.1 that inequality measures can be thought of as the loss of social welfare due to inequality at a given mean income. If we order the incomes from highest to lowest, \(y_1 \ge y_2 \ge \ldots \ge y_n\), then the SWF corresponding to the Gini index is given by:

\[
\frac{2}{n^2} \sum_{i=1}^{n} i\,y_i = \left(\frac{1}{n} + 1 - G\right)\bar{y} \cong (1 - G)\,\bar{y} \quad \text{for large } n.
\]

Here we see that the “Gini SWF” has incomes that are rank-weighted with the highest weight on the poorest person. Note that:

\[
\sum_{i=1}^{n} i\,y_i = y_1 + 2y_2 + \ldots + n y_n .
\]

If you take $1 from person j and give it to person k (and nothing else changes, including the ranking), then the Gini index will fall if (and only if) k > j, that is, if k is poorer than j. The Gini index times the mean can be interpreted as the social welfare loss from inequality. While the Gini-SWF gives higher weight to lower levels of welfare, it measures the latter simply by income. Thus, it does not incorporate the utilitarian idea of diminishing marginal utility of income.

We can easily see the implicit SWF in the MLD if we rewrite the first equation in box 5.4 as:

\[
\frac{1}{n} \sum_{i=1}^{n} \ln y_i = \ln \bar{y} - \mathrm{MLD}.
\]

On the left-hand side, we have the mean utility for a common utility function of the form \(\ln y_i\). The log transformation embodies diminishing marginal utility of income (box 1.13). This is the SWF implicit in using the MLD as the measure of inequality. On the right-hand side, we have the log of the mean, which is the maximum of the mean utility when a fixed total income is redistributed, less MLD. Thus, we can interpret MLD as the loss of social welfare due to inequality, assuming that incomes can be redistributed without altering total income. (This verifies the claim made in section 5.1.)

A more general utility function is proposed in Atkinson (1970), taking the form \(y^{1-\varepsilon}/(1-\varepsilon)\) for \(\varepsilon \neq 1\) and \(\ln y\) for \(\varepsilon = 1\). A higher value of the parameter \(\varepsilon\) implies that a greater penalty is attached to income inequality. This generates the corresponding class of Atkinson inequality measures.

While MLD (and the more general Atkinson function) incorporates diminishing marginal utility of income, it weights utilities equally. A simple way of responding to this concern is to adopt the Gini-type SWF with rank-weights (box 5.2) while maintaining the assumption that utility is log income (rather than the level of income, as in the Gini index). So the new SWF would be

\[
\sum_{i=1}^{n} i \ln y_i,
\]

where incomes are ordered from the richest (i = 1) to the poorest (i = n). The new version of the MLD would take the form:

\[
\mathrm{MLD}^{*} = \frac{2}{n(n+1)} \sum_{i=1}^{n} i \ln\!\left(\frac{\bar{y}}{y_i}\right).
\]

(Note that \(\sum_{i=1}^{n} i = n(n+1)/2\).) One drawback is that this modified index loses the neat decomposability property of MLD (box 5.4).
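The identities in this box can be checked numerically. The sketch below is a minimal illustration, not a definitive implementation: the function names and the four incomes are assumptions made here, incomes are ranked from richest (i = 1) to poorest (i = n) as in the box, and the data are unweighted.

```python
import math


def gini(incomes):
    """Gini index from the mean absolute gap over all pairs of incomes."""
    n = len(incomes)
    mean = sum(incomes) / n
    gaps = sum(abs(a - b) for a in incomes for b in incomes)
    return gaps / (2 * n * n * mean)


def gini_swf(incomes):
    """Rank-weighted mean income: (2 / n^2) * sum of i * y_i."""
    ys = sorted(incomes, reverse=True)      # y_1 richest, y_n poorest
    n = len(ys)
    return 2 / n**2 * sum(i * y for i, y in enumerate(ys, start=1))


def mld(incomes):
    """Mean log deviation: mean of ln(mean / y_i)."""
    mean = sum(incomes) / len(incomes)
    return sum(math.log(mean / y) for y in incomes) / len(incomes)


def rank_weighted_mld(incomes):
    """The rank-weighted variant of MLD described in the box."""
    ys = sorted(incomes, reverse=True)
    n = len(ys)
    mean = sum(ys) / n
    weighted = sum(i * math.log(mean / y) for i, y in enumerate(ys, start=1))
    return 2 / (n * (n + 1)) * weighted


incomes = [10.0, 8.0, 5.0, 2.0]             # hypothetical, for illustration
n = len(incomes)
mean = sum(incomes) / n

# Gini SWF identity: (2/n^2) * sum(i * y_i) = (1/n + 1 - G) * mean.
print(gini_swf(incomes), (1 / n + 1 - gini(incomes)) * mean)

# MLD identity: mean log income = log of the mean minus MLD.
mean_log = sum(math.log(y) for y in incomes) / n
print(mean_log, math.log(mean) - mld(incomes))

# $1 from the richest (j = 1) to the poorest (k = 4), ranking unchanged.
after = [9.0, 8.0, 5.0, 3.0]
print("Gini before and after:", gini(incomes), gini(after))
print("rank-weighted MLD:", rank_weighted_mld(incomes))
```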

Another possible candidate for a measure of “inequality” is the strongly relative poverty measure, in which the poverty line is set at a constant proportion of the mean (as discussed in chapter 4). This makes our aversion to poverty very clear. However, one should be careful as this measure does not respect the transfer axiom. One can construct examples whereby distribution A Lorenz dominates B (so that A has less inequality than B for any well-behaved measure of inequality) and yet the strongly relative measure is higher for the A distribution.7 The strongly relative poverty measure could indicate higher poverty in A than B, even though inequality and absolute poverty are unambiguously lower in A than B. And such examples are also possible when the transfers are only made among the poor. Thus, the strongly relative poverty measure is not only independent of the mean, it need not be consistent with reasonable normative judgments about relative poverty.

So far in this section, we have only discussed relative inequality, whereby if all incomes are multiplied by a constant the measure is unchanged. By this approach, inequality depends on the ratios of incomes in the population. “Absolute inequality” depends instead on the absolute differences in levels of living, rather than relative differences, as captured by the ratios to the mean. Standard practice is to measure inequality using a relative measure, consistently with the scale invariance axiom (box 5.3). However, one can equally well measure inequality in terms of the absolute differences, not normalized by the mean.8 To understand this distinction, consider an economy with just two household incomes: $1,000 and $10,000. If both incomes double in size then relative inequality will remain the same; the richer household is still 10 times richer. But the absolute difference in their incomes has doubled, from $9,000 to $18,000. Relative inequality is unchanged but absolute inequality has risen sharply.

7 This is shown in Ravallion (1994b) by exploiting the analytic properties of the Lorenz curve (following Gastwirth 1971).

8 In one of the earliest papers on measuring inequality, Dalton (1920) discussed both absolute and relative measures. Kolm (1976) noted the distinction (as discussed later). But the distinction largely vanished from the subsequent literature until it re-emerged in the context of the globalization debate (Ravallion 2003a, 2004).

The choice between absolute and relative inequality measures comes down to whether one accepts the scale invariance axiom for inequality measurement (box 5.3). Recall that this says that multiplying all incomes by a constant leaves the measure of inequality unchanged. However, it should not be forgotten that this is an axiom. We do not have to accept it. You can prefer to say that adding an equal amount to all incomes does not change inequality. (This is sometimes called translation invariance.) If one keeps the mean constant (at some reference value, such as the base year) when calculating the Gini indices over time, then it becomes an absolute Gini index, as distinct from the relative index in box 5.2.
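The two-household example from earlier in this section shows the difference between the two indices concretely. In the sketch below, the function names are assumptions made here, and the absolute Gini index is computed as described above: the mean absolute gap divided by twice a fixed reference mean (here the base-period mean) instead of the current mean.

```python
import math


def mean_absolute_gap(incomes):
    """Mean of |y_i - y_j| over all ordered pairs of incomes."""
    n = len(incomes)
    return sum(abs(a - b) for a in incomes for b in incomes) / n**2


def relative_gini(incomes):
    """Usual Gini index: normalized by the current mean."""
    current_mean = sum(incomes) / len(incomes)
    return mean_absolute_gap(incomes) / (2 * current_mean)


def absolute_gini(incomes, reference_mean):
    """Gini index with the mean held fixed at a reference value."""
    return mean_absolute_gap(incomes) / (2 * reference_mean)


base = [1000, 10000]      # the two household incomes in the text
later = [2000, 20000]     # both incomes doubled
reference_mean = sum(base) / len(base)

print("relative:", relative_gini(base), relative_gini(later))
print("absolute:", absolute_gini(base, reference_mean),
      absolute_gini(later, reference_mean))

# Adding the same amount to every income leaves the absolute index unchanged.
shifted = [y + 500 for y in base]
print("absolute after adding 500 to all:",
      absolute_gini(shifted, reference_mean))
```

The relative index is the same in both periods, while the absolute index doubles along with the income gap; the last line illustrates translation invariance.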

This is no mere academic debate. Perceptions on the ground that “inequality is rising” appear often to be referring to an absolute concept of inequality, as reflected in commonly heard statements such as “the rising gap between the rich and the poor.” And the distinction matters to how one views distributional policies. Serge Kolm (1976) gives the following example. May 1968 saw mass protests by students and workers in France, which eventually led to the Grenelle agreement for a 13% increase in all salaries. However, many of the protesters felt cheated, for in their view this agreement would increase income inequality.9 As we will discuss further in chapter 8, whether one thinks about inequality as absolute or relative matters greatly to the long-standing policy debates about the distribution of the gains from economic growth.

It is not that one concept is “right” and one “wrong.” They simply reflect different value judgments about what constitutes higher “inequality.” And it appears that many people think about inequality in absolute terms. Yoram Amiel and Frank Cowell (1992, 1999) did some simple but clever experiments to identify which concept of inequality is held by people. They found that 40% of the university students they surveyed (in the United Kingdom and Israel) think about inequality in absolute rather than relative terms.10 In 2014, I fielded a subset of the types of questions used by Amiel and Cowell to my class of undergraduates (using a confidential computer-based questionnaire tool); these were students doing a course based on this textbook although the survey was done before we got to the lecture dealing with the axioms of inequality measurement. From the 130 responses, the class was roughly evenly split between those who thought about inequality in relative terms versus those who thought about it in absolute terms. Interestingly, the “absolutists” were a clear majority when the stylized incomes were “low” but the “relativists” became the majority when the incomes were “high.” However, almost all agreed with both the anonymity and transfer axioms.11

5.3 Measuring Poverty

Suppose we now have a measure of individual welfare that has been estimated for each household in a sample. Sometimes we may have a sequence of values of this measure over time for each household. This time profile is used in distinguishing transient from chronic poverty (section 5.4).12 But for now we imagine just one value for each household. How do we aggregate this information into a measure of poverty for each of the distributions being compared? The literature has identified numerous axioms for a desirable measure of poverty. Box 5.6 reviews the main ones.

9 For this reason, Kolm calls the absolute measure the “Leftist measure” while the relative measure is “Rightist.” I leave it to the reader to judge how closely this distinction matches people’s politics.

10 Harrison and Seidl (1994) report similar findings for a large sample of German university students.

11 More so it seems than the students surveyed by Harrison and Seidl (1994).

12 A time profile of welfare measures is also postulated in constructing measures that allow for selective premature mortality (Kanbur and Mukerjee 2007); we return to this point.


Box 5.6 Desirable Properties of a Poverty Measure

The most widely agreed property is called the focus axiom. This says that the measure of poverty should be unaffected by any changes in the incomes (or consumptions) of those who are not deemed to be poor (and stay so after the changes). A concern raised about this axiom is that it assumes that we know with certainty who is poor and who is not.

Another property that is considered desirable is called the monotonicity axiom. This says that, holding all else constant, the measure of poverty must rise if a poor person experiences a drop in her income. This is appealing, but (as we will see) it is not satisfied by the most common measure of poverty. An extension of the last axiom is called subgroup monotonicity. This says that if we partition the population into two groups, each with a fixed size, and poverty increases in one group while remaining unchanged in the other then aggregate poverty must rise. This also seems reasonable; it would certainly be odd to find that after successfully reducing poverty in (say) rural areas, without any loss to poor people in urban areas, and no change in the urban–rural composition of the population, that poverty in the country as a whole has risen. Subgroup monotonicity is satisfied for all additive measures, meaning that aggregate poverty is the arithmetic sum of the individual levels of poverty in the population.

A number of other axioms have been proposed that are similar to inequalitymeasurement (see the discussion in box 5.3). In the context of povertymeasures,scale invariance means that the measure is unchanged when all incomes and thepoverty line increase by the same proportion. (The measure is then said to behomogeneous of degree zero.) Replication invariance requires that the measureis unchanged when we replicate the current population, or pool identical pop-ulations. The transfer axiom for poverty measures says that the measure fallswhenever a given sum of money is transferred from a poor person to someonewho is even poorer (without changing their ranking).

Note on the literature: In some of the literature, measures that satisfy scale invar-iance are referred to as “relative poverty measures,” as distinct from “absolutepoverty measures,” which satisfy instead a translation invariance property, inthat they are invariant to adding the same absolute amount to all incomes andthe poverty line. This is essentially the same distinction we heard about withregard to inequality measures (section 5.2). I shall not use this terminology hereas there is a risk of confusion with the distinction between absolute and relativepoverty lines (chapter 4).

Further reading:A seminal early contributionwasmade by Sen (1976a). Blackorbyand Donaldson (1980) discuss a number of issues, including the scale invari-ance axiom. See Foster and Shorrocks (1991) on subgroup monotonicity. Zheng(1993) lists other axioms. Also see Foster et al. (2013, ch. 2) for an overview ofthe various axioms that have been proposed in the literature.


Poverty Measures

There is now a large literature on poverty measures.13 I will focus solely on additive measures. This is not unduly restrictive, and this class of measures is known to have desirable properties (box 5.6).14 Rather than discuss all of the measures that have been used or proposed, I shall focus on a few representative additive measures and discuss the pros and cons of each. Box 5.7 provides a glossary.

Box 5.7 Glossary of Measures of Poverty

Headcount index (H): The proportion of the population living in households with income per person (or per equivalent single adult) less than or equal to the poverty line. Suppose q people are poor by this definition in a population of size n. Then the headcount index, H, is simply the proportion of the population deemed poor: H = q/n. This satisfies the focus and scale invariance axioms but none of the other axioms in box 5.6.

Poverty gap index (PG): Mean distance below the poverty line as a proportion of the line where the mean is taken over the whole population, counting those above the line as having zero gap. To see how this measure is defined, let consumptions be arranged in ascending order; the poorest has Y1, the next poorest Y2, etc., with the least poor having Yq, which is (by definition) no greater than the poverty line Z. Now define the proportionate poverty gap of person i as (Z – Yi)/Z = 1 – Yi/Z if the person is poor (Yi < Z); if the person is not poor, then the gap is set to zero. PG is then the mean proportionate poverty gap, so defined. This fails the transfer axiom in box 5.6.

Income gap ratio (I): Mean distance below the poverty line as a proportion of the line, among the poor alone. This fails the monotonicity and transfer axioms in box 5.6.

Squared poverty gap index (SPG): As for PG except that the proportionate poverty gaps are weighted by themselves. Thus, to calculate SPG one takes the mean of the squared proportionate poverty gaps, (1 – Yi/Z)^2 for the poor, and zero otherwise. This measure satisfies all the axioms in box 5.6.

The Watts index (W): Mean proportionate gap, measured as the log of the poverty line less the log of income, counting the non-poor as having zero gap. This satisfies all the axioms in box 5.6. It also satisfies a number of other axioms found in the literature (Zheng 1993).

13 For useful surveys, see Foster (1984), Atkinson (1987), and Hagenaars (1987).

14 See Atkinson (1987) and Foster and Shorrocks (1991). The latter’s requirement of subgroup monotonicity (what they call “subgroup consistency”) essentially requires additive measures. The Sen (1976a) index does not qualify.

The simplest (and most common) measure is the headcount index of poverty, given by the proportion of the population for whom the measure of economic welfare Y is no greater than the poverty line Z. This is simply one point on the CDF, namely F(Z), where F is the proportion of the population with income or consumption below Z (or, more precisely, living in households with income per capita less than or equal to Z).

While H has been by far the most popular index, it is not the best. H is easily understood and communicated and for certain sorts of poverty comparisons, such as assessing overall progress in reducing poverty, it may be quite adequate (though preferably always calculated for at least two poverty lines). However, for some purposes, including analyses of the impacts on the poor of specific policies, H has a serious drawback. To see why, suppose that a poor person suddenly becomes very much poorer. What will happen to H? Nothing. The index is totally insensitive to differences in the depth of poverty among the poor.

This can be important when looking at progress against poverty over time, or the impacts of policies on poverty. Each panel in figure 5.1 gives two CDFs. In each case the upper one is (say) before a policy change and the lower one is after that change.

[Figure 5.1 Stylized Representations of Poverty Reduction. Panels: (a) “Rising tide lifts all boats”; (b) “Poorest left behind.” In each panel the horizontal axis is the measure of welfare and the vertical axis is the cumulative % of population; the poverty line and H are marked.]


The impact on H is similar, but the distribution of the gains among the poor is very different, with much larger gains (as measured by the horizontal differences) among those below the poverty line in case (a).

Box 5.8 Different Ways of Defining the Poverty Gap Index

In box 5.7, PG is defined as the mean of the proportionate poverty gaps in the population, where the gap is set to zero for the non-poor. Another way of defining PG is as PG = I.H, where I is the “income gap ratio” and is defined by I = 1 – Mz/Z, where Mz denotes the mean consumption of the poor. Note, however, that the income gap ratio is not a very good poverty measure. To see why, suppose that someone just below the poverty line is made sufficiently better off to escape poverty. The mean of the remaining poor will fall, and so the income gap ratio will increase. And yet one of the poor has become better off, and none is worse off; one would be loath to say that there is not less poverty, and yet that is what the income gap ratio would suggest. This problem does not arise if the income gap ratio is multiplied by the headcount index to yield PG; under the same circumstances, that measure will register a decrease in poverty.

PG also has an interpretation as an indicator of the potential for eliminating poverty by targeting transfers to the poor. The minimum cost of eliminating poverty using targeted transfers is simply the sum of all the poverty gaps in a population; every poverty gap is filled up to the poverty line. The cost would be (Z – Mz).q (recall that q people are poor). Clearly this assumes that the policymaker has a lot of information; one should not be surprised to find that a very “pro-poor” government would need to spend far more than this in the name of poverty reduction. At the other extreme, one can consider the maximum cost of eliminating poverty, assuming that the policymaker knows nothing about who is poor and who is not. Then the policymaker would have to give Z to everyone to be sure that none is poor; the cost is Z.n. The ratio of the minimum cost of eliminating poverty with perfect targeting to the maximum cost with no targeting is simply PG. Thus, this poverty measure is also an indicator of the potential saving to the poverty alleviation budget from targeting. Of course, realizing that potential in practice is a different matter, as we will see in chapter 10.
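The two points made in box 5.8 can be checked on a small example. The sketch below is illustrative only: the five incomes, the poverty line of 10, and the function names are assumptions chosen for the example, not data or code from the text. It shows the income gap ratio rising when a person just below the line escapes poverty while PG falls, and it confirms that the ratio of the perfectly targeted cost to the untargeted cost equals PG.

```python
def headcount(incomes, z):
    """H: share of the population with income no greater than the line z."""
    return sum(1 for y in incomes if y <= z) / len(incomes)

def poverty_gap(incomes, z):
    """PG: mean proportionate gap over everyone (zero gap for the non-poor)."""
    return sum(max(0.0, 1 - y / z) for y in incomes) / len(incomes)

def income_gap_ratio(incomes, z):
    """I = 1 - Mz/Z, where Mz is the mean income of the poor alone."""
    poor = [y for y in incomes if y <= z]
    return 1 - (sum(poor) / len(poor)) / z

# Hypothetical data: the person at 9.5 is lifted to 10.5 and escapes poverty.
z = 10.0
before = [4.0, 6.0, 9.5, 12.0, 20.0]
after = [4.0, 6.0, 10.5, 12.0, 20.0]

i_before, i_after = income_gap_ratio(before, z), income_gap_ratio(after, z)
pg_before, pg_after = poverty_gap(before, z), poverty_gap(after, z)
print("I :", round(i_before, 3), "->", round(i_after, 3))    # I rises
print("PG:", round(pg_before, 3), "->", round(pg_after, 3))  # PG falls

# PG = I.H, and PG = (minimum targeted cost) / (maximum untargeted cost).
min_cost = sum(z - y for y in before if y <= z)  # fill every gap up to z
max_cost = z * len(before)                       # give z to everyone
cost_ratio = min_cost / max_cost
print("I.H:", round(i_before * headcount(before, z), 3),
      "cost ratio:", round(cost_ratio, 3))
```

Nobody is worse off in the second distribution, yet I increases because the mean income of the remaining poor is lower; multiplying by H removes the anomaly.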

A better measure for capturing gains below the line is the poverty gap index (PG), which is simply the mean of the proportionate poverty gaps in the population (box 5.7). Box 5.8 discusses various ways of defining PG, to help understand its properties.

A drawback of the PG is that it may not convincingly capture differences in the severity of poverty among the poor. For example, consider two distributions of consumption for four persons; the A distribution is (1,2,3,4) and the B is (2,2,2,4). For a poverty line Z = 3 (so that H = 0.75 in both cases), A and B have the same value of PG = 0.25. However, the poorest person in A has only half the consumption of the poorest in B. One can think of B as being generated from A by a transfer from the least poor person to the poorest. The poverty gap will be unaffected. In other words, this measure does not satisfy the transfer axiom.
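The A and B example can be reproduced in a few lines. This is a minimal sketch; the function name `fgt` and its signature are assumptions made for illustration. It computes the headcount index, PG, and the squared poverty gap defined in box 5.7 by raising the proportionate gaps to the powers 0, 1, and 2.

```python
def fgt(incomes, z, alpha):
    """Mean over the whole population of (1 - y/z)**alpha for those with y <= z.

    alpha = 0 gives the headcount index, alpha = 1 gives PG, alpha = 2 gives SPG.
    Python evaluates 0.0 ** 0 as 1.0, so a person exactly at the line is
    counted in the headcount, matching the "less than or equal to" definition.
    """
    gaps = [1 - y / z for y in incomes if y <= z]
    return sum(g ** alpha for g in gaps) / len(incomes)

z = 3
dist_a = [1, 2, 3, 4]
dist_b = [2, 2, 2, 4]  # A after a transfer of 1 from the person at 3 to the poorest

for name, dist in (("A", dist_a), ("B", dist_b)):
    print(name,
          "H =", fgt(dist, z, 0),
          "PG =", round(fgt(dist, z, 1), 4),
          "SPG =", round(fgt(dist, z, 2), 4))
```

H and PG are identical for the two distributions (0.75 and 0.25), while the squared poverty gap is 5/36 for A and 1/12 for B, so only the last measure registers that the poorest person is better off in B.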

There are a number of poverty measures in the literature that penalize inequality among the poor and so satisfy the transfer axiom (box 5.6).15 In applied work the bulk of attention has focused on additive measures, meaning that aggregate poverty is equal to the population-weighted sum of poverty levels in the various subgroups of society. There are conceptual and practical advantages to such additivity in the construction of poverty profiles and in testing hypotheses about poverty comparisons; the discussion will return to some of these issues.

The earliest additive measure that penalizes inequality among the poor was proposed by Watts (1968). This is the mean proportionate poverty gap (log of the ratio of poverty line to income), counting the non-poor as having no gap. (Box 5.9 describes the Watts index in more detail.) This index has many desirable features, although it has not been used much and was little known until the mid-1990s.16 The measure is especially attractive when we come to discuss the incidence of the benefits of economic growth later in this chapter and in chapter 8.

The Watts index is a member of a large class of additive distribution-sensitive measures.17 A recent example is the squared poverty gap (SPG) introduced by James Foster, Joel Greer, and Erik Thorbecke (1984). This is similar to PG with the (important) difference that the individual proportionate poverty gaps are weighted by those gaps, giving the squared proportionate gap.18 A proportionate poverty gap of (say) 10% of the poverty line is given a weight of 10% while one of 50% is given a weight of 50% (notice that, in the case of PG, they are weighted equally). Again we take the mean of these squared proportionate gaps across the population (counting the gap as zero for the non-poor).

15 One of the earliest such measures that attempted to do this was proposed by Sen (1976a, 1981a). However, this did not satisfy the transfer axiom, as was pointed out by Thon (1979), although Shorrocks (1995) showed that a simple re-normalization of the Sen index did satisfy the transfer axiom and also assured continuity (so there was no jump in the index when someone crossed the poverty line).

16 Zheng (1993) rediscovered the Watts index and demonstrated its desirable properties in more formal terms.

17 As characterized in formal terms by Atkinson (1987).

18 That is in fact how the authors came up with the index (based on communication with Erik Thorbecke).

Box 5.9 The Watts Index: An Old Measure Nobody Paid Much Attention to Turns Out to Be the Best!

The Watts index was the first poverty measure to penalize inequality among the poor, and it is arguably the best. The index satisfies all the desirable axioms for a poverty measure described in box 5.6, plus other properties that have had advocates in the literature. We can define the Watts proportionate poverty gap of person i as ln(Z/Yi) if the person is poor (Yi < Z); if the person is not poor, then the gap is zero, of course. Note that this is not the same as the proportionate poverty gap (1 – Yi/Z), which is why we shall call ln(Z/Yi) the Watts proportionate poverty gap. Now take the mean of these proportionate poverty gaps in the population. If incomes are ordered such that Yi ≤ Z if and only if i ≤ q, then the Watts index is:

$$W = \frac{1}{n} \sum_{i=1}^{q} \ln(Z/Y_i).$$

(Notice that the headcount index is H = q/n.) If all the incomes of poor people grow at a rate g, then W/g is approximately the average time it takes to exit poverty at the growth rate g (Morduch 1998). If all the incomes in the population grow at the same rate (so that the Lorenz curve remains unchanged), then the elasticity of the Watts index to the mean is –H/W.

Note on the literature: The Watts index did not start to be acknowledged in the literature on the theory of poverty measurement for about twenty-five years after Watts’s (1968) paper. Zheng (1993) drew attention to the many desirable features of the index. It also became important in the later literature on “pro-poor growth,” which we return to in section 5.6.
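The properties stated in box 5.9 can be illustrated numerically. The following is a sketch under assumed inputs: the four incomes, the poverty line of 3.5, and the 2% growth rate are hypothetical, and the function names are invented for the example. It computes W, the implied W/g, and a finite-difference check that the elasticity of W with respect to the mean is close to –H/W when all incomes grow by the same small proportion and nobody crosses the line.

```python
import math

def watts(incomes, z):
    """Watts index: mean of ln(z/y) over the population, zero for the non-poor."""
    return sum(math.log(z / y) for y in incomes if y < z) / len(incomes)

def headcount(incomes, z):
    return sum(1 for y in incomes if y <= z) / len(incomes)

z = 3.5
incomes = [1.0, 2.0, 3.0, 4.0]  # hypothetical data; three of four are poor

w = watts(incomes, z)
h = headcount(incomes, z)

# Morduch's interpretation: W/g approximates the average exit time at growth rate g.
g = 0.02
exit_time = w / g
print("W =", round(w, 4), " W/g =", round(exit_time, 1))

# Distribution-neutral growth: scale every income by (1 + eps) and compare
# the proportionate change in W with the proportionate change in the mean.
eps = 1e-6
scaled = [y * (1 + eps) for y in incomes]
elasticity = ((watts(scaled, z) - w) / w) / eps
print("numerical elasticity:", round(elasticity, 4), " -H/W:", round(-h / w, 4))
```

The check relies on the growth step being small enough that the set of poor people does not change; with a larger step the headcount itself would move and the local formula would no longer apply.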

One drawback of the distribution-sensitive measures is that they are not as easy to interpret as PG or (especially) H.19 For poverty comparisons, however, the key point is that a ranking of dates, places, or policies in terms of SPG should reflect well their ranking in terms of the severity of poverty. It is the ability of the measure to order distributions in a better way than the alternatives that makes it useful, rather than the precise numbers obtained.

On comparing the above formulae for H, PG, and SPG, a common structure is evident. This suggests a generic class of measures in which the proportionate poverty gaps are raised to the power α, which is a non-negative parameter. This is the Foster-Greer-Thorbecke (FGT) class of poverty measures, for which we can use the generic symbol Pα. When α = 0, we get the measure P0 = H; when α = 1, we get P1 = PG; while α = 2 gives us P2 = SPG. For all α > 0, the individual poverty measure in the FGT index is strictly decreasing in the living standard of the poor (the lower your standard of living, the poorer you are deemed to be). Furthermore, for α > 1, it also has the property that the increase in your measured poverty due to a fall in your standard of living will be deemed greater the poorer you are.20 The FGT measure is then said to penalize inequality among the poor whenever α > 1. One can say that the index is “strictly convex” in incomes (“weakly convex” for α = 1). In the limit, as α goes to infinity, the index reflects only the lowest level of income observed in the data.

[Figure 5.2 Individual Poverty Measures for Various Values of the Inequality-Aversion Parameter (α) in the FGT Index. Horizontal axis: “income” (Y), with the poverty line Z marked; vertical axis: individual poverty measure; curves are shown for α = 0, α = 1, and α = 2.]

19 The measure can be thought of as the sum of two components: an amount due to the poverty gap and an amount due to inequality among the poor. More precisely, let CVp^2 denote the squared coefficient of variation of consumption among the poor. Then SPG = I.PG + (1 – I)(H – PG)CVp^2.

20 For a complete axiomatic characterization of the FGT class of poverty measures, see Foster and Shorrocks (1991, proposition 7).
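For reference, the FGT class and the decomposition of SPG stated in footnote 19 can be written out as follows. This is a worked restatement rather than a quotation: the symbols follow box 5.7 (n people, q of them poor, poverty line Z, income gap ratio I), while the notation for the mean and variance of consumption among the poor is introduced here as an assumption, with the variance taken over the q poor people.

```latex
% FGT class in the notation of box 5.7
\[
P_\alpha = \frac{1}{n} \sum_{i=1}^{q} \left(1 - \frac{Y_i}{Z}\right)^{\alpha},
\qquad \alpha \ge 0, \qquad P_0 = H, \quad P_1 = PG, \quad P_2 = SPG .
\]

% Decomposition of SPG (footnote 19).
% M_z and \sigma_p^2 denote the mean and variance of consumption among the poor,
% so that I = 1 - M_z/Z and CV_p^2 = \sigma_p^2 / M_z^2.
\begin{align*}
\frac{1}{q} \sum_{i=1}^{q} \left(1 - \frac{Y_i}{Z}\right)^{2}
  &= I^2 + \frac{\sigma_p^2}{Z^2}
   = I^2 + (1 - I)^2 \, CV_p^2 , \\
SPG
  &= H \left[ I^2 + (1 - I)^2 \, CV_p^2 \right]
   = I \cdot PG + (1 - I)(H - PG) \, CV_p^2 .
\end{align*}
```

The first line of the decomposition uses the fact that the mean of a squared variable equals its squared mean plus its variance; the second uses PG = I.H and H – PG = H(1 – I).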

Figure 5.2 shows how the relationship between individual poverty and the economic-welfare metric varies across the different values of α. The higher the value of α, the more sensitive the measure is to the well-being of the poorest person; as α approaches infinity it collapses to a measure which only reflects the poverty of the poorest person. Figure 5.2 also illustrates another conceptual attraction of SPG, namely, that the individual poverty measure hits zero smoothly at the poverty line; thus there is negligible difference in the weight the measure attaches to someone just above the poverty line versus someone just below it.21 Given the aforementioned concerns about introducing discontinuities in the individual poverty measure and the uncertainties in measuring living standards discussed above, this is a desirable property.22

Does it really matter which of these measures one uses? Intuitively, the answer depends on whether, and how, relative inequalities in the society have changed. If all consumption levels (poor and non-poor) have changed by the same proportion (sometimes called a “distribution neutral” growth or contraction), then all of these poverty measures will yield the same ranking in the poverty comparison, and the ranking in terms of absolute poverty will depend solely on the direction of change in the mean of the distribution.

21 This relates to a long-standing issue of whether poverty is best viewed as a discrete or continuous phenomenon. For further discussion, see Atkinson (1987). This measurement issue has been found to be important in the context of analyzing the effects of risk on poverty (Ravallion 1988a) and in characterizing optimal poverty reduction schemes (Bourguignon and Fields 1990; Ravallion 1991b).

22 It is not shared by some other distributionally sensitive poverty measures, including Sen (1976a).

However, the differences between these measures can become quite pronounced otherwise. Consider, for example, two policies: Policy A entails a small redistribution from people around the mode of the distribution, which is also where the poverty line happens to be located, to the poorest households. (This is actually a fair characterization of how a reduction in the prices of domestically produced food staples would affect the distribution of welfare in some Asian countries.) Policy B entails the opposite change: the poorest lose, while those at the mode gain. (An increase in food staple prices in the above example.) A moment’s reflection will confirm that the headcount index H will prefer policy B; HA > HB, since changes in H depend solely on which direction people are crossing the poverty line. However, a measure such as SPG will indicate the opposite ranking, SPGA < SPGB, since it will respond relatively more to the gains among the poorest than among the not-so-poor.
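The reversal of rankings between H and SPG can be seen in a stylized numerical example. Everything in this sketch is hypothetical: the six baseline incomes, the poverty line of 3, and the size of the transfers are invented to mimic the two policies, with total income held constant in both cases.

```python
def fgt(incomes, z, alpha):
    """FGT measure: mean over everyone of (1 - y/z)**alpha for those with y <= z."""
    gaps = [1 - y / z for y in incomes if y <= z]
    return sum(g ** alpha for g in gaps) / len(incomes)

z = 3.0
baseline = [1.0, 2.0, 3.1, 3.1, 5.0, 8.0]  # two people sit just above the line

# Policy A: 0.2 is taken from each person near the line and given to the poorest.
policy_a = [1.4, 2.0, 2.9, 2.9, 5.0, 8.0]
# Policy B: the opposite change; the poorest loses 0.4, those near the line gain 0.2 each.
policy_b = [0.6, 2.0, 3.3, 3.3, 5.0, 8.0]

h_a, h_b = fgt(policy_a, z, 0), fgt(policy_b, z, 0)
spg_a, spg_b = fgt(policy_a, z, 2), fgt(policy_b, z, 2)

print("H  : A =", round(h_a, 3), " B =", round(h_b, 3))
print("SPG: A =", round(spg_a, 4), " B =", round(spg_b, 4))
```

Under policy A two people are pushed just below the line, so H is higher than under policy B, yet SPG is lower because the gain goes to the person with the largest gap.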

The need to examine higher order poverty measures, such as PG and SPG, also depends on whether or not the poverty comparison in terms of the headcount index has considered more than one poverty line, as recommended in the previous section. If only one poverty line is used, then it should be considered imperative, in my view, to check the higher order measures. But values of H for one or two extra poverty lines can often provide an adequate substitute. If, for a given poverty line, the higher order measure gives a different result to the headcount index, then this will also hold for an alternative headcount index based on a sufficiently low poverty line.

There is another concern about the standard poverty measures discussed above. As we will learn in chapter 7, there is selective mortality in that poorer people are less likely to survive. When a poor person dies, standard measures will show a decline in poverty. (This is obvious for the headcount index, but it also holds for the higher order measures reviewed above.) Similarly, higher fertility rates for poor families will tend to increase the poverty rate in a purely mechanical way. This is an instance of a generic problem in welfare economics when one uses any form of average welfare (such as per capita income or per capita utility) in assessing social progress with varying populations: the demise of anyone below the average will increase the average. That is not ethically acceptable. Given this problem, we need supplementary measures on mortality, in case this is why we are seeing falling poverty measures.

The standard measures of poverty can also be modified to better reflect our judgments on this matter. Ravi Kanbur and Diganta Mukerjee (2007) propose an intriguing solution based on the idea of a normative lifetime, L, such that poverty is measured using the time profile of incomes for all those born L years ago, whether or not they are still alive today. An income must be imputed for the years not living; a seemingly natural assumption to make is that all those now dead have a lower imputed income than when they were alive. (Newly rich vampires are ruled out!) Having established this time profile the measurement problem proceeds similarly to before. For example, one can derive a modified version of the FGT index.23

The Consumption Floor

This refers to the typical level of living of the poorest in a given society. We can think of this as the lower bound of permanent consumption (recalling box 3.11). Human physiology makes it unlikely that consumption levels below some critical (positive) value can be sustained for more than a fairly short time period. This is the biological floor. Social and political factors may also come into play to influence the level of the consumption floor, which can thus rise above the biological floor in a specific society.

23 Kanbur and Mukerjee (2007) show how this is done and address a number of implementation issues in making this approach operational.


The idea of the consumption floor has been around since at least the time of the first economists. Early ideas of the “subsistence wage” can be interpreted as the wage rate required to assure that the consumption floor is reached for a typical family. The classical economists identified the consumption floor as the point at which the population is constant; any temporary increase (decrease) in consumption in a neighborhood of the floor would induce population growth (contraction). The idea of a floor has been a key feature of development models for dualistic economies, such as in the original model of Lewis (1954), which we return to in chapter 8. The idea of a consumption floor has often been built into demand models, famously in Engel’s Law (box 1.16). It has sometimes been incorporated in modern economic models of the growth process, which we also return to in chapter 8.24 And the idea of a consumption floor is found in discussions of the problem of determining the optimal population size.25 When living close to the consumption floor, the prospects for investment and sustained growth will naturally be limited.

This is quite a different concept to a poverty line, which is not typically intended to be a biological floor, in the sense that nobody lives below that level for any sustained period of time. Naturally, any poverty line aims instead to reflect what “poverty” means in a specific society, on the understanding that (potentially many) people live below that level. The poverty line is invariably above the biological floor.

Indeed, the idea that we should judge progress in part at least by success in raising the floor is missing from virtually all standard poverty measures. Raising the floor automatically reduces, ceteris paribus, any measure of poverty satisfying the monotonicity axiom (box 5.6). However, none of the standard axioms of poverty measurement attach any explicit value to the level of the floor. This appears to be due in large part to the difficulties in identifying the floor.26

Nor does the fact that an overall poverty measure is falling tell us that the poorest are doing better, that is, that society’s consumption floor is rising. The floor may stay put, even though fewer people are living at or near it. This is illustrated in figure 5.3. The decline in the poverty rate is similar but in panel (a) the level of living of the poorest is unchanged.

As we saw in chapter 2, an important school of modern political philosophy has argued that we should judge a society’s progress by its ability to enhance the welfare of the poorest, following the principles of justice proposed by Rawls (1971). This has been proposed as a principle for judging development success. For example, in what came to be known as his “talisman,” Mahatma Gandhi (1958, 65) wrote: “recall the face of the poorest and weakest person you have seen and ask if the step you contemplate is going to be any use to them.” Watkins (2013, 1) refers explicitly to Gandhi’s talisman and argues that “as a guide to international cooperation on development, that’s tough to top.”

Quantifying the consumption floor is not easy, however. With a sound sampling design and large enough sample, we can be confident about an estimate of the overall mean, but it is far from clear how reliably we could estimate the consumption floor.

24 See, e.g., Azariadis (1996), Ben-David (1998), and Kraay and Raddatz (2007).

25 See Dasgupta (1993, ch. 13).

26 See, e.g., Freiman’s (2012) comments on Rawls’s maximin principle.


[Figure 5.3 Same Reduction in the Poverty Count but Different Implications for the Poorest. Panels: (a) Poorest left behind; (b) Same reduction in the incidence of poverty but without leaving the poorest behind. In each panel the horizontal axis is the measure of welfare and the vertical axis is the cumulative % of population; the poverty line is marked.]

If we were to know the true consumption and any other relevant aspects of welfare, we could estimate the floor directly from a sufficiently large sample. However, we must recognize the existence of measurement errors (chapter 3). There are also likely to be transient effects in the data, whereby observed consumption in a survey falls temporarily below the floor (such as due to illness), but recovers soon after the survey is done. Given the measurement errors, there is a non-negligible chance that anyone within some stratum of low observed consumption levels is in fact at the floor.

We can postulate a probability that a person with a given observed consumption is in fact living at the floor. These probabilities are not data, of course. But there are some defensible assumptions we can make in lieu of the missing data. It is reasonable to assume that the probability of being the poorest person is highest for the person who appears to be worst off in our data. It is also reasonable to assume that the probability declines as the person’s observed measure of welfare rises. Beyond some point there can be no chance that person is the worst off. Box 5.10 explains this idea in more detail and proposes a specific functional form, based on a member of the class of Beta distributions in statistics, which establishes a useful link with the FGT class of poverty measures. Thus, we can readily apply existing poverty measures to the task of implementing a Rawlsian approach to assessing social progress, although the focus switches to ratios of FGT measures.

* Box 5.10 An Approach to Estimating the Expected Value of the Consumption Floor

Let the consumption floor be denoted ymin. This is the lowest level of permanent consumption in a population. However, it is not observed given transient effects and measurement errors. Our data comprise n observed consumptions, y. We can treat the consumption floor as a random variable, meaning that it has a probability distribution given the data. The task is to estimate the mean of that distribution based on the observed consumptions. We can write this as:

$$E(y_{\min} \mid y) = \sum_{i} \phi(y_i)\, y_i.$$

Here the probability that person i, with the observed yi, is in fact the worst off person is φ(yi). For example, if we are certain that the person with the lowest observed y also has the lowest permanent consumption, then this formula returns that value. More generally we cannot be sure that the person with the lowest y is in fact the worst off person (as discussed in the text). However, it may be reasonable to trust our data sufficiently to believe that she has the highest probability of being the worst off. The probability then falls as observed consumption rises, until it reaches zero at some level z.

A specific functional form satisfying these assumptions is:

$$\phi(y_i) = \begin{cases} k(1 - y_i/z)^{\alpha} & \text{for } y_i \le z \\ 0 & \text{for } y_i > z. \end{cases}$$

Here there are three parameters, k, α, and z, all positive constants. The k parameter assures that the probabilities add up to unity, which requires that k = 1/(nPα), where Pα is the FGT measure (with y’s rank-ordered starting from the lowest):

$$P_{\alpha} = \frac{1}{n} \sum_{i=1}^{q} (1 - y_i/z)^{\alpha}.$$

However, in contrast to the FGT poverty measure, now the parameter α determines how fast the chance of being the poorest person falls as y increases rather than the degree of aversion to inequality among the poor, as in the FGT index. Rather than draw this, we can use figure 5.2 and redefine the vertical axis as the probability of being the poorest person. The third parameter, z, is not a standard poverty line; rather it is the point above which we no longer think there is any chance that a person is the worst off.

We can then derive the following formula for the expected value of the consumption floor:

$$E(y_{\min} \mid y) = z(1 - P_{\alpha+1}/P_{\alpha}).$$

For example, if we assume that the probability of being the worst off person falls linearly with y up to z, then the expected value of the floor is z(1 – SPG/PG). Note that α = 0 can be ruled out; the probability must fall as y increases. To put the point another way, if one uses α = 0, then every consumption below z is equally likely to be the lowest, so z(1 – PG/H) is the mean consumption of the poor.


With falling poverty measures over time, the expected value of the consumption floor will only rise if the proportionate rate of decline in the Pα+1 measure exceeds that for Pα. Intuitively, a rising floor requires faster progress against the more distribution-sensitive FGT measure.

Further reading: Ravallion (2014 f) goes into more detail and provides an application to data for the developing world. We review the results in chapter 7.
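The formula in box 5.10 can be checked against its definition. The sketch below is illustrative: the five observed consumptions, the cut-off z = 4, and the function names are assumptions, not taken from the text. It computes the expected floor both as the probability-weighted sum of observed consumptions and as z(1 – Pα+1/Pα), and confirms that α = 0 returns the mean consumption of those at or below z.

```python
def fgt(values, z, alpha):
    """FGT measure with cut-off z: mean over everyone of (1 - y/z)**alpha for y <= z."""
    gaps = [1 - y / z for y in values if y <= z]
    return sum(g ** alpha for g in gaps) / len(values)

def expected_floor_direct(values, z, alpha):
    """Sum of phi(y_i) * y_i with phi(y_i) = k * (1 - y_i/z)**alpha for y_i <= z."""
    weights = [(1 - y / z) ** alpha if y <= z else 0.0 for y in values]
    k = 1 / sum(weights)  # makes the probabilities add up to one
    return sum(k * w * y for w, y in zip(weights, values))

def expected_floor_fgt(values, z, alpha):
    """Closed form from the box: z * (1 - P_{alpha+1} / P_alpha)."""
    return z * (1 - fgt(values, z, alpha + 1) / fgt(values, z, alpha))

y_obs = [1.0, 2.0, 3.0, 4.0, 6.0]  # hypothetical observed consumptions
z = 4.0                            # no chance of being the worst off above this level

floor_direct = expected_floor_direct(y_obs, z, alpha=1)
floor_fgt = expected_floor_fgt(y_obs, z, alpha=1)   # equals z * (1 - SPG/PG)
floor_alpha0 = expected_floor_fgt(y_obs, z, alpha=0)

print("alpha = 1:", round(floor_direct, 4), round(floor_fgt, 4))
print("alpha = 0:", round(floor_alpha0, 4))
```

With linearly declining probabilities (α = 1) the expected floor lies above the lowest observation of 1.0 but well below the mean of those under z, which reflects the weight given to the chance that the lowest observation is partly measurement error.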

Estimation Issues

Two distinct types of data are encountered in practice: household-level (sometimes called “unit record”) data and tabulated grouped data derived from the household-level data. Unit record data are typically only available in a machine-readable form, while grouped data are often found in governmental statistical publications. Quite different problems are encountered in estimating poverty measures on these two types of data.

All of the additive poverty measures above can be readily and accurately calculated as the means of the corresponding individual poverty measures when one has access to unit record data. The main points to be aware of are:

(1) It should not be presumed that estimates of poverty measures from unit record data are more accurate than those from grouped data, since the latter can “average out” errors in the unit record data, such as negative consumption figures, which might otherwise add a sizable bias to estimates of the severity of poverty.

(2) Most large household surveys use stratified samples, whereby the chance of being selected in the sample is not uniform over the population. This is often done to assure adequate sample sizes in certain regions. Estimates of population parameters from a stratified sample are unbiased, if weighted by the appropriate inverse sampling rates (box 3.6). Provided your data set includes the sampling rate for each household or area, it is easy to do so.27

(3) One should be clear on whether one wants to estimate poverty among households or poverty among people. For example, suppose one ranked households by consumption per person, but measured poverty in terms of the proportion of households who are below the poverty line. Household size tends to be negatively correlated with consumption per person, so your calculation will tend to underestimate the number of persons who are living in poor households (though it will not necessarily underestimate the number of poor persons; that also depends on the distribution within the household, which is typically unknown).
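Points (2) and (3) can be illustrated with a small weighted calculation. This is a sketch under stated assumptions: the five households, their sampling rates, and the field names are hypothetical, and consumption is assumed to be equally shared within each household. Each household is weighted by the inverse of its sampling rate, and the person-level estimate additionally multiplies that weight by household size.

```python
# Hypothetical unit record data: consumption per person, household size,
# and the probability with which the household was selected into the sample.
households = [
    {"cons_per_person": 2.0, "size": 6, "sampling_rate": 0.01},
    {"cons_per_person": 2.5, "size": 5, "sampling_rate": 0.02},
    {"cons_per_person": 4.0, "size": 3, "sampling_rate": 0.01},
    {"cons_per_person": 6.0, "size": 2, "sampling_rate": 0.02},
    {"cons_per_person": 9.0, "size": 1, "sampling_rate": 0.02},
]
z = 3.0  # poverty line in the same units as consumption per person

def weighted_headcount(households, z, by_person):
    """Headcount index using inverse sampling rates as weights.

    With by_person=True the weight is also multiplied by household size,
    so the result is the share of persons living in poor households.
    """
    poor_weight = 0.0
    total_weight = 0.0
    for hh in households:
        weight = 1.0 / hh["sampling_rate"]
        if by_person:
            weight *= hh["size"]
        total_weight += weight
        if hh["cons_per_person"] <= z:
            poor_weight += weight
    return poor_weight / total_weight

h_households = weighted_headcount(households, z, by_person=False)
h_persons = weighted_headcount(households, z, by_person=True)
print("share of households that are poor:", round(h_households, 3))
print("share of persons in poor households:", round(h_persons, 3))
```

Because the poor households in this example are also the largest, the household-based figure is well below the person-based one, which is the direction of bias described in point (3).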

The most defensible position on this last issue is to recognize that poverty is experienced by individuals, and not by households per se, and so it is poverty among persons that we are trying to measure. Although we may not know anything about distribution within the household, that does not mean that we should only measure poverty among households. A common practice is to assume an equal distribution within households when constructing the estimated distribution of individual consumptions. This may well lead one to underestimate poverty among persons, and the magnitude need not be negligible.28 However, it is not clear what would be a better assumption. Further research using individual consumption data, when available, may be able to throw light on best practice when such data are not available.

27 For further discussion, see, e.g., Levy and Lemeshow (1991).
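The distinction between poverty among households and poverty among persons, together with the weighting by inverse sampling rates, can be illustrated with a minimal sketch. The household records, the poverty line and the function name are hypothetical, and consumption is assumed to be equally distributed within each household, as in the common practice described above.

```python
def headcount(poor_flags, weights):
    """Weighted share of units flagged as poor."""
    total = sum(weights)
    return sum(w for flag, w in zip(poor_flags, weights) if flag) / total

# Hypothetical records: (consumption per person, household size, inverse sampling rate)
households = [
    (40.0, 8, 100.0),
    (55.0, 6, 100.0),
    (90.0, 4, 250.0),
    (120.0, 3, 250.0),
    (200.0, 2, 250.0),
]
z = 60.0  # hypothetical poverty line

poor = [c < z for c, _, _ in households]

# Poverty among households: weight each household by its inverse sampling rate.
hh_weights = [w for _, _, w in households]
h_households = headcount(poor, hh_weights)

# Poverty among persons: weight by household size times the inverse sampling rate
# (assumes every member consumes the household's per-person average).
person_weights = [size * w for _, size, w in households]
h_persons = headcount(poor, person_weights)

print(f"Headcount among households: {h_households:.3f}")
print(f"Headcount among persons:    {h_persons:.3f}")
```

Because the poorer households in this made-up sample are also the larger ones, the household-based figure is lower than the person-based one, which is the direction of bias discussed in point (3).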

Sometimes in practice one only has grouped data, such as income shares of households ranked in deciles, or income frequency distributions. (And even household-level data can be interpreted as grouped individual data.) Poverty lines rarely occur at the boundaries of the grouped data. So we must find some way of interpolating between those boundaries. Linear interpolation is the easiest method, but it can be quite inaccurate, particularly when the poverty line is far from the mode of the distribution (such as when it is found in the typically quite nonlinear lower region of the CDF).29 Quadratic interpolation is usually feasible with the same data, and is generally more accurate, though one must check that the probability density (the slope of the cumulative frequency distribution) implied by this method does not become negative. Another method of interpolation which can be very accurate, and is also useful in certain policy simulations, involves estimation of a parameterized Lorenz curve. This is a precise mathematical model of the Lorenz curve, and a number of specifications have been proposed in the literature. The accuracy depends greatly on the particular specification used, and some tend to dominate others on many data sets.30
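As a rough illustration of the simplest method mentioned above, linear interpolation of the headcount index from a grouped distribution might look as follows. The class boundaries, cumulative shares and poverty line are hypothetical, and the function name is an assumption of this sketch; it is not the Lorenz curve approach, which requires fitting a parametric model.

```python
def headcount_linear(bounds, cum_shares, z):
    """Linearly interpolate the cumulative population share at poverty line z.

    bounds: ascending class boundaries of the grouped distribution.
    cum_shares: cumulative population share at each boundary.
    """
    if z <= bounds[0]:
        return cum_shares[0]
    if z >= bounds[-1]:
        return cum_shares[-1]
    for i in range(1, len(bounds)):
        if z <= bounds[i]:
            lo, hi = bounds[i - 1], bounds[i]
            frac = (z - lo) / (hi - lo)  # position of z within the class interval
            return cum_shares[i - 1] + frac * (cum_shares[i] - cum_shares[i - 1])

# Hypothetical grouped data: 10% of people below 50, 40% below 100, and so on.
bounds = [0, 50, 100, 200, 400]
cum_shares = [0.0, 0.10, 0.40, 0.80, 1.0]

h = headcount_linear(bounds, cum_shares, z=75)
print(f"Interpolated headcount index: {h:.3f}")
```

The sketch treats the density as constant within each class interval, which is exactly the assumption that becomes unreliable in the curved lower tail of the distribution.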

Hypothesis Testing

Testing hypotheses about differences in poverty between two situations is not difficult for additive poverty measures when the poverty line is treated as fixed (i.e., measured without error). Recall that additive measures can be calculated as the sample mean of an appropriately defined individual poverty measure. For random samples, the standard error can then also be readily calculated.31 This allows us to test hypotheses about poverty, such as whether it is significantly higher in one subgroup than in another.

28 See, e.g., Haddad and Kanbur (1990).

29 For example, I have come across one estimate of the headcount index for a country (which can remain unnamed) that was obtained by linear interpolation in the lowest class interval of the grouped distribution; the estimate was 9.5%. However, when one re-estimated with a model of the Lorenz curve (based on a Beta specification; see below) that took account of the nonlinearity, one obtained a figure of 0.5%! This is an extreme case, though large errors are likely from linear interpolation at the lower end of any grouped distribution.

30 Two specifications that have been found to work well in practice are the generalized quadratic model and the Beta model; see Villasenor and Arnold (1989) and Kakwani (1980a), respectively. Formulae for poverty measures as functions of the Lorenz curve parameters and the mean of the distribution are found in Datt and Ravallion (1992).

31 The key result from statistics being used here is called the Central Limit Theorem. Let a samplemean M of any variable be calculated from a random sample of size N and let μ denote the true value


For the headcount index, the standard error can be calculated the same way as for any population proportion.32 These methods can be extended to other additive poverty measures. Nanak Kakwani (1990) has derived formulae for the standard errors of a number of other additive measures, including the FGT measures.33
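A minimal sketch of these calculations follows, using the simple-random-sample formula for the standard error of an FGT measure, sqrt((P_2α − P_α²)/N), which reduces to sqrt(H(1 − H)/N) when α = 0. The consumption data, the poverty line and the function names are hypothetical. The two-sample z statistic is a standard textbook device assumed here for illustration, valid only for independent simple random samples; it ignores clustering, stratification and weights.

```python
import math

def fgt(consumption, z, alpha):
    """FGT measure P_alpha: mean of ((z - y) / z) ** alpha, counting zero for the non-poor."""
    n = len(consumption)
    return sum(((z - y) / z) ** alpha for y in consumption if y < z) / n

def fgt_se(consumption, z, alpha):
    """Standard error of P_alpha under simple random sampling."""
    n = len(consumption)
    p_a = fgt(consumption, z, alpha)
    p_2a = fgt(consumption, z, 2 * alpha)
    return math.sqrt((p_2a - p_a ** 2) / n)

# Hypothetical consumption data for two subgroups and a hypothetical poverty line.
group_a = [30, 45, 50, 70, 80, 95, 110, 150]
group_b = [20, 35, 40, 50, 55, 58, 90, 130]
z_line = 60

h_a, se_a = fgt(group_a, z_line, 0), fgt_se(group_a, z_line, 0)
h_b, se_b = fgt(group_b, z_line, 0), fgt_se(group_b, z_line, 0)

# z statistic for the difference in headcount indices (independent samples assumed).
z_stat = (h_b - h_a) / math.sqrt(se_a ** 2 + se_b ** 2)
print(f"H_a = {h_a:.3f} (s.e. {se_a:.3f}), H_b = {h_b:.3f} (s.e. {se_b:.3f}), z = {z_stat:.2f}")
```

With samples this small the normal approximation is poor; the example only shows the mechanics.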

Summary

Where does this tour of the extensive literature on poverty measurement leave us? The additive and smooth distribution sensitive poverty measures that have been proposed (such as the Watts index and SPG) have considerable theoretical appeal. Nonetheless, the “lower order” measures, the headcount and poverty gap indices, are sure to remain popular if only because they are much easier to interpret. As a rule, the headcount index also tends to be less sensitive to some common forms of measurement error at the bottom of the observed distribution. It is important to know whether a poverty comparison is sensitive to the choice of poverty measure, and not just because of the uncertainties involved in that choice; differences in rankings by different measures can also tell us something about the precise way in which the distribution of living standards has changed. Section 5.5 will describe some analytical tools that can help assess sensitivity to the choice of poverty measure.

5.4 Decompositions of Poverty Measures

Decompositions can be useful tools in poverty analysis. This section first discusses how a single aggregate poverty number can be decomposed to form a poverty profile. It then looks at two useful ways of decomposing changes in poverty over time.

Poverty Profiles

A “poverty profile” is simply a special case of a poverty comparison, showing how poverty varies across subgroups of society, such as region of residence or sector

of the mean. Then it can be shown that $(M - \mu)\sqrt{N}$ approaches a normal distribution with mean zero as N increases.

32 Like any population share, H has a binomial distribution in random samples, approaching a normal distribution as sample size increases. Thus the standard error (the standard deviation of the sampling distribution of the headcount index) is given by $\sqrt{H(1 - H)/N}$ for a sample of size N. For all but very small sample sizes (less than 5), a useful rule-of-thumb is that the approximation involved in using the normal distribution will be accurate as long as the absolute value of $\sqrt{(1 - H)/H} - \sqrt{H/(1 - H)}$ does not exceed $0.3\sqrt{N}$ (Box et al. 1978).

33 The standard error of the FGT measure ($P_\alpha$) is $\sqrt{(P_{2\alpha} - P_\alpha^2)/N}$, which yields the aforementioned standard error for the headcount index as the special case when $\alpha = 0$. This formula does not take account of survey design; typically there will be a degree of clustering in survey design that will raise standard errors above those in simple random samples; and there will be a degree of stratification that will lower them. Also one will often need to weight the observations, such as by household size. For formulae for the standard errors that deal with these common features of data, see Howes and Lanjouw (1997). Also see Preston (1992).

Household Sample Surveys in Developing and Transition Countries

335

Chapter XVI Presenting simple descriptive statistics from household survey data

Paul Glewwe Department of Applied Economics

University of Minnesota St. Paul, Minnesota, United States of America

Michael Levin United States Bureau of the Census

Washington, D.C., United States of America

Abstract

The present chapter provides general guidelines for calculating and displaying basic descriptive statistics for household survey data. The analysis is basic in the sense that it consists of the presentation of relatively simple tables and graphs that are easily understandable by a wide audience. The chapter also provides advice on how to put the tables and graphs into a general report intended for widespread dissemination. Key terms: descriptive statistics, tables, graphs, statistical abstract, dissemination.


A. Introduction

1. The true value of household survey data is realized only when the data are analysed. Data analysis ranges from analyses encompassing very simple summary statistics to extremely complex multivariate analyses. The present chapter serves as an introduction to the next four chapters and, as such, it will focus on basic issues and relatively simple methods. More complex material is presented in the four chapters that follow.

2. Most household survey data can be used in a wide variety of ways to shed light on the phenomena that are the main focus of the survey. In one sense, the starting point for data analysis is basic descriptive statistics such as tables of the means and frequencies of the main variables of interest. Yet, the most fundamental starting point for data analysis lies in the questions that the data were collected to answer. Thus, in almost any household survey, the first task is to set the goals of the survey, and to design the survey questionnaire so that the data collected are suitable for achieving those goals. This implies that survey design and planning for data analysis should be carried out simultaneously before any data are collected. This is explained in more detail in chapter III. The present chapter will focus on many practical aspects of data analysis, assuming that a sensible strategy for data analysis has already been developed following the advice given in chapter III.

3. The organization of this chapter is as follows. Section B reviews types of variables and simple descriptive statistics; section C provides general advice on how to prepare and present basic descriptive statistics from household survey data; and section D makes recommendations on how to prepare a general report (often called a statistical abstract) that disseminates basic results from a household survey to a wide audience. The brief final section offers some concluding remarks.

B. Variables and descriptive statistics

4. Many household surveys collect data on a particular topic or theme, while others collect data on a wide variety of topics. In either case, the data collected can be thought of as a collection of variables, some of which are of interest in isolation, while others are primarily of interest when compared with other variables. Many of the variables will vary at the level of the household, such as the type of dwelling, while others may vary at the level of the individual, such as age and marital status. Some surveys may collect data that vary only at the community level; an example of this is the prices of various goods sold in the local market.27

5. The first step in any data analysis is to generate a data set that has all the variables of interest in it. Data analysts can then calculate basic descriptive statistics that let the variables "speak for themselves". There are a relatively small number of methods of doing so. The present section explains how this is done. It begins with a brief discussion of the different kinds of variables and descriptive statistics, and then discusses methods for presenting data on a single variable, methods for two variables, and methods for three or more variables.

27 In most household surveys, the household is defined as a group of individuals who: (a) live in the same dwelling; (b) eat at least one meal together each day; and (c) pool income and other resources for the purchase of goods and services. Some household surveys modify this definition to accommodate local circumstances, but this issue is beyond the scope of this chapter. "Community" is more difficult to define, but for the purposes of this chapter, it can be thought of as a collection of households that live in the same village, town or section of a city. See Frankenberg (2000) for a detailed discussion of the definition of "community".

1. Types of variables

6. Household surveys collect data on two types of variables, "categorical" variables and "numerical" variables. Categorical variables are characteristics that are not numbers per se, but categories or types. Examples of categorical variables are dwelling characteristics (floor covering, wall material, type of toilet, etc.), and individual characteristics such as ethnic group, marital status and occupation. In practice, one could assign code numbers to these characteristics, designating one ethnic group as "code 1", another as "code 2", and so on, but this is an arbitrary convention. In contrast, numerical variables are by their very nature numbers. Examples of numerical variables are the number of rooms in a dwelling, the amount of land owned, or the income of a particular household member. Throughout this chapter, the different possible outcomes for categorical variables will be referred to as "categories", while the different possible outcomes for numerical variables will be referred to as "values".

7. When presenting data for either type of variable, it is useful to make another distinction, regarding the number of categories or values that a variable can take. If the number of categories/values is small, say, less than 10, then it is convenient (and informative) to display complete information on the distribution of the variable. However, if the number of values/categories is large, say, more than 10, it is usually best to display only aggregated or summary statistics concerning the distribution of the variable. An example will make this point clear. In one country, the population may consist of a small number of ethnic groups, perhaps only four. For such a country, it is relatively easy to show in a simple table or graph the percentage of the sampled households that belong to each group. Yet, in another country, there may be hundreds of ethnic groups. It would be very tedious to present the percentage of the sampled households that fall into each of, say, 400 different groups. In most cases, it would be simpler and sufficiently informative to aggregate the many different ethnic groups into a small number of broad categories and display the percentage of households that fall into each of these aggregate categories.

8. The example above used a categorical variable, ethnic group, but it also applies to numerical variables. Some numerical variables, such as the number of days a person is ill in the past week, take on only a small number of values and so the entire distribution can be displayed in a simple table or graph. Yet many other numerical variables, such as the number of farm animals owned, can take on a large number of values and thus it is better to present only some summary statistics of the distribution. The main difference in the treatment of categorical and numerical variables arises from how to aggregate when the number of possible values/categories becomes large. For categorical variables, once the decision not to show the whole distribution has been made, one has no choice but to aggregate into broad categories. For numerical variables, it is possible to aggregate into broad categories, but there is also the option of displaying summary statistics such as the mean, the standard deviation, and perhaps the minimum and maximum values. The following subsection provides a brief review of the most common descriptive statistics.

2. Simple descriptive statistics

9. Tables and graphs can provide basic information about variables of interest using simple descriptive statistics. These statistics include, but are not limited to, percentage distributions, medians, means, and standard deviations. The present subsection reviews these simple statistics, providing examples using household survey data from Saipan, which belongs to the Commonwealth of the Northern Mariana Islands, and from American Samoa.

10. Percentage distributions. Household surveys rarely collect data for exactly 100, or 1,000 or 10,000 persons or households. Suppose that one has data on the categories of a categorical variable, such as the number of people in a population that are male and the number that are female, or data on a numerical variable, such as the age in years of the members of the same population. Presenting the numbers of observations that fall into each category is usually not as helpful as showing the percentage of the observations that fall into each category. This is seen by looking at the first three columns of numbers in table XVI.1. Most users would find it more difficult to interpret these results if they were given without percentage distributions. The last three columns in table XVI.1 are much easier to understand if one is interested in the proportion of the population that is male and the proportion that is female for the different age groups. Of course, one may be interested in column percentages, that is to say, the percentage of men and the percentage of women falling into different age groups. This is shown in table XVI.2. (A third possibility is to show percentages that add up to 100 per cent over all age by sex categories in the table, but this is usually of less interest.) Both tables show that percentage distributions can be shown for either categorical or numerical variables.

Table XVI.1. Distribution of population by age and sex, Saipan, Commonwealth of the Northern Mariana Islands, April 2002: row percentages

                                     Numbers                Row percentages
Broad age group, in years     Total     Male   Female     Total    Male  Female
Total persons                67 011   29 668   37 343     100.0    44.3    55.7
Less than 15                 16 915    8 703    8 212     100.0    51.5    48.5
15 to 29                     18 950    5 765   13 184     100.0    30.4    69.6
30 to 44                     20 803    9 654   11 149     100.0    46.4    53.6
45 to 59                      8 105    4 458    3 648     100.0    55.0    45.0
60 years or over              2 239    1 088    1 150     100.0    48.6    51.4

Source: Round 10 of the Commonwealth of the Northern Mariana Islands Current Labour-force Survey.
Note: Data are from a 10 per cent random sample of households and all persons living in collectives.


Table XVI.2. Distribution of population by age and sex, Saipan, Commonwealth of the Northern Mariana Islands, April 2002: column percentages

                                     Numbers               Column percentages
Broad age group, in years     Total     Male   Female     Total    Male  Female
Total persons                67 011   29 668   37 343     100.0   100.0   100.0
Less than 15                 16 915    8 703    8 212      25.2    29.3    22.0
15 to 29                     18 950    5 765   13 184      28.3    19.4    35.3
30 to 44                     20 803    9 654   11 149      31.0    32.5    29.9
45 to 59                      8 105    4 458    3 648      12.1    15.0     9.8
60 years or over              2 239    1 088    1 150       3.3     3.7     3.1

Source: Round 10 of the Commonwealth of the Northern Mariana Islands Current Labour-force Survey.
Note: Data are from a 10 per cent random sample of households and all persons living in collectives.
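The row and column percentages in tables XVI.1 and XVI.2 can be reproduced from the male and female counts alone. The following is a minimal sketch; the variable names are illustrative, and the counts are those printed in the tables.

```python
# Male and female counts by broad age group, as printed in tables XVI.1 and XVI.2.
counts = {
    "Less than 15": (8703, 8212),
    "15 to 29": (5765, 13184),
    "30 to 44": (9654, 11149),
    "45 to 59": (4458, 3648),
    "60 years or over": (1088, 1150),
}

male_total = sum(m for m, _ in counts.values())
female_total = sum(f for _, f in counts.values())

# Row percentages: share of each sex within an age group (each row sums to 100).
row_pct = {
    group: (round(100 * m / (m + f), 1), round(100 * f / (m + f), 1))
    for group, (m, f) in counts.items()
}

# Column percentages: share of each age group within a sex (each column sums to 100).
col_pct = {
    group: (round(100 * m / male_total, 1), round(100 * f / female_total, 1))
    for group, (m, f) in counts.items()
}

for group in counts:
    print(f"{group:<18} row: {row_pct[group]}  column: {col_pct[group]}")
```

The same counts thus answer two different questions: how each age group divides between the sexes, and how each sex divides across age groups.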

11. It is clear from table XVI.1 that the sex distribution differs across the age groups. This reflects something that cannot be seen in tables XVI.1 and XVI.2, namely that Saipan has many immigrant workers, particularly female workers, employed in its garment factories. While Saipan has slightly more males than females at the youngest ages, the next age group, those 15-29 years, has only 30 males for every 70 females. Age group 30-44 also has more females than males. This is consistent with the fact that most of Saipan's garment workers are women between the ages of 20 and 40. In the next group, those 45-59 years of age, there are more males than females. The column percentages in table XVI.2 show that the largest age group for males was that of 30-44, while the largest age group for females was that of 15-29, the age group of females most likely to work in the garment factories.

12. Medians. The two most common statistical measures for numerical variables are means and medians. (By definition, categorical variables are not numerical and thus one cannot calculate means and medians for such variables.) The median is the midpoint of a distribution, while the mean is the arithmetic average of the values. The median is often used for variables such as age and income because it is less sensitive to outliers. As an extreme example, let us assume that there are 99 people in a survey with incomes between $8,000 and $12,000 per year, symmetrically distributed around $10,000. Thus, the mean and the median would both be $10,000. Now suppose that one more person, with an income of $500,000 during the year, is included. The mean would then be about $15,000, while the median would still be about $10,000. For many income variables, published reports often show both the mean and the median.
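The income example can be checked with a few lines of code. The 99 incomes below are one hypothetical set of values that is symmetric around $10,000 and lies between $8,000 and $12,000; any other symmetric set would give the same mean and a similar median.

```python
import statistics

# 99 hypothetical incomes, evenly spaced and symmetric around 10,000 (8,040 to 11,960).
incomes = list(range(8040, 11961, 40))

mean_before = statistics.mean(incomes)
median_before = statistics.median(incomes)

# Add one person with an income of 500,000.
incomes_with_outlier = incomes + [500_000]
mean_after = statistics.mean(incomes_with_outlier)
median_after = statistics.median(incomes_with_outlier)

print(f"Before: mean = {mean_before}, median = {median_before}")
print(f"After:  mean = {mean_after}, median = {median_after}")
```

The single outlier moves the mean by almost half, while the median shifts only by half the gap between two neighbouring observations.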

13. Returning to the data from Saipan, the median age for the Saipan population was 28.5 years in April 2002, that is to say, half the population was older than 28.5 years and half was younger than 28.5 years. The female median age was lower than the male median age (27.6 versus 30.5), because of the large number of young immigrant females working in the garment factories.

14. Means and standard deviations. As noted above, the mean is the arithmetic average of a numerical variable. Means are often calculated for the number of children ever born (to women), income, and other numerical variables. The standard deviation measures the typical distance of the values of a numerical variable from the mean of that variable (formally, the square root of the average squared deviation from the mean), and thus provides a measure of the dispersion in the distribution of any numerical variable.

15. Table XVI.3 shows medians and means for annual income obtained from the 1995 American Samoa Household Survey. The survey was a 20 per cent random sample of all households in the territory. The fact that household mean income was higher than the median income is not surprising, since some households earned significantly higher wages and derived higher income from other sources. Tongan immigrants are relatively poor, as seen by their low mean and low median income, while the high mean and high median income of "other ethnic groups" indicate that they are relatively well off.

Table XVI.3. Summary statistics for household income by ethnic group, American Samoa, 1994

                                                               Other ethnic
Annual income                       Total   Samoan   Tongan          groups
Number of households surveyed       8 367    7 332      244             790
Median (United States dollars)     15 715   15 786    7 215          23 072
Mean (United States dollars)       20 670   20 582    8 547          25 260

Source: 1995 American Samoa Household Survey.
Note: Data are an unweighted, 20 per cent random sample of households.

3. Presenting descriptive statistics for one variable

16. The simplest case when presenting descriptive statistics from a household survey is that where only one variable is involved. The present subsection explains how this can be done for both categorical and numerical variables.

17. Displaying the entire distribution. Categorical or numerical variables that take a small number of categories or values, say 10 or less, are the simplest to display. A table can be used to show the entire (percentage) distribution of the variable by presenting the frequency of each of the categories or numerical values of the variable. An example of this is given in table XVI.4, which shows the (unweighted) sample frequency counts and percentage distribution for the main sources of lighting among Vietnamese households. Many household surveys require the use of weights to estimate the distribution of a variable in the population, in which case showing the raw sample frequencies may be confusing and thus is not advisable; the use of weights will be discussed in section C below. (The survey from Viet Nam was based on a self-weighting sample and thus no weights were needed.) A final point is that it is also useful to report the standard errors of the estimated percentage frequencies (see chap. XXI for a detailed discussion of this issue, which is complicated by the use of weights and by other features of the sample design of the survey).

18. In some cases, the number of categories or values taken by a variable may be large, but the major part of the distribution is accounted for by only a few categories or values. In such cases, it may not be necessary to show the frequency of each category or value. One option to prevent the amount of information from taxing the patience of the reader of a table is to combine rare cases into a general "other" category. For example, any category or value with a frequency of less than 1 per cent could go into this category. Indeed, this is what was done in table XVI.4, where "other" includes rare cases such as torches and flashlights. In some cases, there may be other natural groups. For example, in many countries, ethnic and religious groups can be divided into a large number of distinct categories, but there may be a much smaller number of broad groups into which these more precise categories fit. In many cases, it will be sufficient to present figures only for the more general groups. The main exception to this rule concerns categories that may be of particular interest even though they occur rarely. In general, such "special interest but rare" categories could be reported separately, but it is especially important to show standard errors in such instances because the precision of the estimates is lower for rare categories.

19. In many cases, presentation of data can be made more interesting and more intuitive if it is displayed as a graph or chart instead of as a table. For a single variable that has only a small number of categories or values, a common way to display data graphically is in a column chart or histogram, in which the relative frequency of each category or value is indicated by the height of the column. Figure XVI.1 provides an example of this, using the data presented in table XVI.4. Another common way of displaying the relative frequency of the categories or values of a variable is the pie chart, which is a circle showing the relative frequencies in terms of the size of the "slices" of the pie. An example of this is given in figure XVI.2, which also displays the information given in table XVI.4. See Tufte (1983) and Wild and Seber (2000) for detailed advice on how to design effective graphs.

Table XVI.4. Sources of lighting among Vietnamese households, 1992-1993

                                  Number of     Percentage of households
Method                           households             (standard error)
Electricity                           2 333                   48.6 (0.7)
Kerosene/oil lamp                     2 386                   49.7 (0.7)
Other                                    81                    1.7 (0.2)
Total households in sample            4 800                  100.0

Source: 1992-1993 Viet Nam Living Standards Survey.
Note: Data are unweighted.
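The percentages and standard errors in table XVI.4 can be approximated from the raw counts. The sketch below assumes the simple-random-sample formula for the standard error of a proportion, sqrt(p(1 − p)/n); this is an assumption made for illustration, since standard errors in practice may also reflect clustering and other design features (see chap. XXI).

```python
import math

# Unweighted household counts from table XVI.4.
counts = {"Electricity": 2333, "Kerosene/oil lamp": 2386, "Other": 81}
n = sum(counts.values())

results = {}
for method, count in counts.items():
    p = count / n
    se = math.sqrt(p * (1 - p) / n)  # simple-random-sample standard error (assumed)
    results[method] = (round(100 * p, 1), round(100 * se, 1))

for method, (pct, se_pct) in results.items():
    print(f"{method:<20} {pct:5.1f} ({se_pct:.1f})")
```

Under this assumption the results agree with the published figures to one decimal place. In absolute terms the rare "other" category has the smallest standard error, but that error is large relative to the size of the estimate itself.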


Figure XVI.1. Sources of lighting among Vietnamese households, 1992-1993 (column chart)

[Column chart. Vertical axis: percentage of households (0 to 60). Columns: Electricity, 48.6; Kerosene/oil lamp, 49.7; Other, 1.7.]

Source: 1992-1993 Viet Nam Living Standards Survey.
Note: Sample size: 4,800 households.

Figure XVI.2. Sources of lighting among Vietnamese households, 1992-1993 (pie chart) (Percentage)

[Pie chart. Slices: Electricity, 48.6; Kerosene/oil lamp, 49.7; Other, 1.7.]

Source: 1992-1993 Viet Nam Living Standards Survey.
Note: Sample size: 4,800 households.

20. Displaying variables that have many categories or values. Both categorical and numerical variables often have many possible categories or values. For categorical variables, the only way to avoid presenting highly detailed tables and graphs is to aggregate categories into broad groups and/or combine all rare values into an "other" category, as discussed above. For numerical variables, there are two distinct options.

21. First, one can divide the range of any numerical variable with many values into a small number of intervals and display the information in any of the ways described above for the case where a variable has only a small number of categories or values. For example, this was done for the age variable in tables XVI.1 and XVI.2. This option can also be used in graphs: information on the distribution of a numerical variable that takes many values can be displayed using a graph that shows the frequency with which the variable falls into a small number of categories. One example of such a graph is the histogram, which approximates the density function of the underlying variable. Histograms divide the range of a numerical variable into a relatively small number of "sub-ranges", commonly called bins. Each bin is represented by a column that has an area proportional to the percentage of the sample that falls in the sub-range corresponding to the bin. Figure XVI.3 does this for the age data in table XVI.2. The first bin is the sub-range from 0 to 14; the next is the sub-range from 15 to 29, and so on.28 Note that, unlike the column chart in figure XVI.1, there is no distance between the "columns" of the histogram. This is because the horizontal axis in a histogram depicts the range of the variable, and variables typically have no "gaps" in their range.
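Dividing a numerical variable into equal-width bins, as a histogram does, can be sketched as follows. The ages are hypothetical integer values and the function name is illustrative; the 15-year bin width mirrors the age groups used for Saipan.

```python
def bin_percentages(values, width, start=0):
    """Percentage of integer values falling in consecutive bins of equal width."""
    n_bins = (max(values) - start) // width + 1
    counts = [0] * n_bins
    for v in values:
        counts[(v - start) // width] += 1  # index of the bin containing v
    return [100 * c / len(values) for c in counts]

# Hypothetical ages in completed years.
ages = [2, 9, 14, 15, 22, 28, 31, 37, 44, 52, 61, 90]

width = 15
pcts = bin_percentages(ages, width)
labels = [f"{width * i}-{width * i + width - 1}" for i in range(len(pcts))]

for label, pct in zip(labels, pcts):
    print(f"{label:>7}: {pct:5.1f}%")
```

Because every bin has the same width, column height and column area carry the same information; with unequal widths the heights would have to be rescaled so that area stays proportional to the percentage.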

Figure XVI.3. Age distribution of the population in Saipan, April 2002 (histogram)

[Histogram. Horizontal axis: age, in bins 0-14, 15-29, 30-44, 45-59, 60-74, 75-89 and 90-104. Vertical axis: percentage of population (0 to 35).]

Source: Round 10 of the Commonwealth of the Northern Mariana Islands Current Labour-force Survey.

22. The second, and perhaps most common, option for displaying a numerical variable that takes many values is to present some summary statistics of its distribution, such as its mean, median, and standard deviation. This can be done only by showing these statistics in a table; it is not possible to show summary statistics for a single numerical variable in a graph. In addition to the mean, median and standard deviation, it is also useful to present the minimum and maximum values, the values of the upper and lower quartiles,29 and perhaps a measure of skewness. An example of this is given in table XVI.5.
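The summary statistics listed above can be computed with Python's standard library, as in the following sketch. The expenditure values are hypothetical and are not the Viet Nam data; the "inclusive" quantile method is one of several conventions for interpolating quartiles, and statistical packages may differ slightly.

```python
import statistics

# Hypothetical household expenditures (thousands of dong per year).
expenditures = [235, 1800, 3300, 4200, 5100, 6100, 7900, 9500, 12000]

lower_q, median, upper_q = statistics.quantiles(expenditures, n=4, method="inclusive")

summary = {
    "mean": statistics.mean(expenditures),
    "standard_deviation": statistics.stdev(expenditures),  # sample standard deviation
    "median": median,
    "lower_quartile": lower_q,
    "upper_quartile": upper_q,
    "smallest_value": min(expenditures),
    "largest_value": max(expenditures),
}

for name, value in summary.items():
    print(f"{name:<20} {value:,.0f}")
```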

4. Presenting descriptive statistics for two variables.

23. Examination of the relationships between two or more variables often offers much more insight into the underlying topic of interest than examining a single variable in isolation. Yet, at the same time the possibilities for displaying the data increase by an order of magnitude. The present subsection describes common methods, distinguishing between variables that have a small number of categories or values and variables that take a large number of values.

28 This histogram divides the population aged 60-99 into three groups (60-74, 75-89 and 90-104) each of which spans the same number of years, 15, as the population groups younger than 60. This is done to ensure that the area in each column of the histogram is proportional to the percentage of the population in each age group.

29 The lower quartile of a distribution is the value for which 25 per cent of the observations are less than the value and 75 per cent are greater than the value, and the upper quartile is the value for which 75 per cent of the observations are lower than the value and 25 per cent are higher than the value.

24. Two variables with a small number of categories or values. The simplest case for displaying the relationship between two variables is that where both variables have a small number of categories or values. In a simple two-way tabulation, the categories or values of one variable can serve as the columns, while the categories or values of the other variable can serve as the rows. An example of this is shown in table XVI.6, which illustrates the use of different types of health service providers in urban and rural areas of Viet Nam. In this example, the columns sum to 100 per cent. As explained above, an alternative would be for the rows to sum to 100 per cent. In the example from Viet Nam, percentage figures that sum to 100 per cent across each row would indicate how the use of each type of health facility was distributed across urban and rural areas of Viet Nam. A third alternative would be for each "cell" of this table to give the joint frequency (in percentage terms) of visits to a particular type of health-care facility by someone in a particular geographical region (urban or rural), in which case the sum of the percentages over all rows and columns would be 100 per cent. This is rarely used, however, since conditional distributions are usually more interesting. In any case, it is good practice to report sufficient data so that any reader can derive all three types of frequencies given the data provided in the table.

Table XVI.5. Summary information on household total expenditures: Viet Nam, 1992-1993
(Thousands of dong per year)

Mean                    6 531
Standard deviation      5 375
Median                  5 088
Lower quartile          3 364
Upper quartile          7 900
Smallest value            235
Largest value         100 478

Source: 1992-1993 Viet Nam Living Standards Survey.
Note: Sample size: 4,799 households.

Table XVI.6. Use of health facilities among population (all ages) that visited a health facility in the past four weeks, by urban and rural areas of Viet Nam, in 1992-1993

                                 Urban areas                 Rural areas
Place of consultation      Frequency   Percentage      Frequency   Percentage
                                       (std. error)                (std. error)
Hospital or clinic            251      45.0 (2.1)         430      25.0 (1.0)
Commune health centre          30       5.4 (1.0)         318      18.5 (0.9)
Provider’s home               213      38.2 (2.1)         595      34.6 (1.1)
Patient’s home                 50       9.0 (1.2)         346      20.1 (1.0)
Other                          14       2.5 (0.7)          29       1.7 (0.3)
Total                         558     100.0              1718     100.0

Source: 1992-1993 Viet Nam Living Standards Survey.


25. There are several ways to use graphs to display information on the relationship between two variables that take a small number of values. When showing column or row percentages, one convenient method is to show several vertical columns that sum to 100 per cent. Each column represents a particular value of one of the variables, and the frequency distribution of the other variable is shown as shaded areas of each column. This is shown for the health facility data from Viet Nam in figure XVI.4. Spreadsheet software packages present many other variations that one could use.

Figure XVI.4. Use of health facilities among the population (all ages) that visited a health facility in the past four weeks, by urban and rural areas of Viet Nam, in 1992-1993

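(Percentage)

[Figure: stacked column chart with one column for urban areas and one for rural areas, on a vertical axis running from 0% to 100%. Each column is divided into shares for hospital or clinic, commune health centre, home of provider, home of patient, and other, labelled with the percentages reported in table XVI.6.]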

Source: 1992-1993 Viet Nam Living Standards Survey. Note: Sample size: 2,276.

26. One variable with a small number of categories/values and a numerical variable with many values. Another common situation is one where there are two variables. One takes a small number of categories or values (perhaps after aggregating to reduce the number) and the other is a numerical variable that takes many values. Here the most common way to display the data is in terms of the mean of the numerical variable, conditional on each value of the variable that takes a small number of categories or values. One could also add other information, such as the median and the standard deviation. An example of this is seen in table XVI.7, which shows mean household total expenditure levels in Viet Nam in 1992-1993 with households being classified by the seven regions of that country. This could be put into a “profile plot” column graph, where each column (x-axis) represents a region and the lengths of the columns (y-axis) are proportional to the mean expenditures for each region.

27. Another option is to transform the continuous variable into a discrete variable by dividing its range into a small number of categories. For example, it is sometimes convenient to divide households into the poorest 20 per cent, the next poorest 20 per cent, and so on, based on household income or expenditures. After this is done, one can use the same methods for displaying data for two discrete variables, as described above. A specific example is to modify figure XVI.4 to show five columns, one for each income quintile.
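The two operations described above, computing a mean conditional on a category and converting a continuous variable into quintile groups, can be sketched in Python as follows. This is a minimal, unweighted illustration: the function names and the ten expenditure values are hypothetical, and a real survey would normally apply sampling weights at both steps.

```python
def assign_quintiles(values):
    """Label each value 1 (poorest 20 per cent) to 5 (richest 20 per cent) by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * 5 // len(values) + 1
    return labels


def mean_by_group(values, groups):
    """Mean of the numerical variable, conditional on each group label."""
    sums, counts = {}, {}
    for v, g in zip(values, groups):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return {g: sums[g] / counts[g] for g in sorted(sums)}


if __name__ == "__main__":
    # Hypothetical household expenditures (thousands of dong per year).
    expenditures = [1200, 3400, 560, 7800, 2300, 4100, 9900, 1500, 5200, 3000]
    quintiles = assign_quintiles(expenditures)
    for q, mean in mean_by_group(expenditures, quintiles).items():
        print(f"Quintile {q}: mean expenditure {mean:.0f}")
```

Once each household carries a quintile label, the label can be used as the row or column variable of a two-way table in the same way as any other discrete variable.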


28. Two numerical variables with many values. Statisticians often provide summary information on two numerical variables in terms of their correlation coefficient (the covariance of the two variables divided by the square root of the product of the variances). However, such statistics are often unfamiliar to a general audience. An alternative is to graphically display the data in a scatter-plot that has a dot for each observation. This could show, for example, the extent to which household income is correlated over two periods of time, using observations on the same households in two different surveys (one for each period of time).
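For readers who want to see the definition of the correlation coefficient spelled out, the following Python sketch computes it directly as the covariance divided by the square root of the product of the variances. The function name and the sample figures are hypothetical, and the calculation is unweighted.

```python
import math


def correlation(x, y):
    """Correlation coefficient: covariance / sqrt(variance of x * variance of y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical incomes of the same five households in two survey rounds.
    income_round_1 = [2100, 3500, 4800, 6200, 9000]
    income_round_2 = [2500, 3300, 5100, 6000, 9800]
    print(f"Correlation: {correlation(income_round_1, income_round_2):.3f}")
```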

Table XVI.7. Total household expenditures by region in Viet Nam, 1992-1993
(Thousands of dong per year)

Region                 Mean total expenditures
                       (standard errors in parentheses)
Northern uplands          4 792  ( 95.5)
Red River delta           5 306  (110.4)
North central             4 708  (107.7)
Central coast             7 280  (234.8)
Central highlands         6 173  (373.7)
South-east               10 786  (398.5)
Mekong Delta              7 801  (167.4)
All Viet Nam              6 531  ( 77.6)

Source: 1992-1993 Viet Nam Living Standards Survey.
Note: Sample size: 4,799 households.

29. One problem with using scatter-plots is that when the sample size is large, the graph becomes too “crowded” to interpret easily. This can be avoided by drawing a random subsample of the observations (for example, one tenth of the observations) to keep the diagram from becoming too crowded. Another problem with scatter-plots is how to adjust them to account for sampling weights. One simple method is to create duplicate observations, with the sampling weight being the number of duplicates for each observation. This will almost certainly overcrowd the scatter-plot; hence after creating the duplicates, only a random subsample of the observations should be included in the scatter plot.
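The duplicate-then-subsample approach for weighted scatter-plots might be sketched as below. This is an illustrative Python sketch only: the function name and data are hypothetical, and rounding each weight to a whole number of duplicates is an assumption made here for simplicity.

```python
import random


def weighted_scatter_sample(points, weights, sample_size, seed=0):
    """Duplicate each point according to its weight, then draw a random subsample."""
    expanded = []
    for point, weight in zip(points, weights):
        # Assumption: weights are rounded to the nearest whole number of duplicates.
        expanded.extend([point] * int(round(weight)))
    rng = random.Random(seed)  # fixed seed so the plot can be reproduced
    return rng.sample(expanded, min(sample_size, len(expanded)))


if __name__ == "__main__":
    # Hypothetical (income in period 1, income in period 2) pairs and sampling weights.
    points = [(1.0, 2.0), (2.0, 3.5), (3.0, 1.0)]
    weights = [3, 1, 6]
    for x, y in weighted_scatter_sample(points, weights, sample_size=5):
        print(x, y)
```

Points with larger weights are more likely to appear in the subsample, so the plotted cloud approximates the population rather than the unweighted sample.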

5. Presenting descriptive statistics for three or more variables

30. In principle, it is possible to display relationships between three or more variables using tables and graphs. Yet, this should be done rarely because it adds additional dimensions that complicate both the understanding of the underlying relationships and the methods for displaying them in simple tables or graphs. In practice, it is sometimes possible to show the descriptive relationships among three variables, but it is almost never feasible to show descriptive relationships among four or more variables.

31. For three variables, the most straightforward approach is to designate one variable as the “conditioning” variable. Either this variable will have a small number of discrete values or, if continuous, it will have to be “discretized” by calculating its distribution over a small number of


intervals over its entire range. After this is done, separate tables or graphs can be constructed for each category or value of this conditioning variable. For example, suppose one is interested in showing the relationship among three variables: the education of the head of household, the income level of the household, and the incidence of child malnutrition. This could be done by generating a separate table or graph of the relationship between income and an indicator of children’s nutritional status (such as the incidence of stunting) for each education level. This may show, for example, that the association between income and child nutrition is weaker for households with more educated heads.

C. General advice for presenting descriptive statistics

1. Data preparation

32. Before any figures to be put into tables and graphs are generated, the data must be prepared for analysis. This involves three distinct tasks: checking the data to remove observations that may be highly inaccurate; generating complex (derived) variables; and thoroughly documenting the preparation of the “official” data set to be used for all analysis. In all three tasks, extra effort and attention to detail initially may save much time and many resources in the future. The present subsection presents a brief overview of these tasks; for a much more detailed treatment the reader should consult chapter XV.

33. Virtually every household survey, no matter how carefully planned and executed, will have some observations for some variables that do not appear to be credible. These problems range from item non-response (see chap. XI) and other clear errors -- for example, a three-year-old child who is designated as the head of household -- to much less clear cases, such as a household with very high income but an average level of household expenditures. In many cases, the errors are due to inaccurate data entry from paper questionnaires and so the paper questionnaire should be checked first. Such data entry errors can be easily fixed. If the strange data are on the questionnaire itself, there are several options. First, one could change the value of the variable to “missing”. If there are only a small number of such cases, those observations can be excluded when calculating any table or graph that uses that variable.30 If there are a large number of cases, the “missing” values can be calculated as a distinct category of a categorical variable, labelled “not reported” or “not stated”. Second, if most of the cases are concentrated in a small number of households, those households could be dropped.
Third, if there are many questionable observations for many households for some variables, a decision may have to be made not to present results for that variable.

30 This option has the disadvantage that the sample size will differ slightly for each table. While this could cause confusion, a note at the bottom of each table explaining that a few observations were dropped should provide sufficient clarification.

34. One approach to missing data is to “impute” missing values using one of several methods. Imputation methods assign values to unknown or “not reported” cases, as well as to cases with implausible values. Approaches include the hot deck imputation and nearest neighbour methods, which allow for a “best guess” for a response when none is available. The idea behind these methods is quite simple: households or people that are similar in some characteristics are probably also similar in other characteristics. For example, houses in a given rural village are likely to have walls and roofs that are similar to those of houses in other rural areas, as opposed to houses in urban areas. Similarly, most of the people in a household will have the same religion and ethnicity. The survey team must decide on the specific rules to follow in light of the country’s demographic, social, economic and housing conditions.

35. While imputation methods are quite useful, they also may have serious problems. The team members responsible for data analysis must decide whether to change missing data on a case-by-case basis or use some kind of imputation method. The effects on the final tabulations must be considered. Imputing 1 or 2 per cent of the cases should have little or no effect on the final results. If about 5 per cent of the cases are missing or inconsistent with other items, imputation should probably still be considered. However, the need to impute a much larger proportion of values, say 10 per cent or more, could very well make the variable unsuitable for use in display and analysis, hence no results should be presented for that variable. Readers should consult chapters VIII and XI and the references therein for further advice on imputation and the handling of missing values.

36. Another aspect of data preparation is calculation of complex (derived) variables. In many household surveys, total household income or total household expenditure, or both, are calculated based on the values of a large number of variables. For example, total expenditure is typically calculated by adding up expenditures on 100 or more specific food and non-food items. While in theory, calculating these variables is straightforward, in practice many problems can arise. For example, in calculating the farm revenues and expenditures of rural households, it is sometimes the case that farm profits are negative.
When strange results occur for specific households, it may help to look at each of the components that go into the overall calculation. One or two may stand out as the cause of the problem. Continuing with the example of farm profits, it may be that the price of some purchased input is unusually high. In this case, the profit could be recalculated using an average price.

37. Unfortunately, preparing the data sets when problems arise is more of an art than a science. Decisions will have to be made when it is not clear which choice is the best. Finally, it is important to document the choices made and, more generally, to document the entire process by which the “raw data” are transformed into tables and graphs. The documentation should include a short narrative about the process plus all the computer programs that manipulated and transformed the data.
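The hot deck idea described in paragraph 34 can be illustrated with a small Python sketch. It shows one simple variant only (a sequential hot deck within imputation classes, where a missing value is filled with the most recently reported value from the same class); the function name, field names and records are hypothetical, and actual imputation rules must be set by the survey team. The function also returns the share of cases imputed, which can be compared with the rough thresholds discussed in paragraph 35.

```python
def hot_deck_impute(records, class_key, target):
    """Fill missing values of `target` with the last reported value in the same class.

    Returns the imputed records and the share of records that were imputed.
    """
    last_reported = {}  # most recent donor value seen for each imputation class
    imputed = 0
    result = []
    for record in records:
        record = dict(record)  # copy so the raw data are left untouched
        cls = record[class_key]
        if record[target] is None:
            if cls in last_reported:
                record[target] = last_reported[cls]
                imputed += 1
            # If no donor has been seen yet, the value stays missing.
        else:
            last_reported[cls] = record[target]  # only reported values act as donors
        result.append(record)
    return result, imputed / len(records)


if __name__ == "__main__":
    # Hypothetical housing records; None marks a missing wall material.
    households = [
        {"area": "rural", "wall": "bamboo"},
        {"area": "urban", "wall": "brick"},
        {"area": "rural", "wall": None},
        {"area": "urban", "wall": None},
        {"area": "rural", "wall": "wood"},
    ]
    filled, share = hot_deck_impute(households, "area", "wall")
    for household in filled:
        print(household)
    print(f"Share imputed: {share:.0%}")
```

Keeping the raw records unchanged and reporting the imputed share supports the documentation practice recommended in paragraph 37.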

2. Presentation of results

38. The best way to present basic statistical results will vary according to the type of survey and the audience. However, some general advice can be given that should apply in almost all cases.

39. The most important general piece of advice is to present results clearly. This implies several more specific recommendations. First, all variables must be defined precisely and clearly. For example, when presenting tables and graphs on household “income”, the income variable should be either “per capita income” or “total household income”, never just “income”.


Complex variables such as income and expenditure should be defined clearly in the text and in footnotes to tables and graphs. Does income refer to income before or after taxes? Does it include the value of owner-occupied housing? Does income refer to income per week, per month or per year? This must be completely clear. For many variables, it is very useful to present in the text the wording in the household questionnaire from which the variable has been derived. For example, for data on adult literacy, it should be very clear how this variable has been defined. It may be defined by the number of years the person has attended school, or the person’s ability to sign his or her name, or the respondent’s statement that he or she can read a newspaper; or it may be based on some kind of test given to the respondent. Different definitions can give very different results.

40. A second specific recommendation regarding clarity is that percentage distributions of discrete variables should be very clear as to whether they are percentages of households or percentages of people (that is to say, of the population). In many cases, these will give different results. In many countries, better-educated individuals have relatively small families. This implies that the proportion of the population living in households with well-educated heads is smaller than the proportion of households that have a well-educated head. A third recommendation regarding clarity is that graphs should show the numbers underlying the graphical shapes. For example, the column chart in figure XVI.1 shows the percentages for each of three sources of lighting among Vietnamese households, and the same is true of the pie chart in figure XVI.2.

41. Finally, there are several other miscellaneous pieces of advice. First, reports should not present huge numbers of tables and a vast array of numbers in each table.
Statistical agencies sometimes present hundreds of tables giving minute details that are unlikely to be of interest to most audiences, and a similar point often applies concerning the detail in a given table. Staff preparing reports should discuss the purpose of the various tables that are being prepared, and if little use can be perceived in presenting a particular table or the detailed information in a given table, then the extraneous information should be excluded. Second, estimates of sampling errors should be reported for a selection of the most important variables collected in the survey; in addition, it is highly useful to show the confidence intervals for key variables or indicators. This is an obvious point, but it is often overlooked. It emphasizes the importance of conveying to the reader the degree of precision of the information provided by the household survey. Third, the sample sizes should be given for each table.
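As an illustration of reporting a confidence interval alongside an estimate, the Python sketch below builds an approximate 95 per cent interval from a mean and its standard error. The function name is hypothetical, the multiplier 1.96 assumes a normal approximation, and the input figures are the all-Viet Nam mean and standard error from table XVI.7.

```python
def confidence_interval(estimate, standard_error, z=1.96):
    """Approximate confidence interval: estimate plus or minus z standard errors.

    Assumption: a normal approximation, with z = 1.96 for a 95 per cent interval.
    """
    margin = z * standard_error
    return estimate - margin, estimate + margin


if __name__ == "__main__":
    # Mean total household expenditure for all Viet Nam and its standard error
    # (thousands of dong per year), as reported in table XVI.7.
    lower, upper = confidence_interval(6531, 77.6)
    print(f"95 per cent confidence interval: {lower:.1f} to {upper:.1f}")
```

The standard error itself must come from software that accounts for the sample design; this sketch only converts it into an interval.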

3. What constitutes a good table

42. The present subsection offers specific advice about preparing tables that present information from a household survey. When preparing tables and graphs, the following general principle applies: the information the tables include should be sufficient to enable the user to interpret them correctly without having to consult the text of the report. This is highly important because many users of reports photocopy tables and later use them without reference to the accompanying text.

43. The advice given below is general in nature. For any survey, the survey team must decide which conventions are most appropriate. Once the conventions are chosen, they should


be very strictly followed. However, in some cases, divergence from the conventions may be necessary to illustrate specific points or to display specific types of statistical analyses. A final point regarding this subsection is that almost all of these guidelines for tables also apply to graphs.

44. The various parts of a good table are included in table XVI.6. Each table should contain: a clear title; geographical designators (when appropriate); column headers; stub (row) titles; the data source; and any notes that are relevant.

45. Title. The title should provide a succinct description of the table. This description should include: (a) the table number; (b) the population or other universe under consideration (including the unit of analysis, such as households or individuals); (c) an indication of what appears in the rows; (d) an indication of what appears in the columns; (e) the country or region covered by the survey; and (f) the year(s) of the survey.

46. Regarding the table number, most statistical reports number their tables consecutively, starting with table XVI.1, and continuing through to the last table. Sometimes countries use letters and numbers for different table sets, for example, H01, H02, etc., for housing tables, and P01, P02, etc., for population tables. While this procedure is simple and straightforward, it has the disadvantage that reports become locked into the numbering, making additions or deletions very cumbersome.

47. The universe is the population or housing base covered by the table. If all of the population is included in the table, then the universe can be omitted from the title: the total population is assumed. In contrast, if a table encompasses a subpopulation such as persons in the labour force, and the potential labour force is defined as persons aged 10 years or over, then the title might contain the phrase “Population aged 10 years or over”.

48. The title of table XVI.6 also includes an indication of what appears in the rows and what appears in the columns of the table. In particular, it states that the table presents information on types of health facilities used (the rows) and shows this information separately for urban and rural areas (the columns). Including the country or region in the title makes the geographical universe immediately apparent. This feature is most important for researchers comparing results between countries. Obviously, the country statistical office collecting the data will know its own country name; but persons using tables from different countries may need this information in order to distinguish between the countries.

49. Finally, the year(s) of the survey should be in the title to make the time frame immediately apparent. Sometimes, a country’s national statistics agency may want to show data from two or more different surveys in the same table. Then two dates may appear, for example “1990 and 2000” or “1980 through 2000”. The survey team must make a decision about whether it wants to write out a series of dates (for example, “1980, 1990 and 2000”, rather than the simpler, but less complete, “1980 through 2000”); once the decision has been made, however, the country should always follow its decision.


50. Geographical designators. Whenever the same table is repeated for lower levels of geography, each table should have a geographical designator to clarify which table applies to which geographical region. For example, if table XVI.6 were repeated for each of Viet Nam’s seven regions, the name of the region could appear in parentheses in a second line immediately below the title of the table. “Non-geographical” designators could also be used. For example, a table might be repeated for major ethnic groups or nationalities.

51. Column headers. Each column of a table must be labelled with a “header”. Column headers can have more than one “level”; for example, in table XVI.6, the header for the first two columns is designated as “Urban areas” and the header for the last two is designated as “Rural areas”; and within both urban and rural areas, there are separate headers for the frequency of observations and for the percentage distribution of those observations. Another point pertains to columns of “totals” or “sums”, such as the first column of table XVI.3. The survey team should choose a convention with respect to where these columns will be placed. Traditionally, the total comes last, with all of the attributes shown first across the columns. However, if a table continues for multiple pages, with many columns of information, the survey team may prefer to have the total first (at the left) for the series of columns. When the total appears first, any user will immediately know the total for that series of columns, without having to page through all of the table.

52. Column headers and their associated columns of data should be spaced to minimize blank space on the page. Spacing of columns needs to take into account the number of digits in the maximum figures to appear in the columns, the number of letters in the names of the attributes appearing in the columns, and the total number of “spaces” allowed by the particular font being used. The font used is very important, and should be chosen early in the tabulation process.

53. Stub (row) titles. The survey team must also determine conventions to be used for stub (row) headings and titles. Stub “headings” should be left justified and only one variable should be listed on each line. Stub headings should consist of the names of variables displayed in the row. Stubs may include subcategories (nested variables). For example, a stub “group” may have two separate rows, one for male and one for female. Some conventions need to be established to distinguish between the different stub groups; the convention usually involves different indentation for different “levels” of variables.

54. Precision of numbers. Many tables suffer from presenting too many significant digits. When percentages are shown, it is almost always sufficient to include only one digit beyond the decimal point; presenting two or more digits rarely provides useful information and has three disadvantages: it distracts the reader, wastes space, and conveys a false sense of precision. Numbers with four or more digits rarely need any decimal point at all. When large numbers are displayed, they should appear in “thousands” or “millions,” so that no numbers of more than four or five digits appear.
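The conventions on precision can be applied mechanically when tables are generated by program. The following Python sketch is one possible way to do so; the function names are hypothetical, and the use of a space as the thousands separator simply mirrors the tables in this chapter.

```python
def format_percentage(value):
    """Show a percentage with one digit beyond the decimal point."""
    return f"{value:.1f}"


def in_thousands(value):
    """Express a large number in thousands, with no decimals and a space separator."""
    return f"{round(value / 1000):,}".replace(",", " ")


if __name__ == "__main__":
    print(format_percentage(44.9821))  # a percentage computed to many digits
    print(in_thousands(6_531_400))     # a hypothetical expenditure figure in dong
```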

55. Source. The source of the data should appear as the complete name of the survey, usually at the bottom of the table (as seen in table XVI.6). However, sometimes tabulations display more than one survey for a country, or surveys from more than one country. When this happens, the information in the sources becomes more important. The date should be included along with


the name of the survey. If the source is a published report, it is useful to distinguish between the date of publication of the report and the year of data collection. For example, a country might have collected data in 1990, but published the data in 1992. Hence, the source might read “1990 Fertility Survey, 1992” with 1992 indicating the date of publication.

56. Notes. Notes provide immediate information with which to properly interpret the results shown in the table. For example, the notes to tables XVI.1 and XVI.2 indicate that the sampled population includes all persons living in either individual dwellings or collectives. In addition to notes at the bottom of a table, a series of definitions and explanations might appear in the text accompanying the tables. The text would include the definitions of the characteristics, for example, it would indicate that the birthplace referred to the mother’s living quarters just prior to going to the hospital to deliver, rather than to the hospital location. The text might also include explanations regarding how the data were obtained or are to be used. For example, if the date of birth and age were both collected, but date of birth superseded age when they were inconsistent, this information might assist certain users, like demographers, in assessing the best method of interpreting the data.

4. Use of weights

57. The present subsection provides a brief overview of the use of weights when producing tables and graphs using household survey data. For much more detailed treatment, see chapters II, VI, XIX, XX and XXI and the references therein.

58. With respect to survey weighting, the simplest type of household survey sample design is the “self-weighted” type. In such a case, no weights need actually be used in the analysis because each household in the population has the same probability of being selected in the sample. The 1992-1993 Viet Nam Living Standards Survey used in several of the examples in this chapter was such a survey. Yet, variation in response rates across different types of households usually implies that weights should be calculated to correct for such variation. More importantly, most household surveys are not self-weighted because they draw disproportionately large samples for some parts of the population that are of particular interest. For these surveys, weights must be used to reflect the differential probabilities of selection in order to properly calculate unbiased estimates of the characteristics of interest to the survey.

59. Accurate weights must incorporate three components. The first encompasses the “base weights” or “design weights”. These account for variation in the selection probabilities across different groups of households (that is to say, when the sample is not self-weighting) as stipulated by the survey’s initial sample design. The second component is adjustment for variation in non-response rates. For example, in many developing countries, wealthier households are less likely to agree to be interviewed than are middle-income and lower-income households. The base weights need to be “inflated” by the inverse of the response rate for all groups of households. Finally, in some cases, there may be “post-stratification adjustments”.
The rationale for post-stratification is that an independent data source, such as a census, may provide more precise estimates of the distribution of the population by age, sex and ethnic group. If the survey estimates of these distributions do not closely correspond to those of the independent source, the survey data may be re-weighted to force the two distributions to agree.


For a more detailed account of the second and third components, see Lundström and Särndal (1999).
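The three components of a weight described in paragraph 59 combine multiplicatively, which the following Python sketch illustrates. It is a simplified sketch under stated assumptions: the function names and all figures are hypothetical, the base weight is taken as the inverse of the selection probability, and the post-stratification factor is taken as a simple ratio of an independent population total to the weighted survey total for the same group.

```python
def poststratification_factor(independent_total, weighted_survey_total):
    """Ratio that scales the weighted survey total of a group to an independent total."""
    return independent_total / weighted_survey_total


def final_weight(selection_probability, response_rate, poststrat_factor=1.0):
    """Base weight x non-response adjustment x post-stratification adjustment."""
    base_weight = 1.0 / selection_probability      # design (base) weight
    nonresponse_adjustment = 1.0 / response_rate   # inflate by inverse response rate
    return base_weight * nonresponse_adjustment * poststrat_factor


if __name__ == "__main__":
    # Hypothetical group: 1 household in 500 selected, 80 per cent responded.
    weight = final_weight(selection_probability=0.002, response_rate=0.8)
    print(f"Weight after non-response adjustment: {weight:.1f}")

    # Hypothetical census total of 1,100,000 against a weighted survey total of 1,000,000.
    factor = poststratification_factor(1_100_000, 1_000_000)
    print(f"Final weight: {final_weight(0.002, 0.8, factor):.1f}")
```

In practice the response rate and the post-stratification factor are computed separately for each group of households, so each group receives its own final weight.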

D. Preparing a general report (abstract) for a household survey

60. Most household surveys first disseminate their results by publishing a general report which contains a modest amount of detail on all of the information collected in the survey. Such reports usually have much wider circulation than do more specialized reports that make full use of certain aspects of the data. These general reports are sometimes called “statistical abstracts”. The present section provides some specific recommendations for producing these reports, based on Grosh and Muñoz (1996).

1. Content

61. The main material in any general statistical report is a large number of tables and graphs. They should reflect all of the main kinds of information collected in the survey; in-depth analysis of more narrow topics should be left to more focused special reports. A small amount of text should accompany the tables, just enough to clarify the type of information in those tables. There is no need to draw particular policy conclusions, although possible interpretations can be suggested as fruitful areas for future research.

62. The most basic information can be broken down by geographical regions, by sex and, perhaps, by age. If the survey contains income or expenditure data, they can also be broken down by income or expenditure groups. In some countries, there will be large differences across these different groups, and the nature of these differences can be explored further in additional tables. In other countries, some of these differences will not be very large, so there will be no need to present more detail.

63. In addition to the results from the household survey data, the general report should have several pages describing the survey itself, including the sample size and the design of the sample, the date of the survey’s start and the date of its termination, and some detail on how the data were collected. The questionnaire or questionnaires used should be included as an annex to the main report.

2. Process

64. A good general statistical report is produced by a team of people, several of whom will ideally have had experience on previous reports. Some team members will focus on the technical aspects of generating tables and graphs, while others will mainly be responsible for the content and the text accompanying the tables. The more technically-oriented team members can choose the statistical software with which they are most familiar, since most statistical software packages are able to produce the figures needed for the tables and graphs. However, estimation of standard errors will likely require software specifically designed for that purpose, since household survey sample designs are virtually always too complex to be handled properly by standard statistical software packages (see chapter XXI for a discussion of these issues).


65. The team members responsible for the content should meet with experts in government agencies regarding the topics to be included in the report. This will ensure that the tables and graphs present the data in a form most useful to those agencies. It might prove useful as well to consult international aid agencies, which could also find the data useful in planning their programmes (see chap. III for a more general discussion of how to form an effective survey team).

E. Concluding comments

66. This chapter has provided an introduction to the presentation of simple descriptive statistics using household survey data. The treatment has been very general, and undertaken at a very basic level. As much of what has been presented constitutes little more than common sense, data analysts should use their own common sense when facing particular issues regarding the analysis of their surveys. More sophisticated methods can also be used to analyse household survey data, some of which are discussed in later chapters. All things considered, the data analysis for any given household survey will have to be tailored to the main topics and objectives of the survey, and researchers will have to consult specialized books and journals to obtain guidance on issues specific to those topics.

References

Frankenberg, Elizabeth (2000). Community and price data. In Designing Household Survey Questionnaires for Developing Countries: Lessons from 15 Years of the Living Standards Measurement Study, M. Grosh and P. Glewwe, eds. New York: Oxford University Press, for the World Bank.

Grosh, Margaret, and Juan Muñoz (1996). A Manual for Planning and Implementing the Living Standards Measurement Study Survey. Living Standards Measurement Study Working Paper, No. 126. Washington, D.C.: World Bank.

Lundström, S., and C. E. Särndal (1999). Calibration as a standard method for the treatment of non-response in sample surveys. Journal of Official Statistics, vol. 15, No. 2, pp. 305-327.

Tufte, Edward (1983). The Visual Display of Quantitative Information. Cheshire, Connecticut: Graphics Press.

Wild, C. J., and G. A. F. Seber (2000). Chance Encounters: A First Course in Data Analysis and Inference. New York: Wiley.

Journal of Economic Perspectives—Volume 28, Number 1—Winter 2014—Pages 209–234

An Economist’s Guide to Visualizing Data

Jonathan A. Schwabish

Once upon a time, a picture was worth a thousand words. But with online news, blogs, and social media, a good picture can now be worth so much more. Economists who want to disseminate their research, both inside and outside the seminar room, should invest some time in thinking about how to construct compelling and effective graphics.

An effective graph should tap into the brain’s “pre-attentive visual processing” (Few 2004; Healey and Enns 2012). Because our eyes detect a limited set of visual characteristics, such as shape or contrast, we easily combine those characteristics and unconsciously perceive them as an image. In contrast to “attentive processing” (the conscious part of perception that allows us to perceive things serially), pre-attentive processing is done in parallel and is much faster. Pre-attentive processing allows the reader to perceive multiple basic visual elements simultaneously. Here is a simple example; count the occurrences of the number 3 in the following set:

1269548523612356987458245
0124036985702069568312781
2439862012478136982173256

Now repeat the task with this set of numbers:

126954852**3**612**3**56987458245
01240**3**6985702069568**3**12781
24**3**98620124781**3**698217**3**256

■ Jonathan A. Schwabish is a Principal Analyst, Congressional Budget Office, Washington, DC. His email address is [email protected].

http://dx.doi.org/10.1257/jep.28.1.209

The instances in the second set are much easier to find because they are encoded using a different pre-attentive attribute: in this case, the intensity of boldface type.

It takes imagination to create high-quality images that illustrate data accurately and effectively and also show some understanding and appreciation of how people acquire information. Indeed, what is known as the “Picture Superiority Effect” refers to our ability to retain more information seen through pictures than through words (for example, Medina 2008; Hockley and Bancroft 2011). There are thousands of approaches to presenting data: for starters, consider the vast information on how to choose fonts, colors, styles, layouts, and chart types. Three basic principles seem especially useful.

First, show the data. People read graphs in a research report, article, or blog to understand the story being told. The data are the most important part of the graph and should be presented in the clearest way possible. But that does not mean that all of the data must be shown; indeed, many graphs show too much.

Second, reduce the clutter. Chart clutter, the use of unnecessary or distracting visual elements, will tend to reduce effectiveness. Clutter comes in dark or heavy gridlines; unnecessary tick marks, labels, or text; unnecessary icons or pictures; ornamental shading and gradients; and unnecessary dimensions. Too often graphs use textured or filled gradients when simple shades of a color could accomplish the same task. In some cases, familiar data markers (■♦●×) are used to distinguish between several data series on a graph, but when the markers intersect and overlap they end up cluttering the patterns.

Third, integrate the text and the graph. Standard research reports often suffer from the “slideshow effect,” in which the writer narrates the text elements that appear in the graph. A better model is one in which visualizations are constructed to complement the text and at the same time to contain enough information to stand alone (Corum 2013). As a simple example, legends that define or explain a line, bar, or point are often placed far from the content of the graph, off to the right or below the graph. Integrated legends (right below the title, directly on the chart, or at the end of a line) are more accessible.

These three principles embody the idea that an author should support the reader’s acquisition of information quickly and easily. By stripping out unnecessary clutter, emphasizing the data, and using certain pre-attentive attributes (for example, hue (color), size, orientation, and shape), graphs can more clearly and more effectively communicate information. However, default graph options in many statistical programs tend to add clutter and to separate text and graphs, and so researchers need to consider overriding those defaults and perhaps adding annotation to create graphs that communicate information more effectively to the reader.

This article encourages economists to think more strategically about how to visualize their data and presents some pathways to create better, more effective graphs. The next section demonstrates the principles in eight graphs remade using nothing more complicated than Excel. The discussion then addresses some types and purposes of different data visualizations, and briefly reviews some tools and sources of information that researchers can use to improve their existing graphs or to create new ones. One thing researchers should keep in mind is that graphs in research reports or articles, and even those shown in verbal presentations, are not meant for the author, but for the reader or the seminar audience. The line chart that a researcher uses in the data exploration phase (with default gridlines, tick marks, and colors) may not be the one that will best communicate the researcher’s ideas to others. Discussions of data visualization are only now making their way into economics journals and conferences, but perhaps this is just the beginning and our discipline’s understanding of the importance of good visualization will expand and grow.

Eight Graphs Transformed

Poor graphs communicate ineffectively, or even worse, provide a distorted impression of the data. This section shows how eight graphs could be redesigned to demonstrate the application of the three principles outlined above. Some decisions are subjective, of course: line thickness, series order, axis label style. Other decisions, I would argue, are just objectively better ways to convey meaning. All of the redesigned graphs were constructed in Excel and required slight variations from the program’s default settings. Garamond, a classic serif font, is used to slightly distinguish the graphs from the Journal of Economic Perspectives’s Baskerville typeface. The electronic version of the JEP often uses color, which can be an important tool in data visualization to invoke emotion, emphasize graphical elements, or simply add aesthetic value. The print JEP does not use color, but all graphs that use color in the electronic version of the JEP are designed to work in greyscale for print readers. The choice of a color palette, like the choice of a font, also can be subjective, but following some basic guidelines can improve communication. In the final section, I discuss some tools that can help with those strategies and selections.

The Line Chart

Figure 1A summarizes regression results of the correlation between the long-run unemployment rate in the United States and Supplemental Nutrition Assistance Program caseloads for four groups.1 Instead of a single line chart with multiple series or four large, separate charts, the authors have used a “small multiples” approach in which four smaller charts are grouped together. Instead of packing as much information as possible into a single graph, researchers should probably use this approach more often. However, in other ways the graphs violate the three principles outlined above.

Perhaps most importantly, a graph should emphasize the data, but the darkest and thickest line on these graphs is the 0 percent gridline. Your eye is immediately drawn to that thick, horizontal gridline rather than to the important parts of the graph, namely the coefficient line and the standard errors. Also notice that the data values in the last point of the WE and SS charts actually exceed the 15 percent data marker; the graphs fail to show all data points.

Some elements add unneeded clutter. For example, the y-axis labels and percentage signs are redundant and add clutter (there are 28 percentage signs in all!) and the tick marks on the y-axes also seem unnecessary.

1 The authors were kind enough to send me the data behind their figure for reproduction here.

Figure 1A
An Original Line Chart

[Four small line charts titled “Caseload: AO,” “Caseload: NC,” “Caseload: WE,” and “Caseload: SS”; x-axis: Years (0 to 5); y-axis: Percent change in caseload, with tick labels in 5% steps up to 15%.]

Source: Klerman and Danielson (2011).

Finally, what do AO, NC, WE, and SS mean in the figure? In this article, these terms are explained on the third and fourth pages, 15 pages before this figure is presented. It seems unfair to ask a reader to search the paper to decode the meaning of those labels. Instead, the text and the graphs can be integrated.

Figure 1B offers an alternative version of this figure. The darkest line now shows the data (in this case, the coefficient estimate). The gridlines have been lightened, but the 0 percent gridline is left slightly darker so it can provide a baseline for the series that dip below zero. Because the charts are aligned vertically and horizontally, I eliminated two sets of labels to reduce clutter and to help highlight the central message. I eliminated the percent signs and identified the unit in parentheses below the title. This last edit is essentially a style choice: there is nothing inherently wrong with using rotated text on the vertical axis, but it does require readers to turn the page sideways or tilt their heads (Robbins 2013a).

The title is now above the graph; some publishers place titles below graphs even though readers tend to start reading from the top left, move down along the left margin, and then move to the right. Repositioning the word “Caseload” (used four times in the original) into the title leaves room to spell out the abbreviations. One could, of course, argue about many of the choices illustrated here: Perhaps four sets of axis labels are better than two, or dotted standard error lines would be preferable to solid ones. But surely, a number of these changes are clear improvements on the original figure.

Figure 1B
A Revised Line Chart

[Four small line charts under the shared title “Implied Impulse Response Functions for Different Caseloads (Percent change)”; panels: Adult Only, No Cash, Welfare Cash Assistance, SSI Cash Assistance; x-axis: Years (0 to 5); y-axis from -20 to 20.]

The Clutterplot

The next chart (Figures 2A and 2B) comes from the JEP (Hanson 2012, p. 54). The explanation in the original article is as follows (emphasis added):

[The figure] plots countries’ revealed comparative advantage in office machines . . . averaged over 2006 to 2008, against the average years of schooling of the adult population in 2005 . . . China is above the regression line, indicating that its specialization in the sector is greater than one would expect given its level of education, but it is hardly an extreme outlier. Other middle-income countries, including Costa Rica, the Philippines, Malaysia, and Thailand, have larger positive residuals.

Congratulations if you already know the three-letter codes for the five countries referenced in the text. Even if you do, try to find them in the haystack of labels and dots. The redesign offers an alternative that shows the data and reduces the clutter. I eliminated all labels other than those for the five countries under discussion, which I spelled out, and I made the five data points darker, thus deemphasizing the other points but still showing the important information.

Figure 2A
A Clutterplot Example

[Scatterplot titled “Education and Exports of Office Machines”; x-axis: Years of schooling, 2005 (0 to 14); y-axis: Revealed comparative advantage (–5 to 5).]

Source: Hanson (2012).

One potential objection to naming only some of the points is the concern that some readers may want to search through the figure for individual countries or identify various outlier points. There will often be some tension between presenting enough data to tell the story and presenting additional data that some readers might want. But most journals and many individual researchers now have their own websites, so posting the complete actual data for interested readers can be relatively simple and inexpensive. If it is important to include the data within the paper itself, a supplemental table in the text or as an appendix could be a more straightforward method of presentation.
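The clutterplot redesign (mute most points, darken and name only the ones the text discusses) can be sketched in code. The article's own figures were built in Excel; the version below swaps in Python with matplotlib, and the function names, country codes, and coordinates are made-up placeholders rather than Hanson's data.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def split_points(
    points: Dict[str, Point], highlight: List[str]
) -> Tuple[Dict[str, Point], Dict[str, Point]]:
    """Separate the points to emphasize from the background points."""
    wanted = set(highlight)
    focus = {k: v for k, v in points.items() if k in wanted}
    background = {k: v for k, v in points.items() if k not in wanted}
    return focus, background


def plot_highlighted(
    points: Dict[str, Point],
    highlight: List[str],
    names: Dict[str, str],
    path: str = "scatter.png",
) -> None:
    """Draw muted background dots and dark, directly labelled focus dots."""
    # Imported here so split_points can be used without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display needed
    import matplotlib.pyplot as plt

    focus, background = split_points(points, highlight)
    fig, ax = plt.subplots()
    if background:
        bx, by = zip(*background.values())
        ax.scatter(bx, by, color="0.8", s=20)  # light gray, unlabelled
    for code, (x, y) in focus.items():
        ax.scatter([x], [y], color="black", s=30)
        # Spell out the name next to the point instead of showing a code.
        ax.annotate(names.get(code, code), (x, y),
                    xytext=(4, 4), textcoords="offset points")
    ax.set_xlabel("Years of schooling")
    ax.set_ylabel("Revealed comparative advantage")
    fig.savefig(path)
    plt.close(fig)


# Placeholder values for illustration only (not the published data).
demo = {"CHN": (7.5, 1.2), "CRI": (8.0, 2.9), "AAA": (10.0, -1.0), "BBB": (4.0, -3.5)}
focus, background = split_points(demo, ["CHN", "CRI"])
```

Calling `plot_highlighted(demo, ["CHN", "CRI"], {"CHN": "China", "CRI": "Costa Rica"})` would write a small PNG in which only the two named points are dark and labelled. The full table of points can still be published separately for readers who want every country.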

The Basic Column Chart

When it comes to bar and column charts, a first rule is to start the chart at zero. Otherwise, the differences between the columns are overemphasized. Figure 3A shows one such example of a column chart that does not start at zero. For example, if you look at the second-shortest bar, with a value just above 500, it certainly does not appear at a glance that it is half the length of the longest bar. This figure had different colors for each bar, which is not necessary but may be useful. The axis in the redesigned version, shown in Figure 3B, starts at zero and, to make room for the labels that are integrated with the chart, the figure is rotated horizontally. In this version, the second-shortest bar now appears at a glance to be about half the length of the longest bar.
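The effect of a truncated axis can be worked through with a little arithmetic. The sketch below is illustrative only: the values 500 and 1,000 and the axis origin of 400 are assumed round numbers loosely matching the description of Figure 3A, not the underlying data, and `drawn_ratio` is a hypothetical helper name.

```python
def drawn_ratio(smaller: float, larger: float, axis_start: float = 0.0) -> float:
    """Ratio of the drawn bar lengths when the value axis begins at axis_start."""
    if larger <= axis_start:
        raise ValueError("the larger value must lie above the axis start")
    return (smaller - axis_start) / (larger - axis_start)


# Assumed, rounded values for illustration (not the published data).
second_shortest, longest = 500.0, 1000.0

truncated = drawn_ratio(second_shortest, longest, axis_start=400.0)
from_zero = drawn_ratio(second_shortest, longest, axis_start=0.0)

print(f"axis starts at 400: bar drawn at {truncated:.0%} of the longest bar")
print(f"axis starts at 0:   bar drawn at {from_zero:.0%} of the longest bar")
```

Under these assumed numbers, a bar worth half the largest value is drawn at about one-sixth of the longest bar's length when the axis starts at 400, which is the overemphasis the zero-baseline rule is meant to prevent.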

Figure 2B
Revising the Clutterplot Example

[Scatterplot titled “Education and Exports of Office Machines”; x-axis: Years of schooling, 2005 (0 to 14); y-axis: Revealed comparative advantage in office machines, 2006–08 (–5 to 5); labelled points: China, Costa Rica, Malaysia, Philippines, Thailand.]

Figure 3A
The Basic Column Chart

[Column chart originally titled “Figure 2 Discounted Expected Lifetime Earnings, VN(t')”; y-axis: income in thousands, running from 400 to 1100; columns: finish no school, finish 1 yr, finish 3 yrs, grad 2.0 GPA, grad 3.0 GPA, grad 3.75 GPA.]

Source: Stinebrickner and Stinebrickner (2013).

Figure 3B
The Revised Column Chart

[Horizontal bar chart titled “Discounted Expected Lifetime Earnings, VN(t') (Income in thousands)”; x-axis from 0 to 1,200; bars: Finish no school, Finish 1 year, Finish 3 years, Graduate, 2.0 GPA, Graduate, 3.0 GPA, Graduate, 3.75 GPA.]

Source: Author’s calculations using numbers inferred from text in Stinebrickner and Stinebrickner (2013).

The 3D Chart

Figure 4A uses the now-familiar 3D effect. In such graphs, the third dimension does not plot data values, but it does add clutter to the chart and, worse, it can distort the information. Look at the far-right-hand bar, labeled 6 percent: No point of the column touches the gridline for that value. This software tool, like many others, uses perspective to give depth to the imaginary plane that runs across the top of the column, intersecting the gridline. But most readers will perceive the actual value of the column as less than 6 percent. Figure 4B shows a redesign: cancel the 3D treatment and integrate the disconnected legend with the graph. Notice that inserting the common baseline (portrayed in the original by a hovering, barely perceptible thin gray line) permits a more effective comparison among groups.

The Unbalanced Chart

The source material for Figure 5A originally appeared in an interactive visualization on the Organisation for Economic Co-operation and Development (OECD) website (http://www.oecd.org/gender/data/proportionofemployedwhoareseniormanagersbysex.htm); a static version was later reproduced in a New York Times Economix blog post (http://economix.blogs.nytimes.com/2013/04/02/comparing-the-worlds-glass-ceilings/?_r=2).

Figure 4A
A 3D Chart

[3D column chart titled “Change in real weekly wages of US-born workers by group, 1990-2006”; groups: Some High School, High School Graduate, Some College, College Graduate; series: Young (experience below 20 years) and Old (Experience above 20 years); y-axis from -6.0% to 12.0%; data labels: 0.4%, -1.2%, -1.2%, 11.3%, -5.4%, -1.3%, -3.0%, 6.0%.]

Source: Ottaviano and Peri (2008).

Figure 5A
An Unbalanced Chart

[Chart titled “Percentage of Employed Who Are Senior Managers, by Sex, 2008”; series: Women, Men; y-axis from 0% to 20%; x-axis categories: United States, New Zealand, United Kingdom, Ireland, Australia, Estonia, Belgium, Greece, Canada, Iceland, France, Italy, Netherlands, Finland, OECD average, Hungary, Spain, Israel, Slovenia, Poland, Czech Republic, Switzerland, Austria, Portugal, Norway, Slovak Republic, Germany, Sweden, Luxembourg, Turkey, Denmark, Mexico, Korea.]

Source: Author, based on OECD (no date) and Rampell (2013).

Figure 4B
Flattening a 3D Chart

[Flat column chart titled “Change in real weekly wages of US-born workers by group, 1990–2006 (Percent)”; groups: Some High School, High School Graduate, Some College, College Graduate; series labelled on the chart: Young (experience below 20 years) and Old (experience above 20 years); y-axis from -6 to 12; data labels: 0.4, -1.2, -1.2, 11.3, -5.4, -1.3, -3.0, 6.0.]

The chart is an ineffective communication tool in either version for three main The chart is an ineffective communication tool in either version for three main reasons. First, the same kinds of data are plotted using different types of encoding reasons. First, the same kinds of data are plotted using different types of encoding so that it is diffi cult to compare location (diamonds) with length (bars). Further, the so that it is diffi cult to compare location (diamonds) with length (bars). Further, the relationship between the two values is obscured: The data points for men are too far relationship between the two values is obscured: The data points for men are too far away from those for women and there is no visual connection between them. The away from those for women and there is no visual connection between them. The original versions used color—red (for women) and blue (for men) in the original versions used color—red (for women) and blue (for men) in the New York Times and orange and blue in the OECD version—which can serve to draw attention and orange and blue in the OECD version—which can serve to draw attention to or from certain elements.to or from certain elements.

Second, the columns for women take up a much larger proportion of the graph Second, the columns for women take up a much larger proportion of the graph than do the diamonds for men, overemphasizing the data for women. If the inten-than do the diamonds for men, overemphasizing the data for women. If the inten-tion is to give greater emphasis to the data for women, then a descriptive title—such tion is to give greater emphasis to the data for women, then a descriptive title—such as “Women’s Employment as Senior Managers Averaged 6 Percent in 2008”—would as “Women’s Employment as Senior Managers Averaged 6 Percent in 2008”—would have helped clarify the meaning. Furthermore, the gradient color shading in the have helped clarify the meaning. Furthermore, the gradient color shading in the columns is darker at the bottom than at the top where the data are truly encoded.columns is darker at the bottom than at the top where the data are truly encoded.

Finally, there are many gridlines, all heavy, and the percent signs on the y-axis labels are redundant. Additionally, the x-axis labels are potentially difficult to scan because they are vertical.

The chart could be redesigned in a number of ways. Paired bars could be used for men and women, or the figure could be turned into a table (Schwabish 2013a shows some other alternatives). Figure 5B shows a less-common type of visual approach. For some readers, these different types of plots may not be perceived as quickly as more commonly used graph types, such as bars or lines, but it’s instructive to remember that scatterplots, not so long ago a novelty in mainstream publishing, now appear regularly. Just as our text literacy can expand with experience and exposure, so can our graphic literacy.

The redesigned chart shown in Figure 5B has several characteristics worth noting. First, the data are encoded similarly for men and women so that the reader can better perceive the connection between the two series and compare them. Second, the title, units, and legend are integrated and placed at the top-left of the chart, which helps the reader “enter” the chart. Third, the country labels are rotated horizontally and incorporated directly onto the chart with the thin gray connecting lines helping to illustrate the comparison between data points for men and women. Finally, the average value for the OECD as a whole is an unfilled circle (in a version where black and white printing is not considered, different shades or colors could be used instead).

One potential shortcoming of the redesign is the lack of vertical gridlines. In the original, the gridlines perhaps helped distinguish more specific values. There is often a tension in this aspect of data visualization: how much of the purpose of the graph is to explain an idea and how much is to provide specific data (Few 2005; Schwabish 2013b)?

The Spaghetti Chart

If a line chart has too many series, any single trend will be obscured. Such charts are sometimes called “spaghetti charts” (Nussbaumer 2013). If too much information is plotted, readers can have difficulty pulling out the meaning of a single series or drawing an overall conclusion. Figure 6A is not an extreme example of a spaghetti chart, with only four lines. Nonetheless, consider some of its shortcomings: First, the data markers on every point make it difficult to pre-attentively follow any single series. Second, the legend is located far from the data and the order of the legend does not match the order of the lines.

One alternative to the spaghetti chart is to create smaller charts in series (sometimes called sparklines or small multiples; see Tufte 2006). Instead of a single, dense presentation of data, Figure 6B splits the information into four separate, smaller graphs to highlight the information in each line within the context of all the data. The contrast between light and dark helps to highlight specific trends and reduce clutter, as does the use of a label at either end of the main line in each set. (The y-axes are deleted, but they could be restored.) In this redesign, I have tried to emphasize the trends over time; my approach would differ if I were trying to emphasize specific numbers.

Figure 5B. Revising the Unbalanced Chart
Percentage of Employed Who Are Senior Managers, by Gender, 2008 (percent)
[Figure omitted: one row per OECD country plus the OECD average, with connected markers for women and men on a horizontal axis running from 0 to 20 percent.]

Figure 6A. A Spaghetti Chart
27. Initial DI Worker Awards by Major Cause of Disability, Calendar Years 1975-2010
[Figure omitted: four lines (Mental, Cancer, Circulatory, Musculoskeletal) with a marker on every point; vertical axis from 0% to 35%; horizontal axis from 1975 to 2010.]
Source: Social Security Advisory Board (2012).

Figure 6B. Revising the Spaghetti Chart
Initial DI Worker Awards by Major Cause of Disability, Calendar Years 1975–2010 (Percent)
[Figure omitted: four small panels (Circulatory, Mental, Musculoskeletal, Cancer), each highlighting one series against the other three, with a value label at either end of the highlighted line; horizontal axis from 1975 to 2010.]

The Pie Chart

The debate concerning the effectiveness of pie charts is among the most contentious in the field of data visualization (much of the discussion in this section is based on Few 2007). Many people love pie charts: they are familiar, easily understood, and present “part-to-whole” relationships in an obvious way. But others argue that because pie charts force readers to make comparisons using the areas of the slices or the angles formed by the slices (something that our visual perception does not accurately support), they are not an effective way to communicate information. Donut charts, in which the center of the pie is punched out, just exacerbate the problem: The empty center makes the reader estimate the angle and arrive at other qualitative part-to-whole judgments without being able to see the center where the edges meet.

Pie chart slices that form 90-degree right angles (that is, slices that form one-quarter increments) are the most familiar to our eyes. Other amounts can be far more difficult to discern. The six segments in Figure 7A, for example, are presented clockwise in alphabetical order. Group C is easily identified as being about 25 percent of the whole. If, as in Figure 7B, however, the order of the segments is positioned so that the largest starts at the 12 o’clock position, the value of Group C is not so easily apprehended. One small change obscures the information.

To test your accuracy, try to guess the values of the slices: Figure 8A shows the answers. Was your guess about Group A correct? What about Group E? When I ask this question, guesses ordinarily range between 5% and 17% for each. Figure 8A offers a different approach: Add labels that integrate the data (instead of listing percentages in the notes or attaching them to the legend).
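With the values from Figure 8A in hand, the effect of the rotation between Figures 7A and 7B can be checked numerically. The sketch below is only an illustration: it assumes both pies start at the 12 o'clock position and run clockwise, and the function names are hypothetical. Under that assumption, Group C's 25 percent slice sits exactly on the quarter lines in the alphabetical layout but not once the slices are reordered largest first.

```python
# Slice values from Figure 8A (percent of the whole).
VALUES = {"A": 10, "B": 40, "C": 25, "D": 17, "E": 7, "F": 1}


def slice_angles(order, values=VALUES):
    """Return {group: (start, end)} in degrees, clockwise from 12 o'clock."""
    angles = {}
    start = 0.0
    for group in order:
        sweep = values[group] * 360 / 100  # share of the full circle
        angles[group] = (start, start + sweep)
        start += sweep
    return angles


def on_quarter_lines(span):
    """True if both edges of a slice fall on a 0/90/180/270-degree line."""
    return all(round(edge, 6) % 90 == 0 for edge in span)


alphabetical = slice_angles("ABCDEF")   # assumed layout of Figure 7A
largest_first = slice_angles("BCDAEF")  # assumed layout of Figure 7B

for name, layout in (("7A", alphabetical), ("7B", largest_first)):
    start, end = layout["C"]
    print(f"Figure {name}: Group C spans {start:.0f} to {end:.0f} degrees; "
          f"on quarter lines: {on_quarter_lines(layout['C'])}")
```

The slice is the same size in both layouts; only its alignment with the familiar right-angle reference lines changes.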

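Figure 7A. A Pie Chart
[Figure omitted: pie chart with six slices, Group A through Group F, in alphabetical order.]

Figure 7B. A Pie Chart, Rotated
[Figure omitted: the same pie chart with the slices ordered Group B, Group C, Group D, Group A, Group E, Group F.]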


Figure 8A. A Pie Chart, Labeled
[Figure omitted: the pie chart with integrated slice labels. Group B 40%, Group C 25%, Group D 17%, Group A 10%, Group E 7%, Group F 1%.]

Figure 8B. Pie Chart Alternative: A Bar or Column Chart
Percentage of Total Sales
[Figure omitted: columns on a 0% to 50% axis, with the values written beneath as a sum: Group B 40% + Group C 25% + Group D 17% + Group A 10% + Group E 7% + Group F 1% = Total 100%.]

This approach results in what amounts to two sets of information: the labels and the values for the slices (which readers might add up to arrive at 100 percent). This defeats the very purpose of the chart, which is to provide a visual representation of the data. It might be more effective just to present this information in the form of a short table. A bar or column chart could be another effective alternative: Figure 8B takes the guesswork out of identifying the value of each group and gives a clear picture of the relationships between the various groups, both in absolute amounts and in relative differences, as well as a ranking of the group from largest to smallest. The bar or column chart representation is best suited for comparing different segments; however, it is less efficient at helping us make part-to-whole comparisons. In an attempt to add “part-to-whole” context (something that is often not that interesting in itself), the descriptive title at the top and the values and plus signs at the bottom reinforce the fact that the sum of the columns is 100 percent. (A completed graph could also delete the percent signs.)
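The ranking and the part-to-whole check described for Figure 8B can be sketched in a few lines. This is a plain-text illustration using the values shown in Figures 8A and 8B, not a reproduction of the published figure; the function name `ranked_bars` and the bar width are assumptions made for the example.

```python
# Values from Figures 8A and 8B (percentage of total sales).
SHARES = {"Group A": 10, "Group B": 40, "Group C": 25,
          "Group D": 17, "Group E": 7, "Group F": 1}


def ranked_bars(shares, width=40):
    """Return one text line per group, largest first, bar length proportional to value."""
    largest = max(shares.values())
    lines = []
    for name, value in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
        bar = "#" * round(value / largest * width)
        lines.append(f"{name:<8} {bar} {value}")
    return lines


lines = ranked_bars(SHARES)
total = sum(SHARES.values())  # the part-to-whole check: should be 100

print("\n".join(lines))
print(f"{'Total':<8} {total}")
```

Sorting before drawing gives the ranking for free, and printing the total beneath the bars plays the same role as the row of values and plus signs in the figure.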

Another point: Although pie charts typically ask us to compare all the data within a single space, they are by definition designed to facilitate part-to-whole comparisons. Thus, Figure 8C shows the true purpose of the pie chart, which is to individually compare each part to the whole (Camões 2013).

Another common (and less-than-optimal) way to present data is in a 3D pie chart. Like the 3D bar chart, such a treatment often presents the additional drawback of actually distorting the data. Whichever slice of a 3D pie chart is toward the “front” tends to look bigger, because you can also see the 3D thickness for that slice. However, slices toward the “back” of a 3D pie chart tend to look smaller, because their “thickness” is only partially visible or not visible at all (Skau 2012).

As an example of the difficulty in discerning quantities from pie charts, Figure 9A makes the reader undertake a difficult comparison of a large number of segments, not just internally, but with another similar set right beside the first.²

The process of information transfer can be simplified by use of other graphic forms, which also offer the benefit of emphasizing different aspects of the analysis. For example, the paired column chart in Figure 9B fosters within-category comparisons. The vertical orientation requires that the labels either stretch across several lines or, worse, are rotated 90 degrees. A horizontal orientation would also work in this case and is a useful approach when labels are difficult to fit in the vertical column chart layout (recall Figure 3; also see Schwabish 2013c). Also notice the (subjective) decision to omit the y-axis; the usefulness of the y-axis is doubtful with data labels placed on top of each column.

2 Note that the pie chart for 1962 does not sum to 100 percent and there is a small, unlabeled gap left at the 12 o’clock position. For the redesigned graphs in Figures 9B–9D, I added the missing 2 percent from the 1962 series to the “Other” category. Brock (2013) shows other possible alternatives to paired pie charts.

Figure 8C. Part-to-Whole Mini-Pie Charts
[Figure omitted.]

Figure 9A. Two Pie Charts for Comparison
Shares of Aggregate Income, 1962 and 2007
[Figure omitted: two pie charts of aggregate income, by source. 1962: Social Security 30%, Earnings 28%, Other 16%, Asset income 15%, Government employee pensions 6%, Private pensions 3%. 2007: Social Security 36%, Earnings 29%, Asset income 16%, Private pensions 9%, Government employee pensions 8%, Other 3%.]
Source: Social Security Administration (2009).

Figure 9B. Alternative to a Pie Chart: A Paired Column Chart
Shares of Aggregate Income, 1962 and 2009 (Percent)
[Figure omitted: paired columns for 1962 and 2009 with data labels on top of each column. Social Security 30 and 38; Earnings 28 and 29; Asset income 15 and 11; Private pensions 3 and 9; Government employee pensions 6 and 9; Other 18 and 4.]


Alternatively, the stacked bar chart in Figure 9C shows the distribution of the various groups and that the groups sum to 100 percent, while also highlighting differences from one year to the other. Finally, the slope chart in Figure 9D also shows the difference in each category from the first year to the last by pairing points on two vertical axes. Slope charts can be used for a variety of purposes including showing correlations; for example, the relationship between a state’s obesity rate and the share of people with at least a bachelor’s degree (Cairo 2013). In this example, the color contrast (or what appears as different shades of grey in the black-and-white printed version) identifies which categories increased over time (blue; darker) and those that declined (orange; lighter).
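The data preparation behind a slope chart like Figure 9D is small enough to sketch directly. The example below uses the 1962 and 2009 values shown in Figures 9B through 9D and assigns each category the color described in the text. It is an illustrative sketch, not the author's code; the function name `slope_segments` and the "gray" fallback for an unchanged category are assumptions.

```python
# Shares of aggregate income (percent) as shown in Figures 9B-9D: (1962, 2009).
SHARES = {
    "Social Security": (30, 38),
    "Earnings": (28, 29),
    "Asset income": (15, 11),
    "Private pensions": (3, 9),
    "Government employee pensions": (6, 9),
    "Other": (18, 4),
}


def slope_segments(shares):
    """Return (category, first, last, change, color) rows, largest increase first."""
    rows = []
    for category, (first, last) in shares.items():
        change = last - first
        if change > 0:
            color = "blue"      # increased over time (darker in greyscale)
        elif change < 0:
            color = "orange"    # declined over time (lighter in greyscale)
        else:
            color = "gray"      # assumption: the text does not cover this case
        rows.append((category, first, last, change, color))
    return sorted(rows, key=lambda row: row[3], reverse=True)


segments = slope_segments(SHARES)
for category, first, last, change, color in segments:
    print(f"{category:<30} {first:>3} -> {last:>3}  ({change:+d}, {color})")
```

Each row corresponds to one line in the slope chart: the two values set the endpoints on the left and right axes, and the sign of the change sets the color.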

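Figure 9C. Alternative to a Pie Chart: A Stacked Bar Chart
Shares of Aggregate Income, 1962 and 2009 (Percent)
[Figure omitted: one stacked bar per year. 1962: Social Security 30, Earnings 28, Other 18, Asset income 15, Government employee pensions 6, Private pensions 3. 2009: Social Security 38, Earnings 29, Other 4, Asset income 11, Government employee pensions 9, Private pensions 9.]

Figure 9D. Alternative to a Pie Chart: The Slope Chart
Shares of Aggregate Income, 1962 and 2009 (Percent)
[Figure omitted: one line per category joining its 1962 value on the left axis to its 2009 value on the right axis. 1962 labels: Social Security, 30; Earnings, 28; Other, 18; Asset income, 15; Government employee pensions, 6; Private pensions, 3. 2009 labels: 38, 29, 11, 9, 4.]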


Form and Function

Once we move past the various strategies for presentation, it is useful to examine the various forms and functions of data visualization. First, consider the vertical axis in Figure 10, which illustrates the connection between the two general forms of visualization. Static visualizations provide all of the information at once and are not active or moving: for example, visualizations that appear on paper are static. Interactive visualizations allow a transfer of information between the figure and the user. Animated visualizations, which move but do not necessarily permit manipulation of data points to create alternative results (like movies or online slideshows where the user can control the pace of the story), can be thought of as falling between a static and an interactive visualization. Second, consider the horizontal axis in Figure 10, which considers the function of the visualization. Explanatory visualizations bring the main results to the forefront (they “surface key findings”), to some extent helping to reveal the story (for further discussion, see Segel and Heer 2010; Kosara and Mackinlay 2013). In comparison, exploratory visualizations help users interact with a dataset or subject matter to uncover the findings themselves. Such visualizations do not generally propose a single narrative or draw out specific insights.³

3 Other mappings are of course possible and just as feasible (Bertin 1983; Harris 1996; Heer, Bostock, and Ogievetsky 2010; Kirk 2013; Kosara 2013a).

Figure 10. Data Visualization: Form and Function

Form: Static; Function: Explanatory. Often a line chart, bar chart, or infographic used to identify key findings for the reader, reinforce a specific point in the text, or show a single narrative (for example, CBO, Federal Means-Tested Transfer Programs).

Form: Interactive; Function: Explanatory. Often, an online slide deck or a static graph with an interactive hover or rollover that allows a reader to interact with a specific story (for example, World Bank, Economic Policy & External Debt).

Form: Static; Function: Exploratory. Leads readers to discover their own stories as they examine the static representation of data (for example, Moritz Stefaner, Müsli Ingredient Network).

Form: Interactive; Function: Exploratory. Promotes information transfer between user and interface; asks users to generate their own hypotheses and find their own stories (for example, OECD, Better Life Index).

Source: CBO (Congressional Budget Office), Means-Tested Transfer Program (http://www.cbo.gov/publication/43935); Moritz Stefaner, Müsli Ingredient Network (http://stefaner.eu/projects/musli-ingredient-network); World Bank, Economic Policy & External Debt (http://data.worldbank.org/topic/economic-policy-and-external-debt); and OECD, Better Life Index (http://www.oecdbetterlifeindex.org).


Economists typically live in a world of explanatory, static graphs. The graphs tend to be static and are used to reinforce a point made in accompanying text. Infographics (a longer form that tends to combine text, graphics, pictures, and icons) also typically are in that quadrant.

An example of a static explanatory figure would be one that portrays a single narrative, like the Congressional Budget Office’s Federal Means-Tested Transfer Programs infographic (http://www.cbo.gov/publication/43935), which combines images, graphs, and text to tell a specific story. An example of a static exploratory figure is Moritz Stefaner’s Müsli Ingredient Network (http://stefaner.eu/projects/musli-ingredient-network), which shows combinations of custom-ordered muesli ingredients. This approach leads readers to discover their own stories as they examine the static representation of data.

Interactive visualizations often are popular because they open the possibility of new and independent conclusions. They also enable the user to take those discoveries and produce something more explanatory. In general, the most effective interactive visualizations follow a three-step mantra: “Overview first, zoom and filter, then details-on-demand” (Shneiderman 1996). Such visualizations give a broad look at the graphic space, then allow readers to further define the space of interest, and finally permit them to capture specific details.

Perhaps the easiest explanatory interactive graph type to consider is a static graph that has an interactive hover or rollover layered on top (for example, the line graphs of economic indicators produced by the World Bank, http://data.worldbank.org/topic/economic-policy-and-external-debt). Exploratory interactive visualizations (such as OECD’s Better Life Index, http://www.oecdbetterlifeindex.org) graphically present a complete dataset and ask users to find interesting stories. Subsequent links and reports then allow readers to find the details on demand.

Tools and Resources

Standard statistical software packages can generate basic static graphs, but people who want to improve their visualizations can take a few steps to reduce clutter and move away from default layouts, gridlines, colors, and fonts. The discussion below is not comprehensive and mentions of specific products are not intended as an endorsement but may serve as a starting point. A longer list can be found at the “Resources” page of my website (http://policyviz.com/resources/). Several tools, often free, can help users to make better use of their data in analysis or as they prepare for presentation or publication.

Color

Color could be the most misused and misunderstood aspect of design (Kosara 2013b). One simple strategy for improving the finished product is simply to avoid using default palettes. The red, green, and blue default in Excel is so pervasive that just changing to a different set of colors can make a graph more appealing. Free online tools like Adobe Kuler (kuler.adobe.com/create/color-wheel), ColorBrewer2.0 (colorbrewer2.org), ColorSchemeDesigner (colorschemedesigner.com), and Colrd (colrd.com) allow users to create, modify, and export color palettes. The Instant Eyedropper tool (instant-eyedropper.com) allows users to choose colors from any image on a computer screen. Much has been written on color, but Ware (2012) offers perhaps the most comprehensive discussion. For situations such as publication in this journal, where one’s graph may appear in both color and greyscale, there are tools (especially the ColorBrewer tool) that allow one to test grayscale-consistent color palettes.
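A rough version of the greyscale check can also be run by hand. The sketch below converts each color to a single grey level using the Rec. 601 luma weights (0.299, 0.587, 0.114) and flags neighboring colors whose grey levels nearly coincide. It is a simple approximation, not the method used by ColorBrewer or any other tool named above; the hex colors, the function names, and the `min_gap` threshold of 40 are all assumptions chosen for illustration.

```python
def hex_to_rgb(color):
    """Convert '#rrggbb' to an (r, g, b) tuple of integers in 0-255."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def gray_level(color):
    """Approximate greyscale value (0-255) using Rec. 601 luma weights."""
    r, g, b = hex_to_rgb(color)
    return 0.299 * r + 0.587 * g + 0.114 * b


def too_close_pairs(palette, min_gap=40):
    """Return pairs of colors whose grey levels differ by less than min_gap."""
    levels = sorted((gray_level(c), c) for c in palette)
    return [(a[1], b[1]) for a, b in zip(levels, levels[1:])
            if b[0] - a[0] < min_gap]


# Hypothetical palettes: a red/green pair and a blue/orange pair.
red_green = ["#cc0000", "#006600"]
blue_orange = ["#1f77b4", "#ff7f0e"]

print(too_close_pairs(red_green))    # the two colors print as nearly the same grey
print(too_close_pairs(blue_orange))  # no pairs flagged
```

A palette that passes this check is not guaranteed to print well, but one that fails it is a clear candidate for revision before submitting a graph that may appear in greyscale.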

Another important note about color is that about 10 percent of the population has some form of a color vision deficiency, and many of those people have difficulties distinguishing between greens and reds (see, for example, Coady 2013). Color Oracle (colororacle.org) and Vischeck (www.vischeck.com/) are two free tools that can simulate varieties of color blindness.

Fonts

With so many type choices on most people’s computers or available online for free, it seems a shame to use the boring and overused Arial, Calibri, and Times New Roman typefaces. New fonts are designed all the time, and many old ones can give an image a fresh look. Good starting places are Font Squirrel (www.fontsquirrel.com) and Google Fonts (www.google.com/fonts). Spiekermann and Ginger (2003) and Bringhurst (2013) explain some of the science of typography and offer guides to choosing type for different purposes.

Visualization Tools

It is relatively simple to venture beyond the basic graphs available through the default settings in Stata, SAS, Excel, and other commonly available programs. The default graph in Stata, for example, has a blue background, and the first set of tick marks is not always located where the x- and y-axes intersect. Both of those conditions are easily changed. As an alternative, the open-source language R (www.r-project.org) offers more graphing capabilities. And although Excel is often dismissed as unimaginative, a variety of blogs, books, and websites provide tips and strategies to extend its capabilities (www.peltiertech.com is one).

It used to be that knowledge of HTML, JavaScript, or some other web-based programming language was a prerequisite for creating interactive visualizations. But an expanding set of available graphics tools requires no more than the ability to drag and drop. Several different file types can be imported into Tableau (www.tableausoftware.com), for example, to create a variety of interactive graphics. Custom visualizations can be built using different programming languages such as HTML, JavaScript, or Processing (http://processing.org/). The R programming language also has a series of additions that enable users to create interactive visualizations: for examples, see rCharts (http://ramnathv.github.io/rCharts/) and Shiny (http://www.rstudio.com/shiny/). The New York Times, for example, makes extensive use of the JavaScript library D3. Creator Mike Bostock (http://bost.ocks.org/mike/ and bl.ocks.org/mbostock) provides a library of D3 visualizations, and Murray (2013) offers an introduction to the language.

Layout

The workhorse of most graphic designers is the Adobe Creative Suite, which includes programs such as InDesign, Illustrator, and Photoshop. The free, open-source software Inkscape (http://inkscape.org/) is an alternative. Many books have been published on the topic of layout and design: Golombisky and Hagen (2010) offer a starting point for better understanding design techniques, and Tondreau (2009) offers a good introduction to layout.

Mapping

Formal mapping software can be quite expensive, and free online versions (from ArcGIS and ESRI, for example) are often relatively inflexible. Stata offers the “spmap” add-in (www.stata.com/support/faqs/graphics/spmap-and-maps), but the quality of the images is disappointing. StatPlanet (www.statsilk.com) is a free Flash-based program that imports data from Excel to create interactive visualizations. Interactive maps can also be constructed in Tableau, mentioned above. Another free tool, TileMill (http://www.mapbox.com/tilemill/), is HTML-based and thus may require a bit more time to learn and use, but it is slightly more flexible.

Infographic Tools

Rising interest in infographics has led to the creation of services that guide users through the design process. Like those for interactive visualizations, the new infographic packages are more user friendly than older tools. Examples include Datawrapper (http://datawrapper.de/), Infogr.am (http://infogr.am/), and Lemon.ly (http://lemon.ly/).

Resources

The amount of writing on data visualization has exploded over the past few years. The data visualization field moves fast: new work is constantly emerging, new products are constantly being released, and discussion and debates about best practices are continuous. Many books and blogs offer in-depth discussions on data visualization techniques and strategies or offer tutorials for the tools listed above.

Early and fundamental books are those by Tukey (1977), Bertin (1983), Cleveland (1993), and Tufte (2001 [1983]). Wong (2010) and Robbins (2013b) are excellent sources for classifying specific charts for specific data types. In his books, Few (2009, 2012) dedicates a bit more time to examining the cognitive theory of data visualization and effective data visualization techniques. Books by Cairo (2013) and Yau (2011, 2013) are excellent newer contributions to the field.

The number of blogs dedicated to the data visualization field is constantly growing. What follows is a very short list. Eager Eyes (eagereyes.org) is produced by Robert Kosara, a visual analysis researcher at Tableau Software and former computer science professor at UNC-Charlotte. Kosara often writes about the research side of data and information visualization and offers constructive criticism and explorations. Flowing Data (flowingdata.com) is produced by author and statistician Nathan Yau, who provides a daily showcase of visualizations from around the web. He also posts visualization tutorials, primarily in the R programming language. Perceptual Edge (perceptualedge.com) is produced by author and consultant Stephen Few, who writes about good and bad trends in data visualization and promotes best practices with a focus on the visual aspects of human cognition. Junk Charts (junkcharts.typepad.com) is where Kaiser Fung collects and offers criticism of charts that do a poor job of presenting data. Visualising Data (visualisingdata.com) author Andy Kirk details the design process and trends in the field. Storytelling with Data (storytellingwithdata.com) is Cole Nussbaumer’s blog, where she offers practical examples for creating more effective visualizations. Finally, I offer practical data visualization examples on my website (policyviz.com/) and address issues pertaining to creating more effective verbal presentations. On my companion site, HelpMeViz.com, readers can submit works in progress to seek advice and feedback from the data visualization community.

Conclusion

For economists who want readers to apprehend results quickly and accurately, presentation matters. Effective visualizations show the data to tell the story, reduce clutter to keep the focus on the important points, and integrate the text with the graphs to transfer information efficiently. With the increased flexibility of even fairly basic software programs (like Excel), it is now more cost-effective in terms of time and energy for researchers to invest some time learning and thinking about the details of graphical presentation.

To create great, effective visualizations, carefully consider the needs of your audience: the numbers, facts, or stories that will help them understand your ideas and your arguments. Consider the interfaces, static versus interactive, they will use. And pair the depth and clarity of your data, models, and writing with visualizations that are just as clear and compelling.

■ The views in this article are mine and should not be interpreted as those of the Congressional Budget Office. For more information on data visualization and presentation techniques, see my site www.PolicyViz.com. I am grateful to the editors of this journal (David Autor, Chang-Tai Hsieh, Ulrike Malmendier, and Timothy Taylor) for recognizing the importance of data visualization in the economics profession, and to Ann Norman and Annette Blanar for their painstaking proofreading and typesetting work. I also thank the authors of the graphs used in the paper and Alberto Cairo, Stephen Few, Molly Dahl, Kate Kelly, and Robert Kosara.

232 Journal of Economic Perspectives

References

Bertin, Jacques. 1983. Semiology of Graphics: Diagrams, Networks, Maps. New York, NY: Esri Press.

Bringhurst, Robert. 2013. Elements of Typographic Style: Version 4.0, 20th Anniversary Edition. Point Roberts, WA: Hartley and Marks.

Brock, Tim. 2013. “Improving on a Pair of Pie Charts.” http://datatodisplay.com/blog/chart-design/improving-pair-pie-charts/.

Cairo, Alberto. 2013. The Functional Art: An Introduction to Information Graphics and Visualization. Berkeley, CA: New Riders.

Camões, Jorge. 2013. “Finally Revealed: The Optimal Number of Categories in a Pie Chart.” http://www.excelcharts.com/blog/optimal-number-categories-pie-chart/.

Cleveland, William S. 1993. Visualizing Data. Summit, NJ: Hobart Press.

Coady, Geri. 2013. Colour Accessibility. (The Pocket Guide series: Collection 3.) Five Simple Steps. http://www.fivesimplesteps.com/products/the-pocket-guide-series-collection-three.

Corum, Jonathan. 2013. “Storytelling with Data.” Opening keynote address at the Tapestry Conference, held in Nashville, TN, February 23. http://style.org/tapestry/.

Few, Stephen. 2004. “Tapping the Power of Visual Perception.” Visual Business Intelligence Newsletter, September 4. http://www.perceptualedge.com/articles/ie/visual_perception.pdf.

Few, Stephen. 2005. “Grid Lines in Graphs are Rarely Useful.” Visual Business Intelligence Newsletter, February. http://www.perceptualedge.com/articles/dmreview/grid_lines.pdf.

Few, Stephen. 2007. “Save the Pies for Dessert.” Visual Business Intelligence Newsletter (August).

Few, Stephen. 2009. Now You See It. Burlingame, CA: Analytics Press.

Few, Stephen. 2012. Show Me the Numbers: Designing Tables and Graphs to Enlighten. 2nd ed. Burlingame, CA: Analytics Press.

Golombisky, Kim, and Rebecca Hagen. 2010. White Space is Not Your Enemy: A Beginner’s Guide to Communicating Visually through Graphic, Web & Multimedia Design. Burlington, MA: Elsevier, Inc.

Hanson, Gordon H. 2012. “The Rise of Middle Kingdoms: Emerging Economies in Global Trade.” Journal of Economic Perspectives 26(2): 41–64.

Harris, Robert L. 1996. Information Graphics: A Comprehensive Illustrated Reference. New York, NY: Oxford University Press.

Healey, Christopher G., and James T. Enns. 2012. “Attention and Visual Memory in Visualization and Computer Graphics.” IEEE Transactions on Visualization and Computer Graphics 18(7): 1170–88.

Heer, Jeffrey, Michael Bostock, and Vadim Ogievetsky. 2010. “A Tour through the Visualization Zoo.” ACM Queue, May 13. http://queue.acm.org/detail.cfm?id=1805128.

Hockley, William E., and Tyler Bancroft. 2011. “Extensions of the Picture Superiority Effect in Associative Recognition.” Canadian Journal of Experimental Psychology 65(4): 236–56.

Kirk, Andy. 2013. “Discussion: Storytelling and Success Stories.” April. http://www.visualisingdata.com/index.php/2013/04/discussion-storytelling-and-success-stories/.

Klerman, Jacob Alex, and Caroline Danielson. 2011. “The Transformation of the Supplemental Nutrition Assistance Program.” Journal of Policy Analysis and Management 30(4): 863–88.

Kosara, Robert. 2013a. “Visual Storytelling in the Age of Data.” Keynote address at the Tapestry Conference, held February in Nashville, TN. http://www.youtube.com/watch?v=qSYEjhR2AwQ&feature=youtu.be.

Kosara, Robert. 2013b. “How the Rainbow Color Map Misleads.” July 7. http://eagereyes.org/basics/rainbow-color-map.

Kosara, Robert, and Jock Mackinlay. 2013. “Storytelling: The Next Step for Visualization.” Computer (Special Issue on Cutting-Edge Research in Visualization) 46(5): 44–50.

Medina, John. 2008. Brain Rules: 12 Principles for Surviving and Thriving at Work, Home, and School. Pear Press.

Murray, Scott. 2013. Interactive Data Visualization for the Web: An Introduction to Designing with D3. Sebastopol, CA: O’Reilly Media, Inc.

Nussbaumer, Cole. 2013. “Strategies for Avoiding the Spaghetti Graph.” March 14. http://www.storytellingwithdata.com/2013/03/avoiding-spaghetti-graph.html.

Organisation for Economic Co-operation and Development. Accessed, August 2013. “Percentage of Employed Who are Senior Managers, by Sex.” http://www.oecd.org/gender/data/proportionofemployedwhoareseniormanagersbysex.htm.

Ottaviano, Gianmarco I. P., and Giovanni Peri. 2008. “Immigration and National Wages: Clarifying the Theory and the Empirics.” NBER Working Paper 14188 (July).

Rampell, Catherine. 2013. “Comparing the World’s Glass Ceilings.” Economix, April 2. http://economix.blogs.nytimes.com/2013/04/02/comparing-the-worlds-glass-ceilings/?_r=0.

Robbins, Naomi. 2013a. “How to Position Y-Axis Labels in Graphs.” February 12. http://www.forbes.com/sites/naomirobbins/2013/02/12/how-to-position-y-axis-labels-in-graphs/.

An Economist’s Guide to Visualizing Data 233

Robbins, Naomi. 2013b. Creating More Effective Graphs. 2nd ed. Chart House.

Schwabish, Jonathan. 2013a. “Mind the Gap—An Economic Remake.” The Why Axis, April. http://thewhyaxis.info/gap-remake/.

Schwabish, Jonathan. 2013b. “To Label or Not to Label? That Is the Question.” June 10. http://www.allanalytics.com/author.asp?section_id=3072&doc_id=264322.

Schwabish, Jonathan. 2013c. “Visualizing Data: Bad Labels, Easy Fix.” May 2. http://www.allanalytics.com/author.asp?section_id=3072&doc_id=262539.

Segel, Edward, and Jeffrey Heer. 2010. “Narrative Visualization: Telling Stories with Data.” IEEE Transactions on Visualization and Computer Graphics 16(6): 1139–48.

Shneiderman, Ben. 1996. “The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations.” In Proceedings of the IEEE Symposium on Visual Languages, IEEE Computer Society Press, 336–43.

Skau, Drew. 2012. “2D’s Company, 3D’s a Crowd.” http://blog.visual.ly/2ds-company-3ds-a-crowd/.

Social Security Administration. 2009. Fast Facts & Figures About Social Security, 2009. Social Security Administration (July). http://www.ssa.gov/policy/docs/chartbooks/fast_facts/2009/fast_facts09.pdf.

Social Security Advisory Board. 2012. Aspects of Disability Decision Making: Data and Materials. Social Security Administration (February), http://www.ssab.gov/Publications/Disability/GPO_Chartbook_FINAL_06122012.pdf.

Spiekermann, Erik, and E. M. Ginger. 2003. Stop Stealing Sheep & Find Out How Type Works, 2nd ed. Berkeley, CA: Adobe Press.

Stinebrickner, Ralph, and Todd Stinebrickner. 2013. “Academic Performance and College Dropout: Using Longitudinal Expectations Data to Estimate a Learning Model.” Western University CIBC Working Paper 2013-5 (July), http://economics.uwo.ca/cibc/workingpapers_docs/wp2013/Stinebrickner_Stinebrickner05.pdf.

Tondreau, Beth. 2009. Layout Essentials: 100 Design Principles for Using Grids. Beverly, MA: Rockport Publishers.

Tukey, John W. 1977. Exploratory Data Analysis. Reading, MA: Addison Wesley.

Tufte, Edward. 2001 [1983]. The Visual Display of Quantitative Information. 2nd ed. (First edition 1983). Cheshire, CT: Graphics Press.

Tufte, Edward. 2006. Beautiful Evidence. Cheshire, CT: Graphics Press.

Ware, Colin. 2012. Information Visualization: Perception for Design. 3rd ed. Waltham, MA: Morgan Kaufmann.

Wong, Dona M. 2010. The Wall Street Journal Guide to Information Graphics: The Dos and Don’ts of Presenting Data, Facts, and Figures. New York: W. W. Norton and Company.

Yau, Nathan. 2011. Visualize This: The Flowing Data Guide to Design, Visualization, and Statistics. Indianapolis, IN: Wiley Publishing, Inc.

Yau, Nathan. 2013. Data Points: Visualization that Means Something. Indianapolis, IN: Wiley Publishing, Inc.

