IPUMS-International Development Process
1. Inventory
2. Metadata Preparation
3. Data Preparation
4. Harmonization
5. Data Enhancements
6. Dissemination
1. Inventory
a) Data
b) Data dictionary
c) Census questionnaire and instructions
d) Sample design
2. Metadata Preparation
• English translation
• Data dictionaries
Original Data Dictionary (Kenya 1989)

  C006-EA-TYPE       N     13        1            RURAL           1
                                                  URBAN           2
  C007-HHOLDNUM      N     14-16     3            HHOLD-CODE      001:999
  (record type)      A     17        1

Page 2   Data Dictionary: REAL1   IMPS Version 3.1   Created: 31/10/95 11:57:21
Record Name: POP-RECORD   Record Type: 2
-------------------------------------------------------------------------------
Item (occurs)        Data  Item
  Subitem (occurs)   Type  Position  Len.  Dec.   Value Name      Values
-------------------------------------------------------------------------------
POP1                 A     18-67     50
  P00-LINENUMBER     N     18-19     2     0      LINE-NUMBER     01:49
  P10-RELATIONSHI    N     20        1     0      HEAD            1
                                                  SPOUSE          2
                                                  SON-OFHEAD      3
                                                  DAU-OFHEAD      4
                                                  FATHER          5
                                                  MOTHER          6
                                                  OTHERRELATIVE   7
                                                  NOTRELATED      8
                                                  NR              9
  P11-SEX            N     21        1     0      MALE            1
                                                  FEMALE          2
                                                  NR              9
  P12-AGE            N     22-23     2     0      UNDERONE        00
                                                  YEARGIVEN       01:96
                                                  OVR97           97
                                                  NR              99
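A dictionary like this is what makes the raw file readable: each item is a slice of a fixed-width record. A minimal sketch, assuming 1-based inclusive positions as printed in the Kenya 1989 dictionary. The function name and the sample record are hypothetical and are not part of the IPUMS tooling.

```python
# Field layout copied from the Kenya 1989 person-record dictionary.
# Positions are 1-based and inclusive, as printed in the dictionary.
KENYA_1989_PERSON_FIELDS = {
    "P00-LINENUMBER": (18, 19),
    "P10-RELATIONSHI": (20, 20),
    "P11-SEX": (21, 21),
    "P12-AGE": (22, 23),
}


def parse_record(line, fields):
    """Slice one fixed-width record into a dict of raw string values."""
    return {name: line[start - 1:end] for name, (start, end) in fields.items()}


# Hypothetical record: columns 1-17 are filler, followed by
# line number 01, relationship 1 (HEAD), sex 2 (FEMALE), age 34.
sample = "X" * 17 + "01" + "1" + "2" + "34"
parsed = parse_record(sample, KENYA_1989_PERSON_FIELDS)
print(parsed)
```

Values stay as strings here so that codes such as 00 (UNDERONE) keep their leading zeros.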
Original Data Dictionary (Romania 1992)

Line  Item   Data type and  Signification and values
No.          item len.
1.    MAPA   N 6            010001-47XXXX number of the file, where:
                              - 01-47 is the code of the county
                              - 0001-XXXX is the code of the census sector within the county
2.    CLAD   N 3            The order number of the building in the file
3.    LOC    N 3            The order number of the dwelling within the building
4.    RT     N 1            Record type value: 4
5.    P00    N 1            The order number of the household in the dwelling
6.    PNR    N 2            The order number of the person in the household
7.    P01    N 2            Relationship with the household head:
                              . household head 1
                              . husband / wife 2
                              . son / daughter 3
                              . son in law / daughter in law 4
                              . grandson / granddaughter 5
                              . father / mother 6
                              . grandfather / grandmother 7
                              . brother / sister 8
                              . brother in law / sister in law 9
                              . father in law / mother in law 10
                              . other relative 11
                              . non-related person 20
8.    P05    N 1            Situation at the census moment:
                              . present 1
                              . temporally absent from the household:
                                  - left in other place of the country 2
                                  - left abroad 3
                              . absent for a long time:
                                  - for working 4
                                  - for studies 5
                                  - other reason 6
Original Data Dictionary (China 1982)

=======================================
year: 1982, sample: 1%, record: individual, variable: age
Length: 3  Start: 7
Age in years
  0..99
=======================================
year: 1982, sample: 1%, record: individual, variable: race
Length: 2  Start: 10
Ethnicity
  01: Han        21: Va          41: Tajik
  02: Mongol     22: She         42: Nu
  03: Hui        23: Gaoshan     43: Uzbek
  04: Tibetan    24: Lahu        44: Russian
  05: Uygur      25: Sui         45: Ewenkei
  06: Miao       26: Dongxiang   46: Benglong
  07: Yi         27: Naxi        47: Baoan
  08: Zhuang     28: Jingpo      48: Yugur
  09: Bouyi      29: Kirgiz      49: Gin
  10: Korean     30: Tu          50: Tatar
  11: Man        31: Daur        51: Derung
  12: Dong       32: Mulam       52: Orogen
  13: Yao        33: Qiang       53: Hezhen
  14: Bai        34: Bulang      54: monba
  15: Tujia      35: Salar       55: Lhoba
  16: Hani       36: Maonan      56: Jino
  17: Kazak      37: Gelao       97: Other Unidentified
  18: Dai        38: Xibe        98: Naturalized Foreigners
  19: Li         39: Achang
  20: Lisu       40: Pumi
=======================================
year: 1982, sample: 1%, record: individual, variable: regstats
Length: 1  Start: 12
Registration Status
  1: Residing and registered here
  2: Residing here over 1 year, but registered elsewhere.
  3: Residing here less than 1 year, absent from the registration place 1 year or more.
  4: Living here with registration unsettled
  5: Used to reside here; is now abroad with no local registration
=======================================
Original Data Dictionary (Mexico 1990)

25  CLAVE DE PARENTESCO
    CATALOGO DE PARENTESCO (CATPAREN.TXT)
    PRIMER DIGITO IGUAL A:
      1 JEFE(A)
      2 ESPOSA(O) O COMPAÑERA(O)
      3 HIJO(A)
      4 SIRVIENTE
      5 SIN PARENTESCO
      6 OTRO PARENTESCO
      7 PERSONA SOLA
      9 PARENTESCO NO ESPECIFICADO
26  SEXO
      1 HOMBRE
      2 MUJER
27  EDAD
    AÑOS CUMPLIDOS
      999 EDAD NO ESPECIFICADA
28  LUGAR DE NACIMIENTO
    CATALOGO DE PAISES (CATPAISE.TXT)
      001..032 ENTIDADES DEL PAIS
      033..099 ENTIDAD INSUFICIENTEMENTE ESPECIFICADO
      100..998 OTRO PAIS
      999 NO ESPECIFICO LUGAR DE NACIMIENTO
29  LUGAR DE RESIDENCIA ANTERIOR
    CATALOGO DE PAISES (CATPAISE.TXT)
      001..032 ENTIDADES DEL PAIS
      033..099 ENTIDAD INSUFICIENTEMENTE ESPECIFICADO
      100..998 OTRO PAIS
      999 NO ESPECIFICO LUGAR DE RESIDENCIA ANTERIOR
Variable Labels File – IPUMS Metadata (Costa Rica 2000)

Rec  Var      Col  Wid  Value  Value_Label                          Value_Label_Original           Freq       Svar
P    relate   36   2           Relationship to household head       P01-Parentesco con el jefe(a)             CR00A400
                        1      Head (male or female)                Jefe o jefa                    960,098
                        2      Spouse or partner                    Esposo(a)/compañera            680,217
                        3      Child or stepchild                   Hijo(a)/hijastro               1,763,230
                        4      Son-in-law or daughter-in-law        Yerno o nuera                  23,644
                        5      Grandchild                           Nieto(a)                       140,300
                        6      Parent or parent in-law              Padres o suegros               44,393
                        7      Other relative                       Otro familiar                  117,223
                        8      Domestic servant or relative         Serv.Domestico o su familiar   11,884
                        9      Other non-relative                   Otro no familiar               69,190
P    sex      38   1           Sex                                  P02-Sexo                                  CR00A401
                        1      Male                                 Masculino                      1,902,614
                        2      Female                               Femenino                       1,907,565
P    bpl      39   1           Place of birth                       P04-Lugar de Nacimiento                   CR00A403
                        1      In this same canton                  Mismo canton                   2,303,784
                        2      In another canton                    Otro canton                    1,209,934
                        3      In another country                   Otro pais                      296,461
P    ethnic   40   2           Ethnic group                         P06-Etnia                                 CR00A408
                        1      Indigenous                           Indigena                       63,876
                        2      Black or Afrocostarican              Negra o Afrocostarricense      72,784
                        3      Asian                                China                          7,873
                        4      None of the above                    Ninguna anterior               3,568,471
                        9      Unknown                              Ignorado                       97,175
P    indigsp  42   2           Speaks Indigenous language           P06b-Habla lengua indigena                CR00A410
                        1      Yes, speaks Indigenous lang          Si habla lengua indígena       15,806
                        2      No, does not speak Indigenous lang   No habla lengua indígena       13,768
                        9      Unknown                              Ignorado                       3,554
                        10     [no label]                                                          3,777,051
2. Metadata Preparation
• English translation
• Data dictionaries
• Questionnaires and instructions
5. Number of Rooms
How many rooms are used for sleeping without counting hallways? _____ Write the number
Without counting the hallways or bathrooms how many total rooms are in this dwelling? Count the kitchen
_____Write the number
6. Access to water
Read all of the options until you get an affirmative answer. Circle only one answer
1 Running water inside the dwelling
2 Running water outside the dwelling but on the land
3 Running water from a public faucet or hydrant
4 Running water that is carried from another dwelling
5 Tanked in by truck
6 Water from a well, river, lake, stream or other
Answers 3, 4, 5, 6 continue with number 8
7. Water supply
How many days of the week is water available? Circle only one answer
1 Daily
2 Every third day
3 Twice a week
4 Once a week
5 Occasionally
Text of Census Questionnaire (Mexico 2000)
5. Number of Rooms
<svar v="MX00A016" a="all">
How many rooms are used for sleeping without counting hallways?
<i1> _____ Write the number </i1>
</svar>
<svar v="MX00A017" a="all">
Without counting the hallways or bathrooms how many total rooms are in this dwelling? Count the kitchen
<i1> _____ Write the number </i1>
</svar>
<svar v="MX00A018" a="all">
6. Access to water
Read all of the options until you get an affirmative answer. Circle only one answer
<i1>
1 Running water inside the dwelling
2 Running water outside the dwelling but on the land
3 Running water from a public faucet or hydrant
4 Running water that is carried from another dwelling
5 Tanked in by truck
6 Water from a well, river, lake, stream or other
</i1>
Answers 3, 4, 5, 6 continue with number 8
</svar>
XML-Tagged Census Questionnaire (Mexico 2000)
Source variable MX00A016
Source variable MX00A017
Source variable MX00A018 (water access)
<svar v="MX00A016 MX00A017" a="all">
5. Number of Rooms
Room is the space in the dwelling delimited, normally, by fixed walls and roofs of any material. In the first question, only the rooms utilized for sleeping are considered. In the second, include all the rooms in the dwelling: bedrooms, living room, dining room, living room-dining room, kitchen, living room ["estancia"], study, and service room.
Storerooms, granaries, commercial areas, stores, garages, or others, which are regularly used for sleeping, should be counted as bedrooms and be included in the total number of rooms.
</svar>
<svar v="MX00A018" a="all">
6. Access to Water
This question distinguishes dwellings which have piped water from those that get water from a different source.
[Depiction of this completed question on the enumeration form, and a related drawing]
</svar>
<svar v="MX00A019 MX00A020" a="all">
7. Water Supply
When there is piped water within the dwelling or outside of the dwelling but within the property, first ask how often they receive it, and if they receive it daily, if they receive it during all or part of the day.
[Depiction of this completed question on the enumeration form]
</svar>
Source variable MX00A018
XML-Tagged Census Instructions (Mexico 2000)
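The svar tags are what let a program pull the questionnaire and instruction text that belongs to each source variable. A minimal sketch of that lookup, assuming the tagged text can be wrapped in a single root element and parsed as XML. The function name and the wrapper element are hypothetical, the excerpt is shortened, and this is not the actual IPUMS software.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Shortened excerpt of the tagged instructions shown above.
TAGGED = """
<svar v="MX00A016 MX00A017" a="all"> 5. Number of Rooms
In the first question, only the rooms utilized for sleeping are considered. </svar>
<svar v="MX00A018" a="all"> 6. Access to Water
This question distinguishes dwellings which have piped water from those that get water from a different source. </svar>
"""


def text_by_source_variable(tagged):
    """Map each source variable named in an svar tag to its tagged text."""
    # The fragment has no single root, so wrap it in a hypothetical <doc> element.
    root = ET.fromstring(f"<doc>{tagged}</doc>")
    found = defaultdict(list)
    for svar in root.iter("svar"):
        # Collect all nested text and collapse runs of whitespace.
        text = " ".join("".join(svar.itertext()).split())
        # One tag can list several source variables in its v attribute.
        for name in svar.get("v", "").split():
            found[name].append(text)
    return dict(found)


result = text_by_source_variable(TAGGED)
for name in sorted(result):
    print(name, "->", result[name][0][:40])
```

A block tagged with two variables, such as MX00A016 and MX00A017, is returned under both names.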
3. Data Preparation
• Data reformatting
Reformat Rectangular Sample (Brazil 1980)
(Person records only; household data duplicated on person records)

Rectangular (as received):
  geography housing person (head)
  geography housing person (child)
  geography housing person (child)
  geography housing person (head)
  geography housing person (spouse)
  geography housing person (child)
  geography housing person (child)

Hierarchical (reformatted):
  geography housing
    person (head)
    person (child)
    person (child)
  geography housing
    person (head)
    person (spouse)
    person (child)
    person (child)
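The rectangular-to-hierarchical step can be sketched in a few lines. This is a minimal illustration, not the production reformatting program. It assumes each input row is already split into a household part and a person part, and that consecutive rows with identical household data belong to the same household. The "H" and "P" record-type letters and the data values are hypothetical.

```python
from itertools import groupby

# Hypothetical rectangular rows: (household data, person data).
rectangular = [
    ("geo-A housing-A", "person (head)"),
    ("geo-A housing-A", "person (child)"),
    ("geo-A housing-A", "person (child)"),
    ("geo-B housing-B", "person (head)"),
    ("geo-B housing-B", "person (spouse)"),
]


def to_hierarchical(rows):
    """Emit one household record, then its person records, per household."""
    out = []
    # groupby only merges adjacent rows, which matches a file sorted by household.
    for household, members in groupby(rows, key=lambda row: row[0]):
        out.append(("H", household))
        out.extend(("P", person) for _, person in members)
    return out


hierarchical = to_hierarchical(rectangular)
for record in hierarchical:
    print(record)
```

A real file would more safely group on an explicit household identifier than on equality of the duplicated household fields.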
Reformat Dwelling-Household-Person Sample (Chile 1992)
(Separate dwelling and household records)

As received:
  dwelling
    household
      person (head)
      person (spouse)
      person (child)
    household
      person (head)
      person (child)
  dwelling
    household
      person (head)
      person (spouse)

Reformatted:
  dwelling household
    person (head)
    person (spouse)
    person (child)
  dwelling household
    person (head)
    person (child)
  dwelling household
    person (head)
    person (spouse)
Merge Separate Household and Person Files (Brazil 2000)

Household File:
  serial 001 geog & housing
  serial 002 geog & housing
  serial 003 geog & housing

Person File:
  serial 001 head
  serial 001 spouse
  serial 002 head
  serial 002 child
  serial 003 head

Merged:
  serial 001 household
  serial 001 head
  serial 001 spouse
  serial 002 household
  serial 002 head
  serial 002 child
  serial 003 household
  serial 003 head
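The merge amounts to interleaving the two files on the shared serial number. A minimal sketch using the serials from the slide. The function name and data structures are hypothetical, and a production merge would stream sorted files rather than hold them in memory.

```python
from collections import defaultdict

# Household file: serial -> household data (labels as on the slide).
households = {
    "001": "geog & housing",
    "002": "geog & housing",
    "003": "geog & housing",
}

# Person file: (serial, person) in file order.
persons = [
    ("001", "head"),
    ("001", "spouse"),
    ("002", "head"),
    ("002", "child"),
    ("003", "head"),
]


def merge_files(households, persons):
    """Return household records, each followed by its person records."""
    by_serial = defaultdict(list)
    for serial, person in persons:
        by_serial[serial].append(person)
    merged = []
    for serial in sorted(households):
        merged.append((serial, "household"))
        merged.extend((serial, person) for person in by_serial.get(serial, []))
    return merged


merged = merge_files(households, persons)
for record in merged:
    print("serial", *record)
```

Persons whose serial has no household record are silently dropped in this sketch; a real process would want to report them.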
3. Data Preparation
• Data reformatting
• Draw samples
• Confidentiality measures
• Convert source variables to input
Input Variables – Data
MX2000  MX00A018  H 49

Original Source Variable                                                   IPUMSI Input Variable
Code  Label                                                                Code  Label
B     Not specified {21,807}                                               0     NIU
1     Piped water inside the dwelling {1,138,262}                          1     Piped water, inside dwelling
2     Piped water outside dwelling, but within property {697,912}          2     Piped water, outside dwelling, within property
3     Piped water from a public tap (or hydrant) {68,212}                  3     Piped water, from a public tap
4     Piped water brought in from another dwelling {52,041}                4     Piped water, brought from another dwelling
5     Tanked in by truck {46,147}                                          5     Tanked in by truck
6     Water from a well, river, lake, stream or other {287,654}            6     From a well, river, lake, stream or other
7     [undocumented] {3}                                                   9     Unknown
8     [undocumented] {1}                                                   9     "
9     [undocumented] {6}                                                   9     "
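The source-to-input conversion is a plain code lookup. A minimal sketch using the MX00A018 codes and frequencies from the table above. The dictionary and function names are hypothetical, and source codes are kept as strings because "B" (blank) is not numeric.

```python
from collections import Counter

# Source code -> IPUMSI input code, as listed in the table.
MX00A018_RECODE = {
    "B": 0,  # Not specified -> NIU
    "1": 1, "2": 2, "3": 3, "4": 4, "5": 5, "6": 6,
    "7": 9, "8": 9, "9": 9,  # undocumented codes -> Unknown
}

# Source frequencies, copied from the braces in the table.
SOURCE_FREQ = {
    "B": 21807, "1": 1138262, "2": 697912, "3": 68212, "4": 52041,
    "5": 46147, "6": 287654, "7": 3, "8": 1, "9": 6,
}


def to_input_code(source_code):
    """Translate one source code; an unlisted code raises KeyError."""
    return MX00A018_RECODE[source_code]


# Frequencies of the input variable implied by the recode.
input_freq = Counter()
for code, freq in SOURCE_FREQ.items():
    input_freq[to_input_code(code)] += freq

print(dict(input_freq))
```

Raising on an unlisted code, rather than defaulting to Unknown, keeps undocumented values from slipping through unnoticed.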
Input Variables – Description

MX00A018 Water source
Universe: Not collective households.
Description: Source of water used by the household.
Questionnaire:
  6. Access to water
  Read all of the options until you get an affirmative answer. Circle only one answer
  1 Running water inside the dwelling
  2 Running water outside the dwelling but on the land
  3 Running water from a public faucet or hydrant
  4 Running water that is carried from another dwelling
  5 Tanked in by truck
  6 Water from a well, river, lake, stream or other
  Answers 3, 4, 5, 6 continue with number 8
Instructions:
  6. Access to Water
  This question distinguishes dwellings which have piped water from those that get water from a different source.

Assigned by computer
Developed by researchers
Assembled by computer from XML markups
4. Harmonization
• Data
• Correspondence tables
Source variable  Sample
CN82A403         China 1982
CO73A411         Colombia 1973
KN89A413         Kenya 1989
MX70A402         Mexico 1970
US90A425         U.S.A. 1990
Correspondence Table – Marital Status
MARST Marital Status

gen  code label                      CN82A403         CO73A411            KN89A413       MX70A402                US90A425
1    100  SINGLE/NEVER MARRIED       1=never married  4=single            1=single       9=single                6=never married
2    200  MARRIED/IN UNION
     210  Married (not specified)    2=married        2=married           3=monogamous                           1=married
     211  Civil                                                                          3=only civil
     212  Religious                                                                      4=only religious
     213  Civil and religious                                                            2=civil and religious
     214  Polygamous                                                      3=polygamous
     220  Consensual union                            1=free union                       5=free union
3    300  SEPARATED/DIVORCED                          3=sep. or divorced
     310  Separated                                                       6=separated    8=separated             3=separated
     321  Legally separated
     322  De facto separated
     330  Divorced                   4=divorced                           5=divorced     7=divorced              4=divorced
4    400  WIDOWED                    3=widowed        5=widowed           4=widowed      6=widowed               5=widowed
9    999  UNKNOWN/MISSING            0=missing        6=unknown           B=blank        1=unknown

General Codes (gen column)
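A correspondence table is in effect one lookup per sample, from source code to harmonized detailed code. A minimal sketch for two of the samples, with the column assignment as read from the table above. In the table the general code matches the first digit of the detailed code (100 gives 1, 210 gives 2, 999 gives 9), so the sketch derives it that way; treat that as an observation about this table, not a documented rule. The names are hypothetical.

```python
# Source code -> detailed MARST code, per sample (codes kept as strings).
MARST_CORRESPONDENCE = {
    "CN82A403": {"1": 100, "2": 210, "3": 400, "4": 330, "0": 999},
    "MX70A402": {
        "9": 100, "3": 211, "4": 212, "2": 213, "5": 220,
        "8": 310, "7": 330, "6": 400, "1": 999,
    },
}


def harmonize(source_variable, source_code):
    """Return (general code, detailed code) for one source value."""
    detailed = MARST_CORRESPONDENCE[source_variable][source_code]
    general = detailed // 100  # first digit of the three-digit code
    return general, detailed


print(harmonize("MX70A402", "3"))
print(harmonize("CN82A403", "0"))
```

Users who only need the four broad categories read the general code; the detailed code keeps distinctions such as civil versus religious marriage where a sample records them.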
4. Harmonization
• Data
• Correspondence tables
• Supplemental programming
<programming>
BRA1970
  inctot=p25..28*12;
  if (p25..28 = 9999) inctot=0;
  if (age<10) inctot=99999999;
BRA1980
  inctot=p64..72 + p87..95 + p103..111 + p112..120 + p121..129 + p130..138;
BRA1991
  if (p139..146=BBBBBBBB)inctot=0.
  if (p139..146 > 9999997 && p139..146 < 99999999)inctot=9999997;
  if (p139..146=99999999)inctot=9999998;
  if (age<10) inctot=99999999;
BRA2000
  if (p310..315 = 0 and age < 10) inctot=99999999;
  if (p310..315 = BBBBBB && age > 9)inctot=9999998;
COL1973
  if(age < 10) inctot=9999999;
  if(p55..59 = 99999) inctot=9999998;
  if(p55..59 = BBBBB) inctot=0;
USA1960, USA1970, USA1980, USA1990
  if (p154..159=999999)inctot=9999999;
MEX1970
  if (age < 12) inctot=9999999;
MEX2000
  if (p170..175 = 999999) inctot=9999998;
</programming>
Supplementary Variable Programming (INCTOT)
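To make the notation concrete, here is a sketch of the COL1973 rules in Python. Several points are assumptions, not statements about the IPUMS system: that p55..59 denotes person-record columns 55 to 59, that B stands for a blank column, that the rules apply in the order listed with later ones overriding earlier ones, and that an ordinary numeric value passes through unchanged when no rule fires. The function name is hypothetical.

```python
def inctot_col1973(age, income_field):
    """Apply the three COL1973 rules to one person.

    income_field is the raw five-character source field, with "B"
    standing in for a blank column as in the excerpt.
    """
    # Assumption: a plain numeric field is carried over as-is.
    inctot = int(income_field) if income_field.isdigit() else None

    # Rules in the order listed in the excerpt; later rules override.
    if age < 10:
        inctot = 9999999
    if income_field == "99999":
        inctot = 9999998
    if income_field == "BBBBB":
        inctot = 0
    return inctot


print(inctot_col1973(35, "01200"))
print(inctot_col1973(5, "00000"))
print(inctot_col1973(35, "99999"))
print(inctot_col1973(35, "BBBBB"))
```

The excerpt does not say how conflicts between rules are resolved, so the override order here should be checked against the real system before relying on it.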
4. Harmonization
• Data
• Correspondence tables
• Supplemental programming
• Documentation
• Integration
• Mark-up for web delivery
XML-Tagged Variable Text (Literacy)

<vardesc>
<var> LIT </var>
<desc> LIT indicates whether or not the respondent could read and write in any language. A person is typically considered literate if they can both read and write. All other persons are illiterate, including those who can either read or write but cannot do both. </desc>
<comp> Some samples provided more specific criteria than others with respect to the level of ability that should constitute literacy. Typically, the instructions appear to be aimed at distinguishing persons who have memorized how to write their signature or recognize certain words from those that can truly write and comprehend text they read. In 1999 Vietnam, all persons with 5 or more years of schooling are automatically considered literate. </comp>
<comp.bra> All Brazilian censuses consistently stipulated that to be considered literate a person must be able to read and write a simple note in any language. Persons are not literate if they can only write their name or if they once learned to read and write but have since forgotten. </comp.bra>
<comp.chn> The Chinese census instructions supplied explicit criteria for defining literate and semi-literate persons, who are combined in the data as "illiterate." The instructions stated that illiterate and semi-literate persons were those who knew fewer than 1500 words and could not read "simple language books and newspapers or write a simple message." </comp.chn>

Variable Name
Description
General Comparability
Comparability Brazil
Comparability China
5. Data Enhancements
• Data editing
• Consistency edits
• Hot-deck imputation
Missing Data Allocation Script (Occupation variable, USA)

OCC allocated when 975, 996, 998

          categ1     categ2     categ3     categ4   categ5   categ6
empstat   (10-19)    (20-29)    (30-39)
classwkr  (10-19)    (20-29)    (99)
sex       (1)        (2)
race      (100-199)  (200-299)  (300-899)
age       (10-19)    (20-29)    (30-39)    (40-49)  (50-59)  (60-120)

5-dimensional table, 324 cells
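The 324 cells are the product of the category counts (3 x 3 x 2 x 3 x 6). A minimal sketch of a sequential hot deck built on those cells, assuming that a record needing allocation takes the OCC value of the most recent valid donor in the same cell. The donor rule, the function names, the occ_allocated flag, and the donor value 123 are hypothetical; only the category ranges and the trigger codes come from the script above.

```python
from math import prod

# Category ranges copied from the allocation script (inclusive bounds).
BINS = {
    "empstat": [(10, 19), (20, 29), (30, 39)],
    "classwkr": [(10, 19), (20, 29), (99, 99)],
    "sex": [(1, 1), (2, 2)],
    "race": [(100, 199), (200, 299), (300, 899)],
    "age": [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59), (60, 120)],
}
ALLOCATE_WHEN = {975, 996, 998}  # OCC values that trigger allocation


def cell_of(record):
    """Return the record's cell as a tuple of category indexes, or None."""
    key = []
    for var, ranges in BINS.items():
        idx = next(
            (i for i, (lo, hi) in enumerate(ranges) if lo <= record[var] <= hi),
            None,
        )
        if idx is None:
            return None  # value falls outside every category
        key.append(idx)
    return tuple(key)


def hot_deck(records):
    """Fill flagged OCC values from the latest donor in the same cell."""
    last_donor = {}
    out = []
    for rec in records:
        rec = dict(rec)
        cell = cell_of(rec)
        if rec["occ"] in ALLOCATE_WHEN:
            if cell in last_donor:
                rec["occ"] = last_donor[cell]
                rec["occ_allocated"] = True
        elif cell is not None:
            last_donor[cell] = rec["occ"]
        out.append(rec)
    return out


n_cells = prod(len(ranges) for ranges in BINS.values())
people = [
    {"empstat": 10, "classwkr": 20, "sex": 1, "race": 100, "age": 35, "occ": 123},
    {"empstat": 12, "classwkr": 22, "sex": 1, "race": 150, "age": 38, "occ": 996},
]
allocated = hot_deck(people)
print(n_cells, allocated[1]["occ"])
```

A record with no earlier donor in its cell keeps its original code in this sketch; a production system would need a fallback such as a coarser table.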
5. Data Enhancements
• Data editing
• Consistency edits
• Hot-deck imputation
• Family interrelationship “pointers”
IPUMS “Pointer” Variables (Simple household)

                                                Spouse’s  Mother’s  Father’s
Pernum  Relate  Age  Sex     Marst    Chborn    Location  Location  Location
1       head    46   male    married  n/a       2         0         0
2       spouse  44   female  married  3         1         0         0
3       aunt    77   female  widow    7         0         0         0
4       child   15   female  single   0         0         2         1
5       child   13   female  single   n/a       0         2         1
6       child   11   male    single   n/a       0         2         1
IPUMS “Pointer” Variables (Complex household)

                                                          Spouse’s  Mother’s  Father’s
Pernum  Relationship  Age  Sex     Marst      Chborn      Location  Location  Location
1       head          53   female  separated  6           0         0         0
2       child         28   male    single     n/a         0         1         0
3       child         22   male    single     n/a         0         1         0
4       child         21   male    single     n/a         0         1         0
5       child         25   female  married    2           6         1         0
6       child-in-law  28   male    married    n/a         5         0         0
7       grandchild    3    male    single     n/a         0         5         6
8       grandchild    1    male    single     n/a         0         5         6
9       non-relative  32   female  separated  2           0         0         0
10      non-relative  10   male    single     n/a         0         9         0
11      non-relative  5    female  single     n/a         0         9         0
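Each pointer holds the Pernum of the person's spouse, mother, or father within the household, with 0 meaning none is present. The sketch below does not construct pointers; it only checks a finished set for internal consistency: spouse pointers must be reciprocal, a mother must be female and a father male, and a parent must be older than the child. The household is the complex example as reconstructed from the slide. The field and function names are hypothetical, and the checks are illustrative, not the IPUMS linking rules.

```python
from collections import namedtuple

Person = namedtuple("Person", "pernum relate age sex spouse_loc mother_loc father_loc")

# Complex household, values as reconstructed from the slide.
HOUSEHOLD = [
    Person(1, "head", 53, "female", 0, 0, 0),
    Person(2, "child", 28, "male", 0, 1, 0),
    Person(3, "child", 22, "male", 0, 1, 0),
    Person(4, "child", 21, "male", 0, 1, 0),
    Person(5, "child", 25, "female", 6, 1, 0),
    Person(6, "child-in-law", 28, "male", 5, 0, 0),
    Person(7, "grandchild", 3, "male", 0, 5, 6),
    Person(8, "grandchild", 1, "male", 0, 5, 6),
    Person(9, "non-relative", 32, "female", 0, 0, 0),
    Person(10, "non-relative", 10, "male", 0, 9, 0),
    Person(11, "non-relative", 5, "female", 0, 9, 0),
]


def check_pointers(household):
    """Return a list of human-readable problems; empty if consistent."""
    by_num = {p.pernum: p for p in household}
    problems = []
    for p in household:
        if p.spouse_loc and by_num[p.spouse_loc].spouse_loc != p.pernum:
            problems.append(f"{p.pernum}: spouse pointer is not reciprocal")
        for loc, sex, label in (
            (p.mother_loc, "female", "mother"),
            (p.father_loc, "male", "father"),
        ):
            if not loc:
                continue  # 0 means no such person in the household
            parent = by_num[loc]
            if parent.sex != sex:
                problems.append(f"{p.pernum}: {label} pointer has wrong sex")
            if parent.age <= p.age:
                problems.append(f"{p.pernum}: {label} is not older than child")
    return problems


problems = check_pointers(HOUSEHOLD)
print(problems)
```

Breaking one link, for example setting the spouse pointer of person 6 to 0, makes the check report person 5 as having a non-reciprocal spouse pointer.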
6. Dissemination
• Documentation system
• Preferences and dynamic content delivery
• Data extraction system
• Sample, variable, and case selection
• General and detailed variables
• Advanced extract features