SJTU CMGPD 2012 Methodological Lecture
Day 9: Kinship
Ancestry identifiers: Specific patrilineal ancestors
• In the Basic file…
  – FATHER_ID
  – GRANDFATHER_ID
• In the Kinship file…
  – F_ID_1 – same as FATHER_ID
  – F_ID_2 – same as GRANDFATHER_ID
  – F_ID_3 – Great-grandfather
  – F_ID_4 – Great-great-grandfather
Ancestry identifiers: Specific patrilineal ancestors
• Wives of paternal ancestors
  – M_ID_1 – Mother
    • Same as MOTHER_ID in Basic
  – M_ID_2 – Paternal grandmother
    • Father’s mother (fm)
  – M_ID_3 – Paternal great-grandmother
    • ffm
  – M_ID_4 – Paternal great-great-grandmother
    • fffm
Ancestry identifiers: Inferred ancestors
• Most identifiers refer to actual individuals observed in the dataset
• In some cases, the existence of a common ancestor whose death predated the earliest available register is inferred.
  – Based on relationship codes
  – Brothers in the earliest available register are inferred to have a common father.
  – Cousins in the earliest available register are inferred to have a common grandfather.
• For grouping purposes, an identifier is assigned that doesn’t refer to anyone observed in the dataset
  – No corresponding PERSON_ID
• FATHER_ID_IMPUTED and GRANDFATHER_ID_IMPUTED are flags indicating that the IDs don’t refer to anyone observed in the dataset
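The role of the imputed flags can be sketched on a few invented records. This is only an illustration: the toy values are made up, and the coding of FATHER_ID_IMPUTED as 1 for an inferred placeholder is an assumption that should be checked against the codebook. The "-99" missing code and string IDs follow the code used elsewhere in this lecture.

```stata
* Toy records standing in for the Basic file (all values invented).
clear
input str6 PERSON_ID str6 FATHER_ID byte FATHER_ID_IMPUTED
"100001" "900001" 1
"100002" "900001" 1
"100003" "100001" 0
"100004" "-99"    0
end

* Inferred fathers still work for grouping: brothers share a FATHER_ID
* even when that ID has no matching PERSON_ID.
bysort FATHER_ID: generate n_sibs = _N if FATHER_ID != "-99"

* Assumption: the flag equals 1 when the ID is an inferred placeholder.
* Only non-imputed fathers can be looked up as records in the dataset.
generate byte father_observed = FATHER_ID != "-99" & FATHER_ID_IMPUTED == 0

list, sepby(FATHER_ID)
```

In this sketch the two brothers are grouped through the placeholder ID, but only the third record has a father whose own records could be merged in.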
Distributions of men by numbers of descendants
[Figure: number of men (log scale, 1 to 10000) by number of descendants (0 to 20), with separate lines for sons, grandsons, great-grandsons, and great-great-grandsons]
* Count sons per father (one record per male)
use "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from ICPSR\ICPSR_27063\DS0001\27063-0001-Data.dta" if SEX == 2 & PRESENT
bysort PERSON_ID: keep if _n == 1
keep FATHER_ID
keep if FATHER_ID != "-99"
bysort FATHER_ID: generate sons = _N
bysort FATHER_ID: keep if _n == 1
rename FATHER_ID PERSON_ID
save Sons, replace

* Count grandsons per grandfather
use "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from ICPSR\ICPSR_27063\DS0001\27063-0001-Data.dta" if SEX == 2 & PRESENT
bysort PERSON_ID: keep if _n == 1
keep GRANDFATHER_ID
keep if GRANDFATHER_ID != "-99"
bysort GRANDFATHER_ID: generate grandsons = _N
bysort GRANDFATHER_ID: keep if _n == 1
rename GRANDFATHER_ID PERSON_ID
save Grandsons, replace

* Count great-grandsons per great-grandfather (F_ID_3 comes from the Kinship file)
use "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from ICPSR\ICPSR_27063\DS0001\27063-0001-Data.dta" if SEX == 2 & PRESENT
bysort PERSON_ID: keep if _n == 1
merge 1:1 RECORD_NUMBER using "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from ICPSR\ICPSR_27063\DS0004\27063-0004-Data.dta", ///
    keepusing(F_ID_3) keep(match master)
keep F_ID_3
keep if F_ID_3 != "-99"
* Drop the first two characters so the ID can be matched to PERSON_ID later
replace F_ID_3 = substr(F_ID_3,3,.)
bysort F_ID_3: generate ggrandsons = _N
bysort F_ID_3: keep if _n == 1
rename F_ID_3 PERSON_ID
save GGrandsons, replace

* Count great-great-grandsons per great-great-grandfather
use "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from ICPSR\ICPSR_27063\DS0001\27063-0001-Data.dta" if SEX == 2 & PRESENT
bysort PERSON_ID: keep if _n == 1
merge 1:1 RECORD_NUMBER using "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from ICPSR\ICPSR_27063\DS0004\27063-0004-Data.dta", ///
    keepusing(F_ID_4) keep(match master)
keep F_ID_4
keep if F_ID_4 != "-99"
replace F_ID_4 = substr(F_ID_4,3,.)
bysort F_ID_4: generate gggrandsons = _N
bysort F_ID_4: keep if _n == 1
rename F_ID_4 PERSON_ID
save GGGrandsons, replace
* Men first observed in or before 1810, with their descendant counts attached
use "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from ICPSR\ICPSR_27063\DS0001\27063-0001-Data.dta" if SEX == 2 & PRESENT
bysort PERSON_ID (YEAR): keep if _n == 1 & YEAR <= 1810
keep PERSON_ID
merge 1:1 PERSON_ID using Sons, keep(match master)
replace sons = 0 if sons == .
drop _merge
merge 1:1 PERSON_ID using Grandsons, keep(match master)
replace grandsons = 0 if grandsons == .
drop _merge
merge 1:1 PERSON_ID using GGrandsons, keep(match master)
replace ggrandsons = 0 if ggrandsons == .
drop _merge
merge 1:1 PERSON_ID using GGGrandsons, keep(match master)
replace gggrandsons = 0 if gggrandsons == .
drop _merge
* Top-code each count at 20, then count the men at each value
replace sons = 20 if sons >= 20
bysort sons: generate first_in_sons = _n == 1
bysort sons: generate sons_number = _N
label variable sons_number "Sons"

replace grandsons = 20 if grandsons >= 20
bysort grandsons: generate first_in_grandsons = _n == 1
bysort grandsons: generate grandsons_number = _N
label variable grandsons_number "Grandsons"

replace ggrandsons = 20 if ggrandsons >= 20
bysort ggrandsons: generate first_in_ggrandsons = _n == 1
bysort ggrandsons: generate ggrandsons_number = _N
label variable ggrandsons_number "Great-grandsons"

replace gggrandsons = 20 if gggrandsons >= 20
bysort gggrandsons: generate first_in_gggrandsons = _n == 1
bysort gggrandsons: generate gggrandsons_number = _N
label variable gggrandsons_number "Great-great-grandsons"
twoway line sons_number sons if first_in_sons, sort yscale(log) ///
    || line grandsons_number grandsons if first_in_grandsons, sort ///
    || line ggrandsons_number ggrandsons if first_in_ggrandsons, sort ///
    || line gggrandsons_number gggrandsons if first_in_gggrandsons, sort ///
    ||, scheme(s1mono) xtitle("Number of descendants") ytitle("Number of men") ///
    ylabel(1 10 100 1000 10000)
Kinship variables for grouping: Uses
• Controlling for kin group membership
  – Via random-effects models
  – Alongside village, household, other levels
  – Multiple levels are computationally demanding
    • Often need tricks to collapse observations or otherwise reduce the dataset
• Computation of explanatory variables
  – Aggregate measures of kin network status to use as right-hand side variables
• Units of analysis in their own right
  – See yesterday
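The second use, aggregate measures of kin network status, might look like the following minimal sketch. The toy records and the indicator has_position are hypothetical and not variables in the dataset; the idea is only to show a leave-one-out share computed within FOUNDER_ID and YEAR.

```stata
* Toy records (invented); has_position is a hypothetical 0/1 status indicator.
clear
input str4 FOUNDER_ID int YEAR str4 PERSON_ID byte has_position
"F1" 1800 "P1" 1
"F1" 1800 "P2" 0
"F1" 1800 "P3" 0
"F2" 1800 "P4" 0
"F2" 1800 "P5" 1
"F3" 1800 "P6" 0
end

* Size of the kin group in each register year, and how many members hold the status.
bysort FOUNDER_ID YEAR: egen kin_n = count(has_position)
bysort FOUNDER_ID YEAR: egen kin_with_position = total(has_position)

* Share of OTHER group members with the status (the index person is excluded).
* Left missing when the person has no other kin in the group that year.
generate other_kin = kin_n - 1
generate kin_position_share = (kin_with_position - has_position) / other_kin if other_kin > 0

list, sepby(FOUNDER_ID)
```

Excluding the index person keeps the measure from simply restating his own status. The same pattern should work with any of the grouping variables in place of FOUNDER_ID.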
Kinship variables for grouping: Ascending order of kin distance
• FOUNDER_ID
  – Descent from a common male ancestor in the registers
• FOUNDER_INFERRED_ID
  – Descent from a common male ancestor inferred from relationship codes in the earliest available register
• UNIQUE_YI_HU
  – Descent from members of the same yihu in the earliest available register
• UNIQUE_GROUP
  – Descent from members of adjacent yihu with the same surname in the earliest available register
Numbers and average sizes of units
Grouping variable        Units    Obs. per unit    Individuals per unit
FOUNDER_ID               25540    59               10
FOUNDER_INFERRED_ID*     28832    52               9
UNIQUE_YI_HU             2688     563              99
UNIQUE_GROUP             1063     1423             250
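Counts of this kind can be computed for any grouping variable. The sketch below uses invented records and FOUNDER_ID as the example; it is one possible approach and not necessarily how the figures in the table were produced.

```stata
* Toy records (invented): two founder groups observed over several registers.
clear
input str4 FOUNDER_ID str4 PERSON_ID int YEAR
"F1" "P1" 1789
"F1" "P1" 1792
"F1" "P2" 1792
"F2" "P3" 1789
"F2" "P3" 1792
"F2" "P3" 1795
end

* Tag one record per unit and one record per individual within each unit.
egen byte unit_tag = tag(FOUNDER_ID)
egen byte person_tag = tag(FOUNDER_ID PERSON_ID)

* Observations and distinct individuals in each unit.
bysort FOUNDER_ID: generate obs_per_unit = _N
bysort FOUNDER_ID: egen persons_per_unit = total(person_tag)

* Number of units, then the averages across units.
count if unit_tag
summarize obs_per_unit if unit_tag
summarize persons_per_unit if unit_tag
```

Restricting the summaries to one tagged record per unit averages over units, so large units are not weighted by their number of observations.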
Kinship variables for grouping: FOUNDER_ID
• PERSON_ID of earliest male ancestor located in the registers.
• Most narrowly-defined grouping variable
  – Based on descent from a single observed individual.
• Many extinctions
  – Within one or two generations
  – Causes average size of groups defined by FOUNDER_ID to rise over time
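A rough way to look at extinction is to compare each group's last observed year with the last register year in the data. The sketch below uses invented records, and treating "last seen before the final register" as extinction is an assumption: a group may also drop out of observation for other reasons.

```stata
* Toy records (invented): F1 disappears after 1792, F2 survives to the last register.
clear
input str4 FOUNDER_ID int YEAR
"F1" 1789
"F1" 1792
"F2" 1789
"F2" 1792
"F2" 1795
"F2" 1798
end

* First and last register year in which each founder group appears.
bysort FOUNDER_ID (YEAR): generate first_year = YEAR[1]
bysort FOUNDER_ID (YEAR): generate last_year = YEAR[_N]

* Assumption: a group last seen before the final register year is treated as extinct.
summarize YEAR, meanonly
generate byte extinct = last_year < r(max)

* One row per group with its observed span.
bysort FOUNDER_ID: keep if _n == 1
generate span = last_year - first_year
list FOUNDER_ID first_year last_year span extinct
```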
[Figure: histogram of the number of observations with the same FOUNDER_ID (0 to 200); y-axis: Fraction]
bysort FOUNDER_ID: generate founder_id_obs = _N
bysort FOUNDER_ID: generate first_in_founder_id = _n == 1
replace founder_id_obs = 200 if founder_id_obs > 200
histogram founder_id_obs if first_in_founder_id, width(10) scheme(s1mono) ///
    xtitle("Number of observations with same FOUNDER_ID") fraction
bysort FOUNDER_ID YEAR: generate founder_id_obs_year = _N
bysort FOUNDER_ID YEAR: keep if _n == 1
collapse founder_id_obs_year, by(YEAR)
line founder_id_obs_year YEAR, scheme(s1mono) ///
    ytitle("Mean number of observations per FOUNDER_ID") ylabel(0(2)12)
[Figure: mean number of observations per FOUNDER_ID (0 to 12) by year (1750 to 1900)]
Kinship variables for grouping: FOUNDER_INFERRED_ID
• Uses earliest available inferred ancestor
  – Based on relationship codes in earliest available register
• Useful for grouping records in earliest registers
  – Until 1789, relationships were recorded relative to the head of the yihu, not the linghu.
  – Allowed for inference of common ancestry
• Average size of groups defined by FOUNDER_INFERRED_ID increases over time because of extinction of smaller groups
bysort FOUNDER_INFERRED_ID: generate founder_id_obs = _N
bysort FOUNDER_INFERRED_ID: generate first_in_founder_id = _n == 1
replace founder_id_obs = 200 if founder_id_obs > 200
histogram founder_id_obs if first_in_founder_id, width(10) scheme(s1mono) ///
    xtitle("Number of observations with same FOUNDER_INFERRED_ID") fraction
[Figure: histogram of the number of observations with the same FOUNDER_INFERRED_ID (0 to 200); y-axis: Fraction]
bysort FOUNDER_INFERRED_ID YEAR: generate founder_id_obs_year = _N
bysort FOUNDER_INFERRED_ID YEAR: keep if _n == 1
collapse founder_id_obs_year, by(YEAR)
line founder_id_obs_year YEAR, scheme(s1mono) ///
    ytitle("Mean number of observations per FOUNDER_INFERRED_ID") ylabel(0(2)12)
[Figure: mean number of observations per FOUNDER_INFERRED_ID (0 to 12) by year (1750 to 1900)]
Kinship variables for grouping: UNIQUE_YI_HU
• Descendants of members of the same yihu in the earliest available register.
• Clusters are much larger than the ones defined by FOUNDER_ID or FOUNDER_INFERRED_ID
[Figure: histogram of the number of observations in UNIQUE_YI_HU (0 to 500); y-axis: Fraction]
bysort UNIQUE_YI_HU YEAR: generate founder_id_obs_year = _N
bysort UNIQUE_YI_HU YEAR: keep if _n == 1
collapse founder_id_obs_year, by(YEAR)
line founder_id_obs_year YEAR, scheme(s1mono) ///
    ytitle("Mean number of observations per UNIQUE_YI_HU") ylabel(0(5)60)
[Figure: mean number of observations per UNIQUE_YI_HU (0 to 60) by year (1750 to 1900)]
Kinship variables for grouping: UNIQUE_GROUP
• Descendants of members of consecutive yihu in the earliest available register who have the same surname.
• Most stable over time in terms of size and number
  – Ideal for analysis of change over the long term
bysort UNIQUE_GROUP YEAR: generate founder_id_obs_year = _N
bysort UNIQUE_GROUP YEAR: keep if _n == 1
collapse founder_id_obs_year, by(YEAR)
line founder_id_obs_year YEAR, scheme(s1mono) ///
    ytitle("Mean number of observations per UNIQUE_GROUP") ylabel(0(10)130)
[Figure: mean number of observations per UNIQUE_GROUP (0 to 130) by year (1750 to 1900)]