SJTU CMGPD 2012 Methodological Lecture

Day 9: Kinship

Ancestry identifiers: Specific patrilineal ancestors

• In the Basic file…
  – FATHER_ID
  – GRANDFATHER_ID
• In the Kinship file…
  – F_ID_1 – same as FATHER_ID
  – F_ID_2 – same as GRANDFATHER_ID
  – F_ID_3 – Great-grandfather
  – F_ID_4 – Great-great-grandfather

Ancestry identifiers: Specific patrilineal ancestors

• Wives of paternal ancestors
  – M_ID_1 – Mother
    • Same as MOTHER_ID in Basic
  – M_ID_2 – Paternal grandmother
    • Father’s mother (fm)
  – M_ID_3 – Paternal great-grandmother
    • ffm
  – M_ID_4 – Paternal great-great-grandmother
    • fffm

Ancestry identifiers: Inferred ancestors

• Most identifiers refer to actual individuals observed in the dataset
• In some cases, the existence of a common ancestor whose death predated the earliest available register is inferred.
  – Based on relationship codes
  – Brothers in the earliest available register are inferred to have a common father.
  – Cousins in the earliest available register are inferred to have a common grandfather.
• For grouping purposes, an identifier is assigned that doesn’t refer to anyone observed in the dataset
  – No corresponding PERSON_ID
• FATHER_ID_IMPUTED, GRANDFATHER_ID_IMPUTED are flags indicating that the IDs don’t refer to anyone observed in the dataset
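
A minimal sketch of how the imputed flags might be used, on invented toy data. It assumes FATHER_ID_IMPUTED is coded 0/1 and that "-99" marks a missing FATHER_ID; check both against the codebook before relying on them. The names sibset_size and father_observed are hypothetical.

```stata
* Toy data: variable names follow the slide, all values are invented
clear
input str4 PERSON_ID str4 FATHER_ID byte FATHER_ID_IMPUTED
"p1" "f1" 0
"p2" "f1" 0
"p3" "x9" 1
"p4" "x9" 1
"p5" "-99" 0
end

* Grouping works on FATHER_ID whether or not the father was ever observed
bysort FATHER_ID: generate sibset_size = _N if FATHER_ID != "-99"

* Only non-missing, non-imputed IDs can be looked up as a PERSON_ID
* (assumption: the flag is 1 when the ID is imputed, 0 otherwise)
generate byte father_observed = FATHER_ID != "-99" & FATHER_ID_IMPUTED == 0

count if father_observed
```

Imputed IDs stay useful for grouping siblings, but any merge on PERSON_ID to pull in the father's own characteristics should be limited to records with father_observed == 1.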

Distributions of men by numbers of descendants

[Figure: line plot. x-axis: Number of descendants (0 to 20); y-axis: Number of men (log scale, 1 to 10000); series: Sons, Grandsons, Great-grandsons, Great-great-grandsons]

* Sons: count distinct males by FATHER_ID
use "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from ICPSR\ICPSR_27063\DS0001\27063-0001-Data.dta" if SEX == 2 & PRESENT
bysort PERSON_ID: keep if _n == 1
keep FATHER_ID
keep if FATHER_ID != "-99"
bysort FATHER_ID: generate sons = _N
bysort FATHER_ID: keep if _n == 1
rename FATHER_ID PERSON_ID
save Sons, replace

* Grandsons: count distinct males by GRANDFATHER_ID
use "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from ICPSR\ICPSR_27063\DS0001\27063-0001-Data.dta" if SEX == 2 & PRESENT
bysort PERSON_ID: keep if _n == 1
keep GRANDFATHER_ID
keep if GRANDFATHER_ID != "-99"
bysort GRANDFATHER_ID: generate grandsons = _N
bysort GRANDFATHER_ID: keep if _n == 1
rename GRANDFATHER_ID PERSON_ID
save Grandsons, replace

* Great-grandsons: F_ID_3 is merged in from the Kinship file (DS0004)
use "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from ICPSR\ICPSR_27063\DS0001\27063-0001-Data.dta" if SEX == 2 & PRESENT
bysort PERSON_ID: keep if _n == 1
merge 1:1 RECORD_NUMBER using "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from ICPSR\ICPSR_27063\DS0004\27063-0004-Data.dta", keepusing(F_ID_3) keep(match master)
keep F_ID_3
keep if F_ID_3 != "-99"
* Keep the identifier from its third character onward
replace F_ID_3 = substr(F_ID_3,3,.)
bysort F_ID_3: generate ggrandsons = _N
bysort F_ID_3: keep if _n == 1
rename F_ID_3 PERSON_ID
save GGrandsons, replace

* Great-great-grandsons: F_ID_4 is merged in from the Kinship file (DS0004)
use "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from ICPSR\ICPSR_27063\DS0001\27063-0001-Data.dta" if SEX == 2 & PRESENT
bysort PERSON_ID: keep if _n == 1
merge 1:1 RECORD_NUMBER using "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from ICPSR\ICPSR_27063\DS0004\27063-0004-Data.dta", keepusing(F_ID_4) keep(match master)
keep F_ID_4
keep if F_ID_4 != "-99"
* Keep the identifier from its third character onward
replace F_ID_4 = substr(F_ID_4,3,.)
bysort F_ID_4: generate gggrandsons = _N
bysort F_ID_4: keep if _n == 1
rename F_ID_4 PERSON_ID
save GGGrandsons, replace

* Men first observed in 1810 or earlier, with descendant counts attached (0 if none)
use "C:\Users\Cameron Campbe\Documents\Baqi\CMGPD-LN from ICPSR\ICPSR_27063\DS0001\27063-0001-Data.dta" if SEX == 2 & PRESENT
bysort PERSON_ID (YEAR): keep if _n == 1 & YEAR <= 1810
keep PERSON_ID
merge 1:1 PERSON_ID using Sons, keep(match master)
replace sons = 0 if sons == .
drop _merge
merge 1:1 PERSON_ID using Grandsons, keep(match master)
replace grandsons = 0 if grandsons == .
drop _merge
merge 1:1 PERSON_ID using GGrandsons, keep(match master)
replace ggrandsons = 0 if ggrandsons == .
drop _merge
merge 1:1 PERSON_ID using GGGrandsons, keep(match master)
replace gggrandsons = 0 if gggrandsons == .
drop _merge

replace sons = 20 if sons >= 20
bysort sons: generate first_in_sons = _n == 1
bysort sons: generate sons_number = _N
label variable sons_number "Sons"

replace grandsons = 20 if grandsons >= 20
bysort grandsons: generate first_in_grandsons = _n == 1
bysort grandsons: generate grandsons_number = _N
label variable grandsons_number "Grandsons"

replace ggrandsons = 20 if ggrandsons >= 20
bysort ggrandsons: generate first_in_ggrandsons = _n == 1
bysort ggrandsons: generate ggrandsons_number = _N
label variable ggrandsons_number "Great-grandsons"

replace gggrandsons = 20 if gggrandsons >= 20
bysort gggrandsons: generate first_in_gggrandsons = _n == 1
bysort gggrandsons: generate gggrandsons_number = _N
label variable gggrandsons_number "Great-great-grandsons"

twoway line sons_number sons if first_in_sons, sort yscale(log) ///
    || line grandsons_number grandsons if first_in_grandsons, sort ///
    || line ggrandsons_number ggrandsons if first_in_ggrandsons, sort ///
    || line gggrandsons_number gggrandsons if first_in_gggrandsons, sort ///
    ||, scheme(s1mono) xtitle("Number of descendants") ytitle("Number of men") ///
    ylabel(1 10 100 1000 10000)

Kinship variables for grouping: Uses

• Controlling for kin group membership
  – Via random-effects models
  – Alongside village, household, other levels
  – Multiple levels are computationally demanding
    • Often need tricks to collapse observations or otherwise reduce the dataset
• Computation of explanatory variables
  – Aggregate measures of kin network status to use as right-hand side variables
• Units of analysis in their own right
  – See yesterday
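
One way to build an aggregate measure of kin network status for use on the right-hand side, sketched on invented toy data. The indicator has_position is hypothetical and stands in for whatever status variable is of interest. FOUNDER_ID could be swapped for any of the grouping variables.

```stata
* Toy data: has_position is a hypothetical 0/1 status indicator
clear
input str2 PERSON_ID str2 FOUNDER_ID int YEAR byte has_position
"a1" "A" 1800 1
"a2" "A" 1800 0
"a3" "A" 1800 0
"b1" "B" 1800 0
"b2" "B" 1800 1
end

* Kin-group totals within each register year
bysort FOUNDER_ID YEAR: egen kin_positions = total(has_position)
bysort FOUNDER_ID YEAR: generate kin_size = _N

* Subtract the index individual so the measure describes his kin, not himself
generate other_kin_positions = kin_positions - has_position
generate other_kin_size = kin_size - 1

list, sepby(FOUNDER_ID)
```

Leaving the index individual out keeps the kin measure from being mechanically correlated with his own status when both appear in the same model.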

Kinship variables for grouping: Ascending order of kin distance

• FOUNDER_ID
  – Descent from a common male ancestor in the registers
• FOUNDER_INFERRED_ID
  – Descent from a common male ancestor inferred from relationship codes in the earliest available register
• UNIQUE_YI_HU
  – Descent from members of the same yihu in the earliest available register
• UNIQUE_GROUP
  – Descent from members of adjacent yihu with the same surname in the earliest available register

Numbers and average sizes of units

                        Units   Obs. per unit   Individuals per unit
FOUNDER_ID              25540              59                     10
FOUNDER_INFERRED_ID*    28832              52                      9
UNIQUE_YI_HU             2688             563                     99
UNIQUE_GROUP             1063            1423                    250
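
A sketch of how counts like these could be tabulated for one grouping variable, using invented toy data. This is not necessarily how the figures above were produced, and real data would first need records with missing identifiers dropped. The names unit_tag, person_tag, obs_per_unit and persons_per_unit are hypothetical.

```stata
* Toy data: two kin groups, three individuals, six observations
clear
input str2 PERSON_ID str2 FOUNDER_ID int YEAR
"a1" "A" 1800
"a1" "A" 1803
"a2" "A" 1803
"b1" "B" 1800
"b1" "B" 1803
"b1" "B" 1806
end

* Tag one record per unit and one record per individual within a unit
egen byte unit_tag = tag(FOUNDER_ID)
egen byte person_tag = tag(FOUNDER_ID PERSON_ID)

* Observations and distinct individuals in each unit
bysort FOUNDER_ID: generate obs_per_unit = _N
bysort FOUNDER_ID: egen persons_per_unit = total(person_tag)

* Number of units, then mean sizes across units
count if unit_tag
summarize obs_per_unit persons_per_unit if unit_tag
```

Restricting to unit_tag gives each unit equal weight in the averages, so large groups do not dominate the means.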

Kinship variables for grouping: FOUNDER_ID

• PERSON_ID of earliest male ancestor located in the registers.
• Most narrowly-defined grouping variable
  – Based on descent from a single observed individual.
• Many extinctions
  – Within one or two generations
  – Causes average size of groups defined by FOUNDER_ID to rise over time
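
A rough sketch for spotting groups that vanish before the end of the record, on invented toy data. Disappearance from the registers is used here as a proxy for extinction, which is an assumption: gaps in the surviving registers or departure from the population could produce the same pattern. The names first_year, last_year and gone_before_end are hypothetical.

```stata
* Toy data: group A is last seen in 1792, group B continues to 1798
clear
input str2 FOUNDER_ID int YEAR
"A" 1789
"A" 1792
"B" 1789
"B" 1792
"B" 1795
"B" 1798
end

* First and last register year in which each group appears
bysort FOUNDER_ID: egen first_year = min(YEAR)
bysort FOUNDER_ID: egen last_year = max(YEAR)

* Flag groups whose last appearance precedes the final year in the data
summarize YEAR, meanonly
generate byte gone_before_end = last_year < r(max)

* One record per group for the tabulation
egen byte unit_tag = tag(FOUNDER_ID)
tabulate gone_before_end if unit_tag
```

Comparing last_year - first_year across groups gives a sense of how quickly the smaller lines drop out.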

[Figure: histogram. x-axis: Number of observations with same FOUNDER_ID (0 to 200); y-axis: Fraction]

bysort FOUNDER_ID: generate founder_id_obs = _N
bysort FOUNDER_ID: generate first_in_founder_id = _n == 1
replace founder_id_obs = 200 if founder_id_obs > 200
histogram founder_id_obs if first_in_founder_id, width(10) scheme(s1mono) ///
    xtitle("Number of observations with same FOUNDER_ID") fraction

bysort FOUNDER_ID YEAR: generate founder_id_obs_year = _N
bysort FOUNDER_ID YEAR: keep if _n == 1
collapse founder_id_obs_year, by(YEAR)
line founder_id_obs_year YEAR, scheme(s1mono) ///
    ytitle("Mean number of observations per FOUNDER_ID") ylabel(0(2)12)

[Figure: line plot. x-axis: Year (1750 to 1900); y-axis: Mean number of observations per FOUNDER_ID (0 to 12)]

Kinship variables for grouping: FOUNDER_INFERRED_ID

• Uses earliest available inferred ancestor
  – Based on relationship codes in earliest available register
• Useful for grouping records in earliest registers
  – Until 1789, relationships were to head of yihu, not linghu.
  – Allowed for inference of common ancestry
• Average size of groups defined by FOUNDER_INFERRED_ID increases over time because of extinction of smaller groups

bysort FOUNDER_INFERRED_ID: generate founder_id_obs = _N
bysort FOUNDER_INFERRED_ID: generate first_in_founder_id = _n == 1
replace founder_id_obs = 200 if founder_id_obs > 200
histogram founder_id_obs if first_in_founder_id, width(10) scheme(s1mono) ///
    xtitle("Number of observations with same FOUNDER_INFERRED_ID") fraction

[Figure: histogram. x-axis: Number of observations with same FOUNDER_INFERRED_ID (0 to 200); y-axis: Fraction]

bysort FOUNDER_INFERRED_ID YEAR: generate founder_id_obs_year = _N
bysort FOUNDER_INFERRED_ID YEAR: keep if _n == 1
collapse founder_id_obs_year, by(YEAR)
line founder_id_obs_year YEAR, scheme(s1mono) ///
    ytitle("Mean number of observations per FOUNDER_INFERRED_ID") ylabel(0(2)12)

[Figure: line plot. x-axis: Year (1750 to 1900); y-axis: Mean number of observations per FOUNDER_INFERRED_ID (0 to 12)]

Kinship variables for grouping: UNIQUE_YI_HU

• Descendants of members of the same yihu in the earliest available register.
• Clusters are much larger than the ones defined by FOUNDER_ID or FOUNDER_INFERRED_ID

[Figure: histogram. x-axis: Number of observations in UNIQUE_YI_HU (0 to 500); y-axis: Fraction]

bysort UNIQUE_YI_HU YEAR: generate founder_id_obs_year = _N
bysort UNIQUE_YI_HU YEAR: keep if _n == 1
collapse founder_id_obs_year, by(YEAR)
line founder_id_obs_year YEAR, scheme(s1mono) ///
    ytitle("Mean number of observations per UNIQUE_YI_HU") ylabel(0(5)60)

[Figure: line plot. x-axis: Year (1750 to 1900); y-axis: Mean number of observations per UNIQUE_YI_HU (0 to 60)]

Kinship variables for grouping: UNIQUE_GROUP

• Descendants of members of consecutive yihu in earliest available register who have same surname.
• Most stable over time in terms of size and number
  – Ideal for analysis of change over the long term

bysort UNIQUE_GROUP YEAR: generate founder_id_obs_year = _N
bysort UNIQUE_GROUP YEAR: keep if _n == 1
collapse founder_id_obs_year, by(YEAR)
line founder_id_obs_year YEAR, scheme(s1mono) ///
    ytitle("Mean number of observations per UNIQUE_GROUP") ylabel(0(10)130)

[Figure: line plot. x-axis: Year (1750 to 1900); y-axis: Mean number of observations per UNIQUE_GROUP (0 to 130)]
