Date post: | 28-Mar-2015 |
Upload: | julia-elliott |
Housekeeping
• Fire alarm: LOUD continuous ringing
– Turn right down corridor
– Down stairs
– Gather on Oxford Road side of building
• Men’s and Women’s toilets
– Turn right, toilets at end of corridor
Using the hierarchy of the government surveys
Jo Wathan
Centre for Census and Survey Research
Economic and Social Data Service (Government Data)
ESDS Using Hierarchy: v.06/04 3
ESDS Government
• Part of the wider Economic and Social Data Service, ESRC funded data dissemination and support service.
• ESDS is headed by UK Data Archive, also involves MIMAS and CCSR at the University of Manchester and ISER at the University of Essex
• ESDS Government, headed by CCSR.
• Supports the large scale, continuous, cross-sectional surveys collected by ONS and NatCen
• Data dissemination carried out by UKDA
• Value added services and user support carried out by CCSR
This afternoon…
• What is hierarchical data?
• What is the research purpose of hierarchical data?
• What hierarchy is available in ESDS Government datasets?
• Working with hierarchy in SPSS and Stata
• Practical exercise
What is hierarchy?
• Data which can be analysed at more than one level, where smaller levels are nested within higher levels
• Most commonly seen in the form of household data, where information is collected on all individuals within the household
– Data contains a variable indicating which household an individual lives in
– Data can be analysed at the household level or the individual level
– Often possible to analyse at the family level too
• Other forms of hierarchy are available, e.g. the sub-individual level (information per hospital stay, per crime reported)
Compared with flat files…
• Contextual information may be present, e.g. individual asked about size of household, but:
– Information collected from only one level
– Not usually appropriate to use data at other levels
– Not usually possible to create additional derived variables at other levels
– E.g. information collected from one individual within household
Hierarchical data: conceptually
Household 1: North West, Social rented
– Person 1: HoH, Female, 28, GCSE, P/T Work, No LTILL
– Person 2: Son of HoH, Male, 12, N/A, N/A, No LTILL
Household 2: Wales, Owner occupier
– Person 1: HoH, Male, 33, Degree, F/T Employee, No LTILL
– Person 2: Spouse of HoH, Female, 31, Degree, P/T Employee, No LTILL
– Person 3: Parent of HoH, Female, 72, No quals, Econ Inactive, LTILL
More complex hierarchy…
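[Diagram: four nested levels, top to bottom]
Households: Household 1, Household 2
Families: Family 1, Family 2, Family 3
Individuals: HoH, Son of HoH, HoH, Wife of HoH, Mother of HoH
In-patient stays: In patient 1, In patient 2, In patient 1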
What does the data look like?
Flattened data (GHS)
What does the data look like (2)
Multiple tables (FES)
Household.por
Jobmain.por
Use the hierarchy to…
• Better describe the household
• Describe the household context of an individual
• Look at intra-household differences (& sameness)
Describing the household
e.g. Is the household deprived / in poverty?
• Equivalising income (e.g. FRS)
– Need information on total income (all members, not just Household Reference Person)
– Need information on household composition
• Identifying workless households
– E.g. Gregg and Wadsworth (1999)
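A household-level indicator such as "workless" has to be built from information on every member. The Stata sketch below shows one possible way to do that. The data, the variable names (hid, person, age, inwork) and the 16 to 64 working-age band are all hypothetical, and the definition is not necessarily the one used by Gregg and Wadsworth.

```stata
* Toy data: every name and value here is hypothetical.
clear
input hid person age inwork
1 1 28 1
1 2 12 0
2 1 45 0
2 2 44 0
3 1 72 0
end

* Assumed working-age band; check the definition used in your source
gen byte wkage = inrange(age, 16, 64)

* Household-level counts built from all members, copied to every row
egen nwkage = total(wkage), by(hid)
egen nwork  = total(inwork * wkage), by(hid)

* Workless = has working-age members, none of whom is in work;
* left missing for households with no working-age member
gen byte workless = (nwork == 0) if nwkage > 0

list, sepby(hid)
```

In this toy file household 1 is not workless, household 2 is workless, and household 3 has no working-age member so the indicator is missing.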
Workless households (source FES, various years 1968-1996)
[Chart: percentage (of present working age hoh) by year, 1968 to 1996; series: workless households, children in workless households]
Source: Richard Dickens, Paul Gregg and Jonathan Wadsworth (2000) ‘New Labour and the Labour Market’, CMPO Working Paper Series 00/19, Table 5
The effect of partnership on employment (mothers)
fig 5.8: Employment Activity by all mothers (of dependent children) aged 16-59 by Partnership 1975-1996
[Chart: percentage by year (1975, 1981, 1991, 1996); series: Partnered, f/t; Partnered, p/t; Unpartnered, f/t; Unpartnered, p/t]
Ethnic homogeneity: % hhold members in same ethnic group as HOH
[Bar chart: scale 0 to 100; groups: White, Black Caribbean, Black-Other, Indian, Pakistani]
Source: 1991 Household SAR
Hierarchy in some key datasets
Survey | Hhd hierarchy? Levels | Type
GHS | Household, Family, Individual, Sub Individual | Flat file
LFS | Household, Family, Individual | Flat files (QLFS/Hhd data)
FES | Multiple, inc. household, person, family unit, benefit unit | Multiple files
FRS | Household, Benefit Unit, Individual | Multiple files
HSE | Household, Individual (watch out for variable samples) | Flat files (1 all inds, 1 all resps)
BSAS | Individual | Flat file
BCS | Individual, Incident (Hhd context only) | Multiple files
BHPS | Household, Individual (& below) | Multiple files
Household SARs | Household, Family, Individual | Flat file
Main Levels
• Household
– Group who have the accommodation as their only or main residence and who either share one meal a day or share the living accommodation.
– Useful for coresidence or policy related issues
• Family Unit
– An individual plus partner plus any unmarried children
– The census definition of family unit excludes single childless individuals
– Useful for identifying partnership and parenthood relationships
• Benefit Unit
– Adult children in separate unit from parents
– Useful when considering income and benefits
• Check your definitions (despite harmonisation)
Identifying the units
• You will need a unique identifier for the unit at each level
• Several variables may need to be used in combination
• You may need to compute a unique identifier
• You will need to read the documentation to assess this
Straightforward: GHS 00-01
• To identify a household use HSERIAL
• To identify an individual within the household use PERSNO
• To identify a family unit use FSERIAL
• To identify a family unit within a household use AFAM
• To identify the household reference person test for PERSNO = HRP (HRP gives the person no. for the HRP)
• Similarly to locate the Family Unit head test for FUH = PERSNO
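The two tests for the household reference person and the family unit head can be sketched in Stata as below. The toy values are invented, and holding the GHS identifiers in lower case (hserial, persno, hrp, fuh) is an assumption. Check the documentation for the file you actually download.

```stata
* Toy data: values are hypothetical; names follow the slide in lower case.
clear
input hserial persno hrp fuh
101 1 1 1
101 2 1 1
102 1 2 2
102 2 2 2
102 3 2 3
end

* 1 if this person is the household reference person, 0 otherwise
gen byte ishrp = (persno == hrp)

* 1 if this person is the head of their family unit, 0 otherwise
gen byte isfuh = (persno == fuh)

* Sanity check: each household should contain exactly one HRP
egen nhrp = total(ishrp), by(hserial)
assert nhrp == 1

list, sepby(hserial)
```

Selecting on ishrp == 1 then gives one row per household, which is a simple way to work at household level without building a separate file.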
Complex e.g. QLFS 2003
• If interested in using household information use the Household File
• Information about identifiers is in the read file
• Household identifier is Remserno – however this is not present in all LFS datasets
• To compute use:
– Week x 10000000 +
– W1yr x 1000000 +
– Qrtr x 100000 +
– Add x 1000 +
– Wafnd x 100 +
– Hhd
• This has to be used together with either CASEID or QUOTA (which are identical) – could combine this with Remserno to derive an easier to use household ID
• To identify a person in the household use person
Working with hierarchical data
• Which level should I analyse at?
• Manipulating data in SPSS
– Menu driven approach
– Syntax
• Manipulating data in Stata
Which level should I analyse at?
Variables: Hid (Hhd ID), Person (person number), Reltohrp (relationship to HRP), Fam (family), Inc (income p/w), Age, Tenure, Health, Reltofuh (relationship to FUH), Fuh (FUH)

Hid | Person | Reltohrp | Fam | Inc | Age | Tenure | Health | Reltofuh | Fuh
1 | 1 | self | 1 | dna | 63 | Soc rent | Poor | Self | Yes
2 | 1 | self | 1 | 300 | 21 | Priv rent | Good | Self | Yes
2 | 2 | none | 2 | 400 | 28 | Priv rent | Good | Self | Yes
2 | 3 | none | 3 | 100 | 19 | Priv rent | Ok | Self | Yes
3 | 1 | self | 1 | 700 | 43 | Own occ | Good | Self | Yes
3 | 2 | partner | 1 | 500 | 40 | Own occ | Good | Partner | No
3 | 3 | child | 1 | N/a | 12 | Own occ | Good | Child | No
4 | 1 | self | 1 | 200 | 35 | Own occ | Good | Self | Yes
4 | 2 | partner | 1 | 90 | 34 | Own occ | Ok | Partner | No
5 | 1 | self | 1 | 450 | 25 | Soc rent | Ok | Self | Yes
5 | 2 | child | 1 | N/a | 4 | Soc rent | Poor | Child | No
5 | 3 | child | 1 | N/a | 2 | Soc rent | Ok | Child | No
Understanding the data
• What is the default case/unit of analysis in the dataset?
• How many cases are in the data?
• How many households are in the data?
• How many family units are in the data?
• How many households have more than one family unit?
• How large is the largest household?
• How many lone families are in the data?
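Most of these counts can be checked in Stata by tagging one row per unit. The sketch below keys in only the Hid, Person and Fam columns of the example table shown earlier. The helper names (hhtag, futag, nfam, nperhh) are introduced here for illustration.

```stata
* Identifier columns from the example table; helper names are hypothetical.
clear
input hid person fam
1 1 1
2 1 1
2 2 2
2 3 3
3 1 1
3 2 1
3 3 1
4 1 1
4 2 1
5 1 1
5 2 1
5 3 1
end

count                                 // cases (one row per person)
egen hhtag = tag(hid)                 // flags one row per household
count if hhtag                        // households
egen futag = tag(hid fam)             // flags one row per family unit
count if futag                        // family units
egen nfam = total(futag), by(hid)     // family units in each household
count if hhtag & nfam > 1             // households with more than one
egen nperhh = count(person), by(hid)  // household size
summarize nperhh if hhtag             // Max column = largest household
```

For this table the commands report 12 persons, 5 households, 7 family units, 1 household with more than one family unit, and a largest household of 3 people.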
Using the data
• What unit of analysis would you use to answer the following questions?
• Would you need to create variables at different levels of analysis to answer the question?
– What is the mean income per adult?
– What proportion of children live with 2 parents?
– What is the mean income per adult-equivalent household member (where children count as half a household member)?
– Does your partner’s health affect your own?
– How is total household income related to tenure?
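As an illustration of creating variables at another level, the adult-equivalent question can be sketched in Stata as below. It uses households 3 and 5 from the example table, with "N/a" incomes entered as missing. Treating anyone aged 16 or over as an adult is an assumption made for this sketch, and the helper names are hypothetical.

```stata
* Households 3 and 5 from the example table; N/a income entered as missing.
clear
input hid person inc age
3 1 700 43
3 2 500 40
3 3   . 12
5 1 450 25
5 2   .  4
5 3   .  2
end

gen byte adult = age >= 16               // assumed adult cut-off
egen hhinc  = total(inc), by(hid)        // total() treats missing as 0
egen nadult = total(adult), by(hid)
egen nchild = total(!adult), by(hid)

* Children count as half a household member
gen equivsize = nadult + 0.5*nchild
gen incpae = hhinc / equivsize

* Summarise once per household, not once per person
egen hhtag = tag(hid)
summarize incpae if hhtag
```

Household 3 comes out at 1200 / 2.5 = 480 and household 5 at 450 / 2 = 225. Whether to average these per household or per person depends on the unit of analysis chosen.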
Working with hierarchy in SPSS
• SPSS is not good at data manipulation!
• To generate a household variable from individual data you need to use the aggregate command.
• Aggregate command creates a household level file, with:
– 1 case per household
– Contains the household ID variable specified plus any aggregate variables defined
• Slow, memory intensive, unnecessarily complicated compared with some other packages…
Creating a summary variable at the household level: adding the number of people in the household
Rectangular file:
HH 1: person 1, person 2, person 3, person 4
HH 2: person 1, person 2
Aggregate file – household level:
Household | Nperhh
HH1 | 4
HH2 | 2
Step 1: Creating a summary variable at the household level: finding the oldest person in the household
Rectangular file:
Household | Person | Age
HH 1 | person 1 | 56
HH 1 | person 2 | 44
HH 1 | person 3 | 13
HH 1 | person 4 | 6
HH 2 | person 1 | 22
HH 2 | person 2 | 25
Aggregate file – household level:
Household | Oldest
HH1 | 56
HH2 | 25
Creating a summary variable at the household level: identifying the health of the Household Reference Person
Rectangular file:
Household | ReltoHRP | health
HH 1 | self | poor
HH 1 | spouse | ok
HH 1 | child | good
HH 1 | child | good
HH 2 | self | good
HH 2 | none | good
Aggregate file – household level:
Household | hrphlth
HH1 | poor
HH2 | good
Aggregation at the household level
• You can work at the level of the household
– Use the aggregate outfile
– Remember to carry across other household level variables that you will need into the aggregate file as part of the aggregate procedure
• Or match the household level variable back to the original individual level dataset…
Aggregate and match back to individual file
• Usually it is best to match back your aggregated variable to the master file
– the household variable is distributed to each individual
– you can then select on household head, family head to work at level of household or family
– Or you can link information about the household to the individual
Match the aggregate variables back to each individual in the household
Rectangular file:
Household | Person | Nperhh | Oldest
HH 1 | person 1 | 4 | 44
HH 1 | person 2 | 4 | 44
HH 1 | person 3 | 4 | 44
HH 1 | person 4 | 4 | 44
HH 2 | person 1 | 2 | 76
HH 2 | person 2 | 2 | 76
Aggregate file – household level:
Household | Nperhh | Oldest
HH1 | 4 | 44
HH2 | 2 | 76
SPSS syntax used

* compute a variable which is a low value, but which takes the (higher) value for health when respondent is hrp.
compute hlthrep = -9.
if (reltohrp = 1) hlthrep = health.
crosstabs hlthrep by health by reltohrp.
sort cases hid.
aggregate outfile = "c:\work\esds\aggfile.sav"
  /break hid
  /nperhh = n(hid)
  /oldest = max(age)
  /hrphlth = max(hlthrep).
execute.
match files
  /file = *
  /table = "c:\work\esds\aggfile.sav"
  /by hid.
execute.
Working with hierarchy in Stata
• Stata much better at data manipulation than SPSS
• Not necessary to create an additional file
• Simply run the appropriate procedure for each household separately
– Sort the data by the household identifier first
– Use the by household identifier subcommand
The equivalent Stata commands:
sort hid
egen nperhh = count(hid), by(hid)
egen oldest = max(age), by(hid)
gen hlthrep = -9
replace hlthrep = health if (reltohrp == 1)
egen hrphlth = max(hlthrep), by(hid)
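If a separate household-level file is wanted after all, mirroring the SPSS aggregate and match files route, a sketch using collapse and merge is given below. The ages are those from the "oldest person" illustration. The temporary file name hhfile is hypothetical, and merge m:1 assumes Stata 11 or later.

```stata
* Toy person-level data (ages from the "oldest person" illustration)
clear
input hid person age
1 1 56
1 2 44
1 3 13
1 4  6
2 1 22
2 2 25
end

* Build a one-row-per-household file without losing the person file
tempfile hhfile
preserve
collapse (count) nperhh = person (max) oldest = age, by(hid)
save `hhfile'
restore

* Match the household variables back onto every person
merge m:1 hid using `hhfile'
assert _merge == 3
drop _merge

list, sepby(hid)
```

Every person in household 1 ends up with nperhh = 4 and oldest = 56, and every person in household 2 with nperhh = 2 and oldest = 25, which is the same result the egen commands give in place.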
Some issues…
• Is the data representative for your choice of unit?
– Looking at individuals in a household survey will generally omit individuals not living in households
– Weighting may be necessary to counteract survey design
– If the survey was not designed to analyse using the units you use, will it still be representative?
• Will there be any clustering effects?
– Individuals within households will be more alike than individuals in general
– This could affect the accuracy of the estimates
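One way to allow for both weighting and household clustering in Stata is to declare the design with svyset. The sketch below is illustrative only: the incomes come from the adults in the example table, the weight variable wt and its values are invented, and treating the household as the only sampling unit is an assumption. Real surveys supply their own weights and design variables, so check the documentation.

```stata
* Toy data: incomes from the example table; wt is a hypothetical weight.
clear
input hid person inc wt
2 1 300 1.2
2 2 400 1.2
2 3 100 1.2
3 1 700 0.8
3 2 500 0.8
4 1 200 1.0
4 2  90 1.0
5 1 450 1.5
end

* Declare households as clusters and wt as the sampling weight
svyset hid [pweight = wt]

* Weighted mean with a standard error that allows for people in the
* same household being more alike than people in general
svy: mean inc
```

Comparing the standard error with the one from a plain `mean inc` gives a feel for how much the clustering matters in a given dataset.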