Introduction Data And Methods Results
Bayesian forecasting of demographic rates for small areas
Junni Zhang
Guanghua School of Management, Peking University
Joint work with John Bryant, Statistics New Zealand
December 21, 2014
Section 1
Introduction
Demographic Forecasting
Forecasts of the total number of people are not sufficient.
Planning for hospitals, bridges, schools, housing, and much else besides requires population forecasts for small areas.
Forecasts typically extend many years into the future, because planning for infrastructure requires long time horizons.
Our Application
e.g. counts of “permanent and long-term” departures from New Zealand, region = Kaikoura, time = 2012

        Sex
Age     Female  Male
0-4     3       1
5-9     5       3
10-14   3       3
15-19   3       1
20-24   7       5
25-29   3       8
30-34   2       1
35-39   2       4
40-44   5       1
45-49   1       5
50-54   1       0
55-59   2       1
60-64   0       0
65-69   1       0
70-74   0       2
75+     0       0

Cross-classified counts, rates, probabilities
Dimensions: age, sex, region, time
Our Application
We produce statistical forecasts of emigration rates for 16 age groups, 2 sexes, and 73 regions over 25 years.
Traditional Demographic Approach
Forecasts are constructed for birth rates, death rates and migration rates, disaggregated by age and sex.
The accounting identity is then repeatedly applied, to give the forecasted population in each period after the base year.
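The accounting identity is not written out on the slide. The following is a minimal sketch, assuming it is the standard demographic balancing equation (next population = current population + births - deaths + immigration - emigration). For brevity it ignores the age and sex structure and works with counts rather than rates; the function name and all numbers are illustrative only.

```python
def project_population(base_population, births, deaths, immigration, emigration):
    """Apply the balancing equation once per period, starting from the base year."""
    populations = [base_population]
    for b, d, i, e in zip(births, deaths, immigration, emigration):
        # Next population = current + births - deaths + immigration - emigration.
        populations.append(populations[-1] + b - d + i - e)
    return populations


# Hypothetical flows for two periods after the base year.
trajectory = project_population(1000, [10, 12], [8, 9], [5, 5], [7, 6])
print(trajectory)  # [1000, 1000, 1002]
```

In a full projection the flows would come from the forecast rates applied to the population at risk in each age-sex group.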
Traditional Demographic Approach
Traditionally mathematics, not statistics
Data evaluation, rich but informal
Struggling with disaggregation
Uncertainty: ‘low’, ‘medium’, and ‘high’ variants.

When population forecasts are assembled from low, medium, and high variants for fertility, mortality and migration, the results are often counter-intuitive.
For instance, combinations of variants that lead to large variation in population size may lead to small variation in the ratio of young people to old people.
Probabilistic Approaches
Researchers have developed probabilistic approaches to forecasting that combine ideas from demography with ideas from the time series literature.
Lee and Carter (1992), Booth (2006), Booth and Tickle (2008), Alkema et al. (2011), Raftery et al. (2012), Bijak and Wisniowski (2010)
However, almost all the research on population has focused on national-level projections.
Complications with Small Area Forecasts
Increasing prominence of random variation as the data become more disaggregated.
Virtually all geographically-disaggregated data contain gaps and breaks due to changes in administrative boundaries.

Before 2010 there were 73 territorial authorities in New Zealand.
During 2010, seven territorial authorities within greater Auckland were amalgamated into a single unit.
Our Approach
We draw on ideas from the literature on small area estimation, in addition to demography and time series statistics.
We develop a Bayesian hierarchical model.
Section 2
Data And Methods
Direct Estimation: Both Sexes, All Ages
[Figure: direct estimates of the rate (y-axis, 0.000 to 0.020) by year (x-axis, ticks at 1995 and 2005), one panel each for New Zealand, Kaikoura, Masterton, Papakura, Rotorua and Manukau.]
Direct Estimation: Female, Age 20-24
[Figure: direct estimates of the rate (y-axis, 0.00 to 0.15) by year (x-axis, ticks at 1995 and 2005), one panel each for New Zealand, Kaikoura, Masterton, Papakura, Rotorua and Manukau.]
Direct Estimation: Both Sexes, Time 1992 and 2010
[Figure: direct estimates of the rate (y-axis, 0.00 to 0.10), with one line for 1992 and one for 2010, in panels for New Zealand, Kaikoura, Masterton, Papakura, Rotorua and Manukau. The x-axis runs from 0 to 80; it is labelled "Year" in the source, although the values appear to be ages.]
Basic Model
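For age $a$, sex $s$, region $r$ and time $t$, let $x_{asrt}$ denote population size, and let $y_{asrt}$ denote the count of international departures.

\[ y_{asrt} \overset{\text{ind}}{\sim} \text{Poisson}(\lambda_{asrt} x_{asrt}) \]

\[ \log \lambda_{asrt} = \beta^{0} + \beta^{\text{age}}_{a} + \beta^{\text{sex}}_{s} + \beta^{\text{reg}}_{r} + \beta^{\text{time}}_{t} + \beta^{\text{age:sex}}_{as} + \beta^{\text{age:reg}}_{ar} + \beta^{\text{sex:reg}}_{sr} + \beta^{\text{age:sex:reg}}_{asr} + \varepsilon_{asrt} \]

\[ \varepsilon_{asrt} \overset{\text{ind}}{\sim} N(0, \sigma^{2}_{\varepsilon}) \]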
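To make the Poisson model concrete, here is a minimal sketch of how one cell's expected count follows from the log-linear predictor and how a count could be simulated from it. The population, baseline rate and effect sizes are hypothetical, and the sampler (Knuth's multiplication method) is simply a convenient standard-library choice, not something taken from the slides.

```python
import math
import random


def expected_count(population, intercept, effects):
    """Return lambda * x, where log(lambda) is the intercept plus the summed effects."""
    return population * math.exp(intercept + sum(effects))


def draw_poisson(mean, rng):
    """Draw one Poisson variate with Knuth's multiplication method (fine for small means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


rng = random.Random(1)
# Hypothetical cell: 120 people, baseline rate 0.02, age effect +1.0, sex effect -0.1.
mean = expected_count(120, math.log(0.02), [1.0, -0.1])
simulated_departures = draw_poisson(mean, rng)
```

With small populations the simulated counts vary widely around the mean, which is the random variation that becomes prominent as data are disaggregated.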
Basic Model
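$\beta^{\text{time}}_{t} \sim$ a non-stationary polynomial trend model with order $p$

$p = 1$:
\[ \beta^{\text{time}}_{t} = \theta_{t,1} + v_{t} \]
\[ \theta_{t,1} = \theta_{t-1,1} + w_{t,1} \]

$p = 2$:
\[ \beta^{\text{time}}_{t} = \theta_{t,1} + v_{t} \]
\[ \theta_{t,1} = \theta_{t-1,1} + \theta_{t-1,2} + w_{t,1} \]
\[ \theta_{t,2} = \theta_{t-1,2} + w_{t,2} \]

$\beta^{\text{age}}_{a} \sim$ a non-stationary polynomial trend model with order $q$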
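The recursions above can be simulated directly. This is a sketch under the assumption that the noise terms v and w are independent zero-mean Gaussians, which the slide does not state; the function name, standard deviations and starting state are illustrative.

```python
import random


def simulate_polynomial_trend(n_steps, order, sd_obs, sd_state, theta0, seed=0):
    """Simulate beta_t from a polynomial trend model of the given order.

    Each state component is updated as theta_j <- theta_j + theta_{j+1} + w_j,
    and the last component follows a plain random walk.
    """
    if len(theta0) != order:
        raise ValueError("theta0 must have one entry per state component")
    rng = random.Random(seed)
    theta = [float(value) for value in theta0]
    beta = []
    for _ in range(n_steps):
        new_theta = []
        for j in range(order):
            carry = theta[j + 1] if j + 1 < order else 0.0
            new_theta.append(theta[j] + carry + rng.gauss(0.0, sd_state))
        theta = new_theta
        # Observation equation: beta_t = theta_{t,1} + v_t.
        beta.append(theta[0] + rng.gauss(0.0, sd_obs))
    return beta


# With the noise switched off, order 2 and slope 1 give a straight line.
print(simulate_polynomial_trend(3, 2, 0.0, 0.0, [0.0, 1.0]))  # [1.0, 2.0, 3.0]
```

With order 1 the level wanders without a persistent direction, while order 2 adds a slope component that itself drifts, so forecasts extrapolate the most recent slope.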
Basic Model
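\[ \beta^{\text{reg}}_{r} = \gamma^{\top} X_{r} + u_{r} \qquad (1) \]

$X_{r}$ consists of:
the logarithm of the percentage of the population born overseas for region $r$;
the logarithm of the percentage of the population in full-time study for region $r$.

\[ u_{r} \overset{\text{ind}}{\sim} N(0, \sigma^{2}_{u}) \]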
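As a small illustration of equation (1), the sketch below builds the two covariates for one region and adds a Gaussian region-level error. The coefficient values and percentages are hypothetical, and the function name is not from the source.

```python
import math
import random


def regional_effect(gamma, pct_born_overseas, pct_full_time_study, sd_u, rng):
    """Return gamma' X_r + u_r for one region, with X_r the two log-percentages."""
    x_r = [math.log(pct_born_overseas), math.log(pct_full_time_study)]
    mean = sum(g * x for g, x in zip(gamma, x_r))
    return mean + rng.gauss(0.0, sd_u)


rng = random.Random(0)
# Hypothetical region: 20% born overseas, 5% in full-time study.
effect = regional_effect([0.5, 0.2], 20.0, 5.0, sd_u=0.1, rng=rng)
```

Regions with little data are pulled towards the covariate-based mean, which is the borrowing of strength typical of small area estimation.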
Prior Distributions for Other Parameters
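\[ \beta^{\text{age:sex}}_{as} \overset{\text{ind}}{\sim} N(0, \sigma^{2}_{\text{age:sex}}), \]
\[ \beta^{\text{age:reg}}_{ar} \overset{\text{ind}}{\sim} N(0, \sigma^{2}_{\text{age:reg}}), \]
\[ \beta^{\text{sex:reg}}_{sr} \overset{\text{ind}}{\sim} N(0, \sigma^{2}_{\text{sex:reg}}), \]
\[ \beta^{\text{age:sex:reg}}_{asr} \overset{\text{ind}}{\sim} N(0, \sigma^{2}_{\text{age:sex:reg}}). \]

The regression coefficients and the standard deviations follow improper uniform prior distributions.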
MCMC
We use a Markov chain Monte Carlo (MCMC) algorithm to draw the parameters from their posterior distribution.
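The slides do not say which MCMC algorithm is used. As a toy illustration of the idea only, the sketch below runs a random-walk Metropolis sampler for a single log-rate with a Poisson likelihood. The normal prior, step size, and population of 100 are assumptions made for the example and are not the authors' model.

```python
import math
import random


def log_posterior(eta, count, population, prior_mean, prior_sd):
    """Unnormalised log posterior of eta = log(lambda) for y ~ Poisson(exp(eta) * x)."""
    log_likelihood = count * eta - population * math.exp(eta)
    log_prior = -0.5 * ((eta - prior_mean) / prior_sd) ** 2
    return log_likelihood + log_prior


def metropolis(count, population, n_iter=5000, step=0.5, seed=0,
               prior_mean=-3.0, prior_sd=2.0):
    """Random-walk Metropolis sampler returning draws of the rate lambda."""
    rng = random.Random(seed)
    eta = prior_mean
    current = log_posterior(eta, count, population, prior_mean, prior_sd)
    draws = []
    for _ in range(n_iter):
        proposal = eta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, count, population, prior_mean, prior_sd)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            eta, current = proposal, candidate
        draws.append(math.exp(eta))
    return draws


# Hypothetical cell: 7 departures observed among 100 people.
draws = metropolis(count=7, population=100)
kept = draws[1000:]  # discard the first 1000 iterations as burn-in
posterior_mean = sum(kept) / len(kept)
```

The full model has many parameters updated in turn, but each update follows the same propose-and-accept logic or a closed-form conditional draw.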
Time Order Terms
[Figure: time terms, value (y-axis, -0.4 to 0.4) against time (x-axis, 1991 to 2013), with one panel for order 1 and one for order 2.]
We choose p = 1.
Age Order Terms
[Figure: age terms, value (y-axis, -2 to 2) against age group (x-axis, five-year groups from 0-4 to 75+), with one panel for order 1 and one for order 2.]
We choose q = 2.
Missing Values for Region: First Problem
7.5% of records have no regional information at all, either because the respondent did not provide it, or because the response could not be coded.
We address this issue through multiple imputation.
Missing Values for Region: First Problem
Statistics New Zealand has information on citizenship that cannot be released publicly.

“... imputation performed by the data collector (e.g. the Census Bureau) has the important advantage of allowing the use of information available to the data collector but not available to an external data analyst. ... This kind of information, even though inaccessible to the user of a public-use file, can often improve the imputed values.” (Rubin 1987)

\[ \sum_{r} y^{\text{mis}}_{asrtc} = y^{\text{obs}}_{astc} \]
Missing Values for Region: Second Problem
From 2010 onwards, if the region is coded as Auckland, there is missing information on which of the seven original territorial authorities within Auckland is associated with the records.
We address this issue through jointly updating these valuesand the parameters within the MCMC algorithm.
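One way such an update could work, offered as a sketch rather than as the authors' algorithm: if the seven counts are independent Poisson given the current parameters, then conditional on their known total they are multinomial with probabilities proportional to their expected counts. The expected counts below are hypothetical.

```python
import random


def split_total(total, expected_counts, rng):
    """Allocate a merged count across regions, given each region's expected count."""
    regions = range(len(expected_counts))
    # Conditional on their sum, independent Poisson counts are multinomial with
    # probabilities proportional to their means.
    assignments = rng.choices(regions, weights=expected_counts, k=total)
    return [assignments.count(r) for r in regions]


rng = random.Random(0)
# Hypothetical expected counts (lambda * x) for the seven original authorities.
expected = [4.0, 9.5, 2.1, 12.3, 6.0, 3.3, 7.8]
imputed = split_total(45, expected, rng)
```

Repeating this draw at every MCMC iteration, with the expected counts recomputed from the current parameter values, propagates the uncertainty about the missing regional detail into the posterior.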
Section 3
Results
Validation Exercise
We pretend that the merging of the seven territorial authorities within Auckland happened in 2003.
The training data include the counts for 73 regions for 1991-2002 and the counts for 67 regions, with the counts for the seven territorial authorities within Auckland merged, for 2003-2005.
We predict the emigration counts $y^{\text{pre}}_{asrt}$ for 2006-2010, and obtain their posterior medians, 50% credible intervals and 90% credible intervals.
We compare the posterior medians and credible intervals with the observed values of $y_{asrt}$ for 2006-2010.
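Posterior medians and credible intervals are typically read off the MCMC draws as empirical quantiles. A minimal sketch, assuming equal-tailed intervals and linear interpolation between order statistics (the slides do not specify either); the draws are made up for the example.

```python
import statistics


def quantile(sorted_draws, prob):
    """Empirical quantile with linear interpolation between order statistics."""
    position = prob * (len(sorted_draws) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_draws) - 1)
    fraction = position - lower
    return sorted_draws[lower] * (1 - fraction) + sorted_draws[upper] * fraction


def summarize_draws(draws):
    """Return the posterior median and equal-tailed 50% and 90% intervals."""
    ordered = sorted(draws)
    return {
        "median": statistics.median(ordered),
        "ci50": (quantile(ordered, 0.25), quantile(ordered, 0.75)),
        "ci90": (quantile(ordered, 0.05), quantile(ordered, 0.95)),
    }


# Hypothetical posterior draws of one predicted count.
summary = summarize_draws([6, 9, 7, 12, 8, 10, 5, 9, 11, 8])
```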
Results of Validation Exercise
The median of the $y_{asrt}$ to be predicted is 9, and the median absolute error when using the posterior medians of $y^{\text{pre}}_{asrt}$ to predict $y_{asrt}$ is 3.
The percentage of $y_{asrt}$ lying inside the 50% credible intervals is 57.1%.
The percentage of $y_{asrt}$ lying inside the 90% credible intervals is 89.6%.
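The two validation summaries can be computed as follows. This is a sketch with made-up observed values, predictions and intervals, treating interval endpoints as inclusive, which is an assumption.

```python
import statistics


def median_absolute_error(observed, predicted):
    """Median of |observed - predicted| across cells."""
    return statistics.median(abs(o - p) for o, p in zip(observed, predicted))


def coverage(observed, intervals):
    """Share of observed values that fall inside their (lower, upper) interval."""
    inside = sum(lower <= o <= upper for o, (lower, upper) in zip(observed, intervals))
    return inside / len(observed)


# Hypothetical held-out cells.
observed = [9, 3, 12, 0]
posterior_medians = [8, 5, 12, 1]
intervals_50 = [(7, 10), (4, 6), (10, 14), (0, 2)]

print(median_absolute_error(observed, posterior_medians))  # 1.0
print(coverage(observed, intervals_50))  # 0.75
```

Well-calibrated intervals should cover close to their nominal share of held-out values, which is the standard against which the reported 57.1% and 89.6% are judged.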
Estimates for Female, Age 20-24
[Figure: estimates (y-axis, 0.05 to 0.15) against time (x-axis, 1995 to 2010), one panel each for Kaikoura, Gore, Masterton, Papakura, Rotorua and Manukau.]
Estimates and Prediction for Female, Age 20-24
[Figure: estimates and predictions (y-axis, 0.05 to 0.20) against time (x-axis, 1990 to 2040), one panel each for Kaikoura, Gore, Masterton, Papakura, Rotorua and Manukau.]