Post on 25-Feb-2016
EC 827
Module 2: Forecasting a Single Variable from its own History
Regression Refresher (oh what fun?)
Consider a simple regression, e.g. one based on the incomes of workers at a firm:

Y_i = function(education [E], age [A], sex [S], race [R], tenure [T], rank [K])

…for which a computer will calculate an optimal mathematical model:

Y_i = b0 + b1·E_i + b2·A_i + b3·S_i + b4·R_i + b5·T_i + b6·K_i + e_i

where the b's are coefficients on education, age etc… and e_i is a random error.
Coefficients Let’s say that b2 (the coefficient on age) is 125. If we assume that the data for age is measured in years (e.g. 32 years of age etc…), then the coefficient implies that, on average, a worker who is one year older will earn $125 more than a younger equivalent, all else being equal.
But…
– is the relationship real?
– does it matter at all?
Statistical Significance To tell whether the coefficient should be believed, the computer will generally give us some measure of the reliability of the coefficient estimate, normally either:
– the standard error
– the t-stat
– the p-value
Fortunately, they all tell the same story...
Statistical Significance II …they all measure, given the natural
variability of the data, how likely it is that the actual (as opposed to estimated) coefficient is really zero (and hence, there’s no real relationship between, e.g. age and income).
Generally, we use a 95% confidence level as our measure of “certainty”. If a coefficient passes one of three equivalent tests, we are reasonably confident there is a relationship:
Coefficient Tests I If:
– the coefficient is approximately twice the size of the standard error, or
– the t-stat is greater (in absolute value) than 1.96, or
– the p-value is less than 0.05
…then we’re confident we’ve found a real relationship. Note: saying we’re confident there’s a relationship isn’t the same as saying we’re confident that our relationship is accurate.
Coefficient Tests II As a general rule, if a coefficient passes one of
these tests, we think it’s important as an explanation of the dependent variable (in our example, income). If it fails the test, we may want to discard it.
We can also test whole sets of variables by using F-tests. Again, the computer will calculate these scores for you. Their meaning is the same: they measure whether or not it appears the variables are important determinants of whatever we’re modeling.
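As a concrete sketch of these tests (with made-up data, not the course's example), ordinary least squares and the three equivalent significance checks can be run with plain NumPy:

```python
import math
import numpy as np

# Synthetic illustration: income generated with a true age coefficient of 125,
# so the significance tests should all pass.
rng = np.random.default_rng(0)
n = 200
age = rng.uniform(20, 60, n)
income = 20000 + 125 * age + rng.normal(0, 2000, n)

X = np.column_stack([np.ones(n), age])          # constant + age
beta, *_ = np.linalg.lstsq(X, income, rcond=None)
resid = income - X @ beta
s2 = resid @ resid / (n - X.shape[1])            # residual variance
se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))

t_stat = beta[1] / se[1]                         # t-stat for the age coefficient
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(t_stat) / math.sqrt(2))))

# The three equivalent tests from the slide:
print(abs(beta[1]) > 2 * se[1])   # coefficient roughly twice its standard error
print(abs(t_stat) > 1.96)         # |t| > 1.96
print(p_value < 0.05)             # p < 0.05
```

All three checks agree because they are algebraic restatements of the same comparison of the estimate to its sampling variability.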
Economic Significance Be careful: all these tests just look for a
mathematically meaningful relationship, not a practically important one.
For example, it would be possible that the coefficient on age was 1.25, not 125, but was very statistically significant. That would imply that an extra year of age raised average salaries by $1.25 per year. Statistically measurable? Yes. Relevant or important? No.
Time Series Data Data that we are interested in forecasting are time series: observations that have a well-defined temporal ordering.
Notation: _{t-1}X_t = forecast constructed at time t-1 of the observation on X to be realized at time t. There is no standard notation.
By contrast, other data have no particular ordering: e.g. a sample of heights of people in this class.
Forecasting a Single Variable I Any time series can be considered as the sum of two components:
– Deterministic Component: that part of the time series for which a perfect forecast of the future value can be constructed
» examples: constant, time trends, constant seasonal factors
– Stochastic Component: that part of the time series which is random (stochastic), for which predictions of future values may turn out to be in error
Forecasting a Single Variable II The assumption is that a history of the variable is available: a time series of observations.
We require that information in that history has implications for current or future realizations of the variable.
Useful information available at present for forecasting the future requires correlation between events at different points in time.
Correlation and Causation Correlation simply implies a link between two
data series, not that the link is “cause and effect”.
In a perfect world, we’d like to find the effects of causes
In the real world, the best we can hope for is to find potential causes of effects
Forecasts from Own History: An Example
Consider a coin tossing experiment:
– Toss a coin N times and record heads (1) or tails (0) for each replication
– Generate a time series: 0,1,1,0, ...
– No deterministic component to this time series
– Does the information in the time series of outcomes provide any basis for forecasting the outcome of additional tosses? Why or why not?
Covariance Stationarity Characteristic of the stochastic component of a time series:
– mean does not depend on time
– variance does not depend on time and is finite
– autocovariances (autocorrelations) depend only on the distance between observations and not the time of the observations
Sample Correlation Coefficients
For two series, X_i and Y_i, on which N observations are available:

correlation coefficient = Σ_{i=1}^{N} (X_i − X̄)(Y_i − Ȳ) / sqrt[ Σ_{i=1}^{N} (X_i − X̄)² · Σ_{i=1}^{N} (Y_i − Ȳ)² ]
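A quick numerical check of the sample correlation formula, using arbitrary illustrative data, compared against NumPy's built-in corrcoef:

```python
import numpy as np

def corr(x, y):
    # Sample correlation: covariance of deviations over product of spreads
    x, y = np.asarray(x, float), np.asarray(y, float)
    xd, yd = x - x.mean(), y - y.mean()
    return (xd * yd).sum() / np.sqrt((xd**2).sum() * (yd**2).sum())

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
print(round(corr(x, y), 6) == round(np.corrcoef(x, y)[0, 1], 6))  # True
```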
Autocorrelations: Definitions Definition: An autocorrelation coefficient is the correlation between observations of a time series that are separated by a fixed time interval.
– A first-order autocorrelation is the correlation between observations in a time series and the same observations lagged one period.
– A p-th order autocorrelation is the correlation between observations in a time series and the same observations lagged p periods.
Autocorrelations: Formulae
Let Y_t = X_{t−j}, j ≥ 0. Then the j-th order autocorrelation coefficient is:

ρ_j = Σ_t (X_t − X̄)(X_{t−j} − X̄) / sqrt[ Σ_t (X_t − X̄)² · Σ_t (X_{t−j} − X̄)² ]
Sample Autocorrelation Function I
[Figure: sample autocorrelation function for the coin tossing experiment, lags 1–9; all estimated autocorrelations are close to zero.]
Predictions of Outcomes of Future Coin Tosses
The outcome of any particular coin toss (head or tail) is not influenced by the outcome of any past toss (assuming a fair coin).
Coin tosses are independent events. Autocorrelations for a time series of coin toss outcomes are zero (estimated autocorrelations are not significantly different from zero).
There is no useful forecasting info in the time series.
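This can be verified by simulation; the coin data here are synthetic, generated with a seeded random generator:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 2000
tosses = rng.integers(0, 2, N).astype(float)   # 0 = tails, 1 = heads

xd = tosses - tosses.mean()
denom = (xd**2).sum()
acs = [(xd[j:] * xd[:-j]).sum() / denom for j in range(1, 11)]

# All sample autocorrelations should sit near zero; a rough 95% band
# around zero is +/- 2/sqrt(N), about 0.045 here.
print(max(abs(a) for a in acs))
```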
Sample Partial Autocorrelation Coefficients
Construct a linear regression of a variable on a constant and lagged observations on the dependent variable up to order p.
Estimated coefficients in this regression model are the estimated partial autocorrelation coefficients; i.e. the coefficient on the n-th lag is the correlation between the event n periods ago and today’s event, given all of the intervening events.
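A sketch of this regression approach, fed a synthetic AR(1) series (the function name pacf_by_regression is my own, and the lag order and coefficient are illustrative):

```python
import numpy as np

def pacf_by_regression(x, p):
    # Regress X_t on a constant and X_{t-1}, ..., X_{t-p}; the fitted lag
    # coefficients are the estimated partial autocorrelation coefficients.
    x = np.asarray(x, float)
    X = np.column_stack([np.ones(len(x) - p)] +
                        [x[p - k:len(x) - k] for k in range(1, p + 1)])
    beta, *_ = np.linalg.lstsq(X, x[p:], rcond=None)
    return beta[1:]                               # drop the constant

# For an AR(1) series the lag-1 coefficient should dominate.
rng = np.random.default_rng(2)
e = rng.normal(size=1000)
x = np.zeros(1000)
for t in range(1, 1000):
    x[t] = 0.8 * x[t - 1] + e[t]

print(np.round(pacf_by_regression(x, 3), 2))
```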
White Noise Variables A white noise process (stochastic variable) has:
– zero mean
– constant finite variance
– no serial correlation (uncorrelated with observations at different points in time)
» autocorrelation coefficients of order 1 and above are all zero
Coin tossing is such a white noise process (actually stronger: independent white noise).
Wold Representation Theorem Any zero mean covariance stationary process x_t can be written as an infinite sum of white noise processes:

x_t = Σ_{j=0}^{∞} b_j ε_{t−j},  with b_0 = 1,  Σ_{j=0}^{∞} b_j² < ∞,  ε_t = white noise
Wold Theorem: Implementation What does that mean? It means that everything
that happens is a function of an infinite series of all past random events. True, but… so what?
The problem is to estimate the b_j terms
– impossible, since there are an infinite number
– the trick is to find some model that approximates the Wold representation
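One concrete case, assuming a zero-mean AR(1) with coefficient a = 0.8 (an illustrative value, not from the course): its Wold weights are b_j = a**j, so a truncated sum of past shocks should reproduce the recursively simulated series:

```python
import numpy as np

rng = np.random.default_rng(3)
a = 0.8
e = rng.normal(size=500)

# Recursive AR(1) simulation: x_t = a*x_{t-1} + e_t, x_0 = 0
x = np.zeros(500)
for t in range(1, 500):
    x[t] = a * x[t - 1] + e[t]

# Wold (moving-average) reconstruction, truncated at 100 lags
t = 499
wold = sum(a**j * e[t - j] for j in range(100))

print(abs(x[t] - wold) < 1e-6)   # True: truncation error is a**100 * x[t-100], tiny
```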
Forecasting Without Infinite Information
What will tomorrow look like? Generally, tomorrow will look like today.
What’s more important:
– How much will tomorrow look like today?
– How will tomorrow respond to today’s shocks?
– How will tomorrow be different from today?
First, know what question to ask… then worry about answers.
Autocorrelation Patterns I Autoregressive (AR) patterns: “Today looks like previous days”
– at least three components:
» deterministic component (e.g. constant, trend, constant seasonal factors)
» second component depends on observed values of previous periods
» third component is a new shock, independent of anything that has happened in the past
Example: AR(1) process

X_t = a0 + a1·X_{t−1} + e_t

a0 = constant or deterministic component
a1 = first-order autoregressive coefficient (−1 < a1 < 1)
e_t = error term (independent white noise)
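A minimal simulation sketch (a0 = 0, a1 = 0.8, synthetic shocks) showing that the sample autocorrelations of an AR(1) decay roughly geometrically, with the j-th order autocorrelation near a1**j:

```python
import numpy as np

rng = np.random.default_rng(4)
a1, N = 0.8, 5000
e = rng.normal(size=N)
x = np.zeros(N)
for t in range(1, N):
    x[t] = a1 * x[t - 1] + e[t]

xd = x - x.mean()
denom = (xd**2).sum()
rhos = [(xd[j:] * xd[:-j]).sum() / denom for j in (1, 2, 3)]
for j, r in zip((1, 2, 3), rhos):
    print(j, round(r, 2), round(a1**j, 2))   # sample vs. theoretical a1**j
```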
Sample Autocorrelation Function II
[Figure: sample autocorrelation function of an AR(1) variable with a1 = 0.8, lags 1–9; the autocorrelations are positive and decay gradually toward zero.]
Sample Autocorrelation II Note that in an AR(1) process all observations prior to time t are correlated with the outcome at t.
– all previous observations have information that is useful in forecasting what will happen at t
– usefulness (size of autocorrelation coefficients) decreases as information becomes older (move to more distant past)
This is the general result for AR processes, although the autocorrelation patterns can differ.
Autocorrelation Patterns II Moving Average (MA) Process: “Today is determined by yesterday’s shocks”
– at least three components:
» the deterministic component of the series
» the effect of the current shock on the series
» the effect of one or more previous shocks whose influence still persists in the current observation
MA(1) Process: Example

X_t = c1·e_{t−1} + e_t

c1 = first-order moving average parameter
e_t = current shock; independent of (not correlated with) any past shocks
Sample Autocorrelation Function III
[Figure: sample autocorrelation function of an MA(1) variable with c1 = 0.8, lags 1–9; only the first-order autocorrelation is noticeably different from zero.]
The shock occurs at t = 1, with full strength, then persists with strength c1·e_{t−1} = (0.8 · 1) = 0.8 at t = 2, then vanishes.
Sample Autocorrelations III Note that in an MA process only a limited number of past observations have information (autocorrelations) that is useful in forecasting the current outcome.
– the number of past observations that have potentially useful forecasting information is determined by the length of the MA process (here only one past observation)
Sample Autocorrelations IV The size of the first-order autocorrelation of an MA(1) process is determined by the value of c1 (see Diebold, p. 158):

ρ1 = c1 / (1 + c1²)

(don’t worry about remembering this. That’s what PCs are for)
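A numerical check of this result, simulating an MA(1) with c1 = 0.8 and synthetic shocks:

```python
import numpy as np

rng = np.random.default_rng(5)
c1, N = 0.8, 5000
e = rng.normal(size=N + 1)
x = e[1:] + c1 * e[:-1]                   # X_t = c1*e_{t-1} + e_t

xd = x - x.mean()
denom = (xd**2).sum()
rho = [(xd[j:] * xd[:-j]).sum() / denom for j in (1, 2, 3)]

print(round(c1 / (1 + c1**2), 3))         # 0.488 (theoretical rho_1)
print([round(r, 2) for r in rho])         # rho_1 near 0.49, the rest near 0
```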
Forecasts from AR Models I How does anything that occurs today (time = t) carry forward to influence future observations?
– Assume shock e_t = 1.0
– AR(1) coefficient in (e.g.) an Annual Inflation Model is 0.58
– A shock of 1.0 at time t carries forward to generate an increase in X_{t+1} of 0.58
– The shock of 0.58 to X_{t+1} carries forward to generate an increase in X_{t+2} of 0.58·0.58 = 0.34
Forecasts from AR Models II The effect of a shock at time t persists to affect observations at t+3, t+4, etc. The size of the effect on future observations becomes smaller as long as the absolute value of the autoregressive coefficient is < 1.0.
Contrast:
– Shocks in an AR model carry forward to affect future observations indefinitely
– Shocks in an MA model carry forward a limited number of periods
Forecasts from AR Models III The effect on future observations at time t+j of a shock at a particular time t is measured by the impulse response function.
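For the annual-inflation example (a1 = 0.58), the AR(1) impulse response to a unit shock can be computed directly, since the effect at horizon j is just a1**j:

```python
a1 = 0.58
irf = [a1**j for j in range(6)]           # effect of a unit shock at horizons 0..5
print([round(v, 2) for v in irf])         # [1.0, 0.58, 0.34, 0.2, 0.11, 0.07]
```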
Impulse Response Function: Annual Inflation AR(1) Model
[Figure: AR(1) impulse response function over horizons 0–12; the response starts at 1.0 and decays geometrically toward zero.]
Forecasts from MA Models
How does anything that occurs today (time = t) carry forward to influence future observations?
– Assume that we see a shock e_t = 1.0.
– What are the implications for t+1, t+2, etc?
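For an MA(1), X_t = c1·e_{t−1} + e_t, the answer is short: a unit shock e_t raises X_t by 1 and X_{t+1} by c1, and has no effect on X_{t+2} or beyond (c1 = 0.8 here as an illustrative value):

```python
c1 = 0.8
# Effect of a unit shock e_t on X_t, X_{t+1}, X_{t+2}, X_{t+3}:
ma_irf = [1.0, c1, 0.0, 0.0]
print(ma_irf)   # [1.0, 0.8, 0.0, 0.0]
```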
Transitory & Permanent Shocks A transitory shock is one whose effect eventually dies off.
– go far enough out into the future and events at that time are not influenced by what is happening today
A permanent shock is one whose effect continues to influence events, no matter how far into the future.
– all past events have a lasting effect on the present and the future