Analysis of Big Dependent Data in Economics and Finance
Ruey S. Tsay, Booth School of Business, University of Chicago
September 2016
Outline
1 Big data? Machine learning? Data science? What is in it for economics and finance?
2 Real-world data are often dynamically dependent
3 A simple example: methods for independent data may fail
4 Trade-off between simplicity and reality
5 Some methods useful for analyzing big dependent data in economics and finance
6 Examples
7 Concluding remarks
Big dependent data
1 Accurate information is the key to success in the competitive global economy. Information age.
2 What is big data? High dimension (many variables)? Large sample size? Both?
3 Not all big data sets are useful. Confounding & noise
4 Need to develop methods to efficiently extract useful information from big data
5 Know the limitations of big data
6 Issues emerging from big data: privacy? ethical issues?
7 Focus on methods for analyzing big dependent data in economics and finance
What are available?
Statistical methods:
1 Focus on sparsity (simplicity)
2 Various penalized regressions, e.g. Lasso and its extensions
3 Various dimension reduction methods and models
4 Common framework used: independent observations, with limited extensions to stationary data
Real data are often dynamically dependent!
Some useful concepts in analyzing big data:
1 Parsimony vs sparsity: parsimony does not imply sparsity
2 Simplicity vs reality: trade-off between feasibility & sophistication
Parsimonious, not sparse
A simple example
y_t = c + \sum_{i=1}^{k} \beta x_{it} + \varepsilon_t = c + \beta \sum_{i=1}^{k} x_{it} + \varepsilon_t,

where k is large, the x_{it} are not perfectly correlated, and the \varepsilon_t are iid N(0, \sigma^2).
The model has three parameters, so it is parsimonious, but it is not sparse because y_t depends on all explanatory variables.
In some applications, \sum_{i=1}^{k} x_{it} is a close approximation to the first principal component. For example, the level of interest rates is important to an economy.
The fused Lasso can handle this difficulty in some situations.
What is LASSO regression?
Model (assume mean-adjusted):

y_i = \sum_{j=1}^{p} \beta_j X_{j,i} + \varepsilon_i.

Matrix form, with X the design matrix:

Y = X\beta + \varepsilon.

Objective function (applicable, in particular, when p > T):

\hat{\beta}(\lambda) = \arg\min_{\beta} \left( \|Y - X\beta\|_2^2 / T + \lambda \|\beta\|_1 \right),

where \lambda \ge 0 is a penalty parameter, \|\beta\|_1 = \sum_{j=1}^{p} |\beta_j|, and \|Y - X\beta\|_2^2 = \sum_{i=1}^{T} (y_i - X_i'\beta)^2.
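As a quick illustration, a minimal R sketch of this penalized objective; the inputs X, Y, beta, and lambda are hypothetical placeholders, not objects from the talk.

```r
# A minimal sketch of the Lasso objective (illustrative only).
lasso_objective <- function(beta, X, Y, lambda) {
  T <- length(Y)
  rss <- sum((Y - X %*% beta)^2) / T   # ||Y - X beta||_2^2 / T
  penalty <- lambda * sum(abs(beta))   # lambda * ||beta||_1
  rss + penalty
}
```

In practice one minimizes this objective with a dedicated solver such as lars or glmnet rather than by hand.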
What is the big deal?
Sparsity: Using convexity, LASSO is equivalent to the constrained problem

\hat{\beta}_{opt}(R) = \arg\min_{\beta:\ \|\beta\|_1 \le R} \|Y - X\beta\|_2^2 / T.

Old friend: Ridge regression

\hat{\beta}_{Ridge}(\lambda) = \arg\min_{\beta} \left( \|Y - X\beta\|_2^2 / T + \lambda \|\beta\|_2^2 \right), or
\hat{\beta}(R) = \arg\min_{\beta:\ \|\beta\|_2^2 \le R} \|Y - X\beta\|_2^2 / T.

Special case p = 2: \|Y - X\beta\|_2^2 / T is quadratic in \beta; the constraint region \|\beta\|_1 \le R is a diamond, whereas \|\beta\|_2^2 \le R is a circle. Because the quadratic contours tend to touch the diamond at a corner, LASSO leads to sparsity.
Computation and extensions
1 Optimization: Least angle regression (lars) by Efron et al. (2004) makes the computation very efficient.
2 Extensions:
  Group lasso: Yuan and Lin (2006). Subsets of X have specific meaning, e.g. treatment.
  Elastic net: Zou and Hastie (2005). Uses a combination of L1 and L2 penalties.
  SCAD: Fan and Li (2001). Nonconcave penalized likelihood [smoothly clipped absolute deviation (SCAD)].
  Various Bayesian methods: the penalty function serves as the prior.
3 Packages available in R: lars, glmnet, gamlr, gbm and many others.
A simulated example
p = 300, T = 150, X iid N(0,1), εi iid N(0,0.25).
y_i = x_{3,i} + 2(x_{4,i} + x_{5,i} + x_{7,i}) - 2(x_{11,i} + x_{12,i} + x_{13,i} + x_{21,i} + x_{22,i} + x_{30,i}) + \varepsilon_i

1 How? R demonstration (see the sketch below)
2 Selection of \lambda? Cross-validation (10-fold), measuring prediction accuracy
3 The commands lars and cv.lars of the package lars
4 The commands glmnet and cv.glmnet of the package glmnet
5 Relationship between the two packages (in glmnet, alpha = 1 gives the Lasso)
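A hedged R sketch of this simulated example; the random seed and the use of lambda.min are illustrative choices made here, not taken from the talk.

```r
# Hedged sketch: simulate the example and fit the Lasso with 10-fold CV.
library(glmnet)
set.seed(1)
T <- 150; p <- 300
X <- matrix(rnorm(T * p), T, p)
beta <- numeric(p)
beta[3] <- 1
beta[c(4, 5, 7)] <- 2
beta[c(11, 12, 13, 21, 22, 30)] <- -2
y <- drop(X %*% beta) + rnorm(T, sd = 0.5)        # error variance 0.25
cvfit <- cv.glmnet(X, y, alpha = 1, nfolds = 10)  # 10-fold CV for lambda
coef(cvfit, s = "lambda.min")                     # sparse coefficient estimates
```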
Lasso may fail for dependent data
1 Data generating model: scalar Gaussian autoregressive, AR(3), model

  x_t = 1.9 x_{t-1} - 0.8 x_{t-2} - 0.1 x_{t-3} + a_t,  a_t ~ N(0, 1).

  Generate 2000 observations. See Figure 1.
2 Big data setup
  Dependent x_t: t = 11, ..., 2000
  Regressors: X_t = [x_{t-1}, x_{t-2}, ..., x_{t-10}, z_{1t}, ..., z_{10,t}], where the z_{it} are iid N(0,1).
  Dimension = 20, sample size 1990.
3 Run the Lasso regression via the lars package of R. See Figure 2 for the results. Lag 3, x_{t-3}, was not selected.
Lasso fails in this case.
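A hedged R sketch of this experiment; the seed and minor implementation details are assumptions made here.

```r
# Hedged sketch: simulate the AR(3) recursively (the process has a unit root,
# so arima.sim cannot be used) and run the Lasso via lars.
library(lars)
set.seed(2)
n <- 2000
a <- rnorm(n + 3)
x <- numeric(n + 3)
for (t in 4:(n + 3)) {
  x[t] <- 1.9 * x[t - 1] - 0.8 * x[t - 2] - 0.1 * x[t - 3] + a[t]
}
x <- x[-(1:3)]
idx <- 11:n                                            # sample size 1990
X <- sapply(1:10, function(k) x[idx - k])              # ten lags of x
Z <- matrix(rnorm(length(idx) * 10), ncol = 10)        # ten iid noise regressors
fit <- lars(cbind(X, Z), x[idx], type = "lasso")
plot(fit)                                              # lag 3 is typically not selected
```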
Figure: Time plot of the simulated AR(3) time series with 2000 observations.
Figure: Results of Lasso regression for the AR(3) series (standardized coefficients plotted against |beta|/max|beta|).
OLS works if we entertain AR models
Run the linear regression using the first three variables of X_t. Fitted model:

x_t = 1.902 x_{t-1} - 0.807 x_{t-2} - 0.095 x_{t-3} + \varepsilon_t,  \hat{\sigma}_\varepsilon = 1.01.

All estimates are statistically significant, with p-values less than 2.22 × 10^{-5}.
The residuals are well behaved, e.g. Q(10) = 12.23 with p-value 0.20 (after adjusting the degrees of freedom).
Simple time series method works for dependent data.
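A hedged sketch of this step, reusing the simulated x and the index vector idx from the earlier sketch.

```r
# Hedged sketch: OLS fit of the AR(3) with the first three lags only.
ar3 <- lm(x[idx] ~ x[idx - 1] + x[idx - 2] + x[idx - 3])
summary(ar3)                                             # all three lags significant
Box.test(residuals(ar3), lag = 10, type = "Ljung-Box")   # Ljung-Box check of residuals
```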
Why does lasso fail?
Two possibilities:
1 Scaling effect: Lasso standardizes each variable in X_t. For unit-root non-stationary time series, standardization might wash out the dependence in the stationary part.
2 Multicollinearity: Unit-root time series have strong serial correlations. [The ACF approaches 1 at all lags.]
This artificial example highlights the difference between independent and dependent data.
Need to develop methods for big dependent data!
Possible solutions
1 Re-parameterization using time series properties
2 Use different penalties for different parameters
The first approach is easier.
For the particular time series, we can define \Delta x_t = (1 - B) x_t and \Delta^2 x_t = (1 - B)^2 x_t. Then

x_t = 1.9 x_{t-1} - 0.8 x_{t-2} - 0.1 x_{t-3} + a_t
    = x_{t-1} + \Delta x_{t-1} - 0.1 \Delta^2 x_{t-1} + a_t
    = double unit root + single unit root + stationary + a_t.

The coefficients of x_{t-1}, \Delta x_{t-1}, \Delta^2 x_{t-1} are 1, 1, and -0.1, respectively.
Different frameworks for LASSO
The X-matrix of conventional LASSO consists of

(x_{t-1}, x_{t-2}, ..., x_{t-10}, z_{1t}, ..., z_{10,t}),

where the z_{it} are iid N(0,1).
Under the re-parameterization, the X-matrix becomes

(x_{t-1}, \Delta x_{t-1}, \Delta^2 x_{t-1}, ..., \Delta^2 x_{t-8}, z_{1t}, ..., z_{10,t}).

These two X-matrices provide theoretically the same information. However, the first one has high multicollinearity, but the second one does not, especially after standardization.
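A hedged sketch of the re-parameterized design, continuing the earlier simulation (x, idx, and Z are reused; the alignment details are assumptions made here).

```r
# Hedged sketch: Lasso on the re-parameterized design matrix.
dx  <- c(NA, diff(x))                         # Delta x_t
d2x <- c(NA, NA, diff(x, differences = 2))    # Delta^2 x_t
X2  <- cbind(x[idx - 1], dx[idx - 1], sapply(1:8, function(k) d2x[idx - k]))
fit2 <- lars(cbind(X2, Z), x[idx], type = "lasso")
plot(fit2)   # the stationary part is no longer masked by standardization
```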
Figure: Comparison of β-estimates of lars results under the two X-matrix formulations.
Theoretical justification
Focus on the particular series x_t used. Some properties of the series are
1 T^{-4} \sum_{t=1}^{T} x_t^2 \Rightarrow \int_0^1 \bar{W}^2, where \bar{W}(s) = \int_0^s W(u) du and W(s) is the standard Brownian motion.
2 T^{-5/2} \sum_{t=1}^{T} x_t \Rightarrow \int_0^1 \bar{W}
3 T^{-3} \sum_{t=1}^{T} x_t \Delta x_t \Rightarrow \int_0^1 \bar{W} W
4 T^{-2} \sum_{t=1}^{T} (\Delta x_t)^2 \Rightarrow \int_0^1 W^2
Standardization may wash out the \Delta x_{t-1} and \Delta^2 x_{t-1} parts.
Examples of big dependent data
1 Daily returns of U.S. stocks
2 Demand for electricity in 30-minute intervals
3 Daily spreads of CDS (credit default swaps) of selected companies
4 Monthly unemployment rates of the 50 U.S. states
5 Interest rates of an economy
6 Air pollution measurements at multiple locations and health risk; complex spatio-temporal data in general
Figure: Sample sizes of U.S. daily stock returns in 2012 and 2013: mean 6681, range = (6593, 6774).
Time series plot

Figure: Densities of daily log returns of U.S. stocks in 2012 and 2013.
Figure: Empirical densities of electricity demand in 30-minute intervals by day of the week (Monday through Sunday), from July 6, 1997 to March 31, 2007, Adelaide, Australia.
Figure: Time plots of monthly U.S. state unemployment rates from 1976.1 to 2015.9.
Some statistical methods
Goal: Extract useful information, including pooling.
1 Classification and cluster analysis
  K-means
  Tree-based classification
  Model-based classification
2 Factor models & extensions
  Orthogonal factor model
  Approximate factor model
  Dynamic factor model
  Constrained factor models (column, row constraints):
  X_t = R f_t C + e_t
3 Generalizations of Lasso methods to dependent data, e.g. LASSO for nowcasting vs MIDAS
Constrained factor models
Column (variable) constraint only: Tsai & Tsay (2010). Let z_t be a k-dimensional time series,

z_t = H \omega f_t + \varepsilon_t,  t = 1, ..., T,

where H is a k × r known matrix, f_t is an m-dimensional common factor, and \omega is an r × m matrix of unknown loading parameters.
For observed data in matrix form,

Z = F \omega' H' + \varepsilon.
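A minimal simulation sketch of this structure; all dimensions and values below are illustrative assumptions, not numbers from the talk.

```r
# Hedged sketch: simulate data from the column-constrained factor model
# Z = F omega' H' + E.
set.seed(3)
T <- 132; k <- 10; r <- 3; m <- 2
H <- matrix(0, k, r)
H[1:4, 1] <- 1; H[5:7, 2] <- 1; H[8:10, 3] <- 1   # known group constraints
omega <- matrix(rnorm(r * m), r, m)               # unknown loadings
Fmat  <- matrix(rnorm(T * m), T, m)               # common factors (F in the slide notation)
E <- matrix(rnorm(T * k, sd = 0.5), T, k)         # idiosyncratic noise
Z <- Fmat %*% t(omega) %*% t(H) + E               # observed T x k panel
```

The block structure of H mirrors the three-industry illustration on the next slide.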
A simple illustration
Monthly log returns of 10 stocks from 2001 to 2011:
1 Semi-conductor: TXN, MU, INTC, TSM
2 Pharmaceutical: PFE, MRK, LLY
3 Investment bank: JPM, MS, GS
The constraints H = [h_1, h_2, h_3], where
h_1 = (1,1,1,1,0,0,0,0,0,0)'
h_2 = (0,0,0,0,1,1,1,0,0,0)'
h_3 = (0,0,0,0,0,0,0,1,1,1)'
Table: Estimation Results of Constrained and Orthogonal Factor Models

        Constrained Model: L = Hω          Orthogonal Model: PCA
Tick     L1     L2     L3   Σε,i            L1     L2     L3   Σε,i
TXN    0.76   0.26   0.27   0.28          0.79   0.20   0.32   0.24
MU     0.76   0.26   0.27   0.28          0.67   0.36   0.29   0.34
INTC   0.76   0.26   0.27   0.28          0.79   0.18   0.33   0.23
TSM    0.76   0.26   0.27   0.28          0.80   0.27   0.16   0.26
PFE    0.44  -0.68   0.10   0.34          0.49  -0.64  -0.03   0.35
MRK    0.44  -0.68   0.10   0.34          0.40  -0.69   0.23   0.31
LLY    0.44  -0.68   0.10   0.34          0.45  -0.70   0.06   0.31
JPM    0.74   0.06  -0.43   0.27          0.72   0.02  -0.35   0.36
MS     0.74   0.06  -0.43   0.27          0.76   0.05  -0.43   0.25
GS     0.74   0.06  -0.43   0.27          0.75   0.12  -0.50   0.18
e.v.   4.58   1.65   0.88                 4.63   1.68   0.93
Variability explained: 70.6%              Variability explained: 72.4%
Both row and column constraints
Tsai et al. (2016): T observations and k variables. Data matrix form:

Z = F_1 \omega_1' H' + G F_2 \omega_2' + G F_3 \omega_3' H' + E,

where G denotes a known T × m row-constraint matrix.
Figure: The census regions and divisions of the United States
Figure: Time plots of monthly housing starts (in logarithms) of 9 U.S. divisions: 1997-2006.
Figure: Time series plots of common factors for a DCF model of order (r, p, q) = (2, 2, 2) via maximum likelihood estimation.
Figure: Time series plots for G F_2 ω_2' of a fitted DCF model of order (2,2,2). Maximum likelihood estimation is used.
Figure: Time series plots for F_1 ω_1' H' of a fitted DCF model of order (2,2,2). Maximum likelihood estimation is used.
Figure: Time series plots for G F_3 ω_3' H' of a fitted DCF model of order (2,2,2). Maximum likelihood estimation is used.
Matrix-valued variables
Consider simultaneously n macroeconomic variables in k countries:

        U.S.      Italy     Spain     ···   Canada
GDP     X_{11,t}  X_{12,t}  X_{13,t}  ···   X_{1k,t}
Unem    X_{21,t}  X_{22,t}  X_{23,t}  ···   X_{2k,t}
CPI     X_{31,t}  X_{32,t}  X_{33,t}  ···   X_{3k,t}
...     ...       ...       ...             ...
M1      X_{n1,t}  X_{n2,t}  X_{n3,t}  ···   X_{nk,t}

On-going: only preliminary results are available. See Chen et al. (2016).
Classification
A possible approach: use a two-step procedure
1 Transform dependent big data into functions, e.g. probability densities
2 Apply classification methods to the functional data
The density functions of daily log returns of U.S. stocks serve as an example.
We can then classify the density functions to make statistical inference.
Illustration of classification
Cluster Analysis of density functions
Consider the time series of density functions {f_t(x)}.
For simplicity, assume the densities are evaluated at equally-spaced grid points {x_1 < x_2 < ... < x_N} ∈ D with increment \Delta x. The data we have become {f_t(x_i) | t = 1, ..., T; i = 1, ..., N}.
Using the Hellinger distance (HD), we consider two methods:
K-means
Tree-based classification
Hellinger distance of two density functions
Let f(x) and g(x) be two density functions on the common domain D ⊂ R. Assume both density functions are absolutely continuous w.r.t. the Lebesgue measure. The Hellinger distance (HD) between f(x) and g(x) is defined as

H(f, g)^2 = \frac{1}{2} \int_D \left( \sqrt{f(x)} - \sqrt{g(x)} \right)^2 dx = 1 - \int_D \sqrt{f(x) g(x)}\, dx.

Basic properties:
1 H(f, g) ≥ 0
2 H(f, g) = 0 if and only if f(x) = g(x) almost surely.
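On an equally-spaced grid the distance can be approximated directly; a minimal R sketch, where the density vectors and the grid spacing dx are assumed inputs.

```r
# Hedged sketch: squared Hellinger distance between two densities evaluated
# on a common equally-spaced grid (Riemann approximation of the integral).
hellinger2 <- function(f, g, dx) {
  0.5 * sum((sqrt(f) - sqrt(g))^2) * dx
}
```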
K-means method
For a given K, the K-means method seeks a partition of the densities, say C_1, ..., C_K, such that
1 \bigcup_{k=1}^{K} C_k = {f_t(x)}
2 C_i \cap C_j = ∅ for i ≠ j
3 the sum of within-cluster variation V = \sum_{k=1}^{K} V(C_k) is minimized, where the within-cluster variation is

  V(C_k) = \sum_{t_1, t_2 \in C_k} H(f_{t_1}, f_{t_2})^2

It turns out this can easily be done by applying the K-means method with squared Euclidean distance to the square-root densities {\sqrt{f_t(x)}}.
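A hedged sketch of that shortcut; fmat is an assumed T × N matrix whose row t holds f_t evaluated on the grid, and nstart is an arbitrary choice made here.

```r
# Hedged sketch: K-means on the square-root densities, which (up to a constant)
# minimizes the squared-Hellinger within-cluster variation.
cluster_densities <- function(fmat, K) {
  kmeans(sqrt(fmat), centers = K, nstart = 25)$cluster
}
```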
Example of K-means
Consider the 48 density functions of half-hour demand for electricity on Monday in Adelaide, Australia.
With K = 4 clusters, we have

k   Elements (time index)         Calendar hours
1   17 to 44                      8:00 AM to 10:00 PM
2   15, 16, 45 to 48, 1, 2, 3     7:00-8:00 AM; 10:00 PM-1:30 AM
3   4, 5, 13, 14                  1:30-2:30 AM; 6:00-7:00 AM
4   6 to 12                       2:30-6:00 AM

Result: the clusters capture daily activities, namely, (1) active period, (2) transition period, (3) light sleeping period, and (4) sound sleeping period.
Figure: Density functions of half-hour electricity demand on Monday in Adelaide, Australia. The sample period is from July 6, 1997 to March 31, 2007.
Figure: Results of K-means cluster analysis based on the squared Hellinger distance for electricity demand on Monday. Different colors denote different clusters.
Tree-based classification
Let Z_t = (z_{1t}, ..., z_{pt})' denote p covariates. We use an iterative procedure to build a binary tree, starting with the root C_0 = {f_t(x)}.
1 For each covariate z_{it}, let z_{i(j)} be the j-th order statistic.
  1 Divide C_0 into two sub-clusters
    C_{i,j,1} = {f_t(x) | z_{it} ≤ z_{i(j)}};  C_{i,j,2} = {f_t(x) | z_{it} > z_{i(j)}}
  2 Compute the sum of within-cluster variations
    H(i, j) = V(C_{i,j,1}) + V(C_{i,j,2})
  3 Find the smallest j, say v_i, such that H(i, v_i) = min_j {H(i, j)}.
2 Select i ∈ {1, ..., p}, say I, such that H(I, v_I) = min_i {H(i, v_i)}.
3 Use covariate z_{It} with threshold v_I to grow two new leaves, i.e.
  C_{1,1} = C_{I,v_I,1},  C_{1,2} = C_{I,v_I,2}
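A hedged sketch of the single-split search just described, reusing the hellinger2 helper above; fmat (T × N densities), a covariate vector z, and the grid spacing dx are assumed inputs.

```r
# Hedged sketch of one split: within-cluster variation and the best threshold.
within_var <- function(idx, fmat, dx) {
  if (length(idx) < 2) return(0)
  s <- 0
  for (a in idx) for (b in idx) s <- s + hellinger2(fmat[a, ], fmat[b, ], dx)
  s / 2                               # each unordered pair counted once
}
best_split <- function(z, fmat, dx) {
  cuts <- sort(unique(z))
  obj <- sapply(cuts, function(cut) {
    left <- which(z <= cut); right <- which(z > cut)
    within_var(left, fmat, dx) + within_var(right, fmat, dx)
  })
  cuts[which.min(obj)]                # smallest threshold attaining the minimum
}
```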
Tree-based procedure continued
Next, consider C1,1 and C1,2 as the root of a branch and applythe same procedure with their associated covariates to findcandidate for growth.The only modification is as follows: When considering C1,1, wetreat C1,2 as a leaf in computing the sum of within-clustervariations. Similarly, when considering C1,2 for further division,we treat C1,1 as a leaf in computing the sum of within-clustervariations.This growth-procedure is iterated until the number of clusters Kis reached.
Example of tree-based classification
Consider the density functions of U.S. daily log stock returns in 2012 and 2013.
Using the first-differenced VIX index as the explanatory variable and K = 4, we obtain 4 clusters as follows:

(−∞, −0.73], (−0.73, 0.39], (0.39, 1.19], (1.19, ∞).

The cluster sizes are 104, 259, 86, and 53, respectively.
Note that a positive z_t signifies an increase in market volatility (uncertainty).
What drove the U.S. financial market?
Figure: Time plots of the market fear factor (VIX index) and its change series: 2012-2013.
Figure: Results of tree-based cluster analysis for the daily densities of log returns of U.S. stocks in 2012 and 2013. The first-differenced series of the VIX index is used as the explanatory variable. The numbers of elements in the clusters are 53, 86, 259, and 104, respectively. The cluster classification is given in the heading of each plot.
Model-based classification
Work directly on the observed multiple time series.
1 Postulate a general univariate model for all time series, e.g. an AR(p) model
2 Time series in a cluster follow the same model: pool the data to estimate the common parameters
3 Time series in different clusters follow different models
4 May be estimated by Markov chain Monte Carlo methods
5 May employ scale mixtures of normal innovations to handle outliers
This approach has been widely studied, e.g. Wang et al. (2013) and Fruehwirth-Schnatter (2011), among others.
Application
1 Apply to the monthly unemployment rates of the 50 U.S. states.
2 Use out-of-sample predictions to compare with other methods, including the lasso.
3 For 1-step to 5-step ahead predictions, the model-based method works well in comparison. Wang et al. (2013, JoF).
             RMSE×10^4                    MAE×10^4
Method     m=1   m=2   m=3   m=4        m=1   m=2   m=3   m=4
UAR       1616  1492  1791  2073        879   994  1268  1386
VAR       2676  2095  2129  2759       1349  1353  1506  1624
Lasso25   1798  1833  2063  2504       1245  1250  1332  1401
Lasso15   1714  1798  1855  2028       1186  1228  1296  1399
G-Lasso   1877  1865  1882  1905       1291  1290  1306  1327
LVAR      1550  1716  1806  1904       1065  1298  1210  1355
Pls10     1239  1531  1679  1873        909  1028  1263  1226
Pls30     1395  1651  1835  1890        933  1092  1281  1320
Pls50     1685  1871  2006  1967        940  1158  1304  1377
Pls70     1914  2040  2182  1953        996  1222  1362  1432
Pls100    2187  2279  2313  2123       1099  1342  1480  1552
Pcr10     1276  1829  2077  2108        890  1073  1247  1415
Pcr30     1577  1837  2049  1769        888  1093  1261  1321
Pcr50     1546  1805  2017  1759        880  1035  1209  1260
Pcr70     1594  1837  2049  1769        886  1042  1221  1283
Pcr100    1649  2117  2202  2163       1068  1243  1324  1421
MBC       1607  1703  1809  1961        885  1035  1225  1361
rMBC      1225  1481  1691  1839        873  1027  1193  1295

Table: Root mean squared errors (RMSE) and mean absolute errors (MAE) of 1-step to 4-step ahead out-of-sample forecasts for various models applied to the 50 state unemployment rates. The forecasting period is from January 2006 to September 2011. In the table, m denotes the forecasting horizon. The models used are: univariate AR(4) model (UAR), traditional VAR(4) model (VAR), VAR(4) with LASSO and s = 0.25 of the L1 norm (Lasso25), VAR(4) with LASSO and s = 0.15 of the L1 norm (Lasso15), group LASSO (G-Lasso), large vector autoregression of Song and Bickel (LVAR), partial least squares with the first k components (Plsk, k = 10, 30, 50, 70, 100), principal component regression with the first k components (Pcrk, k = 10, 30, 50, 70, 100), model-based clustering (MBC), and robust model-based clustering (rMBC).
Functional PCA: Singular value decomposition
1 A tool to study the time evolution of the return distributions
2 Data set: in this particular instance, each density function is evaluated at 512 points and we have

  Y = [Y_{it} = f_t(x_i) | i = 1, ..., N; t = 1, ..., T]_{512×502}

3 Perform the singular value decomposition

  Y = (N − 1) U D V',

  where Y denotes the column-mean adjusted data matrix, U is an N × N unitary matrix, D is an N × T rectangular diagonal matrix, and V is a T × T unitary matrix.
4 This is a simple form of functional PCA. [With large samples, smoothing of the PCs is not needed.]
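A hedged R sketch of this decomposition; fmat is an assumed N × T matrix with fmat[i, t] = f_t(x_i), and centering by the mean density across days is an assumption made here.

```r
# Hedged sketch: functional PCA of the daily return densities via the SVD.
Yc  <- fmat - rowMeans(fmat)               # remove the mean density across days
dec <- svd(Yc)                             # Yc = U %*% diag(d) %*% t(V)
pc_functions <- dec$u                      # columns: principal component functions
scores <- diag(dec$d) %*% t(dec$v)         # loadings of each day on each PC
plot(dec$d^2 / sum(dec$d^2), type = "h")   # proportion of variance (scree plot)
```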
Scree plot

Figure: Scree plot of the PCA for daily return densities in 2012 and 2013.
The first 6 PC functions

Figure: The first 6 PC functions for the daily log return densities in 2012 and 2013.
The next 6 PC functions

Figure: The 7th-12th PC functions for the daily log return densities in 2012 and 2013.
Meaning of PC functions? 1st

Figure: Mean density ± 1st PC: peak and tails; mean + standardized 1st PC (red).
Meaning of PC functions? 2nd

Figure: Mean density ± 2nd PC: midrange returns.
Meaning of PC functions? 3rd

Figure: Mean density ± 3rd PC: curvature.
Approximate factor models
f_t(x) = \sum_{i=1}^{p} \lambda_{t,i} g_i(x) + \varepsilon_t(x),

where g_i(x) denotes the i-th common factor and \varepsilon_t(x) is the noise function.
1 A generalization of the orthogonal factor model, but it allows the error functions to be correlated.
2 Only asymptotically identified under some regularity conditions.
3 FPCA provides a way to estimate approximate factor models.
Loadings of the first PC function

Figure: Scatter plot of the loadings vs changes in the VIX index. The red line denotes a lowess fit.
Functional PC via Thresholding
1 Zero appears to be a reasonable and natural threshold
2 Regime 1: dvix ≥ 0, with 244 days. [Volatile (bad) state]
3 Regime 2: dvix < 0, with 258 days. [Calm (good) state]
4 Perform PCA of the density functions for each regime.
5 The differences are clearly seen.
6 Leads to different approximate factor models for the density functions
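A hedged sketch of the regime split; dvix is an assumed length-T vector of VIX changes aligned with the columns of fmat from the earlier sketch.

```r
# Hedged sketch: split the days by the sign of the VIX change and redo the
# SVD-based functional PCA within each regime.
regime1 <- dvix >= 0                                       # volatile state
regime2 <- dvix < 0                                        # calm state
svd1 <- svd(fmat[, regime1] - rowMeans(fmat[, regime1]))
svd2 <- svd(fmat[, regime2] - rowMeans(fmat[, regime2]))
```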
Scree plots

Figure: Scree plots of the PCA for each regime (dvix ≥ 0 and dvix < 0).
The first 6 PC functions

Figure: The first 6 PC functions of the daily log return densities for each regime; the red line is for the calm state, Regime 2.
Approximate factor models
1 Use approximate factor models with the first 12 principal component functions
2 Compare the overall fits with and without thresholding
3 For Regime 1 (positive dvix): randomly select day 17
4 For Regime 2 (negative dvix): randomly select day 420
5 Check: (a) observed vs fitted densities and (b) residuals with and without thresholding
6 With 12 components, both approaches fare well, but thresholding provides improvements.
Comparison: day 17 (in Regime 1)

Figure: Density of day 17 and its fits. Top plot: observed (black), all (red), thresholded (blue). Bottom plot: approximation errors, all (black), thresholded (red).
Comparison: day 420 (in Regime 2)

Figure: Density of day 420 and its fits. Top plot: observed (black), all (red), thresholded (blue). Bottom plot: approximation errors, all (black), thresholded (red).
Lasso and beyond
1 Need to exploit parsimony, beyond sparsity
2 Need to take into account prior knowledge. We have accumulated a lot of knowledge in diverse scientific areas. How can we take advantage of this knowledge?
3 Variable selection is not sufficient. More importantly, what are the proper measurements to take? What questions can a given big data set answer?
An illustration
Every country has many interest rate series:
1 they have different maturities
2 they serve different financial purposes
3 What is the information embedded in those interest rate series?
Consider U.S. weekly constant maturity interest rates:
1 From January 8, 1982 to October 30, 2015
2 Maturities: 3m, 6m, 1y, 2y, 3y, 5y, 7y, 10y, and 30y*
Figure: Time plots of U.S. weekly interest rates with different maturities: 1/8/1982 to 10/30/2015.
Figure: Scree plot of U.S. weekly interest rates.
Figure: Time plots of the first four principal components of U.S. weekly interest rates.
Implication?
In lasso-type analysis,
1 should we use the interest rate series directly, even with the group lasso? This leads to sparsity.
2 should we apply PCA first and then use the PCs? This leads to parsimony. (See the sketch below.)
3 should we develop other possibilities? Fused lasso? Factor models?
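A hedged sketch of the PCA-first route; rates is an assumed T × 9 matrix holding the nine maturity series.

```r
# Hedged sketch: PCA of the weekly interest-rate panel, then work with the PCs.
pca <- prcomp(rates)
screeplot(pca)          # how many components carry the information?
pcs <- pca$x[, 1:4]     # the first four principal components as inputs
```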
Concluding Remark
1 Big dependent data appear in many applications
2 Methods developed for independent big data may fail
3 Statistical methods for big dependent data are relatively under-developed
4 Some new challenges emerge; new opportunities exist
5 Simple modifications of the traditional methods might work well
6 Both theory and methods require further research