Analysis of Big Dependent Data in Economics and Finance

Ruey S. Tsay
Booth School of Business, University of Chicago

September 2016

Ruey S. Tsay Big Dependent Data 1 / 72

Outline

1 Big data? Machine learning? Data science? What is in it for economics and finance?
2 Real-world data are often dynamically dependent
3 A simple example: Methods for independent data may fail
4 Trade-off between simplicity and reality
5 Some methods useful for analyzing big dependent data in economics and finance
6 Examples
7 Concluding remarks

Big dependent data

1 Accurate information is the key to success in the competitive global economy. Information age.
2 What is big data? High dimension (many variables)? Large sample size? Both?
3 Not all big data sets are useful. Confounding and noise
4 Need to develop methods to efficiently extract useful information from big data
5 Know the limitations of big data
6 Issues emerging from big data: privacy? ethical issues?
7 Focus on methods for analyzing big dependent data in economics and finance

What is available?

Statistical methods:
1 Focus on sparsity (simplicity)
2 Various penalized regressions, e.g. Lasso and its extensions
3 Various dimension reduction methods and models
4 Common framework used: independent observations, with limited extensions to stationary data

Real data are often dynamically dependent!

Some useful concepts in analyzing big data:
1 Parsimony vs sparsity: Parsimony ⇏ Sparsity
2 Simplicity vs reality: trade-off between feasibility and sophistication

Parsimonious, not sparse

A simple example

$$y_t = c + \sum_{i=1}^{k} \beta x_{it} + \varepsilon_t = c + \beta \sum_{i=1}^{k} x_{it} + \varepsilon_t,$$

where $k$ is large, the $x_{it}$ are not perfectly correlated, and the $\varepsilon_t$ are iid $N(0, \sigma^2)$.
The model has three parameters so it is parsimonious, but not sparse because $y$ depends on all explanatory variables.
In some applications, $\sum_{i=1}^{k} x_{it}$ is a close approximation to the first principal component. For example, the level of interest rates is important to an economy.
Fused-Lasso can solve this difficulty in some situations.

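What is LASSO regression?

Model: (assume mean-adjusted)

$$y_i = \sum_{j=1}^{p} \beta_j X_{j,i} + \varepsilon_i.$$

Matrix form: X is the design matrix

$$Y = X\beta + \varepsilon.$$

Objective function: In particular, if p > T

$$\hat{\beta}(\lambda) = \arg\min_{\beta} \left( \|Y - X\beta\|_2^2 / T + \lambda \|\beta\|_1 \right),$$

where $\lambda \ge 0$ is a penalty parameter, $\|\beta\|_1 = \sum_{j=1}^{p} |\beta_j|$, and $\|Y - X\beta\|_2^2 = \sum_{i=1}^{T} (y_i - X_i'\beta)^2$.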
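The objective above can be minimized one coefficient at a time. The following is a minimal Python sketch of cyclic coordinate descent; it is not the algorithm used by lars or glmnet, and the function names `soft_threshold` and `lasso_cd`, the toy data, and the choice λ = 0.4 are assumptions made for illustration. With the 1/T scaling used here, each coordinate update soft-thresholds at λ/2.

```python
import random

def soft_threshold(z, gamma):
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def lasso_cd(X, y, lam, n_sweeps=200):
    """Minimize ||y - X b||_2^2 / T + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of T rows (each a list of p floats); y is a list of T floats.
    """
    T, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta; equals y while beta is zero
    col_ss = [sum(X[i][j] ** 2 for i in range(T)) / T for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue
            # inner product of column j with the partial residual (beta_j added back)
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(T)) / T
            new_bj = soft_threshold(rho, lam / 2.0) / col_ss[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(T):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta

# Toy data (illustrative only): y depends on the first two of six regressors.
random.seed(1)
T, p = 200, 6
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(T)]
y = [2.0 * row[0] - 1.5 * row[1] + random.gauss(0, 0.5) for row in X]

beta_hat = lasso_cd(X, y, lam=0.4)
print([round(b, 3) for b in beta_hat])
```

The two active coefficients are shrunk toward zero and the remaining ones are typically set to zero, which is the sparsity the penalty is designed to produce.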


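What is the big deal?

Sparsity
Using convexity, LASSO is equivalent to

$$\hat{\beta}_{opt}(R) = \arg\min_{\beta;\ \|\beta\|_1 \le R} \|Y - X\beta\|_2^2 / T.$$

Old friend: Ridge regression

$$\hat{\beta}_{Ridge}(\lambda) = \arg\min_{\beta} \left( \|Y - X\beta\|_2^2 / T + \lambda \|\beta\|_2^2 \right), \quad \text{or}$$

$$\hat{\beta}(R) = \arg\min_{\beta;\ \|\beta\|_2^2 \le R} \|Y - X\beta\|_2^2 / T.$$

Special case: p = 2. $\|Y - X\beta\|_2^2 / T$ is quadratic in $\beta$, so its contours are ellipses. The constraint region $\|\beta\|_1 \le R$ is diamond-shaped, whereas $\|\beta\|_2^2 \le R$ is a disk. The diamond has corners on the axes, so the constrained minimum often sits at a corner where one coefficient is exactly zero. Thus, LASSO leads to sparsity.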
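A special case makes the contrast concrete. Assuming an orthonormal design (X'X/T = I, an assumption not made on the slide), both penalized problems have closed forms in terms of the OLS estimate b: LASSO soft-thresholds each coefficient at λ/2, while ridge rescales it by 1/(1 + λ). The Python sketch below uses hypothetical function names and made-up coefficients.

```python
def lasso_orthonormal(b_ols, lam):
    """LASSO solution when X'X/T = I: soft-threshold each OLS coefficient at lam/2."""
    gamma = lam / 2.0
    out = []
    for b in b_ols:
        if b > gamma:
            out.append(b - gamma)
        elif b < -gamma:
            out.append(b + gamma)
        else:
            out.append(0.0)  # small coefficients are set exactly to zero
    return out

def ridge_orthonormal(b_ols, lam):
    """Ridge solution when X'X/T = I: proportional shrinkage, never exactly zero."""
    return [b / (1.0 + lam) for b in b_ols]

b_ols = [3.0, 0.25, -0.125]  # illustrative OLS estimates
print(lasso_orthonormal(b_ols, lam=1.0))  # [2.5, 0.0, 0.0]
print(ridge_orthonormal(b_ols, lam=1.0))  # [1.5, 0.125, -0.0625]
```

LASSO removes the two small coefficients, while ridge keeps all three at half their size.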


Computation and extensions

1 Optimization: Least angle regression (lars) by Efron et al. (2004) makes the computation very efficient.
2 Extensions:
  Group lasso: Yuan and Lin (2006). Subsets of X have specific meaning, e.g. treatment
  Elastic net: Zou and Hastie (2005). Using a combination of L1 and L2 penalties
  SCAD: Fan and Li (2001). Nonconcave penalized likelihood. [Smoothly clipped absolute deviation (SCAD).]
  Various Bayesian methods: penalty function is the prior.
3 Packages available in R: lars, glmnet, gamlr, gbm and many others.

A simulated example

p = 300, T = 150, X iid N(0,1), $\varepsilon_i$ iid N(0, 0.25).

$$y_i = x_{3i} + 2(x_{4i} + x_{5i} + x_{7i}) - 2(x_{11,i} + x_{12,i} + x_{13,i} + x_{21,i} + x_{22,i} + x_{30,i}) + \varepsilon_i$$

1 How? R demonstration
2 Selection of λ? Cross-validation (10-fold), measurement of prediction accuracy
3 The commands lars and cv.lars of the package lars
4 The commands glmnet and cv.glmnet of the package glmnet
5 Relationship between the two packages (alpha = 0)

Lasso may fail for dependent data

1 Data generating model: scalar Gaussian autoregressive, AR(3), model

$$x_t = 1.9x_{t-1} - 0.8x_{t-2} - 0.1x_{t-3} + a_t, \quad a_t \sim N(0,1).$$

  Generate 2000 observations. See Figure 1.
2 Big data setup
  Dependent variable: $x_t$, t = 11, ..., 2000
  Regressors: $X_t = [x_{t-1}, x_{t-2}, \ldots, x_{t-10}, z_{1t}, \ldots, z_{10,t}]$, where $z_{it}$ are iid N(0,1).
  Dimension = 20, sample size 1990.
3 Run the Lasso regression via the lars package of R. See Figure 2 for results. Lag 3, $x_{t-3}$, was not selected.

Lasso fails in this case.
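A minimal Python sketch of this setup follows. The slide's own demonstration uses R; the function names, seeds, and zero start-up values here are assumptions. The sketch also checks that the AR polynomial 1 − 1.9B + 0.8B² + 0.1B³ factors as (1 − B)²(1 + 0.1B), so the series has two unit roots.

```python
import random

def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists in increasing powers of B."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

# (1 - B)^2 (1 + 0.1B) should reproduce 1 - 1.9B + 0.8B^2 + 0.1B^3
one_minus_B = [1.0, -1.0]
ar_poly = poly_mul(poly_mul(one_minus_B, one_minus_B), [1.0, 0.1])

def simulate_ar3(n, phi=(1.9, -0.8, -0.1), seed=0):
    """Simulate x_t = phi1*x_{t-1} + phi2*x_{t-2} + phi3*x_{t-3} + a_t, a_t ~ N(0,1)."""
    rng = random.Random(seed)
    x = [0.0, 0.0, 0.0]  # zero start-up values (an assumption)
    for _ in range(n):
        a_t = rng.gauss(0.0, 1.0)
        x.append(phi[0] * x[-1] + phi[1] * x[-2] + phi[2] * x[-3] + a_t)
    return x[3:]

def build_design(x, n_lags=10, n_noise=10, seed=1):
    """Rows are [x_{t-1}, ..., x_{t-10}, z_1t, ..., z_10t]; the target is x_t."""
    rng = random.Random(seed)
    rows, target = [], []
    for t in range(n_lags, len(x)):
        lags = [x[t - k] for k in range(1, n_lags + 1)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        rows.append(lags + noise)
        target.append(x[t])
    return rows, target

x = simulate_ar3(2000)
X, y = build_design(x)
print([round(c, 10) for c in ar_poly])  # [1.0, -1.9, 0.8, 0.1]
print(len(X), len(X[0]))                # 1990 20
```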

Figure: Time plot of simulated AR(3) time series with 2000 observations (horizontal axis: Time; vertical axis: xt)

Figure: Results of Lasso regression for the AR(3) series (LASSO coefficient paths: standardized coefficients against |beta|/max|beta|)

OLS works if we entertain AR models

Run the linear regression using the first three variables of $X_t$. Fitted model

$$x_t = 1.902x_{t-1} - 0.807x_{t-2} - 0.095x_{t-3} + \varepsilon_t, \quad \sigma_\varepsilon = 1.01.$$

All estimates are statistically significant with p-values less than $2.22 \times 10^{-5}$.
The residuals are well behaved, e.g. Q(10) = 12.23 with p-value 0.20 (after adjusting the df).

Simple time series method works for dependent data.

Why does lasso fail?

Two possibilities:
1 Scaling effect: Lasso standardizes each variable in $X_t$. For unit-root non-stationary time series, standardization might wash out the dependence in the stationary part
2 Multicollinearity: Unit-root time series have strong serial correlations. [ACFs approach 1 for all lags.]

This artificial example highlights the difference between independent and dependent data.

Need to develop methods for big dependent data!

Possible solutions

1 Re-parameterization using time series properties
2 Use different penalties for different parameters

The first approach is easier.
For the particular time series, we can define $\Delta x_t = (1 - B)x_t$ and $\Delta^2 x_t = (1 - B)^2 x_t$. Then,

$$\begin{aligned}
x_t &= 1.9x_{t-1} - 0.8x_{t-2} - 0.1x_{t-3} + a_t \\
    &= x_{t-1} + \Delta x_{t-1} - 0.1\Delta^2 x_{t-1} + a_t \\
    &= \text{double} + \text{single} + \text{stationary} + a_t.
\end{aligned}$$

The coefficients of $x_{t-1}$, $\Delta x_{t-1}$, $\Delta^2 x_{t-1}$ are 1, 1, and −0.1, respectively.
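The equality between the two forms of the model can be checked by expanding the differences. The short derivation below uses only the definitions of $\Delta$ and $\Delta^2$ given on this slide.

```latex
\begin{aligned}
\Delta x_{t-1}   &= x_{t-1} - x_{t-2}, \\
\Delta^2 x_{t-1} &= x_{t-1} - 2x_{t-2} + x_{t-3}, \\
x_{t-1} + \Delta x_{t-1} - 0.1\,\Delta^2 x_{t-1}
  &= (1 + 1 - 0.1)\,x_{t-1} + (-1 + 0.2)\,x_{t-2} - 0.1\,x_{t-3} \\
  &= 1.9\,x_{t-1} - 0.8\,x_{t-2} - 0.1\,x_{t-3}.
\end{aligned}
```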


Different frameworks for LASSO

The X-matrix of conventional LASSO consists of

$$(x_{t-1}, x_{t-2}, \ldots, x_{t-10}, z_{1t}, \ldots, z_{10,t}),$$

where $z_{it}$ are iid N(0,1).
Under the re-parameterization, the X-matrix becomes

$$(x_{t-1}, \Delta x_{t-1}, \Delta^2 x_{t-1}, \ldots, \Delta^2 x_{t-8}, z_{1t}, \ldots, z_{10,t}).$$

These two X-matrices provide theoretically the same information. However, the first one has high multicollinearity, but the second one does not, especially after standardization.
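As a sketch of how the time-series part of the re-parameterized X-matrix could be assembled, the hypothetical helper `reparam_row` below returns the ten columns $x_{t-1}, \Delta x_{t-1}, \Delta^2 x_{t-1}, \ldots, \Delta^2 x_{t-8}$ for one time point. The function name, 0-based indexing, and toy series are assumptions; the noise regressors $z_{it}$ are left out.

```python
def reparam_row(x, t, n_lags=10):
    """Re-parameterized regressors for predicting x[t] (0-based index, t >= n_lags).

    Returns [x_{t-1}, first difference at t-1, second differences at
    t-1, ..., t-(n_lags-2)], which uses the same observations as the
    n_lags raw lags x_{t-1}, ..., x_{t-n_lags}.
    """
    if t < n_lags:
        raise ValueError("need at least n_lags earlier observations")
    first_diff = x[t - 1] - x[t - 2]
    second_diffs = [x[t - k] - 2 * x[t - k - 1] + x[t - k - 2]
                    for k in range(1, n_lags - 1)]
    return [x[t - 1], first_diff] + second_diffs

# Toy series (illustrative): a quadratic trend has constant second differences.
x = [float(k * k) for k in range(20)]
row = reparam_row(x, t=12)
print(row)  # [121.0, 21.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
```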

Figure: Comparison of β-estimates of lars results (panels for β2, β3 and β4; horizontal axes 1:20 and 1:22)

Theoretical justification

Focus on the particular series $x_t$ used. Some properties of the series are

1 $T^{-4}\sum_{t=1}^{T} x_t^2 \Rightarrow \int_0^1 \overline{W}^2$, where $\overline{W}(r) = \int_0^r W(s)\,ds$ with $W(s)$ the standard Brownian motion.
2 $T^{-5/2}\sum_{t=1}^{T} x_t \Rightarrow \int_0^1 \overline{W}$
3 $T^{-3}\sum_{t=1}^{T} x_t \Delta x_t \Rightarrow \int_0^1 \overline{W} W$
4 $T^{-2}\sum_{t=1}^{T} (\Delta x_t)^2 \Rightarrow \int_0^1 W^2$

Standardization may wash out the $\Delta x_{t-1}$ and $\Delta^2 x_{t-1}$ parts.

Examples of big dependent data

1 Daily returns of U.S. stocks
2 Demand of electricity at 30-minute intervals
3 Daily spreads of CDS (credit default swaps) of selected companies
4 Monthly unemployment rates of the 50 states of the U.S.
5 Interest rates of an economy
6 Air pollution measurements at multiple locations and health risk. Complex spatio-temporal data in general.

Figure: Sample sizes of U.S. daily stock returns in 2012 and 2013: mean 6681, range = (6593, 6774) (horizontal axis: days; vertical axis: number of stocks)

Time series plot

Figure: Densities of daily log returns of U.S. stocks in 2012 and 2013 (panels: Densities of 2012, Densities of 2013; horizontal axis: lnreturn; vertical axis: density).

Figure: Empirical densities of electricity demand, 30-minute intervals, from July 6, 1997 to March 31, 2007. Adelaide, Australia (one panel per day of the week, Monday to Sunday; horizontal axis: demand)

Figure: Time plots of monthly state unemployment rates of the U.S. from 1976.1 to 2015.9 (horizontal axis: year; vertical axis: unemployment rate)

Some statistical methods

Goal: Extract useful information, including pooling.

1 Classification and cluster analysis
  K means
  Tree-based classification
  Model-based classification
2 Factor models & extensions
  Orthogonal factor model
  Approximate factor model
  Dynamic factor model
  Constrained factor models (column, row constraints)

$$X_t = R f_t C + e_t$$

3 Generalizations of Lasso methods to dependent data, e.g. LASSO for nowcasting vs MIDAS

Constrained factor models

Column (variable) constraint only: Tsai & Tsay (2010). Let $z_t$ be a k-dimensional time series

$$z_t = H\omega f_t + \varepsilon_t, \quad t = 1, \ldots, T,$$

where $H$ is a k × r known matrix, $f_t$ is an m-dimensional common factor, and $\omega$ is an r × m matrix of unknown loading parameters.
For observed data in matrix form

$$Z = F\omega'H' + \varepsilon$$

A simple illustration

Monthly log returns of 10 stocks from 2001 to 2011
1 Semi-conductor: TXN, MU, INTC, TSM
2 Pharmaceutical: PFE, MRK, LLY
3 Investment bank: JPM, MS, GS

The constraints $H = [h_1, h_2, h_3]$, where

$$\begin{aligned}
h_1 &= (1,1,1,1,0,0,0,0,0,0)' \\
h_2 &= (0,0,0,0,1,1,1,0,0,0)' \\
h_3 &= (0,0,0,0,0,0,0,1,1,1)'
\end{aligned}$$
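To see what the constraint does to the loadings, the Python sketch below builds H from the sector memberships and forms L = Hω. The ω values are the constrained-model loadings reported in the estimation table of this deck (one row per sector); the helper `matmul` and the variable names are assumptions for illustration.

```python
tickers = ["TXN", "MU", "INTC", "TSM", "PFE", "MRK", "LLY", "JPM", "MS", "GS"]
# 0 = semiconductor, 1 = pharmaceutical, 2 = investment bank
sector = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]

# H is the known 10 x 3 constraint matrix [h1, h2, h3]
H = [[1.0 if sector[i] == j else 0.0 for j in range(3)] for i in range(len(tickers))]

# omega (3 x 3): one row of loadings per sector, copied from the constrained-model columns
omega = [
    [0.76, 0.26, 0.27],
    [0.44, -0.68, 0.10],
    [0.74, 0.06, -0.43],
]

def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

L = matmul(H, omega)  # every stock inherits the loadings of its sector
for name, loadings in zip(tickers, L):
    print(name, loadings)
```

Because each row of H selects one sector, the constrained model estimates 9 loadings instead of the 30 in an unconstrained 10 × 3 loading matrix.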


Table: Estimation Results of Constrained and Orthogonal Factor Models

Stock   Constrained Model: L = Hω       Orthogonal Model: PCA
Tick       L1     L2     L3   Σε,i         L1     L2     L3   Σε,i
TXN      0.76   0.26   0.27   0.28       0.79   0.20   0.32   0.24
MU       0.76   0.26   0.27   0.28       0.67   0.36   0.29   0.34
INTC     0.76   0.26   0.27   0.28       0.79   0.18   0.33   0.23
TSM      0.76   0.26   0.27   0.28       0.80   0.27   0.16   0.26
PFE      0.44  -0.68   0.10   0.34       0.49  -0.64  -0.03   0.35
MRK      0.44  -0.68   0.10   0.34       0.40  -0.69   0.23   0.31
LLY      0.44  -0.68   0.10   0.34       0.45  -0.70   0.06   0.31
JPM      0.74   0.06  -0.43   0.27       0.72   0.02  -0.35   0.36
MS       0.74   0.06  -0.43   0.27       0.76   0.05  -0.43   0.25
GS       0.74   0.06  -0.43   0.27       0.75   0.12  -0.50   0.18
e.v.     4.58   1.65   0.88              4.63   1.68   0.93

Variability explained: 70.6% (constrained model); 72.4% (orthogonal model)

Both row and column constraints

Tsai et al. (2016). T observations and k variables. Data matrix form

$$Z = F_1\omega_1'H' + GF_2\omega_2' + GF_3\omega_3'H' + E,$$

where $G$ denotes a known T × m row constraint matrix.

Figure: The census regions and divisions of the United States

Figure: Time plots of monthly housing starts (in logarithms) of 9 U.S. divisions: 1997-2006. Panels: New England, Middle Atlantic, East North Central, West North Central, South Atlantic, East South Central, West South Central, Mountain, Pacific.

Figure: Time series plots of common factors for a DCF model of order (r,p,q) = (2,2,2) via maximum likelihood estimation. Panels: F1[,1], F1[,2], F2[,1], F2[,2], F3[,1], F3[,2].

Figure: Time series plots for $GF_2\omega_2'$ of a fitted DCF model of order (2,2,2). Maximum likelihood estimation is used. One panel per census division.

Figure: Time series plots for $F_1\omega_1'H'$ of a fitted DCF model of order (2,2,2). Maximum likelihood estimation is used. One panel per census division.

Figure: Time series plots for $GF_3\omega_3'H'$ of a fitted DCF model of order (2,2,2). Maximum likelihood estimation is used. One panel per census division.

Matrix-valued variables

Consider simultaneously n macroeconomic variables in k countries

        U.S.       Italy      Spain      ···   Canada
GDP     X_{11,t}   X_{12,t}   X_{13,t}   ···   X_{1k,t}
Unem    X_{21,t}   X_{22,t}   X_{23,t}   ···   X_{2k,t}
CPI     X_{31,t}   X_{32,t}   X_{33,t}   ···   X_{3k,t}
⋮       ⋮          ⋮          ⋮                ⋮
M1      X_{n1,t}   X_{n2,t}   X_{n3,t}   ···   X_{nk,t}

On-going: only preliminary results are available. See Chen et al. (2016)

Classification

A possible approach: Use a two-step procedure
1 Transform dependent big data into functions, e.g. probability densities
2 Apply classification methods to functional data

The density functions of daily log returns of U.S. stocks serve as an example.
We can then classify the density functions to make statistical inference

Illustration of classification

Cluster Analysis of density functions

Consider the time series of density functions $\{f_t(x)\}$.
For simplicity, assume the densities are evaluated at equally-spaced grid points $\{x_1 < x_2 < \ldots < x_N\} \subset D$ with increment $\Delta x$. The data we have become $\{f_t(x_i) \mid t = 1, \ldots, T;\ i = 1, \ldots, N\}$.

Using Hellinger distance (HD), we consider two methods:
  K means
  Tree-based classification

Hellinger distance of two density functions

Let $f(x)$ and $g(x)$ be two density functions on the common domain $D \subset \mathbb{R}$. Assume both density functions are absolutely continuous w.r.t. the Lebesgue measure. The Hellinger distance (HD) between $f(x)$ and $g(x)$ is defined as

$$H(f,g)^2 = \frac{1}{2}\int_D \left(\sqrt{f(x)} - \sqrt{g(x)}\right)^2 dx = 1 - \int_D \sqrt{f(x)g(x)}\,dx$$

Basic properties:
1 $H(f,g) \ge 0$
2 $H(f,g) = 0$ if and only if $f(x) = g(x)$ almost surely.
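For densities evaluated on an equally-spaced grid, the integral can be approximated by a sum times the increment Δx. The Python sketch below is one way to do this; the function names and the toy densities are assumptions for illustration.

```python
import math

def hellinger_sq(f, g, dx):
    """Squared Hellinger distance between two densities on a common grid.

    f and g hold density values at equally spaced points with increment dx.
    """
    return 0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(f, g)) * dx

def affinity(f, g, dx):
    """Grid approximation of the integral of sqrt(f * g)."""
    return sum(math.sqrt(a * b) for a, b in zip(f, g)) * dx

dx = 0.5
f = [1.0, 1.0, 0.0, 0.0]  # integrates to 1 on the grid
g = [0.0, 0.0, 1.0, 1.0]  # support disjoint from f
h = [0.5, 0.5, 0.5, 0.5]  # uniform on the grid

print(hellinger_sq(f, f, dx))  # 0.0
print(hellinger_sq(f, g, dx))  # 1.0
print(abs(hellinger_sq(f, h, dx) - (1.0 - affinity(f, h, dx))) < 1e-12)  # True
```

The last line checks the second form of the definition, which holds on the grid whenever both densities sum to one after multiplying by Δx.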


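K-means method

For a given K, the K-means method seeks a partition of the densities, say, $C_1, \ldots, C_K$, such that
1 $\bigcup_{k=1}^{K} C_k = \{f_t(x)\}$
2 $C_i \cap C_j = \emptyset$ for $i \ne j$
3 The sum of within-cluster variation $V = \sum_{k=1}^{K} V(C_k)$ is minimized, where the within-cluster variation is

$$V(C_k) = \sum_{t_1, t_2 \in C_k} H(f_{t_1}, f_{t_2})^2$$

It turns out this can easily be done by applying the K-means method with squared Euclidean distance to the square-root densities $\{\sqrt{f_t(x)}\}$. On the grid, $H(f,g)^2$ is $\Delta x / 2$ times the squared Euclidean distance between the vectors of $\sqrt{f}$ and $\sqrt{g}$ values.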
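A minimal Python sketch of this idea follows: take square roots of the gridded densities and run a plain Lloyd-type K-means with squared Euclidean distance. The function names, the fixed starting centers, and the four toy densities are assumptions; a practical analysis would use several random starts.

```python
import math

def sqrt_density(f):
    """Map a gridded density to its square-root vector."""
    return [math.sqrt(v) for v in f]

def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(points, init_idx, n_iter=50):
    """Lloyd's algorithm; init_idx lists the points used as starting centers."""
    centers = [list(points[i]) for i in init_idx]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # assignment step: nearest center in squared Euclidean distance
        labels = [min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
                  for p in points]
        # update step: each center becomes the mean of its members
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster empties
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy densities on a 4-point grid with increment 1 (each sums to 1).
densities = [
    [0.70, 0.20, 0.10, 0.00],
    [0.60, 0.30, 0.10, 0.00],
    [0.00, 0.10, 0.20, 0.70],
    [0.00, 0.10, 0.30, 0.60],
]
labels = kmeans([sqrt_density(f) for f in densities], init_idx=[0, 2])
print(labels)  # [0, 0, 1, 1]
```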


Example of K-means

Consider the 48 density functions of half-hour demand of electricity on Monday in Adelaide, Australia.
With K = 4 clusters, we have

k   Elements (time index)        Calendar hours
1   17 to 44                     8:00 AM to 10:00 PM
2   15, 16, 45 to 48, 1, 2, 3    7:00 - 8:00 AM; 10:00 PM - 1:30 AM
3   4, 5, 13, 14                 1:30 - 2:30 AM; 6:00 - 7:00 AM
4   6 to 12                      2:30 - 6:00 AM

Result: capture daily activities, namely, (1) active period, (2) transition period, (3) light sleeping period, and (4) sound sleeping period.

Figure: Density functions of half-hour electricity demand on Monday at Adelaide, Australia. The sample period is from July 6, 1997 to March 31, 2007. (horizontal axis: megawatts; vertical axis: density)

Figure: Results of K-means Cluster Analysis Based on Squared Hellinger Distance for Electricity Demands on Monday. Different colors denote different clusters. (horizontal axis: megawatts; vertical axis: density)

Tree-based classification

Let $Z_t = (z_{1t}, \ldots, z_{pt})'$ denote p covariates. We use an iterative procedure to build a binary tree, starting with the root $C_0 = \{f_t(x)\}$.

1 For each covariate $z_{it}$, let $z_{i(j)}$ be the j-th order statistic
  1 Divide $C_0$ into two sub-clusters

  $$C_{i,j,1} = \{f_t(x) \mid z_{it} \le z_{i(j)}\}; \quad C_{i,j,2} = \{f_t(x) \mid z_{it} > z_{i(j)}\}$$

  2 Compute the sum of within-cluster variations

  $$H(i,j) = V(C_{i,j,1}) + V(C_{i,j,2})$$

  3 Find the smallest j, say $v_i$, such that $H(i, v_i) = \min_j \{H(i,j)\}$.
2 Select $i \in \{1, \ldots, p\}$, say $I$, such that $H(I, v_I) = \min_i \{H(i, v_i)\}$.
3 Use covariate $z_{It}$ with threshold $v_I$ to grow two new leaves, i.e.

$$C_{1,1} = C_{I,v_I,1}, \quad C_{1,2} = C_{I,v_I,2}$$
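The root split described above can be sketched in Python as an exhaustive search over covariates and order statistics. The function names, the toy densities, and the two toy covariates are assumptions. The sketch counts each pair of densities once, which halves V relative to a sum over all ordered pairs but does not change which split is selected.

```python
import math

def hellinger_sq(f, g, dx):
    """Squared Hellinger distance between two gridded densities."""
    return 0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(f, g)) * dx

def within_variation(cluster, dx):
    """Sum of squared Hellinger distances over unordered pairs in the cluster."""
    return sum(hellinger_sq(cluster[a], cluster[b], dx)
               for a in range(len(cluster)) for b in range(a + 1, len(cluster)))

def best_split(densities, covariates, dx):
    """Search every covariate and threshold for the best split of the root.

    covariates[i][t] is z_it. Returns (H, i, threshold); ties keep the
    first (smallest) threshold found.
    """
    best = None
    for i, z in enumerate(covariates):
        for thr in sorted(set(z))[:-1]:  # drop the maximum so both leaves are non-empty
            left = [f for f, v in zip(densities, z) if v <= thr]
            right = [f for f, v in zip(densities, z) if v > thr]
            h = within_variation(left, dx) + within_variation(right, dx)
            if best is None or h < best[0]:
                best = (h, i, thr)
    return best

# Toy data: four densities on a 4-point grid and two covariates (illustrative only).
densities = [
    [0.70, 0.20, 0.10, 0.00],
    [0.60, 0.30, 0.10, 0.00],
    [0.00, 0.10, 0.20, 0.70],
    [0.00, 0.10, 0.30, 0.60],
]
covariates = [
    [-1.0, -0.5, 0.4, 1.2],   # separates the two density shapes
    [0.3, -0.2, 0.5, -0.1],   # unrelated to the density shapes
]
h_min, i_best, threshold = best_split(densities, covariates, dx=1.0)
print(i_best, threshold)  # 0 -0.5
```

Growing the tree further would repeat the same search inside each leaf, as described on the next slide.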


Tree-based procedure continued

Next, consider $C_{1,1}$ and $C_{1,2}$ as the root of a branch and apply the same procedure with their associated covariates to find a candidate for growth.
The only modification is as follows: When considering $C_{1,1}$, we treat $C_{1,2}$ as a leaf in computing the sum of within-cluster variations. Similarly, when considering $C_{1,2}$ for further division, we treat $C_{1,1}$ as a leaf in computing the sum of within-cluster variations.
This growth procedure is iterated until the number of clusters K is reached.

Example of tree-based classification

Consider the density functions of U.S. daily log stock returns in 2012 and 2013.
Using the first-differenced VIX index as the explanatory variable and K = 4, we obtain 4 clusters as follows:

(−∞, −0.73], (−0.73, 0.39], (0.39, 1.19], (1.19, ∞).

The cluster sizes are 104, 259, 86, and 53, respectively.
Note that positive $z_t$ signifies an increase in market volatility (uncertainty).

What drove the U.S. financial market?

The Fear Factor

Figure: Time plots of the market fear factor (VIX index) and its change series: 2012-2013 (panels: VIX, Change series of VIX; horizontal axis: days)

Figure: Results of Tree-based Cluster Analysis for the Daily Densities of Log Returns of the U.S. Stocks in 2012 and 2013. The first-differenced series of the VIX index is used as the explanatory variable. The numbers of elements for the clusters are 53, 86, 259, and 104, respectively. The cluster classification is given in the heading of each plot: dvix > 1.19; 1.19 >= dvix > 0.39; 0.39 >= dvix > −0.73; dvix <= −0.73.

Model-based classification

Work directly on observed multiple time series
1 Postulate a general univariate model for all time series, e.g. an AR(p) model
2 Time series in a cluster follow the same model: pooling data to estimate common parameters
3 Time series in different clusters follow different models
4 May be estimated by Markov chain Monte Carlo methods
5 May employ scaled-mixture of normal innovations to handle outliers

Have been widely studied, e.g. Wang et al (2013) and Fruehwirth-Schnatter (2011), among others.

Application

1 Apply to monthly unemployment rates of 50 states of theU.S.

2 Use out-of-sample predictions to compare with othermethods, including lasso.

3 For 1-step to 5-step ahead predictions, the model-basedmethod works well in comparison. Wang et al (2013, JoF).

Ruey S. Tsay Big Dependent Data 49 / 72

            RMSE×10^4                       MAE×10^4
Method    m = 1  m = 2  m = 3  m = 4    m = 1  m = 2  m = 3  m = 4
UAR        1616   1492   1791   2073      879    994   1268   1386
VAR        2676   2095   2129   2759     1349   1353   1506   1624
Lasso25    1798   1833   2063   2504     1245   1250   1332   1401
Lasso15    1714   1798   1855   2028     1186   1228   1296   1399
G-Lasso    1877   1865   1882   1905     1291   1290   1306   1327
LVAR       1550   1716   1806   1904     1065   1298   1210   1355
Pls10      1239   1531   1679   1873      909   1028   1263   1226
Pls30      1395   1651   1835   1890      933   1092   1281   1320
Pls50      1685   1871   2006   1967      940   1158   1304   1377
Pls70      1914   2040   2182   1953      996   1222   1362   1432
Pls100     2187   2279   2313   2123     1099   1342   1480   1552
Pcr10      1276   1829   2077   2108      890   1073   1247   1415
Pcr30      1577   1837   2049   1769      888   1093   1261   1321
Pcr50      1546   1805   2017   1759      880   1035   1209   1260
Pcr70      1594   1837   2049   1769      886   1042   1221   1283
Pcr100     1649   2117   2202   2163     1068   1243   1324   1421
MBC        1607   1703   1809   1961      885   1035   1225   1361
rMBC       1225   1481   1691   1839      873   1027   1193   1295

Table: Root mean squared errors (RMSE) and mean absolute errors (MAE) of 1-step to 4-step ahead out-of-sample forecasts for various models applied to 50 state unemployment rates. The forecasting period is from January 2006 to September 2011. In the table, m denotes the forecasting horizon. The models used are univariate AR(4) model (UAR), traditional VAR(4) model (VAR), VAR(4) with LASSO and s = 0.25 of L1 norm (Lasso25), VAR(4) with LASSO and s = 0.15 of L1 norm (Lasso15), group LASSO (G-Lasso), large Vector Autoregression of Song and Bickel (LVAR), partial least squares with the first k components (Plsk, k = 10, 30, 50, 70, 100), principal component regression with the first k components (Pcrk, k = 10, 30, 50, 70, 100), model-based clustering (MBC), and robust model-based clustering (rMBC).
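
For reference, the two accuracy measures in the table can be computed as below. This is a small sketch that assumes forecast errors are pooled over all series and forecast origins for a given horizon m; the slides do not state how the pooling was done, and the numbers used here are made up.

```python
import numpy as np

def rmse_mae(actual, forecast):
    """Root mean squared error and mean absolute error over all entries."""
    err = np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float)
    return np.sqrt(np.mean(err**2)), np.mean(np.abs(err))

# Hypothetical m-step ahead forecasts: rows are forecast origins, columns are series.
actual = np.array([[0.050, 0.062],
                   [0.051, 0.060]])
forecast = np.array([[0.049, 0.064],
                     [0.049, 0.061]])

rmse, mae = rmse_mae(actual, forecast)
print(round(rmse * 1e4, 2), round(mae * 1e4, 2))  # same x10^4 scaling as the table
```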

Functional PCA: Singular value decomposition

1 A tool to study the time evolution of the return distributions
2 Data set: In this particular instance, each density function is evaluated at 512 points and we have
$$Y = [\,Y_{it} = f_t(x_i) \mid i = 1, \ldots, N;\ t = 1, \ldots, T\,]_{512 \times 502}$$
3 Perform singular value decomposition
$$Y = (N - 1)\, U D V'$$
where Y denotes column-mean adjusted data matrix, U is an N × N unitary matrix, D is an N × T rectangular diagonal matrix, and V is a T × T unitary matrix.
4 This is a simple form of functional PCA. [Large samples, smoothing of PC is not needed.]
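
A minimal numerical sketch of this SVD-based functional PCA follows. It makes several assumptions not taken from the slides: the densities are synthetic normal curves on a small grid, the data matrix is stored with one row per day (so subtracting column means removes the mean density), and the (N − 1) scaling is omitted because it does not change the PC functions.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 64, 120                       # grid points and days (the slides use 512 and 502)
x = np.linspace(-0.2, 0.2, N)

# Synthetic daily densities: normal curves whose spread changes from day to day.
sigma = 0.03 + 0.02 * rng.random(T)
F = np.exp(-0.5 * (x[None, :] / sigma[:, None]) ** 2) / (sigma[:, None] * np.sqrt(2 * np.pi))

mean_density = F.mean(axis=0)
Fc = F - mean_density                # centred curves, one row per day

U, d, Vt = np.linalg.svd(Fc, full_matrices=False)
pc_functions = Vt                    # row k is the k-th PC function on the grid x
loadings = U * d                     # loadings[t, k] is the score of day t on PC k
explained = d**2 / np.sum(d**2)      # variance shares, as displayed in a scree plot

print(np.round(explained[:5], 4))
```

Because each day contributes a full curve, the PCs here are functions of the return level x, and the loadings form a time series that can be related to other variables.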

Scree plot

Figure: Scree plot of PCA for daily return densities in 2012 and 2013. [Variances by component, axis labelled Comp.1 to Comp.9.]

The first 6 PC functions

Figure: The first 6 PC functions for daily log return densities in 2012 and 2013. [Six panels, pc1 to pc6, each plotted against lnreturn.]

The next 6 PC functions

Figure: The 7th-12th PC functions for daily log return densities in 2012 and 2013. [Six panels, pc7 to pc12, each plotted against lnreturn.]

Meaning of PC functions? 1st

Figure: Mean density ± 1st PC: Peak and tails: mean + standardized 1st PC (red). [Plot titled "Mean density ± first PC"; axes: lnreturn, pc1.]

Meaning of PC functions? 2nd

Figure: Mean density ± 2nd PC: Midrange returns. [Plot titled "Mean density ± 2nd PC"; axes: lnreturn, pc2.]

Meaning of PC functions? 3rd

Figure: Mean density ± 3rd PC: Curvature. [Plot titled "Mean density ± 3rd PC"; axes: lnreturn, pc3.]

Approximate factor models

$$f_t(x) = \sum_{i=1}^{p} \lambda_{t,i}\, g_i(x) + \varepsilon_t(x),$$
where g_i(x) denotes the i-th common factor and ε_t(x) is the noise function.

1 A generalization of the orthogonal factor model, but allows the error functions to be correlated.
2 Only asymptotically identified under some regularity conditions.
3 FPCA provides a way to estimate approximate factor models.
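
One way to read item 3 is that the first p PC functions serve as estimates of the common factors and the PC scores as estimates of the loadings. The sketch below illustrates this on made-up curves with two true factors; the function name `factor_fit` is hypothetical, and a mean curve is added to the fit because the PCA is applied to mean-adjusted data.

```python
import numpy as np

def factor_fit(F, p):
    """Rank-p fit: mean curve plus p PC functions with day-specific loadings."""
    mean_curve = F.mean(axis=0)
    U, d, Vt = np.linalg.svd(F - mean_curve, full_matrices=False)
    g = Vt[:p]                        # estimated common factors, one per row
    lam = (U * d)[:, :p]              # estimated loadings, one row per day
    fitted = mean_curve + lam @ g
    return g, lam, fitted

rng = np.random.default_rng(2)
T, N = 80, 50
x = np.linspace(-0.2, 0.2, N)
true_g = np.vstack([np.cos(10 * x), np.sin(10 * x)])         # two made-up factors
F = rng.standard_normal((T, 2)) @ true_g + 0.01 * rng.standard_normal((T, N))

rmse = {}
for p in (1, 2, 3):
    _, _, fitted = factor_fit(F, p)
    rmse[p] = np.sqrt(np.mean((F - fitted) ** 2))
    print(p, rmse[p])
```

In this synthetic case the residual drops to the noise level once p reaches the true number of factors, which is the pattern a scree plot is meant to reveal.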

Loadings of the first PC function

Figure: Scatter plot of loadings vs changes in VIX index. Red line denotes lowess fit. [Axes: dvix (horizontal), loadings (vertical).]

Functional PC via Thresholding

1 Zero appears to be a reasonable and natural threshold
2 Regime 1: dvix ≥ 0 with 244 days. [Volatile (bad) state]
3 Regime 2: dvix < 0 with 258 days. [Calm (good) state]
4 Perform PCA of density functions for each regime.
5 The differences are clearly seen.
6 Leads to different approximate factor models for the density functions
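
The thresholding step amounts to splitting the days by the sign of dvix and running a separate PCA in each regime. A minimal sketch under stated assumptions: the densities and the dvix series are simulated (with a wider spread on volatile days), and the function name `pca_by_regime` and the regime labels are hypothetical.

```python
import numpy as np

def pca_by_regime(F, dvix, threshold=0.0):
    """Separate SVD-based PCA for days with dvix >= threshold and dvix < threshold."""
    results = {}
    for name, mask in (("volatile", dvix >= threshold), ("calm", dvix < threshold)):
        Fr = F[mask]
        mean_curve = Fr.mean(axis=0)
        _, d, Vt = np.linalg.svd(Fr - mean_curve, full_matrices=False)
        results[name] = {
            "days": int(mask.sum()),
            "mean": mean_curve,
            "pc_functions": Vt,
            "variances": d**2 / (len(Fr) - 1),
        }
    return results

rng = np.random.default_rng(3)
T, N = 200, 40
x = np.linspace(-0.2, 0.2, N)
dvix = rng.standard_normal(T)                         # stand-in for changes in VIX
sigma = 0.03 + 0.01 * (dvix >= 0) + 0.005 * rng.random(T)
F = np.exp(-0.5 * (x / sigma[:, None]) ** 2) / (sigma[:, None] * np.sqrt(2 * np.pi))

res = pca_by_regime(F, dvix)
for name, r in res.items():
    print(name, r["days"], np.round(r["variances"][:3], 4))
```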

Scree plots

Figure: Scree plots of PCA for each regime. [Two panels of variances by component, headed dvix >= 0 and dvix < 0.]

The first 6 PC functions

Figure: The first 6 PC functions for daily log return densities for each regime: red line is for the Calm state, Regime 2. [Six panels, pc1 to pc6, each plotted against lnreturn.]

Approximate factor models

1 Use approximate factor models with the first 12 principal component functions
2 Compare overall fits with/without thresholding
3 For Regime 1 (positive dvix): randomly select day 17
4 For Regime 2 (negative dvix): randomly select day 420.
5 Check: (a) observed vs fits and (b) residuals of with/without thresholding
6 With 12 components, both approaches fare well, but thresholding provides improvements.
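
The comparison in items 2 and 5 can be sketched by fitting the same days twice: once with PCs estimated from all days, once with PCs estimated from the days of their own regime. The example below is synthetic and deliberately built so that the two regimes are driven by different curves, so it shows the mechanics rather than the size of the gain in the real data; `fit_errors` is a hypothetical helper and p = 2 replaces the 12 components used on the slide.

```python
import numpy as np

def fit_errors(F_train, F_eval, p):
    """Residuals of F_eval after a mean-plus-p-PC fit estimated from F_train."""
    mean_curve = F_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(F_train - mean_curve, full_matrices=False)
    g = Vt[:p]
    fitted = mean_curve + (F_eval - mean_curve) @ g.T @ g
    return F_eval - fitted

rng = np.random.default_rng(4)
T, N, p = 200, 40, 2
dvix = rng.standard_normal(T)                 # stand-in for changes in VIX
calm = dvix < 0

Q, _ = np.linalg.qr(rng.standard_normal((N, 4)))
basis = Q.T                                   # four orthonormal made-up curves
F = 0.01 * rng.standard_normal((T, N))
F[~calm] += 3.0 * rng.standard_normal(((~calm).sum(), 2)) @ basis[:2]
F[calm] += rng.standard_normal((calm.sum(), 2)) @ basis[2:]

# Errors for the calm days without and with thresholding.
rmse_all = np.sqrt(np.mean(fit_errors(F, F[calm], p) ** 2))
rmse_thr = np.sqrt(np.mean(fit_errors(F[calm], F[calm], p) ** 2))
print(rmse_all, rmse_thr)
```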

Comparison: day 17 (in Regime 1)

Figure: Top plot: observed (black), all (red), Thr (blue). Bottom plot: all (black), Thr (red). [Top panel: density and its fits, day 17; bottom panel: error in approximation; horizontal axis: lnreturn.]

Comparison: day 420 (in Regime 2)

Figure: Top plot: observed (black), all (red), Thr (blue). Bottom plot: all (black), Thr (red). [Top panel: density and its fits, day 420; bottom panel: errors of approximation; horizontal axis: lnreturn.]

Lasso and beyond

1 Need to exploit parsimony, beyond sparsity
2 Need to take into account prior knowledge. We have accumulated a lot of knowledge in diverse scientific areas. How to take advantage of this knowledge?
3 Variable selection is not sufficient. More importantly, what are the proper measurements to take? What questions can a given big data set answer?

An illustration

Every country has many interest rate series
1 have different maturities
2 serve different financial purposes
3 What is the information embedded in those interest rate series?
Consider U.S. weekly constant maturity interest rates
1 From January 8, 1982 to October 30, 2015
2 Maturities: 3m, 6m, 1y, 2y, 3y, 5y, 7y, 10y, and 30y∗

Figure: Time plots of U.S. weekly interest rates with different maturities: 1/8/1982 to 10/30/2015.

Figure: Scree plot of U.S. weekly interest rates. [Components Comp.1 to Comp.9.]

Figure: Time plots of the first four principal components of U.S. weekly interest rates.

Implication?

In lasso-type of analysis,
1 should we use the interest rate series directly? Even with group lasso. This leads to sparsity.
2 should we apply PCA first, then use the PCs? This leads to parsimony.
3 should we develop other possibilities? Fused lasso? Factor models?
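
Option 2 (PCA first, then use the PCs) can be sketched on simulated data. Everything below is an assumption made for illustration: the nine series are generated from a common level, a smaller slope effect, and noise, which mimics highly correlated rates but is not the U.S. data, and the maturity weights are made up.

```python
import numpy as np

rng = np.random.default_rng(5)
T = 500
maturities = np.array([0.25, 0.5, 1, 2, 3, 5, 7, 10, 30])   # years, as listed on the slide

# Hypothetical rates: common level + maturity-dependent slope + idiosyncratic noise.
level = 5 + np.cumsum(0.2 * rng.standard_normal(T))
slope = np.cumsum(0.05 * rng.standard_normal(T))
rates = (level[:, None]
         + slope[:, None] * np.log1p(maturities)[None, :]
         + 0.02 * rng.standard_normal((T, len(maturities))))

centered = rates - rates.mean(axis=0)
U, d, Vt = np.linalg.svd(centered, full_matrices=False)
explained = d**2 / np.sum(d**2)
scores = U * d              # scores[:, :k] are the first k PCs, usable as regressors

print(np.round(explained[:4], 4))
```

When the series move together like this, a lasso applied to the raw series tends to pick among near-duplicates, whereas a few PC scores summarize all of them, which is the sparsity versus parsimony contrast raised on the slide.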

Concluding Remark

1 Big dependent data appear in many applications
2 Methods developed for independent big data may fail
3 Statistical methods for big dependent data are relatively under-developed
4 Some new challenges emerge, new opportunities exist
5 Simple modifications of the traditional methods might work well
6 Both theory and methods require further research

