Functional Data Perspectives for Traffic Monitoring and Forecasting
Kehui Chen
Department of Statistics, University of Pittsburgh
Joint work with Hans-Georg Muller at UC Davis
November 11, 2014, INFORMS
Traffic Monitoring and Forecasting
• Dedicated equipment: loop detectors, cameras and radars.
• GPS-enabled phone-based traffic monitoring systems.
• New types of data and new data analysis approaches: functional data perspectives for traffic forecasting.
• "Mobile Century" experiment: joint UC Berkeley - Nokia project. J. Herrera, D. Work, R. Herring, X. Ban, Q. Jacobson and A. Bayen (2010).
• The follow-up project "Mobile Millennium" is generating more data. http://traffic.berkeley.edu.
Individual Trip Data
• Decoto Road to the south (Postmile 21) and Winton Avenue to the north (Postmile 27.5).
• Combine data $(t_l, s_l, V_l)_{l=1,\dots,N}$, where $N = \sum_i N_i$ and $N_i$ is the number of measurements from the $i$-th trip.
• One can apply a two-dimensional smoothing procedure to these combined data to recover a smooth random velocity field $V(t,s)$ along the highway as an exploratory step.
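The exploratory smoothing step can be sketched as follows. This is a minimal illustration, not the smoother used in the talk: it assumes a Nadaraya-Watson estimator with a Gaussian product kernel, and the function name `smooth_velocity_field`, the bandwidths `h_t`, `h_s` and the synthetic data are all hypothetical.

```python
import numpy as np

def smooth_velocity_field(t, s, v, t_grid, s_grid, h_t, h_s):
    """Nadaraya-Watson estimate of V(t, s) on a grid from pooled (t_l, s_l, V_l)."""
    t, s, v = (np.asarray(a, dtype=float) for a in (t, s, v))
    t_grid = np.asarray(t_grid, dtype=float)
    s_grid = np.asarray(s_grid, dtype=float)
    # Gaussian kernel weights, shapes (len(t_grid), N) and (len(s_grid), N).
    w_t = np.exp(-0.5 * ((t_grid[:, None] - t[None, :]) / h_t) ** 2)
    w_s = np.exp(-0.5 * ((s_grid[:, None] - s[None, :]) / h_s) ** 2)
    # Product kernel: weighted sum of velocities over sum of weights.
    num = (w_t * v[None, :]) @ w_s.T
    den = w_t @ w_s.T
    return num / den

# Synthetic pooled data (assumption): a slowdown around 14:00, plus noise.
rng = np.random.default_rng(0)
t = rng.uniform(10.0, 18.0, size=2000)   # time of day in hours
s = rng.uniform(21.0, 27.5, size=2000)   # postmile
v = 60.0 - 30.0 * np.exp(-((t - 14.0) ** 2)) + rng.normal(0.0, 3.0, size=2000)

V_hat = smooth_velocity_field(
    t, s, v, np.linspace(10.5, 17.5, 15), np.linspace(21.5, 27.0, 12), h_t=0.3, h_s=0.5
)
```

Grid points far from every observation would give a near-zero denominator, so in practice the grid should stay inside the range covered by the data.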
Observed and Future Velocity Field
[Figure: velocity field (mph, color scale 0 to 80) by time of day (10:00 to 18:00) and postmile (21 to 27), with a zoomed panel for roughly 13:40 to 14:10 that marks the observed part X(s) and the unobserved part Y(t).]
Functional Data Perspective
• Assume an underlying latent smooth random process that generates the data. Data points could be densely and regularly spaced, or sparsely and irregularly sampled. Measurements may be contaminated with noise.
• Recover the underlying $X(s)$ by functional principal component analysis rather than individual smoothing: borrow strength from the entire sample.
• Model the conditional distribution of $Y(t)$ given $X(s)$: predicted curve and global prediction bands.
Functional Principal Component Analysis
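• $X(s)$ is a second-order random process with mean function $\mu(s) \in L^2(\mathcal{T})$ and continuous covariance function $G(s_1,s_2) = \mathrm{cov}(X(s_1), X(s_2))$.
• $G(s_1,s_2) = \sum_{k=1}^{\infty} \lambda_k \phi_k(s_1)\phi_k(s_2)$, with eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge 0$ and eigenfunctions $\phi_k(s)$.
• Karhunen-Loève expansion, doubly orthogonal (orthonormal eigenfunctions $\phi_k$, uncorrelated scores $\xi_k$):
$$X(s) = \mu(s) + \sum_{k=1}^{\infty} \xi_k \phi_k(s)$$
• Best linear expansion with $K$ components:
$$X(s) \approx \mu(s) + \sum_{k=1}^{K} \xi_k \phi_k(s).$$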
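A minimal numerical sketch of the truncated expansion, under simplifying assumptions that are not from the talk: curves are observed without noise on a common, equally spaced grid, and the covariance surface is the raw sample covariance rather than a smoothed estimate. The function name `fpca_dense` and the simulated curves are hypothetical.

```python
import numpy as np

def fpca_dense(X, s_grid, K):
    """FPCA for curves on a common equally spaced grid; X has shape (n_curves, n_grid)."""
    ds = s_grid[1] - s_grid[0]
    mu = X.mean(axis=0)                        # mean function mu(s)
    Xc = X - mu
    G = Xc.T @ Xc / (X.shape[0] - 1)           # sample covariance surface G(s1, s2)
    evals, evecs = np.linalg.eigh(G * ds)      # discretized covariance operator
    order = np.argsort(evals)[::-1][:K]        # K largest eigenvalues
    lam = evals[order]
    phi = evecs[:, order].T / np.sqrt(ds)      # rescale so that sum(phi_k^2) * ds = 1
    xi = Xc @ phi.T * ds                       # scores xi_ik by Riemann sum
    return mu, lam, phi, xi

# Simulated curves built from two known components (assumption).
rng = np.random.default_rng(1)
s_grid = np.linspace(0.0, 1.0, 101)
basis = np.sqrt(2.0) * np.vstack([np.sin(np.pi * s_grid), np.sin(2 * np.pi * s_grid)])
scores = rng.normal(size=(200, 2)) * np.array([2.0, 1.0])
X = 5.0 * s_grid + scores @ basis

mu, lam, phi, xi = fpca_dense(X, s_grid, K=2)
X_hat = mu + xi @ phi                          # K-component reconstruction
```

Because the simulated curves have only two modes of variation, the two-component reconstruction `X_hat` reproduces them up to floating-point error; real data would leave a residual.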
Estimation
• $$X(s) = \mu(s) + \sum_{k=1}^{K} \xi_k \phi_k(s)$$
• Pool the entire sample. Smoothing of the mean and covariance functions leads to the eigenfunctions and eigenvalues.
• Conditional expectation method to estimate the components $\xi_{ik}$. For the sparse case, this is the best linear unbiased prediction under a Gaussian assumption; for dense data, it is asymptotically equivalent to the numerical approximation of $\xi_{ik} = \int_{\mathcal{T}} (X_i(s) - \mu(s))\phi_k(s)\,ds$.
• Yao et al. (2005), Hall et al. (2006), Li and Hsing (2010), Cai and Yuan (2010).
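The conditional expectation step for one sparsely observed curve can be sketched as below, using the best linear predictor $\hat\xi_{ik} = \lambda_k \phi_{ik}^\top \Sigma_{Y_i}^{-1}(Y_i - \mu_i)$ with $\Sigma_{Y_i} = \Phi_i^\top \Lambda \Phi_i + \sigma^2 I$. The sketch assumes the mean, eigenfunctions, eigenvalues and error variance have already been estimated; the function name `pace_scores` and the toy inputs are hypothetical, and this is not the PACE package code.

```python
import numpy as np

def pace_scores(y, mu_obs, phi_obs, lam, sigma2):
    """Conditional-expectation scores for one sparsely observed curve.

    y       : (m,) observations at the curve's own sampling points
    mu_obs  : (m,) mean function at those points
    phi_obs : (K, m) eigenfunctions at those points
    lam     : (K,) eigenvalues
    sigma2  : measurement-error variance
    """
    # Covariance of the observed vector: Phi' Lambda Phi + sigma^2 I.
    sigma_y = phi_obs.T @ (lam[:, None] * phi_obs) + sigma2 * np.eye(y.size)
    # xi_hat_k = lam_k * phi_k' Sigma_Y^{-1} (y - mu)
    return lam * (phi_obs @ np.linalg.solve(sigma_y, y - mu_obs))

# Toy curve with five irregular points and two components (assumption).
s_obs = np.array([0.08, 0.27, 0.44, 0.71, 0.93])
phi_obs = np.sqrt(2.0) * np.vstack([np.sin(np.pi * s_obs), np.sin(2 * np.pi * s_obs)])
lam = np.array([4.0, 1.0])
xi_true = np.array([1.5, -0.7])
y = xi_true @ phi_obs                      # noise-free observations, zero mean

xi_hat = pace_scores(y, np.zeros(5), phi_obs, lam, sigma2=1e-6)
```

With almost no measurement error the predictor recovers `xi_true`; a larger `sigma2` shrinks the scores toward zero, which is how information from the whole sample stabilizes curves with few observations.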
Predictor Functions
[Figure: four predictor functions X(s), in panels labeled 10:28:49, 11:20:30, 12:2:55 and 17:15:54. Each panel plots relative speed (mph, −60 to 20) against postmile (21 to 27).]
Prediction for Response Functions
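• $X(s) \approx \mu_X(s) + \sum_{k=1}^{K} \xi_k \phi_k(s)$, $\quad Y(t) \approx \mu_Y(t) + \sum_{j=1}^{P} \zeta_j \psi_j(t)$
• Prediction of the mean function, FAM (Muller and Yao 2008):
$$E(Y(t) \mid X) \approx \mu_Y(t) + \sum_{j=1}^{P} \sum_{k=1}^{K} f_{jk}(\xi_k)\,\psi_j(t)$$
• $$\mathrm{cov}(Y(t_1), Y(t_2) \mid X) \approx \sum_{j=1}^{P} \mathrm{var}(\zeta_j \mid X)\,\psi_j(t_1)\psi_j(t_2) \approx G_{YY}(t_1,t_2) + \sum_{j=1}^{P} \sum_{k=1}^{K} \{ g_{jk}(\xi_k) - f_{jk}^2(\xi_k) \}\,\psi_j(t_1)\psi_j(t_2)$$
• If $X(s)$ is a Gaussian process, $f_{jk}(\xi_k) = E(\zeta_j \mid \xi_k)$ and $g_{jk}(\xi_k) = E(\zeta_j^2 - \gamma_j \mid \xi_k)$.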
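The mean prediction can be sketched as follows: each additive component $f_{jk}$ is estimated by one-dimensional smoothing of the response scores $\zeta_j$ against the predictor scores $\xi_k$, and the fitted components are summed. The Nadaraya-Watson smoother, the bandwidth `h`, the function names and the toy scores are assumptions for illustration only.

```python
import numpy as np

def nw_smooth(x_train, y_train, x_new, h):
    """Nadaraya-Watson estimate of E(y | x) at the points x_new (Gaussian kernel)."""
    w = np.exp(-0.5 * ((np.atleast_1d(x_new)[:, None] - x_train[None, :]) / h) ** 2)
    return (w @ y_train) / w.sum(axis=1)

def fam_predict_mean(xi_train, zeta_train, xi_new, mu_y, psi, h):
    """Predicted mean curve mu_Y(t) + sum_j sum_k f_jk(xi_k) psi_j(t).

    xi_train   : (n, K) predictor scores
    zeta_train : (n, P) response scores
    xi_new     : (K,) scores of the new predictor curve
    mu_y       : (T,) response mean on the output grid
    psi        : (P, T) response eigenfunctions on the output grid
    """
    K, P = xi_train.shape[1], zeta_train.shape[1]
    zeta_hat = np.zeros(P)
    for j in range(P):
        for k in range(K):
            # f_jk(xi_k) = E(zeta_j | xi_k), estimated by 1-D smoothing.
            zeta_hat[j] += nw_smooth(xi_train[:, k], zeta_train[:, j], xi_new[k], h)[0]
    return mu_y + zeta_hat @ psi

# Toy scores on a symmetric grid (assumption), so the additive components are known.
g = np.linspace(-2.0, 2.0, 41)
xi1, xi2 = (a.ravel() for a in np.meshgrid(g, g))
xi_train = np.column_stack([xi1, xi2])
zeta_train = np.column_stack([2.0 * xi1 - xi2, 0.5 * xi2])
psi = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])   # placeholder basis on 3 grid points

pred = fam_predict_mean(xi_train, zeta_train, np.array([0.5, -1.0]), np.zeros(3), psi, h=0.2)
```

In this toy design the true components are $f_{11}(x) = 2x$, $f_{12}(x) = -x$, $f_{21}(x) = 0$ and $f_{22}(x) = 0.5x$, so the first two entries of `pred` should be close to the response scores implied by the new predictor.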
Global Prediction Bands
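• $Y_X(t) \approx \mu_{Y|X}(t) + \sum_{j=1}^{P} \zeta_j(X)\,\psi_j(t)$.
• $$\Omega_{X,\alpha} = \Big\{ (\zeta_1(X), \dots, \zeta_P(X)) : \sum_{j=1}^{P} \frac{\zeta_j(X)^2}{\gamma_j(X)} \le C_{X,\alpha}^2 \Big\},$$
such that $P(\zeta_X \in \Omega_{X,\alpha}) = 1 - \alpha$.
• The upper bound function $U(t)$ is found by solving the maximization problems
$$\max_{\zeta_X \in \Omega_{X,\alpha}} \Big\{ \mu_{Y|X}(t) + \sum_{j=1}^{P} \zeta_j(X)\,\psi_j(t) \Big\}, \quad \text{for all } 0 < t < 1.$$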
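For a fixed $t$, the maximization has a closed-form solution. The following worked step is a sketch via the Cauchy-Schwarz inequality, assuming $\gamma_j(X) > 0$ for all $j$; the lower bound $L(t)$ is written by symmetry.

```latex
\begin{align*}
\sum_{j=1}^{P} \zeta_j(X)\,\psi_j(t)
  &= \sum_{j=1}^{P} \frac{\zeta_j(X)}{\sqrt{\gamma_j(X)}}\,\sqrt{\gamma_j(X)}\,\psi_j(t) \\
  &\le \Big( \sum_{j=1}^{P} \frac{\zeta_j(X)^2}{\gamma_j(X)} \Big)^{1/2}
       \Big( \sum_{j=1}^{P} \gamma_j(X)\,\psi_j^2(t) \Big)^{1/2} \\
  &\le C_{X,\alpha} \Big( \sum_{j=1}^{P} \gamma_j(X)\,\psi_j^2(t) \Big)^{1/2},
\end{align*}
with equality at
\[
\zeta_j(X) = \frac{C_{X,\alpha}\,\gamma_j(X)\,\psi_j(t)}
                  {\big( \sum_{l=1}^{P} \gamma_l(X)\,\psi_l^2(t) \big)^{1/2}} .
\]
Hence
\[
U(t) = \mu_{Y|X}(t) + C_{X,\alpha} \Big( \sum_{j=1}^{P} \gamma_j(X)\,\psi_j^2(t) \Big)^{1/2},
\qquad
L(t) = \mu_{Y|X}(t) - C_{X,\alpha} \Big( \sum_{j=1}^{P} \gamma_j(X)\,\psi_j^2(t) \Big)^{1/2}.
\]
```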
Global Prediction Bands
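• $$U(t) = \mu_{Y|X}(t) + \Big\{ C_{X,\alpha}^2 \sum_{j=1}^{P} \gamma_j(X)\,\psi_j^2(t) \Big\}^{1/2} = \mu_{Y|X}(t) + C_{X,\alpha} \big\{ \widehat{\mathrm{var}}(Y_X(t)) \big\}^{1/2}.$$
• In the case that $(\zeta_1(X), \dots, \zeta_P(X))$ are jointly Gaussian,
$$C_{X,\alpha} = C_\alpha = \sqrt{\chi^2_{P,1-\alpha}}.$$
• In the general case: find a constant $C_\alpha$ and regions $\Omega_{X,\alpha}$ such that
$$E[P(\zeta_X \in \Omega_{X,\alpha})] = 1 - \alpha.$$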
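The Gaussian case can be sketched numerically as below. To stay free of extra dependencies, the sketch uses $P = 2$, where the chi-square quantile has the closed form $\chi^2_{2,1-\alpha} = -2\ln\alpha$; for other $P$ a chi-square quantile routine would be needed. The function names, the toy basis and the conditional variances are hypothetical.

```python
import math
import numpy as np

def prediction_band(mu_cond, gamma_cond, psi, c_alpha):
    """Global band mu_{Y|X}(t) -/+ C * sqrt(sum_j gamma_j(X) psi_j(t)^2).

    mu_cond    : (T,) conditional mean curve
    gamma_cond : (P,) conditional variances of the response scores
    psi        : (P, T) response eigenfunctions
    """
    half_width = c_alpha * np.sqrt(gamma_cond @ psi ** 2)
    return mu_cond - half_width, mu_cond + half_width

def c_alpha_gaussian_p2(alpha):
    """Square root of the chi-square(2) quantile at 1 - alpha (closed form, P = 2 only)."""
    return math.sqrt(-2.0 * math.log(alpha))

# Toy setting (assumption): two orthonormal components on [0, 1].
rng = np.random.default_rng(2)
t_grid = np.linspace(0.0, 1.0, 101)
psi = np.vstack([np.ones_like(t_grid), np.sqrt(2.0) * np.cos(np.pi * t_grid)])
gamma_cond = np.array([9.0, 4.0])
mu_cond = np.zeros_like(t_grid)

c90 = c_alpha_gaussian_p2(0.10)
lower, upper = prediction_band(mu_cond, gamma_cond, psi, c90)

# Monte Carlo check: simulate Gaussian scores and count curves lying entirely in the band.
zeta = rng.normal(size=(20000, 2)) * np.sqrt(gamma_cond)
curves = mu_cond + zeta @ psi
in_ellipsoid = (zeta ** 2 / gamma_cond).sum(axis=1) <= c90 ** 2
in_band = ((curves >= lower - 1e-12) & (curves <= upper + 1e-12)).all(axis=1)
coverage = in_band.mean()
```

Every score vector inside the ellipsoid yields a curve inside the band, so the simultaneous coverage of the band is at least the ellipsoid's $1 - \alpha$.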
Estimated 90% Prediction Regions
[Figure: estimated 90% prediction regions in four panels labeled 10:28:49, 11:20:30, 12:2:55 and 17:15:54. Each panel plots relative speed (mph, −80 to 60) against time (0 to 300 sec) and shows Y(t), the predicted mean µY|X(t), and the band limits U(t), L(t).]
Extensions
• Other types of data from GPS-enabled phones: VTLs.
• Dynamic updating: prediction based on the current time and location.
• Larger networks of roads: divide and conquer.
• Other functional data methods.
• K. Chen and H.G. Muller (2014), "Modeling conditional distributions for functional responses, with application to traffic monitoring via GPS-enabled mobile phones," Technometrics, 56(3), 347-358.
• Code available (written in Matlab): PACE package version 2.17, http://www.stat.ucdavis.edu/PACE/
• J. Herrera, D. Work, R. Herring, X. Ban, Q. Jacobson and A. Bayen (2010), "Evaluation of Traffic Data Obtained via GPS-Enabled Mobile Phones: The Mobile Century Field Experiment," Transportation Research C, 18, 568-583.
THANK YOU!