Statistical Analysis of Aircraft Trajectories: a Functional Data Analysis Approach
Florence NICOL
ENAC-DEVI
ALLDATA 2017, April 26, 2017
enac-bleu3.jpg
Outline
Principal Component Analysis
Functional Data
  • What are functional data?
  • Functional Data Analysis
Functional Principal Component Analysis
  • What is this?
  • Estimation Methods
Application to Aircraft Trajectories
  • Univariate FPCA
  • Multivariate FPCA
Conclusion
Principal Component Analysis
What is the best representation? An old problem
Figure: Fragment from the Tomb of Nebamun, Thebes, British Museum.
Figure: Mug shot of American gangster Al Capone, 1931.
Figure: Marie-Therese portrait, Picasso.
Main ideas
Context
• a large number of numeric variables possibly correlated,
• analyze data variability by studying the covariance structure.
What do we want to do?
• create a small number of new descriptors,
• capture the maximum amount of variation in the data.
How can we do that?
• Look for new orthogonal directions such that the variance of the projected data is maximal.
Figure: Point cloud in the coordinate system (X1, X2, X3) with the new orthogonal directions Δ1, Δ2, Δ3.
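The search for orthogonal directions of maximal variance can be sketched with an eigendecomposition of the sample covariance matrix. This is a minimal illustration, not the computation used in the talk: the function name `pca`, the simulated three-variable sample and the mixing matrix are assumptions made for the example.

```python
import numpy as np

def pca(data):
    """Classical PCA via the eigendecomposition of the sample covariance matrix.

    data: array of shape (n, p), one individual per row.
    Returns the variances (decreasing), the orthonormal directions (columns)
    and the scores, i.e. the centered data projected on the directions.
    """
    centered = data - data.mean(axis=0)
    cov = centered.T @ centered / (data.shape[0] - 1)
    eigenvalues, directions = np.linalg.eigh(cov)    # eigh returns ascending order
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, directions = eigenvalues[order], directions[:, order]
    scores = centered @ directions
    return eigenvalues, directions, scores

rng = np.random.default_rng(0)
# Hypothetical correlated sample with three variables (not the data of the talk).
mixing = np.array([[3.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.5, 0.2, 0.1]])
sample = rng.normal(size=(200, 3)) @ mixing
variances, deltas, projected = pca(sample)
print(variances / variances.sum())   # share of variation carried by each direction
```

The columns of `deltas` play the role of Δ1, Δ2, Δ3, and the variance of the data projected on each column equals the corresponding eigenvalue.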
Functional Data
What are Functional Data?
Rather than a sample of points $x_i$, $i = 1, \dots, n$, we observe a sample of entangled curves $x_i(t)$, images or functions.
Figure: Sample of aircraft trajectories (Paris-Toulouse) in the longitude-latitude plane, X(t) (Nm) against Y(t) (Nm).
Functional variable (f.v.)
$X = \{X(t),\ t \in J\}$ is a random variable taking values in a function space $H$:
• a continuous stochastic process on a compact interval $J$,
• $H$ is the separable Hilbert space $L^2(J)$.
Observed data
A functional dataset $x_1, \dots, x_n$ consists of $n$ realizations of the f.v. $X$ (or the observation of $n$ f.v. $X_1, \dots, X_n$ identically distributed as $X$).
Discretized observed data
• Functional data $x_i(t)$ are observed discretely: $x_i(t_{ij})$, $i = 1, \dots, n$, $j = 1, \dots, N_i$.
• They are often given at the same time arguments $t_1, \dots, t_N$.
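When the curves are not observed at the same time arguments, one simple option is to interpolate each of them onto a common grid before any further analysis. The sketch below assumes linear interpolation, which the slides do not prescribe; the helper name `resample_to_common_grid` and the two toy curves are hypothetical.

```python
import numpy as np

def resample_to_common_grid(times, values, grid):
    """Linearly interpolate each curve x_i(t_ij) onto the common grid t_1..t_N."""
    return np.vstack([np.interp(grid, t, x) for t, x in zip(times, values)])

# Two made-up curves observed at different time points t_ij.
times = [np.array([0.0, 1.0, 2.0, 4.0]), np.array([0.0, 2.0, 4.0])]
values = [2.0 * times[0], times[1] + 1.0]
grid = np.linspace(0.0, 4.0, 5)
curves = resample_to_common_grid(times, values, grid)
print(curves.shape)   # (2, 5): n curves in rows, N grid points in columns
```

The resulting array with one curve per row is the layout assumed by the usual discretized estimators.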
Functional Data Analysis
Inference about functional data
Multivariate statistical techniques are inadequate!
• They do not take into account the functional nature of the data.
• A serious drawback: the curse of dimensionality, $N \gg n$.
Extend multivariate methods to the functional case
• Functional principal component analysis (FPCA), clustering.
• Functional linear models, functional analysis of variance, etc.
Generalization is not trivial!
• Data live in infinite-dimensional spaces.
• Two types of errors:
  • sampling error in random functions drawn from an underlying process,
  • measurement error when functions are discrete noisy sample paths.
Functional Principal Component Analysis
Generalization to the functional case
Multivariate PCA                             Functional PCA
individual vector $x_i \in \mathbb{R}^p$     function $x_i(t) \in H$
eigenvector $u$                              eigenfunction $\gamma$
mean vector $\bar{x} \in \mathbb{R}^p$       mean function $\mu(t) \in H$
covariance matrix                            covariance operator
inner product                                inner product
$\langle u, x_i \rangle = u^T x_i$           $\langle \gamma, x_i \rangle = \int_J \gamma(t)\, x_i(t)\, dt$
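On sampled curves the functional inner product has to be approximated numerically. The following sketch uses the trapezoidal rule, which is one possible quadrature among others; the name `l2_inner` and the sine test functions are assumptions made for the illustration.

```python
import numpy as np

def l2_inner(f, g, t):
    """Trapezoidal approximation of the integral of f(t) g(t) over the interval J."""
    h = f * g
    return float(np.sum(0.5 * (h[1:] + h[:-1]) * np.diff(t)))

t = np.linspace(0.0, 1.0, 1001)
gamma = np.sqrt(2.0) * np.sin(np.pi * t)          # unit norm in L2([0, 1])
other = np.sqrt(2.0) * np.sin(2.0 * np.pi * t)    # orthogonal to gamma
print(l2_inner(gamma, gamma, t), l2_inner(gamma, other, t))
```

The first value approximates the squared norm of `gamma` and the second approximates an inner product between orthogonal functions.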
Maximization of variance
The weight function $\gamma_1$ maximizes the variance of the projected data:
$$\gamma_1 = \operatorname*{arg\,max}_{\|\gamma\| = 1} \operatorname{Var}\left( \langle \gamma, X \rangle \right).$$
The subsequent weight functions $\gamma_k$ are found analogously, subject to the additional orthogonality constraint
$$\langle \gamma_i, \gamma_k \rangle = \int_J \gamma_i(t)\, \gamma_k(t)\, dt = 0, \quad i < k.$$
• $\gamma_1, \gamma_2, \dots$ are called functional principal components,
• the orthogonality constraints ensure that $\gamma_i$ indicates something new,
• the amount of variation $\lambda_i = \operatorname{Var}\left( \langle \gamma_i, X \rangle \right)$ declines stepwise.
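For curves sampled on a common, equally spaced grid, the variance maximization can be approximated by an eigendecomposition of the discretized covariance. The sketch below is only one possible discretization (rectangle-rule weights, division by n in the covariance), not necessarily the estimator used in the talk; the function `fpca` and the simulated curves are hypothetical.

```python
import numpy as np

def fpca(curves, step):
    """FPCA of curves sampled on a common, equally spaced grid.

    curves: array (n, N); step: grid spacing.
    Returns the mean function, the eigenvalues lambda_k (decreasing) and the
    eigenfunctions gamma_k as rows, scaled so that sum(gamma_k**2) * step = 1.
    """
    mean = curves.mean(axis=0)
    centered = curves - mean
    cov = centered.T @ centered / curves.shape[0]
    # Discretized covariance operator: the integral becomes a sum weighted by step.
    eigval, eigvec = np.linalg.eigh(step * cov)
    order = np.argsort(eigval)[::-1]
    lambdas = np.clip(eigval[order], 0.0, None)   # remove tiny negative round-off
    gammas = eigvec[:, order].T / np.sqrt(step)
    return mean, lambdas, gammas

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 101)
step = t[1] - t[0]
phi1 = np.sqrt(2.0) * np.sin(np.pi * t)
phi2 = np.sqrt(2.0) * np.sin(2.0 * np.pi * t)
# Simulated curves with two modes of variation of variances 4 and 1 (made-up data).
curves = (rng.normal(0.0, 2.0, size=(300, 1)) * phi1
          + rng.normal(0.0, 1.0, size=(300, 1)) * phi2)
mean, lambdas, gammas = fpca(curves, step)
print(lambdas[:3])
```

In this simulation the first estimated weight function should be close, up to sign, to `phi1`, the mode carrying the largest variance.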
Estimation
Let X1, . . . ,Xn be a sample of independent functional variables.
Karhunen-Loève representation
$$X_j(t) = \sum_{i=1}^{n} A_{ij}\, \gamma_i(t), \quad j = 1, \dots, n.$$
Interpretation
• Principal components $\gamma_i$ are modes of variation of individual trajectories.
• Random scores $A_{ij} = \langle \gamma_i, X_j \rangle$ are proportionality factors: they measure the influence of the principal component $\gamma_i$ on the shape of $X_j$.
Dimension reduction tool: only a small number $L \ll n$ of components is needed,
$$X_j(t) \simeq \sum_{i=1}^{L} A_{ij}\, \gamma_i(t) + \mu(t).$$
• A small number $L$ of components is often sufficient to account for a large part of the variation.
• High values of $L$ are associated with high-frequency components, which represent the sampling noise.
Quality of representation: % of total variation (Scree Plot)
$$\tau_i = \frac{\lambda_i}{\sum_{k=1}^{n} \lambda_k}, \qquad \tau^C_L = \frac{\sum_{k=1}^{L} \lambda_k}{\sum_{i=1}^{n} \lambda_i}.$$
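The truncated expansion and the cumulative share of variation can be sketched together from a singular value decomposition of the centered, discretized curves. This is an illustrative sketch under stated assumptions: the scores are computed on the centered curves, the grid is equally spaced, and the function `truncated_expansion` and the simulated data are hypothetical.

```python
import numpy as np

def truncated_expansion(curves, step, n_components):
    """Return the cumulative variation shares and the L-term approximation."""
    mean = curves.mean(axis=0)
    centered = curves - mean
    # Right singular vectors of the centered data are the discretized components.
    _, sing, vt = np.linalg.svd(centered, full_matrices=False)
    lambdas = step * sing**2 / curves.shape[0]
    gammas = vt / np.sqrt(step)                    # rows: gamma_i on the grid
    scores = centered @ gammas.T * step            # A_ij = <gamma_i, X_j - mu>
    tau_cum = np.cumsum(lambdas) / lambdas.sum()   # cumulative share for L = 1, 2, ...
    approx = mean + scores[:, :n_components] @ gammas[:n_components]
    return tau_cum, approx

rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 101)
step = t[1] - t[0]
# Made-up curves: a linear mean, two smooth modes of variation and small noise.
curves = (5.0 * t
          + rng.normal(0.0, 3.0, size=(50, 1)) * np.sin(np.pi * t)
          + rng.normal(0.0, 1.0, size=(50, 1)) * np.sin(2.0 * np.pi * t)
          + rng.normal(0.0, 0.01, size=(50, 101)))
tau_cum, approx = truncated_expansion(curves, step, n_components=2)
print(tau_cum[:4])
```

A scree plot is simply the sequence of individual shares; here the first two entries of `tau_cum` already account for nearly all the variation because the remaining components only carry the added noise.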
Application to Aircraft Trajectories
Univariate FPCA: Flight Level
Route: Paris Orly airport → Toulouse airport
Flight level
• Aircraft type: A319 (25%), A320 (41%), A321 (24%), B733 (4%), B463 (2%), AT type (4%).
• Aircraft trajectories measured at 4-second intervals.
Univariate FPCA: Flight Level
Effects on the mean trajectory of adding (+) or subtracting (-) PC
Figure: PCA scree plot (variance (%) against principal components) and the first four principal components against time (sec): PC1 88.1 %, PC2 6.7 %, PC3 2.6 %, PC4 1.3 %.
• 4 components = 98.7% of total variation.
Univariate FPCA: Flight Level
Effects on the mean trajectory of adding (+) or subtracting (-) PC
Figure: Four panels showing the flight level Z(t) (×100 feet) against time (sec), with the curves obtained by adding (+) or subtracting (-) each component: PC1 overall effect (88.1%), PC2 takeoff effect (6.7%), PC3 first step effect (2.6%), PC4 time shift effect (1.3%).
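Curves of this kind are commonly obtained by adding to and subtracting from the mean function a multiple of a principal component. The sketch below assumes a multiple of twice the standard deviation of the scores, which the slides do not state; the function `perturbed_means`, the climb profile and the constant weight function are made up for the illustration.

```python
import numpy as np

def perturbed_means(mean, gamma, lam, multiple=2.0):
    """Return mean + c*sqrt(lam)*gamma and mean - c*sqrt(lam)*gamma."""
    shift = multiple * np.sqrt(lam) * gamma
    return mean + shift, mean - shift

t = np.linspace(0.0, 200.0, 51)                  # time (sec), hypothetical grid
mean = 80.0 * (1.0 - np.exp(-t / 60.0))          # made-up climb profile (x100 feet)
gamma = np.ones_like(t) / np.sqrt(200.0)         # constant weight function, unit L2 norm
plus, minus = perturbed_means(mean, gamma, lam=400.0)
print(plus[-1] - mean[-1])
```

With a constant weight function the "+" and "-" curves are vertical shifts of the mean, which corresponds to an overall effect; a weight function concentrated at the start of the interval would instead modify the curves only near takeoff.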
Univariate FPCA: Flight Level
Score scatterplots by aircraft type
Figure: Three score plots (PC2 against PC1, PC3 against PC1, PC4 against PC3), with points labelled by aircraft type: 1 = A319, 2 = A320, 3 = A321, 4 = AT, 5 = B463, 6 = B733.
• detect outliers and clusters in the data,
• interpret clusters,
• explain individual behaviour relative to the modes of variation.
Univariate FPCA: Flight Level
Table: Individual scores by aircraft type
Aircraft type     PC1       PC2        PC3          PC4          Outlier
                  Overall   Take-off   First step   Time shift
AT, E120, B463    +         +          +            +
A320              0         -          -            -
B733              -         -          +            +            *
A319              -         +          0            0
A321              +         -          -            0
Multivariate FPCA
Route: Paris Charles de Gaulle airport → Toulouse airport
Figure: Longitude-latitude trajectories, X(t) (Nm) against Y(t) (Nm), on the route Paris Charles de Gaulle → Toulouse.
• Aircraft type: A319 (25%), A320 (41%), A321 (24%), B733 (4%), B463 (2%), AT type (4%).
• Aircraft trajectories measured at 4-second intervals.
Multivariate FPCA
Principal Components (PC): principal components in X and Y coordinates
Figure: Principal components PC1 to PC4 against time (sec), one panel for X(t) and one panel for Y(t).
Principal component   Effect                    Total var.   X-coord   Y-coord
PC1                   Overall effect            58%          2%        98%
PC2                   Landing effect            14.7%        48%       52%
PC3                   Separation effect         12.9%        86%       14%
PC4                   Change procedure effect   6%           66%       34%
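One common way to carry out FPCA on bivariate curves is to stack the X and Y coordinates of each trajectory into a single long vector and to split the squared norm of each component between the two parts. The slides do not say how the X-coord and Y-coord percentages were computed, so the split used below is an assumption, as are the function `bivariate_fpca` and the simulated trajectories.

```python
import numpy as np

def bivariate_fpca(x_curves, y_curves, step):
    """FPCA of (X(t), Y(t)) obtained by stacking both coordinates."""
    n, n_grid = x_curves.shape
    stacked = np.hstack([x_curves, y_curves])
    centered = stacked - stacked.mean(axis=0)
    _, sing, vt = np.linalg.svd(centered, full_matrices=False)
    lambdas = step * sing**2 / n
    gammas = vt / np.sqrt(step)          # each row holds (gamma^X, gamma^Y) on the grid
    # Share of the unit squared norm of each component carried by X and by Y.
    x_share = (gammas[:, :n_grid] ** 2).sum(axis=1) * step
    y_share = (gammas[:, n_grid:] ** 2).sum(axis=1) * step
    return lambdas, gammas, x_share, y_share

rng = np.random.default_rng(3)
t = np.linspace(0.0, 1.0, 101)
step = t[1] - t[0]
bump = np.sin(np.pi * t)
# Made-up trajectories: small variability in X, large variability in Y.
x_curves = 100.0 * t + rng.normal(0.0, 0.1, size=(40, 1)) * bump
y_curves = -200.0 * t + rng.normal(0.0, 3.0, size=(40, 1)) * bump
lambdas, gammas, x_share, y_share = bivariate_fpca(x_curves, y_curves, step)
print(x_share[0], y_share[0])
```

In this simulation the first component is carried almost entirely by the Y coordinate, which mimics the situation reported for PC1 in the table.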
Multivariate FPCA
Effects on the mean trajectory of adding (+) or subtracting (-) PC
Figure: Four panels in the plane X(t) (in Nm), Y(t) (in Nm), titled PC1 58.7 %, PC2 14.7 %, PC3 12.9 % and PC4 6 %, labelled overall effect (58%), landing effect (14.7%), separation effect (12.9%) and procedure effect (6%).
Multivariate FPCA
Mean cluster trajectories and the overall mean (black curve)
Figure: Three clusters (C1, C2, C3) of trajectories in the plane X(t) (Nm), Y(t) (Nm).
Table: k-means on scores
Aircraft type   Cluster 1   Cluster 2   Cluster 3
A319            15          18          0
A320            14          14          1
A321            25          28          0
AT              2           0           8
B463            10          0           2
B733            22          1           2
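Clustering the score vectors rather than the curves themselves can be sketched with plain Lloyd iterations. The slides do not specify the k-means implementation, the number of scores used or the initialization, so the farthest-point initialization, the function `kmeans` and the simulated two-dimensional scores below are assumptions.

```python
import numpy as np

def kmeans(points, n_clusters, n_iter=100):
    """Lloyd iterations with a deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, n_clusters):
        gaps = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[gaps.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        updated = np.array([
            points[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(n_clusters)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return labels, centers

rng = np.random.default_rng(4)
# Made-up score vectors (PC1, PC2) forming three well-separated groups.
blob_centers = np.array([[-40.0, 0.0], [20.0, 10.0], [20.0, -10.0]])
truth = np.repeat(np.arange(3), 20)
scores = blob_centers[truth] + rng.normal(0.0, 1.0, size=(60, 2))
labels, centers = kmeans(scores, n_clusters=3)
print(np.bincount(labels))
```

A mean cluster trajectory can then be obtained by averaging the curves whose scores share the same label.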
Multivariate FPCA
Mean cluster trajectories (Route: Paris Orly airport ↔ Toulouse airport)
Figure: Longitude-latitude trajectories for the two directions of the route between Paris Orly and Toulouse.
• FPCA is able to separate the two clusters located on the right side: a standard approach procedure and a short one at Toulouse airport.
Conclusion
Conclusion and future work
A dimension reduction tool
• An empirical basis function expansion.
• Dimension reduction: use score vectors instead of functions $x_i(t)$.
A powerful visualization tool
• Explore the ways in which trajectories vary.
• Reveal clusters and atypical trajectories.
Other applications
• Generalization to 3D trajectories.
• Generate samples of trajectories.
• Reduce the dimension of simulated models.
Short bibliography
P. Besse and J.O. Ramsay, Principal component analysis of sampled curves, Psychometrika, 51 (1986), 285-311.
F.L. Bookstein, Morphometric Tools for Landmark Data: Geometry and Biology, Cambridge University Press, Cambridge, 1991.
J. Dauxois, A. Pousse and Y. Romain, Asymptotic theory for the principal component analysis of a vector random function: some applications to statistical inference, Journal of Multivariate Analysis, 12 (1982), 136-154.
F. Nicol, Functional Principal Component Analysis of Aircraft Trajectories, ISIATM, July 8-10, 2013, Toulouse, France.
J.O. Ramsay, G. Hooker and S. Graves, Functional Data Analysis with R and Matlab, Springer-Verlag, New York, 2009.