1
Functional Data Analysis in Matlab and R
James Ramsay, Professor, McGill U., Montreal
Hadley Wickham, Grad student, Iowa State, Ames, IA
Spencer Graves, Statistician, PDF Solutions, San José, CA
2
Outline
• What is Functional Data Analysis?
• FDA and Differential Equations
• Examples:
– Squid Neurons
– Continuously Stirred Tank Reactor (CSTR)
• Conclusions
• References
3
What is FDA?
• Functional data analysis is a collection of techniques for modeling data from dynamic systems, possibly governed by differential equations, in terms of a set of basis functions
• The ‘fda’ package supports the use of 8 different types of basis functions: constant, monomial, polynomial, polygonal, B-splines, power, exponential, and Fourier.
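A minimal sketch of what "in terms of basis functions" means: a curve is stored as a coefficient vector and evaluated as x(t) = sum of c_k * phi_k(t). The example is plain Python, not the 'fda' package; the function names and the coefficients are hypothetical, and the Fourier basis here is unnormalized.

```python
import math

def fourier_basis(t, n_basis, period):
    """Return [1, sin(wt), cos(wt), sin(2wt), cos(2wt), ...] cut to n_basis values."""
    omega = 2.0 * math.pi / period
    values = [1.0]
    k = 1
    while len(values) < n_basis:
        values.append(math.sin(k * omega * t))
        if len(values) < n_basis:
            values.append(math.cos(k * omega * t))
        k += 1
    return values

def evaluate(coefs, t, period):
    """Evaluate x(t) = sum_k coefs[k] * phi_k(t)."""
    phi = fourier_basis(t, len(coefs), period)
    return sum(c * p for c, p in zip(coefs, phi))

# Hypothetical coefficients for a yearly cycle (period of 365 days).
coefs = [5.0, -2.0, -10.0]
curve = [evaluate(coefs, day, 365.0) for day in range(0, 365, 73)]
print(curve)
```

Once the coefficients are known, the curve can be evaluated at any time point, not only at the observation times.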
4
Observations of different lengths
• Observation vectors of different lengths can be mapped to coordinates of a fixed basis set
• All examples in the ‘fda’ package have the same numbers of observations
• No conceptual obstacles to handling observation vectors of different lengths
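A sketch of the mapping, assuming a small monomial basis and ordinary least squares via the normal equations (plain Python, not the 'fda' package; all names and data are hypothetical). Two records with 5 and 11 observations both end up as 3 coefficients.

```python
def monomial_basis(t, n_basis):
    """Return [1, t, t^2, ...] with n_basis entries."""
    return [t ** k for k in range(n_basis)]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_coefs(times, values, n_basis):
    """Least-squares coefficients from the normal equations (Phi'Phi) c = Phi'y."""
    phi = [monomial_basis(t, n_basis) for t in times]
    gram = [[sum(row[i] * row[j] for row in phi) for j in range(n_basis)]
            for i in range(n_basis)]
    rhs = [sum(row[i] * y for row, y in zip(phi, values)) for i in range(n_basis)]
    return solve(gram, rhs)

def truth(t):
    """Hypothetical underlying curve used to generate both records."""
    return 1.0 + 2.0 * t - 3.0 * t * t

times_a = [i / 4 for i in range(5)]    # 5 observations
times_b = [i / 10 for i in range(11)]  # 11 observations
coefs_a = fit_coefs(times_a, [truth(t) for t in times_a], 3)
coefs_b = fit_coefs(times_b, [truth(t) for t in times_b], 3)
print(coefs_a, coefs_b)
```

Both records land in the same 3-dimensional coefficient space, so they can be compared or averaged directly.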
5
Time Warping
• “start” and “stop” are sometimes determined by certain transitions
• Example: growth spurts in the life cycle of various species do not occur at exactly the same ages in different individuals (even within the same species)
6
10 Girls: Berkeley Growth Study
• Tuddenham, R. D., and Snyder, M. M. (1954) "Physical growth of California boys and girls from birth to age 18", _University of California Publications in Child Development_, 1, 183-364.
[Figure: heights of the 10 girls; x-axis: age; y-axis: Height (cm.)]
7
Acceleration
• Growth spurts occur at different ages
• Average shows the basic trend, but features are damped by improper registration
[Figure: heights of the 10 girls; x-axis: age; y-axis: Height (cm.)]
[Figure: growth acceleration curves; x-axis: age; y-axis: Growth acceleration (cm/year^2)]
8
Registration
• register.fd: register all curves to the mean
• Not perfect, but better
[Figure: two panels of Growth acceleration (cm/year^2), one against age (unregistered) and one against warped age (registered)]
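A toy illustration of why registration matters. It uses simple shift registration (each curve is translated in time so that its peak lines up with the average peak age), which is a much cruder technique than the smooth warping done by register.fd. The bump-shaped curves, the ages and all names are hypothetical.

```python
import math

STEP = 0.05
GRID = [8.0 + STEP * i for i in range(161)]  # ages 8 to 16

def bump(t, center):
    """Stand-in for a growth spurt: a single peak of height 1 at `center`."""
    return math.exp(-0.5 * ((t - center) / 0.5) ** 2)

def peak_age(curve):
    """Grid age at which a sampled curve is largest."""
    return GRID[max(range(len(GRID)), key=lambda i: curve[i])]

def interp(curve, t):
    """Linear interpolation of a sampled curve, clamped at the grid ends."""
    if t <= GRID[0]:
        return curve[0]
    if t >= GRID[-1]:
        return curve[-1]
    pos = (t - GRID[0]) / STEP
    i = min(int(pos), len(GRID) - 2)
    frac = pos - i
    return (1.0 - frac) * curve[i] + frac * curve[i + 1]

def pointwise_mean(curves):
    return [sum(vals) / len(vals) for vals in zip(*curves)]

centers = [10.5, 11.0, 11.5, 12.0, 12.5, 13.0]  # spurts at different ages
curves = [[bump(t, c) for t in GRID] for c in centers]

target = sum(peak_age(c) for c in curves) / len(curves)
registered = []
for c in curves:
    shift = peak_age(c) - target  # move this curve's peak onto the target age
    registered.append([interp(c, t + shift) for t in GRID])

raw_mean = pointwise_mean(curves)
registered_mean = pointwise_mean(registered)
print(max(raw_mean), max(registered_mean))
```

The unregistered mean is visibly flattened, while the mean of the shifted curves keeps a peak close to the height of the individual curves.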
9
A Stroll Along the Beach
• Light intensity over 365 days at each of 190*143 = 27170 pixels was
– smoothed
– analyzed by functional principal components
• http://www.stat.berkeley.edu/~wickham/userposter.pdf
10
Other fda capabilities
• Correlations, even with series of different lengths!
• Phase-plane plots: good estimates of derivatives
[Figure: Montreal average daily temp deviation from average (C); x-axis: Month (Jan to Dec); y-axis: Mean Temperature; points labeled by month]
[Figure: phase-plane plot, Montreal average daily temp deviation from average (C); x-axis: Temperature (C); y-axis: Acceleration; points labeled by month]
afda-ch03.R, fda-ch01.R, fda-ch02.R
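A sketch of why a basis expansion gives good derivative estimates for phase-plane plots: with a Fourier expansion the second derivative is available in closed form, so no differencing of noisy data is needed. Plain Python with hypothetical coefficients, not the Montreal data.

```python
import math

PERIOD = 365.0
OMEGA = 2.0 * math.pi / PERIOD

# Hypothetical expansion: x(t) = C0 + sum_k [a_k sin(k w t) + b_k cos(k w t)]
C0 = 6.0
HARMONICS = [(-3.0, -14.0), (1.0, 2.0)]  # (a_k, b_k) for k = 1, 2

def value(t):
    return C0 + sum(a * math.sin(k * OMEGA * t) + b * math.cos(k * OMEGA * t)
                    for k, (a, b) in enumerate(HARMONICS, start=1))

def acceleration(t):
    """Exact second derivative: harmonic k is multiplied by -(k w)^2."""
    return sum(-(k * OMEGA) ** 2
               * (a * math.sin(k * OMEGA * t) + b * math.cos(k * OMEGA * t))
               for k, (a, b) in enumerate(HARMONICS, start=1))

# Points of a phase-plane plot: deviation from average against acceleration.
phase_plane = [(value(day) - C0, acceleration(day)) for day in range(0, 365, 30)]
for deviation, accel in phase_plane:
    print(f"{deviation:8.3f} {accel:10.6f}")
```

The same idea carries over to other bases such as B-splines, where derivatives of the basis functions are also known exactly.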
11
Script files for fda books
• Ramsay and Silverman
– (2002) Applied Functional Data Analysis (Springer)
– (2006) Functional Data Analysis, 2nd ed. (Springer)
• ~R\library\fda\scripts
– Some but not all data sets discussed in the books are in the ‘fda’ package
– Script files are available to reproduce some but not all of the analyses in the books.
– plus CSTR demo
12
FDA and Differential Equations
• Many dynamic systems are believed to follow processes where output changes are a function of the outputs, x, and inputs, u (and unknown parameters θ):
\[ \dot{x}(t) = f(x, u, t \mid \theta), \quad t \in [0, T] \]
• Matlab was designed in part for these types of models
13
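Squid Neurons
• FitzHugh (1961) - Nagumo et al. (1962) Equations:
Estimate a, b and c in:
\[ \dot{V} = c \left( V - \frac{V^3}{3} + R \right), \qquad \dot{R} = -\frac{1}{c} \left( V - a + bR \right) \]
[Figure: V (Voltage across Axon Membrane) and R (Recovery via Outward Currents)]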
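A minimal sketch of simulating the FitzHugh-Nagumo system with a hand-written fourth-order Runge-Kutta step (plain Python, not the Matlab or R code behind the slides). The parameter values, initial state, time span and step size are assumptions chosen for illustration.

```python
def fitzhugh_nagumo(state, a, b, c):
    """Right-hand side: dV/dt = c (V - V^3/3 + R), dR/dt = -(V - a + b R) / c."""
    v, r = state
    dv = c * (v - v ** 3 / 3.0 + r)
    dr = -(v - a + b * r) / c
    return (dv, dr)

def rk4_step(f, state, dt):
    """One classical Runge-Kutta step for an autonomous system."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(s + dt / 6.0 * (p + 2.0 * q + 2.0 * u + w)
                 for s, p, q, u, w in zip(state, k1, k2, k3, k4))

def simulate(a, b, c, v0, r0, t_end, dt):
    """Return the list of (V, R) states from t = 0 to t_end."""
    f = lambda s: fitzhugh_nagumo(s, a, b, c)
    path = [(v0, r0)]
    for _ in range(round(t_end / dt)):
        path.append(rk4_step(f, path[-1], dt))
    return path

# Assumed illustrative values, not estimates from the slides.
path = simulate(a=0.2, b=0.2, c=3.0, v0=-1.0, r0=1.0, t_end=20.0, dt=0.01)
print(path[0], path[-1])
```

Estimating a, b and c then amounts to wrapping `simulate` in a function that returns the sum of squared differences from the observed V and R, and handing that to an optimizer.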
14
Tank Reactions
• Continuously Stirred Tank Reactor (CSTR)
[Figure; labels: Temperature, Concentration]
15
Functional Data Analysis Process
1. Select Basis Set
2. Select Smoothing Operator
– e.g., differential equation
– equivalent to a Bayesian prior over coefficients to estimate
3. Estimate coefficients to optimize some objective function
4. Model criticism, residual plots, etc.
5. Hypothesis testing
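Steps 1 to 3 can be summarized in one criterion. The following is a sketch in the style of Ramsay and Silverman (2006); the symbols are assumptions not shown on the slides: observations y_j at times t_j, basis functions φ_k, a linear differential operator L, and a smoothing parameter λ ≥ 0.

```latex
x(t) = \sum_{k=1}^{K} c_k \phi_k(t), \qquad
\mathrm{PENSSE}_{\lambda}(c) = \sum_{j=1}^{n} \left[ y_j - x(t_j) \right]^2
  + \lambda \int \left[ L x(t) \right]^2 \, dt
```

Because the penalty is a quadratic form in the coefficients, c'Rc with R_{kl} = ∫ Lφ_k(t) Lφ_l(t) dt, it plays the role of a zero-mean Gaussian prior on c with precision proportional to λR. This is one way to read "equivalent to a Bayesian prior" in step 2.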
16
Inputs to Tank Reaction Simulation
17
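\[
\begin{aligned}
\frac{dC}{dt} &= -\beta_{CC}(T, F_{in})\, C + F_{in} C_{in} \\
\frac{dT}{dt} &= -\beta_{TT}(F_{co}, F_{in})\, T + \beta_{TC}(T, F_{in})\, C + F_{in} T_{in} + \alpha(F_{co})\, T_{co} \\
\beta_{CC}(T, F_{in}) &= \kappa \exp\left[ -10^{4}\, \tau \left( \frac{1}{T} - \frac{1}{T_{ref}} \right) \right] + F_{in} \\
\beta_{TT}(F_{co}, F_{in}) &= \alpha(F_{co}) + F_{in} \\
\beta_{TC}(T, F_{in}) &= 130\, \beta_{CC}(T, F_{in}) \\
\alpha(F_{co}) &= \frac{a F_{co}^{b+1}}{F_{co} + a F_{co}^{b} / 2}
\end{aligned}
\]
4 parameters: κ, τ, a, b (reported as kref, EoverR, a, b in the R output)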
Computations: Nonlinear ODE
• Compute Input vectors
• Define functions
• Call differential equation solver
• Summarize, plot
[Figure: simulated Temperature and Concentration]
• Estimate parameters (κ, τ, a, b)
18
Three problems
• Estimate (κ, τ, a, b) to minimize SSE in Temperature only

          function            SSE      SSE-min
Matlab    lsqnonlin           5.09888  0.00236
R         nls                 5.09652  0
R         optim Nelder-Mead   5.09652  0
R         optim BFGS          5.09652  0
R         optim CG            5.09900  0.00248
R         optim SANN          5.17504  0.07852
R         nlminb              5.09652  0
19
SSE(Temp, Conc)
• Matlab: lsqnonlin
• R: nls
[Figures: two pairs of panels, Concentration C(t) and Temperature T(t) against time (0 to 60); red = true, blue = estimated]

Median absolute relative error
               Matlab     R
Concentration  1.149E-03  1.145E-03
Temperature    2.640E-04  2.636E-04
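For reference, a minimal sketch of how a median absolute relative error can be computed, assuming it is the median over time points of |estimate - true| / |true|. The function name and the numbers are hypothetical, not the CSTR results.

```python
from statistics import median

def median_abs_rel_error(estimated, true):
    """Median over time points of |estimate - true| / |true| (true values must be nonzero)."""
    return median(abs(e - t) / abs(t) for e, t in zip(estimated, true))

# Hypothetical temperatures at four time points.
true_temp = [340.0, 345.0, 350.0, 355.0]
est_temp = [340.1, 344.9, 350.2, 354.8]
print(median_abs_rel_error(est_temp, true_temp))
```

The median makes the summary insensitive to a few time points with unusually large errors.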
20
R vs. Matlab
• Gave comparable answers
• R code for CSTR slightly more accurate but requires much more compute time
– coded by different people
• R has helper functions not so easily replicated in Matlab
– summary.nls
– confint.nls
– profile.nls

        Estimate  StdErr  t      Pr(>|t|)
kref    0.466     0.004   113.0  < 2e-16  ***
EoverR  0.840     0.009   94.7   < 2e-16  ***
a       1.720     0.232   7.4    8.2e-13  ***
b       0.496     0.050   10.0   < 2e-16  ***
21
confint.nls
• Likelihood-based confidence intervals: generally more accurate than Wald intervals
– Wald: subject to parameter-effects curvature
– Likelihood: only affected by intrinsic curvature

> confintNlsFit
        2.5%   97.5%
kref    0.458  0.474
EoverR  0.823  0.858
a       1.300  2.222
b       0.401  0.599
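To see the difference, Wald intervals can be computed by hand from the estimates and standard errors reported by summary.nls and set beside the confint.nls intervals. A sketch in plain Python; it assumes a normal quantile rather than the t quantile R would use, since the residual degrees of freedom are not given on the slides.

```python
from statistics import NormalDist

# (estimate, standard error) as reported on the summary.nls slide.
estimates = {"kref": (0.466, 0.004), "EoverR": (0.840, 0.009),
             "a": (1.720, 0.232), "b": (0.496, 0.050)}
# Likelihood-based intervals as reported on the confint.nls slide.
profile = {"kref": (0.458, 0.474), "EoverR": (0.823, 0.858),
           "a": (1.300, 2.222), "b": (0.401, 0.599)}

def wald_interval(estimate, std_err, level=0.95):
    """Symmetric interval: estimate +/- z * std_err (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return (estimate - z * std_err, estimate + z * std_err)

for name, (est, se) in estimates.items():
    lo, hi = wald_interval(est, se)
    p_lo, p_hi = profile[name]
    print(f"{name:7s} Wald ({lo:.3f}, {hi:.3f})   profile ({p_lo:.3f}, {p_hi:.3f})")
```

For a, the Wald interval is symmetric about 1.720, while the likelihood-based interval (1.300, 2.222) is shifted upward, which a symmetric Wald interval cannot capture.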
22
plot.profile.nls
• Produces a plot showing the sqrt(log(LR)) for each parameter
[Figure: profile plots for kref, EoverR, a and b, with reference lines labeled 50, 80, 90, 95 and 99]
23
Conclusions
• R and Matlab give comparable answers
• R:nls has helper functions absent from Matlab:lsqnonlin
• Functional data analysis tools are key for
– estimating derivatives and
– working with differential operators
24
References
• www.functionaldata.org
• Ramsay and Silverman (2006) Functional Data Analysis, 2nd ed. (Springer)
• Ramsay and Silverman (2002) Applied Functional Data Analysis (Springer)
• Ramsay, J. O., Hooker, G., Cao, J. and Campbell, D. (2007) Parameter estimation for differential equations: A generalized smoothing approach (with discussion). Journal of the Royal Statistical Society, Series B. To appear.
25
NOT free-knot splines
• For this, see
– DierckxSpline package
– Companion to Dierckx, P. (1993). Curve and Surface Fitting with Splines. Oxford Science Publications, New York.
• R package by Sundar Dorai-Raj
– links to Fortran code by Dierckx available from www.netlib.org/dierckx
• soon to appear on CRAN