

Simultaneous Confidence Bands for the Mean of Dense Functional Data

By

LI Xiaoyu

(17250811)

A thesis submitted in partial fulfillment of the requirements

for the degree of

Bachelor of Science (Honours) in MATH & STAT QDA

at

Hong Kong Baptist University

Supervisor: Prof CHENG Ming-Yen

13/10/2020


Acknowledgement

First of all, I want to express my sincere gratitude to Professor CHENG Ming-Yen, my supervisor, who provided me with the necessary research papers, detailed instructions, considerable assistance and valuable suggestions during the first semester. Moreover, I appreciate my parents, my friends and the professors from the Department of Mathematics. Without their assistance, encouragement and guidance, I could not have successfully completed my final year project.

Finally, the plasma data set comes from Andersen et al. (1981) and Hart and Wehrly (1986), and the R package ‘SCBmeanfd’ comes from Degras (2016). Thanks for their contributions.

______________________

Signature of Student

Student Name: LI Xiaoyu

Department of Mathematics

Hong Kong Baptist University

Date: 08/01/2021


Simultaneous Confidence Bands for the Mean of Dense Functional Data

LI Xiaoyu

(17250811)

Department of Mathematics

ABSTRACT

The mean function is an important topic in functional data analysis (FDA). To analyze the mean of functional data over the entire domain, one should conduct simultaneous inference rather than pointwise inference. Before building simultaneous confidence bands (SCB), one needs to estimate the mean function by nonparametric smoothing, choose the smoothing parameter, and estimate the covariance function.

This thesis first introduces the whole process and two specific methods to build simultaneous confidence bands (SCB) for the mean of dense functional data, and then uses the R package ‘SCBmeanfd’ to analyze and interpret the plasma data. This example illustrates how SCB can be applied in hypothesis testing and provides evidence for deciding whether or not to reject the null hypothesis.

Keywords: FDA, SCB, hypothesis testing, R package ‘SCBmeanfd’


1. Introduction

As shown in Degras (2017), rapid developments in science and industry have made it possible to collect high-resolution data over time, space, and frequency. Such data can be considered as discretely observed functions and are therefore called functional data. For example, in Figure 1, the daily temperature in Spain, averaged per year over the period 1980 to 2009, is Spanish weather functional data. The R package ‘fda.usc’ comes from Bande et al. (2020).

> install.packages("fda.usc")

> library(fda.usc)

> plot(aemet$temp)

FIGURE 1: daily temperature in Spain on average per year over 30 years

As shown in Figure 1, the Spanish temperature data are densely observed; such data are so-called dense functional data.

Figures 2 and 3 show the daily wind speed and the daily log precipitation in Spain, averaged per year over the period 1980 to 2009, respectively, which give further examples of Spanish weather dense functional data.

> plot(aemet$wind.speed)

> plot(aemet$logprec)


FIGURE 2: daily wind speed in Spain on average per year over 30 years

FIGURE 3: daily log precipitation in Spain on average per year over 30 years

More importantly, functional data are very useful and crucial in biomedical science, social science, climate science and finance. The common feature of these fields is that data are collected many times over a specific period and then drawn as continuous curves over the domain t.

To analyze functional data, new theory and statistical methods are needed. As discussed in Degras (2017), the goal of functional data analysis (FDA) is to analyze information given in the form of curves or functions, for example by mean function estimation and functional principal component analysis (FPCA). In other words, FDA is the process of analyzing surfaces, curves or functions.

There are many excellent past works on how to build SCB for functional data, such as Choi and Reimherr (2018), Cao et al. (2016), and Degras (2011). This thesis mainly focuses on SCB for the mean of dense functional data.

This thesis is organized as follows. Section 2 discusses the estimation of the mean and covariance of functional data. Section 3 explains how to build simultaneous confidence bands (SCB). Section 4 illustrates these methods by analyzing the plasma data set available through R. Finally, Section 5 concludes and Section 6 lists all references.


2. Preparation process before building SCB

Before conducting the estimation, one needs to know some basic stochastic processes. First, a stochastic process {X(t) | t ∈ T} is simply a collection of random variables, where the index t is often time. Second, a random variable X is normally distributed with mean μ and variance σ² if its probability density function is

f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right).

Third, a Gaussian process (GP) is a special kind of stochastic process: a set of random variables on a continuous domain in which every finite collection follows a multivariate normal distribution. In other words, every finite linear combination of them is normally distributed. Hence, a Gaussian process is uniquely determined by its mean and covariance functions.

Now let X_i(t) denote a stochastic process and assume that X_i(t) follows a Gaussian process with mean function μ(t) and covariance function Γ(s,t); that is, X_i(t) ~ GP(μ(t), Γ(s,t)). Then a general statistical model for the functional data is

Y_{ij} = X_i(t_j) + \varepsilon_{ij}, \quad i = 1, \ldots, n, \; j = 1, \ldots, p,

where Y_ij is the observation of the i-th statistical unit at grid point t_j and ε_ij is a measurement error (Degras, 2017).

2.1 Estimate the mean function

In general, the most common estimator of the mean function μ is the sample mean

\bar{X}(t) = \frac{1}{n} \sum_{i=1}^{n} X_i(t).

However, a small number of grid points t_j and large measurement errors ε_ij can lead to an unstable and inaccurate estimate of the mean function. According to Degras (2017), one can instead calculate the sample mean

\bar{Y}_j = \frac{1}{n} \sum_{i=1}^{n} Y_{ij}

at every grid point t_j and then smooth these values in a nonparametric fashion.

There are two main benefits of data smoothing. First, it produces a better-behaved (smoother) estimator of the mean. Second, it cancels out the adverse effect caused by the unstable measurement errors ε_ij.
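As a concrete illustration, the following minimal R sketch (not part of the original analysis; the grid tgrid and data matrix Y are hypothetical) computes the pointwise sample means with colMeans and then smooths them with the base-R kernel smoother ksmooth, anticipating the Nadaraya-Watson smoother of Section 2.2.

tgrid <- seq(0, 1, length.out = 50)        # hypothetical grid points t_1, ..., t_p
Y <- matrix(rnorm(10 * 50), nrow = 10)     # hypothetical raw observations Y_ij (n = 10 curves)
Ybar <- colMeans(Y)                        # pointwise sample means at every grid point t_j
smoothed <- ksmooth(tgrid, Ybar, kernel = "normal", bandwidth = 0.2, x.points = tgrid)
plot(tgrid, Ybar)                          # raw pointwise means
lines(smoothed$x, smoothed$y, lwd = 2)     # nonparametrically smoothed mean estimate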


2.2 Nonparametric smoothing

As stated in Degras (2017), the basic process of nonparametric data smoothing is as follows. First, let w_j denote weight functions, which are determined by the given data and the smoothing method. Second, estimate the mean μ by the nonparametric smoother

\hat{\mu}(t) = \sum_{j=1}^{p} w_j(t)\, \bar{Y}_j.

Wand and Jones’s book (as cited in Degras, 2017) presents the Nadaraya-Watson estimator, a very common example of a nonparametric smoother:

\hat{\mu}(t) = \frac{\sum_{j=1}^{p} K\!\left(\frac{t - t_j}{h}\right) \bar{Y}_j}{\sum_{j=1}^{p} K\!\left(\frac{t - t_j}{h}\right)},

for a kernel function K and a positive constant h, the bandwidth, which is chosen by the analyst. In this case the weight functions are

w_j(t) = \frac{K\!\left(\frac{t - t_j}{h}\right)}{\sum_{i=1}^{p} K\!\left(\frac{t - t_i}{h}\right)}.

Note that \sum_{j=1}^{p} w_j(t) = 1 for t ∈ D.
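As an illustration of the weights above, here is a minimal R sketch using a Gaussian kernel (dnorm); tgrid, Ybar and the bandwidth h = 0.2 are hypothetical stand-ins rather than objects from this thesis.

tgrid <- seq(0, 1, length.out = 50)                 # hypothetical grid points t_j
Ybar  <- sin(2 * pi * tgrid) + rnorm(50, sd = 0.2)  # hypothetical pointwise sample means
nw <- function(t, tgrid, Ybar, h) {
  w <- dnorm((t - tgrid) / h)    # K((t - t_j) / h) with a Gaussian kernel
  w <- w / sum(w)                # weights w_j(t), which sum to 1
  sum(w * Ybar)                  # muhat(t) = sum_j w_j(t) * Ybar_j
}
muhat <- sapply(tgrid, nw, tgrid = tgrid, Ybar = Ybar, h = 0.2)

For real data one would of course replace Ybar by the pointwise means of the observed curves.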

The kernel function K and the bandwidth h are both very important concepts in data smoothing. According to Guidoum (2015), a function K(t) can be considered a kernel if it satisfies the following conditions:
(1) It is non-negative and integrates to 1.
(2) It is symmetric around the origin.
(3) It has a finite second moment, that is, \int_{\mathbb{R}} t^2 K(t)\, dt < \infty.

As shown in Guidoum (2015), the most frequently used kernel is the standard normal density with mean zero and variance one, that is,

K(t) = \frac{1}{\sqrt{2\pi}} e^{-t^2/2}.

In addition, as discussed in Degras (2017), the bandwidth h determines the effective size of the local averaging window. It has the following characteristics. For a fixed grid point t_j, if the bandwidth h is small, the weight w_j(t) given to the sample mean Ȳ_j is large only when t is close to t_j. On the contrary, when the bandwidth h is large, the weight w_j(t) given to Ȳ_j remains large even when t is far from t_j. Furthermore, if the grid point t_j and the bandwidth h are both fixed, the weight w_j(t) increases as t moves closer to t_j.

Finally, as stated in Eilers and Marx (1996), B-splines provide an important tool for nonparametric smoothing. There are also many other nonparametric smoothing methods, such as the spline smoothing methodology (Silverman, 1985). One should choose a suitable nonparametric smoothing method based on the specific features of the data. As shown in Degras (2017), one of the most common techniques is local polynomial smoothing. Its nonparametric smoother is \hat{\mu}(t) = \hat{\beta}_0, where

(\hat{\beta}_0, \ldots, \hat{\beta}_m) = \arg\min_{\beta_0, \ldots, \beta_m} \sum_{j=1}^{p} \left\{ \bar{Y}_j - \sum_{k=0}^{m} \beta_k (t_j - t)^k \right\}^2 K\!\left(\frac{t_j - t}{h}\right),

where K is the kernel function and h is the bandwidth.

However, it is well known that the choice among different smoothing methods affects the final results far less than the choice of the bandwidth h. Therefore, one may also choose the nonparametric smoothing method based on personal preference.
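For completeness, here is a minimal sketch of local linear smoothing (m = 1) with the KernSmooth package; this package is not used elsewhere in the thesis, and tgrid, Ybar and the bandwidth 0.2 are again hypothetical.

library(KernSmooth)
tgrid <- seq(0, 1, length.out = 50)                 # hypothetical grid points t_j
Ybar  <- sin(2 * pi * tgrid) + rnorm(50, sd = 0.2)  # hypothetical pointwise sample means
fit <- locpoly(tgrid, Ybar, degree = 1, bandwidth = 0.2,
               gridsize = 401, range.x = range(tgrid))
plot(tgrid, Ybar)              # raw pointwise means
lines(fit$x, fit$y, lwd = 2)   # local linear estimate of the mean function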

2.3 Choose smoothing parameter

The selection of the smoothing parameter is a key step. In this section, the smoothing parameter is simply the bandwidth h introduced in Section 2.2.

In probability and statistics, the mean squared error (MSE) measures the average of the squares of the errors; in other words, it is the average squared difference between the estimated values and the true values. It can be expressed as

MSE(\hat{\mu}) = E\!\left[ \left( \hat{\mu}(t) - \mu(t) \right)^2 \right], \quad t \in D.

After some algebra, the MSE decomposes as

MSE(\hat{\mu}) = E\!\left[ \left( \hat{\mu}(t) - E[\hat{\mu}(t)] + E[\hat{\mu}(t)] - \mu(t) \right)^2 \right] = \mathrm{Var}\!\left( \hat{\mu}(t) \right) + \mathrm{Bias}\!\left( \hat{\mu}(t), \mu(t) \right)^2.

Therefore, if the MSE of the mean estimator μ̂(t) is small, then both its variance and its bias must be small. According to Degras (2017), the smoothing parameter h governs the trade-off between the variance and the bias of the estimator μ̂(t): the larger h is, the larger the bias and the smaller the variance, because the estimated curve becomes smoother as h increases; conversely, the smaller h is, the smaller the bias and the larger the variance.


Now let AMSE denote the average mean squared error,

AMSE(h) = \frac{1}{p} \sum_{j=1}^{p} E\!\left[ \left( \hat{\mu}_h(t_j) - \mu(t_j) \right)^2 \right]

(Degras, 2017). One needs to choose the value of the smoothing parameter h that minimizes the AMSE; this is the goal of this section.

Leave-one-out cross-validation (LOOCV) can achieve this goal. As studied in Steorts (2017), the basic procedure of the simplest leave-one-data-point-out cross-validation is as follows. Assume a data set consists of n data points. Each time, pick (n − 1) of them as the training set to fit the parameters of the model and use the remaining one as the validation set. Repeat until every data point has served as the validation set; note that the i-th training set includes all data points except point i. In this way one eventually fits n models and computes the n corresponding mean squared errors MSE_1, ..., MSE_n. The LOOCV estimator is then the average

CV_{(n)} = \frac{1}{n} \sum_{i=1}^{n} MSE_i.

This resampling method can reduce the bias because it uses n − 1 data points as the training set each time.

According to Degras (2017), the goal of leave-one-point-out cross-validation is to minimize the CV score

CV(h) = \frac{1}{np} \sum_{i=1}^{n} \sum_{j=1}^{p} \left( \hat{\mu}_h^{-j}(t_j) - Y_{ij} \right)^2

with respect to the smoothing parameter h, where \hat{\mu}_h^{-j} is the mean estimator based on all the data except the observations at grid point t_j, for j = 1, ..., p.

However, in functional data analysis one analyzes curves rather than individual data points. In this case the corresponding leave-one-curve-out cross-validation score is

CV(h) = \frac{1}{np} \sum_{i=1}^{n} \sum_{j=1}^{p} \left( \hat{\mu}_h^{-i}(t_j) - Y_{ij} \right)^2,

where \hat{\mu}_h^{-i} is the mean estimator based on all but the i-th curve (Rice & Silverman, 1991, as cited in Degras, 2017).

More importantly, it can be shown from the above equation that the expectation of the CV score, as a function of h, approximately equals AMSE(h) + c for a constant c that does not depend on h. Therefore, the expected CV score and the average mean squared error (AMSE) depend on the smoothing parameter h in essentially the same way. Accordingly, one can select an appropriate smoothing parameter h by minimizing the cross-validation score instead of the AMSE. This method is the so-called cross-validation bandwidth selection in the R package ‘SCBmeanfd’.
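In practice this reduces to a single call to cv.select from ‘SCBmeanfd’; the same call is applied to the plasma data in Section 4, so the following is only a preview.

library(SCBmeanfd)
data(plasma)
# cross-validation bandwidth selection for a local linear fit (degree = 1),
# searching over the interval (0.5, 1) of candidate bandwidths
h <- cv.select(8:21, plasma, degree = 1, interval = c(0.5, 1))
h   # bandwidth minimizing the leave-one-curve-out CV score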

2.4 Estimate the covariance function

The last step before building the SCB is to estimate the covariance function of the mean estimator μ̂(t). Hart and Wehrly’s and Degras’s papers (as cited in Degras, 2017) state that the covariance of μ̂ satisfies

\mathrm{Cov}\!\left( \hat{\mu}(s), \hat{\mu}(t) \right) \approx \frac{\Gamma(s, t)}{n}, \quad s, t \in D, \quad \text{as } n, p \to \infty,

where Γ is the covariance function of the Gaussian process X_i(t). Therefore, the covariance estimation task reduces to estimating Γ.

There are two main methods to estimate the covariance Γ(s, t) of the Gaussian process X_i(t).

First, as stated in Degras (2017), apply the same smoother that was used for the average data curve to each individual curve; this process is so-called data pre-smoothing. The pre-smoothed data can be expressed as

\hat{X}_i(t) = \sum_{j=1}^{p} w_j(t)\, Y_{ij},

where the w_j are the weight functions. In this case the mean estimator μ̂(t) is simply the mean of the pre-smoothed curves, \hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} \hat{X}_i. Hence one can use the sample covariance of the X̂_i to estimate the covariance Γ:

\hat{\Gamma}(s, t) = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{X}_i(s) - \hat{\mu}(s) \right) \left( \hat{X}_i(t) - \hat{\mu}(t) \right).

Ramsay and Silverman (2005) studied the second method to estimate the covariance Γ, which is to conduct functional principal component analysis (FPCA). In short, one computes the eigenvalues \hat{\lambda}_1 \ge \hat{\lambda}_2 \ge \cdots \ge \hat{\lambda}_n \ge 0 and the corresponding eigenfunctions \hat{\varphi}_1, \hat{\varphi}_2, \ldots, \hat{\varphi}_n of the sample covariance \hat{\Gamma}(s, t) defined in the first method. The eigenfunctions \hat{\varphi}_k are orthonormal.

Indritz’s book (as cited in Degras, 2017) states that, by Mercer’s theorem, the sample covariance \hat{\Gamma}(s, t) can be expanded as

\hat{\Gamma}(s, t) = \sum_{k=1}^{n} \hat{\lambda}_k\, \hat{\varphi}_k(s)\, \hat{\varphi}_k(t).

However, this expansion takes up a large amount of storage space. One should therefore select a suitable truncation order K, much smaller than n, and decompose \hat{\Gamma}(s, t) into the reduced form

\hat{\Gamma}(s, t) \approx \sum_{k=1}^{K} \hat{\lambda}_k\, \hat{\varphi}_k(s)\, \hat{\varphi}_k(t)

(Degras, 2017). The reduced form saves a considerable amount of storage space while retaining most of the variance of the sample covariance \hat{\Gamma}(s, t).
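A minimal R sketch of both steps (pre-smoothing the curves, forming the sample covariance, and truncating its eigendecomposition) is given below; Y, tgrid, the bandwidth 0.2 and the truncation order K = 3 are all hypothetical, and KernSmooth::locpoly stands in for the pre-smoother (any smoother defined on the same, equally spaced grid would do).

library(KernSmooth)
tgrid <- seq(0, 1, length.out = 50)                        # hypothetical equally spaced grid
Y <- matrix(rnorm(10 * 50), nrow = 10)                     # hypothetical raw data Y_ij
# pre-smooth every curve on the same grid (local linear smoother)
Xhat <- t(apply(Y, 1, function(y)
  locpoly(tgrid, y, degree = 1, bandwidth = 0.2,
          gridsize = length(tgrid), range.x = range(tgrid))$y))
muhat <- colMeans(Xhat)                                    # mean of the pre-smoothed curves
Gammahat <- crossprod(sweep(Xhat, 2, muhat)) / nrow(Xhat)  # sample covariance Gamma-hat(s, t)
# FPCA: eigendecomposition of Gamma-hat, truncated at K components
eig <- eigen(Gammahat, symmetric = TRUE)
K <- 3
GammahatK <- eig$vectors[, 1:K] %*% diag(eig$values[1:K]) %*% t(eig$vectors[, 1:K])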


3. Build SCB

Having estimated the mean and covariance functions, it is time to build the SCB.

Recall that a Gaussian process is a set of random variables in which every finite collection follows a multivariate normal distribution, and that the general model for dense functional data generated by the Gaussian process X_i(t) is Y_ij = X_i(t_j) + ε_ij, as introduced in Section 2.

As stated in Degras (2011), let GP(0, Γ) denote a Gaussian process with mean zero and covariance Γ. Then, by the functional central limit theorem (FCLT), the mean estimator μ̂ satisfies

\sqrt{n}\left( \hat{\mu} - \mu \right) \to GP(0, \Gamma) \quad \text{as } n, p \to \infty.

Recall that the central limit theorem states that, under suitable conditions, the standardized average of mutually independent random variables is approximately normally distributed; the functional central limit theorem is its analogue for function-valued observations.

There are two basic methods to build simultaneous confidence bands (SCB) for μ (Degras, 2017).

First, let σ(t) = √Γ(t, t) and ρ(s, t) = Γ(s, t)/(σ(s)σ(t)) denote the standard deviation and correlation functions of the Gaussian process X_i(t), respectively, and let Z be another Gaussian process with mean zero and covariance ρ, that is, Z ~ GP(0, ρ). Then σZ is exactly the limiting Gaussian process above, with mean 0 and covariance Γ; that is, σZ ~ GP(0, Γ).

The first method is to use the Gaussian process GP(0, Γ) to build the SCB. In this case, the SCB for μ at confidence level 1 − α is

\left( \hat{\mu}(t) - \frac{z_{\alpha,\Gamma}}{\sqrt{n}},\; \hat{\mu}(t) + \frac{z_{\alpha,\Gamma}}{\sqrt{n}} \right), \quad t \in D,

where σZ ~ GP(0, Γ) and the quantile z_{α,Γ} satisfies

P\!\left( \sup_{t \in D} \left| \sigma(t) Z(t) \right| \le z_{\alpha,\Gamma} \right) = 1 - \alpha.

Second, one can also use the Gaussian process GP(0, ρ) to build the SCB instead of the limiting process GP(0, Γ). Recall that σ(t) = √Γ(t, t) and ρ(s, t) = Γ(s, t)/(σ(s)σ(t)). Then the SCB for μ at confidence level 1 − α is approximately

\left( \hat{\mu}(t) - z_{\alpha,\rho}\, \frac{\sigma(t)}{\sqrt{n}},\; \hat{\mu}(t) + z_{\alpha,\rho}\, \frac{\sigma(t)}{\sqrt{n}} \right), \quad t \in D,

where z_{α,ρ} satisfies P\!\left( \sup_{t \in D} |Z(t)| \le z_{\alpha,\rho} \right) = 1 - \alpha. This construction parallels the way pointwise confidence intervals are built: μ falls inside the above confidence region with probability 1 − α when n and p are large enough. As the correlation ρ increases, the quantile z_{α,ρ} decreases. In particular, if the correlation ρ is identically 1, then z_{α,ρ} equals 1.96 for α = 0.05, which is the corresponding quantile of the standard normal distribution.

As mentioned in Degras (2017), σ(t) is difficult to compute in practice, but it can easily be estimated by σ̂(t) = √Γ̂(t, t). Moreover, the quantile z_{α,ρ} can be approximated by simulating the Gaussian process Z ~ GP(0, ρ̂). Therefore, the practical SCB for μ at confidence level 1 − α is

\left( \hat{\mu}(t) - z_{\alpha,\hat{\rho}}\, \frac{\hat{\sigma}(t)}{\sqrt{n}},\; \hat{\mu}(t) + z_{\alpha,\hat{\rho}}\, \frac{\hat{\sigma}(t)}{\sqrt{n}} \right), \quad t \in D,

where the quantile z_{α,ρ̂}, obtained approximately by simulating Z ~ GP(0, ρ̂), can be viewed as a parametric bootstrap of the statistic \sqrt{n}\left( \hat{\mu}(t) - \mu(t) \right)/\hat{\sigma}(t), t ∈ D.
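Once μ̂, σ̂ and the simulated quantile are available, assembling the band is elementary; in the following sketch, muhat, sigmahat, z and n are placeholder values rather than quantities from this thesis.

n        <- 10
muhat    <- rep(119, 14)     # hypothetical mean estimate on a 14-point grid
sigmahat <- rep(20, 14)      # hypothetical sigma-hat(t) = sqrt(Gamma-hat(t, t))
z        <- 3.0              # hypothetical simulated quantile z_{alpha, rho-hat}
lower <- muhat - z * sigmahat / sqrt(n)
upper <- muhat + z * sigmahat / sqrt(n)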

The process of the parametric bootstrap is as follows (Degras, 2017).

First, discretize the Gaussian process Z ~ GP(0, ρ̂) over a fine grid {τ_1, ..., τ_m} in D, and let Z_m denote the resulting random vector. In general, the grid size m should be large enough that the discretization effect is negligible. However, as m grows, the computation cost, which is of order O(m³), increases quickly because of the simulation involved. (Here the computation cost refers to the time a computer needs to carry out the calculations.) One therefore needs to choose a suitable grid size m by weighing factors such as the size of the domain D and the roughness of ρ̂. If the domain D runs from 0 to 1, the grid size m should be no higher than 500.

Second, simulate the multivariate normal distribution Z_m ~ N_m(0, M_ρ̂), where (M_ρ̂)_{jk} = ρ̂(τ_j, τ_k) for 1 ≤ j, k ≤ m. This simulation has memory cost O(m²) and computation cost O(m³), and both increase quickly as the grid size m increases. (The memory cost refers to the storage a computer needs to record the data; the computation cost is the time it needs to process them.) Therefore, as mentioned in the first step, the grid size m should be chosen suitably. To greatly reduce both memory and computation costs, \hat{\Gamma}(s, t) can be expressed in the reduced form

\hat{\Gamma}(s, t) = \sum_{k=1}^{K} \sum_{l=1}^{K} \hat{\lambda}_{kl}\, \hat{\varphi}_k(s)\, \hat{\varphi}_l(t),

where the truncation order K is much smaller than the grid size m, the \hat{\varphi}_k are the eigenfunctions of the sample covariance Γ̂, and the \hat{\lambda}_{kl} are the corresponding eigenvalue coefficients. Recall that this reduced decomposition arises when one uses the FPCA method to estimate Γ(s, t) of the Gaussian process X_i(t). In this case, to simulate Z_m ~ N_m(0, M_ρ̂), one only needs to simulate a random vector Z_K ~ N_K(0, M_λ̂), where

(M_{\hat{\lambda}})_{kl} = \frac{\hat{\lambda}_{kl}}{\sqrt{\hat{\lambda}_{kk}\, \hat{\lambda}_{ll}}}.

The last step is to compute the L-infinity norm of Z_m, that is, the maximum absolute value of its entries. Repeat the simulation many times and set z_{α,ρ̂} equal to the (1 − α) quantile of the simulated L-infinity norms. It is worth mentioning that the parametric bootstrap does not account for the uncertainty arising from the selection of the smoothing parameter h and the estimation of the covariance function; the SCB may therefore be inaccurate, especially when the sample size n is not large enough.
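The following minimal R sketch illustrates this simulation; the correlation matrix M here is a simple hypothetical surrogate for ρ̂ evaluated on the grid, and m, B and alpha are hypothetical choices.

library(MASS)
m     <- 200
tau   <- seq(0, 1, length.out = m)                               # fine grid tau_1, ..., tau_m
M     <- outer(tau, tau, function(s, t) exp(-abs(s - t) / 0.3))  # hypothetical rho-hat(s, t)
alpha <- 0.05
B     <- 5000                                                    # number of simulated processes
Zm    <- mvrnorm(B, mu = rep(0, m), Sigma = M)                   # B draws of Z_m ~ N_m(0, M)
z     <- quantile(apply(abs(Zm), 1, max), 1 - alpha)             # (1 - alpha) quantile of sup|Z_m|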

Besides the parametric bootstrap, one can also use a nonparametric bootstrap of the statistic \sqrt{n}\left( \hat{\mu}(t) - \mu(t) \right)/\hat{\sigma}(t) to determine z_{α,ρ̂} (Degras, 2017). In general, draw a bootstrap sample {X̃_1, ..., X̃_n} from the pre-smoothed curves {X̂_1, ..., X̂_n} and compute the bootstrap estimator \tilde{\mu} = \frac{1}{n} \sum_{i=1}^{n} \tilde{X}_i. This process is repeated many times to obtain the bootstrap distribution, and the (1 − α) quantile of this bootstrap distribution is then used as z_{α,ρ̂}. Although the nonparametric bootstrap requires considerably more computation than the parametric bootstrap described above, its results are more accurate because it does account for the uncertainty in estimating the covariance function and selecting the smoothing parameter h.
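A minimal sketch of this resampling scheme is given below, with a hypothetical matrix Xhat of pre-smoothed curves (one row per curve) and hypothetical choices of B and alpha; taking the quantile of the studentized sup statistic is one natural reading of "the quantile of the bootstrap distribution" above.

Xhat     <- matrix(rnorm(10 * 50), nrow = 10)      # hypothetical pre-smoothed curves X-hat_i
n        <- nrow(Xhat)
muhat    <- colMeans(Xhat)                         # mu-hat(t)
sigmahat <- apply(Xhat, 2, sd)                     # sigma-hat(t)
B        <- 2000
alpha    <- 0.05
stat <- replicate(B, {
  idx   <- sample(n, replace = TRUE)               # resample curves with replacement
  mutil <- colMeans(Xhat[idx, , drop = FALSE])     # bootstrap mean estimator mu-tilde
  max(abs(sqrt(n) * (mutil - muhat) / sigmahat))   # sup of the studentized deviation
})
z <- quantile(stat, 1 - alpha)                     # quantile used for the bootstrap SCB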


4. Analyze plasma data set

I will now use the above methodology to analyze the plasma data set through the R package ‘SCBmeanfd’. Before conducting the analysis, I want to emphasize again that the plasma data set comes from Andersen et al. (1981) and Hart and Wehrly (1986), and that the R package ‘SCBmeanfd’ comes from Degras (2016); thanks for their work. This example illustrates an important application of SCB: a one-sample hypothesis test for the population mean.

> install.packages("SCBmeanfd")

> library(SCBmeanfd)

> data(plasma)

The data set contains the plasma citrate concentrations of 10 randomly sampled subjects measured on a single day, with measurements taken every hour from 8 AM to 9 PM. The plasma data set is therefore a 10 × 14 matrix, as shown in Figure 4.

FIGURE 4: plasma data table

Furthermore, Figure 5 below shows the plasma data curves over continuous time, together with the mean curve.

> matplot(8:21, t(plasma), type = "l", col = 3, lty = 3,

xlab = "hour per day", ylab = "plasma citrate concentration",

main = "plasma data")

> lines(8:21, colMeans(plasma), col = 1, lwd = 2)

> legend("top", col = 1, lty = 1, legend = "the average")


FIGURE 5: curve of plasma data sets on the continuous index time

According to Leslie and Renty (2016), the normal plasma citrate concentration is larger than 100 μM. I now want to assess, given the sample plasma data, whether the overall average plasma citrate concentration lies in this normal range, i.e. is larger than 100 units. Note that the sample mean of the plasma citrate concentration is 119.1357, as computed in R. Design a one-tailed hypothesis test for the population mean of the plasma citrate concentration over the day:

H_0: \mu(t) \equiv 100 \quad \text{versus} \quad H_a: \mu(t_0) > 100 \text{ for some } t_0 \in D.

In this case, I used the cross-validation bandwidth selection method to choose the bandwidth h, with the degree of the local polynomial fit equal to 1. Setting the significance level α = 0.05, the simultaneous confidence bands can be drawn as in Figure 6:

> h <- cv.select(8:21, plasma, degree = 1, interval = c(0.5, 1))

> scb <- scb.mean(8:21, plasma, bandwidth = h, scbtype = "both", gridsize = 2e3)

> plot(scb, xlab = "hr", ylab = "concentration", main = "plasma data")

> legend("topright", col = "green", lty = 2, lwd = 1, legend = "normal SCB")

> legend("topleft", col = "blue", lty = 2, lwd = 1, legend = "bootstrap SCB")


FIGURE 6: SCB curves for population mean of plasma citrate concentration

As shown in Figure 6, the green dotted lines are the boundaries of the normal SCB, while the blue dotted lines indicate the boundaries of the bootstrap SCB, both at confidence level 0.95 for the population mean of the plasma citrate concentration. The normal SCB excludes 100 over the entire time range, and the bootstrap SCB excludes 100 from 6 PM to 8 PM. So we should reject the null hypothesis at significance level 0.05. In other words, the population mean plasma citrate concentration should be larger than 100 μM, which is in the normal range.

Then the test statistic, the normal p value and the bootstrap p value can be double-checked with a goodness-of-fit test.

> scb.model(8:21, plasma, model = 1, bandwidth = h, level = 0.05, scbtype = "both", gridsize = 2e3)

Goodness-of-fit test

Model for the mean function: linear

Bandwidth: 0.75

SCB type: normal and bootstrap

Significance level: 0.05

Test statistic and p value

stat normal p bootstrap p

5.174 <1e-16 0.0374


So the test statistic is T = 5.174, the normal p value is smaller than 10^{-16}, and the bootstrap p value is 0.0374. There is therefore enough evidence to reject H_0 at the significance level 0.05; in other words, there is enough evidence to conclude that the population mean plasma citrate concentration is higher than 100 μM, which is in the normal range.


5. Conclusion

Dense functional data are widely used in biomedicine, social science and finance, and are thus an important topic in statistics. Simultaneous inference is a valuable statistical tool in functional data analysis (FDA). In particular, simultaneous confidence bands have two main advantages: powerful visualization and easy interpretation.

To conclude, this thesis introduces the basic process and two methods to build SCB for the mean of dense functional data. Moreover, it applies SCB to a one-sample hypothesis test for the population mean by analyzing the plasma data set with the R package ‘SCBmeanfd’, and finally reports normal and bootstrap p values which provide evidence that the null hypothesis should be rejected at the significance level α = 0.05.


6. References

Andersen, A. H., Jensen, E. B., & Schou, G. (1981). Two-way analysis of variance with correlated errors. International Statistical Review, 49, 153–157.

Bande, M. F., Fuente, M. O., Galeano, P., Nieto, A., & Portugues, E. G. (2020). fda.usc: Functional Data Analysis and Utilities for Statistical Computing. R package version 2.0.2.

Cao, G., Wang, L., Li, Y., & Yang, L. (2016). Oracle-efficient confidence envelopes for covariance functions in dense functional data. Statistica Sinica, 26(1), 359–383.

Choi, H., & Reimherr, M. (2018). A geometric approach to confidence regions and bands for functional parameters. Journal of the Royal Statistical Society: Series B, 80, 239–260.

Degras, D. (2011). Simultaneous confidence bands for nonparametric regression with functional data. Statistica Sinica, 21, 1735–1765.

Degras, D. (2016). SCBmeanfd: Simultaneous Confidence Bands for the Mean of Functional Data. R package version 1.2.2.

Degras, D. (2017). Simultaneous confidence bands for the mean of functional data. WIREs Computational Statistics, 9, 1397.

Eilers, P., & Marx, B. (1996). Flexible smoothing with B-splines and penalties. Statistical Science, 11, 89–121.

Guidoum, A. C. (2015). Kernel Estimator and Bandwidth Selection for Density and its Derivatives: The kedd package. Retrieved October 22, 2020, from https://cran.r-project.org/web/packages/kedd/vignettes/kedd.pdf

Hart, J. D., & Wehrly, T. E. (1986). Kernel regression estimation using repeated measurements data. Journal of the American Statistical Association, 81, 1080–1088.

Leslie, C. C., & Renty, B. F. (2016). Plasma citrate homeostasis: How it is regulated; and its physiological and clinical implications. An important, but neglected, relationship in medicine. HSOA Journal of Human Endocrinology, 1(1).

Ramsay, J. O., & Silverman, B. W. (2005). Functional Data Analysis (2nd ed.). Springer Series in Statistics. New York: Springer.

Silverman, B. W. (1985). Some aspects of the spline smoothing approach to non-parametric regression curve fitting. Journal of the Royal Statistical Society: Series B, 47, 1–52.

Steorts, R. C. (2017). Resampling Methods: Cross Validation [PowerPoint slides]. Duke University. Retrieved December 18, 2020, from www2.stat.duke.edu/~rcs46/lectures_2017/05-resample/05-cv.pdf

