
JOURNAL OF MULTIVARIATE ANALYSIS 22, 230-244 (1987)

Asymptotically Optimal Selection of a Piecewise Polynomial Estimator of a Regression Function*

KEH-WEI CHEN

University of North Carolina, Charlotte

Communicated by P. Hall

Let (X, Y) be a pair of random variables such that $X = (X_1,\ldots,X_d)$ ranges over a nondegenerate compact d-dimensional interval C and Y is real-valued. Let the conditional distribution of Y given X have mean $\theta(x)$ and satisfy an appropriate moment condition. It is assumed that the distribution of X is absolutely continuous and its density is bounded away from zero and infinity on C. Without loss of generality let C be the unit cube. Consider an estimator of $\theta$ having the form of a piecewise polynomial of degree $k_n$ based on $m_n^d$ cubes of length $1/m_n$, where the $m_n^d\binom{k_n+d}{d}$ coefficients are chosen by the method of least squares based on a random sample of size n from the distribution of (X, Y). Let $(k_n, m_n)$ be chosen by the FPE procedure. It is shown that the indicated estimator has an asymptotically minimal squared error of prediction if $\theta$ is not of the form of a piecewise polynomial. © 1987 Academic Press, Inc.

1. INTRODUCTION

Application of nonparametric regression estimators to real data requires the specification of an “index.” For example, the kernel estimator is not fully defined until a window width for the kernel has been chosen. The present paper investigates the selection of the index for nonparametric regression based on piecewise polynomial fits.

Let (X, Y) be a pair of random variables such that $X = (X_1,\ldots,X_d)$ ranges over a nondegenerate compact d-dimensional interval C and Y is real-valued. Let the regression function $\theta$ of Y on X be unknown. Suppose that the conditional variance $\sigma^2$ of Y given X exists and is independent of X.

Revised October 1986.

AMS 1980 subject classification: Primary 62H12.

Key words and phrases: nonparametric regression, model selection, least squares, asymptotic efficiency.

* This research was supported in part by the National Science Foundation under Grants NSF MCS83-01257 and MCS78-25301.



Let the second moment of Y be finite. Without loss of generality it can be assumed that C is the unit cube

$$C = \{(x_1,\ldots,x_d): 0 \le x_j \le 1 \text{ for } 1 \le j \le d\}.$$

Let $\{X_i = (X_{i1},\ldots,X_{id}),\, Y_i : 1 \le i \le n\}$ denote a random sample of size n from the distribution of (X, Y). To estimate the unknown regression function $\theta$, many classes of estimators have been proposed, including the nearest neighbor method, the kernel method, and the partition method [10]. Optimal convergence properties of these estimators have been studied in great detail (e.g., [24, 25]).

In this paper piecewise polynomial estimators are used mainly for their mathematical tractability. They have the form of a piecewise polynomial of degree k based on $m^d$ cubes of length $1/m$, where the $m^d\binom{k+d}{d}$ coefficients are chosen by the method of least squares based on the data $(X_i, Y_i)$, $1 \le i \le n$. It follows from arguments contained in papers of Stone [24-26] that piecewise polynomial estimators can achieve the optimal rates of convergence under certain regularity conditions. Major [15] investigated the asymptotic properties of piecewise constant regression when d = 1. For discussions of piecewise polynomial regression in more applied settings, see McGee and Carleton [18], Lerman [14], and Tishler and Zang [30]. For a review of nonparametric regression in general, see Collomb [7].

Let $\lambda$ denote a pair (k, m), where k is a nonnegative integer and m is a positive integer. Write C as the disjoint union of $m^d$ cubes $C_{ml}$ of length $1/m$. Let $\psi_{ml}$ denote the indicator function for the cube $C_{ml}$, so that $\psi_{ml}(x) = 1$ or 0 according as $x \in C_{ml}$ or $x \notin C_{ml}$. For each $\lambda = (k, m)$ consider the piecewise polynomial estimator of $\theta$ of degree k given by

$$\hat\theta_{n\lambda} = \sum_l \psi_{ml}\, P_{n\lambda l},$$

where $\{P_{n\lambda l}\}$ are polynomials of degree k chosen to minimize the residual sum of squares $\sum_{i=1}^n (Y_i - \hat\theta_{n\lambda}(X_i))^2$. Equivalently, $P_{n\lambda l}$ minimizes $\sum_{X_i \in C_{ml}} (Y_i - P_{n\lambda l}(X_i))^2$ for each $l$; so $P_{n\lambda l}(x)$ and $P_{n\lambda l'}(x)$ are conditionally independent given $X_1,\ldots,X_n$ for $l \neq l'$.
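To make the construction concrete, the following is a minimal illustrative sketch (not code from the paper) of the per-cell least squares fit for d = 1: the unit interval is split into m equal cells and a polynomial of degree k is fitted separately in each cell. The helper names fit_piecewise_poly and predict are hypothetical, introduced here only for illustration.

    import numpy as np

    def fit_piecewise_poly(x, y, k, m):
        """Per-cell least squares fit of degree k on the m equal cells of [0, 1]."""
        coefs = []
        for l in range(m):
            lo, hi = l / m, (l + 1) / m
            in_cell = (x >= lo) & (x < hi) if l < m - 1 else (x >= lo) & (x <= hi)
            # a cell needs at least k + 1 points to determine a degree-k polynomial
            coefs.append(np.polyfit(x[in_cell], y[in_cell], k) if in_cell.sum() > k else None)
        return coefs

    def predict(coefs, x, m):
        """Evaluate the fitted piecewise polynomial at the points x (0 on unfitted cells)."""
        cell = np.minimum((x * m).astype(int), m - 1)
        yhat = np.zeros_like(x, dtype=float)
        for l, c in enumerate(coefs):
            if c is not None:
                yhat[cell == l] = np.polyval(c, x[cell == l])
        return yhat

Each cell is fitted independently, which is exactly why the per-cell polynomials are conditionally independent given the design points.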

Each estimator mentioned above is associated with an index $\lambda$ (e.g., the number of neighbors for nearest neighbor estimates; the bandwidth for kernel estimates; the degree and the number of pieces for piecewise polynomial estimates). The choice of $\lambda$ turns out to be crucial in effectively estimating $\theta$. In theoretical studies $\lambda$ has been deterministically chosen. However, for practical use, it is necessary to have a data-dependent choice of $\lambda$.

Various proposals have been made for data-driven procedures for choosing this index $\lambda$. Varieties of cross-validation [28] have been especially prominent. Although there have been significant theoretical developments


(for example, Craven and Wahba [8], Speckman [23], Stone [27], and Härdle and Marron [11]), many questions remain open.

There are numerous articles which propose criteria for selection rules in the parametric regression model. Two review articles, Hocking [13] and Thompson [29], and two regression texts, Seber [20] and Daniel and Wood [9], survey the existing literature. Of particular interest are Mallows' $C_p$ statistic (see, e.g., [16]), the related Akaike Information Criterion (AIC) (see [2-4]), and the final prediction error of Akaike (FPE) (see [1]). These statistics are essentially estimates of the mean squared error of prediction of the fitted model when the vector of regression parameters is estimated by subset least squares.

Härdle and Marron [12] showed that in the kernel regression setting FPE does not have an asymptotically minimal mean integrated squared error when the conditional variance depends on x. Shibata [21] has shown that the $C_p$, AIC, and FPE selection procedures have an asymptotically minimal squared error of prediction in a setting involving linear regression on a potentially infinite number of nonrandom predictor variables. In Breiman and Freedman [5] these predictor variables are assumed to have a joint Gaussian distribution. Like Shibata, they characterize a procedure whose squared error of prediction is asymptotically minimal. In the context of the present work (involving finitely many "independent" random variables and a regression function not of known parametric form), the FPE selection rule will also be shown to yield an asymptotically minimal squared error of prediction.

2. MAIN RESULT

Let $\{M_n\}$ denote an increasing sequence of positive integers that tends to infinity, but sufficiently slowly so that

$$\lim_n \frac{M_n^d \log n}{n} = 0.$$
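For instance (an illustrative choice, not one prescribed in the paper), $M_n = \lfloor (n/(\log n)^2)^{1/d} \rfloor$ meets this requirement, since then $M_n^d \log n / n \le 1/\log n \to 0$.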

Then the sample size in each subcube $C_{ml}$, $1 \le l \le m^d$ and $1 \le m \le M_n$, will tend to infinity under Condition 1 below. Let $k_0$ be a nonnegative integer and let $\Lambda_n$ be the set $\{\lambda = (k, m): 0 \le k \le k_0,\ 1 \le m \le M_n\}$. Let $\hat\lambda_n = (\hat k_n, \hat m_n)$ be chosen based on the data subject to the constraint that $\hat\lambda_n \in \Lambda_n$. Given the choice of $\hat\lambda_n$, let the piecewise polynomial $\hat\theta_{n\hat\lambda_n}$ be obtained by the method of least squares. The FPE criterion plays an important role here. We introduce the motivation of Akaike's FPE criterion in the following simple setting.


Consider the linear regression model of full rank J,

$$Y = X\beta + e, \qquad (1)$$

where $Y^T = (Y_1,\ldots,Y_n)$, the rows $X_i = (X_{i1},\ldots,X_{iJ})$ of the design matrix, $1 \le i \le n$, are nonrandom, and the $e_i$'s are independently and identically distributed with mean 0 and variance $\sigma^2$. The least squares estimator $\hat\beta$ of $\beta$ is $(X^TX)^{-1}X^TY$. Suppose the future observations $Y_i$, independent of the original sample, are taken at the original points $X_i$. Then the expectation of the sum of squared errors of prediction, $E\sum_{i=1}^n (Y_i - X_i\hat\beta)^2$, is equal to $(n+J)\sigma^2$, while the expectation of $\mathrm{RSS}_J$ (the residual sum of squares) is $(n-J)\sigma^2$. The FPE statistic is defined to be the unbiased estimator $(1 + 2J/(n-J))\,\mathrm{RSS}_J$ of the former; indeed $E[(1 + 2J/(n-J))\,\mathrm{RSS}_J] = (1 + 2J/(n-J))(n-J)\sigma^2 = (n+J)\sigma^2$. Now in the present setting consider the FPE selection rule $\hat\lambda_n$, which chooses $\lambda \in \Lambda_n$ to minimize the statistic

$$\mathrm{FPE}_n(\lambda) = \Big(1 + \frac{2\,m^d\binom{k+d}{d}}{n - m^d\binom{k+d}{d}}\Big)\,\frac{1}{n}\sum_{i=1}^n \big(Y_i - \hat\theta_{n\lambda}(X_i)\big)^2;$$

here $\hat\theta_{n\lambda}$ is chosen by the method of least squares. Consider the average squared error loss L:

$$L(\hat\theta_{n\lambda}, \theta) = \frac{1}{n}\sum_{i=1}^n \big(\hat\theta_{n\lambda}(X_i) - \theta(X_i)\big)^2.$$

Given $X_n = \{X_i : 1 \le i \le n\}$, let $R(\lambda \mid X_n)$ denote the conditional risk of $\hat\theta_{n\lambda}$:

$$R(\lambda \mid X_n) = E\big(L(\hat\theta_{n\lambda}, \theta) \mid X_n\big).$$

An efficiency can be defined by

$$\mathrm{eff}(\hat\lambda_n) = \frac{\min_{\lambda \in \Lambda_n} L(\hat\theta_{n\lambda}, \theta)}{L(\hat\theta_{n\hat\lambda_n}, \theta)}.$$

The selection rules $\hat\lambda_n$, $n \ge 1$, are said to be asymptotically efficient if $\mathrm{eff}(\hat\lambda_n)$ converges to one in probability.
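As a rough illustration of the FPE rule and the efficiency just defined, here is a simulation sketch (not from the paper) for d = 1, where a candidate $\lambda = (k, m)$ has $J = m(k+1)$ coefficients. It reuses the hypothetical helpers fit_piecewise_poly and predict from the earlier sketch; the regression function and all constants below are assumptions made only for the example.

    import numpy as np

    def fpe(x, y, k, m):
        """Akaike's FPE for the degree-k, m-cell fit: (1 + 2J/(n - J)) * RSS / n."""
        n, J = len(y), m * (k + 1)
        rss = np.sum((y - predict(fit_piecewise_poly(x, y, k, m), x, m)) ** 2)
        return (1.0 + 2.0 * J / (n - J)) * rss / n

    def loss(x, y, theta, k, m):
        """Average squared error loss L(theta_hat_lambda, theta) at the design points."""
        return np.mean((predict(fit_piecewise_poly(x, y, k, m), x, m) - theta(x)) ** 2)

    rng = np.random.default_rng(1)
    n, k0, M_n, sigma = 2000, 2, 25, 0.5
    theta = lambda x: np.exp(x) * np.sin(4.0 * x)     # assumed theta, not a piecewise polynomial
    x = rng.uniform(size=n)
    y = theta(x) + sigma * rng.normal(size=n)

    grid = [(k, m) for k in range(k0 + 1) for m in range(1, M_n + 1) if m * (k + 1) < n]
    k_hat, m_hat = min(grid, key=lambda km: fpe(x, y, *km))       # the FPE selection rule
    eff = min(loss(x, y, theta, *km) for km in grid) / loss(x, y, theta, k_hat, m_hat)
    print((k_hat, m_hat), eff)                                    # eff should be near 1 for large n

Theorem 1 below asserts that, under Conditions 1-3, the printed efficiency tends to one in probability as n grows.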

Condition 1. The marginal distribution F of X is absolutely continuous and its density f is bounded below by b on C for some b > 0.

Condition 2. There are numbers $\nu > 1$ and $M_0 > 0$ such that $E\big(|Y - \theta(x)|^{4\nu} \mid X = x\big) \le M_0$ for $x \in C$.


Condition 3. The regression function $\theta$ is not almost everywhere of the form of a piecewise polynomial of degree $k_0$.

THEOREM 1. If Conditions 1, 2, and 3 hold, then the FPE selection rule is asymptotically efficient.

Remark 2.1. A mean efficiency can be defined by

$$\frac{\inf_{\lambda \in \Lambda_n} R(\lambda \mid X_n)}{R(\hat\lambda_n \mid X_n)}.$$

A careful inspection of the proof of Theorem 1 shows that the FPE selection rule is asymptotically mean efficient if Conditions 1, 2 (for $\nu > 3$), and 3 hold.

Remark 2.2. The ideas behind the theorem are most easily seen in the case $k_0 = 0$, so the proof will be given here only in that case. The complete proof may be found in Chen [6] as the proofs of (I) and (II).

3. PROOF OF THEOREM

Let $k_0 = 0$. Then the index $\lambda = (0, m)$ can be represented simply by m; in particular we write $\hat\theta_{nm}$, $R(m \mid X_n)$, $\mathrm{FPE}_n(m)$, and $\hat m_n$ for the FPE selection rule, and let $\mathcal{M}_n = \{m: 1 \le m \le M_n\}$ denote the index set. Choose $m_n^* \in \mathcal{M}_n$, depending on $X_n$, such that

$$R(m_n^* \mid X_n) = \min_{m \in \mathcal{M}_n} R(m \mid X_n).$$

According to Condition 1, for all $m \in \mathcal{M}_n$,

$$R(m \mid X_n) = \frac{1}{n}\sum_{i=1}^n \big(E(\hat\theta_{nm}(X_i) \mid X_n) - \theta(X_i)\big)^2 + \frac{m^d}{n}\,\sigma^2 \qquad (2)$$

except on an event whose probability tends to zero as n goes to infinity. Let

$$\bar\theta_{nm}(x) = E\big(\hat\theta_{nm}(x) \mid X_n\big) = \sum_l \psi_{ml}(x)\, E\big(\bar Y_{nml} \mid X_n\big) = \sum_l \psi_{ml}(x)\, \bar\theta_{nml},$$

where $\bar Y_{nml}$ is the average of the $Y_i$ with $X_i \in C_{ml}$, $\bar\theta_{nml} = E(\bar Y_{nml} \mid X_n) = \sum_{X_i \in C_{ml}} \theta(X_i)\big/\sum_{i=1}^n \psi_{ml}(X_i)$, and $I_{nml} = \{i: X_i \in C_{ml}\}$. Let $|I_{nml}|$ be the number of elements in $I_{nml}$.

$$n \cdot L(\hat\theta_{nm}, \theta) = \sum_{i=1}^n \big(\hat\theta_{nm}(X_i) - \bar\theta_{nm}(X_i)\big)^2 + \sum_{i=1}^n \big(\bar\theta_{nm}(X_i) - \theta(X_i)\big)^2.$$


Consequently, for given $X_n$ satisfying (2),

$$n \cdot L(\hat\theta_{nm}, \theta) - n \cdot R(m \mid X_n) = \sum_{i=1}^n \Big[\big(\hat\theta_{nm}(X_i) - \bar\theta_{nm}(X_i)\big)^2 - E\big(\big(\hat\theta_{nm}(X_i) - \bar\theta_{nm}(X_i)\big)^2 \mid X_n\big)\Big] = \sum_{l=1}^{m^d} \Big[|I_{nml}|\,\big(\bar Y_{nml} - \bar\theta_{nml}\big)^2 - \sigma^2\Big] = \sum_{l=1}^{m^d} Z_{nml},$$

where $Z_{nml}$ denotes $|I_{nml}|\,(\bar Y_{nml} - \bar\theta_{nml})^2 - \sigma^2$.
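This identity is elementary and easy to check numerically. The following sketch (an illustration added here, not code from the paper) simulates the piecewise-constant case with a known regression function and noise level and verifies that $n\,L(\hat\theta_{nm},\theta) - n\,R(m\mid X_n) = \sum_l Z_{nml}$ whenever every cell contains at least one observation; all names and constants are assumptions of the example.

    import numpy as np

    rng = np.random.default_rng(0)
    n, m, sigma = 500, 8, 0.3
    theta = lambda x: np.sin(2 * np.pi * x)        # assumed regression function
    x = rng.uniform(size=n)
    y = theta(x) + sigma * rng.normal(size=n)

    cell = np.minimum((x * m).astype(int), m - 1)  # index of the cell C_ml containing X_i
    nL = nR = Zsum = 0.0
    for l in range(m):
        idx = cell == l
        ybar = y[idx].mean()                       # Ybar_nml: the piecewise-constant fit on C_ml
        tbar = theta(x[idx]).mean()                # thetabar_nml = E(Ybar_nml | X_n)
        nL += np.sum((ybar - theta(x[idx])) ** 2)                 # contribution to n*L
        nR += np.sum((tbar - theta(x[idx])) ** 2) + sigma ** 2    # contribution to n*R(m | X_n)
        Zsum += idx.sum() * (ybar - tbar) ** 2 - sigma ** 2       # Z_nml
    print(nL - nR, Zsum)                           # the two numbers agree up to rounding error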

LEMMA 1. Suppose Conditions 1 and 2 hold. Then there is a positive constant $a_\nu$ (depending on $\nu$ and $M_0$) such that

$$E\Big(\Big|\sum_{l=1}^{m^d} Z_{nml}\Big|^{2\nu} \,\Big|\, X_n\Big) \le a_\nu\, m^{d\nu} \qquad \text{for } 1 \le m \le M_n \qquad (3)$$

except on an event whose probability tends to zero with n.

Proof. Since $E(Z_{nml} \mid X_n) = 0$ for all m and l, it follows from Theorem 2 of Whittle [31] that there is a positive constant $C_\nu$ such that

$$E\Big(\Big|\sum_{l=1}^{m^d} Z_{nml}\Big|^{2\nu} \,\Big|\, X_n\Big) \le C_\nu\, m^{d\nu} \max_l E\big(|Z_{nml}|^{2\nu} \mid X_n\big) \qquad \text{for } 1 \le m \le M_n.$$

So it suffices to show that $E(|Z_{nml}|^{2\nu} \mid X_n)$ is uniformly bounded in m, l in probability. By Jensen's inequality,

$$|Z_{nml}|^{2\nu} \le 2^{2\nu-1}\big[\,|I_{nml}|^{2\nu}\,(\bar Y_{nml} - \bar\theta_{nml})^{4\nu} + \sigma^{4\nu}\big].$$

According to Theorem 2 of Whittle [31], there is a positive constant $C_\nu'$ such that

$$E\big((\bar Y_{nml} - \bar\theta_{nml})^{4\nu} \mid X_n\big) \le C_\nu'\, |I_{nml}|^{-2\nu} \max_i E\big(|Y_i - \theta(X_i)|^{4\nu} \mid X_n\big),$$

so it follows from Condition 2 that

$$E\big(|Z_{nml}|^{2\nu} \mid X_n\big) \le 2^{2\nu-1}\{C_\nu' M_0 + \sigma^{4\nu}\} \qquad \text{for all } m, l.$$

Since the right-hand side of this inequality is independent of m and l, the desired result follows. Q.E.D.


LEMMA 2. Suppose Conditions 1 and 3 hold. Then there exists a sequence $\{b_n\}$ of positive numbers tending to infinity such that

$$n \cdot R(m_n^* \mid X_n) \ge b_n \qquad (4)$$

except on an event whose probability tends to zero with n.

Proof. It suffices to show that for all positive numbers $\alpha$,

$$\lim_n P\big(n \cdot R(m_n^* \mid X_n) \ge \alpha\big) = 1.$$

By (2) it suffices to show that

$$\lim_n P\Big(\sum_{i=1}^n \big(\bar\theta_{nm}(X_i) - \theta(X_i)\big)^2 \ge \alpha\Big) = 1 \qquad \text{for each fixed } m.$$

For simplicity this will be done for m = 1, the proof for general m being similar. It must be shown that

$$\lim_n P\Big(\sum_{i=1}^n \big(\bar\theta_{n11} - \theta(X_i)\big)^2 \ge \alpha\Big) = 1. \qquad (5)$$

By Conditions 1 and 3,

$$\min_a \int_C \big(a - \theta(x)\big)^2 f(x)\,dx = \int_C \big(EY - \theta(x)\big)^2 f(x)\,dx > 0.$$

It follows easily from the strong law of large numbers that almost surely

$$\lim_n \frac{1}{n}\sum_{i=1}^n \big(\bar\theta_{n11} - \theta(X_i)\big)^2 = \int_C \big(EY - \theta(x)\big)^2 f(x)\,dx > 0,$$

and hence (5) holds.

LEMMA 3. Suppose Conditions 1, 2, and 3 hold. Then

$$\lim_{n\to\infty} P\Big(\min_{m \in \mathcal{M}_n} \frac{L(\hat\theta_{nm}, \theta)}{R(m_n^* \mid X_n)} \ge 1 - \delta\Big) = 1 \qquad \text{for all } \delta > 0. \qquad (6)$$

Proof. In proving (6) it can be assumed that $0 < \delta < 1$.


Then by Chebyshev's inequality and Lemmas 1 and 2, when (2), (3), and (4) are true,

$$P\Big(\min_{m \in \mathcal{M}_n} \frac{L(\hat\theta_{nm}, \theta)}{R(m_n^* \mid X_n)} < 1 - \delta \,\Big|\, X_n\Big) \le \sum_{m=1}^{M_n} \frac{E\big(\big|\sum_{l=1}^{m^d} Z_{nml}\big|^{2\nu} \mid X_n\big)}{\delta^{2\nu}\, n^{2\nu}\, R^{2\nu}(m \mid X_n)} \le \sum_{m=1}^{M_n} \frac{a_\nu\, m^{d\nu}}{\delta^{2\nu}\, n^{2\nu}\, R^{2\nu}(m \mid X_n)}. \qquad (7)$$

The right-hand side of the last inequality tends to zero (independently of $X_n$) as n goes to infinity. The result follows. Q.E.D.

Lemma 3 shows that $R(m_n^* \mid X_n)$ is an asymptotic lower bound for the minimum value of the loss $L(\hat\theta_{nm}, \theta)$ over $m \in \mathcal{M}_n$.

LEMMA 4. Suppose Conditions 1, 2, and 3 hold. Then

$$\lim_n P\Big(\Big|\frac{L(\hat\theta_{n\hat m_n}, \theta)}{R(m_n^* \mid X_n)} - 1\Big| \ge \varepsilon\Big) = 0 \qquad \text{for all } \varepsilon > 0.$$

That is, the FPE selection rule $\hat m_n$ makes $\hat\theta_{n\hat m_n}$ asymptotically as good as $\hat\theta_{nm_n^*}$.

Since $\min_{m \in \mathcal{M}_n} L(\hat\theta_{nm}, \theta) \le L(\hat\theta_{n\hat m_n}, \theta)$, it follows from Lemmas 3 and 4 that the FPE selection rule is asymptotically efficient. So it remains to prove Lemma 4.

Observe that

$$\sum_{i=1}^n \big(\hat\theta_{nm}(X_i) - Y_i\big)^2 = \sum_{i=1}^n \big(Y_i - \bar\theta_{nm}(X_i)\big)^2 - \sum_{i=1}^n \big(\hat\theta_{nm}(X_i) - \bar\theta_{nm}(X_i)\big)^2$$

and

$$E\Big(\sum_{i=1}^n \big(Y_i - \bar\theta_{nm}(X_i)\big)^2 \,\Big|\, X_n\Big) = n\sigma^2 + \sum_{i=1}^n \big(\theta(X_i) - \bar\theta_{nm}(X_i)\big)^2.$$


Thus $\mathrm{FPE}_n(m)$ can be rewritten as

$$\begin{aligned}
n \cdot \mathrm{FPE}_n(m) = {}& \Big[\sum_{i=1}^n \big(Y_i - \bar\theta_{nm}(X_i)\big)^2 - E\Big(\sum_{i=1}^n \big(Y_i - \bar\theta_{nm}(X_i)\big)^2 \,\Big|\, X_n\Big)\Big] \\
&- \Big[\sum_{i=1}^n \big(\hat\theta_{nm}(X_i) - \bar\theta_{nm}(X_i)\big)^2 - m^d\sigma^2\Big] + n\sigma^2 \\
&+ \Big\{\frac{2m^d}{n - m^d}\sum_{i=1}^n \big(\hat\theta_{nm}(X_i) - Y_i\big)^2 - 2m^d\sigma^2\Big\} \\
&+ \Big\{\sum_{i=1}^n \big(\theta(X_i) - \bar\theta_{nm}(X_i)\big)^2 + m^d\sigma^2\Big\}.
\end{aligned}$$

Set

$$S_n(m) = \frac{2m^d}{n - m^d}\sum_{i=1}^n \big(\hat\theta_{nm}(X_i) - Y_i\big)^2 - 2m^d\sigma^2$$

and

$$T_n(m) = \sum_{i=1}^n \big(Y_i - \bar\theta_{nm}(X_i)\big)^2 - E\Big(\sum_{i=1}^n \big(Y_i - \bar\theta_{nm}(X_i)\big)^2 \,\Big|\, X_n\Big).$$

Hence, by the definition of $Z_{nml}$,

$$n \cdot \mathrm{FPE}_n(m) = n \cdot R(m \mid X_n) - \sum_{l=1}^{m^d} Z_{nml} + S_n(m) + T_n(m) + n\sigma^2. \qquad (8)$$
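Identity (8) can likewise be checked numerically. The function below is an illustrative sketch (not code from the paper) that computes both sides for the piecewise-constant fit with the regression function and $\sigma$ assumed known; cells are assumed nonempty, and all names are introduced only for the example.

    import numpy as np

    def check_fpe_identity(x, y, theta, sigma, m):
        """Return (n*FPE_n(m), n*R(m|X_n) - sum_l Z_nml + S_n(m) + T_n(m) + n*sigma^2);
        by identity (8) the two values coincide."""
        n = len(y)
        cell = np.minimum((x * m).astype(int), m - 1)
        rss = nR = Zsum = T = 0.0
        for l in range(m):
            idx = cell == l
            ybar, tbar = y[idx].mean(), theta(x[idx]).mean()
            rss += np.sum((y[idx] - ybar) ** 2)                       # residual sum of squares
            nR += np.sum((tbar - theta(x[idx])) ** 2) + sigma ** 2    # n*R(m | X_n), cell by cell
            Zsum += idx.sum() * (ybar - tbar) ** 2 - sigma ** 2       # Z_nml
            T += (np.sum((y[idx] - tbar) ** 2)                        # T_n(m), cell by cell
                  - idx.sum() * sigma ** 2
                  - np.sum((theta(x[idx]) - tbar) ** 2))
        S = 2.0 * m / (n - m) * rss - 2.0 * m * sigma ** 2            # S_n(m)
        return (1.0 + 2.0 * m / (n - m)) * rss, nR - Zsum + S + T + n * sigma ** 2

Run on data such as those generated in the earlier sketch, the two returned values agree up to rounding error.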

Now,

$$P\Big(\max_{m \in \mathcal{M}_n} \frac{\big|\sum_{l=1}^{m^d} Z_{nml}\big|}{n \cdot R(m \mid X_n)} \ge \delta \,\Big|\, X_n\Big) \le \sum_{m=1}^{M_n} P\Big(\Big|\sum_{l=1}^{m^d} Z_{nml}\Big| \ge \delta\, n \cdot R(m \mid X_n) \,\Big|\, X_n\Big).$$

Recall from (7) that under Conditions 1, 2, and 3,

$$\lim_n P\Big(\max_{m \in \mathcal{M}_n} \frac{\big|\sum_{l=1}^{m^d} Z_{nml}\big|}{n \cdot R(m \mid X_n)} \ge \delta\Big) = 0 \qquad \text{for all } \delta > 0. \qquad (9)$$

Thus the second term on the right-hand side of (8) is negligible relative to $n \cdot R(m \mid X_n)$ uniformly in $\mathcal{M}_n$. For the third term, we need the following lemma.

LEMMA 5. Under Conditions 1, 2, and 3,

$$\lim_n P\Big(\max_{m \in \mathcal{M}_n} \frac{|S_n(m)|}{n \cdot R(m \mid X_n)} \ge \delta\Big) = 0 \qquad \text{for all } \delta > 0.$$


Proof. Suppose (2) holds. Then

$$S_n(m) = \frac{2m^d}{n - m^d}\Big[\sum_{i=1}^n \big(\hat\theta_{nm}(X_i) - Y_i\big)^2 - (n - m^d)\sigma^2\Big]$$

and

$$\sum_{i=1}^n \big(\hat\theta_{nm}(X_i) - Y_i\big)^2 = \sum_{i=1}^n \big(Y_i - \bar\theta_{nm}(X_i)\big)^2 - \sum_{i=1}^n \big(\hat\theta_{nm}(X_i) - \bar\theta_{nm}(X_i)\big)^2.$$

Expanding $\sum_{i=1}^n (Y_i - \bar\theta_{nm}(X_i))^2$ about $\theta(X_i)$, we have

$$\sum_{i=1}^n \big(\hat\theta_{nm}(X_i) - Y_i\big)^2 - (n - m^d)\sigma^2 = \Big[\sum_{i=1}^n \big(Y_i - \theta(X_i)\big)^2 - n\sigma^2\Big] + 2\sum_{i=1}^n \big(Y_i - \theta(X_i)\big)\big(\theta(X_i) - \bar\theta_{nm}(X_i)\big) + \sum_{i=1}^n \big(\theta(X_i) - \bar\theta_{nm}(X_i)\big)^2 - \sum_{l=1}^{m^d} Z_{nml}.$$

There is a constant $B_0 > 1$ such that $n/(n - m^d) \le B_0$ for n large enough and for all $m \in \mathcal{M}_n$. Thus

$$\max_{m \in \mathcal{M}_n} \frac{|S_n(m)|}{n \cdot R(m \mid X_n)} \le B_0 \max_{m \in \mathcal{M}_n} \frac{2m^d\,\big|\sum_{i=1}^n \big(\hat\theta_{nm}(X_i) - Y_i\big)^2 - (n - m^d)\sigma^2\big|}{n^2 \cdot R(m \mid X_n)}$$

$$\le 2B_0 \max_{m \in \mathcal{M}_n}\Big\{\frac{m^d\,\big|\sum_{l=1}^{m^d} Z_{nml}\big|}{n^2 \cdot R(m \mid X_n)} + \frac{m^d}{n}\Big\} + 2B_0 \max_{m \in \mathcal{M}_n} \frac{m^d\sigma^2}{n \cdot R(m \mid X_n)}\,\Big|\frac{\sum_{i=1}^n \big(Y_i - \theta(X_i)\big)^2}{n\sigma^2} - 1\Big| + 4B_0 \max_{m \in \mathcal{M}_n} \frac{m^d\,\big|\sum_{i=1}^n \big(\theta(X_i) - \bar\theta_{nm}(X_i)\big)\big(Y_i - \theta(X_i)\big)\big|}{n^2 \cdot R(m \mid X_n)},$$

since $\sum_{i=1}^n \big(\theta(X_i) - \bar\theta_{nm}(X_i)\big)^2 \le n \cdot R(m \mid X_n)$ by (2).


Since $\lim_n (M_n^d \log n / n) = 0$, it follows from (9) that the first term on the right-hand side of the last inequality converges to zero in probability. It follows from the weak law of large numbers that the second term converges to zero in probability. To handle the last term, note that by (2) and Theorem 2 of Whittle [31],

$$E\Big[\Big(\frac{m^d\,\big|\sum_{i=1}^n \big(\theta(X_i) - \bar\theta_{nm}(X_i)\big)\big(Y_i - \theta(X_i)\big)\big|}{n^2 \cdot R(m \mid X_n)}\Big)^{4\nu}\,\Big|\, X_n\Big] \le C_\nu''\, M_0\, \frac{m^{4d\nu}}{n^{6\nu}\, R^{2\nu}(m \mid X_n)}.$$

It now follows from (2), Lemma 2, and Chebyshev's inequality that the last term converges to zero in probability. This completes the proof of Lemma 5. Q.E.D.

Therefore the third term on the right-hand side of (8) is also negligible relative to $n \cdot R(m \mid X_n)$ uniformly in $\mathcal{M}_n$. The last two terms are not, but the behavior of $\hat m_n$ depends only on the differences of $\mathrm{FPE}_n(m)$.

Set

$$\Delta_n(m) = T_n(m) - T_n(m_n^*) = -2\sum_{i=1}^n \big(\bar\theta_{nm}(X_i) - \bar\theta_{nm_n^*}(X_i)\big)\big(Y_i - \theta(X_i)\big).$$

LEMMA 6. Under Conditions 1, 2, and 3,

$$\lim_{n\to\infty} P\Big(\max_{m \in \mathcal{M}_n} \frac{|\Delta_n(m)|}{n \cdot R(m \mid X_n)} \ge \delta\Big) = 0 \qquad \text{for all } \delta > 0.$$

That is, $\Delta_n(m)$ is negligible relative to $n \cdot R(m \mid X_n)$ uniformly in $\mathcal{M}_n$.

Proof. Note that by the definition of $m_n^*$ and (2),

$$4n \cdot R(m \mid X_n) \ge 2\sum_{i=1}^n \big(\bar\theta_{nm}(X_i) - \theta(X_i)\big)^2 + 2\sum_{i=1}^n \big(\bar\theta_{nm_n^*}(X_i) - \theta(X_i)\big)^2 \ge \sum_{i=1}^n \big(\bar\theta_{nm}(X_i) - \bar\theta_{nm_n^*}(X_i)\big)^2.$$

For any $\delta > 0$, the inequality $|\Delta_n(m)|/(n \cdot R(m \mid X_n)) \ge \delta$ therefore implies

$$\frac{\big|\sum_{i=1}^n \big(Y_i - \theta(X_i)\big)\big(\bar\theta_{nm}(X_i) - \bar\theta_{nm_n^*}(X_i)\big)\big|}{\big(n \cdot R(m \mid X_n)\big)^{1/2}\Big(\sum_{i=1}^n \big(\bar\theta_{nm}(X_i) - \bar\theta_{nm_n^*}(X_i)\big)^2\Big)^{1/2}} \ge \frac{\delta}{4}.$$


Let

$$U_{nm}(X_i) = \frac{\bar\theta_{nm}(X_i) - \bar\theta_{nm_n^*}(X_i)}{\big[\sum_{j=1}^n \big(\bar\theta_{nm}(X_j) - \bar\theta_{nm_n^*}(X_j)\big)^2\big]^{1/2}}.$$

Then

$$\sum_{i=1}^n U_{nm}^2(X_i) = 1.$$

Thus by Condition 2 and Theorem 2 of Whittle [31], there is a constant $a_\nu' > 0$ such that

$$E\Big[\Big(\sum_{i=1}^n U_{nm}(X_i)\big(Y_i - \theta(X_i)\big)\Big)^{4\nu}\,\Big|\, X_n\Big] \le a_\nu' \qquad \text{for all } m \in \mathcal{M}_n \text{ and } n \ge 1.$$

Consequently,

$$P\Big(\max_{m \in \mathcal{M}_n} \frac{|\Delta_n(m)|}{n \cdot R(m \mid X_n)} \ge \delta \,\Big|\, X_n\Big) \le \sum_{m=1}^{M_n} \frac{4^{4\nu}\, E\big[\big(\sum_{i=1}^n U_{nm}(X_i)(Y_i - \theta(X_i))\big)^{4\nu} \mid X_n\big]}{\delta^{4\nu}\,\big[n \cdot R(m \mid X_n)\big]^{2\nu}}.$$

By (2) and Lemma 2, the last sum converges to zero except on an event whose probability tends to zero. This completes the proof of Lemma 6. Q.E.D.

Now Lemma 4 can be proven as follows. By (8) and the triangle inequality,

$$n \cdot \big|\big(\mathrm{FPE}_n(m) - R(m \mid X_n)\big) - \big(\mathrm{FPE}_n(m_n^*) - R(m_n^* \mid X_n)\big)\big|$$

$$= \Big|-\sum_{l=1}^{m^d} Z_{nml} + \sum_{l=1}^{(m_n^*)^d} Z_{nm_n^*l} + S_n(m) - S_n(m_n^*) + \Delta_n(m)\Big|$$

$$\le \Big|\sum_{l=1}^{m^d} Z_{nml}\Big| + \Big|\sum_{l=1}^{(m_n^*)^d} Z_{nm_n^*l}\Big| + |S_n(m)| + |S_n(m_n^*)| + |\Delta_n(m)|.$$


Thus

$$\max_{m \in \mathcal{M}_n} \frac{\big|\big(\mathrm{FPE}_n(m) - \mathrm{FPE}_n(m_n^*)\big) - \big(R(m \mid X_n) - R(m_n^* \mid X_n)\big)\big|}{R(m \mid X_n)}$$

$$\le \max_{m \in \mathcal{M}_n} \frac{\big|\sum_l Z_{nml}\big| + \big|\sum_l Z_{nm_n^*l}\big| + |S_n(m)| + |S_n(m_n^*)| + |\Delta_n(m)|}{n \cdot R(m \mid X_n)}$$

$$\le 2\max_{m \in \mathcal{M}_n} \frac{\big|\sum_l Z_{nml}\big|}{n \cdot R(m \mid X_n)} + 2\max_{m \in \mathcal{M}_n} \frac{|S_n(m)|}{n \cdot R(m \mid X_n)} + \max_{m \in \mathcal{M}_n} \frac{|\Delta_n(m)|}{n \cdot R(m \mid X_n)}.$$

Hence according to (9) and Lemmas 5 and 6,

$$\max_{m \in \mathcal{M}_n} \frac{\big|\big(\mathrm{FPE}_n(m) - \mathrm{FPE}_n(m_n^*)\big) - \big(R(m \mid X_n) - R(m_n^* \mid X_n)\big)\big|}{R(m \mid X_n)}$$

converges to zero in probability. Consequently,

$$\frac{\big(\mathrm{FPE}_n(\hat m_n) - \mathrm{FPE}_n(m_n^*)\big) - \sum_{m \in \mathcal{M}_n} I_m(\hat m_n)\big(R(m \mid X_n) - R(m_n^* \mid X_n)\big)}{\sum_{m \in \mathcal{M}_n} I_m(\hat m_n)\, R(m \mid X_n)}$$

converges to zero in probability, where $I_m$ denotes the indicator function of $\{m\}$.

Now it is clear from the definitions of $\hat m_n$ and $m_n^*$ that

$$\mathrm{FPE}_n(\hat m_n) - \mathrm{FPE}_n(m_n^*) \le 0$$

and

$$\sum_{m \in \mathcal{M}_n} I_m(\hat m_n)\, R(m \mid X_n) - R(m_n^* \mid X_n) \ge 0.$$

Consequently,

$$\frac{\sum_{m \in \mathcal{M}_n} I_m(\hat m_n)\, R(m \mid X_n) - R(m_n^* \mid X_n)}{\sum_{m \in \mathcal{M}_n} I_m(\hat m_n)\, R(m \mid X_n)}$$

converges to zero in probability, and hence

$$\frac{\sum_{m \in \mathcal{M}_n} I_m(\hat m_n)\, R(m \mid X_n)}{R(m_n^* \mid X_n)} \qquad (10)$$

converges to one in probability. Since

$$n \cdot L(\hat\theta_{nm}, \theta) = \sum_{l=1}^{m^d} Z_{nml} + n \cdot R(m \mid X_n),$$


it follows from (9) that

$$\frac{L(\hat\theta_{n\hat m_n}, \theta)}{\sum_{m \in \mathcal{M}_n} I_m(\hat m_n)\, R(m \mid X_n)} \qquad (11)$$

converges to one in probability. Now

$$\frac{L(\hat\theta_{n\hat m_n}, \theta)}{R(m_n^* \mid X_n)} = \frac{L(\hat\theta_{n\hat m_n}, \theta)}{\sum_{m \in \mathcal{M}_n} I_m(\hat m_n)\, R(m \mid X_n)} \cdot \frac{\sum_{m \in \mathcal{M}_n} I_m(\hat m_n)\, R(m \mid X_n)}{R(m_n^* \mid X_n)},$$

so it follows from (10) and (11) that $L(\hat\theta_{n\hat m_n}, \theta)/R(m_n^* \mid X_n)$ converges to one in probability. This completes the proof of Lemma 4 and hence that of Theorem 1. Q.E.D.

ACKNOWLEDGMENTS

The author is grateful to Professor Charles J. Stone for his enthusiastic guidance and helpful criticism during all phases of this research. Thanks are also due the referee for improvements in style and presentation.

REFERENCES

[l] AKAIKE, H. (1970). Statistical predictor identification. Ann. Inst. Statist. Math. 22 203-217.

[2] AKAIKE, H. (1973). Information theory and an extension of the maximum likelihood principle. In 2nd International Symposium on Information Theory (B. N. Petrov and F. Csaki, Eds.), pp. 267-281. Akademiai Kiado, Budapest.

[3] AKAIKE, H. (1977). On entropy maximization principle. In Applications of Statistics (P. R. Krishnaiah, Ed.), pp. 27-41. North-Holland, Amsterdam.

[4] AKAIKE, H. (1978). A Bayesian analysis of the minimum AIC procedure. Ann. Inst. Statist. Math. 30 9-14.

[5] BREIMAN, L., AND FREEDMAN, D. A. (1981). How Many Variables Should Be Entered in a Regression Equation? Technical Report, Department of Statistics, University of California, Berkeley.

[6] CHEN, K. W. (1983). Asymptotically Optimal Selection of a Piecewise Polynomial Estimator of a Regression Function. Ph.D. dissertation, Department of Statistics, University of California, Berkeley.

[7] COLLOMB, G. (1981). Estimation non paramétrique de la régression: revue bibliographique. Internat. Statist. Rev. 49 71-93.

[8] CRAVEN, P., AND WAHBA, G. (1979). Smoothing noisy data with spline functions. Numer. Math. 24 375-382.

[9] DANIEL, C., AND WOOD, F. S. (1971). Fitting Equations to Data. Wiley-Interscience, New York.

[10] GORDON, L., AND OLSHEN, R. A. (1980). Consistent nonparametric regression from recursive partitioning schemes. J. Multivariate Anal. 10 611-627.

[11] HÄRDLE, W., AND MARRON, J. S. (1984). Optimal bandwidth selection in nonparametric regression function estimation. Ann. Statist., in press.


[12] HÄRDLE, W., AND MARRON, J. S. (1985). Asymptotic nonequivalence of some bandwidth selectors in nonparametric regression. Biometrika, in press.

[13] HOCKING, R. R. (1976). The analysis and selection of variables in linear regression. Biometrics 32 1-49.

[14] LERMAN, P. M. (1980). Fitting segmented regression models by grid search. Appl. Statist. 29 77-84.

[15] MAJOR, P. (1973). On a nonparametric estimation of the regression function. Studia Sci. Math. Hungar. 8 347-361.

[16] MALLOWS, C. L. (1973). Some comments on $C_p$. Technometrics 15 661-675.

[17] MARCINKIEWICZ, J., AND ZYGMUND, A. (1937). Sur les fonctions indépendantes. Fund. Math. 29 60-90.

[18] MCGEE, V. E., AND CARLETON, W. T. (1970). Piecewise regression. J. Amer. Statist. Assoc. 65 1109-1124.

[19] PETROV, V. V. (1975). Sums of Independent Random Variables. Springer-Verlag, New York.

[20] SEBER, G. A. F. (1977). Linear Regression Analysis. Wiley, New York.

[21] SHIBATA, R. (1981). An optimal selection of regression variables. Biometrika 68 45-54.

[22] SHIBATA, R. (1982). Asymptotic Mean Efficiency of a Selection of Regression Variables. Technical Report, Department of Mathematics, Tokyo Institute of Technology.

[23] SPECKMAN, P. (1982). Spline Smoothing and Optimal Rates of Convergence in Nonparametric Regression Models. Technical Report, Mathematics Department, University of Oregon, Eugene.

[24] STONE, C. J. (1980). Optimal rates of convergence for nonparametric estimators. Ann. Statist. 8 1348-1360.

[25] STONE, C. J. (1982). Optimal global rates of convergence for nonparametric regression. Ann. Statist. 10 1040-1053.

[26] STONE, C. J. (1983). Rates of Convergence for Additive Regression. Technical Report, Department of Statistics, University of California, Berkeley.

[27] STONE, C. J. (1984). An asymptotically optimal window selection rule for kernel density estimates. Ann. Statist. 12 1285-1297.

[28] STONE, M. (1974). Cross-validatory choice and assessment of statistical predictions. J. Roy. Statist. Soc. Ser. B 36 111-147.

[29] THOMPSON, M. L. (1978). Selection of variables in multiple regression. I. A review and evaluation. Internat. Statist. Rev. 46 1-19. II. Chosen procedures, computations and examples. Ibid. 46 129-146.

[30] TISHLER, A., AND ZANG, I. (1981). A maximum likelihood method for piecewise regression models with a continuous dependent variable. Appl. Statist. 30 116-124.

[31] WHITTLE, P. (1960). Bounds for the moments of linear and quadratic forms in independent variables. Theoret. Probab. Appl. 3 302-305.

