Rohit Patra, University of Florida. users.stat.ufl.edu/~rohitpatra/PapersandDraft/cvxsimslides.pdf


Efficient Estimation in Convex Single Index Models¹

Rohit Patra, University of Florida

http://arxiv.org/abs/1708.00145

¹Joint work with Arun K. Kuchibhotla (UPenn) and Bodhisattva Sen (Columbia)


Overview

1 Introduction

2 Estimation

3 Asymptotics

4 Simulation study


Introduction


A semiparametric model

Convex single index model

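$$Y = m_0(\theta_0^\top X) + \varepsilon, \qquad \mathbb{E}(\varepsilon \mid X) = 0.$$

$(Y, X) \in \mathbb{R} \times \mathbb{R}^d \sim P_{\theta_0, m_0}$.

$\theta_0 \in \mathbb{R}^d$ is the unknown coefficient vector.

$m_0 : \mathbb{R} \to \mathbb{R}$ is an unknown convex link function with no parametric restriction.

Offers a balance between the flexibility of nonparametric models and the interpretability of parametric models.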


Goal and Applicability

Model

$$Y = m_0(\theta_0^\top X) + \varepsilon, \qquad \mathbb{E}(\varepsilon \mid X) = 0.$$

Problem

Estimate $\theta_0$ and $m_0$ simultaneously from i.i.d. data $\{(x_i, y_i)\}_{i=1}^n$ generated by the above model.

Why convex link

Convex/concave SIMs are widely used in economics, operations research, and financial engineering, among other fields.

Production functions and utility functions are typically concave, and call option prices are convex in the strike price.


Restrictions

Identifiability: if $m_1(t) = m_0(-t/2)$ and $\theta_1 = -2\theta_0$, then $m_0(\theta_0^\top x) = m_1(\theta_1^\top x)$. Thus we need to assume

$$\theta_0 \in \Theta := \{\beta \in \mathbb{R}^d : |\beta| = 1,\ \beta_1 > 0\}.$$
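To make the identifiability argument concrete, here is a minimal Python sketch that maps any index vector $\beta$ (with $\beta_1 \neq 0$) and link $m$ to an equivalent pair whose index vector lies in $\Theta$: it rescales by $\pm|\beta|$ so that $|\theta| = 1$ and $\theta_1 > 0$, and adjusts the link so that the regression function $m(\beta^\top x)$ is unchanged. The function names `canonicalize` and `dot` are hypothetical illustrations, not part of simest.

```python
import math
from typing import Callable, List, Tuple


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def canonicalize(
    beta: List[float], m: Callable[[float], float]
) -> Tuple[List[float], Callable[[float], float]]:
    """Return (theta, g) with |theta| = 1, theta[0] > 0 and g(theta^T x) == m(beta^T x)."""
    norm = math.sqrt(dot(beta, beta))
    if norm == 0 or beta[0] == 0:
        raise ValueError("need a nonzero beta with beta[0] != 0")
    # Choose the scale c (of either sign) so that theta = beta / c has theta[0] > 0.
    c = norm if beta[0] > 0 else -norm
    theta = [b / c for b in beta]
    # Since beta^T x = c * theta^T x, the link g(t) := m(c * t) gives the same fit.
    return theta, (lambda t: m(c * t))
```

Applied to the slide's pair $m_1(t) = m_0(-t/2)$, $\theta_1 = -2\theta_0$, the sketch recovers $(\theta_0, m_0)$ (up to rounding), which is exactly why the constraint $|\beta| = 1, \beta_1 > 0$ pins down the parametrization.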

We need some “regularity” assumptions on the class of link functions:

Only shape constraints

Shape and smoothness constraints

Relevant works on shape-constrained single index models include Murphy et al. (1999), Chen and Samworth (2015), Groeneboom and Hendrickx (2016), and Balabdaoui et al. (2016).


[Figure, two panels. Left: simulated data from $Y = 2(X^\top\theta_0)^2 + N(0, 5)$ with $X \sim U[0, 1]^3$ and $m(t) = 2t^2$; $y$ is plotted against the index together with the convex LSE, smoothing-spline, and true link functions. Right: Labour Data for Belgian Firms (1996), where $Y$ is the output of 555 Belgian firms and $X$ = (labour, capital, wage); $\log(\text{output})$ is plotted against the index together with the spline, concave LSE, and Kong and Xia (2007) estimates.]

Figure: Estimated link functions: stability due to the convex constraint. Index $:= \hat\theta^\top x$.


Estimation


Estimation in Convex SIM [Kuchibhotla, Patra, Sen, 2017]

Lipschitz LSE

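$$(\hat m, \hat\theta) = \operatorname*{argmin}_{m \in \mathcal{C}_L,\ \theta \in \Theta} \frac{1}{n}\sum_{i=1}^n \{y_i - m(\theta^\top x_i)\}^2,$$

where

$$\mathcal{C}_L = \{m \mid m \text{ is convex and } |m(t_1) - m(t_2)| \le L|t_1 - t_2| \ \ \forall\, t_1, t_2 \in \mathbb{R}\}.$$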
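As an illustration of the objective and of the constraint set $\mathcal{C}_L$, the Python sketch below assumes the link is stored, as is natural for a piecewise-affine LLSE, by its values at sorted knots, interpolated linearly between knots and extended linearly beyond the end knots. Such a function lies in $\mathcal{C}_L$ exactly when its successive slopes are nondecreasing and all lie in $[-L, L]$. All helper names are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence


def slopes(knots: Sequence[float], values: Sequence[float]) -> List[float]:
    return [(values[k + 1] - values[k]) / (knots[k + 1] - knots[k]) for k in range(len(knots) - 1)]


def in_C_L(knots, values, L: float, tol: float = 1e-12) -> bool:
    """Piecewise-affine link is in C_L iff slopes are nondecreasing and bounded by L."""
    s = slopes(knots, values)
    convex = all(s[k] <= s[k + 1] + tol for k in range(len(s) - 1))
    lipschitz = all(abs(v) <= L + tol for v in s)
    return convex and lipschitz


def eval_link(t: float, knots, values) -> float:
    # Index of the affine piece containing t; the end pieces are extended linearly.
    k = min(max(bisect_right(knots, t) - 1, 0), len(knots) - 2)
    slope = (values[k + 1] - values[k]) / (knots[k + 1] - knots[k])
    return values[k] + slope * (t - knots[k])


def Q_n(theta, xs, ys, knots, values) -> float:
    """Empirical least squares loss (1/n) sum_i {y_i - m(theta^T x_i)}^2."""
    total = 0.0
    for x, y in zip(xs, ys):
        index = sum(a * b for a, b in zip(theta, x))
        total += (y - eval_link(index, knots, values)) ** 2
    return total / len(ys)
```

For example, $m(t) = |t|$ on knots $(-1, 0, 1)$ belongs to $\mathcal{C}_1$ but not to $\mathcal{C}_{1/2}$.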

Penalized LSE

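$$(\check m, \check\theta) := \operatorname*{argmin}_{(m,\theta) \in \mathcal{R} \times \Theta} \frac{1}{n}\sum_{i=1}^n \{y_i - m(\theta^\top x_i)\}^2 + \check\lambda_n^2 \int \{m''(t)\}^2\,dt,$$

where $\mathcal{R}$ denotes the class of all convex functions that have an absolutely continuous first derivative.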
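To see how the two terms of the penalized criterion interact, the sketch below approximates the roughness penalty $\int \{m''(t)\}^2\,dt$ over an interval $[a, b]$ by second differences on an equally spaced grid, and adds $\check\lambda_n^2$ times it to the empirical squared-error loss. It assumes the link is given as a Python callable and the index values $\theta^\top x_i$ are precomputed; `roughness` and `penalized_loss` are hypothetical helpers, not simest functions, and the finite-difference approximation is a stand-in for whatever exact computation an implementation would use.

```python
from typing import Callable, Sequence


def roughness(m: Callable[[float], float], a: float, b: float, num: int = 400) -> float:
    """Approximate int_a^b m''(t)^2 dt using second differences on a uniform grid."""
    h = (b - a) / num
    vals = [m(a + k * h) for k in range(num + 1)]
    total = 0.0
    for k in range(1, num):
        second = (vals[k + 1] - 2 * vals[k] + vals[k - 1]) / (h * h)
        total += second * second * h  # Riemann sum over interior grid points
    return total


def penalized_loss(indices: Sequence[float], ys: Sequence[float],
                   m: Callable[[float], float], lam: float, a: float, b: float) -> float:
    """(1/n) sum_i {y_i - m(t_i)}^2 + lam^2 * int_a^b m''(t)^2 dt, with t_i = theta^T x_i."""
    sse = sum((y - m(t)) ** 2 for t, y in zip(indices, ys)) / len(ys)
    return sse + lam ** 2 * roughness(m, a, b)
```

For $m(t) = t^2$ on $[0, 1]$ the penalty is close to $\int_0^1 2^2\,dt = 4$, while any affine link has zero penalty, so the penalty only charges curvature.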


Computation: Alternating Scheme

Recall:

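$(\hat m, \hat\theta) = \operatorname*{argmin}_{m \in \mathcal{C}_L,\, \theta \in \Theta} Q_n(m, \theta)$, where

$$Q_n(m, \theta) := \frac{1}{n}\sum_{i=1}^n \{y_i - m(\theta^\top x_i)\}^2.$$

Minimization for fixed θ

For a fixed $\theta$, define

$$\hat m_\theta := \operatorname*{argmin}_{m \in \mathcal{C}_L} Q_n(m, \theta).$$

The minimization is a convex optimization problem.

$\hat m_\theta$ can be computed efficiently using the nnls package in R.

The penalized analogue $\check m_\theta$ can be computed via a damped Newton-type algorithm. Our R package simest has an implementation of this computation.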
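The slides compute $\hat m_\theta$ with the R package nnls. Purely as an illustration, the NumPy sketch below solves the same fixed-$\theta$ problem with a different, swapped-in technique: projected gradient descent over the intercept and the slopes of a piecewise-affine fit with knots at the sorted index values $t_i = \theta^\top x_i$. A fit lies in $\mathcal{C}_L$ exactly when its slopes are nondecreasing and lie in $[-L, L]$, and the Euclidean projection onto that set is isotonic regression (pool-adjacent-violators) followed by clipping to $[-L, L]$. This is a sketch assuming distinct index values, not simest's algorithm; the names `pava` and `fit_convex_lipschitz` are hypothetical.

```python
import numpy as np


def pava(v):
    """Nondecreasing least-squares fit to v (pool-adjacent-violators, unit weights)."""
    means, sizes = [], []
    for x in v:
        means.append(float(x))
        sizes.append(1)
        # Merge the last two blocks while they violate monotonicity.
        while len(means) > 1 and means[-2] > means[-1]:
            m2, s2 = means.pop(), sizes.pop()
            m1, s1 = means.pop(), sizes.pop()
            means.append((m1 * s1 + m2 * s2) / (s1 + s2))
            sizes.append(s1 + s2)
    return np.repeat(means, sizes)


def fit_convex_lipschitz(t, y, L, n_iter=5000):
    """Fit a convex, L-Lipschitz piecewise-affine link to (t_i, y_i), t_i = theta^T x_i.

    Returns the sorted index values and the fitted values there.
    Assumes the t_i are distinct.
    """
    order = np.argsort(t)
    t = np.asarray(t, dtype=float)[order]
    y = np.asarray(y, dtype=float)[order]
    n = len(t)
    gaps = np.diff(t)
    # J maps beta = (a, s_1, ..., s_{n-1}) to fitted values f_i = a + sum_{k<i} s_k * gap_k.
    J = np.zeros((n, n))
    J[:, 0] = 1.0
    for i in range(1, n):
        J[i, 1:i + 1] = gaps[:i]
    beta = np.zeros(n)
    beta[0] = y.mean()  # feasible start: constant fit with zero slopes
    # Step 1/Lip, where Lip = (2/n) * ||J||_2^2 is the Lipschitz constant of the gradient.
    step = n / (2.0 * np.linalg.norm(J, 2) ** 2)
    for _ in range(n_iter):
        grad = -(2.0 / n) * (J.T @ (y - J @ beta))
        beta = beta - step * grad
        # Projection: slopes nondecreasing (convexity) and within [-L, L] (Lipschitz).
        beta[1:] = np.clip(pava(beta[1:]), -L, L)
    return t, J @ beta
```

Here `t` would hold the current index values $\theta^\top x_i$. Setting `L=np.inf` turns the same sketch into a convex fit with no slope bound. Projected gradient was chosen only because its projection step is short to write; an active-set NNLS solver will typically converge much faster.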


Computation contd.

Minimization for fixed θ

We can now define the profiled loss $Q_n : \Theta \to \mathbb{R}$,

$$Q_n(\theta) := Q_n(\hat m_\theta, \theta).$$

Minimization over θ

To find $\hat\theta$, we now minimize $Q_n(\theta)$ over $\theta \in \Theta$.

The loss function is not convex.

Simulations suggest a large domain of attraction for moderate $d$.

We use a gradient step on the unit sphere based on the right derivative of $\hat m_\theta$.

Some initial work suggests that the convergence of this alternating scheme is linear, i.e., $|\theta^{(k+1)} - \hat\theta| \le (1 - \rho)\,|\theta^{(k)} - \hat\theta|$ for some $\rho \in (0, 1)$.
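One simple way to take a gradient step that stays in $\Theta$ is to drop the radial component of the gradient, step along the tangent direction, renormalize, and flip the sign if needed so that $\theta_1 > 0$. The flip is harmless here because $t \mapsto m(-t)$ is convex and $L$-Lipschitz whenever $m$ is, so the profiled loss satisfies $Q_n(\theta) = Q_n(-\theta)$. The sketch assumes a gradient vector of the profiled loss is already available; the name `sphere_step` and the fixed step size `eta` are hypothetical choices, not those of simest.

```python
import math
from typing import List


def sphere_step(theta: List[float], grad: List[float], eta: float) -> List[float]:
    """One projected-gradient step on the unit sphere, mapped back into Theta."""
    inner = sum(a * g for a, g in zip(theta, grad))
    # Tangential part of the gradient: (I - theta theta^T) grad.
    tangent = [g - inner * a for a, g in zip(theta, grad)]
    moved = [a - eta * v for a, v in zip(theta, tangent)]
    norm = math.sqrt(sum(v * v for v in moved))
    new = [v / norm for v in moved]
    # Restore theta_1 > 0; the profiled loss is invariant under theta -> -theta.
    if new[0] < 0:
        new = [-v for v in new]
    return new
```

A gradient pointing along $\theta$ itself leaves $\theta$ unchanged, since moving radially cannot change a point on the sphere after renormalization.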


Asymptotics


Theoretical properties of LLSE

Rates of convergence of the estimators [Kuchibhotla, Patra, Sen, 2017]

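Under some regularity conditions on $m_0$ and the distribution of $X$ (bounded support), sub-Gaussian errors, and $L \ge L_0$, where $L_0$ denotes the Lipschitz constant of $m_0$, the Lipschitz LSE satisfies

$$\|\hat m \circ \hat\theta - m_0 \circ \theta_0\| = O_p(n^{-2/5}), \quad \text{[Estimation error]}$$

$$\|\hat m \circ \theta_0 - m_0 \circ \theta_0\| = O_p(n^{-2/5}), \quad \text{[Estimation error of } \hat m]$$

$$\|\hat m' \circ \theta_0 - m_0' \circ \theta_0\| = O_p(n^{-2/15}), \quad \text{[Estimation error of } \hat m']$$

$$|\hat\theta - \theta_0| = O_p(n^{-2/5}). \quad \text{[Estimation error of } \hat\theta]$$

For a convex Lipschitz function $g : \mathbb{R} \to \mathbb{R}$, let $g'$ denote the right derivative of $g$, which satisfies $g(b) = g(a) + \int_a^b g'(t)\,dt$.

Here, for any $f : \mathbb{R} \to \mathbb{R}$ and $\theta \in \mathbb{R}^d$, we define $\|f \circ \theta\|^2 := \int |f(\theta^\top x)|^2\,dP_X(x)$.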
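For the piecewise-affine links produced by the LLSE, the right derivative at $t$ is just the slope of the affine piece starting at or to the right of $t$, and the identity $g(b) = g(a) + \int_a^b g'(t)\,dt$ can be checked exactly by summing slope times length over the pieces. The norm $\|f \circ \theta\|$ involves the unknown $P_X$; the sketch assumes the natural empirical stand-in that replaces $P_X$ by the empirical distribution of a sample $x_1, \dots, x_n$. All helper names are hypothetical.

```python
import math
from bisect import bisect_right


def right_derivative(t, knots, values):
    """Slope of the piece [knots[k], knots[k+1]) containing t (end pieces extended)."""
    k = min(max(bisect_right(knots, t) - 1, 0), len(knots) - 2)
    return (values[k + 1] - values[k]) / (knots[k + 1] - knots[k])


def integrate_right_derivative(a, b, knots, values):
    """Exact integral of g' over [a, b] (a <= b) for a piecewise-affine g."""
    points = [a] + [k for k in knots if a < k < b] + [b]
    # g' is constant on each [lo, hi) because no knot lies strictly inside it.
    return sum(right_derivative(lo, knots, values) * (hi - lo)
               for lo, hi in zip(points, points[1:]))


def empirical_norm(f, theta, xs):
    """||f o theta|| with P_X replaced by the empirical measure of xs."""
    vals = [f(sum(a * b for a, b in zip(theta, x))) for x in xs]
    return math.sqrt(sum(v * v for v in vals) / len(vals))
```

For $g(t) = |t|$ (knots $-1, 0, 1$, extended linearly), `right_derivative(0.0, ...)` returns $1$, the right derivative at the kink, and integrating $g'$ over $[-2, 3]$ returns $g(3) - g(-2) = 1$.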


Asymptotic normality: Homoscedastic model

Recall:

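$$(\hat m, \hat\theta) = \operatorname*{argmin}_{(m,\theta) \in \mathcal{C}_L \times \Theta} \frac{1}{n}\sum_{i=1}^n \{y_i - m(\theta^\top x_i)\}^2.$$

Semiparametric efficiency of θ̂ [Kuchibhotla, Patra, and Sen, 2017]

Assume that $\mathbb{E}(\varepsilon^2 \mid X) \equiv \sigma^2$. Let $\ell_{\theta,m} : \mathbb{R} \times \mathbb{R}^d \to \mathbb{R}^{d-1}$ be the efficient score, and define the efficient information matrix

$$I_{\theta_0,m_0} := \mathbb{E}\left(\ell_{\theta_0,m_0}\,\ell_{\theta_0,m_0}^\top\right) \in \mathbb{R}^{(d-1)\times(d-1)}.$$

If $m_0$ is twice differentiable, then under some regularity conditions we can conclude

$$\sqrt{n}(\hat\theta - \theta_0) \xrightarrow{d} N\left(0,\ H_{\theta_0} I_{\theta_0,m_0}^{-1} H_{\theta_0}^\top\right).$$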


Inference

Our result readily yields confidence sets for θ0.

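We can use the following plug-in estimator of the covariance matrix:

$$\hat\Sigma := \hat\sigma^4 H_{\hat\theta}\left[P_{\hat\theta,\hat m}\left\{\ell_{\hat\theta,\hat m}(Y,X)\,\ell_{\hat\theta,\hat m}^\top(Y,X)\right\}\right]^{-1} H_{\hat\theta}^\top,$$

where

$$\hat\sigma^2 := \frac{1}{n}\sum_{i=1}^n \big[y_i - \hat m(\hat\theta^\top x_i)\big]^2,$$

$$\ell_{\theta,m}(y, x) := \big(y - m(\theta^\top x)\big)\, m'(\theta^\top x)\, H_\theta^\top\{x - h_\theta(\theta^\top x)\}.$$

Asymptotic $1 - 2\alpha$ confidence interval for $\theta_{0,i}$:

$$\left[\hat\theta_i - \frac{z_\alpha}{\sqrt{n}}\big(\hat\Sigma_{i,i}\big)^{1/2},\ \ \hat\theta_i + \frac{z_\alpha}{\sqrt{n}}\big(\hat\Sigma_{i,i}\big)^{1/2}\right],$$

where $z_\alpha$ is the upper $\alpha$ quantile of the standard normal distribution.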
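Once $\hat\theta$ and $\hat\Sigma$ are available, the interval is a direct computation. The sketch below uses Python's `statistics.NormalDist` to obtain $z_\alpha$ as the upper $\alpha$ quantile of $N(0, 1)$ and also evaluates $\hat\sigma^2$ from residuals; the inputs in the usage line are made-up numbers for illustration, not results from the slides, and computing $\hat\Sigma$ itself (which needs $H_\theta$ and $h_\theta$) is left out.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def sigma2_hat(residuals: Sequence[float]) -> float:
    """sigma_hat^2 = (1/n) sum_i [y_i - m_hat(theta_hat^T x_i)]^2."""
    return sum(r * r for r in residuals) / len(residuals)


def wald_interval(theta_hat_i: float, sigma_ii: float, n: int,
                  alpha: float = 0.025) -> Tuple[float, float]:
    """Asymptotic 1 - 2*alpha interval: theta_hat_i +/- z_alpha * sqrt(Sigma_ii / n)."""
    z = NormalDist().inv_cdf(1 - alpha)  # upper-alpha quantile of N(0, 1)
    half = z * math.sqrt(sigma_ii) / math.sqrt(n)
    return theta_hat_i - half, theta_hat_i + half


lo, hi = wald_interval(0.5, 4.0, 100)  # nominal 95% interval, half-width about 0.392
```

With $\alpha = 0.025$ this gives a nominal 95% interval, matching the intervals studied in the simulation section.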


Asymptotic properties of the PLSE

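If $\check\lambda_n^{-1} = O_p(n^{2/5})$ and $\check\lambda_n = o_p(n^{-1/4})$, then under similar regularity assumptions the PLSE satisfies

$$\|\check m \circ \check\theta - m_0 \circ \theta_0\| = O_p(\check\lambda_n), \quad \text{[Estimation error]}$$

$$\|\check m \circ \theta_0 - m_0 \circ \theta_0\| = O_p(\check\lambda_n), \quad \text{[Estimation error of } \check m]$$

$$\|\check m' \circ \theta_0 - m_0' \circ \theta_0\| = O_p(\check\lambda_n^{1/2}), \quad \text{[Estimation error of } \check m']$$

and

$$\sqrt{n}(\check\theta - \theta_0) \xrightarrow{d} N\left(0,\ H_{\theta_0} I_{\theta_0,m_0}^{-1} H_{\theta_0}^\top\right).$$

An example choice is $\check\lambda_n := C n^{-2/5}$ for a constant $C > 0$.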
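A quick check that the example choice $\check\lambda_n = C n^{-2/5}$ (with $C > 0$ fixed) meets both rate conditions, and what it implies for the derivative rate:

$$
\begin{aligned}
\check\lambda_n^{-1} &= C^{-1} n^{2/5} = O(n^{2/5}),\\
n^{1/4}\,\check\lambda_n &= C\, n^{1/4 - 2/5} = C\, n^{-3/20} \to 0, \quad\text{so } \check\lambda_n = o(n^{-1/4}),\\
\check\lambda_n^{1/2} &= C^{1/2}\, n^{-1/5}.
\end{aligned}
$$

With this choice the first two PLSE rates read $O_p(n^{-2/5})$, and the rate for $\check m'$ reads $O_p(n^{-1/5})$.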

The proof borrows ideas from empirical process theory; see, e.g., Mammen and van de Geer (1997) and van de Geer (2000).


Difficulty in proving efficiency

The LLSE m̂ is a piecewise affine function and lies on the boundary of CL.

Nuisance tangent space

Consider the following model:

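$$Y = m(\theta^\top X) + \varepsilon, \quad \text{where } m \in \mathcal{C}_L \text{ and } \theta \in \Theta.$$

A linear perturbation/submodel around $m$:

$$m_{s,a}(t) = m(t) - s\,a(t), \quad \text{where } s \in \mathbb{R}.$$

The score for the single index model along this submodel is proportional to $a(\cdot)$.

$$\operatorname{lin}\{a : D \to \mathbb{R} \mid m_{s,a} \in \mathcal{R} \text{ for small enough } s\} \subseteq L_2(\Lambda).$$

The set inclusion is strict when $m$ is not strongly convex.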
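A small numerical illustration of this last point, using an example chosen here rather than taken from the slides: take the direction $a(t) = \cos t$ on $[-3, 3]$. For the strongly convex $m(t) = t^2$, $(m - s\,a)'' = 2 + s\cos t \ge 0$ whenever $|s| \le 2$, so both signs of $s$ are admissible. For the piecewise affine $m(t) = |t|$, $(m - s\,a)'' = s\cos t$ on each affine piece, which is negative somewhere for every $s \neq 0$, so this direction is lost. The sketch checks convexity through nonnegative second differences on a grid, a discrete stand-in assumed for illustration.

```python
import math
from typing import Callable


def is_convex_on_grid(f: Callable[[float], float], a: float = -3.0, b: float = 3.0,
                      num: int = 600, tol: float = 1e-12) -> bool:
    """Discrete convexity check: all second differences on a uniform grid are >= -tol."""
    h = (b - a) / num
    vals = [f(a + k * h) for k in range(num + 1)]
    return all(vals[k + 1] - 2 * vals[k] + vals[k - 1] >= -tol for k in range(1, num))


def perturbed(m: Callable[[float], float], s: float) -> Callable[[float], float]:
    """The submodel m_{s,a}(t) = m(t) - s * a(t) with the direction a(t) = cos(t)."""
    return lambda t: m(t) - s * math.cos(t)
```

Perturbing $t^2$ with $s = \pm 0.1$ keeps convexity, while perturbing $|t|$ with either sign breaks it, mirroring why the tangent set shrinks at a piecewise affine $m$.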


Efficient score

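Let $S_\theta$ denote the parametric score of the model and $\Lambda_m$ the nuisance tangent space.

When $m$ is strongly convex, the efficient score is known to be

$$\Pi(S_\theta \mid \Lambda_m^\perp) := \ell_{\theta,m}(y, x) = \big(y - m(\theta^\top x)\big)\, m'(\theta^\top x)\, H_\theta^\top\{x - h_\theta(\theta^\top x)\}.$$

Since $\hat m$ is not strongly convex, it is not clear whether one can show that

$$\Pi(S_{\hat\theta} \mid \Lambda_{\hat m}^\perp) \overset{?}{=} \ell_{\hat\theta,\hat m}.$$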


Efficient score

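Since $\hat m$ lies on the boundary of $\mathcal{C}_L$, the “least favorable path” does not exist, i.e., we cannot find $(\theta_t, m_t)$ (centered at $(\hat\theta, \hat m)$) such that

$$\ell_{\hat\theta,\hat m} \overset{?}{=} \frac{\partial}{\partial t}\big(y - m_t(\theta_t^\top x)\big)^2\Big|_{t=0}.$$

If such a path existed, then, since the LLSE minimizes the least squares loss, we would have

$$\mathbb{P}_n \ell_{\hat\theta,\hat m} = 0.$$

Since $\hat m$ is piecewise affine, it is not clear whether one can show even that

$$\mathbb{P}_n \ell_{\hat\theta,\hat m} \overset{?}{=} o_p(n^{-1/2}).$$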


Approximations

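We next try to construct paths around $(\hat\theta, \hat m)$ whose scores are “close” to $\ell_{\hat\theta,\hat m}$.

We find a path that has the following score:

$$S_{\theta,m} = \big(y - m(\theta^\top x)\big) H_\theta^\top\Big[m'(\theta^\top x)\,x + \int_{s_0}^{\theta^\top x} m'(u)\,k'(u)\,du - m'(\theta^\top x)\,k(\theta^\top x) + m_0'(s_0)\,k(s_0) - m_0'(s_0)\,h_{\theta_0}(s_0)\Big].$$

Note that

$$\ell_{\theta_0,m_0} = S_{\theta_0,m_0} = \big(y - m_0(\theta_0^\top x)\big) H_{\theta_0}^\top\, m_0'(\theta_0^\top x)\{x - h_{\theta_0}(\theta_0^\top x)\}.$$

van der Vaart (2002) calls such a path “approximately least favorable”.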


Approximation, Part 2

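$S_{\theta,m}$ is not very tractable, so we further approximate it by

$$\psi_{\theta,m}(y, x) := \big(y - m(\theta^\top x)\big) H_\theta^\top\big[m'(\theta^\top x)\,x - h_{\theta_0}(\theta^\top x)\,m_0'(\theta^\top x)\big].$$

Compare $\psi_{\theta,m}$ to

$$\ell_{\theta,m}(y, x) = \big(y - m(\theta^\top x)\big) H_\theta^\top\, m'(\theta^\top x)\{x - h_\theta(\theta^\top x)\}.$$

Also note that $\psi_{\theta_0,m_0} = \ell_{\theta_0,m_0}$.

We show that $\mathbb{P}_n \psi_{\hat\theta,\hat m} = o_p(n^{-1/2})$.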


Simulation study


Another Estimator

Convex LSE

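$$(\tilde m, \tilde\theta) := \operatorname*{argmin}_{(m,\theta) \in \mathcal{C} \times \Theta} Q_n(m, \theta),$$

where $\mathcal{C}$ denotes the class of all convex functions (no Lipschitz bound).

We can compute the LSE via the alternating minimization algorithm.

Simulations suggest that $\tilde\theta$ is $\sqrt{n}$-consistent.

The behavior of $\tilde m'$ at the boundary is not well understood.

In fact, in univariate regression, $\tilde m'$ is unbounded in probability at the boundary.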


Choice of L

$Y = (\theta_0^\top X)^2 + N(0, 0.1^2)$, where $X \sim \text{Uniform}[-1, 1]^4$ and $\theta_0 = \mathbf{1}_4/2$.

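[Figure: box plots of the mean absolute deviation (y-axis) for each value of the tuning parameter $L$ (x-axis), with the convex LSE (CvxLSE) shown for comparison.]

Figure: Box plots of $\frac{1}{4}\sum_{i=1}^4 |\hat\theta_i - \theta_{0,i}|$ (over 1000 replications, $n = 500$) from the model above as the tuning parameter varies over $\{3, 4, 5, 7, 10\}$, together with CvxLSE. Here $L_0 = 4$.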
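The data-generating model and error metric of this experiment are easy to reproduce. The sketch below draws one sample from the model above, reading $N(0, 0.1^2)$ as Gaussian noise with standard deviation $0.1$, and computes the mean absolute deviation $\frac{1}{d}\sum_i |\hat\theta_i - \theta_{0,i}|$ for a given estimate. The estimator itself is not included, the function names are hypothetical, and the perturbed $\hat\theta$ used in the checks is made up. Since $|\theta_0^\top X| \le 2$ on the support, $|m_0'(t)| = |2t| \le 4$ there, consistent with $L_0 = 4$.

```python
import random
from typing import List, Tuple


def simulate(n: int, rng: random.Random) -> Tuple[List[float], List[List[float]], List[float]]:
    """One sample from Y = (theta0^T X)^2 + N(0, 0.1^2), X ~ Uniform[-1, 1]^4."""
    theta0 = [0.5] * 4  # theta_0 = 1_4 / 2, a unit vector
    xs = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(n)]
    ys = [sum(a * b for a, b in zip(theta0, x)) ** 2 + rng.gauss(0.0, 0.1) for x in xs]
    return theta0, xs, ys


def mean_abs_dev(theta_hat: List[float], theta0: List[float]) -> float:
    """(1/d) * sum_i |theta_hat_i - theta0_i|, the quantity shown in the box plots."""
    return sum(abs(a - b) for a, b in zip(theta_hat, theta0)) / len(theta0)
```

Repeating `simulate(500, rng)` 1000 times and recording `mean_abs_dev` of each fit would give the kind of distribution summarized in the figure.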


Confidence Interval

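$$Y = (\theta_0^\top X)^2 + N(0, 0.3^2), \quad X \sim \text{Uniform}[-1, 1]^3, \quad \theta_0 = \mathbf{1}_3/\sqrt{3}. \qquad (1)$$

Table: The estimated coverage probabilities and average lengths (obtained from 800 replicates) of nominal 95% confidence intervals for the first coordinate of $\theta_0$ for the model (1).

| n    | CvxLip coverage | CvxLip avg. length | CvxPen coverage | CvxPen avg. length |
|------|-----------------|--------------------|-----------------|--------------------|
| 50   | 0.92            | 0.30               | 0.94            | 0.29               |
| 100  | 0.91            | 0.18               | 0.92            | 0.19               |
| 200  | 0.92            | 0.13               | 0.93            | 0.13               |
| 500  | 0.94            | 0.08               | 0.92            | 0.08               |
| 1000 | 0.93            | 0.06               | 0.92            | 0.06               |
| 2000 | 0.92            | 0.04               | 0.93            | 0.04               |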
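When reading the coverage columns it helps to keep the Monte Carlo error in mind. With 800 replicates, an estimated coverage has binomial standard error $\sqrt{p(1-p)/800}$; the short computation below evaluates it at the nominal $p = 0.95$ and forms an approximate 95% band around $0.95$. This is an illustrative calculation, not part of the reported results.

```python
import math


def mc_standard_error(p: float, reps: int) -> float:
    """Binomial standard error of a coverage estimate based on `reps` replicates."""
    return math.sqrt(p * (1 - p) / reps)


se = mc_standard_error(0.95, 800)        # about 0.0077
band = (0.95 - 1.96 * se, 0.95 + 1.96 * se)  # about (0.935, 0.965)
```

By this yardstick, a reported coverage of 0.94 is within Monte Carlo error of 0.95, while entries of 0.91 to 0.93 sit somewhat below the band.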


[Figure: four panels of box plots, one for each of d = 10, 25, 50, and 100, comparing CvxPen, CvxLip, Smooth, EFM, and EDR (EDR is not shown for d = 100).]

Figure: Boxplots of $\sum_{i=1}^d |\hat\theta_i - \theta_{0,i}|/d$ (over 500 replications) based on 200 observations for dimensions 10, 25, 50, and 100.


Summary

The first work to provide an efficient estimator for a shape-constrained LSE (in a bundled parameter problem) when $\hat m$ is piecewise affine.

Our estimators readily lead to asymptotic confidence sets for $\theta_0$.

Our methods are robust to the choice of the tuning parameter.

The proposed estimators are implemented in the R package simest.


References

[1] Kuchibhotla, A. K. and Patra, R. K. (2016). simest: Single Index Model Estimation with Constraints on Link Function. R package version 0.2.

[2] Kuchibhotla, A. K., Patra, R. K., and Sen, B. (2017). Efficient Estimation in Convex Single Index Models. arxiv.org/abs/1708.00145.

[3] Kuchibhotla, A. K. and Patra, R. K. (2017). Efficient estimation in single index models through smoothing splines. arxiv.org/abs/1612.00068.

[4] Balabdaoui, F., Durot, C. and Jankowski, H. (2016). Least squares estimation in the monotone single index model. arxiv.org/abs/1610.06026.

[5] Groeneboom, P. and Hendrickx, K. (2016). Current status linear regression. Annals of Statistics (Forthcoming).

[6] Chen, Y. and Samworth, R. J. (2014). Generalised additive and index models with shape constraints. JRSSB, 78(4), 729–754.

[7] Murphy, S. A., van der Vaart, A. W. and Wellner, J. A. (1999). Current status regression. Math. Methods Statist., 8(3), 407–425.

Thank You! Questions?
