Page 1: Recitation 4 for BigData

Recitation 4 for BigData

Jay Gu, Feb 7, 2013

LASSO and Coordinate Descent

Page 2: Recitation 4 for BigData

A numerical example

Generate some synthetic data:

N = 50
P = 200
Number of non-zero coefficients = 5

X ~ Normal(0, I)
beta_1, beta_2, beta_3 ~ Normal(1, 2)
sigma ~ Normal(0, 0.1*I)

Y = X*beta + sigma

Split training vs. testing: 80/20
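
For concreteness, here is a minimal sketch of this data-generating step in Python with numpy. The variable names are illustrative, and it takes the second parameter of each Normal(., .) above to be the standard deviation, which the slide leaves ambiguous.

import numpy as np

rng = np.random.default_rng(0)
N, P, K = 50, 200, 5  # samples, features, number of non-zero coefficients

# Design matrix with i.i.d. standard normal entries: X ~ Normal(0, I)
X = rng.normal(0.0, 1.0, size=(N, P))

# Sparse coefficient vector: K non-zero entries drawn from Normal(1, 2)
beta = np.zeros(P)
beta[:K] = rng.normal(1.0, 2.0, size=K)

# Noise sigma ~ Normal(0, 0.1*I) and response Y = X*beta + sigma
sigma = rng.normal(0.0, 0.1, size=N)
y = X @ beta + sigma

# 80/20 train/test split
n_train = int(0.8 * N)
X_train, X_test = X[:n_train], X[n_train:]
y_train, y_test = y[:n_train], y[n_train:]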

Pages 3-4: Recitation 4 for BigData (image-only slides; no text extracted)
Page 5: Recitation 4 for BigData

Practicalities

• Standardize your data:
  - Center X and Y: this removes the intercept.
  - Scale each column of X to unit norm: this gives fair regularization across all covariates.

• Warm start: run the lambdas from large to small, using each solution to initialize the next fit (see the sketch below).
  - Start from the largest lambda, max(X'y), which guarantees a solution with zero support size (all coefficients are zero).
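
A minimal sketch of these practicalities in Python, assuming the X_train, y_train arrays from the earlier sketch and scikit-learn's Lasso. Note that scikit-learn scales the squared error by 1/(2n), so the all-zero threshold becomes max|X'y|/n rather than max|X'y|.

import numpy as np
from sklearn.linear_model import Lasso

# Standardize: center X and y, then scale columns of X to unit norm.
Xc = X_train - X_train.mean(axis=0)
yc = y_train - y_train.mean()
Xc = Xc / np.linalg.norm(Xc, axis=0)

# scikit-learn's Lasso minimizes (1/(2n))*||y - Xw||^2 + alpha*||w||_1,
# so the solution is exactly zero for alpha >= max|X'y| / n.
n = Xc.shape[0]
alpha_max = np.max(np.abs(Xc.T @ yc)) / n

# Warm start: sweep alpha from large to small, reusing each solution
# as the starting point for the next fit.
model = Lasso(alpha=alpha_max, fit_intercept=False, warm_start=True)
for alpha in np.geomspace(alpha_max, 1e-3 * alpha_max, num=30):
    model.set_params(alpha=alpha)
    model.fit(Xc, yc)  # model.coef_ persists between fits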

Page 6: Recitation 4 for BigData

Algorithm

Ridge Regression: closed-form solution.

LASSO: iterative algorithms:
• Subgradient Descent
• Generalized Gradient Methods (ISTA)
• Accelerated Generalized Gradient Methods (FISTA)
• Coordinate Descent
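
For reference, these are the standard formulations behind this slide (not shown in the extracted text): ridge regression has a closed form, while the lasso objective is non-differentiable and must be solved iteratively.

\hat{\beta}_{\text{ridge}} = \arg\min_{\beta} \tfrac{1}{2}\|y - X\beta\|_2^2 + \tfrac{\lambda}{2}\|\beta\|_2^2 = (X^\top X + \lambda I)^{-1} X^\top y

\hat{\beta}_{\text{lasso}} = \arg\min_{\beta} \tfrac{1}{2}\|y - X\beta\|_2^2 + \lambda\|\beta\|_1 \quad \text{(no closed form in general)}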

Page 7: Recitation 4 for BigData

Subdifferentials and Coordinate Descent

• Slides from Ryan Tibshirani:
  http://www.cs.cmu.edu/~ggordon/10725-F12/slides/06-sg-method.pdf
  http://www.cs.cmu.edu/~ggordon/10725-F12/slides/25-coord-desc.pdf

Pages 8-10: Recitation 4 for BigData (image-only slides; no text extracted)
Page 11: Recitation 4 for BigData

Coordinate Descent: does it always find the global optimum?

• Convex and differentiable? Yes.

• Convex and non-differentiable? No.

Page 12: Recitation 4 for BigData

• Convex, but with separable non-differentiable parts?

• Yes. Proof: (shown as an image on the slide; a standard version of the argument follows)
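
Since the slide's proof is only an image, here is the standard argument (due to Tseng) for f(\beta) = g(\beta) + \sum_i h_i(\beta_i), with g convex and differentiable and each h_i convex:

Suppose $\beta$ minimizes $f$ along every coordinate. Then for each $i$,
$-\nabla_i g(\beta) \in \partial h_i(\beta_i)$, so
$h_i(\beta_i + \delta_i) \ge h_i(\beta_i) - \nabla_i g(\beta)\,\delta_i$ for all $\delta_i$.
For any direction $\delta$, convexity of $g$ gives
\[
f(\beta + \delta) - f(\beta)
\ge \nabla g(\beta)^\top \delta + \sum_i \left[ h_i(\beta_i + \delta_i) - h_i(\beta_i) \right]
\ge \nabla g(\beta)^\top \delta - \sum_i \nabla_i g(\beta)\,\delta_i = 0,
\]
so $\beta$ is a global minimum.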

Page 13: Recitation 4 for BigData

CD for Linear Regression
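
The slide's derivation is an image in the original. Here is a minimal coordinate-descent sketch for the lasso in Python, assuming the columns of X have unit norm (per the Practicalities slide), so that each coordinate update is a single soft-threshold; the function names are illustrative.

import numpy as np

def soft_threshold(z, t):
    # S_t(z) = sign(z) * max(|z| - t, 0)
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iters=100):
    # Coordinate descent for: min_b 0.5*||y - X b||^2 + lam*||b||_1,
    # assuming each column of X has unit norm.
    n, p = X.shape
    beta = np.zeros(p)
    residual = y.copy()  # residual = y - X @ beta, with beta = 0
    for _ in range(n_iters):
        for j in range(p):
            # Correlation of column j with the partial residual r_{-j}:
            # X[:, j]' (residual + X[:, j]*beta[j]) = X[:, j]' residual + beta[j]
            rho = X[:, j] @ residual + beta[j]
            beta_new = soft_threshold(rho, lam)
            residual += X[:, j] * (beta[j] - beta_new)  # keep residual in sync
            beta[j] = beta_new
    return beta

On the standardized data from the earlier sketches this would be called as, for example, lasso_cd(Xc, yc, lam=0.1).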

Page 14: Recitation 4 for BigData

Rate of Convergence?

Assuming the gradient is Lipschitz continuous:

• Subgradient Descent: 1/sqrt(k)
• Gradient Descent: 1/k
• Optimal rate for first-order methods: 1/(k^2)
• Coordinate Descent: only known for some special cases

Page 15: Recitation 4 for BigData

Summary: Coordinate Descent

• Good for large P
• No tuning parameter (no step size to choose)
• In practice, converges much faster than the optimal first-order methods
• Only applies to certain cases
• Unknown convergence rate for general function classes

