Regularized, Polynomial, Logistic Regression Pradeep Ravikumar Co-instructor: Ziv Bar-Joseph Machine Learning 10-701
Transcript
Page 1

Regularized, Polynomial, Logistic Regression

Pradeep Ravikumar

Co-instructor: Ziv Bar-Joseph

Machine Learning 10-701

Page 2

Regression algorithms

2

Learning algorithm that predicts/estimates output Y given input X:

Linear Regression
Regularized Linear Regression: Ridge regression, Lasso
Polynomial Regression
Gaussian Process Regression
…

Page 3

Recap: Linear Regression

3

Class of linear functions:

Univariate case: f(X) = β_1 + β_2 X, where β_1 is the intercept and β_2 the slope.

Multivariate case: f(X) = Xβ.

Least Squares Estimator

Page 4

Recap: Least Squares Estimator

4

f(X_i) = X_i β

Page 5

Recap: Least Squares solution satisfies the Normal Equations

5

(A^T A) β = A^T Y,   with A^T A of size p×p and β, A^T Y of size p×1

If A^T A is invertible, β̂ = (A^T A)^(-1) A^T Y.

When is A^T A invertible? Recall: full-rank matrices are invertible. What is the rank of A^T A?

Rank of A^T A = number of non-zero eigenvalues of A^T A ≤ min(n, p), since A is n×p.

So rank(A^T A) = r ≤ min(n, p). Not invertible if r < p (e.g. n < p, i.e. the high-dimensional setting).

Page 6

6

Regularized Least Squares

What if A^T A is not invertible?

r equations, p unknowns: an underdetermined system of linear equations with many feasible solutions. Need to constrain the solution further, e.g. bias the solution toward "small" values of β (so small changes in input don't translate to large changes in output).

Ridge Regression (l2 penalty): λ ≥ 0

β̂_MAP = (A^T A + λI)^(-1) A^T Y

Is (A^T A + λI) invertible? Yes: for λ > 0, all of its eigenvalues are at least λ > 0.
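As a concrete illustration (not from the slides; all names are my own), the ridge closed form can be sketched in numpy. The point is that it stays well-defined even when n < p and A^T A is singular:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 10                     # n < p: the high-dimensional setting
A = rng.normal(size=(n, p))      # design matrix
Y = rng.normal(size=n)

# A^T A has rank at most n < p here, so the plain normal equations are
# singular, but (A^T A + lam*I) is invertible for any lam > 0:
lam = 0.1
beta_ridge = np.linalg.solve(A.T @ A + lam * np.eye(p), A.T @ Y)
```

Calling `np.linalg.solve(A.T @ A, A.T @ Y)` on the same data would fail, since A^T A is rank-deficient.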

Page 7

7

Understanding regularized Least Squares

Ridge Regression: minimize J(β) + λ pen(β), with pen(β) = ‖β‖₂²

[Figure: contour plot in the (β1, β2) plane showing βs with constant J(β) (level sets of J(β)), βs with constant l2 norm (level sets of pen(β)), and the unregularized Least Squares solution.]

Page 8

8

Regularized Least Squares

What if A^T A is not invertible?

r equations, p unknowns: an underdetermined system of linear equations with many feasible solutions. Need to constrain the solution further, e.g. bias the solution toward "small" values of β (so small changes in input don't translate to large changes in output).

Ridge Regression (l2 penalty): λ ≥ 0

Lasso (l1 penalty): many parameter values can be zero; many inputs are irrelevant to prediction in high-dimensional settings.

Page 9

9

Regularized Least Squares

What if A^T A is not invertible?

r equations, p unknowns: an underdetermined system of linear equations with many feasible solutions. Need to constrain the solution further, e.g. bias the solution toward "small" values of β (so small changes in input don't translate to large changes in output).

Ridge Regression (l2 penalty): λ ≥ 0

Lasso (l1 penalty): no closed-form solution, but we can optimize using sub-gradient descent (packages available).
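A minimal sub-gradient descent sketch for the lasso (my own illustrative code; the step-size rule and names are not from the slides). Note how it recovers a sparse coefficient vector:

```python
import numpy as np

def lasso_subgradient(A, Y, lam, iters=5000):
    """Minimize ||Y - A b||^2 + lam * ||b||_1 by sub-gradient descent."""
    _, p = A.shape
    L = 2 * np.linalg.eigvalsh(A.T @ A).max()   # Lipschitz constant of smooth part
    lr = 1.0 / L                                # safe constant step size
    b = np.zeros(p)
    for _ in range(iters):
        # np.sign(0) == 0 is a valid sub-gradient of |.| at 0
        grad = 2 * A.T @ (A @ b - Y) + lam * np.sign(b)
        b -= lr * grad
    return b

rng = np.random.default_rng(1)
A = rng.normal(size=(50, 5))
b_true = np.array([3.0, 0.0, 0.0, -2.0, 0.0])   # sparse truth
Y = A @ b_true + 0.01 * rng.normal(size=50)
b_hat = lasso_subgradient(A, Y, lam=1.0)
```

In practice one would use a dedicated solver (coordinate descent, proximal methods), but the sub-gradient update is the simplest to write down.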

Page 10

Ridge Regression vs Lasso

10

Ridge Regression: l2 penalty. Lasso: l1 penalty.

Lasso (l1 penalty) results in sparse solutions: vectors with more zero coordinates. Good for high-dimensional problems: we don't have to store all coordinates, and the solution is interpretable!

Ideally we would use an l0 penalty, but the optimization then becomes non-convex.

[Figure: in the (β1, β2) plane, βs with constant J(β) (level sets of J(β)) against βs with constant l2, l1, and l0 norm.]

Page 11

Lasso vs Ridge

[Figure: Lasso coefficients (left) vs Ridge coefficients (right).]

Page 12

12

Regularized Least Squares: connection to MLE and MAP (model-based approaches)

Page 13

Least Squares and M(C)LE

13

Intuition: signal plus (zero-mean) noise model:

Y_i = X_i β* + ε_i,   ε_i ~ N(0, σ²)

The Least Squares Estimate is the same as the Maximum Conditional Likelihood Estimate under a Gaussian model!

Conditional log likelihood: log p({Y_i}_{i=1}^n | β, σ², {X_i}_{i=1}^n)
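The equivalence can be written out in one line. Under the Gaussian noise model with independent ε_i:

```latex
\log p(\{Y_i\}_{i=1}^n \mid \beta, \sigma^2, \{X_i\}_{i=1}^n)
  = \sum_{i=1}^{n} \log \frac{1}{\sqrt{2\pi\sigma^2}}
      \exp\!\left(-\frac{(Y_i - X_i \beta)^2}{2\sigma^2}\right)
  = -\frac{n}{2}\log(2\pi\sigma^2)
    - \frac{1}{2\sigma^2}\sum_{i=1}^{n} (Y_i - X_i \beta)^2
```

Maximizing over β leaves only the last term, i.e. minimizing Σ_i (Y_i − X_i β)², exactly the least squares objective.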

Page 14

Regularized Least Squares and M(C)AP

14

What if A^T A is not invertible?

Objective: conditional log likelihood log p({Y_i}_{i=1}^n | β, σ², {X_i}_{i=1}^n) + log prior

I) Gaussian prior (zero mean) ⟹ Ridge Regression:

β̂_MAP = (A^T A + λI)^(-1) A^T Y

Page 15

Regularized Least Squares and M(C)AP

15

What if A^T A is not invertible?

I) Gaussian prior (zero mean) ⟹ Ridge Regression

The prior belief that β is Gaussian with zero mean biases the solution to "small" β.

Objective: conditional log likelihood log p({Y_i}_{i=1}^n | β, σ², {X_i}_{i=1}^n) + log prior

Page 16

Regularized Least Squares and M(C)AP

16

What if A^T A is not invertible?

II) Laplace prior (zero mean) ⟹ Lasso

The prior belief that β is Laplace with zero mean biases the solution to "sparse" β.

Objective: conditional log likelihood log p({Y_i}_{i=1}^n | β, σ², {X_i}_{i=1}^n) + log prior

Page 17

Beyond Linear Regression

17

Polynomial regression

Regression with nonlinear features

Page 18

Polynomial Regression

18

Univariate (1-dim) case, degree m:

f(X) = β_0 + β_1 X + β_2 X² + … + β_m X^m

Multivariate (p-dim) case:

f(X) = β_0 + β_1 X^(1) + β_2 X^(2) + … + β_p X^(p)
     + Σ_{i=1}^p Σ_{j=1}^p β_ij X^(i) X^(j)
     + Σ_{i=1}^p Σ_{j=1}^p Σ_{k=1}^p β_ijk X^(i) X^(j) X^(k)
     + … terms up to degree m

Either way the model is still linear in the coefficients β, so with A the design matrix of polynomial features,

β̂ = (A^T A)^(-1) A^T Y   or   β̂_MAP = (A^T A + λI)^(-1) A^T Y
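A quick sketch of the univariate case in numpy (my own example data and degree): expand X into powers, then fit by ordinary least squares.

```python
import numpy as np

# Degree-m polynomial regression, fit as linear regression on the
# expanded features [1, X, X^2, ..., X^m].
rng = np.random.default_rng(2)
X = rng.uniform(0, 1, size=30)
Y = 1.0 + 2.0 * X - 3.0 * X**2 + 0.01 * rng.normal(size=30)

m = 2
A = np.vander(X, m + 1, increasing=True)   # columns: X^0, X^1, ..., X^m
beta = np.linalg.lstsq(A, Y, rcond=None)[0]
```

The recovered `beta` should be close to the generating coefficients (1, 2, -3).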

Page 19

19

Polynomial Regression

[Figure: least squares fits to the same data with polynomials of order k = 1, 2, 3, 7.]

A polynomial of order k is, equivalently, of degree up to k−1.

What is the right order?

Page 20

Bias-Variance Tradeoff

Large bias, small variance: poor approximation, but robust/stable.

Small bias, large variance: good approximation, but unstable.

[Figure: fits to 3 independent training datasets, contrasting a low-complexity model (similar but poor fits) with a high-complexity model (close but highly variable fits).]

Page 21

21

Bias-Variance Decomposition

Later in the course, we will show that

E[(f(X) − f*(X))²] = Bias² + Variance

Bias = E[f(X)] − f*(X)   … how far the model is from the "true function"

Variance = E[(f(X) − E[f(X)])²]   … how variable/stable the model is
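The decomposition can be made tangible with a small Monte-Carlo experiment (the true function, sample size, and noise level below are my illustrative choices): estimate the bias and variance of a degree-m polynomial fit at one test point over many independent training sets.

```python
import numpy as np

rng = np.random.default_rng(3)
f_star = lambda x: np.sin(2 * np.pi * x)   # the "true function"
x0 = 0.3                                   # test point

def fit_predict(m, n=20, sigma=0.3):
    """Draw one training set, fit a degree-m polynomial, predict at x0."""
    X = rng.uniform(0, 1, size=n)
    Y = f_star(X) + sigma * rng.normal(size=n)
    return np.polyval(np.polyfit(X, Y, deg=m), x0)

results = {}
for m in (1, 5):
    preds = np.array([fit_predict(m) for _ in range(500)])
    bias = preds.mean() - f_star(x0)
    results[m] = (bias**2, preds.var())    # (squared bias, variance)
```

The degree-1 model cannot represent the sine, so its squared bias dominates; the degree-5 model has far smaller bias at the cost of more variability.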

Page 22

Effect of Model Complexity

[Figure: test error versus model complexity, decomposed into bias (decreasing with complexity) and variance (increasing with complexity).]

Page 23

Effect of Model Complexity

[Figure: test error, bias, and variance versus model complexity, with training error added; training error keeps decreasing with complexity, while test error eventually increases.]

Page 24

Regression with basis functions

24

Basis functions (linear combinations yield meaningful spaces) and basis coefficients.

Polynomial Basis

Fourier Basis: good representation for periodic functions

Wavelet Basis: good representation for local functions

Page 25

25

Regression with nonlinear features

In general, use any nonlinear features, e.g. e^X, log X, 1/X, sin(X), …

Nonlinear features φ_0, …, φ_m, with a weight β_j for each feature. Each input X is mapped to the feature row

[φ_0(X)  φ_1(X)  …  φ_m(X)]

so the design matrix is

A = [ φ_0(X_1)  φ_1(X_1)  …  φ_m(X_1)
        ⋮           ⋮               ⋮
      φ_0(X_n)  φ_1(X_n)  …  φ_m(X_n) ]

β̂ = (A^T A)^(-1) A^T Y   or   β̂_MAP = (A^T A + λI)^(-1) A^T Y
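A short sketch with hand-picked nonlinear features (the particular features and data below are my choices for illustration): each column of the design matrix is one φ_j applied to the scalar input.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.uniform(0.5, 2.0, size=40)
Y = 2.0 * np.log(X) + 3.0 / X + 0.01 * rng.normal(size=40)

phis = [np.ones_like, np.log, lambda x: 1.0 / x]   # phi_0, phi_1, phi_2
A = np.column_stack([phi(X) for phi in phis])      # design matrix

beta = np.linalg.lstsq(A, Y, rcond=None)[0]        # ordinary least squares
```

Since Y was generated as 2·log X + 3/X, the fit should recover weights near (0, 2, 3).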

Page 26

26

Regression to Classification

Regression: X = Brain Scan, Y = Age of a subject

Classification: X = Cell Image, Y = Diagnosis (Anemic cell vs Healthy cell)

Can we predict the "probability" of the class label being Anemic or Healthy (a real number) using regression methods?

But the output (a probability) needs to be in [0, 1].

Page 27

Logistic Regression

27

Assumes the following functional form for P(Y|X):

P(Y = 1 | X) = 1 / (1 + exp(−(w_0 + Σ_i w_i X_i)))

Logistic function (or Sigmoid): logistic(z) = 1 / (1 + e^(−z))

This is the logistic function applied to a linear function of the data.

Features can be discrete or continuous!

Not really regression: despite the name, the output is a probability used for classification.
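The functional form is two lines of numpy (the weights below are my example values, not from the slides):

```python
import numpy as np

def sigmoid(z):
    """The logistic (sigmoid) function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Logistic regression applies it to a linear function of the data:
# P(Y = 1 | X) = sigmoid(w0 + sum_i w_i X_i)
w0, w = -1.0, np.array([2.0, -0.5])
x = np.array([1.0, 2.0])
p = sigmoid(w0 + w @ x)   # here w0 + w.x = -1 + 2 - 1 = 0, so p = 0.5
```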

Page 28

Logistic Regression is a Linear Classifier!

28

Assumes the following functional form for P(Y|X):

P(Y = 1 | X) = 1 / (1 + exp(−(w_0 + Σ_i w_i X_i)))

Decision boundary: predict 1 if P(Y = 1 | X) ≥ P(Y = 0 | X), i.e. if w_0 + Σ_i w_i X_i ≥ 0 (a linear decision boundary).

Note: labels are 0, 1.
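Why the boundary is linear can be seen from the log-odds. Writing z = w_0 + Σ_i w_i X_i, so that P(Y = 1 | X) = 1/(1 + e^{−z}) and P(Y = 0 | X) = e^{−z}/(1 + e^{−z}):

```latex
\log \frac{P(Y=1 \mid X)}{P(Y=0 \mid X)}
  = \log \frac{1/(1+e^{-z})}{e^{-z}/(1+e^{-z})}
  = z
  = w_0 + \sum_{i=1}^{d} w_i X_i
```

Predicting the more probable class therefore means thresholding a linear function of X at 0.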

Page 29

Logistic Regression is a Linear Classifier!

29

Assumes the following functional form for P(Y|X):

[Figure: sigmoid curve of P(Y = 1 | X), with regions labeled 0 and 1 on either side of the decision boundary.]

Page 30

Training Logistic Regression

30

How do we learn the parameters w_0, w_1, …, w_d? (d features)

Training Data

Maximum Likelihood Estimates?

But there is a problem… we don't have a model for P(X) or P(X|Y), only for P(Y|X).

Page 31

Training Logistic Regression

31

How do we learn the parameters w_0, w_1, …, w_d? (d features)

Training Data

Maximum (Conditional) Likelihood Estimates

Discriminative philosophy: don't waste effort learning P(X); focus on P(Y|X), which is all that matters for classification!

Page 32

Expressing the Conditional Log Likelihood

32

l(w) = Σ_j [ y^j log P(Y = 1 | x^j, w) + (1 − y^j) log P(Y = 0 | x^j, w) ]

Bad news: no closed-form solution to maximize l(w).

Good news: l(w) is a concave function of w, and concave functions are easy to maximize.

Page 33

33

Concave function l(w)

A function l(w) is called concave if the line joining two points l(w1), l(w2) on the function does not go above the function on the interval [w1, w2].

(Strictly) concave functions have a unique maximum!

[Figure: a concave curve with the chord between w1 and w2 below it; further examples labeled Convex, Both Concave & Convex (a line), and Neither.]

Page 34

Optimizing a concave function

34

• Conditional likelihood for Logistic Regression is concave
• Maximum of a concave function can be reached by the Gradient Ascent Algorithm:

Initialize: pick w at random

Update rule: w(t+1) = w(t) + η ∇_w l(w(t)), with learning rate η > 0

Gradient: ∇_w l(w) = [ ∂l(w)/∂w_0, …, ∂l(w)/∂w_d ]

Page 35

Gradient Ascent for Logistic Regression

35

Gradient ascent rule for w_0:

∂l(w)/∂w_0 = Σ_j [ y^j − exp(w_0 + Σ_{i=1}^d w_i x_i^j) / (1 + exp(w_0 + Σ_{i=1}^d w_i x_i^j)) ]

i.e. Σ_j [ y^j − P̂(Y = 1 | x^j, w) ].

Page 36

Gradient Ascent for Logistic Regression

36

Gradient ascent algorithm: iterate until change < ε

repeat:
  w_0 ← w_0 + η Σ_j [ y^j − P̂(Y = 1 | x^j, w) ]
  for i = 1, …, d:
    w_i ← w_i + η Σ_j x_i^j [ y^j − P̂(Y = 1 | x^j, w) ]

Here P̂(Y = 1 | x^j, w) is what the current weights predict the label Y should be.

• Gradient ascent is the simplest of optimization approaches; others include Newton's method, conjugate gradient ascent, and IRLS (see Bishop 4.3.3).
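The whole training loop fits in a few lines of numpy (an illustrative sketch; the step size, iteration count, and synthetic data are my own choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gradient_ascent(X, y, lr=0.01, iters=3000):
    """Maximize the conditional log likelihood l(w) by gradient ascent.
    X: (n, d) features, y: (n,) labels in {0, 1}. Returns (w0, w)."""
    _, d = X.shape
    w0, w = 0.0, np.zeros(d)
    for _ in range(iters):
        p = sigmoid(w0 + X @ w)   # current P_hat(Y=1 | x^j, w) for each j
        err = y - p               # y^j - P_hat(Y=1 | x^j, w)
        w0 += lr * err.sum()      # ascent step for w0
        w += lr * (X.T @ err)     # ascent step for each w_i
    return w0, w

# Synthetic data from a known logistic model, then refit:
rng = np.random.default_rng(5)
X = rng.normal(size=(200, 2))
y = (rng.uniform(size=200) < sigmoid(X @ np.array([1.5, -2.0]))).astype(float)
w0_hat, w_hat = logistic_gradient_ascent(X, y)
acc = ((sigmoid(w0_hat + X @ w_hat) > 0.5) == (y > 0.5)).mean()
```

The fitted weights recover the signs of the generating weights, and training accuracy is well above chance.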

Page 37

That's all for M(C)LE. How about M(C)AP?

37

• Define priors on w. Common assumption: Normal distribution, zero mean, identity covariance; this "pushes" the parameters towards zero.

• M(C)AP estimate:

w_MCAP = arg max_w [ ln p(w) + Σ_j ln P(y^j | x^j, w) ]

The zero-mean Gaussian prior term penalizes large weights. The objective is still concave!

Page 38

M(C)AP Gradient

38

• Gradient with the zero-mean Gaussian prior:

∂/∂w_i [ ln p(w) + Σ_j ln P(y^j | x^j, w) ] = −λ w_i + Σ_j x_i^j [ y^j − P̂(Y = 1 | x^j, w) ]

The second term is the same as before; the extra term −λ w_i penalizes large weights.

Page 39

M(C)LE vs. M(C)AP

39

• Maximum conditional likelihood estimate: w_MCLE = arg max_w Σ_j ln P(y^j | x^j, w)

• Maximum conditional a posteriori estimate: w_MCAP = arg max_w [ ln p(w) + Σ_j ln P(y^j | x^j, w) ]

Page 40

Logistic Regression for more than 2 classes

40

• Logistic regression in the more general case, where Y ∈ {y1, …, yK}:

for k < K:   P(Y = y_k | X) ∝ exp(w_k0 + Σ_i w_ki X_i)

for k = K:   P(Y = y_K | X) ∝ 1   (normalization, so no weights for this class)

Predict: ŷ = arg max_k P(Y = y_k | X)

Is the decision boundary still linear?
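The multiclass prediction rule can be sketched as follows (weights and inputs below are my own example; class K's weights are fixed to zero for normalization, as on the slide):

```python
import numpy as np

def predict_proba(W, b, x):
    """P(Y = y_k | x) for multiclass logistic regression.
    W: (K-1, d) weights, b: (K-1,) intercepts; class K has implicit 0 weights."""
    scores = np.append(b + W @ x, 0.0)   # score of class K is 0
    scores -= scores.max()               # subtract max for numerical stability
    e = np.exp(scores)
    return e / e.sum()                   # normalize to probabilities

# Example with K = 3 classes and d = 2 features:
W = np.array([[2.0, 0.0],
              [0.0, 2.0]])
b = np.zeros(2)
p = predict_proba(W, b, np.array([1.0, -1.0]))
k_hat = int(np.argmax(p)) + 1            # predicted class label y_k
```

For this input the class-1 score (2) beats class 3 (0) and class 2 (−2), so the prediction is class 1; and since each pairwise log-odds is a linear function of X, the decision boundaries are still linear.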
