Machine Learning Week 1, Lecture 2
Transcript
Page 1:

Machine Learning

Week 1, Lecture 2

Page 2:

Recap: Supervised Learning

Data Set

Learning Algorithm

Hypothesis h: h(x) ≈ f(x)

Unknown Target f

Hypothesis Set

[Figure: sample handwritten digits with labels 5 0 4 1 9 2 1 3 1 4]

[Figure: a separating hyperplane with normal vector w, dividing space into a halfspace where wᵀx > 0 and one where wᵀx < 0]

[Figure: example panels for classification and regression]

Page 3:

NP-hard in general

Assume the data is linearly separable! Then the perceptron finds a separating hyperplane, and the problem is convex.

Page 4:

Today

• Convex Optimization
  – Convex sets
  – Convex functions
• Logistic Regression
  – Maximum Likelihood
  – Gradient Descent
• Maximum Likelihood and Linear Regression

Page 5:

Convex Optimization

Optimization problems are, in general, very hard (if solvable at all)!

For convex optimization problems, theoretical (polynomial-time) and practical solutions exist (most of the time).

Example:

Page 6:

Convex Sets

[Figure: two panels, a convex set and a non-convex set]

The line segment from x to y must also lie in the set.
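In symbols, the condition on the slide reads:

$$C \text{ is convex} \iff \forall x, y \in C,\ \forall \theta \in [0,1]:\ \theta x + (1-\theta)y \in C.$$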

Page 7:

Convex Sets

The union of convex sets may not be convex.

The intersection of convex sets is convex.

Page 8:

Convex Functions

[Figure: a convex function; the chord between (x, f(x)) and (y, f(y)) lies above the graph]

f is concave if –f is convex

Concave? Convex? Both?
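In symbols, convexity says every chord lies on or above the graph:

$$f(\theta x + (1-\theta)y) \le \theta f(x) + (1-\theta) f(y) \quad \text{for all } x, y \text{ and } \theta \in [0,1].$$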

Page 9:

Differentiable Convex Functions

[Figure: a convex function and its tangent at x; the tangent line f(x) + f′(x)(y − x) lies below the graph]

First-order condition: f is convex if and only if f(y) ≥ f(x) + f′(x)(y − x) for all x, y (in higher dimensions, f(y) ≥ f(x) + ∇f(x)ᵀ(y − x)).

Example
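The slide's example itself is not preserved in this transcript; a minimal instance of the first-order condition: for f(x) = x² the condition reads

$$y^2 \ge x^2 + 2x(y - x) \iff (y - x)^2 \ge 0,$$

which always holds, confirming convexity.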

Page 10:

Twice Differentiable Convex Functions

f is convex if the Hessian ∇²f(x) is positive semidefinite for all x.

A real symmetric matrix A is positive semidefinite if xᵀAx ≥ 0 for all nonzero x.

1D: f is convex if f″(x) ≥ 0 for all x.

Page 11:

Simple 2D Example
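The slide's example is not preserved here; a plausible minimal instance: for f(x₁, x₂) = x₁² + x₂²,

$$\nabla^2 f(x) = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} \succeq 0,$$

so f is convex everywhere.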

Page 12:

More Examples

Quadratic functions: f(x) = ½ xᵀAx + bᵀx + c

Convex if A is positive semidefinite (the Hessian is A).

Affine functions: f(x) = bᵀx + c (both convex and concave; the Hessian is zero).

Page 13:

Convexity of Linear Regression

Quadratic function: the least-squares cost E(w) = ‖Xw − y‖² has Hessian 2XᵀX.

Convex if XᵀX is positive semidefinite.

XᵀX is real and symmetric, and clearly positive semidefinite: zᵀXᵀXz = ‖Xz‖² ≥ 0 for all z.

Page 14:

Epigraph: the connection between convex sets and convex functions.

epi(f) = {(x, t) : t ≥ f(x)}, the set of points on or above the graph.

f is convex if and only if epi(f) is a convex set.

Page 15:

Sublevel sets of a convex function

Define the α-sublevel set: Cα = {x : f(x) ≤ α}.

If f is convex, then Cα is convex for every α.

Page 16:

Convex Optimization

minimize f(x) subject to gᵢ(x) ≤ 0 and hⱼ(x) = 0, where f and the gᵢ are convex and the hⱼ are affine.

Local minima are global minima: if x were a local minimum with some feasible y satisfying f(y) < f(x), then by convexity every point on the segment from x to y improves on f(x), including points arbitrarily close to x, a contradiction.

Page 17:

Examples of Convex Optimization

• Linear Programming

• Quadratic Programming (P is positive semidefinite)
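The slides' formulas are not preserved in this transcript; the standard forms (using the P named above) are:

$$\text{LP:} \quad \min_x\ c^T x \quad \text{s.t. } Ax \le b$$

$$\text{QP:} \quad \min_x\ \tfrac{1}{2} x^T P x + q^T x \quad \text{s.t. } Ax \le b$$

Both are convex optimization problems (for QP, because P is positive semidefinite).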

Page 18:

Summary

In his 1993 SIAM Review survey paper, Rockafellar stated: "In fact the great watershed in optimization isn't between linearity and nonlinearity, but convexity and nonconvexity."

Convex GOOD!!!!

Page 19:

Estimating Probabilities

• Probability of getting cancer given your situation.

• Probability that AGF wins against Viborg given the last 5 results.

• Probability that a loan is not paid back, as a function of creditworthiness.

• Probability of a student getting an A in Machine Learning given his grades.

The data consists of actual events, not probabilities: e.g., some students who failed and some who did not…

Page 20:

Breast Cancer
http://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Original%29

1. Sample code number: id number
2. Clump Thickness: 1 - 10
3. Uniformity of Cell Size: 1 - 10
4. Uniformity of Cell Shape: 1 - 10
5. Marginal Adhesion: 1 - 10
6. Single Epithelial Cell Size: 1 - 10
7. Bare Nuclei: 1 - 10
8. Bland Chromatin: 1 - 10
9. Normal Nucleoli: 1 - 10
10. Mitoses: 1 - 10

Input features: attributes 2-10.

Classes: benign, malignant.

Target function: predict the probability of benign and malignant for future patients.

Page 21:

Maximum Likelihood

Biased coin, with bias θ = the probability of heads. Flip it n times independently (Bernoulli trials) and count the number of heads, k.

Fix θ: what is the probability of seeing D? The likelihood of the data is
P(D | θ) = θ^k (1 − θ)^(n−k),
and taking logs gives
ln P(D | θ) = k ln θ + (n − k) ln(1 − θ).

After seeing the data, what can we infer?

Page 22:

Maximum Likelihood

Maximizing the likelihood is equivalent to minimizing the negative log likelihood of the data (log is monotone).

Compute the gradient and solve for 0.
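The algebra is not preserved in the transcript; the standard computation for the coin is:

$$-\ell(\theta) = -k \ln \theta - (n-k)\ln(1-\theta), \qquad \frac{d(-\ell)}{d\theta} = -\frac{k}{\theta} + \frac{n-k}{1-\theta} = 0 \implies \hat{\theta} = \frac{k}{n}.$$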

Page 23:

Bayesian Perspective

Bayes' rule: P(θ | D) = P(D | θ) · P(θ) / P(D)

Want: the posterior P(θ | D). Need: a prior P(θ).

Posterior = (Likelihood × Prior) / Normalizing factor, where the normalizing factor is P(D).

Page 24:

Bayesian Perspective

• Compute the probability of each hypothesis.
• Pick the most likely and use it for predictions (MAP = maximum a posteriori).
• Compute expected values (a weighted average over all hypotheses).

Page 25:

Logistic Regression

Assume independent data points and apply maximum likelihood (there is a Bayesian version too).

[Figure: a hard threshold (step function) vs. a soft threshold (logistic sigmoid)]

Logistic regression can be, and is, used for classification: predict the most likely y.
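The model formulas are missing from this transcript; the standard logistic-regression hypothesis, consistent with the sigmoid figure, is:

$$h_w(x) = \sigma(w^T x), \qquad \sigma(z) = \frac{1}{1 + e^{-z}},$$

interpreted as P(y = 1 | x) = h_w(x) and P(y = 0 | x) = 1 − h_w(x).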

Page 26:

Maximum Likelihood Logistic Regression

The negative log likelihood is convex, but we cannot solve for zero analytically.
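The likelihood itself is not shown in the transcript; with the hypothesis above and labels yᵢ ∈ {0, 1}, the standard negative log likelihood is:

$$\mathrm{NLL}(w) = -\sum_{i} \big[ y_i \ln h_w(x_i) + (1 - y_i)\ln(1 - h_w(x_i)) \big],$$

which is convex in w but has no closed-form minimizer.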

Page 27:

Descent Methods

Iteratively move toward a better solution, where f is twice continuously differentiable.

• Pick a start point x
• Repeat until the stopping criterion is satisfied:
  – Compute a descent direction v
  – Line search: compute a step size t
  – Update: x = x + t·v

Gradient descent: use the descent direction v = −∇f(x).

Page 28:

Line (Ray) Search

• Pick a start point x
• Repeat until the stopping criterion is satisfied:
  – Compute a descent direction v
  – Line search: compute a step size t
  – Update: x = x + t·v

How to choose the step size t:
• Solve analytically (if possible)
• Backtracking search [SL 9.2]: start high and decrease until an improving step is found
• Fix t to a small constant
• Use the size of the gradient, scaled by a small constant
• Start with a constant and let it decrease slowly, or decrease it when it is too high

Page 29:

Stopping Criteria

• The gradient becomes very small
• The maximum number of iterations is reached

Page 30:

Gradient Descent for Linear Reg.
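The formulas for this slide are not preserved in the transcript; consistent with the Matlab code on the next slide (with m = length(y)), the cost and its gradient are:

$$E(\theta) = \frac{1}{m}\lVert X\theta - y \rVert^2, \qquad \nabla E(\theta) = \frac{2}{m} X^T (X\theta - y).$$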

Page 31:

GD for Linear Regression (Matlab style)

function theta = GD(X, y, theta)
  LR = 0.1;                                      % learning rate
  for i = 1:50
    cost = (1/length(y)) * sum((X*theta - y).^2) % no semicolon: print cost to monitor convergence
    grad = (2/length(y)) * X' * (X*theta - y);   % gradient of the cost
    theta = theta - LR * grad;                   % take a step against the gradient
  end
end

Note that we do not scale the gradient to a unit vector.
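A hypothetical call (the toy data here is an assumption, not from the slides):

X = [ones(100,1), randn(100,1)];    % design matrix with an intercept column
y = X * [2; 3] + 0.1*randn(100,1);  % noisy targets from a known model
theta = GD(X, y, zeros(2,1))        % should approach [2; 3]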

Page 32:

[Figure: three plots comparing different learning rates]

Page 33:

Gradient Descent Jumps Around

Using exact line search, starting from (10, 1).

Page 34:

Gradient Descent Running Time

• Running time = number of iterations × cost per iteration.
• The cost per iteration is usually not a problem.
• The number of iterations clearly depends on the choice of line search and stopping criterion:
  – Very problem- and data-specific
  – Needs a lot of math to give bounds
  – We will not cover it in this course

Page 35:

Gradient Descent For Logistic Regression

Handin 1! Along with the multiclass extension.

Page 36:

Stochastic Gradient Descent

Pick a single data point at random and use its gradient (see the sketch below).

Mini-batch gradient descent: use K points chosen at random.
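The lecture's code for this is not in the transcript; a minimal sketch in the same Matlab style, reusing the linear-regression gradient from above (the learning rate and iteration count are assumptions):

function theta = SGD(X, y, theta)
  LR = 0.01;                % fixed learning rate (assumed)
  for iter = 1:5000
    i = randi(length(y));   % pick one data point at random
    grad = 2 * X(i,:)' * (X(i,:)*theta - y(i));  % gradient on that single point
    theta = theta - LR * grad;
  end
end

Replacing the single random index with K random indices (and averaging their gradients) gives mini-batch gradient descent.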

Page 37:

Linear Classification with K classes

• Use logistic regression, all-vs-one:
  – Train K classifiers, one for each class.
  – The input X is the same; y is 1 for all elements from that class and 0 otherwise (all vs. one).
  – Prediction: compute the probability under all K classifiers and output the class with the highest probability (see the sketch below).
• Use softmax regression:
  – An extension of the logistic function to K classes, in some sense.
  – Covered in Handin 1.
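A minimal sketch of the all-vs-one prediction step (the matrix W, holding one trained weight vector per class in its columns, is an assumption, not from the slides):

probs = 1 ./ (1 + exp(-(X * W)));  % probs(i,k): probability classifier k assigns to example i
[~, pred] = max(probs, [], 2);     % output the class with the highest probability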

Page 38:

Maximum Likelihood and Linear Regression (Time to spare slide)

Assume: the data points are generated independently.
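The model on the slide is not fully preserved; the standard assumption that makes maximum likelihood reduce to least squares is yᵢ = wᵀxᵢ + εᵢ with εᵢ ~ N(0, σ²) independently, giving:

$$\ln p(y \mid X, w) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_i (y_i - w^T x_i)^2,$$

so maximizing the likelihood over w is exactly minimizing the squared error Σᵢ (yᵢ − wᵀxᵢ)².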

Page 39:

Today's Summary

• Convex optimization
  – Many definitions
  – Local optima are global optima
  – Usually theoretically and practically feasible
• Maximum likelihood
  – Use the likelihood P(D | h) as a proxy for P(h | D)
  – Assume independent data
• Gradient descent
  – Minimizes a function by iteratively finding better solutions via local steps based on the gradient
  – First-order method (uses the gradient)
  – Other methods exist, e.g. second-order methods (use the Hessian)

