Page 1:

UVA CS 6316: Machine Learning

Lecture 3: Linear Regression Basics

Dr. Yanjun Qi

University of Virginia Department of Computer Science

9/18/19 Dr. Yanjun Qi / UVA CS 1

Page 2:

Course Content Plan → Six major sections of this course

• Regression (supervised): Y is a continuous variable
• Classification (supervised): Y is a discrete variable
• Unsupervised models: no Y
• Learning theory: about f()
• Graphical models: about interactions among X1, ..., Xp
• Reinforcement Learning: learn a program to interact with its environment

9/18/19 Yanjun Qi / UVA CS 2

Page 3:

Today: Multivariate Linear Regression in a Nutshell

• Task: y (Regression: y is continuous)
• Representation: x, f() (Y = weighted linear sum of X's)
• Score Function: L() (Sum of Squared Error / least squares)
• Search/Optimization: argmin() (Normal Equation / GD / SGD)
• Models, Parameters: f(), w, b (regression coefficients)

9/18/19 Dr. Yanjun Qi / UVA CS 3

Page 4:

A Dataset for regression

• Data / points / instances / examples / samples / records: [rows]
• Features / attributes / dimensions / independent variables / covariates / predictors / regressors: [columns, except the last]
• Target / outcome / response / label / dependent variable: special column to be predicted [last column], a continuous-valued variable

9/18/19 Dr. Yanjun Qi / UVA CS 4

Page 5:

SUPERVISED Regression

• e.g., the target Y is a continuous variable

9/18/19 Dr. Yanjun Qi / 5

• Training dataset consists of input-output pairs
• Goal: learn f and predict f(x?) for a new query input x?

[Slide figure: a small table of example input and output values, omitted here.]

Page 6:

9/18/19 Dr. Yanjun Qi / UVA CS 6

y_train = [ y_1, y_2, ..., y_n ]^T
y_test  = [ y_{n+1}, y_{n+2}, ..., y_{n+m} ]^T

X_train = matrix whose rows are x_1^T, x_2^T, ..., x_n^T
X_test  = matrix whose rows are x_{n+1}^T, x_{n+2}^T, ..., x_{n+m}^T

Page 7:

Today: Multivariate Linear Regression in a Nutshell

• Task: y (Regression: y is continuous)
• Representation: x, f() (Y = weighted linear sum of X's)
• Score Function: L() (Sum of Squared Error / least squares)
• Search/Optimization: argmin() (Normal Equation / GD / SGD)
• Models, Parameters: f(), w, b (regression coefficients)

9/18/19 Dr. Yanjun Qi / UVA CS 7

Page 8:

Review: f(x) is Linear when X is 1D

• f(x) = wx + b

A slope of 2 (i.e., w = 2) means that every 1-unit change in X yields a 2-unit change in Y.

[Slide figure: a line y = wx + b, with slope w and intercept b.]

9/18/19 Dr. Yanjun Qi / UVA CS 8

Page 9:

Review: f(x) is Linear when X is 1D (same content as the previous slide)

9/18/19 Dr. Yanjun Qi / UVA CS 9

Page 10:

9/18/19 Dr. Yanjun Qi / UVA CS 10

Page 11:

One concrete example

9/18/19 Dr. Yanjun Qi / UVA CS 11

Page 12:

9/18/19 Dr. Yanjun Qi / UVA CS 12

[Slide figure, left: rent vs. living area, with "Living area" as X, fit by y = f(x) = wx + b.]
[Slide figure, right: rent vs. living area and location, with "(Living area, Location)" as X, fit by y = f(x) = θ0 + θ1·x1 + θ2·x2.]

Page 13:

Linear SUPERVISED Regression

e.g. Linear Regression Models

9/18/19 Dr. Yanjun Qi / UVA CS 13

y = f(x) = θ0 + θ1·x1 + θ2·x2

=> Features x: e.g., living area, distance to campus, # bedrooms, ...
=> Target y: e.g., rent → continuous

Page 14:

Apply f(x): A Concise Notation

• Represent each sample x as a column vector, plus a pseudo feature
• We add a pseudo "feature" x0 = 1 (this is the intercept term) and re-define the feature vector to be:

  x^T = [ (x0 = 1), x1, x2, ..., xp ]

• The parameter vector θ is also a column vector:

  θ = [ θ0, θ1, ..., θp ]^T

• Then  y = f(x) = x^T θ = θ^T x

9/18/19 Dr. Yanjun Qi / UVA CS 14
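A minimal numpy sketch of this notation (the feature and parameter values below are made up for illustration):

import numpy as np

# one sample with p = 2 features (hypothetical values)
x = np.array([2.0, 1.5])
# prepend the pseudo feature x0 = 1 so the intercept folds into theta
x = np.hstack(([1.0], x))              # x = [1, x1, x2]
# parameter vector theta = [theta0, theta1, theta2] (hypothetical values)
theta = np.array([0.5, 0.3, -0.2])
# f(x) = x^T theta = theta^T x
y_hat = x @ theta
print(y_hat)                           # 0.5 + 0.3*2.0 - 0.2*1.5 = 0.8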

Page 15:

Review:

• Dot (or Inner) Product of two Vectors <x, y>

9/18/19 Dr. Yanjun Qi / UVA CS 15

• <x, y> is the sum of products of corresponding elements of the two vectors
• <x, y> = <y, x>
• We also often write <x, y> = x^T y
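For instance, in numpy (arbitrary example vectors):

import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
# sum of products of corresponding elements; the order of arguments does not matter
print(np.dot(x, y))   # 1*4 + 2*5 + 3*6 = 32.0
print(np.dot(y, x))   # same value: <x, y> = <y, x>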

Page 16:

Today: Multivariate Linear Regression in a Nutshell

• Task: y (Regression: y is continuous)
• Representation: x, f() (Y = weighted linear sum of X's)
• Score Function: L() (Sum of Squared Error / least squares)
• Search/Optimization: argmin() (Normal Equation / GD / SGD)
• Models, Parameters: f(), w, b (regression coefficients)

9/18/19 Dr. Yanjun Qi / UVA CS 16

Page 17:

One concrete example

9/18/19 Dr. Yanjun Qi / UVA CS 17

Page 18:

9/18/19 Dr. Yanjun Qi / UVA CS 18

[Slide figure: rent vs. living area (one feature), and rent vs. living area and location (two features).]

Page 19:

Now the loss function:

• Our goal is to pick the optimal θ that minimizes the following loss/cost function:

  J(θ) = (1/2) Σ_{i=1..n} ( f(x_i) - y_i )^2

SSE: Sum of Squared Errors

9/18/19 Dr. Yanjun Qi / UVA CS 19
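A small numpy sketch of this loss on made-up data (X already carries the pseudo feature column of ones):

import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 3
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, p))])   # pseudo feature x0 = 1
y = rng.normal(size=n)
theta = rng.normal(size=p + 1)

# J(theta) = 1/2 * sum_i (f(x_i) - y_i)^2
residuals = X @ theta - y
J = 0.5 * np.sum(residuals ** 2)
print(J)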

Page 20:

Now the loss function, via a concise (matrix) notation

• Using matrix form, we get the following general representation of the cost function:

  J(θ) = (1/2) Σ_{i=1..n} (x_i^T θ - y_i)^2
       = (1/2) (Xθ - y)^T (Xθ - y)
       = (1/2) (θ^T X^T X θ - θ^T X^T y - y^T X θ + y^T y)

  where X is the matrix whose i-th row is x_i^T:

  X = [ -- x_1^T -- ;  -- x_2^T -- ;  ... ;  -- x_n^T -- ]

9/18/19 Dr. Yanjun Qi / UVA CS 20
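A quick numerical check, on random data, that the expanded matrix form above agrees with the summation form:

import numpy as np

rng = np.random.default_rng(1)
n, p = 6, 2
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, p))])
y = rng.normal(size=n)
theta = rng.normal(size=p + 1)

lhs = 0.5 * np.sum((X @ theta - y) ** 2)        # 1/2 (X theta - y)^T (X theta - y)
rhs = 0.5 * (theta @ X.T @ X @ theta
             - theta @ X.T @ y
             - y @ X @ theta
             + y @ y)                           # expanded quadratic form
print(np.isclose(lhs, rhs))                     # True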

Page 21:

Review: Training Set in Matrix Form

•The whole Training set (with n samples) as matrix form :

9/18/19 Dr. Yanjun Qi / UVA CS 21

X = [ -- x_1^T -- ;
      -- x_2^T -- ;
      ...
      -- x_n^T -- ]
  = [ x_{1,0}  x_{1,1}  ...  x_{1,p} ;
      x_{2,0}  x_{2,1}  ...  x_{2,p} ;
      ...
      x_{n,0}  x_{n,1}  ...  x_{n,p} ]

y_train = [ y_1, y_2, ..., y_n ]^T

Page 22:

9/18/19 Dr. Yanjun Qi / UVA CS 22

Page 23:

9/18/19 Dr. Yanjun Qi / UVA CS 23

Page 24:

Review (IV): Derivative of a Quadratic Function

[Slide figure: plot of y = x^2 - 3 and its derivative y' = 2x.]

  y = x^2 - 3
  y' = 2x
  y'' = 2

This quadratic (convex) function is minimized at the unique point whose derivative (slope) is zero.

→ When finding zeros of the derivative of this function, we also find the minima (or maxima) of that function.

9/18/19 Dr. Yanjun Qi / UVA CS 24

Page 25:

One concrete example

9/18/19 Dr. Yanjun Qi / UVA CS 25

In Step [D], we solve the matrix equation via Gaussian elimination.

Page 26:

Today: Multivariate Linear Regression in a Nutshell

• Task: y (Regression: y is continuous)
• Representation: x, f() (Y = weighted linear sum of X's)
• Score Function: L() (Sum of Squared Error / least squares)
• Search/Optimization: argmin() (Normal Equation / GD / SGD)
• Models, Parameters: f(), w, b (regression coefficients)

9/18/19 Dr. Yanjun Qi / UVA CS 26

Page 27:

Method I: Normal Equations to Minimize the Loss

• Write the cost function in matrix form:

  J(θ) = (1/2) Σ_{i=1..n} (x_i^T θ - y_i)^2
       = (1/2) (Xθ - y)^T (Xθ - y)
       = (1/2) (θ^T X^T X θ - θ^T X^T y - y^T X θ + y^T y)

• To minimize J(θ), take the derivative and set it to zero:

  ∇_θ J(θ) = X^T X θ - X^T y = 0

  ⇒ X^T X θ = X^T y           (the normal equations)

  ⇒ θ* = (X^T X)^(-1) X^T y   (closed-form solution)

9/18/19 Dr. Yanjun Qi / UVA CS 27
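A minimal numpy sketch of the closed-form solution on synthetic data (the data and true_theta below are made up; np.linalg.solve is used instead of forming the explicit inverse):

import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 3
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, p))])
true_theta = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ true_theta + 0.01 * rng.normal(size=n)

# solve the normal equations  X^T X theta = X^T y
theta_star = np.linalg.solve(X.T @ X, X.T @ y)
print(theta_star)          # close to [1.0, 2.0, -1.0, 0.5]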

Page 28:

9/18/19 Dr. Yanjun Qi / UVA CS 28

Page 29:

Extra: Loss J() is Convex

9/18/19 Dr. Yanjun Qi / UVA CS 29

Page 30:

Review: Hessian Matrix

9/18/19 Dr. Yanjun Qi / UVA CS 30

Positive Definite Hessian

Page 31:

9/18/19 Dr. Yanjun Qi / UVA CS 31

Extra: Convex Function

• Intuitively, a convex function (1D case) has a single point at which the derivative goes to zero, and this point is a minimum.
• Intuitively, a function f (1D case) is convex on the range [a, b] if its second derivative is non-negative everywhere in that range.
• Intuitively, if a multivariate function's Hessian is PD (positive definite), the function is convex.
• Intuitively, we can think of positive definite matrices as the matrix analogue of positive numbers.

Our loss function J()'s Hessian is positive semi-definite (PSD).
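A small numpy check of this claim on a random design matrix (illustrative only): the Hessian of J(θ) is X^T X, and its eigenvalues are all non-negative.

import numpy as np

rng = np.random.default_rng(3)
X = np.hstack([np.ones((20, 1)), rng.normal(size=(20, 2))])

H = X.T @ X                        # Hessian of J(theta), up to a constant factor
eigvals = np.linalg.eigvalsh(H)    # H is symmetric, so eigvalsh applies
print(eigvals)                     # all >= 0 (PSD); all > 0 here since X has full column rank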

Page 32:

9/18/19 Dr. Yanjun Qi / UVA CS 32

→ By finding zeros of the derivative of this function, we can also find the minima (or maxima) of that function.

Intuitively, a convex function with a PD Hessian is minimized at the point whose
  ✓ derivative (slope) is zero
  ✓ gradient is the zero vector (multivariate case)

H = X^T X is the Gram matrix; when X is full rank, H is positive definite.

Page 33:

9/18/19 Dr. Yanjun Qi / UVA CS 33

Extra: The Gram matrix H = X^T X is always positive semi-definite

Because for any vector a:  a^T X^T X a = ||Xa||_2^2 >= 0

Besides, when X has full column rank, H is positive definite (PD) and invertible.

Page 34:

Later: Comments on the Normal Equation

• In most situations of practical interest, the number of data points n is larger than the dimensionality p of the input space and the matrix X is of full column rank. If this condition holds, then it is easy to verify that X^T X is necessarily invertible.
• The assumption that X^T X is invertible implies that it is positive definite, thus the critical point we find by setting the gradient to zero is a minimum.
• What if X has less than full column rank? → regularization (later); see the sketch below.

  θ* = (X^T X)^(-1) X^T y

9/18/19 Dr. Yanjun Qi / UVA CS 34
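Regularization is covered later; as a preview, here is a hedged sketch of the usual ridge fix, which replaces X^T X with X^T X + λI so the system stays solvable even when X is rank-deficient (the data and λ below are illustrative assumptions):

import numpy as np

def ridge_normal_equation(X, y, lam=1e-3):
    # theta = (X^T X + lam * I)^(-1) X^T y : solvable even if X^T X is singular
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# rank-deficient X: the third column duplicates the second, so X^T X is singular
X = np.array([[1.0, 2.0, 2.0],
              [1.0, 3.0, 3.0],
              [1.0, 5.0, 5.0],
              [1.0, 7.0, 7.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])
print(ridge_normal_equation(X, y))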

Page 35:

Today: Multivariate Linear Regression in a Nutshell

• Task: y (Regression: y is continuous)
• Representation: x, f() (Y = weighted linear sum of X's)
• Score Function: L() (Sum of Squared Error / least squares)
• Search/Optimization: argmin() (Normal Equation / GD / SGD)
• Models, Parameters: f(), w, b (regression coefficients)

9/18/19 Dr. Yanjun Qi / UVA CS 35

Page 36:

9/18/19 Dr. Yanjun Qi / UVA CS 36

Computational Cost (Naïve Way)

Page 37:

37

Extra: Naïve Matrix Multiply  {implements C = C + A*B}

for i = 1 to n
  for j = 1 to n
    for k = 1 to n
      C(i,j) = C(i,j) + A(i,k) * B(k,j)

[Slide figure: C(i,j) = C(i,j) + A(i,:) * B(:,j), i.e., row i of A times column j of B.]

The algorithm has 2*n^3 = O(n^3) flops and operates on 3*n^2 words of memory.
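A runnable Python version of the same triple loop (a sketch for illustration; in practice A @ B dispatches to optimized BLAS3 and is far faster):

import numpy as np

def naive_matmul(A, B, C):
    # C = C + A*B with three explicit loops: 2*n^3 flops
    n = A.shape[0]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i, j] += A[i, k] * B[k, j]
    return C

n = 50
A, B = np.random.rand(n, n), np.random.rand(n, n)
C = naive_matmul(A, B, np.zeros((n, n)))
print(np.allclose(C, A @ B))   # True, but A @ B is far faster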

Page 38:

9/18/19 Dr. Yanjun Qi / 38

Extra: Scalability to big data?

Page 39:

Many architecture and algorithm details to consider

• Data parallelization through CPU SIMD / multithreading / GPU parallelization / ...
• Memory hierarchy / locality
• Better algorithms, like Strassen's matrix multiply and many others

9/18/19 Dr. Yanjun Qi / UVA CS 39

Page 40:

Basic Linear Algebra Subroutines (BLAS) → numpy is a wrapper library over BLAS

• Industry standard interface (evolving)
• www.netlib.org/blas, www.netlib.org/blas/blast--forum
• Vendors and others supply optimized implementations
• History
  • BLAS1 (1970s): vector operations: dot product, saxpy (y = a*x + y), etc.
    m = 2*n words moved, f = 2*n flops, q = f/m = computational intensity ~1 or less
  • BLAS2 (mid 1980s): matrix-vector operations: matrix-vector multiply, etc.
    m = n^2, f = 2*n^2, q ~ 2; less overhead, somewhat faster than BLAS1
  • BLAS3 (late 1980s): matrix-matrix operations: matrix-matrix multiply, etc.
    m <= 3*n^2, f = O(n^3), so q = f/m can possibly be as large as n; BLAS3 is potentially much faster than BLAS2
• Good algorithms use BLAS3 when possible (LAPACK & ScaLAPACK)
  • See www.netlib.org/{lapack,scalapack}

source: Stanford Optim EE course
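The three BLAS levels map directly onto familiar numpy calls (illustrative sizes; numpy forwards these to the underlying BLAS implementation):

import numpy as np

n = 1000
a = 2.0
x, y = np.random.rand(n), np.random.rand(n)
A, B = np.random.rand(n, n), np.random.rand(n, n)

d = x @ y        # BLAS1: dot product, O(n) flops on O(n) data
z = a * x + y    # BLAS1: saxpy
v = A @ x        # BLAS2: matrix-vector multiply, O(n^2) flops on O(n^2) data
C = A @ B        # BLAS3: matrix-matrix multiply, O(n^3) flops on O(n^2) data (highest intensity)
print(d, v.shape, C.shape)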

Page 41:

9/18/19 Dr. Yanjun Qi / UVA CS 41

BLAS performance is very much system dependent, e.g., https://www.hoffman2.idre.ucla.edu/blas_benchmark/

Page 42:

Today: Multivariate Linear Regression in a Nutshell

• Task: y (Regression: y is continuous)
• Representation: x, f() (Y = weighted linear sum of X's)
• Score Function: L() (Sum of Squared Error / least squares)
• Search/Optimization: argmin() (Normal Equation / GD / SGD)
• Models, Parameters: f(), w, b (regression coefficients)

9/18/19 Dr. Yanjun Qi / UVA CS 42

Page 43:

Next: More Regression (supervised)

• Four ways to train / perform optimization for linear regression models (a gradient-descent sketch follows below)
  • Normal Equation
  • Gradient Descent (GD)
  • Stochastic GD
  • Newton's method
• Supervised regression models
  • Linear regression (LR)
  • LR with non-linear basis functions
  • Locally weighted LR
  • LR with regularizations

9/18/19 Dr. Yanjun Qi / UVA CS 43
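As a preview of the gradient-descent option listed above, a minimal, untuned numpy sketch (the learning rate, iteration count, and synthetic data are illustrative assumptions, not values from the lecture):

import numpy as np

def gd_linear_regression(X, y, lr=0.01, iters=2000):
    # gradient descent on J(theta) = 1/2 ||X theta - y||^2 (gradient averaged over samples)
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / len(y)
        theta -= lr * grad
    return theta

rng = np.random.default_rng(4)
X = np.hstack([np.ones((200, 1)), rng.normal(size=(200, 2))])
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)
print(gd_linear_regression(X, y))   # approaches the normal-equation solution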

Page 44:

Probabilistic Interpretation of Linear Regression (LATER)

• Let us assume that the target variable and the inputs are related by the equation:

  y_i = θ^T x_i + ε_i

  where ε is an error term of unmodeled effects or random noise.

• Now assume that ε follows a Gaussian N(0, σ). Then we have:

  p(y_i | x_i; θ) = (1 / (√(2π) σ)) · exp( -(y_i - θ^T x_i)^2 / (2σ^2) )

• By the i.i.d. (among samples) assumption:

  L(θ) = Π_{i=1..n} p(y_i | x_i; θ) = (1 / (√(2π) σ))^n · exp( -Σ_{i=1..n} (y_i - θ^T x_i)^2 / (2σ^2) )

9/18/19 Dr. Yanjun Qi / UVA CS 44

Many more variations of linear regression follow from this perspective, e.g., binomial / Poisson (LATER).
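Taking the log of L(θ) makes the link to least squares explicit (a standard step, written in LaTeX here for completeness):

\log L(\theta) = n \log \frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{2\sigma^2} \sum_{i=1}^{n} \left( y_i - \theta^T \mathbf{x}_i \right)^2

so maximizing the likelihood over θ is the same as minimizing the sum-of-squared-errors loss J(θ).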

Page 45:

References

• Big thanks to Prof. Eric Xing @ CMU for allowing me to reuse some of his slides

• http://www.cs.cmu.edu/~zkolter/course/15-884/linalg-review.pdf (please read)
• Prof. Alexander Gray's slides

9/18/19 Dr. Yanjun Qi / UVA CS 45

Page 46:

EXTRA

9/18/19 Dr. Yanjun Qi / UVA CS 46

Page 47:

Review (I):

• The sum of the squared elements of a vector equals the vector's dot product with itself

9/18/19 Dr. Yanjun Qi / UVA CS 47

  a = [5, 2, 8]^T,   a^T = [5  2  8]

  a^T a = [5  2  8] · [5, 2, 8]^T = 5^2 + 2^2 + 8^2 = 93

  J(θ) = (1/2) Σ_{i=1..n} (x_i^T θ - y_i)^2

Page 48:

Review (I):

9/18/19 Dr. Yanjun Qi / UVA CS 48

  a = [ x_1^T θ - y_1,  x_2^T θ - y_2,  ...,  x_n^T θ - y_n ]^T = Xθ - y

  a^T a = Σ_{i=1..n} (x_i^T θ - y_i)^2 = 2·J(θ)

Page 49:

Review (II): gradient of linear form

9/18/19 Dr. Yanjun Qi / 49

One concrete example:

  f(w) = w^T a = [w1, w2, w3] · [1, 2, 3]^T = w1 + 2·w2 + 3·w3

  ∂f/∂w = ∂(w^T a)/∂w = a = [1, 2, 3]^T

More generally:

  ∂(θ^T X^T y)/∂θ = X^T y

Page 50:

Review (III): Gradient of Quadratic Form

• See L2-note.pdf -> Page 17, Pages 23-24
• See the white board

9/18/19 Dr. Yanjun Qi / UVA CS 50

  ∂(θ^T X^T X θ)/∂θ = ∂(θ^T G θ)/∂θ = 2·G·θ = 2·X^T X·θ,   where G = X^T X
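A quick numerical sanity check of this identity using central finite differences (random data for illustration):

import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(8, 3))
theta = rng.normal(size=3)

f = lambda t: t @ X.T @ X @ t            # f(theta) = theta^T X^T X theta
analytic = 2 * X.T @ X @ theta           # claimed gradient

eps = 1e-6
numeric = np.zeros(3)
for j in range(3):
    e = np.zeros(3)
    e[j] = eps
    numeric[j] = (f(theta + e) - f(theta - e)) / (2 * eps)

print(np.allclose(analytic, numeric))    # True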

Page 51:

Review (III): Single-Variable Calculus to Multivariate Calculus

Single-variable: derivative; second-order derivative
Multivariate: partial derivative, gradient, directional partial derivative, vector field, contour map of a function, surface map of a function; Hessian matrix, Jacobian matrix (vector in / vector out)

9/18/19 Dr. Yanjun Qi / 51

Page 52:

Review (IV) : Definitions of gradient (Matrix_calculus / Scalar-by-vector)

9/18/19 Dr. Yanjun Qi / 52

• The size of the gradient is always the same as the size of the variable
• In principle, gradients are a natural extension of partial derivatives to functions of multiple variables

Page 53:

Review (V): Rank of a Matrix

• rank(A), the rank of an n-by-m matrix A, is
  = the maximal number of linearly independent columns
  = the maximal number of linearly independent rows
• If A is n by m, then
  • rank(A) <= min(m, n)
  • If rank(A) = n, then A has full row rank
  • If rank(A) = m, then A has full column rank

9/18/19 Dr. Yanjun Qi / 53

• If A is n*n, rank(A) = n iff A is invertible
• rank(AB) <= min( rank(A), rank(B) )
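For example, with numpy (illustrative matrices):

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])          # second row = 2 * first row
print(np.linalg.matrix_rank(A))          # 1 < min(2, 3): not full rank

B = np.array([[1.0, 0.0],
              [0.0, 3.0]])
print(np.linalg.matrix_rank(B))          # 2 = n: full rank, so B is invertible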

Page 54:

Extra: positive semi-definite!

9/18/19 Dr. Yanjun Qi / UVA CS 54

L2-Note: Page 17

See proof on L2-Note: Page 18

Page 55:

Extra: Scalability to big data?

• Traditional CS view: polynomial-time algorithm, wow!
• Large-scale learning: sometimes even O(n) is bad! => Many state-of-the-art solutions (e.g., low rank, sparse, hardware, sampling, randomized, ...)

9/18/19 Dr. Yanjun Qi / UVA CS 55

Page 56:

SIMD: Single Instruction, Multiple Data

56

• Scalar processing
  • traditional mode
  • one operation produces one result
• SIMD processing
  • with SSE / SSE2 (SSE = streaming SIMD extensions)
  • one operation produces multiple results

[Slide figure: a scalar add X + Y produces one result; a SIMD add of packed registers (x3, x2, x1, x0) + (y3, y2, y1, y0) produces (x3+y3, x2+y2, x1+y1, x0+y0) in one operation.]

Slide Source: Alex Klimovitski & Dean Macri, Intel Corporation

Page 57:

Memory Hierarchy

• Most programs have a high degree of locality in their accesses
  • spatial locality: accessing things nearby previous accesses
  • temporal locality: reusing an item that was previously accessed
• The memory hierarchy tries to exploit locality to improve average access time

57

[Slide figure: processor (registers, datapath, control) -> on-chip cache -> second-level cache (SRAM) -> main memory (DRAM) -> secondary storage (disk) -> tertiary storage (disk/tape).]

Approximate speed and size per level:
  registers and on-chip cache: ~1 ns, KB
  second-level cache (SRAM): ~10 ns, MB
  main memory (DRAM): ~100 ns, GB
  secondary storage (disk): ~10 ms, TB
  tertiary storage (disk/tape): ~10 s, PB

http://www.cs.berkeley.edu/~demmel/cs267_Spr12/

Page 58:

58

Note on Matrix Storage

• A matrix is a 2-D array of elements, but memory addresses are "1-D"
• Conventions for matrix layout
  • by column, or "column major" (Fortran default): A(i,j) at A + i + j*n
  • by row, or "row major" (C default): A(i,j) at A + i*n + j
  • recursive (later)
• Column major (for now)

[Slide figure: the same matrix laid out in column-major vs. row-major order; in a column-major matrix in memory, a single row of the matrix is spread across many cachelines.]

http://www.cs.berkeley.edu/~demmel/cs267_Spr12/
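In numpy the same two conventions show up as C order (row major, the default) versus Fortran order (column major); a small illustrative check:

import numpy as np

A = np.arange(20).reshape(4, 5)          # row-major ("C order") by default
F = np.asfortranarray(A)                 # same values, stored column-major ("Fortran order")

print(A.flags['C_CONTIGUOUS'], F.flags['F_CONTIGUOUS'])   # True True
print(A.ravel(order='K')[:5])            # memory order of A: 0 1 2 3 4 (first row)
print(F.ravel(order='K')[:5])            # memory order of F: 0 5 10 15 1 (first column, then on)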

Page 59:

Strassenʼs Matrix Multiply

• The traditional algorithm (with or without tiling) has O(n^3) flops
• Strassen discovered an algorithm with asymptotically lower flops: O(n^2.81)
• Consider a 2x2 matrix multiply: normally it takes 8 multiplies and 4 adds; Strassen does it with 7 multiplies and 18 adds

Let M = [ m11 m12 ; m21 m22 ] = [ a11 a12 ; a21 a22 ] * [ b11 b12 ; b21 b22 ]

Let  p1 = (a12 - a22) * (b21 + b22)        p5 = a11 * (b12 - b22)
     p2 = (a11 + a22) * (b11 + b22)        p6 = a22 * (b21 - b11)
     p3 = (a11 - a21) * (b11 + b12)        p7 = (a21 + a22) * b11
     p4 = (a11 + a12) * b22

Then m11 = p1 + p2 - p4 + p6
     m12 = p4 + p5
     m21 = p6 + p7
     m22 = p2 - p3 + p5 - p7

Extends to nxn by divide&conquer

http://www.cs.berkeley.edu/~demmel/cs267_Spr12/
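A small numpy check, on random 2x2 matrices, that the seven products above reproduce the ordinary matrix product:

import numpy as np

rng = np.random.default_rng(6)
a = rng.normal(size=(2, 2))
b = rng.normal(size=(2, 2))

# a[0,0] = a11, a[0,1] = a12, a[1,0] = a21, a[1,1] = a22 (same for b)
p1 = (a[0, 1] - a[1, 1]) * (b[1, 0] + b[1, 1])
p2 = (a[0, 0] + a[1, 1]) * (b[0, 0] + b[1, 1])
p3 = (a[0, 0] - a[1, 0]) * (b[0, 0] + b[0, 1])
p4 = (a[0, 0] + a[0, 1]) * b[1, 1]
p5 = a[0, 0] * (b[0, 1] - b[1, 1])
p6 = a[1, 1] * (b[1, 0] - b[0, 0])
p7 = (a[1, 0] + a[1, 1]) * b[0, 0]

M = np.array([[p1 + p2 - p4 + p6, p4 + p5],
              [p6 + p7,           p2 - p3 + p5 - p7]])
print(np.allclose(M, a @ b))   # True: 7 multiplies instead of 8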

Page 60:

Strassen (continued)

T(n) = cost of multiplying n x n matrices = 7*T(n/2) + 18*(n/2)^2 = O(n^(log2 7)) = O(n^2.81)

60

• Asymptotically faster
  • Several times faster for large n in practice
  • Cross-over depends on the machine
  • "Tuning Strassen's Matrix Multiplication for Memory Efficiency", M. S. Thottethodi, S. Chatterjee, and A. Lebeck, in Proceedings of Supercomputing '98
• Possible to extend the communication lower bound to Strassen
  • #words moved between fast and slow memory = Ω( n^(log2 7) / M^((log2 7)/2 - 1) ) ~ Ω( n^2.81 / M^0.4 )   (Ballard, D., Holtz, Schwartz, 2011)
  • Attainable too

http://www.cs.berkeley.edu/~demmel/cs267_Spr12/

