

CS 7455

MOBILE APP DEVELOPMENT

Dr. Mingon Kang

Computer Science, Kennesaw State University

Terminology

Features

An individual measurable property of a phenomenon being observed

The number of features or distinct traits that can be used to describe each item in a quantitative manner

May have implicit/explicit patterns to describe a phenomenon

Samples

Items to process (classify or cluster)

Can be a document, a picture, a sound, a video, or a patient

Reference: http://www.slideshare.net/rahuldausa/introduction-to-machine-learning-38791937

Data in Machine Learning

x_i: input vector, independent variable

y: response variable, dependent variable

y ∈ {−1, 1}: binary classification

y ∈ ℝ: regression

Predict a label y for a newly observed input x

Types of Variables

Categorical variable: discrete or qualitative variables

Nominal:

Have two or more categories, but which do not have an intrinsic order

Dichotomous:

Nominal variables which have only two categories or levels.

Ordinal:

Have two or more categories, which can be ordered or ranked.

Continuous variable
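
For concreteness, here is a minimal sketch of how these variable types might be stored (assuming Python with pandas, which the slides do not mention; the patient data are made up for illustration):

    import pandas as pd

    # Hypothetical patient data, for illustration only
    df = pd.DataFrame({
        "blood_type":  ["A", "O", "AB", "B"],       # nominal: no intrinsic order
        "smoker":      ["yes", "no", "no", "yes"],  # dichotomous: exactly two levels
        "tumor_stage": ["I", "III", "II", "I"],     # ordinal: ordered categories
        "age":         [54.0, 61.5, 47.2, 70.1],    # continuous
    })

    # Mark the ordinal column as an ordered categorical so comparisons respect the ranking
    df["tumor_stage"] = pd.Categorical(df["tumor_stage"],
                                       categories=["I", "II", "III"], ordered=True)
    print(df.dtypes)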

Mathematical Notation

Matrix: uppercase bold Roman letter, X

Vector: lowercase bold Roman letter, x

Scalar: lowercase letter

Transpose of a matrix or vector: superscript T or ′

E.g.

Row vector: (x_1, x_2, …, x_p)

Corresponding column vector: x = (x_1, x_2, …, x_p)^T

Matrix: X = {x_1, x_2, …, x_p}

Transpose of a Matrix

Operator which flips a matrix over its diagonal

Switch the row and column indices of the matrix

Denoted as A^T, A′, A^tr, or A^t

[A^T]_ij = [A]_ji; if A is an m × n matrix, A^T is an n × m matrix

(A^T)^T = A

(A + B)^T = A^T + B^T

(AB)^T = B^T A^T
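
A quick numerical check of these identities, sketched in Python with NumPy (an assumption; the slides show no code):

    import numpy as np

    A = np.array([[1., 2., 3.],
                  [4., 5., 6.]])   # a 2 x 3 matrix
    B = np.array([[1., 0.],
                  [2., 1.],
                  [0., 3.]])       # a 3 x 2 matrix

    print(A.T.shape)                          # (3, 2): transpose swaps row and column indices
    print(np.allclose(A.T.T, A))              # (A^T)^T = A
    print(np.allclose((A @ B).T, B.T @ A.T))  # (AB)^T = B^T A^T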

Inverse of a Matrix

The inverse of a square matrix A, sometimes called a reciprocal matrix, is a matrix A^{-1} such that

A A^{-1} = I,

where I is the identity matrix.

It is the same idea as the reciprocal of a number, but we write it A^{-1}.

Reference: https://www.mathsisfun.com/algebra/matrix-inverse.html
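
A small sketch (again assuming NumPy) showing that multiplying a square matrix by its inverse gives the identity:

    import numpy as np

    A = np.array([[4., 7.],
                  [2., 6.]])       # an invertible 2 x 2 matrix

    A_inv = np.linalg.inv(A)       # raises LinAlgError if A is singular
    print(A_inv)
    print(np.allclose(A @ A_inv, np.eye(2)))   # A A^{-1} = I, up to floating-point error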

What is Machine Learning?

[Diagram] Training: data → model f(x)

What is Machine Learning?

[Diagram] Prediction: new data → f(x) → make a decision

Supervised learning

Data: D = {d_1, d_2, …, d_n}, a set of n samples

where d_i = <x_i, y_i>

x_i is an input vector and y_i is the desired output

Objective: learn the mapping f: X → y

s.t. y_i ≈ f(x_i) for all i = 1, …, n

Regression: y is continuous

Classification: y is discrete
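
As a minimal illustration of this setup (toy data made up for this example; Python with NumPy assumed), a supervised data set is just paired inputs and outputs:

    import numpy as np

    # n = 4 samples, p = 2 features: each row of X is one input vector x_i
    X = np.array([[1.0, 2.0],
                  [2.0, 0.5],
                  [3.0, 1.5],
                  [4.0, 3.0]])

    y_regression = np.array([3.1, 2.4, 4.6, 7.2])   # continuous targets -> regression
    y_classification = np.array([-1, -1, 1, 1])     # discrete labels in {-1, 1} -> classification

    # The learning task: find f such that f(X[i]) is close to y[i] for every i.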

Linear Regression

Review of Linear Regression in Statistics


Reference: https://lagunita.stanford.edu/c4x/HumanitiesScience/StatLearning/asset/linear_regression.pdf

Linear Regression

How to represent the data as a vector/matrix

We assume a model:

y = b_0 + Xb + ε,

where b_0 and b are the intercept and slope, known as coefficients or parameters. ε is the error term (typically assumed to follow ε ~ N(0, σ²)).

Linear Regression

How to represent the data as a vector/matrix

Include a bias constant (intercept) in the input vector

X ∈ ℝ^{n×(p+1)}, y ∈ ℝ^n, and b ∈ ℝ^{p+1}

X = {1, x_1, x_2, …, x_p}

y = {y_1, y_2, …, y_n}^T

b = {b_0, b_1, b_2, …, b_p}^T
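
One way to build this representation, sketched in Python with NumPy (not from the slides; the numbers are made up), is to prepend a column of ones to the raw feature matrix:

    import numpy as np

    features = np.array([[2.0, 1.0],
                         [0.5, 3.0],
                         [1.5, 2.5]])   # n = 3 samples, p = 2 features

    n = features.shape[0]
    X = np.column_stack([np.ones(n), features])   # X in R^{n x (p+1)}: first column is the intercept
    y = np.array([4.0, 7.5, 6.0])                 # y in R^n
    print(X.shape, y.shape)                       # (3, 3) (3,)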

Linear Regression

Find the optimal coefficient vector b whose predictions are most similar to the observed responses

y ≈ Xb:

[ y_1 ]     [ 1  x_11 ⋯ x_p1 ] [ b_0 ]
[  ⋮  ]  ≈  [ ⋮   ⋮   ⋱   ⋮  ] [  ⋮  ]
[ y_n ]     [ 1  x_1n ⋯ x_pn ] [ b_p ]

or, with an explicit error term, y = Xb + e:

[ y_1 ]     [ 1  x_11 ⋯ x_p1 ] [ b_0 ]   [ e_1 ]
[  ⋮  ]  =  [ ⋮   ⋮   ⋱   ⋮  ] [  ⋮  ] + [  ⋮  ]
[ y_n ]     [ 1  x_1n ⋯ x_pn ] [ b_p ]   [ e_n ]

Ordinary Least Squares (OLS)

y = Xb + e

Estimate the unknown parameters (b) in the linear regression model

Minimize the sum of squared differences between the observed responses and the values predicted by the linear function

Residual Sum of Squares (RSS) = ∑_{i=1}^{n} (y_i − x_i b)²
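
A sketch of computing the RSS for a given coefficient vector (hypothetical X, y, and b; NumPy assumed):

    import numpy as np

    def rss(X, y, b):
        """Residual Sum of Squares: sum over i of (y_i - x_i b)^2."""
        residuals = y - X @ b
        return np.sum(residuals ** 2)

    # Tiny made-up example; the intercept column of ones is already included in X
    X = np.array([[1.0, 2.0],
                  [1.0, 0.5],
                  [1.0, 3.0]])
    y = np.array([5.1, 2.0, 7.2])
    b = np.array([1.0, 2.0])
    print(rss(X, y, b))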

Ordinary Least Squares (OLS)

Optimization

Need to minimize the error

min_b J(b) = ∑_{i=1}^{n} (y_i − x_i b)²

To obtain the optimal set of parameters b, the derivative of the error w.r.t. each parameter must be zero.

Optimization

J = e^T e = (y − Xb)^T (y − Xb)
  = (y^T − b^T X^T)(y − Xb)
  = y^T y − y^T Xb − b^T X^T y + b^T X^T X b
  = y^T y − 2 b^T X^T y + b^T X^T X b

∂(e^T e)/∂b = −2 X^T y + 2 X^T X b = 0

(X^T X) b = X^T y  ⇒  b̂ = (X^T X)^{-1} X^T y

Matrix Cookbook: https://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf
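
A minimal implementation of this closed-form solution (NumPy assumed; the data are simulated for illustration). np.linalg.solve is applied to the normal equations rather than forming the inverse explicitly, which is numerically safer but gives the same b:

    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 100, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])   # design matrix with intercept column
    true_b = np.array([2.0, 1.0, -0.5, 3.0])
    y = X @ true_b + 0.1 * rng.normal(size=n)                    # y = Xb + e

    # Normal equations: (X^T X) b = X^T y  =>  b_hat = (X^T X)^{-1} X^T y
    b_hat = np.linalg.solve(X.T @ X, X.T @ y)
    print(b_hat)

    # np.linalg.lstsq solves the same least-squares problem directly
    b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(np.allclose(b_hat, b_lstsq))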

Linear regression for classification

For binary classification

Encode class labels as y ∈ {0, 1} or {−1, 1}

Apply OLS

Check which class the prediction is closer to

If class 1 is encoded as +1 and class 2 as −1:

class 1 if f(x) ≥ 0

class 2 if f(x) < 0
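
A sketch of this decision rule (toy one-dimensional data made up for the example; NumPy assumed): fit OLS to the ±1 labels and threshold the prediction at 0:

    import numpy as np

    # Toy 1-D data: class 1 (+1) clusters around x = 3, class 2 (-1) around x = 1
    x = np.array([2.9, 3.2, 3.5, 0.8, 1.1, 1.3])
    y = np.array([ 1,   1,   1,  -1,  -1,  -1])

    X = np.column_stack([np.ones_like(x), x])   # add intercept column
    b = np.linalg.solve(X.T @ X, X.T @ y)       # OLS fit

    def predict_class(x_new):
        f = b[0] + b[1] * x_new                 # f(x)
        return 1 if f >= 0 else 2               # class 1 if f(x) >= 0, else class 2

    print(predict_class(3.0), predict_class(1.0))   # expected: 1 2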

Logistic regression

We will cover this later

Linear Model in Computer Vision

Features are pixel values in an image

Data matrix X:

An image is a two-dimensional matrix

Need to reshape the 2D matrix into a 1D array

A Height × Width matrix becomes an array of length Height × Width for a single image

If we have N samples, X will be an N × (Height × Width) matrix

Assume that every pixel is independent
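
A sketch of this reshaping step (random numbers stand in for real image data; NumPy assumed):

    import numpy as np

    N, H, W = 5, 28, 28                  # e.g. five 28 x 28 grayscale images
    images = np.random.rand(N, H, W)     # stand-in for real pixel values

    # Flatten each Height x Width image into a single row of length Height * Width
    X = images.reshape(N, H * W)
    print(X.shape)                       # (5, 784): an N by (Height * Width) data matrix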

Linear Model in Computer Vision

Applications

Any classification and regression problems:

Gender/age classification

Hand-written digit classification

MNIST

However, linear models are not the best for classification problems

Logistic regression is a linear model for classification.
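
For example, a minimal hand-written digit classifier along these lines could be sketched with scikit-learn (an assumption; the slides do not name a library), using its small built-in digits dataset rather than full MNIST:

    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # 8 x 8 digit images, already flattened to 64 features per sample
    digits = load_digits()
    X_train, X_test, y_train, y_test = train_test_split(
        digits.data, digits.target, test_size=0.25, random_state=0)

    # Logistic regression: a linear model used for classification
    clf = LogisticRegression(max_iter=5000)
    clf.fit(X_train, y_train)
    print(clf.score(X_test, y_test))     # accuracy on held-out digits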