Chapter ML:I

I. Introduction
- Examples of Learning Tasks
- Specification of Learning Tasks
- Elements of Machine Learning
- Comparative Syntax Overview
- Functions Overview
- Algorithms Overview
- Classification Approaches Overview

Elements of Machine Learning (1) Model Formation: Real World → Model World

[Figure: model formation. Objects o ∈ O are assigned to classes in C via γ(o) and mapped to feature vectors in X via α(o); the learned model function y approximates the target concept on the feature vectors, c(x) ≈ y(x).]
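
A minimal sketch of these mappings in code, assuming a made-up email-classification scenario with hypothetical attribute names: α maps a real-world object to its feature vector, and γ supplies the class label attached to that vector.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class EmailObject:            # a real-world object o ∈ O (hypothetical example)
    num_links: int
    num_spam_words: int
    length: int

def alpha(o: EmailObject) -> List[float]:
    """Map an object o to its feature vector x in the feature space X."""
    return [float(o.num_links), float(o.num_spam_words), float(o.length)]

def gamma(o: EmailObject) -> int:
    """Ideal classifier: the true class of o in C = {-1, +1}."""
    # In practice γ is unknown; labels are obtained, e.g., by human annotation.
    return 1 if o.num_spam_words > 5 else -1

o = EmailObject(num_links=3, num_spam_words=7, length=120)
x = alpha(o)                  # feature vector used when learning c(x) ≈ y(x)
label = gamma(o)              # class label paired with x in the training data
print(x, label)
```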

Related questions:

- From what kind of experience should be learned?
- Which level of fidelity is sufficient to solve a certain task?

Elements of Machine Learning (2) Design Choices for Model Function Construction: LMS in a Nutshell

Design choice → LMS instantiation:

- Task: binary classification
- Data: D = {(x_1, c(x_1)), ..., (x_n, c(x_n))} ⊆ X × {−1, 1}
- Model function: linear model, y(x) = w_0 + ∑_{i=1}^{p} w_i x_i
- Hypothesis space: w ∈ R^{p+1}
- Optimization objective: minimize the squared loss; loss: sum of squared residuals; regularization: none
- Optimization approach: stochastic gradient descent
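
A minimal code sketch of this setup, assuming made-up data and illustrative values for the learning rate and the number of epochs: a linear model with squared loss and no regularization, fitted by stochastic gradient descent on examples labeled −1 or +1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: n examples with p features, labels c(x) in {-1, +1}.
n, p = 200, 2
X = rng.normal(size=(n, p))
c = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 1.0, -1.0)

w = np.zeros(p + 1)                     # hypothesis space: w in R^(p+1)
eta = 0.01                              # learning rate (illustrative)

def y(x, w):
    """Linear model y(x) = w0 + sum_i w_i * x_i."""
    return w[0] + w[1:] @ x

for epoch in range(50):                 # SGD: one update per training example
    for i in rng.permutation(n):
        residual = c[i] - y(X[i], w)    # drives the squared-loss gradient
        w[0]  += eta * residual         # step for the bias w0 (factor 2 absorbed in eta)
        w[1:] += eta * residual * X[i]  # step for the weights w1..wp

pred = np.sign(w[0] + X @ w[1:])
print("training accuracy:", np.mean(pred == c))
```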

Related questions:

- What are useful classes of model functions?
- What are methods to fit (= learn) model functions?
- What are measures to assess the goodness of fit?
- How does (label) noise affect the learning process?
- How does the number of examples affect the learning process?
- How to deal with extreme class imbalance?

Elements of Machine Learning (3) Feature Space Structure

The feature space is an inner product space.

- An inner product space (also called pre-Hilbert space) is a vector space with an additional structure called "inner product".
- Example: a Euclidean vector space equipped with the dot product.
- Enables algorithms such as gradient descent and support vector machines.

The feature space is a σ-algebra.

- A σ-algebra on a set X is a collection of subsets of X that includes X itself, is closed under complement, and is closed under countable unions.
- Enables probability spaces and statistical learning, such as naive Bayes.

The feature space is a finite set of vectors with nominal dimensions.

- Requires concept learning via set splitting, as done by decision trees.
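
A small sketch of the first case, with illustrative vectors: the dot product serves as the inner product of a Euclidean feature space; it yields lengths and angles and appears, for instance, in linear decision functions of the kind used by support vector machines.

```python
import numpy as np

def inner(u, v):
    """Standard inner product (dot product) on R^p."""
    return float(np.dot(u, v))

x = np.array([1.0, 2.0, -0.5])        # a feature vector (illustrative values)
w = np.array([0.3, -1.0, 2.0])        # a weight vector (illustrative values)
b = 0.1

norm_x = np.sqrt(inner(x, x))         # the inner product induces a norm (length)
decision = np.sign(inner(w, x) + b)   # linear decision function, SVM-style
print(norm_x, decision)
```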

Remarks:

- The aforementioned examples of feature spaces are not meant to be complete. However, they illustrate a broad range of structures underlying the example sets we want to learn from.

- The structure of a feature space constrains the applicable learning algorithms. Usually, this structure is inherently determined by the application domain and cannot be chosen.

Elements of Machine Learning (4) Discriminative vs. Generative Approach to Classification

- Discriminative classifiers (models) learn a boundary between classes.
- Generative classifiers exploit the distributions underlying the classes.

[Figure: two scatter plots of labeled examples (− and +) over the features x1 and x2, together with unlabeled query points (?). One panel illustrates the discriminative view (classification rule / decision boundary), the other the generative view (class membership probability).]
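
A minimal sketch of the two approaches, with made-up two-dimensional data, using scikit-learn's LogisticRegression as a discriminative model and GaussianNB as a generative model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(1)

# Made-up 2D data: two Gaussian clusters for the classes -1 and +1.
X_neg = rng.normal(loc=[-1.5, -1.0], scale=0.8, size=(50, 2))
X_pos = rng.normal(loc=[+1.5, +1.0], scale=0.8, size=(50, 2))
X = np.vstack([X_neg, X_pos])
y = np.array([-1] * 50 + [+1] * 50)

# Discriminative: models P(Y | X) directly, i.e., learns a class boundary.
disc = LogisticRegression().fit(X, y)

# Generative: models class-conditional densities P(X | Y) and priors P(Y),
# then classifies via Bayes' rule.
gen = GaussianNB().fit(X, y)

query = np.array([[0.2, 0.1]])          # an unlabeled query point ("?")
print("discriminative P(Y | x):", disc.predict_proba(query))
print("generative     P(Y | x):", gen.predict_proba(query))
```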

Remarks:

- When classifying a new example, (1) discriminative classifiers apply a decision rule that was learned via minimizing the misclassification rate given training examples D, while (2) generative classifiers maximize the probability of the combined event P(X=x, Y=y) or, similarly, the a-posteriori probability P(Y=y | X=x), y ∈ {⊖, ⊕}.

- The LMS algorithm computes "only" a decision boundary, i.e., it constructs a discriminative classifier. A Bayes classifier is an example of a generative model.

- Yoav Freund provides an excellent video illustrating the pros and cons of discriminative and generative models respectively. [YouTube]

- Discriminative models may be further differentiated into models that also determine the posterior class probabilities P(Y=y | X=x) (without computing the joint probabilities P(X=x, Y=y)) and those that do not. In the latter case, only a so-called "discriminant function" is computed.
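
A small sketch of this last distinction, with made-up data and scikit-learn models: LogisticRegression also yields posterior class probabilities, whereas LinearSVC computes only a discriminant function, i.e., a signed score without a probabilistic interpretation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)     # made-up labels

# Discriminative model that provides posterior probabilities P(Y=y | X=x):
logreg = LogisticRegression().fit(X, y)
print(logreg.predict_proba(X[:1]))          # probabilities for both classes

# Discriminative model that provides only a discriminant function:
svm = LinearSVC().fit(X, y)
print(svm.decision_function(X[:1]))         # a signed score, no probability
```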

Elements of Machine Learning (5) Frequentist vs. Subjectivist Approach to Parameter Estimation

Frequentism:

- There is a (hidden) mechanism that generates D.

- To model this mechanism you consider
  – a family of distributions, or
  – a model function, or
  – a combination of both,
  parameterized by θ. The possible values for θ form the hypothesis space H.

- Select a most probable hypothesis h_ML ∈ H by estimating θ using a sample D′ ⊂ D. h_ML is called maximum likelihood hypothesis.

Estimating θ from D′ yields:   h_ML = argmax_{h ∈ H} P(D′ | h)
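
A minimal sketch of maximum likelihood estimation for the coin-flip setting discussed in the remarks below, with a made-up sample D′ and a grid of candidate parameter values as the hypothesis space H.

```python
import numpy as np
from scipy.stats import binom

# Made-up sample D': 10 coin flips, 7 of which came up heads.
n_flips, n_heads = 10, 7

# Hypothesis space H: candidate values for the parameter theta = p.
H = np.linspace(0.01, 0.99, 99)

# Likelihood P(D' | h) under the binomial model B(n, p).
likelihood = binom.pmf(n_heads, n_flips, H)

h_ml = H[np.argmax(likelihood)]
print("maximum likelihood hypothesis:", round(h_ml, 2))   # 0.7 = 7/10
```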

Remarks:

- θ is a parameter or a parameter vector that is considered as fixed (in particular: not as a random variable), but unknown.

- In the experiment of flipping a coin, one may suppose a Laplace experiment and consider the binomial distribution, B(n, p).

- P(D′ | h) is the probability of observing D′ under h. I.e., it is the probability of observing D′ if the hidden mechanism that generates D′ behaves according to the considered model whose parameter θ is set to h.

Elements of Machine Learning (5) Frequentist vs. Subjectivist Approach to Parameter Estimation (continued)

Subjectivism:

- Consider a model for the mechanism that has generated D.

- There are different beliefs about the parameter (vector) θ that characterizes the model. The possible values for θ form the hypothesis space H.

- Select a most probable hypothesis h_MAP ∈ H by weighting the ML estimates under D with the priors. h_MAP is called maximum a-posteriori hypothesis.

Belief/Prior 1: P(θ1) = 0.95 with θ1: p = 0.5        Belief/Prior 2: P(θ2) = 0.50 with θ2: p = 0.75

θ1 + D → P(D | θ1)
θ2 + D → P(D | θ2)

h_MAP = argmax_{h ∈ {θ1, θ2}} P(h | D) = argmax_{h ∈ {θ1, θ2}} P(D | h) · P(h) / P(D)
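
A minimal sketch of this MAP selection, with made-up trial data; the priors are the two beliefs from above, and the unnormalized scores P(D | h) · P(h) suffice because P(D) is the same for both hypotheses.

```python
from math import comb

# Made-up data D: 10 coin flips, 7 of which came up heads.
n, k = 10, 7

# Hypotheses (success probability p) with their prior beliefs from the slide.
hypotheses = {"theta1": (0.50, 0.95),   # theta1: p = 0.5,  prior 0.95
              "theta2": (0.75, 0.50)}   # theta2: p = 0.75, prior 0.50

def likelihood(p):
    """P(D | h) under a binomial model B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Unnormalized posteriors P(D | h) * P(h); the evidence P(D) cancels in the argmax.
scores = {name: likelihood(p) * prior for name, (p, prior) in hypotheses.items()}
h_map = max(scores, key=scores.get)
print(scores, "->", h_map)
```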

Remarks:

- θ is considered as a random variable. There is prior knowledge about the distribution of θ.

- p is a parameter of the binomial distribution and denotes the success probability for each trial.
  – Belief 1: With a probability of 0.95 the coin is fair (both sides are equally likely).
  – Belief 2: With a probability of 0.5 the odds of preferring a particular side are 3:1.

  Given D from a number of trials, compute P(D | θ1) and P(D | θ2) and the respective values for P(θ1 | D) and P(θ2 | D).

  Disclaimer. While only mild conditions are required for MAP estimation to be a limiting case of Bayes estimation, it is not very representative of Bayesian methods in general. This is because MAP estimates are point estimates, whereas Bayesian methods are characterized by the use of distributions to summarize data and draw inferences. [Wikipedia]

- The subjectivist approach is also called the Bayesian interpretation of probability. The Bayesian interpretation of probability enables by design the integration of prior knowledge, background knowledge, and human expertise. [Wikipedia: probability interpretations, Bayesian interpretation]

- Food for thought: Discuss the use of frequentist and subjectivist approaches to decision making if you had to develop an AI that plays poker.
