Introduction to ML - IIT Bombay...Introduction to ML - Roadmap • Definition of Machine Learning...

Date post: 25-Jan-2020
Introduction to ML Abhijit Mishra Research Scholar Center for Indian Language Technology Department of Computer Science and Engineering Indian Institute of Technology Bombay Email: [email protected] URL: http://www.cse.iitb.ac.in/~abhijitmishra



Task: Get mangoes of a particular type from the market


Randomness?? Ambiguity?? Nuances??

Task 1: Solve an equation

Task 2: Get mangoes of a particular type from the market


• Randomness: slight variation in shape, size, color, odor, etc.
• Ambiguity: similarity in size and color, yet belonging to different categories
• Nuances: differences in size and color, yet belonging to the same category

How do we make machines understand these?


Introduction to ML - Roadmap
• Definition of Machine Learning
• Learning to predict
   • Classification
   • Regression
• Learning Paradigms
   • Rule based
   • Statistical
   • Example Based
• Statistical Machine Learning
   • Supervised
   • Semi-supervised
   • Unsupervised
   • Reinforcement
• Supervised approaches
   • Probabilistic approaches
   • Non-probabilistic approaches
• Example - Text Classification
• Books, Online Courses and Tools


Definition of Machine Learning
• Machine learning [1] is a type of artificial intelligence (AI) that provides computers with the ability to learn without being explicitly programmed.
• It explores the study and construction of algorithms that can learn from data and make predictions on it.
• Applications:
   • Pattern recognition (e.g., handwriting recognition, face detection, gesture detection)
   • Prediction of events (e.g., stock market prediction, weather forecasting, prediction of diseases based on symptoms)
• Almost all popular online services (e.g., Google, Facebook, Amazon) use ML.

[1] https://en.wikipedia.org/wiki/Machine_learning


Learning to Predict - Classification
• Classification is the problem of predicting which of a set of categories (sub-populations) a new observation belongs to.
• Input: $\mathbf{x} = (x_1, x_2, \ldots, x_n)$, the properties of the new observation
• Output: $y \in \{c_1, c_2, \ldots, c_N\}$, the class of the new observation
• When $N = 2$, the problem is called a "binary classification problem" (e.g., classifying emails into spam or non-spam categories).
• When $N > 2$, the problem is called an N-class/multi-class classification problem (e.g., classifying documents into multiple categories like sports, health, politics, etc.).


Learning to Predict - Regression
• When the output space of a predictor is the set of real numbers (instead of nominal categories, as in classification), the prediction problem is referred to as statistical regression, or simply regression.
• Input: $\mathbf{x} = (x_1, x_2, \ldots, x_n)$, the properties of the new observation
• Output: $y$, where $y \in \mathbb{R}$
• Example: predicting the temperature of a day given the climatic conditions of the previous day; estimating the number of units of a new product that will be sold in a year.
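This is a minimal Python sketch that makes the contrast with classification concrete. It assumes a single input feature and an ordinary least-squares straight-line fit, which is one common choice; the slide does not prescribe a method. The function name `fit_line` and the temperature values are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y ~ a*x + b by ordinary least squares (single input feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); intercept passes through the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# Hypothetical data: previous day's temperature -> today's temperature (deg C)
prev_day = [20.0, 22.0, 25.0, 27.0, 30.0]
today = [21.0, 22.5, 25.5, 27.5, 30.0]
a, b = fit_line(prev_day, today)
prediction = a * 26.0 + b   # the output is a real number, not a category
print(f"slope={a:.3f}, intercept={b:.3f}, prediction={prediction:.2f}")
```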


Note: Structured prediction

• Deals with more complex outputs (instead of the scalar output of classification and regression)
• Output: $\mathbf{y} = (y_1, y_2, \ldots, y_N)$, where $N > 1$
• Example: automatic text translation (the output is a sentence in another language), parse tree generation (the output is a tree structure), image captioning

We will focus only on classification problems.


Learning Objective
• Back to mangoes

Task: Given some basic measurable properties of a certain mango, predict which category it belongs to.

[Figure: measurable properties/attributes/features (Color, Weight, Smell, Dimensions, Taste??) mapped to classes (Alphonso/Alice/Irwin)]


Learning Objectives

• What to learn?
   • Correspondences between various attributes of the input object and the classes
• How to learn?
   • Rule based learning
   • Statistical learning
   • Example based learning


Learning Paradigms – Rule Based
• Learning is based on a set of rules handcrafted by humans.
• The collection of rules, or the "rule-base", has to be exhaustive enough to capture all the corner cases.
• Problems: building it is extremely hard, needs domain expertise and is highly time-consuming.

    if (weight < 0.5 && (color == "yellow" || color == "green")) {
        category = "Alphonso";
    } else if (…) {
        category = "Alice";
    }
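The rule-base above can be sketched in Python as follows. This is a hypothetical illustration. The slide elides the remaining rules with `(…)`, so the second rule and the catch-all `"Irwin"` branch are invented placeholders. `weight` keeps the slide's unspecified unit.

```python
def classify_mango(weight, color):
    """Hand-written rules in the spirit of the slide's if/else chain.

    The first rule follows the slide; the second rule and the "Irwin"
    fallback are hypothetical, because the slide elides them with (...).
    """
    if weight < 0.5 and color in ("yellow", "green"):
        return "Alphonso"
    if weight >= 0.5 and color == "green":   # hypothetical rule
        return "Alice"
    return "Irwin"                           # hypothetical catch-all rule

print(classify_mango(0.3, "yellow"))  # Alphonso
print(classify_mango(0.7, "green"))   # Alice
print(classify_mango(0.7, "red"))     # Irwin
```

Every mango that no explicit rule covers falls into the catch-all branch. This is why the rule-base must be exhaustive for the corner cases to be handled correctly.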


Learning Paradigms – Example Based
• Only a very small set of examples with complete information (both inputs and classes) is available.
• Templates for each class are learned automatically.
• When a new observation arrives, the class prediction is made based on the template that fits the observation best.
• Problems:
   • Templates are generic representatives of classes that are supposed to represent the whole sub-population belonging to a class. For many problems, it is quite hard to come up with such representatives from a small number of examples.
   • Susceptible to changes in the nature of the input data.
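The slide does not define what a template is. This sketch assumes one simple option: average the labelled examples of each class into a single prototype, then assign a new observation to the nearest prototype (a nearest-centroid classifier). The feature values and function names below are hypothetical.

```python
import math

def learn_templates(examples):
    """Average the feature vectors of each class into one template per class."""
    sums, counts = {}, {}
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}

def predict_template(templates, features):
    """Return the class whose template is closest (Euclidean distance)."""
    return min(templates, key=lambda label: math.dist(templates[label], features))

# Hypothetical (weight, length) measurements for a very small labelled set
examples = [((0.30, 12.0), "Alphonso"), ((0.35, 13.0), "Alphonso"),
            ((0.60, 18.0), "Alice"), ((0.65, 19.0), "Alice")]
templates = learn_templates(examples)
print(predict_template(templates, (0.32, 12.5)))   # Alphonso
```

With only two examples per class, one atypical mango would shift its template noticeably. This illustrates the representativeness problem listed above.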


Learning Paradigms – Statistical
• Beneficial if a large and diverse set of examples is available.
• Feature-class correspondences are learned better.
• Easy to update the classifier if the nature of the input data changes.
• Leverages the huge volume of available web data.
• Problems: over-learning (referred to as overfitting) can sometimes happen. Feature selection affects system accuracy.


Statistical Machine Learning - Supervised Approaches
• Learning is based on a set of observations for which class labels are available.

[Figure: labelled examples (Alice, Irwin, Alphonso) → Learned Model → prediction for a new mango (Alphonso)]


Statistical Machine Learning - Semi-Supervised Approaches

• Learning is based on a set of observations for which class labels are available AND another set of observations (typically of larger volume than the labelled set) for which class labels are not available.

[Figure: labelled examples (Alice, Irwin, Alphonso) → Learned Model → prediction for a new mango (Alphonso)]
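Self-training is one common semi-supervised technique; the slide does not name a specific one. A model trained on the labelled set assigns pseudo-labels to the unlabelled set, and the model is then re-estimated from both. The sketch assumes a single hypothetical feature (weight) and a nearest-class-mean model. All names and values are hypothetical.

```python
def centroids(points):
    """Mean feature value per class, from (value, label) pairs."""
    groups = {}
    for value, label in points:
        groups.setdefault(label, []).append(value)
    return {label: sum(vals) / len(vals) for label, vals in groups.items()}

def nearest(cents, value):
    """Class whose mean is closest to the value."""
    return min(cents, key=lambda label: abs(cents[label] - value))

def self_train(labelled, unlabelled, rounds=3):
    """Pseudo-label the unlabelled data with the current model, then
    re-estimate the model from labelled + pseudo-labelled data."""
    cents = centroids(labelled)
    for _ in range(rounds):
        pseudo = [(v, nearest(cents, v)) for v in unlabelled]
        cents = centroids(labelled + pseudo)
    return cents

# Hypothetical weights: two labelled mangoes, a larger unlabelled set
labelled = [(0.30, "Alphonso"), (0.70, "Alice")]
unlabelled = [0.25, 0.28, 0.33, 0.36, 0.62, 0.66, 0.72, 0.75]
model = self_train(labelled, unlabelled)
print(nearest(model, 0.45))   # Alphonso
```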


Statistical Machine Learning - Unsupervised Approaches
• Learning when no class labels are available.
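Clustering is a typical unsupervised task. The slide names no algorithm, so as an assumed example the sketch runs plain k-means on hypothetical one-dimensional mango weights. No class labels are used; the groups emerge from the data alone.

```python
def kmeans_1d(values, centers, iterations=10):
    """Plain k-means on 1-D data: only the values are used, never labels."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        # Assignment step: each value joins its nearest center
        for v in values:
            idx = min(range(len(centers)), key=lambda i: abs(centers[i] - v))
            clusters[idx].append(v)
        # Update step: move each center to the mean of its cluster
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

weights = [0.25, 0.28, 0.33, 0.62, 0.66, 0.72]   # hypothetical, unlabelled
centers, clusters = kmeans_1d(weights, centers=[0.0, 1.0])
print(clusters)   # [[0.25, 0.28, 0.33], [0.62, 0.66, 0.72]]
```

The algorithm finds two groups, but nothing tells it which group is "Alphonso". Naming the clusters still needs outside knowledge.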


Statistical Machine Learning - Reinforcement Learning
• Learning happens with the objective of maximizing the reward associated with the task.

• Reinforcement learning differs from standard supervised learning in that correct input/output pairs are never presented. Association is captured in terms of rewards.
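A multi-armed bandit is perhaps the smallest setting that shows this idea. The ε-greedy sketch below is a swapped-in illustration with a hypothetical, deterministic reward table. The learner is never told which action is correct. It only observes the reward of the action it chose, and it gradually comes to prefer the most rewarding one.

```python
import random

def epsilon_greedy(reward_of_arm, n_arms, steps=1000, epsilon=0.1, seed=0):
    """Estimate each action's average reward from reward feedback only."""
    rng = random.Random(seed)
    estimates = [0.0] * n_arms   # running average reward per arm
    counts = [0] * n_arms
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                            # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])   # exploit
        reward = reward_of_arm(arm)
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean
    return estimates

# Hypothetical environment: arm 2 pays the most, but the learner is never told so
payouts = [0.2, 0.5, 0.9]
estimates = epsilon_greedy(lambda arm: payouts[arm], n_arms=3)
best = max(range(3), key=lambda a: estimates[a])
print(best)
```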




Supervised Approaches

• Recap:

[Figure: measurable properties/attributes/features (Color, Weight, Smell, Dimensions, Taste??) mapped to classes (Alphonso/Alice/Irwin)]


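Supervised Approaches – Probabilistic Models
• Given a set of features $x_1, x_2, \ldots, x_n$, the classification decision of probabilistic models can be expressed as

$$\hat{y} = \arg\max_{c \in C} P(c \mid x_1, x_2, \ldots, x_n)$$

where $C = \{c_1, c_2, \ldots, c_N\}$ is the set of classes.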


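Supervised Approaches – Naïve Bayes

• Using Bayes' rule:

$$P(c \mid x_1, \ldots, x_n) = \frac{P(x_1, \ldots, x_n \mid c)\, P(c)}{P(x_1, \ldots, x_n)}$$

   • Posterior: $P(c \mid x_1, \ldots, x_n)$
   • Likelihood: $P(x_1, \ldots, x_n \mid c)$
   • Prior: $P(c)$
• The denominator does not depend on $c$, so $\hat{y} = \arg\max_{c \in C} P(x_1, \ldots, x_n \mid c)\, P(c)$.
• The prior $P(c)$ can be assumed to be a multinomial distribution for classification problems.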


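Supervised Approaches – Naïve Bayes (1)
• Now, if we assume that the features are independent of each other given the class:

$$P(x_1, \ldots, x_n \mid c) = \prod_{i=1}^{n} P(x_i \mid c) \quad\Rightarrow\quad \hat{y} = \arg\max_{c \in C} P(c) \prod_{i=1}^{n} P(x_i \mid c)$$

• Note: The independence assumption may not hold true for many real-life problems.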
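For categorical features, the Naïve Bayes decision rule can be sketched by estimating P(c) and P(x_i | c) from counts. The sketch assumes add-one (Laplace) smoothing, which the slide does not mention, so that an unseen feature value does not zero out the whole product. The data and function names are hypothetical.

```python
from collections import Counter, defaultdict

def train_nb(examples):
    """Count-based estimates for P(c) and P(x_i = v | c) from (features, label) pairs."""
    class_counts = Counter(label for _, label in examples)
    value_counts = defaultdict(Counter)   # (feature index, class) -> counts of values
    values_seen = defaultdict(set)        # feature index -> distinct observed values
    for features, label in examples:
        for i, v in enumerate(features):
            value_counts[(i, label)][v] += 1
            values_seen[i].add(v)
    return class_counts, value_counts, values_seen

def predict_nb(model, features, alpha=1.0):
    """argmax_c P(c) * prod_i P(x_i | c), with add-alpha (Laplace) smoothing."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for c, n_c in class_counts.items():
        score = n_c / total                                  # prior P(c)
        for i, v in enumerate(features):
            k = len(values_seen[i])                          # number of distinct values
            score *= (value_counts[(i, c)][v] + alpha) / (n_c + alpha * k)
        if score > best_score:
            best_label, best_score = c, score
    return best_label

# Hypothetical (color, size) observations
examples = [(("yellow", "small"), "Alphonso"), (("yellow", "small"), "Alphonso"),
            (("green", "big"), "Alice"), (("yellow", "big"), "Alice"),
            (("green", "medium"), "Irwin")]
model = train_nb(examples)
print(predict_nb(model, ("yellow", "small")))   # Alphonso
print(predict_nb(model, ("green", "big")))      # Alice
```

With many features, the product of small probabilities can underflow. Practical implementations therefore usually sum log-probabilities instead.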


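Supervised Approaches – Logistic Regression
• Remember: $\hat{y} = \arg\max_{c \in C} P(c \mid x_1, \ldots, x_n)$

• In logistic regression, $P(c \mid x_1, \ldots, x_n)$ is directly estimated. For a binary problem with $y \in \{0, 1\}$:

$$P(y = 1 \mid x_1, \ldots, x_n) = \frac{1}{1 + e^{-u}}$$

where $u$ follows a regular weighted linear equation

$$u = w_0 + w_1 x_1 + w_2 x_2 + \cdots + w_n x_n$$

The coefficients ($w_0$ and $w_i$) have to be learned during training.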
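This minimal sketch shows one way the coefficients can be learned. It assumes one feature, a binary label and plain batch gradient descent on the log-loss, which is one of several possible training procedures. The data, learning rate and epoch count are hypothetical.

```python
import math

def sigmoid(u):
    """Numerically stable logistic function 1 / (1 + e^(-u))."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    z = math.exp(u)
    return z / (1.0 + z)

def train_logreg(xs, ys, lr=0.5, epochs=2000):
    """Batch gradient descent on the average log-loss for P(y=1|x) = sigmoid(w0 + w1*x)."""
    w0, w1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the average negative log-likelihood w.r.t. w0 and w1
        g0 = sum(sigmoid(w0 + w1 * x) - y for x, y in zip(xs, ys)) / n
        g1 = sum((sigmoid(w0 + w1 * x) - y) * x for x, y in zip(xs, ys)) / n
        w0 -= lr * g0
        w1 -= lr * g1
    return w0, w1

# Hypothetical single feature (e.g., a scaled measurement) and binary label
xs = [0.0, 0.5, 1.0, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 1, 1]
w0, w1 = train_logreg(xs, ys)
print(sigmoid(w0 + w1 * 0.0), sigmoid(w0 + w1 * 4.0))
```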


Supervised Approaches: Non-Probabilistic Models

[Figure: observations of Class-1 and Class-2 plotted in a two-dimensional feature space (x1, x2)]


Supervised Approaches: K-Nearest Neighbor

[Figure: points of Class-1 and Class-2]

The K closest neighbors are determined using a pre-defined distance measure. The class to which the largest number of these neighbors belong becomes the winning class.
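The voting rule can be sketched in a few lines. The sketch assumes Euclidean distance as the pre-defined measure and uses hypothetical two-dimensional points. For a two-class problem, an odd K avoids tied votes.

```python
import math
from collections import Counter

def knn_predict(training, query, k=3):
    """Majority vote among the k training points closest to `query` (Euclidean distance)."""
    neighbours = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

training = [((1.0, 1.0), "Class-1"), ((1.5, 2.0), "Class-1"), ((2.0, 1.0), "Class-1"),
            ((6.0, 5.0), "Class-2"), ((7.0, 7.0), "Class-2"), ((6.5, 6.0), "Class-2")]
print(knn_predict(training, (2.0, 2.0)))  # Class-1
print(knn_predict(training, (6.0, 6.0)))  # Class-2
```

There is no training step. All the work happens at prediction time, when the query is compared against every stored example.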


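Distance/similarity measures
• Euclidean distance (between vectors $X_1$ and $X_2$):

$$d(X_1, X_2) = \sqrt{\sum_{i=1}^{n} (x_{1i} - x_{2i})^2}$$

which is a special case ($p = 2$) of the Minkowski distance

$$d_p(X_1, X_2) = \left(\sum_{i=1}^{n} |x_{1i} - x_{2i}|^p\right)^{1/p}$$

• Cosine distance:

$$d_{\cos}(X_1, X_2) = 1 - \frac{X_1 \cdot X_2}{\|X_1\| \, \|X_2\|}$$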
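The three measures translate directly into code. This is a plain-Python illustration, and the function names are our own.

```python
import math

def minkowski(x1, x2, p):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(a - b) ** p for a, b in zip(x1, x2)) ** (1.0 / p)

def euclidean(x1, x2):
    """Euclidean distance = Minkowski distance with p = 2."""
    return minkowski(x1, x2, 2)

def cosine_distance(x1, x2):
    """1 - cosine similarity: 0 for vectors pointing the same way, 1 for orthogonal ones."""
    dot = sum(a * b for a, b in zip(x1, x2))
    norm1 = math.sqrt(sum(a * a for a in x1))
    norm2 = math.sqrt(sum(b * b for b in x2))
    return 1.0 - dot / (norm1 * norm2)

print(euclidean((0, 0), (3, 4)))          # 5.0
print(minkowski((0, 0), (3, 4), 1))       # 7.0 (Manhattan distance, p = 1)
print(cosine_distance((1, 0), (2, 0)))    # 0.0 (same direction, different length)
print(cosine_distance((1, 0), (0, 1)))    # 1.0 (orthogonal)
```

Cosine distance ignores vector length, which is why it is popular for comparing documents of different sizes. Euclidean distance does not ignore length.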


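Supervised Approaches: Support Vector Machines

[Figure: points of Class-1 and Class-2 separated by a linear boundary]

$$f(\mathbf{x}, \mathbf{w}, b) = \operatorname{sign}(\mathbf{w} \cdot \mathbf{x} - b)$$

Decision boundary: $\mathbf{w} \cdot \mathbf{x} - b = 0$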


SVMs: Specifying the boundary

[Figure: the classifier boundary $\mathbf{w} \cdot \mathbf{x} - b = 0$ lies between the plus-plane $\mathbf{w} \cdot \mathbf{x} - b = 1$, which borders the "Predict Class = +1" zone, and the minus-plane $\mathbf{w} \cdot \mathbf{x} - b = -1$, which borders the "Predict Class = -1" zone]

$$M = \text{Margin Width} = \frac{2}{\sqrt{\mathbf{w} \cdot \mathbf{w}}}$$

Given a guess of w and b, we can:
• Compute whether all data points are in the correct half-planes
• Compute the width of the margin

So now we just need to write a program to search the space of w's and b's to find the widest margin that matches all the data points. This is primarily done through quadratic programming.
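The two checks for a guessed (w, b) can be sketched as below. The data points and guesses are hypothetical. The sketch does not perform the quadratic-programming search itself; it only evaluates candidate boundaries. Among boundaries that satisfy all constraints, the one with the larger margin M is preferred.

```python
import math

def check_svm_guess(w, b, points, labels):
    """For a guessed (w, b): are all points on the correct side of their plane,
    and how wide is the margin M = 2 / sqrt(w . w)?"""
    def dot(u, v):
        return sum(a * c for a, c in zip(u, v))
    # Class +1 needs w.x - b >= 1; class -1 needs w.x - b <= -1
    all_correct = all(y * (dot(w, x) - b) >= 1 for x, y in zip(points, labels))
    margin = 2.0 / math.sqrt(dot(w, w))
    return all_correct, margin

points = [(1.0, 1.0), (2.0, 1.0), (4.0, 4.0), (5.0, 4.0)]
labels = [-1, -1, +1, +1]
for w, b in [((0.5, 0.5), 2.5), ((1.0, 1.0), 5.0)]:
    print(w, b, check_svm_guess(w, b, points, labels))
```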


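Supervised Approaches – Decision Tree

[Figure: example decision tree. Attribute types: categorical, categorical, continuous, class. Internal nodes test MarSt, Color and TaxInc, with branches such as Yellow / Green, Small / Big, Medium and < 80; leaves assign class A or B]

There could be more than one tree that fits the same data!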
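To illustrate the last remark, the sketch hand-writes two different trees over a tiny hypothetical data set. The trees are not learned by any algorithm. Both classify every training example correctly, yet they disagree on an unseen combination of attribute values.

```python
# Hypothetical training data: (color, size) -> class
data = [(("yellow", "small"), "A"),
        (("yellow", "big"), "B"),
        (("green", "big"), "B")]

def tree_1(color, size):
    """A one-split tree: test size only."""
    return "A" if size == "small" else "B"

def tree_2(color, size):
    """A two-level tree: test color first, then size."""
    if color == "yellow":
        return "A" if size == "small" else "B"
    return "B"

for tree in (tree_1, tree_2):
    fits = all(tree(*x) == y for x, y in data)
    print(tree.__name__, fits, tree("green", "small"))
# tree_1 True A
# tree_2 True B
```

Tree-learning algorithms need a preference among such trees, for example favouring smaller trees.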


Supervised Approaches - Note
• It is important to decide on a set of features that adequately explains the data.
• Selecting an extremely small number of features may underspecify the data and may not help the classifier learn properly.
• As the number of features increases, the model complexity increases (i.e., more parameters have to be learned, and the chances of overfitting increase).
• Very high-dimensional feature vectors are unintuitive to analyze, and they make it hard to design distance functions and to perform combinatorics and optimizations. This is known as the "Curse of Dimensionality".
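One symptom of the curse of dimensionality can be observed numerically. As the dimension grows, the nearest and farthest random points from a query become almost equally distant, which makes distance-based methods such as KNN less discriminative. The sketch assumes uniformly random data and measures the relative contrast (max − min) / min of those distances. The function name and parameters are hypothetical.

```python
import math
import random

def distance_contrast(dim, n_points=200, seed=0):
    """(max - min) / min distance from a random query to random points in [0, 1]^dim."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(p, query) for p in points]
    return (max(dists) - min(dists)) / min(dists)

for dim in (2, 10, 100, 1000):
    print(dim, round(distance_contrast(dim), 3))
```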




Example – Text Classification
• Text classification is an important problem in the field of Natural Language Processing and Machine Learning.
• Objective: Assign a class label to a given text
• Example:
   1: Obama won the election: Politics
   2: Brasil lost the football match: Sports


Problems in Text Classification
• Lexical Problems:
   • Presence of ambiguous words, e.g., Cricket (game) vs Cricket (insect)
• Structural Problems:
   • Complexity at the syntactic level, e.g., Mohd. Kaif, who was the hero of the Natwest final match against England in 2002, has joined BJP and will be running for an MP position. (Politics)
• Semantic Problems:
   • Complexity at the semantic level, e.g., With the humiliating defeat in Bihar, INC’s innings seems to be over.
• Pragmatic Problems:
   • e.g., India lost to Zimbabwe yesterday. (Sports) Bernie lost to Clinton in New York. (Politics)


Text Classification – Method

[Figure: Some documents → annotation → training data (features + labels) → MODEL (Naïve Bayes, SVM, Decision Tree etc.); any unseen document → compute features → MODEL → prediction]


Text Classification – Feature Extraction
• Example:
   • Training sample (domain classification):
      1: Obama won the election: Politics
      2: Brasil lost the football match: Sports
   • Features:
      • Vocabulary: <Obama, won, the, election, Brasil, lost, football, match>
      • Bag-of-words features based on presence/absence (label 0 = Politics, 1 = Sports):
         1: <1,1,1,1,0,0,0,0>:0
         2: <0,0,1,0,1,1,1,1>:1
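The vocabulary and presence/absence vectors can be reproduced as follows. The sketch assumes whitespace tokenisation with no lower-casing or stop-word removal; the slide does not state these choices. The function names are hypothetical.

```python
def build_vocabulary(texts):
    """Vocabulary in order of first appearance."""
    vocab = []
    for text in texts:
        for token in text.split():
            if token not in vocab:
                vocab.append(token)
    return vocab

def presence_vector(text, vocab):
    """1 if the vocabulary word occurs in the text, else 0."""
    tokens = set(text.split())
    return [1 if word in tokens else 0 for word in vocab]

train = [("Obama won the election", 0),           # 0 = Politics
         ("Brasil lost the football match", 1)]   # 1 = Sports
vocab = build_vocabulary(text for text, _ in train)
for text, label in train:
    print(presence_vector(text, vocab), label)
# [1, 1, 1, 1, 0, 0, 0, 0] 0
# [0, 0, 1, 0, 1, 1, 1, 1] 1
```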


Text Classification – Training and Testing
• Training:
   • The training algorithm computes a weight for each feature towards each label. The weight reflects how strongly the feature predicts that label.
• Test:
   • Based on the features present in the test data, the combined weight is computed and a label is decided.
• Problem: a feature in the test data may not have been seen in the training data (the data sparsity problem).
• Solution: instead of bag-of-words features, consider bag of senses, word embeddings, etc.
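As one concrete learner of per-feature weights, the sketch uses a multi-class perceptron over the presence/absence features. This is a swapped-in example, not one of the models named on the slides. The last lines show the data sparsity problem: words never seen in training contribute nothing to any label's score. Names and data are hypothetical.

```python
from collections import defaultdict

def perceptron_predict(weights, labels, text):
    """Pick the label whose weights for the words present in the text sum highest."""
    words = set(text.split())
    return max(labels, key=lambda lab: sum(weights.get((lab, word), 0.0) for word in words))

def train_perceptron(examples, labels, epochs=5):
    """Multi-class perceptron: learns one weight per (label, word) pair."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for text, gold in examples:
            pred = perceptron_predict(weights, labels, text)
            if pred != gold:
                for word in set(text.split()):
                    weights[(gold, word)] += 1.0   # push the correct label up
                    weights[(pred, word)] -= 1.0   # push the wrongly predicted label down
    return weights

labels = ["Politics", "Sports"]
train = [("Obama won the election", "Politics"),
         ("Brasil lost the football match", "Sports")]
w = train_perceptron(train, labels)
print(perceptron_predict(w, labels, "Obama lost the election"))   # Politics

# Data sparsity: words never seen in training carry no weight for any label
unseen_scores = [sum(w.get((lab, t), 0.0) for t in "Kaif joined BJP".split()) for lab in labels]
print(unseen_scores)   # [0.0, 0.0]
```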


Text Classification – Evaluation Metric
• The performance of classifiers is typically measured by Accuracy, Precision, Recall and F-Measure.
• For a binary classification problem, if the class labels are positive and negative:
   • True Positive (TP): number of test documents that are actually positive and are predicted positive.
   • True Negative (TN): number of test documents that are actually negative and are predicted negative.
   • False Positive (FP): number of test documents that are actually negative but are predicted positive.
   • False Negative (FN): number of test documents that are actually positive but are predicted negative.


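Text Classification – Evaluation Metric (1)

• $\text{Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN}$
• $\text{Precision} = \dfrac{TP}{TP + FP}$
• $\text{Recall} = \dfrac{TP}{TP + FN}$
• $\text{F-Measure} = \dfrac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$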
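Accuracy, Precision, Recall and F-Measure can be computed directly from gold and predicted labels. The sketch assumes labels encoded as 1 (positive) and 0 (negative). When a ratio's denominator is zero it returns 0.0, a convention chosen here for illustration. The function name and data are hypothetical.

```python
def evaluate(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F-measure (F1) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn,
            "accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 1, 0]
print(evaluate(y_true, y_pred))
```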


Text Classification - DEMO

• Package: Scikit-learn (install the numpy, scipy, matplotlib and scikit-learn packages)
• Demo:
   • Naïve Bayes
   • SVM
   • KNN
   • Decision Tree


Books and Online Courses

• Books
   • Machine Learning by Tom Mitchell
   • Pattern Recognition and Machine Learning by Christopher M. Bishop
   • Foundations of Machine Learning by Mehryar Mohri, Afshin Rostamizadeh, Ameet Talwalkar
   • Machine Learning: A Probabilistic Perspective by Kevin Murphy
   • Bayesian Reasoning and Machine Learning by David Barber
   • Probabilistic Graphical Models: Principles and Techniques by Daphne Koller, Nir Friedman
• Courses
   • Machine Learning - Stanford University (Coursera) - Andrew Ng
   • Mining Massive Datasets - Stanford Online


Tools

• Java
   • Weka (for supervised/semi-supervised) (www.cs.waikato.ac.nz/ml/weka/)
   • Mallet (for unsupervised) (www.mallet.cs.umass.edu)
• Python
   • Scikit-Learn (http://scikit-learn.org/)
   • Statsmodels (www.statsmodels.sourceforge.net)
• R statistical packages (https://cran.r-project.org/web/packages/)


Thank you


Questions?



Image URLs

• depositphotos.com
• vizagcityonline.com
• en.wikipedia.org/wiki/List_of_mango_cultivars
• tropicalfloridagardens.com
• alphonsomango.net
• alamy.com

