Date post: 20-Dec-2015
Page 1: 1 Machine Learning Spring 2010 Rong Jin. 2 CSE847 Machine Learning  Instructor: Rong Jin  Office Hour: Tuesday 4:00pm-5:00pm Thursday 4:00pm-5:00pm.


Machine Learning

Spring 2010

Rong Jin


CSE847 Machine Learning

• Instructor: Rong Jin
• Office hours: Tuesday 4:00pm-5:00pm, Thursday 4:00pm-5:00pm
• Textbooks:
  • Machine Learning
  • The Elements of Statistical Learning
  • Pattern Recognition and Machine Learning
• Many subjects are from papers

Web site: http://www.cse.msu.edu/~cse847


Requirements

• 6~10 homework assignments
• One project for each person
  • Team: no more than 2 people
  • Topics: either assigned by the instructor or proposed by students themselves
  • Results: a project proposal, a progress report, and a final report
• Midterm exam & final exam


Goal

• Familiarize you with the state of the art in machine learning
  • Breadth: many different techniques
  • Depth: project
  • Hands-on experience
• Develop the machine learning way of thinking
  • Learn how to model real problems using machine learning techniques
  • Learn how to deal with real problems practically


Course Outline

Theoretical Aspects

• Information Theory

• Optimization Theory

• Probability Theory

• Learning Theory

Practical Aspects

• Supervised Learning Algorithms

• Unsupervised Learning Algorithms

• Important Practical Issues

• Applications


Today’s Topics

• Why machine learning?
• Example: learning to play backgammon
• General issues in machine learning


Why Machine Learning?

• Past: most computer programs were written by hand
• Future: computers should be able to program themselves by interacting with their environment


Recent Trends

• Recent progress in algorithms and theory
• Growing flood of online data
• Computational power is available
• Growing industry


Three Niches for Machine Learning

• Data mining: using historical data to improve decisions
  • Medical records → medical knowledge
• Software applications that are difficult to program by hand
  • Autonomous driving
  • Image classification
• User modeling
  • Automatic recommender systems


Typical Data Mining Task

Given:

• 9147 patient records, each describing pregnancy and birth

• Each patient record contains 215 features

Task:

• Predict which future patients are at high risk for Emergency Cesarean Section


Data Mining Results

One of 18 learned rules:

If no previous vaginal delivery, and
   abnormal 2nd trimester ultrasound, and
   malpresentation at admission

Then probability of Emergency C-Section is 0.6


Credit Risk Analysis

Learned Rules:

If Other-Delinquent-Account > 2 and Number-Delinquent-Billing-Cycles > 1

Then Profitable-Customer? = no

If Other-Delinquent-Account = 0 and (Income > $30K or Years-of-Credit > 3)

Then Profitable-Customer? = yes
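
The two learned rules can be read as a small decision procedure. The following is a minimal sketch, assuming the conditions within each rule are joined by "and", that the argument names are hypothetical stand-ins for the attributes on the slide, and that a customer matching neither rule gets no prediction:

```python
from typing import Optional


def profitable_customer(other_delinquent_accounts: int,
                        delinquent_billing_cycles: int,
                        income: float,
                        years_of_credit: float) -> Optional[bool]:
    """Apply the two learned rules; return None when neither rule fires."""
    # Rule 1: several delinquent accounts and billing cycles -> not profitable.
    if other_delinquent_accounts > 2 and delinquent_billing_cycles > 1:
        return False
    # Rule 2: no other delinquent accounts, plus enough income or credit history.
    if other_delinquent_accounts == 0 and (income > 30_000 or years_of_credit > 3):
        return True
    # Assumption: the slide does not say what happens when no rule applies.
    return None
```

A full rule set learned from data would normally cover more cases; only the two rules shown on the slide are encoded here.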


Programs too Difficult to Program By Hand

ALVINN drives 70mph on highways



Programs too Difficult to Program By Hand

Image Classification: Classify Bird Images

[Figure: positive examples and negative examples, a statistical model, and train/test stages]


Image Retrieval using Texts


Automatic Image Annotation

• Automatically annotate images with textual words
• Retrieve images with textual queries


Software that Models Users

History

• Description: A homicide detective and a fire marshal must stop a pair of murderers who commit videotaped crimes to become media darlings.
  Rating:
• Description: Benjamin Martin is drawn into the American Revolutionary War against his will when a brutal British commander kills his son.
  Rating:
• Description: A biography of sports legend Muhammad Ali, from his early days to his days in the ring.
  Rating:

What to Recommend?

• Description: A high-school boy is given the chance to write a story about an up-and-coming rock band as he accompanies it on its concert tour.
  Recommend: ?
• Description: A young adventurer named Milo Thatch joins an intrepid group of explorers to find the mysterious lost continent of Atlantis.
  Recommend: ?

No

Yes


Netflix Contest


Where is this Headed?

• Today: tip of the iceberg
  • First-generation algorithms
  • Applied to well-formatted databases
  • Budding industry
• Opportunities for tomorrow
  • Multimedia
  • Database
  • Robots
  • Automatic computing
  • Bioinformatics
  • …


Relevant Disciplines

• Artificial intelligence
• Statistics (particularly Bayesian statistics)
• Computational complexity theory
• Information theory
• Optimization theory
• Philosophy
• Psychology
• …


Today’s Topics

• Why machine learning?
• Example: learning to play backgammon
• General issues in machine learning


What is the Learning Problem?

Learning = improving with experience at some task

• Improve over task T
• With respect to performance measure P
• Based on experience E

Example: Learning to Play Backgammon

• T: play backgammon
• P: % of games won in world tournament
• E: opportunity to play against itself


Backgammon

• More than $10^{20}$ states (boards)
• The best human players see only a small fraction of all boards during their lifetime
• Searching is hard because of the dice (branching factor > 100)


TD-Gammon by Tesauro (1995)

• Trained by playing against itself
• Now approximately equal to the best human player


Learn to Play Chess

• Task T: play chess
• Performance P: percent of games won in the world tournament
• Experience E: what experience?
• How shall it be represented?
• What exactly should be learned?
• What specific algorithm should learn it?


Choose a Target Function

• Goal: policy $\pi: b \to m$
• Choice of value function:
  • $V: b, m \to \mathbb{R}$

B = board, $\mathbb{R}$ = real values


Choose a Target Function

• Goal: policy $\pi: b \to m$
• Choice of value function:
  • $V: b, m \to \mathbb{R}$
  • $V: b \to \mathbb{R}$

B = board, $\mathbb{R}$ = real values


Value Function V(b): Example Definition

• If $b$ is a final board that is won: $V(b) = 1$
• If $b$ is a final board that is lost: $V(b) = -1$
• If $b$ is not a final board: $V(b) = E[V(b^*)]$, where $b^*$ is the final board reached after playing optimally


Representation of Target Function V(b)

• Same value for each board: no learning
• Lookup table (one entry for each board): no generalization
• Summarize experience into
  • Polynomials
  • Neural networks


Example: Linear Feature Representation

Features:

• $p_b(b)$, $p_w(b)$ = number of black (white) pieces on board $b$
• $u_b(b)$, $u_w(b)$ = number of unprotected black (white) pieces
• $t_b(b)$, $t_w(b)$ = number of black (white) pieces threatened by the opponent

Linear function:

$V(b) = w_0 p_b(b) + w_1 p_w(b) + w_2 u_b(b) + w_3 u_w(b) + w_4 t_b(b) + w_5 t_w(b)$

Learning: estimation of the parameters $w_0, \ldots, w_5$
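
The linear value function is a weighted sum of the six board features. A minimal sketch, assuming the features have already been counted and are passed in the order pb, pw, ub, uw, tb, tw; the example board and the weights are made up for illustration:

```python
from typing import Sequence


def linear_value(weights: Sequence[float], features: Sequence[float]) -> float:
    """V(b) = w0*pb(b) + w1*pw(b) + w2*ub(b) + w3*uw(b) + w4*tb(b) + w5*tw(b)."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return sum(w * f for w, f in zip(weights, features))


# Hypothetical board: 12 black and 11 white pieces, 2 and 3 unprotected,
# 1 and 0 threatened. The weights are illustrative, not learned.
features = [12, 11, 2, 3, 1, 0]
weights = [1.0, -1.0, -0.5, 0.5, -0.25, 0.25]
value = linear_value(weights, features)
```

Learning then amounts to choosing the six weights; the feature counts themselves come from the board.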


Tuning Weights: Gradient Descent Optimization

• Given: board $b$, predicted value $V(b)$, desired value $V^*(b)$
• Calculate $\mathrm{error}(b) = V^*(b) - V(b)$
• For each board feature $f_i$: $w_i \leftarrow w_i + c \cdot \mathrm{error}(b) \cdot f_i$
• Stochastically minimizes $\sum_b (V^*(b) - V(b))^2$
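
The weight-tuning step can be sketched as a single stochastic gradient update for a linear V. This is a minimal sketch, assuming the error fed into the update is the signed difference V*(b) - V(b), which is what gradient descent on the squared error gives up to a constant folded into the step size c. The feature values, target, and step size are hypothetical:

```python
from typing import List, Sequence


def lms_update(weights: Sequence[float], features: Sequence[float],
               target: float, c: float = 0.1) -> List[float]:
    """One stochastic gradient step on (target - V(b))^2 for a linear V."""
    predicted = sum(w * f for w, f in zip(weights, features))
    error = target - predicted  # signed error V*(b) - V(b)
    # w_i <- w_i + c * error(b) * f_i for every feature f_i
    return [w + c * error * f for w, f in zip(weights, features)]


weights = [0.0, 0.0]
features = [1.0, 2.0]  # hypothetical feature values f_i for one board
target = 1.0           # desired value V*(b)
for _ in range(50):
    weights = lms_update(weights, features, target, c=0.1)
predicted = sum(w * f for w, f in zip(weights, features))
```

Repeating the update on a single board drives V(b) toward V*(b); in practice the update is applied across many boards, so the weights settle on a compromise that reduces the summed squared error.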


Obtain Boards

• Random boards
• Beginner plays
• Professional plays


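Obtain Target Values

• Person provides value $V(b)$
• Play until termination. If the outcome is
  • Win: $V(b) \leftarrow 1$ for all boards
  • Loss: $V(b) \leftarrow -1$ for all boards
  • Draw: $V(b) \leftarrow 0$ for all boards
• Play one move: $b \to b'$, then $V(b) \leftarrow V(b')$
• Play n moves: $b \to b' \to \ldots \to b^{(n)}$, then $V(b) \leftarrow V(b^{(n)})$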
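
Two of these options, labeling every board with the final outcome and labeling a board with the current estimate n moves later, can be sketched as follows. This is an illustrative sketch, not the course's implementation: the function names, the string board identifiers, and the choice to fall back to the final outcome when fewer than n moves remain are all assumptions.

```python
from typing import Callable, Hashable, List, Sequence, Tuple

OUTCOME_VALUE = {"win": 1.0, "loss": -1.0, "draw": 0.0}


def outcome_targets(boards: Sequence[Hashable],
                    outcome: str) -> List[Tuple[Hashable, float]]:
    """Play until termination: every board in the game gets the final outcome."""
    value = OUTCOME_VALUE[outcome]
    return [(b, value) for b in boards]


def n_step_targets(boards: Sequence[Hashable],
                   value_fn: Callable[[Hashable], float],
                   outcome: str,
                   n: int = 1) -> List[Tuple[Hashable, float]]:
    """Play n moves: the target for b is the current estimate V(b^(n)).

    Assumption: when fewer than n moves remain, the final outcome is used.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    final = OUTCOME_VALUE[outcome]
    targets = []
    for t, b in enumerate(boards):
        later = t + n
        target = value_fn(boards[later]) if later < len(boards) else final
        targets.append((b, target))
    return targets


# Hypothetical game of three boards that ends in a win.
game = ["b0", "b1", "b2"]
estimate = {"b0": 0.1, "b1": 0.4, "b2": 0.8}
by_outcome = outcome_targets(game, "win")
one_step = n_step_targets(game, lambda b: estimate[b], "win", n=1)
```

The pairs produced by either function can serve as the desired values V*(b) when tuning the weights.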


A General Framework

[Diagram: Mathematical Modeling (Statistics) + Finding Optimal Parameters (Optimization) → Machine Learning]


Today’s Topics

• Why machine learning?
• Example: learning to play backgammon
• General issues in machine learning


Important Issues in Machine Learning

• Obtaining experience
  • How to obtain experience? Supervised learning vs. unsupervised learning
  • How many examples are enough? PAC learning theory
• Learning algorithms
  • Which algorithms can approximate the function well, and when?
  • How does the complexity of learning algorithms impact the learning accuracy?
  • Is the target function learnable?
• Representing inputs
  • How to represent the inputs?
  • How to remove irrelevant information from the input representation?
  • How to reduce the redundancy of the input representation?
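
For the question "How many examples are enough?", one standard PAC result can be sketched in a few lines. It is a worked illustration chosen here, not a formula taken from the slides. For a finite hypothesis class H and a learner that outputs a hypothesis consistent with the training data, m >= (1/epsilon) * (ln|H| + ln(1/delta)) examples suffice to guarantee error at most epsilon with probability at least 1 - delta:

```python
import math


def pac_sample_bound(hypothesis_count: int, epsilon: float, delta: float) -> int:
    """Examples sufficient for a consistent learner over a finite hypothesis class.

    m >= (1/epsilon) * (ln|H| + ln(1/delta))
    """
    if hypothesis_count < 1:
        raise ValueError("hypothesis_count must be positive")
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return math.ceil((math.log(hypothesis_count) + math.log(1.0 / delta)) / epsilon)


# Conjunctions over 10 boolean attributes: each attribute appears positively,
# negatively, or not at all, so |H| = 3**10.
needed = pac_sample_bound(3 ** 10, epsilon=0.1, delta=0.05)
```

The bound grows only logarithmically with the size of the hypothesis class, but linearly with 1/epsilon.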

