Short Introduction to Machine Learning

Instructor: Rada Mihalcea

Learning?

What can we learn from here?
- If Sky = Sunny and Air Temperature = Warm => Enjoy Sport = Yes
- If Sky = Sunny => Enjoy Sport = Yes
- If Air Temperature = Warm => Enjoy Sport = Yes
- If Sky = Sunny and Air Temperature = Warm and Wind = Strong => Enjoy Sport = Yes ??

Example  Sky    Air Temp  Humidity  Wind    Water  Forecast  Enjoy Sport
1        Sunny  Warm      Normal    Strong  Warm   Same      Yes
2        Sunny  Warm      High      Strong  Warm   Same      Yes
3        Rainy  Cold      High      Strong  Warm   Change    No
4        Sunny  Warm      High      Strong  Cold   Change    Yes

What is machine learning?

“Any process by which a system improves performance.” (H. Simon)

“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.” (T. Mitchell)

Machine learning is thus concerned with designing computer programs that improve their performance through experience.

Related areas

- Artificial intelligence
- Probability and statistics
- Computational complexity theory
- Information theory
- Human language technology

Applications of ML

- Learning to recognize spoken words: SPHINX (Lee 1989)
- Learning to drive an autonomous vehicle: ALVINN (Pomerleau 1989)
- Learning to classify celestial objects (Fayyad et al. 1995)
- Learning to play world-class backgammon: TD-GAMMON (Tesauro 1992)
- Learning to translate between languages
- Learning to classify texts into categories: Web directories

Main directions in ML

- Data mining
  - Finding patterns in data
  - Using "historical" data to make a decision, e.g. predicting the weather based on current conditions
- Self-customization
  - Automatic feedback integration
  - Adapting to user "behaviour" (recommender systems)
- Writing applications that cannot be programmed by hand, in particular because they involve huge amounts of data
  - Speech recognition
  - Handwriting recognition
  - Text understanding

Terminology

- Learning is performed from EXAMPLES (or INSTANCES).
- An example contains ATTRIBUTES or FEATURES, e.g. Sky, Air Temperature, Water.
- In concept learning, we want to learn the value of the TARGET ATTRIBUTE. These are classification problems; in the binary case the labels are +/- (positive/negative).
- Attributes have VALUES:
  - a single value (e.g. Warm)
  - "?" indicates that any value is possible for this attribute
  - "∅" indicates that no value is acceptable
- All the features in an example together are referred to as the FEATURE VECTOR.

Terminology

The feature vector for our learning problem is (Sky, Air Temp, Humidity, Wind, Water, Forecast), and the target attribute is EnjoySport.

How do we represent "Aldo enjoys sport only on cold days with high humidity"?
(?, Cold, High, ?, ?, ?)

How about "Emma enjoys sport regardless of the weather"?
(?, ?, ?, ?, ?, ?)

A HYPOTHESIS is a vector of such attribute constraints; it covers the set of examples consistent with it.

Most general hypothesis: (?, ?, ?, ?, ?, ?)
Most specific hypothesis: (∅, ∅, ∅, ∅, ∅, ∅)

How many hypotheses can be generated for our feature vector?
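
A minimal Python sketch of this representation, using "?" for "any value" and None for "no value acceptable" (the encoding and names are illustrative, not from the slides):

```python
def covers(hypothesis, example):
    """True if the hypothesis covers (matches) the example:
    '?' accepts any value, None accepts no value at all."""
    return all(h == "?" or (h is not None and h == a)
               for h, a in zip(hypothesis, example))

# (Sky, Air Temp, Humidity, Wind, Water, Forecast)
aldo = ("?", "Cold", "High", "?", "?", "?")   # cold days with high humidity
x = ("Sunny", "Cold", "High", "Strong", "Warm", "Same")
print(covers(aldo, x))        # True
print(covers(("?",) * 6, x))  # most general hypothesis: always True
print(covers((None,) * 6, x)) # most specific hypothesis: always False
```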

Task in machine learning

Given:
- a set of examples X
- a set of hypotheses H
- a target concept c

Determine:
- a hypothesis h in H such that h(x) = c(x) for every example x

Practically, we want to determine the hypotheses that best fit our examples:
(Sunny, ?, ?, ?, ?, ?) => Yes
(?, Warm, ?, ?, ?, ?) => Yes
(Sunny, Warm, ?, ?, ?, ?) => Yes
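
One classic way to carry out this search (not named on the slide, but in the same spirit as the examples above) is a Find-S-style procedure: start from the most specific hypothesis and minimally generalize it on every positive example. A Python sketch:

```python
def find_s(positive_examples):
    """Generalize each attribute to '?' as soon as two
    positive examples disagree on its value."""
    h = None  # the most specific hypothesis: covers nothing
    for x in positive_examples:
        if h is None:
            h = list(x)  # first positive example, taken literally
        else:
            h = [hi if hi == xi else "?" for hi, xi in zip(h, x)]
    return tuple(h) if h else None

# The three positive examples from the EnjoySport table:
positives = [
    ("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),
    ("Sunny", "Warm", "High", "Strong", "Warm", "Same"),
    ("Sunny", "Warm", "High", "Strong", "Cold", "Change"),
]
print(find_s(positives))  # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```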

Machine learning applications

Until now: a toy example. Decide whether X enjoys sport, given the current conditions and the forecast.

Practical problems:
- Part-of-speech tagging. How?
- Word sense disambiguation
- Text categorization
- Chunking
- ...

Any problem that can be modeled through examples should support learning.

Machine learning algorithms

- Concept learning via search through general-to-specific hypotheses
- Decision tree learning
- Instance-based learning
- Rule-based learning
- Neural networks
- Bayesian learning
- Genetic algorithms

Basic elements of information theory

How do we determine which attribute is the best classifier? Measure the information gain of each attribute.

Entropy characterizes the (im)purity of an arbitrary collection of examples. Given a collection S containing a fraction p of positive and a fraction q of negative examples:
- Entropy(S) = - p log2 p - q log2 q
- Entropy is at its maximum when p = q = 1/2
- Entropy is at its minimum when p = 1 and q = 0

Example: S contains 14 examples, 9 positive and 5 negative:
Entropy(S) = -(9/14) log2 (9/14) - (5/14) log2 (5/14) = 0.94

(By convention, 0 log2 0 = 0.)
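
As a concrete illustration, here is a minimal Python sketch of this computation (function and label names are our own, not from the slides):

```python
import math

def entropy(labels):
    """Entropy of a collection of binary labels, in bits.
    Uses the convention 0 * log2(0) = 0."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = labels.count("+") / n   # fraction of positive examples
    q = 1.0 - p                 # fraction of negative examples
    return -sum(x * math.log2(x) for x in (p, q) if x > 0)

# The slide's example: 9 positive and 5 negative examples
print(entropy(["+"] * 9 + ["-"] * 5))  # ~0.94
```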

Basic elements of information theory

Information gain measures the expected reduction in entropy achieved by partitioning the examples on a given attribute.

Many learning algorithms make decisions based on information gain.

Gain(S, A) = Entropy(S) - Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)

where S_v is the subset of examples in S for which attribute A has value v.
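
A small Python sketch of this formula, reusing the entropy() helper from the sketch above (the dict-based example encoding is an assumption for illustration):

```python
from collections import defaultdict

def information_gain(examples, labels, attribute):
    """Gain(S, A): entropy of S minus the weighted entropy of the
    partitions S_v induced by each value v of attribute A."""
    n = len(examples)
    partitions = defaultdict(list)
    for x, y in zip(examples, labels):
        partitions[x[attribute]].append(y)  # group labels by value of A
    remainder = sum(len(s_v) / n * entropy(s_v)
                    for s_v in partitions.values())
    return entropy(labels) - remainder

# e.g. examples = [{"Wind": "Strong", ...}, ...], labels = ["+", "-", ...]
# information_gain(examples, labels, "Wind")
```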

Decision trees

Decision trees have the capability of generating rules:
IF outlook = sunny AND temperature = hot THEN play tennis = no

Powerful! It would be very hard to derive such rules by hand.
- C4.5 (Quinlan)
- ID3
- Integral part of MLC++
- Integral part of Weka (for Java)
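
For instance, with scikit-learn (our substitution; the slide itself points to C4.5/ID3 and Weka), a tree can be fit and its rules printed:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Tiny play-tennis-style toy data (made up for illustration):
# outlook: 0 = sunny, 1 = overcast, 2 = rain; temperature: 0 = cool ... 2 = hot
X = [[0, 2], [0, 1], [1, 2], [2, 0], [2, 1], [1, 0]]
y = ["no", "no", "yes", "yes", "yes", "yes"]

tree = DecisionTreeClassifier().fit(X, y)
print(export_text(tree, feature_names=["outlook", "temperature"]))
# Prints IF/THEN-style splits, e.g. "|--- outlook <= 0.50 ... class: no"
```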

Instance based algorithms

Distance between examples. Remember the WSD algorithm?

K-nearest neighbour: given a set of training examples X, each represented as a feature vector (a1(x), a2(x), ..., an(x)), classify a new instance based on its distance to all the examples in training.

d(x_i, x_j) = sqrt( Σ_{r=1}^{n} (a_r(x_i) - a_r(x_j))^2 )
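
A Python sketch of the distance computation and the resulting k-NN classifier (names and data are illustrative):

```python
import math

def distance(xi, xj):
    """Euclidean distance: square root of the sum of squared differences
    between corresponding feature values a_r(x_i) and a_r(x_j)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))

def knn_classify(query, training, k=3):
    """Label a new instance by majority vote among its k nearest
    training examples; `training` is a list of (vector, label) pairs."""
    nearest = sorted(training, key=lambda ex: distance(query, ex[0]))[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

train = [((1.0, 0.0), "+"), ((0.9, 0.2), "+"), ((0.0, 1.0), "-")]
print(knn_classify((0.8, 0.1), train, k=3))  # '+'
```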

Instance based algorithms

Instance-based methods take into account every single example. Advantage? Disadvantage?

"Do not forget exceptions."

Very good for NLP tasks:
- WSD
- POS tagging

Measure learning performance

Error on test data:
- Sample error: wrong cases / total cases
- True error (generalization error): estimate an error range starting from the sample error

Cross-validation schemes give more accurate evaluations. The 10-fold cross-validation scheme:
- Divide the training data into 10 sets
- Use one set for testing and the other 9 sets for training
- Repeat 10 times and measure the average accuracy
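
A sketch of that scheme in Python; `train_and_test` is a stand-in for whatever learner is being evaluated (assumed to return accuracy on the held-out fold):

```python
import random

def ten_fold_cv(examples, train_and_test, folds=10):
    """Shuffle the data, split it into `folds` parts, use each part once
    as the test set and the rest for training; return average accuracy."""
    data = examples[:]
    random.shuffle(data)
    parts = [data[i::folds] for i in range(folds)]
    accuracies = []
    for i in range(folds):
        test = parts[i]
        train = [x for j, part in enumerate(parts) if j != i for x in part]
        accuracies.append(train_and_test(train, test))
    return sum(accuracies) / folds
```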

Practical issues – Using Weka

Weka, freely available software:
- Java implementation of many learning algorithms
- supports boosting
- capable of handling very large data sets
- automatic cross-validation

To run an experiment, give Weka a training file:
file.arff [test file optional; if not present, Weka will evaluate through cross-validation]
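
For example, training a C4.5-style tree (J48) from the command line looks roughly like this; the jar location and file names are illustrative. With -t alone, Weka evaluates by cross-validation; -T supplies a separate test file:

```
java -cp weka.jar weka.classifiers.trees.J48 -t golf.arff
java -cp weka.jar weka.classifiers.trees.J48 -t golf.arff -T test.arff
```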

Specify the feature types

Specify the feature types:
- Discrete: value drawn from a set of nominal values
- Continuous: numeric value

Example: golf data
Play, Don't Play.               | the target attribute
outlook: sunny, overcast, rain. | features
temperature: real.
humidity: real.
windy: true, false.
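
Since the course uses Weka, the same schema would be written as an ARFF header. This is a sketch, with the class labels simplified to avoid ARFF's quoting rules for values containing spaces:

```
% golf.arff (illustrative)
@relation golf

@attribute outlook {sunny, overcast, rain}
@attribute temperature real
@attribute humidity real
@attribute windy {true, false}
@attribute play {play, dont_play}

@data
sunny, 85, 85, false, dont_play
overcast, 83, 78, false, play
```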

Weather Data

sunny,    85, 85, false, Don't Play
sunny,    80, 90, true,  Don't Play
overcast, 83, 78, false, Play
rain,     70, 96, false, Play
rain,     68, 80, false, Play
rain,     65, 70, true,  Don't Play
overcast, 64, 65, true,  Play
sunny,    72, 95, false, Don't Play
sunny,    69, 70, false, Play
rain,     75, 80, false, Play
sunny,    75, 70, true,  Play
overcast, 72, 90, true,  Play
overcast, 81, 75, false, Play
rain,     71, 80, true,  Don't Play

Running Weka

Check “Short Intro to Weka”

