Transcript
Page 1:

Seminar of Interest

Friday, September 15, at 11:00 am, EMS W220.

Dr. Hien Nguyen of the University of Wisconsin-Whitewater.

"Hybrid User Model for Information Retrieval: Framework and Evaluation".

Page 2:

Overview of Today’s Lecture

• Last Time: representing examples (feature selection), HW0, intro to supervised learning
• HW0 due on Tuesday

• Today: K-NN wrapup, Naïve Bayes

• Reading Assignment: Sections 2.1, 2.2, Chapter 5

Page 3:

Nearest-Neighbor Algorithms
(aka exemplar models, instance-based learning (IBL), case-based learning)

• Learning ≈ memorize training examples
• Problem solving = find the most similar example in memory; output its category

[Figure: scattered ‘+’ and ‘−’ training examples with an unlabeled query ‘?’; classifying by nearest example partitions the space as in “Voronoi Diagrams” (pg 233)]

Page 4:

Sample Experimental Results

Testset correctness:

Testbed            IBL    D-Trees   Neural Nets
Wisconsin Cancer   98%    95%       96%
Heart Disease      78%    76%       ??
Tumor              37%    38%       ??
Appendicitis       83%    85%       86%

Simple algorithm works quite well!

Page 5:

Simple Example: 1-NN
(1-NN ≡ one nearest neighbor)

Training Set
1. a=0, b=0, c=1   +
2. a=0, b=1, c=0   −
3. a=1, b=1, c=1   −

Test Example
• a=0, b=1, c=0   ?

“Hamming Distance” to each training example:
• Ex 1 = 2
• Ex 2 = 1
• Ex 3 = 2

Example 2 is the nearest neighbor and is labeled −, so output −.
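A minimal Python sketch of this example (the training set is from the slide; the function names are mine):

```python
def hamming(x, y):
    """Number of features on which two examples disagree."""
    return sum(a != b for a, b in zip(x, y))

# Training set from the slide: (a, b, c) -> label
train = [((0, 0, 1), '+'),
         ((0, 1, 0), '-'),
         ((1, 1, 1), '-')]

def one_nn(query):
    """1-NN: output the category of the most similar stored example."""
    features, label = min(train, key=lambda ex: hamming(ex[0], query))
    return label

print(one_nn((0, 1, 0)))  # Ex 2 is nearest, so output '-'
```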

Page 6:

K-NN Algorithm

Collect the K nearest neighbors, select the majority classification (or somehow combine their classes).

• What should K be?
• Problem dependent
• Can use tuning sets (later) to select a good setting for K

[Plot: error rate on the tuning set as a function of K, for K = 1, 2, 3, 4, 5]
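A sketch of choosing K with a tuning set (the helper names are mine, and the candidate values K = 1..5 are taken from the slide’s plot axis):

```python
from collections import Counter

def knn_classify(train, query, k, dist):
    """Majority vote over the k nearest training examples."""
    neighbors = sorted(train, key=lambda ex: dist(ex[0], query))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

def choose_k(train, tuning, dist, candidates=(1, 2, 3, 4, 5)):
    """Return the K with the lowest error rate on the tuning set."""
    def error_rate(k):
        wrong = sum(knn_classify(train, q, k, dist) != y for q, y in tuning)
        return wrong / len(tuning)
    return min(candidates, key=error_rate)
```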

Page 7:

What is the “distance” between two examples?

One possibility: sum the distances between features:

$$\mathrm{distance}(e_1, e_2) = \sum_{i=1}^{\#\mathrm{features}} w_i \cdot d(e_{1,i},\, e_{2,i})$$

where $\mathrm{distance}(e_1, e_2)$ is the distance between examples 1 and 2, $w_i$ is a numeric feature-specific weight, and $d(e_{1,i}, e_{2,i})$ is the distance for feature i only.
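As a sketch, that sum is a one-liner in Python (the per-feature distance and the weights are parameters; the names are mine):

```python
def weighted_distance(ex1, ex2, weights, feature_dist):
    """distance(e1, e2) = sum over features i of w_i * d(e1_i, e2_i)."""
    return sum(w * feature_dist(a, b)
               for w, a, b in zip(weights, ex1, ex2))

# With unit weights and absolute difference as the per-feature distance:
print(weighted_distance((1.0, 0.0), (0.0, 2.0),
                        weights=(1.0, 1.0),
                        feature_dist=lambda a, b: abs(a - b)))  # 3.0
```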

Page 8:

Using K Neighbors to Classify an Example

Given: nearest neighbors $e_1, \ldots, e_k$ with output categories $O_1, \ldots, O_k$.

The output for example $e_t$ is

$$O_t = \arg\max_{c \,\in\, \text{possible categories}} \; \sum_{i=1}^{k} K(e_i, e_t) \cdot \delta(O_i, c)$$

where $K(e_i, e_t)$ is the kernel and $\delta$ is the “delta” function ($\delta(O_i, c) = 1$ if $O_i = c$, else 0).
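A direct transcription of that formula into Python (a sketch; `neighbors` is assumed to be a list of (example, category) pairs):

```python
def kernel_vote(neighbors, query, kernel, categories):
    """O_t = argmax over categories c of sum_i K(e_i, e_t) * delta(O_i, c)."""
    def score(c):
        # delta(O_i, c) simply restricts the sum to neighbors labeled c
        return sum(kernel(e_i, query) for e_i, o_i in neighbors if o_i == c)
    return max(categories, key=score)
```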

Page 9:

Kernel Functions

• The term “kernel” comes from statistics
• Major topic for support vector machines (later)
• Weights the interaction between pairs of examples
  • Can involve a similarity measure

Page 10:

Kernel Function K(e_i, e_t): Examples

• K(e_i, e_t) = 1: simple majority vote (? classified as −)
• K(e_i, e_t) = 1 / dist(e_i, e_t): inverse-distance weighting (? could be classified as +)

In the diagram, the example ‘?’ has three neighbors, two of which are ‘−’ and one of which is ‘+’.
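To see how the kernel changes the outcome, here is that three-neighbor situation with hypothetical distances (the numbers are made up for illustration; a close ‘+’ and two farther ‘−’ examples):

```python
# (label, distance to the query '?') for the three neighbors
neighbors = [('+', 1.0), ('-', 4.0), ('-', 5.0)]

uniform = {'+': 0.0, '-': 0.0}
inverse = {'+': 0.0, '-': 0.0}
for label, dist in neighbors:
    uniform[label] += 1.0            # K(e_i, e_t) = 1
    inverse[label] += 1.0 / dist     # K(e_i, e_t) = 1 / dist(e_i, e_t)

print(max(uniform, key=uniform.get))  # '-' wins, 2 votes to 1
print(max(inverse, key=inverse.get))  # '+' wins, 1.0 to 0.45
```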

Page 11:

Gaussian Kernel: Popular in SVMs

$$K(e_i, e_t) = e^{-\mathrm{dist}(e_i, e_t)^2 / (2\sigma^2)}$$

where $e$ is Euler’s constant, $\mathrm{dist}(e_i, e_t)$ is the distance between the two examples, and $\sigma$ is the “standard deviation”.
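A sketch of the kernel as code (assuming the distance between the two examples has already been computed):

```python
import math

def gaussian_kernel(dist, sigma):
    """K(e_i, e_t) = exp(-dist(e_i, e_t)**2 / (2 * sigma**2))."""
    return math.exp(-dist ** 2 / (2 * sigma ** 2))

print(gaussian_kernel(0.0, 1.0))  # 1.0: identical examples get full weight
print(gaussian_kernel(2.0, 1.0))  # ~0.135: weight decays quickly with distance
```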

Page 12:

[Plot comparing weighting curves as a function of distance x: y = 1/x, y = 1/exp(x²), and y = 1/x²]

Page 13:

Instance-Based Learning (IBL) and Efficiency

• IBL algorithms postpone work from training to testing
  • Pure NN/IBL just memorizes the training data
• Computationally intensive
  • Match all features of all training examples

Page 14:

Instance-Based Learning (IBL) and Efficiency

• Possible speed-ups:
  • Use a subset of the training examples (Aha)
  • Use clever data structures (A. Moore)
    • KD trees, hash tables, Voronoi diagrams
  • Use a subset of the features (feature selection)

Page 15:

Feature Selection as a Search Problem

• State = set of features
• Start state:
  • No features (forward selection), or
  • All features (backward selection)
• Operators = add/subtract features
• Scoring function = accuracy on the tuning set

Page 16:

Forward and Backward Selection of Features

• Hill-climbing (“greedy”) search

[Search diagram: each node is a set of features to use, annotated with its accuracy on the tuning set (our heuristic function).
Forward selection starts from {} (50%) and applies add operators: add F1 → {F1} (62%), ..., add FN → {FN} (71%).
Backward selection starts from {F1, F2, ..., FN} (73%) and applies subtract operators: subtract F1 → {F2, ..., FN} (79%), subtract F2 → ...]
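A sketch of forward selection as greedy hill-climbing, where `score(features)` is assumed to return accuracy on the tuning set (all names here are mine, not from the lecture):

```python
def forward_selection(all_features, score):
    """Start with no features; repeatedly add the single feature that most
    improves tuning-set accuracy; stop when no addition helps."""
    current, best = set(), score(set())
    while True:
        candidates = [(score(current | {f}), f)
                      for f in all_features - current]
        if not candidates:
            return current
        new_best, f = max(candidates, key=lambda c: c[0])
        if new_best <= best:
            return current          # no single feature helps any more
        current, best = current | {f}, new_best
```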

Page 17:

Forward vs. Backward Feature Selection

Forward:
• Faster in early steps because fewer features to test
• Fast for choosing a small subset of the features
• Misses useful features whose usefulness requires other features (feature synergy)

Backward:
• Fast for choosing all but a small subset of the features
• Preserves useful features whose usefulness requires other features
  • Example: area important, features = length, width

Page 18:

Feature Selection and Machine Learning

Filtering-Based Feature Selection:
all features → FS algorithm → subset of features → ML algorithm → model

Wrapper-Based Feature Selection:
all features → FS algorithm → model
(the FS algorithm calls the ML algorithm many times, and uses it to help select features)

Page 19:

Number of Features and Performance

• Too many features can hurt test set performance

• Too many irrelevant features mean many spurious correlation possibilities for an ML algorithm to detect

Page 20:

“Vanilla” K-NN Report Card

Learning Efficiency          A+
Classification Efficiency    F
Stability                    C
Robustness (to noise)        D
Empirical Performance        C
Domain Insight               F
Implementation Ease          A
Incremental Ease             A

But it is a good baseline.

Page 21:

K-NN Summary

• K-NN can be an effective ML algorithm
  • Especially if few irrelevant features
• Good baseline for experiments

Page 22:

A Different Approach to Classification: Probabilistic Models

• Indicate confidence in the classification
• Given feature vector:
  F = (f_1 = v_1, ..., f_n = v_n)
• Output probability:
  P(class = + | F)

(the probability that the class is positive given the feature vector)

Page 23:

Probabilistic K-NN

• Output a probability using the k neighbors
• Possible algorithm:

P(class = + | F) = (number of “+” neighbors) / k
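As a sketch (assuming `neighbors` is the list of (example, label) pairs sorted by distance to the query):

```python
def prob_positive(neighbors, k):
    """P(class = + | F) = (number of '+' neighbors) / k."""
    return sum(1 for _, label in neighbors[:k] if label == '+') / k

# e.g. if 3 of the 5 nearest neighbors are '+':
nbrs = [(None, '+'), (None, '+'), (None, '-'), (None, '+'), (None, '-')]
print(prob_positive(nbrs, k=5))  # 0.6
```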

Page 24:

Bayes’ Rule

• Definitions:
  P(A ∧ B) ≡ P(B) * P(A|B)
  P(A ∧ B) ≡ P(A) * P(B|A)

• So (assuming P(B) > 0):

  P(B) * P(A|B) = P(A) * P(B|A)

  P(A|B) = P(A) * P(B|A) / P(B)   ← Bayes’ rule

[Venn diagram: overlapping events A and B]

Page 25:

Conditional Probabilities

• Note the difference, for the A and B in the previous slide’s Venn diagram:
  • P(A|B) is small
  • P(B|A) is large

Page 26:

Bayes’ Rule Applied to ML

P(class | F) = P(F | class) * P(class) / P(F)

Why do we care about Bayes’ rule? Because while P(class | F) is typically difficult to measure directly, the values on the right-hand side are often easy to estimate (especially if we make simplifying assumptions).

Here P(class | F) is shorthand for P(class = c | f_1 = v_1, ..., f_n = v_n).
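A worked instance of the right-hand side with made-up estimates, just to show the arithmetic (the numbers are not from the slides):

```python
# Hypothetical estimates, e.g. obtained by counting over training data:
p_f_given_pos = 0.30   # P(F | class = +)
p_pos         = 0.40   # P(class = +), the class prior
p_f           = 0.20   # P(F), often computed by summing over classes

p_pos_given_f = p_f_given_pos * p_pos / p_f
print(p_pos_given_f)   # 0.6 = P(class = + | F)
```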

