
Poggi analytics - distance - 1a

Page 1: Poggi   analytics - distance - 1a

Buenos Aires, March 2016. Eduardo Poggi

www.umiacs.umd.edu/~mrastega/

Page 2: Poggi   analytics - distance - 1a

Instance Based Learning

Distances
Introduction
k-nearest neighbor
Locally weighted regression
Radial Basis Functions
Case-Based Reasoning
Instance reduction

Page 3: Poggi   analytics - distance - 1a

Distances

What if it applies to some rather than to all?

Page 4: Poggi   analytics - distance - 1a

Distances

Page 5: Poggi   analytics - distance - 1a

Distances

Page 6: Poggi   analytics - distance - 1a

Distances

Page 7: Poggi   analytics - distance - 1a

Distances

A pairwise similarity matrix between product categories (upper triangle shown; the matrix is symmetric):

              Cars   Motorcycles  Electronics  Toys   Candy  Wheat  Chickens
Cars           1        0.8          0.5        0.2    0.1     0       0
Motorcycles             1            0.5        0.2    0.1     0       0
Electronics                          1          0.2    0.1     0       0
Toys                                            1      0.1     0       0
Candy                                                  1       0.5     0.5
Wheat                                                          1       0.7
Chickens                                                               1

Page 8: Poggi   analytics - distance - 1a

Distances

The Levenshtein distance, also called edit distance or string distance, is the minimum number of operations required to transform one string of characters into another, where an operation is the insertion, deletion, or substitution of a single character.

https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#Python
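A minimal dynamic-programming sketch in the spirit of the Wikibooks implementation linked above (the function name is ours):

```python
# Levenshtein distance via dynamic programming over a rolling row.
def levenshtein(a: str, b: str) -> int:
    # prev[j] holds the edit distance between the current prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[len(b)]

print(levenshtein("kitten", "sitting"))  # 3
```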

Page 9: Poggi   analytics - distance - 1a

Distances

www.sc.ehu.es/ccwgrrom/transparencias/pdf-vision-1-transparencias/capitulo-1.pdf

Page 10: Poggi   analytics - distance - 1a

Distances

http://www.nidokidos.org/threads/29243-Animals-humans-face-similarity-funny-pics!!

Page 11: Poggi   analytics - distance - 1a

Distances

http://lear.inrialpes.fr/people/nowak/similarity/

Page 12: Poggi   analytics - distance - 1a

Distances

Product (a taxonomy, from root to leaves):

Food  Cleaning  Clothing

Animal  Vegetable  Mineral

Dairy  Meat

Liquid milk  Fermented milk  Cheese  Butter

Whole yogurt  Skim yogurt

Plain yogurt  Flavored yogurt

Page 13: Poggi   analytics - distance - 1a

IBL?

The idea is simple: the class of an instance should be similar to the class assigned to similar examples. Store all the examples; when an instance arrives for classification, find the "most similar" examples and analyze the classes assigned to them.

But: classification can be expensive. Are all attributes equally relevant? How many examples count as "similar"? What if the similar examples have dissimilar classes? Do all similar examples carry the same weight? How similar must the similar ones be?

Page 14: Poggi   analytics - distance - 1a

K-nearest neighbor

To define how similar two examples are, we need a metric. We assume all examples are points in an n-dimensional space R^n and use the Euclidean distance. Let Xi and Xj be two examples; their distance d(Xi, Xj) is defined as:

d(Xi, Xj) = ( Σk (xik − xjk)² )^(1/2)

where xik is the value of attribute k on example Xi.
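As a sketch, the metric translates directly to Python, assuming examples are equal-length sequences of numeric attribute values:

```python
import math

# Euclidean distance between two examples given as attribute-value sequences.
def euclidean(xi, xj):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))

print(euclidean([1.0, 2.0], [4.0, 6.0]))  # 5.0
```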

Page 15: Poggi   analytics - distance - 1a

K-nearest neighbor for discrete classes

[Figure: a new example classified by its k = 4 nearest neighbors]

Page 16: Poggi   analytics - distance - 1a

Nearest Neighbor

Four things make a memory-based learner:

1. A distance metric: Euclidean.

2. How many nearby neighbors to look at? One.

3. A weighting function (optional): unused.

4. How to fit with the local points? Just predict the same output as the nearest neighbor.

Page 17: Poggi   analytics - distance - 1a

Voronoi Diagram

Decision surface induced by a 1-nearest-neighbor classifier. The decision surface is a combination of convex polyhedra surrounding each training example.

Page 18: Poggi   analytics - distance - 1a

The Zen of Voronoi Diagrams

Page 19: Poggi   analytics - distance - 1a

0 Nearest Neighbor

Page 20: Poggi   analytics - distance - 1a

1 Nearest Neighbor

Page 21: Poggi   analytics - distance - 1a

3 Nearest Neighbor

Page 22: Poggi   analytics - distance - 1a

5 Nearest Neighbor

Page 23: Poggi   analytics - distance - 1a

7 Nearest Neighbor

Page 24: Poggi   analytics - distance - 1a

k-Nearest Neighbour Classification Method

Key idea: keep all the training instances. Given a query example, take a vote amongst its k neighbours. Neighbours are determined by using a distance function.

Page 25: Poggi   analytics - distance - 1a

k-Nearest Neighbour Classification Method

[Figure: query neighborhoods for k = 1 and k = 4]

Probability interpretation: estimate p(y|x) as

p(y|x) = |{ (xi, yi) : xi ∈ N(x), yi = y }| / |N(x)|

where N(x) is the neighborhood around x.

Sample adapted from Rong Jin’s slides

Page 26: Poggi   analytics - distance - 1a

k-Nearest Neighbour Classification Method

Advantages: training is really fast, and the method can learn complex target functions.

Disadvantages: it is slow at query time; efficient data structures are needed to speed up the query.

Page 27: Poggi   analytics - distance - 1a

How to choose k?

Use leave-one-out validation.

For k = 1, 2, …, K, set Err(k) = 0, then:

1. Select a training data point and hide its class label.

2. Use the remaining data and the given k to predict the class label of the held-out point.

3. Add 1 to Err(k) if the predicted label differs from the true label.

Repeat until every training example has been tested, then choose the k whose Err(k) is minimal. (A sketch follows.)
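A minimal sketch of this selection loop, assuming the training data is a list of (attribute_vector, class_label) pairs and a plain majority-vote k-NN predictor; it visits each point in order rather than sampling at random, and all names are ours:

```python
import math
from collections import Counter

# Majority-vote k-NN prediction for a single query point.
def knn_predict(train, query, k):
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Leave-one-out selection of k: hide each point in turn, count mistakes.
def choose_k(train, max_k):
    def err(k):
        return sum(knn_predict(train[:i] + train[i + 1:], x, k) != y
                   for i, (x, y) in enumerate(train))
    return min(range(1, max_k + 1), key=err)  # the k with minimal Err(k)
```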


Page 32: Poggi   analytics - distance - 1a

How to choose k? (result)

Applying the leave-one-out procedure above to the example data gives Err(1) = 3, Err(2) = 2, and Err(3) = 6. The error is minimal at k = 2, so k = 2 is chosen.

Page 33: Poggi   analytics - distance - 1a

K-nearest neighbor for discrete classes

Algorithm (parameter k):

For each training example (X, C(X)), add the example to the training list.

When a new example Xq arrives, assign the class by majority vote among the k nearest neighbors of Xq:

C(Xq) = argmax_v Σi δ(v, C(Xi))

where δ(a, b) = 1 if a = b and 0 otherwise.
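As a sketch, the argmax/δ vote transcribes directly to Python, assuming (vector, label) training pairs and the Euclidean metric from earlier:

```python
import math

# For each candidate class v, sum δ(v, C(Xi)) over the k nearest
# neighbors of xq and return the maximizing v.
def classify(train, xq, k):
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], xq))[:k]
    delta = lambda a, b: 1 if a == b else 0
    return max({c for _, c in nearest},
               key=lambda v: sum(delta(v, c) for _, c in nearest))
```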

Page 34: Poggi   analytics - distance - 1a

K-nearest neighbor for real-valued functions

Algorithm (parameter k):

For each training example (X, C(X)), add the example to the training list.

When a new example Xq arrives, assign the average value among the k nearest neighbors of Xq:

C(Xq) = Σi C(Xi) / k
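A minimal sketch for real-valued targets, with `train` holding (vector, value) pairs (our naming):

```python
import math

# k-NN regression: predict the mean target value of the k nearest neighbors.
def knn_regress(train, xq, k):
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], xq))[:k]
    return sum(value for _, value in nearest) / k
```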

Page 35: Poggi   analytics - distance - 1a

Distance Weighted Nearest Neighbor

It makes sense to weight the contribution of each example according to the distance to the new query example.

C(Xq) = argmax_v Σi wi δ(v, C(Xi))

For example, wi = 1 / d(Xq,Xi)
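A sketch of this weighted vote with wi = 1 / d(Xq, Xi); if the query coincides with a stored example, we return that example's class directly to avoid dividing by zero (that guard is our choice):

```python
import math
from collections import defaultdict

# Distance-weighted k-NN classification: each neighbor votes with 1/d.
def weighted_classify(train, xq, k):
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], xq))[:k]
    votes = defaultdict(float)
    for x, c in nearest:
        d = math.dist(x, xq)
        if d == 0:
            return c          # exact match: take its class outright
        votes[c] += 1.0 / d
    return max(votes, key=votes.get)
```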

Page 36: Poggi   analytics - distance - 1a

Nearest Neighbor

Four things make a memory-based learner:

1. A distance metric: Euclidean.

2. How many nearby neighbors to look at? k.

3. A weighting function (optional): 1 / d(Xq, Xi).

4. How to fit with the local points? Take the distance-weighted vote (or average) among the k nearest neighbors.

Page 37: Poggi   analytics - distance - 1a

Distance Weighted Nearest Neighbor for Real-Valued Functions

For real valued functions we average based on the weight function and normalize using the sum of all weights.

C(Xq) = Σi wi C(Xi) / Σi wi
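A sketch of the normalized weighted average; the small epsilon guarding against zero distance, like the names, is our choice:

```python
import math

# Distance-weighted k-NN regression: weighted mean normalized by the
# sum of the weights.
def weighted_regress(train, xq, k, eps=1e-12):
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], xq))[:k]
    weights = [1.0 / max(math.dist(x, xq), eps) for x, _ in nearest]
    return (sum(w * v for w, (_, v) in zip(weights, nearest))
            / sum(weights))
```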

Page 38: Poggi   analytics - distance - 1a

Problems with k-nearest Neighbor

The distance between examples is based on all attributes. What if some attributes are irrelevant?

Consider the curse of dimensionality: the larger the number of irrelevant attributes, the stronger their effect on the nearest-neighbor rule.

One solution is to put weights on the attributes; this is like stretching or contracting the dimensions of the input space. Ideally we would like to eliminate all irrelevant attributes. (A weighted-distance sketch follows.)
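A minimal sketch of a per-attribute weighted Euclidean distance: raising a weight stretches that dimension, and a weight of 0 removes the attribute from the comparison entirely.

```python
import math

# Weighted Euclidean distance with one weight per attribute.
def weighted_euclidean(xi, xj, w):
    return math.sqrt(sum(wk * (a - b) ** 2 for wk, a, b in zip(w, xi, xj)))

# Ignoring the second (irrelevant) attribute:
print(weighted_euclidean([1, 9, 2], [4, 0, 6], [1.0, 0.0, 1.0]))  # 5.0
```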

Page 39: Poggi   analytics - distance - 1a

Locally Weighted Regression

Let’s remember some terminology:

Regression: Is a problem similar to classification but the value to predict is a real number.

Residual: The difference between the true target value f and our approximation f’: f(X) – f’(X)

Kernel Function: The distance function that provides a weight to each example. The kernel function K is a function of the distance between examples: K = f(d(Xi,Xq))

Page 40: Poggi   analytics - distance - 1a

Locally Weighted Regression

The method is called locally weighted regression for the following reasons:

“Locally” because the predicted value for an example Xq is based only on the vicinity or neighborhood around Xq.

“Weighted” because the contribution of each neighbor of Xq will depend on the distance between the neighbor example and Xq.

“Regression” because the value to predict will be a real number.

Page 41: Poggi   analytics - distance - 1a

Locally Weighted Regression

Consider the problem of approximating a target function using a linear combination of attribute values:

f’(X) = w0 + w1x1 + w2x2 + … + wnxn where X = (x1, x2, …, xn)

We want to find the coefficients that minimize the error: E = ½ Σk [f(X) – f’(X)]²

Page 42: Poggi   analytics - distance - 1a

Locally Weighted Regression

If we do this in the vicinity of an example Xq and we wish to use a kernel function, we get a form of locally weighted regression:

E(Xq) = ½ Σk [f(X) – f’(X)]² K(d(Xq, X))

where the sum now goes over the neighbors of Xq.

Page 43: Poggi   analytics - distance - 1a

Locally Weighted Regression

Using gradient descent search, the update rule is defined as:

Δwj = η Σk [f(X) – f’(X)] K(d(Xq, X)) xj

where η is the learning rate and xj is the jth attribute of example X.
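A sketch of one such update around a query xq, assuming `train` holds (vector, target) pairs, `w` is the coefficient vector with w[0] as the intercept, and a Gaussian of the distance plays the role of the kernel K; the bandwidth parameter and all names are ours:

```python
import math

# One gradient-descent step of locally weighted linear regression.
def lwr_step(train, xq, w, eta, bandwidth):
    kernel = lambda d: math.exp(-(d * d) / (bandwidth * bandwidth))
    predict = lambda x: w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    grad = [0.0] * len(w)
    for x, fx in train:                     # the neighborhood of xq
        kd = kernel(math.dist(x, xq))       # K(d(Xq, X))
        err = fx - predict(x)               # f(X) - f'(X)
        grad[0] += err * kd                 # intercept term (x0 = 1)
        for j, xj in enumerate(x, start=1):
            grad[j] += err * kd * xj
    return [wj + eta * g for wj, g in zip(w, grad)]
```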

Page 44: Poggi   analytics - distance - 1a

Locally Weighted Regression

Here are some commonly used weighting functions (we use a Gaussian).

Page 45: Poggi   analytics - distance - 1a

Nearest Neighbor

1. A distance metric: scaled Euclidean.

2. How many nearby neighbors to look at? All of them.

3. A weighting function (optional): w_k = exp(−D(x_k, x_query)² / Kw²).

4. How to fit with the local points? First form a local linear model; find the β that minimizes the locally weighted sum of squared residuals (a sketch follows).
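A sketch of that local fit, assuming X is an (m, n) array of training vectors and y the targets: compute the Gaussian weights w_k around the query, then solve the weighted least-squares system β = (XᵀWX)⁻¹XᵀWy; the function name is ours.

```python
import numpy as np

# Locally weighted linear regression: Gaussian weights, then weighted
# least squares with an explicit intercept column.
def lwr_predict(X, y, x_query, Kw):
    w = np.exp(-np.sum((X - x_query) ** 2, axis=1) / Kw**2)  # w_k weights
    Xb = np.hstack([np.ones((len(X), 1)), X])  # prepend intercept column
    W = np.diag(w)
    beta = np.linalg.solve(Xb.T @ W @ Xb, Xb.T @ W @ y)
    return beta[0] + beta[1:] @ x_query
```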

Page 46: Poggi   analytics - distance - 1a

Locally Weighted Regression

Remarks:

The literature contains other functions that are non-linear.

There are many variations to locally weighted regression that use different kernel functions.

Normally a linear model is sufficiently good to approximate the local neighborhood of an example.

Page 47: Poggi   analytics - distance - 1a

Instance reduction

Page 48: Poggi   analytics - distance - 1a

Instance reduction

Page 49: Poggi   analytics - distance - 1a

Instance reduction

Page 50: Poggi   analytics - distance - 1a

[email protected]

eduardo-poggi

http://ar.linkedin.com/in/eduardoapoggi

https://www.facebook.com/eduardo.poggi

@eduardoapoggi

Page 51: Poggi   analytics - distance - 1a

Bibliography

