K Nearest Neighbor Presentation

Date post: 13-Apr-2017
Upload: dessy-amirudin
k Nearest Neighbor Dessy Amirudin May 2016 Data Science Indonesia Bootcamp
Transcript
Page 1: K Nearest Neighbor Presentation

k Nearest Neighbor

Dessy Amirudin

May 2016

Data Science Indonesia Bootcamp

Page 2: K Nearest Neighbor Presentation

Introduction

Page 3: K Nearest Neighbor Presentation

Other Names

• K-Nearest Neighbors

• Memory-Based Reasoning

• Example-Based Reasoning

• Instance-Based Learning

• Case-Based Reasoning

• Lazy Learning

Page 4: K Nearest Neighbor Presentation

History of kNN

• Used in statistical estimation and pattern recognition since the early 1970s as a non-parametric technique.

• The outcome for a new observation is decided from its k nearest neighbors in the training data.

• The nearest neighbors are found by computing the distance from the new observation to each training observation.
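The neighbor lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the bootcamp material: the function name `k_nearest`, the toy data, and the choice of Euclidean distance (via the standard library's `math.dist`) are all assumptions.

```python
import math

def k_nearest(train, query, k):
    """Return the k training pairs whose features are closest to `query`.

    `train` is a list of (features, outcome) pairs, where `features`
    is a tuple of numbers. Distance is Euclidean (an assumption here).
    """
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], query))
    return by_distance[:k]

# Hypothetical toy data: two clusters with outcomes "a" and "b".
train = [((1.0, 1.0), "a"), ((2.0, 2.0), "a"),
         ((8.0, 8.0), "b"), ((9.0, 9.0), "b")]

neighbors = k_nearest(train, (1.5, 1.0), k=2)
neighbor_outcomes = [outcome for _, outcome in neighbors]
print(neighbor_outcomes)
```

The outcome for the query point is then derived from `neighbor_outcomes`: an average for regression, a vote for classification.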

Page 5: K Nearest Neighbor Presentation

Application

• Text mining

• Agriculture

• Finance

• Healthcare

Page 6: K Nearest Neighbor Presentation

Source: http://personalexcellence.co/

Page 7: K Nearest Neighbor Presentation

Distance

• Numerical Data

• Categorical Data

$$D = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
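The formula on this slide is the Euclidean distance between two numerical vectors x and y of length n. A small sketch of the computation follows; the function name `euclidean_distance` and the example vectors are illustrative assumptions.

```python
import math

def euclidean_distance(x, y):
    """Square root of the sum of squared coordinate differences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of coordinates")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

# Differences are (3, 4, 0), so D = sqrt(9 + 16 + 0) = 5.
d = euclidean_distance((1.0, 2.0, 3.0), (4.0, 6.0, 3.0))
print(d)  # 5.0
```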

Page 8: K Nearest Neighbor Presentation

Distance – Text Mining

Hamming Distance

• "karolin" and "kathrin" is 3.

• "karolin" and "kerstin" is 3.

• 1011101 and 1001001 is 2.

• 2173896 and 2233796 is 3.
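The Hamming distance counts the positions at which two equal-length sequences differ. The sketch below reproduces the four examples from the slide; the function name `hamming_distance` is an assumption, and the numbers are treated as strings of digits.

```python
def hamming_distance(a, b):
    """Number of positions where sequences a and b differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(item_a != item_b for item_a, item_b in zip(a, b))

print(hamming_distance("karolin", "kathrin"))  # 3
print(hamming_distance("karolin", "kerstin"))  # 3
print(hamming_distance("1011101", "1001001"))  # 2
print(hamming_distance("2173896", "2233796"))  # 3
```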

Page 9: K Nearest Neighbor Presentation

Regression Formulation

Page 10: K Nearest Neighbor Presentation

kNN Regression

Page 11: K Nearest Neighbor Presentation

kNN Regression

$$y' = \frac{1}{K} \sum_{i=1}^{K} y_i$$
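In words, the prediction y' is the average of the outcomes of the K nearest training points. A minimal sketch for a single numeric input follows. The function name `knn_regress` and the toy data are assumptions, not the contents of the exercise scripts.

```python
def knn_regress(xs, ys, x_query, k):
    """Predict by averaging the outcomes of the k nearest training points."""
    # Rank training indices by absolute distance to the query.
    order = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x_query))
    nearest = order[:k]
    return sum(ys[i] for i in nearest) / k

# Hypothetical training data following y = x^2.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 4.0, 9.0, 16.0, 25.0]

pred_k1 = knn_regress(xs, ys, 2.9, k=1)  # copies the single closest outcome
pred_k3 = knn_regress(xs, ys, 2.9, k=3)  # averages outcomes at x = 3, 2, 4
pred_k5 = knn_regress(xs, ys, 2.9, k=5)  # averages every outcome
print(pred_k1, pred_k3, pred_k5)
```

With K=1 the prediction follows the closest training point exactly. When K equals the training set size, every query receives the same global mean.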

Page 12: K Nearest Neighbor Presentation

Simple Linear Regression

Page 13: K Nearest Neighbor Presentation

Exercise 1

• Open “simple_regression.R”

• Create the simulated data

• Follow the instructions

Page 14: K Nearest Neighbor Presentation

Simulated Data 1

Page 15: K Nearest Neighbor Presentation

MSE Plot Simple Regression

Page 16: K Nearest Neighbor Presentation

Plot with K=1

Page 17: K Nearest Neighbor Presentation

Plot with K=10

Page 18: K Nearest Neighbor Presentation

Plot with K=100

Page 19: K Nearest Neighbor Presentation

Simple Linear Regression

Introduce Non-Linearity

Page 20: K Nearest Neighbor Presentation

Introducing a Non-Linear Component

Page 21: K Nearest Neighbor Presentation

MSE Plot for the Non-Linear Problem

Page 22: K Nearest Neighbor Presentation

Curse of Dimensionality

Page 23: K Nearest Neighbor Presentation

Exercise 2

• Open “boston_knn_class.R”

• Load the MASS library

• Load the “Boston” data

• Follow the steps in the file

Page 24: K Nearest Neighbor Presentation

kNN Tips

• Normalize the input variables

• Find the optimal value of K using cross-validation
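Both tips can be sketched together. The example below rescales each input variable to [0, 1] so that no variable dominates the distance, then picks K by leave-one-out cross-validation on mean squared error. Everything here is an illustrative assumption: the function names, the min-max scaling choice, the toy data, and the candidate values of K.

```python
import math

def min_max_scale(rows):
    """Rescale every column of `rows` to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]

def knn_predict(train_x, train_y, query, k):
    """kNN regression: average the outcomes of the k nearest rows."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    return sum(train_y[i] for i in order[:k]) / k

def loocv_mse(xs, ys, k):
    """Leave-one-out cross-validated mean squared error for a given k."""
    squared_errors = []
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        pred = knn_predict(rest_x, rest_y, xs[i], k)
        squared_errors.append((pred - ys[i]) ** 2)
    return sum(squared_errors) / len(squared_errors)

def best_k(xs, ys, candidates):
    """Candidate k with the lowest cross-validated error."""
    return min(candidates, key=lambda k: loocv_mse(xs, ys, k))

# Hypothetical data: the second variable is on a much larger scale.
rows = [(1.0, 100.0), (2.0, 300.0), (3.0, 200.0),
        (4.0, 500.0), (5.0, 400.0), (6.0, 600.0)]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]

# Simplification: scaling uses all rows before cross-validation.
scaled = min_max_scale(rows)
candidates = [1, 2, 3]
chosen_k = best_k(scaled, ys, candidates)
print(chosen_k)
```

A stricter procedure would compute the scaling limits inside each cross-validation fold from the training rows only. The sketch scales once up front to stay short.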

Page 25: K Nearest Neighbor Presentation
Page 26: K Nearest Neighbor Presentation

Other experiment

Page 27: K Nearest Neighbor Presentation

Classification Formulation

Page 28: K Nearest Neighbor Presentation

kNN Classification

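$$y' = \operatorname*{argmax}_{v} \sum_{(x_i, y_i) \in D_z} I(v = y_i)$$

where $D_z$ is the set of the k nearest neighbors of the test point, $v$ ranges over the class labels, and $I(\cdot)$ is 1 when its argument is true and 0 otherwise.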
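kNN classification assigns the class that occurs most often among the k nearest neighbors, that is, a majority vote. The sketch below assumes Euclidean distance and uses hypothetical names (`knn_classify`) and toy data.

```python
import math
from collections import Counter

def knn_classify(train, query, k):
    """Return the most frequent class label among the k nearest neighbors."""
    neighbors = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    # most_common(1) returns the label with the highest vote count.
    return votes.most_common(1)[0][0]

# Hypothetical labelled points.
train = [((1.0, 1.0), "red"), ((1.0, 2.0), "red"), ((2.0, 1.0), "blue"),
         ((5.0, 5.0), "blue"), ((6.0, 5.0), "blue")]

label_k3 = knn_classify(train, (1.5, 1.5), k=3)  # votes: red 2, blue 1
label_k5 = knn_classify(train, (1.5, 1.5), k=5)  # votes: red 2, blue 3
print(label_k3, label_k5)
```

The predicted label changes from "red" to "blue" as K grows, which is why K has to be tuned. For binary problems an odd K avoids tied votes.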

Page 29: K Nearest Neighbor Presentation

Binary Classification

Page 30: K Nearest Neighbor Presentation

Exercise 3

• Open “logistic vs knn v2.R”

• Follow the steps

Page 31: K Nearest Neighbor Presentation

Recall on Confusion Table

• Source: Wikipedia
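For a binary classifier, the confusion table counts true positives, false positives, false negatives, and true negatives. A minimal sketch of building the table and reading accuracy, precision, and recall from it follows; the function name `confusion_table` and the labels are made up for illustration.

```python
def confusion_table(actual, predicted, positive):
    """Count TP, FP, FN and TN for the class treated as positive."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            counts["TP"] += 1
        elif p == positive:
            counts["FP"] += 1
        elif a == positive:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts

# Hypothetical true labels and classifier output.
actual    = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]

table = confusion_table(actual, predicted, positive=1)
accuracy = (table["TP"] + table["TN"]) / len(actual)
precision = table["TP"] / (table["TP"] + table["FP"])
recall = table["TP"] / (table["TP"] + table["FN"])
print(table, accuracy, precision, recall)
```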

Page 32: K Nearest Neighbor Presentation

Multi-class Classification

Page 33: K Nearest Neighbor Presentation

Exercise 4

• Open “multiclass.R”

• Follow the steps

Page 34: K Nearest Neighbor Presentation

Assignment

Page 35: K Nearest Neighbor Presentation

Assignment – Due Next Week

• Increase the accuracy of the multiclass problem by 10%.

• In a Word document, describe the improvement you obtained, the method you used, and why it works or does not work.

• Submit your code and Word document to [email protected] before 23 May 2016 23:59:59.

Hint: You can increase the sample size.

Page 36: K Nearest Neighbor Presentation

References

• James G., Witten D., Hastie T. and Tibshirani R. An Introduction to Statistical Learning. Springer. 2013.

