
The K Nearest Neighbor Algorithm (kNN) Erik Zeitler Uppsala Database Laboratory.

Date post: 20-Dec-2015

The K Nearest Neighbor Algorithm (kNN)

Erik Zeitler
Uppsala Database Laboratory


23-04-18 Erik Zeitler 2

Examination

Examination is split in two parts
• Solve the assignment
• Oral examination

During the oral examination
• The instructor validates your program using a script
• Non-working program: the examination ends immediately (a “fail” grade is given); you may redo the examination later
• The instructor will ask questions on your implementation and on the method itself
• All group members must take part in the solution. Group members can get different grades on the same assignment.


Grades

Fail, Pass

Complete before end of semester


Examination

Why do we have the oral part? Are we out to get you?
• The assignments cover a good part of the course; understanding them will help you.
• If you have problems solving the assignment, please ask during office hours. The only way asking will affect your grade is that you might learn more.

Different things!
• Solving assignments
• Understanding your own solution


What you need to do

Sign up for oral exam
• Groups of 2 – 3 students
• Forms are on my office door, P1320

Implement a solution
• Deadline: Submit by e-mail 24h before your oral exam
• 1, 2: [email protected]
• 3, 4: [email protected]

Answer the questions on the form
• Bring one form per student

Prepare for oral exam:
• Study the theory behind


K Nearest Neighbor

Basic idea:
• If it walks like a duck and it quacks like a duck, then it must be a duck

So how do we know how a duck walks and talks?
• Either we ask the other ducks
– or if they are unavailable –
• Look at who else is walking and talking this way.


Duck walking and talking

Assume that a duck
• has average step length 5…15 cm
• quacks at a frequency 600…700 Hz

On the other hand, consider a cow:
• step length is 30…60 cm
• a cow moos at 100…200 Hz
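
The ranges above can be turned into a tiny labeled data set and a brute-force kNN classifier. The following is a minimal sketch, not the course's reference solution: the sample points are invented so that they lie inside the stated ranges, and the names `TRAINING` and `knn_classify` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical training points: (step length in cm, call frequency in Hz).
# The values are invented, chosen only to lie inside the ranges on the slide.
TRAINING = [
    ((6.0, 610.0), "duck"),
    ((10.0, 650.0), "duck"),
    ((14.0, 690.0), "duck"),
    ((32.0, 110.0), "cow"),
    ((45.0, 150.0), "cow"),
    ((58.0, 190.0), "cow"),
]


def knn_classify(query, training, k):
    """Return the majority label among the k training points closest to query."""
    # Sort all training points by Euclidean distance to the query.
    by_distance = sorted(training, key=lambda item: math.dist(query, item[0]))
    # Count the labels of the k closest points and pick the most common one.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


print(knn_classify((12.0, 640.0), TRAINING, k=3))  # duck
print(knn_classify((40.0, 170.0), TRAINING, k=3))  # cow
```

Note that the raw frequency differences (hundreds of Hz) are much larger than the step length differences (tens of cm), so in this unscaled form the frequency dominates the distance.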


Cows and Ducks in a Plot


Enter the Chicken


Classifying you using kNN

Each of you belongs to a group:
• [F|STS|Int Masters|Exchange Students|Other]

Let’s classify each one using 1-NN and 3-NN

How do we select our distance measure?

How do we decide which of 1-NN and 3-NN is best?
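
Both questions can be explored on a toy example. The sketch below uses made-up two-dimensional points and hypothetical names (`POINTS`, `PAIR`, `euclidean`, `manhattan`, `knn_classify`). It passes the distance measure as a parameter and shows one query on which 1-NN and 3-NN disagree, and one on which the two distance measures disagree. It illustrates the questions; it does not settle which choice is best for the class data.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_classify(query, training, k, distance):
    """Majority vote among the k training points closest to query."""
    nearest = sorted(training, key=lambda item: distance(query, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Made-up points; "F" and "STS" are two of the group labels from the slide.
POINTS = [
    ((1.0, 1.0), "F"),
    ((1.5, 2.0), "F"),
    ((2.0, 1.5), "F"),
    ((3.0, 3.0), "STS"),  # an STS point lying close to the F cluster
    ((5.0, 5.0), "STS"),
    ((5.5, 4.5), "STS"),
]
query = (2.6, 2.6)

# The single nearest point is STS, but two of the three nearest are F.
print(knn_classify(query, POINTS, 1, euclidean))  # STS
print(knn_classify(query, POINTS, 3, euclidean))  # F

# The distance measure can change which point counts as nearest.
PAIR = [((3.0, 3.0), "F"), ((0.0, 5.0), "STS")]
origin = (0.0, 0.0)
print(knn_classify(origin, PAIR, 1, euclidean))  # F: about 4.24 < 5.0
print(knn_classify(origin, PAIR, 1, manhattan))  # STS: 5.0 < 6.0
```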


Things to Consider for the Assignment

Preprocessing
• What are the ranges of the different measurements?
• Is one characteristic more important than another? If so, how can we reflect this? If not, do we need to do something else?
• You can assume: no missing points, no noise
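
One common way to handle measurements with different ranges is min-max scaling, optionally followed by per-feature weights. The sketch below is an illustration under assumptions, not the required preprocessing: the data values are invented, and `column_ranges`, `min_max_scale`, and `apply_weights` are hypothetical helper names.

```python
def column_ranges(rows):
    """Return (minimum, maximum) for every feature column."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]


def min_max_scale(rows, ranges):
    """Map every feature to [0, 1] using the given per-column ranges."""
    scaled = []
    for row in rows:
        scaled.append(tuple(
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, (lo, hi) in zip(row, ranges)
        ))
    return scaled


def apply_weights(rows, weights):
    """Stretch each feature by a weight; a larger weight makes it count more."""
    return [tuple(v * w for v, w in zip(row, weights)) for row in rows]


# Invented measurements: (step length in cm, call frequency in Hz).
rows = [(5.0, 600.0), (15.0, 700.0), (30.0, 100.0), (60.0, 200.0)]

ranges = column_ranges(rows)
print(ranges)  # [(5.0, 60.0), (100.0, 700.0)]

# After scaling, both features span [0, 1] and contribute comparably.
scaled = min_max_scale(rows, ranges)
print(scaled)

# Hypothetical choice: let step length count more than frequency.
weighted = apply_weights(scaled, (2.0, 1.0))
print(weighted)
```

With Euclidean distance, multiplying a feature by a weight w multiplies its contribution to the squared distance by w squared, so weights act more strongly than their raw values suggest.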

Selecting training and testing data and choosing K
• Is the data sorted in any way? If so, is this good or bad?
• Are there different ways of subdividing the known data?
• How do we know if the value of K is good or bad?
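
If the known data is sorted by class, cutting it in the middle puts different classes in the training and testing parts, so one option is to shuffle before splitting. The sketch below assumes a simple holdout split on a made-up data set; the names `shuffled_split` and `accuracy`, the 70/30 ratio, and the candidate values of K are illustrative choices, not requirements of the assignment. Cross-validation is another way of subdividing the known data.

```python
import math
import random
from collections import Counter


def knn_classify(query, training, k):
    """Majority vote among the k training points closest to query."""
    nearest = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def shuffled_split(data, train_fraction, seed):
    """Shuffle a copy of the data, then cut it into training and test parts."""
    rows = list(data)
    random.Random(seed).shuffle(rows)  # fixed seed keeps the split repeatable
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def accuracy(training, test, k):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(knn_classify(x, training, k) == label for x, label in test)
    return hits / len(test)


# Made-up data, deliberately sorted by class: 10 ducks followed by 10 cows.
data = [((5.0 + i, 600.0 + 10 * i), "duck") for i in range(10)]
data += [((30.0 + 3 * i, 100.0 + 10 * i), "cow") for i in range(10)]

training, test = shuffled_split(data, train_fraction=0.7, seed=0)

# Compare candidate values of K on data that was not used for training.
for k in (1, 3, 5):
    print(k, accuracy(training, test, k))
```

The classes in this toy data are far apart, so every K scores equally well here; on real data the accuracies usually differ, and that difference is what guides the choice of K.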


Things to Consider for the Assignment

Classifying unknown data
• Do we need to preprocess the unknown data?
• Which data set should we use to classify the unknown data?

Complexity
• What is the offline part of kNN and what is the online part?
• What is the complexity for the offline and online parts of kNN?
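
For a brute-force implementation, one way to look at the two parts is sketched below. This is an illustration under assumptions, not the only possible design: the class name `BruteForceKNN` and its methods are hypothetical, the offline part is taken to be computing scaling parameters and storing the n training points with d features each, and the online part scans all stored points for every query. Unknown data is scaled here with the ranges learned from the training data. Index structures such as k-d trees shift work from the online to the offline part and are not shown.

```python
import math
from collections import Counter


class BruteForceKNN:
    """Brute-force kNN with min-max scaling learned from the training data."""

    def fit(self, points, labels):
        # Offline part: one pass over n points with d features, O(n * d).
        columns = list(zip(*points))
        self.lo = [min(c) for c in columns]
        self.hi = [max(c) for c in columns]
        self.points = [self._scale(p) for p in points]
        self.labels = list(labels)
        return self

    def _scale(self, point):
        # Unknown data is scaled with the ranges of the training data.
        return tuple(
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(point, self.lo, self.hi)
        )

    def predict(self, query, k):
        # Online part: n distance computations of O(d) each, then a sort,
        # giving O(n * d + n log n) per query in this simple version.
        q = self._scale(query)
        order = sorted(range(len(self.points)),
                       key=lambda i: math.dist(q, self.points[i]))
        votes = Counter(self.labels[i] for i in order[:k])
        return votes.most_common(1)[0][0]


# Made-up training data: (step length in cm, call frequency in Hz).
model = BruteForceKNN().fit(
    [(6.0, 610.0), (12.0, 680.0), (35.0, 120.0), (55.0, 180.0)],
    ["duck", "duck", "cow", "cow"],
)
print(model.predict((9.0, 640.0), k=1))  # duck
```

Replacing the full sort with `heapq.nsmallest(k, ...)` would avoid ordering all n points when only the k closest are needed.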

