
Date post: 03-Jul-2020
By Atul S. Kulkarni
Graduate Student, University of Minnesota Duluth
Under the guidance of Dr. Richard Maclin

http://www.d.umn.edu/~kulka053/Presentation_full.pdf

Problem Statement
- Given a set of users and their previous ratings for a set of movies, can we predict the rating a user will assign to a movie they have not yet rated?
- Netflix puts it as:
  "The Netflix Prize seeks to substantially improve the accuracy of predictions about how much someone is going to love a movie based on their movie preferences. Improve it enough and you win one (or more) Prizes. Winning the Netflix Prize improves our ability to connect people to the movies they love." – www.netflixprize.com
- So what do they want?
  - A 10% improvement over their existing system.
  - They are paying $1 million for this.


Problem Statement
- Similarly: given that you have seen X-Men, X-Men II and X-Men: The Last Stand, and that users who saw these movies also liked "X-Men Origins: Wolverine", which movie will you like?
- Answer: ?


Background - Dataset
- Data in the training file is grouped per movie.
- It looks like this:

  Movie#:
  Customer#,Rating,Date of Rating
  Customer#,Rating,Date of Rating
  Customer#,Rating,Date of Rating

- Example (movie 4):

  4:
  1065039,3,2005-09-06
  1544320,1,2004-06-28
  410199,5,2004-10-16
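The per-movie layout above can be read with a few lines of Python. This is a minimal sketch, assuming each file starts with a `<movie id>:` line followed by `customer,rating,date` lines with ISO dates, as in the example; the function name `parse_movie_block` is hypothetical.

```python
from datetime import date


def parse_movie_block(text):
    """Parse text in the per-movie layout into (movie_id, customer_id, rating, date) tuples.

    Assumed layout: a "<movie id>:" header line, then "<customer id>,<rating>,<YYYY-MM-DD>" lines.
    """
    records = []
    movie_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.endswith(":"):
            movie_id = int(line[:-1])  # header line such as "4:"
            continue
        if movie_id is None:
            raise ValueError("rating line found before any '<movie id>:' line")
        customer, rating, rated_on = line.split(",")
        records.append((movie_id, int(customer), int(rating), date.fromisoformat(rated_on)))
    return records


sample = """4:
1065039,3,2005-09-06
1544320,1,2004-06-28
410199,5,2004-10-16
"""
for record in parse_movie_block(sample):
    print(record)
```

Flattening every file into (movie, customer, rating, date) tuples makes it easy to regroup the data per user, which the user-based approach below needs.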


Background – Dataset stats
- Total ratings possible = 480,189 (users) × 17,770 (movies) = 8,532,958,530 (8.5 billion)
- Total available = 100 million
- The User × Movie matrix has about 8.4 billion missing entries
- Sparse data
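The sparsity figures follow from simple arithmetic. A quick check in Python, using the slide's rounded total of 100 million available ratings:

```python
# Sizes quoted on the slide; 100 million is the slide's rounded rating count.
users = 480_189
movies = 17_770
available = 100_000_000

possible = users * movies        # every (user, movie) cell in the matrix
missing = possible - available   # cells with no rating
density = available / possible   # fraction of cells that hold a rating

print(f"possible = {possible:,}")                               # 8,532,958,530
print(f"missing  = {missing:,} (~{missing / 1e9:.1f} billion)")  # ~8.4 billion
print(f"density  = {density:.2%}")                               # 1.17%
```

Only about 1.2% of the cells are filled, which is what makes this a sparse-data problem.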


Background of the Solution
- What if I am very conservative in my ratings and someone else is very generous?
  - I rate the movie I like the most as 3 and the one I like the least as 1.
  - Someone else rates their favorite at 5 and their least favorite at 3.
  - So am I like this person?
- Difficult to say.
- We are comparing two people with very different personal biases, which results in an obviously flawed similarity measure.
- Solution? Normalize the data: for each user, subtract that user's mean rating and divide by that user's standard deviation.
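To make the normalization concrete, here is a minimal sketch of per-user z-scoring. The two rating lists are hypothetical, chosen to mirror the conservative (1 to 3) and generous (3 to 5) raters described above; after normalization they become identical, so the personal bias no longer distorts the comparison.

```python
from statistics import mean, stdev


def normalize(ratings):
    """Z-score one user's ratings: subtract the user's mean, divide by the user's std.

    Uses the sample standard deviation (statistics.stdev), which matches the
    standard deviations in the worked example. Assumes at least two ratings
    that are not all equal; otherwise the deviation is zero or undefined.
    """
    mu = mean(ratings)
    sigma = stdev(ratings)
    return [(r - mu) / sigma for r in ratings]


conservative = [3, 2, 1]  # hypothetical: favorite rated 3, least favorite rated 1
generous = [5, 4, 3]      # hypothetical: favorite rated 5, least favorite rated 3
print(normalize(conservative))  # [1.0, 0.0, -1.0]
print(normalize(generous))      # [1.0, 0.0, -1.0]
```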

Proposed Solution
- K-Nearest Neighbor approach (overview)
- Given a query instance q(movieId, userId):
  - Normalize the data before processing.
  - Find the distance between the query user and every user who rated this movie.
  - Of these users, select the K users nearest to the query user as its neighborhood.
  - Average the ratings that the users in this neighborhood gave this movie.
  - Convert this average back to the query user's rating scale (multiply by the user's standard deviation and add the user's mean); this is the predicted rating for the query instance.
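The steps listed above, together with the conversion back to the user's scale shown in the worked example, can be sketched end to end as follows. This is a sketch under stated assumptions, not the author's implementation: the slides do not name the distance metric, so Euclidean distance over the normalized ratings of co-rated movies is assumed here, and `predict`, `distance` and `user_stats` are hypothetical helper names. The toy data is invented.

```python
from math import sqrt
from statistics import mean, stdev


def user_stats(ratings):
    """Mean and sample standard deviation of one user's {movie: rating} dict."""
    values = list(ratings.values())
    return mean(values), stdev(values)


def distance(a, b):
    """Assumed metric: Euclidean distance over the movies both users rated."""
    common = a.keys() & b.keys()
    if not common:
        return float("inf")  # no co-rated movies: treat the user as infinitely far away
    return sqrt(sum((a[m] - b[m]) ** 2 for m in common))


def predict(data, user, movie, k):
    """Predict how `user` would rate `movie` with a normalized K-NN.

    data maps user -> {movie: rating}. Every user needs at least two ratings
    that are not all equal, and at least one other user must have rated `movie`.
    """
    # Step 1: normalize each user's ratings by that user's mean and std.
    stats = {u: user_stats(rs) for u, rs in data.items()}
    norm = {
        u: {m: (r - stats[u][0]) / stats[u][1] for m, r in rs.items()}
        for u, rs in data.items()
    }
    # Step 2: candidates are the other users who rated the query movie.
    candidates = [u for u in data if u != user and movie in data[u]]
    # Step 3: the k candidates closest to the query user form the neighborhood.
    # The query movie never enters the distance: the query user has not rated it.
    nearest = sorted(candidates, key=lambda u: distance(norm[user], norm[u]))[:k]
    # Step 4: average the neighbors' normalized ratings for the query movie.
    z = mean(norm[u][movie] for u in nearest)
    # Step 5: map the average back to the query user's rating scale.
    mu, sigma = stats[user]
    return z * sigma + mu


ratings = {  # hypothetical toy data
    "Q": {"m1": 5, "m2": 3, "m3": 4},
    "A": {"m1": 3, "m2": 2, "m4": 4},
    "B": {"m1": 3, "m2": 4, "m4": 2},
    "C": {"m1": 4, "m2": 2, "m4": 3},
}
print(predict(ratings, "Q", "m4", k=2))  # 4.5
```

With k=2 the neighborhood is C (distance 0) and A (distance 1); B, whose tastes run opposite to Q's, is left out. Raising k to 3 pulls B in and the prediction drops to 4.0, which shows how sensitive the result is to the choice of K.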


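Proposed Solution - Example
- Example (representative data, not real):

Movies (columns, left to right): Matrix, Star Wars, Dark knight, Rocky, Sita Aur Gita, Star Trek, Cliffhanger, A.I., MI, X-Men
Each row below lists only the ratings that user gave, in column order; movies a user has not rated are blank, and ? marks the rating to be predicted.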

Jim 1 3 1 5 2 1 1

Sean 2 3 2 4 5 3

John 3 4 5 3 4

Sidd 4 3 4 2

Penny 5 2 2 5 1

Pete 5 ? 4 4


Proposed Solution - Example
- Calculate the mean and standard deviation of each user's ratings.

User   Mean rating   Standard deviation

Jim 2 1.527525232

Sean 3.166666667 1.169045194

John 3.8 0.836660027

Sidd 3.25 0.957427108

Penny 3 1.870828693

Pete 4.333333333 0.577350269
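The table can be reproduced with Python's `statistics` module. The ratings per user are read off the example matrix; mean and standard deviation ignore order, so the lost column positions do not matter here. The values match the sample standard deviation (dividing by n - 1), i.e. `statistics.stdev`, not the population version `statistics.pstdev`.

```python
from statistics import mean, stdev

# Each user's known ratings from the example matrix; Pete's "?" is left out.
ratings = {
    "Jim": [1, 3, 1, 5, 2, 1, 1],
    "Sean": [2, 3, 2, 4, 5, 3],
    "John": [3, 4, 5, 3, 4],
    "Sidd": [4, 3, 4, 2],
    "Penny": [5, 2, 2, 5, 1],
    "Pete": [5, 4, 4],
}

for user, rs in ratings.items():
    # stdev divides by n - 1 (sample standard deviation)
    print(f"{user:6} {mean(rs):.9f} {stdev(rs):.9f}")
```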


Proposed Solution - Example
- Normalized data

Movies (columns, left to right): Matrix, Star Wars, Dark knight, Rocky, Sita Aur Gita, Star Trek, Cliffhanger, A.I., MI, X-Men

Jim -0.65 0.65 -0.65 1.96 0 -0.65 -0.7

Sean -1 -0.14 -1 0.71 1.57 -0.14

John -1 0.24 1.434 -1 0.24

Sidd 0.783 -0.26 0.78 -1.3

Penny 1.069 -0.53 -0.53 1.07 -1.1

Pete 1.15 ? -0.6 -0.58
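Each normalized row is the user's ratings minus that user's mean, divided by that user's sample standard deviation. A minimal sketch that recomputes the rows in their listed order, with `None` standing in for the "?" cell (the name `z_scores` is hypothetical):

```python
from statistics import mean, stdev

# Ratings in row order from the example matrix; None stands for the "?" cell.
rows = {
    "Jim": [1, 3, 1, 5, 2, 1, 1],
    "Sean": [2, 3, 2, 4, 5, 3],
    "John": [3, 4, 5, 3, 4],
    "Sidd": [4, 3, 4, 2],
    "Penny": [5, 2, 2, 5, 1],
    "Pete": [5, None, 4, 4],
}


def z_scores(row):
    """Normalize the known ratings in a row to 2 decimals; unknown cells stay None."""
    known = [r for r in row if r is not None]
    mu, sigma = mean(known), stdev(known)
    return [None if r is None else round((r - mu) / sigma, 2) for r in row]


for user, row in rows.items():
    print(user, z_scores(row))
```

At two decimals, entries shown on the slide as -0.7 (Jim) or -1 (Sean, John) come out as -0.65, -1.0 and -0.96; the slide simply rounds some cells more coarsely.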


Proposed Solution - Example
- So now we have a query instance q(Pete, Sita Aur Gita),
  i.e. we wish to estimate how much Pete will like the movie "Sita Aur Gita" on a scale of 1 - 5.
- To do this, we need to identify Pete's two nearest neighbors among the users who rated this movie (2-NN case).
- Users who rated the movie Sita Aur Gita are:

candidate_users

Jim

Sidd

Penny


Proposed Solution - Example
- Users with their distances from Pete, and the 2 neighbors in the neighborhood:
  - The 2 nearest neighbors are Jim and Sidd.

User    Distance
Jim     0.500046868
Sidd    1.360699721
Penny   1.646395237
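The slides do not say which distance was used. As a minimal sketch, assume Euclidean distance over the normalized ratings of co-rated movies. Under that assumption the three printed distances are reproduced exactly if each candidate shares a single co-rated movie with Pete: Pete's 5 against Jim's 3, Pete's 4 against Sidd's 4, and Pete's 4 against Penny's 5. That pairing is our reconstruction from the numbers, not something the slide states, and the helper names `z` and `euclidean` are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def z(rating, ratings):
    """Normalized value of one rating within a user's full list of ratings."""
    return (rating - mean(ratings)) / stdev(ratings)


def euclidean(pairs):
    """Assumed metric: Euclidean distance over (query, candidate) normalized pairs."""
    return sqrt(sum((a - b) ** 2 for a, b in pairs))


pete = [5, 4, 4]
jim = [1, 3, 1, 5, 2, 1, 1]
sidd = [4, 3, 4, 2]
penny = [5, 2, 2, 5, 1]

# Hypothetical reconstruction: one co-rated movie per candidate.
distances = {
    "Jim": euclidean([(z(5, pete), z(3, jim))]),
    "Sidd": euclidean([(z(4, pete), z(4, sidd))]),
    "Penny": euclidean([(z(4, pete), z(5, penny))]),
}
neighbors = sorted(distances, key=distances.get)[:2]  # the 2 nearest users
print(distances)
print(neighbors)  # ['Jim', 'Sidd']
```

A single shared movie is a thin basis for similarity; with real, sparse data, requiring a minimum number of co-rated movies is a common guard, though the slides do not mention one.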


Proposed Solution - Example
- The average of Jim's and Sidd's normalized ratings for the movie "Sita Aur Gita" is 0.7956.
- So is our prediction of 0.7956 correct? Not yet.
  - This prediction is in normalized form.
  - We need to bring it back to Pete's rating scale.
- How?
  - Multiply by the standard deviation of Pete's ratings.
  - Add Pete's mean rating to this product.
- (0.7956 × 0.5773) + 4.3333 ≈ 4.7926
- So the predicted rating for Pete is about 4.7926.
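The conversion back to Pete's scale is the inverse of the z-score step: multiply by the user's standard deviation, then add the user's mean. A minimal sketch (`denormalize` is a hypothetical helper name), using the four-decimal values from the slide:

```python
def denormalize(z, user_mean, user_std):
    """Undo z-score normalization: multiply by the user's std, then add the user's mean."""
    return z * user_std + user_mean


# The neighbors' average normalized rating and Pete's mean and standard
# deviation, rounded to four decimals as on the slide.
prediction = denormalize(0.7956, user_mean=4.3333, user_std=0.5773)
print(round(prediction, 4))  # 4.7926
```

The exact product is 4.79259988, so rounding to four decimals gives 4.7926 and truncating gives 4.7925.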


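Experiments - Setup
- This is a regression problem, so we want to know not just whether we are off the expected value, but by how much.
- Hence, the test metrics used are:
  - Root Mean Square Error (RMSE): $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(p_i - r_i)^2}$
  - Absolute Average Error (AAE): $\mathrm{AAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert p_i - r_i\rvert$
  - Time taken.
  Here $p_i$ is the predicted rating, $r_i$ the actual rating, and $n$ the number of test ratings.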
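Both error metrics are straightforward to compute over paired predictions and true ratings. A minimal sketch with hypothetical data; `rmse` and `aae` are illustrative names, not from the slides:

```python
from math import sqrt


def rmse(predicted, actual):
    """Root mean square error: square root of the mean squared difference."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))


def aae(predicted, actual):
    """Absolute average error: mean of the absolute differences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)


predicted = [4, 3, 5]  # hypothetical predictions
actual = [5, 3, 3]     # hypothetical true ratings
print(rmse(predicted, actual), aae(predicted, actual))
```

Because RMSE squares each error before averaging, one large miss (here the error of 2) weighs more heavily than in AAE, so RMSE is always at least as large as AAE on the same predictions.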


Experiments - Results
- Results on the described dataset:

Method                         Absolute Average Error   Root Mean Square Error   Time (minutes)
K-NN                           0.5087                   0.67164                  8640 *
C-K-NN                         0.6894                   0.88995                  9
Netflix (Leaderboard Topper)   NA                       0.8596                   NA
Netflix Current System¹        NA                       0.9514                   NA
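The table can be put in terms of the Netflix goal with a little arithmetic. This is arithmetic on the reported numbers only: the K-NN and C-K-NN rows are measured on "the described dataset", which may not be the data behind Netflix's figures, so the percentages are not necessarily a like-for-like comparison.

```python
# RMSE values and run times taken from the results table.
baseline = 0.9514  # Netflix current system
results = {"K-NN": 0.67164, "C-K-NN": 0.88995, "Leaderboard topper": 0.8596}

target = baseline * 0.9  # RMSE a 10% improvement would require
print(f"10% target RMSE: {target:.4f}")
for method, score in results.items():
    gain = (baseline - score) / baseline  # relative RMSE reduction
    print(f"{method}: {gain:.1%} lower RMSE than the current system")

knn_minutes, cknn_minutes = 8640, 9
print(f"K-NN: {knn_minutes / 60 / 24:.0f} days, {knn_minutes / cknn_minutes:.0f}x C-K-NN's time")
```

This puts the 10% target at an RMSE of about 0.8563 and shows the trade-off in the table: K-NN's run of 8640 minutes is 6 days, 960 times longer than C-K-NN's 9 minutes.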


Experiments - Results
[Figure: RMSE Comparisons. Comparison of the RMSE and Absolute Average Error for K-NN, C-K-NN, Netflix (Current Topper) and Netflix (Current System)]
[Figure: Time taken. Time in minutes for K-NN and C-K-NN]

