Actors and Actresses

Date post: 24-Dec-2015
Upload: asher-mitchell
Actors and Actresses

Danger

• Please be careful!

IMDb (I assume you all know?)

IMDb Dump

Not open/free!

The Question You are Going to Answer …

Which pair of actors/actresses have acted together the most times?

An Example

According to IMDb, in how many movies have Al Pacino and Robert De Niro starred together?

?
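
One way to think about this example: collect the set of titles for each person, then intersect the two sets. The sketch below is plain Java, not Hadoop code, and the class name `SharedMovies` and the titles are placeholders invented for illustration (they are not IMDb data).

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class SharedMovies {
    // Returns the titles that appear in both collections (sorted, no duplicates).
    static Set<String> shared(Collection<String> moviesA, Collection<String> moviesB) {
        Set<String> result = new TreeSet<>(moviesA);
        result.retainAll(new HashSet<>(moviesB));
        return result;
    }

    public static void main(String[] args) {
        // Placeholder titles only; real titles would come from the IMDb files.
        List<String> personA = List.of("Movie A", "Movie B", "Movie C");
        List<String> personB = List.of("Movie B", "Movie C", "Movie D");
        System.out.println(shared(personA, personB).size()); // prints 2
    }
}
```

Doing this for every possible pair of people is what makes the full question expensive, which is why the lab uses Hadoop.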

IMDb: Typical File

• Log into machine cluster.dcc.uchile.cl
• Username: uhadoop

• zcat /data/hadoop/hadoop/data/imdb/actors.list.gz | more

IMDb: Already Parsed

zcat /data/hadoop/hadoop/data/imdb/tsv/actpersons-to-movies.tsv.gz | more

How many theatrical movies was Uma Thurman in?

zcat /data/hadoop/hadoop/data/imdb/tsv/actresses-to-movies.tsv.gz | grep -e "^Thurman, Uma" | grep -e "THEATRICAL_MOVIE" | wc -l
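
For comparison, the same filter can be sketched in plain Java: keep lines that start with a prefix and contain a token, then count them. The class and method names are hypothetical, the sample rows (including the `TV_SERIES` label) are made up, and reading the file as ISO-8859-1 is a guess; check the real layout with `zcat ... | more` first.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.stream.Stream;
import java.util.zip.GZIPInputStream;

class GrepCount {
    // Mirrors: grep -e "^prefix" | grep -e "token" | wc -l
    static long countMatches(Stream<String> lines, String prefix, String token) {
        return lines
                .filter(line -> line.startsWith(prefix))
                .filter(line -> line.contains(token))
                .count();
    }

    // Same count over a gzipped text file (the charset is an assumption).
    static long countInGzip(String path, String prefix, String token) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new FileInputStream(path)), StandardCharsets.ISO_8859_1))) {
            return countMatches(reader.lines(), prefix, token);
        }
    }

    public static void main(String[] args) {
        // Made-up rows for illustration; the real column layout may differ.
        Stream<String> sample = Stream.of(
                "Thurman, Uma\tMovie A\tTHEATRICAL_MOVIE",
                "Thurman, Uma\tShow B\tTV_SERIES",
                "Other, Person\tMovie A\tTHEATRICAL_MOVIE");
        System.out.println(countMatches(sample, "Thurman, Uma", "THEATRICAL_MOVIE")); // prints 1
    }
}
```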

The Question You are Going to Answer …

Which pair of actors/actresses have acted together the most times?

1. Download the project

http://aidanhogan.com/teaching/cc5212-1/mdp-lab5.zip

2. Implement the Hadoop job(s)!

• Adapt the WordCount example
  – Refer to the lab slides from last week

• You can use a separate class file for each part of the task

• Test on the small file
  – /uhadoop/imdb/actpersons-to-movies.100k.tsv

• Run on the big file
  – /uhadoop/imdb/full/actpersons-to-movies.tsv

• Write to your own directory!!!
  – /uhadoop/[username]
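
As a reminder of the pattern being adapted, here is a small local simulation of WordCount-style counting: a "map" step turns each record into keys, and a "reduce" step adds 1 per key. This is plain Java with no Hadoop classes; `LocalCount` and its method are hypothetical names, so treat it as a sketch of the idea and not as the lab's WordCount code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

class LocalCount {
    // "Map": mapper turns one record into zero or more keys.
    // "Reduce": every emitted key contributes 1 to that key's total.
    static <R> Map<String, Integer> count(List<R> records, Function<R, List<String>> mapper) {
        Map<String, Integer> counts = new TreeMap<>();
        for (R record : records) {
            for (String key : mapper.apply(record)) {
                counts.merge(key, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("a b a", "b c");
        // Word count: each line is split into words, and each word is a key.
        Map<String, Integer> counts = count(lines, line -> Arrays.asList(line.split(" ")));
        System.out.println(counts); // prints {a=2, b=2, c=1}
    }
}
```

Changing only the mapper (what counts as a key) is the kind of adaptation the lab asks for.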

3. Continuation

• Count the pairs
  – CountPairs.java

• Sort the pairs
  – SortPairs.java

• Figure out the input
• Figure out the map/reduce phase
• Adapt a previous example
  – WordCount or EmitPairs
  – Change generics
  – Implement new Map/Reduce

• Run it!
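
To check your reasoning before writing the jobs, the three phases (group the cast per movie, count the pairs, sort the pairs) can be simulated locally on a handful of rows. This is a minimal in-memory sketch, not the Hadoop implementation: `CoStarPairs` and its methods are hypothetical names, the rows are invented, and it assumes each row is simply (person, movie), which may not match the real TSV columns.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

class CoStarPairs {
    // Phase 1: group (person, movie) rows into movie -> sorted set of people.
    static Map<String, SortedSet<String>> castByMovie(List<String[]> rows) {
        Map<String, SortedSet<String>> cast = new HashMap<>();
        for (String[] row : rows) {
            cast.computeIfAbsent(row[1], k -> new TreeSet<>()).add(row[0]);
        }
        return cast;
    }

    // Phase 2: count each unordered pair once per shared movie.
    // The sorted cast guarantees a pair always gets the same key (tab-separated).
    static Map<String, Integer> countPairs(Map<String, SortedSet<String>> cast) {
        Map<String, Integer> counts = new HashMap<>();
        for (SortedSet<String> people : cast.values()) {
            List<String> list = new ArrayList<>(people);
            for (int i = 0; i < list.size(); i++) {
                for (int j = i + 1; j < list.size(); j++) {
                    counts.merge(list.get(i) + "\t" + list.get(j), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Phase 3: sort by descending count, breaking ties by pair name.
    static List<Map.Entry<String, Integer>> sortPairs(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(counts.entrySet());
        sorted.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return sorted;
    }

    public static void main(String[] args) {
        // Invented rows: A and B share two movies, C joins them in one.
        List<String[]> rows = List.of(
                new String[] {"Actor A", "Movie 1"}, new String[] {"Actor B", "Movie 1"},
                new String[] {"Actor A", "Movie 2"}, new String[] {"Actor B", "Movie 2"},
                new String[] {"Actor C", "Movie 2"});
        Map.Entry<String, Integer> top = sortPairs(countPairs(castByMovie(rows))).get(0);
        System.out.println(top.getKey().replace("\t", " + ") + ": " + top.getValue());
        // prints Actor A + Actor B: 2
    }
}
```

Note that the pair loop is quadratic in the cast size, so movies with very large casts dominate the cost; that is worth keeping in mind when moving from the 100k test file to the full file.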
