Information Retrieval

Date post: 15-Feb-2016

Lecture 9

Outline

• Map Reduce, cont.
• Index compression
• [Amazon Web Services]

Map Reduce

• Map: (key, value) → list(key’, value’)
• Reduce: (key’, list(value’)) → list(value’)

Example: counting occurrences of words in large collections of documents

map(String key, String value):
    // key: document name
    // value: document contents
    for each word w in value:
        EmitIntermediate(w, "1");

reduce(String key, Iterator values):
    // key: a word
    // values: a list of counts
    int result = 0;
    for each v in values:
        result += ParseInt(v);
    Emit(AsString(result));
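
The pseudocode above can be tried out on a single machine. The following is a minimal sketch, not a real MapReduce framework: `run_mapreduce` is a hypothetical helper that stands in for the distributed runtime by grouping intermediate values by key in memory, and the two sample documents are made up for illustration.

```python
from collections import defaultdict


def map_fn(doc_name, contents):
    """Map: (document name, document contents) -> list of (word, "1")."""
    return [(word, "1") for word in contents.split()]


def reduce_fn(word, counts):
    """Reduce: (word, list of counts) -> list holding the total as a string."""
    return [str(sum(int(c) for c in counts))]


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Hypothetical single-process stand-in for the framework:
    run map on every input, group intermediate values by key, then reduce."""
    groups = defaultdict(list)
    for key, value in inputs:
        for k2, v2 in map_fn(key, value):
            groups[k2].append(v2)
    return {k2: reduce_fn(k2, values) for k2, values in sorted(groups.items())}


# Made-up sample collection
docs = [("d1", "to be or not to be"), ("d2", "to do")]
word_counts = run_mapreduce(docs, map_fn, reduce_fn)
print(word_counts)
```

In a real deployment the grouping step is the shuffle performed by the framework across machines; only `map_fn` and `reduce_fn` would be written by the user.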

Other Map-Reducible problems

• Distributed Grep: The map function emits a line if it matches a supplied pattern. The reduce function is an identity function that just copies the supplied intermediate data to the output.

• Count of URL Access Frequency: The map function processes logs of web page requests and outputs ⟨URL, 1⟩. The reduce function adds together all values for the same URL and emits a ⟨URL, total count⟩ pair.

• Reverse Web-Link Graph: The map function outputs ⟨target, source⟩ pairs for each link to a target URL found in a page named source. The reduce function concatenates the list of all source URLs associated with a given target URL and emits the pair ⟨target, list(source)⟩.
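
The three problems above can be sketched as map/reduce pairs in Python. This is an illustration under several assumptions that are not in the slides: `run_mapreduce` is a hypothetical in-memory stand-in for the framework, the grep pattern is `error`, each log line is assumed to start with the requested URL, and each page is assumed to arrive as an already-extracted list of link targets.

```python
import re
from collections import defaultdict


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Hypothetical single-process stand-in: map, group by key, reduce."""
    groups = defaultdict(list)
    for key, value in inputs:
        for k2, v2 in map_fn(key, value):
            groups[k2].append(v2)
    return {k2: reduce_fn(k2, values) for k2, values in groups.items()}


# Distributed grep: map emits matching lines, reduce is the identity.
PATTERN = re.compile(r"error")  # assumed pattern, for illustration only


def grep_map(filename, text):
    return [(filename, line) for line in text.splitlines() if PATTERN.search(line)]


def identity_reduce(key, values):
    return list(values)


# URL access frequency: map emits <URL, 1>, reduce sums the ones.
def url_map(log_name, log_text):
    # Assumed log format: the URL is the first whitespace-separated field.
    return [(line.split()[0], 1) for line in log_text.splitlines() if line.strip()]


def sum_reduce(url, counts):
    return sum(counts)


# Reverse web-link graph: map emits <target, source>, reduce collects sources.
def links_map(source, targets):
    return [(target, source) for target in targets]


def concat_reduce(target, sources):
    return list(sources)


grep_out = run_mapreduce(
    [("app.log", "ok start\nerror: disk full\nok end\nerror: timeout")],
    grep_map, identity_reduce)
url_out = run_mapreduce(
    [("access.log", "/index.html 200\n/about.html 200\n/index.html 304")],
    url_map, sum_reduce)
link_out = run_mapreduce(
    [("a.html", ["b.html", "c.html"]), ("b.html", ["c.html"])],
    links_map, concat_reduce)
print(grep_out, url_out, link_out, sep="\n")
```

All three jobs reuse the same driver; only the map and reduce functions change, which is the point of the programming model.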

Compression: example

• amendment allen-p/_sent_mail/465.:1:34 stclair-c/sent/993.:5:45,60,76,84,100

• Too verbose!

Compression: example

• amendment allen-p/_sent_mail/465.:1:34 stclair-c/sent/993.:5:45,60,76,84,100

(that’s 74 characters, not counting spaces)

• compare to

• 34 1342:1:34 9745:5:45,60,76,84,100

(that’s 33 characters, not counting spaces)

• Need to keep a mapping between numbers and names
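
The substitution in this example can be sketched in a few lines of Python. The ID values (34, 1342, 9745) are the ones shown on the slide; the function names and the assumption that each posting has the form `document:count:positions` are mine, made only for illustration.

```python
posting_line = ("amendment allen-p/_sent_mail/465.:1:34 "
                "stclair-c/sent/993.:5:45,60,76,84,100")

# The mapping between numbers and names that has to be stored alongside the index.
term_ids = {"amendment": 34}
doc_ids = {"allen-p/_sent_mail/465.": 1342, "stclair-c/sent/993.": 9745}


def compress(line, term_ids, doc_ids):
    """Replace the term and each document name with its integer ID."""
    term, *postings = line.split()
    out = [str(term_ids[term])]
    for posting in postings:
        # Assumed layout: everything before the first ':' is the document name.
        doc_name, rest = posting.split(":", 1)
        out.append(f"{doc_ids[doc_name]}:{rest}")
    return " ".join(out)


def decompress(line, term_ids, doc_ids):
    """Invert compress() using the same mappings."""
    id_to_term = {v: k for k, v in term_ids.items()}
    id_to_doc = {v: k for k, v in doc_ids.items()}
    term_id, *postings = line.split()
    out = [id_to_term[int(term_id)]]
    for posting in postings:
        doc_id, rest = posting.split(":", 1)
        out.append(f"{id_to_doc[int(doc_id)]}:{rest}")
    return " ".join(out)


def size(line):
    """Character count excluding spaces, as on the slide."""
    return len(line.replace(" ", ""))


compressed = compress(posting_line, term_ids, doc_ids)
print(compressed, size(posting_line), size(compressed))
```

The saving comes entirely from the two dictionaries, so their own storage cost has to be amortized over many postings for the scheme to pay off.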
