Information Retrieval Liam Quin, Barefoot Computing, Toronto.

Date post: 27-Mar-2015

Information Retrieval

Liam Quin,

Barefoot Computing,

Toronto

Agenda

• Overview of Information Retrieval
• What people want, and how to give it to them
• Things people don’t know they want, and how to do them

Chapter One: The Problem

gooseberry

Gooseberry Picking Hurts

Gooseberries have thorns.

Gooseberry pickers in Botswana might not wear shirts (or shoes).

When you pick one gooseberry, others fall to the ground.

The harvest would be improved if we could retrieve the fallen fruit safely.

There are texts on this on the Internet.

Searching for an Answer

• Search for “information on texts about gooseberry retrieval” on the web.

The Result

• Pages on text retrieval...
• ...on information retrieval...
• ...and, at the top of the list...

Cycling in Cape Gooseberry, Labrador

Lessons

• Indexes on words alone aren’t enough
• Word order can be important
• Relevance Ranking is often bogus
• Sometimes you have to wear a shirt and shoes.
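The first lesson can be illustrated with a minimal sketch of a words-only index. The four document titles and the scoring rule (count of distinct matching words) are assumptions made up for this example, not the behaviour of any particular search engine.

```python
import re
from collections import defaultdict

# Hypothetical mini-collection; the titles are invented for illustration.
DOCS = {
    "cycling": "Cycling in Cape Gooseberry, Labrador",
    "ir": "An introduction to information retrieval",
    "text": "Text retrieval on the web",
    "harvest": "Retrieving fallen gooseberries safely",
}

def tokenize(text):
    """Lowercase and keep runs of letters; deliberately naive."""
    return re.findall(r"[a-z]+", text.lower())

def build_index(docs):
    """Map each word to the set of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Rank documents by the number of distinct query words they contain."""
    scores = defaultdict(int)
    for word in set(tokenize(query)):
        for doc_id in index.get(word, ()):
            scores[doc_id] += 1
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

results = search(build_index(DOCS),
                 "information on texts about gooseberry retrieval")
print(results)
```

With this toy data the one useful document ("harvest") is never returned, because "retrieving" and "gooseberries" do not match "retrieval" and "gooseberry" as plain strings, while the cycling page matches on "gooseberry" alone.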

How can we Improve?

• Better textual analysis
• Word order
• Context
• Metadata
• External categorisation (RDF, Topic Maps)
• Grow thornless gooseberries

Better Textual Analysis

• Part of speech information during indexing
• Stemming (boy/boys, foot/feet, run/running, me/mine)
• Record more in index (caps, separation)
• Co-location analysis (mine next to gold)
• Ask the User what she means in the query (mine as in of me, or as in quarry?)
• Thesaurus Expansion of queries
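Stemming and thesaurus expansion might look roughly like the sketch below. The suffix rules, the `IRREGULAR` table and the `THESAURUS` entries are all assumptions chosen to cover the word pairs listed above; a real stemmer needs far more rules.

```python
# Hypothetical exception table and thesaurus, for illustration only.
IRREGULAR = {"feet": "foot", "mine": "me"}
THESAURUS = {"retrieve": {"fetch", "recover"}}

def stem(word):
    """Reduce a word to a crude base form using a few hand-written rules."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    if word.endswith("ing") and len(word) > 5:
        base = word[:-3]
        # Undo consonant doubling: "running" -> "runn" -> "run".
        if len(base) >= 2 and base[-1] == base[-2]:
            base = base[:-1]
        return base
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word

def expand_query(words, thesaurus=THESAURUS):
    """Return the stems of the query words plus any thesaurus synonyms."""
    expanded = set()
    for word in words:
        base = stem(word)
        expanded.add(base)
        expanded.update(thesaurus.get(base, ()))
    return expanded

print(stem("boys"), stem("feet"), stem("running"), stem("mine"))
print(sorted(expand_query(["retrieve", "boys"])))
```

Mapping "mine" to "me" throws away the quarry sense entirely, which is one reason to ask the user what she means rather than rely on stemming alone.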

Word Order

• Give added weight to word order:
– information retrieval vs. retrieval of information
– Times Square vs. square times
• Include all words (What If Inc., The Times)
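One way to respect word order is to record word positions in the index and match phrases against them. The sketch below assumes a tiny invented collection and keeps every word, including "the", so that a query such as "The Times" still works.

```python
import re
from collections import defaultdict

# Hypothetical documents, invented for illustration.
DOCS = {
    "a": "Information retrieval for beginners",
    "b": "Retrieval of information from archives",
    "c": "Meet me in Times Square",
    "d": "Four square times tables",
    "e": "I read The Times daily",
}

def tokenize(text):
    """Lowercase and keep runs of letters; no stop words are removed."""
    return re.findall(r"[a-z]+", text.lower())

def build_positional_index(docs):
    """Map word -> document -> list of positions where the word occurs."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, word in enumerate(tokenize(text)):
            index[word][doc_id].append(position)
    return index

def phrase_search(index, phrase):
    """Return the documents that contain the words of phrase consecutively."""
    words = tokenize(phrase)
    if not words:
        return []
    hits = []
    for doc_id, starts in index.get(words[0], {}).items():
        for start in starts:
            # Every later word must sit at the matching offset.
            if all(start + offset in index.get(word, {}).get(doc_id, [])
                   for offset, word in enumerate(words)):
                hits.append(doc_id)
                break
    return sorted(hits)

index = build_positional_index(DOCS)
print(phrase_search(index, "information retrieval"))
print(phrase_search(index, "Times Square"))
print(phrase_search(index, "The Times"))
```

This is a strict phrase match. Giving added weight rather than requiring an exact match would mean combining it with an unordered score, which is not shown here.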

Context

• Co-location of words helps disambiguate
• The XML containing element
• Feedback from nearby documents (e.g. on the same website, or in the same chapter or publication)
• Domain-specific information at index-time
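Co-location as a disambiguation aid can be sketched by counting cue words near an ambiguous term. The cue lists in `SENSE_CUES` and the window size are assumptions invented for this example; in practice they would come from corpus statistics or domain knowledge.

```python
import re

# Hypothetical cue words for two senses of "mine".
SENSE_CUES = {
    "quarry": {"gold", "coal", "shaft", "ore"},
    "possessive": {"is", "yours", "book"},
}

def tokenize(text):
    """Lowercase and keep runs of letters."""
    return re.findall(r"[a-z]+", text.lower())

def guess_sense(text, target="mine", window=3):
    """Pick the sense whose cue words appear most often near the target.

    Returns None when the target is absent or no cue word is nearby.
    Only the first occurrence of the target is examined.
    """
    tokens = tokenize(text)
    if target not in tokens:
        return None
    position = tokens.index(target)
    before = tokens[max(0, position - window):position]
    after = tokens[position + 1:position + 1 + window]
    nearby = set(before + after)
    scores = {sense: len(cues & nearby) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

print(guess_sense("The gold mine closed last year"))
print(guess_sense("That book is mine"))
```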

Metadata

• Add information to documents
• Dublin Core (e.g. Warwick Framework)
• The HTML meta element
• The HTML rel/rev attributes in links
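An indexer could pick up this metadata while parsing a page. The sketch below uses the standard library `html.parser` module; the sample page and the `DC.title` name are assumptions for illustration, following the common convention for embedding Dublin Core in HTML.

```python
from html.parser import HTMLParser

class MetaCollector(HTMLParser):
    """Collect name/content pairs from meta elements."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            # Names are folded to lowercase so lookups are predictable.
            self.meta[name.lower()] = content

def extract_meta(html_text):
    """Return a dict of the meta name/content pairs found in html_text."""
    parser = MetaCollector()
    parser.feed(html_text)
    parser.close()
    return parser.meta

# Hypothetical page, invented for illustration.
PAGE = """<html><head>
<title>Gooseberries</title>
<meta name="DC.title" content="Retrieving fallen gooseberries">
<meta name="keywords" content="gooseberry, harvest">
</head><body><p>Gooseberries have thorns.</p></body></html>"""

print(extract_meta(PAGE))
```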

External Categorisation

• Use XML schemas to add context information
• Document or site-wide information
• Resource Description Framework
• Topic Maps (ISO 13250)
• Categorise the result set [see picture]
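Categorising a result set can be sketched as a grouping step over externally supplied categories. The `CATEGORIES` mapping below is a plain dictionary standing in for whatever an RDF description or topic map would provide; the document names and category labels are invented.

```python
from collections import defaultdict

# Hypothetical external categorisation, kept outside the documents themselves.
CATEGORIES = {
    "cycling": ["travel"],
    "ir": ["computing"],
    "text": ["computing"],
    "harvest": ["agriculture", "safety"],
}

def categorise(result_ids, categories):
    """Group result ids by category, preserving the ranking within each group."""
    grouped = defaultdict(list)
    for doc_id in result_ids:
        for category in categories.get(doc_id, ["uncategorised"]):
            grouped[category].append(doc_id)
    return dict(grouped)

print(categorise(["ir", "text", "cycling", "unknown"], CATEGORIES))
```

Presenting the groups rather than one flat list lets the user discard "travel" at a glance, even when the ranking put the cycling page first.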

Grow thornless gooseberries

• Sometimes it’s easier to change the problem than to solve it as stated.
• Sometimes people don’t describe the problem that they need solved.
• Sometimes it’s easier to solve a more general problem (thornless fruit? Or padded shirts)

What Most People Want

• Find this string or phrase in this element.

• That’s all most people ask for.
• It’s all they want.
• But it’s hardly ever all they need.
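The basic request, a string or phrase within a named element, can be sketched with the standard library `xml.etree.ElementTree` module. The sample document and its element names are assumptions invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, invented for illustration.
BOOK = """<book>
  <chapter>
    <title>Gooseberry Picking Hurts</title>
    <para>Gooseberries have thorns.</para>
  </chapter>
  <chapter>
    <title>Lessons</title>
    <para>The gooseberry harvest could improve.</para>
  </chapter>
</book>"""

def find_in_element(xml_text, element_name, phrase):
    """Return the text of each element_name element that contains phrase.

    Matching is a case-insensitive substring test over the element's full
    text, including text inside child elements.
    """
    root = ET.fromstring(xml_text)
    needle = phrase.lower()
    hits = []
    for element in root.iter(element_name):
        text = "".join(element.itertext())
        if needle in text.lower():
            hits.append(" ".join(text.split()))
    return hits

print(find_in_element(BOOK, "title", "gooseberry"))
print(find_in_element(BOOK, "para", "gooseberry"))
```

The second call misses "Gooseberries have thorns." because plain string matching knows nothing about plurals, a small instance of the request falling short of the need.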

The real needs

• Needs of other staff
• Executives who understand the problem
• Indirect needs
– internal use by software
– other departments
– private uses by sneaky employees
– enabling technologies change perspectives

I didn’t know I could...

• Quality control
– check for known errors
– find unusual words or phrases
– phrases not marked up
• Analysis
– look for unusual markup
– co-location (phrase summary)
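Finding unusual words for quality control can be as simple as listing rare words that are missing from a known vocabulary. The sample text, the `KNOWN_WORDS` set and the `max_count` threshold below are assumptions for illustration.

```python
import re
from collections import Counter

# Hypothetical vocabulary; in practice this might be a dictionary or word list.
KNOWN_WORDS = {"gooseberries", "have", "thorns", "and", "pickers", "gloves"}

def tokenize(text):
    """Lowercase and keep runs of letters."""
    return re.findall(r"[a-z]+", text.lower())

def unusual_words(text, known_words, max_count=1):
    """Return words that are not in known_words and occur at most max_count times."""
    counts = Counter(tokenize(text))
    return sorted(word for word, count in counts.items()
                  if word not in known_words and count <= max_count)

SAMPLE = "Gooseberries have thorns. Gooseberries have thrns and pickers have glvoes."
print(unusual_words(SAMPLE, KNOWN_WORDS))
```

The frequency cut-off is there so that a consistently used technical term, which is unknown but frequent, is not reported alongside one-off typing errors.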

Oooh, can you really?

• Automatic linking
– Glossary
– Glossary Index page
– Dictionary Samples
• Add markup automatically
– based on phrases in context
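Automatic glossary linking can be sketched as a phrase search followed by a rewrite. The glossary entries and their `#...` anchors are invented for this example, and the input is assumed to be plain text that contains no markup yet.

```python
import re

# Hypothetical glossary: term -> link target.
GLOSSARY = {
    "information retrieval": "#ir",
    "stemming": "#stemming",
}

def link_glossary(text, glossary):
    """Wrap each glossary term found in text in a link to its entry."""
    if not glossary:
        return text
    # Longest terms first, so a long phrase wins over a shorter term inside it.
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )
    targets = {term.lower(): target for term, target in glossary.items()}

    def add_link(match):
        found = match.group(1)
        return f'<a href="{targets[found.lower()]}">{found}</a>'

    return pattern.sub(add_link, text)

print(link_glossary("Stemming helps information retrieval.", GLOSSARY))
```

The same pattern, with a different replacement function, would add other markup around phrases found in context.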

Summery Summary

• You may need more than you thought…

• …but it might do more than you expected…

• …but...

There is no gooseberry pie for lunch.

Liam Quin, Barefoot Computing, Toronto

http://www.valinor.sorcery.net/~liam/

[email protected]

