
Course Overview: An Introduction to Information Retrieval and Applications

J. H. Wang
Feb. 23, 2011


IR, Spring 2011 NTUT CSIE 2

Instructor & TA

• Instructor
  – J. H. Wang ( 王正豪 )
  – Assistant Professor, CSIE, NTUT
  – Office: R1534, Technology Building
  – E-mail: [email protected]
  – Tel: ext. 4238
  – Office hours: 10:00-12:00, every Wednesday and Thursday
• TA
  – Mr. Lin ( 林承翰 ): [email protected]
  – R1424, Technology Building


Course Description

• Course Web Page
  – http://www.ntut.edu.tw/~jhwang/IR/
• Time: 13:10-16:00, Wed.
• Classroom: R327, 6th Teaching Building
• Textbook:
  – Christopher D. Manning, Prabhakar Raghavan and Hinrich Schuetze, Introduction to Information Retrieval, Cambridge University Press, 2008.
    • Available online
    • International Student Edition, imported by Kai-Fa ( 開發 ) Publishing
• Prerequisites:
  – Basic knowledge of data structures and algorithms, linear algebra, and probability theory
  – Programming experience is necessary for projects


Additional References

• References:
  – Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Modern Information Retrieval: The Concepts and Technology behind Search, Addison-Wesley, 2011. ( 華通 )
    • This is the second edition of their 1999 book Modern Information Retrieval.
  – Stefan Buettcher, Charles L.A. Clarke, and Gordon V. Cormack, Information Retrieval: Implementing and Evaluating Search Engines, MIT Press, 2010.
  – Bruce Croft, Donald Metzler, and Trevor Strohman, Search Engines: Information Retrieval in Practice, Addison-Wesley, 2010. ( 全華 )


More Books on IR

• Gerard Salton, Automatic Information Organization and Retrieval, McGraw-Hill, 1968.
• Gerard Salton and M.J. McGill, Introduction to Modern Information Retrieval, McGraw-Hill, 1983.
  – Two classics, but out of print.
• C. J. van Rijsbergen, Information Retrieval, Butterworths, 1979.
  – The classic. More than 30 years old, but still worth reading.
• K. Sparck Jones, P. Willett, Readings in Information Retrieval, Morgan Kaufmann, 1997.
  – A collection of classical IR papers. (out of print)
• I.H. Witten, A. Moffat, T.C. Bell, Managing Gigabytes, Morgan Kaufmann, 1999.
  – The authority on index construction and compression.


Grading Policy

• Homework assignments and programming exercises: 40%
• Mid-term exam: 25%
• Term project (including the proposal): 35%


Programming Exercises and Term Project

• At least two programming exercises
  – Team-based (at most 4 persons per team)
  – You can either write your own code or reuse existing open source code
  – Topics: (to be announced…)
• The term project
  – Either team-based system development (the same as programming exercises)
  – Or academic paper presentation
    • But you should do it on your own (only 1 person), NOT team-based
  – A proposal is required around midterm (Apr. 2011)
    • Introduction, methods, experiment designs


Online Submission

• Submission instructions
  – Programs, project proposals, and project reports in electronic files must be submitted to the TA online at:
    • http://140.124.183.39/ir/
  – Before submission:
    • User name: Your student ID
    • Please change your default password at your first login


What this Course is NOT about

• This course will NOT tell you
  – Tips and tricks for using search engines, although power users might have better ideas on how to improve them
    • There are plenty of books and websites on that…
  – How to find books in libraries, although it’s somewhat related to the basic concepts of IR
  – How to make money on the Web, although the largest search engine today has done so


What’s Information Retrieval


On Wikipedia


On GeoNet


On Google Maps


On Google News


On Blogs


Or More Related Keywords

• South Island
• Christchurch
• Canterbury
• Christchurch Cathedral
• …


What if We Search in Chinese


And More…

• 南島 (South Island)
• 第二大城 (second-largest city)
• 基督城 (Christchurch)
• 大教堂 (cathedral)
• …
• And other languages…


What Is Information Retrieval?

• “Information retrieval is a field concerned with the structure, analysis, organization, storage, searching, and retrieval of information.” (Salton, 1968)


Goal

• Information retrieval (IR): a research field that aims to search information in text and multimedia documents effectively and efficiently

• In this course, we will introduce the basic text and query models in IR, retrieval evaluation, indexing and searching, and applications for IR
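
Retrieval evaluation, mentioned above, is commonly introduced through precision and recall. The following is a minimal sketch, not taken from the course materials; the function name and the document IDs are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: iterable of document IDs returned by the system
    relevant:  iterable of document IDs judged relevant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Precision: fraction of retrieved documents that are relevant.
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: fraction of relevant documents that were retrieved.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents returned, 3 documents relevant,
# 2 of the returned documents are among the relevant ones.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
print(p, r)
```

Here precision is 2 of 4 retrieved documents and recall is 2 of 3 relevant documents.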


A Big Picture


[Figure: IR system architecture. Components: User Interface, Text Operations, Query Expansion, Indexing, Retrieval, Ranking, Inverted Index, Document Collection. Data flows: user need, text, query, logical view (doc representation), inverted file, retrieved docs, ranked docs, user feedback.]
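
The components in this picture can be sketched in a few lines of code. This is a toy illustration under stated assumptions, not the course's implementation: the collection, the function names (tokenize, build_index, search), and the ranking rule (summed term frequency) are all hypothetical choices made for the example.

```python
import re
from collections import defaultdict

# Hypothetical toy document collection; IDs and texts are made up.
DOCS = {
    1: "Christchurch is the largest city in the South Island",
    2: "The Christchurch Cathedral stands in Cathedral Square",
    3: "Canterbury is a region of the South Island",
}


def tokenize(text):
    """Text operations: lowercase and split into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs):
    """Indexing: map each term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def search(index, query):
    """Retrieval and ranking for a conjunctive (AND) query."""
    terms = tokenize(query)
    if not terms:
        return []
    # Retrieval: intersect the posting lists of all query terms.
    postings = [set(index.get(t, {})) for t in terms]
    retrieved = set.intersection(*postings)

    # Ranking (an assumption for this sketch): summed term frequency,
    # with ties broken by document ID.
    def score(doc_id):
        return sum(index[t][doc_id] for t in terms)

    return sorted(retrieved, key=lambda d: (-score(d), d))


INDEX = build_index(DOCS)
print(search(INDEX, "south island"))
print(search(INDEX, "christchurch cathedral"))
```

Query expansion and user feedback are left out to keep the sketch short.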


Topics

• Text IR
  – Indexing and Searching
  – Query Languages and Operations
• Retrieval Evaluation
• Modeling
  – Boolean model
  – Vector space model
  – Probabilistic model
• Applications for IR
  – Multimedia IR
  – Web Search
  – Digital Libraries
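
As a small illustration of the vector space model listed under Modeling, the sketch below ranks documents by the cosine similarity of tf-idf vectors. It assumes raw term frequency and idf = log(N / df), which is one common weighting among several; the toy documents and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy documents, already tokenized.
DOCS = {
    "d1": ["south", "island", "city"],
    "d2": ["cathedral", "city", "cathedral"],
    "d3": ["south", "island", "region"],
}


def idf(term, docs):
    """Inverse document frequency: log(N / df); 0 for unseen terms."""
    df = sum(1 for tokens in docs.values() if term in tokens)
    return math.log(len(docs) / df) if df else 0.0


def tfidf_vector(tokens, docs):
    """Sparse tf-idf vector as {term: weight}, using raw term frequency."""
    return {t: tf * idf(t, docs) for t, tf in Counter(tokens).items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0 if either has zero norm."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(query_tokens, docs):
    """Return document IDs sorted by decreasing cosine similarity."""
    q = tfidf_vector(query_tokens, docs)
    scores = {d: cosine(q, tfidf_vector(toks, docs)) for d, toks in docs.items()}
    return sorted(scores, key=lambda d: (-scores[d], d))


print(rank(["cathedral"], DOCS))
print(rank(["south", "island"], DOCS))
```

Unlike the Boolean model, every document receives a graded score, so documents that match only part of a query can still be ranked.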


Organization of the Textbook

• Basics in IR (focus)
  – Inverted indexes for Boolean queries (Ch. 1-5)
  – Term weighting and vector space model (Ch. 6-7)
  – Evaluation in IR (Ch. 8)
• Advanced Topics
  – Relevance feedback (Ch. 9)
  – XML retrieval (Ch. 10)
  – Probabilistic IR (Ch. 11)
  – Language models (Ch. 12)
• Machine learning in IR
  – Text classification (Ch. 13-15)
  – Document clustering (Ch. 16-18)
• Web Search
  – Web crawling and indexes (Ch. 19-20)
  – Link analysis (Ch. 21)
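
To give a flavor of the link analysis topic, here is a minimal PageRank sketch using power iteration. It is an illustration only: the graph, the damping factor of 0.85, the fixed iteration count, and the handling of pages without out-links are assumptions of this example, and every link target is assumed to appear as a key in the graph.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank scores by power iteration.

    links: {page: [pages it links to]}; every target must also be a key.
    Returns {page: score}, with scores summing to 1.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without out-links is spread over all pages.
        dangling = sum(rank[p] for p in pages if not links[p])
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes its rank evenly along its out-links.
        for p in pages:
            for q in links[p]:
                new_rank[q] += damping * rank[p] / len(links[p])
        rank = new_rank
    return rank


# Hypothetical four-page web graph.
GRAPH = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(GRAPH)
print(max(scores, key=scores.get))
```

Page "c" ends up with the highest score because three of the four pages link to it.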


Pointers to Other Topics

• Cross-language IR
• Image, video, and multimedia IR
• Speech retrieval
• Music retrieval
• User interfaces
• Parallel, distributed, and P2P IR
• Digital libraries
• Information science perspective
• Logic-based approaches to IR
• Natural language processing techniques


Tentative Schedule

• Before midterm
  – Boolean retrieval (1 wk)
  – Indexing (2 wks)
  – Vector space model and evaluation (2 wks)
  – Relevance feedback (1 wk)
  – Probabilistic IR (2 wks)
• After midterm
  – Text classification (1 wk)
  – Document clustering (1 wk)
  – Web search (2 wks)
  – Advanced topics: CLIR, IE, … (2 wks)
  – Term project presentation (3 wks)


Generic Resources

• Wikipedia page on Information Retrieval: http://en.wikipedia.org/wiki/Information_retrieval

• Information Retrieval Resources: http://www-csli.stanford.edu/~hinrich/information-retrieval.html


Academic Resources

• Journals
  – ACM TOIS: Transactions on Information Systems
  – JASIST: Journal of the American Society for Information Science and Technology
  – IP&M: Information Processing and Management
• Conferences
  – ACM SIGIR: International Conference on Research and Development in Information Retrieval
  – ACM CIKM: Conference on Information and Knowledge Management
  – JCDL: ACM/IEEE Joint Conference on Digital Libraries
  – TREC: Text Retrieval Conference


Thanks for Your Attention!

