Course Overview: An Introduction to Information Retrieval and Applications
J. H. Wang
Feb. 23, 2011
IR, Spring 2011 NTUT CSIE 2
Instructor & TA
• Instructor
  – J. H. Wang (王正豪)
  – Assistant Professor, CSIE, NTUT
  – Office: R1534, Technology Building
  – E-mail: [email protected]
  – Tel: ext. 4238
  – Office hours: 10:00-12:00, every Wednesday and Thursday
• TA
  – Mr. Lin (林承翰): [email protected]
  – R1424, Technology Building
Course Description
• Course Web Page
  – http://www.ntut.edu.tw/~jhwang/IR/
• Time: 13:10-16:00, Wed.
• Classroom: R327, 6th Teaching Building
• Textbook:
  – Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schuetze, Introduction to Information Retrieval, Cambridge University Press, 2008.
    • Available online
    • International Student Edition, imported by Kai-Fa (開發) Publishing
• Prerequisites:
  – Basic knowledge of data structures and algorithms, linear algebra, and probability theory
  – Programming experience is necessary for projects
Additional References
• References:
  – Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Modern Information Retrieval: The Concepts and Technology behind Search, Addison-Wesley, 2011.
    • This is the second edition of their 1999 book Modern Information Retrieval. (華通)
  – Stefan Buettcher, Charles L.A. Clarke, and Gordon V. Cormack, Information Retrieval: Implementing and Evaluating Search Engines, MIT Press, 2010.
  – Bruce Croft, Donald Metzler, and Trevor Strohman, Search Engines: Information Retrieval in Practice, Addison-Wesley, 2010. (全華)
More Books on IR
• Gerard Salton, Automatic Information Organization and Retrieval, McGraw-Hill, 1968.
• Gerard Salton and M.J. McGill, Introduction to Modern Information Retrieval, McGraw-Hill, 1983.
  – Two classics, but out of print.
• C. J. van Rijsbergen, Information Retrieval, Butterworths, 1979.
  – The classic. More than 30 years old, but still worth reading.
• K. Sparck Jones, P. Willett, Readings in Information Retrieval, Morgan Kaufmann, 1997.
  – A collection of classical IR papers. (out of print)
• I.H. Witten, A. Moffat, T.C. Bell, Managing Gigabytes, Morgan Kaufmann, 1999.
  – The authority on index construction and compression.
Grading Policy
• Homework assignments and programming exercises: 40%
• Mid-term exam: 25%
• Term project (including the proposal): 35%
Programming Exercises and Term Project
• At least two programming exercises
  – Team-based (at most 4 persons per team)
  – You can either write your own code or reuse existing open source code
  – Topics: (to be announced…)
• The term project
  – Either team-based system development (the same as programming exercises)
  – Or academic paper presentation
    • But you should do it on your own (only 1 person), NOT team-based
  – A proposal is required around midterm (Apr. 2011)
    • Introduction, methods, experiment designs
Online Submission
• Submission instructions
  – Programs, project proposals, and project reports in electronic files must be submitted to the TA online at:
    • http://140.124.183.39/ir/
  – Before submission:
    • User name: Your student ID
    • Please change your default password at your first login
What this Course is NOT about
• This course will NOT tell you
  – The tips and tricks when using search engines, although power users might have better ideas on how to improve them
    • There are plenty of books and websites on that…
  – How to find books in libraries, although it’s somewhat related to the basic concepts of IR
  – How to make money on the Web, although the currently largest search engine did it
What’s Information Retrieval
On Wikipedia
On GeoNet
On Google Maps
On Google News
On Blogs
Or More Related Keywords
• South Island
• Christchurch
• Canterbury
• Christchurch Cathedral
• …
What if We Search in Chinese
And More…
• 南島 (South Island)
• 第二大城 (second-largest city)
• 基督城 (Christchurch)
• 大教堂 (cathedral)
• …
• And other languages…
What Is Information Retrieval?
• “Information retrieval is a field concerned with the structure, analysis, organization, storage, searching, and retrieval of information.” (Salton, 1968)
Goal
• Information retrieval (IR): a research field that aims at searching information in text and multimedia documents effectively and efficiently
• In this course, we will introduce the basic text and query models in IR, retrieval evaluation, indexing and searching, and applications of IR
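As a small illustration of what "retrieval evaluation" measures, the sketch below computes precision and recall for a single query. The function name `precision_recall` and the document IDs are hypothetical, chosen only for this example; they are not taken from the course material.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: document IDs returned by the system
    relevant:  document IDs judged relevant for the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 3 relevant overall, 2 in common.
p, r = precision_recall(retrieved=[1, 2, 3, 4], relevant=[2, 4, 5])
print(f"precision={p:.2f} recall={r:.2f}")
```

Precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved.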
A Big Picture
[Figure: IR system architecture. Components: User Interface, Text Operations, Query Expansion, Indexing, Retrieval, Ranking, Inverted Index, Document Collection. Labeled flows: user need, text, query, user feedback, retrieved docs, ranked docs, logical view (doc representation), inverted file.]
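The components named in this picture (text operations, indexing, inverted index, retrieval) can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not the system used in the course: the helper names (`tokenize`, `build_inverted_index`, `boolean_and`) and the three sample documents are hypothetical, tokenization is plain lowercasing, and retrieval is a simple AND over query terms with no ranking.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Text operations: lowercase and split into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Indexing: map each term to the sorted list of doc IDs containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def boolean_and(index, query):
    """Retrieval: return the doc IDs that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical document collection, for illustration only.
docs = {
    1: "Earthquake strikes Christchurch",
    2: "Christchurch Cathedral damaged in earthquake",
    3: "South Island travel guide",
}
index = build_inverted_index(docs)
hits = boolean_and(index, "christchurch earthquake")
print(hits)
```

The ranking, query expansion, and user feedback components of the picture are left out of this sketch.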
Topics
• Text IR
  – Indexing and Searching
  – Query Languages and Operations
• Retrieval Evaluation
• Modeling
  – Boolean model
  – Vector space model
  – Probabilistic model
• Applications for IR
  – Multimedia IR
  – Web Search
  – Digital Libraries
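To give a feel for the vector space model listed under Modeling, here is a minimal sketch that ranks documents by cosine similarity between tf-idf vectors. The weighting used (raw term frequency times log(N/df)) is one common variant, assumed here for illustration. The function names and sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Build a tf-idf vector (dict: term -> weight) for each document."""
    tokenized = {d: text.lower().split() for d, text in docs.items()}
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for terms in tokenized.values() for term in set(terms))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = {
        d: {t: tf * idf[t] for t, tf in Counter(terms).items()}
        for d, terms in tokenized.items()
    }
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(query, docs):
    """Return doc IDs sorted by decreasing cosine similarity to the query."""
    vectors, idf = tf_idf_vectors(docs)
    q_tf = Counter(query.lower().split())
    q = {t: tf * idf.get(t, 0.0) for t, tf in q_tf.items()}
    scores = {d: cosine(q, vec) for d, vec in vectors.items()}
    return sorted(scores, key=lambda d: scores[d], reverse=True)


# Hypothetical document collection, for illustration only.
docs = {
    1: "christchurch earthquake news",
    2: "christchurch cathedral",
    3: "south island travel",
}
ranking = rank("earthquake news", docs)
print(ranking)
```

Unlike the Boolean model, which only decides whether a document matches, this model assigns every document a score, so results can be ordered.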
Organization of the Textbook
• Basics in IR (focus)
  – Inverted indexes for Boolean queries (Ch. 1-5)
  – Term weighting and vector space model (Ch. 6-7)
  – Evaluation in IR (Ch. 8)
• Advanced Topics
  – Relevance feedback (Ch. 9)
  – XML retrieval (Ch. 10)
  – Probabilistic IR (Ch. 11)
  – Language models (Ch. 12)
• Machine learning in IR
  – Text classification (Ch. 13-15)
  – Document clustering (Ch. 16-18)
• Web Search
  – Web crawling and indexes (Ch. 19-20)
  – Link analysis (Ch. 21)
Pointers to Other Topics
• Cross-language IR
• Image, video, and multimedia IR
• Speech retrieval
• Music retrieval
• User interfaces
• Parallel, distributed, and P2P IR
• Digital libraries
• Information science perspective
• Logic-based approaches to IR
• Natural language processing techniques
Tentative Schedule
• Before midterm
  – Boolean retrieval (1 wk)
  – Indexing (2 wks)
  – Vector space model and evaluation (2 wks)
  – Relevance feedback (1 wk)
  – Probabilistic IR (2 wks)
• After midterm
  – Text classification (1 wk)
  – Document clustering (1 wk)
  – Web search (2 wks)
  – Advanced topics: CLIR, IE, … (2 wks)
  – Term Project Presentation (3 wks)
Generic Resources
• Wikipedia page on Information Retrieval: http://en.wikipedia.org/wiki/Information_retrieval
• Information Retrieval Resources: http://www-csli.stanford.edu/~hinrich/information-retrieval.html
Academic Resources
• Journals
  – ACM TOIS: ACM Transactions on Information Systems
  – JASIST: Journal of the American Society for Information Science and Technology
  – IP&M: Information Processing & Management
• Conferences
  – ACM SIGIR: International ACM SIGIR Conference on Research and Development in Information Retrieval
  – ACM CIKM: ACM Conference on Information and Knowledge Management
  – JCDL: ACM/IEEE Joint Conference on Digital Libraries
  – TREC: Text REtrieval Conference
Thanks for Your Attention!