Information Retrieval and Web Search

Lecture 1. Course overview

Instructor: Rada Mihalcea

Class web page: http://www.cs.unt.edu/~rada/CSCE5300

Slide 2

What is this course about?

•Processing

•Indexing

•Retrieving

•… textual data

•Fits in four lines, but it is much more complex and interesting than that
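The three steps on this slide can be illustrated with a toy pipeline. The following is a minimal Python sketch, not the course's method: it assumes lowercase regex tokenization for "processing", an inverted index (term to document ids) for "indexing", and conjunctive (AND) matching for "retrieving". The function names and the three-document collection are hypothetical, chosen only for illustration.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Processing: lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Indexing: map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def retrieve(index, query):
    """Retrieving: return ids of documents containing every query term (AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical toy collection, for illustration only.
docs = {
    1: "Information retrieval processes textual data.",
    2: "Web search engines index billions of Web pages.",
    3: "Retrieval on the Web involves text processing.",
}
index = build_index(docs)
print(sorted(retrieve(index, "web retrieval")))  # [3]
```

Only document 3 contains both "web" and "retrieval", so it is the single match. Real systems add normalization (stemming, stop words) and ranking on top of this skeleton.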

Slide 3

Need for IR

•With the advance of the WWW: more than 3 billion documents indexed on Google

•Various needs for information:
  – Search for documents that fall under a given topic
  – Search for specific information
  – Search for an answer to a question
  – Search for information in a different language

Slide 4

Some definitions of Information Retrieval (IR)

Salton (1989): “Information-retrieval systems process files of records and requests for information, and identify and retrieve from the files certain records in response to the information requests. The retrieval of particular records depends on the similarity between the records and the queries, which in turn is measured by comparing the values of certain attributes attached to records and information requests.”
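Salton's definition leaves open which attributes are compared and how similarity is measured. As one illustrative choice, the Python sketch below assumes the attributes are the terms of each record, weighted by raw counts, and that similarity is the cosine of the angle between the record and query vectors. This is an assumption for illustration, not the only or definitive measure. The records and function names are hypothetical.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Represent text by its term counts; each term acts as one 'attribute'."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank(records, query):
    """Return (score, record_id) pairs, most similar record first."""
    q = vectorize(query)
    scored = [(cosine(vectorize(text), q), rid) for rid, text in records.items()]
    return sorted(scored, reverse=True)

# Hypothetical records, for illustration only.
records = {
    "r1": "library catalog search by author and title",
    "r2": "web search engines rank web pages",
    "r3": "music retrieval by melody",
}
print([rid for _, rid in rank(records, "web search")])  # ['r2', 'r1', 'r3']
```

Record r2 shares both query terms (with "web" occurring twice) and scores 0.75. Record r1 shares only "search" and scores about 0.27. Record r3 shares nothing and scores 0, so the records are retrieved in that order.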

Kowalski (1997): “An Information Retrieval System is a system that is capable of storage, retrieval, and maintenance of information. Information in this context can be composed of text (including numeric and date data), images, audio, video, and other multi-media objects.”

Slide 5

Examples of IR systems

• Conventional (library catalog): search by keyword, title, author, etc. Example: www.library.unt.edu, which you are probably familiar with.

• Text-based (Lexis-Nexis, Google, FAST): search by keywords; limited support for queries in natural language.

• Multimedia (QBIC, WebSeek, SaFe): search by visual appearance (shapes, colors, …).

• Question answering systems (AskJeeves, Answerbus): search in (restricted) natural language.

• Other: cross-language information retrieval, music retrieval

Slide 6

Slide 7

Slide 8

IR systems on the Web

•Search for Web pages http://www.google.com

•Search for images http://images.google.com

•Search for image content http://wang.ist.psu.edu/IMAGE/

•Search for answers to questions http://www.askjeeves.com

•Search for music?

Slide 9

Course information

•Instructor: Rada Mihalcea

•Contact info: NTRP 228, 940-369-7630, [email protected]

•Teaching assistant: TBA

•Class meets TTh, 2:00-3:20pm

•Office hours:
  – T, 4:00-5:30pm
  – Any time electronically
  – For grading or programming problems, first try to get in touch with the TA.

Slide 10

Course resources

•Textbook:
  – Modern Information Retrieval, Ricardo Baeza-Yates and Berthier Ribeiro-Neto

•Recommended:
  – Readings in Information Retrieval, K. Sparck Jones and P. Willett
  – See the class website for pointers to places to buy them for less

•Papers from conferences and journals will be assigned throughout the course. Whenever possible, a copy of the paper will be placed on the class website.

Slide 11

Grading

•Homework: 30%
  – Start early! Some assignments may be time-consuming
  – 3-day late policy

•Midterm I: 15%

•Midterm II: 15%

•Project: 30%

•Class participation: 10%

•Good news! No final exam: the final is replaced by the project

Slide 12

Programming language

• Students are free to choose the programming language they want to work with

• However:
  – I recommend working with Perl
  – We’ll have a short Perl tutorial in the next 1-2 lectures
  – Why Perl?
    • Makes life much easier for text processing problems and for Web-based applications
    • Information Retrieval involves a lot of text processing, and often involves Web access
  – Code reusability

• Regardless of the language, code MUST compile and run on the CSP Linux machines.
  – No credit will be given for programs that do not compile!

Slide 13

Tentative schedule

Course Overview

Short Perl Tutorial

Introduction to IR models and methods

Text analysis / document preprocessing

Vectorial model

Boolean model

Probabilistic model; other IR models

IR collections

IR evaluation

Query operations

Query languages

Natural Language IR (Named Entity recognition)

Slide 14

Tentative schedule

Natural Language IR (Semantic ambiguity, conceptual indexing)

Natural Language IR (Phrase indexing, other)

Question Answering: TREC / Web

Information extraction

Text classification/Topic tracking and detection

Web IR: crawlers

Web IR: search engines

Web IR: link based / content based

Web IR: evaluation metrics / Midterm review

Special topics: Cross Language IR

Special topics

Final IR overview, future directions

… Midterm I, Midterm II, Project presentations
