
Lecture 2: Information Retrieval

Date post: 17-Aug-2015
Upload: vishal-sathawane
Description:
chapter notes
Transcript

Introduction to Information Retrieval
Document ingestion

Recall the basic indexing pipeline

Documents to be indexed: Friends, Romans, countrymen.
  -> Tokenizer
Token stream: Friends Romans Countrymen
  -> Linguistic modules
Modified tokens: friend roman countryman
  -> Indexer
Inverted index: friend, roman, countryman, each pointing to its postings list

Parsing a document

What format is it in? pdf/word/excel/html?
What language is it in?
What character set is in use? (CP1252, UTF-8, ...)

Each of these is a classification problem, which we will study later in the course.
But these tasks are often done heuristically ... (Sec. 2.1)

Complications: Format/language

Documents being indexed can include docs from many different languages.
A single index may contain terms from many languages.
Sometimes a document or its components can contain multiple languages/formats.
French email with a German pdf attachment.
French email
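
The ingestion steps in these slides (guess the character set heuristically, tokenize, apply linguistic modules, build the inverted index) can be sketched in a few lines of Python. This is only an illustrative sketch, not the method used in the lecture: the function names, the UTF-8-then-CP1252 fallback order, the toy plural-stripping rule and the IRREGULAR table are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical lookup for irregular plurals; a real system would use a
# proper stemmer or lemmatizer instead of this toy table.
IRREGULAR = {"countrymen": "countryman"}


def decode_heuristically(raw: bytes) -> str:
    """Guess the character set: try UTF-8 first, then fall back to CP1252."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # CP1252 leaves a few byte values undefined, so replace those.
        return raw.decode("cp1252", errors="replace")


def tokenize(text: str) -> list[str]:
    """Split text into word tokens, dropping punctuation and whitespace."""
    return re.findall(r"\w+", text)


def normalize(token: str) -> str:
    """Toy stand-in for the linguistic modules: lowercase, crude de-pluralizing."""
    token = token.lower()
    if token in IRREGULAR:
        return IRREGULAR[token]
    if len(token) > 3 and token.endswith("s"):
        return token[:-1]
    return token


def build_index(docs: dict[int, bytes]) -> dict[str, list[int]]:
    """Map each normalized term to the sorted list of document IDs containing it."""
    postings = defaultdict(set)
    for doc_id, raw in docs.items():
        for tok in tokenize(decode_heuristically(raw)):
            postings[normalize(tok)].add(doc_id)
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


# Document 1 is UTF-8; document 2 is CP1252 (byte 0xE9 is "e" with an acute accent).
docs = {
    1: "Friends, Romans, countrymen.".encode("utf-8"),
    2: b"Un caf\xe9 pour les Romans",
}
index = build_index(docs)
```

Here `index` maps terms such as `friend`, `roman` and `countryman` to the IDs of the documents that contain them. The second document also shows the multilingual complication from the slides: French and English terms end up side by side in one index, and the decoding fallback is a heuristic rather than a true classifier.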

