
UB LIS 571 Soergel Lecture 6.2b Document analysis for retrieval and information extraction Dagobert...

Date post: 29-Jan-2016
Upload: ashlee-mills
UB LIS 571 Soergel Lecture 6.2b Document analysis for retrieval and information extraction Dagobert Soergel Department of Library and Information Studies Graduate School of Education University at Buffalo Includes audio.
Transcript

UB LIS 571 Soergel
Lecture 6.2b Document analysis for retrieval and information extraction
Dagobert Soergel
Department of Library and Information Studies
Graduate School of Education
University at Buffalo
Includes audio.

1 Overview of document/text analysis
You should have read Lecture Notes p. 225-226 (pink).
Read Lecture Notes p. 227-228.

2 Approaches to document/text analysis
Read Lecture Notes p. 229.
The audio gives brief comments on some of these (6 min).

3 Linguistic techniques
Read Lecture Notes p. 230; take the page out of your binder.
3.1 Query expansion, p. 231. Remember the Medline exercise. This will be elaborated later, in Part 5.
3.2 Noun phrases, p. 231. Key points:
Importance of noun phrases as carriers of meaning in English.
Many single nouns have multiple meanings; in a noun phrase the meaning is specified. For another example, try some noun phrases that include the word "pool".
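
The noun-phrase point can be illustrated with a minimal sketch. The phrase list and sense glosses below are assumptions made up for illustration; they are not taken from the Lecture Notes. The sketch only shows that the bare noun "pool" stays ambiguous while each noun phrase selects one sense.

```python
# Hypothetical toy sense inventory for noun phrases containing "pool".
# The phrases and glosses are illustrative assumptions, not course material.
POOL_SENSES = {
    "swimming pool": "basin of water for swimming",
    "car pool": "shared-ride arrangement",
    "gene pool": "set of genes in a population",
    "pool table": "table for the cue game",
}


def sense_of(phrase: str) -> str:
    """Return the sense fixed by a noun phrase, or flag the bare noun as ambiguous."""
    key = phrase.lower().strip()
    if key in POOL_SENSES:
        return POOL_SENSES[key]
    if key == "pool":
        return "ambiguous: " + str(len(POOL_SENSES)) + " candidate senses"
    return "unknown phrase"


if __name__ == "__main__":
    for phrase in ["pool", "gene pool", "car pool"]:
        print(phrase, "->", sense_of(phrase))
```

A real system would draw the senses from a thesaurus or lexical database rather than a hand-made dictionary.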

3.3 Word sense disambiguation
Lecture Notes p. 232-233: Listen to the audio while following in the Lecture Notes (5 min.).

3.4 Named entity recognition
Lecture Notes p. 234-235:

Look at the examples. They should be self-explanatory.

The audio talks about why this is important (2 min.).

3.5 Text structure
Cohesion (grammatical)
Coherence (lexical semantic)
Read Lecture Notes p. 236-237.
Lecture Notes p. 238-239: Listen to the audio while following in the Lecture Notes (5 min.).

3.5.3 Introduction to the analysis of how relationships are expressed in text
Application to information extraction
Information extraction is extremely important for systems that provide answers rather than pointers to where answers can be found. It helps people and organizations cope with ever-increasing volumes of text. Google's Knowledge Graph is a huge database of statements, many extracted from text (using technology being further developed by Google). This database is used now to add substantive data to search results and will be used later to support question-answering.
Read Lecture Notes p. 240, then study the second example from Crombie (p. 242) to find all the ways in which the relationship A B is expressed. In the top part, mentions of the same referent (object, action, or adjective) are connected by lines. In the bottom part, relationship types (e.g., cause) are labeled to make your task easier. After your own exploration, listen to the audio; it explains the second example (7 min.).

Then study p. 243. Explanation in the audio (4 min.).
Optional: apply what you have learned to analyzing Example 1 from Crombie.
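
As a rough illustration of why one relationship type can surface in several wordings, the sketch below matches a few cue phrases for cause. The cue list, the patterns, and the function name extract_cause are assumptions for illustration only; they are not taken from Crombie or the Lecture Notes, and real text needs far richer analysis.

```python
import re

# Hypothetical cue patterns for the "cause" relationship (illustrative assumptions).
# Each pattern names its cause and effect groups, because word order varies by cue.
CAUSE_PATTERNS = [
    # "<cause> causes / leads to / results in <effect>"
    re.compile(r"^(?P<cause>.+?) (?:causes|leads to|results in) (?P<effect>.+?)\.?$", re.I),
    # "<effect> because of / as a result of / due to <cause>"
    re.compile(r"^(?P<effect>.+?) (?:because of|as a result of|due to) (?P<cause>.+?)\.?$", re.I),
    # "Because (of) <cause>, <effect>"
    re.compile(r"^because (?:of )?(?P<cause>.+?), (?P<effect>.+?)\.?$", re.I),
]


def extract_cause(sentence):
    """Return a (cause, effect) pair if a cue pattern matches, else None."""
    text = sentence.strip()
    for pattern in CAUSE_PATTERNS:
        match = pattern.match(text)
        if match:
            return (match.group("cause"), match.group("effect"))
    return None


if __name__ == "__main__":
    for s in [
        "Heavy rain causes flooding.",
        "Flooding occurred because of heavy rain.",
        "Because it rained heavily, the river flooded.",
    ]:
        print(s, "->", extract_cause(s))
```

All three sample sentences express cause, yet each needs a different pattern; that variety is what makes relationship extraction hard.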

3.6 Extracting data through slot filling in frames
Lecture Notes p. 244, with examples on p. 245:
Read p. 244, first box.
The audio explains the rest (8 min.).
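
Slot filling can be sketched as follows. The frame AppointmentFrame, its slots, and the single extraction pattern are hypothetical choices for illustration; the frames and examples in the Lecture Notes may look quite different.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class AppointmentFrame:
    """Hypothetical frame: each field is a slot to be filled from text."""
    person: Optional[str] = None
    position: Optional[str] = None
    organization: Optional[str] = None


# One assumed surface pattern: "<Person> was appointed <position> of <Organization>".
PATTERN = re.compile(
    r"(?P<person>[A-Z][a-z]+(?: [A-Z][a-z]+)+) was appointed "
    r"(?P<position>[a-z ]+?) of "
    r"(?P<organization>[A-Z][\w&]*(?: [A-Z][\w&]*)*)"
)


def fill_frame(text: str) -> AppointmentFrame:
    """Fill the frame's slots from text; slots stay None when nothing matches."""
    match = PATTERN.search(text)
    if match:
        return AppointmentFrame(**match.groupdict())
    return AppointmentFrame()


if __name__ == "__main__":
    print(fill_frame("Jane Doe was appointed director of Acme Corp in May."))
```

Leaving unmatched slots as None keeps it visible which parts of the frame the text did not supply.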
