Page 1: Automating & Evaluating Metadata Generation


Automating & Evaluating Metadata Generation

Elizabeth D. Liddy

Center for Natural Language Processing
School of Information Studies

Syracuse University

Page 2: Automating & Evaluating Metadata Generation


Outline

• Semantic Web

• Metadata

• 3 Metadata R & D Projects

Page 3: Automating & Evaluating Metadata Generation

Semantic Web

• Links digital information so that it can be easily processed by computers worldwide

• Enables publishing data in a re-purposable form

• Built on a syntax that uses URIs and RDF to represent and exchange data on the web (a minimal sketch follows this list)
  – Maps directly & unambiguously to a model
  – Generic parsers are available

• However, the requisite processing is still largely manual
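
To ground the RDF bullet, here is a minimal sketch using the Python rdflib library (the talk names no toolkit; the URI and property values are illustrative):

```python
# A minimal sketch of the RDF model: resources named by URIs, described
# by (subject, predicate, object) triples, handled by a generic parser/
# serializer. Requires: pip install rdflib. The URI is hypothetical.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DC

g = Graph()
lesson = URIRef("http://example.org/lessons/stream-channel-erosion")

# Each statement maps directly and unambiguously to the triple model.
g.add((lesson, DC.title, Literal("Stream Channel Erosion Activity")))
g.add((lesson, DC.creator, Literal("PBS Online")))
g.add((lesson, DC.subject, Literal("Science -- Geology")))

# A generic serializer exchanges the same model in Turtle syntax.
print(g.serialize(format="turtle"))
```

Any RDF-aware consumer can re-purpose these triples without knowing how they were produced.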

Page 4: Automating & Evaluating Metadata Generation

Metadata

• Structured data about resources
• Supports a wide range of operations:
  – Management of information resources
  – Resource discovery
• Enables communication and co-operation amongst:
  – Software developers
  – Publishers
  – Recording & television industry
  – Digital libraries
  – Providers of geographical & satellite-based information
  – Peer-to-peer community

Page 5: Automating & Evaluating Metadata Generation

Metadata (cont’d)

• Value-added information that enables information objects to be:
  – Identified
  – Represented
  – Managed
  – Accessed
• Standards within industries enable interoperability between repositories & users
• However, it is still largely produced manually

Page 6: Automating & Evaluating Metadata Generation

Educational Metadata Schema Elements

GEM Metadata Elements
• Audience
• Cataloging
• Duration
• Essential Resources
• Pedagogy
• Grade
• Standards
• Quality

Dublin Core Metadata Elements
• Contributor
• Coverage
• Creator
• Date
• Description
• Format
• Identifier
• Language
• Publisher
• Relation
• Rights
• Source
• Subject
• Title
• Type

(an illustrative record combining both schemas follows)
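
As an illustration of how one resource might populate both schemas, here is a hypothetical record (values echo the Grand Canyon example later in the deck; the dictionary layout is an assumption, not a GEM or Dublin Core serialization):

```python
# Hypothetical record pairing Dublin Core and GEM elements for a single
# resource; element names come from the slide, values are illustrative.
record = {
    "dc": {
        "Title": "Stream Channel Erosion Activity",
        "Creator": "PBS Online",
        "Format": "text/HTML",
        "Type": "Lesson Plan",
    },
    "gem": {
        "Grade": ["6", "7", "8"],
        "Pedagogy": ["Hands-on learning"],
        "Audience": "Teachers",
    },
}

# A crosswalk between schemas is then a mapping between element names.
for schema, elements in record.items():
    for element, value in elements.items():
        print(f"{schema}:{element} = {value}")
```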

Page 8: Automating & Evaluating Metadata Generation


Semantic Web vs. MetaData?

• But both:
  – Seek the same goals

– Use standards & crosswalks between schema

– Look for comprehensive, well-understood, well-used sets of terms for describing content of information resources

– Enable mutual sharing, accessing, and reuse of information resources

Page 9: Automating & Evaluating Metadata Generation

NSDL MetaData Projects

• Breaking the MetaData Generation Bottleneck
  – CNLP
  – University of Washington

• StandardConnection
  – University of Washington
  – CNLP

• MetaTest
  – CNLP
  – Center for Human Computer Interaction, Cornell University

Page 10: Automating & Evaluating Metadata Generation


Breaking the MetaData Generation Bottleneck

• Goal: Demonstrate feasibility of automatically generating high-quality metadata for digital libraries through Natural Language Processing

• Data: Full-text resources from clearinghouses which provide teaching resources to teachers, students, administrators and parents

• Metadata Schema: Dublin Core + Gateway for Educational Materials (GEM) Schema

Page 11: Automating & Evaluating Metadata Generation


Method: Information Extraction

• Natural Language Processing
  – Technology that enables a system to accomplish human-like understanding of document contents
  – Extracts both explicit and implicit meaning

• Sublanguage Analysis
  – Utilizes domain- and genre-specific regularities vs. full-fledged linguistic analysis

• Discourse Model Development
  – Extractions specialized for the communication goals of the document type and the activities under discussion

Page 12: Automating & Evaluating Metadata Generation

Information Extraction

Types of features recognized & utilized:

• Non-linguistic
  – Length of document
  – HTML and XML tags

• Linguistic
  – Root forms of words
  – Part-of-speech tags
  – Phrases (Noun, Verb, Proper Noun, Numeric Concept)
  – Categories (Proper Name & Numeric Concept)
  – Concepts (sense-disambiguated words / phrases)
  – Semantic Relations
  – Discourse Level Components

Page 13: Automating & Evaluating Metadata Generation

Sample Lesson Plan: Stream Channel Erosion Activity

Student/Teacher Background: Rivers and streams form the channels in which they flow. A river channel is formed by the quantity of water and debris that is carried by the water in it. The water carves and maintains the conduit containing it. Thus, the channel is self-adjusting. If the volume of water or amount of debris is changed, the channel adjusts to the new set of conditions. …

Student Objectives: The student will discuss stream sedimentation that occurred in the Grand Canyon as a result of the controlled release from Glen Canyon Dam. …

Page 14: Automating & Evaluating Metadata Generation

NLP Processing of Lesson Plan

Input: The student will discuss stream sedimentation that occurred in the Grand Canyon as a result of the controlled release from Glen Canyon Dam.

Morphological Analysis: The student will discuss stream sedimentation that occurred in the Grand Canyon as a result of the controlled release from Glen Canyon Dam.

Lexical Analysis: The|DT student|NN will|MD discuss|VB stream|NN sedimentation|NN that|WDT occurred|VBD in|IN the|DT Grand|NP Canyon|NP as|IN a|DT result|NN of|IN the|DT controlled|JJ release|NN from|IN Glen|NP Canyon|NP Dam|NP .|.

(a part-of-speech tagging sketch follows)
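
The lexical-analysis line uses standard Penn Treebank tags. A sketch with NLTK reproduces the word|TAG notation (NLTK is an assumption; the talk does not name CNLP's tagger):

```python
# Part-of-speech tagging sketch using NLTK as a stand-in for CNLP's
# lexical analysis. Resource names may vary across NLTK versions.
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

sentence = ("The student will discuss stream sedimentation that occurred "
            "in the Grand Canyon as a result of the controlled release "
            "from Glen Canyon Dam.")

tokens = nltk.word_tokenize(sentence)
tagged = nltk.pos_tag(tokens)  # Penn Treebank tags, as on the slide

# Render in the slide's word|TAG notation.
print(" ".join(f"{word}|{tag}" for word, tag in tagged))
```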

Page 15: Automating & Evaluating Metadata Generation

NLP Processing of Lesson Plan (cont’d)

Syntactic Analysis – Phrase Identification:
The|DT student|NN will|MD discuss|VB <CN> stream|NN sedimentation|NN </CN> that|WDT occurred|VBD in|IN the|DT <PN> Grand|NP Canyon|NP </PN> as|IN a|DT result|NN of|IN the|DT <CN> controlled|JJ release|NN </CN> from|IN <PN> Glen|NP Canyon|NP Dam|NP </PN> .|.

Semantic Analysis Phase 1 – Proper Name Interpretation:
The|DT student|NN will|MD discuss|VB <CN> stream|NN sedimentation|NN </CN> that|WDT occurred|VBD in|IN the|DT <PN cat=geography/location> Grand|NP Canyon|NP </PN> as|IN a|DT result|NN of|IN the|DT <CN> controlled|JJ release|NN </CN> from|IN <PN cat=geography/structure> Glen|NP Canyon|NP Dam|NP </PN> .|.

(a phrase & proper-name sketch follows)
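
A rough equivalent of the phrase and proper-name bracketing, using spaCy as a stand-in for CNLP's extraction system (an assumption; categories like geography/location correspond only loosely to spaCy's labels):

```python
# Phrase identification and proper-name interpretation sketch.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The student will discuss stream sedimentation that occurred "
          "in the Grand Canyon as a result of the controlled release "
          "from Glen Canyon Dam.")

# Common-noun phrases, roughly the <CN> brackets on the slide.
print([chunk.text for chunk in doc.noun_chunks])

# Proper names with categories, roughly the <PN cat=...> brackets.
for ent in doc.ents:
    print(ent.text, "->", ent.label_)  # e.g. Grand Canyon -> LOC or GPE
```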

Page 16: Automating & Evaluating Metadata Generation

NLP Processing of Lesson Plan (cont’d)

Semantic Analysis Phase 2 – Event & Role Extraction:

Teaching event: discuss
  actor: student
  topic: stream sedimentation

Event: stream sedimentation
  location: Grand Canyon
  cause: controlled release

(a role-extraction sketch follows)
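
Event and role extraction can be sketched over a dependency parse; this is a simplification with spaCy, not CNLP's discourse-model machinery:

```python
# Simplified event & role extraction: verbs become events, their
# grammatical subjects and objects become actor / topic roles.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The student will discuss stream sedimentation that occurred "
          "in the Grand Canyon as a result of the controlled release "
          "from Glen Canyon Dam.")

for token in doc:
    if token.pos_ == "VERB":
        roles = {"event": token.lemma_}
        for child in token.children:
            if child.dep_ in ("nsubj", "nsubjpass"):
                roles["actor"] = child.text
            elif child.dep_ == "dobj":
                # Noun phrase up to the object's head word,
                # e.g. "stream sedimentation".
                roles["topic"] = doc[child.left_edge.i : child.i + 1].text
        print(roles)  # e.g. {'event': 'discuss', 'actor': 'student', ...}
```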

Page 17: Automating & Evaluating Metadata Generation

MetaExtract

[Architecture diagram: an HTML document passes through an HTML converter (with configuration) and a TF/IDF pre-processor. A Metadata Retrieval Module supplies cataloger-provided elements: Catalog Date, Rights, Publisher, Format, Language, Resource Type. An eQuery Extraction Module generates: Creator, Grade/Level, Duration, Date, Pedagogy, Audience, Standard. The TF/IDF pre-processor contributes: Keywords (from potential keyword data), Title, Description, Essential Resources, Relation. An output-gathering program assembles the final HTML document with metadata.]

Page 18: Automating & Evaluating Metadata Generation

Automatically Generated Metadata

Title: Grand Canyon: Flood! - Stream Channel Erosion Activity
Grade Levels: 6, 7, 8
GEM Subjects: Science--Geology; Mathematics--Geometry; Mathematics--Measurement
Keywords:
  Named Entities: Colorado River (river), Grand Canyon (geography/location), Glen Canyon Dam (geography/structures)
  Subject Keywords: channels, conduit, controlled_release, dam, flow_volume, hold, reservoir, rivers, sediment, streams
  Material Keywords: clayboard, cookie_sheet, cup, paper_towel, pencil, roasting_pan, sand, water

Page 19: Automating & Evaluating Metadata Generation

Automatically Generated Metadata (cont’d)

Pedagogy: Collaborative learning; Hands-on learning
Tool For: Teachers
Resource Type: Lesson Plan
Format: text/HTML
Placed Online: 1998-09-02
Name: PBS Online
Role: onlineProvider
Homepage: http://www.pbs.org

Page 20: Automating & Evaluating Metadata Generation

Metadata Evaluation Experiment

• Blind test of automatic vs. manually generated metadata

• Subjects:
  – Teachers
  – Education Students
  – Professors of Education

• Web-based experiment
  – Subjects provided with educational resources and metadata records
  – 2 conditions tested

Page 22: Automating & Evaluating Metadata Generation

Metadata Evaluation Experiment

Blind Test of Automatic vs. Manual Metadata

Expectation Condition – Subjects reviewed:
  1st – metadata record
  2nd – lesson plan
and then judged whether the metadata provided an accurate preview of the lesson plan on a 1-to-5 scale.

Satisfaction Condition – Subjects reviewed:
  1st – lesson plan
  2nd – metadata record
and then judged the accuracy and coverage of the metadata on a 1-to-5 scale, with 5 being high.

Page 25: Automating & Evaluating Metadata Generation

Qualitative Experimental Results

                                   Expectation  Satisfaction  Combined
# Manual Metadata Records                  153           571       724
# Automatic Metadata Records               139           532       671
Manual Metadata Average Score             4.03          3.81      3.85
Automatic Metadata Average Score          3.76          3.55      3.59
Difference                                0.27          0.26      0.26

Page 26: Automating & Evaluating Metadata Generation


MetaData Research Projects

1. Breaking the MetaData Generation Bottleneck

2. StandardConnection

3. MetaTest

Page 27: Automating & Evaluating Metadata Generation

StandardConnection

• Goal: Determine the feasibility & quality of automatically mapping teaching standards to learning resources
  – Example standard: “Solve linear equations and inequalities algebraically and non-linear equations using graphing, symbol-manipulating or spreadsheet technology.”

• Data:
  – Educational Resources: Lesson Plans, Activities, Assessment Units, etc.
  – Teaching Standards: Achieve/McREL Compendix

Page 28: Automating & Evaluating Metadata Generation

Cross-mapping through the Compendix Meta-language

[Diagram: the lesson “Simultaneous Equations Using Elimination” (URI: M8.4.11ABCJ) is mapped through the Compendix to state standards for Washington, Arkansas, Alaska, Michigan, California, New York, Florida, and Texas.]

Page 29: Automating & Evaluating Metadata Generation

StandardConnection Components

[Diagram: educational resources (lesson plans, activities, assessment units, etc.) are connected through the Compendix (e.g., Mathematics 6.2.1 C: “Adds, subtracts, multiplies, & divides whole numbers and decimals”) to the state standards.]

Page 30: Automating & Evaluating Metadata Generation

Lesson Plan: “Simultaneous Equations Using Elimination”

Submitted by: Leslie Howe
Email: [email protected]
School/University/Affiliation: Farragut High School, Knoxville, TN

Grade Level: 9, 10, 11, 12, Higher education, Vocational education, Adult/Continuing education

Subject(s): Mathematics / Algebra

Duration: 30 minutes

Description: The Elimination method is an effective method for solving a system of two unknowns. This lesson provides students with immediate feedback using a computer program or online applet.

Goals: The student will be able to solve a system of two equations when there are two unknowns.

Materials: Online computer applet / program http://www.usit.com/howe2/eqations/index.htm Similar downloadable C++ application available at the same site.

Procedure: A system of two unknowns can be solved by multiplying each equation by the constant that will make the coefficient of one of the variables become the LCM (least common multiple) of the initial coefficients. Students may use the scroll bars on the indicated applet to multiply the equations by constants until the GCF is located. When the "add" button is activated after the correct constants are chosen one of the variables will be eliminated. The process can be repeated for the second variable. The student may enter the solution of the system by using scroll bars. When the "check" button is pressed the answer is evaluated and the student is given immediate feedback. (The same procedure can be done using the downloadable C++ application.) After 5-10 correct responses the student should make the transition to paper and solve the equations without using the applet. The student can still use the applet to check the answer. The applet will generate problems in a random fashion. All solutions are integers.

Assessment: The lesson itself provides alternative assessment. The correct responses are recorded.

Page 31: Automating & Evaluating Metadata Generation

Lesson Plan: “Simultaneous Equations Using Elimination” (with standard assigned)

Submitted by: Leslie Howe
Email: [email protected]
School/University/Affiliation: Farragut High School, Knoxville, TN
Grade Level: 9, 10, 11, 12, Higher education, Vocational education, Adult/Continuing education
Subject(s): Mathematics / Algebra
Duration: 30 minutes

Standard: McREL 8.4.11 – Uses a variety of methods (e.g., with graphs, algebraic methods, and matrices) to solve systems of equations and inequalities

(Description, Goals, Materials, Procedure, and Assessment are identical to the previous page.)

Page 32: Automating & Evaluating Metadata Generation

Automatic Assigning of Standards as a Retrieval Process

[Diagram: an index of terms is built from the standards.]

Page 33: Automating & Evaluating Metadata Generation

[Diagram: the Compendix standards form the document collection; each standard is processed and indexed.]

The index of standards is assembled from the subject heading, secondary subject, the actual standard, and the vocabulary. (An indexing sketch follows.)
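
A minimal sketch of that assembly as an inverted index (field names follow the slide; the data structure and the sample entry's secondary subject and vocabulary are assumptions):

```python
# Build a term -> standards inverted index from the Compendix fields.
from collections import defaultdict

standards = {
    "Mathematics 6.2.1 C": {
        "subject": "Mathematics",
        "secondary_subject": "Arithmetic",               # hypothetical
        "standard": "Adds, subtracts, multiplies, & divides "
                    "whole numbers and decimals",
        "vocabulary": "arithmetic operations decimals",  # hypothetical
    },
}

index = defaultdict(set)
for uri, fields in standards.items():
    # Index terms from every field: subject heading, secondary subject,
    # the standard text itself, and the vocabulary.
    for text in fields.values():
        for term in text.lower().replace(",", " ").split():
            index[term].add(uri)

print(index["decimals"])  # -> {'Mathematics 6.2.1 C'}
```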

Page 35: Automating & Evaluating Metadata Generation

Automatic Assigning of Standards as a Retrieval Process (cont’d)

[Diagram build: the lesson plan is posed as a query against the index of terms from the standards.]

Page 36: Automating & Evaluating Metadata Generation

QUERY = NLP-Processed Lesson Plan

1. A new lesson plan arrives.
2. Natural Language Processing: part-of-speech tagging and bracketing of phrases & proper names, e.g. Simultaneous|JJ Equations|NNS Using|VBG Elimination|NN
3. Filtering: sections are eliminated or given greater weight (e.g., citations are removed), leaving the relevant parts of the lesson plan.
4. TF/IDF: relative-frequency weights of words, phrases, proper names, etc.
5. Query = top 30 terms, e.g.: equation, eliminate, solve

(a TF/IDF sketch follows)
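
A sketch of the TF/IDF step with scikit-learn (an assumption; the talk's own weighting and phrase handling may differ, and the texts are stand-ins):

```python
# Weight lesson-plan terms against the collection and keep the top k
# as the query vector.
from sklearn.feature_extraction.text import TfidfVectorizer

lesson_plans = [
    "Solve a system of two equations with two unknowns by elimination.",
    "Model stream channel erosion with sand, water, and a roasting pan.",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(lesson_plans)

def top_terms(doc_index, k=30):
    """Return the k highest-weighted terms of one lesson plan."""
    row = tfidf[doc_index].toarray().ravel()
    terms = vectorizer.get_feature_names_out()
    return [terms[i] for i in row.argsort()[::-1][:k] if row[i] > 0]

print(top_terms(0))  # e.g. ['elimination', 'equations', 'solve', ...]
```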

Page 37: Automating & Evaluating Metadata Generation

Automatic Assigning of Standards as a Retrieval Process (concluded)

[Diagram: the lesson-plan query is run against the index of terms from the standards, and the best-matching standard is assigned to the lesson plan.]

Page 38: Automating & Evaluating Metadata Generation

Teaching Standard Assignment as Retrieval Task Experiment

• Exploratory test run
  – 3,326 standards (documents)
  – 2,239 lesson plans (queries)
  – TF/IDF term weighting scheme
  – Top 30 weighted terms from each lesson plan as a query vector (see the sketch below)

• Manual evaluation
  – Focused on understanding of issues & solutions
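
A sketch of the retrieval run itself, ranking standards by cosine similarity against the lesson-plan query (scikit-learn and the texts are assumptions; the project's matcher may differ):

```python
# Standards are the document collection; a lesson plan's top terms are
# the query; standards are ranked by similarity to that query.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

standards = [
    "Uses a variety of methods to solve systems of equations and inequalities",
    "Solves simple inequalities and non-linear equations",
    "Adds, subtracts, multiplies, and divides whole numbers and decimals",
]
query = "equation eliminate solve"  # top terms from one lesson plan

vectorizer = TfidfVectorizer(stop_words="english")
doc_matrix = vectorizer.fit_transform(standards)
query_vec = vectorizer.transform([query])

scores = cosine_similarity(query_vec, doc_matrix).ravel()
for rank, i in enumerate(scores.argsort()[::-1], start=1):
    print(rank, f"{scores[i]:.2f}", standards[i])
```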

Page 39: Automating & Evaluating Metadata Generation

Information Retrieval Experiments

• Baseline Results
  – 68 queries (lesson plans) evaluated
  – 24 queries (35%): the appropriate standard was ranked first
  – 28 queries (41%): the predominant standard was in the top 5
  – Room for improvement, but promising (a metric sketch follows)
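
The two figures are success-at-rank measures; a small sketch of how they might be computed (the result data is invented for illustration):

```python
# Fraction of queries whose correct standard is ranked first / in top 5.
# Each entry pairs a ranked list of retrieved standard IDs with the
# manually judged correct standard.
results = [
    {"ranked": ["8.4.11", "8.3.6", "8.4.12"], "correct": "8.4.11"},
    {"ranked": ["8.3.6", "8.4.11", "8.4.12"], "correct": "8.4.11"},
]

def success_at_k(results, k):
    hits = sum(r["correct"] in r["ranked"][:k] for r in results)
    return hits / len(results)

print(f"ranked first: {success_at_k(results, 1):.0%}")  # cf. 35% reported
print(f"in top 5:     {success_at_k(results, 5):.0%}")  # cf. 41% reported
```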

Page 40: Automating & Evaluating Metadata Generation


Future Research

• Improve current retrieval performance
  – Matching algorithm, document expansion, etc.

• Apply classification approach to Standard Connection Project

• Compare information retrieval approach and classification approach

• Improve browsing access for teachers & administrators

Page 41: Automating & Evaluating Metadata Generation

Browsing Access to Learning Resources

Automatic Assignment of Standards to Lesson Plans

[Diagram: a lesson plan with standards attached is linked through Standard 8.4.11 into a browsable map of standards, e.g., Strand Maps.]

Standard 8.3.6: Solves simple inequalities and non-linear equations with rational number solutions, using concrete and informal methods.
Standard 8.4.11: Uses a variety of methods (e.g., with graphs, algebraic methods, and matrices) to solve systems of equations and inequalities.
Standard 8.4.12: Understands formal notation (e.g., sigma notation, factorial representation) and various applications (e.g., compound interest) of sequences and series.

Page 42: Automating & Evaluating Metadata Generation


MetaData Research Projects

1. Breaking the MetaData Generation Bottleneck

2. StandardConnection

3. MetaTest

Page 43: Automating & Evaluating Metadata Generation

Life-Cycle Evaluation of Metadata

1. Initial generation
   – Methods: Manual, Automatic
   – Costs: Time, Human Resources, Technology

2. Accessing DL resources
   – Users’ interactions: Browsing, Searching
   – Relative contribution of each metadata element

3. Search Effectiveness
   – Precision
   – Recall

Page 44: Automating & Evaluating Metadata Generation

GOAL: Measure Quality & Usefulness of Metadata

[Diagram: a metadata generation system and a user are linked through the metadata, the user’s understanding, and evaluation.]

• Measures: Precision, Recall; Browsing, Searching
• METHODS: Manual, Semi-Automatic, Automatic
• COSTS: Time, Human Resources, Technology

Page 45: Automating & Evaluating Metadata Generation

Evaluation Methodology

• Automatically meta-tag a Digital Library collection that has already been manually meta-tagged.

• Solicit a range of appropriate Digital Library users.

• For each metadata element:
  1. Users qualitatively evaluate it in light of the digital resource.
  2. Conduct a standard IR experiment.
  3. Observe subjects while searching & browsing.

Page 46: Automating & Evaluating Metadata Generation

Information Retrieval Experiment

• Users pose queries to the system.

• The system retrieves documents using either:
  – Manually assigned metadata
  – Automatically generated metadata

• The system ranks documents by its estimate of relevance.

• Users review the retrieved documents & judge relevance.

• Compute precision & recall (a sketch follows this list).

• Compare results according to:
  – Method of assignment
  – The metadata element that enabled retrieval
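
A minimal sketch of the precision and recall computation for one query (the document IDs are invented; the relevance judgments come from the users' reviews):

```python
# Precision: how much of what was retrieved is relevant.
# Recall: how much of what is relevant was retrieved.
retrieved = {"doc1", "doc2", "doc3", "doc4"}  # returned by the system
relevant = {"doc2", "doc4", "doc7"}           # judged relevant by users

true_positives = len(retrieved & relevant)
precision = true_positives / len(retrieved)
recall = true_positives / len(relevant)

print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```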

Page 47: Automating & Evaluating Metadata Generation


User Studies: Methods & Questions

1. Observations of Users Seeking DL Resources

– How do users search & browse the digital library?

– Do search attempts utilize the available metadata?

– Which metadata elements are most important to users?

– Which are used consistently for the best results?

Page 48: Automating & Evaluating Metadata Generation

User Studies: Methods & Questions (cont’d)

2. Eye-tracking with Think-aloud Protocols
   – Which metadata elements do users spend the most time viewing?
   – What are users thinking about when seeking digital library resources?
   – Show the correlation between what users are looking at and what they are thinking.
   – Use eye-tracking to measure the number & duration of fixations, scan paths, dilation, etc.

3. Individual Subject Data
   – How does expertise / role influence seeking resources from digital libraries?

Page 49: Automating & Evaluating Metadata Generation


Sample Lesson Plans

Page 50: Automating & Evaluating Metadata Generation


Eye Scan Path For Bug Club Document

Page 51: Automating & Evaluating Metadata Generation


Eye Scan Path For Sigmund Freud Document

Page 52: Automating & Evaluating Metadata Generation

What, When, Where, and How Long

[Figure: eye-tracking output showing, for each fixated word, its fixation number and fixation duration.]

Page 53: Automating & Evaluating Metadata Generation

In Summary: Metadata Research Goals

1. Improve access via automatic metadata generation:
   • Provide richer, more complete, and consistent metadata.
   • Increase the number of resources available electronically.
   • Increase the speed with which they are added.

2. Add appropriate teaching standards to each resource.

3. Provide empirical results on quality, utility, and cost of automatic vs. manual metadata generation.

4. Show evidence as to which metadata elements are needed.

5. Inform HCI design with a better understanding of users’ behaviors when browsing and searching Digital Libraries.

6. Employ automatic metadata generation to build the Semantic Web.

