2003.09.11 - SLIDE 1IS 202 - FALL 2003

Lecture 06: Metadata Introduction

Prof. Ray Larson & Prof. Marc Davis

UC Berkeley SIMS

Tuesday and Thursday 10:30 am - 12:00 pm

Fall 2003

SIMS 202: Information Organization and Retrieval

2003.09.11 - SLIDE 2IS 202 - FALL 2003

Lecture Contents

• Review
– Lexical Relations
– WordNet

• Organization of Information

• Metadata

• Kinds of Metadata

• Dublin Core

• Discussion

2003.09.11 - SLIDE 4IS 202 - FALL 2003

Syntax

• The syntax of a language is to be understood as a set of rules which accounts for the distribution of word forms throughout the sentences of a language

• These rules codify permissible combinations of classes of word forms

2003.09.11 - SLIDE 5IS 202 - FALL 2003

Semantics

• Semantics is the study of linguistic meaning

• Two standard approaches to lexical semantics (cf. sentential semantics and logical semantics):
– (1) compositional
– (2) relational

2003.09.11 - SLIDE 6IS 202 - FALL 2003

Pragmatics

• Deals with the relation between signs or linguistic expressions and their users

• Deixis (literally “pointing out”)
– E.g., “I’ll be back in an hour” depends upon the time of the utterance

• Conversational implicature
– A: “Can you tell me the time?”
– B: “Well, the milkman has come.” [I don’t know exactly, but perhaps you can deduce it from some extra information I give you.]

• Presupposition
– “Are you still such a bad driver?”

• Speech acts
– Constatives vs. performatives
– E.g., “I second the motion.”

• Conversational structure
– E.g., turn-taking rules

2003.09.11 - SLIDE 7IS 202 - FALL 2003

Lexical Relations

• Conceptual relations link concepts
– Goal of Artificial Intelligence

• Lexical relations link words
– Goal of Linguistics

2003.09.11 - SLIDE 8IS 202 - FALL 2003

Major Lexical Relations

• Synonymy

• Polysemy

• Metonymy

• Hyponymy/Hypernymy

• Meronymy/Holonymy

• Antonymy
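
To make one of these relations concrete, here is a minimal Python sketch of hyponymy/hypernymy as a chain of "is a kind of" links. The words and links are toy data chosen for illustration; they are not taken from WordNet, and the function names are hypothetical.

```python
# Toy hypernym ("is a kind of") links; illustrative only, not WordNet data.
HYPERNYM = {
    "robin": "bird",
    "bird": "animal",
    "oak": "tree",
    "tree": "plant",
}


def hypernym_chain(word):
    """Follow hypernym links upward and return every more general term."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain


def is_hyponym_of(word, ancestor):
    """True if `ancestor` is reachable from `word` via hypernym links."""
    return ancestor in hypernym_chain(word)


robin_chain = hypernym_chain("robin")
```

Because hypernymy is transitive, a robin counts as an animal here even though no direct link says so. The relation is not symmetric: `is_hyponym_of("bird", "robin")` is false.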

2003.09.11 - SLIDE 9IS 202 - FALL 2003

WordNet

• Started in 1985 by George Miller, students, and colleagues at the Cognitive Science Laboratory, Princeton University
– Miller is also known as the author of the paper “The Magical Number Seven, Plus or Minus Two: Some Limits on our Capacity for Processing Information” (1956)

• Can be downloaded for free:
– www.cogsci.princeton.edu/~wn/

2003.09.11 - SLIDE 10IS 202 - FALL 2003

Miller on WordNet

• “In terms of coverage, WordNet’s goals differ little from those of a good standard college-level dictionary, and the semantics of WordNet is based on the notion of word sense that lexicographers have traditionally used in writing dictionaries. It is in the organization of that information that WordNet aspires to innovation.”
– (Miller, 1998, Chapter 1)

2003.09.11 - SLIDE 11IS 202 - FALL 2003

Structure of WordNet

2003.09.11 - SLIDE 12IS 202 - FALL 2003

Structure of WordNet

2003.09.11 - SLIDE 13IS 202 - FALL 2003

Structure of WordNet

2003.09.11 - SLIDE 14IS 202 - FALL 2003

Lecture Contents

• Review
– Lexical Relations
– WordNet

• Organization of Information

• Metadata

• Kinds of Metadata

• Dublin Core

• Discussion

2003.09.11 - SLIDE 15IS 202 - FALL 2003

Organization of Information

• Is there a basic human need to put things into some sort of order?
– Much of natural language concerns categories of things rather than individual things
– Why do we organize things and information?
  • Why do spoons go in THAT drawer in the kitchen and not in a can in the garage?
  • Why do your favorite books go on one shelf and not-so-favorite on another?

2003.09.11 - SLIDE 16IS 202 - FALL 2003

Why Organize Information?

• The main reason
– So that you can find things more effectively

• I.e., effective retrieval is predicated on some sort of organization applied to information resources

• Historically there have been many institutions and tools devoted to information organization
– Libraries
– Museums
– Archives
– Indexes and catalogs, dictionaries, phone books, etc.

2003.09.11 - SLIDE 17IS 202 - FALL 2003

Why Organize Information?

• A question of scale
– Using your own ad hoc set of categories and methods to organize your own collection of books or CDs seems to work fine…
– What if your collection grew to
  • 10 times the size? How would you organize it?
  • 100 times?
  • 1,000 times?
  • 100,000 times?

2003.09.11 - SLIDE 18IS 202 - FALL 2003

What is Information Organization?

• Identifying the existence of all types of information-bearing entities as they are made available

• Identifying the works contained within those information-bearing entities or as parts of them

• Systematically pulling together these information-bearing entities into collections in libraries, archives, museums, Internet communications files and other such depositories

From Hagler via Taylor, Chap. 1

2003.09.11 - SLIDE 19IS 202 - FALL 2003

What is Information Organization?

• Producing lists of these information-bearing entities prepared according to standard rules for citation

• Providing name, title, subject and other useful access to these information-bearing entities

• Providing the means of locating each information-bearing entity or a copy of it

2003.09.11 - SLIDE 20IS 202 - FALL 2003

Organizing Information

• Libraries

• Archives

• Museums and galleries

• Internet

• Corporate and office environments

2003.09.11 - SLIDE 21IS 202 - FALL 2003

Key Issues in This Course

• How to describe information resources or information-bearing objects in ways so that they may be effectively used by those who need to use them
– Organizing

• How to find the appropriate information resources or information-bearing objects for someone’s (or your own) needs
– Retrieving

2003.09.11 - SLIDE 22IS 202 - FALL 2003

Key Issues

[Diagram; labels: Creation; Utilization; Searching; Active; Inactive; Semi-Active; Retention/Mining; Disposition; Discard; Using; Creating; Authoring/Modifying; Organizing/Indexing; Storing/Retrieval; Distribution/Networking; Accessing/Filtering]

2003.09.11 - SLIDE 23IS 202 - FALL 2003

Organizing/Indexing

• Collecting and integrating information

• Affects data, information and metadata

• “Metadata” describes data and information
– More on this later

• Organizing information
– Types of organization?

• Indexing

2003.09.11 - SLIDE 24IS 202 - FALL 2003

Accessing/Filtering

• Using the organization created in the O/I stage to
– Select desired (or relevant) information
– Locate that information
– Retrieve the information from its storage location (often via a network)

2003.09.11 - SLIDE 25IS 202 - FALL 2003

Structure of an IR System

[Diagram: Information Storage and Retrieval System. Labels: Interest profiles & Queries; Documents & data; Rules of the game = Rules for subject indexing + Thesaurus (which consists of Lead-In Vocabulary and Indexing Language); Formulating query in terms of descriptors; Indexing (Descriptive and Subject); Storage of profiles; Storage of Documents; Store 1: Profiles/Search requests; Store 2: Document representations; Storage Line; Comparison/Matching; Potentially Relevant Documents]
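
The comparison/matching step between search requests and document representations can be sketched in Python. This is only an illustration under stated assumptions: the document ids, descriptors, and function names below are hypothetical, and matching is simplified to "every query descriptor must be assigned to the document."

```python
from collections import defaultdict

# Hypothetical document representations: each document id maps to the
# descriptors an indexer assigned to it (toy data, not from the lecture).
DOCUMENT_DESCRIPTORS = {
    "doc1": {"cataloging", "classification"},
    "doc2": {"metadata", "cataloging"},
    "doc3": {"information retrieval"},
}


def build_index(doc_descriptors):
    """Invert the representations: descriptor -> set of document ids."""
    index = defaultdict(set)
    for doc_id, descriptors in doc_descriptors.items():
        for descriptor in descriptors:
            index[descriptor].add(doc_id)
    return index


def match(index, query_descriptors):
    """Return ids of documents indexed under every descriptor in the query."""
    postings = [index.get(d, set()) for d in query_descriptors]
    return set.intersection(*postings) if postings else set()


index = build_index(DOCUMENT_DESCRIPTORS)
hits = match(index, ["cataloging", "metadata"])
```

The sketch only works when queries and documents are expressed in the same indexing language, which is the role the thesaurus and indexing rules play in the diagram.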

2003.09.11 - SLIDE 26IS 202 - FALL 2003

Lecture Contents

• Review
– Lexical Relations
– WordNet

• Organization of Information

• Metadata

• Kinds of Metadata

• Dublin Core

• Discussion

2003.09.11 - SLIDE 27IS 202 - FALL 2003

Metadata

• Metadata is
– “Data about Data” (database systems)
– Information about Information

• First used (to the best we can discover) in 1978 (meta-data)

• Used for databases (Meta-Data Base)
– “a data base which itself contains the structural and semantic data of other data bases”
  » Thomas R. Cousins & Wayne D. Dominick, “The Management of Data Bases of Data Bases,” ASIS Proceedings, 1978.

2003.09.11 - SLIDE 28IS 202 - FALL 2003

Metadata

• Structures and languages for the description of information resources and their elements (components or features)

• “Metadata is information on the organization of the data, the various data domains, and the relationship between them” (Baeza-Yates p. 142)

2003.09.11 - SLIDE 29IS 202 - FALL 2003

Metadata

• Often two main types of metadata are distinguished
– Descriptive metadata
  • Describes the information/data object and its properties
  • May use a variety of descriptive formats and rules
– Topical metadata
  • Describes the topic or “aboutness” of an information/data object
  • May include a variety of vocabularies for describing subjects, topics, categories, etc.

2003.09.11 - SLIDE 30IS 202 - FALL 2003

Types of Metadata

• Element names

• Element description

• Element representation

• Element coding

• Element semantics

• Element classification

2003.09.11 - SLIDE 31IS 202 - FALL 2003

Metadata Systems and Standards

• Naming and ID systems
• Bibliographic description
– Texts
• Music
• Images and objects
• Numeric data
• Geospatial data
• Collections
• Video and motion pictures

2003.09.11 - SLIDE 32IS 202 - FALL 2003

The Same Item in Different Metadata Systems

• ISBD

• RFC 1807

• TEI Header

• MARC Record

• Dublin Core (a bit later)

2003.09.11 - SLIDE 33IS 202 - FALL 2003

ISBD Punctuation

• Title Proper (GMD) = Parallel title : other title info / First statement of responsibility ; others. -- Edition information. -- Material. -- Place of Publication : Publisher Name, Date. -- Material designation and extent ; Dimensions of item. -- (Title of Series / Statement of responsibility). -- Notes. -- Standard numbers: terms of availability (qualifications).

2003.09.11 - SLIDE 34IS 202 - FALL 2003

Bibliographic Record

• Introduction to cataloging and classification / Bohdan S. Wynar. -- 8th ed. / Arlene G. Taylor. -- Englewood, Colo. : Libraries Unlimited, 1992. -- (Library science text series).
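
As a rough illustration of how the prescribed punctuation assembles a record like the one above, here is a minimal Python sketch. It covers only the areas that appear in this example; the function name and parameter names are hypothetical, and a real ISBD implementation handles many more areas and edge cases.

```python
def isbd_record(title, responsibility, edition=None, edition_responsibility=None,
                place=None, publisher=None, date=None, series=None):
    """Join a few bibliographic areas using ISBD-style punctuation."""
    # Title and statement of responsibility: "Title / Responsibility"
    areas = [f"{title} / {responsibility}"]
    # Edition area: "Edition / Responsibility for the edition"
    if edition:
        area = edition
        if edition_responsibility:
            area += f" / {edition_responsibility}"
        areas.append(area)
    # Publication area: "Place : Publisher, Date"
    if place and publisher and date:
        areas.append(f"{place} : {publisher}, {date}")
    # Series area is enclosed in parentheses
    if series:
        areas.append(f"({series})")
    # Areas are separated by ". -- "; strip trailing periods to avoid ".."
    return ". -- ".join(a.rstrip(".") for a in areas) + "."


record = isbd_record(
    title="Introduction to cataloging and classification",
    responsibility="Bohdan S. Wynar",
    edition="8th ed.",
    edition_responsibility="Arlene G. Taylor",
    place="Englewood, Colo.",
    publisher="Libraries Unlimited",
    date="1992",
    series="Library science text series",
)
```

The point of the punctuation is that each separator (" / ", " : ", ". -- ") signals which area follows, so a reader can identify the parts of a record without field labels.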

2003.09.11 - SLIDE 35IS 202 - FALL 2003

RFC 1807

• BIB-VERSION:: CS-TR-v2.1
• ID:: UCB//123456
• ENTRY:: September 9, 1997
• TYPE:: BOOK
• TITLE:: Introduction to cataloging and classification
• AUTHOR:: Wynar, Bohdan S.
• AUTHOR:: Taylor, Arlene G.
• DATE:: 1992
• PAGES:: 633
• COPYRIGHT:: Libraries Unlimited, 1992
• SERIES:: Library Science Text Series
• END:: UCB//123456
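
The `FIELD:: value` layout of this record is simple enough to read with a few lines of Python. The sketch below is an assumption-laden illustration, not a full RFC 1807 parser: it ignores continuation lines and does no validation, and the function name is hypothetical.

```python
from collections import defaultdict

RECORD = """\
BIB-VERSION:: CS-TR-v2.1
ID:: UCB//123456
ENTRY:: September 9, 1997
TYPE:: BOOK
TITLE:: Introduction to cataloging and classification
AUTHOR:: Wynar, Bohdan S.
AUTHOR:: Taylor, Arlene G.
DATE:: 1992
PAGES:: 633
COPYRIGHT:: Libraries Unlimited, 1992
SERIES:: Library Science Text Series
END:: UCB//123456
"""


def parse_tagged_record(text):
    """Map each field tag to the list of values it carries, in order."""
    fields = defaultdict(list)
    for line in text.splitlines():
        if "::" not in line:
            continue  # continuation lines are not handled in this sketch
        tag, _, value = line.partition("::")
        fields[tag.strip().upper()].append(value.strip())
    return dict(fields)


record = parse_tagged_record(RECORD)
```

Values are kept in lists because fields such as AUTHOR repeat; the matching ID and END values bracket the record.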

2003.09.11 - SLIDE 36IS 202 - FALL 2003

Minimal TEI Header

• <teiHeader>
• <fileDesc>
• <titleStmt>
• <title>Introduction to cataloging and classification</title>
• <respStmt><name>Bohdan S. Wynar</name><resp>8th edition by</resp>
• <name>Arlene G. Taylor</name>
• </respStmt>
• </titleStmt>
• <publicationStmt>
• <distributor>Libraries Unlimited</distributor>
• </publicationStmt>
• <sourceDesc>
• <bibl>Introduction to cataloging and classification / Bohdan S. Wynar. -- 8th ed. / Arlene G. Taylor. -- Englewood, Colo. : Libraries Unlimited, 1992.
• </bibl>
• </sourceDesc>
• </fileDesc>
• </teiHeader>
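
Because a TEI header is markup, its fields can be pulled out with a standard XML parser. The Python sketch below uses a well-formed, un-namespaced version of the header; the closing tags and indentation are my assumption about the intended structure, and the variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Well-formed rendering of the minimal header (assumed structure).
TEI_HEADER = """\
<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>Introduction to cataloging and classification</title>
      <respStmt>
        <name>Bohdan S. Wynar</name>
        <resp>8th edition by</resp>
        <name>Arlene G. Taylor</name>
      </respStmt>
    </titleStmt>
    <publicationStmt>
      <distributor>Libraries Unlimited</distributor>
    </publicationStmt>
    <sourceDesc>
      <bibl>Introduction to cataloging and classification / Bohdan S. Wynar. -- 8th ed. / Arlene G. Taylor. -- Englewood, Colo. : Libraries Unlimited, 1992.</bibl>
    </sourceDesc>
  </fileDesc>
</teiHeader>
"""

root = ET.fromstring(TEI_HEADER)

# Paths are relative to the <teiHeader> root element.
title = root.findtext("fileDesc/titleStmt/title")
names = [n.text for n in root.findall("fileDesc/titleStmt/respStmt/name")]
distributor = root.findtext("fileDesc/publicationStmt/distributor")
```

Note that the `<bibl>` element simply wraps the ISBD-style citation as text, so the header carries both structured and unstructured description of the same item.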

2003.09.11 - SLIDE 37IS 202 - FALL 2003

MARC Record (Display)

• ID:DCLC9124851-B RTYP:c ST:p FRN: MS:c EL: AD:06-20-91
• CC:9110 BLT:am DCF:a CSC: MOD: SNR: ATC: UD:04-11-92
• CP:cou L:eng INT: GPC: BIO: FIC:0 CON:b
• PC:s PD:1992/ REP: CPI:0 FSI:0 ILC:a II:1
• MMD: OR: POL: DM: RR: COL: EML: GEN: BSE:
• 010 9124851
• 020 0872878112 (cloth)
• 020 0872879674 (paper)
• 040 DLC$cDLC$dDLC
• 050 00 Z693$b.W94 1991
• 082 00 025.3$220
• 100 1 Wynar, Bohdan S.
• 245 10 Introduction to cataloging and classification /$cBohdan S. Wynar.
• 250 8th ed. /$bArlene G. Taylor.
• 260 Englewood, Colo. :$bLibraries Unlimited,$c1992.
• 300 xvii, 633 p. :$bill. ;$c24 cm.
• 440 0 Library science text series
• 504 Includes bibliographical references (p. 591-599) and index.
• 650 0 Cataloging.
• 650 0 Subject cataloging.
• 650 0 Classification$xBooks.
• 630 00 Anglo-American cataloguing rules.
• 700 10 Taylor, Arlene G.,$d1941-
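
In this display, each variable field is a three-digit tag, optional indicators, and data in which `$` plus one character starts a new subfield. The Python sketch below splits the data portion of a few fields into subfields. It assumes the text before the first `$` is subfield `a`, takes tags and indicators as already separated, and uses a hypothetical function name; it does not read real MARC transmission files.

```python
import re


def split_subfields(data, first_code="a"):
    """Split 'text$bmore$cmore' into [(code, text), ...].

    Text before the first '$' is assumed to belong to subfield `first_code`.
    """
    parts = re.split(r"\$([a-z0-9])", data)
    subfields = []
    if parts[0].strip():
        subfields.append((first_code, parts[0].strip()))
    # After the split, codes sit at odd positions and values at even ones.
    for code, value in zip(parts[1::2], parts[2::2]):
        subfields.append((code, value.strip()))
    return subfields


# (tag, indicators, data) triples copied from the display above.
FIELDS = [
    ("082", "00", "025.3$220"),
    ("245", "10", "Introduction to cataloging and classification /$cBohdan S. Wynar."),
    ("250", "", "8th ed. /$bArlene G. Taylor."),
    ("260", "", "Englewood, Colo. :$bLibraries Unlimited,$c1992."),
    ("300", "", "xvii, 633 p. :$bill. ;$c24 cm."),
]

record = {tag: split_subfields(data) for tag, _indicators, data in FIELDS}
```

The ISBD punctuation (" /", " :", ",") stays inside the subfield values, which is why the 245, 250 and 260 fields read almost exactly like the bibliographic record shown earlier.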

2003.09.11 - SLIDE 38IS 202 - FALL 2003

Lecture Contents

• Review
– Lexical Relations
– WordNet

• Organization of Information

• Metadata

• Kinds of Metadata

• Dublin Core

• Discussion

2003.09.11 - SLIDE 39IS 202 - FALL 2003

Dublin Core

• Simple metadata for describing internet resources

• For “Document-Like Objects”

• 15 Elements (in base DC)

2003.09.11 - SLIDE 40IS 202 - FALL 2003

Dublin Core

• TITLE: Introduction to cataloging and classification
• CREATOR: Taylor, Arlene G.
• OTHER CONTRIBUTOR: Wynar, Bohdan S.
• DATE: 1992
• FORMAT: BOOK
• LANGUAGE: ENG
• PAGES: 633
• PUBLISHER: Libraries Unlimited
• SUBJECT: Cataloging.
• SUBJECT: subject cataloging.
• SUBJECT: Classification -- Books
• DESCRIPTION: Textbook on cataloging and classification
• RESOURCE TYPE: text.monograph
• RESOURCE IDENTIFIER: (ISBN) 0872879674

2003.09.11 - SLIDE 41IS 202 - FALL 2003

Dublin Core Elements

• Title

• Creator

• Subject

• Description

• Publisher

• Other Contributors

• Date

• Resource Type

• Format

• Resource Identifier

• Source

• Language

• Relation

• Coverage

• Rights Management
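
One common way to attach Dublin Core to a web page is as HTML `<meta>` tags. The Python sketch below checks the example record from the previous slide against the fifteen elements and renders the recognized ones. The short lowercase names (`contributor`, `type`, `identifier`, `rights`) follow later Dublin Core usage and are my mapping of the slide labels; the `DC.` prefix convention and the function name are assumptions for illustration.

```python
from html import escape

# Slide labels mapped to short lowercase element names (assumed mapping).
DC_ELEMENTS = {
    "TITLE": "title", "CREATOR": "creator", "SUBJECT": "subject",
    "DESCRIPTION": "description", "PUBLISHER": "publisher",
    "OTHER CONTRIBUTOR": "contributor", "DATE": "date",
    "RESOURCE TYPE": "type", "FORMAT": "format",
    "RESOURCE IDENTIFIER": "identifier", "SOURCE": "source",
    "LANGUAGE": "language", "RELATION": "relation",
    "COVERAGE": "coverage", "RIGHTS MANAGEMENT": "rights",
}

# The example record, as (label, value) pairs so elements can repeat.
RECORD = [
    ("TITLE", "Introduction to cataloging and classification"),
    ("CREATOR", "Taylor, Arlene G."),
    ("OTHER CONTRIBUTOR", "Wynar, Bohdan S."),
    ("DATE", "1992"),
    ("FORMAT", "BOOK"),
    ("LANGUAGE", "ENG"),
    ("PAGES", "633"),
    ("PUBLISHER", "Libraries Unlimited"),
    ("SUBJECT", "Cataloging."),
    ("SUBJECT", "subject cataloging."),
    ("SUBJECT", "Classification -- Books"),
    ("DESCRIPTION", "Textbook on cataloging and classification"),
    ("RESOURCE TYPE", "text.monograph"),
    ("RESOURCE IDENTIFIER", "(ISBN) 0872879674"),
]


def to_meta_tags(record):
    """Render known elements as <meta> tags; collect labels that are not DC."""
    tags, unknown = [], []
    for label, value in record:
        name = DC_ELEMENTS.get(label)
        if name is None:
            unknown.append(label)
            continue
        content = escape(value, quote=True)
        tags.append(f'<meta name="DC.{name}" content="{content}">')
    return tags, unknown


tags, unknown = to_meta_tags(RECORD)
```

Running this flags PAGES as outside the fifteen-element set, which shows how a fixed element list doubles as a validation vocabulary.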

2003.09.11 - SLIDE 42IS 202 - FALL 2003

Mega-Metadata Standards

• METS - Metadata Encoding and Transmission Standard (http://www.loc.gov/standards/mets)
– Developed by the Digital Library Federation as an implementation strategy for preservation metadata
– “XML document format for encoding metadata necessary for both management of digital library objects within a repository and exchange of such objects between repositories (or between repositories and their users)”
– Provides a flexible mechanism for encoding descriptive, administrative, and structural metadata for a digital library object, and for expressing the complex links between these various forms of metadata

2003.09.11 - SLIDE 43IS 202 - FALL 2003

Metadata Resources

• Check the Links section from the class home page

• Best site is the “Digital Library: Metadata Resources” page from IFLA at http://www.ifla.org/II/metadata.htm

• For another good source of information on metadata standards see http://www.chin.gc.ca/English/Standards

2003.09.11 - SLIDE 44IS 202 - FALL 2003

Lecture Contents

• Review
– Lexical Relations
– WordNet

• Organization of Information

• Metadata

• Kinds of Metadata

• Dublin Core

• Discussion

2003.09.11 - SLIDE 45IS 202 - FALL 2003

Chap 3 Questions: Tu Tran

• In the 1830s, there was debate and opposition to having a non-Englishman (Italian-born Anthony Panizzi) be the Keeper of the Printed Books of the British Museum. In today’s society, who is qualified to categorize information? For instance, should it be done by the creators of the works, experts in the fields, information science professionals, or a combination of people? How is categorization open to interpretation?

2003.09.11 - SLIDE 46IS 202 - FALL 2003

Tu Tran

• What would be some pros and cons of using a standard list of subject headings (such as the Library of Congress Subject Headings) to organize the Internet? See http://www.loc.gov/catdir/cpso/lcco/lcco.html for the LOC list.

2003.09.11 - SLIDE 47IS 202 - FALL 2003

Chap 5 Questions: Hong Qu

• As more information gets produced and published every day, would an extensible metadata standard such as the Dublin Core work better than a fixed metadata standard? What arguments are there for fixed versus extensible metadata systems?

• Taylor notes that the newer standards do not prescribe content. What exactly does that mean? And what reasons would the metadata creators have for not prescribing “form of content”?

2003.09.11 - SLIDE 48IS 202 - FALL 2003

Next Time

• Controlled vocabularies (Introduction)

• Readings/discussion (all in PDF)
– Paper by Elaine Svenonius on controlled vocabularies
  • Sarah
– Paper by Chris Borgman on online catalogs
  • Paul
– Paper by Marcia Bates on a design model for subject access
  • Matt

