
Metadata

ARLIS Study Day
9 September 2009

John Hargreaves
Technical Support Officer

JISC Digital Media

JISC Digital Media

• JISC Digital Media is a JISC Advisory Service providing advice and guidance to the UK Further and Higher Education communities on all aspects of finding, making, managing and using digital images, moving images and sound files.

Services

– Web resources: http://www.jiscdigitalmedia.ac.uk/

– [email protected] (for FE/HE; limited support for other sectors)

– Training: http://www.jiscdigitalmedia.ac.uk/training/

– Email list and blog: http://www.jiscmail.ac.uk/tasi and http://www.jiscdigitalmedia.ac.uk/blog/

– Consultancy

Metadata Content: Areas to cover

• What is metadata?
• What metadata do I need to collect?
• Where does metadata come from?
• How is metadata organised?
• The importance of vocabularies
• Real examples of metadata collection and use

What is Metadata?

Image courtesy of stock.xchng

What is Metadata?

OED: “Data operating at a higher level of abstraction”

“Useful information about stuff”

- serves purpose
- has structure
- is referential
- potentially anything!

Common definition: “Structured data about data”

“Useful” – Purposes

– Finding, identifying and understanding a resource
  Descriptive/Discovery metadata
  e.g. “Title”, “Subject”

– Creating, managing and preserving a resource
  Administrative, Technical, and Preservation metadata
  e.g. “Format”, “Filesize”

“Useful” – Purposes

– Organising and relating resources
  Structural and Packaging metadata
  e.g. “Is part of”, “Master image location”

– Using a resource
  Usage and User-contributed metadata
  e.g. “Published in”, “License requirements”, “User rating”

“Information” - Structure

– General categories, e.g. “Format” or “Subject”
  Metadata schemas

– Specific values, e.g. “JPEG” or “Dog”
  Metadata vocabularies
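
The split between a schema (the categories) and a vocabulary (the values allowed in each category) can be sketched in a few lines of Python. The category names and the terms “JPEG” and “Dog” come from the slide; the `validate` helper and the extra terms “TIFF” and “Cat” are assumptions added for illustration only.

```python
# Schema: the general categories a record may use (from the slide).
SCHEMA = {"Format", "Subject"}

# Vocabularies: the specific values allowed in each category.
# "JPEG" and "Dog" come from the slide; "TIFF" and "Cat" are illustrative extras.
VOCABULARIES = {
    "Format": {"JPEG", "TIFF"},
    "Subject": {"Dog", "Cat"},
}


def validate(record):
    """Return a list of problems found in a {category: value} record."""
    problems = []
    for category, value in record.items():
        if category not in SCHEMA:
            problems.append(f"unknown category: {category}")
        elif value not in VOCABULARIES[category]:
            problems.append(f"{value!r} is not in the {category} vocabulary")
    return problems


good = {"Format": "JPEG", "Subject": "Dog"}
bad = {"Format": "jpg", "Colour": "Brown"}
print(validate(good))
print(validate(bad))
```

The first record passes; the second fails twice, once against the vocabulary (“jpg”) and once against the schema (“Colour”).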

“About Stuff” – Reference

– Different ‘levels’ of a resource (e.g. collection, item, component)

– Different ‘layers’ within a resource (e.g. physical resources, intermediaries, digital resources)

– Things outside the resource (e.g. rights ownership)

Some initial questions…

– What am I actually describing?
– For whom?
– For what purposes?
– What categories and vocabularies might I need to assemble?
– Where am I going to get the metadata from?
– Where am I going to keep it?

Metadata can have different origins…

– “Implicit” – derived from the image itself (typically technical data)

– “Explicit” – brought to the image (typically descriptive metadata; might be ‘legacy’ data, or newly created)

– New metadata might be:
  • Provided by an image contributor
  • Inferred from a context
  • Added by a cataloguer
  • Added by a user
  • All of the above
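
As a rough illustration of “implicit” metadata, the sketch below reads technical values straight out of the bytes of a PNG file using only the Python standard library. The choice of PNG, the helper names `make_png_header` and `implicit_metadata`, and the output field names are assumptions for this example; other formats store the same kind of data differently.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_png_header(width, height):
    """Build only the signature and IHDR chunk of a PNG (test data, not a full image)."""
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)  # 8-bit RGB
    crc = zlib.crc32(b"IHDR" + ihdr)
    return (PNG_SIGNATURE + struct.pack(">I", len(ihdr)) + b"IHDR"
            + ihdr + struct.pack(">I", crc))


def implicit_metadata(data):
    """Derive technical metadata from the bytes of a PNG file."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # IHDR data starts at byte 16: width, height, bit depth, colour type.
    width, height, bit_depth, colour_type = struct.unpack(">IIBB", data[16:26])
    return {
        "Format": "PNG",
        "Width": width,
        "Height": height,
        "Bit depth": bit_depth,
        "Colour type": colour_type,
        "Filesize": len(data),
    }


meta = implicit_metadata(make_png_header(640, 480))
print(meta)
```

Nothing here had to be typed in by a cataloguer, which is what distinguishes it from the “explicit” metadata in the list above.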

… and can exist in different locations

– Embedded within the digital resource itself

– Held in a traditional database
– Within an XML encoding

<image>
  <ID> Jga-0019a </ID>
  <Title> Sanctuary of Apollo </Title>
</image>
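
A record held in XML like the one above can be read back with a few lines of Python. This is a minimal sketch using the standard library's `xml.etree.ElementTree`; the variable names are assumptions, and a real collection would usually follow a published schema rather than these ad hoc tags.

```python
import xml.etree.ElementTree as ET

# The sample record from the slide.
RECORD = """
<image>
  <ID> Jga-0019a </ID>
  <Title> Sanctuary of Apollo </Title>
</image>
"""

root = ET.fromstring(RECORD.strip())

# Turn each child element into a category/value pair, trimming the padding spaces.
record = {child.tag: child.text.strip() for child in root}
print(record)
```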

“Metadata Communities”

– Libraries (e.g. Yale Library Catalogue http://orbis.library.yale.edu/)

– Individual, published, non-unique items

– Long tradition of highly standardised metadata, shared cataloguing, interoperability (e.g. AACR2/MARC, DDC, LC Name Authorities…)

“Metadata Communities”

– Archives (e.g. Online Archive of California http://www.oac.cdlib.org/)

– Large, unique collections, context very important, limited resources

– Common standards are relatively recent, Collection descriptions (“Finding aids”) (e.g. ISAD(G)/EAD, ISAAR(CPF)…)

“Metadata Communities”

– Museums and galleries (e.g. British Museum http://www.britishmuseum.org/)

– Large, unique and often diverse collections, context and administration important

– Have typically developed in-house approaches, common standards relatively recent (e.g. CDWA, Spectrum…)

“Metadata Communities”

– Photographers/Picture Libraries (e.g. UCAR Atmospheric Research Photo Library http://www.fin.ucar.edu/ucardil/)

– Individual items, simple systems, focus on metadata within images

– In-house approaches, “niche” standardisation (e.g. for technical and embedded metadata)

Choosing, adapting, and mapping schemas

– Ideally we’d pull a schema off the shelf and begin cataloguing

– Choice is clear for some collections but difficult for others (esp. where collection spans resource types or communities)

– Adaptation is common and generally necessary (but needs to be done carefully!)

– You might be combining several standard schemas or developing your own and mapping to standards for particular purposes

Dublin Core

– International (ISO 15836-2003) cross-community standard for describing digital resources
– http://dublincore.org/
– Concentrates on descriptive/discovery metadata
– “1:1 rule” (1 record for 1 thing)
– Frequently adapted, mapped-to, used to achieve interoperability

Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights
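
To make the fifteen elements concrete, here is a minimal sketch that writes a simple Dublin Core record as XML with Python's standard library. The namespace URI is the published one for the Dublin Core element set; the `<record>` wrapper element and the `to_dublin_core` helper are assumptions for this example, and the sample values are borrowed from the XML record shown earlier in the talk.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The fifteen elements listed on the slide, in lower case as used in XML.
DC_ELEMENTS = [
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
]


def to_dublin_core(record):
    """Serialise a {element: value} dict; <record> is a hypothetical wrapper."""
    root = ET.Element("record")
    for name, value in record.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"{name!r} is not a Dublin Core element")
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


xml_text = to_dublin_core({
    "identifier": "Jga-0019a",
    "title": "Sanctuary of Apollo",
})
print(xml_text)
```

Rejecting unknown element names is one simple way to keep records within the schema; a local extension would need a namespace of its own.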

Three ways to adapt a schema

Adapting schemas

(1) Extend

(2) Qualify

(3) Simplify

Consequences for interoperability
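
The three adaptations, and what they cost when records are shared, can be sketched as operations on a set of element names. Everything below is an illustrative assumption: the four-element base set, the “Element.Qualifier” naming, the helper names, and the placeholder record values. It is not how any particular standard defines extension or qualification.

```python
BASE = {"Title", "Creator", "Date", "Format"}  # a small base schema


def extend(schema, extra):
    """(1) Extend: add local elements the base schema lacks."""
    return schema | set(extra)


def qualify(schema, element, qualifiers):
    """(2) Qualify: replace one element with more specific refinements."""
    return (schema - {element}) | {f"{element}.{q}" for q in qualifiers}


def simplify(schema, keep):
    """(3) Simplify: use only a subset of the elements."""
    return schema & set(keep)


def to_base(record, base=BASE):
    """Fold an adapted record back onto the base schema for sharing."""
    folded = {}
    for name, value in record.items():
        element = name.split(".")[0]       # strip any qualifier
        if element in base:                # local extensions are dropped
            folded.setdefault(element, []).append(value)
    return folded


local = qualify(extend(BASE, ["Technique"]), "Date", ["Created", "Digitised"])

# Placeholder values, invented for the example.
record = {
    "Title": "Sanctuary of Apollo",
    "Date.Created": "1955",
    "Date.Digitised": "2009",
    "Technique": "Photography",
}
shared = to_base(record)
print(sorted(local))
print(shared)
```

Folding back shows the interoperability cost: the two dates can no longer be told apart, and the local “Technique” element is lost altogether.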

VRA Core – Visual Resources Association

– Version 4.0 is now also available
– Concentrates on descriptive/discovery metadata
– For art and cultural images
– Influenced by Dublin Core
– 1:1 rule (Work/Image)
– Frequently adapted
– http://www.vraweb.org/

Record Type, Type, Title, Measurements, Material, Technique, Creator, Date, Location, ID Number, Style/Period, Culture, Subject, Relation, Description, Source, Rights

IPTC Core

International Press Telecommunications Council

Schema for embedding metadata within an image

Version 1.0 (for XMP) launched in 2004

http://www.iptc.org/IPTC4XMP/

Contact Information (e.g. Creator, Address, Email)

Content Information (e.g. Description, Keywords)

Image Information (e.g. Intellectual Genre, Location)

Status Information (e.g. Title, Source, Copyright)

SEPIADES
Safeguarding European Photographic Images for Access

– For photographic collections
– Very extensive, with many sub-categories
– Covers description and administration, physical works and their digital reproductions
– Multi-level description which can describe a whole collection at many levels at once (based on archival metadata)
– http://www.knaw.nl/ecpa/sepia/workinggroups/wp5/sepiadestool/sepiadesdef.pdf
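
Multi-level description can be pictured as records that inherit from the level above them. The sketch below is a generic illustration of that idea, not SEPIADES itself; the `Level` class, the field names and all sample values apart from the title are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Level:
    """One level of description, e.g. a collection, a group, or a single item."""
    name: str
    metadata: dict = field(default_factory=dict)
    parent: Optional["Level"] = None

    def effective(self):
        """Merge metadata from the top level down; lower levels override."""
        inherited = self.parent.effective() if self.parent else {}
        return {**inherited, **self.metadata}


# Invented sample values: described once at collection level...
collection = Level("collection", {"Rights": "Example Archive", "Type": "Photograph"})
# ...and inherited by every item below it.
item = Level("item", {"Title": "Sanctuary of Apollo"}, parent=collection)

print(item.effective())
```

Describing shared facts once at the highest applicable level is what keeps cataloguing effort manageable for large collections.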

CDWA
Categories for the Description of Works of Art

• Describes art works or cultural objects
• Museum/gallery community
• Extensive with many sub-categories
• Covers description and administration, original works and their reproductions
• Can describe complex objects with multiple parts
• Note that there is a ‘lite’ version
• http://www.getty.edu/research/conducting_research/standards/cdwa/index.html

Some Established Mappings

– Mapping metadata schemas:
  • Getty crosswalks: http://www.getty.edu/research/conducting_research/standards/intrometadata/crosswalks.html
  • UKOLN resources: http://www.ukoln.ac.uk/metadata/
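
In code, a mapping (or “crosswalk”) is little more than a lookup table from one schema's elements to another's. The table below is an illustrative guess at how some VRA Core elements might map to Dublin Core; it is not the published Getty crosswalk, and the sample record values other than the title and ID are invented.

```python
# Illustrative crosswalk only: NOT the published Getty or VRA mapping.
VRA_TO_DC = {
    "Title": "Title",
    "Creator": "Creator",
    "Subject": "Subject",
    "Description": "Description",
    "Date": "Date",
    "Rights": "Rights",
    "Material": "Format",
    "Technique": "Format",
    "ID Number": "Identifier",
}


def crosswalk(record, mapping):
    """Map a record to the target schema; report elements with no mapping."""
    mapped, unmapped = {}, []
    for element, value in record.items():
        target = mapping.get(element)
        if target is None:
            unmapped.append(element)
        else:
            # Several source elements may land in the same target element.
            mapped.setdefault(target, []).append(value)
    return mapped, unmapped


vra_record = {
    "Title": "Sanctuary of Apollo",
    "ID Number": "Jga-0019a",
    "Material": "Marble",     # invented value
    "Technique": "Carving",   # invented value
    "Culture": "Greek",       # invented value
}
dc_record, lost = crosswalk(vra_record, VRA_TO_DC)
print(dc_record)
print(lost)
```

Two things are worth noticing in the output: “Material” and “Technique” collapse into a single “Format” element, and “Culture” has nowhere to go. Mapping to a simpler schema usually loses detail in both of these ways.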

Vocabularies: “Controlling your Language”

Image courtesy of stock.xchng

Why Use Controlled Vocabularies?

– Better retrieval

– Improved cataloguing efficiency and consistency

– ‘Disambiguate’ the language (e.g. ‘bank’)

– Put things in their place (e.g. classify, identify relationships)

– Support interoperability (improved cross-searching and metadata sharing)

Ways to Control Vocabularies

– Data entry rules or guidelines
– Formal subject headings
– Thesauri
– Classifications
– Authority lists (people, places, events…)
– In-house keyword lists
– Uncontrolled cataloguer-added keywords?
– Combination of approaches
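
Two of the lighter-weight controls in this list, a data-entry rule and an in-house keyword list, can be combined in a short Python sketch. All terms and the `control` helper are invented for illustration; ‘bank’ is flagged as ambiguous to echo the disambiguation point made earlier.

```python
# In-house keyword list: variant terms map to one preferred term.
# All terms here are invented for illustration.
PREFERRED = {
    "dog": "Dogs",
    "dogs": "Dogs",
    "hound": "Dogs",
    "river bank": "Riverbanks",
    "riverbank": "Riverbanks",
    "savings bank": "Banks (finance)",
}

# Terms the cataloguer must make more specific before they are accepted.
AMBIGUOUS = {"bank"}


def control(keyword):
    """Apply a simple data-entry rule, then look the keyword up."""
    # Data-entry rule: trim, collapse repeated spaces, ignore case.
    key = " ".join(keyword.lower().split())
    if key in AMBIGUOUS:
        raise ValueError(f"{keyword!r} is ambiguous; choose a more specific term")
    return PREFERRED.get(key)  # None means the term is not in the list


print(control("  Hound "))
print(control("River  Bank"))
print(control("unicorn"))
```

Terms that come back as `None` could be rejected, or kept as uncontrolled keywords alongside the controlled ones.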

Formal Controlled Vocabularies

Library of Congress Subject Heading (LCSH)
  Great Britain -- History -- Norman period, 1066-1154

Art and Architecture Thesaurus (AAT)
  Anglo-Norman
  Full hierarchy = Styles and Periods \ European \ Medieval \ Anglo-Norman

Dewey Decimal Classification (DDC)
  942.02
  900=History, 940=European History, 942=British History, 942.02=Norman period

Library of Congress Name Authorities
  William I, King of England, 1027 or 8-1087

Cataloguer keyword
  William the Conqueror
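
A classification such as DDC carries its hierarchy in the notation itself, so broader classes can be found by shortening the number. The sketch below uses only the four labels given in the Dewey example on this slide; the `ddc_path` helper is an assumption for illustration and ignores the many special cases of the real scheme.

```python
# Labels taken from the slide; a real system would load the full schedule.
DDC_LABELS = {
    "900": "History",
    "940": "European History",
    "942": "British History",
    "942.02": "Norman period",
}


def ddc_path(number):
    """List the labelled classes for a DDC number, broadest first."""
    main, _, decimals = number.partition(".")
    # Hundreds, tens, then the full three-digit class...
    candidates = [main[0] + "00", main[:2] + "0", main]
    # ...then each successively longer decimal extension.
    candidates += [f"{main}.{decimals[:i]}" for i in range(1, len(decimals) + 1)]
    path = []
    for code in candidates:
        if code in DDC_LABELS and (code, DDC_LABELS[code]) not in path:
            path.append((code, DDC_LABELS[code]))
    return path


for code, label in ddc_path("942.02"):
    print(code, label)
```

A search for “European History” can therefore also retrieve items classed at 942.02, which is the ‘put things in their place’ benefit of a classification.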

What about ‘Uncontrolled’ Keywords?

– Made up by a cataloguer at the point of cataloguing

– Not an either/or situation – your metadata can accommodate both

– A mix of both can assist with retrieval

Alternative Vocabularies

• Consider some more creative approaches:

– Ask some of your users to ‘catalogue’ a representative sample of your collection

– Get your users to do the cataloguing! (e.g. tagging or “folksonomies” – more later)

– Get the technology to do the cataloguing! (e.g. CBIR – more later)

– Draw on vocabularies from other communities, traditions and disciplines

– Use an alternative vocabulary source (e.g. a children’s encyclopaedia, book index)

CBIR & ‘Folksonomy’ using Flickr

Exploring Flickr by colour: http://labs.systemone.at/retrievr/

Using Flickr to catalogue a collection: http://www.flickr.com/photos/Library_of_Congress/

Another kind of user metadata

User-generated metadata
– Web browser ‘cookies’
– Page tracking
– Failed search analysis

Can provide very useful feedback

Can enable you to offer additional services to users (e.g. customisation and email notification)
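
Failed search analysis can be as simple as counting the queries that returned nothing. The log below is hypothetical, as is its (query, result count) layout; real search logs vary by system.

```python
from collections import Counter

# Hypothetical search log: (query, number of results returned).
SEARCH_LOG = [
    ("william the conqueror", 12),
    ("norman conquest", 0),
    ("Norman Conquest", 0),
    ("anglo-norman", 3),
    ("bayeux tapestry", 0),
]


def failed_searches(log):
    """Count zero-result queries, ignoring case, most frequent first."""
    counts = Counter(query.lower() for query, hits in log if hits == 0)
    return counts.most_common()


for query, times in failed_searches(SEARCH_LOG):
    print(times, query)
```

Terms that fail repeatedly are candidates for new keywords, or for cross-references from the users' wording to the vocabulary already in use.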

Examples for evaluation

– Galaxy Zoo - http://www.galaxyzoo.org/

– Staffordshire PastTrack - http://www.staffspasttrack.org.uk/

– History Wired - http://historywired.si.edu/

Back to those initial questions

– What am I actually describing?
– For whom?
– For what purpose?
– What categories and vocabularies might I need to assemble?
– Where am I going to get the metadata from?
– Where am I going to keep it?

How am I going to exploit it?

Further Support and Guidance

Web site: http://www.jiscdigitalmedia.ac.uk/

Helpdesk: http://www.jiscdigitalmedia.ac.uk/helpdesk/

JISC Mail: https://www.jiscmail.ac.uk/cgi-bin/webadmin?A1=ind0907&L=JISCDIGITALMEDIA

