Taxonomy & Metadata / Information Architecture Consulting
Metadata & Taxonomies for a More Flexible Information Architecture
Information Architecture Summit
March 16, 2002
Amy J. Warner, Ph.D.

Outline
What I'll cover: Metadata and IA. Metadata schema. Vocabulary development.
Underlying themes: Standards. Reality. Some IR (information retrieval) issues.

What is Metadata?
Metadata is structured data which describes the characteristics of a resource. It shares many similar characteristics to the cataloguing that takes place in libraries, museums and archives.
Chris Taylor, University of Queensland

Types & Functions of Metadata

TYPE: Administrative
DEFINITION: Metadata used in managing and administering resources
EXAMPLES: Acquisition information; rights and reproduction tracking; documentation of legal access requirements; location information; version control

TYPE: Descriptive
DEFINITION: Metadata used to describe or identify information resources
EXAMPLES: Cataloging records; specialized indexes; hyperlinked relationships between resources; annotations by users

TYPE: Preservation
DEFINITION: Metadata related to the preservation of information resources
EXAMPLES: Documentation of actions taken to preserve physical and digital versions of resources (e.g., data refreshing and migration)

TYPE: Technical
DEFINITION: Metadata related to how a system functions or metadata behaves
EXAMPLES: Digitization information (e.g., formats, compression ratios, scaling routines); authentication and security data (e.g., encryptions, passwords)

TYPE: Use
DEFINITION: Metadata related to the level and type of use of information resources
EXAMPLES: Use and user tracking; content re-use and multi-versioning information

Source: Introduction to Metadata, Getty Information Institute

Confusing Terminology
Controlled vocabularies
Subject Headings: traditionally employed in libraries to tag (index) the topics of books and other library materials
Thesauri: traditionally employed in abstracting & indexing services to tag (index) the topics of journal articles and other scholarly material in a given subject area (e.g., medicine, engineering)
Taxonomies: the classification of different organisms into mutually exclusive categories based on ranks such as phylum and species

Levels of Control
[Diagram: a scale running from Simple to Complex]
Vocabularies: Synonym Rings, Authority Files, Classification Schemes, Thesauri
Relationships: Equivalence, Hierarchical, Associative
Taxonomies

Metadata & IA
Content: Identify patterns in content.
Users: Determine how target audience(s) search for and use information.
Business Context: Determine how stakeholders want to organize & present their information.
IA Generations
Brochureware
Pages served from database
Metadata-driven website
CMS

Metadata in Metadata-Driven Websites
Metadata Records
Author    Title  DocType      Audience   URL
J. Jones  xxxx   White Paper  Employees  http://...
Content
http://.
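
A metadata record of this kind can be sketched as a small data structure that is stored apart from the content and points to it by URL. This is only an illustration under stated assumptions: the class name `MetadataRecord`, the helper `find_by`, the sample titles, and the example.com URLs are hypothetical and do not come from the presentation.

```python
from dataclasses import dataclass


@dataclass
class MetadataRecord:
    """One record per content item; field names follow the slide."""
    author: str
    title: str
    doc_type: str
    audience: str
    url: str  # points at the content itself


# Hypothetical sample data for illustration only.
records = [
    MetadataRecord("J. Jones", "Sample Title", "White Paper", "Employees",
                   "http://example.com/a"),
    MetadataRecord("A. Smith", "Another Title", "Policy", "Managers",
                   "http://example.com/b"),
]


def find_by(records, field, value):
    """Return the URLs of records whose given field equals value."""
    return [r.url for r in records if getattr(r, field) == value]


print(find_by(records, "audience", "Employees"))
```

Because the site is driven by the records rather than by the pages, the same content can be listed by author, document type, or audience without touching the content itself.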

Two Parts to Generating a Metadata Schema
Decisions about indexable parameters (attributes, aspects) of documents; these correspond to fields in the database records.
Decisions about the elements (terms, descriptors, subject headings, tags) that these fields contain.

Two Possibilities
Content already exists: Identify content that exists (content inventory).
Most or all content does not exist: Use wish lists to identify desired content.
To do the content inventory, go to those who are going to develop, own, and maintain the content.

Content Analysis
Look for patterns, similarities:
  logical: themes, sensitivity, specialization.
  physical: formats, dynamic vs. static (dated vs. rarely updated).
Look for relationships: note connections between content (parent-child, sibling, dependencies).
Begin to create groupings.

Generating a Metadata Table
The beginning of a metadata-driven website.
Determine the major indexable parameters or attributes for each major document type in your sample.
Determine what major types of rules or general guidelines your indexing system will follow for each attribute.
Create an X-by-Y table.
Put indexable attributes on the X axis and the rules on the Y axis.
Fill in the decisions you make about each rule application in the individual cells of the table.

Metadata Table
Attribute  Required  Repeatable  Auto/Manual  Whole doc/Concepts  CV
Author     Yes       Yes         Manual       Whole Doc.          No
Title      Yes       No          Manual       Whole Doc.          No
DocType    No        Yes         Manual       Whole Doc.          DocTypesList
Subject    Yes       Yes         Semi-Auto    Concepts            SubjectsVocabulary
Audience   No        No          Manual       Whole Document      AudienceList
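
One way to put a table like this to work is to encode its rules as data and check each record against them. The sketch below copies the Required, Repeatable, and CV columns from the table; the contents of the three vocabularies and the function name `validate` are hypothetical, chosen only to make the example run.

```python
# Rules copied from the metadata table (Required, Repeatable, CV columns).
RULES = {
    "Author":   {"required": True,  "repeatable": True,  "cv": None},
    "Title":    {"required": True,  "repeatable": False, "cv": None},
    "DocType":  {"required": False, "repeatable": True,  "cv": "DocTypesList"},
    "Subject":  {"required": True,  "repeatable": True,  "cv": "SubjectsVocabulary"},
    "Audience": {"required": False, "repeatable": False, "cv": "AudienceList"},
}

# Hypothetical vocabulary contents, for illustration only.
VOCABULARIES = {
    "DocTypesList": {"White Paper", "Memo"},
    "SubjectsVocabulary": {"Metadata", "Taxonomies"},
    "AudienceList": {"Employees", "Customers"},
}


def validate(record):
    """Return rule violations for a record given as {field: [values]}."""
    errors = []
    for field, rule in RULES.items():
        values = record.get(field, [])
        if rule["required"] and not values:
            errors.append(f"{field}: required")
        if not rule["repeatable"] and len(values) > 1:
            errors.append(f"{field}: not repeatable")
        if rule["cv"]:
            allowed = VOCABULARIES[rule["cv"]]
            errors.extend(
                f"{field}: '{v}' not in {rule['cv']}"
                for v in values if v not in allowed
            )
    return errors


record = {"Author": ["J. Jones"], "Title": ["a", "b"], "Subject": ["Cooking"]}
print(validate(record))
```

Keeping the rules in a table-shaped structure means that a change to the metadata table is a change to data, not to the checking logic.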

User and Stakeholder Involvement
When organizing content, start with the content, generate the metadata, and then evaluate with users and stakeholders.
When organizing entities (e.g., products, projects) where content is not the major focus, start with stakeholders and users to determine metadata.

Identify Terms
Published Reference Materials: Thesauri, classification schemes, encyclopedias, dictionaries, glossaries, indexes.
Content: Representative sample of web site / intranet.
Users: Search log analysis, surveys, interviews.
Experts: Authors, subject experts.
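
Search log analysis, one of the user-based sources, can start as a simple frequency count of normalized queries. This is a minimal sketch; the log entries and the function name `candidate_terms` are hypothetical.

```python
from collections import Counter

# Hypothetical search log: one user query per entry.
search_log = [
    "expense report",
    "Expense Reports",
    "travel policy",
    "expense  report",
    "vacation",
]


def candidate_terms(queries, top_n=3):
    """Normalize case and whitespace, then rank queries by frequency."""
    normalized = (" ".join(q.lower().split()) for q in queries)
    return Counter(normalized).most_common(top_n)


print(candidate_terms(search_log, top_n=1))
```

Note that "expense reports" is still counted separately from "expense report"; near-duplicates like these are candidates for variant terms rather than something the count should hide.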

Organize Terms
Define preferred terms. Link synonyms and variants. (Synonym Rings)
Group preferred terms by subject. Identify broader and narrower terms. (Taxonomies / Hierarchies)
Identify related terms. (Thesauri)

Variant Terms
Variant terms provide the user with entry points into the vocabulary.
Synonyms (same meaning):
  cats USE felines
  helicopters USE whirlybirds
Lexical Variants (different word forms):
  paediatrics USE pediatrics
  BK USE Burger King
Quasi-Synonyms (treated as equivalent):
  generic posting: beagle USE dog
  antonyms/continuum: wetness USE dryness
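
The USE references above amount to a lookup from variant to preferred term. The sketch below uses the slide's own examples; the function names `preferred` and `used_for` are hypothetical, and lower-casing the user's input is an assumption about how matching might be done.

```python
# Entry vocabulary: variant -> preferred term (examples from the slide).
USE = {
    "cats": "felines",
    "helicopters": "whirlybirds",
    "paediatrics": "pediatrics",
    "bk": "Burger King",
    "beagle": "dog",
    "wetness": "dryness",
}


def preferred(term):
    """Map a user's term to its preferred form; unknown terms pass through."""
    return USE.get(term.strip().lower(), term)


def used_for(preferred_term):
    """Reciprocal UF view: every variant that leads to a preferred term."""
    return sorted(v for v, p in USE.items() if p == preferred_term)


print(preferred("BK"))
print(used_for("dog"))
```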

Term Specificity
Assuming a good entry vocabulary, increased term specificity allows for improved precision without hurting recall (but costs grow fast).
Vocabulary A: United States
Vocabulary B: United States; California; San Diego
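
The precision and recall claim can be checked on a toy collection. In the sketch below the four documents, their tags, and the function names are hypothetical; it assumes that Vocabulary A's entry vocabulary sends a "San Diego" query to "United States", while Vocabulary B has the specific term and expands a query to its narrower terms.

```python
# Hypothetical hierarchy for Vocabulary B: term -> narrower terms.
NARROWER = {
    "United States": ["California"],
    "California": ["San Diego"],
    "San Diego": [],
}

# Hypothetical documents: id -> (actual subject, tag in A, tag in B).
DOCS = {
    "d1": ("San Diego", "United States", "San Diego"),
    "d2": ("San Diego", "United States", "San Diego"),
    "d3": ("California", "United States", "California"),
    "d4": ("United States", "United States", "United States"),
}


def expand(term):
    """Return a term plus all of its narrower terms, recursively."""
    terms = [term]
    for narrower in NARROWER.get(term, []):
        terms.extend(expand(narrower))
    return terms


def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for two sets of document ids."""
    hits = len(retrieved & relevant)
    return hits / len(retrieved), hits / len(relevant)


# The user wants documents about San Diego.
relevant = {d for d, (about, _, _) in DOCS.items() if about == "San Diego"}

# Vocabulary A: "San Diego" is only an entry term leading to "United States".
retrieved_a = {d for d, (_, tag_a, _) in DOCS.items() if tag_a == "United States"}
# Vocabulary B: the specific term exists and is searched directly.
retrieved_b = {d for d, (_, _, tag_b) in DOCS.items() if tag_b in expand("San Diego")}

print(precision_recall(retrieved_a, relevant))  # (0.5, 1.0)
print(precision_recall(retrieved_b, relevant))  # (1.0, 1.0)
```

In this toy case recall stays at 1.0 in both vocabularies and only precision changes. A broad query in Vocabulary B still reaches everything, because `expand("United States")` includes the narrower terms.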

Compound Terms
Article Title: Software for Information Architects
High Precision
One Term: Information Architecture Software
Two Terms: Information Architecture; Software
Three Terms: Architecture; Information; Software
High Recall

Facets
Aspects of Documents to Index
Facets of a Topic: Things (entities), Concepts, Processes, People, Organizations, Occupations, etc.
Facets of Documents: Topic, Audience, Intellectual Level, Form, Type, Language, Date, etc.
Controlled Vocabular(ies)
Facet Analysis
Facets come from content inventory, intuition, and users.
Break domain into logical categories or chunks based on how documents need to be managed (both for system and for search).

Polyhierarchy
Strict Hierarchies
  Each term appears in only one place in the hierarchy.
  Essential for placement of physical objects.
Polyhierarchies
  Terms cross-listed in multiple categories.
  Accepts complex nature of reality.

Polyhierarchy
Compound terms needed to manage 6 million documents in Medline.
High level of pre-coordination forces polyhierarchy.
Terms may have more than one BT.
[Diagram: hierarchy with the terms Diseases, Virus Diseases, Respiratory Tract Diseases, and Viral Pneumonia]
Medical Subject Headings (MeSH)
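
A term with more than one BT can be stored as a mapping from each term to a list of broader terms. In the sketch below the four term names come from the slide, but the specific parent links are an assumption read off the diagram rather than checked against MeSH itself, and `paths_to_root` is a hypothetical helper.

```python
# Broader-term (BT) links; a term may list more than one BT.
# The links themselves are assumed from the slide's diagram.
BT = {
    "Viral Pneumonia": ["Virus Diseases", "Respiratory Tract Diseases"],
    "Virus Diseases": ["Diseases"],
    "Respiratory Tract Diseases": ["Diseases"],
    "Diseases": [],
}


def paths_to_root(term):
    """Return every path from a term up to a top term."""
    parents = BT.get(term, [])
    if not parents:
        return [[term]]
    return [[term] + path
            for parent in parents
            for path in paths_to_root(parent)]


for path in paths_to_root("Viral Pneumonia"):
    print(" > ".join(reversed(path)))
```

In a strict hierarchy every term would yield exactly one path; here the same term is reachable by two browse paths.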

Facets, Coordination, Specificity
Entities: Apples, Pears, Peaches
Processes: Canning, Freezing, Drying
Forms: Canned, Frozen, Fresh
Partial List of Potential Combinations:
Apples, Pears, Peaches, Canning, Freezing, Drying, Canned, Frozen, Fresh,
Canning of Apples, Canning of Pears, Canning of Peaches,
Freezing of Apples, Freezing of Pears, Freezing of Peaches,
Drying of Apples, Drying of Pears, Drying of Peaches,
Canned Apples, Canned Pears, Canned Peaches,
Frozen Apples, Frozen Pears, Frozen Peaches,
Fresh Apples, Fresh Pears, Fresh Peaches,
Freezing of Canned Apples, Canning of Dried Pears, Drying of Fresh Peaches
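
The growth in combinations can be counted directly. The sketch below uses the three facets from the slide and compares the number of pre-coordinated compound terms with the number of single facet values a post-coordinated approach would need; the variable names and the choice of which combinations to generate are assumptions made for illustration.

```python
from itertools import product

entities = ["Apples", "Pears", "Peaches"]
processes = ["Canning", "Freezing", "Drying"]
forms = ["Canned", "Frozen", "Fresh"]

# Pre-coordinated compound terms: each pairing becomes its own term.
process_terms = [f"{p} of {e}" for p, e in product(processes, entities)]
form_terms = [f"{f} {e}" for f, e in product(forms, entities)]
three_way = [f"{p} of {f} {e}"
             for p, f, e in product(processes, forms, entities)]

precoordinated = len(process_terms) + len(form_terms) + len(three_way)
# Post-coordinated: keep only the facet values and combine them at search time.
postcoordinated = len(entities) + len(processes) + len(forms)

print(precoordinated, postcoordinated)  # 45 9
```

With only three values per facet, the compound terms already outnumber the facet values five to one, and each added value widens the gap.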

Semantic Relationships
Equivalence: Use/Used For (USE/UF)
Leads from variants to preferred
e.g., prams: USE baby carriages
A = B

Semantic Relationships
Hierarchical: Broader Term/Narrower Term (BT/NT)
Types
  Generic (class/species, inheritance): Vertebrata NT Amphibia
  Whole-Part (associative unless exclusive): Ear NT Vestibular Apparatus
  Instance (proper name): Seas NT Mediterranean Sea
[Diagram: A and B]

Semantic Relationships
Associative: Related Term (RT, See Also)
Non-hierarchical and non-equivalent
Relation should be strongly implied
e.g., hammers RT nails
[Diagram: A and B]

Associative Relationships
Field of Study and Object of Study: Forestry RT Forests
Process and its Agent: Temperature Control RT Thermostat
Concepts and their Properties: Poisons RT Toxicity
Action and Product of Action: Weaving RT Cloth
Concepts Linked by Causal Dependence: Bereavement RT Death
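
The three relationship types (USE/UF, BT/NT, RT) are all reciprocal, so a thesaurus store can record the inverse link whenever one side is entered. The sketch below is a minimal illustration using examples from the slides; the `Thesaurus` class and its methods are hypothetical, not a description of any particular thesaurus tool.

```python
from collections import defaultdict


class Thesaurus:
    """Stores term relationships and keeps reciprocals in step."""

    RECIPROCAL = {"USE": "UF", "UF": "USE", "BT": "NT", "NT": "BT", "RT": "RT"}

    def __init__(self):
        # term -> relationship code -> set of target terms
        self.links = defaultdict(lambda: defaultdict(set))

    def add(self, term, rel, target):
        """Record `term rel target` together with its reciprocal."""
        self.links[term][rel].add(target)
        self.links[target][self.RECIPROCAL[rel]].add(term)

    def related(self, term, rel):
        """Return the targets of one relationship type for a term."""
        return sorted(self.links[term][rel])


t = Thesaurus()
t.add("prams", "USE", "baby carriages")      # equivalence
t.add("Vertebrata", "NT", "Amphibia")        # hierarchical, generic
t.add("Seas", "NT", "Mediterranean Sea")     # hierarchical, instance
t.add("hammers", "RT", "nails")              # associative
t.add("Forestry", "RT", "Forests")           # associative

print(t.related("baby carriages", "UF"))
print(t.related("Amphibia", "BT"))
print(t.related("nails", "RT"))
```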

Leveraging the Thesaurus
User Interface:
  Generate browsable indexes (site-wide, sub-site, specialized authority lists).
  Enable field-specific searching (filters, zones, sorting).
  Support personalization (map profile to vocabulary).
Behind the Scenes:
  Enable efficient content management.
  Support decentralized tagging.

Uses of Metadata-Driven Website
Routing
Search
Navigation

Routing
Document Stream > Metadata Filter > Document Subset
Document Stream: From Individual Contributors or Syndication Service
Metadata Filter: Profile or Filter

Generalizations about Routing
Can be push or pull.
Can be driven by various metadata elements (e.g., audience, topic).
May have both internal and external metadata schemes to consider; mapping may be an important issue.
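
Routing by metadata can be sketched as a filter that passes a document only when its metadata matches a profile. The documents, the profile, and the function name `route` below are hypothetical, and the sketch leaves out the mapping between internal and external schemes that the slide flags as a possible issue.

```python
# Hypothetical incoming documents, each carrying metadata.
stream = [
    {"title": "Benefits update", "audience": "Employees", "topic": "HR"},
    {"title": "Q3 price list", "audience": "Customers", "topic": "Sales"},
    {"title": "Travel policy", "audience": "Employees", "topic": "Finance"},
]


def route(documents, profile):
    """Yield documents whose metadata matches every element of the profile.

    A profile maps a metadata element to the set of acceptable values.
    """
    for doc in documents:
        if all(doc.get(element) in accepted
               for element, accepted in profile.items()):
            yield doc


profile = {"audience": {"Employees"}, "topic": {"HR", "Finance"}}
print([d["title"] for d in route(stream, profile)])
```

The same function serves push or pull: run it as documents arrive to push matches to a subscriber, or run it on request against a stored collection.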

Searching
User Query > Databases > Document Subset
Databases: Metadata Records; content (http://.)
Epicurious.com

Epicurious, First Facet
Browse > Picnics

Epicurious.com Facets
Main Ingredients: Beans, Beef, Berries, Cheese, Chocolate, Citrus, Dairy, Eggs, Fish, Fruits, Garlic, Ginger, Grains, Greens, Herbs, Lamb, Mushrooms, Mustard, Nuts, Olives, Onions, Pasta, Peppers, Pork, Potatoes, Poultry, Rice, Shellfish, Tomatoes, Vegetables
Cuisine: African, American, Asian, Caribbean, Eastern European, French, Greek, Indian, Italian, Jewish, Mediterranean, Mexican, Middle Eastern, Scandinavian, Spanish
Preparation Method: Advance, Bake, Broil, Fry, Grill, Marinade, Microwave, No Cook, Poach, Quick, Roast, Sauté, Slow Cook, Steam, Stir Fry
Season/Occasion: Christmas, Easter, Fall, Fourth of July, Hanukkah, New Year's, Picnics, Spring, Summer, Superbowl, Thanksgiving, Valentine's Day, Winter
Course/Dish: Appetizers, Bread, Breakfast, Brunch, Condiments, Cookies, Desserts, Hors d'Oeuvres, Main Dish, Salads, Sandwiches, Sauces, Side Dish, Snacks, Soup, Vegetables

Epicurious, Second Facet
Browse > Picnics > Poultry
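
The Picnics > Poultry path can be read as two successive facet selections, each narrowing the previous result set. The sketch below is an illustration only, not a description of how Epicurious is built: the recipes and the functions `browse` and `remaining_values` are hypothetical, while the facet names and values are taken from the facet lists.

```python
# Hypothetical recipes tagged with values from the listed facets.
recipes = [
    {"name": "Cold fried chicken",
     "Season/Occasion": {"Picnics", "Summer"},
     "Main Ingredients": {"Poultry"}},
    {"name": "Potato salad",
     "Season/Occasion": {"Picnics"},
     "Main Ingredients": {"Potatoes", "Eggs"}},
    {"name": "Roast turkey",
     "Season/Occasion": {"Thanksgiving"},
     "Main Ingredients": {"Poultry"}},
]


def browse(items, selections):
    """Keep items that carry every selected (facet, value) pair."""
    return [item for item in items
            if all(value in item.get(facet, set())
                   for facet, value in selections)]


def remaining_values(items, facet):
    """List the facet values still available within a result set."""
    return sorted({v for item in items for v in item.get(facet, set())})


first = browse(recipes, [("Season/Occasion", "Picnics")])
second = browse(first, [("Main Ingredients", "Poultry")])

print(remaining_values(first, "Main Ingredients"))
print([r["name"] for r in second])
```

Offering only the remaining values at each step keeps the user from browsing into an empty result.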
Integration of Search and Browse
Amazon.com Advanced Search
Generalizations about Search & Navigation
The relationship between the metadata and search engine capabilities is crucial.
Controlled vocabulary and keyword searching are often both enabled.
Navigation and search are often both provided as complements to each other.

Contact: Amy J. Warner, [email protected]
Questions?