Taxonomies, Lexicons and Organizing Knowledge

Post on 11-Jan-2016

Wendi Pohs, IBM Software Group

Agenda

•Benefits, business and technical
•A few definitions
•Planning
•Issues
•Measuring value
•Futures
•Q&A

The Mantra

Knowledge is in the eye of the beholder, but reflecting end user needs is as critical as representing texts... and it takes work!

Business Benefits

•Mergers and acquisitions
•Research and development
•Industries:
  •Consulting
  •Pharmaceuticals
  •Financial services
  •Legal

If only I could find information to help me do my job better ...

Technical Benefits

•Site creation
•Navigation/search
•Personalization
•Defining areas of expertise

Definitions: Taxonomy

•“The science, laws or principles of classification” (from the Greek: rules of arrangement)
•Biology (Linnaeus)
•Education (Bloom)
•A hierarchical collection of categories and documents
•Structure and content
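The "hierarchical collection of categories and documents" can be pictured as a simple tree. The sketch below is only an illustration under assumptions: the `Category` class, its method names, and the sample categories and file names are hypothetical and do not come from the presentation.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """One node in the hierarchy: a label, child categories, and documents."""
    name: str
    children: list["Category"] = field(default_factory=list)
    documents: list[str] = field(default_factory=list)

    def add_child(self, name: str) -> "Category":
        child = Category(name)
        self.children.append(child)
        return child

    def paths(self, prefix: str = "") -> list[str]:
        """Return the full path of every category at or below this node."""
        here = f"{prefix}/{self.name}" if prefix else self.name
        result = [here]
        for child in self.children:
            result.extend(child.paths(here))
        return result

    def all_documents(self) -> list[str]:
        """Collect documents filed here or in any descendant category."""
        docs = list(self.documents)
        for child in self.children:
            docs.extend(child.all_documents())
        return docs


# Hypothetical example data (structure = categories, content = documents)
root = Category("Products")
software = root.add_child("Software")
hardware = root.add_child("Hardware")
groupware = software.add_child("Groupware")
groupware.documents.append("notes-admin-guide.txt")
hardware.documents.append("server-spec.txt")

print(root.paths())
print(root.all_documents())
```

Keeping documents on the nodes separates structure (the category tree) from content (what is filed under each category), which matches the "Structure and content" bullet.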

Definitions: Directory

•More general than taxonomy
•Natural structure
•Wide vs deep
•Category structure less controlled
•File system
•Yahoo (http://www.yahoo.com)
•Yellow Pages
•Corporate Web sites (http://www.ibm.com)

Definitions: Thesaurus

•Controlled vocabulary
•Subject headings, labels
•Synonyms (U, UF)
•Relation types (TT, BT, NT, SN, HN, RT, SA)
•Examples: http://www.loc.gov/flicc/wg/taxonomy.html
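The relation codes follow common thesaurus usage: BT (broader term), NT (narrower term), RT (related term), UF (used for), U (use), and TT (top term). A minimal sketch of how such relations might be stored follows, assuming that every relation is recorded together with its reciprocal. The `Thesaurus` class and the sample terms are hypothetical, not taken from the presentation.

```python
from collections import defaultdict

# Reciprocal pairs: recording one side implies the other.
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT", "UF": "U", "U": "UF"}


class Thesaurus:
    def __init__(self) -> None:
        # term -> relation type -> set of target terms
        self.relations = defaultdict(lambda: defaultdict(set))

    def relate(self, term: str, relation: str, target: str) -> None:
        """Record a relation and its reciprocal."""
        self.relations[term][relation].add(target)
        self.relations[target][RECIPROCAL[relation]].add(term)

    def preferred(self, term: str) -> str:
        """Resolve a non-preferred synonym to the term marked U (use)."""
        use = self.relations[term]["U"]
        return next(iter(use)) if use else term

    def top_term(self, term: str) -> str:
        """Follow BT links upward until a term has no broader term (the TT)."""
        current = self.preferred(term)
        seen = {current}
        while self.relations[current]["BT"]:
            current = sorted(self.relations[current]["BT"])[0]
            if current in seen:  # guard against accidental cycles
                break
            seen.add(current)
        return current


# Hypothetical vocabulary
t = Thesaurus()
t.relate("Groupware", "BT", "Software")
t.relate("Email", "BT", "Groupware")
t.relate("Email", "UF", "Electronic mail")
t.relate("Email", "RT", "Instant messaging")

print(t.preferred("Electronic mail"))
print(t.top_term("Electronic mail"))
```

Scope notes (SN), history notes (HN), and see-also references (SA) are free-text or pointer fields and are left out of this sketch.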

Definitions: Meta-data and tagging

•Meta-data
  •Properties, attributes: information describing types of data [Crandall]
  •The ‘energy’ required to keep things organized [Earley]
•Tagging
  •<META>, <Source>
  •Document Properties
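As an illustration of tag-based meta-data, the sketch below reads `<META>` name/content pairs from an HTML page using Python's standard `html.parser` module. The sample page is hypothetical, and treating "Source" as a `<META>` name is an assumption made for the example; the slide does not say how `<Source>` is encoded.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect name/content pairs from <meta> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.properties: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.properties[name.lower()] = content


# Hypothetical page; "Source" as a META name is an assumption.
page = """<html><head>
<title>Server migration notes</title>
<META NAME="keywords" CONTENT="migration, servers">
<META NAME="Source" CONTENT="Support database">
</head><body>...</body></html>"""

collector = MetaTagCollector()
collector.feed(page)
print(collector.properties)
```

How clean these properties are in practice determines how much an automatic process can rely on them.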

Definitions: Classification

•Analyzing documents and assigning them to predefined categories
•Rule-based vs natural
•Classification schemes
  •Dewey
  •Library of Congress
  •Industry-specific
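A rule-based classifier can be as simple as a keyword list per predefined category. The sketch below is one possible reading of "rule-based", not the method of any particular product; the categories, keywords, and the `classify` function are hypothetical.

```python
import re

# Hypothetical rules: predefined category -> keywords that indicate it.
RULES = {
    "Finance": {"invoice", "budget", "revenue"},
    "Legal": {"contract", "liability", "patent"},
    "Support": {"error", "crash", "install"},
}


def classify(text: str, rules: dict[str, set[str]] = RULES, min_hits: int = 1) -> list[str]:
    """Return every predefined category whose keywords appear in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scored = [(len(words & keywords), category) for category, keywords in rules.items()]
    # Most keyword hits first; ties are broken alphabetically.
    matches = sorted((-hits, category) for hits, category in scored if hits >= min_hits)
    return [category for _, category in matches]


print(classify("The contract covers patent liability and one invoice"))
print(classify("Lunch menu"))
```

The categories exist before any document is seen, which is the key difference from clustering, where the groups emerge from the documents themselves.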

Definitions: Clustering

•Clustering
  •Automatically generating groups of similar documents based on distance or proximity measures
•"Bags of words"
•Vector analysis determines boundaries
•Adaptive, but not abstract
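To make the "bags of words" and proximity ideas concrete, the sketch below represents each document as a word-count vector and groups documents by cosine similarity. The greedy single-pass grouping and the 0.5 threshold are assumptions chosen for brevity; real clustering tools use more robust algorithms. The sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Word counts with order discarded: the 'bag' for one document."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 to 1.0)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def cluster(texts: list[str], threshold: float = 0.5) -> list[list[int]]:
    """Greedy single pass: join the first cluster whose seed document is
    similar enough, otherwise start a new cluster."""
    bags = [bag_of_words(t) for t in texts]
    clusters: list[list[int]] = []
    for i, bag in enumerate(bags):
        for members in clusters:
            if cosine(bag, bags[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Hypothetical documents
docs = [
    "server crash after install",
    "install crash on server restart",
    "quarterly budget and revenue report",
    "revenue report for the quarterly budget",
]
print(cluster(docs))  # lists of document indexes, one list per cluster
```

The groups come back as bare index lists with no labels, which illustrates "adaptive, but not abstract": the method adapts to the content but does not name the concept behind each group.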

Develop a Plan

•Determine user information needs
  •Information audit, Content audit
•Select appropriate sources
•Create initial taxonomy
•Edit categories
•Categorize new documents
•Test the UI
•Train the taxonomy

Plan: Information audit

•What is the objective of the system?
•Who owns the project?
•What do users need?
•What do content creators need?
•What do system managers need?

Plan: Content audit

•Is there an existing taxonomy?
•How clean is the meta-data?
•Is the content suited to automatic classification techniques?
  •Good example: Notes discussion databases
  •Not-so-good example: Web site with little text, lots of links
•Is a subset of a source better than the whole?

Plan: Select sources

•Which sources?
•Who owns them?
•Which sources do users access most often?
•How do users access these sources?
•What is the lifecycle of the content?
•Who identifies the most current content?

Plan: Maintenance

•Resources
•Centralized or department-level
•Who decides when new content is added?
•Term approval process
•How do new concepts get into the taxonomy?

Identify issues

•Getting user involvement and buy-in
•Maintenance resources
•Directory versus taxonomy
•Meta-data
•Globalization and regionalization
•Hidden vs published taxonomies

Understand the BIG issues

•Organizational “perfection complex” [Chait]
•Multiple taxonomies
•Automated versus manual categorization

Multiple taxonomies

•Many editors
•Term approval process, synonyms
•Standard tools across the enterprise
•Federated taxonomies
•Taxonomy links, “cross-connections,” facets, views
•Taxonomy mapping
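One simple way to picture taxonomy mapping is label matching between two category lists, helped by an approved synonym table. The sketch below is an assumption about what such a mapping could look like, not a description of any tool named in the presentation; the function names and both category lists are hypothetical.

```python
def normalize(label: str) -> str:
    """Lowercase, treat hyphens as spaces, and collapse whitespace."""
    return " ".join(label.lower().replace("-", " ").split())


def map_taxonomy(source, target, synonyms=None):
    """Map each source label to a target label; return (mapping, unmapped)."""
    synonyms = {normalize(k): normalize(v) for k, v in (synonyms or {}).items()}
    target_by_label = {normalize(label): label for label in target}
    mapping, unmapped = {}, []
    for label in source:
        key = normalize(label)
        key = synonyms.get(key, key)  # apply an approved synonym, if any
        if key in target_by_label:
            mapping[label] = target_by_label[key]
        else:
            unmapped.append(label)
    return mapping, unmapped


# Hypothetical department and enterprise category labels
department = ["E-mail", "Groupware", "Laptops"]
enterprise = ["Email", "Groupware", "Servers"]
mapping, unmapped = map_taxonomy(department, enterprise, {"E-mail": "Email"})
print(mapping)
print(unmapped)
```

Labels that stay unmapped are natural candidates for the term approval process, since someone has to decide whether they become new categories or synonyms of existing ones.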

Measuring value

•NCR Corporation - Support Organization

•Needed to convince organization of the value of captured content

•Managers resisted diverting resources to maintaining content

•Current measure: Time per incident

•How could the value of a knowledge classification system be demonstrated?

Measuring value

•NCR developed a new parameter:
  •Knowledge helpful (the answer was in the support database and was used to solve the problem)
  •Knowledge not effective (the answer sent them in the wrong direction, did not help to address the issue)
  •Knowledge not available (nothing available to assist in solving the problem)
  •Knowledge not required (problem solved without the use of the knowledge base)
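The four values of this parameter lend themselves to a simple tally per support incident. The sketch below shows one way such a tally might be computed; the `KnowledgeOutcome` enum, the `outcome_shares` function, and the incident log are hypothetical and are not NCR data.

```python
from collections import Counter
from enum import Enum


class KnowledgeOutcome(Enum):
    HELPFUL = "Knowledge helpful"
    NOT_EFFECTIVE = "Knowledge not effective"
    NOT_AVAILABLE = "Knowledge not available"
    NOT_REQUIRED = "Knowledge not required"


def outcome_shares(outcomes):
    """Return the fraction of incidents recorded under each outcome."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    return {o: (counts[o] / total if total else 0.0) for o in KnowledgeOutcome}


# Hypothetical log: one outcome recorded when each incident is closed.
incident_log = (
    [KnowledgeOutcome.HELPFUL] * 4
    + [KnowledgeOutcome.NOT_EFFECTIVE]
    + [KnowledgeOutcome.NOT_AVAILABLE] * 2
    + [KnowledgeOutcome.NOT_REQUIRED]
)

for outcome, share in outcome_shares(incident_log).items():
    print(f"{outcome.value}: {share:.1%}")
```

Tracked over time alongside time per incident, the "helpful" share speaks to the value of captured content, while "not available" and "not effective" point to where the knowledge base needs work.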

Futures

•Methods:
  •Feature extraction, statistical analysis, rules-based, label generation
•Starter taxonomies, imports
•Taxonomy mapping
•Interfaces: Visualization, better training tools

Q&A

•?