Knowledge Organization in Digital Libraries (II)
Digital Libraries
INFO 653
Week 6
Xia Lin
College of Information Science and Technology
Drexel University
Approaches:
Keyword Indexing
  Making search engines functional
Metadata (bottom-up)
  Extending traditional subject indexing
Classification (top-down)
  Using a structured classification frame to provide hierarchical browsing and access
Ontology Approach
Keyword Indexing
Highly automated process.
Use every meaningful word to index documents.
Make search engines functional.
Make large amounts of information accessible.
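A minimal sketch of what "use every meaningful word to index documents" can look like in practice is an inverted index. The stop-word list, the sample documents, and the function names below are assumptions made for illustration, not part of the original slides.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; a real system would use a much longer one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "is"}

def tokenize(text):
    """Lowercase the text and keep every word that is not a stop word."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]

def build_index(docs):
    """Map each meaningful word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents that contain every query word (Boolean AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Made-up sample documents.
docs = {
    1: "Metadata for digital libraries",
    2: "Classification of digital documents",
    3: "The ontology of libraries",
}
index = build_index(docs)
print(sorted(search(index, "digital libraries")))
```

Nothing here requires human indexing, which is why the approach scales, but the index knows nothing about what the words mean.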
Metadata Approach
Digital Object Identifiers
Dublin Core
  Subject tag
  Description tag
RDF
  Data model
  Resource
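To make the Dublin Core and RDF ideas concrete, the sketch below describes a resource with a few Dublin Core elements and flattens the record into RDF-style (resource, property, value) triples. The resource URL and the `to_triples` helper are hypothetical; only the Dublin Core element namespace is taken from the standard.

```python
DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace

def to_triples(resource, record):
    """Turn a flat metadata record into (resource, property, value) triples."""
    triples = []
    for element, values in record.items():
        if isinstance(values, str):
            values = [values]  # allow single or repeated elements
        for value in values:
            triples.append((resource, DC + element, value))
    return triples

# Record values are taken from the title slide; the URL is a placeholder.
record = {
    "title": "Knowledge Organization in Digital Libraries (II)",
    "creator": "Xia Lin",
    "subject": ["digital libraries", "knowledge organization"],
    "description": "Lecture slides for INFO 653, week 6.",
}
triples = to_triples("http://example.org/slides/week6", record)
for triple in triples:
    print(triple)
```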
Classification Approach
Use current classification schemes
  LC Classification
  Dewey Classification
  Most projects are not completed
  A mile wide, an inch deep
Use ad hoc classification schemes
  Yahoo-style hierarchical list
Use automatic classification
Ontology Approach
Ontologies
  Define not only concepts but also relationships among concepts.
  Define both links and types of links.
Ontology
An ontology is a specification of a conceptualization.
An ontology is a description (like a formal specification of a program) of the concepts and relationships that can exist for an agent or a community of agents.
An ontology is a commitment to use the shared vocabulary in a coherent and consistent manner.
[Figure: Work Force Digital Library Ontology. Concept nodes: Projects; People; Government; Organizations; Concepts (taxonomy and ontology); Document; Workforce Programs; Info Resources; Events (conferences, workshops, ...). Example nodes: Policy and regulation documents; Guides, handbooks; Presentations; Peter Creticos; Cases that worked; Lessons learned. Link types: write, describes, initiates, refers-to, includes, is-related-to, example-of, sponsors, represents, uses, is-part-of.]
Work Force Digital Library Ontology
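An ontology of this kind can be sketched as a set of concepts plus typed links between them. In the sketch below the link type names come from the Work Force ontology, but which concepts each link connects is an assumption made for illustration, and the `Ontology` class is hypothetical.

```python
from collections import defaultdict

class Ontology:
    """Concepts connected by typed links (a very small sketch)."""

    def __init__(self):
        self.concepts = set()
        self.links = defaultdict(set)  # link type -> set of (source, target)

    def relate(self, source, link_type, target):
        """Record that source is connected to target by link_type."""
        self.concepts.update((source, target))
        self.links[link_type].add((source, target))

    def targets(self, source, link_type):
        """All concepts reachable from source over one link of the given type."""
        return {t for s, t in self.links.get(link_type, set()) if s == source}

    def link_types(self, source, target):
        """All link types that connect source to target."""
        return {lt for lt, pairs in self.links.items() if (source, target) in pairs}

onto = Ontology()
# Assumed pairings; only the link type names are taken from the slide.
onto.relate("Organizations", "sponsors", "Projects")
onto.relate("People", "write", "Document")
onto.relate("Document", "describes", "Projects")
onto.relate("Guides, Handbooks", "example-of", "Document")

print(onto.targets("Organizations", "sponsors"))
print(onto.link_types("Document", "Projects"))
```

Because the link type is stored explicitly, a program can distinguish "sponsors" from "describes", which a plain cross-reference cannot do.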
Why Develop an Ontology?
To enable a machine to use the knowledge in some application.
To enable multiple machines to share their knowledge.
To help yourself understand some area of knowledge better.
To help other people understand some area of knowledge.
To help people reach a consensus in their understanding of some area of knowledge.
Ontology and Thesaurus
Ontology inherits the ideas, purposes, and functions of the thesaurus.
Ontology extends relationships among concepts beyond those in a thesaurus (NT, BT, RT, synonyms).
Ontology is intended to be consumed by both humans and machines.
Topic Maps
A key component of the Semantic Web
A new ISO standard
  ISO 13250 Topic Maps
XML-like syntax
  XML Schema
  XTM: XML Topic Maps
XTM Topic Maps
XML Topic Maps (XTM) defines an abstract model and XML grammar for topic maps.
XTM does not define topic maps at the implementation level.
Each implementation may interpret XTM differently or define its own “metadata” within the framework of XTM.
TAO of Topic Maps
<topicmap>
  TOPIC: topname (basename, dispname, sortname)
  OCCURS
  ASSOC: assocrl
  facet: fvalue
  addthms
</topicmap>
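The three parts of the TAO (topics, associations, occurrences) can be sketched as a small in-memory data model. The class and field names below are hypothetical and only loosely follow the element names listed above; this is not an implementation of the standard.

```python
from dataclasses import dataclass, field

@dataclass
class Occurrence:
    href: str        # where the topic occurs (a resource address)
    kind: str = ""   # e.g. "website"

@dataclass
class Topic:
    id: str
    basename: str
    occurs: list = field(default_factory=list)

@dataclass
class Association:
    kind: str        # the type of link between the member topics
    members: tuple   # ids of the topics taking part

@dataclass
class TopicMap:
    topics: dict = field(default_factory=dict)
    assocs: list = field(default_factory=list)

    def add_topic(self, topic):
        self.topics[topic.id] = topic
        return topic

    def associate(self, kind, *topic_ids):
        """Link existing topics; unknown ids are rejected."""
        missing = [t for t in topic_ids if t not in self.topics]
        if missing:
            raise KeyError(f"unknown topics: {missing}")
        self.assocs.append(Association(kind, tuple(topic_ids)))

    def related(self, topic_id):
        """Ids of all topics that share an association with topic_id."""
        found = set()
        for assoc in self.assocs:
            if topic_id in assoc.members:
                found.update(m for m in assoc.members if m != topic_id)
        return found

tm = TopicMap()
tm.add_topic(Topic("recycling", "Recycling",
                   [Occurrence("http://example.org/recycling.html", "website")]))
tm.add_topic(Topic("pollution", "Pollution"))
tm.associate("is-related-to", "recycling", "pollution")
print(tm.related("recycling"))
```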
Topic Maps for Knowledge Representation
Establishing an associative network between resources which represent concepts
Organizing legacy resources into a new information/knowledge space, by relating them to topics, and associating those topics, in a structured way
Enabling disparate sets of information resources to be used together, by interrelating them using a unifying conceptual framework
Topic Map Implementation
Why is topic map implementation hard?
  There are no “magic” solutions for content representation.
  It is labor-intensive and involves many manual activities to create a complete TAO.
  There are no good tools for topic map creation.
  XML is not designed to let end users work directly on objects contained in an XML file.
Topic Maps and Thesaurus
Different directions of indexing
  Thesaurus: assign descriptors to documents
  Topic maps: associate occurrences with terms
Different structures
  Thesaurus: mainly a hierarchy plus some cross-references
  Topic maps: more link types
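The two indexing directions hold the same information viewed from opposite ends, as the following sketch suggests. The document ids and descriptor assignments are made up for illustration.

```python
from collections import defaultdict

# Thesaurus direction: each document is assigned descriptors.
doc_descriptors = {
    "doc1": ["Recycling", "Pollution"],
    "doc2": ["Pollution"],
}

def to_occurrences(assignments):
    """Topic map direction: each term lists the documents where it occurs."""
    topics = defaultdict(list)
    for doc, descriptors in assignments.items():
        for term in descriptors:
            topics[term].append(doc)
    return dict(topics)

occurrences = to_occurrences(doc_descriptors)
print(occurrences)
```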
All Together
[Figure: diagram relating XML, RDF, ontology, topic maps, thesaurus, classification, keyword indexing, and metadata to the Semantic Web and libraries.]
Personal Research Projects
Explore solutions to make knowledge organizing practical
  Knowledge Class
  KEPT
  Knowledge Middleware
Knowledge Class Purposes
to customize knowledge organization and access,
to supplement and complement existing devices for Web users, and
to explore the possibility of combining existing methods of knowledge organization with advanced Web technology.
Knowledge Class Design Principles
balance of browsing and searching
balance of manual indexing and automatic indexing
balance of personal (topical) information space and the whole web space
Knowledge Class
Three components
  an organizing framework
  a dynamic web interface
  search strategies for each term
Knowledge Class Features
A hierarchical structure of subject terms constructed on classification principles
Multiple levels of knowledge organization: expandable and contractible branches of the hierarchy allow varying levels of depth
Static links to remote resources and related sites or pages
Dynamic links to target information through search engines such as Google, AltaVista, InfoSeek, Yahoo!, and Lycos
Coded search strategies for terms
Use of scope terms for classes and for branches
Knowledge Class Features
Referral links among terms within a knowledge class, and potentially among knowledge classes, to assist cross-referencing.
Instant switching among search engines available on the Web, giving access to the variety of resources covered by different search engines.
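One way to picture dynamic links and instant switching among engines is a table of URL templates, one per engine, filled in with the same query. The engine names and URL templates below are placeholders, not the addresses of any real search engine.

```python
from urllib.parse import quote_plus

# Hypothetical URL templates; each real engine has its own query syntax.
ENGINES = {
    "engine_a": "https://engine-a.example/search?q={query}",
    "engine_b": "https://engine-b.example/find?text={query}",
}

def dynamic_link(engine, query):
    """Build the search URL for one engine from a prepared query string."""
    return ENGINES[engine].format(query=quote_plus(query))

def all_links(query):
    """The same query expressed as a link for every configured engine."""
    return {name: dynamic_link(name, query) for name in ENGINES}

for name, url in all_links('"digital libraries" metadata').items():
    print(name, url)
```

Since the query is stored once and only the template changes, switching engines does not require re-entering the search.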
Yahoo Categories: References – Libraries – Digital Libraries:
Cataloging Electronic Resources@
Conferences (5)
Electronic Literature@
Electronic Theses and Dissertations (ETDs) (14)
Metadata@
Organizations (2)
Projects and Collections (33)
IFLA page: Resources and Projects
Cataloguing & Indexing of Electronic Resources
Electronic Text & Journal Archives
Metadata Resources
Digital Libraries: a Selected Resource Guide
Overview and general resources
Project planning & management
Architecture
Technology
Standards and guidelines
Archiving & Preservation
Metadata
Intellectual property rights
Northern Light folders: Digital Libraries
Special collections
Conferences
dlib.org
dlib.org.ar
uh.edu
rutgers.edu
stanford.edu
stfx.ca
vt.edu
uni-trier.de
ucla.edu
Class notes & Assignments
all others...
Digital Libraries by William Y. Arms: Table of Contents
1 Libraries, Technology, and People
2 The Internet and the World Wide Web
3 Libraries and Publishers
4 Innovation and Research
5 People, Organizations, and Change
6 Economic and Legal Issues
7 Access Management and Security
8 User Interfaces and Usability
9 Text
10 Information Retrieval and Descriptive Metadata
11 Distributed Information Discovery
12 Object Models, Identifiers, and Structural Metadata
13 Repositories and Archives
14 Digital Libraries and Electronic Publishing Today
Practical Digital Libraries: Books, Bytes, and Bucks by Michael Lesk
1. Evolution of Libraries
2. Text Access Methods
3. Images of Pages
4. Multimedia Storage and Access
5. Knowledge Representation Methods
6. Distribution
7. Usability and Retrieval Evaluation
8. Collections and Preservation
9. Economics
10. Intellectual Property Rights
11. International Activities
12. Future: Ubiquity, Diversity, Creativity, and Public Policy
How do I build a Thesaurus?
Use existing dictionaries and thesauri to decide on the terms and their relationships.
Collect a set of representative documents and try to index them; take the set of indexing terms as your preliminary list.
Review and organize the preliminary term set:
  decide on preferred terms and make Use references from the variants and synonyms;
  build hierarchical and associative relationships among the preferred terms.
Produce a draft list, test and revise.
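The review-and-organize step above can be sketched as a small data structure that holds Use references, broader/narrower (BT/NT) links, and related (RT) links. The `Thesaurus` class and its method names are hypothetical, and the sample terms are chosen only for illustration.

```python
class Thesaurus:
    """Preferred terms with Use references and BT/NT/RT relationships."""

    def __init__(self):
        self.use = {}       # variant (lowercased) -> preferred term
        self.broader = {}   # term -> set of broader terms (BT)
        self.narrower = {}  # term -> set of narrower terms (NT)
        self.related = {}   # term -> set of related terms (RT)

    def add_term(self, preferred, variants=()):
        """Register a preferred term and Use references from its variants."""
        for relation in (self.broader, self.narrower, self.related):
            relation.setdefault(preferred, set())
        for variant in variants:
            self.use[variant.lower()] = preferred

    def add_broader(self, term, broader):
        """Record BT and the reciprocal NT in one step."""
        self.add_term(term)
        self.add_term(broader)
        self.broader[term].add(broader)
        self.narrower[broader].add(term)

    def add_related(self, a, b):
        """Record a symmetric associative (RT) relationship."""
        self.add_term(a)
        self.add_term(b)
        self.related[a].add(b)
        self.related[b].add(a)

    def preferred(self, word):
        """Follow a Use reference; unknown words are returned unchanged."""
        return self.use.get(word.lower(), word)

t = Thesaurus()
t.add_term("mutual funds", variants=["mutual-funds", "Investment-trusts", "Unit-trusts"])
t.add_broader("Recycling", "Waste Disposal")
t.add_related("Recycling", "Pollution")
print(t.preferred("Unit-trusts"), t.narrower["Waste Disposal"])
```

Keeping BT and NT reciprocal at insertion time makes the draft list easier to test and revise, since an inconsistent hierarchy cannot be entered.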
Scope Terms
Each knowledge class can have one scope term to limit the search scope:
  Technology will be searched as technologies AND “digital libraries” in the kclass of Digital Libraries.
Each branch of a knowledge class can have one scope term:
  Issues in the Technology branch will be searched as “Issues AND Technology AND digital libraries”.
Data Format: first year
--, mutual funds, mutual-funds Investment-trusts Unit-trusts, http://www.brill.com, 1
1. Hierarchical level
2. Display term
3. Search term (synonyms)
4. URL
5. Search strategy code
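A record in this five-field format could be parsed roughly as follows. The sketch assumes that the hierarchical level equals the number of dashes in the first field, that synonyms are separated by spaces, and that no field contains a comma; none of these rules is stated in the source, and the `Entry` class is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    level: int          # hierarchical level (assumed: number of dashes)
    display: str        # display term
    search_terms: list  # search terms / synonyms
    url: str
    strategy: int       # search strategy code

def parse_entry(line):
    """Split one comma-separated record into its five fields."""
    fields = [f.strip() for f in line.split(",")]
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    marker, display, synonyms, url, code = fields
    return Entry(
        level=len(marker),
        display=display,
        search_terms=synonyms.split(),
        url=url,
        strategy=int(code),
    )

entry = parse_entry(
    "--, mutual funds, mutual-funds Investment-trusts Unit-trusts, http://www.brill.com, 1"
)
print(entry)
```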
Second year: last year's student project
<topicmap title="Digital Libraries">
  <topic id="General Resources" type="Main category">
    <topic id="Bibliography">
      <topname>
        <basename>Bibliography</basename>
        <dispname>Bibliography</dispname>
        <sortname></sortname>
      </topname>
      <occurs></occurs>
      <topic id="IFLA bibliography" type="reference">
        <topname>
          <basename>IFLA bibliography</basename>
          <dispname>IFLA bibliography</dispname>
          <sortname></sortname>
        </topname>
        <occurs type="website" href="http://www.ifla.org/II/diglib.htm"></occurs>
      </topic>
    </topic>
  </topic>
</topicmap>
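A nested topic map of this shape can be read with Python's standard `xml.etree.ElementTree` module. The sketch below uses its own well-formed sample, in which the `type` and `href` values are written as attributes of `occurs`; that placement and the `walk` helper are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed sample modeled on the student project format.
SAMPLE = """
<topicmap title="Digital Libraries">
  <topic id="General Resources" type="Main category">
    <topic id="Bibliography">
      <topname><basename>Bibliography</basename></topname>
      <topic id="IFLA bibliography" type="reference">
        <topname><basename>IFLA bibliography</basename></topname>
        <occurs type="website" href="http://www.ifla.org/II/diglib.htm"/>
      </topic>
    </topic>
  </topic>
</topicmap>
"""

def walk(element, depth=0):
    """Yield (depth, topic id, occurrence hrefs) for every nested topic."""
    for topic in element.findall("topic"):
        hrefs = [o.get("href") for o in topic.findall("occurs") if o.get("href")]
        yield depth, topic.get("id"), hrefs
        yield from walk(topic, depth + 1)

root = ET.fromstring(SAMPLE.strip())
outline = list(walk(root))
for depth, topic_id, hrefs in outline:
    print("  " * depth + topic_id, *hrefs)
```

The depth value recovers the hierarchical level that the first-year format had to store explicitly in each record.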
Search Strategy
Keyword search:
  0 search term + branch scope term + class scope term
  1 search term + class scope term
  2 search term only
Phrase search:
  3 search term (as a phrase) + branch scope term + class scope term
  4 search term (as a phrase) + class scope term
  5 search term (as a phrase)
Hierarchical search:
  6 search term + all its children + branch scope term + class scope term
  7 search term + all its children + class scope term
  8 search term + all its children
No search:
  9 No search. No link for this display term; label only
Search terms + display term:
  10 same as 0, except the display term is also added to the query
  11 same as 1, except the display term is also added to the query
  12 … …
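The strategy codes lend themselves to a small dispatch function. The sketch below covers codes 0 to 11 only; joining query parts with AND and joining a term with its children using OR are assumptions about the query syntax, and `build_query` is a hypothetical name.

```python
def build_query(code, term, children=(), branch_scope=None,
                class_scope=None, display=None):
    """Build a query string for one display term from its strategy code."""
    if code == 9:
        return None  # label only: no search, no link
    add_display = code in (10, 11)
    if add_display:
        code -= 10  # 10 behaves like 0, 11 like 1, plus the display term
    if not 0 <= code <= 8:
        raise ValueError(f"unsupported strategy code: {code}")
    # mode: 0 keyword, 1 phrase, 2 hierarchical
    # scope: 0 branch + class scope, 1 class scope only, 2 no scope
    mode, scope = divmod(code, 3)
    if mode == 0:
        parts = [term]
    elif mode == 1:
        parts = [f'"{term}"']
    else:
        parts = ["(" + " OR ".join([term, *children]) + ")"]
    if add_display and display:
        parts.append(display)
    if scope == 0 and branch_scope:
        parts.append(branch_scope)
    if scope <= 1 and class_scope:
        parts.append(class_scope)
    return " AND ".join(parts)

print(build_query(0, "Issues", branch_scope="Technology",
                  class_scope='"digital libraries"'))
print(build_query(8, "Pollution", children=["Air pollution", "Water pollution"]))
```

Because codes 0 to 8 form a three-by-three grid (search mode by scope level), integer division and remainder are enough to decode them.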
Digital Libraries General Resources Technology Projects Indexing & Cataloging
Knowledge representation Metadata Resources Collections and Repositories Digital Preservation Economic and legal issues
Intellectual Property Rights People and organizations
Next Integration: KEPT
Knowledge-Enabled Personalization Tool (KEPT)
[Figure: KEPT architecture. Labels: Information Resources; Web Browser; HTTP Server; XML Application Server; RDF-ISO Standards; Search engines; OAI protocol; Knowledge Repository; Drag and drop; Hierarchical Generator; Co-occurrence Mapping; Topic Map Editor; Searching/Browsing Interface; XML Schema; XML XSLT; Relational Database; Thesauri; Ontologies; Topic maps; ...]
New Interface
Search: Recycling
Topic Map (ERIC Thesaurus)
  Related Terms: Conservation (Environment); Depleted Resources; Ecology; Natural Resources; Pollution; Recycling; Solid Wastes; Waste Disposal; Waste Water; Wastes; Water Treatment
  Broader Terms: Sanitation; Waste Disposal; Recycling
Co-occurrence Terms (ERIC Database): Environmental Education; Waste Disposal; Conservation (Environment); Science Education; Natural Resources; Solid Wastes; Ecology; Pollution; Learning Activities; Higher Education; Wastes; Instructional Materials; Conservation Education; Energy; Environment
MeSH Terms matched “Pollution”:
  Air Pollution
  Air Pollution, Indoor; Indoor Air Pollution
  Air Pollution, Radioactive
  Environmental Pollution; Pollution, Environmental
  Tobacco Smoke Pollution; Air Pollution, Tobacco Smoke; Environmental Pollution, Tobacco Smoke; Environmental Smoke Pollution, Tobacco; Environmental Tobacco Smoke Pollution
  Water Pollution; Thermal Water Pollution; Water Pollution, Thermal
  Water Pollution, Chemical; Chemical Water Pollution
  Water Pollution, Radioactive
Primary Source: ERIC Thesaurus
Secondary Source: MeSH
Recycling Ecology Wastes Waste Water Waste disposal Pollution Air pollution Water pollution Indoor pollution Energy Natural Resources Water Power Conservation Education Attitudes Motivations ……
Next Level: Building a Knowledge Middleware
[Figure: Knowledge Repository Middleware architecture. Labels: CORE Collections (three); Visual Interface; Thesaurus A; Thesaurus B; Thesaurus C; Unification; PFNET mapping; Switching; Latent Semantic; Crosswalks; Repository; Knowledge Base; Authoring tool; Personalized topic maps; Kohonen Mapping; Semantic Neighborhoods; Search Engine; Personalization; Conceptual structures.]
The Knowledge Middleware
A centralized repository that integrates diverse knowledge structures
A set of mapping tools and protocols for crosswalks among various thesauri
A dynamic knowledge base for semantic neighborhoods that uses term occurrences and co-occurrences
A web-based authoring and editing tool for building personalized topic maps from existing knowledge structures in the repository
A visual search interface for content-based searching with the help of knowledge structures in the repository
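As a rough illustration of how term co-occurrences could feed a semantic neighborhood, the sketch below counts how often index terms appear together on the same document and ranks a term's neighbors by that count. The sample term assignments are made up, and raw co-occurrence counting is only one of several possible similarity measures.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(docs):
    """Count, for each unordered pair of terms, the documents they share."""
    pair_counts = Counter()
    for terms in docs:
        for a, b in combinations(sorted(set(terms)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts

def neighborhood(pair_counts, term, top=3):
    """The terms that co-occur most often with the given term."""
    scores = Counter()
    for (a, b), count in pair_counts.items():
        if a == term:
            scores[b] += count
        elif b == term:
            scores[a] += count
    # Sort by descending count, then alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:top]]

# Made-up index term assignments, one list per document.
docs = [
    ["Recycling", "Waste Disposal", "Pollution"],
    ["Recycling", "Waste Disposal", "Ecology"],
    ["Recycling", "Pollution", "Waste Disposal"],
    ["Ecology", "Natural Resources"],
]
counts = cooccurrence(docs)
print(neighborhood(counts, "Recycling"))
```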
Conclusions
Knowledge organization is one of the major challenges of digital libraries.
There is increasing demand for formalized (marked-up) knowledge.
There is a growing number of tools and specifications for subject access (or knowledge access) to the Web and to digital libraries.