Case Studies in the US National Science Digital Library (NSDL): DL-in-a-box, CITIDEL, OCKHAM

ICADL 2003, Dec. 8-11, 2003, Kuala Lumpur, Malaysia

Edward A. Fox

CS / DLRL, Virginia Tech, USA

http://fox.cs.vt.edu http://www.dlib.vt.edu

ACKNOWLEDGEMENTS

• Helpful sponsorship by many organizations, especially Adobe, AOL, CONACyT, DFG, FIPSE (US Dept. Education), IBM, Mellon, Microsoft, NSF (IIS-9986089, 0086227, 0080748, 0325579; DUE-0121679, 0136690, 0121741, 0333601), OCLC, SOLINET, SUN, SURA, UNESCO, VTLS, many governments (Australia, Brazil, Germany, India, …), …

• Colleagues at Virginia Tech (faculty, staff, students), and collaborators at many universities

– Boots Cassel, Su-Shing Chen, Debra Dudley, Jeremy Frumkin, Lee Giles, Martin Halbert, Rex Hartson, JAN Lee, Kurt Maly, Gail McMillan, Eric Morgan, Manuel Perez, Layne Watson, …

– Yuxin Chen, Fernando Das Neves, Marcos Goncalves, Rohit Kelapure, Aaron Krowne, Ming Luo, Paul Mather, Ryan Richardson, Rao Shen, Hussein Suleman, Wensi Xi, Baoping Zhang, Qinwei Zhu, …

Outline

• Context
• Digital Libraries for Education (DLE)
• National Science Digital Library (NSDL)
• OAI, ODL, DL-in-a-box
• OCKHAM
• CITIDEL (incl. GrapeZone, PIPE)
• Conclusions

Information Life Cycle

[Figure: information life cycle with the stages Authoring/Modifying, Organizing/Indexing, Storing/Retrieving, Distributing/Networking, Accessing/Filtering, Using/Creating, and Retention/Mining]

Digital Libraries in Education

• Analytical Survey, ed. Leonid Kalinichenko
• © 2003, www.iite-unesco.org
• Transforming the Way to Learn
• DLs of Educational Resources & Services
• Integrated/Virtual Learning Environment
• Educational Metadata
• Current DLEs: US (NSDL, DLESE, CITIDEL, NDLTD), Europe (Scholnet, Cyclades), UK (Distributed National Electronic Resource)

Digital Libraries in Education - 2

• Advanced Frameworks & Methodologies

– Instructional course development with learning module repositories, Learning Object reuse

– Community organization around DLEs

– Other content for science and research

– Cyberinfrastructure, data grids

– Curriculum-based interfaces (see Krowne et al.)

– Concept-based organization of learning materials and courses (CMs, ontologies)

DLEs: Future Vision (p. 6)

• Global learning environment of the future:
– Student-centered
– Interactive and dynamic
– Enabling group work on real world problems
– Enabling students to determine their own learning routes (styles, personalization)
– Supporting lifelong learning

DLEs: Objectives (p. 11)

• Long-range: lifelong/distance/anytime-anywhere
• Intermediate goals

– Support for students, teachers, parents

– Enhanced student performance

– More students excited about science

– More Internet-based science educational resources: with increased quality and comprehensiveness, easy to discover and retrieve, preserved and universally available

DLEs: Guiding Principles (p. 12)

• Driven by educational and science needs
• Facilitating educational innovation
• Stable, reliable, permanent
• Accessible to all
• Leveraging prior research: DL, courseware, …
• Adaptable to new technologies
• Supporting decentralized services
• Resource integration through tools/organization

“The network is the library.”

NSDL Visioning: Learning Environments and Resources Network for STEM Education

[Figure: the NSDL tracks include CI (Core Integration), Services, Collections, and Research; related projects shown are CITIDEL, GetSmart (Concept Maps), OCKHAM, and P2P libraries, linked by "include" and "supports" relations]

Expectations of NSDL Program Tracks

• Core Integration: coordinate a distributed alliance of resource collection and service providers; and ensure reliable and extensible access to and usability of the resulting network of learning environments and resources

• Collections: aggregate and actively manage a subset of the digital library’s content within a coherent theme / specialty

• Services: increase the impact, reach, efficiency, and value of the digital library in its fully operational form

• Targeted (Applied) Research: have immediate impact on one or more of the other three tracks

Collections

• Discovery of content
• Classification and cataloguing
• Acquisition and/or linking; referencing
• Disciplinary-based themes define a natural body of content, but other possibilities are also encouraged
• Access to massive real-time or archived datasets
• Software tool suites for analysis, modeling, simulation, or visualization
• Reviewed commentary on learning materials and pedagogy

Services

• Help services, frequently asked questions, etc.

• Synchronous/asynchronous collaborative learning environments using shared resources

• Mechanisms for building personal annotated digital information spaces

• Reliability testing for applets or other digital learning objects

• Audio, image, and video search capability

• Metadata system translation

• Community feedback mechanisms

NSDL Information Architecture

Essentially as developed by the Technical Infrastructure Workgroup

[Figure: components connected through the Core NSDL “Bus”: NSDL collections (referenced items & collections), special databases, Core Collection-Building Services (harvesting, protocols), Core Services (metadata gathering, information retrieval), CI Services (annotation, discussion, personalization, authentication, browsing), other NSDL services, and portals & clients; the layers are labeled Collection Building, Usage Enhancement, and User Interfaces]

OAI, ODL, DL-in-a-box

• Open Archives Initiative
– since 1999, www.openarchives.org

• Open Digital Libraries
– since 2001, from www.dlib.vt.edu
– with Hussein Suleman (now U. Cape Town)

• DL-in-a-box
– NSDL support since 2001
– Aimed to help new collections / services projects
– http://dlbox.nudl.org

Open Archives Initiative (OAI)

• Advocacy for interoperability
• Standard for transferring metadata among digital libraries
– Protocol for Metadata Harvesting (PMH)

• Simplicity

• Generality

• Extensibility

• Support for PMH => Open Archive (OA)
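To make the harvesting protocol concrete, here is a minimal Python sketch of the harvester side of OAI-PMH 2.0: it builds a `ListRecords` request (optionally selective by `from` date, or continuing with a `resumptionToken`, which the protocol treats as an exclusive argument) and extracts record identifiers and Dublin Core titles from a response. The base URL `http://example.org/oai` and the sample record are hypothetical, and network access is left out so the parser runs on an inline response.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces from the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, token=None):
    """Build a ListRecords request URL; a resumptionToken replaces all other arguments."""
    if token is not None:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date is not None:
            params["from"] = from_date  # selective harvesting by datestamp
    return base_url + "?" + urlencode(params)

def parse_list_records(xml_text):
    """Return ([(identifier, [titles])], resumption token or None)."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:ListRecords/oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        titles = [t.text for t in rec.iterfind(".//dc:title", NS)]
        records.append((ident, titles))
    # An empty resumptionToken element marks the end of a list; treat it as None.
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, (token or None)

# Hypothetical one-record response with a resumption token for the next page.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2003-12-08T00:00:00Z</responseDate>
  <request verb="ListRecords" metadataPrefix="oai_dc">http://example.org/oai</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:etd-1</identifier>
        <datestamp>2003-11-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A Sample Thesis</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""
```

A harvester would loop: request, parse, and re-request with the returned token until it comes back as `None`.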

OAI = Technical Umbrella for Practical Interoperability…

[Figure: reference libraries, publishers, e-print archives, and museums under the OAI umbrella]

…that can be exploited by different communities

OAI – Repository Perspective

Required: Protocol

[Figure: a repository holding DOs, each with associated MDOs]

OAI – Black Box Perspective

[Figure: open archives OA 1 through OA 7 shown as black boxes]

Tiered Model of Interoperability

[Figure: tiers of mediator services, metadata harvesting, and document models; services for discovery, current awareness, and preservation; service providers and data providers connected by metadata harvesting]

The World According to OAI

[Figure, in stages: users faced with scattered digital objects (programs, documents, images, video); a digital library built as a monolithic and/or custom-built web-based application; a componentized digital library; and an open digital library, whose components are open archives (OA) connected by PMH and XPMH. Legend: Open Archive; Open Digital Library Component = extended Open Archive; Protocol for Metadata Harvesting (PMH); Open Digital Library Protocol = extended OAI-PMH (XPMH)]

Open Digital Library Deployments

• NDLTD (www.ndltd.org)
• Computer Science Teaching Center (www.cstc.org)
• Computing and Information Technology Interactive Digital Educational Library (www.citidel.org)
• Open Archives Distributed (NSF, DFG) – enhancements to PhysNet
• OCKHAM
• Open to others through DL-in-a-box

Open Digital Library

• Network of Extended Open Archives where each node acts as either a provider of data, services or both.

• Component = Node
• Protocol = Arc
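The node/arc view can be sketched as a small graph model. This is an illustration under assumptions, not ODL code: roles are encoded as the strings `"data"` and `"service"`, an arc is read as "harvester pulls metadata from source", and all component names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Component:
    """Hypothetical ODL node; roles is a subset of {"data", "service"}."""
    name: str
    roles: set = field(default_factory=set)

def valid_arc(harvester: Component, source: Component) -> bool:
    # Assumed arc semantics: the harvester pulls metadata from the source over
    # (extended) OAI-PMH, so it must be a service provider and the source a data provider.
    return "service" in harvester.roles and "data" in source.roles

def upstream(name: str, arcs: dict) -> set:
    """Every component a node draws metadata from, directly or transitively."""
    seen, stack = set(), [name]
    while stack:
        for src in arcs.get(stack.pop(), []):
            if src not in seen:
                seen.add(src)
                stack.append(src)
    return seen

# Two XML-file data providers feed a Union node (both roles), which feeds Search.
xml1 = Component("XML-File-1", {"data"})
xml2 = Component("XML-File-2", {"data"})
union_node = Component("Union", {"data", "service"})
search = Component("Search", {"service"})
arcs = {"Search": ["Union"], "Union": ["XML-File-1", "XML-File-2"]}
```

A node holding both roles, like `union_node`, is what lets components be stacked into larger services.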

Open Digital Library Components

• Running now
– XML-File (data provider from file system)
– Search: simple or in-memory (Essex) or generalized
– Union, browse, recent, filter
– E-journal/review, Submit, Edit, Annotation
– Recommender, Rating; Mirroring (see JCDL’02)
– Working with NCSA: from DB, unstructured text

• Others in process
– Classification/categorization
– Registry (and other connections with web services)

ETD DL for the Networked Digital Library of Theses and Dissertations (www.ndltd.org)

[Figure: ETD collections (ETD-1 to ETD-4) exposed over PMH are merged by Union components (ODLUnion); Filter, Search (ODLSearch), Browse (ODLBrowse), and Recent (ODLRecent) components build on the union, and a user interface serves students and researchers]
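The Union, Filter, Recent, and Browse components in the ETD example each transform a stream of metadata records. The sketch below mimics that composition in-process; in an actual ODL each stage would be a separate component reached over the extended OAI-PMH, and the record fields (`id`, `date`, `subject`, `lang`) and sample data are hypothetical.

```python
from collections import defaultdict

def union_archives(*archives):
    """Merge record lists; for duplicate identifiers keep the newest datestamp."""
    merged = {}
    for archive in archives:
        for rec in archive:
            old = merged.get(rec["id"])
            if old is None or rec["date"] > old["date"]:
                merged[rec["id"]] = rec
    return list(merged.values())

def filter_records(records, predicate):
    """Keep only the records for which predicate(record) is true."""
    return [r for r in records if predicate(r)]

def recent(records, k):
    """The k newest records; ISO 8601 date strings sort correctly as text."""
    return sorted(records, key=lambda r: r["date"], reverse=True)[:k]

def browse(records, field):
    """Group record ids by the value of one metadata field."""
    index = defaultdict(list)
    for r in records:
        index[r[field]].append(r["id"])
    return dict(index)

etd1 = [{"id": "etd:1", "date": "2003-05-01", "subject": "CS", "lang": "en"},
        {"id": "etd:2", "date": "2003-06-01", "subject": "Physics", "lang": "en"}]
etd2 = [{"id": "etd:2", "date": "2003-07-01", "subject": "Physics", "lang": "en"},
        {"id": "etd:3", "date": "2003-08-01", "subject": "CS", "lang": "es"}]

merged = union_archives(etd1, etd2)
english = filter_records(merged, lambda r: r["lang"] == "en")
```

Because every stage consumes and produces plain record lists, stages can be reordered or stacked, which is the reuse property the componentized design aims for.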

Example Open Digital Library

[Figure: three XML File Collection & Data Providers are harvested by DBUnion (archive merger component); DBBrowse (browse engine) acts as metadata browse service provider and IRDB-1 (search engine) as metadata search service provider]

Open Digital Library: Extended

[Figure: the example ODL extended with a What’s New engine (as What’s New service provider), an OAI-PMH data provider, a Submit archive, OAIB (NCSA: from RDBMS), Filter, Recommend and Rate engines (as Recommend & Rate service provider), an Annotation engine, and the IRDB-2 search engine (as annotation search service provider)]

New ODL Component: Generalized Search Platform

CS6604 clients: Patrick Fan, Wensi Xi

Group members: Ming Luo, Rui Yang, Xiaoyan Yu

Introduction

• Background
– The importance of search service in a digital library
– Problems of search engines in DLRL: IRDB (low search effectiveness, insufficient parsing component); ESSEX (limited scalability due to in-memory index); MARIAN (low search efficiency)

Algorithms

• Phrase Searching Algorithms
– Adjacency of terms

• Ranking Functions
– Okapi (baseline)
– GP-based ranking function
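Both algorithms can be illustrated with a positional inverted index: phrase search checks that the query terms occur at adjacent positions, and Okapi BM25 supplies the baseline score. This is the textbook BM25 form with a common non-negative idf variant and the usual defaults k1 = 1.2 and b = 0.75; it is an assumption-level sketch, not the project's implementation, and the toy documents are invented.

```python
import math
from collections import defaultdict

def build_index(docs):
    """Positional inverted index: {term: {doc_id: [positions]}}."""
    index = defaultdict(dict)
    for doc_id, tokens in docs.items():
        for pos, term in enumerate(tokens):
            index[term].setdefault(doc_id, []).append(pos)
    return dict(index)

def phrase_match(index, phrase):
    """Documents in which the phrase terms occur at adjacent positions."""
    if not phrase or any(t not in index for t in phrase):
        return set()
    hits = set()
    for doc_id, starts in index[phrase[0]].items():
        for p in starts:
            # Term i of the phrase must appear exactly i positions after the first.
            if all(p + i in index[t].get(doc_id, ()) for i, t in enumerate(phrase[1:], start=1)):
                hits.add(doc_id)
                break
    return hits

def bm25(index, docs, query, k1=1.2, b=0.75):
    """Rank documents for a bag-of-words query with Okapi BM25."""
    n = len(docs)
    avg_len = sum(len(t) for t in docs.values()) / n
    scores = defaultdict(float)
    for term in query:
        postings = index.get(term, {})
        df = len(postings)
        if df == 0:
            continue
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        for doc_id, positions in postings.items():
            tf = len(positions)
            norm = k1 * (1 - b + b * len(docs[doc_id]) / avg_len)  # length normalization
            scores[doc_id] += idf * tf * (k1 + 1) / (tf + norm)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

docs = {"d1": "digital library search engine".split(),
        "d2": "search the digital archive library".split(),
        "d3": "library science".split()}
index = build_index(docs)
```

Phrase matching acts as a filter, while BM25 orders what survives; a GP-discovered function would replace `bm25` as the scorer.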

Genetic Programming (GP)

• A problem solving system designed based on principles of evolution and heredity

[Figure: GP discovery loop. Input: training data, documents A to G in the order A, B, C, D, E, F, G with relevance judgments (A, D, F, G relevant; B, C, E not relevant). Ranking-function discovery evolves a ranking function f whose output order is A, D, F, G, B, C, E, placing all relevant documents first; feedback drives further discovery]

An Example of GP-based RF:

(log (+ (* df (log (log (* (* (/ n df) (* (* (/ n df) (* (* df_max_Col tf) (+ df_max_Col tf_avg))) (* (/ tf tf_max) (log tf_avg_Col)))) (* (/ (* (* (/ n df) (* (* df_max_Col tf) (+ df_max_Col tf_avg))) (* (/ tf tf_max) (log tf_avg_Col))) (+ (* length df) tf_avg_Col)) (log tf_avg_Col)))))) (+ (* (* df_max_Col tf) (/ (* (* (/ (/ (* tf 6.720) (/ df N)) (* df_max_Col tf)) (* (* tf N) (+ df_max_Col tf_avg))) (* (/ tf tf_max) (log tf_avg_Col))) (+ (* length df) (* (* (/ tf tf_max) (+ (* length df) (* 2.812 1))) tf_avg)))) (+ (/ df tf_avg) tf))))

tf: Query term frequency in the document (vector)
tf_query: Query term frequency in the query (vector)
tf_max: The maximum term frequency in a document (scalar)
Length: Document length in number of words (scalar)
Length_avg: Average document length in number of words (scalar)
N: Number of documents in the collection (scalar)
tf_avg: Average term frequency in the current document (scalar)
tf_avg_Col: Average term frequency over all documents in the collection (scalar)
df_max_Col: Maximum document frequency of a word in the collection (scalar)
df: Document frequency of the query words (vector)

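In GP-based ranking, each individual is an expression tree like the one above, and fitness measures how well the ranking it induces matches the training relevance judgments. The sketch below shows only that evaluation-and-fitness step; selection, crossover, and mutation are omitted. It assumes prefix trees as nested tuples, protected division and logarithm (a common GP convention, not taken from the slides), per-term values summed over the query terms (since tf and df are vectors), and average precision as the fitness measure; the feature values are invented.

```python
import math

def protected_div(a, b):
    # Assumed GP convention: division by zero evaluates to 1.
    return a / b if b != 0 else 1.0

def protected_log(x):
    # Assumed GP convention: log of |x|, with log(0) defined as 0.
    return math.log(abs(x)) if x != 0 else 0.0

OPS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b, "/": protected_div}

def evaluate(tree, env):
    """Evaluate a prefix tree such as ("log", ("/", "N", "df")) for one query term."""
    if isinstance(tree, (int, float)):
        return float(tree)
    if isinstance(tree, str):
        return env[tree]  # a feature such as tf, df, or N
    op, *args = tree
    if op == "log":
        return protected_log(evaluate(args[0], env))
    return OPS[op](evaluate(args[0], env), evaluate(args[1], env))

def score(tree, term_features):
    """Document score: sum the tree's value over the per-query-term feature dicts."""
    return sum(evaluate(tree, env) for env in term_features)

def average_precision(ranked_relevance):
    hits, total = 0, 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def fitness(tree, training):
    """training: list of (term_features, relevant) pairs, one per document."""
    ranked = sorted(training, key=lambda d: score(tree, d[0]), reverse=True)
    return average_precision([rel for _, rel in ranked])

# A tf-idf-like individual and a poor one, on two invented training documents.
tfidf_tree = ("*", "tf", ("log", ("/", "N", "df")))
inverse_tree = ("/", 1, "tf")
training = [([{"tf": 3, "df": 2, "N": 10}], True),
            ([{"tf": 1, "df": 2, "N": 10}], False)]
```

A GP run would evaluate `fitness` for every tree in the population and breed the better ones, which is the feedback arrow in the figure.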

Parser

• Flexibility
– TREC Style SGML/HTML
– Configurable tagging

• Abbreviation and number detection
• Case sensitive
• Phrase parsing
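A regular-expression tokenizer can illustrate abbreviation and number detection, optional case sensitivity, and phrase parsing of quoted query strings. The patterns below are illustrative assumptions, not the parser's actual rules, and TREC-style SGML handling and configurable tagging are not shown.

```python
import re

TOKEN_RE = re.compile(r"""
    (?P<abbrev>(?:[A-Za-z]\.){2,})            # abbreviations such as U.S.A. or e.g.
  | (?P<number>\d+(?:[.,]\d+)*)               # numbers such as 3.14 or 1,000
  | (?P<word>[A-Za-z]+(?:['-][A-Za-z]+)*)     # words, incl. hyphens and apostrophes
""", re.VERBOSE)

def tokenize(text, case_sensitive=False):
    """Return (kind, token) pairs; abbreviations keep their case, words may be lowercased."""
    tokens = []
    for m in TOKEN_RE.finditer(text):
        kind, tok = m.lastgroup, m.group()
        if kind == "abbrev":
            tok = tok.replace(".", "")  # U.S.A. -> USA
        elif kind == "word" and not case_sensitive:
            tok = tok.lower()
        tokens.append((kind, tok))
    return tokens

def parse_query(query, case_sensitive=False):
    """Split a query into quoted phrases (token lists) and remaining single terms."""
    phrases = [tokenize(p, case_sensitive) for p in re.findall(r'"([^"]+)"', query)]
    rest = re.sub(r'"[^"]+"', " ", query)
    return phrases, tokenize(rest, case_sensitive)
```

Tagging each token with its kind lets later stages, such as phrase matching, treat numbers and abbreviations differently from ordinary words.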

Interface (I)

1. Receive user query

2. Send query to search engine

3. Get ranked list

4. Search database

5. Get document information

6. Return results to user

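[Figure: the user interacts with a servlet (steps 1 and 6); the servlet exchanges the query and ranked list with the search engine over a socket (steps 2 and 3) and queries the database via JDBC (steps 4 and 5)]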
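The six steps can be sketched in Python. The original design is a Java servlet using a socket and JDBC; here, as assumptions, a plain callable stands in for the socket-connected search engine, and an in-memory SQLite table with hypothetical columns (`id`, `title`, `url`) stands in for the database.

```python
import sqlite3

def handle_query(query, engine, db, limit=10):
    """Steps 1-6: receive a query, rank via the engine, look up document info, return results."""
    ranked_ids = engine(query)[:limit]                      # steps 2-3: ranked list of doc ids
    if not ranked_ids:
        return []
    marks = ",".join("?" * len(ranked_ids))                 # one placeholder per id
    rows = db.execute(f"SELECT id, title, url FROM docs WHERE id IN ({marks})", ranked_ids)  # step 4
    info = {row[0]: row[1:] for row in rows}                # step 5: document information
    # Step 6: restore the engine's ranking, since SQL IN does not preserve order.
    return [(i, *info[i]) for i in ranked_ids if i in info]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE docs (id TEXT PRIMARY KEY, title TEXT, url TEXT)")
db.executemany("INSERT INTO docs VALUES (?, ?, ?)",
               [("d1", "Intro to OAI", "http://example.org/d1"),
                ("d2", "ODL Components", "http://example.org/d2")])

def toy_engine(query):
    # Hypothetical engine output; "d9" has no database row and is dropped.
    return ["d2", "d1", "d9"]
```

Keeping ranking in the engine and descriptive data in the database is what lets the interface swap either side independently.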

Interface (II)

1. Receive user query through ODL’s XOAI searching protocol

2. Send query to search engine

3. Get ranked list

4. Request metadata

5. Get metadata

6. Return results in format complying with ODL’s searching protocol

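[Figure: as an ODL component, a Perl adaptor receives the query and returns results (steps 1 and 6), communicates with the search engine over a socket (steps 2 and 3), and requests and receives metadata from an OAI data provider (steps 4 and 5)]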

OCKHAM Initiative, Contact Info

• Supported by DL Federation, Mellon, NSF, …

• P2P University Network involving:

• Emory, Notre Dame, U. Arizona, Virginia Tech, …

• PI: Martin Halbert, phone 404-727-2204

• OCKHAM URL: http://ockham.library.emory.edu

The Problem

• Digital library development is complex and expensive.

• Various DL development communities (in the USA at least) are not working together well.

• Results exhibit much incompatibility, little common practice, slow progress, and no leverage on investment.

• If this continues, we are just going to languish and fester.

Lightweight Protocols

• “Lightweight”, or relatively small and simple protocols seem to have clear advantages over “Full” protocols that attempt to be comprehensive.

• The success of protocols considered lightweight is illuminating.

• Examples: TCP/IP, HTTP, LDAP, and the OAI PMH

Reference Models

• Reference Model: a common vocabulary and description of components, services, and inter-relationships that comprise a system under consideration

• Useful as a tool to foster consensus and common understanding in a time of rapid change and/or disagreement

• Explored in CS6604 class project with 2 focus groups: librarians, education experts

Current Focus: Peer-to-Peer (P2P) Lightweight (Protocol) Reference Models

• Builds on the successful example of the OAI PMH, a clearly understood, minimalist concept of metadata distribution implemented in simple protocols (e.g., ODL)

• Leads to developing simple reference models of specific subsystems, with associated simple protocols and standards

• Testing in NSDL, connecting university libraries to support teaching & learning

OCKHAM Proposed Services

• Alerting
• Browsing
• Cataloging
• Conversion
• OAI – Z39.50
• Pathfinding
• Registry – prototype in CS6604 now
• (plus others such as from adapted ODL)

Computing and Information Technology Interactive Digital Educational Library

Technical Development; Content Collection

Edward Fox (director)

John A. N. Lee

Manuel Pérez-Quiñones

Community Development

John Impagliazzo

Assessment

Lillian Cassel

Search Engines

C. Lee Giles

CSTC

Deborah Knox

http://www.citidel.org/

CITIDEL -> NSDL

• CITIDEL is a collection project in the US National STEM (science, technology, engineering, and mathematics) education Digital Library, NSDL (www.nsdl.org)

Multi-dimensional Categorization

[Figure: resources categorized along three dimensions: Language (English, Spanish), Topic (Java, Multimedia, Algorithms), and Quality (identified by crawl, nominated, editor reviewed, peer reviewed)]

CITIDEL: Computing & Information Technology Interactive Digital Education Library

CITIDEL Technology Features

• Component architecture (Open Digital Library)
– Re-use and compose re-deployable digital library components

• Built using open standards & technologies
– OAI: used to collect DL resources and to support DL interoperability
– XSL and XML: interface rendering, with multilingual, community-based translation of screens and content (Spanish, …)
– Perl: component integration
– ESSEX: search engine functionality; very fast, utilizing in-memory processing; includes snapshots for persistence

• Multi-scheming
– Integrates multiple classifications / views through maps, closure
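Integrating classification schemes "through maps, closure" can be read as: link categories by broader-term relations and cross-scheme maps, then index each resource under the transitive closure of its categories, so that browsing any scheme finds it. The sketch below follows that reading; the schemes, category names, and data layout are hypothetical, and this is not CITIDEL's code.

```python
from collections import defaultdict, deque

def closure(start, edges):
    """All categories reachable from `start` (including itself) via broader-term
    links and cross-scheme maps, found by breadth-first search."""
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in edges.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def build_views(resources, edges):
    """Index each resource under every category in the closure of its own categories."""
    view = defaultdict(set)
    for res, cats in resources.items():
        for c in cats:
            for reached in closure(c, edges):
                view[reached].add(res)
    return dict(view)

# Hypothetical schemes "A" and "B": one broader-term link in each, plus one cross-scheme map.
edges = {
    "A:Computer Graphics": ["A:Computing Methodologies", "B:Visualization"],
    "B:Visualization": ["B:Applied Computing"],
}
resources = {"r1": ["A:Computer Graphics"], "r2": ["B:Applied Computing"]}
views = build_views(resources, edges)
```

Here r1, classified only in scheme A, still appears when browsing scheme B, because the map and the closure carry it across.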

Programming Team DL Project

Logan Hanks, Mike Scarborough, Stafford Fuller

Problem Description:

• VT has multiple programming teams, and has sent a team to the ACM World Finals every year for the past decade.
• Each week during the semester, the teams practice using a problem set from a past regional or international contest.
• Each practice generates multiple solutions for each problem.
• What is needed is a digital library to collect these solutions and serve as a reference.

Programming Team DL Project

Deliverables:
• Problem statement and solution importer/archiver.
• Classification framework for problems and solutions.
• Search engine for the DL to locate problems and solutions by their relevance to a set of classes given as input.
• Web interface for browsing problems and solutions as well as accessing all of the above deliverables.
• Integration with CITIDEL.

Requirements:
• Importing and classifying problem statements and solutions.
• Solutions should be classified based on what algorithms and methods they use and what problems they solve.
• Interface for browsing problem statements and their solutions.
• A search engine for finding problem statements or solutions based on their classifications.
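One simple way to rank problems and solutions "by their relevance to a set of classes given as input" is set overlap between each item's classes and the query classes. The sketch below uses Jaccard similarity as an assumed relevance measure; the item ids and class labels are hypothetical.

```python
def jaccard(a, b):
    """Overlap of two class sets: |a & b| / |a | b| (0 when both are empty)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def search_by_classes(items, query_classes, min_score=0.0):
    """items: {item_id: set_of_classes}; returns [(item_id, score)], best first."""
    q = set(query_classes)
    scored = [(item, jaccard(classes, q)) for item, classes in items.items()]
    scored = [(i, s) for i, s in scored if s > min_score]
    # Sort by descending score, breaking ties by id for a stable, readable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))

items = {
    "p1-sol-a": {"dynamic programming", "strings"},
    "p2-sol-a": {"graphs", "shortest paths"},
    "p2-sol-b": {"graphs", "shortest paths", "priority queue"},
}
results = search_by_classes(items, {"graphs", "shortest paths"})
```

Jaccard penalizes solutions tagged with many extra classes; a weighted or hierarchical measure would be a natural refinement if the classification framework has levels.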

Searching

CITIDEL searching is driven by the ESSEX search engine for relevance computation (fast, in-RAM processing with checkpoints) and also provides a list of relevant categories within the classification schemes.

Browsing and Searching with Filters

Users are placed in chosen sub-communities and can filter results based on these sub-communities, or view all results instead. Further customization is available: users may set up multiple filters, simple or complex, based on factors such as education level, role, resource type, language, and source. This lets users narrow results to exactly what they are, or are not, looking for. At any time, users can disable these filters or view the results they exclude.
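Filters of this kind can be modeled as predicates over result metadata: a result is shown if it passes every active filter, complex filters are built by combining predicates, and excluded results remain available for inspection. This is a sketch under assumptions; the field names (`level`, `language`, `type`) and helper names are hypothetical, not CITIDEL's interface.

```python
def apply_filters(results, filters, enabled=True):
    """Return (shown, excluded); a result is shown only if it passes every filter."""
    if not enabled or not filters:
        return list(results), []
    shown, excluded = [], []
    for r in results:
        (shown if all(f(r) for f in filters) else excluded).append(r)
    return shown, excluded

def field_in(field, allowed):
    """Simple filter: the result's field value must be one of `allowed`."""
    allowed = set(allowed)
    return lambda r: r.get(field) in allowed

def any_of(*preds):
    """Complex filter: passes if any component filter passes."""
    return lambda r: any(p(r) for p in preds)

def negate(pred):
    """Complex filter: passes if the component filter fails."""
    return lambda r: not pred(r)

results = [
    {"id": 1, "level": "undergraduate", "language": "en", "type": "lecture"},
    {"id": 2, "level": "graduate", "language": "es", "type": "syllabus"},
    {"id": 3, "level": "undergraduate", "language": "es", "type": "lecture"},
]
filters = [field_in("level", {"undergraduate"}), negate(field_in("type", {"syllabus"}))]
```

Returning the excluded list alongside the shown one is what supports the "see results excluded by them" option without re-running the search.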

Enjoy in GrapeZone

• Derived from the Carrot2 project (http://www.cs.put.poznan.pl/dweiss/carrot/index.php/index.xml?lang=en)

• Online Grape: cluster search results from CITIDEL

• Offline Grape: cluster a static collection

Cluster search results from CITIDEL

Cluster a member collection (from a content source) in CITIDEL

• The Computer Science Teaching Center (CSTC)

• NDLTD-Computing

• ACM Digital Library

• …

Cluster CSTC

Cluster NDLTD-Computing

Cluster ACM

MOCA Algorithm

PIPE: Personalization by Partial Evaluation

• Interactions at existing web sites are predefined by the site designer

• Personalization is achieved by the designer’s anticipation of users’ expectations

• PIPE allows automatic personalization of a web site without designer anticipation
– Recognized with the 2001 New Century Technology Council Innovation award

CITIDEL + PIPE

• Adds Interaction Personalization to CITIDEL

• Automatically handles multi-modal conversion to cell phone, PDA, etc.

• Can be adapted to any digital data set; requires only an XML file of the content with its hierarchy maintained.

PIPE provides Mixed-Initiative Interaction

• Involves an extra specification window (e.g., a toolbar)
• System-initiated + user-initiated modes of interaction

Traditional browser: the user merely clicks on available hyperlinks.

PIPE window: the user can type in any information out-of-turn

Can also mix-n-match
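As a rough analogy for personalization by partial evaluation, treat the site as a hierarchy of choices and let out-of-turn input fix some of them: specializing the hierarchy prunes every branch inconsistent with what the user supplied, so the remaining interaction only offers what is still open. This toy works over an XML content hierarchy (in line with the XML requirement noted for CITIDEL + PIPE), with hypothetical element names and content; it is not PIPE's actual partial evaluator.

```python
import xml.etree.ElementTree as ET

SITE = """<site>
  <topic name="Algorithms">
    <level name="Undergraduate"><item>Sorting notes</item></level>
    <level name="Graduate"><item>Approximation algorithms</item></level>
  </topic>
  <topic name="Networks">
    <level name="Undergraduate"><item>TCP/IP primer</item></level>
  </topic>
</site>"""

def specialize(node, known):
    """Prune the hierarchy given out-of-turn input such as {"level": "Graduate"}.

    Returns a pruned copy, or None if nothing under `node` is consistent with `known`.
    """
    if node.tag in known and node.get("name") != known[node.tag]:
        return None  # this branch contradicts what the user already supplied
    copy = ET.Element(node.tag, dict(node.attrib))
    copy.text = node.text
    for child in node:
        kept = specialize(child, known)
        if kept is not None:
            copy.append(kept)
    if len(node) and not len(copy):
        return None  # every sub-branch was pruned, so drop this node too
    return copy

def remaining(tree, tag):
    """Options still offered at one level of the specialized site."""
    return [e.get("name") for e in tree.iter(tag)]

site = ET.fromstring(SITE)
grad = specialize(site, {"level": "Graduate"})
```

Supplying "Graduate" before choosing a topic removes the Networks topic entirely, since it has no graduate content, which mirrors how out-of-turn input can skip steps the designer laid out.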

Features of PIPE

• Applicable to many information system technologies

• web sites (even third-party)

• Digital Libraries (currently working on CITIDEL integration)
• Voice-activated systems (e.g., pizza ordering, movie information, and flight reservation services)

• PIPE is available for licensing and is ready for commercialization, through VTIP
• PIPE has been featured in IEEE Internet Computing, IEEE IT Professional, and the Appian Web Personalization Report.

PIPE system architecture

Conclusions

• UNESCO analytical survey: DLE in every nation
• NSDL as an example; case studies inside it
• OAI -> ODL -> DL-in-a-box -> OCKHAM as a framework for collaboration on services
• CITIDEL to highlight NSDL collection efforts

– Many sources for computing resources

– Software deployed from above efforts, refined, and then the results made available for reuse

– Even class projects can lead to useful DL components!

