
Building a National Science Digital Library on Fedora

Dean Krafft, Cornell University
[email protected]

Building NSDL on Fedora

Structure of the talk:
  The Fedora-based NSDL Data Repository (NDR) and NSDL 2.0
  Scaling Challenges
  Inspiring Contribution and Collaboration: ExpertVoices
  Other NSDL 2.0 Services and Tools
  Q&A

What is the NSDL?

An NSF-funded $20 million/year program in Science, Technology, Engineering and Mathematics (STEM) education

A digital library describing over a million carefully selected online STEM resources from over 100 collections (at http://nsdl.org)

A core integration team (Cornell, UCAR, Columbia) working with 9 “pathways” portals and over 200 NSF grantees

A large community of researchers, librarians, content providers, developers, students, and teachers

Going beyond the card catalog

NSDL 1.0: Metadata Repository – Oracle-based union catalog of metadata records aggregated with OAI-PMH

NSDL 2.0: A library that guides not just resource discovery, but resource selection, use, and contribution
  Supports creating “context” for resources
  Presents resources in context: in a lesson plan; with ratings; correlated with education standards
  Supports creating a permanent archive of resources
  Enables community tools for structuring, evaluation, annotation, contribution, collaboration

Goal: Create a dynamic, living library

NSDL 2.0: NSDL Data Repository

Goals:
  Architecture of participation: service-based, not a monolithic application/single user experience
  Remixable data sources and data transformations
  Harnessing (and capturing) collective intelligence
  A free market of millions of inter-related resources (create the “long tail”)
  Two-way data flow: NSDL ↔ users

Solution: Fedora-based NSDL Data Repository

Implementing the NDR with Fedora

Multiple Object Types:
  Resources (with local or remote content)
  Metadata
  Aggregations (collections)
  Metadata Providers (branding)
  Agents

RDF relationships that use the Fedora Resource Index to support arbitrary graph queries:
  Structural (part of)
  Equivalence
  Annotation

[Figure: NSDL Fedora-based repository. Example object graph linking agents, aggregators, metadata providers, resources, and metadata records (for instance an NSDL Recommender Service, an Example Collection, and Example.org) through the relationships listed below.]

Types of Objects:
  Agents
  Aggregators
  Metadata Providers
  Resources
  Metadata

Types of Relationships:
  associatedWith (asWith)
  metadataProviderFor (mdp4)
  aggregatorFor (agg4)
  providedBy (pBy)
  metadataFor (m4)
  memberOf (mOf)
    1st: a recommended resource
    2nd: makes it a “blessed” NSDL Collection
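
To make the object and relationship types concrete, here is a minimal sketch of how such a graph could be held as subject-predicate-object triples and queried across two hops. The relationship names are the ones listed on the slide; the direction of each edge, every identifier, and the `TripleIndex` class are assumptions made for illustration. This is a toy in-memory stand-in, not the Fedora Resource Index or its query language.

```python
from collections import defaultdict

# Relationship names come from the slide legend; edge directions and all
# object identifiers below are illustrative assumptions.
TRIPLES = [
    ("metadata:M1", "metadataFor", "resource:R1"),
    ("metadata:M1", "providedBy", "mdp:P1"),
    ("mdp:P1", "metadataProviderFor", "aggregator:A1"),
    ("aggregator:A1", "associatedWith", "agent:G1"),
    ("mdp:P1", "associatedWith", "agent:G1"),
    ("resource:R1", "memberOf", "aggregator:A1"),
    ("resource:R2", "memberOf", "aggregator:A1"),
]


class TripleIndex:
    """Tiny in-memory stand-in for an RDF resource index (hypothetical)."""

    def __init__(self, triples):
        self.by_predicate_object = defaultdict(set)
        self.by_subject_predicate = defaultdict(set)
        for s, p, o in triples:
            self.by_predicate_object[(p, o)].add(s)
            self.by_subject_predicate[(s, p)].add(o)

    def subjects(self, predicate, obj):
        """All subjects s such that (s, predicate, obj) is in the index."""
        return set(self.by_predicate_object.get((predicate, obj), set()))

    def objects(self, subject, predicate):
        """All objects o such that (subject, predicate, o) is in the index."""
        return set(self.by_subject_predicate.get((subject, predicate), set()))


def metadata_for_collection(index, aggregator):
    """Two-hop query: metadata records describing members of an aggregator."""
    found = set()
    for resource in index.subjects("memberOf", aggregator):
        found |= index.subjects("metadataFor", resource)
    return found


index = TripleIndex(TRIPLES)
print(sorted(index.subjects("memberOf", "aggregator:A1")))
print(sorted(metadata_for_collection(index, "aggregator:A1")))
```

The two-hop function is the kind of question a graph query answers in one step; here it is spelled out by hand to show which edges are followed.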

Draft NDR API Characteristics

Uses REST calls for all interactions; uses handles (DOIs) for all external references

Ensures external applications can’t violate the NDR model constraints

Disseminations allow combining metadata from multiple sources, or related content

Authentication: Requests signed with private key associated with an agent

Authorization: Agent can become a metadata provider or aggregator; can create resources
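
The slide does not specify the signing scheme, so the following is only a sketch of the general idea of a signed REST request tied to an agent. It uses HMAC-SHA256 from the Python standard library as a symmetric stand-in; the API as described would instead sign with the agent's private key and verify with the matching public key. The canonical string layout, the path, and the handle value are all hypothetical.

```python
import hashlib
import hmac


def canonical_request(method, path, agent_handle, timestamp, body=b""):
    """Join the parts of a request that the signature should cover."""
    body_digest = hashlib.sha256(body).hexdigest()
    return "\n".join([method.upper(), path, agent_handle, timestamp, body_digest])


def sign_request(secret, method, path, agent_handle, timestamp, body=b""):
    """Sign the canonical request. HMAC here is a stand-in (assumption)
    for the private-key signature mentioned on the slide."""
    message = canonical_request(method, path, agent_handle, timestamp, body)
    return hmac.new(secret, message.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_request(secret, signature, method, path, agent_handle, timestamp, body=b""):
    """Recompute the signature server-side and compare in constant time."""
    expected = sign_request(secret, method, path, agent_handle, timestamp, body)
    return hmac.compare_digest(expected, signature)


# Hypothetical values, for illustration only.
secret = b"agent-shared-secret"
args = ("POST", "/api/resources", "hdl:example/agent-1", "2006-06-01T12:00:00Z")
signature = sign_request(secret, *args, body=b"<resource/>")

print(verify_request(secret, signature, *args, body=b"<resource/>"))
print(verify_request(secret, signature, *args, body=b"<tampered/>"))
```

Covering the body digest and the agent handle in the signed string is what lets the server reject a request whose content or claimed agent was altered in transit.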

NDR Architecture

An Information Network Overlay

Think of the NDR as a lens for viewing science content on the net

Content can be:
  Local: stored directly in the NDR
  Remote: accessed through a URL
  Computed: derived from a database or web service
  Archived: an older version stored at SDSC

It all has a repository-based URL
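
One way to picture the overlay: every content item, whatever its kind, is addressed through the repository, and the repository decides how to fetch it. The sketch below assumes a hypothetical base URL, a hypothetical `Content` record, and invented handles; it is not the NDR data model.

```python
from dataclasses import dataclass
from enum import Enum


class ContentKind(Enum):
    LOCAL = "stored directly in the repository"
    REMOTE = "accessed through a URL"
    COMPUTED = "derived from a database or web service"
    ARCHIVED = "an older version held in an archive"


@dataclass(frozen=True)
class Content:
    handle: str        # repository identifier (hypothetical format)
    kind: ContentKind
    source: str        # where the bytes actually come from


# Hypothetical base URL, for illustration only.
REPOSITORY_BASE = "http://repository.example.org/content/"


def repository_url(content):
    """Every kind of content gets the same style of repository-based URL."""
    return REPOSITORY_BASE + content.handle


def describe(content):
    """Show the stable repository URL next to the underlying source."""
    return f"{repository_url(content)} -> {content.kind.name.lower()}: {content.source}"


items = [
    Content("h1", ContentKind.LOCAL, "datastream in the repository"),
    Content("h2", ContentKind.REMOTE, "http://example.org/lesson.html"),
    Content("h3", ContentKind.COMPUTED, "query against a web service"),
    Content("h4", ContentKind.ARCHIVED, "archived snapshot"),
]

for item in items:
    print(describe(item))
```

The caller only ever sees the repository URL, so a resource can move from remote to archived without its public address changing.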

Network Overlay View

[Figure: layered view of the overlay. Layers: User View; API/UI; Repository View with Relations & Annotations; Resources on the Web.]

Scaling Challenges

“You can tell the pioneers by the arrows in their backs”

The Resource Index

First application to build a large-scale triplestore

Initial NDR design (Sept. ’05):
  438 triples per object, total of ~600 million
  Kowari only tested to 200 million triples
  Redesigned to approx 70 triples/object

Kowari challenges:
  Memory mapped, requires 64-bit addressing
  Memory leak in Kowari, fixed by Chris Wilper
  RI corruption problems – implemented fixes, instituted best practices, monitoring, and verification before backup
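
A back-of-envelope check of the redesign, using only the figures on this slide. The object count is not stated; it is inferred here by dividing the projected total by the per-object count, which is an assumption about how the ~600 million figure was reached.

```python
# Figures taken from the slide.
initial_triples_per_object = 438
initial_total_triples = 600_000_000
kowari_tested_triples = 200_000_000
redesigned_triples_per_object = 70

# Inferred, not stated on the slide: objects implied by the initial design.
implied_objects = initial_total_triples / initial_triples_per_object

# Same number of objects under the redesigned model.
redesigned_total = implied_objects * redesigned_triples_per_object

print(f"implied objects: {implied_objects:,.0f}")
print(f"redesigned total triples: {redesigned_total:,.0f}")
print(f"within tested range: {redesigned_total < kowari_tested_triples}")
```

Under that assumption the initial design implies roughly 1.4 million objects, and the redesign brings the total to roughly 96 million triples, which is under the 200 million that Kowari had been tested to.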

Loading the repository

Fedora initially optimized for quick access (i.e. load/modify not so optimal)

Initial test load of repository (roughly ½ size):
  over 875,000 metadata records
  over 2 million digital objects
  over 163 million RDF triples (lots)

Initial test load last December took weeks – it got slower as the repository got bigger

More Scaling Challenges

OAI provider worked fine for small repositories, but initial queries didn’t scale – redesigned queries

Fedora buffers RI updates, flush very expensive – redesigned API, working on Fedora solution that peeks in buffer

Sockets weren’t being closed quickly enough – fixed

Initial modify times 26 sec/object - fixed
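
The buffering issue can be illustrated with a toy index: writes queue up in a buffer, and a query either forces a flush first or reads the committed data with the pending operations overlaid. This is a hypothetical sketch of the "peek in the buffer" idea only; the class and method names are invented and it does not reflect how Fedora implements it.

```python
class BufferedIndex:
    """Toy triple index with a write buffer (hypothetical illustration)."""

    def __init__(self):
        self.committed = set()   # triples already written to the index
        self.buffer = []         # pending ("add" | "delete", triple) operations
        self.flush_count = 0     # counts the expensive operation

    def add(self, triple):
        self.buffer.append(("add", triple))

    def delete(self, triple):
        self.buffer.append(("delete", triple))

    def flush(self):
        """Apply all pending operations; stands in for the costly step."""
        for op, triple in self.buffer:
            if op == "add":
                self.committed.add(triple)
            else:
                self.committed.discard(triple)
        self.buffer.clear()
        self.flush_count += 1

    def query_with_flush(self, predicate):
        """Consistent read obtained by flushing before every query."""
        self.flush()
        return {t for t in self.committed if t[1] == predicate}

    def query_with_peek(self, predicate):
        """Consistent read that overlays pending operations without flushing."""
        view = set(self.committed)
        for op, triple in self.buffer:
            if op == "add":
                view.add(triple)
            else:
                view.discard(triple)
        return {t for t in view if t[1] == predicate}


idx = BufferedIndex()
idx.add(("resource:R1", "memberOf", "aggregator:A1"))
print(idx.query_with_peek("memberOf"), idx.flush_count)
print(idx.query_with_flush("memberOf"), idx.flush_count)
```

Both queries return the same answer; only the second one pays for a flush. Copying the committed set on every peek is acceptable in a toy, but a real index would overlay the buffer on query results rather than copy the whole store.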

The Good News

The NDR has intercepted most of the scaling arrows

Many updates to add multi-threading, fix threading/concurrency problems

Every Fedora API-M operation tuned to <100ms

Result: Overall performance has improved 1-2 orders of magnitude for many NDR operations

Fedora journaling system to support redundant servers nearly complete

The Fedora team has been highly responsive to every single NDR issue

How should we use the NDR?

The NDR provides powerful capabilities for:
  Creating context around resources
  Enabling the NSDL community to directly contribute resources and context
  Representing a web of relationships among science resources and information about those resources

How do we use it? Here’s one specific example …

ExpertVoices

What is Expert Voices?

A system using blogging technology to:
  Support STEM conversations among scientists, teachers and students
  Tie NSDL resources to real-world science news
  Create context for resources to enhance discovery, selection and use
  Enable NSDL community members to become NSDL contributors: of resources, questions, reviews, annotations, and metadata

Broadening Participation: An Expert Voices Learning Scenario

“Hurricane Season Blog” run by a National Weather Service hurricane expert, an Earth Science teacher, and a school media specialist familiar with NSDL resources

Expert creates an entry for Hurricane Gertrude:
  “On track to hit Ft. Lauderdale in 72 hours”
  “Currently undergoing eyewall replacement cycle”
  “Expecting 15 foot storm surge”

Media specialist adds links to NSDL resources: Hurricane Hunters site, latest satellite photos, and USGS flooding and flood plain site (storm surge context)

Teacher makes connections to relevant standards and appropriate pedagogy for use by other teachers

Students experience engaging real-time, real-world applications of science lessons

Expert Voices Implementation

Initial blog system is multi-user WordPress

WordPress plug-ins provide NDR integration and Shibboleth authentication

Publication of blog entry creates:
  Content, as a new resource with simple metadata
  New NDR resources included in entry
  New metadata for any referenced resources in content
  Graph of relationships between entry and all referenced resources

Blog available as independent RSS feed
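
As a rough sketch of what publishing an entry might add to the repository graph, the function below emits triples for the entry, its metadata, and each referenced resource, and then derives the implied links between resources cited in the same entry. The identifiers, the `annotates` edge direction, and both function names are assumptions for illustration; the actual plug-in behaviour is not described in the talk at this level.

```python
from itertools import combinations


def publish_entry(entry_id, blog_id, provider_id, new_resources, existing_resources):
    """Return the triples a published blog entry could add (hypothetical)."""
    triples = [
        # The entry itself becomes a resource in the topic-based blog.
        (entry_id, "memberOf", blog_id),
        # Simple metadata describing the entry.
        (f"{entry_id}/md/0", "metadataFor", entry_id),
        (f"{entry_id}/md/0", "providedBy", provider_id),
    ]
    referenced = list(new_resources) + list(existing_resources)
    for n, resource in enumerate(referenced, start=1):
        # The entry annotates each resource it links to.
        triples.append((entry_id, "annotates", resource))
        # New metadata contributed for that resource by the blog's provider.
        triples.append((f"{entry_id}/md/{n}", "metadataFor", resource))
        triples.append((f"{entry_id}/md/{n}", "providedBy", provider_id))
    return triples


def inferred_pairs(triples, entry_id):
    """Resources annotated by the same entry are implicitly related."""
    referenced = sorted(
        {o for s, p, o in triples if s == entry_id and p == "annotates"}
    )
    return list(combinations(referenced, 2))


triples = publish_entry(
    "entry:gertrude",
    "blog:hurricane-season",
    "mdp:hurricane-season",
    new_resources=["res:satellite-photos"],
    existing_resources=["res:usgs-flooding"],
)
for triple in triples:
    print(triple)
print(inferred_pairs(triples, "entry:gertrude"))
```

The inferred pairs are never stored; they fall out of a query over the `annotates` edges, which is the sense in which the entry creates context linking otherwise unrelated resources.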

NDR Entry for Expert Voices

[Figure: NDR graph for an Expert Voices entry. Nodes: Blog Entry; New Metadata; New Audience MD; Referenced New Resource 1; Referenced Existing Resource 2; Metadata Provider (two); Existing Collection; Topic-based Blog. Edge labels: Annotates; Metadata for; Member of. An inferred relationship between the two referenced resources is also shown.]

But Expert Voices is just the beginning…

NDR Application: OnRamp

A multi-user, multi-project content management system

Built on Fedora – content objects can transition to become NDR resources

Decentralized workflow for the creation and distribution of both simple and complex content – possible first step in general Fedora workflow system

Disseminates content in multiple publication and online forms

Delivery estimated 3Q06

NDR Application: Integrated Wiki

A community of approved contributors (e.g. teachers, librarians, scientists) is granted edit access on the OpenNSDL wiki

New resources and metadata are created as wiki pages and reflected into the NDR

Non-wiki-based NDR resources and metadata are displayed as read-only wiki pages, subject to comment and linking

User and project pages organize NDR resources

Other applications in development

Automated grade-level assignment based on vocabulary analysis (SDSC)

Educational Standards assignment (Syracuse)

iVia-based Expert-Guided crawl: Tool for Pathways and others to turn websites into resource collections (UC Riverside)

Automated subject assignment (UC Riverside)

MyNSDL: Bookmark and tag STEM resources within and outside the NSDL (Cornell)

NSDL 2.0 Ecosystem

[Figure: NSDL 2.0 ecosystem. Components: STEM Collections; Search Service; Archive Service; Fedora-based NDR. Protocols: OAI-PMH; HTTP; REST; NDR API.]

Summary

Fedora and all its capabilities were essential to the creation of NSDL 2.0: a digital library that allows scientists, teachers, librarians, and students to create a unique web of context, contribution, and collaboration around the high-quality STEM education resources at the core of the NSDL.

The NDR demonstrates that Fedora is a powerful and flexible tool that can scale to a complex repository with millions of dynamic objects.

Acknowledgements

NSDL NSF Program Officers
  Lee Zia
  David McArthur

NSDL Core Integration Team
  UCAR: Kaye Howe, PI and Executive Director
  Cornell: Dean Krafft, PI
  Columbia: Kate Wittenberg, PI

Fedora Development Team
  Cornell: Sandy Payette & Carl Lagoze
  Univ. of Virginia: Thornton Staples

Questions?

Contact Information

Dean B. Krafft
Cornell Information Science
301 College Ave.
Ithaca, NY
[email protected]

This work is licensed under the Creative Commons Attribution-NoDerivs 2.5 License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nd/2.5/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.

