Building NSDL on Fedora
Structure of the talk:
  The Fedora-based NSDL Data Repository (NDR) and NSDL 2.0
  Scaling Challenges
  Inspiring Contribution and Collaboration: Expert Voices
  Other NSDL 2.0 Services and Tools
  Q&A
What is the NSDL?
An NSF-funded $20 million/year program in Science, Technology, Engineering and Mathematics (STEM) education
A digital library describing over a million carefully selected online STEM resources from over 100 collections (at http://nsdl.org)
A core integration team (Cornell, UCAR, Columbia) working with 9 “pathways” portals and over 200 NSF grantees
A large community of researchers, librarians, content providers, developers, students, and teachers
Going beyond the card catalog
NSDL 1.0: Metadata Repository – Oracle-based union catalog of metadata records aggregated with OAI-PMH
NSDL 2.0: A library that guides not just resource discovery, but resource selection, use, and contribution
  Supports creating “context” for resources
  Presents resources in context: in a lesson plan; with ratings; correlated with education standards
  Supports creating a permanent archive of resources
  Enables community tools for structuring, evaluation, annotation, contribution, collaboration
Goal: Create a dynamic, living library
NSDL 2.0: NSDL Data Repository
Goals:
  Architecture of participation: service-based, not a monolithic application/single user experience
  Remixable data sources and data transformations
  Harnessing (and capturing) collective intelligence
  A free market of millions of inter-related resources (create the “long tail”)
  Two-way data flow: NSDL ↔ users
Solution: Fedora-based NSDL Data Repository
Implementing the NDR with Fedora
Multiple Object Types:
  Resources (with local or remote content)
  Metadata
  Aggregations (collections)
  Metadata Providers (branding)
  Agents
RDF relationships that use the Fedora Resource Index to support arbitrary graph queries:
  Structural (part of)
  Equivalence
  Annotation
[Figure: example NDR object graph. Node labels: NSDL Recommender Service, Example Collection, NSDL Big Bang, NSDL Agent 1000, MDP 3000, Aggr 2002, M 4002, NSDL Collections 1002, Aggr 2005, M 4005, NSDL Recommended 1005, NSDL RS Agent 1004, MDP 3004, Example Agent 10010, MDP 10011, Aggr 10012, Aggr 2004, M 10005, Example.org 10006, M 10007. Edges are labeled with the relationship abbreviations listed below.]
Types of Objects
  Agents
  Aggregators
  Metadata Providers
  Resources
  Metadata
Types of Relationships
  associatedWith (asWith)
  metadataProviderFor (mdp4)
  aggregatorFor (agg4)
  providedBy (pBy)
  metadataFor (m4)
  memberOf (mOf)
    · 1st. A recommended resource
    · 2nd. Makes it a “blessed” NSDL Collection
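The object and relationship types above can be pictured as a small graph of triples. What follows is a minimal in-memory sketch, not the Fedora Resource Index itself: the `TripleGraph` class, the object identifiers, and the direction chosen for each relationship are assumptions made for illustration. Only the relationship names come from the legend.

```python
from collections import defaultdict


class TripleGraph:
    """Tiny in-memory stand-in for a triplestore, indexed by predicate."""

    def __init__(self):
        self._by_predicate = defaultdict(set)

    def add(self, subject, predicate, obj):
        self._by_predicate[predicate].add((subject, obj))

    def subjects(self, predicate, obj):
        """All subjects that point at obj through predicate."""
        return {s for s, o in self._by_predicate[predicate] if o == obj}

    def objects(self, subject, predicate):
        """All objects that subject points at through predicate."""
        return {o for s, o in self._by_predicate[predicate] if s == subject}


# Identifiers are made-up placeholders, not real NDR handles.
# The direction of each edge is an assumption.
graph = TripleGraph()
graph.add("aggr:1", "aggregatorFor", "agent:1")
graph.add("mdp:1", "metadataProviderFor", "agent:1")
graph.add("aggr:1", "associatedWith", "resource:collection")
graph.add("resource:a", "memberOf", "aggr:1")
graph.add("resource:b", "memberOf", "aggr:1")
graph.add("md:a", "metadataFor", "resource:a")
graph.add("md:a", "providedBy", "mdp:1")


def metadata_for_members(graph, aggregator):
    """Two-hop query: metadata records describing any member of an aggregator."""
    records = set()
    for resource in graph.subjects("memberOf", aggregator):
        records |= graph.subjects("metadataFor", resource)
    return records


print(sorted(graph.subjects("memberOf", "aggr:1")))
print(sorted(metadata_for_members(graph, "aggr:1")))
```

The two-hop function is the kind of traversal that a graph query over typed relationships makes straightforward and that a flat union catalog does not.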
NSDL FEDORA-BASED REPOSITORY
Draft NDR API Characteristics
Uses REST calls for all interactions; uses handles (DOIs) for all external references
Ensures external applications can’t violate the NDR model constraints
Disseminations allow combining metadata from multiple sources, or related content
Authentication: Requests signed with private key associated with an agent
Authorization: Agent can become a metadata provider or aggregator; can create resources
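The slide does not say which signature scheme or request format the NDR API uses, so the following is only a sketch of the general idea of signing a request with an agent's private key and verifying it with the matching public key. It uses textbook RSA over a SHA-256 digest with a deliberately toy key. The canonical request layout and the `/api/addResource` path are hypothetical.

```python
import hashlib

# Toy RSA key built from two known Mersenne primes. It is far too small and
# too structured for real use; it exists only to keep the example runnable
# with the standard library.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
E = 65537                               # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))       # private exponent (needs Python 3.8+)


def canonical_request(method, path, body):
    """Hypothetical canonical form: the exact bytes that get signed."""
    return f"{method.upper()}\n{path}\n".encode() + body


def digest(message):
    """SHA-256 digest reduced modulo N so it fits the toy key."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message, private_exponent=D):
    """Agent side: sign the digest with the private exponent."""
    return pow(digest(message), private_exponent, N)


def verify(message, signature):
    """Repository side: check the signature with the public exponent."""
    return pow(signature, E, N) == digest(message)


request = canonical_request("post", "/api/addResource", b"<resource/>")
signature = sign(request)
print(verify(request, signature))
print(verify(request + b"tampered", signature))
```

A production system would use a vetted cryptography library with proper padding and key sizes rather than raw modular exponentiation.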
An Information Network Overlay
Think of the NDR as a lens for viewing science content on the net
Content can be:
  Local: stored directly in the NDR
  Remote: accessed through a URL
  Computed: derived from a database or web service
  Archived: an older version stored at SDSC
It all has a repository-based URL
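One way to read the overlay idea is that every resource gets the same kind of repository address regardless of where its content lives. The sketch below models that under stated assumptions: the `Resource` class, the handle strings, and the base URL are hypothetical, and only the four content kinds come from the slide.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ContentKind(Enum):
    LOCAL = "stored directly in the repository"
    REMOTE = "accessed through a URL"
    COMPUTED = "derived from a database or web service"
    ARCHIVED = "an older version held in the archive"


@dataclass(frozen=True)
class Resource:
    handle: str
    kind: ContentKind
    source: Optional[str] = None    # backing URL or service; None for local


REPOSITORY_BASE = "http://repository.example.org/get"   # hypothetical


def repository_url(resource):
    """Same URL shape for every content kind."""
    return f"{REPOSITORY_BASE}/{resource.handle}"


resources = [
    Resource("hdl:1", ContentKind.LOCAL),
    Resource("hdl:2", ContentKind.REMOTE, "http://example.org/page"),
    Resource("hdl:3", ContentKind.COMPUTED, "http://example.org/service"),
    Resource("hdl:4", ContentKind.ARCHIVED, "http://archive.example.org/page"),
]

for resource in resources:
    print(repository_url(resource), "->", resource.kind.value)
```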
[Figure: Network Overlay View. Layers: User View; API/UI; Repository View with Relations & Annotations; Resources on the Web]
The Resource Index
First application to build a large-scale triplestore
Initial NDR design (Sept. ’05):
  438 triples per object, total of ~600 million
  Kowari only tested to 200 million triples
  Redesigned to approx 70 triples/object
Kowari challenges:
  Memory mapped, requires 64-bit addressing
  Memory leak in Kowari, fixed by Chris Wilper
  RI corruption problems – implemented fixes, instituted best practices, monitoring, and verification before backup
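The figures above imply a simple sizing calculation. Dividing the ~600 million total by 438 triples per object gives roughly 1.4 million objects, and at about 70 triples per object the same object count needs about 96 million triples, under the 200 million to which Kowari had been tested. The object count is derived here, not stated on the slide.

```python
INITIAL_TRIPLES_PER_OBJECT = 438
INITIAL_TOTAL_TRIPLES = 600_000_000      # "~600 million" from the slide
KOWARI_TESTED_LIMIT = 200_000_000
REDESIGNED_TRIPLES_PER_OBJECT = 70       # "approx 70" from the slide

# Object count implied by the initial design (derived, not from the slide).
implied_objects = INITIAL_TOTAL_TRIPLES / INITIAL_TRIPLES_PER_OBJECT

# Total triples for the same object count after the redesign.
redesigned_total = implied_objects * REDESIGNED_TRIPLES_PER_OBJECT

print(f"implied objects: {implied_objects:,.0f}")
print(f"redesigned total triples: {redesigned_total:,.0f}")
print(f"initial design vs tested limit: {INITIAL_TOTAL_TRIPLES / KOWARI_TESTED_LIMIT:.1f}x")
print(f"redesign vs tested limit: {redesigned_total / KOWARI_TESTED_LIMIT:.2f}x")
```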
Loading the repository
Fedora initially optimized for quick access (i.e. load/modify not so optimal)
Initial test load of repository (roughly ½ size):
  over 875,000 metadata records
  over 2 million digital objects
  over 163 million RDF triples (lots)
Initial test load last December took weeks – it got slower as the repository got bigger
More Scaling Challenges
OAI provider worked fine for small repositories, but initial queries didn’t scale – redesigned queries
Fedora buffers RI updates, flush very expensive – redesigned API, working on Fedora solution that peeks in buffer
Sockets weren’t being closed quickly enough – fixed
Initial modify times 26 sec/object - fixed
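The buffered-update problem can be illustrated with a toy index. This is a sketch of the general pattern only, not Fedora's implementation: the `BufferedIndex` class and its method names are hypothetical. It contrasts a query that forces a flush with one that reads the flushed store and then "peeks" at pending updates.

```python
class BufferedIndex:
    """Toy index that buffers writes and applies them on flush."""

    def __init__(self):
        self._store = set()      # flushed triples (stands in for the triplestore)
        self._pending = []       # buffered ("add" | "delete", triple) operations
        self.flush_count = 0

    def add(self, triple):
        self._pending.append(("add", triple))

    def delete(self, triple):
        self._pending.append(("delete", triple))

    def flush(self):
        """Apply every buffered operation; assumed to be the expensive step."""
        for op, triple in self._pending:
            if op == "add":
                self._store.add(triple)
            else:
                self._store.discard(triple)
        self._pending.clear()
        self.flush_count += 1

    def query_with_flush(self, predicate):
        """Consistent read that pays for a flush on every query."""
        self.flush()
        return {t for t in self._store if t[1] == predicate}

    def query_with_peek(self, predicate):
        """Consistent read that replays matching buffered operations instead."""
        view = {t for t in self._store if t[1] == predicate}
        for op, triple in self._pending:
            if triple[1] != predicate:
                continue
            if op == "add":
                view.add(triple)
            else:
                view.discard(triple)
        return view


index = BufferedIndex()
index.add(("resource:a", "memberOf", "aggr:1"))
index.add(("resource:b", "memberOf", "aggr:1"))
index.delete(("resource:b", "memberOf", "aggr:1"))

peeked = index.query_with_peek("memberOf")
print(peeked, "flushes so far:", index.flush_count)
```

Both query methods return the same answer; the peeking variant avoids a flush per read at the cost of scanning the buffer.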
The Good News
The NDR has intercepted most of the scaling arrows
Many updates to add multi-threading, fix threading/concurrency problems
Every Fedora API-M operation tuned to <100ms
Result: Overall performance has improved 1-2 orders of magnitude for many NDR operations
Fedora journaling system to support redundant servers nearly complete
The Fedora team has been highly responsive to every single NDR issue
How should we use the NDR?
The NDR provides powerful capabilities for:
  Creating context around resources
  Enabling the NSDL community to directly contribute resources and context
  Representing a web of relationships among science resources and information about those resources
How do we use it? Here’s one specific example …
What is Expert Voices?
A system using blogging technology to:
  Support STEM conversations among scientists, teachers and students
  Tie NSDL resources to real-world science news
  Create context for resources to enhance discovery, selection and use
  Enable NSDL community members to become NSDL contributors: of resources, questions, reviews, annotations, and metadata
Broadening Participation: An Expert Voices Learning Scenario
“Hurricane Season Blog” run by a National Weather Service hurricane expert, an Earth Science teacher, and a school media specialist familiar with NSDL resources
Expert creates an entry for Hurricane Gertrude
  “On track to hit Ft. Lauderdale in 72 hours”
  “Currently undergoing eyewall replacement cycle”
  “Expecting 15 foot storm surge”
Media specialist adds links to NSDL resources: Hurricane Hunters site, latest satellite photos, and USGS flooding and flood plain site (storm surge context)
Teacher makes connections to relevant standards and appropriate pedagogy for use by other teachers
Students experience engaging real-time, real-world applications of science lessons
Expert Voices Implementation
Initial blog system is multi-user WordPress
WordPress plug-ins provide NDR integration and Shibboleth authentication
Publication of blog entry creates:
  Content, as a new resource with simple metadata
  New NDR resources included in entry
  New metadata for any referenced resources in content
  Graph of relationships between entry and all referenced resources
Blog available as independent RSS feed
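The publication step can be sketched as a function that turns one blog entry into a set of relationship triples, plus a second step that infers links between resources cited by the same entry. The identifiers and function names are hypothetical, and the relationship names are assumptions loosely based on the labels used elsewhere in this talk; the actual plug-in logic is not described in the source.

```python
from itertools import combinations


def publish_entry(entry, blog, referenced):
    """Return the triples created when an entry is published (illustrative)."""
    triples = [
        (entry, "memberOf", blog),                 # entry belongs to its blog
        (f"md:{entry}", "metadataFor", entry),     # simple metadata for the entry
    ]
    for resource in referenced:
        triples.append((entry, "annotates", resource))
        triples.append((f"md:{entry}:{resource}", "metadataFor", resource))
    return triples


def inferred_pairs(triples):
    """Resources annotated by the same entry are treated as related."""
    by_entry = {}
    for subject, predicate, obj in triples:
        if predicate == "annotates":
            by_entry.setdefault(subject, []).append(obj)
    pairs = set()
    for resources in by_entry.values():
        pairs.update(combinations(sorted(resources), 2))
    return pairs


triples = publish_entry(
    "entry:1", "blog:hurricanes", ["resource:new", "resource:existing"]
)
for triple in triples:
    print(triple)
print(inferred_pairs(triples))
```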
NDR Entry for Expert Voices
[Figure: NDR objects created for an Expert Voices entry. Node labels: Blog Entry, New Metadata, New Audience MD, Referenced New Resource 1, Referenced Existing Resource 2, Metadata Provider, Existing Collection, Topic-based Blog. Edge labels: Annotates, Metadata for, Member of, Inferred relationship between resources]
NDR Application: OnRamp
A multi-user, multi-project content management system
Built on Fedora – content objects can transition to become NDR resources
Decentralized workflow for the creation and distribution of both simple and complex content – possible first step in general Fedora workflow system
Disseminates content in multiple publication and online forms
Delivery estimated 3Q06
NDR Application: Integrated Wiki
A community of approved contributors (e.g. teachers, librarians, scientists) is granted edit access on the OpenNSDL wiki
New resources and metadata are created as wiki pages and reflected into the NDR
Non-wiki-based NDR resources and metadata are displayed as read-only wiki pages, subject to comment and linking
User and project pages organize NDR resources
Other applications in development
Automated grade-level assignment based on vocabulary analysis (SDSC)
Educational Standards assignment (Syracuse)
iVia-based Expert-Guided crawl: Tool for Pathways and others to turn websites into resource collections (UC Riverside)
Automated subject assignment (UC Riverside)
MyNSDL: Bookmark and tag STEM resources within and outside the NSDL (Cornell)
…
NSDL 2.0 Ecosystem
[Figure: NSDL 2.0 Ecosystem. Components: STEM Collections, Search Service, Archive Service, Fedora-based NDR. Protocols shown in the legend: OAI-PMH, HTTP, REST, NDR API]
Summary
Fedora and all its capabilities were essential to the creation of NSDL 2.0: a digital library that allows scientists, teachers, librarians, and students to create a unique web of context, contribution, and collaboration around the high-quality STEM education resources at the core of the NSDL.
The NDR demonstrates that Fedora is a powerful and flexible tool that can scale to a complex repository with millions of dynamic objects.
Acknowledgements
NSDL NSF Program Officers
  Lee Zia
  David McArthur
NSDL Core Integration Team
  UCAR: Kaye Howe, PI and Executive Director
  Cornell: Dean Krafft, PI
  Columbia: Kate Wittenberg, PI
Fedora Development Team
  Cornell: Sandy Payette & Carl Lagoze
  Univ. of Virginia: Thornton Staples
Contact Information
Dean B. Krafft
Cornell Information Science
301 College Ave.
Ithaca, NY
[email protected]
This work is licensed under the Creative Commons Attribution-NoDerivs 2.5 License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nd/2.5/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.