Architecting an Extensible Digital Repository
Anoop Kumar, Ranjani Saigal, Rob Chavez, Nikolai Schwertner
Tufts University, Medford, MA
Overview
Background information on the evolution of TDL
Design Requirements
TDL Architecture
Applications that interface with TDL
– Tufts DL search
– VUE
History of Digital Collections at Tufts
About Tufts
– Interdisciplinary
– Focus on teaching and learning
Digital Collections at Tufts
– Perseus (Classics)
– Tufts University Science Knowledgebase (TUSK-Medicine)
– Artifact (Art History)
– Digital Collections and Archives (DCA): Bolles, etc.
– Other (Crime and Punishment)
Projects, Materials, and Tools
Perseus DL
– Materials: 50 million words of highly structured, TEI-encoded XML texts of many types; 50,000 images
– Tools: Perseus document management system and tools
DCA
– Materials: 13 million words; 35,000 images; geospatial datasets; multimedia objects
– Tools: Perseus document management system and tools
TUSK
– Materials: 15,000 documents, including full-text syllabi, digital slide images, lecture recordings (audio and video), text notes, exam questions, evaluation forms, and bibliographies linked to full-text articles
– Tools: Networked course management system interface
Artifact
– Materials: 2500 images; links to the Art History slide collection database containing 120,000 entries
– Tools: On-demand viewing and searching with Internet-based adaptations of traditional learning aids, such as flashcards, for review and study
Why TDL? (Tufts Digital Library)
The collections were continuously expanding, adding content in a variety of formats. The architecture of these libraries was not built to accommodate such expansion.
Tufts needed a university-wide digital repository that could manage the ever-increasing content while continuing to serve discipline-specific needs and leveraging existing and new tools and services.
Designing TDL
Digital Collections and Archives partnered with Academic Technology to create a digital library that can manage the content while supporting teaching and learning.
Commitment to comply with standards in the library and the open source community.
Ensure Scalability, Flexibility, Reusability, Extensibility and Interoperability
Design Requirements
Ingest:
– Ability to enforce archival standards
Management:
– Use of information packages to facilitate storage and dissemination
– Ability to incorporate content models
Persistence:
– Use of persistent identifiers
– Mapped URNs
Requirements and the System Services that address them
– Unique and persistent identification of materials: Naming Service
– Use of Archival Information Packages (AIP): Digital Object Provider (DOP) Service -- Fedora
– Use of Submission Information Packages (SIP): Drop Box, Ingestion Service
– Use of Dissemination Information Packages (DIP): DOP Service
– Authentication and integrity checking: DOP Service
– Dissemination: Disseminators, Caching Service, Digital Library Application, Search Service
– Access: Search Service and other applications
Tufts DL Architecture
[Figure: Tufts DL architecture diagram. Components: Fedora, Drop Box, Fedora Ingestion Service, Application Creation Service, Search Indexing Service, Naming Service, Search Index, Search Interface, Application Data, Application Interface, Fedora Client. Actors: U - Users, M - Manager, A - Administrators.]
Components of TDL
Component: Role
– Drop Box and Ingestion Service: Validation, Tagging, Preprocessing, Ingestion
– Naming Service: Unique persistent identifiers mapped to objects (“tufts:dca:central:MS102.33.1345”)
– Fedora Repository: Management and access framework for digital objects
– Search and Indexing Service: Provides search mechanism
– Application Creation Service: Provides mechanism for external applications to interface with repository
TDL Architecture
– Drop Box and Ingestion Service
– Naming Service
– Fedora Repository Service at Tufts
– Indexing Service and Search Engine
– Application Creation Service
Drop Box and Ingestion Service
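The component list gives this service four roles: validation, tagging, preprocessing, and ingestion. The following is a minimal sketch of how such stages could be chained, not the TDL implementation. The `Submission` type, the stage functions, and what each stage actually does (rejecting empty payloads, tagging by file extension, recording size) are all hypothetical stand-ins; a plain list stands in for the repository.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Submission:
    """Hypothetical stand-in for an item placed in the drop box."""
    filename: str
    payload: bytes
    tags: Dict[str, str] = field(default_factory=dict)


Stage = Callable[[Submission], Submission]


def validate(sub: Submission) -> Submission:
    # Assumed rule: an empty payload is rejected before anything else runs.
    if not sub.payload:
        raise ValueError(f"empty submission: {sub.filename}")
    return sub


def tag(sub: Submission) -> Submission:
    # Assumed rule: tag the format from the file extension.
    if "." in sub.filename:
        sub.tags["format"] = sub.filename.rsplit(".", 1)[-1].lower()
    else:
        sub.tags["format"] = "unknown"
    return sub


def preprocess(sub: Submission) -> Submission:
    # Assumed rule: record the payload size for later use.
    sub.tags["size_bytes"] = str(len(sub.payload))
    return sub


def run_pipeline(sub: Submission, stages: List[Stage],
                 repository: List[Submission]) -> Submission:
    """Run every stage in order, then ingest (append) on success."""
    for stage in stages:
        sub = stage(sub)
    repository.append(sub)  # stands in for ingestion into the repository
    return sub
```

Because ingestion happens only after every stage returns, a submission that fails validation never reaches the repository.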
Naming Service
Assigns, reserves and resolves URNs
URN Format
– tufts:school name:owner:[collection:]item name
– Example: tufts:dca:central:MS102.33.1345
URN Properties
– Provides unique ID to objects deposited into repository
– Service assures resolution to unique resource.
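The URN format above can be illustrated with a small parser and a toy registry. This is a sketch under stated assumptions, not the Tufts Naming Service: it assumes the colon is the only separator and that no field contains a colon, and the `TuftsUrn` and `InMemoryNamingService` names, their methods, and the location strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class TuftsUrn:
    school: str
    owner: str
    item: str
    collection: Optional[str] = None  # optional, per the [collection:] field

    def __str__(self) -> str:
        parts = ["tufts", self.school, self.owner]
        if self.collection is not None:
            parts.append(self.collection)
        parts.append(self.item)
        return ":".join(parts)


def parse_urn(urn: str) -> TuftsUrn:
    """Split a URN into fields; four parts means no collection."""
    parts = urn.split(":")
    if parts[0] != "tufts" or len(parts) not in (4, 5) or not all(parts):
        raise ValueError(f"not a valid TDL URN: {urn!r}")
    if len(parts) == 4:
        _, school, owner, item = parts
        return TuftsUrn(school, owner, item)
    _, school, owner, collection, item = parts
    return TuftsUrn(school, owner, item, collection)


class InMemoryNamingService:
    """Toy registry: each URN maps to at most one resource location."""

    def __init__(self) -> None:
        self._locations: Dict[str, Optional[str]] = {}

    def reserve(self, urn: str) -> None:
        parse_urn(urn)  # reject malformed names early
        if urn in self._locations:
            raise ValueError(f"URN already reserved: {urn}")
        self._locations[urn] = None

    def assign(self, urn: str, location: str) -> None:
        if urn not in self._locations:
            self.reserve(urn)
        self._locations[urn] = location

    def resolve(self, urn: str) -> str:
        location = self._locations.get(urn)
        if location is None:
            raise KeyError(urn)
        return location
```

With this sketch, `parse_urn("tufts:dca:central:MS102.33.1345")` yields school `dca`, owner `central`, and item `MS102.33.1345`, with no collection.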
Fedora Repository Service@Tufts
Fedora - Key Features
Repository at Tufts
Content Models at Tufts
– Objects, Behaviors and Disseminators
Implementation Challenges
Flexible Extensible Data Object Repository Architecture (Fedora)
Support for heterogeneous data types
Accommodation of new types as they emerge
Aggregation of mixed, possibly distributed, data into complex objects
The ability to specify multiple content disseminations of these objects
The ability to associate rights management schemes with these disseminations
[Figure: request flow diagram. Components: Storage Device, Fedora, Processing Service, Caching Service, HTTP Server, User Applications. Link labels: High Bandwidth (20Mb TIFF), Medium Bandwidth (20Mb TIFF), Medium Bandwidth (200Kb JPEG), Internet Bandwidth (200Kb JPEG), HTTP Request. Annotation: stores URLs for User Applications.]
Repository Model
Content Model (CM) Hierarchy
Specific Implementations
(TEI text, EAD text, Encyclopedia, Directory, TIFF image, etc)
Text CM
•getTOC
•getChunksList
•getChunk
•Etc.
Image CM
•getThumbnail
•getAccessHigh
•getImageStats
•Etc.
Binary CM
•getObject
•getMIME
•Etc.
Collection CM
•getObjects
•getInfo
•Etc.
VUE CM
•getConceptMap
•getResource
•Etc.
Indexing Disseminators
•getIndexTerms
•getForIndexing
•Etc.
Repository-Level Disseminators
•getArchivalCopy
•getPreview
•getClass
•Etc.
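The hierarchy above can be read as a class hierarchy in which every object shares the repository-level and indexing disseminators, and each content model adds its own. The sketch below illustrates that reading only; it does not use the Fedora API. The class names, constructor arguments, and the bodies of each method (for example, treating the first line of a chunk as its table-of-contents entry) are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List


class RepositoryObject(ABC):
    """Repository-level disseminators shared by every content model."""

    def __init__(self, urn: str, raw: bytes) -> None:
        self.urn = urn
        self.raw = raw

    def get_archival_copy(self) -> bytes:
        return self.raw

    def get_class(self) -> str:
        return type(self).__name__

    @abstractmethod
    def get_preview(self) -> str: ...

    @abstractmethod
    def get_index_terms(self) -> List[str]: ...


class TextObject(RepositoryObject):
    """Text CM: chunked text with a table of contents."""

    def __init__(self, urn: str, chunks: List[str]) -> None:
        super().__init__(urn, "\n".join(chunks).encode("utf-8"))
        self.chunks = chunks

    def get_toc(self) -> List[str]:
        # Assumption: the first line of each chunk is its heading.
        return [chunk.split("\n", 1)[0] for chunk in self.chunks]

    def get_chunks_list(self) -> List[int]:
        return list(range(len(self.chunks)))

    def get_chunk(self, index: int) -> str:
        return self.chunks[index]

    def get_preview(self) -> str:
        return self.chunks[0][:80] if self.chunks else ""

    def get_index_terms(self) -> List[str]:
        return sorted({w.lower() for c in self.chunks for w in c.split()})


class BinaryObject(RepositoryObject):
    """Binary CM: opaque bytes plus a MIME type."""

    def __init__(self, urn: str, raw: bytes, mime: str) -> None:
        super().__init__(urn, raw)
        self.mime = mime

    def get_object(self) -> bytes:
        return self.raw

    def get_mime(self) -> str:
        return self.mime

    def get_preview(self) -> str:
        return f"{self.mime}, {len(self.raw)} bytes"

    def get_index_terms(self) -> List[str]:
        return [self.mime]


def build_index(objects: Iterable[RepositoryObject]) -> Dict[str, List[str]]:
    """Map each index term to the URNs of the objects that supplied it."""
    index: Dict[str, List[str]] = {}
    for obj in objects:
        for term in obj.get_index_terms():  # same call for every content model
            index.setdefault(term, []).append(obj.urn)
    return index
```

The indexer calls `get_index_terms` without knowing the content model, so a new content model can be added without changing `build_index`.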
Implementation Challenges
Processing Large XML Documents
Transforming Large Images
Modeling Collections
Advanced Search
Customized Search
Caching Disseminations
Indexing Service and Search Engine
Indexing
– Specialized Polymorphic Disseminators
Implementation
– Lucene
Supported Types of Search
– Basic Keyword
– Advanced metadata based
Accessing the service
– HTTP GET/POST
– SOAP
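As an illustration of the HTTP GET access path, the sketch below builds a request URL for either a basic keyword search or a metadata-based search. The slides do not give the service's endpoint or parameter names, so the base URL, the `q` parameter, and the metadata field names used here are assumptions chosen only for the example.

```python
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlencode


def build_search_url(base_url: str, keywords: str = "",
                     metadata: Optional[Dict[str, str]] = None) -> str:
    """Build a GET URL; 'q' and the metadata field names are hypothetical."""
    params: List[Tuple[str, str]] = []
    if keywords:
        params.append(("q", keywords))  # basic keyword search
    for field_name, value in sorted((metadata or {}).items()):
        params.append((field_name, value))  # advanced metadata search
    if not params:
        raise ValueError("a search needs keywords or at least one metadata field")
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint, used only to show the shape of a request.
basic = build_search_url("http://example.edu/search", keywords="greek tragedy")
advanced = build_search_url("http://example.edu/search",
                            metadata={"title": "Iliad", "creator": "Homer"})
```

The same parameters could be sent in a POST body instead; only the GET form is sketched here.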
Application Creation Service
An important design requirement for TDL was to let current digital library applications interface easily with TDL and provide seamless access to its content from within their own environments.
Current applications such as Perseus can interface with this service so that their tools can disseminate the content that resides in TDL.
The service has been designed to support current applications and also to accommodate the needs of future, yet-to-be-defined applications such as course management systems, learning tools, and portals.
Applications Accessing TDL Content
Tufts DL Search
Visual Understanding Environment (VUE)
Visual Understanding Environment (VUE)
[Figure: VUE Architecture. Diagram labels: VUE, OKI, FEDORA, DR API, DigitalRepository, OKI-FEDORA Bridge, Technical Infrastructure, DR Implementations.]
Why TDL? (Tufts Digital Library)
The collections are continuously expanding, adding content in a variety of formats. The current architecture of these libraries is not built to accommodate such expansion.
Tufts needs a university-wide digital repository that can manage the ever-increasing content while continuing to serve discipline-specific needs and leveraging existing and new tools and services.
Future Direction
Authentication and authorization service
Customization and enhancement of Fedora@Tufts to address a wide variety of needs
Automated browsing service for the repository