
NCSU Libraries

Digital Repository Projects at the

North Carolina State University Libraries

James Jackson Sanborn

Jim Tuttle

Open Repositories/DSpace User Group ’07


Early Repository Planning

• Digital Repository Planning Committee
• What it wouldn’t be (at least to start)
  – Distributed community structure
  – Open submission
  – ‘Institutional’ Repository

• What it would be (at least to start)
  – Library-managed collections
  – Building block for campus partnership
  – Learning opportunity


Repository Building Blocks

• NCSU Electronic Theses and Dissertations
  – Started 1997
  – Mandatory since 2002
  – Virginia Tech’s ETD-db
  – ~3,000 ETDs

• NCSU Authors Database
  – Started 1995
  – Access database with ColdFusion front end
  – ~22,000 citations


Repository Building Blocks (cont’d)

• Technical Reports Print Collection
  – Campus Institutes and Departments
  – Massive fall-off in print distribution

• Special Collections Research Center
  – Digitized texts and photographs
  – Campus Newsletters

• GIS Data
  – Library-managed/acquired data collection
  – Homegrown data layer database/discovery tools


Repository Plan

• Target ‘Research’ collections first
  – Technical Reports
  – ETDs
  – Faculty Publications/Citations

• Treat each collection as its own project

• Actively pursue common technological solutions


Technical Reports

• DSpace Application

• Lightly Customized

• Library Harvested
  – Local Cataloging/Metadata database
  – Scripted Ingest Object Creation
  – Batch Ingest

• Mix of ongoing submission by institute and departmental personnel and capture by the Library
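The slide pairs scripted ingest object creation with batch ingest. One format DSpace's batch importer accepts is the Simple Archive Format: one directory per item holding a `dublin_core.xml` file and a `contents` file that lists the bitstreams. The sketch below assumes that target and a hypothetical record layout (`dc` and `files` keys); it illustrates the idea and is not the Libraries' actual ingest script.

```python
"""Sketch: write technical-report records as a DSpace Simple Archive Format batch.

Assumption: each record uses a hypothetical layout,
{"dc": {(element, qualifier): [values]}, "files": [paths]}.
"""
import shutil
from pathlib import Path
from xml.etree import ElementTree as ET


def write_dublin_core(item_dir: Path, fields: dict) -> None:
    """Write dublin_core.xml with one <dcvalue> element per metadata value."""
    root = ET.Element("dublin_core")
    for (element, qualifier), values in fields.items():
        for value in values:
            dcvalue = ET.SubElement(root, "dcvalue", element=element,
                                    qualifier=qualifier or "none")
            dcvalue.text = value
    ET.ElementTree(root).write(str(item_dir / "dublin_core.xml"),
                               encoding="utf-8", xml_declaration=True)


def build_saf_batch(records: list, batch_dir: Path) -> list:
    """Create item_000, item_001, ... under batch_dir and return their paths."""
    created = []
    for i, record in enumerate(records):
        item_dir = Path(batch_dir) / f"item_{i:03d}"
        item_dir.mkdir(parents=True)  # raise rather than silently overwrite a batch
        write_dublin_core(item_dir, record["dc"])
        names = []
        for src in map(Path, record["files"]):
            shutil.copy2(src, item_dir / src.name)
            names.append(src.name)
        # The 'contents' file lists one bitstream filename per line.
        (item_dir / "contents").write_text("\n".join(names) + "\n", encoding="utf-8")
        created.append(item_dir)
    return created
```

The resulting batch directory would then be handed to DSpace's command-line item importer; the command name and options differ between DSpace releases, so check the documentation for the installed version.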


Tech Rep Screenshot


Technical Reports Item Detail


Electronic Theses & Dissertations

• Partnership with Graduate School

• Hybrid System: DSpace and ETD-db
  – ETD-db submission/approval/management
  – Direct database extract for DSpace Ingest Object creation
  – Scheduled Batch Ingest process

• DSpace Considerations/Alterations
  – Metadata Mapping
  – Author Browse (exclude contributor.advisor)
  – Various interface changes
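The ETD workflow maps records extracted from ETD-db onto DSpace metadata, and the author browse leaves out `contributor.advisor`. A minimal sketch of both rules follows; the ETD-db column names and the crosswalk are hypothetical placeholders, not the real ETD-db schema or the NCSU mapping.

```python
"""Sketch: ETD-db record -> DSpace metadata triples, plus an author-browse filter.

Assumption: the ETD-db column names and crosswalk below are hypothetical.
"""

# Hypothetical ETD-db column -> (DC element, qualifier) crosswalk.
CROSSWALK = {
    "title": ("title", None),
    "author": ("contributor", "author"),
    "advisor": ("contributor", "advisor"),
    "abstract": ("description", "abstract"),
    "degree_date": ("date", "issued"),
}


def map_record(row: dict) -> list:
    """Return (element, qualifier, value) triples; list-valued columns repeat."""
    triples = []
    for column, (element, qualifier) in CROSSWALK.items():
        value = row.get(column)
        if value is None:
            continue  # column absent or empty in this extract
        values = value if isinstance(value, list) else [value]
        triples.extend((element, qualifier, v) for v in values)
    return triples


def author_browse_values(triples: list) -> list:
    """Feed the author index with contributor.* values except advisors."""
    return [v for element, qualifier, v in triples
            if element == "contributor" and qualifier != "advisor"]
```

In DSpace itself the browse change is a matter of which fields feed the author index; the sketch only shows the filtering rule.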


ETD-DB screenshot


ETD DSpace screenshot


Faculty Publications

• Built on Existing Author Database
  – Rebuilt Authors DB from Access/ColdFusion to Oracle/PHP
    • Re-modeled data
    • Added Functionality
      – OpenURL
      – ‘Vita-like’ citation display
      – Full-text or submission links
  – Full-text stored in DSpace
    • Citation metadata and file exported by script
    • DSpace Identifier currently manually entered
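The rebuilt database adds OpenURL links to citations. A minimal sketch of building an OpenURL 1.0 (Z39.88-2004) key/encoded-value link for a journal article follows; the resolver address and the citation field names are hypothetical.

```python
"""Sketch: OpenURL 1.0 KEV link for a journal-article citation.

Assumptions: RESOLVER_BASE and the citation dict keys are hypothetical.
"""
from urllib.parse import urlencode

RESOLVER_BASE = "https://resolver.example.edu/openurl"  # hypothetical resolver


def openurl_for_article(citation: dict) -> str:
    params = [
        ("url_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
    ]
    # Map citation fields onto journal-format KEV keys, skipping blank values.
    for field, key in (("article_title", "rft.atitle"),
                       ("journal", "rft.jtitle"),
                       ("issn", "rft.issn"),
                       ("volume", "rft.volume"),
                       ("issue", "rft.issue"),
                       ("start_page", "rft.spage"),
                       ("year", "rft.date"),
                       ("author_last", "rft.aulast")):
        value = citation.get(field)
        if value:
            params.append((key, str(value)))
    # urlencode percent-escapes reserved characters such as ':' '/' and '&'.
    return RESOLVER_BASE + "?" + urlencode(params)
```

Skipping empty fields keeps links short and avoids sending the resolver blank keys it may mis-handle.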


Faculty Publications Schematic

Diagram components:
• Scholar
• Cataloging and Coll. Mgt.
• Web interface (PHP) to the Oracle Faculty Publications DB (citations)
• DSpace Java/JSP (full-text only), with files on the file system and metadata in PostgreSQL; DSpace Item Display and Web Submission Form
• Citation sources: ISI, Ann. Reps, etc.

Connection labels: Access; View full-text; S+R Citations; Add/Edit data; Handle IDs; Submit Citations and/or Text


FacPubs Search Screen


FacPubs result screenshot


FacPubs Item screenshot


Repository Governance

• Internal
  – Digital Repository Planning Committee
  – Data Repository Architect

• External
  – Faculty Repository Advisory Committee
  – Partnerships with departments and institutes


NCGDAP (North Carolina Geospatial Data Archiving Project): Overview

• NDIIPP: National Digital Information Infrastructure and Preservation Program

• Collaboration with the Library of Congress

• One of eight three-year projects studying long-term (50+ years) digital preservation

• Objective: engage existing state/federal geospatial data infrastructures in preservation

• Project approaches: technical and social


Repository Requirements

• Dim archive with possible future access
  – Minimal IR/access component

• Minimal repository imprint on data
  – Repository-agnostic ingest and export

• Simple digital curation functions
  – Periodic MD5 checksum validation
  – Structured metadata index

• Expected archived-data exchange
• Leverage existing investments
• Free software with an active community
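Periodic MD5 checksum validation amounts to re-hashing each archived file and comparing the result with the digest recorded at ingest. A minimal sketch using Python's `hashlib`, assuming a hypothetical manifest that maps relative paths to expected hex digests (DSpace also records a checksum for each bitstream, which could serve the same role):

```python
"""Sketch: periodic MD5 fixity check against a stored manifest.

Assumption: the manifest format {relative_path: expected_hex_digest} is hypothetical.
"""
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash in 1 MiB chunks so large GIS files are never read into memory whole."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate(root: Path, manifest: dict) -> dict:
    """Return {relative_path: 'ok' | 'mismatch' | 'missing'} for every manifest entry."""
    report = {}
    for rel, expected in manifest.items():
        target = Path(root) / rel
        if not target.is_file():
            report[rel] = "missing"
        elif md5_of(target) != expected.lower():
            report[rel] = "mismatch"
        else:
            report[rel] = "ok"
    return report
```

Run on a schedule, any entry that is not `ok` would be routed for manual review or restoration from another copy.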


Automation: Threat and format analysis, validation

Python wrappers for the following:

• Anti-virus – ClamAV

• Compressed files (tar, zip, gzip, bzip)

• At-risk formats

• Executable files (magic numbers)

• JHOVE validation
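Two items on this slide, compressed files and executables identified by magic numbers, come down to inspecting a file's leading bytes instead of trusting its extension. The sketch below shows that check with a small, assumed signature table; it does not wrap ClamAV or JHOVE, which handle virus scanning and format validation.

```python
"""Sketch: flag compressed and executable files by magic numbers.

Assumption: SIGNATURES is a small illustrative subset, not a complete registry.
"""
from pathlib import Path

# (offset, signature bytes, label, category)
SIGNATURES = [
    (0, b"PK\x03\x04", "zip", "compressed"),
    (0, b"\x1f\x8b", "gzip", "compressed"),
    (0, b"BZh", "bzip2", "compressed"),
    # tar is an archive container rather than compression; grouped as on the slide.
    (257, b"ustar", "tar", "compressed"),
    (0, b"MZ", "DOS/Windows executable", "executable"),
    (0, b"\x7fELF", "ELF executable", "executable"),
]


def sniff(path: Path) -> tuple:
    """Return (label, category) for the first matching signature, else (None, None)."""
    with open(path, "rb") as fh:
        head = fh.read(512)  # enough to cover the tar header's magic at offset 257
    for offset, magic, label, category in SIGNATURES:
        if head[offset:offset + len(magic)] == magic:
            return label, category
    return None, None
```

A file named `roads.shp` that sniffs as `executable` is exactly the kind of mismatch an extension check would miss.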


Automation: Archive package organization

• ESRI ArcGIS toolbar for selected formats


Automation: Archive package organization

• Rule-based Python logic
  – Filestem/extension relationships (multi-file format validation)
  – Directory structure
• Manual intervention
• NOID assignment
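Filestem/extension rules matter for multi-file formats such as the ESRI shapefile, whose `.shp` geometry is unusable without its `.shx` index and `.dbf` attribute table. A minimal sketch of that check follows; the rule table holds only the shapefile case and is an assumption, not the project's actual rule set.

```python
"""Sketch: group files by filestem and check multi-file formats for companions.

Assumption: REQUIRED_COMPANIONS is illustrative (shapefile only).
"""
from collections import defaultdict
from pathlib import Path

# Primary extension -> companion extensions that must share its filestem.
REQUIRED_COMPANIONS = {
    ".shp": {".shx", ".dbf"},
}


def group_by_stem(paths):
    """Map (directory, lowercased stem) -> set of lowercased extensions."""
    groups = defaultdict(set)
    for p in map(Path, paths):
        groups[(p.parent, p.stem.lower())].add(p.suffix.lower())
    return groups


def check_multifile(paths):
    """Return (directory, stem, missing_extensions) for each incomplete group."""
    problems = []
    for (parent, stem), exts in sorted(group_by_stem(paths).items()):
        for primary, companions in REQUIRED_COMPANIONS.items():
            if primary in exts:
                missing = companions - exts
                if missing:
                    problems.append((parent, stem, sorted(missing)))
    return problems
```

Incomplete groups are natural candidates for the manual-intervention step rather than automatic rejection.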


Metadata: Seed file form

• 'Transfer set' metadata capture in 'Seed file'
  – Communicates with DSpace backend, generates XML used to inform later scripts
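The seed file captures transfer-set metadata once, as XML, so that later scripts can read it instead of re-asking the operator. The slides do not give its schema, so every element name below is hypothetical, and the DSpace lookup that would supply the collection handle is omitted (the handle shown uses DSpace's default placeholder prefix).

```python
"""Sketch: write and read a 'seed file' for one transfer set.

Assumption: all element names and fields are hypothetical.
"""
from xml.etree import ElementTree as ET


def build_seed(transfer_id, agency, collection_handle, contact, received):
    """Serialize one transfer set's metadata as an XML string."""
    root = ET.Element("transferSet", id=transfer_id)
    for tag, text in (("agency", agency),
                      ("collectionHandle", collection_handle),
                      ("contact", contact),
                      ("received", received)):
        ET.SubElement(root, tag).text = text
    return ET.tostring(root, encoding="unicode")


def read_seed(xml_text):
    """What a downstream script might do: load the seed values into a dict."""
    root = ET.fromstring(xml_text)
    result = {child.tag: child.text for child in root}
    result["id"] = root.get("id")
    return result
```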


Metadata: Communities and Collections

• Search by type for 100+ communities
• Facilitates creation and reduces errors


Curation Processing

• At-risk format migration, original retained

• Agency-specific XML templates in ArcCatalog with synchronization flags

• Provenance and curation metadata scripted


Source Metadata Translation

• Repository-agnostic approach

• Spokes for each transformation

• Facilitates export from DSpace into other repositories

• Generate DSpace QDC, METS; populate Workflow database
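The hub-and-spoke idea is to normalize source metadata once into a neutral hub record and register one small transformation ("spoke") per output. A minimal sketch follows with two spokes, DSpace-style qualified Dublin Core and a flat row for a tracking database; the hub fields, spoke names and output shapes are illustrative assumptions, and a METS spoke is left out.

```python
"""Sketch: hub-and-spoke metadata translation.

Assumptions: hub field names, spoke names and output shapes are hypothetical.
"""
from xml.etree import ElementTree as ET

SPOKES = {}


def spoke(name):
    """Decorator that registers a transformation from the hub record."""
    def register(func):
        SPOKES[name] = func
        return func
    return register


@spoke("dspace_qdc")
def to_dspace_qdc(hub):
    root = ET.Element("dublin_core")
    for element, qualifier, key in (("title", "none", "title"),
                                    ("date", "issued", "date"),
                                    ("subject", "none", "keywords")):
        values = hub.get(key, [])
        for value in (values if isinstance(values, list) else [values]):
            ET.SubElement(root, "dcvalue", element=element,
                          qualifier=qualifier).text = value
    return ET.tostring(root, encoding="unicode")


@spoke("wmd_row")
def to_wmd_row(hub):
    # Flat row for a tracking database; column names are hypothetical.
    return {"noid": hub["noid"], "title": hub["title"],
            "keywords": ";".join(hub.get("keywords", []))}


def translate(hub, targets=None):
    """Run every registered spoke, or only the named ones."""
    names = targets or list(SPOKES)
    return {name: SPOKES[name](hub) for name in names}
```

Adding an export target means writing one more spoke; the hub record and the existing spokes stay untouched, which is what keeps the approach repository-agnostic.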


Extra-repository AIP management

• Workflow Management Database (WMD) populated as a spoke on the metadata/ingest hub

• External tracking of NOID, Handle, ISO keywords, other metadata for interaction with other systems

• Integrates with existing GIS Lookup tool
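Tracking identifiers outside DSpace lets other systems, such as a GIS lookup tool, resolve a NOID or keyword to a Handle without querying the repository. A minimal stand-in for that kind of table is sketched below; `sqlite3` and this schema are assumptions for illustration, since the slides do not describe the real WMD schema.

```python
"""Sketch: a tiny stand-in for the Workflow Management Database (WMD).

Assumptions: sqlite3 and the package/keyword schema are hypothetical.
"""
import sqlite3


def open_wmd(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS package (
            noid   TEXT PRIMARY KEY,
            handle TEXT UNIQUE,
            title  TEXT
        );
        CREATE TABLE IF NOT EXISTS keyword (
            noid    TEXT REFERENCES package(noid),
            keyword TEXT
        );
    """)
    return conn


def record_package(conn, noid, handle, title, keywords):
    with conn:  # commits on success, rolls back if any insert fails
        conn.execute("INSERT INTO package VALUES (?, ?, ?)", (noid, handle, title))
        conn.executemany("INSERT INTO keyword VALUES (?, ?)",
                         [(noid, k) for k in keywords])


def handles_for_keyword(conn, keyword):
    rows = conn.execute(
        "SELECT p.handle FROM package p JOIN keyword k ON k.noid = p.noid "
        "WHERE k.keyword = ? ORDER BY p.handle", (keyword,))
    return [handle for (handle,) in rows]
```

The keywords in the usage below are ISO 19115 topic categories; the NOIDs and Handles are placeholders.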


Repository Architecture Overview

Diagram components:
• PostgreSQL: one shared username, separate database for each app
• Asset Store/ATABeast: sub-directory for each DSpace app
• Repository Tomcat instance
• Tomcat: DSpace Internal
• Repository (DSpace): Technical Reports, ETDs
• Collections (DSpace): SCRC (Course Catalogs, Green ‘N’ Growing)
• Faculty Publications: PHP/DSpace hybrid
• NDIIPP (DSpace)
• SCRC (DSpace)


Upcoming Repository Related Projects

• Enhancements to current system
  – XTF search interface
  – Inter-archive exchange

• Digital Collections Repository
  – Special Collections Research Center
  – Other non-faculty collections

• Data Repository
  – Scientific data
  – Statistical resources


For More Information:

• James Jackson Sanborn

• Jim Tuttle

