Post on 23-Dec-2015

Digital Repository Service (DRS)

Harvard University Library OIS

Presented by: Wendy Gogel & Andrea Goethals

Today’s Agenda (April 26, 2010)

How DRS began
Building the collections
Where DRS is now
Where DRS is going

How DRS began

The formative years

November 1997: Library Digital Initiative (LDI) Proposal

“…create the first-generation technical infrastructure to support storage of and access to digital library materials.”

July 1998: LDI was approved and funded

December 1998: planning for DRS began

October 2000 launch

Digital Repository Service (DRS) provides a set of professionally managed services to ensure the usability of securely stored digital objects over time.

DRS is both a preservation and an access repository.

DRS is …

Technical infrastructure
  Deposit tools
  Delivery services
  Management tools
  Storage system

People
  Technical expertise and advice
  Content and system monitoring and management
  Preservation planning and activities
  User support and guidance

Content

Building the collections

Content

Programs & Projects

LDI Internal Challenge Grant Program: 1999-2007
Harvard Art Museum inventory project: 2005-2009
Open Collections Program: 2002-2010
Google Books project: 2005-2009
Web Archiving: 2007-ongoing

Content

Digitizing Facilities

Harvard College Library Imaging Services
HCL Fine Arts Library Digital Imaging Lab
Harvard Art Museum Digital Imaging and Visual Resources
Harvard College Library Audio Preservation Services
Peabody Museum of Archaeology and Ethnology

Metadata

Audio

Matins for Sunday after the Elevation of the Holy Cross

Laura Boulton (1899-1980) Collection of Byzantine and Orthodox Musics, Archive of World Music

One of a series of Byzantine hymns and liturgies recorded in a monastery on Patmos, 1960.

Logbook (Part I, p. 1-10)

Where DRS is now

DRS by the numbers

109 TB of content
  356 TB total (counting all copies)

15 M files
  Includes compressed archives; in reality closer to 707 M files
  857,000 compressed Google books containing 676 M files
  7,300 compressed web harvests containing 17.5 M web files
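The figures on this slide can be cross-checked with a little arithmetic. The sketch below is only an illustration: it assumes that the "closer to 707 M" figure comes from replacing each compressed archive with the files it contains, and that the ratio of total to unique terabytes gives the average number of stored copies. Neither reading is stated in the slides, and all constant and function names are hypothetical.

```python
# Figures quoted on the "DRS by the numbers" slide (M = million).
STORED_FILES = 15_000_000          # files as stored; each archive counts once
GOOGLE_ARCHIVES = 857_000          # compressed Google books
GOOGLE_INNER_FILES = 676_000_000   # files inside those archives
WEB_ARCHIVES = 7_300               # compressed web harvests
WEB_INNER_FILES = 17_500_000       # files inside those harvests


def expanded_file_count() -> int:
    """Assumed reading: swap each archive for the files it contains."""
    archives = GOOGLE_ARCHIVES + WEB_ARCHIVES
    return STORED_FILES - archives + GOOGLE_INNER_FILES + WEB_INNER_FILES


def average_copies(content_tb: float = 109, total_tb: float = 356) -> float:
    """Total storage divided by unique content (assumed interpretation)."""
    return total_tb / content_tb


if __name__ == "__main__":
    print(f"expanded files: {expanded_file_count() / 1e6:.1f} M")  # 707.6 M
    print(f"average copies: {average_copies():.2f}")               # 3.27
```

Under these assumptions the expanded count is about 707.6 million files, which agrees with the slide's "closer to 707 M", and the average works out to roughly 3.27 copies per stored terabyte.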

Format distribution: file count

[Pie chart: share of files by format]
JP2 (image/jp2): 40%
TEXT (text/plain): 19%
JPEG (image/jpeg): 16%
TIFF (image/tiff): 16%
ZIP (application/zip): 8%
Other formats listed in the legend: text/xml, audio/x-wave, application/x-gzip, image/x-photo-cd, audio/x-pn-realaudio, application/pdf, audio/x-aiff, application/x-icc, image/gif

Format distribution: file size

[Pie chart: share of stored bytes by format]
ZIP (application/zip): 53%
TIFF (image/tiff): 26%
JP2 (image/jp2): 16%
WAVE (audio/x-wave): 3%
JPEG (image/jpeg): 2%
Other formats listed in the legend: application/x-gzip, audio/x-pn-realaudio, audio/x-aiff, image/x-photo-cd, text/plain, text/xml, application/pdf, application/x-icc, image/gif
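As a rough illustration, the file-size shares can be turned into approximate terabytes per format. This sketch assumes the percentages apply to the 109 TB of unique content quoted on the "DRS by the numbers" slide; the slides do not say which total the chart was drawn from, the percentages are rounded, and the names below are hypothetical.

```python
CONTENT_TB = 109  # unique content from the "DRS by the numbers" slide (assumed base)

# Percent of stored bytes per format, as labeled on the file-size chart.
SIZE_SHARE_PERCENT = {"ZIP": 53, "TIFF": 26, "JP2": 16, "WAVE": 3, "JPEG": 2}


def estimated_tb(shares: dict[str, int], total_tb: float) -> dict[str, float]:
    """Scale each percentage share to terabytes, rounded to one decimal."""
    return {fmt: round(total_tb * pct / 100, 1) for fmt, pct in shares.items()}


if __name__ == "__main__":
    for fmt, tb in estimated_tb(SIZE_SHARE_PERCENT, CONTENT_TB).items():
        print(f"{fmt}: about {tb} TB")
```

On that assumption, ZIP archives would account for roughly 57.8 TB and TIFF images for roughly 28.3 TB. Because the chart percentages are rounded to whole numbers, these estimates are only good to about a terabyte.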

Content Growth

[Chart: DRS content in TB (vertical axis, 0 to 120) by month, from 2000-10 to 2010-02]

DRS Architecture

[Diagram: DRS components and the connections between them]
Components: DRS Web Admin Tools, Ingest Services, Delivery Services, Consistency Validation Service, Content Storage Service, Metadata Storage, Database
Connection labels: TCP/IP, NFS

DRS Architecture

[Diagram: storage sites, archive copies, and the services around them]
Sites: Site 1 Cambridge, Site 2 Boston, Site 3 Westborough
Disk archives: High use, copy 1; High use, copy 2; Low use, copy 1
Tape archives: High use, copy 3; High use, copy 4; Low use, copy 2; Low use, copy 3
Other storage labels: Media only, SAM/QFS
Services and tools: Load Balanced Delivery Services (shown twice), DRS Web Admin Tools, DRS Loader, BatchBuilder, SFTP Drop Boxes, Consistency Validation Service, Access Management Service, Name Resolution Service, Web Archiving Service, Metadata Storage, Database
Users and access points: Depositors; Catalogs, Web Sites, Google
Connection labels: TCP/IP, NFS
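The archive labels on the diagram describe how many copies each class of content receives. The sketch below models only what those labels state (media type, use class, copy number). It does not assign copies to sites, because the extracted text does not preserve that layout. The `Copy` type and the function name are hypothetical.

```python
from collections import Counter
from typing import NamedTuple


class Copy(NamedTuple):
    media: str      # "disk" or "tape"
    use_class: str  # "high" or "low"
    number: int     # copy number printed on the diagram


# One entry per archive label on the diagram.
COPIES = [
    Copy("disk", "high", 1), Copy("disk", "high", 2),
    Copy("tape", "high", 3), Copy("tape", "high", 4),
    Copy("disk", "low", 1),
    Copy("tape", "low", 2), Copy("tape", "low", 3),
]


def copies_per_class(copies: list[Copy]) -> dict[str, int]:
    """Count how many copies each use class has across disk and tape."""
    return dict(Counter(c.use_class for c in copies))


if __name__ == "__main__":
    print(copies_per_class(COPIES))  # {'high': 4, 'low': 3}
```

Read this way, high-use content has four copies (two on disk, two on tape) and low-use content has three (one on disk, two on tape). The ratio of 356 TB total to 109 TB of content from the earlier slide, about 3.27, falls between those two counts.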

DRS third-party components

Open source software
  Castor (XML to Java mapping)
  XX XML Validator
  Java Swing (UI toolset)
  iText PDF creator
  JHOVE
  Apache Lucene, Struts, Tomcat
  jQuery (JavaScript tools)
  Apache Log4j (logging)
  Giffy (TIFF-to-GIF converter)
  XML tools (Xerces, Xalan, JAXB, JDOM)
  Linux

COTS software
  Luratech Image Server
  Real Media Helix Streaming Server
  Oracle database
  SUN Solaris
  SUN SAM/QFS Storage Archive Manager

Common OIS software
  Access Management Service
  Name Resolution Server

Open Source Berkeley DB

Where DRS is going

DRS 2

Why?

1. To better support digital preservation planning & activities
2. To better support operational & collection management needs of DRS depositors, collection managers, library administrators & repository staff

DRS 2 Process

Phases of work
  DRS 2.1, 2.2, 2.3, etc.

Themed phases
  DRS 2.1: “Object Security and Integrity”
  DRS 2.2: “Management and Monitoring”
  DRS 2.3: “Delivery and Dissemination”

Includes support for new formats
  DRS 2.1: PDFs, opaque objects
  DRS 2.2: more audio formats (MP3, MP4/AAC)
  DRS 2.3: drawings, dissemination

DRS 2 Deliverables

New backend
New deposit tools
New management tools
New dissemination tools
Support for new formats

DRS 2 Timing

July 2010 (for testing):
  New backend
  New deposit environment (5 object types)

August 2011, release:
  New backend
  New deposit environment (16 object types)
  New management interface
  Enhanced rights metadata
  Enhanced audio support
  Migrated content

DRS 2 Timing

Spring 2012, release:
  Dissemination
  Support for drawing formats

Questions?