ARROW Update

4 November 2005

The ARROW team

Welcome to ARROW

What we will cover

- Background and context
- Demonstration
- Issues in populating the repository
- Putting the pieces together
- What is yet to come
- Practicalities – what you need to implement it
- Pricing and contacts

Why have a repository?

- Provides a platform for promoting research output in the ARROW context
- Safeguards digital information
- Gathers an institution’s research output into one place
- Provides consistent ways of finding similar objects
- Allows information to be preserved over the long term
- Allows information from many repositories to be gathered and searched in one step
- Enables resources to be shared, while respecting access constraints (when software allows access controls)
- Enables effective communication and collaboration between researchers

What you need: Policies to populate

Mandated thesis deposit

Research framework

Incentives

ARROW - Summary of design criteria

- A generalised institutional repository solution for research information management
- Initial focus on managing and exposing traditional “print equivalent” research outputs
- Expanded to managing other digital research outputs
- Design decisions accommodate management of other digital objects such as learning objects and research inputs such as large data sets
- DEST Research reporting and audit, and Research Quality Framework likely to drive deposit of content by academics and research managers in ARROW universities

ARROW Branded Services Profile

Internet

ARROW Web Site

Project Information

National Library of Australia

Swinburne

UNSW

Monash

ARROW Repository

Digital Object Storage using Fedora & VITAL

Members only area for Meeting Minutes etc

National Library of Australia

ARROW Resource Discovery Service

Using TeraText to index metadata harvested by OAI PMH

ARROW Open Access Journal Publishing System

Using OJS from Public Knowledge Project

Internet Search Engines

indexing content specifically exposed by ARROW Repositories

Australian Digital Theses Program

Australian Theses Discovery Service

Using metadata harvested by OAI PMH

Research Management Systems

Sharing descriptive metadata and linking from an RMS to the research publications
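The discovery services in this profile rely on metadata harvested by OAI-PMH. As a rough illustration of what a harvester sends and receives, the sketch below builds a `ListRecords` request and pulls titles out of a response. The base URL, the function names and the sample response are assumptions made up for this example; only the `verb`, `metadataPrefix` and `from` parameters and the `oai_dc` format come from the OAI-PMH protocol itself.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical endpoint; the slides do not give an OAI-PMH base URL.
BASE_URL = "http://repository.example.edu/oai"

DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # selective harvesting by datestamp
    return base_url + "?" + urlencode(params)


def titles_from_response(xml_text):
    """Extract every dc:title value from a ListRecords response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter("{%s}title" % DC_NS)]


# A trimmed, made-up response used only to exercise the parser.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample thesis</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url(BASE_URL, from_date="2005-11-01"))
print(titles_from_response(SAMPLE_RESPONSE))
```

A real harvester would also follow resumption tokens and handle protocol errors, both of which are left out here.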

ARROW Technology – Software Selected

- Flexible Extensible Digital Object Repository Architecture (Fedora™): http://fedora.info
  - Cornell and University of Virginia
  - ARROW a founding member of the Fedora Development Consortium
- VITAL from VTLS Inc: http://www.vtls.com
  - ARROW / VTLS partnership to take the Fedora “engine” and construct a working repository to meet ARROW’s functional requirements using VITAL and open source web services
  - Sustainability through vendor support
- Open Journal Systems (OJS) from Public Knowledge Project (University of British Columbia): http://www.pkp.ubc.ca/ojs/
  - for open access journal publishing

Repositories: Open Source Software and Sustainability

- The business case for open source software is not clear cut…
- Red Hat model - “manageable” open source software for a fee
- Need for reasonable level of technical expertise
- Complete self reliance
- Reliance on a consortium of users of a particular product
- Total cost of ownership is difficult to calculate
- Open source software is suited as an environment for preservation – no software features are buried in proprietary encodings which compromise the ability to extract content
- Proprietary software has advantage of support

Open Source criteria

- Sustainable as a long-term archive
- Allows customisation
- Ensures migration path
- Supports OAI-PMH
- Supports a range of metadata standards

Why Fedora?

ARROW needed something as a platform to develop its own application(s)

ARROW wanted a flexible object-oriented data model

ARROW wanted to be able to have persistent identifiers down to the level of individual datastreams, accommodating its compound content model

ARROW wanted to be able to version both content and disseminators (which can be thought of as software behaviours for content)

ARROW required clean and open exposure of APIs with well-documented SOAP/REST web services.

Fedora satisfied these requirements

What we have achieved so far

VITAL Manager ingests content and metadata edited externally with XMLSpy, and edits existing content/metadata

Web submission for theses (expanding to new resource types)

Batch Ingest matching metadata and digital objects

Institutional search

National search on National Discovery Service

Ability to specify indexing

[Diagram: content flows into ARROW through ingest and out through export, web access and harvest]

INGEST
- Ingest routes: VITAL Client, Webform, Batch, Authorised self submit (e.g. research mgmt tools), Bibliographic Sources
- Payloads: metadata only, or metadata + docs
- Synchronising between ARROW and external resources
- Validating data, adding evidentiary material

EXPORT
- Export metadata and docs, or export metadata only
- Migration to 3rd party METS compliant tool, e.g. new repository, research mgmt tool
- DEST reporting
- Inclusion in 3rd party tool, e.g. Excel, EndNote
- Academic use for CV
- Faculty production of webpages

WEB ACCESS
- Web access to documents and web access to metadata
- Research
- DEST audit
- Researcher citing for offprints, or for comment, or for refereeing

HARVEST
- National Discovery search
- DEST search
- Researcher search for topics and research

ARROW – many different kinds of objects

JPEG images
- Composite weather image of a tropical cyclone (Creator: NASA)
  http://arrowdev.lib.monash.edu.au/hdl/1959.100/459
  or http://hdl.handle.net/1959.100/459
- Satellite image of Victoria and Northern Tasmania (Creator: NASA)
  http://arrowdev.lib.monash.edu.au/hdl/1959.100/457

MrSID images with navigation
- ag050009: Victoria Dock, circa 1910 and 1942
  http://arrowdev.lib.monash.edu.au/hdl/1959.100/507
- ag050002: Victoria Dock, 1972 and 2002
  http://arrowdev.lib.monash.edu.au/hdl/1959.100/516
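The first JPEG example shows the same handle, 1959.100/459, behind two URLs: one on the local ARROW server and one on the global hdl.handle.net resolver. A minimal sketch of that relationship follows; the function and constant names are hypothetical, and only the two URL patterns are taken from the slide.

```python
# Hypothetical helper names; the two URL prefixes are the ones shown on the slide.
LOCAL_RESOLVER = "http://arrowdev.lib.monash.edu.au/hdl/"
GLOBAL_RESOLVER = "http://hdl.handle.net/"


def handle_urls(handle):
    """Return (local, global) resolver URLs for a handle such as '1959.100/459'."""
    prefix, sep, suffix = handle.partition("/")
    if not (prefix and sep and suffix):
        raise ValueError("expected a handle of the form prefix/suffix: %r" % handle)
    return LOCAL_RESOLVER + handle, GLOBAL_RESOLVER + handle


def handle_from_url(url):
    """Recover the handle from either URL form."""
    for base in (LOCAL_RESOLVER, GLOBAL_RESOLVER):
        if url.startswith(base):
            return url[len(base):]
    raise ValueError("not a recognised handle URL: %r" % url)


local_url, global_url = handle_urls("1959.100/459")
print(local_url)
print(global_url)
```

Storing the bare handle rather than either URL keeps a citation valid if the local server name changes, which is the point of a persistent identifier.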

ARROW demonstration

- Text and supporting images. Title: History Australia, Volume 2, No. 1, 2004. Ferals and their muddies: Making a home in the bush
  http://arrowdev.lib.monash.edu.au/hdl/1959.100/418
- Text only. Thesis title: To define the ways in which VITAL will support the creation of a range of branded interfaces, showing either all of the repository or particular subsets
  http://arrowdev.lib.monash.edu.au/hdl/1959.100/486

ARROW demonstration

- XML plus images: Melbourne 2030: Chapter 5 - Residential infill and its threat to Melbourne's liveability
  http://arrowdev.lib.monash.edu.au/hdl/1959.100/283
- MPEG movie: Medical computer animation #20
  http://arrowdev.lib.monash.edu.au/hdl/1959.100/586
- QuickTime movie: Medical computer animation #10
  http://arrowdev.lib.monash.edu.au/hdl/1959.100/566
- MP3 audio: Ash Grunwald, Bakelite Radio and Blues Progression (5 MB)
  http://arrowdev.lib.monash.edu.au/hdl/1959.100/571
- AVI movie: Fantastic Four, movie trailer (WARNING: large file, 16 MB; DivX codec must be installed first to view)
  http://arrowdev.lib.monash.edu.au/hdl/1959.100/551

ARROW – a demonstration

Ingest and management

The Web ingest tool allows users to submit and describe documents, while the Web Review tool allows a staged review process

The VITAL Manager: a 3rd generation repository application allowing ingest and edit

The batch ingest tool allows ingest of objects as well as their matching metadata

Batch Ingest – power ingest!

A set of command line scripts that uses a series of XML files to define the import/ingest of large batches of like document types and their metadata. Complex functionality to master, but rewarded with great flexibility in ingest options.

Each batch job requires 3 files:
- Configuration file
- Model Definition file
- An XSL Style Sheet to convert the metadata into DC

Configuration File

Defines the parameters required to configure a batch import

- Model definition file
- Location of content files
- Matching parameter to link up datastreams
- Fedora connection information
- Test only status
- Handles assignment

Model Definition file

Declares object model for the Fedora object created by the batch import

- Number of datastreams
- MIME type of datastreams
- Status of datastream
- Datastream label
- Defines style sheet
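To make the role of the model definition concrete, here is a small sketch of the information it carries, written as Python data classes rather than in VITAL's own XML format (which these slides do not show). All class and field names, and the status value "A", are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatastreamSpec:
    label: str          # datastream label
    mime_type: str      # MIME type of the datastream
    status: str = "A"   # assumed status code, used here only as a placeholder


@dataclass
class ModelDefinition:
    stylesheet: str     # XSL style sheet used to produce Dublin Core
    datastreams: List[DatastreamSpec] = field(default_factory=list)

    @property
    def datastream_count(self):
        return len(self.datastreams)

    def validate(self, found_mime_types):
        """Return a list of problems if an item does not match the model."""
        expected = [ds.mime_type for ds in self.datastreams]
        problems = []
        if len(found_mime_types) != len(expected):
            problems.append(
                "expected %d datastreams, found %d"
                % (len(expected), len(found_mime_types))
            )
        for want, got in zip(expected, found_mime_types):
            if want != got:
                problems.append("expected %s, found %s" % (want, got))
        return problems


# Hypothetical thesis model: one PDF plus its native metadata record.
thesis_model = ModelDefinition(
    stylesheet="thesis_to_dc.xsl",
    datastreams=[
        DatastreamSpec("Thesis PDF", "application/pdf"),
        DatastreamSpec("Native metadata", "text/xml"),
    ],
)
print(thesis_model.datastream_count)
print(thesis_model.validate(["application/pdf"]))
```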

XSL Style sheet

- Maps the native metadata tags into Dublin Core
- DC is used internally in Fedora and is the metadata payload for the OAI harvest files
- Currently using specific XSL per batch job
- Future: use OCLC interoperable core tools
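The mapping step can be pictured with a short sketch. ARROW does this with an XSL style sheet per batch job; the Python below is a stand-in for illustration only, not the VITAL implementation, and the native tag names in `TAG_MAP` are invented. The two namespace URIs are the standard Dublin Core and `oai_dc` ones.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# Hypothetical native tag names mapped to Dublin Core elements.
TAG_MAP = {
    "thesis_title": "title",
    "author": "creator",
    "year": "date",
    "abstract": "description",
}


def native_to_dc(native_xml):
    """Map a flat native record onto an oai_dc:dc element."""
    ET.register_namespace("dc", DC_NS)
    ET.register_namespace("oai_dc", OAI_DC_NS)
    dc_root = ET.Element("{%s}dc" % OAI_DC_NS)
    for child in ET.fromstring(native_xml):
        dc_name = TAG_MAP.get(child.tag)
        if dc_name is None:
            continue  # unmapped native tags are dropped in this sketch
        el = ET.SubElement(dc_root, "{%s}%s" % (DC_NS, dc_name))
        el.text = (child.text or "").strip()
    return dc_root


record = (
    "<record>"
    "<thesis_title>Sample thesis</thesis_title>"
    "<author>A. Student</author>"
    "<note>internal</note>"
    "</record>"
)
dc = native_to_dc(record)
print(ET.tostring(dc, encoding="unicode"))
```

Keeping the mapping in one table per document type mirrors the "specific XSL per batch job" approach: each new source format needs its own table, which is the cost the OCLC interoperability tools are meant to reduce.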

Batch Import Process

- Content directory: location of the ingest target content
- Configuration file
- Model Definition file: defines structure of Fedora object to be built for each item ingested
- Style Sheet file: XSL style sheet to convert native metadata into DC
- Shell Script (ingests into ARROW)

Software tools - XML editor

- Scripts must be edited for each new batch ingest
- VITAL delivered with sample scripts for common ingest types
  - theses
  - images
  - XML
- Samples can be customised
- XML editing
  - Con: Complex
  - Pro: Very flexible

FTP transfer program

Remote terminal connection service

Test mode

- Runs script and does everything except assign handles and load into the repository
- Shows that
  - Batch load has processed all files
  - Object model matches model definition
  - Metadata and content files are matched
- Log files generated for each batch load
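The matching check in test mode can be sketched as a dry run that pairs files without ingesting anything. The slides mention a "matching parameter" but do not define it, so pairing by file name stem is an assumption here, as are the function name, directory layout and suffixes.

```python
from pathlib import Path


def dry_run(content_dir, metadata_dir, content_suffix=".pdf", metadata_suffix=".xml"):
    """Pair content and metadata files by file stem; nothing is ingested."""
    content = {p.stem: p for p in Path(content_dir).glob("*" + content_suffix)}
    metadata = {p.stem: p for p in Path(metadata_dir).glob("*" + metadata_suffix)}
    return {
        # stems present on both sides would be ingested in a real run
        "matched": sorted(content.keys() & metadata.keys()),
        # anything left over would be reported in the batch log
        "content_without_metadata": sorted(content.keys() - metadata.keys()),
        "metadata_without_content": sorted(metadata.keys() - content.keys()),
    }
```

Called as `dry_run("content", "metadata")`, it returns a report that can be written to the batch log before the job is run for real.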

Output

- Can process from one to one million records*
- Have successfully tested with many hundreds of files
- Can be executed by a “cron job” – overnight run
- Intend to use Batch Ingest to process approx 1,500 files of Monash content
- Complex functionality to master but rewarded with great flexibility in ingest options

*Theoretical Fedora upper limit

Partner showcase

All partners are gathering research content, but they are also considering other collections for ARROW

Monash
- PhD theses
  - Mandatory collection of all new theses July 2005
  - Retrospective digitisation of all post 2000 PhD theses
- Indigenous Linguistical Research Resources Portal
  - Related Monash rare books content digitised and stored in ARROW
  - Access to material through portal using XACML access control to manage researcher and community access
- Centre for Gippsland Studies Picture Collection
  - 1000+ historical images in JPEG2000 format

Partner showcase

ARROW@UNSW
- School of Biological, Earth & Environmental Sciences (BEES) is trialing the deposit of honours theses by final year students in 2005
- School of Mining Engineering trial: academic and research staff (including postgraduate students) are trialing the deposit of research publications until the end of 2005
- Centre for Health Informatics
  - 18,000 medical x-ray images

Partner showcase

Swinburne University
- Journal of Applied Psychology (ejap)
- Australian Journal of Educational Technology in Society
- Investigating automatic ingest from OJS into ARROW in co-operation with NLA
- Entire research output from Institute for Social Research since 2001

National Library of Australia
- Harold White paper - Independent scholars working papers
- NLA staff papers
- Archive of research related emails

Lunch break

ARROW: an Information Management Tool to meet Government reporting requirements

- Around 30-40% of Australian university research funding comes from government
- At present an annual statistical return is required and audit evidence of research outputs is compiled as collections on paper
- ARROW can improve the efficiency of this process as an information management tool
- In the Monash context this will capture 4000 publications annually

Research Quality Framework (RQF)

- Integration with Research Management Systems (currently: RM4)
- Store Research Objects (with associated metadata)
  - Using open supported standards
  - Currently: METS
  - Future: XML based formats (MPEG21 DIDL etc)
- Provide Persistent Links (Handles)
- Provide Secure Access (XACML)
- Expose Research Digital Objects (Google, National Discovery Service etc)

Key Activities

- Gathering evidence
- Internal auditing and analysis
  - Who? What? Information accuracy? Timing? Ethics? Impact?
- Preparing the report
- Submission
- Assessment
- Commenting

Working with Research Management Systems (Gathering information)

[Diagram: New Research Information, New Research Objects, Old Research Information, Old Research Objects]

Working with Research Management Systems (Recording information)

[Diagram: New and Old Research Information and Research Objects; Research Management and Analysis Software]

Working with Research Management Systems (Additional information)

[Diagram: New and Old Research Information and Research Objects; Research Management and Analysis Software]

- Research staff information
- Research students information
- Research grants information, etc

Working with Research Management Systems (Depositing into ARROW)

[Diagram: Research Management and Analysis Software; Research Objects + subset of Research Information; ARROW; New and Old Research Information and Research Objects]

Working with Research Management Systems (Handles returned)

[Diagram: Research Management and Analysis Software; ARROW; Handles; New and Old Research Information and Research Objects]

Example: http://arrowdev.lib.monash.edu.au/hdl/1959.100/630

Working with Research Management Systems (Evidence Portfolios)

[Diagram: Research Management and Analysis Software; ARROW; Evidence Submission; RQF Evidence Portfolios; New and Old Research Information and Research Objects]

Working with Research Management Systems (Handles link back to stored Research Objects)

[Diagram: Research Management and Analysis Software; ARROW; RQF Evidence Portfolios; New and Old Research Information and Research Objects]

ARROW’s role in supporting the RQF

If no Research Management System, ARROW will enable you to:
- Store Research Objects (with associated metadata)
  - Using open supported standards
  - Currently: METS
  - Future: XML based formats (MPEG21 DIDL etc)
- Provide Persistent Links (Handles)
- Provide Secure Access (XACML)
- Expose Research Digital Objects (Google, National Discovery Service etc)

(Recording Research Information)

[Diagram: ARROW; New and Old Research Information; New and Old Research Objects]

(Handles provided and information analysis)

- Information compilation, auditing, analysis and report preparation
- Handles
- Gather additional information:
  - Research outputs
  - Research staff information
  - Research students information
  - Research grants information, etc

[Diagram: ARROW; New and Old Research Information; New and Old Research Objects]

(Reporting Evidence Portfolios)

[Diagram: collecting New and Old Research Information and Research Objects; ARROW; RQF Evidence Portfolios; RQF Reporting (submission)]

(Linked back to your Research Objects)

[Diagram: collecting New and Old Research Information and Research Objects; ARROW; Handles; RQF Evidence Portfolios; RQF Reporting]

RQF Assessment by DEST (DEST collection)

[Diagram: RQF Evidence Portfolios; DEST RQF Collection]

RQF Assessment (DEST informs Assessment Panels)

[Diagram: RQF Evidence Portfolios; DEST RQF Collection; RQF Assessment Panels]

RQF Assessment (Panels View Evidence)

Handles enable viewing of Research Digital Objects in Digital Repositories

[Diagram: RQF Assessment Panels; DEST RQF Collection; RQF Evidence Portfolios under review]

RQF Assessment (Panels Comment on Evidence)

[Diagram: RQF Assessment Panels; DEST RQF Collection; RQF Evidence Portfolios under review; Annotations]

RQF Assessment (Viewing annotations)

[Diagram: RQF Assessment Panels; RQF Evidence Portfolios; Annotations; DEST RQF Collection; ARROW users]

In Summary

ARROW supports the RQF

- Storing Research
- Exposing Research
- Allowing commenting and annotations

To come…

Soon
- Interface with Research Master and other research management tools
- Web ingest for other content types – alpha software now being tested at: http://arrowdev.lib.monash.edu:8000/cgi-bin/valet-1.1/submit.cgi
- Enhanced user interface with browse capabilities

Early 2006
- XACML access control at object and datastream levels
- Support for OAI Sets for metadata harvesting

Mid 2006
- Generalised content model management
- Metadata interoperability

What you need: Infrastructure

Recommended Server hardware

| Operating Platform | Processor Speed | Memory (Minimum Requirement) | Memory (Recommended Requirement) | Hard Disk Space (for typical 1000 object collection) | Max # of Users |
|---|---|---|---|---|---|
| Windows | 2 GHz | 1 GB RAM | 2 GB RAM | 20 GB | 128 (min config), 256 (max config) |
| Red Hat Linux Enterprise AS | 2 GHz | 1 GB RAM | 2 GB RAM | 20 GB | 128 (min config), 256 (max config) |
| Solaris | 650 MHz | 512 MB RAM | 1 GB RAM | 20 GB | 64 (min config), 128 (max config) |

Recommended Server Software

- Java Virtual Machine Runtime: JRE 1.4.2
- Databases: Oracle, MySQL, McKoi
- Web Server: Apache 1.3.22+

What you need: Staff, training, knowledge

Technical

Administrative

Marketing and advocacy

Training

Support: ARROW Users Group and Forum

Contracts and licensing

Monash is the lead institution, and has a Head License Agreement with VTLS

Monash arranges a sublicense to (new) Additional Partners

Additional Partners enter into a separate maintenance agreement directly with VTLS for support of the Software

Monash coordinates software development

VTLS installs, trains and supports

System and training documentation are made available by VTLS

An ARROW users group will be established

Pricing

Vital

XMLSpy

Others

Appendices

How the ARROW components fit together

Fedora: the open source storage layer

VTLS: commercial and open source application layer

Handles: a universal persistent identifier

Content models: a consistent way of describing resource types

Templates: a method for creating valid MARCXML metadata

Standards: MARCXML, Dublin Core, METS

Collections: explicit and results sets

Exchange: relationships with RM4, EndNote etc.

Example of a MARCXML template using XMLSpy

Compound vs. atomistic object model

- Atomistic – a data object with one or more content datastreams that are all considered primary to the object.
- Compound – a data object consisting of multiple content datastreams that are not all primary to the object.

[Diagram labels: Rels-Ext, Rels-Int, Content]

ARROW has chosen the compound object model

SM1: System Meta

Fedora PID

OMS1:Object metadata

DS1:ExternalUniqueID

CS1:Pub Body 1

CMS1:Body 1 metadata

CS2: Pub Body 2

CS4:WebPages

CMS3:Images metadata

CS3: Images

CS6:Bibliography

CMS4: Web metadata

CMS6:Bibliog metadata

CMS7: Evidence metadata

RELS-INT: RELS-INT

DC: Dublin Core

CMS2: Body 2 meta

CS5: Multimedia

CMS5: multimedia metad

CS17: Evidence

DS18: Native Metadata

Compound object

Each object in the repository comprises two or more datastreams.

One object may contain many different kinds of files.
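As a rough picture of the compound model, the sketch below represents one repository object holding several datastreams of different kinds. The class names, the PID value and the MIME types are assumptions for illustration; the datastream ids and labels echo those on the slide, and the "two or more datastreams" rule is the one stated above.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Datastream:
    ds_id: str        # e.g. "CS1"
    label: str        # e.g. "Pub Body 1"
    mime_type: str    # assumed; MIME types are not shown on the slide


@dataclass
class CompoundObject:
    pid: str          # Fedora PID (the value used below is made up)
    datastreams: Dict[str, Datastream] = field(default_factory=dict)

    def add(self, ds):
        """Attach a datastream, rejecting duplicate ids."""
        if ds.ds_id in self.datastreams:
            raise ValueError("duplicate datastream id: " + ds.ds_id)
        self.datastreams[ds.ds_id] = ds

    def is_valid(self):
        """Each object comprises two or more datastreams."""
        return len(self.datastreams) >= 2

    def mime_types(self):
        """The distinct kinds of files held by this one object."""
        return sorted({ds.mime_type for ds in self.datastreams.values()})


obj = CompoundObject(pid="demo:1")
obj.add(Datastream("DC", "Dublin Core", "text/xml"))
obj.add(Datastream("CS1", "Pub Body 1", "application/pdf"))
obj.add(Datastream("CS3", "Images", "image/jpeg"))
print(obj.is_valid(), obj.mime_types())
```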

OCLC Metadata Interoperability Core

From: Godby, Smith and Childress. 2003. “Two paths to interoperable metadata”, p. 3, at http://www.oclc.org/research/publications/archive/2003/godby-dc2003.pdf

