ARROW Update, November 2005
Welcome to ARROW
What we will cover
Background and context
Demonstration
Issues in populating the repository
Putting the pieces together
What is yet to come
Practicalities – what you need to implement it
Pricing and contacts
Why have a repository?
Provides a platform for promoting research output in the ARROW context
Safeguards digital information
Gathers an institution’s research output into one place
Provides consistent ways of finding similar objects
Allows information to be preserved over the long term
Allows information from many repositories to be gathered and searched in one step
Enables resources to be shared, while respecting access constraints (when software allows access controls)
Enables effective communication and collaboration between researchers
What you need: Policies to populate
Mandated thesis deposit
Research framework
Incentives
ARROW - Summary of design criteria
A generalised institutional repository solution for research information management
Initial focus on managing and exposing traditional “print equivalent” research outputs
Expanded to managing other digital research outputs
Design decisions accommodate management of other digital objects such as learning objects and research inputs such as large data sets
DEST research reporting and audit, and the Research Quality Framework, are likely to drive deposit of content by academics and research managers in ARROW universities
ARROW Branded Services Profile
Internet
ARROW Web Site
Project Information
National Library of Australia
Swinburne
UNSW
Monash
ARROW Repository
Digital Object Storage using Fedora & VITAL
Members-only area for meeting minutes etc.
National Library of Australia
ARROW Resource Discovery Service
Using TeraText to index metadata harvested by OAI-PMH
ARROW Open Access Journal Publishing System
Using OJS from Public Knowledge Project
Internet Search Engines
indexing content specifically exposed by ARROW Repositories
Aust Digital Theses Program
Australian Theses Discovery Service
Using metadata harvested by OAI-PMH
Research Management Systems
Sharing descriptive metadata and linking from an RMS to the research publications
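Several of the services in this profile depend on metadata harvested by OAI-PMH. As a rough sketch only, a harvester's ListRecords request could be built as below. The `verb`, `metadataPrefix`, `from` and `set` arguments are standard OAI-PMH request arguments; the base URL and the function name are hypothetical, since the slides do not give the ARROW endpoint.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the real ARROW OAI-PMH base URL is not given in these slides.
BASE_URL = "http://repository.example.edu/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL for Dublin Core records."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Selective harvesting: only records changed since this date.
        params["from"] = from_date
    if set_spec:
        # Optional set restriction, if the repository exposes sets.
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


print(list_records_url(BASE_URL, from_date="2005-11-01"))
```

A discovery service would fetch this URL, index the returned records, and repeat the request with the resumption token while the repository keeps returning one.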
ARROW Technology – Software Selected
Flexible Extensible Digital Object Repository Architecture (Fedora™) http://fedora.info
  Cornell and University of Virginia
  ARROW a founding member of the Fedora Development Consortium
VITAL from VTLS Inc http://www.vtls.com
  ARROW / VTLS partnership to take the Fedora “engine” and construct a working repository to meet ARROW’s functional requirements using VITAL and open source web services
  Sustainability through vendor support
Open Journal Systems (OJS) from Public Knowledge Project (University of British Columbia) http://www.pkp.ubc.ca/ojs/
  for open access journal publishing
Repositories: Open Source Software and Sustainability
The business case for open source software is not clear cut…
Red Hat model: “manageable” open source software for a fee
Need for reasonable level of technical expertise
Complete self reliance
Reliance on a consortium of users of a particular product
Total cost of ownership is difficult to calculate
Open source software is suited as an environment for preservation: no software features are buried in proprietary encodings which compromise the ability to extract content
Proprietary software has the advantage of support
Open Source criteria
Sustainable as a long-term archive
Allows customisation
Ensures migration path
Supports OAI-PMH
Supports a range of metadata standards
Why Fedora?
ARROW needed something as a platform to develop its own application(s)
ARROW wanted a flexible object-oriented data model
ARROW wanted to be able to have persistent identifiers down to the level of individual datastreams, accommodating its compound content model
ARROW wanted to be able to version both content and disseminators (which can be thought of as software behaviours for content)
ARROW required clean and open exposure of APIs with well-documented SOAP/REST web services.
Fedora satisfied these requirements
What we have achieved so far
VITAL Manager ingests content and metadata edited externally with XMLSpy, and edits existing content/metadata
Web submission for theses (expanding to new resource types)
Batch Ingest matching metadata and digital objects
Institutional search
National search on National Discovery Service
Ability to specify indexing
[Diagram: flows into and out of ARROW, grouped as INGEST, EXPORT, WEB ACCESS and HARVEST]
Ingest routes: VITAL Client; Webform; Batch; Authorised self submit (e.g. research mgmt tools); Bibliographic sources
Ingest payloads: metadata only, or metadata + docs
Synchronising between ARROW and external resources
Validating data, adding evidentiary material
Web access to documents: research; DEST audit; researcher citing for offprints, or for comment, or for refereeing
Web access to metadata: DEST search; researcher search for topics and research
Export metadata and docs: migration to 3rd party METS compliant tool, e.g. new repository, research mgmt tool
Export metadata only: DEST reporting; inclusion in 3rd party tool, e.g. Excel, EndNote; academic use for CV; faculty production of web pages
Harvest: National Discovery search
ARROW – a demonstration
Searching
The Access Portal offers simple and advanced searching, while the Access Explorer gives indexing options.
The National Discovery Service provides consolidated searching across many repositories
ARROW – many different kinds of objects
JPEG images
  http://arrowdev.lib.monash.edu.au/hdl/1959.100/459 or http://hdl.handle.net/1959.100/459
  Composite weather image of a tropical cyclone. Creator: NASA
  http://arrowdev.lib.monash.edu.au/hdl/1959.100/457
  Satellite image of Victoria and Northern Tasmania. Creator: NASA
MrSID images with navigation
  ag050009 http://arrowdev.lib.monash.edu.au/hdl/1959.100/507
  Victoria Dock, circa 1910 and 1942
  ag050002 http://arrowdev.lib.monash.edu.au/hdl/1959.100/516
  Victoria Dock, 1972 and 2002
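The first JPEG example is reachable both through the repository's own address and through hdl.handle.net. As an illustrative sketch, the local form can be rewritten into the global resolver form as below. It assumes that local handle URLs always use the `/hdl/<prefix>/<suffix>` path seen in these examples; the function name is hypothetical.

```python
from urllib.parse import urlparse

HANDLE_RESOLVER = "http://hdl.handle.net/"


def to_global_handle_url(local_url):
    """Rewrite a repository-local /hdl/<prefix>/<suffix> URL to the hdl.handle.net form."""
    path = urlparse(local_url).path
    marker = "/hdl/"  # assumption: path convention inferred from the example URLs
    if not path.startswith(marker):
        raise ValueError("not a local handle URL: " + local_url)
    # Everything after /hdl/ is the handle itself, e.g. 1959.100/459.
    return HANDLE_RESOLVER + path[len(marker):]


print(to_global_handle_url("http://arrowdev.lib.monash.edu.au/hdl/1959.100/459"))
```

Citing the resolver form keeps a link stable even if the repository host name changes later.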
ARROW demonstration
Text and supporting images
  Title: History Australia, Volume 2, No. 1, 2004. Ferals and their muddies: Making a home in the bush
  http://arrowdev.lib.monash.edu.au/hdl/1959.100/418
Text only
  Thesis title: To define the ways in which VITAL will support the creation of a range of branded interfaces, showing either all of the repository or particular subsets
  http://arrowdev.lib.monash.edu.au/hdl/1959.100/486
ARROW demonstration
XML plus images
  Melbourne 2030: Chapter 5 - Residential infill and its threat to Melbourne's liveability
  http://arrowdev.lib.monash.edu.au/hdl/1959.100/283
MPEG movie
  Medical computer animation #20
  http://arrowdev.lib.monash.edu.au/hdl/1959.100/586
QuickTime movie
  Medical computer animation #10
  http://arrowdev.lib.monash.edu.au/hdl/1959.100/566
MP3 audio
  Ash Grunwald, Bakelite Radio and Blues Progression (5 MB)
  http://arrowdev.lib.monash.edu.au/hdl/1959.100/571
AVI movie
  Fantastic Four, movie trailer. **WARNING large file 16 MB** DivX codec must be installed first to view.
  http://arrowdev.lib.monash.edu.au/hdl/1959.100/551
ARROW – a demonstration
Ingest and management
The Web ingest tool allows users to submit and describe documents, while the Web Review tool allows a staged review process
The VITAL Manager: a third-generation repository application allowing ingest and edit
The batch ingest tool allows ingest of objects as well as their matching metadata
Batch Ingest – power ingest!
A set of command line scripts that uses a series of XML files to define the import/ingest of large batches of like document types and their metadata. Complex functionality to master, but rewarded with great flexibility in ingest options.
Each batch job requires 3 files:
  Configuration file
  Model definition file
  An XSL style sheet to convert the metadata into DC
Configuration File
Defines the parameters required to configure a batch import
Model definition file
Location of content files
Matching parameter to link up datastreams
Fedora connection information
Test only status
Handles assignment
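The matching parameter is what links each metadata record to its content files. The slides do not say how the match is made, so the sketch below assumes the simplest possible convention, a shared filename stem. That convention, the function name and the sample file names are all hypothetical.

```python
from pathlib import Path


def match_by_stem(metadata_files, content_files):
    """Pair each metadata file with the content file that shares its filename stem."""
    # Assumption: "thesis01.xml" describes "thesis01.pdf"; a real batch job may match differently.
    content_by_stem = {Path(name).stem: name for name in content_files}
    matched, unmatched = [], []
    for meta in metadata_files:
        stem = Path(meta).stem
        if stem in content_by_stem:
            matched.append((meta, content_by_stem[stem]))
        else:
            unmatched.append(meta)
    return matched, unmatched


matched, unmatched = match_by_stem(["thesis01.xml", "thesis02.xml"], ["thesis01.pdf"])
print(matched)
print(unmatched)
```

Reporting the unmatched records separately gives the operator a list to fix before anything is loaded.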
Model Definition file
Declares object model for the Fedora object created by the batch import
Number of datastreams
MIME type of datastreams
Status of datastream
Datastream label
Defines style sheet
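To make the list above concrete, the same information can be pictured as a small data structure. This is only an illustration in Python: the real model definition is an XML file whose schema is not shown in the slides, so the class names, field names, the "A" status value and the sample stylesheet name are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatastreamDef:
    label: str          # datastream label
    mime_type: str      # MIME type of the datastream
    status: str = "A"   # assumption: "A" stands for an active datastream


@dataclass
class ModelDefinition:
    stylesheet: str     # XSL style sheet used to produce DC (hypothetical file name below)
    datastreams: List[DatastreamDef] = field(default_factory=list)

    @property
    def datastream_count(self) -> int:
        """Number of datastreams each ingested object should carry."""
        return len(self.datastreams)


thesis_model = ModelDefinition(
    stylesheet="thesis_to_dc.xsl",
    datastreams=[
        DatastreamDef("Thesis full text", "application/pdf"),
        DatastreamDef("Native metadata", "text/xml"),
    ],
)
print(thesis_model.datastream_count)
```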
XSL Style sheet
Maps the native metadata tags into Dublin Core
DC is used internally in Fedora and is the metadata payload for the OAI harvest files
Currently using a specific XSL per batch job
Future: use OCLC interoperable core tools
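The batch tool does this mapping with XSL. Purely to illustrate the idea, the sketch below performs the same kind of tag-to-tag mapping in Python with the standard library instead. The native tag names in `TAG_MAP` and the sample record are hypothetical; the two namespace URIs are the standard Dublin Core and OAI `oai_dc` namespaces.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# Hypothetical native tag names; a real mapping depends on the source metadata.
TAG_MAP = {"thesis_title": "title", "author": "creator", "year": "date"}


def native_to_dc(native_xml):
    """Map the children of a native record onto Dublin Core elements."""
    ET.register_namespace("dc", DC_NS)
    ET.register_namespace("oai_dc", OAI_DC_NS)
    source = ET.fromstring(native_xml)
    root = ET.Element("{%s}dc" % OAI_DC_NS)
    for child in source:
        dc_name = TAG_MAP.get(child.tag)
        if dc_name and child.text:
            # Unmapped or empty native tags are simply skipped.
            element = ET.SubElement(root, "{%s}%s" % (DC_NS, dc_name))
            element.text = child.text.strip()
    return root


record = native_to_dc(
    "<record><thesis_title>Sample thesis</thesis_title>"
    "<author>A. Student</author><ignored>x</ignored></record>"
)
print(ET.tostring(record, encoding="unicode"))
```

Keeping the mapping in one table per batch job mirrors the current practice of one style sheet per batch job.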
Batch Import Process
Content directory
Location of the ingest target content
Configuration file
Model Definition file
Defines structure of fedora object to be built for each item ingested.
Style Sheet file
XSL Style sheet to convert native metadata into DC
Shell Script ARROW
Software tools - XML editor
Scripts must be edited for each new batch ingest
VITAL delivered with sample scripts for common ingest types
  Theses
  Images
  XML
Samples can be customised
XML editing
  Con: complex
  Pro: very flexible
Test mode
Runs the script and does everything except assign handles and load into the repository
Shows that:
  Batch load has processed all files
  Object model matches model definition
  Metadata and content files are matched
Log files generated for each batch load
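The checks listed for test mode can be pictured as a dry run: validate every item, write a log line for each, and skip the two side effects (handle assignment and loading). The sketch below is a simplified illustration rather than the VITAL batch tool itself; the function name, the item layout and the log wording are hypothetical.

```python
def run_batch(items, expected_datastreams, test_only=True):
    """Validate each item; load it only when test_only is False.

    items: list of dicts with a 'metadata' file name and a 'content' list of file names
    (a hypothetical layout chosen for this sketch).
    """
    log = []
    loaded = 0
    for number, item in enumerate(items, 1):
        problems = []
        if not item.get("metadata"):
            problems.append("missing metadata")
        if len(item.get("content", [])) != expected_datastreams:
            problems.append("datastream count differs from model definition")
        if problems:
            log.append("item %d: FAILED (%s)" % (number, "; ".join(problems)))
            continue
        if test_only:
            log.append("item %d: OK (test only, nothing loaded)" % number)
        else:
            loaded += 1  # a real run would assign a handle and ingest the object here
            log.append("item %d: loaded" % number)
    return loaded, log


batch = [
    {"metadata": "thesis01.xml", "content": ["thesis01.pdf"]},
    {"metadata": "", "content": []},
]
loaded, log = run_batch(batch, expected_datastreams=1)
print(loaded)
for line in log:
    print(line)
```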
Output
Can process from one to one million records*
Have successfully tested with many hundreds of files
Can be executed by a “cron job” for an overnight run
Intend to use Batch Ingest to process approx. 1,500 files of Monash content
Complex functionality to master, but rewarded with great flexibility in ingest options
*Theoretical Fedora upper limit
Partner showcase
All partners are gathering research content, but they are also considering other collections for ARROW
Monash
PhD theses
  Mandatory collection of all new theses July 2005
  Retrospective digitisation of all post 2000 PhD theses
Indigenous Linguistical Research Resources Portal
  Related Monash rare books content digitised and stored in ARROW
  Access to material through portal using XACML access control to manage researcher and community access
Centre for Gippsland Studies Picture Collection
  1000+ historical images in JPEG2000 format
Partner showcase
ARROW@UNSW
School of Biological, Earth & Environmental Sciences (BEES) is trialing the deposit of honours theses by final year students in 2005.
School of Mining Engineering trial: academic and research staff (including postgraduate students) are trialing the deposit of research publications until the end of 2005.
Centre for Health Informatics
  18,000 medical x-ray images
Partner showcase
Swinburne University
  Journal of Applied Psychology (ejap)
  Australian Journal of Educational Technology in Society
  Investigating automatic ingest from OJS into ARROW in co-operation with NLA
  Entire research output from Institute for Social Research since 2001
National Library of Australia
  Harold White paper - Independent scholars working papers
  NLA staff papers
  Archive of research related emails
ARROW: an Information Management Tool to meet Government reporting requirements
Around 30-40% of Australian university research funding comes from government
At present an annual statistical return is required, and audit evidence of research outputs is compiled as collections on paper
ARROW can improve the efficiency of this process as an information management tool
In the Monash context this will capture 4000 publications annually
Research Quality Framework (RQF)
Integration with Research Management Systems (currently RM4)
Store research objects (with associated metadata)
  Using open supported standards
  Currently: METS
  Future: XML based formats (MPEG21 DIDL etc.)
Provide persistent links (Handles)
Provide secure access (XACML)
Expose research digital objects (Google, National Discovery Service etc.)
Key Activities
Gathering evidence
Internal auditing and analysis
  Who? What? Information accuracy? Timing? Ethics? Impact?
Preparing the report
Submission
Assessment
Commenting
Working with Research Management Systems (Gathering information)
[Diagram labels: New Research Information; New Research Objects; Old Research Information; Old Research Objects]
Working with Research Management Systems (Recording information)
[Diagram labels: Research Management and Analysis Software; New Research Information; New Research Objects; Old Research Information; Old Research Objects]
Working with Research Management Systems (Additional information)
[Diagram labels: Research Management and Analysis Software; New Research Information; New Research Objects; Old Research Information; Old Research Objects]
Additional information: research staff information, research students information, research grants information, etc.
Working with Research Management Systems (Depositing into ARROW)
[Diagram labels: Research Management and Analysis Software; Research Objects + Subset of Research Information; ARROW; New Research Information; New Research Objects; Old Research Information; Old Research Objects]
Working with Research Management Systems (Handles returned)
[Diagram labels: Research Management and Analysis Software; ARROW; Handles; New Research Information; New Research Objects; Old Research Information; Old Research Objects]
Example: http://arrowdev.lib.monash.edu.au/hdl/1959.100/630
Working with Research Management Systems (Evidence Portfolios)
[Diagram labels: Research Management and Analysis Software; ARROW; Evidence Submission; RQF Evidence Portfolios; New Research Information; New Research Objects; Old Research Information; Old Research Objects]
Working with Research Management Systems (Handles link back to stored Research Objects)
[Diagram labels: Research Management and Analysis Software; ARROW; RQF Evidence Portfolios; New Research Information; New Research Objects; Old Research Information; Old Research Objects]
ARROW’s role in supporting the RQF
If no Research Management System, ARROW will enable you to:
Store research objects (with associated metadata)
  Using open supported standards
  Currently: METS
  Future: XML based formats (MPEG21 DIDL etc.)
Provide persistent links (Handles)
Provide secure access (XACML)
Expose research digital objects (Google, National Discovery Service etc.)
(Recording Research Information)
[Diagram labels: ARROW; New Research Information; New Research Objects; Old Research Information; Old Research Objects]
(Handles provided and information analysis)
[Diagram labels: ARROW; Handles; Information compilation, auditing, analysis and report preparation; New Research Information; New Research Objects; Old Research Information; Old Research Objects]
Gather additional information: research outputs, research staff information, research students information, research grants information, etc.
(Reporting Evidence Portfolios)
[Diagram labels: Collecting New Research Information; Collecting Old Research Information; Collecting Old Research Objects; Collecting New Research Objects; ARROW; RQF Evidence Portfolios; RQF Reporting (submission)]
(Linked back to your Research Objects)
[Diagram labels: Collecting New Research Information; Collecting Old Research Information; Collecting Old Research Objects; Collecting New Research Objects; ARROW; Handles; RQF Evidence Portfolios; RQF Reporting]
RQF Assessment by DEST (DEST collection)
[Diagram labels: DEST RQF Collection; RQF Evidence Portfolios]
RQF Assessment (DEST informs Assessment Panels)
[Diagram labels: DEST RQF Collection; RQF Evidence Portfolios; RQF Assessment Panel]
RQF Assessment (Panels View Evidence)
Handles enable viewing of Research Digital Objects in Digital Repositories
[Diagram labels: DEST RQF Collection; RQF Evidence Portfolios; RQF Assessment Panel; review]
RQF Assessment (Panels Comment on Evidence)
[Diagram labels: DEST RQF Collection; RQF Evidence Portfolios; RQF Assessment Panel; review; Annotations]
RQF Assessment (Viewing annotations)
[Diagram labels: DEST RQF Collection; RQF Evidence Portfolios; RQF Assessment Panel; Annotations; ARROW users]
In Summary
ARROW supports the RQF
  Storing research
  Exposing research
  Allowing commenting and annotations
To come…
Soon
  Interface with Research Master and other research management tools
  Web ingest for other content types – alpha software now being tested at: http://arrowdev.lib.monash.edu:8000/cgi-bin/valet-1.1/submit.cgi
  Enhanced user interface with browse capabilities
Early 2006
  XACML access control at object and datastream levels
  Support for OAI Sets for metadata harvesting
Mid 2006
  Generalised content model management
  Metadata interoperability
What you need: Infrastructure
Recommended server hardware
Operating Platform | Processor Speed | Memory (Minimum) | Memory (Recommended) | Hard Disk Space (typical 1000 object collection) | Max # of Users
Windows | 2 GHz | 1 GB RAM | 2 GB RAM | 20 GB | 128 (min config), 256 (max config)
Red Hat Linux Enterprise AS | 2 GHz | 1 GB RAM | 2 GB RAM | 20 GB | 128 (min config), 256 (max config)
Solaris | 650 MHz | 512 MB RAM | 1 GB RAM | 20 GB | 64 (min config), 128 (max config)
Recommended server software
  Java Virtual Machine runtime: JRE 1.4.2
  Databases: Oracle, MySQL, McKoi
  Web server: Apache 1.3.22+
What you need: Staff, training, knowledge
Technical
Administrative
Marketing and advocacy
Training
Support: ARROW Users Group and Forum
Contracts and licensing
Monash is the lead institution, and has a Head License Agreement with VTLS
Monash arranges a sublicense to (new) Additional Partners
Additional Partners enter into a separate maintenance agreement directly with VTLS for support of the Software
Monash coordinates software development
VTLS installs, trains and supports
System and training documentation are made available by VTLS
An ARROW users group will be established
How the ARROW components fit together
Fedora: the open source storage layer
VTLS: commercial and open source application layer
Handles: a universal persistent identifier
Content models: a consistent way of describing resource types
Templates: a method for creating valid MARCXML metadata
Standards: MARCXML, Dublin Core, METS
Collections: explicit and results sets
Exchange: relationships with RM4, EndNote etc.
Compound vs. atomistic object model
Atomistic – a data object with one or more content datastreams that are all considered primary to the object.
Compound – a data object consisting of multiple content datastreams that are not all primary to the object.
[Diagram labels: RELS-EXT; RELS-INT; Content (repeated for each datastream)]
ARROW has chosen the compound object model
[Diagram: a compound object, identified by its Fedora PID, with these datastreams]
SM1: System metadata
OMS1: Object metadata
DC: Dublin Core
RELS-INT
DS1: External unique ID
DS18: Native metadata
CS1: Pub Body 1, with CMS1: Body 1 metadata
CS2: Pub Body 2, with CMS2: Body 2 metadata
CS3: Images, with CMS3: Images metadata
CS4: Web pages, with CMS4: Web metadata
CS5: Multimedia, with CMS5: Multimedia metadata
CS6: Bibliography, with CMS6: Bibliography metadata
CS17: Evidence, with CMS7: Evidence metadata
Each object in the repository comprises two or more datastreams.
One object may contain many different kinds of files.
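The distinction between the two models comes down to whether every content datastream is primary to the object. A minimal sketch of that rule follows. The datastream IDs and labels are taken from the diagram; which of them count as primary is an assumption made here for illustration, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Datastream:
    ds_id: str
    label: str
    primary: bool  # assumption: the primary flags below are illustrative only


def object_model(content_datastreams: List[Datastream]) -> str:
    """Classify an object as 'atomistic' or 'compound' from its content datastreams."""
    if not content_datastreams:
        raise ValueError("an object needs at least one content datastream")
    if all(ds.primary for ds in content_datastreams):
        return "atomistic"  # one or more datastreams, all primary
    if len(content_datastreams) > 1:
        return "compound"   # multiple datastreams, not all primary
    raise ValueError("a single non-primary datastream fits neither definition")


publication = [
    Datastream("CS1", "Pub Body 1", primary=True),
    Datastream("CS3", "Images", primary=False),
    Datastream("CS6", "Bibliography", primary=False),
]
print(object_model(publication))
```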
OCLC Metadata Interoperability Core
From: Godby, Smith and Childress. 2003. “Two paths to interoperable metadata” p. 3 at
http://www.oclc.org/research/publications/archive/2003/godby-dc2003.pdf