Curation in the Cloud

Hull’s Fedora and Hydra perspective

Richard Green

Curation in the Cloud, London, 7/8 March 2012

Institutional repository background

• Hull has been running a Fedora-based institutional repository for several years

– Originally based on Fedora + Muradora UI

– More recently (6 months) based on Fedora + Hydra

• The repository covers a wide range of content – not just OA articles…

Wide range of content to deal with

- Exam papers
- e-Theses & dissertations (ETDs)
- Journal articles
- Meeting papers or minutes
- Policies or procedures
- Dissertations (undergraduate)
- Photographs
- Presentations
- Books
- Book chapters
- Regulations
- Reports
- Conference papers or abstracts
- Learning materials
- Handbooks
- Internet publications
- Newsletter articles
- Datasets
- Sound
- Moving images
- Guidance documents
- Licences
- Posters
- Events
- Letters
- Artwork
- Diagrams
- Maps
- Software
- etc (!!!)

Affiliations

• Hull was instrumental in founding the Fedora UK & Ireland User Group…

– 20 or so informal members

Acuity Unlimited
British Cartoon Archive
University College Dublin (Irish Virtual Research Library and Archive)
University of Durham
University of Essex (UK Data Archive)
Freshwater Biological Association
Glasgow Caledonian University (Spoken Word Services)
University of Hull
King's College London (CeRch)
University of Leeds (Timescapes Project)
London School of Economics and Political Science
University of Manchester
National e-Science Centre, Edinburgh
National Library of Scotland
National Library of Wales
Open University
University of Oxford Libraries
University of Oxford (Forced Migration Project)
University of St Andrews
University of York

Affiliations [2]

• and is a founder member of the Hydra partnership (with the University of Virginia, Stanford University and Fedora Commons)

– Fedora does not have an ‘out-of-the-box’ UI. Hydra set out to provide building blocks from which highly functional (full-CRUD) UIs could be built over it

– Growing number of Hydra-using institutions in the US, two or three so far in the UK

– Hydra “content modelling” is proving useful to non-Hydra Fedora users

At the moment?

• Just starting to think seriously about opportunities in the cloud

– This meeting is opportune to help clarify what is still somewhat fuzzy thinking

• At the moment, we in Hull are considering the use of cloud storage in addition to local storage for our Hydra repository

At the moment? [2]

• Why the cloud?

– Could be used to provide near-line capability for rarely used assets which are individually ‘small’ but numerous

– Potential to store very large, but rarely accessed, assets (TB range) ‘cheaply’ (cf high-performance SAN storage)

– Possibility of leveraging ‘above campus’ services (Image manipulation? Video streaming? Format migration?)

At the moment? [3]

• WE’RE NOT

– considering a complete repository infrastructure in the cloud

  • Happier with the software stack locally

– considering local software with all-cloud storage

  • There are known problems with latency etc

• WE ARE

– considering a hybrid of the two

At the moment? [4]

• How?

– In principle, Fedora (and therefore Hydra) allows for a mix and match of storage: Fedora managed (local file system), external (http accessible), redirected (redirects user to appropriate URL)

– So:

  • use “managed content” for straightforward, small and/or high access materials;

  • use “external content” for low access materials or where there is a value-added service.
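The rule of thumb on this slide could be sketched as a small decision function. This is only an illustration: the `Asset` fields, the two thresholds and the function name are assumptions made for the example, not Hull's actual policy. The one-letter codes are the datastream control groups used by Fedora 3.x (M, E, R).

```python
from dataclasses import dataclass

# Fedora 3.x datastream control group codes.
MANAGED = "M"    # content stored and managed by Fedora itself
EXTERNAL = "E"   # content held elsewhere and fetched by Fedora over HTTP
REDIRECT = "R"   # Fedora redirects the user to the content's own URL

# Hypothetical thresholds, chosen only for illustration.
SMALL_BYTES = 100 * 1024 * 1024   # treat anything up to 100 MiB as "small"
HIGH_ACCESS_PER_MONTH = 10        # treat 10+ requests a month as "high access"


@dataclass
class Asset:
    name: str
    size_bytes: int
    accesses_per_month: int
    value_added_service: bool = False  # e.g. held by a cloud service that adds value


def choose_control_group(asset: Asset) -> str:
    """Pick a control group following the slide's rule of thumb."""
    # A value-added service keeps the content external.
    if asset.value_added_service:
        return EXTERNAL
    # Small and/or high-access materials stay in managed (local) storage.
    if (asset.size_bytes <= SMALL_BYTES
            or asset.accesses_per_month >= HIGH_ACCESS_PER_MONTH):
        return MANAGED
    # Everything else (large and rarely accessed) goes to external storage.
    return EXTERNAL


print(choose_control_group(Asset("exam-paper.pdf", 2 * 1024 * 1024, 1)))
print(choose_control_group(Asset("survey-data.tar", 6 * 10**12, 0)))
```

The sketch never returns `REDIRECT`; the slide lists it as an option but does not say when Hull would use it.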

Scale of problem

• Bulk of repository content is “small” – megabytes

• Multimedia content is larger (10s-100s megabytes) and our current offering is “download” – we cannot (yet) stream

• We know there are multi-TB datasets on campus to be dealt with

– eg Biology have one 6TB growing at 2TB per quarter
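To give a feel for the Biology figure, the arithmetic can be written out as below. The sketch assumes growth stays linear at the quoted 2TB per quarter, which the slide does not claim; the function name is made up for the example.

```python
def projected_size_tb(start_tb: float, growth_tb_per_quarter: float, quarters: int) -> float:
    """Dataset size after a number of quarters, assuming linear growth."""
    return start_tb + growth_tb_per_quarter * quarters


# 6TB today, growing at 2TB per quarter (figures from the slide).
for years in (1, 3, 5):
    size = projected_size_tb(6, 2, 4 * years)
    print(f"after {years} year(s): {size} TB")
```

On that assumption the dataset would reach 14TB after one year and 46TB after five.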

Potential practical problems

• High-access materials could generate large download charges

– Better suited to low access objects or to get ‘value added’ services

– Need a way of predicting costs over long periods (using the LIFE model?)

• Getting large objects/volumes into the cloud

– Transfer times for TBs of content are considerable. Use UPS to send a hard drive (or several?)
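A rough calculation shows why shipping a drive becomes tempting. The link speeds and the 80% efficiency factor below are assumptions for illustration only; real throughput to a cloud provider would need to be measured.

```python
def transfer_days(size_tb: float, link_mbit_per_s: float, efficiency: float = 0.8) -> float:
    """Estimated days to upload size_tb terabytes (decimal TB) over a link.

    efficiency is an assumed fraction of the nominal link speed that is
    actually achieved end to end.
    """
    size_bits = size_tb * 10**12 * 8
    seconds = size_bits / (link_mbit_per_s * 10**6 * efficiency)
    return seconds / 86_400  # seconds in a day


# The 6TB dataset mentioned earlier, over two hypothetical links.
for mbit in (100, 1000):
    print(f"6 TB at {mbit} Mbit/s: {transfer_days(6, mbit):.1f} days")
```

Under these assumptions 6TB takes about a week at a sustained 100 Mbit/s, and that is before the dataset's continued growth is taken into account.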

Potential practical problems [2]

• Security

– Hull’s IR has very granular security (categories [public/staff/student], groups [eg student modules], individuals)

– Need to be able to restrict access to cloud-based materials accordingly
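The three levels of restriction could be modelled roughly as follows. This is a hypothetical sketch of the kind of check any cloud-side access layer would have to reproduce; the class names, fields and the example module code are invented for illustration and do not describe how Hull's repository actually implements security.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    username: str
    category: str                    # "public", "staff" or "student"
    groups: frozenset = frozenset()  # e.g. module codes the user belongs to


@dataclass(frozen=True)
class AccessPolicy:
    categories: frozenset = frozenset()   # whole categories allowed in
    groups: frozenset = frozenset()       # specific groups allowed in
    individuals: frozenset = frozenset()  # named users allowed in


def may_access(user: User, policy: AccessPolicy) -> bool:
    """Allow access if any one of the three levels matches."""
    return (
        "public" in policy.categories          # open to everyone
        or user.category in policy.categories
        or bool(user.groups & policy.groups)
        or user.username in policy.individuals
    )


# A resource restricted to one (made-up) student module.
module_only = AccessPolicy(groups=frozenset({"module-12345"}))
enrolled = User("student1", "student", frozenset({"module-12345"}))
other = User("student2", "student")
print(may_access(enrolled, module_only), may_access(other, module_only))
```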

Potential practical problems [3]

• Durability

– “Designed to provide 99.999999999% durability” (Amazon S3 SLA). And the other 0.000000001%? Not a lot, but…

  • Could that mean for every terabyte you send us we promise not to corrupt more than ten or so bytes?!?

  • Or that we might lose 1 in 10^11 files, which might not be quite so bad providing it’s not one of your files

– LOCKSS type approach across several providers?
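The two readings of the durability figure can be checked with exact arithmetic. The sketch takes the slide's tongue-in-cheek interpretations literally and assumes a decimal terabyte (10^12 bytes); it says nothing about how the provider itself defines durability.

```python
from fractions import Fraction

# 99.999999999% expressed as an exact fraction.
durability = Fraction(99_999_999_999, 100_000_000_000)
loss_fraction = 1 - durability  # exactly 1 / 10**11

# Reading 1: bytes at risk per terabyte stored.
bytes_per_tb = 10**12
expected_bad_bytes_per_tb = loss_fraction * bytes_per_tb

# Reading 2: one file lost per this many files stored.
files_per_expected_loss = 1 / loss_fraction

print(expected_bad_bytes_per_tb)   # bytes per TB
print(files_per_expected_loss)     # files per expected loss
```

Both of the slide's figures follow: ten bytes per terabyte, or one file in 10^11.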

Potential practical problems [4]

• Management of an institutional cloud

– Can an institution realistically manage its own cloud space(s)?

  • Managing just the data

  • Maybe managing cloud-based services

– Is the idea of third-party management (à la DuraSpace) a more appropriate model?

So, in summary…

• Hull is potentially interested in cloud solutions for:

– Low access materials which individually are not big but taken together are (eg thousands of images)

– TB+, low-access objects

– ‘Above campus’, value-added services (Image manipulation, media streaming, format migration, LOCKSS-in-the-Cloud?)

• Maybe sounds like a job for a UK HE oriented, brokered service akin to DuraCloud’s model?

Contacts and links

IR Service owner: Chris Awre (c.awre@hull.ac.uk)

Hydra Project Manager for Hull: Richard Green (r.green@hull.ac.uk)

Hull Institutional Repository: hydra.hull.ac.uk

Fedora website: fedora-commons.org

Hydra website: projecthydra.org

Fedora UK&I User group: fedora-uki.org.uk