Managing large and complex data sets

Post on 25-May-2015


description

Presentation given by Catherine Hardman of the Archaeology Data Service in York. The presentation was given at the 'Managing Archaeology Data' event on Monday 7th March 2011 at the University of Glasgow.

transcript

Managing large and complex data sets:

… THE CHALLENGES OF ARCHIVING AND ONLINE DELIVERY

CATHERINE HARDMAN

My lithics report here, on floppy disc

The problem….in 1996

The Archaeology Data Service:
•set up in 1996
•one of five AHDS subject centres
•based within the University of York

Funding:
•Initially received funding from:
  •Arts and Humanities Research Council (AHRC)
  •Joint Information Systems Committee (JISC)
•Presently receives core funding from AHRC alongside cross-sectoral, project-based funding.

The ADS: some ancient history

Our remit:

“To support research, learning and teaching with high quality and dependable digital resources.”

In practice this means three key things:

•That ADS collect and preserve datasets
•That we allow full, easy and free access to these
•And that we additionally provide guidance and support to data creators

What do we do?

No need for digital preservation

Domesday Book: Publisher: William of Normandy (1086) – still readable

Where’s preservation when you need it?

Domesday Disc: Publisher: BBC (1986) – nearly lost

Why is it important?

Michener, W.K., Brunt, J.W., Helly, J.J., Kirchner, T.B. and Stafford, S.G. 1997. Nongeospatial Metadata for the Ecological Sciences. Ecological Applications. 7: 330-342.

What’s the problem? Information Entropy

The scale of the problem in the 1990s

None: 47%
Humidity control: 8%
Heat control: 7%
Fire-resistant container: 23%
Anti-magnetic: 10%
Anti-static protected: 5%

Strategies for protecting physical media

Findings and Recommendations from ‘Digital Data in Archaeology: A Survey of User Needs’ Condron et al 1999

Protecting Physical media

…never the twain

The scale of the problem in the 1990s

Hard disc: 28%
Tape: 22%
CD-ROM: 14%
Network: 13%
Floppy disc: 23%

The popularity of storage options

Findings and Recommendations from ‘Digital Data in Archaeology: A Survey of User Needs’ Condron et al 1999

8" Floppy

3.5" Floppy

5.25" Floppy

12" Optical Disk

5.25" Optical Disk

CD-ROM

Sparq Disk Cartridge

Zip Disk

Click!

DVD-ROM

Jaz Disk

Floptical Disk

Punch Tape

Rectangular Hole Punch Card

IBM 3480

DLT Tape

DG90M Tape

DC4_120

8mmD-eight

QIC DC600

G2000 Tape

4mm Tape

Ditto Max

9-Track Reel

Cassette tape

Memory Stick

MultiMedia Card

SD Memory Card

xD Picture Card

Smart Media

CompactFlash

Travan

Why is it all so difficult?

•Deterioration of the storage medium
•Obsolescence of the storage medium
•Failure to document the format adequately
•Obsolescence of the software
•Obsolescence of the hardware
•Long-term management
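Deterioration of the storage medium is the one item on this list that can be detected mechanically. The following is a minimal sketch of a fixity check, not a description of the ADS's own procedure: it assumes files sit in an ordinary directory and that SHA-256 is an acceptable checksum. The function names (`build_manifest`, `changed_files`) are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (relative, '/'-separated) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root: Path, manifest: dict[str, str]) -> list[str]:
    """List files from the stored manifest that are now missing or altered."""
    current = build_manifest(root)
    return sorted(
        name for name, checksum in manifest.items()
        if current.get(name) != checksum
    )
```

A manifest recorded at deposit and re-checked on a schedule would flag silent corruption, but it does nothing about format, software or hardware obsolescence; those still need documentation and migration.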

How do we do it?

Open Archival Information System (OAIS)

But that’s people…

Migration based approach & controlled ingest

Aim to connect with data producers early on in their project lifecycles to ensure that preservation planning is a key consideration during the project rather than an afterthought.


Guides to help you do all that.

It hasn’t really got much easier

The goal posts keep moving!

[Bar chart: The size of digital archives held by different types of archaeological bodies. X axis: archive size (1-5Mb, 5-10Mb, 10-50Mb, 50-100Mb, 100-1,000Mb, >1Gb). Y axis: number of archiving bodies (0 to 40). Series: National body, Local gov. archaeology, Field archaeology, HEI, Museum, Consultancy.]

http://ads.ahds.ac.uk/

Archaeology Data Service

Big Data Project

Roughly how much data would be generated by a single project?

[Pie chart: Average project size (estimated). Segments in legend order: over 200GB, 150 - 200GB, 100 - 150GB, 50 - 100GB, under 50GB. Values in order of appearance: 19%, 3%, 3%, 25%, 50%.]

Which of these data collection techniques do you carry out?

[Pie chart: Technologies used. Segments in legend order: 3D Laser Scanning, Sidescan Sonar, Multibeam Scanning, Single Beam Scanning, Geophysics, Acoustic Tracking, Sub bottom profiling, Geographic (eg GIS), Lidar, Digital Video, Video Movie Clips, Still Images, CAD (2D or 3D), Other. Values in order of appearance: 12%, 4%, 4%, 3%, 8%, 1%, 3%, 11%, 9%, 9%, 7%, 14%, 3%, 12%.]

What are the main software packages you use?

[Pie chart: Software (noted more than once). Segments in legend order: 3D Studio Max, ArcGIS, AutoCAD, BAE SOCETSET, CODA, ENVI / IDL, ERDAS Imagine, Golden Software Surfer, Leica Cyclone, MicroStation, Pointools, Polyworks, RapidForm, TerraScan, Trimble Realworks, Custom software, MySQL. Values in order of appearance: 4%, 10%, 12%, 4%, 4%, 4%, 6%, 6%, 10%, 4%, 4%, 4%, 8%, 6%, 4%, 4%, 4%.]

Do you have an archiving policy for the data sets / types in question?

[Pie chart: Archival policy? Segments in legend order: Yes, No, No response. Values in order of appearance: 48%, 27%, 25%.]

back-up

When you start a new project …would you consider using existing datasets?

Yes: 28

Not answered: 2

This is the opportunity!

Making the inaccessible accessible

to make available unpublished fieldwork reports in an easily retrievable fashion. There are currently 8018 reports available and this number is increasing steadily through the OASIS project in England and Scotland.

Blurring the distinction …

…between publication and archives …

Making the LEAP…

What does that mean for you?

Plan for reuse

How do you do that?

•Include a data management plan (use the DCCs)
•Order your data
•File naming strategy
•Version control
•Back-up (in the field)
•Consider your file formats
•Dissemination plan (and its longevity)
•What does the long term look like?
•Discuss requirements with an archive
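A file naming strategy and version control are easier to keep to when they can be checked automatically. The sketch below assumes a purely hypothetical convention, `<project>_<description>_v<NN>.<ext>`, in lowercase letters and digits; the presentation does not prescribe any particular pattern, so treat both the pattern and the function name `check_names` as illustrative.

```python
import re

# Hypothetical convention (an assumption, not an ADS rule):
# lowercase alphanumeric parts joined by underscores, a two-digit
# version suffix, and a single extension, e.g. site12_lithics_v01.csv
NAME_PATTERN = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*_v\d{2}\.[a-z0-9]+")


def check_names(names: list[str]) -> list[str]:
    """Return the file names that do not follow the assumed convention."""
    return [name for name in names if not NAME_PATTERN.fullmatch(name)]


if __name__ == "__main__":
    for bad in check_names(
        ["site12_lithics_v01.csv", "Final report (2).doc", "site12_plan_v2.dwg"]
    ):
        print("rename before deposit:", bad)
```

Whatever convention a project adopts, agreeing it with the receiving archive beforehand matters more than the exact pattern.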

We’re here to help

http://archaeologydataservice.ac.uk/

http://guides.archaeologydataservice.ac.uk/

catherine.hardman@york.ac.uk