Date post: 26-Aug-2020
Future of Web Archiving Stephen Abrams California Digital Library Martin Klein Los Alamos National Laboratory Jimmy Lin University of Maryland Michael Nelson Old Dominion University Digital Preservation 2014, Washington, July 22-24
Future of Web Archiving

Stephen Abrams California Digital Library

Martin Klein Los Alamos National Laboratory

Jimmy Lin University of Maryland

Michael Nelson Old Dominion University

Digital Preservation 2014, Washington, July 22-24

www.flickr.com/photos/adesigna/4090782772

Agenda

Web archiving problems and opportunities

Memento tools

WarcBase platform

Assessing quality of archives

Discussion

Web archiving is important but (really) hard

Why web archiving?

Continuation of longstanding mission to collect, preserve, and provide access to the scholarly record and our cultural heritage

Publishing/dissemination platform of choice

But … the web isn’t the web anymore

www.flickr.com/photos/alaig/3522953697

www.flickr.com/photos/hier_gibt_es_nichts_zu_sehen_bitte_gehen_sie_weiter/840587382

Web in transition

Document retrieval → Programming environment

Document viewer → Virtual machine

HTML → JavaScript

Common → Personalized

Desktop → Mobile/handheld/wearable

Information → Things

www.flickr.com/photos/swamibu/2223726960 www.flickr.com/photos/sharples/79222765

“A ‘web’ of notes with links (like references) between them …”

– Tim Berners-Lee, March 1989

(Some) other issues

Crawlers don’t act like browsers

► Need robots that act more like people

www.flickr.com/photos/benhusmann/5126030385

Responsiveness to time-sensitive content

► Need to bypass v-e-r-y deliberate collection development procedures

Guardian News and Media Limited

Policies, rights, and permissions

► Need to overcome legal barriers that follow the monetization of content

www.flickr.com/photos/vblibrary/7414544704

Difficult integration into traditional management and discovery services

► Leading to …

www.flickr.com/photos/21664580@N04/2095574414

Siloed collections

www.flickr.com/photos/54159370@N08/7148880783

Scale

► Storage capacity

► Full-text indexing

► De-duplication

► Resources

Raiders of the Lost Ark © Paramount Pictures
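
The de-duplication item under Scale can be made concrete with a small sketch. This is only an illustration, not a method described in the talk: it assumes captures arrive as hypothetical `(url, timestamp, payload)` tuples, and it stores each distinct payload once, keyed by its SHA-256 digest, while every capture keeps a pointer to the stored copy.

```python
import hashlib


def deduplicate(records):
    """Store each distinct payload once and keep one index entry per capture.

    `records` is an iterable of (url, timestamp, payload_bytes) tuples.
    That shape is an assumption made for this sketch.
    """
    store = {}   # digest -> payload bytes, one entry per distinct payload
    index = []   # (url, timestamp, digest), one entry per capture
    for url, timestamp, payload in records:
        digest = hashlib.sha256(payload).hexdigest()
        store.setdefault(digest, payload)  # only the first copy is kept
        index.append((url, timestamp, digest))
    return store, index


# Hypothetical weekly captures of one page; the first two are identical.
captures = [
    ("http://example.org/", "20140701", b"<html>hello</html>"),
    ("http://example.org/", "20140708", b"<html>hello</html>"),
    ("http://example.org/", "20140715", b"<html>hello again</html>"),
]

store, index = deduplicate(captures)
print(len(index), "captures,", len(store), "stored payloads")
```

Exact-digest matching only catches byte-identical payloads; pages that differ by a timestamp or an ad slot would still be stored separately.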

Supporting research

Little awareness in the scholarly community

Poorly understood use cases

Few tools

Traditional find → download → manipulate locally workflows may not be feasible at web scale

► Need APIs and business models for in situ analysis

berkeley.edu/teach www.flickr.com/photos/infocux/8450190120

Technological opportunities

Better capture mechanisms

► Headless browsers

► API harvesters …

Better discovery modalities

► Browsing the past should be as simple and intuitive as the now …

www.flickr.com/photos/bartelomeus/4184705426 www.flickr.com/photos/shebalso/6357626617
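
One way to see why headless browsers are listed as a capture mechanism: a crawler that only parses the HTML it receives never sees links that a script would create at run time. The sketch below uses Python's standard `html.parser`; the page content and the `LinkCollector` and `static_links` names are hypothetical, made up for this illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags found in the static markup."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical page: one static link, plus one link that exists only
# after the script has run in a browser.
PAGE = """
<html><body>
<a href="/about.html">About</a>
<script>
  var a = document.createElement("a");
  a.href = "/news/" + new Date().getFullYear() + ".html";
  document.body.appendChild(a);
</script>
</body></html>
"""


def static_links(html):
    """Return the links a parse-only crawler would discover."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


print(static_links(PAGE))  # prints ['/about.html']
```

The script-generated `/news/` link is never discovered here, because the parser treats the script body as opaque text. A capture tool that executes the page, as a headless browser does, would be needed to find it.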

Cooperative opportunities

Complementary collection development

Coordinated infrastructure support and operation

► Or perhaps centralized – a HathiTrust for web archives?

Crowdsourcing selection, description, quality assurance

www.flickr.com/photos/chiotsrun/4115059294 www.flickr.com/photos/sagesolar/9230445157

And now …

cdn.ws.citrix.com/wp-content/uploads/2012/05/iStock_000010348904XSmall.jpg

