One Site Among Many: Stanford and Collaborative Technical Development for Web Archiving

Nicholas Taylor

Web Archiving Service Manager

Stanford University Libraries

PASIG 2016

March 11, 2016

overview

• web archiving opportunity gaps

• situation of SUL web archiving

• APIs + community (technical) development

“LAX on take off” by Doug under CC BY-NC-ND 2.0

web content >

“The Seeker” by C MB 166 under CC BY-ND 2.0

preserved web content

link rot + content drift

Andrew Jackson: “Ten years of the UK Web Archive”
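The two failure modes named on this slide can be told apart mechanically: link rot means the URL no longer resolves, while content drift means it still resolves but no longer returns what was originally cited. The sketch below is illustrative only. The names `LinkState` and `classify_link` are hypothetical, and comparing exact digests is a simplifying assumption; studies of drift generally use similarity measures rather than strict equality.

```python
from enum import Enum
from typing import Optional


class LinkState(Enum):
    INTACT = "intact"
    LINK_ROT = "link rot"
    CONTENT_DRIFT = "content drift"


def classify_link(
    status_code: Optional[int],
    cited_digest: str,
    current_digest: Optional[str],
) -> LinkState:
    """Classify a cited URL from one re-fetch attempt.

    status_code is None when the request failed outright (DNS, timeout).
    cited_digest is a hash of the content as originally cited (assumed known).
    current_digest is a hash of what the URL returns today, if anything.
    """
    # No response, an HTTP error, or no body: the reference no longer resolves.
    if status_code is None or status_code >= 400 or current_digest is None:
        return LinkState.LINK_ROT
    # The URL resolves, but the content is not what was cited.
    if current_digest != cited_digest:
        return LinkState.CONTENT_DRIFT
    return LinkState.INTACT
```

A preserved capture is what supplies the `cited_digest` side of the comparison; without one, drift cannot be detected at all.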

a centralized enterprise

[Bar chart, 2011 vs. 2013, categories External / Local / Both. 2011: 60% / 25% / 14%. 2013: 63% / 20% / 16%.]

NDSA: “Web Archiving in the U.S.: A 2013 Survey”

a centralized enterprise

[Chart by year, 1996 to 2013. Series: Number of organizations; Archive-It Partner as of 2013.]

NDSA: “Web Archiving in the U.S.: A 2013 Survey”

minimal local preservation

[Bar chart, 2011 vs. 2013, categories Transferred / Haven't transferred. 2011: 19% / 81%. 2013: 20% / 80%.]

NDSA: “Web Archiving in the U.S.: A 2013 Survey”

opportunities for research

“Exploring the Canadian Political Interest Group and Political Parties Web Sphere” by Ian Milligan under Standard YouTube License

Stanford Web Archive Portal

Stanford University Libraries: “Stanford Web Archive Portal”

SearchWorks (online catalog)

Stanford University Libraries: “SearchWorks”

web archaeology (SLAC)

oldweb.today: “WorldWideWeb SLAC Home Page”

building + integrating infrastructure

discovery

preservation

access

capture

SDR

APIs + COMMUNITY DEVELOPMENT

“P1050827” by Rebecca Siegel under CC BY 2.0

web archiving lifecycle

Internet Archive: “The Web Archiving Life Cycle Model”

functional overlap

lifecycle functions:

• Appraisal and Selection

• Scoping

• Data Capture

• Storage and Organization

• QA and Analysis

• Metadata / Description

• Access / Use / Reuse

• Preservation

• Risk Management

tools:

• ACT

• Archive-It

• AtN

• BCWeb

• CDL WAS

• DigiBoard

• Islandora WARC Solution Pack

• Netarchive Suite

• PageFreezer

• UNT Nomination Tool

• WCT

smaller, modular components

“Giant Rubik's Cube” by Francois Lamotte under CC BY 2.0

API candidates

• capture tool/proxy interconnect

• capture tool management

• data import/export

• query + extraction

• integrity audit + repair

• descriptive metadata

• logs + analytics

• renderings/derivative formats

• federated data delivery

• federated replay

• federated full-text search
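As one illustration of what a small, modular component from this list might do, the sketch below covers the "integrity audit + repair" candidate: check stored files against a manifest of expected SHA-256 digests, then restore damaged or missing files from a second copy. It is a minimal sketch under stated assumptions, not an API proposed in the talk. The function names (`sha256_of`, `audit`, `repair`), the manifest shape (file name mapped to hex digest), and the single-replica layout are all hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit(manifest: dict[str, str], root: Path) -> dict[str, str]:
    """Compare files under root against the expected digests.

    Returns a mapping of file name to "ok", "missing", or "corrupt".
    """
    report = {}
    for name, expected in manifest.items():
        path = root / name
        if not path.is_file():
            report[name] = "missing"
        elif sha256_of(path) != expected:
            report[name] = "corrupt"
        else:
            report[name] = "ok"
    return report


def repair(
    report: dict[str, str],
    manifest: dict[str, str],
    root: Path,
    replica: Path,
) -> list[str]:
    """Restore non-ok files from the replica (hypothetical second copy).

    A replica file is used only if its own digest matches the manifest,
    so a damaged replica never overwrites the primary copy.
    """
    repaired = []
    for name, status in report.items():
        if status == "ok":
            continue
        source = replica / name
        if source.is_file() and sha256_of(source) == manifest[name]:
            shutil.copyfile(source, root / name)
            repaired.append(name)
    return repaired
```

Keeping audit and repair as separate calls means the report can be inspected or logged before anything on disk is changed.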