Update on Memento (IIPC 2011 Plenary)

Posted on 11-May-2015

Memento Update

2011 IIPC General Assembly, The Hague

Update on Memento

http://www.mementoweb.org/

Herbert Van de Sompel Robert Sanderson Michael L. Nelson

This research is funded by the Library of Congress

Towards Seamless Navigation of the Web of the Past

Overview of Memento Framework

Deployment Progress

Memento and Discovery

Memento and Branding

Alternative Web Archiving Strategies

Overview of Memento Framework

Memento wants to make it easy to navigate the Web of the Past.

[Figure: Tate Online today; after selecting the date March 16, 2008, the browser shows Tate Online as of March 16, 2008, from the National Archives]

Versions: Web vs CMS

Content Management Systems

•  Designed to be aware of all versions of a resource

•  Self-contained

•  Variety of proprietary version mechanisms

•  Versions interlinked using proprietary mechanisms

World Wide Web

•  Designed to forget about prior versions of a resource

•  Highly distributed

•  No standard version mechanisms

•  Standardized interlinking mechanisms

Versions are not Integrated

The Web Architecture has a hard time dealing with the versions that do exist:

•  Cannot talk about a resource as it used to exist

•  Cannot access a prior version given the current one

•  Cannot access the current version given a prior one

Memento Framework

•  Regards the Web as a big Content Management System

•  Introduces a uniform capability to access versions on the Web

•  Does not build new archives but leverages all systems that host versions

Memento Framework

•  Is distributed: versions may exist on several servers

•  Uses time as a global version indicator

•  Is based on the primitives of the Web: resource, resource state, representation, content negotiation, link
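To make "time as a global version indicator" combined with "content negotiation" concrete, here is a minimal sketch of how a client might express the desired datetime in a request header. The header name Accept-Datetime comes from the Memento Internet-Draft, not from this slide, and the function name is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime_header(when: datetime) -> dict:
    """Return request headers asking for the resource state around `when`.

    Assumption: the header is named Accept-Datetime and carries an
    HTTP-date in GMT, as described in the Memento Internet-Draft.
    """
    # HTTP dates are always expressed in GMT, so convert to UTC first.
    when_utc = when.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(when_utc, usegmt=True)}


headers = accept_datetime_header(datetime(2008, 3, 16, tzinfo=timezone.utc))
print(headers["Accept-Datetime"])  # Sun, 16 Mar 2008 00:00:00 GMT
```

A client would send these headers along with an ordinary HTTP request; the server (or a TimeGate acting for it) then chooses the version closest to the requested datetime.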

Memento Interaction Overview

Original Resource and Versions

Bridge from Present to Past

Bridge from Past to Present

Memento Framework

Framework with Multiple Archives

Deployment Progress

Significant progress has been made towards seamless navigation of the Web of the Past.

Standardization

•  Standardization process started via the IETF

•  Interest from IETF and W3C

•  Encouraged by major Web architects, including: Tim Berners-Lee, Mark Nottingham, Michael Hausenblas

https://datatracker.ietf.org/doc/draft-vandesompel-memento/

Memento Clients

•  Several client tools developed by us and others

•  Add-ons for Firefox (operational) and Internet Explorer (experimental)

•  Applications for Android (operational) and iPhone/iPad (in development)

•  Paper in current issue of Code4Lib Journal

http://www.mementoweb.org/tools/

Memento Server Support

•  Memento-compliant Wayback software:

•  In use by Internet Archive

•  Available to Web archives, worldwide

•  Please experiment with this new 1.6 version!

http://www.mementoweb.org/tools/

Memento Server Support (2)

•  Plug-in for MediaWiki (operational)

•  Used on W3C’s main wiki

•  Please install it for your MediaWiki!

http://www.mementoweb.org/tools/

Memento Server Validator

•  Server-side client:

•  Attempts to perform all Memento actions against a given URI

•  Reports success/failure of the interactions and warnings for optional aspects

•  Kept up to date with IETF Internet-Draft

http://www.mementoweb.org/tools/validator/

Memento Proxy Support

•  Several systems that host Mementos made Memento-compliant “by proxy”:

•  Many Web Archives that do not yet run Memento-compliant software

•  3,000+ MediaWiki systems, including Wikipedia, Wikia

•  We would love all of these to become natively Memento-compliant!

Memento Web Site

•  Ongoing effort to add materials that support understanding and adoption:

•  Introduction to Memento

•  How to recognize Mementos, TimeGates, Original Resources?

•  Guidelines for servers that host Mementos (Web Archives, CMS, snapshot archives, etc.)

http://www.mementoweb.org/guide/

Funding

•  2007-2010: US $250K grant from Library of Congress

•  Approx. $50K on Memento

•  2010-2011: US $1 Million follow-up grant from Library of Congress

•  For: Specification, outreach, tool development, further research

Memento and Discovery

Very few Web sites provide a “timegate” link.

Need additional mechanisms to support Discovery.

Batch Discovery: TimeMaps

A TimeMap minimally lists:

•  URI and datetime of Mementos known to an archive

•  URI of Original Resource

TimeMaps can be aggregated across systems that host Mementos
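The minimal TimeMap content and the aggregation step can be sketched as a small data structure. This is an illustration under stated assumptions, not the serialization defined by the Memento specification: the class names, field names, and the `aggregate` function are hypothetical, as are the example archive URIs.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class MementoEntry:
    uri: str            # URI of the Memento in the archive
    captured: datetime  # datetime of the archived version


@dataclass
class TimeMap:
    original: str                                 # URI of the Original Resource
    mementos: list = field(default_factory=list)  # MementoEntry items


def aggregate(timemaps):
    """Merge TimeMaps from several archives for one Original Resource."""
    originals = {tm.original for tm in timemaps}
    if len(originals) != 1:
        raise ValueError("all TimeMaps must describe the same Original Resource")
    merged = sorted(
        (m for tm in timemaps for m in tm.mementos),
        key=lambda m: m.captured,
    )
    return TimeMap(originals.pop(), merged)


# Hypothetical archives holding versions of the same page.
a = TimeMap("http://www.tate.org.uk/", [
    MementoEntry("http://archive-a.example/2008/tate", datetime(2008, 3, 16, tzinfo=timezone.utc)),
])
b = TimeMap("http://www.tate.org.uk/", [
    MementoEntry("http://archive-b.example/2006/tate", datetime(2006, 1, 5, tzinfo=timezone.utc)),
])
combined = aggregate([a, b])
print([m.uri for m in combined.mementos])
```

Sorting by datetime gives a single chronological view across archives, which is what a client needs in order to pick the Memento closest to a requested date.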

Batch Discovery: Feed of TimeMaps

System that hosts Mementos exposes Feed of TimeMaps to allow applications to remain in sync with its collection:

•  One Atom entry per Original Resource

•  The entry links to or includes a TimeMap

•  The entry's updated value changes when additional Mementos become available

•  The ID of the entry is a tag URI based on the URI of the Original Resource

•  Can be protected, and include license information

•  Could be anonymized by aggregating service
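One way such a feed entry might look is sketched below. The slide does not say how the tag URI is derived or which link relation is used, so several details are assumptions: the tag authority and date, the choice to percent-encode the Original Resource URI into the tag, the rel="timemap" link relation, and the function name.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from urllib.parse import quote

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)


def timemap_entry(original_uri, timemap_uri, last_memento_added,
                  authority="archive.example.org", minted="2011"):
    """Build one Atom entry describing the TimeMap of one Original Resource."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    # Assumption: the tag URI embeds the percent-encoded Original Resource URI.
    tag = f"tag:{authority},{minted}:{quote(original_uri, safe='')}"
    ET.SubElement(entry, f"{{{ATOM}}}id").text = tag
    ET.SubElement(entry, f"{{{ATOM}}}title").text = f"TimeMap for {original_uri}"
    # `updated` moves forward whenever the archive adds a Memento.
    stamp = last_memento_added.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    ET.SubElement(entry, f"{{{ATOM}}}updated").text = stamp
    # Assumption: the entry links to (rather than embeds) the TimeMap.
    ET.SubElement(entry, f"{{{ATOM}}}link", rel="timemap", href=timemap_uri)
    return entry


entry = timemap_entry(
    "http://www.tate.org.uk/",
    "http://archive.example.org/timemap/http://www.tate.org.uk/",  # hypothetical
    datetime(2011, 5, 9, 12, 0, tzinfo=timezone.utc),
)
print(ET.tostring(entry, encoding="unicode"))
```

Because the entry ID is stable per Original Resource and only `updated` changes, a consuming application can poll the feed and refetch just the TimeMaps that gained Mementos since its last visit.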

Batch Discovery: robots.txt

•  robots.txt file is used by Web servers to convey crawling policies

•  Add directives to support discovery of TimeGates and Feeds of TimeMaps

TimeGate: http://dutch.archive.org/timegate/
Archived: .nl

TimeGate: http://all.archive.org/timegate/
Archived: *

TimeMapFeed: http://dutch.archive.org/feed/feed1.xml
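A client could read these proposed directives with a few lines of parsing. The sketch below assumes that each Archived line scopes the TimeGate line directly before it, which is one reading of the slide's layout; the function name is hypothetical and the directives themselves are a proposal, not an established robots.txt standard.

```python
def parse_memento_directives(robots_txt):
    """Extract the proposed TimeGate / Archived / TimeMapFeed directives.

    Returns (timegates, feeds), where timegates is a list of
    (timegate_uri, [archived_scopes]) pairs and feeds is a list of URIs.
    """
    timegates = []
    feeds = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop robots.txt comments
        if ":" not in line:
            continue
        name, value = (part.strip() for part in line.split(":", 1))
        name = name.lower()
        if name == "timegate":
            timegates.append((value, []))
        elif name == "archived" and timegates:
            # Assumption: the scope belongs to the most recent TimeGate.
            timegates[-1][1].append(value)
        elif name == "timemapfeed":
            feeds.append(value)
    return timegates, feeds


sample = """\
User-agent: *
Disallow:

TimeGate: http://dutch.archive.org/timegate/
Archived: .nl

TimeGate: http://all.archive.org/timegate/
Archived: *

TimeMapFeed: http://dutch.archive.org/feed/feed1.xml
"""
timegates, feeds = parse_memento_directives(sample)
print(timegates)
print(feeds)
```

Ordinary crawling directives such as User-agent and Disallow are simply skipped, so the same file can keep serving its existing purpose.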

Memento and Branding

Memento can recreate pages using resources from different archives.

This poses a branding challenge.

Current Branding Practice for Web Archives

Page and embedded resources from same Web Archive

Branding for page and embedded resources from single archive

Branding for Web Archives in Memento Mode

Page and embedded resources from various Web Archives

[Figure labels: HTML's branding; No branding; No branding]

Will be researched

Alternative Web Archiving Strategies

Crawl-based Archives host distinct observations.

Transactional Archives never miss an update.

Crawl-Based Web Archives

Distinct Observations are Archived for Many Servers

Server-Side Transactional Web Archives

Entire Change History is Archived for a Single Server

Development of Transactional Web Archive Software

Access:

•  Online, real-time access via Memento TimeGates

•  Batch export via WARC files for long-term preservation

Capture:

•  Apache connection filter module captures URI, headers, body

•  POSTs in real time to transactional archive
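The capture and access roles described here can be illustrated with a toy in-memory model. It is not the software presented in the talk: the class, its method names, and the decision to skip a capture whose body is unchanged are all assumptions made for the sketch.

```python
from bisect import bisect_right
from datetime import datetime, timezone


class TransactionalArchive:
    """Toy model: store every changed response, answer datetime lookups."""

    def __init__(self):
        self._history = {}  # uri -> list of (served_at, headers, body)

    def record(self, uri, served_at, headers, body):
        """Called for each response the server sends (the capture side)."""
        versions = self._history.setdefault(uri, [])
        if versions and served_at < versions[-1][0]:
            raise ValueError("captures must arrive in chronological order")
        # Assumption: an unchanged body is not a new version.
        if versions and versions[-1][2] == body:
            return False
        versions.append((served_at, headers, body))
        return True

    def timegate(self, uri, when):
        """Return the version that was live at `when`, or None (the access side)."""
        versions = self._history.get(uri, [])
        idx = bisect_right([v[0] for v in versions], when)
        return versions[idx - 1] if idx else None


archive = TransactionalArchive()
utc = timezone.utc
archive.record("/news", datetime(2011, 5, 1, tzinfo=utc), {}, b"v1")
archive.record("/news", datetime(2011, 5, 2, tzinfo=utc), {}, b"v1")  # no change
archive.record("/news", datetime(2011, 5, 3, tzinfo=utc), {}, b"v2")
print(archive.timegate("/news", datetime(2011, 5, 2, 12, tzinfo=utc))[2])  # b'v1'
```

Because every served change is recorded, a lookup for any datetime returns exactly the state that was live then, which is the sense in which a transactional archive never misses an update.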

Update on Memento http://mementoweb.org/

Herbert Van de Sompel Robert Sanderson Michael L. Nelson

Towards Seamless Navigation of the Web of the Past