Web Archives: Interacting with Scholars Helen Hockx-Yu Head of Web Archiving British Library 28 November 2013

Page 1

Web Archives: Interacting with Scholars

Helen Hockx-Yu

Head of Web Archiving

British Library

28 November 2013

Page 2

OVERVIEW

Access to Web Archives

Page 3

Web Archiving initiatives worldwide

Source: http://en.wikipedia.org/wiki/File:Map_of_Web_archiving_initiatives_worldwide.png

Page 4

(Scholarly) use of web archives?

Restricted access, e.g. large-scale national web archives referred to as “dark archives”

Archiving institutions focus on data collection, not usage

“Document-centric” access methods

Archives cannot reproduce exact replicas of the original websites

No agreed way of calculating or benchmarking access statistics

Little evidence of scholarly use of web archives, making it difficult to understand requirements
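To make the benchmarking problem concrete, the following Python sketch computes three different “access” figures from the same handful of web server log lines in Common Log Format. The log lines, the `/ukwa/` paths, the helper name `access_stats` and the rule for what counts as a page view are all hypothetical; the only point is that identical traffic yields different totals depending on the definition chosen.

```python
import re

# Common Log Format: host ident authuser [date] "method path protocol" status bytes
CLF_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\S+)$')

# Hypothetical rule: requests for these static assets are not counted as page views
ASSET_SUFFIXES = (".css", ".js", ".png", ".gif", ".jpg")


def access_stats(lines):
    """Return three competing 'access' metrics computed from the same log lines."""
    hits = 0          # every well-formed request
    page_views = 0    # successful, non-asset requests only
    visitors = set()  # distinct client hosts behind those page views
    for line in lines:
        match = CLF_LINE.match(line)
        if match is None:
            continue  # skip malformed lines rather than guessing
        host, _date, _method, path, status, _size = match.groups()
        hits += 1
        if status == "200" and not path.endswith(ASSET_SUFFIXES):
            page_views += 1
            visitors.add(host)
    return {"hits": hits, "page_views": page_views, "unique_hosts": len(visitors)}


# Hypothetical sample log (addresses from the 192.0.2.0/24 documentation range)
SAMPLE_LOG = [
    '192.0.2.1 - - [28/Nov/2013:10:00:00 +0000] "GET /ukwa/ HTTP/1.1" 200 5120',
    '192.0.2.1 - - [28/Nov/2013:10:00:01 +0000] "GET /ukwa/style.css HTTP/1.1" 200 900',
    '192.0.2.1 - - [28/Nov/2013:10:01:00 +0000] "GET /ukwa/collection/98 HTTP/1.1" 200 4000',
    '192.0.2.2 - - [28/Nov/2013:10:05:00 +0000] "GET /ukwa/search?q=election HTTP/1.1" 200 8000',
    '192.0.2.2 - - [28/Nov/2013:10:06:00 +0000] "GET /ukwa/missing HTTP/1.1" 404 200',
]

print(access_stats(SAMPLE_LOG))  # {'hits': 5, 'page_views': 3, 'unique_hosts': 2}
```

An archive reporting 5 “hits” and another reporting 2 “visitors” for this same traffic would be publishing figures that cannot be compared directly.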

Page 5

Access methods

Chart of access methods offered: URL search 26, keyword search 15, full-text search 11, thematic collections 11, subject browsing 9, alphabetical browsing 14

International Internet Preservation Consortium (IIPC) – 46 members worldwide

The “IIPC members’ archives” list has 29 entries; 19 of these have full or partial online access, often permission-based

URL search is the standard, universal access method; it requires users to know the URL of the website they are looking for

For many archives, full-text search is the next challenge on the roadmap
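The difference between URL search and full-text search can be illustrated with two toy indexes over the same captures. This is a minimal Python sketch, not how any particular archive implements search: the three captures, the rough `normalise` rule and the function names are hypothetical, and production systems are considerably more elaborate.

```python
from collections import defaultdict

# Hypothetical miniature archive: (original URL, capture date YYYYMMDD, extracted text)
CAPTURES = [
    ("http://www.example.org.uk/", "20100506", "general election results and analysis"),
    ("http://www.example.org.uk/", "20130101", "new year message from the editor"),
    ("http://blog.example.co.uk/post/1", "20120315", "election campaign blog post"),
]


def normalise(url):
    """Rough URL normalisation: lower-case, drop scheme, leading 'www.' and trailing '/'."""
    url = url.lower()
    for scheme in ("http://", "https://"):
        if url.startswith(scheme):
            url = url[len(scheme):]
    if url.startswith("www."):
        url = url[len("www."):]
    return url.rstrip("/")


# URL search: captures keyed by normalised URL, so the user must already know the URL.
url_index = defaultdict(list)
for url, date, _text in CAPTURES:
    url_index[normalise(url)].append(date)


def url_search(url):
    return sorted(url_index.get(normalise(url), []))


# Full-text search: an inverted index mapping each word to the captures containing it.
word_index = defaultdict(set)
for position, (_url, _date, text) in enumerate(CAPTURES):
    for word in text.split():
        word_index[word].add(position)


def full_text_search(word):
    hits = word_index.get(word.lower(), set())
    return sorted((CAPTURES[i][0], CAPTURES[i][1]) for i in hits)


print(url_search("https://example.org.uk"))  # ['20100506', '20130101']
print(full_text_search("election"))
```

A researcher who only knows a topic gets nothing from the URL index (`url_search("election")` returns an empty list), whereas the inverted index returns both captures that mention the word.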

Page 6

Web archive as historical document

Page 7

SCHOLARLY FEEDBACK

UK Web Archive

Page 8

Scholarly feedback

User survey in 2012, with the following aims:

To identify the scholarly value of the UK Web Archive, as perceived by researchers

To obtain feedback on the access mechanisms currently offered by the archive

To identify gaps in terms of content coverage

To obtain insight into why researchers may or may not use the web archive

Page 9

Methodology

Conducted by IRN Research between May and June 2012

94 telephone interviews with previous users and non-users of the UK Web Archive; 74% were non-users

A small group was asked to undertake a second phase, running searches and detailing each stage; these were documented as case studies

Page 10

Interview sample by subject

| Subject | Non-users | Users |
|---|---|---|
| Arts and Humanities | 33 | 10 |
| Social Sciences | 27 | 11 |
| Science, Technology, Medicine | 4 | 3 |
| Total | 64 | 24 |
| Unclassified | 6 | - |
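As a consistency check, the subject breakdown can be reconciled with the methodology figures (94 interviews, 74% non-users). The short Python calculation below assumes, as the table's layout indicates, that the six unclassified interviewees are counted as non-users; the variable names are illustrative only.

```python
# Interview counts from the "Interview sample by subject" slide
SAMPLE = {
    "Arts and Humanities": {"non_users": 33, "users": 10},
    "Social Sciences": {"non_users": 27, "users": 11},
    "Science, Technology, Medicine": {"non_users": 4, "users": 3},
}
UNCLASSIFIED_NON_USERS = 6  # listed under non-users only

classified_non_users = sum(row["non_users"] for row in SAMPLE.values())
users = sum(row["users"] for row in SAMPLE.values())
non_users = classified_non_users + UNCLASSIFIED_NON_USERS
total = non_users + users
non_user_share = round(100 * non_users / total)  # percentage, rounded to a whole number

print(classified_non_users, users, non_users, total, non_user_share)  # 64 24 70 94 74
```

70 non-users out of 94 interviews is about 74.5%, which matches the 74% quoted in the methodology.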

Page 11

Scholarly value

Non-users | Users

Appreciate the potential value, but for many there is no relevant content

All understand the value as a snapshot of selected sites at specific times

More special collections would increase value

Value would increase with more scientific and technical content

Page 12

Access Mechanisms

Non-users | Users

Search tool easy to use, but complicated for a minority

Majority satisfied with presentation of results and ease of use of the site

Most search or browse by special collection

More interest in visualisation tools

Search results unstructured and random

Need for improved data mining tools

More explanation about functions and features needed

Limited interest in visualisation tools

Page 13

Additional functions and features

Non-users | Users

Improvements to search results pages

6-monthly updates

Interactive features (raised by both groups)

Facility to suggest special collections

Too much text on home page

Page 14

Content coverage

Non-users | Users

More relevant special collections

More images, illustrations, rich media

More images, blogs

Politics, contemporary British history

Too much missed from specific websites

Page 15

Reasons for using or not using the UK Web Archive (UKWA)

Non-users | Users

Current content not relevant

Majority “very likely” to use again, as there is content of interest

More information regarding selection policy

Another 39% “quite likely”

Less than a quarter “very likely” to use again

Page 16

Why do researchers use or not use a web archive?

Relevance of content determines whether researchers use it

Selective web archives please some but disappoint others

Use web archives for reference AND analytics

A significant portion of the research community has yet to be reached

Page 17

Access statistics for the UK Web Archive: 1 Jan – 28 Nov 2013

Page 18

INTERACTING WITH SCHOLARS

Web Archives


Page 19

Scholarly interactions: three types

Archive-driven
• Initiated by archival institutions
• Aimed at understanding scholarly requirements and improving archival practice

Scholar-driven
• Initiated by scholars with a research interest related to web archiving or archived web material, including many “unknown” scholars
• A number of active research groups emerging: Netlab, WebArt and DMI, IHR, OII, ODU…
• Attention from the Web Science community

Project-based
• Of varying scale, scope and funding sources
• Developing web archiving or discipline-specific solutions
• Researchers and archiving institutions as partners

Page 20

Scholarly interactions: three phases

Phase 1: Building collections
• Scholars’ involvement in scoping collections, and in selecting and describing websites relevant to their research interests
• Creation of specific, (narrow) topical collections, e.g. “Religion, politics and law since 2005” in the UK Web Archive

Phase 2: Formulating research questions
• Brainstorming sessions, workshops etc.
• Shift of focus to web archives in their entirety
• The Analytical Access to the Domain Dark Archive (AADDA) project: 9 research proposals by arts, humanities and social sciences scholars, and a prototype UI for analytical access
• Lack of awareness and baseline knowledge; time- and resource-consuming
• Challenging: you don’t know what you don’t know

Page 21

Scholarly interactions: three phases

Phase 3: Independent use of web archives
• The desired “go-to” state, meeting common scholarly requirements
• Web archives do not become bottlenecks
• Baseline knowledge is self-explanatory, e.g. the scope of the archive, its coverage and lacunae, how it was collected, and how a particular website was crawled
• Clear interfaces and jargon-free descriptions aligned with scholarly requirements
• Open access, including provision of downloadable derived or secondary datasets, e.g. http://data.webarchive.org.uk/opendata/

Page 22

How was the UK web linked in 1996?


• By Rainer Simon using UK Host-Level Link Graph (1996-2010) dataset

• Based on the 1996 portion: 58,842 hosts (nodes); 184,433 host-to-host links (edges)

• UK web as part of the global web

• Scalability issues with large dataset over time
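Node and edge counts like those quoted for 1996 come from reducing a multi-year host link list to a single year. The Python sketch below shows that step on a hypothetical tab-separated extract; the column layout, host names and the helper `graph_for_year` are assumptions for illustration, not the published dataset's actual format. Because it accepts any iterable of lines (such as an open file), it keeps only one year's edges in memory, which is one simple way to approach the scalability issue mentioned above.

```python
from collections import Counter

# Hypothetical extract in an assumed "year<TAB>source host<TAB>target host" layout;
# the published dataset's actual columns may differ.
SAMPLE = """\
1996\ta.example.uk\tb.example.uk
1996\tb.example.uk\ta.example.uk
1996\tb.example.uk\tc.example.uk
1996\tc.example.uk\tb.example.uk
1997\ta.example.uk\tc.example.uk
"""


def graph_for_year(lines, year):
    """Build the host-to-host edge set for one year from an iterable of text lines."""
    edges = set()
    for line in lines:
        line_year, source, target = line.rstrip("\n").split("\t")
        if line_year == year:
            edges.add((source, target))  # duplicate links collapse into one edge
    nodes = {host for edge in edges for host in edge}
    in_degree = Counter(target for _source, target in edges)
    return nodes, edges, in_degree


nodes, edges, in_degree = graph_for_year(SAMPLE.splitlines(), "1996")
print(len(nodes), len(edges), in_degree.most_common(1))  # 3 4 [('b.example.uk', 2)]
```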

Page 23

SCHOLARLY REQUIREMENTS

Web Archives

Page 24

Scholarship is changing

Blurred boundaries between scholarly sources and popular sources, even more so in the context of the web

Any source used for scholarly purposes can be defined as a scholarly source

Scholarship is evolving:
• Computationally engaged research gaining momentum, e.g. digital humanities
• Redrawing disciplinary boundaries
• Less text-based, more multimedia-driven
• The web playing an important role: will archives of the web do the same?

Page 25

Scholarly use (of digital sources): key characteristics

Availability or accessibility

Text and paratext, the latter defined by Gérard Genette as “accompaniments” that “surround or prolong the text”. Niels Brügger (2010) applied this concept to websites, arguing that web paratext differs in form and function and plays a crucial role in the textual coherence of a website

Or context, in the usual sense of the word, e.g. outlinks and inlinks

Citation, the backbone of research, requires persistent identification of sources, ideally retrievable ones

Sources relevant and specific to the research question, without any arbitrarily imposed (national, geographical or format-related) boundaries

Quality

Flexibility / ability to apply digital methods for analytics and the discovery of new knowledge

Page 26

Requirements for web archives

| Characteristics of scholarly use | Requirements for web archives |
|---|---|
| Availability | No access restrictions; available online |
| Paratext or context | Access to collection policy and scope, crawl configuration, crawl log and any contextual information |
| Persistence and citability | Longevity of web archives; persistent identifiers; standards for citing archived websites; integration with bibliographic management tools (e.g. Zotero) |
| Collect / organise research corpus | Archiving of research corpora on demand; means to mix, match and reassemble corpora based on research questions |
| Quality | Archival version represents the live website as closely as possible in completeness, intellectual content, behaviour, and look and feel; curation |
| Applying digital methods | Multiple access methods, including data analytics and visualisations; access to web archives as “big data” |
| Boundary- and format-independent | Interlinked web archives; integration with other digital and printed holdings, e.g. books, e-journals |
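One ingredient of persistence and citability is a stable, timestamped address for each capture. Wayback-style replay systems commonly place a 14-digit timestamp (YYYYMMDDhhmmss) before the original URL. The Python sketch below assumes that pattern, a hypothetical base URL `https://archive.example.org/wayback`, and an illustrative citation layout; it does not represent any agreed citation standard, which the requirements above note is still needed.

```python
from datetime import datetime, timezone

# Hypothetical replay endpoint; each archive has its own URL pattern.
ARCHIVE_BASE = "https://archive.example.org/wayback"


def archived_url(original_url, captured_at):
    """Build a Wayback-style replay URL: <base>/<YYYYMMDDhhmmss>/<original URL>."""
    timestamp = captured_at.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{ARCHIVE_BASE}/{timestamp}/{original_url}"


def cite(title, original_url, captured_at):
    """Illustrative citation recording both the live URL and the archived copy."""
    return (f"{title}. {original_url}. Archived {captured_at:%d %B %Y} at "
            f"{archived_url(original_url, captured_at)}")


capture = datetime(2013, 11, 28, 12, 0, 0, tzinfo=timezone.utc)
print(cite("Conservatives home page", "http://www.conservatives.com/", capture))
```

Normalising the timestamp to UTC keeps the identifier the same regardless of the time zone in which the capture time was recorded.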

Page 27

Unique Selling Points (USPs)

The live web as a fast-evolving, interactive, multi-dimensional, open, participatory and interlinked collective system

Web archives as static, flat, exclusive, individual systems with boundaries and limitations

Focus on USPs, the things that differentiate web archives from the live web:
• Some web resources have vanished and web archives hold the only copies of these
• Periodic snapshots showing the evolution of and changes to websites
• Web archives as comprehensive historical datasets, which lend themselves to analytical access
• Linked web archives

Page 28

Who has archived http://www.conservatives.com/?

Page 29

Mementos service

Allows users to find archived web pages (mementos) in multiple web archives across the world (search based on aggregated metadata)

Exposes the Memento protocol, which adds a time dimension to HTTP, making access to the past web as easy as access to the current web

Uses the Memento aggregator TimeGate hosted by lanl.gov

Source code

Also developed the Find memento bookmarklet, which finds archived versions of pages returning 404 errors while browsing
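Under RFC 7089 (Memento), a client asks a TimeGate for a resource as it was at a given time by sending an `Accept-Datetime` header, and a TimeMap lists the known mementos of a URI in `application/link-format`. The Python sketch below works offline on a hypothetical TimeMap modelled on the RFC's examples and picks the memento nearest to a requested datetime. “Nearest” is one simple selection policy chosen for illustration, not necessarily the one used by the lanl.gov aggregator or the service described here, and the regular-expression parser only handles well-formed input like the sample.

```python
import re
from email.utils import parsedate_to_datetime

# One link-format entry: <URI> followed by ;name="value" parameters
LINK_ENTRY = re.compile(r'<([^>]+)>((?:\s*;\s*[\w-]+="[^"]*")*)')
PARAM = re.compile(r'([\w-]+)="([^"]*)"')

# Hypothetical TimeMap, modelled on the examples in RFC 7089
TIMEMAP = """<http://a.example.org>;rel="original",
<http://arxiv.example.net/timemap/http://a.example.org>;rel="self";type="application/link-format",
<http://arxiv.example.net/web/20000620180259/http://a.example.org>;rel="first memento";datetime="Tue, 20 Jun 2000 18:02:59 GMT",
<http://arxiv.example.net/web/20091027204954/http://a.example.org>;rel="last memento";datetime="Tue, 27 Oct 2009 20:49:54 GMT",
<http://arxiv.example.net/web/20000621011731/http://a.example.org>;rel="memento";datetime="Wed, 21 Jun 2000 01:17:31 GMT"
"""


def parse_timemap(text):
    """Return (datetime, URI) pairs for every entry whose rel includes 'memento'."""
    mementos = []
    for uri, params in LINK_ENTRY.findall(text):
        attrs = dict(PARAM.findall(params))
        if "memento" in attrs.get("rel", "").split() and "datetime" in attrs:
            mementos.append((parsedate_to_datetime(attrs["datetime"]), uri))
    return mementos


def closest_memento(timemap_text, accept_datetime):
    """Pick the memento nearest in time to an Accept-Datetime style value."""
    target = parsedate_to_datetime(accept_datetime)
    return min(parse_timemap(timemap_text), key=lambda m: abs(m[0] - target))[1]


print(closest_memento(TIMEMAP, "Thu, 01 Jan 2009 00:00:00 GMT"))
```

The `original` and `self` entries are ignored because their `rel` values do not include `memento`, so only the three captures are candidates.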

Page 30

EXTRA SLIDES FOR ILLUSTRATION

UK Web Archive


Page 31

UK Web Archive: search interface


Page 32

UK Web Archive: browse interface


Page 33

Using n-grams for scholarly research

Courtesy of Dr Peter Webster, Institute of Historical Research, University of London
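The slide does not describe the method behind this n-gram example. As a generic illustration of the technique, the Python sketch below counts how often a phrase occurs per crawl year, normalised per 1,000 n-grams of the same length so that years with more archived text are not automatically favoured. The corpus and the helper names `ngrams` and `ngram_trend` are hypothetical.

```python
from collections import Counter

# Hypothetical corpus: (crawl year, text extracted from archived pages)
DOCS = [
    (2005, "a web archive of uk sites"),
    (2010, "the web archive and the national web archive today"),
    (2011, "search the uk web archive"),
]


def ngrams(tokens, n):
    """Return an iterator over successive n-grams (tuples) of a token list."""
    return zip(*(tokens[i:] for i in range(n)))


def ngram_trend(docs, phrase):
    """Occurrences of `phrase` per 1,000 n-grams of the same length, by year."""
    target = tuple(phrase.lower().split())
    n = len(target)
    hits, totals = Counter(), Counter()
    for year, text in docs:
        grams = list(ngrams(text.lower().split(), n))
        totals[year] += len(grams)
        hits[year] += sum(1 for gram in grams if gram == target)
    return {year: 1000 * hits[year] / totals[year] for year in sorted(totals) if totals[year]}


print(ngram_trend(DOCS, "web archive"))  # {2005: 200.0, 2010: 250.0, 2011: 250.0}
```

Without the normalisation, 2010 would appear to use the phrase twice as often as 2011, although its relative frequency is the same.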

Page 34

UK Web Archive: visual browsing


Page 35

RSS feed of latest instances


Page 36

Replacing original search function on site


Page 37

Showing the big picture


http://seadragon.com/view/wky

