
Integrating web archiving in acquisition practices. Sophie Derrot, Clément Oury and Caroline Rives

Date post: 20-Jul-2015
Upload: biblioteca-nacional-de-espana
Transcript
Page 1: Integrating web archiving in acquisition practices. Sophie Derrot, Clément Oury and Caroline Rives

Integrating web archiving in acquisition practices

Sophie Derrot, Clément Oury, Caroline Rives

Page 2:

26th November 2012 Session 2 - Integrating web archiving in acquisition practices

> The BnF network of subject librarians responsible for selecting websites and collection policy

> Use case (a): integrating web archiving in the BnF Collection Policy Charter

> Use case (b): building and maintaining a cooperation network for the French Presidential and General Elections project

Page 3:

Content policy for web archives: a digital curator’s view

Clément Oury

Head of Digital Legal Deposit

Bibliothèque nationale de France

@DlWebBnF

Page 4:

Workflow

Access

Preservation

Selection

Harvesting

Content curators

Digital curators: legal deposit department

Engineers: IT department

Engineers: Development

Engineers: IT department

Preservation experts

Page 5:

Who selects websites?

Content curators

Within BnF:

> 12 collection and legal deposit departments

> 12 « coordinators »

> 70 recommending officers

Regional libraries:

> 20 libraries associated for the 2012 electoral crawl

University libraries and laboratories

Associations: ORSE, APA

Page 6:

Within BnF

COLLECTIONS / LEGAL DEPOSIT / IT

> Collections: Reference, Audiovisual, Literature and Arts, Sciences, Law, Social Sciences, Philosophy, History, Maps, Music, Photographs, Performing Arts…

> Recommending officers (ca. 70): tools, workshops, tutorials, guidelines

> Recommending officers coordinators (12)

> Collection Board

> Legal deposit: Digital Legal Deposit (5), Legal deposit of prints

> Legal deposit Board

> IT: Development (2), Operations (2); software, hardware

> Digital & IT Steering Committee

> Discussions on amount of resources dedicated to each project

Page 7:

BnF curators network

> Each collection unit designates staff to contribute = recommending officers or « correspondants DLWeb »

> In every collection unit, a leader (coordinator)

> Training sessions and workshops for all, all year long and in cycles

> Training and awareness combine three key dimensions: technology, budget, collection

> 4 meetings a year and a governing board for collection leaders

> Documents, collaborative guidelines

> = ca. 80 people involved

http://collecteweb.bnf.fr/

Page 8:

3 layers of guidelines

Layer One: BnF general policy (scope)

> High level principles and settings, Law

> Applies to all crawls, either bulk or focused

Layer Two: Focused crawls policy

> Generic guidelines and templates for all curators

> Applies to focused and event crawls, in all collection units

Layer Three: Thematic policies, special projects

> Specific guidelines for special areas, themes, projects and Library units.

> Each collection unit to choose their own strategy

Page 9:

Framework for Layer 1

Layer One: BnF general policy (scope)

> High level principles and settings, Law

> Applies to all crawls, either bulk or focused

Page 10:

Legal aspects

KEY LEGAL QUESTIONS / BNF'S ANSWERS

> Scope of the « nation »?

- .fr and other TLDs related to the French territory (.re, .nc, .paris)

- Domain names registered by people or organizations hosted in France

- Content produced on French territory

> Permissions?

- No permission required. Access is restricted within the Library.

> Illicit contents?

- The Library isn’t responsible for the contents it displays but may withdraw them if required.

> Protection of personal data?

- The Library can crawl anything in scope but access restrictions apply.

= BnF can do bulk harvesting but needs to restrict access.
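
The three scope criteria on this slide can be sketched as a simple check. This is a minimal illustration, not the BnF's actual implementation: the function name `in_national_scope`, the `FRENCH_TLDS` set (limited to the TLDs named on the slide) and the two boolean flags standing in for registration and production lookups are all assumptions.

```python
from urllib.parse import urlsplit

# TLDs named on the slide; the full list of territory-related TLDs is longer.
FRENCH_TLDS = {"fr", "re", "nc", "paris"}


def in_national_scope(url, registrant_in_france=False, produced_in_france=False):
    """Return True if a URL meets at least one of the three scope criteria.

    The two boolean flags are hypothetical stand-ins for checks (registrar
    data, curator knowledge) that cannot be derived from the URL alone.
    """
    host = urlsplit(url).hostname or ""
    tld = host.rsplit(".", 1)[-1].lower()
    return tld in FRENCH_TLDS or registrant_in_france or produced_in_france
```

A site under a generic TLD such as .net only passes when one of the two flags is set, which mirrors the slide's point that scope is not limited to .fr.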

Page 11:

Collection history and tradition

KEY COLLECTION QUESTIONS / BNF'S ANSWERS

> Legal deposit philosophy

- Take it all, have it all. Make samples rather than selections.

> Collection development history, missions as a research library

- Encyclopaedism, consistency, continuity, select quality with an open mind, thinking of the long term. Foreign acquisitions enrich and complete legal deposit.

= BnF missions require a combination of both approaches: bulk + focus

Page 12:

Framework for layer 2

Layer Two: Focused crawls policy

> Generic guidelines and templates for all curators

> Applies to focused and event crawls, in all collection units

Page 13:

At the curator level, scoping means:

> Identify seeds (targeted URLs/documents)

> Whenever applicable, find an existing « template » in the curator tool which matches the publishing model of the website

> In this context, a template is a generic type of resource for which pre-defined harvesting settings can be applied by default.

> If no template applies, for each seed, define:

- depth

- frequency, and exact required date(s) of harvest, if applicable

- budget (for big sites: maximum number of URLs)
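
The scoping steps above can be sketched as a small data structure with a template fallback. This is an illustrative sketch only: `HarvestSettings`, `TEMPLATES`, `settings_for_seed` and the sample values are hypothetical names and figures, not the schema of the BnF curator tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HarvestSettings:
    depth: str                          # e.g. "high" or "surface"
    frequency: str                      # e.g. "yearly", "weekly", "once"
    max_urls: Optional[int] = None      # budget cap, mainly for big sites
    target_date: Optional[str] = None   # exact harvest date, if applicable


# Hypothetical template catalogue; names and values are illustrative only.
TEMPLATES = {
    "governmental_publication": HarvestSettings(depth="high", frequency="yearly"),
}


def settings_for_seed(seed_url, template=None, custom=None):
    """Use the template's defaults when one applies, else per-seed settings."""
    if template is not None:
        return TEMPLATES[template]
    if custom is None:
        raise ValueError(f"no template and no custom settings for {seed_url}")
    return custom
```

Keeping the per-seed settings in the same structure as the template defaults means a curator only fills in depth, frequency and budget by hand when no template matches.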

Page 14:

Examples of templates

> Governmental publication websites (e.g. French Department of Culture)

- Big

- Stable

- Self-archived

= Frequency: once a year; Depth: high

> Festival websites (e.g. Festival de Cannes)

- Small or medium size

- High rate of change during a short period

- Seldom self-archived

= Target a date; Depth: high

> Newspaper websites (e.g. Le Monde)

- Very big

- Very high rate of change

- Partly self-archived (depends)

= Combine strategies: harvest the whole site once a year; harvest the surface/news weekly
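
To make the "combine strategies" idea concrete, the three templates above can be written as lists of crawl strategies. This is a rough sketch under stated assumptions: the dictionary keys are hypothetical, the depth of the yearly and weekly newspaper crawls is inferred rather than given on the slide, and a "target date" is counted as a single harvest.

```python
# Each template maps to one or more (scope, frequency, depth) strategies,
# transcribed from the three examples above. Key names are hypothetical, and
# the depths given for the newspaper template are assumptions.
TEMPLATE_STRATEGIES = {
    "governmental_publication": [("whole site", "yearly", "high")],
    "festival": [("whole site", "target date", "high")],
    "newspaper": [
        ("whole site", "yearly", "high"),
        ("surface / news", "weekly", "shallow"),
    ],
}

# Approximate number of harvests per year for each frequency.
HARVESTS_PER_YEAR = {"yearly": 1, "weekly": 52, "target date": 1}


def harvests_per_year(template):
    """Count the crawls a template schedules over one year."""
    return sum(HARVESTS_PER_YEAR[freq] for _, freq, _ in TEMPLATE_STRATEGIES[template])
```

Under these assumptions a newspaper site is harvested far more often than a governmental one, but most of those harvests are shallow, which is how the combined strategy keeps the budget under control.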

Page 15:

Framework for layer 3

Layer Three: Thematic policies, special projects

> Specific guidelines for special areas, themes, projects and Library units.

> Each collection unit to choose their own strategy

Page 16:

Website selection 1/2

> Each department is responsible for its own content selection policy

> But they need to respect the limits given by the Digital Legal Deposit Service

> Ongoing crawls versus one-shot crawls

> … but project crawls are often renewed each year

> For ongoing crawls:

- Departments in charge of thematic acquisitions generally harvest the “academic web”

- Departments in charge of legal deposit try to encompass all kinds of documents

Page 17:

Website selection 2/2

> Depending on the type of crawl or project, websites are chosen and classified according to various criteria

- By type of author / publisher (governmental publications, electoral crawls, activist web, videos)

- By geographical origin (electoral crawls)

- By publishing model (online literature)

- By content format (videos)

- By author name…

> And these categories may be combined!

Page 18:

BnF collections diversity

> News websites

- http://www.lemonde.fr

> Governmental publications

- http://www.diplomatie.gouv.fr

> Online literature and net-art

- http://www.desordre.net

> Event-focused websites

- http://www.facebook.com/pages/Comite-de-Solidarite-avec-la-Lutte-du-Peuple-Egyptien/186252268073586/

> Other websites…

- http://theouchocolat.fr/minijeux/index.php

Page 19:

News websites

Page 20:

Government publications

Page 21:

Online literature and net-art

Page 22:

Event-focused websites

Page 23:

Other websites…

Page 24:

Integrating web archiving in the BnF Collection Policy Charter

> Caroline Rives, in charge of coordinating the acquisition policy at the Direction des Collections

> Questions of terminology: « acquisition policy », « collection development », « content strategy » (British Library)

Page 25:

The Charte documentaire des acquisitions

> The charter is a collaborative work, developed together with the specialized librarians

> The charter is validated by the management of the BnF and the Ministry of Culture and Communication

> The charter establishes guidelines for the staff

> The charter explains to an external audience how the BnF collects documents

> The charter justifies public expenditure: crucial in a time of economic difficulties!

Page 26:

Scope of acquisition policy

> Any document selected by the library, around the core of the exhaustive legal deposit of printed books and periodicals

> For web archiving, the whole French domain snapshot can be likened to traditional legal deposit (no discrimination)

> Focused crawls can be likened to acquisition policy

Page 27:

Origins of the charter

> Beginning of the 1990s: the new project of Bibliothèque nationale de France and the new building

> A new collection of books and periodicals for open access: 700,000 books

> Encyclopaedism: renewal of acquisitions in new fields of knowledge (sciences and technology, economics and sociology…)

> Renewal of acquisitions of foreign scientific literature

> Guidelines by destination, by discipline, by language…

Page 28:

The 2005 charter

> Projet d’établissement 2000-2003 (strategic plan)

> Production of a summary document, covering acquisitions of:

- all formats of documents

- all types of acquisitions (purchases, but also donations, exchanges and legal deposit of « specialized documents »)

- all disciplinary fields covered by the library

> Accessible on the BnF website since January 2005: http://www.bnf.fr/documents/charte_doc_acquisitions.pdf

Page 29:

Website archiving in the acquisition policy

> Not included in the 2005 charter

> But web harvesting had already been organized, together with the specialized librarians at the DCO, for the selection of websites

> Coordination with other types of selection: physical documents, but also electronic resources, Signets…

> 2006: DADVSI law

> 2007: a first approach to an acquisition policy for websites

Page 30:

The 2007 document

> Questionnaire to define criteria for selection

> Filled in by the correspondants and coordinators in the departments

> Summary included in the 2007 general strategic document on web archiving

> Main concern at the time: ensuring the continuity of collections

> Rising interest in new models of publication on the web, especially in the arts

Page 31:

The new charter

> 2010-2013: Preparation of an updated charter

> Web archiving is now part of the acquisition policy, and will be included in the new charter.

Page 32:

Summary of the charter: introduction

> Missions of the BnF: to preserve the national heritage, to provide materials and services for research and information

> Context: diversity of acquisition means, diversity of formats

> Ground rules: focus on France, encyclopaedism, openness to other cultures, broad chronological scope, services to all audiences, cooperation with the library and information network

> Budget and human resources.

Page 33:

Summary of the charter: new fields

> Impact of evolutions of audiences

> Impact of the reorganization of the Haut-de-jardin library

> Increasing place of electronic resources

> Evolution of acquisition policy in sciences and technology

> Web harvesting

> Impact of rehabilitation of the Richelieu building

> Coverage of foreign languages

> Partnerships with other libraries.

Page 34:

Summary of the new charter: acquisition policies by domains

> A domain can be a discipline (music, mathematics, French literature…) but it can also be a special format of material (manuscripts, films...).

> A domain can be shared by several departments.

> For each domain, a specific policy is defined, both quantitative and qualitative.

Page 35:

Structure of a domain policy. 1

> Name of domain

> Related domains

> Description of domain

> General directions of acquisition policy in the domain

> Departments involved

> Collected formats

> Means of acquisition by department and by format: purchase, legal deposit, donations…

> Audiences by location: general public, research, professional…

Page 36:

Structure of a domain policy. 2

> Types of formats (volumes and contents):

1. Books and periodicals

2. “Specialized documents” (for example, printed and manuscript music scores)

3. Audiovisual documents (for example recorded music, films, multimedia)

4. Archives and manuscripts

5. Electronic resources

6. Web harvesting

> Other libraries specialized in the domain.

Page 37:

An example of thematic collections: archiving the French political Web

Sophie Derrot

Digital Legal Deposit Service

Page 38:

Why archive the political Web?

> Specific role played by the Web in the development of electoral campaigns

> Material with a short lifespan

Page 39:

A dynamic project

> Dense and dynamic project, requiring a high level of responsiveness

> Broad coverage of the national territory

> Regular repetition of the project, with strong links between each collection

Page 40:

Ten years of the French electoral Web

> Project began in 2002

> All the major elections covered

> Project over a short period of time

> A growing cooperation between the BnF and its partners

Page 41:

Collection policy

> Websites which are representative and active

> Common selection criteria, which are those of the digital legal deposit

> A typology organised by producer: government, candidates, community

Page 42:

The typology 1/3

0 - Official websites about the campaign

Page 43:

The typology 2/3

1 - Candidates and their organizations

> 1.1 Candidate websites and blogs

> 1.2 Political parties and networks

> 1.3 Related organizations

Page 44:

The typology 3/3

2 - Looking at the campaign

> 2.1 Directories, observatories and analyses

> 2.2 Traditional media (press)

> 2.3 Unions, federations, lobbies, others

> 2.4 Individuals and Web communities
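
The typology from the three slides above lends itself to a simple lookup table. The sketch below is illustrative only: the codes and labels come from the slides, while the `TYPOLOGY` dictionary and the `describe` helper are hypothetical names, not part of any BnF tool.

```python
# Typology codes and labels transcribed from the three typology slides.
TYPOLOGY = {
    "0": "Official websites about the campaign",
    "1": "Candidates and their organizations",
    "1.1": "Candidate websites and blogs",
    "1.2": "Political parties and networks",
    "1.3": "Related organizations",
    "2": "Looking at the campaign",
    "2.1": "Directories, observatories and analyses",
    "2.2": "Traditional media (press)",
    "2.3": "Unions, federations, lobbies, others",
    "2.4": "Individuals and Web communities",
}


def describe(code):
    """Return 'parent > child' for a sub-category, or the label for a top level."""
    label = TYPOLOGY[code]
    parent = code.split(".")[0]
    return label if parent == code else f"{TYPOLOGY[parent]} > {label}"
```

Tagging each selected seed with one of these codes would let curators in different libraries classify sites by producer in the same way.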

Page 45:

Close-up: elections of 2012

> Crawls from January to July

> 2012 collection:

- 11.04 TB of data

- 379 million collected URLs

- 10,000 selected and harvested websites
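
For a sense of scale, the 2012 figures can be turned into rough averages. This is back-of-the-envelope arithmetic, not a statistic reported by the authors: it assumes the data volume is in terabytes with 1 TB = 10**12 bytes, and it treats the 10,000 websites as an exact count although the figure is probably rounded.

```python
# Figures from the slide. The volume is read as terabytes with
# 1 TB = 10**12 bytes, which is an assumption; the slide does not
# state the unit convention.
TOTAL_BYTES = 11.04 * 10**12
TOTAL_URLS = 379 * 10**6
TOTAL_SITES = 10_000

urls_per_site = TOTAL_URLS / TOTAL_SITES
bytes_per_site = TOTAL_BYTES / TOTAL_SITES
bytes_per_url = TOTAL_BYTES / TOTAL_URLS

print(f"{urls_per_site:,.0f} URLs per site on average")
print(f"{bytes_per_site / 10**9:.2f} GB per site on average")
print(f"{bytes_per_url / 10**3:.1f} kB per URL on average")
```

Averages hide a very uneven distribution: a national newspaper site and a local candidate's blog both count as one website here.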

Page 46:

The team

> IT department at the BnF for the technical side

> Digital Legal Deposit service for the coordination of the project

> Librarians for the selection:

- 18 librarians within the BnF, in 3 thematic departments

- 44 librarians in 20 regional libraries

Page 47:

National cooperation

Page 48:

A collection to be put to use

> All the collections are available to researchers in the reading rooms

> Studies already carried out on this subject

> Guided tour in the Archives since 2008 (for 2002-2007)

> 2012: data on the French Open-Data portal, data.gouv.fr

Page 49:

Page 50:

Questions?

Thank you for your attention!

