Integrating web archiving in acquisition
practices
Sophie Derrot, Clément Oury, Caroline Rives
26th November 2012 Session 2 - Integrating web archiving in
acquisition practices
> The BnF network of subject librarians responsible for selecting websites and collection policy
> Use case (a): integrating web archiving in the BnF Collection Policy Charter
> Use case (b): building and maintaining a cooperation network for the French Presidential and General Elections project
Content policy for web archives: a digital
curator’s view
Clément Oury
Head of Digital Legal Deposit
Bibliothèque nationale de France
@DlWebBnF
Workflow
> Stages: Selection, Harvesting, Access, Preservation
> Actors: content curators; digital curators (legal deposit department); engineers (IT department, development); preservation experts
Who selects websites?
Content curators
> Within BnF:
- 12 collection and legal deposit departments
- 12 « coordinators »
- 70 recommending officers
> Regional libraries:
- 20 libraries associated for the 2012 electoral crawl
> University libraries and laboratories
> Associations: ORSE, APA
Within BnF
COLLECTIONS | LEGAL DEPOSIT | IT
Recommending officers (ca. 70): tools, workshops, tutorials, guidelines
Legal Deposit Board
Digital & IT Steering Committee
Digital Legal Deposit (5)
Legal deposit of prints
Reference, Audiovisual, Literature and Arts, Sciences, Law, Social Sciences, Philosophy, History, Maps, Music, Photographs, Performing Arts…
Development (2)
Operations (2)
Software, hardware
Discussions on amount of resources dedicated to each project
Recommending officers coordinators (12)
Collection Board
BnF curators network
> Each collection unit designates staff to contribute: the recommending officers, or « correspondants DLWeb »
> In every collection unit, a leader (coordinator)
> Training sessions and workshops for all, all year long and in cycles
> Training and awareness combine three key dimensions: technology, budget, collection
> 4 meetings a year and a governing board for collection leaders
> Documents, collaborative guidelines
> In total, ca. 80 people involved
http://collecteweb.bnf.fr/
3 layers of guidelines
Layer One: BnF general policy (scope)
- High-level principles and settings, law
- Applies to all crawls, either bulk or focused
Layer Two: Focused crawls policy
- Generic guidelines and templates for all curators
- Applies to focused and event crawls, in all collection units
Layer Three: Thematic policies, special projects
- Specific guidelines for special areas, themes, projects and Library units
- Each collection unit to choose their own strategy
Framework for Layer 1
Layer One: BnF general policy (scope)
- High-level principles and settings, law
- Applies to all crawls, either bulk or focused
Legal aspects
Key legal questions and BnF's answers
> Protection of personal data?
- The Library can crawl anything in scope, but access restrictions apply.
> Illicit contents?
- The Library isn't responsible for the contents it displays but may withdraw them if required.
> Permissions?
- No permission required. Access is restricted within the Library.
> Scope of the « nation »?
- .fr and other TLDs related to the French territory (.re, .nc, .paris)
- Domain names registered by people or organizations hosted in France
- Content produced on French territory
= BnF can do bulk harvesting but needs to restrict access.
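The scope rules above can be sketched as a simple test. This is an illustration only, not the BnF's actual crawler configuration: the function names are hypothetical, the TLD set holds just the examples given on the slide, and the two non-TLD criteria are passed in as flags because they cannot be derived from a URL alone.

```python
from urllib.parse import urlsplit

# Only the TLDs named on the slide; the slide says "and other TLDs",
# so the real list is longer (assumption: this set is illustrative).
IN_SCOPE_TLDS = {"fr", "re", "nc", "paris"}


def in_scope_by_tld(url: str) -> bool:
    """True if the URL's host ends with one of the listed TLDs."""
    host = urlsplit(url).hostname or ""
    tld = host.rsplit(".", 1)[-1].lower()
    return tld in IN_SCOPE_TLDS


def in_scope(url: str,
             registrant_in_france: bool = False,
             produced_in_france: bool = False) -> bool:
    """A site is in scope if any one of the three criteria holds."""
    return (in_scope_by_tld(url)
            or registrant_in_france
            or produced_in_france)
```

For instance, `in_scope("http://www.lemonde.fr")` holds on the TLD alone, whereas a page on a .com host is in scope only if one of the two flags is set by a curator.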
Collection history and tradition
Key collection questions and BnF's answers
> Collection development history, missions as a research library
- Encyclopaedism, consistency, continuity, select quality with an open mind, thinking of the long term. Foreign acquisitions enrich and complete legal deposit.
> Legal deposit philosophy
- Take it all, have it all. Make samples rather than selections.
= BnF missions require a combination of both approaches: bulk + focus
Framework for layer 2
Layer Two: Focused crawls policy
- Generic guidelines and templates for all curators
- Applies to focused and event crawls, in all collection units
At the curator level, scoping means:
> Identify seeds (targeted URL/documents)
> Whenever applicable, find an existing « template » from the curator tool which matches the publishing model of the website
> In this context, a template is a generic type of resource for which pre-defined harvesting settings can be applied by default.
> If no template applies, for each seed, define:
- depth
- frequency, exact required date(s) of harvest, if applicable
- budget (for big sites: maximum number of URLs)
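The scoping steps above can be sketched as follows. This is a minimal illustration, not the BnF curator tool: the class, the function and the template name are hypothetical, and the template values are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HarvestSettings:
    depth: str                      # e.g. "high" or "surface"
    frequency: str                  # e.g. "yearly", "weekly", or an exact date
    max_urls: Optional[int] = None  # budget, mainly for big sites


# Hypothetical template registry; names and values are placeholders.
TEMPLATES = {
    "governmental": HarvestSettings(depth="high", frequency="yearly"),
}


def settings_for(seed_url: str,
                 template: Optional[str] = None,
                 explicit: Optional[HarvestSettings] = None) -> HarvestSettings:
    """Use a matching template if one applies, else per-seed settings."""
    if template is not None:
        return TEMPLATES[template]  # KeyError if the template is unknown
    if explicit is None:
        raise ValueError(
            f"{seed_url}: no template applies, so depth, frequency "
            "and budget must be defined for this seed")
    return explicit
```

A seed that matches a template inherits its defaults; any other seed must carry its own depth, frequency and budget.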
Examples of templates
> Governmental publication websites (e.g. French Department of Culture)
- Big
- Stable
- Self-archived
= Frequency: once a year; depth: high
> Festival websites (e.g. Festival de Cannes)
- Small or medium size
- High rate of change during a short period
- Seldom self-archived
= Target a date; depth: high
> Newspaper websites (e.g. Le Monde)
- Very big
- Very high rate of change
- Partly self-archived (depends)
= Combine strategies: harvest the whole site once a year; harvest the surface / news pages weekly
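The combined strategy for newspaper websites can be sketched as a harvest calendar. This is an illustration under stated assumptions: the function name is hypothetical, and the choice of 1 January as the start of the weekly cycle is arbitrary, since the slide does not say on which day crawls run.

```python
from datetime import date, timedelta


def newspaper_schedule(year: int, full_crawl_day: date):
    """Return sorted (date, kind) pairs for one year:
    one whole-site harvest plus weekly surface harvests."""
    plan = [(full_crawl_day, "full site, depth high")]
    day = date(year, 1, 1)  # assumption: the weekly cycle starts on 1 January
    while day.year == year:
        plan.append((day, "surface / news pages"))
        day += timedelta(weeks=1)
    return sorted(plan)


plan_2012 = newspaper_schedule(2012, date(2012, 6, 1))
```

The same shape covers the other two templates: a governmental site keeps only the yearly entry, and a festival site replaces it with a single targeted date.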
Framework for layer 3
Layer Three: Thematic policies, special projects
- Specific guidelines for special areas, themes, projects and Library units
- Each collection unit to choose their own strategy
Website selection 1/2
> Each department is responsible for its own content selection policy
> But they need to respect the limits given by the Digital Legal Deposit Service
> Ongoing crawls versus one-shot crawls
> … but project crawls are often renewed each year
> For ongoing crawls:
- Departments in charge of thematic acquisitions generally harvest the “academic web”
- Departments in charge of legal deposit try to encompass all kinds of documents
Website selection 2/2
> Depending on the type of crawl or project, websites are chosen and classified according to various criteria
> By type of author / publisher (governmental publications,
electoral crawls, activist web, videos)
> By geographical origin (electoral crawls)
> By publishing model (online literature)
> By content format (videos)
> By author name…
> And these categories may be combined!
BnF collections diversity
> News websites: http://www.lemonde.fr
> Governmental publications: http://www.diplomatie.gouv.fr
> Online literature and net-art: http://www.desordre.net
> Event-focused websites: http://www.facebook.com/pages/Comite-de-Solidarite-avec-la-Lutte-du-Peuple-Egyptien/186252268073586/
> Other websites…: http://theouchocolat.fr/minijeux/index.php
News websites
Government publications
Online literature and net-art
Event-focused websites
Other websites…
Integrating web archiving
in the BnF Collection Policy Charter
> Caroline Rives, in charge of coordinating the acquisition policy at the Direction des Collections
> Questions of terminology: « acquisition policy », « collection development », « content strategy » (British Library)
The Charte documentaire des acquisitions
> The charter is a collaborative work, developed together with the specialized librarians
> The charter is validated by the management of the BnF and the Ministry of Culture and Communication
> The charter establishes guidelines for the staff
> The charter explains to an external audience how the BnF collects documents
> The charter justifies public expenditure: crucial in a time of economic difficulties!
Scope of acquisition policy
> Any document selected by the library, around the core of the exhaustive legal deposit of printed books and periodicals
> For web archiving, the snapshot of the whole French domain can be treated as equivalent to traditional legal deposit (no discrimination)
> Focused crawls can be treated as equivalent to acquisition policy
Origins of the charter
> Beginning of the 1990s: the new project of Bibliothèque nationale de France and the new building
> A new collection of books and periodicals for open access: 700,000 books
> Encyclopaedism: renewal of acquisitions in new fields of knowledge (sciences and technology, economics and sociology…)
> Renewal of acquisitions of foreign scientific literature
> Guidelines by destination, by disciplines, by languages…
The 2005 charter
> Projet d’établissement 2000-2003 (strategic plan)
> Production of a summary document, including acquisitions of:
- all formats of documents
- all types of acquisitions (purchases, but also donations, exchanges and legal deposit of « specialized documents »)
- all disciplinary fields covered by the library
> Accessible on the BnF website since January 2005: http://www.bnf.fr/documents/charte_doc_acquisitions.pdf
Website archiving in the acquisition policy
> Not included in the 2005 charter
> But web harvesting had already been organized, together with the specialized librarians at the DCO, for the selection of websites
> Links with other types of selections: physical documents, but also electronic resources, Signets…
> 2006: DADVSI law
> 2007: a first approach to an acquisition policy for websites
The 2007 document
> Questionnaire to define criteria for selection
> Filled in by the correspondants and coordinators in the departments
> Summary included in the 2007 general strategic document on web archiving
> Main concern at the time: ensuring the continuity of collections
> Rising interest in new models of publication on the web, especially in the arts
The new charter
> 2010-2013 : Realization of an updated charter
> Web archiving is now part of the acquisition policy, and will
be included in the new charter.
Summary of the charter: introduction
> Missions of the BnF : to preserve the national heritage, to
provide materials and services for research and information
> Context : diversity of acquisition means, diversity of formats
> Ground rules : focus on France, encyclopaedism, openness to other cultures, broad chronological scope, services to all
audiences, cooperation with the library and information network
> Budget and human resources.
Summary of the charter: new fields
> Impact of evolutions of audiences
> Impact of the reorganization of the Haut-de-jardin library
> Increasing place of electronic resources
> Evolution of acquisition policy in sciences and technology
> Web harvesting
> Impact of rehabilitation of the Richelieu building
> Coverage of foreign languages
> Partnerships with other libraries.
Summary of the new charter:
acquisition policies by domains
> A domain can be a discipline (music, mathematics, French
literature…) but it can also be a special format of material (manuscripts, films...).
> A domain can be shared by several departments.
> For each domain, a specific policy is defined, both
quantitative and qualitative.
Structure of a domain policy. 1
> Name of domain
> Related domains
> Description of domain
> General directions of acquisition policy in the domain
> Departments involved
> Collected formats
> Means of acquisition by department and by format: purchase, legal deposit, donations…
> Audiences by location: general public, research, professional…
Structure of a domain policy. 2
> Types of formats (volumes and contents):
1. Books and periodicals
2. “Specialized documents” (for example, printed and manuscript music scores)
3. Audiovisual documents (for example, recorded music, films, multimedia)
4. Archives and manuscripts
5. Electronic resources
6. Web harvesting
> Other libraries specialized in the domain.
An example of thematic collections:
archiving the French political Web
Sophie Derrot
Digital Legal Deposit Service
Why archive the political Web?
> Specific role played by the Web in the development of electoral campaigns
> Material with a short lifespan
A dynamic project
> Dense and dynamic project, requiring quick reactions
> Wide coverage of the national territory
> Regular repetition of the project, with strong links between successive collections
Ten years of the French electoral Web
> Project began in 2002
> All the major elections covered
> Project over a short period of time
> A growing cooperation between the BnF and its partners
Collection policy
> Websites which are representative and active
> Common selection criteria, which are those of the digital legal deposit
> A typology organised by producer: government, candidates, community
The typology 1/3
0 - Official websites about the campaign
The typology 2/3
1 - Candidates and their organizations
> 1.1 Candidate websites and blogs
> 1.2 Political parties and networks
> 1.3 Related organizations
The typology 3/3
2 - Looking at the campaign
> 2.1 Directories, observatories and analyses
> 2.2 Traditional media (press)
> 2.3 Unions, federations, lobbies, others
> 2.4 Individuals and Web communities
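The typology lends itself to a small lookup table keyed by its own codes. The sketch below is an illustration only: the codes and labels come from the slides, while the dictionary layout and the helper function are hypothetical and not the BnF's actual classification tool.

```python
# Codes and labels as given on the typology slides.
TYPOLOGY = {
    "0": "Official websites about the campaign",
    "1": "Candidates and their organizations",
    "1.1": "Candidate websites and blogs",
    "1.2": "Political parties and networks",
    "1.3": "Related organizations",
    "2": "Looking at the campaign",
    "2.1": "Directories, observatories and analyses",
    "2.2": "Traditional media (press)",
    "2.3": "Unions, federations, lobbies, others",
    "2.4": "Individuals and Web communities",
}


def label_path(code: str) -> list:
    """Return the labels from the top-level category down to `code`."""
    parts = code.split(".")
    prefixes = [".".join(parts[:i]) for i in range(1, len(parts) + 1)]
    return [TYPOLOGY[p] for p in prefixes]  # KeyError on unknown codes
```

A site tagged `"2.2"` then resolves to its category and sub-category in one call, which keeps selections from different librarians comparable.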
Close-up: elections of 2012
> Crawls from January till July
> 2012 collection:
- 11.04 TB of data
- 379 million collected URLs
- 10,000 selected and harvested websites
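A quick back-of-the-envelope reading of these figures, as a sketch. It assumes that the data volume is expressed in decimal terabytes, and the averages say nothing about how unevenly sites differ in size.

```python
total_bytes = 11.04e12        # assumption: 11.04 decimal terabytes
collected_urls = 379_000_000  # 379 million collected URLs
websites = 10_000             # selected websites

urls_per_site = collected_urls / websites    # average URLs per selected site
bytes_per_url = total_bytes / collected_urls  # average size of one URL

print(f"{urls_per_site:,.0f} URLs per site on average")
print(f"{bytes_per_url / 1000:.1f} kB per URL on average")
```

Under these assumptions, the average comes to 37,900 URLs per selected site and roughly 29 kB per collected URL.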
The team
> IT department at the BnF for the technical side
> Digital Legal Deposit service for the coordination of the project
> Librarians for the selection:
> 18 librarians within the BnF, in 3 thematic departments
> 44 librarians in 20 regional libraries
National cooperation
A collection to be put to use
> All the collections are available to researchers in the reading rooms
> Studies already carried out on this subject
> Guided tour in the Archives since 2008 (for 2002-2007)
> 2012: data on the French Open-Data portal, data.gouv.fr
Questions?
Thank you for your attention!