From Crawling to Walking: Improving Access to Web Archives
SAA 2014, Session 703
1. Jane Zhang
2. Michael Paulmeno
3. Meg Tuomala
4. Benn Joseph
5. Polina Ilieva
6. Jennifer Wright
7. John Bence
8. Olga Virakhovskaya
9. Anna Perricci
10. Rick Fitzgerald
11. Rosalie Lack
Jane Zhang
Catholic University of America
Web Records, Web Archived Files, and Web Archives Access Models
Jane Zhang, Catholic University of America
Session 703 - From Crawling to Walking: Improving Access to Web Archives
SAA 2014, Washington DC Saturday, August 16
Introduction
Web as records
The Web ARChive files as recordkeeping formats
Web archives access models
Web Archiving Initiatives
• A survey on web archiving initiatives
– Daniel Gomes et al., Foundation for National Scientific Computing, Portuguese Web Archive Team
– International Conference on Theory and Practice of Digital Libraries 2011, 25-29 September 2011
• Wikipedia: List of Web archiving initiatives
Web Archiving Initiatives
A survey on web archiving initiatives (2011)
• 42 web archiving initiatives worldwide
• 9 initiatives from the United States
List of Web Archiving Initiatives (July 2014)
• 70 web archiving initiatives worldwide
• 17 initiatives from the United States
Web File Formats
2011 Worldwide Survey
• The ARC and WARC formats are dominant, being used by 54% of the initiatives.
2014 List – USA
• 10 out of 17 initiatives identified as using the ARC and/or WARC formats
• 58% of the US web archiving initiatives
Web Archives Access Models
2011 Worldwide Survey
• 89% support access to URL history
• 79% enable searching metadata
• 67% provide full-text search over archived content
2014 List – USA
• URL history: 12 out of 17 – 70%
• Metadata: 13 out of 17 – 76%
• Full-text: 12 out of 17 – 70%
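The US shares quoted on these slides can be checked with simple arithmetic. The sketch below recomputes them from the counts given (10, 12, and 13 out of 17); the labels and the helper name `share` are illustrative only. The slide figures appear to be truncated rather than rounded, since 10/17 works out to 58.8%.

```python
def share(count: int, total: int) -> float:
    """Return count as a percentage of total."""
    return 100 * count / total


US_TOTAL = 17  # US initiatives on the July 2014 list

# Counts taken from the slides; the labels are shorthand for this example.
us_counts = {
    "ARC/WARC formats": 10,
    "URL history": 12,
    "Metadata search": 13,
    "Full-text search": 12,
}

for label, count in us_counts.items():
    print(f"{label}: {count}/{US_TOTAL} = {share(count, US_TOTAL):.1f}%")
```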
Metadata: Theme-based Collections
• Collection overview, name, title, subject, abstract, language, year captured
• Site title, subject, place, language
• Collection description, keyword, filter by site title, and/or file type, topic group
• Catalog records (collection or website)
Metadata: Provenance-based Collections
• Site owner, business activity, topic, sub-topic, region, country, language, year created, date archived
• Collection/series description, site title
• Keyword search, browse by agency
• Collection description, title keyword, browse by agency name, government branch, or agency expiration date
• Browse by region, then site owner
Michael Paulmeno
Delta State University
Accessing Web Archives Through the Library Catalog
By Michael Paulmeno
Overview
• Many challenges to making web archives accessible
• Archival description not fully compatible with library catalogs
• Problem not unique to web archives
• Differing metadata and content standards lead to separation between libraries and archives (i.e. silos)
• Researchers who access archives through library systems tend to use them longer 1
1 Noah Huffman, “More than Just Linking: Integrating MARC and EAD in a Single Discovery Interface at Duke, UNC-Chapel Hill, and NCSU,” 14.
The Current State of Affairs
• Collections accessed through multiple access points
• Subject headings 2
• Many organizations create two descriptions and link via MARC 856 field; this can cause confusion 3
• Yet significant discovery occurs through search engines 4
2 Michelle Mascaro, “Controlled Access Headings in EAD Finding Aids: Current Practices in Number of and Types of Headings Assigned,” 223.
3 Noah Huffman, “More than Just Linking: Integrating MARC and EAD in a Single Discovery Interface at Duke, UNC-Chapel Hill, and NCSU,” 3–5.
4 Ibid, 13.
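As a rough illustration of the "link via MARC 856" practice, the sketch below formats an 856 field that points from a catalog record to an online finding aid. It assumes the common convention of first indicator 4 (HTTP) and second indicator 2 (related resource), with $3 naming the material and $u holding the URL; the helper name `marc_856` and the example URL are hypothetical and not taken from the talk.

```python
from typing import Optional


def marc_856(url: str, materials: str = "Finding aid",
             note: Optional[str] = None) -> str:
    """Format a display-style MARC 856 field linking to a related resource."""
    parts = ["856 42", f"$3 {materials}", f"$u {url}"]
    if note:
        parts.append(f"$z {note}")  # $z carries a public note
    return " ".join(parts)


# Hypothetical finding-aid URL, for illustration only.
print(marc_856("https://example.org/findingaids/00000"))
```

A catalog record built this way still describes the collection separately from the finding aid, which is the duplication the slide flags as a source of confusion.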
Challenges to Integration
• MARC records lack detail 5 6
• Archivists uncertain about readiness to adopt new standards 7
• Many different systems (ArchivesSpace, Ebsco Discovery, Blacklight, various Integrated Library Systems) and metadata standards
• Other challenges specific to web archives
– Ex. How to represent a continuously accessioned resource?
5 Caprini and Kelcy Shepherd, “The MARC Standard and Encoded Archival Description,” 19.
6 Karen F. Gracy and Frank Lambert, “Who’s Ready to Surf the Next Wave? A Study of Perceived Challenges to Implementing New and Revised Standards for Archival Description,” 102.
7 Ibid, 117.
Towards the Future
• Increasing efforts to integrate archival description and library catalogs
– University of Denver Penrose Library 8
– Triangle Research Libraries Network 9
– Library of Congress
– UNC Chapel Hill
• Adaptability key to future collaboration
• What affects archives, affects web archives as well
8 Gregory C. Colati, Katherine M. Crowe, and Elizabeth S. Meagher, “Better, Faster, Stronger: Integrating Archives Processing and Technical Services.”
9 Noah Huffman, “More than Just Linking: Integrating MARC and EAD in a Single Discovery Interface at Duke, UNC-Chapel Hill, and NCSU.”
Works Cited
• Caprini, Peter, and Kelcy Shepherd. “The MARC Standard and Encoded Archival Description.” Library Hi-Tech 22, no. 1 (2004): 18–27. doi:10.1108/07378830410524468.
• Colati, Gregory C., Katherine M. Crowe, and Elizabeth S. Meagher. “Better, Faster, Stronger: Integrating Archives Processing and Technical Services.” Library Resources and Technical Services 53, no. 4 (October 2009): 261–270.
• Gracy, Karen F., and Frank Lambert. “Who’s Ready to Surf the Next Wave? A Study of Perceived Challenges to Implementing New and Revised Standards for Archival Description.” The American Archivist 77, no. 1 (Spring/Summer 2014): 96–132.
• Mascaro, Michelle. “Controlled Access Headings in EAD Finding Aids: Current Practices in Number of and Types of Headings Assigned.” Journal of Archival Organization 9, no. 3–4 (January 2011): 208–225. doi:10.1080/15332748.2011.643690.
• Huffman, Noah. “More than Just Linking: Integrating MARC and EAD in a Single Discovery Interface at Duke, UNC-Chapel Hill, and NCSU.” Journal for the Society of North Carolina Archivists 8, no. 2 (April 2011): 2–17.
Meg Tuomala
Gates Archive
Different strokes for different folks / Meeting the descriptive & access needs of multiple web archive collections / With minimal workflow and process change
Meg Tuomala
Assistant archivist, Gates Archive
Formerly e-records archivist at UNC-Chapel Hill
Web archiving at UNC: context
● Started in 2013; using Archive-It
● 6 web archive collections
● Extension of / supplement to existing collections
● Special collections at UNC consolidated; archival & biblio tech services are one dept
Different folks: the collections
Archival
● Southern Historical Collection
● Southern Folklife Collection
● University Archives
Biblio
● North Carolina Collection
● Rare Book Collection
● Digital Artists’ File
Different strokes: cataloging/ description & access
Archival
● Archival arrangement & description at the collection level, series level
○ Online finding aid
○ Catalog record (and EAD) in Library catalog
Biblio
● Bibliographic cataloging at the item level
○ Catalog record in Library catalog (artificial collection)
catalog record in library catalog
finding aids
Benn Joseph
Northwestern University Library
WASsup?: Describing Web Archives Using Archon
SAA Washington, D.C.
August 16, 2014
Benn Joseph
Manuscript Librarian
Northwestern University
Image of WAS public interface
Item record for crawled site in WAS
NU version of Archon:
• Only used for collection management
• Separate Blacklight/Solr public interface that searches and displays the finding aids
• Finding aids all live in a Fedora repository
• “Ingest EAD” button added to Archon, puts XML into Fedora to then be served via finding aids portal
Pic of entering in archon—container list
Entering WAS site URL as digital object in Archon
NUWA finding aid
Finding aids exported as MODS and ingested by Primo
Polina Ilieva
University of California, San Francisco
August 16, 2014
Polina Ilieva, UCSF Archives & Special Collections
Science Online: Evaluating usage, impact and appraisal
Why collect?
• Since they are so easily accessible, lab websites are used as reference tools by lab members
• Sharing datasets
• Channels for scholarly communications
• After funding ends, the website can be the only place where the data is preserved and available
Impact
• Not just preserved for future use; scientists need instant access
• Websites become an integral part of scientific scholarly output
Curation and Appraisal
• How to select from hundreds of labs?
• Web Archive pilot project in collaboration with the library’s Research Informationist: Research @UCSF collection
• Will use UCSF Profiles: Research Networking and Expertise Mining Tool
• Collect and analyze info about faculty and researchers who lead labs: length of service/title, number of scholarly publications, availability of websites, grants and awards.
What to collect?
• Protocols
• Data
• Images
• Lectures (a/v)
• Publications
• List of lab members
Access
Need to know how data and collections are used to find an optimal way to provide access
Thank you!
Polina E. Ilieva, CA
Head of Archives and Special Collections
University of California, San Francisco
Jennifer Wright
Smithsonian Institution Archives
Square Peg in a Round Hole: Integrating Web Archives into Existing Descriptive Practices
Jennifer Wright
Archives and Information Management Team Leader
SAA 2014, Session 703
Accession-based Collections Management
• Each transfer is a separate accession
• Each accession cataloged separately in CMS
• Each accession has its own finding aid
Solution for websites:
Crawls with similar dates and the same creator are combined into one accession
Description and Cataloging
• Describes each website/blog in accession
• Notes technical and other issues
• Includes crawl date(s)
• Indexes subjects, website/blog/exhibition titles, and other creators
EAD Finding Aid
• Includes descriptive data from CMS
• Lists each website/blog included in accession
• Uses DAO tag to link to crawl on Archive-It
Search on “Website Records” at http://siarchives.si.edu/search/sia_search_findingaids
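A minimal sketch of the DAO linking step, assuming EAD 2002 conventions in which `<dao>` carries a plain `href` attribute and an optional `<daodesc>` child. The function name `build_dao`, the title text, and the crawl URL below are hypothetical placeholders, not values from the Smithsonian finding aids.

```python
import xml.etree.ElementTree as ET


def build_dao(crawl_url: str, title: str) -> ET.Element:
    """Build an EAD <dao> element pointing at an archived crawl."""
    dao = ET.Element("dao", {"href": crawl_url, "title": title})
    desc = ET.SubElement(dao, "daodesc")
    para = ET.SubElement(desc, "p")
    para.text = f"Archived crawl: {title}"
    return dao


# Hypothetical crawl URL and title, for illustration only.
dao = build_dao("https://example.org/crawls/0000/", "Example website")
print(ET.tostring(dao, encoding="unicode"))
```

The same element could be emitted once per website or blog listed in an accession, so that each entry in the finding aid links to its own crawl.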
Archive-It
• Browse URLs
• Search across all Smithsonian crawls
• Search by keyword or limiting options
• Plan to take better advantage of metadata
Smithsonian on Archive-It: https://archive-it.org/organizations/660
John Bence
Emory University
WAS GOING ON AT EMORY?
Integration of WAS-CDL web archives with MARBL online finding aids and web presence
John Bence
@jdbence
“Topics” for browsing sites by creator or by institutional hierarchies (Laney Graduate School; ‘Administration’)
The URL supplied from WAS is given an ID and a persistent URL. The URL is then linked in the <dao> element
“Digital Materials Available” banner indicates existence of <dao> element
Choosing “Series 3: Web Archives” provides link to WAS site for relevant content
Website migration in summer 2013 allowed for integration of WAS search interface as a page on MARBL website
Next steps
• UX testing on finding aids integration vs. local search page
• Gather (read: develop) additional use analytics
For more go to:
• http://marbl.library.emory.edu/collections/archives/web.html
• http://findingaids.library.emory.edu/
Google Analytics for search interface from Feb 2013 to June 2014. Page went live in June 2013.
• #1 referral: Redirected URL of single web archive
• #2 referral: MARBL website search interface
• #3 referral: finding aids database
Thanks!
Olga Virakhovskaya
Bentley Historical Library, University of Michigan
Describing <archived> web content from single sites to web archives
Olga Virakhovskaya
http://bentley.umich.edu/
Local subject headings (MARC field 690)
LC subject headings (MARC fields 6xx)
MARC field 260/264
MARC fields 1xx/7xx
MARC fields 520 & 545 / History & Scope and Content notes
MARC field 245
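The field callouts above can be read as a simple mapping from descriptive elements to MARC tags. The sketch below lays that mapping out for a single archived website; every value is a hypothetical sample, the specific 1xx and 6xx tags chosen (110, 650) are only examples, and indicators and subfield codes are left out for brevity.

```python
from typing import List, NamedTuple


class Field(NamedTuple):
    tag: str    # MARC tag named in the slide callouts
    role: str   # descriptive element the tag carries
    value: str  # hypothetical sample content


# Hypothetical sample record for one archived website.
record = [
    Field("245", "Title", "Example University website"),
    Field("110", "Creator (1xx/7xx)", "Example University"),
    Field("264", "Publication (260/264)", "Example City : Example University"),
    Field("545", "History note", "Founded as a hypothetical institution."),
    Field("520", "Scope and content note", "Archived captures of the main site."),
    Field("650", "LC subject heading (6xx)", "Universities and colleges"),
    Field("690", "Local subject heading", "Example local heading"),
]


def render(fields: List[Field]) -> List[str]:
    """Return one display line per field, ordered by MARC tag."""
    ordered = sorted(fields, key=lambda f: f.tag)
    return [f"{f.tag}  {f.value}  [{f.role}]" for f in ordered]


for line in render(record):
    print(line)
```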
– Think BIG
– Automate
– Follow standards
– Be consistently clear
– Communicate
Be human
…because machines don’t know everything
Anna Perricci
Columbia University Libraries
MARC records for the Contemporary Composers Web Archive
Anna Perricci
Columbia University Libraries
SAA Lightning Talk (August 16, 2014)
Web Archiving at Columbia
We’ve only got 5 minutes!
• Columbia University Libraries web archiving program precedents
• Current Mellon grant
• Collaborative web archiving
Contemporary Composers Web Archive
Selectors
• Borrow Direct Music Librarians Group: music librarians at Brown, Columbia, Cornell, Dartmouth, Harvard, Johns Hopkins, Princeton, and Yale universities, MIT, and the universities of Chicago and Pennsylvania
Cataloging expertise
• Russell Merritt (cataloger specializing in music resources)
• Kate Harcourt (Director of Original and Special Materials Cataloging)
• Alex Thurman (Web Resources Collection Coordinator)
CCWA in Archive-It
Creating MARC records for web archives
• Creating MARC records for archived websites is standard practice at CUL
– MARC records make web archives discoverable in CLIO (Columbia Libraries Information Online)
• Collection level and seed level records
• Will use Archive-It interface to make Dublin Core records
Patron view of record in CLIO
Cataloger’s view of record in CLIO
Anticipating wider use of MARC records
• Records have been released to WorldCat
• Collaborators on cataloging were attentive to which fields will ordinarily be stripped out when a MARC record is imported to another institution’s OPAC
Conclusions
• So far, a sample of 10 records has taught us…
• Positive feedback from music librarians
• Next, we will add another 44 records for the archived sites in CCWA
Rick Fitzgerald
Library of Congress
Access in Transition:
Rethinking Descriptive Practices for the LC Web Archives
Migration effort
• Began in 2013, ongoing
• Move web archives from stand-alone web application at http://loc.gov/lcwa to library-wide discovery system at http://loc.gov/websites/
• Metadata and content migration
• Cross-functional team effort
Interface - before and after
New Possibilities
• Web archives discoverable alongside other LC collections for first time
• Web archives searchable from LC main page for first time – greater visibility
• Consistent navigation, look and feel mirrors LC website
Integrated into search
New Challenges
• Thousands of MODS records already created for access, how to repurpose?
• Different interfaces, different needs
• Enable new ideas (combined records)
• Keeping useful elements, old and new
Rosalie Lack
California Digital Library
Web Archiving Service (WAS)
SAA Web Archiving Roundtable
Follow the blog!
• http://webarchivingrt.wordpress.com/
Learn more!
• http://www2.archivists.org/groups/web-archiving-roundtable
Tearing Down Silos
What We’re Doing
• Creating finding aids for each web archive
• Adding links to existing finding aids for the relevant archived sites
• Providing a web archive collection search page
• Uploading records into library catalogs
• Sending records to OCLC
• Building collaborative collections and providing unified access
• Integrating access with other formats in our discovery systems
What Else Should We Be Doing?
Open Discussion
Image credits
Title: The razing of silos on the former Roy Ranch, San Geronimo, California, May, 1964 [photograph]
Creator/Contributor: unknown
Date: May, 1964
Contributing Institution: Marin County Free Library
http://content.cdlib.org/ark:/13030/kt3489r96r/?order=1
http://content.cdlib.org/ark:/13030/kt067nf0kk/?order=1
http://content.cdlib.org/ark:/13030/kt467nf1dq/?order=1
Thank you!