Progress Made and Lessons Learned through Collaborative Web Archiving Projects Anna Perricci Columbia University Libraries Archive-It Partner Meeting 2014 November 18, 2014
Page 1

Progress Made and Lessons Learned

through Collaborative Web Archiving Projects

Anna Perricci

Columbia University Libraries

Archive-It Partner Meeting 2014

November 18, 2014

Page 2

Web Resources Archiving Collaboration

• Many thanks to the Mellon Foundation

• Building collaborations among

– The web archiving community

– Other research libraries

– Users and potential users of web archives

– Site creators

Page 3

Incentive awards projects to advance web archiving tools

Warcbase: Building a Scalable Web Archiving Platform on HBase and Hadoop. (Jimmy Lin, University of Maryland)

Archiving Transactions Towards Uninterruptible Web Service (Zhiwu Xie and Edward A. Fox, Virginia Tech)

Page 4

Incentive awards projects to advance web archiving tools

Visualizing Digital Collections of Web Archives (Michele Weigle, Old Dominion University)

Tools for Managing Seed URLs (Michael Nelson, Old Dominion University)

Page 5

Incentive awards projects to advance web archiving tools

Perma.cc: Mitigating the Pervasive Problem of Link Rot in Scholarly Works and Preserving Online Content (Kim Dulin, The Harvard Library Innovation Lab)

Free Law Project (providing free access to primary legal materials, developing legal research tools, and supporting academic research on legal corpora)

Page 6

Building an efficient, coherent, and scalable national framework for collecting web content

Page 8

Program Components

• Communication and coordination

• Seed management and harvest

• Supplemental quality review (QA testing)

• MARC Metadata

• Local preservation storage (seeking solutions)

Page 9

The first 18 months of collaborative collecting

• Planning, needs assessment (interviews with stakeholders including Associate University Librarians for collection development at each Borrow Direct institution in 2013), timelines created

• Group communication (spreadsheets, Basecamp), cultivating dialogs

• Coordinating seed URL nominations for pilot collections (CCWA, CAUSEWAY), QA testing, and creation of MARC records

• Trying out workflows for optimal balance of involvement and efficient forward motion on projects

• In planning stages for sharing costs & 5-year plan for Borrow Direct/Ivy Plus collaborations

Page 10

Collaboration with music librarians

Page 11

Contemporary Composers Web Archive

Selectors

• Borrow Direct Music Librarians Group: music librarians at Brown, Columbia, Cornell, Dartmouth, Harvard, Johns Hopkins, Princeton, and Yale universities, MIT, and the universities of Chicago and Pennsylvania

Cataloging expertise

• Russell Merritt (cataloger specializing in music resources)

• Kate Harcourt (Director of Original and Special Materials Cataloging)

• Alex Thurman (Web Resources Collection Coordinator)

Page 12

CCWA

Page 13

CCWA

Page 14

Progress on CCWA & lessons learned so far

By the numbers:

• 11 curators participating

• 56 sites currently available in Archive-It

– 23 additional sites for follow-up

• 27 GB of content archived (268,519 URLs)

• 50 MARC records in WorldCat as of 11/18/14

– Russell Merritt (music cataloger) collaboratively developed MARC records for composers' websites; further cataloging of available sites through 2CUL

Outreach

• SAA presentation on MARC records for CCWA http://www.slideshare.net/annaperricci/lightning-talk-for-session-703-of-society-of-american-archivists

• Over 30 sites tested for quality by five music librarians; bibliographic assistant on the grant tested all sites in collection

Page 15

CCWA Permissions

77 Composers

Yes (37)

No (0)

Did not respond (35)

No contact info (2)

Recently died/did not contact (3)

Page 16

Quality Assurance with music librarians

Page 17

Creating MARC records for web archives

• Creating MARC records for archived websites is standard practice at CUL

– MARC records make web archives discoverable in CLIO (Columbia Libraries Information Online)

• Collection level and seed level records

• Will use Archive-It interface to add Dublin Core metadata

Page 18

Anticipating wider use of MARC records

• Records have been regularly released to WorldCat

• Collaborators on cataloging were attentive to which fields will ordinarily be stripped out when a MARC record is imported to another institution’s OPAC

Page 19

MARC records

Page 20

Patron view of record in CLIO

Page 21

Cataloger’s view of record in CLIO

Page 22

Progress on CAUSEWAY & lessons learned

• Curators from 9 Borrow Direct institutions (Ivies Plus Art & Architecture Group)

– Lead advisors: Carole Ann Fabian and Chris Sala

• 137 seed URLs (over 100 harvested and being released as sites are tested, cataloged and assigned metadata in Archive-It)

• 51 GB of content archived (1,006,114 URLs)

• Over 60 sites available in Archive-It with DC metadata (also all 60+ have MARC records in CLIO)

Outreach

• Update sent to IVAAG soliciting feedback

• Gave update and got feedback at semiannual IVAAG meeting

• Presentation scheduled for ARLIS/NA 2015

Page 23

CAUSEWAY Permissions

137 Site owners

Yes (74)

No (3)

Later (2)

No contact info (2)

Did not respond (56)
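
The CCWA and CAUSEWAY permissions breakdowns are easier to compare as shares of each total. A minimal sketch in Python, using only the counts shown on the two permissions slides; the names `PERMISSIONS` and `summarize` are illustrative and not part of the project's tooling:

```python
# Counts copied from the CCWA and CAUSEWAY permissions slides.
PERMISSIONS = {
    "CCWA": {
        "Yes": 37,
        "No": 0,
        "Did not respond": 35,
        "No contact info": 2,
        "Recently died/did not contact": 3,
    },
    "CAUSEWAY": {
        "Yes": 74,
        "No": 3,
        "Later": 2,
        "No contact info": 2,
        "Did not respond": 56,
    },
}


def summarize(counts):
    """Return the total and each outcome's share of that total, in percent."""
    total = sum(counts.values())
    shares = {k: round(100 * v / total, 1) for k, v in counts.items()}
    return total, shares


for name, counts in PERMISSIONS.items():
    total, shares = summarize(counts)
    print(f"{name}: {total} total, {shares['Yes']}% granted, "
          f"{shares['Did not respond']}% no response")
```

The totals match the slide headings (77 composers, 137 site owners). In both pilots roughly half of the owners granted permission, and most of the rest did not respond.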

Page 24

CAUSEWAY

Page 25

CAUSEWAY

Page 26

CAUSEWAY

Page 27

CAUSEWAY

Page 28

Cataloging expertise brought to CAUSEWAY

• Alex’s expertise in cataloging architecture and urban planning sites (built through collaboration with Chris Sala on the Avery collecting of web archives) equips him to make more specific MARC records for sites in CAUSEWAY

• Columbia University art and architecture librarians encourage users to find resources via records in the OPAC, so access to CAUSEWAY sites will likely be via the MARC records, which point to the calendar page for archived sites

• Alex is working with our Bibliographic Assistant, Naeema Akter (position funded by the grant as well) to add appropriate metadata for better browsing in the Archive-It interface

Page 29

Early start on facets in Archive-It

Page 30

CAUSEWAY goals for the remainder of the grant

• Collect all nominated sites in scope, test for quality, create a MARC record for each archived website (by early 2015)

• Evaluate quality and solicit feedback (ongoing)

• Meet at ARLIS/NA and discuss progress (March 2015)

– Anna will also give a presentation on collaborative web archiving projects at ARLIS/NA

• Establish ongoing workflows and goals (2015 and onward)

• End of pilot phase: December 2015

Page 31

Project tracking: Basecamp & many, many spreadsheets

Page 32

Pilot climate change collecting & lessons learned so far

• 25 selectors from 5 institutions

Great range of fields:

– Wide variety of area studies (9)

– Social science (5)

– Science and environmental science (4)

– Medical (1), Law (1), Special Collections (1)

– Collection Development AUL (3), Preservation (1)

• 127 seed websites nominated (some duplication)

• A lot of enthusiasm for topic

Page 33

What we’ve learned about workflows and scale

• Distributing work does not reduce costs

• Collaborative effort builds the project and new tasks promote professional growth

• Quality assurance and cataloging are integral to the process of creating high-quality collections of web archives

Page 34

#webarchivinghappenshere

Page 35

Use cases

Image credit: Flickr user: Nicky Jurd (CC BY 2.0)

Page 36

Using the Human Rights Web Archive & learning from human rights scholars’ work

(publications, citations)

Page 37

Citations scraped from articles published in 2010 in select scholarly journals

Page 38

Isolating URLs from list of citations using OpenRefine

(approximately 10% of citations scraped have URLs in them)
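
The project did this step in OpenRefine. As a rough illustration of the same idea in Python, the sketch below pulls URLs out of citation strings with a deliberately simple regular expression. The pattern, the function names, and the sample citations are all assumptions made for this example; none of them come from the project's data or workflow.

```python
import re

# Simplified pattern: "http://", "https://", or "www." followed by non-space
# characters. Real citations need more care (line breaks, DOIs, odd punctuation).
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)


def extract_urls(citation):
    """Return URLs found in one citation, trimming trailing punctuation."""
    return [m.rstrip(".,;:)]>\"'") for m in URL_PATTERN.findall(citation)]


def split_by_url(citations):
    """Map each citation that contains at least one URL to its URLs."""
    found = {}
    for citation in citations:
        urls = extract_urls(citation)
        if urls:
            found[citation] = urls
    return found


# Invented sample citations, for illustration only.
sample = [
    "Doe, J. (2009). Report on detention conditions. http://www.example.org/report.pdf.",
    "Smith, A. (2008). Human rights law. Example University Press.",
    "Example NGO, Annual Review (2009), available at www.example.net/review (accessed 1 May 2010).",
]

with_urls = split_by_url(sample)
print(f"{len(with_urls)} of {len(sample)} citations contain a URL")
```

Trimming trailing punctuation matters because citation styles often end a URL with a period or closing parenthesis that is not part of the address.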

Page 39

Querying Internet Archive collection (via API)
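
The slide does not say which Internet Archive API was used. As one possibility, the sketch below assumes the public Wayback Machine Availability API (`https://archive.org/wayback/available`), which reports the capture closest to a requested date. The function names and the sample URL and timestamp are hypothetical; the live lookup is defined but not called, so the example runs offline.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the Wayback Machine Availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_query(url, timestamp=None):
    """Build an availability query; timestamp is YYYYMMDD[hhmmss] if given."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(response):
    """Return (capture URL, timestamp) for the closest capture, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["url"], closest["timestamp"]


def lookup(url, timestamp=None):
    """Query the live service (needs network access)."""
    with urlopen(build_query(url, timestamp), timeout=30) as resp:
        return closest_snapshot(json.load(resp))


# Offline check against a made-up response in the API's JSON shape.
example_response = {
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20100105000000/http://example.org/",
            "timestamp": "20100105000000",
            "status": "200",
        }
    }
}
print(build_query("example.org", "20100101"))
print(closest_snapshot(example_response))
print(closest_snapshot({"archived_snapshots": {}}))
```

For citation analysis, passing a timestamp near the citing article's publication date would show whether a capture exists from around the time the source was cited.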

Page 40

Leveraging HRWA Solr index http://hrwa.cul.columbia.edu

Page 41

Columbia University web resources: creating best practices for site creators

Page 42

Wider reach with guidelines rather than suggesting changes on a case-by-case basis

Page 43

Web archiving initiatives focusing on art resources

An initiative designed to address the “urgent need to document the dynamic web-based versions of auction catalogues, catalogues raisonnés, and scholarly research projects, as well as artist, gallery, and museum websites” (http://www.nyarc.org/content/web-archiving)

Artist files Special Interest Group

Page 44

What do you want to learn about web archiving?

Do you have any suggestions on how the SAA Web Archiving Roundtable can help you develop your knowledge of web archiving?

Categories we identified based on the 33 responses:

– Description

– Preservation

– Access/ Use

– Project Management/ Collaboration

– Appraisal/ Collection Dev/ Policy

– Technology/ Capture/ Tools

– Business Case/ Costs/ Best Practices

Page 45

Some presentations, papers, panels & posters during grant

• Moderated: “Web Archiving: Experiences, Perspectives and Possibilities” held at METRO on 10/20/14

• Presentation (lightning talk): “MARC Records for the Contemporary Composers Web Archive” for the Society of American Archivists annual conference on 8/16/14 URL (via Academic Commons): http://dx.doi.org/10.7916/D8028Q3S

• Presentation: “SAA Web Archiving Roundtable Education Needs Assessment Survey Results” for the SAA Web Archiving Roundtable meeting at Society of American Archivists annual conference (co-presented with John Bence) on 8/14/14

• Presentation: “How Collaboration Can Save [More of] the Web: Recent Progress in Collaborative Web Archiving Initiatives” for the METRO Conference 2014 on 1/15/14

• Poster session: “Assessment of the Effectiveness of the Human Rights Web Archive @Columbia University” (co-presented with Pamela Graham) at the ACRL/NY Symposium on 12/6/13

URL (via Academic Commons): http://dx.doi.org/10.7916/D8BG2KZ9

• Presentation: “How Collaboration Can Save [More of] the Web: Recent Progress in Collaborative Web Archiving Initiatives” for the Best Practices Exchange on 11/14/13 (with Scott Reed)

URL (via Academic Commons): http://dx.doi.org/10.7916/D8G73BNK

• Presentation: “Web Archiving Resource Collaboration” at CrawlCamp held at METRO on 7/17/13

Page 46

Are project elements on schedule & within budget?

• So far, yes, though we have plenty of challenges and work ahead of us

• Steady progress on citation analysis but it’s been much harder than we thought it’d be

• Lots of room for engagement and teamwork, including maintenance and coordination of cooperative efforts

Page 47

Refining building materials

Page 48

Modest gains

Page 49

The next 12.5 months

• Complete remainder of work called for in grant

• Establish shared cost model for collaborative collection building (e.g. CCWA and CAUSEWAY)

• Plan for scaling (maintenance and growth)

• Codify roles for meaningful involvement in web archiving efforts

• Contribute to professional organizations to strengthen web archiving efforts nationally and internationally

Page 50

Credits to some of many collaborators

• Bob Wolven, Alex Thurman, Naeema Akter

• Pamela Graham, Kate Harcourt, Christina Harlow

• Talia Jimenez, Stephen Davis, incentives awards oversight panel: Kris Carpenter, Mark Phillips, Rob Sanderson & Perry Willett

• Elizabeth Davis, Russell Merritt & Borrow Direct music librarians

• Carole Ann Fabian, Chris Sala, Ivies Plus Art & Architecture Group

• Borrow Direct Associate University Librarians for Collection Development group

• Climate change selectors at Borrow Direct institutions

• Archive-It staff

• Community for discussion and participation, including: NYARC, METRO, International Internet Preservation Consortium (IIPC), SAA Web Archiving Roundtable, ARLIS/NA Artist Files SIG

Page 51

Growing web archives

Page 52

Thanks!

Anna Perricci

[email protected]

@AnnaPerricci

Columbia University Libraries

