    The Library of Congress Cooperative Web Archiving Project
    Abbie Grotke, Library of Congress
    Grant Harris, Library of Congress
    Jennifer Long, Georgetown University
    November 4, 2009

    The Library of Congress

    Agenda
    LC's Web archiving program
    Overview of the Cooperative Project
    Featured Partner: Georgetown University
    Lessons Learned

    Library of Congress Web Archives: loc.gov/lcwa

    LC Collections: over 130 TB
    US National Elections: 2000, 2002, 2004, 2006, 2008
    Iraq War 2003
    September 11, 2001 & September 11 Remembrance 2002
    Olympics 2002
    Congress: 106th, 107th, 108th, 109th, 110th
    Supreme Court Nominations
    Legal Blawgs
    Papal Transition
    Overseas Operations: Indian and Indonesian Elections
    Case Studies: health care, terrorism, visual image content, organizational Web sites, Crisis in Darfur, single site

    http://www.loc.gov/webarchiving/projects.html

    Organizational Structure
    WEB ARCHIVING TEAM: In the Office of Strategic Initiatives (OSI). We are project managers and technical staff focused on capture, tools, and permissions.
    INFORMATION TECHNOLOGY OFFICE and TECHNICAL ARCHITECTURE TEAM: Also in OSI. Supports Wayback and Web Curator Tool development, repository development, and data transfers. Contractors are also used in this area.
    BIBLIOGRAPHIC ACCESS: MODS records are created in Library Services; the Network Development & MARC Standards Office and Acquisitions & Bibliographic Access staff do the cataloging.
    CURATORS/RECOMMENDING OFFICERS: In Library Services, the Congressional Research Service, and the Law Library. They pick the collections and the URLs to archive, and research whom to contact for permission.

    Collaborations and Partnerships
    Early collections: Election 2000 and 2002, September 11
    End of Term Project
    Hurricane Katrina Archive
    IIPC: upcoming Olympics Collection
    NDIIPP Partners: K-12 Web Archiving
    Cooperative Archive-It projects

    Problem
    Web content that will be important for future research is disappearing before it can be collected.
    Identifying sites and reviewing captured sites is labor-intensive, and LC staff are stretched thin.
    Outside institutions may not have the resources or budgets to collect Web sites.

    Cooperative Archive-It Project Concept
    Enlist Library Services subject experts to identify international and national high-value collecting areas, with a focus on foreign countries experiencing volatile political situations.
    Enlist Library Services subject experts to identify scholarly centers, or partner institutions, with recognized expertise in the collecting areas, to assist in collecting and preserving important at-risk materials.
    Prioritize collecting areas/centers of expertise (7 priority areas selected).

    Goals
    To enable institutions outside the Library to gain experience creating Web site collections;
    To extend the network of NDIIPP partners working to identify and collect high-value, at-risk Web materials;
    To develop subject-area collections that could become part of the Library's collections in the future; and
    To broaden understanding of issues related to developing curated collections of Web content.

    The Library of Congress agreed to:
    Establish and fund an Archive-It account for the partner for up to one year (with possible extension);
    Provide support as needed;
    Provide subject matter expertise as requested by the partner;
    Invite partner institutions to at least one conference at the Library (if funding is available);
    Maintain a second copy of the harvested content.

    Each Center Was Asked To:
    Identify high-risk, high-value Web sites for its area, and use Archive-It to harvest the sites;
    Document its selection criteria and provide them to the Library;
    Document issues, lessons learned, etc. related to its Web collecting;
    Participate in a conference with Library experts and other participants (if scheduled).

    Electronic Literature Organization: Literary Sites
    July 12, 2008 - (ongoing); 9,214,920 documents; 401.29 GB

    George Washington University, Institute for European, Russian, and Eurasian Studies: Russian Parliamentary Elections, Dec. 2007, and the Russian Presidential Election, 2008
    August 13, 2007 - August 12, 2008; 18,175,664 documents; 870.09 GB

    Georgetown University: Belarus, Moldova, Ukraine
    September 17, 2007 - (ongoing); 19,880,435 documents; 580 GB

    University of North Carolina, Chapel Hill: Islam in Asia
    September 27, 2007 - February 1, 2008; 3,856,205 documents; 105.35 GB

    Stanford University Libraries, Islamic Studies: Iranian Blogs
    February 29, 2008 - (ongoing); 27,997,040 documents; 2,099.70 GB

    George Washington University, Center for Global Health: Avian bird flu in Asian countries
    June 3, 2008 - January 6, 2009; 18,699,986 documents; 640.6 GB

    Featured Partner: Georgetown University
    Belarus, Moldova, Ukraine Collection
    Proposed by LC Curator: Grant Harris
    Aim: the Web capture of fragile websites from Belarus, Moldova, and Ukraine, including selected government websites, opposition parties, ethnic and religious groups, elections, and security issues.

    Lessons Learned
    Finding good partners was KEY: partners should be committed and really understand the concept of Web archiving and of archiving primary-source materials.
    Crawling ALL of Twitter: not so good.
    Confusion over LC's own Web archiving program vs. this project.

    Lessons Learned
    Collaborative collection building is a good thing:
    New partnerships formed
    New ways for our curators to get engaged with Web archiving
    LC might not have been able to archive some of the collected content on its own (permissions, staff time, etc.)

    Next Steps
    Three partners collecting for (at least) another year: ELO, Georgetown, and Stanford
    Focus on description and access: George Washington University/Russian Elections
    Future: data transfer to LC

    For more information
    LC Web Archiving: http://www.loc.gov/webarchiving/
    LCWA: http://loc.gov/lcwa/
    National Digital Information Infrastructure and Preservation Program: http://www.digitalpreservation.gov/
    Georgetown's Archive-It collections: http://archive-it.org/public/partner?id=168

    Questions?
    Abbie Grotke
    Grant Harris
    Jennifer Long

    Collections are available to the public via loc.gov/lcwa. Metadata is available for everything, but some archived websites are restricted to onsite access only (based on the permissions obtained).
    Who's involved at LC: some NDIIPP partners are collecting Web content, and we work closely with them.

    Cooperative Archive-It projects: LC is funding Archive-It accounts so that other universities and institutions can select and collect content of interest to LC curators (who selected the partners and topics). The partners keep a copy of the data, and LC gets a copy.

    End of Term: collected .gov sites at the end of the Bush administration with four other partners.
    Katrina: another collaborative collecting project.
    National Strategy: a recent NDIIPP meeting to discuss how to develop a strategy for a national collection of public policy materials.
    The cooperative project started in 2007. We briefed managers and then put out a call for project proposals. ELO, Georgetown, and Stanford are still crawling today. GWU (Russian Elections) was also a success: it completed crawling after one year. Their curator and our curator are working on cataloging and on access to the archive (via LC or GWU); it is a slow process. UNC and GWU (Center for Global Health): not so good; neither lasted a year.

    Have Grant talk about why he felt it was important to collect these materials, how he identified Jennifer and her program, how we first met with her, and how we worked with them along the way?

    Maybe after Jennifer speaks, they can both talk about how they feel it's going? We have a thriving Web archiving program, so why did we need others to collect for us?

    Talk a bit about our troubled partners and why they might have failed.