HATHITRUST A Shared Digital Repository
HathiTrust: Putting Research in Context
HTRC UnCamp, September 10, 2012
John Wilkin, Executive Director, HathiTrust
Partnership
Arizona State University
Baylor University
Boston College
Boston University
California Digital Library
Columbia University
Cornell University
Dartmouth College
Duke University
Emory University
Florida State University
Getty Research Institute
Harvard University Library
Indiana University
Johns Hopkins University
Lafayette College
Library of Congress
Massachusetts Institute of Technology
McGill University
Michigan State University
New York Public Library
New York University
North Carolina Central University
North Carolina State University
Northwestern University
The Ohio State University
The Pennsylvania State University
Princeton University
Purdue University
Stanford University
Texas A&M University
Universidad Complutense de Madrid
University of Arizona
University of Calgary
University of California
– Berkeley
– Davis
– Irvine
– Los Angeles
– Merced
– Riverside
– San Diego
– San Francisco
– Santa Barbara
– Santa Cruz
The University of Chicago
University of Connecticut
University of Delaware
University of Florida
University of Illinois
University of Illinois at Chicago
The University of Iowa
University of Maryland
University of Miami
University of Michigan
University of Minnesota
University of Missouri
University of Nebraska-Lincoln
The University of North Carolina at Chapel Hill
University of Notre Dame
University of Pennsylvania
University of Pittsburgh
University of Utah
University of Virginia
University of Washington
University of Wisconsin-Madison
Utah State University
Washington University
Yale University Library
Mission
To contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge
Digital Repository
• Launched 2008
• Initial focus on digitized book and journal content
– 10.5 million total volumes
– 5.5 million book titles
– 270,000 serial titles
– 3.2 million public domain (~30%)
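The "~30%" figure follows from the two volume counts on this slide. A minimal sketch of the arithmetic, using only the rounded numbers quoted above (the variable names are illustrative, not from the slides):

```python
# Rounded figures quoted on the slide.
total_volumes = 10.5e6          # total volumes in the repository
public_domain_volumes = 3.2e6   # volumes identified as public domain

# Share of the repository that is public domain.
public_domain_share = public_domain_volumes / total_volumes

print(f"Public domain share: {public_domain_share:.1%}")  # prints "Public domain share: 30.5%"
```

Because both inputs are rounded, the result is only good to about one percentage point, which is consistent with the "~30%" on the slide.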
Goals
• Reliable and comprehensive archive of materials converted from print…co-owned
• Improve access …to meet the needs of the co-owning institutions
• Ensure the long-term preservation of content
• Coordinate shared storage strategies
• “public good” …sustaining the historical record
• Simultaneously …centralized …open
Content Distribution
In-copyright or undetermined: 70%
Public Domain (worldwide): 15%
U.S. Federal Government Documents (worldwide): 4%
Public Domain (US): 10%
Open Access: .1%
Creative Commons: .01%
Content Sources
Michigan: 45%
California: 33%
Wisconsin: 5%
Cornell: 4%
NYPL: 3%
Princeton: 3%
Indiana: 2%
Columbia: 1%
Harvard: 1%
LC: 1%
Madrid: 1%
Minnesota: 1%
Language Distribution (1)
The top 10 languages make up ~86% of all content
English: 48%
German: 9%
French: 7%
Spanish: 5%
Chinese: 4%
Russian: 4%
Japanese: 3%
Italian: 2%
Arabic: 2%
Latin: 1%
Remaining Languages: 14%
Language Distribution (2)
The next 40 languages make up ~13% of total
7%: Undetermined, Polish, Portuguese
5%: Dutch, Hebrew, Hindi
4%: Indonesian, Korean
3%: Swedish, Urdu, Turkish, Thai, Danish, Czech, Unknown
2%: Croatian, Persian, Tamil, Bengali, Music, Hungarian, Norwegian, Sanskrit
1%: Vietnamese, Ukrainian, Greek, Bulgarian, Serbian, Armenian, Romanian, Marathi, Ancient-Greek, Panjabi, Telugu, Malay, Catalan, Malayalam, Multiple, Finnish, Slovak
[Chart: percentage (0% to 100%) by institution over time, Oct-08 through Jul-12. Legend: Yale, Utah State, UNC-Chapel Hill, Purdue, Penn State, Northwestern, NCSU, Illinois, Duke, Chicago, Virginia, Minnesota, Madrid, LoC, Harvard, Columbia, Indiana, Princeton, NYPL, Cornell, Wisconsin, California, Michigan]
Services
• Long-term preservation
– Bit-level and migration
• Bibliographic search
• Full-text search
• Reading and download capabilities
• Print on demand
• Collections
• Datasets, Research Center
A global change in the library environment
[Chart: % of Titles in Local Collection (0% to 60%) against Rank in 2008 ARL Investment Index (0 to 120)]
• June 2009: median duplication 19%
• June 2010: median duplication 31%
Academic print book collection already substantially duplicated in mass digitized book corpus
Digitized Books in Shared Repositories
[Chart: Unique Titles (0 to 3,500,000) by month, Sep-09 through Jun-10]
• Mass digitized books in Hathi digital repository: ~3.5M titles
• Mass digitized books in shared print repositories: ~2.5M
~75% of mass digitized corpus is ‘backed up’ in one or more shared print repositories
Collection Management, Development
• Overlap
– More than 50% median overlap with ARL institutions; higher for small liberal arts colleges
• Pricing model based on print holdings
– Requires print holdings database
– Also support expansion of legal uses, efforts in de-duplication
– Facilitate individual and collaborative collection development and management operations
• Print monographs archiving
Discovery and Use
• Search, collections, online access
• APIs and data feeds
– Data API
– Bibliographic API
– “Hathifiles” inventory files
– OAI
• Computational Research
– Distribution of datasets
– Protocol-based access
– Research Center
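As an illustration of the Bibliographic API item above, here is a minimal sketch that builds a lookup URL for a volume identifier. The slides do not give the endpoint; the base URL and the `brief`/`full` path layout below are assumptions taken from HathiTrust's public documentation, and `bib_api_url` is a hypothetical helper, not part of any HathiTrust library. The sketch only constructs the URL and makes no network request.

```python
from urllib.parse import quote

# Assumed base URL of the Bibliographic API; it does not appear in the slides.
BIB_API_BASE = "https://catalog.hathitrust.org/api/volumes"


def bib_api_url(id_type: str, value: str, detail: str = "brief") -> str:
    """Build a lookup URL for one identifier (hypothetical helper).

    id_type is the kind of identifier (for example "oclc"), value is the
    identifier itself, and detail selects the assumed "brief" or "full"
    response variant.
    """
    if detail not in ("brief", "full"):
        raise ValueError("detail must be 'brief' or 'full'")
    # Percent-encode the identifier so it is safe inside a URL path segment.
    return f"{BIB_API_BASE}/{detail}/{id_type}/{quote(value, safe='')}.json"


# The OCLC number here is an arbitrary example value.
print(bib_api_url("oclc", "424023"))
```

Keeping URL construction separate from fetching makes the assumed path layout easy to check against the current documentation before any request is sent.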
Constitutional Convention
• October 2011
• 52 partners
• 3-year review overseen by SAB
• Ballot Proposals
– Print monograph storage
– Approval process for development initiatives
– U.S. Government Documents
– Fee-for-service content deposit
– Governance
HathiTrust
Executive Committee
Strategic Advisory Board
Budget/Finances
Decision-making
Guidance on Policy, Planning
• 12-member Board of Governors
• Executive Committee
• Executive Director
Collaborative Support
• New pricing model
• Base infrastructure costs
– Public domain
– In-copyright/undetermined
• Funds for programmatic initiatives