HATHITRUST: A Shared Digital Repository
Your Library, Now Online: Putting HathiTrust in the Context of Traditional (and New) Library Services
MCLS Webinar
February 6, 2012
Jeremy York, Project Librarian, HathiTrust
Unless otherwise noted, these slides and their contents are licensed under a Creative Commons Attribution Unported License.
Outline
• The Big Idea
  – Mission and Goals
• What we're doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work
The Big Idea
Partnership
Arizona State University; Baylor University; Boston College; Boston University; Brandeis University; California Digital Library; Carnegie Mellon University; Columbia University; Cornell University; Dartmouth College; Duke University; Emory University; Florida State University; Getty Research Institute; Harvard University Library; Indiana University; Iowa State University; Johns Hopkins University; Kansas State University; Lafayette College; Library of Congress; Massachusetts Institute of Technology; McGill University; Michigan State University; New York Public Library; New York University; North Carolina Central University; North Carolina State University; Northwestern University; The Ohio State University; The Pennsylvania State University; Princeton University; Purdue University; Stanford University; Syracuse University; Texas A&M University; Universidad Complutense de Madrid; University of Arizona; University of Calgary; University of California (Berkeley, Davis, Irvine, Los Angeles, Merced, Riverside, San Diego, San Francisco, Santa Barbara, Santa Cruz); The University of Chicago; University of Connecticut; University of Delaware; University of Florida; University of Illinois; University of Illinois at Chicago; The University of Iowa; University of Kansas; University of Maryland; University of Miami; University of Michigan; University of Minnesota; University of Missouri; University of Nebraska-Lincoln; The University of North Carolina at Chapel Hill; University of Notre Dame; University of Pennsylvania; University of Pittsburgh; University of Utah; University of Vermont; University of Virginia; University of Washington; University of Wisconsin-Madison; Utah State University; Vanderbilt University; Virginia Tech; Wake Forest University; Washington University; Yale University Library
Digital Repository
• Launched 2008
• Initial focus on digitized book and journal content
  – 10.6 million total volumes
  – 5.58 million book titles
  – 276,000 serial titles
  – 3.2 million public domain (~31%)
The Name
• The meaning behind the name
  – Hathi (hah-tee): Hindi for elephant
  – Big, strong
  – Never forgets, wise
  – Secure
  – Trustworthy
Mission
• To contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge
Universal Library
Common Goal
Single Entity Many Partners
HathiTrust
Collections and Collaboration
• Comprehensive collection
  – Preservation…with Access
• Shared strategies
  – Copyright
  – Collection management and development
  – Preservation
  – Discovery and use
  – Bibliographic Indeterminacy
  – Efficient user services
• Public Good
What we are doing to get there
• Cost-effective long-term preservation and access for digitized content
• Facilitate decision-making about digitization and print collection management
• Facilitate activities such as discovery, copyright review, and use of materials
Repository and Content
Content Sources
Michigan 43%; California 32%; Wisconsin 5%; Cornell 4%; NYPL 2%; Princeton 2%; Indiana 2%; Harvard 2%; Columbia 1%; LoC 1%; Madrid 1%; Minnesota 1%; Illinois 1%
Language Distribution (1)
English 48%; German 9%; French 7%; Spanish 5%; Chinese 4%; Russian 4%; Japanese 3%; Italian 2%; Arabic 2%; Latin 1%; Remaining Languages 14%
The top 10 languages make up ~86% of all content
Language Distribution (2)
Undetermined 0.7%; Polish 0.7%; Portuguese 0.7%; Dutch 0.5%; Hebrew 0.5%; Hindi 0.5%; Indonesian 0.4%; Korean 0.4%; Swedish 0.3%; Urdu 0.3%; Turkish 0.3%; Thai 0.3%; Danish 0.3%; Czech 0.3%; Unknown 0.3%; Croatian 0.2%; Persian 0.2%; Tamil 0.2%; Bengali 0.2%; Music 0.2%; Hungarian 0.2%; Norwegian 0.2%; Sanskrit 0.2%; Vietnamese 0.1%; Ukrainian 0.1%; Greek 0.1%; Bulgarian 0.1%; Serbian 0.1%; Armenian 0.1%; Romanian 0.1%; Marathi 0.1%; Ancient-Greek 0.1%; Panjabi 0.1%; Telugu 0.1%; Malay 0.1%; Catalan 0.1%; Malayalam 0.1%; Multiple 0.1%; Finnish 0.1%; Slovak 0.1%
The next 40 languages make up ~13% of total
Dates
Copyright Distribution
In-copyright or undetermined 69%; Public Domain (worldwide) 15%; Public Domain (US) 11%; US Federal Government Documents (worldwide) 4%; Open Access and Creative Commons account for the small remaining shares
[Chart: content by contributing institution over time (vertical axis 0 to 100). Legend: Boston College, Florida, Yale, Utah State, UNC-Chapel Hill, Purdue, Penn State, Northwestern, NCSU, Illinois, Duke, Chicago, Virginia, Minnesota, Madrid, LoC, Harvard, Columbia, Indiana, Princeton, NYPL, Cornell, Wisconsin, California, Michigan]
[Repository diagram, repeated across several slides. Labels: Source; Bibliographic Data; Content Package; Ingest; Bib Data; Data Management; Rights Data; Holdings Data; Storage (Michigan, Indiana); TDR; Access; Catalog; Full-text Search; PageTurner; APIs; Collections; Datasets]
We engage in preservation for purposes of access
Making Content Available
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Descriptive headings added (hidden from GUI with CSS)
Info about SSD service & link to accessibility page
Images used for style are in CSS, so no need to use alt tags
Skip navigation link
Access keys for navigating pages with keyboard
Added labels & descriptive titles to forms & ToC table
APIs
• Data API
  – Volume and rights information
  – Page images
  – OCR
• Bibliographic API
  – Volume and rights information
  – MARC records
• OAI
• "Hathifiles"
Datasets
• Google-digitized: ~2.8 million texts
  – Requires proposal to HathiTrust
  – Agreement with Google
  – Statement on use/management
• Non-Google-digitized: ~370,000 texts
  – Freely available
  – Statement on management
Research Center
• Environment to perform research on HathiTrust corpus
• http://www.hathitrust.org/htrc
• http://lib.umich.edu/mpach
• Package of tools to enable publication of open access, born-digital journal content directly into HathiTrust
  – Including accompanying data and media files
• Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)
Source Archive
Editorial Market
Higher Education
Access Determinations
• Automated
• Manual
Automatic Rights Determination
• Conducted on all works at time of ingest and when records are modified
  – Public domain worldwide
    • US works published before 1923; US federal government publications; non-US works published prior to 1873
  – Public domain in the United States
    • Non-US works published prior to 1923
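The date rules on this slide can be read as a small decision procedure. The sketch below is only an illustration of those rules, not HathiTrust's actual implementation: the `Work` fields, the function name, and the status strings are hypothetical, and the fallback status for anything the rules do not cover is assumed from the "In-copyright or undetermined" category used elsewhere in the deck.

```python
from dataclasses import dataclass

# Hypothetical status labels, chosen for readability only.
PD_WORLD = "public domain worldwide"
PD_US = "public domain in the United States"
UNDETERMINED = "in copyright or undetermined"  # assumed fallback


@dataclass
class Work:
    pub_year: int                      # year of publication from the bib record
    us_published: bool                 # True if published in the US
    us_federal_document: bool = False  # True for US federal government publications


def automatic_rights(work: Work) -> str:
    """Apply the slide's date-based rules to a single work."""
    if work.us_federal_document:
        return PD_WORLD
    if work.us_published:
        # US works published before 1923 are open worldwide.
        return PD_WORLD if work.pub_year < 1923 else UNDETERMINED
    if work.pub_year < 1873:
        # Non-US works published prior to 1873 are open worldwide.
        return PD_WORLD
    if work.pub_year < 1923:
        # Non-US works published prior to 1923 are open in the US only.
        return PD_US
    return UNDETERMINED
```

Anything that falls through to the last status is the kind of work that manual review would then look at.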
Manual Rights Determination
• IMLS-funded CRMS project
  – CRMS-US
    • 2008: US-published works, 1923-1963
    • Staff at 4 partner institutions
  – CRMS-World
    • 2011: Expanded to non-US works
    • Staff at 16 partner institutions
  – Double review, with additional expert review for conflicts
  – Compliance with copyright formalities
  – As of January 2013: 241,541 reviewed, more than 132,644 opened
• Rights Holder Permissions
• System of Precedence
Rights Database
Bibliographic (automatic)
Manual
Lawful uses
• Users who have print disabilities
  – All in-copyright works in HathiTrust currently owned (or owned previously) by the partner institution
  – Must be authenticated
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/accessibility
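The access conditions listed above combine into a single check. The following is a minimal sketch under stated assumptions: the class and field names are hypothetical, session tracking is reduced to a counter, and nothing here reflects how HathiTrust actually enforces the policy.

```python
from dataclasses import dataclass


@dataclass
class AccessRequest:
    authenticated: bool  # user has logged in through the partner institution
    on_us_soil: bool     # request originates in the United States


@dataclass
class HeldWork:
    copies_owned: int         # print copies owned (now or previously) by the partner
    active_sessions: int = 0  # users currently reading the digital copy


def may_open(work: HeldWork, request: AccessRequest) -> bool:
    """Return True if the request satisfies every condition on the slide."""
    if not (request.authenticated and request.on_us_soil):
        return False
    # One simultaneous access per copy owned.
    return work.active_sessions < work.copies_owned


def open_work(work: HeldWork, request: AccessRequest) -> bool:
    """Open a reading session if allowed; report whether it was opened."""
    if may_open(work, request):
        work.active_sessions += 1
        return True
    return False
```

With one copy owned, a first qualifying reader is admitted and a second is refused until the first session ends.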
Lawful uses (2)
• Out of print and brittle or missing
  – Works must be currently owned (or owned previously) by the partner institution
  – Must be authenticated or accessing work from library premises
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/out-of-print-brittle
• Access and use statements
  – http://www.hathitrust.org/access_use
Outline
• The Big Idea
  – Mission and Goals
• What we're doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work
HathiTrust Functional Framework
e-Commerce
  – Print on Demand
Content Ingest
  – Transformation
  – Validation
Content Access
  – PageTurner
  – Collection Builder
  – Large-scale Search
  – Bibliographic Catalog
  – Research Center
  – APIs
Quality Assurance
  – Quality Review
  – Content Certification
User Services
  – Usability
  – User support (helpdesk)
Outreach
  – Project website
  – Monthly newsletter
  – Papers and presentations
  – Communication with potential partners
  – Surveys, general inquiries
  – Repository evaluation and audit (e.g., DRAMBORA, TRAC)
Legal
  – Risk management (use of materials)
  – Partner agreements
  – Advocacy
Governance
  – Budget/Finances
  – Decision-making
  – Policy
  – Planning
Enterprise Management
  – Communication and coordination with partner institutions
  – Project management
Repository Administration
  – Hardware configuration and maintenance
  – Web and application server configuration and maintenance
  – Security
  – Permissions
  – Logging
  – Data management (content storage, backup, integrity checks, deletion)
  – Hardware selection and replacement
  – Content and metadata specifications
  – Disaster recovery
  – Processes for ensuring content integrity
Rights Management
  – Copyright determination
  – Copyright review
  – Copyright information management (database)
  – Rightsholder permissions
Bibliographic Data Management
  – Entity description (record-level)
  – Object identification (item-level)
  – Data availability
Collection Development
  – Digital
    • Expansion beyond books and journals (born-digital, images and maps, audio)
    • Selection of content (for non-Google volume ingest and pilot projects)
  – Print
    • Cloud Library (effect of digital on print)
Financial contributions of partners
HathiTrust
Strategic Advisory Board
Budget/Finances, Decision-making
Guidance on Policy, Planning
• Driven by needs of institutions
• Leverage across the partnership
• Projects: Print on Demand, Grant Work, Ingest Specifications, PageTurner, Bibliographic Data Management
Executive Committee
Collective Work: Working Groups and Committees
Operational
• Communications
• User Support
• User Experience
Strategic
• Collections
• Discovery Interface
• Full-text Search
Distributed work
Constitutional Convention
• October 2011
• 52 partners
• 3-year review overseen by SAB
• Ballot Proposals
  – Print monograph storage
  – Approval process for development initiatives
  – US Government Documents
  – Fee-for-service content deposit
  – Governance
HathiTrust
Executive Committee
Strategic Advisory Board
Budget/Finances, Decision-making
Guidance on Policy, Planning
• 12-member Board of Governors
• Chief Executive Officer
• Executive Committee
Governance
• Efficient, practical
• Inclusive, collective
Outline
• The Big Idea
  – Mission and Goals
• What we're doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work
How HathiTrust Can Change the Way We Work
Seeing collective problems as collective
[Chart, built up over several slides: Breakdown of HathiTrust book corpus by publication date, in segments of 42%, 19%, 20%, and 19%; later builds add the copyright status of books published pre-1923 and US works published 1923-1963, and an "In Print" label. Source: "Bibliographic Indeterminacy and the Scale of Problems and Opportunities of 'Rights' in Digital Collection Building", 2/2011]
Relationships
• Identification
• Description
• Rights
• Relationships
  – Bibliographic records
  – Bib records and objects
  – Digital objects
  – Digital and print
Understanding the relationship between the collective and local
1st model: Price per GB

                2008        2009        2010        2011        2012 (Oct)
Total Volumes   2,477,871   5,221,092   7,836,698   9,966,572   10,531,566
Public Domain   372,085     758,947     1,959,223   2,712,626   3,218,132

[Chart: Total Volumes and Public Domain by year, 2008 to 2012 (Oct); vertical axis 0 to 12,000,000]
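The volume counts above also show how the public-domain share of the corpus changed over time. The short sketch below copies the figures from the slide; the dictionary and function names are illustrative only.

```python
# Volume counts by year, copied from the slide.
TOTAL_VOLUMES = {
    "2008": 2_477_871,
    "2009": 5_221_092,
    "2010": 7_836_698,
    "2011": 9_966_572,
    "2012 (Oct)": 10_531_566,
}
PUBLIC_DOMAIN = {
    "2008": 372_085,
    "2009": 758_947,
    "2010": 1_959_223,
    "2011": 2_712_626,
    "2012 (Oct)": 3_218_132,
}


def public_domain_share(year: str) -> float:
    """Public-domain volumes as a percentage of all volumes for one year."""
    return 100 * PUBLIC_DOMAIN[year] / TOTAL_VOLUMES[year]


for year in TOTAL_VOLUMES:
    print(f"{year}: {public_domain_share(year):.1f}% public domain")
```

On these figures the share roughly doubles, from about 15% in 2008 to about 31% in October 2012, which matches the ~31% quoted on the Digital Repository slide.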
A global change in the library environment
Academic print book collection already substantially duplicated in mass digitized book corpus
[Chart: % of Titles in Local Collection (0 to 60) against Rank in 2008 ARL Investment Index (0 to 120)]
• June 2009: Median duplication 19%
• June 2010: Median duplication 31%
Courtesy of Constance Malpas, OCLC Research
Digitized Books in Shared Repositories
[Chart: Unique Titles (0 to 3,500,000) by month, Sep-09 to Jun-10. Series: mass digitized books in Hathi digital repository; mass digitized books in shared print repositories. Annotations: ~3.5M titles; ~2.5M]
~75% of mass digitized corpus is 'backed up' in one or more shared print repositories
Courtesy of Constance Malpas, OCLC Research
Collection Overlap
• More than 50% median overlap with ARL institutions; higher for small liberal arts colleges
• New pricing model based on print holdings
  – http://www.hathitrust.org/cost
  – Requires print holdings database
  – Also supports expansion of legal uses and efforts in de-duplication
  – Facilitates individual and collaborative collection development and management operations
• Print monographs archiving
Sourcing and Scaling
http://orweblog.oclc.org/archives/002058.html
• Scale
  – Institution-scale
  – Group-scale
  – Web-scale
• Sourcing
  – Institutional
  – Collaborative
  – Third-party
A new kind of library
Thank you
How to find out more
• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
  – http://www.hathitrust.org/updates
  – RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
  – Large-scale Search
  – Perspectives from HathiTrust
Outline
bull The Big Ideandash Mission and Goals
bull What wersquore doing to get therendash Repository and Contentndash Making content availablendash Organizational structure
bull How HathiTrust can change the way we work
The Big Idea
PartnershipArizona State UniversityBaylor UniversityBoston CollegeBoston UniversityBrandeis UniversityCalifornia Digital LibraryCarnegie Mellon UniversityColumbia UniversityCornell UniversityDartmouth CollegeDuke UniversityEmory UniversityFlorida State UniversityGetty Research InstituteHarvard University LibraryIndiana UniversityIowa State UniversityJohns Hopkins UniversityKansas State UniversityLafayette CollegeLibrary of CongressMassachusetts Institute of
TechnologyMcGill University`Michigan State UniversityNew York Public LibraryNew York UniversityNorth Carolina Central
University
North Carolina StateUniversity
Northwestern UniversityThe Ohio State UniversityThe Pennsylvania State
UniversityPrinceton UniversityPurdue UniversityStanford UniversitySyracuse UniversityTexas AampM UniversityUniversidad Complutense
de MadridUniversity of ArizonaUniversity of CalgaryUniversity of California
BerkeleyDavisIrvineLos AngelesMercedRiversideSan DiegoSan FranciscoSanta BarbaraSanta Cruz
The University of ChicagoUniversity of ConnecticutUniversity of DelawareUniversity of Florida
University of IllinoisUniversity of Illinois at ChicagoThe University of IowaUniversity of KansasUniversity of MarylandUniversity of MiamiUniversity of MichiganUniversity of MinnesotaUniversity of MissouriUniversity of Nebraska-LincolnThe University of North
Carolina at Chapel HillUniversity of Notre DameUniversity of PennsylvaniaUniversity of PittsburghUniversity of UtahUniversity of VermontUniversity of VirginiaUniversity of WashingtonUniversity of Wisconsin-
MadisonUtah State UniversityVanderbilt UniversityVirginia TechWake Forest UniversityWashington UniversityYale University Library
Digital Repository
bull Launched 2008bull Initial focus on digitized book and journal
contentndash 106 million total volumes ndash 558 million book titlesndash 276000 serial titlesndash 32 million public domain (~31)
The Name
bull The meaning behind the namendash Hathi (hah-tee)--Hindi for elephantndash Big strongndash Never forgets wisendash Securendash Trustworthy
Mission
bull To contribute to the common good by collecting organizing preserving communicating and sharing the record of human knowledge
Universal Library
Common Goal
Single Entity Many Partners
HathiTrust
Collections and Collaboration
bull Comprehensive collection- Preservationhellipwith Access
bull Shared strategiesndash Copyrightndash Collection management developmentndash Preservationndash Discovery Usendash Bibliographic Indeterminacyndash Efficient user services
bull Public Good
What we are doing to get there
Cost-effective long-term preservation and access for digitized content
bull Facilitate decision-making about digitization and print collection management
bull Facilitate activities such as discovery copyright review use of materials
Repository and Content
Michigan 43
California 32
Wisconsin 5
Cornell 4NYPL 2
Princeton 2Indiana 2
Columbia 1
Harvard 2LoC 1 Madrid 1 Minnesota 1
Illinois 1
Content Sources
English48
German9
French7
Spanish5
Chinese4
Russian4
Japanese3
Italian2
Arabic2
Latin1
Remaining Languages
14
Language Distribution (1)
The top 10 languages make up ~86 of all content
Undetermined7
Polish7
Portuguese7
Dutch5
Hebrew5
Hindi5
Indonesian4
Korean4Swedish
3
Urdu3
Turkish3
Thai3Danish
3
Czech3
Unknown3
Croatian2
Persian2
Tamil2
Bengali2
Music2
Hungarian2
Norwegian2
Sanskrit2
Vietnamese1
Ukrainian1
Greek1
Bulgarian1Serbian
1Armenian
1Romanian
1Marathi
1
Ancient-Greek1 Panjabi
1
Telugu1Malay
1
Catalan1
Malayalam1
Multiple1
Finnish1
Slovak1
Language Distribution (2)
The next 40 languages make up ~13 of total
Dates
Copyright Distribution
In-copyright or unde-termined
69
Public Domain (worldwide)
15
US Federal Government Documents (worldwide)
4
Public Domain(US)11
Open Access1
Creative Commons 04
1010
8
110
9
410
9
710
9
1010
9
111
0
411
0
711
0
1011
0
111
1
411
1
711
1
1011
1
111
2
411
2
711
2
1011
2
111
30
10
20
30
40
50
60
70
80
90
100
Boston CollegeFloridaYaleUtah StateUNC-Chapel HillPurduePenn StateNorthwesternNCSUIllinoisDukeChicagoVirginiaMinnesotaMadridLoCHarvardColumbiaIndianaPrincetonNYPLCornellWisconsinCaliforniaMichigan
Source
Bibliographic Data
Content Package
MichiganIndiana
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
Datasets
Source
Bibliographic Data
Content Package
MichiganIndiana
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
Datasets
Source
Bibliographic Data
Content Package
MichiganIndiana
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
DatasetsTDR
Source
Bibliographic Data
Content Package
MichiganIndiana
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
Datasets
Source
Bibliographic Data
Content Package
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
DatasetsMichigan
Indiana
Source
Bibliographic Data
Content Package
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
DatasetsMichigan
Indiana
Source
Bibliographic Data
Content Package
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
DatasetsMichigan
Indiana
We engage in preservationfor purposes of access
Source
Bibliographic Data
Content Package
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
DatasetsMichigan
Indiana
Making Content Available
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Descriptive headings added (hidden from GUI with CSS)
Info about SSD service amp link to accessibility page
Images used for style are in css so no need to use alt tags
Skip navigation link
Access keys for navigating pages with keyboard
Added labels amp descriptive titles to forms amp ToC table
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
APIs
bull Data APIndash Volume and rights informationndash Page imagesndash OCR
bull Bibliographic APIndash Volume and rights informationndash MARC records
bull OAIbull ldquoHathifilesrdquo
Datasets
bull Google-digitized ~28 million texts Requires proposal to HathiTrust Agreement with Google Statement on usemanagement
bull Non-Google-digitized ~370000 texts Freely available Statement on management
Research Center
bull Environment to perform research on HathiTrust corpus
bull httpwwwhathitrustorghtrc
bull httplibumichedumpachbull Package of tools to enable publication of open
access born-digital journal content directly into HathiTrustndash Including accompanying data and media files
bull Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)
Source Archive
Editorial Market
Higher Education
Access Determinations
bull Automatedbull Manual
Automatic Rights Determination
bull Conducted on all works at time of ingest and when records are modifiedndash Public domain worldwide
bull US works published before 1923 US federal government publications non-US works published prior to 1873
ndash Public domain in the United Statesbull Non-US works published prior to 1923
Manual Rights Determination
bull IMLS-funded CRMS projectndash CRMS-US
bull 2008 US-published works 1923-1963bull Staff at 4 partner institutions
ndash CRMS-Worldbull 2011 Expanded to non-US worksbull Staff at 16 partner institutions
ndash Double review with additional expert review for conflictsndash Compliance with copyright formalitiesndash As of January 2013 241541 reviewed more than 132644
openedbull Rights Holder Permissions
bull System of Precedence
Rights Database
Bibliographic (automatic)
Manual
Lawful uses
bull Users who have print disabilitiesndash All in-copyright works in HathiTrust currently
owned (or owned previously) by the partner institution
ndash Must be authenticatedndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgaccessibility
Lawful uses (2)
bull Out of print and brittle missingndash Works must be currently owned (or owned
previously) by the partner institutionndash Must be authenticated or accessing work from
library premisesndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgout-of-print-brittle
bull Access and use statementsndash httpwwwhathitrustorgaccess_use
Outline
bull The Big Idea ndash Mission and Goals
bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure
bull How HathiTrust can change the way we work
e-Commerce
Print on Demand
Content Ingest
Transformation
Validation
Content Access
PageTurner
Collection Builder
Large-scale Search
Bibliographic Catalog
Research Center
APIs
Quality Assurance
Quality Review
Content Certification
User Services
Usability
User support (helpdesk)
Outreach
Project website
Monthly newsletter
Papers and presentations
Communication with potential
partners
Surveys general inquiries
Repository evaluation and
audit (eg DRAMBORA
TRAC)
Legal
Risk management (use of materials)
Partner agreements
Advocacy
Governance
Budget Finances
Decision-making
Policy
Planning
Enterprise Management
Communication and Coordination
with partner institutions
Project management
Repository Administration
Hardware configuration and
maintenance
Web and application server configuration and
maintenance
Security
Permissions
Logging
Repository Administration
Data management (content storage backup integrity checks deletion)
Hardware selection and replacement
Content and Metadata
specifications
Disaster Recovery
Processes for ensuring content
integrity
Rights Management
Copyright determination
Copyright review
Copyright information
management (database)
Rightsholder permissions
Bibliographic Data
Management
Entity description (record-level)
Object identification (item-
level)
Data availability
Collection Development
Digitalbull Expansion beyond
books and journals (born-digital images and maps audio)
bull Selection of content (for non-Google volume ingest and pilots projects)
Printbull Cloud Library (effect
of digital on print)
Financial contributions of partners
HathiTrust Functional Framework
HathiTrust
Strategic Advisory BoardBudgetFinances Decision-making
Guidance on Policy Planning
bull Driven by needs of institutionsbull Leverage across the partnershipbull Projects Print on Demand Grant Work Ingest Specifications
PageTurner Bibliographic Data Management
Executive Committee
Collective Work Working Groups and Committees
Operationalbull Communicationsbull User Supportbull User Experience
Operationalbull Communicationsbull User Supportbull User Experience
Strategicbull Collectionsbull Discovery Interfacebull Full-text Search
Distributed work
Constitutional Convention
bull October 2011bull 52 partnersbull 3-year review overseen by SABbull Ballot Proposals
ndash Print monograph storagendash Approval Process for development initiativesndash US Government Documentsndash Fee-for-service content depositndash Governance
HathiTrust
Executive Committee
Strategic Advisory
Board
BudgetFinancesDecision-making
Guidance on Policy Planning
bull 12-member Board of Governors
bull Chief Executive Officer bull Executive Committee
Governance
bull Efficient practicalbull Inclusive collective
Outline
bull The Big Idea ndash Mission and Goals
bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure
bull How HathiTrust can change the way we work
How HathiTrust Can Change the Way We Work
Seeing collective problems as collective
Breakdown of HathiTrust book corpus by publication date
Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building ndash 22011
42
19
20
19
The Big Idea
Digital Repository
• Launched 2008
• Initial focus on digitized book and journal content
  – 10.6 million total volumes
  – 5.58 million book titles
  – 276,000 serial titles
  – 3.2 million public domain (~31%)
The Name
• The meaning behind the name
  – Hathi (hah-tee): Hindi for elephant
  – Big, strong
  – Never forgets, wise
  – Secure
  – Trustworthy
Mission
• To contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge
Universal Library
Common Goal
Single Entity Many Partners
HathiTrust
Collections and Collaboration
• Comprehensive collection
  – Preservation…with Access
• Shared strategies
  – Copyright
  – Collection management / development
  – Preservation
  – Discovery / Use
  – Bibliographic Indeterminacy
  – Efficient user services
• Public Good
What we are doing to get there
Cost-effective long-term preservation and access for digitized content
• Facilitate decision-making about digitization and print collection management
• Facilitate activities such as discovery, copyright review, and use of materials
Repository and Content
Content Sources (%)
Michigan 43, California 32, Wisconsin 5, Cornell 4, NYPL 2, Princeton 2, Indiana 2, Harvard 2, Columbia 1, LoC 1, Madrid 1, Minnesota 1, Illinois 1
Language Distribution (1)
English 48%, German 9%, French 7%, Spanish 5%, Chinese 4%, Russian 4%, Japanese 3%, Italian 2%, Arabic 2%, Latin 1%, Remaining Languages 14%
The top 10 languages make up ~86% of all content
Language Distribution (2)
The next 40 languages make up ~13% of total
Chart labels (%):
7: Undetermined, Polish, Portuguese
5: Dutch, Hebrew, Hindi
4: Indonesian, Korean
3: Swedish, Urdu, Turkish, Thai, Danish, Czech, Unknown
2: Croatian, Persian, Tamil, Bengali, Music, Hungarian, Norwegian, Sanskrit
1: Vietnamese, Ukrainian, Greek, Bulgarian, Serbian, Armenian, Romanian, Marathi, Ancient-Greek, Panjabi, Telugu, Malay, Catalan, Malayalam, Multiple, Finnish, Slovak
Dates
Copyright Distribution
In-copyright or undetermined: 69%
Public Domain (worldwide): 15%
Public Domain (US): 11%
US Federal Government Documents (worldwide): 4%
Open Access: 1%
Creative Commons: 0.4%
[Chart: x-axis quarterly dates from 10/1/08 to 1/1/13; y-axis 0 to 100. Legend: Boston College, Florida, Yale, Utah State, UNC-Chapel Hill, Purdue, Penn State, Northwestern, NCSU, Illinois, Duke, Chicago, Virginia, Minnesota, Madrid, LoC, Harvard, Columbia, Indiana, Princeton, NYPL, Cornell, Wisconsin, California, Michigan]
[Diagram labels: Source; Bibliographic Data; Content Package; Michigan; Indiana; Bib Data; Data Management; Rights Data; Storage; Access; Ingest; Catalog; Full-text Search; PageTurner; APIs; Collections; Holdings Data; Datasets]
We engage in preservation for purposes of access
Making Content Available
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Descriptive headings added (hidden from GUI with CSS)
Info about SSD service & link to accessibility page
Images used for style are in CSS, so no need to use alt tags
Skip navigation link
Access keys for navigating pages with keyboard
Added labels & descriptive titles to forms & ToC table
APIs
• Data API
  – Volume and rights information
  – Page images
  – OCR
• Bibliographic API
  – Volume and rights information
  – MARC records
• OAI
• “Hathifiles”
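As a rough illustration of how the Bibliographic API listed above might be used, the sketch below builds a lookup URL and reads rights information out of a response. The URL pattern and the `items`, `htid`, and `usRightsString` field names are assumptions based on the publicly documented API, not details taken from these slides, and the sample response (including its volume IDs) is invented for the example.

```python
import json

# Assumed base URL for the Bibliographic API (not given in the slides).
BASE = "https://catalog.hathitrust.org/api/volumes"


def bib_api_url(id_type: str, value: str, detail: str = "brief") -> str:
    """Build a lookup URL for an identifier such as an OCLC number or ISBN."""
    if detail not in ("brief", "full"):
        raise ValueError("detail must be 'brief' or 'full'")
    return f"{BASE}/{detail}/{id_type}/{value}.json"


def rights_by_volume(response_text: str) -> dict:
    """Map each volume ID in a response to its US rights string."""
    data = json.loads(response_text)
    return {item["htid"]: item["usRightsString"] for item in data.get("items", [])}


# Invented sample response, shaped like the assumed JSON layout.
SAMPLE = json.dumps({
    "records": {"000000001": {"titles": ["An example title"]}},
    "items": [
        {"htid": "mdp.000000000001", "usRightsString": "Full view"},
        {"htid": "mdp.000000000002", "usRightsString": "Limited (search-only)"},
    ],
})

url = bib_api_url("oclc", "424023")
rights = rights_by_volume(SAMPLE)
```

In practice the response text would come from an HTTP GET on `url`; the request is left out here so the sketch runs without network access.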
Datasets
• Google-digitized: ~2.8 million texts
  – Requires proposal to HathiTrust
  – Agreement with Google
  – Statement on use/management
• Non-Google-digitized: ~370,000 texts
  – Freely available
  – Statement on management
Research Center
• Environment to perform research on HathiTrust corpus
• http://www.hathitrust.org/htrc
• http://lib.umich.edu/mpach
• Package of tools to enable publication of open access, born-digital journal content directly into HathiTrust
  – Including accompanying data and media files
• Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)
Source Archive
Editorial Market
Higher Education
Access Determinations
• Automated
• Manual
Automatic Rights Determination
• Conducted on all works at time of ingest and when records are modified
  – Public domain worldwide
    • US works published before 1923, US federal government publications, non-US works published prior to 1873
  – Public domain in the United States
    • Non-US works published prior to 1923
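The date rules above can be sketched as a small function. This is only an illustration of the rules as stated on the slide, not HathiTrust's actual implementation; the function name, its parameters, and the fallback label (borrowed from the Copyright Distribution chart) are assumptions.

```python
def automatic_rights(pub_year: int, us_work: bool, us_federal_doc: bool = False) -> str:
    """Apply the slide's date-based rules and return a rights label.

    Hypothetical helper: real determinations rely on bibliographic
    record data that is messier than three clean arguments.
    """
    if us_federal_doc:
        return "public domain worldwide"
    if us_work:
        # US works: public domain worldwide if published before 1923.
        if pub_year < 1923:
            return "public domain worldwide"
        return "in copyright or undetermined"
    # Non-US works: worldwide public domain before 1873,
    # public domain in the United States only from 1873 through 1922.
    if pub_year < 1873:
        return "public domain worldwide"
    if pub_year < 1923:
        return "public domain in the United States"
    return "in copyright or undetermined"
```

Works that fall through to the last label are the ones left for manual review.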
Manual Rights Determination
• IMLS-funded CRMS project
  – CRMS-US
    • 2008: US-published works, 1923-1963
    • Staff at 4 partner institutions
  – CRMS-World
    • 2011: Expanded to non-US works
    • Staff at 16 partner institutions
  – Double review, with additional expert review for conflicts
  – Compliance with copyright formalities
  – As of January 2013: 241,541 reviewed, more than 132,644 opened
• Rights Holder Permissions
• System of Precedence
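A minimal sketch of the double-review step described above: two independent determinations are accepted when they agree and escalated to an expert when they conflict. The function name and the `pd`, `ic`, and `und` labels are hypothetical, chosen for illustration rather than taken from CRMS. The last line works out the share of reviewed volumes that were opened, using the January 2013 figures from the slide.

```python
from typing import Callable


def resolve_review(first: str, second: str, expert: Callable[[str, str], str]) -> str:
    """Return the agreed determination, or the expert's call on a conflict."""
    if first == second:
        return first
    return expert(first, second)


def cautious_expert(first: str, second: str) -> str:
    """Stand-in expert reviewer that marks conflicting reviews as undetermined."""
    return "und"


agreed = resolve_review("pd", "pd", cautious_expert)
conflict = resolve_review("pd", "ic", cautious_expert)

# Lower bound on the share opened: the slide says "more than 132,644".
opened_share = 132_644 / 241_541
```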
Rights Database
Bibliographic (automatic)
Manual
Lawful uses
• Users who have print disabilities
  – All in-copyright works in HathiTrust currently owned (or owned previously) by the partner institution
  – Must be authenticated
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/accessibility
Lawful uses (2)
• Out of print and brittle / missing
  – Works must be currently owned (or owned previously) by the partner institution
  – Must be authenticated or accessing work from library premises
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/out-of-print-brittle
• Access and use statements
  – http://www.hathitrust.org/access_use
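The conditions for the out-of-print and brittle case can be read as a single check, sketched below. The class, field names, and function are hypothetical and only restate the bullets above; in particular, `copies_owned` is assumed to count copies the institution holds or previously held.

```python
from dataclasses import dataclass


@dataclass
class AccessRequest:
    authenticated: bool
    on_library_premises: bool
    in_us: bool
    institution_owns_or_owned: bool
    copies_owned: int      # assumed to include previously owned copies
    active_sessions: int   # users currently viewing this work


def may_open_out_of_print_brittle(req: AccessRequest) -> bool:
    """True only if every condition listed on the slide is met."""
    return (
        req.institution_owns_or_owned
        and (req.authenticated or req.on_library_premises)
        and req.in_us
        # One simultaneous access per copy owned.
        and req.active_sessions < req.copies_owned
    )
```

The print-disability case differs mainly in requiring authentication outright, so a similar check would drop the library-premises alternative.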
HathiTrust Functional Framework
e-Commerce: Print on Demand
Content Ingest: Transformation; Validation
Content Access: PageTurner; Collection Builder; Large-scale Search; Bibliographic Catalog; Research Center; APIs
Quality Assurance: Quality Review; Content Certification
User Services: Usability; User support (helpdesk)
Outreach: Project website; Monthly newsletter; Papers and presentations; Communication with potential partners; Surveys, general inquiries; Repository evaluation and audit (e.g., DRAMBORA, TRAC)
Legal: Risk management (use of materials); Partner agreements; Advocacy
Governance: Budget/Finances; Decision-making; Policy; Planning
Enterprise Management: Communication and coordination with partner institutions; Project management
Repository Administration: Hardware configuration and maintenance; Web and application server configuration and maintenance; Security; Permissions; Logging
Repository Administration: Data management (content storage, backup, integrity checks, deletion); Hardware selection and replacement; Content and metadata specifications; Disaster recovery; Processes for ensuring content integrity
Rights Management: Copyright determination; Copyright review; Copyright information management (database); Rightsholder permissions
Bibliographic Data Management: Entity description (record-level); Object identification (item-level); Data availability
Collection Development
  Digital
  • Expansion beyond books and journals (born-digital, images and maps, audio)
  • Selection of content (for non-Google volume ingest and pilot projects)
  Print
  • Cloud Library (effect of digital on print)
Financial contributions of partners
HathiTrust
Strategic Advisory Board
Budget/Finances, Decision-making
Guidance on Policy, Planning
• Driven by needs of institutions
• Leverage across the partnership
• Projects: Print on Demand, Grant Work, Ingest Specifications, PageTurner, Bibliographic Data Management
Executive Committee
Collective Work: Working Groups and Committees
Operational
• Communications
• User Support
• User Experience
Strategic
• Collections
• Discovery Interface
• Full-text Search
Distributed work
Constitutional Convention
• October 2011
• 52 partners
• 3-year review overseen by SAB
• Ballot Proposals
  – Print monograph storage
  – Approval process for development initiatives
  – US Government Documents
  – Fee-for-service content deposit
  – Governance
HathiTrust
Executive Committee
Strategic Advisory Board
Budget/Finances, Decision-making
Guidance on Policy, Planning
• 12-member Board of Governors
• Chief Executive Officer
• Executive Committee
Governance
• Efficient, practical
• Inclusive, collective
How HathiTrust Can Change the Way We Work
Seeing collective problems as collective
Breakdown of HathiTrust book corpus by publication date
Source: Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building, 2/2011
[Chart: segments of 42%, 19%, 20%, 19%]
Copyright status of books published pre-1923 and US works published 1923-1963
[Chart: segments of 42% (In Print), 19%, 20%, 19%]
• Identification
• Description
• Rights
• Relationships
  – Bibliographic records
  – Bib records and objects
  – Digital objects
  – Digital and print
Understanding the relationship between the collective and local
1st model: Price per GB
                 2008       2009       2010       2011       2012 (Oct)
Total Volumes    2,477,871  5,221,092  7,836,698  9,966,572  10,531,566
Public Domain    372,085    758,947    1,959,223  2,712,626  3,218,132
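To make the growth figures easier to compare, the short calculation below derives the public domain share for each year from the totals above. The variable names are illustrative. The October 2012 share comes out at roughly 31%, in line with the figure quoted on the Digital Repository slide.

```python
years = ["2008", "2009", "2010", "2011", "2012 (Oct)"]
total_volumes = [2_477_871, 5_221_092, 7_836_698, 9_966_572, 10_531_566]
public_domain = [372_085, 758_947, 1_959_223, 2_712_626, 3_218_132]

# Public domain volumes as a percentage of total volumes, per year.
share = {
    year: 100 * pd / total
    for year, pd, total in zip(years, public_domain, total_volumes)
}

for year in years:
    print(f"{year}: {share[year]:.1f}% public domain")
```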
A global change in the library environment
Academic print book collection already substantially duplicated in mass digitized book corpus
[Chart: x-axis "Rank in 2008 ARL Investment Index" (0 to 120); y-axis "% of Titles in Local Collection" (0 to 60)]
June 2009: Median duplication 19%
June 2010: Median duplication 31%
Courtesy of Constance Malpas, OCLC Research
Digitized Books in Shared Repositories
[Chart: x-axis months from Sep-09 to Jun-10; y-axis "Unique Titles" (0 to 3,500,000); series: "Mass digitized books in Hathi digital repository" (~3.5M titles), "Mass digitized books in shared print repositories" (~2.5M)]
~75% of mass digitized corpus is ‘backed up’ in one or more shared print repositories
Courtesy of Constance Malpas, OCLC Research
Collection Overlap
• More than 50% median overlap with ARL institutions; higher for small liberal arts colleges
• New pricing model based on print holdings
  – http://www.hathitrust.org/cost
  – Requires print holdings database
  – Also supports expansion of legal uses, efforts in de-duplication
  – Facilitates individual and collaborative collection development and management operations
• Print monographs archiving
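The overlap figure above is a median across institutions of the percentage of each library's print titles that also appear in the digitized corpus. A minimal sketch of that calculation follows. The library names and title identifiers are toy data invented for the example, and the function is hypothetical rather than the method used in the OCLC analysis.

```python
from statistics import median


def overlap_percent(local_titles: set, digitized_titles: set) -> float:
    """Percent of a library's titles that also exist in the digitized corpus."""
    if not local_titles:
        return 0.0
    return 100 * len(local_titles & digitized_titles) / len(local_titles)


# Toy data: identifiers stand in for matched title records.
digitized = {"t1", "t2", "t3", "t4"}
libraries = {
    "Library A": {"t1", "t2", "t9"},        # 2 of 3 titles digitized
    "Library B": {"t1", "t2", "t3", "t4"},  # 4 of 4 titles digitized
    "Library C": {"t3", "t8"},              # 1 of 2 titles digitized
}

overlaps = {name: overlap_percent(held, digitized) for name, held in libraries.items()}
median_overlap = median(overlaps.values())
```

A print holdings database, as the pricing model requires, is what would supply the per-library title sets in practice.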
Sourcing and Scaling
http://orweblog.oclc.org/archives/002058.html
• Scale
  – Institution-scale
  – Group-scale
  – Web-scale
• Sourcing
  – Institutional
  – Collaborative
  – Third-party
A new kind of library
Thank you
How to find out more
• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
  – http://www.hathitrust.org/updates
  – RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
  – Large-scale Search
  – Perspectives from HathiTrust
PartnershipArizona State UniversityBaylor UniversityBoston CollegeBoston UniversityBrandeis UniversityCalifornia Digital LibraryCarnegie Mellon UniversityColumbia UniversityCornell UniversityDartmouth CollegeDuke UniversityEmory UniversityFlorida State UniversityGetty Research InstituteHarvard University LibraryIndiana UniversityIowa State UniversityJohns Hopkins UniversityKansas State UniversityLafayette CollegeLibrary of CongressMassachusetts Institute of
TechnologyMcGill University`Michigan State UniversityNew York Public LibraryNew York UniversityNorth Carolina Central
University
North Carolina StateUniversity
Northwestern UniversityThe Ohio State UniversityThe Pennsylvania State
UniversityPrinceton UniversityPurdue UniversityStanford UniversitySyracuse UniversityTexas AampM UniversityUniversidad Complutense
de MadridUniversity of ArizonaUniversity of CalgaryUniversity of California
BerkeleyDavisIrvineLos AngelesMercedRiversideSan DiegoSan FranciscoSanta BarbaraSanta Cruz
The University of ChicagoUniversity of ConnecticutUniversity of DelawareUniversity of Florida
University of IllinoisUniversity of Illinois at ChicagoThe University of IowaUniversity of KansasUniversity of MarylandUniversity of MiamiUniversity of MichiganUniversity of MinnesotaUniversity of MissouriUniversity of Nebraska-LincolnThe University of North
Carolina at Chapel HillUniversity of Notre DameUniversity of PennsylvaniaUniversity of PittsburghUniversity of UtahUniversity of VermontUniversity of VirginiaUniversity of WashingtonUniversity of Wisconsin-
MadisonUtah State UniversityVanderbilt UniversityVirginia TechWake Forest UniversityWashington UniversityYale University Library
Digital Repository
bull Launched 2008bull Initial focus on digitized book and journal
contentndash 106 million total volumes ndash 558 million book titlesndash 276000 serial titlesndash 32 million public domain (~31)
The Name
bull The meaning behind the namendash Hathi (hah-tee)--Hindi for elephantndash Big strongndash Never forgets wisendash Securendash Trustworthy
Mission
bull To contribute to the common good by collecting organizing preserving communicating and sharing the record of human knowledge
Universal Library
Common Goal
Single Entity Many Partners
HathiTrust
Collections and Collaboration
bull Comprehensive collection- Preservationhellipwith Access
bull Shared strategiesndash Copyrightndash Collection management developmentndash Preservationndash Discovery Usendash Bibliographic Indeterminacyndash Efficient user services
bull Public Good
What we are doing to get there
Cost-effective long-term preservation and access for digitized content
bull Facilitate decision-making about digitization and print collection management
bull Facilitate activities such as discovery copyright review use of materials
Repository and Content
Michigan 43
California 32
Wisconsin 5
Cornell 4NYPL 2
Princeton 2Indiana 2
Columbia 1
Harvard 2LoC 1 Madrid 1 Minnesota 1
Illinois 1
Content Sources
English48
German9
French7
Spanish5
Chinese4
Russian4
Japanese3
Italian2
Arabic2
Latin1
Remaining Languages
14
Language Distribution (1)
The top 10 languages make up ~86 of all content
Undetermined7
Polish7
Portuguese7
Dutch5
Hebrew5
Hindi5
Indonesian4
Korean4Swedish
3
Urdu3
Turkish3
Thai3Danish
3
Czech3
Unknown3
Croatian2
Persian2
Tamil2
Bengali2
Music2
Hungarian2
Norwegian2
Sanskrit2
Vietnamese1
Ukrainian1
Greek1
Bulgarian1Serbian
1Armenian
1Romanian
1Marathi
1
Ancient-Greek1 Panjabi
1
Telugu1Malay
1
Catalan1
Malayalam1
Multiple1
Finnish1
Slovak1
Language Distribution (2)
The next 40 languages make up ~13 of total
Dates
Copyright Distribution
In-copyright or unde-termined
69
Public Domain (worldwide)
15
US Federal Government Documents (worldwide)
4
Public Domain(US)11
Open Access1
Creative Commons 04
1010
8
110
9
410
9
710
9
1010
9
111
0
411
0
711
0
1011
0
111
1
411
1
711
1
1011
1
111
2
411
2
711
2
1011
2
111
30
10
20
30
40
50
60
70
80
90
100
Boston CollegeFloridaYaleUtah StateUNC-Chapel HillPurduePenn StateNorthwesternNCSUIllinoisDukeChicagoVirginiaMinnesotaMadridLoCHarvardColumbiaIndianaPrincetonNYPLCornellWisconsinCaliforniaMichigan
Source
Bibliographic Data
Content Package
MichiganIndiana
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
Datasets
Source
Bibliographic Data
Content Package
MichiganIndiana
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
Datasets
Source
Bibliographic Data
Content Package
MichiganIndiana
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
DatasetsTDR
Source
Bibliographic Data
Content Package
MichiganIndiana
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
Datasets
Source
Bibliographic Data
Content Package
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
DatasetsMichigan
Indiana
Source
Bibliographic Data
Content Package
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
DatasetsMichigan
Indiana
Source
Bibliographic Data
Content Package
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
DatasetsMichigan
Indiana
We engage in preservationfor purposes of access
Source
Bibliographic Data
Content Package
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
DatasetsMichigan
Indiana
Making Content Available
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Descriptive headings added (hidden from GUI with CSS)
Info about SSD service amp link to accessibility page
Images used for style are in css so no need to use alt tags
Skip navigation link
Access keys for navigating pages with keyboard
Added labels amp descriptive titles to forms amp ToC table
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
APIs
bull Data APIndash Volume and rights informationndash Page imagesndash OCR
bull Bibliographic APIndash Volume and rights informationndash MARC records
bull OAIbull ldquoHathifilesrdquo
Datasets
bull Google-digitized ~28 million texts Requires proposal to HathiTrust Agreement with Google Statement on usemanagement
bull Non-Google-digitized ~370000 texts Freely available Statement on management
Research Center
bull Environment to perform research on HathiTrust corpus
bull httpwwwhathitrustorghtrc
bull httplibumichedumpachbull Package of tools to enable publication of open
access born-digital journal content directly into HathiTrustndash Including accompanying data and media files
bull Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)
Source Archive
Editorial Market
Higher Education
Access Determinations
bull Automatedbull Manual
Automatic Rights Determination
bull Conducted on all works at time of ingest and when records are modifiedndash Public domain worldwide
bull US works published before 1923 US federal government publications non-US works published prior to 1873
ndash Public domain in the United Statesbull Non-US works published prior to 1923
Manual Rights Determination
bull IMLS-funded CRMS projectndash CRMS-US
bull 2008 US-published works 1923-1963bull Staff at 4 partner institutions
ndash CRMS-Worldbull 2011 Expanded to non-US worksbull Staff at 16 partner institutions
ndash Double review with additional expert review for conflictsndash Compliance with copyright formalitiesndash As of January 2013 241541 reviewed more than 132644
openedbull Rights Holder Permissions
bull System of Precedence
Rights Database
Bibliographic (automatic)
Manual
Lawful uses
bull Users who have print disabilitiesndash All in-copyright works in HathiTrust currently
owned (or owned previously) by the partner institution
ndash Must be authenticatedndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgaccessibility
Lawful uses (2)
bull Out of print and brittle missingndash Works must be currently owned (or owned
previously) by the partner institutionndash Must be authenticated or accessing work from
library premisesndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgout-of-print-brittle
bull Access and use statementsndash httpwwwhathitrustorgaccess_use
Outline
bull The Big Idea ndash Mission and Goals
bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure
bull How HathiTrust can change the way we work
e-Commerce
Print on Demand
Content Ingest
Transformation
Validation
Content Access
PageTurner
Collection Builder
Large-scale Search
Bibliographic Catalog
Research Center
APIs
Quality Assurance
Quality Review
Content Certification
User Services
Usability
User support (helpdesk)
Outreach
Project website
Monthly newsletter
Papers and presentations
Communication with potential
partners
Surveys general inquiries
Repository evaluation and
audit (eg DRAMBORA
TRAC)
Legal
Risk management (use of materials)
Partner agreements
Advocacy
Governance
Budget Finances
Decision-making
Policy
Planning
Enterprise Management
Communication and Coordination
with partner institutions
Project management
Repository Administration
Hardware configuration and
maintenance
Web and application server configuration and
maintenance
Security
Permissions
Logging
Repository Administration
Data management (content storage backup integrity checks deletion)
Hardware selection and replacement
Content and Metadata
specifications
Disaster Recovery
Processes for ensuring content
integrity
Rights Management
Copyright determination
Copyright review
Copyright information
management (database)
Rightsholder permissions
Bibliographic Data
Management
Entity description (record-level)
Object identification (item-
level)
Data availability
Collection Development
Digitalbull Expansion beyond
books and journals (born-digital images and maps audio)
bull Selection of content (for non-Google volume ingest and pilots projects)
Printbull Cloud Library (effect
of digital on print)
Financial contributions of partners
HathiTrust Functional Framework
HathiTrust
Strategic Advisory BoardBudgetFinances Decision-making
Guidance on Policy Planning
bull Driven by needs of institutionsbull Leverage across the partnershipbull Projects Print on Demand Grant Work Ingest Specifications
PageTurner Bibliographic Data Management
Executive Committee
Collective Work Working Groups and Committees
Operationalbull Communicationsbull User Supportbull User Experience
Operationalbull Communicationsbull User Supportbull User Experience
Strategicbull Collectionsbull Discovery Interfacebull Full-text Search
Distributed work
Constitutional Convention
bull October 2011bull 52 partnersbull 3-year review overseen by SABbull Ballot Proposals
ndash Print monograph storagendash Approval Process for development initiativesndash US Government Documentsndash Fee-for-service content depositndash Governance
HathiTrust
Executive Committee
Strategic Advisory
Board
BudgetFinancesDecision-making
Guidance on Policy Planning
bull 12-member Board of Governors
bull Chief Executive Officer bull Executive Committee
Governance
bull Efficient practicalbull Inclusive collective
Outline
bull The Big Idea ndash Mission and Goals
bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure
bull How HathiTrust can change the way we work
How HathiTrust Can Change the Way We Work
Digital Repository
• Launched 2008
• Initial focus on digitized book and journal content
  – 10.6 million total volumes
  – 5.58 million book titles
  – 276,000 serial titles
  – 3.2 million public domain (~31%)
The Name
• The meaning behind the name
  – Hathi (hah-tee)--Hindi for elephant
  – Big, strong
  – Never forgets, wise
  – Secure
  – Trustworthy
Mission
• To contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge
Universal Library
Common Goal
Single Entity Many Partners
HathiTrust
Collections and Collaboration
• Comprehensive collection
  – Preservation…with Access
• Shared strategies
  – Copyright
  – Collection management / development
  – Preservation
  – Discovery / Use
  – Bibliographic Indeterminacy
  – Efficient user services
• Public Good
What we are doing to get there
Cost-effective long-term preservation and access for digitized content
• Facilitate decision-making about digitization and print collection management
• Facilitate activities such as discovery, copyright review, use of materials
Repository and Content
Content Sources
[Pie chart: Michigan 43%, California 32%, Wisconsin 5%, Cornell 4%, NYPL 2%, Princeton 2%, Indiana 2%, Harvard 2%, Columbia 1%, LoC 1%, Madrid 1%, Minnesota 1%, Illinois 1%]
Language Distribution (1)
[Pie chart: English 48%, German 9%, French 7%, Spanish 5%, Chinese 4%, Russian 4%, Japanese 3%, Italian 2%, Arabic 2%, Latin 1%, Remaining Languages 14%]
The top 10 languages make up ~86% of all content
Language Distribution (2)
[Pie chart: Undetermined 7%, Polish 7%, Portuguese 7%, Dutch 5%, Hebrew 5%, Hindi 5%, Indonesian 4%, Korean 4%, Swedish 3%, Urdu 3%, Turkish 3%, Thai 3%, Danish 3%, Czech 3%, Unknown 3%, Croatian 2%, Persian 2%, Tamil 2%, Bengali 2%, Music 2%, Hungarian 2%, Norwegian 2%, Sanskrit 2%, Vietnamese 1%, Ukrainian 1%, Greek 1%, Bulgarian 1%, Serbian 1%, Armenian 1%, Romanian 1%, Marathi 1%, Ancient-Greek 1%, Panjabi 1%, Telugu 1%, Malay 1%, Catalan 1%, Malayalam 1%, Multiple 1%, Finnish 1%, Slovak 1%]
The next 40 languages make up ~13% of total
Dates
Copyright Distribution
[Pie chart: In-copyright or undetermined 69%, Public Domain (worldwide) 15%, Public Domain (US) 11%, US Federal Government Documents (worldwide) 4%, Open Access 1%, Creative Commons 0.4%]
[Chart: one series per institution, plotted quarterly from 10/1/08 to 1/1/13 on a vertical axis from 0 to 100. Legend: Boston College, Florida, Yale, Utah State, UNC-Chapel Hill, Purdue, Penn State, Northwestern, NCSU, Illinois, Duke, Chicago, Virginia, Minnesota, Madrid, LoC, Harvard, Columbia, Indiana, Princeton, NYPL, Cornell, Wisconsin, California, Michigan]
[Diagram, repeated across several slide builds. Labels: Source; Bibliographic Data; Content Package; Michigan; Indiana; Ingest; Data Management; Bib Data; Rights Data; Holdings Data; Storage; Access; Catalog; Full-text Search; PageTurner; APIs; Collections; Datasets; TDR]
We engage in preservation for purposes of access
Making Content Available
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Descriptive headings added (hidden from GUI with CSS)
Info about SSD service & link to accessibility page
Images used for style are in CSS, so no need to use alt tags
Skip navigation link
Access keys for navigating pages with keyboard
Added labels & descriptive titles to forms & ToC table
APIs
• Data API
  – Volume and rights information
  – Page images
  – OCR
• Bibliographic API
  – Volume and rights information
  – MARC records
• OAI
• "Hathifiles"
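As a rough illustration of the Bibliographic API bullet, the sketch below builds a lookup URL and pulls the volume and rights fields out of a response. The base URL, the path layout, and the field names (`items`, `htid`, `usRightsString`) are assumptions drawn from the publicly documented API and should be checked against the current documentation; the sample payload is invented, and no network request is made.

```python
import json

# Assumed base URL and path layout for the Bibliographic API; verify before use.
BIB_API_BASE = "https://catalog.hathitrust.org/api/volumes"


def bib_api_url(id_type: str, id_value: str, detail: str = "brief") -> str:
    """Build a lookup URL of the form <base>/<detail>/<id_type>/<id>.json."""
    if detail not in ("brief", "full"):
        raise ValueError("detail must be 'brief' or 'full'")
    return f"{BIB_API_BASE}/{detail}/{id_type}/{id_value}.json"


def summarize_items(response_text: str) -> list:
    """Return (volume id, US rights string) pairs from a JSON response."""
    data = json.loads(response_text)
    return [(item["htid"], item["usRightsString"]) for item in data.get("items", [])]


# Invented sample payload, shaped like the assumed response format.
SAMPLE_RESPONSE = json.dumps({
    "records": {"000000001": {"titles": ["An example title"]}},
    "items": [
        {"htid": "mdp.00000000000001", "usRightsString": "Full view"},
        {"htid": "mdp.00000000000002", "usRightsString": "Limited (search-only)"},
    ],
})

if __name__ == "__main__":
    print(bib_api_url("oclc", "424023"))
    for htid, rights in summarize_items(SAMPLE_RESPONSE):
        print(htid, rights)
```

A local catalog could use a lookup like this to decide whether to show a link to a digitized copy next to a print record.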
Datasets
• Google-digitized: ~2.8 million texts
  – Requires proposal to HathiTrust
  – Agreement with Google
  – Statement on use/management
• Non-Google-digitized: ~370,000 texts
  – Freely available
  – Statement on management
Research Center
• Environment to perform research on HathiTrust corpus
• http://www.hathitrust.org/htrc
• http://lib.umich.edu/mpach
• Package of tools to enable publication of open access, born-digital journal content directly into HathiTrust
  – Including accompanying data and media files
• Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)
Source Archive
Editorial Market
Higher Education
Access Determinations
• Automated
• Manual
Automatic Rights Determination
• Conducted on all works at time of ingest and when records are modified
  – Public domain worldwide
    • US works published before 1923, US federal government publications, non-US works published prior to 1873
  – Public domain in the United States
    • Non-US works published prior to 1923
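The automated rules above can be sketched as a small decision function. This is only an illustration of the date and place cutoffs listed on the slide, not HathiTrust's actual implementation; the `Work` class, its field names, and the returned labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Work:
    # Hypothetical fields; a real system would derive these from bibliographic records.
    pub_year: int
    published_in_us: bool
    us_federal_document: bool = False


def automatic_rights(work: Work) -> str:
    """Apply the slide's automated rules; anything else stays closed pending review."""
    if work.us_federal_document:
        return "public domain worldwide"
    if work.published_in_us and work.pub_year < 1923:
        return "public domain worldwide"
    if not work.published_in_us and work.pub_year < 1873:
        return "public domain worldwide"
    if not work.published_in_us and work.pub_year < 1923:
        return "public domain in the United States"
    return "in copyright or undetermined"


if __name__ == "__main__":
    print(automatic_rights(Work(pub_year=1900, published_in_us=False)))
```

Works that fall through to the last branch are the ones a manual review process would pick up.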
Manual Rights Determination
• IMLS-funded CRMS project
  – CRMS-US
    • 2008: US-published works, 1923-1963
    • Staff at 4 partner institutions
  – CRMS-World
    • 2011: Expanded to non-US works
    • Staff at 16 partner institutions
  – Double review with additional expert review for conflicts
  – Compliance with copyright formalities
  – As of January 2013: 241,541 reviewed, more than 132,644 opened
• Rights Holder Permissions
• System of Precedence
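The double-review step can be pictured as follows: two reviewers decide independently, and an expert is consulted only when they disagree. This is a minimal sketch of that idea, not the CRMS software; the function name, the determination codes, and the expert callback are hypothetical.

```python
from typing import Callable


def resolve_review(first: str, second: str,
                   expert_review: Callable[[str, str], str]) -> str:
    """Accept matching independent determinations; otherwise defer to an expert."""
    if first == second:
        return first
    return expert_review(first, second)


if __name__ == "__main__":
    # Hypothetical codes: "pd" for public domain, "ic" for in copyright.
    cautious_expert = lambda a, b: "ic"
    print(resolve_review("pd", "pd", cautious_expert))
    print(resolve_review("pd", "ic", cautious_expert))
```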
Rights Database
Bibliographic (automatic)
Manual
Lawful uses
• Users who have print disabilities
  – All in-copyright works in HathiTrust currently owned (or owned previously) by the partner institution
  – Must be authenticated
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/accessibility
Lawful uses (2)
• Out of print and brittle/missing
  – Works must be currently owned (or owned previously) by the partner institution
  – Must be authenticated or accessing work from library premises
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/out-of-print-brittle
• Access and use statements
  – http://www.hathitrust.org/access_use
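The conditions on the two "Lawful uses" slides combine into a simple check. The sketch below is an assumption-laden illustration rather than the production access logic: `AccessRequest`, `may_open`, and the `premises_counts` flag are hypothetical names, and `copies_owned` stands for copies the partner institution owns or previously owned.

```python
from dataclasses import dataclass


@dataclass
class AccessRequest:
    authenticated: bool
    on_us_soil: bool
    on_library_premises: bool = False


def may_open(req: AccessRequest, copies_owned: int, active_sessions: int,
             premises_counts: bool = False) -> bool:
    """Allow access only if every listed condition holds.

    premises_counts=True models the out-of-print and brittle case, where
    being on library premises is accepted in place of authentication.
    """
    identified = req.authenticated or (premises_counts and req.on_library_premises)
    return (identified
            and req.on_us_soil
            and copies_owned > 0
            and active_sessions < copies_owned)  # one simultaneous access per copy


if __name__ == "__main__":
    reader = AccessRequest(authenticated=True, on_us_soil=True)
    print(may_open(reader, copies_owned=2, active_sessions=1))
```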
Outline
• The Big Idea
  – Mission and Goals
• What we're doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work
e-Commerce
Print on Demand
Content Ingest
Transformation
Validation
Content Access
PageTurner
Collection Builder
Large-scale Search
Bibliographic Catalog
Research Center
APIs
Quality Assurance
Quality Review
Content Certification
User Services
Usability
User support (helpdesk)
Outreach
Project website
Monthly newsletter
Papers and presentations
Communication with potential partners
Surveys, general inquiries
Repository evaluation and audit (e.g. DRAMBORA, TRAC)
Legal
Risk management (use of materials)
Partner agreements
Advocacy
Governance
Budget, Finances
Decision-making
Policy
Planning
Enterprise Management
Communication and Coordination with partner institutions
Project management
Repository Administration
Hardware configuration and maintenance
Web and application server configuration and maintenance
Security
Permissions
Logging
Repository Administration
Data management (content storage, backup, integrity checks, deletion)
Hardware selection and replacement
Content and Metadata specifications
Disaster Recovery
Processes for ensuring content integrity
Rights Management
Copyright determination
Copyright review
Copyright information management (database)
Rightsholder permissions
Bibliographic Data Management
Entity description (record-level)
Object identification (item-level)
Data availability
Collection Development
Digital
• Expansion beyond books and journals (born-digital, images and maps, audio)
• Selection of content (for non-Google volume ingest and pilot projects)
Print
• Cloud Library (effect of digital on print)
Financial contributions of partners
HathiTrust Functional Framework
HathiTrust
Strategic Advisory Board
Budget/Finances, Decision-making
Guidance on Policy, Planning
• Driven by needs of institutions
• Leverage across the partnership
• Projects: Print on Demand, Grant Work, Ingest Specifications, PageTurner, Bibliographic Data Management
Executive Committee
Collective Work: Working Groups and Committees
Operational
• Communications
• User Support
• User Experience
Strategic
• Collections
• Discovery Interface
• Full-text Search
Distributed work
Constitutional Convention
• October 2011
• 52 partners
• 3-year review overseen by SAB
• Ballot Proposals
  – Print monograph storage
  – Approval Process for development initiatives
  – US Government Documents
  – Fee-for-service content deposit
  – Governance
HathiTrust
Executive Committee
Strategic Advisory Board
Budget/Finances, Decision-making
Guidance on Policy, Planning
• 12-member Board of Governors
• Chief Executive Officer
• Executive Committee
Governance
• Efficient, practical
• Inclusive, collective
Outline
• The Big Idea
  – Mission and Goals
• What we're doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work
How HathiTrust Can Change the Way We Work
Seeing collective problems as collective
Breakdown of HathiTrust book corpus by publication date
Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building – 2/2011
[Pie chart, shown across several slide builds: segments of 42%, 19%, 20%, and 19%]
Copyright status of books published pre-1923 and US works published 1923-1963
[Pie chart with the same values; in the final build the 42% segment is labeled "In Print"]
• Identification
• Description
• Rights
• Relationships
  – Bibliographic records
  – Bib records and objects
  – Digital objects
  – Digital and print
Relationships
Understanding the relationship between the collective and local
1st model: Price per GB
                2008       2009       2010       2011       2012 (Oct)
Total Volumes   2,477,871  5,221,092  7,836,698  9,966,572  10,531,566
Public Domain     372,085    758,947  1,959,223  2,712,626   3,218,132
[Chart: Total Volumes and Public Domain by year; vertical axis 0 to 12,000,000]
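For readers who want the public domain share behind these counts, a minimal calculation follows. The figures are the volume counts reported above; the variable and function names are only illustrative.

```python
YEARS = ["2008", "2009", "2010", "2011", "2012 (Oct)"]
TOTAL_VOLUMES = [2477871, 5221092, 7836698, 9966572, 10531566]
PUBLIC_DOMAIN = [372085, 758947, 1959223, 2712626, 3218132]


def public_domain_share(public_domain: int, total: int) -> float:
    """Public domain volumes as a percentage of all volumes."""
    return 100.0 * public_domain / total


if __name__ == "__main__":
    for year, pd_count, total in zip(YEARS, PUBLIC_DOMAIN, TOTAL_VOLUMES):
        print(f"{year}: {public_domain_share(pd_count, total):.1f}% public domain")
```

The share roughly doubles over the period, from about 15% in 2008 to a little over 30% in October 2012.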
A global change in the library environment
[Scatter chart: horizontal axis "Rank in 2008 ARL Investment Index" (0 to 120); vertical axis "% of Titles in Local Collection" (0 to 60)]
June 2009: Median duplication 19%
June 2010: Median duplication 31%
Academic print book collection already substantially duplicated in mass digitized book corpus
Courtesy of Constance Malpas, OCLC Research
Digitized Books in Shared Repositories
[Chart: monthly from Sep-09 to Jun-10; vertical axis "Unique Titles" (0 to 3,500,000); series: Mass digitized books in Hathi digital repository (~3.5M titles), Mass digitized books in shared print repositories (~2.5M)]
~75% of mass digitized corpus is 'backed up' in one or more shared print repositories
Courtesy of Constance Malpas, OCLC Research
Collection Overlap
• More than 50% median overlap with ARL institutions; higher for small liberal arts colleges
• New Pricing model based on Print holdings
  – http://www.hathitrust.org/cost
  – Requires print holdings database
  – Also support expansion of legal uses, efforts in de-duplication
  – Facilitate individual and collaborative collection development and management operations
• Print monographs archiving
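The overlap and median duplication figures quoted on these slides can be reproduced in principle from a print holdings database. The sketch below shows one way to compute them, assuming each collection is reduced to a set of title identifiers; the identifiers and library names are invented, and matching real holdings is considerably messier than a set intersection.

```python
from statistics import median


def duplication_rate(local_titles: set, digitized_titles: set) -> float:
    """Percentage of a library's print titles that also exist in the digitized corpus."""
    if not local_titles:
        return 0.0
    return 100.0 * len(local_titles & digitized_titles) / len(local_titles)


def median_duplication(collections: dict, digitized_titles: set) -> float:
    """Median duplication rate across a group of libraries."""
    return median(duplication_rate(titles, digitized_titles)
                  for titles in collections.values())


if __name__ == "__main__":
    # Invented identifiers standing in for matched title records.
    digitized = {"t1", "t2", "t3", "t4"}
    libraries = {
        "Library A": {"t1", "t2", "t9", "t10"},
        "Library B": {"t1", "t8", "t9", "t10"},
        "Library C": {"t1", "t2", "t3", "t9"},
    }
    print(median_duplication(libraries, digitized))
```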
Sourcing and Scaling
http://orweblog.oclc.org/archives/002058.html
• Scale
  – Institution-scale
  – Group-scale
  – Web-scale
• Sourcing
  – Institutional
  – Collaborative
  – Third-party
A new kind of library
Thank you
How to find out more
• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
  – http://www.hathitrust.org/updates
  – RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
  – Large-scale Search
  – Perspectives from HathiTrust
Mission
bull To contribute to the common good by collecting organizing preserving communicating and sharing the record of human knowledge
Universal Library
Common Goal
Single Entity Many Partners
HathiTrust
Collections and Collaboration
bull Comprehensive collection- Preservationhellipwith Access
bull Shared strategiesndash Copyrightndash Collection management developmentndash Preservationndash Discovery Usendash Bibliographic Indeterminacyndash Efficient user services
bull Public Good
What we are doing to get there
Cost-effective long-term preservation and access for digitized content
bull Facilitate decision-making about digitization and print collection management
bull Facilitate activities such as discovery copyright review use of materials
Repository and Content
Michigan 43
California 32
Wisconsin 5
Cornell 4NYPL 2
Princeton 2Indiana 2
Columbia 1
Harvard 2LoC 1 Madrid 1 Minnesota 1
Illinois 1
Content Sources
English48
German9
French7
Spanish5
Chinese4
Russian4
Japanese3
Italian2
Arabic2
Latin1
Remaining Languages
14
Language Distribution (1)
The top 10 languages make up ~86 of all content
Undetermined7
Polish7
Portuguese7
Dutch5
Hebrew5
Hindi5
Indonesian4
Korean4Swedish
3
Urdu3
Turkish3
Thai3Danish
3
Czech3
Unknown3
Croatian2
Persian2
Tamil2
Bengali2
Music2
Hungarian2
Norwegian2
Sanskrit2
Vietnamese1
Ukrainian1
Greek1
Bulgarian1Serbian
1Armenian
1Romanian
1Marathi
1
Ancient-Greek1 Panjabi
1
Telugu1Malay
1
Catalan1
Malayalam1
Multiple1
Finnish1
Slovak1
Language Distribution (2)
The next 40 languages make up ~13 of total
Dates
Copyright Distribution
In-copyright or unde-termined
69
Public Domain (worldwide)
15
US Federal Government Documents (worldwide)
4
Public Domain(US)11
Open Access1
Creative Commons 04
1010
8
110
9
410
9
710
9
1010
9
111
0
411
0
711
0
1011
0
111
1
411
1
711
1
1011
1
111
2
411
2
711
2
1011
2
111
30
10
20
30
40
50
60
70
80
90
100
Boston CollegeFloridaYaleUtah StateUNC-Chapel HillPurduePenn StateNorthwesternNCSUIllinoisDukeChicagoVirginiaMinnesotaMadridLoCHarvardColumbiaIndianaPrincetonNYPLCornellWisconsinCaliforniaMichigan
[Diagram, repeated across several slide builds. Labels: Source; Bibliographic Data; Content Package; Michigan; Indiana; Bib Data; Data Management; Rights Data; Storage; Access; Ingest; Catalog; Full-text Search; PageTurner; APIs; Collections; Holdings Data; Datasets; TDR]
We engage in preservation for purposes of access
Making Content Available
[Diagram, repeated across several slide builds. Labels: Access; Catalog; Full-text Search; PageTurner; APIs; Collections; Datasets]
Descriptive headings added (hidden from GUI with CSS)
Info about SSD service & link to accessibility page
Images used for style are in CSS, so no need to use alt tags
Skip navigation link
Access keys for navigating pages with keyboard
Added labels & descriptive titles to forms & ToC table
APIs
• Data API
  - Volume and rights information
  - Page images
  - OCR
• Bibliographic API
  - Volume and rights information
  - MARC records
• OAI
• "Hathifiles"
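As an illustration of the Bibliographic API listed above, the sketch below builds a request URL for a single identifier. The host `catalog.hathitrust.org`, the `/api/volumes/` path, the `brief` and `full` detail levels, and the identifier types are assumptions based on the publicly documented URL pattern rather than anything stated in these slides, and the helper name `bib_api_url` is hypothetical. No network request is made.

```python
from urllib.parse import quote

# Assumed values; check the current HathiTrust Bibliographic API documentation.
BASE_URL = "https://catalog.hathitrust.org/api/volumes"
DETAIL_LEVELS = {"brief", "full"}
ID_TYPES = {"oclc", "lccn", "isbn", "issn", "htid", "recordnumber"}


def bib_api_url(id_type: str, id_value: str, detail: str = "brief") -> str:
    """Build a single-identifier request URL (hypothetical helper)."""
    if detail not in DETAIL_LEVELS:
        raise ValueError(f"unknown detail level: {detail!r}")
    if id_type not in ID_TYPES:
        raise ValueError(f"unknown identifier type: {id_type!r}")
    # Percent-encode the identifier so a '/' cannot split the path segment.
    return f"{BASE_URL}/{detail}/{id_type}/{quote(id_value, safe='')}.json"


if __name__ == "__main__":
    print(bib_api_url("oclc", "424023"))
```

Under these assumptions, a `brief` request would correspond to the volume and rights information on the slide, and a `full` request to the variant that also carries MARC records.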
Datasets
• Google-digitized: ~2.8 million texts
  - Requires proposal to HathiTrust
  - Agreement with Google
  - Statement on use/management
• Non-Google-digitized: ~370,000 texts
  - Freely available
  - Statement on management
Research Center
• Environment to perform research on HathiTrust corpus
• http://www.hathitrust.org/htrc
• http://lib.umich.edu/mpach
• Package of tools to enable publication of open access, born-digital journal content directly into HathiTrust
  - Including accompanying data and media files
• Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)
Source Archive
Editorial Market
Higher Education
Access Determinations
• Automated
• Manual
Automatic Rights Determination
• Conducted on all works at time of ingest and when records are modified
  - Public domain worldwide
    • US works published before 1923; US federal government publications; non-US works published prior to 1873
  - Public domain in the United States
    • Non-US works published prior to 1923
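The date and place rules above can be sketched as a small decision function. This is a minimal illustration, not HathiTrust's ingest code: the names `Rights` and `automatic_rights` are hypothetical, the inputs are assumed to come from bibliographic data, and the fallback category borrows the "in-copyright or undetermined" label from the Copyright Distribution slide.

```python
from enum import Enum


class Rights(Enum):
    PD_WORLDWIDE = "public domain worldwide"
    PD_US = "public domain in the United States"
    UNDETERMINED = "in copyright or undetermined"  # assumed fallback label


def automatic_rights(pub_year: int, us_work: bool, us_federal_doc: bool = False) -> Rights:
    """Apply the slide's rules in order, most permissive outcome first."""
    if us_federal_doc:
        return Rights.PD_WORLDWIDE
    if us_work and pub_year < 1923:
        return Rights.PD_WORLDWIDE
    if not us_work and pub_year < 1873:
        return Rights.PD_WORLDWIDE
    if not us_work and pub_year < 1923:
        return Rights.PD_US
    # Anything else is left for manual review or treated as in copyright.
    return Rights.UNDETERMINED


if __name__ == "__main__":
    print(automatic_rights(1890, us_work=False).value)
```

Because the check runs at ingest and again whenever a record is modified, a corrected publication date or place can move a volume between categories.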
Manual Rights Determination
• IMLS-funded CRMS project
  - CRMS-US
    • 2008: US-published works, 1923-1963
    • Staff at 4 partner institutions
  - CRMS-World
    • 2011: Expanded to non-US works
    • Staff at 16 partner institutions
  - Double review, with additional expert review for conflicts
  - Compliance with copyright formalities
  - As of January 2013: 241,541 reviewed, more than 132,644 opened
• Rights Holder Permissions
• System of Precedence
Rights Database
Bibliographic (automatic)
Manual
Lawful uses
• Users who have print disabilities
  - All in-copyright works in HathiTrust currently owned (or owned previously) by the partner institution
  - Must be authenticated
  - Must be on US soil
  - One simultaneous access per copy owned
  - http://www.hathitrust.org/accessibility
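The conditions above combine into a single yes/no check per request. The sketch below is only an illustration under stated assumptions: `AccessRequest` and `may_open` are hypothetical names, and counting open sessions against copies owned is one possible reading of "one simultaneous access per copy owned".

```python
from dataclasses import dataclass


@dataclass
class AccessRequest:
    authenticated: bool
    on_us_soil: bool
    institution_holds_work: bool  # currently owned or owned previously


def may_open(request: AccessRequest, copies_owned: int, active_sessions: int) -> bool:
    """Return True only when every listed condition holds (hypothetical check)."""
    if not (request.authenticated and request.on_us_soil):
        return False
    if not request.institution_holds_work:
        return False
    # One simultaneous access per copy owned: a free copy must remain.
    return active_sessions < copies_owned


if __name__ == "__main__":
    reader = AccessRequest(authenticated=True, on_us_soil=True, institution_holds_work=True)
    print(may_open(reader, copies_owned=2, active_sessions=1))
```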
Lawful uses (2)
• Out of print and brittle/missing
  - Works must be currently owned (or owned previously) by the partner institution
  - Must be authenticated or accessing work from library premises
  - Must be on US soil
  - One simultaneous access per copy owned
  - http://www.hathitrust.org/out-of-print-brittle
• Access and use statements
  - http://www.hathitrust.org/access_use
Outline
• The Big Idea
  - Mission and Goals
• What we're doing to get there
  - Repository and Content
  - Making content available
  - Organizational structure
• How HathiTrust can change the way we work
e-Commerce
Print on Demand
Content Ingest
Transformation
Validation
Content Access
PageTurner
Collection Builder
Large-scale Search
Bibliographic Catalog
Research Center
APIs
Quality Assurance
Quality Review
Content Certification
User Services
Usability
User support (helpdesk)
Outreach
Project website
Monthly newsletter
Papers and presentations
Communication with potential partners
Surveys, general inquiries
Repository evaluation and audit (e.g., DRAMBORA, TRAC)
Legal
Risk management (use of materials)
Partner agreements
Advocacy
Governance
Budget Finances
Decision-making
Policy
Planning
Enterprise Management
Communication and Coordination with partner institutions
Project management
Repository Administration
Hardware configuration and maintenance
Web and application server configuration and maintenance
Security
Permissions
Logging
Repository Administration
Data management (content storage, backup, integrity checks, deletion)
Hardware selection and replacement
Content and Metadata specifications
Disaster Recovery
Processes for ensuring content integrity
Rights Management
Copyright determination
Copyright review
Copyright information management (database)
Rightsholder permissions
Bibliographic Data Management
Entity description (record-level)
Object identification (item-level)
Data availability
Collection Development
Digital
• Expansion beyond books and journals (born-digital, images and maps, audio)
• Selection of content (for non-Google volume ingest and pilot projects)
Print
• Cloud Library (effect of digital on print)
Financial contributions of partners
HathiTrust Functional Framework
HathiTrust
Executive Committee: Budget/Finances, Decision-making
Strategic Advisory Board: Guidance on Policy, Planning
• Driven by needs of institutions
• Leverage across the partnership
• Projects: Print on Demand, Grant Work, Ingest Specifications, PageTurner, Bibliographic Data Management
Collective Work: Working Groups and Committees
Operational
• Communications
• User Support
• User Experience
Strategic
• Collections
• Discovery Interface
• Full-text Search
Distributed work
Constitutional Convention
• October 2011
• 52 partners
• 3-year review overseen by SAB
• Ballot Proposals
  - Print monograph storage
  - Approval process for development initiatives
  - US Government Documents
  - Fee-for-service content deposit
  - Governance
HathiTrust
Executive Committee: Budget/Finances, Decision-making
Strategic Advisory Board: Guidance on Policy, Planning
• 12-member Board of Governors
• Chief Executive Officer
• Executive Committee
Governance
• Efficient, practical
• Inclusive, collective
Outline
• The Big Idea
  - Mission and Goals
• What we're doing to get there
  - Repository and Content
  - Making content available
  - Organizational structure
• How HathiTrust can change the way we work
How HathiTrust Can Change the Way We Work
Seeing collective problems as collective
Breakdown of HathiTrust book corpus by publication date
Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building - 2/2011
[Chart segments: 42, 19, 20, 19]
Copyright status of books published pre-1923 and US works published 1923-1963
[Chart segments: In Print 42, 19, 20, 19]
• Identification
• Description
• Rights
• Relationships
  - Bibliographic records
  - Bib records and objects
  - Digital objects
  - Digital and print
Understanding the relationship between the collective and local
1st model: Price per GB
                    2008       2009       2010       2011  2012 (Oct)
Total Volumes  2,477,871  5,221,092  7,836,698  9,966,572  10,531,566
Public Domain    372,085    758,947  1,959,223  2,712,626   3,218,132
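One way to read the volume counts above is as the public domain share of the corpus in each year. The short sketch below does that arithmetic using the figures from the table; the function name `public_domain_share` is hypothetical, and the slides themselves do not present this ratio.

```python
from typing import Dict

# Figures copied from the table on this slide.
YEARS = ["2008", "2009", "2010", "2011", "2012 (Oct)"]
TOTAL_VOLUMES = [2477871, 5221092, 7836698, 9966572, 10531566]
PUBLIC_DOMAIN = [372085, 758947, 1959223, 2712626, 3218132]


def public_domain_share() -> Dict[str, float]:
    """Return the public domain fraction of total volumes for each year."""
    return {
        year: pd_count / total
        for year, total, pd_count in zip(YEARS, TOTAL_VOLUMES, PUBLIC_DOMAIN)
    }


if __name__ == "__main__":
    for year, share in public_domain_share().items():
        print(f"{year}: {share:.1%}")
```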
[Chart: Total Volumes and Public Domain volumes by year, 2008 to 2012 (Oct); value axis 0 to 12,000,000]
[Scatter plot: x-axis Rank in 2008 ARL Investment Index (0 to 120); y-axis % of Titles in Local Collection (0 to 60)]
A global change in the library environment
June 2010: Median duplication 31%
June 2009: Median duplication 19%
Academic print book collection already substantially duplicated in mass digitized book corpus
Courtesy of Constance Malpas, OCLC Research
Digitized Books in Shared Repositories
[Chart: Unique Titles by month, Sep-09 to Jun-10; value axis 0 to 3,500,000. Series: Mass digitized books in Hathi digital repository; Mass digitized books in shared print repositories]
~75% of mass digitized corpus is 'backed up' in one or more shared print repositories
~3.5M titles
~2.5M
Courtesy of Constance Malpas, OCLC Research
Collection Overlap
• More than 50% median overlap with ARL institutions; higher for small liberal arts colleges
• New pricing model based on print holdings
  - http://www.hathitrust.org/cost
  - Requires print holdings database
  - Also supports expansion of legal uses and efforts in de-duplication
  - Facilitates individual and collaborative collection development and management operations
• Print monographs archiving
Sourcing and Scaling
http://orweblog.oclc.org/archives/002058.html
• Scale
  - Institution-scale
  - Group-scale
  - Web-scale
• Sourcing
  - Institutional
  - Collaborative
  - Third-party
A new kind of library
Thank you
How to find out more
• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
  - http://www.hathitrust.org/updates
  - RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
  - Large-scale Search
  - Perspectives from HathiTrust
Collections and Collaboration
bull Comprehensive collection- Preservationhellipwith Access
bull Shared strategiesndash Copyrightndash Collection management developmentndash Preservationndash Discovery Usendash Bibliographic Indeterminacyndash Efficient user services
bull Public Good
What we are doing to get there
Cost-effective long-term preservation and access for digitized content
bull Facilitate decision-making about digitization and print collection management
bull Facilitate activities such as discovery copyright review use of materials
Repository and Content
Michigan 43
California 32
Wisconsin 5
Cornell 4NYPL 2
Princeton 2Indiana 2
Columbia 1
Harvard 2LoC 1 Madrid 1 Minnesota 1
Illinois 1
Content Sources
English48
German9
French7
Spanish5
Chinese4
Russian4
Japanese3
Italian2
Arabic2
Latin1
Remaining Languages
14
Language Distribution (1)
The top 10 languages make up ~86 of all content
Undetermined7
Polish7
Portuguese7
Dutch5
Hebrew5
Hindi5
Indonesian4
Korean4Swedish
3
Urdu3
Turkish3
Thai3Danish
3
Czech3
Unknown3
Croatian2
Persian2
Tamil2
Bengali2
Music2
Hungarian2
Norwegian2
Sanskrit2
Vietnamese1
Ukrainian1
Greek1
Bulgarian1Serbian
1Armenian
1Romanian
1Marathi
1
Ancient-Greek1 Panjabi
1
Telugu1Malay
1
Catalan1
Malayalam1
Multiple1
Finnish1
Slovak1
Language Distribution (2)
The next 40 languages make up ~13 of total
Dates
Copyright Distribution
In-copyright or unde-termined
69
Public Domain (worldwide)
15
US Federal Government Documents (worldwide)
4
Public Domain(US)11
Open Access1
Creative Commons 04
1010
8
110
9
410
9
710
9
1010
9
111
0
411
0
711
0
1011
0
111
1
411
1
711
1
1011
1
111
2
411
2
711
2
1011
2
111
30
10
20
30
40
50
60
70
80
90
100
Boston CollegeFloridaYaleUtah StateUNC-Chapel HillPurduePenn StateNorthwesternNCSUIllinoisDukeChicagoVirginiaMinnesotaMadridLoCHarvardColumbiaIndianaPrincetonNYPLCornellWisconsinCaliforniaMichigan
Source
Bibliographic Data
Content Package
MichiganIndiana
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
Datasets
Source
Bibliographic Data
Content Package
MichiganIndiana
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
Datasets
Source
Bibliographic Data
Content Package
MichiganIndiana
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
DatasetsTDR
Source
Bibliographic Data
Content Package
MichiganIndiana
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
Datasets
Source
Bibliographic Data
Content Package
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
DatasetsMichigan
Indiana
Source
Bibliographic Data
Content Package
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
DatasetsMichigan
Indiana
Source
Bibliographic Data
Content Package
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
DatasetsMichigan
Indiana
We engage in preservationfor purposes of access
Source
Bibliographic Data
Content Package
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
DatasetsMichigan
Indiana
Making Content Available
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Descriptive headings added (hidden from GUI with CSS)
Info about SSD service amp link to accessibility page
Images used for style are in css so no need to use alt tags
Skip navigation link
Access keys for navigating pages with keyboard
Added labels amp descriptive titles to forms amp ToC table
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
APIs
bull Data APIndash Volume and rights informationndash Page imagesndash OCR
bull Bibliographic APIndash Volume and rights informationndash MARC records
bull OAIbull ldquoHathifilesrdquo
Datasets
bull Google-digitized ~28 million texts Requires proposal to HathiTrust Agreement with Google Statement on usemanagement
bull Non-Google-digitized ~370000 texts Freely available Statement on management
Research Center
bull Environment to perform research on HathiTrust corpus
bull httpwwwhathitrustorghtrc
bull httplibumichedumpachbull Package of tools to enable publication of open
access born-digital journal content directly into HathiTrustndash Including accompanying data and media files
bull Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)
Source Archive
Editorial Market
Higher Education
Access Determinations
bull Automatedbull Manual
Automatic Rights Determination
bull Conducted on all works at time of ingest and when records are modifiedndash Public domain worldwide
bull US works published before 1923 US federal government publications non-US works published prior to 1873
ndash Public domain in the United Statesbull Non-US works published prior to 1923
Manual Rights Determination
• IMLS-funded CRMS project
  – CRMS-US
    • 2008: US-published works, 1923-1963
    • Staff at 4 partner institutions
  – CRMS-World
    • 2011: Expanded to non-US works
    • Staff at 16 partner institutions
  – Double review with additional expert review for conflicts
  – Compliance with copyright formalities
  – As of January 2013: 241,541 reviewed, more than 132,644 opened
• Rights Holder Permissions
• System of Precedence
Rights Database
Bibliographic (automatic)
Manual
Lawful uses
• Users who have print disabilities
  – All in-copyright works in HathiTrust currently owned (or owned previously) by the partner institution
  – Must be authenticated
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/accessibility
Lawful uses (2)
• Out of print and brittle/missing
  – Works must be currently owned (or owned previously) by the partner institution
  – Must be authenticated or accessing work from library premises
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/out-of-print-brittle
• Access and use statements
  – http://www.hathitrust.org/access_use
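The conditions on the two "Lawful uses" slides can be read as a simple eligibility check. The sketch below is one hypothetical way to express them; the `AccessRequest` fields, the `may_access` function, and the two use labels are invented for illustration and do not describe how HathiTrust actually enforces these rules.

```python
from dataclasses import dataclass


@dataclass
class AccessRequest:
    authenticated: bool        # user has logged in through a partner institution
    on_us_soil: bool           # request originates in the United States
    on_library_premises: bool  # request comes from inside the library
    copies_owned: int          # print copies held (now or previously) by the institution
    active_sessions: int       # sessions already open on this work for that institution


def may_access(request, use):
    """Check the slide's conditions; use is 'print_disability' or 'out_of_print_brittle'."""
    if use == "print_disability":
        identified = request.authenticated
    elif use == "out_of_print_brittle":
        identified = request.authenticated or request.on_library_premises
    else:
        raise ValueError("unknown use: {!r}".format(use))
    return (
        identified
        and request.on_us_soil
        and request.copies_owned > 0
        # one simultaneous access per copy owned
        and request.active_sessions < request.copies_owned
    )


if __name__ == "__main__":
    walk_in = AccessRequest(False, True, True, copies_owned=1, active_sessions=0)
    print(may_access(walk_in, "print_disability"))
    print(may_access(walk_in, "out_of_print_brittle"))
```

The only difference between the two uses in this reading is that out-of-print and brittle works may also be reached from library premises without authentication.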
Outline
• The Big Idea
  – Mission and Goals
• What we're doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work
e-Commerce
Print on Demand
Content Ingest
Transformation
Validation
Content Access
PageTurner
Collection Builder
Large-scale Search
Bibliographic Catalog
Research Center
APIs
Quality Assurance
Quality Review
Content Certification
User Services
Usability
User support (helpdesk)
Outreach
Project website
Monthly newsletter
Papers and presentations
Communication with potential partners
Surveys, general inquiries
Repository evaluation and audit (e.g., DRAMBORA, TRAC)
Legal
Risk management (use of materials)
Partner agreements
Advocacy
Governance
Budget/Finances
Decision-making
Policy
Planning
Enterprise Management
Communication and Coordination with partner institutions
Project management
Repository Administration
Hardware configuration and maintenance
Web and application server configuration and maintenance
Security
Permissions
Logging
Repository Administration
Data management (content storage, backup, integrity checks, deletion)
Hardware selection and replacement
Content and Metadata specifications
Disaster Recovery
Processes for ensuring content integrity
Rights Management
Copyright determination
Copyright review
Copyright information management (database)
Rightsholder permissions
Bibliographic Data Management
Entity description (record-level)
Object identification (item-level)
Data availability
Collection Development
Digital
• Expansion beyond books and journals (born-digital, images and maps, audio)
• Selection of content (for non-Google volume ingest and pilot projects)
Print
• Cloud Library (effect of digital on print)
Financial contributions of partners
HathiTrust Functional Framework
HathiTrust
Strategic Advisory Board
Budget/Finances, Decision-making
Guidance on Policy, Planning
• Driven by needs of institutions
• Leverage across the partnership
• Projects: Print on Demand, Grant Work, Ingest Specifications, PageTurner, Bibliographic Data Management
Executive Committee
Collective Work: Working Groups and Committees
Operational
• Communications
• User Support
• User Experience
Strategic
• Collections
• Discovery Interface
• Full-text Search
Distributed work
Constitutional Convention
• October 2011
• 52 partners
• 3-year review overseen by SAB
• Ballot Proposals
  – Print monograph storage
  – Approval Process for development initiatives
  – US Government Documents
  – Fee-for-service content deposit
  – Governance
HathiTrust
Executive Committee
Strategic Advisory Board
Budget/Finances, Decision-making
Guidance on Policy, Planning
• 12-member Board of Governors
• Chief Executive Officer
• Executive Committee
Governance
• Efficient, practical
• Inclusive, collective
Outline
• The Big Idea
  – Mission and Goals
• What we're doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work
How HathiTrust Can Change the Way We Work
Seeing collective problems as collective
Breakdown of HathiTrust book corpus by publication date
Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building – 2/2011
[Chart: four segments labeled 42%, 19%, 20%, 19%]
Copyright status of books published pre-1923 and US works published 1923-1963
[Chart: four segments labeled 42%, 19%, 20%, 19%, with an "In Print" label]
Relationships
• Identification
• Description
• Rights
• Relationships
  – Bibliographic records
  – Bib records and objects
  – Digital objects
  – Digital and print
Understanding the relationship between the collective and local
1st model: Price per GB

                 2008        2009        2010        2011        2012 (Oct)
Total Volumes    2,477,871   5,221,092   7,836,698   9,966,572   10,531,566
Public Domain    372,085     758,947     1,959,223   2,712,626   3,218,132

[Chart: Total Volumes and Public Domain by year, 2008 to 2012 (Oct); vertical axis 0 to 12,000,000]
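A quick calculation on the figures above shows how the public domain share of the corpus changed over time. The sketch below simply divides the slide's two rows; the variable and function names are assumptions for the example. By these figures the share rises from roughly 15% in 2008 to roughly 31% in October 2012.

```python
YEARS = ["2008", "2009", "2010", "2011", "2012 (Oct)"]
TOTAL_VOLUMES = [2477871, 5221092, 7836698, 9966572, 10531566]
PUBLIC_DOMAIN = [372085, 758947, 1959223, 2712626, 3218132]


def public_domain_share():
    """Return {year: public domain volumes / total volumes} for each column."""
    return {
        year: pd / total
        for year, total, pd in zip(YEARS, TOTAL_VOLUMES, PUBLIC_DOMAIN)
    }


if __name__ == "__main__":
    for year, share in public_domain_share().items():
        print("{:<11} {:.1%}".format(year, share))
```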
A global change in the library environment
[Chart: horizontal axis "Rank in 2008 ARL Investment Index" (0 to 120); vertical axis "% of Titles in Local Collection" (0 to 60)]
June 2009: Median duplication 19%
June 2010: Median duplication 31%
Academic print book collection already substantially duplicated in mass digitized book corpus
Courtesy of Constance Malpas, OCLC Research
Digitized Books in Shared Repositories
[Chart: monthly from Sep-09 to Jun-10; vertical axis "Unique Titles" (0 to 3,500,000); series: Mass digitized books in Hathi digital repository; Mass digitized books in shared print repositories]
~75% of mass digitized corpus is 'backed up' in one or more shared print repositories
~3.5M titles
~2.5M
Courtesy of Constance Malpas, OCLC Research
Collection Overlap
• More than 50% median overlap with ARL institutions; higher for small liberal arts colleges
• New Pricing model based on Print holdings
  – http://www.hathitrust.org/cost
  – Requires print holdings database
  – Also support expansion of legal uses, efforts in de-duplication
  – Facilitate individual and collaborative collection development and management operations
• Print monographs archiving
Sourcing and Scaling
http://orweblog.oclc.org/archives/002058.html
• Scale
  – Institution-scale
  – Group-scale
  – Web-scale
• Sourcing
  – Institutional
  – Collaborative
  – Third-party
A new kind of library
Thank you
How to find out more
• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
  – http://www.hathitrust.org/updates
  – RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
  – Large-scale Search
  – Perspectives from HathiTrust
What we are doing to get there
Cost-effective long-term preservation and access for digitized content
• Facilitate decision-making about digitization and print collection management
• Facilitate activities such as discovery, copyright review, use of materials
Repository and Content
Content Sources
Michigan 43%
California 32%
Wisconsin 5%
Cornell 4%
NYPL 2%
Princeton 2%
Indiana 2%
Harvard 2%
Columbia 1%
LoC 1%
Madrid 1%
Minnesota 1%
Illinois 1%
Language Distribution (1)
English 48%
German 9%
French 7%
Spanish 5%
Chinese 4%
Russian 4%
Japanese 3%
Italian 2%
Arabic 2%
Latin 1%
Remaining Languages 14%
The top 10 languages make up ~86% of all content
Language Distribution (2)
Undetermined 7%, Polish 7%, Portuguese 7%
Dutch 5%, Hebrew 5%, Hindi 5%
Indonesian 4%, Korean 4%
Swedish 3%, Urdu 3%, Turkish 3%, Thai 3%, Danish 3%, Czech 3%, Unknown 3%
Croatian 2%, Persian 2%, Tamil 2%, Bengali 2%, Music 2%, Hungarian 2%, Norwegian 2%, Sanskrit 2%
Vietnamese 1%, Ukrainian 1%, Greek 1%, Bulgarian 1%, Serbian 1%, Armenian 1%, Romanian 1%, Marathi 1%, Ancient-Greek 1%, Panjabi 1%, Telugu 1%, Malay 1%, Catalan 1%, Malayalam 1%, Multiple 1%, Finnish 1%, Slovak 1%
The next 40 languages make up ~13% of total
Dates
Copyright Distribution
In-copyright or undetermined 69%
Public Domain (worldwide) 15%
US Federal Government Documents (worldwide) 4%
Public Domain (US) 11%
Open Access 1%
Creative Commons 04
[Chart: quarterly time axis from October 2008 to January 2013; vertical axis 0 to 100; series by source institution: Boston College, Florida, Yale, Utah State, UNC-Chapel Hill, Purdue, Penn State, Northwestern, NCSU, Illinois, Duke, Chicago, Virginia, Minnesota, Madrid, LoC, Harvard, Columbia, Indiana, Princeton, NYPL, Cornell, Wisconsin, California, Michigan]
Source
Bibliographic Data
Content Package
Michigan
Indiana
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
Datasets
We engage in preservation for purposes of access
Making Content Available
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Cost-effective long-term preservation and access for digitized content
bull Facilitate decision-making about digitization and print collection management
bull Facilitate activities such as discovery copyright review use of materials
Repository and Content
Michigan 43
California 32
Wisconsin 5
Cornell 4NYPL 2
Princeton 2Indiana 2
Columbia 1
Harvard 2LoC 1 Madrid 1 Minnesota 1
Illinois 1
Content Sources
English48
German9
French7
Spanish5
Chinese4
Russian4
Japanese3
Italian2
Arabic2
Latin1
Remaining Languages
14
Language Distribution (1)
The top 10 languages make up ~86 of all content
Undetermined7
Polish7
Portuguese7
Dutch5
Hebrew5
Hindi5
Indonesian4
Korean4Swedish
3
Urdu3
Turkish3
Thai3Danish
3
Czech3
Unknown3
Croatian2
Persian2
Tamil2
Bengali2
Music2
Hungarian2
Norwegian2
Sanskrit2
Vietnamese1
Ukrainian1
Greek1
Bulgarian1Serbian
1Armenian
1Romanian
1Marathi
1
Ancient-Greek1 Panjabi
1
Telugu1Malay
1
Catalan1
Malayalam1
Multiple1
Finnish1
Slovak1
Language Distribution (2)
The next 40 languages make up ~13 of total
Dates
Copyright Distribution
In-copyright or unde-termined
69
Public Domain (worldwide)
15
US Federal Government Documents (worldwide)
4
Public Domain(US)11
Open Access1
Creative Commons 04
1010
8
110
9
410
9
710
9
1010
9
111
0
411
0
711
0
1011
0
111
1
411
1
711
1
1011
1
111
2
411
2
711
2
1011
2
111
30
10
20
30
40
50
60
70
80
90
100
Boston CollegeFloridaYaleUtah StateUNC-Chapel HillPurduePenn StateNorthwesternNCSUIllinoisDukeChicagoVirginiaMinnesotaMadridLoCHarvardColumbiaIndianaPrincetonNYPLCornellWisconsinCaliforniaMichigan
Source
Bibliographic Data
Content Package
MichiganIndiana
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
Datasets
Source
Bibliographic Data
Content Package
MichiganIndiana
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
Datasets
Source
Bibliographic Data
Content Package
MichiganIndiana
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
DatasetsTDR
Source
Bibliographic Data
Content Package
MichiganIndiana
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
Datasets
Source
Bibliographic Data
Content Package
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
DatasetsMichigan
Indiana
Source
Bibliographic Data
Content Package
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
DatasetsMichigan
Indiana
Source
Bibliographic Data
Content Package
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
DatasetsMichigan
Indiana
We engage in preservationfor purposes of access
Source
Bibliographic Data
Content Package
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
DatasetsMichigan
Indiana
Making Content Available
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Descriptive headings added (hidden from GUI with CSS)
Info about SSD service amp link to accessibility page
Images used for style are in css so no need to use alt tags
Skip navigation link
Access keys for navigating pages with keyboard
Added labels amp descriptive titles to forms amp ToC table
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
APIs
bull Data APIndash Volume and rights informationndash Page imagesndash OCR
bull Bibliographic APIndash Volume and rights informationndash MARC records
bull OAIbull ldquoHathifilesrdquo
Datasets
bull Google-digitized ~28 million texts Requires proposal to HathiTrust Agreement with Google Statement on usemanagement
bull Non-Google-digitized ~370000 texts Freely available Statement on management
Research Center
bull Environment to perform research on HathiTrust corpus
bull httpwwwhathitrustorghtrc
bull httplibumichedumpachbull Package of tools to enable publication of open
access born-digital journal content directly into HathiTrustndash Including accompanying data and media files
bull Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)
Source Archive
Editorial Market
Higher Education
Access Determinations
bull Automatedbull Manual
Automatic Rights Determination
bull Conducted on all works at time of ingest and when records are modifiedndash Public domain worldwide
bull US works published before 1923 US federal government publications non-US works published prior to 1873
ndash Public domain in the United Statesbull Non-US works published prior to 1923
Manual Rights Determination
bull IMLS-funded CRMS projectndash CRMS-US
bull 2008 US-published works 1923-1963bull Staff at 4 partner institutions
ndash CRMS-Worldbull 2011 Expanded to non-US worksbull Staff at 16 partner institutions
ndash Double review with additional expert review for conflictsndash Compliance with copyright formalitiesndash As of January 2013 241541 reviewed more than 132644
openedbull Rights Holder Permissions
bull System of Precedence
Rights Database
Bibliographic (automatic)
Manual
Lawful uses
bull Users who have print disabilitiesndash All in-copyright works in HathiTrust currently
owned (or owned previously) by the partner institution
ndash Must be authenticatedndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgaccessibility
Lawful uses (2)
bull Out of print and brittle missingndash Works must be currently owned (or owned
previously) by the partner institutionndash Must be authenticated or accessing work from
library premisesndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgout-of-print-brittle
bull Access and use statementsndash httpwwwhathitrustorgaccess_use
Outline
bull The Big Idea ndash Mission and Goals
bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure
bull How HathiTrust can change the way we work
e-Commerce
Print on Demand
Content Ingest
Transformation
Validation
Content Access
PageTurner
Collection Builder
Large-scale Search
Bibliographic Catalog
Research Center
APIs
Quality Assurance
Quality Review
Content Certification
User Services
Usability
User support (helpdesk)
Outreach
Project website
Monthly newsletter
Papers and presentations
Communication with potential
partners
Surveys general inquiries
Repository evaluation and
audit (eg DRAMBORA
TRAC)
Legal
Risk management (use of materials)
Partner agreements
Advocacy
Governance
Budget Finances
Decision-making
Policy
Planning
Enterprise Management
Communication and Coordination
with partner institutions
Project management
Repository Administration
Hardware configuration and
maintenance
Web and application server configuration and
maintenance
Security
Permissions
Logging
Repository Administration
Data management (content storage backup integrity checks deletion)
Hardware selection and replacement
Content and Metadata
specifications
Disaster Recovery
Processes for ensuring content
integrity
Rights Management
Copyright determination
Copyright review
Copyright information
management (database)
Rightsholder permissions
Bibliographic Data
Management
Entity description (record-level)
Object identification (item-
level)
Data availability
Collection Development
Digitalbull Expansion beyond
books and journals (born-digital images and maps audio)
bull Selection of content (for non-Google volume ingest and pilots projects)
Printbull Cloud Library (effect
of digital on print)
Financial contributions of partners
HathiTrust Functional Framework
HathiTrust
Strategic Advisory BoardBudgetFinances Decision-making
Guidance on Policy Planning
bull Driven by needs of institutionsbull Leverage across the partnershipbull Projects Print on Demand Grant Work Ingest Specifications
PageTurner Bibliographic Data Management
Executive Committee
Collective Work Working Groups and Committees
Operationalbull Communicationsbull User Supportbull User Experience
Operationalbull Communicationsbull User Supportbull User Experience
Strategicbull Collectionsbull Discovery Interfacebull Full-text Search
Distributed work
Constitutional Convention
bull October 2011bull 52 partnersbull 3-year review overseen by SABbull Ballot Proposals
ndash Print monograph storagendash Approval Process for development initiativesndash US Government Documentsndash Fee-for-service content depositndash Governance
HathiTrust
Executive Committee
Strategic Advisory
Board
BudgetFinancesDecision-making
Guidance on Policy Planning
bull 12-member Board of Governors
bull Chief Executive Officer bull Executive Committee
Governance
bull Efficient practicalbull Inclusive collective
Outline
bull The Big Idea ndash Mission and Goals
bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure
bull How HathiTrust can change the way we work
How HathiTrust Can Change the Way We Work
Seeing collective problems as collective
[Figure: Breakdown of HathiTrust book corpus by publication date; segments of 42%, 19%, 20%, and 19%. Source: Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building, 2/2011]
[Figure: Copyright status of books published pre-1923 and US works published 1923-1963; same segments, with the 42% segment labeled "In Print"]
• Identification
• Description
• Rights
• Relationships
  – Bibliographic records
  – Bib records and objects
  – Digital objects
  – Digital and print
Understanding the relationship between the collective and local
1st model: Price per GB

                 2008        2009        2010        2011        2012 (Oct)
Total Volumes    2,477,871   5,221,092   7,836,698   9,966,572   10,531,566
Public Domain      372,085     758,947   1,959,223   2,712,626    3,218,132

[Chart: Total Volumes and Public Domain by year, 2008 to 2012 (Oct); y-axis from 0 to 12,000,000]
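The growth figures above are easier to compare as shares. A minimal sketch, using only the numbers from the table, that works out what percentage of each year's total volumes were public domain; the function name is illustrative and not part of any HathiTrust tool:

```python
# Figures copied from the table above (2012 is as of October).
YEARS = ["2008", "2009", "2010", "2011", "2012 (Oct)"]
TOTAL_VOLUMES = [2477871, 5221092, 7836698, 9966572, 10531566]
PUBLIC_DOMAIN = [372085, 758947, 1959223, 2712626, 3218132]


def public_domain_share(total, public_domain):
    """Return the public-domain share of each year's total, in percent."""
    return [round(100 * pd / t, 1) for t, pd in zip(total, public_domain)]


shares = public_domain_share(TOTAL_VOLUMES, PUBLIC_DOMAIN)
for year, share in zip(YEARS, shares):
    print(f"{year}: {share}% public domain")
```

On these figures the public-domain share roughly doubles, from about 15% in 2008 to about 31% in October 2012.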
A global change in the library environment
[Figure: % of Titles in Local Collection (y-axis, 0 to 60) plotted against Rank in 2008 ARL Investment Index (x-axis, 0 to 120)]
June 2010: Median duplication 31%
June 2009: Median duplication 19%
Academic print book collection already substantially duplicated in mass digitized book corpus
Courtesy of Constance Malpas, OCLC Research
Digitized Books in Shared Repositories
[Figure: Unique Titles (y-axis, 0 to 3,500,000) by month, Sep-09 to Jun-10; series: Mass digitized books in Hathi digital repository, Mass digitized books in shared print repositories; chart annotations: ~3.5M titles, ~2.5M]
~75% of mass digitized corpus is 'backed up' in one or more shared print repositories
Courtesy of Constance Malpas, OCLC Research
Collection Overlap
• More than 50% median overlap with ARL institutions; higher for small liberal arts colleges
• New Pricing model based on Print holdings
  – http://www.hathitrust.org/cost
  – Requires print holdings database
  – Also support expansion of legal uses, efforts in de-duplication
  – Facilitate individual and collaborative collection development and management operations
• Print monographs archiving
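The median overlap quoted above can be illustrated with a small calculation. This is a sketch under stated assumptions: titles are matched by a shared identifier (for example an OCLC number), each library's holdings and the digitized corpus are plain sets, and all names and data below are hypothetical rather than taken from the print holdings database.

```python
from statistics import median


def duplication_rate(local_titles, digitized_titles):
    """Share of a library's print titles that also exist in the digitized corpus."""
    if not local_titles:
        return 0.0
    return len(local_titles & digitized_titles) / len(local_titles)


def median_duplication(collections, digitized_titles):
    """Median duplication rate across a group of libraries."""
    return median(
        duplication_rate(titles, digitized_titles)
        for titles in collections.values()
    )


# Toy data: identifiers are placeholders, not real holdings.
digitized = {"t1", "t2", "t3", "t4"}
collections = {
    "Library A": {"t1", "t2", "t5", "t6"},   # 2 of 4 duplicated
    "Library B": {"t1", "t7", "t8", "t9"},   # 1 of 4 duplicated
    "Library C": {"t2", "t3", "t4", "t10"},  # 3 of 4 duplicated
}

print(median_duplication(collections, digitized))
```

The median is used rather than the mean so that one unusually large or unusual collection does not dominate the figure.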
Sourcing and Scaling
http://orweblog.oclc.org/archives/002058.html
• Scale
  – Institution-scale
  – Group-scale
  – Web-scale
• Sourcing
  – Institutional
  – Collaborative
  – Third-party
A new kind of library
Thank you
How to find out more
• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
  – http://www.hathitrust.org/updates
  – RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
  – Large-scale Search
  – Perspectives from HathiTrust
• Facilitate decision-making about digitization and print collection management
• Facilitate activities such as discovery, copyright review, use of materials
Repository and Content
Content Sources
Michigan 43%, California 32%, Wisconsin 5%, Cornell 4%, NYPL 2%, Princeton 2%, Indiana 2%, Harvard 2%, Columbia 1%, LoC 1%, Madrid 1%, Minnesota 1%, Illinois 1%
Language Distribution (1)
English 48%, German 9%, French 7%, Spanish 5%, Chinese 4%, Russian 4%, Japanese 3%, Italian 2%, Arabic 2%, Latin 1%, Remaining Languages 14%
The top 10 languages make up ~86% of all content
Language Distribution (2)
Undetermined 7%, Polish 7%, Portuguese 7%, Dutch 5%, Hebrew 5%, Hindi 5%, Indonesian 4%, Korean 4%, Swedish 3%, Urdu 3%, Turkish 3%, Thai 3%, Danish 3%, Czech 3%, Unknown 3%, Croatian 2%, Persian 2%, Tamil 2%, Bengali 2%, Music 2%, Hungarian 2%, Norwegian 2%, Sanskrit 2%, Vietnamese 1%, Ukrainian 1%, Greek 1%, Bulgarian 1%, Serbian 1%, Armenian 1%, Romanian 1%, Marathi 1%, Ancient-Greek 1%, Panjabi 1%, Telugu 1%, Malay 1%, Catalan 1%, Malayalam 1%, Multiple 1%, Finnish 1%, Slovak 1%
The next 40 languages make up ~13% of total
Dates
Copyright Distribution
In-copyright or undetermined 69%
Public Domain (worldwide) 15%
Public Domain (US) 11%
US Federal Government Documents (worldwide) 4%
Open Access 1
Creative Commons 04
[Figure: chart with quarterly x-axis ticks running from late 2008 to early 2013 and a y-axis from 0 to 100; series: Boston College, Florida, Yale, Utah State, UNC-Chapel Hill, Purdue, Penn State, Northwestern, NCSU, Illinois, Duke, Chicago, Virginia, Minnesota, Madrid, LoC, Harvard, Columbia, Indiana, Princeton, NYPL, Cornell, Wisconsin, California, Michigan]
[Diagram, repeated across several build slides: repository architecture. Labels: Source, Bibliographic Data, Content Package, Michigan, Indiana, Bib Data, Data Management, Rights Data, Storage, Ingest, Holdings Data, TDR, and Access (Catalog, Full-text Search, PageTurner, APIs, Collections, Datasets)]
We engage in preservation for purposes of access
Making Content Available
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Descriptive headings added (hidden from GUI with CSS)
Info about SSD service & link to accessibility page
Images used for style are in CSS, so no need to use alt tags
Skip navigation link
Access keys for navigating pages with keyboard
Added labels & descriptive titles to forms & ToC table
APIs
• Data API
  – Volume and rights information
  – Page images
  – OCR
• Bibliographic API
  – Volume and rights information
  – MARC records
• OAI
• "Hathifiles"
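To make the Bibliographic API bullet above concrete, here is a minimal sketch of building a lookup URL and reading volume and rights information out of a response. The base URL, the URL pattern, and the field names (`items`, `htid`, `rightsCode`) are assumptions based on the publicly documented API, not details given in these slides, and the sample payload is invented so the example runs without network access.

```python
import json

# Assumed base URL for the Bibliographic API; verify before relying on it.
BIB_API_BASE = "https://catalog.hathitrust.org/api/volumes"


def bib_api_url(id_type, id_value, detail="brief"):
    """Build a lookup URL such as .../brief/oclc/424023.json."""
    return f"{BIB_API_BASE}/{detail}/{id_type}/{id_value}.json"


# Made-up payload shaped like the response this sketch expects.
SAMPLE_RESPONSE = json.loads("""
{
  "records": {"000000001": {"titles": ["An example title"]}},
  "items": [
    {"htid": "example.0001", "fromRecord": "000000001", "rightsCode": "pd"},
    {"htid": "example.0002", "fromRecord": "000000001", "rightsCode": "ic"}
  ]
}
""")


def volume_rights(response):
    """Return (volume id, rights code) pairs for every item in a response."""
    return [(item["htid"], item["rightsCode"]) for item in response.get("items", [])]


print(bib_api_url("oclc", "424023"))
print(volume_rights(SAMPLE_RESPONSE))
```

A local catalog could use a lookup of this kind to decide whether to show a full-view link next to a print record.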
Datasets
• Google-digitized: ~2.8 million texts
  – Requires proposal to HathiTrust
  – Agreement with Google
  – Statement on use/management
• Non-Google-digitized: ~370,000 texts
  – Freely available
  – Statement on management
Research Center
• Environment to perform research on HathiTrust corpus
• http://www.hathitrust.org/htrc
• http://lib.umich.edu/mpach
• Package of tools to enable publication of open access born-digital journal content directly into HathiTrust
  – Including accompanying data and media files
• Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)
Source Archive
Editorial Market
Higher Education
Access Determinations
• Automated
• Manual
Automatic Rights Determination
• Conducted on all works at time of ingest and when records are modified
  – Public domain worldwide
    • US works published before 1923, US federal government publications, non-US works published prior to 1873
  – Public domain in the United States
    • Non-US works published prior to 1923
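The automatic rules above can be sketched as a small decision function. This is an illustration only: the `Work` record and the returned labels are hypothetical, and the real determination runs against bibliographic records whose fields are not described in these slides.

```python
from dataclasses import dataclass


@dataclass
class Work:
    """Hypothetical summary of the bibliographic facts the rules depend on."""
    pub_year: int
    us_published: bool
    us_federal_document: bool = False


def automatic_rights(work):
    """Apply the date and place rules from the slide, most permissive first."""
    if work.us_federal_document:
        return "public domain worldwide"
    if work.us_published and work.pub_year < 1923:
        return "public domain worldwide"
    if not work.us_published and work.pub_year < 1873:
        return "public domain worldwide"
    if not work.us_published and work.pub_year < 1923:
        return "public domain in the United States"
    # Everything else stays closed unless a manual review says otherwise.
    return "in-copyright or undetermined"


print(automatic_rights(Work(pub_year=1900, us_published=False)))
```

Because the function depends only on record fields, it can be re-run whenever a record is modified, which matches the first bullet above.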
Manual Rights Determination
• IMLS-funded CRMS project
  – CRMS-US
    • 2008: US-published works, 1923-1963
    • Staff at 4 partner institutions
  – CRMS-World
    • 2011: Expanded to non-US works
    • Staff at 16 partner institutions
  – Double review with additional expert review for conflicts
  – Compliance with copyright formalities
  – As of January 2013: 241,541 reviewed, more than 132,644 opened
• Rights Holder Permissions
• System of Precedence
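The double-review step described above can be sketched as follows. This is a simplified illustration, not the CRMS implementation: the function name, the string labels, and the idea that a conflict waits in a queue until an expert rules on it are all assumptions.

```python
def resolve_reviews(first, second, expert=None):
    """Combine two independent reviews, deferring to an expert on conflict."""
    if first == second:
        # Both reviewers agree, so their shared answer stands.
        return first
    if expert is None:
        # Conflicting reviews stay unresolved until an expert weighs in.
        return "needs expert review"
    return expert


print(resolve_reviews("public domain", "public domain"))
print(resolve_reviews("public domain", "in copyright"))
print(resolve_reviews("public domain", "in copyright", expert="in copyright"))
```

Requiring agreement between two reviewers means a single mistaken reading of a copyright record cannot open a volume on its own.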
Rights Database
Bibliographic (automatic)
Manual
Lawful uses
• Users who have print disabilities
  – All in-copyright works in HathiTrust currently owned (or owned previously) by the partner institution
  – Must be authenticated
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/accessibility
Lawful uses (2)
• Out of print and brittle/missing
  – Works must be currently owned (or owned previously) by the partner institution
  – Must be authenticated or accessing work from library premises
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/out-of-print-brittle
• Access and use statements
  – http://www.hathitrust.org/access_use
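The conditions listed for these lawful uses can be read as a single access check. The sketch below is a hedged illustration: the `AccessRequest` fields, the `allow_premises` switch (for the out-of-print and brittle case, where access from library premises can stand in for authentication), and the session counting are assumptions, not a description of how HathiTrust enforces the rules.

```python
from dataclasses import dataclass


@dataclass
class AccessRequest:
    """Hypothetical facts known about a user at request time."""
    authenticated: bool
    on_us_soil: bool
    on_library_premises: bool = False


def may_open(request, copies_owned, active_sessions, allow_premises=False):
    """Return True when every condition from the slide is met."""
    identified = request.authenticated or (
        allow_premises and request.on_library_premises
    )
    # One simultaneous access per copy the partner owns or previously owned.
    copy_available = active_sessions < copies_owned
    return identified and request.on_us_soil and copy_available


user = AccessRequest(authenticated=True, on_us_soil=True)
print(may_open(user, copies_owned=2, active_sessions=1))
```

A partner that holds no copy has `copies_owned` equal to zero, so the same check also covers the ownership requirement.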
e-Commerce
Print on Demand
Content Ingest
Transformation
Validation
Content Access
PageTurner
Collection Builder
Large-scale Search
Bibliographic Catalog
Research Center
APIs
Quality Assurance
Quality Review
Content Certification
User Services
Usability
User support (helpdesk)
Outreach
Project website
Monthly newsletter
Papers and presentations
Communication with potential partners
Surveys, general inquiries
Repository evaluation and audit (e.g. DRAMBORA, TRAC)
Legal
Risk management (use of materials)
Partner agreements
Advocacy
Governance
Budget/Finances
Decision-making
Policy
Planning
Enterprise Management
Communication and Coordination with partner institutions
Project management
Repository Administration
Hardware configuration and maintenance
Web and application server configuration and maintenance
Security
Permissions
Logging
Repository Administration
Data management (content storage, backup, integrity checks, deletion)
Hardware selection and replacement
Content and Metadata specifications
Disaster Recovery
Processes for ensuring content integrity
Rights Management
Copyright determination
Copyright review
Copyright information management (database)
Rightsholder permissions
Bibliographic Data Management
Entity description (record-level)
Object identification (item-level)
Data availability
Collection Development
Digital
• Expansion beyond books and journals (born-digital, images and maps, audio)
Michigan 43
California 32
Wisconsin 5
Cornell 4NYPL 2
Princeton 2Indiana 2
Columbia 1
Harvard 2LoC 1 Madrid 1 Minnesota 1
Illinois 1
Content Sources
English48
German9
French7
Spanish5
Chinese4
Russian4
Japanese3
Italian2
Arabic2
Latin1
Remaining Languages
14
Language Distribution (1)
The top 10 languages make up ~86 of all content
Undetermined7
Polish7
Portuguese7
Dutch5
Hebrew5
Hindi5
Indonesian4
Korean4Swedish
3
Urdu3
Turkish3
Thai3Danish
3
Czech3
Unknown3
Croatian2
Persian2
Tamil2
Bengali2
Music2
Hungarian2
Norwegian2
Sanskrit2
Vietnamese1
Ukrainian1
Greek1
Bulgarian1Serbian
1Armenian
1Romanian
1Marathi
1
Ancient-Greek1 Panjabi
1
Telugu1Malay
1
Catalan1
Malayalam1
Multiple1
Finnish1
Slovak1
Language Distribution (2)
The next 40 languages make up ~13 of total
Dates
Copyright Distribution
In-copyright or unde-termined
69
Public Domain (worldwide)
15
US Federal Government Documents (worldwide)
4
Public Domain(US)11
Open Access1
Creative Commons 04
1010
8
110
9
410
9
710
9
1010
9
111
0
411
0
711
0
1011
0
111
1
411
1
711
1
1011
1
111
2
411
2
711
2
1011
2
111
30
10
20
30
40
50
60
70
80
90
100
Boston CollegeFloridaYaleUtah StateUNC-Chapel HillPurduePenn StateNorthwesternNCSUIllinoisDukeChicagoVirginiaMinnesotaMadridLoCHarvardColumbiaIndianaPrincetonNYPLCornellWisconsinCaliforniaMichigan
Source
Bibliographic Data
Content Package
MichiganIndiana
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
Datasets
Source
Bibliographic Data
Content Package
MichiganIndiana
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
Datasets
Source
Bibliographic Data
Content Package
MichiganIndiana
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
TDR
We engage in preservation for purposes of access
Making Content Available
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Descriptive headings added (hidden from GUI with CSS)
Info about SSD service & link to accessibility page
Images used for style are in CSS, so no need to use alt tags
Skip navigation link
Access keys for navigating pages with keyboard
Added labels & descriptive titles to forms & ToC table
APIs
• Data API
  – Volume and rights information
  – Page images
  – OCR
• Bibliographic API
  – Volume and rights information
  – MARC records
• OAI
• "Hathifiles"
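As a rough illustration of how the Bibliographic API might be used for volume and rights information, the sketch below builds a request URL and reads volume identifiers and rights strings out of a response body. The base URL, the path layout, and the JSON field names (`items`, `htid`, `usRightsString`) are assumptions that should be checked against the current API documentation, and the sample response is invented for the example; no network request is made.

```python
import json

# Assumed base URL and path layout for the Bibliographic API; verify before use.
BIB_API_BASE = "https://catalog.hathitrust.org/api/volumes"


def bib_api_url(id_type: str, value: str, detail: str = "brief") -> str:
    """Build a single-identifier request URL, e.g. id_type='oclc'."""
    if detail not in ("brief", "full"):
        raise ValueError("detail must be 'brief' or 'full'")
    return f"{BIB_API_BASE}/{detail}/{id_type}/{value}.json"


def summarize_items(response_text: str) -> list:
    """Return (volume id, US rights string) pairs from a response body."""
    data = json.loads(response_text)
    return [(item["htid"], item["usRightsString"]) for item in data.get("items", [])]


# Hypothetical response body, shaped like the assumed format above.
SAMPLE = json.dumps({
    "records": {},
    "items": [
        {"htid": "mdp.0000000001", "usRightsString": "Full view"},
        {"htid": "mdp.0000000002", "usRightsString": "Limited (search-only)"},
    ],
})

print(bib_api_url("oclc", "424023"))
print(summarize_items(SAMPLE))
```

Keeping URL construction separate from response parsing means the parsing step can be exercised against saved responses without contacting the service.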
Datasets
• Google-digitized: ~2.8 million texts
  – Requires proposal to HathiTrust
  – Agreement with Google
  – Statement on use/management
• Non-Google-digitized: ~370,000 texts
  – Freely available
  – Statement on management
Research Center
• Environment to perform research on HathiTrust corpus
• http://www.hathitrust.org/htrc
• http://lib.umich.edu/mpach
• Package of tools to enable publication of open access born-digital journal content directly into HathiTrust
  – Including accompanying data and media files
• Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)
Source Archive
Editorial Market
Higher Education
Access Determinations
• Automated
• Manual
Automatic Rights Determination
• Conducted on all works at time of ingest and when records are modified
  – Public domain worldwide
    • US works published before 1923; US federal government publications; non-US works published prior to 1873
  – Public domain in the United States
    • Non-US works published prior to 1923
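The date rules on this slide can be sketched as a small function. This is only an illustration of the thresholds listed above, not HathiTrust's actual implementation; the function name, the parameters, and the `pd-worldwide` / `pd-us` labels are hypothetical, and the sketch assumes the publication year and place have already been read from the bibliographic record.

```python
from typing import Optional


def automatic_rights(pub_year: int, us_work: bool, us_federal_doc: bool = False) -> Optional[str]:
    """Apply the date-based rules from the slide.

    Returns a hypothetical label, or None when no automatic
    public-domain determination applies.
    """
    if us_federal_doc:
        return "pd-worldwide"   # US federal government publications
    if us_work and pub_year < 1923:
        return "pd-worldwide"   # US works published before 1923
    if not us_work and pub_year < 1873:
        return "pd-worldwide"   # non-US works published prior to 1873
    if not us_work and pub_year < 1923:
        return "pd-us"          # non-US works published prior to 1923
    return None                 # left for manual review or treated as in copyright


print(automatic_rights(1900, us_work=True))
print(automatic_rights(1900, us_work=False))
print(automatic_rights(1950, us_work=True))
```

The order of the checks matters for non-US works: the 1873 test has to come before the 1923 test, otherwise older works would only receive the narrower US-only label.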
Manual Rights Determination
bull IMLS-funded CRMS projectndash CRMS-US
bull 2008 US-published works 1923-1963bull Staff at 4 partner institutions
ndash CRMS-Worldbull 2011 Expanded to non-US worksbull Staff at 16 partner institutions
ndash Double review with additional expert review for conflictsndash Compliance with copyright formalitiesndash As of January 2013 241541 reviewed more than 132644
openedbull Rights Holder Permissions
bull System of Precedence
Rights Database
Bibliographic (automatic)
Manual
Lawful uses
• Users who have print disabilities
  – All in-copyright works in HathiTrust currently owned (or owned previously) by the partner institution
  – Must be authenticated
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/accessibility
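The access conditions listed above can be read as a simple conjunction of checks. The sketch below is a minimal illustration under that reading; the `AccessRequest` fields and the `may_open` function are hypothetical names, and it assumes the user's print-disability status and the institution's holdings count have been established elsewhere.

```python
from dataclasses import dataclass


@dataclass
class AccessRequest:
    authenticated: bool    # user has logged in through the partner institution
    in_us: bool            # request originates on US soil
    copies_owned: int      # print copies held now or previously by the institution
    active_sessions: int   # accesses to this work that are already open


def may_open(req: AccessRequest) -> bool:
    """Allow access only if every condition from the slide holds."""
    return (
        req.authenticated
        and req.in_us
        and req.copies_owned > 0
        and req.active_sessions < req.copies_owned  # one simultaneous access per copy owned
    )


print(may_open(AccessRequest(True, True, copies_owned=2, active_sessions=1)))
print(may_open(AccessRequest(True, True, copies_owned=1, active_sessions=1)))
```

Modeling "one simultaneous access per copy owned" as a comparison between open sessions and copies held keeps the limit tied to the institution's print holdings rather than to a fixed number.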
Lawful uses (2)
• Out of print and brittle/missing
  – Works must be currently owned (or owned previously) by the partner institution
  – Must be authenticated or accessing work from library premises
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/out-of-print-brittle
• Access and use statements
  – http://www.hathitrust.org/access_use
Outline
• The Big Idea
  – Mission and Goals
• What we're doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work
HathiTrust Functional Framework
e-Commerce
Print on Demand
Content Ingest
Transformation
Validation
Content Access
PageTurner
Collection Builder
Large-scale Search
Bibliographic Catalog
Research Center
APIs
Quality Assurance
Quality Review
Content Certification
User Services
Usability
User support (helpdesk)
Outreach
Project website
Monthly newsletter
Papers and presentations
Communication with potential partners
Surveys, general inquiries
Repository evaluation and audit (e.g. DRAMBORA, TRAC)
Legal
Risk management (use of materials)
Partner agreements
Advocacy
Governance
Budget/Finances
Decision-making
Policy
Planning
Enterprise Management
Communication and Coordination with partner institutions
Project management
Repository Administration
Hardware configuration and maintenance
Web and application server configuration and maintenance
Security
Permissions
Logging
Repository Administration
Data management (content storage, backup, integrity checks, deletion)
Hardware selection and replacement
Content and Metadata specifications
Disaster Recovery
Processes for ensuring content integrity
Rights Management
Copyright determination
Copyright review
Copyright information management (database)
Rightsholder permissions
Bibliographic Data Management
Entity description (record-level)
Object identification (item-level)
Data availability
Collection Development
Digital
• Expansion beyond books and journals (born-digital, images and maps, audio)
• Selection of content (for non-Google volume ingest and pilot projects)
Print
• Cloud Library (effect of digital on print)
Financial contributions of partners
HathiTrust
Strategic Advisory Board
Budget/Finances, Decision-making
Guidance on Policy, Planning
Executive Committee
Collective Work: Working Groups and Committees
• Driven by needs of institutions
• Leverage across the partnership
• Projects: Print on Demand, Grant Work, Ingest Specifications, PageTurner, Bibliographic Data Management
Distributed work
Operational
• Communications
• User Support
• User Experience
Strategic
• Collections
• Discovery Interface
• Full-text Search
Constitutional Convention
• October 2011
• 52 partners
• 3-year review overseen by SAB
• Ballot Proposals
  – Print monograph storage
  – Approval Process for development initiatives
  – US Government Documents
  – Fee-for-service content deposit
  – Governance
HathiTrust
Executive Committee
Strategic Advisory Board
Budget/Finances, Decision-making
Guidance on Policy, Planning
• 12-member Board of Governors
• Chief Executive Officer
• Executive Committee
Governance
• Efficient, practical
• Inclusive, collective
Outline
• The Big Idea
  – Mission and Goals
• What we're doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work
How HathiTrust Can Change the Way We Work
Seeing collective problems as collective
Breakdown of HathiTrust book corpus by publication date
Source: Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building – 2/2011
[Chart: four segments labeled 42, 19, 20, and 19]
Copyright status of books published pre-1923 and US works published 1923-1963
[Chart: four segments labeled 42 ("In Print"), 19, 20, and 19]
• Identification
• Description
• Rights
• Relationships
  – Bibliographic records
  – Bib records and objects
  – Digital objects
  – Digital and print
Understanding the relationship between the collective and local
1st model: Price per GB
                 2008        2009        2010        2011        2012 (Oct)
Total Volumes    2,477,871   5,221,092   7,836,698   9,966,572   10,531,566
Public Domain    372,085     758,947     1,959,223   2,712,626   3,218,132
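To make the trend in these figures easier to read, the public-domain share of the corpus can be computed for each year. The sketch below uses only the numbers from the table above; the variable and function names are illustrative.

```python
YEARS = ["2008", "2009", "2010", "2011", "2012 (Oct)"]
TOTAL_VOLUMES = [2477871, 5221092, 7836698, 9966572, 10531566]
PUBLIC_DOMAIN = [372085, 758947, 1959223, 2712626, 3218132]


def public_domain_share(total, public_domain):
    """Return the public-domain percentage of total volumes for each year."""
    return [100.0 * p / t for t, p in zip(total, public_domain)]


shares = public_domain_share(TOTAL_VOLUMES, PUBLIC_DOMAIN)
for year, share in zip(YEARS, shares):
    print(f"{year}: {share:.1f}% public domain")
```

On these figures the public-domain share is about 15% in 2008 and about 31% by October 2012, so the public-domain portion grew faster than the corpus as a whole.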
A global change in the library environment
[Chart: x-axis "Rank in 2008 ARL Investment Index" (0 to 120); y-axis "% of Titles in Local Collection" (0 to 60)]
June 2009: Median duplication 19%
June 2010: Median duplication 31%
Academic print book collection already substantially duplicated in mass digitized book corpus
Courtesy of Constance Malpas, OCLC Research
Digitized Books in Shared Repositories
[Chart: monthly from Sep-09 to Jun-10; y-axis "Unique Titles" (0 to 3,500,000); series: mass digitized books in Hathi digital repository, mass digitized books in shared print repositories]
~75% of mass digitized corpus is 'backed up' in one or more shared print repositories
~3.5M titles
~2.5M
Courtesy of Constance Malpas, OCLC Research
Collection Overlap
• More than 50% median overlap with ARL institutions; higher for small liberal arts colleges
• New Pricing model based on Print holdings
  – http://www.hathitrust.org/cost
  – Requires print holdings database
  – Also support expansion of legal uses, efforts in de-duplication
  – Facilitate individual and collaborative collection development and management operations
• Print monographs archiving
Sourcing and Scaling
http://orweblog.oclc.org/archives/002058.html
• Scale
  – Institution-scale
  – Group-scale
  – Web-scale
• Sourcing
  – Institutional
  – Collaborative
  – Third-party
A new kind of library
Thank you
How to find out more
• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
  – http://www.hathitrust.org/updates
  – RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
  – Large-scale Search
  – Perspectives from HathiTrust
Language Distribution (1)
The top 10 languages make up ~86% of all content
English 48%
German 9%
French 7%
Spanish 5%
Chinese 4%
Russian 4%
Japanese 3%
Italian 2%
Arabic 2%
Latin 1%
Remaining Languages 14%
Undetermined7
Polish7
Portuguese7
Dutch5
Hebrew5
Hindi5
Indonesian4
Korean4Swedish
3
Urdu3
Turkish3
Thai3Danish
3
Czech3
Unknown3
Croatian2
Persian2
Tamil2
Bengali2
Music2
Hungarian2
Norwegian2
Sanskrit2
Vietnamese1
Ukrainian1
Greek1
Bulgarian1Serbian
1Armenian
1Romanian
1Marathi
1
Ancient-Greek1 Panjabi
1
Telugu1Malay
1
Catalan1
Malayalam1
Multiple1
Finnish1
Slovak1
Language Distribution (2)
The next 40 languages make up ~13 of total
Dates
Copyright Distribution
In-copyright or unde-termined
69
Public Domain (worldwide)
15
US Federal Government Documents (worldwide)
4
Public Domain(US)11
Open Access1
Creative Commons 04
1010
8
110
9
410
9
710
9
1010
9
111
0
411
0
711
0
1011
0
111
1
411
1
711
1
1011
1
111
2
411
2
711
2
1011
2
111
30
10
20
30
40
50
60
70
80
90
100
Boston CollegeFloridaYaleUtah StateUNC-Chapel HillPurduePenn StateNorthwesternNCSUIllinoisDukeChicagoVirginiaMinnesotaMadridLoCHarvardColumbiaIndianaPrincetonNYPLCornellWisconsinCaliforniaMichigan
Source
Bibliographic Data
Content Package
MichiganIndiana
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
Datasets
Source
Bibliographic Data
Content Package
MichiganIndiana
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
Datasets
Source
Bibliographic Data
Content Package
MichiganIndiana
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
DatasetsTDR
Source
Bibliographic Data
Content Package
MichiganIndiana
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
Datasets
Source
Bibliographic Data
Content Package
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
DatasetsMichigan
Indiana
Source
Bibliographic Data
Content Package
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
DatasetsMichigan
Indiana
Source
Bibliographic Data
Content Package
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
DatasetsMichigan
Indiana
We engage in preservationfor purposes of access
Source
Bibliographic Data
Content Package
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
DatasetsMichigan
Indiana
Making Content Available
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Descriptive headings added (hidden from GUI with CSS)
Info about SSD service amp link to accessibility page
Images used for style are in css so no need to use alt tags
Skip navigation link
Access keys for navigating pages with keyboard
Added labels amp descriptive titles to forms amp ToC table
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
APIs
bull Data APIndash Volume and rights informationndash Page imagesndash OCR
bull Bibliographic APIndash Volume and rights informationndash MARC records
bull OAIbull ldquoHathifilesrdquo
Datasets
bull Google-digitized ~28 million texts Requires proposal to HathiTrust Agreement with Google Statement on usemanagement
bull Non-Google-digitized ~370000 texts Freely available Statement on management
Research Center
bull Environment to perform research on HathiTrust corpus
bull httpwwwhathitrustorghtrc
bull httplibumichedumpachbull Package of tools to enable publication of open
access born-digital journal content directly into HathiTrustndash Including accompanying data and media files
bull Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)
Source Archive
Editorial Market
Higher Education
Access Determinations
bull Automatedbull Manual
Automatic Rights Determination
bull Conducted on all works at time of ingest and when records are modifiedndash Public domain worldwide
bull US works published before 1923 US federal government publications non-US works published prior to 1873
ndash Public domain in the United Statesbull Non-US works published prior to 1923
Manual Rights Determination
bull IMLS-funded CRMS projectndash CRMS-US
bull 2008 US-published works 1923-1963bull Staff at 4 partner institutions
ndash CRMS-Worldbull 2011 Expanded to non-US worksbull Staff at 16 partner institutions
ndash Double review with additional expert review for conflictsndash Compliance with copyright formalitiesndash As of January 2013 241541 reviewed more than 132644
openedbull Rights Holder Permissions
bull System of Precedence
Rights Database
Bibliographic (automatic)
Manual
Lawful uses
bull Users who have print disabilitiesndash All in-copyright works in HathiTrust currently
owned (or owned previously) by the partner institution
ndash Must be authenticatedndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgaccessibility
Lawful uses (2)
bull Out of print and brittle missingndash Works must be currently owned (or owned
previously) by the partner institutionndash Must be authenticated or accessing work from
library premisesndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgout-of-print-brittle
bull Access and use statementsndash httpwwwhathitrustorgaccess_use
Outline
bull The Big Idea ndash Mission and Goals
bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure
bull How HathiTrust can change the way we work
HathiTrust Functional Framework
e-Commerce
Print on Demand
Content Ingest
Transformation
Validation
Content Access
PageTurner
Collection Builder
Large-scale Search
Bibliographic Catalog
Research Center
APIs
Quality Assurance
Quality Review
Content Certification
User Services
Usability
User support (helpdesk)
Outreach
Project website
Monthly newsletter
Papers and presentations
Communication with potential partners
Surveys, general inquiries
Repository evaluation and audit (e.g. DRAMBORA, TRAC)
Legal
Risk management (use of materials)
Partner agreements
Advocacy
Governance
Budget/Finances
Decision-making
Policy
Planning
Enterprise Management
Communication and coordination with partner institutions
Project management
Repository Administration
Hardware configuration and maintenance
Web and application server configuration and maintenance
Security
Permissions
Logging
Repository Administration
Data management (content storage, backup, integrity checks, deletion)
Hardware selection and replacement
Content and metadata specifications
Disaster recovery
Processes for ensuring content integrity
Rights Management
Copyright determination
Copyright review
Copyright information management (database)
Rightsholder permissions
Bibliographic Data Management
Entity description (record-level)
Object identification (item-level)
Data availability
Collection Development
Digital
• Expansion beyond books and journals (born-digital, images and maps, audio)
• Selection of content (for non-Google volume ingest and pilot projects)
Print
• Cloud Library (effect of digital on print)
Financial contributions of partners
HathiTrust
Strategic Advisory Board
Budget/Finances, Decision-making
Guidance on Policy, Planning
• Driven by needs of institutions
• Leverage across the partnership
• Projects: Print on Demand, Grant Work, Ingest Specifications, PageTurner, Bibliographic Data Management
Executive Committee
Collective Work: Working Groups and Committees
Operational
• Communications
• User Support
• User Experience
Strategic
• Collections
• Discovery Interface
• Full-text Search
Distributed work
Constitutional Convention
• October 2011
• 52 partners
• 3-year review overseen by SAB
• Ballot Proposals
  – Print monograph storage
  – Approval process for development initiatives
  – US Government Documents
  – Fee-for-service content deposit
  – Governance
HathiTrust
Executive Committee
Strategic Advisory Board
Budget/Finances, Decision-making
Guidance on Policy, Planning
• 12-member Board of Governors
• Chief Executive Officer
• Executive Committee
Governance
• Efficient, practical
• Inclusive, collective
How HathiTrust Can Change the Way We Work
Seeing collective problems as collective
Figure: Breakdown of HathiTrust book corpus by publication date (segment values: 42, 19, 20, 19). Source: Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building – 2/2011
Figure: Copyright status of books published pre-1923 and US works published 1923-1963 (segment labels: In Print 42, 19, 20, 19)
• Identification
• Description
• Rights
• Relationships
  – Bibliographic records
  – Bib records and objects
  – Digital objects
  – Digital and print
Understanding the relationship between the collective and local
1st model: Price per GB

                 2008        2009        2010        2011        2012 (Oct)
Total Volumes    2,477,871   5,221,092   7,836,698   9,966,572   10,531,566
Public Domain    372,085     758,947     1,959,223   2,712,626   3,218,132

Figure: Total Volumes and Public Domain by year, 2008 to 2012 (Oct); vertical axis 0 to 12,000,000.
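The growth figures above can be read as a public-domain share per year. The following is a minimal sketch of that arithmetic; the variable and function names are illustrative, and the numbers are the ones quoted on the slide.

```python
# Volume counts copied from the slide's table (totals by year).
YEARS = ["2008", "2009", "2010", "2011", "2012 (Oct)"]
TOTAL_VOLUMES = [2477871, 5221092, 7836698, 9966572, 10531566]
PUBLIC_DOMAIN = [372085, 758947, 1959223, 2712626, 3218132]


def public_domain_share(total: int, public_domain: int) -> float:
    """Return public-domain volumes as a percentage of all volumes."""
    return 100.0 * public_domain / total


# Map each year label to its public-domain share, rounded to one decimal.
SHARES = {
    year: round(public_domain_share(total, pd), 1)
    for year, total, pd in zip(YEARS, TOTAL_VOLUMES, PUBLIC_DOMAIN)
}

if __name__ == "__main__":
    for year, share in SHARES.items():
        print(f"{year}: {share}% public domain")
```

On these figures the public-domain share rises from about 15% of volumes in 2008 to about 31% in October 2012.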
A global change in the library environment
Figure: % of titles in local collection (vertical axis, 0 to 60) against rank in 2008 ARL Investment Index (horizontal axis, 0 to 120).
June 2010: median duplication 31%
June 2009: median duplication 19%
Academic print book collection already substantially duplicated in mass digitized book corpus
Courtesy of Constance Malpas, OCLC Research
Digitized Books in Shared Repositories
Figure: unique titles (vertical axis, 0 to 3,500,000) by month, Sep-09 to Jun-10. Series: mass digitized books in Hathi digital repository; mass digitized books in shared print repositories. Annotations: ~3.5M titles; ~2.5M.
~75% of mass digitized corpus is ‘backed up’ in one or more shared print repositories
Courtesy of Constance Malpas, OCLC Research
Collection Overlap
• More than 50% median overlap with ARL institutions; higher for small liberal arts colleges
• New pricing model based on print holdings
  – http://www.hathitrust.org/cost
  – Requires print holdings database
  – Also support expansion of legal uses, efforts in de-duplication
  – Facilitate individual and collaborative collection development and management operations
• Print monographs archiving
Sourcing and Scaling
http://orweblog.oclc.org/archives/002058.html
• Scale
  – Institution-scale
  – Group-scale
  – Web-scale
• Sourcing
  – Institutional
  – Collaborative
  – Third-party
A new kind of library
Thank you
How to find out more
• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
  – http://www.hathitrust.org/updates
  – RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
  – Large-scale Search
  – Perspectives from HathiTrust
Copyright Distribution
• In-copyright or undetermined: 69%
• Public Domain (worldwide): 15%
• US Federal Government Documents (worldwide): 4%
• Public Domain (US): 11%
• Open Access: 1%
• Creative Commons: 0.4%
Figure: chart by date (quarterly ticks from 10/1/08 to 1/1/13; vertical axis 0 to 100). Legend: Boston College, Florida, Yale, Utah State, UNC-Chapel Hill, Purdue, Penn State, Northwestern, NCSU, Illinois, Duke, Chicago, Virginia, Minnesota, Madrid, LoC, Harvard, Columbia, Indiana, Princeton, NYPL, Cornell, Wisconsin, California, Michigan.
Figure: diagram of repository components. Labels: Source; Bibliographic Data; Content Package; Michigan; Indiana; Bib Data; Data Management; Rights Data; Storage; Access; Ingest; Catalog; Full-text Search; PageTurner; APIs; Collections; Holdings Data; Datasets; TDR.
We engage in preservation for purposes of access
Making Content Available
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Descriptive headings added (hidden from GUI with CSS)
Info about SSD service & link to accessibility page
Images used for style are in CSS, so no need to use alt tags
Skip navigation link
Access keys for navigating pages with keyboard
Added labels & descriptive titles to forms & ToC table
APIs
• Data API
  – Volume and rights information
  – Page images
  – OCR
• Bibliographic API
  – Volume and rights information
  – MARC records
• OAI
• “Hathifiles”
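As an illustration of how a client might address the Bibliographic API, here is a minimal sketch that only builds a request URL. The base URL, the brief/full path segment, and the identifier types are assumptions not stated on the slide, and the helper name is hypothetical; check the current HathiTrust documentation before relying on any of them.

```python
from urllib.parse import quote

# ASSUMPTION: base URL and path layout are not given on the slide.
BIB_API_BASE = "https://catalog.hathitrust.org/api/volumes"

# ASSUMPTION: identifier types accepted by the service.
ID_TYPES = ("oclc", "lccn", "isbn", "issn")


def bib_api_url(id_type: str, identifier: str, detail: str = "brief") -> str:
    """Build a JSON lookup URL for one identifier (hypothetical helper)."""
    if detail not in ("brief", "full"):
        raise ValueError("detail must be 'brief' or 'full'")
    if id_type not in ID_TYPES:
        raise ValueError(f"unsupported identifier type: {id_type}")
    # Percent-encode the identifier so it is safe inside a URL path.
    return f"{BIB_API_BASE}/{detail}/{id_type}/{quote(identifier)}.json"


if __name__ == "__main__":
    print(bib_api_url("oclc", "424023"))
```

The sketch stops at URL construction on purpose: fetching and parsing the response depends on the response format, which the slide does not describe beyond "volume and rights information" and "MARC records".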
Datasets
• Google-digitized
  – ~2.8 million texts
  – Requires proposal to HathiTrust
  – Agreement with Google
  – Statement on use/management
• Non-Google-digitized
  – ~370,000 texts
  – Freely available
  – Statement on management
Research Center
bull Environment to perform research on HathiTrust corpus
• http://www.hathitrust.org/htrc
• http://lib.umich.edu/mpach
• Package of tools to enable publication of open access born-digital journal content directly into HathiTrust
  – Including accompanying data and media files
• Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)
Source Archive
Editorial Market
Higher Education
Access Determinations
• Automated
• Manual
Automatic Rights Determination
• Conducted on all works at time of ingest and when records are modified
  – Public domain worldwide
    • US works published before 1923, US federal government publications, non-US works published prior to 1873
  – Public domain in the United States
    • Non-US works published prior to 1923
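The date rules on this slide can be sketched as a small decision function. This is an illustration under stated assumptions, not HathiTrust's actual implementation: the function name, parameters, and the fallback category (borrowed from the "In-copyright or undetermined" label used elsewhere in the deck) are chosen here for the example.

```python
from enum import Enum


class Rights(Enum):
    PD_WORLDWIDE = "public domain worldwide"
    PD_US = "public domain in the United States"
    IC_OR_UNDETERMINED = "in-copyright or undetermined"


def automatic_rights(pub_year: int, us_work: bool,
                     us_federal_doc: bool = False) -> Rights:
    """Apply the slide's bibliographic rules (hypothetical helper)."""
    # US federal government publications: public domain worldwide.
    if us_federal_doc:
        return Rights.PD_WORLDWIDE
    if us_work:
        # US works published before 1923: public domain worldwide.
        if pub_year < 1923:
            return Rights.PD_WORLDWIDE
        return Rights.IC_OR_UNDETERMINED
    # Non-US works: before 1873 worldwide, before 1923 in the US only.
    if pub_year < 1873:
        return Rights.PD_WORLDWIDE
    if pub_year < 1923:
        return Rights.PD_US
    # ASSUMPTION: everything else is left for manual review.
    return Rights.IC_OR_UNDETERMINED


if __name__ == "__main__":
    print(automatic_rights(1900, us_work=False).value)
```

Works that fall through to the last category are the ones a manual review process would pick up.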
Manual Rights Determination
• IMLS-funded CRMS project
  – CRMS-US
    • 2008: US-published works, 1923-1963
    • Staff at 4 partner institutions
  – CRMS-World
    • 2011: Expanded to non-US works
    • Staff at 16 partner institutions
  – Double review, with additional expert review for conflicts
  – Compliance with copyright formalities
  – As of January 2013: 241,541 reviewed, more than 132,644 opened
• Rights Holder Permissions
• System of Precedence
Rights Database
Bibliographic (automatic)
Manual
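The manual process described above relies on double review, with an expert review when the two reviewers conflict. A minimal sketch of that resolution step follows; the function name and the use of plain strings for determinations are assumptions made for the example.

```python
from typing import Optional


def resolve_reviews(first: str, second: str,
                    expert: Optional[str] = None) -> Optional[str]:
    """Combine two independent reviews into one determination.

    Returns the agreed determination, the expert's determination when
    the two reviewers conflict, or None while a conflict is still
    waiting for expert review. (Hypothetical helper.)
    """
    if first == second:
        return first
    return expert


if __name__ == "__main__":
    print(resolve_reviews("pd", "pd"))        # reviewers agree
    print(resolve_reviews("pd", "ic"))        # conflict, no expert yet
    print(resolve_reviews("pd", "ic", "ic"))  # expert settles conflict
```

How a settled manual determination is ranked against the automatic bibliographic one is a matter for the system of precedence, which the slide names but does not spell out.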
Lawful uses
• Users who have print disabilities
  – All in-copyright works in HathiTrust currently owned (or owned previously) by the partner institution
  – Must be authenticated
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/accessibility
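The conditions listed for users with print disabilities combine into a single check. The sketch below assumes a simple request record; the field and function names are illustrative, not part of any HathiTrust system.

```python
from dataclasses import dataclass


@dataclass
class AccessRequest:
    authenticated: bool    # user has logged in through a partner institution
    on_us_soil: bool       # request originates in the United States
    copies_owned: int      # print copies the partner owns or owned previously
    active_sessions: int   # simultaneous uses of this work already open


def may_access(req: AccessRequest) -> bool:
    """Return True when every condition on the slide is met."""
    return (
        req.authenticated
        and req.on_us_soil
        and req.copies_owned > 0
        # One simultaneous access per copy owned.
        and req.active_sessions < req.copies_owned
    )


if __name__ == "__main__":
    print(may_access(AccessRequest(True, True, copies_owned=2, active_sessions=1)))
```

With two copies owned, a second simultaneous reader is allowed and a third is refused until a session closes.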
1010
8
110
9
410
9
710
9
1010
9
111
0
411
0
711
0
1011
0
111
1
411
1
711
1
1011
1
111
2
411
2
711
2
1011
2
111
30
10
20
30
40
50
60
70
80
90
100
Boston CollegeFloridaYaleUtah StateUNC-Chapel HillPurduePenn StateNorthwesternNCSUIllinoisDukeChicagoVirginiaMinnesotaMadridLoCHarvardColumbiaIndianaPrincetonNYPLCornellWisconsinCaliforniaMichigan
[Diagram, repeated across slide builds. Labels: Source; Bibliographic Data; Content Package; Michigan; Indiana; Bib Data; Data Management; Rights Data; Storage; Access; Ingest; Catalog; Full-text Search; PageTurner; APIs; Collections; Holdings Data; Datasets. One build adds the label TDR.]
We engage in preservation for purposes of access
Making Content Available
[Slide series; each slide carries the labels Access, Catalog, Full-text Search, PageTurner, APIs, Collections, Datasets]
Descriptive headings added (hidden from GUI with CSS)
Info about SSD service & link to accessibility page
Images used for style are in CSS, so no need to use alt tags
Skip navigation link
Access keys for navigating pages with keyboard
Added labels & descriptive titles to forms & ToC table
APIs
• Data API
  – Volume and rights information
  – Page images
  – OCR
• Bibliographic API
  – Volume and rights information
  – MARC records
• OAI
• “Hathifiles”
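As a rough illustration of how a client might use the Bibliographic API for volume and rights information, here is a minimal Python sketch. The slide gives no endpoint or field names, so the base URL, the `brief`/`full` detail levels, and the `items`, `htid` and `usRightsString` fields are all assumptions to be checked against the API documentation. The sample response is hand-written, not real data.

```python
import json
from urllib.parse import quote

# Assumed base URL of the Bibliographic API; the slide does not state it.
BIB_API_BASE = "https://catalog.hathitrust.org/api/volumes"


def bib_api_url(id_type: str, value: str, detail: str = "brief") -> str:
    """Build a lookup URL for one identifier (for example an OCLC number).

    The 'brief' and 'full' detail levels are assumptions; 'full' is assumed
    to include the MARC record mentioned on the slide.
    """
    if detail not in ("brief", "full"):
        raise ValueError("detail must be 'brief' or 'full'")
    return f"{BIB_API_BASE}/{detail}/{quote(id_type)}/{quote(value, safe='')}.json"


def full_view_items(response_text: str) -> list:
    """Return the volume identifiers whose US rights string is 'Full view'."""
    payload = json.loads(response_text)
    return [
        item["htid"]
        for item in payload.get("items", [])
        if item.get("usRightsString") == "Full view"
    ]


# Hand-written sample shaped like the assumed response; identifiers are made up.
SAMPLE_RESPONSE = json.dumps({
    "records": {"000000001": {"titles": ["An example title"]}},
    "items": [
        {"htid": "mdp.0001", "rightsCode": "pd", "usRightsString": "Full view"},
        {"htid": "mdp.0002", "rightsCode": "ic", "usRightsString": "Limited (search-only)"},
    ],
})

print(bib_api_url("oclc", "424023"))
print(full_view_items(SAMPLE_RESPONSE))
```

The sketch deliberately stops short of making a network request, so it can be run offline against saved responses.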
Datasets
• Google-digitized (~2.8 million texts): requires proposal to HathiTrust; agreement with Google; statement on use/management
• Non-Google-digitized (~370,000 texts): freely available; statement on management
Research Center
• Environment to perform research on HathiTrust corpus
• http://www.hathitrust.org/htrc
• http://lib.umich.edu/mpach
• Package of tools to enable publication of open access born-digital journal content directly into HathiTrust
  – Including accompanying data and media files
• Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)
Source Archive
Editorial Market
Higher Education
Access Determinations
• Automated
• Manual
Automatic Rights Determination
• Conducted on all works at time of ingest and when records are modified
  – Public domain worldwide
    • US works published before 1923; US federal government publications; non-US works published prior to 1873
  – Public domain in the United States
    • Non-US works published prior to 1923
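The date and place rules on this slide can be sketched as a small decision function. This is only an illustration of the stated cut-offs, not the repository's actual code: the `Work` fields and the labels `pd-world`, `pd-us` and `undetermined` are hypothetical names, and the fallback for works matching none of the rules is an assumption, since the slide does not say what happens to them.

```python
from dataclasses import dataclass


@dataclass
class Work:
    """Hypothetical summary of the bibliographic facts the rules depend on."""
    pub_year: int
    us_published: bool
    us_federal_document: bool = False


def automatic_rights(work: Work) -> str:
    """Apply the cut-offs listed on the slide; labels are illustrative only."""
    # Public domain worldwide
    if work.us_federal_document:
        return "pd-world"
    if work.us_published and work.pub_year < 1923:
        return "pd-world"
    if not work.us_published and work.pub_year < 1873:
        return "pd-world"
    # Public domain in the United States only
    if not work.us_published and work.pub_year < 1923:
        return "pd-us"
    # Assumption: anything else is left for manual review or stays closed.
    return "undetermined"


print(automatic_rights(Work(pub_year=1901, us_published=True)))
print(automatic_rights(Work(pub_year=1901, us_published=False)))
```

Because the slide says the determination runs at ingest and again when records are modified, a function like this would be re-applied whenever the bibliographic facts change.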
Manual Rights Determination
• IMLS-funded CRMS project
  – CRMS-US
    • 2008: US-published works, 1923-1963
    • Staff at 4 partner institutions
  – CRMS-World
    • 2011: Expanded to non-US works
    • Staff at 16 partner institutions
  – Double review, with additional expert review for conflicts
  – Compliance with copyright formalities
  – As of January 2013: 241,541 reviewed, more than 132,644 opened
• Rights Holder Permissions
• System of Precedence
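The double-review step in the list above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the string determinations and the use of `None` for "still waiting on an expert" are all hypothetical, and the real CRMS workflow is certainly richer than this.

```python
from typing import Optional


def resolve_review(first: str, second: str, expert: Optional[str] = None) -> Optional[str]:
    """Combine two independent reviews into one determination.

    Two matching reviews settle the question. A conflict is settled by the
    expert review; until one exists, the result is None (undecided).
    """
    if first == second:
        return first
    return expert


print(resolve_review("public domain", "public domain"))
print(resolve_review("public domain", "in copyright"))
print(resolve_review("public domain", "in copyright", expert="in copyright"))
```

Returning `None` while a conflict is unresolved keeps a volume closed by default, which seems the cautious reading of the slide.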
Rights Database
Bibliographic (automatic)
Manual
Lawful uses
• Users who have print disabilities
  – All in-copyright works in HathiTrust currently owned (or owned previously) by the partner institution
  – Must be authenticated
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/accessibility
Lawful uses (2)
• Out of print and brittle/missing
  – Works must be currently owned (or owned previously) by the partner institution
  – Must be authenticated or accessing work from library premises
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/out-of-print-brittle
• Access and use statements
  – http://www.hathitrust.org/access_use
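The conditions on the two "Lawful uses" slides read like access checks, so a short sketch may help make them concrete. Everything named here is hypothetical: the `AccessRequest` fields and both function names are illustrations. The sketch also assumes that eligibility itself (the user's print disability, or the work being out of print and brittle or missing) has been established elsewhere.

```python
from dataclasses import dataclass


@dataclass
class AccessRequest:
    """Hypothetical description of one request for an in-copyright work."""
    authenticated: bool
    on_us_soil: bool
    on_library_premises: bool
    copies_owned: int      # copies held now or previously by the partner institution
    active_sessions: int   # users from that institution currently viewing the work


def copy_available(req: AccessRequest) -> bool:
    """One simultaneous access per copy owned."""
    return req.active_sessions < req.copies_owned


def may_access_print_disability(req: AccessRequest) -> bool:
    """Print-disability use: authenticated, on US soil, a copy free."""
    return req.authenticated and req.on_us_soil and copy_available(req)


def may_access_brittle_or_missing(req: AccessRequest) -> bool:
    """Brittle/missing use: authenticated OR on library premises, on US soil, a copy free."""
    return (req.authenticated or req.on_library_premises) and req.on_us_soil and copy_available(req)


walk_in = AccessRequest(authenticated=False, on_us_soil=True,
                        on_library_premises=True, copies_owned=1, active_sessions=0)
print(may_access_print_disability(walk_in))
print(may_access_brittle_or_missing(walk_in))
```

The only difference between the two checks is that the brittle/missing case also accepts an unauthenticated user who is on library premises.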
Outline
• The Big Idea
  – Mission and Goals
• What we’re doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work
e-Commerce
Print on Demand
Content Ingest
Transformation
Validation
Content Access
PageTurner
Collection Builder
Large-scale Search
Bibliographic Catalog
Research Center
APIs
Quality Assurance
Quality Review
Content Certification
User Services
Usability
User support (helpdesk)
Outreach
Project website
Monthly newsletter
Papers and presentations
Communication with potential partners
Surveys, general inquiries
Repository evaluation and audit (e.g., DRAMBORA, TRAC)
Legal
Risk management (use of materials)
Partner agreements
Advocacy
Governance
Budget/Finances
Decision-making
Policy
Planning
Enterprise Management
Communication and Coordination with partner institutions
Project management
Repository Administration
Hardware configuration and maintenance
Web and application server configuration and maintenance
Security
Permissions
Logging
Repository Administration
Data management (content storage, backup, integrity checks, deletion)
Hardware selection and replacement
Content and Metadata specifications
Disaster Recovery
Processes for ensuring content integrity
Rights Management
Copyright determination
Copyright review
Copyright information management (database)
Rightsholder permissions
Bibliographic Data Management
Entity description (record-level)
Object identification (item-level)
Data availability
Collection Development
Digital
• Expansion beyond books and journals (born-digital, images and maps, audio)
• Selection of content (for non-Google volume ingest and pilot projects)
Print
• Cloud Library (effect of digital on print)
Financial contributions of partners
HathiTrust Functional Framework
HathiTrust
Strategic Advisory Board
Budget/Finances, Decision-making
Guidance on Policy, Planning
• Driven by needs of institutions
• Leverage across the partnership
• Projects: Print on Demand, Grant Work, Ingest Specifications, PageTurner, Bibliographic Data Management
Executive Committee
Collective Work: Working Groups and Committees
Operational
• Communications
• User Support
• User Experience
Strategic
• Collections
• Discovery Interface
• Full-text Search
Distributed work
Constitutional Convention
• October 2011
• 52 partners
• 3-year review overseen by SAB
• Ballot Proposals
  – Print monograph storage
  – Approval Process for development initiatives
  – US Government Documents
  – Fee-for-service content deposit
  – Governance
HathiTrust
Executive Committee
Strategic Advisory Board
Budget/Finances, Decision-making
Guidance on Policy, Planning
• 12-member Board of Governors
• Chief Executive Officer
• Executive Committee
Governance
• Efficient, practical
• Inclusive, collective
Outline
• The Big Idea
  – Mission and Goals
• What we’re doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work
How HathiTrust Can Change the Way We Work
Seeing collective problems as collective
Breakdown of HathiTrust book corpus by publication date
Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building – 2/2011
Copyright status of books published pre-1923 and US works published 1923-1963
[Pie charts, repeated across slide builds. Segment values: 42, 19, 20, 19 (percent); the final build labels one segment "In Print 42"]
• Identification
• Description
• Rights
• Relationships
  – Bibliographic records
  – Bib records and objects
  – Digital objects
  – Digital and print
Understanding the relationship between the collective and local
1st model: Price per GB

                 2008        2009        2010        2011        2012 (Oct)
Total Volumes    2,477,871   5,221,092   7,836,698   9,966,572   10,531,566
Public Domain      372,085     758,947   1,959,223   2,712,626    3,218,132

[Chart: Total Volumes and Public Domain by year, 2008 to 2012 (Oct); vertical axis 0 to 12,000,000]
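To make the growth figures easier to compare, the public-domain share of the corpus can be worked out for each year. The short Python sketch below uses only the numbers from the slide; the variable and function names are illustrative.

```python
YEARS = ["2008", "2009", "2010", "2011", "2012 (Oct)"]
TOTAL_VOLUMES = [2477871, 5221092, 7836698, 9966572, 10531566]
PUBLIC_DOMAIN = [372085, 758947, 1959223, 2712626, 3218132]


def public_domain_share(total, public_domain):
    """Return the public-domain percentage of the corpus for each year, to one decimal."""
    return [round(100 * pd / t, 1) for pd, t in zip(public_domain, total)]


shares = public_domain_share(TOTAL_VOLUMES, PUBLIC_DOMAIN)
for year, share in zip(YEARS, shares):
    print(f"{year}: {share}% public domain")
```

On these figures the public-domain share roughly doubles, from about 15% in 2008 to about 31% in October 2012, while the corpus as a whole grows more than fourfold.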
[Chart: % of Titles in Local Collection (vertical axis, 0 to 60) against Rank in 2008 ARL Investment Index (horizontal axis, 0 to 120)]
A global change in the library environment
June 2010: Median duplication 31%
June 2009: Median duplication 19%
Academic print book collection already substantially duplicated in mass digitized book corpus
Courtesy of Constance Malpas, OCLC Research
Digitized Books in Shared Repositories
[Chart: Unique Titles (vertical axis, 0 to 3,500,000) by month, Sep-09 to Jun-10. Series: Mass digitized books in Hathi digital repository; Mass digitized books in shared print repositories. Annotations: ~3.5M titles; ~2.5M]
~75% of mass digitized corpus is ‘backed up’ in one or more shared print repositories
Courtesy of Constance Malpas, OCLC Research
Collection Overlap
• More than 50% median overlap with ARL institutions; higher for small liberal arts colleges
• New pricing model based on print holdings
  – http://www.hathitrust.org/cost
  – Requires print holdings database
  – Also support expansion of legal uses, efforts in de-duplication
  – Facilitate individual and collaborative collection development and management operations
• Print monographs archiving
Sourcing and Scaling
http://orweblog.oclc.org/archives/002058.html
• Scale
  – Institution-scale
  – Group-scale
  – Web-scale
• Sourcing
  – Institutional
  – Collaborative
  – Third-party
A new kind of library
Thank you
How to find out more
• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
  – http://www.hathitrust.org/updates
  – RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
  – Large-scale Search
  – Perspectives from HathiTrust
Source
Bibliographic Data
Content Package
MichiganIndiana
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
Datasets
Source
Bibliographic Data
Content Package
MichiganIndiana
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
DatasetsTDR
Source
Bibliographic Data
Content Package
MichiganIndiana
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
Datasets
Source
Bibliographic Data
Content Package
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
DatasetsMichigan
Indiana
Source
Bibliographic Data
Content Package
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
DatasetsMichigan
Indiana
Source
Bibliographic Data
Content Package
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
DatasetsMichigan
Indiana
We engage in preservationfor purposes of access
Source
Bibliographic Data
Content Package
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
DatasetsMichigan
Indiana
Making Content Available
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Descriptive headings added (hidden from GUI with CSS)
Info about SSD service amp link to accessibility page
Images used for style are in css so no need to use alt tags
Skip navigation link
Access keys for navigating pages with keyboard
Added labels amp descriptive titles to forms amp ToC table
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
APIs
bull Data APIndash Volume and rights informationndash Page imagesndash OCR
bull Bibliographic APIndash Volume and rights informationndash MARC records
bull OAIbull ldquoHathifilesrdquo
Datasets
bull Google-digitized ~28 million texts Requires proposal to HathiTrust Agreement with Google Statement on usemanagement
bull Non-Google-digitized ~370000 texts Freely available Statement on management
Research Center
bull Environment to perform research on HathiTrust corpus
bull httpwwwhathitrustorghtrc
bull httplibumichedumpachbull Package of tools to enable publication of open
access born-digital journal content directly into HathiTrustndash Including accompanying data and media files
bull Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)
Source Archive
Editorial Market
Higher Education
Access Determinations
bull Automatedbull Manual
Automatic Rights Determination
bull Conducted on all works at time of ingest and when records are modifiedndash Public domain worldwide
bull US works published before 1923 US federal government publications non-US works published prior to 1873
ndash Public domain in the United Statesbull Non-US works published prior to 1923
Manual Rights Determination
bull IMLS-funded CRMS projectndash CRMS-US
bull 2008 US-published works 1923-1963bull Staff at 4 partner institutions
ndash CRMS-Worldbull 2011 Expanded to non-US worksbull Staff at 16 partner institutions
ndash Double review with additional expert review for conflictsndash Compliance with copyright formalitiesndash As of January 2013 241541 reviewed more than 132644
openedbull Rights Holder Permissions
bull System of Precedence
Rights Database
Bibliographic (automatic)
Manual
Lawful uses
bull Users who have print disabilitiesndash All in-copyright works in HathiTrust currently
owned (or owned previously) by the partner institution
ndash Must be authenticatedndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgaccessibility
Lawful uses (2)
bull Out of print and brittle missingndash Works must be currently owned (or owned
previously) by the partner institutionndash Must be authenticated or accessing work from
library premisesndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgout-of-print-brittle
bull Access and use statementsndash httpwwwhathitrustorgaccess_use
Outline
bull The Big Idea ndash Mission and Goals
bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure
bull How HathiTrust can change the way we work
e-Commerce
Print on Demand
Content Ingest
Transformation
Validation
Content Access
PageTurner
Collection Builder
Large-scale Search
Bibliographic Catalog
Research Center
APIs
Quality Assurance
Quality Review
Content Certification
User Services
Usability
User support (helpdesk)
Outreach
Project website
Monthly newsletter
Papers and presentations
Communication with potential
partners
Surveys general inquiries
Repository evaluation and
audit (eg DRAMBORA
TRAC)
Legal
Risk management (use of materials)
Partner agreements
Advocacy
Governance
Budget Finances
Decision-making
Policy
Planning
Enterprise Management
Communication and Coordination
with partner institutions
Project management
Repository Administration
Hardware configuration and
maintenance
Web and application server configuration and
maintenance
Security
Permissions
Logging
Repository Administration
Data management (content storage backup integrity checks deletion)
Hardware selection and replacement
Content and Metadata
specifications
Disaster Recovery
Processes for ensuring content
integrity
Rights Management
Copyright determination
Copyright review
Copyright information
management (database)
Rightsholder permissions
Bibliographic Data
Management
Entity description (record-level)
Object identification (item-
level)
Data availability
Collection Development
Digitalbull Expansion beyond
books and journals (born-digital images and maps audio)
bull Selection of content (for non-Google volume ingest and pilots projects)
Printbull Cloud Library (effect
of digital on print)
Financial contributions of partners
HathiTrust Functional Framework
HathiTrust
Strategic Advisory BoardBudgetFinances Decision-making
Guidance on Policy Planning
bull Driven by needs of institutionsbull Leverage across the partnershipbull Projects Print on Demand Grant Work Ingest Specifications
PageTurner Bibliographic Data Management
Executive Committee
Collective Work Working Groups and Committees
Operationalbull Communicationsbull User Supportbull User Experience
Operationalbull Communicationsbull User Supportbull User Experience
Strategicbull Collectionsbull Discovery Interfacebull Full-text Search
Distributed work
Constitutional Convention
bull October 2011bull 52 partnersbull 3-year review overseen by SABbull Ballot Proposals
ndash Print monograph storagendash Approval Process for development initiativesndash US Government Documentsndash Fee-for-service content depositndash Governance
HathiTrust
Executive Committee
Strategic Advisory
Board
BudgetFinancesDecision-making
Guidance on Policy Planning
bull 12-member Board of Governors
bull Chief Executive Officer bull Executive Committee
Governance
bull Efficient practicalbull Inclusive collective
Outline
bull The Big Idea ndash Mission and Goals
bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure
bull How HathiTrust can change the way we work
How HathiTrust Can Change the Way We Work
Seeing collective problems as collective
Breakdown of HathiTrust book corpus by publication date
Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building ndash 22011
42
19
20
19
Breakdown of HathiTrust book corpus by publication date
42
19
20
19
Copyright status of books published pre-1923 and US works published 1923-1963
42
19
20
Copyright status of books published pre-1923 and US works published 1923-1963
42
19
20
19
Copyright status of books published pre-1923 and US works published 1923-1963
In Print 42
19
20
19
Relationships
• Identification
• Description
• Rights
• Relationships
  – Bibliographic records
  – Bib records and objects
  – Digital objects
  – Digital and print
Understanding the relationship between the collective and local
1st model: Price per GB

                 2008       2009       2010       2011       2012 (Oct)
Total Volumes    2,477,871  5,221,092  7,836,698  9,966,572  10,531,566
Public Domain    372,085    758,947    1,959,223  2,712,626  3,218,132

[Chart: Total Volumes and Public Domain by year, 2008 to 2012 (Oct)]
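The growth figures above are easier to compare as a share of the corpus. The following is a minimal sketch that computes the public domain percentage for each year; the counts are copied from the table, while the function and variable names are illustrative only.

```python
# Volume counts copied from the table above (the 2012 figure is as of October).
TOTAL_VOLUMES = {
    "2008": 2477871,
    "2009": 5221092,
    "2010": 7836698,
    "2011": 9966572,
    "2012 (Oct)": 10531566,
}
PUBLIC_DOMAIN = {
    "2008": 372085,
    "2009": 758947,
    "2010": 1959223,
    "2011": 2712626,
    "2012 (Oct)": 3218132,
}


def public_domain_share(year: str) -> float:
    """Return the public domain share of the corpus for one year, in percent."""
    return 100.0 * PUBLIC_DOMAIN[year] / TOTAL_VOLUMES[year]


if __name__ == "__main__":
    for year in TOTAL_VOLUMES:
        print(f"{year}: {public_domain_share(year):.1f}% public domain")
```

On these figures the public domain share roughly doubles between 2008 and 2012, from about 15% to about 31% of all volumes.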
A global change in the library environment
Academic print book collection already substantially duplicated in mass digitized book corpus
• June 2009: median duplication 19%
• June 2010: median duplication 31%
[Chart: % of titles in local collection (vertical axis, 0 to 60) against rank in 2008 ARL Investment Index (horizontal axis, 0 to 120)]
Courtesy of Constance Malpas, OCLC Research
Digitized Books in Shared Repositories
~75% of mass digitized corpus is ‘backed up’ in one or more shared print repositories
[Chart: unique titles (vertical axis, 0 to 3,500,000) by month, Sep-09 to Jun-10. Series: mass digitized books in Hathi digital repository; mass digitized books in shared print repositories. Annotations: ~3.5M titles, ~2.5M]
Courtesy of Constance Malpas, OCLC Research
Collection Overlap
• More than 50% median overlap with ARL institutions; higher for small liberal arts colleges
• New pricing model based on print holdings
  – http://www.hathitrust.org/cost
  – Requires print holdings database
  – Also support expansion of legal uses, efforts in de-duplication
  – Facilitate individual and collaborative collection development and management operations
• Print monographs archiving
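To make "median overlap" concrete, here is a minimal sketch that measures what percentage of each library's print titles also appears in a digitized corpus, and then takes the median across libraries. The title identifiers and library names are made up for illustration. The slides do not say how titles were matched, so matching on a shared identifier is an assumption.

```python
from statistics import median


def overlap_percent(local_titles: set, digitized_titles: set) -> float:
    """Percent of a library's print titles that also exist in the digitized corpus."""
    if not local_titles:
        return 0.0
    return 100.0 * len(local_titles & digitized_titles) / len(local_titles)


# Hypothetical identifiers; a real comparison would match on something
# like a shared catalog record number.
digitized = {"t1", "t2", "t3", "t4"}
print_holdings = {
    "Library A": {"t1", "t2", "t9", "t10"},
    "Library B": {"t1", "t2", "t3", "t8"},
    "Library C": {"t1", "t7", "t8", "t9"},
}

overlaps = {
    name: overlap_percent(titles, digitized)
    for name, titles in print_holdings.items()
}
median_overlap = median(overlaps.values())

if __name__ == "__main__":
    for name, pct in overlaps.items():
        print(f"{name}: {pct:.0f}% of print titles are in the digitized corpus")
    print(f"Median overlap: {median_overlap:.0f}%")
```

A print holdings database, as required by the pricing model above, would supply the per-library title sets that this kind of comparison needs.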
Sourcing and Scaling
http://orweblog.oclc.org/archives/002058.html
• Scale
  – Institution-scale
  – Group-scale
  – Web-scale
• Sourcing
  – Institutional
  – Collaborative
  – Third-party
A new kind of library
Thank you
How to find out more
• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
  – http://www.hathitrust.org/updates
  – RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
  – Large-scale Search
  – Perspectives from HathiTrust
[Repository diagram, shown in several builds. Labels: Source; Bibliographic Data; Content Package; Ingest; Bib Data; Data Management; Rights Data; Holdings Data; Storage; Michigan; Indiana; Access; Catalog; Full-text Search; PageTurner; APIs; Collections; Datasets; TDR]
We engage in preservation for purposes of access
Making Content Available
[Navigation diagram, repeated on each slide of this part. Access: Catalog; Full-text Search; PageTurner; APIs; Collections; Datasets]
• Descriptive headings added (hidden from GUI with CSS)
• Info about SSD service & link to accessibility page
• Images used for style are in CSS, so no need to use alt tags
• Skip navigation link
• Access keys for navigating pages with keyboard
• Added labels & descriptive titles to forms & ToC table
APIs
• Data API
  – Volume and rights information
  – Page images
  – OCR
• Bibliographic API
  – Volume and rights information
  – MARC records
• OAI
• “Hathifiles”
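As a rough illustration of how the Bibliographic API is addressed, the sketch below builds a request URL without sending it. It assumes the URL pattern described in HathiTrust's public Bibliographic API documentation (a "brief" or "full" detail level, an identifier type, and a ".json" suffix). Check the pattern and the accepted identifier types against the current documentation before relying on them. The OCLC number is only a sample value.

```python
# Assumed base URL and identifier types; verify against the current
# HathiTrust Bibliographic API documentation.
BASE_URL = "https://catalog.hathitrust.org/api/volumes"
DETAIL_LEVELS = {"brief", "full"}
ID_TYPES = {"oclc", "lccn", "issn", "isbn", "htid", "recordnumber"}


def bib_api_url(id_type: str, identifier: str, detail: str = "brief") -> str:
    """Build a single-identifier Bibliographic API URL (no request is made)."""
    if id_type not in ID_TYPES:
        raise ValueError(f"unsupported identifier type: {id_type}")
    if detail not in DETAIL_LEVELS:
        raise ValueError(f"unsupported detail level: {detail}")
    # The identifier is inserted as given; no escaping is attempted here.
    return f"{BASE_URL}/{detail}/{id_type}/{identifier}.json"


if __name__ == "__main__":
    print(bib_api_url("oclc", "424023"))
```

The "full" level would correspond to the MARC records bullet above, and "brief" to volume and rights information only.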
Datasets
• Google-digitized: ~2.8 million texts. Requires proposal to HathiTrust; agreement with Google; statement on use/management
• Non-Google-digitized: ~370,000 texts. Freely available; statement on management
Research Center
• Environment to perform research on HathiTrust corpus
• http://www.hathitrust.org/htrc
• http://lib.umich.edu/mpach
• Package of tools to enable publication of open access born-digital journal content directly into HathiTrust
  – Including accompanying data and media files
• Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)
[Diagram labels: Source; Archive; Editorial; Market; Higher Education]
Access Determinations
• Automated
• Manual
Automatic Rights Determination
• Conducted on all works at time of ingest and when records are modified
  – Public domain worldwide
    • US works published before 1923, US federal government publications, non-US works published prior to 1873
  – Public domain in the United States
    • Non-US works published prior to 1923
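The date rules above can be written as a small decision function. This is a sketch of the rules as stated on the slide, not HathiTrust's actual implementation. The record fields, the status labels and the fallback for works that match no rule are assumptions made for illustration.

```python
from dataclasses import dataclass

PD_WORLDWIDE = "public domain worldwide"
PD_US_ONLY = "public domain in the United States"
NOT_OPENED = "not opened automatically"  # assumed fallback; not on the slide


@dataclass
class BibRecord:
    """Hypothetical minimal view of a bibliographic record."""
    pub_year: int
    us_published: bool
    us_federal_document: bool = False


def automatic_rights(record: BibRecord) -> str:
    """Apply the slide's date-based rules to one record."""
    if record.us_federal_document:
        return PD_WORLDWIDE
    if record.us_published:
        return PD_WORLDWIDE if record.pub_year < 1923 else NOT_OPENED
    if record.pub_year < 1873:
        return PD_WORLDWIDE
    if record.pub_year < 1923:
        return PD_US_ONLY
    return NOT_OPENED


if __name__ == "__main__":
    print(automatic_rights(BibRecord(pub_year=1900, us_published=False)))
```

Because the function depends only on record fields, it can be run again whenever a record is modified, which matches the first bullet above.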
Manual Rights Determination
• IMLS-funded CRMS project
  – CRMS-US
    • 2008: US-published works, 1923-1963
    • Staff at 4 partner institutions
  – CRMS-World
    • 2011: Expanded to non-US works
    • Staff at 16 partner institutions
  – Double review, with additional expert review for conflicts
  – Compliance with copyright formalities
  – As of January 2013: 241,541 reviewed; more than 132,644 opened
• Rights Holder Permissions
Rights Database
• System of Precedence
[Diagram labels: Bibliographic (automatic); Manual]
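One way to picture a system of precedence is a lookup that picks the determination from the highest-ranked source. The sketch below assumes that manual determinations outrank bibliographic (automatic) ones and that the most recent entry wins among equals. The slide names only the two sources, so the ordering, the field names and the sample volume identifier are all illustrative.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Assumed ranking: a higher number takes precedence.
PRECEDENCE = {"bibliographic": 1, "manual": 2}


@dataclass(frozen=True)
class Determination:
    volume_id: str  # hypothetical identifier
    status: str     # e.g. "public domain" or "in copyright"
    source: str     # "bibliographic" or "manual"


def effective_determination(
    determinations: Iterable[Determination],
) -> Optional[Determination]:
    """Return the determination that applies, or None if there are none.

    Entries are assumed to arrive in chronological order, so a later entry
    replaces an earlier one of the same or lower precedence.
    """
    best = None
    for det in determinations:
        if best is None or PRECEDENCE[det.source] >= PRECEDENCE[best.source]:
            best = det
    return best


if __name__ == "__main__":
    history = [
        Determination("vol-001", "in copyright", "bibliographic"),
        Determination("vol-001", "public domain", "manual"),
    ]
    print(effective_determination(history))
```

Under this ordering, a later automatic re-run on a modified record would not overwrite an earlier manual review.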
Lawful uses
• Users who have print disabilities
  – All in-copyright works in HathiTrust currently owned (or owned previously) by the partner institution
  – Must be authenticated
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/accessibility
Lawful uses (2)
• Out of print and brittle, missing
  – Works must be currently owned (or owned previously) by the partner institution
  – Must be authenticated or accessing work from library premises
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/out-of-print-brittle
• Access and use statements
  – http://www.hathitrust.org/access_use
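The conditions on the two lawful-use slides combine into a simple access check. The sketch below covers only the conditions listed on the slides. It assumes that the user's print disability, or the work's out-of-print and brittle or missing status, has already been established elsewhere. All field and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AccessRequest:
    authenticated: bool
    on_library_premises: bool
    on_us_soil: bool
    copies_owned: int      # copies the partner institution owns or owned previously
    active_sessions: int   # users currently viewing this work via that institution


def _shared_conditions(req: AccessRequest) -> bool:
    """US location, institutional ownership, one simultaneous access per copy."""
    return (
        req.on_us_soil
        and req.copies_owned > 0
        and req.active_sessions < req.copies_owned
    )


def print_disability_access(req: AccessRequest) -> bool:
    """Users with print disabilities must be authenticated."""
    return req.authenticated and _shared_conditions(req)


def brittle_or_missing_access(req: AccessRequest) -> bool:
    """Authentication or presence on library premises is sufficient."""
    return (req.authenticated or req.on_library_premises) and _shared_conditions(req)


if __name__ == "__main__":
    walk_in = AccessRequest(False, True, True, copies_owned=1, active_sessions=0)
    print(print_disability_access(walk_in), brittle_or_missing_access(walk_in))
```

The only difference between the two checks is that the brittle or missing case also accepts an unauthenticated user on library premises.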
Outline
• The Big Idea
  – Mission and Goals
• What we’re doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work
HathiTrust Functional Framework
• e-Commerce: Print on Demand
• Content Ingest: Transformation; Validation
• Content Access: PageTurner; Collection Builder; Large-scale Search; Bibliographic Catalog; Research Center; APIs
• Quality Assurance: Quality Review; Content Certification
• User Services: Usability; User support (helpdesk)
• Outreach: Project website; Monthly newsletter; Papers and presentations; Communication with potential partners; Surveys, general inquiries
• Repository evaluation and audit (e.g. DRAMBORA, TRAC)
• Legal: Risk management (use of materials); Partner agreements; Advocacy
• Governance: Budget/Finances; Decision-making; Policy; Planning
• Enterprise Management: Communication and coordination with partner institutions; Project management
• Repository Administration: Hardware configuration and maintenance; Web and application server configuration and maintenance; Security; Permissions; Logging; Data management (content storage, backup, integrity checks, deletion); Hardware selection and replacement; Content and metadata specifications; Disaster recovery; Processes for ensuring content integrity
• Rights Management: Copyright determination; Copyright review; Copyright information management (database); Rightsholder permissions
• Bibliographic Data Management: Entity description (record-level); Object identification (item-level); Data availability
• Collection Development
  – Digital: Expansion beyond books and journals (born-digital, images and maps, audio); Selection of content (for non-Google volume ingest and pilot projects)
  – Print: Cloud Library (effect of digital on print)
• Financial contributions of partners
[Organization chart labels: HathiTrust; Executive Committee; Strategic Advisory Board; Budget/Finances; Decision-making; Guidance on Policy; Planning]
• Driven by needs of institutions
• Leverage across the partnership
• Projects: Print on Demand, Grant Work, Ingest Specifications, PageTurner, Bibliographic Data Management
Collective Work: Working Groups and Committees
• Operational: Communications; User Support; User Experience
• Strategic: Collections; Discovery Interface; Full-text Search
Distributed work
Constitutional Convention
• October 2011
• 52 partners
• 3-year review overseen by SAB
• Ballot Proposals
  – Print monograph storage
  – Approval Process for development initiatives
  – US Government Documents
  – Fee-for-service content deposit
  – Governance
HathiTrust
Executive Committee
Strategic Advisory Board
Budget/Finances, Decision-making
Guidance on Policy, Planning
• 12-member Board of Governors
• Chief Executive Officer
• Executive Committee
Governance
• Efficient, practical
• Inclusive, collective
Outline
• The Big Idea
  – Mission and Goals
• What we’re doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work
How HathiTrust Can Change the Way We Work
Seeing collective problems as collective
[Figure: Breakdown of HathiTrust book corpus by publication date. Segment values: 42, 19, 20, 19. Source: Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building – 2/2011]
[Figure: Copyright status of books published pre-1923 and US works published 1923-1963. Same segment values (42, 19, 20, 19), with an added "In Print" label]
• Identification
• Description
• Rights
• Relationships
  – Bibliographic records
  – Bib records and objects
  – Digital objects
  – Digital and print
Understanding the relationship between the collective and local
1st model: Price per GB

Year          Total Volumes   Public Domain
2008          2,477,871       372,085
2009          5,221,092       758,947
2010          7,836,698       1,959,223
2011          9,966,572       2,712,626
2012 (Oct)    10,531,566      3,218,132

[Chart: Total Volumes and Public Domain by year, 2008 to 2012 (Oct)]
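The growth figures above are easier to compare as proportions. The following is a minimal sketch that computes the public-domain share of the corpus for each year; the numbers are copied from the slide, while the dictionary layout and the function name `public_domain_share` are illustrative choices, not part of the original presentation.

```python
# Volume counts copied from the slide; the data layout is an illustrative assumption.
total_volumes = {
    "2008": 2_477_871,
    "2009": 5_221_092,
    "2010": 7_836_698,
    "2011": 9_966_572,
    "2012 (Oct)": 10_531_566,
}
public_domain = {
    "2008": 372_085,
    "2009": 758_947,
    "2010": 1_959_223,
    "2011": 2_712_626,
    "2012 (Oct)": 3_218_132,
}


def public_domain_share(total, public):
    """Return public-domain volumes as a percentage of total volumes, per year."""
    return {year: 100.0 * public[year] / total[year] for year in total}


shares = public_domain_share(total_volumes, public_domain)
for year, pct in shares.items():
    print(f"{year}: {pct:.1f}% public domain")
```

On these figures the public-domain share is roughly 15% in 2008 and 2009 and rises to about 31% by October 2012.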
A global change in the library environment
[Chart: % of Titles in Local Collection (y-axis, 0 to 60) against Rank in 2008 ARL Investment Index (x-axis, 0 to 120)]
June 2009: Median duplication 19%
June 2010: Median duplication 31%
Academic print book collection already substantially duplicated in mass digitized book corpus
Courtesy of Constance Malpas, OCLC Research
Digitized Books in Shared Repositories
[Chart: Unique Titles (y-axis, 0 to 3,500,000) by month, Sep-09 to Jun-10. Series: Mass digitized books in Hathi digital repository; Mass digitized books in shared print repositories]
~75% of mass digitized corpus is ‘backed up’ in one or more shared print repositories
~3.5M titles
~2.5M
Courtesy of Constance Malpas, OCLC Research
Collection Overlap
• More than 50% median overlap with ARL institutions; higher for small liberal arts colleges
• New Pricing model based on Print holdings
  – http://www.hathitrust.org/cost
  – Requires print holdings database
  – Also support expansion of legal uses, efforts in de-duplication
  – Facilitate individual and collaborative collection development and management operations
• Print monographs archiving
Sourcing and Scaling
http://orweblog.oclc.org/archives/002058.html
• Scale
  – Institution-scale
  – Group-scale
  – Web-scale
• Sourcing
  – Institutional
  – Collaborative
  – Third-party
A new kind of library
Thank you
How to find out more
• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
  – http://www.hathitrust.org/updates
  – RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
  – Large-scale Search
  – Perspectives from HathiTrust
[Diagram labels: Source; Bibliographic Data; Content Package; Bib Data; Data Management; Rights Data; Storage; Access; Ingest; Catalog; Full-text Search; PageTurner; APIs; Collections; Holdings Data; Datasets; Michigan; Indiana]
We engage in preservation for purposes of access
Making Content Available
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Descriptive headings added (hidden from GUI with CSS)
Info about SSD service & link to accessibility page
Images used for style are in CSS, so no need to use alt tags
Skip navigation link
Access keys for navigating pages with keyboard
Added labels & descriptive titles to forms & ToC table
APIs
• Data API
  – Volume and rights information
  – Page images
  – OCR
• Bibliographic API
  – Volume and rights information
  – MARC records
• OAI
• “Hathifiles”
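As an illustration of the Bibliographic API listed above, here is a small sketch that builds a lookup URL for a single identifier. The base URL, the `brief`/`full` detail levels, and the identifier types reflect the publicly documented URL pattern as best understood here and should be treated as assumptions to verify against the current HathiTrust documentation; the function name `bib_api_url` is hypothetical. No network request is made.

```python
from urllib.parse import quote

# Assumption: base URL and path layout follow HathiTrust's public
# Bibliographic API documentation; verify before relying on them.
BIB_API_BASE = "https://catalog.hathitrust.org/api/volumes"
ID_TYPES = {"oclc", "lccn", "issn", "isbn", "htid", "recordnumber"}


def bib_api_url(id_type: str, identifier: str, detail: str = "brief") -> str:
    """Build a single-identifier lookup URL (hypothetical helper)."""
    if detail not in ("brief", "full"):
        raise ValueError("detail must be 'brief' or 'full'")
    if id_type not in ID_TYPES:
        raise ValueError(f"unsupported identifier type: {id_type}")
    # Percent-encode the identifier so it is safe inside a URL path segment.
    return f"{BIB_API_BASE}/{detail}/{id_type}/{quote(identifier, safe='')}.json"


print(bib_api_url("oclc", "424023"))
```

Under these assumptions, a `brief` request would return volume and rights information, and a `full` request would add the MARC record, matching the two items listed for the Bibliographic API.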
Datasets
• Google-digitized: ~2.8 million texts; Requires proposal to HathiTrust; Agreement with Google; Statement on use/management
• Non-Google-digitized: ~370,000 texts; Freely available; Statement on management
Research Center
• Environment to perform research on HathiTrust corpus
• http://www.hathitrust.org/htrc
• http://lib.umich.edu/mpach
• Package of tools to enable publication of open access born-digital journal content directly into HathiTrust
  – Including accompanying data and media files
• Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)
Source Archive
Editorial Market
Higher Education
Access Determinations
• Automated
• Manual
Automatic Rights Determination
• Conducted on all works at time of ingest and when records are modified
  – Public domain worldwide
    • US works published before 1923; US federal government publications; non-US works published prior to 1873
  – Public domain in the United States
    • Non-US works published prior to 1923
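The automatic rules listed above can be written as a short decision function. This is a minimal sketch under stated assumptions: the `Work` record, its field names, and the `Rights` labels are hypothetical, and real determinations depend on bibliographic data that is far messier than a clean year and a pair of flags.

```python
from dataclasses import dataclass
from enum import Enum


class Rights(Enum):
    PD_WORLDWIDE = "public domain worldwide"
    PD_US = "public domain in the United States"
    NOT_OPENED = "not opened by the automatic rules"


@dataclass
class Work:
    # Hypothetical record; field names are illustrative only.
    pub_year: int
    us_published: bool
    us_federal_document: bool = False


def automatic_rights(work: Work) -> Rights:
    """Apply the date and place rules from the slide, in order."""
    if work.us_federal_document:
        return Rights.PD_WORLDWIDE
    if work.us_published and work.pub_year < 1923:
        return Rights.PD_WORLDWIDE
    if not work.us_published and work.pub_year < 1873:
        return Rights.PD_WORLDWIDE
    if not work.us_published and work.pub_year < 1923:
        return Rights.PD_US
    # Everything else is left for manual review or stays closed.
    return Rights.NOT_OPENED


print(automatic_rights(Work(pub_year=1890, us_published=False)).value)
```

A non-US work from 1890 falls between the 1873 and 1923 cut-offs, so the sketch classifies it as public domain in the United States only.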
Manual Rights Determination
• IMLS-funded CRMS project
  – CRMS-US
    • 2008: US-published works, 1923-1963
    • Staff at 4 partner institutions
  – CRMS-World
    • 2011: Expanded to non-US works
    • Staff at 16 partner institutions
  – Double review, with additional expert review for conflicts
  – Compliance with copyright formalities
  – As of January 2013: 241,541 reviewed, more than 132,644 opened
• Rights Holder Permissions
• System of Precedence
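The double-review step described above (two independent reviews, with an expert resolving conflicts) can be sketched as follows. The determination codes `"pd"` and `"ic"` and the function name `resolve_reviews` are hypothetical; the slide does not describe how CRMS represents or stores reviews.

```python
from typing import Optional


def resolve_reviews(first: str, second: str, expert: Optional[str] = None) -> Optional[str]:
    """Return the final determination, or None while expert review is pending.

    Two matching independent reviews are accepted as-is. Conflicting
    reviews are decided by the expert's determination once it exists.
    """
    if first == second:
        return first
    return expert


# Hypothetical codes: "pd" = public domain, "ic" = in copyright.
print(resolve_reviews("pd", "pd"))        # reviewers agree
print(resolve_reviews("pd", "ic"))        # conflict, expert not yet consulted
print(resolve_reviews("pd", "ic", "ic"))  # conflict resolved by expert
```

Returning `None` for an unresolved conflict keeps a volume closed until an expert has looked at it, which is the conservative default in this sketch.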
Rights Database
Bibliographic (automatic)
Manual
Lawful uses
• Users who have print disabilities
  – All in-copyright works in HathiTrust currently owned (or owned previously) by the partner institution
  – Must be authenticated
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/accessibility
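The conditions above combine into a single access check. The sketch below is only an illustration of that logic: the `AccessRequest` record and its field names are hypothetical, and it says nothing about how HathiTrust actually verifies authentication, location, or holdings.

```python
from dataclasses import dataclass


@dataclass
class AccessRequest:
    # Hypothetical fields, named after the conditions on the slide.
    authenticated: bool
    on_us_soil: bool
    institution_holds_work: bool  # currently owned, or owned previously
    copies_owned: int
    active_sessions: int          # simultaneous accesses already in progress


def may_access(req: AccessRequest) -> bool:
    """Allow access only if every listed condition holds."""
    return (
        req.authenticated
        and req.on_us_soil
        and req.institution_holds_work
        # One simultaneous access per copy owned.
        and req.active_sessions < req.copies_owned
    )


request = AccessRequest(True, True, True, copies_owned=2, active_sessions=1)
print(may_access(request))
```

With two copies owned and one session already open, one more simultaneous access is still permitted; a third request would be refused until a session ends.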
Lawful uses (2)
• Out of print and brittle/missing
  – Works must be currently owned (or owned previously) by the partner institution
  – Must be authenticated or accessing work from library premises
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/out-of-print-brittle
• Access and use statements
  – http://www.hathitrust.org/access_use
Outline
• The Big Idea
  – Mission and Goals
• What we’re doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work
e-Commerce
Print on Demand
Content Ingest
Transformation
Validation
Content Access
PageTurner
Collection Builder
Large-scale Search
Bibliographic Catalog
Research Center
APIs
Quality Assurance
Quality Review
Content Certification
User Services
Usability
User support (helpdesk)
Outreach
Project website
Monthly newsletter
Papers and presentations
Communication with potential
partners
Surveys general inquiries
Repository evaluation and
audit (eg DRAMBORA
TRAC)
Legal
Risk management (use of materials)
Partner agreements
Advocacy
Governance
Budget Finances
Decision-making
Policy
Planning
Enterprise Management
Communication and Coordination
with partner institutions
Project management
Repository Administration
Hardware configuration and
maintenance
Web and application server configuration and
maintenance
Security
Permissions
Logging
Repository Administration
Data management (content storage backup integrity checks deletion)
Hardware selection and replacement
Content and Metadata
specifications
Disaster Recovery
Processes for ensuring content
integrity
Rights Management
Copyright determination
Copyright review
Copyright information
management (database)
Rightsholder permissions
Bibliographic Data
Management
Entity description (record-level)
Object identification (item-
level)
Data availability
Collection Development
Digitalbull Expansion beyond
books and journals (born-digital images and maps audio)
bull Selection of content (for non-Google volume ingest and pilots projects)
Printbull Cloud Library (effect
of digital on print)
Financial contributions of partners
HathiTrust Functional Framework
HathiTrust
Strategic Advisory Board
Budget/Finances, Decision-making
Guidance on Policy, Planning
• Driven by needs of institutions
• Leverage across the partnership
• Projects: Print on Demand, Grant Work, Ingest Specifications, PageTurner, Bibliographic Data Management
Executive Committee
Collective Work: Working Groups and Committees
Operational
• Communications
• User Support
• User Experience
Strategic
• Collections
• Discovery Interface
• Full-text Search
Distributed work
Constitutional Convention
• October 2011
• 52 partners
• 3-year review overseen by SAB
• Ballot Proposals
– Print monograph storage
– Approval Process for development initiatives
– US Government Documents
– Fee-for-service content deposit
– Governance
HathiTrust
Executive Committee
Strategic Advisory Board
Budget/Finances, Decision-making
Guidance on Policy, Planning
• 12-member Board of Governors
• Chief Executive Officer
• Executive Committee
Governance
• Efficient, practical
• Inclusive, collective
How HathiTrust Can Change the Way We Work
Seeing collective problems as collective
Breakdown of HathiTrust book corpus by publication date
Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building – 2/2011
[Chart: four segments labeled 42, 19, 20, 19]
Copyright status of books published pre-1923 and US works published 1923-1963
[Chart: four segments labeled In Print 42, 19, 20, 19]
• Identification
• Description
• Rights
• Relationships
– Bibliographic records
– Bib records and objects
– Digital objects
– Digital and print
Understanding the relationship between the collective and local
1st model: Price per GB
                 2008        2009        2010        2011        2012 (Oct)
Total Volumes    2,477,871   5,221,092   7,836,698   9,966,572   10,531,566
Public Domain    372,085     758,947     1,959,223   2,712,626   3,218,132
[Chart: Total Volumes and Public Domain, 2008 to 2012 (Oct)]
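The table can also be read as proportions rather than raw counts. The following is a minimal sketch that computes the public-domain share of the corpus for each year from the two rows above; the figures are copied from the slide, while the dictionary layout and the function name `public_domain_share` are illustrative assumptions.

```python
# Figures copied from the slide's table: (total volumes, public domain volumes).
# The data layout and names below are illustrative, not part of the slides.
volumes = {
    "2008": (2_477_871, 372_085),
    "2009": (5_221_092, 758_947),
    "2010": (7_836_698, 1_959_223),
    "2011": (9_966_572, 2_712_626),
    "2012 (Oct)": (10_531_566, 3_218_132),
}


def public_domain_share(total: int, public_domain: int) -> float:
    """Return the public-domain share of the corpus as a percentage."""
    return 100.0 * public_domain / total


shares = {
    year: public_domain_share(total, public_domain)
    for year, (total, public_domain) in volumes.items()
}

for year, share in shares.items():
    print(f"{year}: {share:.1f}% public domain")
```

By this calculation the public-domain share roughly doubles over the period, from about 15% in 2008 to about 31% in October 2012.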
A global change in the library environment
[Chart: % of Titles in Local Collection (0 to 60) against Rank in 2008 ARL Investment Index (0 to 120)]
June 2009: Median duplication 19%
June 2010: Median duplication 31%
Academic print book collection already substantially duplicated in mass digitized book corpus
Courtesy of Constance Malpas, OCLC Research
Digitized Books in Shared Repositories
[Chart: Unique Titles (0 to 3,500,000), Sep-09 to Jun-10; series: Mass digitized books in Hathi digital repository, Mass digitized books in shared print repositories]
~75% of mass digitized corpus is 'backed up' in one or more shared print repositories
~3.5M titles
~2.5M
Courtesy of Constance Malpas, OCLC Research
Collection Overlap
• More than 50% median overlap with ARL institutions; higher for small liberal arts colleges
• New pricing model based on print holdings
– http://www.hathitrust.org/cost
– Requires print holdings database
– Also support expansion of legal uses, efforts in de-duplication
– Facilitate individual and collaborative collection development and management operations
• Print monographs archiving
Sourcing and Scaling
http://orweblog.oclc.org/archives/002058.html
• Scale
– Institution-scale
– Group-scale
– Web-scale
• Sourcing
– Institutional
– Collaborative
– Third-party
A new kind of library
Thank you
How to find out more
• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
– http://www.hathitrust.org/updates
– RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
– Large-scale Search
– Perspectives from HathiTrust
Source
Bibliographic Data
Content Package
Bib Data
Data Management
Rights Data
Storage
Access
Ingest
Catalog
Full-text Search
PageTurner
APIs
Collections
Holdings Data
Datasets
Michigan
Indiana
Making Content Available
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Descriptive headings added (hidden from GUI with CSS)
Info about SSD service & link to accessibility page
Images used for style are in CSS, so no need to use alt tags
Skip navigation link
Access keys for navigating pages with keyboard
Added labels & descriptive titles to forms & ToC table
APIs
• Data API
– Volume and rights information
– Page images
– OCR
• Bibliographic API
– Volume and rights information
– MARC records
• OAI
• "Hathifiles"
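The slide names a Bibliographic API but gives no request format. As a rough sketch only, the helper below builds a lookup URL on the assumption that the API follows a `catalog.hathitrust.org/api/volumes/<brief|full>/<id type>/<id>.json` layout. The base URL, the list of identifier types, and the function name `bib_api_url` are assumptions not stated in the slides, and the example identifier is arbitrary. No request is sent.

```python
from urllib.parse import quote

# Assumed base URL and identifier types; neither appears on the slide.
BASE_URL = "https://catalog.hathitrust.org/api/volumes"
ID_TYPES = {"oclc", "lccn", "issn", "isbn", "htid", "recordnumber"}
DETAIL_LEVELS = {"brief", "full"}


def bib_api_url(id_type: str, identifier: str, detail: str = "brief") -> str:
    """Build a Bibliographic API lookup URL under the assumed layout."""
    if id_type not in ID_TYPES:
        raise ValueError(f"unsupported identifier type: {id_type!r}")
    if detail not in DETAIL_LEVELS:
        raise ValueError(f"unsupported detail level: {detail!r}")
    # Keep ':' and '/' unescaped; some volume identifiers contain them.
    return f"{BASE_URL}/{detail}/{id_type}/{quote(identifier, safe=':/')}.json"


# Arbitrary example identifier, used only to show the URL shape.
print(bib_api_url("oclc", "424023"))
```

Under the slide's description, a response would be expected to carry volume and rights information, with MARC records available at the fuller level of detail.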
Datasets
• Google-digitized: ~2.8 million texts; requires proposal to HathiTrust; agreement with Google; statement on use/management
• Non-Google-digitized: ~370,000 texts; freely available; statement on management
Research Center
• Environment to perform research on HathiTrust corpus
• http://www.hathitrust.org/htrc
• http://lib.umich.edu/mpach
• Package of tools to enable publication of open access born-digital journal content directly into HathiTrust
– Including accompanying data and media files
• Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)
Source Archive
Editorial Market
Higher Education
Access Determinations
• Automated
• Manual
Automatic Rights Determination
• Conducted on all works at time of ingest and when records are modified
– Public domain worldwide
  • US works published before 1923, US federal government publications, non-US works published prior to 1873
– Public domain in the United States
  • Non-US works published prior to 1923
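The rules on this slide can be written down as a small decision function. This is a sketch of the rules as listed, not HathiTrust's actual implementation: the `Work` fields, the `Rights` labels, and the fallback value for works that no rule covers are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Rights(Enum):
    # Labels are illustrative; the slide does not give rights codes.
    PD_WORLDWIDE = "public domain worldwide"
    PD_US = "public domain in the United States"
    NOT_DETERMINED = "not opened by the automatic rules"


@dataclass(frozen=True)
class Work:
    pub_year: int
    us_published: bool
    us_federal_document: bool = False


def automatic_rights(work: Work) -> Rights:
    """Apply the slide's automatic rules in order."""
    if work.us_federal_document:
        return Rights.PD_WORLDWIDE
    if work.us_published and work.pub_year < 1923:
        return Rights.PD_WORLDWIDE
    if not work.us_published and work.pub_year < 1873:
        return Rights.PD_WORLDWIDE
    if not work.us_published and work.pub_year < 1923:
        return Rights.PD_US
    # Assumption: anything else is left for manual review.
    return Rights.NOT_DETERMINED


print(automatic_rights(Work(pub_year=1900, us_published=False)).value)
```

A non-US work from 1900 falls between the two cutoffs, so under these rules it would be open in the United States but not worldwide.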
Manual Rights Determination
• IMLS-funded CRMS project
– CRMS-US
  • 2008: US-published works, 1923-1963
  • Staff at 4 partner institutions
– CRMS-World
  • 2011: Expanded to non-US works
  • Staff at 16 partner institutions
– Double review with additional expert review for conflicts
– Compliance with copyright formalities
– As of January 2013, 241,541 reviewed; more than 132,644 opened
• Rights Holder Permissions
• System of Precedence
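The double-review step described above can be sketched in a few lines: two independent determinations are compared, and an expert determination is used only when they conflict. The function name `resolve_reviews` and the string labels are hypothetical; the slides do not describe how CRMS implements this. The final lines compute the share of reviewed works that were opened, using the January 2013 counts from the slide.

```python
from typing import Optional


def resolve_reviews(first: str, second: str, expert: Optional[str] = None) -> str:
    """Return the agreed determination, or the expert's when reviewers conflict."""
    if first == second:
        return first
    if expert is None:
        # Assumption: a conflict cannot be settled without an expert review.
        raise ValueError("reviewers disagree; expert review required")
    return expert


# Counts from the slide (as of January 2013); "opened" is a lower bound.
reviewed = 241_541
opened = 132_644
opened_share = 100.0 * opened / reviewed

print(resolve_reviews("public domain", "in copyright", expert="public domain"))
print(f"At least {opened_share:.1f}% of reviewed works were opened")
```

On these figures, a little over half of the manually reviewed works were opened.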
Rights Database
Bibliographic (automatic)
Manual
Lawful uses
• Users who have print disabilities
– All in-copyright works in HathiTrust currently owned (or owned previously) by the partner institution
– Must be authenticated
– Must be on US soil
– One simultaneous access per copy owned
– http://www.hathitrust.org/accessibility
Lawful uses (2)
• Out of print and brittle/missing
– Works must be currently owned (or owned previously) by the partner institution
– Must be authenticated or accessing work from library premises
– Must be on US soil
– One simultaneous access per copy owned
– http://www.hathitrust.org/out-of-print-brittle
• Access and use statements
– http://www.hathitrust.org/access_use
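The conditions on the two "Lawful uses" slides amount to a short access check. The sketch below encodes them as listed; the `AccessRequest` fields, the two `use` labels, and the function name `may_access` are hypothetical, and nothing here reflects how HathiTrust actually enforces these conditions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    use: str                   # "print_disability" or "out_of_print_brittle"
    authenticated: bool
    on_library_premises: bool
    in_us: bool
    copies_owned: int          # copies the partner institution owns or owned
    active_sessions: int       # current simultaneous accesses to this work


def may_access(req: AccessRequest) -> bool:
    """Check the slide's conditions for access to an in-copyright work."""
    if req.use == "print_disability":
        identified = req.authenticated
    elif req.use == "out_of_print_brittle":
        identified = req.authenticated or req.on_library_premises
    else:
        raise ValueError(f"unknown use: {req.use!r}")
    # One simultaneous access per copy owned; zero copies means no access.
    copy_available = req.active_sessions < req.copies_owned
    return identified and req.in_us and copy_available


walk_in = AccessRequest("out_of_print_brittle", False, True, True, 1, 0)
print(may_access(walk_in))
```

The only difference between the two uses in this sketch is that the out-of-print and brittle case also admits unauthenticated users on library premises.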
Outline
bull The Big Idea ndash Mission and Goals
bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure
bull How HathiTrust can change the way we work
e-Commerce
Print on Demand
Content Ingest
Transformation
Validation
Content Access
PageTurner
Collection Builder
Large-scale Search
Bibliographic Catalog
Research Center
APIs
Quality Assurance
Quality Review
Content Certification
User Services
Usability
User support (helpdesk)
Outreach
Project website
Monthly newsletter
Papers and presentations
Communication with potential
partners
Surveys general inquiries
Repository evaluation and
audit (eg DRAMBORA
TRAC)
Legal
Risk management (use of materials)
Partner agreements
Advocacy
Governance
Budget Finances
Decision-making
Policy
Planning
Enterprise Management
Communication and Coordination
with partner institutions
Project management
Repository Administration
Hardware configuration and
maintenance
Web and application server configuration and
maintenance
Security
Permissions
Logging
Repository Administration
Data management (content storage backup integrity checks deletion)
Hardware selection and replacement
Content and Metadata
specifications
Disaster Recovery
Processes for ensuring content
integrity
Rights Management
Copyright determination
Copyright review
Copyright information
management (database)
Rightsholder permissions
Bibliographic Data
Management
Entity description (record-level)
Object identification (item-
level)
Data availability
Collection Development
Digitalbull Expansion beyond
books and journals (born-digital images and maps audio)
bull Selection of content (for non-Google volume ingest and pilots projects)
Printbull Cloud Library (effect
of digital on print)
Financial contributions of partners
HathiTrust Functional Framework
HathiTrust
Strategic Advisory BoardBudgetFinances Decision-making
Guidance on Policy Planning
bull Driven by needs of institutionsbull Leverage across the partnershipbull Projects Print on Demand Grant Work Ingest Specifications
PageTurner Bibliographic Data Management
Executive Committee
Collective Work Working Groups and Committees
Operationalbull Communicationsbull User Supportbull User Experience
Operationalbull Communicationsbull User Supportbull User Experience
Strategicbull Collectionsbull Discovery Interfacebull Full-text Search
Distributed work
Constitutional Convention
• October 2011
• 52 partners
• 3-year review overseen by SAB
• Ballot Proposals
  – Print monograph storage
  – Approval Process for development initiatives
  – US Government Documents
  – Fee-for-service content deposit
  – Governance
HathiTrust
Executive Committee
Strategic Advisory Board
Budget/Finances, Decision-making
Guidance on Policy, Planning
• 12-member Board of Governors
• Chief Executive Officer
• Executive Committee
Governance
• Efficient, practical
• Inclusive, collective
Outline
• The Big Idea
  – Mission and Goals
• What we're doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work
How HathiTrust Can Change the Way We Work
Seeing collective problems as collective
Breakdown of HathiTrust book corpus by publication date
[Chart; segment values (percent): 42, 19, 20, 19]
Source: Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building, 2/2011
Copyright status of books published pre-1923 and US works published 1923-1963
[Chart; segment values (percent): In Print 42, 19, 20, 19]
• Identification
• Description
• Rights
• Relationships
  – Bibliographic records
  – Bib records and objects
  – Digital objects
  – Digital and print
Understanding the relationship between the collective and local
1st model: Price per GB

                2008        2009        2010        2011        2012 (Oct)
Total Volumes   2,477,871   5,221,092   7,836,698   9,966,572   10,531,566
Public Domain   372,085     758,947     1,959,223   2,712,626   3,218,132

[Chart: Total Volumes and Public Domain, 2008 to 2012 (Oct)]
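The growth figures above can be turned into a public-domain share per year. The following is a small sketch that uses only the numbers from the slide; the variable and function names are illustrative choices, not part of any HathiTrust tool.

```python
# Figures copied from the slide's table (volumes in the repository per year).
years = ["2008", "2009", "2010", "2011", "2012 (Oct)"]
total_volumes = [2477871, 5221092, 7836698, 9966572, 10531566]
public_domain = [372085, 758947, 1959223, 2712626, 3218132]


def public_domain_share(pd_count: int, total_count: int) -> float:
    """Return the public-domain volumes as a percentage of all volumes."""
    return 100.0 * pd_count / total_count


# Map each year to its public-domain percentage.
shares = {
    year: public_domain_share(pd, total)
    for year, pd, total in zip(years, public_domain, total_volumes)
}

if __name__ == "__main__":
    for year, share in shares.items():
        print(f"{year}: {share:.1f}% public domain")
```

On these figures the public-domain share roughly doubles between 2008 and October 2012, while the total volume count grows about fourfold.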
[Chart: x-axis: Rank in 2008 ARL Investment Index; y-axis: % of Titles in Local Collection]
A global change in the library environment
June 2010: Median duplication 31%
June 2009: Median duplication 19%
Academic print book collection already substantially duplicated in mass digitized book corpus
Courtesy of Constance Malpas, OCLC Research
Digitized Books in Shared Repositories
[Chart: Unique Titles by month, Sep-09 to Jun-10. Series: Mass digitized books in Hathi digital repository; Mass digitized books in shared print repositories. Annotations: ~3.5M titles; ~2.5M]
~75% of mass digitized corpus is 'backed up' in one or more shared print repositories
Courtesy of Constance Malpas, OCLC Research
Collection Overlap
• More than 50% median overlap with ARL institutions; higher for small liberal arts colleges
• New pricing model based on print holdings
  – http://www.hathitrust.org/cost
  – Requires print holdings database
  – Also supports expansion of legal uses and efforts in de-duplication
  – Facilitates individual and collaborative collection development and management operations
• Print monographs archiving
Sourcing and Scaling
http://orweblog.oclc.org/archives/002058.html
• Scale
  – Institution-scale
  – Group-scale
  – Web-scale
• Sourcing
  – Institutional
  – Collaborative
  – Third-party
A new kind of library
Thank you
How to find out more
• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
  – http://www.hathitrust.org/updates
  – RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
  – Large-scale Search
  – Perspectives from HathiTrust
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Descriptive headings added (hidden from GUI with CSS)
Info about SSD service & link to accessibility page
Images used for style are in CSS, so no need to use alt tags
Skip navigation link
Access keys for navigating pages with keyboard
Added labels & descriptive titles to forms & ToC table
APIs
• Data API
  – Volume and rights information
  – Page images
  – OCR
• Bibliographic API
  – Volume and rights information
  – MARC records
• OAI
• "Hathifiles"
Datasets
• Google-digitized: ~2.8 million texts
  – Requires proposal to HathiTrust
  – Agreement with Google
  – Statement on use/management
• Non-Google-digitized: ~370,000 texts
  – Freely available
  – Statement on management
Research Center
• Environment to perform research on HathiTrust corpus
• http://www.hathitrust.org/htrc
• http://lib.umich.edu/mpach
• Package of tools to enable publication of open access born-digital journal content directly into HathiTrust
  – Including accompanying data and media files
• Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)
Source Archive
Editorial Market
Higher Education
Access Determinations
• Automated
• Manual
Automatic Rights Determination
• Conducted on all works at time of ingest and when records are modified
  – Public domain worldwide
    • US works published before 1923, US federal government publications, non-US works published prior to 1873
  – Public domain in the United States
    • Non-US works published prior to 1923
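The date rules on this slide can be sketched as a small function. This is an illustration of the rules as stated here, not HathiTrust's actual ingest code; the `Work` record, its field names, and the returned labels are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Work:
    """Hypothetical minimal bibliographic record used only for this sketch."""
    pub_year: int
    us_published: bool
    us_federal_document: bool = False


def automatic_rights(work: Work) -> Optional[str]:
    """Apply the slide's bibliographic rules.

    Returns "public domain worldwide", "public domain in the United States",
    or None when these rules alone do not settle the status (such works
    would need other handling, for example manual review).
    """
    # US federal government publications are treated as public domain worldwide.
    if work.us_federal_document:
        return "public domain worldwide"
    # US works published before 1923.
    if work.us_published and work.pub_year < 1923:
        return "public domain worldwide"
    # Non-US works published prior to 1873.
    if not work.us_published and work.pub_year < 1873:
        return "public domain worldwide"
    # Non-US works published 1873 through 1922.
    if not work.us_published and work.pub_year < 1923:
        return "public domain in the United States"
    return None
```

For example, under these assumptions a non-US work from 1890 would be viewable in the United States only, while a US work from 1930 is left undetermined by the automatic rules.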
Manual Rights Determination
• IMLS-funded CRMS project
  – CRMS-US
    • 2008: US-published works, 1923-1963
    • Staff at 4 partner institutions
  – CRMS-World
    • 2011: Expanded to non-US works
    • Staff at 16 partner institutions
  – Double review, with additional expert review for conflicts
  – Compliance with copyright formalities
  – As of January 2013: 241,541 reviewed, more than 132,644 opened
• Rights Holder Permissions
• System of Precedence
Rights Database
Bibliographic (automatic)
Manual
Lawful uses
• Users who have print disabilities
  – All in-copyright works in HathiTrust currently owned (or owned previously) by the partner institution
  – Must be authenticated
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/accessibility
Lawful uses (2)
• Out of print and brittle, missing
  – Works must be currently owned (or owned previously) by the partner institution
  – Must be authenticated or accessing work from library premises
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/out-of-print-brittle
• Access and use statements
  – http://www.hathitrust.org/access_use
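The conditions listed on the two "Lawful uses" slides can be read as a simple access check. The sketch below encodes only the conditions stated on the slides; the `AccessRequest` type, its field names, and the two use labels are hypothetical, and the real service may apply further checks.

```python
from dataclasses import dataclass


@dataclass
class AccessRequest:
    """Hypothetical request record; all field names are assumptions."""
    use: str                   # "print_disability" or "out_of_print_brittle"
    authenticated: bool        # user is logged in through a partner institution
    on_library_premises: bool  # request comes from inside the library
    on_us_soil: bool           # request originates in the United States
    copies_owned: int          # copies the institution owns or owned previously
    active_sessions: int       # sessions already open on this work


def may_access(req: AccessRequest) -> bool:
    """Return True if the request meets the conditions listed on the slides."""
    # The partner institution must own (or have owned) the work.
    if req.copies_owned < 1:
        return False
    # Both lawful uses are limited to users on US soil.
    if not req.on_us_soil:
        return False
    if req.use == "print_disability":
        identified = req.authenticated
    elif req.use == "out_of_print_brittle":
        # Authentication or physical presence in the library is enough.
        identified = req.authenticated or req.on_library_premises
    else:
        return False
    if not identified:
        return False
    # One simultaneous access per copy owned.
    return req.active_sessions < req.copies_owned
```

The last line is the part most likely to matter in practice: an institution holding two copies could serve two readers at once, and a third request would have to wait.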
Outline
bull The Big Idea ndash Mission and Goals
bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure
bull How HathiTrust can change the way we work
e-Commerce
Print on Demand
Content Ingest
Transformation
Validation
Content Access
PageTurner
Collection Builder
Large-scale Search
Bibliographic Catalog
Research Center
APIs
Quality Assurance
Quality Review
Content Certification
User Services
Usability
User support (helpdesk)
Outreach
Project website
Monthly newsletter
Papers and presentations
Communication with potential
partners
Surveys general inquiries
Repository evaluation and
audit (eg DRAMBORA
TRAC)
Legal
Risk management (use of materials)
Partner agreements
Advocacy
Governance
Budget Finances
Decision-making
Policy
Planning
Enterprise Management
Communication and Coordination
with partner institutions
Project management
Repository Administration
Hardware configuration and
maintenance
Web and application server configuration and
maintenance
Security
Permissions
Logging
Repository Administration
Data management (content storage backup integrity checks deletion)
Hardware selection and replacement
Content and Metadata
specifications
Disaster Recovery
Processes for ensuring content
integrity
Rights Management
Copyright determination
Copyright review
Copyright information
management (database)
Rightsholder permissions
Bibliographic Data
Management
Entity description (record-level)
Object identification (item-
level)
Data availability
Collection Development
Digitalbull Expansion beyond
books and journals (born-digital images and maps audio)
bull Selection of content (for non-Google volume ingest and pilots projects)
Printbull Cloud Library (effect
of digital on print)
Financial contributions of partners
HathiTrust Functional Framework
HathiTrust
Strategic Advisory BoardBudgetFinances Decision-making
Guidance on Policy Planning
bull Driven by needs of institutionsbull Leverage across the partnershipbull Projects Print on Demand Grant Work Ingest Specifications
PageTurner Bibliographic Data Management
Executive Committee
Collective Work Working Groups and Committees
Operationalbull Communicationsbull User Supportbull User Experience
Operationalbull Communicationsbull User Supportbull User Experience
Strategicbull Collectionsbull Discovery Interfacebull Full-text Search
Distributed work
Constitutional Convention
bull October 2011bull 52 partnersbull 3-year review overseen by SABbull Ballot Proposals
ndash Print monograph storagendash Approval Process for development initiativesndash US Government Documentsndash Fee-for-service content depositndash Governance
HathiTrust
Executive Committee
Strategic Advisory
Board
BudgetFinancesDecision-making
Guidance on Policy Planning
bull 12-member Board of Governors
bull Chief Executive Officer bull Executive Committee
Governance
bull Efficient practicalbull Inclusive collective
Outline
bull The Big Idea ndash Mission and Goals
bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure
bull How HathiTrust can change the way we work
How HathiTrust Can Change the Way We Work
Seeing collective problems as collective
Breakdown of HathiTrust book corpus by publication date
Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building ndash 22011
42
19
20
19
Breakdown of HathiTrust book corpus by publication date
42
19
20
19
Copyright status of books published pre-1923 and US works published 1923-1963
42
19
20
Copyright status of books published pre-1923 and US works published 1923-1963
42
19
20
19
Copyright status of books published pre-1923 and US works published 1923-1963
In Print 42
19
20
19
bull Identificationbull Descriptionbull Rights
Relationships
bull Identificationbull Descriptionbull Rightsbull Relationships
ndash Bibliographic records
Relationships
bull Identificationbull Descriptionbull Rightsbull Relationships
ndash Bibliographic recordsndash Bib records and objects
Relationships
bull Identificationbull Descriptionbull Rightsbull Relationships
ndash Bibliographic recordsndash Bib records and objectsndash Digital objects
Relationships
bull Identificationbull Descriptionbull Rightsbull Relationships
ndash Bibliographic recordsndash Bib records and objectsndash Digital objectsndash Digital and print
Relationships
Understanding the relationship between the collective and local
1st model Price per GB
2008 2009 2010 2011 2012 (Oct)Total Volumes 2477871 5221092 7836698 9966572 10531566
Public Domain 372085 758947 1959223 2712626 3218132
2008 2009 2010 2011 2012 (Oct)0
2000000
4000000
6000000
8000000
10000000
12000000
Total VolumesPublic Domain
0 20 40 60 80 100 1200
10
20
30
40
50
60
Rank in 2008 ARL Investment Index
o
f Tit
les
in L
ocal
Col
lecti
on
A global change in the library environment
June 2010Median duplication 31
June 2009Median duplication 19
Academic print book collection already substantially duplicated in mass digitized book corpus
Courtesy of Constance Malpas OCLC Research
Digitized Books in Shared Repositories
Sep-09 Oct-09 Nov-09 Dec-09 Jan-10 Feb-10 Mar-10 Apr-10 May-10 Jun-100
500000
1000000
1500000
2000000
2500000
3000000
3500000
Mass digitized books in Hathi digital repository Mass digitized books in shared print repositories
Uni
que
Titl
es
~75 of mass digitized corpus is lsquobacked uprsquo in one or more shared print repositories
~35M titles
~25M
Courtesy of Constance Malpas OCLC Research
Collection Overlap
bull More than 50 median overlap with ARL institutions higher for small liberal arts colleges
bull New Pricing model based on Print holdingsndash httpwwwhathitrustorgcostndash Requires print holdings databasendash Also support expansion of legal uses efforts in de-
duplicationndash Facilitate individual and collaborative collection
development and management operationsbull Print monographs archiving
Sourcing and Scalinghttporweblogoclcorgarchives002058html
bull Scalendash Institution-scalendash Group-scalendash Web-scale
bull Sourcingndash Institutionalndash Collaborativendash Third-party
A new kind of library
Thank you
How to find out more
bull About httpwwwhathitrustorgaboutbull Twitter httptwittercomhathitrustbull Facebook httpwwwfacebookcomhathitrustbull Monthly newsletter
ndash httpwwwhathitrustorgupdatesndash RSS httpwwwhathitrustorgupdates_rss
bull Contact us feedbackissueshathitrustorgbull Blogs httpwwwhathitrustorgblogs
ndash Large-scale Searchndash Perspectives from HathiTrust
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Descriptive headings added (hidden from GUI with CSS)
Info about SSD service amp link to accessibility page
Images used for style are in css so no need to use alt tags
Skip navigation link
Access keys for navigating pages with keyboard
Added labels amp descriptive titles to forms amp ToC table
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
APIs
bull Data APIndash Volume and rights informationndash Page imagesndash OCR
bull Bibliographic APIndash Volume and rights informationndash MARC records
bull OAIbull ldquoHathifilesrdquo
Datasets
bull Google-digitized ~28 million texts Requires proposal to HathiTrust Agreement with Google Statement on usemanagement
bull Non-Google-digitized ~370000 texts Freely available Statement on management
Research Center
bull Environment to perform research on HathiTrust corpus
bull httpwwwhathitrustorghtrc
bull httplibumichedumpachbull Package of tools to enable publication of open
access born-digital journal content directly into HathiTrustndash Including accompanying data and media files
bull Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)
Source Archive
Editorial Market
Higher Education
Access Determinations
bull Automatedbull Manual
Automatic Rights Determination
bull Conducted on all works at time of ingest and when records are modifiedndash Public domain worldwide
bull US works published before 1923 US federal government publications non-US works published prior to 1873
ndash Public domain in the United Statesbull Non-US works published prior to 1923
Manual Rights Determination
bull IMLS-funded CRMS projectndash CRMS-US
bull 2008 US-published works 1923-1963bull Staff at 4 partner institutions
ndash CRMS-Worldbull 2011 Expanded to non-US worksbull Staff at 16 partner institutions
ndash Double review with additional expert review for conflictsndash Compliance with copyright formalitiesndash As of January 2013 241541 reviewed more than 132644
openedbull Rights Holder Permissions
bull System of Precedence
Rights Database
Bibliographic (automatic)
Manual
Lawful uses
bull Users who have print disabilitiesndash All in-copyright works in HathiTrust currently
owned (or owned previously) by the partner institution
ndash Must be authenticatedndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgaccessibility
Lawful uses (2)
bull Out of print and brittle missingndash Works must be currently owned (or owned
previously) by the partner institutionndash Must be authenticated or accessing work from
library premisesndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgout-of-print-brittle
bull Access and use statementsndash httpwwwhathitrustorgaccess_use
Outline
bull The Big Idea ndash Mission and Goals
bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure
bull How HathiTrust can change the way we work
e-Commerce
Print on Demand
Content Ingest
Transformation
Validation
Content Access
PageTurner
Collection Builder
Large-scale Search
Bibliographic Catalog
Research Center
APIs
Quality Assurance
Quality Review
Content Certification
User Services
Usability
User support (helpdesk)
Outreach
Project website
Monthly newsletter
Papers and presentations
Communication with potential
partners
Surveys general inquiries
Repository evaluation and
audit (eg DRAMBORA
TRAC)
Legal
Risk management (use of materials)
Partner agreements
Advocacy
Governance
Budget Finances
Decision-making
Policy
Planning
Enterprise Management
Communication and Coordination
with partner institutions
Project management
Repository Administration
Hardware configuration and
maintenance
Web and application server configuration and
maintenance
Security
Permissions
Logging
Repository Administration
Data management (content storage backup integrity checks deletion)
Hardware selection and replacement
Content and Metadata
specifications
Disaster Recovery
Processes for ensuring content
integrity
Rights Management
Copyright determination
Copyright review
Copyright information
management (database)
Rightsholder permissions
Bibliographic Data
Management
Entity description (record-level)
Object identification (item-
level)
Data availability
Collection Development
Digitalbull Expansion beyond
books and journals (born-digital images and maps audio)
bull Selection of content (for non-Google volume ingest and pilots projects)
Printbull Cloud Library (effect
of digital on print)
Financial contributions of partners
HathiTrust Functional Framework
HathiTrust
Strategic Advisory BoardBudgetFinances Decision-making
Guidance on Policy Planning
bull Driven by needs of institutionsbull Leverage across the partnershipbull Projects Print on Demand Grant Work Ingest Specifications
PageTurner Bibliographic Data Management
Executive Committee
Collective Work Working Groups and Committees
Operationalbull Communicationsbull User Supportbull User Experience
Operationalbull Communicationsbull User Supportbull User Experience
Strategicbull Collectionsbull Discovery Interfacebull Full-text Search
Distributed work
Constitutional Convention
bull October 2011bull 52 partnersbull 3-year review overseen by SABbull Ballot Proposals
ndash Print monograph storagendash Approval Process for development initiativesndash US Government Documentsndash Fee-for-service content depositndash Governance
HathiTrust
Executive Committee
Strategic Advisory
Board
BudgetFinancesDecision-making
Guidance on Policy Planning
bull 12-member Board of Governors
bull Chief Executive Officer bull Executive Committee
Governance
bull Efficient practicalbull Inclusive collective
Outline
bull The Big Idea ndash Mission and Goals
bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure
bull How HathiTrust can change the way we work
How HathiTrust Can Change the Way We Work
Seeing collective problems as collective
Breakdown of HathiTrust book corpus by publication date
Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building ndash 22011
42
19
20
19
Breakdown of HathiTrust book corpus by publication date
42
19
20
19
Copyright status of books published pre-1923 and US works published 1923-1963
42
19
20
Copyright status of books published pre-1923 and US works published 1923-1963
42
19
20
19
Copyright status of books published pre-1923 and US works published 1923-1963
In Print 42
19
20
19
bull Identificationbull Descriptionbull Rights
Relationships
bull Identificationbull Descriptionbull Rightsbull Relationships
ndash Bibliographic records
Relationships
bull Identificationbull Descriptionbull Rightsbull Relationships
ndash Bibliographic recordsndash Bib records and objects
Relationships
bull Identificationbull Descriptionbull Rightsbull Relationships
ndash Bibliographic recordsndash Bib records and objectsndash Digital objects
Relationships
bull Identificationbull Descriptionbull Rightsbull Relationships
ndash Bibliographic recordsndash Bib records and objectsndash Digital objectsndash Digital and print
Relationships
Understanding the relationship between the collective and local
1st model Price per GB
2008 2009 2010 2011 2012 (Oct)Total Volumes 2477871 5221092 7836698 9966572 10531566
Public Domain 372085 758947 1959223 2712626 3218132
2008 2009 2010 2011 2012 (Oct)0
2000000
4000000
6000000
8000000
10000000
12000000
Total VolumesPublic Domain
0 20 40 60 80 100 1200
10
20
30
40
50
60
Rank in 2008 ARL Investment Index
o
f Tit
les
in L
ocal
Col
lecti
on
A global change in the library environment
June 2010Median duplication 31
June 2009Median duplication 19
Academic print book collection already substantially duplicated in mass digitized book corpus
Courtesy of Constance Malpas OCLC Research
Digitized Books in Shared Repositories
Sep-09 Oct-09 Nov-09 Dec-09 Jan-10 Feb-10 Mar-10 Apr-10 May-10 Jun-100
500000
1000000
1500000
2000000
2500000
3000000
3500000
Mass digitized books in Hathi digital repository Mass digitized books in shared print repositories
Uni
que
Titl
es
~75 of mass digitized corpus is lsquobacked uprsquo in one or more shared print repositories
~35M titles
~25M
Courtesy of Constance Malpas OCLC Research
Collection Overlap
bull More than 50 median overlap with ARL institutions higher for small liberal arts colleges
bull New Pricing model based on Print holdingsndash httpwwwhathitrustorgcostndash Requires print holdings databasendash Also support expansion of legal uses efforts in de-
duplicationndash Facilitate individual and collaborative collection
development and management operationsbull Print monographs archiving
Sourcing and Scalinghttporweblogoclcorgarchives002058html
bull Scalendash Institution-scalendash Group-scalendash Web-scale
bull Sourcingndash Institutionalndash Collaborativendash Third-party
A new kind of library
Thank you
How to find out more
bull About httpwwwhathitrustorgaboutbull Twitter httptwittercomhathitrustbull Facebook httpwwwfacebookcomhathitrustbull Monthly newsletter
ndash httpwwwhathitrustorgupdatesndash RSS httpwwwhathitrustorgupdates_rss
bull Contact us feedbackissueshathitrustorgbull Blogs httpwwwhathitrustorgblogs
ndash Large-scale Searchndash Perspectives from HathiTrust
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Descriptive headings added (hidden from GUI with CSS)
Info about SSD service amp link to accessibility page
Images used for style are in css so no need to use alt tags
Skip navigation link
Access keys for navigating pages with keyboard
Added labels amp descriptive titles to forms amp ToC table
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
APIs
bull Data APIndash Volume and rights informationndash Page imagesndash OCR
bull Bibliographic APIndash Volume and rights informationndash MARC records
bull OAIbull ldquoHathifilesrdquo
Datasets
bull Google-digitized ~28 million texts Requires proposal to HathiTrust Agreement with Google Statement on usemanagement
bull Non-Google-digitized ~370000 texts Freely available Statement on management
Research Center
bull Environment to perform research on HathiTrust corpus
bull httpwwwhathitrustorghtrc
bull httplibumichedumpachbull Package of tools to enable publication of open
access born-digital journal content directly into HathiTrustndash Including accompanying data and media files
bull Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)
Source Archive
Editorial Market
Higher Education
Access Determinations
bull Automatedbull Manual
Automatic Rights Determination
bull Conducted on all works at time of ingest and when records are modifiedndash Public domain worldwide
bull US works published before 1923 US federal government publications non-US works published prior to 1873
ndash Public domain in the United Statesbull Non-US works published prior to 1923
Manual Rights Determination
bull IMLS-funded CRMS projectndash CRMS-US
bull 2008 US-published works 1923-1963bull Staff at 4 partner institutions
ndash CRMS-Worldbull 2011 Expanded to non-US worksbull Staff at 16 partner institutions
ndash Double review with additional expert review for conflictsndash Compliance with copyright formalitiesndash As of January 2013 241541 reviewed more than 132644
openedbull Rights Holder Permissions
bull System of Precedence
Rights Database
Bibliographic (automatic)
Manual
Lawful uses
bull Users who have print disabilitiesndash All in-copyright works in HathiTrust currently
owned (or owned previously) by the partner institution
ndash Must be authenticatedndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgaccessibility
Lawful uses (2)
bull Out of print and brittle missingndash Works must be currently owned (or owned
previously) by the partner institutionndash Must be authenticated or accessing work from
library premisesndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgout-of-print-brittle
bull Access and use statementsndash httpwwwhathitrustorgaccess_use
Outline
bull The Big Idea ndash Mission and Goals
bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure
bull How HathiTrust can change the way we work
e-Commerce
Print on Demand
Content Ingest
Transformation
Validation
Content Access
PageTurner
Collection Builder
Large-scale Search
Bibliographic Catalog
Research Center
APIs
Quality Assurance
Quality Review
Content Certification
User Services
Usability
User support (helpdesk)
Outreach
Project website
Monthly newsletter
Papers and presentations
Communication with potential
partners
Surveys general inquiries
Repository evaluation and
audit (eg DRAMBORA
TRAC)
Legal
Risk management (use of materials)
Partner agreements
Advocacy
Governance
Budget Finances
Decision-making
Policy
Planning
Enterprise Management
Communication and Coordination
with partner institutions
Project management
Repository Administration
Hardware configuration and
maintenance
Web and application server configuration and
maintenance
Security
Permissions
Logging
Repository Administration
Data management (content storage backup integrity checks deletion)
Hardware selection and replacement
Content and Metadata
specifications
Disaster Recovery
Processes for ensuring content
integrity
Rights Management
Copyright determination
Copyright review
Copyright information
management (database)
Rightsholder permissions
Bibliographic Data
Management
Entity description (record-level)
Object identification (item-
level)
Data availability
Collection Development
Digitalbull Expansion beyond
books and journals (born-digital images and maps audio)
bull Selection of content (for non-Google volume ingest and pilots projects)
Printbull Cloud Library (effect
of digital on print)
Financial contributions of partners
HathiTrust Functional Framework
HathiTrust
Strategic Advisory BoardBudgetFinances Decision-making
Guidance on Policy Planning
bull Driven by needs of institutionsbull Leverage across the partnershipbull Projects Print on Demand Grant Work Ingest Specifications
PageTurner Bibliographic Data Management
Executive Committee
Collective Work Working Groups and Committees
Operationalbull Communicationsbull User Supportbull User Experience
Operationalbull Communicationsbull User Supportbull User Experience
Strategicbull Collectionsbull Discovery Interfacebull Full-text Search
Distributed work
Constitutional Convention
bull October 2011bull 52 partnersbull 3-year review overseen by SABbull Ballot Proposals
ndash Print monograph storagendash Approval Process for development initiativesndash US Government Documentsndash Fee-for-service content depositndash Governance
HathiTrust
Executive Committee
Strategic Advisory
Board
BudgetFinancesDecision-making
Guidance on Policy Planning
bull 12-member Board of Governors
bull Chief Executive Officer bull Executive Committee
Governance
bull Efficient practicalbull Inclusive collective
Outline
bull The Big Idea ndash Mission and Goals
bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure
bull How HathiTrust can change the way we work
How HathiTrust Can Change the Way We Work
Seeing collective problems as collective
Figure: Breakdown of HathiTrust book corpus by publication date (chart values: 42, 19, 20, 19). Source: Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building, 2/2011.
Figure: Copyright status of books published pre-1923 and US works published 1923-1963 (chart values: 42 labeled "In Print", 19, 20, 19).
• Identification
• Description
• Rights
• Relationships
  – Bibliographic records
  – Bib records and objects
  – Digital objects
  – Digital and print
Understanding the relationship between the collective and local
1st model: Price per GB
Year            2008       2009       2010       2011       2012 (Oct)
Total Volumes   2,477,871  5,221,092  7,836,698  9,966,572  10,531,566
Public Domain   372,085    758,947    1,959,223  2,712,626  3,218,132
Figure: Total Volumes and Public Domain volumes by year, 2008 to 2012 (Oct); vertical axis from 0 to 12,000,000.
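The yearly counts above can be turned into a public-domain share per year. The following is a minimal sketch: the figures are copied from the slide, while the function name `public_domain_share` and the dictionary layout are assumptions made for illustration.

```python
# Yearly counts copied from the slide (Total Volumes and Public Domain).
total_volumes = {
    "2008": 2477871,
    "2009": 5221092,
    "2010": 7836698,
    "2011": 9966572,
    "2012 (Oct)": 10531566,
}
public_domain = {
    "2008": 372085,
    "2009": 758947,
    "2010": 1959223,
    "2011": 2712626,
    "2012 (Oct)": 3218132,
}


def public_domain_share(total, pd):
    """Return the fraction of volumes that are public domain, keyed by year."""
    return {year: pd[year] / total[year] for year in total}


shares = public_domain_share(total_volumes, public_domain)
for year, share in shares.items():
    # Format each fraction as a percentage with one decimal place.
    print(f"{year}: {share:.1%} public domain")
```

Read this way, the public-domain portion grows from roughly one volume in seven in 2008 to nearly one in three by October 2012.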
A global change in the library environment
Figure: % of Titles in Local Collection (0 to 60) plotted against Rank in 2008 ARL Investment Index (0 to 120). June 2009: median duplication 19%. June 2010: median duplication 31%.
Academic print book collection already substantially duplicated in mass digitized book corpus
Courtesy of Constance Malpas, OCLC Research
Digitized Books in Shared Repositories
Figure: Unique titles by month, Sep-09 to Jun-10 (vertical axis 0 to 3,500,000). Series: mass digitized books in Hathi digital repository; mass digitized books in shared print repositories. Annotations: ~3.5M titles, ~2.5M.
~75% of mass digitized corpus is 'backed up' in one or more shared print repositories
Courtesy of Constance Malpas, OCLC Research
Collection Overlap
• More than 50% median overlap with ARL institutions; higher for small liberal arts colleges
• New pricing model based on print holdings
  – http://www.hathitrust.org/cost
  – Requires print holdings database
  – Also support expansion of legal uses, efforts in de-duplication
  – Facilitate individual and collaborative collection development and management operations
• Print monographs archiving
Sourcing and Scaling
http://orweblog.oclc.org/archives/002058.html
• Scale
  – Institution-scale
  – Group-scale
  – Web-scale
• Sourcing
  – Institutional
  – Collaborative
  – Third-party
A new kind of library
Thank you
How to find out more
• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
  – http://www.hathitrust.org/updates
  – RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
  – Large-scale Search
  – Perspectives from HathiTrust
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Descriptive headings added (hidden from GUI with CSS)
Info about SSD service & link to accessibility page
Images used for style are in CSS, so no need to use alt tags
Skip navigation link
Access keys for navigating pages with keyboard
Added labels & descriptive titles to forms & ToC table
APIs
• Data API
  – Volume and rights information
  – Page images
  – OCR
• Bibliographic API
  – Volume and rights information
  – MARC records
• OAI
• "Hathifiles"
Datasets
• Google-digitized: ~2.8 million texts. Requires proposal to HathiTrust, agreement with Google, statement on use/management
• Non-Google-digitized: ~370,000 texts. Freely available. Statement on management
Research Center
• Environment to perform research on HathiTrust corpus
• http://www.hathitrust.org/htrc
• http://lib.umich.edu/mpach
• Package of tools to enable publication of open access, born-digital journal content directly into HathiTrust
  – Including accompanying data and media files
• Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)
Source Archive
Editorial Market
Higher Education
Access Determinations
• Automated
• Manual
Automatic Rights Determination
• Conducted on all works at time of ingest and when records are modified
  – Public domain worldwide
    • US works published before 1923; US federal government publications; non-US works published prior to 1873
  – Public domain in the United States
    • Non-US works published prior to 1923
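The date rules listed above can be sketched as a small rule function. This is an illustration only, not HathiTrust's actual implementation: the `Work` fields, the function name `automatic_rights`, and the returned labels are hypothetical, and the fallback label for works outside the listed rules is an assumption, since the slide does not say how those works are coded.

```python
from dataclasses import dataclass


@dataclass
class Work:
    """Hypothetical minimal bibliographic facts used by the rules."""
    pub_year: int
    us_published: bool
    us_federal_document: bool = False


def automatic_rights(work: Work) -> str:
    """Apply the slide's date rules in order and return a descriptive label."""
    # US federal government publications are treated as public domain worldwide.
    if work.us_federal_document:
        return "public domain worldwide"
    # US works published before 1923.
    if work.us_published and work.pub_year < 1923:
        return "public domain worldwide"
    # Non-US works published prior to 1873.
    if not work.us_published and work.pub_year < 1873:
        return "public domain worldwide"
    # Non-US works published prior to 1923 (1873 to 1922 at this point).
    if not work.us_published and work.pub_year < 1923:
        return "public domain in the United States"
    # Assumption: anything else is left for manual review or stays closed.
    return "no automatic public-domain determination"


print(automatic_rights(Work(pub_year=1900, us_published=False)))
```

Because the rules depend only on bibliographic fields, they can be re-run whenever a record is modified, which matches the slide's note about when the determination is conducted.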
Manual Rights Determination
• IMLS-funded CRMS project
  – CRMS-US
    • 2008: US-published works, 1923-1963
    • Staff at 4 partner institutions
  – CRMS-World
    • 2011: Expanded to non-US works
    • Staff at 16 partner institutions
  – Double review, with additional expert review for conflicts
  – Compliance with copyright formalities
  – As of January 2013: 241,541 reviewed; more than 132,644 opened
• Rights Holder Permissions
• System of Precedence
Rights Database
Bibliographic (automatic)
Manual
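The double-review step in the manual process (two independent reviews, with an expert deciding conflicts) can be sketched as follows. The function name `resolve_review` and the determination labels `"pd"` and `"ic"` are hypothetical placeholders; the slide does not describe how CRMS records its decisions.

```python
from typing import Optional


def resolve_review(first: str, second: str, expert: Optional[str] = None) -> str:
    """Combine two independent determinations into one outcome.

    Matching reviews stand as-is. Conflicting reviews are decided by the
    expert's determination when one is available; otherwise the volume is
    flagged as still awaiting expert review.
    """
    if first == second:
        return first
    if expert is not None:
        return expert
    return "needs expert review"


# Hypothetical labels: "pd" = public domain, "ic" = in copyright.
print(resolve_review("pd", "pd"))
print(resolve_review("pd", "ic"))
print(resolve_review("pd", "ic", expert="ic"))
```

The design point is that no single reviewer can open a volume alone: either two reviewers agree or an expert breaks the tie.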
Lawful uses
• Users who have print disabilities
  – All in-copyright works in HathiTrust currently owned (or owned previously) by the partner institution
  – Must be authenticated
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/accessibility
Lawful uses (2)
• Out of print and brittle/missing
  – Works must be currently owned (or owned previously) by the partner institution
  – Must be authenticated or accessing work from library premises
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/out-of-print-brittle
• Access and use statements
  – http://www.hathitrust.org/access_use
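The conditions listed for the two lawful uses can be read as a single access check. The sketch below is an assumption-laden illustration, not HathiTrust's access code: `AccessRequest`, `may_open`, the use names, and the way copies and sessions are counted are all hypothetical, and whether the user or work qualifies for the use in the first place is taken as given.

```python
from dataclasses import dataclass


@dataclass
class AccessRequest:
    """Hypothetical facts about one request to open an in-copyright work."""
    authenticated: bool
    on_us_soil: bool
    on_library_premises: bool = False


def may_open(use: str, request: AccessRequest,
             copies_owned: int, active_sessions: int) -> bool:
    """Return True if the request meets the conditions listed on the slides.

    copies_owned counts copies the partner institution owns or owned
    previously; active_sessions counts users who currently have the work open.
    """
    if use == "print_disability":
        # Print-disability access requires authentication.
        identified = request.authenticated
    elif use == "out_of_print_brittle":
        # This use also allows access from library premises.
        identified = request.authenticated or request.on_library_premises
    else:
        raise ValueError(f"unknown use: {use}")
    return (
        identified
        and request.on_us_soil
        and copies_owned > 0
        # One simultaneous access per copy owned.
        and active_sessions < copies_owned
    )


walk_in = AccessRequest(authenticated=False, on_us_soil=True, on_library_premises=True)
print(may_open("out_of_print_brittle", walk_in, copies_owned=1, active_sessions=0))
```

Tying the session limit to `copies_owned` is what keeps digital access proportional to the institution's print holdings.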
Outline
• The Big Idea
  – Mission and Goals
• What we're doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work
e-Commerce
Print on Demand
Content Ingest
Transformation
Validation
Content Access
PageTurner
Collection Builder
Large-scale Search
Bibliographic Catalog
Research Center
APIs
Quality Assurance
Quality Review
Content Certification
User Services
Usability
User support (helpdesk)
Outreach
Project website
Monthly newsletter
Papers and presentations
Communication with potential partners
Surveys, general inquiries
Repository evaluation and audit (e.g., DRAMBORA, TRAC)
Legal
Risk management (use of materials)
Partner agreements
Advocacy
Governance
Budget/Finances
Decision-making
Policy
Planning
Enterprise Management
Communication and Coordination with partner institutions
Project management
Repository Administration
Hardware configuration and maintenance
Web and application server configuration and maintenance
Security
Permissions
Logging
Repository Administration
Data management (content storage, backup, integrity checks, deletion)
Hardware selection and replacement
Content and Metadata specifications
Disaster Recovery
Processes for ensuring content integrity
Rights Management
Copyright determination
Copyright review
Copyright information management (database)
Rightsholder permissions
Bibliographic Data Management
Entity description (record-level)
Object identification (item-level)
Data availability
Collection Development
Digital
• Expansion beyond books and journals (born-digital, images and maps, audio)
• Selection of content (for non-Google volume ingest and pilot projects)
Print
• Cloud Library (effect of digital on print)
Financial contributions of partners
HathiTrust Functional Framework
HathiTrust
Strategic Advisory BoardBudgetFinances Decision-making
Guidance on Policy Planning
bull Driven by needs of institutionsbull Leverage across the partnershipbull Projects Print on Demand Grant Work Ingest Specifications
PageTurner Bibliographic Data Management
Executive Committee
Collective Work Working Groups and Committees
Operationalbull Communicationsbull User Supportbull User Experience
Operationalbull Communicationsbull User Supportbull User Experience
Strategicbull Collectionsbull Discovery Interfacebull Full-text Search
Distributed work
Constitutional Convention
bull October 2011bull 52 partnersbull 3-year review overseen by SABbull Ballot Proposals
ndash Print monograph storagendash Approval Process for development initiativesndash US Government Documentsndash Fee-for-service content depositndash Governance
HathiTrust
Executive Committee
Strategic Advisory
Board
BudgetFinancesDecision-making
Guidance on Policy Planning
bull 12-member Board of Governors
bull Chief Executive Officer bull Executive Committee
Governance
bull Efficient practicalbull Inclusive collective
Outline
bull The Big Idea ndash Mission and Goals
bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure
bull How HathiTrust can change the way we work
How HathiTrust Can Change the Way We Work
Seeing collective problems as collective
Breakdown of HathiTrust book corpus by publication date
Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building ndash 22011
42
19
20
19
Breakdown of HathiTrust book corpus by publication date
42
19
20
19
Copyright status of books published pre-1923 and US works published 1923-1963
42
19
20
Copyright status of books published pre-1923 and US works published 1923-1963
42
19
20
19
Copyright status of books published pre-1923 and US works published 1923-1963
In Print 42
19
20
19
bull Identificationbull Descriptionbull Rights
Relationships
bull Identificationbull Descriptionbull Rightsbull Relationships
ndash Bibliographic records
Relationships
bull Identificationbull Descriptionbull Rightsbull Relationships
ndash Bibliographic recordsndash Bib records and objects
Relationships
bull Identificationbull Descriptionbull Rightsbull Relationships
ndash Bibliographic recordsndash Bib records and objectsndash Digital objects
Relationships
bull Identificationbull Descriptionbull Rightsbull Relationships
ndash Bibliographic recordsndash Bib records and objectsndash Digital objectsndash Digital and print
Relationships
Understanding the relationship between the collective and local
1st model Price per GB
2008 2009 2010 2011 2012 (Oct)Total Volumes 2477871 5221092 7836698 9966572 10531566
Public Domain 372085 758947 1959223 2712626 3218132
2008 2009 2010 2011 2012 (Oct)0
2000000
4000000
6000000
8000000
10000000
12000000
Total VolumesPublic Domain
0 20 40 60 80 100 1200
10
20
30
40
50
60
Rank in 2008 ARL Investment Index
o
f Tit
les
in L
ocal
Col
lecti
on
A global change in the library environment
June 2010Median duplication 31
June 2009Median duplication 19
Academic print book collection already substantially duplicated in mass digitized book corpus
Courtesy of Constance Malpas OCLC Research
Digitized Books in Shared Repositories
Sep-09 Oct-09 Nov-09 Dec-09 Jan-10 Feb-10 Mar-10 Apr-10 May-10 Jun-100
500000
1000000
1500000
2000000
2500000
3000000
3500000
Mass digitized books in Hathi digital repository Mass digitized books in shared print repositories
Uni
que
Titl
es
~75 of mass digitized corpus is lsquobacked uprsquo in one or more shared print repositories
~35M titles
~25M
Courtesy of Constance Malpas OCLC Research
Collection Overlap
• More than 50% median overlap with ARL institutions; higher for small liberal arts colleges
• New pricing model based on print holdings
  – http://www.hathitrust.org/cost
  – Requires print holdings database
  – Also support expansion of legal uses, efforts in de-duplication
  – Facilitate individual and collaborative collection development and management operations
• Print monographs archiving
Sourcing and Scaling
http://orweblog.oclc.org/archives/002058.html
• Scale
  – Institution-scale
  – Group-scale
  – Web-scale
• Sourcing
  – Institutional
  – Collaborative
  – Third-party
A new kind of library
Thank you
How to find out more
• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
  – http://www.hathitrust.org/updates
  – RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
  – Large-scale Search
  – Perspectives from HathiTrust
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Descriptive headings added (hidden from GUI with CSS)
Info about SSD service & link to accessibility page
Images used for style are in CSS, so no need to use alt tags
Skip navigation link
Access keys for navigating pages with keyboard
Added labels & descriptive titles to forms & ToC table
APIs
• Data API
  – Volume and rights information
  – Page images
  – OCR
• Bibliographic API
  – Volume and rights information
  – MARC records
• OAI
• "Hathifiles"
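As an illustration of how the Bibliographic API might be queried for volume and rights information, the sketch below only builds a request URL and makes no network call. The URL pattern (`/api/volumes/<brief|full>/<id type>/<id value>.json` on catalog.hathitrust.org) and the helper name `bib_api_url` are assumptions not taken from the slides; check them against the current API documentation before use.

```python
from urllib.parse import quote

# Assumed base URL for the Bibliographic API; not stated in the slides.
BIB_API_BASE = "https://catalog.hathitrust.org/api/volumes"


def bib_api_url(id_type, id_value, detail="brief"):
    """Build a single-identifier lookup URL (hypothetical helper).

    id_type  -- kind of identifier, e.g. "oclc", "isbn", "lccn"
    id_value -- the identifier itself
    detail   -- "brief" for volume and rights information only,
                "full" to also request the MARC record
    """
    if detail not in ("brief", "full"):
        raise ValueError("detail must be 'brief' or 'full'")
    return (
        f"{BIB_API_BASE}/{detail}/"
        f"{quote(str(id_type), safe='')}/{quote(str(id_value), safe='')}.json"
    )


if __name__ == "__main__":
    print(bib_api_url("oclc", 424023))
    print(bib_api_url("oclc", 424023, detail="full"))
```

Keeping URL construction separate from the HTTP request makes the lookup easy to test and to swap onto a different client library.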
Datasets
• Google-digitized
  – ~2.8 million texts
  – Requires proposal to HathiTrust
  – Agreement with Google
  – Statement on use/management
• Non-Google-digitized
  – ~370,000 texts
  – Freely available
  – Statement on management
Research Center
• Environment to perform research on HathiTrust corpus
• http://www.hathitrust.org/htrc
• http://lib.umich.edu/mpach
• Package of tools to enable publication of open access born-digital journal content directly into HathiTrust
  – Including accompanying data and media files
• Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)
Source Archive
Editorial Market
Higher Education
Access Determinations
• Automated
• Manual
Automatic Rights Determination
• Conducted on all works at time of ingest and when records are modified
  – Public domain worldwide
    • US works published before 1923; US federal government publications; non-US works published prior to 1873
  – Public domain in the United States
    • Non-US works published prior to 1923
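The automatic rules on this slide can be sketched as a small decision function. This is a hedged illustration only: the `BibRecord` fields, the function name, and the fallback result for works that match no rule are assumptions, not HathiTrust's actual implementation or rights codes.

```python
from dataclasses import dataclass

PD_WORLDWIDE = "public domain worldwide"
PD_US = "public domain in the United States"
NOT_DETERMINED = "not determined automatically"  # assumed fallback, not from the slide


@dataclass
class BibRecord:
    """Hypothetical subset of bibliographic data needed by the rules."""
    pub_year: int
    us_published: bool
    us_federal_document: bool = False


def automatic_rights(record):
    """Apply the date and place rules listed on the slide, in order."""
    if record.us_federal_document:
        return PD_WORLDWIDE
    if record.us_published and record.pub_year < 1923:
        return PD_WORLDWIDE
    if not record.us_published and record.pub_year < 1873:
        return PD_WORLDWIDE
    if not record.us_published and record.pub_year < 1923:
        return PD_US
    return NOT_DETERMINED


if __name__ == "__main__":
    print(automatic_rights(BibRecord(pub_year=1900, us_published=True)))
    print(automatic_rights(BibRecord(pub_year=1900, us_published=False)))
```

Because the slide says determination runs at ingest and again when records are modified, a pure function of the bibliographic record like this one can simply be re-run whenever the record changes.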
Manual Rights Determination
• IMLS-funded CRMS project
  – CRMS-US
    • 2008: US-published works, 1923-1963
    • Staff at 4 partner institutions
  – CRMS-World
    • 2011: Expanded to non-US works
    • Staff at 16 partner institutions
  – Double review, with additional expert review for conflicts
  – Compliance with copyright formalities
  – As of January 2013: 241,541 reviewed; more than 132,644 opened
• Rights Holder Permissions
• System of Precedence
Rights Database
Bibliographic (automatic)
Manual
Lawful uses
• Users who have print disabilities
  – All in-copyright works in HathiTrust currently owned (or owned previously) by the partner institution
  – Must be authenticated
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/accessibility
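The access conditions for users who have print disabilities amount to a conjunction of checks. The sketch below shows one way to express them; the `AccessRequest` fields and the function name are hypothetical and do not describe HathiTrust's actual access control code.

```python
from dataclasses import dataclass


@dataclass
class AccessRequest:
    """Hypothetical description of one request to open an in-copyright work."""
    authenticated: bool    # user has logged in through the partner institution
    on_us_soil: bool       # request originates in the United States
    copies_owned: int      # copies currently or previously owned by the partner
    active_sessions: int   # sessions already open on this work for that partner


def may_open(request):
    """Return True only if every condition listed on the slide holds."""
    return (
        request.authenticated
        and request.on_us_soil
        # One simultaneous access per copy owned.
        and request.active_sessions < request.copies_owned
    )


if __name__ == "__main__":
    print(may_open(AccessRequest(True, True, copies_owned=2, active_sessions=1)))
    print(may_open(AccessRequest(True, True, copies_owned=1, active_sessions=1)))
```

A work the partner has never owned has `copies_owned` of 0, so the same comparison also enforces the ownership condition.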
Lawful uses (2)
• Out of print and brittle, missing
  – Works must be currently owned (or owned previously) by the partner institution
  – Must be authenticated or accessing work from library premises
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/out-of-print-brittle
• Access and use statements
  – http://www.hathitrust.org/access_use
Outline
• The Big Idea
  – Mission and Goals
• What we're doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work
e-Commerce
Print on Demand
Content Ingest
Transformation
Validation
Content Access
PageTurner
Collection Builder
Large-scale Search
Bibliographic Catalog
Research Center
APIs
Quality Assurance
Quality Review
Content Certification
User Services
Usability
User support (helpdesk)
Outreach
Project website
Monthly newsletter
Papers and presentations
Communication with potential partners
Surveys, general inquiries
Repository evaluation and audit (e.g. DRAMBORA, TRAC)
Legal
Risk management (use of materials)
Partner agreements
Advocacy
Governance
Budget Finances
Decision-making
Policy
Planning
Enterprise Management
Communication and Coordination with partner institutions
Project management
Repository Administration
Hardware configuration and maintenance
Web and application server configuration and maintenance
Security
Permissions
Logging
Repository Administration
Data management (content storage, backup, integrity checks, deletion)
Hardware selection and replacement
Content and Metadata specifications
Disaster Recovery
Processes for ensuring content integrity
Rights Management
Copyright determination
Copyright review
Copyright information management (database)
Rightsholder permissions
Bibliographic Data Management
Entity description (record-level)
Object identification (item-level)
Data availability
Collection Development
Digital
• Expansion beyond books and journals (born-digital, images and maps, audio)
• Selection of content (for non-Google volume ingest and pilot projects)
Print
• Cloud Library (effect of digital on print)
Financial contributions of partners
HathiTrust Functional Framework
HathiTrust
Strategic Advisory Board
Budget/Finances, Decision-making
Guidance on Policy, Planning
• Driven by needs of institutions
• Leverage across the partnership
• Projects: Print on Demand, Grant Work, Ingest Specifications, PageTurner, Bibliographic Data Management
Executive Committee
Collective Work: Working Groups and Committees
Operational
• Communications
• User Support
• User Experience
Strategic
• Collections
• Discovery Interface
• Full-text Search
Distributed work
Constitutional Convention
• October 2011
• 52 partners
• 3-year review overseen by SAB
• Ballot Proposals
  – Print monograph storage
  – Approval Process for development initiatives
  – US Government Documents
  – Fee-for-service content deposit
  – Governance
HathiTrust
Executive Committee
Strategic Advisory Board
Budget/Finances, Decision-making
Guidance on Policy, Planning
• 12-member Board of Governors
• Chief Executive Officer
• Executive Committee
Governance
• Efficient, practical
• Inclusive, collective
Outline
• The Big Idea
  – Mission and Goals
• What we're doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work
How HathiTrust Can Change the Way We Work
Seeing collective problems as collective
Breakdown of HathiTrust book corpus by publication date
Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building – 2/2011
[Pie chart: segments of 42, 19, 20, and 19]
Copyright status of books published pre-1923 and US works published 1923-1963
[Pie chart: segments of 42 (labeled "In Print"), 19, 20, and 19]
bull Identificationbull Descriptionbull Rights
Relationships
bull Identificationbull Descriptionbull Rightsbull Relationships
ndash Bibliographic records
Relationships
bull Identificationbull Descriptionbull Rightsbull Relationships
ndash Bibliographic recordsndash Bib records and objects
Relationships
bull Identificationbull Descriptionbull Rightsbull Relationships
ndash Bibliographic recordsndash Bib records and objectsndash Digital objects
Relationships
bull Identificationbull Descriptionbull Rightsbull Relationships
ndash Bibliographic recordsndash Bib records and objectsndash Digital objectsndash Digital and print
Relationships
Understanding the relationship between the collective and local
1st model Price per GB
2008 2009 2010 2011 2012 (Oct)Total Volumes 2477871 5221092 7836698 9966572 10531566
Public Domain 372085 758947 1959223 2712626 3218132
2008 2009 2010 2011 2012 (Oct)0
2000000
4000000
6000000
8000000
10000000
12000000
Total VolumesPublic Domain
0 20 40 60 80 100 1200
10
20
30
40
50
60
Rank in 2008 ARL Investment Index
o
f Tit
les
in L
ocal
Col
lecti
on
A global change in the library environment
June 2010Median duplication 31
June 2009Median duplication 19
Academic print book collection already substantially duplicated in mass digitized book corpus
Courtesy of Constance Malpas OCLC Research
Digitized Books in Shared Repositories
Sep-09 Oct-09 Nov-09 Dec-09 Jan-10 Feb-10 Mar-10 Apr-10 May-10 Jun-100
500000
1000000
1500000
2000000
2500000
3000000
3500000
Mass digitized books in Hathi digital repository Mass digitized books in shared print repositories
Uni
que
Titl
es
~75 of mass digitized corpus is lsquobacked uprsquo in one or more shared print repositories
~35M titles
~25M
Courtesy of Constance Malpas OCLC Research
Collection Overlap
bull More than 50 median overlap with ARL institutions higher for small liberal arts colleges
bull New Pricing model based on Print holdingsndash httpwwwhathitrustorgcostndash Requires print holdings databasendash Also support expansion of legal uses efforts in de-
duplicationndash Facilitate individual and collaborative collection
development and management operationsbull Print monographs archiving
Sourcing and Scalinghttporweblogoclcorgarchives002058html
bull Scalendash Institution-scalendash Group-scalendash Web-scale
bull Sourcingndash Institutionalndash Collaborativendash Third-party
A new kind of library
Thank you
How to find out more
bull About httpwwwhathitrustorgaboutbull Twitter httptwittercomhathitrustbull Facebook httpwwwfacebookcomhathitrustbull Monthly newsletter
ndash httpwwwhathitrustorgupdatesndash RSS httpwwwhathitrustorgupdates_rss
bull Contact us feedbackissueshathitrustorgbull Blogs httpwwwhathitrustorgblogs
ndash Large-scale Searchndash Perspectives from HathiTrust
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Descriptive headings added (hidden from GUI with CSS)
Info about SSD service amp link to accessibility page
Images used for style are in css so no need to use alt tags
Skip navigation link
Access keys for navigating pages with keyboard
Added labels amp descriptive titles to forms amp ToC table
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
APIs
bull Data APIndash Volume and rights informationndash Page imagesndash OCR
bull Bibliographic APIndash Volume and rights informationndash MARC records
bull OAIbull ldquoHathifilesrdquo
Datasets
bull Google-digitized ~28 million texts Requires proposal to HathiTrust Agreement with Google Statement on usemanagement
bull Non-Google-digitized ~370000 texts Freely available Statement on management
Research Center
bull Environment to perform research on HathiTrust corpus
bull httpwwwhathitrustorghtrc
bull httplibumichedumpachbull Package of tools to enable publication of open
access born-digital journal content directly into HathiTrustndash Including accompanying data and media files
bull Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)
Source Archive
Editorial Market
Higher Education
Access Determinations
bull Automatedbull Manual
Automatic Rights Determination
• Conducted on all works at time of ingest and when records are modified
  – Public domain worldwide
    • US works published before 1923; US federal government publications; non-US works published prior to 1873
  – Public domain in the United States
    • Non-US works published prior to 1923
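The date and origin rules above can be sketched as a small decision function. This is a minimal illustration, not HathiTrust's actual implementation: the `Work` fields, the function name, and the returned labels are all hypothetical, and the fallback label for works that match no rule is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Work:
    pub_year: int
    us_published: bool
    us_federal_document: bool = False


def automatic_rights(work: Work) -> str:
    """Apply the slide's rules in order, broadest public-domain status first."""
    if work.us_federal_document:
        return "public domain worldwide"
    if work.us_published and work.pub_year < 1923:
        return "public domain worldwide"
    if not work.us_published and work.pub_year < 1873:
        return "public domain worldwide"
    if not work.us_published and work.pub_year < 1923:
        return "public domain in the United States"
    # Assumption: works matching no rule are left for other handling,
    # such as the manual review described on the following slide.
    return "no public-domain rule matched"


if __name__ == "__main__":
    print(automatic_rights(Work(pub_year=1900, us_published=False)))
```

Because the slide says the determination is rerun when records are modified, a pure function of the bibliographic fields like this one can simply be called again on the updated record.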
Manual Rights Determination
• IMLS-funded CRMS project
  – CRMS-US
    • 2008: US-published works, 1923-1963
    • Staff at 4 partner institutions
  – CRMS-World
    • 2011: Expanded to non-US works
    • Staff at 16 partner institutions
  – Double review, with additional expert review for conflicts
  – Compliance with copyright formalities
  – As of January 2013: 241,541 reviewed, more than 132,644 opened
• Rights Holder Permissions
• System of Precedence
Rights Database
Bibliographic (automatic)
Manual
Lawful uses
• Users who have print disabilities
  – All in-copyright works in HathiTrust currently owned (or owned previously) by the partner institution
  – Must be authenticated
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/accessibility
Lawful uses (2)
• Out of print and brittle, missing
  – Works must be currently owned (or owned previously) by the partner institution
  – Must be authenticated or accessing work from library premises
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/out-of-print-brittle
• Access and use statements
  – http://www.hathitrust.org/access_use
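The conditions on the two lawful-use slides can be read as a single access check. The sketch below is one possible reading, not HathiTrust's code: `AccessRequest`, `may_open`, and the two use labels are hypothetical names, and the session count stands in for whatever mechanism enforces one simultaneous access per copy owned.

```python
from dataclasses import dataclass


@dataclass
class AccessRequest:
    authenticated: bool
    on_library_premises: bool
    on_us_soil: bool
    institution_holds_work: bool  # currently owned, or owned previously
    copies_owned: int
    active_sessions: int          # accesses already in progress for this work


def may_open(req: AccessRequest, use: str) -> bool:
    """Return True if every condition listed for the given use is met."""
    if use == "print_disability":
        identified = req.authenticated
    elif use == "out_of_print_brittle":
        # This use also admits unauthenticated users on library premises.
        identified = req.authenticated or req.on_library_premises
    else:
        raise ValueError(f"unknown use: {use!r}")
    return (
        identified
        and req.on_us_soil
        and req.institution_holds_work
        # One simultaneous access per copy owned.
        and req.active_sessions < req.copies_owned
    )
```

For example, a walk-in reader at the library who is not logged in would pass the out-of-print check but not the print-disability check, provided the other conditions hold.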
Outline
• The Big Idea
  – Mission and Goals
• What we’re doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work
e-Commerce
Print on Demand
Content Ingest
Transformation
Validation
Content Access
PageTurner
Collection Builder
Large-scale Search
Bibliographic Catalog
Research Center
APIs
Quality Assurance
Quality Review
Content Certification
User Services
Usability
User support (helpdesk)
Outreach
Project website
Monthly newsletter
Papers and presentations
Communication with potential partners
Surveys, general inquiries
Repository evaluation and audit (e.g., DRAMBORA, TRAC)
Legal
Risk management (use of materials)
Partner agreements
Advocacy
Governance
Budget/Finances
Decision-making
Policy
Planning
Enterprise Management
Communication and Coordination with partner institutions
Project management
Repository Administration
Hardware configuration and maintenance
Web and application server configuration and maintenance
Security
Permissions
Logging
Repository Administration
Data management (content storage, backup, integrity checks, deletion)
Hardware selection and replacement
Content and Metadata specifications
Disaster Recovery
Processes for ensuring content integrity
Rights Management
Copyright determination
Copyright review
Copyright information management (database)
Rightsholder permissions
Bibliographic Data Management
Entity description (record-level)
Object identification (item-level)
Data availability
Collection Development
Digital
• Expansion beyond books and journals (born-digital, images and maps, audio)
• Selection of content (for non-Google volume ingest and pilot projects)
Print
• Cloud Library (effect of digital on print)
Financial contributions of partners
HathiTrust Functional Framework
HathiTrust
Strategic Advisory Board
Budget/Finances, Decision-making
Guidance on Policy, Planning
• Driven by needs of institutions
• Leverage across the partnership
• Projects: Print on Demand, Grant Work, Ingest Specifications, PageTurner, Bibliographic Data Management
Executive Committee
Collective Work Working Groups and Committees
Operational
• Communications
• User Support
• User Experience
Strategic
• Collections
• Discovery Interface
• Full-text Search
Distributed work
Constitutional Convention
• October 2011
• 52 partners
• 3-year review overseen by SAB
• Ballot Proposals
  – Print monograph storage
  – Approval Process for development initiatives
  – US Government Documents
  – Fee-for-service content deposit
  – Governance
HathiTrust
Executive Committee
Strategic Advisory Board
Budget/Finances, Decision-making
Guidance on Policy, Planning
• 12-member Board of Governors
• Chief Executive Officer
• Executive Committee
Governance
• Efficient, practical
• Inclusive, collective
Outline
• The Big Idea
  – Mission and Goals
• What we’re doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work
How HathiTrust Can Change the Way We Work
Seeing collective problems as collective
Breakdown of HathiTrust book corpus by publication date
Source: Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building – 2/2011
Chart values: 42%, 19%, 20%, 19%
Copyright status of books published pre-1923 and US works published 1923-1963
Chart values: In Print 42%, 19%, 20%, 19%
• Identification
• Description
• Rights
• Relationships
  – Bibliographic records
  – Bib records and objects
  – Digital objects
  – Digital and print
Understanding the relationship between the collective and local
1st model: Price per GB
                2008       2009       2010       2011       2012 (Oct)
Total Volumes   2,477,871  5,221,092  7,836,698  9,966,572  10,531,566
Public Domain   372,085    758,947    1,959,223  2,712,626  3,218,132
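To see how the public domain portion of the corpus moves relative to total growth, the figures above can be turned into yearly shares. This is a small arithmetic sketch; the variable and function names are illustrative, and the numbers are copied from the table.

```python
from typing import Dict

YEARS = ["2008", "2009", "2010", "2011", "2012 (Oct)"]
TOTAL_VOLUMES = [2_477_871, 5_221_092, 7_836_698, 9_966_572, 10_531_566]
PUBLIC_DOMAIN = [372_085, 758_947, 1_959_223, 2_712_626, 3_218_132]


def public_domain_share() -> Dict[str, float]:
    """Return the public domain fraction of total volumes for each year."""
    return {
        year: pd / total
        for year, total, pd in zip(YEARS, TOTAL_VOLUMES, PUBLIC_DOMAIN)
    }


if __name__ == "__main__":
    for year, share in public_domain_share().items():
        print(f"{year}: {share:.1%}")
```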
Chart: Total Volumes and Public Domain volumes by year, 2008 to 2012 (Oct)
A global change in the library environment
Academic print book collection already substantially duplicated in mass digitized book corpus
Chart: % of Titles in Local Collection (y-axis) by Rank in 2008 ARL Investment Index (x-axis)
June 2009: Median duplication 19%
June 2010: Median duplication 31%
Courtesy of Constance Malpas, OCLC Research
Digitized Books in Shared Repositories
Chart: Unique Titles by month, Sep-09 to Jun-10; series: Mass digitized books in Hathi digital repository, Mass digitized books in shared print repositories
~75% of mass digitized corpus is ‘backed up’ in one or more shared print repositories
~3.5M titles
~2.5M
Courtesy of Constance Malpas, OCLC Research
Collection Overlap
• More than 50% median overlap with ARL institutions; higher for small liberal arts colleges
• New pricing model based on print holdings
  – http://www.hathitrust.org/cost
  – Requires print holdings database
  – Also support expansion of legal uses, efforts in de-duplication
  – Facilitate individual and collaborative collection development and management operations
• Print monographs archiving
Sourcing and Scaling
http://orweblog.oclc.org/archives/002058.html
• Scale
  – Institution-scale
  – Group-scale
  – Web-scale
• Sourcing
  – Institutional
  – Collaborative
  – Third-party
A new kind of library
Thank you
How to find out more
• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
  – http://www.hathitrust.org/updates
  – RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
  – Large-scale Search
  – Perspectives from HathiTrust
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
APIs
bull Data APIndash Volume and rights informationndash Page imagesndash OCR
bull Bibliographic APIndash Volume and rights informationndash MARC records
bull OAIbull ldquoHathifilesrdquo
Datasets
bull Google-digitized ~28 million texts Requires proposal to HathiTrust Agreement with Google Statement on usemanagement
bull Non-Google-digitized ~370000 texts Freely available Statement on management
Research Center
bull Environment to perform research on HathiTrust corpus
bull httpwwwhathitrustorghtrc
bull httplibumichedumpachbull Package of tools to enable publication of open
access born-digital journal content directly into HathiTrustndash Including accompanying data and media files
bull Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)
Source Archive
Editorial Market
Higher Education
Access Determinations
bull Automatedbull Manual
Automatic Rights Determination
bull Conducted on all works at time of ingest and when records are modifiedndash Public domain worldwide
bull US works published before 1923 US federal government publications non-US works published prior to 1873
ndash Public domain in the United Statesbull Non-US works published prior to 1923
Manual Rights Determination
bull IMLS-funded CRMS projectndash CRMS-US
bull 2008 US-published works 1923-1963bull Staff at 4 partner institutions
ndash CRMS-Worldbull 2011 Expanded to non-US worksbull Staff at 16 partner institutions
ndash Double review with additional expert review for conflictsndash Compliance with copyright formalitiesndash As of January 2013 241541 reviewed more than 132644
openedbull Rights Holder Permissions
bull System of Precedence
Rights Database
Bibliographic (automatic)
Manual
Lawful uses
bull Users who have print disabilitiesndash All in-copyright works in HathiTrust currently
owned (or owned previously) by the partner institution
ndash Must be authenticatedndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgaccessibility
Lawful uses (2)
bull Out of print and brittle missingndash Works must be currently owned (or owned
previously) by the partner institutionndash Must be authenticated or accessing work from
library premisesndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgout-of-print-brittle
bull Access and use statementsndash httpwwwhathitrustorgaccess_use
Outline
bull The Big Idea ndash Mission and Goals
bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure
bull How HathiTrust can change the way we work
e-Commerce
Print on Demand
Content Ingest
Transformation
Validation
Content Access
PageTurner
Collection Builder
Large-scale Search
Bibliographic Catalog
Research Center
APIs
Quality Assurance
Quality Review
Content Certification
User Services
Usability
User support (helpdesk)
Outreach
Project website
Monthly newsletter
Papers and presentations
Communication with potential
partners
Surveys general inquiries
Repository evaluation and
audit (eg DRAMBORA
TRAC)
Legal
Risk management (use of materials)
Partner agreements
Advocacy
Governance
Budget Finances
Decision-making
Policy
Planning
Enterprise Management
Communication and Coordination
with partner institutions
Project management
Repository Administration
Hardware configuration and
maintenance
Web and application server configuration and
maintenance
Security
Permissions
Logging
Repository Administration
Data management (content storage backup integrity checks deletion)
Hardware selection and replacement
Content and Metadata
specifications
Disaster Recovery
Processes for ensuring content
integrity
Rights Management
Copyright determination
Copyright review
Copyright information
management (database)
Rightsholder permissions
Bibliographic Data
Management
Entity description (record-level)
Object identification (item-
level)
Data availability
Collection Development
Digitalbull Expansion beyond
books and journals (born-digital images and maps audio)
bull Selection of content (for non-Google volume ingest and pilots projects)
Printbull Cloud Library (effect
of digital on print)
Financial contributions of partners
HathiTrust Functional Framework
HathiTrust
Strategic Advisory BoardBudgetFinances Decision-making
Guidance on Policy Planning
bull Driven by needs of institutionsbull Leverage across the partnershipbull Projects Print on Demand Grant Work Ingest Specifications
PageTurner Bibliographic Data Management
Executive Committee
Collective Work Working Groups and Committees
Operationalbull Communicationsbull User Supportbull User Experience
Operationalbull Communicationsbull User Supportbull User Experience
Strategicbull Collectionsbull Discovery Interfacebull Full-text Search
Distributed work
Constitutional Convention
• October 2011
• 52 partners
• 3-year review overseen by SAB
• Ballot Proposals
  – Print monograph storage
  – Approval Process for development initiatives
  – US Government Documents
  – Fee-for-service content deposit
  – Governance
HathiTrust
Executive Committee
Strategic Advisory Board
Budget/Finances, Decision-making
Guidance on Policy, Planning
• 12-member Board of Governors
• Chief Executive Officer
• Executive Committee
Governance
• Efficient, practical
• Inclusive, collective
Outline
• The Big Idea
  – Mission and Goals
• What we're doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work
How HathiTrust Can Change the Way We Work
Seeing collective problems as collective
Chart: Breakdown of HathiTrust book corpus by publication date (segment values 42, 19, 20, 19)
Source: Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building – 2/2011
Chart: Copyright status of books published pre-1923 and US works published 1923-1963 (segment values 42, 19, 20, 19; the 42 segment is labeled In Print)
• Identification
• Description
• Rights
• Relationships
  – Bibliographic records
  – Bib records and objects
  – Digital objects
  – Digital and print
Relationships
Understanding the relationship between the collective and local
1st model: Price per GB

                 2008        2009        2010        2011        2012 (Oct)
Total Volumes    2,477,871   5,221,092   7,836,698   9,966,572   10,531,566
Public Domain    372,085     758,947     1,959,223   2,712,626   3,218,132

Chart: Total Volumes and Public Domain by year, 2008 to 2012 (Oct); vertical axis 0 to 12,000,000
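The growth figures above can be read as a public domain share per year. The following is a minimal sketch in Python that uses only the counts given on the slide; the dictionary and function names are illustrative and are not part of any HathiTrust tool.

```python
# Volume counts copied from the slide (the 2012 figures are as of October).
TOTAL_VOLUMES = {
    2008: 2_477_871,
    2009: 5_221_092,
    2010: 7_836_698,
    2011: 9_966_572,
    2012: 10_531_566,
}
PUBLIC_DOMAIN = {
    2008: 372_085,
    2009: 758_947,
    2010: 1_959_223,
    2011: 2_712_626,
    2012: 3_218_132,
}


def public_domain_share(year):
    """Return public domain volumes as a percentage of all volumes for a year."""
    return 100.0 * PUBLIC_DOMAIN[year] / TOTAL_VOLUMES[year]


if __name__ == "__main__":
    for year in sorted(TOTAL_VOLUMES):
        print(f"{year}: {public_domain_share(year):.1f}% public domain")
```

On these figures the public domain share roughly doubles over the period, from about 15% in 2008 to just over 30% in October 2012.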
A global change in the library environment
Chart: % of Titles in Local Collection (vertical axis, 0 to 60) against Rank in 2008 ARL Investment Index (horizontal axis, 0 to 120)
June 2010: Median duplication 31%
June 2009: Median duplication 19%
Academic print book collection already substantially duplicated in mass digitized book corpus
Courtesy of Constance Malpas, OCLC Research
Digitized Books in Shared Repositories
Chart: Unique Titles (vertical axis, 0 to 3,500,000) by month, Sep-09 to Jun-10. Series: Mass digitized books in Hathi digital repository; Mass digitized books in shared print repositories. Annotations: ~3.5M titles; ~2.5M
~75% of mass digitized corpus is 'backed up' in one or more shared print repositories
Courtesy of Constance Malpas, OCLC Research
Collection Overlap
• More than 50% median overlap with ARL institutions; higher for small liberal arts colleges
• New Pricing model based on Print holdings
  – http://www.hathitrust.org/cost
  – Requires print holdings database
  – Also support expansion of legal uses, efforts in de-duplication
  – Facilitate individual and collaborative collection development and management operations
• Print monographs archiving
Sourcing and Scaling
http://orweblog.oclc.org/archives/002058.html
• Scale
  – Institution-scale
  – Group-scale
  – Web-scale
• Sourcing
  – Institutional
  – Collaborative
  – Third-party
A new kind of library
Thank you
How to find out more
• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
  – http://www.hathitrust.org/updates
  – RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
  – Large-scale Search
  – Perspectives from HathiTrust
Access
Catalog
Full-text Search
PageTurner
APIs
Collections
Datasets
APIs
• Data API
  – Volume and rights information
  – Page images
  – OCR
• Bibliographic API
  – Volume and rights information
  – MARC records
• OAI
• “Hathifiles”
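As an illustration of how a Bibliographic API lookup might be addressed, the sketch below only builds a request URL and makes no network call. The base URL, the brief/full detail levels, and the identifier types are assumptions drawn from the public API documentation, not from these slides, and should be checked against the current documentation before use. The function name is hypothetical.

```python
from urllib.parse import quote

# Assumed values, not taken from the slides: verify against current API docs.
BIB_API_BASE = "https://catalog.hathitrust.org/api/volumes"
ID_TYPES = {"oclc", "lccn", "issn", "isbn", "htid", "recordnumber"}
DETAIL_LEVELS = {"brief", "full"}


def bib_api_url(id_type, identifier, detail="brief"):
    """Build a single-identifier Bibliographic API URL (hypothetical helper)."""
    if detail not in DETAIL_LEVELS:
        raise ValueError(f"unknown detail level: {detail!r}")
    if id_type not in ID_TYPES:
        raise ValueError(f"unknown identifier type: {id_type!r}")
    # Percent-encode the identifier so it is safe as one path segment.
    safe_id = quote(str(identifier), safe="")
    return f"{BIB_API_BASE}/{detail}/{id_type}/{safe_id}.json"


if __name__ == "__main__":
    print(bib_api_url("oclc", 424023))
```

Keeping URL construction separate from any HTTP client makes the assumed pattern easy to correct in one place if the service layout differs.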
Datasets
• Google-digitized: ~2.8 million texts. Requires proposal to HathiTrust; Agreement with Google; Statement on use/management
• Non-Google-digitized: ~370,000 texts. Freely available; Statement on management
Research Center
• Environment to perform research on HathiTrust corpus
• http://www.hathitrust.org/htrc
• http://lib.umich.edu/mpach
• Package of tools to enable publication of open access born-digital journal content directly into HathiTrust
  – Including accompanying data and media files
• Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)
Source Archive
Editorial Market
Higher Education
Access Determinations
• Automated
• Manual
Automatic Rights Determination
• Conducted on all works at time of ingest and when records are modified
  – Public domain worldwide
    • US works published before 1923; US federal government publications; non-US works published prior to 1873
  – Public domain in the United States
    • Non-US works published prior to 1923
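The date and origin rules listed above can be sketched as a small decision function. This is an illustration of the rules as stated on the slide, not HathiTrust's actual implementation: the `Work` fields, the function name, and the fallback label for works the rules do not open are all assumptions, and "before" and "prior to" are read as strictly earlier than the given year.

```python
from dataclasses import dataclass

PD_WORLDWIDE = "public domain worldwide"
PD_US = "public domain in the United States"
NOT_OPENED = "not opened automatically"  # assumed label, not from the slides


@dataclass
class Work:
    pub_year: int
    us_work: bool                         # published in the US
    us_federal_publication: bool = False  # US federal government publication


def automatic_rights(work: Work) -> str:
    """Apply the slide's automatic rules to one work (hypothetical helper)."""
    if work.us_federal_publication:
        return PD_WORLDWIDE
    if work.us_work:
        # US works published before 1923
        return PD_WORLDWIDE if work.pub_year < 1923 else NOT_OPENED
    if work.pub_year < 1873:
        # Non-US works published prior to 1873
        return PD_WORLDWIDE
    if work.pub_year < 1923:
        # Non-US works published prior to 1923
        return PD_US
    return NOT_OPENED


if __name__ == "__main__":
    print(automatic_rights(Work(pub_year=1900, us_work=False)))
```

Works that fall through to the last branch are the ones that would need another route, such as manual review, before being opened.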
Manual Rights Determination
• IMLS-funded CRMS project
  – CRMS-US
    • 2008: US-published works 1923-1963
    • Staff at 4 partner institutions
  – CRMS-World
    • 2011: Expanded to non-US works
    • Staff at 16 partner institutions
  – Double review with additional expert review for conflicts
  – Compliance with copyright formalities
  – As of January 2013: 241,541 reviewed; more than 132,644 opened
• Rights Holder Permissions
• System of Precedence
Rights Database
Bibliographic (automatic)
Manual
Lawful uses
• Users who have print disabilities
  – All in-copyright works in HathiTrust currently owned (or owned previously) by the partner institution
  – Must be authenticated
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/accessibility
Lawful uses (2)
• Out of print and brittle, missing
  – Works must be currently owned (or owned previously) by the partner institution
  – Must be authenticated or accessing work from library premises
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/out-of-print-brittle
• Access and use statements
  – http://www.hathitrust.org/access_use
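The conditions on the two lawful-use slides can be written down as simple checks. The sketch below is a reading of the listed conditions only, under stated assumptions: the `AccessRequest` fields and function names are hypothetical, "owned (or owned previously)" is modeled as a count of print copies held, and "one simultaneous access per copy owned" is modeled as the number of active sessions staying below that count.

```python
from dataclasses import dataclass


@dataclass
class AccessRequest:
    copies_held: int        # print copies the partner owns or previously owned
    active_sessions: int    # users currently viewing this work
    on_us_soil: bool
    authenticated: bool
    on_library_premises: bool = False


def _copy_available(req: AccessRequest) -> bool:
    # One simultaneous access per copy owned (assumed interpretation).
    return req.copies_held > 0 and req.active_sessions < req.copies_held


def print_disability_access(req: AccessRequest) -> bool:
    """Conditions listed for users who have print disabilities."""
    return req.authenticated and req.on_us_soil and _copy_available(req)


def brittle_or_out_of_print_access(req: AccessRequest) -> bool:
    """Conditions listed for out of print and brittle or missing works."""
    identified = req.authenticated or req.on_library_premises
    return identified and req.on_us_soil and _copy_available(req)


if __name__ == "__main__":
    walk_in = AccessRequest(copies_held=1, active_sessions=0, on_us_soil=True,
                            authenticated=False, on_library_premises=True)
    print(print_disability_access(walk_in), brittle_or_out_of_print_access(walk_in))
```

The only difference between the two checks is that the second also accepts an unauthenticated user on library premises, which mirrors the wording of the second slide.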
e-Commerce
Print on Demand
Content Ingest
Transformation
Validation
Content Access
PageTurner
Collection Builder
Large-scale Search
Bibliographic Catalog
Research Center
APIs
Quality Assurance
Quality Review
Content Certification
User Services
Usability
User support (helpdesk)
Outreach
Project website
Monthly newsletter
Papers and presentations
Communication with potential
partners
Surveys general inquiries
Repository evaluation and
audit (eg DRAMBORA
TRAC)
Legal
Risk management (use of materials)
Partner agreements
Advocacy
Governance
Budget Finances
Decision-making
Policy
Planning
Enterprise Management
Communication and Coordination
with partner institutions
Project management
Repository Administration
Hardware configuration and
maintenance
Web and application server configuration and
maintenance
Security
Permissions
Logging
Repository Administration
Data management (content storage backup integrity checks deletion)
Hardware selection and replacement
Content and Metadata
specifications
Disaster Recovery
Processes for ensuring content
integrity
Rights Management
Copyright determination
Copyright review
Copyright information
management (database)
Rightsholder permissions
Bibliographic Data
Management
Entity description (record-level)
Object identification (item-
level)
Data availability
Collection Development
Digitalbull Expansion beyond
books and journals (born-digital images and maps audio)
bull Selection of content (for non-Google volume ingest and pilots projects)
Printbull Cloud Library (effect
of digital on print)
Financial contributions of partners
HathiTrust Functional Framework
HathiTrust
Strategic Advisory BoardBudgetFinances Decision-making
Guidance on Policy Planning
bull Driven by needs of institutionsbull Leverage across the partnershipbull Projects Print on Demand Grant Work Ingest Specifications
PageTurner Bibliographic Data Management
Executive Committee
Collective Work Working Groups and Committees
Operationalbull Communicationsbull User Supportbull User Experience
Operationalbull Communicationsbull User Supportbull User Experience
Strategicbull Collectionsbull Discovery Interfacebull Full-text Search
Distributed work
Constitutional Convention
bull October 2011bull 52 partnersbull 3-year review overseen by SABbull Ballot Proposals
ndash Print monograph storagendash Approval Process for development initiativesndash US Government Documentsndash Fee-for-service content depositndash Governance
HathiTrust
Executive Committee
Strategic Advisory
Board
BudgetFinancesDecision-making
Guidance on Policy Planning
bull 12-member Board of Governors
bull Chief Executive Officer bull Executive Committee
Governance
bull Efficient practicalbull Inclusive collective
Outline
bull The Big Idea ndash Mission and Goals
bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure
bull How HathiTrust can change the way we work
How HathiTrust Can Change the Way We Work
Seeing collective problems as collective
Breakdown of HathiTrust book corpus by publication date
Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building ndash 22011
42
19
20
19
Breakdown of HathiTrust book corpus by publication date
42
19
20
19
Copyright status of books published pre-1923 and US works published 1923-1963
42
19
20
Copyright status of books published pre-1923 and US works published 1923-1963
42
19
20
19
Copyright status of books published pre-1923 and US works published 1923-1963
In Print 42
19
20
19
bull Identificationbull Descriptionbull Rights
Relationships
bull Identificationbull Descriptionbull Rightsbull Relationships
ndash Bibliographic records
Relationships
bull Identificationbull Descriptionbull Rightsbull Relationships
ndash Bibliographic recordsndash Bib records and objects
Relationships
bull Identificationbull Descriptionbull Rightsbull Relationships
ndash Bibliographic recordsndash Bib records and objectsndash Digital objects
Relationships
bull Identificationbull Descriptionbull Rightsbull Relationships
ndash Bibliographic recordsndash Bib records and objectsndash Digital objectsndash Digital and print
Relationships
Understanding the relationship between the collective and local
1st model Price per GB
2008 2009 2010 2011 2012 (Oct)Total Volumes 2477871 5221092 7836698 9966572 10531566
Public Domain 372085 758947 1959223 2712626 3218132
2008 2009 2010 2011 2012 (Oct)0
2000000
4000000
6000000
8000000
10000000
12000000
Total VolumesPublic Domain
0 20 40 60 80 100 1200
10
20
30
40
50
60
Rank in 2008 ARL Investment Index
o
f Tit
les
in L
ocal
Col
lecti
on
A global change in the library environment
June 2010Median duplication 31
June 2009Median duplication 19
Academic print book collection already substantially duplicated in mass digitized book corpus
Courtesy of Constance Malpas OCLC Research
Digitized Books in Shared Repositories
Sep-09 Oct-09 Nov-09 Dec-09 Jan-10 Feb-10 Mar-10 Apr-10 May-10 Jun-100
500000
1000000
1500000
2000000
2500000
3000000
3500000
Mass digitized books in Hathi digital repository Mass digitized books in shared print repositories
Uni
que
Titl
es
~75 of mass digitized corpus is lsquobacked uprsquo in one or more shared print repositories
~35M titles
~25M
Courtesy of Constance Malpas OCLC Research
Collection Overlap
bull More than 50 median overlap with ARL institutions higher for small liberal arts colleges
bull New Pricing model based on Print holdingsndash httpwwwhathitrustorgcostndash Requires print holdings databasendash Also support expansion of legal uses efforts in de-
duplicationndash Facilitate individual and collaborative collection
development and management operationsbull Print monographs archiving
Sourcing and Scalinghttporweblogoclcorgarchives002058html
bull Scalendash Institution-scalendash Group-scalendash Web-scale
bull Sourcingndash Institutionalndash Collaborativendash Third-party
A new kind of library
Thank you
How to find out more
bull About httpwwwhathitrustorgaboutbull Twitter httptwittercomhathitrustbull Facebook httpwwwfacebookcomhathitrustbull Monthly newsletter
ndash httpwwwhathitrustorgupdatesndash RSS httpwwwhathitrustorgupdates_rss
bull Contact us feedbackissueshathitrustorgbull Blogs httpwwwhathitrustorgblogs
ndash Large-scale Searchndash Perspectives from HathiTrust
APIs
bull Data APIndash Volume and rights informationndash Page imagesndash OCR
bull Bibliographic APIndash Volume and rights informationndash MARC records
bull OAIbull ldquoHathifilesrdquo
Datasets
bull Google-digitized ~28 million texts Requires proposal to HathiTrust Agreement with Google Statement on usemanagement
bull Non-Google-digitized ~370000 texts Freely available Statement on management
Research Center
bull Environment to perform research on HathiTrust corpus
bull httpwwwhathitrustorghtrc
• http://lib.umich.edu/mpach
• Package of tools to enable publication of open access born-digital journal content directly into HathiTrust
  – Including accompanying data and media files
• Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)
Source Archive
Editorial Market
Higher Education
Access Determinations
• Automated
• Manual
Automatic Rights Determination
• Conducted on all works at time of ingest and when records are modified
  – Public domain worldwide
    • US works published before 1923; US federal government publications; non-US works published prior to 1873
  – Public domain in the United States
    • Non-US works published prior to 1923
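The date and place rules above can be read as a small decision procedure. The following is a minimal sketch under stated assumptions: the `BibRecord` fields and the returned labels are hypothetical names chosen for this example, not HathiTrust's actual rights codes, and the fallback for unmatched works is an assumption.

```python
from dataclasses import dataclass


@dataclass
class BibRecord:
    """Hypothetical, simplified view of a bibliographic record."""
    pub_year: int
    us_published: bool
    us_federal_document: bool = False


def automatic_rights(record: BibRecord) -> str:
    """Apply the slide's date and place rules in order."""
    if record.us_federal_document:
        return "public domain worldwide"
    if record.us_published:
        if record.pub_year < 1923:
            return "public domain worldwide"
    else:
        if record.pub_year < 1873:
            return "public domain worldwide"
        if record.pub_year < 1923:
            return "public domain in the United States"
    # Assumption: anything not matched stays closed until reviewed manually.
    return "not determined as public domain"


print(automatic_rights(BibRecord(pub_year=1900, us_published=False)))
```

Because the rules depend only on fields in the bibliographic record, they can be re-run whenever a record is modified, which matches the slide's note about when the determination is conducted.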
Manual Rights Determination
• IMLS-funded CRMS project
  – CRMS-US
    • 2008: US-published works, 1923-1963
    • Staff at 4 partner institutions
  – CRMS-World
    • 2011: Expanded to non-US works
    • Staff at 16 partner institutions
  – Double review, with additional expert review for conflicts
  – Compliance with copyright formalities
  – As of January 2013: 241,541 reviewed; more than 132,644 opened
• Rights Holder Permissions
• System of Precedence
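The double-review step above can be sketched as follows. This is only an illustration of the idea (two independent reviews, with an expert deciding conflicts): the function name and the "pd"/"ic" labels are hypothetical and do not describe the CRMS software itself. The last lines work out the share of reviewed volumes that were opened, using the January 2013 figures from the slide.

```python
from typing import Optional


def resolve_determination(first: str, second: str,
                          expert: Optional[str] = None) -> Optional[str]:
    """Combine two independent reviews of one volume.

    Matching reviews stand. A conflict returns the expert's call,
    or None while the volume is still waiting for expert review.
    """
    if first == second:
        return first
    return expert


# Figures from the slide (as of January 2013).
reviewed = 241_541
opened = 132_644  # the slide says "more than" this many
share_opened = 100 * opened / reviewed

print(resolve_determination("pd", "ic", expert="ic"))
print(f"{share_opened:.1f}% of reviewed volumes opened")
```

On those figures, a little under 55% of the reviewed volumes had been opened.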
Rights Database
Bibliographic (automatic)
Manual
Lawful uses
• Users who have print disabilities
  – All in-copyright works in HathiTrust currently owned (or owned previously) by the partner institution
  – Must be authenticated
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/accessibility
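The access conditions above combine into a single check. The sketch below is an assumption-laden illustration: the `AccessRequest` fields are hypothetical, eligibility as a user with a print disability is assumed to be established elsewhere, and "one simultaneous access per copy owned" is modelled as a simple count of active readers.

```python
from dataclasses import dataclass


@dataclass
class AccessRequest:
    """Hypothetical inputs for one request to open an in-copyright work."""
    authenticated: bool
    in_united_states: bool
    copies_owned: int      # print copies the partner owns or previously owned
    active_sessions: int   # readers using this work right now


def may_open(request: AccessRequest) -> bool:
    """True only when every condition on the slide holds."""
    return (
        request.authenticated
        and request.in_united_states
        and request.copies_owned > 0
        and request.active_sessions < request.copies_owned
    )


print(may_open(AccessRequest(True, True, copies_owned=2, active_sessions=1)))
```

Counting active sessions against copies owned keeps digital use within the number of print copies the institution holds.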
Lawful uses (2)
• Out of print and brittle/missing
  – Works must be currently owned (or owned previously) by the partner institution
  – Must be authenticated or accessing work from library premises
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/out-of-print-brittle
• Access and use statements
  – http://www.hathitrust.org/access_use
Outline
• The Big Idea
  – Mission and Goals
• What we’re doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work
e-Commerce
Print on Demand
Content Ingest
Transformation
Validation
Content Access
PageTurner
Collection Builder
Large-scale Search
Bibliographic Catalog
Research Center
APIs
Quality Assurance
Quality Review
Content Certification
User Services
Usability
User support (helpdesk)
Outreach
Project website
Monthly newsletter
Papers and presentations
Communication with potential partners
Surveys, general inquiries
Repository evaluation and audit (e.g., DRAMBORA, TRAC)
Legal
Risk management (use of materials)
Partner agreements
Advocacy
Governance
Budget/Finances
Decision-making
Policy
Planning
Enterprise Management
Communication and Coordination with partner institutions
Project management
Repository Administration
Hardware configuration and maintenance
Web and application server configuration and maintenance
Security
Permissions
Logging
Repository Administration
Data management (content storage, backup, integrity checks, deletion)
Hardware selection and replacement
Content and Metadata specifications
Disaster Recovery
Processes for ensuring content integrity
Rights Management
Copyright determination
Copyright review
Copyright information management (database)
Rightsholder permissions
Bibliographic Data Management
Entity description (record-level)
Object identification (item-level)
Data availability
Collection Development
Digital
• Expansion beyond books and journals (born-digital, images and maps, audio)
• Selection of content (for non-Google volume ingest and pilot projects)
Print
• Cloud Library (effect of digital on print)
Financial contributions of partners
HathiTrust Functional Framework
HathiTrust
Strategic Advisory Board
Budget/Finances, Decision-making
Guidance on Policy, Planning
• Driven by needs of institutions
• Leverage across the partnership
• Projects: Print on Demand, Grant Work, Ingest Specifications, PageTurner, Bibliographic Data Management
Executive Committee
Collective Work Working Groups and Committees
Operational
• Communications
• User Support
• User Experience
Strategic
• Collections
• Discovery Interface
• Full-text Search
Distributed work
Constitutional Convention
• October 2011
• 52 partners
• 3-year review overseen by SAB
• Ballot Proposals
  – Print monograph storage
  – Approval process for development initiatives
  – US Government Documents
  – Fee-for-service content deposit
  – Governance
HathiTrust
Executive Committee
Strategic Advisory Board
Budget/Finances, Decision-making
Guidance on Policy, Planning
• 12-member Board of Governors
• Chief Executive Officer
• Executive Committee
Governance
• Efficient, practical
• Inclusive, collective
Outline
• The Big Idea
  – Mission and Goals
• What we’re doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work
How HathiTrust Can Change the Way We Work
Seeing collective problems as collective
Breakdown of HathiTrust book corpus by publication date
Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building – 2/2011
[Chart: segment values 42, 19, 20, 19]
Copyright status of books published pre-1923 and US works published 1923-1963
[Chart: segment values 42, 19, 20, 19; one segment labelled “In Print”]
• Identification
• Description
• Rights
• Relationships
  – Bibliographic records
  – Bib records and objects
  – Digital objects
  – Digital and print
Understanding the relationship between the collective and local
1st model: Price per GB
                2008        2009        2010        2011        2012 (Oct)
Total Volumes   2,477,871   5,221,092   7,836,698   9,966,572   10,531,566
Public Domain   372,085     758,947     1,959,223   2,712,626   3,218,132
[Chart: Total Volumes and Public Domain by year, 2008 to 2012 (Oct); vertical axis 0 to 12,000,000]
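To make the trend in the figures above easier to see, the short sketch below computes the public domain share of the corpus for each year. The numbers are taken from the slide; the variable names are only for illustration.

```python
years = ["2008", "2009", "2010", "2011", "2012 (Oct)"]
total_volumes = [2_477_871, 5_221_092, 7_836_698, 9_966_572, 10_531_566]
public_domain = [372_085, 758_947, 1_959_223, 2_712_626, 3_218_132]

# Public domain volumes as a percentage of all volumes, per year.
pd_share = {
    year: round(100 * pd / total, 1)
    for year, pd, total in zip(years, public_domain, total_volumes)
}

for year, share in pd_share.items():
    print(f"{year}: {share}% public domain")
```

On these figures the public domain share roughly doubles, from about 15% in 2008 to about 31% by October 2012, while the corpus as a whole grows more than fourfold.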
A global change in the library environment
[Scatter chart: % of Titles in Local Collection (0 to 60) against Rank in 2008 ARL Investment Index (0 to 120)]
June 2010: Median duplication 31%
June 2009: Median duplication 19%
Academic print book collection already substantially duplicated in mass digitized book corpus
Courtesy of Constance Malpas, OCLC Research
bull httplibumichedumpachbull Package of tools to enable publication of open
access born-digital journal content directly into HathiTrustndash Including accompanying data and media files
bull Allows integration with popular journal publishing tools such as Open Journal Systems (OJS)
Source Archive
Editorial Market
Higher Education
Access Determinations
bull Automatedbull Manual
Automatic Rights Determination
bull Conducted on all works at time of ingest and when records are modifiedndash Public domain worldwide
bull US works published before 1923 US federal government publications non-US works published prior to 1873
ndash Public domain in the United Statesbull Non-US works published prior to 1923
Manual Rights Determination
bull IMLS-funded CRMS projectndash CRMS-US
bull 2008 US-published works 1923-1963bull Staff at 4 partner institutions
ndash CRMS-Worldbull 2011 Expanded to non-US worksbull Staff at 16 partner institutions
ndash Double review with additional expert review for conflictsndash Compliance with copyright formalitiesndash As of January 2013 241541 reviewed more than 132644
openedbull Rights Holder Permissions
bull System of Precedence
Rights Database
Bibliographic (automatic)
Manual
Lawful uses
bull Users who have print disabilitiesndash All in-copyright works in HathiTrust currently
owned (or owned previously) by the partner institution
ndash Must be authenticatedndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgaccessibility
Lawful uses (2)
bull Out of print and brittle missingndash Works must be currently owned (or owned
previously) by the partner institutionndash Must be authenticated or accessing work from
library premisesndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgout-of-print-brittle
bull Access and use statementsndash httpwwwhathitrustorgaccess_use
Outline
bull The Big Idea ndash Mission and Goals
bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure
bull How HathiTrust can change the way we work
e-Commerce
Print on Demand
Content Ingest
Transformation
Validation
Content Access
PageTurner
Collection Builder
Large-scale Search
Bibliographic Catalog
Research Center
APIs
Quality Assurance
Quality Review
Content Certification
User Services
Usability
User support (helpdesk)
Outreach
Project website
Monthly newsletter
Papers and presentations
Communication with potential
partners
Surveys general inquiries
Repository evaluation and
audit (eg DRAMBORA
TRAC)
Legal
Risk management (use of materials)
Partner agreements
Advocacy
Governance
Budget Finances
Decision-making
Policy
Planning
Enterprise Management
Communication and Coordination
with partner institutions
Project management
Repository Administration
Hardware configuration and
maintenance
Web and application server configuration and
maintenance
Security
Permissions
Logging
Repository Administration
Data management (content storage backup integrity checks deletion)
Hardware selection and replacement
Content and Metadata
specifications
Disaster Recovery
Processes for ensuring content
integrity
Rights Management
Copyright determination
Copyright review
Copyright information
management (database)
Rightsholder permissions
Bibliographic Data
Management
Entity description (record-level)
Object identification (item-
level)
Data availability
Collection Development
Digitalbull Expansion beyond
books and journals (born-digital images and maps audio)
bull Selection of content (for non-Google volume ingest and pilots projects)
Printbull Cloud Library (effect
of digital on print)
Financial contributions of partners
HathiTrust Functional Framework
HathiTrust
Strategic Advisory BoardBudgetFinances Decision-making
Guidance on Policy Planning
bull Driven by needs of institutionsbull Leverage across the partnershipbull Projects Print on Demand Grant Work Ingest Specifications
PageTurner Bibliographic Data Management
Executive Committee
Collective Work Working Groups and Committees
Operationalbull Communicationsbull User Supportbull User Experience
Operationalbull Communicationsbull User Supportbull User Experience
Strategicbull Collectionsbull Discovery Interfacebull Full-text Search
Distributed work
Constitutional Convention
bull October 2011bull 52 partnersbull 3-year review overseen by SABbull Ballot Proposals
ndash Print monograph storagendash Approval Process for development initiativesndash US Government Documentsndash Fee-for-service content depositndash Governance
HathiTrust
Executive Committee
Strategic Advisory
Board
BudgetFinancesDecision-making
Guidance on Policy Planning
bull 12-member Board of Governors
bull Chief Executive Officer bull Executive Committee
Governance
bull Efficient practicalbull Inclusive collective
Outline
bull The Big Idea ndash Mission and Goals
bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure
bull How HathiTrust can change the way we work
How HathiTrust Can Change the Way We Work
Seeing collective problems as collective
Breakdown of HathiTrust book corpus by publication date
Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building ndash 22011
42
19
20
19
Breakdown of HathiTrust book corpus by publication date
42
19
20
19
Copyright status of books published pre-1923 and US works published 1923-1963
42
19
20
Copyright status of books published pre-1923 and US works published 1923-1963
42
19
20
19
Copyright status of books published pre-1923 and US works published 1923-1963
In Print 42
19
20
19
• Identification
• Description
• Rights
• Relationships
  – Bibliographic records
  – Bib records and objects
  – Digital objects
  – Digital and print
Relationships
Understanding the relationship between the collective and local
1st model: Price per GB

                2008        2009        2010        2011        2012 (Oct)
Total Volumes   2,477,871   5,221,092   7,836,698   9,966,572   10,531,566
Public Domain     372,085     758,947   1,959,223   2,712,626    3,218,132

[Chart: Total Volumes and Public Domain by year, 2008 to 2012 (Oct)]
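As a worked reading of the volume counts above, the public domain share of the corpus in each year is Public Domain divided by Total Volumes. The following is a minimal Python sketch that uses only the figures on the slide; the variable and function names are illustrative and do not come from the presentation.

```python
# Volume counts copied from the slide; all names here are illustrative only.
YEARS = ["2008", "2009", "2010", "2011", "2012 (Oct)"]
TOTAL_VOLUMES = [2477871, 5221092, 7836698, 9966572, 10531566]
PUBLIC_DOMAIN = [372085, 758947, 1959223, 2712626, 3218132]


def public_domain_share():
    """Return {year: fraction of total volumes that are public domain}."""
    return {
        year: pd / total
        for year, total, pd in zip(YEARS, TOTAL_VOLUMES, PUBLIC_DOMAIN)
    }


if __name__ == "__main__":
    for year, share in public_domain_share().items():
        # Format each fraction as a percentage with one decimal place.
        print(f"{year}: {share:.1%}")
```

Run as a script, this prints one percentage per year, which makes the growth of the public domain portion easier to compare than the raw counts.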
A global change in the library environment
[Scatter plot: % of Titles in Local Collection (y-axis) against Rank in 2008 ARL Investment Index (x-axis)]
June 2010: Median duplication 31%
June 2009: Median duplication 19%
Academic print book collection already substantially duplicated in mass digitized book corpus
Courtesy of Constance Malpas, OCLC Research
Digitized Books in Shared Repositories
[Chart: Unique Titles by month, Sep-09 to Jun-10, for two series: mass digitized books in Hathi digital repository, and mass digitized books in shared print repositories]
~75% of mass digitized corpus is ‘backed up’ in one or more shared print repositories
~3.5M titles
~2.5M
Courtesy of Constance Malpas, OCLC Research
Collection Overlap
• More than 50% median overlap with ARL institutions, higher for small liberal arts colleges
• New Pricing model based on Print holdings
  – http://www.hathitrust.org/cost
  – Requires print holdings database
  – Also support expansion of legal uses, efforts in de-duplication
  – Facilitate individual and collaborative collection development and management operations
• Print monographs archiving
Sourcing and Scaling
http://orweblog.oclc.org/archives/002058.html
• Scale
  – Institution-scale
  – Group-scale
  – Web-scale
• Sourcing
  – Institutional
  – Collaborative
  – Third-party
A new kind of library
Thank you
How to find out more
• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
  – http://www.hathitrust.org/updates
  – RSS: http://www.hathitrust.org/updates_rss
• Contact us: [email protected]
• Blogs: http://www.hathitrust.org/blogs
  – Large-scale Search
  – Perspectives from HathiTrust
Source Archive
Editorial Market
Higher Education
Access Determinations
• Automated
• Manual
Automatic Rights Determination
• Conducted on all works at time of ingest and when records are modified
  – Public domain worldwide
    • US works published before 1923, US federal government publications, non-US works published prior to 1873
  – Public domain in the United States
    • Non-US works published prior to 1923
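The date and origin rules on this slide can be sketched as a small decision function. This is only an illustration of the rules as listed, not HathiTrust's implementation: the record fields, the status labels, and the "not opened automatically" fallback for works that match no rule are all assumptions.

```python
from dataclasses import dataclass

# Status labels are hypothetical; they are not HathiTrust's rights codes.
PD_WORLDWIDE = "public domain worldwide"
PD_US_ONLY = "public domain in the United States"
NOT_OPENED = "not opened automatically"  # assumed fallback, not on the slide


@dataclass
class BibRecord:
    """Hypothetical subset of a bibliographic record used by the rules."""
    pub_year: int
    published_in_us: bool
    us_federal_document: bool = False


def automatic_rights(record: BibRecord) -> str:
    """Apply the slide's rules in order and return a status label."""
    # US federal government publications are public domain worldwide.
    if record.us_federal_document:
        return PD_WORLDWIDE
    # US works published before 1923 are public domain worldwide.
    if record.published_in_us:
        return PD_WORLDWIDE if record.pub_year < 1923 else NOT_OPENED
    # Non-US works: before 1873 worldwide, before 1923 in the US only.
    if record.pub_year < 1873:
        return PD_WORLDWIDE
    if record.pub_year < 1923:
        return PD_US_ONLY
    return NOT_OPENED
```

For example, `automatic_rights(BibRecord(pub_year=1900, published_in_us=False))` returns the US-only status, since a non-US work from 1900 falls between the 1873 and 1923 cutoffs.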
Manual Rights Determination
• IMLS-funded CRMS project
  – CRMS-US
    • 2008: US-published works 1923-1963
    • Staff at 4 partner institutions
  – CRMS-World
    • 2011: Expanded to non-US works
    • Staff at 16 partner institutions
  – Double review, with additional expert review for conflicts
  – Compliance with copyright formalities
  – As of January 2013: 241,541 reviewed, more than 132,644 opened
• Rights Holder Permissions
• System of Precedence
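The double-review step can be pictured as follows: two reviewers judge a volume independently, matching determinations are accepted, and a disagreement goes to an expert. The sketch below is an assumption about how such a workflow could be modeled, not a description of the CRMS software; the function name, the placeholder determination strings, and the "awaiting expert review" marker are hypothetical.

```python
from typing import Callable, Optional

# Hypothetical marker for a conflict that no expert has resolved yet.
AWAITING_EXPERT = "conflict: awaiting expert review"


def resolve_review(
    first: str,
    second: str,
    expert: Optional[Callable[[str, str], str]] = None,
) -> str:
    """Return the agreed determination, or escalate a conflict to an expert."""
    # Two independent reviews that match are accepted as-is.
    if first == second:
        return first
    # Conflicting reviews need an expert; without one the volume stays pending.
    if expert is None:
        return AWAITING_EXPERT
    return expert(first, second)
```

A call such as `resolve_review("pd", "ic", expert=lambda a, b: "ic")` shows the escalation path, with the expert's decision replacing the two conflicting reviews.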
Rights Database
Bibliographic (automatic)
Manual
Lawful uses
• Users who have print disabilities
  – All in-copyright works in HathiTrust currently owned (or owned previously) by the partner institution
  – Must be authenticated
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/accessibility
Lawful uses (2)
• Out of print and brittle/missing
  – Works must be currently owned (or owned previously) by the partner institution
  – Must be authenticated or accessing work from library premises
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/out-of-print-brittle
• Access and use statements
  – http://www.hathitrust.org/access_use
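The access conditions on the two Lawful uses slides can be expressed as simple checks. The sketch below is a hedged illustration only: the `AccessRequest` fields and function names are hypothetical, and it assumes the user's print disability, or the work's out-of-print and brittle or missing status, has already been established elsewhere. "One simultaneous access per copy owned" is modeled as the number of active sessions staying below the number of copies the partner institution owns or previously owned.

```python
from dataclasses import dataclass


@dataclass
class AccessRequest:
    """Hypothetical description of one request for an in-copyright work."""
    authenticated: bool
    on_us_soil: bool
    on_library_premises: bool
    copies_owned: int      # copies held now or previously by the partner institution
    active_sessions: int   # sessions already open for this work at that institution


def _copy_available(req: AccessRequest) -> bool:
    # One simultaneous access per copy owned.
    return req.copies_owned > 0 and req.active_sessions < req.copies_owned


def print_disability_access(req: AccessRequest) -> bool:
    """Conditions listed for users who have print disabilities."""
    return req.authenticated and req.on_us_soil and _copy_available(req)


def brittle_or_missing_access(req: AccessRequest) -> bool:
    """Conditions listed for out-of-print works that are brittle or missing."""
    return (
        (req.authenticated or req.on_library_premises)
        and req.on_us_soil
        and _copy_available(req)
    )
```

The only difference between the two checks is that the second also admits an unauthenticated user who is on library premises, which mirrors the wording of the second slide.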
e-Commerce
Print on Demand
Content Ingest
Transformation
Validation
Content Access
PageTurner
Collection Builder
Large-scale Search
Bibliographic Catalog
Research Center
APIs
Quality Assurance
Quality Review
Content Certification
User Services
Usability
User support (helpdesk)
Outreach
Project website
Monthly newsletter
Papers and presentations
Communication with potential partners
Surveys, general inquiries
Repository evaluation and audit (e.g. DRAMBORA, TRAC)
Legal
Risk management (use of materials)
Partner agreements
Advocacy
Governance
Budget Finances
Decision-making
Policy
Planning
Enterprise Management
Communication and Coordination with partner institutions
Project management
Repository Administration
Hardware configuration and maintenance
Web and application server configuration and maintenance
Security
Permissions
Logging
Repository Administration
Data management (content storage, backup, integrity checks, deletion)
Hardware selection and replacement
Content and Metadata specifications
Disaster Recovery
Processes for ensuring content integrity
Rights Management
Copyright determination
Copyright review
Copyright information management (database)
Rightsholder permissions
Bibliographic Data Management
Entity description (record-level)
Object identification (item-level)
Data availability
Collection Development
Digital
• Expansion beyond books and journals (born-digital, images and maps, audio)
• Selection of content (for non-Google volume ingest and pilot projects)
Print
• Cloud Library (effect of digital on print)
Financial contributions of partners
HathiTrust Functional Framework
HathiTrust
Strategic Advisory Board
Budget/Finances, Decision-making
Guidance on Policy, Planning
• Driven by needs of institutions
• Leverage across the partnership
• Projects: Print on Demand, Grant Work, Ingest Specifications, PageTurner, Bibliographic Data Management
Executive Committee
Collective Work: Working Groups and Committees
Operational
• Communications
• User Support
• User Experience
Manual Rights Determination
bull IMLS-funded CRMS projectndash CRMS-US
bull 2008 US-published works 1923-1963bull Staff at 4 partner institutions
ndash CRMS-Worldbull 2011 Expanded to non-US worksbull Staff at 16 partner institutions
ndash Double review with additional expert review for conflictsndash Compliance with copyright formalitiesndash As of January 2013 241541 reviewed more than 132644
openedbull Rights Holder Permissions
bull System of Precedence
Rights Database
Bibliographic (automatic)
Manual
Lawful uses
bull Users who have print disabilitiesndash All in-copyright works in HathiTrust currently
owned (or owned previously) by the partner institution
ndash Must be authenticatedndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgaccessibility
Lawful uses (2)
bull Out of print and brittle missingndash Works must be currently owned (or owned
previously) by the partner institutionndash Must be authenticated or accessing work from
library premisesndash Must be on US soilndash One simultaneous access per copy ownedndash httpwwwhathitrustorgout-of-print-brittle
bull Access and use statementsndash httpwwwhathitrustorgaccess_use
Outline
bull The Big Idea ndash Mission and Goals
bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure
bull How HathiTrust can change the way we work
e-Commerce
Print on Demand
Content Ingest
Transformation
Validation
Content Access
PageTurner
Collection Builder
Large-scale Search
Bibliographic Catalog
Research Center
APIs
Quality Assurance
Quality Review
Content Certification
User Services
Usability
User support (helpdesk)
Outreach
Project website
Monthly newsletter
Papers and presentations
Communication with potential
partners
Surveys general inquiries
Repository evaluation and
audit (eg DRAMBORA
TRAC)
Legal
Risk management (use of materials)
Partner agreements
Advocacy
Governance
Budget Finances
Decision-making
Policy
Planning
Enterprise Management
Communication and Coordination
with partner institutions
Project management
Repository Administration
Hardware configuration and
maintenance
Web and application server configuration and
maintenance
Security
Permissions
Logging
Repository Administration
Data management (content storage backup integrity checks deletion)
Hardware selection and replacement
Content and Metadata
specifications
Disaster Recovery
Processes for ensuring content
integrity
Rights Management
Copyright determination
Copyright review
Copyright information
management (database)
Rightsholder permissions
Bibliographic Data
Management
Entity description (record-level)
Object identification (item-
level)
Data availability
Collection Development
Digitalbull Expansion beyond
books and journals (born-digital images and maps audio)
bull Selection of content (for non-Google volume ingest and pilots projects)
Printbull Cloud Library (effect
of digital on print)
Financial contributions of partners
HathiTrust Functional Framework
HathiTrust
Strategic Advisory BoardBudgetFinances Decision-making
Guidance on Policy Planning
bull Driven by needs of institutionsbull Leverage across the partnershipbull Projects Print on Demand Grant Work Ingest Specifications
PageTurner Bibliographic Data Management
Executive Committee
Collective Work Working Groups and Committees
Operationalbull Communicationsbull User Supportbull User Experience
Operationalbull Communicationsbull User Supportbull User Experience
Strategicbull Collectionsbull Discovery Interfacebull Full-text Search
Distributed work
Constitutional Convention
bull October 2011bull 52 partnersbull 3-year review overseen by SABbull Ballot Proposals
ndash Print monograph storagendash Approval Process for development initiativesndash US Government Documentsndash Fee-for-service content depositndash Governance
HathiTrust
Executive Committee
Strategic Advisory
Board
BudgetFinancesDecision-making
Guidance on Policy Planning
bull 12-member Board of Governors
bull Chief Executive Officer bull Executive Committee
Governance
bull Efficient practicalbull Inclusive collective
Outline
bull The Big Idea ndash Mission and Goals
bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure
bull How HathiTrust can change the way we work
How HathiTrust Can Change the Way We Work
Seeing collective problems as collective
Breakdown of HathiTrust book corpus by publication date
Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building ndash 22011
42
19
20
19
Breakdown of HathiTrust book corpus by publication date
42
19
20
19
Copyright status of books published pre-1923 and US works published 1923-1963
42
19
20
Copyright status of books published pre-1923 and US works published 1923-1963
42
19
20
19
Copyright status of books published pre-1923 and US works published 1923-1963
In Print 42
19
20
19
Relationships
• Identification
• Description
• Rights
• Relationships
  – Bibliographic records
  – Bib records and objects
  – Digital objects
  – Digital and print
Understanding the relationship between the collective and local
1st model Price per GB
Year          Total Volumes   Public Domain
2008              2,477,871         372,085
2009              5,221,092         758,947
2010              7,836,698       1,959,223
2011              9,966,572       2,712,626
2012 (Oct)       10,531,566       3,218,132
[Chart: Total Volumes and Public Domain by year, vertical axis 0 to 12,000,000]
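As a rough illustration of what these figures imply, the public-domain share of the corpus can be worked out for each year. This sketch is not part of the original slides: the names `volumes` and `public_domain_share` are made up for illustration, and the numbers are copied from the table as printed.

```python
# Figures copied from the slide's table; year labels are kept as printed.
# Each entry is (total volumes, public-domain volumes).
volumes = {
    "2008": (2_477_871, 372_085),
    "2009": (5_221_092, 758_947),
    "2010": (7_836_698, 1_959_223),
    "2011": (9_966_572, 2_712_626),
    "2012 (Oct)": (10_531_566, 3_218_132),
}


def public_domain_share(total: int, public_domain: int) -> float:
    """Return public-domain volumes as a percentage of total volumes."""
    return 100.0 * public_domain / total


# Percentage of the corpus in the public domain, rounded to one decimal place.
shares = {
    year: round(public_domain_share(total, pd), 1)
    for year, (total, pd) in volumes.items()
}

for year, share in shares.items():
    print(f"{year}: {share}% public domain")
```

On these numbers the share roughly doubles over the period, from about 15% to about 30%.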
[Chart: Rank in 2008 ARL Investment Index (horizontal axis, 0 to 120) against % of Titles in Local Collection (vertical axis, 0 to 60)]
A global change in the library environment
June 2010: median duplication 31%
June 2009: median duplication 19%
Academic print book collection already substantially duplicated in mass digitized book corpus
Courtesy of Constance Malpas OCLC Research
Digitized Books in Shared Repositories
[Chart: unique titles by month, Sep-09 to Jun-10, vertical axis 0 to 3,500,000. Series: mass digitized books in Hathi digital repository; mass digitized books in shared print repositories. Annotations: ~3.5M titles; ~2.5M]
~75% of mass digitized corpus is 'backed up' in one or more shared print repositories
Courtesy of Constance Malpas OCLC Research
Collection Overlap
• More than 50% median overlap with ARL institutions; higher for small liberal arts colleges
• New pricing model based on print holdings
  – http://www.hathitrust.org/cost
  – Requires print holdings database
  – Also support expansion of legal uses, efforts in de-duplication
  – Facilitate individual and collaborative collection development and management operations
• Print monographs archiving
Sourcing and Scaling
http://orweblog.oclc.org/archives/002058.html
• Scale
  – Institution-scale
  – Group-scale
  – Web-scale
• Sourcing
  – Institutional
  – Collaborative
  – Third-party
A new kind of library
Thank you
How to find out more
• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
  – http://www.hathitrust.org/updates
  – RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
  – Large-scale Search
  – Perspectives from HathiTrust
• System of Precedence
Rights Database
Bibliographic (automatic)
Manual
Lawful uses
• Users who have print disabilities
  – All in-copyright works in HathiTrust currently owned (or owned previously) by the partner institution
  – Must be authenticated
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/accessibility
Lawful uses (2)
• Out of print and brittle/missing
  – Works must be currently owned (or owned previously) by the partner institution
  – Must be authenticated or accessing work from library premises
  – Must be on US soil
  – One simultaneous access per copy owned
  – http://www.hathitrust.org/out-of-print-brittle
• Access and use statements
  – http://www.hathitrust.org/access_use
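The conditions on the two "Lawful uses" slides can be read as a short checklist. The sketch below is one way to write that checklist down, not a description of how HathiTrust implements access control: `AccessRequest` and every field and function name are hypothetical. It also assumes eligibility is decided elsewhere, that is, whether the user has a print disability and whether the work is out of print and brittle or missing.

```python
from dataclasses import dataclass


@dataclass
class AccessRequest:
    """Hypothetical description of one request for an in-copyright work."""
    authenticated: bool        # user has logged in through the partner institution
    on_library_premises: bool  # request comes from inside the library
    on_us_soil: bool           # request originates in the United States
    copies_owned: int          # print copies the partner owns or previously owned
    active_sessions: int       # users from this institution viewing the work now


def within_copy_limit(req: AccessRequest) -> bool:
    # One simultaneous access per copy owned.
    return req.copies_owned > 0 and req.active_sessions < req.copies_owned


def print_disability_access(req: AccessRequest) -> bool:
    # Conditions listed under "Lawful uses".
    return req.authenticated and req.on_us_soil and within_copy_limit(req)


def out_of_print_brittle_access(req: AccessRequest) -> bool:
    # Conditions listed under "Lawful uses (2)": access from library
    # premises is accepted in place of authentication.
    return (
        (req.authenticated or req.on_library_premises)
        and req.on_us_soil
        and within_copy_limit(req)
    )
```

For example, an unauthenticated reader inside the library would pass the second check but not the first, because only the out-of-print rule accepts library premises in place of authentication.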
Outline
• The Big Idea
  – Mission and Goals
• What we're doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work
e-Commerce
Print on Demand
Content Ingest
Transformation
Validation
Content Access
PageTurner
Collection Builder
Large-scale Search
Bibliographic Catalog
Research Center
APIs
Quality Assurance
Quality Review
Content Certification
User Services
Usability
User support (helpdesk)
Outreach
Project website
Monthly newsletter
Papers and presentations
Communication with potential partners
Surveys, general inquiries
Repository evaluation and audit (e.g. DRAMBORA, TRAC)
Legal
Risk management (use of materials)
Partner agreements
Advocacy
Governance
Budget/Finances
Decision-making
Policy
Planning
Enterprise Management
Communication and Coordination with partner institutions
Project management
Repository Administration
Hardware configuration and maintenance
Web and application server configuration and maintenance
Security
Permissions
Logging
Repository Administration
Data management (content storage, backup, integrity checks, deletion)
Hardware selection and replacement
Content and Metadata specifications
Disaster Recovery
Processes for ensuring content integrity
Rights Management
Copyright determination
Copyright review
Copyright information management (database)
Rightsholder permissions
Bibliographic Data Management
Entity description (record-level)
Object identification (item-level)
Data availability
Collection Development
Digital
• Expansion beyond books and journals (born-digital, images and maps, audio)
• Selection of content (for non-Google volume ingest and pilot projects)
Print
• Cloud Library (effect of digital on print)
Financial contributions of partners
HathiTrust Functional Framework
HathiTrust
Strategic Advisory Board
Budget/Finances, Decision-making
Guidance on Policy, Planning
• Driven by needs of institutions
• Leverage across the partnership
• Projects: Print on Demand, Grant Work, Ingest Specifications, PageTurner, Bibliographic Data Management
Executive Committee
Collective Work: Working Groups and Committees
Operationalbull Communicationsbull User Supportbull User Experience
Operationalbull Communicationsbull User Supportbull User Experience
Strategicbull Collectionsbull Discovery Interfacebull Full-text Search
Distributed work
Constitutional Convention
bull October 2011bull 52 partnersbull 3-year review overseen by SABbull Ballot Proposals
ndash Print monograph storagendash Approval Process for development initiativesndash US Government Documentsndash Fee-for-service content depositndash Governance
HathiTrust
Executive Committee
Strategic Advisory
Board
BudgetFinancesDecision-making
Guidance on Policy Planning
bull 12-member Board of Governors
bull Chief Executive Officer bull Executive Committee
Governance
bull Efficient practicalbull Inclusive collective
Outline
bull The Big Idea ndash Mission and Goals
bull What wersquore doing to get there ndash Repository and Content ndash Making content available ndash Organizational structure
bull How HathiTrust can change the way we work
How HathiTrust Can Change the Way We Work
Seeing collective problems as collective
Breakdown of HathiTrust book corpus by publication date
Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building ndash 22011
42
19
20
19
Breakdown of HathiTrust book corpus by publication date
42
19
20
19
Copyright status of books published pre-1923 and US works published 1923-1963
42
19
20
Copyright status of books published pre-1923 and US works published 1923-1963
42
19
20
19
Copyright status of books published pre-1923 and US works published 1923-1963
In Print 42
19
20
19
bull Identificationbull Descriptionbull Rights
Relationships
bull Identificationbull Descriptionbull Rightsbull Relationships
ndash Bibliographic records
Relationships
bull Identificationbull Descriptionbull Rightsbull Relationships
ndash Bibliographic recordsndash Bib records and objects
Relationships
bull Identificationbull Descriptionbull Rightsbull Relationships
ndash Bibliographic recordsndash Bib records and objectsndash Digital objects
Relationships
bull Identificationbull Descriptionbull Rightsbull Relationships
ndash Bibliographic recordsndash Bib records and objectsndash Digital objectsndash Digital and print
Relationships
Understanding the relationship between the collective and local
1st model Price per GB
2008 2009 2010 2011 2012 (Oct)Total Volumes 2477871 5221092 7836698 9966572 10531566
Public Domain 372085 758947 1959223 2712626 3218132
2008 2009 2010 2011 2012 (Oct)0
2000000
4000000
6000000
8000000
10000000
12000000
Total VolumesPublic Domain
0 20 40 60 80 100 1200
10
20
30
40
50
60
Rank in 2008 ARL Investment Index
o
f Tit
les
in L
ocal
Col
lecti
on
A global change in the library environment
June 2010Median duplication 31
June 2009Median duplication 19
Academic print book collection already substantially duplicated in mass digitized book corpus
Courtesy of Constance Malpas OCLC Research
Digitized Books in Shared Repositories
Sep-09 Oct-09 Nov-09 Dec-09 Jan-10 Feb-10 Mar-10 Apr-10 May-10 Jun-100
500000
1000000
1500000
2000000
2500000
3000000
3500000
Mass digitized books in Hathi digital repository Mass digitized books in shared print repositories
Uni
que
Titl
es
~75 of mass digitized corpus is lsquobacked uprsquo in one or more shared print repositories
~35M titles
~25M
Courtesy of Constance Malpas OCLC Research
Collection Overlap
bull More than 50 median overlap with ARL institutions higher for small liberal arts colleges
bull New Pricing model based on Print holdingsndash httpwwwhathitrustorgcostndash Requires print holdings databasendash Also support expansion of legal uses efforts in de-
duplicationndash Facilitate individual and collaborative collection
development and management operationsbull Print monographs archiving
Sourcing and Scalinghttporweblogoclcorgarchives002058html
bull Scalendash Institution-scalendash Group-scalendash Web-scale
bull Sourcingndash Institutionalndash Collaborativendash Third-party
A new kind of library
Thank you
How to find out more
bull About httpwwwhathitrustorgaboutbull Twitter httptwittercomhathitrustbull Facebook httpwwwfacebookcomhathitrustbull Monthly newsletter
ndash httpwwwhathitrustorgupdatesndash RSS httpwwwhathitrustorgupdates_rss
bull Contact us feedbackissueshathitrustorgbull Blogs httpwwwhathitrustorgblogs
ndash Large-scale Searchndash Perspectives from HathiTrust
e-Commerce
Print on Demand
Content Ingest
Transformation
Validation
Content Access
PageTurner
Collection Builder
Large-scale Search
Bibliographic Catalog
Research Center
APIs
Quality Assurance
Quality Review
Content Certification
User Services
Usability
User support (helpdesk)
Outreach
Project website
Monthly newsletter
Papers and presentations
Communication with potential
partners
Surveys general inquiries
Repository evaluation and
audit (eg DRAMBORA
TRAC)
Legal
Risk management (use of materials)
Partner agreements
Advocacy
Governance
Budget Finances
Decision-making
Policy
Planning
Enterprise Management
Communication and Coordination
with partner institutions
Project management
Repository Administration
Hardware configuration and
maintenance
Web and application server configuration and
maintenance
Security
Permissions
Logging
Repository Administration
Data management (content storage backup integrity checks deletion)
Hardware selection and replacement
Content and Metadata
specifications
Disaster Recovery
Processes for ensuring content
integrity
Rights Management
Copyright determination
Copyright review
Copyright information
management (database)
Rightsholder permissions
Bibliographic Data
Management
Entity description (record-level)
Object identification (item-
level)
Data availability
Collection Development
Digitalbull Expansion beyond
books and journals (born-digital images and maps audio)
bull Selection of content (for non-Google volume ingest and pilots projects)
Printbull Cloud Library (effect
of digital on print)
Financial contributions of partners
HathiTrust Functional Framework
HathiTrust
Strategic Advisory Board
Budget/Finances, Decision-making
Guidance on Policy, Planning
Executive Committee
Collective Work: Working Groups and Committees
Operational
• Communications
• User Support
• User Experience
Strategic
• Collections
• Discovery Interface
• Full-text Search
Distributed work
• Driven by needs of institutions
• Leverage across the partnership
• Projects: Print on Demand, Grant Work, Ingest Specifications, PageTurner, Bibliographic Data Management
Constitutional Convention
• October 2011
• 52 partners
• 3-year review overseen by SAB
• Ballot Proposals
  – Print monograph storage
  – Approval Process for development initiatives
  – US Government Documents
  – Fee-for-service content deposit
  – Governance
HathiTrust
Executive Committee
Strategic Advisory Board
Budget/Finances, Decision-making
Guidance on Policy, Planning
• 12-member Board of Governors
• Chief Executive Officer
• Executive Committee
Governance
• Efficient, practical
• Inclusive, collective
Outline
• The Big Idea
  – Mission and Goals
• What we're doing to get there
  – Repository and Content
  – Making content available
  – Organizational structure
• How HathiTrust can change the way we work
How HathiTrust Can Change the Way We Work
Seeing collective problems as collective
Breakdown of HathiTrust book corpus by publication date
Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building – 2/2011
[Chart: four segments of 42%, 19%, 20%, and 19%]
Copyright status of books published pre-1923 and US works published 1923-1963
[Chart: same four segments of 42%, 19%, 20%, and 19%; surviving label: "In Print"]
• Identification
• Description
• Rights
• Relationships
  – Bibliographic records
  – Bib records and objects
  – Digital objects
  – Digital and print
Understanding the relationship between the collective and local
1st model: Price per GB

                2008       2009       2010       2011       2012 (Oct)
Total Volumes   2,477,871  5,221,092  7,836,698  9,966,572  10,531,566
Public Domain   372,085    758,947    1,959,223  2,712,626  3,218,132

[Chart: Total Volumes and Public Domain by year, 2008 to 2012 (Oct); y-axis 0 to 12,000,000]
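As a rough illustration of what the volume counts above imply, the short Python sketch below computes the public-domain share of the repository for each year. The figures are copied from the slide; the variable and function names are hypothetical and chosen only for this example.

```python
# Volume counts copied from the slide above (year label -> number of volumes).
total_volumes = {
    "2008": 2_477_871,
    "2009": 5_221_092,
    "2010": 7_836_698,
    "2011": 9_966_572,
    "2012 (Oct)": 10_531_566,
}
public_domain = {
    "2008": 372_085,
    "2009": 758_947,
    "2010": 1_959_223,
    "2011": 2_712_626,
    "2012 (Oct)": 3_218_132,
}


def public_domain_share(year: str) -> float:
    """Return public-domain volumes as a percentage of total volumes."""
    return 100.0 * public_domain[year] / total_volumes[year]


# Share for every year on the slide, in the slide's own order.
shares = {year: public_domain_share(year) for year in total_volumes}

for year, share in shares.items():
    print(f"{year}: {share:.1f}% public domain")
```

On these numbers the public-domain share grows from roughly one volume in seven in 2008 to nearly one in three by the last column.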
A global change in the library environment
[Chart: % of Titles in Local Collection (y-axis, 0 to 60) against Rank in 2008 ARL Investment Index (x-axis, 0 to 120)]
June 2009: Median duplication 19%
June 2010: Median duplication 31%
Academic print book collection already substantially duplicated in mass digitized book corpus
Courtesy of Constance Malpas, OCLC Research
Digitized Books in Shared Repositories
[Chart: Unique Titles (y-axis, 0 to 3,500,000) by month, Sep-09 to Jun-10; series: Mass digitized books in Hathi digital repository, Mass digitized books in shared print repositories; annotations: ~3.5M titles, ~2.5M]
~75% of mass digitized corpus is 'backed up' in one or more shared print repositories
Courtesy of Constance Malpas, OCLC Research
Collection Overlap
• More than 50% median overlap with ARL institutions; higher for small liberal arts colleges
• New Pricing model based on Print holdings
  – http://www.hathitrust.org/cost
  – Requires print holdings database
  – Also support expansion of legal uses, efforts in de-duplication
  – Facilitate individual and collaborative collection development and management operations
• Print monographs archiving
Sourcing and Scaling
http://orweblog.oclc.org/archives/002058.html
• Scale
  – Institution-scale
  – Group-scale
  – Web-scale
• Sourcing
  – Institutional
  – Collaborative
  – Third-party
A new kind of library
Thank you
How to find out more
• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter
  – http://www.hathitrust.org/updates
  – RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
  – Large-scale Search
  – Perspectives from HathiTrust
Breakdown of HathiTrust book corpus by publication date
Bibliographic Indeterminacy and the Scale of Problems and Opportunities of Rights in Digital Collection Building – 2/2011
[Pie chart; slice values: 42, 19, 20, 19]
Copyright status of books published pre-1923 and US works published 1923-1963
[Pie chart; slice values: 42, 19, 20, 19; label shown: In Print]
Relationships
• Identification
• Description
• Rights
• Relationships
  – Bibliographic records
  – Bib records and objects
  – Digital objects
  – Digital and print
Understanding the relationship between the collective and local
1st model: Price per GB
                2008        2009        2010        2011        2012 (Oct)
Total Volumes   2,477,871   5,221,092   7,836,698   9,966,572   10,531,566
Public Domain   372,085     758,947     1,959,223   2,712,626   3,218,132
[Chart: Total Volumes and Public Domain by year, 2008 to 2012 (Oct); y-axis 0 to 12,000,000]
[Chart: % of titles in local collection (y-axis, 0 to 60) by rank in 2008 ARL Investment Index (x-axis, 0 to 120)]
A global change in the library environment
June 2010: Median duplication 31%
June 2009: Median duplication 19%