Post on 05-Jan-2016
Managing Collections in the Networked Environment: New Analytic Approaches
OCLC Research Webinar, 9 September 2010
Constance Malpas, OCLC Research
Zack Lane, Columbia University
Helen Look, University of Michigan
Jacob Nadal, University of California, Los Angeles
Context
• Making library data “work harder”
• Decision support: where should limited institutional resources be directed?
• New skill sets, professional cohort emerging
• Highlight significant work at RLG partner institutions
• Identify shared research priorities, methodologies
• Staffing and infrastructure requirements, organizational development
Today’s Panelists
Helen Look, Collection Analyst, University of Michigan, hlook@umich.edu
Zack Lane, ReCAP Coordinator, Columbia University, zl2114@columbia.edu
Jacob Nadal, Preservation Officer, UCLA, jnadal@library.ucla.edu
Circulation Data at Columbia University Libraries
Zack Lane
ReCAP Coordinator, Columbia University
Project: Look at System-wide Circ Data
[Chart: Total Charges by FY. FY03/04: 570,827; FY04/05: 567,739; FY05/06: 527,171; FY06/07: 505,288; FY07/08: 474,218; FY08/09: 469,217; FY09/10: 467,272]
[Chart: Charges by Patron Group: Monthly, 7/1/2003 to 5/1/2010. Series: GRD, OFF, REG, VIS]
[Chart: Charges to Patrons by Patron Group, FY03/04 to FY09/10. Series: GRD, OFF, REG, VIS]
• Several data categories
• Many ways to slice
• Tip of the iceberg
Lots of Amazing (Granular) Data
• Main categories: accessions, retrieval, delivery and circulation
Multi-dimensional Analyses
[Chart: Request Volume by Weekday, FY02-FY09 (total: 289,140)]
[Chart: Request Volume by Hour, FY02-FY09 (total: 289,140)]
[Chart: Retrieval Rate by Publication Date and Language. Publication dates 1990 to 2009; languages: spa, rus, por, pol, ita, hin, ger, fre, eng, ara]
Circulation Analysis Project: Spring 2010
• Identify bright and capable intern: Steve Zweibel!
• Locate data sets
• Understand data sets
• Working with Systems staff to improve data
• Reformatting Data
• Manipulating data with Excel 2003/2007 (Pivot tables)
• Presenting data with Power Point
• Rethink, rework and refine
Big Numbers: 18% decline in circ over 7 years
[Chart: Total Charges by FY. FY03/04: 570,827; FY04/05: 567,739; FY05/06: 527,171; FY06/07: 505,288; FY07/08: 474,218; FY08/09: 469,217; FY09/10: 467,272]
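The 18% figure in the slide title can be checked against the values labeled on the chart. This is a minimal sketch, assuming the seven labeled values map in order to FY03/04 through FY09/10; the variable and function names are illustrative only and do not come from the presentation.

```python
# Total charges per fiscal year, as labeled on the "Total Charges by FY" chart.
# Assumption: the seven values map in order to FY03/04 through FY09/10.
total_charges = {
    "FY03/04": 570827,
    "FY04/05": 567739,
    "FY05/06": 527171,
    "FY06/07": 505288,
    "FY07/08": 474218,
    "FY08/09": 469217,
    "FY09/10": 467272,
}


def percent_change(first, last):
    """Return the percentage change from first to last (negative means decline)."""
    return (last - first) / first * 100


decline = percent_change(total_charges["FY03/04"], total_charges["FY09/10"])
print(f"Change in total charges, FY03/04 to FY09/10: {decline:.1f}%")
```

The same helper can be applied to any pair of years to see where most of the decline occurred.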
What’s to Gain?
• Data set is clean and compact
• Categories apply to both on-site and off-site collections
• Low barrier of entry for library school intern
• Everyone has access; no-one is looking
• Improves understanding of circ system, patron behavior, staff habits, data analysis tools and system-wide trends
• Staff somewhat surprised that circulation data analysis was system-wide not ReCAP-specific
Big Numbers: Revisited
[Chart: Charges by Patron Group: Monthly, 7/1/2003 to 5/1/2010. Series: GRD, OFF, REG, VIS]
4 main patron groups: Grad Students, Faculty, Undergrads and Visitors
Zack’s Deep Thoughts
• Trend of total charges is downward due to use of e-resources
• Leveling off indicates that use of print copy is still strong and critical
• Faculty charges have increased due to more Grads serving as adjunct faculty (with OFF privileges)
• Grads peak a month before Undergrads because of course requirements (tilted towards written papers instead of tests)
What Else Did Steve Discover?
[Chart: Renewals by Patron Group: Monthly, 7/1/2003 to 5/1/2010. Series: GRD, OFF, REG, VIS]
Patrons renew when they must renew
[Chart: OFF by Charge Type, FY03/04 to FY09/10. Series: Sum of Charges, Sum of Renewals, Sum of Recalls, Sum of Holds]
Patron Group Habits:
• Charge/Renewal ratios are consistent with staff perceptions
• Faculty are more likely to hold onto the books that they have than charge out new ones
• Undergraduates have little need to renew with extended loan period
• Grad students charge a lot of material and hold it for more than one term
• Visitors hold books longer since obtaining OPAC renewal permission
[Chart: REG by Charge Type, FY03/04 to FY09/10. Series: Sum of Charges, Sum of Renewals, Sum of Recalls, Sum of Holds]
[Chart: Onsite vs. Offsite Charges, FY03/04 to FY09/10. Series: Offsite, Onsite]
Relevant to Other Issues
Access Services considering changes to Hold policy:
• Holds have limited duration
• Data indicates that most Holds expire
• Should Hold duration be extended?
• Should Holds (via OPAC) be eliminated?
Enter Circ data analysis…
[Chart: Total Holds by FY. FY03/04: 3,661; FY04/05: 5,342; FY05/06: 5,977; FY06/07: 5,784; FY07/08: 5,460; FY08/09: 6,431; FY09/10: 6,365]
[Chart: Total Recalls by FY. FY03/04: 6,593; FY04/05: 6,043; FY05/06: 6,262; FY06/07: 10,191; FY07/08: 11,065; FY08/09: 11,105; FY09/10: 11,825]
Overall Holds are half that of Recalls
Online Holds are one-eighth that of Recalls
[Chart: Recalls: OPAC vs Everywhere Else, FY03/04 to FY09/10. Series: OPAC, Everywhere Else]
[Chart: Holds: OPAC vs Everywhere Else, FY03/04 to FY09/10. Series: OPAC, Everywhere Else]
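The "Holds are half that of Recalls" observation can be explored year by year using the totals labeled on this slide's Total Holds and Total Recalls charts. This is a rough sketch, assuming the labeled values map in order to the fiscal years on the axis; the names below are illustrative and not taken from the Columbia analysis.

```python
# Annual totals as labeled on the "Total Holds by FY" and "Total Recalls by FY" charts.
# Assumption: values map in order to the fiscal years shown on the chart axis.
fiscal_years = ["FY03/04", "FY04/05", "FY05/06", "FY06/07",
                "FY07/08", "FY08/09", "FY09/10"]
holds = [3661, 5342, 5977, 5784, 5460, 6431, 6365]
recalls = [6593, 6043, 6262, 10191, 11065, 11105, 11825]


def holds_to_recalls(holds, recalls):
    """Return the holds/recalls ratio for each year, in order."""
    return [h / r for h, r in zip(holds, recalls)]


ratios = holds_to_recalls(holds, recalls)
for fy, ratio in zip(fiscal_years, ratios):
    print(f"{fy}: holds are {ratio:.0%} of recalls")
```

On these figures the ratio sits between roughly 0.49 and 0.58 from FY06/07 onward, and is noticeably higher in FY04/05 and FY05/06, before recalls climbed.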
Moving Forward
• Bring data to staff; don’t expect staff to come to data
• Learn from the intern
• Query, pressure and improve data
• Congratulations Steve Zweibel! Part-time professional cataloging position at NYU and part-time reference at Hunter College Health Sciences Library
Post-digitization Use of Print Collections
Helen Look
Collection Analyst, University of Michigan
Background
• Study of post-digitization use of print collections at the University of Michigan
• University of Michigan digitization efforts
• HathiTrust Digital Library - http://www.hathitrust.org
Access to Institutional Resources
• Harmonizing data from different sources
• Working with different staff to gather the data
• Consulting with internal and external colleagues
Methodology
• Top 500 accessed titles in HathiTrust Digital Library by the University of Michigan community in 2009
• Title-level online usage was compared to title-level print usage
• Print circulation for the sample was compiled for 2008, 2009 and total circulation history
Low Usage of the Print
• 98% (489) of the titles had zero circulation
• 2% (11) of the titles circulated
• 2009 circulation for the 11 titles was equal to or less than the 2008 circulation
[Chart: 2009 Circulation. Circulated Titles: 2%; Non-Circulated Titles: 98%]
Increased Discoverability of the Content
• 39% (193) of the titles had not circulated
• 61% (307) of the titles had circulated
• Hidden treasures
[Chart: Historic Circulation. Circulated Titles: 61%; Non-Circulated Titles: 39%]
Subject Distribution
[Chart: Subject Distribution, horizontal axis 0% to 16%]
Categories, in chart order: Technology; Social Sciences; Science; Political Science; Philosophy, Psychology, Religion; Naval Science; Music; Military Science; Medicine; Law; Language and Literature; History; Geography, Anthropology, Recreation; General Works; Fine Arts; Education; Bibliography, Library Science, General Info Resources; Auxiliary Sciences of History; Agriculture
Data labels, in extracted order: 13%, 15%, 11%, 2%, 6%, 2%, 1%, 2%, 1%, 0%, 13%, 15%, 0%, 10%, 3%, 2%, 1%, 1%, 2%
Patterns in the Overall Online Usage
[Chart: usage binned by pageviews, from <1 pageviews to 20+ pageviews; vertical axis 0 to 45,000]
Lessons Learned
• Improved our understanding of the use of online and print materials made accessible through mass digitization
• Learned from the process about what data is available and where better metrics are needed
• Identified some potential patterns for further study
“The temptation to form premature theories upon insufficient data is the bane of our profession.”
– Sherlock Holmes
Data-based Preservation Decision Making
Jacob Nadal
Preservation Officer, UCLA Library
Preservation Theory and History: Medicine, Zoos and Fortresses
• 20th century preservation was effectively local. We tried to protect or repair the items in the collection
– Rigid, comprehensive security and environmental standards
– Fortification of item (library binding, deacidification)
– Replacement of weak items with hardened versions (library editions, microfilm, facsimiles)
• Libraries are more like zoos than fortresses
• Preservation was trying to deal with public health problems in the metaphorical emergency room
Controversial assertion: What libraries call preservation is more like conservation at scale, and it’s still not to scale.
Preservation Theory at UCLA: Public Health, Flood Control & Habitats
• Preservation works from the collection down
• Conservation works from the item up
• At UCLA, one strategy governs both approaches
• Every activity has a:
– 1) Method of analysis or evidence-gathering
– 2) Treatment proposal & outside review
– 3) Hedge or fail-safe option
• We’re operating a dam or flood control channel, not manning a wall under siege
• You can make your LA River jokes now… ha, ha…
• As our program matures, the watchword is habitat:
– Habitats are flexible and adaptable
– Habitats are sustainable or not, depending on certain pressures
– Habitats have local versions of global types
The Los Angeles River
The concrete bottom reduces the effectiveness for flood control, creates a bad habitat for wildlife, and ruins its recreational value. All that effort for nothing!
The natural river is less work and functions better. A model worth emulating!
More about the River: http://www.lariver.org and http://folar.org/
From Theory towards Practice: Habitat, Sure, but Who Lives There?
• UCLA, like all big RLs, has some really shabby books.
• These “brittle books” are frustrating.
– Repair is not the answer: little structural integrity means they’re irreparable or require “heroic” conservation
– Reformatting is costly: fragile, poor contrast materials, so scanning has to be careful and high-quality
• And yet, we’re obligated to preserve certain things:
– Materials with high Los Angelocity
– Scarce within the UC system, California, or the world
– Signature collections, future classics, academic emphases
• Everything else, we want you to do for us
– kthanksnextslide!
From Practice to Practicalities
• Push decisions from the item to the network context
• Holdings review is the first step
• Holdings data parsed into global (Worldcat), regional (CA/350 miles of zip code 90095), and system (Worldcat Local/NGM)
• HathiTrust status checked (Portico, (C)LOCKSS, JSTOR to come?)
• These data are placed into a risk assessment model
• Series of automated recommendations are made
So, how does that dam/wall/habitat thing address the problem of lots of individual shabby books, in the context of a globally-intended collection of record?
Risk Assessment Model From Candace Yano (UC Berkeley/Ithaka)
Initial number of copies    Survival probability
1                           36.6032%
2                           59.8085%
3                           74.5199%
4                           83.8464%
5                           89.7592%
6                           93.5076%
7                           95.8841%
8                           97.3906%
9                           98.3457%
10                          98.9513%
11                          99.3351%
12                          99.5785%
13                          99.7328%
14                          99.8306%
15                          99.8926%
16                          99.9319%
17                          99.9568%
18                          99.9726%
19                          99.9827%
20                          99.9890%
21                          99.9930%
22                          99.9956%
23                          99.9972%
24                          99.9982%
25                          99.9989%
26                          99.9993%
12 copies is where the curve flattens toward its asymptote (survival probability first exceeds 99.5%)
26 is derived from past decisions by UCLA
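The tabulated survival probabilities are numerically consistent with a simple independent-loss calculation, 1 - (1 - p)^n, where p is the one-copy survival probability from the first row of the table. That reading is an inference from the numbers, not a description of how Yano's model is actually specified, and the function name below is hypothetical.

```python
# Assumption: each copy survives independently with the one-copy probability
# given in the table (36.6032%). This reproduces the tabulated values to
# rounding, but the slides do not state that the model was built this way.
SINGLE_COPY_SURVIVAL = 0.366032


def survival_probability(copies, p=SINGLE_COPY_SURVIVAL):
    """Probability that at least one of `copies` independent copies survives."""
    if copies < 0:
        raise ValueError("copies must be non-negative")
    return 1 - (1 - p) ** copies


# Print a few rows for comparison against the table, including the two
# thresholds the slide calls out (12 and 26 copies).
for n in (1, 2, 3, 12, 26):
    print(f"{n:>2} copies: {survival_probability(n):.4%}")
```

Under this assumption, each added copy multiplies the remaining loss probability by the same factor, which is why the gains become negligible well before 26 copies.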
Basic Scenario for Preservation Review
• Three outcomes:
– Keep if [<12 global] OR [<3 CA] OR [0 UC]
– Withdraw if [> 26 global]
– Else Review
• Data is collected (point 1) then a proposed treatment gets external review by coll. managers (point 2) and all decisions are hedged by the network (point 3)
• “Keep” implies that preservation will see to it that the content remains in the collection
• “Withdraw” really means withdraw
• “Review” means we need a genuine person to make a decision (and people are both slow and idiosyncratic, so…)
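The three-outcome rule above could be expressed as a small function. This is a sketch under stated assumptions, not UCLA's actual implementation: the function and parameter names are hypothetical, the "keep" tests are evaluated before the "withdraw" test, and the slide does not say whether the UC count includes UCLA's own copy, so that choice is left to the caller.

```python
def recommend(global_copies, ca_copies, uc_copies):
    """Return "keep", "withdraw", or "review" for one title.

    Assumptions (not confirmed by the slides):
    - the three counts are holdings found globally, in the California
      region, and in the UC system;
    - "keep" conditions are checked first, so a regionally scarce title is
      retained even when it is widely held worldwide.
    """
    # Keep if [<12 global] OR [<3 CA] OR [0 UC]
    if global_copies < 12 or ca_copies < 3 or uc_copies == 0:
        return "keep"
    # Withdraw if [> 26 global]
    if global_copies > 26:
        return "withdraw"
    # Everything else goes to a person for review.
    return "review"


# Illustrative inputs only; these are not real holdings counts.
for counts in [(8, 10, 4), (40, 10, 4), (20, 10, 4), (40, 2, 4)]:
    print(counts, "->", recommend(*counts))
```

Titles with 12 to 26 global copies that are not scarce in California or the UC system fall through to "review", which is the band where a human decision is still required.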
Alternate Scenarios
• Decision making starts with the basic scenario. We’ll fine-tune that as we collect decision data
• Seeking best match between collection managers’ decisions and automated indicators
• After a designated review period, may use a more risk-tolerant scenario to decide on materials lingering in “review” status
• At present, we have a known unknown regarding artifactual value
• Conservation screens materials and routes to preservation review. Conservators are eagle-eyed about artifactual value
• Preservation officer reviews all “withdraws” and, for better or worse, yours truly has a mild case of bibliomania
The Hedgerow
#1 OR, Retain First
#2 OR, with Hathi, Retain First
#3 AND, Retain First
#4 AND, with Hathi, Retain First
The Long Tail
The Los Angeles Triangle
Inside this zone, we have a stewardship obligation, driven by preservation, a general good
Outside, we have options, driven by an institutional intention
Acknowledgements and Next Steps
• What made this possible
– Annie Peterson – summer intern in the UCLA Library preservation office, from UIUC GSLIS. What made it possible
– Willingness by all to try a “Cynefin” style of work -- gradual sense-making and continuous process improvement
• What would make it easier
– In-house statistics expertise and research support
– Longer stretches of uninterrupted time
– Better serials data – Communal 583 + Local Holdings Records
• What comes next
– More of the same, to refine and test our process
– Application to other activities: gifts & exchange, replacements and preservation-driven acquisition, preservation survey & audit
For More Information
ReCAP Data Center (Columbia University)
• Zack Lane & Colleen Major, “Impact Theories: Trends in Off-site Shelving Facility Use” (ACRL, 2008)
HathiTrust Digital Library
• Helen Look, “Mass Digitization: Analyzing Online vs. Print Usage at a Large Academic Research Library” (ARL, 2010)
UCLA Library Preservation Blog
• Jake Nadal and John Riemer, “Preservation Actions, MARC 21 Field 583, and Communal Local Holdings in OCLC WorldCat” (CONSER, 2009)
Questions, Comments?
Zack Lane - zl2114@columbia.edu
Helen Look - hlook@umich.edu
Jacob Nadal - jnadal@library.ucla.edu
http://www.oclc.org/research/events/webinars.htm
http://itunes.apple.com/podcast/oclc-research-podcasts-webinars/id284764834