Andrew Janes
Senior Archivist – Future Catalogues
UKAD forum, 17 March 2016
Twitter: @cartivist #UKArcDiscovery16
State of the Catalogue
C Madsen and M Hurst, Resource Discovery @ The University of Oxford 2015, p 24
http://discovery.bodleian.ox.ac.uk/wp-content/uploads/sites/5/2016/02/OxfordResourceDiscoveryReport_Final_public.pdf
“To state the obvious, collections cannot be
discovered using electronic search tools
unless they have some sort of
representative electronic description.”
S van Hooland and R Verborgh, Linked Data for Libraries, Archives and Museums (2014), p 71
“All metadata is dirty,
but you can do something about it.”
The next 15 minutes
• Data-(in)adequacy: the good, the bad and the ugly
• How did we end up like this?
• The State of the Catalogue approach
• “Data-centric” or “neo-Jenkinsonian”?
TNA’s catalogue within Discovery
[Diagram: Discovery draws on three sources: Finding Archives data, ePRO and DRI data, and TNA Catalogue data]
Inadequate catalogue data
“It does exactly what it says on the tin”
Reference: OS 33/1812
Description: Map showing the boundaries of the following places: Somerset: Clapton-in-Gordano (parish), Portbury (parish)
Date: 1964
Held by: The National Archives, Kew
Legal status: Public Record
Closure status: Open Document, Open Description
The bare minimum?
Essential elements according to ISAD(G)2*
• Reference code
• Title
• Date(s)
• Extent
• Level of description
• Name of creator(s)
TNA measures of basic data adequacy
• Description
• Date
• Unique reference code
Non-measurable data adequacy
• Hierarchical structure
• Browsing order
*ISAD(G), 2nd edition (2000), para I.12
www.ica.org/10207/standards/isadg-general-international-standard-archival-description-second-edition.html
The extent of the problem (or challenge)
File- and item-level catalogue entries that fail the Ronseal test as of January 2016:
Unreferenced items: 477,723
Files and items with no descriptions: 305,429
Files and items with blank dates: 454,345
Total: 1,237,497
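The three counts above correspond to TNA's three measures of basic data adequacy (description, date, unique reference code). As a minimal sketch, assuming entries are held as simple dictionaries with `reference`, `description` and `date` keys (these field names, and the `XX` references, are hypothetical and not taken from TNA's systems), a tally of this kind could look like the following. The last lines simply check that the three figures on the slide add up to the stated total.

```python
from collections import Counter


def data_problems(entry):
    """Return the basic data-adequacy problems found in one catalogue entry."""
    problems = []
    if not (entry.get("reference") or "").strip():
        problems.append("unreferenced")
    if not (entry.get("description") or "").strip():
        problems.append("no description")
    if not (entry.get("date") or "").strip():
        problems.append("blank date")
    return problems


# Illustrative entries only. The first is the example record quoted earlier;
# the others are made up, including the "XX" reference.
entries = [
    {
        "reference": "OS 33/1812",
        "description": "Map showing the boundaries of the following places: "
        "Somerset: Clapton-in-Gordano (parish), Portbury (parish)",
        "date": "1964",
    },
    {"reference": "", "description": "Hypothetical item", "date": "1950"},
    {"reference": "XX 1/2", "description": "", "date": ""},
]

# Count problems, not entries: an entry with two gaps contributes twice.
tally = Counter(p for e in entries for p in data_problems(e))
total = sum(tally.values())
print(dict(tally), total)

# Arithmetic check of the figures reported on the slide.
reported = {"unreferenced": 477_723, "no description": 305_429, "blank date": 454_345}
reported_total = sum(reported.values())
print(reported_total)
```

Counting per problem rather than per entry is a design choice made here for illustration; the slide does not say how an entry with several gaps is counted.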
Uneven distribution of data problems
30% in worst 10 series
39% in worst 20 series
56% in worst 50 series
69% in worst 100 series
81% in worst 200 series
92% in worst 500 series
97% in worst 1000 series
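Figures like these can be derived by ranking series by their number of data problems and taking cumulative shares. The sketch below shows one way to do that; the per-series counts and the series codes are invented for illustration and are not TNA data.

```python
def share_in_worst(counts, n):
    """Fraction of all problems that fall in the n series with the most problems."""
    ranked = sorted(counts.values(), reverse=True)
    total = sum(ranked)
    return sum(ranked[:n]) / total if total else 0.0


# Hypothetical problem counts per series (series codes are made up).
counts = {"AA 1": 500, "BB 2": 300, "CC 3": 120, "DD 4": 50, "EE 5": 30}

for n in (1, 2, 3):
    print(f"{share_in_worst(counts, n):.0%} in worst {n} series")
```

A steep curve of this kind is what makes working series by series efficient: fixing a small number of series removes a large share of the problems.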
PRO finding aids system in the mid 1990s
Creating PROCAT, the original PRO online catalogue
• Series level and above: curated entries
• Subseries level and below: bulk conversion of entries
Residual issues after retrospective conversion
OS 33/1812
[blank description]
[blank date]
OS 33/1812
Clapton-in-Gordano
1964
OS 33/1812
Portbury
1964
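The example above shows one reference carrying three entries: a blank one and two partial ones. A minimal sketch of how such cases might be found in exported data is given below, assuming entries are dictionaries with `reference`, `description` and `date` keys (hypothetical field names). It only gathers the evidence under each repeated reference; deciding how to word the merged description remains an editorial judgement.

```python
from collections import defaultdict

# The three entries shown on the slide.
entries = [
    {"reference": "OS 33/1812", "description": "", "date": ""},
    {"reference": "OS 33/1812", "description": "Clapton-in-Gordano", "date": "1964"},
    {"reference": "OS 33/1812", "description": "Portbury", "date": "1964"},
]


def group_by_reference(rows):
    """Group entries by their reference code."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["reference"]].append(row)
    return groups


def summarise(rows):
    """Collect the non-blank descriptions and distinct dates under one reference."""
    descriptions = [r["description"] for r in rows if r["description"].strip()]
    dates = sorted({r["date"] for r in rows if r["date"].strip()})
    return {"descriptions": descriptions, "dates": dates}


groups = group_by_reference(entries)
for ref, rows in groups.items():
    if len(rows) > 1:  # a reference that is not unique
        print(ref, len(rows), summarise(rows))
```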
Subsequent developments
Data enhancement
• More conversion of paper finding aids
• More cataloguing from original records
• Tidying up for data-adequacy
• Correcting errors
• Accessions: more records, more data
Changes in the context of presenting the data
• Users expect to search bottom-up not top-down
• Growth of digital and digitised content
• Discovery: a combined search tool
Where are we now with our catalogue data?
• Grows and improves every day
• Multifaceted approach to enhancement
• Very uneven richness and granularity
• 1 million+ major inadequacies
State of the Catalogue: purpose
• Assess the scope and extent of bad data
• Reduce bad data proactively and efficiently
• Working series by series
• Making records more accessible
• Facilitating research
• Influence wider cataloguing priorities and decisions
State of the Catalogue: in and out of scope
DO
• Understand context, structure and content
• Consult relevant colleagues
• Check and fix errors and anomalies
• Edit data in bulk
DON’T
• Check the records systematically
• Expand descriptions from paper finding aids
• Enforce finer detail of descriptive standards
• Proofread every line
State of the Catalogue: typical outline process
1. Download data to Excel
2. Fix errors and anomalies
3. Make bulk changes
4. Fix errors and anomalies
5. Upload data to PROCAT Editorial
6. Manual editing (e.g. of subseries)
7. Release into Discovery
8. Delete any redundant entries
9. Check and mop up
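Steps 1 to 5 are carried out in Excel in the process described here. Purely as an illustration of the same shape of workflow, the sketch below does the equivalent in Python on a small CSV export. The column names, the `XX` references, the sample rows and the particular bulk change are all assumptions made for the example, and the final write to a CSV string merely stands in for the upload to PROCAT Editorial.

```python
import csv
import io

# Stand-in for downloaded data (step 1). Column names and rows are hypothetical.
EXPORT = (
    "reference,description,date\n"
    "XX 1/1,  Clapton-in-Gordano ,1964\n"
    "XX 1/2,Portbury  (Par.),\n"
)


def normalise(text):
    """Strip the ends and collapse runs of whitespace to single spaces."""
    return " ".join(text.split())


def clean(rows):
    """Step 2: fix simple anomalies in every field."""
    return [{k: normalise(v or "") for k, v in row.items()} for row in rows]


def bulk_change(rows, old, new):
    """Step 3: apply one find-and-replace to every description."""
    return [{**r, "description": r["description"].replace(old, new)} for r in rows]


def remaining_problems(rows):
    """Step 4: list references that still lack a description or a date."""
    return [r["reference"] for r in rows if not r["description"] or not r["date"]]


rows = list(csv.DictReader(io.StringIO(EXPORT)))  # step 1: load the export
rows = clean(rows)                                 # step 2
rows = bulk_change(rows, "(Par.)", "(parish)")     # step 3 (hypothetical change)
flagged = remaining_problems(rows)                 # step 4

out = io.StringIO()                                # step 5: write data back out
writer = csv.DictWriter(
    out, fieldnames=["reference", "description", "date"], lineterminator="\n"
)
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
print("Still to fix:", flagged)
```

Entries flagged at step 4 are the ones that would need manual attention later in the process, since no bulk change can supply a date that is simply missing.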
Some simple and useful data tools in Excel
• Sorting
• Filtering
• Conditional formatting
• Text to columns
• Functions
Functions include:
• AND, OR, NOT
• IF
• CONCATENATE
• LEFT, MID, RIGHT
• VLOOKUP
• LEN
• TRIM
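For readers more used to scripting than spreadsheets, rough Python equivalents of the Excel functions listed above are sketched below. These are approximations written for illustration: the helper names are my own, `vlookup` covers only the exact-match case, and the lookup table is made up.

```python
def trim(text):
    """Like Excel TRIM: drop leading/trailing spaces, collapse runs of spaces."""
    return " ".join(part for part in text.split(" ") if part)


def left(text, n):
    """Like Excel LEFT: the first n characters."""
    return text[:n]


def right(text, n):
    """Like Excel RIGHT: the last n characters."""
    return text[-n:] if n > 0 else ""


def mid(text, start, n):
    """Like Excel MID: n characters from a 1-based start position."""
    return text[start - 1 : start - 1 + n]


def concatenate(*parts):
    """Like Excel CONCATENATE: join the arguments into one string."""
    return "".join(str(p) for p in parts)


def vlookup(key, table, default=None):
    """Exact-match lookup, similar to VLOOKUP with range_lookup set to FALSE."""
    return table.get(key, default)


ref = trim("  OS   33/1812 ")
print(ref, len(ref))                           # LEN is the built-in len()
print(left(ref, 2), mid(ref, 4, 2), right(ref, 4))

# IF with AND/OR/NOT maps onto ordinary conditional expressions.
description, date = "", "1964"
status = "adequate" if (description and date) else "needs work"
print(status)

counties = {"Som": "Somerset"}                 # hypothetical lookup table
print(concatenate(vlookup("Som", counties), ": ", "Portbury"))
```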
Progress in eliminating bad data
• January 2013:
2.85 million data problems
• January 2016:
1.23 million data problems
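To put the two figures in proportion, the short calculation below works out the reduction over the three years. It uses only the rounded numbers quoted above, so the results are approximate.

```python
before = 2_850_000   # January 2013, as quoted above
after = 1_230_000    # January 2016, as quoted above
years = 3

reduction = before - after
percent = round(reduction / before * 100, 1)
per_year = reduction // years

print(f"Reduction: {reduction:,} data problems ({percent}%)")
print(f"Average per year: {per_year:,}")
```

On these figures, a little over half of the data problems counted in January 2013 had been eliminated by January 2016.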
Is State of the Catalogue “data-centric”?
• A radically traditional approach
• Moral defence of the record requires adequate data
• Helps redress huge variation in depth and richness of catalogue data (see MAD’s rule against bias*)
• Prioritises adequate breadth over extra depth
• A record-centric approach
• A data-centric approach?
*M Procter and M Cook, A Manual of Archival Description, 3rd edition (2000), para 8.6E