Date posted: 16-Dec-2014
Uploaded by: dan-chudnov
CTS* at LC**
Daniel Chudnov - 2010-10-15 - dchud at loc gov
Access 2010 - Winnipeg
* Content Transfer Services
** Library of Congress
follow along at
slideshare.net / dchud
work in progress
transfer, verification, inventory, reporting, workflow
notification, status, access
hard to show
but that won’t stop me
tinyurl.com/cts2010
when i’m done, this should make sense
NDNP
publishing breaking news*
online
* 100 years after it happens
chroniclingamerica.loc.gov
1,442,264 pages
last year at Access
2,692,369 pages
this year at Access
went live spring 2007
first two years 1.4M
last year 2.7M
56 TB content
117 TB in copies
how?
1. built a better access system
faster ingest: from 1 month to 1 day
2. workflow
CTS does this
[chart: batches and page counts by month]
first batch received 2005-10
went live spring 2007
press event
the gap
first CTS workflow 2009-09
2010-09
3-4 month lag
2-3 month lag
1-2 month lag
ingest rate approaches receipt rate
this makes us smile
content transfer services
some requirements
LC started digitizing in the 1980s
we have a lot of stuff
in a lot of places
distributed computing environment
commercial MFT* license
* Managed File Transfer
buy or build?
why, both, thank you
100s of collections
dozens of curatorial organizations
lots more stuff
coming every day
a long-term project
collecting and making available
services for transfer of content
any content
lots of transfers: “movage”
several services
transfer, verification, inventory, reporting, workflow
notification, status, access
transfer across systems, organizations, time
content transfer is risky
copies fail
bits go bad
drives get lost
you forget what you did
you forget what you had
people retire
software breaks
hardware breaks
three blizzards in DC
CTS helps make transfers reliable and resilient
reliable
know when you’ve succeeded
BagIt
packing slip for data
data in a Bag
.
|-- bag-info.txt
|-- bagit.txt
|-- data
|   |-- batch.xml
|   |-- batch_1.xml
|   |-- batch_ne_dewitt_rework
|   |   |-- 00206538016_batch.xml
|   |   |-- 00206538028_batch.xml
|   |   `-- sn99021999
|   `-- sn99021999
|       |-- 00206538016
|       |   |-- 0000.jp2
|       |   |-- 0000.pdf
|       |   |-- 0000.tif
|       |   |-- 0000.xml
|       |   |-- 0001.jp2
|       |   |-- 0001.pdf
|       |   ...
|-- manifest-md5.txt
`-- tagmanifest-md5.txt
bagit.txt identifies a bag
data/ is where the data starts
manifest-md5.txt is the packing slip
71607ad119be88c842268a76f0b6b9e9 data/sn99021999/00206538107/1884091301/0621.pdf
c602d2ac07508059ce5f5597e239b97f data/sn99021999/00206538120/1885100601/0831.xml
a59795bd1584532d5cbc0b1d82f75cf8 data/sn99021999/00206538016/1880061401/0593.pdf
3c64fac7e2d49671e0d93908ae42a779 data/sn99021999/00206539616/1888101801/0905.xml
03158a560baa7479b3805d2b45ee02cd data/sn99021999/00206538028/1880111501/0405.tif
fa56ea18580e1446939ed62709e5b2db data/sn99021999/00206538077/1883061901/1145.pdf
bf4fb83ff8305e8256970a3466c1a12d data/sn99021999/00206538120/1885061501/0043.pdf
8f3649fc812de74b9d9443ee90a8ac9c data/sn99021999/00206538120/1885111101/1109.tif
e0b83a7f9ca228271fdaecf6348e1cec data/sn99021999/00206538120/1885101201/0871.xml
1c2f84e12792c123ba0aabedd0c0bbad data/sn99021999/00206538107/1884071401/0197.xml
080e557fe9f68037605e5b80df4bc4ac data/sn99021999/0020653820A/1888050701/0543.tif
532efe32c156459d9d9589caf618f502 data/sn99021999/00206538120/1885071401/0250.tif
ce607af59a96f2656d9448f38ffda072 data/sn99021999/0020653820A/1888052801/0731.pdf
60b626d8fd40aca1b425e86a004bb055 data/sn99021999/00206539628/1888111801/0088.xml
a467cd62350334c7aa83cf1e9056c1c6 data/sn99021999/00206539616/1888091701/0629.jp2
1a434f7a4d843a2c8ffe8d0824fafc3f data/sn99021999/00206538028/1880120801/0482.jp2
22996d89b4a3334256afaddcaa0238d8 data/sn99021999/00206538016/1874102001/0259.jp2
36f550da273ad4c592fee1761c98322a data/sn99021999/00206538016/1880052201/0518.jp2
7f7ccec3f2afae896338498372fd476e data/sn99021999/00206539616/1888080101/0200.pdf
c247a5d74d0e7f857c534d935661adbe data/sn99021999/00206538107/1884072601/0286.jp2
4d497a18a154adcc8636239378ab340b data/sn99021999/00206539628/1889021101/0868.pdf
2e8ca2558b54b5c49b2f20a355a60895 data/sn99021999/00206538065/1882092001/0136.xml
fb71493048e5010100f18012f5060d42 data/sn99021999/00206538028/1880123001/0569.xml
40b100432890b055a5defbfbea815d57 data/sn99021999/00206538107/1884090901/0590.xml
46f6d61480dadc1c988b0baa4de8b6c4 data/sn99021999/00206539628/1888122801/0463.pdf
1cb8af0648e8c9df395b63226fe7371f data/sn99021999/00206538016/1874101501/0244.pdf
9257834023c683b02f354888b2740b8f data/sn99021999/00206539616/1888102301/0956.xml
0d52b3b2b1c5459b7e8d500a8566b0bf data/sn99021999/00206538120/1885080801/0425.tif
indicates two things
1. what i think i’m sending you
2. whether you received it
just like a packing slip
works across space
works across systems
works across orgs
works across time
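The two things a manifest indicates map directly onto a check the receiver can run. A minimal Python sketch, assuming a bag laid out like the tree above with MD5 checksums in manifest-md5.txt; the function names are hypothetical, and this is not how BIL or CTS actually implements verification. It reports files listed but missing, files present but unlisted, and files whose checksum no longer matches.

```python
import hashlib
import os


def md5_of(path, chunk_size=1 << 20):
    """Hex MD5 digest of a file, read in chunks so large TIFFs don't fill memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_manifest(bag_dir, name="manifest-md5.txt"):
    """What the sender thinks it sent: {relative path: checksum}."""
    expected = {}
    with open(os.path.join(bag_dir, name), encoding="utf-8") as fh:
        for line in fh:
            line = line.rstrip("\r\n")
            if not line.strip():
                continue  # tolerate blank lines
            checksum, path = line.split(None, 1)
            expected[path] = checksum.lower()
    return expected


def verify_bag(bag_dir):
    """Whether you received it: compare the manifest with the payload on disk."""
    expected = read_manifest(bag_dir)
    found = {}
    for root, _dirs, files in os.walk(os.path.join(bag_dir, "data")):
        for name in files:
            full = os.path.join(root, name)
            rel = os.path.relpath(full, bag_dir).replace(os.sep, "/")
            found[rel] = md5_of(full)
    return {
        "missing": sorted(expected.keys() - found.keys()),
        "unexpected": sorted(found.keys() - expected.keys()),
        "mismatched": sorted(
            p for p in expected.keys() & found.keys() if expected[p] != found[p]
        ),
    }
```

A bag whose three lists all come back empty arrived complete and intact; anything else names exactly which files need another look.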
easy to make
md5deep
BIL: BagIt Library
Bagger: desktop GUI
BIL is free software; Bagger will be soon
sf.net/projects/loc-xferutils/
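Making a bag is mostly bookkeeping, which is why small tools can do it. For illustration only, here is a minimal Python sketch that copies a payload into data/, writes a bagit.txt declaration, and writes manifest-md5.txt in the same checksum-then-path form as the manifest excerpt above. The function name make_bag and the default BagIt-Version value are assumptions, and the sketch omits bag-info.txt and tagmanifest-md5.txt, which the example bag also carries.

```python
import hashlib
import os
import shutil


def md5_of(path, chunk_size=1 << 20):
    """Hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_bag(bag_dir, payload_dir, bagit_version="0.97"):
    """Hypothetical helper: copy payload_dir to bag_dir/data and write the tag files."""
    data_dir = os.path.join(bag_dir, "data")
    shutil.copytree(payload_dir, data_dir)  # creates bag_dir; fails if data/ exists

    # Bag declaration: marks the directory as a bag (version string is an assumption).
    with open(os.path.join(bag_dir, "bagit.txt"), "w", encoding="utf-8") as fh:
        fh.write(f"BagIt-Version: {bagit_version}\n")
        fh.write("Tag-File-Character-Encoding: UTF-8\n")

    # Packing slip: one "checksum path" line per payload file, paths relative to bag_dir.
    entries = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            full = os.path.join(root, name)
            rel = os.path.relpath(full, bag_dir).replace(os.sep, "/")
            entries.append((rel, md5_of(full)))
    with open(os.path.join(bag_dir, "manifest-md5.txt"), "w", encoding="utf-8") as fh:
        for rel, checksum in sorted(entries):
            fh.write(f"{checksum} {rel}\n")
    return bag_dir
```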
see also: BagIt in Wikipedia
edsu++
reliability through bagging
resilience through persistence
verify that copies succeed
know when copies fail
repeat until copies succeed
debug & diagnose
record all of it
know what you have
know what you did
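The resilience loop above (verify, notice failure, retry, record) fits in a few lines. A hypothetical Python sketch, not CTS code: it assumes the expected MD5 comes from the bag's manifest, caps retries at max_attempts rather than looping forever, and keeps its record as a list of dicts where a real system would write events to its inventory.

```python
import hashlib
import shutil
import time


def md5_of(path, chunk_size=1 << 20):
    """Hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_until_verified(src, dst, expected_md5, max_attempts=3, delay_seconds=0.0):
    """Copy src to dst until dst's MD5 matches expected_md5; record every attempt.

    Returns (succeeded, log), where log has one dict per attempt.
    """
    log = []
    for attempt in range(1, max_attempts + 1):
        try:
            shutil.copyfile(src, dst)
            actual = md5_of(dst)
            ok = actual == expected_md5.lower()
            log.append({"attempt": attempt, "ok": ok, "md5": actual})
        except OSError as exc:  # missing source, full disk, permissions, ...
            ok = False
            log.append({"attempt": attempt, "ok": False, "error": str(exc)})
        if ok:
            return True, log
        if attempt < max_attempts:
            time.sleep(delay_seconds)  # pause before retrying
    return False, log
```

Bounding the retries is a deliberate choice: after the last attempt the log says what went wrong, so a person can debug and diagnose instead of watching a copy fail forever.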
inventory
BagIt checksums in a DB
content properties: project, process, type
event timeline
receipt, verification
QR, copies
accept/reject, ingest/release
comments
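The inventory described above (checksums, content properties, an event timeline, comments) fits a small relational schema. CTS itself used MySQL through Hibernate; the sketch below swaps in Python's built-in sqlite3 so it runs standalone, and every table, column, and function name is an assumption for illustration.

```python
import sqlite3

# Hypothetical schema: one row per bag, per payload file, per event, per comment.
SCHEMA = """
CREATE TABLE bags (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL,
    project TEXT, process TEXT, content_type TEXT
);
CREATE TABLE files (
    bag_id INTEGER NOT NULL REFERENCES bags(id),
    path TEXT NOT NULL,
    md5 TEXT NOT NULL,
    size_bytes INTEGER,
    PRIMARY KEY (bag_id, path)
);
CREATE TABLE events (
    id INTEGER PRIMARY KEY,
    bag_id INTEGER NOT NULL REFERENCES bags(id),
    event_type TEXT NOT NULL,
    occurred_at TEXT NOT NULL,
    outcome TEXT
);
CREATE TABLE comments (
    id INTEGER PRIMARY KEY,
    bag_id INTEGER NOT NULL REFERENCES bags(id),
    written_at TEXT NOT NULL,
    body TEXT NOT NULL
);
"""


def open_inventory(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_bag(conn, name, project, process, content_type, manifest):
    """Store a bag's properties plus its checksums; manifest maps path -> (md5, size)."""
    cur = conn.execute(
        "INSERT INTO bags (name, project, process, content_type) VALUES (?, ?, ?, ?)",
        (name, project, process, content_type),
    )
    bag_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO files (bag_id, path, md5, size_bytes) VALUES (?, ?, ?, ?)",
        [(bag_id, path, md5, size) for path, (md5, size) in manifest.items()],
    )
    conn.commit()
    return bag_id


def record_event(conn, bag_id, event_type, occurred_at, outcome=None):
    """Append to the timeline: receipt, verification, QR, copy, accept, ingest, ..."""
    conn.execute(
        "INSERT INTO events (bag_id, event_type, occurred_at, outcome) VALUES (?, ?, ?, ?)",
        (bag_id, event_type, occurred_at, outcome),
    )
    conn.commit()


def add_comment(conn, bag_id, written_at, body):
    conn.execute(
        "INSERT INTO comments (bag_id, written_at, body) VALUES (?, ?, ?)",
        (bag_id, written_at, body),
    )
    conn.commit()


def timeline(conn, bag_id):
    """The bag's life cycle, oldest event first."""
    return conn.execute(
        "SELECT occurred_at, event_type, outcome FROM events"
        " WHERE bag_id = ? ORDER BY occurred_at, id",
        (bag_id,),
    ).fetchall()
```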
life cycle of some set of content
basic facts
all the copies, project details
event timeline
comments along the way
life cycle of an NDNP batch
two key things
1. automated workflow
using jBPM
this part
process definition manages the steps
doesn’t let us forget
2. when content partners call, we can answer their questions
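CTS drives these steps with a jBPM process definition. jBPM is a Java engine, so what follows is not jBPM; it is a plain Python illustration of the same idea: a tiny state machine that refuses out-of-order steps and keeps a history, so nobody can forget or skip one. The state names and their order are assumptions loosely based on the event list above (receipt, verification, QR, copies, accept/reject, ingest/release).

```python
# Hypothetical step order; the real process definition may differ.
TRANSITIONS = {
    "received": {"verify": "verified"},
    "verified": {"qr": "qr_done"},
    "qr_done": {"copy": "copied"},
    "copied": {"accept": "accepted", "reject": "rejected"},
    "accepted": {"ingest": "ingested"},
    "ingested": {"release": "released"},
}


class WorkflowError(ValueError):
    """Raised when a step is attempted out of order."""


class BatchWorkflow:
    def __init__(self, batch_name):
        self.batch_name = batch_name
        self.state = "received"
        self.history = [("start", "received")]

    def allowed_actions(self):
        """What can happen next; an empty list means the batch is finished."""
        return sorted(TRANSITIONS.get(self.state, {}))

    def advance(self, action):
        """Take one step, or raise if the definition doesn't allow it here."""
        next_state = TRANSITIONS.get(self.state, {}).get(action)
        if next_state is None:
            raise WorkflowError(
                f"{self.batch_name}: {action!r} not allowed in state {self.state!r}"
            )
        self.state = next_state
        self.history.append((action, next_state))
        return next_state
```

Keeping the transitions in data rather than in code mirrors the point of a process definition: the steps can change without rewriting the logic that enforces them.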
reporting
answering our own questions
annual reports: very important
file counts, overall size, etc.
used to be very difficult to determine
now: immediate, anytime
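With file sizes already in the inventory, an annual-report figure becomes one aggregate query instead of a crawl across file systems. A sketch in Python with sqlite3, assuming a minimal hypothetical bags/files schema (not the CTS schema):

```python
import sqlite3

# Minimal hypothetical inventory: bags belong to projects, files carry sizes.
SCHEMA = """
CREATE TABLE bags (id INTEGER PRIMARY KEY, name TEXT NOT NULL, project TEXT NOT NULL);
CREATE TABLE files (bag_id INTEGER NOT NULL REFERENCES bags(id),
                    path TEXT NOT NULL, size_bytes INTEGER NOT NULL);
"""


def open_inventory(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def project_totals(conn):
    """(project, bag count, file count, total bytes) per project."""
    return conn.execute(
        """
        SELECT b.project,
               COUNT(DISTINCT b.id),
               COUNT(f.path),
               COALESCE(SUM(f.size_bytes), 0)
        FROM bags AS b
        LEFT JOIN files AS f ON f.bag_id = b.id
        GROUP BY b.project
        ORDER BY b.project
        """
    ).fetchall()
```

The LEFT JOIN keeps projects whose bags have no files recorded yet, reporting them with zero files and zero bytes instead of dropping them.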
mostly NDNP
newer partners
also project reporting / planning
NDNP batches - one awardee
NDNP batches - all awardees (same data, CSV export)
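A CSV export is the same report rows written out with a header. A minimal sketch using Python's csv module; the function name and column names are hypothetical.

```python
import csv
import io


def rows_to_csv(header, rows):
    """Render report rows as CSV text, suitable for a spreadsheet download."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()
```

csv.writer handles quoting, so a name containing a comma stays in one column instead of shifting the rest of the row.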
provides a 5000’ view
workflow
working status at a glance
a personalized view
overview of a whole project
overview of a system
overview of a person
not exactly “Facebook for bags”
but kinda
but wait, there’s more
browse live copies
go right to the content
many benefits
aaaand...
a RESTy web API
we can build complex workflows with inventory and reporting in CTS
we can build QR/workflow/auditing outside of CTS, with inventory and reporting through CTS
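Building QR or auditing tools outside CTS means calling its web API over HTTP. The slides do not describe that API, so the endpoint layout (/bags/&lt;name&gt;), the JSON field state, and the value verified used below are all hypothetical; this only sketches the pattern with Python's standard library.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen


def bag_url(base_url, bag_name):
    """Hypothetical resource URL for one bag in the inventory."""
    return f"{base_url.rstrip('/')}/bags/{quote(bag_name, safe='')}"


def fetch_json(url, timeout=10):
    """GET a URL and decode its JSON body."""
    req = Request(url, headers={"Accept": "application/json"})
    with urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def needs_qr(bag_record):
    """An external QR tool might pick up bags that are verified but not yet reviewed."""
    return bag_record.get("state") == "verified"


def bags_needing_qr(base_url, bag_names):
    """Ask the inventory about each bag and return the ones ready for QR."""
    return [name for name in bag_names if needs_qr(fetch_json(bag_url(base_url, name)))]
```

The external tool owns its own QR logic (needs_qr) while the inventory stays the single source of facts about each bag.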
CTS: java, spring, mysql, hibernate, velocity, tiles, jquery, jBPM, jetty
NDNP: python, django, mysql, solr, apache
nice clean interfaces, nice separation
different coders, different styles
same benefits from using CTS
what’s next?
many more content collections
now: NDNP, Web Archives, NDIIPP, Copyright Cards
next: P&P, G&M, WDL, AFC, Twitter, Copyright EDeposit
also coming:
more simple workflows
“Receive and Copy”
fits many use cases
receive, bag/verify
copy to archival, copy to access
works for recon, works for new stuff
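The “Receive and Copy” steps can be composed from the same pieces: verify the received bag against its manifest, copy it to an archival location and an access location, then verify each copy. A hypothetical Python sketch, assuming local file system paths and MD5 manifests; to stay short it raises on the first failure rather than retrying.

```python
import hashlib
import os
import shutil


def md5_of(path, chunk_size=1 << 20):
    """Hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bag_is_valid(bag_dir):
    """True if every manifest entry exists with a matching MD5 (extra files not checked)."""
    with open(os.path.join(bag_dir, "manifest-md5.txt"), encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            checksum, rel = line.rstrip("\r\n").split(None, 1)
            full = os.path.join(bag_dir, *rel.split("/"))
            if not os.path.isfile(full) or md5_of(full) != checksum.lower():
                return False
    return True


def receive_and_copy(bag_dir, archival_root, access_root):
    """Hypothetical workflow: verify the bag, copy it to two places, verify each copy."""
    if not bag_is_valid(bag_dir):
        raise ValueError(f"received bag failed verification: {bag_dir}")
    name = os.path.basename(os.path.normpath(bag_dir))
    copies = []
    for root in (archival_root, access_root):
        dest = os.path.join(root, name)
        shutil.copytree(bag_dir, dest)  # creates root if needed; fails if dest exists
        if not bag_is_valid(dest):
            raise ValueError(f"copy failed verification: {dest}")
        copies.append(dest)
    return copies
```

Because every step checks the same manifest, the workflow is identical for a recon batch and for newly arriving content.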
and, get past typical problems:
permissions, insufficient storage
failed copies
connection
with high expectation
and, finally, a UI redesign
thanks!
BagIt - wikipedia
sf.net/projects/loc-xferutils/
hooray for protovis
@dchud - dchud at loc gov