Date post: 25-Apr-2019
Columbia’s Born-Digital Preservation Infrastructure & Ford International Fellowships Program
Transcript
Page 1: Columbia’s Born-Digital Preservation Infrastructuredaviss/work/files/presentations/CCA_presentation... · Columbia Libraries Digital Program 1. Collection-based digitization 2.

Columbia’s Born-Digital Preservation Infrastructure

& Ford International Fellowships Program

Columbia University

Columbia Libraries Digital Program

1. Collection-based digitization

2. Long-term digital preservation

3. Website development

4. Digital Library infrastructure development

5. Born-digital collection archiving and access

Ford International Fellowships Program

“The IFP has since 2001 offered fellowships for post-graduate study to leaders from underserved communities in Asia, Africa, Latin America, and Russia, and will complete its work in 2014. Their archives include documentation and videos of the more than 3,300 IFP fellows who passed through the program as well as comprehensive planning and administrative files.”

Ford IFP Grant

• Received in October 2011

• Become the permanent archive for IFP’s paper and digital archives

• $1 million

• 3 years (technology portion)

• Archive and provide access to IFP’s archives

• And …

“ … to build out a full set of repository-based systems and services so that it can more easily acquire, ingest, process, preserve and make accessible both paper and born-digital organizational records.”

High-Level View

Countries Harvested From:

• Brazil

• Chile and Peru

• China

• Egypt

• Ghana

• Guatemala

• India

• Indonesia

• Kenya

• Mexico

• Mozambique

• Nigeria

• Palestine (Gaza and West Bank)

• Philippines

• Russia

• Senegal

• South Africa

• Tanzania

• Thailand

• Uganda

• United States - NYC Secretariat

• Vietnam

Languages Encountered:

• English

• Russian

• Portuguese

• Spanish

• Chinese

• Arabic

• Indonesian

• French

• Thai

• Vietnamese

Files Harvested:

334,000 and counting …

File Formats Encountered:

32, 3gp, accdb, adb, adp, adx, ai, aif, amr, asf, avi, axd, back, bat, bin, bk, blb, bmp, BridgeSort, btr, bup, cab, cat, cda, cdr, cfg, chm, cnf, cnm, con, css, cst, csv, cxt, d, dat, db, dbf, ddb, ddx, dfont, dir, dll, dmi, doc, doc-MRB, docm, docx, dot, ds_store, dtd, dwz, dxr, edb, edx, eml, emz, eps, exe, F&A, fcp, fff, fh9, fil, flp, flv, fol, frm, gdb, gdx, gif, hdb, hdx, hk4, hlp, hta, htm, html, ico, idx, ifo, inc, indd, inf, info, ini, itc2, itdb, itl, jar, jp2, jpe, jpeg, jpg, js, l, lck, ldb, lnk, log, m4a, m4v, mbx, mdb, mde, mdi, mdx, mht, mid, mls, mno, mov, mp3, mp4, mpeg, mpg, mpp, msf, msg, msi, mso, msv, mswmm, nri, ocx, odc, odt, ofa, oft, opd, opf, otf, p65, pab, pages, pcx, pdf, php, pif, plist, pm, pm!, pm0, pm5, pmd, pmh, pmi, pmj, pml, pmm, pmo, pmr, pms, pmx, pnc, pnd, png, pns, pnx, pot, pps, ppsx, ppt, pptx, prod, prod1, properties, psd, psp, pst, pub, qpw, qxd, r, ra, ra-att, rar, rdp, rel, rels, rem, rex, rpt, rsc, rtf, sav, sc4, sdb, sdx, sh, shs, snm, spi, spss, spv, spx, sql, svn-base, swa, swf, sys, tdb, tdx, thm, thmx, tif, tiff, tlb, tmp, toc, tpl, ttf, txt, txz, up, url, usr, utf8, vcf, vdproj, vob, vsd, wav, wbk, webarchive, wma, wmf, wmv, wmz, wpd, wpl, wps, xla, xlk, xls, xlsb, xlsm, xlsx, xlw, xml, xps, zip (243 different file formats)
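A tally like the one above could be produced by walking the harvested directories and counting files per extension. The following is a minimal sketch, not the project's actual tooling: the function names `extension_of` and `tally_extensions` are hypothetical, and lowercasing extensions is an assumption (the slide itself preserves case in entries such as BridgeSort).

```python
import os
from collections import Counter


def extension_of(filename):
    """Return the text after the last dot, lowercased, or "(none)".

    Dotfiles such as ".DS_Store" are treated as having the extension
    "ds_store", which mirrors how the slide lists them (an assumption).
    """
    stem, dot, ext = filename.rpartition(".")
    if not dot or not ext:
        return "(none)"
    return ext.lower()


def tally_extensions(root):
    """Walk a directory tree and count files per extension."""
    counts = Counter()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            counts[extension_of(name)] += 1
    return counts
```

Calling `len(tally_extensions(root))` on a harvest directory gives the number of distinct extensions, and `tally_extensions(root).most_common(10)` shows which ones dominate. Note that an extension is only a hint about the actual file format; format identification tools inspect file contents instead.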

High-Level Workflow:

Technology Tools:

• FRED (Forensic Recovery of Evidence Device) – hardware / OS

• Forensic Toolkit (FTK) – suite of tools

• Archivematica – preservation analysis and packaging

• Fedora – enterprise-level repository solution

• Archivists’ Toolkit – archival processing tool

• Solr – powerful Lucene-based search server

• Blacklight – open source discovery interface

Preservation ‘Curation’:

• From each original file, generate derivative formats that are easier to preserve and to access than the original, e.g.,

• From MS .doc and .docx files generate .rtf and/or PDF/A

• From MS .xls and .xlsx files generate tab-delimited format

• From HD video files generate Motion JPEG 2000

• Database files? SPSS files? Pro Tools audio files?

• ‘Legacy’ file formats?
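The rules above can be read as a lookup from source format to preservation derivatives, with the open questions falling through to manual review. The following is a minimal sketch under that reading, not the project's implementation: `DERIVATIVE_RULES`, `plan_derivatives`, and the `is_hd_video` flag are hypothetical names, and the sketch only plans conversions rather than performing them.

```python
# Source extension -> derivative formats named on the slide.
DERIVATIVE_RULES = {
    "doc": ["RTF", "PDF/A"],
    "docx": ["RTF", "PDF/A"],
    "xls": ["tab-delimited text"],
    "xlsx": ["tab-delimited text"],
}


def plan_derivatives(filename, is_hd_video=False):
    """Return the derivative formats to generate for one file.

    An empty list means no rule exists yet (databases, SPSS files,
    Pro Tools audio, legacy formats), so the file needs a curatorial
    decision. Whether a video counts as HD cannot be told from its
    name, so the caller supplies that as a flag (an assumption here).
    """
    if is_hd_video:
        return ["Motion JPEG 2000"]
    _stem, dot, ext = filename.rpartition(".")
    if not dot:
        return []
    # Return a copy so callers cannot mutate the shared rule table.
    return list(DERIVATIVE_RULES.get(ext.lower(), []))
```

For example, `plan_derivatives("memo.docx")` yields the RTF and PDF/A targets, while `plan_derivatives("survey.sav")` yields an empty list, flagging the file for review. Keeping the rules in a plain table makes it easy to add entries as decisions are reached on the unresolved formats.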

Special IFP Project Challenges …

• Determining and encoding intellectual property rights

• Determining and encoding information relating to privacy and access

• Metadata creation / extraction

• Working with 23 separate entities in advance of their data deliveries and office closings

• Building scalable workflows

• Building scalable storage infrastructure
