
SDSC Discussion Points for the E-Legislature Project Meeting · resources and metadata information...

Date post: 22-Jul-2020
1 eLegislature September 12, 2005 Discussion Points Preservation of the records of the e-Legislature Richard Marciano [email protected]
Transcript
Page 1: SDSC Discussion Points for the E-Legislature Project Meeting · resources and metadata information SRB Space. eLegislature September 12, 2005 6 Storage Resource Broker Collections

1eLegislature September 12, 2005

Discussion Points: Preservation of the records of the e-Legislature

Richard Marciano [email protected]


Methodologies for Preservation & Access of Software-dependent Electronic Records

Exploring the ability to compare versions of records, retrieve changes, run historical queries.

SALT

Research & develop prototypes that will lead to the creation of useful tools for archivists to preserve and provide access to electronic records over the long-term

While an RMA is capable of storing and providing access to electronic records, it cannot ensure that they remain accessible as software becomes obsolete.

Test and evaluate the best available means to implement a cost-effective application and architecture for preserving electronic records.

[Figure: timeline of SDSC preservation-related projects, with year marks at 1996, 1998, 2000, 2001, 2002, 2003, 2005 and 2007. Projects shown: MDAS, DOCT, SRB, NSDL, LoC, NDIIPP w. CDL, PERM, ICAP, PAT, Archivist Workbench, NARA, Interpares, DigArch, e-Legislature, Chronopolis ?, Cassis ?. Annotations: "Integration of DB and archival storage – Elimination of notion of file" and "Large scale persistent object computational support -- distributed document handling system".]

Color Legend (Funding Agency): DARPA, NARA, NHPRC, NSF, LoC

Newly Created Research Labs at SDSC

Storage Resource Broker – Arcot Rajasekar

Sustainable Archives & Library Technologies – Richard Marciano


“Born Digital” Content

DIGARCH / VanMAP / e-Legislature

“CASSIS”

The Opportunity…


Storage Resource Broker

The SDSC Storage Resource Broker is client-server middleware that virtualizes the data space by providing a unified view of multiple heterogeneous storage resources over the network.

It is software that sits between users and resources and provides a storage service by managing users, file locations, storage resources, and metadata information.

SRB Space


Storage Resource Broker Collections at SDSC (8/2/2005)

GBs of data stored  Number of files  Users with ACLs  Collection

Data Grid
            53,862        9,536,751              100  NSF/ITR - National Virtual Observatory
            36,149        7,539,180              380  NSF - National Partnership for Advanced Computational Infrastructure
             8,013          161,352              227  Static collections - Hayden planetarium
            12,998        6,707,952               68  Pzone - public collections
            40,155           76,083               67  NSF/NPACI - Biology and Environmental collections
            15,731        1,577,260               55  NSF/NPACI - Joint Center for Structural Genomics
           176,730        2,125,945            3,267  NSF - TeraGrid, ENZO Cosmology simulations
            10,561        7,596,888              303  NIH - Biomedical Informatics Research Network

Digital Library
               256            9,033               36  NSF/NPACI - Long Term Ecological Reserve
             2,620           53,048              460  NSF/NPACI - Grid Portal
               741           84,594               21  NIH - Alliance for Cell Signaling microarray data
             2,733        1,083,998               27  NSF - National Science Digital Library SIO Explorer collection
           131,010        2,702,421               73  NSF/ITR - Southern California Earthquake Center

Persistent Archive
               100          382,186               28  NHPRC Persistent Archive Testbed (Kentucky, Ohio, Michigan, Minnesota)
             4,147          408,050               29  UCSD Libraries archive
             1,478          893,434               58  NARA - Research Prototype Persistent Archive
             3,600       27,034,150              136  NSF - National Science Digital Library persistent archive

            501 TB       68 million            5,335  TOTAL
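As a quick arithmetic check, the per-collection figures in the table above can be summed and compared with the TOTAL row. The sketch below copies the numbers from the slide; the variable names are illustrative only.

```python
# (GBs stored, number of files, users with ACLs), copied row by row from the slide.
rows = [
    # Data Grid
    (53_862, 9_536_751, 100), (36_149, 7_539_180, 380), (8_013, 161_352, 227),
    (12_998, 6_707_952, 68), (40_155, 76_083, 67), (15_731, 1_577_260, 55),
    (176_730, 2_125_945, 3_267), (10_561, 7_596_888, 303),
    # Digital Library
    (256, 9_033, 36), (2_620, 53_048, 460), (741, 84_594, 21),
    (2_733, 1_083_998, 27), (131_010, 2_702_421, 73),
    # Persistent Archive
    (100, 382_186, 28), (4_147, 408_050, 29), (1_478, 893_434, 58),
    (3_600, 27_034_150, 136),
]

total_gb = sum(r[0] for r in rows)
total_files = sum(r[1] for r in rows)
total_users = sum(r[2] for r in rows)

# Rounded the same way as the TOTAL row (TB and millions of files).
print(round(total_gb / 1000), "TB,", round(total_files / 1e6), "million files,", total_users, "users")
```

The sums agree with the rounded totals quoted on the slide when 1 TB is taken as 1,000 GB.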


Senate Legislative Collection

from the 106th US Congress database

keeps track of Senate bills, resolutions, and amendments

raw format: 99 RTF (Rich Text Format) files on CD-ROM (provided by NARA)

one file per senator


Legislative Bio Collection: NARA, 106th Senate

Raw data: rtf (Senator 1, Senator 2, ... Senator 99)

Paul S. Sarbanes of Maryland
January 06, 1999 to March 31, 2000

Section I: Sponsored measures
Section II: Cosponsored measures
Section III: Sponsored measures organized by committee referral
* Senate: Armed Services
* Senate: Banking
* House: Judiciary
Section IV: Cosponsored measures organized by committee referral
* Senate: Agriculture
* House: Science
Section V: Sponsored amendments
Section VI: Cosponsored amendments
Section VII: Subject index to measures and amendments

**** S. 151
Date Introduced: 01/19/1999
Cosponsors: NONE
Official title: A bill to amend the International Maritime Satellite Telecommunications Act…
Latest status: Jan 19, 1999 Read twice and referred to the Committee on Commerce
Abstract: NONE

Subject Index:
Academic Performance: S.7, S.514, S.564
Access to Health Care: S.6, S.1678, S.1690
…
Zoning and zoning law: S.9, S.Con.Res.10, S.Res.41, S.J.Res.39


Senate Legislative Collection

• What you see:

**** S. 345
DATE INTRODUCED: 02/03/1999
SPONSOR: Allard
OFFICIAL TITLE
A bill to amend the Animal Welfare Act to remove the limitation that permits interstate movement of live birds, for the purpose of fighting, to States in which animal fighting is lawful.
LATEST STATUS
Feb 3, 1999 Read twice and referred to the Committee on Agriculture.

… is maybe NOT what you get (a not so well documented format):

^@^@y^K^@^@\206^K^@^@Ê^K^@^@Ô^K^@^@^@^L^@^@^N^L^@^@u^L^@^@\202^L^@^@È^L^@^@Ò^L^@^@ÿ\^L^@^@^M^M^@^@j^M^@^@w^M^@^@»^M^@^@Æ^M^@^@ô^M^@^@^B^N^@^@\203^N^@^@÷ëßÓëßǹ¹®¨®Â\Â\230Â\230 Â\230Â\230®¨®Â Â\230Â\230 Â\230Â\230 Â\230Â\230 Â\230Â\230 Â\230Â\

^N6^H\201OJ^C^@QJ^C^@]^H\201^@^N5^H\201OJ^C^@QJ^C^@\^H\201^@^K^B^H\201OJ^C^@QJ^C^@^\...

^ction sent to the House.^M^M**** S. 345^MDATE INTRODUCED: 02/03/1999^MSPONSOR: Alla\rd^MOFFICIAL TITLE^MA bill to amend the Animal Welfare Act to remove the limitation\that permits interstate movement of live birds, for the purpose of fighting, to St\

ates in which animal fighting is lawful.^MLATEST STATUS^MFeb 3, 1999 Read twice \and referred to the Committee on Agriculture.^M^M**** S. 387^MDATE INTRODUCED: 02/0\8/1999^MSPONSOR: McConnell^MOFFICIAL TITLE^MA bill to amend the Internal Revenue Co\d
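The dump above suggests that the readable records sit between runs of control bytes, with carriage returns (shown as ^M) as line breaks and `**** ` opening each measure. A minimal sketch of recovering records from such bytes follows. It assumes the backslashes in the dump are line-wrap marks of the viewer rather than file content, and that the field labels are exactly those visible above; `parse_records` and `FIELD_LABELS` are hypothetical names, not part of any tool used in the project.

```python
import re

# Labels whose value appears on the following line(s), as seen in the dump.
FIELD_LABELS = ("OFFICIAL TITLE", "LATEST STATUS")

def parse_records(raw: bytes) -> list[dict]:
    # Decode permissively and turn carriage returns into newlines.
    text = raw.decode("latin-1").replace("\r\n", "\n").replace("\r", "\n")
    # Drop the remaining control characters such as NUL (^@) and VT (^K).
    text = re.sub(r"[\x00-\x09\x0b-\x1f\x7f-\x9f]", "", text)
    records = []
    # Each measure starts with a line of the form "**** S. 345".
    for chunk in re.split(r"^\*{4} ", text, flags=re.MULTILINE)[1:]:
        lines = [ln.strip() for ln in chunk.split("\n") if ln.strip()]
        record = {"name": lines[0]}
        label = None
        for ln in lines[1:]:
            if ln in FIELD_LABELS:
                label = ln  # the value follows on the next line(s)
            elif label is None and ":" in ln and ln.split(":", 1)[0].isupper():
                key, value = ln.split(":", 1)
                record[key] = value.strip()
            elif label is not None:
                record[label] = (record.get(label, "") + " " + ln).strip()
        records.append(record)
    return records

# Sample bytes modelled on the dump above (control bytes, then one record).
sample = (
    b"\x00\x00y\x0b\x00\x00ction sent to the House.\r\r"
    b"**** S. 345\rDATE INTRODUCED: 02/03/1999\rSPONSOR: Allard\rOFFICIAL TITLE\r"
    b"A bill to amend the Animal Welfare Act to remove the limitation that permits "
    b"interstate movement of live birds, for the purpose of fighting, to States in "
    b"which animal fighting is lawful.\r"
    b"LATEST STATUS\rFeb 3, 1999 Read twice and referred to the Committee on Agriculture.\r\r"
)
records = parse_records(sample)
print(records[0]["name"], records[0]["SPONSOR"])
```

A real ingest would also have to deal with whatever binary structure surrounds the text, which this sketch simply discards.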


Senate Collection Example

… the XML can be lifted from the presentation level:

<p bold="off">**** S. 345</p>
<p align="right" bold="off">DATE INTRODUCED: 02/03/1999</p>
<p bold="off">SPONSOR: Allard</p>
<p align="center" bold="off" italic="off">OFFICIAL TITLE</p>
<p bold="off" italic="off">A bill to amend the Animal Welfare Act to remove the limitation that permits interstate movement of live birds, for the purpose of fighting, to States in which animal fighting is lawful.</p>
<p align="center" bold="off" italic="off">LATEST STATUS</p>
<p><string>Feb 3, 1999&tab;Read twice and referred to the Committee on Agriculture.</string></p>
<p></p>

… to the information level:

<bill name="S.345">
  <committees>
    <committee>SENATE: AGRICULTURE</committee>
  </committees>
  <date_introduced>02/03/1999</date_introduced>
  <latest_status_list>
    <latest_status>
      <ls_date>Feb 3, 1999</ls_date>
      <ls_txt>Read twice and referred to the Committee on Agriculture</ls_txt>
    </latest_status>
  </latest_status_list>
  <official_title>A bill to amend the Animal Welfare Act to remove the limitation that permits interstate movement of live birds, for the purpose of fighting, to States in which animal fighting is lawful.</official_title>
  <sponsor>Allard, Wayne [CO]</sponsor>
</bill>
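The lifting step can be sketched with Python's standard `xml.etree.ElementTree`. This is an illustration under stated assumptions, not the project's Omnimark-based tooling: the `<doc>` wrapper element and the `lift` function are hypothetical, `&tab;` is replaced by a literal tab because it is not a predefined XML entity, and enrichment visible in the slide's output (the `<committees>` element and the expanded sponsor name) is left out because it cannot be derived from this fragment alone.

```python
import xml.etree.ElementTree as ET

# Presentation-level fragment from the slide, wrapped in a hypothetical <doc> root.
PRESENTATION = """<doc>
<p bold="off">**** S. 345</p>
<p align="right" bold="off">DATE INTRODUCED: 02/03/1999</p>
<p bold="off">SPONSOR: Allard</p>
<p align="center" bold="off" italic="off">OFFICIAL TITLE</p>
<p bold="off" italic="off">A bill to amend the Animal Welfare Act to remove the limitation that permits interstate movement of live birds, for the purpose of fighting, to States in which animal fighting is lawful.</p>
<p align="center" bold="off" italic="off">LATEST STATUS</p>
<p><string>Feb 3, 1999&tab;Read twice and referred to the Committee on Agriculture.</string></p>
<p></p>
</doc>"""

def lift(presentation_xml: str) -> ET.Element:
    # Substitute a real tab for the undeclared &tab; entity before parsing.
    root = ET.fromstring(presentation_xml.replace("&tab;", "\t"))
    texts = [t for t in ("".join(p.itertext()).strip() for p in root.iter("p")) if t]
    fields = {}
    it = iter(texts)
    for text in it:
        if text.startswith("**** "):
            fields["name"] = text[5:].replace(" ", "")
        elif text.startswith("DATE INTRODUCED:"):
            fields["date_introduced"] = text.split(":", 1)[1].strip()
        elif text.startswith("SPONSOR:"):
            fields["sponsor"] = text.split(":", 1)[1].strip()
        elif text == "OFFICIAL TITLE":
            fields["official_title"] = next(it)  # value is the next paragraph
        elif text == "LATEST STATUS":
            fields["ls_date"], _, fields["ls_txt"] = next(it).partition("\t")
    # Emit children in the order used by the information-level example.
    bill = ET.Element("bill", name=fields["name"])
    ET.SubElement(bill, "date_introduced").text = fields["date_introduced"]
    status = ET.SubElement(ET.SubElement(bill, "latest_status_list"), "latest_status")
    ET.SubElement(status, "ls_date").text = fields["ls_date"]
    ET.SubElement(status, "ls_txt").text = fields["ls_txt"]
    ET.SubElement(bill, "official_title").text = fields["official_title"]
    ET.SubElement(bill, "sponsor").text = fields["sponsor"]
    return bill

bill = lift(PRESENTATION)
print(ET.tostring(bill, encoding="unicode"))
```

The point of the lift is that formatting attributes such as `bold` and `align` disappear and the labels become element names.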


Ingestion Network: Y2K Example

[Figure: ingestion network with stages S0 to S6. Operations shown: decompose, Convert (Omnimark), Lift, consolidate, archive, generate. File types shown: .rtf, .xml, .XML, .TM. Legend (stages): SIP, AIP, DIP.]


XML as an Archival Format

Information level “schema” as an XML DTD:

<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT bills (bill*)>
<!ELEMENT bill (abstract?, committees?, congressional_record?, cosponsors?, date_introduced?,
                digest?, latest_status_list?, official_title?, sponsor?, statement_of_purpose?,
                submitted_by?, submitted_for?)>
<!ATTLIST bill name CDATA #REQUIRED>
<!ELEMENT committees (committee*)>
<!ELEMENT cosponsors (cosponsor*)>
<!ELEMENT digest (#PCDATA)>
<!ELEMENT latest_status_list (latest_status*)>
<!ELEMENT latest_status (ls_date, ls_txt)>
<!ELEMENT abstract (#PCDATA)>
<!ELEMENT committee (#PCDATA)>
<!ELEMENT congressional_record (#PCDATA)>
<!ELEMENT cosponsor (co_name)>
<!ELEMENT co_name (#PCDATA)>
<!ATTLIST co_name a-date CDATA #IMPLIED>
<!ELEMENT date_introduced (#PCDATA)>
…
<!ELEMENT statement_of_purpose (#PCDATA)>
<!ELEMENT submitted_by (#PCDATA)>
<!ELEMENT submitted_for (#PCDATA)>

Check Demo
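The content model of `bill` in the DTD fixes the order of its children and makes each one optional. Python's standard library does not validate against DTDs, so the sketch below hand-codes only that one rule as a rough illustration; `bill_conforms` is a hypothetical helper, and the required `name` attribute is assumed from the `<bill name="S.345">` example.

```python
import xml.etree.ElementTree as ET

# Child order of <bill> as declared in the DTD; each child is optional
# and may appear at most once.
BILL_CHILD_ORDER = [
    "abstract", "committees", "congressional_record", "cosponsors",
    "date_introduced", "digest", "latest_status_list", "official_title",
    "sponsor", "statement_of_purpose", "submitted_by", "submitted_for",
]

def bill_conforms(bill: ET.Element) -> bool:
    """Check only the <bill> content model, not the nested elements."""
    if bill.tag != "bill" or "name" not in bill.attrib:
        return False
    positions = []
    for child in bill:
        if child.tag not in BILL_CHILD_ORDER:
            return False
        positions.append(BILL_CHILD_ORDER.index(child.tag))
    # Strictly increasing positions mean correct order and no repeats.
    return all(a < b for a, b in zip(positions, positions[1:]))

good = ET.fromstring(
    '<bill name="S.345"><date_introduced>02/03/1999</date_introduced>'
    "<sponsor>Allard, Wayne [CO]</sponsor></bill>"
)
swapped = ET.fromstring(
    '<bill name="S.345"><sponsor>Allard, Wayne [CO]</sponsor>'
    "<date_introduced>02/03/1999</date_introduced></bill>"
)
print(bill_conforms(good), bill_conforms(swapped))
```

A validating parser would be the appropriate tool for full conformance checks; this only shows what the declaration constrains.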


PAT Project: Persistent Archives Testbed

Building Preservation Environments for State Archives

Testing proven technologies in new environments

Exploring the use of data grid technology to support preservation


Participants & Observers

STATES: • California • Kentucky • Michigan • Minnesota • Ohio

FEDERAL AGENCIES & FOREIGN COUNTRIES: • NHPRC & NARA • Stanford SLAC • Korea

CULTURAL HERITAGE (museums/libraries/archives): • Getty Research Institute

NEWSPAPERS: • Los Angeles Times

UNIVERSITIES: • Georgia Tech • University of California Los Angeles • University of California San Diego • University of Florida • University of Illinois Urbana Champaign • Yale

RESEARCH LABS: • San Diego Supercomputer Center


PAT Testbed

Explore advantages of national preservation infrastructure
• Shared use of preservation resources
• Shared evaluation of new technology
• Demonstration of generic infrastructure
• Shared development of archival procedures
• Expanded assessment of technology across diverse types of records
• Shared risk

Collaborative use of preservation environment by multiple states


PAT Project

Test a community model for electronic records management, with archival and technological functions in a distributed network (using the SRB: Storage Resource Broker – data grid technology)

Initial test sites:
(1) Michigan Department of History, Arts and Libraries
(2) Ohio Historical Society
(3) Kentucky Department for Libraries and Archives
(4) Minnesota Historical Society
(5) SLAC Stanford Linear Accelerator Archives and History Office

Participants:
(a) California State Archives
(b) Kansas State Historical Society
(c) University of Illinois Urbana Champaign
(d) University of California Los Angeles (UCLA)
(e) Yale Manuscripts and Archives
(f) Georgia Tech

Observers:
(a) Getty


Shared Infrastructure

[Figure: shared preservation environment. Local storage resources: Kentucky Grid Brick, Michigan Grid Brick, Minnesota Grid Brick, Ohio Grid Brick, SLAC Storage. Shared components: SDSC Archive for archival storage (HPSS, Sam-QFS) and MCAT as the metadata catalog (Oracle).]


Shared Development of Archival Processes

The processes being automated are appraisal, accessioning, arrangement, description, preservation, and access.


Archival Processes explored in PAT

Test sites and record types (table columns): Kentucky: Web • Michigan: RMA - Precinct Results DB • Minnesota: Spatial • Ohio: E-mail • SLAC: Documents

Appraisal X

Accession X X X

Arrangement X X X X

Description X X X X X

Preservation X X X X

Access X X X X


National Archives and Records Administration - Research Prototype Persistent Archives

Federation of Four Independent Data Grids

[Figure: four data grids, each with its own MCAT catalog: NARA, SDSC, U Md, GTech.]

Powerful Platform for Collaborative Research
• Synchronization across zones
• Interoperability across diverse platforms
• Sufficient metadata to ensure complete and authentic records
• Mitigation of risk of data loss
• Replication of data
• Federation of catalogs
• Deep archive


Local Storage Resource

Grid Brick - commodity-based disk system
• Current cost is about $2,000 per Terabyte
• 2.8 GHz CPU
• 1 Gbyte of memory
• RAID controller
• Gigabit / 100 / 10 Ethernet
• 1-5 Terabytes of disk


What is BIRN?
• Testbed for Biomedical Knowledge Infrastructure
• Biomedical Informatics Research Network
• Creation and support of federated bioscience databases
• Data integration
• Interoperable analysis tools
• Data mining software
• Scalable and extensible
• Complex access control: HIPAA requirements
• Three fairly large testbeds interoperating with each other: Morphological, Functional, Mouse Brain

[Figure: virtual data grid (SRB) with an MCAT catalog connecting Duke, UCLA, NCMIR, CalTech, and SDSC.]


The BIRN Data Grid


Monitoring Grid Status

[Figure: grid status monitor showing storage per resource. Values shown: 0.7, 5.2, 0, 1.6, 0.8, 0.8, 3.2, 0.8, 2.4, 0.8, 0.8, 2.4, 1.6, 0.8, 5.0, 0.78, and 0.08 TB.]


Logical Resources in BIRN


Data Grids

Data grids provide the ability to name, organize, and manage data on distributed storage resources.

Federation provides a way to name, organize, and manage data on multiple data grids.


Data Grids

Distributed data sources
• Inter-realm authentication and authorization

Heterogeneity
• Storage repository abstraction

Scalability
• Differentiation between context and content management

Preservation
• Support for automated processing (migration, archival processes)


The SRB, an example of a Data Grid

A distributed file system (Data Grid), based on a client-server architecture.

It’s also more: It provides a way to access files and computers based on their attributes rather than just their names or physical locations.

It replicates, syncs, archives, and connects heterogeneous resources in a logical manner using abstraction mechanisms.
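The idea of finding files by attribute rather than by name or location can be illustrated with a toy in-memory catalog. This is not the SRB or MCAT interface; `CatalogEntry`, `find`, and the storage locations are invented for the illustration.

```python
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    logical_name: str                 # location-independent name
    replicas: list[str]               # physical copies (hypothetical locations)
    attributes: dict[str, str] = field(default_factory=dict)

def find(catalog: list[CatalogEntry], **wanted: str) -> list[CatalogEntry]:
    """Return entries whose attributes match every requested key/value pair."""
    return [
        entry for entry in catalog
        if all(entry.attributes.get(k) == v for k, v in wanted.items())
    ]

catalog = [
    CatalogEntry("/senate/106/S.345.xml",
                 ["hpss://archive/0001", "disk://brick/0001"],
                 {"congress": "106", "sponsor": "Allard"}),
    CatalogEntry("/senate/106/S.387.xml",
                 ["hpss://archive/0002"],
                 {"congress": "106", "sponsor": "McConnell"}),
]

hits = find(catalog, sponsor="Allard")
print([entry.logical_name for entry in hits])
```

The caller never handles a physical path: discovery goes through attributes, and the catalog resolves the logical name to whichever replica is available.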


Data Grid Support for Preservation

Authenticity – the assurance that the records are what they purport to be
• Support for metadata necessary to maintain the provenance of the records
• Support for maintaining the essential characteristics of the records across transformations

Integrity - the assurance that the electronic records are not corrupted
• Support for integrity metadata (audit trails, access controls, checksums, replicas)
• Support for distributed environments (replication, federation)

Infrastructure Independence
• Manage electronic records independently of the choice of storage system
• Standard operations across databases and storage repositories
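One way to picture checksum-based integrity metadata is to record a digest when a record is accessioned and later compare every replica against it. The sketch below uses Python's `hashlib` with SHA-256, which is an assumption made for the example (the slides do not name an algorithm); `audit_replicas` is a hypothetical helper.

```python
import hashlib
import tempfile
from pathlib import Path

def checksum(path: Path) -> str:
    """SHA-256 digest of a file's content."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def audit_replicas(recorded_checksum: str, replica_paths: list[Path]) -> list[Path]:
    """Return the replicas whose current checksum differs from the recorded one."""
    return [p for p in replica_paths if checksum(p) != recorded_checksum]

with tempfile.TemporaryDirectory() as tmp:
    original = b"<bill name='S.345'/>"
    recorded = hashlib.sha256(original).hexdigest()   # stored at accession time

    replica_a = Path(tmp, "replica_a.xml")
    replica_a.write_bytes(original)                   # intact copy
    replica_b = Path(tmp, "replica_b.xml")
    replica_b.write_bytes(b"<bill name='S.34'/>")     # simulated corruption

    damaged_names = [p.name for p in audit_replicas(recorded, [replica_a, replica_b])]

print(damaged_names)
```

A damaged replica found this way could then be restored from an intact one, which is where replication and integrity metadata work together.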


Eliminate Infrastructure Dependence Upon Naming

Need persistent identifiers for
• Archivists
• Records
• Metadata
• Storage resources
• Access controls

Need automatic update of mapping from persistent identifiers to names used in storage systems
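A minimal sketch of that mapping, assuming a simple table from persistent identifier to the name currently used in a storage system; the `NameMap` class and the identifier and path formats are hypothetical.

```python
class NameMap:
    """Maps persistent identifiers to the names used by storage systems."""

    def __init__(self) -> None:
        self._physical: dict[str, str] = {}

    def register(self, pid: str, physical_name: str) -> None:
        self._physical[pid] = physical_name

    def migrate(self, pid: str, new_physical_name: str) -> None:
        # Called when a record moves to a new storage system:
        # the persistent identifier stays, only the mapping changes.
        if pid not in self._physical:
            raise KeyError(f"unknown persistent identifier: {pid}")
        self._physical[pid] = new_physical_name

    def resolve(self, pid: str) -> str:
        return self._physical[pid]

names = NameMap()
names.register("record:106-senate-S.345", "/tape/vol12/f0001.xml")
before = names.resolve("record:106-senate-S.345")
names.migrate("record:106-senate-S.345", "/disk/brick3/senate/S.345.xml")
after = names.resolve("record:106-senate-S.345")
print(before, "->", after)
```

References held by archivists, metadata, and access controls use only the persistent identifier, so a storage migration does not invalidate them.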


[Figure: Storage Resource Broker 3.3.1 architecture, shown as layers from top to bottom:
• Application
• Access interfaces: C, C++ Library; Unix Shell; Linux I/O; NT Browser, Kepler Actors; Java; DLL / Python, Perl, Windows; HTTP, WSDL, (WSRF), GIS; OAI, DSpace, OpenDAP, GridFTP
• Federation Management
• Consistency & Metadata Management / Authorization, Authentication, Audit
• Logical Name Space; Latency Management; Data Transport; Metadata Transport
• Storage Repository Abstraction: Archives (Tape, Sam-QFS, DMF, HPSS, ADSM, UniTree, ADS); ORB; File Systems (Unix, NT, Mac OSX); Databases (DB2, Oracle, Sybase, Postgres, mySQL, Informix)
• Database Abstraction: Databases (DB2, Oracle, Sybase, Postgres, mySQL, Informix)
Side labels: Infrastructure Independence; Preservation Processes (Accession to Access).]


Mitigating Risk of Data Loss

Replication - keeping two copies of the records
• Protect against media corruption

Data replication to remote site
• Protect against local operational error
• Protect against natural disaster

Data replication to another type of storage
• Protect against systemic vendor problem

Data federation - keeping two independent metadata catalogs
• Protect against malicious users
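The mapping from replication choices to the risks they cover can be written as a small check. The rules below restate the slide; the `Replica` type, the function name, and the site and storage labels are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Replica:
    site: str           # where the copy is held
    storage_type: str   # kind of storage system holding it

def unmitigated_risks(replicas: list[Replica], catalogs: int = 1) -> list[str]:
    """List the risks from the slide that the current setup does not cover."""
    risks = []
    if len(replicas) < 2:
        risks.append("media corruption")
    if len({r.site for r in replicas}) < 2:
        risks += ["local operational error", "natural disaster"]
    if len({r.storage_type for r in replicas}) < 2:
        risks.append("systemic vendor problem")
    if catalogs < 2:
        risks.append("malicious users")
    return risks

single_copy = [Replica("SDSC", "HPSS")]
well_replicated = [Replica("SDSC", "HPSS"), Replica("Kentucky", "Grid Brick")]

print(unmitigated_risks(single_copy))
print(unmitigated_risks(well_replicated, catalogs=2))
```

Each rule corresponds to one heading on the slide, so the check makes explicit which additional copy or catalog removes which exposure.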


Federation Between Data Grids

[Figure: two federated data grids, one holding Data Collection A and one holding Data Collection B, reached through Data Access Methods (Web Browser, DSpace, OAI-PMH).]

Each data grid maintains:
• Logical resource name space
• Logical user name space
• Logical file name space
• Logical context (metadata)
• Control/consistency constraints

Access controls and consistency constraints on cross registration of digital entities
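Cross registration under access control can be pictured with two toy grids, each keeping its own logical file name space and context. This is a conceptual sketch only; `DataGrid`, `cross_register`, the grid names, and the storage locations are hypothetical and do not reflect the SRB federation interface.

```python
class DataGrid:
    def __init__(self, name: str) -> None:
        self.name = name
        self.files: dict[str, str] = {}     # logical file name -> physical location
        self.context: dict[str, dict] = {}  # logical file name -> metadata
        self.shareable: set[str] = set()    # names this grid lets peers register

    def put(self, logical_name: str, location: str, shareable: bool = False, **context: str) -> None:
        self.files[logical_name] = location
        self.context[logical_name] = context
        if shareable:
            self.shareable.add(logical_name)

def cross_register(source: DataGrid, target: DataGrid, logical_name: str) -> str:
    """Register one of source's digital entities in target's name space."""
    if logical_name not in source.shareable:
        # Access control stays with the grid that owns the entity.
        raise PermissionError(f"{source.name} does not share {logical_name}")
    federated_name = f"/{source.name}{logical_name}"
    target.files[federated_name] = source.files[logical_name]
    target.context[federated_name] = dict(source.context[logical_name])
    return federated_name

grid_a = DataGrid("gridA")
grid_b = DataGrid("gridB")
grid_a.put("/senate/S.345.xml", "hpss://archive-a/0001", shareable=True, congress="106")
grid_a.put("/senate/draft.xml", "disk://brick-a/0002")

name_in_b = cross_register(grid_a, grid_b, "/senate/S.345.xml")
print(name_in_b, grid_b.context[name_in_b])
```

Prefixing the source grid's name keeps the two logical name spaces from colliding, which is one simple way to keep the grids independent while still sharing entities.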


Summary

Working preservation environment
• Infrastructure independence
• Authenticity
• Integrity

Collaborative use of preservation environment by multiple states
• Shared technology assessment
• Shared infrastructure
• Shared development
• Shared risk

