1DAARWG _ May 24-25, 2007
CLASS (Comprehensive Large Array-data Stewardship System)
Presentation to the NOAA Science Advisory Board’s
Data Archiving and Access Requirements Working Group
Robert Rank
CLASS New Campaigns Manager
May 24-25, 2007
Objectives
Review CLASS mission, role, drivers, challenges
– What is an ‘Archive’?
– Open Archival Information System (OAIS)
CLASS Architectural Transitions
Discuss GEO-IDE/CLASS joint efforts
CLASS Campaign Status
– NODC IOC/FOC
– CLASS APIs
– NPP DDR
Open discussion
CLASS Mission
In its simplest form, CLASS’s mission is to provide IT infrastructure and support for NOAA archives.
What is an “Archive”?
Historical definition: roughly “record preservation”
– Semantics, functions varied significantly across “archives”
Digital information preservation is driving changes
– Need for long-term understandability/usability
– “Data” vs. “Information”
Modern definition: OAIS-RM
– “… an organization of people and systems, that has accepted the responsibility to preserve information and make it available for a Designated Community.”
OAIS-RM
International standard reference model (ISO 14721) for “open information archives”
– Referenced in:
• Report to Congress, CLASS L1Rs, ARWG-A&A Reqs
Initially inspired by the need to establish a common framework for discussions among archival entities
Provides
– Terminology and concepts
– Guidelines
– Typical functional entities and services
– Information taxonomy
– “Mandatory responsibilities”
• Core ‘Definition of Archive’
OAIS Mandatory Responsibilities
Negotiate for and accept appropriate information from Producers
Obtain sufficient control of the information to ensure Long-Term Preservation
Determine the Designated Community
Ensure Independent Understandability
Follow documented policies and procedures for preservation
Make information available to the Designated Community
CLASS/Archive Relationship
Information Preservation - (Requirements definition)
– Archive “performs”
– CLASS “provides IT capabilities in support of”
Expertise
– Archive: information structure, semantics, usage; science
– CLASS: IT
Focus
– Archive: information
– CLASS: data (bits)
Stewardship
– Archive: science data stewardship - (Requirements development)
– CLASS: data stewardship
Key takeaway: CLASS is not (nor does it “have”) an archive
CLASS Mission Revisited
CLASS’s mission is to provide IT infrastructure and support that enable NOAA archives to implement OAISs.
OAIS-RM Functional Entities in CLASS Context
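[Figure: OAIS-RM functional model in CLASS context, showing Producer, SIP, Ingest, Archival Storage, Data Management, Access, Preservation Planning, Administration, Management, AIP, DI (Descriptive Information), DIP, Consumer, Query, Result Set, and Order, with Archive and CLASS boundaries.]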
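To make the functional model concrete, here is a minimal sketch of how OAIS information packages move between entities: a Producer's SIP is turned into an AIP by Ingest and held in Archival Storage, its Descriptive Information (DI) is indexed by Data Management, and Access answers queries with a result set and fills orders as DIPs. All class, field, and method names are hypothetical illustrations of the OAIS concepts, not CLASS interfaces.

```python
from dataclasses import dataclass, field
import hashlib


@dataclass
class SIP:
    """Submission Information Package sent by a Producer (hypothetical fields)."""
    producer: str
    dataset: str
    content: bytes


@dataclass
class AIP:
    """Archival Information Package: content plus preservation metadata."""
    aip_id: str
    content: bytes
    fixity_sha256: str
    descriptive_info: dict


@dataclass
class DIP:
    """Dissemination Information Package delivered to a Consumer."""
    aip_id: str
    content: bytes


@dataclass
class Archive:
    storage: dict = field(default_factory=dict)  # Archival Storage: aip_id -> AIP
    catalog: dict = field(default_factory=dict)  # Data Management: aip_id -> DI

    def ingest(self, sip: SIP) -> str:
        """Ingest: turn a SIP into an AIP, store it, and index its DI."""
        digest = hashlib.sha256(sip.content).hexdigest()
        aip_id = f"{sip.dataset}-{digest[:12]}"
        di = {"producer": sip.producer, "dataset": sip.dataset}
        self.storage[aip_id] = AIP(aip_id, sip.content, digest, di)
        self.catalog[aip_id] = di
        return aip_id

    def query(self, **criteria) -> list:
        """Access (query): return a result set of matching AIP identifiers."""
        return [aid for aid, di in self.catalog.items()
                if all(di.get(k) == v for k, v in criteria.items())]

    def order(self, aip_id: str) -> DIP:
        """Access (order): package the stored content as a DIP."""
        aip = self.storage[aip_id]
        return DIP(aip.aip_id, aip.content)
```

For example, `archive.ingest(SIP("NODC", "sst-profiles", b"raw bytes"))` returns an identifier that `archive.query(dataset="sst-profiles")` then finds. Keeping the catalog separate from the stored bytes mirrors the Data Management / Archival Storage split.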
Selected CLASS Attributes
Vision
– “… An enterprise-wide IT system supporting long-term, secure storage of and access to NOAA’s archived environmental datasets” – CLASS L1Rs
Scope
– “… NOAA (digital spatial environmental) data
– “… capable of supporting both existing and new archive collections …” – CLASS L1Rs
Applicability
– “NOAA has directed that both legacy and emerging environmental observing systems requiring long-term archive plan to use CLASS.” – CLASS L1Rs
Customers - (Requirements development)
– NOAA archives (e.g., NNDCs & Centers of Data) are CLASS’s direct customers
– Consumers and Producers are archives’ direct customers
Key CLASS Challenges
NOAA roles, responsibilities
– Identification and allocation of all archival mission roles and responsibilities, operations
– Essential to CLASS success, both practice and perception
NOAA integration/interoperation – GEO-IDE
Data heterogeneity/specificity
Large data volumes
Data management
Long-term data preservation
– Complex, evolving problem
– Currently in a developmental, research stage
Widely divergent user needs, requirements
Evolving from legacy system
Addressing the Challenges - Internal
Emphasize generality, scalability, flexibility
Depend on standards wherever possible
– Data models
– Ontologies
Provide infrastructure customizable by external entities
– Customer Interface for Access
Software must support hardware refresh transparently
– Abstract hardware in software
– Layered architecture
Track preservation-related best practices and similar archival projects
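The points about abstracting hardware in software and using a layered architecture can be illustrated with a minimal sketch, assuming the storage layer is defined as an abstract interface: upper layers talk only to `StorageBackend`, so a hardware refresh becomes a copy-and-verify migration between two implementations. `StorageBackend`, `InMemoryBackend`, and `migrate` are hypothetical names; real backends (disk arrays, tape libraries) would implement the same methods.

```python
from abc import ABC, abstractmethod
import hashlib


class StorageBackend(ABC):
    """Hardware-abstraction layer: callers never see the device type."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...

    @abstractmethod
    def keys(self) -> list: ...


class InMemoryBackend(StorageBackend):
    """Stand-in for a physical device (disk array, tape library, ...)."""

    def __init__(self, name: str):
        self.name = name
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = bytes(data)

    def get(self, key):
        return self._blobs[key]

    def keys(self):
        return sorted(self._blobs)


def migrate(old: StorageBackend, new: StorageBackend) -> int:
    """Copy every object to the new backend, verifying fixity for each copy."""
    for key in old.keys():
        data = old.get(key)
        new.put(key, data)
        if hashlib.sha256(new.get(key)).digest() != hashlib.sha256(data).digest():
            raise IOError(f"fixity mismatch after migrating {key}")
    return len(old.keys())
```

Because Ingest and Access would hold only a `StorageBackend` reference, swapping in the refreshed hardware is a configuration change once `migrate` succeeds.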
Addressing the Challenges - External
Continue to push for clarification of roles and responsibilities
– Archive ConOps is key
Continue to stress clear, comprehensive requirements
– L1Rs have helped enormously
– Push for L2 and L* requirements
• ARWG – Archive & Access Requirements - V2.2
• Are they complete? Do they need to be revised?
Transparency
Early, strong commitment to GEO-IDE
CLASS’s SOA
Consistency with FEAF/NOAA EA
Facilitates interoperability & provides public interfaces
Enables custom “client” applications
Reduces pressure to be all things to all people
Foundation for distributed infrastructure
Facilitates evolution from “as-is” system
Facilitates cost-effective extensibility
Primary services
– Externally visible
• Ingest
• Access
– Internally visible
• Archival Storage
• Data Management
CLASS Architectural Transitions - Internal
Transition to SOA
– Wrap existing functionality in services
– Refactor behind-the-scenes later
Transition to layered architecture
– Hardware abstraction
– Distributed infrastructure
Continue to emphasize, expand componentization
Tactics
– Prototype services and vet internally
– Incrementally redesign subsystems
– Increase use of standards (formats, code, terminology, etc.)
Decouple CLASS Web Interface from CLASS internals
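The tactic of wrapping existing functionality in services and refactoring later is essentially the adapter pattern: a stable service contract is published first and delegates to legacy code, so the implementation behind it can be replaced without changing callers. A minimal sketch follows; the legacy routine, its string format, the granule names, and the `InventoryService` contract are all hypothetical stand-ins, not actual CLASS code.

```python
from typing import Callable, Protocol

# Hypothetical legacy holdings, named "<dataset>_<YYYYMMDD>".
_LEGACY_GRANULES = ["AVHRR_20070501", "AVHRR_20070502", "GOES_20070501"]


def legacy_inventory_lookup(query_string: str) -> str:
    """Stand-in for existing code with an awkward string-in/string-out interface."""
    prefix, start, end = query_string.split("|")
    hits = [g for g in _LEGACY_GRANULES
            if g.startswith(prefix + "_") and start <= g.split("_")[1] <= end]
    return ";".join(hits)


class InventoryService(Protocol):
    """Stable, published service contract that callers depend on."""
    def search(self, dataset: str, start: str, end: str) -> list: ...


class LegacyInventoryAdapter:
    """Wraps the legacy routine behind the service contract."""

    def __init__(self, lookup: Callable[[str], str] = legacy_inventory_lookup):
        self._lookup = lookup

    def search(self, dataset: str, start: str, end: str) -> list:
        raw = self._lookup(f"{dataset}|{start}|{end}")
        return raw.split(";") if raw else []


def count_granules(service: InventoryService, dataset: str, start: str, end: str) -> int:
    """A client written against the contract, unaware of the legacy code."""
    return len(service.search(dataset, start, end))
```

When the subsystem is later redesigned, a new class with the same `search` signature replaces `LegacyInventoryAdapter`, and `count_granules` and other clients are untouched.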
CLASS Architectural Transitions - External
Track industry best practices and others’ lessons learned
Work with GEO-IDE to
– Specify services, interfaces, standards
– Vet CLASS-proposed services, interfaces, standards
Publish services gradually, starting with friendly users
…
Keys to NOAA Integration Efforts
Standard terminology, so people can exchange information efficiently
Interfaces and APIs, so systems know how to communicate
Standards, so data can be exchanged
Developing GEO-IDE partnerships is the key
Specification of roles and responsibilities
Where Should CLASS Participate in Pilots with GEO-IDE?
Services and signatures
Spatial operations
Structural Data Types
Metadata
Standards
Development of guidelines
– Early visibility and awareness help us
Interoperability
– WHO do we need to interoperate with?
Priorities for CLASS/GEO-IDE Relationship
Start interactions now
– CLASS already moving forward
– Interactions now will help avoid re-work
Identify contacts and conduits
Identify joint efforts and pilot projects
– APIs – Prototype now
– NODC IOC/FOC
Agree on priorities, schedule
Initiate joint activities
CLASS NODC IOC Campaign Status
NODC IOC – Five (5) Operational Threads – June 07, 2007
1. Ingest Operations – The thread begins when the NODC SIP arrives at CLASS.
2. Dissemination Operations – Two separate sub-threads are considered, depending on the restriction level of an AIP.
3. Data Update – This thread begins when NODC requests its data from CLASS for purposes of updating its data.
4. Data Integrity Check Processing – This thread is triggered by a schedule for integrity checks on the stored data.
5. Restriction Level Reset – This thread begins with the receipt of a restriction level reset request message from NODC.
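Three of these threads (Dissemination, Data Integrity Check Processing, and Restriction Level Reset) can be sketched together. This is a minimal illustration assuming SHA-256 fixity values recorded at ingest and a simple two-level restriction model; the record fields, the level names `"public"` and `"restricted"`, and all function names are hypothetical, since the slide does not specify how CLASS stores checksums or encodes restriction levels.

```python
import hashlib
from dataclasses import dataclass

LEVELS = ("public", "restricted")  # hypothetical restriction levels


@dataclass
class StoredAIP:
    aip_id: str
    content: bytes
    sha256_at_ingest: str  # fixity value recorded when the SIP was ingested
    restriction: str       # one of LEVELS


def integrity_check(holdings: list) -> list:
    """Scheduled thread: recompute fixity and report AIPs whose bits changed."""
    return [a.aip_id for a in holdings
            if hashlib.sha256(a.content).hexdigest() != a.sha256_at_ingest]


def disseminate(aip: StoredAIP, requester_authorized: bool) -> bytes:
    """Dissemination thread: the sub-thread taken depends on the restriction level."""
    if aip.restriction == "public":
        return aip.content
    if aip.restriction == "restricted" and requester_authorized:
        return aip.content
    raise PermissionError(f"{aip.aip_id} is restricted")


def reset_restriction(aip: StoredAIP, new_level: str) -> None:
    """Restriction Level Reset thread, triggered by a reset request message."""
    if new_level not in LEVELS:
        raise ValueError(f"unknown restriction level: {new_level}")
    aip.restriction = new_level
```

A scheduler would call `integrity_check` over the holdings periodically; any identifiers it returns would be candidates for repair from a redundant copy.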
CLASS NODC FOC Campaign Status
NODC FOC implementation steps – FY07-08
The Archive Requirements Working Group (ARWG) and CLASS will review the existing Submission Agreement (SA) templates and will define additional templates for different types of data (e.g., non-periodic data, historical data)
Develop one or more SAs for NODC data as needed
Develop the NODC-CLASS Interface Control Document (ICD)
Evaluate the IOC and improve it if necessary
Resolve the technical issues regarding deletion and versioning of NODC data
Define the FOC requirements
Define steps for transition from the IOC to FOC
Implement the FOC
Transfer all NODC data to CLASS for long-term storage
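Submission Agreements govern what a Producer may send. The following is a minimal sketch of checking an incoming SIP manifest against an SA before ingest, assuming the SA can be reduced to a few machine-checkable terms (allowed formats, maximum file size, required metadata keys). The `SubmissionAgreement` fields, the manifest layout, and `validate_sip` are hypothetical; actual SA templates are defined by the ARWG and CLASS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubmissionAgreement:
    """Hypothetical machine-checkable subset of an SA template."""
    producer: str
    allowed_formats: frozenset
    max_file_bytes: int
    required_metadata: frozenset


def validate_sip(sa: SubmissionAgreement, manifest: dict) -> list:
    """Return a list of SA violations for a SIP manifest (empty list means accept)."""
    problems = []
    if manifest.get("producer") != sa.producer:
        problems.append("producer not covered by this agreement")
    for f in manifest.get("files", []):
        if f["format"] not in sa.allowed_formats:
            problems.append(f"{f['name']}: format {f['format']} not allowed")
        if f["size"] > sa.max_file_bytes:
            problems.append(f"{f['name']}: exceeds {sa.max_file_bytes} bytes")
    missing = sa.required_metadata - set(manifest.get("metadata", {}))
    if missing:
        problems.append("missing metadata: " + ", ".join(sorted(missing)))
    return problems
```

Returning all violations at once, rather than stopping at the first, lets the Producer fix a rejected submission in a single round trip.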
CLASS API prototype background
NGDC has a strong interest in a CLASS API in order to fully integrate CLASS within the center
NGDC has extensive experience in API development through the SPIDR, SABR, and ESG systems.
The August 2006 users’ workshop showed strong user interest in CLASS APIs as well
An initial API was unveiled and discussed at the Asheville workshop (CLASS, SABR, SPIDR, etc.)
CLASS API Goals
First draft of a user-focused web services (WS) interface
Demonstration of the concept of “fundamental separation” of archive and storage from access
Interaction with and demonstration for users
Technology discovery and evaluation of cutting-edge tools for CLASS
First integration of multiple data types through CLASS (time-series, grid, swath, etc.)
CLASS–NGDC Prototype Scope
Current snapshot of the CLASS API architecture
[Figure: current CLASS API architecture. Components shown include the CLASS web interface, CLASS metadata and products, a REST API for CLASS, Web Services, OPeNDAP, THREDDS (OPeNDAP), WCS (OGC), OGSA-DAI for CLASS (Globus Toolkit 4) with OGSA-DAI and GRID client toolkits, inventory, order, and visualization services, DB clusters, data-type plugins (NetCDF, satellite granules, preview images), and connected systems SPIDR, ESG, INE, IDEAS, ESSE, SABR, NOMADS, NARR, NCEP, ERA-40, OLS, and DMPS.]
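The architecture routes requests for different data types through plugins behind common front ends, which is also how the API goals' “fundamental separation” of access from storage can be realized. A minimal sketch of the plugin idea, assuming a registry keyed by data type so that the access layer never depends on how a given type is stored; the registry, decorator, item layout, and plugin functions are hypothetical, not the prototype's actual code.

```python
from typing import Callable, Dict

# Registry mapping a data-type name to the plugin that subsets items of that type.
_PLUGINS: Dict[str, Callable[[dict, dict], dict]] = {}


def plugin(data_type: str):
    """Decorator that registers a subsetting plugin for one data type."""
    def register(func):
        _PLUGINS[data_type] = func
        return func
    return register


@plugin("grid")
def subset_grid(item: dict, request: dict) -> dict:
    lo, hi = request["lat_range"]
    return {"lat": [v for v in item["lat"] if lo <= v <= hi]}


@plugin("time-series")
def subset_series(item: dict, request: dict) -> dict:
    start, end = request["time_range"]
    return {"samples": [s for s in item["samples"] if start <= s[0] <= end]}


def access(item: dict, request: dict) -> dict:
    """Access-layer entry point: dispatch on data type, independent of storage."""
    try:
        handler = _PLUGINS[item["type"]]
    except KeyError:
        raise ValueError(f"no plugin registered for {item['type']!r}") from None
    return handler(item, request)
```

Supporting a new type such as swath data would mean registering one more plugin; the web-service front ends calling `access` would not change.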
CLASS NPP Campaign DDR Deliverables
Updated Documents
– CLASS-NPP Submission Agreement
– Software Description Document
– Allocated Requirements with NPP Requirements
– Review Item Discrepancy (RID) form
New Documents
– NPP Delta Design Review Organization Note
– Software Upgrade Plan - (Gap Analysis)
– Hardware Upgrade Plan for NPP
– Network Upgrade Plan for NPP
– Prioritization Policies and Procedures Document
– CLASS Load Test Plan for NPP
• Performance Benchmark Technical Report
– DDR Presentation Slides
CLASS NPP Campaign Status
The CLASS-NPP Delta Design Review (DDR) is scheduled for June 21-22, 2007, at NSOF, Suitland, Md.
DAARWG Membership Invited
Discussion?
Thank you!
Background Slides
Selected CLASS Bounds/Assertions
CLASS is not an archive/OAIS
– OAIS-RM is important for CLASS, but CLASS does not (and cannot) conform/comply with it
CLASS does not have a science mission
– “Data producers and data centers are responsible for the science data stewardship missions and the development and maintenance of science data stewardship data, information, and metadata.” – CLASS L1Rs
– Ramifications for data specificity problem
CLASS is an extant operational system
– Future versions of CLASS will be evolutions
– The as-is CLASS system was developed for a very different set of requirements than those which now exist
Selected CLASS Architectural Drivers
Requirements: L1, L* (future), A&A v. 2.2, system and allocated
As-is system
New campaigns
Data heterogeneity & volumes
Constant change
Long-term mission
OAIS-RM
GEO-IDE
Selected CLASS “ilities”
Flexibility: adaptation to change
Generality: support variety in data types, users’ needs, etc.
Scalability: support increasing data volumes and user activity
Interoperability: fundamental to NOAA, user community
Security: essential aspect of any generally-accessible IT system
Reliability: essential to long-term secure storage mission
Maintainability and evolvability: address long-term mission and change
Openness and standards conformance: support interoperability, usability
Modularity and layering: promote flexibility and maintainability
Heterogeneity: provide flexibility, cost reduction alternatives
Flexibility is Essential
CLASS’s environment is characterized by change
Emphasis on evolution, evolvability
What can be done, not what must be done
– Example: support multiple nodes
– Example: support small-footprint service deployment
Key goal: provide options to CLASS PM
CLASS Long-Term System Architecture Status
Documents
– Long-Term System Architecture Overview - done
– To-Be System Architecture Overview – in progress, end of 2007
– Long-Term System Architecture Transition Plan
– Long-Term System Architecture Reference Manual
Work in progress
– Service decomposition
– Interface development
– Data specificity approaches workable for both CLASS and NOAA
– Infusing LTSA thinking into CLASS redesigns
Service-Oriented Architecture
Design style used throughout all aspects of creating and using business services
Defines the ways in which services are deployed and managed
Increases reuse
Lowers overall costs
Improves extensibility
Maps easily and directly to a business’s operational processes
Supports a better division of labor between IT and business personnel
Uses a description model capable of unifying new and old IT systems
Most important application is connecting the various operational systems that automate an enterprise’s business processes
For CLASS
• Internally: connecting Ingest, Archival Storage, Data Management, and Access
• Externally: enabling participation in NOAA SOA; interoperability with other NOAA systems
For NOAA, connecting new and legacy IT systems (including CLASS)
Facilitates composition of services across disparate pieces of software, whether old or new; inter- or intra-enterprise; and regardless of platform
Archive ConOps
Identification and allocation of all archival mission roles and responsibilities, operations
Essential to CLASS success, both practice and perception
Example: CLASS strategy for dealing with data specificity needs to be feasible within NOAA
Strawman Process for Breaking Down Stovepipes
Extra-CLASS
– Decide which stovepipes are to be transitioned into CLASS
Analysis
– Analyze stovepipe holdings and capabilities
– Assess impacts on CLASS for subsuming stovepipe holdings
– Draft Submission Agreement and ICD
IOC
– Demonstrate ingest and rudimentary access for sample of stovepipe holdings
– Poll stovepipe users regarding UI issues
“Historical ingest” campaign
– Finalize Submission Agreement and ICD
– Extend CLASS-written UI as needed, or ….
– … develop new stand-alone UI that duplicates the look-and-feel of the old stovepipe’s interface, but uses CLASS services
– Initiate ingest