SIMDAT TMB, 15 December 2004 AMD-1 SIMDAT
V-GISC/SIMDAT project – a Virtual GISC
Alfred Hofstadler, Matteo Dell’Acqua
ECMWF
Project History
• May 2002: Thirteenth session WMO Regional Association VI: “…agreed that the concept of a Virtual GISC had merit…”
• June 2002: V-GISC in RA-VI Kick-off Meeting
  - Partners: DWD, Meteo France, UK Met-Office, EUMETSAT, ECMWF
  - Steering Group + 4 working groups: Policy, Data, Communications, Dissemination/Acquisition
• 2003: SIMDAT project proposal submitted to EU
• 1 September 2004: contract with EU is signed
• October 2004: V-GISC steering group decides to move V-GISC development into the SIMDAT project
• November 2004: SIMDAT Kick-off meeting
  - 4 V-GISC working groups are mapped onto SIMDAT working groups: Virtual Organisation, Ontologies, GRID Infrastructure, Access to Distributed Data
• February 2005: First (V-)GISC-demonstrator at CBS
SIMDAT - Introduction
• Data Grids for Process and Product Development using Numerical Simulation and Knowledge Discovery
• 4-year project funded by the EU
  - Contract with EU was signed on 1 September 2004
• SIMDAT focuses on 4 applications
  - Product design in automotive and aerospace
  - Process design in pharmacology
  - Service provision in meteorology
• Objective of SIMDAT is to use data grid technology to resolve a complex problem for each of the 4 applications
• Budget of 11 M € of which 10.5% for meteorological activity
  - 320 man-months, taking into account EU funding and the contribution from the partners
SIMDAT - Strategy
• 7 Grid-technology areas have been identified to achieve SIMDAT objectives
  - Integrated Grid infrastructure offering basic services to applications
  - Access to data distributed on Grid sites
  - Management of Virtual Organisation
  - Ontologies
  - Integration of analysis services
  - Workflows
  - Knowledge Services
• 3 phases: Phase 1: Connectivity, Phase 2: Interoperability, Phase 3: Knowledge
• Activities shown across the phases
  - Deployment of Grid infrastructure with particular attention to data transport and management
  - Distributed DB access
  - Virtual Data Repository
  - Introduction of grid technologies research
  - Workflows for next-generation aggregated knowledge capture, discovery and mining
Meteorology application: Project Aims
• 5 partners: DWD, Meteo-France, UK Met Office, EUMETSAT and ECMWF
  - 3 “potential” GISCs: DWD, Meteo-France, UK Met Office
  - 2 DCPCs: ECMWF, EUMETSAT
• Instead of each National Met Service having a GISC (Global Information System Centre), the partners build a single Virtual GISC (V-GISC)
• The V-GISC will be seen as a normal GISC and will fulfil the WMO Information System technical requirements
• The project will build the foundations of the V-GISC by developing an infrastructure that brings together the data of the partners and provides access to the distributed meteorological databases
Meteorology application: Project Aims
• A complex problem: to build a Virtual GISC, an integrated and scalable framework for the collection and sharing of distributed data that will offer:
  - A single view of meteorological information which is distributed amongst the 5 partners
  - Improved visibility of and access to meteorological data through a comprehensive discovery service based on metadata development
  - A variety of reliable delivery services (routine dissemination and collection of data)
  - A global access control policy managed by the partners and integrated into their existing security infrastructure
  - Quality of service, reliability and security
  - Processing services and shared data manipulation facilities
• The software developed within the project will be made available to WMO
GRID Technology
• Grid technology will be used
  - To connect the diverse data sources and create a Virtual Database
  - To enable flexible, secure collaboration through virtual organisation
• Data Grid technology presents an architectural framework that aims to provide access to distributed data in a simple, secure, reliable and scalable manner from a widely distributed set of computers and across various administrative boundaries
• The essential characteristics of a Data Grid are:
  - Reference a dataset by a unique identifier
  - Discover dataset by attributes
  - Track multiple copies of a single file, and ultimately locate the "nearest" copy
  - Move files from one point on the grid to another point (push, pull and third party copy)
• The domain of the V-GISC is an ideal candidate to exploit such a framework
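The Data Grid characteristics listed above can be illustrated with a minimal in-memory sketch. It is not taken from the V-GISC software: the class names, the `cost` field used to stand in for "nearest", and the site names and URLs are all assumptions made for illustration, and moving a file is reduced to recording a new replica.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Replica:
    site: str    # hypothetical site name
    url: str     # hypothetical location of this copy
    cost: float  # assumed "distance" metric: lower means nearer


@dataclass
class Dataset:
    dataset_id: str
    attributes: dict
    replicas: list = field(default_factory=list)


class ReplicaCatalogue:
    """Toy catalogue covering identifier, discovery and replica tracking."""

    def __init__(self):
        self._datasets = {}

    def register(self, attributes):
        # Reference a dataset by a unique identifier.
        dataset_id = str(uuid.uuid4())
        self._datasets[dataset_id] = Dataset(dataset_id, dict(attributes))
        return dataset_id

    def add_replica(self, dataset_id, site, url, cost):
        # Track multiple copies of a single file. A push, pull or
        # third-party copy would end by calling this for the target site.
        self._datasets[dataset_id].replicas.append(Replica(site, url, cost))

    def discover(self, **wanted):
        # Discover datasets by attributes (exact match on every given key).
        return [
            d.dataset_id
            for d in self._datasets.values()
            if all(d.attributes.get(k) == v for k, v in wanted.items())
        ]

    def nearest(self, dataset_id):
        # Locate the "nearest" copy according to the assumed cost metric.
        replicas = self._datasets[dataset_id].replicas
        if not replicas:
            raise LookupError(f"no replica registered for {dataset_id}")
        return min(replicas, key=lambda r: r.cost)


catalogue = ReplicaCatalogue()
ds = catalogue.register({"parameter": "temperature", "level": 850})
catalogue.add_replica(ds, "site-a", "file:///a/t850", cost=5.0)
catalogue.add_replica(ds, "site-b", "file:///b/t850", cost=1.0)
print(catalogue.nearest(ds).site)
```

In a real deployment the cost would come from network measurements or site policy rather than a fixed number.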
V-GISC infrastructure
• Interface to offer a single view of the data
  - Discovery facilities
  - Request/Subscription
• Monitoring, Logging, Control, Error tracking
• Security, Authentication, Authorization, Audit
• Management, User registration, DB admin, Catalogue admin
• Grid infrastructure for sharing data
• Interoperability interfaces for data/metadata exchange
• Mechanisms to synchronise metadata
• Dissemination/acquisition mechanisms
Meteo requirements
V-GISC Conceptual view
• Through the Distributed Portal, users search for and retrieve data and subscribe to services, subject to authentication and authorization
• The Virtual Database Service provides a single view of the partners' databases
V-GISC Conceptual view
• Virtual Database
  - Provide the unified view of all the shared datasets through a distributed catalogue
  - Maintain the distributed catalogue amongst the partners using synchronization mechanisms
  - Provide interfaces with the legacy databases
  - Implement data replication mechanisms
  - Preserve the integrity of the data
• Access Facilities
  - Collection & Dissemination services that support secure, efficient and reliable transport mechanisms
  - Quality of Service (QoS): Traffic Prioritization, Queuing mechanisms, Scheduling
  - Discovery service by browsing the catalogue or using a keyword search engine
  - Interactive and batch interfaces
• VO
  - Security Services (CA, AuthN, AuthZ, Audit,…)
  - User management
  - Data policy management
  - Monitoring and control
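The QoS item above (traffic prioritization and queuing) can be sketched as a priority queue for outgoing products. This is only an illustration: the slides do not say how prioritization is implemented, and the class name, the numeric priority scheme and the product labels are assumptions.

```python
import heapq
import itertools


class DisseminationQueue:
    """Lower priority number = more urgent; FIFO order within one priority."""

    def __init__(self):
        self._heap = []
        # Monotonic counter used as a tie-breaker so that equal-priority
        # products leave the queue in the order they were submitted.
        self._counter = itertools.count()

    def submit(self, priority, product):
        heapq.heappush(self._heap, (priority, next(self._counter), product))

    def next_product(self):
        """Return the most urgent queued product, or None if the queue is empty."""
        if not self._heap:
            return None
        _, _, product = heapq.heappop(self._heap)
        return product

    def __len__(self):
        return len(self._heap)


queue = DisseminationQueue()
queue.submit(2, "routine forecast")      # hypothetical product labels
queue.submit(0, "severe weather warning")
queue.submit(2, "climate summary")
while len(queue):
    print(queue.next_product())
```

Scheduling (for example, time windows for routine dissemination) would sit on top of a queue like this and is not shown.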
V-GISC Distributed Architecture
• A V-GISC node is installed on each partner site
• All the nodes are interconnected through a dedicated secure communication channel: the Database Communication Layer (DCL)
• All the nodes exchange messages through the DCL
• The architecture is decentralized
  - No central point where all the nodes are declared
  - No single point of failure
• The network of nodes is self-organized
  - The network dynamically accepts new nodes and is aware of node disconnections
  - The network organizes its topology and indicates to newly entering nodes their position within the network
  - No manual intervention on the nodes is needed to accept new peers
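A minimal sketch of the decentralized, self-organized behaviour described above: a new node joins by contacting any existing member, and every member keeps its own view of the network, so there is no central registry. The slides do not describe the actual DCL protocol; the `Node` class, its method names and the full-mesh topology are assumptions, and disconnection is modelled as an explicit call where a real system would more likely detect it through timeouts.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.peers = {}  # name -> Node: this node's own view of the network

    def join(self, contact):
        """Enter the network through any node that is already a member."""
        for member in contact.admit(self):
            self._link(member)

    def admit(self, newcomer):
        """Announce the newcomer to all known members and return them."""
        members = [self] + list(self.peers.values())
        for member in members:
            member._link(newcomer)
        return members

    def leave(self):
        """Make the remaining nodes aware of the disconnection."""
        for peer in list(self.peers.values()):
            peer.peers.pop(self.name, None)
        self.peers.clear()

    def _link(self, other):
        if other is not self:
            self.peers[other.name] = other


a, b, c = Node("a"), Node("b"), Node("c")
b.join(a)   # b only needs to know one member
c.join(b)   # c enters through b and still learns about a
print(sorted(c.peers))
```

Because any member can admit a newcomer, losing one node does not prevent others from joining, which is the "no single point of failure" property in miniature.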
V-GISC Distributed Architecture
V-GISC Node
• Each node maintains a copy of the global catalogue describing data available through the V-GISC
  - The catalogue synchronization is done using the DCL
• Each node maintains a cache used to replicate data and to efficiently serve the users
• A node is interfaced with the local legacy databases
• A node has a Web Portal for interactive access
• A node has a Grid/Web Service Portal for batch access and integration of the V-GISC in a bigger Grid
• A node implements all services offered by the V-GISC
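The node cache mentioned above can be sketched as a small least-recently-used cache placed in front of the legacy database. The eviction policy, the capacity and the `fetch_from_legacy_db` callable are assumptions; the slides only state that a cache exists.

```python
from collections import OrderedDict


class NodeCache:
    """LRU cache in front of a (hypothetical) legacy database accessor."""

    def __init__(self, fetch_from_legacy_db, capacity=2):
        self._fetch = fetch_from_legacy_db
        self._capacity = capacity
        self._items = OrderedDict()  # oldest entry first
        self.hits = 0
        self.misses = 0

    def get(self, dataset_id):
        if dataset_id in self._items:
            # Served from the cache: mark as most recently used.
            self._items.move_to_end(dataset_id)
            self.hits += 1
            return self._items[dataset_id]
        # Not cached: go to the legacy database and keep a copy.
        self.misses += 1
        data = self._fetch(dataset_id)
        self._items[dataset_id] = data
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # evict the least recently used
        return data


def slow_legacy_lookup(dataset_id):
    # Stand-in for a query against a partner's existing database.
    return f"data:{dataset_id}"


cache = NodeCache(slow_legacy_lookup, capacity=2)
cache.get("t850")
cache.get("t850")
print(cache.hits, cache.misses)
```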
V-GISC Node - Functional Design
Demonstrator – Functional View
• To deploy a flexible infrastructure on top of which the Virtual Information Centre can be built
• To use Grid technologies to federate databases located on partners' sites
• To show to the user a unique view of data sets stored by at least 3 partners
• To get a first implementation of the catalogue based on WMO core metadata
• To offer first VO security services
Demonstrator - Design
• 3 main components to build the virtual database: Data Repository, Catalogue Node and Portal, installed on each partner site and interconnected through a dedicated secure connection channel
• Data Repository
  - Interface to the partners' databases
  - Offers metadata information to describe, search, locate data
  - Offers interface to retrieve data from the associated local databases
• Catalogue Node
  - Maintains the catalogue and ensures synchronisation
  - Harvests metadata and requests data from the Data Repository
  - Ingests data and maintains the cache of the V-GISC
  - Serves clients: Portal or other Nodes
  - Monitors the execution of the requests
• Distributed Portal
  - Offers interface to search/browse the V-GISC catalogue
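How the three components might fit together can be shown with a small sketch: a Data Repository exposes metadata and data, a Catalogue Node harvests that metadata and pushes it to its peers, and a portal-style search runs against the local catalogue copy. The method names, the record layout and the push-based synchronisation are assumptions for illustration; the demonstrator's real interfaces are not given in the slides.

```python
class DataRepository:
    """Wraps one partner's database (here just a dict)."""

    def __init__(self, site, records):
        self.site = site
        self._records = records  # dataset_id -> {"title": ..., "data": ...}

    def list_metadata(self):
        return [
            {"id": dataset_id, "site": self.site, "title": rec["title"]}
            for dataset_id, rec in self._records.items()
        ]

    def retrieve(self, dataset_id):
        return self._records[dataset_id]["data"]


class CatalogueNode:
    def __init__(self, repository):
        self.repository = repository
        self.catalogue = {}  # local copy of the global catalogue
        self.peers = []      # other CatalogueNode instances

    def harvest(self):
        """Pull metadata from the local Data Repository."""
        for entry in self.repository.list_metadata():
            self.catalogue[entry["id"]] = entry

    def synchronise(self):
        """Push the local catalogue entries to every peer."""
        for peer in self.peers:
            peer.catalogue.update(self.catalogue)

    def search(self, keyword):
        """Keyword search, as a portal would issue it against this node."""
        keyword = keyword.lower()
        return sorted(
            e["id"] for e in self.catalogue.values()
            if keyword in e["title"].lower()
        )

    def fetch(self, dataset_id):
        """Serve data locally, or ask the node at the owning site."""
        site = self.catalogue[dataset_id]["site"]
        if site == self.repository.site:
            return self.repository.retrieve(dataset_id)
        for peer in self.peers:
            if peer.repository.site == site:
                return peer.fetch(dataset_id)
        raise LookupError(f"no node reachable for site {site}")


node_a = CatalogueNode(DataRepository("site-a", {
    "a-t850": {"title": "Temperature 850 hPa", "data": b"..."},
}))
node_b = CatalogueNode(DataRepository("site-b", {
    "b-sat1": {"title": "Satellite imagery", "data": b"***"},
}))
node_a.peers, node_b.peers = [node_b], [node_a]
for node in (node_a, node_b):
    node.harvest()
for node in (node_a, node_b):
    node.synchronise()
print(node_b.search("temperature"))
```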
Demonstrator - Architectural Choices
• Grid Architecture that can accept any kind of Grid Technology
  - Free to choose any grid middleware (OGSA-DAI, GRIA, gLite, GT4) and pick the best component of each middleware that meets the V-GISC requirements
• Catalogue Node built on a J2EE component framework
  - Solid framework used in production environments
  - Includes different services such as persistence, monitoring, configuration, etc.
  - The framework can be seen as a kernel of components where it is easy to add services such as Grid services or Web services
• Catalogue duplicated and synchronized on each site
  - To have a fast discovery phase (browse & search)
  - To have a reliable system (client redirection to another node in case of problems)
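Because every site holds a full copy of the catalogue, a client can be redirected to another node when one fails. A minimal sketch of that redirection, assuming each node is represented by a callable that either answers a search or raises a hypothetical `NodeUnavailable` error:

```python
class NodeUnavailable(Exception):
    """Raised by a node stub that cannot serve the request (hypothetical)."""


def search_with_failover(nodes, keyword):
    """Try each node in turn and return the first successful answer."""
    failures = []
    for node in nodes:
        try:
            return node(keyword)
        except NodeUnavailable as exc:
            failures.append(exc)  # remember the failure, try the next node
    raise NodeUnavailable(f"all {len(nodes)} nodes failed: {failures}")


def node_down(keyword):
    raise NodeUnavailable("connection refused")


def node_up(keyword):
    # Any node can answer, since the catalogue is duplicated on each site.
    return [f"match for {keyword}"]


print(search_with_failover([node_down, node_up], "t850"))
```

The order in which nodes are tried is left to the caller; a portal could, for instance, prefer its local node first.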
Demonstrator - Architecture
Demonstrator - Deployment
Problems and lessons learned - 1
• Grid Middleware
  - Technology not mature for production environment
  - Middleware still evolving toward standards (WSRF, WS-I, …)
• Access to distributed data
  - No efficient and robust transport mechanism
  - No mechanism to duplicate and synchronize data
  - Difficult to ensure data integrity on huge data volumes
  - OGSA-DAI is promising, easy to understand and use
Problems and lessons learned - 2
• Ontology / Metadata
  - Meteorological metadata are described using the XML WMO-CORE metadata profile
    • Metadata description larger than the data
    • Same information repeated in all metadata records; unnecessary information is circulating over the network
    • Large metadata records slowing down the database hosting the catalogue
  - Universal request language was not a solution to the virtual database problem
• VO
  - No standard tools to manage users and data policies
  - No standard security policies
What’s next
• Finalise the Connectivity phase (by M18/Mar 2006)
  - Connect EUMETSAT to the Grid (M12-M15/Sep-Dec 2005)
  - Enhance the architecture (M13-M18/Oct 2005-Mar 2006)
  - Implement Registration Authority (M16-M17/Jan-Feb 2006)
  - Improve metadata model (M13-M16/Oct 2005-Jan 2006)
  - Enhance distributed portal (M14-M16/Nov 2005-Jan 2006)
• Introduce acquisition of data (M18-M24/Mar-Sep 2006)
• Develop subscription service (M20-M28/May 2006-Jan 2007)
• Start developing the Virtual Organisation
  - Monitoring and management of the system (M18-M24/Mar-Sep 2006)
  - User management and data access control (M24-M30/Sep 2006-Mar 2007)
• Develop the discovery mechanism (M20-M25/May-Oct 2006)
• Start testing with other potential GISC
  - Japan and Australia have expressed interest in joining the SIMDAT project
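The project-month labels in the plan above (M12, M18, ...) line up with calendar months if September 2004, the month the EU contract was signed, is counted as M0. That offset is inferred from the listed pairs and is not stated on the slides. A short sketch of the conversion under that assumption:

```python
from datetime import date

# Assumption: M0 is September 2004 (month of contract signature).
PROJECT_START = date(2004, 9, 1)

MONTH_NAMES = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def project_month(m, start=PROJECT_START):
    """Return the first day of the calendar month for project month m."""
    months = start.month - 1 + m  # zero-based month count from January
    return date(start.year + months // 12, months % 12 + 1, 1)


def label(m):
    """Format project month m as e.g. 'Mar 2006'."""
    d = project_month(m)
    return f"{MONTH_NAMES[d.month - 1]} {d.year}"


for m in (12, 18, 24, 30):
    print(f"M{m} = {label(m)}")
```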
Global View: Coordination Effort
• Metadata
• Request-reply mechanism
• Exchange of catalogues
• Definition of what data should be available and to whom
• Virtual Organisation
• Standardisation of services
• Quality of Service
• Security