31 January 2003 GridPP Collaboration Meeting 1
CLRC e-Science Centre
David Boyd, Deputy Director, CLRC e-Science Centre
d.r.s.boyd@rl.ac.uk
http://www.e-science.clrc.ac.uk/
CLRC e-Science Centre
• Organisational structure
• Centre’s core programme
• National e-Science role
• Development programme
• Future challenges
Organisational structure
• Centre Director – Paul Durham (secretary Shirley Miller)
• Deputy Director – David Boyd (secretary Virginia Jones)
• 5 operating groups
– Computing and Data Services – John Gordon
– Grid Support – Alistair Mills
– Grid Technology – Rob Allan
– Data Management – Kerstin Kleese van Dam
– Grid Visualisation – Lakshmi Sastry
• Now 37 technical staff in post in these groups
CLRC e-Science Centre
• Organisational structure
• Centre’s core programme
• National e-Science role
• Development programme
• Future challenges
Providing facilities . . .
• Providing Grid-accessible scientific computing facilities and expertise for CLRC’s user community (both in-house and external)
– BaBar Tier A and UKHEP production centres and LHC Tier 1 prototype centre
• 500 CPUs + 45TB disk (and growing)
– Columbus
• 24 CPU AlphaServer SC/Quadrics for the Computational Chemistry Working Party
– Mott
• 20 CPU AlphaServer SC/Quadrics for the Minerals and Ceramics Consortium
– Beowulf/PC/Linux clusters
• 32 AMD CPUs/Wulfkit, 16 AMD CPUs/Myrinet
• 2x 32 AMD CPUs/ethernet (for ISIS)
• 64 Alpha CPUs/QsNET/ethernet
– Atlas Datastore
• 100TB scientific data archive being upgraded to 1PB capacity
• Ensuring other major facilities are Grid-enabled – including HPCx
. . . and infrastructure
• Providing a Grid-based infrastructure for major scientific facilities in CLRC and elsewhere
– Particle Physics (CERN, SLAC, Fermilab)
• data analysis and management
– ISIS
• data visualisation, analysis and management, remote instrument control
– Synchrotron Radiation Source
• data analysis, data management, remote instrument control
– British Atmospheric Data Centre
• data management
– Space Science and Astronomy (ESA, NASA)
• data management
• Collaboration on new CLRC facility developments
– Diamond
• data management and analysis, remote instrument control
– 4GLS
• data management and analysis
CLRC e-Science Centre
• Organisational structure
• Centre’s core programme
• National e-Science role
• Development programme
• Future challenges
National e-Science support
• Grid Support Centre (Alistair Mills)
– Certification Authority, directory and information services, software distribution (Globus, Condor), helpdesk, user support, . . .
• BBSRC Grid Support Service (Pete Oliver)
– Supporting BBSRC institutes, IGF centres and researchers outside e-Science
– Helping to develop Grid demonstrators for DNA homology searching, etc
– Working with PPARC project supporting biomolecular simulation
• Network monitoring (Robin Tasker)
– GNT-sponsored monitoring and information services
• Engineering Task Force (David Boyd)
– Set up to build the UK e-Science Grid
– Level 1 Grid implemented and tested in June 2002
– Level 2 Grid project team formed in October 2002
– Target is an operational Grid infrastructure by April 2003
– Continue to strengthen and expand, and eventually migrate to OGSA/GT3
Grid Support Centre Services
• Helpdesk – support@grid-support.ac.uk
– provides access to expert technical support
• Web information resource – http://www.grid-support.ac.uk
– offers Grid awareness and education material
• Grid Starter Kit
– downloadable Grid software and installation tools
• National Grid Directory Service
– supports Grid resource discovery and access to current status
• Certification Authority (CA) – http://www.grid-support.ac.uk/ca
– assigns a trustable digital identity to an individual
– you need one to use the Grid!
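On a Globus-based Grid, the certificate’s subject DN is what gets mapped onto a local account, conventionally via a grid-mapfile. A minimal sketch of parsing such a file follows; the DNs and usernames are invented examples, not real UK e-Science identities.

```python
# Sketch: parse a Globus-style grid-mapfile, which maps X.509
# subject DNs (quoted) to local account names. The DNs and
# usernames below are invented for illustration.

def parse_gridmap(text):
    """Return a dict mapping certificate subject DN -> local username."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Format: "subject DN" username  -- the DN is double-quoted
        if line.startswith('"'):
            dn, _, user = line[1:].partition('"')
            mapping[dn] = user.strip()
    return mapping

example = '''
# invented example entries
"/C=UK/O=eScience/OU=CLRC/CN=jane smith" jsmith
"/C=UK/O=eScience/OU=CLRC/CN=john doe" jdoe
'''

gridmap = parse_gridmap(example)
```

A gatekeeper consults this mapping after the certificate itself has been verified, so the trusted DN, not the username, is the real identity.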
UK e-Science CA – Jens Jensen
• We issue X.509 certificates to people and servers in the UK e-Science community
• Web-based software with extensive on-line documentation
• CP/CPS published on the web
• Registration Authorities (RAs) carry out local identity checks
• We currently have ~30 active RAs
• More RAs being appointed at about 3 per month
• We regularly run training courses for new RAs
• Over 250 certificates issued (personal and server)
• Approved by DataGrid and CrossGrid CAs and US DOE
• Collaborating with GGF CA group
• Grid Support Centre provides documentation and resolves problems
• See http://www.grid-support.ac.uk/ca/
Grid Information Services – Rob Allan
Information Portal created for deployment on the UK e-Science Grid
Uses the MDS system, via services on ginfo, to provide HTML pages and on-line Web services for resource discovery and monitoring
• Resource-oriented view of compute and data resources: http://esc.dl.ac.uk/InfoPortal
• Site-oriented view via an active map: http://esc.dl.ac.uk/InfoPortal/Map
• Virtual Organisation view using UDDI with links to contacts, resources and trading models - available as a CLRC Web service http://esc5.dl.ac.uk:8080/uddi/inquiry
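Since MDS is LDAP-based, the raw material behind these views is LDIF-style attribute/value records returned by a resource query. A minimal sketch of turning such output into usable records; the attribute names follow the MDS style, but the hostnames and values are invented examples.

```python
# Sketch: parse LDIF-style output of the kind an MDS query returns
# into a list of records (dicts of attribute -> list of values).
# Hostnames and values below are invented examples.

def parse_ldif(text):
    """Split LDIF text into records separated by blank lines."""
    records, current = [], {}
    for line in text.splitlines():
        if not line.strip():          # a blank line ends the current record
            if current:
                records.append(current)
                current = {}
            continue
        attr, _, value = line.partition(":")
        current.setdefault(attr.strip(), []).append(value.strip())
    if current:
        records.append(current)
    return records

sample = """dn: Mds-Host-hn=grid01.example.ac.uk, mds-vo-name=local, o=grid
Mds-Host-hn: grid01.example.ac.uk
Mds-Cpu-Total-count: 500

dn: Mds-Host-hn=grid02.example.ac.uk, mds-vo-name=local, o=grid
Mds-Host-hn: grid02.example.ac.uk
Mds-Cpu-Total-count: 24"""

hosts = parse_ldif(sample)
```

A portal like InfoPortal would render records like these as HTML tables or feed them into the active site map.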
Grid Compute Resources
InfoPortal augments the Globus MDS with a static XML-based information system showing resource architecture and installed software details:
(Screenshot: resource-centric view and menu bar)
Active Map of Sites
Site-centric view of e-Science Grid, shows:
• List of Grid resources from MDS
• Globus Integration Test results
• Network monitoring data
UDDI
Two prototype UDDI registries have been developed:
• CLRC e-Science project registry listing services for HPCPortal, DataPortal and Visualisation Tools
• UK e-Science project registry
Extending the Schema
Proposing an enriched schema for projects, users, applications and resources with OGSA-DAI access. Data will be stored as XML in an Oracle relational DBMS server.
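The storage pattern described here, XML documents held in a relational server and parsed on retrieval, can be sketched as follows. sqlite3 stands in for the Oracle server, and the table and element names are invented illustrations, not the proposed CLRC schema.

```python
# Sketch: metadata records stored as XML text in a relational table,
# retrieved by key and parsed on the way out. sqlite3 stands in for
# Oracle; the schema and element names are invented examples.
import sqlite3
import xml.etree.ElementTree as ET

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE metadata (id TEXT PRIMARY KEY, doc TEXT)")

record = "<project><name>InfoPortal</name><site>CLRC</site></project>"
db.execute("INSERT INTO metadata VALUES (?, ?)", ("proj-001", record))

# Retrieve the XML document and pull a field out of it
(doc,) = db.execute(
    "SELECT doc FROM metadata WHERE id = ?", ("proj-001",)
).fetchone()
name = ET.fromstring(doc).findtext("name")
```

In the real system an OGSA-DAI service would sit in front of the database, so clients see a Grid data service rather than SQL.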
Level 2 Grid project
• Building a usable, operational Grid for science and engineering applications linking resources at all the UK e-Science Centres
• First project attempting to integrate such a heterogeneous set of resources (physically, technically and organisationally) into a working Grid
• Project team led by Rob Allan
• 8 workpackages:
– Middleware deployment (Nick Hill)
– Grid information services (Rob Allan)
– User authentication (Alistair Mills)
– User access management and accounting (Steven Newhouse)
– Grid security (Jon Hillier)
– Operational status monitoring (David Baker)
– Grid platform deployment (Alistair Mills)
– Grid applications (Simon Cox)
• Involves resource managers at all e-Science Centres setting up necessary middleware plus specialised technical input from several Centres
Level 2 Grid – current status
• Standardised on Globus 2.2.3 – now installed at most e-Science Centres – MDS still proving unreliable
• Monitoring operational state of middleware now routinely carried out using test scripts – results reported to web – availability improving
• Problem with browser restrictions of OpenCA being addressed
• Imperial’s VOM software demonstrated and being evaluated for access management – will VOM/VOMS/CAS converge?
• Trusted host database being set up to help securely generate firewall rules – short-term solution to satisfy local security concerns
• Applications being identified for initial release – workshop at S’ton on 22 January – issue raised of software licensing for Grid use
• Now using GridSite to control access to project-confidential information
• Regular fortnightly progress meetings – usually followed by live demonstrations or debugging sessions
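The routine middleware tests mentioned above feed a web-reported availability figure per site. A minimal sketch of that aggregation step; the site names and test results are invented examples.

```python
# Sketch: aggregate per-site middleware test results into the kind
# of availability summary reported to the web. Site names and
# results below are invented examples.

def availability(results):
    """results: {site: [True/False per test run]} -> {site: fraction of runs up}."""
    return {site: sum(runs) / len(runs) for site, runs in results.items()}

results = {
    "RAL":   [True, True, True, False],   # one failed Globus test run
    "Soton": [True, True, True, True],
}
summary = availability(results)
```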
Access Grid – group conferencing
Multi-site group-to-group conferencing system
Continuous audio and video contact with all participants
Globally deployed
All UK e-Science Centres have AG rooms
Widely used for technical and management meetings
CLRC e-Science Centre
• Organisational structure
• Centre’s core programme
• National e-Science role
• Development programme
• Future challenges
Integrated e-Science Environment
Framework for distributed scientific computing and experimentation
[Architecture diagram: local and remote computers, data storage and experiments are linked by Grid services middleware providing computing, data discovery, data visualisation and experiment control Grid services, plus authentication, authorisation and accounting; on top, “Problem Solving Environments” give scientists domain-specific application interfaces.]
Scalable Application Visualisation Services – Lakshmi Sastry
• Motivation
– Address the scalability requirements of scientific visualisation using Grid and Web services
– Preserve investment in familiar domain-specific problem solving environments that are in everyday use
– Improve the support for near-real-time data exploration of very large datasets at the desktop
• Applications include
– Next generation data analysis software for ISIS MAPS and MERLIN detectors (mslice)
– Crystallography simulation and instrument monitoring (TobyFit)
– Diagnostics of oceanographic modelling with assimilation of observed data (GODIVA)
Architecture
• Grid Aware Portals toolkit, GAPtk, provides scalable visualisation services and APIs to embed these services into familiar application portals and PSEs
• New software to be installed on the desktop/laptop is minimal – e.g. no requirement to install Apache web servers or Globus-type Grid toolkits
• GAPtk client-side software maps its advanced/specialised graphics and interaction onto the native tool’s facilities for drawing and input handling
• All communication between the client and server is based on SOAP
• On the server side, all computations are handled as third-party delegated tasks, generating computational data as well as visualisation data (geometry information from computational data)
• Connection to third-party data services for data retrieval is also handled on the server side so that huge volumes of data don’t reach the client desktop PC
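Since all client-server traffic is SOAP, the wire format is just an XML envelope. A minimal sketch of building and parsing one with the standard library; the operation name and its parameters are invented examples, only the envelope structure is the SOAP 1.1 standard.

```python
# Sketch: a minimal SOAP 1.1 request envelope of the kind a GAPtk
# client might send. The operation "getIsosurface" and its
# parameters are invented examples; only the Envelope/Body
# structure is standard SOAP.
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

def make_request(operation, params):
    """Serialise an operation call into a SOAP envelope string."""
    env = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(env, f"{{{SOAP_NS}}}Body")
    op = ET.SubElement(body, operation)
    for key, value in params.items():
        ET.SubElement(op, key).text = str(value)
    return ET.tostring(env, encoding="unicode")

request = make_request("getIsosurface", {"dataset": "run42", "level": 0.5})

# The server side walks the Body the same way to dispatch the call:
parsed = ET.fromstring(request)
op = parsed.find(f"{{{SOAP_NS}}}Body")[0]
```

Keeping the protocol to plain SOAP is what lets the thin client avoid installing Apache or Globus: any toolkit that can post XML over HTTP can talk to the server.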
[Architecture diagram: on the user’s desktop, an application portal (a task-based, domain-specific user interface, e.g. Java or Matlab based) with its own event-handling backend links to a client-side interface for GAPtk utilities (additional functionality registered or linked into the portal software using native extension mechanisms) and a communication backend that generates and parses SOAP messages. The GAPtk AV server provides Grid-aware visualisation and application services behind a SOAP-parsing communication frontend, connects to data, compute portal and registry services, and reaches the Grid fabric layer (data and compute resources and networks) through the Globus services API.]
DataPortal – Kerstin Kleese
A Grid service connecting experiment, observation and simulation
[Architecture diagram: the CLRC DataPortal sits over PC filesystems, Unix filesystems, tape libraries, databases and SANs. Its components:
• CLRC Metadata Model – general data description language, enabling the integration of experimental, observational and computational data from various scientific disciplines
• Security – authentication, access control, secure communication
• Query Generation – generating queries based on user requests using the CLRC Metadata Model + XML, addressing multiple XML Wrappers
• User Information – ID mapping, session control + history
• Data Facility Information – name, address, type of data held, active topics
• Additional features: Result Presentation – request reply collection, collation and presentation; manipulating and downloading data
• XML-Wrapper – translates local metadata formats into the CLRC Metadata Format and handles queries and requests; external metadata databases (e.g. ISIS, BADC) link the data description to the physical data
• Visualisation Portal – providing general and user-specific visualisation tools in close proximity to the data, offering a range of capabilities
• HPC Portal – providing access to compute facilities and codes on demand
• SRB (Storage Resource Broker) – transparently integrates storage resources of various types; allows logical views to be created on physical storage space
• RasDaMan – database management system providing selective, contents-based access to multidimensional raster data; allows manipulation and reduction of data before retrieval
• Data Extraction – http, ftp, Grid-ftp]
DataPortal features
• Major functions of the DataPortal (DP) are grouped into modules
• Each module has a Grid services interface to communicate with the other DP services and, in some cases, with outside services like the Visualisation or HPC Portal
• The SOAP protocol is used for communication, and WSDL to describe the various services
• DP does not change any local metadata system, but uses its own wrappers to translate its general query format into the local syntax
• Replies from the resources are XML files compliant with the CLRC Scientific Metadata Format
• As well as interacting with the DP via the Web interface, users can also run queries by directly calling the Query & Reply service, assuming they are properly authenticated
• Other services are also externally visible, for example the Shopping Cart
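The wrapper idea is the key design point: one general query format, translated per facility. A minimal sketch of that translation step; the field names and mappings are invented illustrations, not the actual CLRC Scientific Metadata Format.

```python
# Sketch of the XML-Wrapper idea: the DataPortal issues a single
# general query, and each facility's wrapper rewrites it into the
# local metadata syntax. Field names and mappings below are
# invented illustrations.

GENERAL_TO_LOCAL = {
    "ISIS": {"investigator": "inv_name", "start_date": "run_start"},
    "BADC": {"investigator": "pi",       "start_date": "obs_date"},
}

def translate(facility, query):
    """Rewrite a general {field: value} query into a facility's local field names."""
    mapping = GENERAL_TO_LOCAL[facility]
    return {mapping[field]: value for field, value in query.items()}

general = {"investigator": "Smith", "start_date": "2003-01-31"}
isis_query = translate("ISIS", general)
badc_query = translate("BADC", general)
```

Because the translation lives in the wrapper, each facility keeps its local metadata system untouched, exactly as the bullet above states.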
DataPortal users
The DataPortal currently allows access to selected metadata and data from four facilities, the first three housed by CLRC:
The Synchrotron Radiation Department (SRD)
The Neutron Spallation Source (ISIS)
The British Atmospheric Data Centre (BADC)
Max-Planck Institute for Meteorology (MPIM)
Several e-Science projects are now using DataPortal technology:
Environment from the Molecular Level (NERC)
NERC DataGrid
E-Science technologies in the simulation of complex materials (EPSRC)
European Spatio-Temporal Data Infrastructure for high-performance computing – ESTEDI (EU)
NERC DataGrid – Bryan Lawrence

[Architecture diagram: on the NERC Grid, NDG wrappers at BADC, BODC and individual research groups expose online data, a tape robot and XML databases through a Web-based NDG portal; Grid users, software agents, Internet users and ESG (and other) applications reach the system over Internet links from the wider Internet, alongside satellite and supercomputer data sources.]
NDG – close links with ESG
[Earth System Grid architecture diagram: Tomcat servlet engines host Metadata Cataloguing Services (MCS) and Replica Location Services (RLS), reached by MCS and RLS clients over SOAP and RMI; a MyProxy server manages credentials; GRAM gatekeepers and Community Authorization Services (CAS) with their clients control access; Storage Resource Management (SRM) services front disk and mass storage systems (including HPSS) at LBNL, LLNL, ISI, NCAR, ORNL and ANL; data moves via gridFTP servers, CAS-enabled striped gridFTP servers and clients, and openDAPg servers, with a Live Access Server (LAS) for data access.]
NDG will provide support for:
• small-but-complex datasets
• data-mining (searchable metadata)
ESG will provide support for:
• large but simple datasets
• limited metadata, not searchable
NDG is complementary to ESG!
Web-based Data Portal
NDG will:
• Provide Python-based classes for non-gridded observational data to complement the access to 3D gridded data.
• Provide a web services wrapper so that other grid applications can access NDG data.
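The sort of Python class meant here handles point observations (a station time series) rather than a 3D grid. A minimal sketch; the class, station and attribute names are invented illustrations, not the actual NDG API.

```python
# Sketch: a Python class for non-gridded observational data of the
# kind NDG proposes -- a station time series rather than a 3D grid.
# Class, station and attribute names are invented illustrations.

class StationSeries:
    """Point observations from one station, held as (time, value) pairs."""

    def __init__(self, station, variable, records):
        self.station = station
        self.variable = variable
        self.records = list(records)   # [(iso_time_string, value), ...]

    def subset(self, start, end):
        """Return records with start <= time <= end (ISO strings compare lexically)."""
        return [(t, v) for t, v in self.records if start <= t <= end]

obs = StationSeries("Camborne", "temperature", [
    ("2003-01-29T00:00", 4.2),
    ("2003-01-30T00:00", 3.8),
    ("2003-01-31T00:00", 5.1),
])
window = obs.subset("2003-01-30T00:00", "2003-01-31T23:59")
```

A web services wrapper around such a class would expose `subset` as the operation other Grid applications call to pull NDG data.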
Example of a Client Application
European Grid projects
• ESTEDI
– Has developed a framework for storing and retrieving TB-scale multi-dimensional data for HPC applications in climate modelling, CFD, etc
• European DataGrid / UK GridPP
– Building a European Grid for large-scale data-intensive science
• European Grid Support Centre (just starting)
– Developing a prototype multi-national support centre
– CLRC (UK Grid), CERN (LHC Grid), KTH (NorduGrid)
• Enabling Grids for E-Science and Industry in Europe (EGEE)
– FP6 Research Infrastructure proposal to develop a pan-European framework integrating national and regional Grids to provide a single, coherent operational Grid for science and industry
– Involves all major European countries and regions
– Grid Support Centre involved in support and operations aspects
CLRC e-Science Centre
• Organisational structure
• Centre’s core programme
• National e-Science role
• Development programme
• Future challenges
Some future challenges
• Delivering the promised benefits of the Grid philosophy and Grid technology for science and engineering
• Providing sufficiently credible and scalable solutions that they spread beyond the research community into commercial use
• Growing a body of knowledge in how to use Grid technology which is sufficient to drive and support its widespread adoption
• Defining a robust and enduring standards framework which will ensure the Grid doesn’t become a proprietary battleground
• Ensuring continuing open availability of essential generic Grid middleware in the face of growing pressure towards commercial licensing