Open Science Grid
Project DASH: Securing Direct MySQL Database Access for the Grid
D. Malon, E. May, D. Ratnikov, A. Vaniachine
Argonne National Laboratory
M. Vranicar, J. Weicher
PIOCON Technologies
XV International Conference on Computing in High Energy and Nuclear Physics
T.I.F.R., Mumbai, India
February 13-17, 2006
CHEP06 Mumbai India Alexandre Vaniachine (ANL)
Databases and Grids
• Databases also play a critical role in grid middleware: file catalogues, monitoring, etc.
• Crosscutting the computational grid infrastructure, a database hyper-infrastructure emerges
• In addition to petabytes of file-based event data, high energy physics applications require access to non-event data (detector conditions, calibrations, etc.) stored in relational databases
[Figure: databases across a world-wide federation of computational grids (OSG, WLCG, NorduGrid) run by a large-scale distributed computations management system. Labels: Workload Orchestration; File Transport; Production DB; PanDA DB; RFT Database; RLS Database; Conditions DB; Meta-data DB; Monitoring DB; ATLAS Sites, CMS Sites, Non-LHC Sites; Cluster with Head Node, Edge Services and Worker Nodes]
Project DASH
• As grid computing technologies mature, development must focus on database and grid integration
• New technologies are required to bridge the gap between data accessibility and the increasing power of grid computing used for distributed event production and processing
• The Database Access for Secure Hyperinfrastructure (DASH) project is funded by the DOE Small Business Innovation Research (SBIR) Program to build and test secure high-performance database access technology for distributed computing
www.piocon.com/DASH.php
A project of PIOCON Technologies, Inc and Argonne National Laboratory
Database Access on the Grid
Two different architectures:
• A separate middleware server does the grid authorization:
– OGSA-DAI: SOAP/XML + XML binary extensions
– Spitfire (EDG WP2): SOAP/XML text-only data transport
– Perl DBI database proxy (ALICE): SQL data transport
– Oracle 10g (separate authorization layer)
• Grid middleware is integrated in the database server process:
– Instead of surrounding the database with external secure middleware layers, the safety features are embedded inside of the code
– By pushing secure authorization into the database engine, the inefficient data transfer bottlenecks are eliminated
Embedded Security Approach
• The embedded security approach is listed among the top ten innovations in security by the panel of experts convened by Battelle:
– “The Global Cyber Net: Communications and information are the lifeblood of security. Today we enjoy a worldwide web, which is open but unsecured. In the future, we will have a global cyber net that is faster and better protected than today… Software will contain embedded safety features inside of the code rather than just surrounding it.”
http://www.battelle.org/forecasts/defense.stm
End-to-End Secure Transport
• DASH technology bridges the gap between data accessibility and the increasing power of grid computing
• To overcome database access inefficiencies inherent in a traditional middleware approach, the DASH project implements secure authorization on the transport level
• Pushing the grid authorization into the database engine eliminates the middleware message-level security layer and delivers transport-level efficiency of SSL/TLS protocols for grid applications
• The DASH proof-of-concept prototype provides Globus grid proxy certificate authorization technologies for MySQL database access control
• DASH technology brings database access efficiencies similar to the https advantages introduced in the Globus Toolkit 4.0
• The database architecture with embedded grid authorization provides a foundation for secure end-to-end data processing solutions for the grids
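A minimal sketch of what authorization by grid proxy certificate could look like once the transport layer has verified the peer: the proxy components are stripped from the subject name and the remaining identity is looked up in an access map. This is an illustration under stated assumptions, not the DASH implementation. The map contents, the account name and the function names are hypothetical, and a real server validates the certificate chain rather than matching strings.

```python
import re

# Hypothetical map from grid identities (subject DNs) to database accounts.
GRID_MAP = {
    "/DC=org/DC=example/OU=People/CN=Jane Doe": "atlas_reader",
}

# Legacy Globus proxies append "/CN=proxy" or "/CN=limited proxy" to the
# subject; RFC 3820 proxies append a "/CN=<digits>" component.
# Simplification: an end-entity CN made only of digits would also be stripped.
_PROXY_SUFFIX = re.compile(r"/CN=(?:proxy|limited proxy|\d+)$")


def identity_dn(subject_dn):
    """Strip trailing proxy components to recover the end-entity DN."""
    while True:
        stripped = _PROXY_SUFFIX.sub("", subject_dn)
        if stripped == subject_dn:
            return subject_dn
        subject_dn = stripped


def authorize(subject_dn, grid_map=GRID_MAP):
    """Return the mapped database account, or None to refuse the connection."""
    return grid_map.get(identity_dn(subject_dn))
```

For example, `authorize("/DC=org/DC=example/OU=People/CN=Jane Doe/CN=proxy")` resolves to the same account as the user's long-lived certificate, so no password has to travel with the job.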
Aspect-Oriented Programming
• To avoid a brittle, monolithic system DASH uses an aspect-oriented programming approach
• By localizing Globus security concerns in a software aspect, DASH achieves a clean separation of Globus Grid Security Infrastructure dependencies from the MySQL server code
• During the database server build, the AspectC++ tool automatically generates the transport-level code to support a grid security infrastructure
• www.aspectc.org
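The separation of concerns described above can be illustrated with a small Python analogy. DASH itself uses AspectC++ at build time; the sketch below only mimics the idea at run time, and the `Server` class, `weave` helper and `require_certificate` advice are hypothetical names introduced for illustration.

```python
import functools


class Server:
    """Stand-in for server code that knows nothing about grid security."""

    def accept(self, peer):
        return f"session for {peer['user']}"


def weave(cls, method_name, advice):
    """Wrap cls.method_name so that advice runs around every call."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def woven(self, *args, **kwargs):
        return advice(original, self, *args, **kwargs)

    setattr(cls, method_name, woven)


def require_certificate(proceed, self, peer):
    """The security concern, kept in one place outside the server code."""
    if not peer.get("cert_verified"):
        raise PermissionError("grid certificate required")
    return proceed(self, peer)


# The "build step": the server source stays untouched, the check is woven in.
weave(Server, "accept", require_certificate)
```

The point of the design is that `Server` never mentions certificates: dropping the `weave` call yields the plain server again, which mirrors how the grid dependencies stay out of the MySQL sources.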
Automatic Code Generation
[Figure: automatic code generation. Labels: Globus GSI code; OpenSSL Transport Level Security code; MySQL database server code; DASH grid security aspects code; source files grid.ah, cbk.c, tls.c, vio.c; output: auto-generated grid-enabled MySQL database server code]
AOP is the Next ‘Big Thing’
A 2001 paper on Aspect Oriented Programming is among the Top 10 Downloads from ACM’s Digital Library
• Paper by our collaborators from Illinois Institute of Technology
ATLAS experience with AOP was first reported at the previous conference, CHEP04
Testing New Functionalities
• Prototype servers built with DASH technology are being tested at ANL, BNL, CERN and U Geneva
• We thank:
– Jason Smith (BNL)
– Yuri Smirnov (BNL)
– Frederik Orellana (U Geneva)
Among the new functionalities are:
• Check for the proxy expiration time
• Host name checking (to reject impersonation)
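The two checks listed above can be sketched as follows. This is a simplified illustration, not the DASH code: the function names and the optional safety margin are assumptions, and real host name verification also handles wildcards and subject alternative names.

```python
from datetime import datetime, timedelta, timezone


def proxy_is_current(not_after, now=None, margin=timedelta(0)):
    """True if the proxy stays valid for at least `margin` beyond `now`."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now + margin < not_after


def host_matches(cert_host, peer_host):
    """Case-insensitive exact comparison, ignoring a trailing dot.

    `cert_host` is the host name taken from the peer's certificate and
    `peer_host` is the name the peer was actually reached at.
    """
    return cert_host.rstrip(".").lower() == peer_host.rstrip(".").lower()
```

A margin is useful for long jobs: a proxy that is valid now but expires before the job finishes can be refused at connection time.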
Packaging Challenge
• Initial response from our beta-testers suggested that, because of the Globus GSI library dependencies, the preferred distribution would be a static build
• However, tests showed that static builds work best on platforms (Linux distributions) very close to that of the build machine
• We experienced unexpected sensitivities to minor variations in the glibc library version
• We are now addressing that issue by developing a dynamic build that will have the static Globus GSI and OpenSSL libraries built in
Scalability Challenge
• The chaotic nature of opportunistic grid computations results in variations in daily production rates
• Database services capacities should be adequate for peak demand
• Large-scale world-wide distributed simulations performed by the ATLAS Collaboration show steady progress in grid computing
[Figure: ATLAS production rate in jobs/day (axis 0 to 14000), July through May, for LCG/CondorG, LCG/Original, NorduGrid and Grid3. Periods marked: Data Challenge 2 (long jobs period), Data Challenge 2 (short jobs period), Rome Production (mix of jobs)]
Why Dynamic Deployment?
• The high level of sharing of computational resources achieved on grids results in increased fluctuations in demand for database services, because of the chaotic nature of shared resource availability
• Static services deployment requires over-capacity
• Opportunistic production on non-LCG sites requires database services deployment on-demand
• To provide on-demand database services capability for Open Science Grid, the Edge Services Framework activity builds the DASH mysql-gsi database server into the virtual machine image, which is dynamically deployed via Globus Virtual Workspaces
Edge Services
• Services executing on the edge of the public and private network
[Figure: site diagram. Labels: CDF, CMS, ATLAS, Guest VO, SE, CE, compute nodes and storage nodes]
• See CHEP06 contribution id # 214
http://indico.cern.ch/contributionDisplay.py?contribId=214&sessionId=7&confId=048
Synergistic Collaboration
CMS & ATLAS collaborate in OSG ESF Activity
http://www.opensciencegrid.org/esf
To achieve the ESF proof-of-concept milestone:
• The first ESF VM was deployed by CMS
• The first ESF service on that VM was by ATLAS:
– Grid-enabled MySQL database built by the DASH project
• To access the server, the grid job used a proxy certificate (instead of the clear-text passwords hardwired in the scripts that are distributed world-wide)
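As a rough sketch of how a grid job might pick up its proxy certificate instead of a hardwired password: Globus tools conventionally look for the proxy in the file named by `X509_USER_PROXY`, falling back to `/tmp/x509up_u<uid>`. The helper names below are hypothetical, and reusing the proxy file as both certificate and key in the standard MySQL client SSL options is an assumption about how a client could be configured, not a description of the DASH client.

```python
import os


def proxy_path(environ=None, uid=None):
    """Locate the grid proxy file using the usual Globus convention."""
    environ = os.environ if environ is None else environ
    explicit = environ.get("X509_USER_PROXY")
    if explicit:
        return explicit
    uid = os.getuid() if uid is None else uid
    return f"/tmp/x509up_u{uid}"


def client_tls_options(ca_file, environ=None, uid=None):
    """Build client TLS settings that carry no password at all.

    The keys mirror the MySQL client options ssl-ca, ssl-cert and ssl-key.
    A proxy file holds both the certificate and its private key, so the
    same path is used for both (an assumption made for this sketch).
    """
    proxy = proxy_path(environ, uid)
    return {"ssl_ca": ca_file, "ssl_cert": proxy, "ssl_key": proxy}
```

Because the credential lives in a short-lived file created on the worker node, the scripts shipped to sites need not contain any secret.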
Collaboration Benefits
Celebrating ESF proof-of-concept milestone at Supercomputing 2005
Globus Folder at SC05
http://www.globus.org/alliance/events/sc05
Complementary Project
• A new collaborative project with the Globus team has just started at Argonne
– to grid-enable the PostgreSQL database
• Both DASH and the new project target technology integration with OGSA-DAI
• Please contact us if you are interested in contributing to these projects
OGSA-DAI Complementarity
• Neil P Chue Hong, OGSA-DAI Status Summary, Third OGSA-DAI Users Group Meeting, 6/1/2005
• Through our continued interactions with OGSA-DAI team we have established working relationships to achieve technological compatibility
Additional Benefits
• Direct access to database servers unleashes a broad range of vendor-specific server capabilities for data processing applications: distributed XA transactions, binary data transport, etc.
• Grid proxy certificate technology opens technical opportunities to enable fine-grained delegation of rights for access control (attribute certificates)
• Grid-enabled relational database server technology has the potential for application beyond the domain of high energy physics, and is of interest to bioinformatics and other data-intensive sciences
DASH Outreach
DASH Technologies
• AspectC++ http://www.aspectc.org
• Globus http://www.globus.org
• MySQL http://www.mysql.org
DASH Collaborators and Early Adopters
• Open Science Grid Edge Services Framework http://www.opensciencegrid.org/esf
• ATLAS Distributed Database Services http://atlas.web.cern.ch/Atlas/GROUPS/DATABASE/project/services
• IIT: Illinois Institute of Technology Concurrent Programming Research Group http://www.iit.edu/~concur
DASH Presentations at the Conferences and Workshops
• Supercomputing 2005, November 12-18, 2005, Washington State Convention and Trade Center, Seattle, Washington, USA
http://osg-docdb.opensciencegrid.org/cgi-bin/ShowDocument?docid=307
• First DIALOGUE Workshop: Applications-Driven Issues in Data Grids, August 1-2, 2005, The Ohio State University, Columbus, Ohio, USA
http://www.datagrids.org/ws/docs/High-performanceDatabaseAccess.ppt