Tomasz Haupt
Northeast Parallel Architectures Center, Syracuse University
For Computational Communities
Constructing
Overview
What is a Web Portal?
Web Portal Architecture
Distributed Components: WebFlow
Interfaces:
– Task Descriptor
– Grid Interface
Portal Security
Summary
What is a Web Portal?
Customizable access to information and services
Portal is not a static web page
It relies on sophisticated browser technology
– DHTML, JavaScript, cookies, applets, …
Server-side processing
– cgi-bin, servlets, ASP, JSP, server-side includes, XML
– search engines, mail servers, calendar, ...
Back End
– databases
– credit card processing
– external services: news, weather, stock quotes, ...
Computational Portals
The aim is to provide a problem-oriented interface (a Web portal) that lets users utilize HPC resources more effectively from the desktop via a Web browser.
This “point & click” view hides the underlying complexities and details of the HPC resources and creates a seamless interface between the user’s problem description on the desktop system and the heterogeneous computing resources.
These HPC resources include supercomputers, mass storage systems, databases, workstation clusters, instruments, and visualization servers.
Example: Nanomaterials Research
[Diagram: data-flow application composed of Gaussian, Gamess, two convert modules, a data repository, select and edit steps, and QS modules]
Features: data-flow computations, user-supplied modules, seamless access to a heterogeneous mixture of computational resources, seamless data transfer, visualizations, data management.
Goals: automate the task, maximize throughput.
Example: LMS (Landscape Management System)
[Diagram: WMS with the EDYS and CASC2D models; data layers: DEM, Land Use, Soil Texture, Vegetation]
EDYS: vegetation model
CASC2D: watershed model
WMS: Watershed Modeling System
Features: access to remote data (distributed databases, Internet repositories), data pre- and postprocessing, tightly coupled applications running on remote hosts, visualizations.
Goal: decision support system available anytime, anywhere.
Example: Gateway System
Problem Solving Environment
Features: access to remote data, data pre- and postprocessing, applications running on remote hosts, visualizations, archiving.
Goal: guide the user to select software, generate input files, submit jobs, analyze data; hide the complexity and details of a heterogeneous back end.
[Diagram labels: Problem Description (new, select, arch); Resources (software, hardware), templates, visualization tools; Resource Allocation; Input files, Output files]
Design Issues
Support for seamless access (security)
Support for distributed, heterogeneous Back-End services (HPCC, DBMS, Internet, ...) managed independently
Variable pool of resources: support for discovery and dynamic incorporation into the system
Scalable, extensible, low-maintenance Middle Tier
Web-based, extensible, customizable Front End, self-adjusting to varying capacities and capabilities of clients (humans, software and hardware)
Access to desktop applications
Towards the solution ...
Problem description (physics, chemistry, ...)
Task description: I need 64 nodes of SP-2 at Argonne to run my MPI-based executable “a.out”, which you can find in “/tmp/users/haupt” on marylin.npac.syr.edu. In addition, I need any idle workstation with jdk1.1 installed. Make sure that the output of my a.out is transferred to that workstation.
Middle Tier: map the user’s task description onto the resource specification; this may include resource discovery and other services
Resource Specification, Control Access, Events
Resource Allocation: run, transfer data, run
Three Tier System
Task Descriptor
Resource Descriptor
Front End: Tools to select or specify the problem to solve
Middle Tier: Translates the user task into resource requests
Back End: Resources and data to execute the task.
Abstract Application Descriptor (AAD)
“man pages written in XML”
specifies how to install and run the application on different hosts [current status of Gateway]
describes requirements, input and output data, options, arguments, etc.
to submit a job, it must be reduced to a job descriptor (select host, options, input data, …)
More on AAD: http://www.npac.syr.edu/users/haupt/WebFlow/MODULES/AAD.html
Reducing AAD to JD
Abstract Application Descriptor to Job Descriptor
[Diagram: the AAD is reduced to a JD by selecting a host, options, input, ...; the JD is then submitted by generating a batch script or generating RSL. Other labels: Information service (MDS), Resource broker, JINI, Condor, GUI Problem Solving Environment, Data Flow manager]
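The reduction of an AAD to a JD can be illustrated with a small Python sketch. It is not Gateway code: the class names, fields and the `reduce_to_job_descriptor` helper are hypothetical, and the `-v` option is invented purely for illustration. The sketch only shows the idea that a job descriptor is an abstract descriptor with every open choice (host, options, input files) fixed.

```python
from dataclasses import dataclass, field


@dataclass
class AbstractApplicationDescriptor:
    """Hypothetical in-memory AAD: everything the application could do."""
    app_id: str
    targets: dict                      # host name -> command line on that host
    options: list = field(default_factory=list)
    input_names: list = field(default_factory=list)


@dataclass
class JobDescriptor:
    """Hypothetical JD: one concrete choice for every open item in the AAD."""
    app_id: str
    host: str
    command: str
    options: list
    inputs: dict                       # expected input name -> actual file location


def reduce_to_job_descriptor(aad, host, options, inputs):
    """Reduce an AAD to a JD by selecting a host, options and input files."""
    if host not in aad.targets:
        raise ValueError(f"{aad.app_id} is not installed on {host}")
    unknown = [o for o in options if o not in aad.options]
    if unknown:
        raise ValueError(f"unsupported options: {unknown}")
    missing = [n for n in aad.input_names if n not in inputs]
    if missing:
        raise ValueError(f"no file selected for inputs: {missing}")
    return JobDescriptor(aad.app_id, host, aad.targets[host],
                         list(options), dict(inputs))


aad = AbstractApplicationDescriptor(
    app_id="Casc2d",
    targets={"aga.npac.syr.edu": "/npac/home/haupt/CASC2D/casc2d"},
    options=["-v"],                    # illustrative option, not from the slides
    input_names=["sand.map"],
)
jd = reduce_to_job_descriptor(
    aad,
    host="aga.npac.syr.edu",
    options=[],
    inputs={"sand.map": "maine.npac.syr.edu:C:\\LMS\\fromEdys\\S.map"},
)
print(jd.host, jd.command)
```

A resource broker or a GUI could call the same function; only the source of the selections differs.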
Example Job Descriptor (with selected application, host and i/o files)
<?xml version="1.0"?>
<!DOCTYPE application SYSTEM "ApplDescV2.dtd">
<application id="Casc2d" installable="No">   <!-- selected application -->
  <target id="aga.npac.syr.edu">   <!-- selected host -->
    <status installed="Yes"/>
    <installed>
      <CmdLine command="/npac/home/haupt/CASC2D/casc2d"/>   <!-- how to run it -->
      <input>
        <inFile Path="/npac/home/haupt/CASC2D/lms/" Name="sand.map"/>   <!-- it expects this input file -->
        <source Host="maine.npac.syr.edu" Path="C:\LMS\fromEdys\" Name="S.map"/>   <!-- actual location of the file -->
      </input>
      <output>
        <outFile Path="/npac/home/haupt/CASC2D/lms/" Name="sed.out"/>   <!-- it generates this output file -->
        <dest Host="maine.npac.syr.edu" Path="C:\LMS\toEdys\" Name="sed.out"/>   <!-- store it there -->
      </output>
      <!-- save stdout and stderr -->
      <stdout Host="aga.npac.syr.edu" Path="/npac/home/haupt/CASC2D/history/" Name="job2001.out"/>
      <stderr Host="aga.npac.syr.edu" Path="/tmp/" Name="haupt_job2001.err"/>
    </installed>
  </target>
</application>
simple job object (atomic task)
“input port”: method to be invoked
“output port”: event fired
[Diagram: a job object built from an AAD, with a run() input port and success and failure output events]
Complex Tasks
[Diagram: six job objects, each with a run() input port and success and failure output events, connected into a complex task]
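The port model (a method as input port, an event as output port) can be sketched in a few lines of Python. This is an illustration only: `JobObject`, `connect` and the four job names are hypothetical and do not come from WebFlow. Each job fires either a success or a failure event, and a connection routes that event to the run() method of another job.

```python
class JobObject:
    """Hypothetical atomic task: run() is the input port, events are the output port."""

    def __init__(self, name, action):
        self.name = name
        self.action = action           # callable doing the work; truthy means success
        self.listeners = {"success": [], "failure": []}

    def connect(self, event, target):
        """Wire an output event of this job to the run() input port of another job."""
        self.listeners[event].append(target)

    def run(self, log):
        try:
            ok = bool(self.action())
        except Exception:
            ok = False
        event = "success" if ok else "failure"
        log.append((self.name, event))
        for target in self.listeners[event]:
            target.run(log)


generate = JobObject("generate", lambda: True)
simulate = JobObject("simulate", lambda: False)   # pretend the simulation fails
analyze = JobObject("analyze", lambda: True)
cleanup = JobObject("cleanup", lambda: True)

generate.connect("success", simulate)
simulate.connect("success", analyze)
simulate.connect("failure", cleanup)

log = []
generate.run(log)
print(log)
```

Because the simulated failure fires the failure event, the analyze job never runs and the cleanup job does.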
Task Descriptor
A computational task requested by the user may involve many steps.
Some steps can be performed concurrently, but typically there are data dependencies that force execution of the steps in some particular order.
Tasks can be defined recursively.
A task may specify resources explicitly, or provide requirements and/or preferences, leaving the selection of resources to the discretion of a resource broker.
ATD.dtd
<!ELEMENT Task (TaskName, (Task|connection)*, InputPort+, OutputPort+)>
<!ELEMENT TaskName EMPTY>
<!ATTLIST TaskName
    name       CDATA #REQUIRED
    descriptor CDATA #IMPLIED>
<!ELEMENT connection (output+, input+)>
<!ELEMENT output EMPTY>
<!ATTLIST output
    task  CDATA #REQUIRED
    event CDATA #IMPLIED>
<!ELEMENT input EMPTY>
<!ATTLIST input
    task   CDATA #REQUIRED
    method CDATA #IMPLIED>
<!ELEMENT InputPort EMPTY>
<!ATTLIST InputPort task CDATA #REQUIRED>
<!ELEMENT OutputPort EMPTY>
<!ATTLIST OutputPort task CDATA #REQUIRED>
Example Task Descriptor
<Task>
  <TaskName name="ComplexTask" />
  <Task>
    <TaskName name="atomic_task1" descriptor="task1.xml" />
    <InputPort method="run" />
    <OutputPort event="done" />
  </Task>
  <Task>
    <TaskName name="atomic_task2" descriptor="task2.xml" />
    <InputPort method="run" />
    <OutputPort event="done" />
  </Task>
  <connection>
    <output task="atomic_task1" />
    <input task="atomic_task2" />
  </connection>
  <InputPort task="atomic_task1" />
  <OutputPort task="atomic_task2" />
</Task>
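A middle-tier component could derive an execution order from such a descriptor. The Python sketch below is one possible reading, not the WebFlow implementation: it assumes that the task references in a connection match the subtask names (atomic_task1, atomic_task2), it looks only at direct subtasks, and `execution_order` is a hypothetical helper. It needs Python 3.9 or later for `graphlib`.

```python
import xml.etree.ElementTree as ET
from graphlib import TopologicalSorter

# Task descriptor in the style of the example; names in <connection> are
# assumed to match the TaskName of each subtask.
ATD = """
<Task>
  <TaskName name="ComplexTask"/>
  <Task>
    <TaskName name="atomic_task1" descriptor="task1.xml"/>
    <InputPort method="run"/>
    <OutputPort event="done"/>
  </Task>
  <Task>
    <TaskName name="atomic_task2" descriptor="task2.xml"/>
    <InputPort method="run"/>
    <OutputPort event="done"/>
  </Task>
  <connection>
    <output task="atomic_task1"/>
    <input task="atomic_task2"/>
  </connection>
  <InputPort task="atomic_task1"/>
  <OutputPort task="atomic_task2"/>
</Task>
"""


def execution_order(task_xml):
    """Return subtask names in an order that respects the connections."""
    root = ET.fromstring(task_xml)
    # Direct child tasks only; nested tasks would be handled recursively.
    names = [t.find("TaskName").get("name") for t in root.findall("Task")]
    depends_on = {name: set() for name in names}
    for conn in root.findall("connection"):
        producers = [o.get("task") for o in conn.findall("output")]
        for consumer in conn.findall("input"):
            depends_on[consumer.get("task")].update(producers)
    # TopologicalSorter takes a mapping node -> predecessors.
    return list(TopologicalSorter(depends_on).static_order())


print(execution_order(ATD))
```

Subtasks with no connection between them have no mutual ordering and could be started concurrently.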
How are task descriptors generated?
Predefined (“set of scenarios”)
Created interactively by the user using Front End tools
Generated by middle-tier components
LMS Front End
Navigate and choose an existing application to solve the problem at hand.
Import all necessary data.
Retrieve data
Pre/post-processing
Run simulations
Select host
Select model
Set parameters
Run
QS Front End
Compose your application interactively from pre-existing modules
Data-Flow Front-End
Building an application
A visual representation is converted into an XML document
[Diagram labels: Front-End Applet; save; Web Server; parse; XML service; ApplContext; Middle-Tier]
Generates Java code to add modules to ApplContext
Publishes IOR
Gateway Front End
Gateway Navigator: where do you want to go today?
Define the system you are interested in
Control applet:
– File Access
– Job monitor
Middle-Tier: WebFlow Server
CORBA-based distributed components
• The WebFlow server is given by a hierarchy of containers (contexts) and components
• The server is the root context.
• A context
  – knows its location in the hierarchy
  – has attributes
  – maintains a persistent state
  – controls the life cycle of its children
  – is responsible for intercomponent communications (events)
  – can be specialized by adding services (WebFlow modules)
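The container hierarchy can be pictured with a short Python sketch. It is a plain-object analogy, not the CORBA-based WebFlow API: the `Context` class and its method names are hypothetical, and persistence and event handling are left out to keep it short.

```python
class Context:
    """Hypothetical container in a WebFlow-like hierarchy (not the real WebFlow API)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.attributes = {}
        self.children = {}
        self.services = {}

    def path(self):
        """A context knows its location in the hierarchy."""
        if self.parent is None:
            return self.name
        return f"{self.parent.path()}/{self.name}"

    def add_child(self, name):
        """The parent creates its children ..."""
        child = Context(name, parent=self)
        self.children[name] = child
        return child

    def remove_child(self, name):
        """... and destroys them, so it controls their life cycle."""
        del self.children[name]

    def add_service(self, name, service):
        """Specialize this context by attaching a module."""
        self.services[name] = service


server = Context("server")             # the server is the root context
user = server.add_child("user1")
app = user.add_child("application1")
app.attributes["owner"] = "user1"
app.add_service("echo", lambda text: text)
print(app.path())
```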
[Diagram: context hierarchy with User 1 and User 2 contexts, Application 1 and Application 2 contexts (App 1, App 2), and WebFlow Services]
Distributed Middleware
[Diagram labels: Clients; Web Server (download applet); Master WebFlow Server with WebFlow context proxies; “slave” WebFlow Servers; interfaces to the Distributed Back-End Resources: Task Descriptor, Grid Interface, JDBC, Information Services]
Example: Gateway Components
• Using the PSE (Front End), the user defines a problem
• A session is an instance of it (an attempt to solve it)
• A session comprises jobs
• The session context reflects the structure of the ATD
• The session context can submit itself
[Diagram: nested contexts: User Context, Problem Context, Session Context, Job context]
Job context: application descriptor, job id, date submitted, date completed, input file(s), output file(s)
WebFlow Events
[Diagram labels: Context 1 with module A; Context 2 with module B; Event e fired by A; Event Adapter, which uses CORBA DSI and DII; Method m of B; Client]
Module A does not care who is expecting the event; its fireEvent method invokes a method of its parent context.
Method m is a public method: anyone can invoke it, including the Event Adapter of Context 1. No protection against misuse!
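The event mechanism can be mimicked in plain Python. This sketch is an analogy only: it uses one context instead of two, and a run-time `getattr` lookup stands in for the dynamic invocation that the slides attribute to CORBA DSI and DII. `Module`, `Context`, `bind` and `dispatch` are hypothetical names.

```python
class Module:
    def __init__(self, name):
        self.name = name
        self.context = None
        self.received = []

    def fire_event(self, event, payload):
        # The module only tells its parent context; it does not know the receiver.
        self.context.dispatch(self, event, payload)

    def m(self, payload):
        # A public method: anyone can call it, not only the event adapter.
        self.received.append(payload)


class Context:
    def __init__(self):
        self.bindings = {}   # (source name, event) -> list of (target, method name)

    def add(self, module):
        module.context = self
        return module

    def bind(self, source, event, target, method_name):
        key = (source.name, event)
        self.bindings.setdefault(key, []).append((target, method_name))

    def dispatch(self, source, event, payload):
        for target, method_name in self.bindings.get((source.name, event), []):
            # The method is looked up by name at run time, so the binding can
            # be changed without touching either module.
            getattr(target, method_name)(payload)


ctx = Context()
a = ctx.add(Module("A"))
b = ctx.add(Module("B"))
ctx.bind(a, "e", b, "m")

a.fire_event("e", "data")      # routed by the context to b.m
b.m("direct call")             # nothing prevents a direct call either
print(b.received)
```

The last two calls show both sides of the design: loose coupling through the context, and no protection against direct invocation.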
Dynamic Interfaces
[Diagram: modules A and B]
Middle Tier Components
[Diagram: the Task Specification enters the middle tier and the Resource Specification leaves it]
Component container: XML parser, file access & transfer, job control, batch script generator, resource broker, data flow manager, NetSolve linear algebra proxy, PSE support, information services, database access, data analysis, multi-disciplinary task control, archiving, access control, context life cycle
User Context: profile, credentials (proxy)
Session Context: job objects
Grid Interface
How to hide complexity and details of Back End resources?
Example: JDBC
Front End: Servlet, Application
Middle Tier (java.sql): Back-End-independent business logic, Driver Manager
Drivers: Oracle driver, Sybase driver, mSQL driver
Back End: Oracle, Sybase, mSQL
JDBC model to provide access to computational resources
Front End: Servlet, Application
Grid Interface: Back-End-independent business logic, GRAM
Schedulers: PBS, NQS, CONDOR
Back End: O2K, SP2, NOW
Grid Interface: access control, allocations, resource look-up, discovery, (co)allocation, monitoring, QoS, fault tolerance, services, events, ... Addressed by Grid Forum. Approximated by Globus.
Portal: builds on top of it, implementing proxies
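The JDBC analogy can be made concrete with a driver-manager sketch in Python. Everything here is hypothetical: the `pbs://`-style URLs, the `SchedulerDriver` interface and the `PrefixDriver` stand-in are invented for illustration, and no real scheduler is contacted. The point is only that the business logic asks a manager for a driver and never names the scheduler itself.

```python
from abc import ABC, abstractmethod


class SchedulerDriver(ABC):
    """Hypothetical driver interface; real PBS, NQS or Condor bindings would differ."""

    @abstractmethod
    def accepts(self, url):
        ...

    @abstractmethod
    def submit(self, executable, count):
        ...


class PrefixDriver(SchedulerDriver):
    """Stand-in driver that formats a submission string instead of submitting."""

    def __init__(self, scheme):
        self.scheme = scheme

    def accepts(self, url):
        return url.startswith(self.scheme + "://")

    def submit(self, executable, count):
        return f"{self.scheme}: {executable} on {count} node(s)"


class DriverManager:
    def __init__(self):
        self.drivers = []

    def register(self, driver):
        self.drivers.append(driver)

    def get_driver(self, url):
        for driver in self.drivers:
            if driver.accepts(url):
                return driver
        raise LookupError(f"no driver for {url}")


manager = DriverManager()
for scheme in ("pbs", "nqs", "condor"):
    manager.register(PrefixDriver(scheme))


def run_anywhere(url):
    """Back-end-independent business logic: no scheduler name appears here."""
    return manager.get_driver(url).submit("a.out", 64)


print(run_anywhere("pbs://o2k.example.org"))
```

Adding a new back end then means registering one more driver, with no change to `run_anywhere`.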
Example of a proxy module
GRAM resource description:
&(rsl_substitution = (MYDIR "/tmp/haupt")
                     (DATADIR $(MYDIR)/data)
                     (EXECDIR $(MYDIR)/bin))
 (executable = $(EXECDIR)/a.out)
 (arguments = $(DATADIR)/file1)
 (stdout = $(MYDIR)/result.dat)
 (count = 1)
[Diagram: three connected modules: Generate Data, Run Job, Analyze]
The Run Job module is a proxy module. It generates the RSL on the fly and submits the job for execution using the globusrun function.
The module has access to a job descriptor.
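Generating RSL on the fly amounts to string assembly from the fields of a job descriptor. The Python sketch below shows one way a proxy module might do that; `build_rsl` and its parameters are hypothetical, the output follows the style of the resource description above rather than a checked RSL grammar, and the sketch stops short of submitting anything.

```python
def build_rsl(base_dir, executable, arguments, stdout, count=1):
    """Return an RSL-style string; all paths are given relative to base_dir."""
    clauses = [
        f'(rsl_substitution = (MYDIR "{base_dir}"))',
        f"(executable = $(MYDIR)/{executable})",
    ]
    if arguments:
        args = " ".join(f"$(MYDIR)/{a}" for a in arguments)
        clauses.append(f"(arguments = {args})")
    clauses.append(f"(stdout = $(MYDIR)/{stdout})")
    clauses.append(f"(count = {count})")
    # A leading "&" marks a conjunction of the clauses that follow.
    return "&" + "".join(clauses)


rsl = build_rsl("/tmp/haupt", "bin/a.out", ["data/file1"], "result.dat")
print(rsl)
```

In a real proxy module the arguments would come from the job descriptor, and the resulting string would be handed to the submission call.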
Security: Issues
[Diagram labels: Front End (applet or application); connection through the open Internet; access control; Gatekeeper; HPCC resources; access control and delegation; policies defined by resource owners; the same or a different security domain]
Layer 1: Secure Web
Layer 2: Secure Middle Tier
Layer 3: Secure access to resources
Security (2)
Different model than most commercial solutions:
– charge for service and not CPU used to render the service
– identify by credit card number
Three-tier architecture:
– delegation of credentials
Security: CORBA security service
[Diagram 1: Kerberos/SecurID. Tiers: Front End, Middle Tier, Back End. Labels: download of frameset and applet from the Web Server; ORBs of the Master and Slave WebFlow Servers; SECIOP; krsh with a forwardable ticket (C:\>kinit, C:\>krsh)]
[Diagram 2: SSL and Globus. Tiers: Front End, Middle Tier, Back End. Labels: download of frameset and applet from the SSL Web Server over https; Servlets; ORBs of the Master and Slave WebFlow Servers; IIOP; Globus GSSAPI]
Proxy Objects
The master creates and maintains proxies for each component
– to forward requests from the Web client to remote objects
– to simplify the association of the distributed components
– to enable communication between the client and the slave servers running on different hosts
– to log, track and filter all messages between components in the system, in order to implement fault tolerance, security and transaction monitors
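A forwarding proxy that also logs and filters calls can be sketched in Python as follows. This is a local-object analogy of the idea, not the WebFlow proxies: `LoggingProxy`, `JobControl` and the allow-list are hypothetical, and the "remote" object lives in the same process.

```python
class LoggingProxy:
    """Hypothetical stand-in for a master-side proxy of a remote component."""

    def __init__(self, target, log, allowed):
        self._target = target
        self._log = log
        self._allowed = set(allowed)

    def __getattr__(self, name):
        # Python calls this only for attributes the proxy itself lacks,
        # which here means the methods to be forwarded.
        if name not in self._allowed:
            self._log.append(("rejected", name))
            raise PermissionError(f"{name} is not exposed through this proxy")
        method = getattr(self._target, name)

        def forward(*args, **kwargs):
            self._log.append(("call", name, args))
            return method(*args, **kwargs)

        return forward


class JobControl:
    """Pretend remote object living on a slave server."""

    def status(self, job_id):
        return f"job {job_id}: running"

    def kill(self, job_id):
        return f"job {job_id}: killed"


log = []
proxy = LoggingProxy(JobControl(), log, allowed=["status"])
print(proxy.status(2001))
```

Because every call passes through `forward`, the same place could also retry failed calls or record transactions.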
Distributed Middleware
[Diagram labels: Clients; Web Server (download applet); Master WebFlow Server with WebFlow context proxies; “slave” WebFlow Servers; interfaces to the Distributed Back-End Resources: Task Descriptor, Grid Interface, JDBC, Information Services. Tiers: Clients and their servers; Middle Tier Custom Servers; Back End Servers and their services]
Emerging Object Web Multi-Server Model
Summary
We build Web Portals using new, emerging, often ephemeral technologies and standards
What will survive?
– Multitier architectures
– Distributed Components
– XML to define interfaces
– Metadata (UML, XMI, …)
What next?
– Developer tools for enterprise servers
– ASP: Application Service Providers
Summary (2)
Academic example: WebFlow
– Gateway, LMS, GEM, NCSA
We extended the notion of a Web Portal
– support for HPCC
– Grid Interface (a new CORBA facility?)
– Abstract Task Descriptor in XML
Middle-Tier components as proxies
To be added: proxy communication channels