Accessing Grid Resources via Portals and Workflow Tools
Sriram Krishnan, [email protected]
NBCR Grid
[Figure: NBCR Grid architecture. Labels: Gemstone, PMV/Vision, Kepler; Application Services, Security Services (GAMA), State Mgmt; Globus in front of a Condor pool, an SGE cluster, and a PBS cluster.]
User Interfaces: Gemstone
User Interfaces: AutoDockTools (ADT), PMV
User Interfaces: What is a Portal?
• “A portal is a web based application that commonly provides personalization, single sign on, content aggregation from different sources and hosts the presentation layer of Information Systems” (JSR 168)
• Grid/Science Portals build upon the familiar Web portal model, such as Yahoo or Amazon, to deliver the benefits of Grid computing to virtual communities of users, providing a single access point to Grid services and resources.
User Interfaces: Portals
• Pros
  – Ubiquitous access to applications
  – No need to install complex software
• Cons
  – Limited interaction with local desktop tools
  – Interfaces may not be rich enough for complex tasks such as visualization
  – Not very easy to make highly interactive interfaces
User Interfaces: The CAMERA Labs Portal
CAMERA Labs Demo
Portal Technology
• Built on top of the GridSphere Portal Framework
  – http://www.gridsphere.org
• JSR 168 Portlet API compliant
  – Similar to Servlet API in providing reusable Web applications
  – Ratified in August 2003 by vendors including BEA, Sun, IBM, Oracle, Plumtree, etc.
What is a Portlet?
• Standardized packaging model to share portlet applications among portal vendors
• Builds on the Servlet API and spec, so no major surprises for existing Java portal developers
• Supports window states and mode settings, like a desktop environment
• API provides useful methods for storing per-user data and configuration settings
What makes GridSphere different?
• Already many other open-source portals out there:
  – Jetspeed2, uPortal, StringBeans, Exo, Liferay, JBoss
• A handy template build system using Apache Ant:
  – ant new-project
• Lightweight: no EJB, based on popular, robust libraries
  – e.g. Hibernate for persistence
• Visual UI tags and beans make presentation development much easier
• Support for the Grid!!
  – GridPortlets offered as add-on webapp
  – Provides a library and collection of portlets for:
    • Credential support, job launch (GRAM), data transfer (GridFTP)
• Used by several CyberInfrastructure projects like BIRN, NBCR, GEON, CAMERA
  – Lots of reusable software!
Advanced Usage: Workflows
• Need for automation of processes (scientific or otherwise)
  – An end-to-end application is typically more than a single application run
  – Must be reproducible and maintainable
  – Should be easy to compose from individual components
Workflow Scenario: Business
[Figure: ticket-purchase workflow. Participants: client, travel agent, airline A, airline B, bank/CC, delivery. Step labels: buy a ticket, confirm, tickets, arrive.]
Scientific Workflows: Phylogeny Analysis
[Figure: workflow with the components Local Disk, Multiple Sequence Alignment, Phylogeny Analysis, Tree Visualization.]
Scientific Workflow Systems
• Combination of
  – data integration, analysis, and visualization steps
  – larger, automated "scientific process"
• Mission of scientific workflow systems
  – Promote “scientific discovery” by providing tools and methods to generate scientific workflows
  – Create an extensible and customizable graphical user interface for scientists from different scientific domains
  – Support computational experiment creation, execution, sharing, reuse and provenance
  – Design frameworks which define efficient ways to connect to the existing data and integrate heterogeneous data from multiple resources
Why not just a Python script?
• End-users who define, reuse, modify, and specialize workflows would find visual interfaces much easier than scripts
  – Typically also possible to compile scripts from designed workflows
• Other advantages:
  – Modular reuse, application interoperability
  – Debugging and monitoring
  – Automated data management (e.g. provenance)
  – Validation (e.g. data, structural, semantic typing)
• From integrated modeling to execution, optimization, and archival
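For comparison, here is a minimal sketch of what the plain-script alternative could look like for a three-step pipeline shaped like the phylogeny example. The function names and the toy logic (padding instead of real alignment, Hamming distances instead of real tree building) are assumptions made for illustration, not code from Kepler or from the slides.

```python
from itertools import combinations


def align(sequences):
    """Stand-in for multiple sequence alignment: pad to a common length."""
    width = max(len(seq) for seq in sequences.values())
    return {name: seq.ljust(width, "-") for name, seq in sequences.items()}


def pairwise_distances(aligned):
    """Stand-in for phylogeny analysis: Hamming distance for each pair."""
    return {
        (a, b): sum(x != y for x, y in zip(aligned[a], aligned[b]))
        for a, b in combinations(sorted(aligned), 2)
    }


def render(distances):
    """Stand-in for tree visualization: one text line per pair, closest first."""
    ordered = sorted(distances.items(), key=lambda item: (item[1], item[0]))
    return [f"{a} <-> {b}: {d}" for (a, b), d in ordered]


def run_pipeline(sequences):
    # The step order is hard-wired here. Rewiring, monitoring, validation,
    # and provenance would all have to be coded by hand around this call.
    return render(pairwise_distances(align(sequences)))


if __name__ == "__main__":
    for line in run_pipeline({"s1": "ACGT", "s2": "ACGA", "s3": "AC"}):
        print(line)
```

The script is short, but everything listed under "Other advantages" would have to be added by hand around `run_pipeline`.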
Ptolemy II: A laboratory for investigating design
KEPLER: A problem-solving environment for Scientific Workflow
KEPLER = “Ptolemy II + X” for Scientific Workflows
Kepler: A Scientific Workflow System
• 1st Beta release (June 2, 2006)
• www.kepler-project.org
• Builds upon the open-source Ptolemy II framework
Actor-Oriented Design
• Actor
  – Encapsulation of parameterized actions
  – Interface defined by ports and parameters
• Port
  – Communication between input and output data
  – Without call-return semantics
• Model of computation
  – Communication semantics among ports
  – Flow of control
  – Implementation is a framework
Actors: Processing Components
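The actor, port, and parameter vocabulary above can be sketched in a few lines. This is an illustrative toy, not the Ptolemy II or Kepler API: the `Port` and `Scale` classes and their methods are hypothetical names chosen for the example.

```python
from collections import deque


class Port:
    """A named token queue. Sending never waits for a return value."""

    def __init__(self, name):
        self.name = name
        self.tokens = deque()
        self.targets = []  # input ports that this output port is linked to

    def connect(self, other):
        self.targets.append(other)

    def send(self, token):
        for target in self.targets:
            target.tokens.append(token)


class Scale:
    """An actor: a parameterized action hidden behind ports."""

    def __init__(self, factor):
        self.factor = factor  # parameter
        self.input = Port("input")
        self.output = Port("output")

    def fire(self):
        # Consume one token and emit one token; no call-return between actors.
        self.output.send(self.input.tokens.popleft() * self.factor)


if __name__ == "__main__":
    double, triple, result = Scale(2), Scale(3), Port("result")
    double.output.connect(triple.input)
    triple.output.connect(result)
    double.input.tokens.append(7)
    double.fire()  # in a real system the model of computation decides this
    triple.fire()
    print(list(result.tokens))
```

The actors never call each other; they only see their own ports. Deciding when `fire` runs is left to whatever plays the role of the model of computation.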
Available Actors
• Generic Web Service Client and Web Service Harvester
• Customizable RDBMS query and update
• Command-line wrapper tools (local, ssh, scp, ftp, etc.)
• Some Grid actors
  – Globus Job runner, GridFTP-based file access, Proxy Certificate Generator
• SRB support
• Imaging, Visualization Support
• Textual and Graphical Output
• Some domain-specific actors for Geosciences and Bioinformatics
Directors: Definition of Workflow Semantics
• Implement different computational models
• Define the semantics of
  – execution of actors and workflows
  – interactions between actors
• Kepler is extending Ptolemy directors with specialized ones for Web service based workflows, and distributed workflows
• Process Networks
• Rendezvous
• Publish and Subscribe
• Continuous Time
• Finite State Machines
• Dataflow
• Time Triggered
• Synchronous/reactive model
• Discrete Event
• Wireless
Dataflow as a Computation Model
• Dataflow: Abstract representation of how data flows in the system
• A dataflow program: a graph
  – Nodes represent operations, edges represent data paths
• Sound, simple, powerful model of parallel computation
  – NOT having a locus of control makes it simple!
  – Naturally distributed model of computation:
    – Asynchronous: Many actors can be ready to fire simultaneously
    – Execution ("firing") of a node starts when (matching) data is available at a node's input ports
    – Locally controlled events
    – Events correspond to the “firing” of an actor
  – Actor:
    – A single instruction
    – A sequence of instructions
  – Actors fire when all the inputs are available
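The firing rule can be illustrated with a small scheduler that has no central program counter: on each pass it fires every node whose input queues all hold a token. This is a simplified sketch under assumed names (`Node`, `run`), not a Kepler or Ptolemy II director, and it runs the ready nodes one after another rather than in parallel.

```python
from collections import deque


class Node:
    """A graph node: an operation plus one token queue per input port."""

    def __init__(self, name, func, n_inputs):
        if n_inputs < 1:
            raise ValueError("this sketch seeds data from outside, so inputs >= 1")
        self.name = name
        self.func = func
        self.inputs = [deque() for _ in range(n_inputs)]
        self.edges = []  # (target node, target input index)

    def ready(self):
        # Firing rule: a token must be waiting on every input port.
        return all(self.inputs)


def run(nodes):
    """Fire ready nodes until none is left; return sink outputs and a trace."""
    outputs, trace = [], []
    while True:
        ready = [node for node in nodes if node.ready()]
        if not ready:
            return outputs, trace
        for node in ready:
            args = [queue.popleft() for queue in node.inputs]
            value = node.func(*args)
            trace.append(node.name)
            if node.edges:
                for target, index in node.edges:
                    target.inputs[index].append(value)
            else:
                outputs.append((node.name, value))


if __name__ == "__main__":
    add = Node("add", lambda a, b: a + b, 2)
    square = Node("square", lambda x: x * x, 1)
    add.edges.append((square, 0))
    add.inputs[0].extend([1, 2])
    add.inputs[1].extend([10, 20])
    print(run([add, square]))
```

In the second pass both `add` and `square` are ready at the same time, which is the point where a parallel or distributed runtime could fire them simultaneously.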
Vergil is the GUI for Kepler
• Actor ontology and semantic search for actors
• Search -> Drag and drop -> Link via ports
• Metadata-based search for datasets
[Figure: screenshots of Actor Search and Data Search.]
Actor Search
• Kepler Actor Ontology
• Used in searching actors and creating conceptual views (= folders)
Currently more than 200 Kepler actors added!
Kepler Provenance Framework
• OPTIONAL!
  – Modeled as a separate concern in the system
  – Listens to the execution and saves information customized by a set of parameters
    • Context: who, what, where, when, and why that is associated with the run
    • Input data and its associated metadata
    • Workflow outputs and intermediate data products
    • Workflow definition (entities, parameters, connections): a specification of what exists in the workflow and can have a context of its own
    • Information about the workflow evolution (the workflow trail)
• Types of Provenance Information:
  – Data provenance
    • Intermediate and end results including files and db references
  – Process provenance
    • Keep the workflow definition with data and parameters used in the run
  – Error and execution logs
  – Workflow design provenance
Kepler Provenance Recording Utility
• Parametric and customizable
  – Different report formats
  – Variable levels of detail
    • Verbose-all, verbose-some, medium, on error
  – Multiple cache destinations
• Saves information on
  – User name, Date, Run, etc…
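The idea of provenance as an optional listener with a configurable level of detail can be sketched as follows. This is not the Kepler provenance API: `ProvenanceRecorder`, `run_workflow`, the event names, and the decision about what each level keeps are all assumptions made for the example, and only three of the four levels named above are modeled.

```python
class ProvenanceRecorder:
    """Listener that stores run events, filtered by a detail level."""

    LEVELS = ("on-error", "medium", "verbose-all")

    def __init__(self, user, level="medium"):
        if level not in self.LEVELS:
            raise ValueError(f"unknown level: {level}")
        self.user = user
        self.level = level
        self.records = []

    def on_event(self, kind, **details):
        if self.level == "on-error" and kind != "error":
            return
        if self.level == "medium" and kind == "step":
            return  # assumption: "medium" skips intermediate data products
        self.records.append({"user": self.user, "event": kind, **details})


def run_workflow(steps, data, listeners=()):
    """Run (name, function) steps in order; listeners are optional."""

    def notify(kind, **details):
        for listener in listeners:
            listener.on_event(kind, **details)

    notify("start", workflow=[name for name, _ in steps], input=data)
    for name, func in steps:
        try:
            result = func(data)
        except Exception as exc:
            notify("error", step=name, input=data, error=repr(exc))
            raise
        notify("step", step=name, input=data, output=result)
        data = result
    notify("end", output=data)
    return data


if __name__ == "__main__":
    recorder = ProvenanceRecorder(user="demo", level="verbose-all")
    steps = [("strip", str.strip), ("upper", str.upper)]
    print(run_workflow(steps, "  acgt ", listeners=[recorder]))
    for record in recorder.records:
        print(record)
```

Because the recorder only listens, the same workflow runs unchanged with no listener at all, which mirrors the "separate concern" point on the previous slide.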
Kepler Basics: Hello World Demo
Advanced Kepler: MEME-MAST Workflow
Advantages of Scientific Workflow Systems
• Formalization of the scientific process
• Easy to share, adapt and reuse
  – Deployable, customizable, extensible
• Management of complexity and usability
  – Support for hierarchical composition
  – Interfaces to different technologies from a unified interface
  – Can be annotated with domain-knowledge
• Tracking provenance of the data and processes
  – Keep the association of results to processes
  – Make it easier to validate/regenerate results and processes
  – Enable comparison between different workflow versions
• Execution monitoring and fault tolerance
• Interaction with multiple tools and resources at once
Summary
• Presented access to Grid applications via Portals and Workflow tools
• References
  – PMV, ADT: http://mgltools.scripps.edu/
  – CAMERA: http://camera.calit2.net
  – GridSphere: http://www.gridsphere.org
  – Kepler: http://www.kepler-project.org
Acknowledgements
• CAMERA labs portal built in conjunction with the rest of the CAMERA team
• Several slides borrowed from Kepler tutorials presented by Ilkay Altintas [[email protected]]