Progress on Release, API Discussions, Vote on APIs, and PI mtg Al Geist January 14-15, 2004 Chicago, ILL
Page 1

Progress on Release, API Discussions, Vote on APIs, and PI mtg

Al Geist
January 14-15, 2004
Chicago, ILL

Page 2

Participating Organizations

Coordinator: Al Geist

Participating Organizations

ORNL, ANL, LBNL, PNNL
PSC, SDSC, IBM, SGI
SNL, LANL, Ames, NCSA
Cray, Intel, Unlimited Scale

Changes

Page 3

Scalable Systems Software

Participating Organizations

ORNL, ANL, LBNL, PNNL
NCSA, PSC, SDSC
SNL, LANL, Ames
IBM, Cray, Intel, SGI

Problem

• Computer centers use incompatible, ad hoc set of systems tools

• Present tools are not designed to scale to multi-Teraflop systems

Goals

• Collectively (with industry) define standard interfaces between systems components for interoperability

• Create scalable, standardized management tools for efficiently running our large computing centers

Resource Management
Accounting & user mgmt
System Build & Configure
Job management
System Monitoring

To learn more visit www.scidac.org/ScalableSystems

Page 4

Potential Impact of Project

Fundamentally change the way future high-end systems software is developed and distributed

Reduced facility management costs

• reduce need to support ad hoc software

• better systems tools available

• able to get machines up and running faster and keep running

More effective use of machines by scientific applications

• scalable launch of jobs and checkpoint/restart

• job monitoring and management tools

• allocation management interface

Page 5

Scalable Systems Software Suite

First Release at SC2003

Working Components and Interfaces (bold)

[Architecture diagram. Components shown:]

• Grid Interfaces
• Meta Services: Meta Scheduler, Meta Monitor, Meta Manager
• Accounting
• Event Manager
• Service Directory
• Scheduler
• Node State Manager
• Allocation Management
• Process Manager
• Usage Reports
• System & Job Monitor
• Job Queue Manager
• Node Configuration & Build Manager
• Checkpoint / Restart
• Hardware Infrastructure Manager
• Validation & Testing
• Packaging & Install

Standard XML interfaces, authentication, communication

Components written in any mixture of C, C++, Java, Perl, and Python can be integrated into the Scalable Systems Software Suite

Page 6

Review of Last Meeting

Scalable Systems Software Center
September 11-12
Washington DC

Details in Main project notebook

Page 7

Highlights from Sept. mtg

Rusty Lusk – Using SSS as the production systems software on Chiba City for a number of months now. Use restriction syntax for everything. Got blessing of ANL sysadmin group.

Scott Jackson – Standard Error reporting and codes across components. Discuss dividing up code space in consistent way.

Eric Debenedictus – Issues for peta-scale systems
Redstorm and Bluelight use a mesh rather than a switch, which means that topology is an important consideration for SSS:
• XML attribute to specify topology and I/O resources
• XML attribute to specify data arrangement on disk
• OS functionality hints to help auto placement

Thomas Naughton – SSS deployment using OSCAR
• A release of OSCAR that contains all SSS software
• Roll SSS components into OSCAR packages – RPM format
• Create repository for OSCAR package uploads

Page 8

Highlights from Sept. mtg (cont.)

Al Geist – Plans for SC2003

Working Group Leaders –
• What areas their working group is addressing
• Progress report on what their group has done
• Present problems being addressed
• Next steps for the group
• Discussion items for the larger group to consider

Long Term Strategy –
• Get Computer Centers involved and using suite
• Get vendors to be compliant with APIs

Slides can be found in Main Notebook

Page 9

Consensus and Voting:

Communication Infrastructure Spec
• Wire protocols – need to add security envelope protocol
• Added service location. Bootstrapped using /etc/sss/
Vote to Accept as spec for
• Wire Protocol definition to get new ones accepted
• Service Directory interface
• Event Manager interface
Second vote: 16 yes 2 abstaining 0 no

Agreement for having common error objects with 3 digit codes and messages. Message is human readable string. Two special ones: 000 success, 999 unknown
Straw vote: 15 no 1 Abs 0

Al suggests these general error classes – success, warning, temp failure, partial failure, failure

People need to come up with counter proposal if they care
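
A minimal Python sketch of what such a common error object might look like. Only the 3 digit code, the human readable message, the two reserved codes (000 and 999), and the five class names come from the notes above. The class name ErrorObject and the mapping from leading digit to error class are assumptions made up for illustration; the group had not agreed on how to divide the code space.

```python
import re
from dataclasses import dataclass

SUCCESS = "000"  # from the notes: reserved for success
UNKNOWN = "999"  # from the notes: reserved for unknown

# ASSUMPTION: this leading-digit split is invented for illustration only.
# The notes list the classes but do not say how the code space is divided.
CLASS_BY_LEADING_DIGIT = {
    "0": "success",
    "1": "warning",
    "2": "temp failure",
    "3": "partial failure",
}


@dataclass(frozen=True)
class ErrorObject:
    code: str     # exactly three decimal digits, kept as text so "000" survives
    message: str  # human readable string

    def __post_init__(self):
        # Reject anything that is not exactly three ASCII digits.
        if not re.fullmatch(r"[0-9]{3}", self.code):
            raise ValueError(f"code must be three digits, got {self.code!r}")

    @property
    def error_class(self):
        if self.code == UNKNOWN:
            return "unknown"
        # Any leading digit not listed above falls back to plain failure.
        return CLASS_BY_LEADING_DIGIT.get(self.code[0], "failure")


reply = ErrorObject("210", "node temporarily unreachable")
print(reply.code, reply.error_class, reply.message)
```

Keeping the code as a string rather than an integer is a design choice of this sketch: it preserves leading zeros, so 000 stays distinguishable on the wire.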

Page 10

Progress Since Last Meeting

Scalable Systems Software Center
September-January

Page 11

Systems Software Suite Release

Open Source License – Fred asks that we come up with one general text that all organizations can agree on and then he will bless it. DONE

SSS-OSCAR – Packaging done of all components (working around those components with license issues)

First Release – Announced at SC2003. Available from project web site www.scidac.org/ScalableSystems

Page 12

SC2003 Scalable System Demos and Talks

• Rusty – fancy dancing meatball in wxPython
• Thomas – SSS-OSCAR working
• Will – fancy graphic demonstration of APITest ????
• Brett – demonstrate swapping components in SSS architecture
• Paul – chkpoint interacting with PM on Chiba

Locations: All Across the show floor

SciDAC booth – Talks by Rusty, Craig

OSCAR BOF on Tuesday 5:00-6:00 mentions SSS-OSCAR

Page 13

Five Project Notebooks

A main notebook for general information

And individual notebooks for each working group

• Over 297 total pages – 16 added since last meeting

• BC and PM groups need to get specs into their notebooks

• Add Telecom meeting notes even if short

Get to all notebooks through main web site www.scidac.org/ScalableSystems

Click on side bar or at “project notebooks” at bottom of page

Page 14

Bi-Weekly Working Group Telecoms
Starting back up after Holidays

Resource management, scheduling, and accounting

Tuesday 3:00 pm (Eastern) 1-800-664-0771 keyword “SSS mtg”

Validation and Testing Group

No need for telecoms recently

Process management, system monitoring, and checkpointing

Thursday 1:00 pm (Eastern) 1-877-252-5250 mtg code 160910

Node build, configuration, and information service

Thursday 3:00 pm (Eastern) 1-888-469-1934 mtg code (changes)

Page 15

This Meeting

Scalable Systems Software Center
January 14-15, 2004

Page 16

Major Topics this Meeting

Stability of Systems Software Suite – first release is out. Are we ready for a more robust second release?

Large Scale test run – NCSA has dedicated some time tonight to run our suite on their 1250 dual node cluster

Quarterly Report Due – would like to get one to Fred by end of January. Will need text from WG leaders.

Formal API presentations and voting - it is that time in the project when we are finalizing on some APIs.

SciDAC PI Mtg - March 22-24 in Charleston SC. We will need poster(s), talk, and 2 page summary document

Page 17

Agenda - January 15

8:30 Al Geist – Project Status
9:15 Thomas Naughton – SSS OSCAR software suite release

Working Group Reports
• Progress report on what their group has done
• API Proposals for adoption by the group
• Progress on software suite improvements

9:30 Narayan Desai – Node Build, Configure
10:30 Break
11:00 Will McClendon – Validation and Testing
12:00 Lunch (on own – cafeteria room B)
1:00 Paul Hargrove – Process Management
2:00 Scott Jackson – Resource Management
3:00 Break
3:30 Narayan – Review of "restriction syntax" style of XML
4:00 Rusty – Discussion of restriction syntax for scheduler and queue mgr
4:30 Craig – Brief on big testbed run
5:00 Eric – competitive system to SSS
5:30 Adjourn

Evening: Working groups may want to help with large NCSA test run

Page 18

Agenda – January 16

8:30 Discussion, proposals, votes
• Rusty – Process Manager API (discussion/vote)
• Narayan – Node state API (discussion/vote)
• Scott – Allocation Manager API (discussion/vote)
• Brett – Queue manager API (discussion/vote)
• Scott – SSSRMAP interface
• Al – Progress report
• Al – SciDAC mtg 2 pager, posters, talks

10:30 Break

11:00 Al Geist – Summary
• SciDAC PI Mtg March 22-24, Charleston SC
• next meeting date: May 13-14
• location: Argonne

12:00 meeting ends

Page 19

Meeting notes

Al presents his slides

Thomas Naughton – SSS deployment using OSCAR

Good –
• RPMs created for all SSS components!
• OSCAR packaging (varying levels)
• SourceForge project supplied central CVS location

Bad –
• not all scripts are created equal – new untested submissions
• Some pain getting SF accounts.
• Time constraints forced script hacks
• OSCAR testing framework

Status –
• Tarball available, fairly toxic but builds full working cluster w/ SSS
• Updated OSCAR pkg HowTo

ToDo –
• clean up hacks, integrate remaining SSS components (qbank)
• Add SSS interface to OSCAR itself

Would like to establish release schedule – March 1
Not clear that anyone has downloaded yet
Discussion of how many orgs in our group could shakedown the tarball
Group feels better to have few very reliable components than all components

Page 20

Meeting notes

Narayan – node build progress report

Only had a few minor bug fixes
Infrastructure has been reliable for 6 months

Library updates:
• Portability - OSX support, 64-bit tested, Tru64 support
• Thread-safety
• SSL wire-protocol module – soon to be the default protocol in ssslib

Node state manager – reliable
Build System – building vs configuration interface/conflict issues
Hardware infrastructure – model needs refinement WRT topology info

Restriction Syntax augmentations
• New operators added – negations, numeric, regular expression
• Integrated into all Python components

Next steps –
• work on new models for hardware infrastructure
• Work on multiple implementations of BCM components
• Performance tuning – for ssslib, event manager, service directory

Page 21

Meeting notes

Will McClendon – Component Interface testing report

Description of his work for the new folks

SC2003 demo of APItest v.1 in ASCI booth (GUI HTTP interface)
• built on Twisted Framework www.twistedmatrix.com
• Db interfacing, distributed component testing, HTTPD mode

APItest development. Lessons learned.

V.2 new test file formats – collab with Jackson
• separate individual tests from batch grouping

Runs through some examples.
Feedback is encouraged
Hope to get some real test suites going this quarter

Ron Oldfield – introduced

Shows graphical APItest demo that was given at SC2003

Page 22

Meeting notes

Paul Hargrove – Process management report

SSS-OSCAR release
Coming to a point where components have to interact more, e.g. Chkpt
Real deployment/testing on Chiba (ANL), XTORC (ORNL)

Checkpoint manager – progress
• ported to RH9 (hard – Red Hat kernel’s…)
• checkpoint using LAM/MPI
• stand-alone package w/ LAM/MPI for chkpt
• suspend/resume interface working with queue manager

Outstanding issues –
• need to design restart-time interactions
• need to implement a full interface - restriction syntax, event generation, error reporting
• basic ideas on file management

Monitoring progress
• in SSS-OSCAR
• Scalability work – thread pool, internal protocol changes
• fix service directory connections
• write documentation

Page 23

Meeting notes

Process manager (cont)

Rusty Lusk – Process Manager functionality overview

Show Schematic of process management components
Various commands that are in the syntax
Progress – already a stable component, fixed several bugs at SC03
Improved queries and error codes

Future
• INTEGRATION! Stable software makes this possible
• Chiba production use has forced the issue
• Continued development

Page 24

Meeting notes

Scott Jackson – Resource Manager report

Short overview for new attendees

Progress –
• released in SSS-OSCAR: Bamboo, Maui, Gold, Warehouse
• Updated RM web page for new components being available
• Deployed user oriented problem response system
• Created SSSRMAP C-implementation module
• Completed per-component interface documents

Schedule Progress
- Completed chkpt/restart based SSS calls. Blocked until can test with checkpoint guys
- support for dynamic jobs: blocked until support provided in PM and QM. Discussion of feature of dynamic jobs, how/if we should work on it
- resource limit enforcement and tracking: need rusage on process exit. Blocked until support from PM and QM progress

Too much blocking. Seems RM group lacks coordination with other groups.

Page 25

Meeting notes

Scott Jackson – Resource Manager report (cont)

Initial release of Bamboo and wrote API document

Accounting and allocation
Qbank was an initial solution replaced by Gold

Gold –
• released under BSD open source licence
• packaged as tarball. And initial OSCAR rpm created
• added support for Service Directory registration
• implemented status codes
• implemented instance-level role-based authorization

Gold running on 11 TF cluster at PNNL
GUI improved to include user, project, machine management views

Meta-scheduler –
• added thread support
• improved Silver installation procedure
• testing of (grid level) data staging

Future -
• draft of SSSRMAP v3 protocol spec (chunking)
• release alpha versions of Bamboo, Maui, Gold, Warehouse
• complete design spec documents for above components.

Page 26

Meeting notes

Discussion of having two XML syntax styles (functional, object)
Al says he would like to see one common one across the suite, and that he didn’t care which one as long as the whole group could agree.
Rusty brought up a second issue, wire protocol, and having a single library that has all the protocols used by the components in the SSS suite.

Narayan – Restriction Syntax Overview
Command syntax –
• incorporates imperative and database operations
• allows uniform data queries across components
• easy to process
• improves atomicity of operations
Semantics – Examples given. Going across, attributes are ANDed, and multiple lines are ORed.
An issue of uniqueness was brought up and will be taken into consideration by Narayan.
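
The AND/OR semantics described above can be illustrated with a short Python sketch. This is not the actual SSS XML syntax, which is not reproduced in these notes. As an assumption for illustration, each restriction line is modeled as a plain dict of attribute to required value, and the names matches, select, nodes, and query are hypothetical.

```python
def matches(record, restriction_lines):
    """Return True if the record satisfies at least one restriction line.

    Within one line every attribute must match (AND);
    across lines a single matching line is enough (OR).
    """
    return any(
        all(record.get(attr) == wanted for attr, wanted in line.items())
        for line in restriction_lines
    )


def select(records, restriction_lines):
    """Return the records that match the restriction, in original order."""
    return [r for r in records if matches(r, restriction_lines)]


# Hypothetical node records, invented for this example.
nodes = [
    {"name": "n1", "state": "up", "arch": "x86"},
    {"name": "n2", "state": "down", "arch": "x86"},
    {"name": "n3", "state": "up", "arch": "ia64"},
]

# Two lines: (state is up AND arch is x86) OR (name is n2).
query = [
    {"state": "up", "arch": "x86"},
    {"name": "n2"},
]

print([n["name"] for n in select(nodes, query)])
```

In this sketch a line with no attributes matches every record, and a restriction with no lines matches nothing. Whether the real syntax treats those edge cases the same way is not stated in the notes.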

Page 27

Meeting notes

Rusty – Restriction Syntax on Chiba City
David would like to see a paper of the requirements that the Chiba effort required.

Narayan – Hack of quick interfaces for Queue Manager
Restriction Interface has 4 commands (add, del, run, get)
Doesn’t show Scheduler Interface

Craig – 1280 dual Xeon cluster “Titanium” is available this evening to test the scalability of SSS suite. One node will be used as head node to install our suite and run on entire cluster.
Could build everything but Bamboo and ssslib due to Xerces
Will begin to be available at 6pm

Eric – A competing package. From his Russian “secret city” trip Oct. 03
Package for distributed calculations, metacomputing, Grid.
System is based on XML, web-based user interface.
Configure, manage, and submit jobs. Challenges: auto load balance.

Page 28

Meeting notes

Late night session on 1280 node testbed
• PM ran at 1280 worked at 4000, hung at 6000
• Warehouse had a problem at 1280 and took out head node
• RM components ran on head node OK until Warehouse crashed it

Rusty – Process Manager Spec for first vote
Presentation and discussion…
Who is responsible for limit enforcement, PM or QM? I.e. must use certain amount of memory, must not execute OS command (in general - things that happen after fork)
Rusty says the question is good and he needs to think about how this may affect the interface.
Other items to think about
- use of wildcard as “to be returned” operator – OK
- Inclusion but don’t show me.
- Dynamic jobs and PM.
- improve readability
Delay vote until we have a written proposal.

Page 29

Meeting notes

How to write spec to describe how XML should be extended to future needs.

Narayan – Node State Manager spec (no written doc so no vote)
Presentation and lots of discussion…

Scott – Allocation Manager spec (has written doc in notebook)
Goes through examples in the document. Discussion.
Switches to discussion of comparison between both XML syntaxes.
Andrew Lusk thinks that a translator could be created for queries (but not for output). Rusty thinks it is a bad idea and feels it is not a problem to have two syntaxes.
David says the translation is good because it could buy time to switch syntax

Andrew and Paul and Craig offer to help build a prototype translator to see how / if it is possible.

Investigate standardization of tokens across the two syntaxes

Page 30

Meeting notes

How

