

Introduction to HammerCloud for The LHCb Experiment

Dan van der Ster

CERN IT Experiment Support

3 June 2010

Outline

• Introduction to HammerCloud
  – Motivation, History, Use-Cases

• How HammerCloud works
  – Design and Implementation Details

• Interface Tour for Users and Admins

• Possibilities for an LHCb Plugin

HammerCloud Introduction for LHCb – 2

Introduction to HammerCloud

• HammerCloud (HC) is a Distributed Analysis (DA) testing system serving two use-cases:
  – Robot-like Functional Testing: frequent “ping” jobs to all sites to perform basic site validation
  – DA Stress Testing: on-demand large-scale stress tests using real analysis jobs to test one or many sites simultaneously, in order to:
    • Help commission new sites
    • Evaluate changes to site infrastructure
    • Evaluate software changes
    • Compare site performances…

HammerCloud and Job Robots

• HammerCloud is part of an evolution of job robots:
  – The CMS Job Robot inspired the ATLAS GangaRobot (functional testing)
  – In ~Sept 2008, a form of the ATLAS GangaRobot was used to manually stress test the Italian ATLAS Tier-2s:
    • 5 users manually submitted hundreds of instrumented jobs simultaneously (SIMD)
    • Results were collected and summarized manually
    • Early results proved very useful:
      – One early test showed a bimodal performance plot that was later traced to a faulty network switch, which degraded the performance of some worker nodes (WNs). The need for an automated DA stress testing system was clear.
  – HammerCloud was born in November 2008 to deliver on-demand stress tests to ATLAS sites:
    • Since then HC has run >1300 “Tests” using more than 4 million jobs
    • ATLAS has invested >200k CPU-days in HC tests
  – CMS has also agreed to use HC: a prototype was delivered in April, and scale tests are now about to begin.

HC and ATLAS during STEP’09

[Figure: STEP’09]

HammerCloud Use-Cases

• Provides On-Demand and Automated Testing

• HC Operators define test templates: FUNCTIONAL and STRESS

• Functional Tests are automatically scheduled

– Results are published on the HC website and can be pushed to other systems (e.g. SAM)

• Stress tests are generally scheduled on demand, as needed by:
  – Central VO managers
  – Cloud/regional managers
  – Site managers

• For all tests, a detailed report summarizing the job success rates and performances is produced.

HammerCloud Components

• The HC UI is implemented as a Django web app:
  – View test results
  – View cloud/site evolution
  – DB admin

• State is maintained in a MySQL DB

• HC Logic (job submission, monitoring, resubmission) implemented on top of the Ganga Grid Programming Interface (GPI)
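To make "state is maintained in a DB" concrete, here is a minimal sketch of the kind of tables such a system might keep. The slides do not describe the real HC schema, so the `test` and `job` tables, their columns, and the helper names are hypothetical, and Python's built-in SQLite stands in for MySQL only so the example runs anywhere.

```python
import sqlite3

# Hypothetical schema for illustration only; not the real HC tables.
# SQLite stands in for MySQL so the sketch is self-contained.
SCHEMA = """
CREATE TABLE test (
    id INTEGER PRIMARY KEY,
    template TEXT NOT NULL,
    start_time TEXT NOT NULL,
    end_time TEXT NOT NULL,
    state TEXT NOT NULL DEFAULT 'scheduled'
);
CREATE TABLE job (
    id INTEGER PRIMARY KEY,
    test_id INTEGER NOT NULL REFERENCES test(id),
    site TEXT NOT NULL,
    status TEXT NOT NULL
);
"""


def open_db(path=":memory:"):
    """Open a database and create the illustrative tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def running_jobs_per_site(conn, test_id):
    """Count jobs currently in the 'running' state, per site, for one test."""
    rows = conn.execute(
        "SELECT site, COUNT(*) FROM job "
        "WHERE test_id = ? AND status = 'running' "
        "GROUP BY site ORDER BY site",
        (test_id,),
    )
    return dict(rows.fetchall())
```

A per-site running-job count like this is the sort of query both the web UI and the resubmission logic would need; in the real system the Django app and the Ganga-based logic would share the same database.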

HammerCloud Logic

• An HC Test is described by:
  – The analysis code to run (typically a real analysis from the user community)
  – The dataset pattern (which can be resolved to a set of datasets appropriate for the analysis code)
  – The list of sites to be tested, and the target number of jobs to run concurrently per site
  – A start time and an end time

• Test execution proceeds in 4 steps:
  – Generate: the test description is converted to a set of submittable jobs (e.g. Ganga job objects, one for each site under test)
  – Submit: the job objects are submitted
  – Run: jobs are monitored, their outputs are recorded to the HC DB, and jobs are resubmitted to maintain the target number of running jobs per site
  – Exit: at the test end time, leftover jobs are killed

• Concurrently, the HC Web shows real time test results
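The Generate/Submit/Run/Exit cycle above can be sketched as a small simulation. This is not HC code and does not call Ganga: `FakeJob` is a hypothetical stand-in for a Ganga job object, a fixed number of monitoring cycles stands in for the start and end times, and job completion is randomized. What it shows is the control loop: fill each site to its target, record finished jobs, top the site back up, and kill leftovers at the end.

```python
import random
from dataclasses import dataclass


@dataclass
class FakeJob:
    """Hypothetical stand-in for a Ganga job object."""
    site: str
    status: str = "new"


@dataclass
class HCTestSpec:
    sites: list[str]
    target_per_site: int
    n_cycles: int  # monitoring cycles; stands in for the start/end time window


def generate(spec):
    """Generate: one job template per site under test."""
    return {site: FakeJob(site) for site in spec.sites}


def submit(template, jobs):
    """Submit: create a new running job from the site's template."""
    job = FakeJob(template.site, status="running")
    jobs.append(job)
    return job


def run_test(spec, finish_prob=0.3, seed=0):
    rng = random.Random(seed)
    templates = generate(spec)
    jobs = []
    records = []  # stands in for rows written to the HC DB

    # Initial submission up to the target number of running jobs per site.
    for template in templates.values():
        for _ in range(spec.target_per_site):
            submit(template, jobs)

    # Run: monitor jobs, record outputs, resubmit to hold the target per site.
    for _ in range(spec.n_cycles):
        for job in jobs:
            if job.status == "running" and rng.random() < finish_prob:
                job.status = rng.choice(["completed", "failed"])
                records.append((job.site, job.status))
        for site, template in templates.items():
            running = sum(1 for j in jobs if j.site == site and j.status == "running")
            for _ in range(spec.target_per_site - running):
                submit(template, jobs)

    # Exit: at the end time, leftover jobs are killed.
    for job in jobs:
        if job.status == "running":
            job.status = "killed"
    return jobs, records
```

For example, `run_test(HCTestSpec(["SITE_A", "SITE_B"], target_per_site=3, n_cycles=5))` keeps three jobs running at each site throughout, so exactly three jobs per site are killed at exit.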

An HC-LHCb Plugin

• What customizations would be needed for an HC-LHCb plugin?

• HC is built upon Ganga and exploits its job management features:
  – job repository, job configuration via Python, job submission, and job monitoring in background thread(s)

• Given the existing GangaLHCb plugins, modifications to HC itself would be relatively minor, e.g.:
  – HC Test Generation:
    • Query a data discovery service to build jobs that process randomly chosen input data
  – HC Test Running:
    • Changes to extract LHCb-specific job metrics from Ganga

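The test-generation change could look roughly like the sketch below: resolve the test's dataset pattern, keep only datasets with a replica at the site under test, and pick one at random. The `CATALOG` dictionary is a hypothetical stand-in for an LHCb data discovery service, and the dataset names are made up; a real plugin would query the actual service through GangaLHCb.

```python
import fnmatch
import random

# Hypothetical catalogue standing in for an LHCb data discovery service:
# dataset name -> set of sites holding a replica. Names are made up.
CATALOG = {
    "MC10.DST.0001": {"SITE_A", "SITE_B"},
    "MC10.DST.0002": {"SITE_A"},
    "DATA10.DST.0003": {"SITE_B"},
}


def resolve_pattern(pattern, catalog=CATALOG):
    """Resolve a dataset pattern to the matching dataset names."""
    return sorted(name for name in catalog if fnmatch.fnmatchcase(name, pattern))


def pick_input(pattern, site, rng=random, catalog=CATALOG):
    """Pick a random dataset that matches the pattern and is held at the site."""
    candidates = [d for d in resolve_pattern(pattern, catalog) if site in catalog[d]]
    if not candidates:
        raise LookupError(f"no dataset matching {pattern!r} at {site}")
    return rng.choice(candidates)
```

Randomizing the input per job spreads the load over many files rather than repeatedly reading one cached dataset, which is closer to what real analysis traffic does to a site.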

Interface Tour

1. The Public User Interface

HC Home

• The HC Homepage lists the running and scheduled tests.

Viewing a Test

• The test overview gives a quick summary of overall job efficiency, CPU/walltime, and events/wrapper time.

• It also shows a summary of the jobs running at each site involved in the test.

Viewing a Test: Summary Stats

• The Test Overview page also gives summary statistics by site
• Here you can see some example metrics (for CMS)

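Per-site summary statistics of the kind listed above (job efficiency, CPU/walltime, events per wrapper time) can be computed from recorded job metrics as in this sketch. The `JobRecord` fields and the exact metric definitions are assumptions: here efficiency is completed over finished jobs, and the time ratios are ratios of sums over completed jobs, which HC may define differently.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class JobRecord:
    """Hypothetical per-job metrics as they might be stored in the HC DB."""
    site: str
    status: str          # "completed" or "failed"
    cpu_s: float = 0.0
    wall_s: float = 0.0
    events: int = 0
    wrapper_s: float = 0.0


def summarize(records):
    """Return {site: {metric: value}}; None where a metric is undefined."""
    by_site = defaultdict(list)
    for r in records:
        by_site[r.site].append(r)
    summary = {}
    for site, recs in sorted(by_site.items()):
        done = [r for r in recs if r.status == "completed"]
        finished = [r for r in recs if r.status in ("completed", "failed")]
        wall = sum(r.wall_s for r in done)
        wrapper = sum(r.wrapper_s for r in done)
        summary[site] = {
            "efficiency": len(done) / len(finished) if finished else None,
            "cpu_over_wall": sum(r.cpu_s for r in done) / wall if wall else None,
            "events_per_s": sum(r.events for r in done) / wrapper if wrapper else None,
        }
    return summary
```

Sorting the resulting dictionary by one metric gives the site-by-site comparison that the metric comparison plots show graphically.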
Viewing a Test: Per-Site Plots

• View plots of the recorded metrics for each site

Viewing a Test: Metric Comparisons

• View the plots for all sites for a specific metric

• Useful for comparing sites against each other

Modify a Running Test

• Authorized users can modify the parameters of a test at run time
  – E.g. change the end time, or the number of running jobs per site

Clone a Previous Test

• Cloning a previous test is simple
  – Useful to repeat the test or to run an identical test at a different set of sites

Overall HC Plots

• Historical plots show previous test statistics
• Currently these show the number of running jobs per site; plots showing the evolution of the performance metrics are in development.

HC Robot View

• The “Robot” view shows the success rates of functional test jobs over the past 24 hours (similar to the SSB).

• Clicking a site takes you to the list of Robot jobs executed at that site
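The number behind each site in the Robot view is essentially a success rate over a sliding 24-hour window. A minimal sketch, assuming each functional test job is available as a hypothetical `(site, finished_at, succeeded)` tuple:

```python
from datetime import datetime, timedelta


def robot_success_rates(jobs, now, window=timedelta(hours=24)):
    """Per-site success rate of jobs that finished within `window` before `now`.

    `jobs` is an iterable of (site, finished_at, succeeded) tuples; this
    record shape is an assumption for illustration.
    """
    cutoff = now - window
    totals, ok = {}, {}
    for site, finished_at, succeeded in jobs:
        if cutoff <= finished_at <= now:
            totals[site] = totals.get(site, 0) + 1
            ok[site] = ok.get(site, 0) + int(succeeded)
    return {site: ok[site] / totals[site] for site in sorted(totals)}
```

Sites with no finished jobs in the window simply do not appear, which a display would need to distinguish from a 0% success rate.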


Interface Tour

2. Admin Interface

HC Admin: Operator and User Views

• HC Operators can administer all tables in the HC DB via a web interface

• HC Users have more limited access

HC Admin: Tests and Templates

[Figure: above, the list of all test templates; below, the list of all tests]

HC Admin: Edit a Test Template

• Test templates are defined via the Admin UI

• All of the parameters of a test are here, plus:
  – An active flag indicating that the template should be auto-scheduled
  – A default lifetime: auto-scheduled test instances of this template run for this time period

• Normally, functional test templates include the list of sites to be tested, whereas stress test templates do not include a list of sites.
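The active flag and default lifetime suggest a simple auto-scheduling rule, sketched below: for every active template with no test instance covering the current time, create one that starts now and ends after the template's lifetime. The `Template` and `HCTest` classes and the `autoschedule` function are hypothetical; the slides do not say how HC's scheduler is actually implemented.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Template:
    name: str
    kind: str            # "FUNCTIONAL" or "STRESS"
    active: bool         # auto-schedule instances of this template?
    lifetime: timedelta  # default lifetime of auto-scheduled instances
    sites: tuple = ()    # functional templates normally list their sites


@dataclass
class HCTest:
    template: str
    start: datetime
    end: datetime
    sites: tuple


def autoschedule(templates, tests, now):
    """Return new tests for active templates with no test covering `now`."""
    created = []
    for tpl in templates:
        if not tpl.active:
            continue
        covered = any(
            t.template == tpl.name and t.start <= now < t.end for t in tests
        )
        if not covered:
            created.append(HCTest(tpl.name, now, now + tpl.lifetime, tpl.sites))
    return created
```

Run periodically, this keeps one instance of each active functional template alive at all times, while inactive (typically stress) templates are left for on-demand use.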

HC Admin: Adding a new Test

• Adding a new test on-demand is simple. Select the test template of interest, a start time, and an end time.

• If needed, Tests can be further customized after the template is copied over.
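Creating a test from a template and cloning a previous test both amount to "copy the parameters, set new start and end times, optionally override a few fields". A minimal sketch with an immutable record, where `HCTest`, `new_test`, and `clone_test` are hypothetical names chosen for illustration:

```python
from dataclasses import dataclass, replace
from datetime import datetime


@dataclass(frozen=True)
class HCTest:
    template: str
    sites: tuple
    jobs_per_site: int
    start: datetime
    end: datetime


def _checked(test):
    """Reject tests whose end time is not after their start time."""
    if test.end <= test.start:
        raise ValueError("end time must be after start time")
    return test


def new_test(template_params, start, end, **overrides):
    """Copy a template's parameters into a new test, then apply overrides."""
    test = HCTest(start=start, end=end, **template_params)
    return _checked(replace(test, **overrides))


def clone_test(previous, start, end, **overrides):
    """Repeat a previous test, optionally at a different set of sites."""
    return _checked(replace(previous, start=start, end=end, **overrides))
```

Because `dataclasses.replace` returns a new object, a clone can never accidentally alter the record of the test it was copied from.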

Summary

• HammerCloud is a DA functional and stress testing system, used widely by ATLAS and coming soon to CMS

• Two basic use-cases:
  – A continuous stream of test jobs to measure site availability
  – Enabling central managers to define standardized (stress) tests, and empowering site managers to invoke those tests on demand

• An HC-LHCb plugin would leverage the existing GangaLHCb work
  – A prototype plugin would not take significant effort
