Matchmaking in glideinWMS in CMS

Date post: 15-Jan-2015
Upload: igor-sfiligoi
This document provides a high-level overview of how glideinWMS-based instances do matchmaking in CMS (a High Energy Physics experiment). The information is accurate as of early Dec 2012.

CERN, Dec 2012

glideinWMS for users

Matchmaking in glideinWMS in CMS

by Igor Sfiligoi (UCSD)


Scope of this talk

This talk provides a high-level description of how glideinWMS matchmaking works in CMS.

The reader is expected to be familiar with the CMS experiment environment: http://cms.web.cern.ch/


glideinWMS architecture

● A reminder

[Diagram: Central manager, Submit nodes, and Execute nodes]

Two levels of matchmaking

● First in the VO Frontend
  ● To decide where to provision resources
    ● i.e. where to send glideins
● Then in the HTCondor Negotiator
  ● To decide which Job gets the glidein Slot

[Diagram: Central manager, Submit nodes, and Execute nodes]
The two levels must be compatible.



Defining the policy

● The VO FE configures the glideins
  ● So it can define the Slot Requirements
● The preferred strategy is to leave all policy decisions in the VO FE's hands, i.e. both
  ● VO FE matchmaking policy
  ● HTCondor matchmaking policy
● This implies
  ● Users should not define Job Requirements
  ● Instead, publish attributes describing requirements

Note: it is easier to keep the two policies in sync this way.



CMS Production @ CERN: Policies



● The VO FE @ CERN serves the production needs
  ● i.e. Reconstruction and MC production
● Job submission is regulated by a service managed by a dedicated team, so jobs are
  ● Targeted
  ● Well behaved (at least by and large)


Matchmaking policy

● Two dimensions
  ● Grid Site
  ● Single CPU vs HTPC
● The actual policy is the AND of both
  ● Both VO FE policy and HTCondor policy defined in the VO FE instance configuration


Matching on Grid site name

● User Jobs are expected to publish the attribute DESIRED_Sites (a string list)
  ● e.g. +DESIRED_Sites = "T2_DE_DESY,T2_US_UCSD"
● The Glidein Factory (G.F.) and the glideins advertise GLIDEIN_CMSSite
● The matchmaking policy is GLIDEIN_CMSSite ∈ DESIRED_Sites
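
The membership test on this slide can be sketched in Python. This is only an illustration under stated assumptions, not the frontend's actual expression: the helper names (parse_string_list, site_matches), the dictionary representation of the Job and glidein ads, and the non-matching site name are made up for the example. In HTCondor itself, a test of this kind can be written with the ClassAd function stringListMember.

```python
def parse_string_list(value):
    """Split a comma-separated string list into a set of trimmed names."""
    return {item.strip() for item in value.split(",") if item.strip()}


def site_matches(job_ad, glidein_ad):
    """Return True if the glidein's site is one of the job's DESIRED_Sites."""
    desired = parse_string_list(job_ad.get("DESIRED_Sites", ""))
    # A glidein that does not advertise GLIDEIN_CMSSite never matches.
    return glidein_ad.get("GLIDEIN_CMSSite") in desired


job = {"DESIRED_Sites": "T2_DE_DESY,T2_US_UCSD"}
print(site_matches(job, {"GLIDEIN_CMSSite": "T2_US_UCSD"}))     # True
print(site_matches(job, {"GLIDEIN_CMSSite": "T2_XX_Example"}))  # False
```

A Job that publishes no DESIRED_Sites matches nothing in this sketch; whether the real policy behaves the same way is not stated on the slide.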


Matching on Job Type

● User Jobs can publish the attribute DESIRES_HTPC (integer representation of a Boolean value)
  ● e.g. +DESIRES_HTPC = 1
  ● If not defined, defaults to 0
● The Glidein Factory (G.F.) and the glideins may advertise GLIDEIN_Is_HTPC (Boolean value)
  ● If not defined, defaults to False
● The matchmaking policy is (GLIDEIN_Is_HTPC==True)==(DESIRES_HTPC==1)
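
A minimal Python sketch of the rule above, assuming the ads are plain dictionaries and using a hypothetical function name htpc_matches. It only mirrors the expression and the defaults given on the slide.

```python
def htpc_matches(job_ad, glidein_ad):
    """HTPC jobs match only HTPC glideins; single-CPU jobs match only the rest."""
    # Defaults from the slide: DESIRES_HTPC -> 0, GLIDEIN_Is_HTPC -> False.
    desires_htpc = job_ad.get("DESIRES_HTPC", 0)
    is_htpc = glidein_ad.get("GLIDEIN_Is_HTPC", False)
    return (is_htpc is True) == (desires_htpc == 1)


print(htpc_matches({}, {}))                                          # True
print(htpc_matches({"DESIRES_HTPC": 1}, {}))                         # False
print(htpc_matches({"DESIRES_HTPC": 1}, {"GLIDEIN_Is_HTPC": True}))  # True
```

Because both sides are compared for equality, a single-CPU Job never lands on an HTPC glidein and vice versa.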


Example submit file

    Universe = vanilla
    Executable = mcgen
    Arguments = -k 1543.3
    Output = mcgen.out
    Error = mcgen.err
    Log = mcgen.log
    +DESIRED_Sites = "T2_DE_DESY,T2_US_UCSD"
    +DESIRES_HTPC = 0
    Requirements = True
    Queue 1


CMS AnaOps @ UCSD: Policies



● The VO FE @ UCSD serves CMS analysis users
  ● User Jobs are much more chaotic
● Most users don't really understand their needs
  ● Must protect from accidental errors
  ● Yet keep the system flexible
● Net result
  ● More complex policy


Two different policies

● The AnaOps FE actually has two policies
  ● The Regular policy
  ● The Overflow policy
● The Regular policy tries to match resources
  ● Based on User desires
● The Overflow policy “outsmarts” the Users
  ● Will violate User desires without breaking the Jobs
  ● The aim is to finish user jobs sooner
  ● Users can opt out, if they wish


The Regular M.M. policy

● Four+one dimensions
  ● Grid Site
  ● Single CPU vs HTPC
  ● Memory usage
  ● Job duration
  ● Number of Job Starts (due to preemption)
● The actual policy is the AND of all of them
  ● Both VO FE policy and HTCondor policy defined in the VO FE instance configuration


Grid site selection

● This is both similar to and different from the Production FE @ CERN
  ● Serves the same purpose, but supports three different ways to select a site
    – Due to historical evolution
● The three options are
  ● GLIDEIN_CMSSite ∈ DESIRED_Sites
  ● GLIDEIN_SEs ∈ DESIRED_SEs
    – Planning to extend to (GLIDEIN_SEs ∩ DESIRED_SEs) ≠ ∅
  ● GLIDEIN_Gatekeeper ∈ DESIRED_Gatekeepers
● The actual policy is the OR of the three
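
The OR of the three options, and the planned set-intersection variant for SEs, can be sketched as follows. This is an illustration only: the function names, the dictionary ads, and the second SE name in the usage example (other-se.example.org) are hypothetical.

```python
def parse_string_list(value):
    """Split a comma-separated string list into a set of trimmed names."""
    return {item.strip() for item in value.split(",") if item.strip()}


# (glidein attribute, job attribute) pairs for the three selection methods.
SITE_PAIRS = (
    ("GLIDEIN_CMSSite", "DESIRED_Sites"),
    ("GLIDEIN_SEs", "DESIRED_SEs"),
    ("GLIDEIN_Gatekeeper", "DESIRED_Gatekeepers"),
)


def anaops_site_matches(job_ad, glidein_ad):
    """OR of the three membership tests listed on the slide."""
    return any(
        glidein_ad.get(glidein_attr) in parse_string_list(job_ad.get(job_attr, ""))
        for glidein_attr, job_attr in SITE_PAIRS
    )


def ses_overlap(job_ad, glidein_ad):
    """Planned extension: (GLIDEIN_SEs ∩ DESIRED_SEs) is not empty."""
    glidein_ses = parse_string_list(glidein_ad.get("GLIDEIN_SEs", ""))
    desired_ses = parse_string_list(job_ad.get("DESIRED_SEs", ""))
    return bool(glidein_ses & desired_ses)


job = {"DESIRED_SEs": "dc2-grid-64.brunel.ac.uk,stormfe1.pi.infn.it"}
one_se = {"GLIDEIN_SEs": "stormfe1.pi.infn.it"}
two_ses = {"GLIDEIN_SEs": "stormfe1.pi.infn.it,other-se.example.org"}
print(anaops_site_matches(job, one_se))   # True
print(anaops_site_matches(job, two_ses))  # False: the whole string is not a member
print(ses_overlap(job, two_ses))          # True
```

The last two lines show why the intersection form is attractive: a glidein advertising more than one SE cannot satisfy a plain membership test.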


Job type selection

● Just like @ CERN


Memory Usage

● Most Grid sites put strict limits on the amount of memory that can be used
  ● Will kill glideins if they exceed the limit
● G.F. and glideins advertise the Entry-specific limit GLIDEIN_MaxMemMBs
● Jobs can explicitly declare the needed memory with request_memory (native Condor attribute, no + needed)
  ● Condor will also measure it at run time
    – ImageSize – Virtual memory used
    – ResidentSetSize – True memory usage
● Policy: JobMemory <= GLIDEIN_MaxMemMBs
  ● A combination of the above is used to calculate the actual JobMemory
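
The slide does not say how the declared and measured values are combined into JobMemory, so the sketch below picks one plausible rule purely for illustration: take the larger of the requested memory and the measured resident set size. The rule itself, the 1024 MB fallback, the function names, and the treatment of a glidein without an advertised limit are all assumptions. The units follow HTCondor's job ad conventions (RequestMemory in MB, ResidentSetSize in KiB).

```python
DEFAULT_JOB_MEMORY_MB = 1024  # hypothetical fallback, not from the slides


def job_memory_mb(job_ad):
    """Estimate JobMemory in MB using an assumed 'largest known value' rule."""
    candidates = []
    if job_ad.get("RequestMemory") is not None:    # set via request_memory, in MB
        candidates.append(job_ad["RequestMemory"])
    if job_ad.get("ResidentSetSize") is not None:  # measured at run time, in KiB
        candidates.append(job_ad["ResidentSetSize"] / 1024)
    return max(candidates) if candidates else DEFAULT_JOB_MEMORY_MB


def memory_matches(job_ad, glidein_ad):
    """Policy from the slide: JobMemory <= GLIDEIN_MaxMemMBs."""
    limit = glidein_ad.get("GLIDEIN_MaxMemMBs")
    if limit is None:
        return True  # assumption: no advertised limit means no cut
    return job_memory_mb(job_ad) <= limit


glidein = {"GLIDEIN_MaxMemMBs": 2000}
print(memory_matches({"RequestMemory": 1500}, glidein))  # True
print(memory_matches({"RequestMemory": 1500,
                      "ResidentSetSize": 2048 * 1024}, glidein))  # False
```

The second call models a Job that asked for 1500 MB but was measured at 2048 MB on an earlier run, so it no longer fits a 2000 MB glidein.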


Job Duration 1/2

● Glideins have a limited lifetime
  ● Must fit within the limits of the Grid site's queue
  ● Glideins publish the deadline GLIDEIN_ToDie
    – Jobs must finish before reaching the deadline
● Final user job lifetime is unpredictable
  ● Depends on the type of computing done
  ● User should indicate the expected job lifetime
    – Else we have to assume reasonable defaults
    – Not many users set this value right now


Job Duration 2/2

● The same type of computation may take different amounts of time
  ● e.g. Based on the type of input
● Jobs can declare two attributes
  ● NormMaxWallTimeMins – Expected limit
  ● MaxWallTimeMins – Absolute max limit
● The matchmaking logic is
  ● Use NormMaxWallTimeMins for the first job startup
  ● Use MaxWallTimeMins for all others
    – Based on the simple assumption that the job was killed for hitting the deadline
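
A rough Python sketch of the duration logic, under several assumptions that are not in the slides: GLIDEIN_ToDie is treated as a Unix timestamp in seconds, the number of previous starts is read from HTCondor's NumJobStarts job attribute, and the 1440-minute default and the function names are hypothetical.

```python
DEFAULT_WALLTIME_MINS = 1440  # hypothetical default for jobs that declare nothing


def expected_walltime_mins(job_ad):
    """Pick the wall time to plan for, depending on whether the job ran before."""
    norm = job_ad.get("NormMaxWallTimeMins", DEFAULT_WALLTIME_MINS)
    absolute = job_ad.get("MaxWallTimeMins", norm)
    first_start = job_ad.get("NumJobStarts", 0) == 0
    # First start: expected limit. Restart: assume the earlier run hit the
    # glidein deadline, so plan for the absolute maximum instead.
    return norm if first_start else absolute


def duration_matches(job_ad, glidein_ad, now):
    """The job must be able to finish before the glidein's deadline."""
    deadline = glidein_ad["GLIDEIN_ToDie"]  # assumed: Unix time, seconds
    return now + expected_walltime_mins(job_ad) * 60 <= deadline


glidein = {"GLIDEIN_ToDie": 2 * 3600}  # two hours after now = 0
job = {"NormMaxWallTimeMins": 60, "MaxWallTimeMins": 240}
print(duration_matches(job, glidein, now=0))                         # True
print(duration_matches(dict(job, NumJobStarts=1), glidein, now=0))   # False
```

After one failed start the same Job is only offered glideins with at least four hours left, which is the behaviour the slide describes.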


Cut on number of re-starts

● Not really a user-configurable property
  ● More an emergency brake
● In a properly configured system, it should never be triggered
  ● But unexpected problems happen
  ● So better limit the damage


The Overflow Use case

● User Jobs specify a list of sites, because the data they need is there
● With recent versions of CMSSW, jobs can access the data remotely
  ● With a small performance penalty
● We can thus schedule jobs “anywhere”
  ● As long as the needed data is at a Site that has joined the xrootd federation
  ● But only if no CPU is available “close to the data”
    – And not too far, either

http://indico.cern.ch/contributionDisplay.py?contribId=381&sessionId=5&confId=149557
http://indico.cern.ch/contributionDisplay.py?contribId=232&sessionId=8&confId=149557


The Overflow M.M. policy

● Violate only the “Site selection” rule
  ● Keep all the others
● Plus, add one+one more:
  ● An opt-out mechanism
  ● Delayed matching


New Site M.M. policy

● The user-specified attribute is used to flag the job as “Overflowable”
  ● i.e. the job will match if and only if (DESIRED_<site>s ∩ SUPPORTED_<site>s) ≠ ∅
  ● Still support all 3 types of site identification
● Matching jobs can then run on any glidein
  ● Additional limits can be put in place by the FE, but mostly invisible to the user
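
The “Overflowable” test can be sketched as a set intersection. The slides do not say where the SUPPORTED lists live or how the three identification types are combined, so this example assumes a frontend-side dictionary of supported names per type and ORs the three checks, by analogy with the Regular policy. All names below are hypothetical.

```python
def parse_string_list(value):
    """Split a comma-separated string list into a set of trimmed names."""
    return {item.strip() for item in value.split(",") if item.strip()}


SITE_KINDS = ("Sites", "SEs", "Gatekeepers")


def is_overflowable(job_ad, supported):
    """True if any DESIRED_<kind> list overlaps the supported names of that kind.

    `supported` is an assumed structure: {"Sites": {...}, "SEs": {...}, ...}.
    """
    return any(
        parse_string_list(job_ad.get("DESIRED_" + kind, "")) & supported.get(kind, set())
        for kind in SITE_KINDS
    )


supported = {"Sites": {"T2_US_UCSD"}, "SEs": {"stormfe1.pi.infn.it"}}
print(is_overflowable({"DESIRED_Sites": "T2_DE_DESY,T2_US_UCSD"}, supported))  # True
print(is_overflowable({"DESIRED_Sites": "T2_DE_DESY"}, supported))             # False
```

Note that the glidein's own site does not appear in the test: once a Job is flagged as Overflowable, it may run on any glidein, subject to whatever extra limits the FE applies.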


The opt-out mechanism

● The Overflow policy considers all jobs by default
  ● But Users may want to opt out some of the Jobs
    – Sometimes it is just a need (to get deterministic results, e.g. for testing a site)
● To opt out, the user defines +CMS_ALLOW_OVERFLOW = False
● The FE will not consider such jobs for Overflowing


Delayed matching

● As said initially, Jobs should preferentially run close to the data
  ● Overflow should only consider jobs “that cannot find resources close to the data”
● We implemented it based on time
  ● Jobs are matched only if waiting in the queue for more than 6 hours
  ● Users cannot influence it
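
The opt-out flag and the 6-hour delay can be combined into one eligibility check, sketched below. Measuring the wait from HTCondor's QDate attribute (submission time as a Unix timestamp) is an assumption, as are the function and constant names; the slides only state the 6-hour threshold and the CMS_ALLOW_OVERFLOW attribute.

```python
OVERFLOW_DELAY_SECS = 6 * 3600  # "more than 6 hours", fixed by the FE


def overflow_eligible(job_ad, now):
    """True if the job may be considered by the Overflow policy."""
    # Jobs are considered by default; only an explicit False opts out.
    if job_ad.get("CMS_ALLOW_OVERFLOW", True) is False:
        return False
    # Assumed: QDate holds the submission time in seconds since the epoch.
    return now - job_ad["QDate"] > OVERFLOW_DELAY_SECS


submitted = 1_000_000
print(overflow_eligible({"QDate": submitted}, submitted + 3600))      # False
print(overflow_eligible({"QDate": submitted}, submitted + 7 * 3600))  # True
print(overflow_eligible({"QDate": submitted, "CMS_ALLOW_OVERFLOW": False},
                        submitted + 7 * 3600))                        # False
```

A complete Overflow match would additionally require the Job to be Overflowable and to pass the remaining Regular-policy cuts (job type, memory, duration, restarts).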


Example submit file

    Universe = vanilla
    Executable = myana
    Arguments = -k 1543.3
    Output = myana.out
    Error = myana.err
    Log = myana.log
    request_memory = 1500
    +DESIRED_SEs = "dc2-grid-64.brunel.ac.uk,stormfe1.pi.infn.it"
    +NormMaxWallTimeMins = 7200
    +MaxWallTimeMins = 14400
    +DESIRES_HTPC = 0
    +CMS_ALLOW_OVERFLOW = True
    Requirements = True
    Queue 1


The End



● glideinWMS Home Page: http://tinyurl.com/glideinWMS

● HTCondor Home Page: http://research.cs.wisc.edu/htcondor/

● HTCondor [email protected]@cs.wisc.edu

● glideinWMS [email protected]



● The creation of this document was sponsored by grants from the US NSF and US DOE, and by the University of California system
