Using HTCondor Glideins to Run in IceCube Heterogeneous ...

Date post: 31-Dec-2016
Using HTCondor Glideins to Run in IceCube Heterogeneous Resources, David Schultz, IceCube
Transcript
Page 1: Using HTCondor Glideins to Run in IceCube Heterogeneous ...

Using HTCondor Glideins to Run in IceCube Heterogeneous Resources

David Schultz

IceCube

Page 2: Using HTCondor Glideins to Run in IceCube Heterogeneous ...

Overview

− Grid sites: GlideinWMS

− Non-grid sites: pyglidein

− Various resource types: CPUs, GPUs, large memory

Page 3: Using HTCondor Glideins to Run in IceCube Heterogeneous ...

Grid sites: GlideinWMS

IceCube

OSG

Page 4: Using HTCondor Glideins to Run in IceCube Heterogeneous ...

Grid sites: GlideinWMS

Since 2013, IceCube has used the GLOW VO on the Open Science Grid, through CHTC

Page 5: Using HTCondor Glideins to Run in IceCube Heterogeneous ...

Grid sites: GlideinWMS

− Moving to icecube VO
− Still leveraging CHTC / OSG
− Adding more sites:
  ‣ Germany
  ‣ Canada
  ‣ Other IceCube grid sites

Page 6: Using HTCondor Glideins to Run in IceCube Heterogeneous ...

Non-grid sites: pyglidein

https://github.com/WIPACrepo/pyglidein

Page 7: Using HTCondor Glideins to Run in IceCube Heterogeneous ...

Non-grid sites: pyglidein

− Standard HTCondor server
  ‣ Shared port and CCB to make networking easier
− server.py user script
  ‣ Query HTCondor every X minutes
  ‣ Aggregate idle job resource requests
  ‣ Present requests via http / jsonrpc
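The aggregation step in server.py can be illustrated with a minimal sketch. This is not the pyglidein code: the function name, the default values, and the output field names are assumptions made for illustration. The job attribute names (JobStatus, RequestCpus, RequestMemory, RequestGPUs) are standard HTCondor job ClassAd attributes, and the jobs are plain dicts here so the example runs without the HTCondor bindings.

```python
from collections import Counter
import json

IDLE = 1  # HTCondor JobStatus value for an idle job


def aggregate_idle_requests(jobs):
    """Group idle jobs by (cpus, memory, gpus) and count each group.

    `jobs` is a list of dicts holding job ClassAd attributes.
    The defaults (1 CPU, 2000 MB, 0 GPUs) are illustrative assumptions.
    """
    counts = Counter()
    for job in jobs:
        if job.get("JobStatus") != IDLE:
            continue  # only idle jobs generate glidein requests
        key = (
            int(job.get("RequestCpus", 1)),
            int(job.get("RequestMemory", 2000)),
            int(job.get("RequestGPUs", 0)),
        )
        counts[key] += 1
    return [
        {"cpus": c, "memory": m, "gpus": g, "count": n}
        for (c, m, g), n in sorted(counts.items())
    ]


jobs = [
    {"JobStatus": 1, "RequestCpus": 1, "RequestMemory": 2000},
    {"JobStatus": 1, "RequestCpus": 1, "RequestMemory": 2000},
    {"JobStatus": 2, "RequestCpus": 1, "RequestMemory": 2000},  # running, ignored
    {"JobStatus": 1, "RequestCpus": 1, "RequestMemory": 4000, "RequestGPUs": 1},
]
requests = aggregate_idle_requests(jobs)
payload = json.dumps(requests)  # what a server could present over http
```

A list like `requests` is the kind of summary a client could fetch on each polling cycle; the real wire format is defined by pyglidein itself.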

Page 8: Using HTCondor Glideins to Run in IceCube Heterogeneous ...

Non-grid sites: pyglidein

− client.py user script
  ‣ Query server.py for requests
  ‣ Check local queue for # idle
  ‣ Submit new requests
− submit.py
  ‣ Handles abstraction of different job schedulers
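The client cycle and the scheduler abstraction might look roughly like the sketch below. It is an illustration under stated assumptions, not the actual client.py or submit.py: the function names, the `max_idle` and `max_per_cycle` limits, and the scheduler table are all hypothetical. The slides name PBS; the other schedulers are assumed examples. The command names (`condor_submit`, `sbatch`, `qsub`) are the usual submit commands for HTCondor, SLURM, and PBS.

```python
def glideins_to_submit(requested, idle_in_queue, max_idle=10, max_per_cycle=5):
    """Decide how many new glideins to submit in this cycle.

    requested      -- number of glideins the server is asking for
    idle_in_queue  -- glideins already waiting in the local queue
    max_idle, max_per_cycle -- hypothetical site limits
    """
    room = max(0, max_idle - idle_in_queue)
    return min(requested, room, max_per_cycle)


# Hypothetical mapping from scheduler type to its submit command.
SUBMIT_COMMANDS = {
    "htcondor": "condor_submit",
    "slurm": "sbatch",
    "pbs": "qsub",
}


def submit_command(scheduler, script_path):
    """Return the argv list that would submit `script_path` to `scheduler`."""
    if scheduler not in SUBMIT_COMMANDS:
        raise ValueError(f"unsupported scheduler: {scheduler}")
    return [SUBMIT_COMMANDS[scheduler], script_path]


n = glideins_to_submit(requested=20, idle_in_queue=8)
cmd = submit_command("pbs", "glidein.sh")
```

Capping submissions by the number of glideins already idle keeps a site from flooding its local queue when the server reports a large backlog.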

Page 9: Using HTCondor Glideins to Run in IceCube Heterogeneous ...

Non-grid sites: pyglidein

− Glidein job
  ‣ Get resources allocated by scheduler
    ⁃ Environment variables from submit.py
    ⁃ Auto-sense for assigned GPU(s)
  ‣ Pass resources to HTCondor Startd
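As a rough sketch of the steps above, the snippet below reads allocated resources from environment variables, counts the GPUs listed in `CUDA_VISIBLE_DEVICES`, and renders configuration lines for the Startd. The variable names `GLIDEIN_CPUS` and `GLIDEIN_MEMORY_MB` and both function names are hypothetical. The real glidein script may detect GPUs differently. `NUM_CPUS`, `MEMORY` (in MB), and `MACHINE_RESOURCE_GPUS` are HTCondor configuration knobs, but which knobs pyglidein actually sets is an assumption here.

```python
import os


def detect_resources(env=None):
    """Read allocated resources from the environment (names are hypothetical)."""
    env = os.environ if env is None else env
    cpus = int(env.get("GLIDEIN_CPUS", 1))
    memory_mb = int(env.get("GLIDEIN_MEMORY_MB", 2000))
    # Simplified GPU sensing: count comma-separated device ids, if any.
    visible = env.get("CUDA_VISIBLE_DEVICES", "")
    gpu_ids = [g for g in visible.split(",") if g.strip()]
    return {"cpus": cpus, "memory_mb": memory_mb, "gpus": len(gpu_ids)}


def startd_config(res):
    """Render HTCondor config lines advertising the detected resources."""
    lines = [
        f"NUM_CPUS = {res['cpus']}",
        f"MEMORY = {res['memory_mb']}",
    ]
    if res["gpus"]:
        lines.append(f"MACHINE_RESOURCE_GPUS = {res['gpus']}")
    return "\n".join(lines)


example_env = {
    "GLIDEIN_CPUS": "4",
    "GLIDEIN_MEMORY_MB": "10000",
    "CUDA_VISIBLE_DEVICES": "0,1",
}
resources = detect_resources(example_env)
config_text = startd_config(resources)
```

Passing an explicit `env` dict keeps the sketch testable outside a batch job; inside a glidein it would fall back to `os.environ`.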

Page 10: Using HTCondor Glideins to Run in IceCube Heterogeneous ...

Non-grid sites: pyglidein

− Started in 2015
  ‣ Simple, non-optimized, yet ran 20% of production
− Can be deployed in minutes by a non-expert
− Because we host it, updates are fast
  ‣ GPU errors at a new site fixed in a day
  ‣ Latest parrot version needed for our OpenCL code

Page 11: Using HTCondor Glideins to Run in IceCube Heterogeneous ...

Non-grid sites: pyglidein

− Several collaboration sites have small, local clusters
  ‣ pyglidein gives them a non-monetary way to contribute

Page 12: Using HTCondor Glideins to Run in IceCube Heterogeneous ...

Non-grid sites: pyglidein

− Used for IceCube supercomputer allocations through XSEDE:
  ‣ Comet (>10,000 GPU hours used so far)
  ‣ Bridges (coming soon)

Page 13: Using HTCondor Glideins to Run in IceCube Heterogeneous ...

Heterogeneous Resources

− IceCube jobs need (variously):
  ‣ Large memory
  ‣ Large scratch disk
  ‣ GPUs

Page 14: Using HTCondor Glideins to Run in IceCube Heterogeneous ...

Heterogeneous Resources

− HTCondor partitionable/dynamic slots
  ‣ A regular single slot:
    Glidein - 1 CPU, 2GB
    Slot - 1 CPU, 2GB
  ‣ PBS high memory:
    Glidein - 4 CPU, 10GB
    Slots - 1 CPU, 6GB | 1 | 1 | 1
  ‣ Whole node:
    Glidein - 24 CPU, 64GB, 2 GPU
    Slots - 1 CPU, 2GB, 1 GPU | 1 CPU, 2GB, 1 GPU | 1 CPU, 10GB | 1 | 1 | ... | 1
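The way a partitionable slot is carved into dynamic slots can be modeled with a toy sketch. This is plain Python bookkeeping, not HTCondor code; the class and method names are made up for illustration. It uses the whole-node figures from the slide (24 CPU, 64GB, 2 GPU) and the three labeled dynamic slots.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionableSlot:
    """Toy model of a glidein advertised as one partitionable slot."""
    cpus: int
    memory_gb: int
    gpus: int = 0
    dynamic_slots: list = field(default_factory=list)

    def carve(self, cpus=1, memory_gb=2, gpus=0):
        """Split off a dynamic slot if the remaining resources cover the request."""
        if cpus > self.cpus or memory_gb > self.memory_gb or gpus > self.gpus:
            return False  # request does not fit in what is left
        self.cpus -= cpus
        self.memory_gb -= memory_gb
        self.gpus -= gpus
        self.dynamic_slots.append((cpus, memory_gb, gpus))
        return True


# Whole-node glidein from the slide: 24 CPU, 64GB, 2 GPU.
node = PartitionableSlot(cpus=24, memory_gb=64, gpus=2)
node.carve(cpus=1, memory_gb=2, gpus=1)   # GPU job
node.carve(cpus=1, memory_gb=2, gpus=1)   # GPU job
node.carve(cpus=1, memory_gb=10)          # large-memory job
remaining = (node.cpus, node.memory_gb, node.gpus)
third_gpu_job_fits = node.carve(cpus=1, memory_gb=2, gpus=1)
```

After the three carves, `remaining` holds the leftover CPUs and memory that further jobs can still claim, while a third GPU request is refused because both GPUs are already assigned.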

Page 15: Using HTCondor Glideins to Run in IceCube Heterogeneous ...

Questions?

