
Open Science Grid: More compute power
Alan De Smet, chtc@cs.wisc.edu

Date post: 23-Dec-2015
Transcript
Page 1: Open Science Grid: More compute power Alan De Smet chtc@cs.wisc.edu.

Open Science Grid: More compute power

Alan De Smet, chtc@cs.wisc.edu

chtc.cs.wisc.edu

CHTC Cores In Use: 1,500

(CPU days each day, averaged over one month)

OSG Cores In Use: 60,000

(CPU days each day, averaged over one month)

Open Science Grid

CHTC and OSG usage

(CPU days each day)

Challenges Solved

We worry about all of this.

You don’t have to.

› Authentication: X.509 certificates, certificate authorities, VOMS
› Interface: Globus, GridFTP, Grid universe
› Validation: Linux distribution, glibc version, basic libraries

Using OSG

› Before

universe = vanilla
executable = myjob
log = myjob.log
queue

Using OSG

› After

universe = vanilla
executable = myjob
log = myjob.log
+WantGlidein = true
queue

Challenge: Opportunistic

› OSG computers go away without notice
› Solutions
  › Condor restarts automatically
  › Sub-hour jobs
  › Self-checkpointing
  › Automated checkpointing
    • Condor’s standard universe
    • DMTCP http://dmtcp.sourceforge.net/
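Self-checkpointing means the program saves its own progress now and then, so that when Condor restarts it after an eviction it resumes from the last save rather than from the start. The sketch below is a minimal Python illustration under stated assumptions: the checkpoint file name `checkpoint.json`, the `run`/`load_state`/`save_state` helpers, and the toy workload (a running sum of squares) are all hypothetical. It also assumes the checkpoint file is carried from one run to the next, which on HTCondor needs matching file-transfer settings in the submit file (for example `when_to_transfer_output = ON_EXIT_OR_EVICT`); that part is not shown.

```python
import json
import os

# Hypothetical checkpoint file; it must travel with the job between restarts.
CHECKPOINT = "checkpoint.json"


def load_state(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_i": 0, "total": 0}


def save_state(state, path):
    """Write to a temporary file, then rename it over the old checkpoint.

    On POSIX systems os.replace is atomic, so a job killed mid-save still
    leaves the previous complete checkpoint in place.
    """
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run(n_steps, path=CHECKPOINT, every=1000, stop_after=None):
    """Sum i*i for i in range(n_steps), saving progress every `every` steps.

    `stop_after` simulates an eviction for testing: the function returns
    None after that many steps in this run, losing any unsaved progress.
    """
    state = load_state(path)
    steps_this_run = 0
    for i in range(state["next_i"], n_steps):
        state["total"] += i * i  # the actual "work"
        state["next_i"] = i + 1
        steps_this_run += 1
        if state["next_i"] % every == 0:
            save_state(state, path)
        if stop_after is not None and steps_this_run >= stop_after:
            return None
    save_state(state, path)  # final checkpoint marks the work as complete
    return state["total"]


if __name__ == "__main__":
    print(run(1_000_000, every=10_000))
```

If the job is evicted partway, the next run picks up at the last multiple of `every` steps, so at most `every - 1` steps are repeated. Smaller `every` values lose less work per eviction but write the checkpoint more often.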

Challenge: Local Software

› Bare-bones Linux systems
› Solution
  › Bring everything with you
  › CHTC-provided MATLAB and R packages
    • RunDagEnv/mkdag
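One way to bring everything with you is to pack the executable, any libraries it loads, and its input files into one archive, list that archive in the submit file's `transfer_input_files`, and unpack it at the start of the job. A minimal Python sketch, assuming a flat archive layout; `make_bundle` and `unpack_bundle` are hypothetical helpers, not part of the CHTC MATLAB or R packages or of RunDagEnv/mkdag.

```python
import os
import tarfile


def make_bundle(paths, bundle_path):
    """Pack every file the job needs into one gzipped tarball.

    Each file is stored under its base name, so after unpacking everything
    sits at the top of the job's scratch directory.
    """
    with tarfile.open(bundle_path, "w:gz") as tar:
        for p in paths:
            tar.add(p, arcname=os.path.basename(p))
    return bundle_path


def unpack_bundle(bundle_path, dest="."):
    """Unpack the bundle on the execute machine and return the member names."""
    with tarfile.open(bundle_path, "r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters: refuse unsafe member paths
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return names
```

On the submit side, a call such as `make_bundle(["myjob", "libfoo.so", "input.dat"], "bundle.tar.gz")` (file names hypothetical) builds the archive; the job's first step calls `unpack_bundle("bundle.tar.gz")` in its scratch directory. Libraries unpacked this way still have to be found at run time, for example through `LD_LIBRARY_PATH`.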

Challenge: Erratic Failures

› Complex systems fail sometimes
› Solution
  › Expect failures and automatically retry
  › DAGMan for retries
  › DAGMan POST scripts to detect problems
    • RunDagEnv/mkdag
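DAGMan can run a POST script after each job; if the script exits nonzero, DAGMan treats the node as failed, and a `RETRY` line makes it run the node again. A POST script is a good place to catch a job that exited 0 but left bad output. A minimal Python sketch, assuming a hypothetical script name `check_output.py`, output file `result.txt`, and a convention that `myjob` ends its output with a line reading `DONE`. The DAG file would reference it with lines such as `SCRIPT POST A check_output.py $RETURN result.txt` and `RETRY A 3`, where `$RETURN` is the macro DAGMan replaces with the job's exit code (the script file also needs execute permission).

```python
#!/usr/bin/env python3
"""Hypothetical DAGMan POST script: check_output.py <job_exit_code> <output_file>.

Exit 0 if the node succeeded; exit 1 so DAGMan marks it failed and retries.
"""
import os
import sys


def check(job_exit_code, output_file):
    """Return 0 if the job looks successful, 1 otherwise."""
    if job_exit_code != 0:
        return 1  # the job itself reported failure
    if not os.path.exists(output_file) or os.path.getsize(output_file) == 0:
        return 1  # exit code 0 but no usable output
    with open(output_file) as f:
        lines = f.read().splitlines()
    # Assumed convention: myjob writes "DONE" as its final line when it finishes.
    if not lines or lines[-1].strip() != "DONE":
        return 1
    return 0


if __name__ == "__main__" and len(sys.argv) == 3:
    # DAGMan passes $RETURN (the job's exit code) and the output file name.
    sys.exit(check(int(sys.argv[1]), sys.argv[2]))
```

Keeping the check in a separate function makes it easy to test by hand before trusting it inside a large DAG.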

Challenge: Bandwidth

› Solutions
  › Only send what you need
  › Store large, shared files in our web cache
  › Read small amounts of data on the fly
    • Condor’s standard universe
    • Parrot http://www.cse.nd.edu/~ccl/software/parrot/
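Reading data on the fly pays off when the program asks only for the bytes it needs. Both the standard universe (through remote system calls) and Parrot redirect a program's ordinary file I/O to storage elsewhere, so a program that seeks straight to one record, instead of reading a whole file, may move far less data; how much is actually saved depends on how the underlying service handles partial reads, which this sketch does not check. A minimal Python illustration of that access pattern, assuming a hypothetical file of fixed-size binary records (a 32-bit integer ID and a 64-bit float each); `read_record` and the record layout are illustrative, not part of either tool.

```python
import struct

# Hypothetical record layout: little-endian int32 id + float64 value = 12 bytes.
RECORD = struct.Struct("<id")


def read_record(path, index):
    """Return record `index` by seeking to it and reading just RECORD.size bytes."""
    with open(path, "rb") as f:
        f.seek(index * RECORD.size)
        data = f.read(RECORD.size)
    if len(data) != RECORD.size:
        raise IndexError(f"record {index} is past the end of {path}")
    return RECORD.unpack(data)
```

For example, `read_record("data.bin", 37)` requests 12 bytes at offset 444, however large `data.bin` is.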

