
Practical Guidelines for Moab Stacks


© 2013 ADAPTIVE COMPUTING, INC. 1

Practical Guidelines for Highly Available Moab Stacks

Daniel Hardman, Chief Solutions Architect

@dhh1128 ~ http://codecraft.co ~ http://gplus.to/danielhardman ~ http://lnkd.in/z7PTAR

[email protected]

April 2013


The Goal of HA

…NOT! :-)


The real goals of HA

▪  Eliminate or reduce “downtime” for running jobs

▪  Eliminate or reduce “downtime” for new submissions

▪  Make failovers visible and manageable

▪  Satisfy regulatory requirements

▪  Preserve audit trail


HA is constrained by time, money

How much are you willing to spend to tolerate:

▪  A power outage?
▪  A software crash?
▪  A hacker from unit 61398 in Shanghai?
▪  The Chelyabinsk meteor?
▪  The Chicxulub meteor that wiped out the dinosaurs?


What is “downtime”?

0 – hardware failure

+3 min – usable, but very slow

-30 min – last checkpoint

+10 min – full restore
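One way to read the timeline above is that "downtime" has several defensible measurements, depending on who is asking. The sketch below works the arithmetic for the slide's numbers. The class and field names are illustrative assumptions, not anything taken from Moab.

```python
from dataclasses import dataclass


@dataclass
class OutageTimeline:
    """Minutes relative to the hardware failure at t = 0 (negative = before)."""
    last_checkpoint: float
    usable_again: float
    fully_restored: float

    def time_to_usable(self) -> float:
        # How long users could not work at all.
        return self.usable_again

    def time_to_full_restore(self) -> float:
        # How long until normal performance returned.
        return self.fully_restored

    def work_at_risk(self) -> float:
        # Anything computed since the last checkpoint may need to be redone.
        return -self.last_checkpoint

    def total_impact(self) -> float:
        # From the last safe state to the moment service is fully back.
        return self.fully_restored - self.last_checkpoint


# Numbers from the slide: checkpoint at -30, usable at +3, restored at +10.
slide = OutageTimeline(last_checkpoint=-30, usable_again=3, fully_restored=10)

for label, minutes in [
    ("time to usable", slide.time_to_usable()),
    ("time to full restore", slide.time_to_full_restore()),
    ("work at risk", slide.work_at_risk()),
    ("total impact", slide.total_impact()),
]:
    print(f"{label}: {minutes} min")
```

Which of these figures counts as "downtime" is a policy decision, and it should be agreed on before an availability target is set.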


4 Basic Recipes

▪  Simple built-in HA
▪  Standard pairwise HA
▪  Shared pairwise HA
▪  Advanced HA


Recipe 1: simple, built-in HA


Simple, built-in HA

▪  hot ~ warm (daemons idle on fallback svr)
▪  Moab, TORQUE
    ▪  shared file system, synced clocks, two daemons, last mod date on semaphore
▪  MAM
    ▪  DB replication, primary and fallback server
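The "last mod date on semaphore" idea can be sketched as follows: the primary daemon periodically refreshes a file on the shared file system, and the fallback daemon treats the primary as failed once the file's modification time grows too old. This is a minimal illustration under stated assumptions, not Moab's or TORQUE's actual implementation. The function names, the path, and the threshold are all hypothetical.

```python
import os
import time


def touch_semaphore(path):
    """Primary side: refresh the semaphore's last-modified time."""
    with open(path, "a"):
        pass  # create the file if it does not exist yet
    os.utime(path, None)  # set atime/mtime to the current time


def semaphore_is_stale(path, max_age_seconds, now=None):
    """Fallback side: True if the semaphore has not been refreshed recently.

    `now` comes from the fallback server's clock, while the mtime was
    stamped on behalf of the primary. That comparison is only meaningful
    if the clocks are synchronized.
    """
    if now is None:
        now = time.time()
    try:
        mtime = os.stat(path).st_mtime
    except FileNotFoundError:
        # Assumption: a missing file more likely means the shared file
        # system is unavailable than that the primary died, so do not
        # trigger a failover. A real deployment must pick its own policy.
        return False
    return (now - mtime) > max_age_seconds
```

A fallback daemon would call `semaphore_is_stale` on a timer and take over only when it returns True. Clock skew between the servers, or slow propagation of file metadata on the shared file system, can make a healthy primary look stale and cause a false trigger.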


Sample deployment (simple, built-in HA)


Pros and cons (simple, built-in HA)

▪  Pros
    ▪  Fast and easy to set up
    ▪  Minimal learning curve
▪  Cons
    ▪  Doesn’t protect the solution DB, MWS, Viewpoint
    ▪  Depends on synchronized clocks, reliable propagation of file metadata in shared fs
    ▪  Risk of false triggers
    ▪  Shared FS may be single point of failure, depending on how it’s implemented


Recipe 2: standard, pairwise HA


Standard, pairwise HA

▪  Twin headnodes (all daemons)
▪  hot ~ cold (daemons inert on fallback svr)
▪  Heartbeat, Red Hat clustering
▪  Replicated FS (DRBD)


Sample deployment (standard, pairwise HA)


Pros and cons (standard, pairwise HA)

▪  Pros
    ▪  All services fail over the same way
    ▪  Heartbeat is robust, well understood
    ▪  FS can’t be a single point of failure
▪  Cons
    ▪  Some vulnerability to “split brain” scenario
    ▪  More learning curve
    ▪  More complexity than simple, built-in HA
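The “split brain” risk can be illustrated with a toy model of timeout-based failure detection. It is not the Heartbeat protocol itself, and every name and number in it is an assumption made for the example. Each node promotes itself to active once it has heard nothing from its peer for longer than a timeout. If only the link between the nodes fails, both nodes stay alive, each concludes the other is dead, and both end up active.

```python
class Node:
    def __init__(self, name, timeout, active=False):
        self.name = name
        self.timeout = timeout      # seconds of silence tolerated
        self.last_heard = 0.0       # time of the peer's last heartbeat
        self.active = active

    def receive_heartbeat(self, now):
        self.last_heard = now

    def check_peer(self, now):
        # Promote this node if the peer has been silent for too long.
        if now - self.last_heard > self.timeout:
            self.active = True
        return self.active


def simulate(link_up_until, end, interval=1.0, timeout=3.0):
    """Both nodes stay healthy; only the link between them goes down."""
    primary = Node("primary", timeout, active=True)
    fallback = Node("fallback", timeout)
    t = 0.0
    while t <= end:
        if t <= link_up_until:
            # Heartbeats are delivered in both directions while the link is up.
            primary.receive_heartbeat(t)
            fallback.receive_heartbeat(t)
        primary.check_peer(t)
        fallback.check_peer(t)
        t += interval
    return primary.active, fallback.active


# Link fails at t = 5: the fallback promotes itself while the primary is
# still running, so both nodes are active at the same time.
print(simulate(link_up_until=5, end=10))

# Link stays up for the whole run: only the primary is active.
print(simulate(link_up_until=10, end=10))
```

Avoiding this requires some form of arbitration that does not depend on the failed link alone.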


Recipe 3: shared, pairwise HA


Shared, pairwise HA

▪  Twin headnodes (all daemons)
▪  hot ~ warm (some daemons inert, some idle on fallback svr)
▪  Heartbeat, Red Hat clustering
▪  DB failover
▪  Shared FS (e.g., GFS2)


Sample deployment (shared, pairwise HA 1)


Sample deployment (shared, pairwise HA 2)


Pros and cons (shared, pairwise HA)

▪  Pros
    ▪  Solves “split brain” scenario
    ▪  May have slightly lower latency
▪  Cons
    ▪  Greater learning curve
    ▪  More complexity


Recipe 4: advanced HA


Advanced HA

▪  Each service (potentially) split onto dedicated box

▪  Daemons are paired and fail over with heartbeat, redhat clustering

▪  DB failover

▪  Replicated or shared FS


Advanced HA

This is less of a recipe, and more of a general pattern. Each unique server role has to have N-way redundancy. Complexity of config is high; we recommend involvement of professional services.


Pros and cons (advanced HA)

▪  Pros
    ▪  Can meet very aggressive SLAs
    ▪  Can be tailored and fine-tuned
▪  Cons
    ▪  Major implementation effort
    ▪  Requires sophisticated learning and monitoring
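To make "very aggressive SLAs" concrete, an availability percentage can be converted into a yearly downtime budget. The arithmetic is standard. The function name and the sample targets below are illustrative assumptions and do not come from the slides.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def downtime_budget_minutes(availability_percent, period_minutes=MINUTES_PER_YEAR):
    """Minutes of downtime allowed per period at the given availability."""
    return (1 - availability_percent / 100) * period_minutes


for target in (99.0, 99.9, 99.99, 99.999):
    budget = downtime_budget_minutes(target)
    print(f"{target}% available: about {budget:.1f} min/year of downtime")
```

A 99.9% target leaves roughly 8.8 hours a year, while 99.999% leaves a little over 5 minutes. That is less than the "+10 min – full restore" in the earlier downtime timeline, which is why the tightest targets push toward per-service redundancy.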


General Observations

▪  Important to audit
▪  Super-fast failover not a goal in our recipes
▪  Security implications
▪  Not perf enhancer
▪  Not scalability enhancer
▪  Not DR


More Info

Whitepaper now available. Email me ([email protected]) for a copy, or download from /documents/ha-moab-cloud-hpc.pdf. Documentation for Hopper release includes a new HA task guide for simple, built-in HA configuration.

