Moab: Intelligent Workload Management

© 2012 ADAPTIVE COMPUTING, INC. Moab: Intelligent Workload Management, Colin Whitbread, [email protected], 10/05/2012
Transcript
Page 1: Moab: Intelligent Workload Management

Moab: Intelligent Workload Management

Colin Whitbread

[email protected] 10/05/2012

Page 2: Moab: Intelligent Workload Management

Intelligent Workload Management with Moab

Moab optimizes and automates by:

Unifying resource and workload information

Automating intelligent decisions through policies

Making and considering future commitments

Optimizing workload placement over time and resources

Modifying workload and resources for optimal performance

[Diagram labels: Query Workload, Resources, and State; Apply Policies to Control; Workloads; Resource Managers; Moab HPC Suite; Resources]

Page 3: Moab: Intelligent Workload Management

Moab HPC Suite - Enterprise Edition Architecture

[Architecture diagram labels:]

Users, Administrators

Portal, User Self-Service, Admin Dashboard, CLI

Workload Job Queue

Moab HPC Suite™ - Enterprise Edition

Moab Intelligence Engine: decisions, policies, resource scheduling, allocation, orchestration

Accounting

Management Integration (Moab Services Manager, Web Services)

Multi-Vendor Resource Managers: Storage Manager, Provisioning Manager, Job Manager (TORQUE), Network Manager, Health Monitor

Non-traditional Resource Managers

Other Resources: Licenses, Chillers, NFS Servers, Database Servers, Equipment, etc.

Heterogeneous Resources

Page 4: Moab: Intelligent Workload Management

TORQUE 4.0 history lesson

TORQUE is a descendant of PBS (Portable Batch System), developed 1993-1994

▪ What was happening with computers in 1993 and 1994:

1993 - Commercial providers allowed to sell Internet connections to individuals

1993 - Intel releases P5-based Pentium processors with 60 MHz and 66 MHz versions

1993 - Novell purchased Digital Research, DR-DOS

1993 - Windows NT 3.1 released, which supported 32-bit programs

1993 - MS-DOS 6.0 released

1994 - Intel releases 90 MHz and 100 MHz Pentium processors

1994 - Motorola releases the 68060 processor

1994 - Linus Torvalds releases version 1.0 of the Linux kernel

1994 - IBM releases PC-DOS 6.3

Page 5: Moab: Intelligent Workload Management

TORQUE 4.0 history lesson

Programming practices of 1993-1994

Memory limited

Keep code size small

Slow and unreliable network connections

No threads for Unix or Linux

Disk storage slow

Page 6: Moab: Intelligent Workload Management

TORQUE 4.0 Goals

▪ Scalability

▪ Support 10s of thousands of hosts in a cluster

▪ Support jobs using 10s of thousands of hosts

▪ Improved Communications

▪ Higher Throughput and Response

▪ Improved Reliability

▪ Enhanced Security

[Diagram: Enterprise-Ready TORQUE: Scalable, Reliable, Secure, Responsive]

Page 7: Moab: Intelligent Workload Management

Job Radix

▪ Problem: all sister nodes communicate directly with Mother Superior

▪ Causes communication failures when a node becomes saturated

▪ Jobs are needlessly lost

▪ Communication bottlenecks at a single point

[Diagram: Moab and pbs_server above a Mother Superior MOM ("MOMsuper") surrounded by sister MOMs; a "!" marks the bottleneck]

Page 8: Moab: Intelligent Workload Management

Scalability – Job Radix

Job Radix Solution:

$ qsub -W job_radix=3 script.sh

States that each MOM should communicate directly with 3 other MOMs.

No longer a single stress point: each node communicates with a maximum of 3 other MOMs.

Page 9: Moab: Intelligent Workload Management

Scalability – Job Radix=3

[Tree diagram, one level per line:]

Mother Superior

Sister Sister Sister

Sister Sister Sister Sister Sister Sister Sister Sister Sister
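
The fan-out shown in the Job Radix=3 tree can be sketched in a few lines of Python. This is only an illustration of a breadth-first tree with a bounded number of children per node, not TORQUE's actual implementation; the host names and the functions `build_radix_tree` and `depth` are hypothetical.

```python
from collections import deque


def build_radix_tree(nodes, radix):
    """Map each node to the nodes it forwards to directly.

    nodes[0] plays the role of Mother Superior. The remaining nodes are
    attached breadth-first so that no node gets more than `radix` children.
    """
    if radix < 1:
        raise ValueError("radix must be at least 1")
    children = {name: [] for name in nodes}
    for index, name in enumerate(nodes[1:], start=1):
        parent = nodes[(index - 1) // radix]
        children[parent].append(name)
    return children


def depth(children, root):
    """Number of hops from the root to the farthest node."""
    deepest = 0
    queue = deque([(root, 0)])
    while queue:
        name, level = queue.popleft()
        deepest = max(deepest, level)
        for child in children[name]:
            queue.append((child, level + 1))
    return deepest


# Hypothetical 13-node job: one Mother Superior and twelve sisters.
nodes = ["mother_superior"] + [f"sister{i}" for i in range(1, 13)]

tree = build_radix_tree(nodes, radix=3)
# Without a job radix, every sister talks to Mother Superior directly.
flat = build_radix_tree(nodes, radix=len(nodes) - 1)

print(len(flat["mother_superior"]), depth(flat, "mother_superior"))
print(max(len(c) for c in tree.values()), depth(tree, "mother_superior"))
```

The trade-off is visible in the two print lines: the flat layout gives Mother Superior twelve direct connections at depth 1, while radix 3 caps every node at three children at the cost of one extra hop.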

Page 10: Moab: Intelligent Workload Management

MOM Hierarchy

Goals

Improve efficiency of MOM to server update communications

Reduce network congestion

Improve cluster status reliability

No more MOMs going down when nothing is wrong

Page 11: Moab: Intelligent Workload Management

MOM Hierarchy

<TORQUE_HOME>/server_priv/mom_hierarchy: Specifies where each MOM should send status updates. The updates get propagated from there to pbs_server.

Self-healing – Describes how retries should happen when the preferred connection can't be obtained.

Requires the server to process larger amounts of data, but with a lower number of connections, resulting in better performance.

Administrator can set up hierarchy to mirror actual network topology, resulting in optimum performance.

Page 12: Moab: Intelligent Workload Management

MOM Hierarchy

Sample file:

<path>

<level>node0,node1,node2</level>

<level>node3,node4,node5</level>

</path>

<path>

...

Note: You can specify multiple paths. Not all nodes need to be in all paths but nodes can be in multiple paths if desired.
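
To make the file layout concrete, here is a minimal Python sketch that reads text in the `<path>`/`<level>` form shown in the sample. It is not the parser TORQUE uses. The second path in `SAMPLE` and the function name `parse_mom_hierarchy` are hypothetical, and wrapping the text in a synthetic root element is an assumption made only so that a standard XML parser accepts several top-level `<path>` elements.

```python
import xml.etree.ElementTree as ET

# First path copied from the sample file; the second path is hypothetical
# and only shows that a node (node0) may appear in more than one path.
SAMPLE = """
<path>
  <level>node0,node1,node2</level>
  <level>node3,node4,node5</level>
</path>
<path>
  <level>node6</level>
  <level>node0,node7</level>
</path>
"""


def parse_mom_hierarchy(text):
    """Return a list of paths; each path is a list of levels (lists of hosts)."""
    # Several top-level <path> elements are not a single XML document,
    # so wrap them in a synthetic root before parsing.
    root = ET.fromstring(f"<hierarchy>{text}</hierarchy>")
    paths = []
    for path in root.findall("path"):
        levels = []
        for level in path.findall("level"):
            hosts = [h.strip() for h in (level.text or "").split(",") if h.strip()]
            levels.append(hosts)
        paths.append(levels)
    return paths


paths = parse_mom_hierarchy(SAMPLE)
for number, levels in enumerate(paths):
    print(f"path {number}: {levels}")
```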

Page 13: Moab: Intelligent Workload Management

MOM Hierarchy

[Diagram, one level per line:]

pbs_server

MOM0 MOM1 MOM2

MOM3 MOM4 MOM5

MOM6 MOM7 MOM8
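
The diagram suggests how a status update climbs toward pbs_server. The Python sketch below works through that idea for one path. The fallback order is an assumption made for illustration (try the level directly above, then higher levels, then pbs_server itself); the slides do not spell out the retry policy, and `report_targets` and `first_reachable` are hypothetical names.

```python
SERVER = "pbs_server"

# One path with three levels, mirroring the diagram.
PATH = [
    ["MOM0", "MOM1", "MOM2"],
    ["MOM3", "MOM4", "MOM5"],
    ["MOM6", "MOM7", "MOM8"],
]


def report_targets(path, node):
    """Ordered list of places `node` could send its status update.

    Assumed policy: prefer the level directly above, then fall back to
    higher levels in turn, and finally to pbs_server.
    """
    for level_index, level in enumerate(path):
        if node in level:
            targets = []
            for upper in reversed(path[:level_index]):
                targets.extend(upper)
            targets.append(SERVER)
            return targets
    raise KeyError(f"{node} is not in this path")


def first_reachable(targets, is_up):
    """Return the first target for which is_up(target) is true, else None."""
    for target in targets:
        if is_up(target):
            return target
    return None


# Hypothetical outage: the whole middle level is unreachable.
down = {"MOM3", "MOM4", "MOM5"}
targets = report_targets(PATH, "MOM7")
chosen = first_reachable(targets, lambda host: host not in down)
print(targets)
print(chosen)
```

With the middle level down, MOM7 skips it and reports to MOM0, which is the kind of self-healing behaviour the hierarchy is meant to allow.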

Page 14: Moab: Intelligent Workload Management

TORQUE Throughput

Refactor internal structures

Pass only used attributes. Reduces memory footprint.

Remove linked lists – replace with hash maps and arrays.

Replace DIS (Data is String).

• 45% of time spent encoding and decoding strings.

• Requires several function calls to send 1 character.

Remove redundant code.

• 7 calls to getenv() in qsub
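
The cost of a string-based wire format can be illustrated by encoding the same integers two ways. This Python sketch is a generic comparison only: the text format here is not the DIS wire format, and the binary layout is not what TORQUE adopted. All function names and the sample values are hypothetical.

```python
import struct


def encode_as_text(values):
    """Encode integers as newline-terminated decimal strings."""
    return b"".join(str(v).encode("ascii") + b"\n" for v in values)


def decode_from_text(data):
    """Parse every non-empty line back into an integer."""
    return [int(line) for line in data.split(b"\n") if line]


def encode_as_binary(values):
    """Encode a count followed by fixed-width 32-bit big-endian integers."""
    return struct.pack(f"!I{len(values)}I", len(values), *values)


def decode_from_binary(data):
    """Read the count, then unpack that many 32-bit integers in one call."""
    (count,) = struct.unpack_from("!I", data, 0)
    return list(struct.unpack_from(f"!{count}I", data, 4))


values = [7, 4096, 123456789]
text_blob = encode_as_text(values)
binary_blob = encode_as_binary(values)
print(decode_from_text(text_blob) == values)
print(decode_from_binary(binary_blob) == values)
```

The text decoder has to scan for delimiters and convert digit by digit, while the fixed-width decoder reads fields at known offsets, which is the kind of per-character overhead the slide attributes to string encoding.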

Page 15: Moab: Intelligent Workload Management

Moab HPC Suite - Enterprise Edition Architecture

[Architecture diagram repeated from Page 3]

Page 16: Moab: Intelligent Workload Management

Moab - Scalability

Objects cached within Moab:

- Jobs

- Nodes

- Reservations

- Triggers

Real-time status

Page 17: Moab: Intelligent Workload Management

Moab - Scalability

Threaded client commands

More dynamic limits, fewer static limits

Per-partition scheduling

Reduced memory footprint

Page 18: Moab: Intelligent Workload Management

Moab

Moab® HPC Suite – Grid Option

Extends Moab HPC Suite to Grid Functionality

▪ Flexibility in grid structure, management and control

▪ Unified grid management

▪ Maintains sovereignty while enabling sharing

▪ Improved user collaboration and productivity

▪ Increased system throughput and utilization

Page 19: Moab: Intelligent Workload Management

Flexibility: Grid Management Options

▪ Centralized and/or local management

▪ Local management – “peer-to-peer”

[Diagram labels:]

Moab Head Node: All Rules for Grid Environment or Shared Grid Rules

Moab: Local Rules (label repeated for five local Moab instances)

Page 20: Moab: Intelligent Workload Management

Page 21: Moab: Intelligent Workload Management

Enhanced Client and Security Management

▪ New authorization daemon (trqauthd) gives administrators better control over users

▪ Eliminated security loophole

▪ Users cannot run jobs as another user

▪ trqauthd runs as root and replaces pbs_iff, which relied on the setuid (“sticky”) bit

