
Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Workflows

Date post: 22-Jan-2018
Upload: rafael-ferreira-da-silva
Transcript
Page 1: Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Workflows

USING SIMPLE PID CONTROLLERS TO PREVENT AND MITIGATE FAULTS IN SCIENTIFIC WORKFLOWS

Rafael Ferreira da Silva (1), Rosa Filgueira (2), Ewa Deelman (1), Erola Pairo-Castineira (3), Ian Michael Overton (4), Malcolm Atkinson (5)

11th Workflows in Support of Large-Scale Science (WORKS’16)
Salt Lake City, UT – November 14th, 2016

(1) USC Information Sciences Institute
(2) British Geological Survey, Lyell Centre
(3) MRC Institute of Genetics and Molecular Medicine, University of Edinburgh
(4) Usher Institute of Population Health Sciences and Informatics, University of Edinburgh
(5) School of Informatics, University of Edinburgh

Page 2: Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Workflows

OUTLINE

Introduction
- Scientific Workflows
- Motivation
- Related Work

PID Controllers
- Definition
- Control System Loop

Defining Controllers
- Data Management
- Memory Management

Experimental Evaluation
- Workflow Application
- Experiments Conditions
- Results and Discussion

Tuning PID Controllers
- Ziegler-Nichols Method
- Experimental Evaluation

Summary
- Conclusions
- Future Research Directions

R. Ferreira da Silva, R. Filgueira, E. Deelman, E. Pairo-Castineira, I. M. Overton, M. Atkinson
Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Workflows

Page 3: Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Workflows

WHY SCIENTIFIC WORKFLOWS?

Automation
- Enables parallel, distributed computations
- Automatically executes data transfers

Reproducibility
- Reusable, aids reproducibility
- Records how data was produced (provenance)

Recover & Debug
- Handles failures to provide reliability
- Keeps track of data and files

Page 4: Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Workflows

MOTIVATION

- Grid computing: 180k failed tasks out of 340k
- The Failure Trace Archive: 26 datasets from 2006-2014
- CMS (Aug 2014): 385k failed tasks out of 790k
- Mira (2014): over 14k failed jobs out of 80k

Page 5: Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Workflows

SOME APPROACHES TO HANDLE FAULTS

Typical Approaches
- Task Retries
- Task Resubmission
- Task Clustering
- Checkpointing
- Provenance
- …

Statistical and Machine Learning
- Linear Regression
- Neural Networks
- Classification Algorithms
- Tree-based Methods
- Support Vector Machines
- …

Analytical Solutions
- Failure Modeling
- Markov Chains
- Principal Component Analysis
- Histograms
- …

Others
- Exception Handling
- Game Theory
- …

Page 6: Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Workflows

… AND SOME OF THEIR LIMITATIONS

- Most of the systems do not prevent faults, but mitigate them
- Some approaches may overload the execution platform
- Most of them make strong assumptions about resource and application characteristics. Accurate estimates of such requirements are still a steep challenge
- Some approaches are tied to a small set of applications

We seek an approach to predict, prevent, and mitigate failures in end-to-end workflow executions across distributed systems under online and unknown conditions

Page 7: Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Workflows

PID CONTROLLERS

Proportional-Integral-Derivative Controller
- Control loop mechanism
- Widely used in industrial control systems: temperature, pressure, flow rate, etc.

A PID controller aims at detecting the possibility of a fault far enough in advance so that an action can be performed to prevent it from happening

[Figure: PID control loop block diagram, with labels Setpoint, Input, Σ, PID, Process, Output]

[Figure: response over time, annotated with dead time, rise time, percent overshoot, settling time, and steady-state error]

Page 8: Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Workflows

PROCESS VARIABLES

PID: Proportional-Integral-Derivative Controller
- Proportional: present error
- Integral: accumulation of past errors
- Derivative: prediction of future errors based on current rate of change

Kp: proportional gain constant
Ki: integral gain constant
Kd: derivative gain constant
e: error, defined as the difference between the setpoint and the process variable value
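
In the standard textbook form, the three terms combine into a single control signal u(t). The sketch below uses the gains and the error defined above; r(t) for the setpoint and y(t) for the process variable are names assumed here for illustration, not taken from the slides.

```latex
u(t) = K_p\, e(t) + K_i \int_0^t e(\tau)\, d\tau + K_d\, \frac{de(t)}{dt},
\qquad e(t) = r(t) - y(t)
```

With this sign convention, e(t) is positive while the process variable sits below the setpoint and negative once it overshoots.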

Page 9: Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Workflows

DATA FOOTPRINT AND MANAGEMENT

A run of scientific workflows that manipulate large data sets may lead the system to an out-of-disk-space fault

PID Controller
- P: the error between the setpoint and the actual used disk space
- I: cumulative value of the proportional responses
- D: the difference between the current and the previous disk overflow (or underutilization) error values

Actions
- u(t) < 0: data cleanup is used to remove unused data, or tasks are preempted
- u(t) > 0: the number of concurrent task executions may be increased
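
The P, I, and D terms and the two actions on this slide can be sketched as a small discrete controller. This is a minimal illustration, not the authors' implementation: the class name `DiskPID`, the function `choose_action`, the action labels, and the 400GB setpoint are all hypothetical, and the sampling interval is taken as one step so the derivative reduces to a plain difference.

```python
from dataclasses import dataclass


@dataclass
class DiskPID:
    """Discrete PID controller for disk usage (illustrative sketch)."""
    setpoint: float          # target used disk space, e.g. in GB (assumed unit)
    kp: float = 1.0
    ki: float = 1.0
    kd: float = 1.0
    integral: float = 0.0    # running sum of past errors (I term state)
    prev_error: float = 0.0  # error from the previous step (D term state)

    def update(self, used: float) -> float:
        """Return the control signal u for the current disk usage."""
        error = self.setpoint - used           # P: setpoint minus used space
        self.integral += error                 # I: cumulative value of errors
        derivative = error - self.prev_error   # D: current minus previous error
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def choose_action(u: float) -> str:
    """Map the sign of u to the kind of action named on the slide."""
    if u < 0:
        return "cleanup_or_preempt"      # hypothetical label
    if u > 0:
        return "increase_concurrency"    # hypothetical label
    return "no_change"


pid = DiskPID(setpoint=400.0)
u_low = pid.update(used=300.0)    # usage below the setpoint
u_high = pid.update(used=480.0)   # usage jumps above the setpoint
```

Because `prev_error` starts at zero, the first derivative value equals the first error; a real controller would likely seed it from an initial measurement.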

Page 10: Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Workflows

MEMORY USAGE AND MANAGEMENT

The performance of memory-intensive operations is often limited by the memory capacity of the resource where the application is being executed

PID Controller
- P: error between the setpoint value and the actual memory usage
- I: cumulative value of previous memory usage errors
- D: difference between the current and the previous memory overflow (or underutilization) error values

Actions
- u(t) < 0: tasks are preempted to prevent the system from running out of memory
- u(t) > 0: the WMS may spawn additional tasks for concurrent execution

Page 11: Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Workflows

WORKFLOW APPLICATION

1000 GENOME SEQUENCING ANALYSIS WORKFLOW

- Identifies mutational overlaps using data from the 1000 genomes project
- 22 Individual tasks, 7 Population tasks, 22 Sifting tasks, 154 Pair Overlap Mutations tasks, and 154 Frequency Overlap Mutations tasks (Total 359 tasks)
- The workflow consumes/produces over 4.4TB of data, and requires over 24TB of memory

[Figure: workflow structure, with labels Input Data, Data Preparation, Individuals, Populations, Sifting, Analysis, Pair Overlap Mutations, Frequency Overlap Mutations, Output Data]

https://github.com/pegasus-isi/1000genome-workflow

Page 12: Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Workflows

EXPERIMENT SETUP

[Figure: experiment setup, with labels Workflow Management System, PID Control Loop, Compute Node 1, Compute Node 2, Memory, Disk, shared file system (capacity 500GB), shared memory]

Page 13: Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Workflows

EXPERIMENT CONDITIONS

Reference Workflow Execution
- Computed offline under known conditions
- Averaged makespan: ~106h (standard deviation < 5%)

Executions are performed under online and unknown conditions
- We assume Kp = Ki = Kd = 1 (no tuning)
- We arbitrarily define our setpoint as 80% of the maximum total capacity (for both storage and memory usage), and a steady-state error of 5%
- The decision on the number of tasks to be scheduled or preempted is computed as the min between the response value of the unique disk usage PID controller, and the memory PID controller per resource
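
The setpoint and the min-based decision rule from this slide can be sketched as follows. The function names and the two response values are hypothetical; how a controller response is converted into a task count is not given in the slides, so the example treats each response as already being a signed number of tasks (negative meaning preemption). The 5% steady-state error is left out of the sketch.

```python
def setpoint(capacity: float, fraction: float = 0.80) -> float:
    """Setpoint as a fraction of the maximum total capacity."""
    return fraction * capacity


def tasks_to_adjust(disk_response: float, memory_response: float) -> float:
    """Take the smaller (more conservative) of the two controller responses."""
    return min(disk_response, memory_response)


# 500GB is the shared file system capacity from the experiment setup.
disk_setpoint = setpoint(500.0)

# Hypothetical responses: disk allows 3 more tasks, memory asks to preempt 2.
decision = tasks_to_adjust(disk_response=3.0, memory_response=-2.0)
```

Taking the minimum means the most constrained resource wins: if either controller calls for preemption, tasks are not added even when the other has headroom.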

Page 14: Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Workflows

OVERALL MAKESPAN EVALUATION

Average workflow makespan for different configurations of the controllers

- Proportional (Kp = 1, Ki = Kd = 0): makespan 138.76h, slowdown 1.30 (only mitigates faults)
- Proportional-Integral (Kp = Ki = 1, Kd = 0): makespan 126.69h, slowdown 1.19
- Proportional-Integral-Derivative (Kp = Ki = Kd = 1): makespan 114.96h, slowdown 1.08 (prevents faults)
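
The reported slowdowns can be related to the reference execution with a short calculation. The sketch assumes that slowdown is the makespan divided by the reference makespan, and it uses the rounded ~106h reference quoted earlier in the slides; the dictionary keys and the function name are illustrative.

```python
REFERENCE_MAKESPAN_H = 106.0  # rounded "~106h" reference makespan (assumption)

# Average makespans per controller configuration, in hours, from the slide.
makespans_h = {
    "P": 138.76,
    "PI": 126.69,
    "PID": 114.96,
}


def slowdown(makespan_h: float, reference_h: float = REFERENCE_MAKESPAN_H) -> float:
    """Ratio of a measured makespan to the reference makespan."""
    return makespan_h / reference_h


slowdowns = {name: slowdown(m) for name, m in makespans_h.items()}
```

With the rounded reference, the ratios come out within about 0.01 of the reported 1.30, 1.19, and 1.08; the small gap is presumably due to the reference value being rounded on the slide.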

Page 15: Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Workflows

EXPERIMENTS: DATA FOOTPRINT

[Figure: data footprint, one panel per controller: Proportional, Proportional-Integral, Proportional-Integral-Derivative]

Page 16: Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Workflows

EXPERIMENTS: DATA FOOTPRINT

[Figure: data footprint, Proportional controller]

This process occurs at about 4h, and performs more than 6,000 preemptions

Page 17: Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Workflows

EXPERIMENTS: MEMORY USAGE

[Figure: memory usage, one panel per controller: Proportional, Proportional-Integral, Proportional-Integral-Derivative]

Page 18: Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Workflows

EXPERIMENTS: MEMORY USAGE

[Figure: memory usage, Proportional controller]

Only a few tasks (on average fewer than 5) are preempted due to memory overflow

Page 19: Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Workflows

OVERALL RESULTS

[Figure: overall results, panels: Data Footprint, Memory Usage]

Page 20: Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Workflows

TUNING PID CONTROLLERS

Execution Environment

The goal of tuning a PID loop is to make it stable, responsive, and to minimize overshooting

Ziegler-Nichols Method

1. Turn the PID controller into a P controller by setting Ki = Kd = 0. Initially, Kp is also set to zero

2. Increase Kp until there are sustained oscillations in the signal. This Kp value is the ultimate gain, Ku

3. Measure the ultimate (or critical) period Tu of the sustained oscillations
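
Once Ku and Tu have been measured, the classic Ziegler-Nichols rule for a full PID controller turns them into gains. The sketch below applies that textbook rule (Kp = 0.6 Ku, Ti = Tu/2, Td = Tu/8); the function name and the sample Ku and Tu values are hypothetical and are not the values measured in the experiments.

```python
def ziegler_nichols_pid(ku: float, tu: float) -> tuple[float, float, float]:
    """Classic Ziegler-Nichols PID gains from ultimate gain Ku and period Tu."""
    if ku <= 0 or tu <= 0:
        raise ValueError("ku and tu must be positive")
    kp = 0.6 * ku
    ki = 2.0 * kp / tu   # Kp / Ti with Ti = Tu / 2
    kd = kp * tu / 8.0   # Kp * Td with Td = Tu / 8
    return kp, ki, kd


# Hypothetical measurements: ultimate gain 2.0, oscillation period 4.0 steps.
kp, ki, kd = ziegler_nichols_pid(ku=2.0, tu=4.0)
```

The period Tu must be expressed in the same time unit the controller uses between samples, otherwise the integral and derivative gains are scaled incorrectly.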


Tuned gain parameters

Ziegler-Nichols tuning, using the oscillation method

Page 21: Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Workflows

TUNED GAIN PARAMETERS

[Figure: results with tuned gains, panels: Memory Usage, Data Footprint]

- Avg. Makespan: 107.37h
- Avg. Slowdown: 1.01
- Preempted Tasks: 18
- Cleanup Tasks: 1

The key factor in its success is the specialization of the controllers to a single application

Page 22: Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Workflows

SUMMARY

Conclusions
- Experimental results show that faults are detected and prevented before they occur, leading workflow execution to its completion with acceptable performance
- PID controllers should be used sparingly, and metrics (and actions) should be defined in a way that they do not lead the system to an inconsistent state

Future Research Directions
- We will investigate the simultaneous use of multiple control loops at the application and infrastructure levels, to determine to what extent this approach may negatively impact the system

Page 23: Using Simple PID Controllers to Prevent and Mitigate Faults in Scientific Workflows

USING SIMPLE PID CONTROLLERS TO PREVENT AND MITIGATE FAULTS IN SCIENTIFIC WORKFLOWS

Rafael Ferreira da Silva, Ph.D.
Research Assistant Professor
Department of Computer Science
University of Southern [email protected] – http://rafaelsilva.com

Thank You

Questions?

http://pegasus.isi.edu

