Learning Causal Models of Multivariate Systems and the Value of it for the Performance Modeling of Computer Programs. Jan Lemeire, December 19th 2007. Supervisor: Prof. dr. ir. Erik Dirkx

Learning Causal Models of Multivariate Systems

and the Value of it for the Performance Modeling of Computer Programs

Jan Lemeire

December 19th 2007

Supervisor: Prof. dr. ir. Erik Dirkx


2Causal Inference & Performance Analysis

Pag.Jan Lemeire / 49

Overview

Learning causal models for the performance analysis of programs executed on various computer systems.

Intermezzo I: Causal inference.

Practical deployment of the causal learning algorithms.

Philosophical and theoretical study of causal inference.

Intermezzo II: Kolmogorov Minimal Sufficient Statistics.

The importance of qualitative properties.


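What is Parallel Processing?

Ideally: Speedup = number of processors

[Figure: time axes for the computational work and for its execution on a parallel system, drawn as three CPU blocks, each paired with an M block and joined by an N block]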


Parallel Overhead

Speedup = 2.55

Overhead = time the processors are not spending on useful work = lost processor cycles


Overhead Analysis

$$\text{Speedup} = \frac{\text{number of processors}}{1 + \sum \text{overhead ratios}}$$

$$\text{Overhead ratio} = \frac{\text{overhead time}}{\text{runtime}}$$

Impact of overhead on speedup
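
The relation between overhead and speedup can be tried out numerically. The sketch below is only an illustration: the function names are hypothetical, and it assumes that each overhead ratio is taken relative to the sequential runtime and that the 2.55 speedup quoted earlier was measured on three processors (the three CPU blocks of the parallel-system figure).

```python
def speedup(processors, overhead_ratios):
    """Speedup = number of processors / (1 + sum of the overhead ratios)."""
    return processors / (1.0 + sum(overhead_ratios))


def total_overhead_ratio(processors, observed_speedup):
    """Invert the formula: total overhead ratio implied by a measured speedup."""
    return processors / observed_speedup - 1.0


# Without overhead the ideal speedup equals the number of processors.
print(speedup(3, []))
# Two overhead sources of 10% each (made-up values) lower the speedup.
print(speedup(3, [0.1, 0.1]))
# Overhead implied by a speedup of 2.55, assuming 3 processors.
print(round(total_overhead_ratio(3, 2.55), 3))
```

Splitting the total into one ratio per overhead source is what makes the formula useful for analysis: each term shows how much a single source costs in speedup.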


Experimental Parallel Performance Analysis: Data Acquisition

[Figure: data acquisition flow with the elements Parallel Program, EPPA instrumentation library, Executable, EPPA and EPPA Database]


EPDA: Multivariate Analysis

User-defined variables

Database

EPDA


Specify context

Modeling

Causal Model

Curve fitting

Analytical Model

CPT compression

Causal Inference

Derivatives of variables

Visualization

Outlier identification

Augmented Model


Intermezzo I: Causal Inference

System under study

Experiments

Data

              A   B   C     D      E
experiment 1  2   12  0.42  TRUE   blue
experiment 2  1   73  1.93  FALSE  green
experiment 3  4   8   0.03  TRUE   red
experiment 4  2   27  2.84  TRUE   black

Causal model

[Figure: the system under study and the learned causal model, both drawn as graphs over A, B, C, D and E]


Causal Inference for Performance Analysis

Utility based on the following properties:
1. Dependency analysis: how variables relate.
2. Markov property.
3. A causal model corresponds to a decomposition.

[Figure: two causal graphs linking PROGRAM to PERFORMANCE over the variables #op, fclock, #instrop, array size, cache misses, memory, element type, element size, Cinstr, Cmem and Tcomp; in the second graph the element size is marked "??"]


Execution of program gives cache misses

[Figure: the PROGRAM works on a data structure; two candidate causes are marked "x?": the datatype (integer, float, double, …) and the data size in bytes]


Markov Property

datatype → data size → cache misses

Datatype and cache misses: correlated

With information about the data size: datatype ⊥ cache misses | data size

Provides explanations

Differentiates direct from indirect relations
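
The screening-off effect can be reproduced on simulated data. This is a toy sketch, not the author's experiment: the element sizes are typical C sizes chosen for illustration, and the mechanism "cache misses = 100 × size + noise" is entirely hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed element sizes in bytes (typical C sizes, not taken from the slides).
SIZE_OF = {"integer": 4, "float": 4, "double": 8}


def simulate(n_per_type=3000):
    """Return (datatype, cache_misses) arrays for the chain type -> size -> misses."""
    types, misses = [], []
    for name, size in SIZE_OF.items():
        types += [name] * n_per_type
        # Hypothetical mechanism: the misses depend on the data size only.
        misses += list(100.0 * size + rng.normal(0.0, 10.0, n_per_type))
    return np.array(types), np.array(misses)


types, misses = simulate()
means = {name: float(misses[types == name].mean()) for name in SIZE_OF}

# Marginally, the datatype is informative about the cache misses ...
marginal_gap = abs(means["double"] - means["integer"])
# ... but among runs with the same data size (4 bytes) it adds nothing.
conditional_gap = abs(means["float"] - means["integer"])

print(f"double vs integer:            {marginal_gap:.1f}")
print(f"float vs integer (same size): {conditional_gap:.1f}")
```

The large first gap and the near-zero second gap are the signature of an indirect relation: the datatype matters only through the data size.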


Can we Observe Causal Relations?

cannon angle ~ distance

OK, but: cannon angle → distance or distance → cannon angle ???


What is Causality?

A causal relation denotes a mechanism: a variable is 'produced' by its causes.

However… not directly observable.

Bertrand Russell: "Causality is a relic of a bygone age"

Judea Pearl: "Mmmh"

But: we want to learn something about the underlying system (goal of statistics)


gunpowder ~ distance


V-structure Property

cannon angle → distance ← gunpowder

The angle is independent of the gunpowder, but the two become dependent when the distance is known.
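
A small simulation makes the v-structure pattern visible. The sketch assumes a linear toy mechanism (distance = angle + gunpowder + noise) and uses partial correlation as the conditional dependence measure; both are simplifications chosen for illustration, not the tests used in the thesis.

```python
import numpy as np


def partial_corr(x, y, z):
    """Correlation of x and y after linearly regressing z out of both."""
    rx = x - np.polyval(np.polyfit(z, x, 1), z)
    ry = y - np.polyval(np.polyfit(z, y, 1), z)
    return float(np.corrcoef(rx, ry)[0, 1])


rng = np.random.default_rng(1)
n = 20000
angle = rng.normal(size=n)        # chosen independently of the gunpowder
gunpowder = rng.normal(size=n)
# Hypothetical mechanism: both causes add up to the distance.
distance = angle + gunpowder + 0.1 * rng.normal(size=n)

marginal = float(np.corrcoef(angle, gunpowder)[0, 1])
given_distance = partial_corr(angle, gunpowder, distance)

print(f"corr(angle, gunpowder)            = {marginal:+.3f}")
print(f"corr(angle, gunpowder | distance) = {given_distance:+.3f}")
```

Once the distance is fixed, a larger angle has to be compensated by less gunpowder, which is why the conditional correlation is strongly negative while the marginal one is close to zero.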


Conditional Independencies Make Causal Inference Possible

From a causal structure follow conditional independencies, irrespective of the mechanisms:
– Markov
– V-structure

[Figure: causal graph over A, B, C, D and E, shown with and without the mechanisms mechC, mechD and mechE]


Graph is a Description of Independencies

Graphical criterion: d-separation
– Intuitive

Faithfulness property: independencies in graph ⇔ independencies in reality

[Figure: graph over A, B, C, D and E]
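
D-separation can be checked mechanically. The sketch below uses the standard moralised-ancestral-graph test; it assumes the five-node graph has the edges implied by the factorisation P(A)P(B)P(C|A,B)P(D|B)P(E|C,D) that appears later in the talk, and the function names are illustrative.

```python
from itertools import combinations

# Parents of each node, read off the factorisation (an assumption about the figure).
PARENTS = {"A": [], "B": [], "C": ["A", "B"], "D": ["B"], "E": ["C", "D"]}


def ancestors(nodes, parents):
    """Return the given nodes together with all of their ancestors."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def d_separated(x, y, given, parents=PARENTS):
    """True if x and y are d-separated by the set `given`."""
    keep = ancestors({x, y} | set(given), parents)
    # Moralise the ancestral graph: link parent-child and co-parents, drop directions.
    adj = {n: set() for n in keep}
    for child in keep:
        ps = parents[child]
        for p in ps:
            adj[child].add(p)
            adj[p].add(child)
        for p, q in combinations(ps, 2):
            adj[p].add(q)
            adj[q].add(p)
    # x and y are d-separated iff every undirected path passes through `given`.
    seen, stack = {x}, [x]
    while stack:
        for m in adj[stack.pop()]:
            if m == y:
                return False
            if m not in seen and m not in given:
                seen.add(m)
                stack.append(m)
    return True


print(d_separated("A", "B", set()))        # independent causes of C
print(d_separated("A", "B", {"C"}))        # conditioning on the common effect
print(d_separated("A", "E", {"C", "D"}))   # Markov: E screened off by its parents
```

Under the faithfulness property, every `True` answer corresponds to an independence that should also be found in the data, and every `False` to a dependence.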


Causal Structure Learning

In two steps:
1. Undirected graph
2. Orientation

[Figure: panels (a) to (f) showing the learning steps on a graph over A, B, C and D, annotated with the independence tests performed; the individual statements are not recoverable from the transcript]
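
The two steps can be sketched in the style of the PC algorithm. This is a minimal illustration under stated assumptions: the independence test is replaced by a hard-coded oracle for a hypothetical graph A → C ← B, C → D (not necessarily the graph of the figure), and only the v-structure orientation rule is implemented.

```python
from itertools import combinations

# Oracle: the independencies that hold in the assumed graph A -> C <- B, C -> D.
INDEPENDENCIES = {
    (frozenset("AB"), frozenset()),
    (frozenset("AD"), frozenset("C")),
    (frozenset("BD"), frozenset("C")),
}


def independent(x, y, cond):
    return (frozenset((x, y)), frozenset(cond)) in INDEPENDENCIES


def learn_skeleton(variables):
    """Step 1: start fully connected, remove an edge when a separating set is found."""
    adj = {v: set(variables) - {v} for v in variables}
    sepset, size = {}, 0
    while any(len(adj[x]) - 1 >= size for x in variables):
        for x in variables:
            for y in sorted(adj[x]):
                for cond in combinations(sorted(adj[x] - {y}), size):
                    if independent(x, y, cond):
                        adj[x].discard(y)
                        adj[y].discard(x)
                        sepset[frozenset((x, y))] = set(cond)
                        break
        size += 1
    return adj, sepset


def orient_v_structures(adj, sepset):
    """Step 2 (partial): orient x -> z <- y when z did not separate x and y."""
    directed = set()
    for z in adj:
        for x, y in combinations(sorted(adj[z]), 2):
            if y not in adj[x] and z not in sepset[frozenset((x, y))]:
                directed.add((x, z))
                directed.add((y, z))
    return directed


skeleton, sepsets = learn_skeleton(["A", "B", "C", "D"])
arrows = orient_v_structures(skeleton, sepsets)
print(sorted("".join(sorted((x, y))) for x in skeleton for y in skeleton[x] if x < y))
print(sorted(arrows))
```

The edge between C and D is left undirected by this sketch; additional orientation rules would be needed to direct it, and in general some edges stay undirected.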


Result

Partially directed acyclic graph

"We know what parts are unknown."

Faithfulness assumption: all independencies follow from the causal structure


Experimental Results

(1) Automatic learning of accurate performance models

(2) Model validation

(3) Identification of unexpected dependencies

(4) Explanations for outliers

Contribution 1

[Note: redo the figure as PNG, without lossless compression]


Practical Causal Inference

The following limitations had to be overcome:

Non-linear relations: form-free independence test

Mixture of continuous, discrete and categorical data: general independence test

Deterministic relations: augmented causal model and extended learning algorithms


Form-Free and General Dependency Test

Mutual information

Kernel density estimation

Example

Pearson: R_xy = 0.083 => no linear dependence between X and Y

I(X;Y) = 0.90 bits => dependent

[Figure: scatter plot of Y against X and the estimated density P(X, Y)]
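
The contrast between the two measures is easy to reproduce. The sketch below swaps the kernel density estimate for a plain 2-D histogram estimate of the mutual information, which is cruder but needs only NumPy; the data (Y = X² plus noise) and the bin count are illustrative assumptions, not the example of the slide.

```python
import numpy as np


def mutual_information_bits(x, y, bins=16):
    """Histogram estimate of I(X;Y) in bits (a stand-in for a KDE estimate)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of X, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of Y, shape (1, bins)
    nz = pxy > 0                          # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))


rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 5000)
y = x**2 + 0.05 * rng.normal(size=x.size)   # strong but non-linear relation

pearson = float(np.corrcoef(x, y)[0, 1])
mi = mutual_information_bits(x, y)

print(f"Pearson r = {pearson:+.3f}")
print(f"I(X;Y)   = {mi:.2f} bits")
```

Because the relation is symmetric around zero, the linear correlation vanishes while the mutual information stays clearly positive, which is why a form-free test is needed for non-linear performance data.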


Deterministic Relations

datatype → data size → cache misses

Data size and data type are information equivalent with respect to cache misses.

During learning, connect the least complex relation.


Complexity Criterion

Correct models are learned under the Complexity Increase Assumption

Contribution 2a

X → Y → Z, with mechanism mech1 for X → Y and mech2 for Y → Z

Complexity( X – Z ) ≥ Complexity( X – Y )

Complexity( X – Z ) ≥ Complexity( Y – Z )


Reestablishment of Faithfulness

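Consequences are considered

Information equivalences

Independence and simplicity

D-separation extension

Faithful model: represents all independencies

Contribution 2b

X and Y are information equivalent for A

Information is added to the model

Basic information equivalences

[Figure: graph over X, Y, Z and A illustrating the information equivalence, with independence statements over X, Y, Z and a set S; the statements are not recoverable from the transcript]

[Note: this must be included!! Perhaps without the details?]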


Extension of PC Learning Algorithm

Detection of information equivalences

Among information equivalent relations, the simplest one is chosen

Orientation rules remain the same

Correct models are learned from data containing deterministic relations.

Contribution 2c


Inductive Inference

Occam's Razor (William of Ockham): "Among equivalent models choose the simplest one."

[Note: add the scientists' years]

BUT: Objective measure of complexity?

[Figure: example physical formulas of differing complexity, including $E = m c^2$; the other formulas are not recoverable from the transcript]


Kolmogorov Complexity

Andrey Kolmogorov

PROGRAM: REPEAT 11 TIMES PRINT "001"

Universal Turing Machine

Output: 001001001001001001001001001001001

Kolmogorov Complexity of a binary string: the length of the shortest program that computes the string and halts


Shortest Programs

001001001001001001001001001001001

PROGRAM: REPEAT 11 TIMES PRINT "001"

regularity of repetition allows compression

011000110101101010111001001101000

PROGRAM: PRINT "011000110101101010111001001101000"

random information = incompressible
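
Kolmogorov complexity itself is not computable, but an off-the-shelf compressor gives a rough upper-bound proxy. The sketch below is only an illustration: it uses `zlib` and, because a compressor's fixed overhead dominates on 33-character strings, longer strings of the same two kinds.

```python
import random
import zlib


def compressed_size(bits: str) -> int:
    """Length in bytes of the zlib-compressed string (a proxy, not the true complexity)."""
    return len(zlib.compress(bits.encode("ascii"), 9))


# A regular string: "001" repeated, as in the slide but much longer.
regular = "001" * 1000

# An irregular string of the same length, drawn from a seeded pseudo-random generator.
rng = random.Random(0)
irregular = "".join(rng.choice("01") for _ in range(len(regular)))

print(len(regular), compressed_size(regular))
print(len(irregular), compressed_size(irregular))
```

The regular string shrinks to a few dozen bytes, while the irregular one cannot go far below one bit per character. Strictly speaking the pseudo-random string has a short description too (the generator plus its seed), which is a reminder that a practical compressor only finds some regularities.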


Randomness versus Regularity

001001001001001001001001001001001

Meaningful information (regularities): repetition 11 times, 001

011000110101101010111001001101000

Only random information (incompressible)

Kolmogorov Minimal Sufficient Statistics (KMSS): formal separation of meaningful information (regularities) from accidental information (randomness)


Learning = finding regularities = maximal compression

Example: the structure of a diamond is a regularity; its exact size is random.


Meaningful Information of Probability Distributions

Joint Probability Distribution

$P(A, B, C, D, E) = P(A)\,P(B)\,P(C \mid A, B)\,P(D \mid B)\,P(E \mid C, D)$

GRAPH (over A, B, C, D and E) + CPDs

meaningful information (Theorem 1)

Kolmogorov Minimal Sufficient Statistic if graph and CPDs are incompressible (Theorem 2)

a graph with random CPDs is faithful (Theorem 4)

Contribution 3a
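
One way to see why the graph carries information is to count parameters. The sketch assumes all five variables are binary (an assumption made only for this illustration) and compares the CPDs of the factorisation with a full joint table.

```python
# Parents of each variable, as given by the factorisation above.
PARENTS = {"A": [], "B": [], "C": ["A", "B"], "D": ["B"], "E": ["C", "D"]}


def cpd_parameters(parents, states=2):
    """Free parameters of all CPDs: (states - 1) per parent configuration."""
    return sum((states - 1) * states ** len(ps) for ps in parents.values())


def joint_table_parameters(n_variables, states=2):
    """Free parameters of an unrestricted joint probability table."""
    return states ** n_variables - 1


factorised = cpd_parameters(PARENTS)
full = joint_table_parameters(len(PARENTS))
print(factorised, full)
```

The factorised description needs fewer numbers than the full table, so the independencies encoded by the graph act as a compression of the distribution. With a complete graph the two counts coincide and nothing is gained.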


Causal Aspect of Causal Models = Decomposition

Canonical decomposition: quasi-unique and minimal decomposition into atomic and independent components (the CPDs)

Corresponds to reality (mechanisms)

[Figure: the causal graph over A, B, C, D and E next to the same system drawn with the mechanisms mechC, mechD and mechE]


Causal Component Relies on Reductionism

When the DAG of a Bayesian network is a complete graph: no meaningful information, holism

The world can be studied in parts.

Or, even more:

The world is made up of indivisible parts.

[Note: add a small figure of holism and reductionism]

[Figure: graph over A, B, C, D and E]


Validity of Causal Inference

How OK is the learned causal model?

Minimal model? Faithful? Other regularities?

Do CPD components correspond to physical mechanisms?

Contribution 3b

[Figure: counterexamples from literature, numbered 1 to 8, classified under the outcomes "Conforms to reality", "Unfaithful", "Other regularities", "Wrong decomposition" and "Causal model ≠ minimal model"; the assignment of numbers to outcomes is not recoverable from the transcript]


Well-known Example of Unfaithfulness

[Figure: graph over A, B, C and D in which A influences D along two paths, labelled 1 and 2]

'Normally': A and D correlate

A and D become independent (A ⊥ D) if the influences along paths 1 and 2 cancel each other out

Mechanisms are related

Regularity among them
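
The cancellation can be simulated with a linear toy model. The sketch assumes the graph is A → B → D and A → C → D and picks path coefficients of +1 and −1 so that the two influences cancel exactly; the coefficients and noise levels are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000

a = rng.normal(size=n)
b = a + 0.5 * rng.normal(size=n)             # A -> B
c = a + 0.5 * rng.normal(size=n)             # A -> C
# Path 1 (via B) has weight +1, path 2 (via C) has weight -1: they cancel.
d = 1.0 * b - 1.0 * c + 0.5 * rng.normal(size=n)

corr_ab = float(np.corrcoef(a, b)[0, 1])
corr_bd = float(np.corrcoef(b, d)[0, 1])
corr_ad = float(np.corrcoef(a, d)[0, 1])

print(f"corr(A, B) = {corr_ab:+.3f}")
print(f"corr(B, D) = {corr_bd:+.3f}")
print(f"corr(A, D) = {corr_ad:+.3f}")
```

A and D come out uncorrelated although A causes D along both paths, so an algorithm that relies on faithfulness would wrongly separate them. The cancellation only happens because the two coefficients are tuned to each other, which is the regularity among the mechanisms mentioned above.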


Regularities are Qualitative Properties

Different from quantitative information.

Allow for qualitative reasoning.

Qualitative properties determine behavior.


Communication Schemes on Network Topologies

[Figure: a communication scheme among nodes 1 to 8 and a network topology connecting nodes 1 to 8]

Communication time?


Generic Performance Model

Good predictions for combinations of random schemes and random topologies

Contribution 4a

[Figure: combinations of Scheme1, Scheme2 and Scheme3 with Topology1, Topology2 and Topology3 are fed to the model, which gives Tcomm]


Combinations of Patterns

Communication Schemes: broadcast, shift

Network Topologies: star, ring

Performance depends on the match!

Contribution 4b


Qualitative Properties

Faithfulness: "graph should describe all independencies"

KMSS: "model should describe all regularities"

Qualitative information: explicitly describes regularities

Quantitative information: contains no more regularities


Explicitly Mention Qualitative Properties!

[Figure: a stone together with coordinate pairs such as (12,61), (9,41), (19,24), (2,12) and (5,21); several pairs are repeated and some entries are marked "??"]


Conclusions

Contribution to performance analysis.

Automatic causal analysis.

Useful add-on in combination with other techniques.

The value of causal inference is underlined.

The importance of regularities or qualitative properties.


Future Work

Application of the learned performance models for optimization.

Is the failure of generic performance models only due to regularities?

Augment models with qualitative properties.

But: how to define, recognize and reason with regularities?

