Redundancy and synergy in dynamical systems


Synergy and redundancy in dynamical systems: towards a practical and operative definition

Daniele Marinazzo (1), Luca Faes (2), Sebastiano Stramaglia (3)

(1) Ghent University, Belgium

(2) Fondazione Bruno Kessler, Italy

(3) University of Bari and INFN, Italy

December 16, 2016

@dan marinazzo http://users.ugent.be/~dmarinaz/

Marinazzo, Faes, Stramaglia Synergy and redundancy in dynamical systems

Granger causality to recover dynamical networks

Context

Two time series X and Y

x, the future values of X

Operative definition, Wiener 1956, Granger 1969

Y is a cause of X if knowledge of Y allows more precise predictions about x

Context

GC in multivariate datasets: a well-known issue

Condition GC estimation on the effect of other variables, to avoid false positives

Several proposed approaches, starting from Geweke et al 1984

Granger causality: definition

Predictive model of a multivariate system

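n time series $\{x_\alpha(t)\}_{\alpha=1,\ldots,n}$, with state vectors

$$X_\alpha(t) = \left(x_\alpha(t-m), \ldots, x_\alpha(t-1)\right),$$

where m is the order of the model. $\epsilon(x_\alpha \mid Z)$ denotes the mean squared error of the prediction of $x_\alpha$ from the set of state vectors Z, and X is the set of all state vectors.

Conditioned Granger Causality

$$\delta_{mv}(\beta \to \alpha) = \log \frac{\epsilon(x_\alpha \mid X \setminus X_\beta)}{\epsilon(x_\alpha \mid X)}$$

Pairwise Granger Causality

$$\delta_{bv}(\beta \to \alpha) = \log \frac{\epsilon(x_\alpha \mid X_\alpha)}{\epsilon(x_\alpha \mid X_\alpha, X_\beta)}$$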
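The two definitions can be sketched numerically. The example below is a minimal sketch that assumes a linear least-squares predictor for the prediction error (the slides do not fix an estimator at this point). The helper names `lagged`, `pred_error` and `granger`, and the toy chain x0 → x1 → x2, are illustrative assumptions and not taken from the talk.

```python
import numpy as np

def lagged(x, m):
    """Rows are state vectors (x(t-m), ..., x(t-1)) for t = m, ..., T-1."""
    T = len(x)
    return np.column_stack([x[m - k:T - k] for k in range(m, 0, -1)])

def pred_error(y, blocks):
    """Mean squared residual of a linear least-squares fit of y (with intercept)."""
    A = np.column_stack([np.ones(len(y))] + blocks)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.mean((y - A @ coef) ** 2)

def granger(data, src, dst, m=1, conditioned=True):
    """Log-ratio GC from column src to column dst of data (shape T x n)."""
    n = data.shape[1]
    y = data[m:, dst]
    X = [lagged(data[:, k], m) for k in range(n)]
    if conditioned:   # delta_mv: X without X_src versus the full X
        reduced, full = [X[k] for k in range(n) if k != src], X
    else:             # delta_bv: X_dst versus (X_dst, X_src)
        reduced, full = [X[dst]], [X[dst], X[src]]
    return np.log(pred_error(y, reduced) / pred_error(y, full))

# Toy chain x0 -> x1 -> x2 with unit-variance Gaussian noise (assumed example)
rng = np.random.default_rng(0)
T = 5000
data = rng.standard_normal((T, 3))
for t in range(1, T):
    data[t, 1] += 0.8 * data[t - 1, 0]
    data[t, 2] += 0.8 * data[t - 1, 1]

gc_pairwise = granger(data, src=0, dst=2, m=2, conditioned=False)
gc_conditioned = granger(data, src=0, dst=2, m=2, conditioned=True)
print(gc_pairwise, gc_conditioned)
```

In this toy chain the pairwise value for 0 → 2 is clearly positive although the influence is only indirect, while the conditioned value is close to zero.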


Granger causality and Transfer entropy

GC and TE are equivalent for Gaussian variables and other quasi-Gaussian distributions (Barnett et al 2009, Hlavackova-Schindler 2011, Barnett and Bossomaier 2012)

In this case they both measure information transfer.

Unified approach (model based and model free)

Mathematically more tractable
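For jointly Gaussian variables the equivalence takes a simple form: the transfer entropy equals half the log-ratio Granger causality. Written in the notation of these slides, with $T_{\beta \to \alpha}$ as a symbol introduced here for the transfer entropy:

```latex
T_{\beta \to \alpha}
  = \frac{1}{2}\,\log \frac{\epsilon(x_\alpha \mid X_\alpha)}{\epsilon(x_\alpha \mid X_\alpha, X_\beta)}
  = \frac{1}{2}\,\delta_{bv}(\beta \to \alpha)
```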

False positives in pairwise GC

Ten unidirectionally coupled noisy logistic maps, with

$$x_1(t) = f(x_1(t-1)) + 0.01\,\eta_1(t),$$

$$x_i(t) = (1-\rho)\, f(x_i(t-1)) + \rho\, f(x_{i-1}(t-1)) + 0.01\,\eta_i(t), \quad i = 2, \ldots, 10,$$

where η are Gaussian noise terms, ρ is the coupling, and $f(x) = 1 - 1.8x^2$

Stramaglia, Cortes and Marinazzo, New Journal of Physics 2014
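The chain of maps above can be simulated in a few lines. This is a minimal sketch: the coupling value `rho=0.2`, the series length, the random initial condition and the function name `simulate_chain` are assumptions made for illustration; only the update rule comes from the slide.

```python
import numpy as np

def f(x):
    """Logistic map used on the slide."""
    return 1.0 - 1.8 * x ** 2

def simulate_chain(n=10, T=2000, rho=0.2, noise=0.01, seed=0):
    """Unidirectionally coupled noisy logistic maps; returns an array of shape (T, n)."""
    rng = np.random.default_rng(seed)
    x = np.empty((T, n))
    x[0] = rng.uniform(-0.5, 0.5, size=n)   # assumed initial condition
    for t in range(1, T):
        fx = f(x[t - 1])
        eta = rng.standard_normal(n)
        x[t, 0] = fx[0] + noise * eta[0]                              # autonomous first map
        x[t, 1:] = (1 - rho) * fx[1:] + rho * fx[:-1] + noise * eta[1:]  # map i driven by map i-1
    return x

x = simulate_chain()
print(x.shape)
```

Each column i > 0 is driven only by column i - 1, so any pairwise link found between non-adjacent columns is an indirect one.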

False negatives in pairwise GC due to synergy

Three unit variance iid Gaussian noise terms x1, x2 and x3. Let

$$x_4(t) = 0.1\,\left(x_1(t-1) + x_2(t-1)\right) + \rho\, x_2(t-1)\, x_3(t-1) + 0.1\,\eta(t).$$

x2 is a suppressor variable for x3 w.r.t. the influence on x4
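The false negative can be reproduced with a small simulation. The sketch below assumes a degree-2 polynomial regression as a stand-in for a nonlinear GC estimator (the slide does not say which estimator was used), and the value ρ = 0.5 and the helper names are illustrative choices.

```python
import numpy as np
from itertools import combinations_with_replacement

def quad_features(Z):
    """Intercept, linear terms and all degree-2 monomials of the columns of Z."""
    d = Z.shape[1]
    cols = [np.ones(len(Z))] + [Z[:, i] for i in range(d)]
    cols += [Z[:, i] * Z[:, j] for i, j in combinations_with_replacement(range(d), 2)]
    return np.column_stack(cols)

def pred_error(y, Z):
    """Mean squared residual of a least-squares fit on quadratic features."""
    A = quad_features(Z)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.mean((y - A @ coef) ** 2)

def past(*series):
    """Lag-1 state vectors: one column x(t-1) per series."""
    return np.column_stack([s[:-1] for s in series])

rng = np.random.default_rng(0)
T, rho = 5000, 0.5                       # rho is an assumed value
x1, x2, x3 = rng.standard_normal((3, T))
x4 = np.zeros(T)
x4[1:] = 0.1 * (x1[:-1] + x2[:-1]) + rho * x2[:-1] * x3[:-1] + 0.1 * rng.standard_normal(T - 1)

y = x4[1:]
gc_pairwise = np.log(pred_error(y, past(x4)) / pred_error(y, past(x4, x3)))
gc_conditioned = np.log(pred_error(y, past(x4, x1, x2)) / pred_error(y, past(x4, x1, x2, x3)))
print(gc_pairwise, gc_conditioned)
```

Taken alone, x3 is uncorrelated with x4, so the pairwise value stays near zero; once x2 is in the model, the product term becomes predictable and the conditioned value is large.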

Redundancy due to a hidden source

h(t) hidden Gaussian variable, influencing n variables $x_i(t) = h(t-1) + s\,\eta_i(t)$, and $w(t) = h(t-2) + s\,\eta_0(t)$, influenced by h but with a larger delay; s is the noise level.
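A small simulation shows how this redundancy affects the conditioned measure. The sketch assumes an iid unit-variance h, a linear least-squares predictor with lag 1, and s = 0.5; the function names are hypothetical.

```python
import numpy as np

def pred_error(y, Z):
    """Mean squared residual of a linear least-squares fit of y (with intercept)."""
    A = np.column_stack([np.ones(len(y)), Z])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.mean((y - A @ coef) ** 2)

def simulate(n, T=5000, s=0.5, seed=0):
    """x_i(t) = h(t-1) + s*eta_i(t) and w(t) = h(t-2) + s*eta_0(t)."""
    rng = np.random.default_rng(seed)
    h = rng.standard_normal(T)           # assumed iid hidden source
    x = np.zeros((T, n))
    w = np.zeros(T)
    x[1:] = h[:-1, None] + s * rng.standard_normal((T - 1, n))
    w[2:] = h[:-2] + s * rng.standard_normal(T - 2)
    return x, w

def gc_to_w(x, w, conditioned):
    """GC from the first x variable to w, pairwise or conditioned on the other x_i."""
    y = w[3:]                            # skip the start-up samples
    w_past, x_past = w[2:-1, None], x[2:-1]
    others = x_past[:, 1:] if conditioned else x_past[:, :0]
    reduced = np.column_stack([w_past, others])
    full = np.column_stack([reduced, x_past[:, :1]])
    return np.log(pred_error(y, reduced) / pred_error(y, full))

x2, w2 = simulate(n=2)
x10, w10 = simulate(n=10)
gc_pw = gc_to_w(x10, w10, conditioned=False)
gc_c2 = gc_to_w(x2, w2, conditioned=True)
gc_c10 = gc_to_w(x10, w10, conditioned=True)
print(gc_pw, gc_c2, gc_c10)
```

Under these assumptions the pairwise value does not depend on n, whereas the conditioned value shrinks as more copies of the hidden source enter the conditioning set.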

Redundancy due to synchronization

Multiplet of logistic maps $\{x_i\}$, $i = 1, \ldots, 4$:

$$x_i(t) = (1-\rho)\, f(x_i(t-1)) + \rho \sum_{j=1,\, j \neq i}^{4} f(x_j(t-1)) + 0.01\,\eta_i(t),$$

and

$$x_5(t) = \sum_{i=1}^{4} \frac{x_i(t-1)}{8} + \eta_5(t),$$

where η are unit variance Gaussian noise terms and ρ is the coupling.

[Figure panels: multiplet to x5; multiplet to multiplet]

Partial conditioning

Conditioned Granger Causality (CGC)

$$\delta_{mv}(\beta \to \alpha) = \log \frac{\epsilon(x_\alpha \mid X \setminus X_\beta)}{\epsilon(x_\alpha \mid X)}$$

Pairwise Granger Causality (PWGC)

$$\delta_{bv}(\beta \to \alpha) = \log \frac{\epsilon(x_\alpha \mid X_\alpha)}{\epsilon(x_\alpha \mid X_\alpha, X_\beta)}$$

Partially conditioned Granger causality (PCGC)

$$\delta^{Y}_{c}(\beta \to \alpha) = \log \frac{\epsilon(x_\alpha \mid X_\alpha, Y)}{\epsilon(x_\alpha \mid X_\alpha, X_\beta, Y)}$$

Fix a subset Y of the variables in X, excluding Xα and Xβ

Partially conditioned Granger causality

Strategy 1, Information-Based (IB)

Y maximizes the mutual information $I\{X_\beta; Y\}$ among all the subsets of $n_d$ variables

Strategy 2, Pairwise-Based (PB)

Select $Y = \{X_\gamma\}_{\gamma=1}^{n_d}$ as the $n_d$ variables with the maximal pairwise GC $\delta_{bv}(\gamma \to \alpha)$ w.r.t. that target node, excluding Xβ

Information-based partial conditioning

Given the previous set $Y_{k-1}$, the set $Y_k$ is obtained by adding the variable with the greatest information gain

This is repeated until nd variables are selected

Marinazzo et al. Comput. Math. Methods Med. 2012, Wu et al. Brain Connectivity 2013
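The greedy selection can be sketched as follows. This is a minimal illustration that assumes Gaussian variables (so that mutual information follows from covariance determinants) and uses instantaneous values in place of full state vectors; the names `gaussian_mi` and `select_conditioning` and the toy data are hypothetical.

```python
import numpy as np

def gaussian_mi(A, B):
    """Mutual information (nats) between column blocks A and B, Gaussian assumption."""
    C = np.cov(np.column_stack([A, B]), rowvar=False)
    a = A.shape[1]
    _, ld_a = np.linalg.slogdet(C[:a, :a])
    _, ld_b = np.linalg.slogdet(C[a:, a:])
    _, ld_ab = np.linalg.slogdet(C)
    return 0.5 * (ld_a + ld_b - ld_ab)

def select_conditioning(data, src, dst, nd):
    """Greedily pick nd columns Y maximizing I(X_src; Y), excluding src and dst."""
    chosen = []
    candidates = [k for k in range(data.shape[1]) if k not in (src, dst)]
    for _ in range(nd):
        # information carried about the driver by the current set plus each candidate
        gains = {k: gaussian_mi(data[:, [src]], data[:, chosen + [k]]) for k in candidates}
        best = max(gains, key=gains.get)
        chosen.append(best)
        candidates.remove(best)
    return chosen

rng = np.random.default_rng(0)
N = 5000
driver = rng.standard_normal(N)
data = np.column_stack([
    driver,                                  # 0: driver X_beta
    rng.standard_normal(N),                  # 1: target, excluded from Y
    driver + 0.5 * rng.standard_normal(N),   # 2: strongly related to the driver
    driver + 1.0 * rng.standard_normal(N),   # 3: weakly related to the driver
    rng.standard_normal(N),                  # 4: unrelated
])
chosen = select_conditioning(data, src=0, dst=1, nd=2)
print(chosen)
```

The greedy loop avoids the exhaustive search over all subsets of nd variables, at the cost of not guaranteeing the global maximum.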

Pairwise-based conditioned Granger causality

CGC performs poorly in presence of redundancy

Partial conditioning does not solve redundancy

Information about redundancy can be extracted from PWGC

Proposed approach

Some links inferred from PWGC are retained and added to those obtained by CGC

The PWGC links that are discarded are those that can be derived as indirect links from the CGC pattern
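One possible reading of this merging step is sketched below. It assumes that an "indirect link" means a directed path in the CGC graph, and that both analyses have already been thresholded into boolean adjacency matrices with entry [i, j] meaning i → j; the function name `merge_links` and this interpretation are assumptions, not the authors' stated procedure.

```python
import numpy as np

def merge_links(cgc, pwgc):
    """Keep all CGC links, plus PWGC links not explained by a directed CGC path."""
    cgc = np.asarray(cgc, dtype=bool)
    pwgc = np.asarray(pwgc, dtype=bool)
    reach = cgc.copy()
    for k in range(len(reach)):
        # Warshall step: i reaches j if i reaches k and k reaches j
        reach |= reach[:, [k]] & reach[[k], :]
    return cgc | (pwgc & ~reach)

# Toy pattern: CGC finds the chain 0 -> 1 -> 2 only
cgc = np.zeros((4, 4), dtype=bool)
cgc[0, 1] = cgc[1, 2] = True
# PWGC additionally finds 0 -> 2 (indirect) and 3 -> 2 (missed by CGC)
pwgc = cgc.copy()
pwgc[0, 2] = True
pwgc[3, 2] = True

merged = merge_links(cgc, pwgc)
print(np.argwhere(merged))
```

In the toy pattern the link 0 → 2 is dropped because the chain already explains it, while 3 → 2 is kept as a candidate redundant link.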

Interim summary on partial conditioning

Synergy

The search for synergetic contributions in information flow is equivalent to the search for suppressors

PWGC bad, CGC ok, PCGC even better if the selection strategy succeeds in picking the suppressors

Information-based PCGC better with redundancy

Pruning-based PCGC better in tree-like structures

Redundancy

Bad for CGC, and not solvable

Indirect connections of CGC from PWGC links

Links not explained as indirect connections (redundant) are merged into CGC

Synergy and redundancy

Pairwise information measures are commonly agreed upon (e.g. mutual information)

Shannon’s information theory does not fit multivariate information measures dealing with the notions of synergy and redundancy (Williams, Beer, Lizier, Wibral, Faes, Barrett)

All the proposed partial information decompositions, in the Gaussian case, lead to the following (undesirable) results: (i) redundancy is the minimum of MI between the target and each source; (ii) synergy is the extra information provided by the weaker source when the stronger source is known (Barrett, PRE 2015)

Joint information

Let’s go for an operative and practical definition

Relation (B and C) → A

synergy: (B and C) contributes to A with more information than the sum of its variables

redundancy: (B and C) contributes to A with less information than the sum of its variables

Joint information

Generalization of GC for sets of driving variables

Conditioned Granger Causality in a multivariate system

$$\delta_X(B \to \alpha) = \log \frac{\epsilon(x_\alpha \mid X \setminus B)}{\epsilon(x_\alpha \mid X)}$$

Unnormalized version

$$\delta^u_X(B \to \alpha) = \epsilon(x_\alpha \mid X \setminus B) - \epsilon(x_\alpha \mid X)$$

An interesting property

If $\{X_\beta\}_{\beta \in B}$ are statistically independent and their contributions in the model for $x_\alpha$ are additive, then

$$\delta^u_X(B \to \alpha) = \sum_{\beta \in B} \delta^u_X(\beta \to \alpha).$$

We remark that this property holds neither for the standard definition of Granger causality nor for entropy-rooted quantities, due to the presence of the logarithm.
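The additivity property can be checked on a toy model. The sketch assumes two independent Gaussian drivers entering the target linearly, a linear least-squares predictor with lag 1, and arbitrary coefficients a = 0.5, b = 0.7 and noise level 0.5; all names are illustrative.

```python
import numpy as np

def pred_error(y, Z):
    """Mean squared residual of a linear least-squares fit of y (with intercept)."""
    A = np.column_stack([np.ones(len(y)), Z])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.mean((y - A @ coef) ** 2)

rng = np.random.default_rng(0)
T, a, b, sigma = 20000, 0.5, 0.7, 0.5        # assumed parameters
x1, x2 = rng.standard_normal((2, T))          # independent drivers
xa = np.zeros(T)
xa[1:] = a * x1[:-1] + b * x2[:-1] + sigma * rng.standard_normal(T - 1)

y = xa[1:]
P = {"a": xa[:-1], "1": x1[:-1], "2": x2[:-1]}   # lag-1 state vectors

def eps(*names):
    """Prediction error of the target given the named state vectors."""
    return pred_error(y, np.column_stack([P[k] for k in names]))

e_full = eps("a", "1", "2")
du_B = eps("a") - e_full          # unnormalized GC of the pair B = {1, 2}
du_1 = eps("a", "2") - e_full     # unnormalized GC of driver 1
du_2 = eps("a", "1") - e_full     # unnormalized GC of driver 2

log_B = np.log(eps("a") / e_full)
log_sum = np.log(eps("a", "2") / e_full) + np.log(eps("a", "1") / e_full)
print(du_B, du_1 + du_2, log_B, log_sum)
```

Up to sampling fluctuations the unnormalized values add up, while the log-ratio values do not.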


Question from the audience:

What does it even mean to have an unnormalized measure of Granger causality?

Don’t you lose any link with information?


Define synergy and redundancy in this framework

Synergy

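$$\delta^u_X(B \to \alpha) > \sum_{\beta \in B} \delta^u_{X \setminus B, \beta}(\beta \to \alpha)$$

Redundancy

$$\delta^u_X(B \to \alpha) < \sum_{\beta \in B} \delta^u_{X \setminus B, \beta}(\beta \to \alpha)$$

Pairwise syn/red index

$$\psi_\alpha(i, j) = \delta^u_{X \setminus j}(i \to \alpha) - \delta^u_X(i \to \alpha) = \delta^u_X(\{i, j\} \to \alpha) - \delta^u_X(i \to \alpha) - \delta^u_X(j \to \alpha)$$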

Stramaglia et al. IEEE Trans Biomed. Eng. 2016
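The pairwise index can be evaluated on two toy systems. The sketch assumes a linear least-squares predictor with lag 1 and a system made only of the target and the two drivers; both toy systems and all names are illustrative. Reading the first form of the definition, ψ should come out positive when driver i is more informative with j left out of the model (redundancy-like) and negative when i becomes informative only together with j (synergy-like).

```python
import numpy as np

def pred_error(y, Z):
    """Mean squared residual of a linear least-squares fit of y (with intercept)."""
    A = np.column_stack([np.ones(len(y)), Z])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.mean((y - A @ coef) ** 2)

def psi(target, xi, xj):
    """psi = eps(X without {i,j}) - eps(X without j) - eps(X without i) + eps(X)."""
    y, own = target[1:], target[:-1]
    pi, pj = xi[:-1], xj[:-1]
    e_none = pred_error(y, np.column_stack([own]))
    e_i = pred_error(y, np.column_stack([own, pi]))      # X without j
    e_j = pred_error(y, np.column_stack([own, pj]))      # X without i
    e_ij = pred_error(y, np.column_stack([own, pi, pj]))
    return e_none - e_i - e_j + e_ij

rng = np.random.default_rng(0)
T = 5000

# Redundant pair: two noisy copies of the same source h drive the target
h = rng.standard_normal(T)
xi = h + 0.5 * rng.standard_normal(T)
xj = h + 0.5 * rng.standard_normal(T)
target = np.zeros(T)
target[1:] = h[:-1] + 0.1 * rng.standard_normal(T - 1)
psi_red = psi(target, xi, xj)

# Synergetic pair: xj is a suppressor that removes the nuisance term from xi
sig, nuis = rng.standard_normal(T), 2.0 * rng.standard_normal(T)
xi, xj = sig + nuis, nuis
target = np.zeros(T)
target[1:] = sig[:-1] + 0.1 * rng.standard_normal(T - 1)
psi_syn = psi(target, xi, xj)
print(psi_red, psi_syn)
```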


ψ as cumulant expansion of the prediction error

$$\epsilon(x_\alpha \mid X_\alpha) - \epsilon(x_\alpha \mid X) = \sum_{B \subset X} S(B)$$

The Moebius inversion formula allows S(B) to be reconstructed. Calling $|n_B|$ and $|n_\Gamma|$ the number of variables in the subsets B and Γ respectively, and exploiting also the relation

$$\sum_{\Gamma \subset B} (-1)^{|n_\Gamma|} = 0,$$

leads to the cumulant expansion

$$S(B) = \sum_{\Gamma \subset B} (-1)^{|n_B| + |n_\Gamma|}\, \delta^u_B(\Gamma \to \alpha).$$

The first order cumulant is then

$$S(i) = \delta^u_i(i \to \alpha),$$

the second cumulant is

$$S(i, j) = \delta^u_{ij}(\{ij\} \to \alpha) - \delta^u_{ij}(i \to \alpha) - \delta^u_{ij}(j \to \alpha),$$

the third cumulant is

$$S(i, j, k) = \delta^u_{ijk}(\{ijk\} \to \alpha) - \delta^u_{ijk}(\{ij\} \to \alpha) - \delta^u_{ijk}(\{jk\} \to \alpha) - \delta^u_{ijk}(\{ik\} \to \alpha) + \delta^u_{ijk}(i \to \alpha) + \delta^u_{ijk}(j \to \alpha) + \delta^u_{ijk}(k \to \alpha), \tag{1}$$

and so on. The index ψ may then be seen as the order two cumulant of the expansion of the prediction error of the target variable.

Stramaglia et al. IEEE Trans Biomed. Eng. 2016

Predictive multivariate models

Faes et al. Phil. Trans. A 2016

Variance decomposition

Entropy decomposition

Overview

fMRI data, N=90, Human Connectome Project

Regions forming redundant and synergetic multiplets with a representative region (black)

[Figure panels: RED, SYN]

Hierarchical structure of synergy and redundancy networks

Stramaglia et al. IEEE Trans Biomed. Eng. 2016

Take-home message

With a wider applicability in sight, we advocate an intuitive rather than axiomatic view of partial information decomposition

We aim to detect the presence of redundant and synergetic multiplets rather than precisely measure synergy and redundancy

Variance decomposition is a viable alternative to entropy decomposition

Thanks

@dan marinazzo http://users.ugent.be/~dmarinaz/
