Old and New Methods in the Age of Big Data
Zoltán Somogyvári (1, 2)
(1) Department of Theory, Wigner Research Center for Physics of the Hungarian Academy of Sciences
(2) National Institute for Clinical Neuroscience
How do we find structures behind the data?
Transformation of time series into connections
How to find connection between data series?
The traditional method: correlation (more precisely, the linear correlation coefficient)
[Figure: exchange rate over days, USD vs GBP and 2*EUR vs GBP]
[Figure: USD vs GBP plotted against 2*EUR vs GBP; R = 0.6]
What does the correlation tell us? Problem 1: there can be a clear connection between the two time series while the correlation is 0, because the connection is non-linear.
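The linear correlation coefficient and Problem 1 can both be checked on a toy example. In this minimal Python sketch (the helper name `pearson_r` and the data are illustrative, not taken from the slides), `quadratic` is completely determined by `xs`, yet its correlation with `xs` is exactly zero because the dependence is symmetric rather than linear.

```python
import math


def pearson_r(x, y):
    """Linear (Pearson) correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length, at least 2 samples")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
linear = [2.0 * v + 1.0 for v in xs]   # linear dependence
quadratic = [v * v for v in xs]        # deterministic, but non-linear dependence

print(pearson_r(xs, linear))     # 1.0
print(pearson_r(xs, quadratic))  # 0.0: the connection is invisible to correlation
```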
Causality or common cause?
Does the temporal delay show us the direction of causality?
Cross-correlation function: correlation between delayed signals
Unfortunately not, because it assumes that we observe the two signals with the same delays.
[Figure: cross-correlation as a function of delay; one side marked "USD vs GBP leads", the other "EUR vs GBP leads"]
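The cross-correlation function, and the reason its peak cannot settle the direction of causality, can be sketched with synthetic data (all names and parameters here are hypothetical). A hidden driver `z` is observed twice, with delays of 1 and 4 samples. The cross-correlation peaks at a lag of 3, so `x` appears to lead `y`, although neither series influences the other.

```python
import math
import random


def pearson_r(x, y):
    """Linear (Pearson) correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cross_correlation(x, y, lag):
    """Correlation between x[n] and y[n + lag]; a peak at lag > 0 means x 'leads' y."""
    if lag >= 0:
        return pearson_r(x[: len(x) - lag], y[lag:])
    return pearson_r(x[-lag:], y[: len(y) + lag])


random.seed(0)
z = [random.gauss(0.0, 1.0) for _ in range(2000)]  # hidden common cause
# Two noisy observations of the same driver, recorded with delays of 1 and 4 samples
x = [z[n - 1] + 0.3 * random.gauss(0.0, 1.0) for n in range(10, 2000)]
y = [z[n - 4] + 0.3 * random.gauss(0.0, 1.0) for n in range(10, 2000)]

lags = range(-8, 9)
best_lag = max(lags, key=lambda k: cross_correlation(x, y, k))
print(best_lag)  # 3: x seems to lead y, although neither one drives the other
```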
Is there any way to infer causality?
Granger causality: the original idea came from Norbert Wiener
x → y if including the past values of x improves the prediction of y
Clive Granger, publication: 1969
Nobel Prize in Economic Sciences, 2003
Granger causality
Presumptions:
– Stationary processes
– Zero mean
– Uncorrelated Gaussian noise
– We have data on every important variable
Linear autoregression:
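The slide's autoregression equation did not survive extraction, so this sketch assumes the common bivariate form: a restricted model $y_n = \sum_{k=1}^{p} a_k y_{n-k} + \varepsilon_n$ and a full model $y_n = \sum_{k=1}^{p} c_k y_{n-k} + \sum_{k=1}^{p} b_k x_{n-k} + \eta_n$, both fitted by least squares without an intercept (the zero-mean presumption). The score $\ln(\mathrm{var}\,\varepsilon / \mathrm{var}\,\eta)$ is positive when past x improves the prediction of y. The model order `p = 2`, the helper names and the simulated data are assumptions for illustration, not the author's implementation.

```python
import math
import random


def solve(a, b):
    """Solve the square system a·w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def residual_variance(target, rows):
    """Least-squares fit of target on the regressor rows (no intercept); residual variance."""
    k = len(rows[0])
    ata = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    atb = [sum(row[i] * t for row, t in zip(rows, target)) for i in range(k)]
    w = solve(ata, atb)
    res = [t - sum(wi * ri for wi, ri in zip(w, row)) for row, t in zip(rows, target)]
    return sum(e * e for e in res) / len(res)


def granger(x, y, p=2):
    """ln(restricted / full residual variance); > 0 when past x improves the prediction of y."""
    target = y[p:]
    own = [[y[n - k] for k in range(1, p + 1)] for n in range(p, len(y))]
    both = [r + [x[n - k] for k in range(1, p + 1)] for r, n in zip(own, range(p, len(y)))]
    return math.log(residual_variance(target, own) / residual_variance(target, both))


random.seed(1)
N = 3000
x = [random.gauss(0.0, 1.0) for _ in range(N)]
y = [0.0] * N
for n in range(1, N):
    # hypothetical linear coupling: y is driven by the previous value of x
    y[n] = 0.5 * y[n - 1] + 0.8 * x[n - 1] + random.gauss(0.0, 1.0)

print(granger(x, y))  # clearly positive: x -> y
print(granger(y, x))  # close to zero: no y -> x
```

In practice the score is compared against a significance threshold (for example an F-test), which this sketch omits.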
Application – rat hippocampus
● Data:
– Local field potential
→ Microelectrode array
● 256 channels
● 20 kHz sampling frequency
● Information stream in the hippocampus
– Options:
→ State-dependent differences
● e.g. sleep vs. awake
→ Event-related
● Spike-related information transfer
Problems with Granger causality
Uses linear models (there are nonlinear extensions)
Assumes weak interactions (separability)
Unreliable results in case of circular causality
Has problems in deterministic (non-stochastic) cases
Convergent Cross Mapping: a new framework for causality analysis
A new, promising approach:
● Detection of circular causality
● Causality in nonlinear systems
● Deterministic (chaotic) systems
Science 338, 496 (2012)
The model system: The logistic map
A one-dimensional, discrete-time dynamical system implementing stretching and folding transformations:
$x_{n+1} = r_x x_n (1 - x_n)$
Depending on the parameter r, it can exhibit different dynamical behaviors, from a stable fixed point through periodic oscillations to chaos.
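The three regimes can be reproduced by iterating the map and counting how many distinct values the orbit visits after a transient. In this sketch the values r = 2.8, 3.2 and 3.8, the function name `logistic_tail` and the rounding to 6 decimals are illustrative choices: a single value indicates a stable fixed point, two values a period-2 oscillation, and many values chaos.

```python
def logistic_tail(r, x0=0.2, n_transient=1000, n_keep=200):
    """Iterate x_{n+1} = r x_n (1 - x_n) and return the values after a transient."""
    x = x0
    for _ in range(n_transient):
        x = r * x * (1.0 - x)
    tail = []
    for _ in range(n_keep):
        x = r * x * (1.0 - x)
        tail.append(x)
    return tail


for r in (2.8, 3.2, 3.8):
    # number of distinct values visited: 1 = fixed point, 2 = period-2 cycle, many = chaos
    distinct = len({round(v, 6) for v in logistic_tail(r)})
    print(r, distinct)
```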
Takens' time-delay embedding theorem
First coordinate: the data itself
Second coordinate: the data delayed by tau
Third coordinate: the data delayed by 2 tau
…
The trajectory reconstructed in the delay-embedding state space is topologically equivalent to the system's original trajectory in its real state space.
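The delay coordinates listed above can be built with a few lines of Python. This sketch (the function name `delay_embed` is hypothetical) returns one vector per time step, with the data itself first and then the values delayed by tau, 2 tau, and so on; here it embeds a logistic-map series with r = 3.8 in 3 dimensions.

```python
def delay_embed(series, dim=3, tau=1):
    """Delay vectors (s[n], s[n - tau], ..., s[n - (dim - 1) * tau]) for every valid n."""
    if dim < 1 or tau < 1:
        raise ValueError("dim and tau must be positive")
    start = (dim - 1) * tau
    return [tuple(series[n - k * tau] for k in range(dim)) for n in range(start, len(series))]


# Reconstruct a 3D state space from a scalar logistic-map series (r = 3.8, illustrative)
s = [0.2]
for _ in range(499):
    s.append(3.8 * s[-1] * (1.0 - s[-1]))
points = delay_embed(s, dim=3, tau=1)
print(len(points), points[0])
```

Choosing tau and the embedding dimension is a separate problem; the theorem only guarantees the equivalence when the dimension is large enough.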
Our model system: Two coupled logistic maps
$x_{n+1} = x_n (r_x (1 - x_n) + b_{yx} y_n)$, $y_{n+1} = y_n (r_y (1 - y_n) + b_{xy} x_n)$
$r_x = r_y = 3.8$, so both maps are in the chaotic regime.
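A direct simulation of the two coupled maps, with the coupling terms entering with the sign written on the slide. The function name, the couplings `b_yx = b_xy = 0.02` and the initial values are hypothetical. With such small couplings both variables stay inside (0, 1), whereas much larger ones can push a value above 1, after which the orbit escapes.

```python
def coupled_logistic(n, rx=3.8, ry=3.8, b_yx=0.02, b_xy=0.02, x0=0.4, y0=0.2):
    """Iterate the two coupled logistic maps for n steps (coupling sign as on the slide)."""
    xs, ys = [x0], [y0]
    for _ in range(n - 1):
        x, y = xs[-1], ys[-1]
        # both updates use the values of step n, i.e. a simultaneous update
        xs.append(x * (rx * (1.0 - x) + b_yx * y))
        ys.append(y * (ry * (1.0 - y) + b_xy * x))
    return xs, ys


xs, ys = coupled_logistic(1000)
print(min(xs), max(xs), min(ys), max(ys))
```

Setting one coupling to zero gives the unidirectional case, and setting both to zero recovers two independent logistic maps.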
Phase-space reconstruction based on delayed maps
Reconstructed state space from the first data series (embedding dimension 3)
Reconstructed state space from the second data series (embedding dimension 3)
Both datasets formed a 2D manifold in the 3D embedding space.
In the case of causal connections, the reconstructed manifolds should be topologically equivalent according to Takens' theorem.
But how can we test it?
Choose a point
Sugihara's method: Convergent Cross Mapping
Find its neighborhood
Let's do it for many points! If the neighbors in the first space are also neighbors in the second space, then the second variable is causal to the first one.
Find the same time points in the other state space
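The steps above (choose a point, find its neighborhood, find the same time points in the other state space, repeat for many points) can be sketched in pure Python. Assumptions: the neighborhood is the dim + 1 nearest delay vectors, weighted by exp(-d / d_min) as in simplex projection, and the cross-map skill is the correlation between the estimates and the true values; the function names and the couplings 0.08 and 0.02 are hypothetical. This illustrates the idea and is not Sugihara's reference implementation.

```python
import heapq
import math


def pearson_r(a, b):
    """Linear correlation coefficient of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def cross_map_skill(source, target, dim=3, tau=1):
    """Estimate `target` from the delay-embedded manifold of `source`.

    Returns the correlation between estimated and true target values. A high value
    suggests that `target` is causal to `source` (its trace is present in `source`).
    """
    start = (dim - 1) * tau
    times = list(range(start, len(source)))
    points = [[source[t - k * tau] for k in range(dim)] for t in times]
    estimates, actual = [], []
    for i, p in enumerate(points):
        # choose a point and find its dim + 1 nearest neighbours (the point itself excluded)
        nearest = heapq.nsmallest(
            dim + 1,
            ((math.dist(p, q), j) for j, q in enumerate(points) if j != i),
        )
        d_min = max(nearest[0][0], 1e-12)
        weights = [math.exp(-d / d_min) for d, _ in nearest]
        # look up the same time points in the other variable
        est = sum(w * target[times[j]] for w, (_, j) in zip(weights, nearest)) / sum(weights)
        estimates.append(est)
        actual.append(target[times[i]])
    # repeated for every point: how well do the estimates track the true values?
    return pearson_r(estimates, actual)


# Coupled logistic maps (coupling sign as on the slide; coupling values hypothetical)
x, y = [0.4], [0.2]
for _ in range(499):
    xn, yn = x[-1], y[-1]
    x.append(xn * (3.8 * (1.0 - xn) + 0.08 * yn))
    y.append(yn * (3.8 * (1.0 - yn) + 0.02 * xn))

print(cross_map_skill(x, y))  # neighbours on the x-manifold used to estimate y
```

Sugihara's method also checks that the skill grows (converges) as the library of points gets longer; this sketch always uses the full series.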
In the case of circular causality, the mapping should work in both directions.
Let us do it in the other direction!
The chosen point
The neighborhood
The mapping worked well in both directions! This is the sign of circular causality.
Mapping
Cross mapping in the case of unidirectional interactions
How can topological equivalence be an asymmetric relation?
Reconstructed state space from the first data series (embedding dimension 3)
Reconstructed state space from the second data series (embedding dimension 3)
While the first dataset formed a 2D manifold, the second dataset resulted in only a 1D manifold in the 3D embedding space!
$y_{n+1} = r_y y_n (1 - y_n)$, $x_{n+1} = x_n (r_x (1 - x_n) + b_{yx} y_n)$
Mapping works well in this direction
The mapping worked well from x to y but failed from y to x, showing that y is causal to x but x is not causal to y.
But spread out in the other direction!
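The asymmetry can be checked numerically on the unidirectional system. This sketch uses a nearest-neighbour cross mapping (dim + 1 neighbours, exp(-d / d_min) weights, correlation as skill, all assumptions rather than the author's exact procedure) and a hypothetical coupling `b_yx = 0.08`. The estimate of y from the x-manifold should score clearly higher than the estimate of x from the 1D y-manifold.

```python
import heapq
import math


def pearson_r(a, b):
    """Linear correlation coefficient of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def cross_map_skill(source, target, dim=3, tau=1):
    """Correlation between `target` and its estimate from the delay manifold of `source`."""
    times = list(range((dim - 1) * tau, len(source)))
    points = [[source[t - k * tau] for k in range(dim)] for t in times]
    estimates, actual = [], []
    for i, p in enumerate(points):
        # dim + 1 nearest neighbours of the chosen point, excluding the point itself
        nearest = heapq.nsmallest(
            dim + 1, ((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        )
        d_min = max(nearest[0][0], 1e-12)
        weights = [math.exp(-d / d_min) for d, _ in nearest]
        # weighted average of the other variable at the neighbours' time points
        est = sum(w * target[times[j]] for w, (_, j) in zip(weights, nearest)) / sum(weights)
        estimates.append(est)
        actual.append(target[times[i]])
    return pearson_r(estimates, actual)


# Unidirectional system: y evolves on its own and drives x (b_yx = 0.08 is hypothetical)
x, y = [0.4], [0.2]
for _ in range(499):
    xn, yn = x[-1], y[-1]
    x.append(xn * (3.8 * (1.0 - xn) + 0.08 * yn))
    y.append(3.8 * yn * (1.0 - yn))

x_to_y = cross_map_skill(x, y)  # estimate y from the x-manifold
y_to_x = cross_map_skill(y, x)  # estimate x from the y-manifold
print(x_to_y, y_to_x)
```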
MRI with implanted electrodes
4×8 channels in the grid plus 2×8 channels in two strip electrodes, 1024 Hz sampling
EEG signal of an epileptic seizure recorded on 48 channels
[Figure: EEG traces, electric potential (mV) vs. time (ms)]
The initiation of the seizure
[Figure: electric potential (mV) vs. time (ms)]
Connection dynamics during seizure
[Figure: causality, Right→Left and Left→Right]
This seizure appeared only in the right hippocampus.
It is clear that the right hippocampus has a large effect on the left hippocampus, while there is only a mild effect in the backward direction.
[Figure: LFP on the right FO and left FO electrodes vs. time (s)]
[Figure: LFP on the right FO and left FO electrodes; causality, Right→Left and Left→Right]
Although the seizure was more pronounced in the left hippocampus, the right hippocampus drove the left during the first period of the seizure; then a circular connection structure emerged.
Connection dynamics during a seizure
May help in surgical preparation