Outline
• Linear Prediction Filters
  – Uses
    • System Description
    • Time Evolution
  – Green's Functions
  – Construction
  – Filter Analysis
  – Examples
    • Dst-H
    • Dst-VBs
• Neural Networks
  – Uses
    • System Description
    • Time Evolution
  – Theory
  – Training
  – Network Analysis
  – Examples
    • Dst-H
    • Dst-VBs
Linear Prediction Filter Uses
• Map one Input (or many) to one Output
  – Convert Dst to Single Ground-station H
• Make a Forecast
  – Convert Solar Wind Measurements to Geomagnetic Indices
• Determine System Dynamics
  – Use Impulse Response to Determine Underlying Ordinary Differential Equation
What an LPF Looks Like
• An LPF can have an autoregressive (AR) part and a moving average (MA) part
• AR part describes internal dynamics
• MA part describes external dynamics
• An ambiguity between the AR and MA parts can occur when either is used separately
  O(t) = Σ_{i=1..N} a_i O(t − iΔt) + Σ_{j=0..M} b_j I(t − jΔt)
         [AR part]                   [MA part]
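As a concrete sketch (not from the original slides), the ARMA form above can be applied to a time series as follows; the helper name `arma_filter` and the sample coefficients are illustrative assumptions:

```python
import numpy as np

def arma_filter(I, a, b):
    """Apply O(t_k) = sum_i a_i O(t_{k-i}) + sum_j b_j I(t_{k-j}).

    a : AR coefficients a_1..a_N (internal dynamics)
    b : MA coefficients b_0..b_M (external driving)
    """
    O = np.zeros(len(I))
    for k in range(len(I)):
        # MA part: weighted sum of current and past inputs
        O[k] = sum(b[j] * I[k - j] for j in range(min(len(b) - 1, k) + 1))
        # AR part: weighted sum of past outputs
        O[k] += sum(a[i - 1] * O[k - i] for i in range(1, min(len(a), k) + 1))
    return O

# Pure MA example: a 3-point weighted average of a constant input
I = np.ones(10)
O = arma_filter(I, a=[], b=[0.5, 0.3, 0.2])
```

With an empty `a` this is a pure MA filter; with an empty `b` it would be pure AR.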
Other Names for LPFs
• Convolution Filter
• Moving Average
• Infinite Impulse Response Filter (IIR)
• Finite Impulse Response Filter (FIR)
• Recursive Filter
• ARMA Filter
• LPFs are a subset of Linear Filters which relate the Fourier spectra of two signals
MA Filters Are Green’s Functions
• Ordinary Differential Equations Can Be Solved with Green's Functions:

  dO/dt = −O/τ + I(t)

  dG/dt = −G/τ + δ(t),   G(t) = e^(−t/τ)

  O(t) = ∫_0^∞ I(t − t′) G(t′) dt′

• Discretizing the convolution integral gives the MA filter:

  O(t) = Σ_{j=0..M} b_j I(t − jΔt),   b_j = G(jΔt) Δt
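A small numerical sketch of this discretization (the values of τ, Δt, and M are arbitrary assumptions): sampling the Green's function gives the MA coefficients, and convolving a unit impulse through the filter recovers G itself.

```python
import numpy as np

tau, dt, M = 5.0, 1.0, 30
# MA coefficients sampled from the Green's function: b_j = G(j*dt)*dt, G(t) = exp(-t/tau)
b = np.exp(-np.arange(M + 1) * dt / tau) * dt

# Discrete convolution O(t) = sum_j b_j I(t - j*dt)
I = np.zeros(50)
I[0] = 1.0 / dt                       # unit impulse
O = np.convolve(I, b)[:len(I)]        # impulse response reproduces G(t)
```

The impulse response of an MA filter is (by construction) its own coefficient sequence, which is why MA coefficients can be read as a Green's function.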
AR Filters Are Differential Equations
• Ordinary Differential Equations Can Be Rewritten As AR Filters:

  dO/dt = −O/τ + I

  [O(t) − O(t − Δt)] / Δt = −O(t − Δt)/τ + I(t)

  O(t) = O(t − Δt)[1 − Δt/τ] + I(t) Δt
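The finite-difference step can be checked numerically; this sketch (with arbitrary τ and Δt) compares the AR(1) recursion against the exact exponential decay for zero input:

```python
import numpy as np

tau, dt, nsteps = 10.0, 0.01, 1000
a1 = 1.0 - dt / tau                  # AR coefficient from the discretized ODE
O = np.empty(nsteps)
O[0] = 1.0
for k in range(1, nsteps):           # zero input: O(t) = a1 * O(t - dt)
    O[k] = a1 * O[k - 1]

exact = np.exp(-(nsteps - 1) * dt / tau)   # analytic solution e^(-t/tau)
```

For Δt ≪ τ the recursion tracks the analytic solution closely; as Δt grows the discretization error grows.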
Determining LPF Coefficients
The a's and b's are found by solving an overdetermined matrix equation:

  O(t) = Σ_{i=1..N} a_i O(t − iΔt) + Σ_{j=0..M} b_j I(t − jΔt)

One row per sample time t_k:

  [ O(t_k − Δt) … O(t_k − NΔt)  I(t_k) … I(t_k − MΔt) ] [ a_1 … a_N  b_0 … b_M ]ᵀ = O(t_k)

Solved using Least Squared Error Optimization or Singular Value Decomposition
Often t_k = t_0 + kΔt
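A minimal sketch of this construction using NumPy's least-squares solver; the synthetic system and its coefficients (a_1 = 0.8, b_0 = 2.0) are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
a1_true, b0_true = 0.8, 2.0
I = rng.standard_normal(500)
O = np.zeros(500)
for k in range(1, 500):              # synthetic ARMA system with known coefficients
    O[k] = a1_true * O[k - 1] + b0_true * I[k]

# Each row k of the matrix: [O(t_k - dt), I(t_k)]; right-hand side: O(t_k)
A = np.column_stack([O[:-1], I[1:]])
coef, *_ = np.linalg.lstsq(A, O[1:], rcond=None)
a1_fit, b0_fit = coef                # least-squares estimate of a_1 and b_0
```

With noiseless synthetic data the coefficients are recovered essentially exactly; with real data the fit minimizes the squared residual over all rows.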
More on Linear Filters
• The Linear Filter formalism can be extended to multiple inputs
• There is an ambiguity between the MA and AR parts for certain kinds of differential equations. An ARMA filter can greatly simplify some MA filters
  O(t) = Σ_{j=0..M} b_j I(t − jΔt),   b_j ∝ e^(−jΔt/τ)

  O(t) = a_1 O(t − Δt) + I(t) Δt,   a_1 = e^(−Δt/τ)
Handling Data Gaps
• Missing Inputs
  – Omit intervals with data gaps, if possible
  – Interpolation over data gaps will smear out MA coefficients
• Missing Outputs
  – Omit intervals with data gaps
  – Interpolation over data gaps can ruin the AR coefficients
  – Less sensitive to interpolation in lag outputs
Linear Filter Analysis
• Once the filter coefficients are determined, we relate them to ODEs
• Exponential decay (b_j fall off exponentially with lag):

  dO/dt = −O/τ + I

• Recurrence (secondary peak in b_j at lag T):

  dO/dt = −O/τ + I(t) + c I(t − T)

• Derivative operator (adjacent b_j of opposite sign):

  dO/dt = −O/τ + dI/dt
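The first case can be inverted numerically: given MA coefficients that decay exponentially, the slope of log b_j vs lag recovers the decay time τ of the underlying ODE. The coefficient values below are assumed for illustration:

```python
import numpy as np

dt, tau_true = 1.0, 6.0
lags = np.arange(20)
b = np.exp(-lags * dt / tau_true)        # MA coefficients with exponential decay

# log(b_j) is linear in lag, with slope -dt/tau
slope, intercept = np.polyfit(lags, np.log(b), 1)
tau_fit = -dt / slope                    # recovered decay time
```

With noisy real coefficients the same fit would give an estimate of τ rather than an exact recovery.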
LPF Localization
• Localization adds Nonlinearity
  – Nonlocal LPFs cannot handle nonlinear dynamics
• Localize in Time
  – Continuously reconstruct LPF based only on most recent data
  – Can be very accurate but hard to interpret
• Localize in State Space
  – Construct different LPF for each region of state space
  – Can provide multivariate nonlinearity
LPF Example: Dst-H
• Dst is a weighted average of H measured at 4 stations around the globe.
• Depending on activity and the local time of a station, there is a different relationship between Dst and H
• We will localize this filter in 24 1-hour bins of local time
• The filter will be very simple: H(t) = b_0(lt) Dst(t)
  – lt is the local time of the station at time t
  – We solve this equation in each 1-hour bin of local time
• By plotting b_0 vs lt, we can infer the local current systems
  – This local current system is believed to be the partial ring current
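The binned one-point fit might look like the sketch below on synthetic data. The local-time dependence of b_0 here is an invented stand-in (constant within each 1-hour bin), not the measured partial-ring-current signature:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
lt = rng.uniform(0, 24, n)                    # station local time (hours)
Dst = -100 * rng.random(n) - 1.0              # synthetic Dst values
# Invented local-time dependence, constant within each 1-hour bin
b0_true = 1.0 + 0.2 * np.sin(2 * np.pi * np.floor(lt) / 24)
H = b0_true * Dst

# Solve H = b0(lt) * Dst separately in each 1-hour local-time bin
b0 = np.empty(24)
for h in range(24):
    m = (lt >= h) & (lt < h + 1)
    # one-point least-squares solution within the bin
    b0[h] = np.sum(H[m] * Dst[m]) / np.sum(Dst[m] ** 2)
```

Plotting `b0` against bin number is the analysis step described above.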
Partial Ring Current
• Localized Filter: H = b0(lt)Dst
• Dst is less intense than H near Dusk due to an enhanced Ring Current in the Dusk sector
[Figure: "HSJG-Dst 1-Point Filter by Local Time" — 1/b_0 (0.6 to 1.2) vs local time (0 to 24 h), with local dusk marked]
LPF Example: Dst-VBs
• The Ring Current (Dst) is largely driven by the solar wind
• One good “coupling function” for the solar wind driver is the interplanetary electric field (VBs)
• We construct a long MA filter:
  Dst(t) = Σ_{j=0..M} b_j VBs(t − jΔt)
Dst-VBs Filter Coefficients
• Note the roughly exponential decay
• The differential equation could be:

  dDst/dt = −Dst/τ + b VBs

• We can, therefore, build a trivial ARMA filter to do the same job:

  Dst(t) = b VBs(t − Δt) Δt + a_1 Dst(t − Δt)

[Figure: MA filter coefficients b_j vs lag (−5 to 45 hours), showing roughly exponential decay]
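The MA/ARMA equivalence can be verified numerically. The values of τ, b, and the synthetic VBs series below are assumptions for illustration; the ARMA version here injects VBs(t) rather than VBs(t − Δt), which only shifts the output by one step:

```python
import numpy as np

rng = np.random.default_rng(2)
dt, tau, b = 1.0, 7.0, -4.0
VBs = np.abs(rng.standard_normal(200))        # synthetic non-negative driver

# Long MA filter with exponentially decaying coefficients
M = 150
w = b * np.exp(-np.arange(M + 1) * dt / tau) * dt
Dst_ma = np.convolve(VBs, w)[:len(VBs)]

# Trivial ARMA filter doing the same job with two coefficients
a1 = np.exp(-dt / tau)
Dst_arma = np.zeros(len(VBs))
for k in range(len(VBs)):
    prev = Dst_arma[k - 1] if k > 0 else 0.0
    Dst_arma[k] = a1 * prev + b * VBs[k] * dt
```

The two outputs agree to within the truncation error of the finite MA filter, which is the sense in which an ARMA filter "greatly simplifies" a long MA filter.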
Neural Network Uses
• Map multiple Inputs to one Output (or many)
  – Excellent nonlinear interpolation but not reliable for extrapolation
• Make a forecast
  – Excellent for forecasting complex phenomena
• Determine System Dynamics
  – NN is a black box model of the system
  – Run NN on simulated data to isolate system response to individual inputs
• Many exotic NNs exist to perform other tasks
NN Theory
• Based on biological neural systems
  – Biological NN: composed of connections between individual neurons
  – Artificial NN: composed of weights between perceptrons
• We don't know exactly how biological neural systems learn, so we have made some approximate training schemes
• Artificial NNs excel at quantifying
• Biological NNs excel at qualifying
NN Topology
• A standard Feed Forward NN has no recursive connections
• Arrows represent weights (w,v) and biases (b,c)
• hi and Oi are perceptrons
• Typically only one hidden layer is necessary
• More hidden units allow for more complexity in fitting (not always good)
• Nonlinearity is achieved through an activation function – tanh(x) or [1 + e^(−x)]^(−1)
[Figure: feed-forward network with inputs I_1–I_3, hidden units h_1–h_4, and output O_1; weights v_ij connect inputs to hidden units, weights w_i connect hidden units to the output]

  O(t) = b + Σ_i w_i tanh(c_i + Σ_j v_ij I_j(t))
• This is equivalent to an MA Filter
• AR behavior can be achieved through recurrence
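A sketch of the forward pass for this topology (3 inputs, 4 hidden units, 1 output); the weight values are random placeholders, not a trained network:

```python
import numpy as np

def nn_forward(I, v, c, w, b):
    """O = b + sum_i w_i * tanh(c_i + sum_j v_ij * I_j)."""
    h = np.tanh(c + v @ I)        # hidden-layer perceptrons, tanh activation
    return b + w @ h              # linear output perceptron

rng = np.random.default_rng(3)
v = rng.standard_normal((4, 3))   # input-to-hidden weights v_ij
c = rng.standard_normal(4)        # hidden biases
w = rng.standard_normal(4)        # hidden-to-output weights w_i
b = 0.1                           # output bias
O = nn_forward(np.array([1.0, -0.5, 2.0]), v, c, w, b)
```

Because tanh is bounded, the output can never stray further than Σ|w_i| from the bias b, which is one way to see why NNs extrapolate poorly.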
NN Recurrence
• An Output Recurrent network is useful when O(t) depends on O(t − Δt)
• The recurrence is usually only implicit during training: the pseudo-input I*_3 is taken from actual data rather than from the previous O_1

[Figure: two network diagrams with inputs I_1, I_2 and pseudo-input I*_3 feeding hidden units h_1–h_4 and output O_1; the pseudo-input carries the recurrent signal]

• An Elman Network is useful when O(t) depends on the time history of the Inputs
• This makes training rather difficult
  – Continuous time series are needed
  – Batch optimization is impossible
NN Training Theory
• A NN is initialized with a random set of weights
• It is trained to find the optimal weights
  – Iteratively adjust the weights
  – Target least squared error for most data
  – Target relative error for data with rare large events
• There must be far more training samples than there are weights
  – At least a factor of 10
• The goal is to achieve a fit which will work well out of sample as well as in sample
  – Poor out of sample performance is a result of overfitting
Gradient Descent
• Simplest non-linear optimization
• Weights are corrected in steps down the error gradient
• A learning rate η is used to ensure smooth convergence to the error minimum
• Descent can be stabilized by adding momentum m, which recalls ΔW(s−1) from the last step
• m and η should be between 0 and 1, sometimes functions of s
• For recurrent networks, gradient descent should be done serially for each t_k
• This type of Gradient Descent replaces Backpropagation

  e_k = O(t_k) − Õ(t_k),   E = ½ Σ_k |e_k|²

  J = ∂Õ/∂W,   ∇_W E = −Jᵀe

  ΔW(s) = η Jᵀe + m ΔW(s−1)

W is a vector holding all the NN weights and biases; Õ is the NN output
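A sketch of the update rule on a deliberately simple two-parameter linear model, so that the Jacobian J is trivial; η, m, and the target slope/intercept are assumed values:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(-1, 1, 100)
target = 3.0 * x + 1.0                        # O(t_k); model output is w1*x + w0

W = np.zeros(2)                               # weight vector [w1, w0]
dW = np.zeros(2)
eta, m = 0.005, 0.5                           # learning rate and momentum, both in (0, 1)
J = np.column_stack([x, np.ones_like(x)])     # Jacobian of model output w.r.t. W
for s in range(300):
    e = target - J @ W                        # e_k = O(t_k) - model(t_k)
    dW = eta * (J.T @ e) + m * dW             # dW(s) = eta*J^T e + m*dW(s-1)
    W += dW
```

Too large an η makes the iteration diverge; the momentum term damps oscillations across steep directions of the error surface.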
Levenberg-Marquardt Training
• LM training is much faster than gradient descent
• LM training is not very appropriate for recurrent networks
• Algorithm:
  1. Increase μ until a step can be taken without increasing the error
  2. Decrease μ while error decreases with each step
  3. Repeat 1 & 2 until μ exceeds a threshold or another training limit is met

  e_k = O(t_k) − Õ(t_k),   J = ∂Õ/∂W

  e(W_0 + ΔW) ≈ e(W_0) − J ΔW

  Newton: ΔW = (Jᵀ J)⁻¹ Jᵀ e

  LM:     ΔW = (Jᵀ J + μI)⁻¹ Jᵀ e

LM is based on Newton's Method, but, when μ is large, LM training becomes gradient descent
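The LM update can be sketched directly. Here a linear least-squares problem stands in for the NN (so J is constant), which is an assumption for illustration; with μ = 0 a single step lands exactly on the optimum:

```python
import numpy as np

def lm_step(J, e, mu):
    """dW = (J^T J + mu*I)^(-1) J^T e, with e = target minus model output."""
    n = J.shape[1]
    return np.linalg.solve(J.T @ J + mu * np.eye(n), J.T @ e)

rng = np.random.default_rng(5)
J = np.column_stack([rng.uniform(-1, 1, 50), np.ones(50)])
W_true = np.array([2.0, -1.0])
W = np.zeros(2)
e = J @ W_true - J @ W           # error of the current weights
W = W + lm_step(J, e, mu=0.0)    # mu = 0: pure Newton step, exact for a linear model
```

Large μ shrinks the step toward a short move along the gradient direction; small μ approaches the full Newton step — the interpolation the slide describes.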
NN Generalization
• Overfitting can render NNs useless
• Always reserve some of your training data (20%-50%) for out-of-sample testing
• Identical network topologies can perform differently, depending on their initial weights (assigned randomly)
  – Train several networks (5) and keep the one with the best out-of-sample performance
• Starve the NN by reducing the number of hidden units until fit quality begins to plummet
NN Analysis
• The quality of an NN output depends on the training-set density of points near the associated inputs
  – Always plot histograms of the input & output parameters in the training set to determine the high training density region
  – Regions of input space which are sparsely populated are not well determined in the NN and may exhibit artificial behavior
• Try several different input combinations
  – Indicates what influences the system
• Analyzing weights directly is nearly impossible
• Instead, we run the trained NN on artificial inputs so that we can isolate the effects of a single variable on the multivariate system
  – Vary one input (or two) while holding all other inputs constant
  – Simulate a square-wave input for a time series (pseudo-impulse)
  – To identify real and artificial behavior, plot training points in the neighborhood of the simulated data to see what the NN is fitting
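The "vary one input" step might look like the sketch below, with a hypothetical stand-in function in place of a trained NN (the functional form is invented, not a trained model):

```python
import numpy as np

def trained_nn(Dst, VBs, lt):
    """Hypothetical stand-in for a trained NN (invented functional form)."""
    return Dst * (1.0 + 0.1 * np.sin(2 * np.pi * (lt - 18) / 24)) - 2.0 * VBs

# Vary Dst while holding VBs and local time fixed
Dst_grid = np.arange(-100, 11, 10)
H_sweep = trained_nn(Dst_grid, VBs=5.0, lt=6.0)
```

Repeating the sweep at several fixed values of the held inputs, and overplotting nearby training points, is the analysis procedure described above.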
NN Example: Dst-H
• We will repeat the Dst-H analysis nonlinearly
• Inputs: Dst, VBs, sin(lt), cos(lt); Output: H
  – Train with Levenberg-Marquardt
• The VBs input allows us to specify the intensity of (partial) ring current injection
• The lt inputs allow us to implicitly localize the NN
• By plotting H vs lt for fixed values of Dst and VBs, we can infer the local current systems and determine what role the VBs electric field plays
• By using an NN instead of an LPF, we add non-linear interpolation at the expense of linear extrapolation
Partial Ring Current (2)
• We ran the NN on simulated data
  – (Dst,VBs,lt) = ({-100:10},{0,5},{0:18})
  – Psw was constant at 3 nPa
• We plot the H-Dst relationship at constant lt, VBs, and Psw to see its characteristics
  – A localized current system is creating an asymmetry (Dawn-Dusk)
  – Otherwise, the relationship is linear
• Comparing the recovery phase (VBs = 0) to the main phase (VBs = 5 mV/m), we see that at larger Dst, the local-time asymmetry is weaker for the recovery phase than for the main phase
! It is generally impossible to make direct measurements of the local time H-Dst relationship at fixed VBs
[Figure: two panels of HSJG vs Dst at Psw = 3 — main phase (VBs = 5) and recovery phase (VBs = 0) — each with curves at lt = 0, 6, 12, 18]
NN Example: Dst-VBs
• There is some speculation that the dynamics of the ring current are nonlinear: try an NN!
• Inputs: Dst(t-1),VBs(t),Psw(t-1),Psw(t)
• Output: Dst(t)
• Dst(t-1) provides implicit output recurrence
– We can still train with Levenberg-Marquardt!
• VBs(t) provides the new injection
• Psw allows the NN to remove the magnetopause contamination
Dst-VBs NN Analysis
• We ran the network on simulated data
  – (Dst,VBs) = ({-200:0},{0:7})
  – Psw was constant at 3 nPa
• ΔDst = Dst(t+1) − Dst(t)
• The phase-space trajectories are very linear in the high training density (HTD) area
• Curvature outside of the HTD area may be artificial
• Note how VBs affects the trajectories
• The dynamic equation is:

  dDst/dt = −Dst/τ(VBs) + Q(VBs)

[Figure: "Neural Network Phase Space" — ΔDst vs Dst (−200 to 0) for VBs = 0 to 7 mV/m; the trajectories shift as VBs increases, and the high training density region is marked]
Summary
• Linear Prediction Filters
  – Uses
    • System Description
    • Time Evolution
  – Construction
    • Choose AR, MA or ARMA
    • Least Squares Solution of Overdetermined Matrix
  – Localize for Nonlinearity
  – Filter Analysis
    • Green's Function Analogue
    • Local Filters Reveal Local Processes
• Neural Networks
  – Uses
    • System Description
    • Time Evolution
  – Theory
    • Based on Biology
  – Training
    • Iterative Adjustment of Weights
  – Network Analysis
    • Consider Training Density
    • Run on Simulated Data
    • Examine Phase Space
Further Reading
All you ever wanted to know about LPFs in Space Physics: Solar Wind-Magnetosphere Coupling Proceedings, (Kamide and Slavin eds) Terra Scientific Publishing, 1986. Articles by Clauer (p. 39), McPherron et al. (p. 93), and Fay et al (p. 111)
An Excellent Source for all kinds of NN stuff is the Matlab Neural Network Toolbox User’s Guide or Neural Network Design by Beale, Hagan, & Demuth. PWS Publishers 1995
"Learning internal representations by error propagation" by Rumelhart, Hinton, Williams in Parallel Distributed Processing, Vol. 1, Ch. 8, p. 318. MIT Press (Rumelhart & McClelland eds)
Proceedings of the International Workshop on Artificial Intelligence Applications in Solar-Terrestrial Physics: 1993 and 1997 meetings