
Classifier Neural Network Models Predict Relativistic Electron Events at Geosynchronous Orbit Better than Multiple Regression or ARMAX Models

Laura E. Simms1 and Mark J. Engebretson1

1Department of Physics, Augsburg University, Minneapolis, MN, USA

Abstract  To find the best method of predicting when daily relativistic electron flux (>2 MeV) will rise at geosynchronous orbit, we compare model predictive success rates (true positive rate or TPR) for multiple regression, ARMAX, logistic regression, a feed‐forward multilayer perceptron (MLP), and a recurrent neural network (RNN) model. We use only those days on which flux could rise, removing days when flux is already high from the data set. We explore three input variable sets: (1) ground‐based data (Kp, Dst, and sunspot number), (2) a full set of easily available solar wind and interplanetary magnetic field parameters (|B|, Bz, V, N, P, Ey, Kp, Dst, and sunspot number), and (3) this full set with the addition of previous day's flux. Despite high validation correlations in the multiple regression and ARMAX predictions, these regression models had low predictive ability (TPR < 45%) and are not recommended for use. The three classifier model types (logistic regression, MLP, and RNN) performed better (TPR: 50.8–74.6%). These rates were increased further if the cost of missing an event was set at 4 times that of predicting an event that did not happen (TPR: 73.1–89.6%). The area under the receiver operating characteristic curves did not, for the most part, differ between the classifier models (logistic, MLP, and RNN), indicating that any of the three could be used to discriminate between events and nonevents, but validation suggests a full RNN model performs best. The addition of previous day's flux as a predictor provided only a slight advantage.

1. Introduction

High fluxes of relativistic electrons (>2 MeV) at geosynchronous orbit can damage satellites through deep dielectric charging (Baker et al., 1990). Accurate prediction of high fluxes is therefore of great benefit in protecting satellites in this region, and numerous models have been proposed to this end (Potapov, 2017). The parameters describing this system may interact in complex ways, including nonlinear effects, varying response lag times, and synergistic effects of predictors. The best predictive models, therefore, may simultaneously include many predictors, using both linear and nonlinear effects, at several lags, and with the possibility of interactive effects between predictors (Simms et al., 2016, 2018a, 2018b). Techniques that could produce such models include multiple regression, time series models, or neural network methods.

Empirical models have been produced using regression of electron flux on possible physical drivers (Potapov et al., 2016; Simms et al., 2014, 2016, 2018a, 2018b); these studies, in addition to performing multiple regression analyses of possible drivers, review and reference the numerous studies and parameters that have been considered by researchers. Other empirical models have been built using neural networks (Koons & Gorney, 1991; Ling et al., 2010; O'Brien & McPherron, 2003) and autoregressive (AR)‐moving average (MA) time series models (Balikhin et al., 2011, 2016). However, as these models have been developed using different data sets, input variables, and assessments, it may not be clear which is the best approach. In this paper, we propose to compare several algorithms to build models and several sets of useful input variables, all using the same training and test data, to determine which most effectively predicts high energy electron flux increases above a certain threshold.

While several of these models use waves and seed electrons as inputs, as they are thought to be more direct influences on the flux of high energy electrons, for predictive purposes, the more practical models rely only on more readily available inputs such as solar wind and interplanetary magnetic field (IMF) parameters (obtained from the OMNIWeb database) (Baker, 2000; Balikhin et al., 2016; Li, 2004) or ground‐measured indices such as Kp or Dst (Ling et al., 2010).

©2020. American Geophysical Union. All Rights Reserved.

RESEARCH ARTICLE 10.1029/2019JA027357

Key Points:
• Multiple regression and ARMAX models performed much worse than classifier models at predicting relativistic electron flux
• Of the three classifier models (logistic regression, MLP, and RNN) RNN performed best
• Persistence (previous day's flux) is not a necessary predictor

Supporting Information:
• Supporting Information S1
• Table S1
• Data Set S1
• Data Set S2
• Data Set S3
• Data Set S4
• Data Set S5
• Data Set S6

Correspondence to: L. E. Simms, [email protected]

Citation: Simms, L. E., & Engebretson, M. J. (2020). Classifier neural network models predict relativistic electron events at geosynchronous orbit better than multiple regression or ARMAX models. Journal of Geophysical Research: Space Physics, 125, e2019JA027357. https://doi.org/10.1029/2019JA027357

Received 29 AUG 2019
Accepted 15 JAN 2020
Accepted article online 27 APR 2020

SIMMS AND ENGEBRETSON 1 of 16


Flux persistence is often found to be the most statistically significant and predictive input parameter (Baker et al., 1990; Li et al., 2004; Ling et al., 2010; Simms et al., 2016, 2018a). However, although a persistence factor can greatly increase the apparent predictive ability of a model, predictions will often lag behind during rapid rises in flux that are of most interest (Simms et al., 2016). Additionally, as a practical matter, models that rely on recent observations of flux persistence are unusable if there is a gap in flux data over several hours or days. A model that did not depend on this input, even if it produced somewhat less accurate predictions, would be a valuable tool during periods when a flux measure was not available. In this case, estimates of past flux can be generated using time series techniques. These estimates have been used in successful prediction models such as the SNB3GEO NARMAX (Balikhin et al., 2011, 2016; Boynton et al., 2015, 2016).

Additionally, the reliance on flux persistence to predict future flux may inflate the perceived accuracy of a model if all days are included in assessments. Because high‐energy electrons persist in the magnetosphere for days after they are accelerated, a model using past flux as a predictor will, most of the time, accurately predict that flux will still be high on a day following high flux. Of more practical interest is whether a day experiencing low flux will be followed by a high flux event.

We propose refinements to the current approaches to address these issues. First, we develop models that only predict the probability of a flux event (relativistic electron flux above the 60th percentile) following days of nonevent conditions. In other words, we are only interested in predicting when a flux event begins, as this is not only more useful but also more difficult than predicting that an ongoing event will continue into the next day. Not only will this reduce the inflated "correct" predictions that occur when a model merely reproduces what happened the day before, it may also improve actual accuracy rates in predicting sudden rises in flux to damaging levels. Second, we build models without previous flux as an input, allowing prediction even when flux data are unavailable. Third, we build models using only ground‐based inputs that would presumably be available even if satellite observations are not (Baker et al., 1990).

We compare several empirical model types that may be able to extract enough information from solar wind and IMF parameters and/or from ground‐observed indices to make the inclusion of previous flux unnecessary (Table 1). We compare predictions from each of these model types both with and without previous flux as an input.

Both multiple regression and ARMAX (autoregressive‐moving average transfer function models of time series data) models predict values. While previous flux measurements can be used with either of these as input, ARMAX models, trained on past flux behavior, can predict flux based on past behavior in the training data set alone without the need for additional input of current flux. Although both regression and ARMAX models can be trained on numerous past lags of input variables, the ARMAX models more easily lend themselves to determining which input variables at which lags are most predictive. Either of these models can be assessed using R2 (sometimes called the prediction efficiency if applied to the training data), which is the fraction of the variation in the training data set explained by the model, or with a validation correlation, the agreement between new observations in a test data set with predictions from the model. However, neither of these measures accurately describes the ability of these model types to categorize predictions into high (cause for concern) or low (no cause for concern) electron levels.
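The two assessment statistics named above can be sketched directly from their definitions. This is an illustrative stdlib-only implementation, not the paper's code; R2 here is the training-set coefficient of determination and the validation correlation is the ordinary Pearson correlation between test-set observations and predictions.

```python
from statistics import mean

def r_squared(observed, predicted):
    """Fraction of the variation in `observed` explained by `predicted`."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

def validation_correlation(observed, predicted):
    """Pearson correlation between withheld observations and model predictions."""
    mo, mp = mean(observed), mean(predicted)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    so = sum((o - mo) ** 2 for o in observed) ** 0.5
    sp = sum((p - mp) ** 2 for p in predicted) ** 0.5
    return cov / (so * sp)
```

Note that the validation correlation can be high even when predictions are systematically offset from observations, which is one reason the text turns to classification-based metrics.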

Table 1. Model Type Comparison

Model type                                        Abbreviation   Classifier model   Time lags before 1 day included
Multiple regression                               REG            No                 No
Autoregressive‐moving average transfer function   ARMAX          No                 Yes
Logistic multiple regression                      LGC            Yes                No
Multilayer perceptron neural network              MLP            Yes                No
Recurrent neural network                          RNN            Yes                Yes

Logistic regression and neural network analysis are both classification techniques that predict the probability of belonging to a class (e.g., high or low flux) rather than predicted values. However, logistic regression is a linear classifier and therefore may not produce an optimal prediction model if inputs act nonlinearly. While transformations and polynomial terms can be introduced to a regression model to deal with some nonlinearity, neural networks may be more successful both because transformations and complex relationships that are necessary to describe the system are incorporated automatically and because a neural network can handle nonlinearities that are not controlled by conventional means (DeTienne et al., 2003). However, while logistic regression produces output that is always the same based on a given set of inputs, a neural network, trained by supervised learning (guessing at a category for an observation and then correcting its mistakes), may produce very different models (with different weights) that still accomplish roughly the same classifications each time it is trained.
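As a concrete illustration of the linear classifier discussed above, a logistic model passes a weighted sum of inputs through the logistic function to obtain an event probability. The weights and inputs below are hypothetical stand-ins, not fitted coefficients from this study.

```python
import math

def logistic_probability(x, weights, bias):
    """P(event) from a linear combination of inputs via the logistic function."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients for two standardized inputs (e.g., Kp and Dst):
p = logistic_probability([1.2, -0.5], weights=[0.8, -0.6], bias=-0.3)
event_predicted = p >= 0.5  # default equal-cost probability cutoff
```

Because the decision boundary is a hyperplane in the inputs, any curvature in the true response must be supplied by hand (transformations, polynomial terms), which is the limitation the neural network models avoid.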

We train feed‐forward multilayer perceptron (MLP) models with three layers: an input layer of predictors (e.g., ground‐based indices and/or solar wind and IMF parameters), a hidden layer of neurons chosen by the supervised learning procedure, and an output layer of probabilities of belonging to each class (high vs. low flux). This type of network feeds information unidirectionally through a series of mathematical operations (the neurons) to the output layer. Each node is used once and only once. A recurrent neural network (RNN), on the other hand, if given sequential input data, can use its classification decision from a previous time step to influence its decision a time step later. Using the time‐dependent behavior of inputs, this added "memory" in the RNN that is not present in the feed‐forward MLP can greatly increase the classification accuracy (Alpaydin, 2014; Graves, 2013). This combines the nonlinear learning capabilities of the MLP with the ability to use time‐dependent behavior that is described by an ARMAX model. RNN is a technique that has been previously used in predicting Dst (Wu & Lundstedt, 1996) and Kp (Wing et al., 2005).
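The RNN "memory" can be sketched as a single recurrent update in which each hidden state mixes the current input with the state carried over from the previous time step. This is a minimal scalar sketch with illustrative weights, not the MATLAB network the paper trains.

```python
import math

def rnn_step(x_t, h_prev, w_x, w_h, b):
    """One recurrent update: the new hidden state depends on today's input
    and on the hidden state from the previous time step."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)

# Feed a 7-day input sequence through the recurrence (weights are illustrative):
h = 0.0
for x_t in [0.1, 0.4, -0.2, 0.9, 0.3, -0.1, 0.5]:
    h = rnn_step(x_t, h, w_x=0.7, w_h=0.5, b=0.0)
# `h` now summarizes the whole sequence and would feed a classification layer.
```

Because `h` is threaded through every step, the final state depends on the order of the inputs, which is exactly the time-dependent behavior a feed-forward MLP cannot represent.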

The purpose of the present study is to compare these methods (multiple regression, ARMAX, logistic regression, MLP, and RNN) using several variable sets over the same training period and with the same validation data set. With prediction of high‐energy flux events as the goal, we compare the ability of these models to correctly identify when flux will exceed a threshold both when flux is used as an input and when it is left out of the model. We produce models with ground‐based indices (Kp, Dst, and sunspot number) and satellite‐observed data (solar wind and IMF parameters) and with and without previous flux as inputs. The ground‐based data model could be used during periods when satellite data are missing. Although the Kp index has less physical meaning than an ultralow frequency (ULF) wave index, previous work has shown these factors to be highly correlated and therefore nearly interchangeable inputs in predictive models (O'Brien & McPherron, 2003; Simms et al., 2014). We use the Kp index in these models as it is often more reliably available in real time. We also explored the use of the AE index but found it was not as effective as Kp on its own and added no explanatory power when Kp was in the model. Substorms are presumed to inject the seed electrons which are subsequently accelerated to relativistic energies, but AE may not be the best measure of substorms. Previous studies have found SMEd, which includes more ground stations and is limited to the darkside (Gjerloev, 2012; Newell & Gjerloev, 2011), or substorm number (Simms et al., 2018a) may measure the substorm influence on relativistic flux more effectively. Although including a substorm measure might improve our models, AE is less predictive than Kp and the SMEd index is not available in real time for next day forecasts.

While we provide common metrics of model assessment such as R2 and validation correlations (with a withheld test data set), we also report the true positive rates (TPRs) to assess each model's ability to predict high flux events of energetic electrons. These rates may be more helpful in determining the usefulness of a model for event prediction. Additionally, we are able to determine an optimal probability level to classify outcomes into high energy electron events using receiver operating characteristic (ROC) curves that plot true positive and false positive rates over a range of classification cutoffs (Fawcett, 2006; Hanley & McNeil, 1982). These curves empirically describe the effect of moving the decision threshold on the accuracy rate. The area under a ROC curve (the AUC) gives an indication as to how successful a model is at classification, with an area of 1 equal to perfect classification and an area of 0.5 meaning the model does no better than random assignment of classes (Metz, 1978).

2. Data

Daily electron flux values (log (particles/cm2 day sr) of electrons >2 MeV) were obtained from GOES‐13 (geosynchronous orbit, L ≈ 6.6) for Days 2010091‐2017365. Electron alerts are issued by the Space Weather Prediction Center at the National Oceanic and Atmospheric Administration when flux >1,000 particles/(cm2 s sr). Summed over a 24 hr period, this would correspond to flux >7.9365 log (particles/cm2 day sr). This is roughly the 75th percentile of daily flux observations at GOES‐13. As increases in flux can occur midday, we set our criterion for "high flux" at the 60th percentile, or 7.4771 log (particles/cm2 day sr), to account for times in which flux was only high for partial days. (Note that this threshold was never changed. Optimal probability thresholds described below refer only to changing the probability cutoff at which observations are classified into events and nonevents.)
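The conversion from the SWPC alert level to the daily log-flux threshold quoted in the text is a one-line calculation:

```python
import math

# SWPC alert threshold: 1,000 particles/(cm^2 s sr).
# Summed over 24 hr (86,400 s), the daily fluence is 8.64e7 particles/(cm^2 day sr):
daily_fluence = 1000 * 86400
log_threshold = math.log10(daily_fluence)  # ~7.9365, matching the value in the text
```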

Daily averaged predictor variables obtained from the OMNIWeb data site are Kp index (× 10), Dst index (nT), sunspot number, solar wind number density (N, n/cm3), velocity (V, km/s), pressure (P, nPa), electric field (Ey, mV/m), and IMF |B| (nT) and Bz (nT, geocentric solar magnetospheric, or GSM, coordinates). We chose these particular variables not only because they represent a range of processes that may ultimately drive relativistic electron acceleration but also because they are available for input into a next‐day forecast. All predictor variables were daily averaged in keeping with the daily averaged electron flux obtained from GOES. Although minimum Bz might appear to be a better predictor, as it would capture the southward turning of the IMF, we found that the correlation of flux with daily average Bz (−0.187) was stronger than that with daily minimum Bz (−0.092). For this reason, and for consistency with the other variables, we use the daily average rather than the minimum.
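The daily-average versus daily-minimum comparison above amounts to two different aggregations of the same hourly records. A minimal stdlib sketch (the day keys and Bz values are hypothetical, not OMNI data):

```python
from collections import defaultdict
from statistics import mean

def daily_mean_and_min(hourly):
    """Collapse (day, value) hourly records into per-day mean and per-day minimum."""
    by_day = defaultdict(list)
    for day, value in hourly:
        by_day[day].append(value)
    means = {day: mean(vals) for day, vals in by_day.items()}
    mins = {day: min(vals) for day, vals in by_day.items()}
    return means, mins

# Hypothetical hourly Bz (nT) for one day:
means, mins = daily_mean_and_min([("2016-001", -3.0), ("2016-001", 1.0), ("2016-001", -4.0)])
```

Either aggregate could then be correlated against daily flux to reproduce the kind of comparison reported in the text.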

All models were trained on input variables from the day before. Although more accurate models might be made by nowcasting from predictors measured on the same day as the flux, this would not create a useful model for prediction.

The data was split into a training set (2010103‐2015365) and a test set (2016001‐2017365). Choosing a continuous training set from the first 5 years of available data allowed training for the ARMAX models, which require consecutive observations to estimate AR and MA terms. Occasional missing days of data were replaced with the mean of the surrounding data points as ARMAX models need continuous time series to estimate parameters.
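The gap-filling step described above (replacing a missing day with the mean of the surrounding data points) can be sketched as follows; this assumes gaps are interior to the series, as leading or trailing gaps have no neighbor on one side.

```python
def fill_gaps(series):
    """Replace None entries with the mean of the nearest non-missing neighbors,
    keeping the time series continuous for ARMAX estimation."""
    filled = list(series)
    for i, v in enumerate(filled):
        if v is None:
            prev = next(filled[j] for j in range(i - 1, -1, -1) if filled[j] is not None)
            nxt = next(filled[j] for j in range(i + 1, len(filled)) if filled[j] is not None)
            filled[i] = (prev + nxt) / 2
    return filled
```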

For all model types but the ARMAX we limited observations to only those days when flux could rise above the 60th percentile. In other words, we only included observations where flux the day before was below the 60th percentile. This resulted in a training set of 1398 days and a validation (or test) set of 298 days. We trained the ARMAX models on the entire time period covered by the training data (including days when flux was high) then calculated prediction accuracies only following days below the 60th percentile.
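The filtering rule above (keep a day only if the previous day's flux was below the event threshold, and label it by whether flux then rose above the threshold) can be expressed as a short helper. The sample flux values are invented for illustration; only the 60th-percentile threshold comes from the text.

```python
def rise_candidates(flux, threshold):
    """Keep only days on which flux could rise: yesterday's flux below threshold.
    Returns (previous_day_flux, event_label) pairs for model training."""
    pairs = []
    for yesterday, today in zip(flux, flux[1:]):
        if yesterday < threshold:
            pairs.append((yesterday, today >= threshold))
    return pairs

# With the paper's 60th-percentile threshold of 7.4771 log flux:
sample = [7.0, 7.6, 7.2, 7.3, 7.8]
pairs = rise_candidates(sample, threshold=7.4771)
```

Days already above the threshold (7.6 in the sample) are dropped entirely, which is what removes the trivially "correct" persistence predictions from the assessment.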

For ground‐based regression models, AE (nT) was considered as an input but dropped due to its nonsignificant influence. Its addition also resulted in lower validation correlation and a lower TPR (correct identification of events) compared to the ground‐based model without AE. The REG, ARMAX, and LGC ground‐based models, therefore, contain only Kp, Dst, and sunspot number and are referred to as the KpDstSunspot models. Models including both ground‐ and satellite‐observed data use Kp, Dst, AE, sunspot number, N, V, B, Bz, P, and Ey as inputs. These are referred to as full models and may also include previous day's flux.

3. Model Building

We use the SPSS statistical software package to build multiple regression (REG), logistic regression (LGC), ARMAX (autoregressive‐moving average), and MLP neural network models. We utilize the ability of SPSS to build the best models automatically. REG, LGC, and MLP models were trained on predictors measured one day prior to current day's flux. The ARMAX models, using the Expert Modeler procedure of SPSS, also incorporated all previous lags needed to fit a model with the lowest BIC (Bayesian Information Criterion). RNNs were trained using the trainNetwork procedure in MATLAB (2019, Neural Networks Toolbox) using variables from the 7 days prior. MLP and RNN architectures are not chosen by the user but learned by the algorithm.

We use unstandardized data for the REG, LGC, and ARMAX models. All data (both training and test set) for the MLP and RNN models was standardized by subtracting the mean and dividing by the standard deviation of the training set. Standardization of MLP and RNN data is required by the algorithms used in SPSS and MATLAB to produce more efficient and unbiased estimates of the weights. While data could be standardized for REG, LGC, and ARMAX models, it is not necessary and would produce scaled coefficients that would require standardized inputs. To make these models more usable we have therefore not scaled the training data.

Previous work has used the prediction efficiency (also known as R2 or the coefficient of determination) and/or correlation with a withheld test set to assess the predictive ability of a model. There are drawbacks to these assessment criteria. First, the R2 only calculates the percent of variation in the training set that is explained by the model. It is not an independent validation using a withheld test data set. Second, simple correlation of predictions from a model with test set values can be quite high, even while missing most of the events of interest (e.g., flux rising above the 60th percentile) if these events are rare. In our current study, we explore the use of true and false positive rates to determine overall accuracy rates of predicting events versus nonevents (Fawcett, 2006). A TPR is calculated as the number of correctly identified events divided by all events, while the true negative rate (TNR) is the number of correctly identified nonevents divided by all observations that are not the events of interest. The accuracy rate is the number of correctly classified observations (either event or nonevent) divided by the total number of observations. As LGC, MLP, and RNN models output probabilities that a given observation will be an event, these can easily be assessed by use of a true positive or accuracy rate. Classification is based on a cutoff probability optimized for the rate at which events and nonevents occur in the population. This threshold can be optimized for the case where the costs of misclassifying events and nonevents are equal, but higher event prediction accuracy can often be obtained by changing this cost ratio (Fawcett, 2006). We provide accuracy rates for both the equal‐cost prediction and for predictions where the cost of misclassifying an event (missing a rise in flux) is 4 times that of misclassifying a nonevent (predicting a rise in flux that does not occur).
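The TPR, TNR, and accuracy definitions above follow directly from the confusion-matrix counts. This is an illustrative sketch (the labels and probabilities are invented); lowering the probability cutoff is one simple way to mimic weighting a missed event more heavily than a false alarm.

```python
def classification_rates(labels, probabilities, cutoff=0.5):
    """TPR, TNR, and overall accuracy at a given probability cutoff."""
    tp = sum(1 for y, p in zip(labels, probabilities) if y and p >= cutoff)
    tn = sum(1 for y, p in zip(labels, probabilities) if not y and p < cutoff)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tpr = tp / n_pos                       # correctly identified events / all events
    tnr = tn / n_neg                       # correctly identified nonevents / all nonevents
    accuracy = (tp + tn) / len(labels)     # all correct classifications / all observations
    return tpr, tnr, accuracy

labels = [True, True, False, False, False]
probs = [0.9, 0.4, 0.2, 0.6, 0.1]
equal_cost = classification_rates(labels, probs, cutoff=0.5)
# A lower cutoff catches more events at the price of more false alarms:
event_weighted = classification_rates(labels, probs, cutoff=0.25)
```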

For classifier models (LGC, MLP, and RNN), ROC curves plot the TPR (sensitivity) versus the false positive rate over a range of classification thresholds. Generally, as the threshold is moved to increase the TPR, the rate of false alarms increases. This curve shows the trade‐off between these two rates. The AUC can be used to rank models by their ability to discriminate effectively between classes. A more effective model will have a greater area, with perfect classification in the case where the AUC = 1. The model does no better than random assignment of classes if the AUC = 0.5 (Krzanowski & Hand, 2009; Metz, 1978).
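The AUC has an equivalent rank-based reading that makes the 0.5 and 1.0 anchors concrete: it is the probability that a randomly chosen event receives a higher model score than a randomly chosen nonevent. A small sketch of that computation (not the SPSS/MATLAB procedures used in the paper):

```python
def auc(labels, scores):
    """Area under the ROC curve via the rank interpretation: the probability
    that a random event outscores a random nonevent (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A model that scores every event above every nonevent yields an AUC of 1; a model whose scores carry no class information yields 0.5.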

As both REG and ARMAX models output predicted values of flux, not probabilities of class membership, for these models we classify predicted events as those days on which the model predicts a flux greater than the 60th percentile. Using this classification, we are able to produce TPRs for both REG and ARMAX models as we do for the classifier models (LGC, MLP, and RNN). For comparison with previously published models, we also assess the REG and ARMAX models by reporting R2 (percent of variation in the training set explained by the model) and the validation correlation between observed and predicted values in the test set.

We explore three models: (1) using only ground‐observed data (Kp, Dst, and sunspot index), (2) a full model using all variables except previous day's flux, and (3) a full model using all variables and including previous day's flux. While the full models incorporating satellite‐observed data might give better predictions, the model using only ground‐observed data could be used for prediction during periods when satellite data are unavailable.

3.1. Regression Models

In addition to simultaneous testing of the influence of various inputs, both multiple regression and logistic regression also create a weighted sum of input parameters that can be used to predict future behavior of the dependent variable. REG predicts values. If classification is desired, predictions can be categorized using cutoff values. However, logistic regression is designed to classify observations into one or the other class and is often more successful at this task than multiple regression (Neter et al., 1985).

A drawback to both these methods, however, is that they are linear models without built‐in lags. Although nonlinearity can be introduced by transformations or adding polynomial terms, and the effects of lags can be studied by introducing lag terms, it may require some work to identify which additions produce the best model. Identification of the best model may also be awkward due to multicollinearity between predictors and the danger of producing an overfitted model which does not predict well.

These drawbacks can be overcome using ARMAX models, to efficiently identify which lags of input variables are most influential, or by MLP networks, which can easily model nonlinearities. An RNN model, as it uses sequences of predictors over prior time steps, may incorporate the benefits of both these methods, incorporating prior behavior as well as nonlinear relationships.



3.2. ARMAX Method

An ARIMA model uses a collection of AR, MA, and differencing (I) terms at various lags to model the cyclical behavior of the response variable. This can be easily extended to add input (or exogenous) variables, which describe the influence of other parameters on the response variable. These are called ARIMAX models (with the X coming from the eXogenous variables) or transfer function models (Hyndman & Athanasopoulos, 2018; Makridakis et al., 1998). Once transfer models are trained, they do not require further input of the response variable to produce accurate predictions. As an ARIMAX model can describe the behavior of the dependent variable without the continuous input of new observations of itself, it could provide a useful alternative to models that improve fit by incorporating flux observations from the previous time period. We automatically select a "best" model using the Expert Modeler procedure in SPSS. This procedure searches through the space of possible models. A normalized BIC is calculated for each, and the model with the lowest is chosen (Schwarz, 1978).
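The structure of a transfer-function forecast can be illustrated with the simplest case: one AR term on the response plus one exogenous input at lag 1. The coefficients below are purely illustrative; a fitted ARMAX model would estimate them (and possibly MA terms and further lags) from the training data.

```python
def arx_predict(y_history, x_history, phi, beta, intercept):
    """One-step-ahead forecast from an AR(1) term on the response plus one
    exogenous input at lag 1 (a minimal transfer-function-style model)."""
    return intercept + phi * y_history[-1] + beta * x_history[-1]

# Illustrative coefficients and histories (log flux and a solar wind driver):
forecast = arx_predict(y_history=[7.1, 7.3], x_history=[2.0, 3.0],
                       phi=0.6, beta=0.1, intercept=2.5)
```

Once `phi`, `beta`, and `intercept` are fixed by training, the recursion can be iterated on its own predictions, which is why a trained transfer model does not need continuing observations of the response.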

No differencing was needed as flux did not show a trend over time. For this reason, we refer to the models as ARMAX models. Several also did not include MA terms. Initially, we also explored the use of an AR(1,27) model (not shown), using AR terms from lags at Days 1 and 27, to account for the cyclical behavior of solar activity. Although the AR(27) term was statistically significant, its addition did not improve the fit, only minimally improved the accuracy rates in the error matrix, and resulted in the same or lower validation correlation.

To compare with the nonlinear MLP and RNN models, we also created a nonlinear ARMAX model by allowing the Expert Modeler procedure to choose not only from the main factor solar wind and magnetospheric variables but also from their squares and interaction terms. This is suggested by the NARMAX procedure used by Balikhin et al. (2011).

3.3. MLP Neural Networks

Neural networks such as MLPs incorporate an input layer (the measured predictor variables), at least one hidden layer (consisting of several neurons), and an output (the pseudoprobabilities used for classification). In most cases, a single hidden layer is enough to model the relationships in the data, but this layer is usually made up of more than one neuron. Weighted sums of the input parameters are fed into each neuron; the output from each neuron is then transformed by the activation (or transfer) function before weighted sums of the neuron outputs are fed into the output layer to produce class membership probabilities. An MLP is a feed-forward network, meaning that all information flows from the inputs to the outputs, with no loops or cycles back. In practice, it is often also useful to introduce a bias term to the weighted sum inputs both into and out of each neuron; a bias term is similar to the intercept term in a regression equation. In most cases, predictors should be standardized in some fashion so that their means and standard deviations are equal. In our models, we standardize by subtracting the training set mean and dividing by the standard deviation of each input parameter.

The MLP procedure in SPSS can be left to its own devices to build a neural network, choosing the number of hidden layers, the neurons within them, and the activation function used to transform the hidden layers to the probability output layer. This activation function is usually a sigmoid function, often either the logistic function (as is used by logistic regression) or a tanh function. The logistic function is useful in modeling probabilities as it asymptotes at 0 and 1, resulting in a model that will not predict nonsensical probabilities beyond these values. The tanh function (with a −1 to 1 range) can be easily converted to a similar 0,1 output range. In practice, the tanh activation function is most often used in neural networks as it has been found to produce more accurate predictions (DeTienne et al., 2003). The outputs give the relative chance of membership in a class, but these do not automatically sum to 1. We use the softmax transformation to normalize the outputs (Bishop, 2006).
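The forward pass described above (standardize, tanh hidden layer, softmax output) can be sketched in a few lines of Python. The weights, means, and standard deviations below are toy values, not the trained SPSS model; the sketch only shows the sequence of transformations.

```python
import numpy as np

def mlp_forward(x, mean, std, W1, b1, W2, b2):
    """One forward pass of a single-hidden-layer MLP classifier:
    standardize inputs, tanh hidden layer with bias, softmax output."""
    z = (x - mean) / std                  # standardize as described in the text
    h = np.tanh(W1 @ z + b1)              # hidden neurons with bias terms
    scores = W2 @ h + b2                  # raw class scores
    e = np.exp(scores - scores.max())     # softmax, numerically stable
    return e / e.sum()                    # probabilities now sum to 1

# Hypothetical toy weights: 3 inputs (Kp, Dst, sunspot number),
# 4 hidden neurons, 2 output classes (event / nonevent).
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)
p = mlp_forward(np.array([4.0, -30.0, 55.0]),
                mean=np.array([2.0, -15.0, 50.0]),
                std=np.array([1.5, 20.0, 40.0]),
                W1=W1, b1=b1, W2=W2, b2=b2)
```

Whatever the weights, the softmax guarantees a valid probability vector over the two classes.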

Using the MLP procedure of SPSS, we built a number of predictive models. We allow the procedure to choose from all three variables in the KpDstSunspot model, or from all nine input variables (and possibly the lag flux) in the full models. (We base this last decision on the fact that the ARMAX full model retained all nine input variables as significantly influential.) The algorithm initiates with random weights and works toward an optimal solution. Each call to the SPSS MLP function produces a new model with very different loadings on the input variables, different numbers of nodes in the hidden layer, and possibly a different activation


function and number of hidden layers (Alpaydin, 2014). Averaging these models is not, therefore, possible. Even if the activation function and the number of hidden layers and nodes are fixed by the user, averaging the coefficients would remove the architecture that was chosen for optimal prediction. However, not all the models produced by the procedure predict well in the test data set. To choose the best model, we ran the algorithm 500 times for each input set and chose the model with the largest area under the ROC curve. The calculated probabilities of class membership were used to classify whether the day was predicted to have flux above or below the 60th percentile.
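The area under the ROC curve used to rank the 500 candidate models can be computed without tracing the curve, via the rank (Mann-Whitney) statistic: it equals the probability that a randomly chosen event receives a higher score than a randomly chosen nonevent. A minimal Python sketch:

```python
import numpy as np

def roc_auc(scores, labels):
    """AUC via the rank (Mann-Whitney U) statistic: the probability that
    a random event outscores a random nonevent (ties count as 0.5)."""
    scores = np.asarray(scores, float)
    labels = np.asarray(labels, bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()   # event beats nonevent
    ties = (pos[:, None] == neg[None, :]).sum()     # tied pairs
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

For example, perfectly separated scores give an AUC of 1, while identical scores for an event and a nonevent give the chance level of 0.5.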

3.4. RNNs

We trained RNN models with the MATLAB trainNetwork procedure using the Adam (adaptive moment estimation) optimizer (Kingma & Ba, 2014) to train a long short-term memory model (LSTM) (Graves, 2013). These models are similar to, if more complicated than, MLP models. There is an input layer, a layer of hidden neurons, and an output layer that is standardized by the softmax function. However, an RNN, if given sequential input data, can use its classification decision from a previous time step to influence its decision a time step later. Using the time-dependent behavior of inputs, this added "memory" in the RNN that is not present in the strictly feed-forward MLP may increase the classification accuracy. Although an MLP model could be trained on multiple lags (further back than one time step), the architecture of such a model would likely not be as optimal. We did experiment with this approach to adding lags to the MLP model but discovered that lags further back than one day resulted in increasingly ineffective predictions; the TPR actually went down as more lags were introduced.

As with the MLP full models, we retained all nine input variables as they were deemed influential in the ARMAX model. Variables were standardized by subtracting each of their training set means and dividing by their standard deviations. Based on preliminary experiments with this training method and data set to optimize accuracy of predictions, we used 7 day sequences of input variables, a single-layer network, and a number of hidden nodes equal to (sequence length) × (number of input variables) × (number of outputs). As the sequence size (the number of days on which inputs were measured) was always 7, and the number of outputs always 2 (flux event vs. nonevent), the number of hidden nodes ranged from 42 (for the KpDstSunspot model) to 154 (for the full model). Models were trained 500 times and the model (of each input type) with the highest area under the ROC curve was chosen.
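The hidden-layer sizing rule above is a one-line product. As a sketch (the 11-input count for the full model is inferred from the stated 154 nodes, since 7 × 11 × 2 = 154, and is an assumption here):

```python
def hidden_nodes(seq_len, n_inputs, n_outputs):
    """Hidden-layer size rule from the text:
    (sequence length) x (number of inputs) x (number of outputs)."""
    return seq_len * n_inputs * n_outputs

# 7-day sequences, 2 output classes (event / nonevent):
kp_dst_sunspot = hidden_nodes(7, 3, 2)   # 3 ground-based inputs
full = hidden_nodes(7, 11, 2)            # inferred 11-input full model
```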

4. Comparison of Models

4.1. Comparison of Classifier Models Using Area Under the Curve

Using the test set, ROC curves of the classifier models (LGC, MLP, and RNN) are compared for each input variable set (Figure 1). Each of these models classifies events better than a random classification (represented by the AUC = 0.5 reference line). Comparing the area under the ROC curve directly, we find an AUC = 0.92 for both the RNN and MLP full model with previous flux (Figure 2). This is nearing an AUC of 1 (perfect classification). The LGC model is somewhat lower at 0.90. However, if previous day's flux were not available for forecasting, forecasts could be made with one of the other two variable sets (the KpDstSunspot or full

Figure 1. Receiver operating characteristic (ROC) curves, which plot the true positive rate versus the false positive rate for the classifier models (LGC, MLP, and RNN). The AUC = 0.5 reference line is the line below which a model does no better than random assignment of class. (a) KpDstSunspot model, (b) full model without previous day's flux, and (c) full model with previous day's flux.


model without previous flux). These two variable sets produce models with marginally lower AUCs: 0.88–0.89 for the full model without previous flux and 0.85–0.87 for the KpDstSunspot model. However, none of the models are all that different, given that the range from worst to best AUC is 0.85–0.92. Bootstrap confidence intervals (at α = 0.05) for each AUC show that there is no statistically significant difference between the AUCs of LGC, MLP, and RNN models with the same input variables. (The bootstrap technique can be applied in many situations; by resampling many times from a single set of observations, the resulting distribution can be used to determine confidence intervals (Efron, 1979).) Comparing between variable sets, the only significant difference occurs between the RNN full model with previous flux and the RNN KpDstSunspot variable set. Based on AUCs alone, it could be concluded that any of the three models (LGC, MLP, or RNN) using any of the variable sets could be used to predict flux events with almost equal accuracy.
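The percentile bootstrap used for these intervals is straightforward to sketch in Python. This is a generic illustration on a hypothetical series of per-day correctness indicators, not the paper's paired score/label AUC resampling; the statistic passed in can be any function of the resampled values.

```python
import numpy as np

def bootstrap_ci(values, stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval (Efron, 1979):
    resample with replacement, recompute the statistic, take the tails."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, float)
    boots = np.array([stat(rng.choice(values, size=len(values), replace=True))
                      for _ in range(n_boot)])
    return (np.quantile(boots, alpha / 2), np.quantile(boots, 1 - alpha / 2))

# Hypothetical test set: 88 of 100 days classified correctly;
# 95% CI on the mean accuracy.
days_correct = np.array([1] * 88 + [0] * 12)
lo, hi = bootstrap_ci(days_correct, np.mean)
```

Two models are then judged significantly different only if their intervals do not overlap, as done for the AUC comparisons above.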

4.2. Comparison of Prediction Accuracy (TPR) by Model Type: REG, ARMAX, LGC, MLP, and RNN

For the LGC, MLP, and RNN models, we compared event prediction accuracy (TPR) in the test data set using the optimal probability threshold assuming equal costs of false positives and false negatives (Figure 3a). REG and ARMAX models were rated by whether predictions above or below the 60th percentile corresponded to events or nonevents. Overall accuracy of prediction is presented in the second panel (Figure 3b).
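The two validation statistics compared throughout this section come directly from the 2 × 2 error matrix. A small sketch with hypothetical counts (40 events and 60 nonevents in a 100-day test set):

```python
def rates(tp, fn, fp, tn):
    """True positive rate (fraction of events caught) and overall
    accuracy (fraction of all days called correctly) from a 2x2
    error matrix."""
    tpr = tp / (tp + fn)                        # sensitivity
    accuracy = (tp + tn) / (tp + fn + fp + tn)  # all correct calls
    return tpr, accuracy

# Hypothetical counts: 30 of 40 events caught, 5 false alarms.
tpr, acc = rates(tp=30, fn=10, fp=5, tn=55)
```

Note how the overall accuracy (0.85 here) can stay high even when many events are missed, which is why the TPR is reported separately.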

Of the five model types (REG, ARMAX, LGC, MLP, RNN), the REG model performed the worst at correctly predicting events in the test data set, whether previous day's flux was included or not. Its TPR ranged from 4.4% (for the KpDstSunspot ground-based model) to 32.4% (for the full model including previous day's flux) (white bars of Figure 3). (This was the only model type for which adding previous day's flux resulted in significantly better predictive ability.) Despite this poor predictive ability, the corresponding validation correlations (correlating predicted values with observed values in the test set) were 0.437 and 0.710. The highest correlation (full model with previous flux) is close to that attained by previous regression models using all days, not just the selection of days on which flux could rise (average r = 0.78; Simms et al., 2016). This poor predictive performance, therefore, is not the result of removing days on which flux was persistently high. The conclusion is that a high validation correlation does not mean the model will have a high TPR. In other words, a high correlation does not translate into predicting flux events well. Similarly, the reasonably high 72.4% R2 of the best REG model (full model with previous flux) does not correspond to high predictive ability. Although a regression model may be useful in predicting actual values of flux and in determining which inputs are most influential, it does not perform well as a classifier.

To produce a model that does not depend on possibly unavailable previous flux data but might still benefit from knowledge of that value, we produced ARMAX models that model flux behavior using AR and MA terms but do not depend on additional daily input of flux measurements. Neither the full nor the KpDstSunspot model, therefore, includes previous flux as a possible input. The automatic Expert Modeler procedure for the KpDstSunspot model using linear terms chose an AR(1), MA(1,13) process (based on the minimum BIC) to model flux behavior. This means that model-predicted flux from the previous day (Lag 1) and possibly a MA of

Figure 2. Area under the ROC curves of the classifier models for three input variable sets. Perfect classification occurs when the ROC curve area = 1. Error bars are the 95% confidence interval from a bootstrap distribution.

Figure 3. Validation of models at the equal cost threshold. (a) True positive rates for the test data set for each model type (REG, ARMAX, LGC, MLP, and RNN) and for the three predictor sets. (b) Overall accuracy rates.


model errors between the previous days (between Lag 1 and Lag 2, and between Lag 1 and Lag 13) were input into the prediction for the current day. (The MA terms indicate that a MA over the first and thirteenth lags can act to stabilize the model errors and may be necessary to cancel out negative autocorrelation effects that may have been introduced by the AR1 term.) However, flux behavior in the full model was sufficiently modeled as a simpler AR(1) process. The addition of more variables into the full model made the MA terms unnecessary, as the additional inputs likely described much of the cyclical behavior of flux and the AR1 term did not, therefore, result in negative autocorrelation of the errors.

For the full model, the Expert Modeler procedure chose to retain all nine predictors. All nine, therefore, were deemed significantly influential. Using the absolute value of the standardized coefficients, the relative influence of each variable can be compared (Figure 4); however, it should be noted that N, Dst, Kp, |B|, and sunspot number were negative influences. |B| and sunspot number did not act at a one day lag: |B|, which was the least influential, acted 7 days before, and sunspot number began acting 6 days before. The retention of all variables in the ARMAX model led to the later decision (below) to keep all variables in the MLP and RNN models.

The ARMAX ground-based only (KpDstSunspot) and full models (Figure 3, light blue bars) performed better than the REG models with no previous flux (TPR rates of 11.8% for the ground-based model and 44.1% for the full model). (Note that although the ARMAX models never used actual flux observations for predictions, these bars are repeated on the right side of the bar chart for comparison to other models that do use previous flux.) There is some benefit to adding terms to describe the cyclical behavior of flux if actual measures of previous day's flux cannot be obtained. However, there is still an obvious need for improvement. Again, an R2 = 83.0 and validation r = 0.796 for the full model appear to show good performance, but these measures do not give a true indication of how well the model predicts high flux events. TPR values were much lower than the validation correlation or R2 might suggest. (The KpDstSunspot ARMAX model had a validation correlation = 0.698 and R2 = 79.9, also considerably better than the REG model.)

We experimented with a nonlinear ARMAX model (added to the full model) by allowing the automated procedure to choose not only between the main effects but also their square and interaction terms. This is not shown in the figures as it resulted in only a small improvement in TPR (47.0% as compared to 44.1%), and better predictive ability was achieved by allowing the MLP and RNN models to choose the specific nonlinear terms that most improved the models (see below).

Overall accuracy rates (combining both TPR and TNR) of the REG and ARMAX models were high (77.5–83.2) but lower than for the classifier models (Figure 3b). However, these high accuracies mostly represent the ability of these models to accurately predict nonevents.

The TPR for the LGC models (green bars of Figure 3) was higher than for both the REG and ARMAX models. The probabilities output by a linear classification model from logistic regression appear better able to identify flux events than the output of a regression model. The LGC KpDstSunspot model produced surprisingly accurate predictions (TPR = 61.2), higher than the full model without previous flux (TPR = 56.7) and nearly as high as for the LGC full model with previous flux (TPR = 62.7).

Figure 4. Standardized regression coefficients (absolute value) of the input variables in the full ARMAX model.


The MLP models were expected to provide even better prediction than the LGC models, as logistic regression is only a linear classifier while MLP can describe nonlinearities if they are present in the data. In the validation or test set (dark blue bars of Figure 3), the MLP model performed worse than LGC in the KpDstSunspot model (MLP TPR = 50.8). However, there was more improvement in the full model (MLP TPR = 67.2 and 68.5 for the full model without or with previous day's flux, respectively).

The RNN models were expected not only to provide a nonlinear classification (as the MLP) but also to account for lagged influences of variables over the previous 7 days (as the ARMAX model). The RNN TPRs were all somewhat higher (64.2, 73.1, and 74.6 for the three input variable sets). At the equal cost optimality threshold, therefore, there is some evidence that the RNN models, which account for influences over more time steps, might be a better choice than the MLP or LGC.

Overall accuracy rates of the classifier models (LGC, MLP, and RNN) were all high (87.5–90.6) with no large differences between model types or input variable sets (Figure 3b).

4.3. Comparison of Prediction Accuracy at Unequal Costs

If the threshold for group classification is optimized for the case where the cost of a false negative (missing a flux event) is 4 times the cost of a false positive (predicting an event that does not occur), the validation TPRs increase for all classifier models and for all input variable sets (Figure 5a). Most notably, the TPR of the LGC model is the same or slightly higher than that of the MLP or RNN for all three input variable sets (LGC TPR = 77.6, 85.1, and 89.6 for the KpDstSunspot and the two full models, respectively). Accuracy rates (78.5–88.6) are similar to what was found for the equal cost model (Figure 5b). Their slight decrease is due to an increase in the false positive rate, which is to be expected when the optimal threshold is changed to increase the TPR.
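The cost-weighted threshold choice described above can be sketched as a simple search over candidate cutoffs, minimizing the expected misclassification cost. The probabilities and labels below are hypothetical; the paper's thresholds were derived from its own trained models.

```python
import numpy as np

def best_threshold(probs, labels, fn_cost=4.0, fp_cost=1.0):
    """Pick the probability cutoff minimizing total misclassification
    cost, with misses (false negatives) weighted fn_cost : fp_cost."""
    probs = np.asarray(probs, float)
    labels = np.asarray(labels, bool)
    best_t, best_c = 0.5, np.inf
    for t in np.unique(probs):            # each observed prob is a candidate
        pred = probs >= t
        cost = (fn_cost * np.sum(labels & ~pred)    # missed events
                + fp_cost * np.sum(~labels & pred)) # false alarms
        if cost < best_c:
            best_t, best_c = t, cost
    return best_t

# Hypothetical model probabilities and true event labels:
p = [0.10, 0.20, 0.35, 0.40, 0.70, 0.90]
y = [0,    0,    1,    0,    1,    1]
t = best_threshold(p, y)   # the 4:1 FN cost pulls the cutoff down to 0.35
```

Weighting false negatives more heavily drives the optimal cutoff below 0.5, which is exactly the pattern in the Table 2 thresholds (the 4 times cost cutoffs are all lower than the equal cost ones).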

4.4. Model Coefficients and MATLAB Code

New predictions can be made using the coefficients of these classifier models. Logits (the natural log of the odds) predicted by the LGC models are calculated by taking the linear combination of the coefficients (Table 2) multiplied by the input variables. The predicted logits are converted to probabilities using the backtransformation:

Figure 5. Validation of models at the threshold where the cost of a false negative is 4 times the cost of a false positive. (a) True positive rates for the test data set for each model type (REG, ARMAX, LGC, MLP, and RNN) and for the three predictor sets. (b) Overall accuracy rates.

Table 2. Coefficients (Unstandardized) and Optimal Thresholds of the LGC Models

Model                                        KpDstSunspot   Full without    Full with
                                                            previous flux   previous day's flux
Constant                                      3.7660         8.5228          2.8428
Kp × 10                                       0.08577        0.06816         0.04886
Dst                                          −0.02422       −0.02820        −0.03609
Sunspot number                               −0.006885      −0.007238       −0.007767
|B|                                          —              −0.1890         −0.07066
Bz (gsm)                                     —               0.06948         0.05734
V                                            —               0.01493         0.01698
N                                            —               0.02679         0.1296
P                                            —              −0.1743         −0.2439
Ey                                           —               0.4920          0.4935
Previous day's flux                          —              —                2.6527
Equal cost optimal probability threshold      0.2644         0.4000          0.3456
4 times cost optimal probability threshold    0.1501         0.1356          0.1067

Note. All predictors are from the previous day. Thresholds in the last two rows are the probability decision cutoffs. An observation is classified as a flux event if the predicted probability output by the model is above this threshold.


Pr(event) = e^logit / (1 + e^logit)    (1)

where Pr(event) is the predicted probability of a flux event and logit is the predicted logit from the LGC. Days are then classified as an event if the predicted Pr(event) > the optimal probability threshold (last two rows of Table 2). (Note that this is not the 60th percentile cutoff that was originally used to determine events versus nonevents in the training set.)
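This prediction recipe (linear combination, backtransformation, threshold) can be sketched in Python. The coefficients and inputs below are hypothetical placeholders, not the Table 2 values, since applying those correctly requires the exact variable scalings used in the paper.

```python
import math

def lgc_probability(inputs, coeffs, constant):
    """Logistic-classifier backtransformation of equation (1):
    logit = constant + sum(coef * x); Pr(event) = e^logit / (1 + e^logit)."""
    logit = constant + sum(c * x for c, x in zip(coeffs, inputs))
    return math.exp(logit) / (1.0 + math.exp(logit))

def classify(prob, threshold):
    """Flag a flux event when Pr(event) exceeds the optimal threshold."""
    return prob > threshold

# Hypothetical inputs (Kp, Dst, sunspot number) and coefficients:
prob = lgc_probability([4.0, -30.0, 55.0],
                       coeffs=[0.1, -0.02, -0.01], constant=-0.5)
event = classify(prob, threshold=0.2644)  # equal-cost KpDstSunspot cutoff
```

A logit of zero maps to a probability of exactly 0.5, and the function is bounded in (0, 1), so no nonsensical probabilities can be produced.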

A table of optimal thresholds is included in the supporting information (Table S1). The RNN trained networks are provided as MATLAB network files (supporting information Data Set S2). MLP and RNN coefficients and transformations are incorporated in MATLAB programs that output probabilities for all three variable sets (KpDstSun, Full without previous flux, and Full with previous flux) (supporting information: Data Sets S3 and S4). These programs also output the LGC probabilities. Output classifies observations according to both the equal cost optimal threshold and the case where the cost of a false negative is 4 times that of a false positive. Data Set S4 does not include the RNN models and can be run in a MATLAB environment where the Neural Networks Toolbox is not available. A sample data set is also provided (Data Set S5). Note that while these models provide predictions for every day, they are only valid for days when the flux was not already high on the day before.

5. Discussion

We improve predictive models of high-energy electron flux events by considering only those days on which flux could rise (ignoring those days on which flux is already high) and by using classifier models (LGC, MLP, and RNN). We improve model assessments by comparing models via AUC (area under the ROC curve) and true positive rate values instead of the more traditional correlation and R2.

While the LGC (logistic) model is a linear classifier, unlike the MLP and RNN models which can incorporate nonlinearities as needed, and while the RNN models include predictor effects from the prior 7 days, all classifier model types had similar AUC, indicating similar ability to discriminate between events and nonevents. There was, however, some improvement in TPRs in the full model, with LGC performing the worst and RNN the best. This suggests, first, that nonlinear effects of predictors previously described in this system (Reeves et al., 2011; Wing et al., 2016) can be well characterized by the use of a set of, rather than single, predictors and by the use of several predictors that incorporate the nonlinear nature of the system. Using several predictors simultaneously, such as N, V, and |B| in the same analysis, may account for some of the nonlinearity observed in single variable correlations. Additionally, there are specific parameters that could be described as polynomial or multiplicative interaction terms. These include pressure (which is both a square term of solar wind velocity and an interaction term describing the possible synergistic effects of velocity and number density) and Ey (which is an interaction term describing synergism between velocity and Bz). Additional nonlinearity in the models (introduced by the MLP and RNN methods) increases predictive ability somewhat, but it is apparent that most of the previously described nonlinearities are already accounted for by simultaneous analysis of predictors and by the use of several predictors that are descriptors of the nonlinear nature of, for example, N and V. Second, predictor effects can be described reasonably well by the most recent measurements without the need for older lags. All three classifier model types performed similarly in validation. However, the RNN full models (including previous lags) do provide some improvement in TPR.

Previous feed-forward neural network models (similar to MLP models) developed using 7 days of Kp and 10 days of previous flux as inputs (Ling et al., 2010) or previous flux, Dst, Kp, and ULF waves over several previous days (O'Brien & McPherron, 2003) performed well. We have improved on this type of model by limiting data points to only those where flux could rise (i.e., removing days where flux is already high). Our results, indicating that previous flux is not a necessary input and that data from only the previous day can produce a reasonable model, mean that we could drop these extra factors. This results in a more stable model that is less likely to suffer from the problems of overfitting that can lead to poor forecasts.

Regression models predicting values of flux (REG and ARMAX) appeared effective based on R2 and validation correlation statistics. However, TPRs from these models were low. Although the hypothesis testing available with REG and ARMAX models is useful in determining which predictors are important drivers, the values they output do not lend themselves well to classification. In addition, the statistics often used to evaluate


these model types (timeplots of flux versus predictions, R2, prediction efficiency, or validation correlation) do not adequately describe the ability of these models to classify events. The R2, which is mathematically equivalent to the prediction efficiency when applied to the training data set, measures the amount of variation in the data explained by the model. In a sense, it validates the model using the same data as that used to create the model. This often results in a higher statistic than a validation correlation between testing set data and model predictions. Prediction efficiency, if applied to the model predictions in the test data set, will give similar results as validation correlation. Previous studies that have used these criteria to assess models appear to show good model fit, but they are misleading. Timeplots of regression models, for example, show predictions that follow the general outline of flux observations but fail to predict the high values that constitute a flux event following low flux days (Baker et al., 1990; Simms et al., 2016). High R2 values can show how well the model fits the training set data but are not particularly suited to determining how well the model will predict a novel data set (Simms et al., 2016). Validation correlations may be high, but this may only be an indication that the general patterns of flux increases and decreases are being described well, not that there is good tracking to the highest values. Also, incorporating previous flux into these models can greatly inflate the validation correlation without any real increase in predictive ability. This is particularly true if all days are included, as the model success will be quite high on those days where flux persistence is a major factor (Balikhin et al., 2016; Simms et al., 2016).

We experimented with one nonlinear ARMAX model, as suggested by the NARMAX model of Balikhin et al. (2011), by allowing the automated procedure to choose not only from the main factor solar wind, IMF, and magnetospheric variables but also from their squares and interactions. The Balikhin models, however, introduced only a single square term or single interaction term at a single lag in any given model. By allowing (but not requiring) all nonlinear terms and all lags to be included, we expected an improvement in the prediction ability. However, the TPR of this nonlinear ARMAX model was only slightly higher than the main effects only model, and a greater improvement was achieved by using a classifier model (LGC). As the largest improvement seems to be due to using a classifier model instead of a model that predicts values, it is likely that adding nonlinear terms above the second order to the ARMAX model would have little benefit. Further, training neural networks that incorporate all necessary nonlinearities (MLP) and lags beyond 1 day (RNN) will accomplish the goal of a nonlinear, time-dependent model more efficiently. Both these last two model types increased TPR in the full model beyond that seen in the linear, one day lag LGC model.

Of the three classifier model types (LGC, MLP, and RNN), there was no clear frontrunner based on the AUC criterion alone. Depending on the input variable set used and the optimal threshold chosen, some models performed better in the TPR validation; but as the ranking of model type varies within these subgroups, the choice should be made on the basis of what data will be available for forecasting, what cost function is postulated, and the simplicity of the model. For example, if only ground-based data are available (KpDstSunspot model), the LGC model would likely be the best choice as it is the simplest and does not differ substantially from the RNN model in either AUC or the validation TPR. If the cost of a false negative is presumed to be 4 times that of a false positive, the LGC model is the better choice no matter what variable set is used.

We used three input variable sets: (1) ground-based data only (Kp, Dst, and sunspot number), (2) all variables except previous flux (|B|, Bz, V, N, P, Ey, Kp, Dst, and sunspot number), and (3) all variables plus previous flux. We were therefore able to compare whether models with more variables provided better forecasting. While the ground-based KpDstSunspot model performed slightly worse than the other two input sets, the difference was not substantial. The AUC (measuring the ability to discriminate between classes) of the ground-based models was statistically the same as that of the full models without previous flux. There would not, therefore, be any great disadvantage to using the ground-based only model if other data were not available. Similarly, while adding previous day's flux usually produced a small increase in predictive ability, the difference is so slight that there should be no reservations about using the full model without previous flux if those are the only data available. This is in contrast to previous models, in which previous flux is an important predictor that must be available for good predictions. However, this is not the only reason for dropping previous flux as an input variable. It appears that output from models relying on this variable tends to lag behind actual flux observations (Boynton et al., 2015; Simms et al., 2016). It is perhaps not surprising that previous flux is an unnecessary input to our models, given how they are constructed.


As flux is dependent on the parameters input from the previous day(s), it is possible that most of the information present in the "previous flux" term is described almost as well by the lagged values of the other predictors. Besides this, previous models have found a high dependence on previous flux because they include a persistence factor. As we have removed persistence, by only considering those days on which flux has not yet risen, we would expect that any correlation of flux measured on consecutive days would mostly disappear.

Correlations between flux and some of these variables may be higher on the same day than if the predictors are measured one day previous. For example, an increase in N or pressure may rapidly decrease flux observed at GOES as the radiation belts are compressed below geosynchronous orbit (Shprits et al., 2006). If we were constructing models to better understand the physical processes, we would include measurements of these variables on the same day as the flux observation. However, because we are producing a predictive model, we cannot include same-day or nowcast predictors.

Many of these predictors may also operate more actively before a 1-day lag. Many solar wind parameters appear to drive waves in the magnetosphere, which are the direct drivers of electron acceleration. As a result, the solar wind parameters may correlate better with flux increases if measured two or more days in the past, allowing time for wave enhancements that subsequently drive the flux enhancements. We incorporated these greater lags in the ARMAX and RNN models. The ARMAX model, as compared to the REG model which did not include additional lags, showed a modest improvement in predicting true positives; however, this may in part be due to the addition of terms that describe the intrinsic time series patterns of the flux itself. The RNN model, as compared to the LGC and MLP models which did not include additional lags, showed only modest (or no) improvement in the TPR.

While correlations may sometimes be higher between flux and solar wind and magnetospheric parameters measured more than a day before, and although the physical properties of the system may argue in favor of using greater lags, these models show that flux events can be successfully predicted using measurements from only the day previous. Additionally, these predictions can be made using parameters that are likely not the direct drivers of electron acceleration but only proxies for the processes (ULF and very low frequency [VLF] waves, substorms, etc.) that are thought to be the physical drivers. This allows predictions when immediate observations of the direct drivers are not available.

The validation correlation of our full regression model (correlating observations in the withheld test set with predictions from the model) was 0.71. A previous regression model incorporating waves (ULF and VLF) as well as many of the same solar wind and IMF parameters had an average validation correlation of 0.78 (Simms et al., 2016). The addition of possible direct physical drivers, therefore, only provided a slight improvement. This does not mean that the physical drivers (which are not included in our present models) are proven to have no association with electron flux, only that much of the information in observations of the physical drivers appears to also be present in the general IMF/solar wind parameter set. This slight improvement in validation correlation may suggest that the addition of such parameters as ULF and VLF wave activity and the levels of seed electrons to the classifier models might similarly result in a slight improvement in prediction of events. However, the focus of this study was producing and comparing predictive models that were only dependent on variables available in real time.
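A toy sketch (assumed helper names, not code from this paper) illustrates why a validation correlation like the 0.71 quoted here can coexist with poor event prediction: a regression that tracks the overall shape of the flux series but undershoots its peaks scores a high Pearson correlation yet calls none of the events.

```python
import numpy as np

def validation_correlation(obs, pred):
    """Pearson correlation between withheld observations and predictions."""
    return np.corrcoef(obs, pred)[0, 1]

def event_tpr(obs, pred, event_level):
    """Fraction of observed high-flux events (obs >= event_level) that the
    continuous predictions also place at or above the event level."""
    events = obs >= event_level
    n_events = np.sum(events)
    hits = np.sum(events & (pred >= event_level))
    return hits / n_events if n_events else float("nan")

# A model that follows the trend but undershoots the peaks (toy values):
obs = np.array([1.0, 2.0, 3.0, 10.0, 12.0, 11.0])
pred = np.array([1.5, 2.5, 3.5, 7.0, 8.0, 7.5])
r = validation_correlation(obs, pred)       # high: the shapes agree
tpr = event_tpr(obs, pred, event_level=10)  # zero: no peak is called
```

Here `r` is above 0.9 while the TPR is 0, which is the distinction the paper draws between value-predictive and classifier assessment.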

The models developed here only predict flux increases following days when flux is low. It would also be of interest to know when high, persistent relativistic electron flux was predicted to drop below damaging levels. However, this situation might be better modeled by nonlinear decay processes associated with other parameters. Electromagnetic ion cyclotron (EMIC) waves may be one such process contributing to flux reductions (Simms et al., 2018a), although it may be of more consequence in ultrarelativistic electron populations (Cao et al., 2017).

6. Conclusions

1. The purpose of this study is several‐fold:

a To create a number of models using different techniques, but the same data set, to compare predictive success rates. We have used two value‐predictive techniques (multiple regression and ARMAX) and three classifier techniques (linear‐classifying logistic regression and two non‐linear‐classifying neural networks).

10.1029/2019JA027357Journal of Geophysical Research: Space Physics

SIMMS AND ENGEBRETSON 13 of 16


b To incorporate only those predictors that are likely to be available on most days of the year so that prediction of damaging relativistic electron flux events can always be made. As a result, we do not use wave or lower energy electron data as these may only be available sporadically.

c To develop models that are not dependent on high‐energy flux measurements from the day before as these observations are not always available.

d To account for possible nonlinear effects.

e To only predict events on the days of interest: that is, on those days on which flux could rise, not on those days where electron flux is already at damaging levels. This may reduce the apparent success rate of the model, as it may be easy to predict that persistent flux will remain high for another day, but it focuses predictions on events of interest.

f To assess models with more accurate measures: Models should be assessed by comparing success rates at predicting events of high relativistic flux. Regression models, which predict values of flux, can have misleadingly high validation correlations which do not correspond to a high success rate of event prediction.

2. Classifier models such as logistic regression, MLP, or RNN, which predict group membership, have higher success rates than multiple regression or ARMAX at predicting events. Using validation with a test set, at the equal cost threshold, the RNN models performed best at prediction. The full model using nine solar wind, IMF, and magnetospheric index values performed better than the KpDstSun model. Including previous day's flux did not improve predictions.

3. Multiple and logistic regression are linear models that do not account for nonlinear relationships between input and output variables. Nonlinearity could be approximated with transformations and polynomial terms. This would provide information on which relationships were nonlinear and the nature of this nonlinearity. However, determining the best nonlinear terms to include is often a laborious process and is not guaranteed to result in an adequate model.

4. Neural networks, as they are not linear classifiers, are often more successful at approximating possible nonlinearity within neurons. (The major drawback is that there is little information about which relationships account for the explanatory power of the model.) An RNN, here built from long short‐term memory units, can also result in an efficient model incorporating input variables from many previous time steps.

5. In the full model, this incorporation of nonlinearities (MLP) and previous time steps (RNN) provided some improvement in the TPR over that of the logistic regression (a linear classifier using only the previous day's data).

6. Many previous models have depended on continuous inputs of previous flux to achieve a high success rate. This may not be a feasible input as these data are not always available. We explored the use of ARMAX models, which model the time‐dependent behavior of the inputs and output, to produce a model which was not dependent on previous measurements. The ARMAX models did not produce satisfactory predictions. We discovered, however, that this inclusion of previous flux was not necessary in the classifier models.

7. The optimal probability threshold for classification can be determined with ROC curves. We use two possible thresholds: one using equal costs for false negatives and positives and a second assuming the cost of a false negative (missing an event) is 4 times that of a false positive (predicting an event that does not happen). The unequal cost ratio produced predictions which were only slightly less accurate overall but were more likely to predict events correctly.

8. For a full model including |B|, Bz, V, N, P, Ey, Kp, Dst, and sunspot number as predictors, with an equal cost ratio, we achieved a success rate of 56.7–73.1% TPR (LGC, MLP, and RNN models). Compared to a TPR of 19.1% (REG) and 44.1% (ARMAX), this is a significant improvement. For the unequal cost ratio, TPR ranged from 73.1–85.1% (the highest being the LGC model). The ground‐observed‐only model (KpDstSunspot) was as good or not much worse (77.6% TPR), while the model including previous flux was slightly better (83.6–89.6% TPR).

9. The LGC model, despite its inability to account for nonlinearities and variable inputs from before one day previous, may be the preferred model to use if the unequal cost ratio is preferred. It is both more parsimonious and showed a higher validation TPR. At the equal cost ratio, the RNN model performed better.
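The cost-weighted threshold choice in item 7 can be sketched directly without drawing the full ROC curve: scan candidate probability thresholds and keep the one minimizing total misclassification cost, with a missed event weighted 4 times a false alarm. This is a minimal illustration under assumed names (`choose_threshold` is not from the paper); the candidate thresholds are simply the predicted probabilities themselves.

```python
import numpy as np

def choose_threshold(y_true, p_pred, cost_fn=4.0, cost_fp=1.0):
    """Pick the classification threshold minimizing total cost, where a
    false negative (missed event) costs cost_fn and a false positive
    (false alarm) costs cost_fp. Candidates are the observed predicted
    probabilities, i.e. the operating points of the ROC curve."""
    best_t, best_cost = 0.5, np.inf
    for t in np.unique(p_pred):
        called = p_pred >= t
        fn = np.sum((y_true == 1) & ~called)  # events missed
        fp = np.sum((y_true == 0) & called)   # false alarms
        cost = cost_fn * fn + cost_fp * fp
        if cost < best_cost:
            best_cost, best_t = cost, t
    return best_t
```

Raising `cost_fn` pushes the chosen threshold down, trading more false alarms for fewer missed events, which is the behavior the unequal-cost results describe.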


References

Alpaydin, E. (2014). Introduction to machine learning (3rd ed.). Cambridge, MA: MIT Press.

Baker, D. N. (2000). The occurrence of operational anomalies in spacecraft and their relationship to space weather. IEEE Transactions on Plasma Science, 28, 2007–2016.

Baker, D. N., McPherron, R. L., Cayton, T. E., & Klebesadel, R. W. (1990). Linear prediction filter analysis of relativistic electron properties at 6.6 RE. Journal of Geophysical Research, 95(A9), 15,133–15,140. https://doi.org/10.1029/JA095iA09p15133

Balikhin, M. A., Boynton, R. J., Walker, S. N., Borovsky, J. E., Billings, S. A., & Wei, H. L. (2011). Using the NARMAX approach to model the evolution of energetic electron fluxes at geostationary orbit. Geophysical Research Letters, 38, L18105. https://doi.org/10.1029/2011GL048980

Balikhin, M. A., Rodriguez, J. V., Boynton, R. J., Walker, S. N., Aryan, H., Sibeck, D. G., & Billings, S. A. (2016). Comparative analysis of NOAA REFM and SNB3GEO tools for the forecast of the fluxes of high‐energy electrons at GEO. Space Weather, 14, 22–31. https://doi.org/10.1002/2015SW001303

Bishop, C. M. (2006). Pattern recognition and machine learning. New York, NY: Springer Science+Business Media, LLC.

Boynton, R. J., Balikhin, M. A., & Billings, S. A. (2015). Online NARMAX model for electron fluxes at GEO. Annales Geophysicae, 33, 405–411. https://doi.org/10.5194/angeo-33-405-2015

Boynton, R. J., Balikhin, M. A., Sibeck, D. G., Walker, S. N., Billings, S. A., & Ganushkina, N. (2016). Electron flux models for different energies at geostationary orbit. Space Weather, 14, 846–860. https://doi.org/10.1002/2016SW001506

Cao, X., Shprits, Y. Y., Ni, B., & Zhelavskaya, I. S. (2017). Scattering of ultra‐relativistic electrons in the Van Allen radiation belts accounting for hot plasma effects. Scientific Reports, 7, 7. https://doi.org/10.1038/s41598-017-17739-7

DeTienne, K. B., DeTienne, D. H., & Joshi, S. A. (2003). Neural networks as statistical tools for business researchers. Organizational Research Methods, 6, 236. https://doi.org/10.1177/1094428103251907

Efron, B. (1979). Bootstrap methods: Another look at the jackknife. The Annals of Statistics, 7(1), 1–26. https://doi.org/10.1214/aos/1176344552

Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27, 861–874. https://doi.org/10.1016/j.patrec.2005.10.010

Gjerloev, J. W. (2012). The SuperMAG data processing technique. Journal of Geophysical Research, 117, A09213. https://doi.org/10.1029/2012JA017683

Graves, A. (2013). Generating sequences with recurrent neural networks. arXiv:1308.0850v5.

Hanley, J. A., & McNeil, B. J. (1982). A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology, 143, 29–36. https://doi.org/10.1148/radiology.148.3.6878708

Hyndman, R. J., & Athanasopoulos, G. (2018). Forecasting: Principles and practice (2nd ed., p. 291). Heathmont, Victoria, Australia: OTexts.

Kingma, D. P., & Ba, J. L. (2014). Adam: A method for stochastic optimization. arXiv:1412.6980.

Koons, H. C., & Gorney, D. J. (1991). A neural network model of the relativistic electron flux at geosynchronous orbit. Journal of Geophysical Research, 96, 5549. https://doi.org/10.1029/90JA02380

Krzanowski, W. J., & Hand, D. J. (2009). ROC curves for continuous data. London: Chapman and Hall.

Li, X. (2004). Variations of 0.7–6.0 MeV electrons at geosynchronous orbit as a function of solar wind. Space Weather, 2, S03006. https://doi.org/10.1029/2003SW000017

Ling, A. G., Ginet, G. P., Hilmer, R. V., & Perry, K. L. (2010). A neural network–based geosynchronous relativistic electron flux forecasting model. Space Weather, 8, S09003. https://doi.org/10.1029/2010SW000576

Makridakis, S. G., Wheelwright, S. C., & Hyndman, R. J. (1998). Forecasting: Methods and applications (3rd ed., p. 652). New York, NY: John Wiley and Sons.

Metz, C. E. (1978). Basic principles of ROC analysis. Seminars in Nuclear Medicine, 8(4), 283–298. https://doi.org/10.1016/S0001-2998(78)80014-2

Neter, J., Wasserman, W., & Kutner, M. H. (1985). Applied linear statistical models. Homewood, IL: Richard D. Irwin, Inc.

Newell, P. T., & Gjerloev, J. W. (2011). Evaluation of SuperMAG auroral electrojet indices as indicators of substorms and auroral power. Journal of Geophysical Research, 116, A12211. https://doi.org/10.1029/2011JA016779

O'Brien, T. P., & McPherron, R. L. (2003). An empirical dynamic equation for energetic electrons at geosynchronous orbit. Journal of Geophysical Research, 108(A3), 1137. https://doi.org/10.1029/2002JA009324

Potapov, A. S. (2017). Relativistic electrons of the outer radiation belt and methods of their forecast (review). Solar‐Terrestrial Physics, 3(1), 57–72. https://doi.org/10.12737/article_58f9703837c248.84596315

Potapov, A. S., Ryzhakova, L. V., & Tsegmed, B. (2016). A new approach to predict and estimate enhancements of "killer" electron flux at geosynchronous orbit. Acta Astronautica, 126, 47–51. https://doi.org/10.1016/j.actaastro.2016.04.017

Reeves, G. D., Morley, S. K., Friedel, R. H. W., Henderson, M. G., Cayton, T. E., Cunningham, G., et al. (2011). On the relationship between relativistic electron flux and solar wind velocity: Paulikas and Blake revisited. Journal of Geophysical Research, 116, A02213. https://doi.org/10.1029/2010JA015735

Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6(2), 461–464. https://doi.org/10.1214/aos/1176344136

Shprits, Y. Y., Thorne, R. M., Friedel, R., Reeves, G. D., Fennell, J., Baker, D. N., & Kanekal, S. G. (2006). Outward radial diffusion driven by losses at magnetopause. Journal of Geophysical Research, 111, A11214. https://doi.org/10.1029/2006JA011657

Simms, L., Engebretson, M., Clilverd, M., Rodger, C., Lessard, M., Gjerloev, J., & Reeves, G. (2018a). A distributed lag autoregressive model of geostationary relativistic electron fluxes: Comparing the influences of waves, seed and source electrons, and solar wind inputs. Journal of Geophysical Research: Space Physics, 123, 3646–3671. https://doi.org/10.1029/2017JA025002

Simms, L. E., Engebretson, M. J., Clilverd, M. A., Rodger, C. J., Lessard, M. R., & Reeves, G. D. (2018b). Nonlinear and synergistic effects of ULF Pc5, VLF chorus, and EMIC waves on relativistic electron flux at geosynchronous orbit. Journal of Geophysical Research: Space Physics, 123, 4755–4766. https://doi.org/10.1029/2017JA025003

Simms, L. E., Engebretson, M. J., Pilipenko, V., Reeves, G. D., & Clilverd, M. (2016). Empirical predictive models of daily relativistic electron flux at geostationary orbit: Multiple regression analysis. Journal of Geophysical Research: Space Physics, 121, 3181–3197. https://doi.org/10.1002/2016JA022414

Simms, L. E., Engebretson, M. J., Smith, A. J., Clilverd, M., Pilipenko, V. A., & Reeves, G. D. (2014). Prediction of relativistic electron flux following storms at geostationary orbit: Multiple regression analysis. Journal of Geophysical Research: Space Physics, 119, 7297–7318. https://doi.org/10.1002/2014JA019955

Wing, S., Johnson, J. R., Camporeale, E., & Reeves, G. D. (2016). Information theoretical approach to discovering solar wind drivers of the outer radiation belt. Journal of Geophysical Research: Space Physics, 121, 9378–9399. https://doi.org/10.1002/2016JA022711

Wing, S., Johnson, J. R., Jen, J., Meng, C.‐I., Sibeck, D. G., Bechtold, K., et al. (2005). Kp forecast models. Journal of Geophysical Research, 110, A04203. https://doi.org/10.1029/2004JA010500

Wu, J.‐G., & Lundstedt, H. (1996). Prediction of geomagnetic storms from solar wind data using Elman recurrent neural networks. Geophysical Research Letters, 23(4), 319–322. https://doi.org/10.1029/96GL00259

Acknowledgments

We thank two anonymous reviewers for their helpful suggestions on this work. GOES energetic particle data are available at the NOAA Space Weather Prediction Center (https://www.swpc.noaa.gov/products/goes-electron-flux). OMNIWeb data are available from the Goddard Space Flight Center Space Physics Data Facility (https://omniweb.gsfc.nasa.gov/ow.html). This work was supported by NSF Grant AGS‐1651263.
