www.elsevier.com/locate/ijforecast
International Journal of Forecasting 22 (2006) 43–56
Short-term prediction of wind energy production
Ismael Sanchez*
Universidad Carlos III de Madrid, Avd. de la Universidad 30, 28911, Leganes, Madrid, Spain
Abstract
This paper describes a statistical forecasting system for the short-term prediction (up to 48 h ahead) of the wind energy production of a wind farm. The main feature of the proposed prediction system is its adaptability. An adaptive prediction system is needed for two reasons. First, it has to deal with highly nonlinear relationships between the variables involved. Second, it must generate predictions for many different wind farms, since it is run by the system operator for efficient network integration. This flexibility is attained through (i) the use of alternative models based on different assumptions about the variables involved; (ii) the adaptive estimation of their parameters using different recursive techniques; and (iii) an on-line adaptive forecast combination scheme that produces the final prediction. The described procedure is currently implemented in SIPREOLICO, a wind energy prediction tool that is part of the on-line management of the Spanish Peninsular system operation.
© 2005 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.
Keywords: Adaptive estimation; Dynamic models; Forecast combination; Kalman filter; Nonparametric regression; Recursive least squares; Wind energy
0169-2070/$ - see front matter
doi:10.1016/j.ijforecast.2005.05.003
* Fax: +34 916249430.
E-mail address: [email protected].
1. Introduction
Wind energy has been the fastest growing energy technology of the last decade. In some countries, such as Germany, Denmark, and Spain, wind power is widely used; in Spain, more than 4% of the electricity comes from this source. However, in spite of the noticeable benefits of wind energy, this level of installed capacity can have unwanted consequences. Wind energy cannot be scheduled, so there will always be uncertainty about the final production. As a consequence, the uncertainty caused by the connection of many utilities to the grid can decrease the efficiency of the network system operation. Accordingly, both utilities and the system operator need accurate on-line forecasts of wind power production. In a liberalized electricity market, such a forecasting ability will help enhance the position of wind energy compared to other forms of energy.
This article describes a statistical forecasting system for wind energy prediction based on the adaptive combination of alternative dynamic models. The main feature of the forecasting system is its flexibility, which is needed for two main reasons. First, the system has to deal with highly nonlinear relationships between the variables involved. Second, it must generate predictions for many wind farms with different characteristics, since it is run by the system operator for efficient network integration.
Fig. 1. Hourly average wind speed and generated power in a wind farm in Spain. Panels (a) and (b) plot output power against velocity of wind; in each panel, periods 1 and 2 are consecutive.
For a given wind farm, the input variables are
the meteorological predictions of wind (velocity and
direction) for the next 48 h and past values of
output power. The forecasting system then has to supply, on an hourly basis, the predicted output power up to 48 h ahead. The prediction system
needs to operate on-line. Therefore, once some
preliminary off-line identification of the models is
made using data from selected sites, the system
needs to be flexible enough to adapt to (a) unfore-
seen changing relationships between the variables
involved and (b) alternative wind farms with mini-
mal or no calibration. This on-line flexibility also
requires recursive estimation procedures that must
be performed in reasonable time. For instance, the
Spanish system operator needs to calculate hourly
predictions for more than 200 wind farms. There is,
therefore, little time for doing the calculations for
each wind farm.
From a statistical point of view, wind energy data have some interesting features: (i) the relationship between the velocity of the wind and the generated power is highly nonlinear and, therefore, candidate predictors risk being reliable only within certain ranges of the data; and (ii) for a given velocity, this relationship is time-varying because it depends on other variables such as wind direction, local air density, local temperature variations, and local effects of clouds and rain. Since some of these variables are difficult to foresee or even measure, they might not be appropriately included in a model.
Fig. 1 shows some typical situations with wind energy data that help illustrate these features. In this figure, graphs (a) and (b) each show 200 consecutive hourly data points of wind velocity (hourly average) and generated power at a wind farm in Spain. The first 100 points are marked with a circle (o), whereas the last 100 points are marked with a plus symbol (+). These plots show that a model estimated using the first 100 points (period 1) will perform poorly when applied to the next 100 points (period 2) if no adaptation is allowed.
In Fig. 1(a), due to stronger winds in period 2, the wind park has reached its rated capacity. In this situation, the output power of the wind turbine rotor is maintained at an approximately constant level, or may even have to be reduced in order to avoid damage. These changes in the relationship between speed and power are regulated by a control system that might not be the same across all wind turbines. Besides, this control technology is constantly evolving (see, for instance, Ackerman & Soder, 2002). Therefore, such behaviour should be learned from data. The changes observed in Fig. 1(b) can be produced by changes in the power control of the wind turbines, changes in wind direction, or changes in other meteorological variables that affect the efficiency of the wind turbine. In both examples, an adaptive forecasting system is likely to perform better than a predictor based on a single model with constant parameters.
Let $p_{t+h}$ denote the output power of a wind farm at time $t+h$ and $\hat{p}^{C}_{t+h|t}$ the prediction generated using information up to time $t$. The proposed methodology for obtaining $\hat{p}^{C}_{t+h|t}$ is based on the following key features: (a) the use of alternative nonlinear models for obtaining a set of individual predictions; (b) the use of a highly adaptive estimation of the parameters; and (c) the construction of the final prediction using an adaptive combination of the competing predictions. The prediction can then be written as the following combination:
$$\hat{p}^{C}_{t+h|t} = \sum_{k=1}^{K} \phi^{(h)}_{tk}\, \hat{p}^{(k)}_{t+h|t}, \tag{1}$$
where $K$ is the total number of available predictions obtained from the alternative models, $\hat{p}^{(k)}_{t+h|t}$, $k = 1, \ldots, K$, are the individual predictions, and $\phi^{(h)}_{tk}$ is the time-varying weight given to model $k$. The alternative models used to obtain the individual predictions $\hat{p}^{(k)}_{t+h|t}$ are described in Section 2. Section 3 then discusses the alternative recursive estimation procedures. In Section 4, the adaptive forecast combination scheme is presented. Section 5 illustrates the proposed procedures with real data, and Section 6 concludes. A version of the methods described here is currently implemented in a prediction tool called SIPREOLICO (see Sanchez et al., 2002). SIPREOLICO is now operating at the control centre of Red Electrica de Espana (REE), the Spanish system operator. The predictions generated by SIPREOLICO are used for real-time operation, to solve constraints in the daily and intra-daily markets, to forecast the wind power for each distribution company, and to run wind power market simulations.
Fig. 2. (a) Example of a typical machine power curve in a wind tunnel and (b) empirical values of velocity of wind and output power in a real wind farm (average hourly values). Axes: velocity of wind (m/s) against output power (kW).
2. The competing dynamic models
We know from physics that the energy per unit time (power) of wind flowing at speed $v$ (m/s) through an intercepting area $A$ (m$^2$) is
$$p = \tfrac{1}{2}\rho A v^{3}, \tag{2}$$
where $\rho$ is the air density (kg/m$^3$), which, in turn, depends on the air temperature and pressure, among other factors. The real relationship between the power generated by the whole wind farm and the velocity of the wind can, however, be more complex than (2).
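As a quick numerical illustration of (2), the sketch below evaluates the power carried by the wind through a rotor's swept area. The air density of 1.225 kg/m³ and the 80 m rotor diameter are illustrative assumptions, not values taken from this paper, and the result is the power available in the wind: a turbine can extract only a fraction of it (at most about 59%, the Betz limit).

```python
import math


def wind_power_watts(v, area_m2, rho=1.225):
    """Power (W) of wind at speed v (m/s) crossing area_m2 (m^2), as in eq. (2).

    rho = 1.225 kg/m^3 is an assumed standard sea-level air density; in
    practice it varies with air temperature and pressure.
    """
    return 0.5 * rho * area_m2 * v ** 3


# Swept area of a hypothetical rotor with an 80 m diameter.
area = math.pi * 40.0 ** 2
for v in (5.0, 10.0):
    print(f"v = {v:4.1f} m/s -> {wind_power_watts(v, area) / 1e6:.2f} MW")
```

Because of the cubic term, doubling the wind speed multiplies the available power by eight, so small errors in a wind speed forecast can translate into large errors in predicted power.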
Fig. 2 illustrates this point. Fig. 2(a) is the so-called machine power curve (deterministic output power as a function of the input velocity of wind) of a particular wind turbine inside a wind tunnel, as it appears, for instance, in the information supplied by its manufacturer. From this machine power curve, we can see that below some minimum wind speed, called the connection speed, the wind turbine does not produce power. Above this connection speed, the power increases with the wind speed. The profile of this growing part of the power curve follows a pattern similar to that of (2), and it also depends on the particular technology of the wind turbine (see, e.g., Ackerman & Soder, 2002). When the speed reaches the so-called nominal speed, the power reaches the rated capacity of the wind turbine. Beyond the nominal speed, the output power is kept constant over some range of wind speeds, and there are alternative technical solutions for maintaining this constant level as the wind speed exceeds the nominal speed. Finally, when the wind speed surpasses the so-called disconnection speed, the wind turbine is disconnected to prevent damage from excessive wind.
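The shape of the machine power curve just described can be sketched as a piecewise function. The threshold speeds, the 2000 kW rated power, and the cubic ramp between the connection and nominal speeds are hypothetical choices made only for illustration (the ramp mirrors the cubic law in (2)); a real curve would come from the manufacturer's data, and this deterministic sketch ignores the control-system effects discussed next.

```python
def machine_power_curve(v, v_connect=3.5, v_nominal=13.0, v_disconnect=25.0,
                        rated_kw=2000.0):
    """Idealised machine power curve (kW) with hypothetical thresholds.

    Below the connection speed or at/above the disconnection speed the
    turbine produces nothing; between the nominal and disconnection speeds
    it is held at rated capacity; in between it follows an assumed cubic ramp.
    """
    if v < v_connect or v >= v_disconnect:
        return 0.0                                  # idle or disconnected
    if v >= v_nominal:
        return rated_kw                             # held at rated capacity
    frac = (v ** 3 - v_connect ** 3) / (v_nominal ** 3 - v_connect ** 3)
    return rated_kw * frac                          # growing part of the curve
```

Evaluating such a curve at forecast wind speeds is essentially the deterministic approach discussed below; the scatter in Fig. 2(b) is what it cannot capture.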
In spite of the deterministic relationship shown in Fig. 2(a), the empirical power curve obtained in the real operation of a wind farm is far from deterministic. Fig. 2(b) displays the empirical power curve for a whole wind farm over several months; each point represents the hourly average wind speed and the resulting average power. Fig. 2(b) reveals that the observed power varies widely for a given wind speed, and that the time-varying influence of other variables can have a substantial effect that cannot be neglected. We can thus conclude that the relationship between the wind velocity and the output power should be treated as a nonlinear, stochastic, and time-varying function of wind speed. There are many reasons for the empirical power curve to be stochastic. Some of them are related to the technology of the wind turbines (Ackerman & Soder, 2002; Bianchi, Mantz, & Christiansen, 2004). For instance, the power control system of a wind turbine can cause its connection and nominal wind speeds to vary, and also to differ from those of the remaining wind turbines of the park. Besides, the behaviour of the wind turbines when the wind speed increases is different from their behaviour when the wind speed decreases. Also, other meteorological variables, such as the wind direction or the temperature, can affect the efficiency of the wind turbine.
Some authors have proposed wind power forecasting systems based solely on transforming local wind predictions into power using the deterministic machine power curves. The most popular forecasting systems based on this deterministic approach are the Previento system (Beyer, Heinemann, Mellinghoff, Monnich, & Waldl, 1999; Focken, Lange, & Waldl, 2001) and the Prediktor system (Joensen, Giebel, Landberg, Madsen, & Nielsen, 1999; Landberg, 1994). A quick look at Fig. 2(b) explains why these systems can be improved upon by a statistical forecasting system based on dynamic models. This statistical approach to forecasting wind power underlies the Zephyr system, which is also popular (Landberg et al., 2002; see also Giebel, Landberg, Nielsen, & Madsen, 2001; Nielsen, Madsen, & Tofting, 1999). The Zephyr system uses a nonparametric model based on the local polynomial regression method of Cleveland and Devlin (1988) (see also Joensen, Madsen, Nielsen, & Nielsen, 1999; Nielsen, Joensen, Madsen, & Landberg, 2000). This nonparametric model uses the information of the observed power and the predictions of wind speed and direction.
In this article, a more sophisticated modelling ap-
proach is presented. It is based on the use of several
alternative models, instead of a single one. It should
be mentioned that some of the alternative models used
in the proposed system are similar to the model used
in the Zephyr system. The final prediction is made
through an adaptive linear combination of the alter-
native predictors, where the weights given to each
predictor are based on their actual forecasting performance. In order to avoid overfitting, the combination is performed using only the models with the best recent performance. Therefore, since the combination is preceded by model evaluation and comparison, many alternative models can be proposed, the only restriction being the computation time.
This combination scheme can be interpreted as a
model competition, where the winners are used to
obtain the final prediction.
To form the group of competing models, two different sets of models are proposed here, without dismissing the possibility of adding others in the future. The first type consists of dynamic linear models in which the relationship between power and wind is modelled using polynomials of different degrees, from linear to cubic, whose coefficients are estimated adaptively. The second type consists of nonparametric models based on local polynomial fitting, in a similar fashion to the Zephyr system. Since the prediction tool will be used in different wind farms, an important criterion in choosing a model is that it can be implemented without any previous calibration.
To ease adaptability, different models will be estimated for each prediction horizon; i.e., the parameters of the $h$-step-ahead predictor, $h > 1$, are obtained separately for each $h$ by minimising the mean squared error of prediction for that horizon. We will call these models multi-period-ahead models. Several authors have argued that these multi-period-ahead models can help improve forecasting performance, especially when a “true” model can be regarded as implausible and any model can only be seen as a useful approximation (see, e.g., Bhansali, 1996; Kang, 2003; and references therein). We now include a brief description of the alternative models, denoted M1 to M9.
The model M1 is a univariate multi-period-ahead autoregression of the form
$$p_{t+h} = a_{0,t} + P_t(k,c) + e^{(M1)}_{t+h|t}, \tag{3}$$
where
$$P_t(k,c) = \sum_{i=1}^{k} a_{i,t}\, p_{t+1-i} + a_{k+1,t}\, p_{t+h-c}, \tag{4}$$
$e^{(M1)}_{t+h|t}$ is the $h$-step-ahead prediction error of model M1, and $a_{i,t}$, $i = 0, 1, \ldots, k+1$, are time-varying parameters. A precise notation would also use a forecast horizon index, but for the sake of simplicity it is omitted. The term $p_{t+h-c}$ in (4) is intended to capture the daily cycle. If $h < 24-k$, then $c = 24$, and thus $p_{t+h-24}$ is the power generated 24 h before the target time $t+h$. If $h \ge 24-k$, then $c = h+k$, and the model is just a multi-period-ahead AR($k+1$). Experience with Spanish data suggests $k = 3$. This simple model can be considered an extension of the so-called persistence model ($\hat{p}_{t+h|t} = p_t$). It is of special interest if, for some reason, meteorological information is not available or is of low quality.
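To make the multi-period-ahead idea concrete, the sketch below builds the M1 regressors of (3) and (4) for a single horizon $h$ and fits them by ordinary least squares. This is a simplification: the paper estimates the parameters recursively (Section 3), and the helper names `m1_design` and `fit_m1` are hypothetical. Because each horizon gets its own design matrix, a separate coefficient vector is obtained for every $h$, rather than iterating a one-step model.

```python
import numpy as np


def m1_design(p, h, k=3):
    """Regressors and target of the multi-period-ahead autoregression M1.

    Each row, for a forecast origin t, is [1, p_t, ..., p_{t+1-k}, p_{t+h-c}]
    and the target is p_{t+h}, with c = 24 if h < 24 - k and c = h + k otherwise.
    """
    p = np.asarray(p, dtype=float)
    c = 24 if h < 24 - k else h + k
    first = max(k - 1, c - h)                 # earliest origin with every lag observed
    origins = np.arange(first, len(p) - h)
    lags = np.column_stack([p[origins - j] for j in range(k)])
    daily = p[origins + h - c]                # daily-cycle term p_{t+h-c}
    X = np.column_stack([np.ones(len(origins)), lags, daily])
    y = p[origins + h]
    return X, y


def fit_m1(p, h, k=3):
    """Batch least-squares fit for one horizon h (a stand-in for RLS or KF)."""
    X, y = m1_design(p, h, k)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef
```

With an hourly power series `p`, a dictionary such as `{h: fit_m1(p, h) for h in range(1, 49)}` would hold one coefficient vector per horizon up to 48 h.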
Models M2, M3, and M4 use the same parameterisation as M1 plus the information of the forecasted wind speed. These multi-period-ahead predictors have the form
$$p_{t+h} = a_{0,t} + P_t(k,c) + W_{t+h|t}(q) + e^{(Mm)}_{t+h|t}, \qquad m = 2, 3, 4, \tag{5}$$
with
$$W_{t+h|t}(q) = \sum_{i=1}^{q} b_{i,t}\, \hat{v}^{\,i}_{t+h|t}, \tag{6}$$
where $\hat{v}_{t+h|t}$ is the wind speed forecast made at period $t$ for period $t+h$ and $b_{i,t}$, $i = 1, \ldots, q$, are time-varying parameters. In M2 we use $q = 1$, in M3 $q = 2$, and in M4 $q = 3$. Models M5, M6, and M7 use the same parameterisation as M2, M3, and M4, respectively, plus the information of the forecasted wind direction. The wind direction carries valuable information. First, wind farms usually have some dominant directions in which the wind speed often moves within a narrow range, so the direction of the wind can be a predictor of the speed. Second, the performance of a wind turbine depends on the direction, owing to the effect of the terrain and to the so-called shadow effect of surrounding wind turbines. These multi-period-ahead models are
$$p_{t+h} = a_{0,t} + P_t(k,c) + W_{t+h|t}(q) + D_{t+h|t} + e^{(Mm)}_{t+h|t}, \qquad m = 5, 6, 7, \tag{7}$$
with
$$D_{t+h|t} = c_{1,t}\sin\!\left(\frac{2\pi\phi_{t+h|t}}{360}\right) + c_{2,t}\cos\!\left(\frac{2\pi\phi_{t+h|t}}{360}\right), \tag{8}$$
where $\phi_{t+h|t}$ is the forecasted wind direction (in degrees) and $c_{1,t}$ and $c_{2,t}$ are time-varying parameters. As in (6), the factor $W_{t+h|t}(q)$ uses $q = 1$ in model M5, $q = 2$ in M6, and $q = 3$ in M7. Models M5, M6, and M7 are expected to perform competitively when the available wind speed forecast is of low quality.
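The speed and direction terms of (6) and (8) are simply additional columns appended to the M1 regressors of (4). The helper below, whose name and `MODEL_SPEC` table are hypothetical, builds those columns for M2 to M7; encoding the direction through its sine and cosine makes 0° and 360° map to the same point.

```python
import numpy as np

# Model number -> (q, uses direction), as described for M2-M7.
MODEL_SPEC = {2: (1, False), 3: (2, False), 4: (3, False),
              5: (1, True), 6: (2, True), 7: (3, True)}


def wind_regressors(v_hat, phi_hat_deg, model):
    """Extra regressors that M2-M7 add to the M1 design.

    Columns are v_hat, v_hat**2, ..., v_hat**q (the terms of eq. (6)) and,
    for M5-M7, the sine and cosine of the forecast direction in degrees (eq. (8)).
    """
    q, use_direction = MODEL_SPEC[model]
    v_hat = np.asarray(v_hat, dtype=float)
    cols = [v_hat ** i for i in range(1, q + 1)]
    if use_direction:
        angle = 2.0 * np.pi * np.asarray(phi_hat_deg, dtype=float) / 360.0
        cols += [np.sin(angle), np.cos(angle)]
    return np.column_stack(cols)
```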
Finally, models M8 and M9 use the same parameterisation as the autoregression M1 plus a nonparametric predictor based on the daily cycle and another nonparametric predictor based on the wind prediction. The idea is similar to the conditional parametric ARX models of Nielsen et al. (2000). The functional form of M8 is
$$p_{t+h} = a_{0,t} + P_t(k,c) + F^{v}_{t+h|t} + F^{H}_{t+h|t} + e^{(M8)}_{t+h|t}, \tag{9}$$
with
$$F^{v}_{t+h|t} = \sum_{i=1}^{I} a_{i,t}\, h_{t+h+1-i}(v), \tag{10}$$
and
$$F^{H}_{t+h|t} = \sum_{j=1}^{J} b_{j,t}\, h_{t+h+1-j}(H), \tag{11}$$
where the $h_{t+h}(\cdot)$ are nonparametric functions based on local polynomials as proposed by Cleveland and Devlin (1988) and Fan and Gijbels (1996). The local polynomial fitting is implemented recursively as in Vilar-Fernandez and Vilar-Fernandez (1998). The parameters $a_{i,t}$ and $b_{j,t}$ are time-varying. In (10), $h_{t+h}(v)$ is the result of a nonparametric local linear regression in which the output power $p_{t+h}$ is explained as a linear function of the wind speed forecast $\hat{v}_{t+h|t}$, whereas in (11) the regressor $h_{t+h}(H)$ is the nonparametric prediction of $p_{t+h}$ obtained from local linear regressions with the hour at time $t+h$ as the only regressor. The functional form of M9 is
$$p_{t+h} = a_{0,t} + P_t(k,c) + F^{v\phi}_{t+h|t} + F^{H}_{t+h|t} + e^{(M9)}_{t+h|t}, \tag{12}$$
where
$$F^{v\phi}_{t+h|t} = \sum_{i=1}^{I} d_{i,t}\, h_{t+h+1-i}(v,\phi), \tag{13}$$
with $h_{t+h}(v,\phi)$ being a nonparametric function similar to $h_{t+h}(v)$ in (10), based on local linear regressions that use the wind speed and the wind direction as regressors, and $d_{i,t}$ being time-varying parameters. Experiments with Spanish data recommend $I = 3$ and $J = 2$. The computation of the nonparametric functions $h_{t+h}(v)$, $h_{t+h}(H)$, and $h_{t+h}(v,\phi)$ is very time consuming. Therefore, to obtain a feasible implementation, they are only evaluated at a specified grid of fitting points and then interpolated at the remaining points. This specification, together with other estimation aspects such as bandwidths and smoothing factors, makes the method site-dependent. To handle this dependency, models M8 and M9 can be calibrated from the analysis of some selected sites, but their optimal implementation would need a specific analysis at each site using historical data.
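A batch sketch of a function like $h_{t+h}(v)$ follows: a local linear regression with tricube kernel weights (the kernel used in Cleveland-style loess), evaluated only on a grid of fitting points and linearly interpolated elsewhere, as the text describes. The paper's implementation is recursive (Vilar-Fernandez & Vilar-Fernandez, 1998) and its bandwidths and grid are calibrated per site; the function names, kernel, bandwidth, and grid here are assumptions.

```python
import numpy as np


def local_linear(x, y, x0, bandwidth):
    """Local linear estimate of E[y | x = x0] with tricube kernel weights."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    u = np.abs(x - x0) / bandwidth
    w = np.where(u < 1.0, (1.0 - u ** 3) ** 3, 0.0)
    X = np.column_stack([np.ones_like(x), x - x0])    # regressors centred at x0
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef[0]                                    # intercept = fitted value at x0


def grid_smoother(x, y, grid, bandwidth):
    """Fit only at the grid points and linearly interpolate everywhere else."""
    grid = np.asarray(grid, dtype=float)
    fitted = np.array([local_linear(x, y, g, bandwidth) for g in grid])
    return lambda x_new: np.interp(x_new, grid, fitted)
```

`np.interp` requires an increasing grid and holds the end values constant outside it, so the grid should span the range of forecast wind speeds.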
As mentioned above, one of the motivations for the
previous models is to build a flexible prediction sys-
tem to be used in different locations. If the goal is to
build a prediction tool to be used only in a single wind
farm, more site-dependent models like neural net-
works could also be proposed (Dutton, Kariniotakis,
Halliday, & Nogaret, 1999; Kariniotakis, Stavrakakis,
& Nogaret, 1996). Neural networks can be an efficient
procedure for dealing with nonlinearities. These pro-
cedures, however, would need long periods of time to
tune the algorithms to specific local conditions.
3. The recursive adaptive estimation
All the parameters involved in the nine models proposed above need to be estimated recursively. Such an estimation procedure is discussed in this section. There are two main reasons for using adaptive estimation for wind power forecasting. First, the nonlinear behaviour can make any proposed model a valid approximation only within a certain span of data, as illustrated in Fig. 1(a). Second, the power curves change through time in response to, for instance, meteorological changes, as seen in Fig. 1(b). Thus, the parameters cannot be regarded as constants and should be adapted as new information becomes available, as proposed by Grillenzoni (1994) and Sanchez (in press). In the proposed forecasting system, and in order to adapt better to diverse situations, the recursive estimation is performed using two alternative procedures: recursive least squares (RLS) and the Kalman filter (KF). RLS is performed in such a way that the adaptability of the estimates is larger when the system is changing quickly and smaller when the parameters are changing slowly or remain constant. KF, on the other hand, is performed assuming that the parameters are always evolving very slowly. Hence, at the end of the estimation process the number of competing predictors is doubled. A generic time-varying model for the dependent variable $p_t$ can be written as
$$p_t = z_t'\beta_t + a_t, \tag{14}$$
with $\beta_t$ a $k \times 1$ vector of time-varying parameters and $z_t$ a $k \times 1$ vector of input variables that can be either stochastic or deterministic. The RLS estimator is
$$\hat{\beta}^{(RLS)}_t = \hat{\beta}^{(RLS)}_{t-1} + \hat{\Sigma}^{(RLS)}_t z_t \hat{a}_t, \tag{15}$$
with $\hat{a}_t = p_t - z_t'\hat{\beta}^{(RLS)}_{t-1}$ being the prediction error. The matrix $\hat{\Sigma}^{(RLS)}_t$ is the so-called gain matrix or weighted covariance matrix, and is a measure of the dispersion of the estimate $\hat{\beta}^{(RLS)}_t$. This matrix can be obtained recursively using the well-known result (see, e.g., Grillenzoni, 1994)
$$\hat{\Sigma}^{(RLS)}_t = \frac{1}{\lambda_t}\left(\hat{\Sigma}^{(RLS)}_{t-1} - \frac{\hat{\Sigma}^{(RLS)}_{t-1} z_t z_t' \hat{\Sigma}^{(RLS)}_{t-1}}{\lambda_t + z_t'\hat{\Sigma}^{(RLS)}_{t-1} z_t}\right). \tag{16}$$
The parameter $\lambda_t$ is the so-called forgetting factor and satisfies $0 < \lambda_t \le 1$. The above RLS algorithm minimizes the weighted criterion $S^2_t(\beta) = (p_t - \beta' z_t)^2 + \lambda_t S^2_{t-1}(\beta)$, which shows that the sequence of forgetting factors $\lambda_t$ is the key feature of this adaptive procedure: the smaller the value of $\lambda_t$, the lower the influence of past data on the estimation. Typically, the choice of the forgetting factor is a compromise between the ability to track changes in the parameters and the need to reduce the variance of the prediction error. This choice is very important, since it has a substantial effect on the efficiency of the predictions. Most applications use a constant forgetting factor, typically in the range $0.950 \le \lambda \le 0.999$.
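A minimal NumPy sketch of one RLS step, written directly from (15) and (16). The function name is hypothetical, and the check below assumes a diffuse start (a large multiple of the identity as the initial gain matrix), which is a common convention rather than something specified in this paper.

```python
import numpy as np


def rls_update(beta, Sigma, z, p, lam):
    """One recursive least squares step with forgetting factor lam, eqs. (15)-(16).

    beta: estimate beta_{t-1}; Sigma: gain matrix Sigma_{t-1};
    z: regressor vector z_t; p: new observation p_t.
    Returns (beta_t, Sigma_t, a_hat_t).
    """
    z = np.asarray(z, dtype=float)
    a_hat = p - z @ beta                                  # one-step prediction error
    Sz = Sigma @ z
    Sigma_new = (Sigma - np.outer(Sz, Sz) / (lam + z @ Sz)) / lam   # eq. (16)
    beta_new = beta + (Sigma_new @ z) * a_hat             # eq. (15)
    return beta_new, Sigma_new, a_hat


# Usage: recover p = 1 + 2x from noiseless data with a constant lam = 0.99.
beta, Sigma = np.zeros(2), 1e6 * np.eye(2)
for x in np.linspace(0.0, 5.0, 50):
    beta, Sigma, _ = rls_update(beta, Sigma, np.array([1.0, x]), 1.0 + 2.0 * x, lam=0.99)
```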
Here we will use an adaptive forgetting factor, where the speed of adaptation of the estimates is related to the characteristics of the data. The adaptive forgetting factor will be the so-called $\lambda^{Cook}_t$ proposed by Sanchez (in press), who showed that it provides better adaptation features than some other popular alternatives. In particular, it is able to adapt to common situations occurring with wind energy data, such as those described in Fig. 1(a) and (b). The proposed $\lambda^{Cook}_t$ is based on Cook's distance, which measures the influence of the new data and translates that influence into an adaptive forgetting factor. From Sanchez (in press), Cook's distance for the time-varying model (14) can be written as
$$C_t = \frac{z_t'\hat{\Sigma}^{(RLS)}_{t-1} z_t\,\left(p_t - z_t'\hat{\beta}^{(RLS)}_{t-1}\right)^2}{\hat{\sigma}^2_{t-1}\left(1 + z_t'\hat{\Sigma}^{(RLS)}_{t-1} z_t\right)}, \tag{17}$$
where $\hat{\sigma}^2_{t-1}$ is a consistent estimator of $E(a_t^2)$, for instance
$$\hat{\sigma}^2_{t-1} = \frac{1}{t-1}\sum_{i=1}^{t-1}\left(p_i - z_i'\hat{\beta}^{(RLS)}_i\right)^2.$$
To evaluate the value of $C_t$ in (17), it can be compared with the $\chi^2$ distribution with $m$ degrees of freedom. Let us denote the survivor function of the $\chi^2_m$ distribution as $S_t \equiv S_t(C_t)$, that is, $S_t(C_t) = P(\chi^2_m > C_t)$. Several adaptive forgetting factors can be proposed using this survivor function. Based on the empirical performance reported in Sanchez (in press), the RLS estimation of our forecasting system will be performed with the following adaptive forgetting factor:
$$\lambda^{Cook}_t = \min\left[\max\left(0.7,\ S_t\right),\ 0.999\right].$$
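The adaptive forgetting factor can be sketched as follows. Here `a_hat` is the prediction error $p_t - z_t'\hat{\beta}^{(RLS)}_{t-1}$ and `sigma2_prev` is $\hat{\sigma}^2_{t-1}$. The $\chi^2$ survivor function is coded with standard closed forms for integer degrees of freedom to avoid extra dependencies (SciPy's `scipy.stats.chi2.sf` would serve equally well). Taking $m$ equal to the number of parameters is our assumption, and the function names are hypothetical.

```python
import math

import numpy as np


def chi2_sf(x, m):
    """P(chi^2_m > x) for an integer m >= 1, via closed forms (standard library only)."""
    if x <= 0.0:
        return 1.0
    if m % 2 == 0:                                   # even m: finite Poisson sum
        term = total = 1.0
        for j in range(1, m // 2):
            term *= (x / 2.0) / j
            total += term
        return math.exp(-x / 2.0) * total
    total = math.erfc(math.sqrt(x / 2.0))            # odd m: start from the m = 1 case
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    for j in range(1, (m - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def cook_forgetting_factor(z, a_hat, Sigma_prev, sigma2_prev, m):
    """Cook's distance C_t of eq. (17), mapped to lambda_t = min[max(0.7, S_t), 0.999]."""
    z = np.asarray(z, dtype=float)
    g = z @ Sigma_prev @ z                           # z_t' Sigma_{t-1} z_t
    C = g * a_hat ** 2 / (sigma2_prev * (1.0 + g))
    S = chi2_sf(C, m)
    return min(max(0.7, S), 0.999), C
```

An observation with little influence gives $S_t$ close to 1, so $\lambda_t = 0.999$ and the estimates barely move; a highly influential one gives $S_t$ close to 0, so $\lambda_t = 0.7$ and past data are discounted quickly. The resulting value would then be passed as `lam` to an RLS step such as (15) and (16).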
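For the second estimation procedure, based on KF, we will assume that the parameter vector $\beta_t$ evolves slowly, following a random walk with small variance; that is,
$$\beta_t = \beta_{t-1} + \varepsilon_t,$$
with $E(\varepsilon_t\varepsilon_t') = \sigma^2_\varepsilon I_m$, where $I_m$ is the identity matrix and $\sigma^2_\varepsilon$ a small positive constant. Under this assumption, it is known that KF gives a linear, unbiased, minimum-error-variance recursive algorithm to optimally predict the new parameter value. This algorithm can be written as
$$\hat{\beta}^{(KF)}_t = \hat{\beta}^{(KF)}_{t-1} + \kappa_t\left(p_t - z_t'\hat{\beta}^{(KF)}_{t-1}\right),$$
$$\kappa_t = \hat{\Sigma}^{(KF)}_{t-1} z_t\left(z_t'\hat{\Sigma}^{(KF)}_{t-1} z_t + 1\right)^{-1},$$
$$\hat{\Sigma}^{(KF)}_t = \hat{\Sigma}^{(KF)}_{t-1} - \frac{\hat{\Sigma}^{(KF)}_{t-1} z_t z_t'\hat{\Sigma}^{(KF)}_{t-1}}{1 + z_t'\hat{\Sigma}^{(KF)}_{t-1} z_t} + \sigma^2_\varepsilon I_m.$$
Experience with Spanish data suggests $\sigma^2_\varepsilon = 10^{-20}$.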
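A sketch of the KF recursion above, in which the observation noise variance is normalised to one. The name `kf_update` and the diffuse initial gain matrix in the check are assumptions; the default `sigma2_eps = 1e-20` simply mirrors the value reported for the Spanish data.

```python
import numpy as np


def kf_update(beta, Sigma, z, p, sigma2_eps=1e-20):
    """One Kalman filter step for a random-walk parameter vector.

    beta, Sigma: previous estimate and its (scaled) covariance matrix;
    z: regressor vector z_t; p: new observation p_t;
    sigma2_eps: variance of the random-walk increments.
    """
    z = np.asarray(z, dtype=float)
    Sz = Sigma @ z
    denom = 1.0 + z @ Sz
    kappa = Sz / denom                                    # gain kappa_t
    beta_new = beta + kappa * (p - z @ beta)
    Sigma_new = Sigma - np.outer(Sz, Sz) / denom + sigma2_eps * np.eye(len(z))
    return beta_new, Sigma_new


beta, Sigma = np.zeros(2), 1e6 * np.eye(2)
for x in np.linspace(0.0, 5.0, 50):
    beta, Sigma = kf_update(beta, Sigma, np.array([1.0, x]), 1.0 + 2.0 * x)
```

With `sigma2_eps` this small, the step is numerically almost the same as RLS with $\lambda_t = 1$, which is why the KF variant suits periods of stable weather, while RLS with $\lambda^{Cook}_t$ reacts faster to change.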
The proposed RLS is more adaptive than this KF model, since it does not assume any structure for the evolution of the parameters. Moreover, the speed of adaptation in the RLS is itself time-varying, according to the information in the data. There is, however, also a case for using the proposed KF for wind power forecasting: it is not uncommon for meteorological variables to be quite stable in some regions and for some periods. For instance, low-pressure systems, which are usually associated with high winds, can affect a region for 2 to 3 days, whereas high-pressure systems, usually associated with lighter winds, can last even longer. Meteorological changes are thus so slow that 1 h can be a very small measurement unit. In these cases, a model based on hourly data that assumes a random-walk evolution with small variance can be a parsimonious and efficient alternative to models that are prepared for any kind of contingency. Each of the models M1 to M9 is estimated, for each horizon, using both procedures. Therefore, we are using 18 alternative predictors for each horizon. The final prediction will be made through a linear combination of these competing predictions, which is discussed in the next section.
4. Adaptive forecast combination and the final prediction
When several candidate models are available to forecast a single variable, we can either select the best model or combine them. Regarding model selection, alternative procedures have been proposed in the literature, based either on selection criteria, like the popular AIC (Akaike, 1974) or BIC (Schwartz, 1978), or on testing procedures (see, e.g., Chen & Yang, 2002 and references therein). Forecast combination, on the other hand, is also a popular and important tool in time series forecasting, and a vast body of literature demonstrates its usefulness (see, e.g., Clemen, 1989; Yang, 2004). Forecast combination is especially advisable when there are doubts about the existence of a “best model”.
In this section, we will use the theory of forecast combination to combine our $K = 18$ alternative predictions into the final prediction of $p_{t+h}$. This final prediction can be written as
$$\hat{p}^{C}_{t+h|t} = \sum_{k=1}^{K} \phi^{(h)}_{tk}\, \hat{p}^{(k)}_{t+h|t}, \tag{18}$$
where $\phi^{(h)}_{tk}$ is the time-varying weight given to model $k$ and $K$ is the total number of competing predictors. Our forecast combination will be adaptive; i.e., the weight given to each model will evolve through time. Note that, through the combination of the competing models, we not only seek a dynamic adaptation but also ease the adaptation of the prediction tool to alternative wind farms, since the relative performance of the competing models can depend on the location.
Since the number of alternative predictors is large and their relative performance can be very different, we will only combine a subset of them. There is agreement among practitioners that poor forecasts should not be included in the combination (see, e.g., Bunn, 1985). The intuitive reason is that any predictor can perform well at some time just by chance. Thus, if all predictors are combined, even the poorest predictor receives a non-zero weight, to the detriment of better predictors, causing a loss of efficiency. Therefore, by combining only the best predictors, we can reduce the variability of the combined forecast, leading to a much better performance (Yang, 2004). The subset of selected predictors will be chosen according to their recent recorded performance. We will only combine the $d$ best predictors, where $d$ can be time-varying (see Swanson & Zeng, 2001, for alternative procedures for making this selection when the number of predictors is small and $d$ is fixed). Then, some of the $\phi^{(h)}_{tk}$ in (18) will be zero. If we always combined the same subset of predictors, we could build a recursive combination scheme using a regression with time-varying coefficients, as described in Section 3 (see also Diebold & Pauly, 1987; Sessions & Chaterjee, 1989; and Terui & van Dijk, 2002, for alternative recursive combinations with $d$ fixed). The idea of using as many predictors as possible and then selecting which of them will enter the final combination is similar to the nonparametric model proposed by Kohn et al. (2001). These authors build a nonparametric regression using linear combinations of basis functions (polynomials, splines, etc.). To ensure flexible estimates, the regression should include a large number of basis functions. Then, to avoid overfitting, they select the functions that will have a non-zero weight in the regression. The Bayesian hierarchical method proposed in Kohn et al. (2001) to solve this problem is, however, computationally expensive for on-line operation, since it needs to be solved by Monte Carlo simulation.
In order to propose a feasible adaptive combination procedure with a time-varying $d$, we first define a recursive on-line measure of forecasting performance. Once the new wind power production $p_t$ is available, we can compute the prediction errors $\hat{e}^{(i)}_{t|t-h} = p_t - \hat{p}^{(i)}_{t|t-h}$ of predicting $p_t$ from period $t-h$ using model $i = 1, 2, \ldots, K$. We then use the following weighted sum of products of prediction errors:
$$S^{(i,j)}_{t|t-h} = \hat{e}^{(i)}_{t|t-h}\hat{e}^{(j)}_{t|t-h} + \lambda S^{(i,j)}_{t-1|t-h-1} = \sum_{s=1}^{t}\lambda^{t-s}\hat{e}^{(i)}_{s|s-h}\hat{e}^{(j)}_{s|s-h}, \qquad i, j = 1, \ldots, K, \tag{19}$$
where $0 < \lambda < 1$ has the same interpretation as the forgetting factor above. We could even use an adaptive forgetting factor, $\lambda = \lambda_t$. We can then estimate the matrix of mean squared prediction errors (MSPE) using an exponentially weighted moving average MSPE (EW–MSPE). Let us denote this EW–MSPE matrix as $\hat{\Omega}_{t|t-h} \equiv \hat{\Omega}_{t|t-h}(\lambda)$, where the $(i,j)$ element is
$$\left(\hat{\Omega}_{t|t-h}\right)_{i,j} = S^{(i,j)}_{t|t-h}\left(\sum_{s=1}^{t}\lambda^{t-s}\right)^{-1}, \qquad i, j = 1, \ldots, K. \tag{20}$$
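A minimal sketch of the EW–MSPE recursion in (19) and (20), together with the ranking by diagonal EW–MSPE and the two weighting rules (21) and (22) that this section goes on to define. The class and function names are hypothetical, $\lambda$ is held fixed, $d$ is supplied by the caller rather than selected adaptively, and the full-matrix option assumes the selected block is invertible (highly correlated predictors can make it nearly singular).

```python
import numpy as np


class EwMspe:
    """Exponentially weighted MSPE matrix of K predictors, eqs. (19)-(20)."""

    def __init__(self, K, lam):
        self.lam = lam
        self.S = np.zeros((K, K))    # S^{(i,j)} of eq. (19)
        self.norm = 0.0              # sum_{s=1}^{t} lam^{t-s}

    def update(self, errors):
        """Add the newest vector of K prediction errors; return Omega_{t|t-h}."""
        e = np.asarray(errors, dtype=float)
        self.S = np.outer(e, e) + self.lam * self.S
        self.norm = 1.0 + self.lam * self.norm
        return self.S / self.norm    # eq. (20)


def combination_weights(Omega, d, full=True):
    """Keep the d predictors with smallest diagonal EW-MSPE and weight them.

    Returns (indices of the selected predictors, best first; weights summing to one).
    full=True uses the whole d x d block, as in (21); full=False uses inverse MSPEs, as in (22).
    """
    order = np.argsort(np.diag(Omega))[:d]           # best predictors first
    V = Omega[np.ix_(order, order)]                  # leading block of the sorted matrix
    if full:
        w = np.linalg.solve(V, np.ones(len(order)))  # V^{-1} c_d
    else:
        w = 1.0 / np.diag(V)                         # inverse EW-MSPE of each predictor
    return order, w / w.sum()
```

The combined forecast is then the weighted sum of the selected predictions, for example `w @ preds[idx]` for an array `preds` holding the $K$ current predictions.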
This matrix $\hat{\Omega}_{t|t-h}$ is similar to the covariance matrix proposed in Granger and Newbold (1986, p. 274). Let us denote by $\hat{V}_{t|t-h}$ the EW–MSPE matrix with the same information as $\hat{\Omega}_{t|t-h}$ but sorted by its diagonal elements in increasing order; i.e., the first row corresponds to the best predictor, and so on. Then the information on the best $d$ predictors is in $\hat{V}^{[d]}_{t|t-h}$, the upper-left $d \times d$ submatrix of $\hat{V}_{t|t-h}$. Note that if $d \equiv d(t)$, both the size of $\hat{V}^{[d]}_{t|t-h}$ and the predictors involved can vary through time. We adopt here the classical approach to optimal forecast combination, using weights that sum to one and are built from the elements of $\hat{V}^{[d]}_{t|t-h}$. There are typically two options for building the vector of combining coefficients: using the full matrix $\hat{V}^{[d]}_{t|t-h}$, or using only its diagonal (ignoring the correlation between the predictors). In the first option, the vector $\hat{b}^{[d]}_{t|t-h}$ of combining coefficients is
$$\hat{b}^{[d]}_{t|t-h} = \frac{\left(\hat{V}^{[d]}_{t|t-h}\right)^{-1} c_d}{c_d' \left(\hat{V}^{[d]}_{t|t-h}\right)^{-1} c_d}, \tag{21}$$
where $c_d$ is a vector of ones of length $d$. In the second option, the $i$-th element, $i = 1, \dots, d$, of the vector $\hat{b}^{[d]}_{t|t-h}$ of combining coefficients is
$$\hat{b}^{[d]}_{i,t|t-h} = \frac{\left(\hat{v}^{(i)}_{t|t-h}\right)^{-1}}{\sum_{l=1}^{d}\left(\hat{v}^{(l)}_{t|t-h}\right)^{-1}}, \tag{22}$$
where $\hat{v}^{(i)}_{t|t-h}$, $i = 1, \dots, d$, are the diagonal elements of $\hat{V}^{[d]}_{t|t-h}$, that is, the EW–MSPE of each selected predictor. Note that, in order to combine a set of predictors to predict $p_{t+h}$ from period $t$, the last available estimated EW–MSPE is $\hat{\Omega}_{t|t-h}$. Therefore, assuming that the best prediction of $\Omega_{t+h|t}$ is $\hat{\Omega}_{t|t-h}$, which is equivalent to assuming a random walk evolution for this random variable, the combination of the best $d$ predictors is
$$\hat{p}^{C[d]}_{t+h|t} = \sum_{i=1}^{d} \hat{b}^{[d]}_{i,t|t-h}\,\hat{p}^{[i]}_{t+h|t},$$
where $\hat{p}^{[i]}_{t+h|t}$, $i = 1, \dots, d$, is the predictor corresponding to row $i$ of the EW–MSPE matrix $\hat{V}^{[d]}_{t|t-h}$. We denote by $d^{(h)}_t$ the optimal number of these predictors to use in computing the final prediction $\hat{p}^{C}_{t+h|t}$. This number is estimated from the prediction performance of the different combining alternatives, that is, the performance of using $d = 1, \dots, K$. To evaluate the performance of the $K$ different combinations, we use the same definition of EW–MSPE as in (20), but the competing predictors are now $\hat{p}^{C[d]}_{t+h|t}$, $d = 1, \dots, K$. The prediction errors of these $K$ combinations are denoted by $\hat{e}^{C[d]}_{t|t-h} = p_t - \hat{p}^{C[d]}_{t|t-h}$, $d = 1, \dots, K$. Using a recursion similar to (19), we can compute the EW–MSPE of these prediction errors as
$$\hat{w}^{[d]}_{t|t-h} = S^{C[d]}_{t|t-h}\left(\sum_{s=1}^{t}\lambda^{t-s}\right)^{-1}, \qquad d = 1, \dots, K, \tag{23}$$
where $S^{C[d]}_{t|t-h} = \left(\hat{e}^{C[d]}_{t|t-h}\right)^2 + \lambda\, S^{C[d]}_{t-1|t-h-1}$. Hence, the estimated optimal combination uses the $\hat{d}^{(h)}_t$ best predictors, where $\hat{d}^{(h)}_t = \arg\min_{d} \hat{w}^{[d]}_{t|t-h}$. The final prediction is then
$$\hat{p}^{C}_{t+h|t} = \sum_{i=1}^{\hat{d}^{(h)}_t} \hat{b}^{[\hat{d}^{(h)}_t]}_{i,t|t-h}\,\hat{p}^{[i]}_{t+h|t}. \tag{24}$$
Note that, as with $\Omega_{t+h|t}$, we have used the random walk assumption for the evolution of $\left(e^{C[d]}_{t+h|t}\right)^2$, and therefore the best estimate of $E\left[\left(e^{C[d]}_{t+h|t}\right)^2\right]$ is $\hat{w}^{[d]}_{t|t-h}$.
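The combination step, that is, sorting the predictors by EW–MSPE to obtain $\hat{V}_{t|t-h}$, computing the weights (21) or (22) and forming $\hat{p}^{C[d]}_{t+h|t}$, can be sketched as below. This is a minimal NumPy sketch, assuming the EW–MSPE matrix and the individual forecasts are stored in the same (original) model order; the function names are hypothetical, and the numbers in the usage lines are toy values.

```python
import numpy as np


def sort_by_mspe(omega):
    """Order predictors by increasing EW-MSPE (diagonal of omega).

    Returns the permutation and the sorted matrix V, so that V[:d, :d]
    is the d x d block V^[d] for the d best predictors.
    """
    order = np.argsort(np.diag(omega), kind="stable")
    return order, omega[np.ix_(order, order)]


def weights_full(V, d):
    """Eq. (21): weights from the full d x d EW-MSPE block."""
    ones = np.ones(d)                        # c_d, a vector of ones
    x = np.linalg.solve(V[:d, :d], ones)     # (V^[d])^{-1} c_d, no explicit inverse
    return x / (ones @ x)                    # divide by c_d' (V^[d])^{-1} c_d


def weights_diag(V, d):
    """Eq. (22): inverse-MSPE weights that ignore cross-correlations."""
    inv = 1.0 / np.diag(V)[:d]
    return inv / inv.sum()


def combine(preds, omega, d, full=False):
    """Combined forecast p^{C[d]} of the d best predictors."""
    preds = np.asarray(preds, dtype=float)
    order, V = sort_by_mspe(omega)
    b = weights_full(V, d) if full else weights_diag(V, d)
    return b @ preds[order[:d]]


# Toy usage: model 2 has the lowest EW-MSPE, then model 3.
omega = np.diag([4.0, 1.0, 2.0])
preds = np.array([10.0, 12.0, 11.0])
print(combine(preds, omega, d=2))   # weights 2/3 and 1/3 on models 2 and 3
```

Both weight vectors sum to one. With strongly correlated predictors, (21) can produce negative weights, whereas (22) always yields nonnegative ones.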
As mentioned in Section 3, the random walk assumption for the evolution of our random variables has a physical justification, since atmospheric changes are very slow. From a practical point of view, this assumption does not seem to be restrictive when estimating $d^{(h)}_t$ and the optimal weights (22), since the data suggest that the position of the predictors in the sorted matrix $\hat{V}^{[d]}_{t|t-h}$ also evolves very slowly. If the position of the predictors in the performance ranking changed very quickly, then $\hat{w}^{[d]}_{t|t-h}$ would be a poor predictor of the performance of the combination of the best $\hat{d}^{(h)}_t$ predictors. On the other hand, if the position of the competing predictors in the ranking is stable, $\hat{w}^{[d]}_{t|t-h}$ will be a good predictor of the performance of the combination, and the coefficients $\hat{b}_{i,t|t-h}$ in (24) will lead to an efficient combination. Fig. 3 illustrates this point, showing a highly stable ranking of predictors over time. Fig. 3(a) shows the evolution of $\hat{d}^{(h)}_t$ for $h = 6$ using data from a wind farm in Spain (the details of the estimation are described in the next section), whereas Fig. 3(b) shows the evolution, over the same period, of the EW–MSPE of the 18 alternative predictors (dotted lines) at $h = 6$. To ease visualization, each EW–MSPE has been divided by the EW–MSPE of model M1 estimated with RLS. The solid line in Fig. 3(b) is the EW–MSPE of the final combination (24), computed with an MSPE estimate similar to (20). Fig. 3 shows that the position of the alternative predictors (dotted lines) in the ranking of the best predictors is very stable across time. Thus, when $\hat{d}^{(h)}_t$ remains constant, the combination tends to be based on the same predictors. The relative performance of the competing models also evolves slowly, which makes $\hat{d}^{(h)}_t$ highly stable.

[Fig. 3 plots. (a) "Evolution of optimal d for h=6": optimal d versus time. (b) "Evolution of the EW-MSPE of the alternative predictors (relative to M1 with RLS)": relative EW–MSPE versus time.]
Fig. 3. (a) Evolution of the optimal number of predictors to combine at h = 6. (b) Evolution of the relative EW–MSPE of the competing predictors (dotted lines) and the final combination (solid line). The values of EW–MSPE are relative to the EW–MSPE of model M1 estimated by RLS.
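The on-line choice of $\hat{d}^{(h)}_t$ through (23), whose evolution Fig. 3(a) displays, can be sketched as a tracker that keeps one exponentially weighted sum of squared errors per candidate size $d = 1, \dots, K$ and returns the arg min. This is a minimal sketch assuming NumPy; the class name is hypothetical, and the combined forecasts fed to it are assumed to come from the combination step for each $d$. The final prediction (24) is then that combination evaluated at $d = \hat{d}^{(h)}_t$.

```python
import numpy as np


class CombinationSizeSelector:
    """On-line estimate of the number of predictors to combine, eq. (23).

    For each d = 1..K it keeps the recursion
        S^{C[d]}_t = (e^{C[d]}_t)**2 + lam * S^{C[d]}_{t-1},
    together with the normaliser sum_s lam**(t - s).
    """

    def __init__(self, K: int, lam: float):
        if not 0.0 < lam < 1.0:
            raise ValueError("forgetting factor must satisfy 0 < lam < 1")
        self.lam = lam
        self.S = np.zeros(K)   # one entry per candidate size d = 1..K
        self.norm = 0.0

    def update(self, actual, combined_preds):
        """combined_preds[d - 1] is the h-step combined forecast p^{C[d]} of `actual`."""
        e = actual - np.asarray(combined_preds, dtype=float)
        if e.shape != self.S.shape:
            raise ValueError("need one combined forecast per d = 1..K")
        self.S = e ** 2 + self.lam * self.S
        self.norm = 1.0 + self.lam * self.norm

    def ew_mspe(self):
        """w^{[d]} for d = 1..K."""
        if self.norm == 0.0:
            raise ValueError("call update() at least once")
        return self.S / self.norm

    def best_d(self):
        """d-hat = argmin_d w^{[d]} (1-based; ties go to the smaller d)."""
        return int(np.argmin(self.ew_mspe())) + 1


# Hypothetical stream in which combining two predictors is consistently best.
sel = CombinationSizeSelector(K=3, lam=0.985)
for actual in [5.0, 6.0, 5.5, 6.5]:
    sel.update(actual, [actual + 1.0, actual + 0.2, actual - 0.6])
print(sel.best_d())
```

Because the sums forget old errors geometrically, $\hat{d}^{(h)}_t$ changes only when the relative accuracy of the combinations shifts for a sustained period, which is consistent with the stability discussed above.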
5. Some results
This section shows an application of the method
using data from a wind farm in Spain. These results
should only be considered as illustrative, since the
performance can depend on many factors that here
will be fixed: location of the wind farm, characteris-
tics of the wind turbines, calibration of the nonpara-
metric models (M8 and M9), and the quality of the
wind predictions. The accuracy of the wind predic-
tions is the most influential factor in wind power
forecasting. We will therefore use both real wind
measurements, obtained from the anemometer of the
wind farm, and wind predictions supplied by the
Spanish Meteorological Institute.
The data correspond to hourly average wind speed and direction, and average hourly power, measured from January to April 2002. There are 2800 data points when anemometer measurements are used and 2052 when wind predictions are used. The first 100 observations were used to obtain initial estimates for the subsequent application of RLS and KF. The on-line adaptive combination of the 18 alternative predictors was carried out by evaluating the EW–MSPE measures (20) and (23) with $\lambda = 0.985$, which is equivalent to an asymptotic memory length of 24 h. The combination coefficients were obtained using (22).
Fig. 4 shows the empirical MSPE of the proposed combination procedure at each horizon. For comparison, the figure also displays the MSPE of models M7 and M9 estimated by RLS (RLS and KF have very similar performance in this data set). These are the models that encompass all the characteristics of the proposed models, so M7 and M9 can be seen as the alternatives to a combination strategy. Both Fig. 4(a), based on real wind measurements, and Fig. 4(b), based on wind predictions, show that the final predictions have better overall performance than these individual models. Fig. 3, shown above, is also based on real wind measurements; it illustrates how the number of combined predictors changes as the relative accuracy of the competing predictors changes across time.

[Fig. 4 plots. (a) "MSPE of optimal combination and some individual models. Wind data: anemometer": MSPE (×10^6) versus prediction horizon for the combination, M9 and M7. (b) "MSPE of optimal combination and some individual models. Wind data: predictions": the same quantities.]
Fig. 4. MSPE of the optimal combination and models M7 and M9 estimated with RLS using (a) real wind data and (b) wind predictions.
In order to understand the role of the alternative
models in the final combination, Fig. 5 shows the
average combination coefficients of each model
when real wind measurements are used. Since we
are using real wind data, the relative performance of
model M1 is, as expected, very poor and, conse-
quently, its average combination coefficient is close
to zero. Therefore, we do not show its results here.
Fig. 5(a) displays the average coefficients of the
models that use the information of the wind (M2 to
M9). In this figure, we have merged the coefficients
corresponding to both estimation procedures, RLS
and KF. We have also merged the coefficients of
the models in which the use of the wind direction is
the only difference. This means summing up the
coefficients of M2 and M5 (models with a linear
function of the velocity), M3 and M6 (models with
a quadratic function of the velocity), M4 and M7
(models with a cubic function of the velocity), and
M8 and M9 (nonparametric models). Fig. 5(a) shows
that, among the fully parametric models (M2 to M7),
the models with a cubic power of the wind (M4+M7)
have the largest weight, which is equivalent to say-
ing that they are better predictors. This result is in
agreement with (2). It can also be seen that these
models (M4+M7) have performance comparable to
the nonparametric models (M8 and M9, first solid
line from the bottom). However, when we consider
all the parametric models (M2 to M7, first solid line
from the top), the aggregated combination coefficient
surpasses that of the nonparametric models. This allows us to conclude that, although nonparametric models are especially suited to dealing with nonlinearities, the combination of different polynomials of the wind velocity is also an efficient way to model the nonlinearity of this system and can make a significant contribution to the final combination.

[Fig. 5 plots. (a) "Average combination coefficient of each model": average combination coefficient versus prediction horizon for Parametric M2 to M7, Nonparametric M8+M9, Param. with v^1 (M2+M5), Param. with v^2 (M3+M6) and Param. with v^3 (M4+M7). (b) "Evolution of combination coefficient of model M9, at h=12, using RLS": combination coefficient versus time.]
Fig. 5. (a) Average combination coefficient of competing models. The dotted lines correspond to the models based on a polynomial of the wind velocity with different orders. The solid lines are the aggregation of all the parametric models M2 to M7 and of the nonparametric models M8 and M9, respectively. (b) Subsample of the evolution of the combination coefficient of model M9 at h = 12.
The combination coefficients of each model are time-varying so that they can adapt to the data; the coefficients shown in Fig. 5(a) are just their average values over the observed data points. Fig. 5(b) illustrates the evolution of one of these coefficients: it shows a portion of the evolution of the combination coefficient of model M9 when used to predict at horizon h = 12 with RLS as the estimation method. The coefficient is zero only for some short periods, and for most of the time it varies between 0.10 and 0.15. Since we are dealing with 18 alternative predictors, whose equal share would be 1/18 ≈ 0.056, a weight as large as 0.15 shows a significant contribution to the final combination.
6. Concluding remarks
Through the design of a very flexible recursive
forecasting system, we have obtained a useful short-term
prediction tool for wind power production, which may
also have other applications. The system is
based on an on-line time-varying forecast combina-
tion, where both the number of predictors and their
weights are time-varying. Both the competing models
and the estimation procedures have been selected in
such a way that a wide range of real situations can be
covered.
This statistical forecasting tool can also be ac-
companied by a set of numerical rules that intro-
duce non-random information about the wind park
which can affect predictions. An example of this
kind of information is the disconnection speed,
beyond which the wind turbines are disconnected
for safety reasons. Another example of a non-random
change that can affect predictions is a change in
the nominal power of the wind park, due to
changes in the number of wind turbines, mainte-
nance, and so on.
The described system has a modular framework
(competing models–estimation procedures–final
combination) that allows further independent re-
search to be made in every specific part of the
system and that will undoubtedly improve its
performance.
Acknowledgments
The author is grateful to the referees for their useful
comments. The author is also grateful to Carlos
Velasco for his computational assistance with the
nonparametric models. Some parts of this research
have been presented in the following seminars:
the 2002 IEA Symposium on Wind Forecasting Techniques
(Norrköping), the World Wind Energy Conference
and Exhibition (Berlin), the 2002 European
Wind Energy Conference (Paris), the 17th Internation-
al Workshop on Statistical Modelling (Chania), and
the XXI SEIO Meeting (Baeza). The author is grateful
to the attendees of the above-mentioned seminars for
their useful comments. This research has been partly
supported by Red Eléctrica de España, by the ANEMOS
project (ENK5-CT-2002-00665), funded by the
European Commission, and by grant SE 2004-03303
from the Ministerio de Educación y Ciencia. Any
remaining error is the author's responsibility.
References
Ackermann, T., & Söder, L. (2002). An overview of wind energy
status 2002. Renewable and Sustainable Energy Reviews, 6,
67–128.
Akaike, H. (1974). A new look at the statistical model identi-
fication. IEEE Transactions on Automatic Control, AC-19,
716–723.
Beyer, H. G., Heinemann, D., Mellinghoff, H., Mönnich, K., & Waldl,
H. P. (1999). Forecast of regional power output of wind turbines.
Proceedings of the EWEC 1999 (pp. 1070–1073).
Bhansali, R. J. (1996). Asymptotically efficient autoregressive
model selection for multistep prediction. Annals of the Institute
of Statistical Mathematics, 48, 577–602.
Bianchi, F. D., Mantz, R. J., & Christiansen, C. F. (2004). Power
regulation in pitch-controlled variable-speed WECS above rated
wind speed. Renewable Energy, 29, 1911–1922.
Bunn, D. W. (1985). Statistical efficiency in the linear combi-
nation of forecasts. International Journal of Forecasting, 1,
151–163.
Chen, Z., & Yang, Y. (2002). Time series models for forecasting:
Testing or combining? Manuscript, Iowa State University.
Clemen, R. T. (1989). Combining forecasts: A review and annotated
bibliography. International Journal of Forecasting, 5, 559–583.
Cleveland, W. S., & Devlin, S. J. (1988). Locally weighted
regression: An approach to regression analysis by local
fitting. Journal of the American Statistical Association, 83,
596–610.
Diebold, F. X., & Pauly, P. (1987). Structural change and the
combination of forecasts. Journal of Forecasting, 6, 21–40.
Dutton, A. G., Kariniotakis, G., Halliday, J. A., & Nogaret, E.
(1999). Load and wind power forecasting methods for the
optimal management of isolated power systems with high
wind penetration. Wind Engineering, 23, 69–87.
Fan, J., & Gijbels, I. (1996). Local polynomial modelling and its
applications. London: Chapman & Hall.
Focken, U., Lange, M., & Waldl, H. P. (2001). Previento—
a wind power prediction system with an innovative
upscaling algorithm. Proceedings of the EWEC 2001
(pp. 826–829).
Giebel, G., Landberg, L., Nielsen, T. S., & Madsen, H. (2001). The
Zephyr project. The next generation prediction system. Proceed-
ings of the EWEC 2001 (pp. 777–780).
Granger, C. W. J., & Newbold, P. (1986). Forecasting economic
time series. San Diego: Academic Press.
Grillenzoni, C. (1994). Optimal recursive estimation of dynamic
models. Journal of the American Statistical Association, 89,
777–787.
Joensen, A., Giebel, G., Landberg, L., Madsen, H., & Nielsen, A.
(1999). Model output statistics applied to wind power predic-
tion. Proceedings of the EWEC 1999 (pp. 1177–1180).
Joensen, A., Madsen, H., Nielsen, H. A., & Nielsen, T. S. (1999).
Tracking time-varying parameters using local regressions. Auto-
matica, 36, 1199–1204.
Kang, I.-B. (2003). Multi-period forecasting using different mod-
els for different horizons: An application to U.S. economic
time series data. International Journal of Forecasting, 19,
387–400.
Kariniotakis, G. N., Stavrakakis, G. S., & Nogaret, E. F.
(1996). Wind power forecasting using advanced neural net-
works models. IEEE Transactions on Energy Conversion, 11,
762–767.
Kohn, R., Smith, M., & Chan, D. (2001). Nonparametric regres-
sion using linear combinations of basis functions. Statistics and
Computing, 11, 313–322.
Landberg, L. (1994). Short-term prediction of local wind con-
ditions. PhD thesis, Risø National Laboratory, Roskilde,
Denmark.
Landberg, L., Giebel, G., Madsen, H., Nielsen, T. S., Jørgensen, J.
U., Laursen, L., et al. (2002). Wind farm production prediction:
the Zephyr model. Technical report. Roskilde, Denmark:
Risø National Laboratory.
Nielsen, T. S., Madsen, H., & Tofting, J. (1999). Experiences with
statistical methods for wind power prediction. Proceedings of
the EWEC 1999 (pp. 1066–1069).
Nielsen, T. S., Joensen, A., Madsen, H., & Landberg, L.
(2000). Tracking time-varying coefficient functions. Interna-
tional Journal of Adaptive Control and Signal Processing, 14,
813–828.
Sanchez, I. (in press). Recursive estimation of dynamic models
using Cook’s distance, with application to wind energy forecast.
Technometrics.
Sanchez, I., Usaola, J., Ravelo, O., Velasco, C., Domínguez, J.,
Lobo, M., et al. (2002). SIPREOLICO: a wind power pre-
diction system based on flexible combination of dynamic
models. Application to the Spanish power system. Proceed-
ings of the World Wind Energy Conference and Exhibition
2002.
Schwarz, G. (1978). Estimating the dimension of a model. Annals
of Statistics, 6, 461–464.
Sessions, D. N., & Chatterjee, S. (1989). The combining of forecasts
using recursive techniques with nonstationary weights. Journal
of Forecasting, 8, 239–251.
Swanson, N. R., & Zeng, T. (2001). Choosing among compet-
ing econometric forecasts: Regression-based forecast combi-
nation using model selection. Journal of Forecasting, 20,
425–440.
Terui, N., & van Dijk, H. K. (2002). Combined forecasts from linear
and nonlinear time series models. International Journal of
Forecasting, 18, 421–438.
Vilar-Fernandez, J. A., & Vilar-Fernandez, J. M. (1998). Recur-
sive estimation of regression functions by local polynomial
fitting. Annals of the Institute of Statistical Mathematics, 50,
729–754.
Yang, Y. (2004). Combining forecasting procedures: Some theoret-
ical results. Econometric Theory, 20, 176–222.
Ismael Sanchez is Associate Professor of Statistics in the Polytechnic
School at Universidad Carlos III de Madrid. His main research
interests are time series analysis, forecasting, and statistical process
control. He has published in several leading journals, including the
Journal of the American Statistical Association, Technometrics, and
the Journal of Forecasting. He actively participates in a multidisci-
plinary team concerning the real-time prediction of wind energy
production for the Spanish Peninsular system.