
www.elsevier.com/locate/ijforecast

International Journal of Forecasting 22 (2006) 43–56

Short-term prediction of wind energy production

Ismael Sanchez*

Universidad Carlos III de Madrid, Avd. de la Universidad 30, 28911, Leganes, Madrid, Spain

Abstract

This paper describes a statistical forecasting system for the short-term prediction (up to 48 h ahead) of the wind energy production of a wind farm. The main feature of the proposed prediction system is its adaptability. The need for an adaptive prediction system is twofold. First, it has to deal with highly nonlinear relationships between the variables involved. Second, the prediction system must generate predictions for alternative wind farms, as is done by the system operator for efficient network integration. This flexibility is attained through (i) the use of alternative models based on different assumptions about the variables involved; (ii) the adaptive estimation of their parameters using different recursive techniques; and (iii) the use of an on-line adaptive forecast combination scheme to obtain the final prediction. The described procedure is currently implemented in SIPREOLICO, a wind energy prediction tool that is part of the on-line management of the Spanish Peninsular system operation.

© 2005 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.

Keywords: Adaptive estimation; Dynamic models; Forecast combination; Kalman filter; Nonparametric regression; Recursive least squares;

Wind energy

1. Introduction

Wind energy has been, in the last decade, the

fastest growing energy technology. In some countries,

such as Germany, Denmark, and Spain, wind power is

widely used. In the case of Spain, more than 4% of

its electricity comes from this source of energy. How-

ever, in spite of the noticeable benefits of wind energy,

this level of installed capacity can have unwanted

consequences. The reason for this is that wind energy

0169-2070/$ - see front matter © 2005 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.

doi:10.1016/j.ijforecast.2005.05.003

* Fax: +34 916249430.

E-mail address: [email protected].

cannot be scheduled and, therefore, there will always

be uncertainty about the final production. As a con-

sequence, the uncertainty caused by the connection of

many utilities to the grid can decrease the efficiency of

the network system operation. Accordingly, both util-

ities and the system operator need accurate on-line

forecasts of the wind power production. In a liberal-

ized electricity market, such a forecasting ability will

help enhance the position of wind energy compared to

other forms of energy.

This article describes a statistical forecasting system for wind energy prediction based on the adaptive combination of alternative dynamic models. The main feature of the forecasting system is its flexibility. This flexibility is needed for two main reasons. First, it has to deal with highly nonlinear relationships between the variables involved. Second, the prediction system must generate predictions for alternative wind farms with different characteristics, as is done by the system operator for efficient network integration.

[Figure 1: panels (a) and (b), each plotting output power (vertical axis, up to 3.5 × 10^4) against velocity of wind (horizontal axis), with points labelled Period 1 and Period 2.]

Fig. 1. Hourly average wind speed and generated power in a wind farm in Spain. In each picture, periods 1 and 2 are consecutive.

I. Sanchez / International Journal of Forecasting 22 (2006) 43–56

For a given wind farm, the input variables are

the meteorological predictions of wind (velocity and

direction) for the next 48 h and past values of

output power. The forecasting system has then to

supply, on an hourly basis, the predicted output

power up to 48 h ahead. The prediction system

needs to operate on-line. Therefore, once some

preliminary off-line identification of the models is

made using data from selected sites, the system

needs to be flexible enough to adapt to (a) unfore-

seen changing relationships between the variables

involved and (b) alternative wind farms with mini-

mal or no calibration. This on-line flexibility also

requires recursive estimation procedures that must

be performed in reasonable time. For instance, the

Spanish system operator needs to calculate hourly

predictions for more than 200 wind farms. There is,

therefore, little time for doing the calculations for

each wind farm.

From a statistical point of view, wind energy data

has some interesting features: (i) the relationship

between the velocity of the wind and the generated

power is highly nonlinear and, therefore, candidate

predictors have the risk of only being reliable within

certain ranges of data; and (ii) for a given velocity,

this relationship is time-varying because it depends

on other variables such as wind direction, local air

density, local temperature variations, local effects of

clouds and rain, and so forth. Since some of these

variables are difficult to foresee or even measure,

they might not be appropriately included in a model.

Fig. 1 shows some typical situations with wind

energy data that help illustrate these features. In

this figure, both graphs (a) and (b) show 200 con-

secutive hourly data points of velocity of wind

(hourly average) and generated power at a certain

wind farm in Spain. The first 100 points are marked

with a circle (o), whereas the last 100 points are

marked with a plus symbol (+). It can be seen in

these pictures that a model estimated by using the

first 100 points (period 1) will produce a poor per-

formance when applied to the next 100 points (peri-

od 2) if no adaptation is allowed.

In Fig. 1(a), due to stronger winds in period 2, the

wind park has reached the rated capacity. In this

situation, the output power of the rotor of the wind

turbine is maintained at an approximately constant

level, or should even be reduced in order to avoid

damages. These changes in the relationship between

speed and power are regulated by a control system

that might not be the same across all windmills.

Besides, this control technology is constantly evolv-

ing (see, for instance, Ackerman & Soder, 2002).

Therefore, such behaviour should be learned from

data. The changes observed in Fig. 1(b) can be produced

by changes in the power control of the wind

turbines, changes in wind direction or changes in

other meteorological variables that affect the efficien-

cy of the wind turbine. In both examples, an adaptive

forecasting system is likely to yield a better perfor-

mance than a predictor based on a single model with

constant parameters.

Let $p_{t+h}$ denote the output power of a wind farm at time $t+h$ and $\hat{p}^{C}_{t+h|t}$ the prediction generated using information up to time $t$. The proposed methodology for obtaining $\hat{p}^{C}_{t+h|t}$ is based on the following key features: (a) the use of alternative nonlinear models for obtaining a set of individual predictions; (b) the use of a highly adaptive estimation of the parameters; and (c) the construction of the final prediction using an adaptive combination of the competing predictions. Then, the prediction can be written as the result of the following combination:

$$\hat{p}^{C}_{t+h|t} = \sum_{k=1}^{K} \phi^{(h)}_{tk}\, \hat{p}^{(k)}_{t+h|t}, \tag{1}$$

where $K$ is the total number of available predictions obtained from the alternative models; $\hat{p}^{(k)}_{t+h|t}$, $k = 1, \ldots, K$, are the individual predictions; and $\phi^{(h)}_{tk}$ is the time-varying weight used for model $k$. The alternative models used to obtain the individual predictions $\hat{p}^{(k)}_{t+h|t}$

will be described in Section 2. Then, Section 3 dis-

cusses the alternative recursive estimation procedures.

In Section 4, the adaptive forecast combination

scheme is presented. Section 5 illustrates the proposed

procedures with real data. Section 6 concludes. A

version of the methods described here is currently

implemented in a prediction tool called SIPREOLICO

(see Sanchez et al., 2002). SIPREOLICO is now

operating at the control centre of Red Electrica de

Espana (REE), the Spanish system operator. The pre-

dictions generated by SIPREOLICO are used for real

time operation, to solve constraints in the daily and

intra-daily market, to forecast the wind power for each

distribution company, and to make wind power mar-

ket simulations.

[Figure 2: panels (a) and (b); horizontal axes in m/s, vertical axes in kW.]

Fig. 2. (a) Example of a typical machine power curve in a wind tunnel and (b) empirical values of velocity of wind and output power in a real wind farm (average hourly values).

2. The competing dynamic models

We know from physics that the theoretical relation-

ship between the energy (per unit time) of wind that

flows at speed $v$ (m/s) through an intercepting area $A$ (m²) is

$$p = \frac{1}{2}\rho A v^{3}, \tag{2}$$

where $\rho$ is the air density (kg/m³), which, in turn,

depends on the air temperature and pressure, among

other factors. The real relationship between the power

generated by the whole wind farm and the velocity of

the wind can, however, be more complex than just (2).

Fig. 2 illustrates this point. Fig. 2(a) is the so-called

machine power curve (deterministic output power as a

function of the input velocity of wind) of a particular

wind turbine inside a wind tunnel as it appears, for

instance, in the information supplied by its manufac-

turer. From this machine power curve, we can see that

below some minimum wind speed, called connection

speed, the wind turbine does not produce power. After

this connection speed, the power increases as the wind

speed does. The profile of this growing part of the

power curve follows a similar growing pattern to that

shown in (2) and it also depends on the particular



technology of the wind turbine (see, e.g., Ackerman &

Soder, 2002). When the speed increases and reaches

the so-called nominal speed, the power reaches the

rated capacity of the wind turbine. After the nominal

speed, the output power is kept constant for some

range of the wind speed. There are also alternative

technical solutions to maintain this constant level as

the wind speed exceeds the nominal speed. Finally,

when the wind speed surpasses the so-called discon-

nection speed, the wind turbine is disconnected to

prevent damages from excessive wind.
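The shape of the machine power curve described above can be sketched in a few lines. This is only an illustration: the function name and all default values (connection, nominal and disconnection speeds, rated capacity) are hypothetical, and the growing part is assumed to be a cubic in the wind speed, in the spirit of (2), rescaled so that the curve is continuous. Real curves come from the manufacturer.

```python
def machine_power(v, cut_in=4.0, nominal=14.0, cut_out=25.0, rated_kw=660.0):
    """Idealized machine power curve (kW) for a wind speed v (m/s).

    All default values are hypothetical placeholders, not data from the paper.
    """
    if v < cut_in or v >= cut_out:
        # Below the connection speed, or disconnected to prevent damage.
        return 0.0
    if v >= nominal:
        # Between nominal and disconnection speed the output is held
        # at the rated capacity.
        return rated_kw
    # Growing part: assumed cubic in v, as in (2), scaled so that the
    # curve is 0 at cut_in and reaches rated_kw at the nominal speed.
    return rated_kw * (v**3 - cut_in**3) / (nominal**3 - cut_in**3)
```

A deterministic function of this kind is what Fig. 2(a) depicts for a single turbine.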

In spite of the deterministic relationship shown in

Fig. 2(a), the empirical power curve obtained in the

real operation of a wind farm is far from being

deterministic. Fig. 2(b) displays the empirical power

curve for a whole wind farm over several months. In

this figure, each point represents the hourly average

wind speed and the resulting average power. Fig.

2(b) reveals that the observed power varies and that

the time-varying influence of some other variables

can have a substantial effect that cannot be neglected.

We can thus conclude that the relationship between

the wind velocity and the output power should be

treated as a nonlinear and stochastic time-varying

function of wind speed. There are many reasons for

the empirical power curve to be stochastic. Some of

them are related to the technology of the wind tur-

bines (Ackerman & Soder, 2002; Bianchi, Mantz, &

Christiansen, 2004). For instance, the power control

system of the wind turbine can cause the connection

and nominal wind speeds to vary, and also to differ

from the remaining wind turbines of the park. Be-

sides, the behaviour of the wind turbines when the

wind speed increases is different from the behaviour

when the wind speed decreases. Also, some other

meteorological variables such as the wind direction

or the temperature can affect the efficiency of the

wind turbine.

Some authors have proposed wind power fore-

casting systems based solely on the transformation of

local wind predictions into power using the deter-

ministic machine power curves. The most popular

forecasting systems based on this deterministic ap-

proach are the Previento system (Beyer, Heinemann,

Mellingho, Monnich, & Waldl, 1999; Focken,

Lange, & Waldl, 2001) and the Prediktor system

(Joensen, Giebel, Landberg, Madsen, & Nielsen,

1999; Landberg, 1994). A quick look at Fig. 2(b)

can explain why these systems can be improved on

by using a statistical forecasting system based on

dynamic models. This statistical approach of fore-

casting wind power is the idea of the also popular

Zephyr system (Landberg et al., 2002; see also Gie-

bel, Landberg, Nielsen, & Madsen, 2001; Nielsen,

Madsen, & Tofting, 1999). The Zephyr system uses

a nonparametric model based on the local polynomi-

al regression method of Cleveland and Devlin (1988)

(see also Joensen, Madsen, Nielsen, & Nielsen,

1999; Nielsen, Joensen, Madsen, & Landberg,

2000). This nonparametric model uses the informa-

tion of the observed power and the predictions of

wind speed and direction.

In this article, a more sophisticated modelling ap-

proach is presented. It is based on the use of several

alternative models, instead of a single one. It should

be mentioned that some of the alternative models used

in the proposed system are similar to the model used

in the Zephyr system. The final prediction is made

through an adaptive linear combination of the alter-

native predictors, where the weights given to each

predictor are based on their actual forecasting perfor-

mance. In order to avoid overfitting, the combination

is performed using only the models with better recent

performance. Therefore, since the combination is pre-

ceded by model evaluation and comparison, many

alternative models could be proposed with the only

restriction being the time needed for computation.

This combination scheme can be interpreted as a

model competition, where the winners are used to

obtain the final prediction.

To form the group of competing models, two dif-

ferent sets of models are proposed here, without dis-

missing the possibility of proposing some others in

the future. The first type of models are dynamic linear

models where the relationship between power and

wind is made using polynomials of different degree,

from linear to cubic, and whose coefficients are esti-

mated adaptively. The second type of models are

nonparametric models based on local polynomial fit-

ting in a similar fashion to the Zephyr system. Since

the prediction tool will be used in different wind

farms, an important argument in choosing a model

is that it should be implemented without any previous

calibration.

To ease adaptability, different models will be estimated for each prediction horizon; i.e., the h-step prediction constants, $h > 1$, are obtained separately for each $h$ by minimising a relevant mean squared error of prediction for that horizon. We will call these models multi-period-ahead models. Several authors have argued that the use of these multi-period-ahead models can help improve the forecasting performance, especially in those cases where a "true" model can be regarded as implausible and where any model can only be seen as a useful approximation (see, e.g., Bhansali, 1996; Kang, 2003; and references therein).

We now include a brief description of the alternative

models, denoted as M1 to M9.

The model M1 is a univariate multi-period-ahead autoregression of the form

$$p_{t+h} = a_{0,t} + P_t(k, c) + e^{(M1)}_{t+h|t}, \tag{3}$$

where

$$P_t(k, c) = \sum_{i=1}^{k} a_{i,t}\, p_{t+1-i} + a_{k+1,t}\, p_{t+h-c}, \tag{4}$$

$e^{(M1)}_{t+h|t}$ is the h-step ahead prediction error of this model M1; and $a_{i,t}$, $i = 0, 1, \ldots, k$, are time-varying parameters. A precise notation would also use a forecast horizon index, but for the sake of simplicity it is omitted. The term $p_{t+h-c}$ in (4) intends to capture the daily cycle. If $h < 24 - k$, then $c = 24$, and thus $p_{t+h-24}$ is the generated power 24 h before the target time $t+h$. If $h \geq 24 - k$, then $c = h + k$, and the model is just a multi-period-ahead AR($k+1$). The experience with Spanish data suggests $k = 3$. This simple model can be considered as an extension of the so-called persistence model ($\hat{p}_{t+h|t} = p_t$). This model is of special interest if, for some reason, meteorological information is not available or is of low quality.
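As an illustration of how the regressors of M1 in (3) and (4) could be assembled, the following sketch builds the input vector for a given origin $t$ and horizon $h$. The function name and the list-based data layout are assumptions made for this example only; the rule for choosing $c$ follows the text.

```python
def m1_regressors(p, t, h, k=3):
    """Regressors of model M1 for predicting p[t + h] with data up to hour t.

    p is a list of hourly power values indexed by hour (hypothetical layout).
    Returns ([1, p_t, ..., p_{t+1-k}, p_{t+h-c}], c). The caller must make
    sure that t + 1 - k >= 0 and t + h - c >= 0.
    """
    # Daily-cycle lag: 24 h before the target time when that value is not
    # already among the k most recent lags, otherwise the (k+1)-th lag.
    c = 24 if h < 24 - k else h + k
    lags = [p[t + 1 - i] for i in range(1, k + 1)]  # p_t, ..., p_{t+1-k}
    return [1.0] + lags + [p[t + h - c]], c
```

For $h \geq 24 - k$ the last regressor is simply $p_{t-k}$, which is why the model reduces to an AR($k+1$) in that case.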

Models M2, M3, and M4 use the same parameterisation as M1 plus the information of the forecasted wind speed. These multi-period-ahead predictors have the form

$$p_{t+h} = a_{0,t} + P_t(k, c) + W_{t+h|t}(q) + e^{(Mm)}_{t+h|t}, \quad m = 2, 3, 4, \tag{5}$$

with

$$W_{t+h|t}(q) = \sum_{i=1}^{q} b_{i,t}\, \hat{v}^{\,i}_{t+h|t}, \tag{6}$$

where $\hat{v}_{t+h|t}$ is the forecasted speed of wind made at period $t$ for period $t+h$ and where $b_{i,t}$, $i = 1, \ldots, q$, are time-varying parameters. In M2, we use $q = 1$; in M3, $q = 2$; and in M4, $q = 3$. Models M5, M6, and M7 use the same parameterisation as M2, M3, and M4, respectively, plus the information of the forecasted wind direction. The wind direction carries valuable information. First, wind farms usually have some dominant directions where velocity often moves in a narrow range. Then, the direction of the wind can be a predictor of the speed. Second, the performance of a wind turbine depends on the direction. This direction dependence is due to the effect of the terrain and also to the so-called shadow effect of surrounding wind turbines. These multi-period-ahead models are

$$p_{t+h} = a_{0,t} + P_t(k, c) + W_{t+h|t}(q) + D_{t+h|t} + e^{(Mm)}_{t+h|t}, \quad m = 5, 6, 7, \tag{7}$$

with

$$D_{t+h|t} = c_{1,t} \sin\!\left(\frac{2\pi \phi_{t+h|t}}{360}\right) + c_{2,t} \cos\!\left(\frac{2\pi \phi_{t+h|t}}{360}\right), \tag{8}$$

where $\phi_{t+h|t}$ is the forecasted wind direction and $c_{1,t}$ and $c_{2,t}$ are time-varying parameters. As in (6), the factor $W_{t+h|t}(q)$ uses $q = 1$ in model M5, $q = 2$ in M6, and $q = 3$ in M7. Models M5, M6, and M7 will have a competitive performance when the available prediction of the speed of the wind is of low quality.
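The extra regressors of (6) and (8) are simple transformations of the meteorological forecasts. A minimal sketch follows; the function name and argument names are hypothetical. Note that the sine and cosine terms make the direction circular, so that 0° and 360° produce the same regressors.

```python
import math


def wind_regressors(v_hat, phi_hat_deg=None, q=3):
    """Regressors of W(q) in (6) and, optionally, of D in (8).

    v_hat: forecasted wind speed; phi_hat_deg: forecasted wind direction in
    degrees (omit it for models M2 to M4); q: polynomial degree (1, 2 or 3).
    """
    powers = [v_hat**i for i in range(1, q + 1)]  # v, v^2, ..., v^q
    if phi_hat_deg is None:
        return powers
    angle = 2.0 * math.pi * phi_hat_deg / 360.0
    return powers + [math.sin(angle), math.cos(angle)]
```

Appending these values to the M1 regressors gives the input vector of models M2 to M7, whose coefficients are then estimated recursively.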

Finally, models M8 and M9 use the same parameterisation as the autoregression M1 plus a nonparametric predictor based on the daily cycle, and another nonparametric predictor based on wind prediction. The idea is similar to the conditional parametric ARX models of Nielsen et al. (2000). The functional form of M8 is

$$p_{t+h} = a_{0,t} + P_t(k, c) + F^{v}_{t+h|t} + F^{H}_{t+h|t} + e^{(M8)}_{t+h|t}, \tag{9}$$

with

$$F^{v}_{t+h|t} = \sum_{i=1}^{I} a_{i,t}\, h_{t+h+1-i}(v), \quad \text{and} \tag{10}$$

$$F^{H}_{t+h|t} = \sum_{j=1}^{J} b_{j,t}\, h_{t+h+1-j}(H), \tag{11}$$

where $h_{t+h}(\cdot)$ are nonparametric functions based on local polynomials as proposed by Cleveland and Devlin (1988) and Fan and Gijbels (1996). The local polynomial fitting is implemented recursively as in Vilar-Fernandez and Vilar-Fernandez (1998). The parameters $a_{i,t}$ and $b_{j,t}$ are time-varying parameters. In (10), $h_{t+h}(v)$ is the result of a nonparametric local linear regression where the output power $p_{t+h}$ is explained as a linear function of wind speed $\hat{v}_{t+h|t}$, whereas in (11) the regressor $h_{t+h}(H)$ is the nonparametric prediction of $p_{t+h}$ using local linear regressions with the hour at time $t+h$ as the only regressor. The functional form of M9 is

$$p_{t+h} = a_{0,t} + P_t(k, c) + F^{v\phi}_{t+h|t} + F^{H}_{t+h|t} + e^{(M9)}_{t+h|t}, \tag{12}$$

where

$$F^{v\phi}_{t+h|t} = \sum_{i=1}^{I} d_{i,t}\, h_{t+h+1-i}(v, \phi), \tag{13}$$

with $h_{t+h}(v, \phi)$ being a nonparametric function similar to (10) based on local linear regressions using the wind speed and wind direction as regressors, and $d_{i,t}$ being time-varying parameters. Experiments with Spanish data suggest $I = 3$ and $J = 2$. The computation of the nonparametric functions $h_{t+h}(v)$, $h_{t+h}(H)$, and $h_{t+h}(v, \phi)$ is very time consuming. Therefore, to make the implementation feasible, they are only evaluated at a specified grid of fitting points and then interpolated for the remaining points. This specification, together with some other estimation aspects, such as bandwidths and some smoothing factors, makes this method site-dependent. To reduce this dependency, models M8 and M9 can be calibrated from the analysis of some selected sites, but their optimal implementation would need a specific analysis at each site using older data.
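The grid-and-interpolate idea can be sketched as follows for a single regressor such as the wind speed. This is a batch illustration under stated assumptions, not the recursive implementation cited in the text: the Gaussian kernel, the fixed bandwidth, the linear interpolation, and all function names are choices made for this example.

```python
import math


def local_linear(x0, xs, ys, bandwidth):
    """Local linear fit at x0 with Gaussian kernel weights (assumed kernel)."""
    w = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    s0 = sum(w)
    s1 = sum(wi * (x - x0) for wi, x in zip(w, xs))
    s2 = sum(wi * (x - x0) ** 2 for wi, x in zip(w, xs))
    t0 = sum(wi * y for wi, y in zip(w, ys))
    t1 = sum(wi * (x - x0) * y for wi, x, y in zip(w, xs, ys))
    det = s0 * s2 - s1 * s1
    if abs(det) < 1e-12:
        return t0 / s0  # fall back to a local constant fit
    # Intercept of the weighted least squares line = fitted value at x0.
    return (s2 * t0 - s1 * t1) / det


def fit_on_grid(grid, xs, ys, bandwidth):
    """Evaluate the expensive local fit only at the grid points."""
    return [local_linear(g, xs, ys, bandwidth) for g in grid]


def interpolate(x, grid, values):
    """Cheap piecewise-linear interpolation between grid values."""
    if x <= grid[0]:
        return values[0]
    if x >= grid[-1]:
        return values[-1]
    for j in range(1, len(grid)):
        if x <= grid[j]:
            frac = (x - grid[j - 1]) / (grid[j] - grid[j - 1])
            return values[j - 1] + frac * (values[j] - values[j - 1])
```

The choice of grid and bandwidth is exactly the kind of site-dependent tuning mentioned above.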

As mentioned above, one of the motivations for the

previous models is to build a flexible prediction sys-

tem to be used in different locations. If the goal is to

build a prediction tool to be used only in a single wind

farm, more site-dependent models like neural net-

works could also be proposed (Dutton, Kariniotakis,

Halliday, & Nogaret, 1999; Kariniotakis, Stavrakakis,

& Nogaret, 1996). Neural networks can be an efficient

procedure for dealing with nonlinearities. These pro-

cedures, however, would need long periods of time to

tune the algorithms to specific local conditions.

3. The recursive adaptive estimation

All the parameters involved in the above nine

proposed models need to be estimated recursively.

Such an estimation procedure is discussed in this

section. There are two main reasons for using adaptive

estimation for wind power forecasting. First, the non-

linear behaviour can cause any proposed model to be

a valid approximation only in a certain span of data as

illustrated in Fig. 1(a). Second, the power curves are

changing through time in response to, for instance,

meteorological changes, as seen in Fig. 1(b). Thus, the

parameters cannot be regarded as constants and

should be adapted as new information is available,

as proposed by Grillenzoni (1994) and Sanchez (in

press). In the proposed forecasting system, and in

order to adapt to diverse situations in a better way,

the recursive estimation is performed using two alter-

native procedures: recursive least squares (RLS) and

the Kalman filter (KF). RLS is performed in such a

way that the adaptability of the estimates is larger

when the system is changing quickly and is smaller

when the parameters are changing slowly or remain

constant. On the other hand, KF is performed assum-

ing that parameters are always evolving very slowly.

Hence, at the end of the estimation process we will

have doubled the number of competing predictors. A

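generic time-varying model for the dependent variable $p_t$ can be written as

$$p_t = z_t' b_t + a_t, \tag{14}$$

with $b_t$ a $k \times 1$ vector of time-varying parameters and $z_t$ a $k \times 1$ vector of input variables that can be either stochastic or deterministic. The RLS estimator is

$$\hat{b}^{(\mathrm{RLS})}_{t} = \hat{b}^{(\mathrm{RLS})}_{t-1} + \hat{\Sigma}^{(\mathrm{RLS})}_{t} z_t \hat{a}_t, \tag{15}$$

with $\hat{a}_t = p_t - z_t' \hat{b}^{(\mathrm{RLS})}_{t-1}$ being the prediction error. The matrix $\hat{\Sigma}^{(\mathrm{RLS})}_{t}$ is the so-called gain matrix or weighted covariance matrix, and is a measure of the dispersion of the estimate $\hat{b}^{(\mathrm{RLS})}_{t}$. This matrix can be obtained recursively using the well-known result (see, e.g., Grillenzoni, 1994)

$$\hat{\Sigma}^{(\mathrm{RLS})}_{t} = \frac{1}{\lambda_t}\left( \hat{\Sigma}^{(\mathrm{RLS})}_{t-1} - \frac{\hat{\Sigma}^{(\mathrm{RLS})}_{t-1} z_t z_t' \hat{\Sigma}^{(\mathrm{RLS})}_{t-1}}{\lambda_t + z_t' \hat{\Sigma}^{(\mathrm{RLS})}_{t-1} z_t} \right). \tag{16}$$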


The parameter $\lambda_t$ is the so-called forgetting factor and satisfies $0 < \lambda_t \leq 1$. The above RLS algorithm minimizes the weighted criterion $S_t^2(b) = (p_t - b' z_t)^2 + \lambda_t S_{t-1}^2(b)$, where it can be seen that the sequence of forgetting factors, $\lambda_t$, is the key feature of this adaptive procedure. From this objective function we can observe that the smaller the value of $\lambda_t$, the lower the influence of past data in the estimation. Typically, the choice of the forgetting factor is a compromise between the ability to track changes in the parameters and the need to reduce the variance of the prediction error. The choice of the forgetting factor is very important since it has a substantial effect on the efficiency of the predictions. Most applications use a constant forgetting factor, typically inside the range $0.950 \leq \lambda \leq 0.999$. Here we will use an adaptive forgetting factor where the speed of adaptation of the estimates is related to the characteristics of the data. The adaptive forgetting factor will be the so-called $\lambda_t^{\mathrm{Cook}}$ proposed by Sanchez (in press), who proved that it provides better adaptation features than some other popular alternatives. In particular, it is able to adapt to common situations occurring with wind energy data, such as those described in Fig. 1(a) and (b). The proposed $\lambda_t^{\mathrm{Cook}}$ uses Cook's distance to measure the influence of the new data and translates such influence into an adaptive forgetting factor. From Sanchez (in press), we have that Cook's distance for the time-varying model (14) can be written as

$$C_t = \frac{z_t' \hat{\Sigma}^{(\mathrm{RLS})}_{t-1} z_t \left( p_t - z_t' \hat{b}^{(\mathrm{RLS})}_{t-1} \right)^2}{\hat{\sigma}^2_{t-1} \left( 1 + z_t' \hat{\Sigma}^{(\mathrm{RLS})}_{t-1} z_t \right)}, \tag{17}$$

where $\hat{\sigma}^2_{t-1}$ is a consistent estimator of $E(a_t^2)$, for instance

$$\hat{\sigma}^2_{t-1} = \frac{\sum_{i=1}^{t-1} \left( p_i - z_i' \hat{b}^{(\mathrm{RLS})}_{i} \right)^2}{t - 1}.$$

Then, to evaluate the value of $C_t$ in (17), it can be compared with the $\chi^2$ distribution with $m$ degrees of freedom. Let us denote the survivor function of the $\chi^2_m$ distribution as $S_t \equiv S_t(C_t)$, that is, $S_t(C_t) = P(\chi^2_m > C_t)$. Several adaptive forgetting factors can be proposed using this survivor function. Based on the empirical performance reported in Sanchez (in press), the RLS estimation of our forecasting system will be performed with the following adaptive forgetting factor:

$$\lambda_t^{\mathrm{Cook}} = \min\left[ \max\left( 0.7, S_t \right), 0.999 \right].$$
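One RLS update with the Cook-based forgetting factor might look like the sketch below, which chains (17), the forgetting factor rule, (16), and (15). It is an illustration under assumptions: the function names are hypothetical, the degrees of freedom $m$ are taken to be the number of parameters, the gain matrix is assumed symmetric, and the variance estimate `sigma2` is supplied by the caller. The chi-square survivor function is computed with the standard incomplete-gamma recurrence so that only the standard library is needed.

```python
import math


def chi2_sf(x, m):
    """P(chi2_m > x) for integer m >= 1 (incomplete-gamma recurrence)."""
    if x <= 0.0:
        return 1.0
    y = 0.5 * x
    if m % 2 == 0:
        a, q = 1.0, math.exp(-y)             # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y)
    while a < 0.5 * m:
        # Q(a + 1, y) = Q(a, y) + y^a e^{-y} / Gamma(a + 1)
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)


def rls_cook_step(b, S, z, p, sigma2, lam_min=0.7, lam_max=0.999):
    """One RLS update; returns (new b, new gain matrix, forgetting factor)."""
    m = len(b)  # assumption: degrees of freedom = number of parameters
    Sz = [sum(S[i][j] * z[j] for j in range(m)) for i in range(m)]
    zSz = sum(z[i] * Sz[i] for i in range(m))
    err = p - sum(z[i] * b[i] for i in range(m))       # prediction error
    cook = zSz * err * err / (sigma2 * (1.0 + zSz))    # eq. (17)
    lam = min(max(lam_min, chi2_sf(cook, m)), lam_max)
    denom = lam + zSz
    S_new = [[(S[i][j] - Sz[i] * Sz[j] / denom) / lam  # eq. (16)
              for j in range(m)] for i in range(m)]
    gain = [sum(S_new[i][j] * z[j] for j in range(m)) for i in range(m)]
    b_new = [b[i] + gain[i] * err for i in range(m)]   # eq. (15)
    return b_new, S_new, lam
```

A large prediction error gives a large $C_t$, a small survivor probability, and hence a forgetting factor close to 0.7, so past data are discounted quickly; small errors push the factor toward 0.999.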

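For the second estimation procedure, based on KF, we will assume that the parameter vector $b_t$ evolves slowly, following a random walk with small variance; that is,

$$b_t = b_{t-1} + e_t,$$

with $E(e_t e_t') = \sigma_e^2 I_m$, where $I_m$ is the identity matrix and $\sigma_e^2$ a small positive constant. With this assumption, it is known that KF gives a linear, unbiased, and minimum error variance recursive algorithm to optimally predict the new parameter value. This algorithm can be written as

$$\hat{b}^{(\mathrm{KF})}_{t} = \hat{b}^{(\mathrm{KF})}_{t-1} + j_t \left( p_t - z_t' \hat{b}^{(\mathrm{KF})}_{t-1} \right),$$

$$j_t = \hat{\Sigma}^{(\mathrm{KF})}_{t-1} z_t \left( z_t' \hat{\Sigma}^{(\mathrm{KF})}_{t-1} z_t + 1 \right)^{-1},$$

$$\hat{\Sigma}^{(\mathrm{KF})}_{t} = \hat{\Sigma}^{(\mathrm{KF})}_{t-1} - \frac{\hat{\Sigma}^{(\mathrm{KF})}_{t-1} z_t z_t' \hat{\Sigma}^{(\mathrm{KF})}_{t-1}}{1 + z_t' \hat{\Sigma}^{(\mathrm{KF})}_{t-1} z_t} + \sigma_e^2 I_m.$$

Experience with Spanish data suggests $\sigma_e^2 = 10^{-20}$.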
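The three KF recursions above translate almost line by line into code. The sketch below is a plain-Python illustration; the function name is hypothetical, the matrix is assumed symmetric, and the default `sigma2_e` is the value quoted in the text.

```python
def kf_step(b, S, z, p, sigma2_e=1e-20):
    """One Kalman filter update for random-walk parameters.

    b: current parameter estimate, S: current covariance matrix (symmetric),
    z: regressor vector, p: newly observed power.
    """
    m = len(b)
    Sz = [sum(S[i][j] * z[j] for j in range(m)) for i in range(m)]
    denom = 1.0 + sum(z[i] * Sz[i] for i in range(m))
    gain = [s / denom for s in Sz]                     # j_t
    err = p - sum(z[i] * b[i] for i in range(m))       # prediction error
    b_new = [b[i] + gain[i] * err for i in range(m)]
    S_new = [[S[i][j] - Sz[i] * Sz[j] / denom
              + (sigma2_e if i == j else 0.0)          # random-walk variance
              for j in range(m)] for i in range(m)]
    return b_new, S_new
```

With such a small $\sigma_e^2$ the update behaves much like RLS without forgetting, which is what makes this predictor a slowly adapting counterpart to the RLS one.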

The proposed RLS is more adaptive than this KF

model, since it does not assume any structure for

the evolution of the parameters. Besides, the speed

of adaptation in the RLS is also time-varying

according to the information in the data. However,

it is sensible to also justify the use of the proposed

KF for wind power forecast. The reason is that it

is not uncommon for meteorological variables to be

quite stable in some regions and for some periods.

For instance, low-pressure systems, which are usu-

ally associated with high winds, can affect a region

from 2 to 3 days. On the other hand, high-pressure

systems, usually associated with lighter winds, can

last even longer. Meteorological changes are thus

so slow that 1 h can be a very small measurement

unit. In these cases, a model based on hourly data

that assumes an evolution in the form of a random

walk with small variance can be a parsimonious


and efficient alternative to models that are prepared

for any kind of contingency. Each of the above-

mentioned models M1 to M9 is estimated, for

each horizon, using both procedures. Therefore,

we are using 18 alternative predictors for each

horizon. The final prediction will be made through

some linear combination of these competing pre-

dictions. This combination is discussed in the next

section.

4. Adaptive forecast combination and the final

prediction

When several candidate models are available to

forecast a single variable, we can either select the

best model or combine them. Regarding model se-

lection, alternative selection procedures have been

proposed in the literature, both based on selection

criteria, like the popular AIC (Akaike, 1974) or BIC

(Schwartz, 1978), or on testing procedures (see, e.g.,

Chen & Yang, 2002 and references therein). On the

other hand, forecast combination is also a popular

and important tool in forecasting time series analysis,

and there is a vast body of literature that demon-

strates its usefulness (see, e.g., Clemen, 1989; Yang,

2004). Forecast combination is especially advised

when there are doubts about the existence of a

"best model".

In this section, we will use the theory of forecast combination to produce a combination of our $K = 18$ alternative predictions to obtain the final prediction of $p_{t+h}$. This final prediction can be written as

$$\hat{p}^{C}_{t+h|t} = \sum_{k=1}^{K} \phi^{(h)}_{tk}\, \hat{p}^{(k)}_{t+h|t}, \tag{18}$$

where $\phi^{(h)}_{tk}$ is the time-varying weight given to

model k and K is the total number of competing

predictors. Our forecast combination will be adap-

tive; i.e., the weight given to each model will

evolve through time. Note that, through the combi-

nation of the competing models, we not only look

for a dynamic adaptation, but also ease the adap-

tation of the prediction tool to alternative wind

farms, since the relative performance of the com-

peting models can depend on the location.

Since the number of alternative predictors is large

and the relative performance of them can be very

different, we will only combine a subset of them.

There is an agreement between practitioners that

poor forecasts should not be included in the combination

(see, e.g., Bunn, 1985). The intuitive reason is

that any predictor can have some good performance at

some time just by chance. Thus, even the poorest

predictor will have a non-zero weight in the combi-

nation, to the detriment of better predictors, and caus-

ing a loss of efficiency. Therefore, by combining only

the important procedures, we can reduce the variabil-

ity of the combined forecast, leading to a much better

performance (Yang, 2004). The subset of selected

predictors will be chosen according to their recorded

recent performance. We will only combine the d best

predictors, where d can be time-varying (see Swanson

& Zeng, 2001, for alternative procedures for doing

this selection when the number of predictors is small

and d fixed). Then, some of the $\phi^{(h)}_{tk}$ in (18) will be

zero. If we always combine the same subset of pre-

dictors, we could build a recursive combination

scheme using a regression with time-varying coeffi-

cients, as described in Section 3 (see also Diebold &

Pauly, 1987; Sessions & Chaterjee, 1989; and Terui &

van Dijk, 2002, for alternative recursive combinations

with d fixed). The idea of using as many predictors as

possible and then selecting which of them will enter

into the final combination is similar to the nonpara-

metric model proposed by Kohn et al. (2001). These

authors consider building a nonparametric regression

using linear combinations of basis functions (polyno-

mials, splines, etc.). To ensure flexible estimates, the

regression should include a large number of basis

functions. Then, to avoid overfitting, they select the

functions that will have a non-zero weight in the

regression. The Bayesian hierarchical method pro-

posed in Kohn et al. (2001) to solve the problem is,

however, computationally expensive for on-line oper-

ation, since it needs to be solved by Monte Carlo

simulation.

In order to propose a feasible adaptive combi-

nation procedure with a time-varying d, we will

first define the recursive on-line measurement of

forecasting performance. Once we have access to

the new wind power production pt, we can com-

pute the prediction errors et |t�h(i) =pt� pt|t�h

(i) of pre-

dicting pt from period t�h using the model i =1,


2, . . ., K. Then, we will use the following weight-

ed sum of products of prediction errors:

Si;jð Þtjt�h

¼ eeið Þtjt�h

eejð Þ

tjt�hþ kS i;jð Þ

t�1jt�h�1

¼Xts¼1

k t�sð Þeeið Þsjs�h

eejð Þ

sjs�h; i; j ¼ 1; N ;K; ð19Þ

where 0bk b1 has the same interpretation as the

above-mentioned forgetting factor. We could even

use an adaptive forgetting factor, k =kt. We can then

estimate the matrix of mean squared prediction errors

(MSPE) using an exponentially weighted moving av-

erage MSPE (EW–MSPE). Let us denote this EW–

MSPE matrix as Xt |t�hu Xt |t�h(k), where the (i, j)

element is

$$\left(\hat X_{t|t-h}\right)_{i,j} = S^{(i,j)}_{t|t-h}\left(\sum_{s=1}^{t} k^{(t-s)}\right)^{-1}, \quad i, j = 1, \ldots, K. \tag{20}$$
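The recursion in (19) and the normalization in (20) can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch, not the SIPREOLICO implementation: the class name `EwMspe`, its attributes, and the method `update` are hypothetical names introduced here, and the forgetting factor is assumed to be constant. The normalizing sum in (20) is accumulated recursively as well, since it satisfies the same kind of recursion as (19).

```python
import numpy as np

class EwMspe:
    """Exponentially weighted MSPE matrix of K predictors, following (19)-(20).

    Hypothetical helper for illustration; a constant forgetting factor is assumed.
    """

    def __init__(self, n_predictors, forgetting=0.985):
        self.k = forgetting
        # Weighted sum of products of prediction errors, S in (19)
        self.S = np.zeros((n_predictors, n_predictors))
        # Running value of sum_{s=1}^{t} k^(t-s), the normalizer in (20)
        self.norm = 0.0

    def update(self, actual, forecasts):
        """Use the newly observed production and the K forecasts made h steps
        earlier to update S, and return the EW-MSPE matrix of (20)."""
        e = actual - np.asarray(forecasts, dtype=float)  # prediction errors
        self.S = np.outer(e, e) + self.k * self.S        # recursion (19)
        self.norm = 1.0 + self.k * self.norm             # same recursion for the weights
        return self.S / self.norm                        # normalization (20)
```

One such object would be kept for each horizon $h$, and `update` would be called each time a new production value becomes available.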

This matrix $\hat X_{t|t-h}$ is similar to the covariance matrix proposed in Granger and Newbold (1986, p. 274). Let us denote by $\hat V_{t|t-h}$ the EW–MSPE matrix with the same information as $\hat X_{t|t-h}$ but sorted by its diagonal elements in increasing order; i.e., the first row corresponds to the best predictor and so on. Then, the information of the best $d$ predictors is in $\hat V^{[d]}_{t|t-h}$, which is the first $d \times d$ submatrix of $\hat V_{t|t-h}$. Note that, if $d \equiv d(t)$, the size of $\hat V^{[d]}_{t|t-h}$ and the predictors involved can vary through time. We will adopt here the classical approach for optimal forecast combination, using weights that sum to one, and which are built using the elements of $\hat V^{[d]}_{t|t-h}$. There are typically two options for building the vector of combining coefficients: using the full matrix $\hat V^{[d]}_{t|t-h}$ or using only the diagonal (ignoring the correlation between the predictors). In the first option, the vector $\hat b^{[d]}_{t|t-h}$ of combining coefficients will be

$$\hat b^{[d]}_{t|t-h} = \frac{\left(\hat V^{[d]}_{t|t-h}\right)^{-1} c_d}{c_d' \left(\hat V^{[d]}_{t|t-h}\right)^{-1} c_d}, \tag{21}$$

where $c_d$ is a vector of ones of length $d$. In the second option, the $i$-th element, $i = 1, \ldots, d$, of the vector $\hat b^{[d]}_{t|t-h}$ of combining coefficients is

$$\hat b^{[d]}_{i,t|t-h} = \frac{\left(\hat v^{(i)}_{t|t-h}\right)^{-1}}{\sum_{l=1}^{d} \left(\hat v^{(l)}_{t|t-h}\right)^{-1}}, \tag{22}$$

where $\hat v^{(i)}_{t|t-h}$, $i = 1, \ldots, d$, are the diagonal elements of $\hat V^{[d]}_{t|t-h}$; that is, the EW–MSPE of each selected predictor. It should be noted that, in order to combine a set of predictors to predict $p_{t+h}$ from period $t$, the last available estimated EW–MSPE is $\hat X_{t|t-h}$. Therefore, assuming that the best prediction of $\hat X_{t+h|t}$ is $\hat X_{t|t-h}$, which is equivalent to assuming a random walk evolution of such a random variable, the combination of the best $d$ predictors will be

$$\hat p^{C[d]}_{t+h|t} = \sum_{i=1}^{d} \hat b^{[d]}_{i,t|t-h}\,\hat p^{[i]}_{t+h|t},$$

where $\hat p^{[i]}_{t+h|t}$, $i = 1, \ldots, d$, is the predictor corresponding to row $i$ of the EW–MSPE matrix $\hat V^{[d]}_{t|t-h}$. We will denote as $d_t^{(h)}$ the optimal number of these predictors used in computing the final prediction $\hat p^{C}_{t+h|t}$. This number $d_t^{(h)}$ will be estimated using the prediction performance of the different combining alternatives; that is, the performance of using $d = 1, \ldots, K$. In order to evaluate the performance of the $K$ different combinations, we will use the same definition of EW–MSPE as in (20) but now the competing predictors are $\hat p^{C[d]}_{t+h|t}$, $d = 1, \ldots, K$. The prediction errors of these $K$ combinations are denoted by $\hat e^{C[d]}_{t|t-h} = p_t - \hat p^{C[d]}_{t|t-h}$, $d = 1, \ldots, K$. Using a recursion similar to (19), we can compute the EW–MSPE of these prediction errors as

$$\hat w^{[d]}_{t|t-h} = S^{C[d]}_{t|t-h}\left(\sum_{s=1}^{t} k^{(t-s)}\right)^{-1}, \quad d = 1, \ldots, K, \tag{23}$$

where $S^{C[d]}_{t|t-h} = \left(\hat e^{C[d]}_{t|t-h}\right)^2 + k\,S^{C[d]}_{t-1|t-h-1}$. Hence, the estimated optimal combination will use the $\hat d_t^{(h)}$ best predictors, where $\hat d_t^{(h)} = \arg\min_d \left(\hat w^{[d]}_{t|t-h}\right)$. The final prediction is then

$$\hat p^{C}_{t+h|t} = \sum_{i=1}^{\hat d_t^{(h)}} \hat b^{[\hat d_t^{(h)}]}_{i,t|t-h}\,\hat p^{[i]}_{t+h|t}. \tag{24}$$
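The steps from sorting the EW–MSPE matrix to the final prediction (24) can be illustrated with the following Python sketch. It is a minimal example under stated assumptions, not the production code: all function names (`sort_by_mspe`, `weights_full`, `weights_diag`, `final_forecast`) are hypothetical, the EW–MSPE matrix of the individual predictors and the vector of EW–MSPE values of the $K$ combinations in (23) are assumed to be already available, and the diagonal weights (22) are taken as the default.

```python
import numpy as np

def sort_by_mspe(X):
    """Order predictors by increasing EW-MSPE (diagonal of X).
    Returns the ordering and the sorted matrix V."""
    order = np.argsort(np.diag(X))
    return order, X[np.ix_(order, order)]

def weights_full(V_d):
    """Combining coefficients from the full d x d matrix, as in (21)."""
    ones = np.ones(V_d.shape[0])
    z = np.linalg.solve(V_d, ones)   # V_d^{-1} c_d without forming the inverse
    return z / (ones @ z)

def weights_diag(V_d):
    """Combining coefficients from the diagonal only, as in (22)."""
    inv = 1.0 / np.diag(V_d)
    return inv / inv.sum()

def final_forecast(X, w, forecasts, weight_fn=weights_diag):
    """Combine the d-hat best predictors, as in (24).

    X         : EW-MSPE matrix of the K individual predictors, from (20)
    w         : length-K vector, w[d-1] is the EW-MSPE in (23) of the
                combination that uses the d best predictors
    forecasts : the K individual forecasts of p_{t+h} made at time t
    """
    d_hat = int(np.argmin(w)) + 1          # arg min over d = 1, ..., K
    order, V = sort_by_mspe(X)
    b = weight_fn(V[:d_hat, :d_hat])       # weights of the selected predictors
    selected = np.asarray(forecasts, dtype=float)[order[:d_hat]]
    return d_hat, float(b @ selected)
```

Both weighting functions return coefficients that sum to one. The weights from (21) can be negative when the predictors are strongly correlated, whereas those from (22) are always positive.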


Note that, as with $\hat X_{t+h|t}$, we have used the random walk assumption for the evolution of $(\hat e^{C[d]}_{t+h|t})^2$, and therefore the best estimation of $E[(\hat e^{C[d]}_{t+h|t})^2]$ is $\hat w^{[d]}_{t|t-h}$. As mentioned in Section 3, the random walk assumption for the evolution of our random variables has a physical justification, since atmospheric changes are very slow. From a practical point of view, this random walk assumption does not seem to be restrictive when estimating $\hat d_t^{(h)}$ and the optimal weights (22), since data suggest that the position of the predictors in the sorted matrix $\hat V^{[d]}_{t|t-h}$ also evolves very slowly. If the position of the predictors in the ranking of performance changes very quickly, then $\hat w^{[d]}_{t|t-h}$ would be a poor predictor of the performance of the combination of the best $\hat d_t^{(h)}$ predictors. On the other hand, if the position of the competing predictors in the ranking is stable, $\hat w^{[d]}_{t|t-h}$ will be a good predictor of the performance of the combination, and the coefficients $\hat b^{[\hat d_t^{(h)}]}_{i,t|t-h}$ in (24) will lead to an efficient combination. Fig. 3 illustrates this point, showing a high stability across time in the ranking of predictors. Fig. 3(a) shows the evolution of values of $\hat d_t^{(h)}$ for $h = 6$ using data from a wind farm in Spain over time (the details of the estimation are described in the next section), whereas Fig. 3(b) shows the evolution, over the same period, of the EW–MSPE of the alternative 18 predictors (dotted lines) at $h = 6$. In order to ease visualization, the alternative EW–MSPE have been divided by the EW–MSPE of model M1 estimated with RLS. The solid line in Fig. 3(b) is the EW–MSPE of the final combination (24) using a similar estimation of MSPE as in (20). Fig. 3 shows that the position of the alternative predictors (dotted lines) in the ranking of the best $\hat d_t^{(h)}$ predictors is very stable across time. Thus, when $\hat d_t^{(h)}$ remains constant, it tends to be based on the same predictors. The relative performance of the competing models also has a slow evolution, leading to a high stability in $\hat d_t^{(h)}$.

Fig. 3. (a) Evolution of the optimal number of predictors to combine at h = 6 (axes: time vs. optimal d). (b) Evolution of the relative EW–MSPE of the competing predictors (dotted lines) and the final combination (solid line) (axes: time vs. relative EW–MSPE). The values of EW–MSPE are relative to the EW–MSPE of model M1 estimated by RLS.

5. Some results

This section shows an application of the method

using data from a wind farm in Spain. These results

should only be considered as illustrative, since the

performance can depend on many factors that here

will be fixed: location of the wind farm, characteris-

tics of the wind turbines, calibration of the nonpara-

metric models (M8 and M9), and the quality of the

wind predictions. The accuracy of the wind predic-

tions is the most influential factor in wind power

forecasting. We will therefore use both real wind

measurements, obtained from the anemometer of the

wind farm, and wind predictions supplied by the

Spanish Meteorological Institute.

The data correspond to hourly average wind speed and direction, and average hourly power measured from January to April 2002. There is a total of 2800 data points when anemometer measurements are used and 2052 when wind predictions are used. The first 100 observations were used to obtain initial estimates in order to further apply RLS and KF. The on-line adaptive combination of the 18 alternative predictors was made by evaluating the EW–MSPE in (20) and (23) with $k = 0.985$, which is equivalent to an asymptotic memory length of 24 h. The combination coefficients have been obtained using (22).

Fig. 4 shows the empirical MSPE of the proposed combination procedure at each horizon. For comparison, this figure also displays the MSPE of models M7 and M9 estimated by RLS (RLS and KF have very similar performance in this data set), which are the models that encompass all the characteristics of the proposed models. Models M7 and M9 can then be seen as the alternatives to a combination strategy. It can be seen in both Fig. 4(a), based on real wind measurements, and Fig. 4(b), based on wind predictions, that the final predictions have better overall performance than those individual models. Fig. 3 shown above is based on real wind measurements. This figure illustrates how a different number of predictors is combined as the relative accuracy of the competing predictors changes across time.

Fig. 4. MSPE of the optimal combination and models M7 and M9 estimated with RLS using (a) real wind data and (b) wind predictions. (Axes in both panels: prediction horizon vs. MSPE, in units of 10^6; curves: Combination, M9, M7.)

In order to understand the role of the alternative

models in the final combination, Fig. 5 shows the

average combination coefficients of each model

when real wind measurements are used. Since we

are using real wind data, the relative performance of

model M1 is, as expected, very poor and, conse-

quently, its average combination coefficient is close

to zero. Therefore, we do not show its results here.

Fig. 5(a) displays the average coefficients of the

models that use the information of the wind (M2 to

M9). In this figure, we have merged the coefficients

corresponding to both estimation procedures, RLS

and KF. We have also merged the coefficients of

the models in which the use of the wind direction is

the only difference. This means summing up the

coefficients of M2 and M5 (models with a linear

function of the velocity), M3 and M6 (models with

a quadratic function of the velocity), M4 and M7

(models with a cubic function of the velocity), and

M8 and M9 (nonparametric models). Fig. 5(a) shows

that, out of the fully parametric models (M2 to M7),

the models with a cubic power of the wind (M4+M7)

have the largest weight, which is equivalent to say-

ing that they are better predictors. This result is in

agreement with (2). It can also be seen that these

models (M4+M7) have performance comparable to

the nonparametric models (M8 and M9, first solid

line from the bottom). However, when we consider

all the parametric models (M2 to M7, first solid line

from the top), the aggregated combination coefficient

surpasses that of the nonparametric models. This allows us to conclude that, although nonparametric models are especially suited to dealing with nonlinearities, the combination of different polynomials of the velocity of wind is also an efficient way to model the nonlinearity of this system and can make a significant contribution to the final combination.

Fig. 5. (a) Average combination coefficient of competing models. The dotted lines correspond to the models based on a polynomial of the velocity of wind with different orders (M2+M5, M3+M6, and M4+M7). The solid lines are the aggregation of all the parametric models M2 to M7 and the nonparametric models M8 and M9, respectively (axes: prediction horizon vs. average combination coefficient). (b) Subsample of the evolution of the combination coefficient of model M9 at h = 12, using RLS (axes: time vs. combination coefficient).
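The merging of coefficients behind Fig. 5(a) can be sketched as follows. This is a hypothetical illustration only: it assumes that the 18 predictors are the nine models M1 to M9, each estimated by RLS and by KF, that the combination coefficients are stored as an array with one row per time point and one column per predictor (zero when a predictor is not selected), and that the columns follow the order given by `LABELS`. None of these names or layouts come from the original system.

```python
import numpy as np

MODELS = [f"M{m}" for m in range(1, 10)]
ESTIMATORS = ["RLS", "KF"]
# Assumed column order of the 18 predictors: (M1, RLS), (M1, KF), (M2, RLS), ...
LABELS = [(model, est) for model in MODELS for est in ESTIMATORS]

# Groups merged in Fig. 5(a): same function of the wind velocity,
# with and without wind direction, and both estimation procedures.
GROUPS = {
    "linear (M2+M5)": {"M2", "M5"},
    "quadratic (M3+M6)": {"M3", "M6"},
    "cubic (M4+M7)": {"M4", "M7"},
    "nonparametric (M8+M9)": {"M8", "M9"},
}

def average_group_weights(weights):
    """Average each coefficient over time, then sum within each group.

    weights : array of shape (T, 18) with the combination coefficients.
    """
    mean_w = np.asarray(weights, dtype=float).mean(axis=0)
    return {
        name: float(sum(mean_w[i] for i, (model, _) in enumerate(LABELS)
                        if model in members))
        for name, members in GROUPS.items()
    }
```

Summing the three polynomial groups gives the aggregated coefficient of the parametric models M2 to M7.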

The combination coefficients for each model are

time-varying, aimed at adapting to the data. The

coefficients shown in Fig. 5(a) are just their average

values along the observed data points. Fig. 5(b)

illustrates the evolution of one of these combination

coefficients. This figure shows a portion of the evo-

lution of the combination coefficient of model M9

when used to predict at horizon h=12 using RLS as

the estimation method. It can then be seen that the

combination coefficient is only null for some short

periods, and for most of the time the coefficient

varies between 0.10 and 0.15. Since we are dealing

with 18 alternative predictors, a weight as large as

0.15 shows a significant contribution to the final

combination.

6. Concluding remarks

Through the design of a very flexible recursive

forecasting system, we have obtained a useful short-

term prediction tool for wind power production (and

perhaps with some other applications). The system is

based on an on-line time-varying forecast combina-

tion, where both the number of predictors and their

weights are time-varying. Both the competing models

and the estimation procedures have been selected in

such a way that a wide range of real situations can be

covered.

This statistical forecasting tool can also be ac-

companied by a set of numerical rules that intro-

duce non-random information about the wind park

which can affect predictions. An example of this

kind of information is the disconnection speed,

beyond which the wind turbines are disconnected

for safety reasons. Another example is a change in the nominal power of the wind park, due to changes in the number of wind turbines, maintenance, and so on.
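As a rough illustration of such numerical rules, the sketch below post-processes a statistical forecast with the two pieces of non-random information mentioned above. The function name, its arguments, and the clipping of negative forecasts to zero are assumptions made for this example; the actual rules and threshold values depend on the wind farm and are not given here.

```python
def apply_farm_rules(power_forecast, wind_speed, cut_out_speed, nominal_power):
    """Adjust a statistical power forecast with deterministic farm information.

    power_forecast : forecast from the statistical combination
    wind_speed     : wind speed (measured or predicted) for the target hour
    cut_out_speed  : disconnection speed of the turbines (farm-specific)
    nominal_power  : currently available nominal power of the wind park
    """
    if wind_speed > cut_out_speed:
        # Turbines are disconnected for safety reasons: no production
        return 0.0
    # Production cannot be negative (added assumption) nor exceed nominal power
    return min(max(power_forecast, 0.0), nominal_power)
```

Updating `nominal_power` when turbines are added or taken out for maintenance would let the same rule reflect changes in the size of the park.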

The described system has a modular framework

(competing models–estimation procedures–final

combination) that allows further independent re-

search to be made in every specific part of the

system and that will undoubtedly improve its

performance.

Acknowledgments

The author is grateful to the referees for their useful

comments. The author is also grateful to Carlos

Velasco for his computational assistance with the

nonparametric models. Some parts of this research have been presented in the following seminars: the 2002 IEA Symposium on Wind Forecasting Techniques (Norrköping), the World Wind Energy Conference and Exhibition (Berlin), the 2002 European Wind Energy Conference (Paris), the 17th International Workshop on Statistical Modelling (Chania), and the XXI SEIO Meeting (Baeza). The author is grateful to the attendees of the above-mentioned seminars for their useful comments. This research has been partly supported by Red Eléctrica de España and the ANEMOS project (ENK5-CT-2002-00665), funded by the European Commission, and grant SE 2004-03303 from Ministerio de Educación y Ciencia. Any remaining error is the author's responsibility.

References

Ackerman, T., & Soder, L. (2002). An overview of wind energy

status 2002. Renewable and Sustainable Energy Reviews, 6,

67–128.

Akaike, H. (1974). A new look at the statistical model identi-

fication. IEEE Transactions on Automatic Control, AC-19,

716–723.

Beyer, H. G., Heinemann, D., Mellinghoff, H., Mönnich, K., & Waldl,

H. P. (1999). Forecast of regional power output of wind turbines.

Proceedings of the EWEC 1999 (pp. 1070–1073).

Bhansali, R. J. (1996). Asymptotically efficient autoregressive

model selection for multistep prediction. Annals of the Institute

of Statistical Mathematics, 48, 577–602.

Bianchi, F. D., Mantz, R. J., & Christiansen, C. F. (2004). Power

regulation in pitch-controlled variable-speed WECS above rated

wind speed. Renewable Energy, 29, 1911–1922.

Bunn, D. W. (1985). Statistical efficiency in the linear combi-

nation of forecasts. International Journal of Forecasting, 1,

151–163.

Chen, Z., & Yang, Y. (2002). Time series models for forecasting:

Testing or combining? Manuscript, Iowa State University.

Clemen, R. T. (1989). Combining forecasts: A review and annotated

bibliography. International Journal of Forecasting, 5, 559–583.

Cleveland, W. S., & Devlin, S. J. (1988). Locally weighted

regression: An approach to regression analysis by local

fitting. Journal of the American Statistical Association, 83,

596–610.

Diebold, F. X., & Pauly, P. (1987). Structural change and the

combination of forecasts. Journal of Forecasting, 6, 21–40.

Dutton, A. G., Kariniotakis, G., Halliday, J. A., & Nogaret, E.

(1999). Load and wind power forecasting methods for the

optimal management of isolated power systems with high

wind penetration. Wind Engineering, 23, 69–87.

Fan, J., & Gijbels, I. (1996). Local polynomial modelling and its

applications. London: Chapman & Hall.

Focken, U., Lange, M., & Waldl, H. P. (2001). Previento—

a wind power prediction system with an innovative

upscaling algorithm. Proceedings of the EWEC 2001

(pp. 826–829).

Giebel, G., Landberg, L., Nielsen, T. S., & Madsen, H. (2001). The

Zephyr project. The next generation prediction system. Proceed-

ings of the EWEC 2001 (pp. 777–780).

Granger, C. W. J., & Newbold, P. (1986). Forecasting economic

time series. San Diego: Academic Press.

Grillenzoni, C. (1994). Optimal recursive estimation of dynamic

models. Journal of the American Statistical Association, 89,

777–787.

Joensen, A., Giebel, G., Landberg, L., Madsen, H., & Nielsen, A.

(1999). Model output statistics applied to wind power predic-

tion. Proceedings of the EWEC 1999 (pp. 1177–1180).

Joensen, A., Madsen, H., Nielsen, H. A., & Nielsen, T. S. (1999).

Tracking time-varying parameters using local regressions. Auto-

matica, 36, 1199–1204.

Kang, I.-B. (2003). Multi-period forecasting using different mod-

els for different horizons: An application to U.S. economic

time series data. International Journal of Forecasting, 19,

387–400.

Kariniotakis, G. N., Stavrakakis, G. S., & Nogaret, E. F.

(1996). Wind power forecasting using advanced neural net-

works models. IEEE Transactions on Energy Conversion, 11,

762–767.

Kohn, R., Michael, S., & Chan, D. (2001). Nonparametric regres-

sion using linear combinations of basis functions. Statistics and

Computing, 11, 313–322.

Landberg, L. (1994). Short-term prediction of local wind conditions. PhD thesis, Risø National Laboratory, Roskilde, Denmark.

Landberg, L., Giebel, G., Madsen, H., Nielsen, T. S., Jørgensen, J.

U., Laursen, L., et al. (2002). Wind farm production predic-

tion—the Zephyr model. Technical report. Roskilde, Denmark:

Risø National Laboratory.

Nielsen, T. S., Madsen, H., & Tofting, J. (1999). Experiences with

statistical methods for wind power prediction. Proceedings of

the EWEC 1999 (pp. 1066–1069).

Nielsen, T. S., Joensen, A., Madsen, H., & Landberg, L.

(2000). Tracking time-varying coefficient functions. Interna-

tional Journal of Adaptive Control and Signal Processing, 14,

813–828.

Sanchez, I. (in press). Recursive estimation of dynamic models

using Cook’s distance, with application to wind energy forecast.

Technometrics.

Sanchez, I., Usaola, J., Ravelo, O., Velasco, C., Domínguez, J.,

Lobo, M., et al. (2002). SIPREOLICO—a wind power pre-

diction system based on flexible combination of dynamic

models. Application to the Spanish power system. Proceed-

ings of the World Wind Energy Conference and Exhibition

2002.

Schwartz, G. (1978). Estimating the dimension of a model. Annals

of Statistics, 6, 461–464.

Sessions, D. N., & Chaterjee, S. (1989). The combining of forecasts

using recursive techniques with nonstationary weights. Journal

of Forecasting, 8, 239–251.

Swanson, N. R., & Zeng, T. (2001). Choosing among compet-

ing econometric forecasts: Regression-based forecast combi-

nation using model selection. Journal of Forecasting, 20,

425–440.


Terui, N., & van Dijk, H. K. (2002). Combined forecasts from linear

and nonlinear time series models. International Journal of

Forecasting, 18, 421–438.

Vilar-Fernandez, J. A., & Vilar-Fernandez, J. M. (1998). Recur-

sive estimation of regression functions by local polynomial

fitting. Annals of the Institute of Statistical Mathematics, 50,

729–754.

Yang, Y. (2004). Combining forecasting procedures: Some theoret-

ical results. Econometric Theory, 20, 176–222.

Ismael Sanchez is Associate Professor of Statistics in the Polytechnic School at Universidad Carlos III de Madrid. His main research

interests are time series analysis, forecasting, and statistical process

control. He has published in several leading journals, including the

Journal of the American Statistical Association, Technometrics, and

the Journal of Forecasting. He actively participates in a multidisci-

plinary team concerning the real-time prediction of wind energy

production for the Spanish Peninsular system.