Explaining Wind Farm Output Using Regression Analysis
Kate Geschwind
Mayo High School
Rochester, MN
Contents
Abstract
Introduction
Materials and Methods
Results
Discussion
Conclusion
Acknowledgments
References/Bibliography
Appendix
Figure 1: 1500 kW Wind Turbine Power Curve
Figure 2: Residual error statistics by model
Figure 3: Frequency Distribution of Model Residual Errors
Figure 4: Power Curve Model Compared to Actual Wind Farm Output
Figure 5: Final Regression Model Compared to Actual Wind Farm Output
Figure 6: Final Explanatory Model Equation
Abstract
Wind farms are a common source of energy today, but the amount of energy they produce is intermittent and challenging to predict. The goal of this project was to create a practical mathematical model, using readily available explanatory variables, that would accurately represent the energy produced by a 67-turbine wind farm in southeastern Minnesota. Such a model would help electrical grid operators predict this wind farm's output. The analysis used recorded hourly data from December 2009, including time of day, temperature, dew point, relative humidity, wind speed, wind direction, cloud cover, turbine availability, and the wind farm's hourly output, measured in kilowatts (kW). The regression capabilities of a standard spreadsheet program were used to create mathematical models from various combinations of explanatory variables. The models were evaluated using R² and the residual error values to measure their accuracy and identify the best-performing explanatory model. Using the same December 2009 data, this model was then compared with a simple wind turbine power curve model that might also be used to explain the output of the same wind farm. A comparison of the standard deviation of the residual errors, and of the sum of their absolute values, showed that the final regression model explained the wind farm's electrical output more accurately than the power curve model. This regression model, combined with forecasts of the explanatory variables, would be a useful tool for predicting the hourly output of the wind farm.
Introduction
A common problem today is global warming. It is often in the news, and many people are working to find ways to improve our environment. There are many approaches to creating a cleaner environment, and one is to use sources of energy that are clean and renewable. Wind energy is one such source.
Wind energy was used as early as 5000 B.C. to propel boats along the Nile River. Today, wind turbines are becoming a major source of clean, renewable energy for the electric utility industry. As helpful as wind turbines are, they have drawbacks. In particular, the amount of energy they produce changes constantly from hour to hour. Consequently, it is difficult to predict the energy output of a wind farm in advance. This is a problem for utilities and transmission grid operators, who are required to maintain the reliability of the electrical system. They must keep other, non-intermittent power plants available to generate more or less energy, depending on the wind.
My research was intended to develop a practical mathematical model using regression analysis
that would explain the energy output of a particular 67-turbine wind farm located in southeastern
Minnesota. The model would use observed weather information and other explanatory variables
that would accurately explain how much energy the wind farm produces each hour. Such a
model could then be used to better forecast the output of this wind farm and could be readily
replicated for other wind farms to improve predictability and promote wind-resource
development. This in turn would allow us to rely less on the energy sources that harm our
environment and more on clean and renewable wind energy.
Materials and Methods
The following materials and data were used in this project:
Hourly weather data (temperature, dew point, relative humidity, wind speed, wind direction, and cloud cover)
Hourly wind farm output data (measured in kilowatts)
Hourly wind farm turbine availability data
Wind turbine manufacturer’s power curve
Computer spreadsheet software
The controlled variables in this experiment were the wind farm, the spreadsheet software used to build the models, and the weather data service. The independent (explanatory) variables were the weather conditions recorded each hour: time of day, relative humidity, dew point, wind speed, cloud cover, temperature, and wind direction. Turbine availability, the turbine cut-in and cut-out wind speeds, and the wind farm's output lagged by one hour were also used as explanatory variables. The dependent variable was the actual hourly energy output of the wind farm, which the regression models were built to estimate.
Hourly observed weather information for Rochester, Minnesota was obtained from a commercial
weather service for the month of December 2009. Rochester, Minnesota is the community with
available weather data that is nearest to the wind farm. The weather information included the
time of day, temperature, dew point, relative humidity, wind speed, wind direction, and cloud
cover.
Other data collected included wind turbine availability (i.e., how many of the 67 wind turbines
were available each hour), and wind cut-off and cut-in speeds from the turbine manufacturer’s
data. The information collected from the wind farm was the amount of energy produced from
the wind farm, measured in kilowatts.
The list of potential explanatory variables was supplemented with derived explanatory variables. For example, the power available in the wind varies with the cube of the wind speed, so wind speed cubed was calculated and used as a potential explanatory variable. Also, the power a wind turbine rotor extracts from the wind is proportional to the difference between the squares of the wind speeds upstream and downstream of the rotor, so wind speed squared was also used as a potential explanatory variable. Finally, the dependent variable was lagged one hour and used as a potential explanatory variable. This was done because the output of the wind farm is not random from one hour to the next and is typically related to the prior hour’s output.
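A minimal sketch of how the derived columns could be built from the hourly series is shown below. It uses Python with NumPy rather than the spreadsheet the project used. The helper name `add_derived_variables` is hypothetical. Dropping the first hour, which has no prior-hour output within the series, is an assumption; a real implementation would use the last hour of November instead, if that value were available.

```python
import numpy as np


def add_derived_variables(wind_speed_mph, output_kw):
    """Return aligned hourly columns: V, V^2, V^3, prior-hour output, and output.

    Hypothetical helper. The first hour is dropped because, within this
    series, it has no previous hour to supply the lagged output.
    """
    v = np.asarray(wind_speed_mph, dtype=float)
    y = np.asarray(output_kw, dtype=float)
    if v.shape != y.shape:
        raise ValueError("wind speed and output must cover the same hours")
    return {
        "wind_speed": v[1:],
        "wind_speed_sq": v[1:] ** 2,  # squared-speed term
        "wind_speed_cu": v[1:] ** 3,  # power in the wind scales with V^3
        "output_lag1": y[:-1],        # output in hour n-1, paired with hour n
        "output": y[1:],              # dependent variable for hours 2..N
    }


cols = add_derived_variables([10, 12, 15], [1000, 2000, 3500])
print(cols["wind_speed_cu"], cols["output_lag1"])
```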
The first step in the modeling was to calculate the correlation between each potential explanatory variable and the dependent variable (actual wind farm output). This showed how important each variable was in explaining the amount of energy produced. The correlation coefficients were calculated using the built-in correlation function of a standard spreadsheet program. Guided by these correlation coefficients, the spreadsheet program was then used to create regression models from seven different combinations of the explanatory variables. The regression models were evaluated using R² and the residual error values. R² measures the share of the variation in output that a model explains: the closer it is to 1, the more of the variation is explained. The residual error values are the differences between the model's output estimates and the actual observed output. Together, these measures were used to assess the accuracy of the models and identify the best-performing explanatory model.
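The same two steps can be reproduced outside a spreadsheet. The snippet below is an illustrative NumPy version, not the project's actual workflow. It computes Pearson correlation with `np.corrcoef`, fits an ordinary least-squares model with an intercept using `np.linalg.lstsq`, and calculates R² as 1 − SS_res/SS_tot. The helper names and the toy data are hypothetical. Residuals follow the text's definition: model estimate minus actual output.

```python
import numpy as np


def correlations(X, y, names):
    """Pearson correlation of each candidate column with actual output."""
    return {name: float(np.corrcoef(X[:, j], y)[0, 1]) for j, name in enumerate(names)}


def fit_ols(X, y):
    """Least-squares fit y ~ b0 + X b; returns (b0, b, r2, residuals)."""
    A = np.column_stack([np.ones(len(y)), X])   # intercept column first
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    estimate = A @ beta
    residuals = estimate - y                    # estimate minus actual
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return float(beta[0]), beta[1:], 1.0 - ss_res / ss_tot, residuals


# Toy data (hypothetical): y is exactly 3 + 2*x1 - x2, so R^2 should be 1.
X = np.array([[0, 1], [1, 0], [2, 3], [3, 1], [4, 2]], dtype=float)
y = 3 + 2 * X[:, 0] - X[:, 1]
print(correlations(X, y, ["x1", "x2"]))
b0, b, r2, resid = fit_ols(X, y)
print(round(b0, 6), np.round(b, 6), round(r2, 6))
```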
For comparison with the performance of the regression models, a simple wind turbine power
curve model was developed using manufacturer’s data for the type of wind turbines in the wind
farm. An equation for the turbine power curve between the cut-in wind speed and the rated
speed was developed from discrete x-y data points from the manufacturer’s actual curve. This
curve-fit equation was developed using regression analysis and the explanatory variables of wind
speed, wind speed squared, and wind speed cubed. The power curve is shown in Figure 1 below.
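A sketch of that curve fit in Python with NumPy follows. The manufacturer's x-y points are not reproduced in this paper, so the example generates hypothetical points from the published Figure 1 equation. It then checks that regressing on V, V², and V³ (plus an intercept) recovers the equation's coefficients. The 9 to 27 mph range is an assumed stand-in for the span between cut-in and rated speed.

```python
import numpy as np


def figure1_curve(v_mph):
    """Published curve fit from Figure 1: turbine output in kW."""
    v = np.asarray(v_mph, dtype=float)
    return 937.680 - 234.018 * v + 17.863 * v**2 - 0.314 * v**3


def fit_power_curve(v_points, kw_points):
    """Regress kW on V, V^2 and V^3 with an intercept; returns [b0, b1, b2, b3]."""
    v = np.asarray(v_points, dtype=float)
    A = np.column_stack([np.ones_like(v), v, v**2, v**3])
    coefs, *_ = np.linalg.lstsq(A, np.asarray(kw_points, dtype=float), rcond=None)
    return coefs


# Hypothetical sample points between an assumed cut-in and rated speed.
v_pts = np.arange(9.0, 29.0, 2.0)
coefs = fit_power_curve(v_pts, figure1_curve(v_pts))
print(np.round(coefs, 3))
```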
The manufacturer’s power curve, along with the number of wind turbines in-service in the wind
farm each hour during December 2009, was combined with observed wind speed to arrive at
hourly estimates for wind farm output. Using a manufacturer’s power curve for the wind
turbines in the wind farm was not expected to accurately explain the output of the entire wind
farm, but a power curve model is a reasonable model in the absence of an explanatory model
tailored to the specific wind farm.
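A minimal sketch of the farm-level power curve estimate in Python with NumPy follows. The paper does not list the turbine's cut-in, rated, or cut-out speeds, so the constants below are hypothetical placeholders. Between cut-in and rated speed, the sketch uses the Figure 1 curve fit. Between rated and cut-out speed, it holds output at the 1500 kW rating. Each hour's per-turbine value is then multiplied by the number of turbines in service.

```python
import numpy as np

# Hypothetical turbine limits (mph); replace with the manufacturer's values.
CUT_IN_MPH = 8.0
RATED_MPH = 28.0
CUT_OUT_MPH = 56.0
RATED_KW = 1500.0


def turbine_kw(v_mph):
    """Per-turbine output (kW) from the Figure 1 curve plus operating limits."""
    v = np.asarray(v_mph, dtype=float)
    curve = 937.680 - 234.018 * v + 17.863 * v**2 - 0.314 * v**3
    ramp = (v >= CUT_IN_MPH) & (v < RATED_MPH)   # fitted part of the curve
    flat = (v >= RATED_MPH) & (v < CUT_OUT_MPH)  # held at rated output
    kw = np.where(ramp, curve, 0.0)              # zero below cut-in / above cut-out
    kw = np.where(flat, RATED_KW, kw)
    return np.clip(kw, 0.0, RATED_KW)


def farm_kw(v_mph, turbines_in_service):
    """Hourly farm estimate: per-turbine output times turbines available."""
    return turbine_kw(v_mph) * np.asarray(turbines_in_service, dtype=float)


print(farm_kw([5.0, 8.0, 30.0, 60.0], [67, 67, 67, 67]))
```

Because the same wind speed is applied to every turbine, this estimate ignores differences across the site.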
Figure 1: 1500 kW Wind Turbine Power Curve (electrical output, 0 to 1,600 kW, versus wind speed, 0 to 60 mph, with the turbine cut-in and cut-out speeds marked). Curve fit between the cut-in and rated speeds:
$$P = 937.680 - 234.018\,V + 17.863\,V^{2} - 0.314\,V^{3}$$
where $P$ is turbine output (kW) and $V$ is wind speed (mph).
Using the same December 2009 data, the final regression model was compared with the simple wind turbine power curve model. The standard deviation of the residual error values and the sum of the absolute values of the residual errors were compared for the regression model and the power curve model. This comparison determined whether the regression model explained the wind farm output better than the power curve model.
A frequency distribution graph of the residual error values was created to highlight the differences among the models. A graph was also made for each model comparing the hourly energy estimated by the model with the actual energy produced by the wind farm.
Results
As described above, a variety of regression models were developed in an attempt to build an accurate explanatory model. These models, along with their summary regression statistics, are shown in the Appendix. The models were compared with each other and with the results of the simple power curve model. The comparison results are shown below in Figure 2. These results are also shown in the frequency distribution graph in Figure 3 for a sample of the developed models.
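The residual statistics compared in Figure 2 and the frequency distribution in Figure 3 can be computed from any model's hourly estimates. The Python/NumPy sketch below makes several assumptions. It uses the sample standard deviation (ddof=1), because the paper does not say which form was used. When summing absolute values, it treats each hourly kW residual as kWh over a one-hour interval. It also uses a hypothetical 5,000 kW bin width across the ±100,000 kW range shown in Figure 3.

```python
import numpy as np


def residual_summary(estimate_kw, actual_kw):
    """Mean, median, sample std. dev. and sum of absolute residuals."""
    r = np.asarray(estimate_kw, dtype=float) - np.asarray(actual_kw, dtype=float)
    return {
        "mean_kw": float(np.mean(r)),
        "median_kw": float(np.median(r)),
        "std_kw": float(np.std(r, ddof=1)),        # sample standard deviation
        "sum_abs_kwh": float(np.sum(np.abs(r))),   # hourly kW x 1 h = kWh
    }


def residual_histogram(estimate_kw, actual_kw, bin_width=5_000.0, limit=100_000.0):
    """Residual counts per bin from -limit to +limit; values outside are not counted."""
    r = np.asarray(estimate_kw, dtype=float) - np.asarray(actual_kw, dtype=float)
    edges = np.arange(-limit, limit + bin_width, bin_width)
    counts, edges = np.histogram(r, bins=edges)
    return counts, edges


stats = residual_summary([10, 20, 30], [12, 18, 30])
counts, edges = residual_histogram([10, 20, 30], [12, 18, 30])
print(stats, counts.sum())
```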
Figure 2: Residual error statistics by model (parentheses denote negative values; "-" denotes zero)
Model                             Mean Residual (kW)  Median Residual (kW)  Std. Dev. of Residuals (kW)  Sum of Absolute Residuals (kWh)
Power Curve Model                 (16,908)            (13,000)              22,771                       15,310,256
Regression Model 1                -                   (2,342)               23,738                       12,658,411
Regression Model 2                -                   (1,688)               22,895                       12,624,520
Regression Model 3                -                   (1,420)               22,275                       12,326,012
Regression Model 4                13                  (906)                 21,433                       11,944,501
Regression Model 5                -                   (735)                 9,677                        4,706,487
Regression Model 6                -                   (557)                 9,642                        4,693,024
Regression Model 7 (Final Model)  -                   (331)                 9,568                        4,695,068
Figure 3: Frequency Distribution of Model Residual Errors (frequency, 0 to 400, versus model error in kW, from -100,000 to 100,000; series: Power Curve Model, Final Model, Model 5, Model 4, and Model 1)
To better illustrate the performance of the final regression model, its estimated hourly wind farm output values were compared graphically with the actual wind farm output values. The same graphical comparison was made for the simple power curve model. The power curve model's hourly estimates, shown in Figure 4, appear relatively inaccurate. In contrast, the hourly estimates from the final regression model, shown in Figure 5, are a substantial improvement over the power curve model and are reasonably close to the actual wind farm output.
Figure 4: Power Curve Model Compared to Actual Wind Farm Output (hourly output, 0 to 120,000 kW, by hour in December; series: Actual Wind Farm Production (kW) and Power Curve Model)
Figure 5: Final Regression Model Compared to Actual Wind Farm Output (hourly output, 0 to 120,000 kW, by hour in December; series: Actual Wind Farm Production (kW) and Final Model)
Figure 6 below shows the coefficients, the explanatory variables, and the intercept value from the regression analyses that form the final regression model.
Figure 6: Final Explanatory Model Equation
$$
\begin{aligned}
\text{Output}_{n} ={}& 486.42\,\text{Time (CST)} - 453.06\,\text{Temp} + 463.36\,\text{Dew Point} - 59.86\,\text{Relative Humidity} \\
&+ 465.97\,V + 30.36\,V^{2} - 0.78\,V^{3} - 70.98\,\text{Wind Direction} - 323.55\,\text{Cloud Cover} \\
&+ 5217.77\,\text{Turbine Availability} - 1070.56\,\text{Min Wind Cutoff} + 0.86\,\text{Output}_{n-1} + 2602.03
\end{aligned}
$$
where $\text{Output}_{n}$ is the wind farm output (kW) in hour $n$ and $V$ is wind speed (mph).
Discussion
The best-performing regression model was the one that included all of the explanatory variables: time of day, temperature, dew point, relative humidity, wind speed, wind speed squared, wind speed cubed, wind direction, cloud cover, turbine availability, maximum wind cutoff, minimum wind cutoff, and the dependent variable lagged one hour. The maximum wind cutoff received a coefficient of zero, so it does not appear in the final equation. With all of these variables, the model had an R² of 0.9008. This means it explains about 90% of the hour-to-hour variation in the wind farm's energy output. The sum of the absolute residual errors from the final regression model was lower than that of every other regression model except the second-to-last model developed (Model 6). The standard deviation of the residuals and the magnitude of the median residual were smaller for the final model than for any other model tried, including the simple power curve model. Like the other regression models, the final model had a mean residual of essentially zero.
Although a solidly performing model was developed, it could be improved with better explanatory variables. One likely source of inaccuracy was that the weather data service did not provide measured weather information for the wind farm site; the weather data was measured approximately five miles away. Also, the weather information was measured at ground level, while the hub height of the wind turbines is 80 meters, and the wind speed at 80 meters might be very different from the wind speed at ground level. This difference in distance and height could have made the models less accurate.
An extension of this project would be to use the final regression model as a prediction model. The models created were explanatory models built from historical and derived data. To put this model into practical use, it would need to be combined with forecasts of the explanatory variables. Fortunately, the weather service that provided the hourly recorded weather data also provides hourly forecasts of those same weather variables several days into the future.
Interestingly, implementing some of the improvements to the explanatory variables described above could make the model less useful for prediction. For example, an explanatory model using measured hourly wind speed at hub height would not be useful for predicting wind farm output unless a forecast of wind speed at hub height were also available. While some of the explanatory variables might not have been ideal, forecasts of them are at least readily available, which supports the practical application of this model.
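To use the Figure 6 equation for prediction, forecasts would replace the observed inputs. Because one term is the previous hour's output, a multi-hour forecast would have to feed each hour's estimate into the next. The Python sketch below shows that chaining, using the rounded coefficients as printed. The dictionary keys and function names are hypothetical. The inputs are assumed to use the same units and encodings as the December 2009 data, which the paper does not specify for variables such as time of day, wind direction, cloud cover, or turbine availability.

```python
# Final-model coefficients as printed in Figure 6 (rounded); keys are hypothetical.
COEFFICIENTS = {
    "time_cst": 486.42,
    "temp": -453.06,
    "dew_point": 463.36,
    "relative_humidity": -59.86,
    "wind_speed": 465.97,
    "wind_speed_sq": 30.36,
    "wind_speed_cu": -0.78,
    "wind_direction": -70.98,
    "cloud_cover": -323.55,
    "turbine_availability": 5217.77,
    "min_wind_cutoff": -1070.56,
    "output_lag1": 0.86,
}
INTERCEPT = 2602.03


def predict_hour(inputs, previous_output_kw):
    """Evaluate the final equation for one hour from forecast inputs."""
    terms = dict(inputs)
    v = terms["wind_speed"]
    terms["wind_speed_sq"] = v ** 2                 # derived terms recomputed from V
    terms["wind_speed_cu"] = v ** 3
    terms["output_lag1"] = previous_output_kw      # prior hour's (estimated) output
    return INTERCEPT + sum(c * terms[name] for name, c in COEFFICIENTS.items())


def predict_hours(hourly_inputs, last_observed_kw):
    """Chain predictions: each estimate becomes the next hour's lagged output."""
    predictions, previous = [], last_observed_kw
    for inputs in hourly_inputs:
        previous = predict_hour(inputs, previous)
        predictions.append(previous)
    return predictions
```

Because the lag coefficient is 0.86, an error in one hour's estimate would carry into the following hours. The accuracy of such a chained forecast would therefore likely decline as the forecast extends further ahead. This is an expectation only; the project did not measure it.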
Conclusion
My results supported my hypothesis. The hypothesis was that if certain weather variables and other explanatory variables, such as turbine availability, the maximum and minimum wind cutoffs, and the dependent variable lagged by one hour, explain the energy output of a wind farm, then a mathematical equation built from these variables can accurately represent the amount of energy the wind farm produces.
By using all of my explanatory variables, I was able to build a model that explained the amount of energy produced by the wind farm more accurately than the simple wind-speed-based power curve model. Developing an accurate mathematical model to explain the hourly output of a wind farm is the first step toward accurately predicting that output. Accurate predictions of wind farm output can help minimize the operational concerns created by wind farms and improve utility system reliability and economics.
Acknowledgments
I would like to acknowledge the support of Dave Geschwind, who helped me learn about wind farms and who also taught me how to operate the computer spreadsheet program so that I was able to create different mathematical models. I would also like to acknowledge my science teacher, Mr. MacDonald, for inspiring me to do a science project this year. Finally, I would like to thank the Southern Minnesota Municipal Power Agency (SMMPA) for allowing me access to the hourly data used in this project.
References/Bibliography
Crichton, Nicola. Regression Analysis. Blackwell Science. 10 January 2010. Web.
Hamel, Gregory. What Is Wind Speed. 12 December 2009. eHow. 10 January 2010. Web.
Introduction to Regression Analysis. NLREG. 10 January 2010. Web.
Larson, Ron, Laurie Boswell, Timothy Kanold, and Lee Stiff. Algebra 2. Illinois: McDougal Littell, 2007. Print.
Li, S., D. Wunsch, E. O’Hair, and M. Giesselmann. Comparative Analysis of Regression and Artificial Neural Network Models for Wind Turbine Power Curve Estimation. Journal of Solar Energy Engineering, ASME Transactions, Volume 123. November 2001.
Proof of Betz’ Law. 12 May 2003. Danish Wind Industry Association. 17 December 2009. Web.
Shepard, Don. The Relationship Between Pressure Gradient and Wind Speed. eHow. 3 January 2010. Web.
Spencer, Roy. What Causes Wind. 12 December 2009. Weather Questions. 10 January 2010. Web.
The Power of the Wind: Cube of Wind Speed. 1 June 2003. Danish Wind Industry Association. 17 December 2009. Web.
Twicken, Joe. Atmospheric Pressure. 17 November 1999. Jet Propulsion Laboratory. 3 January 2010. Web.
Appendix
SUMMARY OUTPUT - MODEL 1
Regression Statistics
Multiple R 0.624059715
R Square 0.389450528
Adjusted R Square 0.386975327
Standard Error 23785.89795
Observations 744
Coefficients
Intercept 2792.871253
Wind Speed (mph) -868.8403061
Wind Speed^2 394.6680637
Wind Speed^3 -9.605996968
[Chart: Model 1 Compared to Actual Wind Farm Output. Hourly output (kW) by hour in December; series: Actual Wind Farm Production (kW) and Iteration 1.]
SUMMARY OUTPUT - MODEL 2
Regression Statistics
Multiple R 0.657300478
R Square 0.432043919
Adjusted R Square 0.427420124
Standard Error 22987.86373
Observations 744
Coefficients
Intercept -18359.87132
Dew Point -78.68312728
Wind Speed (mph) -198.8836346
Wind Speed^2 377.2071656
Wind Speed^3 -9.531266419
Turbine Availability 24292.24173
Min Wind Cutoff -5723.90857
[Chart: Model 2 Compared to Actual Wind Farm Output. Hourly output (kW) by hour in December; series: Actual Wind Farm Production (kW) and Iteration 2.]
SUMMARY OUTPUT - MODEL 3
Regression Statistics
Multiple R 0.680002082
R Square 0.462402831
Adjusted R Square 0.45581104
Standard Error 22410.7008
Observations 744
Coefficients
Intercept 107866.0434
Temp -4712.291368
Dew Point 4734.845111
Relative Humidity -1356.548012
Wind Speed (mph) 1335.349862
Wind Speed^2 281.2618478
Wind Speed^3 -7.583618133
Wind Direction -388.9655457
Turbine Availability 26471.92565
Min Wind Cutoff -6070.528477
[Chart: Model 3 Compared to Actual Wind Farm Output. Hourly output (kW) by hour in December; series: Actual Wind Farm Production (kW) and Iteration 3.]
SUMMARY OUTPUT - MODEL 4
Regression Statistics
Multiple R 0.708706482
R Square 0.502264878
Adjusted R Square 0.493419131
Standard Error 21593.2784
Observations 744
Coefficients
Intercept 125221.6796
Time -142.2135816
Temp -5708.102417
Dew Point 6021.595496
Relative Humidity -1485.514645
Wind Speed (mph) 919.266452
Wind Speed^2 325.525742
Wind Speed^3 -8.555745158
Wind Direction -378.7953982
Cloud Cover -1598.215063
Turbine Availability 30907.35201
Max Wind Cutoff 0
Min Wind Cutoff -6120.486786
[Chart: Model 4 Compared to Actual Wind Farm Output. Hourly output (kW) by hour in December; series: Actual Wind Farm Production (kW) and Iteration 4.]
SUMMARY OUTPUT - MODEL 5
Regression Statistics
Multiple R 0.947906866
R Square 0.898527427
Adjusted R Square 0.897562335
Standard Error 9723.226662
Observations 744
Coefficients
Intercept -4836.475977
Dew Point -20.59172188
Wind Speed (mph) 284.8226251
Wind Speed^2 31.85289767
Wind Speed^3 -0.771989389
Turbine Availability 3806.614815
Min Wind Cutoff -1018.467926
one hour lag 0.876893466
[Chart: Model 5 Compared to Actual Wind Farm Output. Hourly output (kW) by hour in December; series: Actual Wind Farm Production (kW) and Iteration 5.]
SUMMARY OUTPUT - MODEL 6
Regression Statistics
Multiple R 0.948291472
R Square 0.899256715
Adjusted R Square 0.897882318
Standard Error 9708.0286
Observations 744
Coefficients
Intercept -1420.043325
Temp -205.4600724
Dew Point 158.4207807
Relative Humidity -23.36939825
Wind Speed (mph) 508.6793022
Wind Speed^2 19.65128821
Wind Speed^3 -0.528138739
Wind Direction -68.72203877
Turbine Availability 4030.869308
Min Wind Cutoff -980.2331631
one hour lag 0.870292698
[Chart: Model 6 Compared to Actual Wind Farm Output. Hourly output (kW) by hour in December; series: Actual Wind Farm Production (kW) and Iteration 6.]
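The Multiple R and Adjusted R Square values in these summary outputs follow standard definitions. Multiple R is the square root of R², and adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), where n is the number of observations and k is the number of explanatory variables. The short Python check below covers Models 1, 2, 3, 5, and 6. It counts k from each model's coefficient list and uses n = 744 (31 days × 24 hours).

```python
import math

N_OBS = 744  # hourly observations in December 2009

# (R Square, k explanatory variables, reported Adjusted R Square, reported Multiple R)
SUMMARIES = {
    "Model 1": (0.389450528, 3, 0.386975327, 0.624059715),
    "Model 2": (0.432043919, 6, 0.427420124, 0.657300478),
    "Model 3": (0.462402831, 9, 0.45581104, 0.680002082),
    "Model 5": (0.898527427, 7, 0.897562335, 0.947906866),
    "Model 6": (0.899256715, 10, 0.897882318, 0.948291472),
}


def adjusted_r2(r2, n, k):
    """Adjusted R^2 penalizes R^2 for the number of explanatory variables k."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


for name, (r2, k, adj_reported, multiple_r) in SUMMARIES.items():
    adj = adjusted_r2(r2, N_OBS, k)
    print(f"{name}: adjusted R^2 {adj:.9f} (reported {adj_reported}), "
          f"sqrt(R^2) {math.sqrt(r2):.9f} (reported {multiple_r})")
```

Adjusted R² increases only when a newly added variable improves the fit by more than would be expected by chance. This makes it a useful complement to R² when comparing models that use different numbers of variables.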
SUMMARY OUTPUT - Final Model
Regression Statistics
Multiple R 0.9491
R Square 0.9008
Adjusted R Square 0.8964
Standard Error 9,645.9578
Observations 744
Coefficients
Intercept 2,602.03
Time (CST) 486.42
Time -
Temp (453.06)
Dew Point 463.36
Relative Humidity (59.86)
Wind Speed (mph) 465.97
Wind Speed^2 30.36
Wind Speed^3 (0.78)
Wind Direction (70.98)
Cloud Cover (323.55)
Turbine Availability 5,217.77
Max Wind Cutoff -
Min Wind Cutoff (1,070.56)
one hour lag 0.86
[Chart: Final Regression Model Compared to Actual Wind Farm Output. Hourly output (kW) by hour in December; series: Actual Wind Farm Production (kW) and Final Model.]