Electric Power Load Forecasting
Project Report
Nov 21, 2016
Masoud Ghandehari, Ding Ma
New York University
Executive Summary
Changes in electric power use at the network level were correlated with changes in a) residential population, b) worker population, c) peak yearly temperature, and d) the number of all 311 calls. A gradient boosting regression model was used to assess the relative importance of these four features in influencing the network load. The analysis shows that residential population and worker population are the two most significant features for forecasting power load. The population data are available yearly at census tract resolution, but with a two-year lag. To increase the temporal resolution and reduce the lag time, auxiliary data such as taxi, subway turnstile, Twitter, and other data may be used, giving a more immediate measure of neighborhood growth. With this approach, it may be possible to forecast the summer peak load more accurately as early as the spring of the same year.
Figure: Relative importance of the four features modeled (residential population, worker population, temperature, 311 calls).
Table of Contents
Executive Summary
Load Data Patterns
Correlative Data and Modeling
Summary & Next Steps
Appendix
Load Patterns
Includes spatial and temporal patterns of average and peak power, power density (power per unit building area), and changes over time at the network level.
Figure: Hours of observation from 2008 to 2016 in 82 networks, shown by borough (Bronx, Brooklyn, Manhattan, Queens, Staten Island, Westchester) and by activity. Axis: hours, 0 to 80,000.
A summer week load time history, all networks
Figure: Load from July 11, 2016 0:00 to July 17, 2016 23:00 (Mon to Sun). x-axis: hours in a day; y-axis: power (MW). No record for 07/15/2016 at 10:00, 11:00, and 12:00.
Hottest Week Overview of Jamaica Ave.
Figure: Jamaica Ave. network in the hottest week of every year, 2008 to 2015. x-axis: hours in a day; y-axis: power (MW).
Times annotated on the plot, by weekday:
Mon: 2009 21:00, 2015 21:00
Tue: 2008 16:00, 2010 21:00, 2014 20:00
Wed: 2012 14:00
Fri: 2011 18:00, 2013 21:00
2016 power & power density
Figure panels: 2016 average power usage (MW); 2016 power density by building area (W/sq ft).
Change in power & power density (2008-2016)
Down by 3.587 MW (4.57%).
Figure panels: change of power 2008-2016 (MW); change of power density 2008-2016 (W/sq ft).
Peak power and density & fractional change of peak (2008-2016)
Figure panels: peak 2016 (MW); peak density 2016 (mW/sqft); fractional change 2016/2008.
Peak power and density & fractional change of peak (2008-2016), four NYC boroughs only
Figure panels: peak 2016 (MW); peak density 2016 (mW/sqft); fractional change 2016/2008.
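The quantities mapped in these figures (power density by building area, and the fractional change of peak between two years) are simple ratios. The sketch below shows one way they could be computed for a single network. The function names and all numeric values are hypothetical and are not taken from the report's data.

```python
def fractional_change(later, earlier):
    """Return later/earlier; 1.0 means no change between the two years."""
    if earlier == 0:
        raise ValueError("baseline value must be non-zero")
    return later / earlier


def power_density_w_per_sqft(power_mw, building_area_sqft):
    """Convert MW to W, then divide by building floor area in sq ft."""
    if building_area_sqft <= 0:
        raise ValueError("building area must be positive")
    return power_mw * 1_000_000 / building_area_sqft


# Hypothetical values for one network (illustration only).
peak_2008_mw = 200.0
peak_2016_mw = 190.0
building_area_sqft = 95_000_000.0

ratio = fractional_change(peak_2016_mw, peak_2008_mw)
percent_change = (ratio - 1.0) * 100.0
density = power_density_w_per_sqft(peak_2016_mw, building_area_sqft)

print(f"fractional change 2016/2008: {ratio:.3f}")
print(f"percent change: {percent_change:.2f}%")
print(f"peak density 2016: {density:.2f} W/sq ft")
```

A ratio below 1.0 corresponds to a drop in peak, which is how a "down by" figure such as the one quoted for average power would show up in the fractional-change maps.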
Correlative Data and Modeling
Relation of power versus population and activity, and identification of dominant features defining the power used at network level.
Initial datasets to use for spatial-temporal model
• Electric power, network level: 2008-2016, hourly (raw, not temperature corrected)
• Lot area and building area, 2008-2016
• Summer (yearly) peak temperature variable: 2008-2016 (raw, not temperature corrected)
• Population: Longitudinal Employer-Household Dynamics (LEHD), yearly changes in residential and employment numbers at census block level (2008-2014)
• Population: American Community Survey (ACS), census block level (every 1 to 5 years, however with thousands of attributes), from 2010 to present, yearly (on 5-year integrated data)
• Income: from ACS, every year
• Taxi pick-ups (drop-off and fare also available), lat-long-time, green, yellow, Uber (not included yet)
• 311 (since 2010)
• Twitter, subway turnstile, real-estate value (not included yet)
Variation of daily energy consumption vs temperature
Figure: Daily energy sum versus temperature variable. x-axis: energy consumption (kWh), 0 to 600,000; y-axis: temperature (degrees F), 0 to 100. The plot marks an efficiency frontier.
Longitudinal Employer-Household Dynamics (LEHD) data
Partitioned by work location and residence location (see sample worker job type data).
Spatial: census block level
Temporal: yearly data with a 2-year lag
2014 population density (Census Block Group level)
Figure panels, in people per sq ft of lot area: residents density of LEHD 2014; workers density of LEHD 2014; population density of ACS 2014.
ACS = American Community Survey
2014 population density, network level (source: LEHD), in people per sq ft
Population density and fractional change 2014/2013 (source: ACS)
(2008-2016) Change in average power use in the five NYC boroughs versus change in residential population and change in employment distribution
Figure panels: change in power (MW); change in number of workers; change in number of residents.
(2008-2016) Zoomed: change in average power use in four NYC boroughs versus change in residential population and change in employment distribution
Figure panels: change in power (MW); change in number of workers; change in number of residents.
Non-emergency community reporting (all 311 calls)
Figure panels: 311 calls 2016; 311 calls 2016 divided by ACS population 2014.
Change in power versus 311 calls, 2010-2016
Figure panels: fractional change in 311 calls (2016/2010); change in average power, MW (2016-2010).
Relative data volume for one month: yellow, green and FHV
Figure: TLC June 2016 record counts for yellow, green, and FHV. Axis: 0 to 12,000,000.
Evolution of Green Taxi from inception
Figure: Green Taxi total monthly pick-ups, August 2013 to June 2016. Axis: 0 to 2,000,000.
Green Taxi July 2014, July 2015 (passengers)
Figure panels: July 2015 Green Taxi pickup passengers; Green Taxi pickup passengers, fractional change (2015/2014).
Income (2013 & 2014)
Source: ACS (American Community Survey)
Summary: Assumptions, Conclusions and Next Steps
Result of Gradient Boosting Model:
Change of electric peak load is driven by changes in residential population, worker population and temperature, in that order of significance.
Assumptions:
- The summer peak temperature variable is used for the model
Conclusions:
- #1 defining feature is residential population
- #2 defining feature is worker (commercial) population
- #3 defining feature is temperature
- Total 311 calls were a lower-ranked feature
- Income did not show as a dominant feature, but note that only 2013-2014 were analyzed
- Taxi data were not used due to the overwhelming effect of yellow taxis in Manhattan. However, taxi data are a good complement to population, since population records are not temporally granular
Next Steps:
- Particular classes of 311 calls will be used rather than the total number of calls
- Spatial variability of local temperature may be included
- Increasing the number of observations will be useful. This may be done spatially (e.g. increasing spatial granularity) or temporally (e.g. using monthly peak rather than yearly peak)
- Increasing the number of income observations will most likely show the effect of income differently; income data since 2010 will be used
- Green taxis will be used to model the outer borough networks only
- Taxi data will be used to increase the temporal granularity of population data
- Subway turnstile data will be used to enhance the temporal resolution of population movement
- Twitter data may be used (depending on full data availability)
- Real estate value may be used (TBD)
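One of the next steps is to increase the number of observations by using monthly rather than yearly peaks. A minimal sketch of that aggregation over an hourly load series follows. The helper name `peaks_by_period` and the sample readings are hypothetical and are not the report's data or code.

```python
from datetime import datetime, timedelta


def peaks_by_period(hourly_load, key):
    """Return the maximum load seen in each period.

    hourly_load: iterable of (timestamp, load_mw) pairs.
    key: function mapping a timestamp to a period label.
    """
    peaks = {}
    for timestamp, load_mw in hourly_load:
        period = key(timestamp)
        if period not in peaks or load_mw > peaks[period]:
            peaks[period] = load_mw
    return peaks


# Hypothetical hourly readings spanning a month boundary (illustration only).
start = datetime(2016, 6, 30, 22)
loads_mw = [300.0, 310.0, 295.0, 420.0, 415.0]
hourly_load = [(start + timedelta(hours=i), mw) for i, mw in enumerate(loads_mw)]

# One observation per year versus one observation per month.
yearly_peaks = peaks_by_period(hourly_load, key=lambda t: t.year)
monthly_peaks = peaks_by_period(hourly_load, key=lambda t: (t.year, t.month))

print(yearly_peaks)
print(monthly_peaks)
```

Over the nine years of data, grouping by month would yield up to twelve times as many peak observations per network as grouping by year, at the cost of mixing summer and non-summer peaks unless season is also modeled.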
Appendix
Gradient Boosting Regression
Gradient boosting is a machine learning technique for regression and classification problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. It builds the model in a stage-wise fashion like other boosting methods do, and it generalizes them by allowing optimization of an arbitrary differentiable loss function.
• score: Returns the coefficient of determination R^2 of the prediction.
• feature_importances_: Returns the feature importances (the higher, the more important the feature).
• train_score_: The i-th score train_score_[i] is the deviance (= loss) of the model at iteration i on the in-bag sample. (Shown in blue.)
• loss_: The concrete loss function object. (Shown in red.)
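The attribute names above match scikit-learn's `GradientBoostingRegressor`, so the sketch below assumes that library was used; the report does not state its model settings. The data are synthetic and generated only for illustration, the feature names are hypothetical labels for the four features, and the hyperparameters are arbitrary choices. The sketch omits `loss_`, which may not be available in newer scikit-learn releases.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data only: 200 made-up observations of four features.
rng = np.random.default_rng(0)
n_samples = 200
feature_names = [
    "residential_pop_change",  # hypothetical labels, not the report's columns
    "worker_pop_change",
    "peak_temperature",
    "calls_311_change",
]
X = rng.normal(size=(n_samples, len(feature_names)))

# By construction the target depends mostly on the first two columns,
# so the fitted importances should rank them highest.
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=n_samples)

# Hyperparameters are assumptions chosen for the example.
model = GradientBoostingRegressor(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=0
)
model.fit(X, y)

r2 = model.score(X, y)                    # coefficient of determination R^2
importances = model.feature_importances_  # normalized to sum to 1
train_loss = model.train_score_           # training loss after each boosting stage

print(f"R^2 on training data: {r2:.3f}")
for name, value in sorted(zip(feature_names, importances), key=lambda p: -p[1]):
    print(f"{name}: {value:.3f}")
print(f"loss at first stage: {train_loss[0]:.3f}, at last stage: {train_loss[-1]:.3f}")
```

Scoring on the training data, as done here, overstates predictive skill; a held-out set of networks or years would give a more honest R^2.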