Bottom-up Estimation and Top-down Prediction for Multi-level Models:
Solar Energy Prediction Combining Information from Multiple Sources
Jae-Kwang Kim
Department of Statistics, Iowa State University
Ross-Royall Symposium: Johns Hopkins University, Feb 26, 2016
Collaborators
- Youngdeok Hwang (IBM Research)
- Siyuan Lu (IBM Research)
Outline
- Introduction
- Modeling approach
- Application: Solar Energy Prediction
- Conclusion
Mountain Climbing for Problem Solving!
[Diagram nodes: Real Problem, Stat Problem, Math Problem; Math Solution, Stat Solution, Real Solution]
We need a map (abstraction) to move from problem to solution!
Real Problem: Solar Energy Prediction
- Solar electricity is now projected to supply 14% of the total demand of the contiguous U.S. by 2030, and 27% by 2050.
IBM Solar Forecasting
Figure: Sky camera for short-term forecasting (located at Watson)
- Research program funded by the U.S. Department of Energy's SunShot Initiative.
Monitoring Network
- Global Horizontal Irradiance (GHI): the total amount of shortwave radiation received from above by a horizontal surface.
- GHI measurements are collected every 15 minutes from 1,528 sensor units.
Weather Models
- GHI predictions are available from two widely used weather models: the North American Mesoscale Forecast System (NAM) and the Short-Range Ensemble Forecast (SREF).
- We want to combine the GHI measurements with the weather model outcomes to obtain the solar energy prediction.
Statistical Model: Basic setup
- The population is divided into $H$ exhaustive and non-overlapping groups, where group $h$ has $n_h$ units, for $h = 1, \ldots, H$.
- For group $h$, $n_h$ units are selected for measurement.
- From the $i$-th unit of group $h$, the measurements and their associated covariates, $(y_{hij}, \mathbf{x}_{hij})$, are available for $j = 1, \ldots, n_{hi}$.
Multi-level Model
- Consider the level one and level two models,
$$\mathbf{y}_{hi} \sim f_1(\mathbf{y}_{hi} \mid \mathbf{x}_{hi}; \theta_{hi}),$$
$$\theta_{hi} \sim f_2(\theta_{hi} \mid z_{hi}; \zeta_h).$$
- $\mathbf{y}_{hi} = (y_{hi1}, \ldots, y_{hi n_{hi}})^\top$: observations at unit $(hi)$.
- $\mathbf{x}_{hi} = (\mathbf{x}_{hi1}^\top, \ldots, \mathbf{x}_{hi n_{hi}}^\top)^\top$: covariates associated with unit $(hi)$ (= the two weather model outcomes).
- $z_{hi}$: unit-specific covariate.
- Note that $\theta_{hi}$ is a parameter in the level 1 model, but a random variable (latent variable) in the level 2 model.
- We can build a level 3 model on $\zeta_h$ if necessary:
$$\zeta_h \sim f_3(\zeta_h \mid q_h; \alpha).$$
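To make the two-level structure concrete, here is a minimal simulation sketch for a single group. The specific distributions are assumptions chosen for illustration and are not taken from the talk: a scalar $\theta_{hi}$ drawn from a normal $f_2$ with mean `beta` and standard deviation `tau`, a hypothetical scalar covariate `x`, and a linear normal $f_1$.

```python
import numpy as np

def simulate_two_level(n_units, n_obs, beta, tau, sigma, seed=0):
    """Simulate one group from an illustrative two-level model.

    Level 2 (role of f2): theta_i ~ N(beta, tau^2)
    Level 1 (role of f1): y_ij = x_ij * theta_i + N(0, sigma^2) noise
    All distributional choices here are assumptions for illustration.
    """
    rng = np.random.default_rng(seed)
    theta = rng.normal(beta, tau, size=n_units)        # latent unit-level parameters
    x = rng.uniform(0.5, 1.5, size=(n_units, n_obs))   # hypothetical scalar covariate
    noise = rng.normal(0.0, sigma, size=(n_units, n_obs))
    y = x * theta[:, None] + noise                     # level 1 observations
    return x, y, theta

x, y, theta = simulate_two_level(n_units=5, n_obs=20, beta=1.0, tau=0.3, sigma=0.1)
print(y.shape)  # (5, 20)
```

Each row of `y` corresponds to one unit $(hi)$; `theta` is what the analyst never observes directly.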
Data Structure Under Two-level Model
[Diagram: $\zeta_h$ generates $\theta_{h1}, \theta_{h2}, \theta_{h3}$ through $f_2$; each $\theta_{hi}$ generates the observations $y_{hi1}, \ldots, y_{hi n_i}$ through $f_1$.]
Why Multi-level Models?
1. To reflect reality: to allow for structural heterogeneity (= variety in big data) across areas.
2. To borrow strength: we need to predict at locations with no direct measurement.
Real Problems Become Statistical Problems!
1. Parameter estimation
2. Prediction
3. Uncertainty quantification
A Bayesian method using MCMC computation is a useful tool.
Classical Solutions Do Not Necessarily Work in Reality!
1. No single data file exists, as the data are stored in the cloud (Hadoop Distributed File System).
2. Micro-level data are not always available to the analyst for confidentiality and security reasons.
3. The classical solution, based on an MCMC algorithm, is time-consuming, and the computational cost can be huge for big data.
This is a typical big data problem.
New Solution: Divide-and-Conquer Approach
- Three steps for parameter estimation in each level:
  1. Summarization: Find a summary (= measurement) for the latent variable to obtain the sampling error model.
  2. Combine: Combine the sampling error model and the latent variable model.
  3. Learning: Estimate the parameters from the summary data.
- Apply the three steps in the level two model, then repeat them in the level three model.
Modeling Structure
[Diagram labels: Sensor, Storage, and Level 1 at each of Site 1, Site 2, Site 3; Level 2; individual data; Unit summary; Group Summary]
Summarization
- Find a measurement for $\theta_{hi}$.
- For each unit, treat $(\mathbf{x}_{hi}, \mathbf{y}_{hi})$ as a single data set to obtain the best estimator $\hat\theta_{hi}$ of $\theta_{hi}$, treating $\theta_{hi}$ as a fixed parameter.
- Obtain the sampling distribution of $\hat\theta_{hi}$ as a function of $\theta_{hi}$: $\hat\theta_{hi} \sim g_1(\hat\theta_{hi} \mid \theta_{hi})$.
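The summarization step can be sketched for one unit as follows. This sketch assumes a linear level 1 model with normal errors fitted by ordinary least squares, which is a simplification (the application later in the talk uses $t$-distributed errors). The function name `summarize_unit` is hypothetical.

```python
import numpy as np

def summarize_unit(X, y):
    """Return (theta_hat, V_hat) for one unit, treating theta as a fixed parameter.

    Assumes y = X @ theta + normal noise (an illustrative choice).
    X has shape (n, p); y has shape (n,).
    """
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    theta_hat = XtX_inv @ X.T @ y          # least-squares estimate
    resid = y - X @ theta_hat
    sigma2_hat = resid @ resid / (n - p)   # residual variance estimate
    V_hat = sigma2_hat * XtX_inv           # estimated variance of theta_hat
    return theta_hat, V_hat

# Toy unit: y = 1 + 2 * x exactly, so the fit recovers (1, 2).
X = np.column_stack([np.ones(4), np.arange(4.0)])
y = 1.0 + 2.0 * np.arange(4.0)
theta_hat, V_hat = summarize_unit(X, y)
print(theta_hat)
```

Only the pair `(theta_hat, V_hat)` needs to leave the storage node, which is the point of the summary: the unit-level micro data stay where they are.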
Summarization Step under Two-Level Model Structure
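[Diagram: $\zeta_h$ generates $\theta_{h1}, \theta_{h2}, \theta_{h3}$ through $f_2$; each $\theta_{hi}$ generates its summary $\hat\theta_{hi}$ through $g_1$.]
$g_1(\hat\theta_{hi} \mid \theta_{hi})$: sampling error model, $\hat\theta_{hi} \sim N(\theta_{hi}, V(\hat\theta_{hi}))$.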
Combining
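- The marginal distribution of $\hat\theta_{hi}$ is
$$m_2(\hat\theta_{hi} \mid z_{hi}; \zeta_h) = \int g_1(\hat\theta_{hi} \mid \theta_{hi}) f_2(\theta_{hi} \mid z_{hi}; \zeta_h)\, d\theta_{hi}, \tag{1}$$
which combines $g_1(\hat\theta_{hi} \mid \theta_{hi})$ and $f_2(\theta_{hi} \mid z_{hi}; \zeta_h)$ via the latent variable $\theta_{hi}$.
- Also, the prediction model for the latent variable $\theta_{hi}$ is obtained by Bayes' theorem:
$$p_2(\theta_{hi} \mid \hat\theta_{hi}; \zeta_h) = \frac{g_1(\hat\theta_{hi} \mid \theta_{hi}) f_2(\theta_{hi} \mid z_{hi}; \zeta_h)}{\int g_1(\hat\theta_{hi} \mid \theta_{hi}) f_2(\theta_{hi} \mid z_{hi}; \zeta_h)\, d\theta_{hi}}. \tag{2}$$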
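As a worked special case, both integrals are available in closed form when $g_1$ and $f_2$ are normal. The sketch below assumes a scalar $\theta_{hi}$, a known sampling variance $V_{hi}$, no covariate $z_{hi}$, and $\zeta_h = (\beta_h, \tau_h^2)$. The symbols $\tau_h^2$ and $\kappa_{hi}$ are introduced here for illustration only.

```latex
% Illustrative assumption: scalar theta_{hi}, known V_{hi}, zeta_h = (beta_h, tau_h^2).
\begin{align*}
g_1:&\quad \hat\theta_{hi} \mid \theta_{hi} \sim N(\theta_{hi},\, V_{hi}), \qquad
f_2:\quad \theta_{hi} \sim N(\beta_h,\, \tau_h^2) \\
m_2:&\quad \hat\theta_{hi} \sim N(\beta_h,\; V_{hi} + \tau_h^2) \\
p_2:&\quad \theta_{hi} \mid \hat\theta_{hi} \sim
  N\!\big(\beta_h + \kappa_{hi}(\hat\theta_{hi} - \beta_h),\; \kappa_{hi} V_{hi}\big),
  \qquad \kappa_{hi} = \frac{\tau_h^2}{\tau_h^2 + V_{hi}}
\end{align*}
```

The prediction shrinks $\hat\theta_{hi}$ toward the group mean $\beta_h$, and the shrinkage is stronger when the sampling variance $V_{hi}$ is large relative to $\tau_h^2$.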
Combining Step
[Diagram: nodes $\theta_{hi}$, $\hat\theta_{hi}$, and $\zeta_h$, connected by $g_1$, $f_2$, $m_2$, and $p_2$.]
Sampling error model ($g_1$) + Latent variable model ($f_2$) ⇒ Marginal model ($m_2$), Prediction model ($p_2$)
Learning
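- The level two model can be learned by the EM algorithm: at the $t$-th iteration, we update $\zeta_h$ by solving
$$\hat\zeta_h^{(t+1)} \leftarrow \arg\max_{\zeta_h} \sum_{i=1}^{n_h} E_{p_2}\left\{ \log f_2(\theta_{hi} \mid z_{hi}; \zeta_h) \,\Big|\, \hat\theta_{hi}; \hat\zeta_h^{(t)} \right\},$$
where the conditional expectation is taken with respect to the prediction model $p_2$ in (2) evaluated at $\hat\zeta_h^{(t)}$, and $\hat\zeta_h^{(t)}$ denotes the estimate of $\zeta_h$ at the $t$-th iteration of the EM algorithm.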
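The EM update can be sketched for one group under simplifying assumptions that are not from the talk: a scalar $\theta_{hi}$, a normal sampling error model $\hat\theta_{hi} \mid \theta_{hi} \sim N(\theta_{hi}, V_{hi})$ with known $V_{hi}$, and a normal level two model $\theta_{hi} \sim N(\beta, \tau^2)$ with no covariate. Under these assumptions $p_2$ is normal, so both steps have closed forms. The function name `em_level2` is hypothetical.

```python
import numpy as np

def em_level2(theta_hat, V, n_iter=200):
    """EM estimates of (beta, tau2) from unit summaries only.

    Assumed model (illustrative): theta_hat_i | theta_i ~ N(theta_i, V_i),
    theta_i ~ N(beta, tau2).
    """
    theta_hat = np.asarray(theta_hat, dtype=float)
    V = np.asarray(V, dtype=float)
    beta, tau2 = theta_hat.mean(), theta_hat.var() + 1e-8   # starting values
    for _ in range(n_iter):
        # E-step: mean and variance of theta_i under the normal prediction model p2
        kappa = tau2 / (tau2 + V)
        post_mean = beta + kappa * (theta_hat - beta)
        post_var = kappa * V
        # M-step: maximize the expected log f2 over (beta, tau2)
        beta = post_mean.mean()
        tau2 = np.mean((post_mean - beta) ** 2 + post_var)
    return beta, tau2

# Simulated summaries: true beta = 2, tau = 0.5, common sampling variance 0.04.
rng = np.random.default_rng(1)
theta = rng.normal(2.0, 0.5, size=2000)
V = np.full(2000, 0.04)
theta_hat = theta + rng.normal(0.0, np.sqrt(V))
beta_est, tau2_est = em_level2(theta_hat, V)
print(beta_est, tau2_est)
```

With a common sampling variance, the fixed point of this iteration is the sample mean for `beta` and the sample variance of the summaries minus `V` for `tau2`, which gives a quick sanity check on the output.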
Learning Using EM Algorithm
[Diagram: nodes $\hat\theta_{hi}$, $\theta_{hi}$, $z_{hi}$, and $\zeta_h$, with arrows labeled E-step and M-step.]
Bayesian Interpretation
- Prediction model (2) can be written as
$$p_2(\theta_{hi} \mid \hat\theta_{hi}; \zeta_h) \propto g_1(\hat\theta_{hi} \mid \theta_{hi}) f_2(\theta_{hi} \mid z_{hi}; \zeta_h).$$
- Here, $f_2(\theta_{hi} \mid z_{hi}; \zeta_h)$ can be treated as a prior distribution, and $p_2(\theta_{hi} \mid \hat\theta_{hi}; \zeta_h)$ is the posterior distribution that incorporates the observation of $\hat\theta_{hi}$.
- Using $g_1(\hat\theta_{hi} \mid \theta_{hi})$ instead of the full likelihood simplifies the computation (Approximate Bayesian Computation).
Extension to Three Level Model
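Model | Measurement (data summary) | Parameter | Latent variable
Level 1 | $\mathbf{y}_{hi} = (y_{hi1}, \cdots, y_{hi n_{hi}})$ | $\theta_{hi}$ | (none)
Level 2 | $\hat\theta_h = (\hat\theta_{h1}, \cdots, \hat\theta_{h n_h})$ | $\zeta_h$ | $\theta_h = (\theta_{h1}, \cdots, \theta_{h n_h})$
Level 3 | $\hat\zeta = (\hat\zeta_1, \cdots, \hat\zeta_H)$ | $\alpha$ | $\zeta = (\zeta_1, \cdots, \zeta_H)$
We can apply the same three steps to the level three model.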
Bottom-up Estimation
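- Level 3. Latent variable model: $f_3(\zeta_h \mid q_h; \alpha)$. Sampling error model: $\hat\zeta_h \sim g_2(\hat\zeta_h \mid \zeta_h)$. Parameter estimation:
$$\hat\alpha = \arg\max_{\alpha} \sum_{h=1}^{H} \log \int g_2(\hat\zeta_h \mid \zeta_h) f_3(\zeta_h \mid q_h; \alpha)\, d\zeta_h$$
- Level 2. Latent variable model: $f_2(\theta_{hi} \mid z_{hi}; \zeta_h)$. Sampling error model: $\hat\theta_{hi} \sim g_1(\hat\theta_{hi} \mid \theta_{hi})$. Parameter estimation:
$$\hat\zeta_h = \arg\max_{\zeta_h} \sum_{i=1}^{n_h} \log \int g_1(\hat\theta_{hi} \mid \theta_{hi}) f_2(\theta_{hi} \mid z_{hi}; \zeta_h)\, d\theta_{hi}$$
- Level 1. Model: $f_1(y_{hij} \mid \mathbf{x}_{hij}; \theta_{hi})$. Parameter estimation:
$$\hat\theta_{hi} = \arg\max_{\theta_{hi}} \sum_{j=1}^{n_{hi}} \log f_1(y_{hij} \mid \mathbf{x}_{hij}; \theta_{hi})$$
Figure: An illustration of the bottom-up approach to parameter estimation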
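Prediction
- Our goal is to predict unobserved $y_{hij}$ values from the above models using the parameter estimates.
- The best prediction for $y_{hij}$ is
$$y^*_{hij} = E_{p_3}\left[ E_{p_2}\left\{ E_{f_1}(y_{hij} \mid \mathbf{x}_{hij}, \theta_{hi}) \mid \hat\theta_{hi}; \zeta_h \right\} \mid \hat\zeta_h; \hat\alpha \right],$$
where
$$p_3(\zeta_h \mid \hat\zeta_h; \hat\alpha) = \frac{g_2(\hat\zeta_h \mid \zeta_h) f_3(\zeta_h \mid q_h; \hat\alpha)}{\int g_2(\hat\zeta_h \mid \zeta_h) f_3(\zeta_h \mid q_h; \hat\alpha)\, d\zeta_h}$$
and
$$p_2(\theta_{hi} \mid \hat\theta_{hi}; \zeta_h) = \frac{g_1(\hat\theta_{hi} \mid \theta_{hi}) f_2(\theta_{hi} \mid z_{hi}; \zeta_h)}{\int g_1(\hat\theta_{hi} \mid \theta_{hi}) f_2(\theta_{hi} \mid z_{hi}; \zeta_h)\, d\theta_{hi}}.$$
- The prediction is made in a top-down manner.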
Prediction: Top-down Prediction
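[Diagram: $\hat\alpha$ generates $\zeta^*_1, \zeta^*_2, \zeta^*_3$ through $p_3$; each $\zeta^*_h$ generates $\theta^*_{hi}$ through $p_2$.]
Predict $y_{hij}$ using $f_1(y_{hij} \mid \mathbf{x}_{hij}; \theta^*_{hi})$.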
Prediction: Top-down Prediction
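Level | Latent | Prediction model | Best prediction
3 | $\zeta_h$ | $p_3(\zeta_h \mid \hat\zeta_h; \hat\alpha)$ | $\zeta^*_h \sim p_3(\zeta_h \mid \hat\zeta_h; \hat\alpha)$
2 | $\theta_{hi}$ | $p_2(\theta_{hi} \mid \hat\theta_{hi}; \zeta_h)$ | $\theta^*_{hi} \sim p_2(\theta_{hi} \mid \hat\theta_{hi}; \zeta^*_h)$
1 | $y_{hij}$ | $f_1(y_{hij} \mid \mathbf{x}_{hij}; \theta_{hi})$ | $y^*_{hij} \sim f_1(y_{hij} \mid \mathbf{x}_{hij}, \theta^*_{hi})$
Figure: Top-down approach to prediction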
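The top-down scheme can be sketched by Monte Carlo under scalar normal assumptions that are not from the talk. Here $\zeta_h$ is reduced to a group mean $\beta_h \sim N(\mu, \omega^2)$ with $\hat\beta_h \mid \beta_h \sim N(\beta_h, W)$, the level two model is $\theta_{hi} \sim N(\beta_h, \tau^2)$ with $\hat\theta_{hi} \mid \theta_{hi} \sim N(\theta_{hi}, V)$, and the level one mean is $x\,\theta_{hi}$. All names (`omega2`, `tau2`, `W`, `top_down_predict`) are hypothetical.

```python
import numpy as np

def normal_posterior(est, var_est, prior_mean, prior_var):
    """Mean and variance of a latent value given est ~ N(latent, var_est)
    and a N(prior_mean, prior_var) prior."""
    kappa = prior_var / (prior_var + var_est)
    return prior_mean + kappa * (est - prior_mean), kappa * var_est

def top_down_predict(x_new, theta_hat, V, beta_hat, W, mu_hat, omega2, tau2,
                     n_draws=5000, seed=0):
    """Monte Carlo version of the nested expectation, from level 3 down to level 1."""
    rng = np.random.default_rng(seed)
    # Level 3: draw beta* from p3(beta | beta_hat; mu_hat, omega2)
    m3, v3 = normal_posterior(beta_hat, W, mu_hat, omega2)
    beta_star = rng.normal(m3, np.sqrt(v3), size=n_draws)
    # Level 2: draw theta* from p2(theta | theta_hat; beta*), one per beta* draw
    m2, v2 = normal_posterior(theta_hat, V, beta_star, tau2)
    theta_star = rng.normal(m2, np.sqrt(v2))
    # Level 1: average the conditional mean x * theta* over the draws
    return np.mean(x_new * theta_star)

pred = top_down_predict(x_new=2.0, theta_hat=1.2, V=0.04, beta_hat=1.0, W=0.01,
                        mu_hat=0.8, omega2=0.09, tau2=0.04)
print(pred)  # close to the analytic value 2.18, up to Monte Carlo error
```

For these inputs the nested expectation can be checked by hand: the level 3 posterior mean is 0.98, the level 2 shrinkage weight is 0.5, so the expected $\theta^*$ is 1.09 and the prediction is about 2.18.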
Case study: Application to Solar Energy Prediction
- We use 15 days of data (12/01/2014 to 12/15/2014) for the analysis.
- The states are organized into 12 groups.
- The number of sites in each group, $m_h$, varies between 37 and 321.
Grouping Scheme
- Pooling data from nearby sites.
- Can incorporate complex structure such as distribution zones.
Application: Site Level
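- First assume that
$$y_{hij} = \mathbf{x}_{hij}^\top \theta_{hi} + \varepsilon_{hij}, \qquad \varepsilon_{hij} \sim t(0, \sigma^2_{hi}, \nu_{hi}),$$
where $\sigma^2_{hi}$ is the scale parameter and $\nu_{hi}$ is the degrees of freedom, and
$$\hat\theta_{hi} \mid \theta_{hi} \sim N(\theta_{hi}, V_{hi}),$$
where $V_{hi} = V(\hat\theta_{hi})$.
- The degrees of freedom are assumed to be unknown and are estimated by the method of Lange et al. (1989).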
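A minimal sketch of fitting the site-level regression with $t$ errors is given below. It uses the standard EM (iteratively reweighted least squares) updates for a $t$ regression, but with the degrees of freedom `nu` held fixed, which is a simplification: the talk estimates $\nu_{hi}$ as well. The function name `t_regression_em` and the toy data are hypothetical.

```python
import numpy as np

def t_regression_em(X, y, nu=4.0, n_iter=100):
    """Fit y = X @ theta + t-distributed error with fixed degrees of freedom nu.

    Returns (theta, sigma2), where sigma2 is the squared scale parameter.
    Fixing nu is an assumption made to keep the sketch short.
    """
    theta = np.linalg.lstsq(X, y, rcond=None)[0]   # start from least squares
    sigma2 = np.mean((y - X @ theta) ** 2)
    for _ in range(n_iter):
        r = y - X @ theta
        w = (nu + 1.0) / (nu + r ** 2 / sigma2)    # E-step: downweight large residuals
        Xw = X * w[:, None]
        theta = np.linalg.solve(X.T @ Xw, Xw.T @ y)   # M-step: weighted least squares
        sigma2 = np.mean(w * (y - X @ theta) ** 2)    # M-step: scale update
    return theta, sigma2

# Toy data: y = 1 + 2 x with small noise and one gross outlier.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.05, size=50)
y[10] += 10.0
theta_t, sigma2_t = t_regression_em(X, y)
print(theta_t)
```

The outlier receives a weight close to zero, so the fitted coefficients stay near the values used to generate the bulk of the data, which is the motivation for heavy-tailed errors with sensor measurements.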
Three Level Model
- Assume the level 2 model
$$\theta_{hi} \sim N(\beta_h, \Sigma_h),$$
with $\zeta_h = (\beta_h, \Sigma_h)$.
- Similarly, the level 3 model is
$$\zeta_h \sim N(\mu, \Sigma),$$
with $\alpha = (\mu, \Sigma)$.
Comparison
- We compared the performance of the multi-level approach with three other modeling methods:
  - Site-by-site model: fit a different model for each individual site
  - Group-by-group model: fit a different model for each group
  - One global model: fit a single common model for all sites using the aggregate data
- To evaluate the prediction accuracy, we randomly selected 70% of the data to fit the model and tested on the remaining 30%.
MSPE Comparison
- We compare the accuracy by Mean Squared Prediction Error (MSPE), $N_T^{-1} \sum (y_{hij} - \hat y_{hij})^2$, where the $\hat y_{hij}$ are obtained from the four different methods and $N_T$ is the size of the test data set.
Measure | Multi-level | Site model | Group model | Global model
MSPE | 0.297 | 0.298 | 0.406 | 0.383
SD | 0.601 | 0.609 | 0.803 | 0.791
Table: Accuracy comparison of the different modeling methods
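The evaluation protocol (a random 70/30 split followed by MSPE on the held-out part) can be sketched as below. The helper names `mspe` and `train_test_split_indices` are hypothetical, and the numbers in the usage lines are toy values, not the talk's data.

```python
import numpy as np

def mspe(y_true, y_pred):
    """Mean squared prediction error over the test set."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

def train_test_split_indices(n, train_frac=0.7, seed=0):
    """Randomly split the indices 0..n-1 into training and test parts."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_train = int(round(train_frac * n))
    return perm[:n_train], perm[n_train:]

train_idx, test_idx = train_test_split_indices(10)
print(len(train_idx), len(test_idx))            # 7 3
print(mspe([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))   # 4/3, about 1.333
```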
Comparison in Detail ($n_{hi} \le 100$ vs. $> 100$)
[Figure: mean squared error by sample size (<100 vs. >100) for the four methods: Multilevel, Site Model, Group Model, Global Model.]
Discussion
- Motivated by a real problem: a solar energy forecasting system has been developed.
- We used a multi-level model approach to address the practical issues.
- There are more issues to be investigated:
  - Spatial modeling
  - Estimation of group structure
  - Preferential sampling of sites
  - ...
- The proposed method is promising for handling big data.