
Bottom-up Estimation and Top-down Prediction for Multi-level Models:

Solar Energy Prediction Combining Information from Multiple Sources

Jae-Kwang Kim

Department of Statistics, Iowa State University

Ross-Royall Symposium: Johns Hopkins University, Feb 26, 2016

Collaborators

- Youngdeok Hwang (IBM Research)
- Siyuan Lu (IBM Research)

Outline

- Introduction
- Modeling approach
- Application: Solar Energy Prediction
- Conclusion

Mountain Climbing for Problem Solving!

[Figure: a climbing path that goes up from Real Problem through Stat Problem to Math Problem, then down through Math Solution and Stat Solution to Real Solution.]

We need a map (abstraction) to move from problem to solution!

Real Problem: Solar Energy Prediction

- Solar electricity is now projected to supply 14% of the total electricity demand of the contiguous U.S. by 2030, and 27% by 2050.

IBM Solar Forecasting

Figure: Sky camera for short-term forecasting (located at Watson)

- Research program funded by the U.S. Department of Energy's SunShot Initiative.

Monitoring Network

- Global Horizontal Irradiance (GHI): the total amount of shortwave radiation received from above by a horizontal surface.
- GHI measurements are collected every 15 minutes from 1,528 sensor units.

Weather Models

- GHI predictions are available from two widely used weather models: the North American Mesoscale Forecast System (NAM) and the Short-Range Ensemble Forecast (SREF).
- We want to combine the GHI measurements with the weather model outcomes to obtain the solar energy prediction.

Statistical Model: Basic setup

- The population is divided into $H$ exhaustive and non-overlapping groups, where group $h$ has $n_h$ units, for $h = 1, \ldots, H$.
- For group $h$, $n_h$ units are selected for measurement.
- From the $i$-th unit of group $h$, the measurements and their associated covariates, $(y_{hij}, x_{hij})$, are available for $j = 1, \ldots, n_{hi}$.

Multi-level Model

- Consider the level one and level two models,

$$y_{hi} \sim f_1(y_{hi} \mid x_{hi}; \theta_{hi}),$$

$$\theta_{hi} \sim f_2(\theta_{hi} \mid z_{hi}; \zeta_h).$$

- $y_{hi} = (y_{hi1}, \ldots, y_{hin_{hi}})^\top$: observations at unit $(hi)$.
- $x_{hi} = (x_{hi1}^\top, \ldots, x_{hin_{hi}}^\top)^\top$: covariates associated with unit $(hi)$ (= two weather model outcomes).
- $z_{hi}$: unit-specific covariate.
- Note that $\theta_{hi}$ is a parameter in the level 1 model, but a random variable (latent variable) in the level 2 model.
- We can build a level 3 model on $\zeta_h$ if necessary:

$$\zeta_h \sim f_3(\zeta_h \mid q_h; \alpha).$$
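To make the hierarchy concrete, here is a minimal simulation sketch of the three levels. It is not the model fitted in the talk: it assumes scalar parameters, Gaussian distributions at every level, a single covariate, and no $z_{hi}$ or $q_h$; the function name and all default values are hypothetical.

```python
import random

def simulate_three_level(H=3, units_per_group=4, obs_per_unit=50,
                         mu=1.0, tau=0.5, sigma_theta=0.2, sigma_y=0.1, seed=0):
    """Draw data top-down: alpha=(mu, tau) -> zeta_h -> theta_hi -> y_hij.

    Assumption (not from the slides): every level is Gaussian and scalar.
    """
    rng = random.Random(seed)
    units = []
    for h in range(H):
        zeta_h = rng.gauss(mu, tau)                      # level 3: f3(zeta_h; alpha)
        for i in range(units_per_group):
            theta_hi = rng.gauss(zeta_h, sigma_theta)    # level 2: f2(theta_hi; zeta_h)
            x = [rng.uniform(0.0, 1.0) for _ in range(obs_per_unit)]
            # level 1: f1(y_hij | x_hij; theta_hi), a line through the origin plus noise
            y = [theta_hi * xj + rng.gauss(0.0, sigma_y) for xj in x]
            units.append({"group": h, "unit": i, "zeta": zeta_h,
                          "theta": theta_hi, "x": x, "y": y})
    return units

units = simulate_three_level()
print(len(units), len(units[0]["y"]))  # 12 units, 50 observations each
```

In this sketch `theta` is a fixed number when generating `y` but a random draw when viewed from the group level, which is the dual role of $\theta_{hi}$ noted above.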


Data Structure Under Two-level Model

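[Diagram: the group parameter $\zeta_h$ generates the unit parameters $\theta_{h1}, \theta_{h2}, \theta_{h3}$ through $f_2$; each $\theta_{hi}$ generates the observations $y_{hi1}, \ldots, y_{hin_i}$ through $f_1$.]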

Why Multi-level Models?

1. To reflect reality: to allow for structural heterogeneity (= variety in big data) across areas.
2. To borrow strength: we need to predict at locations with no direct measurement.

Real Problems Become Statistical Problems!

1. Parameter estimation
2. Prediction
3. Uncertainty quantification

The Bayesian method with MCMC computation is a useful tool.

Classical Solutions Do Not Necessarily Work in Reality!

1. No single data file exists, as the data are stored in the cloud (Hadoop Distributed File System).
2. Micro-level data are not always available to the analyst, for confidentiality and security reasons.
3. The classical solution, based on an MCMC algorithm, is time consuming, and the computational cost can be huge for big data.

This is a typical big data problem.

New Solution: Divide-and-Conquer Approach

- Three steps for parameter estimation in each level:
  1. Summarization: Find a summary (= measurement) for the latent variable to obtain the sampling error model.
  2. Combine: Combine the sampling error model and the latent variable model.
  3. Learning: Estimate the parameters from the summary data.
- Apply the three steps in the level two model and then repeat them in the level three model.

Modeling Structure

[Diagram: each site (Site 1, Site 2, Site 3) has a sensor and its own storage. The individual data at each site are reduced at Level 1 to a unit summary, and the unit summaries are combined at Level 2 into a group summary.]

Summarization

- Find a measurement for $\theta_{hi}$.
- For each unit, treat $(x_{hi}, y_{hi})$ as a single data set to obtain the best estimator $\hat\theta_{hi}$ of $\theta_{hi}$, treating $\theta_{hi}$ as a fixed parameter.
- Obtain the sampling distribution of $\hat\theta_{hi}$ as a function of $\theta_{hi}$: $\hat\theta_{hi} \sim g_1(\hat\theta_{hi} \mid \theta_{hi})$.
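A minimal sketch of the summarization step for one unit follows. It assumes a scalar $\theta_{hi}$, a single covariate, and Gaussian least squares through the origin, which is a simplification and not the estimator used in the talk; `summarize_unit` is a hypothetical helper name. Only the pair $(\hat\theta_{hi}, \hat V_{hi})$ needs to leave the unit's storage.

```python
def summarize_unit(x, y):
    """Return (theta_hat, v_hat) for the model y_j = theta * x_j + error.

    theta_hat is the least-squares slope through the origin and v_hat is
    its estimated sampling variance, so that approximately
    theta_hat ~ N(theta, v_hat). Gaussian errors are an assumption here.
    """
    sxx = sum(xj * xj for xj in x)
    sxy = sum(xj * yj for xj, yj in zip(x, y))
    theta_hat = sxy / sxx
    rss = sum((yj - theta_hat * xj) ** 2 for xj, yj in zip(x, y))
    sigma2_hat = rss / (len(x) - 1)   # one parameter was estimated
    v_hat = sigma2_hat / sxx
    return theta_hat, v_hat

# Toy data for one unit (made up for illustration)
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]
theta_hat, v_hat = summarize_unit(x, y)
print(theta_hat, v_hat)
```

Because each unit is summarized independently, this step can run where the data are stored, and the analyst never needs the micro-level records.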


Summarization Step under Two-Level Model Structure

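[Diagram: $\zeta_h$ generates $\theta_{h1}, \theta_{h2}, \theta_{h3}$ through $f_2$; each estimate $\hat\theta_{hi}$ is linked to its $\theta_{hi}$ through $g_1$.]

$g_1(\hat\theta_{hi} \mid \theta_{hi})$: Sampling error model, $\hat\theta_{hi} \sim N(\theta_{hi}, \hat V(\hat\theta_{hi}))$.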

Combining

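- The marginal distribution of $\hat\theta_{hi}$ is

$$m_2(\hat\theta_{hi} \mid z_{hi}; \zeta_h) = \int g_1(\hat\theta_{hi} \mid \theta_{hi}) f_2(\theta_{hi} \mid z_{hi}; \zeta_h)\, d\theta_{hi}, \tag{1}$$

  which combines $g_1(\hat\theta_{hi} \mid \theta_{hi})$ and $f_2(\theta_{hi} \mid z_{hi}; \zeta_h)$ via the latent variable $\theta_{hi}$.

- Also, the prediction model for the latent variable $\theta_{hi}$ is obtained by using Bayes' theorem:

$$p_2(\theta_{hi} \mid \hat\theta_{hi}; \zeta_h) = \frac{g_1(\hat\theta_{hi} \mid \theta_{hi}) f_2(\theta_{hi} \mid z_{hi}; \zeta_h)}{\int g_1(\hat\theta_{hi} \mid \theta_{hi}) f_2(\theta_{hi} \mid z_{hi}; \zeta_h)\, d\theta_{hi}}. \tag{2}$$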
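When both $g_1$ and $f_2$ are normal, the integrals in (1) and (2) have closed forms. The sketch below assumes a scalar case with $\hat\theta \mid \theta \sim N(\theta, v)$ and $\theta \sim N(\beta, s^2)$, with no covariate $z_{hi}$; the function names are hypothetical. Under these assumptions the marginal model is $N(\beta, s^2 + v)$ and the prediction model is a normal distribution that shrinks $\hat\theta$ toward $\beta$.

```python
import math

def marginal_density(theta_hat, v, beta, s2):
    """m2: density of theta_hat under N(beta, s2 + v)."""
    var = s2 + v
    return math.exp(-0.5 * (theta_hat - beta) ** 2 / var) / math.sqrt(2 * math.pi * var)

def prediction_model(theta_hat, v, beta, s2):
    """p2: mean and variance of theta given theta_hat.

    k is the weight placed on the direct estimate; it approaches 1 when
    the sampling variance v is small relative to the level-two variance s2.
    """
    k = s2 / (s2 + v)
    post_mean = beta + k * (theta_hat - beta)
    post_var = (1.0 - k) * s2
    return post_mean, post_var

mean, var = prediction_model(theta_hat=2.0, v=1.0, beta=0.0, s2=1.0)
print(mean, var)  # equal variances give equal weights: mean 1.0, variance 0.5
```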


Combining Step

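[Diagram: $g_1$ links $\theta_{hi}$ to $\hat\theta_{hi}$, and $f_2$ links $\zeta_h$ to $\theta_{hi}$; combining the two gives $m_2$ and $p_2$.]

Sampling error model ($g_1$) + Latent variable model ($f_2$) ⇒ Marginal model ($m_2$), Prediction model ($p_2$)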

Learning

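- The level two model can be learned by the EM algorithm: at the $t$-th iteration, we update $\zeta_h$ by solving

$$\hat\zeta_h^{(t+1)} \leftarrow \arg\max_{\zeta_h} \sum_{i=1}^{n_h} E_{p_2}\left\{ \log f_2(\theta_{hi} \mid z_{hi}; \zeta_h) \,\middle|\, \hat\theta_{hi}; \hat\zeta_h^{(t)} \right\},$$

  where the conditional expectation is taken with respect to the prediction model $p_2$ in (2) evaluated at $\hat\zeta_h^{(t)}$, and $\hat\zeta_h^{(t)}$ denotes the value of the estimate at the $t$-th iteration of the EM algorithm.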
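As an illustration of this update, the sketch below runs EM for one group under simplifying assumptions that are not from the slides: $\theta_{hi}$ is scalar, $f_2$ is $N(\beta_h, s_h^2)$ with no covariate $z_{hi}$, and $g_1$ is $N(\theta_{hi}, v_{hi})$ with known $v_{hi}$. In that case the E-step is the normal shrinkage formula and the M-step has a closed form. The name `em_level_two` is hypothetical.

```python
def em_level_two(theta_hats, vs, n_iter=200):
    """Estimate zeta_h = (beta, s2) from unit summaries (theta_hat_i, v_i)."""
    n = len(theta_hats)
    beta = sum(theta_hats) / n
    s2 = max(sum((t - beta) ** 2 for t in theta_hats) / n, 1e-6)
    for _ in range(n_iter):
        # E-step: mean and variance of theta_i under p2 at the current (beta, s2)
        post = []
        for t, v in zip(theta_hats, vs):
            k = s2 / (s2 + v)
            post.append((beta + k * (t - beta), (1.0 - k) * s2))
        # M-step: maximize the expected log f2, which for a normal f2
        # means matching the expected first and second moments
        beta = sum(m for m, _ in post) / n
        s2 = sum((m - beta) ** 2 + c for m, c in post) / n
    return beta, s2

beta_hat, s2_hat = em_level_two([1.0, 2.0, 3.0, 4.0], [0.1, 0.1, 0.1, 0.1])
print(beta_hat, s2_hat)
```

With equal sampling variances, the fixed point can be checked by hand: the marginal variance $s^2 + v$ must equal the sample variance of the $\hat\theta_{hi}$ (1.25 here, dividing by $n$), so $s^2$ should settle near 1.15.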


Learning Using EM Algorithm

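[Diagram: EM algorithm. The E-step predicts the latent $\theta_{hi}$ from $\hat\theta_{hi}$ and $z_{hi}$ at the current $\zeta_h$; the M-step updates $\zeta_h$.]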

Bayesian Interpretation

- Prediction model (2) can be written as

$$p_2(\theta_{hi} \mid \hat\theta_{hi}; \zeta_h) \propto g_1(\hat\theta_{hi} \mid \theta_{hi}) f_2(\theta_{hi} \mid z_{hi}; \zeta_h).$$

- Here, $f_2(\theta_{hi} \mid z_{hi}; \zeta_h)$ can be treated as a prior distribution, and $p_2(\theta_{hi} \mid \hat\theta_{hi}; \zeta_h)$ is a posterior distribution that incorporates the observation of $\hat\theta_{hi}$.
- Use of $g_1(\hat\theta_{hi} \mid \theta_{hi})$ instead of the full likelihood simplifies the computation (Approximate Bayesian Computation).

Extension to Three Level Model

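| Model | Measurement (data summary) | Parameter | Latent variable |
|---|---|---|---|
| Level 1 | $y_{hi} = (y_{hi1}, \ldots, y_{hin_{hi}})$ | $\theta_{hi}$ | |
| Level 2 | $\hat\theta_h = (\hat\theta_{h1}, \ldots, \hat\theta_{hn_h})$ | $\zeta_h$ | $\theta_h = (\theta_{h1}, \ldots, \theta_{hn_h})$ |
| Level 3 | $\hat\zeta = (\hat\zeta_1, \ldots, \hat\zeta_H)$ | $\alpha$ | $\zeta = (\zeta_1, \ldots, \zeta_H)$ |

We can apply the same three steps to the level three model.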

Bottom-up Estimation

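| Level | Latent variable model | Sampling error model | Parameter estimation |
|---|---|---|---|
| 3 | $f_3(\zeta_h \mid q_h; \alpha)$ | $\hat\zeta_h \sim g_2(\hat\zeta_h \mid \zeta_h)$ | $\hat\alpha = \arg\max_{\alpha} \sum_{h=1}^{H} \log \int g_2(\hat\zeta_h \mid \zeta_h) f_3(\zeta_h \mid q_h; \alpha)\, d\zeta_h$ |
| 2 | $f_2(\theta_{hi} \mid z_{hi}; \zeta_h)$ | $\hat\theta_{hi} \sim g_1(\hat\theta_{hi} \mid \theta_{hi})$ | $\hat\zeta_h = \arg\max_{\zeta_h} \sum_{i=1}^{n_h} \log \int g_1(\hat\theta_{hi} \mid \theta_{hi}) f_2(\theta_{hi} \mid z_{hi}; \zeta_h)\, d\theta_{hi}$ |
| 1 | $f_1(y_{hij} \mid x_{hij}; \theta_{hi})$ | | $\hat\theta_{hi} = \arg\max_{\theta_{hi}} \sum_{j=1}^{n_{hi}} \log f_1(y_{hij} \mid x_{hij}; \theta_{hi})$ |

Figure: An illustration of the bottom-up approach to parameter estimation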

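Prediction

- Our goal is to predict unobserved $y_{hij}$ values from the above models using the parameter estimates.
- The best prediction for $y_{hij}$ is

$$y^*_{hij} = E_{p_3}\left[ E_{p_2}\left\{ E_{f_1}(y_{hij} \mid x_{hij}, \theta_{hi}) \,\middle|\, \hat\theta_{hi}; \zeta_h \right\} \,\middle|\, \hat\zeta_h; \hat\alpha \right],$$

  where

$$p_3(\zeta_h \mid \hat\zeta_h, \hat\alpha) = \frac{g_2(\hat\zeta_h \mid \zeta_h) f_3(\zeta_h \mid q_h; \hat\alpha)}{\int g_2(\hat\zeta_h \mid \zeta_h) f_3(\zeta_h \mid q_h; \hat\alpha)\, d\zeta_h}$$

  and

$$p_2(\theta_{hi} \mid \hat\theta_{hi}, \zeta_h) = \frac{g_1(\hat\theta_{hi} \mid \theta_{hi}) f_2(\theta_{hi} \mid z_{hi}; \zeta_h)}{\int g_1(\hat\theta_{hi} \mid \theta_{hi}) f_2(\theta_{hi} \mid z_{hi}; \zeta_h)\, d\theta_{hi}}.$$

- The prediction is made in a top-down manner.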
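The nested expectation can be approximated by drawing from the top level down. The sketch below is a Monte Carlo illustration under assumptions that are not from the slides: all quantities are scalar, every distribution is normal, $\zeta_h$ is only the level-two mean (the level-two variance `s2` is treated as known), and $E_{f_1}(y \mid x, \theta) = x\theta$. The names `shrink` and `predict_top_down` are hypothetical.

```python
import random

def shrink(estimate, est_var, prior_mean, prior_var):
    """Normal posterior (mean, variance) given a noisy estimate and a normal prior."""
    k = prior_var / (prior_var + est_var)
    return prior_mean + k * (estimate - prior_mean), (1.0 - k) * prior_var

def predict_top_down(x_new, theta_hat, v, zeta_hat, w, mu, tau2, s2,
                     n_draws=2000, seed=0):
    """Approximate y* by averaging x_new * theta* over top-down draws.

    If theta_hat is None, the site has no direct measurement and theta*
    is drawn from f2 given zeta* (borrowing strength from the group).
    """
    rng = random.Random(seed)
    m3, c3 = shrink(zeta_hat, w, mu, tau2)              # p3(zeta | zeta_hat; alpha_hat)
    total = 0.0
    for _ in range(n_draws):
        zeta_star = rng.gauss(m3, c3 ** 0.5)
        if theta_hat is None:
            m2, c2 = zeta_star, s2                      # f2(theta | zeta*)
        else:
            m2, c2 = shrink(theta_hat, v, zeta_star, s2)  # p2(theta | theta_hat; zeta*)
        theta_star = rng.gauss(m2, c2 ** 0.5)
        total += x_new * theta_star                     # mean of f1 at theta*
    return total / n_draws

# Made-up inputs for illustration
pred = predict_top_down(x_new=2.0, theta_hat=1.2, v=0.04, zeta_hat=1.0, w=0.01,
                        mu=0.8, tau2=0.09, s2=0.04)
pred_new_site = predict_top_down(x_new=2.0, theta_hat=None, v=None, zeta_hat=1.0,
                                 w=0.01, mu=0.8, tau2=0.09, s2=0.04)
print(pred, pred_new_site)
```

In this all-normal case the answer is also available in closed form (apply `shrink` twice and multiply by `x_new`), so the simulation is only useful as a template for models where the integrals are not tractable.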


Prediction: Top-down Prediction

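[Diagram: $\hat\alpha$ at the top; $\zeta^*_1, \zeta^*_2, \zeta^*_3$ are obtained through $p_3$, and then $\theta^*_{1i}, \theta^*_{2i}, \theta^*_{3i}$ are obtained through $p_2$.]

Predict $y_{hij}$ using $f_1(y_{hij} \mid x_{hij}; \theta^*_{hi})$.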

Prediction: Top-down Prediction

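| Level | Latent | Prediction model | Best prediction |
|---|---|---|---|
| 3 | $\zeta_h$ | $p_3(\zeta_h \mid \hat\zeta_h; \hat\alpha)$ | $\zeta^*_h \sim p_3(\zeta_h \mid \hat\zeta_h; \hat\alpha)$ |
| 2 | $\theta_{hi}$ | $p_2(\theta_{hi} \mid \hat\theta_{hi}; \zeta_h)$ | $\theta^*_{hi} \sim p_2(\theta_{hi} \mid \hat\theta_{hi}; \zeta^*_h)$ |
| 1 | $y_{hij}$ | $f_1(y_{hij} \mid x_{hij}; \theta_{hi})$ | $y^*_{hij} \sim f_1(y_{hij} \mid x_{hij}, \theta^*_{hi})$ |

Figure: Top-down approach to prediction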

Case study: Application to Solar Energy Prediction

- We use 15 days of data (12/01/2014 to 12/15/2014) for the analysis.
- The states are organized into 12 groups.
- The number of sites in each group, $m_h$, varies between 37 and 321.

Grouping Scheme

- Pooling data from nearby sites.
- Can incorporate complex structure such as distribution zones.

Application: Site Level

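- First assume that

$$y_{hij} = x_{hij}\theta_{hi} + \varepsilon_{hij}, \qquad \varepsilon_{hij} \sim t(0, \sigma^2_{hi}, \nu_{hi}),$$

  where $\sigma^2_{hi}$ is the scale parameter and $\nu_{hi}$ is the degrees of freedom, and

$$\hat\theta_{hi} \mid \theta_{hi} \sim N(\theta_{hi}, \hat V_{hi}),$$

  where $\hat V_{hi} = \hat V(\hat\theta_{hi})$.

- The degrees of freedom are assumed to be unknown and are estimated by the method of Lange et al. (1989).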

Three Level Model

- Assume the level 2 model

$$\theta_{hi} \sim N(\beta_h, \Sigma_h),$$

  and $\zeta_h = (\beta_h, \Sigma_h)$.

- Similarly, the level 3 model is

$$\zeta_h \sim N(\mu, \Sigma),$$

  and $\alpha = (\mu, \Sigma)$.

Comparison

- We compared the performance of the multi-level approach with three other modeling methods:
  - Site-by-site model: fit a different model for each individual site
  - Group-by-group model: fit a different model for each group
  - One global model: fit a single common model for all sites using the aggregate data
- To evaluate the prediction accuracy, we randomly selected 70% of the data to fit the model and tested on the remaining 30%.

MSPE Comparison

- We compare the accuracy by Mean Squared Prediction Error (MSPE), $N_T^{-1} \sum (y_{hij} - \hat y_{hij})^2$, where the $\hat y_{hij}$ are obtained from four different methods and $N_T$ is the size of the test data set.

| | Multi-level | Site model | Group model | Global model |
|---|---|---|---|---|
| MSPE | 0.297 | 0.298 | 0.406 | 0.383 |
| SD | 0.601 | 0.609 | 0.803 | 0.791 |

Table: Accuracy comparison of the different modeling methods
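For reference, the evaluation protocol (a random 70/30 split followed by MSPE on the held-out part) can be sketched as below. The helper names and the toy numbers are hypothetical and are unrelated to the results in the table.

```python
import random

def mspe(y_true, y_pred):
    """Mean squared prediction error over the test set."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def train_test_split(records, train_frac=0.7, seed=0):
    """Randomly assign about train_frac of the records to the training set."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(round(train_frac * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

train, test = train_test_split(range(10))
print(len(train), len(test))                      # 7 3
print(mspe([1.0, 2.0, 3.0], [1.0, 2.5, 2.0]))     # (0 + 0.25 + 1) / 3
```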


Comparison in Detail ($n_{hi} \le 100$ vs $> 100$)

[Figure: mean squared error (vertical axis, 0.0 to 1.5) by sample size (<100 vs >100) for four methods: Multilevel, Site Model, Group Model, Global Model.]

Discussion

- Motivated by a real problem: a solar energy forecasting system has been developed.
- We used a multi-level model approach to address the practical issues.
- There are more issues to be investigated:
  - Spatial modeling
  - Estimation of group structure
  - Preferential sampling of sites
  - ...
- The proposed method is promising for handling big data.
