Gibbs Variable Selection
Xiaomei Pan, Kellie Poulin, Jigang Yang, Jianjun Zhu
Page 1

Gibbs Variable Selection

Xiaomei Pan, Kellie Poulin, Jigang Yang, Jianjun Zhu

Page 2

Topics

1. Overview of Variable Selection Procedures
2. Gibbs Variable Selection (GVS)
3. How to Implement in WinBUGS
4. Recommendations
5. Appendices

Page 3

Overview of Variable Selection Procedures

• Selecting the best model
– The best likelihood
– The link function
– Priors
– Variable selection

Page 4

Overview of Variable Selection Procedures

• The type of response variable typically narrows our choices for:
– The best likelihood
– The link function

• We also often have only a relatively small number of priors to choose from:
– If no prior information is known, non-informative priors are chosen for the model parameters.
– If prior information is known, this also narrows our choices for priors.

Page 5

Overview of Variable Selection Procedures

• However, there may be many choices for which variables should be included in the model.

• In many real-world problems the number of candidate variables is in the tens to hundreds.

• For example, 10 candidate variables lead to 2^10 = 1024 different possible linear models.

• Even more models are possible if interactions and non-linear terms are considered!
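A quick check of that arithmetic in R (the count is 2 raised to the number of candidates, since each variable is either in or out):

p <- 10
2^p     # 1024 candidate linear models; with 20 candidates it is already 1048576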

Page 6

Overview of Variable Selection Procedures

• Things to keep in mind when doing variable selection:
– P-values for variables are no longer valid. Variable selection is a form of “data snooping” (recall “multiple comparison procedures”).
– Correlation between the predictors can lead to less-than-optimal results.
– It is best to use cross-validation techniques as a safeguard.
– Variable selection is best used in “prediction models” rather than “effect models”.

Page 7

Overview of Variable Selection Procedures

• Variable Selection Methods
• Frequentists use methods such as:
– Stepwise regression
– Mallows' Cp
– Maximum R-squared

• What methods do Bayesians use?

Page 8

Overview of Bayesian Variable Selection Procedures

Method                                         Used For                   Ease of Use            References*
Gibbs Variable Selection (GVS)                 Variable selection         Moderately easy        2, 3, 7
SSVS (Stochastic Search Variable Selection)    Variable selection         Somewhat difficult     2, 5
Kuo and Mallick (Unconditional Priors
  for Variable Selection)                      Variable selection         Very easy              2, 6
Carlin and Chib                                General model selection    Moderately easy        1, 2
Reversible Jump                                Mostly variable selection  Moderately difficult   4; see also Matt Bognar's thesis in 241 SH

* Refer to the reference slide.

Page 9

Overview of Bayesian Variable Selection Procedures

Method: Gibbs Variable Selection (GVS)
Advantages: Pseudo-priors do not affect the posterior distribution. Easy to implement using WinBUGS.
Disadvantages: Requires pseudo-priors on all model coefficients, whose sole function is to increase the efficiency of the sampler.

Method: SSVS (Stochastic Search Variable Selection)
Advantages: Can fit a wide variety of models. Allows the user to indicate which models they think are more likely.
Disadvantages: Requires pseudo-priors on all model coefficients and candidate models.

Method: Kuo and Mallick (Unconditional Priors for Variable Selection)
Advantages: Extremely straightforward; the user only specifies the usual prior on the full parameter vector (for the full model), and conditional prior distributions replace the pseudo-priors.
Disadvantages: No flexibility to alter the method to improve efficiency. If, for any parameter, the prior is diffuse compared with the posterior, the method becomes inefficient.

Method: Carlin and Chib
Advantages: Flexible Gibbs sampling strategy for any situation involving model uncertainty.
Disadvantages: Computationally demanding. Specifying efficient pseudo-priors becomes too time-consuming when a large number of models are under consideration.

Method: Reversible Jump*
Advantages: No need for pseudo-priors. May be faster than GVS.
Disadvantages: Diffuse priors will often lead to the model with the fewest parameters being chosen. Cannot be implemented in WinBUGS.

* Thanks to Dr. Matt Bognar for insights into reversible jump. His thesis in 241 SH contains examples of using this method.

Page 10

Gibbs Variable Selection

• GVS Sampling Procedure

Likelihood:
Y[i] ~ dnorm(mu[i], tau)
mu[i] = β0 + γ1*β1*X1[i] + γ2*β2*X2[i] + … + γp*βp*Xp[i]

Priors:
γj ~ dbern(0.5)                               # γj = 1 means variable j is selected
βj ~ γj*(real prior) + (1 - γj)*(pseudo-prior)
tau ~ dgamma(1.0E-3, 1.0E-3)
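To make the scheme concrete, here is a minimal R sketch of one way to implement this sampler directly. It is our own illustration, not the WinBUGS code used later: the function name gvs and the arguments m.pseudo, s.pseudo (pseudo-prior moments from a full-model fit, as in Step 2) and v.real (the diffuse “real” prior variance, matching the precision 0.001 used in the WinBUGS code) are all ours, and a flat intercept prior is assumed.

# Minimal Gibbs Variable Selection sampler (illustrative sketch).
# Z: standardized N x p covariate matrix; y: response vector.
gvs <- function(y, Z, m.pseudo, s.pseudo, n.iter = 5000, v.real = 1000) {
  N <- nrow(Z); p <- ncol(Z)
  beta <- rep(0, p); g <- rep(1, p); b0 <- mean(y); tau <- 1
  G <- matrix(0, n.iter, p)
  for (it in 1:n.iter) {
    r <- y - b0 - as.vector(Z %*% (g * beta))
    for (j in 1:p) {
      rj <- r + g[j] * beta[j] * Z[, j]        # residuals with term j removed
      if (g[j] == 1) {                         # conjugate update for an included beta
        prec <- tau * sum(Z[, j]^2) + 1 / v.real
        beta[j] <- rnorm(1, tau * sum(Z[, j] * rj) / prec, sqrt(1 / prec))
      } else {                                 # excluded beta: drawn from its pseudo-prior
        beta[j] <- rnorm(1, m.pseudo[j], s.pseudo[j])
      }
      # inclusion indicator: Bernoulli from (likelihood x prior) under g=1 vs g=0
      l1 <- -0.5 * tau * sum((rj - beta[j] * Z[, j])^2) +
            dnorm(beta[j], 0, sqrt(v.real), log = TRUE)
      l0 <- -0.5 * tau * sum(rj^2) +
            dnorm(beta[j], m.pseudo[j], s.pseudo[j], log = TRUE)
      g[j] <- rbinom(1, 1, 1 / (1 + exp(l0 - l1)))
      r <- rj - g[j] * beta[j] * Z[, j]
    }
    res <- y - as.vector(Z %*% (g * beta))
    b0  <- rnorm(1, mean(res), sqrt(1 / (N * tau)))   # flat intercept prior assumed
    tau <- rgamma(1, 1e-3 + N / 2, 1e-3 + 0.5 * sum((res - b0)^2))
    G[it, ] <- g
  }
  colMeans(G)     # posterior inclusion probability of each variable
}

From the stored indicator draws G, model-visit percentages like pmdl in the WinBUGS code can be tabulated directly.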

Page 11

How to Implement in WinBUGS

• We adapted code from Ntzoufras, I. (2003)
– The code and the paper are available on the WinBUGS web site
– The example showed variable selection for a model with 3 candidate predictor variables
– That code required the user to modify it extensively to use it with their own data

Page 12

How to Implement in WinBUGS

• Our WinBUGS code
– Requires no changes to the model specification
– The user must only insert their data and modify the initial values:
• p = number of x variables
• N = number of observations
• models = number of models (2^p)
• Initial values for the betas

Page 13

How to Implement in WinBUGS

• Provided at the end of this presentation:
– Full WinBUGS code for variable selection
– Code for fitting the full model in WinBUGS (for development of pseudo-priors)
• This code also only requires the user to change the data and initial values
– R code to assist in interpreting output from WinBUGS
– SAS code used to develop sample data

Page 14

How to Implement in WinBUGS

EXAMPLE
• Data used for example
– Validated our code using published results for a model with 3 variables
– Created a simulated data set with 500 observations and 10 predictors
• 9 predictors were continuous
• 1 predictor was binary
• Created a version of the file with correlation between predictors to test the robustness of GVS to non-orthogonal data
– Code used to create the simulated data is in Appendix 1
– The full correlation matrix (for the correlated data) is in Appendix 2

Page 15

How to Implement in WinBUGS

Target model used to create the simulated data:

Y = 2*X1 + .7*X6 + .22*X10 - .7*X5 + random error (normal(0,1))

Page 16

How to implement in WinBUGS

Prior to Running Our Code

Step 1: Standardize the X matrix; orthogonalize the X matrix**
Step 2: Fit the full model to develop information for the pseudo-priors
Step 3: Determine the likelihood and priors

** see recommendations

Page 17

How to Implement GVS in WinBUGS

• Standardize X matrix
– Centering the covariates will remove correlation between the model coefficients
– Dividing by the standard deviation allows comparison of coefficients on the same scale
– May also make it easier to assign proper non-informative priors
– If the user wants to use informative priors, this may be a nuisance

Step 1

Page 18

How to Implement GVS in WinBUGS

• Standardize X matrix
– In SAS:

proc standard data=input_file_name mean=0 std=1 out=output_file_name;
  var variable_names;
run;

– In WinBUGS:

# This is at the top of the model code and is done automatically
for (j in 1:p) {
  b[j] <- beta[j]/sd(x[,j])
  for (i in 1:N) {
    z[i,j] <- (x[i,j] - mean(x[,j]))/sd(x[,j])
  }
  temp_mean[j] <- b[j]*mean(x[,j])
}
b0 <- beta0 - sum(temp_mean[])

Step 1
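For readers working outside SAS and WinBUGS, a small R sketch of the same standardization and back-transformation (assuming a numeric covariate matrix x and coefficients beta0, beta estimated on the standardized scale):

z  <- scale(x)                       # standardized covariates: mean 0, sd 1
b  <- beta / apply(x, 2, sd)         # slopes mapped back to the raw scale
b0 <- beta0 - sum(b * colMeans(x))   # intercept mapped back to the raw scale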

Page 19

How to Implement GVS in WinBUGS

• Orthogonalize X matrix**
– This is recommended by those who designed GVS to make variable selection more accurate
– However, it makes interpretation of the coefficients impossible
– We do not include this step in the WinBUGS code, but we suggest some alternative methods for handling correlation in our recommendations

Step 1

Page 20

How to Implement GVS in WinBUGS

• Orthogonalize X matrix**
– In our example we simulated data with correlations, and GVS seems able to handle some correlation
– More analysis needs to be done to determine the sensitivity of the procedure to correlations in the X matrix
– We recommend variable clustering to remove highly correlated variables
– If you want to orthogonalize your X matrix, this is the appropriate SAS code:

proc princomp data=input_file_name out=output_file_name;
  var variable_names;
run;

Step 1
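An equivalent sketch in R, assuming the covariates are columns of a numeric matrix x (principal component scores are orthogonal by construction):

# Orthogonalize via principal components; the scores would replace the x's
pc <- prcomp(x, center = TRUE, scale. = TRUE)
z_orth <- pc$x    # uncorrelated columns, one per original variable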

Page 21

How to Implement GVS in WinBUGS

• Fit full model to get pseudo-priors
– In SAS:

proc reg data=input_file_name;
  model y=variable_list;
run;

– In WinBUGS (full code is at the end of this document):

Model {
# Standardize code here
#likelihood
for (i in 1:N){
  Y[i] ~ dnorm(mu[i],tau)
  for(j in 1:p){ temp[i,j] <- beta[j]*z[i,j] }
  mu[i] <- beta0 + sum(temp[i,])
}
beta0 ~ dnorm(0,0.00001)
for (j in 1:p) { beta[j] ~ dnorm(0, 1.0E-6) }
tau ~ dgamma(1.0E-3,1.0E-3)
sigma <- sqrt(1/tau)
}

Step 2

Page 22

How to Implement GVS in WinBUGS

• Fit full model to get pseudo-priors
– In SAS:

Step 2

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1    7.99442              0.04431          180.42    <.0001
x1           1    2.03194              0.05115           39.72    <.0001
x2           1   -0.01728              0.05096           -0.34    0.7347
x3           1   -0.09008              0.04458           -2.02    0.0439
x4           1    0.02241              0.04495            0.50    0.6183
x5           1   -0.38520              0.04590           -8.39    <.0001
x6           1    0.71948              0.04489           16.03    <.0001
x7           1    0.03432              0.04496            0.76    0.4456
x8           1    0.04153              0.04471            0.93    0.3534
x9           1   -0.01484              0.04682           -0.32    0.7515
x10          1    0.20915              0.04488            4.66    <.0001

Page 23

How to Implement GVS in WinBUGS

• Fit full model to get pseudo-priors
– In WinBUGS (note the estimates are very similar, and SAS runs in a fraction of the time)

Step 2

node       mean      sd        MC error   2.50%      median     97.50%    start   sample
beta[1]     2.031    0.0511    4.74E-04    1.931      2.032      2.132    500     9501
beta[2]    -0.0171   0.05108   5.09E-04   -0.1142    -0.01757    0.08328  500     9501
beta[3]    -0.09102  0.04418   4.50E-04   -0.1777    -0.09149   -0.00371  500     9501
beta[4]     0.02227  0.04458   4.06E-04   -0.06568    0.02315    0.1091   500     9501
beta[5]    -0.3856   0.04629   4.51E-04   -0.4772    -0.386     -0.2954   500     9501
beta[6]     0.7189   0.04544   4.43E-04    0.6307     0.7186     0.8083   500     9501
beta[7]     0.03404  0.04576   5.22E-04   -0.05454    0.0342     0.1245   500     9501
beta[8]     0.04132  0.04451   4.03E-04   -0.04581    0.04108    0.1306   500     9501
beta[9]    -0.01492  0.04676   4.58E-04   -0.1063    -0.01505    0.07692  500     9501
beta[10]    0.2097   0.04496   4.44E-04    0.1207     0.2097     0.2975   500     9501

Page 24

How to Implement GVS in WinBUGS

• Determine the likelihood and priors
– Our GVS code is set up with the standard likelihood and non-informative priors
– Edit the GVS code before running to put in the data for the pseudo-priors

Step 3

Page 25

How to implement in WinBUGS

Running Our Adapted Code

model {
# Standardize x's and coefficients
for (j in 1:p) {
  b[j] <- beta[j]/sd(x[,j]) ;
  for (i in 1:N) {
    z[i,j] <- (x[i,j] - mean(x[,j]))/sd(x[,j]) ;
  }
  temp_mean[j] <- b[j]*mean(x[,j])
}
b0 <- beta0 - sum(temp_mean[])

Page 26

How to implement in WinBUGS

#likelihood
for (i in 1:N){
  Y[i] ~ dnorm(mu[i],tau)
  for(j in 1:p){ temp[i,j] <- g[j]*beta[j]*z[i,j] }
  mu[i] <- beta0 + sum(temp[i,])

  # residuals
  stres[i] <- (Y[i] - mu[i])/sigma
  # if the standardized residual is greater than 2.5 in absolute value, flag an outlier
  outlier[i] <- step(stres[i] - 2.5) + step(-(stres[i]+2.5))
}
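The same outlier flag expressed in R terms (a sketch, assuming vectors Y and mu and a scalar sigma from the fit):

# Flag observations whose standardized residual falls outside +/- 2.5
stres   <- (Y - mu) / sigma
outlier <- as.numeric(abs(stres) > 2.5)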

Page 27

How to implement in WinBUGS

for (j in 1:p){
  # Create indicators for the possible variables in the model
  # e.g., x3, when selected, contributes g[3]*pow(2, 3-1) = 4 to the model number
  TempIndicator[j] <- g[j]*pow(2, j-1)
}

# Create a model number for each possible model
mdl <- 1 + sum(TempIndicator[])

# calculate the percentage of time each model is selected
for (j in 1 : models){
  pmdl[j] <- equals(mdl, j)
}
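As a concrete check of this numbering (our own worked example in R): the target model {x1, x5, x6, x10} gets model number 562, which matches pmdl[562] in the output slides.

# Model number for a given set of included variables: 1 + sum of 2^(j-1)
included <- c(1, 5, 6, 10)
1 + sum(2^(included - 1))   # = 562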

Page 28

How to implement in WinBUGS

# diffuse normal prior on the intercept
beta0 ~ dnorm(0,0.00001)

# if the parameter is not in the model, an informative prior is used
# this prior is calculated using a prior run of the full model
for (j in 1:p) {
  bprior[j] <- (1-g[j])*mean[j]
  tprior[j] <- g[j]*0.001 + (1-g[j])/(se[j]*se[j])
  beta[j] ~ dnorm(bprior[j],tprior[j])
  g[j] ~ dbern(0.5)
}

tau ~ dgamma(1.0E-3,1.0E-3)
sigma <- sqrt(1/tau)
}

Page 29

How to implement in WinBUGS

# Fit the full model and put the information for the mean and se of each
# standardized beta here for use in the pseudo-priors

DATA
mean[]   se[]
 2.013   0.112
 0.149   0.112
-0.259   0.094
-0.161   0.096
-0.293   0.134
 0.564   0.097
 0.094   0.098
-0.023   0.097
-0.018   0.133
 0.214   0.088
END

Page 30

How to implement in WinBUGS

# Set the initial values for the betas, the overall
# precision, and the variable selection indicators

Initial Values
list(beta0 = 0, beta=c(0,0,0,0,0,0,0,0,0,0), tau = .1, g=c(1,1,1,1,1,1,1,1,1,1))

# p = number of parameters
# N = the number of obs
# models = the number of possible models (2^p)

DATA
list(p = 10, N = 500, models = 1024,
Y = c(6.339, …

Page 31

How to implement in WinBUGS

• Output
– You should monitor at least all of the betas and pmdl, the percentage of the time each model was picked
– Models will be numbered 1 to 1024 in our example (the total number of models)
– To see which model number corresponds to which variables, we created an R function which outputs the model names in order (Appendix 5)
• To use this function, type print.ind(number of x variables). Example: print.ind(10)
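Going the other direction, a one-line R sketch that decodes a model number back to its variable set (using the same 2^(j-1) encoding as the WinBUGS code):

# Which variables make up model number m (here m = 562, p = 10)?
m <- 562; p <- 10
which(bitwAnd(m - 1, 2^(0:(p - 1))) > 0)   # 1 5 6 10, i.e. x1+ x5+ x6+ x10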

Page 32

How to implement in WinBUGS

• Our Example
– Ran 50,000 iterations
– Some models were not visited
– beta are the standardized coefficients
– b are the raw coefficients
– Simulated: b1 = 2, b5 = -.7, b6 = .7, b10 = .22

Correlated Data

node    mean    sd    MC error    2.50%    median    97.50%

b[1] 1.99 0.04395 2.14E-04 1.904 1.99 2.076

b[2] -0.01494 0.04601 2.16E-04 -0.1054 -0.0151 0.07503

b[3] -0.09533 0.04703 2.03E-04 -0.1878 -0.09507 -0.00378

b[4] 0.02211 0.04402 2.08E-04 -0.06457 0.02212 0.1083

b[5] -0.7717 0.08926 3.88E-04 -0.9483 -0.7714 -0.5961

b[6] 0.7387 0.04553 2.04E-04 0.65 0.7388 0.8278

b[7] 0.035 0.04602 2.23E-04 -0.05527 0.03524 0.1248

b[8] 0.04097 0.04408 1.92E-04 -0.04531 0.04119 0.128

b[9] -0.0145 0.04446 1.99E-04 -0.1007 -0.01441 0.07348

b[10] 0.2126 0.04531 1.94E-04 0.1234 0.2125 0.3015

beta[1] 2.016 0.04451 2.17E-04 1.929 2.016 2.102

beta[2] -0.01651 0.05087 2.39E-04 -0.1165 -0.01669 0.08294

beta[3] -0.09032 0.04456 1.92E-04 -0.1779 -0.09007 -0.00358

beta[4] 0.0225 0.0448 2.11E-04 -0.06572 0.02251 0.1102

beta[5] -0.3862 0.04467 1.94E-04 -0.4745 -0.386 -0.2983

beta[6] 0.7196 0.04435 1.99E-04 0.6332 0.7197 0.8064

beta[7] 0.03415 0.0449 2.17E-04 -0.05393 0.03439 0.1217

beta[8] 0.04161 0.04477 1.95E-04 -0.04602 0.04184 0.13

beta[9] -0.01521 0.04663 2.09E-04 -0.1056 -0.01511 0.07707

beta[10] 0.2096 0.04467 1.91E-04 0.1216 0.2094 0.2972

Page 33

How to implement in WinBUGS

• Our Example
– The model visited 97% of the time was the target model
– This is due to the strong correlations between the response and predictors
– We also modeled data with weaker correlations
• In those cases the target model was usually in the top 5 models, and several models were visited with higher frequency

Correlated Data

> print.ind(10)

model                       node         mean       sd        MC error

[1] "x1+ x5+ x6+ x10" pmdl[562] 0.9693 0.1726 7.61E-04

[1] "x1+ x5+ x6" pmdl[50] 0.01232 0.1103 4.93E-04

[1] "x1+ x3+ x5+ x6+ x10" pmdl[566] 0.009555 0.09728 4.04E-04

[1] "x1+ x5+ x6+ x7+ x10" pmdl[626] 0.00204 0.04512 1.89E-04

[1] "x1+ x5+ x6+ x8+ x10" pmdl[690] 0.00202 0.0449 2.05E-04

[1] "x1+ x4+ x5+ x6+ x10" pmdl[570] 0.001778 0.04213 1.82E-04

[1] "x1+ x5+ x6+ x9+ x10" pmdl[818] 0.001333 0.03649 1.70E-04

[1] "x1+ x2+ x5+ x6+ x10" pmdl[564] 0.001252 0.03537 1.52E-04

[1] "x1+ x3+ x5+ x6" pmdl[54] 1.01E-04 0.01005 4.47E-05

Page 34

How to implement in WinBUGS

• Our Example
– The frequentist stepwise method picks the appropriate model but also includes x3 at the .05 level
– Note that the p-value of .0473 is not accurate (since we were data snooping)
– Parameter estimates are very similar

Correlated Data

Variable Estimate SE Type II SS F Value Pr > F

Intercept 0.19748 0.55106 0.12533 0.13 0.7202

x1 1.99288 0.04382 2018.143 2067.97 <.0001

x3 -0.09303 0.04678 3.85938 3.95 0.0473

x5 -0.77756 0.08863 75.11717 76.97 <.0001

x6 0.7403 0.04546 258.8251 265.22 <.0001

x10 0.20933 0.04509 21.03744 21.56 <.0001

Step    Var. Entered    Partial R-Square    Model R-Square    C(p)    F Value    Pr > F

1 x1 0.7173 0.7173 354.643 1263.5 <.0001

2 x6 0.0874 0.8047 93.6354 222.44 <.0001

3 x5 0.0234 0.8281 25.2357 67.51 <.0001

4 x10 0.0074 0.8355 5.0224 22.21 <.0001

5 x3 0.0013 0.8368 3.091 3.95 0.0473

Page 35

How to implement in WinBUGS

• Our Example
– Ran 50,000 iterations
– Some models were not visited
– beta are the standardized coefficients
– b are the raw coefficients
– Simulated: b1 = 2, b5 = -.7, b6 = .7, b10 = .22

Non-Correlated Data

node    mean    sd    MC error    2.50%    median    97.50%    start    sample

b[1] 2.039 0.04615 2.20E-04 1.948 2.039 2.129 500 49501

b[2] 0.187 0.1436 6.75E-04 -0.09499 0.1865 0.4679 500 49501

b[3] -0.1483 0.1363 5.91E-04 -0.4163 -0.1475 0.1169 500 49501

b[4] -0.1588 0.1387 6.55E-04 -0.432 -0.1587 0.1124 500 49501

b[5] -0.5456 0.09244 4.02E-04 -0.729 -0.5455 -0.3641 500 49501

b[6] 0.7378 0.0473 2.11E-04 0.6455 0.7377 0.8308 500 49501

b[7] 0.00314 0.1303 6.29E-04 -0.2525 0.003829 0.2571 500 49501

b[8] 0.01064 0.1402 6.12E-04 -0.2638 0.01136 0.2876 500 49501

b[9] 0.1248 0.1325 5.93E-04 -0.1318 0.1249 0.3871 500 49501

b[10] 0.3615 0.048 2.02E-04 0.2668 0.3613 0.4557 500 49501

beta[1] 2.037 0.0461 2.20E-04 1.947 2.037 2.127 500 49501

beta[2] 0.1781 0.1368 6.43E-04 -0.09047 0.1776 0.4457 500 49501

beta[3] -0.1472 0.1354 5.87E-04 -0.4134 -0.1465 0.1161 500 49501

beta[4] -0.1543 0.1348 6.37E-04 -0.4199 -0.1542 0.1093 500 49501

beta[5] -0.2731 0.04627 2.01E-04 -0.3649 -0.273 -0.1822 500 49501

beta[6] 0.7182 0.04604 2.05E-04 0.6283 0.7181 0.8087 500 49501

beta[7] 0.00325 0.135 6.52E-04 -0.2617 0.00397 0.2665 500 49501

beta[8] 0.01025 0.1351 5.89E-04 -0.2542 0.01094 0.2771 500 49501

beta[9] 0.126 0.1339 5.99E-04 -0.1331 0.1262 0.391 500 49501

beta[10] 0.3483 0.04625 1.94E-04 0.2571 0.3481 0.439 500 49501

Page 36

How to implement in WinBUGS

• Our Example
– Oddly, the coefficients for x5 and x10 are farther off than in the correlated data. This is by chance (found on review of the simulated data)
– The model visited most often is the target model
– This model is visited a slightly higher percentage of the time than in the correlated data

Non-Correlated Data

> print.ind(10)

model                       node         mean       sd

[1] "x1+ x5+ x6+ x10" pmdl[562] 0.9904 0.09728

[1] "x1+ x4+ x5+ x6+ x10" pmdl[570] 0.002424 0.04918

[1] "x1+ x5+ x6+ x9+ x10" pmdl[818] 0.001616 0.04017

[1] "x1+ x5+ x6+ x7+ x10" pmdl[626] 0.001535 0.03915

[1] "x1+ x5+ x6+ x8+ x10" pmdl[690] 0.001455 0.03811

[1] "x1+ x2+ x5+ x6+ x10" pmdl[564] 0.001293 0.03593

[1] "x1+ x3+ x5+ x6+ x10" pmdl[566] 0.001111 0.03331

[1] "x1+ x6+ x10" pmdl[546] 6.06E-05 0.007785

[1] "x1+ x4+ x5+ x6+ x7+ x10" pmdl[634] 2.02E-05 0.004495

[1] "x1+ x4+ x5+ x6+ x8+ x10" pmdl[698] 2.02E-05 0.004495

[1] "x1+ x5+ x6+ x7+ x8+ x10" pmdl[754] 2.02E-05 0.004495

Page 37

How to implement in WinBUGS

• Our Example
– The frequentist stepwise method picks the appropriate model
– Parameter estimates are very similar

Non-Correlated Data

Variable Estimate SE Type II SS F Value Pr > F

Intercept -1.68739 0.55768 9.68545 9.16 0.0026

x1 2.03861 0.04613 2065.806 1952.68 <.0001

x5 -0.54495 0.09201 37.11277 35.08 <.0001

x6 0.73788 0.04734 256.9933 242.92 <.0001

x10 0.36191 0.04785 60.5075 57.19 <.0001

Step    Var. Entered    Partial R-Square    Model R-Square    C(p)    F Value    Pr > F

1 x1 0.695 0.695 335.019 1134.74 <.0001

2 x6 0.0914 0.7863 88.1258 212.5 <.0001

3 x10 0.0209 0.8072 33.2824 53.67 <.0001

4 x5 0.0128 0.82 0.5196 35.08 <.0001

Page 38

Recommendations

• Orthogonalizing the X matrix makes interpretation of the coefficients impossible
– If the correlations between the response and explanatory variables are stronger than the correlations between the explanatory variables, you may not need to orthogonalize the X matrix
– If correlations are high within the X matrix, we recommend using a variable clustering procedure and then picking an explanatory variable from each cluster
• In SAS use PROC VARCLUS (a rough R analogue is sketched below)
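A rough R sketch of that clustering step, assuming the covariates are columns of a numeric matrix x (PROC VARCLUS itself does more than this; here we simply cluster variables on correlation distance and would keep one representative per cluster, with the cluster count k a user choice):

# Cluster variables by 1 - |correlation|, then cut into k clusters
d  <- as.dist(1 - abs(cor(x)))
hc <- hclust(d, method = "average")
cutree(hc, k = 5)   # cluster membership; keep one variable per cluster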

Page 39

Recommendations

• Remember the number of candidate models is 2 raised to the number of variables you have.
– In our example, 2^10 = 1024.
– The sampler may have to run for many iterations to explore that model space.

Page 40

Conclusions

• Using the adapted WinBUGS code, it is now easy to implement GVS in the standard regression setting

• We are also working on adapting other code from the Ntzoufras paper (like ridge regression) to make it easy to implement

• In the case of standard linear regression with non-informative priors, the GVS method appears to give almost identical results to frequentist methods implemented in SAS.
– SAS took seconds to fit these models, while WinBUGS took hours

• The additional time to use GVS may only be warranted when wanting to use informative priors.

Page 41

Appendix 1 Simulated Data Creation Code

*Correlated Data;
data hw.simulate (drop= i j);
  array x{10};
  do i = 1 to 500;
    * Make 10 normal(j,1) variables;
    do j = 1 to 10;
      x{j} = rannor(12458) + j;
    end;
    * Turn one into a binary variable;
    if x5 le 5 then x5 = 1;
    else x5 = 0;
    * Create correlation between two predictors;
    x2 = .5*x1 + rannor(5);
    if x5 = 1 then x9 = rannor(5) + .5;
    else x9 = rannor(5);
    * Create y from x1, x5, x6, and x10;
    y = 2*x1 + .7*x6 + .22*x10 - .7*x5 + rannor(1);
    output;
  end;
run;

*Non-correlated Data;
data hw.simulate_nocorr (drop= i j);
  array x{10};
  do i = 1 to 500;
    * Make 10 normal(j,1) variables;
    do j = 1 to 10;
      x{j} = rannor(12458) + j;
    end;
    * Turn one into a binary variable;
    if x5 le 5 then x5 = 1;
    else x5 = 0;
    * Create y from x1, x5, x6, and x10;
    y = 2*x1 + .7*x6 + .22*x10 - .7*x5 + rannor(1);
    output;
  end;
run;

Page 42

Appendix 2 Simulated Data Correlation Matrix

      x1      x2      x3      x4      x5      x6      x7      x8      x9      x10     y
x1    1.000   0.479   0.032   0.025   0.004   0.047  -0.055  -0.039   0.153   0.063   0.847
x2    0.479   1.000  -0.027  -0.012  -0.028   0.109  -0.002  -0.021   0.093   0.046   0.433
x3    0.032  -0.027   1.000  -0.031  -0.038   0.019   0.023   0.034   0.013  -0.041  -0.001
x4    0.025  -0.012  -0.031   1.000   0.054   0.015  -0.003   0.079   0.113   0.056   0.032
x5    0.004  -0.028  -0.038   0.054   1.000  -0.007  -0.044  -0.022   0.235   0.066  -0.151
x6    0.047   0.109   0.019   0.015  -0.007   1.000   0.079  -0.021  -0.056  -0.009   0.335
x7   -0.055  -0.002   0.023  -0.003  -0.044   0.079   1.000   0.057  -0.096  -0.077  -0.007
x8   -0.039  -0.021   0.034   0.079  -0.022  -0.021   0.057   1.000   0.009  -0.050  -0.022
x9    0.153   0.093   0.013   0.113   0.235  -0.056  -0.096   0.009   1.000  -0.019   0.065
x10   0.063   0.046  -0.041   0.056   0.066  -0.009  -0.077  -0.050  -0.019   1.000   0.125
y     0.847   0.433  -0.001   0.032  -0.151   0.335  -0.007  -0.022   0.065   0.125   1.000
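For reference, a matrix in this form can be reproduced in R, assuming the simulated predictors and response sit in a data frame d:

# Correlation matrix rounded to three decimals, as tabled above
round(cor(d), 3)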

Page 43

Appendix 3 Code to Fit full model in WinBUGS

#This code can be run with modification
# only to the data and initial values

Model {
# Standardize x's and coefficients
for (j in 1:p) {
  b[j] <- beta[j]/sd(x[,j])
  for (i in 1:N) {
    z[i,j] <- (x[i,j] - mean(x[,j]))/sd(x[,j])
  }
  temp_mean[j] <- b[j]*mean(x[,j])
}
b0 <- beta0 - sum(temp_mean[])

#likelihood
for (i in 1:N){
  Y[i] ~ dnorm(mu[i],tau)
  for(j in 1:p){ temp[i,j] <- beta[j]*z[i,j] }
  mu[i] <- beta0 + sum(temp[i,])
}

#priors
beta0 ~ dnorm(0,0.00001)
for (j in 1:p) {
  beta[j] ~ dnorm(0, 1.0E-6)
}
tau ~ dgamma(1.0E-3,1.0E-3)
sigma <- sqrt(1/tau)
}

DATA
#p = the number of explanatory variables
#N = the number of observations
#Put your response variable and explanatory variables here
list(p = 10, N = 500,
Y = c(6.339, …, 8.4532),
x = structure(.Data = c(0.3534, …, 10.6885), .Dim = c(500,10)))

Initial Values
#enter initial values for the betas
list(beta0 = 0, beta=c(0,0,0,0,0,0,0,0,0,0), tau = .1)

Page 44

Appendix 4 GVS variable selection code

**************************************************************************
# Code from Gibbs Variable Selection Using BUGS
# code of Ioannis Ntzoufras for variable selection with 3
# predictor variables was modified to allow for
# variable selection for any number of variables
# the user only needs to modify the inits and data

model {
# Standardize x's and coefficients
for (j in 1:p) {
  b[j] <- beta[j]/sd(x[,j]) ;
  for (i in 1:N) {
    z[i,j] <- (x[i,j] - mean(x[,j]))/sd(x[,j]) ;
  }
  temp_mean[j] <- b[j]*mean(x[,j])
}
b0 <- beta0 - sum(temp_mean[])

#likelihood
for (i in 1:N){
  Y[i] ~ dnorm(mu[i],tau)
  for(j in 1:p){ temp[i,j] <- g[j]*beta[j]*z[i,j] }
  mu[i] <- beta0 + sum(temp[i,])

  # residuals
  stres[i] <- (Y[i] - mu[i])/sigma
  # if the standardized residual is greater than 2.5 in absolute value, flag an outlier
  outlier[i] <- step(stres[i] - 2.5) + step(-(stres[i]+2.5))
}

for (j in 1:p){
  # Create indicators for the possible variables in the model
  TempIndicator[j] <- g[j]*pow(2, j-1)
}

#Create a model number for each possible model
mdl <- 1 + sum(TempIndicator[])

# calculate the percentage of time each model is selected
for (j in 1 : models){
  pmdl[j] <- equals(mdl, j)
}

# Priors
# diffuse normal prior on the intercept
beta0 ~ dnorm(0,0.00001)

# if the parameter is not in the model, an informative prior is used
# this prior is calculated using a prior run of the full model
for (j in 1:p) {
  bprior[j] <- (1-g[j])*mean[j]
  tprior[j] <- g[j]*0.001 + (1-g[j])/(se[j]*se[j])
  beta[j] ~ dnorm(bprior[j],tprior[j])
  g[j] ~ dbern(0.5)
}
tau ~ dgamma(1.0E-3,1.0E-3)
sigma <- sqrt(1/tau)
}

Page 45

Appendix 4 GVS variable selection code

# Fit the full model and put the information for
# the mean and se of each beta here for use in
# the pseudo-priors

DATA
mean[]   se[]
 2.013   0.112
 0.149   0.112
-0.259   0.094
-0.161   0.096
-0.293   0.134
 0.564   0.097
 0.094   0.098
-0.023   0.097
-0.018   0.133
 0.214   0.088
END

# Set the initial values for the betas, the overall
# precision and the variable selection indicators

Initial Values
list(beta0 = 0, beta=c(0,0,0,0,0,0,0,0,0,0), tau = .1, g=c(1,1,1,1,1,1,1,1,1,1))

# p = number of parameters
# N = the number of obs
# models = the number of possible models (2^p)

DATA
list(p = 10, N = 500, models = 1024,
Y = c(6.339, 4.4347, …),
x = structure(.Data = c(0.3534, …, 10.6885), .Dim = c(500,10)))

Page 46

Appendix 5 R code to name models

ind <- function(p) {
  if (p == 0) {
    return(t <- 0)
  } else if (p == 1) {
    return(t <- rbind(0, 1))
  } else if (p == 2) {
    return(t <- rbind(c(0, 0), c(1, 0), c(0, 1), c(1, 1)))
  } else {
    t <- rbind(cbind(ind(p - 1), rep(0, 2^(p - 1))),
               cbind(ind(p - 1), rep(1, 2^(p - 1))))
    return(t)
  }
}

print.ind <- function(p) {
  t <- ind(p)
  print("intercept")
  for (i in 2:nrow(t)) {
    e <- NULL
    L <- TRUE
    for (j in 1:ncol(t)) {
      if (t[i, j] == 1 & L == TRUE) {
        e <- paste(e, "x", j, sep = "")
        L <- FALSE
      } else if (t[i, j] == 1 & L == FALSE) {
        e <- paste(e, "+ x", j, sep = "")
      }
    }
    print(e)
  }
}
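A small usage example; the row order is exactly the pmdl model-number order:

# For p = 3 this prints, in model-number order:
# "intercept", "x1", "x2", "x1+ x2", "x3", "x1+ x3", "x2+ x3", "x1+ x2+ x3"
print.ind(3)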

Page 47

References

1. Carlin, B.P. and Chib, S. (1995), “Bayesian Model Choice via Markov Chain Monte Carlo Methods”, Journal of the Royal Statistical Society, Series B, 57, 473-484.

2. Dellaportas, P., Forster, J., and Ntzoufras, I. (2000), “Bayesian Variable Selection Using the Gibbs Sampler”, in Dey, D.K., Ghosh, S., and Mallick, B. (Eds.), Generalized Linear Models: A Bayesian Perspective. New York, NY: Marcel Dekker.

3. Dellaportas, P., Forster, J., and Ntzoufras, I. (2002), “On Bayesian Model and Variable Selection Using MCMC”, Statistics and Computing, 12, 27-36.

4. Green, P. (1995), “Reversible Jump Markov Chain Monte Carlo Computation and Bayesian Model Determination”, Biometrika, 82(4), 711-732.

5. George, E. and McCulloch, R. (1993), “Variable Selection via Gibbs Sampling”, Journal of the American Statistical Association, 88(423), Theory and Methods.

6. Kuo, L. and Mallick, B. (1998), “Variable Selection for Regression Models”, Sankhya, Series B, 60, Part 1, 65-81.

7. Ntzoufras, I. (2003), “Gibbs Variable Selection Using BUGS”, Journal of Statistical Software, 7(7).

