Sim Slides,Tricks,Trends,2012jan15

Date posted: 18-Jun-2015
Uploaded by: dennis-sweitzer
Description: An introduction to simulation using Excel, with clinical trial examples, macros and VBA functions, plus a discussion of enterprise-wide simulation.

Page 1: Sim Slides,Tricks,Trends,2012jan15

Simulation in Excel: Tricks, Trials & Trends

Presented to the American College of Radiology

12 January 2012

Dennis Sweitzer, Ph.D.

www.Dennis-Sweitzer.com

Page 2

Abstract

Simulation in Excel: Tricks, Trials & Trends

Excel is a general-purpose spreadsheet that is widely used and understood, but rarely used by itself for simulations. However, the Data Table function in MS Excel can execute substantial simulations without cumbersome programming "tricks" or VBA coding. The result is an arbitrarily large results table in which each row is one iteration of the simulation and each column is a random variable generated in the simulation. A small number of additional probability functions are easily programmed in VBA to make Excel a general-purpose simulation package. Because VBA is interpreted, VBA functions can greatly limit the speed of a simulation. However, for simulations of small size and complexity, the ease and familiarity of working in Excel outweigh the disadvantage of speed. Examples from clinical trials will be used. Finally, I discuss new methods, based on work by Sam Savage, to move simulations out of black boxes and into the enterprise. Simulation results (a “SIP”, or “Stochastic Information Packet”) from multiple platforms can be stored as XML strings (using the DIST standard) in a “SLURP” (“Stochastic Library Unit with Relationships Preserved”), and from there used for reports, planning, etc., or incorporated into other simulations.

Page 3

Outline

•  How to do Simulation in Excel
•  Notes on using Inverse Probability Functions
•  Some Macros and VBA functions
•  Clinical Trial Examples
•  Probability Management in SIPs, SLURPs, & DIST

Page 4

Background

•  Occasional need for simulations
•  Excel is convenient, but
   –  it does not explicitly support simulations
   –  simulation usually requires VBA programming (so why not use R or SAS instead?)
   –  or commercial add-in programs (e.g., @Risk)
   –  or some academic add-ins
•  Excel does have iterative calculations and Solver
•  Why not simulation?

Page 5

Simulate what?

•  Stochastic Models
   –  Unknown parameters? → Guesstimate a distribution
   –  Optimizing choices? → Test each with simulations

•  Sensitivity Analysis
   –  Variations in inputs → variations in outputs
   –  2 parameters: use a table
   –  >2 parameters: simulate & compare variation

Page 6

Excel: Pros

Common Language / Common Tools
•  Most people understand Excel
•  Many tools are available in Excel

Transparency: modeling assumptions can be
•  Specified -- Graphed -- Debated
•  What you see is what you get!

More hands on deck, more eyes on the prize…: Statistician ⇄ Team Member
•  Initial model
•  Explores & breaks model
•  Repair & enhance
•  …Repeat until satisfied

MEGO

Page 7

Excel Cons

•  Slower than SAS, S+, R, etc.
•  Lacks some statistical/probability functions
   –  Latest versions are a little better
   –  Still need to add some VBA code
   –  Known bugs in statistical routines (often fixed)
•  Tradeoffs: quicker modifications vs. slower execution

Page 8

Simple Solution: Data Tables

Excel Data Tables
•  Create a table of values of a function
   –  Each column is a random variable
•  The leftmost column is used as an argument
   –  (unneeded for simulation)
•  The Data Table repeats the calculations for each row
   –  Each row is a simulation iteration
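This is a minimal Python sketch of what the Data Table does. It is an illustration, not the author's workbook. `one_iteration` stands in for the first row of formulas, and `data_table` recomputes it once per row, which is what Excel does for each row of a Data Table. The two-uniform model and both function names are hypothetical.

```python
import random

def one_iteration(rng):
    """One 'formula row': each random variable comes from fresh uniforms,
    like worksheet cells that call Rand(). The model is hypothetical."""
    u1 = rng.random()      # analogue of =RAND() in one cell
    u2 = rng.random()      # a second, independent =RAND()
    total = u1 + u2        # a derived reporting variable
    return {"U1": u1, "U2": u2, "Total": total}

def data_table(n_rows, seed=None):
    """Recompute the formula row n_rows times: each element of the result is
    one iteration, and each key is one column (random variable)."""
    rng = random.Random(seed)
    return [one_iteration(rng) for _ in range(n_rows)]

table = data_table(1000, seed=1)
mean_total = sum(row["Total"] for row in table) / len(table)
print(len(table), round(mean_total, 3))
```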

Page 9

1. Create Simulation

Create random variables using the inverse probability method. For a random variable $X$ with distribution function

$$F(x): \mathbb{R} \to [0,1],$$

if $U$ is random uniform on $[0,1]$ (in Excel, U = Rand()), then

$$X = F^{-1}(U)$$

has distribution function $F$.
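Here is a minimal Python sketch of the inverse probability method. It uses the exponential distribution because that inverse has a closed form: $F(x) = 1 - e^{-\lambda x}$ gives $F^{-1}(u) = -\ln(1-u)/\lambda$. The rate value and function names are illustrative. The worksheet equivalent would be a cell such as =-LN(1-RAND())/rate.

```python
import math
import random

def exponential_inv(u, rate):
    """Inverse of the exponential CDF F(x) = 1 - exp(-rate * x).
    Defined for 0 <= u < 1, the range that Rand() returns."""
    return -math.log(1.0 - u) / rate

def simulate_exponential(n, rate, seed=None):
    rng = random.Random(seed)
    # Inverse probability method: X = F^{-1}(U) with U uniform
    return [exponential_inv(rng.random(), rate) for _ in range(n)]

xs = simulate_exponential(10000, rate=2.0, seed=42)
print(sum(xs) / len(xs))   # should be close to 1 / rate = 0.5
```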

Page 10

2. Align Random Variables
•  Calculations can be anywhere in the spreadsheet
•  Reference the variables in a single row
•  It is best to label the variables in the same way

Page 11

3. Select Data Table
•  Select the table region
   –  1st row is the random variables
   –  1st column is not used (it can label the iterations)
•  From the toolbar:
   –  Data > Data Table

Page 12

4. Create Simulation Table
•  Column input cell = upper left-hand corner of the table
•  Row input cell = ignore
•  OK → populates the table
•  (You may have to recalculate manually, e.g., with F9)

Page 13

5. Execute Simulation

Iterative development:
•  The simulation can be changed
•  Add reporting variables
•  Recalculate to rerun
   –  (no need to use Data Table again, unless expanding)
•  Hint: debug with a short table, expand for the final run

Page 14

The End (of the key concepts)

Page 15

But still more…

•  Why use inverse probability distributions (instead of generating random variables directly)?
•  When not to use a spreadsheet for simulation?
•  Tools:
   –  Macros to set up a simulation
   –  VBA functions for common simulation distributions
•  Trends: Probability Management
   –  SIPs, SLURPs, DIST

Page 16

Inverse Probability Function
•  Most systems directly generate random variables with the desired distribution
•  Why use inverse probability functions,
   –  which are (probably) slower?

Personal opinion:
•  Testing & debugging
•  Verification: calculates correctly
•  Validation: calculations answer the problem
•  Sensitivity: input vs. output variability

Page 17

Why use Inverse Probability Distributions?
•  Testing & Debugging
•  Validation & Verification
•  Sensitivity

Save the Rand() values:
   →  Recreate unexpected results
   →  Reasonableness: do small changes in Rand() produce small changes in output?
   →  Explore the impact of small changes in Rand() values on simulation output

Page 18

As Mapping function

Probability distribution: $F(x): \mathbb{R} \to [0,1]$
Random uniform: $U \in (0,1]$
Inverse CDF: $X = F^{-1}(U)$

For continuous $F^{-1}$ (an inverse CDF is always monotone): small changes in $u \in U$ → small changes in $F^{-1}(u)$

$$u \mapsto F^{-1}(u)$$

Page 19

Mapping

2 random uniform variables as input to a deterministic function

Page 20

Mapping

[Figure: random input numbers and the outputs they (should) map to]

Page 21

A max value looks high. Is it a bug? If not, how often does it happen?

Example #1
Simple model, function of 2 random variables
•  Saved the random U[0,1] values for each iteration
•  Checked the u ∈ U[0,1] that generated the high value: u = 0.983… → random high → rarely happens

Saving {Ui}:
•  Verify
•  Replicate
•  Quantify
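This Python sketch makes the save-the-uniforms idea concrete, with hedges. The two-variable `model` is hypothetical (not the one on the slide), and the run sizes and seeds are arbitrary. Each row stores its uniforms, so the suspicious maximum can be traced to its inputs and recomputed exactly. Its frequency is then estimated from a longer run.

```python
import math
import random

def model(u1, u2):
    """Hypothetical model of two uniforms: an exponential driven by u1
    plus a small contribution from u2."""
    return -math.log(1.0 - u1) + 0.1 * u2

def run(n, seed):
    """Simulate n iterations, saving the Rand() values alongside the output."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        u1, u2 = rng.random(), rng.random()
        rows.append((i, u1, u2, model(u1, u2)))
    return rows

rows = run(1000, seed=11)
i_max, u1, u2, y_max = max(rows, key=lambda r: r[3])

# Verify / replicate: recompute the suspicious value from its saved uniforms
assert model(u1, u2) == y_max

# Quantify: how often does a value this high occur in a longer run?
big = run(100000, seed=12)
share = sum(1 for r in big if r[3] >= y_max) / len(big)
print(f"iteration {i_max}: u1={u1:.3f}, u2={u2:.3f}, y={y_max:.2f}, share={share:.4f}")
```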

Page 22

Example #1 (Sensitivity)

Sort by U1, U2:
•  Sensitive to U1
•  Insensitive to U2
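A rank (Spearman) correlation can stand in for sorting by each saved uniform and eyeballing the output. This is a numeric substitute for the visual check on the slide, not the author's method. In the Python sketch below, the model is hypothetical and chosen so that the output depends strongly on U1 and only weakly on U2.

```python
import math
import random

def model(u1, u2):
    # Hypothetical model: strongly driven by u1, weakly by u2
    return -math.log(1.0 - u1) + 0.1 * u2

def ranks(values):
    """Rank of each value (0 = smallest), i.e., its position after sorting."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    # Rank correlation: does the output follow the sort order of the input?
    return pearson(ranks(x), ranks(y))

rng = random.Random(3)
u1s = [rng.random() for _ in range(2000)]
u2s = [rng.random() for _ in range(2000)]
ys = [model(a, b) for a, b in zip(u1s, u2s)]
print(round(spearman(u1s, ys), 2), round(spearman(u2s, ys), 2))
```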

Page 23

Spreadsheet limitations
•  Only simple data structures are available
   –  Rows & columns; no lists or trees
   –  (so discrete event simulations are hard)
•  Complex algorithms are difficult
   –  E.g., while or for loops
   –  Can improvise (cumbersome, slow, buggy)
•  Speed: slow
•  Data storage: what you see is all you get

Page 24

Tools: Excel Simulation Template

•  Adds some missing random functions •  Adds some set-up macros

Excel template & examples at:

www.Dennis-Sweitzer.com

Page 25

Macro SimulationSampler

To start a new simulation when you don't remember the names & parameters of common random variables used in simulation:
•  Run the macro SimulationSampler
•  Copy, delete, and edit as needed
•  Make sure all random values are referenced in the first row of the data table at the bottom

Page 26

Macro SimulationSampler
•  Creates a simulation with each of the common simulation functions

Page 27

Macro SimulationSampler (continued)
•  Sets up the header row for the data table
•  Sets up a place for statistics

Page 28

Macro Simulate
•  Highlight the row of random variables
   –  (1st row of simulation table)
•  Run the macro “Simulate”
   –  It prompts for the number of simulation iterations
   –  The default number of iterations is 100
   –  Debug & develop (manually recalculate)
   –  Final run with >1000 iterations
   –  Visual Basic code is computationally intensive

Page 29

Macro Simulate

Page 30

Excel Random Variables

•  Rand() – random uniform on [0,1)
•  NormSInv() – inverse standard normal distribution
•  CritBinom() – inverse binomial distribution (Binom.Inv() in Excel 2010 and later)
•  LogInv() – inverse lognormal distribution (LogNorm.Inv() in Excel 2010 and later)
   –  Caveat: the parameters are the mean and SD of ln(X), i.e., after the log transformation
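For readers working outside Excel, here is a Python sketch of two of these inverses (Python 3.8+ for `math.comb` and `statistics.NormalDist`). `crit_binom` is a plain cumulative search using the criterion Excel documents for CRITBINOM. `log_inv` shows the caveat numerically: with parameters on the log scale, exp(mu) is the median of X, not its mean. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def crit_binom(trials, p, alpha):
    """Smallest k with P(Binomial(trials, p) <= k) >= alpha."""
    cdf = 0.0
    for k in range(trials + 1):
        cdf += math.comb(trials, k) * p ** k * (1 - p) ** (trials - k)
        if cdf >= alpha:
            return k
    return trials   # guards against round-off when alpha is very close to 1

def log_inv(u, mu, sigma):
    """Inverse lognormal CDF, with mu and sigma the mean and SD of ln(X).
    Requires 0 < u < 1 (inv_cdf rejects 0 and 1)."""
    return math.exp(mu + sigma * NormalDist().inv_cdf(u))

mu, sigma = 1.0, 0.5
median = log_inv(0.5, mu, sigma)          # exp(mu): the median of X
mean_x = math.exp(mu + sigma ** 2 / 2)    # the mean of X is larger than exp(mu)
print(crit_binom(10, 0.5, 0.5), round(median, 4), round(mean_x, 4))
```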

Page 31

Erlang Distribution

How long do you wait until you get a predetermined number of arrivals?
•  Interarrival times are IID exponential
•  Erlang is a Gamma distribution with an integer shape parameter
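The Erlang CDF has a closed form: one minus the Poisson probability of fewer than k arrivals by time t. Its inverse does not, so any inverse-probability sketch has to search numerically. The bisection below is one simple choice, not necessarily what the author's VBA does. In a worksheet, GAMMAINV(u, k, 1/rate) should give the same quantile, since Excel's Gamma uses a scale parameter.

```python
import math
import random

def erlang_cdf(t, k, rate):
    """P(T <= t), where T is the sum of k IID exponential(rate) interarrival
    times: 1 minus the Poisson probability of fewer than k arrivals by t."""
    if t <= 0:
        return 0.0
    x = rate * t
    term = math.exp(-x)        # Poisson probability of 0 arrivals
    fewer_than_k = term
    for n in range(1, k):
        term *= x / n          # Poisson probability of n arrivals
        fewer_than_k += term
    return 1.0 - fewer_than_k

def erlang_inv(u, k, rate, tol=1e-10):
    """Inverse CDF by bisection, since no closed form exists.
    Meant for moderate k: exp(-x) underflows when rate * t is very large."""
    lo, hi = 0.0, k / rate     # start the bracket at the mean
    while erlang_cdf(hi, k, rate) < u:
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if erlang_cdf(mid, k, rate) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rng = random.Random(31)
waits = [erlang_inv(rng.random(), 3, 1.0) for _ in range(2000)]
print(sum(waits) / len(waits))   # should be near k / rate = 3
```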

Page 32

Beta Distribution

Can be used as
•  Distribution of a binomial probability
   –  Range = [0,1]
•  Generic bounded hump (vs. Normal as the generic unbounded hump)
•  Better behaved than a triangular distribution

Page 33

Example #2, Problem

Client: “Here’s our plan…”
•  Simple spreadsheet calculation
   –  but only the expected value,
   –  not the variability

Page 34

Example #2, Simulation
•  Time to 100th patient
•  Patient interarrival times are IID exponential

[Figure: summary statistics of simulated values]

Interpretation: under the assumptions, 90% of simulations required more than 4.4 months.
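This is a small Python sketch of the Example #2 simulation. The slides do not give the enrollment rate, so `RATE = 25` patients per month is a hypothetical value, and the numbers will not match the 4.4 months above. The point is the interpretation: the 10th percentile of the simulated times is the value that 90% of simulations exceed. `statistics.quantiles` needs Python 3.8+.

```python
import random
import statistics

RATE = 25.0        # hypothetical enrollment rate, patients per month
N_PATIENTS = 100   # time to the 100th patient, as on the slide

def time_to_nth_patient(n, rate, rng):
    """Arrival time of patient n = sum of n IID exponential interarrival times."""
    return sum(rng.expovariate(rate) for _ in range(n))

rng = random.Random(2012)
times = [time_to_nth_patient(N_PATIENTS, RATE, rng) for _ in range(5000)]
p10 = statistics.quantiles(times, n=10)[0]   # 10th percentile
print(f"mean {statistics.mean(times):.2f} months; "
      f"90% of simulations took more than {p10:.2f} months")
```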

Page 35

Added VBA Functions

Inverse functions needed for simulation
•  Poisson, negative binomial

Interpolation from a table
•  Interpolate: 1- or 2-dimensional interpolation

Convenience
•  Beta with mean, SD as parameters
•  Beta with high, low, and mode as parameters
•  Lognormal with mean, SD as parameters

Page 36

Missing Statistical Functions: Inverse Distributions
•  InvPoisson :: Poisson
•  InvPascal :: Negative binomial
   –  (how many failures before k successes)
•  The negative binomial allows a continuous (non-integer) parameter k;
•  the integer-k version is often called the Pascal distribution
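This Python sketch implements the two missing inverses as cumulative searches. That is one straightforward way to invert a discrete CDF, and the author's VBA may differ. The names echo InvPoisson and InvPascal on the slide, but the signatures are assumptions.

```python
import math
import random

def inv_poisson(u, mean):
    """Smallest k with P(Poisson(mean) <= k) >= u.
    Assumes exp(-mean) does not underflow (mean below roughly 700)."""
    k = 0
    prob = math.exp(-mean)        # P(X = 0)
    cdf = prob
    while cdf < u:
        k += 1
        prob *= mean / k          # P(X = k) from P(X = k - 1)
        cdf += prob
        if prob == 0.0:           # round-off guard for u extremely close to 1
            break
    return k

def inv_pascal(u, k, p):
    """Smallest n with P(N <= n) >= u, where N counts failures before the
    k-th success and each trial succeeds with probability p."""
    n = 0
    prob = p ** k                 # P(N = 0): the first k trials all succeed
    cdf = prob
    while cdf < u:
        n += 1
        # P(N = n) = C(n + k - 1, n) p^k (1 - p)^n, updated from P(N = n - 1)
        prob *= (n + k - 1) / n * (1 - p)
        cdf += prob
        if prob == 0.0:
            break
    return n

rng = random.Random(7)
arrivals = [inv_poisson(rng.random(), 5.0) for _ in range(5)]
failures = [inv_pascal(rng.random(), 3, 0.75) for _ in range(5)]
print(arrivals, failures)
```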

Page 37

Example #3, Patients to Screen

Expected enrollment rate = 75% ± 5%
•  ~ Beta distribution

# screen failures ~ negative binomial (Pascal)
•  Depends on the enrollment rate
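Here is a Python sketch of Example #3 under stated assumptions. "75% ± 5%" is read as mean 0.75 and SD 0.05 for a Beta enrollment rate. The enrollment target of 100 patients is hypothetical. Screen failures are generated by screening one patient at a time, which yields the same Pascal count as inverting its CDF. Because the rate is redrawn in each iteration, the spread of patients to screen also reflects uncertainty in the rate itself.

```python
import random
import statistics

def beta_params(mean, sd):
    """Method-of-moments Beta(alpha, beta) with the given mean and SD."""
    common = mean * (1 - mean) / sd ** 2 - 1
    if common <= 0:
        raise ValueError("SD too large for a Beta with this mean")
    return mean * common, (1 - mean) * common

def patients_to_screen(target, p_enroll, rng):
    """Screen until `target` patients enroll; the failures are Pascal-distributed."""
    screened = enrolled = 0
    while enrolled < target:
        screened += 1
        if rng.random() < p_enroll:
            enrolled += 1
    return screened

rng = random.Random(15)
a, b = beta_params(0.75, 0.05)   # assumption: "75% ± 5%" read as mean ± SD
TARGET = 100                     # hypothetical number of patients to enroll
screens = []
for _ in range(2000):
    p = rng.betavariate(a, b)    # uncertain enrollment rate for this iteration
    screens.append(patients_to_screen(TARGET, p, rng))
lo, *_, hi = statistics.quantiles(screens, n=10)   # 10th and 90th percentiles
print(statistics.mean(screens), lo, hi)
```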

Page 38

Beta Distribution (2)

For convenience:
•  Beta distribution given mean, SD
•  Beta distribution given mean, SD, upper and lower bounds
•  Beta distribution given mode, upper and lower bounds
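These Python sketches show the convenience parameterizations under two assumptions. Mean and SD are matched by the method of moments after rescaling to [low, high]. The mode-based version uses the PERT convention (shape weights with lam = 4), which is common but not necessarily what the author's template uses. All function names are hypothetical.

```python
import random

def beta_from_mean_sd(mean, sd, low=0.0, high=1.0):
    """Method-of-moments Beta shapes for a Beta rescaled to [low, high]."""
    width = high - low
    m, s = (mean - low) / width, sd / width     # move to the [0, 1] scale
    common = m * (1 - m) / s ** 2 - 1
    if not 0 < m < 1 or common <= 0:
        raise ValueError("mean and SD are not compatible with a Beta on [low, high]")
    return m * common, (1 - m) * common

def beta_from_mode_pert(low, mode, high, lam=4.0):
    """PERT-style shapes (an assumed convention): the Beta's mode equals `mode`
    and its mean is (low + lam * mode + high) / (lam + 2)."""
    width = high - low
    return 1 + lam * (mode - low) / width, 1 + lam * (high - mode) / width

def sample_scaled_beta(a, b, low, high, rng):
    """Draw from Beta(a, b) and stretch it onto [low, high]."""
    return low + (high - low) * rng.betavariate(a, b)

rng = random.Random(38)
a, b = beta_from_mode_pert(2.0, 3.0, 8.0)     # hypothetical low, mode, high
print([round(sample_scaled_beta(a, b, 2.0, 8.0, rng), 2) for _ in range(5)])
```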

Page 39

Simulation from a Table

⇒ Find the value in the 1st vector; ⇐ return the interpolated value from the 2nd.

Simulate an arbitrary distribution:
•  Top row: values in [0,1]
•  Bottom row: quantiles
•  Result: the value interpolated at U from the table

Or a function y = f(x):
•  x is found in the top row; y is interpolated from the bottom row
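The table lookup can be sketched in Python as linear interpolation. The interpolation rule is an assumption, and the author's Interpolate function may handle the ends differently. With cumulative probabilities in the top row and quantiles in the bottom row, passing a uniform U through the lookup draws from the tabulated distribution. The table values are hypothetical.

```python
import bisect
import random

def interpolate(x, xs, ys):
    """Find x in the first (strictly increasing) vector and return the linearly
    interpolated value from the second; clamps outside the table."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = bisect.bisect_right(xs, x)       # xs[j - 1] <= x < xs[j]
    x0, x1, y0, y1 = xs[j - 1], xs[j], ys[j - 1], ys[j]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Hypothetical table: top row = cumulative probabilities, bottom row = quantiles
PROBS = [0.0, 0.25, 0.5, 0.75, 1.0]
QUANTILES = [2.0, 3.0, 3.5, 4.5, 8.0]

rng = random.Random(1)
draws = [interpolate(rng.random(), PROBS, QUANTILES) for _ in range(5)]
print(draws)
```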

Page 40

Table Simulation Uses
•  Polygonal distributions (like triangular)
•  Survival curve (for time to event)
   –  Estimate a K-M curve from data, simulate the rest of the trial
•  Arbitrary empirical distributions
•  Distribution from observations
•  Table of power calculations
   –  E.g., assurance calculations:
      •  If the number of patients is random, so is the effective power of the study
      •  If the true effect size is random, so is Pr{success}
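An assurance calculation along these lines can be sketched in Python by combining table interpolation with a random effect size. The power table, the normal distribution for the true effect, and the function names are all hypothetical. The point is the averaging step: expected power over the uncertain effect.

```python
import bisect
import random

def interpolate(x, xs, ys):
    """Linear interpolation in a table with increasing xs, clamped at the ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = bisect.bisect_right(xs, x)
    return ys[j - 1] + (ys[j] - ys[j - 1]) * (x - xs[j - 1]) / (xs[j] - xs[j - 1])

# Hypothetical power table for a fixed design: true effect size -> power
EFFECT = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
POWER = [0.025, 0.17, 0.52, 0.85, 0.98, 1.0]

def assurance(effect_mean, effect_sd, n_iter=20000, seed=0):
    """Pr{success} when the true effect is uncertain: average the
    interpolated power over random draws of the effect size."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_iter):
        delta = rng.gauss(effect_mean, effect_sd)   # one plausible true effect
        total += interpolate(delta, EFFECT, POWER)  # power if that were true
    return total / n_iter

print(round(assurance(0.3, 0.1), 3), "vs power at the point estimate:",
      interpolate(0.3, EFFECT, POWER))
```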

Page 41

Simulation from a 2-dimensional table

Here:
•  Rows are quartiles of a random function
•  Left column is the value of a parameter
•  A family of distributions that vary with the parameter
•  Parameter y = 75% (can be random)
•  Generate random numbers from the interpolated distribution
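Here is a Python sketch of the two-dimensional lookup. It assumes linear interpolation in both directions, since the slide does not say which rule the VBA uses. The code blends the two rows whose parameter values bracket y, then treats the blended row as a 1-D quantile table. The table is hypothetical, and y could itself be drawn at random in each iteration.

```python
import bisect
import random

def interp1(x, xs, ys):
    """Linear interpolation in a table with increasing xs, clamped at the ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = bisect.bisect_right(xs, x)
    return ys[j - 1] + (ys[j] - ys[j - 1]) * (x - xs[j - 1]) / (xs[j] - xs[j - 1])

def interp2(u, y, probs, params, table):
    """table[i] lists the quantiles (at cumulative probabilities `probs`) of the
    distribution for parameter value params[i]. Blend the two rows that
    bracket y, then look up u in the blended row."""
    if y <= params[0]:
        row = table[0]
    elif y >= params[-1]:
        row = table[-1]
    else:
        i = bisect.bisect_right(params, y)
        w = (y - params[i - 1]) / (params[i] - params[i - 1])
        row = [(1 - w) * a + w * b for a, b in zip(table[i - 1], table[i])]
    return interp1(u, probs, row)

# Hypothetical family of distributions indexed by a parameter (left column)
PROBS = [0.0, 0.25, 0.5, 0.75, 1.0]
PARAMS = [0.5, 0.75, 1.0]
TABLE = [
    [1.0, 2.0, 2.5, 3.0, 5.0],   # quantiles when the parameter is 0.5
    [2.0, 3.0, 3.5, 4.5, 7.0],   # ... when it is 0.75
    [3.0, 4.0, 5.0, 6.0, 9.0],   # ... when it is 1.0
]

rng = random.Random(41)
draws = [interp2(rng.random(), 0.75, PROBS, PARAMS, TABLE) for _ in range(5)]
print(draws)
```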

Page 42

Example #4: Interim Review
•  After 2 months, review randomization rates
•  Continue to randomize to 100 patients
•  How long?

Page 43

Example #4: Interim Review (Simulation)

Y = # patients at 2 months ~ Poisson
Time to randomize (100 − Y) additional patients ~ Erlang (Gamma)
80% CI: (2.5, 3.7) months
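The two-stage structure of Example #4 can be sketched in Python. The randomization rate (`RATE = 15` per month) is hypothetical, not the one behind the 80% interval above, so the printed interval will differ. The Poisson draw uses the inverse probability method, and the Erlang draw sums exponential interarrival times.

```python
import math
import random
import statistics

RATE = 15.0     # hypothetical randomization rate, patients per month
INTERIM = 2.0   # months until the interim review
TARGET = 100    # continue to randomize to 100 patients

def poisson_inv(u, mean):
    """Inverse Poisson CDF by cumulative search (inverse probability method)."""
    k, prob = 0, math.exp(-mean)
    cdf = prob
    while cdf < u and prob > 0.0:
        k += 1
        prob *= mean / k
        cdf += prob
    return k

def time_after_interim(rng):
    y = poisson_inv(rng.random(), RATE * INTERIM)   # Y = patients at 2 months
    still_needed = max(0, TARGET - y)
    # Erlang(still_needed, RATE): sum of exponential interarrival times
    return sum(rng.expovariate(RATE) for _ in range(still_needed))

rng = random.Random(4)
times = [time_after_interim(rng) for _ in range(5000)]
deciles = statistics.quantiles(times, n=10)
print(f"80% interval: ({deciles[0]:.1f}, {deciles[-1]:.1f}) months after the interim")
```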

Page 44

Clinical Trials Applications

•  Simulations for planning •  Prototyping larger simulation •  Checking assumptions/validation

Page 45

Planning

Expected Trial Performance
•  Usually not of interest -- already done without simulation
•  But should be

Variability of Trial Performance
•  Important for risk management: “What’s the earliest, the latest, the most, the least, etc.?”
•  80% CIs

Structural Problems
•  Interactions of parameters may doom the trial before it even starts! (e.g., mean(max{X, Y}) vs. max{mean(X), mean(Y)})

¡The Flaw of Averages!
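Here is a quick Python illustration of the inequality in the last bullet, using hypothetical numbers. Suppose a trial can start only after two independent sites are both ready. Planning with each site's average (the max of the means) then understates the average start time (the mean of the max). For two IID exponential times with mean 6, the mean of the maximum is 6 × 1.5 = 9.

```python
import random
import statistics

def flaw_of_averages(n_iter=20000, seed=8):
    """Hypothetical: two sites must both finish start-up (exponential, mean
    6 months each) before the trial can start. Compare the average of the
    maximum with the maximum of the averages."""
    rng = random.Random(seed)
    xs = [rng.expovariate(1 / 6.0) for _ in range(n_iter)]
    ys = [rng.expovariate(1 / 6.0) for _ in range(n_iter)]
    mean_of_max = statistics.mean(max(x, y) for x, y in zip(xs, ys))
    max_of_means = max(statistics.mean(xs), statistics.mean(ys))
    return mean_of_max, max_of_means

mean_of_max, max_of_means = flaw_of_averages()
print(f"mean(max) = {mean_of_max:.2f} vs max(mean) = {max_of_means:.2f}")
```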

Page 46

Prototyping

Prototyping:
•  Toy simulation with hands-on teamwork
•  Development model
•  Get team buy-in on assumptions
•  Processing speed not important
•  Rapid modifications are important

Ideal?
•  Develop a prototype in a 1-hour meeting
•  Check for errors later
•  Run large simulations later for precise estimates

Page 47

Checking planning assumptions
•  H0 = simulation assumptions
•  Observed: a value X
•  {xi} = corresponding values in the simulation
•  Rank of X in {xi} ≈ p-value

Stored values: use the function PercentRank
Descriptive statistics: use a frequency count

Use to:
•  Test assumptions, validate the model, +??
•  If an observed value of X is rare in the simulation, question the assumptions!

Page 48

Checking Assumptions

Example:
•  A trial is designed based on a non-trivial simulation.
•  The model predicts a completion rate of 65% with 95% C.I. = (55%, 75%).
•  4 months into the trial, a 50% completion rate is observed.
•  How significant is this discrepancy?

Resimulate:
•  {xi} = simulated completion rates (1 per iteration)
•  Rank of the observed 50% in simulated {xi} ≈ p-value
•  “How likely is the observation, under the modeled assumptions?”
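Here is a Python sketch of the rank-based check. A real check would rerun the trial model to obtain {xi}. Purely as a stand-in, the simulated completion rates below are drawn from a normal distribution with mean 65% and SD 0.10/1.96, which roughly reproduces the quoted 95% interval of (55%, 75%). The empirical p-value is the share of simulated rates at or below the observed 50%, similar in spirit to a PercentRank lookup.

```python
import random

def empirical_p_value(observed, simulated):
    """Share of simulated values at or below the observation (one-sided)."""
    return sum(1 for x in simulated if x <= observed) / len(simulated)

rng = random.Random(48)
# Stand-in for the resimulated column {x_i}: one completion rate per iteration
simulated = [rng.gauss(0.65, 0.10 / 1.96) for _ in range(10000)]
p = empirical_p_value(0.50, simulated)
print(f"Share of simulations at or below 50%: {p:.4f}")
```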

Page 49

Sensitivity Analysis
•  “What-ifs”
•  Interactions between parameters
   →  Identify key control points!
•  Vary parameters between simulations
•  Compare simulation results
   –  E.g., average, worst-case scenarios
•  Correlations between simulated parameters and outcomes

Page 50

Weighted simulations

Advantage:
•  Large but unlikely events are more likely to be simulated
•  Common but dull events are simulated infrequently, but up-weighted
•  Rare but exciting events are simulated more often, and down-weighted
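Weighted simulation of this kind is usually called importance sampling. Below is a minimal Python sketch for a rare event, P(Z > 4) for a standard normal Z, chosen only because the exact answer is available for comparison. Draws come from a normal shifted to the threshold, so the rare region is simulated often. Each draw is multiplied by the likelihood ratio φ(x)/φ(x − 4), which is below 1 in the tail (down-weighted) and above 1 for draws below 2 (up-weighted).

```python
import math
import random
from statistics import NormalDist

def tail_prob_plain(threshold, n, rng):
    """Ordinary simulation: count standard normal draws above the threshold."""
    return sum(1 for _ in range(n) if rng.gauss(0.0, 1.0) > threshold) / n

def tail_prob_weighted(threshold, n, rng):
    """Importance sampling: draw from N(threshold, 1) so the rare tail is hit
    about half the time, then weight each draw by phi(x) / phi(x - threshold)."""
    s = threshold
    total = 0.0
    for _ in range(n):
        x = rng.gauss(s, 1.0)
        if x > threshold:
            total += math.exp(-s * x + 0.5 * s * s)   # likelihood ratio, < 1 here
    return total / n

exact = 1.0 - NormalDist().cdf(4.0)
rng = random.Random(50)
print(exact, tail_prob_plain(4.0, 20000, rng), tail_prob_weighted(4.0, 20000, rng))
```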

Page 51

Macro Management

VBA Editor: Alt-F11 (or find the menu)
•  Copy the module between sheets
•  Copy code from the .xls sheet & insert it into the VBA editor
•  Open & save as a new sheet

Page 52

Macro Management (newer)

In Visual Basic, from the toolbar:
•  File > Export File
   –  Exports VBA code (module: “SweitzerSimulationCoreCode”)
•  File > Import File
   –  Imports VBA code (into a module)

Page 53

Further resources

Commercial and free software packages provide:
•  More rigorous algorithms
•  More functions
   –  Resampling, multivariate, etc.
•  More support

Page 54

Commercial Add-Ins

@RISK www.palisade.com

Crystal Ball www.decisioneering.com

Page 55

Free Add-Ins

PopTools (Windows only): www.cse.csiro.au/poptools
SimTools.xla (Macintosh & Windows): http://home.uchicago.edu/~rmyerson/addins.htm

Caveat: licensing
•  Free for non-commercial use (e.g., education)
•  Not clear for other uses

(NB: VBA code from my website is free for all use, but not as useful)

Page 56

Semi-Commercial

Low-cost Excel simulation add-in:
•  RiskSim by Michael Middleton
•  www.treeplan.com/
•  Also: decision trees, sensitivity analysis, and an on-line textbook: http://www.treeplan.com/chapters.htm

Page 57

Additional Reading
•  Introduction to Modeling and Generating Probabilistic Input Processes for Simulation: www.informs-sim.org/wsc07papers/008.pdf
•  Spreadsheet Simulation (Seila, 2006): www.informs-sim.org/wsc06papers/002.pdf
•  Work Smarter, Not Harder: Guidelines for Designing Simulation Experiments: www.informs-sim.org/wsc06papers/005.pdf
•  Tips for the Successful Practice of Simulation: www.informs-sim.org/wsc06papers/007.pdf

Page 58

Probability Management

Built more elaborate models. Learned to
•  Display results in a column
•  Copy values to save
•  Do math with the results

Why not?
•  Save columns of simulated iterations
•  Recombine as needed

Page 59

Combining simulation results
•  I.e., portfolio optimization

Why not?
•  Save columns of simulated iterations
•  Recombine as needed

[Figure: 4 simulations = {2 studies} × {2 scenarios}: Study #1, Early Start; Study #1, Late Start; Study #2, Early Start; Study #2, Late Start]

Estimates of total:
•  Resources
•  Costs
•  Pr{success}

⇒ Pick optimal

Caveat: requires independence!
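Here is a minimal Python sketch of recombining saved columns. Each column stands in for the saved iterations of one study under one start scenario; all numbers, names, and the staffing capacity are hypothetical. Adding two columns row by row gives the combined total for each iteration. That pairing is legitimate only because the studies were simulated independently, as the caveat says.

```python
import random

def saved_column(mean, sd, n, seed):
    """Stand-in for one saved column of simulated iterations (hypothetical)."""
    rng = random.Random(seed)
    return [max(0.0, rng.gauss(mean, sd)) for _ in range(n)]

N = 5000
# Hypothetical peak staffing need (FTEs) for each study under each start scenario
columns = {
    ("Study 1", "Early"): saved_column(10.0, 3.0, N, seed=1),
    ("Study 1", "Late"): saved_column(8.0, 2.0, N, seed=2),
    ("Study 2", "Early"): saved_column(12.0, 4.0, N, seed=3),
    ("Study 2", "Late"): saved_column(9.0, 2.0, N, seed=4),
}
CAPACITY = 20.0   # hypothetical FTEs available

def pr_over_capacity(start1, start2):
    """Add the two studies' columns iteration by iteration. Pairing rows this
    way is valid only because the studies were simulated independently."""
    totals = [a + b for a, b in zip(columns[("Study 1", start1)],
                                    columns[("Study 2", start2)])]
    return sum(t > CAPACITY for t in totals) / N

plans = [(s1, s2) for s1 in ("Early", "Late") for s2 in ("Early", "Late")]
best = min(plans, key=lambda plan: pr_over_capacity(*plan))
print(best, round(pr_over_capacity(*best), 3))
```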

Page 60

Combining simulation iterations
•  Preserves relationships

Why not?
•  Save columns of simulated iterations
•  Recombine as needed

[Figure: 4 simulations = {2 studies} × {2 scenarios} (Study #1 and Study #2, each with Early and Late Start), all driven by a simulation of common factors, feeding estimates of …]

Page 61

Probability Management

Primary source for the rest of the presentation: Savage, Scholtes and Zweidler, 2006, "Probability Management," OR/MS Today, Vol. 33, No. 1 (February 2006)
•  http://www.orms-today.org/orms-2-06/frprobability.html
•  (Part 2) http://www.orms-today.org/orms-4-06/frprobability.html

Further research: Other people already doing it

Page 62

Basic idea

[Figure: simulations of common factors feed several dependent simulations, whose results flow to estimates of … and to reporting & analysis programs]

Page 63

Basic idea

[Figure: many simulations feeding reporting & analysis programs]

•  Database of simulation results
•  Results at the iteration level
•  Coherent

Multiple simulations:
•  Different platforms
•  Different sources
•  Different uses

Page 64

Basic Definitions

SIP: Stochastic Information Packet
•  Basic unit of information
•  E.g., “the price of oil”, but for 10,000 alternative universes

SLURP: Stochastic Library Unit with Relationships Preserved
•  SIPs are coherent with each other
   –  E.g., in each SIP, iteration #4567 is from the same alternative universe
•  Analogous to demographic “representative samples”
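As a data structure, a SLURP can be sketched as a set of named SIPs of equal length, where position i means the same alternative universe in every SIP. The Python below illustrates that assumption; the variables and their relationship are hypothetical. Keeping the iterations aligned preserves the dependence between oil price and shipping cost. Shuffling one SIP, which is what pairing unrelated simulation runs amounts to, destroys it.

```python
import math
import random

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

N = 10000
rng = random.Random(64)
oil_price = [rng.gauss(80.0, 15.0) for _ in range(N)]   # hypothetical SIP
# A SLURP sketched as named SIPs of equal length: index i is the same
# "alternative universe" in every SIP.
slurp = {
    "oil_price": oil_price,
    "shipping_cost": [0.5 * p + rng.gauss(0.0, 5.0) for p in oil_price],
}

coherent = pearson(slurp["oil_price"], slurp["shipping_cost"])
shuffled = slurp["shipping_cost"][:]
random.Random(1).shuffle(shuffled)        # breaks the iteration alignment
incoherent = pearson(slurp["oil_price"], shuffled)
print(f"aligned: {coherent:.2f}, shuffled: {incoherent:.2f}")
```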

Page 65

Basic Definitions (continued)

Benefits of coherent modeling:
•  Statistical dependencies are modeled consistently across the organization
•  Models can be “rolled up” between levels of the organization
•  Auditability: easier to audit individual simple models

Requires central control:
•  Common standards
•  Certification authority
   –  “Chief Probability Officer”

Page 66

Coherence

Example: variables X & Y
•  Coherent
•  But not correlated

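The slide's own X and Y are not given, so this hypothetical pair shows how coherence and correlation differ. Take X uniform on (−1, 1) and Y = X². Every iteration pairs Y with its own X, so the pair is fully coherent and Y is even determined by X. Yet the correlation is essentially zero, because the relationship is not linear.

```python
import math
import random

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(66)
x = [rng.uniform(-1.0, 1.0) for _ in range(20000)]
y = [v * v for v in x]      # Y is completely determined by X, iteration by iteration
print(f"correlation: {pearson(x, y):.3f}")   # near zero despite the dependence
```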

Page 67

DIST Standard (XML)

How to store SIPs?
•  Massive amounts of data

How to share SIPs?
•  Reduce precision and pack it!

•  10,000 numbers ⇒ 1 XML string
   –  Metadata + Base64 encoding of values
•  Contents:
   –  Name
   –  Mean, min, max, count of values
   –  Data type (binary, 1 or 2 bytes)
•  Base64: 3 bytes (8 bits each) into 4 characters (6 bits each)
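The Python below is a simplified illustration of why reduced precision plus Base64 packs a SIP into one string. It does not reproduce the DIST byte layout; the standard's exact scaling, byte order, and header rules are not shown here. It quantizes each value to a 2-byte integer between the stored min and max, then Base64-encodes the bytes, so every 3 bytes become 4 characters. The function names and sample values are hypothetical.

```python
import base64
import math
import struct

def pack_sip(values):
    """Quantize each value to 2 bytes on [min, max], then Base64-encode.
    Returns the metadata needed to decode along with the packed text."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0                      # avoid dividing by zero
    ints = [round((v - lo) / span * 65535) for v in values]
    raw = struct.pack(f">{len(ints)}H", *ints)   # big-endian unsigned shorts
    meta = {"min": lo, "max": hi, "count": len(values)}
    return meta, base64.b64encode(raw).decode("ascii")

def unpack_sip(meta, text):
    raw = base64.b64decode(text)
    ints = struct.unpack(f">{meta['count']}H", raw)
    span = (meta["max"] - meta["min"]) or 1.0
    return [meta["min"] + i / 65535 * span for i in ints]

values = [2.03 + 0.057 * k for k in range(100)]     # hypothetical SIP
meta, text = pack_sip(values)
restored = unpack_sip(meta, text)
expected_len = 4 * math.ceil(2 * len(values) / 3)   # 3 bytes -> 4 characters
print(len(text), expected_len, max(abs(a - b) for a, b in zip(values, restored)))
```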

Page 68

DIST Standard

•  Each cell contains an array
•  Operations apply functions to each element in the array
•  A SIP in DIST ⇒ fits into 1 cell on a spreadsheet

<dist name="User Interface, weeks" avg="3.3751" min="2.03" max="7.75" count="100" type="Double" origin="DistShaper3 at smpro.ca" ver="1.1" >G00Z9SIDCIEmC0nYFtMi6R0XKZ+KvSzBI85ui5tMZgoDlbGt dF1d/CqEMwUlmCfVMMg6oUByUXQyIATsaSw1QhgrhOwaaAI9D 6oks9M+IDk0XQyIDlI2mhJZBkQXRnm7IR45ST3D///IDlgrHD I38VraK2kLownZf41jWw1tROxTsS/jGRAUJCbwHfwougAAEXR r3A83FQnpnhXukBxM+kswBykeb0gOQ5RByk83PxtV7mCrH1QQ jy6LPGstpgFYRrYKvqZ9Ez8AAAAA</dist>

Source: Marc Thibault, Sam Savage. Probability Management for Projects: Managing Uncertainty in plan estimates and targets. October 2011

Page 69

Supporting Software


MS Excel spreadsheet add-ins:
•  Risk Solver from Frontline Systems (www.Solver.com)
•  XLSim 3 (www.VectorEconomics.com)
   –  Small (single-sheet) interactive simulation with DISTs
   –  Enables users of Oracle Crystal Ball and @Risk from Palisade Corp. to read and write DISTs
•  Analytica from Lumina Decision Systems, Inc. (www.Lumina.com)

SAS?
R/S+ -- already vector oriented
•  RExcel runs R from Excel ??

Page 70

R/S+
    > x1 <- rnorm(10000)     # a vector of 10,000 standard normal random values
    > y1 <- rpois(10000, 5)  # a vector of 10,000 Poisson(5) random values
    > (x1 + y1)[1:10]        # element-by-element operations; shows the first 10
•  Already handles vectors – very fast
•  Needs functions to encode & decode DIST

¿Accessing R from within a spreadsheet?
•  RExcel – access R from within Excel (add-in)
•  ROOo – access R from within the OpenOffice spreadsheet
•  Open source (like Linux)
•  (Perhaps) use the spreadsheet for the upper-level simulation
•  Use R at the lower level – each cell contains 1000s of simulated values

Page 71

Probability Management

Savage, Scholtes and Zweidler, 2006, "Probability Management," OR/MS Today, Vol. 33, No. 1 (February 2006)
•  http://www.orms-today.org/orms-2-06/frprobability.html
•  (Part 2) http://www.orms-today.org/orms-4-06/frprobability.html

Page 72

The End (Actual – not simulated)

