Monte Carlo Simulation and Resampling

Date post: 11-Dec-2016
Tom Carsey (Instructor)

Jeff Harden (TA)

ICPSR Summer Course

Summer, 2011

Introductions and Overview

What do I plan for this course?

What do you want from this course?

What are the expectations for everyone involved?

Overview of syllabus

What is the Objective?

The fundamental objective of scientific research is inference.

By that I mean we want to use the data we observe to draw

broader conclusions about a process we care about that

extend beyond our data.

We have a sample of data we can study, but the goal is to

learn about the population from which it came.

Monte Carlo simulations and resampling methods help us

meet these objectives.

Why Use Simulations?

Analysis where observable data is not available

Mimic the repeated sampling framework of classical

frequentist statistics

Provide solutions where analytic solutions are not available or

are intractable

Testing hypothetical processes

Robustness Checks

Mimics an experimental lab

What is a Monte Carlo Simulation?

Computer simulation that generates a large number of

simulated samples of data based on an assumed Data

Generating Process (DGP) that characterizes the population

from which the simulated samples are drawn.

Patterns in those simulated samples are then summarized

and described.

Such patterns can be evaluated in terms of substantive

theory or in terms of the statistical properties of some

estimator.

What is a DGP?

A DGP describes how the values of a variable of interest are

produced in the population.

Most DGP’s of interest include a systematic component and

a stochastic component.

We use statistical analysis to infer characteristics of the DGP

by analyzing observable data sampled from the population.

In applied statistical work, we never know the DGP – if we

did, we wouldn’t need statistical estimates of it.

In Monte Carlo simulations, we do know the DGP because

we create it.

What is Resampling?

Like Monte Carlo simulations, resampling methods use a

computer to generate a large number of simulated samples

of data.

Also like Monte Carlo simulations, patterns in these

simulated samples are then summarized, and the results used

to evaluate substantive theory or statistical estimators.

What is different is that the simulated samples are generated

by drawing new samples (with replacement) from the

sample of data you have.

In resampling methods, the researcher DOES NOT know or

control the DGP, but the goal of learning about the DGP

remains the same.
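A minimal sketch of the resampling idea described above. It assumes a hypothetical observed sample `obs` (simulated here only so the example runs) and uses the sample mean as the statistic of interest; the object names, seed, and number of resamples are illustrative choices, not part of the course materials.

```r
set.seed(2011)                      # illustrative seed
obs  <- rexp(200, rate = 1)         # stand-in for the one sample we actually have
reps <- 1000                        # number of resamples (an assumption)
boot.means <- numeric(reps)         # storage for the resampled statistic

for (i in 1:reps) {
  # Draw a new sample of the same size, with replacement, from the observed data
  resample <- sample(obs, size = length(obs), replace = TRUE)
  boot.means[i] <- mean(resample)
}

boot.se <- sd(boot.means)           # resampling-based estimate of the SE of the mean
boot.se
```

Note that the DGP that produced `obs` is never used inside the loop; only the observed values are.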

Simulations as Experiments

Experiments rest on control of the research environment.

Control is achieved by balanced (often through randomization)

assignment of observations to groups.

Then, all members of all groups are treated equally except

for one factor.

If differences emerge between groups, causality is attributed

to that factor, which is generally called the treatment effect.

Examples in applied research

Simulations as Experiments (2)

Computer simulations follow the same logic.

The computer is the “lab” and the researcher controls how

simulated samples are generated.

One factor is varied across groups of simulated samples and

any differences that appear are attributed to that factor

(again, generally called the treatment effect).

Simulations as Experiments (3)

The power of experiments rests in their control of the

environment and the resulting claims of causality.

Of course, finding out that the treatment causes some

response does not necessarily explain why that response

occurs.

The limitations of experimental work include:

They can quickly become very complex

Results may not generalize well to the (necessarily more

complex) real world outside of the lab.

Populations and Samples

The distinction between the population DGP and the

sample(s) of data we generate or have available to us is

important.

If the goal is inference (descriptive or causal), then we are

attempting to make statements about the population based

on some sample data.

The fundamental difference between Monte Carlo simulation

and resampling is that we create/control the population

DGP in Monte Carlo simulations, but not in resampling.

Both methods allow us to evaluate theoretical and/or

statistical assumptions.

Both methods offer opportunities to relax or eliminate some

statistical assumptions.

Monte Carlo Simulation of OLS

Ordinary Least Squares (OLS) Regression assumes some

dependent variable (often labeled Y ) is a linear function of

some set of independent variables (often labeled as X ’s), plus

some stochastic (random) component (often labeled as ε).

A set of parameters describes the relationship between the

X ’s and Y . They are often represented as β’s.

The Model might be represented like this:

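$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \varepsilon_i \quad (1)$$

Or like this in matrix notation:

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon} \quad (2)$$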

Figure: Component Parts of a Simple Regression. The horizontal axis is the Independent Variable (X), the vertical axis is the Dependent Variable (Y), and the plotted line is $y_i = \beta_0 + \beta_1 x_i$ with the intercept $\beta_0$ marked.

Monte Carlo Simulation of OLS (2)

Next we need to specify more about the stochastic

component of the model.

In OLS, we generally assume that the residual follows a

normal distribution with a mean of zero and a constant

variance. This can be expressed as:

$$\varepsilon_i \sim f_N(e_i \mid 0, \sigma^2) \quad (3)$$

where $\sigma^2$ represents a constant variance.

We have now specified the systematic and the stochastic

components of Y .

Monte Carlo Simulation of OLS (3)

We can rewrite these two components as follows:

$$Y \sim f_N(y_i \mid \mu_i, \sigma^2) \quad (4)$$

$$\mu_i = X\beta \quad (5)$$

This set-up models the randomness in Y directly, and makes

clear that the conditional mean of Y is captured by Xβ.

The value of this set-up is it can be generalized, like this:

$$Y \sim f(y \mid \theta, \sigma^2) \quad (6)$$

$$\theta = g(X, \beta) \quad (7)$$

This makes clear that the functions f and g must be clearly

specified as part of the DGP for Y .

Monte Carlo simulations focus on all the nitty-gritty of

specifying these functions.
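To make the general form in equations (6) and (7) concrete, here is a minimal sketch of a DGP in which g is an inverse-logit function and f is a Bernoulli distribution (which has no separate variance parameter). The parameter values, seed, and object names are assumptions chosen for illustration.

```r
set.seed(42)                          # illustrative seed
n  <- 1000
X  <- runif(n, -1, 1)
b0 <- 0.2                             # assumed true intercept
b1 <- 0.5                             # assumed true slope

theta <- plogis(b0 + b1 * X)          # g(X, beta): inverse logit gives P(Y = 1)
Y <- rbinom(n, size = 1, prob = theta)  # f: Bernoulli draws with probability theta

fit <- glm(Y ~ X, family = binomial)  # estimator matched to this DGP
coef(fit)
```

Swapping `plogis()` for another link, or `rbinom()` for another distribution function, changes g or f while leaving the rest of the simulation intact.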

Know Your Assumptions

To simulate a DGP with the goal of evaluating a statistical

estimator, you need to know the assumptions of that

estimator.

For OLS, the key ones are:

Independent variables are fixed in repeated samples

The model’s residuals are independently and identically

distributed (iid)

The residuals are distributed normally

No perfect collinearity among the independent variables

These assumptions must be properly incorporated into the

simulation, but then can be examined one by one through

repeating the simulation.

“Fixed in Repeated Samples – Really?”

In experimental analysis, this assumption is plausible.

Researchers often fix the exact values of the treatment

variables.

In observational analysis (like most of social science), it is

not. X ’s are random variables just like Y .

Thus, there is some DGP out there for the X ’s as well.

The key element of this assumption boils down to assuming

that the X ’s are uncorrelated with the residual (ε) from the

regression model.

In short, the DGP for the X ’s must be uncorrelated with the

DGP for the residuals.

We’ll see how measurement error in X messes this up.

Simulating OLS in R

set.seed(123456) # Set the seed for reproducible results
sims <- 500 # Set the number of simulations at the top of the script
alpha.1 <- numeric(sims) # Empty vector for storing the simulated intercepts
B.1 <- numeric(sims) # Empty vector for storing the simulated slopes
a <- .2 # True value for the intercept
b <- .5 # True value for the slope
n <- 1000 # Sample size
X <- runif(n, -1, 1) # Create a sample of n observations on the variable X.
# Note that this variable is outside the loop, because X
# should be fixed in repeated samples.
for(i in 1:sims){ # Start the loop
  Y <- a + b*X + rnorm(n, 0, 1) # The true DGP, with N(0, 1) error
  model <- lm(Y ~ X) # Estimate OLS model
  alpha.1[i] <- model$coef[1] # Put the estimate for the intercept in the vector alpha.1
  B.1[i] <- model$coef[2] # Put the estimate for X in the vector B.1
} # End loop

Figure: Simulated Distribution of Intercept and Simulated Distribution of Slope (two panels; the horizontal axis of each panel shows the Estimated Values of Parameters).

What Did We Learn?

We see that the estimated intercepts and slopes vary from

one simulated sample to the next.

We see that they tend to be centered very near the true

values we specified in the DGP.

We see that their distributions are at least bell-shaped, if not

perfectly normal.

We can learn a lot more, however, if we manipulate features

of the DGP, re-run the simulation, and then observe what, if

anything, changes.

I’ll leave the nuts and bolts to lab, but let’s look at one

example.

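As one hedged way to check the points on the "What Did We Learn?" slide numerically, the sketch below repeats the OLS simulation in compact form and summarizes the stored estimates. The helper `one.draw` and the matrix `est` are illustrative names, not part of the original script.

```r
set.seed(123456)
sims <- 500
n <- 1000
a <- 0.2                               # true intercept
b <- 0.5                               # true slope
X <- runif(n, -1, 1)                   # fixed across simulated samples

# One simulated sample: generate Y from the DGP and return the OLS estimates
one.draw <- function() {
  Y <- a + b * X + rnorm(n, 0, 1)
  coef(lm(Y ~ X))
}

est <- t(replicate(sims, one.draw()))  # sims x 2 matrix of estimates
colnames(est) <- c("intercept", "slope")

colMeans(est)        # centers of the simulated distributions (compare to a and b)
apply(est, 2, sd)    # spread of the estimates across simulated samples
```

Changing one line of the DGP inside `one.draw` and re-running is the "manipulate, re-run, and observe" step the slide describes.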

Multicollinearity in OLS

What is Multicollinearity?

What does it do to OLS results?

Let’s investigate this with a simulation

Multicollinearity Simulation

Model with 2 independent variables, correlated at .1, .5, .9,

and -.9

Each sample size is 1,000

I draw 1,000 simulated samples at each level of correlation

True values for β0 = 0, β1 = .5, and β2 = .5

Here is what I get

Figure: Density Estimates of B0, B1, and B2 by Level of Multicollinearity (three panels; lines for Low Corr, Medium Corr, and High Corr; horizontal axes show the Estimated Values of B0, B1, and B2).

Figure: Estimate of Beta 1 (horizontal axis) against Estimate of Beta 2 (vertical axis) for Population Correlation = 0, 0.5, and 0.9.

Figure: Diff in Cor of X1 and Y compared to X2 and Y (horizontal axis) against Estimate of Beta 1 (vertical axis), Population Correlation = 0.9.
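The following is a rough sketch of how a simulation like the one summarized in these figures could be set up; it is not the instructor's script. It assumes standard normal X's and N(0, 1) errors (the slides do not state either), builds the correlated X2 by hand, and uses 200 simulated samples per correlation level instead of 1,000 so that it runs quickly. The function name `slope.summary` is hypothetical.

```r
set.seed(99)
sims <- 200                         # fewer than the 1,000 on the slide, for speed
n <- 1000

slope.summary <- function(rho) {
  X1 <- rnorm(n)                                  # X's fixed across simulated samples
  X2 <- rho * X1 + sqrt(1 - rho^2) * rnorm(n)     # population correlation with X1 is rho
  b1.hat <- numeric(sims)
  for (i in 1:sims) {
    Y <- 0 + 0.5 * X1 + 0.5 * X2 + rnorm(n)       # true values from the slide
    b1.hat[i] <- coef(lm(Y ~ X1 + X2))[2]         # estimate of the slope on X1
  }
  c(mean = mean(b1.hat), sd = sd(b1.hat))
}

results <- sapply(c(0.1, 0.5, 0.9), slope.summary)
colnames(results) <- c("rho=.1", "rho=.5", "rho=.9")
round(results, 3)
```

The `mean` row speaks to bias and the `sd` row to efficiency; under these assumptions the spread of the slope estimates widens as the correlation rises while the center stays near .5.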

Randomness and Probability

Making inference requires use of probability and probability

theory.

We draw a sample, but we want to speak about the larger

population.

We can make those statements if we have a sense of the

probability of drawing the sample that we have.

The key element to drawing a useful sample is randomness.

For a sample to be random, it means that every element in

the larger population had a fair or equal chance of being

selected.

If the sample is large enough, it will include mostly “typical”

cases.

It will also include some “odd” cases.

When the sample is large, it will have enough cases that are

odd in different ways to cancel out, and enough typical cases

to outweigh the few odd ones.

Thus, large random samples give us a great deal of statistical

power.

Probability Model

To make inference, we need to develop a probability model

for the data.

We need to generate a belief about the probability that the

population of possible cases would produce a sample of

observations that looks like the one we have.

A probability model for a single variable describes the range

of possible values that variable could have and the

probability, or likelihood, of the various possible values

occurring in a random sample.

Note that OLS (logit, probit – really any single equation

model) is really a probability model about a single variable –

Y . It’s just a probability that is conditional on some X ’s.

Probability Model (cont.)

A random variable represents a random draw from the

population. Each data point is a particular realization of a

random variable.

That means that its value for the variable under

consideration is just one observed value from the whole

range of possible values that could have been observed.

The range of all possible values for a random variable is

called the distribution of that variable.

The shape of that distribution describes how likely we are to

observe particular values of a random variable if we were to

draw one out by chance. This is the probability distribution

for the random variable in question.

Drawing a Random Sample

Each observation is just one of many we could have selected.

Each entire sample is just one of many we could have

selected.

We could learn a lot about the probability distribution that

describes the population if we could draw lots and lots of

samples.

In observational work, we usually only have one sample in our

hands, which is why we end up making some assumptions

about the probability distribution that describes the larger

population from which our one sample was taken.

But in simulations, we can generate lots and lots of samples.

What is Probability?

Probability involves the study of randomness and uncertainty.

At a fundamental level, a probability is a number between 0

and 1 that describes how likely something is to occur.

Another way to think about it is how frequently an outcome

will result if an action, or trial, is repeated many times. This

so-called “Frequentist” view of probability lies at the heart of

classical statistical theory and the notion of repeated

samples.

The idea of an “expected” outcome is how we test

hypotheses. We compare what we observe to what we

expected given some set of assumptions and we try to decide

how likely it was to observe what we observed.

Example: Flipping a Coin

Suppose I have a coin and I toss it in the air. What is the

probability that it will come up “Heads?”

We can make an assumption about the coin being fair and

assert based on that assumption that the probability is .5.

Or we could flip the coin a lot of times and see how

frequently we get Heads. If the coin is fair, it should be

about half the time.

The first approach defines a probability as a logical

consequence based on assumptions.

The second approach relies on the law of large numbers to

approach the true probability. The law of large numbers says

that increasing the number of observations leads the

observed average to converge toward the true average.

The coin toss example is shown in the next slide.

Figure: Cumulative Frequency of the Proportion of Coin Flips that come up Heads (horizontal axis: Number of Trials, 0 to 500; vertical axis: proportion of successes).
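A minimal sketch of the coin-toss illustration of the law of large numbers, assuming a fair coin and 500 flips as in the figure. The seed and object names are illustrative.

```r
set.seed(7)                                     # illustrative seed
flips <- rbinom(500, size = 1, prob = 0.5)      # 1 = Heads, 0 = Tails
running.prop <- cumsum(flips) / seq_along(flips)  # proportion of Heads after each flip

# Proportion of Heads after 10, 50, 100, and 500 flips
running.prop[c(10, 50, 100, 500)]
```

Calling `plot(running.prop, type = "l")` would draw a line of the kind shown in the figure; early values bounce around, and later values settle near .5.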

Randomness Again

Random does NOT mean haphazard or chaotic.

Any one observation or trial might be hard to predict.

However, Random Variables are systematic. They follow

rules and they show stability in the long run.

Again, the way to think about this is that the range of

possible values and how likely each one is to occur is

described by a probability distribution. You either need to

assume what that distribution is, find a way to uncover it, or

find methods that are robust to various distributions.

Properties of Probabilities

All probabilities fall between 0 and 1.

The probability of some event, E, happening, often written

as P(E ), is defined as 0 ≤ P(E ) ≤ 1.

The sum of the probabilities of all possible outcomes must

equal 1.

If E is a set of possible outcomes for an event, then P(E ) will equal the sum of the probabilities of all of the events

included in set E .

Finally, for any set of outcomes E , P(E ) + P(not E ) = 1. In

other words, P(E ) + (1− P(E )) = 1.

Conditional Probability

So far, we have been dealing with independent events.

When the probability of one outcome changes depending on

some other factor, then the probability is conditional on

that other factor.

For example, the probability that a citizen might turn out to

vote could depend upon whether that person lives in a place

where the campaign is close and hotly contested.

The conditional probability of event E happening given that

event F has happened, is generally written like this: P(E |F ).

Conditional Probability (cont.)

A conditional probability can be computed like this:

$$P(E \mid F) = \frac{P(E \cap F)}{P(F)}$$

From this, we can say E is independent of F if and only if

P(E |F ) = P(E ) (which also implies that P(F |E ) = P(F )).

Another way to think about independence is that two events

E and F are independent if:

P(E ∩ F ) = P(E )P(F )

This second expression captures what is called the

“multiplicative” rule regarding conditional probabilities.
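The conditional probability formula can be checked by simulation. The sketch below uses two fair dice as a hypothetical example (it is not from the slides): event E is "the total is at least 10" and event F is "the first die shows a 6". These two events are not independent, so P(E | F) should differ from P(E).

```r
set.seed(36)                                  # illustrative seed
n <- 100000
die1 <- sample(1:6, n, replace = TRUE)
die2 <- sample(1:6, n, replace = TRUE)

ev.E <- (die1 + die2) >= 10                   # event E: total of at least 10
ev.F <- die1 == 6                             # event F: first die shows a 6

p.E <- mean(ev.E)                             # estimate of P(E); exact value is 1/6
p.E.given.F <- mean(ev.E & ev.F) / mean(ev.F) # P(E and F) / P(F); exact value is 1/2

c(P.E = p.E, P.E.given.F = p.E.given.F)
```

For a pair of independent events, such as "first die is even" and "second die is even", the same calculation would return a conditional probability close to the unconditional one.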

Probability Distributions

Describes the range of possible values and probability of

observing those values in a random draw (with replacement).

PDF - Probability Distribution Function (Discrete) or

Probability Density Function (Continuous)

CDF - Cumulative Distribution/Density Function.

Total Area under a PDF sums to 1.

The CDF records the accumulated probability as it

approaches 1.

PDF of the Normal

Figure: PDF of a Random Variable, X, Distributed Normally with mean=0 and sd=1

CDF of the Normal

Figure: CDF of a Random Variable, X, Distributed Normally with mean=0 and sd=1

PDF’s and CDF’s

Sum of area under a PDF equals 1.

The CDF shows this summation across the range of the

variable.

Note the Mass of the PDF centered around the Mean.

That is because the Expected Value of a random Variable is

its Mean.

OLS is about estimating the probability that Y (the

dependent variable) takes on some value conditional on the

values of the X , or independent, variables. These conditional

probability models are predicting the expected value of the

dependent variable, which is the mean.

Random Variables

Two types of Random Variables — Discrete and Continuous.

These are roughly similar to categorical and continuous.

Discrete Random Variables can only take on integer values

“Heads” or “Tails”

“Strongly Agree”, “Agree”, “Disagree”, “Strongly Disagree”

A count of objects or events — How many “Heads” or How

many “Wars”?

Continuous Random Variables can take on any value on the

Real Number line.

Discrete Example

Toss a coin three times and record the number of “Heads.”

(a count)

One possible sequence of how three tosses might come out

is (H,T,T).

The set of all possible outcomes, then, is the set:

{(H,H,H), (H,H,T), (H,T,H), (T,H,H), (H,T,T), (T,H,T), (T,T,H), (T,T,T)}

If X is the number of Heads, it can be either 0,1,2, or 3. If

the coin we are tossing is fair, then each of the eight possible

outcomes is equally likely.

Thus, $P(X = 0) = \frac{1}{8}$, $P(X = 1) = \frac{3}{8}$, $P(X = 2) = \frac{3}{8}$, and $P(X = 3) = \frac{1}{8}$.

Discrete (cont.)

This describes the discrete probability distribution function of

X .

Note that the individual probabilities of all of the possible

events sum to 1.

The cumulative probability distribution function would be

represented like this:

$P(X \le 0) = \frac{1}{8}$, $P(X \le 1) = \frac{4}{8}$, $P(X \le 2) = \frac{7}{8}$, and $P(X \le 3) = \frac{8}{8}$.

You can use the sample() function in R to generate

observations of a discrete random variable.

My.Sample <- sample(k, size=n, prob=p, replace=TRUE)

Example using sample()

Tossing 3 coins 800 times

set.seed(23212) # Allows results to be reproduced
n <- 800 # Sample size I want to draw
k <- c("0 Heads", "1 Head", "2 Heads", "3 Heads") # Possible outcomes
p <- c(1,3,3,1)/8 # Probability of getting 0, 1, 2, or 3 Heads
My.Sample <- sample(k, size=n, prob=p, replace=TRUE)
table(My.Sample) # Tabulate the simulated outcomes

0 Heads 1 Head 2 Heads 3 Heads
94 312 293 101
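The probabilities used for the three-coin example do not have to be typed in by hand. As a small sketch, they can be recovered from R's binomial functions, since the number of Heads in three fair tosses is a Binomial random variable with size 3 and probability .5. The object names `pmf` and `cdf` are illustrative.

```r
pmf <- dbinom(0:3, size = 3, prob = 0.5)   # P(X = 0), P(X = 1), P(X = 2), P(X = 3)
cdf <- cumsum(pmf)                          # P(X <= 0), ..., P(X <= 3)
names(pmf) <- names(cdf) <- 0:3

pmf * 8    # numerators over 8: 1, 3, 3, 1
cdf * 8    # numerators over 8: 1, 4, 7, 8
```

The vector `pmf` could be passed directly as the `prob` argument of `sample()`, and `pbinom(0:3, size = 3, prob = 0.5)` returns the same values as `cdf`.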

Probability Distributions

Common Discrete Distributions include: Bernoulli, Binomial,

Multinomial, Poisson, Negative Binomial

Common Continuous Distributions include: Uniform, Normal,

Chi-squared (or χ²), F, and Student-t.

The last three are sampling distributions that have a degrees

of freedom parameter.

PDF’s of discrete distributions are represented as spike plots

while PDF’s of continuous distributions are represented as

density plots.

Spike Plot of a Binomial

Figure: PDF of a Binomial Random Variable with n=8 and p=.5

Continuous PDF’s

Continuous PDF’s don’t really describe the probability of

getting any precise value because the probability of getting

any precise value is effectively 0.

Rather, they are used to describe the probability of getting a

value that falls between an upper and lower bound.

We can consider one tail of the distribution, two tails, or all

but the tails.

Example with the Normal

Areas Under a Normal PDF

Figure: Shaded Areas Under a Normal Distribution. The panels mark X = −1.5 and X = 1.5 and shade $P(X \le -1.5)$, $1 - P(X \le 1.5)$, and $P(X \le -1.5) + (1 - P(X \le 1.5))$.
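The shaded areas can be computed with `pnorm()`, the CDF of the normal distribution in R. This is a short sketch for a standard normal variable with cut points at −1.5 and 1.5, matching the figure; the object names are illustrative.

```r
lower.tail <- pnorm(-1.5)                  # P(X <= -1.5)
upper.tail <- 1 - pnorm(1.5)               # 1 - P(X <= 1.5)
both.tails <- lower.tail + upper.tail      # the two tails combined
middle     <- pnorm(1.5) - pnorm(-1.5)     # everything except the tails

round(c(lower = lower.tail, upper = upper.tail,
        tails = both.tails, middle = middle), 4)
```

Because the normal distribution is symmetric, the two tails are equal, and the tails and the middle together account for the whole area of 1.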

Probability is about uncertainty and randomness.

Randomness does not mean haphazard.

Random variables follow probability distributions that can be

defined with some assumptions or through frequentist

repeated samples.

The expected value of a random variable is the mean of the

distribution from which it was drawn.

Thus, we build probability and conditional probability models

for data based on classical probability theory.

Generating Random Variables

R has many functions that generate random variables that

follow many types of distributions.

runif() # Random Uniform distribution

rnorm() # Random Normal distribution

rt() # Random Student’s T distribution

rf() # Random F distribution

rchisq() # Random Chi-Square distribution

rbinom() # Random Binomial distribution

If you type help(Distributions) in R , you will get a

complete listing of those built into R . Many others are

available in other packages.

Generating Random Variables (2)

However, you are not limited to only those distributions

already programmed into R .

You can simulate a random draw from any PDF if you know

the formula for the PDF.

When thinking about a probability distribution, you need to consider the number of parameters that describe its location and shape. Key elements to consider include:



Range of valid values

Skewness (symmetry of the distribution)

Kurtosis (“peakedness” of middle; “heaviness” of tails)

When selecting a distribution function for generating a

random variable, you have to make sure it is producing the

type of variable you want.
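One common way to draw from a distribution that is not built into base R is the inverse-CDF method: draw a uniform value and push it through the inverse of the target CDF. The sketch below does this for the Laplace distribution, which has a closed-form inverse CDF. The function name `rlaplace` and its arguments are hypothetical (base R has no such function), and this is only one of several possible approaches.

```r
# Hypothetical helper: n draws from a Laplace(location, scale) distribution
rlaplace <- function(n, location = 0, scale = 1) {
  u <- runif(n) - 0.5                                 # uniform on (-0.5, 0.5)
  location - scale * sign(u) * log(1 - 2 * abs(u))    # inverse CDF of the Laplace
}

set.seed(314)                                         # illustrative seed
draws <- rlaplace(10000)

# Theory for Laplace(0, 1): mean 0 and variance 2 * scale^2 = 2
c(mean = mean(draws), var = var(draws))
```

Checking the simulated mean and variance against their theoretical values, as in the last line, is a simple way to confirm the function is producing the type of variable you want.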

Examples of Random Variables

Suppose I want a vector of 10 values randomly and uniformly

distributed between 0 and 1?

> Random10 <- runif(10)
> Random10
[1] 0.51932983 0.03848523 0.29820136 0.15254877 0.26798912
[6] 0.28751082 0.82063644 0.92177149 0.30496555 0.60416280

Now, what if I repeat the command? Will they be the same?

> Random10 <- runif(10)
> Random10

[1] 0.19536123 0.87823454 0.49350686 0.67970321 0.03955143

[6] 0.75172914 0.56054510 0.33262119 0.35444109 0.19775575

Why are they different?

Well, they are random (100,000,001 numbers between 0 and

1, inclusive, out to 8 decimal places)

Actually, they are pseudo-random numbers

Pseudo-Random Number Generators

Pseudo-random number generators are actually complex

computer formulas that generate long strings of numbers

that behave as if they were random.

They insert a starting value, called a seed, into the formula,

and then it cycles.

So, you can re-create a “random” sequence by starting with

the same seed.

R picks a new seed when you start a new session. STATA

picks the same seed at the start of every session.
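To show what "a formula that cycles from a seed" can look like, here is a toy linear congruential generator. It is purely illustrative: R's default generator is the Mersenne Twister, not this formula, and the function name `lcg` and its constants (the classic "minimal standard" multiplier and modulus) are choices made for this sketch.

```r
# Toy pseudo-random number generator: state <- (a * state) mod m
lcg <- function(n, seed, a = 16807, m = 2^31 - 1) {
  x <- numeric(n)
  state <- seed                     # the seed is the starting value
  for (i in seq_len(n)) {
    state <- (a * state) %% m       # deterministic update rule
    x[i] <- state / m               # rescale to the interval (0, 1)
  }
  x
}

first  <- lcg(5, seed = 682879)
second <- lcg(5, seed = 682879)
identical(first, second)            # TRUE: same seed, same "random" sequence
```

Nothing in the function is random; the output only looks random, which is exactly why re-using a seed reproduces the sequence.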

Setting the Seed

You can set the seed in R using the set.seed() function

> set.seed(682879)
> Random5 <- runif(5)
> Random5
[1] 0.3506136 0.9191146 0.6758455 0.9105095 0.8402629
> Random5 <- runif(5)
> Random5
[1] 0.32960079 0.70555853 0.23793750 0.68339820 0.10286161
> set.seed(682879)
> Random5 <- runif(5)
> Random5
[1] 0.3506136 0.9191146 0.6758455 0.9105095 0.8402629
> Random5 <- runif(5)
> Random5
[1] 0.32960079 0.70555853 0.23793750 0.68339820 0.10286161

Setting the Seed (2)

Very important to know how software sets the seed and

whether it resets it automatically or not.

As noted, R resets its seed to something new every time you

open the software, but STATA resets its seed to the same

value every time the software is opened.

A website for a group called Random.org

(http://www.random.org/) offers truly random numbers

based on atmospheric noise and a discussion of them.

Example of a Random Normal Variable

Using the rnorm() function

> set.seed(17450)
> Normal500 <- rnorm(500, mean=5, sd=2)
> Normal500
[1] 3.3420737 5.7393314 7.2544833 1.2088551 7.5385345 4.7587521

[493] 4.8450481 4.4310396 5.8419443 4.3937012 4.5475633 8.5981948
[499] 3.0382390 6.1508388

The first argument sets N. The second sets the Mean, and

the third sets the Standard Deviation.

We can check the Mean and SD like this:

> mean(Normal500)
[1] 5.052378
> sd(Normal500)
[1] 1.940603

Simulations as experiments give researchers new leverage we

don’t have in observational analysis.

Interesting models have both systematic and stochastic

components.

Getting the distribution of the stochastic component right is

critical for inference and a major topic of focus for


Monte Carlo simulations let you define the population DGP.

Resampling methods do not.

Properties of Statistical Estimators

There are three basic properties of statistical estimators that researchers might want to evaluate using Monte Carlo simulations:

Bias

Efficiency

Consistency

Bias is about getting the right answer on average.

Efficiency is about minimizing the variance around an

estimate.

Consistency is about getting closer and closer to the right

answer as your sample size increases.
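Consistency in particular lends itself to a quick simulation check: repeat the same estimator at increasing sample sizes and watch the estimates tighten around the truth. The sketch below does this for the sample mean of a normal variable with true mean 5 and standard deviation 2; the sample sizes, number of simulations, and object names are assumptions made for illustration.

```r
set.seed(58)                                   # illustrative seed
sims <- 500
sample.sizes <- c(10, 100, 1000)

# For each sample size, simulate the estimator many times and record its spread
spread <- sapply(sample.sizes, function(n) {
  estimates <- replicate(sims, mean(rnorm(n, mean = 5, sd = 2)))
  sd(estimates)
})
names(spread) <- sample.sizes

spread    # spread of the estimates around the true mean at each sample size
```

Comparing the center of `estimates` to the true value would address bias, and comparing `spread` across two competing estimators at the same sample size would address efficiency.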

Figure: Illustration of Bias and Inefficiency of Parameter Estimates (panels: Unbiased and Inefficient; Biased and Inefficient; Biased and Efficient; Unbiased and Efficient)

Properties of Statistical Estimators (2)

In the OLS/GLM context, most tend to equate bias with the

estimates of the β’s and efficiency with the estimates of

their standard errors.

At one level, this makes sense. We want to know if our point

estimates are unbiased, and the Standard Errors measure

their distribution (which we generally want to be small).

This is O.K. in some settings, but this is not exactly right.

The parameters and their standard errors that are computed

using sample data are both estimates of something. Either

could be biased (i.e., systematically wrong) or inefficient

(estimated with less precision than we’d like).

Monte Carlo simulation can be used to evaluate both.

Evaluating Bias

Again, Bias is about systematically getting the wrong answer.

One way to measure it is absolute bias:

abs(True Parameter - Simulated Parameter)

You can repeat the simulation multiple times and compute

the mean of this difference and also show its distribution.

Next, you might vary some feature of the simulation and

show how changing that feature affects absolute bias.

An example.

Figure: Impact of Measurement Error on Absolute Bias in Simple OLS (horizontal axis: Measurement Error Variance, 0 to 1; vertical axis: Absolute Bias)

Evaluating Bias (2)

In this example, the initial impact of measurement error

appears to be small. It then grows more rapidly, but that

growth rate appears to slow down.

In this example, True β1 = .5, True X ranges from -1 to 1,

but observed X has random measurement error distributed

normally with a mean of zero and variance that grows to 1.

Correct interpretation of the previous figure requires knowing

the scale of all of these bits of information. If True β1 equalled 27, then absolute bias that never exceeds .4 is not

large.

What about the ratio of variance in Observed X due to True

X versus measurement error? In this case, the maximum of

1 results in a variance in Observed X of about 1.3, while the

variance in True X equals about .33.
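A rough sketch of a measurement-error simulation along the lines described here; it is not the code behind the figure. It uses True β1 = .5 and True X drawn uniformly from -1 to 1 as stated, but the intercept of 0, the N(0, 1) regression error, the number of simulations, and the grid of error variances are assumptions. The function name `abs.bias` is hypothetical.

```r
set.seed(63)                                   # illustrative seed
sims <- 200
n <- 1000
b0 <- 0                                        # assumed intercept
b1 <- 0.5                                      # true slope from the slide
X.true <- runif(n, -1, 1)                      # true X, variance of about .33

# Mean absolute bias of the OLS slope for a given measurement error variance
abs.bias <- function(me.var) {
  est <- replicate(sims, {
    Y     <- b0 + b1 * X.true + rnorm(n)               # DGP uses the true X
    X.obs <- X.true + rnorm(n, 0, sqrt(me.var))        # X observed with error
    coef(lm(Y ~ X.obs))[2]                             # slope from the mismeasured X
  })
  mean(abs(b1 - est))
}

me.grid <- c(0, 0.25, 0.5, 1)
bias <- sapply(me.grid, abs.bias)
names(bias) <- me.grid
bias
```

Under these assumptions the slope is pulled toward zero by roughly the ratio of the variance of True X to the variance of Observed X, which is why the absolute bias at an error variance of 1 stays below .4.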

Evaluating Bias (3)

Simulations for bias must consider the plausible ranges of

values for X and the factor that might cause bias.

One option would be to re-label each axis in the figure to

express the relative bias and proportion of variance in X due

to measurement error.

Another thing to notice in the figure is how the distribution of

absolute bias changes as the level of measurement error

changes. The variance is lowest at very low and very high

levels of measurement error. Why?

At low values of error, the parameter is consistently

estimated near the true value. At high values of error, the

parameter is consistently estimated to be near zero.

This is clearer if I double the maximum variance of the

measurement error from 1 to 2.

Figure: Impact of Measurement Error on Absolute Bias in Simple OLS (horizontal axis: Measurement Error Variance, 0 to 2; vertical axis: Absolute Bias)

Efficient Estimates of Parameters

There might be several ways to estimate a parameter – How

can we evaluate their efficiency?

In a case like multicollinearity, we can see that slopes are less

efficiently estimated as multicollinearity increases by looking

at standard error estimates.

However, it is not accurate to say that the method with the

smallest standard error is the most efficient as a general

rule. We can have standard errors that are wrongly estimated to

be small.

Violations of OLS assumptions that have efficiency implications DO NOT

always inflate standard error estimates.

Better to look at the distribution of the simulated values of

the parameter in question.

Efficient Estimation of the “Average”

Two very common methods of measuring the “Average” or central tendency of a variable are the Mean and the Median.

The Mean is the sum of all values divided by N

The Median is the middle value – the 50th percentile value.

In a single variable case with a symmetric distribution, both

will provide unbiased estimates of the central tendency of the

distribution.

Which is more efficient?


A Simulation Study


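# Compare the Mean and the Median as estimators of central tendency
Sims <- 10000                                 # number of simulated samples
N <- 100                                      # size of each sample
Results <- matrix(NA, nrow = Sims, ncol = 2)  # column 1: means, column 2: medians
for (i in 1:Sims) {
  Y <- runif(N)                               # one sample from a Uniform(0,1)
  Results[i, 1] <- mean(Y)
  Results[i, 2] <- median(Y)
}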



Figure: Measures of Central Tendency from a Uniform(0,1) Variable (x-axis: Measures of Central Tendency, 0.40 to 0.60)

Figure: Measures of Central Tendency from a Standard Normal Variable (x-axis: Measures of Central Tendency, -0.4 to 0.4)

Results of Simulation

We clearly see that in either case, the Mean is a more

efficient estimator of central tendency than is the Median.

They look more similar when the underlying distribution from

which the sample is being drawn is normal rather than

uniform, but that’s also a function of scales, so be careful.

Notice we used the distribution of the estimates themselves

– we did not compute a standard error.

What we’ve done regarding bias and efficiency for parameter

estimates could also be applied to estimates of standard

errors (they too can be right or wrong, and they too can be

widely dispersed or tightly clustered in repeated samples).

Epilogue: is the Mean always more efficient?
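One way to explore the epilogue question is to repeat the same simulation with a Laplace(0,1) variable. The sketch below is an illustration, not the code behind the slides. Base R has no Laplace generator, so it relies on the fact that the difference of two independent Exponential(1) draws follows a Laplace(0,1) distribution. The number of simulations is reduced to 2,000 to keep it quick.

```r
# Sketch: Mean vs. Median when the data come from a Laplace(0, 1).
# rexp(N) - rexp(N) is one way to draw from a Laplace(0, 1) in base R.
set.seed(202)
Sims <- 2000
N <- 100
Results <- matrix(NA, nrow = Sims, ncol = 2,
                  dimnames = list(NULL, c("mean", "median")))
for (i in 1:Sims) {
  Y <- rexp(N) - rexp(N)
  Results[i, "mean"] <- mean(Y)
  Results[i, "median"] <- median(Y)
}
# Spread of each estimator across the simulated samples
spread <- apply(Results, 2, sd)
print(round(spread, 4))
```

Under this DGP the simulated medians should be more tightly clustered than the simulated means, the reverse of the uniform and normal cases.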


Figure: Measures of Central Tendency from a Laplace(0,1) (x-axis: Measures of Central Tendency, -0.4 to 0.4)

Performance of Standard Errors

A Standard Error is meant to serve as a measure of the

uncertainty of a parameter estimate.

It can be thought of as an estimate of the standard deviation of

all possible estimates of a given parameter based on equally

sized samples randomly drawn from the same population.

We generally use Standard Errors for hypothesis testing and

the construction of confidence intervals.

Still, any analytic computation of a standard error relies on

some assumptions – if those assumptions are not met, the

formula will not produce a proper estimate of the standard error.


If the standard error is wrong, our hypothesis tests and

confidence intervals will be wrong.
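The definition above suggests a direct check: in a simulation, the standard deviation of the simulated estimates can be compared with the average of the standard errors reported by the model. The sketch below does this for a simple OLS slope. The DGP (intercept 0.2, slope 0.5, standard normal X and errors) is an assumption chosen only for illustration.

```r
# Sketch: does the reported standard error match the actual spread
# of the estimates across repeated samples?
set.seed(303)
sims <- 1000
n <- 100
b0 <- 0.2; b1 <- 0.5            # assumed true parameters
est <- numeric(sims)
se <- numeric(sims)
for (i in 1:sims) {
  x <- rnorm(n)
  y <- b0 + b1 * x + rnorm(n)   # all OLS assumptions hold here
  fit <- summary(lm(y ~ x))$coefficients
  est[i] <- fit["x", "Estimate"]
  se[i] <- fit["x", "Std. Error"]
}
sd_of_estimates <- sd(est)      # simulated sampling variability
mean_reported_se <- mean(se)    # what the formula says on average
print(c(sd_of_estimates = sd_of_estimates,
        mean_reported_se = mean_reported_se))
```

When the assumptions hold, as they do in this DGP, the two numbers should be close. Changing the error term (for example, making its variance depend on x) is one way to see them diverge.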


What is a Confidence Interval?

Suppose we run a regression and see the following results:

            Coefficient   Standard Error
Constant        0.5            0.2
X1              1.3            0.4
X2              2.8            1.6

Assuming a large sample, a normal distribution, etc., we could compute a 95% confidence interval for the coefficient operating on X1 like this:

95% CI = 1.3 ± 1.96*0.4

95% CI = 0.516 to 2.084

I can do the same for the coefficient operating on X2:

95% CI = 2.8 ± 1.96*1.6

95% CI = -0.336 to 5.936

How would you interpret these results?
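The arithmetic above can be reproduced in R. This short sketch uses the coefficients and standard errors from the table and the same 1.96 multiplier; the object names est, se, and ci are just illustrative.

```r
# Sketch: 95% confidence intervals from the table of results above
est <- c(Constant = 0.5, X1 = 1.3, X2 = 2.8)
se  <- c(Constant = 0.2, X1 = 0.4, X2 = 1.6)
z <- 1.96                       # large-sample normal multiplier
ci <- cbind(lower = est - z * se, upper = est + z * se)
print(ci)
```

The interval for X1 lies entirely above zero, while the interval for X2 includes zero.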


Confidence Intervals (cont.)

The 95% CI has the estimated parameter at its center, and

extends ± 1.96 standard errors if we assume the coefficient

estimates are normally distributed.

How to interpret this?

If I had a lot of samples drawn from the same population,

95% of the CI’s I computed like this would contain the True

value of the parameter.

In any one sample, the CI either does or does not include the

True parameter – you can’t make a probabilistic statement

about it (e.g. you do not have a 95% chance that your CI

includes the true value).

What it does suggest is a plausible range of values for the parameter.



Performance of Standard Errors (2)

Thus, one way to evaluate the performance of standard

errors in a Monte Carlo simulation is to determine whether

they meet their intended definition: In a large number of

repeated samples, a CI set at XX% should include the true

population parameter XX% of the time.

If it includes the True parameter more than it should, the

confidence interval is too large and you risk accepting a Null

hypothesis when it is False.

If it includes the True parameter less than it should, the

confidence interval is too small and you risk rejecting a Null

hypothesis when it is True.


Coverage Probabilities

In R, what you need to do is compute a confidence interval

at a given level (let’s say XX%) each time through the

simulation (each of the 1,000 iterations).

At each point, check to see if that confidence interval

contains the True population parameter or not (and you set

the Truth, so you know what it is).

Record a 1 when it does and a 0 when it does not.

The percentage of times you score a 1 equals the percentage

of times that your confidence interval included the True parameter.


If this percentage is approximately equal to XX%, your

standard error estimates are accurate.
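The steps above can be sketched as follows. The DGP (a simple OLS model with a true slope of 0.5) is an assumption made for illustration, and confint() is used to build the interval at each iteration.

```r
# Sketch: coverage probability of a 95% CI for an OLS slope
set.seed(404)
sims <- 1000
n <- 100
b1 <- 0.5                       # the Truth, set by the researcher (assumed value)
level <- 0.95
covered <- numeric(sims)
for (i in 1:sims) {
  x <- rnorm(n)
  y <- 0.2 + b1 * x + rnorm(n)
  ci <- confint(lm(y ~ x), parm = "x", level = level)
  # Record a 1 if the CI contains the Truth and a 0 if it does not
  covered[i] <- as.numeric(ci[1, 1] <= b1 && b1 <= ci[1, 2])
}
coverage <- mean(covered)
print(coverage)
```

Because the number of simulated samples is finite, the result will not equal 0.95 exactly. With 1,000 iterations, values within a couple of percentage points of 0.95 are what accurate standard errors would produce.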


What About Type II Error?

Coverage probabilities describe the proportion of estimated

confidence intervals that contain the true population

parameter. An accurate 95% CI corresponds to a 5%

probability of Type I error – rejecting a Null hypothesis that

is True.

What about Type II error – the failure to reject a Null

hypothesis that is false?

For Type I error, there is only one true parameter to compare

to the CI that is computed.

For Type II error, there are an infinite number of False Null hypotheses.


Pick a plausible one (say, one exactly 1.96 standard errors

away from the True parameter), then compute the proportion

of times your simulated CI includes that plausible False Null.
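A minimal sketch of that calculation, assuming the simplest possible setting: the parameter is the mean of a normal variable with a standard deviation of 1, so the true standard error of the sample mean is 1/sqrt(n). The False Null is placed exactly 1.96 true standard errors above the True mean. All names and values are illustrative.

```r
# Sketch: how often does a 95% CI include a specific False Null?
set.seed(505)
sims <- 2000
n <- 100
true_mean <- 0
true_se <- 1 / sqrt(n)                     # sd of the sample mean when sd(Y) = 1
false_null <- true_mean + 1.96 * true_se   # a plausible False Null
includes_null <- logical(sims)
for (i in 1:sims) {
  y <- rnorm(n, mean = true_mean, sd = 1)
  ci <- t.test(y, conf.level = 0.95)$conf.int
  includes_null[i] <- ci[1] <= false_null && false_null <= ci[2]
}
type2_rate <- mean(includes_null)          # share of samples failing to reject
print(type2_rate)
```

For a False Null this close to the Truth, roughly half of the simulated intervals should include it. Moving false_null farther away lowers that share.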


Choosing between Bias or Inefficiency?

Which should I worry about more, bias or inefficiency?

Classical Frequentists, Shrinkage Models, Bayesians

In any given sample, your parameter estimates might deviate

from the Truth because they are biased or because there is

variance in their estimation.

One way to approach this is to adopt a strategy that

considers both factors.

Mean Squared Error.


Mean Squared Error

Mean Squared Error (MSE) is exactly what it sounds like –

you compute a series of errors or differences, you square

each of those differences, and you compute the mean.

This is commonly reported for OLS models as the MSE of

the regression by computing the MSE of the model residuals.

But this can be applied to anything, including parameter estimates.


In a Monte Carlo simulation, I can estimate lots of slope

coefficients. Each time, I can compute the difference

between the estimated value and the True value and then

square that difference.

The mean of those squared differences is the MSE


Mean Squared Error (2)

If the MSE = 0, then the estimator always perfectly recovers

the population parameter. Of course, that is not realistic.

If the estimator is unbiased, then the observed squared errors

capture only sampling variance – our uncertainty about the

parameter estimate.

If the estimator is biased, then the observed squared errors

capture both this bias and sampling variance.

Specifically, the MSE of $\hat{\theta}$ is $\mathrm{MSE}(\hat{\theta}) = \mathrm{Var}(\hat{\theta}) + \mathrm{Bias}(\hat{\theta}, \theta)^2$

So MSE is a method of comparison that considers both Bias

and Inefficiency in evaluating performance, where smaller

MSE is better.
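The decomposition can be checked numerically. The sketch below compares the sample mean with a deliberately biased alternative (the sample mean multiplied by 0.9). Both the DGP and the shrunken estimator are hypothetical choices made only to have one unbiased and one biased estimator to compare.

```r
# Sketch: MSE = variance + squared bias, computed from simulated estimates
set.seed(606)
sims <- 5000
n <- 50
truth <- 2
est_unbiased <- numeric(sims)
est_shrunk <- numeric(sims)
for (i in 1:sims) {
  y <- rnorm(n, mean = truth, sd = 3)
  est_unbiased[i] <- mean(y)
  est_shrunk[i] <- 0.9 * mean(y)   # hypothetical biased estimator
}
mse_parts <- function(est, truth) {
  bias <- mean(est) - truth
  variance <- mean((est - mean(est))^2)   # divides by sims, not sims - 1
  c(mse = mean((est - truth)^2), variance = variance, bias_sq = bias^2)
}
parts <- rbind(unbiased = mse_parts(est_unbiased, truth),
               shrunk = mse_parts(est_shrunk, truth))
print(round(parts, 4))
```

The variance here divides by the number of simulations rather than by one less, which is what makes the identity hold exactly in the simulated values.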


Limitations of MSE

It is a loss function that considers both Bias and Inefficiency,

but just one specific loss function – a quadratic one.

The implied weighting of Bias and Inefficiency might not be

the ratio you desire.

MSE is sensitive to outliers. Means are more sensitive to

outliers, and squaring differences also emphasizes large differences.


Alternatives include using the mean of absolute errors rather

than squared errors, or using methods that rely on medians

rather than means.

You will see examples in Lab.


Consistency is about an estimator converging toward the

true value as sample size increases.

This assumption gets scant attention in OLS, but is

fundamental to MLE, where the small sample properties are


This raises a more general concern with the finite sample

properties of estimators compared to their asymptotic

properties. (e.g. Beck and Katz, 1995).

Simulations can be extremely valuable in revealing finite

sample properties.

This is the same as saying that an easy factor to vary in a

Monte Carlo simulation is the size of each simulated sample

that you draw.
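As an illustration of varying the sample size, the sketch below uses an estimator that is biased in small samples but consistent: the variance estimator that divides by n instead of n - 1. This estimator and the sample sizes are assumptions chosen for the example, not taken from the slides.

```r
# Sketch: finite-sample bias that fades as the sample size grows
set.seed(707)
sims <- 5000
sample_sizes <- c(10, 100, 1000)
true_var <- 1
results <- sapply(sample_sizes, function(n) {
  est <- replicate(sims, {
    y <- rnorm(n, sd = sqrt(true_var))
    mean((y - mean(y))^2)        # divides by n, so biased in small samples
  })
  c(n = n, bias = mean(est) - true_var, sd = sd(est))
})
print(round(t(results), 4))
```

Both the bias and the spread of the estimates should shrink toward zero as n increases, which is what consistency requires.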


Other Performance Evaluations

You can evaluate the performance of models on all sorts of other factors. These might include:

Explained Variance

Within Sample Predictive accuracy

Out of Model Forecasting

You can add a parsimony discount factor (or use things like


The burden is on the researcher to identify a characteristic

that is appropriate and a way to measure performance on

that characteristic.

The trick is to make sure your simulation is doing what you

think it is doing.


Simulation Error

Simulation error can emerge from a number of places:

The most common is operator error – you make a mistake in

your program OR in your logic.

You stumble across an oddity in the pseudo random number

generator – I generally run simulations several times starting

from different seeds to guard against this.

Simulations themselves are probabilistic. You randomly draw

some finite sample of data, and you randomly draw some

finite number of those samples. Larger N at either stage can

have implications for your study, though some might limit

the idea of simulation error just to the number of samples

you draw.


Other Reasons to do Simulations

Evaluate Distributional Assumptions of Estimators

Evaluating the range of DGP’s that might produce a variable.

Evaluate the behavior of a statistic that has no or weak

support from analytic theory.

Robustness of sample estimates to different distributional assumptions.



Distributional Assumptions

Monte Carlo simulations are well suited to evaluating

distributional assumptions of models.

Since you control the DGP, you can vary the distributional

assumption and observe if/how the results change.

The important question is often one of magnitude.



Normal Distribution in OLS

OLS assumes the residuals of the model are drawn from a

normal distribution.

What if the distribution has high Kurtosis? You can look at

different distributions like the Laplace, or you can vary it

using the Student t at different degrees of freedom or the

Pearson Type VII.

What if the distribution is skewed? You can draw a vector

of size N from a Chi-2 Distribution, then standardize the

values of that vector. The result will be a vector of random

observations with a mean of zero, a variance of 1, and a

positive skew that shrinks as the degrees of freedom in the

Chi-2 distribution increase.
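A minimal sketch of the standardized Chi-square approach follows. The degrees of freedom (1, 5, and 20) and the helper names skewness and std_chisq are illustrative choices. In theory the skewness of a Chi-square variable is sqrt(8/df), so the skew is strongest at low degrees of freedom.

```r
# Sketch: standardized Chi-square draws as skewed error terms
set.seed(808)
N <- 10000
skewness <- function(z) mean((z - mean(z))^3) / mean((z - mean(z))^2)^1.5
std_chisq <- function(df) as.vector(scale(rchisq(N, df = df)))  # mean 0, sd 1
dfs <- c(1, 5, 20)
draws <- sapply(dfs, std_chisq)
colnames(draws) <- paste0("df_", dfs)
summary_tab <- rbind(mean = colMeans(draws),
                     var = apply(draws, 2, var),
                     skew = apply(draws, 2, skewness),
                     theory_skew = sqrt(8 / dfs))
print(round(summary_tab, 3))
```

Each column can be used directly as the stochastic component of a simulated Y, for example y <- b0 + b1 * x + draws[, "df_5"] with x of the same length.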


Figure: Simulated Standardized Chi-Square Distributions (x-axis from -2 to 4)

Distributional Assumptions

Sometimes we have multiple estimators we could use to

estimate a model that might vary in the distributional

assumption. How can we tell which one to use?

An example is Jeff’s work on using OLS or Median

Regression (MR) to estimate a linear model.

OLS estimates the conditional mean of Y and assumes the

residuals are drawn from a normal distribution.

MR estimates the conditional median of Y and assumes the

residuals are drawn from a Laplace distribution.

We’ll save it for Lab.


The Beta Distribution

The Beta distribution is particularly useful if you want to

explore a range of distributions over the 0-1 space.

The Beta distribution is governed by two parameters, often called a and b or α and β.

A Beta(α = 1, β = 1) is the uniform(0,1) distribution

A Beta(α < 1, β < 1) is U-shaped

A Beta(α < 1, β ≥ 1) is strictly decreasing

A Beta(α > 1, β > 1) is unimodal

A Beta where both α and β are positive, but with α < β will

have positive skew; α > β will have negative skew.

This makes it extremely flexible in exploring how estimators

behave over different distributional shapes.
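The shapes listed above can be explored with rbeta(). The parameter pairs in this sketch are illustrative picks, one for each of several cases, and the skewness helper is defined here because base R does not provide one.

```r
# Sketch: a few Beta shapes and their simulated skewness
set.seed(909)
N <- 10000
skewness <- function(z) mean((z - mean(z))^3) / mean((z - mean(z))^2)^1.5
shapes <- list(uniform = c(1, 1),       # alpha = beta = 1
               u_shaped = c(0.5, 0.5),  # alpha < 1, beta < 1
               right_skew = c(2, 5),    # alpha < beta
               left_skew = c(5, 2))     # alpha > beta
beta_summary <- t(sapply(shapes, function(s) {
  y <- rbeta(N, shape1 = s[1], shape2 = s[2])
  c(alpha = s[1], beta = s[2], mean = mean(y), skew = skewness(y))
}))
print(round(beta_summary, 3))
```

Plotting hist(rbeta(N, 0.5, 0.5)) and the other cases gives a quick visual check of each shape.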


Figure: Examples of different Beta distributions

Substantive Example using Beta

Mooney (1997, p. 72-77)

Lijphart and Crepaz (1991) score the United States as a

-1.341 on a standardized scale of corporatism.

According to Lijphart and Crepaz (1991, p. 235),

corporatism, “. . . refers to an interest group system in

which groups are organized into national, specialized,

hierarchical and monopolistic peak organizations.”

Since this is a standardized score, it should have a mean of

zero and a standard deviation of 1.

The question is: is the level of corporatism in the U.S.

significantly lower than average?


Example (2)

There are two problems:

We don’t have a good theory about the proper probability

distribution for the DGP.

Even if we did, they only measured 18 countries.

Thus, analytic approaches are unwise because they depend on

strong theory and/or large samples (e.g. asymptotic


Mooney shows that we can use a simulation to get a sense

of how likely a score of -1.341 is and what sorts of

probability distributions for the DGP are likely or unlikely to

produce such scores.


Example (3)

Mooney’s simulation:

Defines a range of Beta distributions

Draws a very large sample from that distribution

Standardizes the sample (thus, mean of zero like the original


Records attributes of the sample (level of α, β, kurtosis, and skewness)


Computes the proportion of observations that fall below -1.341


Treats that proportion as the Probability of Type 1 error (e.g.

level of statistical significance)


Example (4)

I ran this simulation ranging both α and β from 1 through 30

NOTE: a Beta(1,1) is a uniform distribution, a Beta(30,30)

is effectively a normal distribution. Those in between have

various levels of skewness and kurtosis.

I drew samples of 10,000 from each distribution.

I plotted the resulting levels of Type 1 error as a function of

these attributes of the various Beta distributions.
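The replication described above might look something like the sketch below. It is not the original code: the integer grid for α and β, the moments helper, and the output column names are assumptions. It follows the stated steps of drawing 10,000 values from each Beta, standardizing, recording skewness and kurtosis, and computing the share of values below -1.341.

```r
# Sketch: share of standardized Beta draws below -1.341, over a grid of shapes
set.seed(1010)
N <- 10000
cutoff <- -1.341
moments <- function(z) {
  m2 <- mean((z - mean(z))^2)
  c(skew = mean((z - mean(z))^3) / m2^1.5,
    kurt = mean((z - mean(z))^4) / m2^2)
}
grid <- expand.grid(alpha = 1:30, beta = 1:30)   # assumed integer grid
out <- t(mapply(function(a, b) {
  z <- as.vector(scale(rbeta(N, a, b)))          # standardize: mean 0, sd 1
  c(alpha = a, beta = b, moments(z), p_below = mean(z < cutoff))
}, grid$alpha, grid$beta))
out <- as.data.frame(out)
print(head(out))
```

Plots such as plot(out$alpha, out$p_below) or plot(out$skew, out$p_below) then show how the proportion varies with each attribute.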


Figure: Mooney Replication, Slide 1 (x-axis: Level of A, 0 to 30; y-axis: Type 1 Error)

Figure: Mooney Replication, Slide 2 (x-axis: Level of B, 0 to 30; y-axis: Type 1 Error)

Figure: Mooney Replication, Slide 3 (axes: Level of A, Level of B, Type 1 Error)

Figure: Mooney Replication, Slide 4 (x-axis: Level of Kurtosis; y-axis: Type 1 Error)

Figure: Mooney Replication, Slide 5 (x-axis: Level of Skewness, -2 to 2; y-axis: Type 1 Error)

Figure: Mooney Replication, Slide 6 (axes: Level of Kurtosis, Level of Skewness, Type 1 Error)

What Did We Learn?

Under a wide range of distributions, a score of -1.341 or

lower occurs more than 5% of the time in large samples.

A score of -1.341 is rare when α is relatively low, and this rarity is not

responsive to β.

More specifically, this is most likely when there is a positive

skew to the DGP.

This makes sense since a variable with a positive skew has a

short tail on the left and long tail on the right. If the mean is

0, then a short tail on the negative side makes -1.341

relatively rare.

Is the U.S. significantly below average? Only if there is a

strong positive skew to the DGP


Statistics with No Analytic Support

Some statistics have weak or no theoretical/analytic support at all, and many others have weak support in small samples. Mooney (1997) notes a few:

The ratio of two correlated regression coefficients (Bartels


Jackman’s (1994) estimator of legislative vote to seat bias.

Difference between two medians.

You know enough now to imagine the approach:

Simulate data with plausible characteristics (e.g. define the

systematic and stochastic components of the DGP)

Compute the statistic in question

Examine the simulated distribution of the statistic

Alter one feature at a time in your DGP, repeat the

simulation, and observe any changes in the pattern describing

your statistic.
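The approach above can be sketched for one of the listed statistics, the difference between two medians. The DGP here (two independent groups of 50 drawn from the same Exponential(1) distribution) is purely an assumption for illustration.

```r
# Sketch: simulated distribution of a difference between two medians
set.seed(1111)
sims <- 2000
n <- 50
diff_medians <- replicate(sims, {
  g1 <- rexp(n, rate = 1)     # assumed DGP for group 1
  g2 <- rexp(n, rate = 1)     # assumed DGP for group 2 (same population)
  median(g1) - median(g2)
})
sim_se <- sd(diff_medians)                            # simulated standard error
sim_ci <- quantile(diff_medians, c(0.025, 0.975))     # central 95% of estimates
print(sim_se)
print(sim_ci)
```

To follow the last step, change one feature at a time (for example n, or the rate for one group), rerun, and compare the new sim_se and sim_ci with these.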


Your Results and Your Data

Another great use of simulations centers on evaluating the

robustness of your findings from the analysis of your sample

of data.

In this sort of study, your sample data and initial analysis provide the information you need to define the population DGP for the simulation study.

Use the actual values of the X’s in your data, or generate X’s

that look like your observed X’s in your sample.

Use your OLS estimates of the β’s as the values of your

population parameters.

Simulate the stochastic component of Y based on the

observed residuals of your model.

Use all of this to generate simulated samples of Y.

Your simulation then re-runs your analysis using your X’s and

simulated Y’s.
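The recipe above can be sketched with R's built-in mtcars data standing in for "your" sample. The model (mpg regressed on wt) is a hypothetical placeholder, and the stochastic component is simulated by resampling the observed residuals with replacement, which is one of several ways to base it on the residuals.

```r
# Sketch: use your own X's and OLS estimates to define the DGP
set.seed(1212)
sims <- 1000
fit <- lm(mpg ~ wt, data = mtcars)   # stand-in for your original analysis
b_true <- coef(fit)                  # OLS estimates treated as the Truth
X <- model.matrix(fit)               # actual X values, including the constant
res <- resid(fit)                    # observed residuals
sim_coefs <- t(replicate(sims, {
  e_sim <- sample(res, size = length(res), replace = TRUE)
  y_sim <- as.vector(X %*% b_true) + e_sim   # simulated Y
  coef(lm(y_sim ~ X[, "wt"]))                # re-run the analysis
}))
colnames(sim_coefs) <- names(b_true)
recovered <- colMeans(sim_coefs)
print(rbind(truth = b_true, recovered = recovered))
```

Replacing the resampling line with draws from rnorm(), using the residual standard deviation, gives a parametric version of the same idea.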


Your Results and Your Data (2)

You can evaluate the simulation in two basic ways:

Does your simulation recover the “True” parameters?

How similar are your simulated Y’s to your actual observed

Y’s?

Of course, the real power comes when you begin to

manipulate features of your simulation and then re-evaluate

the performance of your statistical analysis as noted above.

Features you might vary include: N, attributes of X,

attributes of the stochastic component of the model, etc.

This allows you to determine how sensitive your original

results are to different assumptions or different features in

the sample of data you have.


Other Kinds of Data

Everything thus far has involved continuous variables and continuous probability distributions. Of course, there are lots of variables (and associated probability distributions) that are not continuous in the population DGP or are at least not observed as continuous. Common ones include:

Dichotomous variables

Ordered categorical variables

Unordered categorical variables

Count variables

And other kinds of data structures, including:

Clustered/Multi-level data.

Panel and Time Series Cross Section (TSCS) data.

We’ll let Jeff do that in Lab! (except . . .)


Time Series Cross Section Data

Data that has observations for the same set of multiple units

across multiple time periods (e.g. 50 states over 30 years).

Tremendous debate on the proper way to analyze such data.

Political Analysis (2007: 15(2)) Special Issue

Political Analysis (2011:19(2)) Symposium on Fixed-Effects

Vector Decomposition

Simple Question: Does a lagged value of Y capture unit

fixed effects?


TSCS (2)

The TSCS Model looks like this:

$Y_{it} = X_{it}\beta + e_{it}$ (8)

With a residual like this:

$e_{it} = \mu_i + \alpha_t + \varepsilon_{it}$ (9)

Unit effects capture the history of each unit.

They may be viewed as fixed or random (too much to sort

out now)

But, does the previous value of Y also capture that history?


A Simulation Study

Simulated Yit for 50 units over 50 years.

I set Yit as a function of its own past value (parameter =


I varied the degree of unit effect by adding a random unit

effect drawn from a normal distribution with a zero mean

and a variance that ranged from 0 to 4 in increments of 0.5.

I drew 1,000 samples at each level of unit effect.

I estimated two models: Y regressed just on its lagged value

and Y regressed on its lagged value plus a full set of unit

fixed effects.

I plot the simulated parameters operating on the lagged

value of Y for both models at each level of unit effect.
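A reduced version of this design can be sketched as follows. Several values are assumptions made for the sketch rather than the settings used for the slides: the coefficient on lagged Y is set to 0.5, only one level of unit effect variance (2) is shown, the series start at zero, and just 100 samples are drawn to keep the run time short.

```r
# Sketch: lagged Y with and without unit fixed effects
set.seed(1313)
units <- 50
periods <- 50
rho <- 0.5        # assumed coefficient on lagged Y
unit_var <- 2     # one level of unit effect variance
sims <- 100       # reduced from 1,000 for speed

sim_once <- function(unit_var) {
  mu <- rnorm(units, mean = 0, sd = sqrt(unit_var))   # random unit effects
  y <- matrix(0, nrow = periods + 1, ncol = units)    # rows: time, columns: units
  for (p in 2:(periods + 1)) {
    y[p, ] <- rho * y[p - 1, ] + mu + rnorm(units)
  }
  dat <- data.frame(y = as.vector(y[-1, ]),
                    ylag = as.vector(y[-(periods + 1), ]),
                    unit = factor(rep(1:units, each = periods)))
  c(pooled = coef(lm(y ~ ylag, data = dat))[["ylag"]],
    fe = coef(lm(y ~ ylag + unit, data = dat))[["ylag"]])
}

res <- replicate(sims, sim_once(unit_var))   # 2 x sims matrix of estimates
means <- rowMeans(res)
spread <- apply(res, 1, sd)
print(round(rbind(means, spread), 3))
```

Wrapping the replicate() call in a loop over seq(0, 4, by = 0.5) reproduces the full range of unit effect variances described above.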


Figure: nine panels, one per level of unit effect variance (Unit Var = 0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0); x-axis: Estimated B1, 0.3 to 0.9; legend: No Unit Effects, Unit Effects

Results of TSCS Simulation

Failure to account for fixed effects when they are present

results in positive bias in the estimate of the coefficient

operating on the lagged value of Y.

Not shown, but it is clear that controlling for fixed effects

when they are present significantly improves the fit of the

model. In other words, just including the lagged value of Y is

NOT sufficient to capture unit effects.

Some evidence that the coefficient operating on the lagged

value of Y is biased slightly downward when a full set of

fixed effects are included.

Variance in the parameter is larger when fixed effects are

included (likely due to multicollinearity)


Wrapping Up Monte Carlos

Monte Carlo simulations as experiments.

Vast array of applications for substantive and methodological


Great teaching tool.

Can get very complex very quickly.

Be careful – make sure your simulation is doing what you

think it is doing.

