Post on 16-Jul-2015
• Causal inference
• Experiment definitions
• Validity in experimentation
• Selected experimental designs
• Critique an experiment
Class Outline
• An instrument X (e.g., price) is said to be causally
associated with response Y (e.g., sales) if changes in X
cause changes in Y in a pre-specified direction with high
probability.
• Causality:
– ΔX => ΔY with high probability.
• Quantified Causality:
– ΔX = 1 => ΔY = β with probability p(β).
What is causality?
1. Concomitant variation (statistical association)
2. Time order: X must occur before Y
3. Falsification: rejection of alternative explanations by
holding all other factors constant.
How to establish causality?
• A study found that the average life span of famous
orchestra conductors was 73.4 years, significantly higher
than the life expectancy for males, 68.5 years. Jane
Brody in her New York Times health column reported
that this was thought to be due to arm exercise.
• What extraneous variable can also explain the above?
Causal inference example: Aging Conductors
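One commonly suggested extraneous variable is survival to adulthood: nobody becomes a famous conductor as a child, so conductors are drawn only from people who have already lived long enough to have a career. The sketch below illustrates that selection effect with made-up lifespans. The distribution, the age threshold `MIN_AGE_TO_BE_FAMOUS`, and all numbers are assumptions for illustration, not data from the study.

```python
import random

rng = random.Random(1)
# Hypothetical male lifespans in years: a crude mix of early deaths and
# adult deaths. The numbers are illustrative, not real demographic data.
lifespans = [
    rng.uniform(0, 30) if rng.random() < 0.15 else rng.gauss(73, 10)
    for _ in range(10_000)
]

def mean(values):
    return sum(values) / len(values)

# Assumption: nobody becomes a famous conductor before age 30,
# so every conductor has already survived to at least 30.
MIN_AGE_TO_BE_FAMOUS = 30
eligible = [age for age in lifespans if age >= MIN_AGE_TO_BE_FAMOUS]

# Average lifespan of everyone versus those who could have become conductors.
print(round(mean(lifespans), 1), round(mean(eligible), 1))
```

In this toy setup the eligible group has the higher average lifespan even though no one exercised their arms, which is the kind of alternative explanation the falsification criterion asks you to rule out.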
An experiment attempts to check these three criteria
for causality by:
1. manipulating x, then observing the corresponding y,
2. holding all other factors constant,
3. measuring association.
1. Identify the true constructs of interest in the real world:
instrument X, response Y, population P.
2. Establish for each of the above a proxy in the
experimental study: x, y, p.
3. Assign the experimental units to one or more groups.
The groups must be at parity, in that the groups must be
equivalent in all respects other than the x variable
4. Measure the values of the response variable for each
item in each of the groups
5. Compute the causal effect of the instrument change
The Experimental Procedure
1. Randomly sample 100 consumers.
2. Randomly assign: 50 see package design “A”, 50 see package design “B”.
3. Count the number of purchases of your brand in each group.
Marketing Experiment Example: Package Design
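The package design experiment can be sketched in a few lines. The function name `run_package_experiment` and the purchase probabilities are hypothetical and stand in for real consumer behavior; only the 100 consumers and the 50/50 random split come from the slide.

```python
import random

def run_package_experiment(consumers, buy_prob, seed=0):
    """Randomly split consumers into two equal groups and count purchases.

    buy_prob maps design label -> assumed purchase probability (made up).
    """
    rng = random.Random(seed)
    shuffled = consumers[:]
    rng.shuffle(shuffled)              # random assignment
    half = len(shuffled) // 2
    groups = {"A": shuffled[:half], "B": shuffled[half:]}
    # Simulated response: each consumer buys with the group's probability.
    purchases = {
        design: sum(rng.random() < buy_prob[design] for _ in members)
        for design, members in groups.items()
    }
    return groups, purchases

consumers = list(range(100))           # stand-in for 100 sampled consumers
groups, purchases = run_package_experiment(consumers, {"A": 0.30, "B": 0.45})
effect = purchases["B"] - purchases["A"]   # estimated effect of B versus A
print(purchases, effect)
```

Because assignment is random, the two groups should be at parity on everything except the package design, so the difference in purchase counts is a reasonable estimate of the design's effect.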
• Definitions
– Factor: Explicitly manipulated variable.
– Levels: The values a factor is allowed to take.
– Treatment: Combined levels of factors that an individual is
exposed to.
– Control Group: No treatment.
– Measurement: Recording of response.
– Subject: Object of treatment.
Experimental Design: Definitions
• Effects
– Treatment effects: Effects of interest
• Manipulation check
– Experimental effects: Unintended effects
• Impact of measurement
– Other-variable effects: Effects of ignored extraneous
variables
– Randomness
Experiments: Effects
• Internal validity
– The extent to which the observed results are due to the experimental
manipulation.
– Problems: Alternative explanations for the change in y that have
nothing to do with the change in x, and that therefore falsify the statement
that the change in y was caused by the change in x. (Most common problem:
“selection bias”, where the two groups are not at parity.)
• External validity
– The degree to which the experimental results are likely to hold beyond the
experimental setting.
– Problems: x, y, p being poor proxies for X, Y, P.
• Usually there is a tradeoff between the two.
• Without internal validity, external validity means nothing.
Validity
• Passage of time
– History effect (H): Events external to the experiment that
affect the responses of the people involved in the
experiment.
– Maturation effect (M): Changes in the respondents that are
a consequence of time, such as aging, getting hungry, or
getting tired.
Threats to Internal Validity
• Testing
– Testing effect (T): The fact that someone has been
measured previously might affect their future behavior
(e.g., desire to be consistent).
– Interactive Testing Effect (IT): The prior measurement
affects perceptions of the experimental variable (e.g.,
question about Coke’s brand awareness affects processing
of Coke’s advertising).
Threats to Internal Validity
• Data
– Instrument variation (IV): The method used to collect data
changes within the experiment (e.g., questionnaire,
interviewer, etc.).
– Statistical regression (SR): Regression towards the mean.
If an event is extreme it is likely to revert towards the mean
on its next occurrence (e.g., salesperson had an
exceptional year).
Threats to Internal Validity
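Statistical regression is easy to see in a simulation. The sketch below assumes a simple model in which a salesperson's yearly sales are a stable skill component plus independent yearly luck. The model and every number in it are assumptions for illustration.

```python
import random

rng = random.Random(42)
N = 2000
# Assumed model: yearly sales = stable skill + independent yearly luck.
skill = [rng.gauss(100, 10) for _ in range(N)]
year1 = [s + rng.gauss(0, 10) for s in skill]
year2 = [s + rng.gauss(0, 10) for s in skill]

# Pick the top 5% of salespeople by year-1 sales (the "exceptional year").
cutoff = sorted(year1)[int(0.95 * N)]
top = [i for i in range(N) if year1[i] >= cutoff]

def mean(values):
    return sum(values) / len(values)

top_y1 = mean([year1[i] for i in top])
top_y2 = mean([year2[i] for i in top])
# Year-1 and year-2 averages for the top group, then the overall year-2 average.
print(round(top_y1, 1), round(top_y2, 1), round(mean(year2), 1))
```

With no treatment at all, the top group's second-year average falls back toward the overall mean. An experiment that selects units because of an extreme score can mistake this drift for a treatment effect.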
• Sample
– Selection bias (SB): If units self-select themselves into the
treatment and control groups then this is of serious
concern if the selection reason is related to the outcome of
interest.
– Experimental Mortality (EM): The sample becomes
unrepresentative.
– Differential Experimental Mortality (DEM): Mortality may be
different across groups.
Threats to Internal Validity
• x, y, and p being poor proxies for X, Y, and P
• Non-representative sample, environment, and materials
used.
Threats to External Validity
• O Any formal observation or measurement
• X Exposure of the experimental units to the treatment
• EG Experimental group
• CG Control group
• R Random Assignment
Common Notation for Experiments
• Toyota wants to find out the effectiveness of a new
advertising campaign on potential customers
• What are the following?
– X (treatment)? TV commercials (interspersed through TV
shows)
– Y (response)? Attitudes toward Toyota cars
– P (population)? Potential Toyota car buyers
Common Experimental Designs: Toyota Example
Effect: O2 − O1 = E + B = E + H + M + T + IT + IV + EM, where E is the treatment effect and B is the total bias
Before-After Design Without Control Group (One Group
Pre-test/Post-test Design)
Before-After Design With Control Group (Two Group
Pre-test/Post-test Design)
Effect: (O2 − O1) − (O4 − O3)
Biases: SB, DEM, and IT
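A short derivation may help show why adding a control group removes most of the biases. It assumes that O1 and O2 are the before and after measurements of the experimental group, that O3 and O4 are the before and after measurements of the control group, and that biases add up linearly. The symbols B_EG and B_CG (total bias in each group) are introduced here for illustration.

```latex
\begin{aligned}
O_2 - O_1 &= E + B_{EG} \\
O_4 - O_3 &= B_{CG} \\
(O_2 - O_1) - (O_4 - O_3) &= E + (B_{EG} - B_{CG})
\end{aligned}
```

Under these assumptions, biases that affect both groups equally (history, maturation, testing, instrument variation) appear in both B_EG and B_CG and cancel. What remains are biases that can differ between the groups: selection bias, differential experimental mortality, and the interactive testing effect, which only arises where the treatment is present.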
• Factorial Design
– We test the effect of the manipulation of 2 or more
treatments at one time in which every level of each factor
is observed with every level of every other factor.
Experimental designs with more than two factors
• Example:
• Price: 3 levels ($2.0, $1.75, $1.50)
• Advertising: 2 levels (None and Some)
• Coupons: 2 levels (No and Yes)
• This could be called a 3x2x2 factorial design. You will
have 12 EGs, where each EG receives one combination
of the treatment levels.
Factorial Design
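The 12 cells of the 3x2x2 design can be enumerated mechanically. The sketch below uses the factor levels from the example. The dictionary layout and variable names are just one convenient way to write it.

```python
from itertools import product

factors = {
    "price": ["$2.00", "$1.75", "$1.50"],
    "advertising": ["None", "Some"],
    "coupons": ["No", "Yes"],
}
# Full factorial: every level of each factor crossed with every level of the others.
cells = [dict(zip(factors, combo)) for combo in product(*factors.values())]
for number, cell in enumerate(cells, start=1):
    print(f"EG{number}: {cell}")
```

The number of cells is the product of the number of levels, which is why full factorial designs grow quickly as factors are added.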
• An interaction occurs when the effect of one
experimental factor depends on the level of another
experimental factor.
• Interactions can mask or weaken experimental effects if
they are not taken into account.
• Example: The effectiveness of a spokesperson depends
on the type of product.
Interactions
Absence of Interaction: 2 x 2 Example
[Figure: mean response plotted against Factor A (Level 1, Level 2), with one line for each level of Factor B. Panel: "No Interaction".]
Presence of Interaction: 2 x 2 Example
[Figure: two panels, "Cross over" and "Spread", each plotting mean response against Factor A (Level 1, Level 2), with one line for each level of Factor B.]
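In a 2 x 2 design, one simple way to quantify an interaction is a difference of differences: the effect of Factor A at one level of Factor B minus its effect at the other level. The function name `interaction_contrast` and the cell means below are made up to illustrate the no-interaction, cross-over, and spread patterns.

```python
def interaction_contrast(means):
    """Difference of differences for a 2 x 2 table of cell means.

    means[(a, b)] is the mean response at Factor A level a, Factor B level b.
    """
    effect_of_a_at_b1 = means[(2, 1)] - means[(1, 1)]
    effect_of_a_at_b2 = means[(2, 2)] - means[(1, 2)]
    return effect_of_a_at_b2 - effect_of_a_at_b1

# Made-up cell means for the three patterns.
no_interaction = {(1, 1): 10, (2, 1): 14, (1, 2): 13, (2, 2): 17}
cross_over = {(1, 1): 10, (2, 1): 14, (1, 2): 14, (2, 2): 10}
spread = {(1, 1): 10, (2, 1): 11, (1, 2): 10, (2, 2): 16}
for name, table in [("none", no_interaction), ("cross over", cross_over), ("spread", spread)]:
    print(name, interaction_contrast(table))
```

A contrast of zero corresponds to parallel lines in the plot. In the cross-over table the effect of Factor A changes sign across levels of Factor B, and in the spread table it keeps its sign but changes size.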
• If you don’t care about interactions
– There is a lot of redundancy in a factorial design.
– You can create a reduced set of cells by eliminating
redundant profiles.
– Most statistical packages will design experiments for you.
Fractional Factorial Design
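As a minimal sketch of the idea, consider three two-level factors coded -1 and +1 (a simplification; the earlier price example has three levels). A standard half fraction keeps only the runs where the product of the three codes is +1. The factor names A, B, and C are placeholders.

```python
from itertools import product

# Full 2^3 design with levels coded -1 / +1 for factors A, B, C.
full = list(product([-1, 1], repeat=3))
# Half fraction with defining relation I = ABC: keep runs where A*B*C == +1.
half = [run for run in full if run[0] * run[1] * run[2] == 1]

for run in half:
    print(run)
```

The four remaining runs are still balanced, so each main effect can be estimated. The price is that each main effect is confounded with a two-factor interaction (in this fraction the A column equals the product of the B and C columns), which is acceptable only if interactions can be ignored.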
• A strategy for eliminating biases in measuring treatment
effects due to self-selection.
• What if the sample sizes in the groups are so small that the
assumptions behind t-tests, z-tests, etc. do not hold?
• Use randomization tests of significance (e.g., Fisher’s test).
Randomization
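A randomization test can be written directly from its definition: if the treatment had no effect, every way of splitting the pooled responses into two groups was equally likely, so the p-value is the share of splits that give a difference at least as extreme as the one observed. The sketch below is a generic exact permutation test on a difference in means, not necessarily the specific procedure the slide has in mind, and the data are made up.

```python
from itertools import combinations

def randomization_test(treated, control):
    """Exact two-sided randomization test for a difference in means."""
    pooled = treated + control
    n_treated = len(treated)
    n_control = len(pooled) - n_treated
    total = sum(pooled)

    def mean_diff(treated_sum):
        return treated_sum / n_treated - (total - treated_sum) / n_control

    observed = mean_diff(sum(treated))
    count = extreme = 0
    # Enumerate every possible assignment of units to the treated group.
    for idx in combinations(range(len(pooled)), n_treated):
        diff = mean_diff(sum(pooled[i] for i in idx))
        count += 1
        if abs(diff) >= abs(observed) - 1e-12:   # tolerance for float ties
            extreme += 1
    return observed, extreme / count

observed, p_value = randomization_test([12, 14, 15, 17], [8, 9, 10, 11])
print(observed, round(p_value, 4))
```

Full enumeration is only practical for small groups, which is exactly the case where the usual large-sample tests are in doubt. For larger samples a random subset of assignments is typically used instead.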
• The process by which pairs of cases are matched on
variables thought to impact the treatment effect of interest.
• Followed by random assignment of one of each of the
matched pairs (or more) to one of the two (or multiple) groups.
– Expensive and time consuming.
– Difficult to find matches on all variables of interest.
– Which variables?
• Example in Marketing: Split-cable experiments for
commercials, beta testing across geographically similar
stores, cities, etc…
Matching
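One simple way to implement matching on a single variable is to sort the units by that variable, pair neighbors, and flip a coin within each pair. The function `matched_pair_assignment`, the store names, and the sales figures are hypothetical. Matching on several variables at once is harder, as the slide notes.

```python
import random

def matched_pair_assignment(units, covariate, seed=0):
    """Pair units with similar covariate values, then randomize within pairs."""
    rng = random.Random(seed)
    ordered = sorted(units, key=covariate)
    assignment = {}
    # Adjacent units in sorted order form a pair (assumes an even number of units).
    for first, second in zip(ordered[0::2], ordered[1::2]):
        pair = [first, second]
        rng.shuffle(pair)              # random assignment within the pair
        assignment[pair[0]] = "EG"
        assignment[pair[1]] = "CG"
    return assignment

# Hypothetical stores with weekly sales (in $1000s) as the matching variable.
weekly_sales = {"s1": 52, "s2": 18, "s3": 49, "s4": 21, "s5": 35, "s6": 33}
assignment = matched_pair_assignment(list(weekly_sales), weekly_sales.get)
print(assignment)
```

Each pair contributes one store to the experimental group and one to the control group, so the two groups end up similar on the matching variable by construction.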
• Blocking is done by selecting, typically, a few variables
thought to impact the treatment effect, and then
randomly assigning people to the treatments within
blocks.
• Blocking is similar in spirit to matching, but:
– in blocking you are typically interested in how the
treatment effect varies across the blocks,
– the matching is statistical (at the group level) as opposed to one-to-one.
Blocking
• Example
– In an experiment, the objective is to test the effectiveness
of three types of display racks for supermarket
merchandising.
– These are end-aisle displays, stand-alone racks, and
check-out stand racks.
– The racks are to be tested in both small and large
supermarket stores.
Blocking
• Example
– Treatment: 3 Types of Racks.
– Blocks: 2 Types of Stores.
For each type of store, assign the stores randomly to one
type of rack.
Why not simply assign stores to racks without worrying
about blocking?
Blocking
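The rack example can be sketched as a blocked random assignment. The function `blocked_assignment`, the store names, and the choice of six stores per block are assumptions for illustration. Only the three rack types and the two store sizes come from the example.

```python
import random

def blocked_assignment(stores_by_block, treatments, seed=0):
    """Randomly assign stores to treatments separately within each block."""
    rng = random.Random(seed)
    assignment = {}
    for block, stores in stores_by_block.items():
        shuffled = stores[:]
        rng.shuffle(shuffled)
        # Cycle through treatments so each gets an equal share per block.
        for position, store in enumerate(shuffled):
            assignment[store] = (block, treatments[position % len(treatments)])
    return assignment

racks = ["end-aisle", "stand-alone", "check-out"]
stores_by_block = {
    "small": [f"small_{i}" for i in range(6)],
    "large": [f"large_{i}" for i in range(6)],
}
assignment = blocked_assignment(stores_by_block, racks)
for store, (block, rack) in sorted(assignment.items()):
    print(store, block, rack)
```

Without blocking, a purely random assignment could by chance put one rack type mostly in large stores, and store size would then be confounded with the rack effect. Randomizing within blocks guarantees each rack type is tested equally often in each store size, and it lets you compare the rack effect across blocks.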
• Between-subjects design
– Each subject receives only one treatment.
– Comparisons are made between groups of different
subjects.
• Within-subjects design
– Subject receives more than one treatment.
– Comparisons are made across multiple measures on the
same subject.
Two types of experiments
• Within-subjects designs are advantageous because you
get greater statistical power due to “internal matching”
(you are your own control).
• However, in some cases, because of contamination or time
constraints, between-subjects designs must be used.
• The choice between the two is not always obvious.
Within or Between Subjects?
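The power advantage of "internal matching" can be illustrated by comparing standard errors on simulated data. The model below (a stable per-subject baseline plus a treatment effect of 2 units) and all numbers are assumptions. The point is only the relative size of the two standard errors.

```python
import random
from math import sqrt
from statistics import stdev

rng = random.Random(7)
n = 30
# Assumed model: each subject has a stable baseline; treatment adds 2 units.
baseline = [rng.gauss(50, 10) for _ in range(n)]
control = [b + rng.gauss(0, 2) for b in baseline]
treated = [b + 2 + rng.gauss(0, 2) for b in baseline]

# Within-subjects analysis: use each subject's own difference,
# which removes the subject's baseline from the comparison.
diffs = [t - c for t, c in zip(treated, control)]
se_within = stdev(diffs) / sqrt(n)

# Between-subjects analysis of the same numbers: treat the two columns as
# if they came from different people, so baseline variation stays in the noise.
se_between = sqrt(stdev(treated) ** 2 / n + stdev(control) ** 2 / n)

print(round(se_within, 2), round(se_between, 2))
```

The smaller standard error in the within-subjects analysis is what gives the design more power for the same number of subjects, provided that exposure to one treatment does not contaminate the response to the other.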
• Identify the real instrument/treatment variable X, the real
response variable Y and the real population P of interest to
the manager.
• Identify the proxies x, y and p in the experiment setting.
• When and how is y being measured? Identify the
experimental design and the corresponding best estimate of
the observed effect of x on y:
a. Before-After without Control Group: E = O2 − O1
b. Before-After with Control Group: E = (O2 − O1) − (O4 − O3)
c. After-Only with Control Group: E = O2 − O4
Guidelines for Critiquing Experimental Research in
Marketing
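The three effect estimates can be compared side by side on the same numbers. The sketch assumes O1 and O2 are the before and after measurements of the experimental group and O3 and O4 those of the control group. The function names and the attitude scores are hypothetical.

```python
def before_after_without_control(o1, o2):
    """E = O2 - O1."""
    return o2 - o1

def before_after_with_control(o1, o2, o3, o4):
    """E = (O2 - O1) - (O4 - O3)."""
    return (o2 - o1) - (o4 - o3)

def after_only_with_control(o2, o4):
    """E = O2 - O4."""
    return o2 - o4

# Hypothetical mean attitude scores on a 1-10 scale.
o1, o2 = 5.0, 6.5   # experimental group, before and after
o3, o4 = 5.0, 5.5   # control group, before and after

print(before_after_without_control(o1, o2))
print(before_after_with_control(o1, o2, o3, o4))
print(after_only_with_control(o2, o4))
```

In this made-up case the control group also improved, so the design without a control group attributes that shared change to the treatment, while the two designs with a control group subtract it out.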
• Look for problems in internal validity.
– Are there alternative explanations to the change E other
than the treatment variable? If there are, the statement
that x causes y is falsifiable and the experiment is flawed.
• Look for problems in external validity. That is, is there a
problem with the proxies?
Guidelines for Critiquing Experimental Research in
Marketing