Dummy Variable Regression Analysis

Posted on 29-May-2015


Dummy variable regression analysis (with marketing focus)


Class Outline

• Incorporating Discrete Variables in Regression Analysis Using Dummy Variables

• Incorporating Discrete Variables with 3+ Categories in Regression Analysis Using Dummy Variables

• Application Exercise #1: Impact of Competition on Airfare

• Application Exercise #2: Movie Box Office Revenue Forecast

Example – Sales Data (Continued)

Market ID   Sales   Price   Advertising
1           214     2.2     Yes
2           163     2.7     No
3           201     2.4     Yes
4           152     2.9     No
5           157     2.8     No
6           213     2.2     Yes
7           226     2.0     Yes
8           187     2.3     No
9           219     2.1     Yes
10          163     2.7     No
11          157     2.8     No
12          189     2.6     Yes
13          169     2.6     No
14          152     2.9     No
15          189     2.6     Yes

Discrete Variable with 2 categories

Regression Analysis Using Dummy Variables

• We can represent a discrete variable using dummy variables

• Dummy variable: a variable that takes the value 0 or 1 to indicate the absence or presence of some categorical effect

• Procedure (two-category case):
  1. Select a baseline category, e.g., No advertising.
  2. Generate a dummy variable for the non-baseline category (DumAdYes).
  3. Use the variable in regression analysis.

Sales = a + b1*Price + b2*DumAdYes + ε
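Step 2 of the procedure can be sketched in a few lines of Python. This is a minimal illustration, assuming the Advertising column is stored as the strings "Yes"/"No" exactly as in the table; `make_dummy` is a hypothetical helper name, not part of any library.

```python
# Advertising labels for the 15 markets in the sales example
advertising = ["Yes", "No", "Yes", "No", "No", "Yes", "Yes", "No", "Yes",
               "No", "No", "Yes", "No", "No", "Yes"]

def make_dummy(labels, category):
    """Return 1 where the label equals `category`, else 0 (the baseline)."""
    return [1 if label == category else 0 for label in labels]

# Baseline is "No" (no advertising), so the dummy flags the "Yes" markets.
dum_ad_yes = make_dummy(advertising, "Yes")
print(dum_ad_yes)  # [1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1]
```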

Regression Analysis Using Dummy Variables

Market ID   Sales   Price   Advertising   DumAdYes
1           214     2.2     Yes           1
2           163     2.7     No            0
3           201     2.4     Yes           1
4           152     2.9     No            0
5           157     2.8     No            0
6           213     2.2     Yes           1
7           226     2.0     Yes           1
8           187     2.3     No            0
9           219     2.1     Yes           1
:           :       :       :             :

Use “Dummy Variable Examples.xlsx – Dummy Variable Example 1”

Regression Analysis Using Dummy Variables

Sales = a + b1*Price + b2*DumAdYes + ε, where DumAdYes = 0 if No Ad and 1 if Ad

No Ad: Sales = a + b1*Price + ε

Ad: Sales = a + b1*Price + b2 + ε

Influence of Ad on Sales: b2

[Figure: Sales vs. Price. Two parallel lines share slope b1; the No Ad line has intercept a, and the Ad line sits b2 above it.]

SUMMARY OUTPUT

Regression Statistics

Multiple R 1.00

R Square 1.00

Adjusted R Square 1.00

Standard Error 0.49

Observations 15.00

ANOVA

df SS MS F Significance F

Regression 2 9682.672455 4841.336 19844.63 7.62539E-22

Residual 12 2.927544735 0.243962

Total 14 9685.6

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%

Intercept 325.36 1.73 188.50 0.00 321.60 329.12

Price -60.04 0.63 -94.84 0.00 -61.42 -58.66

DumAdYes 20.02 0.37 54.78 0.00 19.22 20.81
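The coefficients in the SUMMARY OUTPUT can be cross-checked outside Excel. The sketch below is a minimal, dependency-free Python version that solves the least-squares normal equations (X'X)b = X'y directly, assuming the 15 markets are transcribed correctly from the table. `solve` and `ols` are hypothetical helper names; in practice a library routine such as numpy.linalg.lstsq would normally be preferred for numerical stability.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

sales = [214, 163, 201, 152, 157, 213, 226, 187, 219, 163, 157, 189, 169, 152, 189]
price = [2.2, 2.7, 2.4, 2.9, 2.8, 2.2, 2.0, 2.3, 2.1, 2.7, 2.8, 2.6, 2.6, 2.9, 2.6]
dum_ad_yes = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1]

# Design matrix columns: intercept, Price, DumAdYes
X = [[1.0, p, d] for p, d in zip(price, dum_ad_yes)]
a, b1, b2 = ols(X, sales)
print(f"a = {a:.2f}, b1 = {b1:.2f}, b2 = {b2:.2f}")
# a = 325.36, b1 = -60.04, b2 = 20.02
```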

Regression Analysis Using Dummy Variables

Sales = 325.36 - 60.04*Price + 20.02*DumAdYes + ε

1. Prediction / Forecasting
   e.g., Price = 3, Ad: Expected Sales = 165.26
   e.g., Price = 3, No Ad: Expected Sales = 145.24

2. Relationship between variables
   Influence of Ad on Sales: 20.02
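The two forecasts above are just the fitted equation evaluated with DumAdYes switched on or off. A minimal sketch using the rounded coefficients from the output (`expected_sales` is a hypothetical helper name):

```python
# Rounded coefficients from the SUMMARY OUTPUT
a, b1, b2 = 325.36, -60.04, 20.02

def expected_sales(price, advertised):
    """Fitted value of Sales = a + b1*Price + b2*DumAdYes (error term dropped)."""
    dum_ad_yes = 1 if advertised else 0
    return a + b1 * price + b2 * dum_ad_yes

with_ad = expected_sales(3, advertised=True)
without_ad = expected_sales(3, advertised=False)
print(round(with_ad, 2), round(without_ad, 2))  # 165.26 145.24
print(round(with_ad - without_ad, 2))           # 20.02
```

Because the model gives both groups the same Price slope b1, the gap between the Ad and No Ad forecasts stays at b2 = 20.02 whatever price is plugged in.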

Incorporating Discrete Variables with 3+ Categories

Example – Sales Data (Continued)

Market ID   Sales   Price   Advertising
1           221     2.2     TV
2           171     2.7     No
3           210     2.4     Radio
4           153     2.9     No
5           163     2.8     No
6           224     2.2     TV
7           236     2.0     Radio
8           191     2.3     No
9           233     2.1     TV
10          171     2.7     No
11          163     2.8     No
12          192     2.6     Radio
13          174     2.6     No
14          156     2.9     No
15          201     2.6     TV

Discrete Variable with 3 categories

Regression Analysis Using Dummy Variables

• We can always represent a discrete variable with K categories using K-1 dummy variables.

• Procedure:
  1. Select a baseline category.
  2. Define K-1 dummy variables for the non-baseline categories.
  3. Include them in regression analysis.
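Why K-1 and not K? With an intercept in the model, a full set of K dummies always sums to 1 in every row, which exactly duplicates the intercept column (often called the dummy variable trap), so the regression coefficients are not uniquely determined. A minimal Python check on the three-category Advertising data:

```python
# Advertising labels for the 15 markets in the three-category example
advertising = ["TV", "No", "Radio", "No", "No", "TV", "Radio", "No", "TV",
               "No", "No", "Radio", "No", "No", "TV"]
categories = ["No", "TV", "Radio"]  # K = 3

# Deliberately wrong: an intercept column plus one dummy for EVERY category
design = [[1] + [1 if label == c else 0 for c in categories] for label in advertising]

# Each market belongs to exactly one category, so the K dummies always sum to 1,
# reproducing the intercept column: perfect multicollinearity.
collinear = all(row[0] == sum(row[1:]) for row in design)
print(collinear)  # True
```

Leaving out the baseline dummy removes this exact dependence, and the baseline's effect is then absorbed into the intercept a.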

Use “Dummy Variable Examples.xlsx – Dummy Variable Example 2”

Regression Analysis Using Dummy Variables

Advertising   DumTV   DumRadio
TV            1       0
No            0       0
Radio         0       1
No            0       0
No            0       0
TV            1       0
Radio         0       1

Sales = a + b1*Price + b2*DumTV + b3*DumRadio + ε
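The dummy table above can be generated programmatically. A sketch, assuming the Advertising labels are the strings "TV", "Radio" and "No"; `dummy_columns` is a hypothetical helper that creates one 0/1 column for each category other than the chosen baseline:

```python
def dummy_columns(labels, baseline):
    """Return {category: list of 0/1 values} for every category except the baseline."""
    categories = sorted(set(labels) - {baseline})  # sorted only for a stable order
    return {c: [1 if label == c else 0 for label in labels] for c in categories}

# Advertising labels for the 15 markets in the three-category example
advertising = ["TV", "No", "Radio", "No", "No", "TV", "Radio", "No", "TV",
               "No", "No", "Radio", "No", "No", "TV"]

dummies = dummy_columns(advertising, baseline="No")  # K = 3 categories -> 2 dummies
dum_tv, dum_radio = dummies["TV"], dummies["Radio"]

# Print the first seven markets in the same layout as the table
print(f"{'Advertising':<12}{'DumTV':>6}{'DumRadio':>10}")
for label, tv, radio in list(zip(advertising, dum_tv, dum_radio))[:7]:
    print(f"{label:<12}{tv:>6}{radio:>10}")
```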


Regression Analysis Using Dummy Variables

No Ad (baseline): Sales = a + b1*Price + ε

TV Ad: Sales = a + b1*Price + b2 + ε

Radio Ad: Sales = a + b1*Price + b3 + ε

Sales = a + b1*Price + b2*DumTV + b3*DumRadio + ε
where DumTV = 1 if AD = TV, 0 otherwise; DumRadio = 1 if AD = Radio, 0 otherwise

Regression Statistics

Multiple R 1.00

R Square 0.99

Adjusted R Square 0.99

Standard Error 2.57

Observations 15.00

ANOVA

df SS MS F Significance F

Regression 3 11490.89388 3830.298 579.5011 2.193E-12

Residual 11 72.7061161 6.6096469

Total 14 11563.6

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%

Intercept 341 9.03 37.75 0.00 320.94 360.68

Price -64 3.31 -19.27 0.00 -71.09 -56.51

DumTV 24 2.14 11.26 0.00 19.38 28.80

DumRadio 21 2.15 9.66 0.00 16.00 25.45
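As in the two-category case, this output can be reproduced with a small dependency-free least-squares sketch. It uses the same hypothetical helpers `solve` and `ols`, assumes the data are transcribed correctly from the table, and a library routine such as numpy.linalg.lstsq is the usual choice in practice. Unrounded, the estimates come out at about 340.81, -63.80, 24.09 and 20.73, which round to the integers shown.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

sales = [221, 171, 210, 153, 163, 224, 236, 191, 233, 171, 163, 192, 174, 156, 201]
price = [2.2, 2.7, 2.4, 2.9, 2.8, 2.2, 2.0, 2.3, 2.1, 2.7, 2.8, 2.6, 2.6, 2.9, 2.6]
advertising = ["TV", "No", "Radio", "No", "No", "TV", "Radio", "No", "TV",
               "No", "No", "Radio", "No", "No", "TV"]

# Design matrix columns: intercept, Price, DumTV, DumRadio ("No" is the baseline)
X = [[1.0, p, 1.0 if ad == "TV" else 0.0, 1.0 if ad == "Radio" else 0.0]
     for p, ad in zip(price, advertising)]
a, b1, b2, b3 = ols(X, sales)
print(f"a = {a:.2f}, b1 = {b1:.2f}, b2 = {b2:.2f}, b3 = {b3:.2f}")
# a = 340.81, b1 = -63.80, b2 = 24.09, b3 = 20.73
print([round(v) for v in (a, b1, b2, b3)])  # [341, -64, 24, 21]
```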

No Ad: Sales = 341 - 64*Price + ε

TV Ad: Sales = 341 - 64*Price + 24 + ε

Radio Ad: Sales = 341 - 64*Price + 21 + ε

Sales = 341 - 64*Price + 24*DumTV + 21*DumRadio + ε
where DumTV = 1 if AD = TV, 0 otherwise; DumRadio = 1 if AD = Radio, 0 otherwise
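The three fitted equations differ only in their intercepts. A short sketch with the rounded coefficients makes the shifts explicit and also shows how to compare the two non-baseline categories with each other:

```python
# Rounded coefficients from the three-category output
a, b1, b_tv, b_radio = 341, -64, 24, 21

# Each category gets its own intercept; the Price slope b1 is shared.
intercepts = {"No Ad": a, "TV": a + b_tv, "Radio": a + b_radio}
print(intercepts)  # {'No Ad': 341, 'TV': 365, 'Radio': 362}

# Comparing two non-baseline categories: the baseline intercept cancels,
# so the TV-vs-Radio effect is the difference of their coefficients.
tv_vs_radio = intercepts["TV"] - intercepts["Radio"]
print(tv_vs_radio)  # 3
```

The 3-unit TV-versus-Radio gap is b2 - b3. Whether it is statistically significant cannot be read from the table, because that requires the covariance between the two estimates (or refitting the model with TV or Radio as the baseline).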

Application Exercise #1: Impact of Competition on Airfare

[Diagram: Airfare as driven by Competition, Distance, Location Characteristics, and Other factors]

Impact of Competition on Airfare

Use “Airfare.xlsx”

• Origin: Airport code for the origin
• Destination: Airport code for the destination
• Average Fare: Average non-stop fare for the route
• # of Airlines: Number of airlines providing direct service between O & D
• Distance: Distance between O & D
• South_West: Whether Southwest provides direct service
• Holiday_O: Whether the origin is a holiday market
• Holiday_D: Whether the destination is a holiday market
• Traffic_O: Annual airport traffic at the origin (city size)
• Traffic_D: Annual airport traffic at the destination (city size)

Impact of Competition on Airfare

Q1: Generate “DumHO” using Holiday_O
Q2: Generate “DumHD” using Holiday_D
Q3: Generate “DumSW” using South_West
Q4: Perform a regression analysis

Airfare = a + b1*(# of Airlines) + b2*Distance + b3*Traffic_O + b4*Traffic_D + b5*DumHO + b6*DumHD + b7*DumSW + ε

Impact of Competition on Airfare

Q5: One more airline in the market will (decrease/increase) the average fare by $( ).

Q6: The presence of Southwest in the market will (decrease/increase) the average fare by $( ).

Q7: One additional mile of distance will (decrease/increase) the average fare by $( ).

Q8: If the destination is a holiday market, the average fare will (decrease/increase) by $( ).

Impact of Competition on Airfare

R Square 0.42
ANOVA P-Val. = 0.00

Coefficients Standard Error t Stat P-value
Intercept -43.05 350.11 -0.12 0.90
# of Airlines -25.88 6.73 -3.85 0.00
Distance 0.22 0.01 17.71 0.00
Traffic_O -8.19 14.03 -0.58 0.56
Traffic_D 25.80 14.34 1.80 0.07
DumHO -34.72 22.42 -1.55 0.12
DumHD -74.85 23.53 -3.18 0.00
DumSW -65.11 17.98 -3.62 0.00
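Q5 to Q8 follow a mechanical rule: the sign of a coefficient gives the direction, its magnitude gives the dollar change per one-unit increase (for a dummy, switching from 0 to 1), and the p-value says whether the effect is statistically significant. A minimal Python sketch of that rule, using the coefficients and p-values copied from the output above and an assumed 5% significance level:

```python
# (coefficient, p-value) pairs copied from the regression output
results = {
    "# of Airlines": (-25.88, 0.00),
    "Distance": (0.22, 0.00),
    "Traffic_O": (-8.19, 0.56),
    "Traffic_D": (25.80, 0.07),
    "DumHO": (-34.72, 0.12),
    "DumHD": (-74.85, 0.00),
    "DumSW": (-65.11, 0.00),
}
alpha = 0.05  # assumed significance level; the exercise does not fix one

answers = {}
for name, (coef, p_value) in results.items():
    direction = "increases" if coef > 0 else "decreases"
    significant = p_value < alpha
    answers[name] = (direction, abs(coef), significant)
    label = "significant" if significant else "not significant"
    print(f"{name}: +1 unit {direction} average fare by ${abs(coef):.2f} ({label})")
```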

Application Exercise #2: Box Office Revenue Forecast

• Suppose you are helping Warner Bros. develop a model for forecasting box office revenues.

• You are provided the opening-week revenues (in millions of $) for various past movies, along with several independent variables.

Movie              Opening_Week_Revenue ($ millions)   # of Theaters   Overall Rating   Genre
The Dark Knight    158.4                               4366            82               Action
Iron Man           98.6                                4105            79               Action
Sex and the City   57                                  3285            53               Comedy
Mamma Mia!         27.8                                2976            51               Comedy
21                 24.1                                2648            48               Drama
Constantine        29.8                                3006            50               Horror
The Grudge         39.1                                3245            49               Horror
WALL-E             63.1                                3992            93               Kids
Kung Fu Panda      60.2                                4114            73               Kids

Movie Revenue Forecast

[Diagram: Revenue as driven by Rating, Genre, # of Theaters, and Other Factors]

Use “Movie.xlsx”

1. Generate dummy variables to represent Genre. Use the “Kids” genre as the baseline category.

• Genre: {Action, Comedy, Drama, Horror, Kids}
• DumA = 1 if Genre = Action; = 0 otherwise
• DumC = 1 if Genre = ( ); = 0 otherwise
• DumD = 1 if Genre = ( ); = 0 otherwise
• DumH = 1 if Genre = ( ); = 0 otherwise

In Excel, use =IF(E2="Action",1,0)
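The Excel IF formula has a direct Python counterpart, `1 if genre == "Action" else 0`. The sketch below applies it to the sample movies in the table, assuming Genre is stored as text labels; the dictionary layout is illustrative only, and the dummy names are built from each genre's first letter so that they match DumA, DumC, DumD and DumH.

```python
# Sample movies and their genres from the table
movies = {
    "The Dark Knight": "Action", "Iron Man": "Action",
    "Sex and the City": "Comedy", "Mamma Mia!": "Comedy",
    "21": "Drama", "Constantine": "Horror", "The Grudge": "Horror",
    "WALL-E": "Kids", "Kung Fu Panda": "Kids",
}
genres = ["Action", "Comedy", "Drama", "Horror", "Kids"]
baseline = "Kids"

# One dummy per non-baseline genre, named by first letter: DumA, DumC, DumD, DumH
dummy_names = {f"Dum{g[0]}": g for g in genres if g != baseline}

# Python counterpart of =IF(E2="Action",1,0), applied to every dummy and movie
dummies = {
    title: {name: 1 if genre == g else 0 for name, g in dummy_names.items()}
    for title, genre in movies.items()
}
print(dummies["Iron Man"])  # {'DumA': 1, 'DumC': 0, 'DumD': 0, 'DumH': 0}
print(dummies["WALL-E"])    # {'DumA': 0, 'DumC': 0, 'DumD': 0, 'DumH': 0}
```

Baseline (Kids) movies get 0 in every dummy column, so their revenue is captured by the intercept.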

3. Describe the relationship among the variables.

Revenue = -126.66 + 0.04*#Theater + 0.27*Rating + 16.20*DumA + 5.99*DumC + 19.56*DumD + 14.43*DumH + ε
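The fitted equation turns into a forecast by plugging in a movie's characteristics. A minimal sketch using the coefficients exactly as printed (`forecast_revenue` is a hypothetical helper name); because 0.04 in particular is heavily rounded, the fitted values are only approximate.

```python
# Coefficients as reported on the slide (rounded to two decimals)
coef = {"Intercept": -126.66, "Theaters": 0.04, "Rating": 0.27,
        "DumA": 16.20, "DumC": 5.99, "DumD": 19.56, "DumH": 14.43}
genre_dummy = {"Action": "DumA", "Comedy": "DumC", "Drama": "DumD", "Horror": "DumH"}

def forecast_revenue(theaters, rating, genre):
    """Fitted opening-week revenue (millions of $); Kids is the baseline genre."""
    revenue = coef["Intercept"] + coef["Theaters"] * theaters + coef["Rating"] * rating
    if genre in genre_dummy:  # Kids adds nothing: all of its dummies are 0
        revenue += coef[genre_dummy[genre]]
    return revenue

print(round(forecast_revenue(4366, 82, "Action"), 2))  # The Dark Knight: 86.32 (actual 158.4)
print(round(forecast_revenue(3992, 93, "Kids"), 2))    # WALL-E: 58.13 (actual 63.1)
```

Read coefficient by coefficient: each additional theater adds about $0.04 million (so 1,000 more theaters add about $40 million), and an Action movie is forecast about $16.20 million above an otherwise identical Kids movie.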

Continuous variables: If we increase XXX by one unit, Revenue ( ). This is statistically significant/insignificant.

Dummy variables: Compared to the revenue of the baseline category (i.e., Kids), the XXX genre has ( ). This is statistically significant/insignificant.