
How can we obtain those equations?

Date post: 01-Jan-2016
Upload: marcia-mcfarland
Description: Last week we talked, among other things, about supply and demand equations and said that having those available may improve the accuracy of our predictions.
Transcript

Page 2: How can we obtain those equations?

Last week we talked, among other things, about supply and demand equations and said that having those available may improve the accuracy of our predictions.

• How can we obtain those equations?

• What information do we need for that?

• What techniques do we use to translate raw data into an equation?

• How much faith should we have in the resulting equations?

Page 3: How can we obtain those equations?

Regression analysis

The simplest case is the relationship between two variables, which may help answer such business-type questions as

•How does the number of TVs sold at an outlet depend on the TV price?

•How does the quantity demanded of paper towels depend on the population of a town?

•How does the volume of ice cream sales depend on the outside temperature?

Page 4: How can we obtain those equations?

Simple regression

Step 1. Collect data.

Observation #   Quantity   Price
1    18    475
2    59    400
3    43    450
4    25    550
5    27    575
6    72    375
7    66    375
8    49    450
9    70    400
10   21    500

Page 5: How can we obtain those equations?

[Figure: scatter plot of the ten price–quantity observations.]

Page 6: How can we obtain those equations?

Step 2. Assume a form of the relationship between the variables of interest, such as

QD = A − B∙P,  or  QD = A/P − B∙P + C∙P² − D,

or any other you can possibly think of, where A through D are some numbers, or “coefficients”.

Which of the above specifications would you prefer to use? How would you justify your choice?

None of them represents the “real” relationship between P and Q, so let’s use the linear specification, the simplest one that is consistent with theory (and therefore with common sense).

Page 7: How can we obtain those equations?

[Figure: the scatter plot of observations, shown again before fitting a line.]

Page 8: How can we obtain those equations?

[Figure: scatter plot of the observations (axes: Quantity, Price) with a fitted linear trend line; legend: Price, Linear (Price).]

Page 9: How can we obtain those equations?

Step 3. Find the best values for coefficients.  How?

Each observation is a point in the price-quantity space.

Each pair of numbers, A and B, when plugged into the equation above, uniquely defines a line in the price-quantity space. It is highly unlikely that all the points (observations) will fit on the same line.  The best line will be the one that minimizes the sum of squared deviations between the line and the actual data points.
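The least-squares step can be sketched in a few lines of Python (an illustration only; the slides themselves use Excel), applied to the ten observations from Step 1:

```python
# A minimal sketch of Step 3: finding the best-fit line Q_D = A + B*P
# by the textbook least-squares formulas, using the data from Step 1.
prices = [475, 400, 450, 550, 575, 375, 375, 450, 400, 500]
quantities = [18, 59, 43, 25, 27, 72, 66, 49, 70, 21]

n = len(prices)
mean_p = sum(prices) / n
mean_q = sum(quantities) / n

# B = sum((P - mean_P)(Q - mean_Q)) / sum((P - mean_P)^2)
s_pq = sum((p - mean_p) * (q - mean_q) for p, q in zip(prices, quantities))
s_pp = sum((p - mean_p) ** 2 for p in prices)

B = s_pq / s_pp            # slope
A = mean_q - B * mean_p    # intercept

print(round(A, 2), round(B, 4))  # 163.71 -0.2609
```

These values match the “Coefficients” column of the Excel printout shown on the next slides.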

Page 10: How can we obtain those equations?
Page 11: How can we obtain those equations?

This procedure can be performed using a MS Excel spreadsheet.

2003: In the upper drop-down menu, choose Tools → Data Analysis → Regression.

2007: Data tab → Analysis → Data Analysis → Regression.

Then enter the range of cells that contain data for each variable.
• Y is the “dependent variable” (in our case, QD).
• X is the “independent variable”, also called the explanatory variable (in our case, P).
• Check the “Labels” box if and only if you include column headers.
• Click on the cell where you want the output printout to start, then click ‘OK’.

Page 12: How can we obtain those equations?

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.868299
R Square            0.753944
Adjusted R Square   0.723187
Standard Error      11.147108
Observations        10

ANOVA
             df    SS        MS        F         Significance F
Regression    1    3045.935  3045.935  24.51298  0.0011193
Residual      8    994.0642  124.2580
Total         9    4040

            Coefficients  Standard Error  t Stat   P-value   Lower 95%  Upper 95%
Intercept   163.706       24.2337          6.7553  0.000144  107.823    219.589
Price       -0.2608        0.0526         -4.9510  0.001119   -0.382     -0.1393

Page 13: How can we obtain those equations?


R² shows the portion of the variation in the dependent variable, QD, that is explained by the independent one, P.
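Using the SS column of the ANOVA table, R² can be reproduced by hand (a quick check, not part of the original slides):

```python
# R^2 = explained (regression) sum of squares / total sum of squares,
# taking both values from the SS column of the ANOVA table.
ss_regression = 3045.935
ss_total = 4040.0

r_squared = ss_regression / ss_total
multiple_r = r_squared ** 0.5   # the "Multiple R" line of the printout

print(round(r_squared, 6))  # 0.753944
```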


Page 15: How can we obtain those equations?


The greater the F-statistic, the lower the probability that the estimated regression model fits the data purely by accident.

That probability is given under “Significance F”.
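The F-statistic itself can be recomputed from the ANOVA table (an illustrative check using the printed values):

```python
# The F-statistic in the ANOVA table is the ratio of the regression
# mean square (MS) to the residual mean square.
ms_regression = 3045.935    # SS_regression / df_regression = 3045.935 / 1
ms_residual = 994.0642 / 8  # SS_residual / df_residual

f_stat = ms_regression / ms_residual
print(round(f_stat, 2))  # 24.51
```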

Page 16: How can we obtain those equations?


The ‘Coefficients’ column contains the values of A and B that provide the best fit.


Page 18: How can we obtain those equations?

In our case, the regression analysis suggests that the best estimate of the demand for TVs based on the data provided is…

QD = A + B∙P

Page 19: How can we obtain those equations?

In our case, the regression analysis suggests that the best estimate of the demand for TVs based on the data provided is…

QD = 163.7 – 0.261∙P

which means that, for every $1 increase in the price of a TV set, the quantity demanded drops by (approximately) 0.26 units.


Page 21: How can we obtain those equations?


t-values show the ‘statistical significance’ of each coefficient. The larger the t-statistic is in absolute value, the higher are the chances that the coefficient is different from zero.

Page 22: How can we obtain those equations?



As in the case of the F-statistic, the same information is also available in the form of probabilities, or P-values.
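As a quick check (not in the original slides), each t Stat in the printout is simply the coefficient divided by its standard error:

```python
# t-statistic = coefficient / standard error, using the rounded values
# displayed in the printout for 'Price'. (The printout shows -4.9510;
# the small difference comes from rounding the displayed inputs.)
coef_price = -0.2608
se_price = 0.0526

t_price = coef_price / se_price
print(round(t_price, 2))  # -4.96
```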

Page 23: How can we obtain those equations?

In our case, there is only a 0.11% probability that price is irrelevant for consumer decisions.

Another way to say it is, there is a 99.89% probability that price matters for consumer decisions (more specifically, that a higher price is associated with a lower quantity demanded).

This, however, is very different from the statement “There is a 99.89% probability that the coefficient on price is –0.26”, which is FALSE.

Page 24: How can we obtain those equations?

[Figure: two probability distributions of an estimated coefficient.]

The regression output gives us the “expected value” of the coefficient. But the actual value is not certain – it is distributed around the expected value in a probabilistic manner.

If the entire distribution lies in the positive range (P > 0), we are certain the sign of the coefficient is positive (but we don’t know its actual value!).

If only 80% of the area is in the positive range, then there is a 20% chance the sign is in fact negative (P = 0.2).



Page 27: How can we obtain those equations?

The “Lower 95%” and “Upper 95%” columns in the printout give the upper and lower bounds within which the true value of each coefficient falls with a certain probability (95% probability in this case). This range is also known as the “95% confidence interval”. In our case, …

… there is a 95% probability the value of the coefficient for price lies between -0.382 and -0.1393
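This interval can be reproduced from the coefficient and its standard error (a sketch; the critical value 2.306 is the two-sided 5% cutoff of the t-distribution with n − 2 = 8 degrees of freedom):

```python
# 95% confidence interval = coefficient +/- t_critical * standard error.
coef_price = -0.2608
se_price = 0.0526
t_critical = 2.306  # t_{0.975} with 8 degrees of freedom

lower = coef_price - t_critical * se_price
upper = coef_price + t_critical * se_price
print(round(lower, 3), round(upper, 2))  # -0.382 -0.14
```

The printout’s -0.1393 upper bound differs slightly because Excel works with the unrounded coefficient and standard error.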

Page 28: How can we obtain those equations?

Note the P-value for the estimated coefficient on price equals the significance of F. This is always the case when there is only one independent variable.


Page 30: How can we obtain those equations?

To summarize, we want:

• The sign of coefficients to make economic sense;

• R² to be LARGE (close to 1);

• The F-statistic to be LARGE;

• The t-statistic to be LARGE in absolute value;

• ‘Significance F’ and P-values to be SMALL;

• The confidence interval to be SMALL / NARROW.

Page 31: How can we obtain those equations?

Statistical significance of a coefficient is a statement about how reliable the sign of the calculated coefficient is (look at the t-statistic and the p-value).


Page 33: How can we obtain those equations?


Example: There is a 0.11% probability that the coefficient on “Price” is non-negative

Page 34: How can we obtain those equations?


Example: There is a 0.11% probability that there is either a positive or no relationship between price and quantity demanded.

You may also come across the expression “the coefficient is significant at the 5% level”. That means there is no more than 5 percent probability that this coefficient has the sign opposite to the estimate.

When statistical significance of a coefficient is low, the role of the variable in question is weak or unclear.

Page 35: How can we obtain those equations?

Economic significance of a variable: how, according to the regression results, a change in one variable will affect the other variable (look at the value of the coefficient itself).


Page 37: How can we obtain those equations?


Example: For every $1 increase in price, quantity demanded drops by 0.261 unit.

Page 38: How can we obtain those equations?


Example: For every $100 increase in price, quantity demanded drops by ~ 26 units.

Page 39: How can we obtain those equations?


Example: For every $4 decrease in price, quantity demanded increases by approximately one unit.

Avoid causality statements!
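The “$4 per unit” reading follows directly from the slope (a quick arithmetic check):

```python
# With slope B ~ -0.2608, the price change needed to move quantity
# demanded by one unit is 1 / |B| dollars.
slope = -0.2608

price_change_per_unit = 1 / abs(slope)
print(round(price_change_per_unit, 2))  # 3.83, i.e. roughly $4
```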

Page 40: How can we obtain those equations?

What if you are not happy with the fit? (Say, the F-statistic is too small, statistical significance is low, etc.)

Sometimes this is due to the fact that the points on the scatter plot do not align well along a straight line.

In that case you may be able to make things better by trying a different specification, such as a log-linear regression.

Page 41: How can we obtain those equations?

A curve provides a better fit than a line.

Page 42: How can we obtain those equations?

To run a log-linear regression, you first need to create new auxiliary variables

Q′D = ln QD   and   P′ = ln P.

Here, ln stands for the natural logarithm (logarithm with base e, where e ≈ 2.72 is a mathematical constant). ln X is a number such that e^(ln X) = X.

After that, proceed with the regression as usual.
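The transformation can be sketched in Python. The power-law data below is hypothetical, constructed so that the regression on the log-transformed variables recovers the exponent exactly:

```python
import math

# Hypothetical data generated from an exact power law Q_D = e^2 * P^(-1.5).
# Transforming both variables with ln makes the relationship linear,
# so an ordinary least-squares fit recovers the exponent and constant.
prices = [375, 400, 450, 475, 500, 550, 575]
quantities = [math.exp(2) * p ** -1.5 for p in prices]

log_p = [math.log(p) for p in prices]
log_q = [math.log(q) for q in quantities]

# Ordinary least squares on the transformed variables.
n = len(log_p)
mean_x = sum(log_p) / n
mean_y = sum(log_q) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_p, log_q)) / \
        sum((x - mean_x) ** 2 for x in log_p)
intercept = mean_y - slope * mean_x

print(round(slope, 3), round(intercept, 3))  # -1.5 2.0
```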

Page 43: How can we obtain those equations?

The resulting equation will be of the form

Q′D = a0 + a1∙P′,

or

ln QD = a0 + a1∙ln P,

which is equivalent to

QD = e^(a0) ∙ P^(a1).

What if the log-linear specification doesn’t help?

How can we increase the explanatory power further?

Add more explanatory variables!

Page 44: How can we obtain those equations?

Another potential source of errors: the “specification problem”

Example: Data on demand for soft drinks:

July

Oct

Jan

Apr

Apr

Fitted line

P

Q

Page 45: How can we obtain those equations?

Another potential source of errors: the “specification problem”

Example: Data on demand for soft drinks:

July

Oct

Jan

Apr

Apr

Fitted line

P

Q

The real story:

Page 46: How can we obtain those equations?

Another potential source of errors: the “specification problem”

Example: Data on demand for soft drinks:

[Figure: scatter plot of price–quantity observations labeled by month (Jan, Apr, July, Oct), with a fitted line – and next to it “the real story” behind the same points.]

Once again, adding more explanatory variables may help us understand things better

Page 47: How can we obtain those equations?

Multiple regression

The idea is similar to a simple regression, except that there is more than one explanatory (independent) variable.

When compared to a simple regression, a multiple regression helps avoid the aforementioned “specification problem”, improve the overall goodness of fit, and improve the understanding of factors relevant for the variable of interest.

What variables other than own price could matter in the soft drink example?

Outside temperature? Town population? Etc.

Page 48: How can we obtain those equations?

Running a multiple regression in MS Excel is similar to running a simple regression, except that, when choosing the cell range for the independent variables, you need to include all of them at once.

The output will contain more lines, according to the number of variables included in the regression.

(A demonstration session follows.)

Regression output can again be translated into an equation.
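The same calculation can be sketched outside Excel. The temperature data below is hypothetical and constructed so that QD = 200 − 0.3∙P + 0.5∙T holds exactly, so the least-squares solution recovers those coefficients:

```python
# A sketch of a multiple regression with two explanatory variables,
# solved via the normal equations (X^T X) b = (X^T y) in pure Python.
price = [475, 400, 450, 550, 575, 375, 375, 450, 400, 500]
temp  = [60, 75, 70, 55, 50, 85, 80, 65, 78, 58]          # hypothetical
qty   = [200 - 0.3 * p + 0.5 * t for p, t in zip(price, temp)]

# Design matrix rows: [1, P, T] -- the 1 produces the intercept.
X = [[1.0, p, t] for p, t in zip(price, temp)]

xtx = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
xty = [sum(row[i] * y for row, y in zip(X, qty)) for i in range(3)]

# Gaussian elimination with partial pivoting on the 3x3 system.
for col in range(3):
    pivot = max(range(col, 3), key=lambda r: abs(xtx[r][col]))
    xtx[col], xtx[pivot] = xtx[pivot], xtx[col]
    xty[col], xty[pivot] = xty[pivot], xty[col]
    for r in range(col + 1, 3):
        factor = xtx[r][col] / xtx[col][col]
        xtx[r] = [a - factor * c for a, c in zip(xtx[r], xtx[col])]
        xty[r] -= factor * xty[col]

b = [0.0, 0.0, 0.0]
for i in (2, 1, 0):  # back substitution
    s = sum(xtx[i][j] * b[j] for j in range(i + 1, 3))
    b[i] = (xty[i] - s) / xtx[i][i]

print([round(v, 3) for v in b])  # [200.0, -0.3, 0.5]
```

Excel’s Regression tool solves the same least-squares problem and additionally reports the standard errors, t-statistics and P-values for each coefficient.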

Page 49: How can we obtain those equations?

Such an equation helps us not only evaluate the relationship between price and quantity, but also answer such questions as…

- Are goods X and Y substitutes or complements?

- How does the consumption of our good depend on income?

- Does advertising matter and how much?

and so on.

Page 50: How can we obtain those equations?

Things to look for when adding explanatory variables:

Does R2 improve (increase) when variables are added?

Page 51: How can we obtain those equations?

Things to look for when adding explanatory variables:

Does R² improve (increase) when variables are added? (Normally, the answer is ‘yes’.)

What is happening to R²-adjusted? (R²adj punishes the researcher for adding variables that don’t contribute much to the explanatory power, so it is a better criterion.)
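The adjustment formula is R²adj = 1 − (1 − R²)(n − 1)/(n − k − 1), where k is the number of explanatory variables. Plugging in the values from the earlier printout (a quick check):

```python
# Adjusted R^2 penalizes for the number of explanatory variables k.
# For the simple regression above: R2 = 0.753944, n = 10, k = 1.
r2, n, k = 0.753944, 10, 1

r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(r2_adj, 6))  # 0.723187
```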

Page 52: How can we obtain those equations?

Only statistically significant variables should be included in the final regression run and the resulting equation.

More advanced statistical packages perform ‘stepwise regressions’ when the program itself decides which variables are worth keeping and which deserve to be dropped.

Page 53: How can we obtain those equations?

“Dummy” variables

Sometimes, we are interested in the role of a factor that doesn’t have a numerical value attached to it, such as gender, race, day of the week, etc.

Such factors can be included in the regression by creating a separate 0/1 variable for each realization of the factor except one.

Dummy variables usually only take values of 0 or 1.

Page 54: How can we obtain those equations?

Examples:

Gender: “0” if male, “1” if female. (one dummy does the job)

Day of the week: We need six (7 − 1 = 6) additional variables.
X1: “1” if Monday, “0” otherwise.
X2: “1” if Tuesday, “0” otherwise… etc., up to X6.
No variable for Sunday. We will know Sunday is important if a regression with the dummies included produces a noticeably better R²adj than without them.
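The encoding scheme can be sketched as follows (the function name is illustrative):

```python
# Six 0/1 dummies for day of the week; Sunday is the omitted baseline.
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]  # no dummy for Sunday

def day_dummies(day):
    """Return the six dummy values (X1..X6) for a given day name."""
    return [1 if day == d else 0 for d in DAYS]

print(day_dummies("Tue"))  # [0, 1, 0, 0, 0, 0]
print(day_dummies("Sun"))  # [0, 0, 0, 0, 0, 0] -- the baseline category
```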

Page 55: How can we obtain those equations?

The “economic” interpretation of the effect of dummy variables is similar to that for regular variables.

• An average male buys 10 more gallons of soft drinks in a year than an average female.

• On average, there are 200 more people attending a Washburn home soccer game when the game is on Friday.

