Part 5: Functional Form5-1/36
Regression Models
Professor William Greene
Stern School of Business
IOMS Department
Department of Economics
Regression and Forecasting Models
Part 5 – Elasticities and Functional Form
Linear Regression Models
Model building
  Linear models – cost functions
  Semilog models – growth models
  Logs and elasticities
Analyzing residuals
  Violations of assumptions
  Unusual data points
  Hints for improving the model
Using and Interpreting the Model
Interpreting the linear model
Semilog and growth models
Log-log model and elasticities
Statistical Cost Analysis
[Figure: Fitted line plot of Cost versus Output. Cost = 2.444 + 0.005291 Output; S = 20.5111, R-Sq = 92.4%, R-Sq(adj) = 92.3%]
Generation cost ($M) and output (Millions of KWH) for 123 American electric utilities. (1970).
The units of the LHS and RHS must be the same.
$M cost = b0 + b1 MKWH
Y = $ cost
b0 = $ cost = 2.444 $M
b1 = $M/MKWH = 0.005291 $M/MKWH
So:
b0 = fixed cost = total cost if MKWH = 0
b1 = marginal cost = dCost/dMKWH
b1 * MKWH = variable cost
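The fixed/variable split above can be sketched in a few lines of Python. This is only an illustration: the coefficients come from the fitted line on the slide, while the function name and the output level of 10,000 MKWH are hypothetical choices made for the example.

```python
# Fitted line from the slide: Cost ($M) = 2.444 + 0.005291 * Output (MKWH)
B0_FIXED_COST = 2.444        # $M, total cost when output is zero
B1_MARGINAL_COST = 0.005291  # $M per additional MKWH


def cost_breakdown(output_mkwh: float) -> dict:
    """Split the predicted total cost into fixed and variable parts (all in $M)."""
    variable = B1_MARGINAL_COST * output_mkwh
    return {
        "fixed": B0_FIXED_COST,
        "variable": variable,
        "total": B0_FIXED_COST + variable,
    }


# Hypothetical output level, chosen only for illustration.
breakdown = cost_breakdown(10_000)
print(breakdown)
```

Because b1 carries units of $M per MKWH, multiplying it by an output in MKWH gives $M, which is what lets the two terms be added.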
In the millennial edition of its World Health Report, in 2000, the World Health Organization published a study that compared the successes of the health care systems of 191 countries. The results notoriously ranked the United States a dismal 37th, between Costa Rica and Slovenia. The study was widely misrepresented, universally misunderstood and was, in fact, unhelpful in understanding the different outcomes across countries. Nonetheless, the result remains controversial a decade later, as policy makers argue about why the world’s most expensive health care system isn’t the world’s best.
Application: WHO
WHO data on 191 countries in 1995-1999.
Analysis of Disability Adjusted Life Expectancy = DALE
EDUC = average years of education
DALE = β0 + β1EDUC + ε
The (Famous) WHO Data
The slope is the interesting quantity.
Each additional year of education is associated with an increase of 3.611 in disability adjusted life expectancy.
An increase in per capita GDP of 1,000 PPP units is associated with an increase in ‘happy’ of 1000(0.0018566) = 1.86. ‘Happy’ ranges from 0 to 100.
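The arithmetic behind this statement is just slope times the change in x. A minimal sketch, with the slope taken from the slide and the function name chosen here for illustration:

```python
def predicted_change(slope: float, delta_x: float) -> float:
    """Change in the fitted value of y when x changes by delta_x in a linear model."""
    return slope * delta_x


# Slope of 'happy' on per capita GDP (in PPP units), as reported on the slide.
GDP_SLOPE = 0.0018566

change_per_unit = predicted_change(GDP_SLOPE, 1)          # one PPP unit: tiny
change_per_thousand = predicted_change(GDP_SLOPE, 1_000)  # 1,000 PPP units

print(f"per unit: {change_per_unit:.7f}, per 1,000 units: {change_per_thousand:.2f}")
```

A slope that looks negligible can still be economically meaningful once the change in x is stated in units that match how the variable actually varies.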
Semilog Models and Growth Rates
[Figure: Fitted line plot of LogSalary versus YEARS. LogSalary = 9.841 + 0.04998 YEARS; S = 0.154111, R-Sq = 86.4%, R-Sq(adj) = 86.1%]
LogSalary = 9.84 + 0.05 Years + e
Conclude: The slope is the growth rate per period, or the proportional increase for a 1 unit change in the "x."
Salary = exp(9.84 + 0.05 Years)
At Years = 0, Salary = Starting Salary = exp(9.84) = 18,770.
Marginal change: From year t to year t+1, log Salary goes up by 0.05. Salary changes from exp(9.84 + 0.05Y) to exp(9.84 + 0.05(Y+1)).
Say we go from year 10 to year 11. Salary goes from exp(9.84 + 0.05(10)) to exp(9.84 + 0.05(11)), or from 30,946.03 to 32,532.67, which is an increase of 5.13%. The percentage increase is the same from any year to the next.
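The salary calculation can be reproduced with a short Python sketch. It assumes natural logs and uses the rounded coefficients 9.84 and 0.05 from the slide. The function name is hypothetical.

```python
import math

# Rounded coefficients from LogSalary = 9.84 + 0.05 Years (natural log assumed).
B0, B1 = 9.84, 0.05


def predicted_salary(years: float) -> float:
    """Convert the fitted log salary back to the salary scale."""
    return math.exp(B0 + B1 * years)


starting_salary = predicted_salary(0)
salary_10 = predicted_salary(10)
salary_11 = predicted_salary(11)
growth_10_to_11 = salary_11 / salary_10 - 1

# The proportional change does not depend on the starting year:
# exp(B0 + B1*(t+1)) / exp(B0 + B1*t) = exp(B1) for every t.
growth_20_to_21 = predicted_salary(21) / predicted_salary(20) - 1

print(f"starting salary: {starting_salary:,.0f}")
print(f"year 10: {salary_10:,.2f}, year 11: {salary_11:,.2f}")
print(f"growth 10->11: {growth_10_to_11:.4%}, growth 20->21: {growth_20_to_21:.4%}")
```

The ratio of consecutive predictions is always exp(0.05), which is why the growth rate is constant even though the dollar increase gets larger each year.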
Semilog Model for Fuel Bills
[Figure: Scatterplot of logFuel versus ROOMS]
Each increase of 1 room raises the fuel bill by about 21%. [Actually closer to exp(.215)-1 = 24%.]
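The gap between the slope and the exact percentage change grows with the size of the slope. The following sketch uses the coefficient 0.215 quoted on the slide. The other slope values are arbitrary and included only to show the pattern.

```python
import math


def pct_change_from_log_slope(b: float) -> float:
    """Exact percentage change in y for a 1 unit increase in x when log y = a + b*x."""
    return 100.0 * math.expm1(b)  # expm1(b) = exp(b) - 1


b_rooms = 0.215                       # slope quoted on the slide
exact = pct_change_from_log_slope(b_rooms)
gap = exact - 100.0 * b_rooms         # how far the "slope as percent" reading is off

# Arbitrary slopes, to show that the approximation is fine only when b is small.
for b in (0.01, 0.05, 0.215, 0.5):
    print(f"b = {b:5.3f}  approx = {100 * b:6.2f}%  exact = {pct_change_from_log_slope(b):6.2f}%")
```

For slopes near 0.05, as in the salary model, reading the slope directly as a percentage is nearly exact; for a slope above 0.2 the difference is a couple of percentage points.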
Using Semilog Models for Trends
[Figure: Scatterplot of Flights versus Month]
Frequent Flyer Flights for 72 Months. (Text, Ex. 11.1, p. 508)
Regression Approach
logFlights = β0 + β1 Months + ε
b0 = 2.770, b1 = 0.03710, s = 0.247017 (s² = 0.06102)
[Figure: Fitted line plot of LogFlights versus Month. LogFlights = 2.770 + 0.03710 Month; S = 0.247017, R-Sq = 90.9%, R-Sq(adj) = 90.8%]
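One way the fitted trend might be used for forecasting is sketched below. The coefficients and S are the values reported in the fitted line plot. The band of plus or minus 2S on the log scale is an assumption made here for illustration; it ignores sampling error in b0 and b1, so it is a rough guide rather than an exact prediction interval. The function name is hypothetical.

```python
import math

# Estimates from the fitted line plot: LogFlights = 2.770 + 0.03710 Month, S = 0.247017.
B0, B1, S = 2.770, 0.03710, 0.247017


def forecast_flights(month: int, k: float = 2.0) -> tuple:
    """Return (low, point, high) forecasts of Flights on the original scale.

    The band is exp(log forecast +/- k*S). It ignores estimation error in the
    coefficients, so treat it as a rough range only.
    """
    log_hat = B0 + B1 * month
    return (
        math.exp(log_hat - k * S),
        math.exp(log_hat),
        math.exp(log_hat + k * S),
    )


monthly_growth = math.expm1(B1)          # exact proportional growth per month
low, point, high = forecast_flights(73)  # first month after the 72-month sample

print(f"monthly growth: {monthly_growth:.2%}")
print(f"month 73 forecast: {point:.1f} (rough range {low:.1f} to {high:.1f})")
```

Note that the band is symmetric on the log scale but not on the original scale: the upper limit is farther from the point forecast than the lower limit.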
Elasticity and Loglinear Models
logy = β0 + β1logx + ε
The “responsiveness” of one variable to changes in another
E.g., in economics: demand elasticity = (%ΔQ) / (%ΔP)
Math: Ratio of percentage changes
%ΔQ / %ΔP = {100%[(ΔQ)/Q]} / {100%[(ΔP)/P]}
Units of measurement and the 100% fall out of this equation.
Elasticity = (ΔQ/ΔP)*(P/Q)
Elasticities are units free.
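The claim that elasticities are units free can be checked numerically. In this sketch the prices and quantities are hypothetical numbers, and the function name is chosen here for illustration. The same change is measured once in dollars and single units, and once in cents and dozens.

```python
def elasticity(q0: float, q1: float, p0: float, p1: float) -> float:
    """Ratio of proportional changes, (dQ/Q) / (dP/P), using the starting values as the base."""
    return ((q1 - q0) / q0) / ((p1 - p0) / p0)


# Hypothetical data: price rises from $10 to $11, quantity falls from 100 to 95 units.
e_original = elasticity(q0=100.0, q1=95.0, p0=10.0, p1=11.0)

# Same change, with price in cents and quantity in dozens.
e_rescaled = elasticity(q0=100.0 / 12, q1=95.0 / 12, p0=1000.0, p1=1100.0)

# The slope dQ/dP, by contrast, does depend on the units.
slope_original = (95.0 - 100.0) / (11.0 - 10.0)
slope_rescaled = (95.0 / 12 - 100.0 / 12) / (1100.0 - 1000.0)

print(e_original, e_rescaled)
print(slope_original, slope_rescaled)
```

The two elasticities agree while the two slopes differ by a factor of 1,200, which is the practical reason to report elasticities when variables are measured in arbitrary units. In the loglinear model, β1 plays the role of this elasticity.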
Monet Regression
Using the Residuals
How do you know the model is “good?”
Various diagnostics to be developed over the semester.
But, the first place to look is at the residuals.
Residuals Can Signal a Flawed Model
Standard application: Cost function for output of a production process.
Compare linear equation to a quadratic model (in logs)
(123 American Electric Utilities)
Electricity Cost Function
Candidate Model for Cost
Log c = a + b log q + e
A Better Model?
Log Cost = α + β1 logOutput + β2 [logOutput]^2 + ε
Candidate Models for Cost
The quadratic equation is the appropriate model.
Logc = b0 + b1 logq + b2 [logq]^2 + e
Missing Variable Included
[Figure: Two plots of Residuals versus logOutput (response is logCost): residuals from the quadratic cost model, and residuals from the linear cost model]
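The way residuals reveal a missing squared term can be illustrated with simulated data. The utility data are not reproduced here, so this sketch generates 123 synthetic observations from a quadratic relationship in logs. The coefficients, noise level, and seed are all made up for the example.

```python
import numpy as np

# Synthetic data (NOT the utility data): true relationship is quadratic in logs.
rng = np.random.default_rng(0)
log_q = np.linspace(1.0, 11.0, 123)
log_c = 1.0 + 0.2 * log_q + 0.05 * log_q**2 + rng.normal(0.0, 0.1, size=log_q.size)

# Fit both candidate models by least squares.
linear_coefs = np.polyfit(log_q, log_c, deg=1)
quadratic_coefs = np.polyfit(log_q, log_c, deg=2)

res_linear = log_c - np.polyval(linear_coefs, log_q)
res_quadratic = log_c - np.polyval(quadratic_coefs, log_q)

sse_linear = float(res_linear @ res_linear)
sse_quadratic = float(res_quadratic @ res_quadratic)

# A U-shaped residual pattern shows up as correlation between the residuals
# and the squared deviation of log output from its mean.
centered_sq = (log_q - log_q.mean()) ** 2
corr_linear = float(np.corrcoef(res_linear, centered_sq)[0, 1])
corr_quadratic = float(np.corrcoef(res_quadratic, centered_sq)[0, 1])

print(f"SSE linear = {sse_linear:.3f}, SSE quadratic = {sse_quadratic:.3f}")
print(f"corr(residual, squared term): linear = {corr_linear:.3f}, quadratic = {corr_quadratic:.3f}")
```

In the linear fit the residuals are systematically positive at both ends and negative in the middle, which is the pattern to look for in a residual plot. Once the squared term is included, the residuals are uncorrelated with it by construction.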
Unusual Data Points
[Figure: Regression of Foreign Box Office on Domestic. Overseas = 6.693 + 1.051 Domestic; S = 73.0041, R-Sq = 52.2%, R-Sq(adj) = 52.1%]
Outliers have (what appear to be) very large disturbances, ε
The 500 most successful movies
Outliers
[Figure: Regression of Foreign Box Office on Domestic, with bounds around the fitted line. Overseas = 6.693 + 1.051 Domestic; S = 73.0041, R-Sq = 52.2%, R-Sq(adj) = 52.1%]
Remember the empirical rule: about 99.7% of observations will lie within mean ± 3 standard deviations. (The figure shows (b0+b1x) ± 3se.)
Titanic is 8.1 standard deviations from the regression!
Only 0.86% of the 466 observations lie outside the bounds. (We will refine this later.)
These points might deserve a closer look.
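The ± 3se screening rule can be sketched as follows. The box office data are not reproduced here, so the example uses synthetic data with one planted extreme point. The function name, the seed, and all numbers are hypothetical.

```python
import numpy as np


def flag_outliers(x, y, k: float = 3.0):
    """Fit y = b0 + b1*x by least squares and flag points more than k*se from the line.

    Returns the residuals in units of se and a boolean mask of flagged points.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b1, b0 = np.polyfit(x, y, deg=1)
    residuals = y - (b0 + b1 * x)
    se = np.sqrt(residuals @ residuals / (x.size - 2))  # standard error of the regression
    standardized = residuals / se
    return standardized, np.abs(standardized) > k


# Synthetic data with one planted extreme observation at index 0.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 300.0, size=200)
y = 5.0 + 1.0 * x + rng.normal(0.0, 50.0, size=200)
y[0] += 600.0

standardized, flagged = flag_outliers(x, y)
print(f"point 0 is {standardized[0]:.1f} se from the line")
print(f"share of points flagged: {flagged.mean():.2%}")
```

Flagging is only a screening step. Whether a flagged point is an error or simply an unusual observation still has to be judged from the data.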
logPrice = b0 + b1 logArea + e
Prices paid at auction for Monet paintings vs. surface area (in logs)
Not an outlier: Monet chose to paint a small painting.
Possibly an outlier: Why was the price so low?
What to Do About Outliers
(1) Examine the data.
(2) Are they due to measurement error or obvious “coding errors?” Delete the observations.
(3) Are they just unusual observations? Do nothing.
(4) Generally, resist the temptation to remove outliers. Especially if the sample is large. (500 movies is large.)
(5) Question why you think it is an outlier. Is it really?
Regression Options
Minitab’s Opinions
Minitab uses ± 2S to flag “large” residuals.
Influential observations have very large values of |x_i - x̄|.
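The role of |x_i - x̄| can be made concrete with the leverage of each observation, which in simple regression is 1/n + (x_i - x̄)² / Σ(x_j - x̄)². The slide does not state Minitab's exact rule, so this sketch shows only the standard leverage formula, not Minitab's procedure. The data are hypothetical.

```python
import numpy as np


def leverage(x) -> np.ndarray:
    """Leverage of each observation in a simple regression with an intercept.

    h_i = 1/n + (x_i - xbar)^2 / sum_j (x_j - xbar)^2
    """
    x = np.asarray(x, dtype=float)
    deviations = x - x.mean()
    return 1.0 / x.size + deviations**2 / (deviations @ deviations)


# Hypothetical x values: five clustered points and one far from the mean.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 20.0])
h = leverage(x)

for xi, hi in zip(x, h):
    print(f"x = {xi:5.1f}  |x - xbar| = {abs(xi - x.mean()):5.2f}  leverage = {hi:.3f}")
```

The leverages always sum to 2 in a simple regression (one for the intercept, one for the slope), so a single point with leverage close to 1 is effectively determining the slope on its own.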
On Removing Outliers
Be careful about singling out particular observations this way.
The resulting model might be a product of your opinions, not the real relationship in the data.
Removing outliers might create new outliers that were not outliers before.
Statistical inferences from the model will be incorrect.
Correlation?