Applied Regression -- Prof. Juran
Outline for Session 4
Summary Measures for the Full Model
Top Section of the Output
Interval Estimation
More Multiple Regression
Movers
Nonlinear Regression
Insurance
Slide 3
Top Section: Summary Statistics
Slide 8
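As stated earlier, $R^2$ is closely related to the correlation between X and Y; indeed, in simple regression,
$$R^2 = r_{X,Y}^2.$$
Furthermore, $R^2$, and thus $r_{X,Y}$, is closely related to the slope of the regression line via
$$r_{X,Y} = b_1 \frac{s_X}{s_Y},$$
where $s_X$ and $s_Y$ are the sample standard deviations of X and Y. Thus, testing the significance of the slope, testing the significance of $R^2$, and testing the significance of $r_{X,Y}$ are essentially equivalent.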
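These equivalences can be checked numerically. The sketch below, in plain Python, fits a simple regression by least squares and computes $R^2$, $r_{X,Y}$, and the t-statistics for the slope and for the correlation. The function name `simple_regression_stats` and the toy data are illustrative assumptions, not part of the course materials.

```python
import math

def simple_regression_stats(x, y):
    """Least-squares fit of y = b0 + b1*x, plus R^2, r and two t-statistics."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - sse / s_yy                 # share of variation explained
    r = s_xy / math.sqrt(s_xx * s_yy)          # sample correlation r_{X,Y}
    s = math.sqrt(sse / (n - 2))               # standard error of the regression
    t_slope = b1 / (s / math.sqrt(s_xx))       # t-statistic for H0: slope = 0
    t_corr = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)  # t-statistic for H0: rho = 0
    return {
        "b0": b0, "b1": b1, "r": r, "R2": r_squared,
        "t_slope": t_slope, "t_corr": t_corr,
        "sx": math.sqrt(s_xx / (n - 1)),       # sample standard deviation of x
        "sy": math.sqrt(s_yy / (n - 1)),       # sample standard deviation of y
    }

x = [1, 2, 3, 4, 5]   # toy data, for illustration only
y = [2, 5, 4, 8, 9]
stats = simple_regression_stats(x, y)
```

On any data set without a perfect fit, `R2` equals `r**2`, `r` equals `b1 * sx / sy`, `t_slope` equals `t_corr`, and the overall F statistic $R^2(n-2)/(1-R^2)$ equals `t_slope**2`, which is why the three tests agree.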
Slide 15
Interval Estimation
Slide 16
An Image of the Residuals
[Figure: data points $(x_i, y_i)$ scattered around the fitted regression line; axes X and Y]
The observed values: $(x_i, y_i)$
The fitted values: $\hat{y}_i = b_0 + b_1 x_i$
The residuals: $e_i = y_i - \hat{y}_i$
Recall: the regression line passes through the data so that the sum of squared residuals, $\sum_i e_i^2$, is as small as possible.
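The definitions of fitted values and residuals translate directly into code. The following is a minimal sketch in plain Python; the helper names `fit_line`, `residuals`, and `sse` and the toy data are hypothetical. It also checks the least-squares property: nudging the intercept or slope away from the fitted values only increases the sum of squared residuals.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept b0 and slope b1 for y = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1

def residuals(x, y, b0, b1):
    """e_i = y_i - y_hat_i, where y_hat_i = b0 + b1*x_i is the fitted value."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

def sse(x, y, b0, b1):
    """Sum of squared residuals for any candidate line (b0, b1)."""
    return sum(e ** 2 for e in residuals(x, y, b0, b1))

x = [1, 2, 3, 4, 5]   # toy data, for illustration only
y = [2, 5, 4, 8, 9]
b0, b1 = fit_line(x, y)        # b0 ≈ 0.5, b1 ≈ 1.7
e = residuals(x, y, b0, b1)    # ≈ [-0.2, 1.1, -1.6, 0.7, 0.0]
```

With an intercept in the model, the least-squares residuals also sum to (numerically) zero.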
Slide 17
Regression and Prediction
Regression lines are frequently used to predict future values of Y given future, conjectural, or speculative values of X. Suppose we posit a future value of X, say $x_0$. The predicted value, $\hat{y}_0$, is
$$\hat{y}_0 = b_0 + b_1 x_0.$$
Slide 18
Under our assumptions, this is an unbiased estimate of Y given that $x = x_0$, regardless of the value of $x_0$. Let $\mu_0 = E(Y(x_0))$; since the estimate is unbiased, $\hat{\mu}_0 = b_0 + b_1 x_0$. However, be alert to the fact that this estimate (prediction) of a future value has a standard error of
$$s\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}},$$
where $s$ is the standard error of the regression and $S_{xx} = \sum_{i=1}^{n}(x_i - \bar{x})^2$. Furthermore, the standard error of the prediction of the expected (mean) value of Y given $x = x_0$ is
$$s\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}.$$
Slide 19
From these facts it follows that a 2-sided $100(1-\alpha)\%$ confidence interval on the expected value of Y given $x = x_0$, $\mu_0$, is given by
$$\hat{\mu}_0 \pm t_{\alpha/2,\,n-2}\; s\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}.$$
Slide 20
A 2-sided $100(1-\alpha)\%$ prediction interval on a future individual value of Y given $x = x_0$, $y_0$, is given by
$$\hat{y}_0 \pm t_{\alpha/2,\,n-2}\; s\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}.$$
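Both intervals can be computed side by side. The sketch below is plain Python; the function name `intervals_at` and the toy data are hypothetical. Because the standard library has no t quantile, the caller supplies the critical value $t_{\alpha/2,\,n-2}$ (from a t table, or, if SciPy is available, `scipy.stats.t.ppf(1 - alpha/2, n - 2)`).

```python
import math

def intervals_at(x, y, x0, t_crit):
    """Return (y_hat0, CI on E(Y(x0)), PI on Y(x0)) for a simple regression.

    t_crit is the two-sided critical value t_{alpha/2, n-2}, supplied by the caller.
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))                 # standard error of the regression
    y_hat0 = b0 + b1 * x0                        # point estimate of both mu_0 and y_0
    lever = 1 / n + (x0 - x_bar) ** 2 / s_xx     # grows as x0 moves away from x_bar
    se_mean = s * math.sqrt(lever)               # standard error for the mean E(Y(x0))
    se_pred = s * math.sqrt(1 + lever)           # standard error for one future Y(x0)
    ci = (y_hat0 - t_crit * se_mean, y_hat0 + t_crit * se_mean)
    pi = (y_hat0 - t_crit * se_pred, y_hat0 + t_crit * se_pred)
    return y_hat0, ci, pi

x = [1, 2, 3, 4, 5]   # toy data, for illustration only
y = [2, 5, 4, 8, 9]
T_CRIT = 3.182        # tabulated t_{0.025, 3}: 95% two-sided with n - 2 = 3 df
y_hat0, ci, pi = intervals_at(x, y, x0=4.0, t_crit=T_CRIT)
```

The prediction interval always contains the confidence interval, because its standard error carries the extra $1$ under the square root for the scatter of an individual observation.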
Slide 21
Confidence Interval on $E(Y(x_0))$
Prediction Interval on $Y(x_0)$
Slide 22
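Note that both of these intervals are narrowest at $x_0 = \bar{x}$ and widen as $x_0$ moves away from $\bar{x}$: the squared half-width is a parabolic (quadratic) function of $x_0$, so the interval limits trace curved bands around the regression line. Their widths depend on $s$ and on $S_{xx}$. The sum of squared deviations of x appears so often in regression equations that it is useful to abbreviate it as $S_{xx} = \sum_{i=1}^{n}(x_i - \bar{x})^2$. Note that $S_{xx}$ can easily be obtained from the variance as computed in most spreadsheets or statistics packages: $S_{xx} = (n-1)\,s_x^2$, where $s_x^2$ is the sample variance of x.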
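The variance shortcut for $S_{xx}$ depends on which divisor the package uses. A short sketch with Python's standard `statistics` module (the helper names are hypothetical) shows both conventions: `statistics.variance` divides by $n-1$, so $S_{xx} = (n-1)s_x^2$; `statistics.pvariance` divides by $n$, so $S_{xx} = n\,\sigma_x^2$.

```python
import statistics

def s_xx_direct(x):
    """S_xx from its definition: sum of squared deviations from the mean."""
    x_bar = statistics.fmean(x)
    return sum((xi - x_bar) ** 2 for xi in x)

def s_xx_from_sample_variance(x):
    """S_xx = (n - 1) * s_x^2, with s_x^2 using the n - 1 divisor."""
    return (len(x) - 1) * statistics.variance(x)

def s_xx_from_population_variance(x):
    """S_xx = n * sigma_x^2, with sigma_x^2 using the n divisor."""
    return len(x) * statistics.pvariance(x)

x = [1, 2, 3, 4, 5]   # toy data, for illustration only
print(s_xx_direct(x), s_xx_from_sample_variance(x), s_xx_from_population_variance(x))
```

All three agree; mixing up the two divisors is the usual source of an $S_{xx}$ that is off by a factor of $n/(n-1)$.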
Slide 23
An Image of the Prediction and Confidence Intervals
Slide 27
All-Around Movers
The management question here is whether historical data can be used to create a cost estimation model for intra-Manhattan apartment moves. The dependent variable is the number of labor hours used, which is a proxy for total cost in the moving business. There are two potential independent variables: volume (in cubic feet) and the number of rooms in the apartment being vacated.
Slide 28
Summary Statistics
Slide 33
The Most Obvious Simple Regression
Slide 35
An Alternative Simple Regression Model
Slide 37
A Multiple Regression Model
Slide 39
Preliminary Observations
Volume is the best single predictor, but it may not be useful if customers are expected to collect these data and enter them on a website.
Rooms is a pretty good predictor (not as good as Volume), and may be more useful on a practical basis.
Slide 40
Preliminary Observations
The multiple regression model makes better predictions, but not much better than either of the simple regression models.
The multiple regression model has problems with multicollinearity: Volume and Rooms carry largely overlapping information. Notice the lack of significance for the Rooms variable (and its strange coefficient).
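One common way to quantify multicollinearity is the variance inflation factor (VIF); the slides do not say which diagnostic was used, so this is offered as a standard alternative check. With exactly two predictors, regressing one on the other gives $R^2 = r_{12}^2$, so both share $\text{VIF} = 1/(1 - r_{12}^2)$. The sketch below is plain Python with hypothetical helper names, and the numbers are invented for illustration; they are NOT the All-Around Movers data.

```python
import math

def correlation(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    s_ab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    s_aa = sum((ai - a_bar) ** 2 for ai in a)
    s_bb = sum((bi - b_bar) ** 2 for bi in b)
    return s_ab / math.sqrt(s_aa * s_bb)

def vif_two_predictors(x1, x2):
    """VIF for a model with exactly two predictors: 1 / (1 - r_12^2)."""
    r12 = correlation(x1, x2)
    return 1 / (1 - r12 ** 2)

# Hypothetical numbers, NOT the All-Around Movers data
rooms = [1, 2, 2, 3, 4, 5]
volume = [300, 500, 650, 800, 1100, 1400]   # cubic feet
print(vif_two_predictors(rooms, volume))
```

A VIF of 1 means the predictors are uncorrelated; rules of thumb treat values above roughly 5 to 10 as a sign that individual coefficients (such as the one on Rooms) will be unstable and hard to interpret.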
Slide 41
Prediction intervals correspond to the estimated number of hours for one specific move, given one specific value for the number of rooms.
Confidence intervals correspond to the estimated population average number of hours over a large number of moves, all with the same number of rooms.
Slide 42
Validity of the Rooms Model
Slide 43
Analysis of the Residuals
Slide 45
Comments on the Rooms Model
Good explanatory power
Statistically significant
Points fit the line well
But:
Small apartments tend to be overestimated
Large apartments tend to be badly estimated, especially on the high side
Maybe we could use more data
Maybe the relationship is nonlinear
Slide 46
A Nonlinear Model?
Note: If $A = e^B$, then $\ln(A) = B$.
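The log identity is what lets ordinary linear regression fit an exponential curve. Assuming (the slides do not show the equation) that the nonlinear model has the form $\text{Hours} = A\,e^{B \cdot \text{Rooms}}$, taking logs gives $\ln(\text{Hours}) = \ln(A) + B \cdot \text{Rooms}$, which is linear in Rooms. The sketch below is plain Python with hypothetical function names and exactly exponential toy data, so the fit should recover the generating constants.

```python
import math

def fit_exponential(x, y):
    """Fit y = A * exp(B * x) by least squares on ln(y) = ln(A) + B * x.

    Assumed model form; requires every y > 0. This minimises squared
    errors on the log scale, not on the original scale of y.
    """
    if any(yi <= 0 for yi in y):
        raise ValueError("log transform needs strictly positive y values")
    log_y = [math.log(yi) for yi in y]
    n = len(x)
    x_bar, ly_bar = sum(x) / n, sum(log_y) / n
    B = (sum((xi - x_bar) * (li - ly_bar) for xi, li in zip(x, log_y))
         / sum((xi - x_bar) ** 2 for xi in x))
    ln_A = ly_bar - B * x_bar
    return math.exp(ln_A), B          # back-transform the intercept: A = e^{ln A}

def predict_exponential(A, B, x0):
    """Prediction on the original scale: A * e^{B * x0}."""
    return A * math.exp(B * x0)

rooms = [1, 2, 3, 4]                              # hypothetical values
hours = [3 * math.exp(0.5 * r) for r in rooms]    # exactly exponential toy data
A, B = fit_exponential(rooms, hours)              # recovers A ≈ 3, B ≈ 0.5
```

Because the fit is done on the log scale, its residuals should be compared with the linear model's residuals on the original (hours) scale, as in the residual plots that follow.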
Slide 50
Residual Analysis
[Figure: Histogram of Residuals, two panels (Linear Model, Exponential Model); x-axis: Residual Error (-35 to 35); y-axis: Frequency]
Slide 51
Residual Errors vs. Predictions
[Figure: two panels (Linear Model, Exponential Model); x-axis: Predicted Hours (0 to 60); y-axis: Errors (Hours)]
Slide 52
Residual Errors vs. Rooms
[Figure: two panels (Linear Model, Exponential Model); x-axis: Rooms (0 to 6); y-axis: Errors (Hours)]
Slide 53
Conclusions
Regression analysis is technically easy.
Creating a reliable model is subject to creativity and judgment.
The Rooms model (either linear or otherwise) is reasonably useful for this managerial application.
The most serious estimation problem arises when we try to make predictions for large apartments. What about a separate model for very large apartments?
Slide 55
Insurance Case
Slide 57
The regression with the exponential equation has a higher $R^2$. One "real-world" explanation: companies that generate very high ROAEs are rewarded with higher valuation multiples. The relationship might be exponential rather than linear because an investment compounds at this higher ROAE. The primary driver of the higher $R^2$, however, is that Duck is an outlier in both dimensions: it has a VERY high P/B and ROAE.
Slide 60
What are the implied P/B multiple and implied total value of Circle? Using the fitted regression equation to calculate the implied P/B multiple and plugging in 14.2 for x (ROAE), we get y = 1.387481. Circle's book value of $2.5 billion times the P/B multiple of 1.387481 gives an estimated total value of $3.4687 billion.
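The valuation step itself is simple arithmetic: implied total value equals book value times the implied P/B multiple. In the minimal Python sketch below, the P/B of 1.387481 is taken from the slide as given, since the fitted equation is not reproduced here; the function name is hypothetical.

```python
def implied_total_value(book_value, pb_multiple):
    """Implied total value = book value x implied P/B multiple (same units as book)."""
    return book_value * pb_multiple

# Circle: book value $2.5 billion; implied P/B 1.387481 from the regression at ROAE = 14.2
circle_value = implied_total_value(2.5, 1.387481)   # ≈ 3.4687 ($ billion)
```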
Slide 61
3. Abe has announced that it will be making an acquisition. It is trying to decide whether to pay in stock or in cash.
a. If Abe pays with stock, the pro-forma ROAE of the combined company will be 12.2% and the pro-forma book value will be $16.5 billion. What are the implied P/B multiple and implied total value of the pro-forma company?
b. If Abe pays with cash, the pro-forma ROAE of the combined company will be 15.5% and the pro-forma book value will be $11.5 billion. What are the implied P/B multiple and implied total value of the pro-forma company?
c. If the goal is to maximize the pro-forma total value of the new company, how should Abe pay for the acquisition?
Slide 62
Depending on which version of the equation we use, there are several possible results for the estimated P/B of the new company. Abe should pay in cash, since the total value would be $0.044527 billion higher than if Abe paid in stock.
Slide 63
4. Assume that before the acquisition, Abe has a book value of $11.5 billion and an ROAE of 12.8%. Abe will either issue $5 billion in stock or use $5 billion in cash to complete the acquisition. What incremental value, if any, is created in each of the stock and cash scenarios described above?
Slide 64
Abe's total value before the acquisition is determined by taking its ROAE of 12.8% and applying the regression equation, which gives an implied P/B multiple of 1.189957x. Applying that to its total book value of $11.5 billion, we get an implied total value of $13.68451 billion. Adding in the $5 billion cost of the proposed acquisition, we get an adjusted value for Abe of $18.68451 billion. In both scenarios described in question 3 (stock and cash), the pro-forma total value would be LESS than $18.68451 billion. Thus, NO incremental value is created. (The exact result will vary depending on which model you use.)
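The test above can be written as a hurdle: the pro-forma value must exceed Abe's stand-alone implied value plus the $5 billion deal cost. The Python sketch below reproduces that hurdle from the slide's numbers and also computes the break-even pro-forma P/B for each payment method. The function names are hypothetical, and the pro-forma P/B multiples themselves are not hard-coded because they depend on which regression equation you use.

```python
def value_hurdle(pre_book, pre_pb, deal_cost):
    """Stand-alone implied value plus acquisition cost: what the deal must beat."""
    return pre_book * pre_pb + deal_cost

def incremental_value(pro_forma_book, pro_forma_pb, hurdle):
    """Positive if the pro-forma implied value beats the hurdle; negative if value is destroyed."""
    return pro_forma_book * pro_forma_pb - hurdle

def break_even_pb(pro_forma_book, hurdle):
    """Pro-forma P/B multiple at which incremental value is exactly zero."""
    return hurdle / pro_forma_book

# Abe before the deal: book $11.5 billion, implied P/B 1.189957 (ROAE 12.8%); cost $5 billion
hurdle = value_hurdle(11.5, 1.189957, 5.0)        # ≈ 18.6845 ($ billion)
stock_break_even = break_even_pb(16.5, hurdle)    # ≈ 1.1324 (pro-forma book if paid in stock)
cash_break_even = break_even_pb(11.5, hurdle)     # ≈ 1.6247 (pro-forma book if paid in cash)
```

Comparing each scenario's pro-forma P/B, from whichever equation you use, with these break-even multiples gives the same verdict as comparing total values: a multiple below break-even means no incremental value.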
Slide 68
Summary
Summary Measures for the Full Model
Top Section of the Output
Interval Estimation
More Multiple Regression
Movers
Nonlinear Regression
Insurance
Slide 69
For Session 5
Cigarettes: Build a full multiple regression model of the cigarette data, and answer the questions:
www.ilr.cornell.edu/~hadi/RABE/Data/P081.txt
Cars: Build a multiple regression model of the cars data, using just the quantitative independent variables; we'll talk next time about the qualitative ones.