Lectures 1-13 +Outlines-6381-2016-Work Book Version.pdf

Date post: 28-Feb-2018
  • 7/25/2019 Lectures 1-13 +Outlines-6381-2016-Work Book Version.pdf


    Management 6381: Managerial Statistics

    Lectures and Outlines

    2016 Bruce Cooil


    See the Bottom Right Corner of Each Page for the Document Page Numbers Listed Here.

    TABLE OF CONTENTS

Lecture 1: Descriptive Statistics (Including Stem & Leaf Plots, Box Plots, Regression Example) 1
    Stem & Leaf Display 1
    Descriptive Statistics: Means, Median, Std. Dev., IQR 2
    Box Plot 3
    Regression 10

Lecture 2: Central Limit Theorem & CIs 12
    Statement of Theorem 12
    Simulations 13
    Practical Issues & Examples 15
    Tail Probabilities & Z-Values 16
    Z-Value Notation 17
    Picture of CLT 19
    Everything There Is to Know 20
    Summary & 3 Types of CIs 21
    Glossary 22

Lecture 3: CIs & Introduction to Hypothesis Testing 23
    Examples of Two Main Types of CIs 23
    Hypothesis Testing 25
    Type I & Type II Error 27
    Pictures of the Right and Left Tail P-Values 29
    Big Picture Recap 30
    Glossary 31

Lecture 4: One- & Two-Tailed Tests, Tests on Proportion, & Two Sample Test 32
    When to Use t-Values (Case 2) 34
    Test on Sample Proportion (Case 3) 34
    Means from Two Samples (Case 4) 35
    About t-Distribution 38

Lecture 5: More Tests on Means from Two Samples 39
    Tests on Two Proportions 39
    Odds, Odds Ratio, & Relative Risk 44
    Tests on Paired Samples 45
    Finding the Right Case 47

Lecture 6: Simple Linear Regression 48
    Purposes 48
    The Three Assumptions, Terminology, & Notation 49
    Modeling Cost in Terms of Units 50
    Estimation & Interpretation of Coefficients 51
    Decomposition of SS(Total) 52
    Main Regression Output 53
    Measures of Fit 54
    Correlation 56
    Discussion Questions 57
    Interpretation of Plots 59
    How to Do Regression in Minitab 61
    How to Do Regression in Excel 62


Lecture 6 Addendum: Terminology, Examples & Notation 63
    Synonym Groups 63
    Main Ideas 63
    Examples of Correlation 64
    Notation for Types of Variation and R² 66

Lecture 7: Inferences About Regression Coefficients & Confidence/Prediction Intervals for Ŷ/Y 67
    Modeling Home Prices Using 68
    Regression Output 72
    Two Basic Tests 73
    Test for Lack-of-Fit 74
    Test on Coefficients 75
    Prediction Intervals & Confidence Intervals 76
    How to Generate These Intervals in Minitab 17 77

Lecture 8: Introduction to Multiple Regression 80
    Application to Predicting Product Share (Super Bowl Broadcast) 81
    3-D Scatterplot 82
    Regression Output 84
    Sequential Sums of Squares 85
    Squared Coefficient t-Ratio Measures Marginal Value 86
    Discussion Questions on Interpreting Output 88

Lecture 9: More Multiple Regression Examples 90
    Modeling Salaries (NFL Coaches 2015) 90
    Modeling Home Prices 93
    Regression Dialog Boxes 99

Lecture 10: Strategies for Finding the Best Model 102
    Stepwise Approach 102
    Best Subsets Approach 103
    Procedure for Finding Best Model 104
    Studying Successful Products (TV Shows) 105
    Best Subsets Output 108
    Stepwise Options 109
    Stepwise Output 110
    Best Predictive Model 111
    Regression on All Candidate Predictors to Find Redundant Predictors 113
    Other Criteria for Selecting Models 114
    Discoverers 115

Lecture 11: 1-Way Analysis of Variance (ANOVA) as a Multiple Regression 116
    Comparing Different Types of Mutual Funds 116
    Meaning of the Coefficients 118
    Purpose of Overall F-test and Coefficient t-Tests 120
    Comparing Network Share by Location of Super Bowl 122
    Standard Formulation of 1-Way ANOVA 125
    Analysis of Covariance 126
    Looking Up F Critical Values 128


Lecture 12: Chi-Square Tests for Goodness-of-Fit & Independence 129
    Goodness-of-Fit Test 129
    Test for Independence 130
    Using Minitab 132

Lecture 13: Executive Summary & Notes for Final Exam, Outline of the Course & Review Questions 133
    Executive Summary & Notes for Final 133
    Outline of Course 135
    Review Questions with Answers 140
    Appendix for Review Questions 145

The Outlines: Tests Concerning Means and Proportions & Outline of Methods for Regression 149
    Tests Concerning Means and Proportions 151
    Confidence Intervals for the Seven Cases 153
    Outline of Methods for Regression 154


Lecture 1: Descriptive Statistics (Managerial Statistics)

Reference: Ch. 2: 2.4, 2.6 (pp. 56-59, 67-68); Ch. 3: 3.1-3.4 (pp. 98-105, 108-116, 118-129)

Outline:
- Stem and Leaf displays
- Descriptive Statistics
  Measures of the Center: mean, quartiles, trimmed mean, median
  Measures of Dispersion: standard deviation, interquartile range
- Box plots & Regression as Descriptive Tools

Stem and Leaf Displays

The rules:
1) List extremes separately;
2) Divide the remaining observations into from 5 to 15 intervals;
3) The stem represents the first part of each observation and is used to label the interval, while the leaf represents the next digit of each observation;
4) Don't hesitate to bend or break these rules.

    Famous Ancient Example (modified slightly): Salaries of 10 college graduates in thousands (1950s):

2.1, 2.9, 3.2, 3.3, 3.5, 4.6, 4.8, 5.5, 7.9, 50.

    Stem and Leaf

    (With trimming)

Units: 0.10 Thousand $

    2|19

    3|235

    4|68

    5|5

    6|

    7|9

High: 500

MINITAB's Version: This is an option in the Graph menu, or you can give the commands shown.

    Stem-and-Leaf Displays

    Stem and Leaf Display When Numbers Above are

    in Units of $100,000 (i.e., same data X 100)

UNITS: 0.1 × 100 = 10 Thousand $

    Same Display

    High: 500

    No Trimming! (Here the extreme observations are

    included in the main part of the display.)

MTB > Stem c1

    Leaf Unit = 1.0

    (9) 0 223334457

    1 1

    1 2

    1 3

    1 4

    1 5 0

    With Trimming

    MTB > Stem c1;

    SUBC> trim.

    Leaf Unit = 0.10

    2 2 19

    5 3 235

    5 4 68

    3 5 5

    2 6

    2 7 9

    HI 500
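The trimmed display above can be reproduced with a short sketch. This is a hypothetical helper for illustration (not a Minitab feature): observations are scaled into leaf units, the stem is the leading digit, the leaf is the next digit, and extremes are listed separately.

```python
from collections import defaultdict

def stem_and_leaf(data, leaf_unit=0.1, hi_cutoff=10):
    """Minimal stem-and-leaf sketch: scale each value into leaf units,
    split into stem (tens) and leaf (units), list extremes separately."""
    main = [x for x in data if x < hi_cutoff]
    hi = [x for x in data if x >= hi_cutoff]
    rows = defaultdict(list)
    for x in main:
        scaled = int(round(x / leaf_unit))      # e.g. 2.1 -> 21
        rows[scaled // 10].append(scaled % 10)  # stem 2, leaf 1
    lines = []
    for stem in range(min(rows), max(rows) + 1):
        leaves = "".join(str(leaf) for leaf in sorted(rows.get(stem, [])))
        lines.append(f"{stem}|{leaves}")
    if hi:
        lines.append("HI " + ", ".join(str(int(round(x / leaf_unit))) for x in hi))
    return lines

salaries = [2.1, 2.9, 3.2, 3.3, 3.5, 4.6, 4.8, 5.5, 7.9, 50]
for row in stem_and_leaf(salaries):
    print(row)
```

This reproduces the rows 2|19 through 7|9, with the extreme value listed as HI 500 (in leaf units).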

    Page 1 of 156


Another Example: Make S&L of 11 customer expenditures at an electronics store (dollars): 235, 403, 539, 705, 248, 350, 909, 506, 911, 418, 283.

Units: $10

 3  2|348
 4  3|5
(2) 4|01
 5  5|30
 3  6|
 3  7|0
 2  8|
 2  9|01

Now reconsider the first example with the 10 salaries! I put those 10 observations into the first column of a Minitab spreadsheet (or worksheet) and then asked for descriptive statistics.

    MTB > desc c1

    Descriptive Statistics

    Variable N Mean Median TrMean StDev SE Mean

    Salaries 10 8.78 4.05 4.46 14.58 4.61

    Variable Minimum Maximum Q1 Q3

    Salaries 2.10 50.00 3.12 6.10

What do the "Mean", "TrMean", "Median", "Q1", and "Q3" represent?

    Mean: Average of Sample

    TrMean (5% Trimmed Mean): Average of middle 90% of sample

Median: Middle Observation (when n is even: average of middle two obs.), OR the 50th Percentile

Q1 & Q3 (1st and 3rd quartiles): 25th and 75th Percentiles
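These summary statistics can be checked numerically. Here is a sketch using numpy and scipy; note that Minitab's trimmed-mean and quartile conventions differ slightly from other packages (for this n = 10 sample, trimming one observation from each end reproduces the TrMean shown, but the quartiles may not match Minitab exactly).

```python
import numpy as np
from scipy import stats

salaries = np.array([2.1, 2.9, 3.2, 3.3, 3.5, 4.6, 4.8, 5.5, 7.9, 50.0])

mean = salaries.mean()                      # 8.78
median = np.median(salaries)                # 4.05
trmean = stats.trim_mean(salaries, 0.1)     # drops one obs from each end here
sd = salaries.std(ddof=1)                   # sample std dev (n - 1 divisor)
q1, q3 = np.percentile(salaries, [25, 75])  # quartile conventions vary
```

Note how the single extreme salary (50) pulls the mean far above the median and trimmed mean.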

    Page 2 of 156



Note how the median is a much better measure of a typical central value in this case.

    Recall how standard deviation is calculated.

    First the sample variance is calculated:

s² = estimate of average squared distance from the mean

   = {sum of squared differences (Obs − Mean)²}/(n − 1)

   = {(2.1 − 8.78)² + (2.9 − 8.78)² + ... + (50 − 8.78)²}/9 = 212.6.

Then the sample standard deviation is calculated as the square root of the sample variance:

s = √212.6 = 14.58.

As a descriptive statistic, s is usually interpreted as the typical distance of an observation from the mean. But what does s actually measure?

    Square root of average squared distance from mean

What's the disadvantage of s as a measure of dispersion (or spread)?

    Sensitive to extreme observations (large and small)

What's an alternative measure of dispersion that is insensitive to extremes?

    0.75 * (Q3 - Q1)

[Q3 − Q1] is referred to as the interquartile range (IQR). If the distribution is approximately normal, then

(0.75)(Q3 − Q1) = (0.75) IQR

provides an estimate of the population standard deviation (σ).

For the sample of 10 salaries: (0.75) IQR = 0.75(6.10 − 3.12) = 2.2.

(Compare with s = 14.58.)

The Boxplot

    Elements: Q1, median, and Q3 are represented as a box, and 2 sets of fences

    (inner and outer) are graphed at intervals of 1.5 IQR below Q1 and above Q3.

The figures on pages 122-125 (in our text by Bowerman et al.) provide good illustrations.

    Page 3 of 156



    Inner Fences

    Page 4 of 156


    MINITAB Boxplot of the 10 Salaries

Result of Menu Commands: Graph > Boxplot

[Figure: Boxplot of Salaries]

    Page 5 of 156


    More Examples with Another Data Set Where We Compare Distributions

These data are from http://www.google.com/finance and consist of daily closing prices and returns (in %) for Google stock and the S&P500 index (see the variables Google_Return and S&P500_Return below), and a column of standard normal observations.

    . . .

    Page 6 of 156


    Page 7 of 156


(Recall what the Standard Normal distribution looks like, e.g. http://en.wikipedia.org/wiki/File:Normal_Distribution_PDF.svg.)

    MTB > describe c3 c5 c6

(Or to do the same analysis from menus: start from the Stat menu, go to Basic Statistics & then to Display Descriptive Statistics, then in the dialog box select c3, c5, and c6 as the variables.)

Descriptive Statistics: Google_Return, S&P_Return, Standard_Normal

    Variable N N* Mean SE Mean StDev Minimum Q1 Median Q3 Maximum

    Google_Return 29 1 -0.207 0.260 1.401 -5.414 -0.772 -0.086 0.771 1.674

    S&P_Return 29 1 0.026 0.116 0.624 -1.198 -0.414 0.017 0.534 1.051

    Standard_Normal 30 0 -0.134 0.170 0.931 -1.778 -0.813 -0.184 0.598 1.871

[Figure: Boxplot of Google_Return, S&P_Return, Standard_Normal (data through 22-Apr-16). The low Google_Return outlier is annotated: apparently due to announcement of disappointing quarterly results.]

    Page 8 of 156


    In contrast to the boxplots on the previous page, many business distributions are

    positively skewed. For example, here is a comparison of the revenue distribution

    for the largest firms in three health care industries.

[Figure: Boxplot of 2014 Revenues (Billions) in Three Health Care Industries, for Firms That Are Among the Largest 1000 in U.S.: Insurance & Managed Care (13 firms), Medical Facilities (13 firms), and Pharmacy & Other Services (12 firms). Labeled outliers: United_Health_Group, HCA_Holdings, and Express_Scripts_Holdin.]

    Page 9 of 156


    Page 10 of 156


    The regression equation is approximately:

    Google_Return = - 0.2317 + 0.9408 [S&P_Return]

    This equation describes the relationship between Google returns and

    returns on the S&P500. If we assume the S&P500 represents the market

    as a whole, then this regression model is a form of the market model (or

security characteristic line). It summarizes the relationship between contemporaneous returns, since the Google returns and S&P returns

    occur at the same time. Consequently, it has no direct predictive value

    (i.e., it does not allow me to predict future Google returns because that

    would require me to know future S&P returns). Nevertheless, it allows

    me to study the relationship between Google returns and the market as a

    whole. For example, the expected return on Google stock, when the

    S&P500 return is 0, is:

    Google_Return = - 0.2317 + 0.9408 [0] = -0.23% (approximately)

    The regression analysis also shows that there is a relatively weak

    relationship between the two types of returns: the R-squared (adjusted)

    value (to the right of the plot) indicates that only 14.5% of the variance

    in Google returns is explained by the market return.

    This type of regression is generally done with weekly or monthly

    returns, rather than daily returns (as was done here).
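The market-model fit itself is ordinary least squares. Here is a sketch with numpy on hypothetical daily returns; the lecture's actual Google/S&P500 data are not reproduced here, so the fitted numbers will not match the output above.

```python
import numpy as np

# Hypothetical contemporaneous daily returns (%), for illustration only
sp500 = np.array([-0.5, 0.2, 1.0, -1.2, 0.4, 0.9, -0.3, 0.1])
google = np.array([-0.8, 0.5, 1.4, -2.0, 0.1, 1.2, -0.6, 0.3])

slope, intercept = np.polyfit(sp500, google, 1)  # least-squares line
r2 = np.corrcoef(sp500, google)[0, 1] ** 2       # unadjusted R-squared

# The expected Google return when the market return is 0 is the intercept
expected_at_zero = intercept
```

As in the lecture, the intercept answers "what return do we expect when the market is flat?", and R-squared measures how much of the variance in one return series the market return explains.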

[Figure: Fitted Line Plot: Google_Return = -0.2317 + 0.9408 S&P_Return, with S = 1.29557, R-Sq = 17.5%, R-Sq(adj) = 14.5%.]

    Page 11 of 156


    Lecture 2: The Central Limit Theorem & Confidence Intervals

Outline (Reference: Ch. 7-8: 7.1-7.3, App 7.3, 8.1-8.5, App 8.3):
- The Central Limit Theorem (CLT): Sample Means Have Approximately a Normal Distribution (given a sufficiently large sample, from virtually any parent distribution)
- Illustration of How Sample Proportion is a Sample Mean (with simulations)
- Two Examples: Example 1: sample mean; Example 2: sample proportion
- Introduction to Confidence Intervals (CLT application): Z-values, Picture of CLT, a Quiz, & Major Types of CIs

Assume you have a "large" sample size n, and that you find the sample mean, x̄, as the average of n observations, each of which is from a parent distribution (or population) with mean μ and standard deviation σ.

Statement of the Central Limit Theorem:

The sample mean has approximately a normal distribution with mean μ and standard deviation σ/√n.

Note: A sample proportion (p̂) is simply the mean of n independent observations where each observation takes on the value "1" with probability p, and takes on the value "0" with probability 1−p.

For example, suppose a company studies the probability (p) that an individual customer complains. Think of each customer response as a random variable X, which takes on the value "1" if they complain, and "0" otherwise. Imagine that each customer complains with a probability p = 0.1.

What is the name of the parent distribution of X?

X has a Bernoulli distribution, which is also referred to as a binomial distribution with 1 trial (n=1, p=0.1). The mean (μ) is np = 1 × 0.1 = 0.1, and the standard deviation (σ) is [np(1−p)]^(1/2) = [1 × 0.1 × 0.9]^(1/2) = 0.3.

By the Central Limit Theorem: if I sample 100 customers, then the proportion who complain (p̂) will have a normal distribution (approximately), with the same mean (0.1) but a much smaller standard deviation of σ/√n = 0.3/√100 = 0.03.

    Page 12 of 156


    Here is a picture of the parent distribution.

[Figure: Parent Distribution: Binomial (n=1, p=0.1). Bar chart of frequency vs. value of observation (1: Complaint, probability 0.1; 0: No Complaint, probability 0.9).]

In a simulation, I repeatedly took a random sample of 100 observations from the parent distribution above, and calculated the mean of each sample of 100 observations. I did this 1000 times. Here is a histogram of those 1000 means.

[Figure: Histogram of 1000 Means (Each is the Average of 100 Observations) and Comparison with Normal Distribution. Mean 0.09962, StDev 0.02918, N 1000.]

As predicted by the Central Limit Theorem: this distribution is approximately normal, the sample mean (the mean of the means) is approximately 0.1 (same as the parent), and the sample standard deviation (of the means) is approximately 0.03 (= [parent distribution's std. dev.]/√n = 0.3/√100).
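The simulation just described is easy to replicate. A sketch using numpy (the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
# 1000 samples, each the mean of 100 Bernoulli(p = 0.1) observations
means = rng.binomial(1, 0.1, size=(1000, 100)).mean(axis=1)

print(means.mean())        # close to the parent mean, p = 0.1
print(means.std(ddof=1))   # close to 0.3 / sqrt(100) = 0.03
```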

    Page 13 of 156



Another simulation: suppose we toss one fair die. Here is the probability distribution of the outcome.

[Figure: Parent Distribution: Integers 1-6 Are Equally Probable. Bar chart of frequency vs. outcome of tossed die; each outcome has probability 1/6.]

I repeatedly take a random sample of 2 observations from the parent distribution above, and calculate the mean of each sample of 2 observations. I do this 1000 times. Here is a histogram of those 1000 means (each mean is of only 2 observations).

[Figure: Histogram of 1000 Means (Each is the Average of 2 Observations). Mean 3.539, StDev 1.184, N 1000.]

As predicted by the Central Limit Theorem: this distribution is approximately normal, the sample mean (the mean of the means) is approximately 3.5 (same as the parent), and the sample standard deviation (of the means) is approximately 1.2 (1.2 ≈ [parent distribution's std. dev.]/√n = 1.7/√2).

Page 14 of 156
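This second simulation can be replicated the same way; even with samples of only 2 tosses, the mean and standard error of the sample means already match the CLT's prediction (seed arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
# 1000 samples, each the mean of only 2 fair-die tosses (integers 1-6)
means = rng.integers(1, 7, size=(1000, 2)).mean(axis=1)

print(means.mean())        # close to the parent mean, 3.5
print(means.std(ddof=1))   # close to 1.708 / sqrt(2), about 1.2
```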


    Practical Issues & Two Examples

    How large should n be? Here are two guides:

1) for a typical sample mean, x̄: n > 30 (this is a conservative rule);
2) for a sample proportion: n large enough so that np̂ ≥ 5 and n(1 − p̂) ≥ 5.

    Example 1

Here are descriptive statistics for 40 annual returns on the S&P500 (these returns are simple annual percent gain or loss on the index, without compounding or inclusion of dividends), 1975-2014.

MTB > desc 'S&P_Return'    (Note: StDev/√N = 16.57/√40)

    Descriptive Statistics: S&P_Return

    Variable N N* Mean SE Mean StDev Minimum

    S&P_Return 40 0 13.41 2.62 16.57 -36.55

Variable Q1 Median Q3 Maximum
S&P_Return 4.99 15.75 27.74 37.20

    This summary shows that:

n = 40, x̄ = 13.41 (this is an estimate of μ), s = 16.57 (an estimate of σ); and s/√n = 16.57/√40 = 2.62

Describe the distribution of x̄ (the sample mean), assuming the actual distribution of S&P_Return remains unchanged during 1975-2014: The distribution is approximately Normal with a mean of approximately 13.41 and std. dev. of approx. 2.62.

    Example 2

Suppose I interview 100 people and 20 prefer a new product (to competing brands). I want to estimate: p ≡ proportion of the population that prefers the new brand. (Each customer preference is a Bernoulli observation, with an approx. mean of 0.20 and approx. variance of [0.20 × 0.8] = 0.16.)

In summary, the sample proportion, p̂, is: 20/100 = 0.2.

p̂ behaves as though it has a normal distribution, with a mean of approximately 0.2 (this is our estimate) and a standard deviation of approximately [0.2 × 0.8/100]^(1/2) = 0.04.

Recall that for the Bernoulli distribution: μ = p, σ² = p(1−p). Consequently:

σ/√n = [p(1−p)/n]^(1/2)

    Page 15 of 156


    Tail-Probabilities & the Corresponding Normal Values (Z-values)

[Figure: General Normal Distribution, with upper-tail probabilities 0.10, 0.05, and 0.025 marked along the axis of the normal random variable.]

    Page 16 of 156


    Z-Value Notation

zα is used to represent the standard normal value above which there is a tail probability of α.

[Figure: standard normal density; the area to the right of zα is the tail probability α.]

Verify that z0.10 = 1.28, z0.05 = 1.645, and that z0.025 = 1.96. (Use a normal table, e.g., http://www2.owen.vanderbilt.edu/bruce.cooil/cumulative_standard_normal.pdf.)

    Page 17 of 156

To verify that z0.10 = 1.28: the tail probability is 0.10, so find the z-value that corresponds to a cumulative probability of 0.90. It's 1.28.

To verify that z0.05 = 1.645: the tail probability is 0.05, so find the z-value that corresponds to a cumulative probability of 0.95. It's 1.645.

Verify that z0.025 = 1.96!
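Instead of the printed table, the same z-values can be looked up with scipy's inverse normal CDF; a tail probability of α corresponds to the (1 − α) quantile:

```python
from scipy.stats import norm

z_10 = norm.ppf(0.90)    # z-value with upper-tail probability 0.10
z_05 = norm.ppf(0.95)    # z-value with upper-tail probability 0.05
z_025 = norm.ppf(0.975)  # z-value with upper-tail probability 0.025
```

Rounded to the usual precision these are 1.28, 1.645, and 1.96.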


Cumulative probabilities for POSITIVE z-values are in the following table:

    z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09

    0.0 .5000 .5040 .5080 .5120 .5160 .5199 .5239 .5279 .5319 .5359

    0.1 .5398 .5438 .5478 .5517 .5557 .5596 .5636 .5675 .5714 .5753

    0.2 .5793 .5832 .5871 .5910 .5948 .5987 .6026 .6064 .6103 .6141

    0.3 .6179 .6217 .6255 .6293 .6331 .6368 .6406 .6443 .6480 .6517

    0.4 .6554 .6591 .6628 .6664 .6700 .6736 .6772 .6808 .6844 .6879

    0.5 .6915 .6950 .6985 .7019 .7054 .7088 .7123 .7157 .7190 .7224

    0.6 .7257 .7291 .7324 .7357 .7389 .7422 .7454 .7486 .7517 .7549

    0.7 .7580 .7611 .7642 .7673 .7704 .7734 .7764 .7794 .7823 .7852

    0.8 .7881 .7910 .7939 .7967 .7995 .8023 .8051 .8078 .8106 .8133

    0.9 .8159 .8186 .8212 .8238 .8264 .8289 .8315 .8340 .8365 .8389

    1.0 .8413 .8438 .8461 .8485 .8508 .8531 .8554 .8577 .8599 .8621

    1.1 .8643 .8665 .8686 .8708 .8729 .8749 .8770 .8790 .8810 .8830

    1.2 .8849 .8869 .8888 .8907 .8925 .8944 .8962 .8980 .8997 .9015

    1.3 .9032 .9049 .9066 .9082 .9099 .9115 .9131 .9147 .9162 .9177

    1.4 .9192 .9207 .9222 .9236 .9251 .9265 .9279 .9292 .9306 .9319

    1.5 .9332 .9345 .9357 .9370 .9382 .9394 .9406 .9418 .9429 .9441

    1.6 .9452 .9463 .9474 .9484 .9495 .9505 .9515 .9525 .9535 .9545

    1.7 .9554 .9564 .9573 .9582 .9591 .9599 .9608 .9616 .9625 .9633

    1.8 .9641 .9649 .9656 .9664 .9671 .9678 .9686 .9693 .9699 .9706

    1.9 .9713 .9719 .9726 .9732 .9738 .9744 .9750 .9756 .9761 .9767

    2.0 .9772 .9778 .9783 .9788 .9793 .9798 .9803 .9808 .9812 .9817

    2.1 .9821 .9826 .9830 .9834 .9838 .9842 .9846 .9850 .9854 .9857

    2.2 .9861 .9864 .9868 .9871 .9875 .9878 .9881 .9884 .9887 .9890

    2.3 .9893 .9896 .9898 .9901 .9904 .9906 .9909 .9911 .9913 .9916

    2.4 .9918 .9920 .9922 .9925 .9927 .9929 .9931 .9932 .9934 .9936

    2.5 .9938 .9940 .9941 .9943 .9945 .9946 .9948 .9949 .9951 .9952

    2.6 .9953 .9955 .9956 .9957 .9959 .9960 .9961 .9962 .9963 .9964

    2.7 .9965 .9966 .9967 .9968 .9969 .9970 .9971 .9972 .9973 .9974

    2.8 .9974 .9975 .9976 .9977 .9977 .9978 .9979 .9979 .9980 .9981

    2.9 .9981 .9982 .9982 .9983 .9984 .9984 .9985 .9985 .9986 .9986

    3.0 .9987 .9987 .9987 .9988 .9988 .9989 .9989 .9989 .9990 .9990

    Page 18 of 156


    Picture of the Central Limit Theorem

Acknowledgment: This picture of the Central Limit Theorem is based on a much prettier graph made for this course by Tim Keiningham, Global Chief Strategy Officer and Executive Vice President, Ipsos Loyalty (also a student in an earlier version of this course).

    Page 19 of 156


    Everything There is to Know About the Normal Distribution,

    The Central Limit Theorem, and Confidence Intervals

The Central Limit Theorem states that the distribution of x̄ (the distribution of sample means) is approximately normal with mean μ and variance σ²/n, abbreviated:

x̄ is approximately N(μ, σ²/n), where:

μ is the mean of the distribution of x̄ (μ is also the mean of the population from which the observations were sampled).

x̄ is the sample mean. (The sample is taken from a population with mean μ and variance σ². Think of x̄ as an "estimate" of μ.)

σ²/n is the variance of the distribution of x̄. Also referred to as the variance of x̄.

σ/√n is the standard error of x̄ (the sample mean). It is also sometimes called the SE mean or standard deviation of x̄.

The figure on the top of the previous page indicates:

x̄ is within 1.28 standard errors* of μ with probability 80%.

x̄ is within 1.645 standard errors* of μ with probability 90%.

x̄ is within 1.96 standard errors* of μ with probability 95%.

*Remember that the standard error of x̄ is σ/√n.

    Another Way of Saying the Same Thing

x̄ ± (1.28)(σ/√n) is an 80% confidence interval for μ.

x̄ ± (1.645)(σ/√n) is a 90% confidence interval for μ.

x̄ ± (1.96)(σ/√n) is a 95% confidence interval for μ.

    Page 20 of 156


Brief Summary of Chapter 8

    Three Types of Confidence Intervals Are Introduced in Chapter 8

1) 100(1−α)% confidence interval for μ when n > 30:

x̄ ± z_{α/2} (s/√n).

2) 100(1−α)% confidence interval for p:

p̂ ± z_{α/2} √[p̂(1−p̂)/n].

This assumes n is large enough so that np̂ ≥ 5 and n(1−p̂) ≥ 5.

3) 100(1−α)% confidence interval for μ when n < 30:

x̄ ± t_{α/2}^{(n−1)} (s/√n).

This is the same as confidence interval (1), except that a t-value is now used in place of the z-value.

NOTE: The text refers to "t_{α/2}^{(n−1)}" as "t_{α/2}".

General Form of Confidence Intervals: Estimate ± [(t- or z-value) × (Standard Dev. of Estimate)]

Example 1: Consider the return data: n = 40, x̄ = 13.41, s/√n = 2.62.
Find a 90% confidence interval for μ:

13.41 ± 1.645 (2.62)

Find a 95% C.I. for μ:

13.41 ± 1.96 (2.62)

Example 2: Consider the product preference example above.
Here p̂ = 0.2 and n = 100. Find a 90% confidence interval for p (the actual proportion):

0.2 ± (1.645)√[.2(.8)/100]

Find an 80% C.I. for p:

0.2 ± 1.28(0.04)

Example 3: Suppose we consider only the last 16 changes in the S&P:
n = 16, x̄ = 6.93, s/√n = 4.69. Must use a t-value because n < 30.
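These three examples can be checked with a few lines of scipy; the z- and t-values come from the inverse CDFs (for a 100(1 − α)% interval, use the 1 − α/2 quantile):

```python
import math
from scipy.stats import norm, t

# Example 1: large-sample CI for the mean (n = 40)
xbar, se = 13.41, 2.62
z05 = norm.ppf(0.95)                        # 1.645 for a 90% interval
ci90 = (xbar - z05 * se, xbar + z05 * se)   # about (9.1, 17.7)

# Example 2: CI for a proportion (n = 100, p_hat = 0.2)
p_hat, n = 0.2, 100
se_p = math.sqrt(p_hat * (1 - p_hat) / n)   # 0.04

# Example 3: small sample (n = 16), so use a t-value with 15 df
xbar3, se3 = 6.93, 4.69
t005 = t.ppf(0.995, df=15)                  # about 2.947 for a 99% interval
ci99 = (xbar3 - t005 * se3, xbar3 + t005 * se3)
```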


    Glossary

    Reference: Chapter 5 (pp. 188, 190) versus Chapter 3 (pp. 100, 110)

The Mean of a Distribution: μ = Σ x P(x). (1)

The mean of a distribution (or a random variable X) is simply the weighted average of its realizable outcomes, where each realizable value is weighted by its probability, P(x).

Contrast this definition with the definition of a sample mean:

x̄ = Σ x (1/n). (2)

The only difference is that (1/n), the frequency with which each observation occurs in the sample, replaces P(x) in equation (1).

The Variance of a Distribution: σ² = Σ (x − μ)² P(x). (3)

The variance of a distribution (of a random variable X) may also be calculated as σ² = E(X²) − μ². Note the first term in this last expression is just the expectation or average value of X².

Standard Deviation of a Distribution:

σ = [Σ (x − μ)² P(x)]^(1/2) = √(σ²). (4)

Compare this with the definition of the sample standard deviation:

s = [Σ (x − x̄)²/(n − 1)]^(1/2) = {[Σ x² − n x̄²]/(n − 1)}^(1/2).

(The sample variance is: s² = [Σ x² − n x̄²]/(n − 1).)

    _________________________________________________________________

    ANSWERS to Examples (on Bottom of Previous Page)

Example 1

1) 90% CI: x̄ ± z_{0.05}(s/√n) = 13.41 ± 1.645 (2.62) = 13.41 ± 4.31 OR: (9.1, 17.7)

2) 95% CI: x̄ ± z_{0.025}(s/√n) = 13.41 ± 1.96 (2.62) = 13.41 ± 5.13 OR: (8.3, 18.5)

Example 2

1) 90% CI: p̂ ± z_{0.05}√[p̂(1 − p̂)/n] = 0.2 ± (1.645)√[.2(.8)/100] = 0.2 ± 0.066 OR: (13%, 27%)

2) 80% CI: p̂ ± z_{0.10}√[p̂(1 − p̂)/n] = 0.2 ± 1.28(0.04) = 0.2 ± 0.051 OR: (15%, 25%)

Example 3

99% CI: x̄ ± t_{0.005}^{(15)}(s/√n) = 6.93 ± 2.947 (4.69) = 6.93 ± 13.82 OR: (-6.9, 20.8)

Page 22 of 156


Lecture 3

Confidence Intervals for Means and Proportions & Introduction to Hypothesis Testing (Large Sample Mean)

Outline (Ch. 9: 9.1-9.2)
- Recap of C.I.s for Means and Proportions
- One-tailed tests on sample mean
- What is type I error? Type II error? Power?
- Everything to Know About the Test Statistic and P-value

Recap of Confidence Intervals

Example 1

I have just designed a new type of mid-size car with a hybrid engine. To determine its average fuel efficiency (mpg), I sample 30 miles-per-gallon measurements from 30 different cars (city driving).

MTB > print c1

    MPG

    70.4 70.5 70.8 71.2 72.5 73.5 75.1 77.0 77.2 77.4 77.9 78.3

    80.3 80.9 81.1 81.4 84.2 84.2 84.3 85.4 85.6 86.3 86.3 86.7

    89.3 89.7 89.9 90.6 91.0 92.1

    MTB > stem c1;

    SUBC> trim.

Stem-and-Leaf Display: MPG

Stem-and-leaf of MPG N = 30; Leaf Unit = 1.0

    4 7 0001

    6 7 23

    7 7 5

    11 7 7777

    12 7 8

    (4) 8 0011

    14 8

    14 8 44455

    9 8 666

    6 8 999

    3 9 01

    1 9 2

    MTB > desc c1

Descriptive Statistics: MPG

Variable N Mean SE Mean TrMean StDev Minimum Q1

    MPG 30 81.37 1.24 81.43 6.80 70.40 76.53

    Variable Median Q3 Maximum

    MPG 81.25 86.40 92.10

Note: SE Mean (or 1.24) = s/√n = 6.8/√30

[Figure: Boxplot of MPG]

    Page 23 of 156


Find a 95% confidence interval for the real mean mpg (μ) and interpret it.

C.I.: x̄ ± z_{0.025} s/√n = 81.37 ± 1.96 (1.24)

= 81.37 ± 2.43 or (78.9, 83.8)

Interpretation: This covers the real mean (μ) with 95% probability.

    Would an 80% confidence interval be longer or shorter?

    Shorter!

(Use z_{0.10} = 1.28, and the interval becomes (79.8, 83.0).)

(The convention is: use t-values when n < 30.)


Hypothesis Testing

Reconsider the new hybrid car example (example 1). Suppose that I want to show that my new car has an average mpg (μ) that is better than that of the best performing competitor, for which the average mpg is 78. Formally, I want to "disprove" a null hypothesis

H0: μ = 78 (or sometimes written as μ ≤ 78)

in favor of the alternative hypothesis:

H1: μ > 78.

Note that: n = 30, x̄ = 81.37, s = 6.8, s/√n = 1.24. (For n < 30, the procedure is identical except when we find the critical value. That case will also be discussed.)

To build a case for H1, I follow 3 logical steps (typical of all hypothesis testing).

1) Assume H0 is true.

2) Construct a test statistic with a known distribution (using H0).

In this case I use the test statistic, z ≡ [x̄ − 78]/(s/√n),

which should have approximately a standard normal

distribution if H0 is true. (WHY? CLT, since n is large.)

3) Reject H0 in favor of H1 if the value of z supports H1.

("Large" values of z support H1 in this case.)

Regarding step 3, if H0 is true, I would see values of z greater than z0.05 = 1.645

only 5% of the time. This seems improbable and it supports H1, and so a reasonable

decision rule is: reject H0 in favor of H1 if z is greater than 1.645. This assumes

that I am willing to make a mistake 5% of the time.


In this sample,

z = [x̄ − 78]/(s/√n) = [81.37 − 78]/1.24 = 2.72 > 1.645.

Therefore, I reject H0 in favor of H1.

SUMMARY: to test H0: μ = 78 versus H1: μ > 78

we use the decision rule: reject H0 if

z = [x̄ − 78]/(s/√n) > zα,

or equivalently if: x̄ > 78 + zα(s/√n).

Otherwise we accept H0.
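The decision rule can be checked numerically. A minimal sketch in Python (standard library only; 1.24 is the rounded standard error s/√n used in the notes):

```python
from statistics import NormalDist

xbar, se, mu0, alpha = 81.37, 1.24, 78.0, 0.05

z = (xbar - mu0) / se                       # test statistic: about 2.72
z_crit = NormalDist().inv_cdf(1 - alpha)    # z_0.05: about 1.645

reject = z > z_crit                         # True: reject H0 at the 0.05 level
```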

In this case z = 2.72, so I reject the null hypothesis H0 at the 0.05 level, and

conclude in favor of the alternative hypothesis H1. That is, I conclude that

the average mpg of the new hybrid automobile is significantly greater than

78, but using this decision rule (i.e., rejecting H0 whenever z > z0.05) there is

a 5% chance that I have erroneously rejected H0 and that the real average

mpg (μ) really is only 78 (or less).

Above we chose α = 0.05, so that z0.05 = 1.645. This probability α is referred

to as the significance level, and it is the maximum probability of making a

type I error: type I error refers to the error we make if we reject H0 when H0

is in fact true. Typically we use α = 0.001, 0.010, 0.025, 0.05, 0.1, or 0.2,

so that zα = 3.09, 2.33, 1.96, 1.645, 1.28, or 0.84, respectively

(the corresponding t-values are very similar for moderate values of n:

for n = 20: t(19) = 3.6, 2.5, 2.1, 1.7, 1.3, or 0.86;

for n = 30: t(29) = 3.4, 2.5, 2.0, 1.7, 1.3, or 0.85).

Suppose that I had chosen α = 0.001; then, since z0.001 = 3.09 and z = 2.72,

I would accept H0 because z = 2.72 is not greater than z0.001 = 3.09. In this case, I would be

concerned that I made a type II error. Type II error refers to the case where

the null hypothesis H0 is really false but I fail to reject it! The following

figure summarizes the situation with type I and II errors.


DECISION        WHAT IS REALLY TRUE
                H0 IS TRUE          H1 IS TRUE
REJECT H0       Type I Error        Correct Decision
ACCEPT H0       Correct Decision    Type II Error

Good Lingo: "Cannot Reject H0" can be used for "Accept H0."

Bad Lingo: "Accept H1" should not be used for "Reject H0."

How do we protect against:

Type I Error? Small α.   Type II Error? Large n.

Note that to make a decision on whether to reject or accept H0: μ = 78, we simply

need to compare the test statistic z = [x̄ − 78]/(s/√n) with an appropriate normal

value, zα, that corresponds to the significance level that is chosen beforehand. If

z > zα, we reject H0 (otherwise accept H0).

[Figure: distribution of the test statistic (Z) when H0 is true, with z0.05 = 1.645, z = 2.72, and z0.001 = 3.09 marked on the horizontal axis.]

Alternatively, we could simply look up the tail probability that corresponds to

the test statistic z (this is called the p-value) and compare it to the

significance level α. If the p-value is less than α (p-value < α), we reject H0

(otherwise accept H0).

In this case z = 2.72, and the p-value for H0: μ = 78 versus H1: μ > 78 is the

right tail probability (because this is a one-tailed test where the alternative

hypothesis goes to the right side). What is the p-value in this case?

P-value (probability to the right of 2.72) = 1 − [cumulative probability at 2.72]

= 1 − 0.9967 ≈ 0.0033.

Can we reject H0 at the 0.05 level? YES. At the 0.001 level? NO!
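The same p-value comes straight from the standard normal CDF; a short Python sketch (standard library only):

```python
from statistics import NormalDist

z = 2.72
p_value = 1 - NormalDist().cdf(z)   # right-tail probability: about 0.0033

# Reject H0 exactly when the p-value falls below the significance level:
reject_at_05 = p_value < 0.05       # True: reject at the 0.05 level
reject_at_001 = p_value < 0.001     # False: cannot reject at the 0.001 level
```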

(P-value: the probability to the right of z.)


Another One-Tailed Test (in the other direction):

Suppose I make the claim that my car's average mpg is 80 (the

observations on page 1 were really drawn from a normal distribution

with μ = 80). My competitor might be interested in testing:

H0: μ = 80 (sometimes written μ ≥ 80) versus H1: μ < 80.

And suppose my competitor chooses a significance level of α = 0.10.

In this case the test statistic is:

z = [x̄ − 80]/(s/√n) = [81.37 − 80]/1.24 = 1.10,

and small values of z support the alternative hypothesis H1, so the decision rule is: reject H0 if z < −z0.1 = −1.28.


Alternatively, we can find the p-value that corresponds to the test

statistic, z, for this hypothesis test and compare it with α, and (as always)

we only reject H0 if the p-value is less than α. Remember that when

the alternative hypothesis goes to the left side, the p-value refers to the

tail probability to the left of the test statistic z. Given the way the p-value

is calculated, we always reject H0 when p-value < α, and accept H0

otherwise.

Given the test statistic z = 1.10 for H0: μ = 80 versus H1: μ < 80, the p-value is the left-tail probability at 1.10, which is 0.86, so we accept H0. What was the p-value when the alternative hypothesis was H1: μ > 78 and the test statistic was z = 2.72?

This was calculated above as 0.0033.
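All three kinds of tail probability can be checked with the standard normal CDF; a sketch in Python (standard library only; z is rounded to 1.10 as in the notes, and the two-sided version anticipates the next lecture's two-tailed test):

```python
from statistics import NormalDist

phi = NormalDist().cdf         # standard normal CDF

p_left = phi(1.10)             # left-tail p-value for H1: mu < 80: about 0.86
p_right = 1 - phi(2.72)        # right-tail p-value for H1: mu > 78: about 0.0033
p_two = 2 * (1 - phi(1.10))    # two-sided p-value for H1: mu != 80: about 0.27
```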

[Figures: two standard normal curves. In the first, the p-value for z = 1.10 is shaded as a left-tail probability, because H1 goes to the left. In the second, the p-value for z = 2.72 is shaded as a right-tail probability, because H1 goes to the right.]


Big Picture Recap

Let μ0 represent the constant benchmark to which we wish to compare μ, and consider three scenarios.

                         1) H0: μ = μ0      2) H0: μ = μ0      3) H0: μ = μ0
H0 also written as:      μ ≤ μ0             μ ≥ μ0             No Other Way
Alternative H1:          H1: μ > μ0         H1: μ < μ0         H1: μ ≠ μ0
Critical Value:          zα                 -zα                zα/2
Reject H0 if:            z > zα             z < -zα            |z| > zα/2
  (z is the test statistic)
Definition of p-value:   tail prob. right   tail prob. left    2 × tail prob.
                         of z               of z               beyond |z|
Example:                 Example 1          Example 2          Example 3
                                                               (new example)

[Figures omitted: the test statistic and the p-value (shaded area) relative to the standard normal distribution, for each scenario.]

Null Hypothesis:         H0: μ = 78         H0: μ = 80         H0: μ = 80
Alternative:             H1: μ > 78         H1: μ < 80         H1: μ ≠ 80
Significance Level:      α = 0.05           α = 0.10           α = 0.10
Test Statistic:          z = 2.72           z = 1.10           |z| = 1.10
P-value:                 0.0033             0.86               0.27
Critical Value:          z0.05 = 1.645      -z0.10 = -1.28     z0.10/2 = z0.05 = 1.645
Decision:                Reject H0          Accept H0          Accept H0, because
                                                               |1.10| < 1.645


Glossary

α = significance level = maximum probability of making a type I error.

p-value = tail probability that corresponds to the test statistic, calculated for the

specific alternative hypothesis H1.

β = probability of making a type II error (not rejecting H0 when H1 is true).

Power = 1 − β = probability of making the correct decision when H1 is true.

How does power change with sample size?

Power increases as sample size increases (ceteris paribus).

Because as n increases, the test statistic becomes larger in absolute

value, and is more likely to exceed the critical value in the appropriate

direction. See the 3rd-to-last row of the table on the last page (i.e., the

test statistic formulas). Another way to think about it: as the test

statistic becomes larger in absolute value in the direction supporting H1,

the p-value decreases.

How does power change with α?

Power increases as α increases (ceteris paribus).

Because as α increases, the critical value decreases in absolute value,

and is more likely to be exceeded by the test statistic; see the

penultimate row of the last page (i.e., the critical values and how they

change with α).

    Bruce Cooil, 2016


    Lecture 4

    One and Two-Tailed Tests, Tests on a Sample

    Proportion, & Introduction to Tests on Two Samples

Main References

(1) Ch.9: 9.3-9.4, Summary, Glossary, App. 9.3; Ch.10: 10.1

(2) The Outline "Tests on Means and Proportions"

(referred to as "The Outline")

Topics

I. Tests on Means and Proportions from One Sample (Reference: 9.3-9.4)

    Example of a two-tailed test (Case 1)
    When to use t-values (Case 2)
    Tests on a sample proportion (Case 3)

II. Tests on Means from Two Samples (Ref: 10.1)

    Tests on means from two large samples (Case 4)
    Tests on means when it is appropriate to assume variances are equal (Case 5)

I. Tests on Means & Proportions from One Sample

Summary of Last Time (1-Tailed Versions of Case 1)

Last time we first considered the one-tailed hypothesis test:

H0: μ = 78 versus H1: μ > 78.

(OR H0: μ ≤ 78.) In this case we use the decision rule: reject H0 if:

z = [x̄ − 78]/(s/√n) > zα,

or equivalently if x̄ > 78 + zα(s/√n). Otherwise we

accept H0.


Then we considered the one-tailed test going the other way (μ

still represents the mean mpg of my new hybrid). I make the

claim that the average mpg is 80, and so my competitor wants to

test:

H0: μ = 80 versus H1: μ < 80.

The decision rule will be to reject H0 in favor of H1 if:

z = (x̄ − 80)/(s/√n)

(here: z = (81.37 − 80)/1.24 = 1.10)

supports H1. If x̄ is calculated using observations from a

distribution where μ < 80 (as my competitor believes is the

case), then we will tend to get small values of z. So the decision

rule would be: reject H0 in favor of H1 if

z = (x̄ − 80)/(s/√n) < −zα

(or equivalently if: x̄ < 80 − zα(s/√n)).

[Note that this is just Case 1 in the outline: μ0 refers to the constant used

in the null hypothesis, which is "80" in this last case.]

Example of a 2-tailed Test

A two-tailed test would be:

H0: μ = 80 versus H1: μ ≠ 80.

So, for example, if α = 0.05, we would reject H0 in favor of H1 if

|z| > z0.025 (because α/2 = 0.025).

What do we conclude if we do this 2-tailed test?

(Recall that: n = 30, x̄ = 81.37, s/√n = 1.24.)

Test Stat: z = (81.37 − 80)/1.24 = 1.10 (SAME as above)

Critical Value: z0.025 = 1.96

Conclusion: Accept H0.

(μ is not significantly different from 80.)


When to Use t-Values

Case 2 (p.1 of outline) is identical to Case 1, except that the

critical values in Case 2 are based on the t-statistic. When

should we use Case 2?

A good conservative approach is to always use t-values when

you have to estimate σ (which is always), but it does not make

much of a difference if n > 30.

Let's do the two-tailed test above, using Case 2 (α = 0.05):

H0: μ = 80 versus H1: μ ≠ 80 (SAME)

Test Statistic: t = 1.10 (Same)

Critical Value: t0.05/2(29) = t0.025(29) = 2.045

Conclusion: Accept H0.

Tests on a Sample Proportion

Example: Let p = proportion of customers in the population that

prefer my new product. Suppose I need to test (α = 0.05):

H0: p = 0.1 versus H1: p > 0.1.

This is Case 3 in the outline (with p0 = 0.1). The details are just

like Case 1 except we use a different standard error (this is the only

difference from Case 1).

If 30 of 100 (p̂ = 0.3) randomly selected customers prefer my

product, can I show that more than 10% of the population of

all customers prefer my product at the 0.05 level?

Test Statistic:

z = (p̂ − p0)/√(p0(1 − p0)/n) = (0.3 − 0.1)/√(0.1 × 0.9/100)

  = 0.2/0.03 = 6.67 > 1.645 = z0.05.

Critical Value: z0.05 = 1.645

Conclusion: Reject H0 (YES!).
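A sketch of the Case 3 computation in Python (standard library only); note that the standard error uses the null value p0, not the sample proportion:

```python
from statistics import NormalDist

phat, p0, n = 0.30, 0.10, 100

se0 = (p0 * (1 - p0) / n) ** 0.5          # standard error under H0: 0.03
z = (phat - p0) / se0                     # test statistic: about 6.67

z_crit = NormalDist().inv_cdf(0.95)       # z_0.05: about 1.645
reject = z > z_crit                       # True: reject H0 at the 0.05 level
```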


II. Means from Two Samples

Case 4: What To Do When Both Samples Are Large

Example:

The owner of two fast-food restaurants wants to compare the

average drive-thru experience times for lunch customers at each

restaurant (experience time is the time from when the vehicle

entered the line to when the entire order was received). There is

reason to believe that Restaurant 1 has lower average experience

times than Restaurant 2 because its staff has more training.

Suppose n1 experience times during lunch are randomly selected

for Restaurant 1, and n2 from Restaurant 2, with the following results

(units: minutes):

n1 = 100, sample mean x̄1, s1 = 0.7;

n2 = 50, sample mean x̄2, s2 = 0.5.

Why do we use Case 4 on page 1 of the outline?

Both samples are ≥ 30 (& independent).

If we want to show Restaurant 1 has a lower average experience

time, what are the appropriate hypotheses and what can we

conclude (at the 0.1 level)?

H0: μ1 − μ2 = 0 (OR: ≥ 0). In Outline: D0 = 0.

H1: μ1 < μ2, OR μ1 − μ2 < 0.

Test Statistic:

z = (x̄1 − x̄2 − D0)/√(s1²/n1 + s2²/n2)

  = (x̄1 − x̄2)/√(0.7²/100 + 0.5²/50) = (x̄1 − x̄2)/0.0995.

Critical Value: -z0.10 = -1.28   Conclusion: Reject H0. (YES!)

What would happen if we test at the 0.01 level?

New Critical Value: -z0.01 = -2.33 (Still Reject H0)

Is there any reason to pick α in advance?

Yes, it's more objective!


Would Welch's t-test (p. 376) make a difference?

In this case we use the same test statistic but compare it with a

critical value from the t-distribution with degrees of freedom df

given by Welch's formula (shown below; here df ≈ 130).

So for α = 0.1 and α = 0.01, the critical values are

t0.1(df) = 1.29 and t0.01(df) = 2.35, respectively, and

the conclusions are the same in each case!
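The standard error and Welch degrees of freedom for this example can be verified numerically. A sketch in Python (only the pieces that do not depend on the sample means, which are not reproduced above, are computed):

```python
n1, s1 = 100, 0.7      # Restaurant 1
n2, s2 = 50, 0.5       # Restaurant 2

v1, v2 = s1**2 / n1, s2**2 / n2
se = (v1 + v2) ** 0.5                      # standard error: about 0.0995

# Welch's degrees of freedom:
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))   # about 130
```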

Case 5: What If We Are Willing To Assume Equal

Variances?

Example: I'm comparing weekly returns on the same stock

over two different periods. The average sample return is larger

during period 2. Can one show that the return during period 2

is significantly higher than during period 1 at the 0.01 level?

The data are: n1 = 21, sample mean x̄1 (%), sample variance s1²;

n2 = 11, sample mean x̄2 (%), sample variance s2².

What are the appropriate hypotheses?

H0: μ1 − μ2 = 0

H1: μ1 < μ2, i.e., μ1 − μ2 < 0.

It may be risky to rely only on the CLT. (Why? Because n2 = 11 is small.)

Technically I make 3 additional assumptions if I use Case 5:

(1) observations are approximately normal,

(2) the two populations have equal variances, and

(3) samples are independent.

(Welch's degrees of freedom for the restaurant example above:

df = (s1²/n1 + s2²/n2)² / [ (s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1) ]

   = (0.7²/100 + 0.5²/50)² / [ (0.7²/100)²/99 + (0.5²/50)²/49 ] ≈ 130.)


The test statistic in Case 5 allows us to use a pooled estimate of

the variance:

sp² = [(n1 − 1)s1² + (n2 − 1)s2²]/(n1 + n2 − 2).

The test statistic is:

t = (x̄1 − x̄2)/(sp √(1/n1 + 1/n2)) = −2.6.

(This is just like Case 4, with sp used in place of s1 and s2.)

Suppose I do this test at the 0.01 significance level. What would

be the critical value for the test statistic "t" and what would be

the conclusion?

Critical Value: −t0.01(30) = −2.457

Conclusion: Reject H0. (YES!)

What would be the two-tailed test in this case? (Specify H0 &

H1.) Also give the critical value and conclusion if testing at the

0.01 level.

H0: μ1 − μ2 = 0 versus H1: μ1 − μ2 ≠ 0

Test Statistic: t = −2.6 (same as for the one-tailed test)

Critical Value: t0.01/2(30) = t0.005(30) = 2.75

Conclusion: Accept H0. (No!)

(Because |t| = 2.6 < 2.75.)
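The pooled-variance mechanics can be sketched as a small Python function. The sample means and standard deviations for the stock-return example are not reproduced above, so the inputs below are hypothetical values, chosen only to illustrate how the pooled t-statistic and its degrees of freedom are computed:

```python
def pooled_t(x1, s1, n1, x2, s2, n2):
    # Pooled variance: a weighted average of the two sample variances.
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    se = (sp2 * (1 / n1 + 1 / n2)) ** 0.5
    return (x1 - x2) / se, n1 + n2 - 2     # (t statistic, degrees of freedom)

# Hypothetical weekly-return summaries (percent), for illustration only:
t, df = pooled_t(0.10, 1.2, 21, 1.25, 1.1, 11)   # df = 30, as in the example
```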


About the t-Distribution (Reference: Bowerman, et al., pp. 344-346)

According to the Central Limit Theorem, x̄ (the sample mean of n observations)

has approximately a normal distribution with mean μ and standard deviation σ/√n.

Also, this approximation improves as the sample size, n, increases. Consequently,

by the Central Limit Theorem, the standardized mean,

Z = (x̄ − μ)/(σ/√n),

has approximately a standard normal distribution. We have been using this single

result to justify the construction of confidence intervals and hypothesis tests.

When using this result, we have generally been approximating σ by substituting

the sample standard deviation, s, for it. If the sample is large enough, this

doesn't impose much additional error. But when samples are smaller (e.g., n < 30),

the convention is to accommodate the additional error (caused when using s for σ)

by using the fact that if the original distribution was normal, then the t-statistic,

t = (x̄ − μ)/(s/√n),

really has what is referred to as a t-distribution with n − 1 degrees of freedom. The

degrees-of-freedom number, n − 1, refers to the amount of information that the

sample standard deviation, s, contains about the true standard deviation σ. If we

have only 1 observation, we have no information about σ (n − 1 = 1 − 1 = 0); if we

have 2 observations, we have essentially 1 piece of information about σ, and so on.

This is the reason we divide by the degrees of freedom, n − 1, when calculating s:

s = √[ Σ(xᵢ − x̄)² / (n − 1) ].

The real question becomes: why should we use the t-distribution when it relies

on the strong assumption that the original distribution is normal, which is

exactly the type of assumption we were trying to avoid by using the Central Limit

Theorem?! The answer is essentially this: by using t-values in place of z-values

we are doing something that accommodates the additional inaccuracy we generate

by using s to estimate σ, and in practice it works quite well even when the parent

distribution is not normal! Of course, t-values converge to z-values as the sample

size increases: see the t-table.


Lecture 5: More Tests on Means from Two Samples

Outline (Reference: Bowerman et al., 10.2-10.3, Appendix 10.3; the Outline

"Tests Concerning Means and Proportions")

Tests on Two Proportions (Case 6, Ch. 10.3)

Everything to Know About Odds, Odds Ratios and Relative Risk

Tests on Paired Samples (Case 7, Ch. 10.2)

Tests on Two Proportions (Case 6: Large Samples)

This example comes from an article, "10 Most Popular

Franchises," published in the Small Business section of

CNN.com (April, 2010):

http://money.cnn.com/galleries/2010/smallbusiness/1004/gallery.Franchise_failure_rates/index.html.

(More recent data through early 2016 consist primarily of a

smaller sample of settled loans from the same period:

http://fitsmallbusiness.com/best-franchises-sba-default-rates/# .)

It provides franchise failure rates based on loan data from the

Small Business Administration (October, 2000 through

September, 2009), and it illustrates all of the issues one will

typically face when comparing rates (expressed as proportions).

The 10 most popular franchises are: 1) Subway, 2) Quiznos,

3) The UPS Store, 4) Cold Stone Creamery, 5) Dairy Queen,

6) Dunkin' Donuts, 7) Super 8 Motel, 8) Days Inn, 9) Curves for

Women, and 10) Matco Tools. Super 8 Motel and Days Inn have

the highest start-up costs (average SBA loan sizes are 0.91 and

1.02 million dollars, respectively), and nominally Super 8

Motels seem to have a lower failure rate. Here are the data.

                 SBA Loans   Failures*

Super 8 Motel       456         18

Days Inn            390         23

*Failures are loans in liquidation or charged off.


Is there a higher failure rate for SBA loans to Days Inn than for

Super 8 Motel at the 0.05 level?

H0: p1 − p2 = 0 (or ≤ 0)

H1: p1 − p2 > 0

(Where p1 = proportion of Days Inn failures;

p2 = proportion of Super 8 Motel failures.)

Are the sample sizes sufficiently large to use the normal

approximation in CASE 6?

(In Case 6, the relevant sample sizes are the number of successes and failures in

each sample; each must be at least 5, i.e., n1p̂1, n1(1 − p̂1), n2p̂2, n2(1 − p̂2) ≥ 5.)

YES, all 4 counts are ≥ 5.

The sample estimates of p1 and p2 are:

p̂1 = 23/390 = 0.0590; p̂2 = 18/456 = 0.0395.

Consequently, the test statistic (with D0 = 0) is:

z = (p̂1 − p̂2 − D0)/√(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2)

  = (0.0590 − 0.0395 − 0)/√(0.0590(0.9410)/390 + 0.0395(0.9605)/456)

  = 0.0195/0.0150 = 1.30.


OR, following the text's approach (which is appropriate only

when the null hypothesis states that the proportions are equal),

we could also use the overall rate of failure to calculate the

standard error of the test statistic. Since

p̄ = (x1 + x2)/(n1 + n2) = (23 + 18)/(390 + 456) = 41/846 = 0.0485 (see data on p.1),

the test statistic becomes:

z = (0.0590 − 0.0395)/√(0.0485(0.9515)(1/390 + 1/456))

  = 0.0195/0.0148 = 1.32.

With either test statistic we get essentially the same result:

Critical Value: z0.05 = 1.645

Conclusion: Accept H0 (No, the rate at Days Inn is not significantly higher.)

Which approach does MINITAB take?

[Screenshot: MINITAB menu, with options annotated by case: Case 1, Case 2, Cases 4 & 5, Case 7, Case 3, Case 6.]

  • 7/25/2019 Lectures 1-13 +Outlines-6381-2016-Work Book Version.pdf

    46/160

    4

If We Do NOT Pool (which is the default, unless we click on the "use pooled estimate..." option):

    Test and CI for Two Proportions

    Sample X N Sample p

    1 23 390 0.058974

    2 18 456 0.039474

    Difference = p (1) - p (2)

    Estimate for difference: 0.0195007

    95% CI for difference: (-0.00992794, 0.0489293)

    Test for difference = 0 (vs not = 0): Z = 1.30

    P-Value = 0.194

Wait!! This p-value is for a two-sided test!

We need the p-value for H1: p1 > p2, which is: 0.194/2 = 0.097

=> Accept H0.

(Three options are provided here:

1) both samples in one column;

2) each sample in its own column;

3) summarized data.

I could have selected the appropriate one-sided alternative

here but instead used the default option, the two-sided test.)

[Figure: standard normal curve; the reported p-value (0.194) is the sum of two tail probabilities, so the one-sided tail probability is 0.194/2 = 0.097. Note: the default setting is to not pool, and D0 = 0.]


    If We Pool:

    Test and CI for Two Proportions

    Sample X N Sample p

    1 23 390 0.058974

    2 18 456 0.039474

    Difference = p (1) - p (2)

    Estimate for difference: 0.0195007

    95% CI for difference: (-0.00992794, 0.0489293)

Test for difference = 0 (vs not = 0):

Z = 1.32  P-Value = 0.188   (1-sided p-value = 0.188/2 = 0.094.)
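Both versions of the test statistic, and the one-sided p-value, can be reproduced directly; a sketch in Python (standard library only):

```python
from statistics import NormalDist

x1, n1 = 23, 390    # Days Inn: failures, SBA loans
x2, n2 = 18, 456    # Super 8 Motel: failures, SBA loans
p1, p2 = x1 / n1, x2 / n2

# Unpooled standard error (MINITAB's default):
se_u = (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) ** 0.5
z_u = (p1 - p2) / se_u                          # about 1.30

# Pooled standard error (appropriate when H0 says p1 = p2):
pbar = (x1 + x2) / (n1 + n2)
se_p = (pbar * (1 - pbar) * (1 / n1 + 1 / n2)) ** 0.5
z_p = (p1 - p2) / se_p                          # about 1.32

p_one_sided = 1 - NormalDist().cdf(z_u)         # about 0.097
```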

Other Caveats and Notes

1) p̂1 & p̂2 may seriously underestimate actual rates of

failure, since the study includes recent loans to franchises

that probably will fail within 5 years (but had not yet

failed during the study period). To get better estimates,

each loan should be observed over a period of equal

duration. For example, we might observe each over a 5-year

period (from the time the loan is granted), and

p̂1 & p̂2 would then be legitimate estimates of the failure

rate of SBA loans to each franchise.

2) Sometimes data of this type are summarized in terms of

odds and odds ratios, especially in health/medical care

applications.


Odds, Odds Ratio and Relative Risk

Definition

If an event occurs with probability p, then the odds of it occurring are

defined as p/(1 − p).

In the Example

If we use p̂1 = 6% as an estimate of the failure rate at Days Inn

franchises and p̂2 = 4% as the corresponding estimate for Super 8

Motel franchises, then the odds of failure are:

Days Inn franchises: 0.06/(1 − 0.06) = 0.0638;

Super 8 franchises: 0.04/(1 − 0.04) = 0.0417.

And the odds ratio (or ratio of the odds of failure for Days Inn versus

Super 8 Motels) is:

Odds Ratio = [p̂1/(1 − p̂1)] / [p̂2/(1 − p̂2)] = 0.0638/0.0417 = 1.53,   (1)

indicating that the odds of failure are about 1.5 times higher for the Days Inn

franchises. (To turn this into a health care example: imagine

companies are people, and that failure is a disease to which certain

people are more susceptible.)

Alternatively, since this is a prospective study, sometimes the results

are summarized in terms of the relative risk of failure (for Days Inn versus

Super 8), which is simply the ratio of p̂1 to p̂2:

Relative Risk = 0.06/0.04 = 1.5,   (2)

indicating that failure is 1.5 times more likely for the Days Inn. Of

course, remember that p̂1 & p̂2 are not good estimates, which is a

common problem in health/medical applications. Also, p̂1 is not even

significantly larger than p̂2 at the 0.05 level!
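A sketch of these calculations in Python, using the rounded rates from the text:

```python
p1, p2 = 0.06, 0.04            # rounded failure rates: Days Inn, Super 8

odds1 = p1 / (1 - p1)          # odds of failure, Days Inn: about 0.0638
odds2 = p2 / (1 - p2)          # odds of failure, Super 8: about 0.0417
odds_ratio = odds1 / odds2     # about 1.53
relative_risk = p1 / p2        # about 1.5
```

For rare events (small p), the odds p/(1 − p) are close to p itself, which is why the odds ratio (1.53) and the relative risk (1.5) nearly agree here.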


If I expect the CGM Focus Fund to outperform the Fidelity Growth

Strategies Fund during 2010-2015, I might ask the following research

question: Does the CGM Focus have an average return that is

significantly more than 0.5% higher than the average annual

return of the Fidelity Growth Strategies during 2010-2015 (α = 0.1)?

Then: H0: μCGM − μFidelity = 0.5 (OR ≤ 0.5);  H1: μCGM − μFidelity > 0.5.

    The actual data are below.

Year   CGM Focus Fund   Fidelity Growth Strategies Fund   Difference: d = CGM − Fidelity

2010        16.94               25.63                           -8.69
2011       -26.29               -8.95                          -17.34
2012        14.23               11.78                            2.45
2013        37.61               37.87                           -0.26
2014         1.39               13.69                          -12.30
2015        -4.11                3.17                           -7.28
Mean         6.63               13.87                           -7.24

We can't apply Cases 4 or 5 to this problem because the annual returns

are from the same years and are affected by the same market forces.

Consequently,

the two samples are not independent!

But we can take differences (CGM minus Fidelity; see the last column

in the table above) and apply Case 2 to the single sample of differences.

The following hypotheses are equivalent to the ones above but are

written in terms of the differences:

H0: μDifferences = 0.5 (OR ≤ 0.5)   H1: μDifferences > 0.5.

The mean and standard deviation of the six differences are:

d̄ = −7.24;  s_d = √[Σ(dᵢ − d̄)²/(n − 1)] = 7.38. Thus, the standard error of the

mean is: s_d/√n = 7.38/√6 = 3.01. Here are the details of the Case 2 test.

Test statistic: t = (d̄ − 0.5)/(s_d/√n) = (−7.24 − 0.5)/3.01 = −2.57

Critical Value: tα(n − 1) = t0.10(5) = 1.476

Conclusion: Accept H0 (No!)

(The average difference makes clear that we cannot reject H0: Fidelity

outperforms CGM. But we formally apply the test anyway, as an illustration.)

(Because t = −2.57 is not greater than 1.48.)
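The paired computation (Case 2 applied to the differences) can be reproduced in Python (standard library only):

```python
from statistics import mean, stdev

d = [-8.69, -17.34, 2.45, -0.26, -12.30, -7.28]   # CGM minus Fidelity, 2010-2015
n = len(d)

dbar = mean(d)              # about -7.24
sd = stdev(d)               # sample standard deviation: about 7.38
se = sd / n ** 0.5          # standard error of the mean: about 3.01

t = (dbar - 0.5) / se       # test statistic: about -2.57
```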


Addendum: Finding the Right Case

1 Sample
    Mean
        Large Sample (Case 1)
        Small Sample (Case 2)
    Proportion (Case 3)

2 Samples
    Means (OR: always use Welch's t)
        Large Samples (Case 4)
        When Either Sample Is Not Large, Use Welch's t
        Equal Variances (Case 5)
    Proportions (Case 6)
    Paired Samples (Case 7)

    Bruce Cooil, 2016


Lecture 6: Simple Linear Regression

Outline: Main reference is Ch. 13, especially 13.1-13.2, 13.5, 13.8

The Why, When, What and How of Regression

Purposes of Regression

Three Basic Assumptions: Linearity, Homoscedasticity, Random Error

Estimation and Interpretation of the Coefficients

Decomposition of SS(Total) = Σᵢ (yᵢ − ȳ)²

(See the third equation on page 492: SS(Total) is referred to there as Total Variation.)

Measures of fit: MS(Error) (the variance of error), R² (Adjusted)

    Purposes

    1. To predict values of one variable (Y) given the values of

    another (X). This is important because the value of X may

    be easier to obtain, or may be known earlier.

    2. To study the strength or nature of the relationship between

    two variables.

3. To study the variable Y by controlling for the effects (or removing the effects) of another variable X.

    4. To provide a descriptive summary of the relationship

    between X and Y.

Assumptions

The basic model is of the form:

(1) y = β0 + β1x + ε,

where β0 and β1 are called coefficients, and represent unknown

constants (that will be estimated in the regression analysis), and

ε is used to represent random error. The error, ε, is assumed


to come from a distribution with mean 0 and constant variance

σ². The main result of the regression analysis is to provide

estimates of the coefficients so that we can use the estimated

regression equation,

(2) ŷ = b0 + b1x,

to predict Y.

Notes on Terminology and Notation

ŷ is the predicted value and is referred to as the "fit" or the

"fitted value."

The residuals, eᵢ (the observed errors), are defined as the

difference between the actual and the predicted value of Y,

i.e.,

eᵢ = [residual for observation i] = yᵢ − ŷᵢ.

Note that the theoretical error term, εᵢ, from equation (1), is

slightly different from the residuals:

εᵢ ≡ yᵢ − (β0 + β1xᵢ) versus eᵢ ≡ yᵢ − (b0 + b1xᵢ).

Formally the model makes the assumption that the errors (the

εᵢ) are a random sample from a distribution with mean 0 and

variance σ². This one assumption is sometimes referred to in 3

parts.

1. Linearity: there is a basic linear relationship between y and

x as shown in (1), which is equivalent to saying that the

real mean of the errors (the εᵢ) is 0.

2. Homoscedasticity: the variance of the errors εᵢ is constant

for all yᵢ.

3. Random Error: the errors εᵢ are independent from one

another.


Two plots provide a way of checking these assumptions:

To check linearity: the plot of y versus x;

To check linearity, homoscedasticity and randomness: the

plot of the residuals, (yᵢ − ŷᵢ), versus the fitted values, ŷᵢ.

Plots of standardized residuals versus fits are especially useful.

Imagine I have developed a special new product and that I

develop a model to estimate the cost of producing it, using data

from the first 5 orders.

Order   Number of Units (x)   Cost (y) ($1000)   Predicted Cost (fit, ŷ)   Residual (y − ŷ)

  1             1                     6                    5                      1
  2             3                    14                   11                      3
  3             4                    10                   14                     -4
  4             5                    14                   17                     -3
  5             7                    26                   23                      3

    Page 50 of 156


Estimation and Interpretation of Coefficients

[Plot: Cost (y) versus Units (x), showing the observations (open circles), the fitted values (solid circles), and the estimated regression line.]

In the plot above, the open circles are the actual observations of y & x (cost & units), and the solid circles are the fitted values of y & x (predicted, or fitted, cost & units). The vertical distances between open circles and solid circles represent the observed errors, or residuals, of the regression model. The estimated regression line is:

(3)  ŷ = b0 + b1x = 2 + 3x.

The two estimated coefficients,

b0 = 2 and b1 = 3,

are chosen to minimize the sum of squared residuals or errors that are made when we use the estimated regression equation to predict cost (y). Note that in this case the sum of squared residuals (or errors) is (see the last column of the table on the previous page):

SS(Error) = Σ (yi − ŷi)² = 1² + 3² + (-4)² + (-3)² + 3² = 44.

This is the smallest value of the sum of squared errors obtainable among all possible choices of b0 and b1.

Please interpret these coefficients.

b0: predicted (or average) value of Y when X = 0 (in this application it is the fixed cost);

b1: average change in Y per unit change in X (in this application it is the variable cost).
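The least-squares estimates b0 = 2 and b1 = 3 can be reproduced with a short sketch (a minimal illustration using the five orders from the table and only Python's built-ins; this is not the lecture's Minitab workflow, and the variable names are illustrative):

```python
# Least-squares slope and intercept for the 5-order cost example.
x = [1, 3, 4, 5, 7]          # units ordered
y = [6, 14, 10, 14, 26]      # cost in $1000
n = len(x)
x_bar = sum(x) / n           # 4.0
y_bar = sum(y) / n           # 14.0

# b1 = sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx               # slope (variable cost)
b0 = y_bar - b1 * x_bar      # intercept (fixed cost)

fits = [b0 + b1 * xi for xi in x]
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fits))
print(b0, b1, sse)           # 2.0 3.0 44.0
```

Any other choice of b0 and b1 would make the printed sum of squared errors larger than 44.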

    Page 51 of 156


Decomposition of SS(Total)

Without this regression model, we might be forced to use the average, ȳ, to predict future values of y. To get an indication of how well ȳ would do as a prediction, we can find the sum of squared differences between each yi & ȳ:

Σ (yi − ȳ)² = (6-14)² + (14-14)² + (10-14)² + (14-14)² + (26-14)² = 224

(see the 3rd column of the table on the next page). This sum of squares is referred to as SS(Total), i.e.,

SS(Total) = Σ (yi − ȳ)² = 224.

The regression model succeeds in reducing the uncertainty about y if SS(Error) is significantly less than SS(Total). Also, regression models actually allow us to decompose SS(Total) into two parts, SS(Error) and SS(Regression):

SS(Total) = SS(Regression) + SS(Error);

where: SS(Regression) = Σ (ŷi − ȳ)² = the sum of squares of the fitted values around their mean (the mean of the ŷ values is also ȳ)
= (5-14)² + (11-14)² + (14-14)² + (17-14)² + (23-14)²
= 180

(see the 4th column of the table on the next page).

    Page 52 of 156


So in this case, the decomposition of SS(Total) works out as follows:

SS(Total) = SS(Regression) + SS(Error)
    224   =      180       +    44.

Summary of the Decomposition of SS(Total)

Units(x)   Cost(y)   (y − ȳ)²     (ŷ − ȳ)²      (y − ŷ)²
   1          6      (6-14)²      (5-14)²         1²
   3         14      (14-14)²     (11-14)²        3²
   4         10      (10-14)²     (14-14)²       (-4)²
   5         14      (14-14)²     (17-14)²       (-3)²
   7         26      (26-14)²     (23-14)²        3²
TOTALS:              224       =  180          +  44
Name of SS:          SS(Total) =  SS(Regress.) +  SS(Error)
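The identity SS(Total) = SS(Regression) + SS(Error) can be verified numerically (a minimal Python sketch using the observed and fitted costs from the table; variable names are illustrative):

```python
# Verify SS(Total) = SS(Regression) + SS(Error) for the cost example.
y = [6, 14, 10, 14, 26]       # observed cost ($1000)
fits = [5, 11, 14, 17, 23]    # fitted cost from the line 2 + 3x
y_bar = sum(y) / len(y)       # 14.0 (also the mean of the fits)

ss_total = sum((yi - y_bar) ** 2 for yi in y)                # 224
ss_reg = sum((fi - y_bar) ** 2 for fi in fits)               # 180
ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, fits))    # 44
assert ss_total == ss_reg + ss_error
print(ss_total, ss_reg, ss_error)  # 224.0 180.0 44.0
```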

Minitab Summary: Main Regression Output of Version 17
(See Page 11 for a Comparison with Excel)

Regression Analysis: Cost(y) versus Units(x)

Analysis of Variance
Source       DF  Adj SS  Adj MS  F-Value  P-Value
Regression    1  180.00  180.00    12.27    0.039
  Units(x)    1  180.00  180.00    12.27    0.039
Error         3   44.00   14.67    <- Adj MS(Error) = variance of error
Total         4  224.00            <- SS(Total); 224/4 = variance of Y

Model Summary  (measures of fit)
      S    R-sq   R-sq(adj)   R-sq(pred)
3.82971  80.36%      73.81%       38.11%

Coefficients
Term       Coef  SE Coef  T-Value  P-Value   VIF
Constant   2.00     3.83     0.52    0.638
Units(x)  3.000    0.856     3.50    0.039  1.00

Regression Equation
Cost(y) = 2.00 + 3.000 Units(x)

Note: "MS" refers to "Mean Square," which is always the corresponding SS (Sum of Squares) divided by DF (degrees of freedom): MS = SS/DF.

Page 53 of 156


Measures of Fit (Model Summary)

Note that on the line just below the Analysis of Variance table in the MINITAB output, there are 4 primary measures of fit:

s = 3.83, R-sq = 80.4%, R-sq(adj) = 73.8%, R-sq(pred) = 38.1%.

The first three can be calculated using the information in the Analysis of Variance table. The standard deviation s represents the estimated standard deviation of the residuals, or observed errors:

s = [Variance of observed errors]^(1/2)
  = [SS(Error)/(n − [# parameters in model])]^(1/2)
  = [44/(5 − 2)]^(1/2) = [14.67]^(1/2) = 3.83.

[The text calls "s" the "standard error," and writes it simply as s. See the shaded expressions on page 479.]

Note that the variance of the errors, 14.67, is also provided in the Analysis of Variance table, and is often referred to there as "MS(Error)" or "Adj MS(Error)," which is shorthand for "adjusted mean squared error." "Mean Square" is generally used as a synonym for "variance." For these data:

s = [MS(Error)]^(1/2) = [14.67]^(1/2) = 3.83.

Another obvious overall measure of how well the model performs would be:

R² = SS(Regression)/SS(Total),

which is the proportion of SS(Total) generated or "explained" by the model. R² is often referred to as the "coefficient of determination."

    Page 54 of 156


In this example:

R² = SS(Regression)/SS(Total) = 180/224 = 0.804 = 80.4%

(OR: R² = 1 − SS(Error)/SS(Total) = 1 − 44/224),

where "180," "44," and "224" are all shown in the Analysis of Variance table.

A better measure of fit is found by adjusting R² so that it estimates the proportion of the variance of y that is explained by the fitted values from the model. This proportion is referred to as "R²(Adjusted)":

R²(Adj) = 1 − [Variance of Error]/[Variance of Y] = 1 − MS(Error)/MS(Total).

In this case:

R²(Adj) = 1 − MS(Error)/[SS(Total)/(n − 1)]
        = 1 − 14.67/[224/4]
        = 1 − 14.67/56 = 0.738 = 73.8%.

Note that "14.67" is shown in the Analysis of Variance table.

R²(Adj) represents the proportion of the variance of Y that is "explained" (or generated) by the regression equation, while s represents the estimated standard deviation of the residuals. In this example, 73.8% of the variance in cost (Y) is "explained" by the model that uses units (X) as a predictor, and the standard deviation of the errors made by this model is 3.8 thousand dollars.
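All three fit measures can be reproduced directly from the sums of squares in the Analysis of Variance table (a minimal Python sketch; the variable names are illustrative, not Minitab output):

```python
# Measures of fit from the Analysis of Variance quantities.
ss_error, ss_total, ss_reg = 44.0, 224.0, 180.0
n, n_params = 5, 2            # 5 observations; intercept + slope

ms_error = ss_error / (n - n_params)             # 14.67 = variance of errors
s = ms_error ** 0.5                              # std. dev. of residuals
r_sq = ss_reg / ss_total                         # unadjusted R^2
r_sq_adj = 1 - ms_error / (ss_total / (n - 1))   # adjusted R^2
print(round(s, 2), round(r_sq, 3), round(r_sq_adj, 3))  # 3.83 0.804 0.738
```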

    Page 55 of 156


In simple linear regression, R² (unadjusted) is generally written as "r²" and it represents the squared correlation coefficient (also see page 494 of the text). The estimated correlation between cost (Y) and units (X) is 0.896. See the correlation matrix below.

MTB > corr c1 c2 c3
Correlations (Pearson)
           Units(x)   Cost(y)
Cost(y)       0.896
              0.039
FITS1         1.000     0.896
                  *     0.039

Cell Contents: Correlation
               P-value
In the spreadsheet: c1: Units(x); c2: Cost(y); and c3: FITS1.

Formulas for Correlation (Pearson Correlation):

r = [ Σ (xi − x̄)(yi − ȳ)/(n − 1) ] / { [Σ (xi − x̄)²/(n − 1)]^(1/2) × [Σ (yi − ȳ)²/(n − 1)]^(1/2) }

(sums run over i = 1, …, n).

(See pp. 125-127, 492-495 of the text for more examples and discussion.)

Alternatively: r = (sign of b1) × [square root of R² (from simple regression)].

In the example: r = +√0.804 = 0.896.

Note that in general, r is between −1 and +1.
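The Pearson formula can be checked against the example data (a Python sketch using only the math module; names are illustrative):

```python
# Pearson correlation between units (x) and cost (y), from the formula above.
from math import sqrt

x = [1, 3, 4, 5, 7]
y = [6, 14, 10, 14, 26]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
den = sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1)) * \
      sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
r = num / den
print(round(r, 3))   # 0.896
```

Squaring the result recovers the unadjusted R² of 0.804, as the text notes.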

    Page 56 of 156


    Discussion Questions

    (The Regression Output Is Redisplayed on the Next Page.)

1. Use the regression equation to predict the cost (Y) when number of units (X) is 4.

ŷ = b0 + b1X = 2 + 3(4) = 14 thousand dollars

2. What was the actual cost for an order when units = 4? (What is the residual or error at that point?)

From page 3: when units (X) is 4, Y is 10; thus: residual = y − ŷ = 10 − 14 = −4.

3. What is the sample variance of cost (Y)? (See next page.)

SY² = SS(Total)/4 = 224/4 = 56   (SY = √56 = 7.5 thousand)

4. What is the estimated variance of the residuals (or errors) of the regression?

s² = MS(Error) = 14.67 (find this in the Analysis of Variance table!)

5. How good is the fit? There are two ways of measuring fit (see the Model Summary):

s = 3.83 thousand dollars
R²(Adjusted) = 73.8%
(74% of the variance in cost (Y) is explained by the model.)

6. Show how R²(Adjusted) is related to the variance of cost and the variance of the residuals.

R²(Adj) = 1 − (s²/SY²) = 1 − 14.67/56

7. Show how R² (unadjusted) is related to the correlation between cost (Y) and units (X).

R² = r² (r represents the sample correlation; this only works in simple linear regression!!)

(On next page: R² = 0.804; on page 9: r = 0.896.)

    Page 57 of 156


    Appendix

    Comparison of MINITAB Output (Versions 14-17) with Excel

    MINITAB 17 Output

Regression Analysis: Cost(y) versus Units(x)

Analysis of Variance
Source       DF  Adj SS  Adj MS  F-Value  P-Value
Regression    1  180.00  180.00    12.27    0.039
  Units(x)    1  180.00  180.00    12.27    0.039
Error         3   44.00   14.67   [#4]
Total         4  224.00           [224/4 = Var(Y) for #3]

Model Summary
      S    R-sq   R-sq(adj)   R-sq(pred)
3.82971  80.36%      73.81%       38.11%   [#5]

Coefficients
Term       Coef  SE Coef  T-Value  P-Value   VIF
Constant   2.00     3.83     0.52    0.638
Units(x)  3.000    0.856     3.50    0.039  1.00

Regression Equation
Cost(y) = 2.00 + 3.000 Units(x)

(The bracketed #3, #4, and #5 mark the quantities used in discussion questions 3-5.)

Excel Output
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.896421
R Square            0.803571
Adjusted R Square   0.738095
Standard Error      3.829708
Observations        5

ANOVA
             df    SS    MS         F          Significance F
Regression    1   180   180        12.27273    0.039389
Residual      3    44    14.66667
Total         4   224

              Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept     2             3.829708        0.522233  0.637618  -10.18784  14.18784   -10.18784    14.18784
X Variable 1  3             0.856349        3.503245  0.039389    0.274716   5.725284    0.274716    5.725284

    Page 58 of 156


    Interpreting the Plot of Residuals versus Fit

One way to check on the three assumptions (linearity, homoscedasticity and random error) is to plot the residuals (errors) against the predicted (or fitted) values ŷ.

There are hardly enough observations here to be very confident in the assumptions. But in general we look for symmetry around the horizontal line through zero as an indication that the assumptions of linearity and randomness are met. To confirm homoscedasticity, we look for roughly constant vertical dispersion around the horizontal line through zero.
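A rough version of the standardized residuals mentioned in Lecture 6 simply divides each residual by s (a simplified Python sketch; Minitab's standardized residuals also adjust for leverage, so its values differ slightly):

```python
# Rough standardized residuals for the cost example: e_i / s.
residuals = [1, 3, -4, -3, 3]
n, n_params = 5, 2
s = (sum(e ** 2 for e in residuals) / (n - n_params)) ** 0.5   # 3.83
standardized = [round(e / s, 2) for e in residuals]
print(standardized)   # [0.26, 0.78, -1.04, -0.78, 0.78]
# Values well inside +/-2 raise no red flags in a residual plot.
```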

The ideal situation generally looks something like the following plot.

[Plot: "Residuals Versus the Fitted Values (response is C3)"; x-axis: Fitted Value, y-axis: Residual.]

    Page 59 of 156


Here is a situation where the linearity assumption is violated.

[Plot: "Residuals Versus the Fitted Values (response is C3)"; x-axis: Fitted Value, y-axis: Residual.]

Here is a common situation (below) where homoscedasticity is violated: notice how the residuals show increasing vertical dispersion around the horizontal line through zero as the fitted values increase.

[Plot: "Residuals Versus the Fitted Values (response is C3)"; x-axis: Fitted Value, y-axis: Residual.]

    Page 60 of 156


How to Do This Regression Analysis in MINITAB (Minitab 17)

    Page 61 of 156


    How to do a Regression Analysis in Excel

    Click into the Data menu and check for the Data Analysis

    option (far right).

If the Data Analysis option is not there:

Start from the File menu → click on Options → click on Add-Ins → select Analysis ToolPak & hit Go near the bottom of the dialog box.

Otherwise start from the Data menu:

Click on Data Analysis (far right), select Regression, and then specify the Y- and X-range in the dialog box.

    (You can simply click into each range box and then move themouse directly into the spreadsheet to select the numerical

    data cells from the appropriate column(s) of the spreadsheet.

    The appropriate range of cells should then appear in the range

    box. The Input X Range may consist of several columns,

    each column for a different predictor.)

    Other Good References

    See page 519 of the text for an example with great screenpictures. Another good reference is:

    www.wikihow.com/Run-Regression-Analysis-in-Microsoft-Excel.

    Page 62 of 156


    Lecture 6 Addendum: Terminology, Examples, and Notation

    Regression Terminology

    Synonym Groups

    1) Y, Dependent Variable, Response Variable

    2) X, Predictor Variable, Independent Variable

3) ŷ, Prediction, Predicted Value, Fit, Fitted Value
4) Variance of Y, MS(Total), Adj MS(Total)
5) Variance of Error, MS(Error), Adj MS(Error)
6) e, Error, Residual
7) Coefficients are sometimes referred to using the more general term parameters. Coefficients are the parameters that are used in linear models.

    Main Ideas

Simple linear regression refers to a regression model with only one predictor. The underlying theoretical model is:

y = β0 + β1x + ε,

where y represents a value of the dependent variable, x is a value of the predictor, ε represents random error, and β0 and β1 represent unknown constants. The corresponding estimated regression equation is:

ŷ = b0 + b1x.

The regression coefficients b0 and b1 refer to sample estimates of the true coefficients β0 and β1, respectively.

The sample correlation coefficient, r, estimates the true (or population) value of the correlation, ρ, which is a measure of the degree to which two variables (Y and X) are linearly related.

Of course, the sample correlation (r) and the slope coefficient (b1) are closely related:

b1 = r (sy/sx) = Cov(x,y)/Var(x),   (*)

where sy and sx are the sample standard deviations of Y and X respectively. The corresponding relationship between the true values, β1 and ρ, is:

β1 = ρ (σY/σX) = Cov(X,Y)/σX².
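Relationship (*) between r and b1 can be checked numerically on the five-order cost example from Lecture 6 (a Python sketch; illustrative only):

```python
# Check b1 = r * (s_y / s_x) = Cov(x, y) / Var(x) on the cost data.
from math import sqrt

x = [1, 3, 4, 5, 7]
y = [6, 14, 10, 14, 26]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)  # 15.0
var_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)                          # 5.0
s_x = sqrt(var_x)
s_y = sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
r = cov_xy / (s_x * s_y)                                                      # 0.896

b1_from_r = r * s_y / s_x        # relationship (*)
b1_from_cov = cov_xy / var_x     # Cov(x, y) / Var(x)
print(round(b1_from_r, 6), round(b1_from_cov, 6))   # 3.0 3.0
```

Both routes recover the slope b1 = 3 estimated earlier.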

    Page 63 of 156


    Examples of Correlation

Correlation: r = 0.725 (R²(unadjusted) = r² × 100% = 52.6%); ŷ = −756.6 + 12.25x.

Change in GDP (Y): change in annual U.S. GDP in billions of dollars.

Consumer Sentiment (X): index of financial well-being and the degree to which consumer expectations are positive (based on five questions on a survey conducted by the University of Michigan: https://data.sca.isr.umich.edu/fetchdoc.php?docid=24770).

[Scatterplot: "Change in GDP vs Consumer Sentiment (1995-2015)"; x-axis: Consumer Sentiment, y-axis: Change in GDP; labeled points: 1999, 2009, 2015.]

Correlation: r = 0.725 (R²(unadjusted) = r² × 100% = 51.3%); ŷ = 44.45 + 0.3045x.

Data are from the World Health Organization (Life Expectancy (Y) as of 2015, Literacy Rate (X) for 2007-2012).

[Scatterplot: "Life Expectancy (Both Sexes) vs Literacy Rate (>=15 years) for 112 Nations"; x-axis: Literacy Rate (for people >= 15 years old), y-axis: Life Expectancy (both sexes, in years).]

    Page 64 of 156


Correlation: r = −0.891 (R²(unadjusted) = r² × 100% = 79.4%); MPG = 41.71 − 0.006263 Weight.

Data are for 14 automobiles (2005) from www.chegg.com.

[Scatterplot: "MPG (City) vs Weight (Lbs)"; x-axis: Weight, y-axis: MPG.]

Correlation: r = 0.018 (R²(unadjusted) = r² × 100% = 0.1%); Random Y = 0.06153 + 0.01688 Random X.

Y and X are two sets of 1000 standard normal random numbers.

[Scatterplot: "Random Y vs Random X"; x-axis: Random X, y-axis: Random Y.]
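Using the fitted line MPG = 41.71 − 0.006263 Weight reported for the automobile data, a prediction at an illustrative weight of 3,000 lbs (a value chosen for illustration, not one of the 14 cars) looks like:

```python
# Prediction from the fitted line MPG = 41.71 - 0.006263 * Weight.
def predicted_mpg(weight_lbs: float) -> float:
    """Predicted city MPG for a car of the given weight in pounds."""
    return 41.71 - 0.006263 * weight_lbs

print(round(predicted_mpg(3000), 1))   # 22.9
```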

    Page 65 of 156


    Notation for Types of Variation and R2

    For linear regression models (with one or more predictor variables), the basic types of sums of

    squares represent three types of variation.

1. Total Variation = Σ (yi − ȳ)² = SS(Total):
the sum of squares of the observations of Y around their mean.

2. Explained Variation = Σ (ŷi − ȳ)² = SS(Regression):
the sum of squares of the predicted values of Y (the ŷi) around their mean (which is also ȳ).

3. Unexplained Variation = Σ (yi − ŷi)² = SS(Error):
the sum of squares of the differences between each observation and the corresponding predicted value.

Note:

Total Variation = Explained Variation + Unexplained Variation

Or: SS(Total) = SS(Regression) + SS(Error)

The R² (Unadjusted) is sometimes called the simple coefficient of determination and it is the square of the correlation:

R² (Unadjusted) = r² = [Explained Variation]/[Total Variation] = SS(Regression)/SS(Total).

R² (Adjusted) is a more accurate assessment of the strength of the relationship between Y and X. In general:

R² (Adjusted) = 1 − MS(Error)/MS(Total)
             = 1 − [SS(Error)/(n − [# parameters in model])] / [SS(Total)/(n − 1)].

For simple linear regression, which includes a constant and a slope coefficient: [# of parameters in model] = 2.

[Reference: pp. 493-495, Essentials of Business Statistics (2015), 5th Edition, Bowerman et al.]

    Page 66 of 156


Lecture 7: Inferences About Regression Coefficients & Confidence/Prediction Intervals for Y/μY

Outline: (Ref: Ch. 13: 13.3-13.4, 13.6-13.7)

Recap of Main Ideas from Lecture 6
Testing Lack-of-Fit
Inferences Based on Regression Coefficients (Ch. 13.3)
Prediction Intervals versus Confidence Intervals for Y (Ch. 13.4)
(Please read pp. 486-489, not for details on how PIs and CIs are calculated but for the main idea of what they tell us about Y!)

    Summary of Ideas from Lecture 6

    3 Assumptions:

    The basic relationship between Y and X is linear up to a

    random error term that has mean 0 (linearity) and constant

    variance (homoscedasticity). Errors are random in the sense

    that they are independent of each other and do not depend on

    the value of Y.

    One way to check these assumptions is to plot residuals versus

    fitted values.

The coefficient estimates, b0 and b1, are chosen to minimize the sum of squared errors (or residuals).

b1 represents the average change in Y that is associated with a one-unit change in X.

Regression is useful because it allows us to reduce the uncertainty regarding Y. We can think about this in terms of the decomposition of SS(Total) (NOTE: SS is used in regression to refer to Sums of Squares):

    Page 67 of 156


SS(Total) = SS(Regression) + SS(Error)

Σ (yi − ȳ)² = Σ (ŷi − ȳ)² + Σ (yi − ŷi)²

[SS of observed y-values around their mean] = [SS of fitted values around the mean] + [SS of errors (errors are the actual y minus the fitted or predicted y)]

This decomposition of total uncertainty (SS(Total)) suggests two useful summaries of how well the model fits:

1) R²(Adj) = 1 − [SS(Error)/(n − [# parameters in model])] / [SS(Total)/(n − 1)]
           = 1 − [Variance of Error]/[Variance of Y] = 1 − MS(Error)/MS(Total).
(Recall that: R²(unadjusted) = 1 − SS(Error)/SS(Total).)

2) s = {SS(Error)/(n − [# parameters in model])}^(1/2) = {MS(Error)}^(1/2).

Application

Data are available from Blackboard (with this lecture note). These data are from 43 urban communities (2012).

Citation: Cost of Living Index, Council for Community and Economic Research, January, 2013.

MTB > info c1-c3

Information on the Worksheet
Column   Count   Name
T C1     43      URBAN AREA AND STATE
  C2     43      HOME PRICE (avg. for 2400 sq. ft. new home, 4 bed, 2 bath, on 8000 sq. ft. lot)
  C3     43      Apt Rent (avg. for 950 sq. ft. unfurnished apt., 2 bed, 1.5-2 bath, excluding all utilities except water)

    Other interesting data sets on home prices and rental rates by city:

    https://smartasset.com/mortgage/price-to-rent-ratio-in-us-cities

    https://smartasset.com/mortgage/rent-vs-buy#map.


    Page 68 of 156



Regression for All 43 Cities

[Fitted Line Plot: HOME PRICE = −61366 + 339.3 Apt Rent; S = 70561.6, R-Sq = 88.2%, R-Sq(adj) = 87.9%; x-axis: Apt Rent, y-axis: HOME PRICE; labeled points: New York (Manhattan) NY, New York (Brooklyn) NY, San Francisco CA, Honolulu HI, New York (Queens) NY.]

Regression Using 38 Cities Where Rent < $1500

[Fitted Line Plot: HOME PRICE = 1894 + 286.5 Apt Rent; S = 59728.1, R-Sq = 35.9%, R-Sq(adj) = 34.2%; x-axis: Apt Rent, y-axis: HOME PRICE.]
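As an illustration of how the two fitted lines compare, consider the predicted home price at a monthly rent of $1,200 (an illustrative value inside both samples' rent ranges, not a figure from the data set):

```python
# Predicted home price from each fitted line for the cost-of-living data.
def price_all_43(rent):
    """All 43 cities: HOME PRICE = -61366 + 339.3 * Apt Rent."""
    return -61366 + 339.3 * rent

def price_38(rent):
    """38 cities with rent < $1500: HOME PRICE = 1894 + 286.5 * Apt Rent."""
    return 1894 + 286.5 * rent

print(round(price_all_43(1200)), round(price_38(1200)))   # 345794 345694
```

Near the middle of the rent range the two lines predict almost the same price; for high rents (e.g., the Manhattan-level outliers) the steeper all-43-city line predicts substantially more.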

    Page 69 of 156
