Ch. 1: Data and Distributions
• Populations vs. Samples
• How to graphically display data
  – Histograms, dot plots, stem plots, etc.
  – Helps to show how samples are distributed
• Distributions of both continuous and discrete variables
  – Density functions and Mass functions
• Three basic properties
  – Shows the distribution of the entire population or process
• Some important distributions and associated probabilities
  – Continuous: Exponential, Normal, Uniform …
  – Discrete: Binomial, Poisson …
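As a sketch of how the named distributions assign probability, the snippet below uses only Python's standard library. For continuous variables a probability is an area under the density (a CDF difference); for discrete variables it comes from the mass function p(x). The function names, the rate parameterization of the exponential (mean = 1/rate), and all example values are illustrative assumptions, not taken from the lecture.

```python
import math
from statistics import NormalDist

# Continuous: probabilities are areas under a density curve (CDF values/differences).
def exponential_cdf(x: float, rate: float) -> float:
    """P(X <= x) for an Exponential(rate) variable, whose mean is 1/rate."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0

def uniform_cdf(x: float, a: float, b: float) -> float:
    """P(X <= x) for a Uniform(a, b) variable."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

# Discrete: probabilities come from a mass function p(x).
def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

z = NormalDist(mu=0.0, sigma=1.0)               # standard normal
print(round(z.cdf(1.96) - z.cdf(-1.96), 4))     # area within ±1.96 SDs of the mean
print(round(binomial_pmf(3, 10, 0.5), 4))       # P(X = 3), X ~ Binomial(10, 0.5)
print(round(exponential_cdf(2.0, 0.5), 4))      # P(X <= 2) when the mean is 2
```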
4/24/12 1 H.X. Lecture 30: Final Summary
Ch. 2: Numerical Summary Measures
• Measures of center of data (sample)
  – Sample mean
  – Sample median, midpoint
  – Trimmed means
• Measures of variability for data (sample)
  – Sample variance
  – Sample standard deviation
• Quartiles; five-number summary; IQR and outliers
• Graphical display: boxplots; modified version; side-by-side boxplots
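Sample mean: $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x_i}{n}$
Sample variance: $s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1} = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
Sample standard deviation: $s = \sqrt{s^2}$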
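A minimal sketch of the Ch. 2 sample summaries in Python, on a hypothetical data set. Two conventions here are assumptions that may differ from the course text or from SAS: the trimmed mean drops int(prop · n) values from each end, and Q1/Q3 are the medians of the lower/upper halves (leaving out the overall median when n is odd). The outlier fences use the common 1.5 × IQR rule.

```python
import statistics

def trimmed_mean(data, prop):
    """Mean after dropping int(prop * n) observations from EACH end of the sorted data."""
    xs = sorted(data)
    k = int(prop * len(xs))
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)

def five_number_summary(data):
    """(min, Q1, median, Q3, max); Q1/Q3 are medians of the lower/upper halves,
    excluding the overall median when n is odd (one common textbook convention)."""
    xs = sorted(data)
    n = len(xs)
    lower = xs[: n // 2]
    upper = xs[(n + 1) // 2 :]
    return xs[0], statistics.median(lower), statistics.median(xs), statistics.median(upper), xs[-1]

data = [2, 4, 4, 5, 7, 8, 9, 30]            # hypothetical sample, n = 8

xbar = statistics.mean(data)                # sum(x_i) / n
median = statistics.median(data)
tmean = trimmed_mean(data, 0.125)           # drops 1 value from each end here
s2 = statistics.variance(data)              # sum((x_i - xbar)^2) / (n - 1)
s = statistics.stdev(data)                  # sqrt(s2)

mn, q1, med, q3, mx = five_number_summary(data)
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # 1.5 x IQR rule (assumed)
outliers = [x for x in data if x < low_fence or x > high_fence]
print(xbar, median, tmean, round(s, 3), (mn, q1, med, q3, mx), iqr, outliers)
```

Note how the single large value (30) pulls the mean above the median and the trimmed mean, which is why the median and trimmed mean are called resistant.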
Ch. 2 (Cont.): Numerical Summary Measures
• Measures of Center (Distributions)
  – Continuous: $\mu_X = \int_{-\infty}^{\infty} x \cdot f(x)\,dx$
  – Discrete: $\mu_X = \sum x \cdot p(x)$
• Measure of variability (Distributions)
  – Continuous: $\sigma_X^2 = \int_{-\infty}^{\infty} (x - \mu_X)^2 \cdot f(x)\,dx$
  – Discrete: $\sigma_X^2 = \sum (x - \mu_X)^2 \cdot p(x)$
• Normal Quantile (QQ) plot
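The formulas above can be checked numerically. This sketch computes μ and σ² for a discrete mass function directly, approximates the continuous integrals with the midpoint rule (assuming the density is zero outside a finite interval), and builds the points of a normal QQ plot. The plotting position (i − 0.5)/n is one common choice and is an assumption here; SAS and other software may use a slightly different formula.

```python
from statistics import NormalDist

def discrete_mean_var(values, probs):
    """mu = sum x p(x); sigma^2 = sum (x - mu)^2 p(x)."""
    mu = sum(x * p for x, p in zip(values, probs))
    var = sum((x - mu) ** 2 * p for x, p in zip(values, probs))
    return mu, var

def continuous_mean_var(f, lo, hi, steps=10_000):
    """Midpoint-rule approximations of mu = ∫ x f(x) dx and sigma^2 = ∫ (x - mu)^2 f(x) dx,
    assuming f is zero outside [lo, hi]."""
    h = (hi - lo) / steps
    mids = [lo + (i + 0.5) * h for i in range(steps)]
    mu = h * sum(x * f(x) for x in mids)
    var = h * sum((x - mu) ** 2 * f(x) for x in mids)
    return mu, var

def normal_qq_points(data):
    """(normal score, sorted observation) pairs; roughly a straight line if the data are normal."""
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()
    return [(std_normal.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]

print(discrete_mean_var([0, 1, 2], [0.25, 0.5, 0.25]))
print(continuous_mean_var(lambda x: 0.5 if 0 <= x <= 2 else 0.0, 0, 2))   # Uniform(0, 2)
print(normal_qq_points([3.1, 2.4, 5.0, 4.2, 3.8]))                         # hypothetical data
```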
Ch. 3: Bivariate Data
• Scatterplots: visually display bivariate data, y vs. x
• Pearson’s Correlation Coefficient r (between X and Y, both quantitative):
  – r measures the strength and direction of the linear relationship
  – $r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$; other convenient formulas exist for Sxy, Sxx and Syy
  – Takes values between -1 and 1, inclusive
    • Sign indicates type/direction of relationship (positive, negative)
    • Value indicates strength: farther from 0 is stronger
  – If the roles of X and Y are switched → r doesn’t change
  – Unit free: unaffected by linear changes of units in X or Y (a negative multiplier flips only the sign)
  – Affected by outliers; not a resistant measure
  – Correlation ≠ Causation
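A small sketch of r = Sxy / √(Sxx·Syy) on hypothetical data, used to check the listed properties: swapping X and Y leaves r unchanged, a positive linear change of units leaves it unchanged, and a negative multiplier flips only its sign.

```python
import math

def pearson_r(x, y):
    """r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]          # hypothetical data
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
r_swapped = pearson_r(y, x)                             # roles of X and Y switched
r_rescaled = pearson_r([2.54 * a + 10 for a in x], y)   # linear change of units in X
r_flipped = pearson_r(x, [-b for b in y])               # negative multiplier on Y
print(round(r, 4), round(r_swapped, 4), round(r_rescaled, 4), round(r_flipped, 4))
```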
Ch. 3: LS (Least Squares) Regression Line
• Estimated straight-line equation: ŷ = a + b x
  – a is the intercept (where the line crosses the y-axis)
  – b is the slope (rate)
  – Predicted value of y
  – Residual from the fit (or regression line)
  – Breaking up the sums of squares: SST = SSR + SSE
• Coefficient of Determination:
  – Percent of variation in Y explained by the linear regression between Y and X
$b = r\left(\frac{s_y}{s_x}\right)$
$r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$
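A worked sketch of the LS line on hypothetical data: the slope uses b = r(s_y/s_x) from the slide, and the intercept uses the standard fact that the LS line passes through (x̄, ȳ), so a = ȳ − b·x̄. It then splits the sums of squares and checks that SSR/SST equals r².

```python
import statistics

x = [1, 2, 3, 4, 5]          # hypothetical data
y = [2, 4, 5, 4, 5]
xbar, ybar = statistics.mean(x), statistics.mean(y)
sxx = sum((a - xbar) ** 2 for a in x)
syy = sum((b - ybar) ** 2 for b in y)
sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
r = sxy / (sxx * syy) ** 0.5

b = r * statistics.stdev(y) / statistics.stdev(x)     # slope: b = r (s_y / s_x)
a = ybar - b * xbar                                    # intercept: line passes through (xbar, ybar)

y_hat = [a + b * xi for xi in x]                       # predicted values
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # error (residual) sum of squares
sst = sum((yi - ybar) ** 2 for yi in y)                # total sum of squares
ssr = sst - sse                                        # regression SS, since SST = SSR + SSE
r2 = ssr / sst                                         # equals 1 - SSE/SST
print(f"y_hat = {a:.3f} + {b:.3f} x, r^2 = {r2:.3f}")
```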
Ch. 3 (Cont.): MSE and Residual Plot
• Mean Squared Error about the LS line: $s_e^2 = MSE = \frac{SSE}{n - 2}$
• Standard Deviation about the LS line: $s_e = \sqrt{MSE}$
  – Also called “root MSE” in SAS output.
• Residual: $e_i = y_i - \hat{y}_i$
• A residual plot plots the residuals against x.
  – The residual plot should show no pattern, only a random scattering of points.
  – If a pattern is observed, the linear regression model is probably not appropriate.
Ch. 5: Probability and Sampling Distributions
• Chance Experiments:
  – Simple Events: individual outcomes
  – Events: collections of simple events
  – Sample Space
  – Venn Diagrams
  – Tree Diagrams
• Complex Events:
  – Event A or B; Event A and B
  – Event A’ (Complement of A)
  – Disjoint Events (Mutually Exclusive)
  – Independent Events
Probability Basic Rules
• Probability Axioms:
  – 0 ≤ P(A) ≤ 1 for any event A
  – P(S) = 1, where S is the sample space
• Addition Rule: for any disjoint events A and B, P(A or B) = P(A) + P(B)
• Complementary Events: P(A’) = 1 - P(A)
• General Addition Rule (for any events A and B): P(A or B) = P(A) + P(B) - P(A and B)
• Independence Rule (for independent events A and B): P(A and B) = P(A) P(B)
• Conditional Probability: P(A|B) = P(A and B) / P(B), provided P(B) > 0
• Bayes Rule for calculating conditional probabilities; Tree Diagrams
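The rules above can be exercised with exact fractions. This sketch works a Bayes-rule calculation the way a tree diagram does (multiply along branches, then add the disjoint paths that end in the observed outcome), and then applies the general addition rule and the independence check. All numbers are hypothetical.

```python
from fractions import Fraction

# Hypothetical setup: 1% prevalence; the test is positive 95% of the time when the
# condition is present and 5% of the time when it is absent.
p_d = Fraction(1, 100)
p_pos_given_d = Fraction(95, 100)
p_pos_given_not_d = Fraction(5, 100)

p_not_d = 1 - p_d                                  # complement rule
# Tree diagram: multiply along each branch ...
p_pos_and_d = p_d * p_pos_given_d
p_pos_and_not_d = p_not_d * p_pos_given_not_d
# ... then add the disjoint branches that end in "positive".
p_pos = p_pos_and_d + p_pos_and_not_d
p_d_given_pos = p_pos_and_d / p_pos                # Bayes rule: P(D and +) / P(+)

# General addition rule and an independence check for two events A and B.
p_a, p_b, p_a_and_b = Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)
p_a_or_b = p_a + p_b - p_a_and_b
independent = p_a_and_b == p_a * p_b

print(p_pos, p_d_given_pos, float(p_d_given_pos))
print(p_a_or_b, independent)
```

Even with a fairly accurate test, most positives here come from the much larger group without the condition, which is exactly what the tree diagram makes visible.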
Random Variables and Sampling Distribution
• Random Variables
  – Discrete: Distribution Table, Prob. Histogram
  – Continuous: Distribution Curve, density function
  – Independent R.V.s
• Sampling Distribution of a Sample Mean
• Sampling Distribution of a Sample Proportion (rule of thumb for Normal Approx.)
• Central Limit Theorem
• Continuity Correction (from Binomial to Normal Approx.)
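A sketch of two ideas from this slide. First, a simulation of the sampling distribution of x̄ from a skewed population (Exponential with mean 1 and SD 1, an assumed example): its center is near μ and its spread is near σ/√n, as the CLT predicts. Second, the normal approximation to a Binomial with a continuity correction, compared against the exact binomial CDF. The seed, sample sizes, and parameters are arbitrary choices.

```python
import math
import random
import statistics
from statistics import NormalDist

# 1) Sampling distribution of the sample mean, by simulation.
random.seed(1)
n, reps = 40, 2000
# Population: Exponential with mean 1 and SD 1 (skewed, clearly not normal).
xbars = [statistics.mean(random.expovariate(1.0) for _ in range(n)) for _ in range(reps)]
print(statistics.mean(xbars), statistics.stdev(xbars), 1 / math.sqrt(n))  # SD near sigma/sqrt(n)

# 2) Normal approximation to Binomial(n, p), with and without a continuity correction.
def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))

def normal_approx_cdf(k, n, p, correction=True):
    """Normal approximation to P(X <= k), using k + 0.5 when correction=True."""
    mu, sigma = n * p, math.sqrt(n * p * (1 - p))
    return NormalDist(mu, sigma).cdf(k + 0.5 if correction else k)

exact = binom_cdf(55, 100, 0.5)
with_cc = normal_approx_cdf(55, 100, 0.5)
without_cc = normal_approx_cdf(55, 100, 0.5, correction=False)
print(exact, with_cc, without_cc)
```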
Ch. 7: Estimation and Statistical Inference by C.I.s
• (Unbiased, Consistent) Point Estimation
• Large-Sample C.I.s for a Population Mean (Normality Assumption)
  – One-sided C.I.s: upper or lower bound C.I.
  – Interpretation of the confidence level
  – Necessary sample size for a desired bound B (round up)
• Small-Sample C.I.
  – t-crit is associated with d.f. = n - 1
  – The normality assumption still applies.
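Large-sample C.I.: $\bar{X} \pm (z \text{ critical value}) \frac{s}{\sqrt{n}}$
Sample size for bound B: $n = \left(\frac{Z_{crit}\, s}{B}\right)^2$
Small-sample C.I.: $\bar{X} \pm (t \text{ critical value}) \frac{s}{\sqrt{n}}$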
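A sketch of the three Ch. 7 formulas for a population mean. The z critical value comes from `statistics.NormalDist`. Python's standard library has no t distribution, so the small-sample interval takes t_crit as an input read from a t table (2.262 is the table value for df = 9 at 95% confidence). The function names and example numbers are hypothetical.

```python
import math
from statistics import NormalDist

def z_interval(xbar, s, n, conf=0.95):
    """Large-sample C.I.: xbar ± z_crit * s / sqrt(n)."""
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half = z_crit * s / math.sqrt(n)
    return xbar - half, xbar + half

def t_interval(xbar, s, n, t_crit):
    """Small-sample C.I.: xbar ± t_crit * s / sqrt(n); t_crit from a t table with df = n - 1."""
    half = t_crit * s / math.sqrt(n)
    return xbar - half, xbar + half

def sample_size_for_bound(s, bound, conf=0.95):
    """n = (z_crit * s / B)^2, rounded up to the next integer."""
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil((z_crit * s / bound) ** 2)

print(z_interval(50.0, 8.0, 64))           # hypothetical: xbar = 50, s = 8, n = 64
print(t_interval(50.0, 8.0, 10, 2.262))    # n = 10, so df = 9
print(sample_size_for_bound(15.0, 3.0))    # s = 15, desired bound B = 3
```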
C.I. for a Population Proportion
• Point Estimation for a Population Proportion: $\hat{p}$
• Large-Sample C.I.s for a Population Proportion: $\hat{p} \pm Z_{crit} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
  – Necessary sample size for a desired bound B (round up if not an integer): $n = p^*(1 - p^*)\left(\frac{z_{critical}}{B}\right)^2$
• $p^* = \hat{p}$, or 0.5 if $\hat{p}$ is unavailable.
• Small-Sample C.I. replaces z-crit by t-crit
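A sketch of the large-sample proportion interval and the sample-size formula above, using `statistics.NormalDist` for z_crit. The counts and bounds are hypothetical; p* defaults to 0.5 when no estimate of p is available.

```python
import math
from statistics import NormalDist

def proportion_interval(successes, n, conf=0.95):
    """Large-sample C.I.: p_hat ± z_crit * sqrt(p_hat (1 - p_hat) / n)."""
    p_hat = successes / n
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half = z_crit * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

def sample_size_for_proportion(bound, p_star=0.5, conf=0.95):
    """n = p*(1 - p*) (z_crit / B)^2, rounded up; p* = 0.5 when no estimate is available."""
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil(p_star * (1 - p_star) * (z_crit / bound) ** 2)

print(proportion_interval(80, 200))            # hypothetical: 80 successes in 200 trials
print(sample_size_for_proportion(0.03))        # no prior estimate, so p* = 0.5
print(sample_size_for_proportion(0.03, 0.2))   # with a prior estimate p* = 0.2
```

Using p* = 0.5 gives the largest possible n, which is why it is the safe default.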
C.I. for the Difference between Two Population Means
• Large-Sample C.I.s for the Difference between Two Population Means (Normality Assumption)
• Small-Sample C.I.: Zcrit replaced by t-crit, with d.f. from the formula below (round down if not an integer)
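Large-sample C.I.: $(\bar{X}_1 - \bar{X}_2) \pm Z_{crit} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
d.f. for the small-sample (t) C.I.: $df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$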
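A sketch of the two-sample interval and its d.f. formula (rounded down, as the slide instructs). The same interval function serves both cases: pass z_crit for large samples, or a t_crit looked up with the computed d.f. All summary statistics are hypothetical.

```python
import math
from statistics import NormalDist

def welch_df(s1, n1, s2, n2):
    """df = (v1 + v2)^2 / (v1^2/(n1-1) + v2^2/(n2-1)) with v_i = s_i^2/n_i, rounded down."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return math.floor(df)

def two_mean_interval(x1, s1, n1, x2, s2, n2, crit):
    """(x1 - x2) ± crit * sqrt(s1^2/n1 + s2^2/n2); crit is z_crit or t_crit."""
    half = crit * math.sqrt(s1**2 / n1 + s2**2 / n2)
    diff = x1 - x2
    return diff - half, diff + half

z_crit = NormalDist().inv_cdf(0.975)
print(two_mean_interval(52.0, 9.0, 60, 48.0, 11.0, 70, z_crit))   # hypothetical large samples
print(welch_df(4.0, 10, 6.0, 15))                                 # d.f. for looking up t_crit
```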
Ch. 8: Hypothesis Testing
• State hypotheses
  – Both null and alternative (one- or two-sided)
• Determine an appropriate α level. If not specified, use 5%.
• Type I error (rejecting a true H0); its probability is the significance level α. Understand it.
• Calculate the appropriate test statistic.
• Find the P-value: the probability, assuming H0 is true, of getting a test statistic as extreme as or more extreme than the one observed.
• Reject H0 when the P-value is smaller than the significance level α.
  – Otherwise: fail to reject H0.
• State a conclusion in layman’s terms.
One-sample t Test for a Population Mean:
• The null hypothesis is H0: µ = µ0
• The alternative hypothesis could be:
  Ha: µ ≠ µ0 (two-sided)
  Ha: µ > µ0 (one-sided)
  Ha: µ < µ0 (one-sided)
• Test Statistic: $t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$
• t ~ Student’s t-distribution with df = n – 1
• If n is large (≥ 30), the CLT gives an approximately normal distribution, and t can be replaced with z, where z follows a standard normal distribution.
P-value tied to Ha
• Two-sided (both tails), Ha: µ ≠ µ0: P-value = 2P(T ≥ |t|)
• One-sided (right tail), Ha: µ > µ0: P-value = P(T ≥ t)
• One-sided (left tail), Ha: µ < µ0: P-value = P(T ≤ t)
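A self-contained sketch of the one-sample t test with its P-value tied to Ha. Because Python's standard library has no t distribution, the t CDF here is computed by integrating the Student t density with Simpson's rule (using the symmetry about 0). This numerical method is an assumption of the sketch, not how SAS computes it. The `alternative` labels and the sample data are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(t, df, steps=2000):
    """P(T <= t): Simpson's rule on [0, |t|] (steps must be even), plus symmetry."""
    h = abs(t) / steps
    total = t_pdf(0, df) + t_pdf(abs(t), df)
    total += sum((4 if i % 2 else 2) * t_pdf(i * h, df) for i in range(1, steps))
    area = total * h / 3                       # P(0 <= T <= |t|)
    return 0.5 + area if t >= 0 else 0.5 - area

def one_sample_t(data, mu0, alternative="two-sided"):
    """t = (xbar - mu0) / (s / sqrt(n)) with df = n - 1, and the P-value for the chosen Ha."""
    n = len(data)
    xbar = sum(data) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))
    t = (xbar - mu0) / (s / math.sqrt(n))
    df = n - 1
    if alternative == "two-sided":      # Ha: mu != mu0, both tails
        p = 2 * (1 - t_cdf(abs(t), df))
    elif alternative == "greater":      # Ha: mu > mu0, right tail
        p = 1 - t_cdf(t, df)
    else:                               # Ha: mu < mu0, left tail
        p = t_cdf(t, df)
    return t, df, p

sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4, 5.9, 5.3]   # hypothetical data
print(one_sample_t(sample, 5.0, "greater"))
```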
Other Tests or Remarks
• Two-sample z (or t, depending on sample sizes) test for two population means
  – When using t, compute the d.f. with the same formula as for the two-sample C.I.
• One-sample t test with (matched) paired data
  – Focus on the difference between the two population means
• A two-sided significance test at level α <-> a two-sided (1 – α) C.I. for the same parameter
  – If the claimed value is in the C.I. → fail to reject H0
  – If the claimed value is not in the C.I. → reject H0
  – NOTE: must have “≠” in Ha!
• Statistical significance ≠ practical significance
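Two of the remarks above, sketched in Python. Paired data are reduced to differences, and the one-sample t statistic is computed on those differences (to be compared with t at df = number of pairs − 1). The test/C.I. duality is shown with the large-sample z version: a two-sided level-α test rejects H0: µ = µ0 exactly when µ0 falls outside the (1 − α) C.I. The same duality holds for the t version with t critical values. All data and summary values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

# Paired data: reduce each pair to a difference, then run a one-sample test on the differences.
before = [10, 12, 9, 11, 13]      # hypothetical matched pairs
after = [12, 13, 11, 11, 15]
d = [a - b for a, b in zip(after, before)]
t_paired = mean(d) / (stdev(d) / math.sqrt(len(d)))   # compare to t with df = len(d) - 1

# Duality, large-sample z version.
def z_test_and_ci(xbar, s, n, mu0, alpha=0.05):
    """Return the (1 - alpha) C.I. for mu and the two-sided P-value for H0: mu = mu0."""
    se = s / math.sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (xbar - z_crit * se, xbar + z_crit * se)
    p_value = 2 * (1 - NormalDist().cdf(abs((xbar - mu0) / se)))
    return ci, p_value

for mu0 in [9.0, 9.5, 10.0, 10.6]:
    ci, p = z_test_and_ci(10.0, 2.0, 50, mu0)
    print(mu0, "in CI" if ci[0] <= mu0 <= ci[1] else "outside CI",
          "fail to reject" if p >= 0.05 else "reject")
```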
Cautions (for both C.I.s and tests of significance):
• Data: assume an SRS (random sampling)
• Population needs to be …
  – If n < 30, check normality (by a normal QQ plot)
  – With n ≥ 30, the CLT gives approximate normality in most situations.
Ch. 9: One-Way ANOVA
• Hypotheses:
  – H0: µ1 = µ2 = … = µk vs. Ha: at least one µi is different
• F test statistic = (between-samples variation) / (within-samples variation) = MSM/MSE
• ANOVA table
• P-value is always the upper tail of the F distribution with (k – 1, n – k) degrees of freedom. Tables of critical values for the F distribution: (Table VIII)
• F statistic > F critical value <=> P-value < α => Reject H0
Source            DF      SS                MS
Model (Between)   k – 1   SSM (formula)     SSM/(k – 1)
Error (Within)    n – k   SSE (formula)     SSE/(n – k)
Total             n – 1   SST = SSM + SSE
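A by-hand one-way ANOVA in Python that fills in the table above. SSM measures variation between the group means; SSE measures variation within groups. The F critical value is passed in from an F table (5.14 is the α = 0.05 table value for (2, 6) d.f.), since the standard library has no F distribution. The data are hypothetical.

```python
def one_way_anova(groups):
    """Return the ANOVA table entries and F = MSM / MSE for a list of samples."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ssm = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))   # between samples
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)    # within samples
    df_m, df_e = k - 1, n - k
    msm, mse = ssm / df_m, sse / df_e
    return {"df": (df_m, df_e, n - 1), "SS": (ssm, sse, ssm + sse),
            "MS": (msm, mse), "F": msm / mse}

groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]   # hypothetical data: k = 3, n = 9
table = one_way_anova(groups)
F_CRIT = 5.14                                 # F table value, (2, 6) d.f., alpha = 0.05
print(table, "reject H0" if table["F"] > F_CRIT else "fail to reject H0")
```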
Assumptions (prior to running one-way ANOVA)
1. Constant variance: the variances of the k populations are the same.
   – Check this with the ratio of the largest to the smallest standard deviation; the ratio must be < 2.
2. Each of the k populations follows a normal distribution.
   – Check this by looking at QQ plots for each group.
• Remark: statistical significance ≠ practical significance
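The constant-variance rule of thumb from assumption 1 takes only a few lines. This sketch uses sample standard deviations (n − 1 divisor) and hypothetical groups. Normality would still be checked separately with a QQ plot for each group.

```python
import statistics

def constant_variance_ok(groups):
    """Rule of thumb: largest SD / smallest SD must be < 2. Returns (ok, ratio)."""
    sds = [statistics.stdev(g) for g in groups]
    ratio = max(sds) / min(sds)
    return ratio < 2, ratio

ok, ratio = constant_variance_ok([[4, 5, 6], [6, 7, 9], [8, 9, 10]])   # hypothetical data
print(f"SD ratio = {ratio:.2f}; constant-variance rule {'met' if ok else 'violated'}")
```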
Ch. 9: Multiple Comparison
• If the one-way ANOVA F test is not significant, no further steps are needed.
• Otherwise, run multiple comparisons to see which means, specifically, are different.
  – Tukey’s Method (“cldiff” or “lines” format)
  – Dunnett’s Method (only if there’s a control group)
9.4: Randomized Complete Block Design
• RCBD (both treatment and block factors must be categorical)
• In an RCBD,
  – we are only interested in the treatment factor
  – the block factor might affect the response, but that’s not of interest.
• Two F tests
  – Blocking effect? Use the test statistic and P-value to conclude…
  – Treatment effect? Use the test statistic and P-value to conclude…
Source                 DF              SS    MS
Factor A (treatment)   a – 1           SSA   MSA
Factor B (block)       b – 1           SSB   MSB
Error                  (a – 1)(b – 1)  SSE   MSE
Total                  ab – 1          SST
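A sketch of the RCBD table computed by hand, assuming one observation per treatment–block combination and the usual textbook sums of squares: SSA from the treatment means, SSB from the block means, and SSE as the remainder SST − SSA − SSB. Both F statistics share MSE in the denominator. The data are hypothetical.

```python
def rcbd_anova(y):
    """y[i][j] = response for treatment i in block j (one observation per cell).
    Returns the SS and DF columns and both F statistics."""
    a, b = len(y), len(y[0])
    grand = sum(map(sum, y)) / (a * b)
    trt_means = [sum(row) / b for row in y]
    blk_means = [sum(y[i][j] for i in range(a)) / a for j in range(b)]
    ssa = b * sum((m - grand) ** 2 for m in trt_means)    # treatment SS
    ssb = a * sum((m - grand) ** 2 for m in blk_means)    # block SS
    sst = sum((v - grand) ** 2 for row in y for v in row)
    sse = sst - ssa - ssb
    df_a, df_b, df_e = a - 1, b - 1, (a - 1) * (b - 1)
    msa, msb, mse = ssa / df_a, ssb / df_b, sse / df_e
    return {"SS": (ssa, ssb, sse, sst), "df": (df_a, df_b, df_e, a * b - 1),
            "F_treatment": msa / mse, "F_block": msb / mse}

y = [[8, 7, 9],      # treatment 1 in blocks 1-3 (hypothetical data)
     [8, 11, 11],    # treatment 2
     [11, 12, 13]]   # treatment 3
res = rcbd_anova(y)
print(res)   # compare each F with the F table at (a - 1 or b - 1, (a - 1)(b - 1)) d.f.
```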
Necessary Assumptions for RCBD
• Similar to one-way ANOVA:
  1. Constant variance
  2. Each of the k populations follows a normal distribution
• One additional assumption:
  3. There is no interaction between the treatment and blocking variables
     • Can assess just using common sense (just ask: do/should they interact?)
     • OR check with a two-way ANOVA model “Interaction Plot”…
Ch. 10: Two-Way ANOVA
• Testing the effects of two factors and their interaction on the response variable…
• Test
  – First, the interaction (of the most interest).
  – Then Factor A and Factor B, respectively.
• If the interaction is significant, still run slicing (tests of one factor at each level of the other) for Factor A and B.
• If the interaction is not significant while a single factor is significant, run one-way ANOVA and multiple comparisons.
Source           DF               SS     MS
Factor A         a – 1            SSA    MSA
Factor B         b – 1            SSB    MSB
AB interaction   (a – 1)(b – 1)   SSAB   MSAB
Error            ab(r – 1)        SSE    MSE
Total            abr – 1          SST
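A sketch of the two-way ANOVA table for a balanced design with r replicates per cell, using the standard sums of squares: main effects from the factor-level means, interaction from cell mean − row mean − column mean + grand mean, and error from replicates around their cell mean. The cell means it returns are the points of an interaction plot (one line per level of A). The data are hypothetical.

```python
def two_way_anova(y):
    """Balanced design: y[i][j] is the list of r replicates for level i of A and level j of B."""
    a, b, r = len(y), len(y[0]), len(y[0][0])
    cell = [[sum(c) / r for c in row] for row in y]
    grand = sum(map(sum, cell)) / (a * b)
    a_means = [sum(row) / b for row in cell]
    b_means = [sum(cell[i][j] for i in range(a)) / a for j in range(b)]
    ssa = b * r * sum((m - grand) ** 2 for m in a_means)
    ssb = a * r * sum((m - grand) ** 2 for m in b_means)
    ssab = r * sum((cell[i][j] - a_means[i] - b_means[j] + grand) ** 2
                   for i in range(a) for j in range(b))
    sse = sum((v - cell[i][j]) ** 2 for i in range(a) for j in range(b) for v in y[i][j])
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "Error": a * b * (r - 1)}
    mse = sse / df["Error"]
    F = {"A": ssa / df["A"] / mse, "B": ssb / df["B"] / mse, "AB": ssab / df["AB"] / mse}
    return cell, df, F

y = [[[9, 11], [11, 13]],     # A level 1: B level 1, B level 2 (hypothetical data, r = 2)
     [[13, 15], [19, 21]]]    # A level 2
cell_means, df, F = two_way_anova(y)
print(cell_means)   # interaction-plot points; non-parallel lines suggest interaction
print(df, F)        # test the AB interaction first, then A and B
```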
Ch. 10 (Cont.): Two-Way ANOVA
• Interaction plot
  – Roughly speaking, there’s no interaction effect if all lines are parallel to each other
• In summary, for Ch. 9 and 10 we should know:
  – All of one-way ANOVA (Ch. 9)
    • By hand and/or using SAS
  – Most of randomized block design (Sec. 9.4) and two-way ANOVA (Ch. 10, Section 2)
    • For both:
      – Complete ANOVA tables; calculate DFs and F test statistics
      – Perform F tests using the F table
      – Interpret SAS output
• Know the general concept of a higher-order (multi-way) ANOVA model.
Ch. 11: Inferential Methods in Regression and Slopes (Correlations)
• Normal Error Regression Model
  – Error term (3 assumptions: independence, normality and constant variance)
• SSE, MSE, and Root MSE
• Coefficient of Determination, R^2
  – % of variation explained by the regression model
  – In SLR, simply obtained by squaring r
• Statistical inference about the slope in the SLR model:
  – C.I. for β (the slope): b ± (t crit) * sb
  – Hypothesis testing w.r.t. the slope, i.e., a test of a linear relationship
  – Remark: t ~ Student’s t-distribution with d.f. = n – 2
Using ANOVA table to test SLR
• Remark: the F test statistic, F = MSM/MSE, has d.f. (1, n – 2)
Source               DF      SS                 MS
Model (Regression)   1       SSM (or SSR)       SSM/1 = MSM (or MSR)
Error                n – 2   SSE (or SSResid)   SSE/(n – 2) = MSE
Total                n – 1   SST = SSM + SSE
Multiple Linear Regression Model
• MLR Model:
• Test this linear relationship
  – H0: all βi’s = 0 vs. Ha: at least one βi ≠ 0
  – A rejection of the null indicates that collectively the Xs do well at explaining Y; otherwise, there is no need to run the following step
  – But it doesn’t show which specific Xi’s are doing “the explaining”
• Model selection, especially backward elimination
• The estimated line, from SAS output
  – Use it to predict Yi
  – Get the residual as “actual Yi – predicted value”
$Y_i = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + e_i$
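To show what the estimated MLR line is, this sketch fits Y = α + β1X1 + … + βpXp by least squares. It solves the normal equations X'X b = X'y with Gaussian elimination in pure Python, then computes predicted values and residuals (actual Y minus predicted). This is an illustration of the least-squares estimates, not how SAS computes them, and it omits the overall F test and backward elimination. The data are hypothetical and were generated exactly from Y = 1 + 2X1 + 3X2 with no error term, so the fit should recover those coefficients.

```python
def solve(A, rhs):
    """Solve A beta = rhs by Gaussian elimination with partial pivoting."""
    m = len(A)
    M = [row[:] + [v] for row, v in zip(A, rhs)]
    for col in range(m):
        piv = max(range(col, m), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, m):
            factor = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= factor * M[col][c]
    beta = [0.0] * m
    for r in range(m - 1, -1, -1):
        beta[r] = (M[r][m] - sum(M[r][c] * beta[c] for c in range(r + 1, m))) / M[r][r]
    return beta

def fit_mlr(xs, y):
    """Least-squares (alpha, beta_1, ..., beta_p) via the normal equations X'X b = X'y."""
    rows = [[1.0] + list(r) for r in zip(*xs)]      # design matrix with an intercept column
    p1 = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p1)] for i in range(p1)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p1)]
    coef = solve(xtx, xty)
    predicted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    residuals = [yi - fi for yi, fi in zip(y, predicted)]   # actual Y_i - predicted value
    return coef, predicted, residuals

x1 = [0, 1, 2, 3, 4, 5]       # hypothetical predictors
x2 = [1, 0, 2, 1, 3, 2]
y = [4, 3, 11, 10, 18, 17]    # generated from Y = 1 + 2 X1 + 3 X2 (no error term)
coef, predicted, residuals = fit_mlr([x1, x2], y)
print([round(c, 6) for c in coef], [round(e, 6) for e in residuals])
```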
After Class…
• Review notes, practice problems, HW, labs and previous tests.
• Wed: Lab #8 (optional)
• Final Exam (closed book, closed notes)
  – Next Wed, 8–10am
  – Bring: student ID, a calculator (SAT policy, NO QWERTY keyboard), pencils, and a two-page crib sheet (8” by 11”), handwritten by yourself, two-sided.
• SEE the CALCULATOR POLICY and “crib sheet” rules (on the Syllabus) on the course website.
• No electronics except a calculator. You may not exchange calculators or crib sheets during the exam, and you may not type/print your crib sheet.