Date post: | 01-Jan-2016 |
SUMMARY FOR EQT271, Semester 1 2014/2015
Maz Jamilah Masnan, Inst. of Engineering Mathematics, Univ. Malaysia Perlis
EQT 271 ENGINEERING STATISTICS
1. Basic Statistics
2. Statistical Inference
3. ANOVA
4. Simple Linear Regression
5. Nonparametric Statistics
maz jamilah masnan/sem 1 2014/2015
Chapter 1. Basic Statistics
- Statistics in Engineering
- Collecting Engineering Data (data type, grouped vs ungrouped)
- Data Summary and Presentation:
  1. graphically (table, charts, graph, etc.)
  2. numerically: measures of central tendency (MCT: mean, mode, median), measures of dispersion (MOD: range, variance, std. dev.), measures of position (MOP: quartile, z-score, percentile, outlier, boxplot ≈ 5-number summary)
1. Basic Statistics
Probability Distributions
- Discrete Probability Distribution (Binomial & Poisson, Poisson Approximation of Binomial Probabilities – [ ])
- Continuous Probability Distribution (Normal & Normal approximation of Binomial – [ + continuity correction factor] & Poisson – [λ > 10 + continuity correction factor])
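The two approximations named above can be checked numerically. The following is a minimal sketch using only the Python standard library; the parameter values (n, p, k) are made up for illustration and are not taken from the course notes.

```python
import math

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def poisson_cdf(k, lam):
    """Exact P(X <= k) for X ~ Poisson(lam)."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))

def normal_cdf(x, mu, sigma):
    """P(Y <= x) for Y ~ Normal(mu, sigma^2), via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# Poisson approximation of a binomial: large n, small p, lambda = n * p.
n, p, k = 200, 0.01, 3          # hypothetical values
exact_small_p = binom_cdf(k, n, p)
poisson_approx = poisson_cdf(k, n * p)

# Normal approximation of a binomial with continuity correction:
# P(X <= k) is approximated by P(Y <= k + 0.5).
n2, p2, k2 = 100, 0.5, 45       # hypothetical values
exact_mid_p = binom_cdf(k2, n2, p2)
normal_approx = normal_cdf(k2 + 0.5, n2 * p2, math.sqrt(n2 * p2 * (1 - p2)))

print(exact_small_p, poisson_approx)
print(exact_mid_p, normal_approx)
```

In both cases the approximate and exact probabilities agree to about two decimal places, which is the practical reason these shortcuts are used with tables.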
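Concept of Sampling Distribution of the Sample Mean
[Figure: for Population 1 and Population 2, samples S1, S2, S3, S4, ..., Sn are drawn; each sample gives a mean $\bar{X}_1, \bar{X}_2, \bar{X}_3, \bar{X}_4, \ldots, \bar{X}_n$, and these means form the sampling distribution of $\bar{X}$ with $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.]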
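[Figure: sample means $\bar{X}_1, \bar{X}_2, \bar{X}_3, \bar{X}_4, \ldots, \bar{X}_n$, with $\mu_{\bar{X}} = 799.6$ and $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 12.16$.]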
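1. Basic Statistics
Sampling Distribution of the Sample Mean
Uses the Central Limit Theorem; the standard deviation of the sample mean is the standard error, i.e. $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.
If a population is normal with mean μ and standard deviation σ, the sampling distribution of $\bar{X}$ is also normally distributed with $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.
The Z-value for the sampling distribution of $\bar{X}$ is
$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$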
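As a quick illustration of the Z-value for a sample mean, here is a minimal sketch; the function name and the numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

def z_for_sample_mean(x_bar, mu, sigma, n):
    """Z = (x_bar - mu) / (sigma / sqrt(n)), the standardized sample mean."""
    standard_error = sigma / math.sqrt(n)
    return (x_bar - mu) / standard_error

# Hypothetical numbers: population mean 50, std. dev. 10, sample of 25.
z = z_for_sample_mean(x_bar=52.0, mu=50.0, sigma=10.0, n=25)
print(z)  # standard error is 10/5 = 2, so z = (52 - 50)/2
```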
Properties and Shape of the Sampling Distribution of the Sample Mean
- If n ≥ 30, $\bar{X}$ is normally distributed, where $\bar{X} \sim N(\mu, \sigma^2/n)$.
  Note: if $\sigma^2$ is unknown, then it is estimated by $s^2$.
- If n < 30 and the variance is known, $\bar{X}$ is normally distributed: $\bar{X} \sim N(\mu, \sigma^2/n)$.
- If n < 30 and the variance is unknown, the t distribution with n-1 degrees of freedom is used: $T = \frac{\bar{X} - \mu}{s/\sqrt{n}} \sim t_{n-1}$.
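[Figure: for Population 1 and Population 2, samples S1, S2, S3, S4, ..., Sn each give a sample proportion $\hat{p}_1, \hat{p}_2, \hat{p}_3, \hat{p}_4, \ldots, \hat{p}_n$; these form the sampling distribution of $\hat{P}$ with $\mu_{\hat{P}} = p$ and $\sigma_{\hat{P}} = \sqrt{pq/n}$.]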
Sampling Distribution of the Sample Proportion
The population and sample proportions are denoted by p and $\hat{p}$, respectively, and are calculated as
$p = \frac{X}{N}$ and $\hat{p} = \frac{x}{n}$
For large values of n (n ≥ 30), the sampling distribution is very closely normally distributed:
$\hat{p} \sim N\left(p, \frac{pq}{n}\right)$
Mean and Standard Deviation of Sample Proportion
$\mu_{\hat{P}} = p$, $\sigma_{\hat{P}} = \sqrt{\frac{pq}{n}}$
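The mean and standard deviation of the sample proportion can be computed directly. This is a small sketch with made-up values of p and n; the function names are illustrative only.

```python
import math

def proportion_sampling_distribution(p, n):
    """Return (mean, standard deviation) of the sample proportion: p and sqrt(pq/n)."""
    q = 1 - p
    return p, math.sqrt(p * q / n)

def z_for_sample_proportion(p_hat, p, n):
    """Standardize an observed sample proportion under the normal approximation."""
    mean, std_error = proportion_sampling_distribution(p, n)
    return (p_hat - mean) / std_error

# Hypothetical values: true proportion 0.5, sample size 100, observed 0.6.
mean, std_error = proportion_sampling_distribution(p=0.5, n=100)
z = z_for_sample_proportion(p_hat=0.6, p=0.5, n=100)
print(mean, std_error, z)
```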
Chapter 2. Statistical Inference
Statistical inference is the process of drawing conclusions about the characteristics of a population based on information contained in a sample. Since populations are characterized by numerical descriptive measures called parameters, statistical inference is concerned with making inferences about population parameters.
Estimation & Hypothesis Testing
1. Estimation (point & interval estimate [a < x < b])
- Confidence interval estimation for mean (μ) and proportion (p)
- Determining sample size
2. Hypothesis Testing
- Test for one and two means
- Test for one and two proportions
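[Figure: sampling distribution $f(\bar{x})$ centered at μ. 95% of the $\bar{x}$'s lie in the interval from $\mu - 1.96\sigma_{\bar{x}}$ to $\mu + 1.96\sigma_{\bar{x}}$. Below it, n interval estimates computed from the observed $\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_n$ by using $\bar{x} \pm 1.96\sigma_{\bar{x}}$.]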
Confidence Interval (Mean)
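The simplest case, a 95% interval for the mean with σ known, uses the $\bar{x} \pm 1.96\sigma_{\bar{x}}$ rule from the sampling distribution of the mean. A minimal sketch follows; the function name and numbers are hypothetical, and the other cases (σ unknown, small n) would swap in s and a t critical value.

```python
import math

def z_confidence_interval(x_bar, sigma, n, z_crit=1.96):
    """Interval x_bar +/- z_crit * sigma / sqrt(n); z_crit = 1.96 gives 95%."""
    margin = z_crit * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical sample: mean 50, known sigma 10, n = 100.
lower, upper = z_confidence_interval(x_bar=50.0, sigma=10.0, n=100)
print(lower, upper)  # margin is 1.96 * 10 / 10 = 1.96
```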
Estimation (Confidence Interval – Difference in Means)
Confidence interval estimates for the difference between two population means, $\mu_1 - \mu_2$:
i) Variances $\sigma_1^2$ and $\sigma_2^2$ are known:
$(\bar{X}_1 - \bar{X}_2) \pm Z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
ii) If the population variances $\sigma_1^2$ and $\sigma_2^2$ are unknown, the following table shows the formulas that may be used, depending on the sample sizes and the assumption on the population variances.
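Confidence intervals for $\mu_1 - \mu_2$ when $\sigma_1^2, \sigma_2^2$ are unknown, by equality of variances and sample size:

Case $\sigma_1^2 \neq \sigma_2^2$:
- $n_1 \geq 30, n_2 \geq 30$: $(\bar{X}_1 - \bar{X}_2) \pm Z_{\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
- $n_1 < 30, n_2 < 30$: $(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2, v} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$, where $v = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$

Case $\sigma_1^2 = \sigma_2^2$:
- $n_1 \geq 30, n_2 \geq 30$: $(\bar{X}_1 - \bar{X}_2) \pm Z_{\alpha/2} S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$, where $S_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
- $n_1 < 30, n_2 < 30$: $(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2, v} S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$, where $S_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$ and $v = n_1 + n_2 - 2$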
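The degrees of freedom for unequal variances and the pooled variance for equal variances are the two pieces that are easiest to get wrong by hand. The sketch below computes both and builds the pooled interval; the summary statistics are invented, and the critical value 2.101 is the tabulated $t_{0.025,18}$, passed in by hand because the standard library has no t table.

```python
import math

def welch_df(s1_sq, n1, s2_sq, n2):
    """Degrees of freedom v for the unequal-variance, small-sample case."""
    a, b = s1_sq / n1, s2_sq / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

def pooled_variance(s1_sq, n1, s2_sq, n2):
    """S_p^2 = ((n1-1)s1^2 + (n2-1)s2^2) / (n1 + n2 - 2)."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

def diff_ci_pooled(x1_bar, x2_bar, s1_sq, n1, s2_sq, n2, crit):
    """(x1_bar - x2_bar) +/- crit * S_p * sqrt(1/n1 + 1/n2)."""
    s_p = math.sqrt(pooled_variance(s1_sq, n1, s2_sq, n2))
    margin = crit * s_p * math.sqrt(1 / n1 + 1 / n2)
    diff = x1_bar - x2_bar
    return diff - margin, diff + margin

# Hypothetical summary statistics for two samples of size 10.
v_unequal = welch_df(4.0, 10, 9.0, 10)
sp_sq = pooled_variance(4.0, 10, 9.0, 10)
lower, upper = diff_ci_pooled(20.0, 18.0, 4.0, 10, 4.0, 10, crit=2.101)
print(v_unequal, sp_sq, lower, upper)
```

When the two sample variances and sizes are identical, `welch_df` returns exactly $n_1 + n_2 - 2$, which is a handy sanity check on the formula.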
Confidence Interval (Proportion)
Hypothesis Testing
1. Test for one and two population means
2. Test for one and two population proportions
Requires understanding of: definition of hypothesis test, null and alternative hypothesis, test statistics, critical region (rejection region), critical value, p-value.
Procedure for hypothesis testing
1. Define the question to be tested and formulate the hypotheses that state the problem:
   $H_0: \theta = a$ or $\theta \geq a$ or $\theta \leq a$
   $H_1: \theta \neq a$ or $\theta < a$ or $\theta > a$
   (here θ stands for the population parameter being tested)
2. Choose the appropriate test statistic and calculate the sample statistic value. The choice of test statistic depends on the probability distribution of the random variable involved in the hypothesis.
3. Establish the test criterion by determining the critical value and critical region.
4. Draw conclusions: whether to reject or fail to reject the null hypothesis.
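The four steps can be sketched for the simplest case, a one-sample Z test on a mean with σ known. This is only an illustration under that assumption; the function name, the `tail` argument and the data are hypothetical, and the p-value route is used in place of a table lookup for the critical value.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def one_sample_z_test(x_bar, mu0, sigma, n, alpha=0.05, tail="two"):
    """Return (z, p_value, reject_H0) for H0: mu = mu0 with sigma known."""
    # Step 2: test statistic.
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    # Step 3: test criterion, expressed through the p-value.
    if tail == "two":        # H1: mu != mu0
        p_value = 2 * (1 - normal_cdf(abs(z)))
    elif tail == "right":    # H1: mu > mu0
        p_value = 1 - normal_cdf(z)
    elif tail == "left":     # H1: mu < mu0
        p_value = normal_cdf(z)
    else:
        raise ValueError("tail must be 'two', 'right' or 'left'")
    # Step 4: conclusion.
    return z, p_value, p_value < alpha

# Step 1 (hypothetical): H0: mu = 50 against H1: mu != 50.
z, p_value, reject = one_sample_z_test(x_bar=52.5, mu0=50.0, sigma=10.0, n=64)
print(z, p_value, reject)
```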
Hypothesis testing for the difference between two population means, $\mu_1 - \mu_2$
Test hypothesis:
$H_0: \mu_1 - \mu_2 = 0$
$H_1: \mu_1 - \mu_2 > 0$. Reject $H_0$ when $Z_{test} > Z_{\alpha}$
$H_1: \mu_1 - \mu_2 < 0$. Reject $H_0$ when $Z_{test} < -Z_{\alpha}$
$H_1: \mu_1 - \mu_2 \neq 0$. Reject $H_0$ when $Z_{test} < -z_{\alpha/2}$ or $Z_{test} > z_{\alpha/2}$
Test statistics:
i) Variances $\sigma_1^2$ and $\sigma_2^2$ are known, and both $n_1$ and $n_2$ are samples of any size:
$Z_{test} = \frac{(\bar{X}_1 - \bar{X}_2) - 0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$
ii) If the population variances $\sigma_1^2$ and $\sigma_2^2$ are unknown, the following table shows the formulas that may be used, depending on the sample sizes and the assumption on the population variances.
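Test statistics for $\mu_1 - \mu_2$ when $\sigma_1^2, \sigma_2^2$ are unknown, by equality of variances and sample size:

Case $\sigma_1^2 \neq \sigma_2^2$:
- $n_1 \geq 30, n_2 \geq 30$: $Z_{test} = \frac{(\bar{X}_1 - \bar{X}_2) - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
- $n_1 < 30, n_2 < 30$: $t_{test} = \frac{(\bar{X}_1 - \bar{X}_2) - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$, with $v = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$

Case $\sigma_1^2 = \sigma_2^2$:
- $n_1 \geq 30, n_2 \geq 30$: $Z_{test} = \frac{(\bar{X}_1 - \bar{X}_2) - 0}{S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$, where $S_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
- $n_1 < 30, n_2 < 30$: $t_{test} = \frac{(\bar{X}_1 - \bar{X}_2) - 0}{S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$, where $S_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$ and $v = n_1 + n_2 - 2$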
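For the small-sample, equal-variance case, the pooled test statistic can be sketched as follows. The summary statistics are invented for illustration, and the hypothesized difference `delta0` is an assumed parameter name that defaults to 0 as in the hypotheses for two means.

```python
import math

def pooled_t_statistic(x1_bar, s1_sq, n1, x2_bar, s2_sq, n2, delta0=0.0):
    """Return (t_test, v) using the pooled standard deviation S_p."""
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    std_error = math.sqrt(sp_sq) * math.sqrt(1 / n1 + 1 / n2)
    t_test = ((x1_bar - x2_bar) - delta0) / std_error
    return t_test, n1 + n2 - 2

# Hypothetical samples: means 24 and 21, both variances 4, both n = 8.
t_test, v = pooled_t_statistic(24.0, 4.0, 8, 21.0, 4.0, 8)

# Two-tailed decision at alpha = 0.05 with the tabulated t(0.025, 14) = 2.145.
t_crit = 2.145
reject = abs(t_test) > t_crit
print(t_test, v, reject)
```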
Confidence Interval vs Hypothesis Testing: for the single mean & proportion
$H_0: \mu = \mu_0$ or $H_0: p = p_0$
At the same level α in the confidence interval and the hypothesis test, when the null hypothesis is rejected, the confidence interval for the mean or proportion will not contain the hypothesized mean/proportion.
Likewise, when we fail to reject the null hypothesis, the confidence interval will contain the hypothesized mean/proportion.
** Applies only for two-tailed tests. (Allan Bluman, pg. 458)
Confidence Interval vs Hypothesis Testing: for the difference of means & proportions
In each case the hypothesis is $H_0: \mu_1 - \mu_2 = 0$ or $H_0: p_1 - p_2 = 0$.
- CI = [-8.5, 8.5]: contains zero. If the CI contains zero, we fail to reject H0 (there is NO DIFFERENCE in population means or proportions).
- CI = [5.45, 12.45]: no zero. If the CI does not contain zero, we reject H0 (the mean/proportion for population 1 is GREATER than the mean/proportion for population 2).
- CI = [-7.3, -3.3]: no zero. If the CI does not contain zero, we reject H0 (the mean/proportion for population 1 is LESS than the mean/proportion for population 2).
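The three cases amount to checking where zero falls relative to the interval. A minimal sketch, applied to the three example intervals from this slide; the function name and the returned strings are illustrative.

```python
def decide_from_ci(lower, upper):
    """Two-tailed decision on H0: difference = 0 from a CI for the difference."""
    if lower <= 0 <= upper:
        return "fail to reject H0: no difference"
    if lower > 0:
        return "reject H0: population 1 is greater"
    return "reject H0: population 1 is less"

for interval in [(-8.5, 8.5), (5.45, 12.45), (-7.3, -3.3)]:
    print(interval, decide_from_ci(*interval))
```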
Outcomes for hypothesis result
When the claim is H0:
- Reject H0: there is sufficient evidence to reject the claim.
- Fail to reject H0: there is insufficient evidence to reject the claim.
When the claim is H1:
- Reject H0: there is sufficient evidence to support the claim.
- Fail to reject H0: there is insufficient evidence to support the claim.
Chapter 3. ANALYSIS OF VARIANCE (ANOVA)
* Testing 3 or more population means
1. 1-way ANOVA [Completely Randomized Design]
2. 2-way ANOVA (without replication) [Randomized Complete Block Design]
3. 2-way ANOVA (with replication) [Factorial Design]
1. 1-way ANOVA [Completely Randomized Design]
Hypothesis:
H0: µ1 = µ2 = ... = µt
H1: µi ≠ µj for at least one pair (i,j)
(At least one of the treatment group means differs from the rest, OR at least two of the population means are not equal.)
Assumptions for Analysis of Variance
- The populations from which the samples were obtained must be normally or approximately normally distributed.
- The variance of the response variable, denoted σ², is the same for all of the populations.
- The observations (samples) must be independent of each other.
1. 1-way ANOVA [Completely Randomized Design]

| Source of Variation | Sum of Squares | DF | Mean Square | F | p-Value |
|---|---|---|---|---|---|
| Treatments (between-group var.) | | k-1 | | | |
| Error (within-group var.) | | N-k | | | |
| Total | | N-1 | | | |
1. 1-way ANOVA [Completely Randomized Design]

| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F | p-Value |
|---|---|---|---|---|---|
| Treatments | SSTR | t-1 | MSTR = SSTR/(t-1) | MSTR/MSE | |
| Error | SSE | N-t | MSE = SSE/(N-t) | | |
| Total | SST | N-1 | | | |
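The table entries can be filled in from raw data as sketched below. The formulas for SSTR and SSE used here are the standard between-group and within-group sums of squares, which this summary does not spell out, so treat them as an assumption to check against the notes. The data are the three wax-type columns (five observations each) that appear in the wax-type table of this summary.

```python
def one_way_anova(groups):
    """Fill in the 1-way ANOVA table for a list of treatment groups."""
    t = len(groups)                              # number of treatments
    N = sum(len(g) for g in groups)              # total observations
    grand_mean = sum(sum(g) for g in groups) / N
    means = [sum(g) / len(g) for g in groups]
    # Between-group (treatment) and within-group (error) sums of squares.
    sstr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    mstr = sstr / (t - 1)
    mse = sse / (N - t)
    return {"SSTR": sstr, "SSE": sse, "SST": sstr + sse,
            "MSTR": mstr, "MSE": mse, "F": mstr / mse}

wax = [
    [27, 30, 29, 28, 31],   # wax type 1
    [33, 28, 31, 30, 30],   # wax type 2
    [29, 28, 30, 32, 31],   # wax type 3
]
table = one_way_anova(wax)
print(table)
```

The resulting F ratio is then compared with the critical value $F_{\alpha, t-1, N-t}$ from the F table.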
1. 1-way ANOVA [Completely Randomized Design]
CONCLUSION
Fail to reject H0 (all treatments are equal):
- No difference in means
- Between-group variance estimate approximately equal to the within-group variance
- F test value approximately equal to 1
Reject H0 (treatments are not equal):
- Difference in means
- Between-group variance estimate will be larger than the within-group variance
- F test value greater than 1
Comparing the Variance Estimates: The F Test
[Figure: sampling distribution of MSTR/MSE, with the "Do Not Reject H0" region to the left of the critical value $F_{\alpha}$ and the "Reject H0" region of area α to the right.]
If F_ratio < F_(critical value), FAIL to REJECT H0
If F_ratio > F_(critical value), REJECT H0
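2-way ANOVA (without replication) [Randomized Complete Block Design]

1-way ANOVA layout (treatment in columns):

| Observation | Wax Type 1 | Wax Type 2 | Wax Type 3 |
|---|---|---|---|
| 1 | 27 | 33 | 29 |
| 2 | 30 | 28 | 28 |
| 3 | 29 | 31 | 30 |
| 4 | 28 | 30 | 32 |
| 5 | 31 | 30 | 31 |
| Sample Mean | 29.0 | 30.4 | 30.0 |
| Sample Variance | 2.5 | 3.3 | 2.5 |

* Treatment can be in column or row (in 1-way ANOVA)

2-way ANOVA layout (treatment in columns, block in rows):

| Batch | Wax Type 1 | Wax Type 2 | Wax Type 3 |
|---|---|---|---|
| 1 | 27 | 33 | 29 |
| 2 | 30 | 28 | 28 |
| 3 | 29 | 31 | 30 |
| 4 | 28 | 30 | 32 |
| 5 | 31 | 30 | 31 |
| Sample Mean | 29.0 | 30.4 | 30.0 |
| Sample Variance | 2.5 | 3.3 | 2.5 |

* Treatment and block can either be in column or row (in 2-way ANOVA)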
2-way ANOVA (without replication) [Randomized Complete Block Design]
• First Hypothesis: Treatment Effect
H0: τ1 = τ2 = ... = τt = 0
H1: τj ≠ 0 for at least one j
OR
H0: µ1 = µ2 = ... = µt
H1: µi ≠ µj for at least one pair (i,j)
• Second Hypothesis: Block Effect
H0: βi = 0 for each value of i through n
H1: βi ≠ 0 for at least one i
OR
H0: µ1 = µ2 = ... = µn
H1: µi ≠ µj for at least one pair (i,j)
2-way ANOVA (without replication) [Randomized Complete Block Design]

| Source of Variation | Sum of Squares | DF | Mean Square | F | p-Value |
|---|---|---|---|---|---|
| Treatments | | k-1 | | | |
| Blocks | | n-1 | | | |
| Error | | (k-1)(n-1) | | | |
| Total | | kn-1 | | | |
2-way ANOVA (without replication) [Randomized Complete Block Design]

| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F | p-Value |
|---|---|---|---|---|---|
| Treatments | SSTR | t-1 | MSTR = SSTR/(t-1) | MSTR/MSE | |
| Blocks | SSBL | n-1 | MSBL = SSBL/(n-1) | MSBL/MSE | |
| Error | SSE | (t-1)(n-1) | MSE = SSE/((t-1)(n-1)) | | |
| Total | SST | tn-1 | | | |
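To see how blocking changes the table, the sketch below computes the randomized block ANOVA for the wax-type data from this summary, treating each batch (row) as a block. The sum-of-squares formulas are the standard ones and are an assumption here, since the summary lists only the degrees of freedom and mean squares.

```python
def rcbd_anova(table):
    """table[i][j] = observation for block i (row) and treatment j (column)."""
    n, t = len(table), len(table[0])             # n blocks, t treatments
    grand = sum(sum(row) for row in table) / (n * t)
    trt_means = [sum(row[j] for row in table) / n for j in range(t)]
    blk_means = [sum(row) / t for row in table]
    sst = sum((x - grand) ** 2 for row in table for x in row)
    sstr = n * sum((m - grand) ** 2 for m in trt_means)
    ssbl = t * sum((m - grand) ** 2 for m in blk_means)
    sse = sst - sstr - ssbl                      # error is what remains
    mstr = sstr / (t - 1)
    msbl = ssbl / (n - 1)
    mse = sse / ((t - 1) * (n - 1))
    return {"SSTR": sstr, "SSBL": ssbl, "SSE": sse, "SST": sst,
            "F_treatment": mstr / mse, "F_block": msbl / mse}

# Rows are batches 1 to 5; columns are wax types 1 to 3.
wax_by_batch = [
    [27, 33, 29],
    [30, 28, 28],
    [29, 31, 30],
    [28, 30, 32],
    [31, 30, 31],
]
result = rcbd_anova(wax_by_batch)
print(result)
```

Each F ratio is compared with its own critical value: $F_{\alpha, t-1, (t-1)(n-1)}$ for treatments and $F_{\alpha, n-1, (t-1)(n-1)}$ for blocks.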
2-way ANOVA (without replication) [Randomized Complete Block Design]
DECISION TO MAKE
[Figure: F distribution with the "Do Not Reject H0" region to the left of the critical value and the "Reject H0" region to the right.]
If F_ratio < F_(critical value), FAIL to REJECT H0
If F_ratio > F_(critical value), REJECT H0
2-way ANOVA (with replication) [Factorial Design]
REMEMBER THIS
Three Sets of Hypotheses:
i. Factor A Effect:
H0: α1 = α2 = ... = αa = 0
H1: at least one αi ≠ 0
or, in terms of means, H0: µ1 = µ2 = ... = µa; H1: µi ≠ µk for at least one pair (i,k)
ii. Factor B Effect:
H0: β1 = β2 = ... = βb = 0
H1: at least one βj ≠ 0
or, in terms of means, H0: µ1 = µ2 = ... = µb; H1: µi ≠ µk for at least one pair (i,k)
iii. Interaction Effect:
H0: (αβ)ij = 0 for all i,j
H1: at least one (αβ)ij ≠ 0
or, in terms of means, H0: µAB1 = µAB2 = ... = µABb; H1: µABi ≠ µABk for at least one pair (i,k)
2-way ANOVA (with replication) [Factorial Design]
OR USE THESE
Three Sets of Hypotheses:
i. Factor A Effect:
H0: There is no difference in means of factor A
H1: There is a difference in means of factor A
ii. Factor B Effect:
H0: There is no difference in means of factor B
H1: There is a difference in means of factor B
iii. Interaction Effect:
H0: There is no interaction effect between factor A and B for/on ………
H1: There is an interaction effect between factor A and B for/on ………
2-way ANOVA (with replication) [Factorial Design]
FIRST: run the test to check for INTERACTION [plot the interaction and test the hypothesis].
- If there is NO INTERACTION, then run a test to find out which factor effect is significant.
- If an INTERACTION EXISTS, there is no need to run tests for each factor.
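2-way ANOVA (with replication) [Factorial Design]

| Source of Variation | Sum of Squares | DF | Mean Square | F | p-Value |
|---|---|---|---|---|---|
| Factor A | | a-1 | | | |
| Factor B | | b-1 | | | |
| Interaction | | (a-1)(b-1) | | | |
| Error | | ab(r-1) | | | |
| Total | | abr-1 | | | |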
2-way ANOVA (with replication) [Factorial Design]

| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F | p-Value |
|---|---|---|---|---|---|
| Factor A | SSA | a-1 | MSA = SSA/(a-1) | MSA/MSE | |
| Factor B | SSB | b-1 | MSB = SSB/(b-1) | MSB/MSE | |
| Interaction | SSAB | (a-1)(b-1) | MSAB = SSAB/((a-1)(b-1)) | MSAB/MSE | |
| Error | SSE | ab(r-1) | MSE = SSE/(ab(r-1)) | | |
| Total | SST | abr-1 | | | |
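A small worked sketch of the factorial table follows. The sums of squares use the standard balanced-design formulas, which are an assumption here because the summary gives only the degrees of freedom and mean squares; the 2×2 data set with r = 2 replicates is made up so that the arithmetic is easy to verify by hand.

```python
def factorial_anova(cells):
    """cells[i][j] = list of r replicates for level i of A and level j of B."""
    a, b, r = len(cells), len(cells[0]), len(cells[0][0])
    grand = sum(x for row in cells for cell in row for x in cell) / (a * b * r)
    cell_mean = [[sum(cell) / r for cell in row] for row in cells]
    a_mean = [sum(cell_mean[i]) / b for i in range(a)]
    b_mean = [sum(cell_mean[i][j] for i in range(a)) / a for j in range(b)]
    ssa = b * r * sum((m - grand) ** 2 for m in a_mean)
    ssb = a * r * sum((m - grand) ** 2 for m in b_mean)
    ssab = r * sum((cell_mean[i][j] - a_mean[i] - b_mean[j] + grand) ** 2
                   for i in range(a) for j in range(b))
    sse = sum((x - cell_mean[i][j]) ** 2
              for i in range(a) for j in range(b) for x in cells[i][j])
    mse = sse / (a * b * (r - 1))
    return {"SSA": ssa, "SSB": ssb, "SSAB": ssab, "SSE": sse,
            "F_A": (ssa / (a - 1)) / mse,
            "F_B": (ssb / (b - 1)) / mse,
            "F_AB": (ssab / ((a - 1) * (b - 1))) / mse}

# Hypothetical data: 2 levels of A (rows), 2 levels of B (columns), 2 replicates.
data = [
    [[10, 12], [20, 22]],
    [[14, 16], [18, 20]],
]
anova = factorial_anova(data)
print(anova)
```

Following the interaction-first procedure, `F_AB` would be tested first, and the factor tests `F_A` and `F_B` would only be interpreted if the interaction is not significant.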
2-way ANOVA (with replication) [Factorial Design]
DECISION TO MAKE
[Figure: F distribution with the "Do Not Reject H0" region to the left of the critical value and the "Reject H0" region to the right.]
If F_ratio < F_(critical value), FAIL to REJECT H0
If F_ratio > F_(critical value), REJECT H0
[Figure: three interaction plots, each titled "Graph of the means of the Factors", showing mean gasoline consumption (axis from 0 to 35) for two-wheel-drive and four-wheel-drive, with one line each for the REGULAR and OCTANE series.]
- Disordinal interaction: there is a SIGNIFICANT interaction between ……
- Ordinal interaction: there is an interaction, but it is not significant. The main effects can be interpreted independently.
- No interaction (parallel lines): there is no significant interaction. The main effects can be interpreted independently.
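Chapter 4. Simple Linear Regression
Model: $Y = \beta_0 + \beta_1 X$
Fitted model: $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$
Estimate the model using the LEAST SQUARES METHOD: $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$, where
$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}$
$S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}$
$S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}$
$S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$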
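A minimal sketch of the least squares calculation follows, using a small made-up data set. The intercept formula $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$ is the standard one and is not shown in this summary, so it is flagged as an addition in the code.

```python
def least_squares(x, y):
    """Return (b0, b1) for the fitted line y_hat = b0 + b1 * x."""
    n = len(x)
    s_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    s_xx = sum(a * a for a in x) - sum(x) ** 2 / n
    b1 = s_xy / s_xx
    # Standard intercept formula (an addition, not in the summary):
    # b0 = y_bar - b1 * x_bar.
    b0 = sum(y) / n - b1 * sum(x) / n
    return b0, b1

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares(x, y)
print(b0, b1)  # S_xy = 6 and S_xx = 10 for these data
```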
ADEQUACY OF THE MODEL
COEFFICIENT OF DETERMINATION ($R^2$ or $r^2$)
$r^2 = \frac{SSR}{SST} = \frac{S_{xy}^2}{S_{xx} S_{yy}}$
A measure of the variation (%) of the dependent variable (Y) that is explained by the regression line and the independent variable (X).
PEARSON PRODUCT MOMENT CORRELATION COEFFICIENT (r)
$r = \frac{S_{xy}}{\sqrt{S_{xx} \cdot S_{yy}}}$
Measures the strength of a linear relationship between the two variables X and Y.
[Figure: scale of r from -1 to +1, strong near -1 and +1, weak in between.]
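Both measures follow directly from $S_{xy}$, $S_{xx}$ and $S_{yy}$. A short sketch with hypothetical data (five made-up points):

```python
import math

def correlation_measures(x, y):
    """Return (r, r_squared) computed from S_xy, S_xx and S_yy."""
    n = len(x)
    s_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    s_xx = sum(a * a for a in x) - sum(x) ** 2 / n
    s_yy = sum(b * b for b in y) - sum(y) ** 2 / n
    r = s_xy / math.sqrt(s_xx * s_yy)
    r_squared = s_xy ** 2 / (s_xx * s_yy)
    return r, r_squared

# Hypothetical data.
r, r_squared = correlation_measures([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(r, r_squared)  # here S_xy = 6, S_xx = 10, S_yy = 6
```

For these numbers about 60% of the variation in Y is explained by the line, and r is simply the signed square root of that proportion.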
TEST FOR LINEARITY OF REGRESSION

t-Test
1. Determine the hypotheses.
   $H_0: \beta_1 = 0$ (no linear r/ship)
   $H_1: \beta_1 \neq 0$ (exist linear r/ship)
2. Compute the critical value / level of significance: $t_{\alpha/2, n-2}$ or p-value.
3. Compute the test statistic.
   $t = \frac{\hat{\beta}_1}{\sqrt{Var(\hat{\beta}_1)}}$, where $Var(\hat{\beta}_1) = \frac{S_{yy} - \hat{\beta}_1 S_{xy}}{(n-2) S_{xx}}$
4. Determine the rejection rule.
   Reject H0 if $t < -t_{\alpha/2, n-2}$ or $t > t_{\alpha/2, n-2}$, or if p-value < α.
5. Conclusion.
   If H0 is rejected: there is a significant relationship between variables X and Y.

F-Test
1. Determine the hypotheses.
   $H_0: \beta_1 = 0$ (no linear r/ship)
   $H_1: \beta_1 \neq 0$ (exist linear r/ship)
2. Specify the level of significance: $F_{\alpha, 1, n-2}$ or p-value.
3. Compute the test statistic.
   F = MSR/MSE (this value can be taken from the ANOVA table)
4. Determine the rejection rule.
   Reject H0 if $F_{test} > F_{\alpha, 1, n-2}$ or if p-value < α.
5. Conclusion.
   If H0 is rejected: there is a significant relationship between variables X and Y.
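The two tests for linearity are equivalent in simple linear regression, since $F = t^2$. The sketch below computes both statistics for a small made-up data set. It assumes SSR = $\hat{\beta}_1 S_{xy}$ and SSE = $S_{yy} - \hat{\beta}_1 S_{xy}$, and the critical values are the tabulated $t_{0.025,3} = 3.182$ and $F_{0.05,1,3} = 10.13$, entered by hand.

```python
import math

def slope_tests(x, y):
    """Return (t, F) for testing H0: beta1 = 0 in simple linear regression."""
    n = len(x)
    s_xx = sum(a * a for a in x) - sum(x) ** 2 / n
    s_yy = sum(b * b for b in y) - sum(y) ** 2 / n
    s_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    b1 = s_xy / s_xx
    ssr = b1 * s_xy                  # regression sum of squares
    sse = s_yy - ssr                 # error sum of squares
    mse = sse / (n - 2)
    t = b1 / math.sqrt(mse / s_xx)   # Var(b1) = MSE / S_xx
    f = (ssr / 1) / mse              # MSR / MSE with 1 regression df
    return t, f

# Hypothetical data with n = 5, so n - 2 = 3 degrees of freedom.
t_stat, f_stat = slope_tests([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
reject_by_t = abs(t_stat) > 3.182    # t(0.025, 3) from the t table
reject_by_f = f_stat > 10.13         # F(0.05, 1, 3) from the F table
print(t_stat, f_stat, reject_by_t, reject_by_f)
```

Both rules give the same decision, as expected; with these five points the evidence is too weak to reject H0 at the 5% level.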
ANOVA APPROACH FOR TESTING LINEARITY OF REGRESSION
1) Hypothesis:
   $H_0: \beta_1 = 0$
   $H_1: \beta_1 \neq 0$
2) F-distribution table: $F_{0.01, 1, 8} = 11.26$
3) Test statistic: F = MSR/MSE = 17.303 (or use the p-value approach)
4) Rejection region: if F statistic > F table, we reject H0; or if p-value < alpha, we reject H0.
5) Since 17.303 > 11.26, we reject H0. Thus, there is a linear relationship between the variables X and Y.
Chapter 5. Nonparametric Statistics
- Sign Test (ST): test for 1 sample (uses the median). Can be performed for paired samples, but that is not covered in EQT271.
- Wilcoxon Signed Rank Test (WSRT): test for 1 sample (uses the median). Can be performed for paired samples, but that is not covered in EQT271. The parametric version is the t-test.
- Mann-Whitney Test (MWT), i.e. Wilcoxon Rank Sum Test: test for 2 samples (uses medians). The parametric version is the Z test and t-test.
- Kruskal-Wallis Test (KWT): test for 3 or more samples (uses medians). The parametric version is ANOVA.
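As an illustration of the simplest of these, the one-sample sign test can be sketched with an exact binomial p-value. This is a common textbook formulation and may differ in detail from the table-based procedure in the course notes; the data and hypothesized median are made up.

```python
import math

def sign_test(data, median0):
    """Two-sided sign test of H0: median = median0.

    Returns (plus, minus, p_value). Values equal to median0 are dropped.
    """
    plus = sum(1 for x in data if x > median0)
    minus = sum(1 for x in data if x < median0)
    n = plus + minus
    k = min(plus, minus)
    # Under H0 the number of plus signs is Binomial(n, 0.5).
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return plus, minus, min(1.0, 2 * tail)

# Hypothetical sample and hypothesized median.
sample = [12, 15, 11, 18, 14, 16, 17, 13, 19, 20]
plus, minus, p_value = sign_test(sample, median0=12.5)
print(plus, minus, p_value)
```

With 8 plus signs and 2 minus signs out of 10, the two-sided p-value exceeds 0.05, so H0 would not be rejected at that level.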
* Please double-check the summary against the notes. Some of the complete descriptions and formulae might not be available in the summary. *
* Please do more exercises for the final exam preparation *