Date post: | 08-Apr-2015 |
Upload: | sanjeev-nawani |
04/11/23Dr Tarek Amin1
Introduction to Inferential Statistics
Dr. Tarek Tawfik
Sampling Distribution
What are the steps needed to summarize information?
1- Collection of measurements
2- Data reduction (descriptive)
From here there are two paths:
All possible cases included (census) → population parameters → this is a dead end.
A subset (sample) → sample statistics → inferential statistics.
Why sampling instead of whole population?
1- Samples are usually cheaper and quicker.
2- It is sometimes impossible to locate all members of a population (not included in the list, unavailable, difficult to reach, unwilling to participate).
3- Research sometimes destroys the units of analysis, so that a census would destroy the population (for example, quality-control checks on batteries).
4- Sometimes sampling is more accurate than a census (for example, when the census relies on an inexperienced data collection team).
Problem with Sampling
Are sample statistics as representative of the population as the values we would obtain from a complete and accurate census?
This problem can be addressed using inferential statistics.
Inferential statistics: the numerical techniques for drawing conclusions about a population based on the information obtained from a random sample drawn from that population.
Steps to Conduct Inferential Statistics
1- Raw data of a random sample.
2- Sample statistics (descriptive statistics): measures of central tendency, dispersion, and association.
3- Inferential statistics: allow us to generalize from the sample statistics to the population.
The Sampling Distribution of a Sample Statistic: Concept
- Let's assume that we want to survey a community of 400 people whose ages were recorded, giving the following parameters: µ = 35 years, σ = 13 years.
- Let's assume, however, that we do not survey all 400; instead we randomly select 120 people, ask them their ages, and calculate the mean age.
- Then we put them back into the community and randomly select another 120 residents (which may include members of the first sample).
- We do this over and over, and each time we calculate the mean age.
- The results will be like those in the following table.
Distribution of 20 random sample means (n=20)
Sample number    Sample mean (years)
1                34.7
2                35.9
3                35.5
4                34.7
5                34.5
6                34.4
7                35.7
8                34.6
9                37.4
10               35.3
11               34.1
12               35.5
13               34.9
14               36.2
15               35.6
16               35.0
17               35.1
18               36.4
19               35.6
20               33.6
[Figure: dot plot of the 20 sample means along an age axis marked from 33 to 37 years]
All the results are clustered around the population value (35 years), with a few scores a bit further out and one extreme score of 37.4 years (random variation).
The ages of the 400 people in the community range from 2 to 69 years, while the sample means span a very narrow range of about 4 years, and 10 samples coincide with the population mean (35 years).
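The repeated-sampling experiment described above can be tried out with a short simulation. This is only a sketch under stated assumptions: the community of 400 ages is generated artificially (it is not the data behind the slide), and the function name `simulate_sample_means`, the seeds, and the choice of 1000 repetitions are illustrative. Because each sample takes 120 of the 400 residents without repeating a person, the spread of the means may come out somewhat smaller than σ/√n would suggest.

```python
import random
import statistics


def simulate_sample_means(population, sample_size, n_samples, seed=0):
    """Draw n_samples random samples and return the mean of each one."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        # Every sample is drawn afresh from the whole community, so the same
        # person may appear in more than one sample.
        sample = rng.sample(population, sample_size)
        means.append(statistics.mean(sample))
    return means


# Hypothetical community of 400 ages (an assumption, not the slide's data):
# normal with mean 35 and SD 13, clipped to the range 0..100 years.
_rng = random.Random(42)
population = [min(max(_rng.gauss(35, 13), 0), 100) for _ in range(400)]

means = simulate_sample_means(population, sample_size=120, n_samples=1000)

pop_mean = statistics.mean(population)
pop_sd = statistics.pstdev(population)
print(f"population mean      = {pop_mean:.2f}")
print(f"population SD        = {pop_sd:.2f}")
print(f"mean of sample means = {statistics.mean(means):.2f}")
print(f"SD of sample means   = {statistics.stdev(means):.2f}")
```

The mean of the sample means should land very close to the population mean, while their spread should be far narrower than the spread of individual ages.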
A sampling distribution is the theoretical probability distribution of an infinite number of sample outcomes for a statistic, using random samples of equal size.
Most of the samples will cluster around the population parameter, with an occasional sample result falling relatively further to one side or the other of the distribution (this is called the sampling distribution of sample means).
It has the following properties:
1- The mean of the sampling distribution is equal to the population mean: the average of the averages (µx̄) will be the same as the population mean.
2- The standard error is related to the standard deviation of the population.
3- The sampling distribution will be normally distributed (approximately, for reasonably large samples).
Population parameters: mean, S.D
Sample statistics: mean, S.D
Standard error of the mean: the degree to which the sample statistics deviate from the population parameters.
Central Limit Theorem
If you draw a sample from a population and calculate its mean, how close have you come to knowing the mean of the population?
How well does your mean represent the population mean, and is it close enough?
When many samples are drawn from a population, the means of these samples tend to be normally distributed, and the mean of the means is very close to the actual mean of the population.
A Z-score can be calculated for these means. To calculate it we need to know the S.D of the means (σ/√n); this new S.D of the means is called the standard error of the mean.
The term error reflects the fact that, due to sampling error, each sample mean is likely to deviate somewhat from the true population mean.
Central Limit Theorem
The formula for SEM = standard deviation / square root of n.
The formula indicates that we are estimating the SE given the S.D of a sample of size n.
For a sample of 100 and S.D of 40, the SEM = 40 / √100 = 4.
For a sample of 1000 and S.D of 40, the SEM = 40 / √1000 = 1.26.
Two factors influence the SEM: sample size and the S.D of the sample. Sample size enters through the denominator (as √n), so a larger sample gives a smaller SEM.
For a sample of 100 and S.D of 20, the SEM = 20 / √100 = 2.
For a sample of 100 and S.D of 40, the SEM = 40 / √100 = 4.
The more variability there is within a sample, the greater the SEM.
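The SEM arithmetic above can be checked with a few lines of Python. This is a minimal sketch; the function name `standard_error` is chosen here for illustration and does not come from the slides.

```python
import math


def standard_error(sd, n):
    """Standard error of the mean: the S.D divided by the square root of n."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return sd / math.sqrt(n)


# Same S.D, larger sample: the SEM shrinks.
print(standard_error(40, 100))             # 4.0
print(round(standard_error(40, 1000), 2))  # 1.26

# Same sample size, smaller S.D: the SEM shrinks.
print(standard_error(20, 100))             # 2.0
```

Note that multiplying the sample size by ten (100 to 1000) cuts the SEM only by a factor of √10, about 3.16, whereas halving the S.D halves the SEM.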
Standard Deviation vs. Standard Error
The value of σ measures the standard deviation in the population and is based on measurements of individuals: it tells us how much variability can be expected among individuals.
The SEM (the standard deviation of the means) tells us how much variability can be found among the means of samples.
SEM pertains to means not to individuals.
Estimation and Confidence Intervals
Introduction and concept
Dr Tarek Tawfik
Important Concepts
The properties of the sampling distribution:
1- The mean of the sampling distribution is equal to the population mean.
In other words, although individual results will vary from sample to sample, on average the sample means will equal the population mean: the sample mean is an unbiased estimator.
Properties
2- The spread of sample results around the population value is affected by the sample size.
The standard deviation of the sampling distribution is called the standard error: σ / √n.
As the sample size increases, the SEM decreases (the sample results are more tightly clustered around the population value), which makes the sample mean an efficient estimator of the population value.
Properties
3- The sampling distribution of sample means is normal (the Z-score is applicable). Around 95 % of repeated samples will have a mean within 1.96 standard errors of the population mean.
This allows us to specify a range, or INTERVAL, of scores within which 95 % of all possible sample means will fall, defined by:
Interval = µ ± 1.96 (σ / √n)
95 % of all possible means will fall in this interval.
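As a rough check of this interval, the sketch below applies it to the community example (µ = 35, σ = 13) and counts how many of the 20 sample means tabulated earlier fall inside. The sample size n = 120 is an assumption taken from the example text (the table heading quotes n = 20, which would give a wider interval), and the variable names are illustrative.

```python
import math

mu, sigma = 35.0, 13.0
n = 120  # assumption: sample size as stated in the example text

standard_error = sigma / math.sqrt(n)
lower = mu - 1.96 * standard_error
upper = mu + 1.96 * standard_error

# The 20 sample means from the table of repeated samples.
sample_means = [
    34.7, 35.9, 35.5, 34.7, 34.5, 34.4, 35.7, 34.6, 37.4, 35.3,
    34.1, 35.5, 34.9, 36.2, 35.6, 35.0, 35.1, 36.4, 35.6, 33.6,
]
inside = [m for m in sample_means if lower <= m <= upper]

print(f"interval: {lower:.2f} to {upper:.2f} years")
print(f"{len(inside)} of {len(sample_means)} sample means fall inside")
```

Under these assumptions the interval runs from about 32.67 to 37.33 years, and 19 of the 20 means (95 %) fall inside it; only the extreme value of 37.4 lies outside.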
Estimation
In research we usually have a single sample, and we need to estimate the population value from that sample.
In such cases the population parameters are unknown, and we assume that the sample actually falls within a certain region of the sampling distribution.
We assume our sample is one of the 95 % of all possible samples that fall within ±1.96 standard errors of the population mean (the confidence level), so our estimate of the population value is valid.
[Figure: diagram marking the population value, the sample, and the lower and upper limits of the interval]
Areas under the standard normal curve
Z         Area between both points    Beyond both points    Beyond one point
          (around the mean)           (two tails)           (one tail)
±0.1      0.080                       0.920                 0.4600
±0.2      0.159                       0.841                 0.4205
±0.3      0.236                       0.764                 0.3820
±0.4      0.311                       0.689                 0.3445
±0.5      0.383                       0.617                 0.3085
±0.6      0.451                       0.549                 0.2745
±0.7      0.516                       0.484                 0.2420
±0.8      0.576                       0.424                 0.2120
±0.9      0.632                       0.368                 0.1840
±1        0.683                       0.317                 0.1585
±1.1      0.729                       0.271                 0.1355
±1.2      0.770                       0.230                 0.1150
±1.3      0.806                       0.194                 0.0970
±1.4      0.838                       0.162                 0.0810
±1.5      0.866                       0.134                 0.0670
±1.6      0.890                       0.110                 0.0550
±1.645    0.900                       0.100                 0.0500
±1.7      0.911                       0.089                 0.0445
±1.8      0.928                       0.072                 0.0360
±1.9      0.943                       0.057                 0.0290
±1.96     0.950                       0.050                 0.0250
±2        0.954                       0.046                 0.0230
±2.1      0.964                       0.036                 0.0180
±2.2      0.972                       0.028                 0.0140
±2.3      0.979                       0.021                 0.0105
±2.4      0.984                       0.016                 0.0082
±3        0.997                       0.003                 0.0013
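The three kinds of area in this table can be recomputed with Python's standard library, which is handy for Z values the table does not list (such as 2.58). This is a sketch only; the helper name `areas` is an assumption made for illustration.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def areas(z):
    """Return (between, two_tails, one_tail) for the points -z and +z."""
    one_tail = 1.0 - STANDARD_NORMAL.cdf(z)  # area beyond +z only
    two_tails = 2.0 * one_tail               # area beyond -z and beyond +z
    between = 1.0 - two_tails                # area around the mean
    return between, two_tails, one_tail


for z in (1.0, 1.645, 1.96, 2.58):
    between, two_tails, one_tail = areas(z)
    print(f"z = ±{z:<6} between = {between:.3f}  "
          f"two tails = {two_tails:.3f}  one tail = {one_tail:.4f}")
```

Because the curve is symmetric, the one-tail area is always half the two-tail area, and the three columns are linked by between + two tails = 1.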
Calculation of the Confidence Interval
1- Calculation of the standard error: we do not have the S.D of the population, so we use the sample S.D:
S.E = S.D / √n
2- Mean − 1.96 S.E = the lower limit: the furthest the population parameter can lie below the sample mean while the sample still falls within the 95 % region.
3- Mean + 1.96 S.E = the upper limit: the furthest the population parameter can lie above the sample mean while the sample still falls within the 95 % region.
CI = x̄ ± z (s / √n), which at the 95 % level is x̄ ± 1.96 (s / √n)
[Figure: the interval from the lower limit (Mean − 1.96 S.E) to the upper limit (Mean + 1.96 S.E), centred on the sample mean]
Example
A random sample of 81 newborn infants from a hospital in a poor neighborhood during the last year had a mean birth weight of 100 oz, with a SD of 27 oz.
What is the point estimate for the unknown true value of the average (mean) birth weight of all infants born in that hospital in the last year (the population parameter)?
The difference between the sample mean of 100 oz and the unknown population mean (which is being estimated) is the sampling error.
Because the point estimate, 100 oz, is a single number, it gives no indication of its sampling error. CIs computed from a random sample enable us to measure sampling error in numeric terms.
What is the 95 % CI estimate for the mean birth weight?
1- Calculate the SEM (SD / √n):
27 / √81 = 27 / 9 = 3
2- Calculate the 95 % CI:
Mean ± 1.96 (SEM)
100 ± 1.96 (3)
100 ± 5.88
(94.12 to 105.88 oz)
A confidence interval is the range of values that is estimated to include a population parameter at a specific level of confidence.
Population mean is 102 oz
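The birth-weight calculation can be reproduced step by step in code. The sketch below is one possible way to write it; the function name `confidence_interval` and its `z` parameter (defaulting to 1.96 for the 95 % level) are illustrative choices, not part of the slides.

```python
import math


def confidence_interval(mean, sd, n, z=1.96):
    """Return (lower, upper) limits of mean ± z * (sd / sqrt(n))."""
    sem = sd / math.sqrt(n)  # step 1: standard error of the mean
    margin = z * sem         # step 2: half-width of the interval
    return mean - margin, mean + margin


lower, upper = confidence_interval(mean=100, sd=27, n=81)
print(f"95% CI: {lower:.2f} to {upper:.2f} oz")  # 95% CI: 94.12 to 105.88 oz
```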
Interpretation of CI
By using the 95 % confidence level, we are prepared to be wrong only five times in every 100 samples (1 in 20).
The risk of not including the population parameter in our interval estimate is only 5 %.
This probability of error is known as the alpha level (α), which is simply one minus the confidence level expressed as a proportion:
1 − 0.95 = 0.05.
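The "wrong about 1 time in 20" interpretation can be illustrated by simulation: draw many samples from a population whose mean is known, build a 95 % interval from each, and count how often the interval contains the true mean. This is a sketch under stated assumptions: the population is taken to be normal with mean 100 and S.D 27 (hypothetical values borrowed from the birth-weight example), and the function name `coverage`, the seed, and the number of repetitions are illustrative.

```python
import math
import random
import statistics


def coverage(pop_mean, pop_sd, n, n_samples, z=1.96, seed=0):
    """Fraction of simulated samples whose CI contains the population mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        mean = statistics.mean(sample)
        # The population S.D is treated as unknown, so the sample S.D is used.
        sem = statistics.stdev(sample) / math.sqrt(n)
        if mean - z * sem <= pop_mean <= mean + z * sem:
            hits += 1
    return hits / n_samples


rate = coverage(pop_mean=100, pop_sd=27, n=81, n_samples=2000)
print(f"share of intervals containing the population mean: {rate:.3f}")
print(f"share of intervals missing it (alpha):             {1 - rate:.3f}")
```

The share of intervals that contain the population mean should come out close to 0.95, leaving roughly 0.05 that miss it.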
Changing the Confidence Level
A random sample of 200 nurses is taken and each nurse is asked his or her annual income in whole dollars. These 200 nurses have an average income of $35,000, with a SD of $5,000.
The 95 % CI = 35,000 ± 1.96 (5000 / √200) = 35,000 ± 693 = ($34,307 to $35,693); this is the estimate of the average income of all nurses with 95 % confidence. The interval width = 35,693 − 34,307 = $1,386.
The 99 % CI = 35,000 ± 2.58 (5000 / √200) = 35,000 ± 912 = ($34,088 to $35,912); the interval width is $1,824.
The 90 % CI = 35,000 ± 1.645 (5000 / √200) = 35,000 ± 582 = ($34,418 to $35,582); the interval width is $1,164.
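The effect of the confidence level on the interval width can be seen by running the same calculation with different Z values. In the sketch below, the mapping `Z_BY_LEVEL` and the function `interval_for_level` are names introduced for illustration; the Z values are the ones used in these slides (1.645, 1.96, 2.58). Computed directly from 5000 / √200, the half-widths come to roughly 582, 693 and 912 dollars, so hand-rounded figures may differ by a few dollars.

```python
import math

# Z values used in the slides for each confidence level.
Z_BY_LEVEL = {0.90: 1.645, 0.95: 1.96, 0.99: 2.58}


def interval_for_level(mean, sd, n, level):
    """Return (lower, upper) limits of the CI at the given confidence level."""
    margin = Z_BY_LEVEL[level] * sd / math.sqrt(n)
    return mean - margin, mean + margin


for level in (0.90, 0.95, 0.99):
    lower, upper = interval_for_level(35_000, 5_000, 200, level)
    print(f"{level:.0%} CI: {lower:,.0f} to {upper:,.0f} "
          f"(width {upper - lower:,.0f})")
```

The higher the confidence level, the wider the interval: more confidence is bought at the cost of a less precise estimate.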
The lower the confidence level, the larger the alpha error (the risk of not including the true population parameter and so drawing wrong conclusions).
Thank You