
Posted on 14-Aug-2015


Determination of Sample Size: A Review of Statistical Theory

Descriptive statistics

forshaf@gmail.com

The research process (diagram):

1. Plan your data collection and collect data using one or more of: sampling, secondary data, observation, semi-structured in-depth and group interviews, questionnaires.
2. Analyse your data using one or both of: quantitative methods, qualitative methods.
3. Write your project report and prepare your presentation.
4. Submit your project report and give your presentation.

Descriptive and Inferential statistics

• Descriptive statistics describe or summarize information about a population or sample.

• Inferential statistics use sample data to make inferences or judgments about a population.

Sample statistics and population parameters

• Sample statistics are variables in a sample, or measures computed from sample data.

• Population parameters are variables in a population, or measured characteristics of the population.

Frequency distributions (table)

• A frequency distribution is a set of data organized by summarizing the number of times a particular value of a variable occurs.

• Purpose: to make the data usable.

• Process?

• A percentage distribution is a frequency distribution organized in a table (or graph) that summarizes the percentage values associated with particular values of a variable.

Monthly rent for 70 apartments:

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
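A minimal Python sketch of building the frequency and percentage distribution for the rent data, counting how often each exact value occurs (no class intervals are used). The helper name `frequency_table` is hypothetical and introduced only for illustration.

```python
from collections import Counter

# Monthly rents for the 70 apartments listed above
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

def frequency_table(values):
    """Return (value, frequency, percent) rows, sorted by value."""
    counts = Counter(values)          # number of times each value occurs
    n = len(values)
    return [(v, f, 100 * f / n) for v, f in sorted(counts.items())]

for value, freq, pct in frequency_table(rents):
    print(f"{value:>4}  {freq:>2}  {pct:5.1f}%")
```

For example, 450 occurs 7 times, which is 10% of the 70 observations.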


Percentage table

No.  Group            Amount spent on buying   Percent spent   Probability
1    Chokidar          4000                    ?               ?
2    Clerk             6000                    ?               ?
3    Superintendent   10000                    ?               ?
4    Officers         20000                    ?               ?
     Total            40000                    100%            1.00
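One way to fill in the blank columns is sketched below in Python: each amount divided by the total gives its probability, and multiplying by 100 gives the percentage. The dictionary keys are just the group labels from the table; the variable names are illustrative assumptions.

```python
# Amounts spent on buying, taken from the percentage table above
spending = {
    "chokidar": 4000,
    "clerk": 6000,
    "superintendent": 10000,
    "officers": 20000,
}

total = sum(spending.values())  # 40000

# Share of the total for each group, expressed as a probability (0 to 1)
shares = {group: amount / total for group, amount in spending.items()}

for group, p in shares.items():
    # Percent spent = probability * 100
    print(f"{group:<15} {spending[group]:>6} {100 * p:6.1f}% {p:.2f}")
```

The probabilities sum to 1.00, matching the total row of the table.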

Probability distribution

• A probability distribution organizes the probability values associated with particular values of a variable into a table (or graph).

• In a probability distribution, percentages are expressed as probabilities: the long-run relative frequency with which an event will occur in the future.

Proportion

• The percentage of population elements that successfully meet some criterion

Descriptive Statistics: Numerical Measures

Numerical data properties:

• Central tendency: mean, median, mode
• Variation: range, interquartile range, variance, standard deviation
• Shape: skewness, kurtosis

The central tendency: Mean

• The mean is the arithmetic average, a measure of central tendency.

Mean

• The mean of a data set is the average of all the data values.

• The sample mean x̄ is the point estimator of the population mean μ.

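Sample mean:

$$\bar{x} = \frac{\sum x_i}{n}$$

where $\sum x_i$ is the sum of the values of the n observations and n is the number of observations in the sample.

Population mean μ:

$$\mu = \frac{\sum x_i}{N}$$

where $\sum x_i$ is the sum of the values of the N observations and N is the number of observations in the population.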
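The sample and population mean formulas use the same arithmetic; only the notation differs (n and x̄ for a sample, N and μ for a population). A minimal Python sketch, using the seven observations from the median example as illustrative data; the function name `mean` is our own.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by the number of values."""
    return sum(values) / len(values)

# Illustrative data: the seven observations used in the median example
sample = [26, 18, 27, 12, 14, 27, 19]
print(mean(sample))  # 143 / 7, about 20.43
```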

The central tendency: Median and mode

• The median is the measure of central tendency that is the midpoint: the value below which half the values in a sample fall.

• The mode is the measure of central tendency that is the value occurring most often.

Median

Whenever a data set has extreme values, the median is the preferred measure of central location.

A few extremely large incomes or property values can inflate the mean.

The median is the measure of location most often reported for annual income and property value data.

The median of a data set is the value in the middle when the data items are arranged in ascending order.

Positioning point = $\frac{n + 1}{2}$

Median: odd number of observations

Data (7 observations): 26 18 27 12 14 27 19

In ascending order: 12 14 18 19 26 27 27

For an odd number of observations, the median is the middle value.

Median = 19

Median: even number of observations

Data (8 observations): 26 18 27 12 14 27 30 19

In ascending order: 12 14 18 19 26 27 27 30

For an even number of observations, the median is the average of the middle two values.

Median = (19 + 26)/2 = 22.5
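The two median cases can be checked with a short Python sketch. The hand-written `median` function (a name chosen here for illustration) follows the rule stated above, and the standard-library `statistics.median` and `statistics.mode` are shown alongside for comparison.

```python
import statistics

odd = [26, 18, 27, 12, 14, 27, 19]        # n = 7
even = [26, 18, 27, 12, 14, 27, 30, 19]   # n = 8

def median(values):
    """Middle value of the sorted data; average of the two middle values when n is even."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

print(median(odd), median(even))                        # 19 22.5
print(statistics.median(odd), statistics.median(even))  # 19 22.5
print(statistics.mode(odd))                             # 27, the most frequent value
```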

Measures of Variability (Dispersion)

• Range
• Interquartile range (midspread)
• Deviation scores
• Variance
• Standard deviation
• Coefficient of variation

Measure of dispersion

• A distribution can be skinny or fat (tightly clustered or widely spread).

• The range tells the distance between the smallest and largest values of a frequency distribution (the extreme values).

• The interquartile range encompasses the middle 50% of observations, that is, the range between the first quartile (25th percentile) and the third quartile (75th percentile).

Range

The range of a data set is the difference between the largest and smallest data values.

It is the simplest measure of variability. It is very sensitive to the smallest and largest data values.

Quartiles

• Unless the sample size is large, percentiles may not make sense, since percentiles divide the data into 100 groups.

• In smaller samples, we might divide the data into four groups (quartiles). Since almost any sample can be divided into four groups, the quartiles are important descriptive statistics to explain.

Quartiles are specific percentiles:

• First quartile (Q1) = 25th percentile
• Second quartile (Q2) = 50th percentile = median
• Third quartile (Q3) = 75th percentile

Measures of Variability (Dispersion)

It is often desirable to consider measures of variability (dispersion), as well as measures of location.

For example, in choosing supplier A or supplier B we might consider not only the average delivery time for each, but also the variability in delivery time for each.

Range: Example

Data: the monthly rents for 70 apartments listed earlier (smallest value 425, largest value 615).

Range = largest value - smallest value = 615 - 425 = 190

Interquartile Range or Midspread

The interquartile range of a data set is the difference between the third quartile and the first quartile. It is the range for the middle 50% of the data. It overcomes the sensitivity to extreme data values: it is not affected by the extreme values.

Interquartile range = Q3 - Q1

Interquartile Range: Example

Data: the monthly rents for 70 apartments listed earlier.

Third quartile (Q3) = 525; first quartile (Q1) = 445

Interquartile range = Q3 - Q1 = 525 - 445 = 80
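The slides do not say which quartile convention they use. The Python sketch below assumes the positional rule i = (p/100)n with 1-based positions (round i up when it is fractional; average the values at positions i and i + 1 when it is whole), which reproduces Q1 = 445 and Q3 = 525 for these data. Other conventions, such as the interpolation used by many software packages, can give slightly different quartiles. The function name `percentile` is hypothetical.

```python
import math

# Monthly rents for the 70 apartments listed earlier
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

def percentile(values, p):
    """p-th percentile by the positional rule i = (p / 100) * n (1-based positions)."""
    s = sorted(values)
    i = p * len(s) / 100
    if i.is_integer():
        k = int(i)
        if k == 0:               # p = 0: return the minimum
            return s[0]
        if k >= len(s):          # p = 100: there is no position i + 1
            return s[-1]
        return (s[k - 1] + s[k]) / 2   # average of positions i and i + 1
    return s[math.ceil(i) - 1]         # fractional i: round up to the next position

data_range = max(rents) - min(rents)   # 615 - 425
q1 = percentile(rents, 25)             # i = 17.5 -> 18th value
q3 = percentile(rents, 75)             # i = 52.5 -> 53rd value
iqr = q3 - q1
print(data_range, q1, q3, iqr)         # 190 445 525 80
```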

Measures of Variability (Dispersion)

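• Deviation score: $d_i = x_i - \bar{x}$, where $d_i$ is the deviation score of observation i

• Example: ?

• Average deviation: $\frac{\sum (x_i - \bar{x})}{n}$. Because positive and negative deviations cancel out, this quantity is always zero, which is why the variance squares the deviations instead.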
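A small Python check of the point above, using the seven median-example observations as illustrative data: the signed deviation scores sum to zero (up to floating-point rounding), so their plain average carries no information about spread.

```python
sample = [26, 18, 27, 12, 14, 27, 19]
x_bar = sum(sample) / len(sample)

# Deviation score for each observation: d_i = x_i - x_bar
deviations = [x - x_bar for x in sample]

# Average of the signed deviations: positive and negative values cancel out
average_deviation = sum(deviations) / len(sample)

print([round(d, 2) for d in deviations])
print(abs(average_deviation) < 1e-9)   # True: zero apart from floating-point rounding
```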

Variance

• A measure of variability or dispersion; its square root is the standard deviation.

The variance is a measure of variability that utilizes all the data.

It is based on the difference between the value of each observation ($x_i$) and the mean ($\bar{x}$ for a sample, μ for a population).

Variance

The variance is computed as follows:

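The variance is the average of the squared differences between each data value and the mean (for a sample, the sum of squared differences is divided by n - 1 rather than n).

For a sample:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

For a population:

$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$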
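A minimal Python sketch of the two variance formulas, checked against the standard library's `statistics.variance` (sample, n - 1 divisor) and `statistics.pvariance` (population, N divisor). The function names and the data are illustrative choices.

```python
import statistics

def sample_variance(values):
    """s^2: sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def population_variance(values):
    """sigma^2: sum of squared deviations from the population mean, divided by N."""
    N = len(values)
    mu = sum(values) / N
    return sum((x - mu) ** 2 for x in values) / N

data = [26, 18, 27, 12, 14, 27, 19]
print(sample_variance(data), statistics.variance(data))
print(population_variance(data), statistics.pvariance(data))
```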

Standard deviation

• The square root of the variance of a distribution

The standard deviation of a data set is the positive square root of the variance.

It is measured in the same units as the data, making it more easily interpreted than the variance.

Standard Deviation

The standard deviation is computed as follows:

For a sample:

$$s = \sqrt{s^2}$$

For a population:

$$\sigma = \sqrt{\sigma^2}$$
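In Python, `statistics.stdev` and `statistics.pstdev` return the sample and population standard deviations. The coefficient of variation, listed among the dispersion measures but not defined in these slides, is computed here with the common definition s / x̄ × 100%; treat that definition as an assumption of this sketch.

```python
import statistics

data = [26, 18, 27, 12, 14, 27, 19]

s = statistics.stdev(data)       # sample SD: square root of s^2 (divisor n - 1)
sigma = statistics.pstdev(data)  # population SD: square root of sigma^2 (divisor N)

# Coefficient of variation (assumed common definition): SD relative to the mean, in percent
cv = 100 * s / statistics.mean(data)

print(round(s, 2), round(sigma, 2), round(cv, 1))
```

Because the SD is in the same units as the data, while the coefficient of variation is unit-free, the latter is handy for comparing the spread of variables measured on different scales.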

Normal distribution

• Normal distribution curve: a symmetrical, bell-shaped distribution that describes the expected probability of many chance occurrences.

• It is the expected distribution of the sample mean.

• Nearly all of the area under a normal curve (about 99.7%) lies within ±3 standard deviations of its mean.

standardized normal distribution

• A specific normal curve that:
– is symmetrical about its mean;
– has its highest point at the mean;
– has a total area (total probability of occurrence) equal to 1;
– has a mean of zero and an SD of 1.

• Standardized value: $Z = \frac{X - \mu}{\sigma}$ (the value to be transformed minus the mean, divided by the SD), where µ is the hypothesized or expected value of the mean.

standardized normal distribution

• Example: a shopkeeper knows from experience that his average sales level is 9,000 units (mean), with a standard deviation of 500 units. He expects sales to be between 7,500 and 9,625 units. What is the probability of this occurring?

Formula: $Z = \frac{X - \mu}{\sigma}$
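A Python sketch of the shopkeeper example, standardizing both limits with Z = (X - µ)/σ and reading areas from the standard normal distribution via `statistics.NormalDist` (Python 3.8+) instead of a printed Z table.

```python
from statistics import NormalDist

mean_sales, sd_sales = 9000, 500
low, high = 7500, 9625

# Standardize each limit: Z = (X - mu) / sigma
z_low = (low - mean_sales) / sd_sales     # -3.0
z_high = (high - mean_sales) / sd_sales   # 1.25

std_normal = NormalDist()  # standardized normal: mean 0, SD 1

# Area between the two Z values = probability that sales fall in the range
probability = std_normal.cdf(z_high) - std_normal.cdf(z_low)
print(z_low, z_high, round(probability, 4))
```

Equivalently, `NormalDist(9000, 500).cdf(9625) - NormalDist(9000, 500).cdf(7500)` skips the explicit standardization step.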

Additional types of distribution

• Population distribution
• Sample distribution
• Sampling distribution

Population distribution

• A frequency distribution of the elements of a population

• Its mean is µ and its SD is σ

Sample distribution

• A frequency distribution of the elements of a sample

• Its mean is the sample mean x̄ and its SD is represented by S

Sampling distribution of the sample mean

• The basis for understanding statistics
• A theoretical probability distribution of the means of all possible samples of a certain size drawn from a particular population
• In actual practice it would never be calculated
• Imagine a large number of samples, say 50,000, each having n elements drawn from a specified population
• Different people drawing different samples will not get the same mean
• The central limit theorem says that if the samples are large and drawn randomly, the sample means will be approximately normally distributed

Sampling distribution…

• It is the functional relationship between the possible values of some summary characteristic of n cases drawn at random and the probability associated with each value, over all possible samples of size n from a particular population

• The mean of the sampling distribution is called the expected value of the statistic

Sampling distribution…

• The standard deviation of the sampling distribution of x̄ is called the standard error of the mean

Three types of distribution

1. Population distribution: mean µ, SD σ
2. Sample distribution: mean x̄, SD S
3. Sampling distribution: mean $\mu_{\bar{x}} = \mu$, SD $S_{\bar{x}}$

Central-limit theorem

• The theory stating that as sample size increases, the distribution of the means of randomly selected samples of size n approaches a normal distribution

Central-limit theorem: example

• The variable is the number of rupees spent on books. Assume the population consists of six 20-year-old students in the Commerce Department (population size N = 6). Now calculate the mean.

S. No.   Student   Expenditure on books (rupees)
1        A         1.00
2        B         2.00
3        C         3.00
4        D         4.00
5        E         5.00
6        F         6.00

Samples (all 15 possible samples of size n = 2, drawn without replacement)

1, 2
1, 3   2, 3
1, 4   2, 4   3, 4
1, 5   2, 5   3, 5   4, 5
1, 6   2, 6   3, 6   4, 6   5, 6


Means of the samples and their frequency distribution

Sample   Sum (ΣX)   Sample mean (x̄ = ΣX/n)   Probability
1, 2     3.00       1.50                      1/15
1, 3     4.00       2.00                      1/15
1, 4     5.00       2.50                      1/15
1, 5     6.00       3.00                      1/15
1, 6     7.00       3.50                      1/15
2, 3     5.00       2.50                      1/15
2, 4     6.00       3.00                      1/15
2, 5     7.00       3.50                      1/15
2, 6     8.00       4.00                      1/15
3, 4     7.00       3.50                      1/15
3, 5     8.00       4.00                      1/15
3, 6     9.00       4.50                      1/15
4, 5     9.00       4.50                      1/15
4, 6     10.00      5.00                      1/15
5, 6     11.00      5.50                      1/15


Means of the samples and their frequency distribution

Sample mean   Frequency   Probability
1.50          1           1/15
2.00          1           1/15
2.50          2           2/15
3.00          2           2/15
3.50          3           3/15
4.00          2           2/15
4.50          2           2/15
5.00          1           1/15
5.50          1           1/15
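The whole sampling-distribution exercise can be reproduced in Python by enumerating every sample of size 2 with `itertools.combinations`. Exact fractions avoid rounding, and the last line shows that the mean of the sampling distribution (its expected value) equals the population mean. Variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Population: rupees spent on books by students A-F
population = [1, 2, 3, 4, 5, 6]
mu = Fraction(sum(population), len(population))   # population mean = 7/2

# Every possible sample of size n = 2, drawn without replacement (15 samples)
samples = list(combinations(population, 2))
sample_means = [Fraction(sum(s), len(s)) for s in samples]

# Sampling distribution: each sample is equally likely, so P = frequency / 15
freq = Counter(sample_means)
for m in sorted(freq):
    print(f"{float(m):.2f}  {freq[m]}  {freq[m]}/{len(samples)}")

# Expected value of the sample mean equals the population mean
expected_value = sum(sample_means) / len(sample_means)
print(float(mu), float(expected_value))   # 3.5 3.5
```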

Point estimates

• An estimate of the population mean in the form of a single value, usually the sample mean.

• Example: the mean of a large population is unknown, so it is estimated from a sample of, say, 300 people.

• A point estimate is rarely exactly equal to the population mean, so the confidence level is important.

Confidence interval and confidence level

• A confidence interval is a specified range of numbers within which the population mean is expected to lie. It is an interval estimate with an associated level of probability; it can also be viewed as the set of acceptable hypotheses about the population mean at that probability level.

• The confidence level is a percentage or decimal value that tells how confident a researcher can be about being correct. It states the long-run percentage of the time that a confidence interval will include the true population mean.

Confidence interval: example

• A manager thinks that age is a useful standard in placement. A sample of 100 people has a mean age of 37.5 years and an SD (S) of 12.00 years. We cannot expect the sample point estimate to equal the population mean age exactly, so we estimate an interval at a 95% confidence level. Follow the steps for the calculation.

Confidence interval: steps

1 Calculate the mean from the sample.

2 The population SD is unknown, so estimate it by finding S, i.e., the sample SD.

3 Estimate the standard error of the mean using the formula $S_{\bar{x}} = \frac{S}{\sqrt{n}}$.

4 Determine the Z value associated with the desired confidence level. Divide the confidence level by 2 to determine what percentage of the area under the curve must be included on each side of the mean.

5 Calculate the confidence interval.

Steps

• Step 1: Calculate the mean from the sample: x̄ = 37.5

• Step 2: The population SD is unknown, so estimate it by finding S, the sample SD: S = 12.00

• Step 3: Estimate the standard error of the mean: $S_{\bar{x}} = \frac{S}{\sqrt{n}} = \frac{12}{\sqrt{100}} = 1.2$

• Step 4: Determine the Z value associated with the desired confidence level. The confidence level is 95%, and half of it is 47.5% (an area of 0.4750 on each side of the mean). In the Z table (2), the Z value corresponding to this area is 1.96.

• Step 5: Calculate the confidence interval: $\mu = \bar{x} \pm E = \bar{x} \pm Z_{cl} S_{\bar{x}}$

• $\mu = 37.5 \pm (1.96)(1.2) = 37.5 \pm 2.352$, giving limits of 35.15 and 39.85 years
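The five steps above can be sketched in Python as one small function. The name `confidence_interval` and its default `z=1.96` (the 95% value used in the example) are choices made for this illustration.

```python
import math

def confidence_interval(x_bar, s, n, z=1.96):
    """Interval estimate x_bar +/- Z * S_xbar, where S_xbar = S / sqrt(n)."""
    standard_error = s / math.sqrt(n)   # step 3
    margin = z * standard_error         # E, the allowance for sampling error
    return x_bar - margin, x_bar + margin

low, high = confidence_interval(37.5, 12.0, 100)
print(round(low, 2), round(high, 2))   # 35.15 39.85
```

For other confidence levels, the Z value can be obtained from `statistics.NormalDist().inv_cdf(0.5 + level / 2)`; for `level = 0.95` this gives about 1.96.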


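Some basic formulas

• Population mean: $\mu = \frac{\sum x_i}{N}$

• Sample mean: $\bar{x} = \frac{\sum x_i}{n}$

• Deviation: $d_i = x_i - \bar{x}$

• Sample variance: $S^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$

• Population SD: $\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$

• Sample SD: $S = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$

• Standardized normal distribution: $Z = \frac{x - \mu}{\sigma}$

• Standard error of the sampling distribution of the mean: $S_{\bar{x}} = \frac{S}{\sqrt{n}}$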

Sample size: random error and sample size

• When the SD of the population is unknown, a confidence interval is calculated using the formula

Confidence interval = $\bar{x} \pm Z \frac{S}{\sqrt{n}}$

Factors in determining the sample size for questions involving means

• Heterogeneity (variability) of the population
• Magnitude of acceptable error
• Confidence level

Estimating sample size for questions involving means

• Steps:
– Estimate the SD of the population (e.g., from a pilot study or by sequential sampling)
– Make a judgment about the allowable magnitude of error
– Determine the confidence level

Rule of thumb for SD estimation

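• Expect the SD to be about 1/6 of the range (a normal distribution spans roughly ±3 SD around its mean).

• Example: if a researcher studying DVD purchases expects the price paid to range from $100 to $700, the SD would be about (700 - 100)/6 = 100 dollars.

• If we plan to use a 10-point purchase-intention scale, the SD would be about 10/6 = 1.67.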
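The rule of thumb reduces to a one-line function; the name `estimate_sd_from_range` is hypothetical. It is only a rough planning estimate for when no pilot data are available.

```python
def estimate_sd_from_range(expected_range):
    """Rough SD estimate: about one-sixth of the expected range (roughly +/-3 SD)."""
    return expected_range / 6

dvd_sd = estimate_sd_from_range(700 - 100)   # DVD price example: 100.0
scale_sd = estimate_sd_from_range(10)        # 10-point scale: about 1.67
print(dvd_sd, round(scale_sd, 2))            # 100.0 1.67
```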

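Sample size for estimating a population mean when the SD is known (or has been estimated)

$$n = \left(\frac{ZS}{E}\right)^2$$

where Z = standardized value for the chosen confidence level, S = sample SD or estimate of the population SD, and E = acceptable magnitude of error.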

Example: determine sample size

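A study of annual expenditure on soap requires a 95% confidence level (Z = 1.96) and a range of error (E) of less than 2 dollars. If the estimate of the SD is 29 dollars, the sample size is

$$n = \left(\frac{1.96 \times 29}{2}\right)^2 = (28.42)^2 \approx 807.7$$

so about 808 respondents are needed.

If the range of error (E) is 4 dollars, then

$$n = \left(\frac{1.96 \times 29}{4}\right)^2 = (14.21)^2 \approx 201.9$$

or about 202 respondents. Doubling the acceptable error cuts the required sample size to roughly one-quarter.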
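The sample-size formula as a small Python function. Rounding up with `math.ceil` is an assumption of this sketch (the usual practice, since rounding down would leave the error slightly above E); the function name is hypothetical.

```python
import math

def sample_size_for_mean(z, s, e):
    """n = (Z * S / E)^2, rounded up to the next whole respondent."""
    return math.ceil((z * s / e) ** 2)

print(sample_size_for_mean(1.96, 29, 2))  # 808
print(sample_size_for_mean(1.96, 29, 4))  # 202
```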

Thank you