
Statistics

Date post: 07-Feb-2016
Upload: ashton
Page 1: Statistics
Page 2: Statistics

Introduction to Statistics

Statistic [stuh-tis-tik], noun. A numerical fact or datum, especially one computed from a sample.

Page 3: Statistics

How Long is a Meter Stick?

Expected value: 1 m
Measured values: 1.01 m, 0.98 m, 1.03 m

• How do we decide which of these measured values is correct?

• How do we discuss the variation in our measurements?

Page 4: Statistics

Mean
The arithmetic average

$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
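As a small illustration of the mean, the sketch below averages the three meter stick measurements from the previous slide. It is written in Python, which the slides themselves do not use, and the names `measurements` and `mean` are chosen here for illustration only.

```python
# Measured lengths of the meter stick, in meters (values from the slides).
measurements = [1.01, 0.98, 1.03]

def mean(values):
    """Arithmetic average: the sum of the values divided by their count."""
    return sum(values) / len(values)

x_bar = mean(measurements)
print(f"mean = {x_bar:.4f} m")
```

The average comes out slightly above the expected 1 m, which is the kind of difference the later slides on error and standard error try to make sense of.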

Page 5: Statistics

Propagation of Uncertainty

Accuracy
Sources of inaccuracy:
• Broken measurement device
• Parallax
• Systematic error?

Precision
Sources of imprecision:
• Multiple measurement methods
• Random error?

Low bias, high variability

High bias, low variability

Page 6: Statistics

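Variance and Standard Deviation

Variance:

$$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_n - \bar{x})^2}{n} = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

Standard Deviation: measures how far observations are from the mean

$$s = \sqrt{s^2}$$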
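A minimal Python sketch of the variance and standard deviation follows, applied to the meter stick measurements. It divides by n, which matches the worked example later in the slides (eight values, divided by 8). Many references divide by n − 1 when estimating from a sample, so treat the choice here as an assumption. The function names are illustrative and not from the slides.

```python
import math
import statistics

def variance(values):
    """Mean squared deviation from the mean, dividing by n."""
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values) / len(values)

def standard_deviation(values):
    """Square root of the variance."""
    return math.sqrt(variance(values))

measurements = [1.01, 0.98, 1.03]  # meters, from the slides
s = standard_deviation(measurements)
print(f"variance = {variance(measurements):.6f} m^2")
print(f"standard deviation = {s:.4f} m")

# The standard library's population function uses the same divide-by-n rule.
assert math.isclose(s, statistics.pstdev(measurements))
```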

Page 7: Statistics

Error
Error is the difference between the measured and expected value

Error is how we make sense of differences between two measurements that should be the same

Error is NOT mistakes! If you made a mistake, do it again.

Page 8: Statistics

Types of Error Descriptions
For a true mean, µ, and standard deviation, σ, the sample mean has an uncertainty of the standard deviation over the square root of the number of samples.

$$\sigma_{\bar{x}} = \frac{s}{\sqrt{N}}$$

Gives a measure of reliability of the mean.

Sample standard error tells you how close your sample mean should be to the true mean.
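The standard error can be sketched in Python as below, using the meter stick measurements. The slides do not say which form of the standard deviation to use. This sketch assumes the divide-by-n form (`statistics.pstdev`) to stay consistent with the deck's worked example, and `standard_error` is a name chosen here.

```python
import math
import statistics

def standard_error(values):
    """Standard deviation of the values divided by sqrt(number of values)."""
    s = statistics.pstdev(values)  # divide-by-n standard deviation (assumed)
    return s / math.sqrt(len(values))

measurements = [1.01, 0.98, 1.03]  # meters, from the slides
x_bar = statistics.mean(measurements)
sem = standard_error(measurements)
print(f"mean = {x_bar:.3f} m, standard error = {sem:.3f} m")
```

Because of the square root, four times as many measurements are needed to halve the standard error.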

Page 9: Statistics

Using the Standard Error

Expected value inside $\bar{x} \pm \sigma_{\bar{x}}$: confirmed

Expected value outside $\bar{x} \pm \sigma_{\bar{x}}$: not confirmed

This is the simplest way of using data to confirm or refute a hypothesis.

This is also what is used to create the error bars.

[Figure: an error bar centered on $\bar{x}$]
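A rough Python sketch of this check follows. It assumes the rule on this slide is that the expected value should fall within one standard error of the sample mean. The function name `is_confirmed` and the divide-by-n standard deviation are choices made here, not taken from the slides.

```python
import math
import statistics

def is_confirmed(values, expected):
    """True if `expected` lies within one standard error of the sample mean."""
    x_bar = statistics.mean(values)
    sem = statistics.pstdev(values) / math.sqrt(len(values))
    return abs(x_bar - expected) <= sem

measurements = [1.01, 0.98, 1.03]  # meters, from the slides
print(is_confirmed(measurements, expected=1.0))
```

For the meter stick data the expected 1 m sits inside the error bar, so under this assumed rule the measurements are consistent with a one-meter stick.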

Page 10: Statistics

Density Curve

Low values of the standard deviation indicate a small spread (values close to the mean)

High values of the standard deviation indicate a large spread (values far from the mean)

Page 11: Statistics

Normal Distribution
• Particularly important class of density curve
• Symmetric, unimodal, bell-shaped
• Mean, μ, is at the center of the curve
• Probabilities are the area under the curve
• Total area = 1

Page 12: Statistics

The Empirical Rule

In a normal distribution with mean μ and standard deviation of σ:

• 68% of observations fall within 1 σ of the mean

• 95% of observations fall within 2 σ of the mean

• 99.7% of observations fall within 3 σ of the mean
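These three percentages can be checked numerically. The sketch below uses the standard identity that, for a normal distribution, the probability of falling within k standard deviations of the mean is erf(k/√2). The function name `prob_within` is chosen here for illustration.

```python
import math

def prob_within(k):
    """P(|X - mu| < k * sigma) for a normal distribution."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sigma: {prob_within(k):.4f}")
```

The results round to the 68%, 95% and 99.7% quoted in the rule.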


Page 13: Statistics

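Example with data
Set of values: 2, 4, 4, 4, 5, 5, 7, 9

Mean:

$$\text{mean} = \frac{\sum \text{measurements}}{\#\ \text{samples}} = \frac{2 + 4 + 4 + 4 + 5 + 5 + 7 + 9}{8} = 5$$

Standard Deviation:

$$s = \sqrt{\frac{3^2 + 1^2 + 1^2 + 1^2 + 0^2 + 0^2 + 2^2 + 4^2}{8}} = 2$$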
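The arithmetic in this example can be checked with Python's standard `statistics` module, as sketched below. The slide divides by the number of values (8), which corresponds to `pstdev`. The n − 1 form (`stdev`) is shown only for comparison and is not part of the slide.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)              # 40 / 8
sd_population = statistics.pstdev(data)   # divides by n, as in the slide
sd_sample = statistics.stdev(data)        # divides by n - 1, for comparison

print(f"mean = {mean}")
print(f"standard deviation (divide by n) = {sd_population:.3f}")
print(f"standard deviation (divide by n - 1) = {sd_sample:.3f}")
```

The two standard deviations differ noticeably for only eight values. The gap shrinks as the number of values grows.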

Page 14: Statistics

Data Distribution

[Figure: distribution of the example data, with the axis marked at 5 and at 5±2, 5±4 and 5±6]

Page 15: Statistics

Confidence Interval

[Figure: confidence interval on the same distribution, with the axis marked at 5 and at 5±2, 5±4 and 5±6]

Page 16: Statistics

Central Limit Theorem
If X follows a normal distribution with mean μ and standard deviation σ, then x̄ is also normally distributed with mean μ and standard deviation σ/√n.

What if X is not normally distributed?

When sampling from any population with mean μ and standard deviation σ, when n is large, the sampling distribution of x̄ is approximately normal:

$$P(\bar{x}) = \frac{1}{\sigma_{\bar{x}}\sqrt{2\pi}}\, e^{-(\bar{x} - \mu)^2 / (2\sigma_{\bar{x}}^2)}, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

As the number of measurements increases, the distribution of their mean approaches a normal distribution (Gaussian).

Visit this webpage to play with the numbers:
http://www.intuitor.com/statistics/CLAppClasses/CentLimApplet.htm
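The theorem can also be explored with a short simulation. The sketch below is an illustration, not part of the original deck: it rolls a fair die, whose distribution is flat rather than normal, takes the mean of n rolls many times, and compares the center and spread of those means with μ and σ/√n. The seed, sample size and number of trials are arbitrary choices.

```python
import math
import random
import statistics

MU = 3.5                     # mean of one fair die roll
SIGMA = math.sqrt(35 / 12)   # standard deviation of one fair die roll

def sample_means(n, trials, rng):
    """Means of `trials` samples, each made of `n` fair die rolls."""
    return [sum(rng.randint(1, 6) for _ in range(n)) / n for _ in range(trials)]

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 30
means = sample_means(n, trials=5000, rng=rng)

print(f"mean of sample means:   {statistics.mean(means):.3f} (theory {MU})")
print(f"spread of sample means: {statistics.pstdev(means):.3f} "
      f"(theory {SIGMA / math.sqrt(n):.3f})")
```

Plotting a histogram of `means` would show the bell shape, even though single die rolls are uniformly distributed.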

Page 17: Statistics

Applications

Simulated examples: dice rolling, coin flipping, etc.

Exit polling

Page 18: Statistics

Non-normal Distributions

Page 19: Statistics

Central Limit Theorem Summary

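For samples of large size N, the distribution of the sample mean values will be:

$$P(\bar{x}) = \frac{1}{\sigma_{\bar{x}}\sqrt{2\pi}}\, e^{-(\bar{x} - \mu)^2 / (2\sigma_{\bar{x}}^2)}, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}}$$

which is a normal distribution.

The normal distribution given by the CLT is independent of the type of distribution of the data.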

Page 20: Statistics

Where else would this become problematic?

Where can it still be used, but issues should be considered?

Page 21: Statistics

Questions?

Page 22: Statistics

Effective Statistics

You might have a strong association, but how do you prove causation (that x causes y)?

Good evidence for causation: a well-designed experiment where all other variables that cause changes in the response variable are controlled

Page 23: Statistics

The Scientific/Statistical Process

1. Formulate a scientific question
2. Decide on the population you are interested in
3. Select a sample
4. Observational study or experiment?
5. Collect data
6. Analyze data
7. State your conclusion

Page 24: Statistics

Ways to collect information from sample
• Anecdotal evidence
• Available data
• Observational study
• Experiment

Page 25: Statistics

Sampling and Inference

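[Figure: a population (parameters μ, σ) and a sample (statistic s). Sampling leads from the population to the sample; inference leads from the sample back to the population.]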

Page 26: Statistics

Some Cautions
• Statistics cannot account for poor experimental design
• There is no sharp border between “significant” and “non-significant” correlation, only increasing and decreasing evidence
• Lack of significance may be due to a poorly designed experiment

Page 27: Statistics

Fit Tests
t-test, z-test, and χ² test

Page 28: Statistics

z-Test

Page 29: Statistics

z-test

• All normal distributions are the same if we standardize our data:
  • Units of size σ
  • Mean μ as center

• If x is an observation from a normal distribution, the standardized value of x is called the z-score

• Z-scores tell how many standard deviations away from the mean an observation is

Page 30: Statistics

z-test procedure

• To use: find the mean, standard deviation, and standard error

• Use these statistics along with the observed value to find the z value

• Consult the z-score table to find P(Z) for the determined z

Equation for hypothesis testing:

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
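The procedure can be sketched in Python as below. `NormalDist` from the standard library stands in for the z-score table. The two-sided probability is one common way to read the result and is an assumption here, since the slides do not say which tail to use. The input numbers are hypothetical, chosen only to give a round z value.

```python
import math
from statistics import NormalDist

def z_statistic(sample_mean, mu, sigma, n):
    """z = (sample mean - mu) / (sigma / sqrt(n))."""
    return (sample_mean - mu) / (sigma / math.sqrt(n))

def two_sided_p(z):
    """Probability of a standard normal value at least as far from 0 as z."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical numbers, for illustration only.
z = z_statistic(sample_mean=10.3, mu=10.0, sigma=0.9, n=36)
print(f"z = {z:.2f}, two-sided p = {two_sided_p(z):.4f}")
```

For a single observation rather than a sample mean, set n = 1 and the formula reduces to the z-score (x − μ)/σ.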

Page 31: Statistics

Example
Jacob scores 16 on the ACT. Emily scores 670 on the SAT. Assuming that both tests measure scholastic aptitude, who has the higher score? The SAT scores for 1.4 million students in a recent graduating class were roughly normal with a mean of 1026 and standard deviation of 209. The ACT scores for more than 1 million students in the same class were roughly normal with mean of 20.8 and standard deviation of 4.8.

Page 32: Statistics

Example Continued

Jacob – ACT
Score: 16
Mean: 20.8
Standard Dev.: 4.8

Emily – SAT
Score: 670
Mean: 1026
Standard Dev.: 209
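One way to compare the two scores is to standardize each against its own test, as in the Python sketch below. Each student is a single observation, so the z-score is simply (score − mean) / standard deviation. The function name `z_score` is chosen here.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

z_jacob = z_score(16, mean=20.8, sd=4.8)    # ACT
z_emily = z_score(670, mean=1026, sd=209)   # SAT

print(f"Jacob: z = {z_jacob:.2f}")
print(f"Emily: z = {z_emily:.2f}")
print("Higher standardized score:", "Jacob" if z_jacob > z_emily else "Emily")
```

Both scores are below their test's average. Jacob is about 1 standard deviation below and Emily about 1.7, so on this comparison Jacob has the higher score.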

Page 33: Statistics

Interpreting Results

Page 34: Statistics

“Backwards” z-test
What if we are given a probability P(Z) and we are interested in finding the observed value corresponding to that probability?
• Find the z-score
• Set up the probability (could be two-sided): P(−z0 < Z < z0) equal to the given probability
• Convert the score to x̄ by

$$\bar{x} = \mu + z \cdot \frac{\sigma}{\sqrt{n}}$$
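A hedged Python sketch of the backwards calculation follows. `NormalDist().inv_cdf` plays the role of reading the z table in reverse. The question asked in the example (which SAT score marks the top 10%) is made up here for illustration, reusing the SAT mean and standard deviation from the earlier slide.

```python
import math
from statistics import NormalDist

def value_from_probability(p, mu, sigma, n=1):
    """Value x with P(X < x) = p, using x = mu + z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(p)  # z-score for a one-sided probability p
    return mu + z * sigma / math.sqrt(n)

def two_sided_z(p):
    """z0 such that P(-z0 < Z < z0) = p."""
    return NormalDist().inv_cdf((1 + p) / 2)

# Illustrative question: which SAT score has 90% of students below it?
print(round(value_from_probability(0.90, mu=1026, sigma=209)))
print(round(two_sided_z(0.95), 2))
```

The two-sided helper splits the leftover probability equally between the two tails, which is why it looks up (1 + p) / 2 rather than p.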

Page 35: Statistics

t Tests

Page 36: Statistics

Necessary assumptions for t-Test

1. Population is normally distributed.
2. Sample is randomly selected from the unknown population.
3. Standard deviation of the unknown population is the same as the known population.

So, we can take the sample standard deviation as an estimate of the standard deviation of the known population.

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$
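The t statistic can be sketched in Python as below. This sketch assumes the n − 1 form of the sample standard deviation (`statistics.stdev`), which is the usual convention for t procedures. The data set is the one from the earlier example slide, and μ = 4 is a hypothetical value to test against. Turning t into a probability needs a t table or a statistics library and is not shown.

```python
import math
import statistics

def t_statistic(sample, mu):
    """t = (sample mean - mu) / (s / sqrt(n)), with s from the sample."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # n - 1 form (assumed convention)
    return (x_bar - mu) / (s / math.sqrt(n))

# Data set from the earlier slide; mu = 4 is a hypothetical value to test.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
t = t_statistic(sample, mu=4)
print(f"t = {t:.3f} with {len(sample) - 1} degrees of freedom")
```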

Page 37: Statistics
Page 38: Statistics

[Figure: “T Test Accumulating Data (N) Progressively”. Probability that fish populations are the same average length in each lake. x-axis: # of samples included in analysis from each lake (1 to 82). y-axis: probability that the fish are the same in both lakes (0 to 1).]

This is typical of the kind of data many of you may generate. Let’s take a quick look at how this t-test is calculated from the data, using Excel.

Page 39: Statistics

z versus t procedures
• Use z procedures if you know the population standard deviation
• Use t procedures if you don’t know the population standard deviation
• Usually we don’t know the population standard deviation, unless told otherwise
• Central Limit Theorem

Page 40: Statistics

χ²-test (pronounced “kai”)

Page 41: Statistics

χ²-test (Goodness-of-fit) User’s Guide
• χ²-test tells us whether distributions of categorical variables differ from one another
• Can use to determine if your data conforms to a functional fit.
• Compares multiple means to multiple expected values.
• Can only use when you have multiple data sets that cannot be combined into one mean.
• Use when comparing means to expected values.

Page 42: Statistics

χ²-test

$$\chi^2 = \sum_{i=1}^{d} \frac{(X_i - \mu_i)^2}{(\Delta X_i)^2}$$

Xi is each individual mean
µi is each expected value
ΔXi = uncertainty in Xi
d = # of mean values

• χ²/d table gives probability that data matches expected values.
• In χ²/d, d is count of independent measurements.
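A minimal Python sketch of the χ² sum follows. The three means, expected values and uncertainties are made-up numbers chosen only to show the calculation; they are not data from the slides. Looking up the resulting χ²/d in a table, as the slide describes, is left out.

```python
def chi_squared(means, expected, uncertainties):
    """Sum over i of (X_i - mu_i)^2 / (delta X_i)^2."""
    return sum(
        ((x - mu) / dx) ** 2
        for x, mu, dx in zip(means, expected, uncertainties)
    )

# Made-up numbers, for illustration only (not data from the slides).
means = [10.2, 19.5, 30.9]
expected = [10.0, 20.0, 30.0]
uncertainties = [0.4, 0.5, 0.6]

chi2 = chi_squared(means, expected, uncertainties)
d = len(means)
print(f"chi^2 = {chi2:.2f}, chi^2/d = {chi2 / d:.2f}")
```

Each term measures how many uncertainties a mean lies from its expected value, so a χ²/d near 1 means the means typically miss by about one uncertainty.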

Page 43: Statistics

χ² (Goodness-of-fit) Test Procedure
1. Find averages and uncertainty for each average.
2. Calculate χ² using averages, uncertainties, and expected values.
3. Count number of independent variables.
4. Use table to find probability of fit accuracy based on χ²/d and number of independent variables (d).

Page 44: Statistics

Example
• Launch a bottle rocket with several different volumes of water.

• Measure height of flight multiple times for each volume.

• You decide you have a fit of:

$$y = 0.204\,(\mathrm{m/ml})\,V - 10^{-4}\,(\mathrm{m/ml^2})\,V^2$$

• Plot of fit with data on left.

Page 45: Statistics

Example

7 degrees of freedom

Probability of fit ≈50%

50% of the time, chance alone could produce a larger χ2 value.

No reason to reject fit.

• This does not mean that other fits might not match the data better, so try other fits and see which one is closest.

Page 46: Statistics

Interpreting Results
Probability is how similar data is to expected value.

Large P means data is similar to expected value.

Small P means data is different from expected value.

Page 47: Statistics

Summary

Propagation of uncertainty
• Mean
• Accuracy vs. Precision
• Error
• Standard deviation
• Central Limit Theorem

Fit Tests
• z-test
• t-test
• χ²-test
