Introduction to Basic Statistics

Date post: 03-Jan-2016
Upload: teagan-miller
Transcript
Page 1: Introduction to Basic Statistics

Introduction to Basic Statistics

Page 2: Introduction to Basic Statistics

Mean

$\bar{x} = \frac{\sum x}{n}$ (pronounced “X-bar”)

The mean is the sum of the values of a set of data divided by the number of values in that data set.

Page 3: Introduction to Basic Statistics

Mean

$\bar{x} = \frac{\sum x}{n}$

x = individual data value

n = number of data values in the data set

$\sum$ = summation of a set of values

Page 4: Introduction to Basic Statistics

Mean

Data Set: 3 7 12 17 21 21 23 27 32 36 44

Sum of the values = 243

Number of values = 11

$\bar{x} = \frac{\sum x}{n} = \frac{243}{11} \approx 22.09$
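
The same arithmetic can be sketched in a few lines of Python. This is only an illustration of the formula for the mean; the variable names are chosen here and do not come from the slides.

```python
# Data set from the slide
data = [3, 7, 12, 17, 21, 21, 23, 27, 32, 36, 44]

total = sum(data)   # sum of the values
n = len(data)       # number of values in the data set
mean = total / n    # mean = sum of the values / number of values

print(f"Sum = {total}, n = {n}, mean = {mean:.2f}")
# prints: Sum = 243, n = 11, mean = 22.09
```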

Page 5: Introduction to Basic Statistics

Mode

The most frequently occurring value in a set of data is the mode.

Symbol: M

Data Set: 27 17 12 7 21 44 23 3 36 32 21

Page 6: Introduction to Basic Statistics

Mode

The most frequently occurring value in a set of data is the mode.

Data Set: 3 7 12 17 21 21 23 27 32 36 44

Mode = 21

Page 7: Introduction to Basic Statistics

Mode

The most frequently occurring value in a set of data is the mode.

Note: If two values tie for the highest frequency, the data set is “bimodal.” If more than two values tie for the highest frequency, the data set is “multi-modal.”
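
As a rough sketch, Python's standard `statistics.multimode` function returns every value that ties for the highest frequency, which covers both the single-mode and the bimodal case. The second data set below is hypothetical and is included only to show a tie.

```python
from statistics import multimode

# Data set from the slides (unsorted order is fine for the mode)
data = [27, 17, 12, 7, 21, 44, 23, 3, 36, 32, 21]
modes = multimode(data)
print(modes)  # [21]

# Hypothetical data set (not from the slides) with two values tied
# for the highest frequency, i.e. a bimodal data set
two_peaks = [1, 1, 2, 3, 3]
tied_modes = multimode(two_peaks)
print(tied_modes)  # [1, 3]
```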

Page 8: Introduction to Basic Statistics

Median

The median is the value that occurs in the middle of a set of data that has been arranged in numerical order.

Symbol: $\tilde{x}$ (pronounced “X-tilde”)

Page 9: Introduction to Basic Statistics

Median

The median is the value that occurs in the middle of a set of data that has been arranged in numerical order.

Data Set: 27 17 12 7 21 44 23 3 36 32 21

Median = 21

Page 10: Introduction to Basic Statistics

Median

Note: For a data set that contains an odd number of values, the median is the single middle value. For an even number of values, the two middle values are averaged, and the result is the median.

Data Set: 3 7 12 17 21 21 23 27 32 36 44

Median = 21
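
A minimal Python sketch of both cases follows. The odd-length data set is the one from the slides; the even-length data set is hypothetical and is there only to show the averaging of the two middle values.

```python
from statistics import median

# Data set from the slides, in its original unsorted order
data = [27, 17, 12, 7, 21, 44, 23, 3, 36, 32, 21]

# Arrange the values in numerical order and pick the middle one
ordered = sorted(data)
middle_value = ordered[len(ordered) // 2]
print(middle_value)   # 21

# The standard library sorts internally and gives the same answer
odd_median = median(data)
print(odd_median)     # 21

# Hypothetical even-length data set: the two middle values (7 and 12)
# are averaged
even_median = median([3, 7, 12, 17])
print(even_median)    # 9.5
```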

Page 11: Introduction to Basic Statistics

Range

The range is the difference between the largest and smallest values that occur in a set of data.

Symbol: R

Data Set: 3 7 12 17 21 21 23 27 32 36 44

Range = 44 - 3 = 41
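
In code, the range is just the largest value minus the smallest. A short Python sketch (the variable name `data_range` is chosen here to avoid shadowing the built-in `range`):

```python
# Data set from the slide
data = [3, 7, 12, 17, 21, 21, 23, 27, 32, 36, 44]

# Range = largest value - smallest value
data_range = max(data) - min(data)
print(data_range)  # 41
```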

Page 12: Introduction to Basic Statistics

Standard Deviation

Page 13: Introduction to Basic Statistics

Two classes took a recent quiz. There were 10 students in each class, and each class had an average score of 81.5.

Page 14: Introduction to Basic Statistics

Since the averages are the same, can we assume that the students in both classes all did pretty much the same on the quiz?

Page 15: Introduction to Basic Statistics

The answer is… No.

The average (mean) does not tell us anything about the distribution or variation in the grades.

Page 16: Introduction to Basic Statistics

Here are Dot-Plots of the grades in each class:

Page 17: Introduction to Basic Statistics

Mean

Page 18: Introduction to Basic Statistics

So, we need to come up with some way of measuring not just the average, but also the spread of the distribution of our data.

Page 19: Introduction to Basic Statistics

Why not just give an average and the range of data (the highest and lowest values) to describe the distribution of the data?

Page 20: Introduction to Basic Statistics

Well, for example, let's say that for a set of data, the average is 17.95 and the range is 23.

But what if the data looked like this:

Page 21: Introduction to Basic Statistics

Here is the average

And here is the range

But really, most of the numbers are in this area, and are not evenly distributed throughout the range.

Page 22: Introduction to Basic Statistics

The Standard Deviation is a number that measures how far away each number in a set of data is from their mean.

Page 23: Introduction to Basic Statistics

If the Standard Deviation is large, it means the numbers are spread out from their mean.

If the Standard Deviation is small, it means the numbers are close to their mean.

Page 24: Introduction to Basic Statistics

Here are the scores on the math quiz for Team A:

72 76 80 80 81 83 84 85 85 89

Average: 81.5

Page 25: Introduction to Basic Statistics

The Standard Deviation measures how far away each number in a set of data is from their mean.

For example, start with the lowest score, 72. How far away is 72 from the mean of 81.5?

72 - 81.5 = -9.5

Page 26: Introduction to Basic Statistics

Or, start with the highest score, 89. How far away is 89 from the mean of 81.5?

89 - 81.5 = 7.5

Page 27: Introduction to Basic Statistics

So, the first step to finding the Standard Deviation is to find all the distances from the mean.

Scores: 72 76 80 80 81 83 84 85 85 89

Distance from Mean (first and last scores): 72 gives -9.5, 89 gives 7.5

Page 28: Introduction to Basic Statistics

So, the first step to finding the Standard Deviation is to find all the distances from the mean.

Score   Distance from Mean
72      -9.5
76      -5.5
80      -1.5
80      -1.5
81      -0.5
83       1.5
84       2.5
85       3.5
85       3.5
89       7.5

Page 29: Introduction to Basic Statistics

Next, you need to square each of the distances to turn them all into positive numbers.

Distances from Mean: -9.5 -5.5 -1.5 -1.5 -0.5 1.5 2.5 3.5 3.5 7.5

Distances Squared (first two scores): (-9.5)² = 90.25, (-5.5)² = 30.25

Page 30: Introduction to Basic Statistics

Next, you need to square each of the distances to turn them all into positive numbers.

Score   Distance from Mean   Distance Squared
72      -9.5                 90.25
76      -5.5                 30.25
80      -1.5                  2.25
80      -1.5                  2.25
81      -0.5                  0.25
83       1.5                  2.25
84       2.5                  6.25
85       3.5                 12.25
85       3.5                 12.25
89       7.5                 56.25

Page 31: Introduction to Basic Statistics

Add up all of the squared distances.

Distances Squared: 90.25 30.25 2.25 2.25 0.25 2.25 6.25 12.25 12.25 56.25

Sum: 214.5

Page 32: Introduction to Basic Statistics

Divide by (n - 1), where n represents the number of values you have.

Sum of the squared distances: 214.5

214.5 / (10 - 1) = 23.8

Page 33: Introduction to Basic Statistics

Finally, take the square root of the average squared distance.

214.5 / (10 - 1) = 23.8

$\sqrt{23.8} \approx 4.88$

Page 34: Introduction to Basic Statistics

This is the Standard Deviation.

Sum of the squared distances: 214.5

214.5 / (10 - 1) = 23.8

$\sqrt{23.8} \approx 4.88$

Standard Deviation = 4.88
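
The whole procedure for Team A can be sketched step by step in Python. The variable names are illustrative only; the steps mirror the slides: distances from the mean, squares, sum, division by (n - 1), and square root.

```python
import math

# Team A quiz scores from the slides
scores = [72, 76, 80, 80, 81, 83, 84, 85, 85, 89]

n = len(scores)
mean = sum(scores) / n                         # 81.5

# Step 1: distance of each score from the mean
distances = [x - mean for x in scores]

# Step 2: square each distance so that every term is positive
squared = [d ** 2 for d in distances]

# Step 3: add up the squared distances
sum_of_squares = sum(squared)                  # 214.5

# Step 4: divide by (n - 1)
variance = sum_of_squares / (n - 1)            # about 23.8

# Step 5: take the square root
std_dev = math.sqrt(variance)                  # about 4.88

print(f"sum of squares = {sum_of_squares}")
print(f"variance = {variance:.1f}, standard deviation = {std_dev:.2f}")
```

Dividing by (n - 1) rather than n gives what is usually called the sample standard deviation, which is the version the slides compute.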

Page 35: Introduction to Basic Statistics

Now find the Standard Deviation for the other class's grades.

Score   Distance from Mean   Distance Squared
57      -24.5                600.25
65      -16.5                272.25
83        1.5                  2.25
94       12.5                156.25
95       13.5                182.25
96       14.5                210.25
98       16.5                272.25
93       11.5                132.25
71      -10.5                110.25
63      -18.5                342.25

Sum: 2280.5

2280.5 / (10 - 1) = 253.4

$\sqrt{253.4} \approx 15.92$

Page 36: Introduction to Basic Statistics

Now, let's compare the two classes again.

                      Team A   Team B
Average on the Quiz   81.5     81.5
Standard Deviation    4.88     15.92
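
As a quick cross-check, Python's standard `statistics` module can reproduce this comparison. The sketch below assumes the two score lists from the earlier slides; `statistics.stdev` divides by (n - 1), matching the procedure used above. Computed this way, Team B's standard deviation is about 15.918.

```python
from statistics import mean, stdev

# Quiz scores for both classes, taken from the earlier slides
team_a = [72, 76, 80, 80, 81, 83, 84, 85, 85, 89]
team_b = [57, 65, 83, 94, 95, 96, 98, 93, 71, 63]

for name, scores in (("Team A", team_a), ("Team B", team_b)):
    # Same mean for both teams, very different spread
    print(f"{name}: mean = {mean(scores):.1f}, std dev = {stdev(scores):.2f}")
```

The identical means and very different standard deviations are exactly the point of the comparison: the average alone hides how widely Team B's scores are spread.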

Page 37: Introduction to Basic Statistics

A histogram is a common data distribution graph that is used to show the frequency with which specific values, or values within ranges, occur in a set of data.

A forensic engineer might use a histogram to show the most common, or average, dimension that exists among a group of identical manufactured parts.

Histogram

Page 38: Introduction to Basic Statistics

[Figure: data values stacked above a number line running from -6 to 6]

Data values: 0, 3, -1, 3, 2, -1, -1, 1, 2, -3, 0, 1, 0, 1, -2, 1, 2, -4, -1, 1, 0, -2, 0, 0

Page 39: Introduction to Basic Statistics

Specific values, called data elements, are plotted along the X-axis of the graph.

Histogram

[Figure: histogram X-axis labeled “Data Elements”, running from -6 to 6]

Page 40: Introduction to Basic Statistics

Large sets of data are often divided into a limited number of groups. These groups are called class intervals.

Histogram

Class Intervals: -16 to -6, -5 to 5, 6 to 16

Page 41: Introduction to Basic Statistics

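Histogram

The number of data elements in each class interval is shown by the frequency, which is plotted along the Y-axis of the graph.

[Figure: histogram with Frequency on the Y-axis (ticks at 1, 3, 5, 7) and the class intervals -16 to -6, -5 to 5, and 6 to 16 on the X-axis]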
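
The counting behind a histogram can be sketched in Python. The 24 values below are the ones listed on the earlier data slide; the class intervals in the second half are hypothetical (chosen here so that each interval holds some data) and are not the intervals shown on the slides.

```python
from collections import Counter

# Data values from the earlier slide
values = [0, 3, -1, 3, 2, -1, -1, 1, 2, -3, 0, 1,
          0, 1, -2, 1, 2, -4, -1, 1, 0, -2, 0, 0]

# Frequency of each individual data element
frequency = Counter(values)

# Text histogram: one row per data element, one star per occurrence
for x in range(min(values), max(values) + 1):
    print(f"{x:>3} | {'*' * frequency[x]}")

# Hypothetical class intervals (inclusive bounds), for illustration only
intervals = [(-4, -3), (-2, -1), (0, 1), (2, 3)]
interval_counts = {
    (low, high): sum(1 for v in values if low <= v <= high)
    for low, high in intervals
}
for (low, high), count in interval_counts.items():
    print(f"{low} to {high}: {count}")
```

Grouping into class intervals trades detail for readability: the per-value counts show the exact shape, while the interval counts summarize it in fewer bars.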

Page 42: Introduction to Basic Statistics

“Is the data normal?”

Translation…does the greatest frequency of the data values occur about the mean value?

Normal Distribution

Page 43: Introduction to Basic Statistics

Normal Distribution

[Figure: histogram of Frequency (Y-axis) against Data Elements (X-axis, -6 to 6), with the mean value marked]

Page 44: Introduction to Basic Statistics

“Is the process normal?”

Further Translation…does the data form a bell-shaped curve when plotted on a histogram?

Normal Distribution

Page 45: Introduction to Basic Statistics

Normal Distribution

[Figure: histogram of Frequency (Y-axis) against Data Elements (X-axis, -6 to 6)]

Page 46: Introduction to Basic Statistics
Page 47: Introduction to Basic Statistics

Basic Biostat 5: Probability Concepts 47

Chapter 5: Probability Concepts

Page 48: Introduction to Basic Statistics

In Chapter 5:

5.1 What is Probability?

5.2 Types of Random Variables

5.3 Discrete Random Variables

5.4 Continuous Random Variables

5.5 More Rules and Properties of Probability

Page 49: Introduction to Basic Statistics

Definitions

• Random variable ≡ a numerical quantity that takes on different values depending on chance

• Population ≡ the set of all possible values for a random variable

• Event ≡ an outcome or set of outcomes

• Probability ≡ the relative frequency of an event in the population; alternatively, the proportion of times an event is expected to occur in the long run

Page 50: Introduction to Basic Statistics

Example

• In a given year: 42,636 traffic fatalities (events) in a population of N = 293,655,000

• Random sample population

• Probability of event = relative frequency in population = 42,636 / 293,655,000 = 0.0001452 ≈ 1 in 6887
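
The relative-frequency calculation above can be reproduced with a short Python sketch. The two counts are the ones given in the example; the variable names are illustrative.

```python
# Figures from the example
events = 42_636             # traffic fatalities in the year
population = 293_655_000    # population size N

# Probability as relative frequency of the event in the population
probability = events / population

# Express the same probability as "1 in k"
one_in = round(1 / probability)

print(f"{probability:.7f}")   # 0.0001452
print(f"1 in {one_in}")       # 1 in 6887
```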

Page 51: Introduction to Basic Statistics

Example: Probability

• Assume 20% of the population has a condition

• Repeatedly sample the population

• The proportion of observations positive for the condition approaches 0.2 after a very large number of trials
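
This long-run behavior can be illustrated with a small simulation. The sketch below is an assumption-laden toy: it models each sampled person as an independent draw that is positive with probability 0.2, and the seed and checkpoint sizes are arbitrary choices made here, not values from the slides.

```python
import random

rng = random.Random(2016)      # fixed seed so the run is repeatable
true_proportion = 0.20         # assumed share of the population with the condition
checkpoints = [10, 100, 1_000, 10_000, 100_000]

positives = 0
running_proportion = {}

for n in range(1, max(checkpoints) + 1):
    # One sampled individual: positive with probability 0.2
    if rng.random() < true_proportion:
        positives += 1
    # Record the observed proportion at selected sample sizes
    if n in checkpoints:
        running_proportion[n] = positives / n

for n, proportion in running_proportion.items():
    print(f"after {n:>7} trials: proportion positive = {proportion:.4f}")
```

With small samples the observed proportion can be noticeably off; as the number of trials grows it settles close to 0.2.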

Page 52: Introduction to Basic Statistics

Random Variables

• Random variable ≡ a numerical quantity that takes on different values depending on chance

• Two types of random variables
– Discrete random variables (countable set of possible outcomes)
– Continuous random variables (unbroken chain of possible outcomes)

Page 53: Introduction to Basic Statistics

Example: Discrete Random Variable

• Treat 4 patients with a drug that is 75% effective

• Let X ≡ the [variable] number of patients that respond to treatment

• X is a discrete random variable that can be 0, 1, 2, 3, or 4 (a countable set of possible outcomes)

Page 54: Introduction to Basic Statistics

Example: Discrete Random Variable

• Discrete random variables are understood in terms of their probability mass function (pmf)

• pmf ≡ a mathematical function that assigns probabilities to all possible outcomes for a discrete random variable.

• This table shows the pmf for our “four patients” example:

x 0 1 2 3 4

Pr(X=x) 0.0039 0.0469 0.2109 0.4219 0.3164
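
The table values are consistent with a binomial model, which is an assumption made here rather than something the slide states: 4 patients who respond independently, each with probability 0.75. Under that assumption, a minimal Python sketch reproduces the pmf.

```python
from math import comb

n_patients = 4       # number of patients treated
p_response = 0.75    # drug is 75% effective

# Binomial pmf, assuming patients respond independently:
# Pr(X = k) = C(n, k) * p^k * (1 - p)^(n - k)
pmf = {
    k: comb(n_patients, k) * p_response**k * (1 - p_response)**(n_patients - k)
    for k in range(n_patients + 1)
}

for k, prob in pmf.items():
    print(f"Pr(X={k}) = {prob:.4f}")

# The probabilities over all possible outcomes add up to 1
print(sum(pmf.values()))
```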

Page 55: Introduction to Basic Statistics

Random Variable

• A random variable x takes on a defined set of values with different probabilities.

• For example, if you roll a die, the outcome is random (not fixed) and there are 6 possible outcomes, each of which occurs with probability one-sixth.

• For example, if you poll people about their voting preferences, the percentage of the sample that responds “Yes on Proposition 100” is also a random variable (the percentage will be slightly different every time you poll).

• Roughly, probability is how frequently we expect different outcomes to occur if we repeat the experiment over and over (“frequentist” view).

Page 56: Introduction to Basic Statistics

Random variables can be discrete or continuous

• Discrete random variables have a countable number of outcomes
– Examples: Dead/alive, treatment/placebo, dice, counts, etc.

• Continuous random variables have an infinite continuum of possible values.
– Examples: blood pressure, weight, the speed of a car, the real numbers from 1 to 6.

Page 57: Introduction to Basic Statistics

Probability functions

• A probability function maps the possible values of x against their respective probabilities of occurrence, p(x)

• p(x) is a number from 0 to 1.0.

• The area under a probability function is always 1.

Page 58: Introduction to Basic Statistics

Discrete example: roll of a die

[Figure: bar chart of p(x) against x for x = 1, 2, 3, 4, 5, 6, with every bar at height 1/6]

$\sum_{\text{all } x} P(x) = 1$

Page 59: Introduction to Basic Statistics

Probability mass function (pmf)

x    p(x)
1    p(x=1) = 1/6
2    p(x=2) = 1/6
3    p(x=3) = 1/6
4    p(x=4) = 1/6
5    p(x=5) = 1/6
6    p(x=6) = 1/6
     Total = 1.0

Page 60: Introduction to Basic Statistics

Cumulative distribution function

x P(x≤A)

1 P(x≤1)=1/6

2 P(x≤2)=2/6

3 P(x≤3)=3/6

4 P(x≤4)=4/6

5 P(x≤5)=5/6

6 P(x≤6)=6/6
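
Both tables for the die can be generated with exact fractions. This is only a sketch assuming a fair six-sided die; note that Python's `Fraction` prints values in lowest terms, so 2/6 appears as 1/3 and 6/6 as 1.

```python
from fractions import Fraction
from itertools import accumulate

faces = range(1, 7)

# Probability mass function: each face of a fair die has probability 1/6
pmf = {x: Fraction(1, 6) for x in faces}

# Cumulative distribution function: running total of the pmf, P(x <= A)
cdf = dict(zip(faces, accumulate(pmf[x] for x in faces)))

for x in faces:
    print(f"P(x = {x}) = {pmf[x]},  P(x <= {x}) = {cdf[x]}")

# The pmf sums to 1 over all outcomes, so the cdf ends at 1
print(sum(pmf.values()))
```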

Page 61: Introduction to Basic Statistics

• For evidence yielding full single-source DNA profiles, laboratories typically use random match probabilities, which are based on genotype frequency estimates, while others use a likelihood ratio that compares the primary hypothesis that the suspect is the source of the DNA profile against the alternate hypothesis that an unrelated, untested individual from the general population was the DNA donor.

