
Chapter 2: Modeling Distributions of Data · 2018. 9. 2. · Cumulative frequency Cumulative...

Date post: 30-Mar-2021

Chapter 2: Modeling Distributions of Data

2.1 – Describing Location in a Distribution


Learning Objectives

After this section, you should be able to:

• MEASURE position using percentiles

• INTERPRET cumulative relative frequency graphs

• TRANSFORM data

• DEFINE and DESCRIBE density curves


Measuring Position: Percentiles

• The pth percentile of a distribution is the value with p percent of the observations less than it.

• One way to describe the location of a value in a distribution is to tell what percent of observations are less than it.

Wording: say a value is "at the 70th percentile," not "in …".
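The percentile definition above can be sketched in Python. This is a minimal sketch that assumes the slides' rule (percent of observations strictly less than the value); statistical software often uses other percentile conventions, such as interpolating between ordered values, so its answers can differ. The function name `percentile_rank` and the toy numbers are hypothetical.

```python
def percentile_rank(data, value):
    """Percent of observations in `data` that are strictly less than `value`.

    Follows the slide definition: the pth percentile is the value with
    p percent of the observations below it.
    """
    if not data:
        raise ValueError("data must contain at least one observation")
    below = sum(1 for x in data if x < value)
    return 100 * below / len(data)


# Hypothetical toy data, only to show the call:
print(percentile_rank([10, 20, 30, 40], 30))  # 2 of 4 values are below 30 -> 50.0
```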


Example #1

Jenny is in a class of 25 students and scores 86 on her test. How did she perform relative to the rest of her class? At what percentile is she?

Test scores (stem = tens digit, leaf = ones digit):

6 | 7
7 | 2334
7 | 5777899
8 | 00123334
8 | 569
9 | 03


Jenny's 86 is the 4th-highest score, and 21 scores are lower: 21/25 = 84%.



Her score was greater than 21 of the 25 observations. Jenny is at the 84th percentile of the test score distribution in her class.
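To check the count of 21 lower scores, the stemplot can be expanded back into a list of scores. This is a small illustrative sketch; the row layout below simply copies the stemplot (stems 7 and 8 are split, so they appear twice), and the helper name `expand_stemplot` is hypothetical.

```python
# Rows of the Example #1 stemplot: (stem = tens digit, leaves = ones digits).
STEMPLOT = [
    (6, "7"),
    (7, "2334"),
    (7, "5777899"),
    (8, "00123334"),
    (8, "569"),
    (9, "03"),
]


def expand_stemplot(rows):
    """Turn (stem, leaves) rows into a flat list of whole-number scores."""
    return [stem * 10 + int(leaf) for stem, leaves in rows for leaf in leaves]


scores = expand_stemplot(STEMPLOT)
below = sum(1 for s in scores if s < 86)  # scores lower than Jenny's 86
print(len(scores), below, 100 * below / len(scores))  # 25 21 84.0
```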


Example #2

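The stemplot shows the number of wins for each of the 30 Major League Baseball teams in 2009 (stems are tens, leaves are ones; 10 | 3 = 103 wins).

5 | 9
6 | 2455
7 | 00455589
8 | 0345667778
9 | 123557
10 | 3

Rockies (92 wins): 24/30 = 80%

Yankees (103 wins): 29/30 ≈ 97%

*With this method, no observation can be at the 100th percentile, because a value is never less than itself.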


Since 80% of the teams won fewer games, the Rockies are at the 80th percentile.

Since about 97% of the teams won fewer games, the Yankees are at the 97th percentile.


Indians (65 wins): 3/30 = 10%

Since 10% of the teams won fewer games, the Indians are at the 10th percentile.

*Two teams won 65 games, so both are at the same (10th) percentile.
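The three percentiles in Example #2, the tie, and the "no 100th percentile" note can all be reproduced from the stemplot. This sketch assumes the slides' strictly-less-than rule; the win totals used for the Rockies (92), Yankees (103) and Indians (65) are the stemplot values with exactly 24, 29 and 3 lower totals, matching the slide counts.

```python
from collections import Counter

# 2009 MLB win totals read off the Example #2 stemplot (stem = tens, leaf = ones).
ROWS = [(5, "9"), (6, "2455"), (7, "00455589"),
        (8, "0345667778"), (9, "123557"), (10, "3")]
wins = [stem * 10 + int(leaf) for stem, leaves in ROWS for leaf in leaves]


def percentile_rank(data, value):
    """Percent of observations strictly less than `value`."""
    return 100 * sum(1 for x in data if x < value) / len(data)


for team, w in [("Rockies", 92), ("Yankees", 103), ("Indians", 65)]:
    print(team, w, round(percentile_rank(wins, w)))

# Ties share a percentile: count how many teams won 65 games.
print(Counter(wins)[65])
# The largest value has 29 of 30 values below it, so it stays under 100.
print(percentile_rank(wins, max(wins)))
```

Rounding 29/30 = 96.67% gives the slide's 97th percentile.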


Cumulative Relative Frequency Graph

A cumulative relative frequency graph displays the cumulative relative frequency of each class of a frequency distribution. It is also called an "ogive."


Example #3

The table shows the distribution of median household incomes for the 50 states and the District of Columbia in a recent year.


Median income ($1000s) | Frequency | Relative frequency | Cumulative frequency | Cumulative relative frequency
35 to < 40 | 1 | 1/51 = 0.020 | 1 | 1/51 = 0.020
40 to < 45 | 10 | 10/51 = 0.196 | 11 | 11/51 = 0.216
45 to < 50 | 14 | 14/51 = 0.275 | 25 | 25/51 = 0.490
50 to < 55 | 12 | 12/51 = 0.235 | 37 | 37/51 = 0.725
55 to < 60 | 5 | 5/51 = 0.098 | 42 | 42/51 = 0.824
60 to < 65 | 6 | 6/51 = 0.118 | 48 | 48/51 = 0.941
65 to < 70 | 3 | 3/51 = 0.059 | 51 | 51/51 = 1.000
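The cumulative columns are running totals of the frequency column, divided by the total count for the relative version. A short sketch that rebuilds the last two columns from the frequencies in the table (variable names are our own):

```python
from itertools import accumulate

# Class lower bounds (in $1000s) and frequencies from the Example #3 table.
lower_bounds = [35, 40, 45, 50, 55, 60, 65]
freqs = [1, 10, 14, 12, 5, 6, 3]

n = sum(freqs)                       # 51: the 50 states plus D.C.
cum_freqs = list(accumulate(freqs))  # running totals of the frequencies

for lo, f, cf in zip(lower_bounds, freqs, cum_freqs):
    print(f"{lo} to < {lo + 5}: freq={f}, rel={f / n:.3f}, "
          f"cum={cf}, cum rel={cf / n:.3f}")
```

The last cumulative relative frequency is always 1.000, because every observation falls at or below the top class.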



a) California, with a median household income of $57,445, is at what percentile? Interpret this value.

About the 79th percentile.

About 79% of the states (including the District of Columbia) have a median household income less than California's $57,445.

b) What is the 25th percentile for this distribution? What is another name for this value?

25th percentile ≈ $45,000

* Also known as the first quartile, Q1 *
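The answers above are read off the graph by eye. As a rough cross-check, this sketch assumes the ogive is drawn by joining each class's upper boundary to its cumulative relative frequency with straight line segments (starting from 0 at $35,000) and interpolates along those segments. Under that assumption it places California at about the 77th percentile, a little below the eyeballed 79th, and Q1 at about $45,600. The helper names `ogive_percentile` and `ogive_value` are hypothetical.

```python
from bisect import bisect_right

# Ogive vertices from the Example #3 table: class boundaries ($1000s)
# paired with cumulative frequencies out of 51.
X = [35, 40, 45, 50, 55, 60, 65, 70]
CUM = [0, 1, 11, 25, 37, 42, 48, 51]
Y = [c / 51 for c in CUM]  # cumulative relative frequencies


def ogive_percentile(x):
    """Estimate the percentile of income x by linear interpolation on the ogive."""
    i = min(max(bisect_right(X, x) - 1, 0), len(X) - 2)
    t = (x - X[i]) / (X[i + 1] - X[i])
    return 100 * (Y[i] + t * (Y[i + 1] - Y[i]))


def ogive_value(p):
    """Estimate the income at percentile p (0 to 100) by inverting the ogive."""
    q = p / 100
    i = min(max(bisect_right(Y, q) - 1, 0), len(Y) - 2)
    t = (q - Y[i]) / (Y[i + 1] - Y[i])
    return X[i] + t * (X[i + 1] - X[i])


print(round(ogive_percentile(57.445), 1))  # California's percentile
print(round(ogive_value(25), 2))           # 25th percentile (Q1), in $1000s
```

Straight-line interpolation treats incomes as spread evenly within each class, which the data may not satisfy, so these are estimates just like the eyeballed readings.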


Check your Understanding

The graph is a cumulative relative frequency graph showing the lifetimes (in hours) of 200 lamps.


Measuring Position: z-scores

The z-score tells us how many standard deviations from the mean an observation falls, and in what direction.

Definition:

If x is an observation from a distribution that has known mean and standard deviation, the standardized value of x is:

$$z = \frac{x - \text{mean}}{\text{standard deviation}} \qquad \text{or} \qquad z = \frac{x_i - \bar{x}}{s_x}$$

A standardized value is often called a z-score.

** Units: standard deviations

** Positive z-score = above the mean

** Negative z-score = below the mean
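A sketch of the formula in Python, using Jenny's class scores from Example #1 as illustration data. It assumes the sample standard deviation (dividing by n − 1), which is what `statistics.stdev` computes and what a calculator's 1-Var Stats reports as Sx; the helper name `z_score` is our own.

```python
import statistics

# Jenny's class test scores, expanded from the Example #1 stemplot.
scores = [67, 72, 73, 73, 74, 75, 77, 77, 77, 78, 79, 79,
          80, 80, 81, 82, 83, 83, 83, 84, 85, 86, 89, 90, 93]


def z_score(x, mean, sd):
    """Standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


x_bar = statistics.mean(scores)  # sample mean, x-bar
s_x = statistics.stdev(scores)   # sample standard deviation (n - 1), like Sx
print(x_bar, round(s_x, 2))
print(round(z_score(86, x_bar, s_x), 2))  # Jenny: positive, so above the mean
print(round(z_score(67, x_bar, s_x), 2))  # lowest score: negative, below the mean
```

Jenny's 86 comes out just under one standard deviation above the class mean of 80.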


Example #4

The single-season home run record for Major League Baseball has been set just three times since Babe Ruth hit 60 home runs in 1927. Roger Maris hit 61 in 1961, Mark McGwire hit 70 in 1998, and Barry Bonds hit 73 in 2001. In an absolute sense, Barry Bonds had the best performance of these four players, because he hit the most home runs in a single season. In a relative sense, however, this may not be true. Baseball historians suggest that hitting home runs has been easier in some eras than in others. This is due to many factors, including the quality of batters, the quality of pitchers, the hardness of the baseball, the dimensions of ballparks, and possible use of performance-enhancing drugs. To make a fair comparison, we should see how these performances rate relative to other hitters during the same year. Calculate the standardized score for each player and compare.

Year | Player | HR | Mean | SD
1927 | Babe Ruth | 60 | 7.2 | 9.7
1961 | Roger Maris | 61 | 18.8 | 13.4
1998 | Mark McGwire | 70 | 20.7 | 12.7
2001 | Barry Bonds | 73 | 21.4 | 13.2


Ruth: $z = \dfrac{60 - 7.2}{9.7} \approx 5.44$

Maris: $z = \dfrac{61 - 18.8}{13.4} \approx 3.15$

McGwire: $z = \dfrac{70 - 20.7}{12.7} \approx 3.88$

Bonds: $z = \dfrac{73 - 21.4}{13.2} \approx 3.91$


All four players are well above the mean for their respective years. However, Babe Ruth has the largest z-score (5.44), so he is the home run champ, relatively speaking.
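The four standardized scores can be computed in one pass from the table. This is a minimal sketch; the tuple layout and variable names are our own, and the means and SDs are exactly the table values.

```python
# (year, player, home runs, mean, SD) from the Example #4 table.
SEASONS = [
    (1927, "Babe Ruth", 60, 7.2, 9.7),
    (1961, "Roger Maris", 61, 18.8, 13.4),
    (1998, "Mark McGwire", 70, 20.7, 12.7),
    (2001, "Barry Bonds", 73, 21.4, 13.2),
]

# z = (x - mean) / SD for each player's season.
z = {player: (hr - mean) / sd for _, player, hr, mean, sd in SEASONS}

for player, value in z.items():
    print(f"{player}: z = {value:.2f}")
print("Best relative performance:", max(z, key=z.get))  # Babe Ruth
```

Comparing z-scores rather than raw totals is what puts each season on the same "standard deviations above that year's mean" scale.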


Class Example

• What variable could we measure about this class?

• # of sports played, # of siblings, # of phone contacts, # of texts per day, etc.

Put the data in the calculator and find 1-Var Stats.

• Find x̄ and sₓ.

Find your own z-score.

Who is above the mean? Below the mean?

Does anyone have z = 0?


Example #5

In 2001, Arizona Diamondback Mark Grace's home run total had a standardized score of z = −0.48. Interpret this value and calculate the number of home runs he hit.

Grace's home run total was 0.48 standard deviations below the mean of 21.4 home runs in 2001.

$$-0.48 = \frac{x - 21.4}{13.2}$$

Multiply both sides by 13.2:

$$-6.336 = x - 21.4$$

$$x = 15.064 \approx 15$$

About 15 home runs.
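Solving the z-score formula for x gives x = mean + z · SD, which is the algebra above in one step. A minimal sketch (the helper name `value_from_z` is our own), using the 2001 mean and SD from Example #4:

```python
def value_from_z(z, mean, sd):
    """Invert z = (x - mean) / sd to recover x = mean + z * sd."""
    return mean + z * sd


# Mark Grace, 2001: z = -0.48, with mean 21.4 and SD 13.2.
x = value_from_z(-0.48, 21.4, 13.2)
print(round(x, 3), round(x))  # 15.064 15
```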

