
MAT 142 College Mathematics Module ST

Statistics

Terri Miller, revised July 14, 2015


Data Organization and Visualization

Basic Terms.

A population is the set of all objects under study, a sample is any subset of a population, and a data point is an element of a set of data.

Example 1. Population, sample, data point

Population: all ASU students
Sample: 1000 randomly selected ASU students
Data: 10, 15, 13, 25, 22, 53, 47
Data point: 13
Data point: 53

The frequency is the number of times a particular data point occurs in the set of data. A frequency distribution is a table that lists each data point and its frequency. The relative frequency is the frequency of a data point expressed as a percentage of the total number of data points.

Example 2. Frequency, relative frequency, frequency distribution

Data: 1, 3, 6, 4, 5, 6, 3, 4, 6, 3, 6

The frequency of the data point 1 is 1.
The frequency of the data point 6 is 4.
The relative frequency of the data point 6 is (4/11) × 100% ≈ 36.36%.
The frequency distribution for this set of data is given below, where x is a data point and f is the frequency for that point.

x    f
1    1
3    3
4    2
5    1
6    4
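The counting in Example 2 can also be sketched in Python. This is only an illustration: the helper names `frequency_distribution` and `relative_frequencies` are assumptions introduced here, not part of the course materials.

```python
from collections import Counter

# Data from Example 2.
data = [1, 3, 6, 4, 5, 6, 3, 4, 6, 3, 6]

def frequency_distribution(points):
    """Return {data point: frequency}, ordered by data point."""
    return dict(sorted(Counter(points).items()))

def relative_frequencies(points):
    """Return {data point: frequency as a percentage of all data points}."""
    n = len(points)
    return {x: 100 * f / n for x, f in frequency_distribution(points).items()}

freq = frequency_distribution(data)
rel = relative_frequencies(data)

# Print one row per data point: x, f, rel f.
for x in freq:
    print(f"{x}  {freq[x]}  {rel[x]:.2f}%")
```

Each printed row holds a data point, its frequency, and its relative frequency rounded to two decimal places.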

Data is often described as ungrouped or grouped. Ungrouped data is data given as individual data points. Grouped data is data given in intervals.

Example 3. Ungrouped data without a frequency distribution.

1, 3, 6, 4, 5, 6, 3, 4, 6, 3, 6

Example 4. Ungrouped data with a frequency distribution.


Number of television sets    Frequency
0                            2
1                            13
2                            18
3                            0
4                            10
5                            2
Total                        45

Example 5. Grouped data.

Exam score    Frequency
90-99         7
80-89         5
70-79         15
60-69         4
50-59         5
40-49         0
30-39         1
Total         37

Organizing Data.

Given a set of data, it is helpful to organize it. This is usually done by creating a frequency distribution.

Example 6. Ungrouped data.

Given the following set of data, we would like to create a frequency distribution.

1 5 7 8 2
3 7 2 8 7

To do this we will count up the data by making a tally (a tick mark in the tally column for each occurrence of the data point). As before, we will designate the data points by x.

x    tally
1    |
2    ||
3    |
4
5    |
6
7    |||
8    ||

Now we add a column for the frequency; this will simply be the number of tick marks for each data point. We will also total the number of data points. As we have done previously, we will represent the frequency with f.


x        tally    f
1        |        1
2        ||       2
3        |        1
4                 0
5        |        1
6                 0
7        |||      3
8        ||       2
Total             10

This is a frequency distribution for the data given. We could also include a column for the relative frequency as part of the frequency distribution. We will use rel f to indicate the relative frequency.

x         tally    f     rel f
1         |        1     (1/10) × 100% = 10%
2         ||       2     (2/10) × 100% = 20%
3         |        1     10%
4                  0     0%
5         |        1     10%
6                  0     0%
7         |||      3     (3/10) × 100% = 30%
8         ||       2     20%
Totals             10    100%

For our next example, we will use the data to create groups (or categories) for the data and then make a frequency distribution.

Example 7. Grouped Data.

Given the following set of data, we want to organize the data into groups. We have decided that we want to have 5 intervals.

26 18 21 34 18
38 22 27 22 30
25 25 38 29 20
24 28 32 33 18

Since we want to group the data, we will need to find out the size of each interval. To do this we must first identify the highest and the lowest data point. In our data the highest data point is 38 and the lowest is 18. Since we want 5 intervals, we make the computation

$$\frac{\text{highest} - \text{lowest}}{\text{number of intervals}} = \frac{38 - 18}{5} = \frac{20}{5} = 4$$

Since we need to include all points, we always take the next integer above the computed value as the length of our intervals. Since we computed 4, the length of our intervals will be 5. Now we set up the first interval

lowest ≤ x < lowest + 5 which results in 18 ≤ x < 23.


Our next interval is obtained by adding 5 to each end of the first one:

18 + 5 ≤ x < 23 + 5 which results in 23 ≤ x < 28.

We continue in this manner to get all of our intervals:

18 ≤ x < 23
23 ≤ x < 28
28 ≤ x < 33
33 ≤ x < 38
38 ≤ x < 43

Now we are ready to tally the data and make the frequency distribution. Be careful to make sure that a data point that is the same number as the end of the interval is placed in the correct interval. This means that the data point 33 is counted in the interval 33 ≤ x < 38 and NOT in the interval 28 ≤ x < 33.

x              tally      f     rel f
18 ≤ x < 23    |||||||    7     (7/20) × 100% = 35%
23 ≤ x < 28    |||||      5     (5/20) × 100% = 25%
28 ≤ x < 33    ||||       4     (4/20) × 100% = 20%
33 ≤ x < 38    ||         2     (2/20) × 100% = 10%
38 ≤ x < 43    ||         2     10%
Totals                    20    100%
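The grouping procedure of Example 7 can be sketched in Python as follows. The function names `interval_length` and `group` are hypothetical, and the sketch assumes whole-number data, as in the example.

```python
import math

# Data from Example 7.
data = [26, 18, 21, 34, 18, 38, 22, 27, 22, 30,
        25, 25, 38, 29, 20, 24, 28, 32, 33, 18]

def interval_length(points, k):
    """Next integer above (highest - lowest) / k, so every point fits in k intervals."""
    return math.floor((max(points) - min(points)) / k) + 1

def group(points, k):
    """Return [((low, high), frequency), ...] for k intervals low <= x < high."""
    width = interval_length(points, k)
    lowest = min(points)
    counts = [0] * k
    for x in points:
        # A point equal to an interval's right end falls into the next interval.
        counts[(x - lowest) // width] += 1
    return [((lowest + i * width, lowest + (i + 1) * width), counts[i])
            for i in range(k)]

for (low, high), f in group(data, 5):
    print(f"{low} <= x < {high}: {f}")
```

Integer division by the interval length places a boundary value such as 33 in the interval that starts at 33, which matches the caution given before the table.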

Histogram.

Now that we have the data organized, we want a way to display the data. One such display is a histogram, which is a bar chart that shows how the data are distributed among the data points (ungrouped) or among the intervals (grouped).

Example 8. Histogram for ungrouped data.

Given the following frequency distribution:

Number of television sets    Frequency
0                            2
1                            13
2                            18
3                            0
4                            10
5                            2

The histogram would look as follows:

[Figure: histogram of the frequency distribution above]

Example 9. Histogram example for grouped data.

We will use the data from example 7. The histogram would look as follows.
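A rough text rendering of such a histogram can be produced in Python. This is only a sketch: `text_histogram` is a hypothetical helper, and it draws one horizontal row of `#` marks per interval of Example 7 rather than vertical bars.

```python
def text_histogram(labels, freqs):
    """Return one line per label, with a bar of '#' marks as long as the frequency."""
    width = max(len(label) for label in labels)
    return "\n".join(
        f"{label.ljust(width)} | {'#' * f}" for label, f in zip(labels, freqs)
    )

# Intervals and frequencies from Example 7.
labels = ["18 <= x < 23", "23 <= x < 28", "28 <= x < 33",
          "33 <= x < 38", "38 <= x < 43"]
freqs = [7, 5, 4, 2, 2]

chart = text_histogram(labels, freqs)
print(chart)
```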


Measures of Central Tendency

Mode.

The mode is the data point which occurs most frequently. It is possible to have more than one mode; if there are two modes, the data is said to be bimodal. It is also possible for a set of data to not have any mode; this situation occurs if the number of modes gets to be “too large”. It is not really possible to define “too large”, but one should exercise good judgement. A reasonable, though very generous, rule of thumb is that if the number of data points accounted for in the list of modes is half or more of the data points, then there is no mode.

Note: if the data is given as a list of data points, it is often easiest to find the mode by creating a frequency distribution. This is certainly the most organized method for finding it. In our examples we will use frequency distributions.

Example 10. A data set with a single mode.

Consider the data from example 8:

Number of television sets    Frequency
0                            2
1                            13
2                            18
3                            0
4                            10
5                            2

You can see from the table that the data point which occurs most frequently is 2, as it has a frequency of 18. So the mode is 2.

Example 11. A data set with two modes.

Consider the data:

Number of hours of television    Frequency
0                                1
0.5                              4
1                                8
1.5                              9
2                                13
2.5                              10
3                                11
3.5                              13
4                                5
4.5                              3

You can see from the table that the data points 2 and 3.5 both occur with the highest frequency of 13. So the modes are 2 and 3.5.


Example 12. A data set with no mode.

Consider the data:

Age      Frequency
18       12
19       5
20       3
21       9
22       1
23       8
24       12
25       12
26       5
27       3
Total    71

You can see from the table that the data points 18, 24 and 25 all occur with the highest frequency of 12. Since this would account for 36 of the 71 data points, this would qualify as “too large” a number of data points accounted for. In this case, we would say there is no mode.
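The three mode examples can be checked with a short Python sketch. The function `modes` is a hypothetical helper. Applying the rule of thumb only when two or more data points tie for the highest frequency is an assumption of this sketch; the text states the rule without that restriction.

```python
def modes(freq):
    """Return the modes of a frequency distribution {data point: frequency}.

    Returns [] when the tied data points together account for half or
    more of all data points (the rule of thumb for "too large").
    """
    n = sum(freq.values())
    top = max(freq.values())
    tied = [x for x, f in freq.items() if f == top]
    # Assumption: the rule of thumb is applied only when there is a tie.
    if len(tied) > 1 and 2 * top * len(tied) >= n:
        return []
    return tied

# Frequencies as listed in Examples 10, 11 and 12.
tv_sets = {0: 2, 1: 13, 2: 18, 3: 0, 4: 10, 5: 2}
tv_hours = {0: 1, 0.5: 4, 1: 8, 1.5: 9, 2: 13, 2.5: 10, 3: 11, 3.5: 13, 4: 5, 4.5: 3}
ages = {18: 12, 19: 5, 20: 3, 21: 9, 22: 1, 23: 8, 24: 12, 25: 12, 26: 5, 27: 3}

print(modes(tv_sets))   # single mode
print(modes(tv_hours))  # two modes
print(modes(ages))      # empty list: no mode
```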

Median.

The median is the data point in the middle when all of the data points are arranged in order (high to low or low to high). To find where it is, we take into account the number of data points. If the number of data points is odd, divide the number of data points by 2 and then round up to the next integer; the resulting integer is the location of the median. If the number of data points is even, there are two middle values. We take the number of data points and divide by 2; this integer is the location of the first of the two middle values, and the next location holds the second. We then average these two middle values to get the median.

Example 13. An odd number of data points with no frequency distribution.

3, 4.5, 7, 8.5, 9, 10, 15

There are 7 data points and 7/2=3.5 so the median is the 4th number, 8.5.

Example 14. An odd number of data points with a frequency distribution.

Age      Frequency
18       12
19       5
20       3
21       9
22       2
Total    31

There are 31 data points and 31/2=15.5 so the median is the 16th number. Start counting: 18 occurs 12 times, then 19 occurs 5 times, getting us up to entry 17 (12+5); so the 16th entry must be a 19. This data set has a median of 19.


Example 15. An even number of data points with no frequency distribution.

3, 4.5, 7, 8.5, 9, 10, 15, 15.5

There are 8 data points and 8/2=4, so the two middle values are the 4th and 5th numbers, 8.5 and 9. The median is their average, (8.5 + 9)/2 = 8.75.
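The location rule for the median translates directly into code. The sketch below uses two hypothetical helpers: `median` follows the odd/even rule described above, and `expand` turns a frequency distribution back into a list of data points.

```python
import math

def median(points):
    """Median by the location rule: middle entry (odd n) or mean of the two middles (even n)."""
    ordered = sorted(points)
    n = len(ordered)
    if n % 2 == 1:
        position = math.ceil(n / 2)       # 1-based location of the median
        return ordered[position - 1]
    first = n // 2                        # 1-based location of the first middle value
    return (ordered[first - 1] + ordered[first]) / 2

def expand(freq):
    """Turn {data point: frequency} into a flat list of data points."""
    return [x for x, f in freq.items() for _ in range(f)]

print(median([3, 4.5, 7, 8.5, 9, 10, 15]))                    # Example 13
print(median(expand({18: 12, 19: 5, 20: 3, 21: 9, 22: 2})))   # Example 14
print(median([3, 4.5, 7, 8.5, 9, 10, 15, 15.5]))              # Example 15
```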

Mean.

The mean is the average of the data points; it is denoted $\bar{x}$. There are three types of data for which we would like to compute the mean: ungrouped of frequency 1, ungrouped with a frequency distribution, and grouped.

The first type, ungrouped of frequency 1, is data given to you as a list and not organized into a frequency distribution. When this happens, we compute the average as we have always done: add up all of the data points and divide by the number of data points. To write a formula for this, we use the capital Greek letter sigma, Σx. This just means to add up all of the data points. We will use n to represent the number of data points.

mean: $\bar{x} = \frac{\Sigma x}{n}$

This corresponds to the left hand column of your calculator instructions.

Example 16. Given the ungrouped data list below:

10 15 13 25 22 53 47

We would enter the data into the calculator following the instructions in the left hand column; the result is $\bar{x} = 26.4285714$.

When we have a frequency distribution for the data, we have to take the average as before but remember that the frequency gives the number of times that the data point occurs.

mean: $\bar{x} = \frac{\Sigma (fx)}{n}$

This corresponds to the right hand column of your calculator instructions.

Example 17. Given the frequency distribution for ungrouped data below:

Number of television sets    Frequency
0                            2
1                            13
2                            18
3                            0
4                            10
5                            2

We would enter the data into our calculators following the directions in the right hand column. The result is $\bar{x} = 2.2$.


Our final type of data is grouped data. This requires a computation before we can begin. Since we cannot enter the entire interval as a data point, we use a representative for each interval, $x_i$. This representative is the midpoint of the interval; to find the midpoint of an interval, add the two endpoints and divide by 2. These are the numbers that you use as data points for computing the mean.

Example 18. Mean for Grouped data.

We will use the data from example 7 again:

x              f
18 ≤ x < 23    7
23 ≤ x < 28    5
28 ≤ x < 33    4
33 ≤ x < 38    2
38 ≤ x < 43    2
Total          20

Start by calculating the representative for each interval.

$$\frac{23 + 18}{2} = \frac{41}{2} = 20.5$$

Since this is the midpoint of the first interval and the intervals have length 5, we find the rest by adding 5 to this one.

x              x_i     f
18 ≤ x < 23    20.5    7
23 ≤ x < 28    25.5    5
28 ≤ x < 33    30.5    4
33 ≤ x < 38    35.5    2
38 ≤ x < 43    40.5    2
Totals                 20

Now enter the data as directed using the right hand column of the calculator directions. You use $x_i$ as the data point (list 1) and the frequency as usual in list 2. The result should be $\bar{x} = 27.25$.
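For readers without the calculator instructions at hand, the three kinds of mean can be sketched in Python. The helper names below are assumptions made for illustration and stand in for the calculator steps.

```python
def mean_of_list(points):
    """Mean of ungrouped data of frequency 1: sum of x divided by n."""
    return sum(points) / len(points)

def mean_of_distribution(freq):
    """Mean from {data point: frequency}: sum of f*x divided by n, where n is the sum of f."""
    n = sum(freq.values())
    return sum(f * x for x, f in freq.items()) / n

def midpoint(low, high):
    """Representative of an interval: the average of its two endpoints."""
    return (low + high) / 2

# Example 16: ungrouped list.
print(mean_of_list([10, 15, 13, 25, 22, 53, 47]))

# Example 17: ungrouped data with a frequency distribution.
print(mean_of_distribution({0: 2, 1: 13, 2: 18, 3: 0, 4: 10, 5: 2}))

# Example 18: grouped data, using interval midpoints as data points.
grouped = {(18, 23): 7, (23, 28): 5, (28, 33): 4, (33, 38): 2, (38, 43): 2}
representatives = {midpoint(lo, hi): f for (lo, hi), f in grouped.items()}
print(mean_of_distribution(representatives))
```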


Measures of Dispersion

In the last section we looked at measures of central tendency. These let us know what the middle of a set of data looks like. We are now going to look at measures of dispersion. These will tell us how spread out a set of data is.

One measure of spread is the range. The range is the difference between the highest data value and the lowest data value.

We are going to be using the following data for the next few examples. Below is a list of the heights, in inches, of the 2015/16 Phoenix Suns roster players.

84 73 78 85 77 75 85 82 75 82 76 78 80

Example 19. What is the range of the heights of the players on the current Phoenix Suns roster?

Solution: The tallest player is 85 inches and the shortest player is 73 inches. This means that the range of the heights is 85 − 73 = 12 inches.

The standard deviation is a number that describes the amount of variation or dispersion of a set of data values. The standard deviation is the square root of the variance.

To compute it, take the sum of the squares of the differences between each data value and the mean, divide by one less than the number of data values, and take the square root of the result.

$$S_x = \sqrt{\frac{\Sigma (x - \bar{x})^2}{n - 1}}$$

Example 20. Find the standard deviation of the heights of the players on the Phoenix Suns 2015/16 roster. How many of the players fall within one standard deviation of the mean?

Solution: We first must find the mean of the player heights. We do this by adding all the heights together and dividing by the number of players. (Note: Round the mean to two decimal places and any squared values to four decimal places.)

$$\bar{x} = \frac{84 + 73 + 78 + 85 + 77 + 75 + 85 + 82 + 75 + 82 + 76 + 78 + 80}{13} \approx 79.23$$

Now that we have the mean, we can calculate the standard deviation. This is most easily done by creating a table.


Data Value    $x - \bar{x}$            $(x - \bar{x})^2$
84            84 − 79.23 = 4.77        (4.77)² = 22.7529
73            73 − 79.23 = −6.23       (−6.23)² = 38.8129
78            78 − 79.23 = −1.23       (−1.23)² = 1.5129
85            85 − 79.23 = 5.77        (5.77)² = 33.2929
77            77 − 79.23 = −2.23       (−2.23)² = 4.9729
75            75 − 79.23 = −4.23       (−4.23)² = 17.8929
85            85 − 79.23 = 5.77        (5.77)² = 33.2929
82            82 − 79.23 = 2.77        (2.77)² = 7.6729
75            75 − 79.23 = −4.23       (−4.23)² = 17.8929
82            82 − 79.23 = 2.77        (2.77)² = 7.6729
76            76 − 79.23 = −3.23       (−3.23)² = 10.4329
78            78 − 79.23 = −1.23       (−1.23)² = 1.5129
80            80 − 79.23 = 0.77        (0.77)² = 0.5929

              $\Sigma (x - \bar{x}) \approx 0$    $\Sigma (x - \bar{x})^2 = 198.3077$

We now need to take the sum of the squares of the differences between the data values and the mean, divide that number by 12 (13 − 1), and then take the square root of that result. This will give us that the standard deviation of the heights of the players is

$$S_x = \sqrt{\frac{198.3077}{12}} \approx 4.07$$

Now to answer the question of how many players fall within one standard deviation of the mean. This is asking us to first find an interval whose left end is the mean minus one standard deviation (79.23 − 4.07 = 75.16) and whose right end is the mean plus one standard deviation (79.23 + 4.07 = 83.30). We now need to count how many players have a height between 75.16 inches and 83.30 inches. There are seven players whose heights (78, 77, 82, 82, 76, 78, 80) are within one standard deviation of the mean.
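The computation in Example 20 can be reproduced with a few lines of Python. This is a minimal sketch with a hypothetical helper `sample_std`; it does not round the mean before squaring, so intermediate values can differ from the hand-computed table in the last decimal places.

```python
import math

# Heights in inches from the Phoenix Suns roster used in Examples 19 and 20.
heights = [84, 73, 78, 85, 77, 75, 85, 82, 75, 82, 76, 78, 80]

def sample_std(points):
    """Square root of the sum of squared differences from the mean, divided by n - 1."""
    n = len(points)
    mean = sum(points) / n
    squared_diffs = sum((x - mean) ** 2 for x in points)
    return math.sqrt(squared_diffs / (n - 1))

mean = sum(heights) / len(heights)
s = sample_std(heights)

# Heights within one standard deviation of the mean.
within = [h for h in heights if mean - s <= h <= mean + s]

print(round(mean, 2), round(s, 2), len(within))
```

The standard library function `statistics.stdev` uses the same n − 1 divisor and can serve as a cross-check.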


Normal Distribution

Many types of data are normally distributed. This is often known as the bell curve. In normally distributed data, the mean, median and mode are equal.


Any data which is normally distributed can be transformed to a standard normal distribution. The standard normal distribution has a mean, median, and mode of 0 (µ = 0) and a standard deviation of 1 (σ = 1). Normally distributed data can be summarized by a rule often referred to as the 68-95-99.7 rule. This rule says that approximately 68% of the data fall within one standard deviation of the mean; 95% of the data fall within two standard deviations of the mean; and 99.7% of the data fall within three standard deviations of the mean.


[Figures: Standard Normal Distribution; 68-95-99.7 Rule]
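The percentages in the 68-95-99.7 rule can be checked numerically. The sketch below assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`; the helper `share_within` is a name introduced here for illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Fraction of the data within k standard deviations of the mean."""
    return standard.cdf(k) - standard.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {100 * share_within(k):.1f}%")
```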

A z-score is a statistical measure that tells us how many standard deviations above or below the mean a particular data value is in a normal distribution. A positive z-score indicates that the data value is above the mean. A negative z-score indicates that the data value is below the mean. A z-score of 0 indicates that the data value is equal to the mean.

$$z = \frac{x - \mu}{\sigma}$$

Using a z-score, we can use what we know about the standard normal distribution to determine the area under the standard normal curve that falls below or above a particular data value. There are tables that will give us this information or allow us to calculate it. We have provided one such z-table as part of the formula sheet. Our z-table gives the area under the normal curve to the left of a particular z-value.

Example 21. Human pregnancies are normally distributed with a mean of 266 days and a standard deviation of 16 days. What is the probability that a pregnancy will last fewer than 270 days?

Solution: Let’s start by looking at a picture for this problem.

[Figure: normal curve with the area below 270 days shaded]


In order to answer this question, we will need to determine how many standard deviations away from the mean 270 days is. This is what we get when we calculate the z-score.

$$z = \frac{270 - 266}{16} = 0.25$$

We now need to look this z-score up in the z-table. We go down the left column of the positive side of the table until we get to 0.2. We then go across the 0.2 row until we get to the 0.05 column.


Z-table z 0.0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359

0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753

0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141

0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517

0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879

0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224

0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549

0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852

0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133

0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389

1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621

1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830

1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015

1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177

1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319

1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441

1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545

1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633

1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706

1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767

2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817

2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857

2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890

2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916

2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936

2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952

2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964

2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974

2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981

2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986

3.0 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990

3.1 0.9990 0.9991 0.9991 0.9991 0.9992 0.9992 0.9992 0.9992 0.9993 0.9993

3.2 0.9993 0.9993 0.9994 0.9994 0.9994 0.9994 0.9994 0.9995 0.9995 0.9995

3.3 0.9995 0.9995 0.9995 0.9996 0.9996 0.9996 0.9996 0.9996 0.9996 0.9997

3.4 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9998

The entry in the 0.2 row and the 0.05 column is 0.5987. This number is the probability that a pregnancy will last fewer than 270 days.

Example 22. Human pregnancies are normally distributed with a mean of 266 days and a standard deviation of 16 days. What percent of pregnancies will last more than 280 days?


[Figure: normal curve with the area above 280 days shaded]

$$z = \frac{280 - 266}{16} = 0.875$$


We now need to look this z-score up in the z-table. Since our z-table only has z-scores to two decimal places, we will need to round 0.875 and look up 0.88. We go down the left column of the positive side of the table until we get to 0.8. We then go across the 0.8 row until we get to the 0.08 column.

This number, 0.8106, is the probability that a pregnancy will last fewer than 280 days. We are asked about more than 280 days, thus we need to subtract this number from 1. Thus the probability that a pregnancy will last more than 280 days is 1 − 0.8106 = 0.1894. We need to change this probability to a percent by multiplying it by 100 to find that 18.94% of pregnancies will last more than 280 days.

Example 23. Human pregnancies are normally distributed with a mean of 266 days and a standard deviation of 16 days. What is the probability that a pregnancy will last between 250 and 275 days?


[Figure: normal curve with the area below 250 days and the area below 275 days marked]

We first need to determine how many standard deviations away from the mean 250 days is. This is what we get when we calculate the z-score.

$$z = \frac{250 - 266}{16} = -1$$

We go down the left column of the negative side of the table until we get to −1.0. We then go across the −1.0 row until we get to the 0.00 column. This gives us 0.1587.

We now need to determine how many standard deviations away from the mean 275 days is.

$$z = \frac{275 - 266}{16} = 0.5625$$

We now need to look this z-score up in the z-table. Since our z-table only has z-scores to two decimal places, we will need to look up 0.56. We go down the left column of the positive side of the table until we get to 0.5. We then go across the 0.5 row until we get to the 0.06 column. This gives us 0.7123.


In order to find the probability that a pregnancy lasts between 250 and 275 days, we need to subtract the probability of less than 250 days from the probability of less than 275 days. This gives us 0.7123 − 0.1587 = 0.5536 as the probability of a pregnancy lasting between 250 and 275 days.
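The three pregnancy examples can be cross-checked without a table. The sketch below assumes Python 3.8 or later (`statistics.NormalDist`), and `z_score` is a hypothetical helper. Because the software does not round z to two decimal places before looking up the area, its answers can differ from table lookups in the third or fourth decimal place.

```python
from statistics import NormalDist

# Pregnancy lengths: mean 266 days, standard deviation 16 days.
pregnancy = NormalDist(mu=266, sigma=16)

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Area to the left of 270 days.
p_fewer_than_270 = pregnancy.cdf(270)
# Area to the right of 280 days.
p_more_than_280 = 1 - pregnancy.cdf(280)
# Area between 250 and 275 days.
p_between_250_275 = pregnancy.cdf(275) - pregnancy.cdf(250)

print(z_score(270, 266, 16), round(p_fewer_than_270, 4))
print(z_score(280, 266, 16), round(p_more_than_280, 4))
print(round(p_between_250_275, 4))
```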

In the previous three examples, we first found a z-score and then looked that up in the z-table to get the information that we needed. We can also use a z-table in the reverse direction, starting from an area and finding the z-score that goes with it.

Example 24. All incoming freshmen at a major university are required to take a mathematics placement exam. The scores are normally distributed with a mean of 420 and a standard deviation of 45. If a student scores less than a certain score, he or she will have to take a review course. Find the cutoff score at which 25% of the students would have to take the review course.

Solution: For this problem we want to find a test score so that 25% of the students score below that score. The picture below illustrates what we are looking for.

[Figure: normal curve centered at 420 with the lowest 25% of the area shaded]

The 25% indicates the value within the z-table that we need to look for. We would need to change the 25% to a decimal of 0.25. This is the number within the table that we need to find. Once we find the value, we need to determine which z-score corresponds to it.


z     0.09   0.08   0.07   0.06   0.05   0.04   0.03   0.02   0.01   0.0
-1.0  0.1379 0.1401 0.1423 0.1446 0.1469 0.1492 0.1515 0.1539 0.1562 0.1587
-0.9  0.1611 0.1635 0.1660 0.1685 0.1711 0.1736 0.1762 0.1788 0.1814 0.1841
-0.8  0.1867 0.1894 0.1922 0.1949 0.1977 0.2005 0.2033 0.2061 0.2090 0.2119
-0.7  0.2148 0.2177 0.2206 0.2236 0.2266 0.2296 0.2327 0.2358 0.2389 0.2420
-0.6  0.2451 0.2483 0.2514 0.2546 0.2578 0.2611 0.2643 0.2676 0.2709 0.2743
-0.5  0.2776 0.2810 0.2843 0.2877 0.2912 0.2946 0.2981 0.3015 0.3050 0.3085
-0.4  0.3121 0.3156 0.3192 0.3228 0.3264 0.3300 0.3336 0.3372 0.3409 0.3446
-0.3  0.3483 0.3520 0.3557 0.3594 0.3632 0.3669 0.3707 0.3745 0.3783 0.3821
-0.2  0.3859 0.3897 0.3936 0.3974 0.4013 0.4052 0.4090 0.4129 0.4168 0.4207
-0.1  0.4247 0.4286 0.4325 0.4364 0.4404 0.4443 0.4483 0.4522 0.4562 0.4602
-0.0  0.4641 0.4681 0.4721 0.4761 0.4801 0.4840 0.4880 0.4920 0.4960 0.5000

When we look at the partial table above, we see that there is no exact entry within the table for 0.2500. The two values that are closest to 0.2500 are 0.2483 and 0.2514. We want to pick the value that is closest. We see that 0.2500 − 0.2483 = 0.0017 and 0.2514 − 0.2500 = 0.0014. Thus, 0.2514 is the closest to 0.2500. We now follow the row to the left and the column up to find the z-score associated with the probability 0.2514. This score is z = −0.67.

We now have the z-score. We need to figure out which test score will create this z-score. To do this, we plug what we know into the formula $z = \frac{x - \mu}{\sigma}$ and solve for x.

$$-0.67 = \frac{x - 420}{45}$$
$$-30.15 = x - 420$$
$$389.85 = x$$

Since scores are whole numbers, we will round this number to 390. Thus a score of 390 on the exam will be the cutoff score at which 25% of the students would have to take the review course.
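The backward lookup of Example 24 corresponds to the inverse of the cumulative area. As a sketch, assuming Python 3.8 or later, `statistics.NormalDist.inv_cdf` returns the value below which a given fraction of the data falls. It does not round z to two decimal places, so the unrounded cutoff differs slightly from the table-based value before rounding to a whole score.

```python
from statistics import NormalDist

# Placement exam scores: mean 420, standard deviation 45.
scores = NormalDist(mu=420, sigma=45)

# z-score with 25% of the standard normal area to its left.
z = NormalDist().inv_cdf(0.25)

# Solving z = (x - mu) / sigma for x gives x = mu + z * sigma.
cutoff_from_z = 420 + z * 45

# The same cutoff obtained directly from the score distribution.
cutoff = scores.inv_cdf(0.25)

print(round(z, 2), round(cutoff))
```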
