Breakout Session #1: Graphical Statistics
Presented by Dr. Del Ferster
What’s in store for today?
We’ll start by doing a needs assessment.
Where do you want or need more information regarding the topics for this year’s work?
We’ll spend a bit of time looking at some “test-type” problems.
We’ll take another look at graphical statistics.
    Different types of plots
    2-column frequency tables
I know, it sounds like a blast!
Let’s do some problems!
Practice Problems: GRAPHICAL STATISTICS
Get your popcorn ready!
TODAY’S FEATURE PERFORMANCE
GRAPHICAL STATISTICS
Descriptive Statistics: Tabular and Graphical Presentations
Summarizing Qualitative Data
Summarizing Quantitative Data
Recall:
Qualitative = Essentially just a name.
Quantitative = True numerical data.
We Deal with 2 Types of Data
Numerical/Quantitative Data [Real Numbers]:
Your height
The number of people in your family
The temperature of coffee bought at McDonald’s
The score on your last math test
Qualitative/Categorical Data [Labels rather than numbers]:
Grade of a high school student [Freshman, Sophomore, Junior, Senior]
Favorite color
Political party affiliation
The part of a new automobile that breaks first
The reason you get mad at your spouse
Summarizing Qualitative Data
Frequency Distribution (shows how many)
Relative Frequency Distribution (shows what fraction)
Percent Frequency Distribution (shows what percentage)
Bar Graph
Pie Chart
The bar graph and the pie chart are graphical means of displaying any of the three distributions above.
Frequency Distribution
A frequency distribution is a tabular summary of data showing the frequency (or number) of items in each of several non-overlapping classes.
The objective is to provide insights about the data that cannot be quickly obtained by looking only at the original data.
Example: Stumble Inn
Guests staying at Stumble Inn were asked to rate the quality of their accommodations as being excellent, above average, average, below average, or poor. The ratings provided by a sample of 20 guests are shown below.
Below Average Average Above Average
Above Average Above Average Above Average
Above Average Below Average Below Average
Average Poor Poor
Above Average Excellent Above Average
Average Above Average Average
Above Average Average
Example: Stumble Inn
Frequency Distribution
Rating           Frequency
Poor                 2
Below Average        3
Average              5
Above Average        9
Excellent            1
Total               20
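The tally above can be reproduced with a few lines of Python. This is only a minimal sketch: the variable names (`ratings`, `classes`, `frequency`) are chosen here for illustration, and the list is the 20 ratings typed in the order they appear on the slide.

```python
from collections import Counter

# The 20 guest ratings from the Stumble Inn sample, in the order listed.
ratings = [
    "Below Average", "Average", "Above Average",
    "Above Average", "Above Average", "Above Average",
    "Above Average", "Below Average", "Below Average",
    "Average", "Poor", "Poor",
    "Above Average", "Excellent", "Above Average",
    "Average", "Above Average", "Average",
    "Above Average", "Average",
]

# Display order for the classes, from worst to best.
classes = ["Poor", "Below Average", "Average", "Above Average", "Excellent"]

# Count how many times each rating occurs.
counts = Counter(ratings)
frequency = {c: counts[c] for c in classes}

for c in classes:
    print(f"{c:<15}{frequency[c]:>3}")
print(f"{'Total':<15}{sum(frequency.values()):>3}")
```

Students can do the same tally by hand with tick marks; the code simply confirms that the counts add up to 20.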
Relative Frequency Distribution
The relative frequency of a class is the fraction or proportion of the total number of data items belonging to the class.
A relative frequency distribution is a tabular summary of a set of data showing the relative frequency for each class.
Percent Frequency Distribution
The percent frequency of a class is the relative frequency multiplied by 100.
A percent frequency distribution is a tabular summary of a set of data showing the percent frequency for each class.
Example: Stumble Inn
Relative Frequency and Percent Frequency Distributions
Rating           Relative Frequency   Percent Frequency
Poor                    .10                  10
Below Average           .15                  15
Average                 .25                  25
Above Average           .45                  45
Excellent               .05                   5
Total                  1.00                 100
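As a rough sketch of the arithmetic behind this table, each frequency is divided by the total (20) to get the relative frequency, then multiplied by 100 to get the percent frequency. The dictionary names below are illustrative only.

```python
# Frequencies from the Stumble Inn frequency distribution.
frequency = {
    "Poor": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}

n = sum(frequency.values())  # total number of guests (20)

# Relative frequency = class frequency / total.
relative = {c: f / n for c, f in frequency.items()}

# Percent frequency = relative frequency * 100.
percent = {c: 100 * r for c, r in relative.items()}

for c in frequency:
    print(f"{c:<15}{relative[c]:>6.2f}{percent[c]:>6.0f}")
```

The relative frequencies should always sum to 1 and the percent frequencies to 100, which is a handy check for students.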
Bar Graph
A bar graph is a graphical device for depicting qualitative data.
On the horizontal axis we specify the labels that are used for each of the classes.
A frequency, relative frequency, or percent frequency scale can be used for the vertical axis.
Using a bar of fixed width drawn above each class label, we extend the height appropriately.
The bars are separated to emphasize the fact that each class is a separate category.
Example: Stumble Inn
Bar Graph
[Figure: bar graph of the quality ratings. Horizontal axis: Rating (Poor, Below Average, Average, Above Average, Excellent). Vertical axis: Frequency (1 to 9).]
Pie Chart
The pie chart is a commonly used graphical device for presenting relative frequency distributions for qualitative data.
First draw a circle; then use the relative frequencies to subdivide the circle into sectors that correspond to the relative frequency for each class.
Since there are 360 degrees in a circle, a class with a relative frequency of .25 would consume .25(360) = 90 degrees of the circle.
Example: Stumble Inn
Pie Chart
[Figure: pie chart titled “Quality Ratings”. Sectors: Poor 10%, Below Average 15%, Average 25%, Above Average 45%, Excellent 5%.]
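To size the sectors, each class gets its share of the 360 degrees in the circle. The short sketch below applies that rule to the Stumble Inn percentages; the names `percent` and `degrees` are just illustrative.

```python
# Percent frequencies for the Stumble Inn quality ratings.
percent = {
    "Poor": 10,
    "Below Average": 15,
    "Average": 25,
    "Above Average": 45,
    "Excellent": 5,
}

# Sector angle = (percent / 100) * 360 degrees.
degrees = {c: p * 360 / 100 for c, p in percent.items()}

for c, d in degrees.items():
    print(f"{c:<15}{d:>6.0f} degrees")
```

The “Average” class, with a relative frequency of .25, comes out to 90 degrees, matching the rule stated on the Pie Chart slide.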
Example: Stumble Inn
Insights Gained from the Preceding Pie Chart
One-half of the customers surveyed gave Stumble Inn a quality rating of “above average” or “excellent” (looking at the left side of the pie). This might please the manager.
For each customer who gave an “excellent” rating, there were two customers who gave a “poor” rating (looking at the top of the pie). This should displease the manager.
Summarizing Quantitative Data
Frequency Distribution
Relative Frequency and Percent Frequency Distributions
Dot Plot
Histogram
Cumulative Distributions
Ogive
Example: RPM Auto Repair
The manager of RPM Auto Repair would like to have a better understanding of the cost of parts used in the engine tune-ups performed in the shop. He examines 50 customer invoices for tune-ups. The costs of parts, rounded to the nearest dollar, are listed on the next slide.
Sample of Parts Cost for 50 Tune-ups
 91  78  93  57  75  52  99  80  97  62
 71  69  72  89  66  75  79  75  72  76
104  74  62  68  97 105  77  65  80 109
 85  97  88  68  83  68  71  69  67  74
 62  82  98 101  79 105  79  69  62  73
Including a line in the table for every possible cost is not a good idea.
We need to categorize the data.
Example: RPM Auto Repair
Frequency Distribution
Guidelines for Selecting Number of Classes
Use between 5 and 20 classes.
Smaller data sets usually require fewer classes.
Data sets with a larger number of elements usually require a larger number of classes.
Note that the upper limit of every class is also the lower limit of the next class.
We treat the upper limit as OPEN (up to, but not including, that amount). For example, a cost of exactly $80 is counted in the 80-90 class, not the 70-80 class.
Frequency Distribution
Guidelines for Selecting Width of Classes
Use classes of equal width.
Approximate Class Width = (Largest Data Value − Smallest Data Value) / Number of Classes
For RPM Auto Repair, if we choose 6 classes:
Frequency Distribution
Approximate Class Width = (109 − 52) / 6 = 9.5,
so we'll use an interval length of 10.
Parts Cost ($)   Frequency
50-60                2
60-70               13
70-80               16
80-90                7
90-100               7
100-110              5
Total               50
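The class-width calculation and the resulting counts can be checked with the sketch below. It assumes the first class starts at 50 and that each upper limit is open, as described in the guidelines; the names `costs`, `width`, `start`, and `frequency` are illustrative.

```python
# The 50 tune-up parts costs, in dollars.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

num_classes = 6

# Approximate class width = (largest - smallest) / number of classes.
approx_width = (max(costs) - min(costs)) / num_classes

width = 10  # 9.5 rounded up to a convenient interval length
start = 50  # assumed lower limit of the first class

# Count the costs in each class; the upper limit is open (lo <= cost < hi).
frequency = {}
for i in range(num_classes):
    lo = start + i * width
    hi = lo + width
    frequency[f"{lo}-{hi}"] = sum(1 for c in costs if lo <= c < hi)

print("Approximate class width:", approx_width)
for label, f in frequency.items():
    print(f"{label:<10}{f:>3}")
```

Changing `num_classes` or `width` is a quick way to show students how the choice of classes changes the shape of the table.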
Relative Frequency and Percent Frequency Distributions
Parts Cost ($)   Relative Frequency   Percent Frequency
50-60                   .04                   4
60-70                   .26                  26
70-80                   .32                  32
80-90                   .14                  14
90-100                  .14                  14
100-110                 .10                  10
Total                  1.00                 100
For the 50-60 class: relative frequency = 2/50 = .04; percent frequency = .04(100) = 4.
Relative Frequency and Percent Frequency Distributions
For the RPM Auto Repair data, we can make the following observations.
Only 4% of the parts costs are in the $50-60 class.
30% of the parts costs are under $70.
The greatest percentage (32% or almost one-third) of the parts costs are in the $70-80 class.
10% of the parts costs are $100 or more.
Dot Plot
One of the simplest graphical summaries of data is a dot plot.
A horizontal axis shows the range of data values.
Then each data value is represented by a dot placed above the axis.
Example: RPM Auto Repair
Dot Plot
[Figure: dot plot of the 50 parts costs. Horizontal axis: Cost ($), from 50 to 110.]
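A dot plot can be approximated in plain text by stacking one dot per occurrence above each dollar value. The sketch below is one possible way to do it; the axis range of 50 to 110 and the names `counts`, `rows`, and `axis` are assumptions made for illustration.

```python
from collections import Counter

# The 50 tune-up parts costs, in dollars.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

counts = Counter(costs)         # how many times each dollar value occurs
lo, hi = 50, 110                # assumed axis range
tallest = max(counts.values())  # height of the tallest stack of dots

# Build the plot from the top row down: a dot wherever the stack reaches that level.
rows = []
for level in range(tallest, 0, -1):
    row = "".join("." if counts[v] >= level else " " for v in range(lo, hi + 1))
    rows.append(row.rstrip())

# Axis line with a tick mark at every multiple of 10.
axis = "".join("|" if v % 10 == 0 else "-" for v in range(lo, hi + 1))

print("\n".join(rows))
print(axis)
```

Each column of dots sits above one dollar value, so repeated costs show up as taller stacks.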
Histogram
Another common graphical presentation of quantitative data is a histogram.
The variable of interest is placed on the horizontal axis.
A rectangle is drawn above each class interval with its height corresponding to the interval’s frequency, relative frequency, or percent frequency.
Unlike a bar graph, a histogram has no natural separation between rectangles of adjacent classes.
Example: RPM Auto Repair
Histogram
[Figure: histogram of the parts costs. Horizontal axis: Parts Cost ($), from 50 to 110 in steps of 10. Vertical axis: Frequency, from 2 to 18.]
Cumulative Distributions
Cumulative frequency distribution: shows the number of items with values less than the upper limit of each class.
Cumulative relative frequency distribution: shows the proportion of items with values less than the upper limit of each class.
Cumulative percent frequency distribution: shows the percentage of items with values less than the upper limit of each class.
(We use “less than” rather than “less than or equal to” because we treat the upper limit of each class as open.)
Example: RPM Auto Repair
Cumulative Distributions
Cost ($)   Cumulative Frequency   Cumulative Relative Frequency   Cumulative Percent Frequency
< 60                2                        .04                              4
< 70               15                        .30                             30
< 80               31                        .62                             62
< 90               38                        .76                             76
< 100              45                        .90                             90
< 110              50                       1.00                            100
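The cumulative columns are running totals of the class frequencies. A minimal sketch, using the frequencies from the parts-cost frequency distribution (variable names are illustrative):

```python
from itertools import accumulate

# Upper class limits and class frequencies for the parts-cost data.
upper_limits = [60, 70, 80, 90, 100, 110]
frequency = [2, 13, 16, 7, 7, 5]

n = sum(frequency)  # 50 invoices in total

# Running total of the frequencies.
cum_freq = list(accumulate(frequency))

# Cumulative relative and percent frequencies.
cum_rel = [c / n for c in cum_freq]
cum_pct = [100 * c / n for c in cum_freq]

for limit, cf, cr, cp in zip(upper_limits, cum_freq, cum_rel, cum_pct):
    print(f"< {limit:<5}{cf:>4}{cr:>7.2f}{cp:>6.0f}")
```

The last cumulative frequency always equals the sample size, so the final row should read 50, 1.00, and 100.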
Exploratory Data Analysis
The techniques of exploratory data analysis consist of simple arithmetic and easy-to-draw pictures that can be used to summarize data quickly.
One such technique is the stem-and-leaf display.
Stem-and-Leaf Display
A stem-and-leaf display shows both the rank order and shape of the distribution of the data.
It is similar to a histogram on its side, but it has the advantage of showing the actual data values.
The first digits of each data item are arranged to the left of a vertical line.
To the right of the vertical line we record the last digit for each item in rank order.
Each line in the display is referred to as a stem.
Each digit on a stem is a leaf.
    8 | 5 7
    9 | 3 6 7 8
Stem-and-Leaf Display
Leaf Units
A single digit is used to define each leaf.
In the preceding example, the leaf unit was 1.
Leaf units may be 100, 10, 1, 0.1, and so on.
Where the leaf unit is not shown, it is assumed to equal 1.
Example: RPM Auto Repair
Stem-and-Leaf Display
 5 | 2 7
 6 | 2 2 2 2 5 6 7 8 8 8 9 9 9
 7 | 1 1 2 2 3 4 4 5 5 5 6 7 8 9 9 9
 8 | 0 0 2 3 5 8 9
 9 | 1 3 7 7 7 8 9
10 | 1 4 5 5 9
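The display above can be built by sorting the data and splitting each value into a stem (the tens) and a leaf (the ones digit). The following is a small sketch with a leaf unit of 1; the names `stems` and `lines` are illustrative.

```python
from collections import defaultdict

# The 50 tune-up parts costs, in dollars.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

# Group the last digit (leaf) of each value under its leading digits (stem).
stems = defaultdict(list)
for c in sorted(costs):          # sorting first keeps the leaves in rank order
    stems[c // 10].append(c % 10)

lines = [
    f"{stem:>2} | {' '.join(str(leaf) for leaf in leaves)}"
    for stem, leaves in sorted(stems.items())
]
print("\n".join(lines))
```

Because every leaf is an actual data digit, the original 50 values can be read back off the display, which a histogram does not allow.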
SPLIT STEM Stem-and-Leaf Display
If we believe the original stem-and-leaf display has condensed the data too much, we can stretch the display by using two stems for each leading digit(s).
Whenever a stem value is stated twice, the first value corresponds to leaf values of 0-4, and the second value corresponds to leaf values of 5-9.
Example: RPM Auto Repair
SPLIT STEM Stem-and-Leaf Display
 5 | 2
 5 | 7
 6 | 2 2 2 2
 6 | 5 6 7 8 8 8 9 9 9
 7 | 1 1 2 2 3 4 4
 7 | 5 5 5 6 7 8 9 9 9
 8 | 0 0 2 3
 8 | 5 8 9
 9 | 1 3
 9 | 7 7 7 8 9
10 | 1 4
10 | 5 5 9
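Splitting the stems only changes how the leaves are grouped: leaves 0-4 go on the first copy of a stem and leaves 5-9 on the second. A minimal sketch of that rule (the `rows` and `lines` names are illustrative):

```python
# The 50 tune-up parts costs, in dollars.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

# Key each row by (stem, half): half 0 holds leaves 0-4, half 1 holds leaves 5-9.
rows = {}
for c in sorted(costs):
    stem, leaf = divmod(c, 10)
    half = 0 if leaf <= 4 else 1
    rows.setdefault((stem, half), []).append(leaf)

lines = [
    f"{stem:>2} | {' '.join(str(leaf) for leaf in leaves)}"
    for (stem, half), leaves in sorted(rows.items())
]
print("\n".join(lines))
```

Note that a stem half with no data would simply be missing from this output; in a hand-drawn display it is usually shown as an empty row.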
2 Way Data Tables
Thus far we have focused on methods that are used to summarize the data for one variable at a time.
Often we are interested in tabular and graphical methods that will help understand the relationship between two variables.
2 Way Data Tables and scatter diagrams are two methods for summarizing the data for two (or more) variables simultaneously.
2 Way Data Tables
2 way data tables are used to summarize the data for two variables simultaneously.
2 way data tables can be used when:
    One variable is qualitative and the other is quantitative
    Both variables are qualitative
    Both variables are quantitative
The left and top margin labels define the classes for the two variables.
Example: Finger Lakes Homes
2 Way Data Tables
The number of Finger Lakes homes sold for each style and price for the past two years is shown below.
                        Home Style
Price Range    Colonial   Ranch   Split   A-Frame   Total
≤ $99,000         18         6      19       12       55
> $99,000         12        14      16        3       45
Total             30        20      35       15      100
Example: Finger Lakes Homes
Insights Gained from the Preceding 2 Way table
The greatest number of homes in the sample (19) are a split-level style and priced at less than or equal to $99,000.
Only three homes in the sample are an A-Frame style and priced at more than $99,000.
2 Way Tables: Row or Column Percentages
Converting the entries in the table into row percentages or column percentages can provide additional insight about the relationship between the two variables.
Example: Finger Lakes Homes
Row Percentages
                        Home Style
Price Range    Colonial   Ranch   Split   A-Frame   Total
≤ $99,000       32.73     10.91   34.55    21.82     100
> $99,000       26.67     31.11   35.56     6.67     100
Note: row totals are actually 100.01 due to rounding.
Example: Finger Lakes Homes
Column Percentages
                        Home Style
Price Range    Colonial   Ranch   Split   A-Frame
≤ $99,000       60.00     30.00   54.29    80.00
> $99,000       40.00     70.00   45.71    20.00
Total            100       100     100      100
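Row percentages divide each count by its row total; column percentages divide each count by its column total. The sketch below recomputes both from the Finger Lakes counts. The row labels and variable names are illustrative wording, not taken from the slides.

```python
styles = ["Colonial", "Ranch", "Split", "A-Frame"]

# Counts of homes sold, by price range (rows) and home style (columns).
table = {
    "$99,000 or less": [18, 6, 19, 12],
    "more than $99,000": [12, 14, 16, 3],
}

# Row percentages: each count as a percentage of its row total.
row_pct = {
    price: [round(100 * v / sum(vals), 2) for v in vals]
    for price, vals in table.items()
}

# Column percentages: each count as a percentage of its column total.
col_totals = [sum(col) for col in zip(*table.values())]
col_pct = {
    price: [round(100 * v / t, 2) for v, t in zip(vals, col_totals)]
    for price, vals in table.items()
}

print("Row percentages:   ", row_pct)
print("Column percentages:", col_pct)
```

Row percentages answer “of the homes in this price range, what share is each style?”, while column percentages answer “of the homes of this style, what share falls in each price range?”.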
A quick 2 way table problem
            Baked   Chips   Mashed   Total
Boys         ___     ___      34      100
Girls         25      37     ___      ___
Teachers      12     ___      22       50
Total        ___     104     ___      250
The table above gives the preferences for a variety of people regarding their favorite way to consume potatoes. (Yes, it’s a carbohydrate extravaganza!!)
1) How many boys liked baked?
2) How many teachers preferred chips?
3) How many girls were asked?
4) Out of the people who liked chips, how many were boys?
That was fun, let’s do another one!
This one deals with probabilities. Grab your calculator and let’s rock!
A person is picked at random from this sample.
            Baked   Chips   Mashed   Total
Boys          15      51      34      100
Girls         25      37      38      100
Teachers      12      16      22       50
Total         52     104      94      250
1) What is the probability that a person picked is a boy?
2) What is the probability that a person picked likes mashed?
3) What is the probability the person was a teacher who prefers baked potatoes?
4) What is the probability that, out of the girls, the person likes chips?
5) Out of the people who like chips, what is the probability the person is a boy?
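For checking answers, each probability is a count from the table divided by the appropriate total: the grand total (250) for questions 1 to 3, the girls' row total for question 4, and the chips column total for question 5. The sketch below uses exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

# Completed potato-preference table (rows: group, columns: preference).
table = {
    "Boys": {"Baked": 15, "Chips": 51, "Mashed": 34},
    "Girls": {"Baked": 25, "Chips": 37, "Mashed": 38},
    "Teachers": {"Baked": 12, "Chips": 16, "Mashed": 22},
}

total = sum(sum(row.values()) for row in table.values())        # 250 people
chips_total = sum(row["Chips"] for row in table.values())       # 104 chose chips
mashed_total = sum(row["Mashed"] for row in table.values())     # 94 chose mashed
girls_total = sum(table["Girls"].values())                      # 100 girls

# 1) P(boy): boys' row total over the grand total.
p_boy = Fraction(sum(table["Boys"].values()), total)

# 2) P(likes mashed): mashed column total over the grand total.
p_mashed = Fraction(mashed_total, total)

# 3) P(teacher and baked): a single cell over the grand total.
p_teacher_and_baked = Fraction(table["Teachers"]["Baked"], total)

# 4) P(chips, given girl): restrict to the girls' row.
p_chips_given_girl = Fraction(table["Girls"]["Chips"], girls_total)

# 5) P(boy, given chips): restrict to the chips column.
p_boy_given_chips = Fraction(table["Boys"]["Chips"], chips_total)

for name, p in [
    ("1) boy", p_boy),
    ("2) mashed", p_mashed),
    ("3) teacher and baked", p_teacher_and_baked),
    ("4) chips given girl", p_chips_given_girl),
    ("5) boy given chips", p_boy_given_chips),
]:
    print(f"{name:<22}{p}  (about {float(p):.3f})")
```

Questions 4 and 5 are the ones students most often miss, because the denominator changes from the grand total to a row or column total.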
Scatter Diagram
A scatter diagram is a graphical presentation of the relationship between two quantitative variables.
One variable is shown on the horizontal axis and the other variable is shown on the vertical axis.
The general pattern of the plotted points suggests the overall relationship between the variables.
Example: Panthers Football Team
Scatter Diagram
The Panthers football team is interested in investigating the relationship, if any, between interceptions made and points scored.
x = Number of Interceptions   y = Number of Points Scored
            1                             14
            3                             24
            2                             18
            1                             17
            3                             27
Example: Panthers Football Team
Scatter Diagram
[Figure: scatter diagram of the five data points. Horizontal axis (x): Number of Interceptions, 0 to 3. Vertical axis (y): Number of Points Scored, 0 to 30.]
Example: Panthers Football Team
The preceding scatter diagram indicates a positive relationship between the number of interceptions and the number of points scored.
Higher points scored are associated with a higher number of interceptions.
The relationship is not perfect; not all of the plotted points in the scatter diagram lie on a straight line.
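One informal way to see the positive relationship without drawing the diagram is to average the points scored for each number of interceptions. This grouping-by-x check is only an illustration added here, not a method from the slides, and the variable names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Panthers data: interceptions made (x) and points scored (y).
interceptions = [1, 3, 2, 1, 3]
points = [14, 24, 18, 17, 27]

# Collect the points scored for each number of interceptions.
by_x = defaultdict(list)
for x, y in zip(interceptions, points):
    by_x[x].append(y)

# Average points scored at each interception count.
avg_points = {x: mean(ys) for x, ys in sorted(by_x.items())}

for x, avg in avg_points.items():
    print(f"{x} interception(s): average points = {avg}")
```

The averages rise as the number of interceptions rises, which is the same upward pattern the scatter diagram shows.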
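Scatter Diagram
A Positive Relationship
[Figure: scatter diagram with axes x and y showing a positive relationship.]
Scatter Diagram
A Negative Relationship
[Figure: scatter diagram with axes x and y showing a negative relationship.]
Scatter Diagram
No Apparent Relationship
[Figure: scatter diagram with axes x and y showing no apparent relationship.]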
Wrapping Up
Thanks for your attention and participation.
I know it’s not easy doing this after a full day with the “munchkins”.
I hope that your year is off to a good start.
If I can help in any way, don’t hesitate to shoot me an email, or give me a call.