Announcements

Post on 04-Jan-2016


Announcements

• Exams returned at end of class

• Average = 78

• Standard Dev = 12

• Key with explanations will be posted

• Don’t be discouraged: First test is often hardest

• Have been focused on categorical data & proportions

• Next segment of course will focus on numerical data & means

• Today we discuss summary stats and graphs for numerical data

Numerical Data

• Numerical data can be continuous or discrete

• Discrete data is restricted by its nature to certain values, usually counts

• Continuous data could conceptually be measured to more and more decimal places

Examples of Discrete Data

• Number of people

• Litter size for animal births

• Number of days with rain

Examples of Continuous Data

• Temperature (not just 87, but 87.3 degrees)

• Time (not just 10 seconds, but 10.58 sec)

• Weight (not just 5 lbs, but 5.3 lbs)

Summaries of Numerical Data

• Numerical data is summarized by a measure of “center” and a measure of “spread”

• There are two pairs of these measures

• Mean (center) and standard deviation (spread)

• Median (center) and SIQR (spread)

• SIQR = Semi-interquartile range

Mean and Standard Deviation

• The mean is the average. To compute the mean, add up all the values and divide by the number of observations.

• The standard deviation is a measure of spread. To compute it, subtract the mean from each value (the results are called deviations). Square the deviations, total them, divide by n-1, and take the square root.

Example 1

• Observations: 50, 63, 72, 84, 91

• Mean = (50+63+72+84+91)/5 = 72

• Deviations = (50-72) = -22, … , (91 -72) = 19

• Deviations Squared = 484, … , 361

• Total of above = 484 + … + 361 = 1070

• Total/(5-1) = Total/4 = 267.5

• Square root of 267.5 ≈ 16.36

• Standard Deviation ≈ 16.36
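The steps of Example 1 can be retraced in a few lines of Python. This is only a sketch of the recipe on the slide; the variable names are chosen here for illustration and do not come from the course software.

```python
import math

observations = [50, 63, 72, 84, 91]

# Mean: add up all the values and divide by the number of observations.
n = len(observations)
mean = sum(observations) / n

# Deviations: subtract the mean from each value, then square each one.
deviations = [x - mean for x in observations]
squared = [d ** 2 for d in deviations]

# Total the squares, divide by n - 1, and take the square root.
total = sum(squared)
variance = total / (n - 1)
sd = math.sqrt(variance)

print(mean, total, variance, round(sd, 2))  # 72.0 1070.0 267.5 16.36
```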

Example 2

• Observations: 69, 71, 72, 72, 76

• Mean = (69+71+72+72+76)/5 = 72

• Standard Deviation ≈ 2.55

• This data set has the same mean as example 1, but less variability. Thus, it has a lower “spread” or standard deviation.
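As a quick check on the comparison, Python's standard `statistics` module can compute both summaries directly. The list names below are made up for this sketch; `statistics.stdev` divides by n-1, which matches the recipe given earlier.

```python
import statistics

example_1 = [50, 63, 72, 84, 91]
example_2 = [69, 71, 72, 72, 76]

for name, data in [("Example 1", example_1), ("Example 2", example_2)]:
    # statistics.stdev is the sample standard deviation (n - 1 divisor).
    center = statistics.mean(data)
    spread = statistics.stdev(data)
    print(name, center, round(spread, 2))
```

Both data sets report a mean of 72, while the second standard deviation is much smaller than the first.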

Median and SIQR

• The median is the middle of the sorted data. One half of the data is higher than the median and one half is below.

• SIQR = (upper quartile - lower quartile)/2

• The lower quartile is the value so that one fourth of the data is below it and three fourths of the data is above it.

• The upper quartile is the value so that three fourths of the data is below it and one fourth of the data is above it.

Example 1 revisited

• Observations: 50, 63, 72, 84, 91

• Median = 72

• Lower Quartile = 63

• Upper Quartile = 84

• SIQR = (84 - 63)/2 = 10.5

Example 2 revisited

• Observations: 69,71,72,72,76

• Median = 72

• Lower Quartile = 71

• Upper Quartile = 72

• SIQR = (72 - 71)/2 = 0.5

• Again, the two data sets have the same “center” but different “spreads”
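The median and SIQR for both examples can be sketched in Python as follows. One assumption to flag: the slides do not say which quartile convention they use, and textbooks differ. The `"inclusive"` method of `statistics.quantiles` happens to reproduce the quartiles shown here, so it is used below; the helper name `siqr` is made up for this sketch.

```python
import statistics

def siqr(data):
    """Semi-interquartile range: (upper quartile - lower quartile) / 2."""
    # Assumption: "inclusive" quartiles; other conventions give other values.
    lower_q, _median, upper_q = statistics.quantiles(data, n=4, method="inclusive")
    return (upper_q - lower_q) / 2

example_1 = [50, 63, 72, 84, 91]
example_2 = [69, 71, 72, 72, 76]

print(statistics.median(example_1), siqr(example_1))  # 72 10.5
print(statistics.median(example_2), siqr(example_2))  # 72 0.5
```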

Making the comparison

Mean and SD

• Sensitive to outliers

• Sampling distributions are easily found

Median and SIQR

• Robust to outliers

• Sampling distributions are difficult to find

Therefore, we will use the mean and standard deviation for “well behaved” data and we will use the median and SIQR when we have outliers.

Sensitivity vs. Robustness

• Observations: 50, 63, 72, 84, 91

• Mean = 72, SD ≈ 16.36

• Median = 72, SIQR = 10.5

• New Observation = 24

• New Mean = 64, New SD = 24.45

• New Median = 67.5, New SIQR = 13.875

• The mean and SD were more heavily affected by the outlier than the median and SIQR.
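The before-and-after numbers on this slide can be reproduced with a short Python sketch. The helper names `siqr` and `summarize` are invented for illustration, and the quartiles again assume the `"inclusive"` convention of `statistics.quantiles`, which agrees with the values shown.

```python
import statistics

def siqr(data):
    """Semi-interquartile range using "inclusive" quartiles (an assumption)."""
    lower_q, _median, upper_q = statistics.quantiles(data, n=4, method="inclusive")
    return (upper_q - lower_q) / 2

def summarize(data):
    """Return both center/spread pairs for one data set."""
    return {
        "mean": statistics.mean(data),
        "sd": statistics.stdev(data),
        "median": statistics.median(data),
        "siqr": siqr(data),
    }

original = [50, 63, 72, 84, 91]
with_outlier = original + [24]  # the new observation from the slide

before = summarize(original)
after = summarize(with_outlier)
for key in before:
    print(f"{key}: {before[key]:.3f} -> {after[key]:.3f}")
```

The mean moves by 8 points and the SD grows by about 8, while the median moves by 4.5 and the SIQR grows by about 3.4.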

Sampling Distributions

• As we move forward, we will see that the sample mean is approximately normally distributed when the sample is large enough, and that the t-distribution describes the sample mean when the standard deviation has to be estimated from the sample

• Finding the sampling distributions for the sample median and SIQR is more involved, and will not be covered in this course.
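A small simulation can give a feel for what a sampling distribution of the mean looks like. Everything in this sketch is an assumption made for illustration: the population (exponential with mean 1 and standard deviation 1), the sample size of 30, the 2000 repetitions, and the fixed seed. None of it comes from the course.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

def sample_means(n, repetitions):
    """Draw `repetitions` samples of size n and return each sample's mean."""
    means = []
    for _ in range(repetitions):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

means = sample_means(n=30, repetitions=2000)

# The sample means should cluster near the population mean (1.0) with a
# spread close to 1 / sqrt(30), roughly 0.18.
print(round(statistics.fmean(means), 2), round(statistics.stdev(means), 2))
```

Even though the population here is skewed, the collection of sample means is far more symmetric and tightly clustered than the raw data.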

Summary Graphs

• Stem-and-leaf chart

• Histogram

• Box plot

Exam Grades: Stem-and-leaf plot

9 | 5677

9 | 001123444

8 | 566666778889

8 | 00001112222233444

7 | 55668999999

7 | 0011223344

6 | 567777777788899

6 | 000123334

5 | 6799

5 | 14

4 | 8

• The stems are the first digit of the grade and placed to the left of the line

• The leaves are the second digit of the grade and placed to the right of the line

• Each grade is represented

• Example: There are three 81’s
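The layout rules above can be sketched in Python. This is not the course software; the function name is invented, and the data are the ten observations from Examples 1 and 2 rather than the exam grades. As in the display above, each stem is split into two rows (leaves 5-9 above leaves 0-4) and the highest stem comes first. Rows with no leaves are skipped in this sketch.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return stem-and-leaf rows for two-digit integers, highest stem first."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        # Key on (stem, upper half?) so each stem gets up to two rows.
        rows[(stem, leaf >= 5)].append(str(leaf))
    return [
        f"{stem} | {''.join(leaves)}"
        for (stem, _upper), leaves in sorted(rows.items(), reverse=True)
    ]

grades = [50, 63, 72, 84, 91, 69, 71, 72, 72, 76]
for row in stem_and_leaf(grades):
    print(row)
```

The row `7 | 1222` shows one 71 and three 72's, so every individual value can still be read back from the plot.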

Histogram: Section 506

• Histogram is a bar chart

• More aesthetic than a stem-and-leaf

• Cannot reconstruct the data set from a histogram
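A text-only sketch of a histogram shows why the original data cannot be recovered: only the count in each bin survives. The bin width of 10 and the function name are assumptions made for this illustration, and the data are the ten observations from Examples 1 and 2.

```python
from collections import Counter

def histogram_counts(values, width=10):
    """Count how many values fall in each bin [start, start + width)."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))

grades = [50, 63, 72, 84, 91, 69, 71, 72, 72, 76]
width = 10
for start, count in histogram_counts(grades, width).items():
    # One '#' per observation in the bin.
    print(f"{start}-{start + width - 1}: {'#' * count}")
```

The bar for 70-79 has five marks, but nothing in the output says whether those were 71, 72, 72, 72, 76 or any other five values in that range.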

Box-plots

• Useful for comparing groups

• Center line is median

• Top of box is upper quartile

• Bottom of box is lower quartile

[Figure: two side-by-side box plots, each labeled from top to bottom with Max, Upper Q., Median, Lower Q., and Min]
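The five values a box plot draws can be computed with a short Python sketch. The function name is invented here, and the quartiles assume the `"inclusive"` convention of `statistics.quantiles`, since the slides do not state which convention the plotting software uses.

```python
import statistics

def five_number_summary(data):
    """Return the min, lower quartile, median, upper quartile, and max."""
    lower_q, median, upper_q = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "min": min(data),
        "lower_q": lower_q,
        "median": median,
        "upper_q": upper_q,
        "max": max(data),
    }

example_1 = [50, 63, 72, 84, 91]
example_2 = [69, 71, 72, 72, 76]

# Printing the summaries side by side mirrors a side-by-side box plot.
print(five_number_summary(example_1))
print(five_number_summary(example_2))
```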

More On Boxplots

• Same data sets as before, but a zero was added to each

• Outliers are represented as points

• Definition of outlier is based on the quartiles and the SIQR

[Figure: two side-by-side box plots, each labeled with Max, Upper Q., Median, Lower Q., Lowest Non-outlier, and Min; the outlier is drawn as a separate point]
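The slides say the outlier definition is based on the quartiles and the SIQR but do not give the exact rule. The sketch below assumes the common convention of flagging any value more than 3 × SIQR (that is, 1.5 × IQR) beyond a quartile; the multiplier, the function name, and the `"inclusive"` quartile convention are all assumptions, so the plotting software may draw its points differently.

```python
import statistics

def find_outliers(data, multiplier=3.0):
    """Return (outliers, lowest non-outlier) under an assumed fence rule."""
    lower_q, _median, upper_q = statistics.quantiles(data, n=4, method="inclusive")
    siqr = (upper_q - lower_q) / 2
    # Assumed rule: fences sit multiplier * SIQR beyond each quartile.
    low_fence = lower_q - multiplier * siqr
    high_fence = upper_q + multiplier * siqr
    outliers = sorted(x for x in data if x < low_fence or x > high_fence)
    non_outliers = [x for x in data if low_fence <= x <= high_fence]
    return outliers, min(non_outliers)

example_1_with_zero = [50, 63, 72, 84, 91, 0]
print(find_outliers(example_1_with_zero))  # ([0], 50)
```

For Example 1 with a zero added, the zero is flagged and 50 becomes the lowest non-outlier. Under this assumed rule, adding a zero to Example 2 flags both 0 and 76, because that data set's quartiles are so close together.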

Grades and Curving

Why I don’t curve

• Low scores indicate a problem to be addressed: learning is not happening

• Curving does not encourage learning, it is a cheap fix for low grades

What I do instead

• Sometimes I offer exam corrections

• Other times I offer additional bonus assignments

• This time: a bonus assignment will be offered

A Tale of Two Aggies

John

• In my Fall 99 class

• First Exam: D

• Good HW & Quiz

• Made office visits

• Grades improved

• Class grade: A

• I did not curve

Sarah

• In my Fall 99 class

• First Exam: D

• Skipped HW & Quiz

• Never came by office

• Class grade: F

• Whined

• I did not curve

Bonus E: Election Coverage

• Give a statistical critique of election coverage of next week’s debate

• If you can’t watch debate, you may use a magazine or newspaper (include copy)

• Clarity: 2 points

• Validity: 2 points

• Brevity: 2 points

• Typed on paper: due Oct. 24

How to make a stem-and-leaf

• Click the Editor button

• Enter data in columns

• Click Close button

• Go to Graphs: One Variable: Stem-and-Leaf

• Select the variable of interest

• Click OK

How to make a histogram

• After entering data, go to Graphs: One Variable: Histogram: Continuous Variable

• Select variable of interest

• Set desired options

• Click OK

How to make box-plots

• Go to Graphs: Comparison of Variables: Box Plot Comparison

• Select all variables of interest (makes side-by-side box plots)

• Click OK.