
STAT 211 – 019 Dan Piett West Virginia University Lecture 2.

Date post: 26-Dec-2015
Page 1: STAT 211 – 019 Dan Piett West Virginia University Lecture 2.

STAT 211 – 019 Dan Piett

West Virginia University

Lecture 2

Last Lecture
  Population/Sample
  Variable Types
    Discrete/Continuous Numeric & Ranked/Unranked Categorical
  Displaying Small Sets of Numbers
    Dot Plots, Stem and Leaf, Pie Charts
  Histograms
    Frequency/Density and Symmetric vs Right/Left Skewed
  Measures of Center
    Mean/Median

Overview
  2.3 Measures of Dispersion
  2.5 Boxplots
  3.1 Scatterplots
  3.2 Correlation
  3.3 Regression

Section 2.3

Measures of Dispersion

Descriptive Statistics
  Describing the Data
  How do we describe data?
    Graphs (Last Class)
    Measures
      Center (Last Class): Mean/Median
      Dispersion/Spread (This Class): Variance, Standard Deviation, IQR

Spread of Data
  Example: Spread
    Data 1: 8, 8, 9, 9, 10, 11, 11, 12, 12
    Data 2: -30, -20, -10, 0, 10, 20, 30, 40, 50
    Data 1 – Mean = Median = 10
    Data 2 – Mean = Median = 10
  Both have the same measure of center, but how do they differ?
  Data 2 is much more spread out.
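The two centers can be checked directly. This is a minimal sketch using Python's standard `statistics` module; the variable names `data1` and `data2` are illustrative and do not come from the slides.

```python
import statistics

# The two data sets from the slide
data1 = [8, 8, 9, 9, 10, 11, 11, 12, 12]
data2 = [-30, -20, -10, 0, 10, 20, 30, 40, 50]

for name, data in (("Data 1", data1), ("Data 2", data2)):
    mean = statistics.mean(data)
    median = statistics.median(data)
    # The smallest and largest values give a first rough look at the spread
    print(name, "mean =", mean, "median =", median,
          "min =", min(data), "max =", max(data))
```

Both data sets report the same mean and median, while the minimum and maximum already hint that Data 2 covers a much wider interval.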

Sample Standard Deviation
  Sample Standard Deviation (S) is a measure of how spread out the data is
  S can be any number >= 0
  A larger S indicates a larger spread
  The unit associated with S is the same unit as the variable
    Example: Mean of 110 lb, Standard Deviation of 10 lb
  The square of the sample standard deviation is called the sample variance

Standard Deviation Example
  Data 1 (8, 8, 9, 9, 10, 11, 11, 12, 12)
    S = 1.58
  Data 2 (-30, -20, -10, 0, 10, 20, 30, 40, 50)
    S = 27.39
  As you can see, the standard deviation of Data 2 is much larger than that of Data 1.
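The values of S above can be reproduced with a short sketch. It assumes the usual definition of the sample standard deviation (sum of squared deviations from the mean, divided by n - 1, then square-rooted). The slides do not show this formula, but it matches their numbers. The helper name `sample_sd` is hypothetical.

```python
import math
import statistics

data1 = [8, 8, 9, 9, 10, 11, 11, 12, 12]
data2 = [-30, -20, -10, 0, 10, 20, 30, 40, 50]

def sample_sd(values):
    """Sample standard deviation, assuming the n - 1 divisor."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    variance = sum(squared_deviations) / (n - 1)  # sample variance (S squared)
    return math.sqrt(variance)

for name, data in (("Data 1", data1), ("Data 2", data2)):
    s = sample_sd(data)
    # statistics.stdev is the standard-library equivalent, shown as a cross-check
    print(f"{name}: S = {s:.2f}, variance = {s ** 2:.2f}, "
          f"statistics.stdev = {statistics.stdev(data):.2f}")
```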

Population Variance/Standard Deviation
  Much like the sample mean (xbar) estimates the population mean (mu), the sample standard deviation (s) can be used to estimate the true population standard deviation (sigma), and the sample variance (s^2) can be used to estimate the population variance (sigma^2)

Linear Transformations and Changes of Scale
  By adding a constant to, or subtracting a constant from, every value in a data set
    The mean is increased/decreased by the same amount
    The median is increased/decreased by the same amount
    The standard deviation is unchanged
  By multiplying each value by a constant
    The mean is multiplied by the same amount
    The median is multiplied by the same amount
    The standard deviation is multiplied by the absolute value of that amount (S can never be negative)
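These rules can be checked numerically on Data 1. In the sketch below, the constants 5, 2, and -2 and the helper name `summarize` are arbitrary choices for illustration.

```python
import statistics

data1 = [8, 8, 9, 9, 10, 11, 11, 12, 12]

def summarize(values):
    """Return (mean, median, sample standard deviation)."""
    return (statistics.mean(values),
            statistics.median(values),
            statistics.stdev(values))

shifted = [x + 5 for x in data1]   # add a constant to every value
scaled = [x * 2 for x in data1]    # multiply every value by a positive constant
flipped = [x * -2 for x in data1]  # multiply every value by a negative constant

for label, values in (("original", data1), ("x + 5", shifted),
                      ("2x", scaled), ("-2x", flipped)):
    mean, median, sd = summarize(values)
    print(f"{label:>8}: mean = {mean}, median = {median}, S = {sd:.2f}")
```

The last case shows why the standard deviation scales by the absolute value of the constant: the mean and median change sign, but S stays positive.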

Section 2.5

Boxplots

Quartiles
  Quartiles are numbers that partition the data into 4 subgroups (i.e., 4 quarters in a dollar)
  Q1
    The value separating the lowest 25% of the data values from the rest
  Q2, aka the Median
    The value separating the lowest 50% of the data values from the rest
  Q3
    The value separating the lowest 75% of the data values from the rest
  Q4, aka the Maximum
    The largest data value

Quartiles Example
  You can think of Q1 as the median of the bottom half of the data and Q3 as the median of the top half of the data
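The "median of each half" idea can be sketched as follows. The helper name `quartiles` is hypothetical, and the sketch assumes that when the number of values is odd, the middle value is left out of both halves. The slides do not state this convention, and other conventions exist.

```python
import statistics

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves approach.

    Assumes at least two values; for an odd count, the middle value
    is excluded from both halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]    # bottom half of the data
    upper = ordered[-half:]   # top half of the data
    return (statistics.median(lower),
            statistics.median(ordered),
            statistics.median(upper))

data1 = [8, 8, 9, 9, 10, 11, 11, 12, 12]
data2 = [-30, -20, -10, 0, 10, 20, 30, 40, 50]

print("Data 1 (Q1, median, Q3):", quartiles(data1))
print("Data 2 (Q1, median, Q3):", quartiles(data2))
```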

Interquartile Range (IQR)
  The IQR is another measure of spread, much like S.
  A larger IQR indicates more spread-out data
  IQR is calculated as Q3 - Q1
  Example
    Data 1 (8, 8, 9, 9, 10, 11, 11, 12, 12)
      IQR = 11.5 - 8.5 = 3
    Data 2 (-30, -20, -10, 0, 10, 20, 30, 40, 50)
      IQR = 35 - (-15) = 50
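As a cross-check, the IQR can also be computed with the standard library's `statistics.quantiles`. Software packages use different quartile conventions and may disagree slightly; for these two data sets, Python's default method reproduces the values above. The helper name `iqr` is hypothetical.

```python
import statistics

data1 = [8, 8, 9, 9, 10, 11, 11, 12, 12]
data2 = [-30, -20, -10, 0, 10, 20, 30, 40, 50]

def iqr(values):
    """Interquartile range, Q3 - Q1, using statistics.quantiles defaults."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

print("Data 1 IQR:", iqr(data1))
print("Data 2 IQR:", iqr(data2))
```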

Boxplots
  Boxplots are a graphical representation of the quartiles.

Using IQR to Find Potential Outliers
  One method to find potential outliers is as follows:
    1. Find the IQR
    2. Add 1.5*IQR to Q3
       Anything larger than this value can be flagged as a potential outlier
    3. Likewise, subtract 1.5*IQR from Q1
       Anything smaller than this value can be flagged as a potential outlier
  Example
    Data 1 (8, 8, 9, 9, 10, 11, 11, 12, 12)
    Data 2 (-30, -20, -10, 0, 10, 20, 30, 40, 50)
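The three steps can be sketched in a few lines. The helper names `outlier_fences` and `potential_outliers` are hypothetical, the quartiles come from `statistics.quantiles` with its default method, and the extra value 25 is made up purely to show a flagged point.

```python
import statistics

def outlier_fences(values):
    """Return (lower fence, upper fence) = (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def potential_outliers(values):
    """Return the values that fall outside the fences."""
    low, high = outlier_fences(values)
    return [x for x in values if x < low or x > high]

data1 = [8, 8, 9, 9, 10, 11, 11, 12, 12]
data2 = [-30, -20, -10, 0, 10, 20, 30, 40, 50]

for name, data in (("Data 1", data1), ("Data 2", data2)):
    print(name, "fences:", outlier_fences(data),
          "flagged:", potential_outliers(data))

# Hypothetical: append an unusually large value to Data 1
print("Data 1 plus 25, flagged:", potential_outliers(data1 + [25]))
```

Under these assumptions, neither original data set has any value outside its fences, while the made-up value 25 is flagged.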

Section 3.1

Scatterplots

Bivariate Data
  Bivariate data is data consisting of two variables measured on the same individual
  Examples
    Height and Weight
    Classes skipped and GPA
  Graphed using a scatterplot

Scatterplot Example

Section 3.2

Correlation

Pearson Correlation Coefficient
  We have discussed ways to describe data of one variable. This section will discuss how to describe two variables measured on the same individual together.
  The correlation coefficient, r, is a measure of the strength of a linear (straight line) relationship between bivariate data. (You will not need to know the formula for r)
  To say two variables are correlated is to say that an increase/decrease in one corresponds to an increase/decrease in the other.

More on r
  r can take on values between -1 and 1
  The strength of the correlation depends on how close r is to the extreme values of -1 or 1
    r = -.78 is a stronger correlation than r = .50
  There are three types of correlation
    Positive
    Negative
    No Correlation
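The course does not require the formula for r, but a short sketch can make the sign and strength concrete. It assumes the standard Pearson formula. The helper name `pearson_r` and all of the temperature and sales figures are hypothetical, invented only for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired values xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, not taken from the lecture
temperature = [60, 65, 70, 75, 80, 85]
ice_cream_sales = [20, 24, 31, 33, 41, 45]
hot_chocolate_sales = [44, 40, 33, 30, 22, 18]

print("temperature vs ice cream:    ",
      round(pearson_r(temperature, ice_cream_sales), 3))
print("temperature vs hot chocolate:",
      round(pearson_r(temperature, hot_chocolate_sales), 3))

# Strength is judged by distance from 0, ignoring the sign
print("Is r = -.78 stronger than r = .50?", abs(-0.78) > abs(0.50))
```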

Positive Correlation
  Positive correlation exists when r is between 0 and 1.
  The closer r is to 1, the stronger the relationship
  This implies that as one of the variables increases, the other one tends to increase as well.
  Examples:
    Height and Weight, Temperature and Ice Cream Sales

Negative Correlation
  Negative correlation exists when r is between -1 and 0.
  The closer r is to -1, the stronger the relationship
  This implies that as one of the variables increases, the other one tends to decrease.
  Example:
    Temperature and Hot Chocolate Sales

No Correlation
  No correlation exists when r is approximately 0
  This implies that as one of the variables increases, the other one shows no consistent linear tendency to increase or decrease
  Example:
    Temperature and Cookie Sales

Interpretation of r
  Although we may find that two variables are correlated, this does not mean that there is necessarily a causal relationship.
  Example:
    High school teachers who are paid less tend to have students who do better on the SATs than teachers who are paid more. It has been found that there is a negative correlation between teacher salary and students' SAT scores. Therefore we should pay our teachers less so students score higher.
  Clearly this is not a causal relationship. There is likely a third variable that explains this. One possibility may be the age of the teacher.

Section 3.3

Regression

Regression Intro
  Once we have decided that two variables are correlated, we can use the value of one of the variables, “x”, to predict the value of the other variable, “y”.
  Example:
    Use height (x) to predict weight (y)
    Use temperature (x) to predict ice cream sales (y)
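Predicting y from x with a straight line can be sketched as below. This assumes the usual least-squares line, which the recovered slide text does not spell out. The helper names `fit_line` and `predict` and the small data set are hypothetical.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(intercept, slope, x):
    """Predicted y-value on the line for a given x."""
    return intercept + slope * x

# Hypothetical paired data, not taken from the lecture
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

intercept, slope = fit_line(xs, ys)
print(f"y-hat = {intercept:.2f} + {slope:.2f} * x")
print(f"prediction at x = 6: {predict(intercept, slope, 6):.2f}")
```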

Regression Equation

Calculating a Regression Equation Given the Slope and Intercept

Plotting a Regression Line

Notes on Regression Lines

Residuals
  A residual is the distance between a point (observed y-value) and the regression line (predicted y-value)
  Formula: Observed Value – Predicted Value
  Using the Cholesterol Example:
    For TV Hours = 3, our predicted value was 212.2
    The actual value on the graph is 220.
    The residual for this particular point is 220 - 212.2 = 7.8
  A residual may be positive or negative
  The interpretation is that the observed y-value is 7.8 units larger than the predicted y-value for TV Hours = 3
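The residual arithmetic is a single subtraction, sketched here. The function name `residual` is hypothetical, and the second observed value (200) is made up only to show a negative residual.

```python
def residual(observed, predicted):
    """Residual = observed value - predicted value."""
    return observed - predicted

# Cholesterol example from the slide: TV Hours = 3
print(round(residual(220, 212.2), 1))  # observed point lies above the line

# Hypothetical observation below the line gives a negative residual
print(round(residual(200, 212.2), 1))
```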

