AP Statistics - Charlotte County Public Schools · Power Function Procedure 1. Graph scatterplot....

Date post: 13-Aug-2020
AP Statistics Chapter 9 Re-Expressing data: Get it Straight
Transcript
Page 1: AP Statistics - Charlotte County Public Schools · Power Function Procedure 1. Graph scatterplot. 2. Determine it is a power function (ie. not exponential). 3. Transform data to linear

AP Statistics

Chapter 9

Re-Expressing data:

Get it Straight

Objectives:

• Re-expression of data

• Ladder of powers

Straight to the Point

• We cannot use a linear model unless the relationship between the two variables is linear. Often re-expression (transformation) can save the day, straightening bent relationships so that we can fit and use a simple linear model.

• Two simple ways to re-express data are with logarithms and reciprocals.

• Re-expressions can be seen in everyday life; everybody does it.

Straight to the Point

• The relationship between fuel efficiency (in

miles per gallon) and weight (in pounds) for late

model cars looks fairly linear at first:

Straight to the Point

• A look at the residuals plot shows a problem:

Straight to the Point

• We can re-express fuel efficiency as gallons per

hundred miles (a reciprocal) and eliminate the

bend in the original scatterplot:
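The reciprocal re-expression described here is a one-line conversion. A minimal Python sketch, with a hypothetical function name and made-up mpg values used purely for illustration (they are not the slide's data):

```python
def gallons_per_100_miles(mpg):
    """Re-express fuel efficiency (miles per gallon) as gallons per 100 miles."""
    if mpg <= 0:
        raise ValueError("mpg must be positive")
    return 100.0 / mpg

# Hypothetical fuel efficiencies, not taken from the slides.
example_mpg = [20.0, 25.0, 40.0]
example_gp100 = [gallons_per_100_miles(m) for m in example_mpg]
print(example_gp100)  # [5.0, 4.0, 2.5]
```

Note that the ordering flips: the most efficient car has the largest mpg but the smallest gallons-per-100-miles value.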

Straight to the Point

• A look at the residuals plot for the new model

seems more reasonable:

Goals of Re-expression

• Goal 1: Make the distribution of a variable (as seen in its histogram, for example) more symmetric.

It’s easier to summarize the center of a symmetric distribution, and we can use the mean and standard deviation.

If the distribution is also unimodal, we can analyze it using the Normal model.

In this example, the re-expression is the log of the explanatory variable.

Goals of Re-expression

• Goal 2: Make the spread of several groups (as

seen in side-by-side boxplots) more alike, even

if their centers differ.

Groups that share a common spread are easier

to compare.

Here taking the log makes the individual boxplots

more symmetric and gives them spreads that are

more nearly equal.

Goals of Re-expression

• Goal 3: Make the form of a scatterplot more

nearly linear.

Linear scatterplots are easier to model.

By re-expressing to straighten the scatterplot relationship, we can fit a linear model and use linear techniques for the analysis.

In this example, the re-expression is the log of the response variable.

Goals of Re-expression

• Goal 4: Make the scatter in a scatterplot spread

out evenly rather than thickening at one end.

Having an even scatter is a condition of many

methods of Statistics, as we will see later.

This is closely related to Goal 2, but it often comes along with Goal 3, as seen below. Taking the log to straighten the data also evened out the spread.

The Ladder of Powers

• There is a family of simple re-expressions that

move data toward our goals in a consistent

way. This collection of re-expressions is called

the Ladder of Powers.

• The Ladder of Powers orders the effects that

the re-expressions have on data.

The Ladder of Powers

Power | Name | Comment
2 | Square of data values | Try with unimodal distributions that are skewed to the left.
1 | Raw data | Data with positive and negative values and no bounds are less likely to benefit from re-expression.
½ | Square root of data values | Counts often benefit from a square root re-expression.
“0” | We’ll use logarithms here | Measurements that cannot be negative often benefit from a log re-expression.
–1/2 | Reciprocal square root | An uncommon re-expression, but sometimes useful.
–1 | The reciprocal of the data | Ratios of two quantities (e.g., mph) often benefit from a reciprocal.

The Ladder of Powers

• The Ladder of Powers orders the effects that

the re-expressions have on data.

• How it works.

If you try taking the square root of all the values in a variable and it helps, but not enough, then move further down the ladder to the log or the reciprocal square root. Those re-expressions will have a similar, but even stronger, effect on your data.

If you go too far, you can always back up.

Remember, when you take a negative power, the direction of the relationship will change. This is OK; you can always change the sign of the response variable if you want to keep the same direction.
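One way to experiment with the ladder is a small helper that applies a chosen rung to every value. The sketch below is illustrative only: the function name `reexpress`, the `keep_direction` flag, and the sample values are assumptions, and base-10 logs are used for the "0" rung.

```python
import math

def reexpress(values, power, keep_direction=False):
    """Apply one rung of the Ladder of Powers to every value.

    power 0 stands for the logarithm (base 10 here); any other power p
    means value ** p. With keep_direction=True, results for negative
    powers are negated so the original ordering of the data is kept.
    """
    out = []
    for v in values:
        if power == 0:
            if v <= 0:
                raise ValueError("log needs positive values")
            r = math.log10(v)
        else:
            if v < 0 and power != int(power):
                raise ValueError("fractional powers need non-negative values")
            if v == 0 and power < 0:
                raise ValueError("negative powers are undefined at zero")
            r = v ** power
        if keep_direction and power < 0:
            r = -r
        out.append(r)
    return out

data = [1.0, 4.0, 100.0]                          # hypothetical values
print(reexpress(data, 0.5))                       # square-root rung
print(reexpress(data, 0))                         # log rung
print(reexpress(data, -1, keep_direction=True))   # reciprocal, sign flipped
```

Trying successive rungs (0.5, then 0, then -0.5) on the same variable and re-plotting each time mirrors the "move down the ladder, back up if you overshoot" advice.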

Plan B: Attack of the Logarithms

• When none of the data values is zero or

negative, logarithms can be a helpful ally in the

search for a useful model.

• Try taking the logs of both the x- and y-

variable.

• Then re-express the data using some

combination of x or log(x) vs. y or log(y).
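The four pairings can be compared side by side. The following is a rough sketch, assuming base-10 logs and using hypothetical data that follows y = 3x² exactly; the helper names are illustrative, and the correlation is only a first screen, not a substitute for looking at the scatterplot and residuals.

```python
import math

def correlation(xs, ys):
    """Pearson correlation r between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def compare_log_options(xs, ys):
    """Return r for each of the four x / log(x) vs. y / log(y) pairings."""
    log_x = [math.log10(x) for x in xs]
    log_y = [math.log10(y) for y in ys]
    return {
        "x vs y": correlation(xs, ys),
        "log x vs y": correlation(log_x, ys),
        "x vs log y": correlation(xs, log_y),
        "log x vs log y": correlation(log_x, log_y),
    }

# Hypothetical data that follows y = 3 * x**2 exactly (a power function).
xs = [1, 2, 4, 8, 16, 32]
ys = [3 * x ** 2 for x in xs]
results = compare_log_options(xs, ys)
for name, r in results.items():
    print(f"{name:15s} r = {r:.4f}")
```

For this made-up power-function data, the log-log pairing is the one that comes out perfectly straight.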

Plan B: Attack of the Logarithms

Multiple Benefits

• We often choose a re-expression for one

reason and then discover that it has helped

other aspects of an analysis.

• For example, a re-expression that makes a

histogram more symmetric might also

straighten a scatterplot or stabilize variance.

Why Not Just Use a Curve?

• If there’s a curve in the scatterplot, why not just

fit a curve to the data?

Why Not Just Use a Curve?

• The mathematics and calculations for “curves of

best fit” are considerably more difficult than

“lines of best fit.”

• Besides, straight lines are easy to understand.

We know how to think about the slope and the y-

intercept.

More Plan B: Modeling Nonlinear Data - Logarithms

• Two specific types of nonlinear growth.

1. Exponential function (form $y = ab^x$)

2. Power function (form $y = ax^b$)

• Equations of both forms can be transformed

into linear forms.

• Can then use linear regression to model and

analyze the transformed data.

• Can also perform an inverse transformation to

obtain a model of the original data.
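As a brief worked step (a sketch assuming base-10 logs, with $a > 0$ and $b > 0$ in the exponential form and $x > 0$ in the power form), taking the log of each side shows where the straight line comes from in each case:

```latex
\begin{align*}
y = a b^{x} &\;\Longrightarrow\; \log y = \log a + (\log b)\,x
  && \text{(linear in } x\text{: plot } \log y \text{ against } x\text{)} \\
y = a x^{b} &\;\Longrightarrow\; \log y = \log a + b \log x
  && \text{(linear in } \log x\text{: plot } \log y \text{ against } \log x\text{)}
\end{align*}
```

In the exponential case the slope of the line is $\log b$; in the power case the slope is the exponent $b$ itself.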

To Transform the Exponential Function, Use Its Inverse, the Logarithmic Function

• Properties of Logarithms
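The properties that the later derivations rely on are the product, quotient, and power rules. They are stated here as standard identities for positive $u$ and $v$ and base-10 logs, not as the slide's own wording:

```latex
\begin{align*}
\log(uv) &= \log u + \log v && \text{(product)} \\
\log\!\left(\tfrac{u}{v}\right) &= \log u - \log v && \text{(quotient)} \\
\log(u^{p}) &= p \log u && \text{(power)} \\
10^{\log u} &= u, \quad \log(10^{p}) = p && \text{(inverse relationship, base 10)}
\end{align*}
```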

Using Logarithms to Transform Data

• Logarithms can be useful in straightening a

scatterplot whose data values are greater than

zero.

• Remember, you cannot take the logarithm of a

nonpositive number.

• When you use transformed data to create a

linear model, your regression equation is not in

terms of (x,y) but in terms of the transformed variable(s) (log ŷ or log x).

Logarithm Transformations

Example: Testing for Exponential Association

• Data

View Scatterplot

• Looks like it has a curved pattern, could

possibly be an exponential relationship.

Your Turn: Is the following data exponential & if so, what is r?

Your Turn: Is the following data (Hours vs. Number) exponential & if so, what is r?

Exponential Regression Procedure

1. Verify data is exponential.

Graph scatterplot

Transform data to linear by taking the log of the

response variable.

2. Calculate the LSRL for the transformed data: $\log \hat{y} = b_0 + b_1 x$ (linear model). Analyze it using linear techniques: LSRL, $r$, $r^2$, and residuals.

3. Find the exponential model for the original data by inverse transformation of the LSRL, raising 10 to the power of each side of the LSRL equation: $\hat{y} = C \cdot 10^{kx}$ (exponential model).
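The inverse transformation in step 3 can be written out line by line. This is a sketch of the algebra only, using the symbols already present in the procedure:

```latex
\begin{align*}
\log \hat{y} &= b_0 + b_1 x \\
10^{\log \hat{y}} &= 10^{\,b_0 + b_1 x} \\
\hat{y} &= 10^{b_0} \cdot 10^{b_1 x}
\end{align*}
```

Matching this against $\hat{y} = C \cdot 10^{kx}$ gives $C = 10^{b_0}$ and $k = b_1$.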

Example: Data

Annual crude oil production from 1880 to 1970

Year | Mbbl
1880 | 30
1890 | 77
1900 | 149
1910 | 328
1920 | 689
1930 | 1,412
1940 | 2,150
1950 | 3,803
1960 | 7,674
1970 | 16,690

What to do:

1. Graph scatterplot.

2. Transform data to linear (take the log of y).

3. Calculate LSRL of transformed data & graph.

4. Analyze transformed data ($r$, $r^2$, residual plot).

5. Perform inverse transformation (exponentiate

LSRL to base 10).

6. Graph exponential model.
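The numerical steps (2 through 5) can be sketched in Python for the crude oil data. This is one possible implementation, not the calculator procedure the slides assume; the helper names `lsrl` and `predict_mbbl` are illustrative, and plotting (steps 1 and 6) is left out.

```python
import math

years = [1880, 1890, 1900, 1910, 1920, 1930, 1940, 1950, 1960, 1970]
mbbl = [30, 77, 149, 328, 689, 1412, 2150, 3803, 7674, 16690]

def lsrl(xs, ys):
    """Least-squares line ys ~ b0 + b1 * xs; returns (b0, b1, r)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    b1 = sxy / sxx
    return my - b1 * mx, b1, sxy / math.sqrt(sxx * syy)

# Step 2: transform the response with log base 10.
log_mbbl = [math.log10(y) for y in mbbl]

# Steps 3-4: fit the line to (year, log Mbbl), then look at r, r^2, residuals.
b0, b1, r = lsrl(years, log_mbbl)
residuals = [ly - (b0 + b1 * x) for x, ly in zip(years, log_mbbl)]

# Step 5: the inverse transformation gives the exponential model.
def predict_mbbl(year):
    return 10 ** (b0 + b1 * year)

print(f"log y-hat = {b0:.1f} + {b1:.4f} x,  r = {r:.4f},  r^2 = {r * r:.4f}")
print([round(e, 3) for e in residuals])
```

The residuals are on the log scale; a residual plot of these values against year is what step 4 asks you to inspect for leftover pattern.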

Back to the Data

Annual crude oil production from 1880 to 1970

Year | Mbbl
1880 | 30
1890 | 77
1900 | 149
1910 | 328
1920 | 689
1930 | 1,412
1940 | 2,150
1950 | 3,803
1960 | 7,674
1970 | 16,690

Models of Data

• Data is exponential (scatterplot curved pattern and constant common ratio ≈ 2.1)

• Linear model

$\log \hat{y} = -53.7 + 0.0294x$

• Exponential model

$\hat{y} = 10^{-53.7} \cdot 10^{0.0294x}$

Use the model stored on the calculator to make predictions, not the rounded exponential model equation.

Predict oil production for 1956.

• 6564 Mbbl

Predict oil production for 1992.

• 75027 Mbbl (extrapolation, so be careful).
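The advice to predict from the stored model rather than the printed equation can be checked numerically. The sketch below refits the line at full precision in plain Python (a stand-in for the calculator, which is an assumption here) and compares it with the coefficients rounded as printed; the function names are illustrative.

```python
import math

years = [1880, 1890, 1900, 1910, 1920, 1930, 1940, 1950, 1960, 1970]
mbbl = [30, 77, 149, 328, 689, 1412, 2150, 3803, 7674, 16690]
log_mbbl = [math.log10(y) for y in mbbl]

# Full-precision least-squares fit of log10(Mbbl) on year.
n = len(years)
mean_x = sum(years) / n
mean_ly = sum(log_mbbl) / n
slope = (sum((x - mean_x) * (ly - mean_ly) for x, ly in zip(years, log_mbbl))
         / sum((x - mean_x) ** 2 for x in years))
intercept = mean_ly - slope * mean_x

def predict_full(year):
    """Prediction from the unrounded fit."""
    return 10 ** (intercept + slope * year)

def predict_rounded(year):
    """Prediction from the coefficients rounded as printed: -53.7 and 0.0294."""
    return 10 ** (-53.7 + 0.0294 * year)

for year in (1956, 1992):
    print(year, round(predict_full(year)), round(predict_rounded(year)))
```

Because the rounded intercept sits in an exponent of 10, a change in the second decimal place shifts the prediction by a few percent, which is why the unrounded fit reproduces the predictions listed above and the rounded one does not.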

Your Turn: Exponential Regression

Models of Data

• Data is exponential (scatterplot curved pattern and constant common ratio ≈ 1.5)

• Linear Model

$\log \hat{y} = -24.11 + 0.0157x$

• Exponential Model

$\hat{y} = 10^{-24.11} \cdot 10^{0.0157x}$

Your Turn: Age vs Height

Models for Data

• Data is exponential (scatterplot curved pattern and constant common ratio ≈ 1.04)

• Linear Model

$\log \hat{y} = 1.89 + 0.0241x$

• Exponential Model

$\hat{y} = 10^{1.89} \cdot 10^{0.0241x}$

• If comparing Height vs Weight, could a common ratio be calculated?

No, because the explanatory variable Height does not increase in equal increments.

Instead, we have to calculate different models and see which best fits the data.
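The common-ratio check, including the equal-increment requirement, can be sketched as follows. The function name and the height and weight values are hypothetical; the oil figures are the ones from the earlier example.

```python
def common_ratios(xs, ys):
    """Successive ratios y[i+1] / y[i]; only meaningful for equally spaced x."""
    steps = {xs[i + 1] - xs[i] for i in range(len(xs) - 1)}
    if len(steps) != 1:
        raise ValueError("x values are not equally spaced; compare fitted models instead")
    return [ys[i + 1] / ys[i] for i in range(len(ys) - 1)]

# Crude oil data from the earlier example (years are 10 apart).
years = [1880, 1890, 1900, 1910, 1920, 1930, 1940, 1950, 1960, 1970]
mbbl = [30, 77, 149, 328, 689, 1412, 2150, 3803, 7674, 16690]
ratios = common_ratios(years, mbbl)
print([round(q, 2) for q in ratios])

# Hypothetical heights with unequal gaps: the check refuses to compute ratios.
heights = [150, 152, 157, 165]
weights = [40, 42, 47, 56]
try:
    common_ratios(heights, weights)
except ValueError as err:
    print("No common ratio:", err)
```

Ratios that hover around one value suggest exponential growth; when the x-spacing is uneven, the ratios are not comparable and the fitted models have to be compared directly.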

Transforming or Re-Expressing Power Data

Power Function Model

• Power function general form: $y = ax^b$

• When we apply the log transformation to the response variable $y$ in an exponential growth model, we produce a linear relationship. To produce a linear relationship from a power function model, we apply the log transformation to both variables ($x$ and $y$).

• Here is how it is done.

Power function model: $y = ax^b$

Take the log of both sides of the equation:

$\log y = \log(ax^b)$

Using the product and power properties of logs, this results in a linear relationship between $\log y$ and $\log x$:

$\log y = \log a + \log x^b$

$\log y = \log a + b \log x$

The power $b$ in the power function model becomes the slope of the straight line that links $\log y$ to $\log x$.
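A quick numerical check of this claim: generate values from a known power function, take logs of both variables, and confirm that the least-squares slope recovers the exponent. The parameters 2.5 and 1.7 and the x values are hypothetical choices made only for the demonstration.

```python
import math

a_true, b_true = 2.5, 1.7          # hypothetical power-function parameters
xs = [1.0, 2.0, 3.0, 5.0, 8.0, 13.0]
ys = [a_true * x ** b_true for x in xs]

log_x = [math.log10(x) for x in xs]
log_y = [math.log10(y) for y in ys]

# Least-squares slope and intercept of log y on log x.
n = len(xs)
mx, my = sum(log_x) / n, sum(log_y) / n
slope = (sum((u - mx) * (v - my) for u, v in zip(log_x, log_y))
         / sum((u - mx) ** 2 for u in log_x))
intercept = my - slope * mx

# slope should match b_true, and 10**intercept should match a_true,
# up to floating-point rounding.
print(round(slope, 6), round(10 ** intercept, 6))
```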

Inverse Transformation

• Obtaining a power function model for the

original data from the LSRL on the transformed

data.

• The LSRL will have the form:

$\log \hat{y} = a + b \log x$

• Inverse transform the LSRL by raising 10 to the power of each side of the equation.

$10^{\log \hat{y}} = 10^{a + b \log x}$

$\hat{y} = (10^a)(10^{b \log x})$

$\hat{y} = (10^a)(10^{\log x})^b$

$\hat{y} = (10^a)(x^b)$, which is in the form $y = C \cdot x^b$

• This is a power function (this step cannot be done on the calculator; it must be done by hand).

Power Function Procedure

1. Graph scatterplot.

2. Determine it is a power function (ie. not

exponential).

3. Transform data to linear (take the log of y & x).

4. Calculate LSRL of transformed data & graph.

5. Analyze transformed data ($r$, $r^2$, residual plot).

6. Perform inverse transformation (exponentiate

LSRL to base 10).

7. Graph power model.

8. Make predictions based on the power model.
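The numerical steps of this procedure (3 through 6, plus 8) can be sketched as a small Python function. Everything named here is illustrative: `fit_power_model` and `predict` are hypothetical helpers, and the data are made up to follow roughly y = 4√x with a hand-added ±2% wobble.

```python
import math

def fit_power_model(xs, ys):
    """Fit y-hat = C * x**b by least squares on (log10 x, log10 y).

    Returns (C, b, r), where r is the correlation of the transformed data.
    All x and y values must be positive.
    """
    u = [math.log10(x) for x in xs]
    v = [math.log10(y) for y in ys]
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    suu = sum((p - mu) ** 2 for p in u)
    svv = sum((q - mv) ** 2 for q in v)
    b = suv / suu                  # slope of the LSRL = power in the model
    a = mv - b * mu                # intercept of the LSRL
    return 10 ** a, b, suv / math.sqrt(suu * svv)

# Hypothetical data: roughly y = 4 * sqrt(x), with a +/-2% wobble added by hand.
xs = [1, 2, 4, 8, 16, 32]
wobble = [1.02, 0.98, 1.02, 0.98, 1.02, 0.98]
ys = [4 * math.sqrt(x) * w for x, w in zip(xs, wobble)]

C, b, r = fit_power_model(xs, ys)

def predict(x):
    """Step 8: prediction from the back-transformed power model."""
    return C * x ** b

print(f"y-hat = {C:.3f} * x^{b:.3f},  r = {r:.4f}")
print(round(predict(10), 2))
```

The graphing steps (1, 4, and 7) and the judgment call in step 2 are not covered by this sketch.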

Example 1

• The table shows the temperature of an instrument

measured as its distance from a heat source is varied.

Find a suitable model for Dist. vs Temp.

• LSRL: $\log(\text{Temp.}) = 4.84 - 0.255 \log(\text{Dist.})$

$\log \hat{y} = 4.84 - 0.255 \log x$

• Power model: $\text{Temp.} = 10^{4.84} \cdot (\text{Dist.})^{-0.255}$

$\hat{y} = 10^{4.84} \cdot x^{-0.255}$

Your Turn:

• The owner of a Video Game Store records the

business costs and revenue for different years

with the results listed. Find the best model.

• LSRL: $\log \hat{y} = 3.3 + 0.4 \log x$

• Power model: $\hat{y} = 10^{3.3} \cdot x^{0.4}$ or $\hat{y} \approx 1995\,x^{0.4}$
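The two ways of writing the power model agree once the constant is evaluated. A short check in Python, where the function name `y_hat` is illustrative and x stands for whichever variable the exercise treats as explanatory:

```python
C = 10 ** 3.3            # constant from the LSRL intercept
print(round(C))          # 1995

def y_hat(x):
    """Power model from the back-transformed LSRL: C * x**0.4."""
    return C * x ** 0.4

print(round(y_hat(1)))   # at x = 1 the prediction is just the constant: 1995
```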

What Can Go Wrong?

• Don’t expect your model to be perfect.

• Don’t stray too far from the ladder.

• Don’t choose a model based on $R^2$ alone:

What Can Go Wrong?

• Beware of multiple modes.

Re-expression cannot pull separate modes together.

• Watch out for scatterplots that turn around.

Re-expression can straighten many bent relationships, but not those that go up then down, or down then up.

What Can Go Wrong?

• Watch out for negative data values.

It’s impossible to re-express negative values by any power on the Ladder of Powers that is not a whole number, or to re-express zero values by negative powers.

• Watch for data far from 1.

Data values that are all very far from 1 may not be much affected by re-expression unless the range is very large. If all the data values are large (e.g., years), consider subtracting a constant to bring them back near 1.
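The "far from 1" warning can be seen numerically. If the log of a variable is almost perfectly linear in the variable itself, the log amounts to a rescaling and cannot change the shape of a plot. The sketch below uses calendar years and a hypothetical shift constant of 1879 (so the shifted values start at 1); the helper name is illustrative.

```python
import math

def correlation(xs, ys):
    """Pearson correlation r between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

years = list(range(1880, 1980, 10))      # 1880, 1890, ..., 1970
shifted = [y - 1879 for y in years]      # 1, 11, ..., 91 (hypothetical constant)

# How close to a straight line is log(value) as a function of value?
r_raw = correlation(years, [math.log10(y) for y in years])
r_shifted = correlation(shifted, [math.log10(s) for s in shifted])

print(f"raw years:     r = {r_raw:.6f}")      # almost exactly 1: log does nothing useful
print(f"shifted years: r = {r_shifted:.6f}")  # clearly below 1: log now bends the scale
```

After the shift, the log has room to compress the large values relative to the small ones, which is the effect a re-expression needs in order to help.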

What have we learned?

• When the conditions for regression are not met,

a simple re-expression of the data may help.

• A re-expression may make the:

Distribution of a variable more symmetric.

Spread across different groups more similar.

Form of a scatterplot straighter.

Scatter around the line in a scatterplot more

consistent.

What have we learned?

• Taking logs is often a good, simple starting point.

To search further, the Ladder of Powers or the log-log approach can help us find a good re-expression.

• Our models won’t be perfect, but re-expression can lead us to a useful model.

