Go into “play mode” to view slideshow as intended. Hold down SHIFT and click F5.

Scholarship: Statistics and Modelling Performance Standard 93201

Outcome description The student will demonstrate the ability to apply mathematical, statistical and probability knowledge and methods to complex problems in contexts which may be unfamiliar, interpret and, where appropriate, generalise results and clearly communicate concepts and findings.


Regression & Curve-fitting

SKILLS: Selecting the best model(s) in situations that are not clear-cut, justifying your decision, and report-writing. Do past Scholarship questions that involve this. Get others to read your comments and discuss (e.g. teacher, peers doing Scholarship).

INFO:
- Describing shape, strength & direction.
- Regression and the method of Least Squares.
- R² and r
- Testing for a Linear Model – Residual plots.
- Testing for an Exponential Model (log-linear transformation)
- Testing for a Power Function (log-log transformation)
- Justifying choice of model
- Using Piece-wise functions (key in Schol.)


Glossary

Explanatory (independent) Variable:

A quantitative variable that we control the value of (plotted on x-axis).

“Control” variable.

Response (dependent) Variable:

Variable whose value we do not control, but only observe (plotted on y-axis).

We investigate how it changes as we alter the value of the explanatory variable

Linear relationship: Data best explained by a straight regression line (linear-regression model).

Non-linear relationship: Data best explained by a curve (non-linear model). Investigate a power curve, exponential curve, quadratic, polynomial, or logarithmic.

Correlation: The relationship between two continuous quantitative variables – tells us NOTHING about cause and effect.

Regression: The process of fitting a line (regression line) or curve to a scatterplot and using its equation for the purpose of ESTIMATING the value of y (response variable), based on a given value of x (explanatory variable).


Correlation Coefficient (r): A measure of the strength of a linear relationship between two variables. Range: -1 ≤ r ≤ 1.

• If r is positive: positive relationship – as x gets larger, y gets larger.

• If r is negative: negative relationship – as x gets larger, y gets smaller.

Coefficient of Determination (R²): Measures the percentage of variability in y (response variable) that is accounted for by the regression line.

Investigating Causation: Investigating the cause of the relationship observed between two variables.

Confounding: When the relationship between the 2 variables being studied is being influenced by another variable(s).

Direct causal relationship: X → Y or Y → X

Indirect link: Relationship is “Confounded” by the presence of a THIRD “Lurking” variable:

• X → m → Y (mediating variable)
• Z → X and Z → Y (common cause)
• X1 and X2 → Y (multiple causal factors)

No significant relationship: When one cannot rule out the possibility that there is no actual relationship between two variables in the population.

Glossary

Reliability: Confidence in estimates made using the model.

If there is not much variability (data-points close to line/curve), then a valid model should allow us to make reliable estimates.

If there is large variability (points spread out), then we could not make very reliable estimates, even with the best possible model.

Validity:

• Validity of your model and its estimates: The degree to which your mathematical model reflects the behaviour of the relationship between the 2 variables.

• Validity of a claim or conclusion: Accuracy of a person’s claim or conclusion based on the evidence of actual data.


Scatter Plots

• The scatter plot is the basic tool used to investigate relationships between two quantitative variables.

• It is used for paired (bi-variate) data.

Types of Variables

• Quantitative (numerical): Discrete or Continuous

• Qualitative (categories)

What do I look for in scatter plots?

Shape

Do you see
– a linear relationship (straight line)
OR
– a non-linear relationship?

Do you see
– a positive relationship (as one variable gets bigger, so does the other)
OR
– a negative relationship (as one variable gets bigger, the other gets smaller)?

Scatter

Do you see
– a strong relationship (little scatter)
OR
– a weak relationship (lots of scatter)?

Do you see
– constant scatter (roughly the same amount of scatter as you look across the plot)
OR
– non-constant scatter (the scatter looks like a “fan” or “funnel”)?

What do I look for in scatter plots? Unusual features

Do you see
– any outliers (unusually far from the trend)?
– any groupings?

Rank these relationships from strongest (1) to weakest (4):

[Figure: four scatter plots. Ranks shown on the answer slide, in plot order: 3, 4, 1, 2]


• How did you make your decisions?

• The less scatter there is about the trend line, the stronger the relationship is.

Describing scatterplots

• Relationship – linear or non linear

• Relationship – positive or negative

• Relationship – strong or weak

• Scatter – constant or non constant

• Unusual features – outliers or groupings

What do I see in these scatter plots?

• There appears to be a linear trend.

• There appears to be moderate constant scatter about the trend line.

• Negative relationship.

• No outliers or groupings visible.

[Figure: Mean January Air Temperatures for 30 New Zealand Locations. Temperature (°C) against Latitude (°S)]


• There appears to be a non-linear trend.

• There appears to be non-constant scatter about the trend line.

• Positive relationship.

• One possible outlier (Large GDP, low % Internet Users).

[Figure: % of population who are Internet Users vs GDP per capita for 202 Countries. Internet Users (%) against GDP per capita (thousands of dollars)]


• Two non-linear trends (Male and Female).

• Very little scatter about the trend lines.

• Negative relationship until about 1970, then a positive relationship.

• Gap in the data collection (Second World War).

[Figure: Average Age New Zealanders are First Married. Age against Year, 1930 to 1990]

Regression

What is Regression?

Regression is when you fit a line or curve to a scatter plot for the purpose of PREDICTING the value of y (response variable), based on a given value of x (explanatory variable).

Difference between Correlation and Regression:

Correlation: Measures degree of association.

Regression: Uses relationship to predict.

Unless the relationship is perfect, there is some variation of the observed y values from those predicted by the fitted line. We call these prediction errors ‘Residuals’.

The aim of regression is to fit a line that keeps the residuals as small as possible.

Regression

[Figure: the line y = 5 + 2x with the data point (8, 25). At x = 8 the line predicts 21, so the prediction error is 25 - 21 = 4.]

Regression relationship = trend + scatter

Observed value = predicted value + prediction error
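The decomposition above can be checked with a few lines of Python. This is only a sketch: the helper name `predict` is hypothetical, and the numbers are the ones from the slide (the line y = 5 + 2x and the data point (8, 25)).

```python
# Line from the slide: predicted y = 5 + 2x; observed data point (8, 25).
def predict(x, intercept=5.0, slope=2.0):
    """Predicted value from the fitted line (hypothetical helper name)."""
    return intercept + slope * x

x_obs, y_obs = 8, 25
y_hat = predict(x_obs)        # value on the line at x = 8
residual = y_obs - y_hat      # prediction error = observed - predicted

# Observed value = predicted value + prediction error
print(f"predicted = {y_hat}, residual = {residual}, check = {y_hat + residual}")
```

A positive residual means the point sits above the line; a negative one means it sits below.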

The Least Squares Regression Line

• Choose the line with smallest sum of squared prediction errors.

Minimise the sum of squared prediction errors:

$$\sum (\text{residuals})^2 = \sum (y - \hat{y})^2$$

Which line?

Fitting the best regression line: The method of LEAST SQUARES

Take the data that we collected at the beginning of the year on this class: Forearm length (explanatory variable) vs height (response variable).

Below is the scatter plot that we made using this data.

[Figure: Scatter plot of Radius length vs Height for Y13 Stats students at STC, 2010. Height (cm) against Radius length (cm), with fitted line y = 2.2118x + 116.76]

There is always one unique regression line with the best possible fit. It has 3 characteristics:

1 The sum of the residuals will be very close to zero (positives & negatives cancel out).

2 The sum of the squared residuals will be minimised (hence called the ‘least squares’ regression line).

3 The ‘mean’ point will lie on the line.


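Table columns:
x = Length of forearm (cm)
y = Height (cm)
ŷ = height predicted by regression line, ŷ = 2.2118x + 116.76
Prediction error (residual) = y - ŷ
(Residual)²

x     | y     | ŷ | y - ŷ | (y - ŷ)²
27.4  | 178.2 |   |       |
29    | 180   |   |       |
27    | 183   |   |       |
30    | 181   |   |       |
28.5  | 175   |   |       |
30.5  | 190.5 |   |       |
30.5  | 178   |   |       |
28.9  | 179.1 |   |       |
30.5  | 187   |   |       |
29    | 181   |   |       |
31.5  | 191.5 |   |       |
29.5  | 174   |   |       |
29.5  | 184   |   |       |
TOTALS:       |   |       |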

(1.) What is the sum of all your residuals? _________ . For the line of best fit, this should be very close to zero. Why does it not have to come out as exactly zero?

_________________________________________________________

(2.) Find the mean values of the x and y variables: x̄ and ȳ.

x̄ = ____________ , ȳ = ______________ .

Does the point (x̄, ȳ) lie on your regression line?

Test by substituting the value of x̄ for x in your regression equation. What value does it predict for y? Is it the same value as ȳ (the mean y value)? _______

(3.) The definition of the least squares regression line is that it minimises the sum of the squared residuals. Use Excel to test whether this is true by altering the values of the gradient and y-intercept in the regression equation both up and down. Watch how the total at the bottom of the (residuals)² column changes. Is this really the least squares line? Explain.

____________________________________________
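The three checks in this worksheet can also be sketched in Python instead of Excel. This is an illustration under stated assumptions: it uses NumPy's `polyfit` to fit its own least squares line to the 13 rows of the table, so the fitted coefficients need not match the slide's ŷ = 2.2118x + 116.76 if that line came from a different set of rows. The helper name `sse` is hypothetical.

```python
import numpy as np

# The 13 (forearm, height) rows from the worksheet table.
x = np.array([27.4, 29, 27, 30, 28.5, 30.5, 30.5, 28.9, 30.5, 29, 31.5, 29.5, 29.5])
y = np.array([178.2, 180, 183, 181, 175, 190.5, 178, 179.1, 187, 181, 191.5, 174, 184])

# Least squares line fitted to these rows (degree-1 polynomial).
slope, intercept = np.polyfit(x, y, 1)

def sse(m, c):
    """Sum of squared residuals for the line y = m*x + c (hypothetical helper)."""
    return float(np.sum((y - (m * x + c)) ** 2))

# (1) Residuals sum to (almost) zero.
residuals = y - (slope * x + intercept)
sum_residuals = float(residuals.sum())

# (2) The mean point lies on the line: gap between mean y and the prediction at mean x.
mean_point_gap = float(y.mean() - (slope * x.mean() + intercept))

# (3) Nudging the gradient or intercept up or down can only increase the total.
best = sse(slope, intercept)
nudged = [sse(slope + dm, intercept + dc) for dm in (-0.1, 0.1) for dc in (-1.0, 1.0)]

print(sum_residuals, mean_point_gap, best, min(nudged))
```

Working by hand with rounded coefficients is one reason the residual total in question (1) usually comes out close to zero rather than exactly zero.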

R-squared (R²) - the Co-efficient of Determination

On a scatter plot Excel has options for displaying the equation of the fitted line and the value of R².

Four scatter plots with fitted lines are shown below. The equation of the fitted line and the value of R² are given for each plot.

[Figure: four scatter plots for Common Cracker Brands]
• Energy (calories/100g) against Salt (mg/100g): y = 0.1112x + 372.37, R² = 0.4257
• Energy (calories/100g) against Total Fat (%): y = 4.9844x + 380.82, R² = 0.982
• Energy (calories/100g) against Number of crackers per 100g: R² = 0.0166 (equation not shown)
• Total Fat (%) against Salt (mg/100g): y = 0.0237x - 2.6556, R² = 0.4892

Comment on any relationship between the scatter plot and the value of R².

What do you think R² is measuring?

ANSWER: The smaller the scatter about the trend-line, the greater the R².


What should I say about the R²?

R²: The “Co-efficient of Determination” (measures VARIABILITY)

It measures the “proportion of variability in the RESPONSE variable (y) that is accounted for by the regression model.”

The R² is something you can look at, but only as an extra piece of evidence. An R² of 0.85 means that your model accounts for 85% of the RESPONSE VARIABLE’S variability (85% of the vertical scatter above and below the MEAN value on the y-axis).

BUT the R² on its own cannot be used to determine whether a model is appropriate.

R² deals with variability accounted for, but it doesn’t tell us whether the SHAPE of the model is appropriate.

The visual fit of the data-points to the model and the linearity of a log-transformation should be the main criteria.

[Figure: log-log transformation, ln(y) against ln(x), with linear trendline y = 0.8587x + 4.7271, R² = 0.936]

The graph in the figure is clearly non-linear. Its shape is concave upwards. Yet the R² for a linear trendline is very strong (R² = 0.936).

Why is this?
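One way to explore this question is with made-up data. The sketch below uses synthetic points on the curve y = x² (an assumption, not the slide's data) and the usual definition R² = 1 - (sum of squared residuals) / (total sum of squares about the mean). The helper name `r_squared` is hypothetical.

```python
import numpy as np

def r_squared(y, y_hat):
    """R^2 = 1 - SS_residual / SS_total (hypothetical helper)."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1 - ss_res / ss_tot)

# Synthetic, clearly curved data: y = x^2 for x = 1..10 (assumed for illustration).
x = np.arange(1, 11, dtype=float)
y = x ** 2

# Fit a straight line anyway and see how much variability it "accounts for".
slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept
r2_linear = r_squared(y, y_hat)
residuals = y - y_hat

print(f"R^2 of the straight line: {r2_linear:.4f}")
print("residual signs:", "".join("+" if r > 0 else "-" for r in residuals))
```

Most of the variability in y here is simply the overall rise as x increases, and a straight line captures that, so R² is high even though the residuals show the wrong shape (positive at both ends, negative in the middle).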

R-squared (R2) - the Co-efficient of Determination

R2 measures the proportion of variance in the response variable (y) either side of the mean y-value that can be explained by regression line/curve.


R-squared (R2) – the formula

Correlation (r: the correlation coefficient)

Correlation measures the strength of the linear association between two quantitative variables, and tells us whether the relationship is positive or negative.

Get the correlation coefficient (r) from your calculator or computer.

r has a value between -1 and +1.

[Figure: a row of scatter plots labelled r = -1, r = -0.7, r = -0.4, r = 0, r = 0.3, r = 0.8, r = 1]

• r = -1 or r = 1: points fall exactly on a straight line.

• r = 0: no linear relationship (uncorrelated).

Correlation has no units.

What can go wrong?

Use correlation only if you have two quantitative variables (variables that can be measured).

There is an association between gender and weight but there isn’t a correlation between gender and weight! Gender is not quantitative.

Use correlation only if the relationship is linear.

Beware of outliers!

Always plot the data before looking at the correlation!

• r = 0

• No linear relationship, but there is a relationship!

• r = 0.9

• No linear relationship, but there is a relationship!

Tick the plots where it would be OK to use a correlation coefficient to describe the strength of the relationship:

[Figure: four scatter plots]
• Distances of Planets from the Sun: Distance (million miles) against Position Number
• Reaction Times (seconds) for 30 Year 10 Students: Dominant Hand against Non-dominant Hand
• Mean January Air Temperatures for 30 New Zealand Locations: Temperature (°C) against Latitude (°S)
• Average Weekly Income for Employed New Zealanders in 2001: Male ($) against Female ($)

Annotations on the answer slide: “Not linear”; “Remove two outliers, nothing happening”.

Understanding r and R2

See forearm length vs height data we collected on this class at the start of the year.

Play around with this graph.

The R2 value (coefficient of determination) depends on which regression model we fit to the data because it is the proportion of variation in the response variable accounted for by that model.

So, why does the value of the correlation coefficient r remain the same, regardless of which regression model we fit to the graph?


Answer: Because the correlation coefficient (r) is simply the strength of the linear association between the 2 variables. Mathematically it is calculated from the Covariance between x and y, which is a combined measure of the spread of the data around the means of both variables. Hence r can be calculated without the aid of a regression line. So actually r only equals the square root of R² if the R² value is based on the least squares linear regression line.
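The answer above can be checked numerically. The following sketch assumes NumPy and uses the 13 forearm and height rows from the earlier worksheet table as sample data; the helper name `r_squared` and the choice of a quadratic as the "other model" are illustrative assumptions.

```python
import numpy as np

# Forearm length (cm) and height (cm) rows from the worksheet table.
x = np.array([27.4, 29, 27, 30, 28.5, 30.5, 30.5, 28.9, 30.5, 29, 31.5, 29.5, 29.5])
y = np.array([178.2, 180, 183, 181, 175, 190.5, 178, 179.1, 187, 181, 191.5, 174, 184])

def r_squared(y, y_hat):
    """R^2 = 1 - SS_residual / SS_total (hypothetical helper)."""
    return float(1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2))

# r is computed from the data alone; no regression model is involved.
r = float(np.corrcoef(x, y)[0, 1])

# R^2 depends on the model: compare a straight line with a quadratic.
r2_linear = r_squared(y, np.polyval(np.polyfit(x, y, 1), x))
r2_quadratic = r_squared(y, np.polyval(np.polyfit(x, y, 2), x))

print(f"r = {r:.4f}, r^2 = {r**2:.4f}")
print(f"R^2 (linear) = {r2_linear:.4f}, R^2 (quadratic) = {r2_quadratic:.4f}")
```

Only the linear R² matches r²; changing the model changes R² but leaves r untouched.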

What do I see in this scatter plot?

Trend: Appears to be a linear trend, with a possible outlier (tall person with a small foot size).

Strength: Moderate strength. Constant scatter.

Direction: Positive association. As foot size increases, height TENDS to increase.

[Figure: Height and Foot Size for 30 Year 10 Students. Height (cm) against Foot size (cm)]

What will happen to the correlation coefficient if the tallest Year 10 student is removed?

• It will get smaller

• It won’t change

• It will get bigger

[Figure: Height (cm) against Foot size (cm), as above]


What do I see in this scatter plot?

• Appears to be a strong linear trend.

• Outlier in X (the elephant).

• Appears to be constant scatter.

• Positive association.

[Figure: Life Expectancies and Gestation Period for a sample of non-human Mammals. Life Expectancy (Years) against Gestation (Days), with the elephant labelled]

What will happen to the correlation coefficient if the elephant is removed?

• It will get smaller

• It won’t change

• It will get bigger

[Figure: Life Expectancy (Years) against Gestation (Days), as above, with the elephant labelled]

Calculating the correlation coefficient, r

1. Plot the data on your Graphics Calculator.

2. Fit a linear trendline (regression line).

3. Display R².

4. $r = \pm\sqrt{R^2}$

If the slope is positive, then r is positive.

If the slope is negative, then r is negative.

Extension session: Calculating the correlation coefficient

First work out the covariance – a measure of how x and y vary together from their means:

Covariance of X and Y:

$$\mathrm{Cov}(X,Y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{n}$$

The correlation coefficient:

$$r = \frac{\mathrm{Cov}(X,Y)}{\sigma_X \cdot \sigma_Y}$$
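The covariance route to r can be sketched in Python as follows. The function name `correlation` and the five sample points are made up for illustration; the standard deviations are the population versions (dividing by n), to match the covariance formula that divides by n.

```python
import numpy as np

def correlation(x, y):
    """r = Cov(X, Y) / (sigma_X * sigma_Y), all computed with divisor n."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))   # sum of products / n
    return float(cov / (x.std() * y.std()))          # np.std divides by n by default

# Made-up sample data (assumption, for illustration only).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r_manual = correlation(xs, ys)
r_numpy = float(np.corrcoef(xs, ys)[0, 1])   # library value for comparison
print(r_manual, r_numpy)
```

No regression line is fitted at any point, which is why r does not depend on the choice of model.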

Testing for a Linear model: Residual plots

If our linear model describes the shape of the data well (i.e. it is linear), then the residuals should just be random variation.

They should be distributed evenly either side of the regression line (uniform shape).

Q. How do we test this?

A. By plotting the residuals on a graph.

If the data is linear, the residual plot should show an even balance of positive and negative residuals (uniform shape – no patterns).

If the residual plot shows a curved pattern (e.g. mostly negative residuals at one end and positive residuals at the other), it means that the best model is either non-linear or piecewise.

To practice using Residual Plots: Do NuLake p252: Q22 & 23 (residual plots).
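As a rough numerical companion to looking at a residual plot, the sketch below counts how often the residuals change sign as x increases. Both data sets are synthetic assumptions, and the helper names `linear_fit_residuals` and `sign_changes` are hypothetical. A curved trend fitted with a straight line gives long runs of same-sign residuals; a genuinely linear trend gives residuals that flip sign often.

```python
import numpy as np

def linear_fit_residuals(x, y):
    """Residuals (observed - predicted) from a least squares straight line."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

def sign_changes(residuals):
    """Number of times consecutive residuals switch between + and -."""
    signs = np.sign(residuals)
    return int(np.sum(signs[1:] != signs[:-1]))

x = np.arange(10, dtype=float)

# Synthetic curved data (exponential growth), assumed for illustration.
y_curved = np.exp(0.5 * x)

# Synthetic linear data with small alternating scatter of +0.5 and -0.5.
scatter = 0.5 * (-1.0) ** np.arange(10)
y_linear = 3 + 2 * x + scatter

curved_changes = sign_changes(linear_fit_residuals(x, y_curved))
linear_changes = sign_changes(linear_fit_residuals(x, y_linear))
print(curved_changes, linear_changes)
```

This count is only a quick indicator; plotting the residuals remains the main check.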


Exponential functions and the “LOG-LINEAR” transformation (plotting x against ln(y))

For any 2 variables x and y, if y is an exponential function of x, then ln(y) will be a linear function of x.

How to test for an Exponential Model:

We can test whether the relationship between 2 variables is best modelled by an exponential function of the form $y = Ae^{kx}$ by plotting x against ln(y). If the resulting scatter follows a linear trend, then this means that y is an exponential function of x.

If $y = 2e^{3x}$, draw the graph of x vs ln(y).

First calculate some ordered pairs for $y = 2e^{3x}$. Several ordered pairs from this relation are:

x     | y
0     | 2
0.5   | 8.9634
1     | 40.1711

Graph. We get the growth curve we would expect from an exponential relationship.

Calculate the equivalent pairs for x and ln(y).

x     | ln(y)
0     | 0.6931
0.5   | 2.1931
1     | 3.6931

Graph. We obtain a straight line. By making this conversion, an exponential relationship is converted to a linear one.

[Figure: graph of x vs y beside graph of x vs $\log_e y$]
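The two tables for $y = 2e^{3x}$ can be reproduced with a short Python sketch (standard library only). Taking logs gives ln(y) = ln(2) + 3x, so the gradient of the transformed points should be 3 and the intercept ln(2) ≈ 0.6931.

```python
import math

xs = [0, 0.5, 1]

# Ordered pairs for y = 2e^(3x), then the log-linear transformation ln(y).
ys = [2 * math.exp(3 * x) for x in xs]
ln_ys = [math.log(y) for y in ys]

# Gradient between the first and last transformed points.
gradient = (ln_ys[-1] - ln_ys[0]) / (xs[-1] - xs[0])

for x, y, ln_y in zip(xs, ys, ln_ys):
    print(f"x = {x:<4} y = {y:8.4f}  ln(y) = {ln_y:.4f}")
print("gradient of x vs ln(y):", gradient)
```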

The spreadsheet shows the number of truckloads of logs delivered to a sawmill over a ten-year period.

1 Verify that an exponential model is an appropriate model for the number of truckloads delivered per year.
2 Determine the equation of the ‘best fit’ exponential curve.
3 Estimate the yearly rate of increase as a percentage.

1 Verify that an exponential model is an appropriate model for the number of truckloads delivered per year.

Calculate the log for each delivery number in column C alongside the respective number in column B.

Graph log(deliveries) vs year number. Use the Chart-Wizard to draw a scatter diagram (the X-Y Scatter option).

As a straight line relationship is apparent, the actual relationship is exponential.

2 Determine the equation of the ‘best fit’ exponential curve.

Graph the truckloads vs year and fit an exponential model.

The equation of the curve of ‘best fit’, using an exponential model, is $y = 16\,589e^{0.2424x}$

3 Estimate the yearly rate of increase as a percentage.

The model is $y = 16\,589e^{0.2424x}$

This can also be expressed as

$$y = 16\,589 \times (e^{0.2424})^x = 16\,589 \times (1.274)^x$$

The yearly rate of increase is approximately 27.4%
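The conversion from the exponent 0.2424 to a yearly percentage can be checked with a couple of lines of Python (standard library only). The variable names are illustrative.

```python
import math

k = 0.2424                      # exponent coefficient from the fitted model
growth_factor = math.exp(k)     # each year multiplies the previous year by this
percent_increase = (growth_factor - 1) * 100

print(f"growth factor = {growth_factor:.3f}")
print(f"yearly increase = {percent_increase:.1f}%")
```

Note that the percentage comes from $e^{0.2424} - 1$, not from reading 0.2424 directly as 24.24%.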

Power functions and the “LOG-LOG” transformation (plotting ln(x) against ln(y))

For any 2 variables x and y, if y is a power function of x (i.e. $y = Ax^n$), then ln(y) will be a linear function of ln(x).

How to test for a Power Model:

We can test whether the relationship between 2 variables is best modelled by a power function of the form $y = Ax^n$ by plotting ln(x) against ln(y). If the resulting scatter follows a linear trend, then this means that y is a power function of x.

If $y = 2x^{0.4}$, draw the graph of ln(x) vs ln(y).

First calculate some ordered pairs for $y = 2x^{0.4}$. Several ordered pairs from this relation are:

x   | y
1   | 2
2   | 2.6390
3   | 3.1037

Graph. We get the curve we would expect from a power relationship.

Calculate the equivalent pairs for ln(x) and ln(y). The equivalent pairs for ln(x) and ln(y) are:

ln(x)   | ln(y)
0       | 0.6931
0.6931  | 0.9704
1.0986  | 1.1326

Graph. We obtain a straight line.

By making this conversion, a power-function relationship is converted to a linear one.

[Figure: graph of x vs y beside graph of $\log_e x$ vs $\log_e y$]
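The tables for $y = 2x^{0.4}$ can be reproduced the same way (standard library only). Taking logs gives ln(y) = ln(2) + 0.4 ln(x), so the gradient of the transformed points should be the power, 0.4.

```python
import math

xs = [1, 2, 3]

# Ordered pairs for y = 2x^0.4, then the log-log transformation.
ys = [2 * x ** 0.4 for x in xs]
ln_xs = [math.log(x) for x in xs]
ln_ys = [math.log(y) for y in ys]

# Gradient between the first and last transformed points.
gradient = (ln_ys[-1] - ln_ys[0]) / (ln_xs[-1] - ln_xs[0])

for lx, ly in zip(ln_xs, ln_ys):
    print(f"ln(x) = {lx:.4f}  ln(y) = {ly:.4f}")
print("gradient of ln(x) vs ln(y):", gradient)
```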

Sigma (new): Ex. 20.03

The first question is worked through on the following slides.


(a) Draw a graph of $\log_e y$ against x.
(b) Draw a graph of $\log_e y$ against $\log_e x$.
(c) Which relationship best fits the data? Choose between $y = ae^{kx}$ or $y = ax^n$.

(a) Draw a graph of $\log_e y$ against x.

Select the x and ln(y) columns (hold down Control to get both).

FORMATTING:

- Write title: x vs ln(y)

- Label horizontal axis: x, and vertical axis: ln(y)

(b) Draw a graph of $\log_e y$ against $\log_e x$.

Select the ln(x) and ln(y) columns and insert a new scatter plot.

Title: ln(x) vs ln(y).

Select “LAYOUT 1” as before & add axis labels.

(c) Which relationship best fits the data? Choose between $y = ae^{kx}$ or $y = ax^n$.

Add a linear regression line to each graph:

There is our log-linear transformation. Appears to be linear.

Now fit a linear regression line to the other graph and compare.

Which one looks more linear?

Clearly the log-linear transformation (x vs ln(y)) has a linear shape.

Whereas the log-log transformation (ln(x) vs ln(y)) does not.

Hence an exponential model ($y = ae^{kx}$) will fit the raw data better than a power model ($y = ax^n$).



(d) Calculate the values of the constants, and hence write down an equation which models the weekly share price.

[Figure: x vs ln(y), with linear trendline labelled y = -0.9993x + 5.7857]

Need to find the constants “a” and “k” in the exponential equation for the raw data: $y = ae^{kx}$

Taking logs of both sides of $y = ae^{kx}$ gives $\ln(y) = kx + \ln(a)$. So, in terms of the original variables x and y, the equation on the log-linear graph is really:

ln(y) = -0.9993x + ln(a)

ln(a) = 5.7857 (from equation on graph). And ln(a) means $\log_e a$.

So $a = e^{5.7857} = 325.6$

k = -0.9993 (gradient)

So model is: $y = 325.6e^{-0.9993x}$

Where x = weeks since crash, and y = share price in cents.
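The back-transformation from the log-linear trendline to the exponential model can be sketched in Python (standard library only). The gradient and intercept are the values read off the graph; the function name `model` is hypothetical.

```python
import math

# Trendline on the x vs ln(y) graph: ln(y) = gradient * x + intercept.
gradient = -0.9993
intercept = 5.7857

# ln(y) = kx + ln(a), so k is the gradient and a = e^(intercept).
k = gradient
a = math.exp(intercept)

def model(x):
    """Share price (cents) predicted x weeks after the crash: y = a * e^(kx)."""
    return a * math.exp(k * x)

print(f"a = {a:.1f}, k = {k}")
print(f"model at x = 2: {model(2):.1f} cents")
```

Taking ln of `model(x)` returns the original straight line, which is a quick way to confirm that the two forms describe the same relationship.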



Now insert a scatter plot of the raw data (x vs y). Format and label it.

Then right-click on any data-point on the graph. Fit a trendline:
- Choose exponential (since we’ve already chosen an exponential model).

What do you notice??

[Figure: Graph of Ariadne Share Price for the 5 weeks following the October 1987 crash. Average weekly share price (cents) against Weeks since October 1987 crash occurred, with exponential trendline $y = 325.6e^{-0.999x}$]

Exponential model. Visually a good fit.

Look familiar?

This is the same equation we formed algebraically based on the log-linear graph.

Just to be sure that a power function model wouldn’t be better, see what happens if we try to one…






[Graph: the same data with a fitted power trendline, \( y = 164.86x^{-2.42} \). y-axis: average weekly share price (cents).]

Power function model. Shocker! It is visually evident that this model is a poor fit to the data.

[Graphs compared: exponential model \( y = 325.6e^{-0.999x} \) and power function model \( y = 164.86x^{-2.42} \) (poor fit).]

So, the equation of the best model for this data is:

\( y = 325.6e^{-0.999x} \)


• Extension: Actually, the most realistic model is probably \( y = 325.6e^{-x} \).

Why? Because the index -0.9993x is very close to -1x.

General rule: If an index is very close to a multiple of 0.5, like this, it’s appropriate to round it when we give the final answer.

This is because our equation is only an estimated model. An element of uncertainty exists because the data used for this scatter plot is just a SAMPLE (i.e. from just one company).

We’re using the relationship found in this sample to make an INFERENCE about the relationship between these two variables in the population (i.e. the behaviour of all share prices in the market over these 5 weeks).
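To see how little the rounding in the extension changes the estimates, the following Python sketch compares the fitted exponent with the rounded one over weeks 0 to 6 (the range shown on the graph axis). The function names are illustrative only.

```python
import math

def fitted(x):
    """Model with the fitted exponent, y = 325.6e^(-0.9993x)."""
    return 325.6 * math.exp(-0.9993 * x)

def rounded(x):
    """Model with the exponent rounded to -1, y = 325.6e^(-x)."""
    return 325.6 * math.exp(-x)

weeks = range(0, 7)
largest_gap = max(abs(fitted(x) - rounded(x)) for x in weeks)
print(largest_gap < 0.1)  # True: the two models differ by under a tenth of a cent
```

A gap this small is far below the uncertainty that comes from working with a single sample, which is the argument for rounding the index.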

For more practice if needed – Sigma (new) pg. 420: Ex. 20.03.

NuLake 7.6 Modelling Power Functions, Q39

Dist. (d)    Volume (V)    ln(d)        ln(V)
0.45         1480          -0.798508    7.299797367
0.71         595           -0.34249     6.388561406
1.21         205           0.1906204    5.323009979
1.78         95            0.5766134    4.553876892
2.14         65            0.7608058    4.17438727
3.65         23            1.2947272    3.135494216

[Graph: "Intensity of Sound", log-log plot of ln(V) against ln(d). Trendline: y = -1.9932x + 5.7045, R² = 1]

(b) Gradient (k) = -1.9932; Y-intercept (c) = 5.7045

(c) Equation: \( y = ax^n \)

\( \log_e(y) = \log_e(a) + \log_e(x^n) \)

\( \log_e(y) = n\log_e(x) + \log_e(a) \)

\( Y = nX + c \), where \( Y = \log_e(y) \) and \( X = \log_e(x) \)

\( Y = -1.9932X + 5.7045 \) from the Excel log-log graph

So n = -1.9932 (the gradient from (b)) and \( c = \log_e(a) \), so \( a = e^c = e^{5.7045} = 300 \) (nearest whole number).

So \( y = 300x^{-1.9932} \)

[Graph: "Intensity of Sound", Volume (V) against Dist. (d). Excel power trendline: \( y = 300.22x^{-1.9932} \), R² = 1]
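The log-log working for Q39 can be reproduced without Excel. The sketch below is one way to do it in plain Python, using the distance and volume values from the table; the helper `least_squares` and the variable names are illustrative, and the fit is an ordinary least-squares line, which is assumed to be what the Excel trendline computes.

```python
import math

# Data from the Q39 table
dist = [0.45, 0.71, 1.21, 1.78, 2.14, 3.65]
volume = [1480, 595, 205, 95, 65, 23]

def least_squares(xs, ys):
    """Return (gradient, intercept) of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    gradient = sxy / sxx
    return gradient, mean_y - gradient * mean_x

# Log-log transformation: ln(V) = n ln(d) + ln(a)
log_d = [math.log(d) for d in dist]
log_v = [math.log(v) for v in volume]
n_exp, c = least_squares(log_d, log_v)
a = math.exp(c)  # back-transform the intercept

print(round(n_exp, 4), round(c, 4), round(a, 2))
```

The printed gradient and intercept should agree closely with the values in part (b), and `a` with the coefficient on the Excel power trendline.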

Choosing & justifying a model(s): Other ways of justifying your choice in a Scholarship question


Justifying choice of model – consider the nature of the variables:


The nature of the variables may suggest the type of model.

\( y = Ax^n \), where n is positive.

\( y = Ax^n \), where n is negative.

\( y = Ae^{kx} \), where k is positive.

\( y = Ae^{kx} \), where k is negative.



Power model essential:
When the physical properties of the variables under investigation demand the use of a power model: \( y = Ax^n \)

Or a root function: \( y = Ax^n \), where n is between 0 and 1.

Example 1: If the variables are Length L cm (explanatory variable) and Mass M grams (response variable) of the same species, then we can conclude 2 things about any valid model for this relationship:
1. That it must have a y-intercept (M-intercept) of (0,0), since a living organism will have no mass if it has no length! This means that we require a power function with a positive exponent. Exponential functions don’t pass through (0,0).
2. That if it is assumed that the species grows uniformly in each of its 3 dimensions, then a power model of the form \( M = aL^3 \) would be appropriate.

Example 2: Dropping objects of equal mass from different heights (h cm), and observing the time taken to reach the ground (T secs.), we can conclude 2 things about any valid model:
1. That it must have a y-intercept (T-intercept) of (0,0), since it would take 0 time to drop from a height of 0 cm. So, again, this would require a power function with a positive exponent.
2. That the relationship must be positive but flattening out, since the greater the height from which an object is dropped, the longer it takes to reach the ground (more distance to cover), but the greater the speed it reaches (constant acceleration due to gravity of 9.8 m/s²). Hence we’d expect something like a square root function, i.e. of the form \( T = Ah^{0.5} \).
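The two physical examples can be explored numerically. This is a small Python sketch, assuming an arbitrary constant A = 1 (only the ratios matter); `power_model` is an illustrative helper, not part of the original material.

```python
def power_model(A, n, x):
    """Evaluate y = A * x**n."""
    return A * x ** n

# A = 1 is an arbitrary illustrative constant; only the ratios matter here.
mass_ratio = power_model(1.0, 3, 2.0) / power_model(1.0, 3, 1.0)
time_ratio = power_model(1.0, 0.5, 4.0) / power_model(1.0, 0.5, 1.0)

print(mass_ratio)  # doubling L multiplies M by 2**3 = 8
print(time_ratio)  # quadrupling h multiplies T by 4**0.5 = 2
print(power_model(1.0, 3, 0.0), power_model(1.0, 0.5, 0.0))  # both pass through (0, 0)
```

Both models pass through the origin, and the square root model grows more and more slowly, which matches the "positive but flattening out" description.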



Exponential model essential: \( y = Ae^{kx} \)

A relationship will be exponential if the change is multiplicative.
Multiplicative means that there is a constant percentage change.
The response variable must have a finite initial value (i.e. a y-intercept) that is being multiplied repeatedly by a constant.

E.g. An initial investment of $1000 earns compound interest, and increases by 4.5% every year (i.e. × 1.045 every year).

E.g. The temperature of a bowl of soup is initially 70°C. It then drops by 10% every minute until it approaches room temperature (i.e. × 0.9 every minute).

E.g. Radioactive decay – always by a constant percentage per unit of time. Therefore it would be modelled by an exponential curve.

Whereas, for power curves, the proportional change varies with time (e.g. parabolas, cubics etc.).

Also (for Calculus students), you’ll know that the derivative of an exponential function is proportional to its y-value:

If \( y = e^x \), then \( \frac{dy}{dx} = e^x \), i.e. \( \frac{dy}{dx} = y \).

So if \( y = Ae^{kx} \), then \( \frac{dy}{dx} = kAe^{kx} \), i.e. \( \frac{dy}{dx} = ky \). So \( \frac{dy}{dx} \) is proportional to \( y \).

Therefore, a relationship will be exponential if the value on the y-axis is always changing at a rate that is proportional to its value at that instant.
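One practical way to spot multiplicative change is to divide each value by the one before it. The Python sketch below does this for the compound interest example, and for a parabola as a contrasting power curve; `successive_ratios` is an illustrative helper name.

```python
def successive_ratios(values):
    """Return each value divided by the one before it."""
    return [later / earlier for earlier, later in zip(values, values[1:])]

# Compound interest from the example: $1000 growing by 4.5% per year
investment = [1000 * 1.045 ** year for year in range(6)]

# A power curve for contrast (y = x**2), starting at x = 1 to avoid dividing by 0
parabola = [x ** 2 for x in range(1, 7)]

print([round(r, 3) for r in successive_ratios(investment)])  # every ratio is 1.045
print([round(r, 3) for r in successive_ratios(parabola)])    # ratios shrink: 4.0, 2.25, ...
```

A constant ratio for equally spaced x-values points to an exponential model; ratios that drift, as for the parabola, do not.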

For practice of identifying this property of Exponential Functions, do a selection of questions from Old Sigma – Ex. ?, pg. ?

Choosing & justifying a model(s): Things to write about for Excellence.

VALIDITY OF A MODEL:

The extent to which the behaviour of your mathematical model reflects the relationship between your 2 variables.

What makes a model valid/invalid?
Will your mathematical curve estimate sensible y-values throughout the domain of x-values in which you need to use it?

– One function may model a relationship well for some domains of the explanatory variable but not for others.

– For example, many functions will estimate unrealistic values for long-range extrapolations (always a risky business). However, such a function may still be valid for interpolation or short-range extrapolation.

Your job is to clearly specify the domain of x-values for which your model is valid (e.g. 0 < x < 7, where x is a whole number), and to explain why it is not valid elsewhere.

Then, you could consider splitting the data into piece-wise form and fitting a different function to the x-value(s) for which your original function is not valid.


KEY: Plot your chosen model as a graph on your GC. Examine its features:

Use the GRAPHS option on your GC, selecting “draw” to sketch your chosen function to check its key features and “G-solve” to locate intercepts etc.

Things to mention: Intercepts, asymptotes, maxima, minima.

You will need to comment on any of these and how they model (or don’t model) the behaviour of the relationship between your explanatory and response variables.

Asymptotes:

You need to be aware of these when discussing your model:

– Exponential curves (\( y = Ae^{kx} \)) have a horizontal asymptote (this will be the x-axis itself unless the graph is translated vertically). However, exponential functions do not have a vertical asymptote, i.e. despite getting steeper and steeper, they still exist (are continuous) for all values of x.

– Power curves (\( y = Ax^n \)) with positive exponents have no asymptotes.

– Power curves with negative exponents will have two asymptotes: the x and y axes themselves! Hence no y or x intercept.
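The difference in behaviour near x = 0 can be checked numerically. The Python sketch below uses the two trendline equations from the share price example; the function names are illustrative. Note that in Python, evaluating the power model at exactly x = 0.0 raises a ZeroDivisionError, which is the vertical asymptote showing up in code.

```python
import math

def exponential_model(x):
    """Fitted exponential model from the share price example."""
    return 325.6 * math.exp(-0.999 * x)

def power_model(x):
    """Fitted power model from the same example; undefined at x = 0."""
    return 164.86 * x ** -2.42

# Approach x = 0 from the right and watch the two models diverge
for x in [1.0, 0.1, 0.01]:
    print(x, round(exponential_model(x), 1), round(power_model(x), 1))

print(exponential_model(0))  # 325.6, a finite y-intercept
```

The exponential estimates settle towards 325.6 as x approaches 0, while the power model estimates grow without limit, so only the exponential model gives a sensible "week 0" share price.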


In summary, when choosing a model and (later on) discussing its validity: what should you do?

Answer: Look at your chosen type of curve on your graph (plot it on your G.C.). Look at the behaviour of the graph:

– What is its y-intercept, and does this suit your variables?

– Does your response variable have any limiting values that it physically cannot go above (or below), i.e. ‘ceilings’ or ‘floors’?

– Does your model reflect the behaviour of the relationship between these 2 variables for all possible x-values?

– Are there any x-values you could sub in where you’d get an unrealistic y-value estimate? Explain.

Piecewise functions
Where a graph is split into different sections. A different equation is used for different domains along the x-axis.

Key words that you must know:

DOMAIN: interval along the x-axis (explanatory variable).

RANGE: interval along the y-axis (response variable).

E.g. A telephone account could be charged at a flat rate of $30 for the first 100 calls per month, and then at 20c for each extra call.
Represent this as a piecewise function and draw its graph.
Find the cost to the subscriber of 125 calls per month.

For the first 100 calls, the function is:

f(x) = 30, for the domain 0 ≤ x ≤ 100, where x is a whole number.

For x > 100 the extra calls begin to be charged, i.e. x − 100 calls are charged at $0.20 each, but the first 100 cost $30, so:

f(x) = 30 + 0.2(x − 100), for x > 100, where x is a whole number.

The piecewise function is:

\[ f(x) = \begin{cases} 30, & 0 \le x \le 100 \\ 30 + 0.2(x - 100), & x > 100 \end{cases} \quad \text{where } x \text{ is a whole number.} \]

[Graph of f(x) against x, with 30 marked on the y-axis and 100 marked on the x-axis.]

For 125 calls: 125 is greater than 100, so use the second formula:

f(x) = 30 + 0.2(x – 100). Sub in x = 125.

Do NuLake pg. 338: Q3032

HW: Old Sigma (2nd ed): p302 – Ex. 17.5.

Sigma Mathematics Workbook © Pearson Education New Zealand 2007

20.05
A telephone account could be charged at a flat rate of $30 for the first 100 calls per month, and then at 20c for each extra call.
Find the cost to the subscriber of 125 calls per month.

\[ f(x) = \begin{cases} 30, & 0 \le x \le 100,\ x \in W \\ 30 + 0.2(x - 100), & x > 100,\ x \in W \end{cases} \]

For 125 calls: 125 is greater than 100, so use the second formula:

f(x) = 30 + 0.2(x – 100)

f(125) = 30 + 0.2(125 – 100)

= 30 + 5

= $35
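The telephone account example translates directly into code, with one branch per piece of the function. This is a minimal Python sketch; the name `monthly_cost` and the whole-number check are illustrative choices.

```python
def monthly_cost(calls):
    """Cost in dollars of a month's calls under the flat-rate-plus-extras plan."""
    if calls < 0 or calls != int(calls):
        raise ValueError("calls must be a whole number (0, 1, 2, ...)")
    if calls <= 100:
        return 30.0                      # flat rate covers the first 100 calls
    return 30.0 + 0.2 * (calls - 100)    # 20c for each call beyond 100

print(monthly_cost(125))  # 35.0
```

Each `if` branch corresponds to one domain of the piecewise function, so checking which branch a given x falls into is the same step as "125 is greater than 100, so use the second formula".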

Date post:	20-Jan-2016
Category:	Documents
Upload:	oscar-mccoy