Date post: | 20-Jan-2016 |
Category: |
Documents |
Upload: | oscar-mccoy |
Scholarship: Statistics and Modelling Performance Standard 93201
Outcome description: The student will demonstrate the ability to apply mathematical, statistical and probability knowledge and methods to complex problems in contexts which may be unfamiliar, interpret and, where appropriate, generalise results and clearly communicate concepts and findings.
Go into “play mode” to view slideshow as intended.
Hold down SHIFT and click F5.
Regression & Curve-fitting
SKILLS: Selecting the best model(s) in situations that are not clear-cut, justifying your decision, and report-writing.
Do past Scholarship questions that involve this. Get others to read your comments and discuss them (e.g. teacher, peers doing Scholarship).
INFO:
- Describing shape, strength & direction.
- Regression and the method of Least Squares.
- R² and r
- Testing for a Linear Model – Residual plots.
- Testing for an Exponential Model (log-linear transformation)
- Testing for a Power Function (log-log transformation)
- Justifying choice of model
- Using Piece-wise functions (key in Schol.)
Glossary
Explanatory (independent) Variable:
A quantitative variable that we control the value of (plotted on x-axis).
“Control” variable.
Response (dependent) Variable:
Variable whose value we do not control, but only observe (plotted on y-axis).
We investigate how it changes as we alter the value of the explanatory variable
Linear relationship:
Data best explained by a straight regression line (linear-regression model).
Non-linear relationship:
Data best explained by a curve (non-linear model). Investigate a power curve, exponential curve, quadratic, polynomial, or logarithmic.
Correlation:
The relationship between two continuous quantitative variables – tells us NOTHING about cause and effect.
Regression:
The process of fitting a line (regression line) or curve to a scatterplot and using its equation for the purpose of ESTIMATING the value of y (response variable), based on a given value of x (explanatory variable).
Correlation Coefficient (r):
A measure of the strength of a linear relationship between two variables. Range: -1 ≤ r ≤ 1.
• If r is positive: positive relationship – as x gets larger, y gets larger.
• If r is negative: negative relationship – as x gets larger, y gets smaller.
Coefficient of Determination (R2):
Measures the percentage of variability in y (response variable) that is accounted for by the regression line.
Investigating Causation:
Investigating the cause of the relationship observed between two variables.
Confounding:
When the relationship between the 2 variables being studied is being influenced by one or more other variables.
Direct causal relationship:
X → Y or Y → X
Indirect link:
Relationship is “confounded” by the presence of a THIRD “lurking” variable:
• X → m → Y (mediating variable)
• Z → X and Z → Y (common cause)
• X1 and X2 → Y (multiple causal factors)
No significant relationship:
When one cannot rule out the possibility that there is no actual relationship between the two variables in the population.
Reliability:
Confidence in estimates made using the model:
If there is not much variability (data points close to the line/curve), then a valid model should allow us to make reliable estimates.
If there is large variability (points spread out), then we could not make very reliable estimates, even with the best possible model.
Validity:
• Validity of your model and its estimates: the degree to which your mathematical model reflects the behaviour of the relationship between the 2 variables.
• Validity of a claim or conclusion: accuracy of a person’s claim or conclusion based on the evidence of actual data.
Scatter Plots
• The scatter plot is the basic tool used to investigate relationships between two quantitative variables.
• It is used for paired (bi-variate) data.
Types of Variables
• Quantitative (numerical): Discrete or Continuous
• Qualitative (categories)
What do I look for in scatter plots?
Shape
Do you see
– a linear relationship (straight line)
OR
– a non-linear relationship?
What do I look for in scatter plots?
Shape
Do you see
– a positive relationship (as one variable gets bigger, so does the other)
OR
– a negative relationship (as one variable gets bigger, the other gets smaller)?
What do I look for in scatter plots?
Scatter
Do you see
– a strong relationship (little scatter)
OR
– a weak relationship (lots of scatter)?
What do I look for in scatter plots?
Scatter
Do you see
– constant scatter (roughly the same amount of scatter as you look across the plot)
OR
– non-constant scatter (the scatter looks like a “fan” or “funnel”)?
What do I look for in scatter plots?
Unusual features
Do you see
– any outliers (points unusually far from the trend)?
– any groupings?
Rank these relationships from strongest (1) to weakest (4):
[Four scatter plots. Answers shown on the slide, in order: 3, 4, 1, 2.]
• How did you make your decisions?
• The less scatter there is about the trend line, the stronger the relationship is.
Describing scatterplots
• Relationship – linear or non linear
• Relationship – positive or negative
• Relationship – strong or weak
• Scatter – constant or non constant
• Unusual features – outliers or groupings
What do I see in these scatter plots?
• There appears to be a linear trend.
• There appears to be moderate constant scatter about the trend line.
• Negative relationship.
• No outliers or groupings visible.
[Scatter plot: Mean January Air Temperatures for 30 New Zealand Locations. x-axis: Latitude (°S); y-axis: Temperature (°C).]
What do I see in these scatter plots?
• There appears to be a non-linear trend.
• There appears to be non-constant scatter about the trend line.
• Positive relationship.
• One possible outlier (large GDP, low % Internet Users).
[Scatter plot: % of population who are Internet Users vs GDP per capita for 202 Countries. x-axis: GDP per capita (thousands of dollars); y-axis: Internet Users (%).]
What do I see in these scatter plots?
• Two non-linear trends (Male and Female).
• Very little scatter about the trend lines
• Negative relationship until about 1970, then a positive relationship.
• Gap in the data collection (Second World War).
[Scatter plot: Average Age New Zealanders are First Married. x-axis: Year; y-axis: Age.]
Regression
What is Regression?
Regression is when you fit a line or curve to a scatter plot for the purpose of PREDICTING the value of y (response variable), based on a given value of x (explanatory variable).
Difference between Correlation and Regression:
Correlation: Measures degree of association.
Regression: Uses relationship to predict.
Unless the relationship is perfect, there is some variation of the observed y values from those predicted by the fitted line. We call these prediction errors ‘Residuals’.
The aim of regression is to fit a line that keeps the residuals as small as possible.
Regression
[Figure: the line y = 5 + 2x with the data point (8, 25). At x = 8 the line predicts 5 + 2 × 8 = 21, so the prediction error is 25 − 21 = 4.]
Regression relationship = trend + scatter
Observed value = predicted value + prediction error
The Least Squares Regression Line
• Choose the line with the smallest sum of squared prediction errors.
• Minimise the sum of squared prediction errors:
$$\sum (\text{residuals})^2 = \sum (y - \hat{y})^2$$
Which line?
Fitting the best regression line: The method of LEAST SQUARES
Take the data that we collected at the beginning of the year on this class: forearm length (explanatory variable) vs height (response variable).
Below is the scatter plot that we made using this data.
[Scatter plot: Radius length vs Height for Y13 Stats students at STC, 2010. x-axis: Radius length (cm); y-axis: Height (cm). Fitted line: y = 2.2118x + 116.76]
There is always one unique regression line with the best possible fit. It has 3 characteristics:
1 The sum of the residuals will be very close to zero (positives & negatives cancel out).
2 The sum of the squared residuals will be minimised (hence called the ‘least squares’ regression line).
3 The ‘mean’ point (x̄, ȳ) will lie on the line.
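x: Length of forearm (cm) | y: Height (cm) | ŷ: height predicted by regression line, ŷ = 2.2118x + 116.76 | Prediction error (residual): y − ŷ | (Residual)²
27.4 | 178.2 | | |
29 | 180 | | |
27 | 183 | | |
30 | 181 | | |
28.5 | 175 | | |
30.5 | 190.5 | | |
30.5 | 178 | | |
28.9 | 179.1 | | |
30.5 | 187 | | |
29 | 181 | | |
31.5 | 191.5 | | |
29.5 | 174 | | |
29.5 | 184 | | |
TOTALS: | | | |
Mean point (x̄, ȳ): | | | |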
(1.) What is the sum of all your residuals? _________ . For the line of best fit, this should be very close to zero. Why does it not have to come out as exactly zero?
_________________________________________________________
(2.) Find the mean values of the x and y variables, x̄ and ȳ.
x̄ = ____________ , ȳ = ______________ .
Does the point (x̄, ȳ) lie on your regression line?
Test by substituting the value of x̄ for x in your regression equation. What value does it predict for y? Is it the same value as ȳ (the mean y value)? _______
(3.) The definition of the least squares regression line is that it minimises the sum of the squared residuals. Use Excel to test whether this is true by altering the values of the gradient and y-intercept in the regression equation both up and down. Watch how the total at the bottom of the (residuals)² column changes. Is this really the least squares line? Explain.
____________________________________________
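The three characteristics of the least squares line can also be checked numerically. The following is a minimal sketch in Python rather than Excel (the language choice and the name `sum_sq_residuals` are assumptions for illustration, not part of the original worksheet), using the 13 data pairs from the table above.

```python
# Forearm (radius) length x in cm and height y in cm, copied from the table above.
x = [27.4, 29, 27, 30, 28.5, 30.5, 30.5, 28.9, 30.5, 29, 31.5, 29.5, 29.5]
y = [178.2, 180, 183, 181, 175, 190.5, 178, 179.1, 187, 181, 191.5, 174, 184]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares gradient and intercept from the usual closed-form formulas.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
gradient = s_xy / s_xx
intercept = y_bar - gradient * x_bar


def sum_sq_residuals(m, c):
    """Sum of squared residuals for the line y = m*x + c."""
    return sum((yi - (m * xi + c)) ** 2 for xi, yi in zip(x, y))


# Characteristic 1: residuals sum to (almost) zero.
residual_sum = sum(yi - (gradient * xi + intercept) for xi, yi in zip(x, y))
# Characteristic 2: this total should be the smallest possible.
best = sum_sq_residuals(gradient, intercept)
# Characteristic 3: the mean point lies on the line.
mean_point_on_line = abs(gradient * x_bar + intercept - y_bar) < 1e-9

print(f"fitted line: y = {gradient:.4f}x + {intercept:.2f}")
print(f"sum of residuals: {residual_sum:.6f}")
print(f"mean point on line: {mean_point_on_line}")

# Nudging the gradient or intercept either way should only increase the total.
for dm, dc in [(0.1, 0), (-0.1, 0), (0, 0.5), (0, -0.5)]:
    worse = sum_sq_residuals(gradient + dm, intercept + dc) > best
    print(f"gradient {dm:+}, intercept {dc:+}: total increases -> {worse}")
```

Any tiny non-zero value printed for the sum of residuals comes from rounding in the arithmetic, which is one answer to question (1.).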
R-squared (R²) - the Co-efficient of Determination
On a scatter plot Excel has options for displaying the equation of the fitted line and the value of R².
Four scatter plots with fitted lines are shown below. The equation of the fitted line and the value of R² are given for each plot.
[Four scatter plots, all titled “Common Cracker Brands”:]
• Energy (calories/100g) vs Salt (mg/100g): y = 0.1112x + 372.37, R² = 0.4257
• Energy (calories/100g) vs Total Fat (%): y = 4.9844x + 380.82, R² = 0.982
• Energy (calories/100g) vs Number of crackers per 100g: y = 0.3717x + 440.06, R² = 0.0166
• Total Fat (%) vs Salt (mg/100g): y = 0.0237x - 2.6556, R² = 0.4892
Comment on any relationship between the scatter plot and the value of R². What do you think R² is measuring?
ANSWER: The smaller the scatter about the trend-line, the greater the R².
What should I say about the R2?
R²: The “Co-efficient of Determination” (measures VARIABILITY)
It measures the “proportion of variability in the RESPONSE variable (y) that is accounted for by the regression line.”
The R² is something you can look at, but only as an extra piece of evidence. An R² of 0.85 means that your model accounts for 85% of the RESPONSE VARIABLE’S variability (85% of the vertical scatter above and below the MEAN value on the y-axis).
BUT the R² on its own cannot be used to determine whether a model is appropriate.
R² deals with variability accounted for, but it doesn’t tell us whether the SHAPE of the model is appropriate.
The visual fit of the data points to the model and the linearity of a log-transformation should be the main criteria.
The graph on this slide is clearly non-linear.
Its shape is concave upwards.
Yet the R² for a linear trendline is very strong (R² = 0.936).
Why is this??
[Scatter plot: Log-log transformation. x-axis: ln(x); y-axis: ln(y). Linear trendline: y = 0.8587x + 4.7271, R² = 0.936]
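The same effect is easy to reproduce. The sketch below uses hypothetical, deliberately curved data (y = x² for x = 1 to 10, not the data behind the slide's graph) and fits a straight line by least squares. The R² comes out high even though the relationship is plainly not linear, because the line still accounts for most of the vertical variation about the mean.

```python
# Hypothetical, deliberately curved data: y = x^2 for x = 1..10.
x = list(range(1, 11))
y = [xi ** 2 for xi in x]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares straight line through the curved data.
gradient = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - gradient * x_bar

# R^2 = 1 - (unexplained variation) / (total variation about the mean of y).
ss_res = sum((yi - (gradient * xi + intercept)) ** 2 for xi, yi in zip(x, y))
ss_tot = sum((yi - y_bar) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot

print(f"linear fit: y = {gradient:.2f}x + {intercept:.2f}, R^2 = {r_squared:.3f}")
```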
R-squared (R2) - the Co-efficient of Determination
R² measures the proportion of variance in the response variable (y) either side of the mean y-value that can be explained by the regression line/curve.
R-squared (R2) – the formula
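The formula on this slide did not survive extraction. As a stand-in, the usual textbook definition, which matches the description above (the proportion of the variation in y about its mean that the model explains), is given here. It is an assumption that the original slide used this form.

```latex
R^2 = 1 - \frac{\sum (y - \hat{y})^2}{\sum (y - \bar{y})^2}
```

Here ŷ is the value predicted by the fitted line or curve and ȳ is the mean of the observed y values.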
Correlation (r: the correlation coefficient)
Correlation measures the strength of the linear association between two quantitative variables, and tells us whether the relationship is positive or negative.
Get the correlation coefficient (r) from your calculator or computer.
r has a value between -1 and +1.
Correlation has no units.
[Row of example scatter plots: r = -1, r = -0.7, r = -0.4, r = 0, r = 0.3, r = 0.8, r = 1]
• r = -1 and r = 1: points fall exactly on a straight line
• r = 0: no linear relationship (uncorrelated)
What can go wrong?
Use correlation only if you have two quantitative variables (variables that can be measured).
There is an association between gender and weight but there isn’t a correlation between gender and weight!
Gender is not quantitative.
Use correlation only if the relationship is linear.
Beware of outliers!
Always plot the data before looking at the correlation!
• r = 0
• No linear relationship, but there is a relationship!
• r = 0.9
• No linear relationship, but there is a relationship!
Tick the plots where it would be OK to use a correlation coefficient to describe the strength of the relationship:
• Distances of Planets from the Sun: Distance (million miles) vs Position Number
• Reaction Times (seconds) for 30 Year 10 Students: Dominant Hand vs Non-dominant Hand
• Mean January Air Temperatures for 30 New Zealand Locations: Temperature (°C) vs Latitude (°S)
• Average Weekly Income for Employed New Zealanders in 2001: axes Female ($) and Male ($)
Notes shown on the answer slide: “Not linear”; “Remove two outliers, nothing happening”.
Understanding r and R2
See forearm length vs height data we collected on this class at the start of the year.
Play around with this graph.
The R² value (coefficient of determination) depends on which regression model we fit to the data because it is the proportion of variation in the response variable accounted for by that model.
So, why does the value of the correlation coefficient r remain the same, regardless of which regression model we fit to the graph?
Answer: Because the correlation coefficient (r) is simply the strength of the linear association between the 2 variables. Mathematically it is calculated from the covariance between x and y, which measures how x and y vary together about their means. Hence r can be calculated without the aid of a regression line. So r equals the square root of R² (with the sign of the slope) only if the R² value is based on the least squares linear regression line.
What do I see in this scatter plot?
Trend: Appears to be a linear trend, with a possible outlier (tall person with a small foot size).
Strength: Moderate strength. Constant scatter.
Direction: Positive association. As foot size increases, height TENDS to increase.
[Scatter plot: Height and Foot Size for 30 Year 10 Students. x-axis: Foot size (cm); y-axis: Height (cm).]
What will happen to the correlation coefficient if the tallest Year 10 student is removed?
• It will get smaller
• It won’t change
• It will get bigger
Answer: It will get bigger.
[Scatter plot: Height and Foot Size for 30 Year 10 Students. x-axis: Foot size (cm); y-axis: Height (cm).]
What do I see in this scatter plot?
• Appears to be a strong linear trend.
• Outlier in X (the elephant).
• Appears to be constant scatter.
• Positive association.
[Scatter plot: Life Expectancies and Gestation Period for a sample of non-human Mammals. x-axis: Gestation (Days); y-axis: Life Expectancy (Years). The elephant is labelled.]
What will happen to the correlation coefficient if the elephant is removed?
• It will get smaller
• It won’t change
• It will get bigger
Answer: It will get smaller.
[Scatter plot: Life Expectancies and Gestation Period for a sample of non-human Mammals. x-axis: Gestation (Days); y-axis: Life Expectancy (Years). The elephant is labelled.]
Calculating the correlation coefficient, r
1. Plot the data on your Graphics Calculator.
2. Fit a linear trendline (regression line).
3. Display R2.
4. $r = \pm\sqrt{R^2}$
If the slope is positive, then r is positive.
If the slope is negative, then r is negative.
Extension session: Calculating the correlation coefficient
First work out the covariance – a measure of how x and y vary together from their means:
Covariance of X and Y:
$$\mathrm{Cov}(X,Y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{n}$$
The correlation coefficient:
$$r = \frac{\mathrm{Cov}(X,Y)}{\sigma_X \, \sigma_Y}$$
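As a rough illustration of the covariance route to r, the sketch below applies it to the forearm length and height data from the class table earlier, dividing by n throughout (so the standard deviations are population values, an assumption about what the slide intends). It then computes R² for the least squares line separately, to show that r² and R² agree for a linear fit.

```python
from math import sqrt

# Forearm (radius) length x in cm and height y in cm, from the class data table.
x = [27.4, 29, 27, 30, 28.5, 30.5, 30.5, 28.9, 30.5, 29, 31.5, 29.5, 29.5]
y = [178.2, 180, 183, 181, 175, 190.5, 178, 179.1, 187, 181, 191.5, 174, 184]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Covariance and standard deviations, all dividing by n.
cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n
sigma_x = sqrt(sum((xi - x_bar) ** 2 for xi in x) / n)
sigma_y = sqrt(sum((yi - y_bar) ** 2 for yi in y) / n)
r = cov_xy / (sigma_x * sigma_y)

# R^2 of the least squares line, worked out independently from its residuals.
gradient = cov_xy / sigma_x ** 2
intercept = y_bar - gradient * x_bar
ss_res = sum((yi - (gradient * xi + intercept)) ** 2 for xi, yi in zip(x, y))
ss_tot = sum((yi - y_bar) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot

print(f"r = {r:.3f}, r^2 = {r * r:.3f}, R^2 of linear fit = {r_squared:.3f}")
```

Note that r here never used the regression line; only the R² calculation did.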
Testing for a Linear model: Residual plots
If our linear model describes the shape of the data well (i.e. it is linear), then the residuals should just be random variation.
They should be distributed evenly either side of the regression line (uniform shape).
Q. How do we test this?
A. By plotting the residuals on a graph.
If the data is linear, the residual plot should show an even balance of positive and negative residuals (uniform shape – no patterns).
If the residual plot shows a curved pattern (e.g. mostly negative residuals at one end and positive residuals at the other), it means that the best model is either non-linear or piecewise.
To practise using residual plots: do NuLake p252, Q22 & 23 (residual plots).
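A residual pattern can be spotted even without drawing the plot, by looking at the signs of the residuals in order of x. This is a minimal sketch on hypothetical curved data (y = x² for x = 0 to 9, chosen only for illustration): a straight line fitted to it leaves positive residuals at both ends and negative ones in the middle, which is the curved pattern described above.

```python
# Hypothetical curved data: y = x^2 for x = 0..9.
x = list(range(10))
y = [xi ** 2 for xi in x]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares straight line.
gradient = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - gradient * x_bar

# Residual = observed value minus value predicted by the line.
residuals = [yi - (gradient * xi + intercept) for xi, yi in zip(x, y)]

# One character per data point, in order of increasing x.
signs = "".join("+" if e > 0 else "-" for e in residuals)
print(signs)  # prints ++------++
```

For genuinely linear data the plus and minus signs would be mixed with no long runs.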
Exponential functions and the “LOG-LINEAR” transformation (plotting x against ln(y))
For any 2 variables x and y, if y is an exponential function of x, then ln(y) will be a linear function of x. This is because taking logs of $y = Ae^{kx}$ gives $\ln(y) = \ln(A) + kx$, a straight line with gradient k and intercept ln(A).
How to test for an Exponential Model:
We can test whether the relationship between 2 variables is best modelled by an exponential function of the form $y = Ae^{kx}$ by plotting x against ln(y). If the resulting scatter follows a linear trend, then this means that y is an exponential function of x.
If $y = 2e^{3x}$, draw the graph of x vs ln(y).
First calculate some ordered pairs for $y = 2e^{3x}$, then the equivalent pairs for x and ln(y):
x | y | ln(y)
0 | 2 | 0.6931
0.5 | 8.9634 | 2.1931
1 | 40.1711 | 3.6931
Graph of x vs y: we get the growth curve we would expect from an exponential relationship.
Graph of x vs ln(y): we obtain a straight line. By making this conversion, an exponential relationship is converted to a linear one.
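The worked example can be reproduced in a few lines. The sketch below (the helper name `fit_line` is illustrative, not from the original) takes the same three points on y = 2e^(3x), fits a least squares line to x against ln(y), and reads A and k back off the intercept and gradient.

```python
from math import exp, log

# Points on y = 2e^(3x), the example used above.
x = [0, 0.5, 1]
y = [2 * exp(3 * xi) for xi in x]
ln_y = [log(yi) for yi in y]


def fit_line(u, v):
    """Least squares gradient and intercept for v plotted against u."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    m = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v)) / sum(
        (ui - u_bar) ** 2 for ui in u
    )
    return m, v_bar - m * u_bar


# Gradient of the log-linear plot is k; intercept is ln(A).
k, ln_a = fit_line(x, ln_y)
a = exp(ln_a)

print(f"ln(y) = {k:.4f}x + {ln_a:.4f}, so y = {a:.4f}e^({k:.4f}x)")
```

With real data the points will not lie exactly on a line, so the recovered A and k are estimates rather than exact values.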
Example: The spreadsheet shows the number of truckloads of logs delivered to a sawmill over a ten-year period.
1 Verify that an exponential model is an appropriate model for the number of truckloads delivered per year.
2 Determine the equation of the ‘best fit’ exponential curve.
3 Estimate the yearly rate of increase as a percentage.

1 Verify that an exponential model is an appropriate model for the number of truckloads delivered per year.
In column C, calculate the log of each delivery number, alongside the respective number in column B.
Graph log(deliveries) vs year number. Use the Chart-Wizard to draw a scatter diagram (the X-Y Scatter option).
As a straight-line relationship is apparent, the actual relationship is exponential.

2 Determine the equation of the ‘best fit’ exponential curve.
Graph the truckloads vs year and fit an exponential model.
The equation of the curve of ‘best fit’, using an exponential model, is y = 16 589e^(0.2424x)

3 Estimate the yearly rate of increase as a percentage.
The model is y = 16 589e^(0.2424x)
This can also be expressed as y = 16 589(e^0.2424)^x = 16 589(1.274)^x
The yearly rate of increase is approximately 27.4%
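The conversion from the exponent coefficient to a yearly percentage increase can be sketched as follows. Only the value 0.2424 comes from the fitted model; the variable names are illustrative.

```python
import math

k = 0.2424                       # coefficient of x in the fitted model y = 16 589e^(0.2424x)
growth_factor = math.exp(k)      # y is multiplied by this factor each year
percent_increase = (growth_factor - 1) * 100

print(f"growth factor = {growth_factor:.3f}")
print(f"yearly increase = {percent_increase:.1f}%")
```

Note that the percentage comes from e^k - 1, not from k itself: reading 0.2424 directly as "24.2% per year" would understate the growth.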
Power functions and the “LOG-LOG” transformation
(plotting ln(x) against ln(y))
For any 2 variables x and y, if y is a power function of x (i.e. y = Ax^n), then ln(y) will be a linear function of ln(x).
How to test for a Power Model:
We can test whether the relationship between 2 variables is best modelled by a power function of the form y = Ax^n by plotting ln(x) against ln(y). If the resulting scatter follows a linear trend, then this indicates that y is a power function of x.
Example: If y = 2x^0.4, draw the graph of ln(x) vs ln(y).
First calculate some ordered pairs for y = 2x^0.4, then the equivalent pairs for ln(x) and ln(y):
x    y         ln(x)     ln(y)
1    2         0         0.6931
2    2.6390    0.6931    0.9704
3    3.1037    1.0986    1.1326
Graph of x vs y: We get the curve we would expect from a power relationship.
Graph of ln(x) vs ln(y): We obtain a straight line. By making this conversion, a power-function relationship is converted to a linear one.
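The same check works for the power-function example above. This is a small Python sketch with illustrative variable names; it shows that the gradient on the ln(x) vs ln(y) graph is the exponent n = 0.4.

```python
import math

# Ordered pairs for y = 2x^0.4, using the same x-values as the worked example
xs = [1, 2, 3]
ys = [2 * x ** 0.4 for x in xs]
ln_xs = [math.log(x) for x in xs]
ln_ys = [math.log(y) for y in ys]

for x, y, lx, ly in zip(xs, ys, ln_xs, ln_ys):
    print(f"x = {x}  y = {y:.4f}  ln(x) = {lx:.4f}  ln(y) = {ly:.4f}")

# Gradient between consecutive points on the ln(x) vs ln(y) graph
gradients = [
    (ln_ys[i + 1] - ln_ys[i]) / (ln_xs[i + 1] - ln_xs[i])
    for i in range(len(xs) - 1)
]
print("gradients:", gradients)
```

The constant gradient reflects ln(y) = ln(2) + 0.4 ln(x).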
Sigma (new): Ex. 20.03
The first question is worked through on the following slides.
(a) Draw a graph of log_e(y) against x.
Select the x and ln(y) columns (hold down Control to get both).
FORMATTING:
- Write title: x vs ln(y)
- Label horizontal axis: x, and vertical axis: ln(y)
(b) Draw a graph of log_e(y) against log_e(x).
Select the ln(x) and ln(y) columns and insert a new scatter plot.
Title: ln(x) vs ln(y).
Select “LAYOUT 1” as before & add axis labels.
(c) Which relationship best fits the data? Choose between y = ae^(kx) or y = ax^n.
Add a linear regression line to each graph and compare. Which one looks more linear?
Clearly the log-linear transformation (x vs ln(y)) has a linear shape, whereas the log-log transformation (ln(x) vs ln(y)) does not.
Hence an exponential model (y = ae^(kx)) will fit the raw data better than a power model (y = ax^n).
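The comparison in (c) is made by eye on the two graphs, but it can also be summarised numerically. The sketch below is only an illustration: the data are made up (roughly exponential decay with small deviations), NOT the textbook's share prices, and r_squared is a hypothetical helper written for this example.

```python
import math

# Illustrative data only: roughly exponential decay with small deviations
xs = [1, 2, 3, 4, 5]
wobble = [1.02, 0.97, 1.03, 0.99, 1.01]
ys = [325.6 * math.exp(-0.9993 * x) * w for x, w in zip(xs, wobble)]

def r_squared(u, v):
    """Square of the correlation coefficient between lists u and v."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_vv = sum((b - mean_v) ** 2 for b in v)
    return s_uv ** 2 / (s_uu * s_vv)

ln_xs = [math.log(x) for x in xs]
ln_ys = [math.log(y) for y in ys]

r2_log_linear = r_squared(xs, ln_ys)    # tests the exponential model
r2_log_log = r_squared(ln_xs, ln_ys)    # tests the power model
print(f"x vs ln(y):     R^2 = {r2_log_linear:.4f}")
print(f"ln(x) vs ln(y): R^2 = {r2_log_log:.4f}")
```

R² is only a summary. The shape of each transformed scatter plot (and any pattern in the residuals) should still be examined before choosing a model.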
(d) Calculate the values of the constants, and hence write down an equation which models the weekly share price.
[Graph: x vs ln(y), with fitted line y = -0.9993x + 5.7857]
We need to find the constants “a” and “k” in the exponential equation for the raw data: y = ae^(kx)
In terms of the original variables x and y, the equation on the log-linear graph is really:
ln(y) = -0.9993x + ln(a)
ln(a) = 5.7857 (from equation on graph), where ln(a) means log_e(a).
So a = e^5.7857 = 325.6, and k = -0.9993 (gradient).
So the model is: y = 325.6e^(-0.9993x)
where x = weeks since crash, and y = share price in cents.
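The back-transformation from the log-linear graph to the constants a and k can be sketched in Python. The gradient and intercept are the values read from the graph; the function name model is an illustrative choice.

```python
import math

gradient = -0.9993    # from the fitted line on the x vs ln(y) graph
intercept = 5.7857    # this is ln(a)

k = gradient
a = math.exp(intercept)    # undo the natural log to recover a

def model(x):
    """Estimated share price (cents) x weeks after the crash."""
    return a * math.exp(k * x)

print(f"a = {a:.1f}, k = {k}")
print(f"estimated price after 2 weeks: {model(2):.1f} cents")
```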
Now insert a scatter plot of the raw data (x vs y). Format and label it.
Then right-click on any data-point on the graph and fit a trendline:
- Choose exponential (since we’ve already chosen an exponential model).
What do you notice?
[Graph: Ariadne Share Price for the 5 weeks following the October 1987 crash. Horizontal axis: Weeks since October 1987 crash occurred. Vertical axis: Average weekly share price (cents). Exponential trendline: y = 325.6e^(-0.999x)]
Exponential model. Visually a good fit.
Look familiar? This is the same equation we formed algebraically based on the log-linear graph.
Just to be sure that a power function model wouldn’t be better, see what happens if we try to fit one…
[Graph: Ariadne Share Price for the 5 weeks following the October 1987 crash, with power trendline: y = 164.86x^(-2.42)]
Power function model. Shocker! It is visually evident that this model is a poor fit to the data.
So, the equation of the best model for this data is:
y = 325.6e^(-0.999x)
• Extension: Actually the most realistic model is probably: y = 325.6e^(-x)
Why? Because the index -0.9993x is very close to -1x.
General rule: If an index is very close to a multiple of 0.5, like this, it’s appropriate to round it when we give the final answer.
This is because our equation is only an estimated model. An element of uncertainty exists because our data used for this scatter plot is just a SAMPLE (i.e. from just one company).
We’re using the relationship found in this sample to make an INFERENCE about a relationship between these two variables in the population (i.e. behaviour of all share prices in the market over these 5 weeks).
For more practice if needed – Sigma (new) pg. 420: Ex. 20.03.
NuLake 7.6 Modelling Power Functions, Q39
Dist (d)    Volume (V)    ln(d)         ln(V)
0.45        1480          -0.798508     7.299797367
0.71        595           -0.34249      6.388561406
1.21        205           0.1906204     5.323009979
1.78        95            0.5766134     4.553876892
2.14        65            0.7608058     4.17438727
3.65        23            1.2947272     3.135494216
(b) Gradient (k) = -1.9932, Y-intercept (c) = 5.7045
[Graph: Intensity of Sound, ln(d) vs ln(V), with fitted line y = -1.9932x + 5.7045, R² = 1]
(c) Equation: y = ax^n
log_e(y) = log_e(a) + log_e(x^n)
log_e(y) = n log_e(x) + log_e(a)
Y = nX + c, where Y = log_e(y) and X = log_e(x)
Y = -1.9932X + 5.7045 from Excel log-log graph
So n = -1.9932 and c = log_e(a)
So a = e^c = e^5.7045 = 300 (nearest whole number)
So y = 300x^(-1.9932)
[Graph: Intensity of Sound, Dist. (d) vs Volume (V), with power trendline y = 300.22x^(-1.9932), R² = 1]
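The Excel steps for this question can be reproduced with a short Python sketch. The data are the six (d, V) pairs from the question; fit_line is a hypothetical helper implementing ordinary least squares, written here so that no extra libraries are needed.

```python
import math

# Distance (d) and volume (V) readings from the question
d = [0.45, 0.71, 1.21, 1.78, 2.14, 3.65]
V = [1480, 595, 205, 95, 65, 23]

def fit_line(xs, ys):
    """Least-squares straight line through (xs, ys): returns (gradient, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    gradient = s_xy / s_xx
    return gradient, mean_y - gradient * mean_x

# Log-log transformation, then a linear fit: ln(V) = n ln(d) + ln(a)
ln_d = [math.log(x) for x in d]
ln_V = [math.log(y) for y in V]
n_exponent, c = fit_line(ln_d, ln_V)
a = math.exp(c)    # back-transform the intercept

print(f"n = {n_exponent:.4f}, c = {c:.4f}, a = {a:.2f}")
print(f"model: V = {a:.0f} d^({n_exponent:.4f})")
```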
Choosing & justifying a model(s): Other ways of justifying your choice in a Scholarship question
Justifying choice of model – consider the nature of the variables:
The nature of the variables may suggest the type of model.
[Graphs: y = Ax^n where n is positive; y = Ax^n where n is negative; y = Ae^(kx) where k is positive; y = Ae^(kx) where k is negative.]
Power model essential:
When the physical properties of the variables under investigation demand the use of a power model:
[Graphs: y = Ax^n where n is positive; y = Ax^n where n is negative; or a root function: y = Ax^n where n is between 0 and 1.]
Example 1: If the variables are Length L cm (explanatory variable) and Mass M grams (response variable) of members of the same species, then we can conclude 2 things about any valid model for this relationship:
1. That it must have a y-intercept (M-intercept) of (0,0), since a living organism will have no mass if it has no length! This means that we require a power function with a positive exponent. Exponential functions don’t pass through (0,0).
2. That if it is assumed that the species grows uniformly in each of its 3 dimensions, then a power model of the form M = aL^3 would be appropriate.
Example 2: If dropping objects of equal mass from different heights (h cm), and observing the time taken to reach the ground (T secs.), then the constant increase in speed (acceleration due to gravity of 9.8 m/s^2) means we can conclude 2 things about any valid model:
1. That it must have a y-intercept (T-intercept) of (0,0), since it would take 0 time to drop from a height of 0 cm. So, again, this would require a power function with a positive exponent.
2. That the relationship must be positive but flattening out, since the greater the height from which an object is dropped, the longer it takes to reach the ground (more distance to cover), but reaching a greater speed. Hence we’d expect something like a square root function, i.e. of the form T = Ah^0.5.
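The square-root form in the falling-object example can be checked against the standard constant-acceleration formula h = ½gT², which rearranges to T = √(2h/g). The sketch below assumes no air resistance and converts g to cm/s² to match heights in cm; the names are illustrative.

```python
import math

G_CM_PER_S2 = 980.0    # 9.8 m/s^2 expressed in cm/s^2 (assumes no air resistance)

def fall_time(h_cm):
    """Time in seconds to fall h_cm from rest, from h = (1/2) g T^2."""
    return math.sqrt(2 * h_cm / G_CM_PER_S2)

# The same relationship written as a power function T = A h^0.5
A = math.sqrt(2 / G_CM_PER_S2)

for h in [0, 50, 100, 200, 400]:
    print(f"h = {h:>3} cm   T = {fall_time(h):.3f} s   A*h^0.5 = {A * h ** 0.5:.3f} s")
```

The table passes through (0,0) and flattens out: quadrupling the height only doubles the time.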
Exponential model essential:
[Graphs: y = Ae^(kx) where k is positive; y = Ae^(kx) where k is negative.]
A relationship will be exponential if the change is multiplicative.
Multiplicative means that there is a constant percentage change.
The response variable must have a finite initial value (i.e. a y-intercept) that is being multiplied repeatedly by a constant.
E.g. An initial investment of $1000 earns compound interest, and increases by 4.5% every year (i.e. × 1.045 every year).
E.g. The temperature of a bowl of soup is initially 70°C. It then drops by 10% every minute until it approaches room temperature (i.e. × 0.9 every minute).
E.g. Radioactive decay – always by a constant percentage per unit of time. Therefore it would be modelled by an exponential curve.
Whereas, for power curves, the proportional change varies with time (e.g. parabolas, cubics etc.).
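The difference between constant and varying proportional change can be seen by comparing successive ratios. This sketch uses the $1000 investment example for the exponential case and, as an illustrative assumption, the parabola y = x² for the power case.

```python
# Exponential: $1000 growing by 4.5% per year
investment = [1000 * 1.045 ** t for t in range(6)]
ratios_exponential = [investment[t + 1] / investment[t] for t in range(5)]

# Power function (illustrative choice): y = x^2 for x = 1, 2, ..., 6
parabola = [x ** 2 for x in range(1, 7)]
ratios_power = [parabola[i + 1] / parabola[i] for i in range(5)]

print("exponential ratios:", [round(r, 4) for r in ratios_exponential])
print("power ratios:      ", [round(r, 4) for r in ratios_power])
```

The exponential ratios are all 1.045, while the power-function ratios keep changing (4, 2.25, 1.78, ...).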
Also (for Calculus students), you’ll know that the derivative of an exponential function is proportional to its y-value:
If y = e^x, then dy/dx = e^x, i.e. dy/dx = y.
So if y = Ae^(kx), then dy/dx = kAe^(kx), i.e. dy/dx = ky. So dy/dx is proportional to y.
Therefore, a relationship will be exponential if the value on the y-axis is always changing at a rate that is proportional to its value at that instant.
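The proportionality between dy/dx and y can be checked numerically. This is a sketch only: the constants are borrowed from the share-price model for illustration, and derivative is a hypothetical helper using a central-difference approximation.

```python
import math

A, k = 325.6, -0.9993    # illustrative constants for y = Ae^(kx)

def y(x):
    return A * math.exp(k * x)

def derivative(f, x, h=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# If dy/dx = ky, then (dy/dx) / y should equal k at every x
ratios = [derivative(y, x) / y(x) for x in [0, 1, 2, 5]]
print(ratios)
```

Each ratio is (approximately) k, whatever x-value is chosen.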
For practice of identifying this property of Exponential Functions, do a selection of questions from Old Sigma – Ex. ?, pg. ?
Choosing & justifying a model(s): Things to write about for Excellence.
VALIDITY OF A MODEL:
The extent to which the behaviour of your mathematical model reflects the relationship between your 2 variables.
What makes a model valid/invalid? Will your mathematical curve estimate sensible y-values throughout the domain of x-values in which you need to use it?
– One function may model a relationship well for some domains of the explanatory variable but not for others.
– For example, many functions will estimate unrealistic values for long-range extrapolations (always a risky business). However, such a function may still be valid for interpolation or short-range extrapolation.
Your job is to clearly specify the domain of x-values for which your model is valid (e.g. 0 < x < 7, where x takes whole-number values), and to explain why it is not valid elsewhere.
Then, you could consider splitting the data into piece-wise form and fitting a different function to the x-value(s) for which your original function is not valid.
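To see why long-range extrapolation is risky, the sawmill model y = 16 589e^(0.2424x) from earlier can be evaluated well beyond its ten years of data. This is a sketch: the function name and the choice of x-values are illustrative, and it assumes x counts years from the start of the data.

```python
import math

def truckloads(x):
    """Fitted exponential model for truckloads delivered in year x."""
    return 16589 * math.exp(0.2424 * x)

for x in [1, 5, 10, 20, 50]:
    print(f"year {x:>2}: {truckloads(x):,.0f} truckloads")
```

The estimates inside the ten-year range are plausible, but by year 50 the model predicts billions of truckloads per year, which no sawmill could handle. The model's valid domain should therefore be stated as roughly the range of the data.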
Choosing & justifying a model(s): Things to write about for Excellence.
KEY: Plot your chosen model as a graph on your GC. Examine its features:
Use the GRAPHS option on your GC, selecting “draw” to sketch your chosen function to check its key features and “G-solve” to locate intercepts etc.
Things to mention: Intercepts, asymptotes, maxima, minima.
You will need to comment on any of these and how they model (or don’t model) the behaviour of the relationship between your explanatory and response variables.
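If a GC is not to hand, the same features can be located numerically. The following is a rough sketch only: the quadratic model and the helper `bisect_root` are hypothetical, chosen so that the y-intercept, an x-intercept and the maximum are easy to verify by hand.

```python
def f(x):
    # Hypothetical quadratic model, for illustration only.
    return -x**2 + 6 * x + 7


def bisect_root(func, lo, hi, tol=1e-9):
    """Locate an x-intercept in [lo, hi], assuming func changes sign there."""
    f_lo = func(lo)
    if f_lo * func(hi) > 0:
        raise ValueError("func must change sign between lo and hi")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = func(mid)
        if f_lo * f_mid <= 0:
            hi = mid               # the sign change is in the left half
        else:
            lo, f_lo = mid, f_mid  # the sign change is in the right half
    return (lo + hi) / 2


y_intercept = f(0)                    # value of the model at x = 0
x_intercept = bisect_root(f, 3, 10)   # f(3) > 0 and f(10) < 0
grid = [i / 100 for i in range(0, 1001)]  # x from 0 to 10 in steps of 0.01
x_at_max = max(grid, key=f)           # grid point with the largest f(x)

print(y_intercept, round(x_intercept, 6), x_at_max, f(x_at_max))
```

Each number found this way still needs a comment in context, e.g. whether a maximum or an x-intercept makes sense for the response variable.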
Asymptotes:
You need to be aware of these when discussing your
model:
– Exponential curves have a horizontal asymptote (this will be the x-axis itself unless the graph is translated vertically). However, exponential functions do not have a vertical asymptote, i.e. despite getting steeper and steeper, they still exist (are continuous) for all values of x.
[Graphs: \( y = Ae^{kx} \) where k is positive; \( y = Ae^{kx} \) where k is negative.]
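The horizontal asymptote can be seen numerically. This is a small sketch with assumed values A = 5 and k = -0.5 (not from the slides): the estimates shrink towards 0 without ever reaching it, and the function still returns a value for any x, including negative ones.

```python
import math


def exponential(x, A=5.0, k=-0.5):
    # y = A * e^(k x); A and k are illustrative values only.
    return A * math.exp(k * x)


decay_values = [exponential(x) for x in (0, 10, 20, 40)]
for x, y in zip((0, 10, 20, 40), decay_values):
    print(x, y)  # y gets closer and closer to 0 but stays positive

# No vertical asymptote: the curve is defined to the left of the y-axis too.
value_left_of_axis = exponential(-3)
print(value_left_of_axis)
```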
– Power curves with positive exponents have no asymptotes.
– Power curves with negative exponents will have two asymptotes: the x- and y-axes themselves! Hence there is no y-intercept or x-intercept.
[Graphs: \( y = Ax^n \) where n is positive; \( y = Ax^n \) where n is negative.]
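A quick numerical check of the two asymptotes for a negative exponent, using assumed values A = 2 and n = -1 (illustrative only): the estimate blows up as x approaches 0, shrinks towards 0 for large x, and cannot be evaluated at x = 0 at all.

```python
def power_curve(x, A=2.0, n=-1.0):
    # y = A * x^n; A and n are illustrative values only.
    return A * x ** n


near_zero = power_curve(0.001)  # very large: the y-axis is an asymptote
far_right = power_curve(1000)   # very small: the x-axis is an asymptote
print(near_zero, far_right)

try:
    power_curve(0)  # no y-intercept: the model is undefined at x = 0
except ZeroDivisionError as err:
    print("undefined at x = 0:", err)
```

If x = 0 is a realistic value of the explanatory variable, this is a validity problem worth writing about.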
In summary, when choosing a model and (later on) discussing its validity, what should you do?
Answer: Look at your chosen type of curve on your graph (plot it on your GC). Look at the behaviour of the graph:
– What is its y-intercept, and does this suit your variables?
– Does your response variable have any limiting values that it physically cannot go above (or below), i.e. ‘ceilings’ or ‘floors’?
– Does your model reflect the behaviour of the relationship between these 2 variables for all possible x-values?
– Are there any x-values you could substitute in where you would get an unrealistic y-value estimate? Explain.
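The last two questions can be explored by running candidate x-values through the model and flagging any estimate that breaks a known floor or ceiling. This is a sketch under stated assumptions: the helper `unrealistic_inputs` and the model y = 20 + 15x for a percentage (floor 0, ceiling 100) are made up for illustration.

```python
def unrealistic_inputs(model, xs, floor=None, ceiling=None):
    """Return the x-values whose estimate falls below floor or above ceiling."""
    flagged = []
    for x in xs:
        y = model(x)
        too_low = floor is not None and y < floor
        too_high = ceiling is not None and y > ceiling
        if too_low or too_high:
            flagged.append(x)
    return flagged


def percentage_model(x):
    # Hypothetical model for a percentage, for illustration only.
    return 20 + 15 * x


flagged = unrealistic_inputs(percentage_model, range(0, 11), floor=0, ceiling=100)
print(flagged)  # x-values where the estimate exceeds 100%
```

The flagged values suggest where the valid domain should end; the reason the response variable cannot exceed its ceiling still needs to be explained in context.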
Piecewise functions
Where a graph is split into different sections. A different equation is used for different domains along the x-axis.
Key words that you must know:
DOMAIN: interval along the x-axis (explanatory variable).
RANGE: interval along the y-axis (response variable).
E.g. A telephone account could be charged at a flat rate of $30 for the first 100 calls per month, and then at 20c for each extra call. Represent this as a piecewise function and draw its graph. Find the cost to the subscriber of 125 calls per month.
For the first 100 calls, the function is:
\( f(x) = 30 \) for the domain \( 0 \le x \le 100 \), where x is a whole number.
For x > 100, the extra calls begin to be charged individually,
i.e. the (x − 100) calls beyond the first 100 are charged at $0.20 each, but the first 100 cost $30, so:
\( f(x) = 30 + 0.2(x - 100) \) for \( x > 100 \), where x is a whole number.
The piecewise function is
\[
f(x) =
\begin{cases}
30, & 0 \le x \le 100 \\
30 + 0.2(x - 100), & x > 100
\end{cases}
\quad \text{where } x \text{ is a whole number.}
\]
Graph: [f(x) plotted against x, with 30 marked on the y-axis and 100 marked on the x-axis.]
For 125 calls: 125 is greater than 100, so use the second formula
f(x) = 30 + 0.2(x – 100). Substitute x = 125.
Do NuLake pg. 338: Q3032
HW: Old Sigma (2nd ed): p302 – Ex. 17.5.
Sigma Mathematics Workbook © Pearson Education New Zealand 2007
20.05 A telephone account could be charged at a flat rate of $30 for the first 100 calls per month, and then at 20c for each extra call. Find the cost to the subscriber of 125 calls per month.
\[
f(x) =
\begin{cases}
30, & 0 \le x \le 100,\ x \in W \\
30 + 0.2(x - 100), & x > 100,\ x \in W
\end{cases}
\]
For 125 calls: 125 is greater than 100, so use the second formula
f(x) = 30 + 0.2(x – 100)
f(125) = 30 + 0.2(125 – 100)
= 30 + 5
= $35
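The telephone example can also be checked with a short Python sketch. The function name `monthly_cost` is an assumption made for illustration; the two branches follow the piecewise function above, with x restricted to whole numbers.

```python
def monthly_cost(calls):
    """Cost in dollars of a month's calls under the flat-rate-plus-extras plan."""
    if calls < 0 or calls != int(calls):
        raise ValueError("the number of calls must be a whole number")
    if calls <= 100:
        return 30.0                    # flat rate covers the first 100 calls
    return 30.0 + 0.2 * (calls - 100)  # 20c for each call beyond 100


print(f"${monthly_cost(125):.2f}")  # prints $35.00
```

Evaluating either side of x = 100 (e.g. 100 and 101 calls) is a quick way to confirm that the two pieces join up where the domains meet.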