Year 13 Mathematics
Contents
NuLake Ltd
Innovative Publisher of Mathematics Texts
Robert Lakeland & Carl Nugent
Bivariate Data
• Achievement Standard .......... 2
• Bivariate Data .......... 3
• Scatter Plots .......... 4
• Relationships .......... 7
• Correlation Coefficient .......... 12
• Regression Analysis .......... 24
• Outliers .......... 35
• Residuals .......... 40
• Residual Plots .......... 41
• Causation .......... 46
• Non-Linear Regression .......... 47
• Bivariate Investigation .......... 52
• Practice Internal Assessment .......... 54
• Answers .......... 57
IAS 3.9
IAS 3.9 – Year 13 Mathematics and Statistics – Published by NuLake Ltd New Zealand © Robert Lakeland & Carl Nugent
2 IAS 3.9 – Bivariate Data
This achievement standard involves students investigating bivariate measurement data.
◆ This achievement standard is derived from Level 8 of The New Zealand Curriculum and is related to the achievement objectives
Carry out investigations of phenomena, using the statistical enquiry cycle
❖ using existing data sets
❖ finding, using, and assessing appropriate models (including linear regression for bivariate data), seeking explanations, and making predictions
❖ using informed contextual knowledge and statistical inference
❖ communicating findings and evaluating all stages of the cycle
in the Statistics strand of the Mathematics and Statistics Learning Area.
◆ Investigate bivariate measurement data involves showing evidence of using each component of the statistical enquiry cycle.
◆ Investigate bivariate measurement data, with justification involves linking components of the statistical enquiry cycle to the context, and referring to evidence such as statistics, data values, trends, or features of visual displays in support of statements made.
◆ Investigate bivariate measurement data, with statistical insight involves integrating statistical and contextual knowledge throughout the investigation process, and may include reflecting about the process; considering other relevant variables; evaluating the adequacy of any models, or showing a deeper understanding of the models.
◆ Using the statistical enquiry cycle to investigate bivariate measurement data involves:
❖ posing an appropriate relationship question using a given multivariate data set
❖ selecting and using appropriate display(s)
❖ identifying features in the data
❖ finding an appropriate model
❖ describing the nature and strength of the relationship and relating this to the context
❖ using the model to make a prediction
❖ communicating findings in a conclusion.
Measurement data can either be discrete or continuous in nature. In regression analysis the y-variable, or response variable, must be a continuous variable. The x-variable or explanatory variable can be either a discrete or continuous variable. The relationship may be non-linear.
◆ Use and interpretation of R² is not expected at this level.
Achievement
• Investigate bivariate measurement data.
Achievement with Merit
• Investigate bivariate measurement data, with justification.
Achievement with Excellence
• Investigate bivariate measurement data, with statistical insight.
NCEA 3 Internal Achievement Standard 3.9 – Bivariate Data
Bivariate Data
Introduction
In statistics we are often interested in identifying relationships involving more than one variable. In this booklet we study Bivariate Data which deals with the study of two quantitative variables (pairs of variables) and identifying relationships between them. We aim to identify the relationship between the variables, graphically with the use of a scatter plot and quantitatively by looking at correlation (the degree to which two or more quantities are linearly associated). In addition we use regression to enable us to make quantitative predictions (interpolation) of one variable from the other.
Much of the emphasis in this Achievement Standard is on the visual interpretation of scatter plots as well as linking statistical knowledge to the context of the question and using appropriate reasoning and reflection. To this end we are suggesting that students have access to iNZight, “a simple data analysis system which encourages exploring what data is saying without the distractions of driving complex software” (iNZight website). iNZight can be downloaded from http://www.stat.auckland.ac.nz/~wild/iNZight/dlw.html and can be installed on either a Mac or Windows computer.
All data sets used in this booklet can be downloaded from our website at www.nulake.co.nz under the ‘Downloads’ link and imported directly into iNZight or Excel.
When we draw a scatter plot the shape of the plot is usually defined as linear or non-linear (curved). A correlation exists between two variables when one of them is related to or can be influenced by the other in some way. The strength of a relationship can be identified visually, on a scatter plot, by how tightly clustered or spread out the plotted points are, and the direction of the relationship can be identified by the general slope of the pattern of the data. Study the scatter plots below and note the visual description of shape, strength and direction of the relationship between x and y.
Relationships
Inspecting a Scatter Plot
[Five scatter plots of y against x, captioned:]
• linear, strong positive relationship
• linear, moderate positive relationship
• non-linear, strong positive relationship
• non-linear, moderate negative relationship
• no relationship
When we are given a scatter plot and asked to comment on the relationship between the variables our first approach is a visual one. The first step is to make sure we understand the precise meaning of the variables being used as well as the units applied to the variables. Sometimes it may be necessary to research the meaning of the variables so that you have a better understanding prior to investigating a possible relationship.
Next look at the degree of scatter of the points. Are the points clumped together or spread out? Are the points in a number of distinct clusters or is there just a single cluster of points with one or two points at the extremes of the scatter plot? Are there any unusual observations that go against the general trend of the scatter plot? Are these points still realistic, i.e. can they be explained? What is the variation of scatter in the scatter plot? Are the points close to each other or are there significant gaps between them? Do the points follow a discernible pattern or trend, e.g. following a straight line?
Correlation
[Four scatter plots of y against x, captioned:]
• linear, weak positive relationship
• linear, strong negative relationship
• linear, moderate negative relationship
• linear, weak negative relationship
An outlier is one or more points that do not follow the trend.
When determining whether a relationship exists between two variables we also want to be able to measure the strength and direction of a relationship quantitatively, and to do this for linear relationships we can calculate the (linear) correlation coefficient (r). This is explained in detail on page 12.
[Scatter plot of y against x, captioned: strong relationship with outliers]
Example
A heater is turned on in a cold room and the temperature (°C) of the room is recorded every five minutes. The results are given below. Find the least squares regression line using iNZight.

Time (x)    °C (y)
0           0.3
5           1.8
10          3.7
15          5.4
20          8.1
25          10.2
30          11.4
35          13.8
40          15.1

Import the data set ‘Example’ into iNZight and then draw a scatter plot of time versus temperature.
Add a linear trend line (regression line) to the scatter plot by clicking on the button ‘Add to Plot’ and then on the radio button ‘Add trend curves’. Click on the checkbox ‘linear’ followed by ‘Show Changes’ and ‘Done’.
To find the regression line click on the ‘Get Summary’ button. Both the correlation coefficient and the regression line are displayed in the window.
The regression line is written in terms of the variable names rather than x and y, i.e. Degrees.C (y) = 0.38467 * Time.mins (x) + 0.06
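If you want to check the iNZight output by hand, the short Python sketch below fits the same least squares line to the heater data using the usual formulas (gradient = Sxy / Sxx, intercept = mean of y minus gradient times mean of x). It is only an illustration and is not part of iNZight; the names time_mins, degrees_c and least_squares_line are our own.

```python
# Heater example: time in minutes (x) and room temperature in degrees C (y).
time_mins = [0, 5, 10, 15, 20, 25, 30, 35, 40]
degrees_c = [0.3, 1.8, 3.7, 5.4, 8.1, 10.2, 11.4, 13.8, 15.1]


def least_squares_line(xs, ys):
    """Return (gradient, intercept) of the least squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sxy and Sxx: sums of products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x
    return gradient, intercept


gradient, intercept = least_squares_line(time_mins, degrees_c)
print(f"Degrees.C = {gradient:.5f} * Time.mins + {intercept:.2f}")
```

The gradient and intercept should agree with the values in the iNZight summary, to the number of decimal places shown there.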
Achievement – Answer the following questions.
48. Download the data set called Question48.csv from under the ‘Downloads’ link on our website (www.nulake.co.nz). Import the file into iNZight.
a) Using iNZight produce a scatter plot of x versus y. Visually describe your scatter plot and identify any features that stand out, e.g. strength (degree of scatter), groupings (clusters), unusual observations, and the variation of scatter etc.
Comment on the linear correlation coefficient for your scatter plot and find the least squares regression line.
Predict the value of y when x = 5.0.
b) Import the file Question48b.csv which is the same data set as above with some additional points included. Draw a scatter plot of the data. Is there any point that you could consider a possible outlier? If so identify the point.
c) What effect does your chosen point in b) have on the linear correlation coefficient and regression line?
d) Would your chosen point in part b) be better identified as an influential point rather than an outlier or both? Justify your answer.
e) Import the file Question48e.csv which is the same data set as above with some additional points included. Draw a scatter plot of the data. Is there any point that you could consider a possible outlier? If so identify the point.
f) What effect does your chosen point in e) have on the linear correlation coefficient and regression line?
g) Would your chosen point in part e) be better identified as an influential point or an outlier or both or neither? Justify your answer.
Example
Open the data set Question16 in Excel, which comprises the mean lifespan (in years), metabolic rate (cm³/g/h), gestation period (days) and brain weight (g) of 27 mammals. By either using the Analysis Toolpak which can be installed using ‘Add-Ins’ for Excel 2010 or StatPlus mac LE which is a separate program that can be downloaded from www.analystsoft.com/en/products/statplusmacle/ for Mac, produce a plot of the residuals for gestation period versus lifespan of the 27 mammals and comment on what the plot tells us about using a linear model for the association between these two variables.
If you are using StatPlus mac LE on a Mac begin by booting the application StatPlus. When StatPlus boots it will also boot Excel. Open a new worksheet in Excel.
Using Excel open the data set Question16.csv. Copy the two columns C and D, i.e. Lifespan and Gestation Period from the Question16.csv spreadsheet into columns A and B of your blank worksheet. Now close the Question16.csv spreadsheet.
In the StatPlus application choose the menu option ‘Statistics’, then ‘Regression’, then ‘Multiple Linear Regression’. The following window appears.
Click on the button to the right of the ‘Dependent variable’ field. Remember the dependent variable is the response or y variable. The Excel window will be activated so you can select the cells representing the dependent variable of our data pair, i.e. ‘Lifespan’. When you return to StatPlus [Workbook1]Sheet1!$A$1:$A$28 should appear.
Click on the button to the right of the ‘Independent variable’ field. The independent variable is the explanatory or x variable. The Excel window will be activated so you can select the cells representing the independent variable of our data pair, that is, ‘Gestation Period’. When you return to StatPlus [Workbook1]Sheet1!$B$1:$B$28 should appear.
Now click the checkboxes ‘Plot residuals vs fitted’ and ‘Plot line fit’ followed by ‘OK’.
Generated are the residuals, a plot of the residuals and a scatter plot in an Excel spreadsheet. The spreadsheet includes a large amount of information including the least squares regression line (Lifespan = 9.44087 + 0.10247 * Gestation period), the linear correlation coefficient (r = 0.69011) as well as a table of residuals (see below and the next page).
The predicted y values are those generated by substituting the appropriate gestation period into the least squares regression line. For observation 1 the gestation period is 14. Substituting this into Lifespan = 9.44087 + 0.10247 * Gestation period gives us a predicted value, i.e. Lifespan (y) of 10.8754. The observed point for a gestation period of 14 days is a lifespan of 50 years so the difference between 50 and 10.8754, i.e. 39.1246 is our residual.
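The residual calculation for observation 1 can be written as a few lines of Python. This is a sketch for checking the arithmetic only; it uses the rounded coefficients quoted in the text, so the last decimal place may differ slightly from the spreadsheet, which works with unrounded values. The function name predicted_lifespan is our own.

```python
def predicted_lifespan(gestation_days):
    """Predicted lifespan (years) from the quoted least squares regression line."""
    return 9.44087 + 0.10247 * gestation_days


# Observation 1: gestation period of 14 days, observed lifespan of 50 years.
observed = 50
predicted = predicted_lifespan(14)

# Residual = observed value minus predicted value.
residual = observed - predicted

print(f"Predicted lifespan: {predicted:.3f} years")
print(f"Residual: {residual:.3f} years")
```

A large positive residual like this one tells us the observed lifespan sits well above the regression line for that gestation period.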
Instructions for StatPlus mac LE are given below. See the following page for information on the instructions for using the Analysis ToolPak.
[Screenshot of the output with the correlation coefficient and regression equation labelled]
Non-Linear Regression cont...
An alternative to fitting a logarithmic model could be to investigate using a negative parabola, especially if you are using iNZight where you have limited options (see below).
The equation for the quadratic trend is Strength = 1016.18 Day – 19.35 Day² + 12520.5 kPa which gives a strength of 14475 kPa for 2 days curing and a strength of 20747 kPa for 10 days curing.
Comparing the three trends (linear, logarithmic and quadratic), the linear model will become less reliable in the long term (> 25 days) as it will continue to increase at a constant rate over time. The quadratic will also become less reliable because it will begin to decrease once the maximum point of the quadratic is reached. In the long term the logarithmic model is likely to be the best fit as it keeps increasing but at an ever slower rate. Another option is to adopt a piecewise function which is a combination of the different functions over different subsets of x.
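To see why the quadratic model cannot be trusted far beyond the data, we can evaluate it and locate its maximum point. The Python sketch below uses only the coefficients quoted above; the function name strength_kpa is our own, and the turning point comes from the standard result that a parabola ax² + bx + c turns at x = –b/(2a).

```python
def strength_kpa(day):
    """Strength (kPa) predicted by the quoted quadratic trend."""
    return 1016.18 * day - 19.35 * day ** 2 + 12520.5


# Predictions quoted in the text for 2 and 10 days of curing.
print(round(strength_kpa(2)), "kPa after 2 days")
print(round(strength_kpa(10)), "kPa after 10 days")

# Turning point of the parabola: a = -19.35, b = 1016.18, so Day = -b / (2a).
turning_day = 1016.18 / (2 * 19.35)
print(f"Model reaches its maximum at about day {turning_day:.1f}")

# Beyond the turning point the model predicts falling strength.
print(strength_kpa(30) < strength_kpa(turning_day))
```

The maximum falls a little after day 26, so any prediction beyond that point has the modelled strength decreasing with further curing.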
iNZight at present only allows quadratic and cubic models for non-linear functions. If you require exponential, logarithmic or power models the authors suggest you use Excel. You can start from the scatter plots produced by the Analysis Toolpak or StatPlus mac LE.
Page 10 Q16. cont...
g) There appears to be a moderate positive linear association between the two variables. Mammals with a longer gestation period have a longer lifespan. Two unusual observations are the Echidna which has a gestation period of only 14 days but a long lifespan of 50 years and man with a gestation period of 267 days and a very long lifespan of 86 years. Most points fall within the gestation range of 0 to < 200 days and lifespan of 0 to 20 years.
h) Metabolic rate (explanatory) and gestation period (response). Investigating how metabolic rate predicts gestation period not the other way round.
i)
Page 11 Q16. cont...
j) They have a low gestation period.
k) Not an obvious linear one although greater metabolic rate does result in a lower gestation period. More likely a negative non-linear relationship.
Page 11 Q16. cont...
l) There appears to be a weak negative non-linear association between the two variables. Mammals with a greater metabolic rate have a shorter gestation period. Most points are clustered in the metabolic range 0 to < 1.0 and gestation period of 0 to < 300 days. Extreme points are those of the Asian elephant, little brown bat and E. American mole although they would still fit a non-linear model.
m)
n) Gestation period is a reasonable predictor of lifespan although man and the E. American mole go against the trend. A linear model could be suitable for these two variables.
Metabolic rate versus lifespan would be better suited to a non-linear model. Again the E. American mole goes against the trend. A suitable non-linear model would probably be a good predictor of lifespan.
With brain weight versus lifespan a linear model is affected by the Asian elephant and E. American mole.
Page 11 Q16 n) cont...
Overall gestation looks like the better predictor from a linear perspective but metabolic rate from a non-linear perspective. Brain weight is not as good as the other two.
Page 16
17. r = –0.912
18. r = 0.975
19. r = 0.920
Page 17 (Answers may vary)
20. 1. Constant difference.
21. Close to –1. The older a car the less its value.
22. Somewhat negative. Less well educated people smoke.
23. Close to 1. People with big feet are generally taller.
24. Somewhat negative. More cigarettes smoked the lower the level of fitness.
25. –1. The more you spend the less you save.
26. Close to 1. Tall men often choose tall wives.
27. Somewhat positive. More land area, greater price.
x      y       xy       x²     y²
5      40.1    200.5    25     1608.01
15     32.2    483      225    1036.84
18     35.1    631.8    324    1232.01
20     34.3    686      400    1176.49
25     23.6    590      625    556.96
30     26.9    807      900    723.61
38     24.1    915.8    1444   580.81
50     20.0    1000     2500   400
201    236.3   5314.1   6443   7314.73

x      y       xy       x²     y²
1      3.1     3.1      1      9.61
2      4.4     8.8      4      19.36
3      7.2     21.6     9      51.84
4      6.6     26.4     16     43.56
5      15      75       25     225
6      14.1    84.6     36     198.81
9      20.3    182.7    81     412.09
10     25.3    253      100    640.09
40     96.0    655.2    272    1600.36
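The bottom row of each table holds the column totals, and these are all that is needed to calculate r. The Python sketch below assumes the usual computational form of the correlation coefficient, r = (nΣxy – ΣxΣy) / √[(nΣx² – (Σx)²)(nΣy² – (Σy)²)], and applies it to the totals of the two tables (n = 8 points in each). The function name correlation_from_totals is our own, and matching the tables to answers 17 and 18 is an inference from the r values.

```python
from math import sqrt


def correlation_from_totals(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Linear correlation coefficient r calculated from column totals."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Totals taken from the last row of the first and second tables.
r_first = correlation_from_totals(8, 201, 236.3, 5314.1, 6443, 7314.73)
r_second = correlation_from_totals(8, 40, 96.0, 655.2, 272, 1600.36)

print(f"First table:  r = {r_first:.3f}")
print(f"Second table: r = {r_second:.3f}")
```

Rounded to three decimal places these agree with the values given for answers 17 and 18.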