
Math 385/585 Applied Regression Analysis · 2017-09-01

Math 385/585 Applied Regression Analysis
Fall 2017
Section 001, 1:50 to 2:50 M W F

Instructor: Dr. Chris Edwards
Phone: 948-3969
Office: Swart 123
Classroom: Swart 3

Text: Applied Linear Statistical Models, 5th edition, by Kutner, Nachtsheim, Neter, and Li. Earlier editions of the text will likely be adequate, but you will have to allow for different page numbers and homework problem numbers.

Catalog Description: A practical introduction to regression emphasizing applications rather than theory. Simple and multiple regression analysis, basic components of experimental design, and elementary model building. Both conventional and computer techniques will be used in performing the analyses. Prerequisite: Math 201 or Math 301 and Math 256, each with a grade of C or better.

Course Objectives: Linear models in statistics are the backbone of many applications, including regression and ANOVA techniques. Math 385 focuses students on the regression aspect of modeling while Math 386 focuses students on the ANOVA aspect. In Math 385, students will learn how to calculate and interpret regression estimates, including parameter estimates, fits, and residuals, and will be able to perform statistical inference. In addition to simple linear regression, successful students will understand the issues introduced in multiple linear regression, including polynomial regression and non-linear regression. Finally, the student will be able to assess model adequacy and know methods to update and improve the model.

Upon successful completion of the course, students are expected to have the ability to complete the following:

• Identify and understand the components and assumptions for the standard linear regression model
• Use statistical inference on regression model coefficients, including confidence intervals and hypothesis tests
• Construct and interpret the ANOVA table for describing a linear regression model
• Calculate and analyze residuals from a regression model
• Perform diagnostics on a regression model, including assessing lack of fit
• Perform remedial measures such as transformations to improve a regression model
• Understand how linear algebra can be used to describe a multiple regression model
• Perform inference in multiple regression and understand how the increased number of dimensions adds complexity to the interpretations due to collinearity
• Understand how to fit polynomial regression models
• Know how to use indicator variables in regression models
• Be able to build a model from a pool of variables, using techniques such as Best Subsets and Stepwise Regression
• Identify outliers, in both the X and Y dimensions, in multiple regression models
• Understand the basics of non-linear regression, including Logistic Regression


Grading: Final grades are based on these 300 points:

Item       Topic                      Points    Tentative Date   Chapters
Exam 1     Simple Linear Regression   70 pts.   October 6        1 to 4
Exam 2     Multiple Regression I      70 pts.   November 13      5 to 8
Exam 3     Multiple Regression II     70 pts.   December 15      9 to 11, 13 and 14
Homework   15 Points Each             90 pts.

Homework: I will collect (around) 5 homework problems approximately once every other week. The due dates are listed on the course outline below. I suggest that you work together in small groups on the homework if you like; don’t forget that I am a resource for you to use. Often we will use computer software to perform our analyses; include printouts where appropriate, but please make your papers readable. In other words, I don’t want 25 pages of printout handed in if you can summarize it in two pages.

Office Hours: Office hours are times when I will be in my office to help you. There are many other times when I am in my office. If I am in and not busy, I will be happy to help. My office hours for Fall 2017 semester are 3:00 to 3:45 Monday and Wednesday, and 9:00 to 11:00 Tuesday.

Philosophy: I strongly believe that you, the student, are the only person who can make yourself learn. Therefore, whenever it is appropriate, I expect you to discover the mathematics we will be exploring. I do not feel that lecturing to you will teach you how to do mathematics. I hope to be your guide while we learn some mathematics, but you will need to do the learning. I expect each of you to come to class prepared to digest the day’s material. That means you will benefit most by having read each section of the text and the Day By Day notes before class.

My personal belief is that one learns best by doing. I believe that you must be truly engaged in the learning process to learn well. Therefore, I do not think that my role as your teacher is to tell you the answers to the problems we will encounter; rather I believe I should point you in a direction that will allow you to see the solutions yourselves. To accomplish that goal, I will find different interactive activities for us to work on. Your job is to use me, your text, your friends, and any other resources to become adept at the material. The Day By Day notes also include Skills that I expect you to attain.

Math 585 Expectations: Expectations for the graduate students are understandably more rigorous than for the undergraduate student. Students taking Math 585 will have an extra theoretical problem added to each homework, to be assigned during the semester. In addition, a final project worth 50 points will be due at the end of the semester. This project will involve a complete analysis of a data set, including model estimation, development, and validation.

Final grades are assigned as follows:

270 pts.           A   (90%)
260 pts.           A-  (87%)
250 pts.           B+  (83%)
240 pts.           B   (80%)
230 pts.           B-  (77%)
220 pts.           C+  (73%)
210 pts.           C   (70%)
200 pts.           C-  (67%)
190 pts.           D+  (63%)
180 pts.           D   (60%)
179 pts. or less   F
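The grading scale above is a simple threshold lookup. A minimal sketch, assuming a point total out of the 300 undergraduate points; the function name `letter_grade` and the example student are illustrative, not part of the syllabus:

```python
# Cutoffs copied from the grading scale above: (minimum points, letter grade).
GRADE_CUTOFFS = [
    (270, "A"), (260, "A-"), (250, "B+"), (240, "B"), (230, "B-"),
    (220, "C+"), (210, "C"), (200, "C-"), (190, "D+"), (180, "D"),
]

def letter_grade(points):
    """Return the letter grade for a point total out of 300."""
    for cutoff, letter in GRADE_CUTOFFS:
        if points >= cutoff:
            return letter
    return "F"  # 179 points or less

# Hypothetical student: 60 points on each of the three exams, 85 on homework.
print(letter_grade(3 * 60 + 85))
```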


Course Outline (class meets Monday, Wednesday, Friday):

Date               Day      Topic                                           Reading
Mon September 4             No Class
Wed September 6    Day 1    Introduction, Least Squares
Fri September 8    Day 2    Models                                          Sections 1.1 to 1.5
Mon September 11   Day 3    Estimation                                      Sections 1.6 to 1.8
Wed September 13   Day 4    Inference                                       Sections 2.1 to 2.3
Fri September 15   Day 5    Interval Estimates                              Sections 2.4 to 2.6
Mon September 18   Day 6    Homework 1 Due; ANOVA                           Section 2.7
Wed September 20   Day 7    GLM                                             Section 2.8
Fri September 22   Day 8    Residuals I                                     Sections 3.1 to 3.6
Mon September 25   Day 9    Residuals II                                    Sections 3.1 to 3.6
Wed September 27   Day 10   Lack of Fit                                     Section 3.7
Fri September 29   Day 11   Transformations                                 Sections 3.8 to 3.9
Mon October 2      Day 12   Homework 2 Due; Simultaneous Inference          Sections 4.1 to 4.3
Wed October 4      Day 13   Review
Fri October 6      Day 14   Exam 1
Mon October 9      Day 15   Intro to Matrices                               Sections 5.1 to 5.7
Wed October 11     Day 16   Regression Matrices                             Sections 5.8 to 5.13
Fri October 13     Day 17   Mult. Reg. Models                               Sections 6.1 to 6.2
Mon October 16     Day 18   Inference                                       Sections 6.3 to 6.6
Wed October 18     Day 19   Intervals                                       Section 6.7
Fri October 20     Day 20   Diagnostics                                     Section 6.8
Mon October 23     Day 21   Homework 3 Due; Extra SS                        Section 7.1
Wed October 25     Day 22   GLM Tests                                       Sections 7.2 to 7.3
Fri October 27     Day 23   Computational Problems and Multicollinearity    Sections 7.5 to 7.6
Mon October 30     Day 24   Polynomial Models                               Section 8.1
Wed November 1     Day 25   Interactions I                                  Section 8.1
Fri November 3     Day 26   Interactions II                                 Section 8.2
Mon November 6     Day 27   Dummy Variables I                               Sections 8.3 to 8.7
Wed November 8     Day 28   Dummy Variables II                              Sections 8.3 to 8.7
Fri November 10    Day 29   Homework 4 Due; Review
Mon November 13    Day 30   Exam 2
Wed November 15    Day 31   Model Building                                  Sections 9.1 to 9.3
Fri November 17    Day 32   Best Subsets                                    Sections 9.4 to 9.6
Mon November 20    Day 33   Diagnostics                                     Sections 10.1 to 10.2
Wed November 22             No Class
Fri November 24             No Class
Mon November 27    Day 34   X Outliers                                      Section 10.3
Wed November 29    Day 35   Homework 5 Due; Y Outliers                      Section 10.4
Fri December 1     Day 36   Trees                                           Section 11.4
Mon December 4     Day 37   Non-Linear Regression I                         Sections 13.1 to 13.2
Wed December 6     Day 38   Non-Linear Regression II                        Sections 13.3 to 13.4
Fri December 8     Day 39   Logistic Regression                             Sections 14.2 to 14.3
Mon December 11    Day 40   Homework 6 Due; Logistic Inference              Section 14.5
Wed December 13    Day 41   Review
Fri December 15    Day 42   Exam 3


Homework Assignments: (subject to change if we discover difficulties as we go)

Homework 1    Due September 18

1.19, p. 35

Grade Point Average. The director of admissions of a small college selected 120 students at random from the new freshman class in a study to determine whether a student’s grade point average (GPA) at the end of the freshman year ($Y$) can be predicted from the ACT test score ($X$). The results of the study follow. Assume that first-order regression model (1.1) is appropriate.

a.) Obtain the least squares estimates of $\beta_0$ and $\beta_1$, and state the estimated regression function.

b.) Plot the estimated regression function and the data. Does the estimated regression function appear to fit the data well?

c.) Obtain a point estimate of the mean freshman GPA for students with ACT test score $X = 30$.

d.) What is the point estimate of the change in the mean response when the entrance test score increases by one point?
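The least squares estimates asked for in problems like this one come from the usual closed-form formulas $b_1 = \sum (X_i - \bar{X})(Y_i - \bar{Y}) / \sum (X_i - \bar{X})^2$ and $b_0 = \bar{Y} - b_1 \bar{X}$. A minimal sketch in plain Python follows; the function name and the four-point data set are hypothetical stand-ins, not the GPA data from the text.

```python
def fit_simple_regression(x, y):
    """Return (b0, b1), the least squares intercept and slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical toy data, NOT the GPA data set from the text.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 8.0]

b0, b1 = fit_simple_regression(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

print("b0 =", b0, "b1 =", b1)
print("sum of residuals =", sum(residuals))  # zero up to rounding error
```

The slope `b1` is also the point estimate of the change in the mean response per one-unit increase in $X$, and `b0 + b1 * 30` would be the point estimate of the mean response at $X = 30$.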

1.23, p. 36

Refer to Grade Point Average Problem 1.19.

a.) Obtain the residuals $e_i$. Do they sum to zero in accord with (1.17)?

b.) Estimate $\sigma^2$ and $\sigma$. In what units is $\sigma$ expressed?

1.33, p. 37

Refer to the regression model $Y_i = \beta_0 + \varepsilon_i$ in Exercise 1.30. Derive the least squares estimator of $\beta_0$ for this model.

2.4, p. 90

Refer to Grade Point Average Problem 1.19.

Data for Grade Point Average Problem 1.19:

$i$:     1       2       3       …   118     119     120
$X_i$:   21      14      28      …   28      16      28
$Y_i$:   3.897   3.885   3.778   …   3.914   1.860   2.948

a.) Obtain a 99 percent confidence interval for $\beta_1$. Interpret your confidence interval. Does it include zero? Why might the director of admissions be interested in whether the confidence interval includes zero?

b.) Test, using the test statistic $t^*$, whether or not a linear association exists between student’s ACT score ($X$) and GPA at the end of the freshman year ($Y$). Use a level of significance of 0.01. State the alternatives, decision rule, and conclusion.

c.) What is the P-value of your test in part (b)? How does it support the conclusion reached in part (b)?

2.55, p. 97

Derive the expression for SSR in (2.51):

$$SSR = b_1^2 \sum (X_i - \bar{X})^2.$$

Homework 2    Due October 2

2.23, p. 93

Refer to Grade Point Average Problem 1.19.

a.) Set up the ANOVA table.

b.) What is estimated by MSR in your ANOVA table? By MSE? Under what condition do MSR and MSE estimate the same quantity?

c.) Conduct an $F$ test of whether or not $\beta_1 = 0$. Control the $\alpha$ risk at 0.01. State the alternatives, decision rule, and conclusion.

d.) What is the absolute magnitude of the reduction in the variation of $Y$ when $X$ is introduced into the regression model? What is the relative reduction? What is the name of the latter measure?

e.) Obtain $r$ and attach the appropriate sign.

f.) Which measure, $R^2$ or $r$, has the more clear-cut operational interpretation? Explain.
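The ANOVA quantities in this problem all follow from the decomposition SSTO = SSR + SSE for simple linear regression, with 1 and $n - 2$ degrees of freedom for regression and error. A rough sketch in plain Python, using a hypothetical four-point data set rather than the GPA data, and a hypothetical function name:

```python
def anova_simple_regression(x, y):
    """Return the ANOVA quantities for a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]

    ssto = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)           # explained by X
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # left unexplained
    msr = ssr / 1          # regression has 1 degree of freedom
    mse = sse / (n - 2)    # error has n - 2 degrees of freedom
    return {"SSR": ssr, "SSE": sse, "SSTO": ssto,
            "MSR": msr, "MSE": mse, "F": msr / mse, "R2": ssr / ssto}

# Hypothetical toy data, NOT the GPA data set from the text.
table = anova_simple_regression([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 5.0, 8.0])
for name, value in table.items():
    print(name, round(value, 4))
```

The statistic `F` would be compared with the $F(1 - \alpha; 1, n - 2)$ percentile from a table; the sketch deliberately leaves that lookup out.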

2.67, p. 99

Refer to Grade Point Average Problem 1.19.

a.) Plot the data, with the least squares regression line for ACT scores between 20 and 30 superimposed.

b.) On the plot from part (a), superimpose a plot of the 95 percent confidence band for the true regression line for ACT scores between 20 and 30. Does the confidence band suggest that the true regression relation has been precisely estimated? Discuss.

3.3, p. 146-147

Refer to Grade Point Average Problem 1.19.

a.) Prepare a box plot for the ACT scores $X_i$. Are there any noteworthy features in this plot?

b.) Prepare a dot plot of the residuals. What information does this plot provide?

c.) Plot the residuals $e_i$ against the fitted values $\hat{Y}_i$. What departures from regression model (2.1) can be studied from this plot? What are your findings?

d.) Prepare a normal probability plot of the residuals. Also obtain the coefficient of correlation between the ordered residuals and their expected values under normality. Test the reasonableness of the normality assumption here using Table B.6 and $\alpha = 0.05$. What do you conclude?

e.) Conduct the Brown-Forsythe test to determine whether or not the error variance varies with the level of $X$. Divide the data into the two groups, $X < 26$ and $X \geq 26$, and use $\alpha = 0.01$. State the decision rule and conclusion. Does your conclusion support your preliminary findings in part (c)?

f.) Information is given below for each student on two variables not included in the model, namely, intelligence test score ($X_2$).

3.21, p. 151

Derive the result in (3.29):

$$\sum_{j=1}^{c} \sum_{i=1}^{n_j} (Y_{ij} - \hat{Y}_{ij})^2 = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j)^2 + \sum_{j=1}^{c} \sum_{i=1}^{n_j} (\bar{Y}_j - \hat{Y}_{ij})^2$$

SSE = SSPE + SSLF

Homework 3    Due October 23

3.17, p. 150-151

Sales growth. A marketing researcher studied annual sales of a product that had been introduced 10 years ago. The data are as follows, where $X$ is the year (coded) and $Y$ is sales in thousands of units:

$i$:     1    2     3     4     5     6     7     8     9     10
$X_i$:   0    1     2     3     4     5     6     7     8     9
$Y_i$:   98   135   162   178   221   232   283   300   374   395

a.) Prepare a scatter plot of the data. Does a linear relation appear adequate here?

b.) Use the Box-Cox procedure and standardization (3.36) to find an appropriate power transformation of $Y$. Evaluate SSE for $\lambda = 0.3, 0.4, 0.5, 0.6, 0.7$. What transformation of $Y$ is suggested?

c.) Use the transformation $Y' = \sqrt{Y}$ and obtain the estimated linear regression function for the transformed data.

d.) Plot the estimated regression line and the transformed data. Does the regression line appear to be a good fit to the transformed data?

e.) Obtain the residuals and plot them against the fitted values. Also prepare a normal probability plot. What do your plots show?

f.) Express the estimated regression function in the original units.
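The Box-Cox search in part (b) amounts to transforming $Y$ for each candidate $\lambda$, refitting the line, and comparing SSE values. The sketch below assumes the standardization takes the usual form $W_i = K_1 (Y_i^{\lambda} - 1)$ for $\lambda \neq 0$ and $W_i = K_2 \ln Y_i$ for $\lambda = 0$, with $K_2$ the geometric mean of the $Y_i$ and $K_1 = 1 / (\lambda K_2^{\lambda - 1})$; check this against (3.36) in the text before relying on it. The function names are illustrative.

```python
import math

# Sales growth data from Problem 3.17 (X = coded year, Y = sales in thousands).
X = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Y = [98, 135, 162, 178, 221, 232, 283, 300, 374, 395]

def sse_of_line(x, w):
    """SSE from the least squares line of w on x."""
    n = len(x)
    x_bar = sum(x) / n
    w_bar = sum(w) / n
    b1 = (sum((xi - x_bar) * (wi - w_bar) for xi, wi in zip(x, w))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = w_bar - b1 * x_bar
    return sum((wi - b0 - b1 * xi) ** 2 for xi, wi in zip(x, w))

def boxcox_sse(lam, x=X, y=Y):
    """SSE after the standardized Box-Cox transformation with power lam."""
    n = len(y)
    k2 = math.exp(sum(math.log(yi) for yi in y) / n)  # geometric mean of Y
    if lam == 0:
        w = [k2 * math.log(yi) for yi in y]
    else:
        k1 = 1.0 / (lam * k2 ** (lam - 1))
        w = [k1 * (yi ** lam - 1) for yi in y]
    return sse_of_line(x, w)

for lam in (0.3, 0.4, 0.5, 0.6, 0.7):
    print(lam, round(boxcox_sse(lam), 1))
```

Because of the standardization, the SSE values are on a comparable scale across $\lambda$, so the smallest one points to the suggested power. At $\lambda = 1$ the transformation only shifts $Y$ by a constant, so `boxcox_sse(1.0)` reproduces the SSE of the untransformed fit, which is a handy sanity check.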

4.21, p. 175

When the predictor variable is so coded that $\bar{X} = 0$ and the normal error regression model (2.1) applies, are $b_0$ and $b_1$ independent? Are the joint confidence intervals for $\beta_0$ and $\beta_1$ then independent?

5.7, p. 210

Refer to Plastic hardness Problem 1.22. Using matrix methods, find:

1) $\mathbf{Y}'\mathbf{Y}$

2) $\mathbf{X}'\mathbf{X}$

3) $\mathbf{X}'\mathbf{Y}$

5.20, p. 211

Find the matrix $\mathbf{A}$ of the quadratic form: $7Y_1^2 - 8Y_1 Y_2 + 8Y_2^2$.
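For a two-variable quadratic form $aY_1^2 + bY_1Y_2 + cY_2^2$, the symmetric matrix puts the squared-term coefficients on the diagonal and splits the cross-product coefficient evenly across the two off-diagonal entries. The sketch below checks that construction numerically; the helper names and the test vector are illustrative only.

```python
def quadratic_form_matrix(a, b, c):
    """Symmetric A such that Y'AY = a*Y1^2 + b*Y1*Y2 + c*Y2^2."""
    return [[a, b / 2], [b / 2, c]]

def evaluate_quadratic_form(A, y):
    """Compute Y'AY for a square matrix A and a vector y."""
    size = len(y)
    return sum(y[i] * A[i][j] * y[j] for i in range(size) for j in range(size))

A = quadratic_form_matrix(7, -8, 8)
y = [2.0, 3.0]  # arbitrary test vector

direct = 7 * y[0] ** 2 - 8 * y[0] * y[1] + 8 * y[1] ** 2
print(A)
print(evaluate_quadratic_form(A, y), direct)  # the two values should agree
```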

5.26, p. 212

Refer to Plastic hardness Problems 1.22 and 5.7.

a) Using matrix methods, obtain the following:

1) $(\mathbf{X}'\mathbf{X})^{-1}$

2) $\mathbf{b}$

3) $\hat{\mathbf{Y}}$

4) $\mathbf{H}$

5) SSE

6) $\mathbf{s}^2\{\mathbf{b}\}$

7) $s^2\{\text{pred}\}$ when $X_h = 30$.

b) From part (a6), obtain the following:

1) $s^2\{b_0\}$

2) $s\{b_0, b_1\}$

3) $s\{b_1\}$

c) Obtain the matrix of the quadratic form for SSE.
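For simple linear regression the matrices in a problem like this are small enough to write out by hand: $\mathbf{X}'\mathbf{X}$ is 2 by 2, so its inverse has a closed form, $\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$, and $\mathbf{H} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$. A minimal sketch in plain Python, using hypothetical toy data rather than the Plastic hardness data; in practice a matrix library would do this work.

```python
def simple_regression_matrices(x, y):
    """Return X'X, (X'X)^-1, b, and H for the model with an intercept and one X."""
    n = len(x)
    sx = sum(x)
    sxx = sum(xi * xi for xi in x)
    sy = sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))

    xtx = [[n, sx], [sx, sxx]]
    det = n * sxx - sx * sx
    xtx_inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    xty = [sy, sxy]
    b = [xtx_inv[0][0] * xty[0] + xtx_inv[0][1] * xty[1],
         xtx_inv[1][0] * xty[0] + xtx_inv[1][1] * xty[1]]
    # h_ij = [1, x_i] (X'X)^-1 [1, x_j]'
    hat = [[xtx_inv[0][0] + (xi + xj) * xtx_inv[0][1] + xi * xj * xtx_inv[1][1]
            for xj in x] for xi in x]
    return xtx, xtx_inv, b, hat

# Hypothetical toy data, NOT the Plastic hardness data from the text.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 8.0]
xtx, xtx_inv, b, hat = simple_regression_matrices(x, y)

print("X'X =", xtx)
print("b =", b)
print("trace(H) =", sum(hat[i][i] for i in range(len(x))))  # equals p = 2
```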

Homework 4    Due November 10

6.10, p. 249

Refer to Grocery retailer Problem 6.9.

a) Fit regression model (6.5) to the data for three predictor variables. State the estimated regression function. How are $b_1$, $b_2$, and $b_3$ interpreted here?

b) Obtain the residuals and prepare a box plot of the residuals. What information does this plot provide?

c) Plot the residuals against $\hat{Y}$, $X_1$, $X_2$, $X_3$, and $X_1 X_2$ on separate graphs. Also prepare a normal probability plot. Interpret the plots and summarize your findings.

d) Prepare a time plot of the residuals. Is there any indication that the error terms are correlated? Discuss.

e) Divide the 52 cases into two groups, placing the 26 cases with the smallest fitted values $\hat{Y}_i$ into group 1 and the other 26 cases into group 2. Conduct the Brown-Forsythe test for constancy of the error variance, using $\alpha = 0.01$. State the decision rule and conclusion.
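The Brown-Forsythe test in part (e) is a two-sample $t$ test applied to the absolute deviations of the residuals from their group medians. A minimal sketch of the test statistic, assuming the residuals have already been split into two groups; the function name and the tiny residual lists are hypothetical.

```python
import math
from statistics import mean, median

def brown_forsythe_t(resid1, resid2):
    """Brown-Forsythe statistic for two groups of residuals."""
    # Absolute deviations of each residual from its own group median.
    d1 = [abs(e - median(resid1)) for e in resid1]
    d2 = [abs(e - median(resid2)) for e in resid2]
    n1, n2 = len(d1), len(d2)
    d1_bar, d2_bar = mean(d1), mean(d2)
    # Pooled variance of the absolute deviations, with n - 2 degrees of freedom.
    pooled_var = (sum((d - d1_bar) ** 2 for d in d1)
                  + sum((d - d2_bar) ** 2 for d in d2)) / (n1 + n2 - 2)
    return (d1_bar - d2_bar) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Hypothetical residuals for two small groups.
t_bf = brown_forsythe_t([-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0])
print(round(t_bf, 4))
```

The absolute value of the statistic would then be compared with the $t(1 - \alpha/2;\, n - 2)$ percentile; that table lookup is left out of the sketch.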

7.4, p. 289

Refer to Grocery retailer Problem 6.9.

a) Obtain the analysis of variance table that decomposes the regression sum of squares into extra sums of squares associated with $X_1$; with $X_3$, given $X_1$; and with $X_2$, given $X_1$ and $X_3$.

b) Test whether $X_2$ can be dropped from the regression model given that $X_1$ and $X_3$ are retained. Use the $F^*$ test statistic and $\alpha = 0.05$. State the alternatives, decision rule, and conclusion. What is the P-value of the test?

c) Does SSR($X_1$) + SSR($X_2 | X_1$) equal SSR($X_2$) + SSR($X_1 | X_2$) here? Must this always be the case?

7.17, p. 290

Refer to Grocery retailer Problem 6.9.

a) Transform the variables by means of the correlation transformation (7.44) and fit the standardized regression model (7.45).

b) Calculate the coefficients of determination between all pairs of predictor variables. Is it meaningful here to consider the standardized regression coefficients to reflect the effect of one predictor variable when the others are held constant?

c) Transform the estimated standardized regression coefficients by means of (7.53) back to the ones for the fitted regression model in the original variables. Verify that they are the same as the ones obtained in Problem 6.10a.

8.16, p. 337-338

Refer to Grade point average Problem 1.19. An assistant to the director of admissions conjectured that the predictive power of the model could be improved by adding information on whether the student had chosen a major field of concentration at the time the application was submitted. Assume that regression model (8.33) is appropriate, where $X_1$ is entrance test score and $X_2 = 1$ if student had indicated a major field of concentration at the time of application and 0 if the major field was undecided. Data for $X_2$ were as follows:

$i$:        1   2   3   …   118   119   120
$X_{i2}$:   0   1   0   …   1     1     0

a) Explain how each regression coefficient in model (8.33) is interpreted here.

b) Fit the regression model and state the estimated regression function.

c) Test whether the $X_2$ variable can be dropped from the regression model; use $\alpha = 0.01$. State the alternatives, decision rule, and conclusion.

d) Obtain the residuals for regression model (8.33) and plot them against $X_1 X_2$. Is there any evidence in your plot that it would be helpful to include an interaction term in the model?

8.34, p. 340

In a regression study, three types of banks were involved, namely, commercial, mutual savings, and savings and loan. Consider the following system of indicator variables for type of bank:

Type of bank        $X_2$   $X_3$
Commercial          1       0
Mutual savings      0       1
Savings and loan    −1      −1

a) Develop a first-order linear regression model for relating last year’s profit or loss ($Y$) to size of bank ($X_1$) and type of bank ($X_2$, $X_3$).

b) State the response functions for the three types of banks.

c) Interpret each of the following quantities:

1) $\beta_2$

2) $\beta_3$

3) $-\beta_2 - \beta_3$
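With the 1, 0, −1 coding in the table above, the third bank type is represented by −1 on both indicators instead of by its own column. A small sketch of how that coding feeds a first-order response function $E\{Y\} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3$; the coefficient values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
# Indicator coding copied from the table above: bank type -> (X2, X3).
BANK_CODING = {
    "commercial": (1, 0),
    "mutual savings": (0, 1),
    "savings and loan": (-1, -1),
}

def mean_response(beta, size, bank_type):
    """E{Y} = beta0 + beta1*X1 + beta2*X2 + beta3*X3 under the coding above."""
    b0, b1, b2, b3 = beta
    x2, x3 = BANK_CODING[bank_type]
    return b0 + b1 * size + b2 * x2 + b3 * x3

beta = (10.0, 2.0, 1.5, -0.5)  # hypothetical coefficients, not estimates

# Intercepts of the three response functions (size of bank set to 0).
intercepts = {bank: mean_response(beta, 0.0, bank) for bank in BANK_CODING}
print(intercepts)
print(sum(intercepts.values()) / 3)  # averages back to beta0
```

Under this coding the three intercepts average to $\beta_0$, so each type's shift is measured relative to the average across bank types, not relative to a single reference type.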

Homework 5    Due November 29

9.15, p. 378-379

Kidney function. Creatinine clearance ($Y$) is an important measure of kidney function, but is difficult to obtain in a clinical office setting because it requires 24-hour urine collection. To determine whether this measure can be predicted from some data that are easily available, a kidney specialist obtained the data that follow for 33 male subjects. The predictor variables are serum creatinine concentration ($X_1$), age ($X_2$), and weight ($X_3$).

a) Prepare separate dot plots for each of the three predictor variables. Are there any noteworthy features in these plots? Comment.

b) Obtain the scatter plot matrix. Also obtain the correlation matrix of the $X$ variables. What do the scatter plots suggest about the nature of the functional relationship between the response variable $Y$ and each predictor variable? Discuss. Are any serious multicollinearity problems evident? Explain.

c) Fit the multiple regression function containing the three predictor variables as first-order terms. Does it appear that all predictor variables should be retained?

9.16, p. 379

Refer to Kidney function Problem 9.15.

a) Using first-order and second-order terms for each of the three predictor variables (centered around the mean) in the pool of potential $X$ variables (including cross products of the first-order terms), find the three best hierarchical subset regression models according to the $C_p$ criterion.

b) Is there much difference in $C_p$ for the three best subset models?
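The $C_p$ criterion compares the error sum of squares of a candidate subset with the error mean square of the model containing the whole pool of predictors: $C_p = SSE_p / MSE_{\text{full}} - (n - 2p)$, where $p$ counts the parameters in the subset model, intercept included. A minimal sketch of that arithmetic; all of the numbers passed in are hypothetical, not results for the kidney data.

```python
def mallows_cp(sse_p, mse_full, n, p):
    """Mallows' C_p for a subset model with p parameters (intercept included)."""
    return sse_p / mse_full - (n - 2 * p)

# Hypothetical values: SSE of a 3-parameter subset, MSE of the full model, n = 33.
cp = mallows_cp(sse_p=120.0, mse_full=4.0, n=33, p=3)
print(cp)
```

A subset model with little bias tends to have $C_p$ near $p$, so both a small $C_p$ and a $C_p$ close to $p$ are usually examined.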

9.19, p. 379

Refer to Kidney function Problem 9.15.

a) Using the same pool of potential $X$ variables as in Problem 9.16a, find the best subset of variables according to forward stepwise regression with $\alpha$ limits of 0.10 and 0.15 to add or delete a variable, respectively.

b) How does the best subset according to forward stepwise regression compare with the best subset according to the $R^2_{a,p}$ criterion obtained in Problem 9.16a?

10.10a, p. 415

Refer to Grocery retailer Problems 6.9 and 6.10.

a) Obtain the studentized deleted residuals and identify any outlying $Y$ observations. Use the Bonferroni outlier test procedure with $\alpha = 0.05$. State the decision rule and conclusion.

Homework 6    Due December 11

10.10b-f, p. 415

Refer to Grocery retailer Problems 6.9 and 6.10.

b) Obtain the diagonal elements of the hat matrix. Identify any outlying $X$ observations using the rule of thumb presented in the chapter.

c) Management wishes to predict the total labor hours required to handle the next shipment containing $X_1 = 300{,}000$ cases whose indirect costs of the total hours is $X_2 = 7.2$ and $X_3 = 0$ (no holiday in week). Construct a scatter plot of $X_2$ against $X_1$ and determine visually whether this prediction involves an extrapolation beyond the range of the data. Also, use (10.29) to determine whether an extrapolation is involved. Do your conclusions from the two methods agree?

d) Cases 16, 22, 43, and 48 appear to be outlying $X$ observations, and cases 10, 32, 38, and 40 appear to be outlying $Y$ observations. Obtain the DFFITS, DFBETAS, and Cook’s distance values for each of these cases to assess their influence. What do you conclude?

e) Calculate the average absolute percent difference in the fitted values with and without each of these cases. What does this measure indicate about the influence of each of the cases?

f) Calculate Cook’s distance $D_i$ for each case and prepare an index plot. Are any cases influential according to this measure?
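Two of the diagnostics in this problem reduce to short formulas once the residuals $e_i$ and leverages $h_{ii}$ are in hand. The sketch assumes the leverage rule of thumb is $h_{ii} > 2p/n$ and uses Cook's distance in the form $D_i = \frac{e_i^2}{p \cdot MSE} \cdot \frac{h_{ii}}{(1 - h_{ii})^2}$; confirm both against the chapter. The inputs are hypothetical values for a four-case simple regression, not the grocery retailer data.

```python
def high_leverage_cases(h, p):
    """Return 1-based case numbers whose leverage exceeds 2p/n."""
    n = len(h)
    cutoff = 2 * p / n
    return [i for i, hii in enumerate(h, start=1) if hii > cutoff]

def cooks_distance(e, h, p, mse):
    """Cook's distance for each case from residuals e and leverages h."""
    return [ei ** 2 / (p * mse) * hii / (1 - hii) ** 2 for ei, hii in zip(e, h)]

# Hypothetical residuals and leverages for a simple regression (p = 2) on 4 cases.
e = [0.1, 0.2, -0.7, 0.4]
h = [0.7, 0.3, 0.3, 0.7]
mse = 0.35

print(high_leverage_cases(h, p=2))  # cutoff is 2*2/4 = 1.0, so nothing is flagged
print([round(d, 4) for d in cooks_distance(e, h, p=2, mse=mse)])
```

An index plot is simply these $D_i$ values plotted against the case number.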

11.29, p. 479

Refer to Muscle Mass Problem 1.27.

a) Fit a two-region regression tree. What is the first split point based on age? What is SSE for this two-region tree?

b) Find the second split point given the two-region tree in part (a). What is SSE for the resulting three-region tree?

c) Find the third split point given the three-region tree in part (b). What is SSE for the resulting four-region tree?

d) Prepare a scatter plot of the data with the four-region tree in part (c) superimposed. How well does the tree fit the data? What does the tree suggest about the change in muscle mass with age?

e) Prepare a residual plot of $e_i$ versus $\hat{Y}_i$ for the four-region tree in part (d). State your findings.
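Growing a regression tree on a single predictor is a repeated search for the split point that minimizes the total SSE, where each region is fitted by its own mean. A minimal sketch of one such search follows; it assumes candidate splits are taken at midpoints between adjacent distinct $X$ values, which is a common convention and may differ from the one in the text. The data are hypothetical, not the Muscle Mass data.

```python
def sse_about_mean(values):
    """Sum of squared deviations of the values from their own mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_split(x, y):
    """Return (split_point, total_sse) for the best two-region split on x."""
    pairs = sorted(zip(x, y))
    best = None
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # cannot split between tied X values
        split = (pairs[k - 1][0] + pairs[k][0]) / 2
        left = [yi for _, yi in pairs[:k]]
        right = [yi for _, yi in pairs[k:]]
        sse = sse_about_mean(left) + sse_about_mean(right)
        if best is None or sse < best[1]:
            best = (split, sse)
    return best

# Hypothetical data with an obvious jump between X = 3 and X = 10.
x = [1, 2, 3, 10, 11, 12]
y = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0]
print(best_split(x, y))
```

Later splits repeat the same search inside each existing region and keep whichever candidate lowers the overall SSE the most.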

13.10, p. 550

Enzyme kinetics. In an enzyme kinetics study the velocity of a reaction ($Y$) is expected to be related to the concentration ($X$) as follows:

$$Y_i = \frac{\gamma_0 X_i}{\gamma_1 + X_i} + \varepsilon_i$$

Eighteen concentrations have been studied and the results follow:

$i$:     1     2     3     …   16     17     18
$X_i$:   1     1.5   2     …   30     35     40
$Y_i$:   2.1   2.5   4.9   …   19.7   21.3   21.6

a) To obtain starting values for $\gamma_0$ and $\gamma_1$, observe that when the error term is ignored we have $Y_i' = \beta_0 + \beta_1 X_i'$, where $Y_i' = \frac{1}{Y_i}$, $\beta_0 = \frac{1}{\gamma_0}$, $\beta_1 = \frac{\gamma_1}{\gamma_0}$, and $X_i' = \frac{1}{X_i}$. Therefore fit a linear regression function to the transformed data to obtain initial estimates $g_0^{(0)} = \frac{1}{b_0}$ and $g_1^{(0)} = \frac{b_1}{b_0}$.

b) Using the starting values obtained in part (a), find the least squares estimates of the parameters $\gamma_0$ and $\gamma_1$.
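The starting-value recipe in part (a) can be sketched directly: take reciprocals of both variables, fit a straight line, and map $b_0$ and $b_1$ back to $g_0^{(0)} = 1/b_0$ and $g_1^{(0)} = b_1/b_0$. The data below are hypothetical and noise-free, generated from $\gamma_0 = 20$ and $\gamma_1 = 5$ so that the recipe can be seen to recover them; they are not the enzyme data from the text.

```python
def starting_values(x, y):
    """Starting values (g0, g1) from a line fitted to 1/Y versus 1/X."""
    xr = [1 / xi for xi in x]
    yr = [1 / yi for yi in y]
    n = len(xr)
    x_bar = sum(xr) / n
    y_bar = sum(yr) / n
    b1 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(xr, yr))
          / sum((a - x_bar) ** 2 for a in xr))
    b0 = y_bar - b1 * x_bar
    return 1 / b0, b1 / b0

# Hypothetical noise-free data generated with gamma0 = 20 and gamma1 = 5.
x = [1.0, 2.0, 4.0, 8.0]
y = [20.0 * xi / (5.0 + xi) for xi in x]

g0, g1 = starting_values(x, y)
print(g0, g1)
```

With real, noisy data the reciprocal fit is only a rough guide, which is why part (b) refines these values with a nonlinear least squares routine.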

13.12, p. 550

Refer to Enzyme kinetics Problem 13.10. Assume that the fitted model is appropriate and that large-sample inferences can be employed here.

1) Obtain an approximate 95 percent confidence interval for $\gamma_0$.

2) Test whether or not $\gamma_1 = 20$; use $\alpha = 0.05$. State the alternatives, decision rule, and conclusion.

