Page 1

STA 6236 Regression Analysis

Dr. Mark E. Johnson, Fall 2014

Page 2

STA 6236 Regression Analysis

Mark E. Johnson*, Fall 2014

*PowerPoint slides modified from those developed by C. Nachtsheim, 2007, adapted 2008, 2009; further modified 2010, 2011, 2013.

Page 3

Aside from Roster Information

Survey: What have you had in statistics, and what do you expect or hope for this semester?

Page 4

A Little Matter of Recall

Second sheet to be filled out now, if you would be so kind. Take a quick look…

Page 5

Intros, please

Page 6

At long last, the exciting syllabus

Page 7

Page 8

     

Page 9

Page 10

Correction (the first three occurrences of 4222 should be 6236)

Page 11

Page 12

Page 13

Addendum to the Syllabus

• Latest version kept on the course web site

 

Page 14

How to Get JMP Pro 11

• MAP 150D or 150E (Math and Physics, Data Mining Lab)
• Free for students enrolled in my class
• Can be installed on Mac and non-Mac machines
• Bring your laptop or PC to the lab (installation must be done on site)
• Promised to be on all UCF systems this fall…
• Alternative software at your own discretion, risk, and challenge (provided it does all the good things JMP Pro 11 does… good luck!)

Page 15

Sites for downloading are set up

• Feel free to start reading the text
• Tutorials on JMP
• Data sets from JMP
• Data sets with the text

Page 16

Brief introduction to JMP Pro 11 (abbreviated for now, other than miscellaneous illustrations in week 1)

• Tutorial as part of the package
  – Data files
  – ch1ta1
• Will use in class for all calculations
• You need it to produce output to bring to quizzes and for assignments to be submitted
• Play with the package, try it out on data, become proficient
• What does this part of the output tell me?

Page 17

Let's try one of JMP's data sets

• Not my favorite in that x and y are generic (i.e., made up or disguised/proprietary)
• Just for illustration

Page 18

August 20, 2014 (Class #2)

• Bubble plot (YouTube video okay on this)
• Intern position posted on class web site
• JMP Pro 11: both Mac and non-Mac loadable
• A word or two on the "stuff you gotta know"
• Regression and Big Data: terminology, perspective
• "Regression": thank you, Mr. Galton
• Regression to the mean
• Simple examples, illustration of the normal distribution

Page 19

SLOPE.jmp file

• Not sure what x, y represent ☹
• File SLOPE.jmp from the sample data sets in JMP
• Slope of fitted line looked "goofy" and potentially embarrassing
• All unexpected results are learning opportunities
• 600 data points? Seriously? Seriously? … Repeats or close-to-repeats
• Finding the culprits…
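Finding the culprits behind a goofy slope often starts with counting exact and near-exact repeats. A minimal Python sketch, assuming the x and y columns have been exported from SLOPE.jmp into two plain lists; the function name `repeated_points` and the rounding trick for grouping close-to-repeats are illustrative assumptions, not JMP features.

```python
from collections import Counter


def repeated_points(xs, ys, decimals=None):
    """Return {(x, y): count} for every (x, y) pair that occurs more than once.

    If `decimals` is given, values are rounded first so that
    close-to-repeats are grouped together with exact repeats.
    """
    if decimals is not None:
        xs = [round(x, decimals) for x in xs]
        ys = [round(y, decimals) for y in ys]
    counts = Counter(zip(xs, ys))
    # Keep only the pairs that show up two or more times
    return {pt: c for pt, c in counts.items() if c > 1}


# Toy data (made up for illustration): one exact repeat and one near-repeat
xs = [1.0, 1.0, 2.0, 3.001, 3.002]
ys = [5.0, 5.0, 6.0, 7.0, 7.0]
print(repeated_points(xs, ys))              # exact repeats only
print(repeated_points(xs, ys, decimals=2))  # near-repeats grouped as well
```

With 600 points, sorting the resulting counts from largest to smallest shows quickly whether a few heavily repeated pairs dominate the scatter.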

Page 20

Demo-type stuff

• Analyze distribution: mode of 62 (histogram not so clear)
• Bubble plot
• Stack x, y and jitter the points (jitter only for box plots in one-way)
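Outside JMP, the two ideas on this slide (finding the mode and jittering overplotted points) fit in a few lines of Python. This is a sketch only: the function names, the made-up values, and the default jitter width of 0.2 are assumptions for illustration. Jitter is a display device; the jittered copy should be plotted, never analyzed.

```python
import random
from collections import Counter


def mode(values):
    """Most frequent value; ties go to the value encountered first."""
    return Counter(values).most_common(1)[0][0]


def jitter(values, width=0.2, rng=None):
    """Add uniform noise in [-width/2, width/2] so stacked points separate visually.

    Use the jittered copy for plotting only; fit models to the original values.
    """
    rng = rng or random.Random()
    return [v + rng.uniform(-width / 2, width / 2) for v in values]


values = [62, 62, 62, 10, 15, 62, 40]  # made-up values with a clear mode
print(mode(values))
print(jitter(values, rng=random.Random(0)))
```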

Page 21

Status of Stuff…

• JMP Pro 11: site license in play?
• PowerPoint slides to be loaded this weekend

Page 22

Stuff you should know (eventually)

• Normal density… an "easy" way to remember it
• Gamma distribution (easiest skewed distribution to remember, and includes χ²)
• Let's look at some plots…
• Probability integral transform (well worth knowing)
• Sample mean and sample variance (22; 10)
• CLT… allows us to actually do something when our data is not normal
• Skewness as in skewed; kurtosis not so well known
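For reference, here are the standard textbook forms behind several items on this list. The symbols (μ and σ for mean and standard deviation, shape α and scale θ for the gamma, ν for degrees of freedom) are our own labels, and the gamma is written in its shape-scale form; some texts use a rate 1/θ instead.

```latex
% Normal density
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \qquad -\infty < x < \infty

% Gamma density (shape \alpha, scale \theta); chi-square with \nu df is Gamma(\alpha = \nu/2, \theta = 2)
f(x) = \frac{x^{\alpha-1} e^{-x/\theta}}{\Gamma(\alpha)\,\theta^{\alpha}}, \qquad x > 0

% Probability integral transform: if X is continuous with CDF F, then
U = F(X) \sim \mathrm{Uniform}(0,1), \qquad F^{-1}(U) \overset{d}{=} X

% Sample mean and sample variance
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2

% CLT: for i.i.d. X_i with mean \mu and finite variance \sigma^2
\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0,1)

% Skewness and kurtosis (normal distribution: 0 and 3)
\gamma_1 = \frac{E[(X-\mu)^3]}{\sigma^3}, \qquad \beta_2 = \frac{E[(X-\mu)^4]}{\sigma^4}
```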

Page 23

Density functions for distributions having mean 0, variance 1, skewness 0, and kurtosis 3. (Kurtosis does not equal peakedness.)
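A quick simulation makes these moment ideas concrete. The sketch below computes moment-based sample skewness and kurtosis (plain kurtosis, not "excess" kurtosis, so the normal benchmark is 3) and uses the probability integral transform in reverse to turn uniforms into exponential data. The function name, the seed, and the choice of the exponential as a skewed comparison are our own; the estimates wobble a little with the seed and sample size.

```python
import math
import random


def sample_moments(xs):
    """Return (mean, unbiased variance, moment skewness, moment kurtosis)."""
    n = len(xs)
    mean = sum(xs) / n
    # Unbiased sample variance divides by n - 1
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    # Skewness and kurtosis here use central moments divided by n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return mean, var, m3 / m2 ** 1.5, m4 / m2 ** 2


random.seed(1)
normal = [random.gauss(0, 1) for _ in range(100_000)]
# Inverse-CDF method: if U ~ Uniform(0,1), then -ln(1 - U) ~ Exponential(1)
expo = [-math.log(1 - random.random()) for _ in range(100_000)]

print(sample_moments(normal))  # skewness near 0, kurtosis near 3
print(sample_moments(expo))    # skewness near 2, kurtosis near 9
```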

Page 24

• Wikipedia has a lot of good stuff; the challenge is knowing when the 5% or so that looks goofy is actually goofy.

Page 25

Election Data per Gomez et al.

The Republicans Should Pray for Rain: Weather, Turnout, and Voting in U.S. Presidential Elections. Brad T. Gomez, University of Georgia; Thomas G. Hansford, University of California, Merced; George A. Krause, University of Pittsburgh

The Journal of Politics, 2007

Page 26

Basic Premises

• Goal is to do the analysis RIGHT (i.e., sans time constraints or need to take short cuts)
• (If we had 10,000 variables, …)
• Brainpower + analytical tools
• Track down data aberrations, issues, etc.
• Election data file: guide to clean-up

Page 27

What is Regression?

1. Method of modeling relationships between a response variable Y and one or more predictors X (also known as the dependent/endogenous variable Y and the independent/exogenous variables X)

2. A way of "fitting a line (or curve) through data"
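In the simplest case, one predictor, "fitting a line through data" by least squares has a closed form. A minimal Python sketch; the function name and the toy data are made up for illustration, and in class the same straight-line estimates come from JMP rather than hand-written code.

```python
def fit_line(xs, ys):
    """Least-squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Cross-products and squared deviations about the means
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Made-up data lying close to y = 2x
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_line(xs, ys)
print(f"fitted line: y = {b0:.2f} + {b1:.2f} x")  # y = 0.05 + 1.99 x
```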

Page 28

ISO 3534-3 terminology standard, 3.3 regression analysis

collection of procedures associated with assessing models relating predictor variables to response variables

NOTE 1 Regression analysis is commonly associated with the process of estimating the parameters of an assumed model by optimizing the value of an objective function (for example, minimizing the sum of squared differences between the observed responses and those predicted by the model). The existence of statistical software packages has eliminated much of the drudgery in obtaining parameter estimates, their standard errors, and contain a wealth of model diagnostics.
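For the straight-line model, the objective function mentioned in NOTE 1 is the least-squares criterion. Setting its partial derivatives with respect to the two parameters to zero gives the normal equations and their closed-form solution. The notation (Y_i, X_i, β_0, β_1 for parameters, b_0, b_1 for estimates) follows the usual simple-linear-regression convention and is our choice here.

```latex
Q(\beta_0, \beta_1) = \sum_{i=1}^{n} \left(Y_i - \beta_0 - \beta_1 X_i\right)^2

% Normal equations: \partial Q/\partial\beta_0 = 0 and \partial Q/\partial\beta_1 = 0
\sum_{i=1}^{n} Y_i = n\,b_0 + b_1 \sum_{i=1}^{n} X_i
\sum_{i=1}^{n} X_i Y_i = b_0 \sum_{i=1}^{n} X_i + b_1 \sum_{i=1}^{n} X_i^2

% Solution
b_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2},
\qquad b_0 = \bar{Y} - b_1 \bar{X}
```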

Page 29

See the New Work Item on Predictive Analytics to get a sense of direction

Predictive analytics encompasses the body of statistical knowledge supporting the analysis of massive data sets. Massive data sets:

• automated data collection associated with remote sensing
• transactional (on-line) purchases
• web site browsing and viewing patterns
• social media (networks and interactions)

Goal: Extract useful information (= actionable items)
Challenges: 1000s of explanatory variables, unstructured data
Opportunities: sufficient data for validating models
Getting started: core statistical methodologies (e.g., regression analysis) remain highly relevant, although the usual emphasis on inference and hypothesis testing gives way to estimation and prediction.

Massive data sets have forced practitioners to rethink methodologies: to assess their strengths and limitations, to extend them where possible, or to develop new techniques to take advantage of, or to cope with, the sheer size of the data.
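One concrete form of "sufficient data for validating models" is a holdout split: fit on one random part of the data and measure prediction error on the rest. A minimal Python sketch, assuming the data are a list of (x, y) pairs; the 70/30 split, the seed, the function names, and the toy data are illustrative choices, not a prescribed procedure.

```python
import random


def holdout_split(rows, frac_train=0.7, seed=2014):
    """Shuffle a copy of `rows` and split it into (training, validation) lists."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * frac_train))
    return shuffled[:cut], shuffled[cut:]


def fit_line(pairs):
    """Least-squares intercept and slope from (x, y) pairs."""
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    s_xx = sum((x - x_bar) ** 2 for x, _ in pairs)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


def mean_squared_error(pairs, b0, b1):
    """Average squared prediction error of the line b0 + b1*x on (x, y) pairs."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in pairs) / len(pairs)


# Made-up data: y = 3 + 2x with an alternating +/-0.5 wiggle
data = [(x, 3 + 2 * x + (0.5 if x % 2 else -0.5)) for x in range(20)]
train, valid = holdout_split(data)
b0, b1 = fit_line(train)  # fit on the training part only
print(len(train), len(valid), mean_squared_error(valid, b0, b1))
```

The validation error, not the training error, is the honest estimate of how well the fitted equation predicts new cases.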

Page 30

Why regression analysis? Basic objectives, please.

• Relationship between variables (response variable with explanatory variables; dependent with independent variables)
• Prediction (BIG DATA driver)
• Identify key variables of interest
• Clean up data set (data preparation)
• Test scientific hypotheses (not a direct jump to validating prior hypotheses)

Page 31

Regression: Two Extreme "Schools" of Thought Bracket the Area

• Don't try this at home; I am a professional:
  – Tell me what you have done and I will gleefully point out all of the errors, misunderstandings, and so forth. Only experts should be permitted to apply regression techniques, let alone use sophisticated software such as JMP Pro 11. Otherwise, only junk will be produced.

• Give it a try; what can possibly go wrong?
  – One will always learn something from a detailed regression analysis. Thank goodness for JMP Pro 11 to eliminate the drudgery in the computations. Try your best, and you can always ask for forgiveness before re-running your analysis with a friendly expert's help.

Page 32

Why "Regression"? A regrettable term to some extent! Common usage… Galton, late 1800s: the average height of sons, at a given father's height, tends to "regress" toward the mean of the population ("mediocrity").

[Figure: Sons' heights vs. fathers' heights (both axes 5.0 to 6.5), showing the population average, the line Y = X, and the regression line]
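The picture can be written down exactly under a simplifying assumption (ours, not a property of Galton's data): suppose fathers' heights X and sons' heights Y are bivariate normal with the same mean μ, the same standard deviation σ, and correlation ρ with 0 < ρ < 1. Then

```latex
E[Y \mid X = x] = \mu + \rho\,(x - \mu), \qquad
\bigl|E[Y \mid X = x] - \mu\bigr| = \rho\,|x - \mu| < |x - \mu|

% and, by the same symmetry,
E[X \mid Y = y] = \mu + \rho\,(y - \mu)
```

so the regression line has slope ρ < 1, flatter than Y = X, and the predicted son of a tall (or short) father sits closer to the population mean than the father does. The second line shows the same pull toward the mean appears when predicting fathers from sons.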

Page 33

Regression to the Mean Gets "Rediscovered" Periodically

• Note the story from Jordan Ellenberg's excellent little book: How Not to Be Wrong: The Power of Mathematical Thinking, Penguin Press, 2014.

Page 34

Page 35

More Galton Stuff

• Galton devoted much of his life to the study of variation in human populations, and it was during his studies of heredity (the passing of traits from parents to their offspring) that he introduced the concept of regression. However, he did not use this term as statisticians do now (when referring to the fitting of linear relationships); instead, he was referring to a very specific statistical phenomenon known as regression to the mean.

• Investigating the relationship between the heights of parents and their children, Galton plotted the heights of 930 children who had reached adulthood against the mean height of their parents. To account for differences due to gender, he increased female heights by a factor of 1.08.

Page 36

• "It appeared from these experiments that the offspring did not tend to resemble their parents in size, but always to be more mediocre than they – to be smaller than the parents, if the parents were large; to be larger than the parents, if the parents were small."

Page 37

Horace Secrist

• Professor of statistics at Northwestern; Director, Bureau for Business Research
• From 1920, collected and compiled massive data on businesses to determine which fail and which win
• In 1933: The Triumph of Mediocrity in Business
  – Extremes (good or bad) headed for the middle
• So the Great Depression was, like, inevitable?

Page 38

Another interpretation

• Height is affected by many things (genetics, nutrition, lots of little random, lucky things)
• Same for businesses… the extremes were partly lucky; even if they had superior methods/practices, luck could favor others as well (random fluctuations over time)
• Think of several people flipping coins; ask those with the most heads and the fewest heads to flip again…
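The coin-flipping thought experiment is easy to simulate. In this sketch (the group size, number of flips, 10% cutoff, seed, and function names are all our own assumptions), everyone flips 20 fair coins, and the bottom and top 10% flip again. Since every coin is fair, the extremes in round one are pure luck, and both groups drift back toward the overall average of 10 heads.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def coin_regression(n_people=1000, n_flips=20, top_frac=0.1, seed=1):
    """Everyone flips n_flips fair coins; the lowest and highest scorers flip again.

    Returns ((low_first, low_again), (high_first, high_again)): the average
    number of heads for each extreme group in round one and in round two.
    """
    rng = random.Random(seed)

    def heads():
        # Number of heads in n_flips fair tosses (True counts as 1)
        return sum(rng.random() < 0.5 for _ in range(n_flips))

    first = [heads() for _ in range(n_people)]
    ranked = sorted(range(n_people), key=lambda i: first[i])
    k = max(1, round(n_people * top_frac))
    low, high = ranked[:k], ranked[-k:]

    # Round two: same people, fresh flips; nothing about them has changed
    low_again = [heads() for _ in low]
    high_again = [heads() for _ in high]
    return ((mean([first[i] for i in low]), mean(low_again)),
            (mean([first[i] for i in high]), mean(high_again)))


(low1, low2), (high1, high2) = coin_regression()
print(f"lowest 10%:  {low1:.1f} heads, then {low2:.1f}")
print(f"highest 10%: {high1:.1f} heads, then {high2:.1f}")
```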

Page 39

Hotelling enters the fray…

• Alg. topologist, bright guy (monopoly in his head)
• His take on Secrist: "The labor of compilation and of direct collection of data must have been gigantic"; however, all of these tables and graphs merely "prove nothing more than that the ratios in question have a tendency to wander about" + "mathematically obvious from general considerations and does not need the vast accumulation of data adduced to prove it"
• Results should work backward in time, but they don't
• In other words, Secrist had wasted ten years of his life

Page 40

• Also published in JASA, but Secrist really didn't get it
• So Hotelling stops being Mr. Nice Guy:
• "The thesis of the book when correctly interpreted is essentially trivial….
• To 'prove' such a mathematical result by a costly and prolonged numerical study of many kinds of business profit and expense ratios is analogous to proving the multiplication table by arranging elephants in rows and columns, and doing the same for numerous other kinds of animals.
• The performance, though perhaps entertaining, and having a certain pedagogical value, is not an important contribution
• either to zoology or mathematics"

Page 41

Brownlee stack loss data

• Once upon a time the profession focused its attention on a little data set of 21 observations collected before color televisions became available for homes!

Page 42

Example 1: Store Site Selection

Model sales Y at existing sites as a function of demographic variables:

X1 = Population in store vicinity
X2 = Income in area
X3 = Age of houses in area
X4 = Unemployment rate
X5 = Traffic data

From the fitted equation, predict sales at new sites
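With several predictors, predicting from the equation means fitting Y = b0 + b1·X1 + … + bp·Xp to the existing sites and plugging in a new site's X values. The self-contained Python sketch below solves the least-squares normal equations X'X b = X'y by Gaussian elimination. The function names and the two-predictor toy data are hypothetical stand-ins for the site variables, and QR-based solvers are generally more numerically stable than forming X'X directly, so treat this as an illustration rather than production code.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients [b0, b1, ..., bp] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(X[0])
    xtx = [[sum(xi[j] * xi[k] for xi in X) for k in range(p)] for j in range(p)]
    xty = [sum(xi[j] * yi for xi, yi in zip(X, y)) for j in range(p)]
    return solve(xtx, xty)


def predict(b, row):
    """Predicted response for one new observation (predictor values in `row`)."""
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], row))


# Hypothetical existing sites with two predictors; y is exactly 1 + 2*x1 + 3*x2
rows = [(1, 0), (0, 1), (1, 1), (2, 1), (1, 3)]
y = [3, 4, 6, 8, 12]
b = fit_ols(rows, y)
print(b)                    # close to [1, 2, 3]
print(predict(b, (2, 2)))   # new site: close to 11
```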

Page 43

Example 2: Marketing Research

Model consumer response to a product on the basis of product characteristics:

Y = Taste score on soft drink

X1 = Sugar level
X2 = Carbonation level
X3 = Ice/No Ice
X4 = _______

   

Page 44

Example 2: Marketing Research

Model consumer response to a product on the basis of product characteristics:

Y = Taste score on soft drink

X1 = Sugar level
X2 = Carbonation level
X3 = Ice/No Ice
X4 = Sugar content
X5 = Cost

   

Page 45

Example 3: Arson Forensics

Understand burning processes to establish baselines for arson investigations

Y's = time until flame-out, max temp. in room, time until max temp. reached, average temp. in room overall and at individual locations, depth of char and bubble size throughout the room

X1 = Fuel type: Gasoline or Kerosene mix
X2 = Ignitable fuel amount
X3 = Ignitable fuel placement
X4 = Additional materials on sofa
X5 = Window 1 openness
X6 = Window 2

Page 46

Example 4: Real Estate Pricing

Y = Selling price of houses
X1 = Square footage
X2 = Taxes
X3 = Lot acreage
X4 = Houses in area foreclosed
X5 = Rating of neighborhood school
X6 = Distance/time to downtown

Page 47

One-Predictor Regression (Chris Nachtsheim example)

533 Homes Sold in Minnetonka, MN 2001

Page 48

One-Predictor Regression

[Figure: Regression Plot, Price vs. SqFt]

Price = -1957.83 + 158.950 SqFt

S = 79122.9, R-Sq = 67.2%, R-Sq(adj) = 67.1%
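A worked use of the fitted equation, assuming Price is in dollars and SqFt in square feet, for a hypothetical 2,000-square-foot home:

```latex
\widehat{\text{Price}} = -1957.83 + 158.950 \times 2000
                       = -1957.83 + 317{,}900.00
                       = 315{,}942.17
```

Reading the rest of the output the usual way: the slope says each additional square foot is associated with about $158.95 more in predicted price; the intercept is not meaningful by itself, since no home has 0 square feet; R-Sq = 67.2% means the straight line on SqFt accounts for about two-thirds of the variation in price in this sample; and S ≈ 79,123 estimates the standard deviation of prices around the fitted line.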

Page 49

One-Predictor Regression

[Figure: Regression Plot, Price vs. SqFt]

Price = -1957.83 + 158.950 SqFt

S = 79122.9, R-Sq = 67.2%, R-Sq(adj) = 67.1%

Page 50

Orlando Housing Market

• Bad news on housing diminishing; supposedly better the past year or so
• Zillow site for recent sales
• Last year, looked at the previous 30 days: sold price, square footage, # bedrooms, baths, taxes

Page 51

Page 52

Little Demo in JMP Pro 11 with this data

•  Getting data into a data table (steps skipped to go from Zillow to JMP)

• Looking at the data
• Sales price as a function of variables
• Multivariate
• Predictive model?
• Model for understanding?
• Extent of generalizing possible….

Page 53

Demo…

• Pull data into JMP
• Regress (v.) Price on sqft
• Check out fit(s) (with/without various data points; formula function)
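Checking the fit "with/without various data points" can be automated: refit with each point deleted in turn and watch how much the slope moves. A minimal Python sketch; the function names and the toy data (four points on y = x plus one far-out point) are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


def slope_changes(xs, ys):
    """Change in the fitted slope when each point is deleted in turn and the line refit."""
    _, b1_all = fit_line(xs, ys)
    changes = []
    for i in range(len(xs)):
        _, b1_i = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        changes.append(b1_i - b1_all)
    return changes


# Made-up data: four points on y = x plus one far-out point at (10, 30)
xs = [1, 2, 3, 4, 10]
ys = [1, 2, 3, 4, 30]
for (x, y), d in zip(zip(xs, ys), slope_changes(xs, ys)):
    print(f"drop ({x}, {y}): slope changes by {d:+.2f}")
```

Here the far-out point alone moves the slope from 3.4 to 1.0; a point with that much pull deserves a closer look before the fit is trusted.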

Page 54

Used ZILLOW for 2013 recent data

• ZILLOW not great for downloads…
• 65 observations
  – 4+ bedrooms
  – 3+ bathrooms
  – non-missing lot size
  – Orlando area
  – "pool"
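The selection rules above amount to a row filter. A sketch in Python, assuming each listing is a dict; the field names (bedrooms, bathrooms, lot_size, area, pool), the "Orlando" area label, and the sample listings are hypothetical, since the format of the Zillow data is not described here.

```python
def meets_criteria(home):
    """True if a listing passes the slide's filters (field names are hypothetical)."""
    return bool(
        home.get("bedrooms", 0) >= 4
        and home.get("bathrooms", 0) >= 3
        and home.get("lot_size") is not None  # non-missing lot size
        and home.get("area") == "Orlando"
        and home.get("pool", False)
    )


# Made-up listings: only the first passes every rule
listings = [
    {"bedrooms": 4, "bathrooms": 3, "lot_size": 0.25, "area": "Orlando", "pool": True},
    {"bedrooms": 5, "bathrooms": 3.5, "lot_size": None, "area": "Orlando", "pool": True},
    {"bedrooms": 3, "bathrooms": 2, "lot_size": 0.20, "area": "Orlando", "pool": False},
]
kept = [h for h in listings if meets_criteria(h)]
print(len(kept))
```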
