
Statistics Project

Date post: 22-Feb-2016
Upload: kanoa
Page 1: Statistics Project

Statistics Project

By: Rich Miktus, Christopher Geigel, Brandon Butch

Page 2: Statistics Project

2004 Data - Raw New Jersey Counties

– Abuse and Neglect Referrals of Children
– Special Education Enrollment
– Number of Child Arrests
– Average Income of Families with Children
– Child Poverty
– Child Population
– Total Population
– School Enrollment

Page 3: Statistics Project

Variables: Per Capita

• Abuse
• Poverty
• Special Education
• Income
• School Enrollment
• Arrests
• Population Density
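The per capita variables can be obtained by dividing each raw county count by a population figure. The sketch below shows one way to do that in Python. The field names, the choice of child population as the denominator, the per-1,000 scale, and the numbers are all illustrative assumptions, not the project's data or method.

```python
# Hypothetical raw counts for one county (made-up numbers, not the 2004 data).
county = {
    "abuse_referrals": 1200,
    "child_arrests": 900,
    "child_population": 60000,
}


def per_capita(count, population, per=1000):
    """Return a rate per `per` residents; the per-1,000 scale is an assumption."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count / population * per


abuse_rate = per_capita(county["abuse_referrals"], county["child_population"])
arrest_rate = per_capita(county["child_arrests"], county["child_population"])
print(abuse_rate, arrest_rate)
```

Putting every count on the same per capita scale keeps large counties from dominating the later correlations simply because they have more children.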

Page 4: Statistics Project

Introduction

• One Variable Analysis
– Histograms
– Scatterplots
– Q-Q Plots
• Two Variable Analysis
– Linear models
– Regression analysis
• Simple Models
– Arrests
– School Enrollment
• Residual Diagnostics

Page 5: Statistics Project

One Variable Analysis

• Histograms & Scatterplots
– Frequency of occurrences
– Skew of data
• Q-Q Plot
– Normal distribution
• Usefulness of variables
– Real-life relationships
– Data flaws
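A normal Q-Q plot pairs each sorted observation with the quantile a standard normal distribution would give at the same rank; points close to a straight line suggest the variable is roughly normal. The following is a minimal sketch of how those points can be computed. The sample values are made up, and the plotting positions (i - 0.5) / n are one common convention assumed here.

```python
from statistics import NormalDist


def normal_qq_points(sample):
    """Pair each sorted observation with its theoretical normal quantile."""
    n = len(sample)
    ordered = sorted(sample)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions (i - 0.5) / n: an assumed, commonly used convention.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


# Made-up values for illustration only.
sample = [12.1, 9.8, 10.4, 11.0, 10.9, 13.5, 9.2, 10.1]
points = normal_qq_points(sample)
for q, x in points:
    print(f"{q:6.3f}  {x:5.1f}")
```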

Page 6: Statistics Project

Example Histogram: Population Density

Page 7: Statistics Project

Example Q-Q Plot: School Enrollment

Page 8: Statistics Project

Two Variable Analysis

• Correlation Table – used to check initial predictions
• Linear regression line
• Residuals
• How much do our explanatory variables matter?
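The quantities on this slide can be sketched for a single pair of variables: the Pearson correlation, the least-squares line, its residuals, and R-squared as a rough answer to how much an explanatory variable matters. The data below are made up and the function names are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1


# Made-up values for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
b0, b1 = fit_line(x, y)
residuals = [b - (b0 + b1 * a) for a, b in zip(x, y)]
r_squared = r ** 2  # share of the variance in y explained by x
print(r, b0, b1, r_squared)
```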

Page 9: Statistics Project

Two Variable Analysis

• More refined analysis to test:
– Arrests ~ abuse, special education enrollment, poverty, school enrollment
– School enrollment ~ income, poverty, abuse, population density

Page 10: Statistics Project

Arrests vs. Abuse

• Good linear fit – strong correlation

• Residuals relatively small

• Large F Statistic, small P Value
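For a one-predictor regression, the F statistic follows directly from the correlation and the sample size: F = r²(n - 2) / (1 - r²). The sketch below assumes one observation per New Jersey county (n = 21) and a hypothetical correlation of 0.9; neither value is taken from the project's output.

```python
def f_statistic_from_r(r, n):
    """F statistic for a simple linear regression, from correlation r and n points."""
    if n <= 2:
        raise ValueError("need more than two observations")
    if abs(r) >= 1:
        raise ValueError("a perfect fit has no finite F statistic")
    return r ** 2 * (n - 2) / (1 - r ** 2)


# Hypothetical inputs: r = 0.9 and n = 21 are assumptions for illustration.
f_value = f_statistic_from_r(0.9, 21)
print(round(f_value, 2))
```

A strong correlation on even a small sample gives a large F, which in turn gives a small p-value. Computing the p-value itself needs the F distribution with 1 and n - 2 degrees of freedom, which the Python standard library does not provide.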

Page 11: Statistics Project

School vs. Income

• Relationship is very weak

• No strong, overall trend

• Possible weak, positive correlation

Page 12: Statistics Project

Two Variable Analysis: Conclusions

• Arrests strongly correlated with abuse, moderately correlated with special education enrollment and poverty, and not correlated with school enrollment

• School enrollment strongly correlated with population density, and not related to income, poverty, and abuse

Page 13: Statistics Project

Simple Models: School Enrollment

• Possible variables
– Abuse
– Income
– Poverty
– Population density

Page 14: Statistics Project

Problems with School Model

• Income and Poverty
– Correlation: -0.911
– Variance Inflation Factors: Income 8.489, Poverty 9.278
• Best Regression
– By AIC: School ~ Density
– Underfitted; flawed variables
• Not enough applicable data
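A variance inflation factor is defined as VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing predictor j on the other predictors. The sketch below applies that definition to the figures on this slide, reading -0.911 as the income-poverty correlation (an assumption about the slide layout). The function names are hypothetical.

```python
def vif_from_r_squared(r_squared):
    """VIF_j = 1 / (1 - R_j^2) for predictor j regressed on the other predictors."""
    if not 0 <= r_squared < 1:
        raise ValueError("R-squared must be in [0, 1)")
    return 1.0 / (1.0 - r_squared)


def r_squared_from_vif(vif):
    """Invert the VIF definition to recover R_j^2."""
    if vif < 1:
        raise ValueError("a VIF is never below 1")
    return 1.0 - 1.0 / vif


# VIF implied by the income-poverty correlation alone (about 5.88).
pairwise_vif = vif_from_r_squared((-0.911) ** 2)

# R_j^2 implied by the reported VIFs (about 0.882 and 0.892).
income_r2 = r_squared_from_vif(8.489)
poverty_r2 = r_squared_from_vif(9.278)
print(round(pairwise_vif, 2), round(income_r2, 3), round(poverty_r2, 3))
```

Under that reading, the reported VIFs exceed the value implied by the pairwise correlation alone, which would be consistent with the other predictors also overlapping with income and poverty.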

Page 15: Statistics Project

Simple Models: Arrests

• Possible variables
– Abuse
– Special Education
– Poverty
– Population Density
– School enrollment

Page 16: Statistics Project

Problems with Arrests Model

Problem
• High correlations and VIFs among the explanatory variables
• Multicollinearity

Fix
• Removed Income (too similar to Poverty)
• Continued refining the model, and the remaining multicollinearity resolved itself

Page 17: Statistics Project

Arrests Model: Choosing a Model

• Criterion for best fit
– AIC (Akaike information criterion)
• Full model: Arrests ~ Abuse + Special Ed + Poverty + Density + School
• Model selected by AIC: Arrests ~ Abuse + Special Ed + School
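AIC trades goodness of fit against the number of estimated parameters, and the model with the lower AIC is preferred. For a least-squares fit it can be written, up to an additive constant, as n ln(RSS / n) + 2k. The sketch below compares a five-predictor and a three-predictor model; the residual sums of squares are hypothetical, n = 21 assumes one observation per county, and counting the error variance in k is one common convention.

```python
from math import log


def gaussian_aic(rss, n, n_coefficients):
    """AIC (up to an additive constant) for a least-squares fit."""
    k = n_coefficients + 1  # +1 for the error variance (an assumed convention)
    return n * log(rss / n) + 2 * k


# Hypothetical residual sums of squares, not the project's results.
n = 21
full = gaussian_aic(rss=10.0, n=n, n_coefficients=6)     # intercept + 5 predictors
reduced = gaussian_aic(rss=10.5, n=n, n_coefficients=4)  # intercept + 3 predictors

best = "reduced" if reduced < full else "full"
print(round(full, 2), round(reduced, 2), best)
```

In this made-up case, dropping two predictors raises the residual sum of squares only slightly, so the smaller penalty wins and the reduced model has the lower AIC.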

Page 18: Statistics Project

Residual Diagnostics: Model Refinement

• Residual plots suggested a possible transformation of School
• GAM plots were used to choose the transform

Page 19: Statistics Project

Residual Diagnostics: Model Refinement

• Used a cubic transform
– Resulted in a higher adjusted R-squared value
– New model didn’t have normal residuals
– Rejected the model
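Adjusted R-squared corrects R-squared for the number of predictors: 1 - (1 - R²)(n - 1) / (n - p - 1). The sketch below uses hypothetical R-squared values and assumes n = 21 (one observation per county). It shows that an added term raises adjusted R-squared only when it improves the fit enough to offset the penalty.

```python
def adjusted_r_squared(r_squared, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept not counted)."""
    if n - p - 1 <= 0:
        raise ValueError("not enough observations for this many predictors")
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - p - 1)


# Hypothetical R^2 values for illustration only.
base = adjusted_r_squared(0.80, n=21, p=3)
small_gain = adjusted_r_squared(0.805, n=21, p=4)  # extra term, little improvement
large_gain = adjusted_r_squared(0.88, n=21, p=4)   # extra term, clear improvement
print(round(base, 4), round(small_gain, 4), round(large_gain, 4))
```

A higher adjusted R-squared still says nothing about whether the residuals are normal, so it is a separate check from the residual diagnostics on this slide.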

Page 20: Statistics Project

Residual Diagnostics: Model Refinement

• Box-Cox plot
– Lowest near 0
– No transform required

Page 21: Statistics Project

Residual Diagnostics: Removing Outliers

• LRPlot
– One obvious non-influential outlier
– Easily removed without damage to the model
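Assuming "LRPlot" refers to a leverage versus residual plot, the sketch below shows how a point can be an outlier without being influential. It has a large residual but low leverage, so its Cook's distance stays small. It uses a one-predictor regression on made-up data; the project's model had three predictors, and the function name is hypothetical.

```python
def leverage_and_cooks(x, y):
    """Residuals, leverages and Cook's distances for y = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    b0 = my - b1 * mx
    resid = [b - (b0 + b1 * a) for a, b in zip(x, y)]
    p = 2  # estimated coefficients: intercept and slope
    s2 = sum(e * e for e in resid) / (n - p)  # residual variance estimate
    lev = [1 / n + (a - mx) ** 2 / sxx for a in x]
    cooks = [e * e / (p * s2) * h / (1 - h) ** 2 for e, h in zip(resid, lev)]
    return resid, lev, cooks


# Made-up data: the fourth point is far from the line but sits at the mean of x.
x = [1, 2, 3, 4, 5, 6, 7]
y = [1.1, 2.0, 2.9, 6.5, 5.1, 6.0, 6.9]

resid, lev, cooks = leverage_and_cooks(x, y)
worst = max(range(len(x)), key=lambda i: abs(resid[i]))
print(worst, round(lev[worst], 3), round(cooks[worst], 3))
```

Because the outlying point sits at the mean of x, it has the smallest possible leverage, and removing it shifts the fitted line's intercept without changing its slope.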

Page 22: Statistics Project

Conclusions

• Good linear fit between arrests and its explanatory variables; not so for school enrollment
• Juvenile arrests can be modeled by: Arrests = 2.58 + 0.21(Sped) + 0.95(Abuse) - 0.08(School)
• Not enough appropriate data to make a model for school enrollment
• Improvements
– Check correlation of variables earlier
– Additional data acquisition
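The fitted arrests model from the conclusions can be written as a small function. The coefficients are the ones reported above. The input values in the example are hypothetical, and the inputs are assumed to be on the same per capita scale the authors used, which the slides do not specify.

```python
def predict_arrests(sped, abuse, school):
    """Predicted juvenile arrests from the reported linear model."""
    return 2.58 + 0.21 * sped + 0.95 * abuse - 0.08 * school


# Hypothetical inputs for illustration only.
example = predict_arrests(sped=10.0, abuse=20.0, school=50.0)
print(round(example, 2))
```

Abuse has the largest coefficient, but coefficient sizes are only directly comparable if the three inputs are on similar scales.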

