Statistics Project
By: Rich Miktus, Christopher Geigel, Brandon Butch
2004 Data - Raw New Jersey Counties
– Abuse and Neglect Referrals of Children
– Special Education Enrollment
– Number of Child Arrests
– Average Income of Families with Children
– Child Poverty
– Child Population
– Total Population
– School Enrollment
Variables: Per Capita
• Abuse
• Poverty
• Special Education
• Income
• School Enrollment
• Arrests
• Population Density
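The per capita variables are presumably the raw counts divided by a population figure. A minimal sketch of that conversion, assuming child population as the denominator (the slides do not say which denominator was used); the county record and its numbers are hypothetical:

```python
def per_capita(count, population):
    """Return a raw count expressed as a rate per person."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count / population

# Hypothetical county record; these figures are illustrative, not from the 2004 data.
county = {"abuse_referrals": 1200, "child_arrests": 450, "child_population": 60000}

# Convert each raw count into a per capita rate using the assumed denominator.
rates = {
    name: per_capita(county[name], county["child_population"])
    for name in ("abuse_referrals", "child_arrests")
}
```

Working with rates rather than counts keeps large counties from dominating every comparison simply because they have more children.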
Introduction
• One Variable Analysis
  – Histograms
  – Scatterplots
  – Q-Q Plots
• Two Variable Analysis
  – Linear models
  – Regression analysis
• Simple Models
  – Arrests
  – School Enrollment
• Residual Diagnostics
One Variable Analysis
• Histograms & Scatterplots
  – Frequency of occurrences
  – Skew of data
• Q-Q Plot
  – Normal distribution
• Usefulness of variables
  – Real-life relationships
  – Data flaws
Example Histogram: Population Density
Example Q-Q Plot: School Enrollment
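A normal Q-Q plot pairs each sorted observation with the quantile a normal distribution would place at the same rank; points close to a straight line suggest the variable is roughly normal. A minimal sketch of how those points could be computed, assuming the common (i - 0.5)/n plotting positions; the sample values are hypothetical:

```python
from statistics import NormalDist

def normal_qq_points(values):
    """Pair each sorted observation with the matching standard-normal quantile."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()
    # Plotting positions (i - 0.5) / n keep the probabilities strictly inside (0, 1).
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

# Hypothetical per capita values; not taken from the county data.
sample = [0.12, 0.15, 0.11, 0.18, 0.14, 0.16, 0.13]
points = normal_qq_points(sample)
```

Each tuple in `points` is (theoretical quantile, observed value), ready to be drawn as a scatterplot.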
Two Variable Analysis
• Correlation Table – used to check initial predictions
• Linear regression line
• Residuals
• How much do our explanatory variables matter?
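The pieces listed above (correlation, regression line, residuals) can be sketched for a single explanatory variable as follows. The squared correlation is one way to answer "how much does the explanatory variable matter", since it is the share of variance in the response that the line explains. The data are hypothetical and the function names are illustrative:

```python
import math

def fit_line(x, y):
    """Least-squares intercept and slope for one explanatory variable."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values; not taken from the county data.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(x, y)
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
r = pearson_r(x, y)
r_squared = r ** 2  # share of the variance in y explained by x
```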
Two Variable Analysis
• More refined analysis to test:
  – Arrests ~ abuse, special education enrollment, poverty, school enrollment
  – School enrollment ~ income, poverty, abuse, population density
Arrests vs. Abuse
• Good linear fit – strong correlation
• Residuals relatively small
• Large F Statistic, small P Value
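For a regression on one explanatory variable, the F statistic compares the variation explained by the line with the variation left in the residuals. A minimal sketch, using hypothetical data (one strong and one weak relationship); the p-value is not computed here because it needs the F distribution, which the Python standard library does not provide:

```python
def f_statistic(x, y):
    """F statistic for a simple linear regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual (unexplained) and total sums of squares.
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    sst = sum((b - mean_y) ** 2 for b in y)
    ssr = sst - sse  # explained sum of squares, 1 degree of freedom
    return (ssr / 1) / (sse / (n - 2))

# Hypothetical data: a tight linear trend and a scattered one.
x = [1, 2, 3, 4, 5]
f_strong = f_statistic(x, [2.1, 3.9, 6.2, 7.8, 10.1])
f_weak = f_statistic(x, [3, 1, 4, 1, 5])
```

A large F with a small p-value means the fitted line explains far more variation than would be expected by chance.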
School vs. Income
• Relationship is very weak
• No strong, overall trend
• Possible weak, positive correlation
Two Variable Analysis: Conclusions
• Arrests strongly correlated with abuse, moderately correlated with special education enrollment and poverty, and not correlated with school enrollment
• School enrollment strongly correlated with population density, and not related to income, poverty, or abuse
Simple Models: School Enrollment
• Possible variables
  – Abuse
  – Income
  – Poverty
  – Population density
Problems with School Model
• Income and Poverty
  – Correlation: -0.911
  – Variance Inflation Factors: Income 8.489, Poverty 9.278
• Best Regression
  – By AIC: School ~ Density
  – Underfitted
  – Flawed variables
• Not enough applicable data
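A variance inflation factor is 1 / (1 - R²), where R² comes from regressing one explanatory variable on the others. A minimal sketch, taking the -0.911 on the slide to be the income-poverty correlation (an assumption) and considering only those two variables, in which case R² is simply the squared correlation:

```python
def vif_from_r_squared(r_squared):
    """Variance inflation factor for a predictor whose regression
    on the other predictors has the given R-squared."""
    if not 0 <= r_squared < 1:
        raise ValueError("r_squared must be in [0, 1)")
    return 1.0 / (1.0 - r_squared)

# Assumed to be the income-poverty correlation reported on the slide.
r_income_poverty = -0.911

# With only two predictors, R-squared of one on the other is r squared.
vif_two_predictors = vif_from_r_squared(r_income_poverty ** 2)  # about 5.88
```

The slide's values (8.489 and 9.278) are larger than this two-variable figure, which would be consistent with other candidate variables also being in the model when the VIFs were computed; the slides do not state this.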
Simple Models: Arrests
• Possible variables
  – Abuse
  – Special Education
  – Poverty
  – Population Density
  – School enrollment
Problems with Arrests Model
Problem
• High correlations and VIFs among the explanatory variables
• Multicollinearity
Fix
• Removed Income (too similar to poverty)
• Continued refining the model, after which the problem resolved itself
Arrests Model: Choosing a Model
• The test for best fit
  – AIC goodness test
• Arrests~Abuse + Special Ed + Poverty + Density + School
• Arrests~Abuse + Special Ed + School
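AIC trades goodness of fit against the number of estimated coefficients, and the model with the lower value is preferred. A minimal sketch comparing the two candidate models, assuming the usual least-squares form n·ln(RSS/n) + 2k (defined up to an additive constant). The residual sums of squares below are hypothetical, and n = 21 assumes all of New Jersey's 21 counties were used:

```python
import math

def aic_least_squares(rss, n, k):
    """AIC (up to an additive constant) for a least-squares model with
    residual sum of squares rss, n observations, and k coefficients."""
    return n * math.log(rss / n) + 2 * k

n_counties = 21  # assumes every New Jersey county is in the data

# Hypothetical residual sums of squares; not taken from the slides.
# k counts the intercept plus one coefficient per explanatory variable.
aic_full = aic_least_squares(rss=10.0, n=n_counties, k=6)     # 5 predictors
aic_reduced = aic_least_squares(rss=10.4, n=n_counties, k=4)  # 3 predictors

preferred = "reduced" if aic_reduced < aic_full else "full"
```

In this illustration, dropping two variables barely increases the residual sum of squares, so the penalty term favors the smaller model.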
Residual Diagnostics: Model Refinement
• Residual plots suggested a possible transformation of School
• Used GAM plots to choose the transform
Residual Diagnostics: Model Refinement
• Used a cubic transform
  – Resulted in a higher adjusted R-squared value
  – New model didn’t have normal residuals
  – Rejected the model
Residual Diagnostics: Model Refinement
• Box-Cox Plot
  – Lowest near 0
  – No transform required
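For reference, the Box-Cox family of power transforms that such a plot evaluates can be sketched as below. This only shows the transform itself for a given parameter (called `lam` here, an illustrative name); it does not reproduce the plot or how the slide's conclusion was reached:

```python
import math

def box_cox(y, lam):
    """Box-Cox power transform of a positive value y with parameter lam."""
    if y <= 0:
        raise ValueError("Box-Cox requires positive values")
    if lam == 0:
        return math.log(y)  # the limiting case as lam approaches 0
    return (y ** lam - 1) / lam

# lam = 1 only shifts the data by one, i.e. it leaves the shape unchanged.
shifted = box_cox(4.0, 1)      # 3.0
square_root = box_cox(4.0, 0.5)  # 2.0
```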
Residual Diagnostics: Removing Outliers
• LRPlot
  – One obvious non-influential outlier
  – Easily removed without damage to the model
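Assuming LRPlot refers to a leverage versus residual plot, a non-influential outlier is a point with a large residual but low leverage: it sits far from the line yet has little pull on it. A minimal sketch for a single explanatory variable (the arrests model has three, so this is a simplification); the data are hypothetical:

```python
def leverage_and_residuals(x, y):
    """Leverage and residual of each point in a simple linear regression."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Leverage grows with distance from the mean of x.
    leverages = [1 / n + (a - mean_x) ** 2 / sxx for a in x]
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    return leverages, residuals

# Hypothetical data with one unusual response near the middle of the x range.
x = [1, 2, 3, 4, 5, 6]
y = [1.1, 2.0, 6.0, 4.1, 4.9, 6.1]

leverages, residuals = leverage_and_residuals(x, y)
worst = max(range(len(x)), key=lambda i: abs(residuals[i]))  # largest residual
mean_leverage = sum(leverages) / len(leverages)
is_low_leverage = leverages[worst] < mean_leverage
```

Here the point with the largest residual has below-average leverage, which is the pattern that makes removal comparatively safe.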
Conclusions
• Good linear fit between arrests and its explanatory variables; not so for school enrollment
• Juvenile arrests can be modeled by: Arrests = 2.58 + 0.21(Sped) + 0.95(Abuse) - 0.08(School)
• Not enough appropriate data to make a model for school enrollment
• Improvements
  – Check correlation of variables earlier
  – Additional data acquisition
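The fitted arrests equation from the conclusions can be written as a small function. The coefficients are those reported on the slide; the function name is illustrative, and the units and scaling of the inputs are not stated in the slides, so the inputs must be on whatever scale the model was fitted with:

```python
def predict_arrests(sped, abuse, school):
    """Predicted arrests from the reported model:
    Arrests = 2.58 + 0.21(Sped) + 0.95(Abuse) - 0.08(School)."""
    return 2.58 + 0.21 * sped + 0.95 * abuse - 0.08 * school

# With every input at zero, the prediction equals the intercept.
baseline = predict_arrests(0, 0, 0)

# Raising abuse by one unit, holding the others fixed, adds 0.95.
abuse_effect = predict_arrests(0, 1, 0) - baseline
```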