Prediction of wrong way bike violators at USC using binary logistic regression by recording sample...

Date post: 05-Dec-2014
Upload: maisam-shahid-wasti
Description:
We found that the probability of a bike rider committing the wrong-way violation on Trousdale Parkway can be predicted using logistic regression. We recorded bike riders from 15th April to 18th April, 2013. Of the fifteen variables we measured, gender, time, type of clothing, presence of a bag, and our binary measure of speed gave us a statistically significant prediction model.
EE517 Project Presentation Maisam Shahid Wasti and Dennis Hartono
Transcript
Page 1: Prediction of wrong way bike violators at USC using binary logistic regression by recording sample size of 2800+ on campus, Statistics for Engineers, USC Spring 2013

EE517 Project Presentation

Maisam Shahid Wasti and Dennis Hartono

Page 2

Biking wrong way at Trousdale

We found that it is possible to predict wrong-way violations on Trousdale Parkway, USC campus

Page 3

Data Collection

Collected 14 hours of data with a total sample size of 2,837 riders

Page 4

Decision Rule to classify violators

Established a consistent decision rule for sample validation

Page 5

Overview of the observation site

Five-minute slots following class ending times

Observed a higher proportion of violators in the few minutes after classes end

Page 6

Interpretation of important variables

Five-minute slot following class ending times

Name: 5min_after
Type: Binary
Description: Coded '1' if the rider was observed within the five-minute slot following a class ending time
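A minimal sketch of how a 5min_after flag could be coded from an observation time. The class ending times below are hypothetical (the slides do not list the schedule used), and treating the window as [end, end + 5 minutes) is our assumption.

```python
from datetime import time

# Hypothetical class ending times; the slides do not list the schedule that was used.
CLASS_END_TIMES = [time(9, 50), time(10, 50), time(11, 50), time(13, 50)]


def five_min_after(observed, end_times=CLASS_END_TIMES, window_min=5):
    """Return 1 if `observed` falls in [end, end + window_min) for any class end time, else 0."""
    obs = observed.hour * 60 + observed.minute + observed.second / 60
    for end in end_times:
        start = end.hour * 60 + end.minute
        if start <= obs < start + window_min:
            return 1
    return 0


print(five_min_after(time(9, 53)))   # 1: three minutes after a class ends
print(five_min_after(time(10, 20)))  # 0: outside every window
```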

Page 7

Used ‘bodyweight’ as a binary measure for speed

Interpretation of important variables

Page 8

Approach to model selection

Refined our model in three stages using logistic regression

- Initial model: filtered all non-interaction terms down to a selective set of non-interaction terms
- Intermediate model: added all C(n,2) second-order terms to the initial model and reduced it with Backward-LR
- Final model: removed the terms causing multicollinearity
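The "+ C(n,2) second order terms" step adds one product term for every pair of the n predictors. A minimal sketch of that expansion follows; the predictor names come from the final model, but the rider record and its 0/1 coding are hypothetical, and the Backward-LR elimination that follows is not shown.

```python
from itertools import combinations


def add_second_order_terms(row, predictors):
    """Return a copy of `row` with every C(n, 2) pairwise product of `predictors` added."""
    out = dict(row)
    for a, b in combinations(predictors, 2):
        out[f"{a}*{b}"] = row[a] * row[b]
    return out


predictors = ["Gender", "Bag", "Sportswear", "Bodyweight", "5min_after"]
# Hypothetical rider record with assumed 0/1 coding
rider = {"Gender": 1, "Bag": 1, "Sportswear": 0, "Bodyweight": 1, "5min_after": 1}
expanded = add_second_order_terms(rider, predictors)
print(len(expanded) - len(rider))  # 10 == C(5, 2) new interaction terms
print(expanded["Bag*Bodyweight"])  # 1
```

With n predictors this grows as n(n-1)/2, which is why a backward elimination step is needed before the model becomes unwieldy.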

Page 9

Variables in final model

Found significant independent variables

We interpret 'bag' as an indicator of a student on campus

Variable              Significance
Non-interaction terms
  Gender              .003
Interaction terms
  Bag * Sportswear    .010
  Bag * Bodyweight    .002
  Bag * 5min_after    .001

Page 10

Model evaluation

Test                 Statistic   Significance
Omnibus              33.518      0.000
Cox and Snell R²     0.015
Nagelkerke R²        0.024

• Observed a significant improvement in log-likelihood through the Omnibus test
• Model suffered from low R² values
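Both pseudo-R² values follow from the omnibus likelihood-ratio chi-square. The slide does not state the number of cases used to fit the model; the sketch assumes n = 2837 - 572 = 2265 (all riders minus the 572 in the classification tables) and an event rate of 522/2837. Under those assumptions the standard formulas give values that round to the reported ones.

```python
import math


def pseudo_r2(chi2_omnibus, n, base_rate):
    """Cox & Snell and Nagelkerke R^2 from the omnibus likelihood-ratio chi-square."""
    # Omnibus chi-square = 2 * (LL_model - LL_null), so (L_null / L_model)^(2/n) = exp(-chi2 / n)
    cox_snell = 1 - math.exp(-chi2_omnibus / n)
    # Log-likelihood of the intercept-only model for a binary outcome with event rate p
    p = base_rate
    ll_null = n * (p * math.log(p) + (1 - p) * math.log(1 - p))
    # Nagelkerke rescales Cox & Snell by its maximum attainable value, 1 - L_null^(2/n)
    max_cox_snell = 1 - math.exp(2 * ll_null / n)
    return cox_snell, cox_snell / max_cox_snell


# Assumption: n = 2265 fitting cases and an event rate of 522/2837
cs, nk = pseudo_r2(33.518, 2265, 522 / 2837)
print(round(cs, 3), round(nk, 3))  # 0.015 0.024
```

The low values are expected here: with an 18% event rate and only four weak binary terms, the fitted probabilities stay close to the base rate.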

Page 11

Multicollinearity Test 1

Found no serious multicollinearity issues (no correlation magnitude above 0.3), with the highest correlation coefficient of magnitude 0.186

Correlation Matrix

                     Gender   Bag * Sportswear   Bag * Body_weight   Bag * 5min_after
Gender               1.000    -.085              -.186               .041
Bag * Sportswear     -.085    1.000              -.028               -.015
Bag * Body_weight    -.186    -.028              1.000               -.007
Bag * 5min_after     .041     -.015              -.007               1.000
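The check on this slide amounts to scanning the off-diagonal entries for the largest magnitude and comparing it with the 0.3 threshold. A short sketch using the values copied from the matrix:

```python
# Correlation matrix of the final-model terms, copied from the slide
names = ["Gender", "Bag * Sportswear", "Bag * Body_weight", "Bag * 5min_after"]
corr = [
    [1.000, -0.085, -0.186, 0.041],
    [-0.085, 1.000, -0.028, -0.015],
    [-0.186, -0.028, 1.000, -0.007],
    [0.041, -0.015, -0.007, 1.000],
]


def max_off_diagonal(matrix):
    """Return (|r|, i, j) for the largest off-diagonal correlation magnitude."""
    best = (0.0, -1, -1)
    for i in range(len(matrix)):
        for j in range(i + 1, len(matrix)):
            if abs(matrix[i][j]) > best[0]:
                best = (abs(matrix[i][j]), i, j)
    return best


r, i, j = max_off_diagonal(corr)
THRESHOLD = 0.3  # cut-off used on the slide for "serious" multicollinearity
print(f"{names[i]} vs {names[j]}: |r| = {r:.3f}, serious = {r > THRESHOLD}")
# Gender vs Bag * Body_weight: |r| = 0.186, serious = False
```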

Page 12

Multicollinearity Test 2

Observed standard errors bounded by a maximum of 0.258

Variables            B       S.E.   Wald     Sig.   Exp(B)
Gender               -.335   .112   8.936    .003   .715
Bag * Sportswear     .667    .258   6.671    .010   1.949
Bag * Bodyweight     .798    .258   9.605    .002   2.222
Bag * 5min_after     .409    .120   11.533   .001   1.506
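The Wald, Sig. and Exp(B) columns follow from B and S.E.: Wald = (B / S.E.)² with one degree of freedom, and Exp(B) is the odds ratio. A sketch that recomputes them from the table; the results differ slightly from the slide because B and S.E. are shown rounded to three decimals.

```python
import math

# (B, S.E.) pairs copied from the slide's coefficient table
coefficients = {
    "Gender": (-0.335, 0.112),
    "Bag * Sportswear": (0.667, 0.258),
    "Bag * Bodyweight": (0.798, 0.258),
    "Bag * 5min_after": (0.409, 0.120),
}


def wald_summary(b, se):
    """Wald chi-square (1 df), its p-value, and the odds ratio Exp(B)."""
    wald = (b / se) ** 2
    # For 1 df, P(chi2 > W) equals the two-sided normal tail at z = sqrt(W)
    p = math.erfc(math.sqrt(wald / 2))
    return wald, p, math.exp(b)


for name, (b, se) in coefficients.items():
    wald, p, odds = wald_summary(b, se)
    print(f"{name:18s} Wald={wald:6.3f} Sig.={p:.3f} Exp(B)={odds:.3f}")
```

For example, Exp(B) = 2.22 for Bag * Bodyweight means that, holding the other terms fixed, a rider with both a bag and the "bodyweight" speed indicator has roughly twice the odds of riding the wrong way.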

Page 13

Residual Analysis

Observed no residuals lying beyond 2 standard deviations
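The slide does not state which residual was plotted. As one common choice for logistic regression (our assumption), the deviance residual can be computed per rider and compared against the ±2 band. The outcomes and fitted probabilities in the usage lines are hypothetical.

```python
import math


def deviance_residual(y, p):
    """Deviance residual for a binary outcome y in {0, 1} with fitted probability p."""
    contrib = y * math.log(p) + (1 - y) * math.log(1 - p)
    return math.copysign(math.sqrt(-2 * contrib), y - p)


def flag_large(ys, ps, limit=2.0):
    """Indices of observations whose |deviance residual| exceeds `limit`."""
    return [i for i, (y, p) in enumerate(zip(ys, ps)) if abs(deviance_residual(y, p)) > limit]


# Hypothetical outcomes and fitted probabilities
ys = [1, 1, 0]
ps = [0.18, 0.10, 0.30]
print([round(deviance_residual(y, p), 3) for y, p in zip(ys, ps)])
print(flag_large(ys, ps))
```

A violator exceeds the band only when its fitted probability drops below about exp(-2) ≈ 0.135, so with probabilities clustered near the 18% base rate few residuals are expected to cross it.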

Page 14

Challenges with Classification Accuracy

- Have a skewed class distribution
- Resulting in a high baseline accuracy
- Difficult to improve much on the high baseline accuracy

[Bar chart: number of riders per class. Violators: 522 (18.4 %); Non-Violators: 2315 (81.6 %)]
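The class shares and the majority-class baseline follow directly from the two counts (522 violators, 2315 non-violators). A short sketch:

```python
counts = {"Violators": 522, "Non-Violators": 2315}
total = sum(counts.values())
for label, c in counts.items():
    print(f"{label}: {c} ({100 * c / total:.1f} %)")

# Majority-class baseline: always predict the most frequent class
baseline = max(counts.values()) / total
print(f"Baseline accuracy: {100 * baseline:.1f} %")
```

On the full sample this baseline is 81.6 %; the 81.3 % in the classification tables is the same rule applied to the 572-rider evaluation set (465 / 572).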

Page 15

Predicted Probabilities Histograms

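- Observed significant overlap between the predicted probabilities of the two classes
- The default cut-off of 0.5 gave a poor threshold

[Histograms of predicted probabilities for violators and non-violators]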

Page 16

Classification Tables

The overall classification accuracy remains the same, with increased prediction power for violations

Baseline Classification Table (rows: ground truth; columns: prediction)

Wrong Way Violation   Not Violating   Violating   Percentage Correct
Not Violating         465             0           100.0
Violating             107             0           0.0
Total                 572             0           81.3

Classification Table with 0.35 Cut-off (rows: ground truth; columns: prediction)

Wrong Way Violation   Not Violating   Violating   Percentage Correct
Not Violating         463             2           99.6
Violating             105             2           1.9
Total                 568             4           81.3
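Every percentage in these tables follows from the four cell counts. A sketch that recomputes them, treating "Violating" as the positive class:

```python
def classification_metrics(tn, fp, fn, tp):
    """Percentages reported in a 2x2 classification table (violating = positive class)."""
    return {
        "accuracy": 100 * (tn + tp) / (tn + fp + fn + tp),
        "specificity": 100 * tn / (tn + fp),  # 'Not Violating' percentage correct
        "sensitivity": 100 * tp / (tp + fn),  # 'Violating' percentage correct (TPR)
    }


# Counts copied from the baseline and 0.35 cut-off tables
baseline = classification_metrics(tn=465, fp=0, fn=107, tp=0)
cutoff_035 = classification_metrics(tn=463, fp=2, fn=105, tp=2)
for name, m in [("baseline", baseline), ("cut-off 0.35", cutoff_035)]:
    print(name, {k: round(v, 1) for k, v in m.items()})
```

Lowering the cut-off trades two correctly classified non-violators for two correctly classified violators, which is why overall accuracy stays at 81.3 % while sensitivity rises from 0 % to 1.9 %.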

Page 17

The ROC Graph

At a cut-off of 0.35, the model was observed to predict violations better than the baseline
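The slide shows the ROC graph but not how it was produced. A minimal sketch of the usual construction: sweep the cut-off over every distinct predicted probability, record (FPR, TPR), and integrate with the trapezoidal rule. The labels and scores in the usage lines are a hypothetical toy example, not the project data.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) pairs from sweeping the cut-off over every distinct score, high to low."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for cut in sorted(set(scores), reverse=True):
        # Predict 'violating' when the score is at or above the cut-off
        tp = sum(1 for y, s in zip(y_true, scores) if s >= cut and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= cut and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical toy labels (1 = violating) and predicted probabilities
labels = [0, 0, 1, 1]
probs = [0.1, 0.4, 0.35, 0.8]
points = roc_points(labels, probs)
print(points)
print(auc(points))
```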

Page 18

Multiway Cross-tabulation tests

Lacking a sufficient number of violators for a few cases

Wrong_Way_Violation * Food_or_Beverages Crosstabulation (Count)

Wrong_Way_Violation   Without food or beverage   With food or beverage   Total
Not Violating         2278                       37                      2315
Violating             511                        11                      522
Total                 2789                       48                      2837

Wrong_Way_Violation * Formal_Dressing Crosstabulation (Count)

Wrong_Way_Violation   Not in formal dress   In formal dress   Total
Not Violating         2268                  47                2315
Violating             510                   12                522
Total                 2778                  59                2837

Wrong_Way_Violation * Helmet Crosstabulation (Count)

Wrong_Way_Violation   Not wearing helmet   Wearing helmet   Total
Not Violating         2299                 16               2315
Violating             520                  2                522
Total                 2819                 18               2837

Wrong_Way_Violation * Hoodie Crosstabulation (Count)

Wrong_Way_Violation   Not wearing hoodie   Wearing hoodie   Total
Not Violating         2286                 29               2315
Violating             517                  5                522
Total                 2803                 34               2837
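One way to make "lacking a sufficient number of violators" precise is the common rule of thumb that a chi-square test of independence needs every expected cell count to be at least 5. The slides do not say which criterion was used; the sketch below applies that rule of thumb to the four crosstabs above.

```python
def expected_counts(table):
    """Expected cell counts under independence for a contingency table (list of rows)."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    return [[rt * ct / n for ct in col_tot] for rt in row_tot]


# Rows: Not Violating, Violating; columns: without / with the attribute (from the slide)
crosstabs = {
    "Food_or_Beverages": [[2278, 37], [511, 11]],
    "Formal_Dressing": [[2268, 47], [510, 12]],
    "Helmet": [[2299, 16], [520, 2]],
    "Hoodie": [[2286, 29], [517, 5]],
}
for name, table in crosstabs.items():
    smallest = min(min(row) for row in expected_counts(table))
    note = "below 5: chi-square approximation unreliable" if smallest < 5 else "ok"
    print(f"{name:18s} smallest expected count = {smallest:5.2f} ({note})")
```

Only the helmet table falls below the threshold (about 3.3 expected violators wearing helmets); an exact test would be the usual alternative there.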

Page 19

Other classifiers

Experimented with other classifiers and achieved a slight increase in overall accuracy

Classifier            Accuracy (%)
Baseline              81.3
Logistic Regression   81.3
Parzen windows        81.64
Linear Perceptron     81.29
K-Nearest Neighbors   81.29
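The slides do not say how the Parzen-window classifier was configured. A minimal sketch, assuming a Gaussian kernel with a hypothetical bandwidth h: each class's density at x is estimated by averaging kernels centred on its training points, weighted by the class prior, and the larger score wins. The training points below are a hypothetical toy example.

```python
import math


def parzen_classify(x, train_x, train_y, h=0.5):
    """Classify x by the larger (class prior x Parzen density estimate) using a Gaussian kernel.

    The kernel's normalising constant is the same for every class, so it is omitted.
    """
    scores = {}
    for label in set(train_y):
        pts = [p for p, y in zip(train_x, train_y) if y == label]
        density = sum(
            math.exp(-sum((a - b) ** 2 for a, b in zip(x, p)) / (2 * h * h)) for p in pts
        ) / len(pts)
        prior = len(pts) / len(train_y)
        scores[label] = prior * density
    return max(scores, key=scores.get)


# Hypothetical 2-D training data (0 = not violating, 1 = violating)
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6)]
train_y = [0, 0, 0, 1, 1]
print(parzen_classify((0.2, 0.1), train_x, train_y))
print(parzen_classify((5, 5.4), train_x, train_y))
```

With mostly binary features such as those in the final model, many riders share identical feature vectors, so the bandwidth mainly controls how strongly neighbouring feature combinations influence each other.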

Page 20

Logistic Regression vs. Parzen Windows

Achieved slightly improved TPR/FPR and overall classification accuracy using Parzen Windows

Parzen Window (rows: ground truth; columns: prediction)

Wrong Way Violation   Not Violating   Violating   Percentage Correct
Not Violating         465             0           100
Violating             105             2           1.9
Total                 570             2           81.64

Logistic Regression, 0.35 cut-off (rows: ground truth; columns: prediction)

Wrong Way Violation   Not Violating   Violating   Percentage Correct
Not Violating         463             2           99.6
Violating             105             2           1.9
Total                 568             4           81.30

Page 21

Questions…

