Prediction of wrong way bike violators at USC using binary logistic regression by recording sample...

EE517 Project Presentation

Maisam Shahid Wasti and Dennis Hartono

Biking wrong way at Trousdale

We found that it is possible to predict wrong-way violations at Trousdale Parkway, USC Campus

Data Collection

Collected 14 hours of data with a total sample size of 2837

Decision Rule to classify violators

Established a consistent decision rule for sample validation

Overview of the observation site

Five-minute slot following class ending times

Observed a higher proportion of violators for a few minutes after classes end

Name: 5min_after

Interpretation of important variables

Name: '5min_after'
Type: Binary
Description: Coded '1' if the sample was observed within the five-minute slot following a class ending time
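A minimal sketch of how such a flag could be coded from an observation timestamp. The class ending times listed here are hypothetical (the slides do not give the actual schedule), and treating both ends of the window as inclusive is also an assumption.

```python
from datetime import datetime, time, timedelta

# Hypothetical class ending times; the slides do not list the real schedule.
CLASS_END_TIMES = [time(9, 50), time(10, 50), time(11, 50)]


def five_min_after(observed_at, end_times=CLASS_END_TIMES,
                   window=timedelta(minutes=5)):
    """Return 1 if observed_at falls within `window` after any class end time, else 0."""
    for end in end_times:
        end_dt = datetime.combine(observed_at.date(), end)
        # Assumption: both ends of the window are inclusive.
        if end_dt <= observed_at <= end_dt + window:
            return 1
    return 0


in_slot = five_min_after(datetime(2014, 11, 3, 9, 52))
out_of_slot = five_min_after(datetime(2014, 11, 3, 9, 56))
```

Here `in_slot` is coded 1 and `out_of_slot` is coded 0, because 9:56 is more than five minutes past the hypothetical 9:50 ending time.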

Used ‘bodyweight’ as a binary measure for speed

Interpretation of important variables

Approach to model selection

Refined our model in three stages

All non-interaction terms
-> Initial model with selected non-interaction terms
-> + C(n, 2) second-order terms
-> Intermediate model
-> Backward-LR
-> Final model after removing terms causing multicollinearity

Diagram labels: Filtration, Logistic Regression
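The "C(n, 2) second-order terms" step can be illustrated by enumerating every pairwise product of the main effects. This is only a sketch: the predictor list below is hypothetical and uses a subset of names that appear elsewhere in the slides, not the full set of variables collected.

```python
from itertools import combinations
from math import comb

# Hypothetical subset of main effects; the full variable list is not in the slides.
main_effects = ["gender", "bag", "sportswear", "bodyweight", "5min_after"]


def second_order_terms(names):
    """Return all C(n, 2) pairwise interaction terms as 'a * b' strings."""
    return [f"{a} * {b}" for a, b in combinations(names, 2)]


interactions = second_order_terms(main_effects)
candidate_terms = main_effects + interactions  # input to a selection step such as Backward-LR
```

With n = 5 main effects this yields `comb(5, 2)` = 10 interaction terms, including `bag * sportswear`, `bag * bodyweight`, and `bag * 5min_after`.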

Variables in final model

Found significant independent variables

We interpret 'bag' as an indicator that the rider is a student on campus

Variables               Significance
Non-Interaction Terms
  Gender                .003
Interaction Terms
  Bag * Sportswear      .010
  Bag * Bodyweight      .002
  Bag * 5min_after      .001

Model evaluation

Test                Statistic   Significance
Omnibus             33.518      0.000
Cox and Snell R2    0.015
Nagelkerke R2       0.024

• Observed significant improvement in Log-Likelihood through Omnibus test

• Model suffered from low R2 values
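As a worked check, the two pseudo-R2 values can be recomputed from the omnibus chi-square using the standard Cox and Snell and Nagelkerke formulas. The sketch below assumes, which the slides do not state, that the model was fit on the samples remaining after holding out the 572-sample test set shown in the classification tables (2837 - 572 = 2265 samples, of which 522 - 107 = 415 were violators).

```python
import math

# Reported on the slide
omnibus_chi2 = 33.518

# Assumptions (not stated on the slides): training size and class counts are
# the full sample minus the 572-sample test set from the classification tables.
n = 2837 - 572           # 2265 training samples
n_violators = 522 - 107  # 415 training violators


def null_log_likelihood(n, n_pos):
    """Log-likelihood of the intercept-only model (predicts the base rate)."""
    p = n_pos / n
    return n_pos * math.log(p) + (n - n_pos) * math.log(1 - p)


def cox_snell_r2(chi2, n):
    """Cox and Snell R2 = 1 - exp(-chi2 / n), with chi2 the omnibus statistic."""
    return 1 - math.exp(-chi2 / n)


def nagelkerke_r2(chi2, n, ll_null):
    """Cox and Snell R2 rescaled by its maximum attainable value."""
    max_r2 = 1 - math.exp(2 * ll_null / n)
    return cox_snell_r2(chi2, n) / max_r2


cs = cox_snell_r2(omnibus_chi2, n)
nk = nagelkerke_r2(omnibus_chi2, n, null_log_likelihood(n, n_violators))
```

Under these assumptions the results round to the reported 0.015 and 0.024.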

Multicollinearity Test 1

Found no serious multicollinearity issues (no correlation magnitude above 0.3); the highest correlation coefficient has magnitude 0.186

Correlation Matrix

                     Gender    Bag * Sportswear    Bag * Body_weight    Bag * 5min_after
Gender               1.000     -.085               -.186                .041
Bag * Sportswear     -.085     1.000               -.028                -.015
Bag * Body_weight    -.186     -.028               1.000                -.007
Bag * 5min_after     .041      -.015               -.007                1.000
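The screening rule can be expressed as a short check on the off-diagonal entries of this matrix. The values are copied from the slide; the helper name is illustrative.

```python
labels = ["Gender", "Bag * Sportswear", "Bag * Body_weight", "Bag * 5min_after"]

# Correlation matrix as reported on the slide
corr = [
    [1.000, -0.085, -0.186, 0.041],
    [-0.085, 1.000, -0.028, -0.015],
    [-0.186, -0.028, 1.000, -0.007],
    [0.041, -0.015, -0.007, 1.000],
]

THRESHOLD = 0.3  # cut-off used on the slide for "serious" multicollinearity


def max_off_diagonal(matrix):
    """Return (magnitude, row, col) of the largest off-diagonal correlation."""
    size = len(matrix)
    return max(
        (abs(matrix[i][j]), i, j)
        for i in range(size)
        for j in range(size)
        if i != j
    )


magnitude, i, j = max_off_diagonal(corr)
worst_pair = {labels[i], labels[j]}
passes_screen = magnitude <= THRESHOLD
```

The largest magnitude is the 0.186 between Gender and Bag * Body_weight, which is below the 0.3 threshold.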

Multicollinearity Test 2

Observed standard errors bounded by a maximum of 0.258

Variables            B        S.E.    Wald      Sig.    Exp(B)
Gender               -.335    .112    8.936     .003    .715
Bag * Sportswear     .667     .258    6.671     .010    1.949
Bag * Bodyweight     .798     .258    9.605     .002    2.222
Bag * 5min_after     .409     .120    11.533    .001    1.506
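The Exp(B) and Wald columns follow from B and S.E.: Exp(B) is the odds ratio exp(B), and the Wald statistic is (B / S.E.)^2. The sketch below recomputes both from the rounded values in the table, so small differences from the reported figures are expected.

```python
import math

# (B, S.E., reported Wald, reported Exp(B)) as shown in the table
coefficients = {
    "Gender": (-0.335, 0.112, 8.936, 0.715),
    "Bag * Sportswear": (0.667, 0.258, 6.671, 1.949),
    "Bag * Bodyweight": (0.798, 0.258, 9.605, 2.222),
    "Bag * 5min_after": (0.409, 0.120, 11.533, 1.506),
}

recomputed = {}
for name, (b, se, wald_reported, exp_b_reported) in coefficients.items():
    odds_ratio = math.exp(b)  # Exp(B): multiplicative change in the odds
    wald = (b / se) ** 2      # Wald chi-square statistic with 1 degree of freedom
    recomputed[name] = (odds_ratio, wald)
    print(f"{name:18s} Exp(B)={odds_ratio:.3f} Wald={wald:.3f}")
```

An Exp(B) above 1 (the three interaction terms) means the term raises the odds of a violation; the Gender value below 1 means the category coded 1 has lower odds. The slides do not say how Gender is coded.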

Residual Analysis

Observed no residuals lying more than 2 standard deviations out

Challenges with Classification Accuracy

- Have a skewed class distribution
- Resulting in high baseline accuracy
- Difficult to improve much from the high baseline accuracy

[Bar chart of sample counts by class: Violators 522 (18.3 %), Non-Violators 2315 (81.7 %)]

Predicted Probabilities Histograms

- Observed significant overlap
- The default cut-off of 0.5 performed poorly as a threshold

[Histograms of predicted probabilities: Violators, Non-Violators]

Classification Tables

The overall classification accuracy remains the same, with increased prediction power for violations

Baseline Classification Table (rows: ground truth; columns: prediction)

Wrong Way Violation    Not Violating    Violating    Percentage Correct
Not Violating          465              0            100.0
Violating              107              0            0.0
Total                  572              0            81.3

Classification Table with 0.35 Cut-off (rows: ground truth; columns: prediction)

Wrong Way Violation    Not Violating    Violating    Percentage Correct
Violating              105              2            1.9
Total                  568              4            81.3

The ROC Graph

Observed to be better at predicting violations than the baseline at Cut-off = 0.35
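A point on the ROC graph is a (false positive rate, true positive rate) pair, which can be computed from a classification table. The sketch below uses the counts from the two tables in this deck. The true-negative and false-positive counts at the 0.35 cut-off are not read from a slide; they are derived here from the column totals (568 - 105 and 4 - 2).

```python
def rates(tn, fp, fn, tp):
    """Accuracy, true positive rate and false positive rate from a 2x2 table."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        "tpr": tp / (tp + fn),  # share of actual violators predicted as violating
        "fpr": fp / (fp + tn),  # share of non-violators predicted as violating
    }


# Baseline: every sample is predicted "Not Violating"
baseline = rates(tn=465, fp=0, fn=107, tp=0)

# 0.35 cut-off: tn and fp are derived from the table's column totals
cutoff_035 = rates(tn=568 - 105, fp=4 - 2, fn=105, tp=2)
```

Both settings give the same overall accuracy, while the 0.35 cut-off moves the operating point off the origin of the ROC plane (non-zero TPR and FPR).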

Multiway Cross-tabulation tests

Lacked a sufficient number of violators in a few categories

Wrong_Way_Violation * Food_or_Beverages Crosstabulation (Count)

Wrong_Way_Violation    Without food or beverage    With food or beverage    Total
Not Violating          2278                        37                       2315
Violating              511                         11                       522
Total                  2789                        48                       2837

Wrong_Way_Violation * Formal_Dressing Crosstabulation (Count)

Wrong_Way_Violation    Not in formal dress    In formal dress    Total
Not Violating          2268                   47                 2315
Violating              510                    12                 522
Total                  2778                   59                 2837

Wrong_Way_Violation * Helmet Crosstabulation (Count)

Wrong_Way_Violation    Not wearing helmet    Wearing helmet    Total
Violating              520                   2                 522
Total                  2819                  18                2837

Wrong_Way_Violation * Hoodie Crosstabulation (Count)

Wrong_Way_Violation    Not wearing hoodie    Wearing hoodie    Total
Violating              517                   5                 522
Total                  2803                  34                2837
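One way to see the sparsity problem is to compute expected cell counts under independence (row total x column total / grand total) and compare them with a common rule of thumb of at least 5 per cell. The sketch below is illustrative and not necessarily the test used in the project; it takes the full Food_or_Beverages table and the marginal totals of the Helmet table from the slides.

```python
def expected_counts(table):
    """Expected counts under independence for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def chi_square(table):
    """Pearson chi-square statistic (no continuity correction)."""
    expected = expected_counts(table)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )


# Rows: Not Violating, Violating; columns: without / with food or beverage
food = [[2278, 37], [511, 11]]
food_expected = expected_counts(food)
food_chi2 = chi_square(food)

# Helmet table: expected violators wearing a helmet, from the marginals only
helmet_expected_violating = 522 * 18 / 2837
helmet_cell_too_sparse = helmet_expected_violating < 5  # rule-of-thumb minimum
```

For the Helmet table the expected count of violators wearing a helmet is below 5, consistent with the observation that some categories lack enough violators.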

Other classifiers

Experimented with other classifiers and achieved a slight increase in overall accuracy

Classifier              Accuracy %
Baseline                81.3
Logistic Regression     81.3
Parzen windows          81.64
Linear Perceptron       81.29
K-Nearest Neighbors     81.29

Logistic Regression vs. Parzen Windows

Achieved slightly improved TPR/FPR and overall classification accuracy using Parzen Windows

Parzen Window (rows: ground truth; columns: prediction)

Wrong Way Violation    Not Violating    Violating    Percentage Correct
Not Violating          465              0            100
Violating              105              2            1.9
Total                  570              2            81.64

Logistic Regression (0.35) (rows: ground truth; columns: prediction)

Wrong Way Violation    Not Violating    Violating    Percentage Correct
Violating              105              2            1.9
Total                  568              4            81.30
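For readers unfamiliar with Parzen windows, here is a minimal sketch of a Parzen-window classifier: each class density is estimated by averaging kernels centred on that class's training points, and a sample is assigned to the class with the largest prior-weighted density. The Gaussian kernel, the bandwidth `h`, and the toy feature vectors are all assumptions for illustration. The slides do not state which kernel, bandwidth, or features were used.

```python
import math


def gaussian_kernel(x, xi, h):
    """Isotropic Gaussian kernel centred on xi with bandwidth h."""
    dim = len(x)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
    norm = (2 * math.pi * h * h) ** (dim / 2)
    return math.exp(-sq_dist / (2 * h * h)) / norm


def parzen_density(x, samples, h):
    """Parzen-window density estimate at x from one class's samples."""
    return sum(gaussian_kernel(x, s, h) for s in samples) / len(samples)


def parzen_classify(x, samples_by_class, h=0.5):
    """Pick the class maximizing prior * estimated class-conditional density."""
    total = sum(len(s) for s in samples_by_class.values())
    scores = {
        label: (len(s) / total) * parzen_density(x, s, h)
        for label, s in samples_by_class.items()
    }
    return max(scores, key=scores.get)


# Made-up binary feature vectors for illustration only (not the project data)
toy_data = {
    "not_violating": [(0, 0), (0, 1), (0, 0), (1, 0)],
    "violating": [(1, 1), (1, 1)],
}

label_a = parzen_classify((1, 1), toy_data)
label_b = parzen_classify((0, 0), toy_data)
```

With a skewed class prior like the one in this project, the prior term pulls predictions toward the majority class, which may explain why only a few samples are ever predicted as violating.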

Questions…

Date post:	05-Dec-2014