Date posted: 05-Dec-2014
Category: Technology
Uploaded by: maisam-shahid-wasti
EE517 Project Presentation
Maisam Shahid Wasti and Dennis Hartono
Biking wrong way at Trousdale
We found that it is possible to predict wrong way violations at Trousdale Parkway, USC Campus
Data Collection
Collected 14 hours of data with a total sample size of 2837
Decision Rule to classify violators
Established a consistent decision rule for sample validation
Overview of the observation site
Five-minute slot following class ending times
Observed a higher proportion of violators for a few minutes after classes end
Name: 5min_after
Interpretation of important variables
Name: '5min_after'
Type: Binary
Description: Coded '1' if the sample was observed within the five-minute slot following a class ending time
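The coding rule for '5min_after' can be sketched in a few lines. This is only an illustration: the class ending times listed here are hypothetical (the slides do not give the actual schedule), and the choice to include the ending time itself while excluding the five-minute mark is an assumption.

```python
from datetime import datetime, timedelta

# ASSUMPTION: hypothetical class ending times; the real schedule is not in the slides.
CLASS_END_TIMES = ["09:50", "10:50", "11:50"]
SLOT = timedelta(minutes=5)


def five_min_after(observed_at, end_times=CLASS_END_TIMES):
    """Return 1 if observed_at (HH:MM:SS) falls in a five-minute slot after a class end, else 0."""
    t = datetime.strptime(observed_at, "%H:%M:%S")
    for end in end_times:
        start = datetime.strptime(end, "%H:%M")
        # ASSUMPTION: the slot includes the ending time and excludes the five-minute mark.
        if start <= t < start + SLOT:
            return 1
    return 0


print(five_min_after("09:52:30"))  # observation inside the slot after 09:50
print(five_min_after("09:56:00"))  # observation outside every slot
```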
Used ‘bodyweight’ as a binary measure for speed
Approach to model selection
Refined our model in three stages
All non-interaction terms
Initial model with selected non-interaction terms
+ C(n, 2) second-order terms
Intermediate model
Backward-LR
Final model after removing terms causing multicollinearity
Filtration
Logistic regression
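Adding all second-order terms means adding one product term for every pair of main effects, that is C(n, 2) terms for n main effects. A minimal sketch follows; the list of main effects is an assumption chosen from variable names that appear in the slides, not the authors' actual initial model.

```python
from itertools import combinations
from math import comb

# ASSUMPTION: illustrative subset of main effects, not the authors' full initial model.
main_effects = ["Gender", "Bag", "Sportswear", "Bodyweight", "5min_after"]

# One product term per unordered pair of main effects.
interaction_terms = [f"{a} * {b}" for a, b in combinations(main_effects, 2)]

# The number of pairwise terms equals C(n, 2).
assert len(interaction_terms) == comb(len(main_effects), 2)

for term in interaction_terms:
    print(term)
```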
Variables in final model
Found significant independent variables
We interpret 'Bag' as an indicator that the rider is a student on campus
Variables            Term type         Significance
Gender               Non-interaction   .003
Bag * Sportswear     Interaction       .010
Bag * Bodyweight     Interaction       .002
Bag * 5min_after     Interaction       .001
Model evaluation
Test                 Statistic   Significance
Omnibus              33.518      0.000
Cox and Snell R2     0.015
Nagelkerke R2        0.024
• Observed significant improvement in Log-Likelihood through Omnibus test
• Model suffered from low R2 values
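The two pseudo-R2 values can be related to the Omnibus chi-square with the standard Cox and Snell and Nagelkerke formulas. The sketch below rests on loud assumptions: the slides do not state the size of the fitting sample, so it is taken here as 2837 − 572 = 2265 (total sample minus the cases in the classification tables), with 522 − 107 = 415 violators. Under those assumptions the results should land close to the reported values.

```python
import math


def null_log_likelihood(n_pos, n):
    """Log-likelihood of the intercept-only model for n_pos positives out of n cases."""
    p = n_pos / n
    return n_pos * math.log(p) + (n - n_pos) * math.log(1.0 - p)


def cox_snell_r2(chi2, n):
    """Cox and Snell R2 from the model chi-square (drop in -2 log-likelihood)."""
    return 1.0 - math.exp(-chi2 / n)


def nagelkerke_r2(chi2, n, ll_null):
    """Cox and Snell R2 rescaled by its maximum attainable value."""
    max_r2 = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell_r2(chi2, n) / max_r2


chi2 = 33.518            # Omnibus test statistic from the slide
n = 2837 - 572           # ASSUMPTION: fitting sample size, not stated in the slides
n_violators = 522 - 107  # ASSUMPTION: violators in the fitting sample

ll0 = null_log_likelihood(n_violators, n)
cs = cox_snell_r2(chi2, n)
nk = nagelkerke_r2(chi2, n, ll0)
print(round(cs, 3), round(nk, 3))
```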
Multicollinearity Test 1
Found no serious multicollinearity issues: no pairwise correlation exceeded 0.3 in magnitude, and the highest correlation coefficient had a magnitude of 0.186
Correlation Matrix   Gender   Bag * Sportswear   Bag * Body_weight   Bag * 5min_after
Gender               1.000    -.085              -.186               .041
Bag * Sportswear     -.085    1.000              -.028               -.015
Bag * Body_weight    -.186    -.028              1.000               -.007
Bag * 5min_after     .041     -.015              -.007               1.000
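The check described above amounts to finding the largest off-diagonal magnitude in the correlation matrix and comparing it with the 0.3 threshold. A short sketch using the values from the slide (the helper name is illustrative):

```python
labels = ["Gender", "Bag * Sportswear", "Bag * Body_weight", "Bag * 5min_after"]
corr = [
    [1.000, -0.085, -0.186, 0.041],
    [-0.085, 1.000, -0.028, -0.015],
    [-0.186, -0.028, 1.000, -0.007],
    [0.041, -0.015, -0.007, 1.000],
]
THRESHOLD = 0.3  # cut-off for a "serious" correlation, as used on the slide


def max_offdiagonal(matrix):
    """Return (magnitude, row index, column index) of the largest off-diagonal entry."""
    size = len(matrix)
    return max(
        (abs(matrix[i][j]), i, j)
        for i in range(size)
        for j in range(size)
        if i != j
    )


magnitude, i, j = max_offdiagonal(corr)
print(f"{labels[i]} vs {labels[j]}: {magnitude:.3f}")
print("serious multicollinearity" if magnitude > THRESHOLD else "no serious multicollinearity")
```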
Multicollinearity Test 2
Observed standard errors to be bounded by a maximum of 0.258
Variables B S.E. Wald Sig. Exp(B)
Gender -.335 .112 8.936 .003 .715
Bag * Sportswear .667 .258 6.671 .010 1.949
Bag * Bodyweight .798 .258 9.605 .002 2.222
Bag * 5min_after .409 .120 11.533 .001 1.506
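The Wald and Exp(B) columns follow from B and S.E. through the usual relations Wald = (B / S.E.)^2 and Exp(B) = e^B. The sketch below recomputes them from the rounded table values, so small differences from the reported figures are expected.

```python
import math

# (variable, B, S.E., reported Wald, reported Exp(B)) taken from the table above.
rows = [
    ("Gender", -0.335, 0.112, 8.936, 0.715),
    ("Bag * Sportswear", 0.667, 0.258, 6.671, 1.949),
    ("Bag * Bodyweight", 0.798, 0.258, 9.605, 2.222),
    ("Bag * 5min_after", 0.409, 0.120, 11.533, 1.506),
]


def wald(b, se):
    """Wald chi-square statistic for a single coefficient."""
    return (b / se) ** 2


def odds_ratio(b):
    """Multiplicative change in the odds for a one-unit increase in the variable."""
    return math.exp(b)


for name, b, se, wald_reported, or_reported in rows:
    print(f"{name}: Wald {wald(b, se):.3f} (reported {wald_reported}), "
          f"Exp(B) {odds_ratio(b):.3f} (reported {or_reported})")
```

Read this way, an Exp(B) above 1 means higher odds of a violation when the term equals 1, and an Exp(B) below 1 means lower odds.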
Residual Analysis
Observed no residuals lying above 2 standard deviations
Challenges with Classification Accuracy
- Have a skewed class distribution
- Resulting in high baseline accuracy
- Difficult to improve much from the high baseline accuracy
Bar chart of class counts: Violators 522 (18.3 %), Non-Violators 2315 (81.7 %)
Predicted Probabilities Histograms
- Observed significant overlap
- The default 0.5 gave a bad cut-off threshold
Histograms of predicted probabilities: Violators, Non-Violators
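To illustrate why a lower cut-off can flag cases that the default 0.5 misses, the sketch below turns the final-model coefficients into a predicted probability and applies both thresholds. The intercept is not reported in the slides, so the value used here is purely hypothetical, and the rider profile is made up for the example.

```python
import math

# Coefficients (B) from the final model table in the slides.
COEFFS = {
    "gender": -0.335,
    "bag_sportswear": 0.667,
    "bag_bodyweight": 0.798,
    "bag_5min_after": 0.409,
}
# ASSUMPTION: hypothetical intercept; the slides do not report the constant term.
INTERCEPT = -1.5


def predicted_probability(features, intercept=INTERCEPT):
    """Logistic model: probability of a violation for one rider."""
    logit = intercept + sum(COEFFS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-logit))


def classify(probability, cutoff):
    """Label a rider as violating when the probability reaches the cut-off."""
    return "Violating" if probability >= cutoff else "Not Violating"


# Hypothetical rider: carries a bag, bodyweight flag set, observed right after class.
rider = {"gender": 0, "bag_sportswear": 0, "bag_bodyweight": 1, "bag_5min_after": 1}
p = predicted_probability(rider)
for cutoff in (0.5, 0.35):
    print(f"cut-off {cutoff}: p = {p:.3f} -> {classify(p, cutoff)}")
```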
Classification Tables
The overall classification accuracy remains the same with increased prediction power for violations
Baseline Classification Table
Ground Truth                     Prediction
(Wrong Way Violation)   Not Violating   Violating   Percentage Correct
Not Violating           465             0           100.0
Violating               107             0           0.0
Total                   572             0           81.3
Classification Table with 0.35 Cut-off
Ground Truth                     Prediction
(Wrong Way Violation)   Not Violating   Violating   Percentage Correct
Not Violating           463             2           99.6
Violating               105             2           1.9
Total                   568             4           81.3
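The "Percentage Correct" column can be reproduced directly from the four cell counts of each table. A small sketch (function and argument names are illustrative):

```python
def percentage_correct(tn, fp, fn, tp):
    """Return (% correct for non-violators, % correct for violators, overall %)."""
    non_violators = 100.0 * tn / (tn + fp)
    violators = 100.0 * tp / (fn + tp)
    overall = 100.0 * (tn + tp) / (tn + fp + fn + tp)
    return round(non_violators, 1), round(violators, 1), round(overall, 1)


# Counts read from the two classification tables above.
baseline = percentage_correct(tn=465, fp=0, fn=107, tp=0)
cutoff_035 = percentage_correct(tn=463, fp=2, fn=105, tp=2)
print("baseline:", baseline)
print("cut-off 0.35:", cutoff_035)
```

The overall figure stays the same because the two violators gained are offset by the two non-violators lost.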
The ROC Graph
Observed to be better at predicting violations than the baseline at Cut-off = 0.35
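Each cut-off corresponds to one point on the ROC graph, given by the false positive rate and the true positive rate. The sketch below derives the two points from the classification table counts; a point above the diagonal (TPR greater than FPR) does better than chance.

```python
def roc_point(tn, fp, fn, tp):
    """Return (false positive rate, true positive rate) for one confusion matrix."""
    fpr = fp / (fp + tn)
    tpr = tp / (tp + fn)
    return fpr, tpr


baseline_point = roc_point(tn=465, fp=0, fn=107, tp=0)  # always predicts "Not Violating"
cutoff_point = roc_point(tn=463, fp=2, fn=105, tp=2)    # cut-off = 0.35

for label, (fpr, tpr) in (("baseline", baseline_point), ("cut-off 0.35", cutoff_point)):
    print(f"{label}: FPR = {fpr:.4f}, TPR = {tpr:.4f}")
```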
Multiway Cross-tabulation tests
Lacked a sufficient number of violators in a few categories
Wrong_Way_Violation * Food_or_Beverages Crosstabulation (Count)
Wrong_Way_Violation   Without food or beverage   With food or beverage   Total
Not Violating         2278                       37                      2315
Violating             511                        11                      522
Total                 2789                       48                      2837
Wrong_Way_Violation * Formal_Dressing Crosstabulation (Count)
Wrong_Way_Violation   Not in formal dress   In formal dress   Total
Not Violating         2268                  47                2315
Violating             510                   12                522
Total                 2778                  59                2837
Wrong_Way_Violation * Helmet Crosstabulation (Count)
Wrong_Way_Violation   Not wearing helmet   Wearing helmet   Total
Not Violating         2299                 16               2315
Violating             520                  2                522
Total                 2819                 18               2837
Wrong_Way_Violation * Hoodie Crosstabulation (Count)
Wrong_Way_Violation   Not wearing hoodie   Wearing hoodie   Total
Not Violating         2286                 29               2315
Violating             517                  5                522
Total                 2803                 34               2837
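One way to make the sparsity concrete is to compute the expected cell counts under independence (row total times column total divided by the grand total) and flag small cells. The threshold of 5 used below is a common rule of thumb for chi-square tests and is an assumption here, not something stated in the slides.

```python
# Observed counts: rows = (Not Violating, Violating), columns = (without, with).
crosstabs = {
    "Food_or_Beverages": [[2278, 37], [511, 11]],
    "Formal_Dressing": [[2268, 47], [510, 12]],
    "Helmet": [[2299, 16], [520, 2]],
    "Hoodie": [[2286, 29], [517, 5]],
}
MIN_EXPECTED = 5  # ASSUMPTION: conventional rule-of-thumb threshold


def expected_counts(table):
    """Expected counts under independence for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Expected number of violators in the rarer category of each variable.
expected_violators = {
    name: expected_counts(table)[1][1] for name, table in crosstabs.items()
}
sparse = [name for name, value in expected_violators.items() if value < MIN_EXPECTED]

for name, value in expected_violators.items():
    print(f"{name}: expected violators in the rare category = {value:.2f}")
print("below threshold:", sparse)
```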
Other classifiers
Experimented with other classifiers and achieved a slight increase in overall accuracy
Classifier Accuracy %
Baseline 81.3
Logistic Regression 81.3
Parzen windows 81.64
Linear Perceptron 81.29
K-Nearest Neighbors 81.29
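For readers unfamiliar with Parzen windows, the sketch below shows the general idea: estimate each class density as an average of kernels centred on the training samples, weight it by the class prior, and pick the larger score. The Gaussian kernel, the window width, the two features, and the toy training data are all assumptions for illustration; the slides do not describe the authors' configuration.

```python
import math


def parzen_density(x, samples, h):
    """Average of Gaussian kernels (width h) centred on each training sample."""
    dims = len(x)
    norm = (2.0 * math.pi * h * h) ** (dims / 2.0)
    total = 0.0
    for s in samples:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, s))
        total += math.exp(-sq_dist / (2.0 * h * h)) / norm
    return total / len(samples)


def parzen_classify(x, class_samples, h=0.5):
    """Pick the class with the largest prior-weighted Parzen density at x."""
    n_total = sum(len(samples) for samples in class_samples.values())
    scores = {
        label: (len(samples) / n_total) * parzen_density(x, samples, h)
        for label, samples in class_samples.items()
    }
    return max(scores, key=scores.get)


# ASSUMPTION: toy data with two binary features (bag, 5min_after), invented for illustration.
train = {
    "Violating": [(1, 1), (1, 1), (1, 0)],
    "Not Violating": [(0, 0), (0, 0), (0, 1), (1, 0), (0, 0)],
}
print(parzen_classify((1, 1), train))
print(parzen_classify((0, 0), train))
```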
Logistic Regression vs. Parzen Windows
Achieved slightly improved TPR/FPR and overall classification accuracy using Parzen Windows
Parzen Window
Ground Truth                     Prediction
(Wrong Way Violation)   Not Violating   Violating   Percentage Correct
Not Violating           465             0           100
Violating               105             2           1.9
Total                   570             2           81.64
Logistic Regression (0.35)
Ground Truth                     Prediction
(Wrong Way Violation)   Not Violating   Violating   Percentage Correct
Not Violating           463             2           99.6
Violating               105             2           1.9
Total                   568             4           81.30
Questions…