On Computing Probabilities of Dismissal of 10b-5 Securities Class-Action Cases
Sumanta Singha, Steve Hillmer, Prakash P. Shenoy
School of Business, University of Kansas
Capitol Federal Hall, 1654 Naismith Drive, Lawrence, KS 66045 USA
October 21, 2016
S.Singha, S.Hillmer, PP. Shenoy (KU) Computing Probabilities October 21, 2016 1 / 26
Outline
1 Introduction
2 Objective and Related Work
3 Data
4 Feature Selection and Analysis
5 Results
6 Conclusions
Introduction
Securities class actions are lawsuits filed by investors or shareholders against corporations.
Common allegations include fraudulent disclosure, misleading forecasts, violations of securities laws, insider trading, and financial restatements.
Some statistics:
Number of securities class-action filings since 1996: 4,100
1 in every 18 S&P 500 companies face class-action litigation.
$87 billion has been dispensed in settlements.
Class-action Litigation Process
Objective
To identify features that are significant for the dismissal/non-dismissal of a 10b-5 securities class-action lawsuit.
To propose a model that predicts the probability of dismissal based on these features.
Related Work
Two streams of literature:
Baker et al. [2007, 2009]; Cox et al. [2006, 2008]; Johnson et al. [2007]: make qualitative arguments about which features are important from a legal viewpoint.
Pritchard et al. [2005]; McShane et al. [2012]: focus on predicting the probability of dismissal/settlement in a class-action case.
In this paper, we propose a hybrid model of Naïve Bayes (NB) and Logistic Regression (LR).
Assumptions of NB, LR, and Hybrid LR-NB method
NB Model:
Conditional independence of the predictors given the class variable.
Cannot incorporate non-parametric continuous features.
LR Model:
The log odds of the class variable is a linear function of the features.
Cannot incorporate features with missing values.
Hybrid Model:
Features in the LR part are independent of features in the NB part given the class.
Features in the NB part are conditionally independent given the class.
Can simultaneously handle missing values and continuous predictors.
Inference Method: The prior of the naïve Bayes part is replaced by the posterior of the logistic regression part.
Hybrid Model
O(C = d \mid \mathbf{f}, \mathbf{e}) = e^{\beta_0 + \sum_{i=1}^{m} \beta_i f_i} \prod_{j=1}^{n} L(C = d, e_j) \qquad (1)
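As a concrete illustration, Equation (1) can be evaluated directly: the LR part contributes the exponential of the linear predictor, and each NB feature multiplies in its likelihood ratio. The sketch below uses hypothetical coefficients and likelihood-ratio values (the actual fitted values are not shown on this slide):

```python
import math

def hybrid_odds(beta0, betas, f, likelihood_ratios):
    """Odds of dismissal under the hybrid LR-NB model (Eq. 1):
    exp of the LR linear predictor, times the product of the
    NB likelihood ratios L(C = d, e_j)."""
    lr_part = math.exp(beta0 + sum(b * x for b, x in zip(betas, f)))
    nb_part = 1.0
    for ratio in likelihood_ratios:
        nb_part *= ratio
    return lr_part * nb_part

# Hypothetical coefficient and likelihood-ratio values, for illustration only.
odds = hybrid_odds(beta0=-0.4, betas=[0.8, -0.5, 0.3, 0.6],
                   f=[1, 0, 1, 0], likelihood_ratios=[1.7])
prob = odds / (1 + odds)  # convert odds to a probability of dismissal
```

The odds-to-probability conversion in the last line is the standard transform p = O / (1 + O).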
Data Preparation: A New Notion of Dismissal
Total instances: 925 (# Dismissed: 414; # Not dismissed: 511)
Training-to-test ratio: 90:10
Source: SCAC, Stanford Law School; cases filed between 2002 and 2010.
Features
9 important features were selected for the analysis. They are:
1 GP = GAAP violations (1) or not (0)
2 SI = SEC investigation (1) or not (0)
3 II = lead plaintiff is an institutional investor (1) or not (0)
4 BR = defendant filed for bankruptcy (1) or not (0)
5 IS = insider selling (1) or not (0)
6 IC = lack of internal control (1) or not (0)
7 S11 = Section-11 violations (1) or not (0)
8 RF = restated the company financials (1) or not (0)
9 STD = sudden short-term (one to five working days) drop in share price
5-Step Procedure
1 Markov blanket estimation for the class ‘dismissed’ as the initial step for feature selection.
2 Supervised discretization of any continuous feature, necessary only for the NB model.
3 Searching for the best LR and best NB models from the set of features in Step 1, using 8-fold CV on the training set and RMSE as the performance metric.
4 Searching for the best hybrid model using a heuristic, which reduces the search space from 602 models to 30 models.
5 Estimating the out-of-sample error and a confidence interval for the best hybrid model using 1,000 non-parametric bootstrap re-samples of the test set.
What is a Markov Blanket?
A variable’s Markov blanket contains its parents, its children, and the co-parents of its children. It can be shown that a node is conditionally independent of all other nodes in the network given its Markov blanket.
Figure: Markov Blanket
Step 1: Markov Blanket Estimation
40 Markov blankets are estimated using 4 constraint-based algorithms and 10 different conditional-independence (CI) tests.
The union of all 40 MBs is taken as the MB of the class ‘dismissed’, which is equivalent to removing features that all MBs agree are irrelevant.
The Markov blanket contains 6 features: (i) GP, (ii) RF, (iii) IC, (iv) S11, (v) BR, and (vi) STD.
In the next step, the best LR and best NB models are searched for within this set of 6 features.
Step 2: Discretization of Continuous Feature STD
Figure: Frequency histogram of the short-term drop (STD) for “Dismissed” vs. “Not Dismissed” cases, with a breakpoint at a 42.2% drop.

The proportion of dismissed to non-dismissed cases (the likelihood ratio) changes below and above this breakpoint.
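Supervised discretization of a continuous feature like STD amounts to searching for a cut point that separates the class likelihoods. A minimal single-breakpoint sketch, using a simple stand-in scoring rule (the exact discretization algorithm is not specified on this slide):

```python
def best_breakpoint(values, labels):
    """Find the cut on a continuous feature that maximizes the
    difference in the dismissal rate below vs. above the cut --
    a simple stand-in for supervised discretization."""
    pairs = sorted(zip(values, labels))
    best_cut, best_gap = None, -1.0
    for k in range(1, len(pairs)):
        below = [y for _, y in pairs[:k]]
        above = [y for _, y in pairs[k:]]
        gap = abs(sum(below) / len(below) - sum(above) / len(above))
        if gap > best_gap:
            best_gap = gap
            # place the cut midway between the two neighboring values
            best_cut = (pairs[k - 1][0] + pairs[k][0]) / 2
    return best_cut

# Toy data: dismissal (1) is more common when the price drop is small.
cut = best_breakpoint([0.10, 0.20, 0.30, 0.50, 0.60, 0.70],
                      [1, 1, 1, 0, 0, 0])
```

On this toy data the cut lands at 0.40, between the last all-dismissed and first all-not-dismissed values.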
Step 3: Selection of Best LR and Best NB Model
There are a total of (2^6 − 1) = 63 candidate models to search from to find the best LR and best NB models.
We use 8-fold cross-validation on the training set to find the training-set error. We repeat this process 100 times and take the average CV error.
We use RMSE as the performance metric, not classification error, because we aim to predict probabilities rather than classify.
The best model is the one with the lowest average training-set error. The best LR model contains 5 features; the best NB model contains 2 features.
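The exhaustive search over the (2^6 − 1) = 63 feature subsets can be sketched as follows; `score_fn` stands in for the repeated 8-fold-CV RMSE described above:

```python
from itertools import combinations

def best_subset(features, score_fn):
    """Exhaustively search all 2^k - 1 non-empty feature subsets and
    return the one with the lowest score (e.g., average CV RMSE)."""
    best, best_score = None, float("inf")
    for r in range(1, len(features) + 1):
        for subset in combinations(features, r):
            s = score_fn(subset)
            if s < best_score:
                best, best_score = subset, s
    return best, best_score

# The 6 Markov-blanket features from Step 1.
features = ["GP", "RF", "IC", "S11", "BR", "STD"]
n_models = sum(1 for r in range(1, len(features) + 1)
               for _ in combinations(features, r))
# n_models == 63, matching the 2^6 - 1 candidate subsets on this slide
```

The same search is run twice, once scoring each subset as an LR model and once as an NB model.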
Best Naïve Bayes Model
Figure: Best naïve Bayes model based on RMSE
Best Logistic Regression Model
Figure: Best logistic regression model based on RMSE
Step 3: Selection of Best LR and Best NB Model
Computation of RMSE:
Compute the probability of dismissal for all 832 cases in the training set.
Sort instances by predicted probability and partition the training set into 8 bins.
For each bin, compute the average predicted probability and the actual (empirical) dismissal proportion.
The difference between the predicted average and the actual average is the prediction error for that bin.
Compute the SSE over the bins and the RMSE.
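The binned RMSE described above can be sketched as follows; here RMSE is taken as sqrt(SSE / #bins), which is consistent with the worked example on the next slide (SSE 0.0072 over 8 bins gives RMSE 0.0300):

```python
import math

def binned_rmse(pred, actual, bins=8):
    """RMSE over bins of predicted probabilities: sort cases by
    prediction, split into equal-size bins, and compare each bin's
    mean prediction with its empirical dismissal rate."""
    pairs = sorted(zip(pred, actual))
    n = len(pairs)
    size = n // bins
    sse = 0.0
    for b in range(bins):
        # last bin absorbs any remainder from the integer division
        chunk = pairs[b * size: n if b == bins - 1 else (b + 1) * size]
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        mean_act = sum(a for _, a in chunk) / len(chunk)
        sse += (mean_pred - mean_act) ** 2
    return math.sqrt(sse / bins)
```

For example, `binned_rmse([0.1]*4 + [0.9]*4, [0, 0, 0, 0, 1, 1, 1, 1], bins=2)` gives 0.1: each bin's mean prediction is off by 0.1 from its empirical rate.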
Step 3: Selection of Best LR and Best NB Model
An example of the computation of bin probabilities and RMSE:

Bin | Avg. predicted prob. | Actual prob. | Sq. error
1 | 0.270 | 0.288 | 0.0003
2 | 0.320 | 0.355 | 0.0012
3 | 0.340 | 0.336 | 0.0000
4 | 0.385 | 0.365 | 0.0004
5 | 0.438 | 0.442 | 0.0000
6 | 0.482 | 0.432 | 0.0024
7 | 0.598 | 0.644 | 0.0021
8 | 0.654 | 0.625 | 0.0008

Sum of squared errors: 0.0072
RMSE: 0.0300
Step 3: Selection of Best LR and Best NB Model
Figure: Graphical representation of bin probabilities
Step 4: Searching the Best Hybrid Model
There are a total of 602 possible candidate models to search from.
The heuristic proceeds in two steps.
First, it considers only features that are present in either the best LR or the best NB model. We have 5 such features. The number of candidate models reduces from 602 to 180.
Second, it requires that all 5 features appear in the best hybrid model in some arrangement. The number of candidate models reduces from 180 to 30.
The best hybrid model has 4 features in the LR side and 1 feature inthe NB side.
Best Hybrid Model
Figure: Best hybrid model based on RMSE
Step 5: Finding Test Set Error of Best Hybrid Model
1,000 re-samples of the same size as the test set are generated using non-parametric bootstrapping.
RMSE is computed from each bootstrap re-sample.
The average test-set error and a confidence interval are computed.
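The bootstrap procedure above can be sketched as follows; `metric_fn` stands in for the test-set RMSE computation, and the percentile interval shown is one common (assumed) way to form the bootstrap CI:

```python
import random

def bootstrap_error(metric_fn, test_set, n_boot=1000, seed=0):
    """Non-parametric bootstrap of a test-set error metric:
    resample the test set with replacement n_boot times, recompute
    the metric, and summarize the resulting distribution."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        sample = [rng.choice(test_set) for _ in test_set]
        stats.append(metric_fn(sample))
    stats.sort()
    mean = sum(stats) / n_boot
    # 95% percentile interval from the sorted bootstrap statistics
    ci = (stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot)])
    return mean, ci
```

`metric_fn` receives a resampled list of test cases and must return the error of the fitted model on that resample.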
Results
Table: Model Selection Results
Method | Predictors | Avg. RMSE | Std. Dev.
Naïve Bayes | GP, STD | 0.0488 | 0.0012
Logistic regression | GP, IC, STD, BR, S11 | 0.0436 | 0.0010
Hybrid | LR part: GP, IC, BR, S11; NB part: STD | 0.0412* | 0.0011
* significant in paired t-test @5% significance level.
Table: Test set errors
Model | RMSE | Bootstrap Std. Error | Bootstrap CI
Best Hybrid Model | 0.0930 | 0.0444 | [0.0110, 0.1843]
Comparing Results with McShane et al. [2012]
Method | # Predictors | Training-Set Error | Test-Set Error
LR Model | 18 | 6.01% | 11.17%
Hybrid LR-NB | 5 | 4.12% | 9.30%

McShane et al. use all 18 features to predict both dismissal and settlement, so we do not know which features affect dismissal and which affect settlement.
Conclusions
The hybrid model combines the best aspects of logistic regression and naïve Bayes.
It retains the simplicity (small number of parameters) of LR and NB.
The method for learning parameters remains the same as for LR/NB.
It is easy to make inferences.
For the dataset of 10b-5 class-action cases, the hybrid model performs better than pure LR and pure NB.
The features for predicting dismissal are (i) GP, (ii) IC, (iii) BR, (iv) S11, and (v) STD.
Our hybrid model performs better than the LR model proposed by McShane et al. [2012].