Date post: | 31-Dec-2015 |
Upload: | nathaniel-jefferson |
Presented by
Rebecca Shwayri
Introduction to Predictive Coding and its benefits
How can records managers use Predictive Coding?
Predictive Coding in Action
Limitations of keyword searches & human review
Predictive Coding Defensibility
What is predictive coding?
How does it work?
NOT Magic
NOT a cure for cancer
NOT based on voodoo
Keyword searching
Concept searching
E-mail threading
These methods can be useful but do not predict relevance of future documents based on past documents
Expert (you) develops an understanding of the documents and classifies the documents
Old tech
In common use today
Example: Spam Filter, Amazon.com
Math and Statistics
Algorithms
Mathematical model built
Accuracy depends on quality of training set
Random Sample
Single person reviews & codes the sample: Responsive / Non-Responsive
Computer learns & predicts
Computer categorizes all remaining documents: Responsive / Non-Responsive
Repeat as needed
Review 2000-5000 randomly selected documents
One person’s time for 15-39 hours
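The sample, code, learn, categorize, repeat loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: the slides do not say which algorithm a predictive coding product uses, so a small naive Bayes text classifier stands in for the "mathematical model", and the four coded documents and two remaining documents are invented toy data.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def train(coded_sample):
    """Build per-class word counts from (text, is_responsive) pairs."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, is_responsive in coded_sample:
        doc_counts[is_responsive] += 1
        word_counts[is_responsive].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab

def predict(model, text):
    """Return True if the document scores as more likely responsive."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in (True, False):
        # Class prior, with add-one smoothing.
        score = math.log((doc_counts[label] + 1) / (total_docs + 2))
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        scores[label] = score
    return scores[True] > scores[False]

# A single reviewer codes a small sample (invented toy data).
coded_sample = [
    ("invoice backdated to hide revenue shortfall", True),
    ("lunch menu for friday team outing", False),
    ("revenue recognized early on contract invoice", True),
    ("parking garage closed this weekend", False),
]

# The computer learns from the coded sample.
model = train(coded_sample)

# The computer categorizes the remaining documents.
remaining = ["please backdate the contract invoice", "team outing photos attached"]
predictions = {doc: predict(model, doc) for doc in remaining}

# "Repeat as needed": the reviewer corrects wrong predictions,
# appends them to coded_sample, and calls train() again.
```

With this toy data the first remaining document is predicted responsive and the second non-responsive. As the slides note, accuracy depends on the quality of the training set, which is why the sample is coded by one consistent reviewer.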
Predictive Coding in Practice
Dramatic reduction in e-discovery costs
More accurate than human review and keyword search
Light years faster than human review and keyword search
Fact driven, not fear driven, settlements
Learn the facts of the case in a few days rather than over months or years using traditional methods of review
Helps avoid litigation – uncovers the facts more quickly
Use as an information governance tool
Method | Recall Ratio | Cost | Speed
Keywords | 20 percent | High $$$ | Slow - Misses content
Human Review | 60 percent | Very High $$$$ | 60 docs / hr
Predictive Coding | 75-98 percent | Low $ | >80-250x faster
Information Governance Tool (proactive)
Litigation Tool (reactive)
Encompasses a variety of disciplines
Records Management
Knowledge Management
Information Security and Privacy
Data breach risks
E-discovery costs
Unable to locate documents needed for the business units
Standardized IG policies
Reduce the need to review every single document to determine the importance of the document to the company
Locate data within the company’s IT infrastructure and categorize it appropriately for the business units
Locate data that needs to be destroyed
Example: Company is sued in a dispute involving fraud and breach of contract
Custodians: 20 Potential Custodians with average e-mail box of 40 GB each (800 total GB of e-mail data)
Other electronic files: 200 GB
Total data: 1 Terabyte
Company is served with a Request for Production of Documents by Plaintiffs’ Counsel
Plaintiffs’ Counsel demands searching through ESI of custodians
Plaintiffs’ Counsel makes a broad demand for accounting records
What do you do?
Keyword search 1TB of data? How do you keyword search fraud? Information disadvantage!
Human review? It will take many, many months and millions of dollars to review 1TB of data!
Use Predictive Coding
Should you disclose?
One school of thought suggests disclosing use of predictive coding to opposing counsel, agreeing to precision and recall rates (Full Agreement and Full Disclosure)
The other school of thought suggests making no disclosures (Avoid litigation associated with use of predictive coding)
Recall (Completeness)
Recall measures how successful the system was in finding all of the responsive documents. If 1,000 documents in the full set were actually responsive, but the system only marked 750 of those documents responsive, then the recall would be 75 percent.
Precision (Accuracy)
Precision measures how often the documents that were marked responsive were actually responsive. If the system marked 10 documents responsive, and only six of them were actually responsive, then the precision would be 60 percent.
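The two definitions above reduce to simple ratios. The sketch below reproduces both worked examples; the function and parameter names are illustrative, not taken from any review tool.

```python
def recall(responsive_found, responsive_missed):
    """Share of all truly responsive documents that the system marked."""
    return responsive_found / (responsive_found + responsive_missed)

def precision(responsive_found, nonresponsive_marked):
    """Share of the marked documents that were truly responsive."""
    return responsive_found / (responsive_found + nonresponsive_marked)

# Recall example: 1,000 responsive documents, 750 found, 250 missed.
recall_example = recall(750, 250)

# Precision example: 10 marked responsive, 6 correct, 4 wrong.
precision_example = precision(6, 4)

print(f"Recall: {recall_example:.0%}")
print(f"Precision: {precision_example:.0%}")
```

The two measures pull against each other: marking every document responsive gives 100 percent recall but poor precision, which is why both rates matter when parties agree on targets.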
Depends on collection “richness”
2-5 days – one person & one only!
500-5000 documents reviewed
Stop when system exhibits:
High rates of Precision & Recall – above the agreed to rates
No longer discovering new topics to teach the computer about
Computer is predicting with consistency
It is like Exit Polling….
Statistics Truth: Sample of a certain size yields a certain level of confidence and a certain margin of error.
400 randomly selected docs provide a 95% confidence level in the estimate of Predictive Coding accuracy, with a ± 5% margin of error.
Reference: Cochran, W.G. 1977. Sampling Techniques, 3rd Ed. John Wiley & Sons, New York, New York, USA.
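The 400-document figure can be checked with the standard sample-size formula for estimating a proportion, n = z² · p(1 − p) / e². The sketch below assumes the usual textbook conditions, which the slides do not spell out: a large collection (no finite-population correction), the worst-case proportion p = 0.5, and z = 1.96 for a 95% confidence level.

```python
import math

Z_95 = 1.96  # standard normal value for a 95% confidence level

def sample_size(margin, z=Z_95, p=0.5):
    """Smallest sample giving the requested margin of error (worst case p = 0.5)."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

def margin_of_error(n, z=Z_95, p=0.5):
    """Margin of error achieved by a simple random sample of n documents."""
    return z * math.sqrt(p * (1 - p) / n)

needed = sample_size(0.05)           # documents needed for +/- 5%
achieved = margin_of_error(400)      # margin delivered by 400 documents

print(needed)
print(f"{achieved:.1%}")
```

Under these assumptions about 385 documents are needed for ± 5%, so a sample of 400 gives a margin of roughly ± 4.9%, consistent with the slide's rounded figure.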
When you are out of time
If you want to save money
Consider using CAR (computer-assisted review) for cases involving 5 GB or more of data
Predictive coding makes sense when you have 20,000 documents or more
Judge Facciola (D.D.C.): “If you are practicing e-discovery without a clawback, you are committing malpractice.”
Parties agree in writing that inadvertent production of privileged material does not automatically constitute a waiver
What if the other side won’t agree to the clawback agreement?
Go to the Court!
Rajala v. McGuire Woods, 2010 WL 294582 (D. Kan. July 22, 2010): Court issued clawback order with no need to show reasonable efforts
Consider Clawback Agreement during “meet and confer” conference
Embody agreement in Court Order (Rule 502(d))
Predictive coding should be used to cull down data set to a manageable level
This should occur AFTER predictive coding
Attorneys should conduct privilege review
Attorneys need to decide what is privileged: Do not put this on auto-pilot
Why Linear Review is Ineffective
Linear Review compared to other methods
Catches only 20 percent of relevant evidence
Therefore… misses 80 percent
The “Google” phenomenon
Limitations of Keywords
Failure of imagination (Example: Nasdaq versus Stock Market)
How many synonyms for the word “think”?
Precise Terms of Art
Misspellings (Example: Mangment, Mangemnt…)
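The misspelling and "failure of imagination" problems above are easy to reproduce. The sketch below runs a basic exact-match keyword search over four invented sentences; the documents and the helper name `keyword_hits` are hypothetical, chosen only to mirror the slide's examples.

```python
# Invented toy documents mirroring the slide's examples.
docs = [
    "Senior management approved the transfer",
    "Mangment signed off on the deal",
    "Per mangemnt, the files were moved",
    "The stock market reacted to the news",
]

def keyword_hits(keyword, documents):
    """Case-insensitive exact substring match, as in a basic keyword search."""
    return [d for d in documents if keyword.lower() in d.lower()]

# Only the correctly spelled document is found; both misspellings are missed.
management_hits = keyword_hits("management", docs)

# A search for "nasdaq" finds nothing, although one document discusses
# the stock market in different words.
nasdaq_hits = keyword_hits("nasdaq", docs)

print(len(management_hits), len(nasdaq_hits))
```

The searcher has to anticipate every spelling and every synonym in advance, which is the limitation the slide describes.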
Problems With Keywords
Human problem
People express concepts differently
Difficulties in learning to adopt another party’s language style
TREC (Text Retrieval Conference) was a competition and it showed a complete failure in keyword searches
Human keyword-based review is expensive
It is slow & inaccurate
It unnecessarily complicates a simple process
It is widely used because, until now, there were no alternatives
Predictive coding – when “done right” – can save a corporation 80-90% of review costs.
Keyword searches missed 96 percent of relevant documents (recall ratio averaged less than 4 percent)
TREC Legal Track Study 2009
97 percent of relevant documents not found
Only a 3 percent recall ratio (76,373 relevant documents not discovered)
Boolean searches reduced the initial corpus from 685,592 to 2,715 documents
87 percent precision ratio (2,362 documents out of 2,715 are relevant)
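The percentages quoted above follow directly from the counts given. The short check below uses only those numbers (2,715 documents retrieved, 2,362 of them relevant, 76,373 relevant documents not discovered); the variable names are illustrative.

```python
retrieved = 2_715          # documents returned by the Boolean searches
relevant_found = 2_362     # retrieved documents that were relevant
relevant_missed = 76_373   # relevant documents never retrieved

precision_pct = 100 * relevant_found / retrieved
recall_pct = 100 * relevant_found / (relevant_found + relevant_missed)

print(f"Precision: {precision_pct:.0f} percent")
print(f"Recall: {recall_pct:.0f} percent")
```

The same search is therefore highly precise (almost everything it returned was relevant) and badly incomplete (it returned only a small fraction of what was relevant), so a high precision figure alone says little about how much was missed.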
TREC Legal Track Study 2010
Involved a San Francisco Bay Area Rapid Transit Accident
Discovery database contained 40,000 documents and 350,000 pages
Attorneys believed keyword searches uncovered 75 percent of relevant documents
In reality: Only 20 percent of relevant documents uncovered
Blair and Maron Study
Human eyeballs on every document
Judge Peck: The “gold” standard does not have any gold
Human assessors disagree on the relevance of a document to a single topic
The “Gold” Standard
TREC Conclusion: 65% Recall and 65% Precision is best retrieval effectiveness for human reviewers
Human eyeballs on every document is not working
Reviewers disagree as often as 50 percent of the time
Monique Da Silva Moore v. Publicis Groupe & MSL Group (SDNY) (endorsed using predictive coding)
Complicated and confusing protocol – DO NOT USE
Defendants offered plaintiffs everything they wanted – protocol was so confusing they could not see they got everything they asked for – so they went after the Judge.
Global Aerospace, Inc. v. Landow Aviation Limited Partnership (Circuit Court of Loudoun County Virginia) (authorized use of predictive coding over objection)
Nothing in news – as no controversy – everything worked!
Expensive
Kleen case – 1400 attorney hours to determine search terms – and plaintiff was not satisfied – and neither party was aware of overall effectiveness of terms
Not effective
Over or under produces
Known to be very problematic
“Ostrich approach” is no longer advisable – technology has evolved
Judges know it exists, plaintiffs know it exists and ask for it
EORHB, Inc., et al. v. HOA Holdings, LLC (Delaware Chancery Court)
Court ordered the parties sua sponte to use predictive coding and ordered the parties to use the same vendor
Judge may have overstepped bounds
Technology is your friend
Make data driven decisions
We are living in the “MoneyBall” age
If you are unsure, please ask – this is not going away