Date posted: 25-May-2015
Demystifying Advanced Technologies to Find Solutions that Work
Friday, Oct. 11 | 9:45 – 10:45
Presented by
Peter Oesterling
Assistant General Counsel | Nationwide
Alex Ponce de Leon
Discovery Counsel | Intel
J. William Speros
Evidence Consulting Attorney | Speros & Associates
“Technology-Assisted Review,” called by its nickname “Predictive Coding,” describes a process whereby computers are programmed to search a large amount of data to find quickly and efficiently the data that meet a particular requirement. Computer science and the sciences of statistics and psychology inform its use. While it bruises the human ego, scientists…determined that …[i]t is now indubitable that technology-assisted review is an appreciably better and more accurate means of searching a set of data.”
THE GROSSMAN-CORMACK GLOSSARY OF TECHNOLOGY-ASSISTED REVIEW
FEDERAL COURTS LAW REVIEW Volume 7, Issue 1 (2013) Foreword by John M. Facciola, U.S. Magistrate Judge
Process: a series of actions that produce something or that lead to a particular result
“Now, the methodology of the use of technology-assisted review may itself be in dispute, with the parties controverted to each other’s use of a particular method or tool. Those controversies have already lead to judicial decisions that have to grapple with a wholly new way of searching and with scientific principles derived from the science of statistics or other disciplines.”
THE GROSSMAN-CORMACK GLOSSARY OF TECHNOLOGY-ASSISTED REVIEW
FEDERAL COURTS LAW REVIEW Volume 7, Issue 1 (2013) Foreword by John M. Facciola, U.S. Magistrate Judge
Methodology: a set of methods, rules, or ideas that are important in a science or art; a particular procedure or set of procedures
THE GROSSMAN-CORMACK GLOSSARY OF TECHNOLOGY-ASSISTED REVIEW
FEDERAL COURTS LAW REVIEW Volume 7, Issue 1 (2013)
Predictive Coding: An industry-specific term generally used to describe a Technology-Assisted Review process involving the use of a Machine Learning Algorithm to distinguish Relevant from Non-Relevant Documents, based on Subject Matter Expert(s)’ Coding of a Training Set of Documents.
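The glossary definition has three moving parts: a Training Set coded by Subject Matter Expert(s), a Machine Learning Algorithm, and a resulting split of Relevant from Non-Relevant Documents. The following is a minimal sketch of that loop, assuming a multinomial naive Bayes classifier with add-one smoothing as the algorithm; the glossary does not name an algorithm, and commercial predictive coding tools use their own. The training documents and every function name (`tokenize`, `train`, `relevance_score`) are hypothetical.

```python
import math
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Lowercase whitespace tokens; production tools use much richer features.
    return text.lower().split()

def train(training_set: list[tuple[str, bool]]) -> dict:
    """Learn per-class word counts from SME-coded (text, is_relevant) pairs."""
    words = {True: Counter(), False: Counter()}
    docs = Counter()
    for text, is_relevant in training_set:
        words[is_relevant].update(tokenize(text))
        docs[is_relevant] += 1
    if docs[True] == 0 or docs[False] == 0:
        raise ValueError("training set needs both Relevant and Non-Relevant examples")
    vocab = set(words[True]) | set(words[False])
    return {"words": words, "docs": docs, "vocab": vocab}

def relevance_score(model: dict, text: str) -> float:
    """Log-odds that a document is Relevant (> 0) rather than Non-Relevant (< 0)."""
    words, docs, vocab = model["words"], model["docs"], model["vocab"]
    score = math.log(docs[True] / docs[False])  # prior odds from the training set
    totals = {label: sum(words[label].values()) for label in (True, False)}
    for token in tokenize(text):
        if token not in vocab:
            continue  # words never seen in training carry no evidence here
        # Add-one (Laplace) smoothing so a word absent from one class doesn't zero it out.
        p_rel = (words[True][token] + 1) / (totals[True] + len(vocab))
        p_non = (words[False][token] + 1) / (totals[False] + len(vocab))
        score += math.log(p_rel / p_non)
    return score

# Hypothetical training set coded by a subject matter expert.
training_set = [
    ("merger price negotiation memo", True),
    ("board discussed merger price", True),
    ("lunch menu for friday", False),
    ("parking garage closed friday", False),
]
model = train(training_set)
for doc in ["draft merger price terms", "friday lunch plans"]:
    label = "Relevant" if relevance_score(model, doc) > 0 else "Non-Relevant"
    print(doc, "->", label)
```

Real platforms work at far larger scale and typically retrain over several rounds; this toy only shows how SME coding of a Training Set becomes a score that separates Relevant from Non-Relevant documents.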
“A word is not a crystal, transparent and unchanged, it is the skin of a living thought and may vary greatly in color and content according to the circumstances and the time in which it is used.”
Justice Oliver Wendell Holmes Jr., Towne v. Eisner, 245 U.S. 418, 425 (1918)
THE GROSSMAN-CORMACK GLOSSARY OF TECHNOLOGY-ASSISTED REVIEW
FEDERAL COURTS LAW REVIEW Volume 7, Issue 1 (2013) Foreword by John M. Facciola, U.S. Magistrate Judge
“I think you should be more explicit here in step two.”
Published as a guest contribution to Ralph Losey’s E-Discovery Team blog:
http://e-discoveryteam.com/2013/04/28/predictive-codings-erroneous-zones-are-emerging-junk-science/?shareadraft=517d80048f827
“Predictive Coding’s Erroneous Zones Are Emerging Junk Science”
• “PBS’ Frontline’s Forensic Tools: What’s Reliable and What’s Not-So-Scientific dispelled the infallibility, and in some instances, the validity, of analytical techniques long relied upon by our legal profession.”
• “Even if those techniques were not botched or biased, their validity ranges from bought-and-paid-for infomercials to, at best, an approximation.”
• “Back then attorneys and judges (and experts and vendors) did with those junk sciences just what we are doing now with respect to predictive coding: allowing claims, however unjustified and erroneous, to form the basis of our practices, to influence our precedent and to accrue authority.”
“[T]hose of us who trust the scientific and adversarial process recognize that erroneous claims don’t naturally defeat truth. They suppress truth, distract from truth and sometimes persist so long that we forget to inquire into the truth. Oftentimes, weak interests seek to dispel erroneous claims which are promoted by strong commercial interests. With respect to predictive coding my sense is that we are neither deluded nor deceptive — well, not too much anyway — but we just have not yet thought it through.”
Erroneous Practice #1
Using a full-text search to identify prospectively responsive documents and then employing predictive coding to eliminate those that are not responsive.
Erroneous Practice #2
Pulling a random sample of documents to train the initial seed set.
Erroneous Practice #3
Identifying “magic numbers” of minimum:
• “Iterations”
• Responsive documents within a randomly accumulated set
Erroneous Practice #4
Asserting that Predictive Coding software is the “gold standard” for document retrieval in complex matters.
Erroneous Practice #4
Asserting that Predictive Coding software is the “gold standard” for document retrieval in complex matters.
Is Erroneous Because
It asserts that predictive coding is a standard, but predictive coding:
• Shares some commonly understood characteristics but no precise attributes
• Involves some general methodologies but no clear rules
• Is associated with general aspirations but no comprehensively defined operations
Example: All advertisements or orders for “predictive coding”
Gold Standard vs “Standard”
Erroneous Practice #2
Pulling a random sample of documents to train the initial seed set.
Is Erroneous Because
A. Looks for relevance in all the wrong places: Thoughtful researchers don’t try to learn about relevant docs by examining irrelevant ones.
B. It turns a blind eye to what is staring you in the eye: it denies that attorneys know what they are paid to know, namely where to look and what to find.
C. Measures the wrong stuff:
• Constrained and circular “like” definition
• Prevalence vs Relevance vs Probativeness
Example: Global Aerospace v. Landow Aviation (settled without a court ruling on the strategy)
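Point A can be made concrete with binomial arithmetic. When relevant documents make up only a small fraction p of a collection, a random sample of n documents contains n × p relevant ones on average, and often fewer. This is a minimal sketch, assuming the collection is large enough that the binomial distribution approximates sampling without replacement; the 1% prevalence and the 500-document seed set are hypothetical figures, not numbers from Global Aerospace or any other matter.

```python
from math import comb

def expected_relevant(n: int, prevalence: float) -> float:
    """Average number of relevant documents in a random sample of n documents."""
    return n * prevalence

def prob_fewer_than(k: int, n: int, prevalence: float) -> float:
    """Binomial probability that a random sample of n holds fewer than k relevant docs."""
    return sum(
        comb(n, i) * prevalence**i * (1 - prevalence) ** (n - i)
        for i in range(k)
    )

# Hypothetical collection: 1% relevant; seed set: 500 randomly drawn documents.
n, p = 500, 0.01
print(expected_relevant(n, p))              # 5.0
print(round(prob_fewer_than(3, n, p), 3))   # 0.123
```

Under these assumed figures, a reviewer coding 500 random documents would see about five relevant examples on average, and in roughly 12% of draws fewer than three; nearly all of the seed-set coding effort goes into irrelevant documents.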
Erroneous Practice #1
Using a full-text search to identify prospectively responsive documents and then employing predictive coding to eliminate those that are not responsive.
Is Erroneous Because
A. Over-relies and under-delivers: presumed arrogance or clairvoyance
B. It arbitrarily places documents out of sight and, therefore, out of mind: it reduces the likelihood that responsive documents will ever be produced while dumbing down the predictive coding intelligence
Example: In re Biomet M2a Magnum Hip Implant Prods. Liab. Litig. (endorsed by the court)
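Point B can be expressed as arithmetic. When predictive coding only ever sees the documents a full-text search returned, a responsive document the search missed is never scored, so the search's recall caps the workflow's recall. Assuming each stage's recall is measured against the responsive documents in that stage's input, the two recalls multiply. This is a minimal sketch; the function name is hypothetical, and the 60% and 80% recall figures and the 10,000-document count are illustrative assumptions, not values from Biomet.

```python
def pipeline_recall(keyword_recall: float, tar_recall_in_cull: float) -> float:
    """Overall recall of a keyword-cull-then-predictive-coding workflow.

    keyword_recall: share of ALL responsive documents the full-text search returns.
    tar_recall_in_cull: share of the responsive documents INSIDE the culled set
        that predictive coding then identifies.
    """
    for r in (keyword_recall, tar_recall_in_cull):
        if not 0.0 <= r <= 1.0:
            raise ValueError("recall values must lie between 0 and 1")
    # Documents outside the cull are never scored, so they cannot be recovered.
    return keyword_recall * tar_recall_in_cull

# Hypothetical figures for illustration only.
responsive_total = 10_000
keyword_recall, tar_recall = 0.60, 0.80

never_scored = round(responsive_total * (1 - keyword_recall))
overall = pipeline_recall(keyword_recall, tar_recall)
print(never_scored)        # 4000
print(round(overall, 2))   # 0.48
```

Under these assumptions, even a perfect classifier (`tar_recall = 1.0`) could not lift overall recall above 0.60: the 4,000 documents left out of sight stay out of mind.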
Erroneous Practice #3
Identifying “magic numbers” of minimum:
• “Iterations”
• Responsive documents within a randomly accumulated set
Is Erroneous Because
A. You may not be able to get there from here: you don’t know the starting point or the ending point
B. You don’t know what isn’t yet known: you cannot predict alternative paths
C. Consider low frequency, high probativeness
D. Who’s the witness?
Example:
• “This [iteration] process shall be repeated for a total of seven iterations… [Requesting party pays] costs and fees… [for] more [than] 40,000 documents.” (Da Silva Moore)
• Vendors’ affidavits in various matters
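Point C, and the limits of a “magic number” of randomly accumulated documents, can be illustrated with one more calculation. If documents that are rare but highly probative occur at prevalence p, a random sample of n documents misses all of them with probability (1 − p)^n, and the smallest sample giving confidence c of seeing at least one is ⌈ln(1 − c) / ln(1 − p)⌉. A minimal sketch, assuming independent draws from a large collection (an approximation) and a hypothetical prevalence of 1 in 10,000; the function names are likewise hypothetical.

```python
import math

def prob_sample_misses(n: int, prevalence: float) -> float:
    """Chance that a random sample of n contains no document of this rare category."""
    return (1 - prevalence) ** n

def sample_size_to_see_one(prevalence: float, confidence: float) -> int:
    """Smallest n with P(at least one rare document in the sample) >= confidence."""
    if not (0 < prevalence < 1 and 0 < confidence < 1):
        raise ValueError("prevalence and confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - prevalence))

# Hypothetical: low-frequency, high-probativeness documents at 1 in 10,000.
p = 0.0001
print(round(prob_sample_misses(2_000, p), 2))   # 0.82
print(sample_size_to_see_one(p, 0.95))          # 29956
```

Under these assumptions a 2,000-document random sample misses every such document about 82% of the time, and roughly 30,000 randomly drawn documents are needed for a 95% chance of seeing even one. A minimum fixed in advance does not adjust to that prevalence.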
May not be able to get there even with a “magic” number of steps…
Erroneous Practice #1
Using a full-text search to identify prospectively responsive documents and then employing predictive coding to eliminate those that are not responsive.
Erroneous Practice #2
Pulling a random sample of documents to train the initial seed set.
Erroneous Practice #3
Identifying “magic numbers” of minimum:
• “Iterations”
• Responsive documents within a randomly accumulated set
Erroneous Practice #4
Asserting that Predictive Coding software is the “gold standard” for document retrieval in complex matters.
[Figure: Search Mechanisms’ Inferences. Axes: search mechanism; inferences (risk) re recall. Mechanisms shown: databases; files, folders (in place); end-user tags; files, folders (per user); duplicates; “Technology Assisted Review” via machine learning; e-mail threading and “near” duplicates; key words; random sampling; similarity/clusters sorting; similarity; clustering.]