Streamlining Automation in eDiscovery
Wednesday, November 9, 2016, 12:00 pm ET
Sandra Serkes, President & CEO, Valora Technologies, Inc.
Why Automate?
• Cost
• Efficiency (value/cost)
• Time
• Consistency
• Accuracy
• Repeatability/Best practices/Knowledge transfer
• Defensibility
• Return on Investment
• Economies of Scale
Forces pushing us towards automation
• Technological advances
• Focus on value, particularly cost
• Ever larger document/file volumes
• Ever more complex data analysis needs
• Decreasing time frames
• Limited/finite resources
• Competition for services
• Security concerns
• Integration with other business/legal practices
We are entering a time of Routine vs. Specialty eDiscovery Practice
20 GB of email
2 GB of tweets
45 boxes of paper
Foreign language docs
Personal data
• Automating Tasks
• Automating Workflow
Routine v. Specialty eDiscovery
• Paper productions
• Foreign Language
• Audio & Video files
• ESI with no metadata
• Email attachments
• Shared files, loose files
• Databases & Repositories
• Special application files
• Personnel documents
• Contracts & Agreements
• Stored records
• Social media
• Multi-party litigation
• Sensitive material
• And much more…
Moving beyond email messages!
Can we automate specialty eDiscovery?
Specialty eDiscovery Client Use Cases
• AutoIndex 400,000 files per day for 4 months
• AutoRedact SSN & TID from credit applications
• Host online “Bidder’s Library” of 100 years of scanned records
• AutoBusiness Rules for document retention & compliance
• Convert paper medical records to digital format with embedded indexing
• AutoReview 1.5M files for responsiveness, privilege, & hotdocs
• AutoIndex 3M FOIA request documents
• AutoTranslate Japanese, Spanish, French & German docs to English
• Oversee & manage 6-city simultaneous data collection & conversion
• AutoRedact personally identifying information (PII)
Typical Specialty eDiscovery Services
• AutoUnitization: Ability to distinguish the beginning & end of documents, as well as determine which documents incorporate other documents as attachments.
• AutoCoding: Identify and label documents by type (balance sheet, tax form, memo, etc.), relevant people (authors, recipients, cc/bcc), date and subject/title.
• AutoReview: Identify and label documents by groupings (dupes/near dupes, conversation threads, issues/clustering) and disposition (responsive, privileged, “hot,” etc.).
• AutoRedaction: Ability to identify & mark up documents to “black out” select information (such as PII (personally identifiable information), patient data or privileged information).
• AutoTranslation: Automatic translation of non-English documents to English text. Supports dozens of originating languages.
• AutoTranscription for Audio & Video Files: Automatic transcription of audio & video files to corresponding text files. Multiple file type support.
• Hosting, Database Creation & Data Visualization: Hosting of pre/post-processed documents in BlackCat or other (iConect, Relativity, etc.). Intuitive, graphical presentation of data with easy navigation, understanding and manipulation of document subsets. Good for Early Case Assessment.
• AutoBusinessRules: Identify and label documents by workflow treatment, retention plans, compliance audit or other groupings. Useful for DocReview, retention and compliance disposition.
• Electronic File Processing (EFP): File conversion to TIF/PDF format, text and metadata extraction, de-NISTing, cross-custodian de-duplication, filtering/culling, analytics.
• OCR: Optical Character Recognition for converting images to searchable text.
• NearDuplicateDetection: Identify documents that are highly similar, if not identical, across custodians and the entire population. Includes cross-correlation of paper & electronic documents.
• EmailThreading/Dethreading: Join separated email conversation threads into a consistent stream from start to finish. Separate threaded emails into component threads.
• Scanning: Image conversion for paper documents into electronic image format (TIF, PDF, JPEG, etc.).
• Professional Services: Options for Project Management, Technical data/file manipulation, Subject Matter Expertise, Resources & Workflow Design & Management.
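To make the NearDuplicateDetection idea in the list above concrete, here is a minimal sketch using word shingles and Jaccard similarity. This is a common general-purpose technique and not necessarily the method Valora's software uses. The function names, the shingle size of 3 and the 0.8 threshold are assumptions chosen for the example.

```python
from __future__ import annotations

import re
from itertools import combinations


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping word n-grams (shingles) in the text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < size:
        # Very short texts become a single shingle (or nothing if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: shared shingles divided by all distinct shingles."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(docs: dict[str, str], threshold: float = 0.8):
    """Yield (id1, id2, score) for every document pair at or above the threshold."""
    sets = {doc_id: shingles(text) for doc_id, text in docs.items()}
    for (id1, s1), (id2, s2) in combinations(sets.items(), 2):
        score = jaccard(s1, s2)
        if score >= threshold:
            yield id1, id2, score
```

Comparing every pair is quadratic in the number of documents, so a sketch like this only suits small sets. Larger populations would typically need an approximate scheme such as MinHash with locality-sensitive hashing.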
Valora Suggested Workflow Process
Document Intake & Submission
• FTP, SFTP
• Drag & drop
• Email bounce
• Media (drive/DVD)
• Send boxes
Submission to Valora
• Files/Docs arrive
• Log COC, inventory
• Tracking closed
• Email acknowledgement
PH Receipt & Pre-Process
• Load data to systems
• Autotriage (reject/alert)
• AutoTranslate to English
• AutoTranscribe
• AutoLDD
• ND Store check
• Tracking ID assignment
Manual 1: LDD & Triage
• Error Handling
• LDD QC
• Special Instructions
• Q&A
PowerHouse Automation
• AutoCoding
• Rules
• NearDupe
• AutoReview
• QC Assignment
Manual 2: QC
• Coding QC
• GQC & Audit
• Work Assignment
• Template/Rules ID
• PH Tools
• Ready to Ship
PH Post-process & Ship
• Number Assignment
• Data Integrity Checks
• ND Store
• Export & Ship
Export & Delivery
• Prep load file(s)
• Load to BlackCat
• Prep shipment package
• Ship & track
• Confirm receipt
Parties: Client, Valora
How does Automation Work?
• Processing (aka Intake) is the process of “ingesting” data into an analytics engine (native → text)
  – Creating OCR for scanned images
  – Extracting text for native files & email
  – Speech to text for audio/video files
  – Translating content to English
  – Re-ordering or re-aligning pages
  – Applying redactions
• Tagging (aka Coding, Indexing, Sequencing) is the process of extracting key information and attributes about each document (text → fielded data)
  – Document Type, Important Dates
  – Key Names & Phrases
  – Topics, Keywords & Themes
  – File, Content and DocType attributes
  – Relation to other documents (duplicate, related, attached, contradictory, etc.)
• Disposition (rules) is the process of creating a destination or status for each document (fielded data → disposition)
  – Retention status & duration
  – Folder (taxonomy) location
  – Labelling & keywords display
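The three stages above (native to text, text to fielded data, fielded data to disposition) can be sketched as a small pipeline. This is a toy illustration only: the `Document` class, the stage functions, the date pattern and the seven-year retention rule are all hypothetical and do not represent PowerHouse's actual interfaces.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    native: bytes                                     # raw file content as received
    text: str = ""                                    # filled in by processing
    fields: dict = field(default_factory=dict)        # filled in by tagging
    disposition: dict = field(default_factory=dict)   # filled in by rules


def process(doc: Document) -> Document:
    """Processing: native -> text. A real system would OCR, extract or
    transcribe; this stand-in simply decodes the bytes."""
    doc.text = doc.native.decode("utf-8", errors="replace")
    return doc


def tag(doc: Document) -> Document:
    """Tagging: text -> fielded data, using deliberately simple extractors."""
    doc.fields["dates"] = re.findall(r"\b\d{1,2}/\d{1,2}/\d{4}\b", doc.text)
    is_contract = "agreement" in doc.text.lower()
    doc.fields["doc_type"] = "Contract" if is_contract else "Unknown"
    return doc


def dispose(doc: Document) -> Document:
    """Disposition: fielded data -> destination and status (hypothetical rule)."""
    if doc.fields["doc_type"] == "Contract":
        doc.disposition = {"folder": "Contracts", "retain_years": 7}
    else:
        doc.disposition = {"folder": "Unsorted", "retain_years": None}
    return doc


def run_pipeline(doc: Document) -> Document:
    """Run the three stages in order on a single document."""
    return dispose(tag(process(doc)))
```

Each stage only reads what the previous stage produced, so a stage can be swapped out (for example, replacing the byte decoding with OCR) without touching the others.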
Intake – PowerHouse – Output
PH Web Portal
Folder Taxonomy
Hosted Repository
Shared Server Poll
OCR/Text Extraction
Translation/Transcription
Unitization
Coding/Tagging
Rules/Disposition
Redaction
Exceptions
PowerHouse Portal
Users drag & drop files into the portal for immediate, automatic loading into PowerHouse.
PowerHouse responds with an automatic acknowledgement email.
Automating eDiscovery & Beyond
INDEXING/TAGGING
ANALYSIS/RULES
PRESENTATION
Date, Author, Patent # …
Year Total, Hot Doc, Priv…
BlackCat, Relativity, .CSV …
AutoIndexing
AutoBusinessRules
Analytics
Database Prep
How AutoCoding Works
Docs enter the system as extracted or OCR’ed text
Data is extracted from each document into a database table
DocType = Patent Application
Date = 10/18/2007
Date Format = US
Author = Patent Authors, Author City, Author Country
Assignee = RIM
Tone = Neutral to slightly positive
Embedded Graphic with Title
Other Capturable Data Elements:
• Patent Number
• Filing Date
• Key Phrases & Terms
• Managing PTO
• Implied/Attached Docs
• Bar Code Present
• And many more . . .
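As a rough illustration of turning document text into a database-style row, here is a minimal regex-based sketch. It is not Valora's AutoCoding implementation: the patterns, the field labels they look for ("Patent No.", "Assignee:") and the `autocode` function are assumptions made for the example, and real documents would need far more robust extraction.

```python
import re
from datetime import datetime

# Hypothetical patterns; real documents would need many more variants.
PATTERNS = {
    "PatentNumber": re.compile(r"Patent No\.\s*([\d,]+)"),
    "Date": re.compile(r"\b(\d{1,2}/\d{1,2}/\d{4})\b"),
    "Assignee": re.compile(r"Assignee:\s*(.+)"),
}


def autocode(text: str) -> dict:
    """Return one row of fields extracted from the document text."""
    is_patent_app = "patent application" in text.lower()
    row = {"DocType": "Patent Application" if is_patent_app else "Unknown"}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        row[name] = match.group(1).strip() if match else None
    if row["Date"]:
        # Assumes US month/day/year order, matching "Date Format = US".
        parsed = datetime.strptime(row["Date"], "%m/%d/%Y")
        row["DateISO"] = parsed.date().isoformat()
    return row
```

Fields that cannot be found are stored as `None` rather than guessed, which keeps gaps visible for a later QC pass.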
INDEXING/TAGGING for eDiscovery
AutoIndexing
AutoUnitization
AutoBusinessRules
Analytics
Database Prep
How AutoUnitization Works
Docs enter the system with physical (or no) boundaries
Documents are separated down to the unit document level
AutoReview Defined
• AutoReview is the iterative application of software and technique to capture information about documents
  – “Protective” Fields: Privilege, IP/Trade Secret, Confidential, Non-responsive/Irrelevant, Work Product/Attorney Notes, Suppressed
  – “Producible” Fields: Responsive, Issue/Category/Filter/Cluster, Duplicate/Near Duplicate
  – Categorizing/Grouping Fields: Duplicate/Near Duplicate, Conversation Thread, Issue/Category, Hot
  – Privacy & Protection (Redaction): Privileged portions, customer/patient data, financial info, personally identifiable information (PII)
• Emerging flavors of AutoReview, Technology-Assisted Review (TAR)
  – Valora is one of a handful of true Service Providers
  – Uses software & OCR/extracted text, metadata and Statistical Pattern-Matching
• Generally accepted that AutoReview is faster and lower cost than manual review, with higher quality
ANALYSIS/RULES for eDiscovery
Litigation Document Review Manual
Determining Responsiveness
The document should be marked responsive if any of the following conditions are present:
• Mentions or discusses the specific protocol for handling simultaneous voice and data actions
• Is a design document or graphic that shows the specific protocol for handling simultaneous voice and data actions
• Discusses or is related to patent ‘009
• Mentions Apple Inc. or Apple Computers, Inc. or is a communication from/to anyone at Apple Computer, Inc., or apple.com.
• And so on…
Rule: Responsive for Protocol Discussion
When: [FullText] contains any of <Voice protocol key phrases 12> and [FullText] contains any of <Data protocol key phrases 25> and [DocType] is not any of [Brochure, Press Release, Website], ...
Indexing/Tagging
Rule: Responsive for Patent ‘009
When: Any document in the Attachment Family matches: [FullText] contains any of <Patent '009 key phrase list 4>, or Parent of Attachment Family matches: Any of [Author, Recipient, CCs] contains any of <Patent '009 experts contact list 23>, …
Rule: Responsive for Apple
When: [FullText] contains (fuzzy match) any of <Apple key phrase list 7>, or Any of [Authors, Recipients, CCs] contains any of <Apple contact list 15>, or [Author] matches "*@apple.com" …
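Rules like these translate naturally into boolean predicates over a document's fields. The sketch below is one possible rendering in Python, not the actual rule engine. The phrase and contact lists are hypothetical stand-ins for the lists the slides refer to, plain case-insensitive substring matching stands in for the "fuzzy match", and any conditions elided on the slides ("...") are deliberately not filled in.

```python
from __future__ import annotations

from fnmatch import fnmatchcase

# Hypothetical stand-ins for the key phrase and contact lists named in the rules.
VOICE_PHRASES = ["voice call"]
DATA_PHRASES = ["data session"]
APPLE_PHRASES = ["apple inc", "apple computer"]
APPLE_CONTACTS = {"jane.doe@example.com"}
EXCLUDED_DOCTYPES = {"Brochure", "Press Release", "Website"}


def contains_any(text: str, phrases: list[str]) -> bool:
    """Case-insensitive substring test (a simple stand-in for fuzzy matching)."""
    lowered = text.lower()
    return any(phrase.lower() in lowered for phrase in phrases)


def responsive_for_protocol(doc: dict) -> bool:
    """Voice AND data phrases present, and DocType not in the excluded set."""
    text = doc.get("FullText", "")
    return (
        contains_any(text, VOICE_PHRASES)
        and contains_any(text, DATA_PHRASES)
        and doc.get("DocType") not in EXCLUDED_DOCTYPES
    )


def responsive_for_apple(doc: dict) -> bool:
    """Key phrase in text, OR a listed contact involved, OR author at apple.com."""
    author = doc.get("Author", "").lower()
    people = [author]
    people += [p.lower() for p in doc.get("Recipients", [])]
    people += [p.lower() for p in doc.get("CCs", [])]
    return (
        contains_any(doc.get("FullText", ""), APPLE_PHRASES)
        or any(person in APPLE_CONTACTS for person in people)
        or fnmatchcase(author, "*@apple.com")
    )
```

Keeping each rule as its own small function mirrors the one-rule-per-condition layout of the review manual, so a reviewer can trace a "responsive" tag back to the rule that fired.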
AutoTranslation Defined
• Universal Translation to/from 65 languages
  – Software performs the translation per Google’s licensed translation engine
  – Ex: Non-English converted to English
• Multiple choice presentation
  – Original language, translated language(s), or both
  – Presentation can include Redactions
• Available for all kinds of further processing
  – Convert to English, then:
  – Apply AutoCoding, AutoBusinessRules or AutoRedaction
  – Perform NearDuplicates, Filtering and Culling, or Content Clustering
• Save on expensive manual translation hours!
PowerHouse
Intake
AutoTranslation
AutoIndexing
AutoBusinessRules
Analytics
How AutoTranslation Works
Docs enter the system in their native language
Docs convert to searchable English (or other target)
What it looks like
Original text → AutoTranslated text
DINHEIRO DIGITAL
05-01-2015 as 07:53
China deixa de controlar precos dotabacoA China aboliu o controlo de precos da folha de tabaco, o ultimo produto agricola a ter limites, anunciou este fim de semana a Comissao Nacional de Desenvolvimento e Reforma, o principal organismo de planeamento economico da nacao asiatica.O prego da folha do tabaco e, no entanto, apenas urn pequeno fator no custo total dos cigarros - urnmonopolio estatal na China -, o que torna improvavel que haja efeitos significativos para os fumadores.O Governo chines tern tentado reduzir o consumo de tabaco mas as medidas tern tido urn impacto limitado.O tabaco esta antra os 24 produtos e servigos cujo controlo de custo foi removido, incluindo tambem transporte ferroviario de carga a granel, do envio de encomendas por correio, transporte de passageiros e fabrico de explosivos para use civil.A empresa estatal China Tobacco tern o monopolio da produgao de cigarros mas o prego do tabaco sera determinado de acordo corn o <<oferta e procura industrial e corn os custos e lucros da empresa>>, disse a Comissao, ern comunicado.De acordo corn declaragoes, hoje publicadas, do dirigente da Comissao Wang Shengmin ao jornal China Daily, a China produz cerca de 2,5 milhoes de toneladas de tabaco por ano.
#==============================================================#
#== Valora Technologies, Inc. AutoTranslation ==#
#== The following text has been auto-translated to English ==#
#== From Portuguese ==#
#==============================================================#
DIGITAL MONEY
05/01/2015 at 07:53
China no longer controls prices oftobaccoChina abolished the control of tobacco leaf prices, the lastagricultural product to have limits, announced this weekend the CommissionNational Development and Reform Commission, the main bodyeconomic planning of the Asian nation.The tobacco leaf of the nail and, however, only a small factor in the total cost of cigarettes - aState monopoly in China - which makes it unlikely that there are significant effects on theSmoking.The Chinese government tried to reduce the consumption of tobacco but the measures taken are an impactlimited.Tobacco this and 24 products and services whose cost control was removed, includingalso transport by rail bulk cargo, shipping, mail order, transportpassengers and manufacture of explosives for civil use.The state-owned China Tobacco tern the monopoly of production cigarettes but tobacco nailwill be determined according the corn << supply and industrial demand and corn costs and profits>> company, said the Commission, in a statement.Of corn declarations agreement, published today, the head of the Commission Wang Shengmin the newspaperChina Daily, China produces about 2.5 million tons of tobacco per year.
Why bother with AutoTranslation?
• Far more cost-effective than manual translation
• Often the “gist” of a document is good enough to make decisions
• AT text is strong enough for automated processing– AutoCoding– Rules & Analytics– Classification & Workflow
• Speed! 10,000 pages/hr
• Easily tag documents that must be manually translated, leave the rest as AT.
• Analogous to Early Case Assessment, similar to OCR & other tagging technologies
• Note: there will be errors and omissions.
AutoTranscription Defined
• Automated transcription of captured speech (audio)
  – Software performs the transcription per IBM’s licensed Watson speech-to-text engine
  – Technology commonly known as speech-to-text, similar to OCR (image-to-text)
• Multiple choice presentation
  – Simple text
  – Standard legal deposition transcript format
  – Time stamps option
  – Presentation can include Redactions
  – Video stills option
• Available for all kinds of further processing
  – Convert to text, then:
  – Apply AutoCoding, AutoBusinessRules or AutoRedaction
  – Perform NearDuplicates, Filtering and Culling, or Content Clustering
• Save on expensive manual transcription hours!
PowerHouse
Intake
AutoTranscription
AutoIndexing
AutoBusinessRules
Analytics
How AutoTranscription Works
Audio (& video) files enter the system in their native format
Docs convert to searchable text format, with time stamps, stills, formatting and redactions as needed.
What it looks like
Original video → AutoTranscribed text
Cigarette_Smuggling.mp4
ABC agents are joining the fight to try and crack down on cigarette smuggling. We're not talking about one or two packs but hundreds of cartons out of Virginia. We're told it's a crime, it's big business now and criminals are cashing in. Their new abode takes us into the world of cigarette smuggling. Cigarettes aren't legal but what kind like this may cost you anywhere from thirty to forty five dollars in Virginia, in New York City it brings that's nearly one hundred and fifty. Criminals are making a lot of money by buying cigarettes here and then selling them illegally up north. It's become such big business. It's become a money-making game for them. I figured I'd give it a whirl. Cigarette smuggling according to the Virginia state crime Commission has become more profitable than cocaine heroin marijuana and guns.
Why bother with AutoTranscription?
• Far more cost-effective than manual transcription
• Often the “gist” of a document is good enough to make decisions
• AT text is strong enough for automated processing– AutoCoding– Rules & Analytics– Classification & Workflow
• Speed! 10,000 pages/hr
• Easily tag documents that must be manually transcribed, leave the rest as AT.
• Analogous to Early Case Assessment, similar to OCR & other tagging technologies
• Note: there will be errors and omissions.
AutoRedaction Defined
• Automated redaction of “offending” text or phrases
  – Software performs the redaction based on Rules
• Multiple choice presentation
  – Image, text or both
  – Solid Black, Black with white writing, Translucent Yellow, Translucent Gray
• Available for all kinds of information
  – List provided or “derived” from tags
  – Ex: SSN, DOB, Name, Age, Address, Account Number, Product Name/ID…
• Unlimited redactions in a single document
What kind of redaction makes sense?
Serkes Sandra 123-45-6789 226-588-98
• Should redactions be visible: always, sometimes or never?
• Does someone need to approve or override system redactions?
What kinds of information can be AutoRedacted?
• PII – names, addresses, DOB, SSN
• Financial – account number, credit card info, mortgage files
• Non-class action personnel & info
• Product names, brand names, makes/models
• Organizational names & information
• Locations, addresses, lat/lon, IP addresses
• Concepts & issues
Best bets are formulaic data or lists of info
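Formulaic data and lists are good candidates because both map directly onto pattern matching. The sketch below combines a regular expression for US Social Security numbers with a supplied term list. It is a simplified illustration of text redaction, not Valora's AutoRedaction. The `autoredact` function, the SSN pattern and the "[REDACTED]" marker string are assumptions for the example.

```python
from __future__ import annotations

import re

# Formulaic data: US SSN written as 3-2-4 digits separated by hyphens.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def autoredact(
    text: str,
    terms: list[str] | None = None,
    mark: str = "[REDACTED]",
) -> tuple[str, int]:
    """Replace SSNs and any listed terms with the mark.

    Returns the redacted text and the number of redactions made.
    """
    redacted, count = SSN_PATTERN.subn(mark, text)
    for term in terms or []:
        # re.escape treats the term literally; matching ignores case.
        redacted, n = re.subn(re.escape(term), mark, redacted, flags=re.IGNORECASE)
        count += n
    return redacted, count
```

Returning the redaction count alongside the text gives a reviewer something to check when deciding whether to approve or override the system's redactions.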
Now that we’ve covered Automating the Tasks of eDiscovery, let’s Automate the Process of Specialty eDiscovery
Why have a strong relationship with a Specialty Provider?
• When the crisis comes, you want us to be
  – Pre-vetted (Preferred Services Provider)
  – Familiar with your workflow, processes, load files, terminology, etc.
  – Ready to go quickly
• Best to have all “regular” workflow locked down and provided from same place. Same is true with specialty work.
  – 1 go-to source
  – All specialty services, no matter how oddball
  – Able to adapt to unique (specialty) circumstances
• Ability to control and predict costs
  – Options for preferred pricing, onsite/offsite/SaaS
• Ability to completely customize the workflow to your clients’ needs
Specialty eDiscovery Pricing Models
• One-off Projects
  – Standard transactional services price list
  – Bulk Discounts available for high volume & resale
• Regular, Monthly Usage
  – Custom, discounted pricing at multiple tier options
• Product Licensing
  – PowerHouse
  – BlackCat
  – Professional Services
One-off or Subscription?
• More than 1-2 small specialty matters per month?
• More than 1 “whopper” specialty case per year?
• Repeat specialty cases (or tasks)?
• Distributed offices and clients?
• Lean litigation support team?
• Prior subscription or on-premise product purchases?
If you answered “yes” to any of the above, it’s time to think about streamlining specialty matters with subscription-based pricing models.
Subscription or On Premise (or Cloud) Product?
• More than 3-4 small specialty matters per month?
• More than 3 “whopper” specialty cases per year?
• Non-litigation uses? (Records, Info Gov, Knowledge Mgmt)
• Integration with other systems? (iManage, Aderant, Conflicts DB, case management)
• Strong IT support?
• Lit Support/eDiscovery as a profit center?
If you answered “yes” to any of the above, it’s time to think about streamlining specialty matters with on-premise models.
Conclusions
• Important to make distinctions between Routine and Specialty eDiscovery
• Many, many capabilities can now be mostly or fully automated
• eDiscovery is converging with other document/file-centric disciplines
• Important to evaluate your technical tool needs in advance of your budget cycle
  – Goes against case-by-case, one-off utilization
• Increasing role of consultants and non-attorney/counsel players
Valora Technologies
• Bedford, MA software firm specializing in machine-assisted document processing capabilities (aka analytics)
  – World experts in the automated analysis, indexing, mining and presentation of documents, data & content
  – 20 staff, 200+ clients, 1,500,000+ pages every week
• Customers: corporate legal departments, government agencies, and their professional advisory colleagues (law firms & consultancies)
• Target market: those who wish to harness and profit from the 2.5 quintillion bytes of document & content data being created each day, aka “Big Data”
• Objective: to overtake traditional information repository creation (manual data entry), management, analysis (search, review) and workflow (retention, production, routing) with high quality, low cost, scalable technology & best practices in analytics.
  – Provide cost competitive document analytics solutions in the United States
  – Provide efficient, world-class, targeted solutions to data, document & content utilization problems
“The power of Big Data is the story about the ability to compete and win with few resources and limited dollars.” - Forbes, March 2012
(this is Valora’s story, too)
Typical Problems Valora Solves
Legal/Litigation/eDiscovery Problems
• Too many documents to review, cull & produce by hand
• Cost-effective alternative solutions to contract attorney & offshore labor “armies”
• Missing, poor, or ineffective metadata
• Re-unitization, organization, indexing & redacting of documents
• Bridging multi-language document populations to English
Records Management Problems
• Help automate defensible deletion efforts for IG
• Organize & control loose documents on shared drives, desktops, networks & devices
• Eliminate expensive and information-poor storage options
• Serve as automated intake for multiple content generation sources
Business Intelligence Problems
• Organize & control decades of contracts & agreements
• Provide brand integrity/protection data mining of public/private documents
• Forecast & trending of topics, people & locations over time
• Loose, shared files analysis & control
Health Care Problems
• Heavy expense & time converting hardcopy medical records to EMRs/EHRs
• Cannot keep up with fax server data collection
• Cost-effective alternative solutions to “armies” of temp data entry coders
Who We Serve
• Corporate Legal Departments with complex document/data/content management needs
  – Litigation
  – Risk Exposure
  – Compliance
  – Records
  – Information Governance
• Government Agencies with limited resources for document/data/content monitoring, analysis, management
  – Litigation
  – Investigations
  – Compliance
  – Records
• Their Advisory Counsel: The law firms, consultancies and service providers who support these entities
Thank You!
For More Information:
Valora Technologies, Inc.
101 Great Road, Suite 220
Bedford, MA 01730
781.229.2265