Deep Dive: Advanced Search Technologies

Post on 05-Dec-2014

description

Even with recent advancements in predictive coding, tried and true searching tactics such as keyword searching, concept searching, topic grouping, near de-duplication, and email threading will continue to play an important role in ediscovery filtering, review and production across the Electronic Discovery Reference Model (EDRM).

transcript

Discussion Overview

Case Law and Industry Guidance: The Role of Searching in Ediscovery

Back to the Basics: Keyword Searching Tips

Deep Dive: Advanced Searching Technologies

Judicial Viewpoints on Keyword Searching

Court required parties to “confer on the development of reasonable search terms” instead of compelling production without a list of proposed search terms provided by the requesting party

“Common practice governing the discovery of [ESI] requires the use of search terms . . . If the producing party generates the search terms on its own, the inevitable result will be complaints that the search terms were inadequate” EEOC v. McCormick & Schmick’s Seafood Restaurants, Inc., 2012 WL 380048 (D. Md. Feb. 3, 2012).

Keyword searching plays an important role in winnowing document sets for discovery

Analyzing Search Methods

Objective of search: high recall and precision

» Recall – fraction of all relevant documents in the collection that the search actually finds

» Precision – fraction of the documents identified by the search that actually are relevant

In this example, fruit is relevant; broccoli is not.
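The two measures can be worked through with a short sketch. The function name and the sample items below are illustrative assumptions that follow the fruit/broccoli example, not data from the presentation.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for a set of retrieved items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant items the search actually found
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical collection: four relevant fruits exist, the search returns three items.
relevant = {"apple", "banana", "cherry", "grape"}
retrieved = {"apple", "banana", "broccoli"}

recall, precision = recall_precision(retrieved, relevant)
print(f"recall={recall:.2f} precision={precision:.2f}")
```

Here the search finds two of the four fruits (recall of one half) and one of its three hits is broccoli (precision of two thirds).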

Designing Effective Keyword Searches

1. Understand your search engine

» Learn how each operator works (OR, AND, PROXIMITY, etc.)

» Be aware of operator precedence (Boolean or left-to-right) and use parentheses to clarify

» Work with ediscovery provider to create an alternative strategy for lengthy searches that may “time out”
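To see why operator precedence matters, the sketch below evaluates the query merger OR acquisition AND confidential against one sentence under both conventions. The query, the sample text, and the function names are assumptions for illustration; a real search engine's behavior should be confirmed in its own documentation.

```python
def boolean_precedence(a, b, c):
    # AND binds tighter than OR, so the query reads: a OR (b AND c)
    return a or (b and c)


def left_to_right(a, b, c):
    # Operators applied strictly in order, so the query reads: (a OR b) AND c
    return (a or b) and c


# Hypothetical document text
text = "the merger closed in march"
a = "merger" in text
b = "acquisition" in text
c = "confidential" in text

print("Boolean precedence hit:", boolean_precedence(a, b, c))
print("Left-to-right hit:     ", left_to_right(a, b, c))
```

The same document is a hit under one convention and a miss under the other, which is why writing the parentheses explicitly removes the ambiguity.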

2. Develop a search strategy

» Run broad searches for date-range culling, etc. then use results as scope for sub-level searches

» Save searches and search results for future use and reference

» Find on-point documents and use “similar” documents and concepts to provide additional key terms

» Know your universe (foreign language requires foreign keywords!)

3. Build smart keyword lists

Use a text editor to reduce errors

» Programs that format text can cause difficulty

» Use a program like Notepad and place each term on a separate line

» Spell check

» Be aware of commonly misspelled keywords or privilege terms

Understand the impact of your key terms

» Be flexible: account for word/phrase permutations – use a “Data Dictionary”

» Over-inclusive? Under-inclusive?

» “Noise words” increase likelihood of false hits
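The list-hygiene points above can be sketched as a small clean-up routine: one term per line, word-processor punctuation converted to plain characters, duplicates dropped, and noise words flagged. The noise-word set and function name are hypothetical; an actual search platform defines its own noise-word list.

```python
# Hypothetical noise-word list; a real platform publishes its own.
NOISE_WORDS = {"the", "and", "of", "a", "to"}

# Characters that formatting programs substitute for plain ASCII punctuation
REPLACEMENTS = {
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u2013": "-", "\u2014": "-",   # en and em dashes
}


def clean_keyword_list(raw_text):
    """Return (terms, flagged): de-duplicated terms and those that are noise words."""
    terms, flagged, seen = [], [], set()
    for line in raw_text.splitlines():
        for bad, good in REPLACEMENTS.items():
            line = line.replace(bad, good)
        term = " ".join(line.split())  # trim and collapse whitespace
        if not term:
            continue
        key = term.lower()
        if key in seen:  # case-insensitive duplicate
            continue
        seen.add(key)
        terms.append(term)
        if key in NOISE_WORDS:
            flagged.append(term)
    return terms, flagged


raw = "\u201cprice fixing\u201d\n\n  Rebate \nrebate\nthe\n"
terms, flagged = clean_keyword_list(raw)
print(terms, flagged)
```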

Advanced Searching Technologies

What are some “new and evolving” search methods?

Will cover in this presentation:

1. Concept Searching

2. Topic Grouping

3. Language Identification

4. Email Threading

5. Near De-Duplication

6. Sampling

Will not cover in this presentation (hot, evolving topic!):

** Technology-assisted Review

1. Keyword Searching vs. Concept Searching

Keyword Searching

» Uses search terms to retrieve documents that contain those exact terms

» Standard practice; generally accepted in the courts

Concept Searching

» Allows reviewers to find documents with similar conceptual terms even if they do not contain the exact search terms

» Seldom used for filtering; increasingly used for review

» Emerging as a technology alternative

2. Topic Grouping

Documents automatically grouped by theme without human input

Topic grouping will group similar documents and label them for quick identification

Users do not need to “seed” the processing engine by providing keywords

3. Language Identification

This technology can identify all languages in a document as well as the primary language and pass this information along via a metadata field

A legal team needs to know what languages are in a collection, and the volume of foreign language documents

Reports can help determine whether to use machine translations, foreign language reviewers, or a combination
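As a rough illustration of such a report, the sketch below counts documents per primary language from a metadata field. The field name `primary_language` and the sample records are assumptions; the actual field depends on the processing tool.

```python
from collections import Counter


def language_report(documents):
    """Map each language to (document count, percent of collection)."""
    counts = Counter(doc["primary_language"] for doc in documents)
    total = sum(counts.values())
    return {
        lang: (n, round(100 * n / total, 1))
        for lang, n in counts.most_common()
    }


# Hypothetical metadata records
docs = [
    {"id": 1, "primary_language": "en"},
    {"id": 2, "primary_language": "en"},
    {"id": 3, "primary_language": "en"},
    {"id": 4, "primary_language": "de"},
]
report = language_report(docs)
print(report)
```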

4. Email Threading

Identifies e-mail conversations based on content and groups them for review

Using actual content of the e-mails to identify e-mail threads is the most reliable method, as it will not fail to recognize a thread if the subject line changes or if e-mails are exchanged across different e-mail applications
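A minimal sketch of content-based threading follows, assuming a reply quotes the full text of the message it answers. It ignores subject lines entirely and attaches a message to an earlier thread when the earlier body appears inside the later one. This is a simplified stand-in for illustration, not the method any particular product uses.

```python
def normalize(body):
    """Lowercase, drop '>' quote markers, and collapse whitespace."""
    lines = [line.lstrip("> ").strip() for line in body.splitlines()]
    return " ".join(" ".join(lines).lower().split())


def thread_by_content(emails):
    """emails: list of (id, body) in chronological order.

    Returns a dict mapping each e-mail id to the id of its thread's first message.
    """
    normalized = [(eid, normalize(body)) for eid, body in emails]
    thread_of = {}
    for i, (eid, text) in enumerate(normalized):
        thread_of[eid] = eid  # starts its own thread unless a match is found
        for prev_id, prev_text in normalized[:i]:
            if prev_text and prev_text in text:  # earlier message is quoted
                thread_of[eid] = thread_of[prev_id]
                break
    return thread_of


# Hypothetical messages; subject lines are deliberately not used
emails = [
    ("m1", "Can we meet Tuesday about the contract?"),
    ("m2", "Tuesday works.\n> Can we meet Tuesday about the contract?"),
    ("m3", "Lunch order is in."),
]
threads = thread_by_content(emails)
print(threads)
```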

5. Near De-Duplication

Reviewers can quickly identify and compare documents that are very similar to one another but are not exact duplicates

The technology assesses similarities across the document set and identifies the most representative documents as “the core”

» All related documents are then grouped around the core
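One common way to measure this kind of similarity is word shingling with Jaccard overlap, sketched below. The greedy grouping here simply treats the first unmatched document as the core, which is a simplification of choosing the most representative document; the threshold, shingle size, and sample texts are assumptions.

```python
def shingles(text, k=3):
    """Set of overlapping k-word sequences in the text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Shared shingles divided by all distinct shingles."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def group_near_duplicates(docs, threshold=0.5):
    """docs: dict of id -> text. Returns dict of core id -> list of member ids."""
    sets = {doc_id: shingles(text) for doc_id, text in docs.items()}
    groups = {}
    for doc_id, sh in sets.items():
        for core in groups:
            if jaccard(sh, sets[core]) >= threshold:
                groups[core].append(doc_id)  # similar enough: join this core
                break
        else:
            groups[doc_id] = [doc_id]  # no match: becomes a new core
    return groups


# Hypothetical documents
docs = {
    "a": "the quarterly report is attached for your review today",
    "b": "the quarterly report is attached for your review tomorrow",
    "c": "lunch menu for friday includes soup and salad",
}
groups = group_near_duplicates(docs)
print(groups)
```

Documents a and b differ by one word and share six of eight distinct shingles, so they land in the same group; c shares none and stands alone.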

6. Sampling: Defensibility & Quality Control

Sampling is the practice of reviewing a set percentage of the documents in a data set or a particular folder of data

» Strengthens the defensibility of the process

» Helps validate what you have (and equally important, do not have) in your production set

» May take place iteratively throughout the review process or prior to production

– During ongoing quality control

– At the end to assess completeness of review
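Drawing the sample can be sketched as below. A fixed seed makes the draw reproducible, which helps when the process has to be documented later. The percentage, seed, and function names are illustrative assumptions; the sketch does not address how large a sample must be for a given confidence level.

```python
import math
import random


def draw_sample(doc_ids, percent, seed=0):
    """Randomly select `percent` of the documents, without replacement."""
    doc_ids = list(doc_ids)
    if not doc_ids:
        return []
    size = max(1, math.ceil(len(doc_ids) * percent / 100))
    return random.Random(seed).sample(doc_ids, size)


def estimated_rate(sample, is_flagged):
    """Fraction of sampled documents that a reviewer flagged."""
    if not sample:
        return 0.0
    return sum(1 for doc_id in sample if is_flagged(doc_id)) / len(sample)


# Hypothetical folder of 1,000 documents, sampled at 5%
folder = list(range(1000))
sample = draw_sample(folder, percent=5)
print(len(sample), "documents selected for quality-control review")
```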
