Date post: | 19-Jul-2015 |
Category: |
Law |
Upload: | robbie-hilson |
PRESENTERS
Stephanie L. Giammarco sits on BDO’s Board of Directors and leads
BDO’s Forensic Technology Services practice with more than 20
years of experience and a background in accounting, information
technology and criminology. Having worked on some of the largest
financial frauds to date, she has led teams creating databases of
millions of records, performed advanced data analytics and provided
testimony pertaining to damages and electronically stored
information.
Stephanie provides litigation and consulting services to organizations
and their counsel, including data analytics, computer forensics and
e-discovery services related to domestic and international matters
involving product liability, financial statement fraud, class action
lawsuits, internal investigations, securities fraud, employee and
vendor schemes, and breach of contract. She is skilled in the
collection, preservation and analysis of electronic evidence, as well
as the implementation of various e-discovery tools.
She has been deposed as a Rule 30(b)(6) e-discovery witness and
testified before the Judicial Arbitration Services on the calculation
of damages in contract disputes. Stephanie has published and
presented on a range of computer forensics and e-discovery topics,
including before the Securities and Exchange Commission, Security
Industry Authority and National Futures Association.
Chris J. Lopata is of counsel at Jones Day in New York. His practice
focuses on complex and general civil litigation, including product
liability, toxic torts, credit reporting, and a wide range of business
litigation.
Chris is a member of the firm's e-Discovery Committee and serves as the
New York office coordinator for e-discovery issues. Chris has led
discovery teams in numerous joint defense groups. He has extensive
experience coordinating affirmative and defensive e-discovery efforts on
behalf of clients.
Chris' practice extends beyond pretrial e-discovery. He has served as lead
trial counsel in a variety of commercial disputes in New York State
courts. He also has counseled clients who have sought and obtained
favorable settlements in non-trial bound business disputes.
The views set forth herein are the personal views of the author and do
not necessarily reflect those of the law firm with which he is associated.
Stephanie L. Giammarco, CPA/CITP, CFE, CEDS
Partner, BDO Consulting
Direct: 212-885-7439
Christopher J. Lopata
Of Counsel, Jones Day
Direct: 212-326-3602
OUR AGENDA
1. A quick poll of the audience…
2. Structured v. unstructured data
3. Some necessary definitions
4. Examples of database-driven applications
5. The database schema and data dictionary
6. Theories of database discovery
7. Database discovery: methods for “pulling” data for review and
production
8. Practice pointers
A QUICK POLL…
Who knows what a database is?
A fancy Excel spreadsheet. A collection of rows and columns, each populated with a value.
Who has used a database as part of their personal or work activities?
All of you have… Google and Lexis for research. Your time-keeping system, Concordance, Summation,
and Relativity are all databases. Your company’s email system is effectively a database.
Who has had to conduct discovery from a database (or database-driven application)?
Sales and Marketing (CRM), Human Resources (HRIS), and GL/Inventory (ERP). SAP and Hyperion
are perfect examples.
Bonus Question: Who can tell me what a relational database is?
A bunch of Excel spreadsheets (tables) linked together by a common key…
DEFINITIONS| UNSTRUCTURED V. STRUCTURED DATA
Unstructured Data
Wikipedia definition: Unstructured Data (or unstructured information) refers to
information that does not have a pre-defined data model. Unstructured information is
typically text-heavy, but may contain data such as dates, numbers, and facts as well.
Translation: MS office files, loose files, most of the information that you can see via
Windows Explorer.
Structured Data
Definition: Structured Data is information that resides in fixed fields within a record
or file, or is information that is organized into rows and columns, with pre-set
characteristics.
Translation: Multiple tables, containing rows and columns, which relate to each other
via a common key.
DEFINITIONS| THE TABLE (THE CORE OF THE DATABASE)
Records, not files…
Rows v. columns
Tables maintain the relationship
between columns
A field is another way of saying column
Data values, in the context of rows,
columns and tables, are the substance
Real-time, constantly changing
information
Data dictionary
Schema
DEFINITIONS| THE RELATIONAL DATABASE
Some databases have only one table
(flat file systems) and are no different
from a Microsoft Excel spreadsheet. These
are very rare.
Relational databases, which are much
more common, have multiple tables,
each with a key that “links” them
together.
How can relational databases be more
challenging to handle than flat file
systems in the context of discovery?
Why do we use databases?
DATABASES| DATABASE-DRIVEN APPLICATIONS
A database, when combined with a user interface, is often called a database-driven
application.
Enterprise Resource Planning (ERP)
Data Warehouses & Business Intelligence Systems
Human Resource Information System (HRIS)
Customer Relationship Management (CRM)
Adverse Effects Systems
SharePoint
Email Archiving Systems
kCura Relativity
DATABASES ARE ALL AROUND US AND WE WORK WITH THEM EVERY DAY.
DATABASES| THE SCHEMA
The database schema is the key to understanding:
What tables of data exist within the relational database.
The name assigned to each column within each table.
How the columns are grouped together in each table.
How the tables relate to each other.
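When no schema document is available, much of this information can often be read from the database itself. The following is a minimal sketch using Python's built-in sqlite3 module. The customers and orders tables are hypothetical and exist only for illustration; the metadata queries shown are specific to SQLite, and other database products expose the same kind of information through their own catalog views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical two-table schema used only for illustration.
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        amount      REAL
    );
""")

# 1. What tables exist?
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

schema = {}
for table in tables:
    # 2. Column names and types. PRAGMA table_info rows are
    #    (cid, name, type, notnull, dflt_value, pk).
    columns = [(c[1], c[2]) for c in conn.execute(f"PRAGMA table_info({table})")]
    # 3. How the table relates to others. PRAGMA foreign_key_list rows are
    #    (id, seq, referenced_table, from_column, to_column, ...).
    links = [(fk[3], fk[2], fk[4])
             for fk in conn.execute(f"PRAGMA foreign_key_list({table})")]
    schema[table] = {"columns": columns, "links": links}

for table, info in schema.items():
    print(table, info)
```

The output only reflects relationships that were formally declared. Many production systems link tables by convention without declaring the key, which is one reason a knowledgeable user of the system remains valuable.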
DATABASES| THE DATA DICTIONARY
Within the STUDENTS table, there are two columns of information.
– The STUDENT column contains the name of the student enrolled in the university
– The ID column is the unique identification number assigned to each student
Within the ACTIVITIES table, there are five columns of
information.
– The ID column is the unique identification number assigned to each
student
– The ACTIVITY1 column contains the name of the activity they are
registered for
– The COST1 column contains the fee paid to the school for the activity
– The ACTIVITY2 column represents the secondary (if any) activity that
the student is registered for
– The COST2 column contains the fee paid to the school for the
secondary activity
The ID field is the common key that links the STUDENTS and ACTIVITIES tables.
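To make the role of the ID field concrete, here is a minimal sketch in Python using the built-in sqlite3 module. The table and column names follow the slide; the student names, activities, and fees are invented sample values, not data from any real system.

```python
import sqlite3

# In-memory database; every row below is made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE STUDENTS (ID INTEGER PRIMARY KEY, STUDENT TEXT);
    CREATE TABLE ACTIVITIES (
        ID INTEGER REFERENCES STUDENTS(ID),
        ACTIVITY1 TEXT, COST1 REAL,
        ACTIVITY2 TEXT, COST2 REAL
    );
    INSERT INTO STUDENTS VALUES (1, 'Ann Smith'), (2, 'Bob Jones');
    INSERT INTO ACTIVITIES VALUES
        (1, 'Tennis', 36.0, 'Chess', 10.0),
        (2, 'Golf',   47.0, NULL,    NULL);
""")

# The ID column is the only thing that ties a student's name
# to that student's activity record.
rows = conn.execute("""
    SELECT s.STUDENT, a.ACTIVITY1, a.COST1, a.ACTIVITY2, a.COST2
    FROM STUDENTS AS s
    JOIN ACTIVITIES AS a ON a.ID = s.ID
    ORDER BY s.ID
""").fetchall()

for row in rows:
    print(row)
```

Neither table answers a discovery question alone: ACTIVITIES holds no names and STUDENTS holds no activities. Producing one table without the other, or without the key, leaves the recipient unable to reconstruct the record.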
DATABASE DISCOVERY
Theory #2 – Data is all that matters...
Databases are huge, historical repositories of “activity”
– Information inserted into a CRM system by a salesperson, recording customer wins and losses,
potential new business opportunities, or even other uses for a medication he or she is selling
(Pharmaceutical Sales).
– The price point for a specific medication inserted into a POS system, and the entity that is
paying for it (Medicare Fraud).
– A history of consistent payments to a “false” or “suspicious” entity in the general ledger
(within the ERP system) (FCPA).
The best way to identify trends is to pull large amounts of data into a usable format -
sort, filter, and investigate.
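As a rough illustration of "sort, filter, and investigate," the sketch below works on a hypothetical general-ledger extract. The payee names, dates, and amounts are invented, and the rule used to flag a payee (the identical amount paid at least three times) is only an illustrative heuristic for deciding where to look first, not an established test for any violation.

```python
from collections import defaultdict

# Hypothetical general-ledger extract: (payee, payment_date, amount).
payments = [
    ("Acme Supplies",         "2014-01-15", 1200.00),
    ("Global Consulting Ltd", "2014-01-31", 9500.00),
    ("Acme Supplies",         "2014-02-15", 1350.00),
    ("Global Consulting Ltd", "2014-02-28", 9500.00),
    ("Global Consulting Ltd", "2014-03-31", 9500.00),
]

# Group the payment amounts by payee.
by_payee = defaultdict(list)
for payee, _date, amount in payments:
    by_payee[payee].append(amount)

# Filter: payees that received the identical amount at least three times.
repeated = {payee: amounts for payee, amounts in by_payee.items()
            if len(amounts) >= 3 and len(set(amounts)) == 1}

# Sort: rank payees by total paid, largest first.
ranked = sorted(((sum(amounts), payee) for payee, amounts in by_payee.items()),
                reverse=True)

print(repeated)
print(ranked)
```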
DATABASE DISCOVERY| THE “BRUTE FORCE” METHOD
Just get the data out. Common in DOJ and FTC requests for data. Also used to provide
raw data to experts for analysis.
Sample DOJ Database Request
1. Identify each electronic or other database or data set used or maintained by the company at any
time after January 1, 2009, without regard to custodian, that contains information concerning the
company’s (a) products and product codes; (b) facilities; (c) production; (d) shipments; (e) sales;
(f) prices; (g) margins; (h) costs, including but not limited to production costs, distribution costs,
research and development costs, storage costs, standard costs, expected costs, and opportunity
costs; (i) patents or other intellectual property; (j) research or development projects; or (k)
customers, to the extent such customer information is not provided in response to specifications 9
and 10. For each such database, identify (i) the database type, i.e., flat, relational, or
enterprise; (ii) the size in both number of records and bytes of information; (iii) the fields,
query forms, and reports available or maintained; and (iv) any software product or platform
required to access the database.
DATABASE DISCOVERY| THE “BRUTE FORCE” METHOD
2. Submit a useable copy of each database or data set identified in response to specification 1), any
accompanying data dictionary, and any software product or platform required to access the
database or data set. For each database or data set identified in response to specification 1) that
contains cost or margin information, submit one copy of each regularly produced (no more
frequently than in four week periods) report generated using that database since January 1, 2009,
and any documentation that defines, describes or explains the calculation in any terms, measures,
or aggregations appearing on the materials provided.
3. For all databases or data sets produced in response to the specifications 1) and 2), describe in
detail the relationship of the different tables in the database (e.g., an entity relationship diagram
and all foreign keys) and submit documents sufficient to show the tables that are populated by the
company, and the following items for each table: (a) the size of the table in both number of
records and bytes of information; (b) the table name; (c) a general description of the
information contained in the table; (d) a list of field names; (e) a definition for each field as it
is used by the company, including the meanings of all codes that can appear as field values; (f)
the format, including variable type and length, of each field; and (g) the primary key in a
given table that defines a unique observation.
DATABASE DISCOVERY| THE “BRUTE FORCE” METHOD
Why is this request so difficult, and what is one way to approach it?
Work with data dictionary and schema to determine what information exists in the
system.
With the limited information you have (table and column names, as well as limited
descriptions), attempt to ascertain what information is relevant within the database.
Find a “super user.”
Try to understand how the columns and tables that you have identified relate to each
other.
Develop a “custom” query to extract that information into a “usable” format
(Microsoft Excel, delimited text file).
Review & Produce…
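The extraction step above can be sketched as follows, assuming the relevant table and fields have already been identified. The sales table, its columns, the cutoff date, and the pipe delimiter are all hypothetical choices made for illustration; a real extraction would run against the production system's own tables and the format agreed with the other side.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for the system being collected from.
conn.executescript("""
    CREATE TABLE sales (
        sale_id INTEGER PRIMARY KEY, product_code TEXT,
        sale_date TEXT, units INTEGER, internal_flag TEXT
    );
    INSERT INTO sales VALUES
        (1, 'A100', '2008-12-30', 5, 'x'),
        (2, 'A100', '2009-02-01', 7, 'y'),
        (3, 'B200', '2010-06-15', 2, NULL);
""")

# The "custom" query: only the fields judged relevant, only the agreed period.
query = """
    SELECT sale_id, product_code, sale_date, units
    FROM sales
    WHERE sale_date >= ?
    ORDER BY sale_id
"""
cursor = conn.execute(query, ("2009-01-01",))
header = [col[0] for col in cursor.description]

# Written to memory here; open("extract.txt", "w", newline="") would write a file.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="|", lineterminator="\n")
writer.writerow(header)
writer.writerows(cursor)
extract = buffer.getvalue()
print(extract)
```

Keeping the query text itself is worthwhile: it documents exactly which fields and date range were selected, which is often the first thing the other side asks about.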
DATABASE DISCOVERY| THE “BRUTE FORCE” METHOD
Some Potential Problems:
Unfortunately, the data dictionary and schema often do not exist, especially in the
case of a proprietary or legacy system.
If one or the other doesn’t exist, this method becomes much more complex.
Many fields in a typical database are not used, which adds complexity.
This method can be very time consuming.
Often it can result in a heated negotiation between parties (how did you choose those
fields, what other fields exist, how do we (opposing) know you gave us everything…).
You can leverage in-house resources, but then they may have to testify.
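One way to find out which fields are actually used is to measure how often each one is populated. The sketch below does this for a hypothetical contacts table with invented rows; it treats NULL and empty strings as unpopulated, which is an assumption that may not match how a given system records "no value."

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with invented rows, for illustration only.
conn.executescript("""
    CREATE TABLE contacts (
        contact_id INTEGER PRIMARY KEY, name TEXT, fax TEXT, region TEXT
    );
    INSERT INTO contacts VALUES
        (1, 'Ann', NULL, 'East'),
        (2, 'Bob', NULL, ''),
        (3, 'Cy',  NULL, 'West'),
        (4, 'Di',  NULL, NULL);
""")

total = conn.execute("SELECT COUNT(*) FROM contacts").fetchone()[0]
# Column names come from the database's own metadata, not from user input.
columns = [c[1] for c in conn.execute("PRAGMA table_info(contacts)")]

fill_rate = {}
for col in columns:
    # Count rows where the field holds something other than NULL or ''.
    populated = conn.execute(
        f"SELECT COUNT(*) FROM contacts WHERE {col} IS NOT NULL AND {col} <> ''"
    ).fetchone()[0]
    fill_rate[col] = populated / total

for col, rate in fill_rate.items():
    print(f"{col}: {rate:.0%} populated")
```

A field that is never populated, such as fax in this sample, is a candidate for exclusion, and the fill-rate table gives both sides something objective to discuss.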
DATABASE DISCOVERY| THE “REPORT” METHOD
Commonly used to extract data to evaluate potential damages.
Sample Request
Documents sufficient to show: (a) the number of units sold by month, year and purchaser from January
1, 2001 to the present including product numbers; (b) the revenue attributable to each food product
by month and year from January 1, 2001 to the present; (c) the gross profit attributable to each food
product by month and year from January 1, 2001 to the present; (d) the net profit attributable to each
food product by month and year from January 1, 2001 to the present; and (e) any discounts, rebates
not reflected in price per unit.
For each food product identified in your answer to above, produce documents sufficient to show your
revenue, costs, including but not limited to both fixed and variable costs for each component, and
profit margin, from January 1, 2001 to the present.
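A request of this kind amounts to an aggregation over transaction records. The sketch below shows what part (a) and part (b) might look like as a query, assuming a hypothetical sales table with invented rows and with revenue computed as units times unit price. Real systems may define revenue differently, and profit figures would need cost data that this sketch does not model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical transaction table with invented rows.
conn.executescript("""
    CREATE TABLE sales (
        product_no TEXT, purchaser TEXT, sale_date TEXT,
        units INTEGER, unit_price REAL
    );
    INSERT INTO sales VALUES
        ('P-1', 'Store A', '2001-01-05', 10, 2.0),
        ('P-1', 'Store A', '2001-01-20',  5, 2.0),
        ('P-1', 'Store B', '2001-02-03',  4, 2.5),
        ('P-2', 'Store A', '2001-02-10',  1, 8.0);
""")

# Units sold and revenue by product, purchaser, year, and month.
report = conn.execute("""
    SELECT product_no,
           purchaser,
           strftime('%Y', sale_date) AS sale_year,
           strftime('%m', sale_date) AS sale_month,
           SUM(units)                AS units_sold,
           SUM(units * unit_price)   AS revenue
    FROM sales
    WHERE sale_date >= '2001-01-01'
    GROUP BY product_no, purchaser, sale_year, sale_month
    ORDER BY product_no, sale_year, sale_month, purchaser
""").fetchall()

for line in report:
    print(line)
```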
DATABASE DISCOVERY| THE “REPORT” METHOD
Investigate the existing reporting functionality:
Virtually every database-driven application has a built-in, somewhat user-friendly
reporting function.
Generate a list of all the “standard” reports that are typically “run” from the system.
Narrow the lengthy list to a select few and pull samples (repeat as necessary).
Review the reports to determine whether they address the relevant activity
(potentially even meet and confer on the topic).
Agree on the reports that will be produced and the timeframe applicable.
Practice Pointers
Databases v. Reports
Balancing the Pros and Cons
“Unstructured” Data in the
“Structured” Database
The Truth is (Not) Always in the
Numbers
Meet-and-Confer Considerations
PRACTICE POINTERS
Assess the value of producing/seeking databases
versus reports
How to prove or defend the case?
What do your experts need?
How substantial are costs and burdens -- and is fee
shifting a possibility?
Is specialized software or hardware required for
native databases?
Is the database structure (not just the data) a trade
secret?
PRACTICE POINTERS
Balancing some of the pros and cons of databases
and reports
Reports are often easier to review, more limited
in scope, and generally less costly
Databases are often incomprehensible to mere
mortals, open to any kind of search, and
generally more expensive -- for producing and
requesting parties
PRACTICE POINTERS
Beware the "unstructured" data hiding in the
"structured" database
Open-text or free form fields
Redacting databases with 10+ billion entries
Anticipate privacy issues if personally identifiable
information exists
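A first pass at locating personal information in free-form fields can be automated with pattern matching. The sketch below scans hypothetical notes fields for text shaped like a U.S. Social Security number or an email address. The patterns are deliberately simple assumptions: they will miss some real identifiers and flag some false positives, so they narrow the review rather than replace it.

```python
import re

# Simple, assumption-laden patterns: a screening aid, not a guarantee.
SSN_LIKE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
EMAIL_LIKE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")

# Hypothetical free-text "notes" field keyed by record ID (invented content).
notes = {
    101: "Customer called re: refund. SSN 123-45-6789 on file.",
    102: "Left voicemail.",
    103: "Forward invoice to jane.doe@example.com",
}

# Flag every record whose free text contains a suspect pattern.
flagged = {}
for record_id, text in notes.items():
    hits = SSN_LIKE.findall(text) + EMAIL_LIKE.findall(text)
    if hits:
        flagged[record_id] = hits


def redact(text):
    """Replace every matched pattern with a fixed redaction marker."""
    text = SSN_LIKE.sub("[REDACTED]", text)
    return EMAIL_LIKE.sub("[REDACTED]", text)


print(flagged)
print(redact(notes[101]))
```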
PRACTICE POINTERS
The truth is (not) always in the numbers
Missing data or errors in the data
Data dictionaries -- explaining the codes
Figuring out how the database is "really" used
Legacy system migrations and migraines
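Two of these problems, missing values and codes the data dictionary does not explain, can be checked mechanically before anyone relies on the numbers. The sketch below assumes a hypothetical STATUS field, a hypothetical dictionary entry for it, and invented records.

```python
from collections import Counter

# Hypothetical data dictionary entry for a STATUS field.
status_codes = {"A": "Active", "C": "Closed", "P": "Pending"}

# Invented records standing in for an extracted table.
records = [
    {"id": 1, "status": "A"},
    {"id": 2, "status": "X"},
    {"id": 3, "status": None},
    {"id": 4, "status": "C"},
    {"id": 5, "status": "X"},
]

# Records where the field is empty.
missing = [r["id"] for r in records if r["status"] in (None, "")]

# Codes that appear in the data but not in the data dictionary,
# with the number of records using each one.
undocumented = Counter(
    r["status"] for r in records
    if r["status"] not in (None, "") and r["status"] not in status_codes
)

print("missing status:", missing)
print("undocumented codes:", dict(undocumented))
```

An undocumented code such as "X" here is a question for someone who knows how the system is really used, and may be a remnant of a legacy system migration.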
PRACTICE POINTERS
Meet-and-Confer Considerations
Scope of relevant information
Understand the systems before making/demanding
commitments
Limitations on time period, fields, geography,
business units, etc.
Availability of preexisting reports and creating
custom reports
Listings of tables, columns, rows
Data dictionaries and the schema
Q & A
Stephanie L. Giammarco
Partner, BDO Consulting
Direct: 212-885-7439
Christopher J. Lopata
Of Counsel, Jones Day
Direct: 212-326-3602