5/29/2012
1
Forensics and Electronic Documents: Critical Activities, Considerations, and Steps for Success
Effective Internal Investigations
For Compliance Professionals
November 10, 2011
Agenda
• Electronically Stored Information
• eDiscovery For Internal Investigations
• Preliminary Investigative Planning
• How To Approach Each Stage
• Computer Forensics
• Data Breach Investigations
• Q & A
2
What is ESI? Where Can It Be Found?
3
How Much Are We Talking About?
• 1 MB = 75 pages
• 1 GB = 75,000 pages
• 1 box = 2,500 pages
• 150 GB = 11.25 million pages = 4,500 boxes
• 250 GB = 18.75 million pages = 7,500 boxes
• 300 GB = 22.5 million pages = 9,000 boxes
4
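The conversions on this slide can be checked with a few lines of arithmetic. The sketch below assumes the slide's rule-of-thumb ratios (75,000 pages per GB, 2,500 pages per box); the function name is illustrative only.

```python
# Rule-of-thumb conversion factors taken from the slide.
PAGES_PER_GB = 75_000
PAGES_PER_BOX = 2_500


def estimate_volume(gigabytes: float) -> tuple[float, float]:
    """Return (pages, boxes) for a data volume given in gigabytes."""
    pages = gigabytes * PAGES_PER_GB
    boxes = pages / PAGES_PER_BOX
    return pages, boxes


for gb in (150, 250, 300):
    pages, boxes = estimate_volume(gb)
    print(f"{gb} GB = {pages:,.0f} pages = {boxes:,.0f} boxes")
```

Actual page counts vary widely by file type, so these figures are planning estimates rather than measurements.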
Storage and Forms of Digital Data
• Active
• Files residing on user's hard drive and/or network server
• Archival
• Data compiled in back-up tapes
• Replicant
• Temporary files created by programs, also called “ghost” or “clone” files
• Residual
• Deleted files and e-mails, which are not actually erased until the medium has been destroyed or the data completely overwritten
5
Metadata - Defined
• “System Metadata” is automatically created by a computer system and relates to system operation and file handling
• Examples: file name and date; author, time of creation or modification; file path
• “Application Metadata” can be automatically created or user created, and relates to application use and output generated including the substantive changes made to the document by the user
• Examples: prior edits, editorial comments, track changes, excel formulas, hidden rows, hyperlinks
6
MAC (Modified, Accessed, Created) Times
Vital Dates and Times
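As a rough illustration of where these dates come from, the sketch below reads a file's timestamps with Python's standard `os.stat`. The function name and dictionary keys are illustrative. It is an assumption-laden sketch, not a forensic procedure: opening files on a live system can itself alter access times, so this kind of check belongs on a working copy or image.

```python
import os
from datetime import datetime, timezone


def mac_times(path: str) -> dict[str, datetime]:
    """Return a file's modified, accessed, and changed/created times in UTC."""
    info = os.stat(path)

    def to_utc(timestamp: float) -> datetime:
        return datetime.fromtimestamp(timestamp, tz=timezone.utc)

    return {
        "modified": to_utc(info.st_mtime),
        "accessed": to_utc(info.st_atime),
        # st_ctime is the last metadata change on most Unix systems;
        # on Windows it has traditionally reported the creation time.
        "changed_or_created": to_utc(info.st_ctime),
    }
```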
7
Metadata – Defined (cont’d)
• “Embedded Metadata” consists of the text, numbers, content, data, or other information that is directly or indirectly input into a Native File by a user and which is not typically visible to the user viewing the output display of the Native File on a screen or printout.
• Examples: spreadsheet formulas, hidden columns, linked files (such as sound files), and hyperlinks.
8
Embedded
9
Market Realities
Legal and Regulatory Risks and Burdens
• ESI GROWING: Data doubling within corporations every 12-18 months.
• MORE REGULATION: Increased corporate scrutiny and investigation due to inquiries and expectations.
• TECHNOLOGY COMPLEXITY: Technology options available, but only as good as support behind it.
• LEGAL CHALLENGES: Courts and regulators demand that corporate entities defend their processes.
10
The eDiscovery Process
11
Electronic Discovery Reference Model (EDRM)
Similar Activities To Be Performed
Nature of investigation
• Employee misconduct and abuse, fraud
• Violation of business practices and processes
• Theft of trade secrets
• Data security and cybercrime
• Foreign Corrupt Practices Act
• Antitrust
• Sarbanes-Oxley (SOX)
• HIPAA investigations
Processes and techniques are the same for:
• Undertaking due diligence
• Reviewing business practices
• Identifying wrongdoing
• Implementing/enhancing compliance programs
12
Goals Are Different
• Identification of culpability
• Focus on a few bad actors
• Find that “Smoking Gun”
• Rapid review process and limited focus
• Documenting what is not found in evidence may be equally important!
• Protection from liability or hope for leniency
13
Preliminary Planning
• Gathering information at kickoff
• Understand history of players
• Information already developed
• Review key issues and considerations
• Geographic locations
• Data privacy and protection laws
• Data export
14
Preliminary Planning (Cont’d)
• Covert or overt investigation
• Internal resources available to work
• Role of IT department
• Appropriate information gathering process
• Understanding security protocols
• Is forensic analysis required?
15
Working As a Team
• Teaming Strategies
• Close alignment with investigative team and cross-communication re: work efforts
• Communication on IT policies and procedures/environment
• Aid in activation of capture mechanisms
• Security logs (pass cards, security codes)
• IM chat
• Journaling
16
Investigative Workflow & Methodology
17
E-Discovery Provider
• Key word searches
• E-mail review
• Electronic file review
• Metadata analysis
• Phone record analysis
• Access log review
• Relationship analysis
Forensic Accounting
• Accounting reports
• Financial statement
• General ledgers
• Invoices
• Contracts
• Expense reports
Traditional Investigation
• Interviews
• Office sweeps
• Corporate records
• Criminal records
• Property records
• Litigation records
• Media/News reports
New leads exchanged between the three workstreams (diagram labels)
• New corporations, individuals, properties, relationships
• New key words, relationships
• New corporations, relationships, transactions
• New electronic evidence, key words, relationships
• New corporations, transactions, accounts, individuals
• New corporations, individuals, relationships
Data Identification: Proactive and Reactive
18
Evaluate policies & practices
Understand where potential ESI resides
Proactive Planning By Data Mapping
• Create inventory of data repositories
• Evaluate relevant retention and disposal policies
• Develop deliverables to satisfy legal and regulatory requirements
• Ensure mapping is cross-functional
• Prepare evergreen process
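A data map can start as a simple structured inventory. The sketch below is one possible shape, with every field name and sample entry invented for illustration; it flags repositories that have no documented retention period.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Repository:
    """One entry in a data map. All field names are illustrative."""
    name: str
    business_owner: str
    location: str
    retention_days: Optional[int]  # None = no documented retention policy


def missing_retention_policy(data_map: list[Repository]) -> list[str]:
    """List repositories that have no documented retention period."""
    return [repo.name for repo in data_map if repo.retention_days is None]


# Hypothetical sample entries.
data_map = [
    Repository("Email server", "IT", "New York", 365),
    Repository("Shared drive", "Finance", "London", None),
]
print(missing_retention_policy(data_map))
```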
19
Identification: Ask Right Questions First
• Develop an understanding of relevant IT systems
• Physical inspection
• Interview
• Get an organizational chart
• Obtain a schematic overview of systems
• Identify business owners
• Understand retention policies
20
Ask Right Questions First (Cont’d)
• Determine what evidence exists and where it resides
• Who’s got what, where, in what form?
• Who keeps what and for how long?
• Reporting features
• Custodian focused inquiries and capture
• Interview custodians
• Directory listings
• Include key administrators!
21
Preservation and Collection: Scope and Capture
22
Define scope and protect integrity
Collection Scope
• Secure computers and data?
• Targeted capture and/or forensic images?
• Capture network share data?
• Retrieve loose media?
• Obtain mobile devices?
• Retrieve logs?
• Evaluate offsite and third-party systems?
• Identify and query databases?
• Consider legacy systems?
• Determine best backup tape strategy?
23
Protect Integrity and Security
• Using encrypted target drives
• Documenting all processes and procedures
• Securing data in evidence locker/safe
• Tracking and auditing the collection process
Note: Policies, processes, and procedures around data collection may already be in place if the organization has addressed them proactively
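One common way to support tracking and auditing of a collection is to record a cryptographic hash of every collected file and compare it later. The sketch below uses Python's standard `hashlib`; the function names and manifest layout are assumptions for illustration, not a prescribed chain-of-custody format.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large evidence files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (by relative path) to its SHA-256 hash."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Return paths whose hash differs or that appear in only one manifest."""
    all_paths = before.keys() | after.keys()
    return sorted(p for p in all_paths if before.get(p) != after.get(p))
```

Building a manifest at collection time and again before review gives a simple check that nothing changed in between.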
24
Preparing and Analyzing the Data
25
Prepare data for analysis and review
Identify content and refine searches
Post Collection & Pre-Review: Now What Do We Do?
• Evaluate non-user created files
• Identify file extensions of interest
• Extract or isolate files by file types
• Index and process data for search and review
Note: It is critical to understand the implications of single-step or multi-step processing and loading
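The first two bullets above (identifying extensions of interest and isolating files by type) can be sketched as follows. The extension list is a hypothetical example and would be set per investigation; extensions alone are not proof of file type.

```python
from collections import Counter
from pathlib import Path

# Hypothetical list of extensions of interest; adjust per investigation.
EXTENSIONS_OF_INTEREST = {".doc", ".docx", ".xls", ".xlsx", ".pdf", ".pst", ".msg"}


def count_by_extension(root: Path) -> Counter:
    """Count files under root by lower-cased extension."""
    return Counter(p.suffix.lower() for p in root.rglob("*") if p.is_file())


def files_of_interest(root: Path, extensions=EXTENSIONS_OF_INTEREST) -> list[Path]:
    """Return files under root whose extension is in the set of interest."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in extensions
    )
```

The extension counts give a quick profile of a collection before anything is indexed or loaded for review.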
26
Sample Analytic Approach For Active Data
An effective, defensible, and transparent targeting process
Advanced Technology
• Search and validation
• Automated tools
• Sampling
Human Judgment
• Collaboration
• Nuances of language
• Experience
• Oversight
27
Result of Targeting the Data
• Identification of critical themes, dates, time frames, custodians, and communication patterns
• Defensibility of search strategy and process
• Finding key documents to build on
• Further scoping and refinement
28
Formalized Review and Production
29
Conduct document review
Execute on delivery requirements
Document Review Dominates Budget and Time
Note: Services and technology must be focused on reducing the money and time spent on the largest part of the EDRM lifecycle
30
Measure Search Impact
• Measure results from queries to refine
• Reduce costs without sacrificing data quality
Query # | Query | Total | % | Distinct | %
02_001 | (contaminat* OR discharg* OR release* OR dispos* OR leak*) w/3 (oil* OR waste* OR effluent*) | 27,195 | 29.99% | 6,392 | 7.05%
02_002 | (pcb) OR (polychlorinated biphenyls) OR (aroclor) OR (arochlor) | 32,574 | 35.92% | 6,251 | 6.89%
02_003 | ((greenville) OR (stony hill) OR (n woodstock) OR (north woodstock) OR (nw)) w/3 ((plant*) OR (site*) OR (facilit*) OR (location*)) | 42,589 | 46.97% | 14,896 | 16.43%
02_004 | (manufactur* process*) | 4,425 | 4.88% | 875 | 0.96%
02_005 | (safety) w/3 ((manual*) OR (committee)) | 1,269 | 1.42% | 802 | 0.88%
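A report like the one above can be produced from per-query hit lists. The sketch below assumes that "Distinct" means documents hit by that query and by no other query in the set, and that percentages are taken against the full collection; both are assumptions about the table, and the document IDs are made up.

```python
def search_impact(hits: dict[str, set[str]], collection_size: int) -> list[tuple]:
    """Return (query_id, total, total_pct, distinct, distinct_pct) per query."""
    rows = []
    for query_id, docs in hits.items():
        # Documents hit by any other query in the set.
        others = set().union(*(d for q, d in hits.items() if q != query_id))
        distinct = docs - others
        rows.append((
            query_id,
            len(docs),
            round(100 * len(docs) / collection_size, 2),
            len(distinct),
            round(100 * len(distinct) / collection_size, 2),
        ))
    return rows


# Toy data: query IDs and document IDs are invented for illustration.
hits = {
    "Q1": {"d1", "d2", "d3"},
    "Q2": {"d3", "d4"},
}
for row in search_impact(hits, collection_size=10):
    print(row)
```

A query with a high total but a low distinct count mostly overlaps other queries, which makes it a candidate for refinement.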
31
Get To Key Issues Rapidly and Effectively Using Iterative Search Techniques
[Diagram: Approved Indexed Dataset, Execute Search, Review Dataset, repeated across Iteration 01, Iteration 02, and Iteration 03]
• Steps in each iteration: Test, Sample, Execute, Measure & Report, Modify, Validate, Document
• Report measured results
• Consult with team
• Modify criteria as appropriate
32
Precision and Recall
33
• Good Precision: High Responsive Rate
• Good Recall: Fewer Missed Items in Review
A balance between Precision and Recall will provide more responsive documents with fewer responsive items missed.
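In standard terms, precision is the share of search results that are actually responsive, and recall is the share of all responsive documents that the search found. A minimal sketch, using made-up document IDs and assuming the responsive set is known (for example, from a fully reviewed sample):

```python
def precision_recall(search_hits: set[str], responsive: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for a search against a known responsive set."""
    true_hits = search_hits & responsive
    precision = len(true_hits) / len(search_hits) if search_hits else 0.0
    recall = len(true_hits) / len(responsive) if responsive else 0.0
    return precision, recall


# Toy example with invented document IDs.
responsive = {"d1", "d2", "d3", "d4"}
search_hits = {"d1", "d2", "d9"}
precision, recall = precision_recall(search_hits, responsive)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the three hits are responsive (good precision), but only two of the four responsive documents were found (poor recall), which is the under-inclusive pattern.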
[Diagram legend: Collection, Actual Responsive, Actual Privileged, Search Result]
Measure: Full Production Example
Assuming all docs in collection reviewed
34
Measure: Good Precision / Poor Recall
35
• Under-inclusive search
• Good candidate for defensibility challenge
• Not unduly expensive, yet an incomplete review scenario
[Diagram: Search Term Results; legend: Collection, Actual Responsive, Actual Privileged, Search Result]
Measure: Good Recall / Poor Precision
36
• Over-inclusive search
• Less likely candidate for defensibility challenge
• Unduly expensive review scenario
[Diagram: Search Term Results; legend: Collection, Actual Responsive, Actual Privileged, Search Result]
Measure: Poor Recall / Poor Precision
37
• Under-inclusive and over-inclusive search
• Good candidate for defensibility challenge
• Unduly expensive and incomplete review scenario
[Diagram: Search Term Results; legend: Collection, Actual Responsive, Actual Privileged, Search Result]
Measure: Good Recall / Good Precision
38
• Targeted search
• Unlikely candidate for defensibility challenge
• Right-sized review scenario as to cost and efficiency
[Diagram: Search Term Results; legend: Collection, Actual Responsive, Actual Privileged, Search Result]
Precision and Recall: Getting There
• Initial Search Criteria
• Iteration 2: Testing, Feedback, Research; Case Team Interaction
• Iteration 3: Testing, Feedback, Research; Case Team Interaction
• Final Iteration: Validated Search Criteria
• Non-Hit Review by Investigative Team
[Diagram legend: Collection, Actual Responsive, Actual Privileged, Search Result]
39
Document Review: Platform Considerations
• Are you working with predefined search terms, or will the terms be refined and tested?
• What foreign languages need to be reviewed?
• Can the platform support large data volumes?
• Is there any degradation of performance based on the number of users accessing the platform?
• Are there complex tagging requirements?
• Will it meet your production and reporting needs?
• What are the costs? Is the pricing predictable?
40
What Happens To Deleted Files?
• Operating system just marks space as available
• True text of file still viewable with forensic software
• Text may stay on computer’s hard drive for years
41
Example: Unallocated Space
• Remainder of space on the hard drive
• Is constantly used by the computer’s operating system
• May hold vast amounts of old information
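Because deleted content can linger in unallocated space, one basic technique is to search the raw bytes of a disk image for a keyword, ignoring the file system entirely. The sketch below is a simplified illustration with hypothetical names; real forensic tools also handle text encodings, compression, and fragmented data, which this does not.

```python
def find_keyword_offsets(image_path: str, keyword: bytes,
                         chunk_size: int = 1 << 20) -> list[int]:
    """Return byte offsets where keyword occurs in a raw image file."""
    if not keyword:
        raise ValueError("keyword must not be empty")
    offsets = []
    overlap = len(keyword) - 1
    position = 0   # file offset of the first byte currently in `buffer`
    buffer = b""
    with open(image_path, "rb") as image:
        while True:
            chunk = image.read(chunk_size)
            if not chunk:
                break
            buffer += chunk
            start = 0
            while (idx := buffer.find(keyword, start)) != -1:
                offsets.append(position + idx)
                start = idx + 1
            # Keep a short tail so matches spanning chunk boundaries are found.
            keep = buffer[-overlap:] if overlap else b""
            position += len(buffer) - len(keep)
            buffer = keep
    return offsets
```

The function should be run against a working copy of a forensic image, never against the original evidence.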
42
Data Forensics and Targeted Inquiries
• Did the employee communicate with others not previously identified during investigation?
• Evidence of any deletion or wiping software?
• Did searches against fragments and partially overwritten data identify any key communications or files?
Files on images
• Was anything deleted? Wiped?
• Were there any file extension changes?
• What websites were accessed and when?
Result: Further Refinement & Investigation
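The question about file extension changes can be approached by comparing a file's leading bytes (its signature) with its extension. The sketch below checks a handful of well-known signatures; the table is deliberately small and the function name is illustrative, so treat it as a starting point rather than a complete detector.

```python
from pathlib import Path

# A few well-known file signatures ("magic bytes") and the extensions
# normally associated with them. The table is deliberately incomplete.
SIGNATURES = {
    b"%PDF": {".pdf"},
    b"PK\x03\x04": {".zip", ".docx", ".xlsx", ".pptx"},
    b"\xff\xd8\xff": {".jpg", ".jpeg"},
    b"\x89PNG\r\n\x1a\n": {".png"},
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": {".doc", ".xls", ".ppt", ".msg"},
}


def extension_mismatch(path: Path) -> bool:
    """True if the header matches a known signature but the extension does not."""
    with path.open("rb") as handle:
        header = handle.read(8)
    for magic, expected in SIGNATURES.items():
        if header.startswith(magic):
            return path.suffix.lower() not in expected
    return False  # unknown signature: no conclusion either way
```

A PDF renamed to `.txt` would be flagged, while a file with an unrecognized header is left for manual review.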
43
Web-Based Email: Spotlight
• Did employee use webmail accounts?
• Messages are read while on the internet
• Pages are in “HTML” format
• Are any additional individuals identified through webmail?
44
Blackberries and Other Mobile Devices
45
Why Data Breaches Happen
• Targeted: “Malicious actors or criminal attacks are the most expensive cause of data breaches and not the least common”
• Targeted and Inadvertent: “Breaches involving lost or stolen laptop computers and mobile devices remain a consistent and expensive threat”
• Inadvertent: “Negligence remains the most common threat”
Source: 2010 U.S. Cost of a Data Breach, conducted by Ponemon Institute
46
Anatomy of Breach Investigation
47
Gain understanding of the incident
• Identify the known scope of breach
• Review IT infrastructure document to identify systems
• Interview relevant staff
• Timeline of business events
• Identify other computers potentially compromised
Perform forensic imaging and collection
• Servers, relevant laptops, and desktops
• Imaging of operating system and logs
• Gather any copies of previously preserved data for gap analysis
Anatomy of Breach Investigation (Cont’d)
48
Analyze audit logs for activity and identify source
• User Assist Logs: programs and times they were run
• Internet History: when installation occurred and which sites were accessed
• Prefetch Files: what program was run and when
Analyze network logs for the when and where
• Firewall Logs: activity undertaken during time in question
• Proxy Logs: logging of network web traffic and volumes
• Intrusion Detection Logs: watch traffic to detect unusual activity
Perform malware analysis
• Review programs started when computer is logged on or booted
• Identify any software running in odd locations
• Evaluate when malware was installed
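Much of the log work described above comes down to isolating entries from the time in question. The sketch below assumes a hypothetical log format in which each line begins with an ISO 8601 timestamp followed by a space; real firewall, proxy, and intrusion detection logs each have their own formats and would need their own parsing.

```python
from datetime import datetime


def entries_in_window(lines, start: datetime, end: datetime):
    """Yield (timestamp, message) for log lines timestamped within [start, end]."""
    for line in lines:
        stamp, _, message = line.strip().partition(" ")
        try:
            when = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # skip lines that do not start with an ISO timestamp
        if start <= when <= end:
            yield when, message


# Invented sample entries in the assumed format.
sample_log = [
    "2011-11-09T23:58:01 ALLOW 10.0.0.5 -> 203.0.113.7:443",
    "2011-11-10T02:14:37 DENY 10.0.0.8 -> 198.51.100.20:22",
    "2011-11-10T09:30:00 ALLOW 10.0.0.5 -> 203.0.113.7:443",
]
window = list(entries_in_window(
    sample_log,
    start=datetime(2011, 11, 10, 0, 0),
    end=datetime(2011, 11, 10, 6, 0),
))
for when, message in window:
    print(when.isoformat(), message)
```

Time zones and clock drift between systems matter when correlating logs, and this sketch ignores both.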
Remediation
Reporting and remediation
• Develop and outline timeline
• Assist with technology response
Risk mitigation/incident response
• Provide management with information for action
• Monitor network for signs of additional compromise
• Patch and fix security vulnerabilities
Conduct risk assessment and independent testing
• Evaluate effectiveness and adequacy of response
• Certify security process and perform audits
49
Other Key “Quick Wins” and Best Practices
• Expand use of encryption
• Inventory storage, control, and tracking
• Strengthen information security governance
• Deploy solutions and anti-malware tools
• Improve physical and network security
• Train personnel and develop awareness
• Vet security of partners and providers
50
Key Information Security Requirements
• ISO 27001
• Auditable international standard with 133 controls
• International gold standard for information security; rigorous audit process
• SAS 70
• Less defined than ISO 27001
• SSAE 16
• Supersedes SAS 70
• Additional requirements added
• EU Safe Harbor and Similar Data Protection Provisions
• Certification needed to accept the transfer of data from the EU and other jurisdictions
51
52
Questions
Thank You
Contact:
Andy Teichholz, Esq.
Senior eDiscovery Consultant
(212) 867-3044 ext. 204 [email protected]
53