EIE 4114 Digital Forensics for Crime Investigation
Lecturers: First half: Dr Bonnie Law
Second half: Dr Wen Chen
EIE4114 My part: 5 September to 15 October
Room: DE 609 Tel: 2766 4746 [email protected]
http://www.eie.polyu.edu.hk/~nflaw/eie4114.html
Topics Covered
  Forensics framework
  Collecting, searching and sorting evidence
  Machine learning forensics
    For crime investigation, prevention and detection
  Authenticating and attributing evidence source for email/image, identity behind social media account
  Forensics and issues of anti-forensics
Assessment
  Examination: 50%
  Continuous Assessment: 50% (2 parts)
  My part: 27.5%
    Quiz: 10% (26 September, 10 October)
    Laboratory sessions: 2.5% (17 Sept, 15 Oct)
    Mini-project: 15% (2 in a group)
      Phase 1: project proposal, due on 3 Oct 2019
      Phase 2: project report and presentation (10-minute video; both members need to explain the findings verbally), 29 November
Mini-Project
  Phase 1: proposal
    Outline a forensics problem to be addressed
    Identify techniques/software/methods to solve the forensics problem
  Phase 2:
    State the forensics problem
    A detailed comparison of two techniques/software/methods to solve the problem
    Outline the features of the new tools to be developed / propose new ideas that may better solve the problem
Forensic Science
  Application of scientific methods to establish factual answers to legal problems
    “What has happened”
    “How did it happen”
    “Who was involved”
    “When did it occur”
  Digital forensics: application of computer science and investigative procedures for a legal purpose
  Goal: reconstruct the incident and find supporting or refuting evidence
Forensics process
  Defines a structured investigation of digital evidence from any device capable of storing or processing data in a digital form
  Maximizing the usefulness of incident evidence data, minimizing the cost of forensics
  Cost:
    The estimated investigation time in a NZ hacker’s case, characterized as a typical intrusion scenario, was 417 hours, resulting in an investigation cost of $27,800 (one victim)
    A Russian hacker’s case (automated online auctions using stolen credit cards) that resulted in prosecution took 9 months of investigators’ time. A partial estimate of the cost was $100,000.
    Sometimes, if the cost is greater than the benefit: dismissal of charges
Forensics process
  Example:
    Case: received an email with potentially important evidence of a crime
    As the email was sent over the Internet, its origin must be considered uncertain, as must the timestamps (they may have been tampered with en route)
    How may one know that it was created by the system’s owner, and not by an intruder (or by a Trojan horse/other malware)?
  Usefulness of evidence
    Evidentiary weight in a court of law, relevant and sufficient for, trustworthiness of the evidence, determining root cause, linking the attacker to the incident, …
Definition of Digital Forensics
  Digital forensics involves the analysis of digital evidence after proper search authority, chain of custody, validation with mathematics, use of validated tools, repeatability, reporting and possible expert presentation
    Evidence integrity: preservation of evidence in its original form (without any intentional or unintentional changes)
  National Institute of Standards and Technology (NIST) (www.cftt.nist.gov): creates a set of criteria for evaluating forensics tools
Digital Forensics
  Covers the general practices of analyzing all forms of digital evidence
  Includes
    Computer forensics (file system forensics)
      Preservation and analysis of computers
    Network forensics
      Traffic analysis and logs from networks
    Mobile device forensics
      Cell phones, smart phones, satellite navigation systems (GPS)
    Malware forensics
      Malicious code (viruses, worms, Trojan horses)
Digital Forensics Process
  Evidence integrity: preservation of evidence in its original form (without any intentional or unintentional changes)
  Chain of custody
    Refers to the documentation of acquisition, control, analysis and disposition of physical and electronic evidence
    Shows how the evidence is acquired, managed and transferred during the investigation process, who is involved in the process, what their responsibilities are, how long they store the evidence, and how they transfer it to someone else
Example: Chain of Custody
Chronological documentation
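A chain-of-custody record can be kept as an append-only chronological log. The sketch below is only an illustration, not a standard format: the class names `CustodyEvent` and `CustodyLog`, the field names, and the sample evidence ID and handlers are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class CustodyEvent:
    """One entry in the chain of custody (hypothetical field names)."""
    timestamp: datetime
    action: str      # e.g. "acquired", "transferred", "analysed"
    handler: str     # person who holds the evidence after this event
    note: str = ""


class CustodyLog:
    """Append-only, chronological log for a single evidence item."""

    def __init__(self, evidence_id: str) -> None:
        self.evidence_id = evidence_id
        self.events: List[CustodyEvent] = []

    def record(self, timestamp: datetime, action: str, handler: str, note: str = "") -> None:
        # Refuse entries that would break the chronological order.
        if self.events and timestamp < self.events[-1].timestamp:
            raise ValueError("custody events must be in chronological order")
        self.events.append(CustodyEvent(timestamp, action, handler, note))

    def current_holder(self) -> Optional[str]:
        return self.events[-1].handler if self.events else None

    def holding_periods(self) -> List[Tuple[str, timedelta]]:
        """How long each handler held the item before the next event."""
        return [
            (prev.handler, nxt.timestamp - prev.timestamp)
            for prev, nxt in zip(self.events, self.events[1:])
        ]


# Usage with made-up data.
log = CustodyLog("EXHIBIT-001")
t0 = datetime(2019, 9, 5, 9, 0, tzinfo=timezone.utc)
log.record(t0, "acquired", "Officer A", "laptop seized at scene")
log.record(t0 + timedelta(hours=2), "transferred", "Examiner B", "sealed bag handed over")
log.record(t0 + timedelta(hours=5), "analysed", "Examiner B", "disk imaged, hash recorded")
print(log.current_holder())
print(log.holding_periods())
```

Rejecting out-of-order entries keeps the log chronological, which is what lets it answer who held the evidence, when, and for how long.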
Source (Digital evidence)
  Data stored or transmitted using a computer that supports or refutes a theory of how an offense occurred, or that addresses critical elements of the offense such as intent
    Computer systems: laptops, desktops, servers, …
    Communication systems: Internet, networks, GPS, SMS messages, email, websites that were visited, …
    Embedded computer systems: mobile devices, smart cards, security cameras, …
  Digital evidence: fragile, can be modified / edited
Example of Digital evidence
  A scanner is used to digitize illegal photos
    Evidence: the scanner has unique scanning characteristics that link the hardware to the digitized images
    The scanner can be seized as digital evidence
  All service providers (e.g., telephone companies, ISPs, banks, credit institutions)
    Reveal location and time of an individual’s activities (items purchased, car rentals, automated toll payment, mobile telephone calls, Internet access, online banking/shopping, …)
Example of Digital evidence
  “Cost” consideration:
    Based on the type of incident or crime scenario, focus on the most common places for evidence
    Hypothetical scenarios: consider possible scenarios that generate potential evidence and plan to collect it in a proper manner
  Questions:
    Where is the data? Format?
    How long is it stored or retained?
    How much data is produced?
    Who is the owner? Who has access?
    Is data generated during normal operations?
Investigation
  Investigating digital devices includes:
    Collecting data securely
    Examining suspect data to determine details such as origin and content
    Presenting digital information to courts
    Applying laws to digital device practices
  Goal:
    Present supporting facts and probabilities
    Resist the influence of others’ opinions and avoid jumping to conclusions
    Evidence: authentic and has not been tampered with
Collect digital evidence
  Investigative plan: identify sources of data
  Collect data according to the volatility of the data (data lifetimes)
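Collecting by volatility means acquiring the shortest-lived data first. The sketch below orders a collection plan with a simplified ranking based on the general order of volatility described in RFC 3227. The source labels and the `collection_order` helper are hypothetical names chosen for illustration.

```python
# Simplified ranks, most volatile first, loosely following RFC 3227.
# The keys are hypothetical labels, not names used by any tool.
VOLATILITY_RANK = {
    "registers_cache": 0,
    "memory_process_table": 1,
    "temporary_files": 2,
    "disk": 3,
    "remote_logs": 4,
    "archival_media": 5,
}


def collection_order(sources):
    """Return the sources sorted so the shortest-lived data is collected first."""
    return sorted(sources, key=lambda source: VOLATILITY_RANK[source])


plan = collection_order(["disk", "archival_media", "memory_process_table", "registers_cache"])
print(plan)
```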
Collect digital evidence
  Recover data from
    Deleted files
    File fragments
    Complete files
  Deleted files remain on the disk until new data is saved to the same physical location
  Tools can be used to retrieve deleted files
    ProDiscover Basic
  Example: “search”
    Sample data search
      Identify and extract all email and deleted items
      Search media for evidence of photos
      Configure and load seized database for data mining
      Recover all deleted files for review
Examination
  Large volume of data
  File hashes can be used to identify files
    Known good files: many files belonging to the OS, software or other applications do not contain useful evidence
    Known bad hash databases: identify suspicious files such as malware or images known to be associated with criminal activities
    National Software Reference Library: www.nsrl.nist.gov
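Hash-based filtering can be sketched as a set lookup. The example below assumes the known-good and known-bad hash sets have already been loaded as Python sets of hex digests. The function names `file_digest` and `triage` and the choice of SHA-256 are assumptions for illustration, and a real reference set may use a different algorithm.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large evidence files need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def triage(paths, known_good, known_bad, algorithm="sha256"):
    """Sort files into known-good, known-bad and unknown by their digest."""
    result = {"known_good": [], "known_bad": [], "unknown": []}
    for path in paths:
        digest = file_digest(path, algorithm)
        if digest in known_bad:
            result["known_bad"].append(path)
        elif digest in known_good:
            result["known_good"].append(path)
        else:
            result["unknown"].append(path)
    return result
```

Files in the "unknown" group are the ones left for closer examination. Known-good files can usually be set aside, and known-bad files are flagged immediately.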
Analysis
  Statistical methods, manual analysis, techniques for understanding protocols and data formats, linking of multiple data objects (through the use of data mining) and timeline analysis
  Keyword searches: targeted analysis technique that can be used if one knows what to look for
    E.g., imagine an illegal drug case where the investigation was triggered by a reported crime with specific info about a person and a certain drug or its code name: the name of the person or the drug can be used as a keyword
Analysis
  Pattern matching (regular expressions)
    Social security numbers can be relevant for identity theft cases
    Credit card numbers/account numbers for fraud cases
    File properties such as name, type, size, date of creation, and when accessed
    Phone numbers
    Addresses (IP, email, physical home or work, along with website URLs)
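The pattern types listed above can be sketched with Python's `re` module. The patterns below are deliberately simplified assumptions (US-style social security numbers, unvalidated IPv4 octets, a rough email pattern), and card candidates are filtered with the Luhn checksum to cut false positives. The names `PATTERNS`, `luhn_valid` and `scan` are hypothetical.

```python
import re

PATTERNS = {
    # US social security number written as 123-45-6789.
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # Simplified email address pattern (not a full RFC 5322 parser).
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # Dotted-quad IPv4 address; octet ranges are not validated here.
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    # 13 to 19 digits, optionally separated by spaces or dashes.
    "card": re.compile(r"\b(?:\d[ -]?){12,18}\d\b"),
}


def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan(text):
    """Return all pattern hits in `text`, keeping only Luhn-valid card numbers."""
    hits = {name: pattern.findall(text) for name, pattern in PATTERNS.items()}
    hits["card"] = [c for c in hits["card"] if luhn_valid(c)]
    return hits


sample = ("Mail jdoe@example.com from 192.168.1.10, SSN 123-45-6789, "
          "card 4111 1111 1111 1111, bogus card 1234 5678 9012 3456.")
print(scan(sample))
```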
Analysis
  Who/What
    Who or what application created, edited, modified, sent, received or caused the file to be?
    Who is this item linked to?
  Where
    Where was it found?
    Does it show where relevant events took place?
    Does the evidence point to a common source?
Analysis
  When
    When was it created, accessed, modified, received, sent, viewed and deleted?
    Time analysis?
  How
    How did it originate on the media?
    How was it created, transmitted, modified and used?
Analysis
  Reconstruction (timeframe analysis)
    Understanding the sequence of events
  Association (connects a person to a crime scene)
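Timeframe analysis often starts from file timestamps. The sketch below builds a simple timeline from the modified, accessed and changed times reported by `os.stat`; the function names `file_timeline` and `print_timeline` are hypothetical, and it assumes the files are examined on a read-only working copy so that the examination itself does not alter the timestamps.

```python
import os
from datetime import datetime, timezone


def file_timeline(paths):
    """Return (time, event, path) tuples for the given files, oldest first."""
    events = []
    for path in paths:
        st = os.stat(path)
        for label, ts in (
            ("modified", st.st_mtime),
            ("accessed", st.st_atime),
            # st_ctime is the metadata-change time on Unix and the
            # creation time on Windows.
            ("changed/created", st.st_ctime),
        ):
            events.append((datetime.fromtimestamp(ts, tz=timezone.utc), label, path))
    events.sort(key=lambda event: event[0])
    return events


def print_timeline(paths):
    """Print one line per timestamp event in chronological order."""
    for when, label, path in file_timeline(paths):
        print(f"{when.isoformat()}  {label:<15}  {path}")
```

Timestamps can be altered by anti-forensic tools, so a timeline built this way should be treated as a lead to corroborate and not as proof.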
Forensic Analysis
  Groups such as the Scientific Working Group on Digital Evidence set standards for recovering, preserving and examining digital evidence
  Scientific evidence: evaluated using 4 criteria
    Whether the theory or technique can be (and has been) tested
    Whether there is a high known or potential rate of error, and the existence and maintenance of standards controlling the technique’s operation
    Whether the theory/technique has been subjected to peer review and publication
    Whether the theory/technique enjoys “general acceptance” within the relevant scientific community
Case Studies
  5 different case studies
  Steps:
    Identify relevant information concerning the case
    Locate all files and find relevant info (how?)
    Associate files with …?
    Reconstruct the events/activities
Forensic Framework
  Collection: identify and collect digital evidence
    Selective acquisition?
    Cloud storage?
    Generate data subset for examination?
  Examination of evidence
    String search?
    Pattern matching?
    Data visualization (timeline analysis)?
  Analysis
Forensic Framework
  Analysis: determine data significance and draw conclusion
    Data mining?
      Cluster analysis
      Discriminant analysis
      Rule mining
  Presentation
Supplementary Info
  Clustering
    Motivation: big volume of data (files)
    Manual inspection / string comparison: not effective
    Desirable
      Automatic system: document clustering into different groups
      Objects within a cluster are more similar to each other than to objects in other clusters
      Focus investigation on certain clusters
Supplementary Info
  File: feature vector
    Term frequency: how frequently the term appears
      TF(t) = no. of times the term t appears in a document / total number of terms in the document
    Inverse document frequency: how important the term is
      IDF(t) = log(total number of documents / no. of documents with term t in them)
Supplementary Info
  Example:
    10 million documents found
    1000 of these 10 million documents contain the term “Honda”
    In a document containing 100 words, the term “Honda” appears 3 times
    TF(“Honda”) = 3/100 = 0.03
    IDF(“Honda”) = log(10,000,000 / 1000) = 4
    Feature = 0.03 x 4 = 0.12
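The “Honda” arithmetic can be reproduced in a few lines. This sketch assumes a base-10 logarithm, which is what gives IDF = 4 in the example; the function names are chosen here for illustration.

```python
import math


def term_frequency(term_count, total_terms):
    """TF(t): occurrences of the term divided by the document length."""
    return term_count / total_terms


def inverse_document_frequency(total_docs, docs_with_term):
    """IDF(t): base-10 log of (all documents / documents containing the term)."""
    return math.log10(total_docs / docs_with_term)


tf = term_frequency(3, 100)
idf = inverse_document_frequency(10_000_000, 1000)
feature = tf * idf
print(f"TF = {tf:.2f}, IDF = {idf:.2f}, TF-IDF = {feature:.2f}")
# prints: TF = 0.03, IDF = 4.00, TF-IDF = 0.12
```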
Supplementary Info
  Features: a matrix
    Each document has a set of TF-IDF features

    Doc  Honda  Check  License  Phone  Buy   Sell  …
    1    0.12                   0.11
    2           0.34   0.55            0.44
    …

  Similarity between two documents:
    (0.12-0)^2 + (0.34-0)^2 + (0.55-0)^2 + (0.11-0)^2 + (0.44-0)^2
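Written in general form, the similarity measure on this slide appears to be the squared Euclidean distance between two TF-IDF vectors. This is an interpretation, since the slide gives only the numeric sum. The symbols $x_{i,k}$ (the TF-IDF weight of term $k$ in document $i$) and $K$ (the number of terms) are notation introduced here.

```latex
d(i, j) = \sum_{k=1}^{K} \left( x_{i,k} - x_{j,k} \right)^2

d(1, 2) = 0.12^2 + 0.34^2 + 0.55^2 + 0.11^2 + 0.44^2
        = 0.0144 + 0.1156 + 0.3025 + 0.0121 + 0.1936
        = 0.6382
```

A smaller value means the two documents weight the listed terms more similarly.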
Example
  1 million documents
  Identify six terms: “Honda”, “Check”, “License”, “Phone”, “Buy”, “Sell”
  Document frequencies (number of documents containing each term): 10000, 1000, 1000, 50000, 6000, 1000
  Document 1: [10, 0, 0, 5, 5, 2]
  Document 2: [10, 0, 0, 5, 3, 5]
  Document 3: [3, 0, 2, 1, 0, 0]
Example

       Honda  Check  License  Phone  Buy   Sell
  df   10000  1000   1000     50000  6000  1000
  idf  2      3      3        1.3    2.2   3

  tf       d1  d2  d3
  Honda    10  10  3
  Check    0   0   0
  License  0   0   2
  Phone    5   5   1
  Buy      5   3   0
  Sell     2   5   0
Example

  Tf-idf   d1  d2  d3
  Honda
  Check
  License
  Phone
  Buy
  Sell

  d1-d2:
  d1-d3:
  d2-d3:
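One way to fill in this table and the pairwise distances is sketched below. It rests on three assumptions that the example does not state outright: the raw term counts are used directly as tf (document lengths are not given), the logarithm is base 10, and similarity is the squared Euclidean distance. The code uses exact IDF values, where the example's figures are rounded to one decimal place.

```python
import math
from itertools import combinations

TERMS = ["Honda", "Check", "License", "Phone", "Buy", "Sell"]
N_DOCS = 1_000_000
# Number of documents containing each term (the df row of the example).
DOC_FREQ = dict(zip(TERMS, [10000, 1000, 1000, 50000, 6000, 1000]))
# Raw term counts per document, used here as tf (an assumption).
COUNTS = {
    "d1": [10, 0, 0, 5, 5, 2],
    "d2": [10, 0, 0, 5, 3, 5],
    "d3": [3, 0, 2, 1, 0, 0],
}


def idf(term):
    return math.log10(N_DOCS / DOC_FREQ[term])


def tfidf_vector(counts):
    return [count * idf(term) for term, count in zip(TERMS, counts)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


vectors = {name: tfidf_vector(counts) for name, counts in COUNTS.items()}
distances = {
    (a, b): squared_distance(vectors[a], vectors[b])
    for a, b in combinations(vectors, 2)
}

for name, vector in vectors.items():
    print(name, [round(v, 2) for v in vector])
for (a, b), dist in distances.items():
    print(f"{a}-{b}: {dist:.2f}")
```

The pair with the smallest distance is the most similar under this measure, so those two documents would be grouped together first.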
Supplementary Info
  Clusters: use similarity
    Inter-cluster distances are maximized
    Intra-cluster distances are minimized
  Example: financial crime
Self-study: case study
  Censorship through Forensics: Video analysis in post-war crisis
    https://www.cmu.edu/chrs/documents/Wexler-Censorship-Through-Forensics.pdf
  On August 25, 2009, Channel 4 News in the U.K. broadcast a video depicting men in Sri Lankan military uniforms shooting naked, bound prisoners in the head.
Self-study: case study
  Channel 4 acquired the video, approximately one minute long, from Journalists for Democracy in Sri Lanka
    Condition: total anonymity of the source
  Forensic analysts sought to resolve speculation about the video’s authenticity by examining the video file for traces of image manipulation.
    Difficulties
Self-study: case study
  Background:
    Hashing to verify copies are the same
    Check the consistency with the proprietary container format
    Metadata:
      Start / end time, duration of recording, sampling rate, make / model of the recording device, GPS location
    Visual analysis: visual discontinuities
    Simultaneous visual and aural analysis
Self-study: case study
  Background:
    Multiple compression detection
      Record in one format (usually a compressed format)
      Before editing can be done: decompress
      After editing: compress again
    Source origin analysis (PRNU signal; PRNU = photo-response non-uniformity)
      Determine if the PRNU signal is consistent through the whole video
      The PRNU signal will be different if video segments from another device are inserted
Challenges in digital forensics
  Increasing number and capacity of storage devices
  Increasing volume of data + need to provide fast results
  Availability of anti-forensics tools
    Negatively affect the existence, amount and/or quality of evidence
Challenges in digital forensics
  Disk cleaning utilities: overwrite existing data on the disk
  File wiping utilities
    Delete individual files by overwriting the clusters occupied by the files with random data, multiple times
      Gutmann method: 35 times
      DoD standard: 7 times
    Much faster than disk cleaning utilities
Challenges in digital forensics
  Disk degaussing
    A magnetic field is applied to a digital media device
    The device is entirely cleaned of any previously stored data
    Expensive approach, although effective
  Trail obfuscation:
    Replace relevant info with false info (such as IP address spoofing), alter metadata such as date/time stamps, log deletion/modification
Data Hiding Techniques
  Changing or manipulating a file to conceal information
  Techniques:
    Hiding entire partitions
    Changing file extensions
    Setting file attributes to hidden
    Bit-shifting
    Encryption
    Password protection
Data Hiding Techniques
  Changing file extensions: one of the first techniques to hide data
    Compare the file extension with the file header
  Bit-shifting
    Changes data from readable code to data that looks like binary executable code
  Data fabrication: e.g., modifying MAC info (modified, accessed, created dates) or creating an excessive amount of data of a certain type
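Comparing a file's extension with its header can be sketched as a lookup of well-known leading byte signatures ("magic numbers"). The signature table below covers only a handful of common formats, and the function name `extension_mismatch` is hypothetical. Dedicated forensic tools use far larger signature databases.

```python
from pathlib import Path

# Leading byte signatures of a few common formats, mapped to the
# extensions that normally go with them.
MAGIC = {
    b"\xff\xd8\xff": {".jpg", ".jpeg"},
    b"\x89PNG\r\n\x1a\n": {".png"},
    b"GIF87a": {".gif"},
    b"GIF89a": {".gif"},
    b"%PDF-": {".pdf"},
    b"PK\x03\x04": {".zip", ".docx", ".xlsx", ".pptx"},
}


def extension_mismatch(path):
    """Return True if the header contradicts the extension, False if they
    agree, and None if the header matches no signature in the table."""
    path = Path(path)
    with open(path, "rb") as f:
        header = f.read(16)
    for signature, extensions in MAGIC.items():
        if header.startswith(signature):
            return path.suffix.lower() not in extensions
    return None
```

A result of True (for example, a PNG header inside a file named `notes.txt`) marks the file for closer inspection. None only means the small table here has no opinion.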
NEW CHALLENGES
US election in 2016
  https://www.nytimes.com/2015/06/07/magazine/the-agency.html?_r=2
Reports
  http://www.theverge.com/2016/11/14/13626694/election-2016-trending-social-media-facebook-twitter-influence
Reports
  Agency: organized disinformation campaigns on social media using pseudonyms and virtual identities
    Promoting false news events
    Influencing public opinion on politics
  Digital forensics?
    Determine the underlying identities of the agency’s employees: content originating from them could be flagged and monitored (or banned?)
Reports
  American Scientist (Sept-Oct 2013, volume 101, no 5)
    Without developing fundamentally new tools and capabilities, forensics experts will face increasing difficulty and cost along with the ever-expanding data size and system complexity.
Summary
  Definition of digital forensics
  Case studies
  Forensics framework
  Challenges in forensics