Defence of MSc Dissertation
XML Accounting Trail: A model for introducing digital forensic readiness to XML Accounting and XBRL
by Dirk Kotze
22 July 2015
Promotor: Prof. Martin S. Olivier
Introduction
21st-century economy: information
Business: the need to make sense of and share financially relevant information (e.g. accounting data)
Rise of XML: requirements; literature review of XML weaknesses
Cyber-crime: $800.5 million in 2014; mostly fraud; 60% discovered by accident or tip-off (ACFE 2014)
Digital forensics
Introduction (2)
Big data problem: how does an investigator know whether XML financial accounting data has been modified?
Research problem: XML financial data is susceptible to tampering due to the human-readability property required by the XML data specification. Upon presentation of a set of XML financial data, how can one determine whether the data has been tampered with, and reconstruct past events so that the nature of the tampering can be determined?
Purpose: detect and reconstruct
Research method: a method (detecting) and a model (reconstructing)
Background
Overview of key topics: most of you should be familiar; a brief discussion of the key concepts necessary to understand the later work
Will be discussing: digital forensics; compilers
Won't be discussing (assumed familiar, and due to time constraints): XML
Background: Digital Forensics
Definition (McKemmish): "the application of computer science and investigative procedures for a legal purpose involving the analysis of digital evidence after proper search authority, chain of custody, validation with mathematics, use of validated tools, repeatability, reporting, and possible expert presentation"
Economics: cost; disruption; complexity (data to analyse, anti-forensics)
Forensic Readiness
Background: Compilers
Stages of compilation: analysis & synthesis (synthesis out of scope)
Analysis: lexical analysis; syntactic analysis; semantic analysis
Error handling:
Panic mode: minimise noise (graph)
Phrase level
Error productions: pre-specify known patterns of data irregularities
Global correction: determine the potential data irregularities introduced, i.e. the minimum change required to make the input correct
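Panic-mode recovery can be sketched in a few lines. This is a toy tokenizer for a hypothetical comma-separated number list, not the dissertation's actual grammar; the point is that on an unexpected character the lexer skips forward to a synchronising token and records the skipped span, rather than aborting.

```python
import re

# Token patterns for a toy language (numbers and commas); purely illustrative.
TOKEN_RE = re.compile(r"\d+(?:\.\d+)?|,")

def lex_with_panic_mode(text):
    """Tokenise the input; on an unexpected character, enter panic mode:
    skip forward to the next synchronising token (',') and record the
    skipped span as an error instead of stopping the whole run."""
    tokens, errors, i = [], [], 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        m = TOKEN_RE.match(text, i)
        if m:
            tokens.append(m.group())
            i = m.end()
        else:
            start = i
            while i < len(text) and text[i] != ",":  # panic: skip to delimiter
                i += 1
            errors.append(text[start:i])
    return tokens, errors
```

Because lexing continues past the bad span, a single pass can report every irregular region in the input, which is exactly what an investigator triaging a large file wants.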
Detecting Data Irregularities
Problem statement: forensic pathology; analysing XML files
Rigid structure and accounting rules
Definition of a data irregularity: any unauthorised modification to XML accounting data that impacts the semantic meaning of the financial accounting content.
How do these occur? Direct modification (bypassing controls and rules); indirect modification (via the application) of an illegitimate transaction
Large data set problem
Detecting Data Irregularities (2)
Analysing XML files: trend analysis/pattern analysis
Double-entry example; salami attack example
Manual vs. automated searching
Automating the search for data irregularities with compiler theory:
Classification of input, based on patterns as well as pre-defined rule sets
Recursive identification of patterns, using a decision tree
Handling of errors
Detecting Data Irregularities (3)
Detecting Data Irregularities (4)
How the process works:
Establish the rule set: normal transactions (no errors will be noted); and error productions (patterns of transactions that deviate from the norm, i.e. data irregularities)
Execute the compiler
Results
Disclaimers
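A rule set of this shape can be sketched as a set of patterns: one "normal" production that passes silently plus error productions for known irregularities. The patterns and names below are illustrative assumptions, not the dissertation's actual rule set.

```python
import re

# Hypothetical rule set for <Amount> elements. Error productions are
# matched first; anything matching neither is left for the investigator.
NORMAL_AMOUNT = re.compile(r"<Amount>\s*\d+(?:\.\d{2})?\s*</Amount>")
ERROR_PRODUCTIONS = {
    "negative amount": re.compile(r"<Amount>\s*-"),
    "non-numeric amount": re.compile(r"<Amount>[^<]*[^\d\s.][^<]*</Amount>"),
}

def classify(line):
    """Classify one line: error productions first, then the normal rule."""
    for name, pattern in ERROR_PRODUCTIONS.items():
        if pattern.search(line):
            return name
    if NORMAL_AMOUNT.search(line):
        return "ok"
    return "unclassified"
```

The "unclassified" bucket matters: an incomplete rule set (as the conclusion notes) means the compiler cannot decide, so those lines should be surfaced rather than silently accepted.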
Applying Automated Detection of Data Irregularities
Application: consider a sample XML accounting format
Example 1: generic XML accounting data format

<Transaction>
  <ID> 101-1 </ID>
  <Account> Bank </Account>
  <Action> Credit </Action>
  <Amount> 25000 </Amount>
  <User> 012437 </User>
  <Date> 6/19/2011 8:25:02 AM </Date>
  <Hash> 1a88f9a8293e88c87ae1ae5f8bd63585 </Hash>
</Transaction>
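The record above carries a per-transaction <Hash> element. The slides do not specify how that hash is computed, so the sketch below assumes a hypothetical scheme (MD5 over the '|'-joined field values) purely to show how such a hash makes field tampering detectable.

```python
import hashlib
import xml.etree.ElementTree as ET

FIELDS = ["ID", "Account", "Action", "Amount", "User", "Date"]

def make_record(idv, account, action, amount, user, date):
    """Build a transaction whose <Hash> is MD5 over the other fields.
    The hashing scheme is an assumption for illustration only."""
    payload = "|".join([idv, account, action, amount, user, date])
    digest = hashlib.md5(payload.encode()).hexdigest()
    return ("<Transaction><ID>{}</ID><Account>{}</Account><Action>{}</Action>"
            "<Amount>{}</Amount><User>{}</User><Date>{}</Date>"
            "<Hash>{}</Hash></Transaction>").format(
                idv, account, action, amount, user, date, digest)

def verify(xml_text):
    """Recompute the hash from the stored fields and compare."""
    tx = ET.fromstring(xml_text)
    payload = "|".join(tx.findtext(f) for f in FIELDS)
    return hashlib.md5(payload.encode()).hexdigest() == tx.findtext("Hash")
```

Editing any field by hand (the human-readability problem from the research statement) breaks the recomputed hash, so `verify` returns False.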
Applying Automated Detection of Data Irregularities (2)
Types of error (XML data):

Lexical:
1.1. Tag not opened or closed correctly, e.g. a missing '<' or '>'.
1.2. Amounts that contain non-numeric characters.
1.3. Reserved characters ('<' or '>') used in a transaction statement.

Syntactic:
2.1. The XML schema is violated.
2.2. A transaction entry has a missing or imbalanced tag for: Transaction, Balance, Hash, User, Date, Amount, Account, etc.
2.3. Tags that are not defined, e.g. a tag containing a spelling error in the tag name.
2.4. An entry matches one or more predefined rules specifying an incorrect transaction.

Semantic:
3.1. A tag is not correctly specified to match the content described by the tag, for example the tag attribute incorrectly specifies 24-hour time whilst the time is given in AM/PM: <Time format="HH:mm"> 12:30 PM </Time>.
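Two of the syntactic error classes are cheap to automate: imbalanced tags are caught by any XML parser, and undefined (e.g. misspelled) tags can be checked against the schema's vocabulary. The tag set below is an assumption based on the sample format; a real implementation would derive it from the XML schema itself.

```python
import xml.etree.ElementTree as ET

# Assumed tag vocabulary for the sample accounting format.
KNOWN_TAGS = {"Transaction", "ID", "Account", "Action", "Amount",
              "User", "Date", "Hash", "Balance"}

def syntactic_irregularities(xml_text):
    """Flag imbalanced/missing tags (via the parser) and undefined tags
    (via the assumed schema vocabulary)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return ["malformed XML: {}".format(exc)]
    return ["undefined tag: <{}>".format(el.tag)
            for el in root.iter() if el.tag not in KNOWN_TAGS]
```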
Applying Automated Detection of Data Irregularities (3)
Types of error (data errors, i.e. errors in the accounting data):

Lexical:
1.4. Irregularities in the formatting of the data, introduced by editing the machine-generated data, e.g. numbers within tags are given a comma to indicate thousands, but the comma is omitted in certain numbers.
1.5. The data contained within XML tags is bad, e.g. a ';' or '@' character occurs in a number, or a date with the month specified as larger than 12.

Syntactic:
2.5. A violation of the hierarchical structure and/or order of the tags, e.g. an ID tag that exists in isolation (instead of belonging to a parent tag, such as a transaction), or a transaction tag without children.
2.6. The allocation of optional tags that are not applicable to the tag object, e.g. listing an asset number for a vehicle in a furniture purchase transaction.

Semantic:
3.2. Omission of part of a transaction, e.g. a transaction with a missing corresponding double entry.
3.3. Transaction ID errors: ID skipped; ID repeated.
3.4. Violation of transaction logic, e.g. purchase fulfilment comes before the order.
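The semantic checks 3.2 and 3.3 operate on the parsed transactions rather than the raw text. A minimal sketch, assuming each transaction has been reduced to a (tx_id, action, amount) tuple:

```python
def semantic_irregularities(entries):
    """entries: list of (tx_id, action, amount) tuples, a simplified view
    of the parsed XML. Flags a missing double entry (total debits do not
    equal total credits) and skipped or repeated transaction IDs."""
    findings = []
    debits = sum(amt for _, act, amt in entries if act == "Debit")
    credits = sum(amt for _, act, amt in entries if act == "Credit")
    if debits != credits:
        findings.append("unbalanced double entry")
    ids = [tx_id for tx_id, _, _ in entries]
    if len(set(ids)) != len(ids):
        findings.append("transaction ID repeated")
    if sorted(set(ids)) != list(range(min(ids), max(ids) + 1)):
        findings.append("transaction ID skipped")
    return findings
```

Note that these findings are hypotheses, not verdicts: as the next slide stresses, semantic errors need additional consideration by the investigator.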
Applying Automated Detection of Data Irregularities (4)
Handling of errors:
Lexical: typically panic mode.
Syntactic: panic mode or phrase-level correction; also error productions.
Semantic: can be handled in the rule set, e.g. via error productions or global correction, but needs additional consideration by the investigator.
Handling of semantic errors allows for hypotheses leading to reconstruction. The investigator can look at:
Statistical analysis, e.g. Benford's law (also known as the first-digit law), which applies to most large sources of numerical data and refers to the frequency distribution of the first digit of such data. In summary, numbers whose first digit is '1' should occur around 30% of the time, whilst larger leading digits occur less frequently.
Analysis of time trends
Transaction order
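A Benford check compares observed leading-digit frequencies against the expected distribution P(d) = log10(1 + 1/d). A minimal sketch (function names are illustrative):

```python
from collections import Counter
from math import log10

def first_digit(x):
    """First significant digit of a number (ignores sign and leading zeros)."""
    for ch in str(abs(x)):
        if ch in "123456789":
            return int(ch)
    return None

def benford_deviations(amounts):
    """For each leading digit d, return (observed frequency, expected
    Benford frequency log10(1 + 1/d)). Large gaps across a big data set
    are a red flag for fabricated amounts."""
    counts = Counter(first_digit(a) for a in amounts)
    n = len(amounts)
    return {d: (counts.get(d, 0) / n, log10(1 + 1 / d)) for d in range(1, 10)}
```

On a realistic data set the comparison would be done with a significance test over thousands of amounts; the deviation dict is the raw material for that.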
Advantages
Investigation time shortened
Triage: an indication of whether an XML accounting data file requires further investigation
Little chance of error or non-detection of data irregularities
Reconstructing the events
Problem statement: the investigative questions
When? What? Who? Why? How?
XML does not store this information, and it is not available elsewhere
Black box: similar to an aircraft crash recorder; instrumentation
Reconstructing the events (2)
Minimum set of evidence required:
Evidence showing the details of the data modifications;
Evidence stating the date and time of the modification; and
Evidence showing who modified the data.
How and why are not covered.
Architecture: logging of evidence
Event reconstruction history is not available, hence the need for real-time logging
Interrupts vs. a real-time proxy; reference monitor
Circumvention? Need for tamper-proofing of the XML file: digital signatures (as with email)
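The tamper-proofing idea can be sketched with the standard library. The model calls for digital signatures; as a stdlib-only stand-in this sketch uses an HMAC, which gives the same tamper-evidence property when the key is held only by the reference monitor (the key value below is illustrative).

```python
import hashlib
import hmac

SECRET_KEY = b"reference-monitor-private-key"  # illustrative; kept by the monitor

def seal(xml_bytes):
    """Tag the XML file so that any modification made outside the
    reference monitor becomes detectable."""
    return hmac.new(SECRET_KEY, xml_bytes, hashlib.sha256).hexdigest()

def unchanged(xml_bytes, tag):
    """Constant-time comparison of the recomputed tag with the stored one."""
    return hmac.compare_digest(seal(xml_bytes), tag)
```

A true digital signature (asymmetric keys) would additionally let third parties verify the file without holding the secret, which is why the model prefers it; the detection logic is the same.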
Reconstructing the events – Need for a reference monitor
Reconstructing the events (3)
Reconstructing the 'What?': version control
Reconstructing the 'When?': logging timestamps; local vs. a trusted external time source
Reconstructing the 'Who?': disclaimer; username/password authentication
Storing the evidence: encryption
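The What/When/Who evidence can be captured as one record per write through the reference monitor. A minimal sketch (the record layout and function name are assumptions):

```python
import datetime
import difflib

def log_modification(old_xml, new_xml, user):
    """One evidence record per write: 'what' as a unified diff (version
    control), 'when' as a UTC timestamp (a trusted external time source
    is assumed), and 'who' as the authenticated username."""
    return {
        "what": list(difflib.unified_diff(
            old_xml.splitlines(), new_xml.splitlines(), lineterm="")),
        "when": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "who": user,
    }
```

In the model these records would themselves be encrypted before storage, so that the evidence store cannot be edited the way the XML data could.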
Reconstructing the events: Overview of the XML Accounting Trail Model
Reconstructing the events: Overview of the XML Accounting Trail Model (2)
Conclusion
Research problem: XML financial data is susceptible to tampering due to the human-readability property required by the XML data specification. Upon presentation of a set of XML financial data, how can one determine whether the data has been tampered with, and reconstruct past events so that the nature of the tampering can be determined?
Proposal:
A method to detect data irregularities (compiler)
A model to reconstruct events (instrumentation)
Conclusion (2)
Self-evaluation & future work: despite best efforts, areas always exist in research where the answers may not be clear or the proposed solution leads to more questions. It is therefore important to step back and reflect on the suggested work, to identify shortcomings and areas for future work.
Detecting data irregularities: shows great promise, but no real-world implementation (prototype)
The rule set is key: with an incomplete or bad rule set, the compiler won't work
Template rule sets (future work)
Expanding the use of errors & error handling; global error correction
Conclusion (3)
XML Accounting Trail: lack of real-world implementation
If the reference monitor is compromised, the work has been for naught
A secured private key already offers some protection, but not complete protection
Securing the reference monitor using anti-forensic & anti-hacking techniques to protect the private key against extraction
Published Work
XBRL-Trail: A Model for Introducing Digital Forensic Readiness to XBRL, In Proceedings of the Fourth International Workshop on Digital Forensics & Incident Analysis (WDFIA), 2009, pages 93-104.
Detecting XML Data Irregularities by Means of Lexical Analysis and Parsing. In Proceedings of the 9th European Conference on Information Warfare and Security, 2010, pages 151-159.
Acknowledgements
Prof. Martin S. Olivier; Prof. Stefan Gruner; Dr. Wynand van Staden; employers (PwC/RMB), specifically Michael Nean; fiancée Dr. Sheena Steyl; Mom & Dad. Dedicated to my Mom (passed away 9 September 2009).
Questions?