Defence of MSc Dissertation
XML Accounting Trail: A model for introducing digital forensic readiness to XML Accounting and XBRL
by Dirk Kotze
22 July 2015
Promotor: Prof. Martin S. Olivier
Introduction
21st-century economy: information
Business: the need to make sense of and share financially relevant information (e.g. accounting data)
Rise of XML: requirements; literature review of XML weaknesses
Cyber-crime: $800.5 million in 2014; mostly fraud; 60% discovered by accident or tip-off (ACFE 2014)
Digital forensics
Introduction (2)
Big data problem: how does an investigator know whether XML financial accounting data has been modified?
Research problem: XML financial data is susceptible to tampering due to the human-readability property required by the XML data specification. Upon presentation of a set of XML financial data, how can one determine whether the data has been tampered with, and reconstruct past events so that the nature of the tampering can be determined?
Purpose: detect and reconstruct
Research method: a method (detecting) and a model (reconstructing)
Background
Overview of key topics: most of you should be familiar; a brief discussion of the key concepts necessary to understand the later work
Will be discussing: digital forensics; compilers
Won't be discussing (assumed familiar, and due to time constraints): XML
Background: Digital Forensics
Definition (McKemmish): "the application of computer science and investigative procedures for a legal purpose involving the analysis of digital evidence after proper search authority, chain of custody, validation with mathematics, use of validated tools, repeatability, reporting, and possible expert presentation"
Economics: cost; disruption; complexity (data to analyse, anti-forensics)
Forensic Readiness
Background: Compilers
Stages of compilation: analysis & synthesis (synthesis out of scope)
Analysis: lexical analysis; syntactic analysis; semantic analysis
Error handling:
Panic mode: minimise noise (graph)
Phrase level
Error productions: pre-specify known patterns of data irregularities
Global correction: determine the potential data irregularities introduced, i.e. the minimum change required to make the input correct
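Panic-mode recovery can be sketched in a few lines. This is a toy tokenizer for a hypothetical comma-separated number list, not the dissertation's actual grammar; the point is that on an unexpected character the lexer skips forward to a synchronising token and records the skipped span, rather than aborting.

```python
import re

# Token patterns for a toy language (numbers and commas); purely illustrative.
TOKEN_RE = re.compile(r"\d+(?:\.\d+)?|,")

def lex_with_panic_mode(text):
    """Tokenise the input; on an unexpected character, enter panic mode:
    skip forward to the next synchronising token (',') and record the
    skipped span as an error instead of stopping the whole run."""
    tokens, errors, i = [], [], 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        m = TOKEN_RE.match(text, i)
        if m:
            tokens.append(m.group())
            i = m.end()
        else:
            start = i
            while i < len(text) and text[i] != ",":  # panic: skip to delimiter
                i += 1
            errors.append(text[start:i])
    return tokens, errors
```

Because lexing continues past the bad span, a single pass can report every irregular region in the input, which is exactly what an investigator triaging a large file wants.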
Detecting Data Irregularities
Problem statement: forensic pathology; analysing XML files
Rigid structure and accounting rules
Definition of a data irregularity: any unauthorised modification to XML accounting data that impacts the semantic meaning of the financial accounting content.
How do these occur? Direct modification (bypassing controls and rules); indirect modification (via the application) of an illegitimate transaction
Large data set problem
Detecting Data Irregularities (2)
Analysing XML files: trend analysis/pattern analysis
Double-entry example; salami attack example
Manual vs. automated searching
Automating the search for data irregularities with compiler theory:
Classification of input, based on patterns as well as pre-defined rule sets
Recursive identification of patterns, using a decision tree
Handling of errors
Detecting Data Irregularities (3)
Detecting Data Irregularities (4)
How the process works:
Establish the rule set: normal transactions (no errors will be noted); and error productions (patterns of transactions that deviate from the norm, i.e. data irregularities)
Execute the compiler
Results
Disclaimers
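A rule set of this shape can be sketched as a set of patterns: one "normal" production that passes silently plus error productions for known irregularities. The patterns and names below are illustrative assumptions, not the dissertation's actual rule set.

```python
import re

# Hypothetical rule set for <Amount> elements. Error productions are
# matched first; anything matching neither is left for the investigator.
NORMAL_AMOUNT = re.compile(r"<Amount>\s*\d+(?:\.\d{2})?\s*</Amount>")
ERROR_PRODUCTIONS = {
    "negative amount": re.compile(r"<Amount>\s*-"),
    "non-numeric amount": re.compile(r"<Amount>[^<]*[^\d\s.][^<]*</Amount>"),
}

def classify(line):
    """Classify one line: error productions first, then the normal rule."""
    for name, pattern in ERROR_PRODUCTIONS.items():
        if pattern.search(line):
            return name
    if NORMAL_AMOUNT.search(line):
        return "ok"
    return "unclassified"
```

The "unclassified" bucket matters: an incomplete rule set (as the conclusion notes) means the compiler cannot decide, so those lines should be surfaced rather than silently accepted.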
Applying Automated Detection of Data Irregularities
Application: consider a sample XML accounting format
Example 1: generic XML accounting data format

<Transaction>
  <ID> 101-1 </ID>
  <Account> Bank </Account>
  <Action> Credit </Action>
  <Amount> 25000 </Amount>
  <User> 012437 </User>
  <Date> 6/19/2011 8:25:02 AM </Date>
  <Hash> 1a88f9a8293e88c87ae1ae5f8bd63585 </Hash>
</Transaction>
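The record above carries a per-transaction <Hash> element. The slides do not specify how that hash is computed, so the sketch below assumes a hypothetical scheme (MD5 over the '|'-joined field values) purely to show how such a hash makes field tampering detectable.

```python
import hashlib
import xml.etree.ElementTree as ET

FIELDS = ["ID", "Account", "Action", "Amount", "User", "Date"]

def make_record(idv, account, action, amount, user, date):
    """Build a transaction whose <Hash> is MD5 over the other fields.
    The hashing scheme is an assumption for illustration only."""
    payload = "|".join([idv, account, action, amount, user, date])
    digest = hashlib.md5(payload.encode()).hexdigest()
    return ("<Transaction><ID>{}</ID><Account>{}</Account><Action>{}</Action>"
            "<Amount>{}</Amount><User>{}</User><Date>{}</Date>"
            "<Hash>{}</Hash></Transaction>").format(
                idv, account, action, amount, user, date, digest)

def verify(xml_text):
    """Recompute the hash from the stored fields and compare."""
    tx = ET.fromstring(xml_text)
    payload = "|".join(tx.findtext(f) for f in FIELDS)
    return hashlib.md5(payload.encode()).hexdigest() == tx.findtext("Hash")
```

Editing any field by hand (the human-readability problem from the research statement) breaks the recomputed hash, so `verify` returns False.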
Applying Automated Detection of Data Irregularities (2)
Types of error (XML data):

Lexical:
1.1. Tag not opened or closed correctly, e.g. a missing '<' or '>'.
1.2. Amounts that contain non-numeric characters.
1.3. Reserved characters ('<' or '>') used in a transaction statement.

Syntactic:
2.1. The XML schema is violated.
2.2. A transaction entry has a missing or imbalanced tag for: Transaction, Balance, Hash, User, Date, Amount, Account, etc.
2.3. Tags that are not defined, e.g. a tag containing a spelling error in the tag name.
2.4. An entry matches one or more predefined rules specifying an incorrect transaction.

Semantic:
3.1. A tag is not correctly specified to match the content described by the tag, for example the tag attribute incorrectly specifies 24-hour time whilst the time is given in AM/PM: <Time format="HH:mm"> 12:30 PM </Time>.
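Two of the syntactic error classes are cheap to automate: imbalanced tags are caught by any XML parser, and undefined (e.g. misspelled) tags can be checked against the schema's vocabulary. The tag set below is an assumption based on the sample format; a real implementation would derive it from the XML schema itself.

```python
import xml.etree.ElementTree as ET

# Assumed tag vocabulary for the sample accounting format.
KNOWN_TAGS = {"Transaction", "ID", "Account", "Action", "Amount",
              "User", "Date", "Hash", "Balance"}

def syntactic_irregularities(xml_text):
    """Flag imbalanced/missing tags (via the parser) and undefined tags
    (via the assumed schema vocabulary)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return ["malformed XML: {}".format(exc)]
    return ["undefined tag: <{}>".format(el.tag)
            for el in root.iter() if el.tag not in KNOWN_TAGS]
```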
Applying Automated Detection of Data Irregularities (3)
Types of error (data errors, i.e. errors in the accounting data):

Lexical:
1.4. Irregularities in the formatting of the data, introduced by editing the machine-generated data, e.g. numbers within tags are given a comma to indicate thousands, but the comma is omitted in certain numbers.
1.5. The data contained within XML tags is bad, e.g. a ';' or '@' character occurs in a number, or a date with the month specified as larger than 12.

Syntactic:
2.5. A violation of the hierarchical structure and/or order of the tags, e.g. an ID tag that exists in isolation (instead of belonging to a parent tag, such as a transaction), or a transaction tag without children.
2.6. The allocation of optional tags that are not applicable to the tag object, e.g. listing an asset number for a vehicle in a furniture purchase transaction.

Semantic:
3.2. Omission of part of a transaction, e.g. a transaction with a missing corresponding double entry.
3.3. Transaction ID errors: ID skipped; ID repeated.
3.4. Violation of transaction logic, e.g. purchase fulfilment comes before the order.
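The semantic checks 3.2 and 3.3 operate on the parsed transactions rather than the raw text. A minimal sketch, assuming each transaction has been reduced to a (tx_id, action, amount) tuple:

```python
def semantic_irregularities(entries):
    """entries: list of (tx_id, action, amount) tuples, a simplified view
    of the parsed XML. Flags a missing double entry (total debits do not
    equal total credits) and skipped or repeated transaction IDs."""
    findings = []
    debits = sum(amt for _, act, amt in entries if act == "Debit")
    credits = sum(amt for _, act, amt in entries if act == "Credit")
    if debits != credits:
        findings.append("unbalanced double entry")
    ids = [tx_id for tx_id, _, _ in entries]
    if len(set(ids)) != len(ids):
        findings.append("transaction ID repeated")
    if sorted(set(ids)) != list(range(min(ids), max(ids) + 1)):
        findings.append("transaction ID skipped")
    return findings
```

Note that these findings are hypotheses, not verdicts: as the next slide stresses, semantic errors need additional consideration by the investigator.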
Applying Automated Detection of Data Irregularities (4)
Handling of errors:
Lexical: typically panic mode.
Syntactic: panic mode or phrase-level correction; also error productions.
Semantic: can be handled in the rule set, e.g. via error productions or global correction, but needs additional consideration by the investigator.
Handling of semantic errors allows for hypotheses leading to reconstruction. The investigator can look at:
Statistical analysis, e.g. Benford's law (also known as the first-digit law), which applies to most large sources of numerical data and refers to the frequency distribution of the first digit of such data. In summary, numbers whose first digit is '1' should occur around 30% of the time, whilst larger leading digits occur less frequently.
Analysis of time trends
Transaction order
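A Benford check compares observed leading-digit frequencies against the expected distribution P(d) = log10(1 + 1/d). A minimal sketch (function names are illustrative):

```python
from collections import Counter
from math import log10

def first_digit(x):
    """First significant digit of a number (ignores sign and leading zeros)."""
    for ch in str(abs(x)):
        if ch in "123456789":
            return int(ch)
    return None

def benford_deviations(amounts):
    """For each leading digit d, return (observed frequency, expected
    Benford frequency log10(1 + 1/d)). Large gaps across a big data set
    are a red flag for fabricated amounts."""
    counts = Counter(first_digit(a) for a in amounts)
    n = len(amounts)
    return {d: (counts.get(d, 0) / n, log10(1 + 1 / d)) for d in range(1, 10)}
```

On a realistic data set the comparison would be done with a significance test over thousands of amounts; the deviation dict is the raw material for that.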
Advantages
Investigation time shortened
Triage: an indication of whether an XML accounting data file requires further investigation
Little chance of error or non-detection of data irregularities
Reconstructing the events
Problem statement: the investigative questions
When? What? Who? Why? How?
XML does not store this information, and it is not available elsewhere
Black box: similar to an aircraft crash recorder; instrumentation
Reconstructing the events (2)
Minimum set of evidence required:
Evidence showing the details of the data modifications;
Evidence stating the date and time of the modification; and
Evidence showing who modified the data.
How and why are not covered.
Architecture: logging of evidence
Event reconstruction history is not available, hence the need for real-time logging
Interrupts vs. a real-time proxy; reference monitor
Circumvention? Need for tamper-proofing of the XML file: digital signatures (as with email)
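The tamper-proofing idea can be sketched with the standard library. The model calls for digital signatures; as a stdlib-only stand-in this sketch uses an HMAC, which gives the same tamper-evidence property when the key is held only by the reference monitor (the key value below is illustrative).

```python
import hashlib
import hmac

SECRET_KEY = b"reference-monitor-private-key"  # illustrative; kept by the monitor

def seal(xml_bytes):
    """Tag the XML file so that any modification made outside the
    reference monitor becomes detectable."""
    return hmac.new(SECRET_KEY, xml_bytes, hashlib.sha256).hexdigest()

def unchanged(xml_bytes, tag):
    """Constant-time comparison of the recomputed tag with the stored one."""
    return hmac.compare_digest(seal(xml_bytes), tag)
```

A true digital signature (asymmetric keys) would additionally let third parties verify the file without holding the secret, which is why the model prefers it; the detection logic is the same.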
Reconstructing the events – Need for a reference monitor
Reconstructing the events (3)
Reconstructing the 'What?': version control
Reconstructing the 'When?': logging timestamps; local vs. a trusted external time source
Reconstructing the 'Who?': disclaimer; username/password authentication
Storing the evidence: encryption
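The What/When/Who evidence can be captured as one record per write through the reference monitor. A minimal sketch (the record layout and function name are assumptions):

```python
import datetime
import difflib

def log_modification(old_xml, new_xml, user):
    """One evidence record per write: 'what' as a unified diff (version
    control), 'when' as a UTC timestamp (a trusted external time source
    is assumed), and 'who' as the authenticated username."""
    return {
        "what": list(difflib.unified_diff(
            old_xml.splitlines(), new_xml.splitlines(), lineterm="")),
        "when": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "who": user,
    }
```

In the model these records would themselves be encrypted before storage, so that the evidence store cannot be edited the way the XML data could.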
Reconstructing the events: Overview of the XML Accounting Trail Model
Reconstructing the events: Overview of the XML Accounting Trail Model (2)
Conclusion
Research problem: XML financial data is susceptible to tampering due to the human-readability property required by the XML data specification. Upon presentation of a set of XML financial data, how can one determine whether the data has been tampered with, and reconstruct past events so that the nature of the tampering can be determined?
Proposal:
A method to detect data irregularities (compiler)
A model to reconstruct events (instrumentation)
Conclusion (2)
Self-evaluation & future work: despite best efforts, areas always exist in research where the answers may not be clear or the proposed solution leads to more questions. It is therefore important to step back and reflect on the suggested work, to identify shortcomings and areas for future work.
Detecting data irregularities: shows great promise, but no real-world implementation (prototype)
The rule set is key: with an incomplete or bad rule set, the compiler won't work
Template rule sets (future work)
Expanding the use of errors & error handling; global error correction
Conclusion (3)
XML Accounting Trail: lack of real-world implementation
If the reference monitor is compromised, the work has been for naught
A secured private key already offers some protection, but not complete protection
Securing the reference monitor using anti-forensic & anti-hacking techniques to protect the private key against extraction
Published Work
XBRL-Trail: A Model for Introducing Digital Forensic Readiness to XBRL, In Proceedings of the Fourth International Workshop on Digital Forensics & Incident Analysis (WDFIA), 2009, pages 93-104.
Detecting XML Data Irregularities by Means of Lexical Analysis and Parsing. In Proceedings of the 9th European Conference on Information Warfare and Security, 2010, pages 151-159.
Acknowledgements
Prof. Martin S. Olivier; Prof. Stefan Gruner; Dr. Wynand van Staden; employers (PwC/RMB), specifically Michael Nean; fiancée Dr. Sheena Steyl; Mom & Dad. Dedicated to my Mom (passed away 9 September 2009).
Questions?