PREDICTING FRAUD
IN MOBILE MONEY TRANSFER
ADEYINKA ADEDOYIN
A thesis submitted in partial fulfilment of the
requirements of the University of Brighton
for the degree of Doctor of Philosophy
June 2018
Supervisory team:
Dr. Stelios Kapatenakis, School of Computing, Engineering and Mathematics, University of Brighton
Prof. Miltos Petridis, Department of Computer Science, Middlesex University, London, UK
Dr. Emmanouil Panaousis, School of Computing, Engineering and Mathematics, University of Brighton
Dr. Georgios Samakovitis, Department of Computing and Information Systems, University of Greenwich
Declaration
I declare that the research contained in this thesis, unless otherwise formally
indicated within the text, is the original work of the author. The thesis has
not been previously submitted to this or any other university for a degree, and
does not incorporate any material already submitted for a degree.
Adeyinka Adedoyin
Abstract
Mobile Money Transfer (MMT) is a fast-growing medium for making financial
transactions via a mobile device. It is increasingly being adopted in growing
markets, especially in developing countries. The ability of MMT services to
handle large numbers of small-value payments and worldwide funds exchange in
digital currencies, together with the lack of oversight, makes them an
attractive target for attackers and fraudsters. The risks inherent in all
payment channels also exist in the mobile money payment environment, but the
use of mobile money transfer technologies introduces additional risks caused
by the large number of non-bank participants, the higher speed of transactions
and the level of anonymity compared to mobile banking and mobile commerce
systems. This provides the motivation for detecting and preventing fraudulent
transactions in mobile payment systems.
The main objective of this thesis is to investigate and propose a pattern
recognition model to predict fraud in mobile money transfer transactions. To
this end, a novel pattern recognition model is proposed based on the findings
of this thesis. In addition, a synthetic mobile money transfer transaction
dataset was simulated, covering a range of possible fraud scenarios. The
applicability of the proposed pattern recognition model was evaluated using
the simulated dataset. The experiments achieved promising recognition
performance. The results also provide a ranking of clusters of transaction
neighbours for new cases, which may serve as an effective tool for experts to
develop preliminary insight into suspicious transactions that can then be
investigated in more detail.
Acknowledgements
I owe my sincere and endless appreciation to my supervisors, Doctor Stelios
Kapatenakis, Professor Miltos Petridis, Doctor Georgios Samakovitis and Doctor
Emmanouil Panaousis, for their impeccable supervision. Their continuous
encouragement, guidance and immense support from inception made the
completion of this thesis possible. I extend my thanks to all the PhD students
who helped and motivated me to keep going in my research. In particular,
thanks to Mohammed AL-Obeidallah and Jose L. Jorro-Aragoneses; it was a
valuable experience sharing an office with you.
Most importantly, I would like to thank my family. None of this would
have been possible without their love and encouragement. I thank my wife,
Kehinde, for her patience and support in difficult times, and my daughter,
Joan. I would like to give an extra mention to Doctor Georgios Samakovitis for
his sincere enthusiasm for my PhD work; his relentless effort and support saw
me through the completion of this thesis. My innermost gratitude goes to my
parents, who are always there for me. I extend my thanks to my brothers and
other relatives for their continuous support towards the success of this PhD
thesis.
Contents
Declaration
Abstract
Acknowledgements
1 Introduction
1.1 Mobile Money Overview
1.2 Motivation
1.2.1 Operational Challenges
1.2.2 Technological Challenges
1.3 Research Questions
1.4 Research Methodology
1.5 Contribution to Knowledge
1.6 Thesis Structure
1.7 Publications and Research Activities
2 Background on Mobile Money Transfer Operation and Dataset
2.1 Mobile Money Transfer
2.2 Mobile Money Business Model
2.2.1 Mobile Money Transfer Eco-system
2.2.2 Mobile Money in Emerging Economies
2.2.3 Regulatory Issues
2.3 Mobile Money Transfer Synthetic Data
2.3.1 Why Synthetic Data?
2.3.2 Synthetic Data Generation Methodology
2.3.3 Creation of Synthetic Log Data
2.3.4 Synthetic Data Simulation Using MABS
2.3.5 Evaluation of Simulation Data
2.4 Chapter Summary
3 A Review of Fraud Detection Techniques
3.1 ML Algorithms for Fraud Detection
3.1.1 Supervised Approaches
3.1.2 Unsupervised Approaches
3.2 Learning from Imbalanced Data
3.2.1 Data Level Methods
3.2.2 Algorithm Level Methods
3.2.3 Cost-sensitive Learning Methods
3.3 Classification Performance Measures
3.4 Case-Based Reasoning
3.4.1 Case-Based Reasoning Cycle
3.5 CBR and Machine Learning
3.6 Case-Based Reasoning in Fraud Detection
3.7 Chapter Summary
4 Mobile Money Transfer Data Simulation
4.1 Mobile Money Model
4.2 Users’ Behaviour Model
4.2.1 Legitimate Actors
4.2.2 Bad Actors
4.3 Implementation
4.3.1 Simulated Scenarios
4.3.2 Input Parameters
4.3.3 Output Parameters
4.3.4 Simulation Walkthrough
4.4 Evaluation of the Log Data
4.4.1 Quality of Data
4.4.2 Data Analysis
4.5 Chapter Summary
5 Design of Mobile Money Transfer Fraud Detection System
5.1 Proposed Detection Method
5.2 Standard CBR Model
5.3 Weighted CBR Model
5.3.1 Problem Representation
5.3.2 Case Similarity
5.3.3 CBR Model Feature Weighting
5.4 Data Pre-processing
5.5 Preliminary Experiment
5.6 Evaluating the Efficiency of Prediction
5.6.1 Cross-Validation
5.7 Chapter Summary
6 Experiments and Validation
6.1 First Set of Experiments
6.2 Second Set of Experiments
6.2.1 Data Sampling
6.2.2 Experiments with the Weighted CBR
6.2.3 Weighted CBR with Clustering
6.2.4 Weighted CBR with Clustering Experiment
6.3 Summary
7 Conclusion and Further Work
7.1 Thesis Summary
7.2 Contributions and Findings
7.3 Future Work
Appendix
A Preliminary Experiments
A.1 Experimental Results from the Selected Machine Learning Algorithms
B Data Simulation Configuration
References
List of Figures
1.1 Mobile money services: the figure is adapted from [AHG14], to highlight the target financial transaction service covered in the study.
1.2 Number of subscribers and transactions (2011-2016)
1.3 Illustration of the research methodology
2.1 P2P funds transfer using MMT service
2.2 Mobile money transfer service ecosystem [Rie+13]
2.3 Regulatory issues
2.4 Synthetic log data generation method [BKJ03]
2.5 The synthetic log data generation process [LKJ02]
3.1 Algorithmic solutions for fraud detection systems
3.2 CBR classical paradigm [Cor08]
3.3 The CBR cycle [AP94]
4.1 The synthetic log data generation process [LKJ02]
4.2 Screen-shot of MMT simulation window
4.3 Simulation window for input parameters
4.4 A flowchart representing the simulation walk-through
4.5 Number of non-fraud and fraud transactions
4.6 Different transaction services performed in the simulation
4.7 Categories of users based on their frequency of transactions
4.8 Fraction of fraud types to category of transaction in the MMT dataset.
4.9 Total number of different fraud types
4.10 Line plot of amount fraction in Kenya shillings (Sept. 2013 - Feb. 2014)
4.11 A relationship graph for 100 most active MMT users in the simulation. The blue and red nodes represent Legitimate and bad actors respectively and the edges represent relationships between the actors.
5.1 Proposed fraud detection framework
5.2 Schematic representation of k-fold cross-validation
5.3 Schematic representation of 5-fold cross-validation
6.1 Transaction neighbours summary
6.2 An illustration of the SMOTE + Tomek
6.3 Results for all types of fraud class detection
6.4 Results for Non-fraud transaction detection
6.5 Structure of case library retrieval using clustering algorithm [TD13]
6.6 CBR with clustering Results
A.1 Classifiers recall results
A.2 Classifiers F-measure results
A.3 Classifiers Mathews Correlation Coefficient results
A.4 Classifiers area under ROC curve results
List of Tables
1.1 The growth in number and value of transactions. (Central Bank of Kenya, 2016 [Ken16])
2.1 Mobile payments business model
2.2 List of mobile money applications in emerging economies
3.1 List of some related works on fraud detection
3.2 Discussion of some of the related works performance
3.3 Confusion matrix
4.1 Simulation input parameters
4.3 Simulation output parameters
4.5 Groups of users according to possible days of the week the mWallet account is used.
4.6 Results of chi-square test for each set of users
4.7 MMT dataset statistics
6.1 Evaluation of StdCBR classifier on small MMT dataset.
6.2 Performance of classifiers: row 1, represents StdCBR, row 2 StdCBR + new features, row 3 Weighted CBR, and row 4 Weighted CBR + new features.
B.1 Simulation Input parameters
Abbreviations
AML Anti-Money Laundering
AUC Area Under the ROC Curve
CV Cross-Validation
CBR Case-Based Reasoning
DT Decision Tree
ENN Edited Nearest Neighbor
FD Fraud Detection
FDS Fraud Detection System
FN False Negative
FNR False Negative Rate
FPR False Positive Rate
GA Genetic Algorithm
LG Logistic Regression
ML Machine Learning
MCC Matthews Correlation Coefficient
MMT Mobile Money Transfer
MNO Mobile Network Operator
mMoney Mobile Money
mWallet Mobile Account
MABS Multi-agent Based Simulator
NB Naive Bayes
NN Neural Network
kNN k-Nearest Neighbor
P2P Person-to-Person
RF Random Forest
ROC Receiver Operating Characteristic
SMS Short Message Service
SVM Support Vector Machine
SNA Social Network Analysis
TNR True Negative Rate
TPR True Positive Rate
Chapter 1
Introduction
This chapter presents an overview of the term mobile money in Section 1.1.
Section 1.2 discusses the motivation for detecting fraud in mobile money
transfer. The research questions and the thesis's contribution to knowledge
are presented in Sections 1.3 and 1.5 respectively. Finally, the thesis
structure is presented in Section 1.6, and the publications and research
activities tied to this study are presented in Section 1.7.
1.1 Mobile Money Overview
Mobile money is an umbrella term that defines an ecosystem encompassing
various types of financial activities or services transacted via a mobile device.
Three main types of mobile financial services exist: Mobile Banking, Mobile
Payments and Mobile Commerce, as shown in Figure 1.1. These can be defined as
follows [AHG14]:
(i) Mobile Banking: the remote management of one's bank account in a mobile
environment, such as accessing account information, setting up standing
orders, paying bills and managing direct debits.
(ii) Mobile Payments: commonly referred to as mobile money transfer (MMT),
these are peer-to-peer payment services operated under financial regulation
and performed from or via a mobile device. They cover remittance-type
activities, both domestic and international, as well as cash in/out.
(iii) Mobile Commerce: the buying and selling of goods and services through a
mobile device, either remotely or on-site.
Figure 1.1: Mobile money services: the figure is adapted from [AHG14], to highlight the target financial transaction service covered in the study.
Figure 1.1 is adapted from [AHG14] to illustrate the financial activity
covered in this thesis. The highlighted area covers mobile money transfer
transactions within the mobile payment financial services. Although the risks
inherent in all payment channels also exist in the mobile money payment
environment, the use of mobile money transfer technologies introduces
additional risks caused by an increasing number of non-bank participants, the
higher speed of transactions and the level of anonymity compared to mobile
banking and mobile commerce systems [Mer11; NK14].
This research is motivated by the recognition that Mobile Money Transfer
(MMT) is a fast-growing medium for financial transactions made via a
mobile device [Muy15]. It is seen as a platform with significant societal
value and is considered critical in extending financial inclusion to unbanked
and under-banked populations in developing countries [LD12; Lak13]. In developed
countries, where most people have bank accounts and easy access to banks,
mobile money transfer is seen as just another evolving channel for existing
financial products and services. In developing countries, financial
infrastructure is not well developed and physical transportation
infrastructure is often inadequate, unreliable and dilapidated, making access
to financial services very costly. This results in a larger unbanked
population [Int13a], with a large percentage of the population operating on a
cash-only basis outside the formal banking system. In some cases, informal
methods are used to transfer money. For example, in rural areas people have
to travel long distances from their homes to collect remittances. This presents
several risks and a significant cost in addition to the already high transfer fees
[Int13b].
Against this background, mobile money transfer has become more appealing
[Int13b]. In addition, it has brought about significant implications
for economic activity across the board. First, it offers a simple and low-cost
service with reduced risk, and second, it facilitates the flow of money from one
party to another using a communications infrastructure that already connects
billions of people around the world. This has given MMT providers the
opportunity to incorporate a wide range of financial transaction services, such
as bill payment, payroll deposit, loan receipt and repayment, and purchases of
goods and services, prepaid airtime, groceries, bus tickets, micro-insurance,
etc., into their systems [Jen08].
In 2007 Vodafone and Safaricom in Kenya launched M-PESA, a short message
service (SMS)-based money transfer system that allows individuals to
deposit, send, and withdraw funds from a virtual account on their cell phones
which is separate from the banking system [WST10]. Three years after mobile
money services were first launched in Kenya in 2007, there were about 16
million mobile money subscribers [She13]. This number grew by 142% to about
39 million in 2016, higher than the 2015 adult population of about 25.6
million. Between 2011 and 2016, active mobile money subscribers grew at a
compound annual growth rate (CAGR) of about 16%, from 19 million to 39
million (see Table 1.1). The usage metrics, measured by the number and value of
transactions as well as the balance on customer accounts, have grown at a CAGR
of over 26% [Pae17]. The mobile financial services sector in Kenya has
experienced significant growth and is widely viewed as a success story worthy
of being emulated across the developing world. As a result, similar products
have recently been launched in other countries across Africa, Asia, and Latin
America, with the intent of expanding financial services to low-income and
rural populations [WST10].
Table 1.1: The growth in number and value of transactions. (Central Bank of Kenya, 2016 [Ken16])
Measurement                        2011     2012     2013      2014      2015      2016      CAGR
Mobile money accounts (millions)   19       21       25        25        32        39        16%
No. of transactions (millions)     433      575      733       911       1,114     1,362     26%
Transaction value (KSh billions)   1,169    1,538    1,902     2,372     2,816     3,343     23%
Average transaction value (KSh)    2,700    2,672    2,594     2,604     2,528     2,454     -2%
No. of agents                      76,912   50,471   113,130   123,703   143,946   167,501   17%
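The CAGR column of Table 1.1 can be checked with the standard compound annual growth rate formula, CAGR = (end / start)^(1/n) - 1, where n is the number of yearly intervals (five between 2011 and 2016). The following is a minimal sketch in Python; the function name and the dictionary layout are illustrative assumptions, and only the 2011 and 2016 values from Table 1.1 are used.

```python
def cagr(start_value, end_value, periods):
    """Compound annual growth rate over `periods` yearly intervals."""
    return (end_value / start_value) ** (1.0 / periods) - 1.0


# (2011 value, 2016 value) as listed in Table 1.1; row labels are illustrative.
table_1_1 = {
    "Mobile money accounts (millions)": (19, 39),
    "No. of transactions (millions)": (433, 1362),
    "Transaction value (KSh billions)": (1169, 3343),
    "Average transaction value (KSh)": (2700, 2454),
    "No. of agents": (76912, 167501),
}

PERIODS = 2016 - 2011  # five yearly intervals

# Recomputed growth rates, expressed as percentages.
cagr_percent = {
    name: 100.0 * cagr(first, last, PERIODS)
    for name, (first, last) in table_1_1.items()
}

for name, pct in cagr_percent.items():
    print(f"{name}: {pct:.1f}%")
```

The recomputed rates agree with the printed CAGR column to within about one percentage point; small differences are to be expected because the table lists rounded yearly values.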
Studies have shown [Pae17] that the successful penetration of mobile money
services was typically accompanied by an increase in financial inclusion. For
example, in Uganda non-bank formal financial inclusion grew from 7% in 2009
when mobile money was first introduced to 49% in 2016 [Pae17; Uga16]. Sim-
ilarly, in Zimbabwe (Figure 1.2) between 2011 and 2014, financial inclusion
grew from 60% to 77% largely due to mobile money service [Tru15].
Source: Central Banks of Kenya, Tanzania and Uganda
Figure 1.2: Number of subscribers and transactions (2011-2016)
However, the level of evolution and uptake has varied by country. Kenya
has taken the lead in terms of uptake and more competitive pricing, while
Uganda lags behind in terms of available financial services. Tanzania, for its
part, is the first of the East African countries to implement interoperability
between mobile money operators [Pae17]. In 2014, owing to mobile money
services, sub-Saharan Africa accounted for 17% of the world's unbanked
population compared to 31% in South Asia and 24% in East Asia and Pacific [DK+15].
The key component driving the penetration of mobile money and electronic
payments is their ability to disaggregate or unbundle the services traditionally
offered by banks into less expensive and more accessible platforms [GV16].
1.2 Motivation
Predicting fraud in mobile money transfer services comes with a number of
challenges. These can be categorised into operational and technological
challenges. The operational challenges relate to processes such as regulatory
controls, controls on account cash in and cash out, and the customer
registration process, and to how these evolve over time. While it is critical
to take some of these challenges into account, they are not a direct output of
this research work. The technological challenges, on the other hand, concern
problems associated with the techniques and tools used in the literature to
address these challenges. Some of these challenges are discussed below.
1.2.1 Operational Challenges
Mobile Money Transfer (MMT) services are financial services provided by a
Mobile Network Operator (MNO) that enable the transfer of funds using a
digital equivalent of cash (electronic money) between service subscribers through
mobile channels [Zhd+14]. In developed countries MMT is merely seen as an
extension to existing banking services. By contrast, in developing countries,
where access to banking is often challenging for individuals and businesses,
mobile money transfer technologies are viewed as platforms with significant
strategic and societal value in extending financial inclusion to both unbanked
and under-banked populations. More than 2.5 billion adults globally lack a
formal bank account, the majority of them in developing countries. Furthermore,
approximately 68 percent of that population have access to a mobile phone
[Int13b]. In a 2013 Gartner report [She13], the worldwide market for MMT was
estimated to reach over 450 million subscribers in 2017, with a mobile
transaction value of more than $721 billion. The main drivers behind the
success of mobile money are the explosive growth in the number of mobile
devices and the drop in the cost of computing power, which has made mobile
phones more accessible [Int13a].
The ability of MMT to handle large numbers of small-value payments,
its suitability for transferring funds worldwide in digital currencies, and the
current absence of robust regulatory oversight make it both an attractive
target for attackers and fraudsters and an equally attractive vehicle for money
laundering [BD13]. While in most countries Anti-Money Laundering (AML)
and transaction fraud reporting is compulsory for service providers and
financial institutions [Zhd+14], in many of them existing AML legislation is not
presently fit to fully accommodate the relatively young m-money markets. The
absence of suitable oversight intensifies the exposure of MMT to risks,
including fraud, money laundering and other financial misuse. For example,
where proper controls are not deployed, fraudsters can get access to
MMT services without disclosing their identity to the MNO by taking advantage
of prepaid phones, "pooling" and delegation of mobile devices [Zhd+14;
Cha+11b]. "Since the success of any payment system is based on ubiquity,
convenience, and trust, it is necessary to address emerging risks in order to
maintain public confidence in mobile money" [Lui+15].
A crucial distinction is made at this point between capabilities for
investigating transaction fraud and those for identifying money laundering.
While transaction fraud is typically recognised as being commonly associated
with money laundering [Zhd+14], money laundering activity may technically
exist in the absence of transaction fraud, e.g. through the use of mule
accounts [Zhd+14]. Even more crucially, money laundering is process-driven,
whereas transaction fraud is event-driven. As a consequence, AML predictive
modelling is far more complex and computationally demanding than fraud
monitoring, and the selection of suitable Artificial Intelligence approaches
becomes significantly more challenging for AML. Therefore, this thesis
considers the development of monitoring and predictive models for transaction
fraud, with a view to supporting AML only indirectly.
One of the commonest approaches used today for mitigating illicit financial
activity is to impose transaction thresholds on different risk profiles [BH02].
Transactions that exceed these thresholds require extra scrutiny, whereby
the client needs to declare the origin of the funds [LRA14]. The thresholds
are usually set by law, with no distinction made between different economic
sectors or actors. Fraudsters can easily counter this by adapting their
spending behaviour to smaller-value transactions (a.k.a. smurfing) [Zhd+14].
Consequently, this and other similar methods used hitherto have proven
insufficient [LRA14; Mag09].
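The weakness of a fixed threshold can be illustrated with a small sketch. The threshold value, the function names and the amounts below are hypothetical and chosen only for illustration; they do not correspond to any legal limit or to the detection method developed in this thesis.

```python
THRESHOLD = 100_000  # hypothetical per-transaction reporting threshold


def flag_by_threshold(amounts, threshold=THRESHOLD):
    """Return the amounts that a fixed per-transaction rule would flag."""
    return [a for a in amounts if a > threshold]


def split_below_threshold(total, threshold=THRESHOLD):
    """Split `total` into chunks that never exceed the threshold (smurfing)."""
    chunks = []
    remaining = total
    while remaining > 0:
        chunk = min(remaining, threshold)
        chunks.append(chunk)
        remaining -= chunk
    return chunks


# The same total moved once, and moved as several smaller transfers.
single_transfer = [450_000]
smurfed_transfers = split_below_threshold(450_000)

flagged_single = flag_by_threshold(single_transfer)
flagged_smurfed = flag_by_threshold(smurfed_transfers)

print("single transfer flagged:", flagged_single)
print("smurfed transfers flagged:", flagged_smurfed)
```

The single transfer is flagged, whereas the split transfers move the same total without any individual transaction exceeding the threshold, which is why a per-transaction rule alone is easy to evade.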
Promising recent research has applied data mining based methods to the
detection of fraud [Phu+10]. Observations from this research show that
machine learning algorithms can identify novel methods of fraud by detecting
those transactions that are different (anomalous) in comparison to benign
transactions. Hence, several machine learning techniques have been used for
the detection of fraud [LRA14], and the application of machine learning to this
problem is advantageous in many situations [Yue+07; ZSY03]. Most of these
machine learning techniques are data-driven, typically requiring a significant
amount of historical financial transaction data [Sha+16].
The challenges in obtaining real-life financial transaction data sets for
research purposes are well known [BH02]. They include data protection,
confidentiality, and purpose and storage limitation, as well as the
requirements outlined in the GDPR principles [Com18]. Even where real-life data
sets are available, they may be small in size and lack information on
confirmed fraud cases and their possible taxonomies [Gor15]. The overall
scarcity of real data use cases in the academic literature clearly attests to
this [Gab+13]. Where historical datasets are unavailable for the reasons
described above, Phua et al. [Phu+10] suggest simulating synthetic transaction
data that closely matches real data as a solution. This can be achieved by
using real data as a property seed for the simulation, i.e. statistical
properties from small amounts of authentic data are used to generate large
amounts of synthetic data. This motivates part of our aim in simulating mobile
money transfer transaction data for the purpose of evaluating the proposed
prediction model.
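The idea of using a small authentic sample as a property seed can be sketched as follows. This is a deliberately simplified illustration, not the simulation used in this thesis: the seed amounts are made up, the lognormal distribution is an assumed model for transaction amounts, and all names are hypothetical.

```python
import math
import random
import statistics

# Hypothetical small sample of authentic transaction amounts (the "seed").
seed_amounts = [120.0, 250.0, 300.0, 80.0, 1500.0, 60.0, 400.0, 950.0]

# Statistical properties extracted from the seed: mean and spread of log-amounts.
log_amounts = [math.log(a) for a in seed_amounts]
mu = statistics.mean(log_amounts)
sigma = statistics.stdev(log_amounts)


def generate_synthetic_amounts(n, mu, sigma, rng):
    """Draw n synthetic amounts from a lognormal fitted to the seed sample."""
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


rng = random.Random(42)  # fixed seed so the sketch is reproducible
synthetic_amounts = generate_synthetic_amounts(10_000, mu, sigma, rng)

# Compare a property of the synthetic data with the same property of the seed.
synthetic_mu = statistics.mean([math.log(a) for a in synthetic_amounts])
print(f"seed log-mean: {mu:.3f}, synthetic log-mean: {synthetic_mu:.3f}")
```

A realistic generator would also need to reproduce other properties of the seed, such as transaction frequency per user and the mix of transaction types; the sketch covers a single attribute only.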
1.2.2 Technological Challenges
In general, a fraud detection system relies on the analysis of recorded
transaction data for the purpose of identifying unusual transactions. These
transaction data can be enormous and are mainly composed of a number of
attributes such as account identifier, transaction date, recipient, amount of
transaction and more. As a result, the use of automatic systems is essential,
since it is not always possible or easy for human analysts to detect fraudulent
patterns in transaction datasets by manually checking all transactions [Poz15].
A classic approach to such a system is an expert-based approach that
uses experience, intuition and domain knowledge from fraud analysts to define
rules that are used to predict the probability that a new transaction is
fraudulent [BVV15]. For example [BVV15], let us assume we have a rule-based
fraud detection system for an insurance company. The expert rule can
be "IF: Amount of claim is above threshold OR Severe accident, but no police
report OR Multiple receipts submitted, THEN: Flag claim as suspicious AND
Alert fraud investigation officer". Such an expert system relies on human
expert input, evaluation and monitoring, and therefore suffers from a number
of disadvantages. These systems are expensive to build, since they require
advanced manual input by fraud experts. They often turn out to be difficult to
maintain and manage, as they become obsolete due to fraud evolution (i.e.
when fraudsters change their modus operandi, they become undetectable by
the current rules) or changes in the behavioural patterns of customers.
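The expert rule quoted above can be written down directly as code, which also makes its rigidity visible. The sketch below is illustrative only: the field names, the threshold value and the reading of "multiple receipts" as more than one receipt are assumptions, not part of [BVV15].

```python
from dataclasses import dataclass


@dataclass
class Claim:
    amount: float
    severe_accident: bool
    police_report: bool
    receipts_submitted: int


AMOUNT_THRESHOLD = 10_000.0  # hypothetical value set by a fraud expert


def is_suspicious(claim, amount_threshold=AMOUNT_THRESHOLD):
    """Apply the three OR-ed conditions of the expert rule."""
    return (
        claim.amount > amount_threshold
        or (claim.severe_accident and not claim.police_report)
        or claim.receipts_submitted > 1
    )


def process(claim):
    """Return the actions triggered by the rule for one claim."""
    if is_suspicious(claim):
        return ["flag claim as suspicious", "alert fraud investigation officer"]
    return []


routine = Claim(amount=1_200.0, severe_accident=False,
                police_report=False, receipts_submitted=1)
unreported = Claim(amount=1_200.0, severe_accident=True,
                   police_report=False, receipts_submitted=1)

print(process(routine))
print(process(unreported))
```

Every condition and threshold here has to be chosen and revised by hand, which is the maintenance burden described above: a fraudster who keeps claims under the threshold and submits a single receipt passes unnoticed until an expert rewrites the rule.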
As an alternative to such expert systems, automated approaches (such as
statistical and machine learning techniques) that leverage the recorded
transactions for more efficient data monitoring and analysis are used.
These automated approaches are data-driven and are able to learn from the
data in a supervised or unsupervised manner for the purpose of identifying
patterns that are most probably related to fraudulent behaviour [Poz15]. These
data-driven approaches are able to learn complex fraudulent configurations,
ingest large volumes of data and adapt to changing distributions in the case
of fraud evolution. However, they come with some drawbacks, such as [Poz15;
Sha+16]: (i) in the absence of a significant amount of historical data, they
tend not to perform well; (ii) some models are black boxes, i.e. they are not
easily interpretable by investigators and thus do not provide an understanding
of the reason why an alert is generated.
A case-based reasoning (CBR) method, as an alternative to the aforementioned
methods, comes with a number of advantages when applied to the field of
financial transaction fraud. For example [Sha+16; PDM15], case-based reasoning
has the ability to: (i) learn in the absence of historical consumption
data while continuously improving as more data becomes available over
time; (ii) realize knowledge transfer as spending habits evolve, as is the case
where information on one transaction is exploited to improve predictions for
different yet similar transactions; (iii) provide precedent-based justification
instead of justifying a solution by showing a trace of the rules that led to the
decision [CK91; Wat99]. In addition, compared to expert systems, a CBR system
is faster to construct, easier to maintain and able to cope with complex
structures. This is an advantage in comparison with ANNs, which use
numeric input or symbolic patterns to deal with the complexity of structures
[PH07; Kap12]. Additionally, a CBR system is more transparent than black-box
models such as neural networks [Sha+16], which makes it easier to apply in
practice.
Leveraging the advantages above, a case-based reasoning methodology is
considered suitable for analysing complex structures (such as evolving
genuine and fraudulent behaviours) in mobile money transfer payment services.
However, to design a fraud detection system using a case-based reasoning
methodology, a number of requirements need to be considered. These include:
(i) whether to use a supervised or unsupervised approach; (ii) the mechanism
for feature selection/dimensionality reduction on sample sets; (iii) the
unbalanced dataset problem associated with financial transaction data. All
these requirements are investigated in detail in this thesis, as further
discussed in Chapter 3.
1.3 Research Questions
Based on the motivations above, this research investigates whether the CBR
methodology can be used for the effective prediction of mobile payment fraud.
Thus, the main research question in this thesis can be stated as follows:
How can the Case-based Reasoning (CBR) methodology be used
for effective analysis and prediction of transaction fraud in mobile
money transfer (MMT) networks?
This main research question is further divided into three sub-questions, as
addressed in this thesis.
1. How can a model developed through the CBR approach be used for ef-
fective analysis of MMT transaction fraud? Achieving this will support
feature engineering of systems that will deliver improved predictive ac-
curacy.
2. To what extent can such a model deliver measures/metrics for prediction
of MMT fraud? This will involve the similarity measures used and the
performance from these predictions.
3. What are the limitations of such a model and the performance to expect
from it?
Finding answers to these research questions will help to address the
implementation, validation and evaluation of the proposed CBR approach. One
major concern in building the proposed model and evaluating its performance
under realistic conditions is the challenge of obtaining a mobile money
payment dataset, due to privacy issues. This leads to an additional question:
How can background transaction data be obtained as training and test cases
for pattern analysis and the evaluation of learning algorithms? Addressing
this challenge will support the evaluation of the proposed prediction model.
1.4 Research Methodology
The research methodology adopted in this thesis can be divided into three
major steps, as shown in Figure 1.3:
Figure 1.3: Illustration of the research methodology
As seen in Figure 1.3 above, the first step in this thesis was to carry out
an investigation into the literature so as to provide answers to the stated
research questions. To investigate the state of the art, a narrative review
was carried out with a view to providing a comprehensive overview of topics
such as the mobile money transfer services environment. This was done by
providing basic knowledge on fundamentals of mobile money such as the
business models, ecosystem and regulatory issues. Investigation was also
carried out into the existing approaches used in the literature for dealing
with the lack of publicly available mobile money transfer datasets. Also,
common algorithmic solutions used for financial service fraud detection in
the literature were studied. Different applications of supervised machine
learning approaches exist in fraud detection problems, and the common
approaches used for handling the unbalanced data problem in supervised
learning were investigated. The evaluation techniques for measuring a fraud
detection system effectively were also investigated. Lastly, other
significant areas of this research were investigated and highlighted.
The second step is the generation of a synthetic mobile money transfer (MMT)
dataset. Due to the absence of a real MMT transaction dataset, the need for a
simulated dataset was identified and the methodology for simulating a
synthetic mobile transfer dataset in [LKJ02] was adopted. This is based on
the fact that this method has a well defined interface, which makes it easy
to use, i.e. the whole process is divided into steps, which makes it possible
to use the whole or part of the system for data simulation (as discussed in
Section 2.3.2). To run the simulation, the multi-agent based simulator
(MASON) used in [LrA12b] was adapted. The rationale for using MASON was that
it is fast and supports discrete event interaction between many agents in a
swarm, i.e. it facilitates the implementation of social networks, as
discussed in Section 2.3.4. In order to evaluate the simulated dataset,
verification and quantitative (chi-square test) methods were used, since
there are no real data as input to the simulator, as discussed in Subsection
4.4. After the data simulation, the proposed CBR system was designed and
developed using the jCOLIBRI framework [RGGCDA14], a Java framework that
supports rapid prototyping of a CBR system as well as its development and
deployment in real scenarios. As part of an improvement to the CBR system, it
classifies features in the MMT transaction dataset into five contexts and
then recombines them into a single dimension to capture user behaviour
effectively.
The third step is the simulation of the proposed CBR prediction algorithms.
At this stage different sets of experiments were designed and conducted. The
process followed an evolutionary approach in order to evaluate the research
methodology. As a result, the first set of experiments started with the
application of the basic CBR technique, as discussed in Section 5.2, on a
minimalistic simulated dataset. The experiments then progressed to the
application of weighted CBR techniques using machine learning capabilities
(a genetic algorithm) for assigning parameter weights and automating the
random selection of the k-value in the CBR k-NN algorithm. In addition, for
the purpose of capturing user behaviour effectively and improving the CBR
prediction accuracy, feature engineering was carried out as discussed in
Section 5.3.
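The weighted k-NN retrieval step described above can be illustrated with a
minimal sketch. It is not the jCOLIBRI implementation used in this thesis:
the feature names, weights, value ranges and cases below are hypothetical,
the weights are fixed by hand rather than learned by a genetic algorithm, and
the local similarity (one minus the range-normalised absolute difference) is
only one common choice.

```python
from collections import Counter


def weighted_similarity(query, case, weights, ranges):
    """Weighted average of per-feature local similarities, each in [0, 1]."""
    total = 0.0
    for name, weight in weights.items():
        # Local similarity: 1 minus the range-normalised absolute difference.
        local = 1.0 - abs(query[name] - case[name]) / ranges[name]
        total += weight * max(0.0, local)
    return total / sum(weights.values())


def predict(query, case_base, weights, ranges, k):
    """Retrieve the k most similar cases and return their majority label."""
    ranked = sorted(
        case_base,
        key=lambda c: weighted_similarity(query, c["features"], weights, ranges),
        reverse=True,
    )
    votes = Counter(c["label"] for c in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical weights, ranges and cases, for illustration only.
weights = {"amount": 0.7, "hour": 0.3}
ranges = {"amount": 1000.0, "hour": 23.0}
case_base = [
    {"features": {"amount": 20.0, "hour": 10}, "label": "genuine"},
    {"features": {"amount": 35.0, "hour": 14}, "label": "genuine"},
    {"features": {"amount": 900.0, "hour": 2}, "label": "fraud"},
    {"features": {"amount": 950.0, "hour": 3}, "label": "fraud"},
    {"features": {"amount": 50.0, "hour": 12}, "label": "genuine"},
]
query = {"amount": 920.0, "hour": 1}
print(predict(query, case_base, weights, ranges, k=3))
```

In this sketch the weights and k are the two quantities that a search
procedure such as a genetic algorithm would tune against a validation set.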
Next, in order to demonstrate the applicability of the proposed model, more
sophisticated simulated datasets of both high complexity and significant size
were used. The rationale was to ensure that the proposed approach achieves a
promising classification precision. To ensure that the CBR model is not
biased by an unbalanced dataset or by the use of the same transaction data in
both the training and testing phases of the experiment, the following were
carried out: (i) data balancing using a hybrid approach that combines
oversampling and under-sampling algorithms (SMOTE+Tomek-Link) was adopted.
According to Chawla et al. in [Cha+02; GS17], this hybrid approach works
better than either one alone. (ii) The MMT dataset was split into training
and testing sets using a ratio of 70:30 respectively. In addition, to avoid
an overoptimistic estimate of the CBR model performance after the dataset
split, transactions from known compromised MMT accounts were removed from the
subsequent split as proposed in [Fab+17]. For example, when an MMT account is
already associated with a fraudulent transaction in the training set, its
transactions are removed from the test set. Furthermore, a clustering
technique was applied to the CBR retrieval process to reduce the computation
cost associated with the use of genetic algorithms for feature weighting.
Clustering has been widely used in various fields in the literature to
improve the classification accuracy and computation cost of learning
algorithms [TD13] as the case library grows.
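The 70:30 split and the subsequent removal of compromised accounts from the
test set can be sketched as follows. This is an illustration under stated
assumptions: the record fields (account, is_fraud) and the sample data are
hypothetical, the split simply preserves the input order, and the
SMOTE+Tomek-Link balancing step is not shown.

```python
def split_70_30(transactions):
    """Order-preserving split: first 70% for training, the rest for testing."""
    cut = len(transactions) * 7 // 10
    return transactions[:cut], transactions[cut:]


def drop_compromised(train, test):
    """Remove from the test set every transaction whose account already has
    a fraudulent transaction in the training set."""
    compromised = {t["account"] for t in train if t["is_fraud"]}
    return [t for t in test if t["account"] not in compromised]


# Hypothetical transactions, for illustration only.
transactions = [
    {"account": "A", "is_fraud": False},
    {"account": "B", "is_fraud": True},
    {"account": "C", "is_fraud": False},
    {"account": "A", "is_fraud": False},
    {"account": "D", "is_fraud": False},
    {"account": "C", "is_fraud": False},
    {"account": "E", "is_fraud": False},
    {"account": "B", "is_fraud": True},
    {"account": "A", "is_fraud": False},
    {"account": "F", "is_fraud": True},
]
train, test = split_70_30(transactions)
clean_test = drop_compromised(train, test)
print([t["account"] for t in clean_test])
```

Account B is flagged as fraudulent in the training portion, so its later
transaction is dropped from the test portion, while account F, whose fraud
appears only in the test portion, is kept.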
As evaluation metrics, four performance measures were used, namely: recall,
F-measure, Matthews Correlation Coefficient (MCC) and Area Under the Curve
(AUC), as discussed in Section 5.6. These were chosen because they handle
imbalanced data without becoming biased towards the majority class and are
well suited to the fraud detection domain. Finally, the differences in the
results are analysed both qualitatively and quantitatively.
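As an illustration of three of these measures, the sketch below computes
recall, F-measure and MCC from the four cells of a confusion matrix using
their standard definitions. The counts are hypothetical and are not results
from this thesis; AUC is omitted because it requires ranked prediction scores
rather than counts.

```python
import math


def recall(tp, fn):
    """Fraction of actual fraud cases that were detected."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Fraction of raised alerts that were actual fraud."""
    return tp / (tp + fp)


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient, in the range [-1, 1]."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical confusion matrix for 1000 transactions, 10 of them fraudulent.
tp, fn, fp, tn = 8, 2, 4, 986
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy, recall(tp, fn), f_measure(tp, fp, fn), mcc(tp, tn, fp, fn))
```

With these counts plain accuracy is above 0.99 even though two of the ten
fraud cases are missed, which is why measures that focus on the minority
class are preferred for imbalanced data.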
1.5 Contribution to Knowledge
The work that will be discussed in this thesis presents the following contribu-
tions to knowledge.
First, this thesis proposes a novel approach to detecting fraud in mobile
money payment networks using the case-based reasoning methodology. This work
not only presents the use of the CBR approach in the financial transaction
fraud detection domain but also introduces the novelty of feature engineering
by classifying features into contexts of information. This approach allows
users' behavioural patterns to be captured effectively and also helps to
improve the prediction accuracy of the proposed CBR system.
An additional value offered in this work is the injection of variations of
known frauds into the CBR system simulation. The general binary annotation
for the fraud class (i.e. 1) in the literature was further annotated and
introduced into the mobile money transfer dataset using the three fraud case
scenarios presented in the data simulation. This approach allows the
investigation and analysis of how this variation affects the proposed
system's performance parameters, such as its detection rate for each fraud
class.
The final contribution in this thesis is the extension of the synthetic fraud
data generation methodology in [LKJ02] to accommodate the use case of mobile
money transfer services. The methodology was used to expand and simulate
additional use-case scenarios which were hitherto not considered in the
literature for mobile money transfer datasets. The use of these simulated
datasets in the absence of real data will allow the development and
evaluation of fraud detection techniques or tools.
1.6 Thesis Structure
Chapter 2 provides a general overview of the mobile money transfer services
environment, by providing basic knowledge on fundamentals of mobile money
such as the business models, ecosystem and regulatory issues. It also
discusses the existing approaches used in the literature for dealing with the
lack of publicly available mobile money transfer datasets.
Chapter 3 discusses common algorithmic solutions used for financial service
fraud detection in the literature. It further discusses the application of
supervised approaches in fraud detection problems and the common approaches
used for handling the unbalanced data problem in supervised learning. The
evaluation techniques for measuring a fraud detection system effectively are
also discussed.
Chapter 4 contains discussion on the generation of synthetic mobile money
transfer dataset. It provides discussion on the data simulation model as well
as the different misuse scenarios used. This chapter also discusses the imple-
mentation, simulated scenario and evaluation of the generated dataset. The
results from the simulated data were analysed and presented and then used
for evaluating the proposed models.
Chapter 5 provides an extensive description of the methodology adopted for
predicting money transfer fraud in mobile money services. It also contains
the rationale behind the conducted experiments as well as the case
representation used for the similarity measures. The chapter further
discusses the enhancement of the basic CBR model using machine learning
capabilities to improve its effectiveness in predicting money transfer fraud.
Finally, the data pre-processing and performance evaluation metrics used in
this thesis are discussed.
Chapter 6 discusses the results from the model evaluations using datasets of
different volumes. This involves the implementation of a set of experiments
in an evolutionary approach, starting from simplified experiments and moving
to more complex ones. It also discusses the rationale for the use of
clustering to further enhance the performance of the proposed CBR model.
Chapter 7 concludes this thesis by summarising the findings and highlighting
the main contributions with respect to the thesis objectives. Future plans
for further work in the field of mobile money transfer fraud are also
explored.
1.7 Publications and Research Activities
The list of works published during this thesis is summarized as follows.
Peer Reviewed International Conference Papers
1. Adeyinka Adedoyin, Stelios Kapetanakis, Georgios Samakovitis, and Mil-
tos Petridis (2017). Predicting Fraud in Mobile Money Transfer Using
Case-Based Reasoning. In: 37th (SGAI) International Conference on
Artificial Intelligence, AI-2017, Cambridge, UK (2017). Won the Best
Student Paper.
2. Adeyinka Adedoyin, Stelios Kapetanakis, Georgios Samakovitis, and Mil-
tos Petridis (2017). Fraud Detection in Mobile Payment Transfer. In:
22nd UK Symposium on Case-Based Reasoning (UKCBR2017), Cambridge,
UK (2017).
3. Adeyinka Adedoyin, Stelios Kapetanakis, Miltos Petridis, and Emmanouil
Panaousis (2016). Evaluating Case-Based Reasoning Knowledge Discov-
ery in Fraud Detection. In: 24th International Conference on Case-Based
Reasoning (ICCBR2016) Workshop proceedings, Atlanta Georgia, USA,
October, 2016.
Chapter 2
Background on Mobile Money
Transfer Operation and Dataset
In the previous chapter, types of mobile money financial services were
identified and the rationale for selecting mobile payments transfer as the
area of study in this thesis was discussed. An overview of mobile money
transfer was given, together with the motivation for this thesis. This
chapter provides basic knowledge on the fundamentals of mobile money: mobile
money transfer in Section 2.1, and business models, the ecosystem, regulatory
issues and mobile money in emerging countries in Section 2.2. Then the
approach that has been adopted for dealing with the lack of publicly
available mobile money transfer datasets is presented in detail in Section
2.3, together with a few relevant works in the literature where financial
transaction datasets were simulated. Section 2.4 concludes the chapter by
summarising the topics related to the research questions in this thesis and
the research gaps to be addressed.
2.1 Mobile Money Transfer
According to Zhdanova et al. [Zhd+14], a Mobile Money Transfer (MMT) service
can be defined as a financial service provided by a Mobile Network Operator
(MNO) that enables the transfer of funds (mMoney) between service subscribers
through the use of mobile channels. In an MMT service, a mobile subscriber
can add electronic money, called mMoney, to his or her virtual mobile account
(mWallet) and store it for later use, transfer it to other mobile subscribers
or purchase goods via mobile phone. The receiver can inexpensively convert
this credit back into cash through a retailer, such as a local corner shop
acting as a bank branch.
A mobile money transfer service allows users to send cash using SMS
technology, thereby avoiding inconvenient and costly transfer methods such as
physical travel, the mail, or traditional wire transfer services like Western
Union and Postapay, which are often carried out in banks. Examples include
payments for services like electricity and water, for which people would
otherwise need to travel long distances and may end up in long queues at the
bank. To deposit funds into a mobile money account, consumers go to a
participating local shop (retailer) and hand over physical money. There is no
charge to a customer for depositing funds into his/her account, but a sliding
tariff is levied on withdrawals from the account. A subscriber who sends
mMoney is charged a flat fee if sending to another registered user and a
sliding fee if sending to a mobile subscriber that is not registered with the
same MMT service provider [ML10; WST10]. Figure 2.1 shows a person-to-person
fund transfer using an MMT service [Zhd+14].
Figure 2.1: P2P funds transfer using MMT service
From Figure 2.1, in order to access MMT services, e.g. to perform a P2P
transfer, Jane must register at an authorized MMT retail agent outlet R1
(e.g. M-PESA). She then gets an individual electronic money account (mWallet)
that is managed by the MNO (e.g. Safaricom), which in turn deposits the full
value its customers store in M-PESA accounts in a pooled account at a
regulated bank. Thus, the issuer of M-PESA accounts is Safaricom, but the
value in the accounts is entirely backed by highly liquid deposits at a
commercial bank. Jane then converts her cash into mMoney and deposits the
amount into her mWallet account with the help of retailer R1. Then, Jane can
use her mobile device to transfer mMoney to Frank if he is subscribed to the
same MMT service. On receiving the transfer, they both get an SMS receipt as
confirmation. Frank can then withdraw cash from his mWallet at retailer R2.
Since the sender and recipient receive an SMS receipt as proof of transfer
after each transaction, this has helped to build more trust in the system
even if a customer doesn't trust the local agents themselves [Zhd+14]. The
next section discusses the different business models that can be adopted to
build a mobile money transfer service.
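The cash-in, P2P transfer and cash-out flow described above can be sketched
as a toy ledger in which the total mMoney in all mWallets always equals the
cash held in the pooled bank account. The class and method names are
hypothetical, fees and agent commissions are ignored, and the amounts are
illustrative only.

```python
class MMTService:
    """Toy ledger: mWallet balances backed one-to-one by a pooled bank account."""

    def __init__(self):
        self.wallets = {}     # subscriber -> mMoney balance
        self.pooled_bank = 0  # cash held by the MNO at the partner bank

    def register(self, subscriber):
        # Registration at a retail agent opens an empty mWallet.
        self.wallets.setdefault(subscriber, 0)

    def cash_in(self, subscriber, amount):
        # A retailer takes cash; the wallet is credited with mMoney.
        self.wallets[subscriber] += amount
        self.pooled_bank += amount

    def transfer(self, sender, receiver, amount):
        # P2P transfer between subscribers of the same service.
        if receiver not in self.wallets:
            raise ValueError("receiver is not subscribed to this service")
        if self.wallets[sender] < amount:
            raise ValueError("insufficient mMoney")
        self.wallets[sender] -= amount
        self.wallets[receiver] += amount
        return f"SMS receipt: {sender} sent {amount} to {receiver}"

    def cash_out(self, subscriber, amount):
        # A retailer hands over cash; mMoney leaves the system.
        if self.wallets[subscriber] < amount:
            raise ValueError("insufficient mMoney")
        self.wallets[subscriber] -= amount
        self.pooled_bank -= amount


service = MMTService()
service.register("Jane")
service.register("Frank")
service.cash_in("Jane", 100)                     # Jane deposits cash at R1
receipt = service.transfer("Jane", "Frank", 40)  # both parties get an SMS
service.cash_out("Frank", 40)                    # Frank withdraws at R2
print(receipt, service.wallets, service.pooled_bank)
```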
2.2 Mobile Money Business Model
In order to build a mobile money transfer service as discussed above, a
business model is required. Lurie in [Lur11] defined a business model as a
way of designing a business to create, deliver and capture value. According
to [AHG14], mobile money is growing rapidly around the world and requires a
supportive business model as the different key players try to leverage their
interests in the service. These range from financial institutions trying to
leverage new technologies and channels, to mobile operators looking to
augment their revenue opportunities, and third-party vendors attempting to
take advantage of business opportunities presented by new technologies and
changes in consumer behaviour. There are three core business models in mobile
money payment:
• MNO-Centric: The Mobile Network Operator (MNO) takes the lead
and provides various financial services, initially outside the banking
system. In extreme cases the MNO could acquire a banking license that
would allow it to store the deposits made into the system [AHG14].
• Bank-Centric: A bank takes the lead and finds an MNO with which
to partner. The bank uses mobile phone platforms to leverage its
credibility and expertise in extending its existing and new channels
[AHG14].
• Collaborative (including third-party players): In the collaborative
model an MNO and a bank join forces to create an m-money service. In
the third-party-led case, vendors create solutions that allow
cross-operator and cross-bank solutions to be launched in the market
[Lak13].
Examples of how the various actors in the value chain have implemented these
business models are shown in Table 2.1 [AHG14].
Table 2.1: Mobile payments business model
Business Model | Technologies Used | Purchase Relationship | Charged to | Examples
Financial Institution Led | NFC, Internet, External Device | Consumer to Business | Bank Account, Debit Card, Credit Card, Prepaid Card | Barclay NFC Tag
Mobile Operator Led | SMS, USSD, Internet, NFC | Customer to Business, Business to Business, Peer to Peer | Network Bill, Debit Card, Credit Card | M-Pesa, Felica
Third Party Led | Internet, NFC, External Device | Customer to Business, Business to Business, Peer to Peer | Debit Card, Credit Card, Prepaid Card | Paypal, Square, Google Wallet
Over time, as mobile money technology keeps evolving, the business models
will experience some variations. For instance, an MNO-centric venture could
evolve over time by increasing its partnership with banks and possibly
develop into a collaborative model. A good example of this is M-PESA in
Kenya. This shows that the models are dynamic, and they can be linked to
certain stages of financial development in each country [Ste11]. However, the
implementation of the aforementioned business models involves partnership
between different stakeholders. The major stakeholders are discussed in the
next section.
2.2.1 Mobile Money Transfer Eco-system
Jenkins in [Jen08] defined the mobile ecosystem as the networks of
organizations, individuals, processes and systems that link and facilitate or
control the delivery of payment systems. Nazareno, the president of Smart
Communications, highlights that the mobile money ecosystem works on three
rules: partnership, partnership and partnership. This encourages the creation
of a mesh of partnerships covering various networks of relationships [Jen08].
The key players in the mobile money transfer ecosystem are the consumers
(end-users and service providers), distribution channels (retailers and
wholesalers), the network operator (MNO), commercial banks and the central
bank [Rie+13]. The activities of these key players within the ecosystem were
used to simulate the mobile money transfer dataset as discussed in Chapter 3.
Figure 2.2: Mobile money transfer service ecosystem [Rie+13]
Figure 2.2, which is adapted from [WST10], shows the roles of the various key
players in the mobile money transfer ecosystem. This outlines the major use
cases in the MMT ecosystem for the purpose of data simulation as further
discussed in Chapter 3. Each of the key players is discussed as follows:
A. Mobile Network Operator
The Mobile Network Operator (MNO) issues mMoney (m) in partnership with a
private bank and regularly produces compliance reports for the Central Bank,
which is responsible for the country's monetary policy [Rie+13]. The role of
the MNO in the mobile money ecosystem is critical, as it plays the leadership
role by drawing the different stakeholders in the ecosystem together. The MNO
provides infrastructure such as wireless communication, back-end servers and
the mobile application for the operation of the ecosystem. In addition, MNOs
bring their large existing distribution channels and subscriber bases into
the ecosystem. Wherever there is mobile coverage, there is an agent of a
distributor that sells prepaid credits. The geographical distribution of the
agents gives the MNO the ability to reach customers across all income
segments. This, coupled with ownership of the infrastructure, enables the MNO
to be the key player in the mobile money ecosystem. MNOs also play key roles
in further ecosystem expansion and in training agents to deal with consumers.
However, they lack experience in financial services, payment risk, and the
regulatory and legal governance of the payment system [Jen08; Tob11].
B. Financial Institutions (Partner Banks)
The financial institution provides the banking license and helps to store the
mobile money that customers keep in their mWallets. It also brings its vast
experience and customer trust in dealing with eMoney while acting as an
intermediary between the MNOs and agents in acquiring the eValue. The branch
offices of the banks act as aggregation points for the merchants,
distribution channels and their agents in facilitating the flow of money in
the mobile money ecosystem. The bank provides financial regulatory advice to
the MNOs and also online banking integration with the m-commerce system of
the MNOs to facilitate their operations [Tob11].
C. Distribution Channels (Agents)
The distribution channels (agents) are primarily the consumer-facing
touchpoint and can often be seen as the "face" of the mobile money offering
[SCP16]. The distribution channels are non-bank entities, such as MNO retail
shops, village corner stores or a mix of both, that handle customer
registration, cash-in/cash-out services and other transactions on behalf of
the MNO. Through their knowledge and understanding of the consumers, they
help to educate customers, maintain liquidity, handle account opening
procedures and report suspicious transactions in line with regulatory
requirements. The agents earn commissions based on the amount of mobile money
trading they undertake, which are usually very small amounts per transaction.
However, it is expected that the volume of transactions will add up to an
amount sufficient to sustain their retail business [Muy15; Tob11].
D. Service Providers (Merchant and Utilities)
The adoption of mobile money platforms as a means of receiving payment by
service providers enables convenient and timely payments for both the
merchant and customers. In Kenya and Ghana, for example, subscribers of
popular pay-per-view TV services use mobile money (M-PESA and ZAP) to pay
their subscription fees rather than queuing up to make such payments. Also,
the adoption of mobile money platforms can lead to an increased customer base
for the mobile money ecosystem, thereby acting as a catalyst in promoting the
service [Tob11].
E. Regulators
The function of regulators in mobile money is to provide an enabling
environment, protect the stability of the financial system, ensure the
implementation of regulations and facilitate innovation. The development of
mobile money cuts across two regulatory bodies in most countries:
telecommunications and banking. This has brought about competition and
unclear functions between the two bodies. As a result, many countries have
not yet developed mobile money regulations and policies. There is therefore a
need to clarify and understand the relationships between the actors within
the mobile money ecosystem so as to ensure improved efficiency and clear
regulatory policies. This also gives rise to a need for converged regulation
of both technology standards and policy, which is slowly coming to the
attention of regulators globally. This proposed collaboration requires
careful balancing with national interest [LD12; Tob11].
F. Customers
In the mobile money ecosystem, customers are the final recipients, and it is
therefore important that effective and efficient services are made available
by all participating mobile service providers. The use of mobile money
payment reduces the risk of carrying cash and increases access to payments,
remittances and other financial services for customers, particularly in
developing markets [Tob11].
2.2.2 Mobile Money in Emerging Economies
Globally, the main drive behind the success of mobile money is the explosive
growth in the number of mobile devices and the drop in the cost of computing
power. This has enabled millions of people to own mobile devices [Int13a].
Mobile money in emerging countries is more than just a technology because it
brings financial inclusion to the world's unbanked and under-banked
population. In developing countries [Int13b], more than 2.5 billion people
lack a formal bank account, but about 68 percent of this population have a
mobile phone. The reasons for the high percentage of adults without a bank
account range from lack of money to use one, to distance from banking
facilities (especially in rural areas) and the high cost of banking services.
Africa has the highest growth rate in mobile phone usage, with increasing
mobile coverage. This has allowed millions of people in remote areas to adopt
mobile money as an alternative to traditional banking, especially in
Sub-Saharan Africa [Ken16]. In Kenya, as in most low-income African
countries, a great number of households depend on domestic remittances, and
their use of M-PESA makes it the biggest success story in terms of overall
usage. Since the launch of M-PESA in Kenya in 2007, the mobile money industry
has continued to grow in many African countries and today it has reached a
level of sophistication not seen anywhere else in the world [Int13b; Zha12].
Examples of a few mobile money applications in emerging economies are shown
in Table 2.2 [Int13b].
Table 2.2: List of mobile money applications in emerging economies
M-money Application | Countries Implemented | Main Features
M-PESA | Kenya, Tanzania, South Africa and Afghanistan | P2P transfer, Pay school fees, Pay electricity, Pay for goods and services.
Easypaisa | Pakistan | Pay utility bills, Make P2P transfer, Increase airtime credits, Save money, Pay for goods and services.
T-Cash | Haiti | Receive salary, Make P2P transfers, Pay bills.
Globe GCash | Philippines | Pay utility bills, Make P2P transfers, Use as a mobile wallet, Increase airtime credits, Pay for goods and services.
Airtel Money | India and 14 African countries including Uganda, Tanzania and Kenya | Make P2P transfers, Pay for goods and services, Bill payments.
MTN Mobile Money | Africa, including Uganda, Ghana, Cameroon, Ivory Coast, Rwanda and Benin | P2P transfers, Buy airtime, Check balances, Pay utility bills.
EKO | India | Make P2P transfers, Bill and loan payments.
WIZZIT | South Africa | P2P transfers, Buy airtime, Check balances and statements, Pay electricity.
The level of MMT evolution and uptake has varied by country bringing
about various regulatory issues. The next Section discusses some of the rele-
vant regulatory issues.
2.2.3 Regulatory Issues
As discussed in Section 2.2.2, MMT use in emerging economies is growing
rapidly and, as a result, comes with some regulatory issues. In general, the
regulatory space for mobile money services encompasses both
telecommunications regulators and central banks. The role of regulators in
the mobile money ecosystem is critical for its long-term survival. This
underscores the need for partnership and collaboration between both sectors
in order to mitigate the risks for the consumer. The primary goal of
regulation is to enforce compliance with the various regulations so as to
safeguard the interests of the consumer, enhance trust in the payment system
and ensure that participants have effective means for identifying, measuring
and managing business risk [Int13b; Tob11]. For example, the Philippines
regulates MNOs that provide mobile money services; subscribers are required
to register in person with the service providers using valid photographic
identification before they can put cash into their mobile accounts or
withdraw cash. The regulations also limit how much money a subscriber can
transfer at any one time, or during a day or month. In some developing
countries there are no clear regulatory frameworks for mobile financial
services. Kenya and Cambodia have not issued specific regulations but
nevertheless allow MNO-centric models on an ad hoc basis through "no
objection" letters and conditional approvals or other means [Int13b].
According to the report in [Int13b], the regulation of mobile money services
can differ from country to country depending on the business model adopted.
For example, in a country where only banking institutions are allowed to
handle cash in the context of mobile money transfer, it will be difficult to
outsource that cash handling function to a service provider outside the
financial institution. Also, whether or not a company offering mobile money
services should be regulated as a bank is another challenge. This is the case
in Pakistan, unlike in Kenya. The emergence of mobile money remittances has
brought about great concern over some regulatory issues which are outside
traditional financial institution regulations. Figure 2.3 shows a graphical
representation of some of the observed concerns.
Figure 2.3: Regulatory issues
2.3 Mobile Money Transfer Synthetic Data
In the absence of a real transaction dataset, for the reasons mentioned in
Section 1.2.1, the need to simulate synthetic transaction data arises, as
proposed by Phua et al. in [Phu+10]. Lundin et al. [LKJ02] defined synthetic
data as data that is generated by humans using simulated users in a simulated
system to perform simulated actions. Simulated actions can be carried out by
an agent or program that performs actions according to a specification
created by the experiment organizers, reflecting the desired behaviour of the
system. A synthetic data simulation can be implemented either by simulating
user behaviour using software automation or by hiring people to generate
background data and attacks. In addition to carrying out the synthetic
simulation, a choice needs to be made between using a fully simulated system,
a real system or a mix of real and simulated system components [LKJ02].
In recent years, research areas such as data mining [Jes+05], artificial
intelligence [RN03] and process mining [Rie+13] have paid increasing attention
to developing data generation systems that systematically generate synthetic
data for numerous applications. This can be attributed to the difficulty of
obtaining real-life data, or of modifying its properties, for evaluating
various algorithms [PZ06]. The availability of representative data such as
synthetic data for analysing the use of machine learning in fraud detection
offers several advantages over authentic data [LKJ02]. For instance, the
properties of synthetic data can be tailored to cover patterns and attacks not
available in authentic data sets. However, a purely synthetic dataset carries
the risk that it will not properly reflect the properties or patterns of a
real-life data set [BKJ03].
To generate synthetic data that properly reflects the properties of an
authentic dataset, a methodology can be developed, as in [LKJ02; WHV08;
Jes+05], that uses the authentic data as the property "seed" for simulating
the synthetic data. This is achieved by using statistical properties from a
small amount of authentic data to generate a large amount of synthetic data,
thereby preserving important parameters of the authentic data such as user
and service behaviour [LKJ02]. Despite the increasing interest, research on
synthetic data generation in the area of fraud detection is still at an early
stage. An exhaustive survey of approaches found only three synthetic log
generators in the field of fraud detection.
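The seeding idea described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the only property taken from the authentic data is the mean and standard deviation of transfer amounts and that amounts are roughly normal; the seed values and the function names `fit_seed_profile` and `generate_synthetic` are hypothetical and are not taken from [LKJ02].

```python
import random
import statistics


def fit_seed_profile(amounts):
    """Summarise a small authentic sample by its mean and standard deviation."""
    return {"mean": statistics.mean(amounts), "stdev": statistics.stdev(amounts)}


def generate_synthetic(profile, n, rng):
    """Draw n synthetic amounts from a normal distribution fitted to the seed.

    Draws that round to a non-positive value are rejected, since a transfer
    amount must be positive.
    """
    amounts = []
    while len(amounts) < n:
        value = round(rng.gauss(profile["mean"], profile["stdev"]), 2)
        if value > 0:
            amounts.append(value)
    return amounts


# Hypothetical seed: eight "authentic" transfer amounts (illustrative only).
seed_amounts = [20.0, 25.5, 19.0, 30.0, 22.5, 27.0, 24.0, 21.5]
profile = fit_seed_profile(seed_amounts)
synthetic = generate_synthetic(profile, 10_000, random.Random(42))
```

A realistic generator would preserve many more properties (per-user behaviour, timing, fraud patterns); the sketch only shows how a small seed can parameterise a much larger synthetic sample.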
The authors in [BKJ03] applied the data generation method they proposed in
[LKJ02] to generate synthetic test data for an IP-based Video-on-Demand
service. They aimed to test the feasibility of generating and using synthetic
data for training and testing a fraud detection system. In the study, a small
amount of authentic log data was used to generate a large amount of synthetic
log data containing both normal and fraudulent user behaviour. In the
simulation, they were able to preserve the important statistical properties of
the authentic data by using authentic normal and fraud data as the seed for
generating the synthetic data. Thus, they were able to create realistic
behaviour profiles for both normal users and attackers.
In [LrA12b], the authors modelled money-laundering cases in mobile money
systems. In their work, a Multi-agent based simulator (MABS) was developed
to simulate the behaviour of several clients interacting in a mobile money
environment. Their aim was to simulate transactions corresponding to the
concurrent use of an account by a legitimate user and a fraudulent one, which
results in a shift of behaviour. The implementation of their model was limited
to random actions of users at random times.
Gaber et al. [Gab+13] modelled mobile money transfer using the concept of
habit to create a meaningful pattern of normal user behaviour. The
implementation of the model was based on the assumption that legitimate
users' transactions are mostly related to their habits, i.e. legitimate users
tend to carry out a specific set of transactions frequently and repeatedly.
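As a rough illustration of the habit concept, the sketch below treats a habit as a recurring transfer to one recipient, with an amount and a period that vary around typical values. The `Habit` fields, the normal variation and the example habits are assumptions made for this sketch; they are not the model published in [Gab+13].

```python
import random
from dataclasses import dataclass


@dataclass
class Habit:
    recipient: str        # who the user habitually pays
    mean_amount: float    # typical amount of the transfer
    amount_sd: float      # spread of the amount around its mean
    period_days: float    # typical gap between two occurrences
    period_sd: float      # spread of that gap


def simulate_habit(habit, horizon_days, rng):
    """Return (day, recipient, amount) events for one habit up to the horizon."""
    events, day = [], 0.0
    while True:
        # The gap is clamped so that time always moves forward.
        day += max(0.1, rng.gauss(habit.period_days, habit.period_sd))
        if day > horizon_days:
            return events
        amount = max(0.01, round(rng.gauss(habit.mean_amount, habit.amount_sd), 2))
        events.append((round(day, 2), habit.recipient, amount))


rng = random.Random(7)
habits = [Habit("landlord", 300.0, 10.0, 30.0, 1.0),   # hypothetical habits
          Habit("grocer", 15.0, 3.0, 7.0, 0.5)]
log = sorted(event for h in habits for event in simulate_habit(h, 90, rng))
```

Merging the events of all habits and sorting them by day gives a transaction log for one legitimate user in which the same recipients recur at regular intervals.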
Other methods are used in generating synthetic data but do not generate
data related to frauds and attacks. Some focus on testing a specific detection
prototype and comparing the performance of various detection systems. Some
studies used manipulated authentic data sets while in others there is no detailed
description of how these datasets were generated.
In this work, to address the lack of a mobile money transaction dataset, the
methodology in [Gab+13] was adopted and implemented using the Multi-agent
based simulator in [LrA12b]. To trace the sequence of events in the simulated
dataset, a time-stamp parameter was included in the simulator in place of the
steps and the categories of users over a period of time that were applied in
[Gab+13] and [LrA12b] respectively, as further discussed in Chapter 4. The next
Section provides a justification for the use of a simulated dataset in
evaluating the applicability of predictive models.
2.3.1 Why Synthetic Data?
In information discovery and analysis, there is a need to identify events that
could occur in the future. Developing test cases for information discovery,
analysis and prediction tools requires background data onto which hypothetical
future scenarios can be overlaid. Obtaining real-life data sets on mobile
money can be difficult due to privacy issues and the time and cost associated
with collecting multiple instances of a diverse set of data sources [Jes+05].
Additionally, when real-life data are available, they may be of poor quality,
small in quantity, or lacking in fraud cases and in the variety of scenarios
to explore. Although real data is often preferred for supervised learning
algorithms, simulated data becomes a necessity when real data is inaccessible.
Simulated sets are designed to reflect real data behaviour as closely as
possible [LKJ02]. Furthermore, it is important to be able to model how
the emerging threat landscape will affect existing countermeasures. For
example, information from the simulation of potential future scenarios makes
it possible to plan better for what might happen next by initiating further
precautions, if deemed necessary. Under all these circumstances, synthetic
data becomes an alternative means of making a data set available [Gor15;
Phu+10]. When using synthetic data, several benefits can be identified,
including [Gab+13]:
• the possibility to generate as much data and as many different scenarios
as needed, which could take months or years to collect from a real
system,
• the researcher's control over the parameters of the generated data, which
can help to address the issue of class imbalance,
• the possibility to create data with specific properties to test
characteristic features of the algorithm,
• the absence of privacy or disclosure issues, which hinder research in fraud
detection,
• the possibility to test and choose detection algorithms for systems which
are yet to be deployed.
There is no guarantee that synthetic data are fully realistic or representative
of real situations, or that an anomaly detection system can use them
effectively to identify both existing attacks and newly evolving ones. To
offset this disadvantage, it is important to build a simulation with
characteristics that are as close as possible to a real-world situation
[Gab+13]. The next Section discusses the methodology identified from the
literature that has the level of depth required to generate synthetic data for
this research work.
2.3.2 Synthetic Data Generation Methodology
Generating synthetic data can be a complex process that involves the use of
autonomous and interactive agents. The main components needed in the
automation process are specifications of the desired user behaviour in the
system, a user and attacker simulator, and a system simulator. The data
generation process begins with the collection of information about the
anticipated behaviour in the target system, such as background data from
similar systems and possible attacks. This data serves as the basis for
modelling user and system behaviour. The components involved in data
generation are shown in Figure 2.4, and the aim of the methodology is to guide
the production of these components [LKJ02].
Figure 2.4: Synthetic log data generation method [BKJ03]
As illustrated in Figure 2.4, the steps required to generate synthetic data
are as follows [BKJ03; LKJ02]:
Step 1: The process begins with the collection of data that should be
representative of the anticipated behaviour. The data may consist of authentic
background data from the target system, background data from similar systems,
authentic attacks and possible attacks. In most situations known attacks are
not available; to circumvent this difficulty, possible attack scenarios must
be invented, or known frauds from other types of services adapted to the
situation.
Step 2: The collected data is analysed and important properties are
identified, such as user classes, attack characteristics, usage statistics and
system behaviour. The aim of this step is to get a picture of how the system
is used and how user actions and attacks show up in the log data.
Step 3: The information from the previous steps is used to identify important
parameters that can be used to detect the anticipated attacks. One way to
identify these parameters is to study the features needed to detect the
expected fraud. The output from this step is a set of files for the different
user classes containing values for all parameters that are required for the
user simulation.
Step 4: A user model and an attack model are created. These models must be
sophisticated enough to preserve the selected profile parameters. The output
from this step is user classes containing values for all parameters that are
required for the user simulations.
Step 5: Finally, the system is modelled. This model must be accurate enough to
produce log data similar to that of the target system, where the similarity
can be restricted to those features meant for fraud detection. The system
simulator is implemented according to this model, and the output from this
step is a list of user actions in a format that is suitable as input to the
system simulator.
The purpose of dividing the whole process into steps with well-defined
interfaces is to reduce the complexity of the tasks so that different
groups of people can work on different tasks. It also makes it possible to
use people instead of a user simulator to create user actions. In addition,
the whole, or parts, of the real system can be used instead of a system
simulator. This may be preferable in some situations, especially when the
system or user behaviour is very complex and needs to be modelled in great
detail [BKJ03; LKJ02]. The next Section discusses how synthetic log data are
created using the methodology above.
2.3.3 Creation of Synthetic Log Data
During the data generation process, the different components interact as
shown in Figure 2.5:
Figure 2.5: The synthetic log data generation process [LKJ02]
The user profiles are used as input to the user and attacker simulator. The
configuration data for the user and attacker simulator contains information
that controls the generation of normal user and attack actions. For example,
it may include the start and stop times of the simulation, the number of
normal users from different user classes, the number of attacks of each type,
and the start and stop times of the different attacks. The user and attacker
simulator generates user actions that are fed to the system simulator. The
configuration data for the system simulator contains, for example, information
about the number of clients that should be simulated, statistics about reply
times and traffic amounts for different events, and behaviour in the presence
of certain attack events. The configuration data for the system simulator also
contains the parameters that are tuned to vary the different simulation runs.
Lastly, in live data injection processes, aggregated data from a real
financial transaction dataset can be injected as seeds into the simulation.
This allows the simulation of more realistic transaction data (in the absence
of a real transaction dataset, this process was not implemented in this
study). With these batches, synthetic data can be generated for different
purposes without reprogramming the simulators [LKJ02]. The next Section
describes the tools adopted for simulating synthetic data in this research.
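The kind of configuration data described above for the user and attacker simulator can be sketched as a small data structure. The class names, field names and example values below are hypothetical; they only mirror the items listed in the text (simulation start and stop times, number of users per class, and number and time window of each attack type) and are not the format used in [LKJ02].

```python
from dataclasses import dataclass, field


@dataclass
class AttackSpec:
    kind: str     # attack type label (hypothetical)
    count: int    # number of attacks of this type
    start: int    # first simulation step at which the attack may occur
    stop: int     # last simulation step at which the attack may occur


@dataclass
class UserSimConfig:
    start: int                      # simulation start time
    stop: int                       # simulation stop time
    users_per_class: dict           # user class name -> number of normal users
    attacks: list = field(default_factory=list)

    def validate(self):
        """Return a list of human-readable problems; empty means consistent."""
        problems = []
        if self.start >= self.stop:
            problems.append("simulation start must precede stop")
        for name, n in self.users_per_class.items():
            if n < 0:
                problems.append(f"negative user count for class '{name}'")
        for a in self.attacks:
            if not (self.start <= a.start <= a.stop <= self.stop):
                problems.append(f"attack '{a.kind}' falls outside the simulation window")
        return problems


config = UserSimConfig(
    start=0, stop=720,
    users_per_class={"regular": 500, "merchant": 40},
    attacks=[AttackSpec("account takeover", count=5, start=100, stop=300)],
)
```

Keeping such settings in data rather than in code is what allows different runs to be produced without reprogramming the simulators.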
2.3.4 Synthetic Data Simulation Using MABS
Simulation of synthetic data using Multi-agent based simulation (MABS) may
be implemented in programming languages such as C, C++ and Java, depending
on their computational efficiency for the proposed model [HB12]. There are
also user-friendly event simulation libraries that can be used to develop an
agent-based model, such as Repast [Rep], Swarm [Swa], NetLogo [Net] and
MASON [MAS]. A detailed comparison of their features is provided by [All09]
and [RLJ06].
For the needs of synthetic data creation, MASON is adopted as the simulation
engine. MASON is a fast and easily extendable multi-agent based simulation
toolkit written in Java. It supports discrete-event interaction between many
agents in a swarm [Luk+04]. According to Luke et al. [Luk+04], MASON has
several advantages: (i) it carefully delineates between model and
visualisation, which allows models to be dynamically detached from or
attached to visualisers and enables cross-platform migration; (ii) models
can be visualised and manipulated in both 2D and 3D (using Java3D) to
produce screenshots and movies; (iii) the model layer comes with a small
collection of classes consisting of a discrete-event schedule, a high-quality
random number generator, and a variety of fields which hold objects and
associate them with locations. These factors influenced the choice of MASON
as the Java library for data simulation in this research. The next Section
reviews relevant approaches for evaluating a simulated dataset.
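To make the notion of a discrete-event schedule concrete, the following sketch implements a very small one in Python. It is not MASON and does not use MASON's API; the `Schedule` and `Client` classes, the amounts (in cents) and the exponential waiting times are all assumptions chosen for illustration.

```python
import heapq
import random


class Schedule:
    """A minimal discrete-event schedule: agents are stepped in time order."""

    def __init__(self):
        self.time = 0.0
        self._queue = []
        self._counter = 0  # tie-breaker so the heap never compares agents

    def schedule(self, at, agent):
        heapq.heappush(self._queue, (at, self._counter, agent))
        self._counter += 1

    def run(self, until):
        while self._queue and self._queue[0][0] <= until:
            self.time, _, agent = heapq.heappop(self._queue)
            agent.step(self)


class Client:
    """An agent that sends a random amount (in cents) to a random peer."""

    def __init__(self, name, balance, rng, log):
        self.name, self.balance, self.rng, self.log = name, balance, rng, log
        self.peers = []

    def step(self, schedule):
        amount = self.rng.randint(100, 2000)
        if self.peers and amount <= self.balance:
            target = self.rng.choice(self.peers)
            self.balance -= amount
            target.balance += amount
            # The schedule time is recorded as the transaction time-stamp.
            self.log.append((schedule.time, self.name, target.name, amount))
        # Re-schedule after an exponentially distributed waiting time.
        schedule.schedule(schedule.time + self.rng.expovariate(1.0), self)


rng, log, schedule = random.Random(1), [], Schedule()
clients = [Client(f"c{i}", 10_000, rng, log) for i in range(5)]
for c in clients:
    c.peers = [p for p in clients if p is not c]
    schedule.schedule(rng.expovariate(1.0), c)
schedule.run(until=50.0)
```

Each entry of `log` is a time-stamped transfer, which is the shape of record a fraud detection model would later consume.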
2.3.5 Evaluation of Simulation Data
Simulation models are abstract representations of real systems, and they
can help to increase the ability to control, forecast or understand the
behaviour of the actual system [DA14]. Simulation models can also be applied
to decision making and to solving complex problems, but there is concern over
whether the outputs of a model are correct. It can be very difficult to
ascertain whether the behaviour being observed truly represents the system
[HD92]. As a result, researchers have developed numerous approaches and
techniques for verifying and validating the accuracy of a model simulation.
Verification is generally defined as the process of testing whether or not
the logic of the model is acceptable [CCB08; DA14], while validation refers
to the extent to which the model adequately represents the system being
modelled [DA14]. According to [DA14], validation approaches can be classified
into two general categories: (1) quantitative methods, also called
statistical methods, which use statistical approaches to evaluate the
credibility of the simulation model, such as the t-test, Johnson's modified t
statistic, distribution-free statistics and sensitivity analysis; (2)
subjective methods, which are based on the judgement of experts, e.g.
black-box testing, internal validity, the Turing test and face validation.
A chi-square test was used in [Gab+13]. The authors used a chi-square test
with a significance level of 5% to check whether the amounts and periods of
the simulated data are normally distributed. It was shown that all the
selected regular users had normally distributed amounts and periods. In
[LrA12b] the authors verified their simulated data by checking constraints
such as positive balances, account age, and the consistency of transfers,
deposits and withdrawals with the changes in account balances. The authors
also validated the simulated data; however, since no real-world data was used
as input to the simulator, the validation relied on the description of the
desired scenario and the opinions of experts in the field to show that the
basic statistics and overall process of the simulation design correspond to a
real-world scenario. The next Section concludes the chapter by providing a
summary.
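A chi-square goodness-of-fit check of the kind described above can be sketched with the Python standard library alone. The binning scheme (eight equal-probability bins under a normal distribution fitted to the sample) and the function name are assumptions made for this sketch, not the procedure reported in [Gab+13]; the critical values are the standard upper 5% points of the chi-square distribution.

```python
import random
import statistics
from bisect import bisect_right

# Upper 5% critical values of the chi-square distribution, by degrees of freedom.
CHI2_CRITICAL_5PCT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070,
                      6: 12.592, 7: 14.067}


def chi_square_normality(sample, bins=8):
    """Chi-square goodness-of-fit test against a normal fitted to the sample.

    Returns (statistic, accepted), where accepted is True when normality is
    not rejected at the 5% level. Supports between 4 and 10 bins.
    """
    fitted = statistics.NormalDist(statistics.mean(sample), statistics.stdev(sample))
    # Interior bin edges chosen so every bin has equal expected probability.
    edges = [fitted.inv_cdf(i / bins) for i in range(1, bins)]
    observed = [0] * bins
    for x in sample:
        observed[bisect_right(edges, x)] += 1
    expected = len(sample) / bins
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    # Two parameters (mean, standard deviation) were estimated from the data.
    dof = bins - 1 - 2
    return statistic, statistic <= CHI2_CRITICAL_5PCT[dof]


rng = random.Random(3)
amounts = [rng.gauss(50.0, 10.0) for _ in range(1000)]   # hypothetical amounts
skewed = [rng.expovariate(0.1) for _ in range(1000)]     # clearly non-normal
print(chi_square_normality(amounts))
print(chi_square_normality(skewed))
```

At the 5% level a truly normal sample is still rejected about one time in twenty, so a single rejection for one simulated user is weak evidence on its own.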
2.4 Chapter Summary
In this chapter a number of issues in fraud detection for mobile money
transfer have been discussed. Firstly, a background on mobile money transfer,
its ecosystem and the regulatory issues fraudsters could exploit was
presented. This was followed by a discussion of the issues surrounding the
availability of financial data sets and of the methodology for generating a
mobile money payment transaction dataset. The investigation found that only
three previous works in the fraud detection domain provide the level of
depth needed to carry out this research work. From these works, the
methodology proposed in [LKJ02] and the Multi-agent based simulator in
[LrA12b] were adapted in this thesis to simulate mobile transfer transaction
data, as discussed later in Section 4.3.
The methodology in [LKJ02] was chosen because it has well-defined
interfaces, which make it easy to use, i.e. the whole process is divided
into steps. This provides the possibility of using the whole or part of the
system for data simulation. The Multi-agent based simulator (MASON) was
chosen because it is fast and supports discrete-event interaction between
many agents in a swarm (i.e. it facilitates the implementation of social
networks).
Furthermore, the existing approaches in the literature for evaluating
simulated datasets were discussed in this chapter. From this discussion,
statistical approaches were identified as the most recognised approaches for
evaluating the credibility of a simulated dataset. Therefore, a chi-square
test is employed to evaluate the simulated dataset in this thesis. The
chi-square test was chosen because it is easy to compute and robust with
respect to the distribution of the data [Mch13]. Finally, background on the
topics related to the research questions in this thesis and the research gaps
that this thesis addresses was discussed.
Chapter 3
A Review of Fraud Detection
Techniques
In the previous chapter, the mobile money transfer business model and
ecosystem were discussed. The challenges faced in obtaining a real mobile
money transaction dataset were also reviewed. In the absence of a real MMT
transaction dataset, the need for a simulated dataset was identified, and the
methodology for simulating a synthetic mobile transfer dataset and for
evaluating such a dataset was analysed. The decision was taken to use a
multi-agent based simulation toolkit (MASON) because of its speed and ease
of use for simulating the mobile money transfer dataset, for the purpose of
building a predictive model that is able to learn hidden patterns in the
transaction data. This chapter presents a review of common approaches used in
the literature for financial transaction service fraud detection.
Section 3.1 discusses the algorithmic solutions for financial service fraud
detection. The discussion is divided into supervised and unsupervised
approaches. These two categories may be further grouped into four groups
based on how they are evaluated, as shown in Figure 3.1. Section 3.2 reviews
the existing approaches for handling unbalanced data for predictive
algorithms; the rationale for the approach adopted in this thesis is also
discussed. Section 3.3 elaborates on evaluation techniques, i.e. performance
metrics for measuring the effectiveness of a fraud detection system. Section
3.7 concludes the chapter with a summary.
Figure 3.1: Algorithmic solutions for fraud detection systems
Building on the generic categorisation of algorithmic solutions, i.e. machine
learning techniques, into supervised and unsupervised approaches in the
literature, Figure 3.1 presents a slightly modified flow chart showing their
grouping based on their application in the financial fraud detection domain.
A discussion of these categories and groups follows.
3.1 ML Algorithms for Fraud Detection
Machine learning for fraud detection is an area of research in which ML
algorithms are used to recognise patterns in data in order to distinguish
fraudsters from legitimate clients. This is done based on thousands of pieces
of information that may sometimes seem, to human beings, completely unrelated
[Ale17]. Machine learning approaches have shown promising results in
predicting financial transaction fraud [Zhd+14]. In the literature, these
promising results were achieved using both supervised [Aze+14; KSM07; LrA12a]
and unsupervised [BH01; FM06; Wes+08] ML algorithms. A supervised method
attempts to learn by example using a teacher: a predictive model is trained
under the supervision of labelled data, i.e. transactions labelled as genuine
or fraudulent, to discover the patterns associated with either class.
Unsupervised learning methods, in contrast, work with unlabelled data samples
[AMZ16]. The unsupervised approach is commonly used in outlier or anomaly
detection techniques, where the behaviour associated with a fraudulent
transaction does not conform to that of the majority class [BH01].
3.1.1 Supervised Approaches
In supervised learning, labelled samples are used to train a learner in order
to predict the class of a new observation, i.e. the output variable that
defines the class of an observation is assumed to depend on the input
variables [AMZ16]. Based on their evaluation approach, supervised learning
methods can be categorised into [Poz15]: i) supervised profiling, ii)
classification, iii) cost-sensitive and iv) network methods. Supervised
learning techniques have been widely used in the literature for detecting
financial transaction fraud, particularly when the datasets available for
evaluating these algorithmic solutions have already been annotated by fraud
analysts. However, they come with the challenge of overfitting due to the
unbalanced nature of financial transaction datasets, i.e. the skewed class
distribution of fraud to non-fraud data. Several methods have been applied in
the literature to deal with this problem, such as data sampling and
cost-sensitive learning, as further discussed in Section 3.2. A discussion of
the different categories of supervised learning techniques follows.
Supervised Profiling
Supervised profiling works by observing the distribution of relevant variables
for both genuine and fraudulent accounts in an already labelled transaction
data sample [Sud+10]. Different profiles are created for each class, so that
a new incoming transaction can be compared with them to see which profile it
most resembles [Sud+10]. For example, Xu et al. [XSL07] propose an adaptive
user profiling method that uses an association rule set to mine users'
behaviour for credit card fraud detection.
Rule-based profiles are a popular approach in supervised profiling. The
rules can be defined by human experts or learned from data with a rule
discovery algorithm, e.g. "any individual exceeding one million dollars worth
of transactions in a single day shall be considered suspicious". The
rule-based approach is easy to understand and implement [Sud+10]. However, as
criminal activities and legitimate user behaviour evolve, profiles must be
updated to reflect the dynamic patterns of criminal activity as well as
changes in legitimate user behaviour. This presents a challenge for static
rule-based methods that are learned off-line, as they must be frequently
validated and retrained [Poz15; Sud+10]. As an alternative, Wang et al.
[Wan+03] propose a weighted ensemble approach that can be used to include new
rules while maintaining old rules.
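The example rule quoted in the paragraph above can be written as a few lines of code. The transaction layout (customer, day, amount) and the sample records are hypothetical; only the one-million threshold comes from the quoted rule.

```python
from collections import defaultdict

DAILY_LIMIT = 1_000_000  # threshold taken from the example rule in the text


def flag_suspicious(transactions, daily_limit=DAILY_LIMIT):
    """Return the (customer, day) pairs whose daily total exceeds the limit."""
    totals = defaultdict(float)
    for customer, day, amount in transactions:
        totals[(customer, day)] += amount
    return {key for key, total in totals.items() if total > daily_limit}


# Hypothetical transactions: (customer, day, amount in dollars).
transactions = [
    ("alice", "2018-01-01", 600_000),
    ("alice", "2018-01-01", 500_000),
    ("bob", "2018-01-01", 999_999),
    ("alice", "2018-01-02", 10),
]
suspicious = flag_suspicious(transactions)
```

A static rule of this kind is transparent, but the fixed threshold also shows why such rules need regular revalidation as behaviour changes.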
Supervised profiling is commonly used in the fraud detection domain when
labelled transaction data are available. This makes it possible to profile, or
construct distributions of, relevant variables for both genuine and fraudulent
transactions [Sud+10]. For instance, in supervised profiling one profile of
expected genuine behaviour is maintained per customer and one profile of
fraudulent behaviour is maintained per type of fraud. Incoming transactions
are then compared with the customer's profile of genuine behaviour and with
the different profiles of fraud. Any deviation from expected behaviour, or
similarity to known patterns of fraud, may be a sign of criminal activity.
This approach is commonly used in telecommunications fraud detection research
[Hol00].
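The comparison of an incoming transaction against a genuine profile and a set of fraud profiles can be sketched as follows. Summarising a profile by the mean and standard deviation of the amount, and the two thresholds used, are simplifying assumptions for this sketch; the profile values and the fraud type name are hypothetical.

```python
import statistics


def build_profile(amounts):
    """Summarise historical amounts as a (mean, standard deviation) profile."""
    return statistics.mean(amounts), statistics.stdev(amounts)


def z_score(amount, profile):
    """Distance of an amount from a profile, in standard deviations."""
    mean, stdev = profile
    if stdev == 0:
        return 0.0 if amount == mean else float("inf")
    return abs(amount - mean) / stdev


def assess(amount, genuine_profile, fraud_profiles, deviation=3.0, similarity=1.0):
    """Return the reasons, if any, for treating a transaction as suspicious."""
    reasons = []
    if z_score(amount, genuine_profile) > deviation:
        reasons.append("deviates from the customer's genuine profile")
    for kind, profile in fraud_profiles.items():
        if z_score(amount, profile) <= similarity:
            reasons.append(f"resembles the '{kind}' fraud profile")
    return reasons


genuine = build_profile([20.0, 22.0, 19.0, 21.0, 23.0, 20.0])   # one customer
fraud_profiles = {"cash-out burst": (500.0, 50.0)}               # one per fraud type
```

For example, `assess(480.0, genuine, fraud_profiles)` reports both a deviation from the customer's usual behaviour and a resemblance to the hypothetical fraud profile, whereas an amount of 21.0 raises no reason at all.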
Classification methods
Like the supervised profiling approach discussed above, classification
methods are commonly used when labelled transaction data are available.
They are traditionally implemented as linear and non-linear models and are
often seen as the standard way of solving the financial transaction fraud
problem [Nga+11]. Over the years, several classification algorithms have been
used in the literature for fraud detection, e.g. neural networks [CLL05;
Moh+09; Rav+11; ZS06], support vector machines [DD13], probabilistic
graphical models [Lui+15] and decision trees [LrA12a].
In recent years, decision trees have gained prominence in the fraud detection
and credit risk scoring fields [Zhd+14]. The authors in [Zhd+14] applied part
decision table, C4.5 and random forest algorithms to detect fraud chains in
mobile money transfer. The results of the experiment show that the C4.5
decision tree had better precision, but the recall of all three algorithms
was quite low. Fabian et al. [Fab+17] incorporated pattern information in
the form of new attributes into random forest and logistic regression models.
The new attributes led to a significant performance improvement for both
models compared to state-of-the-art aggregated transaction features, with
the random forest performing slightly better.
An artificial neural network consists of highly interconnected mathematical
processing elements, called neurons, that work together to solve problems in
a way similar to the human brain [Hay99]. However, neural networks are
black-box models, i.e. they are difficult to understand, and they typically
require a significant amount of historical data [Sha+16]. A decision tree
offers an alternative that addresses the neural network limitations described
above. It is based on the extraction of conjunctions and disjunctions of
rules that represent the classification decision, and these rules are easy to
understand [Poz15].
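The reading of a decision tree as a disjunction of conjunctive rules can be shown with a small hand-built tree. The features, thresholds and labels below are invented for illustration and are not learned from any dataset; each path from the root to a "fraud" leaf is one conjunction, and the set of such paths is the disjunction.

```python
# A hypothetical tree: each internal node tests "feature > threshold".
TREE = {
    "test": ("amount", 1000),
    "yes": {"test": ("hour", 22), "yes": "fraud", "no": "genuine"},
    "no": {"test": ("n_recipients", 5), "yes": "fraud", "no": "genuine"},
}


def classify(node, tx):
    """Walk the tree from the root until a leaf label is reached."""
    while isinstance(node, dict):
        feature, threshold = node["test"]
        node = node["yes"] if tx[feature] > threshold else node["no"]
    return node


def extract_rules(node, path=()):
    """Return every root-to-'fraud' path as a tuple of conditions (a conjunction)."""
    if not isinstance(node, dict):
        return [path] if node == "fraud" else []
    feature, threshold = node["test"]
    return (extract_rules(node["yes"], path + (f"{feature} > {threshold}",))
            + extract_rules(node["no"], path + (f"{feature} <= {threshold}",)))


rules = extract_rules(TREE)
```

Here `rules` contains two conjunctions, and a transaction is labelled as fraud if it satisfies either of them, which is what makes the model's decisions easy to inspect.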
In this thesis four baseline classifiers were selected for the experiments
because they are commonly used in the literature for fraud detection
problems: logistic regression (LR), random forest (RF), support vector
machine (SVM) and artificial neural networks (ANN). LR is a generalized
linear model that is easy to use and one of the most commonly used techniques
for data mining in practice, but it is vulnerable to overconfidence [Bha+11].
The RF classifier can capture non-linear relationships in data and shows high
scalability with better visual representation of results, but it is liable to
over-fit. SVM has a regularization parameter that is used to prevent
over-fitting [Sud+10]. ANN can show higher prediction accuracy, but it is much
more computationally expensive and its predictions are hard to interpret.
Table 3.1¹ shows some approaches used in detecting financial transaction
fraud.
¹ The descriptions of most of these machine learning approaches are standard and can be found in a textbook on data mining, for example [HTF09].
Table 3.1: List of some related works on fraud detection
| Abbr. | Full Name | Description | References |
|---|---|---|---|
| LR | Logistic Regression | Models the probability of occurrence of one (success) of the two classes of dichotomous. | [Bhattacharyya2011, Lin2014a, Ngai2011, Ravisankar2011, Pallavi2015] |
| DT | Decision Tree | A tree structure, where each node represents a test on an attribute and each branch represents an outcome of the test. | [Bahnsen2015, Geng2015, Sahin2013, Soltaniziba2015, Tsang2014] |
| RF | Random Forest | Operates as a decision tree operator but creates an ensemble of random trees. | [Albashrawi2016a, Liu2015, Nolan2017, Qi-Feng2015, Xuan2018, Yaram2016] |
| SVM | Support Vector Machine | It uses a linear model to implement nonlinear class boundaries by mapping input vectors nonlinearly into a high-dimensional feature space. | [Akbani2004, Chyan-long2018, Francis2011, Moepya2014, Subudhi2015] |
| NB | Naive Bayes | A probabilistic classifier based on applying Bayes Theorem. | [Bhowmik2011, Gupta2017, Panigrahi2009, Saravanan2014] |
| ANN | Artificial Neural Network | An adaptive algorithm that works in a way in which the human brain performs. | [Azeem2014, Bekirev2015, Dong2014, Olszewski2014, Ogwueleka2011, Roselina2015, Ravisankar2011, Yuan2017] |
| k-NN | Nearest Neighbour | An instance based learning algorithm that operates by choosing k nearest instances. | [Arianto2017, Chang2012, Fahmi2016, Ganji2012, Malini2017] |
The discussion of some of these related works based on their performance
evaluation using different performance metrics follows in Table 3.2.
Table 3.2: Discussion of some of the related works performance

Abbr. | References & Summary

LR | Bhattacharyya et al. [Bha+11] examined the performance of logistic regression (LR), random forests (RF) and support vector machines (SVM) for credit card fraud detection using different levels of under sampling technique. The RF classifier demonstrated overall better performance across performance measures used. A factor contributing to the performance of LR is possibly the carefully derived attributes used. Lin et al. [Lin+14] applied LR to novel feature selection methods to financial distress prediction. The empirical results indicates that the LR model based on the novel feature set selection outperformed the model with traditional feature selection models prediction accuracy. Ravisankar et al. [Rav+11] uses data mining techniques such as LR, SVM, Genetic programming (GP) and Probabilistic Neural Network (PNN) for financial statement fraud with and without feature selection. The PNN outperformed all techniques without feature selection, and GP and PNN outperformed others with feature selection and with marginal equal accuracies.

DT | Bahnsen et al. [BAO15] proposed an example-dependent cost-sensitive decision tree algorithm, by incorporating the different example-dependent costs into a new cost-based impurity measure and a new cost-based pruning criteria. This was evaluated using credit card fraud detection, credit scoring and direct marketing dataset. The results show that the proposed algorithm is the best performing method for all databases. Geng et al. [GBC15] predicted financial distress in Shanghai Stock Exchange. The decision tree classifier performed lesser than neural networks, support vector machines, as well as an ensemble of multiple classifiers combined using majority voting. As contribution, it was discovered that financial indicators, such as net profit margin of total assets, return on total assets, earnings per share, and cash flow per share, play an important role in prediction of deterioration in profitability. Sahin et al. [SBD13] proposed a new cost-sensitive decision tree approach to minimize the sum of misclassification costs while selecting the splitting attribute at each non-terminal node and the performance of this approach was compared with well-known traditional classification models on a real world credit card dataset. The results show that cost-sensitive decision tree algorithm outperforms the existing well-known methods on the given problem set with respect to the well-known performance metrics such as accuracy and true positive rate.

RF | Nolan [Nol17] proposed a combination of unsupervised and supervised learning using both Logistic regression and Random forest classifiers to generate fraud score. The fraud score was used on the platform to flag transactions and mark them for manual verification. A promising result was achieved. Yaram [Yar16] used a set of classification algorithms (Decision Tree, Random Forest and Naive Bayes) for document clustering and fraud detection. The result analysis reveals that Decision Tree and Random Forest algorithms perform better than Naive Bayes algorithm. Qi-Feng et al. [QF+15] proposed a framework that uses ensemble learning to detect novelty based on Random Forest (RF). The proposed approach was compared against two common approaches: support vector domain description (SVDD) and Gaussian Mixed Model (GMM) on one artificial dataset and five benchmark datasets. The experimental results show that the proposed method achieved better performance in terms of accuracy and recall.

SVM | Subudhi and Panigrahi [SP15] used Support Vector Machine (SVM) classifier for fraud detection in mobile telecommunication networks. The experiment shows promising results in terms of detecting fraudulent calls without raising too many false alarms. Moepya et al. [MNV14] developed support vector machine (SVM) model to detect financial statement fraud using published South African financial data for the evaluation. The SVM model was compared to the k-Nearest Neighbour (kNN) method and Logistic regression (LR). The SVM model provided an increased classification accuracy. Chyan-long [Cl18] applied SVM to detect enterprises financial statements fraud for Taiwan Stock Exchange. A promising prediction accuracy was achieved.

NB | Gupta et al. [GKB17] used naive-bayes for detecting credit card fraud using time-stamp and IP address features. The empirical result from the experiment shows that the proposed approach works with more efficiency than existing models. Saravanan et al. [Sar+14] applied naive-bayesian classification to calculate the probability and an adapted version of KL-divergence to identify the fraudulent customers on the basis of subscription in telecommunication sector. The results from the experiment shows a reduced false positive rate. Panigrahi et al. [Pan+09] proposed a novel approach for credit card fraud detection using a fusion of dempster-shafer theory and bayesian learning. The empirical result shows that fusion of different evidences has a very high positive impact on the performance of a credit card fraud detection system as compared to other methods.

ANN | Yuan et al. [Yua+17] proposed a novel framework that combines deep neural networks and spectral graph analysis for fraud detection. In particular, they use the node projection (called as spectral coordinate) in the low dimensional spectral space of the graphs adjacency matrix as input of deep neural networks. The experimental results shows that the spectrum based deep neural networks are effective in fraud detection. Olszewski et al. [Ols14] proposed a fraud detection method based on the user accounts visualization and threshold-type detection. They proposed visualisation and threshold-type detection method using SOM U-Matrix. The experimental result confirmed the effectiveness of the proposed approach. Azeem et al. [Aze+14] used an evolutionary simulated annealing algorithm to train Neural Networks for Credit Card fraud detection in real-time scenario. The experimental result shows that better result is achieved with ANN when trained with simulated annealing algorithm.

k-NN | Malini and Pushpa [MP17] analysed k-NN and outlier detection to other common machine learning techniques for credit card fraud detection. They came to a conclusion that both techniques minimise the false positive rate and increase the fraud detection rate when used in monitoring credit card transactions. Arianto et al. [AAN17] used k-NN for detecting opinion anomaly for public sector financial statements. They propose the use of original features from public sector rather than the use of modified features from private sector for the detection. The result shows that the original feature from public sector outperformed that of the private sector. Chang [CC12] proposed a new early online auction fraud detection method that considers accuracy and timeliness simultaneously using k-NN. To determine the most appropriate attributes that distinguish between normal traders and fraudsters, a modified wrapper procedure is developed to select a subset of attributes from a large candidate attribute pool. Using these attributes, the experimental result shows an increase in the fraud detection system accuracy.
Cost-sensitive methods
Unlike the aforementioned methods, cost-sensitive learning in fraud detection
systems supports cost-benefit-wise optimal decisions by estimating costs such
as the misclassification cost and the test cost. Cost-sensitive learning is
therefore a type of data-mining learning that takes into consideration the
cost of misclassifying a transaction as fraud or legitimate, and possibly
other types of cost [KBC16]. Cost-sensitive learning methods, such as the
MetaCost procedure, deal with class-imbalance problems by incurring different
costs for different classes [KBC16; LS10b]. In the literature, diverse
learning algorithms have been used to minimise the total of misclassification
costs, test costs and other types of costs, including Decision Tree [SBD13],
AdaCost [FSC99], AdaBoost [SS99] and SMOTEBoost [Cha+03].
Cost-sensitive learning methods such as AdaCost assume that costs are fixed
and class-dependent [FSC99]. In a fraud detection system, however, the cost
lost or saved is proportional to the transaction amount: the larger the
amount, the greater the potential loss. Similarly, the cost of missing a
fraud, i.e. a false negative, is not fixed but proportional to the transaction
amount [FSC99]. For example, in [Bah+13; BAO15; KBC16; SBD13] a cost-sensitive
classifier that depends on transaction cost was applied to the field of fraud
detection. Bahnsen et al. [Bah+13] used a Bayes minimum risk classifier as a
method for cost-sensitive credit card fraud detection. Their experimental
results show that the proposed cost-sensitive method significantly decreased
the cost due to fraud compared with other techniques used in the literature.
Similarly, Kim et al. [KBC16] saved cost on financial misstatements using a
multi-class cost-sensitive classifier, and a promising result was achieved.
However, according to Pozzolo [Poz15], when the cost of misclassifying a
fraud is higher than that of misclassifying a genuine transaction, cost-based
algorithms will in principle generate a false alert rather than take the risk
of predicting a transaction as legitimate when it is not. As a result, these
algorithms can generate many false positive alerts, which may be of no
practical use to investigators who require precise alerts.
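
The amount-dependent cost idea discussed in this subsection can be illustrated with a minimal sketch of a Bayes minimum risk decision. The cost model used here (a fixed administrative cost `admin_cost` for investigating an alert, and a loss equal to the transaction amount for a missed fraud) and all function names are assumptions made for illustration; they are not taken from the cited works.

```python
def expected_costs(p_fraud, amount, admin_cost=5.0):
    """Return (cost of flagging, cost of accepting) for one transaction.

    Assumed cost model (illustrative only):
      - flagging always costs `admin_cost` (an analyst checks the alert);
      - accepting a fraud loses the full transaction `amount`;
      - accepting a legitimate transaction costs nothing.
    """
    cost_flag = admin_cost
    cost_accept = p_fraud * amount
    return cost_flag, cost_accept


def flag_as_fraud(p_fraud, amount, admin_cost=5.0):
    """Bayes minimum risk rule: choose the action with the lower expected cost."""
    cost_flag, cost_accept = expected_costs(p_fraud, amount, admin_cost)
    return cost_flag < cost_accept


# Same estimated fraud probability, different transaction amounts.
print(flag_as_fraud(0.02, 100.0))    # expected loss 2.0 is below 5.0: not flagged
print(flag_as_fraud(0.02, 1000.0))   # expected loss 20.0 exceeds 5.0: flagged
```

Under this assumed cost model, a large transaction is flagged even at a low fraud probability, which also shows how such a rule can produce the many false alerts noted above.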
Social Network methods
Social network analysis (SNA) is a form of link analysis that aims at
understanding relationships between network participants by means of mapping
and measuring [LN15]. Links between data can be detected by applying various
graph mining algorithms (e.g. peer group analysis [Van+15] and entity link
analysis [Sud+10]) to the data source [Maj15]. This approach is commonly used
in financial institutions to identify crime rings, or groups of people who
work together to commit money laundering or fraud [Sud+10]. In fraud detection
systems, analysing links to identify suspicious individuals, groups,
relationships, and unusual changes over time or geography can complement the
traditional data-mining techniques discussed above [Maj15]. For example, when
groups of individuals create fake identities for loan applications, SNA can
flag suspicious behaviour by showing the connections between items such as
addresses, phone numbers and email addresses [Div15]. A literature review on
the research and application of knowledge mapping and SNA can be found in
[CL06].
Recently, Veronique et al. [Ver+17] proposed a new approach (GOTCHA) to
define and extract features from a time-weighted network. GOTCHA was used to
exploit and integrate network-based and intrinsic features in a fraud
detection system. They observed that domain-driven network variables have a
significant impact on detecting past and future fraud, and improve the
baseline by detecting up to 55% additional fraudsters over time. However,
linking social network data spread across heterogeneous data repositories
raises several challenging problems, such as algorithm optimization and
parallelization, new knowledge representation paradigms for heterogeneous,
redundant and false information, and graph analysis for clustering and
partitioning [SMM13].
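
The link analysis idea described in this subsection, where applications are connected through shared addresses or phone numbers, can be sketched as follows. This is a minimal illustration using a union-find structure; the function name `linked_groups`, the attribute names and the sample data are hypothetical and do not come from the cited works.

```python
from collections import defaultdict


def linked_groups(applications):
    """Group application ids that share any attribute value (phone, address, ...)."""
    parent = {app_id: app_id for app_id in applications}

    def find(x):
        # Follow parent pointers to the representative of x's group.
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_seen = {}  # (attribute name, value) -> first application that used it
    for app_id, attrs in applications.items():
        for key_value in attrs.items():
            if key_value in first_seen:
                # Shared value: merge the two applications' groups.
                parent[find(app_id)] = find(first_seen[key_value])
            else:
                first_seen[key_value] = app_id

    groups = defaultdict(set)
    for app_id in applications:
        groups[find(app_id)].add(app_id)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


applications = {
    "A1": {"phone": "555-0101", "address": "1 High St"},
    "A2": {"phone": "555-0101", "address": "9 Low Rd"},
    "A3": {"phone": "555-0199", "address": "9 Low Rd"},
    "A4": {"phone": "555-0142", "address": "4 Mill Ln"},
}
suspicious = [g for g in linked_groups(applications) if len(g) > 1]
print(suspicious)  # [['A1', 'A2', 'A3']]
```

Here A1 and A3 share no attribute directly, but both are linked to A2, so all three fall into one group that an analyst could inspect.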
3.1.2 Unsupervised Approaches
Unsupervised learning methods work with no prior knowledge of the class of
any observation in a data sample [HTF09]. In the case of financial transaction
fraud, there are no prior sets of legitimate and fraudulent observations,
unlike in supervised learning techniques. As discussed in Section 3.1.1,
assigning class labels to an available dataset is carried out by fraud
analysts, which is time consuming and subject to error. Unsupervised learning
approaches are commonly used in outlier or anomaly detection, where a
transaction is associated with fraudulent behaviour when it does not conform
with the majority class [BH01]. As a result, they are not affected by the
problems of mislabelled datasets and class imbalance: there is no human class
labelling, which is subject to error, and no class label is used for tuning
and evaluating algorithms during simulation [Wue+16]. Despite this advantage,
the application of unsupervised learning methods in financial fraud detection
has not received a lot of attention in the literature [BH01]. This could be
because the approach is more novel and requires background knowledge to
interpret the structure discovered in the dataset [GCG17].
In [KS12], an improved peer group analysis (an unsupervised technique) was
proposed to detect suspicious patterns of stock price manipulation. The
authors incorporated the weights of peer group members when summarizing their
behaviour, and considered parameter updates over time, in order to detect a
target whose behaviour deviates from that of its peers. Sherly and
Nedunchezhian [SN10] used clustering, one of the most popular unsupervised
methods, to detect credit card fraud, and a promising result was achieved.
Despite the benefits of this technique, clustering analysis can suffer from a
bad choice of metric, that is, the way variables are scaled, transformed and
combined to measure the 'distance' between observations. A typical instance is
the difficulty of combining categorical and continuous variables in a good
clustering metric. In addition, observations may cluster differently on some
subsets of variables than they do on others, so that a data sample can have
more than one valid clustering [BH01].
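
To make the metric problem concrete, the following is a minimal sketch of a Gower-style dissimilarity, one common way of combining continuous and categorical variables in a single distance. It is given only as an illustration and is not the metric used in the cited works; the variable names and ranges are assumptions.

```python
def gower_distance(a, b, numeric_ranges):
    """Average per-variable dissimilarity between two records (dicts).

    Numeric variables: absolute difference scaled by the variable's range.
    Categorical variables: 0 if the values are equal, 1 otherwise.
    """
    total = 0.0
    for name in a:
        if name in numeric_ranges:
            total += abs(a[name] - b[name]) / numeric_ranges[name]
        else:
            total += 0.0 if a[name] == b[name] else 1.0
    return total / len(a)


t1 = {"amount": 20.0, "account_age_days": 30.0, "channel": "app"}
t2 = {"amount": 70.0, "account_age_days": 395.0, "channel": "ussd"}
ranges = {"amount": 100.0, "account_age_days": 730.0}  # assumed value ranges
print(gower_distance(t1, t2, ranges))  # (0.5 + 0.5 + 1.0) / 3
```

The result depends directly on the assumed ranges and on how much weight a categorical mismatch receives, which is the sensitivity to the choice of metric described above.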
In general, the idea of unsupervised learning is to observe a data sample
that represents normal behaviour, and then to identify the groups that show
the greatest departure from this norm and flag them as outliers [Kou+04].
However, this approach is very difficult to manage because [Kou+04]: i)
configuring such rules requires precise, laborious and time-consuming
programming for each imaginable fraud possibility; ii) the dynamic appearance
of multiple new fraud types demands frequent adaptation of the rules to
accommodate these emerging fraud types; iii) as more data becomes available
for the system to process, the scalability of the system is affected.
3.2. Learning from Imbalanced Data 57
3.2 Learning from Imbalanced Data
As discussed in Section 3.1.1, applying supervised learning techniques in the
fraud detection domain comes with the challenge of overfitting, owing to the
unbalanced nature of transaction datasets. Most supervised learning algorithms
are not designed to cope with a large difference between the number of cases
belonging to different classes [GAM00]. Learning from an unbalanced dataset
can therefore lead to several problems in the output of a classification
model. For example: (1) the classifier may assume that the samples are
uniformly distributed, which is false in this case; (2) the classification
model is biased towards the dominant class [GS17]. Several methods have been
proposed in the literature to deal with this problem. In line with the
available literature, these methods are categorised into three major
approaches: the data level, the algorithm level and the cost-sensitive
learning framework. In data level methods, the data is tuned to handle the
dataset skewness; in algorithm level methods, the learning algorithm is tuned
to handle the skewness; and the cost-sensitive approach is a compromise
between data and algorithm level methods. These methods are discussed below.
3.2.1 Data Level Methods
In data level methods, unbalanced strategies are used as pre-processing steps
to rebalance the dataset or remove the noise between the two classes before
any algorithm is applied. The aim is to reduce the effect of the skewed class
distribution on the learning process [Gal+12]. Data sampling methods do not
take any class information into consideration when removing or adding
observations, yet they are easy to implement and understand [Poz15]. They can
be grouped into three main categories: under-sampling, over-sampling and
hybrid.
Under-sampling creates a subset of the original dataset by eliminating
instances, usually majority class instances [Elh+16]. It assumes that many
observations of the majority class are redundant and that removing some of
them at random should not change the resulting distribution much. However,
this comes with the risk of removing relevant observations that might
contribute to the learning process, since the removal is done in an
unsupervised manner [DH03]. In practice, under-sampling reduces the size of
the data and therefore the run-time cost, which makes it easy to adopt for
unbalanced data learning problems [Elh+16]. Examples of under-sampling methods
include, but are not limited to, under-sampling based on clustering [YL06;
ZM03], Condensed Nearest Neighbour (CNN) [Har68], Edited Nearest Neighbour
(ENN) [Wil72] and Tomek Link Removal (T-Link) [Tom76].
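
Random under-sampling, the simplest member of this family, can be sketched as below. This is a minimal illustration and not the procedure used in this thesis; the function name, the `ratio` parameter and the toy data are assumptions.

```python
import random


def random_undersample(records, labels, minority_label=1, ratio=1.0, seed=0):
    """Keep all minority records and a random subset of the majority records.

    `ratio` is the desired majority/minority size ratio after sampling.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    n_keep = min(len(majority), int(ratio * len(minority)))
    kept = sorted(minority + rng.sample(majority, n_keep))
    return [records[i] for i in kept], [labels[i] for i in kept]


labels = [0] * 95 + [1] * 5          # 5% of the records are "fraud"
records = list(range(100))           # stand-in for feature vectors
X_bal, y_bal = random_undersample(records, labels)
print(len(y_bal), sum(y_bal))        # 10 5
```

Because the 90 discarded majority records are chosen without regard to their content, potentially informative observations can be lost, as noted above.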
Over-sampling creates a superset of the original dataset by replicating some
instances or creating new instances from existing ones. It replicates the
minority class until the two classes have equal frequency [Elh+16]. As a
result, it increases the risk of overfitting [DH03] by biasing the model
towards the minority class. In addition, it increases the training time,
particularly when the original dataset is large, which can make it
ineffective. Examples of over-sampling methods include, but are not limited
to, Adaptive Synthetic Sampling (ADASYN) [Hai+16], Random Over-Sampling (ROS)
[Fer+08] and SMOTE [Cha+02].
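
The creation of new instances from existing ones can be sketched with a simplified SMOTE-style interpolation. This is an illustrative approximation only: it is not the reference SMOTE implementation, and the function name, the neighbourhood size `k` and the sample points are assumptions.

```python
import math
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Create `n_new` synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbours (SMOTE-style)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like(minority, n_new=4)
print(len(new_points))  # 4
```

Each synthetic point lies on a line segment between two existing minority points, so the minority region is filled in rather than merely duplicated.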
Hybrid methods consist of a combination of over-sampling and under-sampling
methods. In the hybrid approach, after the over-sampling strategy has been
applied, the data is cleaned by removing instances, either original or
introduced by the over-sampling method, from the new dataset. For example,
Ganguly and Samira [GS17] applied a hybrid approach to the classification of
imbalanced auction fraud data. Their results show an improved classification
efficiency for the different classifiers used. The hybrid strategy of
over-sampling and under-sampling arguably works better than either one alone
[Cha+02]. Examples of hybrid sampling algorithms include, but are not limited
to, SMOTE+Tomek [BBM03], SMOTE+ENN [BPM04] and SMOTE+Spreadsubsample [GS17].
3.2.2 Algorithm Level Methods
As an alternative to the data level methods discussed above, the algorithm
level approach adjusts the learning algorithm to deal with the minority class
[Gu+08]. As a result, it is not exposed to the risk of removing relevant
observations from the dataset which might contribute to the learning process.
In addition, it is exposed to overfitting. To develop an algorithmic solution
for data balancing, one needs knowledge of both the corresponding classifier
learning algorithm and the application domain. In particular, a thorough
understanding of why the learning algorithm fails when the class distribution
of the available data is uneven is critical [Sun+07]. The learning algorithm
is then adjusted by modifying the cost per class [PF01], by adjusting the
probability estimation in the leaves of a decision tree (establishing a bias
towards the positive class) [WP03], or by learning from just one class [RK04]
("recognition based learning") instead of learning from two classes
("discrimination based learning") [Fer+08]. The idea behind the algorithm
level method is to modify the original classifier so that it learns better
patterns from the minority class. Examples of algorithmic solutions are
cost-sensitive learning [LS10b] and ensemble learning [ZY12].
Cost-sensitive learning² is a learning approach that improves the performance
of classifiers by applying a cost to the minority class distribution
[San+17]. It biases the classifier towards the minority class by assigning a
higher misclassification cost to this class, while at the same time seeking to
minimize the total error cost of both classes [Gal+12], as discussed in
Section 3.2.3. The major drawback of this approach is the need to define
misclassification costs, which are not usually available in the datasets
[Gal+12]. Research work that uses the cost-sensitive learning approach for
financial fraud detection includes Sahin et al. [SBD13], Viaene et al.
[Via+04] and Pinquet [PAG07].
Ensemble learning is the combination of multiple classifiers to improve the
prediction accuracy of weak classification algorithms on unbalanced data.
Bagging [BK99] and Boosting [FS96] are the most widely used methods, and their
application to several classification problems has led to significant
improvements [OT08]. In supervised ensemble learning [LWZ09], the majority
class is under-sampled by iteratively removing the majority class instances
that are correctly classified by a boosting algorithm. The idea is that
observations of the majority class that are easy to classify are redundant,
and that by removing them the algorithm can concentrate on the hard cases. In
practice, this implies that the classification algorithm has to be applied
several times to reduce the majority class, which increases the computational
cost. Research work that uses the ensemble learning approach for fraud
detection includes:
Hassan and Ajith [IA16] designed an innovative ensemble Insurance Fraud
Detection (IFD) model using decision tree, support vector machine and
artificial neural network base-classifiers. The base-classifiers were
evaluated using an automobile insurance dataset, and the empirical results
illustrate that the proposed models gave better results.

² The cost-sensitive approach is also presented in the next Section, as it is
thought to be a compromise between data and algorithm level methods in the
literature.

Perera et al. [Per+13] applied a novel ensemble approach to click fraud
detection, based on a set of new features derived from existing attributes.
The ensemble model, which is based on six different learning algorithms,
showed improved results on the training, validation and test datasets, thus
demonstrating its generalizability to different datasets.
Zareapoor and Shamsolmoali [ZS15] applied bagging classifiers based on the
decision tree algorithm to detect fraud in credit card transactions. The
performance of the classifier was found to remain stable as the fraud rate was
gradually increased during evaluation, and it required less computational
time.
Sohony et al. [IRU18] applied an ensemble machine learning approach to detect
fraudulent credit card transactions. Their ensemble learning model combines a
random forest and a neural network with the aim of addressing the skewed
nature of financial transaction datasets. The results of their experiment show
a high prediction accuracy.
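
The bagging idea used by several of the works above can be sketched as follows: each base classifier is trained on a bootstrap sample and the predictions are combined by majority vote. This is a minimal illustration with a one-dimensional threshold classifier as the base learner; the function names and the toy data are assumptions and do not reproduce any of the cited models.

```python
import random
from collections import Counter


def train_stump(sample):
    """Fit a 1-D threshold classifier: predict 1 when x >= threshold."""
    best = None
    for threshold, _ in sample:
        errors = sum((x >= threshold) != bool(y) for x, y in sample)
        if best is None or errors < best[0]:
            best = (errors, threshold)
    return best[1]


def bagging(data, n_models=11, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    return [train_stump([rng.choice(data) for _ in data]) for _ in range(n_models)]


def predict(thresholds, x):
    """Combine the base classifiers by majority vote."""
    votes = Counter(int(x >= t) for t in thresholds)
    return votes.most_common(1)[0][0]


# (amount, label) pairs: label 1 marks a fraudulent transaction.
data = [(5, 0), (8, 0), (12, 0), (15, 0), (20, 0), (90, 1), (120, 1), (150, 1)]
model = bagging(data)
print(predict(model, 10), predict(model, 130))
```

An odd number of base classifiers is used so that the vote cannot be tied. Individual stumps may differ because each sees a different bootstrap sample, and the vote smooths out these differences.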
3.2.3 Cost-sensitive Learning Methods
The cost-sensitive learning method falls between the data level and algorithm
level approaches. It incorporates both data level transformations, which add
costs to instances, and algorithm level modifications, which modify the
learning process to accept costs [Gal+12]. In classification tasks with
unbalanced data, it is usually more important to correctly predict positive
(minority) instances than negative (majority) instances [WMZ07].
According to Ling [LS10b], cost-sensitive learning can be classified into two
categories. The first category is the design of classifiers that are
cost-sensitive in themselves, e.g. decision trees [DH00]. The second category
is the design of a wrapper that converts any existing cost-insensitive
classifier into a cost-sensitive classifier. Their goal is to minimize the
misclassification cost using standard classification algorithms [Mal+97]. In
the literature, many machine learning algorithms have been proposed for
cost-sensitive learning, including but not limited to decision tree [WMZ07],
ANN [CB13] and SVM [CGC05].
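
The second category, a wrapper around a cost-insensitive classifier, can be sketched with simple threshold moving. Assuming correct predictions cost nothing, predicting the positive class is cheaper in expectation whenever the estimated probability p satisfies p * cost_fn >= (1 - p) * cost_fp, that is, p >= cost_fp / (cost_fp + cost_fn). The class name, the toy scoring function and the cost values below are hypothetical.

```python
class CostSensitiveWrapper:
    """Wrap any scoring function that returns P(positive | x)."""

    def __init__(self, predict_proba, cost_fp, cost_fn):
        self.predict_proba = predict_proba
        # Predict positive when p * cost_fn >= (1 - p) * cost_fp.
        self.threshold = cost_fp / (cost_fp + cost_fn)

    def predict(self, x):
        return int(self.predict_proba(x) >= self.threshold)


def toy_model(x):
    """Hypothetical cost-insensitive scorer: probability grows with amount."""
    return min(1.0, x / 1000.0)


plain = CostSensitiveWrapper(toy_model, cost_fp=1.0, cost_fn=1.0)    # threshold 0.5
costly = CostSensitiveWrapper(toy_model, cost_fp=1.0, cost_fn=9.0)   # threshold 0.1
print(plain.predict(200.0), costly.predict(200.0))  # 0 1
```

The underlying scorer is unchanged; only the decision threshold moves, which is why the wrapper requires the misclassification costs to be known or estimated.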
In [SBD13], Sahin et al. applied a cost-sensitive decision tree induction
algorithm to identify credit card fraud. As a contribution to knowledge, they
used varying misclassification costs in their cost-sensitive decision tree
induction algorithm. Similarly, in [CZZ13], an optimized cost-sensitive SVM
was proposed to improve classification performance by simultaneously
optimizing the feature subset and the misclassification cost parameters. The
results show that the proposed approach is effective in comparison with
commonly used sampling techniques. However, estimating the cost in the
cost-sensitive approach is considered highly challenging, because costs are
not explicitly available or easy to estimate. This makes the approach
difficult to adopt as a solution for unbalanced learning [WMZ07].
For the purpose of data balancing in this thesis, a data sampling approach
was chosen over the cost-sensitive approach because costs are not explicitly
available or easy to estimate [Mal+97]. Further discussion of the advantages
of the data sampling approach over the cost-sensitive approach can be found in
[EA13; WMZ07; VKN07]. Additionally, to ensure that the best data balancing
approach is adopted, a hybrid technique that combines an over-sampling and an
under-sampling algorithm is employed, since this works better than either one
alone [Cha+02; GS17].
3.3 Classification Performance Measures
As discussed in the Sections above, the applicability of the identified
machine learning techniques to detecting financial transaction fraud can be
evaluated using different performance metrics. Measuring the success of
computational intelligence algorithms is an important step in determining
their suitability for solving their respective problems. This is especially
true for the financial transaction fraud problem, where minor improvements in
performance can lead to large economic benefits [WB16]. A variety of standards
can be used to measure the computational performance of an algorithm, such as
absolute ability, visual mediums, probability of success, and more [WB15;
WB16]. These standards are formulated from the four possible outcomes of a
classification, as shown in Table 3.3.
Table 3.3: Confusion matrix

Predicted \ True class | Positive class | Negative class
Positive class | True positives (TP): Number of examples correctly predicted as pertaining to the positive class. | False positives (FP): Number of examples predicted as positive, which are from the negative class.
Negative class | False negatives (FN): Number of examples predicted as negative, whose true class is positive. | True negatives (TN): Number of examples correctly predicted as belonging to the negative class.
3.4. Case-Based Reasoning 64
It should be noted that none of these measures is adequate by itself, since
the aim of the classification task is to achieve good quality results for both
classes. One way to combine these measures into a single evaluation criterion
is to use the area under the ROC curve (AUC) [Gal+12]. AUC provides a single
measure of a classifier's performance for evaluating which model is better on
average. Thus, AUC can be applied to evaluate learners on imbalanced datasets
[Bra97]. Traditionally, standard classification measures such as Accuracy,
Recall and F-measure are commonly used for evaluating a classifier [Cha+11a].
However, for classification with the class imbalance problem, accuracy is no
longer a proper measure, since the rare class has very little impact on
accuracy compared with the prevalent class [Gu+08]. Recall focuses on the
significant class (e.g. fraud) and is not sensitive to the data distribution,
while F-measure is a good metric due to its non-linear nature. Another common
evaluation metric is the Matthews Correlation Coefficient (MCC), which
measures the quality of a two-class classification. It takes into account the
true and false positives and negatives, and is a balanced measure even when
the classes are of different sizes. An MCC of +1 indicates a perfect
prediction, and -1 a total disagreement [Ran+17].
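
The measures named above can be computed directly from the four confusion-matrix counts. The following is a minimal sketch using the standard definitions; the function name and the example counts are assumptions, and undefined ratios are reported as 0.0 here by convention.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Compute common measures from the four confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f1": f1, "mcc": mcc}


# A classifier that labels every transaction as legitimate, evaluated on
# a sample with 10 frauds among 1000 transactions.
print(classification_metrics(tp=0, fp=0, fn=10, tn=990))
```

In this example the accuracy is 0.99 although no fraud is detected, while recall, F-measure and MCC are all 0, which illustrates why accuracy alone is not a proper measure under class imbalance.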
3.4 Case-Based Reasoning
As already mentioned in Section 3.1, most machine learning techniques rely on
statistically relevant datasets for prediction. Unfortunately, in the absence
of a significant amount of historical data, they tend not to perform well
[Sha+16]. Case-based reasoning (CBR), on the other hand, is a modern
computational method that solves new problems using the solutions (specific
knowledge) of past, similar problems that were successfully solved [Lee08]. A
simple illustration is given in [AP94]: a financial consultant working on a
difficult credit decision task uses knowledge from a similar previous case,
involving a company in similar trouble to the current one, to recommend that
a loan application should be refused. In essence, a case-based reasoner solves
problems by using or adapting previously successful solutions to old problems
[BP89]. The basic principle of CBR is summarised in Figure 3.2.
Figure 3.2: CBR classical paradigm [Cor08]
As shown in Figure 3.2, the classical principle of CBR is to solve a target
problem by retrieving a source case and adapting its solution to fit the
requirements of the target problem. The roots of the CBR methodology in
Artificial Intelligence lie in cognitive science research on human reasoning
and memory organization by Roger Schank [Rog83]. Schank suggests that human
knowledge about the world is mainly organized as memory packets holding
together particular episodes from our lives that were significant enough to
remember. These memory organization packets (MOPs) and their elements are not
isolated but interconnected by our expectations as to the normal progress of
events, called scripts. For example, an MOP may contain a situation in which
some problem was successfully solved; when the person later finds him/herself
in a similar situation, the previous experience is recollected and the person
can try to follow the same steps in order to reach a solution [Rog83]. The
reasoning of the classical CBR paradigm is often organised as a cycle which
specifies the sequence of the various steps. This cycle now serves as a
reference for most studies in this field [Cor08]. The next Section discusses
the CBR process cycle.
3.4.1 Case-Based Reasoning Cycle
In general, CBR is a cyclic and integrated process made up of four steps,
commonly referred to as the four R's: Retrieve, Reuse, Revise and Retain. In a
CBR process, the term case denotes the knowledge memory, which usually
contains details about a problem with a known solution, as well as relevant
information that leads to that solution [Kap12]. This cycle is illustrated in
Figure 3.3.
Figure 3.3: The CBR cycle [AP94]
The CBR process illustrated in Figure 3.3 is discussed below:
1. Retrieve: In a CBR system, once a new transaction (a query) is observed,
the retrieve task examines the transaction description, searches its
knowledge repository (case base) for cases similar to the investigated one,
and ends when the best matching previous case has been found. For matching,
the most common traditional approach is a similarity measure, which can vary
from a rather simplistic one to an advanced one depending on the structure,
knowledge intensity and field of application [Kap12].
2. Reuse: This step is also known as the adaptation stage. The case
retrieved at the retrieve step is examined by a domain expert. If suitable,
it is proposed as the transaction case most similar to the new case.
Otherwise, it is adapted to meet the specific requirements or features of
the new case [Cor08; Kap12]. This step is very domain dependent and is
optional, depending on the application.
3. Revise: This step examines and evaluates the transaction class proposed
for the new case or query to verify its suitability. If successful, it
is used as the class of the new transaction (case retainment). Otherwise,
the case is repaired using domain-specific knowledge [AP94; Kap12]. This
stage is domain dependent and may change among applications [RGDAGC08].
Therefore, the input of a fraud analyst becomes critical at this stage to
correct the system when the detected class of transaction is wrong.
4. Retain: Once the previous steps have been conducted successfully and a
verified transaction class (solution) has been confirmed and validated, a
new case to be retained is formulated. This means that the result of the
problem-solving process is added to the system’s knowledge repository for
possible future use. The retained case consists of the imported problem,
its proven solution and any available surrounding information from
the problem-solving process [Kap12].
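The four steps above can be sketched in code. The following is a minimal illustration only, not the system developed in this thesis: the two numeric features, the inverse-distance similarity measure and the optional analyst_label argument (standing in for the fraud analyst at the Revise step) are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import sqrt


@dataclass
class Case:
    features: tuple[float, ...]  # problem description (hypothetical numeric features)
    label: str                   # known solution, e.g. "fraud" or "legitimate"


def similarity(a, b):
    """Inverse Euclidean distance in (0, 1]; 1.0 means identical (assumed measure)."""
    dist = sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + dist)


def retrieve(case_base, query):
    """Retrieve: return the stored case most similar to the query."""
    return max(case_base, key=lambda c: similarity(c.features, query))


def reuse(retrieved):
    """Reuse: propose the retrieved solution unchanged (no adaptation here)."""
    return retrieved.label


def revise(proposed, analyst_label=None):
    """Revise: an analyst's correction, if given, overrides the proposal."""
    return analyst_label if analyst_label is not None else proposed


def retain(case_base, query, confirmed):
    """Retain: store the solved problem as a new case."""
    case_base.append(Case(tuple(query), confirmed))


def cbr_cycle(case_base, query, analyst_label=None):
    best = retrieve(case_base, query)
    proposed = reuse(best)
    confirmed = revise(proposed, analyst_label)
    retain(case_base, query, confirmed)
    return confirmed


# Hypothetical features: (transaction amount, transactions in the last day)
case_base = [Case((100.0, 1.0), "legitimate"), Case((9000.0, 12.0), "fraud")]
result = cbr_cycle(case_base, (8500.0, 10.0))
```

In this sketch every solved query is retained, so the case-base grows with each cycle; a practical system would normally retain only validated cases.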
In general the CBR cycle has been used over a wide range of applications
in the literature due to its generic characteristics and its adaptation flexibility
across systems [Kap12]. The next Section refers to the use of CBR with other
machine learning algorithms.
3.5 CBR and Machine Learning
In the literature, the coupling of a CBR method with other methods is used
to solve specific problems in various domains [YMR14]. These added methods
could be embedded in any of the stages that compose the CBR cycle. Gener-
ally, CBR systems are flexible systems capable of using the beneficial properties
of other technologies to their advantage. For example, Silva et al. [SVF15]
combined CBR with artificial neural networks (ANNs) for credit risk analysis. Also,
according to Corchado [CL01], “ANNs deal easily (and normally) with numeric
data sets whereas CBR systems deal normally with symbolic knowledge. Even
when symbolic knowledge can be transformed into numeric knowledge and nu-
meric into symbolic, there is always the risk of losing accuracy and resolution
in the data and hence obtaining misleading results”. Therefore, a combination
of CBR systems and ANNs may avoid transforming data and therefore gain
precision [CL01].
Other works in which machine learning is used with CBR include clustering
[KPJ15; MA16; SE13], optimization algorithms [KGA12; Kj04; Yu+16], genetic
algorithms [Isl+02; SH99] and ontologies [AL13; Mar+13; Qin+18]. In addition,
the combination of CBR with machine learning has been widely applied across
various domains. For example, CBR and rule-based reasoning were combined
in a medical diagnosis system [SEDMK14]. The system integrates case-based
reasoning and rule-based reasoning and applies the adaptation process
automatically by exploiting adaptation rules. Both adaptation rules and
reasoning rules are generated from the case-base. The results show that the
proposed approach increases the diagnostic accuracy of the retrieval process
of the CBR system and provides reliable accuracy compared to current breast
cancer and thyroid diagnosis systems. Other applications of CBR and machine
learning in the medical field include [ABL13; CG+13; YMR14].
Ahn et al. [AKH06] applied CBR and a genetic algorithm to customer
classification, i.e. classifying customers into purchasing and non-purchasing
groups. The genetic algorithm was used to optimize the feature weights
and the selection of instances for the CBR method. The experimental results
show that simultaneously optimized CBR may improve classification accuracy
and outperform various optimized models of CBR as well as other
classification models, including logistic regression, multiple discriminant
analysis, artificial neural networks and support vector machines. Other
examples of CBR and machine learning for customer classification include
[LS10a; SK11] and [ZLD14].
Chuang et al. [Chu13] developed CBR-based hybrid models for predicting
bankruptcy, i.e. business failure. The hybrid models developed
in their study include RST-CBR (combining Rough Set Theory with CBR),
RST-GRA-CBR (integrating RST, Grey Relational Analysis and CBR) and
CART-CBR (combining Classification and Regression Tree with CBR). The
RST-GRA-CBR hybrid model is a viable alternative method. It appears to
outperform the other four algorithms in terms of accuracy, as it helps users to
identify similar cases as references for making decisions. Other examples of CBR
and machine learning for business failure include [LX12; PH02] and [SVF15].
CBR and machine learning have been used in the literature for both credit
scoring and fraud detection. For example, Wheeler [WA00] applied
multi-algorithmic CBR to fraud detection. The results of the experiment
suggest that multi-algorithmic CBR can achieve high accuracy rates.
Other applications of CBR and machine learning in credit scoring and fraud
detection include [CH11; Lee08].
The examples above indicate that the CBR method is flexible and viable
for application to various forms of knowledge and domains. The combination
of GA and CBR has been used extensively in the literature. This motivates
the choice of GA for feature weighting in the CBR systems proposed in this
research work. The next Section reviews the relevant work in the literature
where CBR has been applied to the problem of financial transaction
fraud detection.
3.6 Case-Based Reasoning in Fraud Detection
Case-based reasoning (CBR), as already mentioned in Section 3.4, is an
artificial intelligence paradigm for solving new problems by reusing previous
similar problem-solving experiences [AP94]. As an alternative to standard
machine learning methods, CBR comes with a number of advantages when
applied to the field of financial transaction fraud [Wat99]. For example,
case-based reasoning has the ability to (i) learn in the absence of historical
consumption data while continuously improving as more data becomes
available over time; (ii) realize knowledge transfer as spending habits evolve,
as is the case where information on one transaction is exploited to improve
predictions for different yet similar transactions; and (iii) provide
precedent-based justification instead of justifying a solution by showing a
trace of the rules that led to a decision [PH02; Wat99].
The case-based reasoning methodology has proven to be valuable and
successful in a wide range of applications. This is due to its generic
characteristics and its adaptation flexibility across systems [Kap12].
Furthermore, it is more transparent than black-box models such as neural
networks and can operate with limited experience, learning and improving
its predictive accuracy as more data becomes available [Sha+16; PDM15].
There is little research on applying this technique to the detection of
financial fraud patterns; examples include [Ade+16; Kap+12] and [Ade+17].
In [WA00], a multi-agent case-based reasoning approach was applied to the
problem of reducing the number of final-line fraud investigations in the
credit approval process. The adaptive CBR algorithm was found to
have the best performance. The results indicate that an adaptive solution can
provide fraud filtering and case ordering functions that reduce the number of
final-line fraud investigations necessary. However, the model needs to be tested
with similarly complex data sets from other real-world domains.
Park and Han [PH02] used a multi-agent case-based reasoning approach to
reduce the number of final-line fraud investigations in the credit approval
process, achieving precise results. In [Ade+16] and [Kap+12], promising results
were produced with a basic CBR model (i.e. standard CBR) for monitoring and
predicting financial transaction fraud. However, the predictive accuracy of that
model was lower than that of a neural network of similar complexity, and it
featured a relatively high false positive rate. As discussed in [SK13], this
weakness is considered damaging for customer trust due to high false negative
rates. This reflects why precision requirements for operational
fraud detection systems are high and partly explains the current reluctance to
adopt unified industry-wide approaches. In this thesis, an improved CBR
system that uses machine learning capabilities to predict mobile money payment
fraud is proposed.
3.7 Chapter Summary
In this chapter, algorithmic solutions for financial service fraud detection
were discussed. The discussion was divided into supervised and unsupervised
approaches. These two categories were further divided into four groups
based on how they are evaluated, as shown in Figure 3.1. Related work
on the use of machine learning algorithms for financial fraud detection was
reviewed. As a result, logistic regression, random forest, support vector machine
and artificial neural network were selected for preliminary experiments,
as further discussed in Appendix A.1. These choices were based on the
following: logistic regression is easy to use and is one of the most commonly
used techniques for data mining in practice; the random forest classifier has
the ability to capture non-linear data, scales well and offers better visual
representation of results; and artificial neural networks can show higher
accuracy in prediction.
Section 3.2 reviews existing approaches for handling unbalanced data in
predictive algorithms. As a result, a hybrid technique that combines an
oversampling and an undersampling algorithm was chosen to
balance the MMT dataset. According to [Cha+02], this works better than
either one alone. Section 3.3 discusses evaluation techniques (performance metrics)
for measuring a fraud detection system effectively. Finally, Section 3.4 con-
cludes the chapter by providing an introduction to the case-based reasoning
methodology and its application in fraud detection domain.
Chapter 4
Mobile Money Transfer Data
Simulation
Chapter 2 has demonstrated that there is no concrete evidence of research
using real transaction data for the learning stage of fraud detection systems.
This is due to data protection, confidentiality, ethical issues, time and the cost
associated with collecting multiple instances of a diverse set of data sources
[Jes+05]. To circumvent the lack of publicly available
data sets, representative mobile money transfer data is simulated, as suggested
by Lundin et al. [LKJ02].
This chapter presents the simulation of transaction data that resembles a
mobile money transfer payment platform. This is achieved by generating
representative data on mobile money transactions that are as close as possible
to a real-world situation. As the case study for the data simulation, the Mobile
Money Transfer Service scenario requirements developed by EU FP7 MASSIF
(“MAnagement of Security information and events in Service Infrastructures”)
[Lla+11] are used. The EU FP7 MASSIF mobile money transfer service scenarios
were chosen as the case study because, in the literature, only this work
provides the level of depth needed to carry out the investigation.
Section 4.1 describes the mobile money model simulation platform. Section 4.2
expands on this by discussing the users’ behavioural model within the
MMT platform. It describes the different agents within the MMT platform as
well as the events that indicate their behavioural models. Section 4.3 discusses
the implementation of the simulator using the adapted multi-agent based tool.
It also describes the simulated scenarios, the input and output parameters
of the simulation and the simulation walk-through. Section 4.4 discusses
the data evaluation and presents the overall
statistical analysis of the simulated data. Finally, Section 4.5 concludes this
chapter by summarising the simulation approach adopted in this thesis for
mobile money transfer data generation.
4.1 Mobile Money Model
To model the mobile money transfer platform, the mobile money transfer
(MMT) service described in Section 2.2.1 is considered. This service uses
the schema of a real mobile money service that enables end-users to transfer
money to other end-users or buy goods and services from merchants. It is
assumed that the transactions are made with mMoney, which corresponds to
electronic money issued by the operator that manages the service. In
addition, end-users can exchange cash for mMoney and vice versa at mMoney
vendors by depositing cash into or withdrawing cash from their MMT accounts,
as discussed in [Gab+13].
4.2 Users’ Behaviour Model
The user behaviour model corresponds to a multi-agent model that
uses the schema of the real mobile money service to generate synthetic data
with known ground truth. Here, scenarios for legitimate and fraudulent agents
were created from the literature, and elements of their activities
in the scenarios were mapped to the data [WHV08]. In line with the literature
[Gab+13], there are two major types of agent in the simulation of our MMT
platform: (i) legitimate users, who subscribe to the Mobile-based Money
Transfer Service and thus own an account; and (ii) fraudsters, also known as
bad actors, who attack the system and take the place of a legitimate end-user.
4.2.1 Legitimate Actors
There are three major agents involved in legitimate MMT system transactions:
end-users (customers), service providers (merchants and utilities) and
distribution channels (retailers). Each category is made up of several roles,
which are associated with specific actions in the platform. End-users are
individuals who use their mobile devices to access the MMT platform and
carry out transactions through the network provided by their operator.
Service providers sell services or goods to end-users. Retailers are in charge
of the distribution of electronic money to end-users [Gab+13].
In the legitimate agent model simulation, it is assumed that a legitimate
user tends to carry out a specific set of transactions frequently and repeatedly,
i.e. their transactions are mostly related to their habits. According to Gaber
et al. [Gab+13], a habit is a repetition of a sequence of legitimate transactions
characterized by (i) a type of transaction, (ii) a normally distributed
transaction amount, (iii) a normally distributed period of time between two
transactions of the considered habit, (iv) an initial date and (v) a final date.
From this, it can also be deduced that the set of actors in the mobile
money ecosystem with whom a user interacts on a regular basis can be
referred to as that user’s Community of Interest (COI). Any shift or deviation
from this may be viewed as an anomaly or as suspicious. This concept was also
used in [CPV01] and [Gab+13].
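The five characteristics of a habit listed above can be illustrated with a short generator. This is a minimal sketch under stated assumptions, not the simulator used in this thesis: the function name simulate_habit, the parameter values, the use of hours as the period unit and the truncation of draws at a small positive floor are all hypothetical choices made for the example.

```python
import random
from datetime import datetime, timedelta


def simulate_habit(tx_type, amount_mean, amount_sd,
                   period_mean_h, period_sd_h, start, end, rng):
    """Generate (timestamp, type, amount) tuples for one habit.

    Amounts and inter-transaction periods are drawn from normal
    distributions; both are floored at 1.0 (an assumption) so that no
    negative amount or backwards step in time can occur.
    """
    transactions = []
    current = start
    while True:
        gap_hours = max(1.0, rng.gauss(period_mean_h, period_sd_h))
        current = current + timedelta(hours=gap_hours)
        if current > end:  # (v) the final date closes the habit
            break
        amount = max(1.0, round(rng.gauss(amount_mean, amount_sd), 2))
        transactions.append((current, tx_type, amount))
    return transactions


rng = random.Random(42)  # fixed seed so the run is repeatable
txs = simulate_habit(
    tx_type="P2P",                  # (i) type of transaction
    amount_mean=1500.0,             # (ii) amount ~ Normal(1500, 200), in Ksh
    amount_sd=200.0,
    period_mean_h=72.0,             # (iii) period ~ Normal(72 h, 12 h)
    period_sd_h=12.0,
    start=datetime(2018, 1, 1),     # (iv) initial date
    end=datetime(2018, 2, 1),       # (v) final date
    rng=rng,
)
```

With these values the habit produces roughly one transfer every three days over one month.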
4.2.2 Bad Actors
Bad actors correspond to misuse cases of the system by “fraudsters”, and they
can follow a pattern in which many parameters can evolve. In order to simulate
these agents, the three misuse scenarios identified in the EU FP7 MASSIF
project [Lla+11] were explored. They are described in detail below:
1. Account takeover as a result of SIM splitting or loss of device. In this
case, an attacker gains access to the device of an mWallet holder and
knowledge of the authentication credentials. The attacker then carries out
several transactions, purchases or money withdrawals. Events that indicate the
scenario: abnormal behaviour or a shift in the behaviour of end users compared
to a specific profile or to their usual behaviour [Lla+11].
2. A retailer who is complicit in money laundering activities and
facilitates the opening of an account despite knowing that the account will
be loaded with funds coming from criminal activities. In this case, the
retailer does not require the subscriber to provide a true identity or any
specific identity. The fraudulent subscriber uses the mWallet to manage
money coming from criminal activities. We assume that the fraudulent
subscriber will withdraw money or make a purchase corresponding to
the amount of money stolen at a certain time after receiving it. Events
that indicate the scenario: an atypical use of mWallets or an abnormal
volume and frequency of cash transactions compared to a specific profile,
e.g. an mWallet used only for withdrawal or P2P transfer, or multiple funding
and loading sources of the mWallet followed by withdrawals shortly
afterwards [Gab+13; Lla+11].
3. Account management system compromise by an attacker. Here, the
attacker takes control of the account management system and uses the
system to change account data (identity, balance, credit/debit accounts,
etc.). Events that indicate such a misuse case are a change in the global
balance of mMoney in the system, an intrusion into the account
management system, or a large number of transfers from several mWallets used
to fund one specific account [Lla+11].
4.3 Implementation
The multi-agent based simulation was implemented by simulating the
combined behaviour and habits of several users in a mobile money
environment. The simulator was built according to the methodology proposed in
[LKJ02], as shown in Figure 4.1 (aggregated data from real transactions was
not injected into the simulation, as mentioned earlier in Section 2.3.3), and
implemented using the adapted multi-agent based simulator (MABS) developed by
Lopez et al. [LrA12b]. The main reason is that the proposed methodology has
a well-defined interface, which makes it easy to use, while MASON facilitates
the implementation of social networks. As a contribution, misuse scenarios
involving SIM swap and a retailer facilitating the opening of an end-user
account, which were not considered in [Gab+13; LrA12b], were modelled. Also, to
trace the sequence of events in the generated data, a time-stamp parameter was
included in the simulation, rather than the steps or the category of users over
a period of time used in [Gab+13] and [LrA12b] respectively.
Figure 4.1: The synthetic log data generation process [LKJ02]
In Figure 4.1, the user profile configuration (for both the user and
attacker simulators) involves agents with two types of account profile: Personal
(P1) and Business (P2). To keep the model simple while reflecting
a realistic scenario (using MPESA as the case study), it is assumed that all
values are given in Kenyan Shillings, with profile P1 having a maximum daily
limit of 35,000 Ksh (approx. 285 Pounds) for all transactions and a maximum
account balance of 70,000 Ksh. For profile P2, both thresholds are
increased to 140,000 Ksh.
In the user/attacker simulator, agents are switched from one profile to
another using a Markov matrix of transition probabilities (a Markov matrix was
chosen because it can move an agent from one state to another
and is commonly used in the literature for sequence evolution). This tells
the system when to change from Active to Inactive and from profile P1 to
profile P2, which allows higher limits for transactions. The output
acts as simulated user actions, which are then fed into the system simulator. As
part of the configuration of the system simulator, each client (i.e. user) has
five possible actions in each step of the simulation, i.e. the category of
transaction they can perform, which is considered a “habit” in the simulation.
They can make a money deposit (MD), money withdrawal (MW), merchant
payment (MP), person-to-person transfer (P2P) or airtime recharge (AR). The
autonomy of the agent is implemented by a probabilistic transition function
that computes the type of operation and the action that an agent will perform
in each step. This transition function depends on client attributes such as
the category of user and the amount, which is calculated according to the
balance and the limits of each client’s profile [LrA12b; Zhd+14]. Other
configuration parameters and data are provided in Sections 4.3.2 and 4.3.3 and
Appendix A.1.
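The state switching described above can be sketched as a small Markov chain. This is an illustration only: the Active/Inactive transition probabilities are hypothetical, the upgrade probability reuses the default UpgradeAccountRate of 0.01 from Table 4.1, and treating the P1 to P2 upgrade as one-way is an assumption of the sketch rather than a documented property of the simulator.

```python
import random

# Hypothetical transition probabilities; each row must sum to 1.
ACTIVITY_TRANSITIONS = {
    "Active":   {"Active": 0.9, "Inactive": 0.1},
    "Inactive": {"Active": 0.3, "Inactive": 0.7},
}

# Default UpgradeAccountRate from Table 4.1, used here as a per-step probability.
UPGRADE_RATE = 0.01


def next_state(current, transitions, rng):
    """Draw the next state from the row of the matrix for the current state."""
    states = list(transitions[current])
    weights = [transitions[current][s] for s in states]
    return rng.choices(states, weights=weights, k=1)[0]


def step_agent(activity, profile, rng):
    """Advance one agent by one simulation step."""
    activity = next_state(activity, ACTIVITY_TRANSITIONS, rng)
    # Assumed one-way upgrade: a personal account may become a business account.
    if profile == "P1" and rng.random() < UPGRADE_RATE:
        profile = "P2"
    return activity, profile


rng = random.Random(0)
activity, profile = "Active", "P1"
history = []
for _ in range(1000):
    activity, profile = step_agent(activity, profile, rng)
    history.append((activity, profile))
```

Keeping the matrix as a dictionary of rows makes it straightforward to vary the probabilities between simulation runs.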
For each simulation, the parameters and probabilities of occurrence for the
transition can be modified to improve the quality of the simulation towards
a more realistic scenario. Additionally, the implementation of the type of
action each client can perform is based on pseudo random transitions. These
pseudo random transitions are based on 3 different configurations using the
percentage of account balance in comparison with the maximum limit allowed
by the client profile (Lower than 15%, higher than 80% and medium balance
which is between low and high) [LrA12a]. The agent has a higher probability
to make a deposit when the balance is low. When the balance is high the agent
has a higher probability to make a withdrawal, transfer, pay a merchant, or
airtime recharge rather than a deposit [LrA12b].
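The balance-dependent choice of action can be illustrated as follows. The 15% and 80% thresholds are taken from the text above and the 70,000 Ksh maximum balance is the P1 limit stated earlier; the action weights in each band are hypothetical values chosen only to reproduce the described tendency (deposits more likely at low balance, less likely at high balance).

```python
import random

ACTIONS = ("MD", "MW", "MP", "P2P", "AR")

# Hypothetical weights per balance band, in the order of ACTIONS.
ACTION_WEIGHTS = {
    "low":    (0.60, 0.10, 0.10, 0.10, 0.10),  # deposit favoured
    "medium": (0.20, 0.20, 0.20, 0.20, 0.20),  # no preference
    "high":   (0.04, 0.24, 0.24, 0.24, 0.24),  # deposit unlikely
}


def balance_band(balance, max_balance):
    """Classify a balance as a fraction of the profile's maximum balance."""
    ratio = balance / max_balance
    if ratio < 0.15:
        return "low"
    if ratio > 0.80:
        return "high"
    return "medium"


def choose_action(balance, max_balance, rng):
    """Pseudo-randomly pick one of the five actions for this step."""
    band = balance_band(balance, max_balance)
    return rng.choices(ACTIONS, weights=ACTION_WEIGHTS[band], k=1)[0]


P1_MAX_BALANCE = 70_000  # Ksh, maximum account balance for profile P1
rng = random.Random(1)
low_balance_draws = [choose_action(5_000, P1_MAX_BALANCE, rng) for _ in range(2000)]
```

A balance of exactly 15% or 80% of the maximum falls in the medium band in this sketch, since the text defines the low and high bands with strict inequalities.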
4.3.1 Simulated Scenarios
The system employs the concept of social networks to model agents’ behaviour
and interconnection within a mobile money transfer service. In a social
network, nodes connected by edges constitute a social graph.
The nodes represent mobile money users, while the edges describe the social
relationship between two mobile money users. Figure 4.2 illustrates the social
graph of mobile money users in the simulation of an MMT system. The graph
shown in Figure 4.2b represents a small simulation of 35 agents across 7 cities
(the nodes represent the 7 cities). This is used to explain what the edges and
nodes of the desired simulation scenario look like. The red nodes represent
fraudsters, while the purple nodes represent legitimate users. Both kinds of
node represent end users, and their edges represent the interaction between
pairs of end users (i.e. money sent or received).
Figure 4.2: Screen-shot of MMT simulation window
In total, 2000 end users were created from 7 different cities, performing
several transactions with partners either inside or outside their city, as shown
in Figure 4.2a. As part of the simulation configuration, around 10%
of the end users behave as malicious agents (fraudsters), although
in a real-life scenario it is more common to find a lower percentage of
fraudsters. The rationale for using a high percentage of fraudsters is to prevent
the class imbalance problem during the training of the proposed prediction model
[LrA12b]. The transaction sequence is expected to follow this pattern [Lla+11;
Zhd+14]: (i) authentication, (ii) transmission of the sender’s payment
instructions and transaction details to the MMT platform, (iii) authorization by
the MMT platform, (iv) credit and debit on the receiver’s and sender’s accounts
respectively. When a simulated user carries out a transaction, all the
transaction details are stored in a log file. Each entry contains the transaction
type, transaction amount, sender and receiver pre- and post-transaction balances,
sender and receiver categories, and transaction time-stamp [Zhd+14]. The generated
database contains all simulated transactions for six months.
4.3.2 Input Parameters
The simulation starts with some basic initial input parameters that can be
provided from aggregated statistics or supplied randomly based on the user’s
experience¹. These values are automatically set as the default values when the
simulation is loaded, as shown in Figure 4.3.
¹ Some of the parameters used in the simulation were obtained from the work of Lopez-Rojas in [LrA12a].
Figure 4.3: Simulation window for input parameters
The input parameters are the initial parameters needed to run the sim-
ulation. The values for these parameters can be entered via the simulation
console or input files, as shown in Figure 4.3. A detailed description of the
input parameters is given in Table 4.1.
Table 4.1: Simulation input parameters
Parameter            Default Value   Description
RandomMultiplier     0.1             Multiplier for execution
MaxNeighbour         10              Maximum number of friends a mobile money transfer user
                                     associated with a device can have
ClientsBalance       -               Client’s balance, generated randomly during the execution
                                     of the simulation using a probabilistic function
MaxOtherNeighbour    2               Maximum number of friends a mobile money transfer user
                                     can delegate to when in an inactive state
UpgradeAccountRate   0.01            Rate at which an account is upgraded from personal to
                                     business during the simulation
TransactionRate      0.5             Rate at which a transaction is initiated for each mobile
                                     money transfer user
Trans                -               Transaction category
NumClients           2000            Number of mobile money transfer user devices
Types                -               Type of operation a user performs in the simulation
NumCities            7               Number of cities
The data simulation was run five times, and in each run the
parameters were varied (except for the number of clients, the number of
cities, TransactionRate and UpgradeAccountRate). This allows the behavioural
pattern of each client (i.e. user) to change between simulations. The values
used for the parameters during the simulations are provided in Appendix B.
4.3.3 Output Parameters
The output parameters are based on the results from simulating the behaviour
of several clients interacting in a mobile money environment. A log of
transactions is produced as output from the simulation (MABS). This log was built
to contain the attributes described in Table 4.3.
Table 4.3: Simulation output parameters
Parameter      Description
Time-stamp     Date and time of transaction
Client ID      Sender’s account number
Usercategory   User’s grouping according to their possible spending behaviour
               (as discussed in Section 4.4)
Profile        Sender’s account type (P1-personal or P2-business)
Location       Sender’s location
TypeTrans      Type of transaction operation (i.e. MD, MW, P2P, MP and AR)
Amount         Transaction amount
Balance        Sender’s account balance after each transaction
ClientB ID     Receiver’s account number
ProfileB       Receiver’s account type (P1-personal or P2-business)
LocationB      Receiver’s location
BalanceB       Receiver’s account balance after each transaction
Suspicious     Transaction label (fraud or non-fraud)
FraudType      Class of misuse scenario described in Section 4.2.2
These parameters were selected because they are typical descriptors of any
mobile money transaction instance. The simulation process
is discussed in the next Section.
4.3.4 Simulation Walkthrough
At the beginning of the simulation, the first step
is to set up the agents and their locations. The clients that will be
present in the simulation are then randomly generated, and each client is
assigned an ID. A client’s state at each time depends on a Markov transition
matrix that determines when to change from Active to Inactive and from a
personal to a business account with higher limits on daily transactions. The
clients in this simulation have basic operations: they can make a deposit,
withdrawal, person-to-person transfer, merchant payment or airtime purchase, or
decide not to perform any transaction. If a client needs to perform an action,
it conducts a local search within its network to see which of its neighbours
are in the Active state. If the search is successful, it places a request for a
type of operation using a probabilistic transition function. The request placed
depends on the client’s account balance, the daily limits of the client’s
account type and the user’s spending habit category. When the balance is high,
the agent has a higher probability of making a withdrawal, transfer, merchant
payment or airtime recharge than a deposit. Figure 4.4 outlines these activities.
Figure 4.4: A flowchart representing the simulation walk-through
If the search is unsuccessful, the client can delegate the request
to an inactive client, which conducts a local search within its own
neighbourhood for a mediator. Once this is achieved, a routing record is created
with information about the originator of the request. At each pass, the routing
record is updated with information about the intermediate requestor. If, at some
point, the search is successful, the delegated client places a request to perform
an operation on behalf of the initial requesting client. The delegation of a
request stops after a search has been conducted in the neighbourhood one level
above the requesting client’s level. For each simulation, the input parameter
values were modified in order to improve the quality of the simulation. A total
of 2000 end users were created from different cities, performing several
transactions with partners either inside or outside their network. The data
simulation was run five times over a simulated period of six months. In each
run, the parameters were varied, except for the number of clients and cities,
so that changes in the behavioural pattern of each client (i.e. user) could be
captured. The simulator stores transaction details in a log file, and each entry
contains information such as the transaction type, amount, sender and receiver
profile (ID, account type) and time-stamp. The generated files were merged and
cleaned to be used as input data for the proposed prediction approach.
At the end of the data preparation phase, a total of 497,565 transactions
had been generated, with 5,900 transactions (1.2%) labelled as suspicious. That
ratio is a realistic representation of the class imbalance problem common in
financial transaction datasets, where one of the classes is very large in
comparison to the other. To evaluate whether the dataset itself offers a
realistic representation, the log data was evaluated. This is discussed in
Section 4.4 below.
4.4 Evaluation of the Log Data
The simulated MMT platform, the format of the generated logs and the multi-agent
based simulator have been validated in [Gab+13], [Lla+11] and [LrA12b]
respectively. To evaluate the generated dataset in the absence of real-world data
as input to the simulator, the simulated data was verified by checking
constraints such as positive balance numbers, account age, and consistency between
the transfers, deposits and withdrawals and the changes in account balances
(as discussed earlier in Sections 2.3.5 and 2.4). In addition, to verify whether
the amounts and periods of the simulated data were normally distributed, as in
the case of Gaber et al. [Gab+13], a chi-square test was employed. For the
evaluation, the end-users were selected randomly and manually. To avoid too
much complexity, the number of end-users used in the evaluation was limited to
20, and only the 905 users who had made more than 60 transactions (the average
number of transactions generated per end user) were considered. The end users
were separated into four groups according to the days of the week on which the
mWallet account is used, as adapted from [Gab+13]. This makes it possible to
highlight the days of the week on which the mWallet account is used most often,
and therefore provides a suitable representation of spending behaviour frequency
in the MMT platform, as described in Table 4.5.
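The constraint checks mentioned above (positive balances and consistency between transaction amounts and balance changes) can be sketched for a single transfer. This is an illustrative sketch only: the TransferEntry record, its field names and the tolerance are hypothetical, and it assumes a transfer in which the sender is debited and the receiver is credited by the same amount.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TransferEntry:
    """Hypothetical view of one logged transfer (field names are assumptions)."""
    amount: float
    sender_before: float
    sender_after: float
    receiver_before: float
    receiver_after: float


def check_transfer(entry: TransferEntry, tol: float = 0.01) -> List[str]:
    """Return a list of violated constraints; an empty list means the entry is consistent."""
    problems = []
    if entry.amount <= 0:
        problems.append("non-positive amount")
    balances = (entry.sender_before, entry.sender_after,
                entry.receiver_before, entry.receiver_after)
    if min(balances) < 0:
        problems.append("negative balance")
    # The sender must be debited by exactly the transferred amount.
    if abs(entry.sender_before - entry.amount - entry.sender_after) > tol:
        problems.append("sender balance inconsistent")
    # The receiver must be credited by exactly the transferred amount.
    if abs(entry.receiver_before + entry.amount - entry.receiver_after) > tol:
        problems.append("receiver balance inconsistent")
    return problems


consistent = TransferEntry(500.0, 2000.0, 1500.0, 100.0, 600.0)
inconsistent = TransferEntry(500.0, 300.0, -200.0, 100.0, 650.0)
```

Deposits, withdrawals and airtime purchases would need their own rules, since only one mWallet balance changes in those cases.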
Table 4.5: Groups of users according to possible days of the week the mWallet account is used.
Class Days of the Week Possible spending representation
S M T W T F S
A x x Low (up to 2 days per week)
B x x x Medium (up to 3 days per week)
C x x x x x High (up to 5 days per week)
D x x x x x x x Very high (up to 7 days per week)
It is worth pointing out that habitual behaviours can develop from the
spending activity of legitimate users, i.e. the transactions of each mWallet
account holder include recurring patterns regarding shopping areas, the group
of persons with or to whom transfers are made, the amount spent, etc. [Kok97].
According to Zhdanova et al. [Zhd+14], such a habitual behaviour is expected to
have (i) a type of transaction, (ii) a normally distributed transaction amount,
(iii) a normally distributed period of time between two transactions of the
considered habit, (iv) an initial date and (v) a final date. Based on these
observations, a chi-square test was used to verify whether both the amounts and
the periods are normally distributed. The rationale for using the chi-square test
is based on the fact that it is easy to compute and robust with respect to the
distribution of the data [Mch13]. For the chi-square analysis, a significance
level of 5% was used to check whether the amount and periods were normally
distributed. The results are summarized in Table 4.6.
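The normality check described above can be illustrated with a minimal sketch. It is not the procedure used to produce Table 4.6, whose binning and tooling are not stated here; it assumes eight equiprobable bins under a normal distribution fitted to the sample, and it reads the 5% critical value from a small hard-coded table. The function name `chi_square_normality` and the sample amounts are hypothetical.

```python
import random
from statistics import NormalDist, mean, stdev

# Upper 5% critical values of the chi-square distribution, keyed by degrees of freedom.
CHI2_CRIT_5PCT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 6: 12.592, 7: 14.067}

def chi_square_normality(values, bins=8):
    """Chi-square goodness-of-fit test of `values` against a fitted normal distribution.

    Returns (statistic, critical_value, is_normal_at_5_percent).
    """
    n = len(values)
    fitted = NormalDist(mean(values), stdev(values))
    # Equiprobable bins under the fitted normal: each bin expects n / bins observations.
    edges = [fitted.inv_cdf(i / bins) for i in range(1, bins)]
    observed = [0] * bins
    for v in values:
        observed[sum(v > e for e in edges)] += 1
    expected = n / bins
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    dof = bins - 1 - 2  # two parameters (mean and standard deviation) were estimated
    critical = CHI2_CRIT_5PCT[dof]
    return statistic, critical, statistic <= critical

# Hypothetical transaction amounts of one habit (assumption, not thesis data).
rng = random.Random(0)
amounts = [rng.gauss(500.0, 50.0) for _ in range(200)]
stat, crit, normal = chi_square_normality(amounts)
print(f"chi2 = {stat:.2f}, critical = {crit}, normal at 5%: {normal}")
```

The same function would be applied separately to the amounts and to the periods between transactions of each selected user; normality is not rejected when the statistic does not exceed the critical value.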
Table 4.6: Results of chi-square test for each set of users (each cell: amount / period / user)
Class   MD                  MW                  P2P                 MP                  AR
A       100% / 100% / 5     100% / 100% / 5     100% / 100% / 5     100% / 100% / 5     100% / 100% / 5
B       100% / 100% / 5     100% / 100% / 5     100% / 100% / 5     100% / 100% / 5     100% / 100% / 5
C       100% / 100% / 4     100% / 100% / 5     100% / 100% / 5     100% / 100% / 5     100% / 100% / 5
D       100% / 100% / 5     100% / 100% / 5     100% / 100% / 5     100% / 100% / 5     100% / 100% / 5
Total   100% / 100% / 20    100% / 100% / 20    100% / 100% / 20    100% / 100% / 20    100% / 100% / 20
As shown in Table 4.6, the chi-square results are organized according to
the user spending behaviour defined in Table 4.5 and the type of
transaction performed, each of which is considered a habit. The results in
Table 4.6 show that, for all of the selected users, both the amounts and the
periods are normally distributed. On this basis, the dataset
is considered realistic. However, the uniformity of the results may be due to
the fact that the average percentage of transactions used in the selection of
participants was 60%. The accuracy of the generated log could not be calculated
due to the lack of authentic sample data in the simulation.
4.4.1 Quality of Data
Data quality is highly important in information discovery, analysis and
prediction. Poor-quality data can produce unrealistic answers in analysis
and prediction problems [HSW07]. For this reason, several queries were
executed to check for missing values, errors and outliers in the generated
dataset.
• Missing values: transaction data with missing values (unverified client
profile) were used as part of the analysis, although there were few
missing values compared to the total volume of the data. These missing
values were left for further analysis because they may give some insight
into the scenario of a retailer who is complicit in money laundering
activities and facilitates the opening of an account despite knowing
that the account will be loaded with funds coming from criminal
activities.
• Errors: in the generated dataset, constraints such as positive balance
numbers, consistency between the transfers, deposits and withdrawals
with the changes in account balances were checked. There were a few
entries with negative values for the amount transacted which is impossi-
ble since the MMT platform was not designed to accommodate overdraft.
These entries were excluded from the analysis.
• Outliers: transaction data that are inconsistent with the remainder of
the dataset or deviate markedly from other observations were identified
and examined (≈ 0.01% of the total transactions). In the context of
fraud analysis or detection, outliers have a higher probability of being
fraudulent [BK17]. For this reason, these entries were left in the data for
further analysis.
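The three checks listed above can be sketched as follows. This is only an illustration: the queries actually executed are not reproduced in this chapter, the record layout (dictionaries with `amount` and `profile` keys) is hypothetical, and the z-score rule used to flag outliers is an assumption, since the outlier criterion is not stated.

```python
from statistics import mean, stdev

def quality_report(transactions, z_threshold=3.0):
    """Summarise missing profiles, negative amounts and amount outliers."""
    # Missing values: records with an unverified client profile (kept for analysis).
    missing = [t for t in transactions if t.get("profile") is None]
    # Errors: negative amounts are impossible without overdraft, so they are excluded.
    negative = [t for t in transactions if t["amount"] < 0]
    kept = [t for t in transactions if t["amount"] >= 0]
    # Outliers: flagged with a z-score rule (an assumption) but left in the data.
    amounts = [t["amount"] for t in kept]
    mu, sigma = mean(amounts), stdev(amounts)
    outliers = [t for t in kept
                if sigma > 0 and abs(t["amount"] - mu) / sigma > z_threshold]
    return {"missing": len(missing), "negative": len(negative),
            "outliers": len(outliers), "kept": kept}
```

Only the entries with negative amounts are dropped from `kept`; records with missing profiles and outlying amounts remain available for the later fraud analysis, in line with the treatment described above.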
The descriptive representation of the dataset after pre-processing follows
in the next Subsection.
4.4.2 Data Analysis
The generated dataset consists of 497,565 logged transactions recorded
between 1st September 2013 and 28th February 2014. This Section
presents the descriptive representation of the simulated mobile money dataset.
Table 4.7 summarises the properties of the simulation dataset, i.e. the minimum
and maximum values, mean and standard deviation. Figure 4.5 shows the numbers
of non-fraud and fraud transactions. The number of fraud and non-fraud
transactions for each of the five transaction operations performed is shown in
Figure 4.6. Figure 4.7² shows the categories of users based on their frequency
of transactions. Figure 4.8 shows the fraction of fraud types per category
of transaction in the MMT dataset. Figure 4.9 shows the total number of
different fraud types. Figure 4.10 shows the line plot of the amount fraction in
Kenya Shillings over the period of simulation. Finally, Figure 4.11 shows
a relationship graph for the 100 most active MMT users in the simulation.
² The description of the simulated data output parameters is presented in Section 4.3.2.
Table 4.7: MMT dataset statistics
Attribute         Minimum   Maximum      Mean        Std. Deviation
usercategory      1         4            2.497       1.121
profile           1         2            1.149       0.356
location          1         7            3.421       2.196
typeTrans         1         5            3.408       1.296
amount (KES)      0.18      139998.46    25124.354   25736.816
balance (KES)     0.03      139998.06    25301.94    26036.484
profileB          1         2            0.837       0.599
locationB         1         7            2.354       2.367
balanceB (KES)    0.05      139993.95    18301.084   24755.301
FraudType         1         4            1.024       0.235
suspicious        0         1            0.012       0.108
Figure 4.5: Number of non-fraud and fraud transactions
As can be seen from Figure 4.5, the MMT dataset has a ratio
of 98.8:1.2 of non-fraud to fraud transactions. This reflects the
class imbalance problem common in financial transaction
datasets, where one of the classes is very large in comparison to the other.
Figure 4.6: Different transaction services performed in the simulation
Figure 4.6 shows the distribution of the fraud and non-fraud classes for the
different types of MMT operations carried out (i.e. Deposit, Withdrawal, Transfer,
Pay Merchant and Buy Airtime).
Figure 4.7: Categories of users based on their frequency of transactions
Figure 4.7 shows the distribution of the different classes
of transactions based on the frequency of user spending behaviour, as described in
Section 4.4.
Figure 4.8: Fraction of fraud types to category of transaction in the MMT dataset.
Figure 4.8 shows the fraction of fraud types (as described in Section
4.2.2) for each category of transaction performed in the simulated dataset.
Figure 4.8 indicates that a high proportion of fraudulent transactions
were carried out in the Pay Merchant, Transfer and Withdrawal categories
of transaction.
Figure 4.9: Total number of different fraud types
Figure 4.9 shows the distribution of the three misuse scenarios introduced
into the data simulation as mentioned in Section 4.2.2 after the data pre-
processing phase.
Figure 4.10: Line plot of amount fraction in Kenya Shillings (Sept. 2013 - Feb. 2014)
Figure 4.10 shows the daily fraction of the transaction amount over the
period of simulation. The relationships between the most active
users in the simulation are presented in Figure 4.11, produced using the social
network analysis tool Gephi.
Figure 4.11: A relationship graph for the 100 most active MMT users in the simulation. The blue and red nodes represent legitimate and bad actors respectively, and the edges represent relationships between the actors.
4.5 Chapter Summary
This chapter addressed the lack of publicly available datasets on
mobile money transfer by simulating transaction data using the synthetic data
generation method discussed in Section 2.3.2. To simulate the MMT platform,
the multi-agent based simulator (MABS) in [LrA12b] was used to generate mobile
money transfer transaction data using the misuse scenarios in the EU FP7 MASSIF
project [Lla+11]. As a distinct contribution, misuse scenarios involving SIM
swap and a retailer facilitating the opening of an end-user account, which
were not considered in [Gab+13] and [LrA12b], were modelled. Also, to trace the
sequence of events in the generated data, a time-stamp parameter was included in
the simulation rather than steps or category of users over a period of time, as
used in [Gab+13] and [LrA12b] respectively. To ensure that the modelled data
characteristics are as close as possible to a real-world situation, combinations
of user habits as well as their behaviour were incorporated in the simulation,
as proposed by Gaber et al. [Gab+13]. The results of the evaluation suggest that
the generated dataset is close to an actual transaction
dataset. This dataset will be used to train and test the proposed predictive
model to further verify its applicability to fraud detection tools.
Chapter 5
Design of Mobile Money Transfer Fraud Detection System
Throughout the literature review in Chapter 2, it was consistently shown that
pattern recognition models using machine learning algorithms can be used
efficiently to identify financial transaction fraud. However, there
is a need for a more efficient detection algorithm to address issues that can
affect their performance, such as changes in the spending habits of legitimate
users and the evolving sophistication of fraudsters.
This chapter presents the research approach towards the identification of
money transfer fraud in Mobile Money Transfer (MMT) environments. In order
to address the stated research questions, an integrated fraud detection
system in a mobile money transfer environment is used as a case study for the
ongoing research. Section 5.1 describes the framework proposed
in this thesis to predict fraud in a financial transaction service environment.
The framework illustrates the different components and algorithms used by
the approach, as well as the different experiments performed to
investigate their performance. Section 5.2 describes the simplified technique
referred to in this thesis as the standard case-based reasoning (CBR) model, and
its representation for measuring the similarity of different mobile payment
transaction instances.
To improve the performance of the standard CBR
model, the approach was enhanced. Section 5.3 describes the problem
representation and how the standard CBR model was complemented with machine
learning capabilities for assigning parameter weights and automating the
random selection of the k-value so as to detect financial transaction fraud.
Section 5.4 addresses the skewness of the sample data using a data sampling
approach.
The applicability of the simulated dataset was tested by conducting
preliminary experiments using some machine learning algorithms. The rationale
is that these algorithms have been used successfully in previous studies, and
it is of interest to evaluate the simulated dataset with these algorithms before
using it to test the performance of the proposed CBR system. For that reason,
Section 5.5 describes the machine learning algorithms used in the preliminary
experiments in this research for detecting mobile money transfer fraud. Section
5.6 discusses the different performance evaluation metrics and the
cross-validation approach adopted. Finally, Section 5.7 concludes this chapter
by summarising the research approach adopted in this thesis.
5.1 Proposed Detection Method
The aim of a fraud detection system (FDS) in a financial transaction service
is to monitor and identify malicious transactions as early as possible,
while at the same time limiting the number of false alarms raised.
In a financial transaction service environment, a fraudster attempts or
performs a series of transactions without being detected, while the FDS tries to
recognise any malicious behaviour. As a step towards performing these functions,
a detection framework is proposed using an augmented case-based reasoning
method with machine learning capabilities, as illustrated in Figure 5.1.
Figure 5.1: Proposed fraud detection framework
Figure 5.1 illustrates the architecture of the proposed framework. To
detect mobile money transfer fraud, a case-based reasoning (CBR) system is
proposed as the classifier. The reason is that, according to the
literature [PDM15; Sha+16], CBR has proven to be effective in the absence
of historical consumption data. It also shows continuous improvement as
more data becomes available over time. In addition, CBR has the ability to
realize knowledge transfer as spending habits evolve, as is the case where
information on one transaction is exploited to improve predictions for different
yet similar transactions [Ade+17]. As a result, CBR seems to be an appropriate
methodology, as justified by several studies. The proposed framework consists
of three main components: Input, Process and Output. These
components are discussed below.
1. Input: Under the framework, the simulated MMT data were first
pre-processed by running several queries to verify their quality, as discussed
in Section 3.5.1. The dataset was highly skewed, with a ratio of 98.8:1.2
negative to positive instances, which is a major characteristic of financial
transaction datasets. Neglecting data skewness in this approach
will lead to less accurate prediction. Therefore, the ability to deal
with the imbalanced data in the learning problem, by selecting the best
data balancing approach, provides an effective foundation for sampling
methods. After the data preparation phase, a clustering algorithm is
applied. The clustering algorithm helps to guide and reduce the search
space by collecting the most similar clusters. This allows the identification
of cases collected under similar circumstances and limits the retrieval
to these cases. The rationale behind this is to provide a structured case
base that will guide and speed up the retrieval process for the case-based
classification. This will also help to address the computational complexity
associated with the use of GA for feature selection as more data becomes
available over time.
2. Process: In this component, the CBR classifier is used to classify new
instances of MMT transactions as either fraud or non-fraud cases. In order
to exploit the flexibility of weighting all the input vectors in the process
component, a Genetic Algorithm (GA) was used to calculate their weights
so as to reflect the significance of each vector as determined by the GA
procedure. The GA is also used to automate the random selection of the
k-value for the CBR classification. To perform the similarity measurement
and retrieval task in the classification using CBR, a k-Nearest
Neighbour algorithm is executed using the Euclidean distance metric.
3. Output: As indicated in Figure 5.1, the output section provides a sum-
mary window for the prediction. It will provide ranking of clusters of
transaction neighbours for new cases which may operate as an effective
tool for experts to develop preliminary insight into suspicious transac-
tions which can then be investigated in more detail.
In order to evaluate the proposed detection framework, the
experiments were performed in an evolutionary manner, as discussed
in Chapter 6. The initial set of experiments started with the application
of basic techniques (Standard CBR) on minimalistic datasets. The
remaining experiments then progressed to the application of advanced
techniques (Weighted CBR) on more sophisticated datasets of both high
complexity and significant size. The different stages of the
experimental design are discussed below.
5.2 Standard CBR Model
Standard case-based reasoning methodology (henceforth referred to as
StdCBR) can be used as a classification technique. It classifies an unlabelled
case by retrieving closely matching labelled cases and reusing their labels. For
the purpose of case classification, k-nearest neighbour (kNN) was used to predict
mobile money transfer fraud. kNN requires defining the case representation
and the similarity function, which may employ algorithms for feature
selection or weighting [MGA04]. In order to compute the similarity between
a new case and previously experienced case instances, the weighted Euclidean
distance metric was used due to its simplicity:
d = |Z - X| = \sqrt{\sum_{i=1}^{m} w_i \, |Z_i - X_i|^2}    (5.2.1)
where w_i is the weight of vector (attribute) i, Z is the query (new case),
X is the source (retrieved case), m is the number of vectors in each case, and
i indexes an individual vector from 1 to m. The idea of this algorithm is to
choose the k stored cases that are most similar to the new
case and to assign the class of these nearest neighbour(s) to the new case.
Each neighbouring case has a classification (categorical) variable of interest
(e.g. fraudulent or non-fraudulent) associated with it. Therefore, to choose a
class for an incoming transaction Z, the incoming transaction
is compared to all stored past cases using the similarity measure d.
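As an illustration of Equation (5.2.1) and the kNN step, the following minimal sketch classifies a query by majority vote among its k nearest stored cases. It assumes that each case is already encoded as a tuple of numbers paired with a label; the function names and the toy case base are hypothetical and do not come from the implementation used in this thesis.

```python
import math
from collections import Counter

def weighted_euclidean(z, x, weights):
    """Equation (5.2.1): sqrt(sum_i w_i * (z_i - x_i)^2)."""
    return math.sqrt(sum(w * (zi - xi) ** 2 for w, zi, xi in zip(weights, z, x)))

def knn_classify(query, case_base, weights, k=3):
    """Label a query by majority vote among its k nearest stored cases.

    case_base is a list of (features, label) pairs.
    """
    ranked = sorted(case_base,
                    key=lambda case: weighted_euclidean(query, case[0], weights))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy case base with two numeric attributes (hypothetical values).
cases = [((0.0, 0.0), "non-fraud"), ((0.1, 0.2), "non-fraud"),
         ((0.2, 0.1), "non-fraud"), ((5.0, 5.0), "fraud"), ((5.1, 4.9), "fraud")]
equal_weights = (0.5, 0.5)  # StdCBR: every attribute receives the same weight
print(knn_classify((4.8, 5.2), cases, equal_weights, k=3))
```

With equal weights the ranking of neighbours is the same as for the unweighted Euclidean distance; the weights only change the result once they differ between attributes.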
5.3 Weighted CBR Model
This chapter proposes an improved CBR approach (referred to in this thesis as
Weighted CBR) for the identification of money transfer fraud in Mobile Money
Transfer (MMT) environments. Here, the StdCBR capability is augmented by
machine learning techniques that assign parameter weights in the sample dataset
and automate the random selection of the k-value in k-NN classification to
improve CBR performance. The CBR system observes users' transaction behaviour
within the MMT service and tries to detect abnormal patterns in the transaction
flows. To capture user behaviour effectively, the CBR system classifies the
log information into five contexts and then recombines them into a single
dimension. This is done instead of using the conventional approach, where the
transaction amount, time dimension or feature dimensions are used
individually. The applicability of the proposed Weighted CBR system is evaluated
using simulation data.
5.3.1 Problem Representation
There are several factors that affect consumer spending behaviour such as
lifestyle, age, and income group which may evolve over time [SM70]. This
indicates that consumer transaction behaviour is temporal in nature. In addi-
tion, as consumer spending behaviour is temporal and most individuals exhibit
consistent spending habits, an event-driven chain of transactions [KPK10] can
be a robust representation of patterns [JTPL03]. To represent such behavioural
patterns of users, it is necessary to define events that model the MMT process.
According to [Rie+13], it is challenging to extract a workflow pattern of trans-
actions from event flow of mobile money systems since any user is free to use
the system as they wish (for instance, a user can have unlimited transactions
of different amounts, various frequencies, pay for activities of their choice,
etc.). For this reason our events representation was generated from the users’
behaviour in the mobile money system.
In mobile money transfer transaction processing, the spending behaviour
contains information about the transaction amount, the time gap since the last
transaction, the day of the week, etc. The transaction amount, frequency and
time are closely related to the spending behaviour of a person, which is in turn
influenced by the income, resource availability and lifestyle of the person. In
most conventional fraud detection systems (FDS), the transaction amount is
considered the most important parameter for fraud detection. Also, previous
research [KSM06] has shown that the efficiency of any FDS is associated with the
dimensions of amount and time, which were used separately. In [Kun+09], the
authors combined these two dimensions into a single one and achieved a
significant improvement in accuracy. In this work, the approach
in [Kun+09] was extended by classifying the features in our MMT transaction
dataset into five contexts and recombining them into a single dimension to
capture user behaviour. For example, each client's transaction on the mWallet
account is denoted by a quintuple,
transaction instance = [Transtype, Client, Interval, Location, Amount]
where:
1. Transtype: Transaction type entities.
2. Client: Features of entities (client ID, profile, e.g. savings or current
account, account balance, spending habit category).
3. Interval: Features of entities (month of the year, day of the week).
4. Location: Location entities.
5. Amount: Quantization of amount entities into finite levels up to the
maximum daily spending limit.
For the CBR system problem formulation, let us consider the total set of
transaction instances E in the log file, where each instance is a quintuple
vector (v_i) representing the different contexts of information in a transaction:
v_i = [c_1, c_2, \ldots, c_5]    (5.3.2)
where c is the context of information type.
Each query instance (Z) is composed of 5 vectors (v_i) representing the
different contexts of information:
Z = [v^z_1, \ldots, v^z_5]    (5.3.3)
For the case base representation, each case (X) contains a description (D)
with the corresponding solution vector (S), i.e. an outcome tag (y) associated
with each transaction instance. That is,
X = [D_x, S_x]    (5.3.4)
where
D_x = [v^x_1, \ldots, v^x_5]    (5.3.5)
S_x = y_1, \ldots, y_n    (5.3.6)
In the experiment, n = 2, as there are only two
possible transaction outcomes (non-fraudulent and fraudulent) in the MMT
dataset.
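The case representation in Equations (5.3.2) to (5.3.6) can be sketched as a small data structure. This is an illustrative encoding only: the class names, the numeric encoding of each context, the number of quantization levels and the daily limit of 140,000 KES (chosen here because it lies just above the maximum amount in Table 4.7) are all assumptions rather than values taken from the thesis implementation.

```python
from dataclasses import dataclass
from typing import Tuple

OUTCOMES = ("non-fraudulent", "fraudulent")  # the n = 2 possible solutions

def quantize_amount(amount, daily_limit=140000.0, levels=10):
    """Map an amount onto one of `levels` bands up to the daily limit (assumed values)."""
    return min(levels - 1, int(amount / daily_limit * levels))

@dataclass(frozen=True)
class Description:
    """The five context vectors D_x = [v_1, ..., v_5] of one transaction."""
    transtype: Tuple[float, ...]
    client: Tuple[float, ...]
    interval: Tuple[float, ...]
    location: Tuple[float, ...]
    amount: Tuple[float, ...]

    def vectors(self):
        return (self.transtype, self.client, self.interval, self.location, self.amount)

@dataclass(frozen=True)
class Case:
    """A stored case X = [D_x, S_x]; a query Z is a Description without a solution."""
    description: Description
    solution: str  # one of OUTCOMES
```

A query is then simply a `Description`, and the case base is a list of `Case` objects whose descriptions are compared context by context.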
5.3.2 Case Similarity
The similarity between two cases is
defined as a weighted average of the vector similarities. In [Ade+16], where the
StdCBR approach was used, the flexibility of weighting was not exploited, i.e.
all weights were simply set to the same value. However, since different vectors
are of different importance, it was decided to take advantage
of the capability of genetic algorithms to assign weights that reflect such
differences more accurately. The CBR system was used in the following way:
Retrieval: During the retrieval process, an ordered list of the k¹ cases most
similar to the query was retrieved and returned. This was implemented using
a k-Nearest Neighbour algorithm². The overall similarity value was computed
by weighting the local similarity of each vector (v_i). That is, the local
similarity of each pair of vectors [v^z_i, v^x_i] is computed and the resulting
value is weighted with a value (w_i) that represents the relevance of the
corresponding vector in the global similarity computation:
Sim(Z, D_x) = \sum_{i=1}^{5} w_i \cdot Sim_i(v^z_i, v^x_i)    (5.3.7)
where
\sum_{i=1}^{5} w_i = 1    (5.3.8)
Reuse: To obtain the solution for the query (Z), the proposed CBR system
applies a weighted voting scheme according to the similarity of the retrieved
cases, using the scoring function:
score(p_i) = \sum_{x \,:\, S_x = p_i} Sim(Z, D_x)    (5.3.9)
Therefore, the solution assigned to the query is:
p_i = \arg\max \{ score(p_i),\; i = 1, \ldots, k \}    (5.3.10)
where k is the number of nearest neighbours assigned during the retrieval step
of the kNN algorithm.
¹ For the purpose of this experiment, as explained in Section 6.2, the values k = 3, 5, 7 and 9 will be used.
² Weighted Euclidean distance was used for the k-NN algorithm because it is easy to understand and interpret [GG16].
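The retrieval and reuse steps in Equations (5.3.7) to (5.3.10) can be sketched as follows. The local similarity measure is an assumption: the text specifies a Euclidean distance but not how it is turned into a similarity, so the sketch uses 1 / (1 + distance). Cases are represented as hypothetical (description, solution) pairs, where a description is a list of five numeric context vectors.

```python
import math
from collections import defaultdict

def local_similarity(vz, vx):
    """Assumed local measure: 1 / (1 + Euclidean distance) between two context vectors."""
    return 1.0 / (1.0 + math.dist(vz, vx))

def global_similarity(query, description, weights):
    """Equation (5.3.7): weighted sum of the local similarities (weights sum to 1)."""
    return sum(w * local_similarity(vz, vx)
               for w, vz, vx in zip(weights, query, description))

def retrieve(query, case_base, weights, k):
    """Return the k most similar cases as (similarity, solution) pairs, best first."""
    scored = [(global_similarity(query, d, weights), s) for d, s in case_base]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]

def reuse(retrieved):
    """Equations (5.3.9) and (5.3.10): similarity-weighted vote over the retrieved cases."""
    score = defaultdict(float)
    for similarity, solution in retrieved:
        score[solution] += similarity
    return max(score, key=score.get)
```

Because each vote is weighted by similarity, a single very close fraudulent case can outweigh several distant legitimate ones, which a plain majority vote would not allow.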
The two final steps of the CBR cycle (revise and retain) fall outside the scope
of this research work. While they are an important part
of the CBR cycle, they are to be carried out by the fraud analyst as part of the
fraud identification process, i.e. the input of a fraud analyst becomes critical
at this stage, and as a result these
steps are not automated in this research.
5.3.3 CBR Model Feature Weighting
In order to exploit the flexibility of weighting all the input vectors, a Genetic
Algorithm (henceforth GA) was used to calculate their weights so as to reflect
the significance of each vector as determined by the GA procedure. Weighting
of variables using GA has been used extensively in the literature, for instance
in [AKH06; Eka03; JADaRg14; Man+12]³. The rationale⁴ for using GA is
that it allows a better exploration of the search space and thus tends to produce
a better result [BGK05]. For the purpose of the present work, the method
in [JADaRg14] was adapted for the configuration of the GA in obtaining the
weights for each of the vectors. In the experiment, the GA uses a population
of individuals representing the different weights, and the
population evolves over generations until the individual with the best
performance is returned. Each individual contains both the vector weights and
the k parameter of the k-Nearest Neighbour algorithm, in order to estimate the
best number of cases that must be retrieved to classify new transactions. The
genetic algorithm was run with an initial population of 1000
individuals. Each individual contains the weights of each vector and the value
of k⁵ (a random value of 3, 5, 7 or 9) that the CBR model uses in the retrieval
stage. At the initial stage, a random value was assigned to each weight,
and the weights were then normalised to sum to 1. The following cycle is repeated
until there is no more improvement in the performance of the best individual
in the population:
1. Evaluation: At this stage, the genetic algorithm executes a cross-validation
of the CBR system configured with the weights and k value of each
individual in the population. The resulting performance is the fitness of
the individual.
2. Remove: After all the individuals in the population have been evaluated, the
25% of the population with the worst fitness is removed.
3. Cross-over: To replace the individuals that were removed (i.e. the 25%
removed after the fitness evaluation), the genetic algorithm
combines the individuals with the best performance. During
the cross-over process, the parent individuals are taken in pairs and
combined to form a new individual called a child. The weights of
each child are the average weights of the parents (normalised),
and the value of k is computed analogously.
4. Mutation: Individuals amounting to 5% of the population are chosen randomly
and their weights are modified. This modification helps to prevent convergence
to local maxima.
³ This has been used in other domains, as discussed in Section 3.5.
⁴ Other methods, such as gradient descent, were not chosen because they can get stuck in local
minima and their performance depends on the initial values of the design variables [DCM93].
⁵ The range 3 to 9 was used for k so as to ensure that a scalable computation cost of the augmented CBR system is achieved.
In the evaluation stage, the genetic algorithm executes the cross-validation
of the CBR system using the hold-out function⁶. The rationale for using the
hold-out function is to minimise the computational cost of running the experiment.
The next Section discusses the pre-processing of the dataset.
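The GA cycle described in this Section can be sketched as below. This is a simplified illustration, not the configuration adapted from [JADaRg14]: the fitness function is supplied by the caller (in the thesis it is the hold-out performance of the CBR system), and several details are assumptions, namely snapping the averaged k of a child to the nearest allowed value, re-drawing one weight and k during mutation, and protecting the best individual from mutation. All function names are hypothetical.

```python
import random

K_VALUES = (3, 5, 7, 9)

def normalise(weights):
    total = sum(weights)
    return [w / total for w in weights]

def random_individual(rng, n_vectors=5):
    """An individual is (weights summing to 1, k)."""
    return normalise([rng.random() + 1e-9 for _ in range(n_vectors)]), rng.choice(K_VALUES)

def crossover(a, b):
    """Child = average of the parents' weights (renormalised) and of their k values."""
    weights = normalise([(wa + wb) / 2 for wa, wb in zip(a[0], b[0])])
    k = min(K_VALUES, key=lambda v: abs(v - (a[1] + b[1]) / 2))  # snap to an allowed k (assumption)
    return weights, k

def mutate(individual, rng):
    """Re-draw one weight and k at random, then renormalise (assumed mutation operator)."""
    weights = list(individual[0])
    weights[rng.randrange(len(weights))] = rng.random() + 1e-9
    return normalise(weights), rng.choice(K_VALUES)

def evolve(fitness, pop_size=1000, max_generations=50, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    best, best_fit = None, float("-inf")
    for _ in range(max_generations):
        ranked = sorted(population, key=fitness, reverse=True)   # 1. evaluation
        top_fit = fitness(ranked[0])
        if top_fit <= best_fit:
            break                                                # no further improvement
        best, best_fit = ranked[0], top_fit
        n_removed = pop_size // 4
        survivors = ranked[: pop_size - n_removed]               # 2. remove the worst 25%
        children = [crossover(survivors[i], survivors[i + 1])    # 3. cross-over of the best
                    for i in range(n_removed)]
        population = survivors + children
        for idx in rng.sample(range(1, pop_size), max(1, pop_size // 20)):
            population[idx] = mutate(population[idx], rng)       # 4. mutate 5%
    return best, best_fit
```

In use, `fitness` would run the weighted CBR classifier on a hold-out split with the individual's weights and k and return its performance score; a cheap stand-in function is sufficient to exercise the loop.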
5.4 Data Pre-processing
In a real-world fraud detection scenario, the number of negative instances,
i.e. legitimate records, far exceeds the number of positive
instances, i.e. illegitimate records [GS17]. According to [PF01], the
ratio of negative to positive instances may be as high as 99:1 in the fraud
domain. This is because fraud is rare. In the case of mobile transfer fraud in
this study, the number of normal instances (usual transfer behaviour) exceeds
the number of suspicious instances (unusual transfer behaviour). Because the
dataset is highly skewed, with a very small number of positive instances, these
instances are hard to classify correctly; nevertheless, they are important to
detect [AKJ04; GC03]. To deal with the imbalanced learning problem, the best
strategy needs to be selected. For the mobile money fraud classification task,
the data sampling approach is chosen for the following reasons [GS17; WMZ07]:
• Sampling methods perform with similar effectiveness to, if not
better than, cost-sensitive and algorithmic learning.
• In both cost-sensitive and algorithmic learning techniques, determining
the misclassification cost depends on the problem domain and is
considered a challenging task.
⁶ The Hold-Out function is an evaluator function in the jColibri2 framework that splits
the case base into test cases (queries) and training cases (the normal case base) [RGDAGC08].
In the literature, over-sampling and under-sampling techniques have been
widely used. However, neither of them seems to be a clear winner. The
kind of sampling that yields better results depends largely on the training
dataset that is being balanced [GS17; WMZ07]. Furthermore, the authors of
[Cha+02] combined over-sampling and under-sampling techniques. As a result,
the initial bias of the learner towards the negative (majority) class was reversed
in favour of the positive (minority) class. Thus, the hybrid strategy of
over-sampling and under-sampling works better than either one alone [Cha+02; GS17].
To balance the training dataset in this research, a hybrid
strategy was adopted using the most commonly used over-sampling and
under-sampling schemes, namely the Synthetic Minority Over-sampling Technique
(SMOTE) and Tomek links respectively. Both methods have been widely used to deal
with class imbalance problems [GS17]. The next Section discusses the preliminary
experiments conducted after data pre-processing.
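The hybrid strategy can be illustrated with a small, brute-force sketch written with the standard library only. It is not the tooling used in this research: SMOTE is reduced to interpolation between a minority point and one of its k nearest minority neighbours, a Tomek link is taken to be a pair of opposite-class points that are each other's nearest neighbour, and only the majority member of each link is removed (an assumption, in keeping with Tomek links being used for under-sampling). The function names are hypothetical and the quadratic search is only suitable for small samples.

```python
import math
import random

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority points by interpolating towards a near neighbour."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        target = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to target
        synthetic.append(tuple(b + gap * (t - b) for b, t in zip(base, target)))
    return synthetic

def tomek_links(points, labels):
    """Index pairs (i, j) of opposite-class points that are mutual nearest neighbours."""
    def nearest(i):
        return min((j for j in range(len(points)) if j != i),
                   key=lambda j: math.dist(points[i], points[j]))
    nn = [nearest(i) for i in range(len(points))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and labels[i] != labels[j]]

def smote_tomek(points, labels, minority_label, seed=0):
    """Over-sample the minority class to parity, then drop the majority member of each link."""
    minority = [p for p, l in zip(points, labels) if l == minority_label]
    n_new = (len(points) - len(minority)) - len(minority)
    new = smote(minority, max(0, n_new), seed=seed)
    points = list(points) + new
    labels = list(labels) + [minority_label] * len(new)
    drop = {i if labels[i] != minority_label else j for i, j in tomek_links(points, labels)}
    keep = [i for i in range(len(points)) if i not in drop]
    return [points[i] for i in keep], [labels[i] for i in keep]
```

The resampling would be applied to the training portion only, so that the test data keep the original class ratio.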
5.5 Preliminary Experiment
In this thesis, as part of the background study⁷, preliminary experiments were
conducted using some robust machine learning algorithms. In the
literature, several machine learning algorithms have been used to predict
financial transaction fraud, and they have all shown promising results. However,
one of the challenges of deploying machine learning tools for fraud detection
is how to select the right machine learning methodology. A successful selection
of a prediction algorithm [Bur17] for the identification of money transfer fraud
requires three important aspects: (1) There should be a mechanism for feature
selection/dimensionality reduction on sample sets so as to improve the
estimators' accuracy scores or to boost their performance on very
high-dimensional datasets; in certain types of data, there is a large number of
features compared to the number of data points. (2) The algorithm should compute
predictions fast enough to support decision making. (3) The algorithm should
have a continuous learning ability, by which it keeps learning over time as more
data becomes available; that is to say, the algorithm has to constantly use
historical data for learning and adapt its predictions to the dynamics of users'
behaviour. These requirements can be addressed with supervised machine
learning algorithms.
In supervised learning, in order to understand the performance
of the learning algorithms and to gain insight into the problem, it is helpful to
formulate the learning problem [Bur17]. In the preliminary experiment of this
thesis, the association between a categorical dependent variable and independent
variables with either continuous or discrete values is defined.
The problem is therefore formulated as a classification task. For the purpose of
representing the learning problem, let Y be an unknown outcome for a new
test sample to be predicted. A predictive model M (i.e. a function with
adjustable parameters) is then built and used to discover this unknown
outcome. Let X be the training examples, which are used to select an optimum set
of parameters for the classifier.
⁷ The rationale for the preliminary experiments was to evaluate the sample data in
this experiment with some existing machine learning algorithms (having them as a baseline)
before using it to test the performance of the proposed CBR system.
Definition 1 Let X be a set of input variables $x_i$ and Y be a label with
classes $c_i$. The training set of instances is
$D_{tr} = \{(x_i, c_i) : x_i \in X,\ c_i \in Y,\ 1 \le i \le n\}$, where n is
the total number of instances. The classification problem is to determine a
model M(X, Y) that maps each $x_i$ to its target class $c_i$.
The training and testing of the model are preceded by dividing the data D
into two sets: a training dataset Dtr and a test dataset Dts. The training
dataset Dtr is used for training the classifier, whereas the test dataset Dts
is used for the actual prediction. Splitting D into training and test sets in
different ways in order to evaluate the model is known as cross-validation
(see Section 5.6.1). The variable for prediction Y is considered known in the
training dataset Dtr but unknown in the test dataset Dts; thus, Y is
predicted by the trained model M(X, Y) on the test dataset Dts.
The choice of machine learning algorithms was influenced by the results of the
literature reviewed in Chapter 3. As a result, four well-known baseline
classifiers, Logistic regression, Random forest, Support vector machine and
Artificial neural network, were chosen as the algorithms for the experiments
in this thesis. Logistic regression is a generalized linear model, easy to use
and one of the most commonly used techniques for data mining in practice, but
it is vulnerable to overconfidence [Bha+11]. The random forest classifier can
capture non-linear relationships in data and scales well, with a better visual
representation of results, but is liable to over-fit. Support vector machine
has a regularization parameter that is used to prevent over-fitting [Sud+10].
Artificial neural networks can show higher prediction accuracy, but they are
much more computationally expensive and their prediction results are hard to
interpret.
5.6 Evaluating the Efficiency of Prediction
In a supervised learning problem an algorithm is assessed based on its overall
accuracy to predict the correct classes of new and unseen observations. In
order to measure the capability of a trained model M(X, Y ), a test set Dts is
introduced:
Definition 2 Let a test set be defined as $D_{ts} = \{(x_j, y_j)\}$, where
$x_j \in X$, $y_j \in Y$, $1 \le j \le m$, X is the set of input variables, Y
is the variable for prediction, and m is the number of data entries. The sets
$D_{tr}$ and $D_{ts}$ have no common elements: $D_{tr} \cap D_{ts} = \emptyset$,
where $D_{tr}$ is the training set. The set $D_{ts}$ is used to evaluate the
performance of the model M(X, Y). Such evaluation can be performed by
comparing the true values $y_j \in Y$ against the predicted values
$\hat{y}_j$ for $x_j$, for all $(x_j, y_j) \in D_{ts}$.
In classification tasks, standard measures such as TPR, TNR and Accuracy are
misleading in unbalanced class problems [Pro00], as mentioned earlier in
Section 3.3. A well-accepted measure for unbalanced classification is the Area
Under the ROC Curve (AUC) [Cha05]. This metric measures how close the ROC
curve is to the point of perfect classification. Therefore, for the purpose of
this thesis four metrics have been employed: Recall, F-measure, Matthews
Correlation Coefficient (MCC) and Area Under the ROC Curve (AUC). According to
[GS17], these four metrics handle imbalanced data efficiently without becoming
biased towards the majority class, and they are well suited to the fraud
detection domain:
1. Recall:
\[ \text{Recall} = \frac{TP}{TP + FN} \tag{5.6.11} \]
Recall focuses on the significant class (fraud) and is not sensitive to data
distribution.
2. F-measure: It is also known as F-score (F1).
\[ F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{5.6.12} \]
where Precision is computed as $\frac{TP}{TP + FP}$. F-score is a good metric
due to its non-linear nature and has been used for fraud detection.
3. Matthews Correlation Coefficient (MCC):
\[ \text{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{5.6.13} \]
MCC is considered one of the best single assessment measures and is less
influenced by imbalanced data [Cha+11a; GS17].
4. Area Under the Curve (AUC):
\[ \text{AUC} = 1 - \frac{FP_{rate} + FN_{rate}}{2} \tag{5.6.14} \]
where $FP_{rate} = \frac{FP}{FP + TN}$ and $FN_{rate} = \frac{FN}{TP + FN}$.
AUC evaluates the overall classifier performance and is very appropriate for
class imbalance [Pow07].
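As an illustration, the four metrics can be computed directly from the counts
of a confusion matrix. The following is a minimal Python sketch, not the
implementation used in the thesis: the function name fraud_metrics and the
zero-division guards are assumptions, and the false positive rate is taken as
FP / (FP + TN), the standard definition.

```python
import math


def fraud_metrics(tp, fp, tn, fn):
    """Recall, precision, F1, MCC and single-point AUC from confusion counts.

    Hypothetical helper for illustration; fraud is the positive class.
    Undefined ratios (zero denominators) are reported as 0.0 by assumption.
    """
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    pr_sum = precision + recall
    f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0

    # MCC (Equation 5.6.13): correlation between predicted and true labels.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # AUC as in Equation 5.6.14, using the two error rates.
    fp_rate = fp / (fp + tn) if (fp + tn) else 0.0
    fn_rate = fn / (tp + fn) if (tp + fn) else 0.0
    auc = 1 - (fp_rate + fn_rate) / 2

    return {"recall": recall, "precision": precision,
            "f1": f1, "mcc": mcc, "auc": auc}


if __name__ == "__main__":
    # Toy counts chosen only to exercise the formulas.
    print(fraud_metrics(tp=80, fp=20, tn=880, fn=20))
```

Computing all four values from the same counts makes it easy to see how a
high recall can coexist with a much lower F-measure when false positives are
numerous.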
5.6.1 Cross-Validation
Cross-validation, sometimes called rotation estimation, is a
computer-intensive technique for evaluating predictive models by partitioning
the sample dataset into a training set to train the model and a test set to
evaluate it [Koh95]. In k-fold cross-validation the dataset D is randomly
split into k mutually exclusive subsets (the folds) D1, D2, . . . , Dk of
approximately equal size. As illustrated in Figure 5.2, the data is first
divided into k parts, and then k models are built, each on k − 1 parts of the
data.
Figure 5.2: Schematic representation of k-fold cross-validation
Each model is then tested on the remaining part of the data that was left out
of its training. For example [Dan17], in a 5-fold validation the sample data
is divided into 5 pieces, each being 20% of the full dataset, as shown in
Figure 5.3 below:
Figure 5.3: Schematic representation of 5-fold cross-validation
The experiment starts by running the first model, which uses the first fold
as holdout set and everything else as training data. This gives a measure of
model quality based on a 20% holdout set. Then in the second model, data
from the second fold are held out (using everything except the 2nd fold for
training the model). This gives a second estimate of the model quality. The
process is repeated using every fold once as the holdout. Putting this together,
100% of the data is used as a holdout at some point. This approach is employed
in this thesis for evaluating the prediction accuracy for the different classifiers
used.
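The k-fold procedure described above can be sketched in a few lines of Python.
This is an illustrative sketch only: the helper names k_fold_indices and
cross_validate, and the train_fn / score_fn callables, are assumptions
introduced here and are not part of the thesis implementation.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Randomly split indices 0..n-1 into k mutually exclusive folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Taking every k-th index gives folds whose sizes differ by at most one.
    return [idx[i::k] for i in range(k)]


def cross_validate(data, k, train_fn, score_fn, seed=0):
    """Train k models, each tested on the fold left out of its training.

    train_fn(train_rows) returns a model; score_fn(model, test_rows) returns
    a number. Both are supplied by the caller (hypothetical interface).
    """
    folds = k_fold_indices(len(data), k, seed)
    scores = []
    for holdout in folds:
        held = set(holdout)
        train = [row for j, row in enumerate(data) if j not in held]
        test = [data[j] for j in holdout]
        scores.append(score_fn(train_fn(train), test))
    return sum(scores) / k, scores


if __name__ == "__main__":
    rows = [(i, i % 2) for i in range(100)]
    # Trivial "model" that always predicts class 0; the score is its accuracy.
    mean, per_fold = cross_validate(
        rows, 5,
        train_fn=lambda train: 0,
        score_fn=lambda model, test: sum(y == model for _, y in test) / len(test),
    )
    print(mean, per_fold)
```

With k = 5 and 100 rows, each fold holds 20 rows, so every row is used exactly
once as holdout data and four times as training data.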
5.7 Chapter Summary
Chapter 5 introduced the research approach for the identification of money
transfer fraud in Mobile Money Transfer (MMT) environments. To achieve this,
the research questions stated in Chapter 1 were investigated. In order to
apply similarity measures among transaction event sequences, the log
information from the simulation data was classified into five contexts and
then recombined into a single dimension, rather than following the
conventional approach in which the transaction amount, time dimensions or
feature dimensions are used individually.
Two similarity algorithms were presented in this chapter: StdCBR and a
Weighted CBR model. The weighted system uses a combination of CBR and GA to
assign the significance levels (weights) of the features, with the aim of
improving the performance of the proposed CBR approach. Finally, some machine
learning algorithms were selected and discussed. These were used to carry out
the preliminary experiment, as further discussed in Appendix A.1.
Chapter 6
Experiments and Validation
The previous chapter described the approach used in this thesis towards the
identification of money transfer fraud in Mobile Money Transfer (MMT)
environments. To implement and apply the CBR methodology, a custom CBR model
was designed and developed using the jCOLIBRI framework [RGGCDA14], a Java
framework that supports rapid prototyping of a CBR system as well as its
development and deployment in real scenarios.
To evaluate the research methodology, different sets of experiments were
designed and conducted, following the evolutionary approach described in
Section 5.1. The initial set of experiments started with the application of
the basic CBR technique (StdCBR) to a minimalistic dataset. The remaining
experiments then progressed to the application of advanced techniques
(Weighted CBR) to further datasets which exhibit both high complexity and
significant sample size. The rationale for this is to assess any improvements
achieved and to verify the overall efficiency of the proposed CBR system.
This chapter presents the evaluation process of the research undertaken,
together with the results produced and the outcomes of the series of
experiments. The outline of this chapter is as follows. Section 6.1 discusses
the initial experiments conducted using the StdCBR approach with the
preliminary dataset. Section 6.2 addresses the research motivation for the
identification of mobile money transfer fraud using the enhanced approach and
the experiments undertaken towards its evaluation. This section also sets out
the reasons that led to the adoption of clustering and gives an overview of
the experiments conducted using the CURE clustering algorithm. The proposed
CBR approach was evaluated using Recall, F-measure, Matthews correlation
coefficient and area under the curve (AUC)¹. Finally, Section 6.3 concludes
with discussions on the results of the research experiments.
6.1 First Set of Experiments
In the first experiment, sets of preliminary experiments were designed to
evaluate the proposed approach discussed in Chapter 5. To evaluate the CBR
model, a stepwise approach was adopted for the dataset used. An initial
familiarisation with a preliminary dataset was conducted to evaluate whether a
StdCBR model (as previously discussed in Section 5.2) is able to efficiently
identify simple patterns in an already known, pre-classified case. A more
complex case with another class of different fraud types was then used in an
attempt to extend its prediction aptitude. The reason for this breakdown of
experiments is to allow better approximation and handling of the results, as
well as gradual evaluation of, and reflection on, the model's capabilities.
¹ Recall, F-measure and AUC take values from 0 to 1; MCC takes values from −1
to 1.
The dataset used for this experiment was selected from the evaluated synthetic
dataset discussed in Section 4.4, from which only users who had made more than
60 transactions were randomly selected. The dataset was labelled as fraudulent
or non-fraudulent, with the labels serving as key indicators of whether a
transaction is problematic or not. This label classification can be used when
the cases are not complex. With this in mind, the experiments were designed to
investigate whether the system could perform successful identification with an
elementary dataset and to check the overall feasibility of the system.
To proceed further, a simulation was designed as the most suitable approach,
since one of the primary targets was to evaluate the feasibility of the system
while classifying the elementary dataset. The simulation was carried out using
2000 cases from the event logs. These were split randomly into a case base of
1800 cases and a test sample (target sample) of 200 cases. The classification
of the target sample was based on simple voting using the kNN algorithm with
k = 3. The 3 nearest neighbours were retrieved for each case as in equation
4.2. Each target case was classified as fraudulent or not fraudulent based on
the votes of its three nearest neighbours. In order to obtain more reliable
results, the evaluation run was repeated 10 times and the classification
results were averaged over the 10 runs. This approach was applied to all
experiments conducted in this thesis. Results for the StdCBR model are shown
in Table 6.1.
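The evaluation protocol described above (random split into case base and
target sample, 3-nearest-neighbour voting, results averaged over repeated
runs) can be sketched as follows. This is a hedged illustration, not the
jCOLIBRI-based system used in the thesis: Euclidean distance stands in for the
similarity measure of equation 4.2, and the names knn_classify and
repeated_holdout_recall are assumptions introduced here.

```python
import math
import random
from collections import Counter


def knn_classify(case_base, query, k=3):
    """Majority vote among the k cases closest to the query.

    case_base: list of (feature_tuple, label). Euclidean distance is an
    assumption standing in for the thesis similarity measure.
    """
    nearest = sorted(case_base, key=lambda c: math.dist(c[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def repeated_holdout_recall(cases, n_test, runs=10, k=3, seed=0):
    """Average fraud recall over repeated random case-base / target splits."""
    rng = random.Random(seed)
    recalls = []
    for _ in range(runs):
        shuffled = cases[:]
        rng.shuffle(shuffled)
        targets, case_base = shuffled[:n_test], shuffled[n_test:]
        tp = fn = 0
        for features, label in targets:
            if label != 1:          # recall only looks at true fraud cases
                continue
            if knn_classify(case_base, features, k) == 1:
                tp += 1
            else:
                fn += 1
        if tp + fn:                 # skip runs with no fraud in the target sample
            recalls.append(tp / (tp + fn))
    return sum(recalls) / len(recalls) if recalls else 0.0


if __name__ == "__main__":
    # Two well-separated toy groups: label 0 (genuine) and label 1 (fraud).
    toy = [((0.1 * i, 0.0), 0) for i in range(20)]
    toy += [((100 + 0.1 * i, 0.0), 1) for i in range(20)]
    print(repeated_holdout_recall(toy, n_test=10))
```

In the thesis setting the call would use 2000 cases with n_test = 200; the toy
data here only demonstrates the mechanics.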
Table 6.1: Evaluation of StdCBR classifier on small MMT dataset.
Performance measure                 StdCBR
Recall                              0.778
F-measure                           0.579
Matthews Correlation Coefficient    0.573
Area under the ROC Curve            0.865
From Table 6.1 above, it can be observed that the StdCBR classifier was able
to correctly classify 78% of fraudulent transactions. Although the results
show a fairly positive monitoring performance for the StdCBR system on a
simplistic dataset, the F-measure score, being well below the recall,
indicates a low precision score for the StdCBR classifier. Such a result could
lead to valid customer transactions being declined (i.e. a high false
positive count).
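The precision implied by Table 6.1 is not reported directly, but it can be
recovered from the recall R and the F-measure by rearranging Equation 5.6.12.
The following worked step is an illustration derived from the tabulated
values, not a figure reported in the thesis:

```latex
\[
F_1 = \frac{2PR}{P + R}
\;\Longrightarrow\;
P = \frac{F_1 R}{2R - F_1}
  = \frac{0.579 \times 0.778}{2 \times 0.778 - 0.579}
  = \frac{0.4505}{0.977}
  \approx 0.46
\]
```

Under this reading, fewer than half of the transactions flagged as fraud by
the StdCBR classifier would actually be fraudulent, which is what makes
declined valid transactions a concern.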
The application of the StdCBR system was further used to provide explana-
tions on the similarity measures for the retrieved cases as shown in Figure 6.1.
Figure 6.1: Transaction neighbours summary
From the interface in Figure 6.1, the 3 nearest neighbours can be seen for
each new case, including their classification (fraud as 1 and non-fraud as 0)
as well as their similarity scores. This can provide good insight for experts
carrying out final-line case investigation after the existing detection system
has been applied.
The initial experiment and evaluation showed the StdCBR system, in its default
configuration, to be effective for monitoring a simple dataset. As a further
step, more datasets (i.e. datasets that exhibit both high complexity and
significant sample size) with an improved CBR configuration could be used to
evaluate the classification precision of this research approach. This
motivates the design and implementation of the second experiment, discussed
in the next section.
6.2 Second Set of Experiments
The second set of experiments is motivated by the encouraging results produced
by the preliminary evaluation of the CBR approach. To establish the
classification precision of the approach, it needs to be tested with more
datasets that exhibit both high complexity and significant sample size. This
was clearly indicated by the evaluation and analysis of the previous
experiment. Therefore the following experiments aim to evaluate the
prediction performance of the approaches previously presented in Section 5.3
on a high volume of transaction instances.
The dataset used for this experiment was significantly larger and more complex
than the dataset used in the previous experiments. Here, the general
annotation for the fraud class (i.e. 1) in the previous experiment was refined
into three distinct classes: A, B, and C. These represent the three classes of
fraud that were introduced in the data simulation experiment, as mentioned in
Section 4.2.2. These classes are described in more detail below:
• Class A: Indicates account takeover fraud as a result of SIM splitting or
loss of device. For example, an abnormal behaviour or a shift in the behaviour
of end users compared to a specific profile or to usual behaviour [Lla+11].
• Class B: Indicates a retailer who is complicit in money laundering
activities and facilitates the opening of an account despite knowing that the
account will be loaded with funds coming from criminal activities. For
example, an atypical use of mWallets, or an abnormal volume and frequency of
cash transactions compared to a specific profile, i.e. an mWallet used only
for withdrawal or p2p transfer, or multiple funding and loading sources of the
mWallet followed by withdrawals shortly afterwards [Gab+13; Lla+11].
• Class C: Indicates an account management system compromise. For example, a
change in the global balance of mMoney in the system, or a large number of
transfers from several mWallets used to fund one specific account [Lla+11].
For this experiment, a slightly different approach was adopted for evaluating
the classification of the CBR model. The aim was to investigate which
configuration of the predictor is most effective in enhancing its prediction
capabilities. For this investigation, changes were made to the case
representation, the nearest neighbour configuration and the assignment of
parameter weights.
Following the above rationale, the CBR model used the kNN algorithm with
k = 3, 5, 7, 9² for each payment transfer transaction instance. In order to
capture users’ behaviour effectively, the CBR model classifies the log
information into five contexts and then combines them into a single dimension
(already presented in Section 5.3). This is done instead of using the
conventional approach, in which the transaction amount, time dimensions or
feature dimensions are used individually. The advantage is a significant
improvement in accuracy over a system that considers each feature dimension
individually [Kun+09].
The number of parameters taken into consideration for this experiment was
significantly larger than in the previous experiment. This allows an efficient
evaluation while also showing the effectiveness of the current approach. As a
result, the experiment factored in all the available cases in the dataset,
taking into account their k nearest neighbours (for k = 3, 5, 7, 9). The key
difficulty encountered with the use of a large dataset was class imbalance.
Therefore, a data sampling technique was introduced to reduce the skewness of
the training set. The discussion of the data sampling follows.
² The value of k is usually chosen to be odd, to avoid ties in which the kNN
vote returns an equal number of votes for different classes [HAA14].
6.2.1 Data Sampling
In a fraud detection problem the class imbalance issue is often present: the
number of fraud instances is far smaller than the number of non-fraud
instances. This has a negative effect on learning algorithms because
classifiers are often biased towards the majority class and thus return low
performance results [GS17]. The MMT dataset used in this thesis was highly
skewed, with a ratio of 99.8:1.2 negative to positive instances, which is a
major characteristic of financial transaction datasets [PF01]. Neglecting data
skewness in our detection model may diminish prediction accuracy. Therefore,
according to [GS17], dealing with imbalanced data by selecting the best data
balancing approach provides an effective foundation for the learning problem.
Thus, a data sampling approach was chosen over cost-sensitive approaches,
because determining the cost in a cost-sensitive approach is considered highly
challenging [WMZ07].
To balance the MMT dataset using the data sampling approach, a hybrid strategy
that combines an over-sampling and an under-sampling algorithm was adopted, as
previously discussed in Section 3.2, since it works better than either one
alone [Cha+02; GS17]. SMOTE and Tomek links were selected as the over-sampling
and under-sampling methods respectively because they are well-known techniques
in the literature [Pad+07; Gu+08]. To re-balance the MMT data, a sampling
ratio of 2:1 was adopted. According to [CC11; GS17], this ratio is preferable
for achieving an efficient distribution between negative and positive
instances in a training set. Figure 6.2 below shows the result of performing
SMOTE with k = 5 and ratio = 0.5 followed by Tomek link removal.
Figure 6.2: An illustration of the SMOTE + Tomek
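The hybrid sampling step can be illustrated with a simplified, pure-Python
sketch of SMOTE followed by Tomek link removal. It is not the implementation
used in the thesis: the function names are assumptions, Euclidean distance is
assumed, at least two minority samples are required, and this variant removes
only the majority-class member of each Tomek link (other variants remove both
members).

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority points by interpolating between a
    random minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        others = [p for p in minority if p is not x]
        neighbours = sorted(others, key=lambda p: math.dist(p, x))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


def remove_tomek_links(samples):
    """Drop the majority member (label 0) of every Tomek link, i.e. every
    pair of mutual nearest neighbours that carry different labels."""
    def nearest(i):
        return min((j for j in range(len(samples)) if j != i),
                   key=lambda j: math.dist(samples[i][0], samples[j][0]))

    nn = [nearest(i) for i in range(len(samples))]
    drop = set()
    for i, j in enumerate(nn):
        if nn[j] == i and samples[i][1] != samples[j][1]:
            drop.add(i if samples[i][1] == 0 else j)
    return [s for idx, s in enumerate(samples) if idx not in drop]


def smote_tomek(majority, minority, ratio=0.5, k=5, seed=0):
    """Oversample the minority class up to ratio * len(majority), then clean
    the class boundary with Tomek link removal. ratio=0.5 matches 2:1."""
    n_new = max(0, int(ratio * len(majority)) - len(minority))
    grown = list(minority) + smote(list(minority), n_new, k, seed)
    samples = [(x, 0) for x in majority] + [(x, 1) for x in grown]
    return remove_tomek_links(samples)


if __name__ == "__main__":
    genuine = [(float(i), 0.0) for i in range(20)]
    fraud = [(100.0, 0.0), (101.0, 0.0), (102.0, 0.0)]
    balanced = smote_tomek(genuine, fraud)
    print(sum(1 for _, y in balanced if y == 0),
          sum(1 for _, y in balanced if y == 1))
```

The nearest-neighbour searches here are quadratic in the number of samples, so
a dataset of the size used in this thesis would need an indexed neighbour
search in practice.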
6.2.2 Experiments with the Weighted CBR
The StdCBR model proved to be efficient in the experiment discussed in
Section 6.1, providing good predictions on the investigated cases.
Nevertheless, further experiments had to be conducted to verify its overall
efficiency. To this end, an advanced algorithm was developed that uses GA to
assign parameter weights and to automate the selection of the k-value in order
to detect mobile money transfer fraud. At this stage of the experiment, events
and cases in the transaction streams were classified into the five different
types of mobile-enabled financial operations available in the sample data for
the purpose of computing case similarity. The main difference between the
StdCBR model and its advanced variant lies in the augmentation with GA
capabilities and in the event/case representation.
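To make the idea of GA-assigned feature weights concrete, the sketch below
evolves a weight vector whose fitness is the leave-one-out accuracy of a
weighted 3-NN classifier. It is a generic illustration under stated
assumptions and not the algorithm of the thesis: the GA operators (truncation
selection, one-point crossover, Gaussian mutation, elitism), the local
similarity 1 − |a − b| on features scaled to [0, 1], and all function names
are assumptions. Selection of k by the GA is omitted for brevity.

```python
import random


def weighted_similarity(a, b, weights):
    """Global similarity as a weighted mean of per-feature similarities.
    Features are assumed to be scaled to [0, 1]."""
    return sum(w * (1 - abs(x - y))
               for w, x, y in zip(weights, a, b)) / sum(weights)


def loo_accuracy(cases, weights, k=3):
    """Leave-one-out accuracy of weighted kNN; used as the GA fitness."""
    correct = 0
    for i, (features, label) in enumerate(cases):
        others = cases[:i] + cases[i + 1:]
        top = sorted(others, reverse=True,
                     key=lambda c: weighted_similarity(features, c[0], weights))[:k]
        predicted = 1 if 2 * sum(lbl for _, lbl in top) > k else 0
        correct += predicted == label
    return correct / len(cases)


def evolve_weights(cases, n_features, pop_size=20, generations=15, seed=0):
    """Return (best_weights, best_fitness_per_generation)."""
    rng = random.Random(seed)
    population = [[rng.random() + 1e-6 for _ in range(n_features)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, reverse=True,
                        key=lambda w: loo_accuracy(cases, w))
        history.append(loo_accuracy(cases, ranked[0]))
        parents = ranked[:pop_size // 2]        # truncation selection
        children = [ranked[0]]                  # elitism: keep the best as is
        while len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features) if n_features > 1 else 0
            child = p1[:cut] + p2[cut:]         # one-point crossover
            if rng.random() < 0.2:              # Gaussian mutation, clipped
                g = rng.randrange(n_features)
                child[g] = min(1.0, max(1e-6, child[g] + rng.gauss(0, 0.2)))
            children.append(child)
        population = children
    best = max(population, key=lambda w: loo_accuracy(cases, w))
    return best, history


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 is noise.
    toy = [((0.1 + 0.01 * i, (i * 37 % 10) / 10), 0) for i in range(15)]
    toy += [((0.8 + 0.01 * i, (i * 53 % 10) / 10), 1) for i in range(15)]
    weights, trace = evolve_weights(toy, n_features=2)
    print(weights, trace[-1])
```

Because every fitness evaluation runs a full nearest-neighbour pass over the
case base, the cost grows with population size, number of generations and the
square of the number of cases, which is consistent with the computational
overhead attributed to the GA in this chapter.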
To evaluate the Weighted CBR model, a set of experiments was designed and
conducted. The aim was to evaluate whether the Weighted CBR model benefits the
predictive accuracy of the CBR process. Additionally, the computational
overhead of both the StdCBR model and the Weighted CBR model was considered.
The pseudo-code for calculating the similarity between transaction sequences,
for the purpose of identifying fraudulent transactions, is presented below:
Using k-Nearest Neighbour
Classify (X, S, Z) // X: case base, S: class labels of X, Z: query sample
    for j = 1 to m do // m: number of cases in X
        Compute distance d(Xj, Z) // similarity Sim
    end for
    Compute set I containing indices for the k smallest distances d(Xj, Z).
    return majority label for { Sj where j ∈ I }
This algorithm calculates the similarity between transaction sequences, i.e.
the case and the query. The similarity function was computed by optimally
combining the individual local similarities of the features (vi) into a global
similarity, as discussed in Section 5.3.2. For evaluation, the Weighted CBR
classifier is compared with the basic CBR model (StdCBR), which uses the
individual feature similarity dimension, as the baseline.
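A minimal Python rendering of the Classify procedure, with a global similarity
built as a weighted combination of local similarities, might look as follows.
The local similarity functions (1 − |a − b| for numeric features assumed to be
scaled to [0, 1], exact match for categorical features) and the weighted mean
are assumptions made for illustration; the actual functions are those defined
in Section 5.3.2.

```python
from collections import Counter


def local_similarity(a, b):
    """Similarity of a single feature value pair, in [0, 1] (assumed forms)."""
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return 1.0 - min(1.0, abs(a - b))
    return 1.0 if a == b else 0.0


def global_similarity(case, query, weights):
    """Weighted mean of the local similarities of all features."""
    total = sum(w * local_similarity(c, q)
                for w, c, q in zip(weights, case, query))
    return total / sum(weights)


def classify(X, S, Z, weights, k=3):
    """X: case base, S: class labels of X, Z: query sample.

    Ranks cases by similarity (highest first, equivalent to smallest
    distance) and returns the majority label among the k best."""
    ranked = sorted(range(len(X)), reverse=True,
                    key=lambda j: global_similarity(X[j], Z, weights))
    I = ranked[:k]
    return Counter(S[j] for j in I).most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical cases: (scaled amount, operation type).
    X = [(0.10, "p2p"), (0.15, "p2p"), (0.20, "deposit"),
         (0.90, "withdrawal"), (0.95, "withdrawal"), (0.85, "p2p")]
    S = [0, 0, 0, 1, 1, 1]
    print(classify(X, S, (0.92, "withdrawal"), weights=(1, 1)))
```

Setting all weights equal corresponds to an unweighted baseline, while unequal
weights let some features dominate the ranking of retrieved cases.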
In the experiment evaluation, a total of 736,030 transactions were used, with
a ratio of 2:1 after the data sampling. To ensure that the CBR model is not
biased by using the same transaction data for both the training and testing
phases, the MMT dataset was split into training and testing sets using a ratio
of 70:30 respectively. In addition, to avoid an overoptimistic estimate of the
performance, as proposed in [Fab+17], transactions from known compromised MMT
accounts were removed from the test set. For example, when an MMT account is
already associated with a fraudulent transaction in the training set, its
transactions are removed from the test set.
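The split with removal of already-compromised accounts can be sketched as
below. The record layout (dictionaries with "account" and "is_fraud" keys) and
the function name split_without_leakage are assumptions introduced for this
illustration, not the data format of the thesis.

```python
import random


def split_without_leakage(transactions, train_fraction=0.7, seed=0):
    """Random 70:30 split, then drop from the test set every transaction whose
    account already has a fraudulent transaction in the training set."""
    rng = random.Random(seed)
    shuffled = transactions[:]
    rng.shuffle(shuffled)
    cut = int(train_fraction * len(shuffled))
    train, test = shuffled[:cut], shuffled[cut:]

    # Accounts known to be compromised from the training data alone.
    compromised = {t["account"] for t in train if t["is_fraud"]}
    test = [t for t in test if t["account"] not in compromised]
    return train, test


if __name__ == "__main__":
    # Toy records: 10 accounts, a handful of fraudulent transactions.
    txs = [{"account": f"acc{i % 10}", "is_fraud": i % 17 == 0}
           for i in range(200)]
    train, test = split_without_leakage(txs)
    print(len(train), len(test))
```

Filtering in this way makes the test set smaller than 30% of the data, but it
prevents the classifier from scoring well simply by recognising accounts it
has already seen committing fraud.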
The Weighted CBR model uses GA to automate the selection of the k-value for
kNN classification from the values 3, 5, 7 and 9. Table 6.2 shows the results
for k = 3; the use of larger values of k did not produce significantly
different results. The experiments were run for 10 iterations and the average
was taken as the final classification result. Figures 6.3 and 6.4 below show
the results of the experiments conducted, as the percentage of correct
classifications for each distinct fraud class (A, B and C) and for non-fraud
transactions respectively.
Figure 6.3: Results for all types of fraud class detection
Figure 6.4: Results for Non-fraud transaction detection
The overall performance of the classifiers shown in Figures 6.3 and 6.4 above
can be characterised as good, especially for the Weighted CBR + context
classifier. For cases belonging to class C, which indicates an account
management system compromise, the StdCBR system is less accurate. However,
once the configuration of the StdCBR system was enhanced using context
features and genetic algorithm capabilities for feature weights, the CBR
system (Weighted CBR + context) classified the cases of every fraud class more
accurately. This leads to an increase in detection accuracy of 0.032% for
class C. The same performance trend was also recorded for the classifiers on
non-fraudulent transactions, as shown in Figure 6.4.
To further represent the effectiveness of the CBR classifiers, the following
metrics were adopted for comparison and evaluation: Recall, F-measure,
Matthews Correlation Coefficient (MCC) and Area Under the ROC Curve (AUC).
These four metrics handle imbalanced data efficiently without becoming biased
towards the majority class, and they are well suited to fraud detection
[GS17]. Recall focuses on the significant class (fraud) and is not sensitive
to data distribution. The F-score is a good metric due to its non-linear
nature, and MCC is considered one of the best single assessment measures; both
are less influenced by imbalanced data [GS17; Cha+11a]. AUC evaluates the
overall classifier performance and is very appropriate for class imbalance
[Gu+08]. The performance of the classifiers on the above metrics is shown in
Table 6.2.
Table 6.2: Performance of classifiers: row 1 represents StdCBR, row 2 StdCBR +
new features, row 3 Weighted CBR, and row 4 Weighted CBR + new features.
Model, attributes          Recall   F-measure   MCC     AUC
StdCBR                     0.981    0.977       0.966   0.984
StdCBR + context           0.987    0.977       0.966   0.985
Weighted CBR               0.994    0.995       0.993   0.996
Weighted CBR + context     0.998    0.998       0.998   0.999
As observed in Table 6.2, the results from this experiment differ from
previous work³, although the dataset contains the same features, with extended
simulation data of higher volume. Our recall is higher, i.e. our model
captured a larger proportion of the fraudulent events. Looking at the
Weighted CBR + context classifier, it can be observed that the new feature
(context of information) led to an improvement of 0.021 in F-measure over the
StdCBR model (from 0.977 to 0.998). This indicates that the new feature
significantly improved on the StdCBR model. It can also be observed from
Table 6.2 that the Weighted CBR + context classifier had the best Matthews
Correlation Coefficient when compared to the other classifiers. This indicates
that the Weighted CBR + context classifier has the highest probability of
being the perfect model for the sample data in this experiment. Another
observation is that the Weighted CBR + context classifier outperformed the
other CBR classifiers in the prediction of both fraudulent and non-fraudulent
cases, demonstrating a high AUC value. To test whether the detected
differences are statistically significant, a t-test was applied to the
performance results. All differences were found to be significant (p < 0.01);
the test shows that the newly proposed pattern features, in addition to the
feature weighting, significantly improve performance on all the performance
measures used. The experiment demonstrates that the addition of the new
features improves the StdCBR approach compared to [Ade+17]. The addition of
weighting and context, as demonstrated in Table 6.2, further improves the
performance of the CBR model.
³ Tables 3 and 4 of [Ade+17] report, for a smaller unbalanced dataset, a
recall of 0.46 for StdCBR and 0.78 for FW-CBR.
The computational complexity associated with the use of genetic algorithms is
one of the major challenges in this experiment. The average execution time for
each of the experiments was 43,581 seconds on an Intel Core i5 2.16 GHz
machine. Reducing computation cost to improve the scalability of the proposed
system is therefore the focus of the next subsection. Although further
experiments in different application domains may behave differently, such
investigation is beyond the scope of this thesis.
6.2.3 Weighted CBR with Clustering
The combination of case-based reasoning (CBR) with genetic algorithms has been
successfully applied to a wide range of applications in the literature, such
as classification, diagnosis, configuration and decision support [Man+12]. As
in other learning systems, the combination of CBR with GA can suffer from a
computational cost problem, which occurs when knowledge learned in an attempt
to improve a system’s performance degrades it instead [AJ04; FR93]. This issue
can be addressed by combining CBR with clustering to provide a retrieval
strategy that chooses the best solution from a set of solutions found by
clustering the case base [MA16]. The benefits of clustering were discussed in
Section 5.1.
Clustering techniques have been widely used in various fields to improve the
classification accuracy and computation cost of learning algorithms as the
case library grows [TD13]. The clustering approach divides data units or
variables into clusters such that elements within a cluster have a high degree
of natural association among themselves, while the clusters are relatively
distinct from one another [KH01]. It is often used both in information
retrieval [MA16] and in large or irregular case library problems [TD13]. For
the selection of the appropriate clustering algorithm, Kapetanakis et al. in
[Kap12] suggested two major factors that need to be taken into consideration:
• The complexity of the case structure.
• The similarity algorithms that provide similarity among cases instead of
providing a static set of values for each case.
Based on the above criteria, the CURE data clustering algorithm could be
applied to cluster the data into meaningful groups. Thus, the CURE⁴ clustering
algorithm was chosen for the retrieval process of the Weighted CBR system.
CURE is a hierarchical clustering algorithm that adopts a middle ground
between the centroid-based and the all-points extremes [GRS01]. According to
Tong and Wu in [TD13], the CURE algorithm operates by decomposing the entire
case library and selecting a small part of the cases in the clustering space
using an efficient random sampling algorithm. CURE clustering then divides
this subset of cases into a group of local clusters. Each division is based on
the minimum average distance, used to find the center of each cluster and set
up an index. Then the k-NN algorithm is used to cluster the entire case set
based on the index of these cluster centers and the threshold value T created
by the CURE algorithm. After a certain number of iterations, the clustering
with the largest average similarity is taken as the final cluster result
[GRS01; TD13]. The CURE clustering algorithm was chosen because it is more
robust to outliers, has lower computational time requirements and identifies
clusters having non-spherical shapes and wide variances in size [GRS01].
⁴ The scattered points approach employed by CURE alleviates the shortcomings
of both the all-points and the centroid-based approaches in other clustering
algorithms. Thus, it enables CURE to correctly identify clusters in a dataset
[GRS01].
In summary, to reduce the computational cost and at the same time improve the
prediction accuracy of the proposed CBR model, a combined approach using CBR
and clustering is implemented. The objective is to reduce the search space in
the retrieval step, to consider only the most suitable cases and solutions to
support decisions, and to provide an intelligent strategy that enables the
predictor to find the best solution.
6.2.4 Weighted CBR with Clustering Experiment
The performance of the Weighted CBR model is augmented by using clustering
techniques to partition the case base into sample groups. Clustering does not
aim at labelling the cases in a group with a specific tag, as happens in
classification, where the tag represents a piece of generalized domain
knowledge extracted from the subsumed cases. Rather, it collects the most
similar cases into clusters, i.e. it identifies the cases under similar
circumstances, and limits the retrieval task to them. It also structures the
CBR case base, and guides and speeds up its retrieval process [MA16]. Thus, to
improve the retrieval process of the Weighted CBR system in this thesis, the
CURE algorithm in [TD13] was adopted.
The clustering analysis [GRS01; TD13] of the case library starts by creating
an index for each cluster center point using the CURE algorithm, as shown in
Figure 6.5. To retrieve a match for the target instance (DCase) from the case
library, the cluster to which the target case belongs is first determined using
a combination of the index and the nearest neighbour search algorithm.
Thereafter, the nearest neighbour algorithm is used to retrieve the case that
is most similar to the target case. This is based on the fact that the nearest
neighbour algorithm is efficient for a well-organised and indexed library.
Figure 6.5: Structure of case library retrieval using clustering algorithm [TD13]
The implementation of the process includes the following stages [TD13]:
Step 1: Determine the cluster to which the target case belongs. Calculate the
similarity between the target case and the center point case of each cluster.
Identify the cluster whose center point case has the maximum similarity.
Step 2: Compare the maximum similarity with a threshold T. If the maximum
similarity is smaller than the threshold T, put the target case into a separate
class, C0.
Step 3: Find the source case with the maximum similarity using the nearest
neighbour search algorithm. Compute the similarity between the target
case and all the cases in the identified cluster. Identify the source case that
has the maximum similarity with the target case as the candidate case to solve
the target case.
The algorithm description for the case retrieval process is given as follows:
SUB RetrieveInCB (CB)
I = MaxSimilarityOfCore (DCase, SubsetCaseCore[ ]);
Find cluster I that has the maximum similarity with target case DCase;
If the maximum similarity < threshold (T) THEN I = C0 as a separate class;
For C=StartInCB I TO C <> "" Extract cases by nearest neighbour algorithm;
SameDegree(C, DCase) 'DCase is the target case;
NEXT C
RETURN the source case that has the maximum similarity with DCase;
END SUB
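The retrieval procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the similarity function (inverse of one plus Manhattan distance), the dictionary layout of the index, and all names are hypothetical and are not taken from [TD13] or from the implementation used in this thesis.

```python
def similarity(a, b):
    """Hypothetical similarity in (0, 1]: inverse of 1 + Manhattan distance."""
    return 1.0 / (1.0 + sum(abs(x - y) for x, y in zip(a, b)))

def retrieve_in_cb(dcase, centers, clusters, threshold):
    """Return (cluster_id, best_case) for the target case dcase.

    centers  : dict mapping cluster_id -> center point case (the index)
    clusters : dict mapping cluster_id -> list of (features, label) cases
    """
    # Step 1: find the cluster whose center is most similar to the target.
    best_id = max(centers, key=lambda cid: similarity(dcase, centers[cid]))
    # Step 2: below the threshold, the target goes to the separate class C0.
    if similarity(dcase, centers[best_id]) < threshold:
        return "C0", None
    # Step 3: nearest neighbour search restricted to the selected cluster.
    best_case = max(clusters[best_id],
                    key=lambda case: similarity(dcase, case[0]))
    return best_id, best_case
```

For example, with centers {"A": (0, 0), "B": (10, 10)} and a target (1, 2), only the cases stored under cluster "A" are compared with the target, which is where the reduction in search space comes from.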
For the purpose of running the CURE algorithm to cluster the CBR case
library, five (5) clusters were chosen as the clustering configuration.
The rationale is that the value of k=5 has been widely used in
the literature with promising prediction accuracy [Mou+06; SPA10;
Pat73]. The extended simulated balanced dataset with significant sample size
discussed in Subsection 6.2.1 was used as the case library for this experiment.
In order to comprehensively evaluate the experiment on this
dataset, the application of the CURE algorithm to the set of CBR classifiers
(StdCBR, StdCBR + context, Weighted CBR and Weighted CBR + context)
was examined. To run the CBR classifiers using the kNN algorithm, values of
k = 3, 5, 7 and 9 were adopted. Increasing the value
of k beyond 3 in the kNN retrieval process did not produce any
significantly different results. Figure 6.6 shows the results obtained after
the application of the CURE algorithm.
Figure 6.6: CBR with clustering Results
The Weighted CBR + context classifier had the highest results across the
different performance metrics used in the evaluation. An additional observation
from the results of this experiment was that as the CBR system configuration
was augmented with novel features and GA capabilities, the performance
of the CBR system increased gradually across the board. The results for the
classifiers in the previous experiment (Section 6.2.2) show that the Weighted
CBR systems outperformed the CBR systems with the clustering algorithm. One of
the main concerns was whether the efficiency of the CURE clustering algorithm
was maximised. Although a fixed number of clusters was used in the
CURE algorithm configuration, the use of a dynamic number of clusters could
have provided different or better results. This could not be verified or
investigated in this thesis due to time constraints. However, a positive
observation from the experimental results is that the computation
cost of the CBR system, when combined with the clustering algorithm,
was reduced from 43,581 seconds to an average of 8,695 seconds.
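Taking the two timings above at face value, the reduction corresponds to roughly a five-fold speed-up, or about 80% less computation time:

```latex
\[
\text{speed-up} = \frac{43{,}581\ \text{s}}{8{,}695\ \text{s}} \approx 5.0,
\qquad
\text{relative reduction} = 1 - \frac{8{,}695}{43{,}581} \approx 0.80 .
\]
```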
6.3 Summary
In this chapter, a Weighted CBR model is proposed with the aim of improving
the performance of a StdCBR system for fraud identification in mobile money
transfer (MMT). Results from the experiments have shown that a CBR system
can detect mobile money payment fraud efficiently by applying novel features
as well as similarity measures on them.
In the performed experiments, the Weighted CBR system uses a combination
of CBR and GA as a tool to optimize the significance level (weights) of
the features. For the experiment, instead of using the conventional approach
where the transaction amount, time dimensions or feature dimensions are used
individually, the log information from the simulation data was classified into
five contexts and then recombined into a single dimension. The reason for this
was to improve the prediction accuracy of the CBR system. Results demonstrate
that the classification of log information into five contexts improves the
performance of our proposed Weighted CBR + context classifier, with an area
under the curve (AUC) of 0.98 to 0.99 for the two feature dimension
perspectives. However, the computational cost was the main concern of the
experiment and was investigated further. A conclusion was reached to use the
CURE clustering algorithm to reduce the computation cost of the retrieval
process of the Weighted CBR system. The results from the CBR system with the
clustering algorithm show positive prediction performance, indicating the
success of the approach. However, the Weighted CBR without the clustering
algorithm outperformed it. In conclusion, it can be stated that the CBR
classifiers have demonstrated good prediction accuracy for the examined case
study.
The current chapter finalises the research approach towards the detection of
mobile money payment fraud by using augmented CBR systems. The relevant
research work was presented as well as the experiments conducted for the
evaluation of the adopted research approach. The next chapter concludes this
thesis by summarising the findings, conclusions and future directions.
Chapter 7
Conclusion and Further Work
Mobile money transfer is a fast-growing medium for making financial
transactions via a mobile device, and it is increasingly being adopted in
growing markets, especially in developing countries. It can handle a large
number of small-value payments and worldwide funds exchange in digital
currencies. Consequently, the usage of mobile money transfer introduces
additional risks caused by the large number of non-bank participants, higher
speed of transactions and level of anonymity compared to other existing banking
systems. This thesis investigated how CBR can be used to address some of
these issues. In particular, the thesis focused on how the proposed detection
approach can be effectively used to analyse and predict transaction fraud in
mobile money transfer (MMT) networks. This chapter summarises the different
areas investigated in order to carry out this research work and the results
from the experiments carried out, and presents future research directions.
7.1 Thesis Summary
This thesis started by providing an overview of the mobile money services
ecosystem and the rationale for selecting mobile money transfer as the
investigated domain. An investigation was then carried out on how to circumvent
the challenge of obtaining real-life financial transaction datasets, in this
case for mobile money. The investigation found that only three previous works
in the fraud detection domain provided the level of depth needed to carry
out this research work. Therefore, the methodology proposed in [LKJ02] and the
multi-agent based simulator in [LrA12b] were adapted in this thesis to simulate
mobile money transfer transaction data, as discussed in Section 4.3.
Furthermore, in Chapter 2, the existing approaches in the literature for
evaluating simulated datasets were discussed. From this review, the chi-square
test was selected to evaluate the simulated dataset. The rationale for choosing
the chi-square test was that it is easy to compute and robust with respect to
the distribution of the data. The results from the evaluation using the
chi-square test show that the proportions of the selected users for both the
amount and the period are normally distributed.
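To make the evaluation step concrete, the chi-square goodness-of-fit statistic compares observed counts with the counts expected under a reference distribution. The sketch below is illustrative only: the counts are made-up placeholders rather than values from the simulated dataset, and the comparison uses the standard 5% critical value for three degrees of freedom (7.815).

```python
def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all bins."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts of transactions per amount band (placeholder values).
observed = [48, 35, 12, 5]
expected = [50, 30, 15, 5]

stat = chi_square_statistic(observed, expected)

# Four bins give three degrees of freedom; 7.815 is the 5% critical value.
CRITICAL_0_05_DF3 = 7.815
fits_reference = stat < CRITICAL_0_05_DF3
```

A statistic below the critical value means the hypothesis that the simulated counts follow the reference distribution is not rejected at the 5% level.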
Throughout this research, several other algorithms for detecting financial
transaction fraud were reviewed, including logistic regression (LR), artificial
neural network (ANN), support vector machine (SVM) and others. As a result,
LR, random forest (RF), SVM and ANN were selected to run some preliminary
experiments, as further discussed in Appendix A.1. These choices were based on
the fact that logistic regression is easy to use and is one of the most
commonly used techniques for data mining in practice. The random forest
classifier has the ability to capture non-linear data and shows high
scalability with a better visual representation of results. The support vector
machine has a regularization parameter that is used to prevent over-fitting.
The artificial neural network has shown higher accuracy in prediction. In
Chapter 3, a review of the existing approaches for handling unbalanced data for
predictive algorithms was also presented. As a result, a conclusion was reached
to use a hybrid technique that combines an oversampling and an undersampling
algorithm to balance the MMT dataset. According to [Cha+02], this works better
than either technique alone. Furthermore, evaluation techniques (performance
metrics) for measuring the effectiveness of a fraud detection system were
discussed. The following performance metrics were selected for use in
this thesis: Recall, F-measure, Matthews correlation coefficient
and Area under the ROC curve. This is based on the fact that they handle
imbalanced data efficiently without becoming biased
towards the majority class and are highly suitable for
fraud detection [GS17].
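For reference, the four selected metrics can be computed from confusion-matrix counts and classifier scores as in the following sketch. These are the standard textbook definitions; the function names are illustrative, and AUC is computed here by the pairwise ranking formulation rather than by tracing the ROC curve.

```python
import math

def recall(tp, fn):
    """Fraction of actual fraud cases that were detected."""
    return tp / (tp + fn) if tp + fn else 0.0

def precision(tp, fp):
    """Fraction of predicted fraud cases that were actual fraud."""
    return tp / (tp + fp) if tp + fp else 0.0

def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if p + r else 0.0

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, in the range [-1, 1]."""
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / den if den else 0.0

def auc(scores, labels):
    """Probability that a random fraud case outscores a random genuine one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

None of these metrics rewards a classifier for simply predicting the majority (genuine) class, which is why they suit imbalanced fraud data better than plain accuracy.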
Chapter 4 addressed the lack of publicly available datasets on
mobile money transfer by simulating transaction data using the synthetic data
generation method in [LKJ02]. To simulate the MMT platform, the multi-agent
based simulator (MABS) in [LrA12b] was used to generate mobile money transfer
transaction data using the misuse scenarios in the EU FP7 MASSIF project
[Lla+11]. As a contribution, misuse scenarios involving SIM swap and retailers
facilitating the opening of end-user accounts were modelled. These were not
considered in [Gab+13; LrA12b]. Also, to trace the sequence of events in the
generated data, a time-stamp parameter was included in the simulation rather
than the steps or category of users over a period of time used in [Gab+13;
LrA12b] respectively. To ensure that the modelled data characteristics are as
close as possible to real-world situations, a combination of users' habits and
their behaviour was incorporated in the simulation, as proposed by Gaber et al.
[Gab+13]. From the results of the evaluation, the generated dataset appears to
be close to an actual transaction dataset. This dataset was used to
train and test the proposed predictive model to further verify its
applicability to fraud detection tools.
Chapter 5 introduced the proposed CBR approaches for the identification
of money transfer fraud in Mobile Money Transfer (MMT) environments. In
order to apply similarity measures among transaction event sequences in the
CBR system, the log information from the simulation data was classified into
five contexts and then recombined into a single dimension, rather than
following the conventional approach where transaction amount, time dimensions
or feature dimensions are used individually. Two similarity algorithms were
presented in that chapter: StdCBR and a Weighted CBR model. The Weighted system
uses a combination of CBR and GA as a tool to optimize the significance level
(weights) of the features with the aim of improving the performance of the
proposed CBR approach. The results from the experiments show that the
incorporation of a new attribute (context of information) into the CBR system
led to a significant performance improvement. In addition, some machine
learning algorithms were selected and discussed. These were used to carry out
the preliminary experiments discussed in Appendix A.1.
Finally, in Chapter 6, the experimental implementation of the CBR systems
formulated in Chapter 5 was presented. In the performed experiments,
the Weighted CBR system uses a combination of CBR and GA as a tool to
optimize the significance level (weights) of the features. For the experiment,
instead of using the conventional approach where the transaction amount, time
dimensions or feature dimensions are used individually, the log information
from the simulation data was classified into five contexts and then recombined
into a single dimension. A promising result was achieved in the experiment.
To reduce the computational cost of the CBR
systems, the CURE clustering algorithm was applied to the retrieval process.
This led to a significant reduction in the computation cost. Although a
fixed number of clusters was used for the clustering algorithm in the CBR
retrieval process, a dynamic approach could produce different or better
results. This could not be verified or investigated due to time constraints. In
conclusion, it can be stated that the CBR classifiers have demonstrated good
prediction accuracy for the examined case study.
7.2 Contributions and Findings
This section presents the contribution to knowledge by revisiting the research
questions introduced in Chapter 1 (Section 1.3), and it provides answers
to these research questions from the findings of this thesis. The first
question addressed is as follows:
1. How can a model developed through the CBR approach be used for
effective analysis of MMT transaction fraud?
A novel model based on the CBR methodology for predicting transaction fraud
in mobile money transfer networks was proposed in Chapter 5. As a contribution,
the proposed model augments the basic CBR (i.e. StdCBR)
capability by using a machine learning technique (genetic algorithm) to assign
parameter weights in the sample dataset and by automating the random selection
of the k-value in k-NN classification, in order to improve CBR performance in
detecting financial transaction fraud. The results from the experiment using
the Weighted CBR model show good prediction accuracy. Another contribution is
a novel approach for capturing user behaviour, in which features are classified
into contexts of information so as to capture user behaviour effectively
and improve the prediction accuracy of the CBR classifier.
The incorporation of the new attribute (context of information) into the
CBR system led to a significant performance improvement (see Table 6.2).
During the CBR experiments, to ensure that the CBR model was
not biased as a result of an unbalanced dataset or of using the same transaction
data for both the training and testing phases, the following steps were
carried out: (i) Data balancing using a hybrid approach that combines
over-sampling and under-sampling algorithms (SMOTE+Tomek-Link) was adopted,
using a ratio of 2:1 for negative and positive instances respectively.
According to Chawla et al. [Cha+02; GS17], this hybrid approach works better
than either technique alone. (ii) The MMT dataset was split into training and
testing sets using a ratio of 70:30 respectively. In addition, to avoid an
overoptimistic estimate of the CBR model performance, after the dataset split,
transactions from known compromised MMT accounts were removed from the test
set, as proposed in [Fab+17]. For example, when an MMT account is already
associated with a fraudulent transaction in the training set, its transactions
are removed from the test set. The experimental results show a promising
prediction performance, although the use of larger values of k in the
experiment did not produce any significantly different results.
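Step (ii) and the removal of compromised accounts can be sketched as follows. This is a minimal illustration, assuming each transaction is a dictionary with the hypothetical keys "account" and "is_fraud"; the balancing step (SMOTE+Tomek-Link) is deliberately left out, and the code is not the implementation used in the experiments.

```python
import random

def split_without_leakage(transactions, test_fraction=0.3, seed=0):
    """Shuffle, split 70:30, then drop test rows from compromised accounts."""
    rng = random.Random(seed)
    shuffled = transactions[:]          # copy so the input list is not mutated
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * (1 - test_fraction))
    train, test = shuffled[:cut], shuffled[cut:]
    # Accounts already linked to a fraudulent transaction in the training set.
    compromised = {t["account"] for t in train if t["is_fraud"]}
    # Their remaining transactions would make the test estimate overoptimistic.
    test = [t for t in test if t["account"] not in compromised]
    return train, test
```

The test set returned here can be smaller than 30% of the data, since the filter only ever removes rows from it.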
2. To what extent can such a model deliver measures/metrics for prediction
of MMT fraud?
This question concerns the similarity measures used and the performance
obtained from the predictions. The CBR similarity function was computed by
optimally combining the weighted individual local similarities of (vi) into a
global similarity, as discussed in Section 5.3.2. For the purpose of evaluating
the proposed CBR methodology thoroughly, the experiments were performed
following the evolutionary approach described in Section 5.1. The initial set
of experiments started with the application of the basic technique (StdCBR) to
a minimalistic dataset. The experimental results show a fairly positive
monitoring performance for the StdCBR system on the simplistic dataset (see
Table 6.1). However, the F-measure result indicates a high precision score
for the StdCBR classifier. Such a result could possibly lead to the decline of
valid customer transactions, i.e. a high false positive score.
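The combination of weighted local similarities into a global similarity mentioned above can be illustrated with a weighted average. The sketch below is an assumption-laden example rather than the function defined in Section 5.3.2: the local measure (one minus the range-normalised absolute difference), the feature ranges and the weights are all hypothetical, with the weights standing in for values that the GA would supply.

```python
def local_similarity(a, b, value_range):
    """Hypothetical local measure: 1 minus the range-normalised difference."""
    return 1.0 - min(abs(a - b) / value_range, 1.0)

def global_similarity(target, source, weights, ranges):
    """Weighted average of the per-feature local similarities."""
    total = sum(weights)
    weighted = sum(w * local_similarity(t, s, r)
                   for t, s, w, r in zip(target, source, weights, ranges))
    return weighted / total

# Two features (amount, hour of day) with placeholder weights and ranges.
score = global_similarity(target=(100, 12), source=(150, 12),
                          weights=(0.75, 0.25), ranges=(200, 24))
```

With these placeholder values the amount feature contributes 0.75 x 0.75 and the hour feature 0.25 x 1.0, giving a global similarity of 0.8125.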
As a further step towards improving the performance of the proposed CBR model,
a further dataset exhibiting both high complexity and significant sample size
(i.e. in addition to the general annotation of fraud and non-fraud, three
distinct classes of fraud were introduced to represent the different misuse
scenarios used in the simulation) was used together with an improved CBR
configuration. This dataset and configuration were used to evaluate the
classification precision of this research approach. Thus, the use in the
evaluation of a dataset annotated with three distinct classes of
fraud, as discussed in Section 6.2, provided an insight into the
detection rate of the CBR model on each of these classes. The results from the
experiment show that the Weighted CBR + context classifier provided a good
detection rate across the three classes of fraud cases in the sample data (see
Figure 6.3).
3. What limitations does such a model come with and what performances
can we expect from it?
A major challenge observed during the experiments with the proposed CBR model
was the computational complexity associated with the augmentation
of the CBR model with genetic algorithms. To reduce the computational cost
and at the same time improve the prediction accuracy of the proposed CBR
model, a collaborative approach using CBR and clustering (the CURE clustering
algorithm) was implemented, as discussed in Section 6.2.4. The aim was
to reduce the search space in the CBR retrieval process and to consider
only the most suitable cases and solutions, in order to support the decision
and provide an intelligent strategy that enables the predictor to reach the
best solution. The rationale for using the CURE algorithm was that it is more
robust to outliers, has lower computational time requirements and identifies
clusters having non-spherical shapes and wide variances in size [GRS01]. The
experimental results show a reduced computational cost but a lower prediction
accuracy compared to the proposed CBR model without the CURE clustering
algorithm (see Figure 6.6). One of the main concerns was whether the efficiency
of the CURE clustering algorithm was maximised. Although a fixed number
of clusters was used in the CURE algorithm configuration, the use of a dynamic
number of clusters could have provided different or better results. This could
not be verified or investigated in this thesis due to time constraints.
An additional contribution to knowledge in this thesis is how the challenge
of obtaining a mobile money payment dataset for the purpose of evaluating
the proposed predictive model was addressed. In order to answer
the question "How can background transaction data as training and test
cases for pattern analysis and learning algorithms evaluation be obtained?",
the methodologies for simulating financial transaction datasets were reviewed,
as discussed in Section 2.3. A conclusion was reached to use the methodology in
[LKJ02] and the multi-agent based simulator (MABS) developed by Lopez et
al. [LrA12b] to simulate a mobile money transfer dataset containing information
about mobile money transactions with examples of fraudulent samples. This is
based on the fact that the proposed methodology has a well-defined interface,
which makes it easy to use, while MASON facilitates the implementation of
social networks.
As a contribution, misuse scenarios involving SIM swap and retailers
facilitating the opening of end-user accounts were modelled, which were not
considered in the literature. Also, to trace the sequence of events in the
generated data, a time-stamp parameter was included in the simulation rather
than the steps or category of users over a period of time used in [Gab+13;
LrA12b] respectively. To ensure that the modelled data characteristics are as
close as possible to real-world situations, a combination of users' habits and
their behaviour was incorporated in the simulation, as proposed by Gaber et al.
[Gab+13]. From the results of the evaluation, the generated dataset appears to
be close to an actual transaction dataset. This dataset was used as input data
for training and testing fraud detection techniques. In addition, the simulated
dataset was used to test the properties of the proposed fraud detection system
(FDS) by injecting variations of known or emerging frauds into the synthetic
data to study how this affects performance parameters such as the detection
rate.
7.3 Future Work
Despite the successful results achieved in this thesis, several directions
remain open for future research. Possible improvements and further
work are summarised below.
• In future work, the simulation of synthetic mobile money transfer datasets
can be explored further, for example by building an improved model
or a more realistic dataset using a combination of synthetic and real
data. This would make it even more valuable as a realistic dataset for
fraud detection experiments. Also, real-world geographical locations can
be implemented in the simulation with the extension of MASON called
GEOMASON [Col13].
• The Weighted CBR with CURE clustering approach (see Section 6.2.4)
showed a better time performance but a significantly lower accuracy
in comparison to the CBR system without the CURE algorithm. One
of the concerns was whether the efficiency of the CURE clustering
algorithm was maximised. Since a fixed number of clusters
was used in the CURE algorithm configuration, future work should
use a dynamic number of clusters to investigate whether it
provides different or better results.
• As an improvement in future work, the computational performance of
the CBR approach can be further enhanced by parallelising the implementation
of the CBR model. For example, better computational
performance can be achieved by splitting the data and computational
tasks over multiple computers, CPUs, GPUs and threads [Sri+99].
• The properties of the FDS (e.g. the false alarm rate) can be further tested
by varying the background data, where background data is defined as
normal usage with no attacks. Possible directions for improvement include:
(i) testing the FDS using benchmarks, (ii) combining it with ensemble
methods or hybrid approaches, and (iii) evaluating it using a real financial
transaction dataset.
• Finally, in future work it is important to explore the performance of
non-CBR methods such as artificial neural networks, deep learning and
discrete modelling approaches, which may improve the
performance of the prediction further.
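The parallelisation idea listed above, splitting the case base and merging the partial results, can be sketched as follows. The code is a hypothetical illustration: the similarity function and all names are assumptions, and a thread pool is used only to show the split-and-merge structure. In CPython, CPU-bound similarity computations would need a process pool, multiple machines or GPUs to obtain a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

def similarity(a, b):
    """Hypothetical similarity: inverse of 1 + Manhattan distance."""
    return 1.0 / (1.0 + sum(abs(x - y) for x, y in zip(a, b)))

def best_in_chunk(target, chunk):
    """Most similar case within one partition of the case base."""
    return max(chunk, key=lambda case: similarity(target, case))

def parallel_retrieve(target, case_base, workers=4):
    """Split the case base, search each part concurrently, merge the winners."""
    if not case_base:
        raise ValueError("case base must not be empty")
    size = -(-len(case_base) // workers)      # ceiling division
    chunks = [case_base[i:i + size] for i in range(0, len(case_base), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        candidates = list(pool.map(lambda ch: best_in_chunk(target, ch), chunks))
    # The global best is the best among the per-chunk winners.
    return max(candidates, key=lambda case: similarity(target, case))
```

Because each chunk is searched independently, the same structure carries over to the cluster-indexed retrieval of Section 6.2.4, with one worker per cluster.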
In conclusion, the work presented in this thesis proposed a novel CBR
approach for predicting mobile money transfer fraud. The experimental results
show good prediction accuracy (see Table 6.2). Despite
these successes, the aforementioned improvements can be implemented
in future studies.
Appendix A
Preliminary Experiments
A.1 Experimental Results from the Selected
Machine Learning Algorithms
In Section 3.1.1 of Chapter 3, the application of some machine learning
algorithms was investigated. As a result, four algorithms (logistic regression,
random forest, support vector machine and artificial neural network)
were selected based on their performance in the literature. In Section 5.5
of Chapter 5, the training and evaluation phases of
these algorithms were presented. In order to run and evaluate the performance
of the selected machine learning algorithms in the prediction of mobile money
transfer fraud, Python libraries such as pandas and Keras, among others, were
used. This section presents and discusses the performance evaluation
results of the selected machine learning algorithms on the simulated
dataset.
The experiment was carried out using 5-fold cross validation and evaluated
using the following metrics: Recall, F-measure, Matthews Correlation Coefficient
(MCC) and Area Under the ROC Curve (AUC). These four metrics handle
imbalanced data efficiently. They avoid becoming biased
towards the majority class and are highly suitable for
fraud detection [GS17]. Recall focuses on the significant class (fraud) and is
not sensitive to data distribution. F-score is a good metric due to its
non-linear nature, and MCC is considered one of the best single assessment
measures and is less influenced by imbalanced data [GS17; Cha+11a]. AUC
evaluates the overall classifier performance and is very appropriate for class
imbalance [Gu+08]. The performance evaluation of the classifiers using the
above metrics is shown in Figures A.1 to A.4.
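The 5-fold protocol partitions the data into five folds and uses each fold once for testing while the remaining four are used for training. A minimal, library-free sketch of the index generation is given below; the function name and the fixed seed are illustrative assumptions, and the folds are not stratified by class.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

Each classifier would be trained and scored once per fold, and the per-fold metric values then averaged.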
Figure A.1: Classifiers recall results
Figure A.1 above shows the recall value for the four classifiers used in
this thesis. The results clearly show that random forest outperformed the
other classifiers in the identification of both fraud and non-fraudulent cases by
demonstrating a high recall value.
Figure A.2: Classifiers F-measure results
The above F-measure results from the classifiers can be characterised as
good performance. The results further show that the random forest classifier
achieved the best score. Another observation from Figure A.2 is that
the application of cross validation for LR, RF and SVM did not show any
significant effect on their performance. However, in the case of ANN, the value
increased between 10 and 30 percent and gradually decreased between 70 and
90 percent during the testing phase.
Figure A.3: Classifiers Matthews Correlation Coefficient results
Figure A.3 shows the Matthews correlation coefficient results for the
classifiers. It can be observed that the RF classifier shows an excellent
result when compared to the other classifiers. This indicates that the RF
classifier has the highest probability of being the perfect model for the
sample data in this experiment.
Figure A.4: Classifiers area under ROC curve results
The results produced by the classifiers using the AUC performance measure
are shown in Figure A.4. The results show that RF outperformed the
remaining classifiers with an excellent performance value of 0.92. It can also
be observed that the application of cross validation had more influence on ANN
than on the remaining classifiers.
In conclusion, the overall results from the experiments with the selected
machine learning algorithms can be characterised as good, especially for the
random forest classifier. However, part of the motivation for this thesis was
to investigate the performance of the case-based reasoning (CBR) methodology in
detecting mobile payment fraud. This prompted the application of CBR to a
mobile payment case study in this research.
Appendix B
Data Simulation Configuration
During the simulation, about 10% of the clients were configured to behave as
malicious agents (fraudsters). In a real-life scenario, it is more common to
find a lower percentage of fraudsters. The idea behind a higher proportion
of fraudsters is to reduce the class imbalance problem during the training of
the detector. The social network between the clients was built by varying the
network for clients within the same city and across different cities. The
fraudsters can also interact with normal clients in the system. The data
simulation was run five times. At each simulation the parameters were varied,
except for the number of clients and cities. This allows changes in the
behavioural pattern of each client (i.e. user) during each simulation. The
different values used for the parameters during the simulation are presented in
Table B.1. The files generated were merged and ultimately used as input for the
proposed CBR system presented in Section 5.3.1.
Table B.1: Simulation Input parameters
Parameters            Exp. 1   Exp. 2   Exp. 3   Exp. 4   Exp. 5
RandomMultiplier      0.1      0.3      0.5      0.7      0.9
MaxNeighbor           10       8        6        4        2
ClientsBalance        -        -        -        -        -
MaxOtherNeighbor      2        3        4        5        6
UpgradeAccountRate    0.01     0.01     0.01     0.01     0.01
TransactionRate       0.5      0.5      0.5      0.5      0.5
Trans                 -        -        -        -        -
NumClients            2000     2000     2000     2000     2000
Types                 -        -        -        -        -
NumCities             7        7        7        7        7
In Table B.1, "Types" represents the type of account profile, as discussed
in Section 4.3. Agents are switched from one profile to another
using a Markov matrix of transition probabilities (a Markov matrix was chosen
because it can move an agent from one state to another and
is commonly used in the literature for sequence evolution). This tells the
system when to change from Active to Inactive and from Profile P1 to Profile
P2, which allows higher limits for transactions. "Trans" represents
the categories of transactions that clients can perform. They can make
a money deposit (MD), money withdrawal (MW), merchant payment (MP),
person-to-person transfer (P2P) or airtime recharge (AR). The autonomy of
the agent is implemented by a probabilistic transition function that computes
the type of operation and the action that an agent will perform in each step.
This transition function depends on client attributes such as the category of
user and the amount, which is calculated according to the balance and the
limits of each client's profile [LrA12b; Zhd+14].
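The profile switching described above can be illustrated with a small Markov chain. The states and probabilities below are hypothetical placeholders chosen for the example (they are not the values used by the simulator in [LrA12b]); each row of the matrix gives the probability of moving from the current state to each possible next state.

```python
import random

# Hypothetical transition probabilities between account states (rows sum to 1).
TRANSITIONS = {
    "Inactive": {"Inactive": 0.7, "P1": 0.3, "P2": 0.0},
    "P1": {"Inactive": 0.1, "P1": 0.8, "P2": 0.1},
    "P2": {"Inactive": 0.05, "P1": 0.0, "P2": 0.95},
}

def next_state(state, rng):
    """Sample the next state from the row of the current state."""
    row = TRANSITIONS[state]
    return rng.choices(list(row), weights=list(row.values()), k=1)[0]

def simulate(start, steps, seed=0):
    """Return the sequence of states visited by one agent."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(steps):
        path.append(next_state(path[-1], rng))
    return path
```

In this placeholder matrix an inactive account cannot jump directly to profile P2; it must first become active under P1, mirroring the upgrade path from lower to higher transaction limits.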
References
[AAN17] Ahmad Dwi Arianto, Achmad Affandi, and Supeno Mardi Susiki Nugroho. “Opinion detection of public sector financial statements using K-nearest neighbors”. In: 4th International Conference on Electrical Engineering, Computer Science and Informatics (EECSI). Yogyakarta, Indonesia: IEEE, 2017.
[ABL13] Mobyen Uddin Ahmed, Hadi Banaee, and Amy Loutfi. “Health Monitoring for Elderly: An Application Using Case-Based Reasoning and Cluster Analysis”. In: ISRN Artificial Intelligence (2013), pp. 1–11.
[Ade+16] Adeyinka Adedoyin, Stelios Kapetanakis, Miltos Petridis, and Emmanouil Panaousis. “Evaluating Case-Based Reasoning Knowledge Discovery in Fraud Detection”. In: 24th Workshop on Case Based Reasoning (ICCBR2016): Synergies between CBR and Knowledge Discovery. Atlanta, USA, 2016, pp. 182–191.
[Ade+17] Adeyinka Adedoyin, Stelios Kapetanakis, Georgios Samakovitis, and Miltos Petridis. “Predicting Fraud in Mobile Money Transfer Using Case-Based Reasoning”. In: Bramer M., Petridis M. (eds) Artificial Intelligence XXXIV. SGAI 2017. Lecture Notes in Computer Science, vol. 10630. Cambridge, UK: Springer, Cham, 2017, pp. 325–337.
[AHG14] Debray Atanu, Kwon Hyejung, and Richard Gill. “Mobile Money Opportunities for Mobile Operators”. In: Business & Network Consulting, Huawei Technologies White Paper (2014), pp. 1–12.
[AJ04] Niloofar Arshadi and Igor Jurisica. “Maintaining Case-Based Reasoning Systems: A Machine Learning Approach”. In: Funk P., Gonzalez Calero P.A. (eds) Advances in Case-Based Reasoning. ECCBR 2004. Lecture Notes in Computer Science. Vol. 3155. Springer, Berlin, Heidelberg, 2004, pp. 17–31.
[AKH06] Hyunchul Ahn, Kyoung-jae Kim, and Ingoo Han. “Hybrid genetic algorithms and case-based reasoning systems for customer classification”. In: Expert Systems 23.3 (2006), pp. 127–144.
[AKJ04] Rehan Akbani, Stephen Kwek, and Nathalie Japkowicz. “Applying Support Vector Machines to Imbalanced Datasets”. In: Boulicaut JF., Esposito F., Giannotti F., Pedreschi D. (eds) Machine Learning: ECML 2004. Lecture Notes in Computer Science. Vol. 3201. Springer, Berlin, Heidelberg, 2004, pp. 39–50.
[AL13] Khaled Amailef and Jie Lu. “Ontology-supported case-based reasoning approach for intelligent m-Government emergency response services”. In: Decision Support Systems 55.1 (2013), pp. 79–97.
[Ale17] Kijek Aleksander. A Beginner’s Guide to Machine Learning in Payment Fraud Detection & Prevention. 2017.
[All09] Rob J Allan. Survey of Agent Based Modelling and Simulation Tools. Tech. rep. Daresbury, Warrington: STFC Daresbury Laboratory, 2009, pp. 57–72.
[AMZ16] Aisha Abdallah, Mohd Aizaini Maarof, and Anazida Zainal. “Fraud detection system: A survey”. In: Journal of Network and Computer Applications, ScienceDirect 68 (2016), pp. 90–113.
[AP94] Agnar Aamodt and Enric Plaza. “Case-based reasoning: Foundational issues, methodological variations, and system approaches”. In: AI communications 7.1 (1994), pp. 39–59.
[Aze+14] Ush Azeem, Khan Shan, Akhtar Nadeem, and Naved Qureshi Mohammad. “Real-Time Credit-Card Fraud Detection using Artificial Neural Network Tuned by Simulated Annealing Algorithm”. In: International Conference on Recent Trends in Information, Telecommunication and Computing, ITC. Chandigarh, India: Association of Computer Electronics and Electrical Engineers, 2014, pp. 113–121.
[Bah+13] Alejandro Correa Bahnsen, Aleksandar Stojanovic, Djamila Aouada, and Bjorn Ottersten. “Cost sensitive credit card fraud detection using bayes minimum risk”. In: 12th International Conference on Machine Learning and Applications, ICMLA. Vol. 1. Washington, DC, USA: IEEE Computer Society, 2013, pp. 333–338.
[BAO15] Alejandro Correa Bahnsen, Djamila Aouada, and Bjorn Ottersten. “Example-dependent cost-sensitive decision trees”. In: Expert Systems with Applications 42.19 (2015), pp. 6609–6619.
[BBM03] Gustavo E A P A Batista, Ana L C Bazzan, and Maria Carolina Monard. “Balancing training data for automated annotation of keywords: a case study”. In: Revista Tecnologia da Informacao 3.2 (2003), pp. 15–20.
[BD13] N Bennett and S Dilloway. “Investigating the Convergence of Money Laundering and Terrorist Financing”. In: ACAMS AML and Financial Crime Conference. Amsterdam, 2013.
[BGK05] A. Blansche, P. Gancarski, and J.J. Korczak. “Genetic Algorithms for Feature Weighting: Evolution vs. Coevolution and Darwin vs Lamarck”. In: Gelbukh A., de Albornoz A., Terashima-Marín H. (eds) MICAI 2005: Advances in Artificial Intelligence. MICAI 2005. LNCS, vol. 3789. Springer Berlin Heidelberg, 2005, pp. 682–691.
[BH01] Richard J. Bolton and David J. Hand. “Unsupervised Profiling Methods for Fraud Detection”. In: Proceedings of credit scoring and credit control. Edinburgh, UK, 2001, pp. 235–255.
[BH02] Richard J. Bolton and David J. Hand. “Statistical Fraud Detection: A Review”. In: Statistical Science 17.3 (2002), pp. 235–249.
[Bha+11] Siddhartha Bhattacharyya, Sanjeev Jha, Kurian Tharakunnel, and J. Christopher Westland. “Data mining for credit card fraud: A comparative study”. In: Decision Support Systems, ScienceDirect 50.3 (2011), pp. 602–613.
[BK17] Richard A. Bauder and Taghi M. Khoshgoftaar. “A probabilistic programming approach for outlier detection in healthcare claims”. In: International Conference on Machine Learning and Applications, ICMLA 2016. IEEE, 2017, pp. 347–354.
[BK99] Eric Bauer and Ron Kohavi. “An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants”. In: Machine Learning 36.1-2 (1999), pp. 105–139.
[BKJ03] Lundin Emilie Barse, Hakan Kvarnstrom, and Erland Jonsson. “Synthesizing test data for fraud detection systems”. In: 19th Annual Computer Security Applications Conference (ACSAC 2003). 03. Sweden: IEEE Computer Society, 2003, p. 11.
[BP89] Mike Brown and George Paliouras. Inside Case-based Reasoning. Ed. by Schank C. Roger, Alex Kass, and K. Christopher. Lawrence Erlbaum Associates, 1989, p. 21.
[BPM04] Gustavo E. A. P. A. Batista, Ronaldo C. Prati, and Maria Carolina Monard. “A Study of the Behavior of Several Methods for Balancing Machine Learning Training Data”. In: SIGKDD Explor. Newsl. Special issue on learning from imbalanced datasets 6.1 (2004), pp. 20–29.
[Bra97] Andrew P. Bradley. “The use of the area under the ROC curve in the evaluation of machine learning algorithms”. In: Pattern Recognition 30.7 (1997), pp. 1145–1159.
[Bur17] Nikolay Burlutskiy. “Machine Learning for Predicting User Behaviour”. PhD thesis. University of Brighton, 2017, pp. 34–45.
[BVV15] Bart Baesens, Veronique Van Vlasselaer, and Wouter Verbeke. Fraud Analytics Using Descriptive, Predictive, and Social Network Techniques: A Guide to Data Science for Fraud Detection. Cary, North Carolina, USA: SAS Institute Inc., 2015, pp. 1–36.
[CB13] Cristiano L. Castro and Antonio P. Braga. “Novel cost-sensitive approach to improve the multilayer perceptron performance on imbalanced data”. In: Transactions on Neural Networks and Learning Systems 24.6 (2013), pp. 888–899.
[CC11] Wen Hsi Chang and Jau Shien Chang. “A novel two-stage phased modeling framework for early fraud detection in online auctions”. In: Expert Systems with Applications 38.9 (2011), pp. 11244–11260.
[CC12] Wen Hsi Chang and Jau Shien Chang. “An effective early fraud detection method for online auctions”. In: Electronic Commerce Research and Applications 11.4 (2012), pp. 346–360.
[CCB08] Andrew Crooks, Christian Castle, and Michael Batty. “Key Challenges in Agent-Based Modelling for Geo-Spatial Simulation”. In: Computers, Environment and Urban Systems 32.6 (2008), pp. 417–430.
[CGC05] Xue-wen Chen, Byron Gerlach, and David Casasent. “Pruning Support Vectors for Imbalanced Data Classification”. In: Proceedings of International Joint Conference on Neural Networks. Montreal, Que., Canada: IEEE, 2005, pp. 1883–1888.
[CG+13] Boris Campillo-Gimenez, Wassim Jouini, Sahar Bayat, and Marc Cuggia. “Improving Case-Based Reasoning Systems by Combining K-Nearest Neighbour Algorithm with Logistic Regression in the Prediction of Patients’ Registration on the Renal Transplant Waiting List”. In: PLoS ONE 8.9 (2013).
[CH11] Chun Ling Chuang and Szu Teng Huang. “A hybrid neural network approach for credit scoring”. In: Expert Systems 28.2 (2011), pp. 185–196.
[Cha05] Nitesh V. Chawla. “Data Mining for Imbalanced Datasets: An Overview”. In: Maimon O., Rokach L. (eds) Data Mining and Knowledge Discovery Handbook. Springer, Boston, MA, 2005, pp. 853–867.
[Cha+02] Nitesh V. Chawla, Kevin W. Bowyer, Lawrence O. Hall, and W. Philip Kegelmeyer. “SMOTE: Synthetic minority over-sampling technique”. In: Journal of Artificial Intelligence Research 16 (2002), pp. 321–357.
[Cha+03] Nitesh V Chawla, Aleksandar Lazarevic, Lawrence O Hall, and Kevin W Bowyer. “SMOTEBoost: Improving Prediction”. In: Lavrac N., Gamberger D., Todorovski L., Blockeel H. (eds) Knowledge Discovery in Databases: PKDD 2003. LNCS. Vol. 2838. Springer Berlin Heidelberg, 2003, pp. 107–119.
[Cha+11a] Kevin Chai, Chen Wu, Vidyasagar Potdar, and Pedram Hayati. “Automatically Measuring the Quality of User Generated Content in Forums”. In: Wang D., Reynolds M. (eds) AI 2011: Advances in Artificial Intelligence, LNCS. Vol. 7106. Springer Berlin Heidelberg, 2011, pp. 51–60.
[Cha+11b] Pierre-Laurent Chatain, Andrew Zerzan, Wameek Noor, Najah Dannaoui, and Louis de Koker. “Protecting Mobile Money against Financial Crimes: Global Policy Challenges and Solutions”. In: Directions in Development Finance (2011), pp. 1–234.
[Chu13] Chun-Ling Chuang. “Application of hybrid case-based reasoning for enhanced performance in bankruptcy prediction”. In: Information Sciences 236 (2013), pp. 174–185.
[CK91] Robert T.H. Chi and Melody Y. Kiang. “An integrated approach of rule-based and case-based reasoning for decision support”. In: Proceedings of the 19th annual conference on Computer Science (CSC ’91). Vol. 103. New York, NY, USA: ACM, 1991, pp. 255–267.
[CL01] J. M. Corchado and B. Lees. “Adaptation of cases for case based forecasting with neural network support”. In: Soft computing in case based reasoning (2001), pp. 293–319.
[CL06] Kelvin Chan and Jay Liebowitz. “The synergy of social network analysis and knowledge mapping: a case study”. In: International Journal of Management and Decision Making 7.1 (2006), p. 19.
[Cl18] Jan Chyan-long. “An Effective Financial Statements Fraud Detection Model for the Sustainable Development of Financial Markets: Evidence from Taiwan”. In: MDPI Sustainability Journal 10.2 (2018), p. 513.
[CLL05] Rong-chang Chen, Shu-ting Luol, and Vincent C.S. Lee. “Personalized Approach Based on SVM and ANN for Detecting Credit Card Fraud”. In: International Conference on Neural Network and Brain, ICNN&B ’05. Beijing, China: IEEE, 2005, pp. 810–815.
[Col13] Mark Coletti. The GeoMason Cookbook. Tech. rep. George Mason University, 2013, pp. 1–41.
[Cor08] Amelie Cordier. “Interactive and Opportunistic Knowledge Acquisition in Case-Based Reasoning”. PhD thesis. Universite Claude Bernard-Lyon I, 2008.
[CPV01] Corinna Cortes, Daryl Pregibon, and Chris Volinsky. “Communities of Interest”. In: Proceedings of the 14th International Conference on Advances in Intelligent Data Analysis. London, UK: Springer-Verlag, 2001, pp. 105–114.
[CZZ13] Peng Cao, Dazhe Zhao, and Osmar Zaiane. “An optimized cost-sensitive SVM for imbalanced data learning”. In: Pei J., Tseng V.S., Cao L., Motoda H., Xu G. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2013. LNCS 7819 (2013), pp. 280–292.
[DA14] Mehdi Darvishi and Gholamreza Ahmadi. “Validation techniques of agent based modelling for geospatial simulations”. In: International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences - ISPRS Archives. Vol. XL-2/W3. Tehran, Iran, 2014, pp. 91–95.
[Dan17] B Dan. Cross-Validation. 2017.
[DCM93] Christian Darken, Joseph Chang, and John Moody. “Learning rate schedules for stochastic gradient algorithms”. In: Neural Network for Signal Processing II. IEEE, 1993, p. 133.
[DD13] V. Dheepa and R. Dhanapal. “Hybrid Approach for Improvising Credit Card Fraud Detection Based on Collective Animal Behaviour and SVM”. In: Thampi S.M., Atrey P.K., Fan CI., Perez G.M. (eds) Security in Computing and Communications. SSCC 2013. Communications in Computer and Information Science. Springer Berlin Heidelberg, 2013, pp. 293–302.
[DH00] Chris Drummond and Robert C. Holte. “Explicitly representing expected cost”. In: Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining - KDD ’00. 2000, pp. 198–207.
[DH03] Chris Drummond and R.C. Holte. “C4.5, class imbalance, and cost sensitivity: why under-sampling beats over-sampling”. In: Workshop on Learning from Imbalanced Datasets II, ICML. Washington, DC, USA: AAAI Press, 2003, pp. 1–8.
[Div15] David Divitt. Social network analysis for fraud detection in payments. 2015.
[DK+15] Asli Demirguc-Kunt, Leora Klapper, Dorothe Singer, and Van Peter Oudheusden. The Global Findex Database 2014: Measuring Financial Inclusion around the World. Tech. rep. Washington, DC, USA: The World Bank Group, 2015, pp. 1–8.
[EA13] Shaza M Abd Elrahman and Ajith Abraham. “A Review of Class Imbalance Problem”. In: Network and Innovative Computing 1 (2013), pp. 332–340.
[Eka03] Aniko Ekart. “Using genetic algorithms for improved discrete sequence prediction”. In: International Conference on Artificial Intelligence. Las Vegas, Nevada, USA, 2003, pp. 1–10.
[Elh+16] T Elhassan, M Aljurf, F Mohanna, and M Shoukri. “Classification of Imbalance Data using Tomek Link (T-Link) Combined with Random Under-sampling (RUS) as a Data Reduction Method”. In: Journal of Informatics and Data Mining 1.2 (2016), pp. 1–12.
[Fab+17] Braun Fabian, Olivier Caelen, Evgueni N Smirnov, Steven Kelk, and Bertrand Lebichot. “Improving Card Fraud Detection Through Suspicious Pattern Discovery”. In: Benferhat S., Tabia K., Ali M. (eds) Advances in Artificial Intelligence: From Theory to Practice. IEA/AIE 2017. Vol. 10351. Springer, Cham, 2017, pp. 181–190.
[Fer+08] Alberto Fernandez, Salvador García, María José del Jesus, and Francisco Herrera. “A study of the behaviour of linguistic fuzzy rule based classification systems in the framework of imbalanced data-sets”. In: Fuzzy Sets and Systems 159.18 (2008), pp. 2378–2398.
[FM06] Zakia Ferdousi and Akira Maeda. “Anomaly Detection Using Unsupervised Profiling Method in Time Series Data”. In: ADBIS Research Communications (2006).
[FR93] Anthony G. Francis and Ashwin Ram. The utility problem in case-based reasoning. Tech. rep. 1993.
[FS96] Yoav Freund and Robert E. Schapire. “Experiments with a new boosting algorithm”. In: Proceedings of the Thirteenth International Conference of Machine Learning. Bari, Italy: Morgan Kaufmann, 1996, pp. 148–156.
[FSC99] Wei Fan, Salvatore J Stolfo, and Philip K Chan. “AdaCost: Misclassification Cost-sensitive Boosting”. In: Proceedings of the Sixteenth International Conference on Machine Learning (ICML 99). Morgan Kaufmann, 1999, pp. 97–105.
[Gab+13] Chrystel Gaber, Baptiste Hemery, Mohammed Achemlal, Marc Pasquet, and Pascal Urien. “Synthetic logs generator for fraud detection in mobile transfer services”. In: Sadeghi AR. (eds) Financial Cryptography and Data Security. FC 2013. LNCS. Vol. 7859. Springer, Heidelberg, 2013, pp. 397–398.
[Gal+12] Mikel Galar, Alberto Fern, Edurne Barrenechea, and Humberto Bustince. “A Review on Ensembles for the Class Imbalance Problem: Bagging-, Boosting-, and Hybrid-Based Approaches”. In: Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 42.4 (2012), pp. 463–484.
[GAM00] Batista Gustavo, Carvalho Andre, and Monard Maria. “Applying One-Sided Selection to Unbalanced Datasets”. In: Cairo O., Sucar L.E., Cantu F.J. (eds) MICAI 2000: Advances in Artificial Intelligence. MICAI 2000. Vol. 1793. Springer Berlin Heidelberg, 2000, pp. 315–325.
[GBC15] Ruibin Geng, Indranil Bose, and Xi Chen. “Prediction of financial distress: An empirical study of listed Chinese companies using data mining”. In: European Journal of Operational Research 241.1 (2015), pp. 236–247.
[GC03] Wu Gang and Edward Y. Chang. “Class-boundary alignment for imbalanced dataset learning”. In: The Twentieth International Conference on Machine Learning (ICML), Workshop on Imbalanced Data Sets. 1. Washington, DC, USA, 2003, pp. 49–56.
[GCG17] Katsiaryna V. Gris, Jean-Philippe Coutu, and Denis Gris. “Supervised and Unsupervised Learning Technology in the Study of Rodent Behavior”. In: Frontiers in Behavioral Neuroscience 11 (2017), pp. 1–6.
[GG16] Michael J. Greenacre and Patrick J. F. Groenen. “Weighted Euclidean Biplots”. In: Journal of Classification 33.3 (2016), pp. 442–459.
[GKB17] Aayushi Gupta, Dhananjay Kumar, and Atul Barve. “Hidden Markov Model based Credit Card Fraud Detection System with Time Stamp and IP Address”. In: International Journal of Computer Applications 166.5 (2017), pp. 33–37.
[Gor15] Dan Gorton. “IncidentResponseSim: An Agent-Based Simulation Tool for Risk Management of Online Fraud”. In: Buchegger S., Dam M. (eds) Secure IT Systems, LNCS. Vol. 9417. Springer, Cham, 2015, pp. 172–187.
[GRS01] Sudipto Guha, Rajeev Rastogi, and Kyuseok Shim. “CURE: An efficient clustering algorithm for large databases”. In: Information Systems 26.1 (2001), pp. 35–58.
[GS17] Swati Ganguly and Samira Sadaoui. “Classification of Imbalanced Auction Fraud Data”. In: Mouhoub M., Langlais P. (eds) Advances in Artificial Intelligence. Vol. 10233. Canada: Springer, 2017, pp. 84–89.
[Gu+08] Qiong Gu, Zhihua Cai, Li Zhu, and Bo Huang. “Data Mining on Imbalanced Data Sets”. In: International Conference on Advanced Computer Theory and Engineering, ICACTE ’08. Phuket, Thailand: IEEE Computer Society, 2008, pp. 1020–1024.
[GV16] Robb Genna and Thando Vilakazi. “Mobile Payments Markets in Kenya, Tanzania and Zimbabwe: A Comparative Study of Competitive Dynamics and Outcomes”. In: The African Journal of Information and Communication 17 (2016), pp. 9–37.
[HAA14] Ahmad Basheer Hassanat, Mohammad Ali Abbadi, and Ahmad Ali Alhasanat. “Solving the Problem of the K Parameter in the KNN Classifier Using an Ensemble Learning Approach”. In: International Journal of Computer Science and Information Security (IJCSIS) 12.8 (2014), pp. 33–39.
[Hai+16] He Haibo, Bai Yang, Garcia A. Edwardo, and Li Shutao. “Adaptive Synthetic Sampling Approach for Imbalanced Learning”. In: IEEE International Joint Conference on Neural Networks, IJCNN ’08. 3. Hong Kong, China: IEEE, 2016, pp. 1322–1328.
[Har68] P. Hart. “The condensed nearest neighbor rule (Corresp.)” In: IEEE Transactions on Information Theory 14.3 (1968), pp. 515–516.
[Hay99] Simon Haykin. Neural Networks: A Comprehensive Foundation. 2nd ed. Pearson Education, 1999, pp. 23–56.
[HB12] Dirk Helbing and Stefano Balietti. “How to Do Agent-Based Simulations in the Future: From Modeling Social Mechanisms to Emergent Phenomena and Interactive Systems Design”. In: Social Self-Organisation. Ed. by Dirk Helbing. Springer Berlin Heidelberg, 2012, pp. 25–70.
[HD92] James S. Hodges and James A. Dewar. Is It You or Your Model Talking? A Framework for Model Validation. Tech. rep. Santa Monica, California: National Defense Research Institute, 1992, pp. 1–43.
[Hol00] Jaakko Hollmen. “User profiling and classification for fraud detection in mobile communications networks”. PhD thesis. Helsinki University of Technology, 2000.
[HSW07] Thomas N. Herzog, Fritz J. Scheuren, and William E. Winkler. “What is Data Quality and Why Should We Care”. In: Data Quality and Record Linkage techniques. 1st ed. Springer-Verlag New York, 2007. Chap. 2, pp. 7–15.
[HTF09] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning. 2nd ed. Vol. 27. 2. Springer, 2009, pp. 83–85.
[IA16] Amira Kamil Ibrahim Hassan and Ajith Abraham. “Modeling Insurance Fraud Detection Using Ensemble Combining Classification”. In: International Journal of Computer Information Systems and Industrial Management Applications 8 (2016), pp. 257–265.
[IRU18] Sohony Ishan, Pratap Rameshwar, and Nambiar Ullas. “Ensemble learning for credit card fraud detection”. In: Proceedings of Joint International Conference on Data Science and Management of Data. India: ACM, 2018, pp. 289–294.
[JADaRg14] Jose L. Jorro-Aragoneses, Belen Díaz-Agudo, and Juan A. Recio-García. “CBR tagging of emotions from facial expressions”. In: Lamontagne L., Plaza E. (eds) Case-Based Reasoning Research and Development (ICCBR), LNCS. Vol. 8765. Springer, Heidelberg, 2014, pp. 245–259.
[Jen08] Beth Jenkins. Developing Mobile Money Ecosystems. Tech. rep. Harvard Kennedy School, 2008, p. 36.
[Jes+05] Daniel R. Jeske, Behrokh Samadi, Pengyue J. Lin, Lan Ye, Sean Cox, Rui Xiao, Ted Younglove, Minh Ly, Douglas Holt, and Ryan Rich. “Generation of Synthetic Data Sets for Evaluating the Accuracy of Knowledge Discovery Systems”. In: Proceeding of the eleventh SIGKDD international conference on knowledge discovery in data mining. Chicago, USA: ACM, 2005, pp. 756–762.
[JTPL03] Chang Hyeon Joh, Harry J.P. Timmermans, and Peter T.L. Popkowski-Leszczyc. “Identifying purchase-history sensitive shopper segments using scanner panel data and sequence alignment methods”. In: Journal of Retailing and Consumer Services 10.3 (2003), pp. 135–144.
[Kap12] Stelios Kapetanakis. “Intelligent monitoring of business processes using case-based reasoning”. PhD thesis. University of Greenwich, 2012.
[Kap+12] Stelios Kapetanakis, Georgios Samakovitis, P V G Buddhika Gunasekera, and Miltos Petridis. “Monitoring Financial Transaction Fraud with the use of Case-based Reasoning”. In: Seventeenth UK Workshop on Case-Based Reasoning. Cambridge, UK, 2012.
[KBC16] Yeonkook J. Kim, Bok Baik, and Sungzoon Cho. “Detecting financial misstatements with fraud intention using multi-class cost-sensitive learning”. In: Expert Systems with Applications 62 (2016), pp. 32–43.
[Ken16] Central Bank of Kenya. The Kenya Financial Sector Stability Report, 2015. Tech. rep. 7. Nairobi: Central Bank of Kenya, 2016.
[KGA12] Marjan Kaedi and Nasser Ghasem-Aghaee. “Improving case-based reasoning in solving optimization problems using Bayesian optimization algorithm”. In: Intelligent Data Analysis 16.2 (2012), pp. 199–210.
[KH01] Kyung Sup Kim and Ingoo Han. “The cluster-indexing method for case-based reasoning using self-organizing maps and learning vector quantization for bond rating cases”. In: Expert Systems with Applications 21.3 (2001), pp. 147–156.
[Kj04] Kim Kyoung-jae. “Toward Global Optimization of Case-Based Reasoning Systems for Financial Forecasting”. In: Applied Intelligence 21.2 (2004), pp. 239–249.
[Koh95] Ron Kohavi. “A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection”. In: Proceedings of the 14th international joint conference on Artificial intelligence. Vol. 2. Montreal, Quebec, Canada: Morgan Kaufmann, 1995, pp. 1137–1143.
[Kok97] A. I. Kokkinaki. “On atypical database transactions: Identification of probable frauds using machine learning for user profiling”. In: Proceedings of Knowledge & Data Engineering Exchange Workshop, KDEX. Newport Beach, USA: IEEE, 1997, pp. 107–113.
[Kou+04] Yufeng Kou, Chang-Tien Lu, Sirirat Sirwongwattana, and Yo-Ping Huang. “Survey of fraud detection techniques”. In: Proceedings of International Conference on Networking, Sensing & Control, IEEE 2 (2004), pp. 749–754.
[KPJ15] Gulmira Khussainova, Sanja Petrovic, and Rupa Jagannathan. “Retrieval with Clustering in a Case-Based Reasoning System for Radiotherapy Treatment Planning”. In: Journal of Physics: Conference Series 616 (2015), pp. 1–11.
[KPK10] S Kapetanakis, M Petridis, and B Knight. “A case based reasoning approach for the monitoring of business workflows”. In: Bichindaritz, I., Montani, S. (eds.) (ICCBR 2010) LNCS. Vol. 6176. Springer, 2010, pp. 390–405.
[KS12] Yoonseong Kim and So Young Sohn. “Stock fraud detection using peer group analysis”. In: Expert Systems with Applications 39.10 (2012), pp. 8986–8992.
[KSM06] Amlan Kundu, Shamik Sural, and A.K. Majumdar. “Two-Stage Credit Card Fraud Detection Using Sequence Alignment”. In: Bagchi A., Atluri V. (eds) Information Systems Security, (ICISS2006) LNCS. Vol. 4332. Springer Berlin Heidelberg, 2006, pp. 260–275.
[KSM07] Efstathios Kirkos, Charalambos Spathis, and Yannis Manolopoulos. “Data Mining techniques for the detection of fraudulent financial statements”. In: Expert Systems with Applications 32.4 (2007), pp. 995–1003.
[Kun+09] Amlan Kundu, Suvasini Panigrahi, Shamik Sural, and Arun K. Majumdar. “BLAST-SSAHA Hybridization for Credit Card Fraud Detection”. In: IEEE Transactions on Dependable and Secure Computing. Vol. 6. 4. IEEE Computer Society, 2009, pp. 309–315.
[Lak13] Andrew James Lake. Risk management in Mobile Money: Observed Risks and Proposed Mitigants for Mobile Money Operators. Tech. rep. Swiss: World Bank Group, 2013, pp. 1–21.
[LD12] Norman Lonergan and Jonathan Dharmapalan. “Mobile money: An overview for global telecommunications operators”. In: Mobile Money (2012), pp. 1–40.
[Lee08] Gun Ho Lee. “Rule-based and case-based reasoning approach for internal audit of bank”. In: Knowledge-Based Systems, ScienceDirect 21.2 (2008), pp. 140–147.
[Lin+14] Fengyi Lin, Deron Liang, Ching Chiang Yeh, and Jui Chieh Huang. “Novel feature selection methods to financial distress prediction”. In: Expert Systems with Applications 41.5 (2014), pp. 2472–2483.
[LKJ02] Emilie Lundin, Hakan Kvarnstrom, and Erland Jonsson. “A Synthetic Fraud Data Generation Methodology”. In: Deng R., Bao F., Zhou J., Qing S. (eds) Information and Communications Security. ICICS 2002. LNCS. Vol. 2513. Springer, Heidelberg, 2002, pp. 265–277.
[Lla+11] Marc Llanes, Elsa Prieto, Rodrigo Diaz, et al. D2.1.1 Scenario Requirements (Public version). Tech. rep. 2011, pp. 1–75.
[LN15] Sanni Lookman and Selmin Nurcan. “A framework for occupational fraud detection by social network analysis”. In: CEUR Workshop Proceedings. Vol. 1367. Stockholm, Sweden: CEUR, 2015, pp. 221–228.
[LrA12a] Edgar Alonso Lopez-Rojas and Stefan Axelsson. “Money Laundering Detection using Synthetic Data”. In: The 27th annual workshop of the Swedish Artificial Intelligence Society (SAIS), Orebro, Sweden. 2012, p. 49.
[LrA12b] Edgar Alonso Lopez-Rojas and Stefan Axelsson. “Multi Agent Based Simulation (MABS) of Financial Transactions for Anti Money Laundering (AML)”. In: 17th Nordic Conference on Secure IT. Karlskrona, Sweden, 2012.
[LRA14] Edgar Alonso Lopez-Rojas and Stefan Axelsson. “Social Simulation of Commercial and Financial Behaviour for Fraud Detection Research”. In: Miguel, Amblard, Barcelo & Madella (eds.) Advances in Computational Social Science and Social Simulation. Bellaterra, Cerdanyola del Valles, 2014, pp. 1–12.
[LS10a] Hui Li and Jie Sun. “Business failure prediction using hybrid2 case-based reasoning (H2CBR)”. In: Computers & Operations Research 37.1 (2010), pp. 137–151.
[LS10b] Charles X. Ling and Victor S. Sheng. “Cost-Sensitive Classification”. In: Sammut C., Webb G.I. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, 2010, pp. 231–235.
[Lui+15] Coppolino Luigi, D’Antonio Salvatore, Formicola Valerio, Massei Carmine, and Romano Luigi. “Use of the Dempster-Shafer Theory for Fraud Detection: The Mobile Money Transfer Case Study”. In: Camacho D., Braubach L., Venticinque S., Badica C. (eds) Intelligent Distributed Computing VIII. Studies in Computational Intelligence. Vol. 570. Springer, Cham, 2015, pp. 465–474.
[Luk+04] Sean Luke, Claudio Cioffi-Revilla, Liviu Panait, and Keith Sullivan. “Mason: A new multi-agent simulation toolkit”. In: Proceedings of the 2004 SwarmFest Workshop. Vol. 8. 2. 2004, pp. 316–327.
[Lur11] Michael Lurie. What is a business model? A new approach. Tech. rep. Blue Mine Group, 2011, pp. 1–7.
[LWZ09] Xu Ying Liu, Jianxin Wu, and Zhi Hua Zhou. “Exploratory under-sampling for class-imbalance learning”. In: IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 39.2 (2009), pp. 539–550.
[LX12] H Li and Tao Xiong. “Predicting business risk using combined case-based reasoning in Euclidean space”. In: World Automation Congress (WAC), 2012 (2012), pp. 1–6.
[MA16] Abdelhak Mansoul and Baghdad Atmani. “Clustering to Enhance Case-Based Reasoning”. In: Chikhi S., Amine A., Chaoui A., Kholladi M., Saidouni D. (eds) Modelling and Implementation of Complex Systems. LNNS. Vol. 1. Springer, 2016, pp. 137–151.
[Mag09] Dan Magnusson. “The costs of implementing the anti-money laundering regulations in Sweden”. In: Journal of money laundering control 12.2 (2009), pp. 101–112.
[Maj15] Archisman Majumdar. “Social Network Analysis Approaches for Fraud Analytics”. In: Mphasis NEXTlabs (2015), pp. 1–5.
[Mal+97] Marcus A. Maloof, Pat Langley, Stephanie Sage, and Thomas O. Binford. “Learning to detect rooftops in aerial images”. In: Proceedings of the 1997 Image Understanding Workshop (DARPA97). San Francisco, CA: Morgan Kaufmann, 1997, pp. 835–845.
[Man+12] Jaweria Manzoor, Saara Asif, Maryum Masud, and Malik Jahan Khan. “Automatic Case Generation for Case-Based Reasoning Systems Using Genetic Algorithms”. In: Third Global Congress on Intelligent Systems (GCIS). China: IEEE, 2012, pp. 311–314.
[Mar+13] Kowalski Martin, Klupfel Hubert, Zelewski Stephan, Bergenrodt Daniel, and Saur Alexandra. “Integration of Case-Based and Ontology-Based Reasoning for the Intelligent Reuse of Project-Related Knowledge”. In: Clausen U., ten Hompel M., Klumpp M. (eds) Efficiency and Logistics. Lecture Notes in Logistics. Springer, Berlin, Heidelberg, 2013, pp. 289–299.
[MAS] MASON. Multi-Agent Simulator Of Neighborhoods.
[Mch13] Mary L Mchugh. “The Chi-square test of independence Lessons in biostatistics”. In: Biochemia Medica 23.2 (2013), pp. 143–149.
[Mer11] Cynthia Merritt. Mobile money transfer services: The next phase in the evolution of person-to-person payments. Tech. rep. 2011, pp. 1–32.
[MGA04] Luke K Mcdowell, Kalyan Moy Gupta, and David W Aha. “Case-Based Collective Classification”. In: American Association for Artificial Intelligence. 2004, pp. 399–404.
[ML10] Britni Must and Kathleen Ludewig. “Mobile money: Cell phone banking in Developing countries”. In: Policymatters 2.7 (2010), pp. 1–35.
[MNV14] Stephen O. Moepya, Fulufhelo V. Nelwamondo, and Christiaan Van Der Walt. “A Support Vector Machine Approach to Detect Financial Statement Fraud in South Africa: A First Look”. In: Asian Conference on Intelligent Information and Database Systems (ACIIDS). LNCS. Vol. 8398. Springer, Cham, 2014, pp. 42–51.
[Moh+09] Azlinah Mohamed, Ahmad Fuad Mohamed Bandi, Abdul Razif Tamrin, Md Daud Jaafar, Suriah Hasan, and Faeizah Jusof. “Telecommunication fraud prediction using backpropagation neural network (SoCPaR)”. In: International Conference of Soft Computing and Pattern Recognition. Malaysia: IEEE Computer Society, 2009, pp. 259–265.
[Mou+06] Kim Mouridsen, Søren Christensen, Louise Gyldensted, and Leif Østergaard. “Automatic selection of arterial input function using cluster analysis”. In: Magnetic Resonance in Medicine 55.3 (2006), pp. 524–531.
[MP17] N. Malini and M. Pushpa. “Analysis on credit card fraud identification techniques based on KNN and outlier detection”. In: Advances in Electrical, Electronics, Information, Communication and Bio-Informatics (AEEICB). Ed. by IEEE. Chennai, India, 2017.
[Muy15] Catherine Muya. “Mobile money in Africa”. In: Barclays Bank PLC (2015), pp. 1–5.
[Net] Netlogo. Agent-Based Modelling Toolkit.
[Nga+11] E. W. T. Ngai, Yong Hu, Y. H. Wong, Yijun Chen, and Xin Sun. “The application of data mining techniques in financial fraud detection: A classification framework and an academic review of literature”. In: Decision Support Systems 50.3 (2011), pp. 559–569.
[NK14] Evgenia Novikova and Igor Kotenko. “Visual analytics for detecting anomalous activity in mobile money transfer services”. In: Teufel S., Min T.A., You I., Weippl E. (eds) Availability, Reliability, and Security in Information Systems. CD-ARES 2014. LNCS 8708 (2014), pp. 63–78.
[Nol17] Ian Nolan. “Transaction Fraud Detection using Random Forest Classifier and Logistic Regression”. In: Neural Networks & Machine Learning 1.1 (2017).
[Ols14] Dominik Olszewski. “Fraud detection using self-organizing map visualizing the user profiles”. In: Knowledge-Based Systems 70 (2014), pp. 324–334.
[OT08] Nikunj C. Oza and Kagan Tumer. “Classifier ensembles: Select real-world applications”. In: Information Fusion 9.1 (2008), pp. 4–20.
[Pad+07] T. Maruthi Padmaja, Narendra Dhulipalla, P Radha Krishna, Raju S Bapi, and A Laha. “An Unbalanced Data Classification Model Using Hybrid Sampling Technique for Fraud Detection”. In: Ghosh A., De R.K., Pal S.K. (eds) Pattern Recognition and Machine Intelligence (PReMI 2007) LNCS. Vol. 4815. Springer, Berlin, Heidelberg, 2007, pp. 341–348.
[Pae17] Anthea Paelo. “A Comparison of the Mobile Financial Services Sector in Kenya, Tanzania and Uganda”. In: The 3rd Annual Competition and Economic Regulation (ACER) Conference. Dar es Salaam, Tanzania, 2017, pp. 1–19.
[PAG07] Jean Pinquet, Mercedes Ayuso, and Montserrat Guillen. “Selection bias and auditing policies for insurance claims”. In: Journal of Risk and Insurance 74.2 (2007), pp. 425–440.
[Pan+09] Suvasini Panigrahi, Amlan Kundu, Shamik Sural, and A. K. Majumdar. “Credit card fraud detection: A fusion approach using Dempster-Shafer theory and Bayesian learning”. In: Information Fusion 10.4 (2009), pp. 354–363.
[Pat73] Edward A. Patrick. “Clustering Using a Similarity Measure Based on Shared Near Neighbors”. In: IEEE Transactions on Computers C-22.11 (1973), pp. 1025–1034.
[PDM15] Radu Platon, Vahid Raissi Dehkordi, and Jacques Martel. “Hourly prediction of a building’s electricity consumption using case-based reasoning, artificial neural networks and principal component analysis”. In: Energy and Buildings 92 (2015), pp. 10–18.
[Per+13] Kasun S. Perera, Bijay Neupane, Mustafa Amir Faisal, Zeyar Aung, and Wei Lee Woon. “A novel ensemble learning-based approach for click fraud detection in mobile advertising”. In: Prasath R., Kathirvalavakumar T. (eds) Mining Intelligence and Knowledge Exploration. LNCS. Vol. 8284. Springer, Cham, 2013, pp. 370–382.
[PF01] Foster Provost and Tom Fawcett. “Robust classification for imprecise environments”. In: Machine Learning 42.3 (2001), pp. 203–231.
[PH02] Cheol-Soo Park and Ingoo Han. “A case-based reasoning with the feature weights derived by analytic hierarchy process for bankruptcy prediction”. In: Expert Systems with Applications 23.3 (2002), pp. 255–264.
[PH07] Jim Prentzas and Ioannis Hatzilygeroudis. “Categorizing approaches combining rule-based and case-based reasoning”. In: Expert Systems 24.2 (2007), pp. 97–122.
[Phu+10] Clifton Phua, Vincent Lee, Kate Smith, and Ross Gayler. “A Comprehensive Survey of Data Mining-based Fraud Detection Research”. In: International Conference on Intelligent Computation Technology and Automation (ICICTA). Vol. 3. Changsha, China: IEEE, 2010, p. 14.
[Pow07] David M W Powers. Evaluation: From Precision, Recall and F-measure to ROC, Informedness, Markedness & Correlation. Tech. rep. Adelaide, Australia: Technical Report (SIE) School of Informatics and Engineering, Flinders University, 2007, pp. 1–24.
[Poz15] Andrea Dal Pozzolo. “Adaptive Machine Learning for Credit Card Fraud Detection”. PhD thesis. Université Libre de Bruxelles, 2015, pp. 1–55.
[Pro00] Foster Provost. “Machine learning from imbalanced data sets 101”. In: Proceedings of the AAAI’2000 Workshop on Imbalanced Data Sets. 2000, pp. 1–3.
[PZ06] Yaling Pei and Osmar Zaïane. A synthetic data generator for clustering and outlier analysis. Tech. rep. Alberta: University of Alberta, 2006, pp. 1–33.
[QF+15] Zhou Qi-Feng, Zhou Hao, Ning Yong-Peng, Yang Fan, and Li Tao. “Two approaches for novelty detection using random forest”. In: Expert Systems with Applications 42.10 (2015), pp. 4840–4850.
[Qin+18] Yuchu Qin, Wenlong Lu, Qunfen Qi, Xiaojun Liu, Meifa Huang, Paul J. Scott, and Xiangqian Jiang. “Towards an ontology-supported case-based reasoning approach for computer-aided tolerance specification”. In: Knowledge-Based Systems 141 (2018), pp. 129–147.
[Ran+17] Kuldeep Randhawa, Chu Kiong Loo, Manjeevan Seera, Chee Peng Lim, and Asoke K. Nandi. “Credit card fraud detection using AdaBoost and majority voting”. In: IEEE Access XX (2017), pp. 1–8.
[Rav+11] P. Ravisankar, V. Ravi, G. Raghava Rao, and I. Bose. “Detection of financial statement fraud and feature selection using data mining techniques”. In: Decision Support Systems 50.2 (2011), pp. 491–500.
[Rep] Repast. Recursive porous agent simulation toolkit.
[RGDAGC08] Juan A. Recio-García, Belén Díaz-Agudo, and Pedro González-Calero. jCOLIBRI2 Tutorial. Tech. rep. University Complutense of Madrid, 2008, pp. 1–110.
[RGGCDA14] Juan A. Recio-Garcia, Pedro A. Gonzalez-Calero, and Belen Diaz-Agudo. “Jcolibri2: A framework for building Case-based reasoning systems”. In: Science of Computer Programming 79 (2014), pp. 126–145.
[Rie+13] Roland Rieke, Maria Zhdanova, Jurgen Repp, Romain Giot, and Chrystel Gaber. “Fraud Detection in Mobile Payments Utilizing Process Behavior Analysis”. In: International Conference on Availability, Reliability and Security (ARES). Germany: IEEE, 2013, pp. 662–669.
[RK04] Bhavani Raskutti and Adam Kowalczyk. “Extreme Re-balancing for SVMs: a case study”. In: ACM SIGKDD Explorations 6.1 (2004), pp. 60–69.
[RLJ06] Steven F. Railsback, Steven L. Lytinen, and Stephen K. Jackson. “Agent-based Simulation Platforms: Review and Development Recommendations”. In: Simulation 82.9 (2006), pp. 609–623.
[RN03] Stuart J. Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. 2nd ed. Alan Apt, 2003, pp. 215–218.
[Rog83] Schank C. Roger. Dynamic memory: A theory of reminding and learning in computers and people. Cambridge University Press, New York, NY, USA, 1983, pp. 1–20.
[San+17] B Santoso, H Wijayanto, K.A Notodiputro, and B Sartono. “Class Imbalanced Problems: A Review”. In: Conference Series: Earth and Environmental Science. Vol. 58. 1. IOP Publishing Ltd, 2017, pp. 427–436.
[Sar+14] P. Saravanan, V. Subramaniyaswamy, N. Sivaramakrishnan, M. Arun Prakash, and T. Arunkumar. “Data mining approach for subscription-fraud detection in telecommunication sector”. In: Contemporary Engineering Sciences 7.11 (2014), pp. 515–522.
[SBD13] Yusuf Sahin, Serol Bulkan, and Ekrem Duman. “A cost-sensitive decision tree approach for fraud detection”. In: Expert Systems with Applications 40 (2013), pp. 5916–5923.
[SCP16] David Shrier, German Canale, and Alex Pentland. Mobile Money & Payments: Technology Trends. Tech. rep. Massachusetts Institute of Technology, 2016.
[SE13] Abir Smiti and Zied Elouedi. “Using clustering for maintaining case based reasoning systems”. In: 5th International Conference on Modeling, Simulation and Applied Optimization (ICMSAO). 2013, pp. 1–6.
[SEDMK14] Dina A. Sharaf-El-Deen, Ibrahim F. Moawad, and M.E Khalifa. “A New Hybrid Case-Based Reasoning Approach for Medical Diagnosis Systems”. In: Medical Systems 38.2 (2014), pp. 1–11.
[SH99] Kyung-shik Shin and Ingoo Han. “Case-based reasoning supported by genetic algorithms for corporate bond rating”. In: Expert Systems with Applications 16.2 (1999), pp. 85–95.
[Sha+16] Aulon Shabani, Adil Paul, Radu Platon, and Eyke Hullermeier. “Predicting the Electricity Consumption of Buildings: An Improved CBR Approach”. In: Goel A., Díaz-Agudo M., Roth-Berghofer T. (eds) Case-Based Reasoning Research and Development. ICCBR 2016. LNCS. Atlanta, USA: Springer Berlin Heidelberg, 2016, pp. 356–369.
[She13] Sheng Shen. Forecast: Mobile Payment, Worldwide, 2013 Update. 2013.
[SK11] Sanjay Sood and Parijat Kat. “Business Listing Classification Using Case Based Reasoning and Joint Probability”. In: AAAI2011 Symposium. 2011, pp. 23–28.
[SK13] Georgios Samakovitis and Stelios Kapetanakis. “Computer-aided Financial Fraud Detection: Promise and Applicability in Monitoring Financial Transaction Fraud”. In: Proceedings of International Conference on Business Management and IS, Dubai, United Arab Emirates. 2013.
[SM70] John W Slocum and H Lee Mathews. “Social Class and Income as Indicators of Consumer Credit Behavior”. In: Journal of Marketing 34.2 (1970), pp. 69–74.
[SMM13] Yaya Sylla and Pierre Morizet-Mahoudeaux. “Fraud Detection on Large Scale Social Networks”. In: International Congress on Big Data. 1. Santa Clara, CA, USA: IEEE, 2013, pp. 413–414.
[SN10] K. K. Sherly and R Nedunchezhian. “BOAT adaptive credit card fraud detection system”. In: International Conference on Computational Intelligence and Computing Research. Coimbatore, India: IEEE, 2010, pp. 1–7.
[SP15] Sharmila Subudhi and Suvasini Panigrahi. “Quarter-Sphere Support Vector Machine for Fraud Detection in Mobile Telecommunication Networks”. In: International Conference on Computer, Communication and Convergence (ICCC 2015). Vol. 48. Elsevier, 2015, pp. 353–359.
[SPA10] Ikan Sidat, D I Perairan, and Segara Anakan. “Sebaran Ukuran Hasil Tangkapan Dan Aspek Reproduksi”. In: Pattern Recognition Letters, Science Direct 20.10 (2010).
[Sri+99] Anurag Srivastava, Eui-Hong Han, Vipin Kumar, and Vineet Singh. “Parallel Formulations of Decision-Tree Classification Algorithms”. In: Data Mining and Knowledge Discovery 3.3 (1999), pp. 237–261.
[SS99] Robert E. Schapire and Yoram Singer. “Improved boosting algorithms using confidence-rated predictions”. In: Machine Learning 37.3 (1999), pp. 297–336.
[Ste11] Peer Stein. IFC Mobile Money Study 2011. Tech. rep. Washington, DC, USA: International Finance Corporation, World Bank Group, 2011.
[Sud+10] Agus Sudjianto, Sheela Nair, Ming Yuan, Aijun Zhang, Daniel Kern, and Fernando Cela-Díaz. “Statistical methods for fighting financial crimes”. In: Technometrics 52.1 (2010), pp. 5–19.
[Sun+07] Yanmin Sun, Mohamed S. Kamel, Andrew K.C. Wong, and Yang Wang. “Cost-sensitive boosting for classification of imbalanced data”. In: Pattern Recognition 40.12 (2007), pp. 3358–3378.
[SVF15] Cesar Silva, Germano Vasconcelos, and Gabriel Frana. “Case-based Reasoning Combined with Neural Networks for Credit Risk Analysis”. In: International Joint Conference on Neural Networks (IJCNN). Killarney, Ireland: IEEE, 2015.
[Swa] Swarm. Agent-Based Modelling Toolkit.
[TD13] Lin Tong and Wu Di. “Research on Optimization of Case-Based Reasoning System”. In: Third International Conference on Control, Automation and Systems Engineering (CASE 2013). Atlantis Press, 2013, pp. 34–37.
[Tob11] Peter Tobbin. “Understanding the mobile money ecosystem: Roles, structure and strategies”. In: Proceedings of 10th International Conference on Mobile Business, ICMB 2011. Como, Italy: IEEE Computer Society, 2011, pp. 185–194.
[Tom76] Ivan Tomek. “Two Modifications of CNN”. In: Transactions on Systems, Man, and Cybernetics SMC-6.11 (1976), pp. 769–772.
[Tru15] FinMark Trust. FinScope Consumer Survey Zimbabwe 2014. Tech. rep. 2015, pp. 1–11.
[Uga16] Bank of Uganda. Annual Supervision Report. Tech. rep. December. Kampala, 2016, pp. 24–25.
[Via+04] S. Viaene, D. Van Gheel, M. Ayuso, and M. Guillen. “Cost-Sensitive Design of Claim Fraud Screens”. In: Perner P. (ed.) Advances in Data Mining. ICDM 2004. LNCS. Vol. 3275. Springer, Berlin, Heidelberg, 2004.
[VKN07] Jason Van Hulse, Taghi M. Khoshgoftaar, and Amri Napolitano. “Experimental perspectives on learning from imbalanced data”. In: Proceedings of the 24th international conference on Machine learning - ICML ’07. ACM, 2007, pp. 935–942.
[Ver+17] Veronique Van Vlasselaer, Tina Eliassi-Rad, Leman Akoglu, Monique Snoeck, and Bart Baesens. “GOTCHA! Network-based fraud detection for social security fraud”. In: Management Science 63.9 (2017), pp. 3090–3110.
[WA00] Richard Wheeler and Stuart Aitken. “Multiple algorithms for fraud detection”. In: Knowledge-Based Systems 13.2 (2000), pp. 93–99.
[Wan+03] Haixun Wang, Wei Fan, Philip S. Yu, and Jiawei Han. “Mining concept-drifting data streams using ensemble classifiers”. In: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2003, pp. 226–235.
[Wat99] Ian Watson. “Case-based reasoning is a methodology not a technology”. In: Knowledge-Based Systems 12 (1999), pp. 303–308.
[WB15] Jarrod West and Maumita Bhattacharya. “Mining Financial Statement Fraud: An Analysis of Some Experimental Issues”. In: Proceedings of The 10th IEEE Conference on Industrial Electronics and Applications (ICIEA 2015) (2015).
[WB16] Jarrod West and Maumita Bhattacharya. “Some Experimental Issues in Financial Fraud Detection: An Investigation”. In: The International Conference on Computational Science (ICCS2016). Procedia Computer Science, Elsevier 80 (2016), pp. 7–10.
[Wes+08] David J. Weston, David J. Hand, Niall M. Adams, Christopher Whitrow, and Piotr Juszczak. “Plastic card fraud detection using peer group analysis”. In: Advances in Data Analysis and Classification 2.1 (2008), pp. 45–62.
[WHV08] Mark A. Whiting, Jereme Haack, and Carrie Varley. “Creating realistic, scenario-based synthetic data for test and evaluation of information analytics software”. In: Proceedings of the 2008 Workshop on BEyond Time and Errors: Novel EvaLuation Methods for Information Visualization (BELIV08). 8. Florence, Italy: ACM, 2008, pp. 1–6.
[Wil72] Dennis L. Wilson. “Asymptotic Properties of Nearest Neighbor Rules Using Edited Data”. In: IEEE Transactions on Systems, Man and Cybernetics SMC-2.3 (1972), pp. 408–421.
[WMZ07] Gary M. Weiss, Kate McCarthy, and Bibi Zabar. “Cost-sensitive learning vs. sampling: Which is best for handling unbalanced classes with unequal error costs?” In: IEEE ICDM. 2007, pp. 35–41.
[WP03] Gary M Weiss and Foster J Provost. “Learning When Training Data are Costly: The Effect of Class Distribution on Tree Induction.” In: J. Artif. Intell. Res. (JAIR) 19 (2003), pp. 315–354.
[WST10] Jack William, Tavneet Suri, and Robert Townsend. “Monetary Theory and Electronic Money: Reflections on the Kenyan Experience.” In: Economic Quarterly, Massachusetts Institute of Technology 96.1 (2010), pp. 83–122.
[Wue+16] Thorsten Wuest, Daniel Weimer, Christopher Irgens, and Klaus-Dieter Thoben. “Machine learning in manufacturing: advantages, challenges, and applications”. In: Production & Manufacturing Research 4.1 (2016), pp. 23–45.
[XSL07] Jianyun Xu, Andrew H. Sung, and Qingzhong Liu. “Behaviour mining for fraud detection”. In: Journal of Research and Practice in Information Technology 39.1 (2007), pp. 3–18.
[Yar16] Suresh Yaram. “Machine learning algorithms for document clustering and fraud detection”. In: Data Science and Engineering (ICDSE). Cochin, India: IEEE, 2016.
[YL06] Show Jane Yen and Yue Shi Lee. “Under-sampling approaches for improving prediction of the minority class in an imbalanced dataset”. In: Huang DS., Li K., Irwin G.W. (eds) Intelligent Control and Automation. Lecture Notes in Control and Information Sciences 344 (2006), pp. 731–740.
[YMR14] Wei Liang Yeow, Rohana Mahmud, and Ram Gopal Raj. “An application of case-based reasoning with machine learning for forensic autopsy”. In: Expert Systems with Applications 41.7 (2014), pp. 3497–3505.
[Yua+17] Shuhan Yuan, Xintao Wu, Jun Li, and Aidong Lu. “Spectrum-based deep neural networks for fraud detection”. In: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. Singapore, Singapore: ACM, 2017, pp. 2419–2422.
[Yue+07] Dianmin Yue, Xiaodan Wu, Yunfeng Wang, and Yue Li. “A Review of Data Mining-based Financial Fraud Detection Research”. In: International Conference on Wireless Communications, Networking and Mobile Computing (WiCom 2007). Shanghai, China: IEEE, 2007, pp. 5519–5522.
[Yu+16] Dequan Yu, Chuan Lv, Quan Lei Wu, and Xu Peng. “Study on case-based reasoning expert system about the matching optimization of particle swarm optimization algorithm”. In: Prognostics and System Health Management Conference, PHM 2015. 2016.
[Zha12] Jian Zhang. “Financial inclusion and integration through mobile payments and transfer”. In: Proceedings of Workshops on Enhancing Financial Integration through Sound Regulation of Cross-Border Mobile Payments: Opportunities and Challenges. Mumbai, India: Africa Development Bank Group, 2012.
[Zhd+14] Maria Zhdanova, Jurgen Repp, Roland Rieke, Chrystel Gaber, and Baptiste Hemery. “No smurfs: Revealing fraud chains in mobile money transfers”. In: 9th International Conference on Availability, Reliability and Security, ARES 2014. IEEE, 2014, pp. 11–20.
[ZLD14] Yu Jie Zhao, Xin Xing Luo, and Li Deng. “A CBR-Based and MAHP-based customer value prediction model for new product development”. In: Scientific World Journal 2014.1 (2014).
[ZM03] Jianping Zhang and Inderjeet Mani. “kNN Approach to Unbalanced Data Distributions: A Case Study involving Information Extraction”. In: Workshop on Learning from Imbalanced Datasets II ICML Washington DC 2003. Washington DC: Scientific Research Publishing Inc., 2003, pp. 42–48.
[ZS06] V Zaslavsky and A Strizhak. “Credit Card Fraud Detection Using Self-Organizing Maps”. In: Information and Security 18 (2006), pp. 48–63.
[ZS15] Masoumeh Zareapoor and Pourya Shamsolmoali. “Application of credit card fraud detection: Based on bagging ensemble classifier”. In: Procedia Computer Science. Vol. 48. Elsevier Ltd, 2015, pp. 679–686.
[ZSY03] Zhongfei Zhang, John J. Salerno, and Philip S. Yu. “Applying data mining in investigating money laundering crimes”. In: Proceedings of the SIGKDD International Conference on Knowledge Discovery and Data Mining. Washington, DC: ACM, 2003, pp. 747–752.
[ZY12] Cha Zhang and Ma Yunqian. Ensemble Machine Learning. Springer Berlin Heidelberg, 2012, pp. 1–35.
[Com18] Information Commissioner’s Office UK. Guide to the General Data Protection Regulation (GDPR). 2018.
[Int13a] International Telecommunication Union. The Mobile Money Revolution. Part 1: NFC Mobile Payments. Tech. rep. ITU-T Technology, 2013, p. 22.
[Int13b] International Telecommunication Union. The Mobile Money Revolution. Part 2: Financial Inclusion Enabler. Tech. rep. ITU-T Technology, 2013, p. 30.
[Isl+02] E. Islas Perez, C.A. Coello Coello, A. Hernandez-Aguirre, and A. Villavicencio Ramírez. “Genetic Algorithms and Case-Based Reasoning as a Discovery and Learning Machine in the Optimization of Combinational Logic Circuits”. In: Coello Coello C.A., de Albornoz A., Sucar L.E., Battistutti O.C. (eds) MICAI 2002: Advances in Artificial Intelligence. MICAI 2002. 2002, pp. 128–137.
[Van+15] Veronique Van Vlasselaer, Cristian Bravo, Olivier Caelen, Tina Eliassi-Rad, Leman Akoglu, Monique Snoeck, and Bart Baesens. “APATE: A novel approach for automated credit card transaction fraud detection using network-based extensions”. In: Decision Support Systems 75 (2015), pp. 38–48.