GENDER CLASSIFICATION THROUGH DYNAMIC KEYSTROKE BASED ON MOBILE PHONE USING ARTIFICIAL NEURAL NETWORK

SITI HAJAR BT MAT ZAN

BACHELOR OF COMPUTER SCIENCE WITH HONOURS

(COMPUTER NETWORK SECURITY)

UNIVERSITI SULTAN ZAINAL ABIDIN

2018

FACULTY OF INFORMATICS AND COMPUTING

UNIVERSITI SULTAN ZAINAL ABIDIN, TERENGGANU, MALAYSIA

AUGUST 2018

DECLARATION

I hereby declare that the project report entitled Gender Classification Through Dynamic Keystroke Based on Mobile Phone Using Artificial Neural Network is based on the results of my own research, with information from sources that are acknowledged. I also declare that it has not been submitted for any other degree or to any other educational institution.

________________________________

Name : Siti Hajar Bt Mat Zan

Date : .................................................

CONFIRMATION

This project report, titled Gender Classification Through Dynamic Keystroke Based on Mobile Using Artificial Neural Network, was prepared and submitted by Siti Hajar Bt Mat Zan. It has met the requirements in terms of scope, quality and presentation for the Bachelor of Computer Science (Network Security) with Honours at Universiti Sultan Zainal Abidin.

_______________________________

Supervisor : Dr Mohamad Afendee b Mohamed

Date : ..................................................

DEDICATION

First of all, I would like to express my gratitude to Allah S.W.T, the Most Gracious and the Most Merciful, for the blessings that enabled me to complete my final year project, Gender Classification through Dynamic Keystroke Based on Mobile Using Artificial Neural Network.

The research presented in this dissertation could not have been completed without the support, encouragement and cooperation of many people. I would like to express my deepest gratitude to the person who patiently supervised, advised, taught and encouraged me at every stage of the development of this project, Dr. Mohamad Afendee bin Mohamed. I would like to thank him for giving me the opportunity to work and learn under his guidance throughout this project.

I would also like to thank all the lecturers, especially the panel members for this final year project at the Faculty of Informatics and Computing, for their comments, feedback and advice, which helped to improve my project. I would also like to show my appreciation to my parents, Mat Zan A Rani and Zaida Che Muda, who gave me moral support and attention so that I could finish this project.

My sincere thanks also go to my beloved friends for their encouragement and valuable advice in completing this project. May Allah S.W.T bless all of you for your efforts. This project would not have been accomplished without their precious support.

ABSTRACT

Cybercrime, also known as computer-oriented crime, refers to the misuse of computers and network equipment to steal, modify or damage data for a particular purpose. It has been used to threaten people's personal security. Computer crimes are difficult to detect and prove because they happen virtually and cannot easily be proven physically. Many cybercrimes remain unsolved because the evidence-gathering process available to security experts for identifying a potential attacker is limited. One way to search for a criminal is to narrow down the possible identities by identifying the criminal's gender. Gender identification helps cyber security departments take further action against those involved in cybercrime. Keystroke dynamics is an alternative approach to identifying the gender of a criminal. It is a behavioural biometric that refers to the rhythm with which an individual types on a touch keyboard, and it serves as an automated method of identification. Keystroke dynamics captures the unique behavioural characteristics of an individual's typing rhythm, and a gender-labelled dataset is generated automatically by recording the typing patterns of a group of mobile users. Gender is classified according to criteria based on the keystroke feature types. In this project, an artificial neural network algorithm is applied. The artificial neural network builds a signature from each individual's typing pattern and classifies it as male or female. It is anticipated that this will make a significant contribution to investigators by providing gender information for their investigations.

ABSTRAK (English translation)

Cybercrime, also known as computer-oriented crime, refers to the misuse of computers and network equipment to steal, modify and damage data for a particular purpose. It has been used to threaten people's personal security. Computer crimes are difficult to detect because cybercrime cannot easily be proven physically. Today, cybercrimes largely remain unsolved because the evidence-gathering process available to security experts for identifying potential cybercriminals is limited. One method that can be used to find criminals is to narrow down the possible identity of the criminal by identifying their gender. Gender identification is useful for cyber security departments in taking further action against criminals involved in cybercrime. Keystroke dynamics is an alternative approach to identifying the gender of a criminal. Keystroke dynamics is known as a behavioural biometric that refers to the rhythm of an individual's keystrokes on a keyboard, whether a computer keyboard or a touch keyboard, based on the way they type, and it is an automated method of identification. Keystroke dynamics captures the unique behavioural characteristics of an individual's typing rhythm and automatically produces a gender dataset by recording the typing patterns of a group of computer users. Gender is classified based on criteria that meet the requirements of the keyboard feature types. In this project, an artificial neural network algorithm will be used. The artificial neural network will build a model of each individual's typing pattern and classify the data as male or female. This classification data on the gender of the criminal is expected to make a major contribution to investigators by providing information for their investigations.

TABLE OF CONTENTS

DECLARATION
CONFIRMATION
DEDICATION
ABSTRACT
ABSTRAK
CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS / TERMS / SYMBOLS
LIST OF APPENDICES

CHAPTER 1 INTRODUCTION
1.1 Project Background
1.2 Problem Statement
1.3 Objectives
1.4 Scopes of Work
1.5 Limitation of Work
1.6 Thesis Structure

CHAPTER 2 LITERATURE REVIEW
2.1 Introduction
2.2 Literature Review
    2.1.1 Gender Classification
    2.1.2 Biometrics
    2.1.3 Biometrics Techniques
    2.1.4 Dynamic Keystroke
2.3 Method Use
2.4 Data Mining
    2.4.1 Artificial Neural Network
    2.4.2 Logistic Regression
    2.4.3 Naive Bayes
    2.4.4 Decision Table
    2.4.5 Sequential Minimal Optimization (SMO)
2.5 Review Summary
3.5 Summary

CHAPTER 3 METHODOLOGY
3.1 Introduction
3.2 Scientific Research Method
3.3 Knowledge Discovery in Database
    3.4.1 Attribute Selection
    3.4.2 Data Pre-Processing
    3.4.3 Data Transformation
    3.4.4 Data Mining
    3.4.5 Interpretation / Evaluation
3.5 System Requirement and Specification
    3.5.1 Hardware Requirement
    3.5.2 Software Requirement
3.5 Framework
3.6 Datasets
3.7 Summary

CHAPTER 4 RESULT AND DISCUSSION
4.1 Introduction
4.2 Experimental Results
4.3 Comparison in Accuracy of Data Model
4.4 Duration of Building Data Model
4.5 Artificial Neural Network Algorithm Confusion Matrix
    4.5.1 Calculation
4.6 Summary

CHAPTER 5 CONCLUSION
5.3 Limitation
5.4 Future work

REFERENCES

LIST OF TABLES

TABLE    TITLE

2.5      Summary of Literature Review
3.5.1    List of Software Requirement
3.5.2    List of Hardware Requirement
3.6      Table of Dataset Keystroke Dynamic
4.6.2    Comparison Between Data Mining Based on Precision, Recall and F-Measure
4.6.4    Artificial Neural Network Algorithm Confusion Matrix Based on Gender
4.5.2    Comparison of Confusion Matrix of 5 Different Data Mining

LIST OF FIGURES

FIGURE   TITLE

1.1      Keystroke Dynamics in the Field of Computer Security
2.4.1    Diagram of Artificial Neural Network
2.4.3    Formula of Naive Bayes
2.4.5    Formula of Sequential Minimal Optimization
3.2      Scientific Research Method
3.3      Knowledge Discovery in Database for Data Analysis
3.5      Framework
4.2      Graph of 54 Attribute of Different Keystroke Dynamic
4.4.1    Comparison Between Data Mining

LIST OF ABBREVIATIONS/TERMS/SYMBOLS

ANN    Artificial Neural Network
KDD    Knowledge Discovery in Databases
KD     Key Down
KU     Key Up
FAR    False Acceptance Rate
FRR    False Rejection Rate
EER    Equal Error Rate
PP     Key Press - Key Press
PR     Key Press - Key Release
RP     Key Release - Key Press
RR     Key Release - Key Release
QP     Quadratic Programming
SMO    Sequential Minimal Optimization

LIST OF APPENDICES

APPENDIX TITLE

APPENDIX A

APPENDIX B

CHAPTER 1

INTRODUCTION

1.1 Project Background

Biometrics includes keystroke dynamics, mouse dynamics, fingerprints, voice and face. Some of these traits are regarded as non-intrusive because capturing them does not require specialized hardware. The term “biometrics” is borrowed from the Greek words ‘bio’, meaning life, and ‘metric’, meaning to measure. Biometrics refers to the classification of humans by their physical characteristics or traits. It is divided into two categories: physiological and behavioural biometrics. Physiological biometrics relate to parts of the body, such as fingerprints and the face. Behavioural biometrics, on the other hand, relate to the behaviour of a person. Keystroke dynamics and signature verification are examples of behavioural biometrics. [1]

Keystroke dynamics is a behavioural biometric that aims to identify users based on how they type, using features such as keystroke duration, key hold time, keystroke latency, typing errors and keystroke force. These features can be captured on a range of input devices, from physical keyboards to the soft keyboards of mobile phones. Many previous studies have demonstrated that keystroke dynamics has potential as a low-cost biometric for identifying gender. [2]

Gender is a type of soft biometric that can help cyber intelligence teams investigate and obtain relevant information about a person involved in cybercrime. Gender classification has been applied successfully in several biometric identification systems based on face, speech, iris or gait recognition. Face recognition methods often perform gender classification first, because this can halve the number of comparisons and so speed up recognition.

A neural network is defined as a network composed of a number of interconnected units [3]. It is designed to imitate the computing style of the human brain. As a result, it is powerful enough to solve a variety of problems that have proved difficult for conventional digital computational methods [3]. A neural network can detect complex nonlinear relationships between inputs and outputs without requiring excessive statistical training [4].

In this project, keystroke feature data from 51 students are used. The data consist of keystroke dynamics features, including flight times and dwell times, collected on mobile phones in a previous research paper, in which 51 male and female students were required to type a password so that their keystroke dynamics features could be extracted and labelled by gender. The WEKA tool is used to train models on the existing data and to test their accuracy in classifying gender.

1.2 Problem Statement

Identifying gender is one step towards solving a cybercrime. The most common approach to identifying a cybercriminal's gender during an investigation uses biometric data such as face, iris or speech recognition. This approach is difficult for cyber intelligence teams to implement because it cannot capture information about the criminal at the time the intrusion occurs, and it costs more to implement than keystroke dynamics. As a result, many cybercrimes remain unsolved because the evidence-gathering process for identifying a potential attacker is limited, which motivated this project.

1.3 Objectives

The objectives are listed below:

1. To study the ability of keystroke dynamics to support gender classification.

2. To model gender classification on collected keystroke dynamics data using an Artificial Neural Network (ANN).

3. To evaluate and test the accuracy of the model in classifying gender.

1.4 Scope of work

The scope of the project is listed below:

1. To pre-process the keystroke dynamics data of 51 students before passing it to the algorithm. Pre-processing converts the raw data into a clean data set that is suitable for analysis.

2. To extract keystroke features that include dwell times (the time interval during which a key is pressed down), flight times (the duration between keystrokes) and typing speed.

3. To create a machine learning model from the extracted keystroke features using the WEKA tool.

4. To test the accuracy of the data model.

1.5 Limitation Of Work

The accuracy of the gender classification may be low because of the sample size, since the accuracy of a data model depends on the amount of sample data used. The accuracy of the model, expressed as a percentage, is also affected by inaccuracies and errors in the collected keystroke dynamics data. The typing behaviour of each student may not be consistent, as it can be influenced by emotion, stress level, switching between different touch screen keyboards, medication or alcohol, and other factors. In addition, switching the type of keyboard may affect how comfortable each user is with the mobile data collection application, which in turn affects how precisely the keystroke dynamics features can be extracted.

1.6 Thesis Structure

The first chapter of this report is the introduction, which covers the project background, problem statement, objectives and scope of the project. The main contribution of the project is stated in this chapter. The second chapter is the literature review. It provides knowledge and understanding of previous research in related fields, which helps the project to be carried out with as few shortcomings as possible. The third chapter describes the methodology used in this research. It presents the development phases used in the design, testing and implementation of the system, together with the hardware and software requirements of the project. Chapter 4 covers the implementation and testing of the project. Results from various inputs and outputs are tested and recorded to verify the accuracy of the gender classification. Chapter 5 concludes with the general contribution of the project and the future work that could improve it.

CHAPTER 2

LITERATURE REVIEW

2.1 Introduction

This chapter discusses previous articles and research papers related to this topic. Information was gathered in order to gain a better understanding of the technologies used and applied.

2.2 Gender Classification

Gender classification identifies a person's gender, for example male or female, from biometric information. Usually facial images are used to extract features, and a classifier is then applied to the extracted features to learn a gender recognizer. It is an active research topic in the computing and biometrics fields. The result of gender classification is often a binary value, 1 or 0, representing male or female. Gender recognition is therefore a typical two-class classification problem. Although other biometric traits, such as gait, can also be used for gender classification, face-based approaches are still the most popular. [5]

In a research paper by Gokhan Silahtaroglu (2015), customer gender is described as a very important parameter for retailing and marketing, as it is well known to play an important role in purchasing habits. The study presents a model that predicts the gender of an online customer by analysing mouse movements, which are a behavioural biometric. For this purpose, the author developed a novel data cube model. The model consists of six dimensions: customer demographic data, customer visits, mouse movements, online shopping cart, external data and time. These are used to detect customer gender with an artificial neural network model. According to the paper, the gender of an online customer can be predicted with a success rate of up to 80%. Predicting or classifying the gender of online users is useful for promotional and marketing purposes.

2.3 Biometrics

The word biometrics is derived from the Greek words ‘bio’, meaning life, and ‘metric’, meaning to measure. Biometrics refers to the identification of humans by their traits or characteristics and is used as a form of identification. Biometrics can be categorized into two groups: physiological and behavioural. Physiological biometrics relate to the physical characteristics of a person, including iris, fingerprint, face, DNA and many more. Behavioural biometrics relate to the behaviour of a person, including voice, mouse dynamics, keystrokes and signature. Historically, biometrics have presented the problem that they tend to be rather expensive for the average end user [6].

These characteristics and their related techniques are also commonly classified as soft and hard biometrics. Soft biometrics are characteristics or features, usually associated with behavioural traits, that provide some information about an individual but lack the distinctiveness to differentiate effectively between any two individuals [7]. Hard biometric traits, on the other hand, such as the fingerprint or the geometry of the face, are considered more distinctive and can give very good results when identifying individuals.

2.2.3 Biometric techniques

Many biometric techniques have been used in previous research to distinguish human characteristics such as gender, emotional state and age. A research paper by Clayton Tepp and his research team on identifying emotional states through keystroke dynamics offers a solution for determining user emotions by analysing the rhythm of an individual's typing patterns on a standard keyboard. The keystroke dynamics approach allows emotion to be determined without influencing the user, using technology that is in widespread use today. The team conducted a field study in which participants' keystrokes were collected and their emotional states were recorded via self-reports. Various data mining techniques were then used to build models for 15 different emotional states.

The list below shows the most common techniques and the main characteristics that define them. [8,9]

1. Fingerprint scanning: A fingerprint is the pattern of furrows on the surface of a fingertip. Fingerprints are so distinct that even the fingerprints of identical twins are different. This technique has been used for centuries and its validity is well established.

2. Face recognition: This technique focuses on recognizing the global positioning and shape of the eyes, eyebrows, nose, lips and chin of an individual's face. Applications using identification based on face geometry range from static ones, where users stand still in front of non-variable backgrounds, to dynamic, uncontrolled face identification with changing backgrounds.

3. Iris scan: The iris is the annular region of the eye bounded by the pupil and the sclera (white of the eye) on either side. The visual texture of the iris stabilizes during the first two years of life, and its complex structure carries very distinctive information that is useful for identifying individuals.

4. Hand geometry: This biometric technique focuses on the shape of the hand, including the length of the fingers and their respective widths. The technique is very simple, relatively easy to use and inexpensive. Unfortunately, a hand geometry-based system is physically too large for applications in laptop computers. Using the shape of the hand for authentication is viable, but using it to verify a user continuously may not be feasible.

The most important feature sought in this project is accuracy in discriminating human characteristics in a computer security environment, especially for authentication, password hardening and detecting cybercriminals.

2.4 Keystroke dynamics

Keystroke dynamics as a biometric dates back to the late 19th century, when the telegraph revolution was at its peak. The telegraph was the major long-distance communication instrument of that century. Telegraph operators could readily tell each other apart simply by listening to the tapping rhythm of dots and dashes. The telegraph key served as an input device in those days, just as the computer keyboard, mobile keypad and touch screen are common input devices in the 21st century. Moreover, the handwritten signature, which humans have relied on for many centuries to verify the identity of an individual, is governed by the same neurophysiological factors as the keystroke pattern. [10]

An individual's unique profile can be generated by monitoring keystrokes while the individual types in a provided program. The keyboard input includes the times at which keys are pressed and released, the number of backspaces used, the positions of the keys used and the total number of keys pressed. Keystroke dynamics systems are usually evaluated using the following metrics [11]:

1 False Acceptance Rate (FAR) – the percentage of attempts in which the system wrongly grants access to an unauthorized user

2 False Rejection Rate (FRR) – the percentage of attempts in which the system wrongly denies access to an authorized user

3 Equal Error Rate (EER) – the error rate when the system's parameters are set such that FRR and FAR are equal. The lower the EER, the more accurate the system.
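
The three metrics above can be illustrated with a short Python sketch. It assumes that each login attempt receives a similarity score and is accepted when the score reaches a threshold; the function names, the scoring convention and the score values are all hypothetical and are used only to show how FAR, FRR and an approximate EER relate to one another.

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """Return (FAR, FRR) at a given acceptance threshold.

    A sample is accepted when its score is >= threshold (an assumption
    of this sketch: a higher score means "more like the genuine user").
    """
    # FAR: fraction of impostor attempts that were wrongly accepted.
    far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    # FRR: fraction of genuine attempts that were wrongly rejected.
    frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    return far, frr


def approximate_eer(genuine_scores, impostor_scores):
    """Scan candidate thresholds; return (threshold, FAR, FRR) where |FAR - FRR| is smallest."""
    candidates = sorted(set(genuine_scores) | set(impostor_scores))
    best = None
    for t in candidates:
        far, frr = far_frr(genuine_scores, impostor_scores, t)
        if best is None or abs(far - frr) < abs(best[1] - best[2]):
            best = (t, far, frr)
    return best


# Hypothetical similarity scores, invented purely for illustration.
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]

far, frr = far_frr(genuine, impostor, threshold=0.5)
threshold, eer_far, eer_frr = approximate_eer(genuine, impostor)
```

Raising the threshold lowers FAR and raises FRR, so the EER is found at the setting where the two curves cross.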

Keystroke dynamics refers to the habitual patterns or rhythms an individual exhibits while typing on a keyboard input device. These rhythms and patterns are idiosyncratic, in the same way as handwriting or signatures, because they are governed by similar neurophysiological mechanisms [12]. Keystroke dynamics (also known as keystroke biometrics or typing dynamics) can also be defined as the detailed timing information that describes when each key was pressed (KeyDown, KD) and when it was released (KeyUp, KU) as a person types on a computer keyboard. This includes dwell times (the time interval during which a key is pressed down), flight times (the duration between keystrokes), typing speed, frequency of errors and use of modifier keys. The principal idea behind this biometric measurement is that every user has a particular way of typing and that, like any other behavioural biometric system, it allows the identification, authentication or classification of these users. [13]
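
As a rough illustration of these timing features, the sketch below derives dwell and flight times from KeyDown and KeyUp timestamps. The `KeyEvent` structure, the choice of release-to-press as the flight time and the timestamps themselves are assumptions made for this example; they are not taken from the dataset used in this project.

```python
from dataclasses import dataclass


@dataclass
class KeyEvent:
    key: str
    down_ms: float  # KeyDown (KD) timestamp in milliseconds
    up_ms: float    # KeyUp (KU) timestamp in milliseconds


def dwell_times(events):
    """Dwell time: how long each key is held (KU - KD of the same key)."""
    return [e.up_ms - e.down_ms for e in events]


def flight_times(events):
    """Flight time taken here as release-to-press (RP): next KD minus current KU."""
    return [nxt.down_ms - cur.up_ms for cur, nxt in zip(events, events[1:])]


def press_to_press(events):
    """Press-to-press (PP) latency: next KD minus current KD."""
    return [nxt.down_ms - cur.down_ms for cur, nxt in zip(events, events[1:])]


# Hypothetical timestamps for typing "abc", invented for illustration.
sample = [
    KeyEvent("a", 0, 90),
    KeyEvent("b", 150, 230),
    KeyEvent("c", 310, 400),
]

# One flat feature vector per typing sample, ready to feed a classifier.
features = dwell_times(sample) + flight_times(sample) + press_to_press(sample)
```

A password of n characters yields n dwell times and n - 1 values for each latency type, which is why a fixed password gives every sample the same number of features.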

Figure 1.1 Keystroke dynamics in the field of Computer security

If keystroke dynamics, a technology that distinguishes people by their typing rhythm, were demonstrably reliable, it would significantly advance computer security. For criminal investigations, keystroke dynamics could tie a suspect to the “scene” of a computer-based crime, much as a fingerprint does in real-world crime. For access control, keystroke dynamics could act as a second factor in authentication: an impostor who compromised a password would still need to type it with the correct rhythm. For insider-threat detection, keystroke dynamics could detect when a masquerader is using another user's account; the technology could even identify who is using a backdoor account (Kevin S. Killourhy et al., January 2012).

2.3 Method used

A keystroke dynamics biometric system is built from three main modules: a data capture module, a feature extraction module and a classifier module. The data capture module is the fundamental stage. It consists of a program that collects data about an individual's keystroke behaviour while the individual interacts with the keyboard. The feature extraction module analyses the raw keystroke data to generate user features, which are stored as a reference template that can be used to distinguish user behaviour on a mobile touch screen keyboard. Finally, the classifier module identifies a user based on the extracted features.

2.4 Data Mining

The growth of computer technology has produced an enormous amount of data. This growth makes it difficult to analyse any particular data set. Data mining is therefore useful for extracting important and useful information from large amounts of data. Data mining is the process of finding trends and patterns in data to discover new information, based on KDD (Knowledge Discovery in Databases). To extract information and patterns from data, algorithms are used to produce statistically sound results. Comparing the accuracy of predictive models built with different techniques is the main motivation for this research.

2.4.1 Artificial neural network

An artificial neural network, also called a neural network, is a network composed of interconnected units (neurons). It is designed to resemble the human brain, a dynamic organ that trains and learns over a period of time. These biological and behavioural characteristics of the human brain are modelled as artificial neurons in order to obtain better results in data mining [14]. Studies have found that it produces efficient and effective results in the data mining field (Ripundeep et al., 2014). Optimization and time-consuming calculations are no longer needed once an ANN has been trained, because it is fast and accurate after the training process is complete; the network outputs are predicted directly from the provided inputs based on what the network has learned for a specific system. Many types of ANN are used in applications such as engineering, weather and flood forecasting, business and medicine because of their power and their ability to generalize to practical problems (Coit et al., 1998; Twomey et al., 1998).

According to Priyanka Mehtani and her team in their research paper Pattern Classification using Artificial Neural Networks, the word “network” in neural network refers to the interconnection between neurons present in the various layers of a system. A typical system has three layers: an input layer, a hidden layer and an output layer. The input layer has input neurons that transfer data via synapses to the hidden layer, and the hidden layer similarly transfers data to the output layer via more synapses. The synapses store values called weights, which are used to manipulate the data passed between layers. Backpropagation is one of the most commonly used learning algorithms for neural networks. The network's output is compared with the expected output and the error is computed. The error is fed back and the weights are adjusted at each iteration, so that the error gradually declines until the model produces the expected output (Giovanni et al., 2013).

Figure 2.4.1 Diagram Of Artificial Neural Network
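
The forward pass and the backpropagation update described in this section can be sketched in plain Python. This is a minimal illustration and not the WEKA implementation used in the project; the class name `TinyNetwork`, the layer sizes, the learning rate and the four data points are all assumptions made for the example.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is repeatable


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyNetwork:
    """Input layer -> one hidden layer -> single output neuron."""

    def __init__(self, n_inputs, n_hidden):
        # Weights play the role of the "synapses"; the last entry of each row is a bias.
        self.w_hidden = [[random.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
                         for _ in range(n_hidden)]
        self.w_output = [random.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0])))
                  for row in self.w_hidden]
        output = sigmoid(sum(w * v for w, v in zip(self.w_output, hidden + [1.0])))
        return hidden, output

    def train_step(self, x, target, lr):
        hidden, output = self.forward(x)
        # Error at the output, scaled by the sigmoid derivative.
        delta_out = (output - target) * output * (1.0 - output)
        # Error fed back to each hidden neuron through its outgoing weight.
        delta_hidden = [delta_out * self.w_output[j] * h * (1.0 - h)
                        for j, h in enumerate(hidden)]
        # Adjust the weights against the gradient.
        for j, v in enumerate(hidden + [1.0]):
            self.w_output[j] -= lr * delta_out * v
        for j, row in enumerate(self.w_hidden):
            for i, v in enumerate(x + [1.0]):
                row[i] -= lr * delta_hidden[j] * v


def mean_squared_error(net, data):
    return sum((net.forward(x)[1] - t) ** 2 for x, t in data) / len(data)


# Hypothetical, already-normalised feature pairs with class labels 0 and 1.
data = [([0.1, 0.2], 0), ([0.2, 0.1], 0), ([0.8, 0.9], 1), ([0.9, 0.7], 1)]

net = TinyNetwork(n_inputs=2, n_hidden=3)
loss_before = mean_squared_error(net, data)
for _ in range(2000):
    for x, target in data:
        net.train_step(x, target, lr=0.5)
loss_after = mean_squared_error(net, data)
```

Comparing `loss_before` with `loss_after` shows the error declining as the weights are adjusted over repeated iterations.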

2.4.2 Logistic Regression

Logistic regression is one of the most common predictive models and is used in a variety of prediction and identification tasks, including the investigation of cybercrime. Logistic regression determines the relative importance of each variable by estimating probabilities using the logistic function. The model complexity of logistic regression is low, especially when few interaction terms and variable transformations are used. Over-fitting and long training times are therefore less of an issue. Variable selection can reduce the complexity of the model further and so decrease the risk of over-fitting, although at the cost of some flexibility in the model (Stephan, 2003).

However, logistic regression is not suited to predicting continuous outcomes. It attempts to predict outcomes from a set of independent variables, and logit models may be overconfident: as a result of sampling bias, a model can appear to have more predictive power than it actually does.
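
To show how the logistic function turns a weighted sum of variables into a probability, a small sketch follows. The coefficient values are hypothetical; in practice they would be estimated from training data, and the function names are chosen only for this example.

```python
import math


def predict_probability(features, weights, bias):
    """Logistic regression: squash a weighted sum into a probability in (0, 1)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def classify(features, weights, bias, threshold=0.5):
    """Return 1 when the estimated probability reaches the threshold, else 0."""
    return 1 if predict_probability(features, weights, bias) >= threshold else 0


# Hypothetical coefficients; in practice they are estimated from training data.
weights = [2.0, -1.0]
bias = 0.0

p_neutral = predict_probability([0.0, 0.0], weights, bias)
label = classify([1.0, 0.5], weights, bias)
```

The size and sign of each weight indicate how strongly, and in which direction, the corresponding variable moves the predicted probability.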

2.4.3 Naive Bayes

Naive Bayes is one of the simplest techniques for constructing classifiers: models that assign class labels to problem instances represented as vectors of feature values, where the class labels are drawn from a finite set.

Figure 2.4.2 Formula of Naive Bayes

Naive Bayes can be trained on discrete data, classifies in a limited amount of time and is not sensitive to irrelevant features. Only a small amount of training data is required to estimate the parameters necessary for classification. Unfortunately, Naive Bayes assumes that the features are independent, which may cause a loss of accuracy.
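The independence assumption can be made explicit with the standard form of the Naive Bayes rule, given here as a reference sketch whose notation may differ from the figure. The symbols are assumptions: C is a class label and x_1, ..., x_n are the feature values of one instance.

```latex
P(C \mid x_1, \dots, x_n) \propto P(C) \prod_{i=1}^{n} P(x_i \mid C)
\qquad
\hat{y} = \arg\max_{C} \; P(C) \prod_{i=1}^{n} P(x_i \mid C)
```

The product over the individual features is only exact when the features are conditionally independent given the class, which is the source of the accuracy loss noted above.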


2.4.3 Decision Table

A decision table is a data mining and classification technique built on a hierarchical table, in which each entry in a higher-level table is split by the values of a pair of additional attributes to form another table. A visualization method is presented that lets a model with many attributes be understood even by users who are not familiar with machine learning. Various forms of interaction are used to make the visualization more useful and appropriate than a static design.

2.4.4 Sequential Minimal Optimization (SMO)

Sequential Minimal Optimization (SMO) is a fast data mining algorithm for training support vector machines. Training a support vector machine requires solving a very large quadratic programming (QP) optimization problem. SMO breaks this large QP problem down into the smallest possible QP problems, which are solved analytically. SMO can handle very large training sets, and its training time scales between linear and quadratic in the training set size for various test problems. In addition, SMO is particularly fast for linear support vector machines and sparse data sets (John Platt, 1998).


Figure 2.4.3 Formula of Sequential Minimal Optimization


2.5 Review Summary

Author: Shing-hon Lau, Roy Maxion et al., 2014
Title: Clusters and Markers for Keystroke Typing Rhythms
Algorithm: Agnes clustering, Sparse Logistic Regression, Support Vector Machine (SVM)
Advantages: Typists can be grouped into a small number of types. Each type is distinguished from the rest of the population by characteristic keystroke features.
Disadvantages: The work presented in the paper is only a preliminary investigation, leaving many stones unturned. Only one data set was examined; generalization to other data sets remains to be verified.

Author: Priyanka Mehtani et al., 2010
Title: Pattern Classification using Artificial Neural Networks (IRIS dataset)
Algorithm: Artificial Neural Networks, Probabilistic Neural Network (PNN), Optical Backpropagation Algorithm
Advantages: ANN gives the best classification accuracy on the IRIS dataset.
Disadvantages: The Optical Backpropagation Algorithm gives lower classification accuracy than the Artificial Neural Network.

Author: Gokhan Silahtaroglu et al., 2015
Title: Predicting gender of online customer using artificial neural networks
Algorithm: Artificial neural networks, K-Means, K-Medoids
Advantages: The tests suggest that predictions are accurate enough to be used for business purposes such as marketing and production. The paper argues for the reliability, accuracy and feasibility of predicting online customer gender.
Disadvantages: -

Author: Dr. Regan Mandryk, Clayton Epp, Mike Lippold et al., 20
Title: Identifying emotional states through keystroke dynamics
Algorithm: Decision Tree
Advantages: Determines the affective emotional state of the user without the user being aware of it or being continuously reminded that he is being recorded.
Disadvantages: Depending on the frequency of the sample period, the interruption to subjects' daily activities can be burdensome.

Author: Anushri Jaswante, Asif Ullah Khan, Bhupesh Gour et al.
Title: Back Propagation Neural Network Based Gender Classification Technique Based on Facial Features
Algorithm: Back Propagation Neural Network, Viola Jones Algorithm
Advantages: The proposed methodology gives 90% accurate results in identifying gender from images. The proposed system has low complexity, and its efficiency makes it a good choice for real-time implementations.
Disadvantages: -

Author: Pranjali Pohankar, Snehalata Karmare et al., 2014
Title: Character Recognition using Artificial Neural Network
Algorithm: Artificial Neural Network (Back Propagation Neural Network)
Advantages: The paper shows that a simple character recognition program can be designed. The algorithm used works on the gradient descent rule.
Disadvantages: Accurate handwritten character recognition is very difficult to achieve due to the great variation in writing style and the different sizes and shapes of the characters.

Author: Stephan Dreiseitl, Lucila Ohno-Machado, 2002
Title: Logistic Regression ANN
Algorithm: Logistic Regression, Artificial Neural Network
Advantages: Neural networks are better in terms of discrimination.
Disadvantages: The model building process is easier for logistic regression.

Author: Ripundeep Digh Gill and Ashima, 2014
Title: Understanding of Neural Networks
Algorithm: Neural Networks
Advantages: Neural networks offer significant learning abilities and are able to represent highly nonlinear and multivariable relationships.
Disadvantages: Lacks a comparison between neural networks and the logistic regression algorithm.

Author: Hongjun Lu et al.
Title: Decision Tables: Scalable Classification Exploring RDBMS Capabilities
Algorithm: Decision Table
Advantages: A novel approach to building efficient, scalable classifiers by exploring the capability of relational database management systems that support powerful data aggregation and summarization functions.
Disadvantages: -

Author: John Platt, 1998
Title: Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines
Algorithm: Sequential Minimal Optimization
Advantages: SMO is an improved training algorithm for SVMs. Its sub-problems are solved quickly and analytically, improving its scaling and computation time significantly.
Disadvantages: -


2.6 Summary

This chapter discussed the direction of the research and development that will be taken to meet the requirements of the system design. This is to ensure that the end product is able to predict or classify gender with high accuracy.


CHAPTER 3

METHODOLOGY

3.1 Introduction

This chapter explains the details of the methodology used. The project methodology should be organized in a systematic and scientific way to solve the problem and to ensure that the project objectives are achieved. This study focuses on a gender classification project based on data from a group of students, covering the DSS model and the testing and evaluation of its usability. Therefore, a suitable methodology needs to be adopted to ensure the successful completion of the project. The study first pre-processes the keystroke dynamics data collected from a group of students using a mobile-based system. Then, each data instance is classified according to the criteria it meets by using a neural network. The neural network gender classifier was developed using the waterfall model, which consists of a detailed plan describing how to develop, maintain, replace and alter the system with specific software tools.

3.2 Scientific research Method

The scientific research method is a process of experimentation that is used to explore observations and answer questions. The main purpose of an experiment is to determine whether observations agree or conflict with the predictions derived from a hypothesis. The experiment is designed so that changes to one item cause something else to vary in a predictable way.

Figure 3.2 Scientific Research Method

Based on observations of keystroke dynamics for gender classification, research questions are formed, which lead to the next step. At this stage, evidence from previous experiments, personal scientific observations and previous research is evaluated to formulate the question. The hypothesis is stated in a way that can easily be measured and is constructed to answer the research question. In this case, keystroke dynamics are proposed to be relevant to gender classification. A literature study of previous research is done to find the best way to do things and to avoid repeating past mistakes. The purpose of testing the hypothesis is to determine whether observations of the real world agree or conflict with the predictions derived from the hypothesis. After the testing phase, general theories are developed, but they must be consistent with most or all available data and with the other current theories.

3.3 Knowledge Discovery in Database

Knowledge Discovery in Databases (KDD) is the process of searching for useful information and patterns in data, and it consists of many steps. According to Gregory Piatetsky-Shapiro, Christopher Matheus, Padhraic Smyth, and Ramasamy Uthurusamy (1996), KDD refers to the process of discovering useful knowledge from data, which involves the evaluation and possibly the interpretation of the patterns to decide what qualifies as knowledge. The core of the process is data mining, the method for extracting and discovering patterns from data.

Figure 3.4 Knowledge Discovery in Database for Data Analysis

The data were taken from the typing rhythms of 51 students, as collected in a previous research paper. The dataset contains 954 instances and 54 attributes. The attributes are keystroke timing features (flight times and dwell times) captured on a mobile phone soft keyboard: release-to-press (RP), the time interval between a key being released and the next key being pressed; press-to-press (PP), the time interval between two successive key presses; release-to-release (RR), the time interval between two successive key releases; and press-to-release (PR), the time interval between a key being pressed and being released. All of them are expressed in milliseconds.
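The four timing features can be illustrated with a short Python sketch that derives them from per-key press and release timestamps. The function name, the input layout and the sample timestamps are hypothetical; the number of intervals it yields per typing sample follows from this simple pairing of consecutive keys and is not meant to reproduce the exact attribute layout of the dataset.

```python
def keystroke_features(press_times, release_times):
    """Compute PR, PP, RR and RP intervals (ms) from per-key timestamps."""
    if len(press_times) != len(release_times):
        raise ValueError("each key needs one press and one release timestamp")
    n = len(press_times)
    # PR (dwell time): press of a key to the release of the same key.
    pr = [release_times[i] - press_times[i] for i in range(n)]
    # PP: press of one key to the press of the next key.
    pp = [press_times[i + 1] - press_times[i] for i in range(n - 1)]
    # RR: release of one key to the release of the next key.
    rr = [release_times[i + 1] - release_times[i] for i in range(n - 1)]
    # RP: release of one key to the press of the next key
    # (negative when the next key is pressed before the previous one is released).
    rp = [press_times[i + 1] - release_times[i] for i in range(n - 1)]
    return {"PR": pr, "PP": pp, "RR": rr, "RP": rp}

# Hypothetical timestamps in milliseconds for three consecutive keys.
press = [0, 180, 390]
release = [95, 270, 480]
features = keystroke_features(press, release)
print(features)
# {'PR': [95, 90, 90], 'PP': [180, 210], 'RR': [175, 210], 'RP': [85, 120]}
```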

3.3.1 Attribute Selection

Attribute selection, also known as feature selection, is the process of selecting a subset of relevant features (variables, predictors) for use in model construction. Its purpose is to reduce training time, to simplify the model so that it is easier to interpret, and to reduce over-fitting by enhancing generalization.

3.3.2 Data Pre-processing

Data pre-processing is the transformation of raw data into an understandable format. Real-world data is commonly incomplete, inconsistent and likely to contain errors. Techniques for cleaning such data include replacing a missing value with the mean of the attribute and removing records with missing values.
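The two cleaning techniques named above can be sketched in plain Python. This is an illustrative sketch only: the function names are hypothetical, missing values are assumed to be represented as None, and the sample rows are made up.

```python
def impute_with_mean(rows):
    """Replace missing (None) entries in each column with that column's mean."""
    n_cols = len(rows[0])
    means = []
    for col in range(n_cols):
        present = [row[col] for row in rows if row[col] is not None]
        means.append(sum(present) / len(present) if present else None)
    return [[means[c] if row[c] is None else row[c] for c in range(n_cols)]
            for row in rows]

def drop_incomplete(rows):
    """Remove every record that has at least one missing value."""
    return [row for row in rows if all(value is not None for value in row)]

# Made-up timing values in milliseconds, with two missing entries.
rows = [[100.0, 80.0], [None, 90.0], [140.0, None]]
imputed = impute_with_mean(rows)
complete = drop_incomplete(rows)
print(imputed)   # [[100.0, 80.0], [120.0, 90.0], [140.0, 85.0]]
print(complete)  # [[100.0, 80.0]]
```

Imputation keeps all records at the cost of introducing estimated values, while dropping records keeps only observed values at the cost of a smaller dataset.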

3.3.3 Data Transformation

Data transformation is the process of converting data from the format of a source system into the format of a destination system. In this case, the dataset is in CSV format. In order to train the model using Weka, the CSV dataset must be converted into ARFF format before the training and testing phase. The CSV format also restricts the use of special characters in the dataset.
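A minimal sketch of such a conversion is shown below, assuming that every column is numeric except one nominal class column. The function name, the relation name, the column names and the sample rows are assumptions made for illustration; this is not the conversion procedure used in the project.

```python
import csv
import io

def csv_to_arff(csv_text, relation, class_column):
    """Convert CSV text to ARFF text; the class column is nominal, all others numeric."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    class_index = header.index(class_column)
    # The nominal class attribute lists every label that occurs in the data.
    class_values = sorted({row[class_index] for row in data})
    lines = ["@relation " + relation, ""]
    for name in header:
        if name == class_column:
            lines.append("@attribute %s {%s}" % (name, ",".join(class_values)))
        else:
            lines.append("@attribute %s numeric" % name)
    lines += ["", "@data"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines) + "\n"

# Made-up two-row sample with two timing attributes and a gender class (L or P).
sample_csv = "PP1,PR1,gender\n180,95,L\n210,90,P\n"
arff_text = csv_to_arff(sample_csv, "keystroke", "gender")
print(arff_text)
```

The ARFF header declares each attribute and its type explicitly, which is the main structural difference from a plain CSV file.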


3.3.4 Data Mining

Data mining is the process of extracting information from a data set and transforming it into a comprehensible structure for future use. The data mining algorithm, an Artificial Neural Network, is applied to mine the data by discovering the relationships within it.

3.3.5 Interpretation/Evaluation

The extracted data is interpreted into new knowledge. The relationship between the selected attributes and the class is shown by the accuracy of the model. In this case, the model needs to have high accuracy in distinguishing gender.


3.4 Software and hardware requirements

This section lists all of the software and hardware used to develop the project in an efficient way.

3.4.1 Software Requirement

No  Software               Purpose
1   Microsoft Office 2016  Tool for writing the report, proposal and Gantt chart
2   Paint                  Tool for cropping and editing images
3   XAMPP v3.2.1           Tool to set up and run localhost
4   Google Chrome          Browser to open and run localhost
5   Dropbox 3.18.1         Tool for backing up data in cloud storage
6   MySQL Workbench 17.0   Tool for checking SQL syntax
7   Weka 3.6               Tool used for data analysis and data modelling

3.4.2 Hardware requirement

No  Hardware     Specification
1   Laptop       HP 14 Notebook PC
2   Processor    Intel(R) Core i3
3   Memory       4.00 GB RAM
4   Hard disk    Samsung SSD 500GB
5   System type  64-bit Operating System
6   Pendrive     Kingston 2GB


3.5 System Framework

FIGURE 3.6.1 System Framework

A framework is a conceptual structure that guides the development of something into a useful structure. This project is divided into two phases: data pre-processing, and training and testing the data using the Weka tool. The data used are keystroke dynamics from 51 students, who were required to type a 13-character sentence (password) while their keystroke dynamics were captured on a touchscreen keyboard. The dataset contains 54 attributes and 954 instances. The attributes are the flight times and dwell times of the key presses: PP (the time interval from key press to key press), PR (the time interval from key press to key release), RP (the time interval from key release to key press) and RR (the time interval from key release to key release), all expressed in milliseconds. Each student typed the 13-character password an average of 15 to 20 times, which produced the 954 instances.

The data were converted from CSV format into ARFF format for use in the training and testing phase. The pre-processing phase included removing the unneeded attributes that are not required in this study. In the phase where the artificial neural network algorithm is applied in the Weka tool, the data are divided using an 81% percentage split: 81% of the data are used to train the model and the remaining 19% to test its accuracy. The 19% test portion corresponds to 10 of the 51 students, which gives 181 test instances. The expected result is the best and highest accuracy in gender classification based on keystroke dynamics features.
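The arithmetic behind the percentage split can be checked with a small Python sketch. The function name, the shuffling step, the random seed and the rounding of the training-set size are assumptions made for illustration; the splitting procedure inside Weka may order and round the data differently.

```python
import random

def percentage_split(instances, train_percent, seed=1):
    """Shuffle the instances and split them into training and test sets."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    # Assumption: the training-set size is rounded to the nearest whole instance.
    n_train = round(len(shuffled) * train_percent / 100)
    return shuffled[:n_train], shuffled[n_train:]

# 954 instance indices stand in for the 954 typing samples.
train, test = percentage_split(range(954), 81)
print(len(train), len(test))  # 773 181
```

With 954 instances, an 81% split leaves 181 instances for testing, which matches the size of the test set reported above.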

3.6 Datasets

3.6 Table of Dataset Keystroke Dynamic

Table 3.6 shows half of the keystroke dynamics dataset, which uses timing-based features: flight times and dwell times. Column A, PP, represents the time interval between key press and key press for the 13 characters of the word RHUUNIVERSITY. Column B, PR, represents the time interval between key press and key release, while Column C, RR, represents the time interval between key release and key release. Column D, RP, represents the time interval between key release and key press. Lastly, Column E shows the gender class of each of the 954 instances, based on the four types of attributes mentioned before: PP, PR, RR and RP. All of these data are expressed in milliseconds. The dataset is taken from a previous research paper and has been customized to suit this research, which focuses on classifying gender.

3.7 Summary

This chapter presented all the methodologies used by the prediction system. It also explained the hardware and software required for this project. Every phase of the project was briefly explained for better understanding. This chapter also explained the design and modelling of the system.


CHAPTER 4

RESULTS & DISCUSSION

4.1 Introduction

This chapter presents the experimental results and an analysis of the proposed technique, the Artificial Neural Network. The experimental results from the testing phase show the accuracy, expressed as the percentage of correctly classified instances, that the algorithm achieves in this project. The project also compares the proposed algorithm with other algorithms to show that it gives the best accuracy and is more reliable than the other four algorithms discussed in Chapter 2.

4.2 Experimental Results

In this study, the proposed algorithm for classifying gender based on keystroke dynamics features is implemented using the WEKA tool, which trains and tests on the keystroke dynamics data of 51 students, consisting of 954 instances and 54 attributes, as shown in Figure 4.2. The accuracy of the proposed algorithm, the Artificial Neural Network, is compared with four other algorithms: Logistic Regression, Naive Bayes, Decision Table and Sequential Minimal Optimization. This is to show that the proposed algorithm gives the best accuracy, with high recall and precision, in classifying gender based on typing rhythm.

A percentage split is used to evaluate the accuracy of the classifier, with 81% of the data for training and the other 19% for testing. The same percentage split is used for every algorithm to avoid evaluation errors and to keep the comparison unbiased. The 954 labelled instances come from 51 students, each of whom typed the 13-character word an average of 18 times. The keystroke dynamics data are time-based and expressed in milliseconds.


Figure 4.2 Graph of 54 attribute of different keystroke dynamic

Figure 4.2 compares the 54 attributes between the two genders based on their typing rhythm, showing the difference in the time interval of each keystroke between male and female (blue represents male and red represents female). The attributes from the first (PP1) to the thirteenth (PP13) represent the time interval between a key press and the next key press. In this case there are 13 characters (RHUUNIVERSITY), namely R-H-U-U-N-I-V-E-R-S-I-T-Y. The attributes from the first (PR1) to the thirteenth (PR13) represent the time interval between a key press and the key release for the 13 characters. In addition, the attributes from the first (RR1) to the fourteenth (RR14) represent the time interval between a key release and the next key release. Lastly, the attributes from the first (RP1) to the thirteenth (RP13) represent the time interval between a key release and the next key press. All of these keystroke features are useful for classifying gender.

The precision, recall (true positive rate) and accuracy (correctly classified instances) metrics are obtained after the classifier is run with a percentage split of 81% for training and 19% for testing.

4.3 Comparison in Accuracy of Data Model

4.3.1 Comparison between data model based on Precision, Recall and F-Measure


Table 4.3.2 Comparison Between data mining

Table 4.3.1 shows that the Artificial Neural Network gives the best result, with the highest accuracy compared with Naive Bayes, Decision Table, Sequential Minimal Optimization and Logistic Regression. Based on Table 4.3.2, the precision is 0.767 for the L class (male) and 0.7691 for the P class (female). The recall, which is equal to the true positive rate, is 0.856 and 0.649 respectively. The table also shows that Logistic Regression is the second-best algorithm in model accuracy, followed by SMO, Decision Table and Naive Bayes.


4.4 Duration of Building Data Model

The time taken to build the model is 46.53 seconds for the artificial neural network algorithm, 0.19 seconds for Naive Bayes, 2.18 seconds for Decision Table, 0.6 seconds for SMO and 1.74 seconds for logistic regression. The longer build time may be a factor in the artificial neural network giving the highest accuracy, as it takes longer to compute the weighted sums of the inputs.

4.5 Artificial Neural Network Algorithm Confusion Matrix

A confusion matrix is a technique for summarizing the performance of a classification algorithm. Classification accuracy alone may be misleading or biased if the number of observations in each class is unequal. Calculating the confusion matrix shows what the classification model gets right and what types of errors it makes. In simple words, a confusion matrix is a summary of the prediction results on a classification problem: the numbers of correct and incorrect predictions are summarized and broken down by class. Table 4.5.1 below shows the confusion matrix of the artificial neural network algorithm based on gender.

               Positives (L)  Negatives (P)
Positives (L)  TP (a) 89      FP (b) 15
Negatives (P)  FN (c) 27      TN (d) 50

Table 4.5.1 Confusion matrix of the artificial neural network algorithm based on gender


4.5.1 Calculation

False-negative rate = c/(a+c) = 27/(89+27) = 0.233

False-positive rate = b/(b+d) = 15/(15+50) = 0.231

Positive-predictive value = a/(a+b) = 89/(89+15) = 0.856

Negative-predictive value = d/(c+d) = 50/(27+50) = 0.649

Sensitivity (power) = a/(a+c) = 89/(89+27) = 0.767

Specificity = d/(b+d) = 50/(15+50) = 0.769

Efficiency = (a+d)/(a+b+c+d) = (89+50)/(89+15+27+50) = 0.768
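The same calculations can be reproduced with a short Python sketch, using the cell labels a, b, c and d of the confusion matrix above. The function name and the dictionary keys are hypothetical; the formulas are the ones listed in this section, with results rounded to three decimal places.

```python
def confusion_metrics(a, b, c, d):
    """Rates derived from a 2x2 confusion matrix with cells TP=a, FP=b, FN=c, TN=d."""
    return {
        "false_negative_rate": c / (a + c),
        "false_positive_rate": b / (b + d),
        "positive_predictive_value": a / (a + b),
        "negative_predictive_value": d / (c + d),
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "efficiency": (a + d) / (a + b + c + d),
    }

# Cell counts taken from the artificial neural network confusion matrix.
metrics = confusion_metrics(a=89, b=15, c=27, d=50)
rounded = {name: round(value, 3) for name, value in metrics.items()}
for name, value in rounded.items():
    print(name, value)
```

Note that the false-negative rate and the sensitivity always sum to 1, as do the false-positive rate and the specificity, which gives a quick consistency check on the figures.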


Algorithm Confusion Matrix

Artificial Neural Network

Naive Bayes

Decision Table

Sequential Minimal Optimization

Logistic Regression

4.5.2 Comparison of confusion matrix of 5 different data mining


4.6 Summary

This chapter presented all the results related to the study, which applies data mining to time-based keystroke dynamics features. A comparison of five algorithms, namely Naive Bayes, ANN, SMO, Logistic Regression and Decision Table, was also shown to differentiate their capability in terms of model accuracy. Moreover, the calculations based on the confusion matrix and the duration of building the data models were shown in this chapter.


CHAPTER 5

CONCLUSION AND FUTURE WORKS

5.1 Conclusion

This research aimed for high accuracy in modelling keystroke dynamics data for gender classification, and studied the effectiveness of five different algorithms in classifying gender in order to show that the proposed algorithm models the data best. The gender class is determined from the difference in typing behaviour between the two genders, male and female, using the extracted keystroke features, flight times and dwell times, from 954 samples. Based on the data, the relationship between the class (L or P) and the timing-based keystroke dynamics attributes was found and substantiated using an Artificial Neural Network, also known as a Neural Network. In addition, the data model was tested to determine its accuracy. After testing, the model's predictions proved to be highly accurate.

In conclusion, the study of gender classification through dynamic keystroke based on mobile phone using an artificial neural network has two main phases: data pre-processing, and training and testing the data model. This scientific data on how keystroke dynamics distinguish gender can be useful in the cyber security environment when the model is implemented in the real world, as it narrows down the possible identity of a cyber criminal by identifying their gender. The approach is cost-effective compared with other biometric approaches to identifying criminals, because it requires no costly installed hardware and uses people's natural typing behaviour on a touchscreen keyboard. Hopefully, this project can contribute to cyber security departments in analysing gender differences in typing behaviour, as it can reduce the potential for computer crimes such as hacking, phishing and online scams, fraud, and child soliciting and abuse: the gender of the criminal can be analysed and monitored by the security department in order to take further action in preventing the crime.

5.2 Future Work

Based on the limitations of the work discussed, some improvements can be added to make this data model more achievable, viable and useful in the cyber security world. Adding more extracted keystroke dynamics features may give better accuracy in predicting and classifying gender. The features need not be limited to flight times and dwell times, which are time-based features only. The frequency of use of the Delete and Backspace keys can also be added, as it may also distinguish the gender class. In addition, a larger data sample would likely give better accuracy in the prediction model. There might be attributes with a hidden relationship that were not found due to the limitations of the data provided.


REFERENCES

[1] Stephen Mayhew. "History of Biometrics | BiometricUpdate". January 14, 2015. [Online]. Available: http://www.biometricupdate.com/201501/history-of-biometrics. [Accessed: 26-Apr-2017].

[2] Thakshila R. Kalansuriya and Anuja T. Dharmaratne. "Neural Network based Age and Gender Classification for Facial Images".

[3] Bechtel, Jason, Serpen, Gursel, and Brown, Marcus. "Passphrase authentication based on typing style through an ART 2 neural network". In: International Journal of Computational Intelligence and Applications 2.02 (2002), pp. 131–152.

[4] Killourhy, Kevin S. "A Scientific Understanding of Keystroke Dynamics". School of Computer Science, Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213. January 2012.

[5] Bleha, Saleh Ali and Gillespie, Dave. "Computer user identification using the mean and the median as features". In: Systems, Man, and Cybernetics, 1998. 1998 IEEE International Conference on. Vol. 5. IEEE. 1998, pp. 4379–4381.

[6] Epp, Clayton, Lippold, Michael, and Mandryk, Regan L. "Identifying Emotional States using Keystroke Dynamics". In: Conference on Human Factors in Computing Systems. 2011.

[7] Furnell, Steven M., Morrissey, Joseph P., Sanders, Peter W., and Stockel, Colin T. "Applications of keystroke analysis for improved login security and continuous user authentication". In: Information systems security. Chapman & Hall, Ltd. 1996, pp. 283–294.

[8] Hocquet, Sylvain, Ramel, Jean-Yves, and Cardot, Hubert. "User classification for keystroke dynamics authentication". In: Advances in biometrics. Springer, 2007, pp.

[9] Ilonen, Jarmo. "Keystroke dynamics". In: Advanced Topics in Information Processing (2003), pp. 03–04.

[10] Jammi Ashok, Vaka Shivashankar, V.G.S. Mudiraj. "An Overview Of Biometrics". MCA Dept., Department of IT, Department of MCA, Adams Engg. College, GCET, Hyderabad, India.

[11] L. I. Kuncheva. Combining Pattern Classifiers: Methods and Algorithms. Wiley-Interscience, 2004.

[12] Gutierrez, F.J., Lerma-Rascon, M.M. et al. "Biometrics and Data Mining: Comparison of Data Mining-Based Keystroke Dynamics Methods for Identity Verification". Lecture Notes in Computer Science. 2002, pp. 221–245.

[13] Rodrigues, Ricardo Nagel et al. "Biometric access control through numerical keyboards based on keystroke dynamics". In: Advances in biometrics. Springer, 2005, pp. 640–646.

[14] Hempstalk, Kathryn. "Continuous typist verification using machine learning". PhD thesis. The University of Waikato, 2009.

[15] Hongjun Lu et al. "Decision Tables: Scalable Classification Exploring RDBMS Capabilities".

[16] Mohamad El-Abed, Mostafa Dafer, Ramzi El Khayat. "A mobile-based benchmark for keystroke dynamics systems". Rafik Hariri University, Meshref, Lebanon, 2014.


Gantt Chart (FYP 1)

Week: 1 2 3 4 5 6 8 9 10 11 12 13 14 15 16

Activity:
Discuss the title for the project with supervisor
Submission of project title and abstract
Precision problem statement, objective, scope and literature review
Presentation Preparation
Proposal Presentation
Proposal Correction
Design CD, ERD, DFD
Prepare documentation of proposal
Proposal slide presentation
Designing the interface
Final Presentation FYP1
Report Submission
Final Submission to Supervisor


Gantt Chart (FYP 2)

Week: 1 2 3 4 5 6 8 9 10

Activity:
Project Meeting with Supervisor
Project Development
Testing and Documentation
Project Progress Presentation, Panel's Evaluation
Project Development & Testing
Report, Seminar Registration
Seminar Presentation and Panel's Evaluation
Finalizing Report and Documentation of the Project
Report, Logbook Submission

