Page 1: Copyright by Rima Hirendrasinhji Rana 2019

Copyright

by

Rima Hirendrasinhji Rana

2019


The Thesis Committee for Rima Hirendrasinhji Rana

certifies that this is the approved version of the following thesis:

International Identity Protection

APPROVED BY

SUPERVISING COMMITTEE:

Kathleen Suzanne Barber, Supervisor

Razieh Nokhbeh Zaeem


International Identity Protection

by

Rima Hirendrasinhji Rana

Thesis

Presented to the Faculty of the Graduate School of

The University of Texas at Austin

in Partial Fulfillment

of the Requirements

for the Degree of

Master of Science in Engineering

The University of Texas at Austin

May 2019


To my parents, Sadhna Rana and Hirendrasinhji Rana, and my sister, Dr. Urvashi Rana, who always inspire and support me.


Acknowledgments

First of all, I would like to sincerely thank my supervisor, Dr. Suzanne Barber, for her exemplary and invaluable guidance in both academic and non-academic pursuits. Her conscientious insight and encouragement were a constant source of motivation for my research work. I would also like to extend my heartfelt gratitude to Dr. Razieh Nokhbeh Zaeem for her tremendous support in my research. I would once again like to thank Dr. Barber and Dr. Zaeem for being on my thesis committee as supervisor and reader, respectively.

My special thanks to James Zaiss, whose work on the ITAP project has been a crucial source of reference for this research. I am also thankful to my peers in the Center for Identity research group, David Liau, Teng Chieh Huang, Chia "Christie" Ju and Kai Chih Chang, and to all other members of the group for their stimulating discussions and overall support. I would also like to thank my ECE graduate advisor, Ms. Melanie Gulick, for her help and Ms. Stephanie Michelle for facilitating my appointment with the Center for Identity research group.

I am also deeply grateful to my family and friends for their endless love and support. Last but not least, I thank the Almighty for being so kind to me.


International Identity Protection

Rima Hirendrasinhji Rana, M.S.E.

The University of Texas at Austin, 2019

Supervisor: Kathleen Suzanne Barber

With the global reach of the internet, protecting identity and privacy has become a concern of paramount importance across the world. A person’s identifying data, known as Personally Identifiable Information (PII), is at risk of identity theft and fraud in both the cyber and the physical world. To address this problem, the Center for Identity at the University of Texas at Austin has developed a project named the Identity Ecosystem, which delivers a framework for understanding the risk, loss value and relationships between PII attributes. This thesis uses the Identity Ecosystem to propose an international approach to understanding identity. It extends the mathematical representation and implementation model of the Identity Ecosystem, which represents PII attributes and relationships, to international PII. Previously, the model had been populated primarily with data about US theft and fraud cases, covering PII attributes used to transact crime as well as accidental exposure of PII attributes. This research describes how the content of the Identity Ecosystem and the resulting analysis change when PII attributes from international identity theft and fraud cases are incorporated. In addition to identity theft and fraud, this thesis compares and contrasts the notion of identity in different international legal business processes. This thesis seeks to provide a holistic picture of identity that includes both theft and legitimate scenarios internationally.

Furthermore, this thesis utilizes the Identity Ecosystem to recommend improvements for an internationally emerging identity management solution: blockchain-based identity management. In a blockchain-based identity solution, the user is given control of his/her identity by storing personal information on his/her device and choosing the identity verification document that is later used to create blockchain attestations. Yet blockchain technology alone is not enough to produce a better identity solution. Users have a choice of identity documents to provide but, without guidelines, they do not know which document poses a higher risk or liability. This research helps users make an informed decision by providing guidelines based on the Identity Ecosystem. To summarize, this work provides a means for designing more accurate models of PII in the international context.


Contents

Acknowledgments
Abstract
List of Tables
List of Figures
Chapter 1 Introduction
Chapter 2 Background: Identity Ecosystem Overview
2.1 Ecosystem Model
2.1.1 PII Node And Edge Model
2.1.2 Bayesian Network
2.1.3 Mathematical Model
2.2 Queries
2.2.1 Infer The Probability Of Breach Based On Evidence
2.2.2 Detect Most Probable Origin Of A Breach
2.2.3 Find Breach Hotspot
2.3 Identity Theft Assessment And Prediction
Chapter 3 Internationalization Of The Identity Ecosystem
3.1 Data Source
3.2 US Centric PII
3.3 International PII
3.4 Comparative Analysis
3.5 Impact Of GDPR On PII
Chapter 4 International Business Uses Of Identity
4.1 Countries And Usecases
4.2 Proposed Model
4.2.1 Edge: Breeder Relationship
4.3 Discussion
4.4 Future Work
Chapter 5 Assessment of Blockchain Identity Solutions with Identity Ecosystem
5.1 Introduction
5.2 Related Work
5.2.1 Blockchain And Its Advantages
5.2.2 Identity Management With Blockchain
5.2.3 Blockchain Identity Verification Solutions
5.2.4 How Is Blockchain-Based Identity Different Than Current Identity?
5.3 Methodology
5.3.1 Graph Statistics Approach
5.3.2 Query Approach
5.4 Evaluation
5.4.1 Results From Graph Statistics Approach
5.4.2 Results From Query Approach
5.4.3 Combined Results
5.5 Discussion
Chapter 6 Conclusion
Chapter 7 Future Work
Bibliography

List of Tables

Table 1: List of properties for PII attributes.
Table 2: List of edge relationships between PII attributes.
Table 3: Classification of US scenarios.
Table 4: List of countries covered for international scenarios.
Table 5: Categorization of 30 PII attributes from the latest ITAP.
Table 6: Risk of exposure for US and International ITAP for some PII attributes.
Table 7: Financial consequences (in USD) for US and International ITAP for some PII attributes.
Table 8: Data used for US and International PII.
Table 9: Prior Probability and Intrinsic Loss Value.
Table 10: Outdegree/Number of Children.
Table 11: Indegree/Number of Parents.
Table 12: Number of nodes in tree of depth 2.
Table 13: Count of high risk nodes after running query.
Table 14: Count of medium risk nodes after running query.
Table 15: List of high risk nodes after running query for Email Address and Phone Number.
Table 16: List of high risk nodes after running query for Email Address, Phone Number and SSN.
Table 17: List of high risk nodes after running query for Email Address, Phone Number and Passport.
Table 18: List of high risk nodes after running query for Email Address, Phone Number and Driver’s License.
Table 19: List of high risk nodes after running query for Email Address, Phone Number and National Identity Card.
Table 20: Number of high risk nodes for query different than ones for query with evidence: Email Address and Phone Number.

List of Figures

Figure 1: Snapshot of Identity Ecosystem.
Figure 2: Results of the query: Infer Probability of breach.
Figure 3: Results of the query: Detect origin of breach.
Figure 4: Results of the query: Find breach hotspot.
Figure 5: Countries used for studying business use of identity.
Figure 6: Process for Driver’s License in South Africa.
Figure 7: Breeder edges between PII nodes for South Africa.
Figure 8: Loss Value of PII Node Set.

Chapter 1

Introduction

The reach of the internet has drastically affected our daily lives in a number of ways, such as online shopping, banking, social networking and social media. A person’s identity, represented by Personally Identifiable Information (PII), can encompass data describing both physical and digital attributes that serve to identify the person, with online and offline attributes increasingly intermingled. Examples of online attributes are one’s social media accounts, online shopping patterns, passwords, and email accounts. Offline attributes are those related to the physical world, such as bank accounts, credit and debit cards, Social Security Number, and one’s physical characteristics. The physical and online worlds have been converging. A person may hold a credit card, but all the information associated with the card and the person is represented and managed in digital form. Other examples include biometrics that represent physical characteristics of a person (e.g. fingerprints, face, gait) and information tracking one’s physical location, all of which is managed in digital form.

The rate of online availability is rapidly increasing, exposing personal data and leading to increased security risk and more breaches. A Javelin Strategy and Research report stated that PII misuse and fraud hit a record high with 15.4 million US victims in 2016, about 16% more than the previous year[9]. A comprehensive analysis of PII attributes and their relationships is necessary to protect users from identity theft. Personally Identifiable Information (PII) demands more attention for better understanding and security. It is evident that a comprehensive identity framework, built on a solid foundation of PII concepts, is required.

The Identity Ecosystem is a tool developed at the Center for Identity at the University of Texas at Austin that can be used to estimate the value of PII attributes, determine the connectedness and dependencies between PII attributes, and predict the risk of losing PII and the liability associated with the fraudulent use of the respective PII attributes[28, 20]. The Identity Ecosystem models every PII attribute as a graph node and every PII attribute relationship as a graph edge. The probabilistically determines relationship between two PII attribute nodes is very informative in this research analysis, since it represents the probability that PII attribute B can be determined if PII attribute A is known (exposed). The Ecosystem uses Bayesian Network inference to provide a better understanding of the risk of PII exposure and to give insights for protecting PII.

The Identity Ecosystem is now expanded to cover identity theft and fraud stories from across the globe. The previous version of the Identity Ecosystem was populated with stories from the USA only. However, with globally mobile citizens, the identity of an individual affects many people across different geographical locations. A person who travels to Australia from the USA may have multiple addresses. His/her digital device, such as a mobile phone or laptop storing personal information or credit card details, can be exposed in the USA or Australia. By building international PII, we obtain a comprehensive knowledge of PII in today’s connected world[25].

The international PII is studied in more detail to get a clearer picture of how identity is collected and used in many legitimate government business uses in different countries. This data can be used to provide a framework for analyzing the overlap of PII attributes and the differences and patterns in these use cases. International PII is thus not limited to identity theft but is also expanded to legitimate uses. The data from the legitimate use cases of PII is used to study a new type of relationship between PII attributes: breeder. The breeder relationship indicates whether an instance/value of PII A is needed in order to create a (legitimate or fraudulent/counterfeit) instance/value of PII B, where A is the source and B is the destination of the edge. This relationship can be used to denote the edges between the PII attributes in the Identity Ecosystem in order to compare the model of identity for each country in legitimate scenarios.

Furthermore, this thesis uses the Identity Ecosystem to recommend improvements to blockchain-based identity management, an internationally emerging identity management solution[24]. Identity Management (IdM) systems based on blockchain technology have been on the rise in the past decade. Blockchain, a decentralized ledger system, provides a unique answer to security and privacy concerns with its embedded immutability. In a blockchain-based identity solution, the user is given control of his/her identity by storing personal information on his/her device and choosing the identity verification document that is later used to create blockchain attestations. Yet blockchain technology alone is not enough to produce a better identity solution. The user cannot make informed decisions as to which identity verification document to choose if he/she is not presented with tangible guidelines. We analyze different PII options given to users for authentication on current blockchain-based solutions. Based on the Identity Ecosystem model, we evaluate these options and their risk and liability of exposure. Powered by real-world data from about 6,000 identity theft and fraud stories, the model recommends some authentication choices and discourages others. This work paves the way for a truly effective identity solution based on blockchain by helping users make informed decisions and motivating blockchain identity solutions to introduce better options to their users. This work provides a holistic picture of international identity covering theft and fraud, legitimate use, and blockchain-based identity verification.

The rest of the thesis is organized as follows. Chapter 2 briefly introduces the Identity Ecosystem, covering the model used to represent identity attributes and the ITAP project used to populate the Ecosystem. Chapter 3 compares PII in the context of the USA with international PII, detailing the trends and relationships between the PII identifiers. Chapter 4 extends the international identity by diving deep into business uses of identity and introducing a new relationship between PII attributes. Chapter 5 explains blockchain-based identity with an emphasis on decentralized identity, and uses the Identity Ecosystem to contrast and understand the PII used by blockchain-based verification solutions. Chapter 6 concludes with a summary of the thesis work. Chapter 7 provides insights for future work.


Chapter 2

Background: Identity

Ecosystem Overview

A person’s identity can be represented as a set of data related to that person. Each data item is called a Personally Identifiable Information (PII) attribute; examples of these attributes are phone number, email and name. These attributes can be physical as well as online. Examples include physical attributes like a social security card and online attributes like an email or social media account. An attribute can be unique to an individual, like a Social Security Number (SSN), or shared by a group of people, like a zip code. To give a better understanding of our research, this chapter explains the basics and the mathematical model of the Identity Ecosystem along with its queries[28]. Lastly, we also cover the Identity Threat Assessment and Prediction (ITAP) project, which provides identity theft and fraud stories as input to the Ecosystem.


2.1 Ecosystem Model

The Identity Ecosystem is a graph-based identity model in which the nodes represent an individual’s identity attributes and the edges represent the relationships between these nodes, as shown in Figure 1[28]. It provides a statistical framework for understanding the value, risk and mutual relationships of PII attributes.

Figure 1: Snapshot of Identity Ecosystem.

2.1.1 PII Node And Edge Model

The Ecosystem models each PII attribute as a graph node. It can color or size the nodes based on their properties. The properties identified for a PII attribute in the Ecosystem are shown in Table 1. Identity attributes are related to each other in a number of ways. These relationships are modeled as edges between the nodes in the Ecosystem. The Ecosystem model defines various relationships between PII attributes, as shown in Table 2.

Table 1: List of properties for PII attributes.
1. Type (categorizes PII based on its nature): What You Are / Have / Know / Do.
2. Risk of exposure: Low, Medium, High.
3. Liability Value (the monetary loss if compromised): Low, Medium, High.
4. Possession (identifies if individuals necessarily have the PII): Essential, Accidental.
5. Verification Accuracy At Enrollment: Low, Medium, High.
6. Prevalence: Ubiquitous, Common, Rare.
7. Uniqueness: Individual, Small Group, Large Group.
8. Verification Invasiveness: Low, Medium, High.

Table 2: List of edge relationships between PII attributes.
1. A breeds B
2. A composed of B
3. A changes sensitive to B
4. A temporally precedes B
5. A determines B
6. A necessary for B
7. A probabilistically determines B

The most prevalent and presently used relationship is the probabilistically determines relationship, by which, if identity attribute A is exposed, the probability of exposing attribute B can be determined. For our research, we have used the probabilistically determines edges between all the nodes for US-centric vs. international PII and blockchain-based identity verification. We have proposed the breeder relationship edge between PII attributes for the legitimate business use of identity.
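As a rough illustration of the node and edge model, the properties of Table 1 and the relationships of Table 2 can be written down as plain data types. This is only a sketch: the class names, field names and example values below are hypothetical and are not taken from the Identity Ecosystem implementation or from ITAP data.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EdgeType(Enum):
    """The seven edge relationships listed in Table 2."""
    BREEDS = "breeds"
    COMPOSED_OF = "composed of"
    CHANGES_SENSITIVE_TO = "changes sensitive to"
    TEMPORALLY_PRECEDES = "temporally precedes"
    DETERMINES = "determines"
    NECESSARY_FOR = "necessary for"
    PROBABILISTICALLY_DETERMINES = "probabilistically determines"


@dataclass(frozen=True)
class PIINode:
    """A PII attribute carrying a few of the Table 1 properties."""
    name: str
    pii_type: str          # What You Are / Have / Know / Do
    risk_of_exposure: str  # Low / Medium / High
    liability_value: str   # Low / Medium / High
    uniqueness: str        # Individual / Small Group / Large Group


@dataclass(frozen=True)
class PIIEdge:
    """A directed relationship from one PII attribute to another."""
    source: str
    target: str
    kind: EdgeType
    probability: Optional[float] = None  # only meaningful for probabilistic edges


# Hypothetical example values, chosen for illustration only.
ssn = PIINode("Social Security Number", "What You Have", "Medium", "High", "Individual")
edge = PIIEdge("Social Security Number", "Bank Account",
               EdgeType.PROBABILISTICALLY_DETERMINES, probability=0.3)
breeder = PIIEdge("Birth Certificate", "Passport", EdgeType.BREEDS)
```

Keeping the edge type explicit lets the same graph hold both the probabilistically determines edges and the proposed breeder edges.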

2.1.2 Bayesian Network

A Bayesian Network is a probabilistic graphical model (a type of statistical model) that represents a set of random variables and their conditional dependencies via a directed acyclic graph (DAG)[3]. A Bayesian Network is an appropriate fit for the Identity Ecosystem for three reasons. Firstly, Bayesian Networks help in understanding causal relationships, which are natural and universal among PII attributes. Causal relationships help us to understand the problem domain and also allow predictions in the presence of interventions. Secondly, a Bayesian Network is able to handle incomplete data sets. The Identity Ecosystem presently gets its data from online identity theft stories, which are continuously growing and sometimes not very accurate. Lastly, Bayesian Networks, along with statistical techniques, allow domain knowledge and data to be combined. This helps especially because the Identity Ecosystem’s data varies in completeness and scale[31].

2.1.3 Mathematical Model

The Identity Ecosystem stores known data about identity attributes in a probabilistic model and performs Bayesian Network-based inference to calculate the posterior effects on each attribute. The Identity Ecosystem is represented as a graph $G(V, E)$ consisting of $N$ attributes $A_1, \dots, A_N$ and a set of directed edges, each written as a tuple $e_{i,j} = \langle i, j \rangle$, where $A_i$ is the originating node and $A_j$ is the target node such that $1 \le i, j \le N$, as shown in Figure 1. Each edge $e_{i,j}$ represents a possible path by which $A_j$ can be breached given that $A_i$ is breached. Every edge has a conditional probability attached for that path.

Every node $A_i$ is labeled with a Boolean random variable, denoted $D(A_i)$, which is true if the attribute has been exposed/breached and false otherwise. Every parent $A_i$ affects the probability of exposure for its child $A_j$.

Every node also has a prior probability $P(A_i)$, which is the probability of the attribute being exposed on its own. It varies across PII attributes, as some PII nodes, like date of birth, are more easily exposed than others, like SSN. Every node also has a monetary loss value $L(A_i)$, which represents the intrinsic loss incurred if the attribute is breached.
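The model above can be sketched as a small data structure: one mapping for the priors P(A_i), one for the loss values L(A_i), and one from edges e_{i,j} to their conditional probabilities. The numbers below are hypothetical and the helper names are assumptions made for this sketch. Because a Bayesian Network must be a DAG, the sketch also includes a topological sort that rejects cyclic graphs.

```python
from collections import defaultdict, deque

# Hypothetical numbers for illustration only; they are not ITAP values.
prior = {"Date of Birth": 0.30, "SSN": 0.05, "Bank Account": 0.02}    # P(A_i)
loss = {"Date of Birth": 10.0, "SSN": 500.0, "Bank Account": 2000.0}  # L(A_i) in USD
# Edge (A_i, A_j) -> probability that A_j is breached given A_i is breached.
edges = {
    ("Date of Birth", "SSN"): 0.10,
    ("SSN", "Bank Account"): 0.40,
}


def parents(node, edges):
    """Return the originating nodes of all edges that point at `node`."""
    return sorted(src for (src, dst) in edges if dst == node)


def topological_order(nodes, edges):
    """Order nodes so that parents come before children (Kahn's algorithm)."""
    indegree = {n: 0 for n in nodes}
    children = defaultdict(list)
    for (src, dst) in edges:
        indegree[dst] += 1
        children[src].append(dst)
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if len(order) != len(indegree):
        raise ValueError("graph has a cycle, so it is not a valid Bayesian Network")
    return order


order = topological_order(prior, edges)
```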

2.2 Queries

The Identity Ecosystem can answer relevant questions about a person’s overall risk and liability in managing identity attributes. At present, three types of queries are supported in the Identity Ecosystem: infer the probability of breach based on evidence, detect the most probable origin of a breach, and find breach hotspots.

2.2.1 Infer The Probability Of Breach Based On Evidence

An Identity Ecosystem user can use this query to understand the magnitude and reach of the impact of exposing a set of PII attributes. For example, suppose someone exposed his/her Social Security Number by mistake and wants to know the result of the exposure. We can select the Social Security Number and Social Security Card as evidence and run the query, as shown in Figure 2. The result is a table of all the impacted PII attributes with the loss value estimated given the breach.

Figure 2: Results of the query: Infer Probability of breach.
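To make the idea of this query concrete, the following sketch pins the evidence nodes to probability 1 and pushes exposure probabilities forward along the edges. It is a deliberate simplification and not the Ecosystem’s inference engine: parents are combined with a noisy-OR rule that treats them as independent causes, evidence is only propagated downstream, and all numbers and names are hypothetical.

```python
def infer_exposure(prior, edges, order, evidence):
    """Forward-propagate exposure probabilities in topological order.

    Evidence nodes are pinned to probability 1. For every other node the
    parents are combined with a noisy-OR rule, which treats each parent
    as an independent cause (an assumption of this sketch).
    """
    prob = {}
    for node in order:
        if node in evidence:
            prob[node] = 1.0
            continue
        not_exposed = 1.0 - prior[node]
        for (src, dst), p in edges.items():
            if dst == node:
                not_exposed *= 1.0 - prob[src] * p
        prob[node] = 1.0 - not_exposed
    return prob


# Hypothetical priors, edge probabilities and loss values (USD).
prior = {"SSN": 0.05, "Bank Account": 0.02, "Credit Card": 0.03}
loss = {"SSN": 500.0, "Bank Account": 2000.0, "Credit Card": 300.0}
edges = {("SSN", "Bank Account"): 0.4, ("Bank Account", "Credit Card"): 0.5}
order = ["SSN", "Bank Account", "Credit Card"]  # parents before children

posterior = infer_exposure(prior, edges, order, evidence={"SSN"})
expected_loss = {n: posterior[n] * loss[n] for n in order}
for n in order:
    print(f"{n}: P(exposed)={posterior[n]:.3f}, expected loss={expected_loss[n]:.2f}")
```

The printed rows play the role of the result table described above: one line per impacted attribute with its estimated loss given the evidence.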

2.2.2 Detect Most Probable Origin Of A Breach

If a set of attributes has been exposed, what was the most likely origin of the breach? For example, if a person knows his/her Credit Card Number has been exposed, he/she wants to know the cause of this exposure. We can select Credit Card Number as evidence and run query 2, as shown in Figure 3. The user gets a bar graph showing the probable causes of the breach and can act on securing them.

Figure 3: Results of the query: Detect origin of breach.
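One simple way to picture this query is Bayes’ rule applied to the direct parents of the exposed node: each parent is scored by its prior times the conditional probability on its edge, and the scores are normalized. The sketch below assumes that exactly one parent caused the breach, which the real inference does not require; the function name and all numbers are hypothetical.

```python
def rank_origins(target, prior, edges):
    """Rank the parents of `target` by their share of the blame.

    Each parent i gets the score P(A_i) * P(target | A_i); the scores are
    normalized so that they sum to 1. This assumes exactly one parent
    caused the breach, which is a simplification.
    """
    scores = {src: prior[src] * p
              for (src, dst), p in edges.items() if dst == target}
    total = sum(scores.values())
    if total == 0:
        return []
    ranked = [(src, score / total) for src, score in scores.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical priors and conditional probabilities.
prior = {"Wallet": 0.10, "Online Shopping Account": 0.20, "Credit Card Number": 0.05}
edges = {
    ("Wallet", "Credit Card Number"): 0.6,
    ("Online Shopping Account", "Credit Card Number"): 0.2,
}
origins = rank_origins("Credit Card Number", prior, edges)
```

The ranked list corresponds to the bars of the result graph: the higher the normalized score, the more likely that attribute was the origin.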

2.2.3 Find Breach Hotspot

In the case of credit card number exposure, we can find the breach hotspots, i.e. the nodes whose exposure will cost the most in terms of total loss (intrinsic loss plus secondary loss downstream), so that we can prevent further breaches. On running query 3 with the credit card number as evidence, we get a list of nodes that are hotspots, as shown in Figure 4.
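The notion of total loss (intrinsic loss plus secondary loss downstream) can be sketched with a short recursion over the graph. In this simplified version, the downstream loss of a node is the probability-weighted total loss of its children; it assumes the graph is acyclic and it counts a descendant once per path, so it is an approximation rather than the Ecosystem’s actual computation. All names and numbers are hypothetical.

```python
def total_loss(node, loss, edges, memo=None):
    """Intrinsic loss of `node` plus probability-weighted loss downstream."""
    memo = {} if memo is None else memo
    if node not in memo:
        downstream = sum(p * total_loss(dst, loss, edges, memo)
                         for (src, dst), p in edges.items() if src == node)
        memo[node] = loss[node] + downstream
    return memo[node]


def hotspots(loss, edges, top=3):
    """Return the `top` nodes with the highest total loss, largest first."""
    totals = {n: total_loss(n, loss, edges) for n in loss}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:top]


# Hypothetical intrinsic loss values (USD) and edge probabilities.
loss = {"Email Address": 20.0, "Password": 50.0,
        "Bank Account": 400.0, "Credit Card": 300.0}
edges = {
    ("Email Address", "Password"): 0.5,
    ("Password", "Bank Account"): 0.6,
    ("Password", "Credit Card"): 0.5,
}
ranking = hotspots(loss, edges, top=2)
```

In this toy graph the password has a small intrinsic loss but ranks first, because its exposure opens paths to two more valuable attributes.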

2.3 Identity Theft Assessment And Prediction

The Identity Ecosystem currently takes its input from the Identity Threat Assessment and Prediction (ITAP) project. The ITAP[30] is a risk assessment tool that collects case data from sources like law enforcement and the news media. It significantly enhances the understanding of identity theft processes and patterns of threats and vulnerabilities. The ITAP models instances of identity crime and accumulates them to analyze and describe identity vulnerabilities, the value of identity attributes, and their risk of exposure[27, 29]. The ITAP model describes the business process (inputs, process steps, outputs, consequences, and victims impacted) by which PII is deliberately stolen, accidentally exposed, and fraudulently used[10]. The ITAP database is structured, elaborate and growing. The repository covers around 6,000 such stories to provide a comprehensive picture of identity theft. Earlier, the ITAP focused on identity crime stories based in the US, but it has now extended its coverage to many countries across the world. The cases analyzed in this thesis are the latest data from ITAP as of December 2018.

Figure 4: Results of the query: Find breach hotspot.


Chapter 3

Internationalization Of The

Identity Ecosystem

Identity theft and fraud are not just a US problem. According to a new report from

Risk Based Security, in 2016, there were 4,149 confirmed breaches exposing more

than 4.2 billion records globally. That is approximately 3.2 billion more records

than were exposed in 2013, the previous all-time high[9].

This chapter¹ describes how the content of the Identity Ecosystem and the resulting analysis change when PII attributes from international identity theft and fraud cases are incorporated. Not only are the PII attributes different in an international Identity Ecosystem, but the relationships between PII attributes, the monetization value of PII attributes, and the risk of exposure also change when worldwide identity theft and fraud cases are considered.

The previous Identity Ecosystem was limited to a model that utilizes PII stories based in the USA. However, with globally mobile citizens, the identity of an individual affects many people across different geographical locations. A person may have multiple addresses in different countries. His/her digital device, such as a mobile phone or laptop storing personal information or credit card details, can be exposed in any of those countries. A security incident that happens to the mobile device may expose personal information of family members or related organizations worldwide and result in financial losses across countries. By gathering international PII, we obtain a comprehensive knowledge of PII in today’s connected world.

¹ Rima Rana, Razieh Nokhbeh Zaeem, K. Suzanne Barber; US-Centric vs. International Personally Identifiable Information: A Comparison Using the UT CID Identity Ecosystem; In 2018 International Carnahan Conference on Security Technology (ICCST), pages 1-5, Oct 2018. Rima Rana is the primary author of this paper and worked on the execution, design and graph-based comparative analysis of US/International PII with respect to the Identity Ecosystem and drafted significant parts of the paper.

3.1 Data Source

The Identity Ecosystem is populated by modeling stories of real-world theft and fraud cases in the Identity Threat Assessment and Prediction (ITAP) project[30]. The ITAP is a risk assessment tool that collects case data from sources like law enforcement and the news media and has covered almost 6,000 identity theft and fraud scenarios. A team of modelers at the Center for Identity analyzes identity theft news and stories daily to model the value of identity attributes and their risk of exposure[27, 29]. Specifically, the ITAP model describes the business process (inputs, process steps, outputs, consequences, and victims impacted) by which PII is deliberately stolen, accidentally exposed, and fraudulently used. Earlier, the ITAP focused on stories in the US; now, with the inclusion of international PII attributes, it has become global, with around 12.3% (and increasing) of its cases being international. The ITAP story coverage is extensive and ranges from Australia, New Zealand, and the UK to China and India, with some representation from South Africa, South Korea, Singapore, Thailand and the Middle East, and even cybercrimes that are independent of geographical location. The ITAP analyzes fraud cases based on market sector, demographics of victims and, now, victims from different countries along with the US to compute a model of identity that provides a unique global calculation of risk of exposure and intrinsic loss value.

Table 3: Classification of US scenarios.
Category        Example of PII
Health related  Patient Medical Records, Health Insurance Policy Number
Bank related    CVV Code, Card Number, Account Number, Payment Reference Number
Citizenship     US Passport Number, Social Security Number, I-94
Cyber/Online    Email account, IP Address, User Credentials
Employer        IRS Documents, W2 Form, Job Title
Family          Spouse PII, Relatives PII

3.2 US Centric PII

The previous implementation of the Identity Ecosystem used ITAP identity theft and fraud stories based in the United States, enabling sophisticated inference about people, devices and organizations in the identity space. The financial consequences of exposure of the PII attributes in US cases are calculated and compared with those for international PII.

3.3 International PII

Identity breach is now a widespread problem across the globe, with an increasing degree of sophistication in the hacking involved. The Identity Ecosystem gets input from ITAP covering international hacks, attacks and breach stories. This helps in evaluating the risk and the effect on other PII for an individual or for related people, devices or organizations. The financial consequences of exposing international PII are multiplied when an organization has a customer base across continents, since the identities of many people are put at risk.

To understand Personally Identifiable Information in the international context, we consider not only identity thefts in different countries but also cyber attacks impacting individuals from more than one country. Recent data breaches at major internet companies, phishing scams, hacking, server breaches, malware attacks and many other types of cyberattacks compromise the online PII of an individual, which violates privacy in the cyber as well as the physical world.

Table 4: List of countries covered for international scenarios.

Afghanistan                 Austria       Australia    Azerbaijan
Balkans                     Bangladesh    Belgium      Bermuda
Brazil                      Canada        Chile        China
Colombia                    Denmark       England      Dubai
European Countries (Other)  France        Germany      Ghana
India                       Iran          Ireland      Israel
Italy                       Japan         Lithuania    Mexico
Middle East                 Netherlands   New Zealand  Nigeria
Pakistan                    Philippines   Puerto Rico  Romania
Russia                      Saudi Arabia  Scotland     Singapore
South Africa                South Korea   Spain        Sweden
Switzerland                 Syria         Taiwan       Thailand
Turkey                      UK            Ukraine      USA
Vietnam                     Zimbabwe

Though some identity attributes are common to US centric PII and international PII, the impact of exposure and the risk vary numerically, and even the relationships, i.e. the edges, change. For example, the PII node Social Security Number uniquely identifies a person in the US PII, but in the international PII it can be associated with a person of another nationality, and a breach of the SSN then multiplicatively increases the monetary consequences compared to the same breach in the US PII scenario.

For organizations, the impact of exposure of any international PII attribute node differs depending on the deployment and data storage solution. The risk and financial loss involved in international PII node exposure are observed to be lower for an on-premise solution than for an organization using a cloud solution [12].

Table 5: Categorization of 30 PII attributes from the latest ITAP.

US Centric                                Country Specific        Global
I-94 Record                               Visa                    Credit Card
Medicaid ID Number                        Medical Patient Record  Passport
W2                                        Tax Information         Email Account
Social Security Number                    Insurance               Date of Birth
IRS Login                                 Zip Code                IP Address
Driver and Vehicle Information Database   Doctors Name            Debit Card Number
National Identity Number                  Nationality             IMEI
Telmate Account Information               Stock Options           URL
Electronic Filling ID Information         City of Residence       Bitcoin
LSAT Score                                Court Documents         Security Question and Answer

3.4 Comparative Analysis

Previously, the Identity Ecosystem had focused on US theft and fraud stories. This

research describes how the content of the Identity Ecosystem and resulting analysis

change when PII attributes from international identity theft and fraud cases are

incorporated. To internationalize the Identity Ecosystem, international PII attributes are introduced from international theft and fraud stories. Consequently, the values of all PII attributes, US and international, change in terms of monetary loss for global breaches, and so do the existing relationships between PII attributes and the risk of PII exposure resulting from a data breach. For this global examination of PII, the attributes are divided into three categories based on scope, value and risk: US Specific, Country Specific and Globally Prevalent (Table 5). Distinguishing country specific and globally prevalent PII attributes, as

well as the relationship between PII attributes in the international context, improves

our understanding and fine-tunes the calculation of probabilities. This research aims

to enhance our understanding of the scope, value, and risk levels of PII attributes

in the international context, thereby, improving our understanding of how best to

protect these PII attributes from theft and fraud.

The categorization of PII attributes helps in understanding how a change of geographical location affects certain PII attributes and not others. A person’s online PII attributes, like email accounts and social media accounts, remain the same globally. On the other hand, PII attributes pertaining to tax or medical information are country specific. For such country specific attributes, the value of the PII attribute and its edges with other PII attributes, and hence its risk of exposure, tend to change with geographical location. Every PII attribute in both ITAP scenarios has a probability of being exposed, derived from its “probabilistically determines” edge relationships, which is calculated as the risk of exposure. Financial consequences are the dollar value associated with the theft of that PII according to the ITAP stories collected. Based on the identity theft stories covered in the US ITAP scenario and the international ITAP scenario, the risk of exposure and the financial losses associated with a PII attribute can be analyzed according to the category to which it belongs.

Table 6: Risk of exposure for US and International ITAP for some PII attributes.

PII Attribute             Type              US        International
IP Address                Global            0.001456  0.000534
Date of Birth             Global            0.233261  0.000178
Zip Code                  Country specific  0.003275  0.001246
Visa details              Country specific  0.000364  0.001246
W-2 Form                  US                0.004367  0.008188
National Identity Number  US                0.000728  0.003560

Table 7: Financial consequences (in USD) for US and International ITAP for some PII attributes.

PII Attribute             Type              US        International
IP Address                Global            3822967   7223816
Date of Birth             Global            6275159   10835724
Zip Code                  Country specific  116453    4728544
Visa details              Country specific  15291869  6328318
W-2 Form                  US                6855956   5025014
National Identity Number  US                7645934   5418592

Observing some of the PII attributes of each category (US specific, country specific and global), we can compare the risk of exposure and the financial consequences between the US and International ITAP. Attributes belonging to the Global PII, like Date of Birth and IP Address, increase in both risk of exposure and financial consequences as the ITAP theft stories go from US based to international. As seen in Tables 6 and 7, PII attributes like Zip code and Visa details, which are country specific, have a reduction in the risk of exposure; however, they witness an increase in the financial loss. Lastly, while US specific PII attributes such as W-2 form or National Identity Number have an increase in risk of exposure, they show a slight decrease in monetary losses.

Table 8: Data used for US and International PII.

                                           International ITAP  US ITAP
Number of Nodes                            542                 296
Number of Edges                            4742                1758
Number of Connected Nodes                  423                 232
Percentage of Connected nodes/Total nodes  78.04%              78.37%

Looking at the nodes used for international and US based ITAP data in Table 8, we observe that the number of PII nodes has almost doubled. The number of edges, i.e., “probabilistically determines” relationships, has almost tripled. The percentage of connected nodes remains constant at about 78%; however, with the increase in the number of edges, the financial impact of exposing one PII attribute increases multiplicatively in international scenarios.
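The growth figures quoted above follow directly from the counts in Table 8. The short sketch below reproduces that arithmetic; the variable and function names are ours and are not part of the Identity Ecosystem.

```python
# Node and edge counts copied from Table 8.
itap = {
    "US": {"nodes": 296, "edges": 1758, "connected": 232},
    "International": {"nodes": 542, "edges": 4742, "connected": 423},
}


def connected_share(stats):
    """Fraction of nodes that are connected to at least one other node."""
    return stats["connected"] / stats["nodes"]


def growth(metric):
    """International count divided by US count for one metric."""
    return itap["International"][metric] / itap["US"][metric]


node_growth = growth("nodes")
edge_growth = growth("edges")
us_share = connected_share(itap["US"])
intl_share = connected_share(itap["International"])

print(f"node growth: {node_growth:.2f}x, edge growth: {edge_growth:.2f}x")
print(f"connected share: US {us_share:.2%}, International {intl_share:.2%}")
```

The edge count grows faster than the node count, which is why the connected share stays near 78% while the number of relationships per node rises.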

3.5 Impact Of GDPR On PII

The General Data Protection Regulation (GDPR) aims to give citizens of the European Union (EU) control of their personal data [7]. In force since May 25, 2018, it provides rules and procedures for processing the Personally Identifiable Information of individuals inside the EU and applies to all organizations doing business in the European Economic Area, irrespective of location. GDPR provides EU citizens with the ‘right to be forgotten’, under which they can claim removal of their name alongside certain keywords in search engine results [15]. The ‘right of access’ lets EU citizens obtain a copy of their processed personal data, along with information about disclosure to third parties, storage, erasure, purpose and more [14]. Many companies have updated their privacy policies and informed their users and customers in order to comply with the GDPR [17]. This has also led companies to tighten their security and shift towards stronger privacy practices for all users, not just EU users, hence reducing the risk of exposure for international PII. Thus, GDPR can have a positive impact on the protection of international PII [13], reducing the monetary loss propagated by the exposure of international PII.

Chapter 4

International Business Uses Of

Identity

The Identity Ecosystem is presently powered by real world data from identity theft and fraud stories. The first version of the Ecosystem focused only on real theft stories from the USA and was later expanded to incorporate international fraud and breach stories. It is important not only to study the fraudulent and criminal usage of PII but also to extend the understanding of identity to its legitimate use.

In this chapter, we explore the global business uses of identity. We have selected 10 countries from different continents, along with the USA, and study a few use cases involving business use of PII. We note the process and the dependencies between the PII attributes involved in these operations. First we list the countries and use cases used for this research, and then we propose a model of an identity ecosystem for each country. In the future, we plan to compare the graph model of identity for each country and observe similarities and differences between their Ecosystem representations.

4.1 Countries And Usecases

The countries used for the study are shown in Figure 5 [11]. The countries selected are Brazil, Russia, India, China, South Africa, Australia, Argentina, Egypt, Iran, Turkey and the USA. We have selected countries from across the world to obtain a holistic representation and to compare legitimate identity usage in each country against the pattern in the USA.

Figure 5: Countries used for studying business use of identity.

Identity is used in different legitimate areas, public as well as private. In order to understand the global business use of identity, we propose to study scenarios from different sectors. The scenarios currently used in this ongoing research are:

• Driver’s License

• National Identification

• Medical Card

• Taxation Number/Card

• Credit Card

4.2 Proposed Model

The PII attributes used in each use case are extensively researched and compared for each country. We observe that each PII attribute can be represented as an identity node in the Identity Ecosystem, with a relationship between the input and output nodes: the breeder relationship.

4.2.1 Edge: Breeder Relationship

An edge between two nodes α and β is represented as a breeder relationship: α breeds β. It denotes that an instance/value of α is needed in order to create a legitimate or fraudulent/counterfeit instance/value of β, where each of α and β is a personally identifying information attribute. In this chapter, we focus on legitimate instance generation and define such relationships between PII attributes. It is also important to note that α alone might not be sufficient to breed β; other PII attributes may also be required.

We illustrate the breeder relationship between PII attributes with the legitimate generation of a Driver’s License in South Africa. Figure 6 shows the process of generating a driver’s license in South Africa [2, 6]; from this dataflow, we are able to ascertain the breeder relationships between the PII attributes involved, as shown in Figure 7. We can observe that ‘Name’ is used to breed ‘Learner’s License’, which in turn is used to breed ‘Driver’s License’. ‘Learner’s License’, along with all the other listed PII attributes, leads to the generation of a new instance of ‘Driver’s License’.
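The breeder relationship can be sketched as a small directed graph. The example below is a minimal illustration, not the Identity Ecosystem implementation: the class and method names are hypothetical, and only the two South African edges named in the text are included, since the remaining attributes of Figure 7 are not listed here.

```python
from collections import defaultdict


class BreederGraph:
    """Directed graph in which an edge a -> b means 'a breeds b'."""

    def __init__(self):
        # Maps each output attribute to the set of attributes that breed it.
        self.inputs = defaultdict(set)

    def add_breeder(self, source, target):
        """Record that `source` is needed to create an instance of `target`."""
        self.inputs[target].add(source)

    def direct_inputs(self, target):
        """Attributes that directly breed `target`."""
        return set(self.inputs.get(target, set()))

    def all_inputs(self, target):
        """Attributes that directly or transitively breed `target`."""
        seen, stack = set(), [target]
        while stack:
            node = stack.pop()
            for parent in self.inputs.get(node, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# Only the edges stated in the text; other inputs from Figure 7 are omitted.
south_africa = BreederGraph()
south_africa.add_breeder("Name", "Learner's License")
south_africa.add_breeder("Learner's License", "Driver's License")

print(south_africa.direct_inputs("Driver's License"))
print(south_africa.all_inputs("Driver's License"))
```

Storing the edges by output attribute makes the question "what is needed to create this document?" a direct lookup, which matches how the use cases in this chapter are collected.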

Figure 6: Process for Driver’s License in South Africa.

Figure 7: Breeder edges between PII nodes for South Africa.

4.3 Discussion

For each PII node, we plan to define properties that help to categorize the PII further in legitimate settings. Node properties such as type (what you are / have / know / do), value, level of assurance and country specificity can help to define each PII node further. We also plan to compare these with the PII properties derived from the ITAP identity theft stories and to study the in/out degree of each PII node. We would then like to compare the Identity Ecosystem for each country against that of the USA and deduce the similarities and differences in the PII identifiers involved. Finally, the Identity Ecosystem can be represented as a model of identity that takes input both from identity theft and fraud stories and from legitimate business uses.

4.4 Future Work

This work is ongoing and is currently at the data collection phase for each country and use case mentioned above. It can be used to represent the Ecosystem model for each country in legitimate settings and to compare the models to derive trends. Possible research questions that can be addressed using this work include:

• If a set of PII attributes breeds another set, what is the ratio between the risk

and value of the input set to the output set?

• In cases like breeding ‘Driver’s License’, which needs Proof of Identity and Proof of Address, a number of PII attributes can be used to satisfy the requirement. Which is the best set of PII to use to generate the ‘Driver’s License’ while minimizing risk and loss value?

• We observed during data collection that cyclic dependencies sometimes exist between PII attributes. For example, ‘Aadhar Identity Card’ is used to generate ‘India’s Driver’s License’, and ‘Driver’s License’ is also accepted as valid proof for generating ‘Aadhar Identity Card’. This type of cyclic dependency between PII attributes can be observed in multiple countries.
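Cyclic dependencies such as the Aadhar Identity Card and Driver’s License example can be detected automatically once the breeder edges are collected. The following is a minimal sketch using a standard depth-first search; the function name and the edge list format are assumptions made for illustration.

```python
def find_cycle(edges):
    """Return one cycle as a list of nodes, or None if the graph is acyclic.

    `edges` is an iterable of (source, target) pairs meaning 'source breeds target'.
    """
    graph = {}
    for source, target in edges:
        graph.setdefault(source, []).append(target)
        graph.setdefault(target, [])

    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for child in graph[node]:
            if color[child] == GRAY:
                # The child is already on the current path: a cycle is closed.
                return path[path.index(child):] + [child]
            if color[child] == WHITE:
                found = visit(child)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# Edges taken from the India example in the text.
india = [
    ("Aadhar Identity Card", "Driver's License"),
    ("Driver's License", "Aadhar Identity Card"),
]
# Edges taken from the South Africa example in Section 4.2.1.
south_africa = [
    ("Name", "Learner's License"),
    ("Learner's License", "Driver's License"),
]

print(find_cycle(india))
print(find_cycle(south_africa))
```

The recursive search is adequate for graphs of the size collected per country; a much larger graph would call for an iterative traversal.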

Chapter 5

Assessment of Blockchain

Identity Solutions with Identity

Ecosystem

5.1 Introduction

According to the Identity Theft Resource Center’s report, there was a 126% increase

in the number of breached records that contained sensitive Personally Identifiable

Information (PII) from 2017 to 2018 [1]. Identity breaches can happen in the most

unexpected places. Many victims reported incidents in which PII were compromised

through third party vendors[1], such as Identity management (IdM) systems. With

such a high rate of identity breaches, it becomes imperative to address the issue of

risk involved with PII in Identity management systems.¹

¹ Rima Rana, Razieh Nokhbeh Zaeem, K. Suzanne Barber; An assessment of blockchain identity solutions: minimizing risk and liability of authentication; In 2019 International Conference on Web Intelligence, under submission. Rima Rana is the primary author and, in collaboration with Razieh Nokhbeh Zaeem, contributed to the design, conception, interpretation and drafting of the research paper involving authentication in Blockchain-based Identity Solutions using Identity Ecosystem.

IdM systems based on the blockchain technology have been on the rise in the past decade. A blockchain is a shared database or decentralized distributed ledger.

The Bitcoin ledger was the first blockchain, but many more now exist across the world. A blockchain consists of chains of “blocks” containing information about transactions, participants, and identifiers. The features that make blockchain particularly interesting are immutability, security, and transparency.

In blockchain-based identity solutions, different applications and services are available. A self-sovereign identity solution gives individuals control of their personal information in a more secure environment [26]. The user first creates an account with the blockchain identity service and then logs into the account. The user then provides a government-attested identity document of his/her choice for verification, which is obfuscated to create blockchain attestations [8, 16, 5]. The IdM solution offers the user multiple choices of acceptable documents. The user, however, cannot make an informed decision about which identity verification document to use without proper guidelines. Yet, blockchain-based IdM solutions do not provide such guidelines.

We help the user of a blockchain-based IdM in making informed decisions

about what identity documents to use on the IdM. We achieve this goal by evaluating

the current blockchain-based identity verification solutions with the help of Identity

Ecosystem[28] which covers more than 6000 identity theft stories from across the

world[25].

In order to provide guidelines to users about choosing identity verification

documents wisely in the current blockchain identity solutions, we are using the PII

nodes’ graph properties and relationships and the first query of Identity Ecosystem.

We perform experiments and analyze the identity sets required in specific use cases

to recommend more secure authentication choices and avoid the risky and/or costly

PII. This work provides a holistic picture for the user to make informed decisions

on blockchain-based IdM systems and inspires the providers of these systems to

improve their IdM solutions[24].

The chapter is organized as follows: Section 5.2 covers the related work on blockchain, identity management with blockchain, and identity verification solutions based on blockchain. Section 5.3 describes our two approaches using the comprehensive identity framework model, the Identity Ecosystem. Section 5.4 presents a comprehensive analysis for evaluation. Finally, Section 5.5 discusses the results from the blockchain related research.

5.2 Related Work

5.2.1 Blockchain And Its Advantages

Blockchain is a new technology supporting distributed ledgers that does not need a central authority to validate transactions; instead, these transactions are shared amongst peers [22, 23]. It uses different types of consensus mechanisms to agree on the state of the transactions in its record, called the ledger [19]. Apart from cryptocurrency, blockchain has found many applications, including identity management. As previously suggested in related work [21, 4], the advantages of using blockchain for IdM are as follows:

• Decentralization: Identity stored in a ledger is not controlled by a single au-

thority.

• Immutability: Because transactions are append-only and verified by all members, the integrity of the ledger can be checked by anyone.

• Transparency: Everything on the ledger is visible to everyone.

• Security: As the blockchain is maintained and verified by so many actors, no one can influence its state without controlling a majority.

5.2.2 Identity Management With Blockchain

Blockchain-based identity solutions encrypt a user’s identity, hash it and add its attestations to the blockchain ledger. These attestations are later used to prove the user’s identity. Such solutions come in different forms [21]; this section covers the important schemes and concepts they use.

• Decentralized Identity: This identity solution is similar to conventional identity management solutions, where credentials from a trusted service are used. The only difference is that validated attestations are stored on a distributed ledger for later validation by a third party.

• Self-sovereign identity: A person is the one who owns and controls his/her

identity without heavily relying on central authorities. It provides a framework

to enable exchange of information and propagation of trust between peers.

• Zero knowledge proofs: It is a cryptographic method by which one party (the

prover) can prove to another party (the verifier) that they know a value x,

without conveying any information apart from the fact that they know the

value x. It is used in blockchain to perform authentication without giving the

secret to the other party [18].
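The attestation idea described at the start of this section, where a hash of an identity attribute rather than the attribute itself is placed on the ledger, can be sketched as follows. This is a simplified illustration under our own assumptions: it uses a salted SHA-256 digest, omits the encryption step, uses a plain list as a stand-in for the ledger, and does not reflect the actual mechanism of any of the cited products. It is also not a zero knowledge proof, because the verifier sees the disclosed value. The passport number is made up.

```python
import hashlib
import hmac
import secrets


def make_attestation(attribute_value, salt=None):
    """Return (salt, digest); only the digest would be written to a ledger."""
    if salt is None:
        salt = secrets.token_bytes(16)
    digest = hashlib.sha256(salt + attribute_value.encode("utf-8")).hexdigest()
    return salt, digest


def verify_attestation(attribute_value, salt, ledger_digest):
    """Recompute the digest from a disclosed value and compare it to the ledger."""
    _, digest = make_attestation(attribute_value, salt)
    return hmac.compare_digest(digest, ledger_digest)


ledger = []  # hypothetical stand-in for an append-only ledger

# Enrollment: the user keeps the value and the salt, the ledger stores the digest.
salt, digest = make_attestation("P1234567")
ledger.append(digest)

# Later verification: the user discloses the value and salt to a verifier.
print(verify_attestation("P1234567", salt, ledger[0]))
print(verify_attestation("P7654321", salt, ledger[0]))
```

The random salt prevents an observer of the ledger from confirming a guessed document number by hashing it, which matters for low-entropy identifiers.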

5.2.3 Blockchain Identity Verification Solutions

Our research focuses on the blockchain-based identity verification solutions. We

have studied different blockchain-based IdM services offered by multiple companies

like Authenteq [8], ShoCard [16], and Civic [5]. In all these services, the user first

logs in to the web/mobile app, and provides his email address and phone number

to create an account. The user then selects which government issued identity docu-

ments to use for identity verification, e.g., Passport, National Identity Card, Driving

License, or Social Security Number. The user scans one of these identity verification

documents and the app verifies the document against a third party. After the check,

the user identity is confirmed, attested on the blockchain and can be reused. Our

research seeks to help the user by providing guidelines to answer the following ques-

tion: “For blockchain-based identity verification, which is the best set of identity

attributes to use to minimize the risk of exposure and loss value while completing

the authentication process?” It is important to note that identity attributes are shared with third parties and that data is also stored on the individual’s device, creating a risk of breach [1]. Unsafe mobile apps and malware installed on a device can trigger a breach of the entire mobile device, putting the PII used in the blockchain-based IdM service at risk.

5.2.4 How Is Blockchain-Based Identity Different Than Current

Identity?

Digital identity is one of the biggest problems on the internet, and there are still many security issues surrounding proving a digital identity. However, with the concept of self-sovereign identity being implemented with blockchain, the change is in the actors who control identity rather than just in the technical process [21]. The individual is now able to control his/her identity by choosing which PII attributes to use for authentication and by storing the personal information, securely hashed, on the blockchain, available to everyone. This contrasts with the present system, where a person hands over the identity document to a third party. The third party controls and stores these documents in a central database and sometimes shares them without the individual’s knowledge. Yet a technologically better identity solution is not sufficient on its own if the user is not properly trained in how best to use the technology.

With blockchain-based identity, the individual’s control over identity increases, and with it the opportunity to make more informed decisions about which attributes to use for proving identity. This leads to the bigger question of how an individual can make a better decision in choosing which identity documents to use to prove his/her identity on blockchain-based services. We aim to provide helpful guidelines using the Identity Ecosystem, with which the risk of exposure and the impact of using different PII attributes for identity verification can be compared.

5.3 Methodology

In this section, we seek to provide guidelines about choosing identity documents

best suited for authentication while minimizing the risk and liability for blockchain-

based identity verification solutions. We use our previous work, the probabilistic

tool Identity Ecosystem, focusing on the graphical model and query used for our

research. We define two approaches using the Identity Ecosystem to apply to the

PII attributes from these solutions.

5.3.1 Graph Statistics Approach

Studying three popular blockchain-based identity verification solutions, we observed

the use of a limited set of PII (identity documents) for authentication [8, 16, 5]. The PII attributes used in these solutions are Email Address and Phone Number for account creation or enrollment, plus one of the following government-issued identity documents: Social Security Number, Driver’s License Number, Passport Information, or National Identity Number.

We use the Identity Ecosystem to investigate different properties associated with these PII nodes. As discussed in previous sections, each identity attribute is represented as a graph node in the Ecosystem. Each node has a prior probability, i.e., the probability that the node is exposed on its own, before any breach evidence set is given. It also has an intrinsic loss value in US dollars, based on all the researched identity theft and fraud cases involving that PII.

We tabulate the prior probability and intrinsic loss value for each of the PII nodes frequently used in the blockchain IdM solutions in Table 9.

Table 9: Prior Probability and Intrinsic Loss Value.

PII Node                  Prior Probability  Loss (USD)
Social Security Number    0.096598           27465086
Driver’s License Number   0.008719           2314811
Passport Information      0.002565           1252465
National Identity Number  0.000342           0
Email Address             0.027526           18105024
Phone Number              0.017439           4405490

Table 10: Outdegree/Number of Children.

PII Node                  Number of Children
Social Security Number    10
Driver’s License Number   21
Passport Information      11
National Identity Number  1

The Identity Ecosystem used for this research is modelled as a directed graph with identity attributes as nodes and “probabilistically determines exposure” relationships between them as edges [28]. For each PII of interest, we tabulate its outdegree, which is the number of children with loss values, and its indegree, which is the number of parents, in Tables 10 and 11 respectively. (A child in the Identity Ecosystem is a PII that might be exposed, with a certain probability, if its parent is exposed.) We also show the number of nodes present in a tree of depth 2 rooted at each PII node in Table 12. The total loss value for different sets of PII attributes used for identity verification is shown in Figure 8. The impact of a breach of a PII attribute is determined by its children, and its risk of exposure by its parents. We analyze these numbers, determine trends and discuss the results in a later section.
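The three graph statistics used in this approach can be computed with a few lines of code. The sketch below runs on a small made-up graph, not on Identity Ecosystem data; the function names are ours, and counting the root node itself in the depth 2 tree is an assumption chosen so that a node without children yields a count of 1.

```python
from collections import deque


def out_degree(graph, node):
    """Number of children of `node`."""
    return len(graph.get(node, ()))


def in_degree(graph, node):
    """Number of parents of `node`."""
    return sum(node in children for children in graph.values())


def nodes_within_depth(graph, root, depth):
    """Count distinct nodes, root included, reachable in at most `depth` hops."""
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == depth:
            continue  # do not expand beyond the requested depth
        for child in graph.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append((child, dist + 1))
    return len(seen)


# Hypothetical toy graph: each key maps to the set of its children.
toy = {
    "A": {"B", "C"},
    "B": {"D"},
    "C": {"D", "E"},
    "D": {"F"},
    "E": set(),
    "F": set(),
}

print(out_degree(toy, "A"), in_degree(toy, "D"))
print(nodes_within_depth(toy, "A", 2), nodes_within_depth(toy, "F", 2))
```

In the toy graph, node F has no children, so its depth 2 tree contains only itself, while node A reaches five nodes within two hops.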

Table 11: Indegree/Number of Parents.

PII Node                  Number of Parents
Social Security Number    27
Driver’s License Number   8
Passport Information      4
National Identity Number  2

Table 12: Number of nodes in tree of depth 2.

PII Node                  Number of Nodes
Social Security Number    153
Driver’s License Number   282
Passport Information      200
National Identity Number  1

5.3.2 Query Approach

We use the Identity Ecosystem to run its first query, “estimating the exposure of breach given a set of identity attributes are exposed”. We run this query with the evidence set from each case listed below for authentication in blockchain-based identity verification solutions. Because the email address and phone number are required before any identity document is chosen for verification, we compare the following cases:

• Email Address + Phone number (Base Case)

• Email Address + Phone number + SSN

• Email Address + Phone number + Passport

• Email Address + Phone number + Driver’s License

• Email Address + Phone number + National Identity Card

The goal is to identify which of the above sets, if compromised, introduces the smallest increase in the risk of exposure and the liability value of the individual’s identity as a whole, including possible future compromises that might stem from that event. With the Bayesian network in the Identity Ecosystem, a posterior probability of exposure and a monetary loss value are calculated for all the affected nodes, and the nodes are classified into high, medium and low categories based on their risk and loss values. The nodes with a high risk level after the breach in each case are tabulated and analyzed further in the next section.

Figure 8: Loss Value of PII Node Set.
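To give a feel for how an evidence set raises the exposure probability of neighbouring nodes, the sketch below applies a one-step noisy-OR estimate to the direct children of the evidence nodes and then buckets the result. This is only an illustration under our own assumptions and is not the Identity Ecosystem's inference procedure: the noisy-OR combination, the use of the prior as a leak term, the edge probabilities, the priors and the cut-off values are all hypothetical.

```python
def noisy_or(prior, edge_probs):
    """P(child exposed) given exposed parents, with the prior as a leak term."""
    not_exposed = 1.0 - prior
    for p in edge_probs:
        not_exposed *= 1.0 - p
    return 1.0 - not_exposed


def risk_level(probability, high=0.5, medium=0.1):
    """Bucket a probability; the cut-offs are illustrative only."""
    if probability >= high:
        return "high"
    if probability >= medium:
        return "medium"
    return "low"


def one_step_exposure(priors, edges, evidence):
    """Exposure estimate for every non-evidence child of an evidence node."""
    incoming = {}
    for (parent, child), p in edges.items():
        if parent in evidence and child not in evidence:
            incoming.setdefault(child, []).append(p)
    return {child: noisy_or(priors[child], probs) for child, probs in incoming.items()}


# Made-up priors and edge probabilities, for illustration only.
priors = {"Username": 0.02, "Name": 0.05}
edges = {
    ("Email Address", "Username"): 0.6,
    ("Phone Number", "Username"): 0.3,
    ("Phone Number", "Name"): 0.2,
}
evidence = {"Email Address", "Phone Number"}

estimates = one_step_exposure(priors, edges, evidence)
for node, probability in sorted(estimates.items()):
    print(node, round(probability, 4), risk_level(probability))
```

A full Bayesian network query would also propagate the effect beyond direct children and account for unobserved parents explicitly; the one-step form is kept here only because it is easy to check by hand.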

5.4 Evaluation

In this section, we investigate and analyze the results from both approaches discussed above. We demonstrate the results on the current Identity Ecosystem, which comprises 627 nodes covering more than 6000 identity theft stories, and discuss the insights these results provide for blockchain-based identity verification.

5.4.1 Results From Graph Statistics Approach

We first apply the graph statistics approach to the data. Table 9 lists the prior probability and intrinsic loss value for the identity nodes considered for authentication. We observe that both the prior probability and the loss value are highest for Social Security Number. Social Security Number has the highest probability of being involved in an identity theft and fraud case, more than 10 times that of any other government-issued identity document considered. From Figure 8, we can observe that the loss value of SSN along with email address and phone number is about twice that of any other PII set. All of this indicates that Social Security Number is at the highest risk of exposure and leads to the greatest financial consequences.
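The comparison of loss values across PII sets can be checked against Table 9. The sketch below assumes that the loss value of a set is the sum of the intrinsic loss values of its members; this aggregation rule is our assumption, since the construction of Figure 8 is not spelled out here.

```python
# Intrinsic loss values in USD, copied from Table 9.
loss = {
    "Social Security Number": 27465086,
    "Driver's License Number": 2314811,
    "Passport Information": 1252465,
    "National Identity Number": 0,
    "Email Address": 18105024,
    "Phone Number": 4405490,
}

base = ["Email Address", "Phone Number"]
documents = [
    "Social Security Number",
    "Driver's License Number",
    "Passport Information",
    "National Identity Number",
]


def set_loss(attributes):
    """Assumed aggregation: sum of the intrinsic loss values in the set."""
    return sum(loss[attribute] for attribute in attributes)


totals = {doc: set_loss(base + [doc]) for doc in documents}
for doc, total in sorted(totals.items(), key=lambda item: item[1], reverse=True):
    print(f"{doc:26s} {total:>12,d}")

ssn_ratio = totals["Social Security Number"] / totals["Driver's License Number"]
print(f"SSN set vs. next largest set: {ssn_ratio:.2f}x")
```

Under this assumption, the set containing the Social Security Number is roughly twice as costly as the next largest set, in line with the observation drawn from Figure 8.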

We have also listed the number of children for those identity nodes in Table 10. This data shows that Driver’s License Number has the highest number of children, almost twice as many as any other identity node. It will impact the greatest number of nodes if it is exposed. National Identity Number, in contrast, has only 1 child listed, so it has almost no outgoing edges. Hence, based on this data, it is the safest to use for blockchain-based identity verification in the context of our research question. To emphasize the trends in the vulnerability impact of the selected identity nodes, we list the number of nodes in the tree of depth 2 in Table 12. It further supports our observation: Driver’s License Number has the highest number (282), and National Identity Number keeps its impact restricted to 1 node, itself.

In addition to the outdegree, we have tabulated the indegree, or number of parents, for the selected identity nodes in Table 11. As per this data, Social Security Number has the maximum number of parents, 27, which is more than three times the indegree of any other node. This shows that a breach of Social Security Number can be triggered through many more paths than a breach of the other nodes under consideration. The more parents a node has, the more edges lead to it, and the higher the risk of it being exposed when any parent is breached; Social Security Number, with the greatest number of parents, is therefore at the most risk of exposure.

Table 13: Count of high risk nodes after running query.

Evidence Node Set                                    High Risk Node Count
Email Address, Phone Number                          22
Email Address, Phone Number, SSN                     22
Email Address, Phone Number, Passport                24
Email Address, Phone Number, Driver’s License        29
Email Address, Phone Number, National Identity Card  22

Table 14: Count of medium risk nodes after running query.

Evidence Node Set                                    Medium Risk Node Count
Email Address, Phone Number                          82
Email Address, Phone Number, SSN                     92
Email Address, Phone Number, Passport                88
Email Address, Phone Number, Driver’s License        96
Email Address, Phone Number, National Identity Card  82

5.4.2 Results From Query Approach

We also applied the query approach to our data in the Identity Ecosystem. As we have seen in many blockchain-based IdM solutions, email address and phone number are used to set up accounts, so we treat them as the base case. We then list each set of exposed PII and run the query for that evidence set. Tables 13 and 14 list the counts of high and medium risk nodes returned by the query that infers the extent of a breach when a set of identity nodes is exposed. We observe the maximum number of high and medium risk nodes for Driver’s License together with email address and phone number. This shows that using Driver’s License to prove one’s identity leads to more breaches than any other document and is hence the riskiest.

The full lists of high risk nodes returned by the breach query in the Identity Ecosystem for the various evidence sets are shown in Tables 15 to 19. Using these lists, we have captured in Table 20 the number of unique high risk nodes for each evidence set as compared to the base case of email address and phone number. These are the high risk nodes present in each case but not in the base case, i.e., the additional high risk nodes exposed beyond the base case. We observe that the count is highest for the evidence set containing Driver’s License, which is almost double that of all the others. This result needs some interpretation: these are the extra identity nodes that will be at risk of exposure if email address, phone number, and Driver’s License are breached, as compared to when only email address and phone number are exposed. The more nodes in the result of this query, the greater the impact of the breach. We would recommend selecting an identity node that puts fewer nodes at risk.

Table 15: List of high risk nodes after running query for Email Address and Phone Number.

 1. Username                                2. Name
 3. Expiration Date                         4. Address
 5. Date of Birth                           6. Social Security Number
 7. Account Number                          8. Credit Card Number
 9. Debit Card Information                 10. ID Card Information
11. Stolen Driver’s License Information    12. Driver’s License Number
13. Bank Account Number                    14. Monetary Amount
15. CVV Code                               16. User Credentials
17. Account Information                    18. Physical Address
19. Check Information                      20. Patient Medical Record
21. Routing Number                         22. Insurance Policy Information
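The unique-node counts in Table 20 amount to a set difference between the high risk list of each evidence set and that of the base case. A minimal sketch, using the node names from Tables 15 and 16 and a hypothetical helper named `unique_high_risk`, recomputes the count for the SSN evidence set:

```python
# High risk nodes for the base case (Table 15: Email Address, Phone Number).
BASE = {
    "Username", "Name", "Expiration Date", "Address", "Date of Birth",
    "Social Security Number", "Account Number", "Credit Card Number",
    "Debit Card Information", "ID Card Information",
    "Stolen Driver's License Information", "Driver's License Number",
    "Bank Account Number", "Monetary Amount", "CVV Code", "User Credentials",
    "Account Information", "Physical Address", "Check Information",
    "Patient Medical Record", "Routing Number", "Insurance Policy Information",
}

# High risk nodes when SSN is added to the evidence set (Table 16).
WITH_SSN = {
    "Password", "Username", "Bank Account Information", "Name", "Address",
    "Date of Birth", "Debit Card Information", "ID Card Information",
    "Stolen Driver's License Information", "Monetary Amount",
    "W-2 Form Information", "CVV Code", "User Credentials", "Employee Record",
    "Fake ID Card Information", "Account Information",
    "Personal Identification Number (PIN)", "Biographic Data",
    "Check Information", "Birth Certificate Information", "Routing Number",
    "Insurance Policy Information",
}


def unique_high_risk(evidence_nodes, base_nodes):
    """Nodes that are high risk for the evidence set but not for the base case."""
    return sorted(evidence_nodes - base_nodes)


extra = unique_high_risk(WITH_SSN, BASE)
print(len(extra), "high risk nodes beyond the base case:")
for name in extra:
    print(" -", name)
```

The same helper could be applied to the lists in Tables 17 to 19 to examine the other evidence sets.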

5.4.3 Combined Results

Combining the graph statistics and query approaches, we observed that National Identity Card has the lowest prior probability of being exposed and the lowest number of children; hence it is least at risk of exposing other identity nodes. Also, the number of parents for National Identity Number is the lowest, putting it least at risk as compared to other PII nodes. The number of high and medium risk nodes breached after running the query with National Identity Card in the evidence set is also the lowest, confirming the observation from the first approach. Hence, using National Identity Card as the identity verification document is recommended for blockchain-based solutions, as it fulfills the authentication requirements and minimizes risk and liability.

Table 16: List of high risk nodes after running query for Email Address, Phone Number and SSN.

 1. Password                                 2. Username
 3. Bank Account Information                 4. Name
 5. Address                                  6. Date of Birth
 7. Debit Card Information                   8. ID Card Information
 9. Stolen Driver’s License Information     10. Monetary Amount
11. W-2 Form Information                    12. CVV Code
13. User Credentials                        14. Employee Record
15. Fake ID Card Information                16. Account Information
17. Personal Identification Number (PIN)    18. Biographic Data
19. Check Information                       20. Birth Certificate Information
21. Routing Number                          22. Insurance Policy Information

Table 17: List of high risk nodes after running query for Email Address, Phone Number and Passport.

 1. Password                                 2. Username
 3. Name                                     4. Expiration Date
 5. Address                                  6. Date of Birth
 7. Social Security Number                   8. Account Number
 9. Credit Card Number                      10. ID Card Information
11. Stolen Driver’s License Information     12. Bank Account Number
13. Monetary Amount                         14. User Credentials
15. Signature                               16. Employee Record
17. Fake ID Card Information                18. Account Information
19. Personal Identification Number (PIN)    20. Biographic Data
21. Check Information                       22. Routing Number
23. Date                                    24. Insurance Policy Information

Table 18: List of high risk nodes after running query for Email Address, Phone Number and Driver’s License.

 1. Bank Account Information                 2. Name
 3. Patient Database                         4. Address
 5. Date of Birth                            6. Social Security Number
 7. Credit Card Number                       8. Debit Card Information
 9. ID Card Information                     10. Monetary Amount
11. W-2 Form Information                    12. Bank Card Expiration Date
13. CVV Code                                14. Login Credentials
15. User Credentials                        16. Signature
17. Employee Record                         18. Fake ID Card Information
19. Account Information                     20. Personal Identification Number (PIN)
21. Physical Address                        22. Biographic Data
23. Passport Information                    24. Customer Database
25. Check Information                       26. Patient Medical Record
27. Date                                    28. Insurance Policy Information
29. Expiration Date

Table 19: List of high risk nodes after running query for Email Address, Phone Number and National Identity Card.

 1. Bank Account Information                 2. Name
 3. Address                                  4. Date of Birth
 5. Social Security Number                   6. Debit Card Information
 7. ID Card Information                      8. Stolen Driver’s License Information
 9. Monetary Amount                         10. CVV Code
11. User Credentials                        12. Employee Record
13. Fake ID Card Information                14. Account Information
15. Personal Identification Number (PIN)    16. Biographic Data
17. Passport Information                    18. Check Information
19. Date                                    20. Insurance Policy Information
21. Expiration Date                         22. Patient Database

Table 20: Number of high risk nodes for each query that differ from those for the query with evidence Email Address and Phone Number.

Evidence Node Set                                      Unique High Risk Node Count
Email Address, Phone Number, SSN                       8
Email Address, Phone Number, Passport                  7
Email Address, Phone Number, Driver’s License          14
Email Address, Phone Number, National Identity Card    8

5.5 Discussion

We studied the various PII used in identity verification for blockchain-based IdM services. We sought ways to determine the best set of PII for proving a person’s identity in self-sovereign identity systems. Such PII should provide the required authentication capabilities while minimizing the risk and liability of exposure. We used the Identity Ecosystem, developed at the Center for Identity at the University of Texas at Austin, to provide two approaches. In the first approach, we investigated the properties of the PII graph nodes: their prior probability of risk and intrinsic loss value. For a specific PII node, its parents and children were observed to determine its likely exposure and the influence it propagates to others. We conclude that the Social Security Number is the riskiest node, with the highest prior probability and initial monetary loss, and with the maximum number of parents in the graph increasing its risk.

In the second approach, we leveraged the Ecosystem’s breach query, which infers the extent of a breach when a set of identity attributes is compromised. We studied the impact on the other identity nodes by calculating the posterior probability and related loss. We also analyzed how the nodes at the highest risk level differ from the base case of the information required to open an IdM account, i.e., when only email address and phone number are exposed. The more high and medium risk nodes in the results, the greater the impact of the breach. We conclude that the Driver’s License has the highest impact, so using this kind of attribute for identity certification will increase the risk to the individual’s privacy.
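To make the notions of posterior probability and related loss concrete, the following is a minimal single-node sketch of Bayes’ rule in Python. It is not the Ecosystem’s inference procedure, which operates over the full Bayesian network; the probabilities, the loss value, the risk-level cut-offs, and the function names are all made up for illustration.

```python
def posterior(prior, p_e_given_x, p_e_given_not_x):
    """Bayes' rule: probability that node X is exposed given evidence E."""
    numerator = p_e_given_x * prior
    evidence = numerator + p_e_given_not_x * (1.0 - prior)
    return numerator / evidence


def risk_level(probability, high=0.5, medium=0.2):
    """Map a probability to a risk label using hypothetical cut-offs."""
    if probability >= high:
        return "high"
    if probability >= medium:
        return "medium"
    return "low"


# All numbers below are hypothetical and chosen only for illustration.
PRIOR = 0.10             # P(X exposed) before any evidence
P_E_GIVEN_X = 0.80       # P(evidence observed | X exposed)
P_E_GIVEN_NOT_X = 0.20   # P(evidence observed | X not exposed)
LOSS_VALUE = 1000.0      # assumed monetary loss if X is exposed

post = posterior(PRIOR, P_E_GIVEN_X, P_E_GIVEN_NOT_X)
expected_loss = post * LOSS_VALUE
print(f"posterior = {post:.3f}, level = {risk_level(post)}, "
      f"expected loss = {expected_loss:.2f}")
```

Observing the evidence raises the probability of exposure above the prior; counting how many nodes cross a given cut-off after such an update is the kind of comparison summarized by the high and medium risk node counts.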

Using both the graph statistics and query approaches, we also concluded that National Identity Card is the most recommended identity document for verification in blockchain-based IdM solutions. It has the lowest prior probability, the lowest number of children, and the lowest number of high and medium risk nodes breached after running the query. Hence, using National Identity Card as the identity verification document is a good option, as it fulfills the authentication requirements and minimizes risk and liability.

The graph statistics and query approaches give us different but justifiable results. The graph statistics approach analyzes the identity attribute nodes and their relationships with other nodes. The query approach, based on the graph and Bayesian network, additionally draws on the probability model used in the Ecosystem, which defines its results. As the number of identity theft and fraud stories continues to grow in our ITAP database, a more comprehensive study can be undertaken to capture a better picture of blockchain-based identity verification. With security and privacy increasing in the new blockchain-based IdM solutions, technologies can be customized to minimize breach of identity while performing authentication.


Chapter 6

Conclusion

In this thesis, we studied and further investigated the internationalization of identity. This work addresses three research problems: US-centric vs. international PII, and their similarities and differences with respect to the Identity Ecosystem; the legitimate business uses of identity in the international scenario; and authentication in blockchain-based Identity Management services worldwide.

This work seeks to improve our understanding of PII attributes in the international context and hence enhances our ability to protect these PII attributes from theft and fraud. A novel concept of international PII was introduced, which offers insight into how personally identifiable information is utilized in different countries. We are interested in the differences between PII in the US context and in the international one. Previously, the UT CID Ecosystem could answer queries related to people, devices, and organizations for the USA. By combining and analyzing identity theft stories from across the globe, more comprehensive and holistic results were derived from the Identity Ecosystem.

Beyond identity theft and fraud stories, identity is also widely used lawfully in the global business context. We studied and modeled international identity in legitimate business use cases for a few countries and compared it with that of the USA. We also added a new type of edge relationship between PII attributes, the breeder edge. This research is still ongoing and is planned to be used to populate the Identity Ecosystem for each country. The Identity Ecosystem will have an option to model identity in the context of either identity theft or legitimate business use.

We also studied the various PII attributes used in identity verification for blockchain-based IdM services. We sought ways to determine the best set of PII attributes for proving a person’s identity in self-sovereign identity systems. PII attributes can be used successfully to provide authentication capabilities but also carry a risk of exposure. We used the Identity Ecosystem developed at the Center for Identity to provide two approaches: the graph statistics approach and the query approach. We concluded that the Social Security Number, due to its high risk of exposure, and the Driver’s License, with its high impact on other PII attributes, are risky to use for authentication in blockchain-based identity services. The National Identity Number, with low prior risk and loss value and few children, is the recommended option for identity verification. This work can provide much-needed tangible guidelines for users choosing an identity verification document in self-sovereign blockchain-based solutions.

In summary, this thesis is a thorough guide to international identity, covering theft and fraud, legitimate business use, and blockchain-based IdM authentication services.


Chapter 7

Future Work

There are still many research topics in the internationalization of identity worthy of further investigation. First of all, the coverage of ITAP can be expanded to include more identity theft stories from more countries. This will help to predict PII more accurately for the international context, since the Identity Ecosystem aims to identify and authenticate people, devices, and organizations internationally. The queries present in the Identity Ecosystem can be extended further to provide more insight into identity. Identity in blockchain is still a growing domain, and the Identity Ecosystem can be used to answer more questions in this area.

Finally, the present legitimate use of identity, covered as part of this research, may be applied to more countries and scenarios to build a holistic picture of identity. The risk and loss values for each PII in the input and output settings for business use can be compared to derive trends. This work may also act as another input to the Identity Ecosystem, providing different models of identity theft and of legitimate business use. We would like to build Ecosystem models for each country: one for identity theft and another for global business scenario coverage. The two different inputs to the Identity Ecosystem may provide a clear vision of identity for each country and internationally as a whole.



