Modeling and Detecting Anomalous Topic Access

Siddharth Gupta1, Casey Hanson2, Carl A Gunter3, Mario Frank4, David Liebovitz5, Bradley Malin6

1,2,3,4Department of Computer Science, 3,5Department of Medicine, 6Department of Biomedical Informatics

1,2,3University of Illinois at Urbana-Champaign, 4University of California, Berkeley, 5Northwestern University, 6Vanderbilt University

• Motivation and Challenges

• Our Contributions

• Dataset Description

• Random Topic Access (RTA) Model

• Random Topic Access Detection (RTAD) Model

• Evaluation and Results

Outline of the talk

Reported in April 2013

• The University of Florida: 2 offenders illegitimately accessed the records of 15,000 patients over 3 years (March 2009 to October 2012).

• Personal information, including names, addresses, dates of birth, medical record numbers and Social Security numbers, was compromised for the purposes of billing fraud.

• One of the offenders was an insider in the hospital without prior.

• How can we efficiently model and detect these types of attacks in the healthcare system?

EMR Access Breach

• Two broad classes of threats:

• Inside Threats: behaviors of hospital users (staff) that adversely affect the healthcare institution, such as financial fraud, medical identity theft, and curiosity-driven accesses to EMRs.

• Outside Threats: an outside entity hires an insider to commit fraud, a visitor accesses records on open computers, or an untrustworthy patient seeks information about other patients’ records.

• Ramifications: Irreversible violation of patient privacy and subsequent high cost for hospitals.

• Deterrent: The current legal deterrent consists of regulations such as HIPAA and HITECH, which impose specific privacy rules for patients and financial penalties for violating them.

Motivation

• Build a classifier on labeled data to differentiate anomalous users from legitimate users.

• Real healthcare data is not labeled.

• Current methods use injection of synthetic anomalous users and evaluate on them.

Classical Detection Methodologies

• In healthcare information systems, the primary mechanism for generating anomalous users is to associate users with random patients in the dataset.

• We call such a system ROA (random object access).

• The resulting user doesn’t appear to be a plausible attacker in the real hospital setting.

Random Object Access
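
The ROA mechanism described on this slide can be sketched in a few lines. This is a minimal illustration rather than the authors' generator: the function name `random_object_access`, the patient identifiers and the number of accesses are all hypothetical.

```python
import random

def random_object_access(patient_ids, n_accesses, rng):
    """Simulate one ROA user: each access targets a patient drawn uniformly at random."""
    return [rng.choice(patient_ids) for _ in range(n_accesses)]

rng = random.Random(0)            # fixed seed so the sketch is repeatable
patients = list(range(1000))      # hypothetical patient identifiers
roa_user = random_object_access(patients, 50, rng)
```

Because every patient is equally likely, the resulting access list has no clinical theme, which is why such a user does not resemble a plausible attacker.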

• Random Topic Access (RTA): we introduce and study a random topic access (RTA) model aimed at users whose access may be illegitimate but is not fully random, because it is focused on common semantic themes.

• User Simulation: we utilize the latent topic framework to simulate illegitimate users and model them as samples from a Dirichlet distribution over topic multinomials.

• Anomaly Detection Framework: we use RTA to detect and evaluate users with suspicious access patterns.

Our Contributions

Data Set

Fig. a) Summary Statistics for Audit Logs

Fig. b) Summary Statistics for Patient Records

• Random Topic Access (RTA) Model: a mechanism for utilizing latent topic structures to represent real users in the population and allow for the synthetic generation of semantically relevant anomalous users.

• Topic modeling can provide a concise description of how a user behaves in the context of his peers and the meaning of that behavior.

• Model users as samples from a Dirichlet distribution over topic multinomials.

Random Topic Access (RTA) Model

Latent Dirichlet Allocation (LDA)

[Figure: LDA maps each patient's raw diagnosis features (binary indicators, e.g. 1 0 1 0 1) to diagnosis topic features (topic proportions, e.g. 0.2 0.1 0.7).]

Topic Distributions

[Figure: example diagnosis topics: Neoplasm Topic, Obstetric Topic, Kidney Topic.]

Characterizing Users

[Figure: User and Accessed Patient Topic Distributions. One panel plots P(Topic) against Topic ID (Topic 1, Topic 2, Topic 3) for Patient 1, Patient 2 and the User; a second panel plots Number of Accesses (Patient 1: 100 times, Patient 2: 30 times).]
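
One plausible reading of the Characterizing Users figure is that a user's topic distribution is the access-weighted average of the topic distributions of the patients that user accessed. The sketch below works under that assumption. The access counts (100 and 30) come from the slide, while the function name and the patient topic proportions are invented for illustration.

```python
def user_topic_distribution(patient_topics, access_counts):
    """Average patient topic distributions, weighting each patient by access count."""
    total = sum(access_counts.values())
    n_topics = len(next(iter(patient_topics.values())))
    user = [0.0] * n_topics
    for patient_id, count in access_counts.items():
        for topic, prob in enumerate(patient_topics[patient_id]):
            user[topic] += count * prob / total
    return user

# Hypothetical topic proportions over three topics for two patients.
patient_topics = {"patient_1": [0.7, 0.2, 0.1], "patient_2": [0.1, 0.1, 0.8]}
access_counts = {"patient_1": 100, "patient_2": 30}   # counts shown on the slide
user = user_topic_distribution(patient_topics, access_counts)
```

Under this weighting the user lands much closer to Patient 1 than to Patient 2, since Patient 1 accounts for 100 of the 130 accesses.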

Multidimensional Scaling: Patient Diagnosis

RTA: Simulating Users

• r ~ Dir(α) with n dimensions, where n is the number of topics.

a.) Directed or Masquerading User (α<1): an anomalous user of some specialty gains sole access to the terminal of another user in the hospital.

b.) Purely Random User (α=1): the user is characterized by completely random behavior, with little semantic congruence to the hospital setting.

c.) Indirect User (α>1): the user type resembles an even blend of the topics of many specialized users.
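
The effect of the concentration parameter on the three user types can be sketched with the standard library alone. The sketch assumes a symmetric Dirichlet and samples it by normalizing independent Gamma draws, a standard construction. The helper names `sample_dirichlet` and `mean_peak`, the choice of 10 topics and the specific α values are illustrative assumptions, not settings from the talk.

```python
import random

def sample_dirichlet(alpha, n_topics, rng):
    """Draw r ~ Dir(alpha) (symmetric) by normalizing Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n_topics)]
    total = sum(draws)
    if total == 0.0:
        # Very small alpha can underflow every draw; fall back to a single topic.
        draws = [0.0] * n_topics
        draws[rng.randrange(n_topics)] = 1.0
        total = 1.0
    return [d / total for d in draws]

def mean_peak(alpha, n_topics=10, n_users=200, seed=0):
    """Average weight of the largest topic over many simulated users."""
    rng = random.Random(seed)
    peaks = (max(sample_dirichlet(alpha, n_topics, rng)) for _ in range(n_users))
    return sum(peaks) / n_users

for alpha in (0.1, 1.0, 100.0):   # directed, purely random, indirect
    print(alpha, round(mean_peak(alpha), 2))
```

A small α concentrates most of the mass on one or two topics (a directed user), while a large α spreads it almost evenly over all topics (an indirect user).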

Population Distribution

[Figure: simulated user populations for α = 0.01 and α = 0.1 (A. Directed Users), α = 1 (B. Purely Random Users) and α = 100 (C. Indirect Users).]

Role Distribution

[Figure: role NMH Resident Fellow CPOE; panels: Masquerading Users, Purely Random Users, Indirect Users; legend: Anomalous Users, Real Users.]

• Random Topic Access Detection (RTAD): an anomaly detection framework that generates synthetic users using RTA and applies a standard spatial outlier detection scheme, k-nearest neighbors (k-NN), for classification.

• Methodology

1. LDA: define patient topics, and user typing to represent users in the topic space.

2. RTA user injection: generate three types of anomalous users and insert into each role at a 5% mix rate.

3. Detection (k-NN): if the ratio of the avg. distance from a user to its k nearest spatial neighbors to the avg. pairwise distance among those neighbors is greater than a threshold, call the user anomalous.

4. Evaluation Metric: best Area Under the Curve (AUC) for each , role combination.

Random Topic Access Detection (RTAD)
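
The detection step (step 3 of the methodology) can be sketched as a ratio score over points in topic space. This is a minimal sketch, assuming Euclidean distance and k of at least 2. The function name `knn_ratio_score`, the peer vectors and the suspect vector are hypothetical, and the talk does not state which distance or threshold was used.

```python
import math
from itertools import combinations

def knn_ratio_score(user, others, k):
    """Avg. distance from user to its k nearest neighbors, divided by the
    avg. pairwise distance among those neighbors (requires k >= 2)."""
    neighbors = sorted(others, key=lambda o: math.dist(user, o))[:k]
    avg_to_neighbors = sum(math.dist(user, nb) for nb in neighbors) / len(neighbors)
    pairs = list(combinations(neighbors, 2))
    avg_among = sum(math.dist(a, b) for a, b in pairs) / len(pairs)
    return avg_to_neighbors / avg_among

# Hypothetical users of one role, all concentrated on the first topic.
peers = [
    [0.80, 0.10, 0.10],
    [0.75, 0.15, 0.10],
    [0.85, 0.05, 0.10],
    [0.78, 0.12, 0.10],
    [0.82, 0.08, 0.10],
]
suspect = [0.10, 0.10, 0.80]      # hypothetical injected user focused elsewhere

suspect_score = knn_ratio_score(suspect, peers, k=3)
peer_score = knn_ratio_score(peers[0], peers[1:], k=3)
```

A user who sits far from a tight group of neighbors gets a ratio well above 1, while a typical member of the group scores near or below 1. Sweeping the threshold over these scores is what produces the ROC curve behind the AUC metric.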

Results - I

The best AUC across all evaluated dimensions is plotted for each role performing poorly for .

Results - II

The best AUC across all evaluated dimensions is plotted for each role performing well or near average for .
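
The AUC reported in these plots can be computed directly from anomaly scores as the probability that a randomly chosen anomalous user scores higher than a randomly chosen legitimate user, with ties counted as one half. The sketch below uses that pairwise definition. The function name `auc` and all score values are made up for illustration and are not results from the talk.

```python
def auc(anomalous_scores, legitimate_scores):
    """Fraction of (anomalous, legitimate) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for a in anomalous_scores:
        for b in legitimate_scores:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(anomalous_scores) * len(legitimate_scores))

# Hypothetical detector scores for injected and real users of one role.
anomalous_scores = [20.7, 3.1, 0.9]
legitimate_scores = [0.6, 0.8, 1.2, 0.7]
result = auc(anomalous_scores, legitimate_scores)
```

An AUC of 0.5 means the scores do not separate the two groups at all, and 1.0 means every injected user outranks every real user.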

Thank You !

Contact: [email protected]

Sponsors:
