Modeling and Detecting Anomalous Topic Access
Siddharth Gupta1, Casey Hanson2, Carl A Gunter3, Mario Frank4, David Liebovitz5, Bradley Malin6
1,2,3,4Department of Computer Science, 3,5Department of Medicine, 6Department of Biomedical Informatics
1,2,3University of Illinois at Urbana-Champaign, 4University of California, Berkeley, 5Northwestern University, 6Vanderbilt University
• Motivation and Challenges
• Our Contributions
• Dataset Description
• Random Topic Access (RTA) Model
• Random Topic Access Detection (RTAD) Model
• Evaluation and Results
Outline of the talk
Reported in April 2013
• The University of Florida: 2 offenders illegitimately accessed the records of 15,000 patients over 3 years (March 2009 - October 2012).
• Personal information, including names, addresses, dates of birth, medical record numbers and Social Security numbers, was compromised for the purposes of billing fraud.
• One of the offenders was an insider in the hospital without prior.
• How can we efficiently model and detect these types of attacks in the healthcare system?
EMR Access Breach
• Two broad classes of threats:
• Inside Threats: behaviors of hospital users (staff) that adversely affect the healthcare institution, such as financial fraud, medical identity theft and curiosity-driven access to EMRs.
• Outside Threats: an outside entity hires an insider to commit fraud, a visitor accesses records on open computers, or an untrustworthy patient seeks information from other patients' records.
• Ramifications: Irreversible violation of patient privacy and subsequent high cost for hospitals.
• Deterrent: The current deterrent is a set of legal regulations, such as HIPAA and HITECH, which impose specific privacy rules for patients and financial penalties for violating them.
Motivation
• Build a classifier on labeled data to differentiate anomalous users from legitimate users.
• Real healthcare data is not labeled.
• Current methods use injection of synthetic anomalous users and evaluate on them.
Classical Detection Methodologies
• In healthcare information systems, the primary mechanism for generating anomalous users is to associate users with random patients in the dataset.
• We call such a scheme ROA (random object access).
• The resulting user doesn’t appear to be a plausible attacker in the real hospital setting.
Random Object Access
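A minimal sketch of ROA-style user generation, assuming patients are identified by simple string IDs and each synthetic user makes a fixed number of accesses. The function name, the ID format, and the access count are hypothetical and not taken from the talk.

```python
import random

def random_object_access(patient_ids, n_accesses, rng):
    """Build one synthetic ROA user: patients drawn uniformly at random."""
    return [rng.choice(patient_ids) for _ in range(n_accesses)]

rng = random.Random(0)
patients = [f"patient_{i}" for i in range(1000)]  # hypothetical patient IDs
roa_user = random_object_access(patients, 50, rng)
```

Because every patient is equally likely, the accessed set has no shared clinical theme, which is why such a user is easy to tell apart from real staff.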
• Random Topic Access (RTA): we introduce and study a random topic access (RTA) model aimed at users whose access may be illegitimate but is not fully random because it focuses on common semantic themes.
• User Simulation: we utilize the latent topic framework to simulate illegitimate users and model them as samples from a Dirichlet distribution over topic multinomials.
• Anomaly Detection Framework: we use RTA to detect and evaluate users with suspicious access patterns.
Our Contributions
Data Set
Fig a) Summary Statistics for Audit Logs
Fig b) Summary Statistics for Patient Records
• Random Topic Access (RTA) Model: a mechanism for utilizing latent topic structures to represent real users in the population and allow for the synthetic generation of semantically relevant anomalous users.
• Topic modeling can provide a concise description of how a user behaves in the context of his peers and the meaning of that behavior.
• Model users as samples from a Dirichlet distribution over topic multinomials.
Random Topic Access (RTA) Model
Latent Dirichlet Allocation (LDA)
[Figure: LDA maps each patient's raw diagnosis features (binary, e.g. 1 0 1 0 1) to diagnosis topic features (e.g. 0.2 0.1 0.7).]
Topic Distributions
[Figure: Diagnosis Topics. Panels: Neoplasm Topic, Obstetric Topic, Kidney Topic.]
Characterizing Users
[Figure: User and Accessed Patient Topic Distributions. P(Topic) by Topic ID (Topic 1, Topic 2, Topic 3) for Patient 1 (accessed 100 times), Patient 2 (accessed 30 times) and the User.]
[Figure: Number of Accesses for Patient 1 and Patient 2.]
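One way to read the figure is that a user's topic distribution blends the topic distributions of the patients they accessed, weighted by how often each was accessed. The sketch below assumes exactly that weighting, which the slides do not state explicitly. The patient topic values are made up, and only the access counts (100 and 30) come from the figure legend.

```python
def user_topic_distribution(accesses, patient_topics):
    """Access-weighted average of patient topic distributions (assumed rule)."""
    n_topics = len(next(iter(patient_topics.values())))
    total = sum(accesses.values())
    user = [0.0] * n_topics
    for patient_id, count in accesses.items():
        for topic, prob in enumerate(patient_topics[patient_id]):
            user[topic] += count * prob / total
    return user

# Hypothetical topic distributions over three topics.
patient_topics = {
    "patient_1": [0.7, 0.2, 0.1],
    "patient_2": [0.1, 0.1, 0.8],
}
accesses = {"patient_1": 100, "patient_2": 30}  # counts from the figure legend
user = user_topic_distribution(accesses, patient_topics)
```

Under this assumption the user's profile sits closer to Patient 1, the more frequently accessed record.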
Multidimensional Scaling: Patient Diagnosis
RTA: Simulating Users
• r ~ Dir(α) with n dimensions, where n is the number of topics.
a.) Directed or Masquerading User (α<1): an anomalous user of some specialty gains sole access to the terminal of another user in the hospital.
b.) Purely Random User (α=1): the user is characterized by completely random behavior, with little semantic congruence to the hospital setting.
c.) Indirect User (α>1): the user type resembles an even blend of the topics of many specialized users.
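The three user types above can be sketched by sampling topic mixtures at different concentration values. This example assumes a symmetric Dirichlet (the same α for every topic) and uses the standard construction of a Dirichlet draw from normalized Gamma draws. The function name, the number of topics, and the specific α values are illustrative choices, not the authors' settings.

```python
import random

def sample_topic_mixture(alpha, n_topics, rng):
    """Draw one synthetic user r ~ Dir(alpha), symmetric concentration assumed."""
    # A Dirichlet draw is a vector of Gamma(alpha, 1) draws normalized to sum to 1.
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n_topics)]
    total = sum(draws)
    if total == 0.0:
        # For very small alpha every draw can underflow to 0; fall back to one topic.
        draws = [0.0] * n_topics
        draws[rng.randrange(n_topics)] = 1.0
        total = 1.0
    return [d / total for d in draws]

rng = random.Random(7)
n_topics = 10  # hypothetical number of topics
directed = sample_topic_mixture(0.1, n_topics, rng)       # mass on a few topics
purely_random = sample_topic_mixture(1.0, n_topics, rng)  # uniform over the simplex
indirect = sample_topic_mixture(100.0, n_topics, rng)     # near-even blend
```

Small α concentrates most of the probability on a few topics, while large α pushes every draw toward an equal share across topics.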
Population Distribution
[Figure: simulated populations for α = 0.01, α = 0.1, α = 1 and α = 100. Panels: A. Directed Users, B. Purely Random Users, C. Indirect Users.]
Role Distribution
[Figure: role NMH Resident Fellow CPOE. Panels: Masquerading Users, Purely Random Users, Indirect Users. Legend: Anomalous Users, Real Users.]
• Random Topic Access Detection (RTAD): an anomaly detection framework that generates synthetic users using RTA and applies a standard spatial outlier detection scheme based on k-nearest neighbors (k-NN) for classification.
• Methodology
1. LDA: define patient topics, and user typing to represent users in the topic space.
2. RTA user injection: generate three types of anomalous users and insert them into each role at a 5% mix rate.
3. Detection (k-NN): if the ratio of the average distance from a user to its k nearest spatial neighbors to the average pairwise distance among those neighbors is greater than a threshold, call the user anomalous.
4. Evaluation Metric: best Area Under the Curve (AUC) for each , role combination.
Random Topic Access Detection (RTAD)
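The k-NN detection rule in step 3 can be sketched as follows, assuming users are points in topic space compared with Euclidean distance. The distance metric, the example points, k = 3, and the threshold 1.5 are all assumptions for illustration; the slides do not specify them.

```python
import math
from itertools import combinations

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_ratio(user, others, k):
    """Avg distance to the k nearest neighbors over avg pairwise distance among them."""
    neighbors = sorted(others, key=lambda o: euclidean(user, o))[:k]
    avg_to_neighbors = sum(euclidean(user, nb) for nb in neighbors) / len(neighbors)
    pairs = list(combinations(neighbors, 2))  # needs at least 2 neighbors
    avg_among = sum(euclidean(a, b) for a, b in pairs) / len(pairs)
    if avg_among == 0.0:
        return math.inf  # neighbors coincide; any separation counts as extreme
    return avg_to_neighbors / avg_among

def is_anomalous(user, others, k, threshold):
    return knn_ratio(user, others, k) > threshold

# Hypothetical role population clustered around one topic.
population = [
    [0.80, 0.10, 0.10],
    [0.75, 0.15, 0.10],
    [0.78, 0.12, 0.10],
    [0.82, 0.08, 0.10],
]
typical = [0.79, 0.11, 0.10]
outlier = [0.10, 0.10, 0.80]

typical_flag = is_anomalous(typical, population, k=3, threshold=1.5)
outlier_flag = is_anomalous(outlier, population, k=3, threshold=1.5)
```

A user inside the cluster is about as far from its neighbors as they are from each other, so the ratio stays near or below 1; a user far from a tight cluster gets a large ratio.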
Results - I
The best AUC across all evaluated dimensions is plotted for each role performing poorly for .
Results - II
The best AUC across all evaluated dimensions is plotted for each role performing well or near average for .
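For reference, an AUC value of the kind plotted here can be computed directly from detector scores and ground-truth labels. The sketch below uses the pairwise (Mann-Whitney) formulation: the fraction of (anomalous, real) pairs in which the anomalous user receives the higher score, with ties counted as half. The scores and labels are invented for illustration and are not results from the study.

```python
def auc(scores, labels):
    """AUC as the probability that an anomalous user outscores a real one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]  # injected anomalous users
    neg = [s for s, y in zip(scores, labels) if y == 0]  # real users
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical k-NN ratio scores; label 1 marks an injected user.
scores = [0.6, 0.7, 0.9, 2.5, 0.8]
labels = [0, 0, 0, 1, 1]
example_auc = auc(scores, labels)
```

An AUC of 0.5 corresponds to a detector that ranks users no better than chance, and 1.0 to one that ranks every injected user above every real user.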