Universal Outlier Hypothesis Testing
Venu Veeravalli, ECE Dept & CSL & ITI
University of Illinois at Urbana-Champaign
http://www.ifp.illinois.edu/~vvv
(with Sirin Nitinawarat and Yun Li)
NSF Workshop on Signal Processing for Big Data
March 22, 2013
Statistical Outlier Detection
• Single sequence of observations
• Normal observations follow some fixed (possibly unknown) distribution or generating mechanism
• Outliers follow different generating mechanism
• Goal: To find outliers efficiently
• Applications: fraud detection, public health monitoring, cleaning up sensor data, etc.
Veeravalli, NSF Big Data 3/22/13
Fraud Detection

• Example: spending records for a male graduate student

Transactions:  Grocery  Gas  Books  ---  |  Spa   Women's apparel  Cosmetics
Amount:        $30      $35  $75         |  $250  $1,500           $500
               (normal behavior)            (abnormal behavior)
Fraud Detection: Group Monitoring

• Male graduate students

Student 1  Student 2  Student 3        …  Student M
Grocery    Dining     Grocery          …  Gas
Dining     Grocery    Gas              …  Books
Books      Movie      Books            …  Grocery
…          …          …                …  …
Movie      Books      Spa              …  Dining
Grocery    Gas        Women's apparel  …  Movie
Gas        Books      Cosmetics        …  Grocery
Outlier Hypothesis Testing

• M sequences of observations, with M large
• Almost all sequences are generated from a common typical distribution
• A small subset of sequences is generated from different (outlier) distributions
• Special case:
  o Exactly one sequence is generated from the outlier distribution
  o Goal: to detect the outlier sequence efficiently
  o Universal setting: neither typical nor outlier distributions known; no training data provided
Universal Outlier Hypothesis Testing

• Typical distribution: π
• Outlier distribution: µ

[Figure: under each hypothesis H_1, H_2, …, H_M, the M sequences are shown as columns of π with a single µ in the outlier position i.]
Mathematical Model

M sequences of n observations each, arranged as columns y^{(1)}, y^{(2)}, …, y^{(M)}:

y_1^{(1)}   y_1^{(2)}   …   y_1^{(M)}
y_2^{(1)}   y_2^{(2)}   …   y_2^{(M)}
  ⋮           ⋮               ⋮
y_n^{(1)}   y_n^{(2)}   …   y_n^{(M)}

H_i:\quad p_i\big(y^{(1)}, \ldots, y^{(M)}\big) = \prod_{k=1}^{n} \Big[ \mu\big(y_k^{(i)}\big) \prod_{j \neq i} \pi\big(y_k^{(j)}\big) \Big]
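This generating model is easy to simulate. A minimal Python sketch over a binary alphabet; the function name is illustrative, and the specific µ, π values are taken from the numerical example later in the talk:

```python
import random

def sample_model(mu, pi, M, n, outlier, alphabet, rng):
    """Draw M i.i.d. sequences of length n under hypothesis H_outlier:
    the outlier sequence follows mu, every other sequence follows pi."""
    def draw(dist):
        return rng.choices(alphabet, weights=[dist[a] for a in alphabet], k=n)
    return [draw(mu if j == outlier else pi) for j in range(M)]

rng = random.Random(0)
mu = {0: 0.36, 1: 0.64}   # outlier distribution
pi = {0: 0.64, 1: 0.36}   # typical distribution
data = sample_model(mu, pi, M=5, n=100, outlier=2, alphabet=[0, 1], rng=rng)
```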
Hypothesis Testing Problem

H_i:\quad p_i\big(y^{Mn}\big) = \prod_{k=1}^{n} \Big[ \mu\big(y_k^{(i)}\big) \prod_{j \neq i} \pi\big(y_k^{(j)}\big) \Big]

Universal Detector:  \delta : \mathcal{Y}^{Mn} \to \{1, \ldots, M\}, independent of (µ, π)

Nothing is known about (µ, π) except that they are distinct.
Performance Metrics

• Maximal error probability:

  e\big(\delta, (\mu, \pi)\big) = \max_{i} P_i\big\{ \delta(y^{Mn}) \neq i \big\}

• Exponent for maximal error probability:

  \alpha\big(\delta, (\mu, \pi)\big) = \lim_{n \to \infty} -\frac{1}{n} \log e\big(\delta, (\mu, \pi)\big)

Consistency: e → 0 as n → ∞.  Exponential consistency: α > 0.
Background: Binary Hypothesis Testing

H_1:\quad p_1(y) = \prod_{k=1}^{n} \pi(y_k) \qquad\qquad H_2:\quad p_2(y) = \prod_{k=1}^{n} \mu(y_k)

Chernoff information:  C(\mu, \pi) = \max_{0 \le s \le 1} -\log\Big( \sum_{y} \mu(y)^{s} \pi(y)^{1-s} \Big)

If (µ, π) is known, the ML rule \delta_{ML}(y) = \arg\max_i \log p_i(y) achieves \alpha(\delta_{ML}, (\mu, \pi)) = C(\mu, \pi) > 0 ← exponential consistency.
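C(µ, π) rarely has a closed form, but since the objective is concave in s, a one-dimensional grid search approximates it well on finite alphabets. A minimal sketch (the grid resolution is an arbitrary choice; the distributions are the ones from the numerical example later in the talk):

```python
import math

def chernoff_information(mu, pi, grid=1000):
    """C(mu, pi) = max over s in [0, 1] of -log sum_y mu(y)^s pi(y)^(1-s),
    approximated by evaluating the objective on a uniform grid of s values."""
    best = 0.0
    for i in range(grid + 1):
        s = i / grid
        best = max(best, -math.log(sum(m**s * p**(1 - s)
                                       for m, p in zip(mu, pi))))
    return best

mu, pi = [0.36, 0.64], [0.64, 0.36]
c = chernoff_information(mu, pi)
```

At s = 0 or s = 1 the objective is 0, so C(µ, π) > 0 exactly when µ ≠ π; swapping the arguments maps s to 1 − s, so C is symmetric.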
Outlier Hypothesis Testing: known (µ, π)

H_i:\quad p_i\big(y^{Mn}\big) = \prod_{k=1}^{n} \Big[ \mu\big(y_k^{(i)}\big) \prod_{j \neq i} \pi\big(y_k^{(j)}\big) \Big]

ML Rule:  \delta_{ML}(y^{Mn}) = \arg\max_i \log p_i(y^{Mn})

Exponential consistency:  \alpha(\delta_{ML}, (\mu, \pi)) = 2B(\mu, \pi)

Bhattacharyya distance:  B(\mu, \pi) = -\log\Big( \sum_{y} \mu(y)^{1/2} \pi(y)^{1/2} \Big)
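B(µ, π) is the Chernoff objective evaluated at s = 1/2, so the exponent 2B(µ, π) is a one-line computation. A sketch using the distributions from the numerical example later in the talk:

```python
import math

def bhattacharyya(mu, pi):
    """Bhattacharyya distance B(mu, pi) = -log sum_y sqrt(mu(y) * pi(y))."""
    return -math.log(sum(math.sqrt(m * p) for m, p in zip(mu, pi)))

mu = [0.36, 0.64]
pi = [0.64, 0.36]
exponent = 2 * bhattacharyya(mu, pi)  # error exponent of the ML rule, about 0.0816
```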
Binary Hypothesis Testing: Unknown µ

H_1:\quad p_1(y) = \prod_{k=1}^{n} \pi(y_k) \qquad\qquad H_2:\quad p_2(y) = \prod_{k=1}^{n} \mu(y_k)

If µ is unknown, then for any given δ there exists a µ such that α = 0.

No exponential consistency!
Outlier Hypothesis Testing: unknown µ

H_i:\quad \hat{p}_i\big(y^{Mn}\big) = \prod_{k=1}^{n} \Big[ \hat{\mu}_i\big(y_k^{(i)}\big) \prod_{j \neq i} \pi\big(y_k^{(j)}\big) \Big]

µ unknown:  \hat{\mu}_i = \gamma_i ← empirical distribution of sequence i

Generalized Likelihood (GL) Rule:  \delta_{GL}(y^{Mn}) = \arg\max_i \log \hat{p}_i(y^{Mn})

Exponential consistency:  \alpha(\delta_{GL}, (\mu, \pi)) = 2B(\mu, \pi), the same as when (µ, π) is known.
Key Tool: Sanov's Theorem

• Sanov's Theorem: For i.i.d. random variables Y^n ~ p, the exponent of the probability that the empirical distribution falls in a closed set E is

  \lim_{n \to \infty} -\frac{1}{n} \log P\big\{ \mathrm{Empirical}(Y^n) \in E \big\} = \min_{q \in E} D(q \,\|\, p)

[Figure: the set E in the probability simplex, with p outside E and q* the point of E closest to p in divergence, at distance D(q* || p).]
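For a Bernoulli source the theorem can be checked directly: the exact tail probability of the sample mean, computed in log space to avoid underflow, has an exponent close to the minimizing KL divergence. A minimal sketch (the parameter values in the check are illustrative):

```python
import math

def kl(q, p):
    """Binary KL divergence D(q || p) in nats."""
    def term(a, b):
        return 0.0 if a == 0.0 else a * math.log(a / b)
    return term(q, p) + term(1.0 - q, 1.0 - p)

def tail_exponent(n, p, t):
    """-(1/n) log P(sample mean of n i.i.d. Bernoulli(p) >= t),
    with the binomial tail summed in log space."""
    k0 = math.ceil(n * t)
    logs = [math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p)
            for k in range(k0, n + 1)]
    m = max(logs)
    log_prob = m + math.log(sum(math.exp(x - m) for x in logs))
    return -log_prob / n
```

Here E = {q : q(1) ≥ t}, so the Sanov exponent is min over q ≥ t of D(q || p) = D(t || p) for t > p; at n = 2000, p = 0.3, t = 0.6 the exact exponent is within about 0.01 of D(0.6 || 0.3) ≈ 0.192.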
Proposed Universal Test

H_i:\quad \hat{\hat{p}}_i\big(y^{Mn}\big) = \prod_{k=1}^{n} \Big[ \hat{\mu}_i\big(y_k^{(i)}\big) \prod_{j \neq i} \hat{\pi}_i\big(y_k^{(j)}\big) \Big]

(µ, π) not known:  \hat{\mu}_i = \gamma_i, \qquad \hat{\pi}_i = \frac{1}{M-1} \sum_{j \neq i} \gamma_j \qquad (empirical distributions)

Generalized Likelihood (GL) Rule:  \delta_{GL}(y^{Mn}) = \arg\max_i \log \hat{\hat{p}}_i(y^{Mn})
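The GL rule above amounts to scoring each candidate i with a plug-in log-likelihood built from the empirical distributions. A minimal sketch over a finite alphabet; the sequence length and the specific µ, π in the demo (taken from the numerical example later in the talk) are illustrative:

```python
from collections import Counter
import math
import random

def empirical(seq, alphabet):
    """Empirical distribution (gamma) of one sequence over a finite alphabet."""
    counts = Counter(seq)
    return {a: counts[a] / len(seq) for a in alphabet}

def gl_test(sequences, alphabet):
    """Universal GL rule: under candidate i, sequence i is scored with its own
    empirical distribution (plug-in for mu) and every other sequence with the
    average of the remaining empirical distributions (plug-in for pi)."""
    M = len(sequences)
    gammas = [empirical(s, alphabet) for s in sequences]
    scores = []
    for i in range(M):
        pi_hat = {a: sum(gammas[j][a] for j in range(M) if j != i) / (M - 1)
                  for a in alphabet}
        # Plug-in log-likelihood; every logged probability is positive because
        # it comes from the empirical distribution of an observed sequence.
        score = sum(math.log(gammas[i][y]) for y in sequences[i])
        score += sum(math.log(pi_hat[y])
                     for j in range(M) if j != i for y in sequences[j])
        scores.append(score)
    return max(range(M), key=scores.__getitem__)

# Demo: five sequences, one outlier at index 3
rng = random.Random(1)
mu, pi = [0.36, 0.64], [0.64, 0.36]
seqs = [rng.choices([0, 1], weights=(mu if i == 3 else pi), k=1000)
        for i in range(5)]
```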
Performance of Universal Test

\alpha\big(\delta, (\mu, \pi)\big) = \min_{p_1, \ldots, p_M} \; D(p_1 \,\|\, \mu) + D(p_2 \,\|\, \pi) + \cdots + D(p_M \,\|\, \pi)

where the minimum is over (p_1, …, p_M) satisfying

\sum_{j \neq 1} D\Big( p_j \,\Big\|\, \frac{1}{M-1} \sum_{k \neq 1} p_k \Big) \ge \sum_{j \neq 2} D\Big( p_j \,\Big\|\, \frac{1}{M-1} \sum_{k \neq 2} p_k \Big)

This exponent is > 0 for all (µ, π) → universal exponential consistency!
Asymptotic Optimality

• Motivation: When only π is known, the optimal error exponent is 2B(µ, π)
• The estimate of π satisfies

  \lim_{n \to \infty} \frac{1}{M} \sum_{i=1}^{M} \gamma_i = \frac{1}{M}\mu + \frac{M-1}{M}\pi, \qquad \lim_{M \to \infty} \Big( \frac{1}{M}\mu + \frac{M-1}{M}\pi \Big) = \pi
Asymptotic Optimality

Our universal outlier detector achieves an error exponent lower bounded by

\min_{q \,:\, D(q \,\|\, \pi) \,\le\, \frac{1}{M-1}\left( 2B(\mu, q) + C_\pi \right)} 2B(\mu, q)

This lower bound is non-decreasing in M for M ≥ 3, and converges to 2B(µ, π) as M → ∞.
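The behavior of this bound can be explored numerically for a binary alphabet. The exact form of the constant C_π is not reproduced on the slide, so it is treated here as a free parameter c_pi (an assumption of this sketch); what the grid search exhibits are the qualitative claims, non-decreasing in M and converging to 2B(µ, π):

```python
import math

def bhattacharyya(mu, pi):
    """Bhattacharyya distance B(mu, pi) = -log sum_y sqrt(mu(y) * pi(y))."""
    return -math.log(sum(math.sqrt(m * p) for m, p in zip(mu, pi)))

def kl(q, p):
    """KL divergence D(q || p) over a finite alphabet, in nats."""
    return sum(0.0 if a == 0.0 else a * math.log(a / b) for a, b in zip(q, p))

def lower_bound(mu, pi, M, c_pi, grid=4000):
    """min of 2B(mu, q) over binary q = (t, 1 - t) satisfying
    D(q || pi) <= (2B(mu, q) + c_pi) / (M - 1), by grid search over t."""
    best = float('inf')
    for i in range(1, grid):
        t = i / grid
        q = (t, 1.0 - t)
        b2 = 2.0 * bhattacharyya(mu, q)
        if kl(q, pi) <= (b2 + c_pi) / (M - 1):
            best = min(best, b2)
    return best

mu, pi = (0.36, 0.64), (0.64, 0.36)
bounds = [lower_bound(mu, pi, M, c_pi=0.05) for M in (3, 10, 100, 2000)]
target = 2 * bhattacharyya(mu, pi)   # about 0.0816
```

As M grows the constraint pins q to π, so the bound climbs toward 2B(µ, π).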
Numerical Results

[Figure: plot of the lower bound on the true error exponent versus log_2(M), where M is the number of coordinates, approaching 2B(µ, π) for µ = (0.36, 0.64), π = (0.64, 0.36).]
Discussion

• The generalized likelihood (GL) test is universally exponentially consistent for single-outlier hypothesis testing
• The GL test is asymptotically optimal in error exponent for large M
• What if we have more than one outlier?
  o If it is known that there are exactly K << M outliers, then the GL test is still universally exponentially consistent
  o If the number of outliers is not known a priori, then a universally exponentially consistent test does not exist!