Spectrum Based Fraud Detection in Social Networks
Xiaowei Ying, Xintao Wu, Daniel Barbara
Slide 2
Random Link Attack (Shrivastava et al., ICDE 2008)
An abstraction of collaborative attacks, including spam, viral marketing, and individual re-identification via active/passive attacks.
The attacker creates some fake nodes and uses them to attack a large set of randomly selected regular nodes. The fake nodes also mimic the real graph structure among themselves to evade detection.
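A random link attack can be simulated on a toy graph to make the setting concrete. This is a minimal sketch assuming a dense NumPy 0/1 adjacency matrix; the function `inject_rla` and its parameters (number of fake nodes `m`, victims per fake node, and an inner-edge probability `p_inner` used as a crude stand-in for "mimicking real structure") are hypothetical and not taken from the slides.

```python
import numpy as np

def inject_rla(A, m, victims_per_attacker, p_inner, rng):
    """Append m fake nodes to the symmetric 0/1 adjacency matrix A (hypothetical helper).

    Each fake node links to `victims_per_attacker` regular nodes chosen uniformly
    at random; fake nodes link to each other with probability p_inner (an assumed
    stand-in for mimicking real structure). Returns the (n+m) x (n+m) matrix.
    """
    n = A.shape[0]
    A_att = np.zeros((n + m, n + m), dtype=A.dtype)
    A_att[:n, :n] = A
    for p in range(m):
        victims = rng.choice(n, size=victims_per_attacker, replace=False)
        A_att[n + p, victims] = 1
        A_att[victims, n + p] = 1
    # Random inner edges among the fake nodes: sample the upper triangle, then mirror.
    inner = np.triu((rng.random((m, m)) < p_inner).astype(A.dtype), k=1)
    A_att[n:, n:] = inner + inner.T
    return A_att

rng = np.random.default_rng(0)
n = 100
upper = np.triu((rng.random((n, n)) < 0.05).astype(int), k=1)
A = upper + upper.T  # undirected, unweighted regular graph without self-loops
A_att = inject_rla(A, m=5, victims_per_attacker=10, p_inner=0.3, rng=rng)
print(A_att.shape)  # (105, 105)
```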
Slide 3
Topology Approach (Shrivastava et al., ICDE 2008)
Idea: count the external triangles around each node. The neighbors of a regular user form many triangles, but random victims do not.
Algorithm: detect suspects with the clustering test and the neighborhood independence test; detect RLAs with GREEDY and TRWALK.
Limitations: too many parameters; high computational cost; attacks are difficult to detect when multiple RLAs coexist.
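The intuition that random victims do not form triangles can be illustrated by counting triangles per node. This sketch assumes the simplest proxy, the number of triangles through each node (equivalently, edges among its neighbors) and the resulting local clustering coefficient; the exact external-triangle definition used by Shrivastava et al. may differ, and the function names are hypothetical.

```python
import numpy as np

def triangles_per_node(A):
    """Triangles through each node, diag(A^3) / 2 (= edges among its neighbors)."""
    A = np.asarray(A, dtype=np.int64)
    return np.diag(A @ A @ A) // 2

def local_clustering(A):
    """Fraction of each node's neighbor pairs that are themselves linked."""
    A = np.asarray(A, dtype=np.int64)
    deg = A.sum(axis=1)
    pairs = deg * (deg - 1) / 2
    t = triangles_per_node(A)
    # Nodes with fewer than 2 neighbors get clustering 0 by convention.
    return np.divide(t, pairs, out=np.zeros(len(deg)), where=pairs > 0)

# Toy graph: a 4-clique {0,1,2,3}, plus node 4 linked to node 0 and node 5.
A = np.zeros((6, 6), dtype=int)
for u, v in [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (4, 0), (4, 5)]:
    A[u, v] = A[v, u] = 1
print(triangles_per_node(A))  # [3 3 3 3 0 0]
print(local_clustering(A))
```

Node 4 plays the role of a node whose neighbors are unrelated, so its count and clustering are zero, while the clique members score high.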
Slide 4
Adjacency Eigenspace
Our approach: examine the spectral space of the graph topology.
The graph is undirected, unweighted, and unsigned; link and node attribute information is not considered.
Adjacency matrix A (symmetric).
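Because A is symmetric, it has the spectral decomposition A = Σ_i λ_i x_i x_iᵀ with real eigenvalues and orthonormal eigenvectors. The sketch below assumes the common convention of keeping the k largest eigenvalues λ_1 ≥ ... ≥ λ_k and taking the spectral coordinate of node u as the row α_u = (x_u1, ..., x_uk), where x_ui is the u-th entry of x_i; the function name `spectral_coordinates` is hypothetical.

```python
import numpy as np

def spectral_coordinates(A, k):
    """Return (lams, X): the k largest eigenvalues of symmetric A in descending
    order, and an n x k matrix whose row u is alpha_u = (x_u1, ..., x_uk)."""
    lams, vecs = np.linalg.eigh(np.asarray(A, dtype=float))  # ascending order
    order = np.argsort(lams)[::-1][:k]
    return lams[order], vecs[:, order]

rng = np.random.default_rng(1)
n = 50
upper = np.triu((rng.random((n, n)) < 0.2).astype(float), k=1)
A = upper + upper.T
lams, X = spectral_coordinates(A, k=3)
# Sanity check: A x_i = lambda_i x_i for every retained eigenpair.
print(np.allclose(A @ X, X * lams))  # True
```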
Slide 5
Spectral Coordinate (Ying and Wu, SDM 2009): Polbook network
Slide 6
Spectrum Based Fraud Detection: RLA from the matrix perturbation point of view
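One way to write an RLA as a matrix perturbation, in notation assumed here for illustration (n regular nodes, m attacking nodes): the attacked adjacency matrix borders the original A with the attack links B (B_up = 1 if attacker p links to regular node u) and the attackers' inner structure C.

```latex
\tilde{A} =
\begin{pmatrix} A_{n\times n} & B_{n\times m} \\ B^{T}_{m\times n} & C_{m\times m} \end{pmatrix}
= \begin{pmatrix} A & 0 \\ 0 & 0 \end{pmatrix}
+ \underbrace{\begin{pmatrix} 0 & B \\ B^{T} & C \end{pmatrix}}_{E}
```

The eigenpairs of the attacked graph are then studied as perturbations, by E, of the eigenpairs of the original graph.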
Slide 7
Spectrum Based Fraud Detection: approximating the spectral coordinate
Slide 8
Approximating the eigenvectors under a random link attack
Regular nodes: first-order and second-order approximations
Attacking nodes
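A first-order sketch of why the attackers' coordinates are predictable. It assumes the block form Ã = [[A, B], [Bᵀ, C]] (A: regular graph, B: n×m attack links, C: links among the m attackers), splits each perturbed eigenvector into a regular part u_i and an attacker part v_i, and drops the eigenvalue shift and the C v_i term; the exact first- and second-order formulas on the slide are not reproduced here. V_p denotes the victim set of attacker p and x_ui the u-th entry of x_i.

```latex
\tilde{A}\tilde{x}_i = \tilde{\lambda}_i \tilde{x}_i,\quad
\tilde{x}_i = \begin{pmatrix} u_i \\ v_i \end{pmatrix}
\;\Longrightarrow\;
\begin{cases}
A u_i + B v_i = \tilde{\lambda}_i u_i \\
B^{T} u_i + C v_i = \tilde{\lambda}_i v_i
\end{cases}

u_i \approx x_i,\ \tilde{\lambda}_i \approx \lambda_i,\ C v_i \text{ dropped}
\;\Longrightarrow\;
v_i \approx \frac{1}{\lambda_i} B^{T} x_i,
\qquad
\tilde{x}_{pi} \approx \frac{1}{\lambda_i} \sum_{u \in V_p} x_{ui}
```

Under this approximation an attacker's coordinate is a scaled sum of its victims' coordinates, which is what makes its distribution tractable when the victims are random.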
Slide 9
Illustrating Network Data
Network of political blogs on the 2004 U.S. election (polblogs, 1,222 nodes and 16,714 edges). The blogs were labeled as either liberal or conservative.
Slide 10
Illustrating Example
Political blogs (1,222 nodes, 16,714 edges): each node is labeled as either liberal or conservative.
Add one RLA with 20 attacking nodes that have the same degree distribution as the regular ones.
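Giving attackers "the same degree distribution as the regular ones" can be sketched by resampling the regular nodes' empirical degrees. This assumes uniform sampling with replacement and ignores isolated nodes; the function `sample_attacker_degrees` is hypothetical, and a synthetic random graph stands in for polblogs.

```python
import numpy as np

def sample_attacker_degrees(A, m, rng):
    """Draw m attacker degrees from the empirical degree distribution of the
    regular nodes (uniform sampling with replacement; an assumed scheme)."""
    degrees = np.asarray(A).sum(axis=1)
    degrees = degrees[degrees > 0]  # assumption: isolated nodes are ignored
    return rng.choice(degrees, size=m, replace=True)

rng = np.random.default_rng(2)
n = 200
upper = np.triu((rng.random((n, n)) < 0.05).astype(int), k=1)
A = upper + upper.T
attacker_degrees = sample_attacker_degrees(A, m=20, rng=rng)
print(len(attacker_degrees))  # 20
```

Each attacker p would then link to attacker_degrees[p] victims chosen at random.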
Slide 11
Problem
We do not know which nodes in the graph topology are attackers or victims. For random link attacks, however, we can derive the distribution of the attacking nodes' spectral coordinates.
Slide 12
Distribution of Attackers' Spectral Coordinates
The spectral coordinate of attacking node p follows a normal distribution whose mean and variance are bounded. From these bounds we obtain the region of the spectral space where RLA attacking nodes appear with high probability.
The inner structure of the attackers does not affect this region.
Example: polblogs (1,222 nodes, 16,714 edges), 20 attackers, each randomly attacking 30 victims.
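Bounds of this kind can be illustrated under the first-order model x̃_pi ≈ (1/λ_i) Σ_{u∈V_p} x_ui, with V_p a uniformly random set of d victims. Sampling without replacement gives the exact mean and variance, and since ||x_i|| = 1, Cauchy-Schwarz yields |mean| ≤ d/(|λ_i|√n) and variance ≤ d/(nλ_i²). These bounds are derived here for illustration and may differ from those on the slide; `attacker_coordinate_moments` is hypothetical, and a random graph stands in for polblogs.

```python
import numpy as np

def attacker_coordinate_moments(x, lam, d):
    """First-order model: coordinate = (1/lam) * sum of x over d victims drawn
    uniformly without replacement. Returns the exact mean and variance of that
    quantity plus Cauchy-Schwarz bounds that use only ||x|| = 1."""
    n = len(x)
    mu = x.mean()
    sigma2 = (x ** 2).mean() - mu ** 2               # population variance of entries
    mean = d * mu / lam
    var = d * sigma2 * (n - d) / (n - 1) / lam ** 2  # finite-population correction
    mean_bound = d / (abs(lam) * np.sqrt(n))         # since |sum(x)| <= sqrt(n)
    var_bound = d / (n * lam ** 2)                   # since sigma2 <= mean(x**2) = 1/n
    return mean, var, mean_bound, var_bound

# Synthetic stand-in for polblogs: an Erdos-Renyi graph, 30 victims per attacker.
rng = np.random.default_rng(3)
n, d = 300, 30
upper = np.triu((rng.random((n, n)) < 0.05).astype(float), k=1)
A = upper + upper.T
lams, vecs = np.linalg.eigh(A)
top = np.argsort(lams)[::-1][:3]
for i in top:
    mean, var, mb, vb = attacker_coordinate_moments(vecs[:, i], lams[i], d)
    print(f"lambda={lams[i]:.2f} mean={mean:+.4f} (bound {mb:.4f}) var={var:.2e} (bound {vb:.2e})")

# Monte Carlo draws of the first-order model on the second eigenvector;
# their empirical moments should be close to the exact ones printed above.
x, lam = vecs[:, top[1]], lams[top[1]]
samples = [x[rng.choice(n, size=d, replace=False)].sum() / lam for _ in range(2000)]
print(np.mean(samples), np.var(samples))
```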
Slide 13
The Node Non-randomness of RLA Attackers
Checking every dimension one by one is tedious, so we use node non-randomness instead.
We derive upper bounds on its mean and variance for the attackers and obtain a decision line.
Slide 14
The Node Non-randomness of RLA Attackers: Identifying Suspects
Nodes below the decision line are suspects.
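A sketch of the suspect test, assuming the node non-randomness of Ying and Wu is R_u = Σ_{i≤k} λ_i x_ui², which equals the sum of α_u·α_v over u's neighbors v (because Σ_v a_uv x_vi = λ_i x_ui). The decision line itself is not reproduced: `threshold`, assumed here to be a function of node degree, is a hypothetical placeholder, as are both function names.

```python
import numpy as np

def node_non_randomness(A, k):
    """R_u = sum_{i<=k} lambda_i * x_ui**2 over the k largest eigenpairs of A."""
    lams, vecs = np.linalg.eigh(np.asarray(A, dtype=float))
    top = np.argsort(lams)[::-1][:k]
    return (vecs[:, top] ** 2) @ lams[top]

def flag_suspects(A, k, threshold):
    """Indices of nodes whose R_u lies below threshold(degree).
    `threshold` is a hypothetical stand-in for the decision line."""
    R = node_non_randomness(A, k)
    deg = np.asarray(A, dtype=float).sum(axis=1)
    return np.flatnonzero(R < threshold(deg))

rng = np.random.default_rng(4)
n, k = 60, 3
upper = np.triu((rng.random((n, n)) < 0.15).astype(float), k=1)
A = upper + upper.T
lams, vecs = np.linalg.eigh(A)
X = vecs[:, np.argsort(lams)[::-1][:k]]  # row u is the spectral coordinate alpha_u
edge_R = A * (X @ X.T)                   # alpha_u . alpha_v on every edge (u, v)
print(np.allclose(edge_R.sum(axis=1), node_non_randomness(A, k)))  # True
suspects = flag_suspects(A, k, threshold=lambda d: 0.05 * d)  # hypothetical line
print(suspects)
```

The identity check confirms that the node-level score is the sum of edge-level scores, so a single number per node replaces checking each spectral dimension separately.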
Slide 15
RLAs with varied inner structure
Slide 16
SPCTRA Algorithm
Slide 17
Evaluation
Baseline: the topology-based RLA detection approach of Shrivastava et al. (ICDE 2008), using the clustering test and the neighborhood independence test, with GREEDY and TRWALK.
Experimental setting:
Political blogs (1,222 nodes, 16,714 edges): add 1 RLA with 20 attackers.
Web Spam Challenge data (114K nodes and 1.8M links): add a mix of 8 RLAs with varied sizes and connection patterns.
Slide 18
Evaluation on political blogs (1 RLA each time)
Slide 19
Evaluation on Web Spam Challenge data: accuracy
A snapshot of websites in the .UK domain (2007).
SPCTRA: based on the spectral space.
GREEDY: based on outer triangles [Shrivastava et al., ICDE 2008].
Slide 20
Execution Time
TRWALK is 10 times faster than GREEDY (with lower accuracy), but still 100 times slower than SPCTRA. The complexity is discussed in the paper.
Slide 21
Bipartite Core Attacks
The attacker creates two types of nodes:
Accomplices: behave like normal users, except that they connect heavily to fraudsters to boost the fraudsters' ratings.
Fraudsters: the nodes that actually commit fraud; they connect mostly to accomplices.
No links exist among accomplices or among fraudsters, so together they form a bipartite core.
Figure from: Duen Horng Chau et al., Detecting Fraudulent Personalities in Networks of Online Auctioneers.
Slide 22
Bipartite Core Attacks
Example: 20 fraudsters and 30 accomplices.
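The 20-fraudster, 30-accomplice example can be built as a bipartite block. This sketch idealizes "heavily connecting" as linking with probability `p` (p = 1 gives the complete core K_{20,30}); `bipartite_core` is hypothetical. It uses the standard fact that K_{a,b} has eigenvalues ±√(ab) with all others zero, which shows how strongly such a core stands out in the adjacency spectrum; this is an illustration, not a claim about how SPCTRA handles the attack.

```python
import numpy as np

def bipartite_core(n_fraud, n_accomplice, p, rng):
    """Adjacency of a bipartite core: fraudsters 0..n_fraud-1 link to each
    accomplice with probability p; no links inside either group."""
    n = n_fraud + n_accomplice
    A = np.zeros((n, n))
    block = (rng.random((n_fraud, n_accomplice)) < p).astype(float)
    A[:n_fraud, n_fraud:] = block
    A[n_fraud:, :n_fraud] = block.T
    return A

rng = np.random.default_rng(5)
A = bipartite_core(20, 30, p=1.0, rng=rng)  # complete core K_{20,30}
lams = np.linalg.eigvalsh(A)                # ascending order
print(np.allclose([lams[0], lams[-1]], [-np.sqrt(600), np.sqrt(600)]))  # True
```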
Slide 23
DDoS Attacks
The attacker controls 10% of the normal nodes and uses them to attack one victim node.
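The DDoS scenario can be simulated in the same style. This sketch assumes that "attacking" the victim means adding an edge from each controlled node to it, and that 10% is taken of all n nodes; `ddos_attack` is a hypothetical helper.

```python
import numpy as np

def ddos_attack(A, victim, fraction, rng):
    """Return (A_att, controlled): a copy of A in which round(fraction * n)
    randomly chosen nodes other than `victim` are linked to `victim`."""
    A_att = A.copy()
    n = A.shape[0]
    others = np.setdiff1d(np.arange(n), [victim])
    size = int(round(fraction * n))
    controlled = rng.choice(others, size=size, replace=False)
    A_att[controlled, victim] = 1
    A_att[victim, controlled] = 1
    return A_att, controlled

rng = np.random.default_rng(6)
n = 200
upper = np.triu((rng.random((n, n)) < 0.03).astype(int), k=1)
A = upper + upper.T
A_att, controlled = ddos_attack(A, victim=0, fraction=0.10, rng=rng)
print(len(controlled), A.sum(axis=1)[0], A_att.sum(axis=1)[0])  # victim degree before/after
```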
Slide 24
Conclusion
We present a framework that exploits the spectral space of the graph topology to detect attacks.
Theoretical analysis shows that attackers lie in a different region of the spectral space from regular nodes.
We develop the SPCTRA algorithm for detecting RLAs.
We demonstrate its effectiveness and efficiency through empirical evaluation.
Slide 25
Future Work
Explore other attack scenarios in both social networks and communication networks. In Sybil attacks, for example, attackers may choose victims deliberately rather than randomly.
Track how the graph evolves dynamically.
Slide 26
Questions?
Acknowledgments: This work was done in collaboration with Xiaowei Ying and Daniel Barbara, and was supported in part by U.S. National Science Foundation grants IIS-0546027, CNS-0831204, and CCF-1047621.
Thank you!