Spectrum based Fraud Detection in Social Networks

Spectrum Based Fraud Detection in (Social) Networks

Xiaowei Ying, Xintao Wu, Daniel Barbara

Random Link Attack (Shrivastava et al., ICDE08)
An abstraction of collaborative attacks, including spam, viral marketing, and individual re-identification via active/passive attacks.
The attacker creates some fake nodes and uses them to attack a large set of randomly selected regular nodes.
Fake nodes also mimic the real graph structure among themselves to evade detection.

One common attack in social networks is the random link attack (RLA). In an RLA, the attacker joins the network and creates links to a set of randomly selected victims. Multiple attackers can simulate real graph patterns among themselves to evade detection. Spam email and some spam messages in instant messaging networks are examples of RLAs.

Topology Approach (Shrivastava et al., ICDE08)
Idea: count external triangles around each node. The neighbors of a regular user form many triangles, but random victims do not.
Algorithm: detect suspects with a clustering test and a neighborhood independence test; detect RLAs with GREEDY and TRWALK.
Limitations: too many parameters; high computational cost; difficult to detect when multiple RLAs exist.

Our Approach
Examine the spectral space of the graph topology. The graph is undirected, unweighted, and unsigned, and link/node attribute information is not considered.
Adjacency matrix A (symmetric)
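The triangle intuition behind the topology approach can be illustrated on a toy graph. This is a simplified stand-in, not the outer-triangle statistic used by GREEDY: it counts, for each node, the triangles passing through it, diag(A^3)/2, which equals the number of links among that node's neighbors. It assumes numpy; the function names and the toy graph are hypothetical.

```python
import numpy as np

def adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix of an undirected graph."""
    A = np.zeros((n, n), dtype=int)
    for u, v in edges:
        A[u, v] = A[v, u] = 1
    return A

def triangles_per_node(A):
    """Triangles through each node = links among its neighbors = diag(A^3) / 2.

    (A^3)[u, u] counts closed 3-walks starting at u; each triangle through u
    is traversed twice (once per direction), hence the division by 2.
    """
    return np.diag(A @ A @ A) // 2

# Hypothetical toy graph: nodes 0-3 form a clique (regular users);
# node 4 links to two unrelated victims, 0 and 5; node 5 also links to node 6.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (4, 0), (4, 5), (5, 6)]
A = adjacency(7, edges)
tri = triangles_per_node(A)
print(tri.tolist())  # [3, 3, 3, 3, 0, 0, 0]
```

Node 4 behaves like an RLA attacker here: its neighbors 0 and 5 share no link, so it lies on no triangle, while every clique member lies on three.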

Adjacency Eigenspace


A graph with n nodes and m edges can have various structures. An edge can be directed or undirected: for example, friendship is usually mutual, while links between web pages usually have directions. In some scenarios, a weight is associated with each link, and a higher weight means more communication. Edges can even have signs: friends have positive links and enemies have negative links. In my research, I mainly focus on undirected, unweighted, and unsigned graphs. This type of graph can be represented by the adjacency matrix A, where A_ij = 1 if there is an edge between nodes i and j, and 0 otherwise. The degree of node i is the number of edges connected to node i. The Laplacian and normal matrices are also commonly used data structures for social networks.
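A minimal sketch of the adjacency representation described above, assuming numpy; the edge list is made up for illustration, and the Laplacian line uses the standard definition L = D - A (the normal matrix is omitted).

```python
import numpy as np

def adjacency_matrix(n, edges):
    """A[i, j] = 1 if nodes i and j are linked, 0 otherwise (undirected, unweighted, unsigned)."""
    A = np.zeros((n, n), dtype=int)
    for i, j in edges:
        if i != j:                 # ignore self-loops
            A[i, j] = A[j, i] = 1  # undirected: A is symmetric
    return A

edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
A = adjacency_matrix(4, edges)
degree = A.sum(axis=1)             # d_i = number of edges incident to node i
m = int(A.sum()) // 2              # each undirected edge is stored twice in A
L = np.diag(degree) - A            # Laplacian matrix L = D - A
print(degree.tolist(), m)          # [2, 2, 3, 1] 4
```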

Spectral coordinate: Ying and Wu SDM09

Polbook Network
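A minimal sketch of the spectral coordinate, assuming the definition of Ying and Wu (SDM09) in which node u's coordinate is its row (x_1u, ..., x_ku) in the matrix of eigenvectors belonging to the k largest eigenvalues of A. It uses numpy.linalg.eigh; the graph is a hypothetical toy example. Eigenvectors are determined only up to sign, so a whole coordinate axis may flip.

```python
import numpy as np

def spectral_coordinates(A, k):
    """Return the k largest eigenvalues of symmetric A and an n-by-k matrix
    whose row u is node u's spectral coordinate (x_1u, ..., x_ku)."""
    vals, vecs = np.linalg.eigh(A.astype(float))  # eigh returns ascending eigenvalues
    top = np.argsort(vals)[::-1][:k]              # indices of the k largest
    return vals[top], vecs[:, top]

# Hypothetical graph: two triangles {0,1,2} and {3,4,5} joined by the edge (2, 3).
n = 6
A = np.zeros((n, n))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0

lam, X = spectral_coordinates(A, k=2)
for u in range(n):
    print(u, np.round(X[u], 3))  # nodes of the same triangle share the sign of x_2u
```

The two "communities" separate along the second dimension, which is the same effect the polbook plot shows at a larger scale.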

Spectrum Based Fraud Detection
RLA from the matrix perturbation point of view

RLA is a special type of randomization.

Spectrum Based Fraud Detection
Approximate the spectral coordinate

Approximate the eigenvector in random link attack

Regular nodes
Approximation: first order, second order

Attacking nodes
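One way to see where first- and second-order approximations for attacking nodes can come from is a block-matrix argument. In this sketch (our notation, not necessarily the paper's), regular nodes are ordered first and the s attackers last: A is the regular graph, C is the attackers' inner structure, and B is the n-by-s matrix with b_up = 1 when attacker p links to regular node u. We assume the attack is a small perturbation, so the i-th eigenpair of the attacked graph stays close to the eigenpair (lambda_i, x_i) of A, and lambda_i is large compared with the eigenvalues of C (which makes the series converge).

```latex
\begin{aligned}
\tilde{A} &= \begin{pmatrix} A & B \\ B^{\top} & C \end{pmatrix},
\qquad
\tilde{A}\begin{pmatrix} \tilde{u} \\ \tilde{v} \end{pmatrix}
  = \tilde{\lambda}_i \begin{pmatrix} \tilde{u} \\ \tilde{v} \end{pmatrix} \\
B^{\top}\tilde{u} + C\tilde{v} = \tilde{\lambda}_i \tilde{v}
  \;&\Longrightarrow\;
  \tilde{v} = (\tilde{\lambda}_i I - C)^{-1} B^{\top}\tilde{u}
  = \frac{1}{\tilde{\lambda}_i}\Bigl(I + \frac{C}{\tilde{\lambda}_i} + \frac{C^{2}}{\tilde{\lambda}_i^{2}} + \cdots\Bigr) B^{\top}\tilde{u} \\
\text{first order: } \tilde{v} &\approx \frac{1}{\lambda_i} B^{\top} x_i,
  \qquad \text{i.e. } \tilde{x}_{ip} \approx \frac{1}{\lambda_i}\sum_{u=1}^{n} b_{up}\, x_{iu} \\
\text{second order: } \tilde{v} &\approx \frac{1}{\lambda_i}\Bigl(I + \frac{C}{\lambda_i}\Bigr) B^{\top} x_i
\end{aligned}
```

At first order, attacker p's coordinate on dimension i is the sum of its victims' coordinates scaled by 1/lambda_i, and C drops out entirely; the attackers' inner structure only enters at second order.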

Illustrating network data
Network of political blogs on the 2004 U.S. election (polblogs, 1,222 nodes and 16,714 edges). The blogs were labeled as either liberal or conservative.

This is a link network of political blogs, with 1,222 nodes and 16,714 links. Again, we can observe two communities in the graph, because blogs with similar opinions often link to each other and seldom link to blogs with different political views.

Illustrating example
Political blogs (1,222 nodes, 16,714 edges): each node is labeled as either liberal or conservative.
Add one RLA with 20 attacking nodes that have the same degree distribution as the regular ones.
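The RLA setup can be simulated on a synthetic graph to compare an attacker's actual spectral coordinate with a first-order approximation in which attacker p's coordinate on dimension i is the sum of its victims' coordinates divided by lambda_i. This is a sketch under our own assumptions: a hypothetical two-community random graph stands in for polblogs, every attacker gets the same number of random victims (rather than matching the regular degree distribution), and the function and parameter names are made up. It assumes numpy, and it also checks the exact block relation v = (lambda I - C)^{-1} B^T u, which holds for any eigenpair of the attacked matrix.

```python
import numpy as np

def inject_rla(A, s, victims_per_attacker, inner_p, rng):
    """Append s attackers to the regular graph A. Each attacker links to
    `victims_per_attacker` distinct regular nodes chosen uniformly at random;
    attackers link to each other independently with probability inner_p."""
    n = A.shape[0]
    B = np.zeros((n, s))
    for p in range(s):
        victims = rng.choice(n, size=victims_per_attacker, replace=False)
        B[victims, p] = 1.0
    C = np.triu((rng.random((s, s)) < inner_p).astype(float), k=1)
    C = C + C.T                                  # symmetric, no self-loops
    return np.block([[A, B], [B.T, C]]), B, C

rng = np.random.default_rng(0)
n, s = 400, 10
# Hypothetical regular graph: two communities of 200 nodes (dense inside, sparse across).
P = np.full((n, n), 0.02)
P[:200, :200] = P[200:, 200:] = 0.2
A = np.triu((rng.random((n, n)) < P).astype(float), k=1)
A = A + A.T

A_tilde, B, C = inject_rla(A, s, victims_per_attacker=15, inner_p=0.3, rng=rng)

lam, X = np.linalg.eigh(A)            # unperturbed graph
lam_t, W = np.linalg.eigh(A_tilde)    # attacked graph
i = -1                                # largest eigenvalue (eigh sorts ascending)
x, w = X[:, i], W[:, i]
u_t, v_t = w[:n], w[n:]               # regular / attacker parts of the eigenvector
if x @ u_t < 0:                       # eigenvectors are defined only up to sign
    x = -x

v_exact = np.linalg.solve(lam_t[i] * np.eye(s) - C, B.T @ u_t)  # exact block relation
v_first = B.T @ x / lam[i]                                        # first-order approximation
print("max |first-order error|:", np.abs(v_first - v_t).max())
```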

Problem
We do not know who the attackers and victims are in the graph topology.

For random link attacks, we can derive the distribution of the attacking nodes' spectral coordinates.

The spectral coordinate of attacking node p follows a normal distribution, with mean and variance bounded by:

We can thus obtain the region of the spectral space where RLA attacking nodes appear with high probability.

Distribution of attackers' spectral coordinates
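Under a first-order approximation in which attacker p's coordinate is (1/lambda_i) times the sum of x_iu over its victims u, and assuming the d_p victims are a uniformly random subset of the n regular nodes, the mean and variance of that coordinate follow from sampling without replacement. The sketch below computes them, together with two bounds that hold for any unit-norm eigenvector: |mean| <= d_p / (sqrt(n) |lambda_i|) and variance <= d_p / (n lambda_i^2). These are our own derivation, not necessarily the exact bounds in the paper; normality (a central-limit argument) is not checked, the function names are hypothetical, and the toy vector is made up. A brute-force enumeration over all victim sets confirms the formulas.

```python
import itertools
import math
import numpy as np

def attacker_coordinate_moments(x, lam, d):
    """Exact mean and variance of (1/lam) * sum_{u in S} x[u], where S is a
    uniformly random d-subset of the n regular nodes."""
    n = len(x)
    xbar = x.mean()
    sigma2 = (x ** 2).mean() - xbar ** 2             # population variance of x's entries
    mean = d * xbar / lam
    var = d * sigma2 * (n - d) / (n - 1) / lam ** 2  # finite-population correction
    return mean, var

def attacker_coordinate_bounds(n, lam, d):
    """Bounds for any unit-norm x: |xbar| <= 1/sqrt(n) and sigma2 <= 1/n."""
    return d / (math.sqrt(n) * abs(lam)), d / (n * lam ** 2)

# Toy unit-norm "eigenvector" on n = 5 regular nodes (hypothetical values).
x = np.array([0.1, 0.5, -0.3, 0.7, 0.2])
x = x / np.linalg.norm(x)
lam, d = 3.0, 2

mean, var = attacker_coordinate_moments(x, lam, d)
mean_bound, var_bound = attacker_coordinate_bounds(len(x), lam, d)

# Brute force: enumerate all C(5, 2) = 10 equally likely victim sets.
samples = [x[list(S)].sum() / lam for S in itertools.combinations(range(len(x)), d)]
brute_mean, brute_var = np.mean(samples), np.var(samples)
```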

Inner structure of attackers does not affect the region!
polblogs (1,222 nodes, 16,714 edges), 20 attackers, each randomly attacking 30 victims.

It is tedious to check every dimension one by one.

The node non-randomness of RLA attackers

We derive upper bounds on the mean and variance and obtain the decision line:

Using node non-randomness
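Node non-randomness can be computed directly from the spectral coordinates. A minimal sketch, assuming the definitions from Ying and Wu (SDM09): edge non-randomness R(u, v) = alpha_u . alpha_v and node non-randomness R_u = sum of R(u, v) over the neighbors v of u, which simplifies to sum_{i<=k} lambda_i x_iu^2 because A x_i = lambda_i x_i. The graph is a hypothetical toy example, numpy is assumed, and the decision line itself is not reproduced here.

```python
import numpy as np

def node_nonrandomness(A, k):
    """R_u = sum_{i<=k} lam_i * x_iu^2 over the k largest eigenpairs of A.
    Also returns the spectral coordinates (rows of X)."""
    vals, vecs = np.linalg.eigh(A.astype(float))
    top = np.argsort(vals)[::-1][:k]
    lam, X = vals[top], vecs[:, top]
    return (X ** 2) @ lam, X

# Hypothetical graph: two triangles joined by the edge (2, 3).
n = 6
A = np.zeros((n, n))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0

R_node, X = node_nonrandomness(A, k=2)
R_edge = X @ X.T                         # R(u, v) = alpha_u . alpha_v
R_from_edges = (A * R_edge).sum(axis=1)  # sum of R(u, v) over the neighbors v of u
print(np.round(R_node, 3))
```

The closed form avoids summing over edges: one eigendecomposition yields R_u for every node, which is what makes a single scalar per node practical instead of checking each spectral dimension separately.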


The node non-randomness of RLA attackers

Identifying suspects

Nodes below the decision line are suspects

RLAs with varied inner structure

SPCTRA Algorithm
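As a hypothetical outline only, and not the authors' SPCTRA algorithm, the pipeline suggested by the preceding slides (spectral coordinates, then node non-randomness, then comparison with a decision line) can be sketched as follows. The `boundary` argument is a placeholder for the decision line derived from the mean and variance bounds; its form is an assumption left to the caller, the function name is made up, and numpy is assumed.

```python
import numpy as np

def flag_suspects(A, k, boundary):
    """Hypothetical pipeline: flag nodes whose node non-randomness falls below
    boundary(degree), most-below first. `boundary` maps a degree array to
    threshold values and stands in for the paper's decision line."""
    vals, vecs = np.linalg.eigh(A.astype(float))
    top = np.argsort(vals)[::-1][:k]
    lam, X = vals[top], vecs[:, top]
    R = (X ** 2) @ lam                 # node non-randomness R_u
    degree = A.sum(axis=1)
    gap = R - boundary(degree)         # negative: below the decision line
    below = np.flatnonzero(gap < 0)
    return below[np.argsort(gap[below])].tolist()

# Toy check on a 4-cycle with placeholder boundaries.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]])
everyone = flag_suspects(A, k=2, boundary=lambda d: np.full(len(d), np.inf))
nobody = flag_suspects(A, k=2, boundary=lambda d: np.full(len(d), -np.inf))
```

Only one eigendecomposition of the top-k spectrum is needed, which is consistent in spirit with the speed advantage reported for the spectral approach, though this sketch makes no claim about the paper's actual complexity.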

Evaluation
Topology-based RLA detection approach (Shrivastava et al., ICDE08): clustering test and neighborhood independence test; GREEDY and TRWALK.
Experimental setting:
Political blogs (1,222 nodes, 16,714 edges): add 1 RLA with 20 attackers.
Web Spam Challenge data (114K nodes and 1.8M links): add a mix of 8 RLAs with varied sizes and connection patterns.

Evaluation on political blogs (1 RLA each time)

Evaluation

Evaluation on Web Spam Challenge data
A snapshot of websites in the .UK domain (2007).
SPCTRA: based on the spectral space.
GREEDY: based on outer triangles [Shrivastava, ICDE, 2008].

Accuracy

Execution time
TRWALK is 10 times faster than GREEDY (with lower accuracy), but still 100 times slower than SPCTRA. A discussion of complexity is in the paper.

Bipartite Core Attacks
The attacker creates two types of nodes:
Accomplices: behave like normal users, except that they connect heavily to fraudsters to boost the fraudsters' ratings.
Fraudsters: nodes that actually commit fraud; they mostly connect to accomplices.
No links exist within the accomplices or within the fraudsters.

Figure from: Duen Horng Chau et al., Detecting Fraudulent Personalities in Networks of Online Auctioneers

Bipartite core

Bipartite Core Attacks
20 fraudsters and 30 accomplices.
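A quick spectral check of the bipartite core, assuming an idealized core in which every fraudster links to every accomplice (real cores are usually not complete) and assuming numpy. The complete bipartite graph K_{f,a} has only two nonzero eigenvalues, plus and minus sqrt(f*a), so 20 fraudsters and 30 accomplices give about +24.49 and -24.49. This is an illustration of why such a core can stand out spectrally, not the detection method used in the paper; the function name is hypothetical.

```python
import numpy as np

def bipartite_core(f, a):
    """Adjacency matrix of an idealized complete bipartite core: fraudsters
    0..f-1 each link to every accomplice f..f+a-1; no links inside either group."""
    A = np.zeros((f + a, f + a))
    A[:f, f:] = 1.0
    A[f:, :f] = 1.0
    return A

A = bipartite_core(20, 30)
vals = np.linalg.eigvalsh(A)  # ascending order
print(vals[0], vals[-1])      # approximately -sqrt(600) and +sqrt(600)
```

The large negative eigenvalue paired with the positive one is the spectral signature of a bipartite structure; a dense community without the bipartite pattern would not produce that symmetric pair.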

DDoS attacks
The attacker controls 10% of the normal nodes to attack one victim node.

Conclusion
Present a framework that exploits the spectral space of graph topology to detect attacks.
Theoretical analysis showed that attackers are located in a different region of the spectral space from regular nodes.
Develop the SPCTRA algorithm for detecting RLAs.
Demonstrate its effectiveness and efficiency through empirical evaluation.

Future Work
Explore other attack scenarios in both social networks and communication networks.
In Sybil attacks, attackers may choose victims purposely rather than randomly.
Track how the graph evolves dynamically.

Questions?

Acknowledgments
This work was done in collaboration with Xiaowei Ying and Daniel Barbara, and was supported in part by U.S. National Science Foundation grants IIS-0546027, CNS-0831204, and CCF-1047621.
Thank You!

Another Example

Adjacency Eigenspace

Spectral coordinate: Ying and Wu SDM09

Polbook Network
