Visually Analyzing People with Graphs

Post on 07-Jan-2017


Visually Analyzing People

Leo Meyerovich (@LMeyerov), CEO

Graphistry is: Supercharging visual analytics through GPU cloud streaming.

(We tricky graphs.)

CASE STUDY: TWITTER FRAUD

Naïve layouts on 1K+ node graphs give impenetrable hairballs.

Gauss-Seidel Force-Directed Graph, O(N^2) n-body, GPU
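As a rough illustration of the O(N^2) approach named on this slide, here is a minimal CPU sketch in Python. It assumes Fruchterman-Reingold-style forces (repulsion k²/d between every pair of nodes, attraction d²/k along edges); the function name `force_layout`, the constants, and the cooling schedule are illustrative assumptions, not the GPU implementation behind the slides. Positions are updated in place as each node is visited, which is the Gauss-Seidel flavor of the iteration.

```python
import math
import random

def force_layout(num_nodes, edges, iterations=100, seed=0):
    """Naive force-directed layout with Gauss-Seidel style in-place updates."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(num_nodes)]
    neighbors = [[] for _ in range(num_nodes)]
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    k = 1.0 / math.sqrt(num_nodes)  # assumed ideal edge length
    step = 0.1                      # assumed maximum move per update
    for _ in range(iterations):
        for i in range(num_nodes):
            fx = fy = 0.0
            # Repulsion from every other node: this loop is the O(N^2) n-body cost.
            for j in range(num_nodes):
                if j == i:
                    continue
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                dist = math.hypot(dx, dy) or 1e-9
                fx += (k * k / dist) * dx / dist
                fy += (k * k / dist) * dy / dist
            # Attraction toward graph neighbors only.
            for j in neighbors[i]:
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                dist = math.hypot(dx, dy) or 1e-9
                fx -= (dist * dist / k) * dx / dist
                fy -= (dist * dist / k) * dy / dist
            # Move node i right away, so later nodes see its new position.
            mag = math.hypot(fx, fy)
            if mag > 0:
                scale = min(mag, step) / mag
                pos[i][0] += fx * scale
                pos[i][1] += fy * scale
        step *= 0.97  # cool down so the layout settles
    return pos

# Two triangles with no edges between them (made-up example graph).
example_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
example_positions = force_layout(6, example_edges, seed=1)
```

Approximating distant nodes in aggregate (for example with a Barnes-Hut tree) is the usual way to bring the repulsion pass down to O(n log n).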

Node: Twitter account
Edge: Friendship

Friends and friend-of-friends of a bot who randomly messaged real people and retweeted them.

Even on a small graph (77 nodes), smart design starts adding clarity

With smart layouts, fake account clusters pop out
ForceAtlas2 Layout, O(n log n) n-body, GPU

The spambot is an entry point to more bots…

Obviously fake account names

A quiet small business who buys virtual game currency from gamers…

Who somehow got exactly 1 message massively trended & advertised by Twitter

(Figure labels: spammer, laundering accounts, bot retweet network)

It’s a “retweet laundering” botnet! It tricks Twitter into targeting gamers to check out a cyberfraud site. They steal gamers’ money and identities.

Relationships hard to see without graphs with smart layouts & interactions.

Next step: explore the time dimension
Ex: how do mobs launch from Twitter?

Leo A. Meyerovich, @lmeyerov, Graphistry
Ariel S. Rabkin, @asrabkin, Cloudera

THE SOCIOLOGY OF PROGRAMMING LANGUAGES
adoption

http://hammerprinciple.com/therighttool

~14,000 developers

Fastest? C > Java > JavaScript > Pascal
Safest? Java > Pascal > JavaScript > C

Goal: Rank Beliefs

Programmers won’t agree on ranking…

Idea: Chess Ranking

Let’s run a competition for the friendliest language! (Glicko2)

Each survey response is a game match:
1. Person A says Python beats C in friendliness
2. Person A says Java beats C in friendliness
3. Person B says C beats APL in friendliness
…

Score Points set by a Bookie
Every language starts with rank 1000

1. “Person A: Python friendlier than C”
   Python’s rank goes up

2. “Person B: Python friendlier than C”
   Python already > C, less valuable win

3. “Person C: Haskell friendlier than Python”
   Problem: little known about Haskell (“sparse”)
   Haskell beat a high-rank language: big level increase!

(Bayesian!)
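The three matches above can be sketched with a plain Elo update. This is a deliberate simplification: the slides name Glicko2, which also tracks how uncertain each rating is (the “sparse” Haskell case), and that part is not modeled here. The function names, the K-factor of 32, and the 400-point scale are conventional Elo choices assumed for illustration; only the starting rank of 1000 comes from the slide.

```python
def expected_score(rating_a, rating_b):
    """Elo's estimate of the chance that A beats B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def rank_languages(matches, start=1000.0, k=32.0):
    """Replay (winner, loser) survey responses as rated matches."""
    ratings = {}
    for winner, loser in matches:
        rw = ratings.setdefault(winner, start)
        rl = ratings.setdefault(loser, start)
        # An expected win earns few points; an upset earns many.
        surprise = 1.0 - expected_score(rw, rl)
        ratings[winner] = rw + k * surprise
        ratings[loser] = rl - k * surprise
    return ratings

matches = [
    ("Python", "C"),        # Person A: Python friendlier than C
    ("Python", "C"),        # Person B: same claim, so a less valuable win
    ("Haskell", "Python"),  # Person C: Haskell beats a higher-ranked language
]
ratings = rank_languages(matches)
```

Under these assumptions Python gains 16 points from its first win and fewer from the second, while Haskell gains more than 16 for beating a language ranked above it.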

Many Tournaments = Correlation Matrix!

Language x Belief

Cluster (K-Means)

Reduce Dimensionality: Pick fun languages & cluster centers

Graphs are (Adjacency) Matrices

Correlation Matrices are Fuzzy Graphs

(Figure: example graph with edges weighted 0.5.)

Weak Edges Are Annoying!

Filter: Only Show Strong Relationships
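One way to picture the filter: treat each pair of beliefs as a candidate edge weighted by their correlation, and keep only pairs whose correlation is strong in either direction. The sketch below assumes Pearson correlation and a cutoff of 0.8; the belief names and scores are made-up illustrative values, not data from the survey.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / (spread_x * spread_y)

def strong_edges(columns, threshold=0.8):
    """Return (a, b, r) for every pair whose |correlation| meets the threshold."""
    names = sorted(columns)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                edges.append((a, b, r))
    return edges

# Hypothetical belief scores, one entry per language (illustrative only).
beliefs = {
    "fast": [1, 2, 3, 4],
    "low_level": [2, 4, 6, 8],
    "friendly": [4, 3, 2, 1],
    "mixed": [2, 1, 2, 1],
}
edges = strong_edges(beliefs)
```

With this toy data, "mixed" correlates only weakly with the other three beliefs, so it ends up with no edges and drops out of the picture.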

Relationships hard to see without graphs with smart layouts & interactions.

Step 2 of analysis is to correlate (step 1 is to count).

Correlations are relationships, so explore them as graphs!


200K Projects (2000-2010) [PLATEAU 2013]

Popularity Across Niches

(Figure: popularity across 223 project categories, shown for Java (y-axis -20% to 60%) and for Scheme (y-axis 0% to 4%). Callouts: blogging, search, build tools.)


Popularity vs. Niche: Dispersion

(Figure: popularity plotted against dispersion across niches (σ / μ), with the dispersion axis running from 0 to 4.5. Labeled languages: Prolog, VBScript, Scheme, Fortran, PL/SQL, Assembly, C#, Java.)
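The dispersion measure σ / μ is the coefficient of variation of a language's popularity across project categories. A minimal sketch, assuming the population standard deviation is intended (the slides do not say which variant was used); the function name `dispersion` and the sample shares are hypothetical.

```python
import statistics

def dispersion(shares):
    """Coefficient of variation (sigma / mu) of per-category popularity."""
    mean = statistics.mean(shares)
    if mean == 0:
        return 0.0  # language unused everywhere: define dispersion as 0
    return statistics.pstdev(shares) / mean

# Made-up shares over four categories (illustrative only).
even_use = dispersion([2, 2, 2, 2])   # same share in every category
niche_use = dispersion([0, 0, 0, 8])  # all use concentrated in one category
```

A language used evenly across categories scores 0, and one concentrated in a single niche scores higher, so the measure separates general-purpose from niche use independently of overall popularity.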


Language Use (survey)

(Figure: proportion of projects for each language against language rank (decreasing), with both axes on logarithmic scales.)

Java: winner takes all

Long Tail

Design for niches and grow


Survey of 1,679 Developers

Extrinsic factors dominate!

(on last project)

FUTURE STEP: Now that we’ve counted things, let’s correlate them!

Topics in Free-form Responses
Answer Correlations

Relationships hard to see without graphs with smart layouts & interactions.

Step 2 of analysis is to correlate (step 1 is to count).

Correlations are relationships, so explore them as graphs!

Powerful because correlations are everywhere: raw features, inferred topics, …


We’re Hiring Designers! (and contact us if you have interesting graphs)
info@graphistry.com
Twitter: @LMeyerov