
Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

Date post: 18-Feb-2017
Upload: paragonscienceinc
Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs Steve Kramer, Ph.D. President & Chief Scientist Paragon Science, Inc. September 2015 Copyright © 2006-2015 Paragon Science, Inc. All rights reserved.
Transcript
Page 1: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs
Steve Kramer, Ph.D.
President & Chief Scientist
Paragon Science, Inc.
September 2015

Copyright © 2006-2015 Paragon Science, Inc. All rights reserved.

Page 2: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

Overview
• Background Information about Paragon Science
• Example 1: Ebola Twitter Analysis 2014
• Example 2: Stock Market Analysis via Twitter
• Q & A

Paragon Science, Inc. 2

Page 3: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

About Paragon Science

Advisory Board Company
• Analysis of Healthcare Data

Digital Motorworks/CDK Global
• Vehicle Pricing Analytics

Houston Law Firm
• Email Analysis for Patent Lawsuit

Place IQ
• Mobile Phone Data Analysis

RetailMeNot
• Web Analytics for Online Coupons

Vast.com
• Web User Click Patterns

Founder: Dr. Steve Kramer
• PhD in computational physics (nonlinear dynamics)
• Self-funded data science entrepreneur
• 22 years of research and high-tech experience
• Manager and consultant at software companies
• Reviewer for scientific journals and conferences
• Member of StartOut Austin steering committee

 http://affinityincmagazine.com/paragon-science-puts-patented-technology-to-work-for-range-of-clients/

Page 4: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

What Are We Doing?

Using our patented anomaly detection software to find the “unknown unknowns”: unusual changes that represent revenue opportunities to exploit or risks to mitigate

Many possible application areas:
• Social media alerting and sentiment change detection
• Pricing and market trend analysis and alerting
• Fraud prevention (banking, insurance, online auctions, …)

Key advantages
• No machine learning or training required
• Robust to missing or erroneous data
• Highly scalable and parallelizable

Page 5: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

How Is It Done Today?

Existing approaches
• Standard SNA metrics
• Rule-based systems (transaction profiling, etc.)
• Bayesian and other statistical/probabilistic models
• Machine learning tools (neural nets, HMMs, etc.)

Some limitations of existing methods
• Training requirements can be large for neural nets.
• For rule-based systems, it is difficult to effectively predict or define new “bad” anomalies or patterns in advance.
• Many current methods are not scalable to real-world operational requirements.

Page 6: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

What Is New in Our Patented Approach?

A powerful anomaly detection approach that incorporates nonlinear time series analysis methods
• US Patent #8738652 (1.usa.gov/1kkyVD9), “Systems and Methods for Dynamic Anomaly Detection”

Key questions answered:
• Which entities behave or evolve differently than others in the data set?
• Which entities have shifted their behavior unexpectedly?

Page 7: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

What Is New in Our Approach? (Cont’d.)

Our framework inherently captures the dynamics of the entities under study, without having to specify in advance normal vs. abnormal behavior.

We can simultaneously analyze the time evolution of
• Network structures
• Any associated attributes (text terms, geospatial position, etc.)

Our technique is robust with respect to missing or erroneous data.

As a result, we can
• Find key players in rapidly changing networks
• Provide early warning of viral videos and online documents
• Focus attention on the most-anomalous events or transactions

Page 8: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

Dynamic Anomaly Detection Overview

A general approach that incorporates nonlinear time series analysis methods
• Complexity measures
• Finite-time Lyapunov exponents (FTLEs)

Input data
• Communications or transactional data streams
• General time-dependent data sets

Key questions
• Which entities behave or evolve differently than others in the data set?
• Which entities have shifted their behavior unexpectedly?

Page 9: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

Finite-Time Lyapunov Exponents (FTLEs)

General dynamical system

Flow map
• Advects points in the state space
• Describes the time evolution of the system

Page 10: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

Finite-Time Lyapunov Exponents (FTLEs)

FTLEs characterize the amount of stretching or contraction about a point x0 during a time interval T
• Stability
• Predictability

Definition
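The equations for the dynamical system, the flow map, and the FTLE are not present in this transcript. For reference, the standard textbook forms are sketched below. The symbols f, φ, Δ, and σ are the conventional ones and are an assumption here; the original slides may have used different notation.

```latex
\begin{align}
\dot{x}(t) &= f\bigl(x(t), t\bigr), \qquad x(t_0) = x_0
  && \text{(general dynamical system)} \\
\phi_{t_0}^{t_0+T}(x_0) &= x(t_0 + T;\, t_0, x_0)
  && \text{(flow map)} \\
\Delta &= \left(\frac{d\phi_{t_0}^{t_0+T}(x_0)}{dx_0}\right)^{\!\top}
          \frac{d\phi_{t_0}^{t_0+T}(x_0)}{dx_0}
  && \text{(Cauchy-Green deformation tensor)} \\
\sigma_{t_0}^{T}(x_0) &= \frac{1}{|T|}\,\ln\sqrt{\lambda_{\max}(\Delta)}
  && \text{(FTLE)}
\end{align}
```

Under this definition, a positive σ means that trajectories starting near x0 separate over the interval T (lower predictability), while a negative σ means they contract toward each other.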

Page 11: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

Derived Jacobian Vectors

Similarly, characteristic vectors derived from the flow map’s Jacobian can describe the generalized directions of the local stretching or contraction.

Possible derivation approaches:
• Weight-based column sampling
• Singular value decomposition (SVD)
• Principal component analysis (PCA)
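As an illustration of the SVD option, the sketch below estimates the Jacobian of a flow map by central differences and extracts the largest stretching rate and its direction. It is a minimal toy example, not Paragon's patented method: the linear saddle system, the function names, and the 2-D closed-form eigen-solution are all assumptions chosen to keep the example dependency-free. The largest eigenvalue of JᵀJ is the square of the largest singular value of J, and its eigenvector is the corresponding right singular vector.

```python
import math


def saddle_flow_map(x0, y0, T):
    """Flow map of the toy system dx/dt = x, dy/dt = -y (hypothetical example)."""
    return (x0 * math.exp(T), y0 * math.exp(-T))


def jacobian_fd(flow_map, x0, y0, T, h=1e-6):
    """Central-difference Jacobian of the flow map with respect to (x0, y0)."""
    fxp = flow_map(x0 + h, y0, T)
    fxm = flow_map(x0 - h, y0, T)
    fyp = flow_map(x0, y0 + h, T)
    fym = flow_map(x0, y0 - h, T)
    return [
        [(fxp[0] - fxm[0]) / (2 * h), (fyp[0] - fym[0]) / (2 * h)],
        [(fxp[1] - fxm[1]) / (2 * h), (fyp[1] - fym[1]) / (2 * h)],
    ]


def ftle_and_direction(J, T):
    """Return (FTLE, unit vector of maximal stretching) for a 2x2 Jacobian J."""
    # Entries of the symmetric matrix C = J^T J = [[a, b], [b, d]].
    a = J[0][0] ** 2 + J[1][0] ** 2
    b = J[0][0] * J[0][1] + J[1][0] * J[1][1]
    d = J[0][1] ** 2 + J[1][1] ** 2
    # Largest eigenvalue of C (closed form for a symmetric 2x2 matrix).
    lam_max = (a + d) / 2 + math.hypot((a - d) / 2, b)
    ftle = math.log(math.sqrt(lam_max)) / abs(T)
    # Eigenvector of C for lam_max, with fallbacks for degenerate cases.
    vx, vy = lam_max - d, b
    if math.hypot(vx, vy) == 0.0:
        vx, vy = b, lam_max - a
    if math.hypot(vx, vy) == 0.0:
        vx, vy = 1.0, 0.0  # isotropic stretching: any direction works
    norm = math.hypot(vx, vy)
    return ftle, (vx / norm, vy / norm)


J = jacobian_fd(saddle_flow_map, 0.5, 0.5, T=2.0)
ftle, direction = ftle_and_direction(J, T=2.0)
print(ftle, direction)
```

For this toy saddle the stretching is along the x-axis at rate 1 per unit time, so the FTLE should come out close to 1 with direction (1, 0). For higher-dimensional data a library SVD routine would replace the closed-form 2x2 step.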

Page 12: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

Paragon Dynamic Anomaly Detection

[Flowchart. Box labels:]
• Representation of Data at t = ti
• Cluster Resolution
• Feature Vector Encoding
• Outlier Detection at t = ti
• 3+ Time Intervals? (Yes / No)
• Clustering / Segmentation
• Dynamic Anomaly Detection
• Nonlinear Time Series Analysis (FTLEs, Dynamic Thresholds, etc.)
• Pattern Classification
• Outlier Detection
• Domain-Specific Filtering (Threat Signatures, Risk Profiles, etc.)

Page 13: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

Example 1: Ebola Twitter Analysis 2014

Sample data set from Twitter API collected using twittertap
• Date range: 11/8/2014 – 11/16/2014
• 2,541,812 tweets
• 4,708,678 generated links with hashtags, URLs, and user replies

Research plan
• Perform k-core decomposition
• Run anomaly detection software on sub-networks of nodes in the central core to find the most influential users and most viral URLs
• Carry out community detection and topic detection

Page 14: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

Twitter-Induced Social Networks

[Diagram. Nodes: User A, User B, User C; URL 1, URL 2; Hash Tag 1, Hash Tag 2. Edge labels: replies to, mentions, references, uses.]
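A graph of this kind can be assembled by turning each tweet into a handful of typed edges. The sketch below is one possible way to do that with regular expressions. The function name, the regex patterns, and the assignment of the “references” and “uses” labels to URL and hashtag edges are assumptions for illustration; the slides do not specify how the links were generated.

```python
import re

# Simple patterns; real tweets would normally be parsed from the API's
# structured "entities" fields rather than from raw text.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@(\w+)")
HASHTAG_RE = re.compile(r"#(\w+)")


def tweet_edges(user, text, in_reply_to=None):
    """Return a list of (source, label, target) edges for one tweet."""
    edges = []
    if in_reply_to is not None:
        edges.append((user, "replies to", in_reply_to))
    urls = URL_RE.findall(text)
    # Remove URLs first so that '#' or '@' inside a URL is not miscounted.
    stripped = URL_RE.sub(" ", text)
    for name in MENTION_RE.findall(stripped):
        edges.append((user, "mentions", name))
    for tag in HASHTAG_RE.findall(stripped):
        edges.append((user, "uses", "#" + tag))
    for url in urls:
        edges.append((user, "references", url))
    return edges


# Hypothetical tweet, for illustration only.
example = tweet_edges(
    "userA",
    "@userB see http://example.com/x #Ebola",
    in_reply_to="userB",
)
for edge in example:
    print(edge)
```

Accumulating these triples over all tweets, one time window at a time, gives the time-stamped edge lists that a network analysis tool can consume.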

Page 15: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

K-core Decomposition

The k-core of a graph is a maximal subgraph in which each vertex has degree at least k.
• The coreness of a vertex is k if it belongs to the k-core but not to the (k+1)-core.
• The k-core decomposition is performed by recursively removing all the vertices (along with their respective edges) that have degrees less than k.

The k-core decomposition of a network can be very effective in identifying the individuals within a network who are best positioned to spread or share information.
• M. Kitsak, et al., “Identifying influential spreaders in complex networks,” arXiv:1001.5285v1 [physics.soc-ph] (2010).
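The recursive-removal procedure can be sketched in a few lines of plain Python. This is a minimal, unoptimized illustration of the definition above, not the Lanet-vi implementation used for the figures; the function names and the toy graph are assumptions. NetworkX also provides a `core_number` function that returns the same quantity for larger graphs.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map {vertex: set(neighbors)} from edge pairs."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def core_numbers(adjacency):
    """Return {vertex: coreness} by repeatedly peeling low-degree vertices."""
    # Copy the neighbor sets so the caller's graph is left untouched.
    remaining = {v: set(nbrs) for v, nbrs in adjacency.items()}
    coreness = {}
    k = 0
    while remaining:
        # Vertices with degree < k + 1 cannot belong to the (k+1)-core.
        low = [v for v, nbrs in remaining.items() if len(nbrs) < k + 1]
        if not low:
            # Everything left forms the (k+1)-core; move to the next shell.
            k += 1
            continue
        for v in low:
            coreness[v] = k
            for u in remaining.pop(v):
                remaining[u].discard(v)
    return coreness


# Toy graph: a 4-clique (a, b, c, d) with a two-vertex tail d - e - f.
toy_edges = [
    ("a", "b"), ("a", "c"), ("a", "d"),
    ("b", "c"), ("b", "d"), ("c", "d"),
    ("d", "e"), ("e", "f"),
]
toy_coreness = core_numbers(build_adjacency(toy_edges))
print(toy_coreness)
```

In the toy graph the clique vertices end up in the innermost shell (coreness 3) and the tail vertices in the outer shell (coreness 1), which mirrors how the central core of a large network is isolated.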

Page 16: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

K-Core Decomposition of the Ebola Network


http://sourceforge.net/projects/lanet-vi/

Page 17: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

Central Core of the Ebola Network


Page 18: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

Top URLs in the Central Core

URL | K Shell | Degree
http://goo.gl/pFg3Z2 | 49 | 279
http://goo.gl/BFEUgy | 49 | 233
http://goo.gl/S37kHT | 49 | 212
http://goo.gl/silISF | 47 | 364
http://invst.rs/7MKWHB | 22 | 779
http://cnn.it/1wlIlUe | 22 | 741
http://trib.al/YKSMCSN | 22 | 734
http://nyp.st/136BPG3 | 22 | 698
http://nypost.com/2014/10/29/cdc-admits-droplets-from-a-sneeze-could-spread-ebola/ | 22 | 415
http://fxn.ws/1oVgLwc | 22 | 406

Page 19: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

Top-Ranked Website (URLs 1, 2, and 4)

UMA MENTIRA CHAMADA ,,EBOLA,, VEJAM !!! | NOTICIÁRIO DA WEB

A statement made by a man in Ghana called Nana Kwame rocked the internet in recent days. The following information has to reach people. We need to see the Ebola for what it really is. It's time to wake up the world agenda behind this whole story.

Follow what this man has to say about what is happening in their country of origin:

People in the world need to know what is happening here in West Africa. They are lying! The “Ebola” as a virus does not exist and is not contagious. The Red Cross brought a disease to four specific countries, for four specific reasons and is only contracted by those who receive treatments and injections of the Red Cross. That's why Liberians and Nigerians began to expel the Red Cross in their countries!

Page 20: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

5th Ranked Website


Page 21: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

6th Ranked Website


Page 22: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

Topic Detection in the Ebola Twitter Network

[Diagram. Nodes: User A, User B, User C; URL 1, URL 2; Term 1, Term 2, Term 3, …, Term N; Topic 1, Topic 2, …, Topic M. Edge labels: replies to, mentions, references.]

Page 23: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

Applicable “Soft” Clustering Methods

K-Groups/Group Discovery Algorithm (GDA)
• J. Kubica, A. Moore, and J. Schneider, “Tractable group detection on large link data sets,” The Third IEEE International Conference on Data Mining (2003).

Clique Percolation (http://www.cfinder.org/)
• G. Palla, et al., “Uncovering the overlapping community structure of complex networks in nature and society,” Nature, 435, p. 814 (2005).

Louvain Modularity Optimization
• V. Blondel, et al., “Fast unfolding of communities in large networks,” Journal of Statistical Mechanics: Theory and Experiment, 10, P10008 (2008).
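To make the Louvain entry more concrete, the sketch below computes the modularity score Q that the Louvain method tries to maximize, for a given assignment of vertices to communities. It only evaluates a partition; it does not perform the optimization itself. The function name and the toy graph are assumptions for illustration, and the formula is the standard one for undirected, unweighted graphs.

```python
def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of  L_c / m  -  (d_c / (2 m))^2,
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the total degree of the vertices in c.
    """
    m = len(edges)
    internal = {}    # community -> number of internal edges
    degree_sum = {}  # community -> summed degree of its vertices
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Toy graph: two triangles joined by a single bridge edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_group = {v: "A" for v in range(6)}

q_two = modularity(toy_edges, two_groups)
q_one = modularity(toy_edges, one_group)
print(q_two, q_one)
```

Splitting the toy graph into its two triangles scores higher than lumping everything together, which is the kind of comparison a modularity optimizer makes repeatedly while merging and moving vertices.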


Page 24: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

Summary of Top 200 Topic Anomalies

Topic | Peak Start Time | Peak End Time | Max Change Metric | # Anomalies
Topic 99 | 2014-11-06 06:18 | 2014-11-12 10:18 | 2.97 | 40
Topic 8 | 2014-11-05 20:18 | 2014-11-07 07:18 | 2.891 | 34
Topic 59 | 2014-11-06 20:18 | 2014-11-11 19:18 | 2.43 | 28
Topic 1 | 2014-11-05 17:18 | 2014-11-05 19:18 | 2.32 | 3
Topic 52 | 2014-11-05 17:18 | 2014-11-05 18:18 | 2.30 | 2
Topic 50 | 2014-11-05 19:18 | 2014-11-06 15:18 | 2.22 | 11
Topic 32 | 2014-11-05 18:18 | 2014-11-05 19:18 | 2.18 | 2
Topic 20 | 2014-11-05 20:18 | 2014-11-06 02:18 | 2.11 | 7
Topic 2 | 2014-11-07 07:18 | 2014-11-12 16:18 | 2.10 | 33
Topic 28 | 2014-11-05 20:18 | 2014-11-05 22:18 | 2.00 | 3
Topic 29 | 2014-11-08 02:18 | 2014-11-12 18:18 | 1.96 | 21
Topic 97 | 2014-11-06 09:18 | 2014-11-07 03:18 | 1.91 | 4
Topic 30 | 2014-11-05 20:18 | 2014-11-05 20:18 | 1.84 | 1
Topic 22 | 2014-11-05 23:18 | 2014-11-06 02:18 | 1.79 | 4
Topic 18 | 2014-11-05 17:18 | 2014-11-05 17:18 | 1.65 | 1
Topic 15 | 2014-11-05 19:18 | 2014-11-05 19:18 | 1.63 | 1
Topic 4 | 2014-11-08 14:18 | 2014-11-12 15:18 | 1.61 | 5

Page 25: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

Key Sites Related to Top 5 Ebola Topic Anomalies

Topic 99
• Max Change Metric: 2.973
• Peak Datetime: 2014-11-06 17:18:27
• Top Related URL Title: FACT SHEET: Emergency Funding Request to Enhance the U.S. Government’s Response to Ebola at Home and Abroad | The White House

Topic 8
• Max Change Metric: 2.888
• Peak Datetime: 2014-11-05 20:18:27
• Top Related URL Title: BBC News - Ebola outbreak: Barack Obama 'to ask Congress for $6bn'

Topic 59
• Max Change Metric: 2.426
• Peak Datetime: 2014-11-07 02:18:27
• Top Related URL Title: » Obama Caught Ordering Press to Cover Up Ebola Alex Jones' Infowars: There's a war on for your mind!

Topic 1
• Max Change Metric: 2.321
• Peak Datetime: 2014-11-05 17:18:27
• Top Related URL Title: UMA MENTIRA CHAMADA ,,EBOLA,, VEJAM !!! | NOTICIÁRIO DA WEB

Topic 52
• Max Change Metric: 2.296
• Peak Datetime: 2014-11-05 17:18:27
• Top Related URL Title: Nigeria Property: Ebola Virus Originated From US Bio-warfare Labs In West Africa – American Prof

Page 26: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

Example: Topic 99 URL-to-User Links


Page 27: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

Topic 99a: Economic Consequences


Page 28: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

Topic 99b: Mobile Data to Prevent Ebola


Page 29: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

Topic 99c: ISIS and Ebola


Page 30: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

Topic 99d: @ebolafiles (Twitter user)


Page 31: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

Topic 99e: Emergency Funding Request


Page 32: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

Topic 99f: Follow Ebola


Follow Ebola | Updated every second & see what the #CDC & #WHO is not telling you about #Ebola

Page 33: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

Overview
• Background Information about Paragon Science
• Example 1: Ebola Twitter Analysis 2014
• Example 2: Stock Market Analysis via Twitter
• Q & A

Page 34: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

Twitter Stock Market Data Set
• Date range: August 5-29, 2015
• 175,246 tweets sent by 28,754 users
• Network graph generated includes these links:
  ◦ symbol links to URL: 430,842 (74,034 distinct URLs)
  ◦ user links to URL: 149,117
  ◦ user mentions user: 74,247
  ◦ user references hash tag: 176,670
  ◦ user references symbol: 501,165
  ◦ user replies to user: 10,698

Goal:
• Identify key influencers and emerging topics that could influence prices
• Provide high-quality input for Moodzee predictive models

Page 35: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

Twitter Stock Market Graph for August 2015


Page 36: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

Twitter Stock Market Graph (Zoom 1)


Page 37: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

Twitter Stock Market Graph (Zoom 2)


Page 38: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

Identifying Key Influencers

Perform k-core decomposition

Results:
• 50 k-shells
• 102 users at the center of the network
• Examine stock symbol -> URL links for the central users using uncertainty scores for the content of the web pages

Twitter User | # Links
DayTradersGroup | 855
diggingplatinum | 652
Benzinga | 261
WrigleyTom | 203
SeekingAlpha | 182
OpenOutcrier | 126
theflynews | 125
WallStJesus | 119
Istock8 | 96
valuewalk | 93

Page 39: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

Network of 102 Central Users and 2910 Neighbors


Page 40: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

Network of 102 Central Users and Neighbors (Zoom 1)


Page 41: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

Network of 102 Central Users and Neighbors (Zoom 2)


Page 42: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

Using Financial Sentiment Scores: Uncertainty

Web Page Title: Predicting Is Hard Business | Seeking Alpha
URL(s): http://seekingalpha.com/article/3422496-predicting-is-hard-business?source=feed_f69
Uncertainty:

Web Page Title: In Today's Overheated Market, Control Risk In Your Retirement Portfolios With Sound Valuation | Seeking Alpha
URL(s): http://seekingalpha.com/article/3455116-in-todays-overheated-market-control-risk-in-your-retirement-portfolios-with-sound-valuat
Uncertainty: 63

Web Page Title: Comments On The Market Correction; Focus On Biotechs: Large Caps - Regeneron Pharmaceuticals, Inc. (NASDAQ:REGN) | Seeking Alpha
URL(s): http://seekingalpha.com/article/3468626-comments-on-the-market-correction-focus-on-biotechs-large-caps?source=feed_f
Uncertainty: 55

Web Page Title: TradingView: Free Stock Charts and Forex Charts Online.
URL(s): http://www.tradingview.com
Uncertainty: 51

Web Page Title: A MASSIVE New Platinum Pick Is Being Released At 9:30 am Today! Get On The List For Early Access To This New Play. | Blog
URL(s): http://tinyurl.com/oea3bjx, http://tr.im/oCRrP, http://bit.ly/1JhlgVb
Uncertainty: 49

Web Page Title: Our Pick On VGTL Has Gained 242.86% For Our Subscribers, In 2 Months! | Blog
URL(s): http://bit.ly/1OOMiY9, http://tr.im/6hNJf
Uncertainty: 47

Web Page Title: After 550% Gains On Our Picks In 5 Weeks, We Have A Major New Pick Coming Tomorrow! It is ONLY being released to Platinum Members Tomorrow, So Go Platinum To Get It Early! | Blog
URL(s): http://ow.ly/QrGNn
Uncertainty: 47

Web Page Title: Our Picks Gained Over 550% In The Past Month! And We Have A MASSIVE New Pick Coming To Our Platinum Members! Subscribe To Get It Early. | Blog
URL(s): http://bit.ly/1UjdodT, http://goo.gl/r34fP7, http://tr.im/mZn9y
Uncertainty: 47

Web Page Title: Our Pick On VGTL Has Gained 242.86% For Our Subscribers, In 2 Months! | Blog
URL(s): http://tinyurl.com/qjwxxwk
Uncertainty: 47

Web Page Title: What To Find Before Seeking Alpha: Position Size | Seeking Alpha
URL(s): http://seekingalpha.com/article/3444516-what-to-find-before-seeking-alpha-position-size?source=twitter_sa_factset
Uncertainty: 37

Loughran and McDonald Financial Sentiment Dictionaries: Tim Loughran and Bill McDonald, 2011, “When is a Liability not a Liability? Textual Analysis, Dictionaries, and 10-Ks,” Journal of Finance, 66:1, 35-65
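A dictionary-based uncertainty score can be sketched as a count of page words that appear in an uncertainty word list. The example below is only an illustration of that idea: the tiny word set is a hypothetical stand-in and not the actual Loughran and McDonald list, and the slides do not state how the scores in the table were scaled, so a raw count is assumed.

```python
import re

# Hypothetical stand-in word list; the real Loughran-McDonald uncertainty
# dictionary is much larger and should be loaded from its published file.
UNCERTAINTY_WORDS = {
    "may", "might", "could", "possibly", "uncertain",
    "uncertainty", "risk", "risks", "approximately",
}


def uncertainty_score(text, lexicon=UNCERTAINTY_WORDS):
    """Count the tokens of `text` that appear in the uncertainty lexicon."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(1 for token in tokens if token in lexicon)


# Hypothetical sentences, for illustration only.
hedged = "Prices may fall, and the outlook could remain uncertain."
plain = "Stocks rose today."
print(uncertainty_score(hedged), uncertainty_score(plain))
```

Ranking pages by such a score, and tracking how the score changes over time for the URLs linked to each stock symbol, is one way the symbol -> URL links could be fed into an anomaly detector.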

Page 43: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

Anomaly Scores for Symbols -> URL Links


Largest jump in the anomaly scores: $BIDU on 8/13/2015

Page 44: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

$BIDU Network at First Uncertainty Surge


Page 45: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

Topic Detection in the Twitter URL Network

[Diagram. Nodes: User A, User B, User C; URL 1, URL 2; Term 1, Term 2, Term 3, …, Term N; Topic 1, Topic 2, …, Topic M. Edge labels: replies to, mentions, references.]

Page 46: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

Topic Detection: Network of 698 Web Pages Shared by 102 Central Users


215 topics detected

Page 47: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

Network of 698 Web Pages Shared by 102 Central Users


Page 48: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

Network of 698 Web Pages Shared by 102 Central Users


Nodes colored by topic #

Page 49: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

Web Site Titles in Largest Topic


SPY ETF Turns Negative For Year Before Clawing Back - Investors.com

$TSLA $GE $JCP $JWN $LOCO $KING $DD $JPM $AMAT $BAC $CBK: Stocks to Watch: Tesla, GE, JC Penney, Nordstrom | Stock News Hour

$AAPL Apple has completed a 6-month complex H&S top

$MU $SYMC $AAPL $ATML $SYNA $QLGC $CRUS $FCS $YHOO $BABA $AKAM $FSLR: It’s Not Just Apple: Yahoo!, Micron, Synaptics Fall on China Fears | Stock News Hour

$GS $NVDA $BRCM $MU $SWKS $QCOM $INTC $WYNN $AAPL $YHOO $CAT $GM $T $VZ: China Damage Spreading | Stock News Hour

$GOOGL $CAT $AAPL $SHAK $KHC $TW $JASO $RRGB $CSC $SYMC $CREE: Investors eye positive catalysts in oil, Google | Stock News Hour

$GOOGL $PCLN $CTRP $BIDU $FB $AMZN $BABA $EXPE $LONG $QUNR $AWAY: The only US Web company that’s figured out China | Stock News Hour

Page 50: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

New Partner Company: Moodzee

Text analytics for financial markets
• Predictive models
• Advance warning of price-moving events

Initial target users: Hedge funds

Price correlations done; now back-testing, then paper trading, then real trading

[Diagram labels: Alerts, Correlation Analysis, Downloader, Sentiment, Price-Movers, Anomalies]

Page 51: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

What Are the Payoffs?
• Find the “unknown unknowns” in dynamic data sets
• Quickly identify key influencers and trends in online networks
• Provide early warning of viral videos, anomalous web events, or unusual network traffic
• Enable enhanced business intelligence without having to specify normal vs. abnormal behavior in advance

Page 52: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

Third-Party Software Acknowledgements

Paragon Science gratefully acknowledges the following researchers and software providers:
• Cytoscape (http://www.cytoscape.org/)
• dynnetwork Cytoscape plugin (https://code.google.com/p/dynnetwork/)
• Lanet-vi (http://sourceforge.net/projects/lanet-vi/)
  ◦ J. Alvarez-Hamelin, et al., “Understanding Edge Connectivity in the Internet through Core Decomposition,” Internet Mathematics 7 (1): 45–66, 2011.
• Louvain community detection software (http://perso.crans.org/aynaud/communities/)
  ◦ V. Blondel, et al., “Fast Unfolding of Communities in Large Networks,” Journal of Statistical Mechanics: Theory and Experiment, 10, P10008, 2008.
• NetworkX (https://networkx.github.io/)
  ◦ A. Hagberg, D. Conway, “Hacking social networks using the Python programming language (Module II - Why do SNA in NetworkX),” Sunbelt 2010: International Network for Social Network Analysis.

Page 53: Finding Emerging Topics Using Chaos and Community Detection in Social Media Graphs

Overview
• Background Information about Paragon Science
• Example 1: Ebola Twitter Analysis 2014
• Example 2: Stock Market Analysis via Twitter
• Q & A

