
Jordi Nin – Hermes: Distributed social network monitoring system - NoSQL matters Barcelona 2014

Date post: 31-Jul-2015

Hermes

Distributed social network monitoring system

Daniel Cea and Jordi Nin

Barcelona Supercomputing Center (BSC)

Universitat Politècnica de Catalunya (UPC)

{dcea, nin}@ac.upc.edu

Index

1.  Introduction

2.  Technologies

3.  Implementation

4.  Results

5.  Conclusions

1. Introduction

Problem formulation

Objectives

Problem formulation

A social network is a platform for building social relations among people who share interests, activities, backgrounds or real-life connections.

New issues arise: privacy, child safety, addiction.

Problem formulation

§  Rise of social networks -> large amounts of social data.

§  Two main problems: multiple sources + hardware limitations.

§  Solution: implement a distributed, scalable social media analyser that gathers data from multiple sources and shows the aggregated results in real time.

Objectives

Input web interface:

§  Start a new query.

§  Control the data collection.

§  Query history.

Objectives

Backend:

§  Render interfaces.

§  Gather data from external APIs.

§  Enrich and store data into a NoSQL database.

Objectives

Output web interface:

§  See aggregated results.

§  Filter results.

§  Customize how the results are displayed.

2. Current Technologies

Data Access

Data Process

Data Storage

Data Access

Twitter Streaming API (ready to add other sources)

Data Process

JavaScript (client and server side)

§  Platform: Node.js

§  Web framework: Express

§  Sentiment analysis: dictionaries obtained from Amazon Mechanical Turk*

* Amy Beth Warriner, Victor Kuperman, Marc Brysbaert. "Norms of valence, arousal, and dominance for 13,915 English lemmas". December 2013, Ghent University.

Data Storage

Couchbase (storage) + Elasticsearch (indexing)
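As a rough illustration of how the indexing side could serve aggregated results, the sketch below builds an Elasticsearch search body as a plain JavaScript object. The helper name `buildHourlyCountQuery` and the field names `text` and `created_at` are assumptions, not taken from Hermes. The `bool`/`filter` syntax and `calendar_interval` follow recent Elasticsearch releases; older releases use `interval` instead.

```javascript
// Hypothetical helper: builds a search body that counts matching documents per hour.
// The field names "text" and "created_at" are assumptions about the stored tweets.
function buildHourlyCountQuery(term, fromIso, toIso) {
  return {
    size: 0, // only the aggregation is needed, not the matching documents
    query: {
      bool: {
        filter: [
          { match: { text: term } },
          { range: { created_at: { gte: fromIso, lte: toIso } } },
        ],
      },
    },
    aggs: {
      tweets_per_hour: {
        date_histogram: { field: "created_at", calendar_interval: "hour" },
      },
    },
  };
}

const body = buildHourlyCountQuery("nosql", "2014-11-01T00:00:00Z", "2014-11-02T00:00:00Z");
console.log(JSON.stringify(body, null, 2));
```

The object would be sent as the body of a search request; how Hermes actually issues its queries is not shown in the slides.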

3. Implementation

Description

Data access layer

Business logic layer

Enrichers

Description

Implementation structured in 3 layers, following a Model View Controller pattern:

•  Data access -> Storage and indexing of documents (JSON) and queries.

•  Business logic -> Start query, manage data stream, process + enrich tweets, send them to storage.

•  User interface -> Allow user control of the system.
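The business logic layer described above can be pictured as a small pipeline: each incoming tweet passes through a list of enrichers and the result is handed to the data access layer. The following is a minimal sketch under stated assumptions. `runPipeline`, `InMemoryStore` and the enricher signature are hypothetical names, and the in-memory store merely stands in for Couchbase.

```javascript
// Hypothetical stand-in for the data access layer (Hermes uses Couchbase + Elasticsearch).
class InMemoryStore {
  constructor() {
    this.docs = new Map();
  }
  save(id, doc) {
    this.docs.set(id, doc);
  }
}

// Assumption: an enricher is a function tweet -> enriched tweet, or null to drop the tweet.
function runPipeline(tweet, enrichers, store) {
  let current = tweet;
  for (const enrich of enrichers) {
    current = enrich(current);
    if (current === null) return null; // an enricher filtered the tweet out
  }
  store.save(current.id_str, current);
  return current;
}

// Two toy enrichers, for demonstration only.
const dropEmpty = (t) => (t.text.trim() === "" ? null : t);
const addLength = (t) => ({ ...t, length: t.text.length });

const store = new InMemoryStore();
runPipeline({ id_str: "1", text: "hello world" }, [dropEmpty, addLength], store);
runPipeline({ id_str: "2", text: "   " }, [dropEmpty, addLength], store);
console.log(store.docs.size); // 1
```

Keeping each enricher as an independent function makes it easy to add or reorder steps without touching the stream handling.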

Enrichers

Stream slots implement the following data enrichers:

§  Device enricher: Determines the device used to write the message.

§  Geo enricher: Filters messages by geo-location.

§  Spain enricher: For messages coming from Spain, determines the autonomous community.
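To make the first two enrichers concrete, here is a hedged sketch. It assumes tweets carry Twitter's `source` field (an HTML anchor naming the client) and a GeoJSON `coordinates` point in `[longitude, latitude]` order. The function names, the device categories and the bounding box values are illustrative assumptions, not the Hermes implementation.

```javascript
// Assumption: `source` looks like
// <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>
function deviceEnricher(tweet) {
  const match = /<a[^>]*>([^<]*)<\/a>/.exec(tweet.source || "");
  const client = match ? match[1] : "";
  let device = "unknown";
  if (/iphone|ipad/i.test(client)) device = "ios";
  else if (/android/i.test(client)) device = "android";
  else if (/web/i.test(client)) device = "web";
  return { ...tweet, device };
}

// Geo filter: keeps a tweet only if its point lies inside a bounding box.
// Assumption: `coordinates` is a GeoJSON point, i.e. [longitude, latitude].
function makeGeoEnricher(box) {
  return function geoEnricher(tweet) {
    const point = tweet.coordinates && tweet.coordinates.coordinates;
    if (!point) return null; // no location available, so the tweet is dropped
    const [lon, lat] = point;
    const inside =
      lon >= box.west && lon <= box.east && lat >= box.south && lat <= box.north;
    return inside ? tweet : null;
  };
}

// Rough, illustrative box around the Iberian Peninsula (not an official boundary).
const iberiaFilter = makeGeoEnricher({ west: -9.5, east: 3.4, south: 36.0, north: 43.8 });

const sample = {
  source: '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
  coordinates: { type: "Point", coordinates: [2.17, 41.39] },
};
console.log(deviceEnricher(sample).device); // "ios"
console.log(iberiaFilter(sample) !== null); // true
```

A Spain enricher could follow the same pattern with one polygon or box per autonomous community.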

Enrichers

§  Stopwords enricher: Removes stop words from the text.

§  Stemmer enricher: Applies stemming to the previously filtered words.

§  Sentiment enricher: Determines the sentiment and arousal of the stemmed message.
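The three text enrichers above chain naturally. The sketch below is a toy version: the stop-word list is tiny, the stemmer is a naive suffix stripper rather than a real stemming algorithm, and the lexicon scores are made-up placeholders, not values from the Warriner et al. norms. All names are hypothetical.

```javascript
// Illustrative stop-word list; a real list would be much longer.
const STOPWORDS = new Set(["the", "a", "is", "and", "of", "to"]);

// Placeholder scores keyed by stem, invented for this example only.
const LEXICON = new Map([
  ["love", { valence: 8, arousal: 6 }],
  ["vot", { valence: 5, arousal: 4 }],
]);

// Naive suffix stripping, standing in for a proper stemmer.
function stem(word) {
  for (const suffix of ["ing", "ed", "s"]) {
    if (word.length > suffix.length + 2 && word.endsWith(suffix)) {
      return word.slice(0, -suffix.length);
    }
  }
  return word;
}

function stopwordsEnricher(tweet) {
  const tokens = tweet.text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  return { ...tweet, tokens: tokens.filter((w) => !STOPWORDS.has(w)) };
}

function stemmerEnricher(tweet) {
  return { ...tweet, stems: tweet.tokens.map(stem) };
}

// Averages the scores of all stems found in the lexicon; null if none match.
function sentimentEnricher(tweet) {
  const scores = tweet.stems.filter((s) => LEXICON.has(s)).map((s) => LEXICON.get(s));
  if (scores.length === 0) return { ...tweet, sentiment: null };
  const mean = (key) => scores.reduce((sum, s) => sum + s[key], 0) / scores.length;
  return { ...tweet, sentiment: { valence: mean("valence"), arousal: mean("arousal") } };
}

const enriched = sentimentEnricher(
  stemmerEnricher(stopwordsEnricher({ text: "The people love voting" }))
);
console.log(enriched.stems); // [ 'people', 'love', 'vot' ]
console.log(enriched.sentiment); // { valence: 6.5, arousal: 5 }
```

Averaging per-word scores is only one possible way to score a message; the slides do not say how Hermes combines them.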

4. Results

Use case: 9N referendum

Use case: 9N referendum

§  What? -> The 9N unofficial Catalan independence referendum.

§  When? -> From 7 Nov. 2014 to 11 Nov. 2014.

§  Where? -> Catalonia.

Use case: 9N referendum

§  How? -> Storing all tweets with filters:

    §  Location: none.

    §  Language: none.

    §  Text: Contains “9N”.

    §  Time: From Nov 7 at 00:00 to Nov 11 at 23:59.

§  Why? -> Analyse reactions around the world before, during and after the referendum.
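The filters listed above amount to a simple predicate over each tweet. A minimal sketch follows, assuming `created_at` has already been normalised to an ISO 8601 string, that the time window is read as UTC, and that matching "9N" ignores case. None of these details are stated in the slides, and `matches9N` is a hypothetical name.

```javascript
// Assumption: the window is interpreted in UTC; the slides give no time zone.
const WINDOW_START = Date.parse("2014-11-07T00:00:00Z");
const WINDOW_END = Date.parse("2014-11-12T00:00:00Z"); // exclusive: covers Nov 11 up to 23:59

function matches9N(tweet) {
  const time = Date.parse(tweet.created_at);
  const inWindow = time >= WINDOW_START && time < WINDOW_END;
  const hasTerm = tweet.text.toUpperCase().includes("9N"); // case-insensitive by assumption
  return inWindow && hasTerm; // no location or language restriction
}

console.log(matches9N({ text: "Votem #9N", created_at: "2014-11-09T10:00:00Z" })); // true
console.log(matches9N({ text: "Votem #9N", created_at: "2014-11-12T00:00:00Z" })); // false
```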

5. Conclusions

General conclusions

Future work

General conclusions

§  NoSQL technologies are crucial for the project. Couchbase + Elasticsearch + Kibana work perfectly together.

§  Elasticsearch is flexible enough to allow fast development and real-time queries.

§  Kibana lets us create fancy plots with little effort.

Future work

§  More data sources.

§  Better data enrichment.

§  Add user data context.

§  Percolation queries.

Hermes

Thank you for your attention

