Post on 09-Jan-2017
Brian Davis, Keith Cortis, Laurentiu Vasiliu, Adamantios
Koumpis, Ross McDermott, Siegfried Handschuh
ssix-project.eu twitter.com/SSIX_project
ALLDATA 2016
22nd February 2016
bit.ly/ssix_facebook
Project Inspiration
• Studies that demonstrate the predictive power of Social Networks on Financial Markets
Research conclusion considering the statistical results:
• There is a relationship (therefore predictive power) between the content of posts on social networks and the trend of stock market indices
Project Objectives
1. Classify and score content using a framework of qualitative and quantitative parameters called X-Scores, regardless of language or data architecture
2. Provide European SMEs with a collection of easy-to-use tools to analyse and interpret attitudes towards any given target
3. Enable SMEs to exploit sentiment characteristics to assist them in creating new products and services resulting in increased revenues
Challenges and Contributions of SSIX
1. Aims for an “open source” framework: this must be balanced against commercial partners’ Intellectual Property (IP)
2. Data acquisition, filtering, representative sampling: data ethics, sampling methodology, commoditisation of social media (Facebook closed Public Feed API by April 30th 2015, Twitter dropped Datasift)
3. Global Sentiment Access: deficit of tools for other European languages → SSIX will make a major breakthrough in mining multilingual social media streams
4. Multilingual Cost Saving: where cross-lingual opinion mining cannot be adapted, relevant snippets will be automatically translated into English
SSIX Templates
• Provide SSIX end users an easy way to personalise and customise specific SSIX platform behaviours without requiring any development effort
• Template made of both configurable files (e.g. XML) and software that implements a number of variables to allow personalisation for any targeted case study
• Advantage: leverages the massive amount of sentiment data produced and published on social media networks across multiple domains
• Other target domains: Government, Health, Politics
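The slides do not show what a template file looks like. As a rough illustration only, the sketch below assumes a hypothetical XML layout (the `template`, `language`, `source` and `keyword` element names are invented for this example, not taken from SSIX) and reads it into a plain settings dictionary using Python's standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical template file; element and attribute names are assumptions.
TEMPLATE_XML = """
<template domain="finance">
  <language>en</language>
  <language>de</language>
  <source>twitter</source>
  <keyword>DAX</keyword>
</template>
"""


def load_template(xml_text):
    """Parse a template document into a dictionary of settings."""
    root = ET.fromstring(xml_text.strip())
    return {
        "domain": root.get("domain"),
        "languages": [e.text for e in root.findall("language")],
        "sources": [e.text for e in root.findall("source")],
        "keywords": [e.text for e in root.findall("keyword")],
    }


config = load_template(TEMPLATE_XML)
```

Switching to another target domain, such as health or politics, would then mean editing the file rather than the code.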
Big Data Challenges in SSIX
1. Collection and handling of multiple kinds of data:
• Public data from social networking and news sources
• Linguistic Linked Data and Linking Open Data cloud datasets
• Language Resources (LRs) e.g., SentiWordNet (LR for opinion mining), EuroSentiment (marketplace for LRs and services dedicated to sentiment analysis)
2. Diversity in nature of gathered data:
• High volume
• High velocity
• High variety
● StockTwits
● Google+
● Blogs
● Forums
● News Feeds
● RSS
● Newsletters
Multilingual NLP Pipeline
Data Management
● Bootstrap with Knowledge based IE - custom dictionaries and finite state grammars - hybrid rule/ML based approach
● Adapt shallow NLP tools for social media to other languages/translate relevant snippets to EN for mining
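The "adapt or translate" strategy can be sketched as a simple routing step. This is a minimal sketch under assumptions: the function names, the language codes and the toy analysers are all hypothetical stand-ins, and the translation hook is a placeholder for whatever machine translation service would actually be used.

```python
def analyse_post(text, lang, analysers, translate_to_en):
    """Return (score, route) for one post.

    analysers: dict mapping language code -> callable(text) -> float
    translate_to_en: callable(text, source_lang) -> English text
                     (hypothetical machine translation hook)
    """
    if lang in analysers:
        return analysers[lang](text), "native"
    # No adapted pipeline for this language: translate the snippet
    # and mine the English text instead.
    english_text = translate_to_en(text, lang)
    return analysers["en"](english_text), "translated"


# Toy stand-ins so the sketch runs; real components would replace these.
def toy_en_analyser(text):
    return 1.0 if "good" in text.lower() else 0.0


def toy_de_analyser(text):
    return 1.0 if "gut" in text.lower() else 0.0


def toy_translate(text, source_lang):
    glossary = {("ga", "maith"): "good"}
    return " ".join(glossary.get((source_lang, w.lower()), w) for w in text.split())


ANALYSERS = {"en": toy_en_analyser, "de": toy_de_analyser}
```

Keeping the route label alongside the score makes it possible to check later whether translated snippets behave differently from natively analysed ones.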
SSIX Index
SSIX X-Scores
● Raw Scores - time series data streams direct from NLP
● Statistical Scores for deeper analysis of sentiment behaviour, e.g. Volume, Polarity, Volatility, Averages, etc.
● Influence and reputation scoring
● Custom SSIX Index composed of specific X-Score streams for any topic
● SSIX Index composition using any index formula
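The slides do not define the X-Score formulas, so the following is only an illustrative sketch. It assumes raw scores are per-post sentiment values in [-1, 1], takes volume as the post count, polarity as the mean and volatility as the population standard deviation, and uses a weighted average as one possible index formula. All function names are hypothetical.

```python
from statistics import mean, pstdev


def statistical_scores(raw_scores):
    """Summarise one time window of raw sentiment scores (assumed in [-1, 1])."""
    if not raw_scores:
        raise ValueError("need at least one raw score")
    return {
        "volume": len(raw_scores),         # number of scored posts in the window
        "polarity": mean(raw_scores),      # average sentiment
        "volatility": pstdev(raw_scores),  # spread of sentiment in the window
    }


def weighted_index(stream_values, weights):
    """Combine the latest value of several score streams into one index value."""
    total = sum(weights[name] for name in stream_values)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(stream_values[name] * weights[name] for name in stream_values) / total


window = [0.5, -0.5, 1.0, 0.0]
scores = statistical_scores(window)
index_value = weighted_index({"twitter": 0.4, "news": -0.2}, {"twitter": 3, "news": 1})
```

Any other index formula could be substituted for `weighted_index` as long as it maps the chosen streams to a single value per time step.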
What SSIX Brings
• Most services are English only; SSIX will provide multilingual opinion mining with the aim of covering the most widely used European languages
• SSIX aims to provide near real-time sentiment analysis in addition to extensive analysis on delayed time series
• SSIX will provide a fully customisable API for the SSIX Index and X-Scores, allowing SMEs to easily integrate SSIX technology into their own platforms
• SSIX will provide an interactive visual and analysis dashboard allowing users to easily understand sentiment dynamics
NLP in SSIX – Components and Technologies
Open Source Toolkits: Apache Stanbol/OpenNLP, GATE, Stanford NLP, NLTK
EU Projects Results: MONNET, TrendMiner, LIDER, OpeNER, EuroSentiment
● Knowledge engineering approach initially taken – Finite State Transducer (FST) Grammars + Custom Dictionaries and Sentiment Lexica
● Opinion Orientated Information Extraction approach using existing open source tools that are customised/customisable, such as TwitIE in GATE
● Wrap existing ML based approaches to social media sentiment analysis in NLP frameworks and retrain them, e.g., Stanford Twitter Sentiment Analyser
● Adapt existing shallow NLP tools, i.e. tokenisers and POS taggers
● Provide localised language models for the SSIX pipeline
● Provide machine translation for languages that are under-resourced
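To make the dictionary-driven starting point concrete, here is a minimal sketch of lexicon-based scoring with a one-token negation rule. It is not the SSIX grammar or lexicon: the word list, the weights and the negation handling are assumptions made for illustration, and a real system would use FST grammars and curated lexica in a framework such as GATE.

```python
import re

# Hypothetical sentiment lexicon; entries and weights are illustrative only.
LEXICON = {"bullish": 1.0, "gain": 0.5, "bearish": -1.0, "loss": -0.5}
NEGATORS = {"not", "no", "never"}
TOKEN_RE = re.compile(r"\$?[a-z]+")  # keeps cashtags such as $aapl as one token


def tokenise(text):
    """Lower-case the text and split it into word and cashtag tokens."""
    return TOKEN_RE.findall(text.lower())


def lexicon_score(text):
    """Average the lexicon weights found, flipping a word that directly follows a negator."""
    total, hits, negate = 0.0, 0, False
    for tok in tokenise(text):
        if tok in NEGATORS:
            negate = True
            continue
        if tok in LEXICON:
            total += -LEXICON[tok] if negate else LEXICON[tok]
            hits += 1
        negate = False  # negation only reaches the next token
    return total / hits if hits else 0.0
```

For example, `lexicon_score("not bullish")` flips the positive entry to a negative score, while text with no lexicon hits scores 0.0.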
Use Cases - SSIX Industry Pilot Partners
1) Finance - Peracton
Sharpen the complex investment and trading decision making of Peracton’s MAARS Big Data analytics application by adding custom sentiment metrics
2) Media - 3rdPLACE
Provide deep and reliable information on the finance sector to news providers through SSIX metrics that will empower their Data Management software – 3rdEYE
3) Multilingual Analytics - Lionbridge
Get structured, comprehensive and actionable analytics on competitors, customers and prospects from multilingual sources through the analysis of specific market segments by SSIX
Summary – SSIX Overview
• SSIX Metrics: Raw Scores – time series data streams from opinion mining techniques and Statistical Scores for deeper analysis of sentiment behaviour
• Aims to provide near real-time sentiment analysis in addition to extensive analysis on time frame series with additional data sources
• Multiple custom X-Scores data streams can be used to generate a custom SSIX Index for any target
• Provide multilingual opinion mining with the aim to cover the most widely used European languages
• A fully customisable API for the X-Scores and Index will allow SMEs to easily integrate SSIX technology into their own platforms
• An interactive visual and analysis dashboard allowing users to easily understand sentiment dynamics