20121108 sntmnt data_sciencenl

Date post: 27-Jan-2015
Upload: datasciencenl
the prevailing attitude of investors as to anticipated price development in a market. < sen·ti·ment > Tim Harbers, CTO SNTMNT DataScienceNL Meetup November 8th 2012
Transcript
Page 1: 20121108 sntmnt data_sciencenl

the prevailing attitude of investors as to anticipated price development in a market.

< sen·ti·ment >

Tim Harbers, CTO SNTMNT
DataScienceNL Meetup, November 8th 2012

Page 2: 20121108 sntmnt data_sciencenl

Tim Harbers

Background

BSc Computer Science

MSc Computer Science

Researcher

Data Miner

Technical Consultant

Co-Founder and COO

Co-Founder and CTO

Page 3: 20121108 sntmnt data_sciencenl

The Rockstars

Vincent van Leeuwen: Customer Development
Kees van Nunen: Product Development
Durk Kingma: Data Mining Expert
Tim Harbers: Machine Learning Expert

‣ Balanced multidisciplinary team

‣ Two machine learning experts in predictive analysis and large datasets

‣ Academic degrees in Behavioral Finance, Portfolio Finance, Strategic Management & Artificial Intelligence

‣ Strong network in (Dutch) financial industry

‣ Young, enthusiastic team with a proven entrepreneurial mindset

Page 4: 20121108 sntmnt data_sciencenl

How to select the right stock to invest in?

Page 5: 20121108 sntmnt data_sciencenl

Our solution:

Predicting stock price movement based on online buzz

Engineered based on academic research:

Bollen et al. (2010)

Sprenger and Welpe (2010)

Van Leeuwen (2011)

Sehgal and Song (2007)

Page 6: 20121108 sntmnt data_sciencenl

Why would this work?

‣ Very different from traditional indicators
‣ News travels faster via social than traditional media
‣ Tremendous amount of data
‣ (Almost) nobody uses it yet

Page 7: 20121108 sntmnt data_sciencenl

Why focus on Twitter?

‣ Public data & easily accessible
‣ Structured language
‣ 400M tweets per day

Page 8: 20121108 sntmnt data_sciencenl

Historic Research

Bollen (2010): Created a model based on Twitter mood states, which was 86% accurate on the DJI.

Sprenger and Welpe (2011): Analyzed correlation of the stock market and micro blogs.

Page 9: 20121108 sntmnt data_sciencenl

Financial Sentiment vs Brand Sentiment

Financial Sentiment

‣ Tweets relating to stocks
‣ Written by traders
‣ Trader mumbo jumbo
‣ More relevant
‣ Shorter term

Brand Sentiment

‣ Tweets relating to brands
‣ Written by consumers
‣ Any language
‣ Larger dataset
‣ Longer term

Page 10: 20121108 sntmnt data_sciencenl

Data setup

Period
‣ June 2010 to April 2012

Stocks
‣ Top 15 most tweeted stocks in S&P 500

Tweets
‣ Financial Dataset Timm Sprenger (4 million)
‣ 4 Million tweets Topsy
‣ Brand Tweets (100+ million tweets)

Other
‣ Klout
‣ Peerindex

Page 11: 20121108 sntmnt data_sciencenl
Page 12: 20121108 sntmnt data_sciencenl

Sentiment Scoring

Page 13: 20121108 sntmnt data_sciencenl

Financial tweets

Page 14: 20121108 sntmnt data_sciencenl

Commercial tweets

Page 15: 20121108 sntmnt data_sciencenl

Sentiment analysis:

Enabling computers to derive sentiment from natural language

Page 16: 20121108 sntmnt data_sciencenl

Naive Approach: Dictionaries

‣ Use a dictionary of common positive and negative terms
‣ Count the number of positive and negative terms
‣ Use the difference between the two.
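The dictionary approach can be sketched in a few lines of Python. This is only an illustration: the two word sets and the function name `dictionary_score` are made up for the example and are not SNTMNT's dictionaries.

```python
import re

# Hypothetical mini-lexicons; a real dictionary would hold far more terms.
POSITIVE = {"good", "great", "gain", "bullish", "up", "buy"}
NEGATIVE = {"bad", "loss", "bearish", "down", "sell", "weak"}

def dictionary_score(text):
    """Return (number of positive terms) - (number of negative terms)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positives = sum(token in POSITIVE for token in tokens)
    negatives = sum(token in NEGATIVE for token in tokens)
    return positives - negatives

print(dictionary_score("Great earnings, I am bullish on $AAPL"))  # 2
print(dictionary_score("Weak guidance, time to sell"))            # -2
print(dictionary_score("Not good at all"))                        # 1
```

The last call shows why the approach is naive: "Not good at all" still scores positive, because counting terms ignores negation.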

Page 17: 20121108 sntmnt data_sciencenl

SNTMNT’s approach: machine learning

‣ Label a training set of tweets (target)
‣ Use preprocessing techniques
‣ Use several feature extractors
‣ Create a sparse dataset.
‣ Use supervised learning to train a machine learning model.
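The steps on this slide can be sketched end to end with the Python standard library. Everything specific here is an assumption for illustration: the slides do not say which preprocessing, features, or classifier SNTMNT uses, so the sketch swaps in placeholder tokens, unigram and bigram counts, and a small multinomial Naive Bayes, trained on six made-up tweets.

```python
import math
import re
from collections import Counter, defaultdict

def preprocess(text):
    """Lowercase and replace URLs, @mentions and $tickers with placeholders."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " URL ", text)
    text = re.sub(r"@\w+", " USER ", text)
    text = re.sub(r"\$[a-z]+", " TICKER ", text)
    return re.findall(r"[A-Za-z']+", text)

def extract_features(tokens):
    """Two simple extractors (unigrams, bigrams) stored as a sparse Counter."""
    features = Counter(tokens)
    features.update(a + "_" + b for a, b in zip(tokens, tokens[1:]))
    return features

class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over sparse features."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocabulary = set()
        for features, label in zip(rows, labels):
            self.feature_counts[label].update(features)
            self.vocabulary.update(features)
        return self

    def predict(self, features):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocabulary)
            score = math.log(count / total)
            for feature, value in features.items():
                if feature in self.vocabulary:  # skip features never seen in training
                    score += value * math.log((counts[feature] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Made-up, hand-labeled training tweets (the "target" is the label).
train = [
    ("$AAPL looking strong, buying more", "positive"),
    ("Great quarter for $GOOG, bullish", "positive"),
    ("$MSFT breaking out, nice gains", "positive"),
    ("$AAPL weak guidance, selling", "negative"),
    ("Bearish on $GOOG, ugly chart", "negative"),
    ("$MSFT missed earnings, big loss", "negative"),
]
rows = [extract_features(preprocess(text)) for text, _ in train]
labels = [label for _, label in train]
model = NaiveBayes().fit(rows, labels)

def classify(text):
    return model.predict(extract_features(preprocess(text)))

print(classify("bullish on $AAPL, strong gains"))
print(classify("weak chart, selling $MSFT"))
```

Keeping each tweet as a `Counter` of only the features it contains is what makes the dataset sparse: most of the vocabulary never appears in any single tweet.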

Page 18: 20121108 sntmnt data_sciencenl

Labeling

• 25K Financial tweets hand labeled
• 30K Commercial tweets hand labeled
• 1M #happy vs. #sad

Page 19: 20121108 sntmnt data_sciencenl

Difficulties in sentiment analysis

‣ Authors / URLs
‣ Foreign languages
‣ Slang: aykm, lol, tgsttttptct
‣ Negation
‣ Target Sentiment Analysis

Page 20: 20121108 sntmnt data_sciencenl

Results

Financial tweets
‣ 84.3% accurate on 2-point scale (Baseline: 60.4%)
‣ 76.8% accurate on 3-point scale (Baseline: 65.0%)
‣ Beat Lexalytics (84.3% vs. 70.3%)

Commercial tweets
‣ 84.7% accurate on 2-point scale (Baseline: 61.0%)
‣ 86.9% accurate on 3-point scale (Baseline: 81.1%)
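The slide does not define its baseline. A common choice, assumed here, is the majority-class baseline: the accuracy obtained by always predicting the most frequent label. The sketch below compares a classifier's accuracy against that baseline on made-up labels.

```python
from collections import Counter

def accuracy(predicted, actual):
    """Fraction of tweets whose predicted label matches the hand label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

def majority_baseline(actual):
    """Accuracy of always predicting the most frequent label (an assumed baseline)."""
    return Counter(actual).most_common(1)[0][1] / len(actual)

# Made-up labels on a 2-point scale, for illustration only.
actual = ["pos", "pos", "pos", "neg", "neg"]
predicted = ["pos", "pos", "neg", "neg", "neg"]

print(accuracy(predicted, actual))  # 0.8
print(majority_baseline(actual))    # 0.6
```

Under this reading, a high baseline such as 81.1% on the 3-point commercial scale would mean one label dominates the dataset, so the gain over the baseline matters more than the raw accuracy.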

Page 21: 20121108 sntmnt data_sciencenl

Stock Regression

Page 22: 20121108 sntmnt data_sciencenl

Stock Regression

Input:
‣ Sentiment scores
‣ Mood states
‣ Meta Data
‣ Stock

Output:
‣ Trading Indication
‣ Confidence

Page 23: 20121108 sntmnt data_sciencenl

Many dimensions

‣ Tweet period
‣ Trading period
‣ Financial Tweets or Commercial Tweets
‣ Tweet Crunchers
‣ Models
‣ Trading strategy

Page 24: 20121108 sntmnt data_sciencenl

Tweet Aggregation Problem

‣ Tweet volume
‣ Volume positive tweets
‣ Avg sentiment
‣ Sentiment Growth
‣ Etc.
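One way to picture the aggregation step is a function that turns individually scored tweets into one feature row per day. This is a sketch under assumptions: the per-day window, the score range, and the definition of sentiment growth as the day-over-day change in average sentiment are illustrative choices, not taken from the slides.

```python
from collections import defaultdict

def aggregate_by_day(scored_tweets):
    """Aggregate (day, score) pairs into per-day features.

    Scores are assumed to lie in [-1, 1]; a score above 0 counts as positive.
    """
    by_day = defaultdict(list)
    for day, score in scored_tweets:
        by_day[day].append(score)

    features = {}
    previous_avg = None
    for day in sorted(by_day):
        scores = by_day[day]
        avg = sum(scores) / len(scores)
        features[day] = {
            "volume": len(scores),
            "volume_positive": sum(s > 0 for s in scores),
            "avg_sentiment": avg,
            # Assumed definition: change in average sentiment versus the previous day.
            "sentiment_growth": None if previous_avg is None else avg - previous_avg,
        }
        previous_avg = avg
    return features

# Made-up scored tweets for a single stock.
tweets = [
    ("2012-11-01", 0.5), ("2012-11-01", -0.5),
    ("2012-11-02", 1.0), ("2012-11-02", 0.5),
]
daily = aggregate_by_day(tweets)
print(daily["2012-11-02"])
```

Each of these columns is a candidate input to the regression models, which is why the choice of aggregation adds yet another dimension to the search.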

Page 25: 20121108 sntmnt data_sciencenl

Machine Learning Models

‣ Linear Regression
‣ Bayesian Approaches
‣ Decision Trees
‣ Neural Nets
‣ Support Vector Machines

Page 26: 20121108 sntmnt data_sciencenl

Results

‣ R² < 0.01
‣ Not usable as an independent trading model after transaction costs.
‣ Still usable as an extra indicator to be used by proven trading models.
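For reference, R² (the coefficient of determination) measures the share of variance in the actual values that a model's predictions explain. A minimal sketch of the standard formula follows; the numbers are made up and do not come from SNTMNT's experiments.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Made-up target values.
actual = [1.0, 2.0, 3.0, 4.0]

print(r_squared(actual, actual))                # 1.0: perfect predictions
print(r_squared(actual, [2.5, 2.5, 2.5, 2.5]))  # 0.0: no better than the mean
```

An R² below 0.01 therefore means the model explains less than 1% of the variance in price movement, barely better than always predicting the average.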

Page 27: 20121108 sntmnt data_sciencenl

Products - next steps:

Stock Dashboard (B2B2C)

Sentiment APIs (B2B)

Trading Indicator API (B2B)

‣ Extend scope to further niche domains and languages.

‣ Market leader and thought leader in financial sentiment analysis.

‣ Getting more insights into the added value of the SNTMNT algorithm as an indicator next to fundamental and technical analysis.

Page 28: 20121108 sntmnt data_sciencenl

Any questions?

For more info, visit:

www.SNTMNT.com

