Date posted: 27-Jan-2015
Uploaded by: datasciencenl
the prevailing attitude of investors as to anticipated price development in a market.
< sen·ti·ment >
Tim Harbers, CTO SNTMNT
DataScienceNL Meetup, November 8th 2012
Tim Harbers
Background
BSc Computer Science
MSc Computer Science
Researcher
Data Miner
Technical Consultant
Co-Founder and COO
Co-Founder and CTO
Vincent van Leeuwen, Customer Development
Kees van Nunen, Product Development
Durk Kingma, Data Mining Expert
Tim Harbers, Machine Learning Expert
The Rockstars
‣ Balanced multidisciplinary team
‣ Two machine learning experts in predictive analysis and large datasets
‣ Academic degrees in Behavioral Finance, Portfolio Finance, Strategic Management & Artificial Intelligence
‣ Strong network in (Dutch) financial industry
‣ Young, enthusiastic team with a proven entrepreneurial mindset
How to select the right stock to invest in?
Our solution:
Predicting stock price movement based on online buzz
Engineered based on academic research:
Bollen et al. (2010)
Sprenger and Welpe (2010)
Van Leeuwen (2011)
Sehgal and Song (2007)
Why would this work?
‣ Very different from traditional indicators
‣ News travels faster via social media than via traditional media
‣ Tremendous amount of data
‣ (Almost) nobody uses it yet
Why focus on Twitter?
‣ Public data & easily accessible
‣ Structured language
‣ 400M tweets per day
Historic Research
‣ Bollen (2010): Created a model based on Twitter mood states, which was 86% accurate on the DJI.
‣ Sprenger and Welpe (2011): Analyzed the correlation between the stock market and microblogs.
Financial Sentiment vs Brand Sentiment
Financial Sentiment
‣ Tweets relating to stocks
‣ Written by traders
‣ Trader mumbo jumbo
‣ More relevant
‣ Shorter term
Brand Sentiment
‣ Tweets relating to brands
‣ Written by consumers
‣ Any language
‣ Larger dataset
‣ Longer term
Data setup
‣ Period: June 2010 to April 2012
‣ Stocks: Top 15 most tweeted stocks in the S&P 500
‣ Tweets: Financial dataset from Timm Sprenger (4 million tweets); Topsy brand tweets (100+ million tweets)
‣ Other: Klout, PeerIndex
Sentiment Scoring
Financial tweets
Commercial tweets
Sentiment analysis:
Enabling computers to derive sentiment from natural language
Naive Approach: Dictionaries
‣ Use a dictionary of common positive and negative terms
‣ Count the number of positive and negative terms
‣ Use the difference between the two
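The dictionary approach can be sketched in a few lines of Python. This is a minimal illustration, not SNTMNT's code: the two word lists and the function name `dictionary_score` are hypothetical, and a real lexicon would be far larger.

```python
import re

# Hypothetical mini-lexicons, for illustration only.
POSITIVE = {"good", "great", "gain", "up", "bullish", "strong"}
NEGATIVE = {"bad", "loss", "down", "bearish", "weak", "drop"}


def dictionary_score(tweet: str) -> int:
    """Return (number of positive terms) - (number of negative terms)."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    positives = sum(token in POSITIVE for token in tokens)
    negatives = sum(token in NEGATIVE for token in tokens)
    return positives - negatives


print(dictionary_score("Strong earnings, $AAPL looks bullish"))  # 2
print(dictionary_score("Weak guidance, expecting a drop"))       # -2
print(dictionary_score("Not good at all"))                       # 1
```

The last call shows why the approach is naive: "Not good" still scores as positive, because counting terms ignores negation.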
SNTMNT’s approach: machine learning
‣ Label a training set of tweets (target)
‣ Use preprocessing techniques
‣ Use several feature extractors
‣ Create a sparse dataset
‣ Use supervised learning to train a machine learning model
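The steps above can be sketched end to end in plain Python. The slides do not say which extractors or which model SNTMNT used, so everything here is an assumption chosen for illustration: unigram and bigram extractors, a `Counter` as the sparse representation, a multinomial Naive Bayes classifier with add-one smoothing, and four made-up training tweets.

```python
import math
import re
from collections import Counter, defaultdict


def preprocess(tweet):
    """Lowercase the tweet and split it into word, cashtag and hashtag tokens."""
    return re.findall(r"[a-z$#']+", tweet.lower())


def unigrams(tokens):
    return tokens


def bigrams(tokens):
    return [a + "_" + b for a, b in zip(tokens, tokens[1:])]


EXTRACTORS = [unigrams, bigrams]  # hypothetical choice of feature extractors


def featurize(tweet):
    """Sparse feature vector: only features that occur are stored."""
    tokens = preprocess(tweet)
    features = Counter()
    for extractor in EXTRACTORS:
        features.update(extractor(tokens))
    return features


def train(labeled_tweets):
    """Collect per-label feature counts (multinomial Naive Bayes)."""
    label_counts = Counter()
    feature_counts = defaultdict(Counter)
    vocabulary = set()
    for tweet, label in labeled_tweets:
        features = featurize(tweet)
        label_counts[label] += 1
        feature_counts[label].update(features)
        vocabulary.update(features)
    return label_counts, feature_counts, vocabulary


def predict(model, tweet):
    """Return the label with the highest smoothed log-probability."""
    label_counts, feature_counts, vocabulary = model
    total = sum(label_counts.values())
    features = featurize(tweet)
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)
        denominator = sum(feature_counts[label].values()) + len(vocabulary)
        for feature, count in features.items():
            if feature in vocabulary:  # ignore features never seen in training
                smoothed = (feature_counts[label][feature] + 1) / denominator
                score += count * math.log(smoothed)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up hand-labeled examples, standing in for the labeled training set.
TRAINING_SET = [
    ("$AAPL breaking out, going long", "positive"),
    ("great earnings beat for $GOOG", "positive"),
    ("$AAPL looks weak, going short", "negative"),
    ("terrible guidance, selling $GOOG", "negative"),
]

model = train(TRAINING_SET)
print(predict(model, "going long $GOOG after great earnings"))
print(predict(model, "selling $AAPL, looks weak"))
```

With a realistic training set the vocabulary grows to many thousands of features while each tweet touches only a handful, which is why a sparse representation matters.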
Labeling
• 25K Financial tweets hand labeled
• 30K Commercial tweets hand labeled
• 1M #happy vs. #sad
Difficulties in sentiment analysis
‣ Authors / URLs
‣ Foreign languages
‣ Slang: aykm, lol, tgsttttptct
‣ Negation
‣ Target Sentiment Analysis
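Some of these difficulties are commonly reduced in preprocessing. The sketch below is an assumption about how that might look, not a description of SNTMNT's pipeline: it replaces author mentions and URLs with placeholder tokens, shortens letters repeated three or more times, and prefixes tokens that follow a negation word with `NOT_` until the next punctuation mark. The `NEGATIONS` list and both function names are hypothetical; foreign languages and target-dependent sentiment are not addressed.

```python
import re

NEGATIONS = {"not", "no", "never", "don't", "isn't", "won't", "can't"}
PUNCTUATION = set(".,!?;")


def normalize(tweet):
    """Lowercase, replace URLs and @mentions, shorten elongated words, tokenize."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", "<url>", text)
    text = re.sub(r"@\w+", "<user>", text)
    text = re.sub(r"(\w)\1{2,}", r"\1\1", text)  # "sooooo" -> "soo"
    return re.findall(r"<url>|<user>|[a-z$#']+|[.,!?;]", text)


def mark_negation(tokens):
    """Prefix tokens after a negation word with NOT_ until the next punctuation."""
    marked, negating = [], False
    for token in tokens:
        if token in PUNCTUATION:
            negating = False  # punctuation ends the negation scope
            continue
        marked.append("NOT_" + token if negating else token)
        if token in NEGATIONS:
            negating = True
    return marked


tokens = mark_negation(
    normalize("@trader $AAPL is not good, buy http://t.co/x sooooo cheap")
)
print(tokens)
# ['<user>', '$aapl', 'is', 'not', 'NOT_good', 'buy', '<url>', 'soo', 'cheap']
```

Marking `NOT_good` as its own feature lets a classifier learn that it carries the opposite weight of `good`.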
Results
Financial tweets
‣ 84.3% accurate on 2-point scale (baseline: 60.4%)
‣ 76.8% accurate on 3-point scale (baseline: 65.0%)
‣ Beat Lexalytics (84.3% vs. 70.3%)
Commercial tweets
‣ 84.7% accurate on 2-point scale (baseline: 61.0%)
‣ 86.9% accurate on 3-point scale (baseline: 81.1%)
Stock Regression
Input:
‣ Sentiment scores
‣ Mood states
‣ Meta data
‣ Stock
Output:
‣ Trading indication
‣ Confidence
Many dimensions
‣ Tweet period
‣ Trading period
‣ Financial tweets or commercial tweets
‣ Tweet crunchers
‣ Models
‣ Trading strategy
Tweet Aggregation Problem
‣ Tweet volume
‣ Volume of positive tweets
‣ Average sentiment
‣ Sentiment growth
‣ Etc.
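As an illustration of the aggregation step, the sketch below collapses individually scored tweets into one feature row per day. The daily granularity, the sentiment range of -1 to 1, the definition of sentiment growth as the change in average sentiment from the previous day, and the name `aggregate_daily` are all assumptions made for this example.

```python
from collections import defaultdict


def aggregate_daily(scored_tweets):
    """Turn (day, sentiment) pairs into one feature row per day.

    `day` is an ISO date string; `sentiment` is assumed to lie in [-1, 1].
    """
    by_day = defaultdict(list)
    for day, sentiment in scored_tweets:
        by_day[day].append(sentiment)

    rows, previous_avg = [], None
    for day in sorted(by_day):
        scores = by_day[day]
        avg = sum(scores) / len(scores)
        rows.append({
            "day": day,
            "tweet_volume": len(scores),
            "positive_volume": sum(score > 0 for score in scores),
            "avg_sentiment": avg,
            # Change versus the previous day; undefined for the first day.
            "sentiment_growth": None if previous_avg is None else avg - previous_avg,
        })
        previous_avg = avg
    return rows


# Made-up scores for two days.
rows = aggregate_daily([
    ("2012-04-02", 0.5), ("2012-04-02", -0.5),
    ("2012-04-02", 0.5), ("2012-04-02", 0.5),
    ("2012-04-03", 1.0), ("2012-04-03", 0.5),
])
for row in rows:
    print(row)
```

Each choice here (window length, which aggregates to keep) is one of the "tweet crunchers" that multiplies the number of dimensions to explore.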
Machine Learning Models
‣ Linear Regression
‣ Bayesian Approaches
‣ Decision Trees
‣ Neural Nets
‣ Support Vector Machines
Results
‣ R² < 0.01
‣ Not usable as an independent trading model after transaction costs
‣ Still usable as an extra indicator to be used by proven trading models
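For context, R² (the coefficient of determination) measures the share of variance in the actual values that a model's predictions explain. A minimal sketch, using made-up daily returns rather than SNTMNT's data:

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up daily returns in percent; their mean is 0.
actual = [1.0, -2.0, 1.5, -0.5]

print(r_squared(actual, actual))                # perfect predictions: 1.0
print(r_squared(actual, [0.0, 0.0, 0.0, 0.0]))  # always predicting the mean: 0.0
```

An R² below 0.01 therefore means the model explains less than 1% of the variance, barely more than always predicting the average.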
Stock Dashboard (B2B2C)
Sentiment APIs (B2B)
Trading Indicator API (B2B)
Products - next steps:
‣ Extend scope to further niche domains and languages.
‣ Market leader and thought leader in financial sentiment analysis.
‣ Getting more insight into the added value of the SNTMNT algorithm as an indicator next to fundamental and technical analysis.
Any questions?
For more info, visit:
www.SNTMNT.com