Date post: 12-Jul-2021
Page 1: Paper Title (use style: paper title)€¦  · Web viewWe found this through combinations of three bag-of-words based models: linear regression, sentiment classifier, and smart user

Forecasting Stock Prices using Social Media Analysis

Scott Coyne, Praveen Madiraju and Joseph Coelho

Department of Mathematics, Statistics and Computer Science
Marquette University
Milwaukee, WI, USA

{scott.coyne, praveen.madiraju, joseph.coelho}@marquette.edu

Abstract—Stock market prices are becoming more and more volatile, largely due to improvements in technology and increased trading volume. Speculation affects business owners, investors, and policymakers alike. While these seemingly unpredictable trends continue, investors and consumers take to social media to share thoughts and opinions. We use information shared over StockTwits, a social media platform for investors, to better understand and predict individual stock prices. We designed and implemented three machine learning models to forecast stock prices using the dataset collected from StockTwits. We also evaluated our models with conclusions drawn from previous researchers in this field. Our first model found no correlation between general StockTwits postings and stock price. However, our second and third models considered a novel approach and successfully filtered through the twits to find important posts. These important twits could predict stock price movements with greater accuracy (average around 65%) based on sentiment analysis and smart user identification. We consider a user “smart” based on number of likes, follower count and more importantly how often the user is right about a stock.

Keywords—Stock market, sentiment analysis, social media analysis, big data, machine learning, prediction.

I. INTRODUCTION

The New York Stock Exchange (NYSE) is responsible for twenty-one trillion dollars of equity. That is more than its country’s GDP. The NYSE also sees around one hundred sixty-nine billion dollars of these assets traded daily. Day trading has become so volatile that it can affect not just businesses and investors, but the national economy as a whole.

Publicly traded equity prices are difficult to forecast largely because every potential buyer and seller plays a role in setting the price. Uncertainty over other investors’ actions is what led to the Great Depression and many economic recessions throughout history. If investor actions were more predictable, speculation would decrease and company sales and revenue would determine the short-term stock price. In other words, stocks would be steadier and more consistent on a day-to-day basis. Obviously, predicting the actions of every potential buyer and seller is no easy task. There is no database storing the thoughts and plans of every person interested in a given stock. Fortunately, modern technology provides something close: social media.

The internet has changed the way traders trade. It gives them instant access to resources, articles, and statistics about whatever company they choose. It allows investors to connect at the speed of light. And now the common investor is not just a consumer of information, but a producer as well. Anyone with internet access can create a social media account and share their ideas at the click of a button. Traders buy, sell, and discuss in real time, leaving a paper trail in the form of Big Data. Our research dives into this dataset to find correlations between investor posts and stock prices.

StockTwits [1] is a social media service for traders, investors and entrepreneurs to share ideas regarding stock information. Better yet, its information is available to the public, which made it an ideal data source for our study. Similar to Twitter, StockTwits users have up to one hundred forty characters to share their thoughts, using hashtags and dollar tags ($) to identify subjects or stock symbols (e.g., $AAPL, $AMZN). Most of the previous papers we reference use StockTwits as their data source. In turn, all analysis and feature extraction on our end comes from StockTwits as well.

Our methods looked at a large number of stocks, each with varying market capitalization, to ensure accurate results. After being granted partner level access, we downloaded one year of StockTwits history, spanning from May 2016 to April 2017. The download included each post’s content, author information, and date. Eventually, all notable twits were grouped based on their target stock and saved in pickle (Python object serialization) format, leaving 1,013,794 twits for us to examine.
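The grouping and serialization step can be illustrated with a minimal sketch. The record fields (`body`, `user`, `date`), the cashtag pattern, and the helper name `group_by_ticker` are assumptions made for illustration; they are not the StockTwits schema or our original code.

```python
import pickle
import re
from collections import defaultdict

# Hypothetical twit records; field names are illustrative assumptions.
twits = [
    {"body": "$AAPL looking strong into earnings", "user": "u1", "date": "2016-05-02"},
    {"body": "Watching $AMZN and $AAPL today", "user": "u2", "date": "2016-05-02"},
    {"body": "no ticker here", "user": "u3", "date": "2016-05-03"},
]

# Assumed cashtag format: a dollar sign followed by 1 to 5 capital letters.
CASHTAG = re.compile(r"\$([A-Z]{1,5})\b")

def group_by_ticker(records):
    """Map each tagged ticker to the list of twits that mention it."""
    grouped = defaultdict(list)
    for rec in records:
        for ticker in set(CASHTAG.findall(rec["body"])):
            grouped[ticker].append(rec)
    return dict(grouped)

grouped = group_by_ticker(twits)

# Serialize with pickle and read it back (in memory here; a file works the same way).
blob = pickle.dumps(grouped)
restored = pickle.loads(blob)
```

Twits without a dollar tag are dropped by this sketch, which is one possible reading of keeping only "notable" twits.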

According to most prior work, social media has strong predictive power. Zhang, Fuehres, and Gloor [2] found that general Twitter sentiment correlates with how the market as a whole will move. If the average user was happy and optimistic, the S&P 500 would go up in value and vice versa. This raises the question: Is there a correlation between social media sentiment and an individual stock’s price? Plenty of people have answered, usually with mixed results.

Many researchers conclude there is a correlation between sentiment and stock price. Oh and Sheng [3] found that “stock micro blog with its succinctness, high volume and real-time features do have predictive power over future stock price movements”. They used a bag of words classifier to label posts as positive or negative sentiment. The average sentiment was then used to predict directional change in stock price.

Oh and Sheng are not the only researchers to reach this conclusion. Given this, it is still important to be skeptical.

Oliveira, Cortez, and Areal [4] disagreed with previous literature and rejected the claim of a positive correlation. They directly addressed Oh and Sheng, saying their sample size was too small to draw a conclusion. This group’s model analyzes sentiment metrics, similar to previous studies. One key difference is that they used a full regression model to predict the exact percent change rather than direction. They concluded that, given sufficient data and extensive testing, there is no correlation between sentiment and price.

In this paper, we contend that Oliveira, Cortez, and Areal were correct in their skepticism, but there is still evidence for predictive power in social media. We found this through combinations of three bag-of-words based models: linear regression, sentiment classifier, and smart user classification. The basic regression model is similar to models in [3], [4], and [5]. This more or less verified [4]’s declaration of no correlation. However, our other models differ from those in previous studies. The smart user filtering model produced extremely high accuracies, but it led to limited sample size and highly varied results. Therefore, we do not intend to prove all social media posts have predictive power, but rather show there is evidence when models go beyond a basic sentiment analyzer.

The remainder of this paper is organized as follows. Section two provides a survey of related works and Section three discusses preliminaries and tools we use going forward. This leads into Section four where we discuss the three models we use as well as their results. Next, we evaluate the models to compare accuracies in Section five. Section six offers our conclusion and ideas for work moving forward.

II. RELATED WORKS

As mentioned above, previous literature goes back and forth on the predictive power of StockTwits. After [3] stated they found a correlation, [4] argued in direct response. We chose their argument as the basis for our research question, but they are not the only sources with notable results. [5] used a number of models such as regression and Naïve Bayes to find a correlation. Their average accuracy beat the base case, but the varying results on different stocks left the findings inconclusive. Our sources [6-9] include tools mentioned in Section three as well as stock market information referenced in this paper. The main topic of [10] is sentiment analysis. They provide a survey of ten sources in the area, many of which relate to the stock market. The common theme was that sentiment classification has been improving in recent years, and machine learning models can yield very high accuracies. [11], an older paper, utilized Naïve Bayes and reported a strong correlation. They also noticed they could cluster similar posts based on emotion. Once they analyzed the input, they concluded that there is a correlation between sentiment and price. However, all their accuracies were below the base case.

While [11] had uncertain conclusions, their work in sentiment analysis proved useful. The Naïve Bayes approach was utilized by [12] as well. [12] was successful in filtering out neutral twits. This helped eliminate noise from the dataset and only examine posts with a real effect on the market. They first did so with Naïve Bayes, but also with decision trees and support vector machines.

Neural networks were first introduced to the stock market in [13]. This article, written before the turn of the century, uses several economic and technical statistics as input. [14] was a key pioneer in bringing neural network predictions to social media. They accurately predicted box office sales based on individual tweet sentiment. These two works prove deep learning can be applied to both social media and the stock exchange. In 2014, [15] brought the two together. Their model classified StockTwits data by sentiment with extremely high accuracy. Our classifier, listed in section four, applies the same general method.

Since sentiment is abundant in social media, there must be some predictive power in it. [16] discusses numerous applications for sentiment analysis. They found it can predict things from a consumer’s taste in music to civil voting habits. This work also discussed StockTwits and the public equity market. They believe similar algorithms can predict stock prices. After all, [17] found that social moods and real-life sentiment are determinants of price movement. [18] also proved that worried discussions lead to a downtrend in stock price. Since real-life sentiment plays a role in the market, digital sentiment must as well.

III. PRELIMINARIES

Here, we discuss the background material relevant for the rest of the paper.

A. StockTwits [1]

StockTwits is a social media communication platform designed for traders and investors to discuss stock related information. Users tag stocks using the $TICKER format, which allows for effective data mining and analytics on user and stock related information. Extracted features include twit content, author, date, etc.

B. Quandl [6]

Quandl is a leading source of financial and economic data. Their simple stock history API was used to gather daily pricing data for the examined stocks. Simple data manipulation allowed us to label every day with a percentage change, which our algorithms would attempt to predict.

C. Python and Scikit-Learn [7]

Our code was created using Python 2.7. This language was chosen due to the numerous resources and machine learning tools available. Scikit-Learn was the library of choice. It is useful for its open source code as well as wide array of models available. We made use of the following:

TF-IDF (term frequency-inverse document frequency) for text feature extraction
Linear Regression
MLP (multi-layer perceptron) Classifier

TF-IDF is a statistic used to determine how important a word is to its body of text.

In this case, it counts how many times a word appears in a given twit, and divides it by how many twits contain this word.
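As a minimal sketch of this feature extraction, the example below runs Scikit-Learn’s `TfidfVectorizer` on three made-up twits (the texts are assumptions for illustration, and the code targets Python 3 rather than 2.7). Note that the library’s default weighting is a smoothed, log-scaled variant of the ratio described above, so a word that appears in every twit receives the minimum inverse document frequency of 1 rather than a weight of exactly zero.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up twits for illustration only.
twits = [
    "the stock will rally",
    "the stock will drop",
    "the earnings call is today",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(twits)   # sparse matrix: one row per twit, one column per word

vocab = vectorizer.vocabulary_        # word -> column index
idf = vectorizer.idf_                 # inverse document frequency per column

# "the" appears in every twit, "rally" in only one, so "the" gets the lower weight.
idf_the = idf[vocab["the"]]
idf_rally = idf[vocab["rally"]]
```

Each row of `X` is the sparse vector representation of one twit, which is the form of input the models in this paper operate on.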

Scikit-Learn’s linear regression model works well with TF-IDF statistics. It takes in m cases of n parameters and one output. With this information, it calculates a coefficient for each parameter, and uses these coefficients to calculate output for unknown inputs.
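The fit-then-predict behavior described above can be sketched on synthetic data. The numbers below are invented so that the true coefficients are known in advance (y = 1 + 2·x1 − 3·x2); they are not stock data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# m = 5 cases, n = 2 parameters, one output per case (synthetic, noise-free).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

model = LinearRegression().fit(X, y)

# One learned coefficient per parameter, plus an intercept.
coefficients = model.coef_
intercept = model.intercept_

# Apply the learned coefficients to an unseen input.
pred = model.predict(np.array([[3.0, 2.0]]))
```

With TF-IDF input, each parameter corresponds to one vocabulary word, so the coefficient of a word indicates how strongly its presence pushes the prediction up or down.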

The MLP classifier uses a neural net of n layers with a calculated number of perceptrons to classify data into given categories. In our case, it can classify a twit as bullish, bearish, or neutral.
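A minimal sketch of such a three-class classifier is shown below. The six labeled twits, the single hidden layer of 16 units, and the other hyperparameters are assumptions chosen to keep the example small; they are not the configuration used in our experiments. Labels follow the encoding 1 = bullish, -1 = bearish, 0 = neutral.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# Tiny hand-labeled sample (illustrative only).
texts = [
    "buying calls expecting a rally",
    "strong earnings going long",
    "selling everything this will drop",
    "shorting here looks weak",
    "adding this to my watchlist",
    "special dividend announced",
]
labels = [1, 1, -1, -1, 0, 0]

# TF-IDF features feed a small multi-layer perceptron.
clf = make_pipeline(
    TfidfVectorizer(),
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0),
)
clf.fit(texts, labels)

classes = clf.named_steps["mlpclassifier"].classes_
predicted = clf.predict(["buying more calls tomorrow"])
```

With so little training data the prediction itself is not meaningful; the sketch only shows how text flows from TF-IDF into the classifier.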

D. Pre-processing

Scikit-Learn is powerful for preprocessing of text. Typical language processing mandates identification of stop words and grammar formatting. However, the TF-IDF approach can discount stop words automatically. Common words like “and” or “the” will appear in a majority of the documents. Therefore, their inverse document frequency gives them a score approaching zero. Also, word count vectors rely on neither word order nor surrounding grammar. Our model just learns the vocabulary and plugs it directly into a prediction machine.

IV. MODELS AND RESULTS

A. Linear Regression

Regression is not a new approach to predicting the stock market. It often takes in sentiment analysis and predicts some variable about the stock. Our model takes a different form of input, but similar results are expected. Rather than extract sentiment, we use a pure bag of words. From this bag of words, we process frequencies to find TF-IDF scores. Each twit was then given a sparse vector representation of its content. Finally, the vectors corresponding to a target date were averaged together and used as input (see Figure 1).
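The averaging step can be sketched as follows. The dated twits are invented, and converting to a dense array is a simplification that only suits small examples; a full-size dataset would need to stay sparse.

```python
from collections import defaultdict

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# (date, text) pairs for one stock; illustrative data only.
dated_twits = [
    ("2016-05-02", "apple earnings beat"),
    ("2016-05-02", "apple rally continues"),
    ("2016-05-03", "selling apple now"),
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform([text for _, text in dated_twits])
dense = X.toarray()  # fine for a toy example

# Collect the TF-IDF rows belonging to each date.
by_day = defaultdict(list)
for (day, _), row in zip(dated_twits, dense):
    by_day[day].append(row)

# One averaged vector per date: this is the regression input for that date.
dates = sorted(by_day)
daily = np.vstack([np.mean(by_day[d], axis=0) for d in dates])
```

Each row of `daily` then pairs with one price-direction label, so the regression sees one case per trading day rather than one case per twit.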

One reason researchers are skeptical of predictive models is the possibility of overfitting. [3] and [5] only evaluate a handful of stocks, and they validate and test over the same data. This is why we chose linear regression. It is among the most basic forms of mathematical modeling, which limits the risk of overfitting. We also chose one stock, Apple ($AAPL), for creation and validation. Apple is the most discussed company on StockTwits. This process allowed us to move the model to other stocks with minimal room for bias. Also, to stay consistent with the model in [3], we only aimed to classify direction, not predict the exact change.

Figure 1. Linear Regression Model

Eq2. Simple Equation for Linear Regression

Eq1. Formula for calculating TF-IDF

Despite the differences, model one should uncover the same truths as [3] and [4]. Both dissected sentiment before running regression. In order to do so, they used a bag of words and TF-IDF scores. We decided to cut out the middle man in order to feed the regression model as much data as possible. Assuming there is a correlation between sentiment and price, words that trigger a high sentiment would receive a high coefficient in our regression equation. Therefore, if our model does not correctly use bullish and bearish words to predict directional movement, the pure sentiment analysis in [3] may not correlate as proposed.

One challenge we faced in evaluating our model was setting a proper target date. Papers such as [5] read the twits for day a to predict how the market will change on date b, usually some set n days ahead. We wanted to test for any possible correlation, so we created three versions to tackle this:

Testing three models with varying n values helped eliminate bias in our final results. Our validation showed similar results across the board, but they were consistently best using the N-Aggregate approach with an n value of two.
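The three target-date versions (Same Day, N-Ahead, and N-Aggregate) differ only in which days supply the twits and which day supplies the price label. The sketch below expresses each as a list of (feature days, label day) index pairs over consecutive trading days. The function names are hypothetical, and the N-Aggregate indexing is one interpretation: counting the starting date as day 1, twits from n consecutive days predict the day immediately after that window.

```python
def same_day(num_days):
    """Twits from day t are compared with the price change on day t."""
    return [([t], t) for t in range(num_days)]

def n_ahead(num_days, n):
    """Twits from day t are compared with the price change n days later."""
    return [([t], t + n) for t in range(num_days - n)]

def n_aggregate(num_days, n):
    """Twits from days t .. t+n-1 are pooled and compared with day t+n."""
    return [(list(range(t, t + n)), t + n) for t in range(num_days - n)]

# With five trading days and n = 2, each label day uses the two preceding days of twits.
pairs = n_aggregate(5, 2)
```

Under this reading, the best-performing setting (N-Aggregate with n = 2) pools the twits from the two days before each predicted day.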

We tested seventeen unique stocks over a year of data. Our model was optimized by looking at all twits from the two preceding days and comparing that average vector to directional movement. We then compared each accuracy to a base case: the accuracy achieved by predicting the favorable outcome every time (all up, as that is the likely outcome for the market at large). The results were mediocre; the model performed only as well as the base case. This reaffirmed the conclusion suggested by [4].
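The base-case comparison is a short calculation. The sketch below uses eight invented days of up/down outcomes and invented model predictions; neither comes from our results.

```python
def direction_accuracy(predicted_up, actual_up):
    """Fraction of days on which the predicted direction matched the actual one."""
    hits = sum(p == a for p, a in zip(predicted_up, actual_up))
    return hits / len(actual_up)

# Illustrative data: True means the stock closed up that day.
actual_up = [True, False, True, True, False, True, False, True]
model_up = [True, True, False, True, False, True, True, True]

# Base case: predict "up" every day.
base_case = direction_accuracy([True] * len(actual_up), actual_up)
model_acc = direction_accuracy(model_up, actual_up)
```

In this toy case both accuracies equal 0.625, which mirrors the situation where a model adds nothing over always predicting an up day.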

Key: Bold and Green – Better than base case Red – Worse than base case

The average accuracy came out to 52.45%, while the average base case accuracy came out to 52.55%. According to [9], the market as a whole typically goes up on around 52-54% of days. This lack of improvement points to overfitting in previous regression models. However, many stocks did perform well over large periods of test data, which is likely not all coincidence. This suggests there is some underlying correlation between the bag of words and stock price.

B. Sentiment Prediction

Sentiment is the underlying value of social media. According to [2], the positive or negative mood expressed on Twitter affects the positivity people carry into their own decision making. In their case, it reflected how the stock market performed as a whole. Due to widespread applications of sentiment, many different classification models have been tested [10]. The original trend was a dictionary based approach. This method gave words a set score based on individual sentiment, and the scores were added up over a body of text. More modern research, on the other hand, favors a supervised learning approach. These models utilize Naïve Bayes [11], Support Vector Machines [12], and even Neural Networks [15]. The application of neural networks has brought classification accuracy up to ninety percent.
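For contrast with the supervised models, the dictionary based approach can be sketched in a few lines. The six-word lexicon and its scores are assumptions invented for this example, not a published sentiment dictionary.

```python
# Hypothetical sentiment lexicon: word -> fixed score.
LEXICON = {"buy": 1, "rally": 1, "strong": 1, "sell": -1, "drop": -1, "weak": -1}

def lexicon_score(text):
    """Sum the fixed scores of all known words; unknown words count as zero."""
    return sum(LEXICON.get(token, 0) for token in text.lower().split())

bullish = lexicon_score("Strong rally so buy")       # three positive words
bearish = lexicon_score("weak guidance will drop")   # two negative words
neutral = lexicon_score("special div")               # no lexicon words
```

A positive total is read as bullish and a negative total as bearish. The weakness is visible in the last case: any twit without lexicon words scores zero regardless of what it says.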

There was one aspect we found interesting about previous sentiment classification. Many only classified into two groups: positive and negative. However, closer inspection shows lots of twits are only noise with no real sentiment behind them. Take the following Costco twits for example:

Table 1. Results of Linear Regression Model

Model | Description
Same Day | Read twits from some date and compare them to how the price changed that exact day.
N-Ahead | Read the twits from some date. Take a parameter n and compare them to how the stock changes n days ahead.
N-Aggregate | Take a parameter n, choose some date, and read all twits over the next n days. Compare these twits to the stock price n + 1 days after the original date.

None of the example Costco twits hold a prediction, sentiment, or real idea. Even if they seem positive in inflection, they never indicate whether the stock should be bought, sold, or shorted. However, previous models, including our work in Section IV.A, use these as training data. In Model 1, every twit’s TF-IDF vector was averaged and then compared to the stock’s change in price. Say one thousand unimportant, neutral twits are posted on some day. Then imagine that same day, the stock price jumps five percent. These are two independent events. However, machine learning models will recognize a correlation and say these unimportant twits have positive value. To support future work, we show it is possible to filter out these twits and still hold on to predictive data.

This method began by manually labeling twits. We only set out to show that the approach is possible, not to perfect the model. Therefore, sixteen thousand twits were manually labeled, but the model may become even more accurate as that number increases. Human labeling did uncover some interesting truths. First, we found that the majority of posts were of neutral or unimportant sentiment. Second, individual users tended to post either all noise or all predictive posts. This finding will prove significant in Model 3.

We started off with a Naïve Bayes model, similar to work in [11] and [12]. This model is known for its simplicity and short runtimes, but it fell short in our case. Naïve Bayes is probability based, and the overwhelming majority of training data was labeled neutral. With such skewed class probabilities, the model predicted that nearly ninety percent of twits had zero sentiment. While the final accuracy was higher than most works surveyed in [10], it did not leave enough data to train and test future models.
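The effect of a skewed label distribution on Naïve Bayes can be illustrated with a toy example. The ten twits and their 80% neutral share are invented, and `MultinomialNB` with word counts is one common configuration rather than necessarily the one we ran.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Invented training set: 1 bullish, 8 neutral, 1 bearish.
texts = ["buy now"] + ["watchlist update"] * 8 + ["sell now"]
labels = [1] + [0] * 8 + [-1]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)
nb = MultinomialNB().fit(X, labels)

# Learned class priors mirror the label frequencies in the training data.
priors = {int(c): float(p) for c, p in zip(nb.classes_, np.exp(nb.class_log_prior_))}

# A twit made only of unseen words carries no evidence, so the prior decides.
fallback = int(nb.predict(vectorizer.transform(["interesting"]))[0])
```

Because the neutral prior dominates, any twit with weak or no lexical evidence falls back to the neutral class, which is consistent with the behavior described above.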

The next idea came from [15]. Their model used neural nets to extract features and classify with high accuracy. We set up a basic MLP Classifier and let it train over the labeled twits. This model tested far better than Naïve Bayes. The neural net labeled an appropriate number of posts with a real sentiment (negative one or one) and predicted most of them correctly (see Table 2). Figure 2 summarizes and illustrates this process.

Sentiment analysis is still progressing, especially in classifying subjective inputs [16]. Our findings open the door for improved models to take in only notable social media data and reach much more accurate predictions.

C. Smart User Classification

Our final model rests on a simple assumption: not all twits are equal. This means that posts carry different weights. Some might be irrelevant while others influence enough people to cause a notable change in the market. We came up with three approaches to calculating weight. The first two methods are straightforward and common, but the third one led to interesting new results. These methods were to score weight based on:

1. number of likes
2. the user’s follower count
3. how often the user is correct

Weights from the first two methods were plugged into model one with little to no improvement. This was consistent with our earlier finding of no correlation. However, the correctness of a user has numerous applications. One of the main goals of model two was to eliminate noise and meaningless twits. This same idea can be applied on a user-by-user basis. Users tend to only post noise or only post predictions, rarely both. In our final model, we attempt to identify those users who only post predictions. Better yet, we can identify users who only post correct predictions. This group became our Smart User class, and they can predict the market with accuracy higher than any other model.

Figure 2. Sentiment Classification Model

Table 2. Probability distribution for sentiment prediction

Example Costco ($COST) twits referenced in Section IV.B:
$COST This is getting Interesting again.
$COST I trade based on probability, not emotion…
$COST Dang...had this on my WL this morning but forgot to come back to it. Ugh.
Next weeks stock watchlist $COST
$COST special div
$COST amzn should just buy COST.

The first task was classifying smart users and whether their twits are correct. Fortunately, model two plugs in here nicely. We classified every twit as bullish, bearish, or neutral. Then, after removing the neutral class, we compared the sentiment to how the stock moved that day. This gave us statistics for the typical post’s accuracy (see Table 3).

Of course, these accuracies are calculated using every user and every post. Applying our Smart User model yields notably stronger results. We took the twits used above and grouped them by user. Then, we examined each user for volume and accuracy. If they published at least n meaningful posts in the nine months of training data (n usually around six to eight, depending on how popular the stock was) and at least eighty percent of the posts were accurate, they were labeled a smart user (see Figure 3). Now, only the top important users were examined to predict the market. Figure 3 summarizes the system architecture of stock forecasting using the smart user classification discussed above.
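The smart user rule can be sketched as a short filter. The tuple layout `(user, sentiment, price_change)`, the function names, and the sample data are assumptions for illustration; the defaults of six posts and eighty percent accuracy follow the thresholds described above.

```python
from collections import defaultdict

def twit_is_correct(sentiment, price_change):
    """A bullish twit (+1) is correct on an up day, a bearish twit (-1) on a down day."""
    return (sentiment > 0 and price_change > 0) or (sentiment < 0 and price_change < 0)

def find_smart_users(twits, min_posts=6, min_accuracy=0.8):
    """Return users with enough non-neutral posts and a high enough hit rate."""
    stats = defaultdict(lambda: [0, 0])  # user -> [correct, total]
    for user, sentiment, price_change in twits:
        if sentiment == 0:               # neutral twits are filtered out first
            continue
        stats[user][1] += 1
        if twit_is_correct(sentiment, price_change):
            stats[user][0] += 1
    return {
        user
        for user, (correct, total) in stats.items()
        if total >= min_posts and correct / total >= min_accuracy
    }

# Invented training-period twits: (user, predicted sentiment, that day's % change).
training = (
    [("alice", 1, 0.5)] * 5 + [("alice", -1, 0.3)]    # 5 of 6 correct
    + [("bob", 1, 0.4)] * 3 + [("bob", 1, -0.4)] * 3  # 3 of 6 correct
    + [("carol", -1, -1.0)] * 2                       # accurate but too few posts
    + [("dave", 0, 1.0)] * 10                         # only neutral posts
)
smart = find_smart_users(training)
```

Only `alice` passes both thresholds here: `bob` posts enough but is right half the time, `carol` is accurate but posts too rarely, and `dave` never states a direction.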

Before reading the results, it is important to understand the limitations of this model. First off, only a tiny fraction of data is actually used, because only a small portion of users posting are actual experts (as validated by our system) and only a small portion of twits are of quality content posted by the smart users. Also, we can only predict days in which at least one smart user posts about this stock. This leaves the majority of days unpredictable. Finally, only extremely common stocks can be used. This is essential for two reasons. First, a high volume of postings is required if the majority of them are filtered away. Second, small-cap or less popular stocks do not have many smart users that post about them multiple times a month. We found nine stocks with high enough popularity to produce meaningful results. Their statistics are in the following table.

Small sample sizes have caused bias in many previous works, at least according to [4]. Our sample sizes were small, but we mitigate bias through random train-test splits and a large number of runs. For each stock, a random hundred days of the year were chosen as testing days, and the rest were used for training. Training days were used to identify smart users, and then we looked at smart user postings over the test set. This random split and test was performed twenty-five times per stock, and we averaged results to calculate an accuracy for each.
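The repeated random split can be sketched with the standard library alone. The figure of 250 trading days per year, the fixed seed, and the function name are assumptions made for the example.

```python
import random

def repeated_random_split(days, test_size=100, runs=25, seed=0):
    """Draw `runs` independent splits, each holding out `test_size` random days."""
    rng = random.Random(seed)
    splits = []
    for _ in range(runs):
        test_days = set(rng.sample(days, test_size))
        train_days = [d for d in days if d not in test_days]
        splits.append((train_days, sorted(test_days)))
    return splits

# Assumed: roughly 250 trading days in the year of data, indexed 0..249.
days = list(range(250))
splits = repeated_random_split(days)
```

For each split, smart users would be identified on `train_days` only and scored on the held-out days, and the per-stock accuracy is the mean over the twenty-five runs.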

Figure 3. Stock Forecasting using Smart User Model

Table 3. Results of market prediction based on aggregate sentiment

Table 4. Limiting statistics of smart user prediction

Table 4. Results of smart user prediction

The average accuracy came out to 0.643, with only one stock testing below its base case. These results point to an underlying correlation between sentiment and stock price direction. However, the correlation is difficult to find because it is buried under the noise of unimportant posts.

V. EVALUATIONS

All three models had the same fundamental input: term frequency-inverse document frequency. While many recent works such as [12] and [15] added in extra related features, TF-IDF was the common theme. Using TF-IDF allowed us to relate our work to all previous research. Model one performed no better than expected. Model two verified sentiment’s presence in social media. Model three, the outlier, gives strong evidence of a correlation between sentiment from model two and predictions from model one. Model three’s accuracy is almost fifteen percent higher than model one’s. It is also over ten percent higher than it would be without filtering for smart users.

Model one had more than enough data to support a firm conclusion. Model two, on the other hand, gave high accuracies without significant training data. [10] and [11] already proved sentiment classification is possible with abundant input. Our model shows that anyone can find a correlation with limited data they label on their own. In our case, the yielded accuracy was strong enough to be used for Smart User Classification. Future research can use the same efficient approach to test their models with ease.

VI. CONCLUSION

One of the main goals of the paper is to propose a model for forecasting stock prices using social media analysis. Forecasting stock prices is a complex subject and we do not pretend to be experts in the field. Instead, as discussed in model 3, we harness the public knowledge of such experts in order to utilize machine learning and make accurate predictions. The first step in this process was correctly labeling twits as bullish, bearish, or neutral. We handled this in model two, using a multi-layer perceptron classifier to analyze term frequencies-inverse document frequencies. Once all twits were labeled, we could determine whether they were correct or not. These statistics were then used to identify smart users, who were primarily correct and had a large volume of postings. In turn, these smart users could predict the market with high accuracy.

Model three proves that at least some limited number of twits have a predictive power on stock price. This new information can be applied to models popularized by [3] and [5] in order to improve accuracies across the board.

It is understandable why many sources report a correlation despite proof against it. These sources all tested limited amounts of data. In a small sample size, an unusually high number of Smart User twits can be included. When mixed together, the small population of smart users can cause the StockTwits feed as a whole to appear predictive. Of course, not all twits are equal, and this mass assumption can be proven false. This means that models such as regression and classification can still be useful as long as smart users and certain twits are accounted for.

A second goal of this paper was to verify sentiment identifiers used in previous works. This is the reason we created model two, our own sentiment classifier. The high accuracy of model two supported the conclusions drawn by previous researchers as well as our own conclusions from model one. TF-IDF has a very strong relation to sentiment, and it can be found without excessive training data. This finding has many applications. The first one is to relate general sentiment to the stock market. While general sentiment does not relate to an individual stock, it can relate to the entire market, just as [2] set out to prove. A second application is the mass filtering of social media. Not only can it get rid of neutral posts, but also be a platform for models such as model three.

Model three is important not only for its high accuracy, but also because it shows that sentiment has applications beyond aggregate guessing. If smart users can be identified, then perhaps sentiment can find smart companies, conversation threads, and followers as well. These are all bodies of work we expect to explore in the future. Social media certainly does relate to the stock market, and sentiment is indeed a strong indicator. Another interesting area of future work is to establish adaptive weighted features to better identify smart users. On a different note, it may be interesting to explore identifying social bots or users who are hired by stock promoters to promote certain small-cap stocks. The results of model three demonstrate the potential for future researchers to better understand social media and its predictive power.

ACKNOWLEDGEMENT

We thank StockTwits for granting us partner-level access to their database. We also acknowledge the Wehr Foundation for funding our project and keeping an emphasis on moving the world through scientific research. This project would not have been possible without either of these organizations.

REFERENCES

[1] http://stocktwits.com Last Accessed 26/7/2017

[2] Zhang, X., Fuehres, H., & Gloor, P. A. (2011). Predicting stock market indicators through twitter “I hope it is not as bad as I fear”. Procedia-Social and Behavioral Sciences, 26, 55-62.

[3] Oh, C., & Sheng, O. (2011, December). Investigating Predictive Power of Stock Micro Blog Sentiment in Forecasting Future Stock Price Directional Movement. In ICIS.

[4] Oliveira, N., Cortez, P., & Areal, N. (2013, September). On the predictability of stock market behavior using stocktwits sentiment and posting volume. In Portuguese Conference on Artificial Intelligence (pp. 355-365). Springer, Berlin, Heidelberg.

[5] Tsui, Derek. "Predicting Stock Price Movement Using Social Media Analysis." Stanford University, Technical Report.

[6] http://quandl.com Last Accessed 10/7/2017

[7] http://scikit-learn.org Last Accessed 11/7/2017

[8] https://image.slidesharecdn.com/nltkscikit-learnpyconfr2010ogrisel-100828080703-phpapp02/95/statistical-learning-and-text-classification-with-nltk-and-scikitlearn-6-728.jpg?cb=1282983280 Last Accessed 21/7/2017

[9] Stock Market Yo-Yo. (n.d.). Retrieved July 18, 2017, from https://www.crestmontresearch.com/docs/Stock-Yo-Yo.pdf

[10] Nausheen, S., Kumar, A., & Amrutha, K. K. SURVEY ON SENTIMENT ANALYSIS OF STOCK MARKET.

[11] Zhang, K., Li, L., Li, P., & Teng, W. (2011, August). Stock trend forecasting method based on sentiment analysis and system similarity model. In Strategic Technology (IFOST), 2011 6th International Forum on (Vol. 2, pp. 890-894). IEEE.

[12] Xu, F., & Kešelj, V. (2014, July). Collective Sentiment Mining of Microblogs in 24-Hour Stock Price Movement Prediction. In Business Informatics (CBI), 2014 IEEE 16th Conference on (Vol. 2, pp. 60-67). IEEE.

[13] Kimoto, T., Asakawa, K., Yoda, M., & Takeoka, M. (1990, June). Stock market prediction system with modular neural networks. In Neural Networks, 1990., 1990 IJCNN International Joint Conference on (pp. 1-6). IEEE.

[14] Asur, S., & Huberman, B. A. (2010, August). Predicting the future with social media. In Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on (Vol. 1, pp. 492-499). IEEE.

[15] Meesad, P., & Li, J. (2014, December). Stock trend prediction relying on text mining and sentiment analysis with tweets. In Information and Communication Technologies (WICT), 2014 Fourth World Congress on (pp. 257-262). IEEE.

[16] Feldman, R. (2013). Techniques and applications for sentiment analysis. Communications of the ACM, 56(4), 82-89.

[17] Andreassen, P. B. (1987). On the social psychology of the stock market: Aggregate attributional effects and the regressiveness of prediction. Journal of Personality and Social Psychology, 53(3), 490.

[18] Gilbert, E., & Karahalios, K. (2010, May). Widespread Worry and the Stock Market. In ICWSM (pp. 59-65).

