A Sentiment Analysis Approach for Predicting Small-Cap Stock Next-Day Returns
Sentiment analysis is the study of the feelings (i.e. attitudes, emotions and opinions) expressed in news reports, blog posts, Twitter messages and similar text. It is done using natural language processing (NLP) tools and algorithms.
Sentiment often influences and drives the direction of markets, at least over short time frames of a few days or less.
Hence, traders and other financial market participants seek to analyze the sentiment expressed in news reports, tweets and blog posts in order to use it in their models and trading decisions.
There are many tools that offer sentiment indicators for various assets. At a-quant, we have developed our own NLP-based sentiment analysis tool and have tested its significance in predicting various asset prices.
In this article, we present some findings in the small-cap space, specifically for the 500 stocks of the Russell 2000 index with the smallest capitalization.
We chose small-cap stocks because sentiment has a stronger effect on the prices of smaller companies than on those of larger ones.
The data used are three years of sentiment data for these Russell 2000 stocks, produced by a machine learning NLP algorithm that scans thousands of news and media websites daily.
The output comprises a sentiment score derived from the news coverage, H/L sentiment scores for each day, news volume and the change in news volume. The sentiment score ranges from -5 (extremely negative coverage) to +5 (extremely positive coverage), where a score of 0 indicates an absence of articles for that particular day.
We considered how changes in company sentiment could be useful for short-term stock trading, mainly day trading, taking a company’s sentiment as an indicator of the stock’s price movement.
For our tests we used three “strategies”: reversion, logistic regression, and Support Vector Machines (SVM).
In all of them we used sentiment indicators from previous days, weeks and months as features (inputs) to our models in order to predict the next day’s direction.
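As an illustration of how such features might be assembled, the sketch below builds trailing averages of the daily sentiment score and pairs them with the next day's direction. The window lengths (1, 5 and 21 trading days standing in for day, week and month), the use of a simple average, and the names `trailing_mean` and `build_dataset` are assumptions made for the example, not details taken from the tests described here.

```python
from statistics import mean


def trailing_mean(scores, end, window):
    """Mean of the `window` scores ending at index `end` (inclusive)."""
    start = max(0, end - window + 1)
    return mean(scores[start:end + 1])


def build_dataset(scores, returns, windows=(1, 5, 21)):
    """Pair each day's trailing sentiment features with the NEXT day's direction.

    scores[t]  : daily sentiment score in [-5, 5] (hypothetical input layout)
    returns[t] : the stock's return on day t
    windows    : assumed look-backs for day / week / month, in trading days
    """
    longest = max(windows)
    rows = []
    # Start once the longest window is full; stop one day early so that a
    # next-day return always exists.
    for t in range(longest - 1, len(scores) - 1):
        features = [trailing_mean(scores, t, w) for w in windows]
        # A zero return is counted as "not positive" in this sketch.
        label = 1 if returns[t + 1] > 0 else -1
        rows.append((features, label))
    return rows
```

Each row uses only information available at the close of day `t`, so the label for day `t + 1` never leaks into its own features.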
a) Simple reversion strategy
The reversion strategy tests were based on the assumption that the stocks of companies with extremely positive or negative sentiment over the past day, week or month tend to revert within the next one or two days.
Results:
Sentiment basis | Next Day Predictability % |
daily | 63.28% |
weekly | 62.77% |
monthly | 63.30% |
That is, 63.28% of the companies that had extreme positive sentiment on a given day had a negative return the next day.
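The predictability figure above can be read as a hit rate: among the observations with extreme sentiment, the share that was followed by a return of the opposite sign. A minimal sketch of that calculation follows. The cut-off of 3.0 for "extreme" sentiment and the function name `reversion_hit_rate` are assumptions for illustration only, since the threshold used in the tests is not stated here.

```python
def reversion_hit_rate(observations, extreme=3.0):
    """Share of extreme-sentiment observations followed by an opposite-sign return.

    observations : iterable of (sentiment, next_day_return) pairs
    extreme      : assumed cut-off for "extreme" sentiment (hypothetical value)
    """
    hits = 0
    signals = 0
    for sentiment, next_return in observations:
        if abs(sentiment) < extreme:
            continue  # not extreme: the strategy takes no view
        signals += 1
        # Extreme positive sentiment predicts a negative return, and vice versa.
        predicted_sign = -1 if sentiment > 0 else 1
        if predicted_sign * next_return > 0:
            hits += 1
    return hits / signals if signals else float("nan")
```

For the weekly and monthly variants, the same function could be fed a trailing average of the score instead of the single-day value.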
b) Logistic regression
Classification techniques are an essential part of machine learning and data mining applications. Logistic regression is a useful method for solving binary classification problems. It is well suited to estimating outcomes that fall into one of two classes, e.g. one result is 1 and the other is -1.
This is what we did for our tests: we took 1 to be a positive next-day return and -1 a negative one, and we trained a logistic regression algorithm to predict the sign of the next day's stock return.
In addition, logistic regression can ‘predict’ the probability of occurrence of a binary event (i.e. of a positive or a negative return). For our tests, we set 60% as our threshold probability of successful prediction, and the results were as shown below:
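To make the setup concrete, here is a from-scratch sketch of a logistic regression trained on labels of +1 and -1 by batch gradient descent. It is a teaching example under stated assumptions, not the code used for the tests: the learning rate, the number of epochs and the function names are hypothetical, and in practice a library implementation would normally be used instead.

```python
import math


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss.

    X : list of feature lists; y : list of labels in {+1, -1}.
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    # The log-loss gradient is simplest with 0/1 targets, so map -1 to 0.
    targets = [1.0 if label == 1 else 0.0 for label in y]
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, ti in zip(X, targets):
            error = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - ti
            for j, xj in enumerate(xi):
                grad_w[j] += error * xj
            grad_b += error
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def prob_up(model, x):
    """Estimated probability that the next-day return is positive."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def predict(model, x):
    """Predicted direction: +1 (positive return) or -1 (negative return)."""
    return 1 if prob_up(model, x) >= 0.5 else -1
```

A call such as `model = fit_logistic(features, labels)` followed by `predict(model, todays_features)` gives the predicted sign, with `features` being rows of past sentiment indicators.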
LogReg Predictability:
Features basis | Next Day Predictability % |
weekly | 57.08% |
monthly | 53.03% |
Top performers:
Company | Next Day Predictability % |
AVEO PHARMACEUTICALS INC | 62.11% |
BANK OF COMMERCE HOLDINGS | 59.33% |
COMMERCIAL VEHICLE GROUP | 59.04% |
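One possible reading of the 60% probability threshold described above is that a prediction only counts when the model assigns at least 60% probability to one of the two directions. The sketch below computes the hit rate under that reading. The interpretation itself, the function name `confident_hit_rate` and the input layout are assumptions, not a description of how the figures in the tables were produced.

```python
def confident_hit_rate(prob_up, actual, threshold=0.60):
    """Hit rate over the days where the model is at least `threshold` sure.

    prob_up : model probabilities of a positive next-day return
    actual  : realised directions, +1 or -1
    Returns (hit_rate, number_of_signals).
    """
    hits = 0
    signals = 0
    for p, outcome in zip(prob_up, actual):
        if p >= threshold:
            predicted = 1
        elif 1.0 - p >= threshold:
            predicted = -1
        else:
            continue  # probability too close to 50/50: no prediction
        signals += 1
        hits += predicted == outcome
    return (hits / signals if signals else float("nan")), signals
```

Raising the threshold typically trades fewer signals for predictions the model is more sure of, so the hit rate should always be read together with the number of signals.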
c) Support Vector Machines (SVM)
Support Vector Machines are among the most widely used supervised classification techniques. Their goal is to train a model that assigns new, unseen objects to a particular category, i.e. to sort new objects into two separate groups based on their properties and on a set of known examples that are already categorized.
For our tests, we aimed to classify a sentiment score as leading to a negative or positive next-day return (as with logistic regression above), based on other sentiment scores that have already been classified as leading to positive or negative next-day returns.
The results were as shown below:
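The idea can be sketched with a linear SVM trained by batch sub-gradient descent on the regularised hinge loss. This is a deliberately simplified stand-in: the kernel, library and hyperparameters behind the tests are not stated here, so the linear form, the values of `lam`, `lr` and `epochs`, and the function names are all assumptions made for illustration.

```python
def fit_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Minimise lam/2 * ||w||^2 + mean(hinge loss) by batch sub-gradient descent.

    X : list of feature lists; y : labels in {+1, -1}.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        # Gradient of the regularisation term.
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # inside the margin or misclassified
                for j, xj in enumerate(xi):
                    grad_w[j] -= yi * xj / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def classify(model, x):
    """Assign a new observation to +1 or -1 by the side of the hyperplane."""
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
```

Unlike logistic regression, this model returns only a class and not a probability, so any confidence filter would have to be based on the distance from the separating hyperplane.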
SVM Predictability:
Features basis | Next Day Predictability % |
weekly | 57.52% |
monthly | 52.72% |
Top performers:
Company | Next Day Predictability % |
MANITEX INTERNATIONAL INC | 71.43% |
AVEO PHARMACEUTICALS INC | 65.91% |
GERON CORP | 64.58% |
In general the results so far are very encouraging, but we could achieve even better accuracy by using the sentiment indicators in a model alongside other features (based on factor models, technicals, fundamentals etc.).
Of course, good predictions of market direction do not guarantee profitable trades, but they are a good start and give an edge on which to build a better short-term trading strategy.
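A small worked example shows why directional accuracy alone is not enough: the expected return per trade also depends on how large the average win is relative to the average loss. In the sketch below only the 63.28% hit rate comes from the reversion results above. The win and loss sizes are hypothetical numbers chosen for illustration, and trading costs are ignored.

```python
def expected_return_per_trade(hit_rate, avg_win, avg_loss):
    """Expected return of one trade given hit rate and average win/loss sizes.

    avg_win and avg_loss are both positive magnitudes (e.g. 0.01 for 1%).
    """
    return hit_rate * avg_win - (1.0 - hit_rate) * avg_loss


# Hypothetical magnitudes: losses twice as large as wins.
print(expected_return_per_trade(0.6328, avg_win=0.010, avg_loss=0.020))
# Hypothetical magnitudes: wins and losses of equal size.
print(expected_return_per_trade(0.6328, avg_win=0.015, avg_loss=0.015))
```

With losses twice the size of wins the expectation is negative despite the hit rate, while with equal-sized wins and losses it is positive.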