K. Arvanitis, N. Bassiliades, “Real-Time Investors' Sentiment Analysis from Newspaper Articles”, in Advances in Combining Intelligent Methods, Intelligent Systems Reference Library, Volume 116, pp. 1-23, Springer, 2017.
Author(s): K. Arvanitis, N. Bassiliades
Keywords: Sentiment analysis, Data mining, Sentiment index, Investor sentiment, Stock returns, Naïve Bayes classifier, n-gram language model
Abstract: Recently, investor sentiment measures have become one of the more widely examined areas in behavioral finance. They are capable of both explaining and forecasting stock returns. The purpose of this paper is to present a method, based on a combination of a Naïve Bayes classifier and the n-gram probabilistic language model, which can create a sentiment index for specific stocks and indices of the New York Stock Exchange. An economically useful proxy for investor sentiment is constructed from U.S. news articles, mainly provided by The New York Times. Initially, a large number of articles about ten big companies and indices is collected and processed, so that a sentiment score can be extracted from each of them. Then, the classifier is trained on the positive, negative and neutral articles, so that it can afterwards determine the sentiment of any unseen newspaper article, for any company or index. Subsequently, the classification task is tested and validated for its accuracy and efficiency. The widely used Baker and Wurgler sentiment index is used as a comparison measure for predicting stock returns. In a monthly sample of the S&P 500 index from 2004 to 2010, it is shown that the new sentiment index has, on average, twice the predictive ability of Baker and Wurgler's index over that period.
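
Illustration: the abstract describes combining a Naïve Bayes classifier with n-gram features and turning per-article labels into an index. A minimal sketch of that general idea follows, in Python. It is not the authors' implementation: the class name NgramNaiveBayes, the helper sentiment_index, the Laplace smoothing, the choice of unigrams plus bigrams, and the (positive - negative) / total aggregation are all assumptions made for illustration, and the training headlines are invented toy data.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max from lowercased text."""
    tokens = text.lower().split()
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


class NgramNaiveBayes:
    """Multinomial Naive Bayes over word n-grams (hypothetical sketch)."""

    def __init__(self, n_max=2, alpha=1.0):
        self.n_max = n_max          # longest n-gram used as a feature
        self.alpha = alpha          # Laplace smoothing constant (assumed)
        self.class_docs = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            feats = ngrams(text, self.n_max)
            self.class_docs[label] += 1
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def log_scores(self, text):
        """Log prior plus smoothed log likelihood for each class."""
        total_docs = sum(self.class_docs.values())
        scores = {}
        for label, n_docs in self.class_docs.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)
            for feat in ngrams(text, self.n_max):
                if feat in self.vocab:  # n-grams never seen in training are skipped
                    score += math.log((counts[feat] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


def sentiment_index(labels):
    """Assumed aggregation: (positive - negative) / total articles in a period."""
    counts = Counter(labels)
    return (counts["positive"] - counts["negative"]) / len(labels)


# Invented toy headlines, not data from the paper.
train = [
    ("shares surge after strong earnings", "positive"),
    ("profit growth beats expectations", "positive"),
    ("shares plunge after weak earnings", "negative"),
    ("losses widen as sales fall", "negative"),
    ("company holds annual meeting", "neutral"),
    ("board schedules quarterly call", "neutral"),
]
model = NgramNaiveBayes().fit(*zip(*train))
```

With this toy model, model.predict("strong earnings growth") selects the positive class, and sentiment_index applied to one month's predicted labels gives a value between -1 and 1. A real index would need far more training data and a validated aggregation rule.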