Thesis (Selection of subject)


News Feed Classifications to Improve Volatility Predictions

Thesis title in Czech:	News Feed Classifications to Improve Volatility Predictions
Thesis title in English:	News Feed Classifications to Improve Volatility Predictions
Key words:	volatility, text, klasifikátor, lexikon, sentiment, novinové články
English key words:	volatility, text, classifier, lexicon, sentiment, news
Academic year of topic announcement:	2015/2016
Thesis type:	diploma thesis
Thesis language:	English
Department:	Institute of Economic Studies (23-IES)
Supervisor:	PhDr. Boril Šopov, M.Sc., LL.M.
Author:	hidden - assigned by the advisor
Date of registration:	17.06.2016
Date of assignment:	17.06.2016
Date and time of defence:	31.01.2018 08:30
Venue of defence:	Opletalova - Opletalova 26, O105, Opletalova - room no. 105
Date of electronic submission:	02.01.2018
Date of proceeded defence:	31.01.2018
Opponents:	RNDr. Michal Červinka, Ph.D.




References

Grossman, S., Stiglitz, J., 1980. On the Impossibility of Informationally Efficient Markets. The American Economic Review 70 (3), 393-408.

Laakkonen, H., Lanne, M., 2009. Asymmetric News Effects on Exchange Rate Volatility: Good vs. Bad News in Good vs. Bad Times. Studies in Nonlinear Dynamics & Econometrics 14 (1), 1-38.

Poon, S.-H., Granger, C.W.J., 2003. Forecasting Volatility in Financial Markets: A Review. Journal of Economic Literature 41 (2), 478-539.

Schumaker, R., Chen, H., 2009. Textual analysis of stock market predictions using breaking financial news: the AZFin text system. ACM Transactions on Information Systems 27 (2), 12.

Tetlock, P.C., 2007. Giving Content to Investor Sentiment: The Role of Media in the Stock Market. The Journal of Finance 62 (3), 1139-1168.

Preliminary scope of work

Tato práce analyzuje různé metody klasifikace textu za účelem zjištění, zda publikované novinové články o konkrétních společnostech umožňují lepší simulaci a predikci volatility akcií dané společnosti. V práci zkoumáme obsah textu publikovaných novinových článků a z toho vycházející sentiment (směr a síla) za použití tří různých přístupů: supervised machine learning Naive Bayes algoritmus, lexicon-based jako zástupce lingvistického přístupu a hybridní Naive Bayes. V rámci hybridního Naive Bayes jsou uvažována pouze slova obsažená v daném lexikonu a nikoliv celý obsah článku. Pro lexicon-based přístup používáme nezávisle dva lexikony, jeden s binárním a jeden s vícetřídním hodnocením sentimentu. Sentiment v trénovacím setu pro Naive Bayes byl přiřazen autorem. Z porovnání klasifikačních metod založených na machine learning dojdeme k závěru, že všechny metody dosahují podobných výsledků, z nichž nejlépe vychází hybridní Naive Bayes používající vícetřídní lexikon. Výstupní kvantitativní data ve formě hodnot sentimentu jsou pak dále zahrnuta do modelování volatility pomocí GARCH. Výsledky ukazují, že informace obsažené v novinových článcích přinášejí další vysvětlující prvek do tradičního GARCH modelu a jsou schopné zlepšit odhad. Nicméně nejsme schopni získat dost podkladů pro určení nejlepší metody kvantifikace sentimentu. Model používající hybridní Naive Bayes přístup přinesl lepší in-sample výsledky, pro out-of-sample bylo však lepší užít vícetřídní lexikon. Také se nám podařilo ukázat asymetrický efekt, kdy pozitivní i negativní zprávy zvyšují volatilitu, nicméně u zpráv negativních je tento efekt silnější.

Preliminary scope of work in English

This thesis analyzes various text classification techniques in order to assess whether knowledge of published news articles about selected companies can improve the modelling and forecasting of their stock return volatility. We examine the content of the textual news releases and derive the news sentiment (polarity and strength) using three different approaches: the supervised machine learning Naive Bayes algorithm, a lexicon-based approach as a representative of the linguistic approach, and a hybrid Naive Bayes. In the hybrid Naive Bayes we consider only the words contained in the specific lexicon rather than the whole set of words from the article. For the lexicon-based approach we used two lexicons independently, one with binary and one with multiclass labels. The training set for the Naive Bayes was labeled by the author. Comparing the machine learning classifiers, we conclude that all of them performed similarly, with a slight advantage for the hybrid Naive Bayes combined with the multiclass lexicon. The resulting quantitative data in the form of sentiment scores are then incorporated into GARCH volatility modelling. The findings suggest that the information contained in news feeds brings additional explanatory power to the traditional GARCH model and is able to improve its forecast. However, we could not provide enough evidence to favour a specific sentiment-derivation method. While the model employing the hybrid Naive Bayes approach provided a better in-sample fit, the preferred model in the out-of-sample evaluation was the one employing the multiclass lexicon. We also showed an asymmetric news effect, where both positive and negative news increase volatility, with the latter having a more pronounced effect.
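
The hybrid Naive Bayes idea described above (training on lexicon words only rather than on the full article vocabulary) can be illustrated with a minimal sketch. This is not the thesis implementation. The lexicon, the training headlines, the Laplace smoothing parameter `alpha`, and the function names `tokenize`, `train`, and `predict` are all hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy lexicon; the lexicons used in the thesis are not reproduced here.
LEXICON = {"gain", "profit", "growth", "loss", "lawsuit", "decline"}

def tokenize(text):
    """Lowercase the text and keep only the words found in the lexicon."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w in LEXICON]

def train(labelled_docs, alpha=1.0):
    """Fit a multinomial Naive Bayes model over lexicon words (Laplace smoothing)."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    for text, label in labelled_docs:
        class_counts[label] += 1
        word_counts[label].update(tokenize(text))
    total_docs = sum(class_counts.values())
    model = {}
    for label, n_docs in class_counts.items():
        denom = sum(word_counts[label].values()) + alpha * len(LEXICON)
        model[label] = {
            "log_prior": math.log(n_docs / total_docs),
            "log_lik": {
                w: math.log((word_counts[label][w] + alpha) / denom)
                for w in LEXICON
            },
        }
    return model

def predict(model, text):
    """Return the class with the highest posterior log-score."""
    tokens = tokenize(text)
    scores = {
        label: params["log_prior"] + sum(params["log_lik"][w] for w in tokens)
        for label, params in model.items()
    }
    return max(scores, key=scores.get)

# Hypothetical hand-labelled training headlines (illustrative only).
TRAIN = [
    ("Company reports record profit and strong growth", "positive"),
    ("Shares gain after profit beats forecasts", "positive"),
    ("Firm posts heavy loss amid lawsuit", "negative"),
    ("Sales decline deepens quarterly loss", "negative"),
]

model = train(TRAIN)
print(predict(model, "Analysts expect further growth and profit"))
```

Restricting the feature space to the lexicon keeps the model small and ignores words that carry no sentiment label, at the cost of discarding any signal in out-of-lexicon words. A text containing no lexicon words is scored by the class priors alone.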