Neural networks and tree-based credit scoring models
Thesis title in Czech: | Neuronové sítě a stromové metody v kreditních skóringových modelech |
---|---|
Thesis title in English: | Neural networks and tree-based credit scoring models |
Keywords in English: | machine learning, loan default model, logistic regression, random forests, neural networks |
Academic year of topic announcement: | 2016/2017 |
Thesis type: | Bachelor's thesis |
Thesis language: | English |
Department: | Institute of Economic Studies (23-IES) |
Supervisor: | prof. PhDr. Ladislav Krištoufek, Ph.D. |
Author: | hidden |
Date of registration: | 04.06.2017 |
Date of assignment: | 04.06.2017 |
Date and time of defence: | 11.09.2018 09:00 |
Venue of defence: | Opletalova - Opletalova 26, O105, Opletalova - room No. 105 |
Date of electronic submission: | 31.07.2018 |
Date of completed defence: | 11.09.2018 |
Opponents: | Mgr. Nicolas Fanta |
List of academic literature |
Athey, Susan, and Guido Imbens. “The State of Applied Econometrics-Causality and Policy Evaluation.” arXiv Preprint arXiv:1607.00699, 2016.
Varian, Hal R. “Big Data: New Tricks for Econometrics.” Journal of Economic Perspectives 28, no. 2 (May 2014): 3–28.
Krauss, Christopher, Xuan Anh Do, and Nicolas Huck. “Deep Neural Networks, Gradient-Boosted Trees, Random Forests: Statistical Arbitrage on the S&P 500.” European Journal of Operational Research 259, no. 2 (June 2017): 689–702.
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. An Introduction to Statistical Learning. Vol. 103. Springer Texts in Statistics. New York, NY: Springer New York, 2013.
Murphy, Kevin P. Machine Learning: A Probabilistic Perspective. Adaptive Computation and Machine Learning Series. Cambridge, MA: MIT Press, 2012. |
Preliminary scope of work in English |
The thesis will introduce various machine learning techniques and ask whether, in loan default prediction, they can perform comparably to or better than conventional regression models.
Large institutions usually model default exclusively with logistic regression, so I will try to show that machine learning models can replace or complement this standard approach. The data come from Lending Club and cover 2007-2017. Lending Club is the largest peer-to-peer lending platform in the US. The dataset contains roughly 300 thousand completed loans with about 30 relevant variables. It will be split at random into a training subset and a smaller testing subset. The models will be fitted on the training subset and then evaluated on the testing subset so that their performance can be compared. The machine learning models considered will be primarily decision trees and random forests (James et al., An Introduction to Statistical Learning) and artificial neural networks (Murphy, Machine Learning: A Probabilistic Perspective). The thesis will consist of a theoretical and an empirical part. In the theoretical part I will first review the current use of machine learning in economics and then review machine learning techniques. In the empirical part I will use the data to build different models for predicting default. In the final chapter I will compare the results and conclude. |
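
The split-and-compare design described above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the thesis's actual pipeline: the data are synthetic rather than the Lending Club dataset, the two features (`dti`, `score`) are hypothetical, a depth-1 decision tree (a "stump") stands in for the tree-based models, and AUC is an assumed comparison metric, since the proposal does not name one.

```python
import math
import random


def make_synthetic_loans(n, seed=0):
    """Hypothetical stand-in for loan data: two features and a 0/1 default flag."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        dti = rng.uniform(0.0, 1.0)    # assumed debt-to-income-like feature
        score = rng.uniform(0.0, 1.0)  # assumed credit-score-like feature
        p_default = 1.0 / (1.0 + math.exp(-(3.0 * dti - 3.0 * score)))
        rows.append(((dti, score), 1 if rng.random() < p_default else 0))
    return rows


def split_train_test(rows, test_share=0.25, seed=0):
    """Randomly split rows into a training subset and a smaller testing subset."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_share)
    return shuffled[n_test:], shuffled[:n_test]


def fit_logistic(train, epochs=200, lr=0.5):
    """Logistic regression fitted by batch gradient descent on the log-loss."""
    w, b, n = [0.0, 0.0], 0.0, len(train)
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for x, y in train:
            p = 1.0 / (1.0 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b)))
            grad_w[0] += (p - y) * x[0]
            grad_w[1] += (p - y) * x[1]
            grad_b += p - y
        w = [w[j] - lr * grad_w[j] / n for j in range(2)]
        b -= lr * grad_b / n
    return lambda x: 1.0 / (1.0 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b)))


def fit_stump(train):
    """Depth-1 decision tree: choose the split with the lowest weighted Gini impurity."""
    def gini(labels):
        p = sum(labels) / len(labels)
        return 2.0 * p * (1.0 - p)

    best = None
    for j in range(2):
        for t in [k / 20 for k in range(1, 20)]:
            left = [y for x, y in train if x[j] <= t]
            right = [y for x, y in train if x[j] > t]
            if not left or not right:
                continue
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(train)
            if best is None or impurity < best[0]:
                best = (impurity, j, t, sum(left) / len(left), sum(right) / len(right))
    _, j, t, p_left, p_right = best
    # Each leaf predicts the default rate observed in the training data.
    return lambda x: p_left if x[j] <= t else p_right


def auc(model, rows):
    """Share of (defaulter, non-defaulter) pairs ranked correctly; ties count one half."""
    pos = [model(x) for x, y in rows if y == 1]
    neg = [model(x) for x, y in rows if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


train, test = split_train_test(make_synthetic_loans(2000))
models = {"logistic regression": fit_logistic(train), "decision stump": fit_stump(train)}
# Both models are fitted on the training subset only and scored on the unseen testing subset.
results = {name: auc(model, test) for name, model in models.items()}
for name, value in results.items():
    print(f"{name}: test AUC = {value:.3f}")
```

Scoring every model on the same held-out subset is what makes the comparison fair: a flexible model that merely memorises the training loans gains nothing on data it has not seen.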