Thesis details
Performance Analysis of Credit Scoring Models on Lending Club Data
Thesis title in Czech: Performance Analysis of Credit Scoring Models on Lending Club Data
Thesis title in English: Performance Analysis of Credit Scoring Models on Lending Club Data
Keywords (Czech): Kreditní skórování, P2P půjčování, Klasifikace, Žebříček klasifikátorů
Keywords (English): Credit scoring, P2P lending, Classification, Classifiers’ ranking
Academic year of topic announcement: 2016/2017
Thesis type: diploma thesis
Thesis language: English
Department: Institute of Economic Studies (23-IES)
Supervisor: prof. PhDr. Petr Teplý, Ph.D.
Author: hidden - assigned by the supervisor
Date of registration: 09.11.2016
Date of assignment: 09.11.2016
Date and time of defence: 22.06.2017 08:30
Venue of defence: Opletalova - Opletalova 26, O105, Opletalova - room No. 105
Date of electronic submission: 18.05.2017
Date of completed defence: 22.06.2017
Opponents: Mgr. Magda Pečená, Ph.D.
URKUND check:
Guidelines
P2P lending platforms, a new type of financial intermediary between borrowers and lenders, have experienced astonishing growth since their inception. For example, Lending Club, the biggest P2P lending platform in the USA, has almost doubled the amount of issued loans each year. P2P lending is growing in Europe as well. Wardrop et al. (2015) showed that P2P consumer lending has more than doubled the amount of money lent each year since 2012.
Our master thesis will be based on data provided by Lending Club, which publishes information about all issued loans on its website. For the purpose of our thesis, we have taken a data set of loans issued between January 2009 and December 2012. Our data set contains 85 699 loans, and we know the final status of all of them. We can extract training and testing samples from our data set. Moreover, the data set is large enough to allow inter-temporal validation.
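One way to read "inter-temporal validation" is an out-of-time split: the models are trained on loans issued before a cutoff date and tested on loans issued afterwards. The sketch below illustrates that idea only. The proposal does not state how the split will be made, so the cutoff date, the field names (`issue_date`, `defaulted`) and the toy records are all assumptions made for illustration.

```python
from datetime import date


def out_of_time_split(loans, cutoff):
    """Split loans into (train, test) by issue date.

    loans:  iterable of dicts with an "issue_date" key (hypothetical field name).
    cutoff: loans issued strictly before this date go to the training set,
            loans issued on or after it go to the test set.
    """
    loans = list(loans)
    train = [loan for loan in loans if loan["issue_date"] < cutoff]
    test = [loan for loan in loans if loan["issue_date"] >= cutoff]
    return train, test


# Toy records invented for illustration; not taken from the Lending Club data.
loans = [
    {"id": 1, "issue_date": date(2009, 3, 1), "defaulted": 0},
    {"id": 2, "issue_date": date(2010, 7, 1), "defaulted": 1},
    {"id": 3, "issue_date": date(2011, 11, 1), "defaulted": 0},
    {"id": 4, "issue_date": date(2012, 5, 1), "defaulted": 1},
]

# Assumed cutoff: train on 2009-2011, test on 2012.
train, test = out_of_time_split(loans, cutoff=date(2012, 1, 1))
```

Unlike a random split, this keeps every test loan later in time than every training loan, which is closer to how a scoring model would be used in practice.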
It is essential for P2P lending platforms to decrease the information asymmetry between lenders and borrowers. Therefore, borrowers are required to provide some information about themselves and about the loan characteristics. Based on this information, P2P lending platforms use their credit scoring models to assess borrowers’ credit risk. A well-performing credit scoring model is pivotal for the success of a P2P lending platform. Nevertheless, as shown by the meta-analysis of Abdou & Pointon (2011), which covers more than 200 articles about credit scoring models, no single credit scoring method outperforms the others. Based on the Kaggle competition dataset, however, Random forest was chosen as the best credit scoring method by Pandey (2011). His results are in line with Liang (2011). Real-world datasets, such as the Lending Club dataset, usually behave differently. Tsai (2014) found that, on the Lending Club data, the Random forest model and Support Vector Machines are outperformed by Logistic regression with penalties for negative classes. In addition, Chang et al. (2016) showed that Naïve Bayes with Gaussian outperforms Logistic regression as well as SVM. The question of which credit scoring model is best for Lending Club data has not yet been resolved. Moreover, other credit scoring models, such as Neural networks or C5.0, have not been compared in any of these studies.
The purpose of our master thesis is a comprehensive performance comparison of various credit scoring models. Furthermore, we want to develop our own credit scoring model. It will be based on Logistic regression and will put more weight on the determinants of borrowers’ default researched by Carmichael (2014) and Serrano-Cinca et al. (2015).
References
Carmichael, D. (2014): Modeling default for peer-to-peer loans. Available at SSRN: http://ssrn.com/abstract=2529240 pp. 1-43.
Liang, J. (2010): Predicting borrowers’ chance of defaulting on credit loans. pp. 1-5.
Mills, K. G. (2014): The State of Small Business Lending: Credit Access during the Recovery and How Technology May Change the Game.
Namvar, E. (2013): An Introduction to Peer to Peer Loans as Investments. pp. 1-18.
Pandey, J. N. (2011): Predicting Probability of Loan Default. Stanford University, CS229 Project report. Jitendra Nath Pandey, Maheshwaran Srinivasan.
Serrano-Cinca, C., B. Gutiérrez-Nieto, & L. López-Palacios (2015): Determinants of Default in P2P Lending. PLoS ONE 10(10): p. e0139427.
Tsai, K. (2014): Peer Lending Risk Predictor Support Vector Machines (SVM). pp. 1-5.
Wardrop, R., B. Zhang, R. Rau, & M. Gray (2015): The European Alternative Finance Benchmarking Report. p. 44.
Wu, J. (2014): Loan default prediction using lending club data. Available at http://www.wujiayu.me/assets/projects/loan-default-prediction-Jiayu-Wu.pdf pp. 1-12.
Preliminary scope of work
Hypotheses:
1. Hypothesis #1: Neural network outperforms Random forest based on Lending Club data.
2. Hypothesis #2: Neural network outperforms Naïve Bayes with Gaussian.
3. Hypothesis #3: Our model has the highest AUC of all credit scoring models compared on Lending Club data.

Methodology:
Our data set consists of loans issued between January 2009 and December 2013. This data set is large enough to ensure inter-temporal validity.
The models will be evaluated based on the AUC (area under the curve) of the ROC (receiver operating characteristic) curve.
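As an illustration of the evaluation metric, the AUC can be computed directly from predicted scores using its rank-sum (Mann-Whitney U) form: it equals the probability that a randomly chosen defaulted loan receives a higher score than a randomly chosen repaid loan, with ties counted as one half. The following is a minimal sketch, not the thesis implementation. The label encoding (1 = default, 0 = repaid) and the example scores are assumptions.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: 1 for a defaulted loan, 0 for a repaid loan (assumed encoding).
    scores: model output; a higher score means default is considered more likely.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one loan of each class")

    # Rank the scores in ascending order; tied scores share their average rank.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U statistic of the positive class, normalised by the number of pairs.
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Made-up scores for four loans: 3 of the 4 (default, repaid) pairs
# are ordered correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so models can be ranked by this single number on the same test sample.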

Expected Contribution:
Various credit scoring models will be comprehensively compared on the Lending Club data, which will show which credit scoring model performs best for Lending Club. Carmichael (2014) and Serrano-Cinca et al. (2015) researched the determinants of borrowers’ default in Lending Club data. We want to use these determinants to develop our own model that outperforms other credit scoring models on Lending Club data.
Preliminary scope of work in English
1. Introduction
2. Literature Review
3. Data Description
4. Hypotheses
5. Results
6. Summary
7. Bibliography
8. Appendix