Thesis (Selection of subject) (version: 368)
Thesis details
Performance Analysis of Credit Scoring Models on Lending Club Data
Thesis title in Czech: Performance Analysis of Credit Scoring Models on Lending Club Data
Thesis title in English: Performance Analysis of Credit Scoring Models on Lending Club Data
Key words in Czech: Kreditní skórování, P2P půjčování, Klasifikace, Žebříček klasifikátorů
English key words: Credit scoring, P2P Lending, Classification, Classifiers’ ranking
Academic year of topic announcement: 2016/2017
Thesis type: diploma thesis
Thesis language: English
Department: Institute of Economic Studies (23-IES)
Supervisor: prof. PhDr. Petr Teplý, Ph.D.
Author: hidden - assigned by the advisor
Date of registration: 09.11.2016
Date of assignment: 09.11.2016
Date and time of defence: 22.06.2017 08:30
Venue of defence: Opletalova 26, room O105
Date of electronic submission: 18.05.2017
Date on which the defence took place: 22.06.2017
Opponents: Mgr. Magda Pečená, Ph.D.
Guidelines
P2P lending platforms, a new type of financial intermediary between borrowers and lenders, have grown remarkably since their inception. For example, Lending Club, the biggest P2P lending platform in the USA, has almost doubled the amount of issued loans each year. P2P lending is growing in Europe as well: Wardrop et al. (2015) showed that the amount of money lent through P2P consumer lending has more than doubled each year since 2012.
Our master thesis will be based on data provided by Lending Club, which publishes information about all issued loans on its website. For the purpose of our thesis, we have taken a data set of loans issued between January 2009 and December 2012. The data set contains 85,699 loans, and the final status of every loan is known. This allows us to extract training and testing samples from the data set. Moreover, the data set is large enough to allow inter-temporal validation, i.e. training the models on older loans and testing them on more recent ones.
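Inter-temporal (out-of-time) validation can be illustrated as a split of the loans by issue date. The sketch below is a minimal illustration under stated assumptions, not the thesis's procedure: the `Loan` record, its two fields and the 1 January 2012 cutoff are hypothetical; the actual Lending Club files contain many more columns.

```python
from datetime import date
from typing import List, NamedTuple, Tuple


class Loan(NamedTuple):
    # Hypothetical minimal loan record; real Lending Club data has many more columns.
    issue_date: date
    defaulted: bool  # final loan status is known: True if the loan defaulted


def out_of_time_split(loans: List[Loan], cutoff: date) -> Tuple[List[Loan], List[Loan]]:
    """Train on loans issued before `cutoff`, test on loans issued on or after it."""
    train = [loan for loan in loans if loan.issue_date < cutoff]
    test = [loan for loan in loans if loan.issue_date >= cutoff]
    return train, test


if __name__ == "__main__":
    loans = [
        Loan(date(2009, 3, 1), False),
        Loan(date(2010, 7, 15), True),
        Loan(date(2011, 11, 2), False),
        Loan(date(2012, 5, 20), True),
        Loan(date(2012, 12, 31), False),
    ]
    # The cutoff date is an illustrative assumption.
    train, test = out_of_time_split(loans, cutoff=date(2012, 1, 1))
    print(len(train), len(test))  # → 3 2
```

Unlike a random split, every test loan is issued after every training loan, so the evaluation mimics scoring future applicants with a model fitted on the past.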
It is essential for P2P lending platforms to reduce the information asymmetry between lenders and borrowers. Therefore, borrowers are required to provide information about themselves and about the loan. Based on this information, P2P lending platforms use credit scoring models to assess borrowers' credit risk. A well-performing credit scoring model is pivotal to a P2P lending platform's success. Nevertheless, according to the meta-analysis by Abdou & Pointon (2011), which covers more than 200 articles on credit scoring models, no single credit scoring method outperforms all others. On a Kaggle competition dataset, however, Pandey (2011) identified Random forest as the best credit scoring method, in line with Liang (2011). Real-world datasets such as the Lending Club data often behave differently: on Lending Club data, Tsai (2014) found that Logistic regression with penalties for negative classes outperforms both Random forest and Support Vector Machines (SVM). In addition, Chang et al. (2016) showed that Gaussian Naïve Bayes outperforms both Logistic regression and SVM. Which credit scoring model performs best on Lending Club data therefore remains an open question. Moreover, credit scoring models such as Neural networks or C5.0 have not been compared in either of these studies.
The purpose of our master thesis is a comprehensive performance comparison of various credit scoring models. Furthermore, we want to develop our own credit scoring model, based on Logistic regression, that puts more weight on the determinants of borrowers' default identified by Carmichael (2014) and Serrano-Cinca et al. (2015).
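A minimal sketch of a class-weighted Logistic regression, written in plain Python with batch gradient descent so that the weighted loss is explicit. The `pos_weight` parameter (a hypothetical name) multiplies the loss of defaulted loans, which is one common way to implement class penalties of the kind Tsai (2014) describes. The proposal does not specify how the determinants from Carmichael (2014) and Serrano-Cinca et al. (2015) will be weighted, so the features below are placeholders and all function names are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(b: float, w: Sequence[float], x: Sequence[float]) -> float:
    """Predicted probability of default for one feature vector x."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def fit_weighted_logit(
    X: List[List[float]],
    y: List[int],
    pos_weight: float = 1.0,  # hypothetical: extra loss weight on defaulted loans (y = 1)
    lr: float = 0.1,
    epochs: int = 2000,
    l2: float = 0.0,
) -> Tuple[float, List[float]]:
    """Minimise the class-weighted log-loss by batch gradient descent.

    Each observation's loss is multiplied by pos_weight if y = 1 and by 1 otherwise,
    so missed defaults are penalised more. Returns (intercept, coefficients).
    """
    k = len(X[0])
    b, w = 0.0, [0.0] * k
    for _ in range(epochs):
        grad_b, grad_w, total_weight = 0.0, [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            s = pos_weight if yi == 1 else 1.0
            # d(weighted log-loss)/d(linear score) = s * (p - y)
            err = s * (predict_proba(b, w, xi) - yi)
            grad_b += err
            for j in range(k):
                grad_w[j] += err * xi[j]
            total_weight += s
        b -= lr * grad_b / total_weight
        # Optional L2 penalty on the coefficients (not on the intercept).
        w = [wj - lr * (gj / total_weight + l2 * wj) for wj, gj in zip(w, grad_w)]
    return b, w


if __name__ == "__main__":
    # Toy data: one placeholder feature where higher values go with default.
    X = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
    y = [0, 0, 0, 1, 1, 1]
    b, w = fit_weighted_logit(X, y, pos_weight=2.0)
    print(predict_proba(b, w, [0.9]) > predict_proba(b, w, [0.1]))  # → True
```

In practice a library implementation, such as scikit-learn's `LogisticRegression` with its `class_weight` argument, would replace this loop; the hand-written version only shows what the weighting does to the loss.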
References
Carmichael, D. (2014): Modeling default for peer-to-peer loans. Available at SSRN: http://ssrn.com/abstract=2529240, pp. 1-43.
Liang, J. (2010): Predicting borrowers' chance of defaulting on credit loans. pp. 1-5.
Mills, K. G. (2014): The State of Small Business Lending: Credit Access during the Recovery and How Technology May Change the Game.
Namvar, E. (2013): An Introduction to Peer to Peer Loans as Investments. pp. 1-18.
Pandey, J. N. (2011): Predicting Probability of Loan Default. Stanford University, CS229 Project report by Jitendra Nath Pandey and Maheshwaran Srinivasan.
Serrano-Cinca, C., B. Gutiérrez-Nieto, & L. López-Palacios (2015): Determinants of Default in P2P Lending. PLoS ONE 10(10): e0139427.
Tsai, K. (2014): Peer Lending Risk Predictor Support Vector Machines (SVM). pp. 1-5.
Wardrop, R., B. Zhang, R. Rau, & M. Gray (2015): The European Alternative Finance Benchmarking Report. p. 44.
Wu, J. (2014): Loan default prediction using lending club data. Available at http://www.wujiayu.me/assets/projects/loan-default-prediction-Jiayu-Wu.pdf, pp. 1-12.
Preliminary scope of work
Hypotheses:
1. Hypothesis #1: Neural network outperforms Random forest on Lending Club data.
2. Hypothesis #2: Neural network outperforms Gaussian Naïve Bayes.
3. Hypothesis #3: Our model achieves the highest AUC of all compared credit scoring models on Lending Club data.

Methodology:
Our data set contains loans issued between January 2009 and December 2013. It is large enough to ensure inter-temporal validity.
The models will be evaluated by the area under the ROC (receiver operating characteristic) curve, i.e. the AUC score.
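The AUC can be computed without tracing the ROC curve point by point: it equals the probability that a randomly chosen defaulted loan receives a higher score than a randomly chosen repaid loan, with ties counted as one half (the Mann-Whitney U statistic). A minimal stdlib-only sketch, assuming labels are coded 1 = default and scores are predicted default probabilities; the function name `roc_auc` is illustrative.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via the Mann-Whitney U statistic, using average ranks for tied scores.

    labels: 0/1 with 1 = default; scores: predicted probability of default.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a block of tied scores.
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1

    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one defaulted and one repaid loan")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


if __name__ == "__main__":
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of defaulted and repaid loans, which makes the score comparable across the different classifiers regardless of the chosen cutoff.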

Expected Contribution:
Various credit scoring models will be comprehensively compared on Lending Club data, answering the question of which credit scoring model performs best for Lending Club. Carmichael (2014) and Serrano-Cinca et al. (2015) researched the determinants of borrowers' default in Lending Club data. We want to use these determinants to develop our own model that outperforms other credit scoring models on Lending Club data.
Preliminary scope of work in English
1. Introduction
2. Literature Review
3. Data Description
4. Hypotheses
5. Results
6. Summary
7. Bibliography
8. Appendix
Charles University | Information system of Charles University | http://www.cuni.cz/UKEN-329.html