Thesis (Selection of subject) (version: 368)
Thesis details
Consumer Credit Risk Analysis: Evidence from the Czech Republic
Thesis title in Czech: Analýza spotřebitelského úvěrového rizika v České republice
Thesis title in English: Consumer Credit Risk Analysis: Evidence from the Czech Republic
Key words: ohodnocení úvěruschopnosti, indikátory selhání, klasifikační metody, osobní charakteristiky, bankovní sektor, spotřebitelský úvěr
English key words: credit scoring model, default predictors, classification methods, personal characteristics, banking sector, consumer loan
Academic year of topic announcement: 2016/2017
Thesis type: diploma thesis
Thesis language: English
Department: Institute of Economic Studies (23-IES)
Supervisor: prof. Ing. Evžen Kočenda, M.A., Ph.D., DSc.
Author: hidden - assigned by the advisor
Date of registration: 25.05.2017
Date of assignment: 25.05.2017
Date and time of defence: 20.06.2018 08:30
Venue of defence: Opletalova - Opletalova 26, O206, Opletalova - room no. 206
Date of electronic submission: 03.05.2018
Date of proceeded defence: 20.06.2018
Opponents: PhDr. Michal Hlaváček, Ph.D.
References
Abdou, H. A., & Pointon, J. (2011). Credit scoring, statistical techniques and evaluation criteria: A review of the literature. Intelligent Systems in Accounting, Finance and Management, 18(2-3), 59-88.
Abdou, H., & Tsafack, M. (2015). Forecasting creditworthiness in retail banking: a comparison of cascade correlation neural networks, CART and logistic regression scoring models. The 2nd International Conference on Innovation in Economics and Business ICIEB 2015, February 12-13 2015, Amsterdam, Netherlands.
Bellotti, T., & Crook, J. (2009). Support vector machines for credit scoring and discovery of significant features. Expert Systems with Applications, 36(2), 3302-3308.
Crook, J. N., Hamilton, R., & Thomas, L. C. (1992). A comparison of discriminations under alternative definitions of credit default. In L. C. Thomas, J. N. Crook, & D. B. Edelman (Eds.), Credit scoring and credit control (pp. 217-245). Oxford: Oxford University Press.
Hand, D. J., & Henley, W. E. (1997). Statistical classification methods in consumer credit scoring: a review. Journal of the Royal Statistical Society: Series A (Statistics in Society), 160(3), 523-541.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning (Vol. 6). New York: Springer.
Kočenda, E., & Vojtek, M. (2011). Default predictors in retail credit scoring: Evidence from Czech banking data. Emerging Markets Finance and Trade, 47(6), 80-98.
Kruppa, J., Schwarz, A., Arminger, G., & Ziegler, A. (2013). Consumer credit risk: Individual probability estimates using machine learning. Expert Systems with Applications, 40(13), 5125-5131.
Nguyen, H. T. (2015). Default predictors in credit scoring: evidence from France's retail banking institution. The Journal of Credit Risk, 11(2), 41-66.
Vojtek, M., & Kočenda, E. (2006). Credit-scoring methods. Czech Journal of Economics and Finance (Finance a uver), 56(3-4), 152-167.
Preliminary scope of work
Motivation:

Credit risk is the most important risk that commercial banks have to manage, accounting for approximately 70% of all risks banks face. Its appropriate quantification and management are therefore crucial. As the number of consumer loans has grown over the last 15 years, the assessment of default risk has received considerable attention. Various credit scoring methods have been developed to help commercial banks prevent financial losses resulting from potential defaults. These methods classify loan applicants as good or bad borrowers according to their estimated probability of default, which helps banks to evaluate applicants’ creditworthiness.
Because of their importance in the banking sector, both credit risk analysis and particular credit scoring techniques have been examined by many authors. They not only compare the suitability and accuracy of traditional methods but also investigate the application of less conventional approaches. Detailed overviews of credit scoring methods were assembled by Hand and Henley (1997), Vojtek and Kočenda (2006) and Abdou and Pointon (2011). The majority of the methods reviewed in these papers are widely used and evaluated. The most frequently employed approach is logistic regression, which usually serves as a baseline for comparison with other methods. Other popular methods include linear discriminant analysis, k-nearest neighbours, decision trees, random forests, support vector machines and neural networks. Their application and comparison were investigated, for instance, by Bellotti and Crook (2009), Kruppa et al. (2013) and Abdou and Tsafack (2015).
Modelling the probability of default requires clients’ personal data. The most important variables used in the analysis include demographic, financial, employment and behavioural indicators (Vojtek and Kočenda, 2006). Because of these high data requirements, research in this area is extremely difficult to conduct, and the number of studies that perform credit scoring analysis on real-world data is therefore very limited. To the author’s knowledge, default predictors in European retail banking have been investigated only for France (Nguyen, 2015) and the Czech Republic (Kočenda and Vojtek, 2011). Nguyen (2015) used logistic regression to model credit risk in the French banking sector. In addition, Crook et al. (1992) examined various sociodemographic and economic discriminators for default prediction among cardholders in the UK.
In the Czech Republic, Kočenda and Vojtek (2011) constructed two credit risk models to examine default predictors in retail credit scoring using retail-loan banking data. They compared the performance of logistic regression and a CART model and found that both methods were comparably efficient. Both models also identified similar financial and socio-economic indicators as the key determinants of default behaviour. As a follow-up to that paper, the main aim of this thesis is to investigate which factors influence the probability of default in the Czech Republic. In addition, the suitability and accuracy of various techniques are compared: both traditional credit scoring methods and less conventional approaches are applied and evaluated.

Hypotheses:

Hypothesis #1: A client’s gender does not affect the probability of default.
Hypothesis #2: A client’s age does not affect the probability of default.
Hypothesis #3: The number of a client’s children does not affect the probability of default.
Hypothesis #4: A client’s level of education does not affect the probability of default.
Hypothesis #5: A client’s monthly income does not affect the probability of default.
Hypothesis #6: The district in which a client lives does not affect the probability of default.
Hypothesis #7: There is no difference in performance among credit scoring techniques.
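
One way to make Hypotheses #1 to #6 precise, assuming they are tested within the logistic regression described under Methodology, is to state them as restrictions on the model's coefficients. The symbols below (y_i for the default indicator, x_ij for the j-th characteristic of client i, beta_j for its coefficient) are notation introduced here for illustration only and are not taken from the thesis.

```latex
\[
\Pr(y_i = 1 \mid x_i)
  = \frac{1}{1 + \exp\!\bigl(-(\beta_0 + \sum_{j=1}^{k} \beta_j x_{ij})\bigr)},
\qquad
H_0^{(j)}\colon \beta_j = 0
\quad \text{against} \quad
H_1^{(j)}\colon \beta_j \neq 0 .
\]
```

For a categorical characteristic such as education level or district, which would typically enter the model as several dummy variables, the corresponding null hypothesis would be that all of its dummy coefficients are jointly zero.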

Methodology:

The main objective of this thesis is to investigate the determinants of default in the Czech Republic. In addition, various credit scoring methods are evaluated. This research requires data containing personal information. For this purpose, one of the largest Czech banks provided me with a random sample of its clients’ loans. These clients have either taken out a housing loan or consolidated their existing loans. The dataset was created in March 2017.
The anonymised data cover 4,000 persons who were granted a loan between November 2006 and March 2017. The variables in the dataset can be divided into two groups. The first group describes the particular loan: its type, amount borrowed, unpaid balance, interest rate and instalment amount, together with the dates on which the loan was taken out and on which it will be fully repaid. The second group relates directly to the borrower and comprises socio-demographic characteristics. The most important variable is the default indicator, which flags borrowers who were not able to meet their financial obligations on time. Other personal data include the client’s gender, age, level of education, number of children, region of residence, and monthly income and expenditure based on account transactions. The last variable indicates how long the person has been a client of the bank.
Firstly, I will estimate a logistic regression, which is a widely used method in credit scoring analysis. Its results can be used to identify the key default determinants and predictors, and this ease of interpretation makes it one of the most popular techniques in this area. The same model will be used to test the hypotheses related to personal characteristics. Secondly, I will apply additional classification methods, which aim to classify borrowers as good or bad as accurately as possible. Finally, the various approaches will be evaluated and compared on the basis of their predictive power.
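
The workflow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's actual implementation. The data are synthetic (the bank's dataset is not public), the two features `income` and `age` are hypothetical standardised stand-ins, k-nearest neighbours is picked arbitrarily as the "additional" classifier, and the area under the ROC curve (AUC) is assumed as the measure of predictive power. Only the Python standard library is used.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def linear_score(w, x):
    """Intercept w[0] plus the weighted sum of the features."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))

def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit logistic regression by batch gradient descent on the mean log-loss."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(linear_score(w, xi)) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def knn_scores(X_train, y_train, X_test, k=25):
    """Score each test client by the default share among its k nearest training clients."""
    scores = []
    for x in X_test:
        nearest = sorted(zip(X_train, y_train),
                         key=lambda pair: math.dist(pair[0], x))[:k]
        scores.append(sum(label for _, label in nearest) / k)
    return scores

def auc(y_true, scores):
    """AUC as the share of (defaulter, non-defaulter) pairs ranked correctly; ties count half."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Synthetic clients (assumption): default depends on income only, age has no effect.
rng = random.Random(0)
X, y = [], []
for _ in range(1000):
    income, age = rng.gauss(0, 1), rng.gauss(0, 1)
    p_default = sigmoid(-1.0 - 1.2 * income)
    X.append([income, age])
    y.append(1 if rng.random() < p_default else 0)

# Simple hold-out split: first 700 clients for fitting, last 300 for evaluation.
X_train, y_train, X_test, y_test = X[:700], y[:700], X[700:], y[700:]

w = fit_logistic(X_train, y_train)
auc_logit = auc(y_test, [sigmoid(linear_score(w, x)) for x in X_test])
auc_knn = auc(y_test, knn_scores(X_train, y_train, X_test))

print("coefficients (intercept, income, age):", [round(v, 2) for v in w])
print(f"test AUC  logistic: {auc_logit:.3f}  kNN: {auc_knn:.3f}")
```

The fitted coefficients play the role of the default determinants: a clearly negative income coefficient and an age coefficient near zero mirror how the synthetic data were generated. In practice a statistics package would also report standard errors for the hypothesis tests, which this sketch omits.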

Expected contribution:

I will perform a detailed credit risk analysis on real-world banking data. Because such data are generally unavailable, this thesis will add to a rather limited number of research papers for both the Czech Republic and Europe. Comparing the results with existing work conducted on a different dataset could shed light on the dynamics of credit risk development in the Czech Republic. Furthermore, the application of additional classification methods could result in more accurate models for default prediction, which might help to better address credit risk in the banking sector.

Outline:

1. Introduction: This part will introduce the role of credit risk and credit scoring in the banking sector.
2. Literature Review: I will summarize the previous research related to credit scoring and compare the results of various authors.
3. Data Description: I will present the examined dataset, including detailed feature statistics.
4. Empirical Part: I will present the methods used and build models for default prediction using the described techniques.
5. Results: I will discuss the results and compare the suitability of methods.
6. Conclusion: I will summarize my findings and their implications for future work.
 
Charles University | Information system of Charles University | http://www.cuni.cz/UKEN-329.html