Credit Scoring on Cash Flow Table with Machine Learning: XGBoost Approach
Güner Altan, Server Demirci
Machine learning methods are being adopted at an increasing pace in the banking and finance sectors as these sectors modernize and globalize. With the growing number of credit products that banks offer, the ability to distinguish good customers from bad ones has become extremely important. This ability increases not only banks’ profitability but also their competitiveness in the market. Banks therefore put companies through a credit evaluation process before lending to them, and the most important part of this process is undoubtedly the credit scoring analysis. Given that credit risk is one of the most important risks banks carry, the scorecard study must be completed correctly, reliably, and quickly during the credit evaluation process. Whether the company being scored is a stand-alone company or part of a group may change how it is evaluated. Within a group, no matter how good the rating of the company that holds parent status, low ratings at the other group companies may pull the consolidated rating down. The current study therefore focuses on group companies. Its aim is to develop a scorecard model using the cash flow statements of consolidated companies. Three machine learning techniques, eXtreme Gradient Boosting (XGBoost), Gradient Boosting, and Artificial Neural Networks, were implemented in the Python programming language. The three methods were compared, and XGBoost was the preferred model, with an accuracy score of 80%.
Makine Öğrenmesi ile Nakit Akış Tablosu Üzerinden Kredi Skorlaması: XGBoost Yaklaşımı
Güner Altan, Server Demirci
With modernization and globalization, machine learning methods have come into use at an increasing pace in the banking and finance sector. Especially with the growth in credit products offered in the banking sector, the ability to distinguish precisely between bad and good customers has become extremely important. This ability not only increases banks’ profitability but also strengthens their competitiveness in the market. In this context, banks put companies through a credit evaluation process before lending to them, and the most important part of this process is undoubtedly the scoring study. Given that credit risk is one of the most important risks banks carry, the importance of completing the scorecard study correctly, reliably, and quickly during the credit evaluation process cannot be denied. In scoring studies, whether a company is a solo company or a group company may change how the company or companies are evaluated. Among the companies that make up a group, however good the rating of the company with parent status may be, low ratings at the other companies can affect the consolidated rating and lower it. The study therefore emphasizes group companies. The aim of the study is to develop a scorecard model using the cash flow statements of consolidated companies. The XGBoost, Gradient Boosting, and Neural Network machine learning methods were used with the Python programming language. The three methods were compared, and XGBoost was the preferred model, with an accuracy score of 80%.
The study uses 399 observations: 133 consolidated companies observed over a three-year review period from 2017 to 2019 (133 × 3 = 399). It aims to give banks a faster and more reliable model for their scoring/rating studies, one that draws on the cash flow statement alone among a company’s financial data. In essence, the XGBoost algorithm was applied in Python to show that a successful scoring study can be carried out using only companies’ cash flow statements.
With increased competition in the banking sector, it is extremely important, for customer satisfaction as well as for the banks themselves, that banks maintain their assets with sustainable profitability. Credit score analysis is a laborious process that demands close attention. Banks have been searching for the most accurate credit risk assessment methods for many years, and new methods have come into use alongside developments in technology. One of these is machine learning algorithms implemented in the Python programming language.
In place of the traditional statistical methods used in the banking sector, the study presents the banking and financial sectors with a new model built on machine learning algorithms. The data set consists of consolidated companies drawn from the manufacturing, trade, and service sectors; the construction sector is excluded. Consolidated companies were chosen because of the great importance group companies have in the credit evaluation process. By developing a model in this context, the study intends to emphasize how important the consolidated (i.e., group company) credit score is in a scorecard study carried out during credit evaluation.
The study opens with the introduction and the literature review and then discusses the concept of group companies and their credit evaluation process. The sections that follow address three machine learning techniques used in credit scoring: extreme gradient boosting, gradient boosting, and neural networks.
The pre-model preparation phase describes the dependent and independent variables and carries out the data cleaning. In this context, correlation analyses (feature-to-feature and feature-to-target correlations) were performed and the outliers in the data were identified. The outliers were not removed, because removing them would have reduced the number of observations. Instead, the data were scaled with the Robust Scaler method, which is robust to outliers because it relies on the median and the interquartile range rather than the mean and the standard deviation.
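The idea behind robust scaling can be sketched in a few lines. The example below is a minimal illustration in plain Python, not the study’s code: it centres a column on its median and divides by the interquartile range, which is what scikit-learn’s `RobustScaler` does with its default settings (presumably the implementation meant by "Robust Scaler"; the text does not name the library). The `ratio` values are made up for illustration.

```python
from statistics import median, quantiles


def robust_scale(values):
    """Centre on the median and divide by the interquartile range (IQR)."""
    # method="inclusive" interpolates linearly between the observed values.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; this column cannot be robust-scaled")
    centre = median(values)
    return [(v - centre) / iqr for v in values]


# HYPOTHETICAL cash flow ratio with one extreme value (not data from the study).
ratio = [0.10, 0.12, 0.15, 0.18, 0.20, 0.22, 0.25, 9.00]
scaled = robust_scale(ratio)

for raw, new in zip(ratio, scaled):
    print(f"{raw:6.2f} -> {new:8.2f}")
```

The extreme value stays in the data and stays extreme after scaling, but it does not influence the centre or the scale applied to the other observations, which is why the outliers could be kept without distorting the remaining values.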
In the model-building phase, the data were partitioned into a training set and a test set. Five-fold cross-validation was performed on the training set, which was divided into five subsets for this purpose. The cross-validation accuracy scores were then compared with the accuracy scores on the test set. On the test set, the models achieved an accuracy score of 80% for XGBoost (eXtreme gradient boosting), 77.5% for gradient boosting, and 61.25% for the artificial neural network.
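The five-fold procedure can be sketched as follows. This is a minimal illustration in plain Python under stated assumptions, not the study’s pipeline: `k_fold_indices`, `cross_val_accuracy`, and `majority_baseline` are hypothetical helper names, the majority-class baseline merely stands in for the actual models, and the training-set size of 319 is inferred from 399 observations minus 80 test observations rather than stated in the text.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle 0..n_samples-1 and yield (train_idx, val_idx) for each fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold sizes differ by at most one when n_samples is not divisible by k.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_val_accuracy(X, y, fit_predict, k=5, seed=0):
    """Train on k-1 folds, score on the held-out fold, repeat k times."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(y), k, seed):
        preds = fit_predict(
            [X[i] for i in train_idx],
            [y[i] for i in train_idx],
            [X[i] for i in val_idx],
        )
        scores.append(accuracy([y[i] for i in val_idx], preds))
    return scores


def majority_baseline(X_train, y_train, X_val):
    """Stand-in model: always predict the most frequent training label."""
    majority = max(set(y_train), key=y_train.count)
    return [majority] * len(X_val)


# Assumed training-set size: 399 observations minus 80 test observations.
n_train = 399 - 80
fold_sizes = [len(val) for _, val in k_fold_indices(n_train, k=5)]
print(fold_sizes)
```

Read against a test set of 80 observations, the reported accuracies correspond to 64, 62, and 49 correct predictions (64/80 = 80%, 62/80 = 77.5%, 49/80 = 61.25%).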
The study prefers the XGBoost model, which achieved an 80% accuracy score and an 82% area under the receiver operating characteristic curve (ROC-AUC). The model’s ROC curve is shown in figure 12; the area under the curve (AUC) is 82%. The XGBoost confusion matrix is shown in figure 8 and illustrates the model’s predictive performance. Using this best-performing model, the probability of success was estimated for each of the 80 test observations, with a classification threshold of 52%. Companies were then given a rating of A, B, C, D, or E based on their probability of success.
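The step from predicted probabilities to a class label, an AUC value, and a letter rating can be sketched as below. This is an illustration under loud assumptions rather than the study’s method: the 52% threshold comes from the text, but the cut-offs in `RATING_BANDS` are hypothetical (the text does not give them), treating a probability at or above the threshold as "success" is an assumption, and the probabilities and labels are made up.

```python
def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


THRESHOLD = 0.52  # classification threshold reported in the text


def classify(prob, threshold=THRESHOLD):
    # Assumption: a probability at or above the threshold counts as success.
    return 1 if prob >= threshold else 0


# HYPOTHETICAL grade bands: the text does not state the cut-offs for A to E.
RATING_BANDS = [(0.80, "A"), (0.65, "B"), (0.52, "C"), (0.35, "D")]


def rating(prob):
    """Map a probability of success to a letter grade using RATING_BANDS."""
    for lower_bound, grade in RATING_BANDS:
        if prob >= lower_bound:
            return grade
    return "E"


# Made-up probabilities and true labels, purely for illustration.
probs = [0.91, 0.70, 0.55, 0.40, 0.10]
labels = [1, 1, 0, 1, 0]

predicted = [classify(p) for p in probs]
grades = [rating(p) for p in probs]
auc = roc_auc(labels, probs)
print(predicted, grades, round(auc, 3))
```

The AUC is computed from the probabilities themselves and does not depend on the threshold, whereas accuracy and the confusion matrix do, so moving the threshold away from 52% would change the latter but not the former.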
This study presents a model that can give companies a reliable scorecard/rating in a shorter time by using only the cash flow statement among their financial data. In this way, banks can manage their risk appetite more effectively and improve customer satisfaction through faster analyses. Because the approach enables rapid credit scoring, it can at the very least help meet companies’ short-term loan demands.