Research Article


DOI: 10.26650/JEPR1114842   IUP: 10.26650/JEPR1114842

Credit Scoring on Cash Flow Table with Machine Learning: XGBoost Approach

Güner Altan, Server Demirci

Machine learning methods are being adopted at an increasing pace in the banking and finance sectors as these sectors modernize and globalize. As the range of credit products offered by banks has grown, the ability to distinguish good customers from bad ones has become extremely important. This ability increases not only banks’ profitability but also their competitiveness in the market. Banks therefore put companies through a credit evaluation process before lending to them, and the most important part of this process is the credit score analysis. Since credit risk is one of the most important risks banks carry, the scorecard study in the credit evaluation process must be completed correctly, reliably, and quickly. Whether the company being scored is a standalone company or part of a group may change how it is evaluated. In a group, however good the rating of the parent company is, low ratings for the other companies may reduce the consolidated rating. For this reason, the current study focuses on groups of companies. The aim of the study is to develop a scorecard model using the cash flow statements of consolidated companies. Three machine learning algorithms, eXtreme Gradient Boosting (XGBoost), Gradient Boosting, and Artificial Neural Networks, were implemented in Python and compared. XGBoost was the preferred model, with an accuracy score of 80%.


EXTENDED ABSTRACT


The study uses 399 observations covering 133 consolidated companies over the 3-year review period from 2017 to 2019. It aims to provide a faster and more reliable model for banks’ scoring/rating studies that relies solely on the cash flow statement among companies’ financial data. In essence, the XGBoost algorithm was applied in Python to show that a successful scoring study can be performed using companies’ cash flow statements alone.

As competition in the banking sector increases, it is extremely important, for both banks and customer satisfaction, that banks maintain their assets with sustainable profitability. Credit score analysis is a laborious process that demands close attention. Banks have been searching for the most accurate credit risk assessment methods for many years, and new methods have come into use as technology has developed. Machine learning algorithms implemented in Python are one of these.

In place of the traditional statistical methods used in the banking sector, this study presents the banking and financial sectors with a new model based on machine learning algorithms. The data set consists of consolidated companies from the manufacturing, trade, and service sectors; the construction sector is excluded. Consolidated companies were chosen because of the great importance group companies have in the credit evaluation process. By developing a model in this context, the study intends to emphasize how important the consolidated (i.e., group company) credit score is to a scorecard study in the credit evaluation process.

The study first presents the introduction and literature review and then discusses the concept of group companies and their credit evaluation process. The following sections address three machine learning algorithms used in credit scoring: extreme gradient boosting, gradient boosting, and neural networks.

The pre-model preparation phase describes the dependent and independent variables and carries out data cleaning. Correlation analyses (feature-to-feature and feature-to-target correlations) were performed in this context, and outlier values in the data were identified. The outliers were not removed, as removing them would reduce the number of observations. Instead, the Robust Scaler method, which is robust to outliers, was used to scale the data.
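The scaling step can be illustrated with a minimal sketch. It assumes the Robust Scaler referred to above centers each variable on its median and divides by its interquartile range, as scikit-learn's `RobustScaler` does by default. The function name `robust_scale` and the sample values are hypothetical, and the sketch uses only the Python standard library rather than the study's own code.

```python
from statistics import median, quantiles


def robust_scale(values):
    """Center on the median and divide by the interquartile range (IQR)."""
    # "inclusive" quartiles use linear interpolation between data points.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; the variable cannot be scaled")
    center = median(values)
    return [(v - center) / iqr for v in values]


# Hypothetical cash flow variable with one extreme value (not from the study).
sample = [1.0, 2.0, 3.0, 4.0, 100.0]
scaled = robust_scale(sample)
print(scaled)  # [-1.0, -0.5, 0.0, 0.5, 48.5]
```

The extreme value stays in the data and remains extreme after scaling, but it does not distort the center or the spread applied to the other observations, which is why this kind of scaling suits a data set in which outliers are kept.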

In the model setup phase, the data were partitioned into training and test sets. A five-fold cross-validation analysis was performed on the training set, that is, the training set was divided into five subsets. The cross-validation training and test set accuracy scores were then compared. On the test set, the accuracy scores were 80% for XGBoost (eXtreme gradient boosting), 77.5% for gradient boosting, and 61.25% for the artificial neural network.
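The five-fold partitioning described above can be sketched as follows. This is a plain-Python illustration, not the study's code: `k_fold_indices` and `accuracy` are hypothetical helpers, the training set size of 319 is an assumption (the 399 observations less the 80 test observations reported for the final model), and no shuffling or stratification is applied.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) pairs for k contiguous folds."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional observation.
        size = base + (1 if fold < extra else 0)
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Assumed training set size: 399 observations minus 80 held out for testing.
folds = list(k_fold_indices(319, k=5))
fold_sizes = [len(val_idx) for _, val_idx in folds]
print(fold_sizes)  # [64, 64, 64, 64, 63]
```

In each of the five rounds a model would be fitted on `train_idx` and scored on `val_idx` with `accuracy`; the five scores can then be set against the accuracy obtained on the held-out test set.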

The study has preferred the XGBoost model, with its 80% accuracy score and 82% area under the receiver operating characteristic curve (ROC-AUC). The model’s ROC curve is shown in Figure 12. XGBoost’s confusion matrix is shown in Figure 8 and summarizes the model’s predictive performance. The probability of success was then estimated for the 80 test observations with this best-performing model, using a 52% classification threshold. Companies were rated A, B, C, D, or E according to their probability of success.
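A minimal sketch of how predicted probabilities might be turned into classes, evaluation figures, and letter ratings. The 52% threshold comes from the text; everything else is an assumption made for illustration: the function names, the sample labels and probabilities, and in particular the rating cut-offs in `HYPOTHETICAL_BANDS`, which are not reported here.

```python
THRESHOLD = 0.52  # classification threshold reported in the text

# Hypothetical cut-offs; the study's actual rating bands are not given here.
HYPOTHETICAL_BANDS = [(0.8, "A"), (0.6, "B"), (0.4, "C"), (0.2, "D"), (0.0, "E")]


def classify(prob, threshold=THRESHOLD):
    """Label an observation 1 (success) when its probability reaches the threshold."""
    return 1 if prob >= threshold else 0


def confusion_matrix(y_true, y_pred):
    """Count true/false positives and negatives for binary labels."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for t, p in zip(y_true, y_pred):
        if p == 1:
            counts["tp" if t == 1 else "fp"] += 1
        else:
            counts["tn" if t == 0 else "fn"] += 1
    return counts


def roc_auc(y_true, probs):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [p for t, p in zip(y_true, probs) if t == 1]
    neg = [p for t, p in zip(y_true, probs) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def rating(prob, bands=HYPOTHETICAL_BANDS):
    """Return the letter of the first band whose lower bound the probability reaches."""
    for lower, letter in bands:
        if prob >= lower:
            return letter
    return bands[-1][1]


# Made-up labels and probabilities, not data from the study.
y_true = [1, 1, 0, 0, 1]
probs = [0.9, 0.6, 0.52, 0.4, 0.1]
y_pred = [classify(p) for p in probs]
cm = confusion_matrix(y_true, y_pred)
acc = (cm["tp"] + cm["tn"]) / len(y_true)
auc = roc_auc(y_true, probs)
ratings = [rating(p) for p in probs]
print(y_pred, cm, acc, round(auc, 3), ratings)
```

Accuracy depends on the chosen threshold, whereas the AUC is computed from the ranking of the probabilities alone, which is why the two figures are reported separately.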

This study presents a model that can provide a reliable scorecard/rating for companies in a shorter time by using only the cash flow statement among their financial data. In this way, banks can manage their risk appetite more effectively and increase customer satisfaction through faster analyses. Because it enables rapid credit scoring, the model can at the very least provide solutions to companies’ short-term loan demands.


Citations

APA

Altan, G., & Demirci, S. (2022). Credit Scoring on Cash Flow Table with Machine Learning: XGBoost Approach. Journal of Economic Policy Researches, 9(2), 397-424. https://doi.org/10.26650/JEPR1114842



TIMELINE


Submitted: 10.05.2022
Accepted: 09.06.2022
Published Online: 29.07.2022

LICENCE


Attribution-NonCommercial (CC BY-NC)

This license lets others remix, tweak, and build upon your work non-commercially, and although their new works must also acknowledge you and be non-commercial, they don’t have to license their derivative works on the same terms.

