Predicting Countries’ Development Levels Using the Decision Tree and Random Forest Methods

Özkan Batuhan; Parim Coşkun; Çene Erhan

doi:https://dx.doi.org/10.26650/ekoist.2023.38.1172190

Research Article

A very close relationship exists between countries’ development levels and their economic levels. Countries can be examined according to various criteria and placed into groups ranging from underdeveloped to highly developed. Socioeconomic factors generally play a decisive role in determining a country’s level of development. Although the level of development is determined with the help of socioeconomic variables, different organizations (e.g., United Nations [UN], International Monetary Fund [IMF]) may classify countries using different methods. As a result, the same country can fall into different development categories depending on the method used and the organization applying it. The aim of this study is to propose a machine learning model that predicts the development level of 193 countries. Development level consists of four categories: high income, upper middle income, lower middle income, and low income. The 26 variables that affect countries’ development levels were obtained from the World Development Indicators (WDI) database. First, random forest-based variable importance was used to determine which variables have the strongest effects on countries’ development levels. Next, countries’ development levels were classified using the decision tree and random forest algorithms with the most important variables selected in this way. The random forest model correctly classified countries’ development levels with an accuracy of 70%. In addition, the findings show adolescent fertility rate, total fertility rate, and the share of agriculture, forestry, and fisheries in gross domestic product (GDP) to be the most important variables affecting countries’ development levels.

Keywords: Development Level, Decision Tree, Random Forest, Fertility Rate, Machine Learning

Ülkelerin Gelişmişlik Düzeylerinin Karar Ağacı ve Rastgele Orman Yöntemleriyle Tahmin Edilmesi

Batuhan Özkan, Coşkun Parim, Erhan Çene

A very close relationship exists between countries’ development levels and their levels of economic development. Countries can be examined according to various criteria and evaluated in different groups, from underdeveloped to highly developed, based on their development levels. Socioeconomic factors generally play a decisive role in determining countries’ development levels. Although the level of development is determined with the help of socioeconomic variables, different organizations (United Nations, International Monetary Fund, etc.) may classify countries using different methods. As a result, a country’s development level may fall into a different category depending on the method and the organization. The aim of this study is to develop a machine learning model that predicts the development level of 193 countries. Development level consists of the categories “High Income”, “Upper Middle Income”, “Lower Middle Income”, and “Low Income”. The 26 variables that affect countries’ development levels were obtained from the World Development Indicators (WDI) database. First, as a feature selection step, variable importance computed with the random forest method was used to determine the most important variables affecting the development level. Using the independent variables found to be important, development levels were then classified with the decision tree and random forest algorithms. The model built with the random forest algorithm was found to classify countries’ development levels correctly at a rate of 70%. In addition, the findings show Adolescent Fertility Rate, Total Fertility Rate, and the share of Agriculture, Forestry, and Fisheries in GDP (gross domestic product) to be the most important variables affecting countries’ development levels.

Keywords: Development Level, Decision Tree, Random Forest, Fertility Rate, Machine Learning

EXTENDED ABSTRACT

Although countries’ development levels are often associated with economic indicators, many other factors also affect them. The methods used to determine development levels differ by organization (e.g., United Nations, World Bank, International Monetary Fund), so the same country may be assigned different levels of development. The literature shows that methods such as machine learning, artificial neural networks, multiple discriminant analysis, and regression analysis have been used to predict countries’ development levels.

Machine learning is the art of making future predictions from data using statistical algorithms. Machine learning methods train a model on part of the data (the training data) and then test the performance of the model on the remaining data (the test data). In this way, the accuracy of the model is measured on data it has never seen before. As in many other fields, studies have used machine learning and econometric methods to classify countries’ development levels.
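The train-then-test procedure described above can be sketched in Python with scikit-learn. This is only an illustration under stated assumptions: the data are synthetic stand-ins generated by `make_classification` (not the WDI data), the 80/20 split ratio matches the one reported later in this abstract, and the cut-off `top_k = 10` and the forest size of 500 trees are hypothetical choices, not values taken from the study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: 193 rows, 26 predictors, 4 classes (NOT the WDI data).
X, y = make_classification(
    n_samples=193, n_features=26, n_informative=8, n_redundant=4,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)

# Hold out 20% of the rows; the models never see them during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Step 1: rank predictors by random forest importance, using training rows only.
ranker = RandomForestClassifier(n_estimators=500, random_state=0)
ranker.fit(X_train, y_train)
top_k = 10  # hypothetical cut-off, chosen only for this sketch
selected = np.argsort(ranker.feature_importances_)[::-1][:top_k]

# Step 2: fit both classifiers on the selected predictors and score them
# on the held-out test rows.
models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=500, random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_train[:, selected], y_train)
    predictions = model.predict(X_test[:, selected])
    scores[name] = accuracy_score(y_test, predictions)
    print(f"{name}: test accuracy = {scores[name]:.2f}")
```

Ranking the predictors on the training rows only keeps the test rows untouched until the final scoring step, which is what makes the reported accuracy an estimate of performance on unseen data.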

This study developed a machine learning model that predicts the development levels of 193 countries using the World Development Indicators 2019 dataset. The dependent variable of the model has four classes: high income ($12,535 or more per capita a year), upper middle income ($4,046-$12,535), lower middle income ($1,035-$4,046), and low income (less than $1,035). These groups were determined according to the World Bank’s Atlas method, which the World Bank uses to estimate the size of economies in terms of annual gross national income per capita in US dollars. For certain operational and analytical purposes, the World Bank uses the Atlas conversion factor instead of simple exchange rates when calculating gross national income (GNI, formerly GNP) in US dollars. The purpose of the Atlas conversion factor is to reduce the effect of exchange rate fluctuations caused by inflation when comparing national incomes between countries. The Atlas conversion factor for any given year is the average of a country’s exchange rate for that year and its exchange rates for the previous two years, adjusted for the difference between the country’s inflation rate and international inflation. The independent variables are 26 variables from the World Development Indicators (WDI) database that can be associated with countries’ development levels; they relate to health, industry, economy, agriculture, and technology. The study uses the decision tree and random forest machine learning methods. Its results reveal the characteristics of countries at different development levels, as well as the variables that affect those levels the most.
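The Atlas averaging and the income thresholds described above can be sketched as follows. This is a rough illustration, not the World Bank's implementation: it assumes the commonly published form of the Atlas formula, in which the two earlier exchange rates are scaled by the ratio of domestic to international price changes before the three-year average is taken. The function names, the use of deflator indices as inflation inputs, and the half-open handling of the class boundaries are assumptions made here.

```python
def atlas_conversion_factor(exchange_rates, domestic_deflators, intl_deflators):
    """Three-year Atlas-style conversion factor (local currency per US dollar).

    Each argument holds three values ordered (t-2, t-1, t). The two earlier
    exchange rates are adjusted for domestic inflation relative to
    international inflation; the current-year rate is used as is.
    """
    e2, e1, e0 = exchange_rates
    p2, p1, p0 = domestic_deflators
    q2, q1, q0 = intl_deflators
    adjusted_t2 = e2 * (p0 / p2) / (q0 / q2)
    adjusted_t1 = e1 * (p0 / p1) / (q0 / q1)
    return (adjusted_t2 + adjusted_t1 + e0) / 3.0


def gni_per_capita_usd(gni_local_currency, population, conversion_factor):
    """Convert GNI in local currency to US dollars per capita."""
    return gni_local_currency / population / conversion_factor


def income_group(gni_per_capita):
    """Assign one of the four classes using the thresholds quoted in the text.

    Boundaries are treated as half-open intervals (an assumption of this
    sketch): a value exactly on a threshold goes to the higher group.
    """
    if gni_per_capita < 1035:
        return "low income"
    if gni_per_capita < 4046:
        return "lower middle income"
    if gni_per_capita < 12535:
        return "upper middle income"
    return "high income"


# Made-up inputs for a hypothetical country, purely for illustration.
factor = atlas_conversion_factor(
    exchange_rates=(10.0, 10.0, 10.0),
    domestic_deflators=(100.0, 110.0, 121.0),
    intl_deflators=(100.0, 100.0, 100.0),
)
income = gni_per_capita_usd(5.0e11, 10_000_000, factor)
print(f"conversion factor: {factor:.4f}, group: {income_group(income)}")
```

With stable prices at home and abroad the factor reduces to the plain three-year average of the exchange rates; the adjustment only matters when domestic inflation diverges from international inflation.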

The data set used in the machine learning algorithms was divided into 80% training data and 20% test data. The decision tree and random forest algorithms were then applied to the data set. In terms of accuracy, the decision tree reaches 65%, while the random forest reaches 70%. In other words, the better-performing model, the random forest, classified countries’ development levels correctly 70% of the time. When the correct classification rates are examined for each income group, all the countries in the high income group are seen to have been correctly classified, while the accuracy of the classifications in the low, lower middle, and upper middle income groups ranges between approximately 65% and 97%. This may be due to the effects that changes have on countries in the lower income groups. In addition, the study reports sensitivity, specificity, positive and negative predictive values, prevalence, detection rate, detection prevalence, and balanced accuracy for each development level category.
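The per-class measures listed above can all be derived from a confusion matrix by treating each income group in turn as the positive class and the other three as negative. The sketch below shows one way to compute them in plain Python. The confusion matrix is invented for illustration and does not reproduce the study's results; the function and variable names are likewise assumptions.

```python
LABELS = ["low", "lower middle", "upper middle", "high"]

# Hypothetical confusion matrix (rows = actual class, columns = predicted class).
# The counts are invented for illustration; they are NOT the study's results.
CONFUSION = [
    [4, 2, 0, 0],
    [1, 6, 3, 0],
    [0, 2, 7, 1],
    [0, 0, 1, 13],
]


def ratio(numerator, denominator):
    """Safe division that returns NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def overall_accuracy(matrix):
    """Share of all cases that lie on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return ratio(correct, total)


def per_class_metrics(matrix, labels):
    """One-vs-rest metrics for each class of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    results = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                  # actual k, predicted other
        fp = sum(row[k] for row in matrix) - tp   # predicted k, actual other
        tn = total - tp - fn - fp
        sensitivity = ratio(tp, tp + fn)
        specificity = ratio(tn, tn + fp)
        results[label] = {
            "sensitivity": sensitivity,
            "specificity": specificity,
            "positive predictive value": ratio(tp, tp + fp),
            "negative predictive value": ratio(tn, tn + fn),
            "prevalence": ratio(tp + fn, total),
            "detection rate": ratio(tp, total),
            "detection prevalence": ratio(tp + fp, total),
            "balanced accuracy": (sensitivity + specificity) / 2,
        }
    return results


metrics = per_class_metrics(CONFUSION, LABELS)
print(f"overall accuracy: {overall_accuracy(CONFUSION):.2f}")
for label in LABELS:
    print(label, {name: round(value, 3) for name, value in metrics[label].items()})
```

Balanced accuracy averages sensitivity and specificity, so it is less flattering than raw accuracy for a group that is rare in the test set.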


Citations

Özkan, B., Parim, C., & Çene, E. (2023). Predicting Countries’ Development Levels Using the Decision Tree and Random Forest Methods. EKOIST Journal of Econometrics and Statistics, 0(38), 87-104. https://doi.org/10.26650/ekoist.2023.38.1172190


Issue 38, 2023, pp. 87-104

TIMELINE

Submitted	07.09.2022
Accepted	05.03.2023
Published Online	21.06.2023

LICENCE

Attribution-NonCommercial (CC BY-NC)

This license lets others remix, tweak, and build upon your work non-commercially, and although their new works must also acknowledge you and be non-commercial, they don’t have to license their derivative works on the same terms.
