Research Article


DOI: 10.26650/d3ai.1606958

Children of the Tree: Optimised Rule Extraction from Machine Learning Models

Hilal Meydan, Mert Bal

 The “Children of the Tree” algorithm provides a clear view of how an imbalanced dataset is classified by extracting rules from each tree of a Random Forest (RF) model. It converts the splits created at each node of the trees into “if-then” rules and extracts individual rules for each tree, thereby opening up the RF’s usual “ensemble model” opacity. In this way, the algorithm finds the “Children of the Tree” by converting the forest into a rule set. This study, developed on the German Credit Data Set, a widely studied banking benchmark in the literature, determines the rules that place candidate customers into a given class (good or bad). The bank can thus see which rules assign a potential customer to the risky class and has the chance to recommend alternative plans or products consistent with its risk strategy. The study evaluates rule validity and reliability using association rule mining metrics (support, confidence, lift, leverage, and conviction), calculates the Minimum Description Length (MDL), and ranks rules by support and MDL cost to extract the simplest rules for each class. It addresses risk-management and marketing needs in banking, and its use of MDL cost and SMOTE to handle imbalanced datasets sets it apart from other algorithms.
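The pipeline the abstract describes — walk each tree, turn every root-to-leaf path into an if-then rule, then score and rank the rules — can be sketched in a few lines. The following is an illustrative, self-contained sketch, not the authors' implementation: the hand-built tree, the toy records, and the feature names (`duration`, `amount`) are hypothetical, and the MDL cost is approximated simply by the number of conditions in a rule.

```python
# Minimal sketch of rule extraction and ranking (illustrative only).
# Each root-to-leaf path of a decision tree becomes one "if-then" rule;
# rules are scored with association-rule metrics and ranked by support,
# with ties broken by a crude MDL-style cost (fewer conditions = cheaper).

def extract_rules(node, conditions=()):
    """Recursively turn each root-to-leaf path into an if-then rule."""
    if "label" in node:                      # leaf: emit the accumulated rule
        return [(conditions, node["label"])]
    feat, thr = node["split"]
    rules = []
    rules += extract_rules(node["left"],  conditions + ((feat, "<=", thr),))
    rules += extract_rules(node["right"], conditions + ((feat, ">",  thr),))
    return rules

def matches(row, conditions):
    return all(row[f] <= t if op == "<=" else row[f] > t
               for f, op, t in conditions)

def score(rule, data):
    """Support and confidence of 'conditions -> label' over a dataset."""
    conditions, label = rule
    covered = [row for row in data if matches(row, conditions)]
    support = len(covered) / len(data)
    correct = [row for row in covered if row["class"] == label]
    confidence = len(correct) / len(covered) if covered else 0.0
    return support, confidence

# Hypothetical tree over two illustrative credit features.
tree = {
    "split": ("duration", 24),
    "left":  {"label": "good"},
    "right": {"split": ("amount", 5000),
              "left":  {"label": "good"},
              "right": {"label": "bad"}},
}

data = [
    {"duration": 12, "amount": 2000, "class": "good"},
    {"duration": 36, "amount": 3000, "class": "good"},
    {"duration": 36, "amount": 8000, "class": "bad"},
    {"duration": 48, "amount": 9000, "class": "bad"},
]

rules = extract_rules(tree)
# Rank: highest support first; ties broken by condition count, a simple
# stand-in for the MDL preference for shorter rule descriptions.
ranked = sorted(rules, key=lambda r: (-score(r, data)[0], len(r[0])))
for conditions, label in ranked:
    s, c = score((conditions, label), data)
    print(conditions, "->", label, f"support={s:.2f} confidence={c:.2f}")
```

Repeating this extraction over every tree of the forest and keeping, per class, the top-ranked rules mirrors the abstract's goal of surfacing the simplest rule explaining each class.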



References

  • Leo Breiman. 2001. Random Forests. Machine Learning. 45, 1 (Oct. 2001), 5-32. https://doi.org/10.1023/A:1010933404324.
  • Houtao Deng. 2019. Interpreting Tree Ensembles with inTrees. International Journal of Data Science and Analytics. 7, 4 (Dec. 2019), 277-287. https://doi.org/10.1007/s41060-018-0144-8.
  • Kim Phung Lu Thi, Ngoc Chau Vo Thi, and Nguyen Hua Phung. 2015. Extracting Rule RF in Educational Data Classification: From a Random Forest to Interpretable Refined Rules. In Proceedings of the 2015 International Conference on Advanced Computing and Applications (ACOMP). 20-27. https://doi.org/10.1109/ACOMP.2015.13.
  • Clément Bénard, Gérard Biau, Sébastien da Veiga, and Erwan Scornet. 2019. SIRUS: Stable and Interpretable Rule Set for Classification. arXiv:1908.06852. Retrieved from https://arxiv.org/abs/1908.06852.
  • Morteza Mashayekhi and Robin Gras. 2015. Rule Extraction from Random Forest: The RF+HC Methods. In Advances in Artificial Intelligence. Springer, 223-237. https://doi.org/10.1007/978-3-319-18356-5_20.
  • Satoshi Hara and Kohei Hayashi. 2017. Making Tree Ensembles Interpretable: A Bayesian Model Selection Approach. arXiv:1606.09066. Retrieved from https://arxiv.org/abs/1606.09066.
  • Md Nasim Adnan and Md Zahidul Islam. 2017. A New Framework for Knowledge Discovery from Decision Forests. Australasian Journal of Information Systems. 21 (Nov. 2017). https://doi.org/10.3127/ajis.v21i0.1539.
  • S. Ilker Birbil, Mert Edali, and Birol Yuceoglu. 2020. Rule Covering for Interpretation and Boosting. arXiv:2007.06379. Retrieved from https://arxiv.org/abs/2007.06379.
  • Gelin Zhang, Zhe Hou, Yanhong Huang, Jianqi Shi, Hadrien Bride, Jin Song Dong, and Yongsheng Gao. 2021. Extracting Optimal Explanations for Ensemble Trees via Logical Reasoning. arXiv:2103.02191. Retrieved from https://arxiv.org/abs/2103.02191.
  • Jerome H. Friedman and Bogdan E. Popescu. 2008. Predictive Learning via Rule Ensembles. The Annals of Applied Statistics. 2, 3 (Sept. 2008), 916-954. https://doi.org/10.1214/07-AOAS148.
  • Nicolai Meinshausen. 2010. Node Harvest. The Annals of Applied Statistics. 4, 4 (Dec. 2010), 2049-2072. https://doi.org/10.1214/10-AOAS367. arXiv:0910.2145.
  • Haddouchi Maissae and Berrado Abdelaziz. 2024. Forest-ORE: Mining Optimal Rule Ensemble to Interpret Random Forest Models. arXiv:2403.17588. Retrieved from https://doi.org/10.48550/arXiv.2403.17588.
  • Peter D. Grünwald. 2007. The Minimum Description Length Principle. Adaptive Computation and Machine Learning series. The MIT Press. https://doi.org/10.7551/mitpress/4643.001.0001.
  • Peter Clark and Tim Niblett. 1989. The CN2 Induction Algorithm. Machine Learning. 3 (1989), 261-283. https://doi.org/10.1007/BF00116835.
  • Mlungisi Duma, Bhekisipho Twala, and Tshilidzi Marwala. 2011. Improving the Performance of the RIPPER in Insurance Risk Classification: A Comparative Study Using Feature Selection. arXiv:1108.4551. Retrieved from https://doi.org/10.48550/arXiv.1108.4551.
  • Eibe Frank and Ian H. Witten. 1998. Generating Accurate Rule Sets Without Global Optimization. In Proceedings of the Fifteenth International Conference on Machine Learning (ICML '98). 144-151.
  • Benjamin Letham, Cynthia Rudin, Tyler H. McCormick, and David Madigan. 2015. Interpretable Classifiers Using Rules and Bayesian Analysis: Building a Better Stroke Prediction Model. The Annals of Applied Statistics. 9, 3 (2015), 1350-1371. https://doi.org/10.1214/15-AOAS848.
  • Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2018. Anchors: High-Precision Model-Agnostic Explanations. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI'18/IAAI'18/EAAI'18). AAAI Press, Article 187, 1527-1535.
  • Steven L. Salzberg. 1994. C4.5: Programs for Machine Learning by J. Ross Quinlan. Morgan Kaufmann Publishers, Inc., 1993. Machine Learning. 16 (1994), 235-240. https://doi.org/10.1007/BF00993309.
  • Leo Breiman, Jerome Friedman, Charles J. Stone, and Richard A. Olshen. 1984. Classification and Regression Trees. Wadsworth International Group. https://doi.org/10.1201/9781315139470.
  • William W. Cohen and Yoram Singer. 1999. A Simple, Fast, and Effective Rule Learner. In Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI '99/IAAI '99). American Association for Artificial Intelligence, USA, 335-342.
  • Zhuo Wang, Wei Zhang, Ning Liu, and Jianyong Wang. 2021. Scalable Rule-Based Representation Learning for Interpretable Classification. In Advances in Neural Information Processing Systems (NeurIPS).
  • Himabindu Lakkaraju, Stephen H. Bach, and Jure Leskovec. 2016. Interpretable Decision Sets: A Joint Framework for Description and Prediction. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16). Association for Computing Machinery, New York, NY, USA, 1675-1684. https://doi.org/10.1145/2939672.2939874.
  • Harsha Nori, Samuel Jenkins, Paul Koch, and Rich Caruana. 2019. InterpretML: A Unified Framework for Machine Learning Interpretability. arXiv:1909.09223. Retrieved from https://doi.org/10.48550/arXiv.1909.09223.
  • G Roshan Lal, Xiaotong Chen, and Varun Mithal. 2022. TE2Rules: Explaining Tree Ensembles Using Rules. arXiv:2206.14359. Retrieved from https://doi.org/10.48550/arXiv.2206.14359.
  • Rakesh Agrawal, Tomasz Imielinski, and Arun Swami. 1993. Mining Association Rules Between Sets of Items in Large Databases. ACM SIGMOD Record. 22, 2 (June 1993), 207-216. https://doi.org/10.1145/170036.170072.
  • Elif Varol Altay and Bilal Alataş. 2020. Sensitivity Analysis of Non-Dominated Sorting Genetic Algorithm-II for Quantitative Association Rule Mining [Nicel Birliktelik Kural Madenciliği İçin Baskın Olmayan Sıralama Genetik Algoritma-II'nin Duyarlılık Analizi]. Bilişim Teknolojileri Dergisi. 13 (Jan. 2020).
  • Jorma Rissanen. 1978. Modeling by the Shortest Data Description. Automatica. 14 (1978), 465-471.
  • Teemu Roos. 2017. Minimum Description Length Principle. In Encyclopedia of Machine Learning and Data Mining (C. Sammut and G. I. Webb, Eds.). Springer, 823-827. https://doi.org/10.1007/978-1-4899-7687-1_894.
  • Jorma Rissanen. 1989. Stochastic Complexity and Statistical Inquiry. World Scientific.
  • Andrew R. Barron, Jorma Rissanen, and Bin Yu. 1998. The Minimum Description Length Principle in Coding and Modeling. IEEE Transactions on Information Theory. 44, 6 (1998), 2743-2760. https://doi.org/10.1109/18.720554.
  • Ming Li and Paul Vitányi. 2019. An Introduction to Kolmogorov Complexity and Its Applications. Springer. ISBN: 978-3-030-11297-4.
  • Peter Grünwald and Teemu Roos. 2019. Minimum Description Length Revisited. International Journal of Mathematics for Industry. 11, 01 (2019), 1930001. https://doi.org/10.1142/S2661335219300018.
  • Nitesh V. Chawla, Kevin W. Bowyer, Lawrence O. Hall, and W. Philip Kegelmeyer. 2002. SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research. 16 (2002), 321-357. https://doi.org/10.48550/arXiv.1106.1813.
  • Hans Hofmann. 1994. Statlog (German Credit Data) [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5NC77.

Citations

Meydan, H., & Bal, M. (2025). Children of the Tree: Optimised Rule Extraction from Machine Learning Models. Journal of Data Analytics and Artificial Intelligence Applications. https://doi.org/10.26650/d3ai.1606958



TIMELINE


Submitted: 24.12.2024
Accepted: 20.01.2025
Published Online: 23.01.2025

LICENCE


Attribution-NonCommercial (CC BY-NC)

This license lets others remix, tweak, and build upon your work non-commercially, and although their new works must also acknowledge you and be non-commercial, they don’t have to license their derivative works on the same terms.





