A Comparison of the Performance of Ensemble Tree and Neural Networks for The Prediction of Traffic Accident Duration
Hüseyin Korkmaz, Mehmet Ali Ertürk, Mehmet Adak
Traffic accident duration is defined as the time between the occurrence of an accident and the return of the accident scene to its initial state. The aim of this paper is to predict traffic accident duration from traffic accident data for Istanbul using Ensemble Tree and Neural Networks methods and to compare the performance of these methods. The secondary aim is to identify the main factors affecting accident duration. The accident datasets were obtained from the Istanbul Metropolitan Municipality and the General Directorate of Security. The dataset includes 1,905 traffic accident records in Istanbul from 2013 to 2021. The data were analyzed within the scope of data mining. Statistical tests and machine learning algorithms were applied to the extracted dataset to predict traffic accident duration. R², MSE, RMSE, and MAE were used as performance measures for the algorithms. The Ensemble Tree algorithm achieved an R² of 0.85 in training, while the Neural Networks algorithm performed better in testing, with an R² of 0.91.
This paper aims to predict traffic accident duration from traffic accident data for Istanbul using Ensemble Tree and Neural Networks methods and to compare the performance of these methods. Its secondary aim is to identify the main factors affecting accident duration. Traffic accident duration is defined as the time between the occurrence of an accident and the return of the accident scene to its initial state. When a traffic incident or accident occurs, uncertainty about how long it will last is a concern for drivers, passengers, and traffic operators. The time between the occurrence of a traffic accident and its clearance is therefore a topic worthy of research.
First, a literature review was conducted. It focuses on academic studies published from 2010 until the end of 2022, gathered from well-established databases, that offer researchers and practitioners valuable information on analyzing and predicting the duration of traffic accidents. The objective of the review is to uncover the dynamics of traffic incidents and the important factors affecting accident duration in various categories, in order to improve traffic management. On this basis, guidance and recommendations for traffic accident management can be provided and countermeasures can be developed.
The accident datasets used in this paper were obtained from the Istanbul Metropolitan Municipality (IBB) and the General Directorate of Security (EGM). The first is the open-access dataset titled "Transportation Management Center Traffic Announcement Data" for the city of Istanbul, published by the IBB Department of Transportation through the IBB Open Data Portal (İBB UDB, 2023). The IBB traffic announcement dataset covers the years 2013-2021 and contains 159,411 traffic incident records and 13 variables. The second dataset was obtained from the traffic accidents database provided by the statistics unit of the EGM Traffic Directorate. The EGM traffic accidents dataset covers the years 2013-2021 and all cities in Türkiye. It was obtained in the form of two Excel documents, "accident information" and "accident vehicle information". The "accident information" dataset contains 1,338,387 traffic accident records and 53 variables, while the "accident vehicle information" dataset contains 2,206,474 traffic accident records and 11 variables. The final dataset includes 1,905 traffic accident records in Istanbul from 2013 to 2021. The data were analyzed within the scope of data mining.
Researchers have frequently used machine learning models to predict the duration of traffic accidents, especially in the last decade. Machine learning is a sub-branch of artificial intelligence and has been widely used as a powerful tool to overcome challenges in different domains. In this paper, statistical tests and two machine learning methods, Ensemble Tree and Neural Networks, were applied to the extracted dataset to predict traffic accident duration.
R², MSE, RMSE, and MAE were used as performance measures for the algorithms applied in this paper. The final dataset was divided into an 80% training set and a 20% testing set. To optimize the performance of the regression-based machine learning algorithms, several additional analyses were applied, including k-fold cross-validation, Principal Component Analysis, feature selection, and an optimizer. The Ensemble Tree algorithm achieved an R² of 0.85 in training, while the Neural Networks algorithm performed better in testing, with an R² of 0.91.
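The four performance measures and the 80/20 split can be illustrated with a minimal sketch. It uses the standard textbook definitions of R², MSE, RMSE, and MAE and a simple shuffled hold-out split; it is not the authors' implementation, and the function names, the random seed, and the example durations are hypothetical values chosen only for illustration.

```python
import math
import random


def regression_metrics(y_true, y_pred):
    """Return R^2, MSE, RMSE and MAE for paired observed/predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mse = ss_res / n                             # mean squared error
    mae = sum(abs(e) for e in errors) / n        # mean absolute error
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    # R^2 is undefined when the observed values have zero variance.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"r2": r2, "mse": mse, "rmse": math.sqrt(mse), "mae": mae}


def train_test_split(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of the records and hold out `test_fraction` for testing."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical accident durations in minutes (not taken from the paper's data).
observed = [30.0, 45.0, 60.0, 90.0]
predicted = [35.0, 40.0, 65.0, 80.0]
print(regression_metrics(observed, predicted))

# Stand-in record identifiers for a dataset of 1,905 accidents.
train, test = train_test_split(range(1905))
print(len(train), len(test))
```

With this rounding, 1,905 records split into 1,524 training and 381 testing records; the exact counts in the paper may differ depending on how the split was performed. Because R² depends on the variance of the observed values in each subset, training and testing scores are computed on different baselines and are best read alongside the error-based measures.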