An Application with Python Software for the Classification of Chemical Data
Gonca Ertürk, Oğuz Akpolat

Chemical analyses now generate and store large volumes of data. Data mining algorithms make it possible to evaluate these data, reveal the relationships among them, and use those relationships to make predictions for newly measured data. In environmental studies, monitoring treatment processes and applying the necessary controls depend on the continuous determination of wastewater and activated sludge characteristics. The main criteria for characterizing wastewater are biochemical oxygen demand (BOD5), chemical oxygen demand (COD), total organic carbon (TOC), and dissolved oxygen (DO). Of these parameters, BOD5 takes 5 days to measure, while the others can be measured within 1-2 hours at most. Because BOD5 values can be mathematically correlated with the other parameters, estimating them quickly would be a great advantage for process control. In this study, a data set was created for statistical evaluation by measuring these parameters in 334 samples taken from a treatment plant, and the interactions among the parameters were analyzed with the decision tree method. An attempt was thus made to predict the probable BOD5 value of an unknown sample by taking the weighted effects of the parameters into account. The selected algorithm was modeled in Python, and its performance in estimating BOD5 from the other parameters was examined by extracting the decision tree rules.
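
As a rough illustration of the approach summarized above, the following minimal sketch fits a small regression tree to predict BOD5 from COD, TOC, and DO and then prints the tree as if-then rules. It is not the authors' implementation. The data are synthetic, the linear relation used to generate BOD5 and its coefficients are assumptions made only so the example runs, and the tree is a simple CART-style learner (squared-error splits) written with the standard library. The helper names (`synthetic_samples`, `build_tree`, `extract_rules`) are hypothetical.

```python
import random

FEATURES = ("COD", "TOC", "DO")

def synthetic_samples(n=334, seed=0):
    """Generate made-up (features, BOD5) rows; NOT the treatment plant data."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        cod = rng.uniform(100, 800)   # mg/L, assumed range
        toc = rng.uniform(30, 300)    # mg/L, assumed range
        do = rng.uniform(0.5, 8.0)    # mg/L, assumed range
        # Assumed relation, chosen only for illustration.
        bod5 = 0.5 * cod + 0.3 * toc - 5.0 * do + rng.gauss(0, 10)
        rows.append(((cod, toc, do), bod5))
    return rows

def mean(values):
    return sum(values) / len(values)

def sse(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(rows):
    """Return (error, feature_index, threshold) minimizing the summed SSE."""
    best = None
    for j in range(len(FEATURES)):
        values = sorted({x[j] for x, _ in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for x, y in rows if x[j] <= t]
            right = [y for x, y in rows if x[j] > t]
            err = sse(left) + sse(right)
            if best is None or err < best[0]:
                best = (err, j, t)
    return best

def build_tree(rows, depth=0, max_depth=3, min_rows=10):
    """Recursively grow a regression tree stored as nested dicts."""
    split = None
    if depth < max_depth and len(rows) >= min_rows:
        split = best_split(rows)
    if split is None:
        return {"value": mean([y for _, y in rows])}
    _, j, t = split
    left = [r for r in rows if r[0][j] <= t]
    right = [r for r in rows if r[0][j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": build_tree(left, depth + 1, max_depth, min_rows),
        "right": build_tree(right, depth + 1, max_depth, min_rows),
    }

def predict(tree, x):
    """Follow the splits down to a leaf and return its mean BOD5."""
    while "value" not in tree:
        branch = "left" if x[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["value"]

def extract_rules(tree, conditions=()):
    """Flatten the tree into one if-then rule per leaf."""
    if "value" in tree:
        premise = " and ".join(conditions) or "always"
        return [f"if {premise} then BOD5 ~ {tree['value']:.1f}"]
    name, t = FEATURES[tree["feature"]], tree["threshold"]
    return (extract_rules(tree["left"], conditions + (f"{name} <= {t:.1f}",))
            + extract_rules(tree["right"], conditions + (f"{name} > {t:.1f}",)))

rows = synthetic_samples()
train, test = rows[:250], rows[250:]   # simple hold-out split
tree = build_tree(train)
rules = extract_rules(tree)

# Compare against always predicting the training mean.
mae = mean([abs(predict(tree, x) - y) for x, y in test])
baseline = mean([y for _, y in train])
baseline_mae = mean([abs(baseline - y) for _, y in test])

for rule in rules:
    print(rule)
print(f"tree MAE: {mae:.1f}  baseline MAE: {baseline_mae:.1f}")
```

In practice a library implementation such as scikit-learn's `DecisionTreeRegressor` (or `DecisionTreeClassifier`, if BOD5 is binned into classes) would likely replace the hand-written tree. The sketch only shows how rules can be read off a fitted tree and checked against held-out samples.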