Producing Landslide Susceptibility Maps Using Statistics and Machine Learning Techniques: The Rize-Taşlıdere Basin Example
Arif Çağdaş Aydınoğlu, Gehver Altürk
As a disaster type, landslides cause significant loss of life and economic losses; hence, producing landslide susceptibility maps is a priority research topic. This study aims to perform a landslide susceptibility analysis for shallow landslides using statistical and machine learning techniques and to evaluate the model performance using the Rize-Taşlıdere Basin as an example. First, the literature was reviewed. Next, detailed research was carried out on the characteristics of the study area and on the creation of the landslide inventory. Fifteen parameters (i.e., land use, lithology, elevation, slope, aspect, roughness, plan curvature, profile curvature, stream power index, topographic wetness index, sediment transport capacity, drainage density, distance to drainage, road density, and distance to road) produced with geographic information system (GIS) techniques were used as the input parameters in producing the landslide susceptibility map. Using the landslide inventory and input parameters, the parameters were analyzed and landslide susceptibility maps in five classes were produced with the frequency ratio (FR), logistic regression (LR), and artificial neural network (ANN) methods. The area under the relative operating characteristic (ROC) curve (AUC) was used to evaluate the model performance. The AUC values were 0.72 for FR, 0.83 for LR, and 0.87 for ANN. Although the ANN technique gave the highest accuracy, the LR technique achieved comparable accuracy and is usable.
Producing Landslide Susceptibility Maps Using Statistical and Machine Learning Techniques: The Taşlıdere Basin Example (Rize)
Arif Çağdaş Aydınoğlu, Gehver Altürk
Because landslides are a disaster type that has caused significant loss of life and economic losses in our country, producing landslide susceptibility maps is among the priority research topics. This study aims to carry out a landslide susceptibility analysis for shallow landslides using statistical and machine learning techniques and to evaluate the performance of the model using the Rize-Taşlıdere Basin as an example. First, the literature on the subject was reviewed, and detailed research was conducted on the general characteristics of the study area within the drainage area of the basin and on the creation of a shallow landslide inventory. Fifteen parameters produced with Geographic Information Systems (GIS) techniques were used as input parameters in producing the landslide susceptibility map. These parameters are land use, lithology, elevation, slope, aspect, roughness, plan curvature, profile curvature, roughness index, stream power index, topographic wetness index, sediment transport capacity, drainage density, distance to drainage, road density, and distance to road. Using the landslide inventory and the input parameters, appropriate parameter estimation and analyses were carried out for the landslide susceptibility map with the Frequency Ratio (FR), Logistic Regression (LR), and Artificial Neural Network (ANN) methods. The maps produced were divided into five susceptibility classes. In the performance evaluation, the AUC (area under the curve) value, that is, the area under the ROC (relative operating characteristic) curve, was obtained as 0.72 for FR, 0.83 for LR, and 0.87 for ANN. Thus, although the ANN technique gives results of higher accuracy, the LR technique is seen to be of similar accuracy and usable.
Landslides are a disaster type that causes significant loss of life and economic losses in Turkey. Hence, landslide susceptibility studies aimed at reducing landslide damage and supporting land use planning have become a very important issue for both scientists and public institutions. This study aims to perform a landslide susceptibility analysis for shallow landslides using statistical and machine learning techniques and to evaluate the model performance using the Rize-Taşlıdere Basin as an example. The basin representing the study area is considered a potential landslide area because of its geographical features.
The literature on the subject was first reviewed. Next, detailed research was carried out on the general characteristics of the study area and on the creation of a shallow landslide inventory within the drainage area of the basin. An inventory of 878 landslides, recorded with point geometry, was compiled for the study area. Fifteen parameters produced with geographic information system (GIS) techniques (i.e., land use, lithology, elevation, slope, aspect, roughness, plan curvature, profile curvature, roughness index, stream power index, topographic wetness index, sediment transport capacity, drainage density, distance to drainage, road density, and distance to road) were used as the input parameters in producing the landslide susceptibility map.
Using the landslide inventory and input parameters, appropriate parameter estimation and analysis were performed for the landslide susceptibility map with the frequency ratio (FR), logistic regression (LR), and artificial neural network (ANN) methods. FR is widely used because it is easy to apply. It is defined as the ratio of the probability of a disaster event to the probability of it not happening. The effect of each parameter causing landslide events is evaluated independently. According to the FR susceptibility map, 0.20% of the basin has very low, 11.47% low, 46.32% medium, 40.86% high, and 1.15% very high landslide susceptibility.
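The FR calculation can be illustrated with a minimal sketch. It assumes the formulation commonly used in susceptibility mapping, in which the share of landslides falling in a parameter class is divided by the share of the study area covered by that class; the exact formula used in this study is not stated in the abstract. The function name, class labels, counts, and areas below are hypothetical.

```python
def frequency_ratio(landslide_counts, class_areas):
    """Return the frequency ratio of each class of one input parameter.

    Assumed formulation: (landslides in class / all landslides)
                         / (class area / total area).
    """
    total_landslides = sum(landslide_counts.values())
    total_area = sum(class_areas.values())
    ratios = {}
    for cls, area in class_areas.items():
        landslide_share = landslide_counts.get(cls, 0) / total_landslides
        area_share = area / total_area
        ratios[cls] = landslide_share / area_share
    return ratios


# Hypothetical slope classes (degrees): landslide counts and areas (km^2).
counts = {"0-10": 10, "10-20": 30, "20-30": 60}
areas = {"0-10": 50.0, "10-20": 30.0, "20-30": 20.0}

fr = frequency_ratio(counts, areas)
for cls, value in fr.items():
    print(cls, round(value, 2))
```

Under this formulation, a value above 1 means that landslides are over-represented in the class relative to its area, and a value below 1 means they are under-represented. Because each parameter is evaluated independently, a susceptibility score for a map cell is typically obtained by summing the FR values of the classes the cell falls into.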
LR expresses the probability of the dependent variable, which represents the landslide inventory, as a function of the independent variables, which represent the input parameters. It determines the cause-effect relationship between the dependent and independent variables when the dependent variable is binary (yes-no), here the presence or absence of a landslide. Each class of geology and land use was added to the analysis as a separate independent variable; thus, the number of independent variables was 26. According to the LR susceptibility map, 4.55% of the basin has very low, 18.65% low, 17.45% medium, 43.37% high, and 15.99% very high landslide susceptibility.
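The way categorical classes enter the LR model as separate independent variables can be sketched as follows. This is an illustration only: the lithology class names, the coefficients, and the intercept are hypothetical and are not the values estimated in the study.

```python
import math


def one_hot(value, categories):
    """Encode a categorical class as one binary variable per class."""
    return [1.0 if value == category else 0.0 for category in categories]


def logistic_probability(features, coefficients, intercept):
    """Landslide probability from the logistic function 1 / (1 + exp(-z))."""
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical lithology classes; each becomes its own independent variable.
LITHOLOGY_CLASSES = ["basalt", "granite", "alluvium"]

slope_deg = 25.0                                  # one continuous parameter
features = [slope_deg] + one_hot("granite", LITHOLOGY_CLASSES)

# Hypothetical coefficients: slope, then one per lithology class.
coefficients = [0.08, -0.5, 0.7, 0.1]
intercept = -2.0

p = logistic_probability(features, coefficients, intercept)
print(features)
print(round(p, 3))
```

Expanding the categorical layers in this way is what raises the number of independent variables above the number of input parameters.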
The ANN is a computer model of the biological nervous system. Weights indicate the importance of the information arriving at an artificial nerve cell and its effect on the cell. Each input is multiplied by the weight of the connection it arrives through, which adjusts the effect of that input on the output to be produced. Weight values can be zero, negative, or positive. Positive and negative values indicate that the information has a positive or a negative effect, and inputs with zero weight have no effect. In the ANN applied here, there is one hidden layer with 18 neurons, giving a 26-18-1 network structure (26 inputs, 18 hidden neurons, and 1 output). The momentum factor is 0.5, the root mean square (RMS) error is 0.001, the number of iterations is 10,000, and the sigmoid function is used as the activation function. The landslide susceptibility map is then obtained. According to the ANN susceptibility map, 6.52% of the basin has very low, 7.20% low, 18.9% medium, 36.02% high, and 31.35% very high landslide susceptibility.
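A forward pass through a 26-18-1 network with sigmoid activation, together with a momentum-based weight update, can be sketched as below. The layer sizes and the momentum factor of 0.5 come from the text; the random initial weights, the learning rate, and all function names are assumptions made for illustration, and the sketch does not reproduce the trained network of the study.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(n_inputs, n_neurons, rng):
    """One weight per input plus a trailing bias term for every neuron."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
            for _ in range(n_neurons)]


def forward_layer(inputs, layer):
    """Weighted sum of the inputs plus bias, passed through the sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(neuron[:-1], inputs)) + neuron[-1])
            for neuron in layer]


def momentum_step(weight, gradient, previous_delta,
                  learning_rate=0.1, momentum=0.5):
    """Gradient step plus a fraction of the previous weight change.

    learning_rate is a hypothetical value; momentum=0.5 follows the text.
    """
    delta = -learning_rate * gradient + momentum * previous_delta
    return weight + delta, delta


rng = random.Random(0)                    # fixed seed for repeatability
hidden = init_layer(26, 18, rng)          # 26 inputs -> 18 hidden neurons
output_layer = init_layer(18, 1, rng)     # 18 hidden neurons -> 1 output

cell_inputs = [0.5] * 26                  # hypothetical normalized inputs
susceptibility = forward_layer(forward_layer(cell_inputs, hidden),
                               output_layer)[0]
print(0.0 < susceptibility < 1.0)

new_weight, delta = momentum_step(1.0, gradient=1.0, previous_delta=-0.2)
```

The single sigmoid output lies between 0 and 1, so it can be read as a susceptibility score for the map cell.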
The area under the relative operating characteristic (ROC) curve (AUC) was used to evaluate the performance. The AUC values obtained were 0.72 for FR, 0.83 for LR, and 0.87 for ANN. Although the ANN gives a more accurate result than the other techniques, it is a complex analysis method. The network design must be studied in detail because overtraining (overfitting) produces overly optimistic results. The LR technique gave results of comparable accuracy and is easier to apply than the ANN. The weight coefficients of the parameters and their effects on the analysis are presented individually.
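The AUC can be read as the probability that a randomly chosen landslide location receives a higher susceptibility score than a randomly chosen non-landslide location. The minimal sketch below computes it by direct pairwise comparison; the labels and scores are hypothetical, and the study may have used a different procedure to derive its AUC values.

```python
def roc_auc(labels, scores):
    """AUC as the share of (landslide, non-landslide) pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both landslide and non-landslide samples are needed")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical validation points: 1 = landslide, 0 = no landslide.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]

auc = roc_auc(labels, scores)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading the reported values of 0.72, 0.83, and 0.87.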