Using the ARIMA and ARIMAX Methodologies to Estimate, Model, Forecast, and Compare Airline Passenger Transportation Demand in Türkiye
Vahap Önen
Determining air transportation demand provides a very important input that affects countries’ economies at both the micro and macro levels. Thus, almost all countries continuously conduct research on estimating and forecasting airline passenger demand, and estimating and forecasting methods have continued to improve over time. The purpose of this study is to estimate and forecast airline passenger demand using the autoregressive integrated moving average (ARIMA) and ARIMA with explanatory variables (ARIMAX) methods in light of updated time series data and to determine which of the two econometric methods gives the better model. The study uses quarterly data for 2010-2022: the total numbers of incoming and outgoing airline passengers and of incoming and outgoing aircraft are obtained from Türkiye’s General Directorate of State Airports Authority (DHMI), GDP values from the Turkish Statistical Institute (TurkSTAT), and crude oil import prices from the US Energy Information Administration (EIA). The collected data have been analyzed using the software program EViews 10.00. The study presents estimations and predictions for the total number of airline passengers by first using ARIMA modeling based on the Box-Jenkins method for univariate time series analysis and then ARIMAX modeling, which adds exogenous variables to the first model. The study first tested for effects from COVID-19 and found no significant structural break in the studied period. The seasonal ARIMA (1,1,0) x (1,1,2) model (SARIMA) was then found to have the best fit. The second part of the study added the exogenous variables to the model and found ARIMAX (1,1) x (0,0) to have the best fit. While the number of aircraft movements and the GDP variable are found to be significant and to support the model, the imported crude oil price was found to be not significant and to not support the model.
The forecasting analysis found the Theil index, the bias, variance, and covariance proportions, the RMSE, MAE, and MAPE, and the R-squared values to be at satisfactory levels for both models. When the two models are compared, the ARIMAX model is seen to perform better than the ARIMA model.
Demand Estimation Modeling, Forecasting, and Comparison for Airline Passengers in Türkiye Using the ARIMA-ARIMAX Method
Vahap Önen
Determining airline passenger demand is an important input that affects countries’ economies in both macroeconomic and microeconomic terms. For this reason, almost all countries continuously conduct research on estimating and forecasting airline passenger demand. In addition, many new methods for estimation and forecasting models have been developed over time. The aim of this study is to estimate and forecast the number of airline passengers in Türkiye using the ARIMA and ARIMAX methods in light of up-to-date data and to determine which of the econometric models used is better. The data used in the study are quarterly data for 2010-2022: the total numbers of incoming and outgoing passengers and of incoming and outgoing commercial aircraft movements come from the General Directorate of State Airports Authority (DHMİ), GDP from TUIK, and the crude oil purchase price from the US Energy Information Administration. The collected data were analyzed with EViews 10.00. The study produced estimates and forecasts of the total number of passengers first through ARIMA modeling based on the Box-Jenkins method for univariate time series analysis and then through an ARIMAX model that adds exogenous variables. In the first stage, the effect of COVID-19 was examined and no significant structural break was found for the period; the subsequent model estimation identified the SARIMA (1,1,0)(1,1,2) model, which also accounts for the seasonal effect, as the most suitable model. In the second stage, ARIMAX modeling with the added exogenous variables identified (1,1)(0,0) as the most suitable model. Among the exogenous variables, the number of commercial aircraft entering and leaving the country and GDP were found to support the model significantly, whereas the crude oil purchase price was found to have no significant effect on the number of passengers.
The forecasting analyses for both models showed the Theil coefficients, bias proportions, variance proportions, root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and the R-squared values indicating explanatory power to be at quite good levels. The model comparisons led to the conclusion that the ARIMAX model has the better forecasting power.
The liberalization of aviation and the deregulation of air transport rules have increased the demand for air transportation in recent years and have had a great impact on the airlines that provide these services, triggering congestion at airports and for airlines. The quickest effect of this liberalization has been seen in the organizations that provide air travel services: it has changed the cost structure of the sector and led to various alliances and airline mergers, as well as to new companies entering the market (Janic, 2000).
The problem of measuring airline demand is complex. Total demand and market share for a particular airline are determined by many variables. These include the firm’s own decisions, such as price adjustments, advertising, number of flights, timing of flights, and equipment, as well as competitors’ levels of the same variables and environmental factors such as population density, business activity, environmental conditions, and disposable personal income. All these decision, competitive, and environmental variables interact in the market to generate sales. From a marketing perspective, the problem of airline management involves anticipating the possible outcomes of these interactions under different marketing strategies. This allows a firm to choose the best strategy in light of changing competitive and environmental conditions (Schultz, 1972).
The main purposes of this study are to examine how demand has changed over the studied time period, to determine whether COVID-19 has influenced the structure of demand, and to use 52 quarters of time series data to build demand estimation models using not only the autoregressive integrated moving average (ARIMA) method but also the ARIMA with explanatory variables (ARIMAX) method, which adds exogenous variables. This has been done to reveal which of the two models is better and can therefore forecast near-future demand more precisely.
In general, airline forecasting techniques are divided into three categories: qualitative, quantitative, and decision analysis, and sometimes a combination of the first two can be used (Taneja, 1978). Time series analysis includes ratio analysis, trend projection, moving averages, spectral analysis, adaptive filtering, and Box-Jenkins methods. The details of all these methods are found in Montgomery and Johnson (1976), Box and Jenkins (1976), Brown (1963), Jenkins and Watts (1968), Anderson (2011), and Granger (1990). Econometric and time series methods are the ones most used to explain travel demand, and no differentiation is made between them. As seen in Table 1, Andreoni and Postorino (2006), Fernandes and Pacheco (2023), Kulendran and King (1997), Lim and McAleer (2002), and Lim et al. (2008) used univariate time series (i.e., ARIMA and/or ARIMAX) models in their studies. The ARIMA model has also been used to determine air passenger transportation demand in Türkiye (Tortum et al., 2014b, p. 53); that study found the model that best explains the monthly passenger series to be the seasonal ARIMA (6,1,1)x(12,0,12) model (SARIMA).
The study uses quarterly data for 2010-2022, in which the total number of incoming and outgoing airline passengers and the total number of incoming and outgoing aircraft movements have been obtained from Türkiye’s General Directorate of State Airports Authority (DHMI), GDP from the Turkish Statistical Institute (TurkStat), and crude oil import prices from the US Energy Information Administration (EIA). More than 30 time series observations are considered sufficient for a forecasting model estimated on a quarterly basis (Gujarati, 2014). The ARIMA and ARIMAX models were built using the Box-Jenkins time series method to estimate the total number of domestic and international passengers transported by air at quarterly intervals. The EViews package program was used for model estimation and forecasting.
Figure 3 provides graphical representations of the related series. As seen in the figure, passenger demand increased until 2020, decreased in 2020 and 2021 due to the pandemic, and started to rise again as of 2022. In addition, demand was observed to fluctuate with seasonal characteristics. The number of aircraft movements, which follows a similar course, is also included in the figure. The purchase price of crude oil is seen to fluctuate, and GDP is seen to have increased gradually since 2010, with this momentum increasing even more since 2018.
The ADF unit root test was applied to DPAS, the first difference of the passenger number series PAS, in the specification with no constant and no trend and with 3 lags. The absolute value of the t statistic (2.370) is greater than the absolute critical values at the 5% (1.947) and 10% (1.612) levels (p < 0.05); thus hypothesis H0 was rejected, and the series was determined to have no unit root. In other words, the DPAS series is stationary, I(0), which means the PAS series becomes stationary after first differencing and is I(1). Figure 8 presents a graph of the DPAS series.
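The differencing step and the decision rule described above can be sketched in a few lines of Python. This is only an illustration: the helper names `difference` and `rejects_unit_root` are hypothetical, the `pas` values are toy numbers rather than the DHMI data, and the test statistic itself is taken as reported in the text rather than recomputed.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t where both terms exist."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def rejects_unit_root(t_abs, critical_abs):
    """Decision rule on absolute values: reject H0 (unit root) when |t| > |critical|."""
    return t_abs > critical_abs


# Toy quarterly levels (hypothetical numbers, NOT the data used in the study).
pas = [10.0, 14.0, 20.0, 12.0, 11.0, 16.0, 23.0, 14.0]
dpas = difference(pas)  # first difference, analogous to the DPAS series

# Decision using the statistic and critical values reported in the text.
reject_at_5_percent = rejects_unit_root(2.370, 1.947)
reject_at_10_percent = rejects_unit_root(2.370, 1.612)
print(dpas, reject_at_5_percent, reject_at_10_percent)
```

Because 2.370 exceeds both critical values, the rule rejects the unit root hypothesis at both levels, which is the conclusion stated for the DPAS series.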
Structural break unit root tests were conducted to determine whether a structural break occurs in the DPAS series. Table 15 presents the 12- and 4-lag analyses of the DPAS series from the structural break ADF and Perron unit root tests. As a result, hypothesis H0 was rejected: the series has no unit root, and no significant structural break was found in the 2020Q1 period.
When examining the correlogram of the DPAS series, autocorrelation was determined to be present in the series because both the autocorrelation and partial autocorrelation functions have multiple significant values (p < 0.05). In addition, the graphs of the number of passengers reveal a seasonality effect, with the number of passengers being lower in the winter months and higher in the summer period. Therefore, many trials are required to select the most suitable model. Among the models that meet the need, the one with the smallest AR, MA, SAR, and SMA orders is preferred. As a result, the most suitable model is selected by looking at the following criteria in terms of comparison (Asteriou & Hall, 2011, p. 278).
A total of 1,521 model trials were conducted using the EViews Auto ARMA option. As a result of the trials, the most appropriate model was determined to be ARMA (1,1)x(1,2), that is, SARIMA (1,1,1)x(1,1,2) in ARIMA (p,d,q) x (P,D,Q) notation, with the smallest AIC value, 38.61861. The static estimation result for passenger demand was obtained within the 95% confidence interval for the ARIMA (1,1,1)x(1,1,2) model, as given in Figure 14. In addition, the mean absolute error (MAE), mean absolute percentage error (MAPE), Theil index, and related bias, variance, and covariance ratios obtained for the estimation reliability have been included.
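The idea behind such an automatic search is to enumerate candidate orders and keep the one with the smallest AIC. The sketch below illustrates this under loudly stated assumptions: the source does not give the search bounds, so the maxima used here (AR and MA up to 12, SAR and SMA up to 2) are only one grid that happens to produce 1,521 candidates; the function names are hypothetical; the per-observation AIC scaling is assumed; and the AIC values other than the reported 38.61861 are placeholders, not results from the study.

```python
from itertools import product

# ASSUMED search bounds (not stated in the source): 13 * 13 * 3 * 3 = 1521.
MAX_AR, MAX_MA, MAX_SAR, MAX_SMA = 12, 12, 2, 2


def candidate_orders(max_ar, max_ma, max_sar, max_sma):
    """Enumerate every (AR, MA, SAR, SMA) order combination up to the given maxima."""
    return list(product(range(max_ar + 1), range(max_ma + 1),
                        range(max_sar + 1), range(max_sma + 1)))


def aic_per_observation(log_likelihood, n_params, n_obs):
    """AIC scaled by sample size: (-2 * logL + 2 * k) / T (assumed scaling)."""
    return (-2.0 * log_likelihood + 2.0 * n_params) / n_obs


def select_by_aic(aic_by_order):
    """Return the order whose AIC is smallest."""
    return min(aic_by_order, key=aic_by_order.get)


grid = candidate_orders(MAX_AR, MAX_MA, MAX_SAR, MAX_SMA)
print(len(grid))  # number of candidate models in this assumed grid

# Only the first value is reported in the text; the others are PLACEHOLDERS.
aic_by_order = {
    (1, 1, 1, 2): 38.61861,  # reported minimum AIC
    (1, 0, 1, 2): 38.9,      # hypothetical
    (2, 1, 1, 1): 39.2,      # hypothetical
}
best = select_by_aic(aic_by_order)
print(best)
```

In practice each candidate would be estimated on the DPAS series and scored; the selection step itself stays the same.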
The performed tests revealed that the aircraft movements (ACN) series, the LGDP series, and the crude oil purchase price (AIOP) series are not stationary in levels and contain a unit root. The tests performed after taking the first difference of each series showed the series to become stationary regardless of whether a constant or trend was included. The related ADF unit root test results are given in Tables 16, 17, and 18. As a result of the model trials, the differenced crude oil purchase price (DAIOP) series was excluded from the equation as an exogenous variable due to its lack of significance.
As a result of the trials, the ARMA (1,1)x(0,0) specification, that is, ARIMA (1,1,1), was determined as the most suitable model, with the smallest AIC value, 36.95815. The obtained model is AR(1) MA(1) DACN DLGDP(-1) for the DPAS series. When the model is estimated, the results in Table 19 are obtained. When considering the correlogram test results of the related series, Figure 16 shows no autocorrelation to be present.
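The specification AR(1) MA(1) DACN DLGDP(-1) for DPAS can be written out as a regression with ARMA(1,1) errors. The form below is a sketch under two assumptions: that no constant is included (none is listed in the specification) and that the ARMA terms apply to the regression error. The symbols \beta_1, \beta_2, \phi, \theta, u_t, and \varepsilon_t are notation introduced here, not taken from the source.

```latex
\begin{aligned}
\mathrm{DPAS}_t &= \beta_1\,\mathrm{DACN}_t + \beta_2\,\mathrm{DLGDP}_{t-1} + u_t,\\
u_t &= \phi\,u_{t-1} + \varepsilon_t + \theta\,\varepsilon_{t-1},\\
\mathrm{DPAS}_t &= \mathrm{PAS}_t - \mathrm{PAS}_{t-1}.
\end{aligned}
```

Here DACN and DLGDP denote the first differences of the aircraft movements and log GDP series, and the lag on DLGDP means that GDP growth in the previous quarter enters the equation for the current quarter.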
Figure 17 shows the static estimation result of the passenger demand obtained within a 95% confidence interval for the ARIMAX model. In addition, the figure includes the mean absolute error (MAE), mean absolute percentage error (MAPE), Theil index, and related bias, variance, and covariance ratios obtained for estimation reliability.
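For readers unfamiliar with these evaluation measures, the following minimal sketch computes them from a pair of actual and forecast series. It uses the standard textbook definitions, with population (divide-by-n) standard deviations so that the bias, variance, and covariance proportions sum to one. The function name `forecast_metrics` and the sample numbers are hypothetical and are not the values behind Figure 17.

```python
import math


def forecast_metrics(actual, forecast):
    """Return common forecast-accuracy measures for two equal-length sequences."""
    n = len(actual)
    errors = [f - a for a, f in zip(actual, forecast)]
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(e) for e in errors) / n
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n

    # Theil inequality coefficient: 0 means a perfect fit, 1 the worst case.
    theil = rmse / (math.sqrt(sum(f * f for f in forecast) / n)
                    + math.sqrt(sum(a * a for a in actual) / n))

    # Decomposition of the mean squared error into three proportions.
    mean_a = sum(actual) / n
    mean_f = sum(forecast) / n
    sd_a = math.sqrt(sum((a - mean_a) ** 2 for a in actual) / n)
    sd_f = math.sqrt(sum((f - mean_f) ** 2 for f in forecast) / n)
    cov = sum((a - mean_a) * (f - mean_f) for a, f in zip(actual, forecast)) / n
    corr = cov / (sd_a * sd_f)

    return {
        "rmse": rmse, "mae": mae, "mape": mape, "theil": theil,
        "bias": (mean_f - mean_a) ** 2 / mse,
        "variance": (sd_f - sd_a) ** 2 / mse,
        "covariance": 2.0 * (1.0 - corr) * sd_a * sd_f / mse,
    }


# Hypothetical sample values for illustration only.
actual = [100.0, 120.0, 140.0, 160.0]
forecast = [110.0, 115.0, 150.0, 155.0]
metrics = forecast_metrics(actual, forecast)
print(metrics)
```

A small bias proportion and a small variance proportion, with most of the error falling in the covariance proportion, indicate that the forecast errors are mainly unsystematic.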
When examining the criteria in Table 20, the Theil inequality index, bias ratio, variance ratio, and covariance ratio, as well as the RMSE, MAE, and MAPE values, are seen to be smaller and the explanation ratio (R-squared) to be higher for the ARIMAX model. The forecasting power of the ARIMAX model, DPAS AR(1) MA(1) DACN DLGDP(-1), can therefore be said to be better than that of the SARIMA model. In light of this information, the ARIMAX model should be preferred when estimating the number of airline passengers.