On Predicting Growth Factor Data of COVID-19 Epidemic Using Hybrid ARIMA-ANN Model

The Autoregressive Integrated Moving Average (ARIMA) model cannot capture the nonlinear patterns exhibited by the daily growth factor of coronavirus disease 2019 (COVID-19). Artificial Neural Networks (ANNs) and hybrid ARIMA-ANN models, on the other hand, have been successfully applied to nonlinear estimation problems. We compare the forecasting performance of these models using real, worldwide, daily COVID-19 data, and select the best forecasting model using the mean absolute error as the assessment criterion. The results show that the ANN model is more efficient than the ARIMA and hybrid ARIMA-ANN models. The ANN analysis further indicates that the magnitude of the increase in the growth factor over time is generally rising, while the percentage change in the growth factor is declining. This may be the result of the social distancing, safety, and cautionary measures mandated by governments worldwide.

In this paper, we study the growth factor of COVID-19 using the autoregressive integrated moving average (ARIMA), artificial neural network (ANN), and hybrid ARIMA-ANN models to forecast the spread of COVID-19 around the world for the next 17 days using currently available data. We present forecasts of the growth factor to indicate what can be expected in the coming days and to determine the best forecasting method.
Evidence from previous studies suggests that hybrid models can perform better than purely linear or nonlinear models. For example, Wang et al. (2013) employed the linear ARIMA model and the nonlinear ANN model jointly, with the aim of capturing the different patterns in time series data. They showed that the hybrid (multiplicative) ARIMA-ANN model produced more accurate forecasts than either the ARIMA or the ANN model alone. Benvenuto et al. (2020) indicated the effectiveness of the ARIMA model in predicting the epidemiological trend of the prevalence and incidence of COVID-19. They reported that ARIMA(1,0,4) was the best ARIMA model for predicting the spread of COVID-19, while ARIMA(1,0,3) was the best ARIMA model for determining its incidence. Ceylan (2020) showed that ARIMA models are suitable for predicting the prevalence of COVID-19 in Italy, Spain, and France; the study formulated ARIMA models with different parameters and selected the best models based on the lowest MAPE values. Perone (2020) showed that ARIMA models are trustworthy enough for forecasting COVID-19 incidence in Italy, Russia, and the USA once new daily cases begin to stabilize. Saba and Elsheikh (2020) showed that nonlinear autoregressive artificial neural networks (NARANN) perform better than ARIMA on several statistical criteria, such as the mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), coefficient of determination, deviation ratio (RD), and coefficient of residual mass (CRM). In this paper, ARIMA, ANN, and hybrid ARIMA-ANN models are used to forecast the growth factor of COVID-19 time series data.
The remainder of this article is organized as follows. Section 2 describes the dataset. Section 3 presents the forecasting models. Section 4 discusses the empirical study on the training and test datasets using three different time series models, and the final section concludes by summarizing the key results.
We consider the growth factor from January 24, 2020, to March 20, 2021. The growth factor is the factor by which a quantity multiplies itself over time. It is computed as

$$\text{Growth Factor}_t = \frac{\text{New cases on day } t}{\text{New cases on day } t-1} \qquad (1)$$

A growth factor greater than 1 indicates exponential growth, whereas a growth factor that remains below 1 signals a decrease in new cases.
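A minimal Python sketch of Eq. (1), assuming the input is a list of daily new-case counts in date order. The function name `growth_factor` and the choice to return `None` when the previous day has zero new cases are illustrative assumptions, not part of the original study.

```python
def growth_factor(new_cases):
    """Daily growth factor as in Eq. (1): new cases on day t divided by
    new cases on day t - 1. The first day has no predecessor, so the
    result is one element shorter than the input.

    Assumption: a zero previous-day count makes the ratio undefined,
    and such days are returned as None.
    """
    factors = []
    for prev, curr in zip(new_cases[:-1], new_cases[1:]):
        factors.append(curr / prev if prev != 0 else None)
    return factors


# Hypothetical daily new-case counts, not the study's data.
cases = [100, 120, 150, 150, 135]
print(growth_factor(cases))  # [1.2, 1.25, 1.0, 0.9]
```

Values above 1 (here 1.2 and 1.25) mark days on which new cases grew, 1.0 marks no change, and 0.9 marks a decline.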
The growth factor ranged between 0.2840 and 6.9171, with a mean of 1.0395, a median of 1.0190, and a standard deviation of 0.3276 (interquartile range 0.1725). These summaries indicate that the growth factor is highly variable. Figure 1 shows the trend of the growth factor of COVID-19: the growth factor is not linear over time and shows large fluctuations, so ARIMA models should be used with caution, as they may not provide accurate forecasts in this case. The accuracy of forecasts can be determined by considering how well a model performs on new data that were not used when fitting the model. Because accuracy is central to forecasting, researchers tend to add more and more variables to their models; Safi and White (2017) considered whether a complex model actually does a better job than a simple one.
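The summary statistics above can be computed with Python's standard `statistics` module, as in the sketch below. This is an illustration only: the function name `summarize` is hypothetical, and because statistical packages use different quartile rules, the interquartile range may differ slightly from the value a given package reports.

```python
import statistics


def summarize(series):
    """Descriptive statistics of the kind reported for the growth factor."""
    # Quartiles by linear interpolation between order statistics
    # (method="inclusive"); other packages may use other quartile rules.
    q1, _, q3 = statistics.quantiles(series, n=4, method="inclusive")
    return {
        "min": min(series),
        "max": max(series),
        "mean": statistics.mean(series),
        "median": statistics.median(series),
        "sd": statistics.stdev(series),  # sample SD, n - 1 denominator
        "iqr": q3 - q1,
    }


# Hypothetical growth-factor values, not the study's data.
print(summarize([0.95, 1.02, 1.10, 0.98, 1.05, 1.20, 0.90]))
```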
Several measures of forecasting accuracy have been developed and discussed; their fundamental use is to compare the accuracy of forecasting methods on univariate time series data (Cryer and Chan, 2008; Hyndman and Athanasopoulos, 2018; Wei, 2006). The selected forecasting models will be compared using three forecasting accuracy criteria: the MAE, RMSE, and MAPE. The RMSE is more sensitive to outliers than the MAE, so the MAE is preferable when outliers are present. The MAE or RMSE is recommended when comparing forecasting methods on a single data set, that is, when all forecasts are measured on the same scale. The MAPE is used when comparing the accuracy of the same or different methods on time series with different scales, unless the data contain zeros or small values (Hyndman and Koehler, 2006). For all three measures, the smaller the value, the better the model's forecasting ability (McKenzie, 2011).
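The three accuracy measures can be written directly from their standard definitions. The sketch below assumes `actual` and `forecast` are equal-length lists of numbers; the function names are illustrative. The example contrasts two forecast error patterns with the same total absolute error to show why the RMSE reacts more strongly to a single large error.

```python
import math


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean square error; squaring weights large errors more heavily."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mape(actual, forecast):
    """Mean absolute percentage error, in percent; undefined if any actual value is 0."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when the actual series contains zeros")
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Same total absolute error, spread evenly versus concentrated in one outlier.
actual = [0.0, 0.0, 0.0, 0.0]
even = [1.0, 1.0, 1.0, 1.0]
outlier = [0.0, 0.0, 0.0, 4.0]
print(mae(actual, even), rmse(actual, even))        # 1.0 1.0
print(mae(actual, outlier), rmse(actual, outlier))  # 1.0 2.0
```

Both error patterns have an MAE of 1.0, but the RMSE doubles when the error is concentrated in one observation. Note that `mape` would reject this `actual` series, which contains zeros.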
The efficiency of the proposed forecasting method relative to that of the benchmark method in terms of the RMSE is defined by

$$\Omega = \frac{\text{RMSE}_p}{\text{RMSE}_b},$$

where $\text{RMSE}_p$ and $\text{RMSE}_b$ represent the RMSE of the proposed and the benchmark methods, respectively. Usually, the benchmark method is the most naïve method (Hyndman and Koehler, 2006).
A ratio less than one indicates that the proposed method is more efficient than the benchmark method, and a ratio close to one indicates that the proposed method is nearly as efficient as the benchmark method. A ratio greater than one indicates that the proposed method performs poorly relative to the benchmark (White and Safi, 2016; Safi, 2013).
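The relative efficiency is a single division, sketched below together with a verbal reading of the result. The tolerance `tol` used to decide when a ratio is "close to one" is a hypothetical choice, since the text does not specify a threshold. The example plugs in the RMSE values reported in the empirical study (ANN 0.0255, ARIMA 0.3169, hybrid 0.0744), treating the ANN as the proposed method.

```python
def relative_efficiency(rmse_proposed, rmse_benchmark):
    """Omega = RMSE_p / RMSE_b; values below 1 favour the proposed method."""
    return rmse_proposed / rmse_benchmark


def interpret(omega, tol=0.05):
    """Verbal reading of Omega. The tolerance for "close to one" is a
    hypothetical choice; the text does not specify a threshold."""
    if abs(omega - 1) <= tol:
        return "nearly as efficient as the benchmark"
    if omega < 1:
        return "more efficient than the benchmark"
    return "less efficient than the benchmark"


# RMSEs reported in the empirical study: ANN 0.0255, ARIMA 0.3169, hybrid 0.0744.
for name, rmse_b in [("ARIMA", 0.3169), ("hybrid", 0.0744)]:
    omega = relative_efficiency(0.0255, rmse_b)
    print(name, round(omega, 4), interpret(omega))
```

Rounded to four decimals, this reproduces the ratios of 0.0805 and 0.3427 reported for the ANN model against the ARIMA and hybrid models.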

FORECASTING MODELS
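The ARIMA Model
The general ARIMA(p, d, q) model is given by Box et al. (2015):

$$\phi_p(B)(1 - B)^d Y_t = \theta_0 + \theta_q(B)\,\varepsilon_t,$$

where d is the degree of differencing, (1 - B) is the differencing operator, and the lag operator B is defined by $B Y_t = Y_{t-1}$, the operator that provides the previous value of the series. The autoregressive and moving average polynomials are $\phi_p(B) = 1 - \phi_1 B - \cdots - \phi_p B^p$ and $\theta_q(B) = 1 - \theta_1 B - \cdots - \theta_q B^q$, and $\varepsilon_t$ is a white noise error term. The best ARIMA model is chosen according to its Akaike information criterion (AIC), corrected AIC (AICc), or Bayesian information criterion (BIC) value.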
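To make the operator notation concrete, the sketch below applies $(1 - B)^d$ and computes a one-step-ahead forecast for an ARIMA(p, 1, q) model written with the sign convention above. It assumes the coefficients and past residuals are already known; the coefficient values are hypothetical, and model estimation and AIC-based order selection are not shown. Statistical software often writes the MA polynomial with plus signs and the drift as a regression on time, so coefficients from a package are not necessarily interchangeable with this form. With d = 1, the constant $\theta_0$ produces a linear trend (drift) in $Y_t$.

```python
def difference(y, d=1):
    """Apply the differencing operator (1 - B)^d to the series y."""
    for _ in range(d):
        y = [y[t] - y[t - 1] for t in range(1, len(y))]
    return y


def arima_d1_one_step(y, phi, theta, theta0, resid):
    """One-step-ahead forecast of an ARIMA(p, 1, q) model written as
    phi_p(B)(1 - B) Y_t = theta0 + theta_q(B) eps_t, with
    phi_p(B) = 1 - phi_1 B - ... and theta_q(B) = 1 - theta_1 B - ...

    resid holds past one-step errors eps_t (most recent last); the
    unknown future error is replaced by its expectation, zero.
    """
    w = difference(y, 1)  # w_t = (1 - B) Y_t
    w_next = theta0
    w_next += sum(phi[i] * w[-1 - i] for i in range(len(phi)))
    w_next -= sum(theta[j] * resid[-1 - j] for j in range(len(theta)))
    return y[-1] + w_next  # undo the differencing


# Hypothetical coefficients for illustration only (not the fitted model).
y = [1.0, 3.0, 6.0, 10.0]
print(difference(y))  # [2.0, 3.0, 4.0]
print(arima_d1_one_step(y, phi=[0.5], theta=[], theta0=0.0, resid=[]))  # 12.0
```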

The ANN Model
The nnetar function from the R forecast package is used to fit neural networks. It fits a feed-forward neural network with a single hidden layer and lagged inputs for forecasting univariate time series.
The nnetar function fits a neural network autoregressive NNAR(p, P, k) model, where p is the number of lagged inputs, P is the number of seasonal lags, and k is the number of nodes in the hidden layer. For a non-seasonal time series, the default p is the optimal number of lags, according to the AIC, for a linear autoregressive AR(p) model. For a seasonal time series, the default values are P = 1 and p chosen from the optimal linear model fitted to the seasonally adjusted data, with $k = (p + P + 1)/2$ (rounded to the nearest integer). By default, 25 networks with random starting values are trained and their predictions are averaged (Hyndman, 2004).
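The sketch below illustrates the structure of a non-seasonal NNAR(p, k) forecast: the p most recent values feed a single hidden layer of k logistic units, a linear output node produces the forecast, and the outputs of several networks with random starting weights are averaged. It is a structural illustration only, under stated assumptions: the weights are left random rather than trained, and the input scaling and training that nnetar performs are omitted, so the numbers it produces are not meaningful forecasts. All function names are hypothetical.

```python
import math
import random


def n_weights(p, k):
    """Weights, including biases, of a p-k-1 network with a linear output."""
    return (p + 1) * k + (k + 1)


def init_net(p, k, rng):
    """Random starting weights. A fitted NNAR would train these; here they
    stay random, so the sketch shows structure, not a usable forecast."""
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(p + 1)] for _ in range(k)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(k + 1)]
    return w_hidden, w_out


def net_output(net, lags):
    """Forward pass: logistic hidden layer, linear output node."""
    w_hidden, w_out = net
    hidden = []
    for w in w_hidden:
        z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], lags))
        hidden.append(1.0 / (1.0 + math.exp(-z)))
    return w_out[0] + sum(wo * h for wo, h in zip(w_out[1:], hidden))


def nnar_one_step(y, p, k, n_nets=25, seed=0):
    """Average the one-step outputs of n_nets networks fed y_{t-1}, ..., y_{t-p}."""
    rng = random.Random(seed)
    lags = list(reversed(y[-p:]))  # most recent observation first
    nets = [init_net(p, k, rng) for _ in range(n_nets)]
    return sum(net_output(net, lags) for net in nets) / n_nets


print(n_weights(22, 25))  # 601
```

The weight count (p + 1)k + (k + 1) gives 601 for a 22-25-1 network, which matches the network size reported in the empirical study.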

The Hybrid Model
The hybrid model fits multiple individual model specifications to allow the easy creation of ensemble forecasts. Here, the hybrid model combines three models: ARIMA, exponential smoothing, and ANN.
Consider a time series composed of autocorrelated linear and nonlinear components:

$$Y_t = L_t + N_t.$$

Fitting $\hat{L}_t$ with the ARIMA model and denoting the residual by $e_t$ yields

$$e_t = Y_t - \hat{L}_t.$$

The residuals contain the nonlinear relationships with previous errors, which can be modeled from the past residuals as

$$e_t = f(e_{t-1}, e_{t-2}, \ldots, e_{t-n}) + \varepsilon_t,$$

where f is a nonlinear function and n is the number of lagged residuals used. Then, using an ANN model to predict $e_t$ as an estimate $\hat{N}_t$ of $N_t$, we can calculate the forecast

$$\hat{Y}_t = \hat{L}_t + \hat{N}_t.$$

These models were built using the hybridModel command in the forecastHybrid package. This package fits multiple models and combines them using either equal weights or weights based on in-sample errors. Six models are available: ARIMA, exponential smoothing, theta, neural network autoregression (NNAR), seasonal and trend decomposition (STL), and TBATS (trigonometric seasonality, Box-Cox transformation, ARMA errors, trend, and seasonal components), which uses a Box-Cox transformation to handle heterogeneity and can model multiple and non-integer seasonal periods.
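The additive decomposition $\hat{Y}_t = \hat{L}_t + \hat{N}_t$ can be sketched in plain Python. To keep the example self-contained, two techniques are swapped in and should not be mistaken for the study's models: a least-squares AR(1) stands in for the ARIMA linear component, and a k-nearest-neighbour average over lagged residuals stands in for the ANN as the nonlinear function f. The function names and the defaults `n_lags=2` and `k=3` are hypothetical, and the series needs at least four observations.

```python
def fit_ar1(y):
    """Least-squares AR(1) with intercept, y_t = c + phi * y_{t-1} + e_t.
    Stand-in for the ARIMA linear component L_t."""
    x, z = y[:-1], y[1:]
    mx, mz = sum(x) / len(x), sum(z) / len(z)
    sxx = sum((a - mx) ** 2 for a in x)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    phi = sxz / sxx
    return mz - phi * mx, phi


def knn_residual_forecast(resid, n_lags=2, k=3):
    """Nonlinear stand-in for the ANN, f(e_{t-1}, ..., e_{t-n}): average the
    residuals that followed the k past lag patterns closest to the latest one."""
    target = resid[-n_lags:]
    candidates = []
    for t in range(n_lags, len(resid)):
        pattern = resid[t - n_lags:t]
        dist = sum((a - b) ** 2 for a, b in zip(pattern, target))
        candidates.append((dist, resid[t]))
    candidates.sort()
    nearest = candidates[:k]
    return sum(value for _, value in nearest) / len(nearest)


def hybrid_one_step(y):
    """Y_hat = L_hat + N_hat for the next time step."""
    c, phi = fit_ar1(y)
    fitted = [c + phi * prev for prev in y[:-1]]        # L_hat_t
    resid = [obs - f for obs, f in zip(y[1:], fitted)]  # e_t = Y_t - L_hat_t
    l_next = c + phi * y[-1]                            # linear forecast
    n_next = knn_residual_forecast(resid)               # nonlinear correction
    return l_next + n_next
```

For a series that is exactly linear, the residuals are essentially zero and the hybrid forecast reduces to the linear one; the nonlinear component only matters when the residuals retain structure.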

EMPIRICAL STUDY
This section presents the empirical results for the training and test datasets for the three time series models used to forecast the growth factor: the ARIMA model, the ANN model, and the hybrid combination of the two. The forecasting results are presented in the following subsections.
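We carried out the Anderson-Darling normality test to determine whether the data followed a normal distribution (Thode, 2002). The Anderson-Darling (AD) test is an empirical distribution function omnibus test for the composite hypothesis of normality. The test statistic is

$$A = -n - \frac{1}{n}\sum_{i=1}^{n}(2i - 1)\left[\ln p_{(i)} + \ln\left(1 - p_{(n-i+1)}\right)\right],$$

where $p_{(i)} = \Phi\left(\left(x_{(i)} - \bar{x}\right)/s\right)$ and $x_{(1)} \le \cdots \le x_{(n)}$ are the ordered data values. Here, Φ is the cumulative distribution function of the standard normal distribution, and $\bar{x}$ and s are the mean and the standard deviation, respectively, of the data values.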
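A direct transcription of the statistic A into Python, using `math.erf` for Φ and the sample standard deviation with an n - 1 denominator. This sketch computes only the statistic; converting A to a p-value requires additional formulas that are not shown here, so in practice a statistical package would be used. The function names and the example residuals are hypothetical.

```python
import math
import statistics


def norm_cdf(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def anderson_darling(x):
    """AD statistic A for the composite hypothesis of normality, with the
    mean and standard deviation estimated from the data."""
    n = len(x)
    xs = sorted(x)                # order statistics x_(1) <= ... <= x_(n)
    mean = statistics.mean(xs)
    s = statistics.stdev(xs)      # sample SD, n - 1 denominator
    p = [norm_cdf((v - mean) / s) for v in xs]
    # p[i - 1] is p_(i) and p[n - i] is p_(n-i+1) in 0-based indexing.
    total = sum((2 * i - 1) * (math.log(p[i - 1]) + math.log(1.0 - p[n - i]))
                for i in range(1, n + 1))
    return -n - total / n


# Hypothetical model residuals, not the study's data.
residuals = [0.012, -0.020, 0.005, 0.031, -0.008, -0.015, 0.002, 0.027, -0.030, 0.010]
print(anderson_darling(residuals))
```

Because the mean and standard deviation are estimated from the data, A is unchanged by shifting or rescaling the values, which is what makes it a test of the composite hypothesis of normality.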
For the training dataset, the AD normality test applied to the residuals of the growth factor models yielded AD statistics of 0.80794, 1.0139, and 0.35662, with corresponding p-values of 0.03356, 0.0102, and 0.4405 for the ARIMA, ANN, and hybrid models, respectively.
Since all three p-values exceed 0.01, the normality assumption is not rejected at the 0.01 level of significance for the residuals of any of the three selected models.
Given that normality was not rejected, we compared the performance of the models on the given dataset using the RMSE over the forecasting period for each model, where smaller RMSE values indicate higher forecasting accuracy. The ratios of the RMSE of the ANN model to those of the ARIMA and hybrid models were then calculated.
Table 1 lists the empirical results for the RMSE, MAE, and MAPE, together with the ratios of the ANN model's RMSE, MAE, and MAPE to those of the ARIMA and hybrid models for the growth factor.
Applying the ANN model yields an average of 1,000 networks, each of which is a 22-25-1 network with 601 weights and an estimated noise variance of 0.000655; that is, each network has 22 lagged inputs, 25 nodes in the hidden layer, and one output, and the predictions of the networks are averaged. For the ARIMA model, the best-fitting model was ARIMA(1,1,2) with a drift of 0.0008.
The RMSEs of the ARIMA, ANN, and hybrid models equal 0.3169, 0.0255, and 0.0744, respectively. Hence, the relative efficiencies of the ANN model to the ARIMA and hybrid models are Ω = 0.0805 and 0.3427, respectively; that is, the ANN model's RMSE equals 8.05% and 34.27% of those of the ARIMA and hybrid models. Therefore, the ANN model is more efficient than the ARIMA and hybrid models for the growth factor of COVID-19 data, and we use it for forecasting. As a second choice, the hybrid model could be used, since it outperforms the ARIMA model, keeping in mind that it is not a perfect substitute.
Figure 2 compares the ANN forecasts of the growth factor with the actual values. The forecasts are close to the actual values, which supports the use of the selected model and suggests that it can be relied on for policy implementation. Figure 3 shows the percentage changes in the growth factor calculated from the actual testing data (February 7 to March 20, 2021). Figures 2 and 3 show that the daily growth factor generally fluctuates over time, indicating that the growth factor is not stable.
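The percentage changes in the growth factor can be computed from consecutive values. The sketch below assumes the usual day-to-day definition, 100 × (g_t − g_{t−1}) / g_{t−1}; the text does not state the exact formula, so this is an assumption, and the function name and input values are hypothetical.

```python
def percentage_change(series):
    """Day-to-day percentage change, 100 * (g_t - g_{t-1}) / g_{t-1}."""
    return [100 * (curr - prev) / prev for prev, curr in zip(series[:-1], series[1:])]


# Hypothetical growth-factor values, not the study's data.
print(percentage_change([1.00, 1.10, 0.99, 1.03]))
```

A rise from 1.00 to 1.10 is a change of about +10%, and a fall from 1.10 to 0.99 is a change of about -10%, even though the absolute differences (0.10 and 0.11) are not equal.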

CONCLUSION
We compared the forecasting performance of the ARIMA, ANN, and hybrid ARIMA-ANN models using real, worldwide, daily COVID-19 data, and used the ANN model to forecast the growth factor of COVID-19 for the upcoming days. We discussed various forecasting techniques for choosing the best forecasts of the growth factor of COVID-19.
The results show that the ANN model performed better than the ARIMA and hybrid ARIMA-ANN models. This result is not surprising, because the ANN is designed to capture nonlinear trends such as those exhibited by the data during the sample period. These results add to the growing body of literature that seeks to forecast the spread of COVID-19 accurately by combining multiple models used by other researchers.
These results are useful because they provide accurate forecasts of the growth factor for the COVID-19 pandemic. Governments and institutions involved in public health can benefit from them by using a more reliable and accurate forecasting model for COVID-19. However, the forecasts themselves are not encouraging as the world struggles to contain the spread of COVID-19.
The growth factor exhibits a fluctuating trend, which decreases our optimism as the fight to contain COVID-19 continues. Further research could study the impact of COVID-19 on economic variables using the most appropriate forecasting techniques. In addition, these forecasts could be related to specific economic variables.

Figure 1: Growth Factor of COVID-19

In this study, about 10% of the sample is used as the testing sample. The training sample is used for model building, and the testing sample is used for model validation at the end of the analysis. We used the first 380 observations, from January 24, 2020, to February 6, 2021, as the training sample, and the remaining 42 observations, from February 7, 2021, to March 20, 2021, as the testing sample.
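For time series, the split must preserve time order rather than sample observations at random. The sketch below applies such a chronological split to the study period's calendar dates to confirm the boundaries stated above; the function name `chronological_split` is hypothetical.

```python
from datetime import date, timedelta


def chronological_split(series, test_size):
    """Keep time order: the first observations train, the last test_size test."""
    return series[:-test_size], series[-test_size:]


start = date(2020, 1, 24)
days = [start + timedelta(days=i) for i in range(422)]  # Jan 24, 2020 to Mar 20, 2021
train, test = chronological_split(days, 42)
print(len(train), train[-1], test[0], test[-1])
# 380 2021-02-06 2021-02-07 2021-03-20
```

The 42 test observations are about 10% of the 422 daily observations.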

Figure 2: Forecasts and Actual values in Growth factor.

Figure 3: Actual Percentage Changes in the Growth Factor

After feeding the model with the data from February 7 to March 20, 2021, and repeating the procedure with the ANN model (the best-chosen model), we forecast the growth factor for the next 41 days (March 21 to April 30, 2021), as shown in Table 3 and Figure 3. Over this horizon, the forecast daily growth factor generally fluctuates over time. This fluctuating trend decreases our optimism as the fight to contain COVID-19 continues. Because the chosen model's forecasts are relatively high, they prepare policymakers for the worst-case scenario as we go through the next wave of the pandemic.
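For multi-step horizons such as the 41 days above, nnetar applies the network iteratively: each one-step forecast is appended to the history and used as the newest lagged input for the next step. The sketch below shows that recursion under stated assumptions: `one_step` can be any one-step forecaster, and `mean_of_last_three` is a hypothetical stand-in for the fitted ANN, not the study's model.

```python
from datetime import date, timedelta


def recursive_forecast(history, one_step, horizon):
    """Multi-step forecasts by feeding each one-step forecast back as an input."""
    extended = list(history)
    forecasts = []
    for _ in range(horizon):
        nxt = one_step(extended)
        forecasts.append(nxt)
        extended.append(nxt)  # the forecast becomes the newest lagged value
    return forecasts


def mean_of_last_three(y):
    """Hypothetical stand-in for a fitted network's one-step forecast."""
    return sum(y[-3:]) / 3


horizon_dates = [date(2021, 3, 21) + timedelta(days=h) for h in range(41)]
fc = recursive_forecast([1.02, 0.98, 1.05], mean_of_last_three, horizon=41)
print(horizon_dates[0], horizon_dates[-1], len(fc))
```

Because later forecasts are built on earlier forecasts rather than on observed data, errors can accumulate over the horizon, which is one reason long-horizon forecasts should be read with caution.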

Figure 3: Forecasts in Growth Factor for the full data Using ANN model.

Table 1: Forecasting Criteria and Ratios of the RMSE, MAE, and MAPE for the ANN Model to the ARIMA and Hybrid Models.

Table 3: Forecasts in Growth Factor for the Full Data Using the ANN Model.