There are four types of analytics: descriptive, diagnostic, predictive, and prescriptive. Of these, predictive analytics is one of the most important, because it predicts the future from historical data. Many factors affect the dependent variable and its future trend. A forecast built only on historical data assumes that external factors keep behaving as they did in the past, so factors that had no impact in the past are not reflected in it. For example, growing car emissions drive pollution, which is increasing day by day. When future temperature is predicted from past temperature, the effect of such factors is included by default, because it is already embedded in the historical data.
Among forecasting methods, Autoregressive Integrated Moving Average (ARIMA) is applicable where the data is in time series format. It can be used for many kinds of time series data, whether sales or weather data. The steps in any time series forecasting exercise are descriptive analytics, model building, and forecasting. In this article, we’ll discuss ARIMA and how to perform Time Series Forecasting with R (ARIMA). We will use this predictive technique to forecast temperature from historical data and validate the result with the accuracy parameters of the model.
What is Autoregressive Integrated Moving Average (ARIMA)
Autoregressive Integrated Moving Average, or ARIMA, is a forecasting technique that is applicable only to time series data. ARIMA is also called the Box-Jenkins method. The Box-Jenkins method takes three things into consideration: autoregression, differencing (including seasonal differencing where the data is seasonal) and moving average. Correspondingly, three values are determined: p (the autoregressive order), d (the degree of differencing needed for stationarity) and q (the moving average order).
Stationarity is one of the most important requirements of an ARIMA model, because the autoregressive and moving average parts are fitted to stationary data. A data set is stationary when its statistical properties, namely mean, variance and autocorrelation, do not change over time.
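To make the idea of stationarity concrete, here is a minimal sketch in base R. It uses a simulated random walk (assumed example data, not the temperature file used in this article) and shows how first-order differencing removes the wandering level.

```r
# Simulate a random walk: a classic non-stationary series (assumed example data).
set.seed(42)
walk <- ts(cumsum(rnorm(200)))

# First-order differencing recovers the underlying increments.
walk_diff <- diff(walk, differences = 1)

# Lag-1 autocorrelation: close to 1 for the random walk, close to 0 after differencing.
acf_walk <- acf(walk, plot = FALSE)$acf[2]
acf_diff <- acf(walk_diff, plot = FALSE)$acf[2]
print(c(original = acf_walk, differenced = acf_diff))

# Compare the mean of the first and second half of each series.
half_means <- function(x) {
  n <- length(x)
  c(first = mean(x[1:(n %/% 2)]), second = mean(x[(n %/% 2 + 1):n]))
}
print(half_means(walk))
print(half_means(walk_diff))
```

For the differenced series the two half means should be close to each other, which is the behaviour expected from stationary data.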
There are two ways to fit an ARIMA model: manually differencing the data to make it stationary and then choosing the orders, or using the auto.arima algorithm. The auto.arima function (from the forecast package) is powerful: it searches for the values of p, d and q that best fit the available data according to an information criterion.
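The search that auto.arima performs can be illustrated with a much simpler stand-in. The sketch below is not the auto.arima algorithm itself; it is a plain grid search over p and q with base R's arima function, on a simulated series (assumed example data), keeping the model with the lowest AIC. The names best_fit and best_order are illustrative.

```r
# Simulated AR(1) series (assumed example data, not the article's temperature file).
set.seed(123)
y <- arima.sim(model = list(ar = 0.7), n = 300)

d <- 0                 # the simulated series is already stationary, so no differencing
best_fit <- NULL
best_order <- NULL

# Try every combination of p and q from 0 to 2 and keep the lowest AIC.
for (p in 0:2) {
  for (q in 0:2) {
    fit <- tryCatch(arima(y, order = c(p, d, q)), error = function(e) NULL)
    if (!is.null(fit) && (is.null(best_fit) || AIC(fit) < AIC(best_fit))) {
      best_fit <- fit
      best_order <- c(p = p, d = d, q = q)
    }
  }
}

print(best_order)
print(AIC(best_fit))
```

The value of d is fixed here because AIC values are not comparable across different levels of differencing; that is why d is normally chosen first with a stationarity test.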
Time Series Forecasting with R
ARIMA consists of three parts: Auto Regressive (AR), Integrated (I) and Moving Average (MA). As a standard practice, the first step is to convert the data into a time series object with the ‘ts’ function. The next step is to check whether the data is stationary and, if it is not, to make it stationary by differencing. Because the auto.arima algorithm is used here, the p, d and q values did not have to be found manually; the algorithm selects them. The accuracy of the model is then evaluated and the future values are forecast.
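As a small illustration of the first step, the sketch below converts a plain numeric vector into a monthly time series with ts. The temperature values are made up for the example; only the start date and frequency mirror what is used later in this article.

```r
# Hypothetical monthly readings (made-up values for illustration only).
temps <- c(12.1, 14.3, 19.8, 25.4, 30.2, 33.5, 31.0, 29.7, 28.1, 24.6, 18.2, 13.0,
           12.6, 15.0, 20.3, 26.1, 30.9, 34.0)

# start = c(1995, 1) means January 1995; frequency = 12 means monthly data.
temp_ts <- ts(temps, start = c(1995, 1), frequency = 12)
print(temp_ts)

# The object now knows its own calendar.
print(start(temp_ts))
print(end(temp_ts))
print(frequency(temp_ts))

# Extract only the observations from January 1996 onwards.
temp_1996 <- window(temp_ts, start = c(1996, 1))
print(temp_1996)
```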
Points to check for ARIMA modelling
Nature of the time series
Whether there is seasonality in the series
Upward or downward trend
Check whether the data is stationary
Determine the order of AR and MA or value of p, d and q
Check the accuracy of the model
Forecast using the best model
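The first few checks in this list (seasonality and trend) can be sketched with base R's decompose function. The example uses nottem, a monthly temperature series that ships with R, as a stand-in for the article's data.

```r
# nottem: monthly temperatures at Nottingham, 1920-1939, shipped with base R (stand-in data).
y <- nottem

# Additive decomposition: observed = trend + seasonal + random.
parts <- decompose(y)
plot(parts)

# Size of the seasonal swing: a large range points to strong seasonality.
seasonal_range <- diff(range(parts$seasonal))
print(seasonal_range)

# Sign of the slope of a straight line through the data: upward or downward trend.
trend_slope <- coef(lm(as.numeric(y) ~ as.numeric(time(y))))[2]
print(trend_slope)
```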
Steps to create ARIMA model in R
Load the following libraries.
forecast
fUnitRoots
Load/import the data
Change the data into time series data
Run Auto ARIMA (auto.arima) and predict the future values
Plot the past and predicted values
Export the forecast to a CSV file that can be opened in Excel
Data Preparation
Data preparation is one of the crucial steps before creating any model. In this case, the data was prepared as follows.
Replacing null values and outliers with the average (of past data or trend).
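A minimal sketch of the null-value replacement is shown below, using made-up readings. It only covers missing values; how an outlier is detected is not specified in this article, so that part is left out.

```r
# Made-up readings with two missing values (illustration only).
temps <- c(21.4, 23.0, NA, 27.8, 30.1, NA, 29.5)

# Average of the available observations.
avg <- mean(temps, na.rm = TRUE)

# Replace every missing value with that average.
filled <- ifelse(is.na(temps), avg, temps)
print(filled)
```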
Descriptive Analysis
Descriptive analysis is about analysing past data. It summarizes the data in terms of past trend, average, maximum, minimum and standard deviation in order to explain what has already happened. Descriptive analysis was therefore performed to understand the past temperature data properly.
A separate visualization was created to show the past trend. A table of descriptive statistics (maximum, minimum, range, average and standard deviation) is also given (Table 1).
Descriptive Analysis of Temp
Statistic             Value
Max                   39.8
Min                   6.6
Range                 33.2
Average               25.0
Standard Deviation    13.1
Table 1: Descriptive Analysis
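The statistics in a table like this can be computed with a few base R calls. The helper name describe_temp is illustrative, and nottem (a temperature series shipped with R) stands in for the article's data, so the numbers will differ from Table 1.

```r
# Illustrative helper: the five statistics reported in the descriptive table.
describe_temp <- function(x) {
  c(Max     = max(x),
    Min     = min(x),
    Range   = max(x) - min(x),
    Average = mean(x),
    SD      = sd(x))
}

# Stand-in data: monthly temperatures shipped with base R.
stats_nottem <- describe_temp(as.numeric(nottem))
print(round(stats_nottem, 1))
```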
Comments: The temperature has a high standard deviation (13.1). The linear trend line gives the best-fit increase of temperature with time (year), and its R2 value is close to 0.5, which indicates a moderate fit. Overall, the temperature shows an increasing trend over the years.
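A linear trend line and its R2 can be obtained with lm. The sketch below is one way to do it, assuming the trend is fitted to annual averages; it uses the nottem series shipped with R as stand-in data, so its R2 will not match the value discussed above.

```r
# Stand-in data: aggregate the monthly series to annual averages.
annual <- aggregate(nottem, nfrequency = 1, FUN = mean)

trend_df <- data.frame(temp = as.numeric(annual),
                       year = as.numeric(time(annual)))

# Straight-line fit of temperature against year.
fit <- lm(temp ~ year, data = trend_df)

r2 <- summary(fit)$r.squared      # share of variance explained by the trend line
slope <- coef(fit)[["year"]]      # change in temperature per year
print(c(R2 = r2, slope = slope))
```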
Future data Prediction
The following graph shows 245 months of data: 221 months of past data and 24 months of predicted data (Fig. 3).
The point forecasts of temperature for the 24 months are given in Table 2. The R graphical output shows the best p, d and q values together with the seasonal order (Fig. 4).
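To show how past and predicted values end up in one series, here is a sketch in base R. It uses the nottem temperature series as stand-in data and a hand-picked seasonal ARIMA order, which is an assumption made for the example and not the order selected for the article's data.

```r
# Stand-in data: 240 months of temperatures shipped with base R.
y <- nottem

# Hand-picked order for illustration only (not selected by auto.arima).
fit <- arima(y, order = c(1, 0, 0),
             seasonal = list(order = c(0, 1, 1), period = 12))

# Point forecasts and standard errors for the next 24 months.
fc <- predict(fit, n.ahead = 24)

# Approximate 95% prediction interval around each point forecast.
lower <- fc$pred - 1.96 * fc$se
upper <- fc$pred + 1.96 * fc$se

# Past and predicted values joined into a single monthly series.
combined <- ts(c(y, fc$pred), start = start(y), frequency = frequency(y))
plot(combined)
```

With 240 past months and 24 forecast months, the combined series here has 264 points, in the same way that 221 and 24 give the 245 months mentioned above.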
R Code for ARIMA
R is used here for predictive analytics, specifically for time series forecasting with the Auto Regressive Integrated Moving Average (ARIMA) technique.
The following source code was used to predict the future temperature. You can use the same code for your own ARIMA model.
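####################### Library ################
install.packages("forecast")   # only needed once per machine
library(forecast)
############# Import data ##################
setwd("C:\\Users\\Insightoriel\\Desktop\\ARIMA")
data<-read.csv("Temp_Data_.csv")
summary(data)
head(data)
data
############## Change into time series data ############
# data[,2] selects the second column (assumed to hold the temperature values);
# the series is monthly and starts in January 1995
data<-ts(data[,2],start=c(1995,1),frequency=12)
############## Plot the data #########################
plot(data)
######################### ADF test #######################
install.packages("fUnitRoots")   # only needed once per machine
library(fUnitRoots)
adfTest(data)
############ Running auto ARIMA & Predicting future value ############
# fit is an illustrative name; h=24 matches the 24 forecast months
fit<-auto.arima(data)
ffcast<-forecast(fit,h=24)
############################## test ###########################
accuracy(ffcast)
# The Ljung-Box test is applied to the model residuals
Box.test(residuals(ffcast),type="Ljung-Box",lag=12)
# Point forecasts and prediction intervals for the next 24 months
print(ffcast)
############# Plot the forecasted values ################
plot(ffcast)
write.csv(as.data.frame(ffcast),"Forecasted.csv")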
Accuracy Check
For an ARIMA model, R produces several statistical parameters that are checked when testing accuracy. They are described below.
Augmented Dickey Fuller Test (ADF Test): The ADF test is used to check the stationarity of the data. Its null hypothesis is that the data is non-stationary, so if the p-value is <0.01 the data can be considered stationary. In R, the “auto.arima” function differences the data as needed if it is non-stationary.
R reports the following statistical parameters for checking model accuracy in the output of the ‘accuracy’ function.
MAPE (Mean Absolute Percentage Error)
RMSE (Root Mean Square Error)
MAE (Mean Absolute Error)
MAD (Mean Absolute Deviation), which for forecast errors is the same quantity as MAE
MASE (Mean Absolute Scaled Error)
ACF1 (lag-1 autocorrelation of the errors)
Among all the accuracy parameters, MAPE (which should be <5, i.e. below 5%) is the most significant one for checking accuracy. All the statistical parameters show low values for the ARIMA model for all four cities, which means that the accuracy of the model is good and acceptable.
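To see what these error measures mean, the sketch below computes three of them by hand in base R. The helper name forecast_errors and the sample numbers are made up for illustration; in practice the ‘accuracy’ function reports these values for you.

```r
# Illustrative helper: error measures from actual and predicted values.
forecast_errors <- function(actual, predicted) {
  err <- actual - predicted
  c(RMSE = sqrt(mean(err^2)),                 # penalizes large errors more
    MAE  = mean(abs(err)),                    # average size of the error
    MAPE = mean(abs(err / actual)) * 100)     # average error as a percentage
}

# Made-up actual and predicted temperatures.
actual    <- c(20, 25, 30, 40)
predicted <- c(22, 24, 27, 40)

errors <- forecast_errors(actual, predicted)
print(errors)
```

In this made-up example the MAPE is 6, which would be above the <5 threshold used in this article.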
A Box-Ljung test was also performed to check the p-value of the model.
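A Box-Ljung test is usually run on the residuals of a fitted model. The sketch below shows this with base R only, using the nottem series as stand-in data and a hand-picked model order (both are assumptions for the example).

```r
# Stand-in data and a hand-picked order, for illustration only.
fit <- arima(nottem, order = c(1, 0, 0),
             seasonal = list(order = c(0, 1, 1), period = 12))

# Test the first 12 residual autocorrelations.
# fitdf = 2 accounts for the two estimated coefficients (one AR, one seasonal MA).
lb <- Box.test(residuals(fit), type = "Ljung-Box", lag = 12, fitdf = 2)
print(lb)

# A large p-value means no significant autocorrelation is left in the residuals.
print(lb$p.value)
```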
R Output:
ADF Test
Accuracy Test
Box Ljung Test
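Comments: The ADF test output (p-value) shows that the data is stationary. MAPE is 3.52, which is within the acceptable range (<5, i.e. below 5%). The p-value (p<0.05) is also taken to indicate that the model is statistically significant. Note that when the Box-Ljung test is applied to model residuals, a large p-value is the desired outcome, because it means that no significant autocorrelation remains.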
Conclusion
We have discussed the ARIMA model and Time Series Forecasting with R in detail. The code is included so that you can apply it to make predictions on your own data.