Tuesday, October 31, 2017

Time Series Analysis of Nifty Price forecast using ARIMA Model


















A couple of months back I forecasted the Nifty using an ARIMA model, and now I want to document it. I am using knitr, a powerful package for generating white papers. I came to know about it recently and thought I would add its output to my blog as a document. That is the main reason I am writing so many articles these days.

## What is the ARIMA Model?

ARIMA is a statistical and econometric approach for forecasting time series data, and it is built around the idea of stationarity. Stationary is a complicated word, but in simple words it describes data whose mean and variance stay constant over time. ARIMA combines three parts: AR (Auto Regressive), I (Integration) and MA (Moving Average). The “I” term differences the data to make it stationary. If we remove the “I” term, we are left with a model called ARMA, which forecasts just as well as ARIMA when the data is already stationary. In this article I will show how to forecast data using the ARIMA model.
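For reference, the ARIMA(p, d, q) model described above can be written out as below. The notation (y_t for the series, B for the backshift operator, c, phi, theta and epsilon) is the usual textbook notation and is an assumption here, since the article itself does not define any symbols.

```latex
\[
\begin{aligned}
y'_t &= (1 - B)^d \, y_t, \qquad B\,y_t = y_{t-1} \\
y'_t &= c + \sum_{i=1}^{p} \phi_i \, y'_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j}
\end{aligned}
\]
```

With d = 0 this is an ARMA(p, q) model. With p = q = 0 and d = 1 it reduces to a random walk (with drift if c is not zero).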

For forecasting we need two packages, which we load first.

library(forecast)
## Warning: package 'forecast' was built under R version 3.4.1
library(tseries)
## Warning: package 'tseries' was built under R version 3.4.1

Then we load the data and look at the first few rows.

head(mydata)
##          Date    Open    High     Low   Close   Shares.Traded
## 1 22-Aug-2016 8667.00 8684.85 8614.00 8629.15       156976546
## 2 23-Aug-2016 8628.35 8642.15 8580.00 8632.60       193302503
## 3 24-Aug-2016 8648.50 8661.05 8620.90 8650.30       154168043
## 4 25-Aug-2016 8668.85 8683.05 8583.65 8592.20       219736214
## 5 26-Aug-2016 8614.35 8622.95 8547.55 8572.55       152146666
## 6 29-Aug-2016 8583.75 8622.00 8543.75 8607.45       133723771
##   Turnover..Rs..Cr.
## 1           7394.57
## 2           7309.82
## 3           7199.72
## 4          10508.32
## 5           7153.37
## 6           6857.99

The data has a few columns I do not need: Open, High, Low, Shares.Traded (volume) and Turnover (amount). I will remove them and keep only Date and Close.

##          Date   Close
## 1 22-Aug-2016 8629.15
## 2 23-Aug-2016 8632.60
## 3 24-Aug-2016 8650.30
## 4 25-Aug-2016 8592.20
## 5 26-Aug-2016 8572.55
## 6 29-Aug-2016 8607.45
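The code for this column selection is not shown in the knitr output. Here is a minimal sketch of one way to do it in base R, assuming the data frame has the column names printed by head(mydata). The two rows are copied from that output only to keep the example runnable.

```r
# Small stand-in for the real data, using the first two rows printed above
mydata <- data.frame(
  Date = c("22-Aug-2016", "23-Aug-2016"),
  Open = c(8667.00, 8628.35),
  High = c(8684.85, 8642.15),
  Low = c(8614.00, 8580.00),
  Close = c(8629.15, 8632.60),
  Shares.Traded = c(156976546, 193302503),
  Turnover..Rs..Cr. = c(7394.57, 7309.82),
  stringsAsFactors = FALSE
)

# Keep only the two columns needed for the forecast
mydata <- mydata[, c("Date", "Close")]
head(mydata)
```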

After this data manipulation I will plot the data and look for outliers caused by a stock split, buyback, bonus issue, etc.
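Alongside the plot, a quick numeric check can help spot such jumps. The sketch below uses the six closing prices printed earlier and flags day-over-day moves above 5%. The 5% threshold and the variable names are assumptions for illustration, not something the original analysis used.

```r
# First six closing prices from the head() output above
close_px <- c(8629.15, 8632.60, 8650.30, 8592.20, 8572.55, 8607.45)

# Day-over-day percentage change
pct_change <- diff(close_px) / head(close_px, -1) * 100

# Flag moves larger than a hypothetical 5% threshold (tune as needed)
suspect <- which(abs(pct_change) > 5)

print(round(pct_change, 2))
print(suspect)  # empty when no day moves more than the threshold
```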

The plot looks fine, so next I check for stationarity with the ADF (Augmented Dickey–Fuller) test and look at whether the p-value is significant.

## 
##  Augmented Dickey-Fuller Test
## 
## data:  mydata$Close
## Dickey-Fuller = -1.9257, Lag order = 6, p-value = 0.6071
## alternative hypothesis: stationary
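To read this output: the null hypothesis of the ADF test is that the series has a unit root (is non-stationary), so only a small p-value, say below 0.05, supports stationarity. The sketch below shows the test on a simulated random walk before and after differencing. The simulated series and the variable names are assumptions and stand in for the Nifty data.

```r
library(tseries)

set.seed(123)
# Simulated random walk standing in for a price series (not the Nifty data)
walk <- 9000 + cumsum(rnorm(250, mean = 0, sd = 60))

# Test the level of the series
adf_level <- adf.test(walk)

# After one difference a random walk becomes white noise. adf.test() warns
# when the p-value is below the smallest tabulated value (0.01), so the
# warning is suppressed here.
adf_diff <- suppressWarnings(adf.test(diff(walk)))

print(adf_level$p.value)
print(adf_diff$p.value)
```

If the differenced series gives a small p-value while the level does not, one order of differencing (d = 1) is a reasonable choice.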

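The p-value of 0.6071 is well above 0.05, so we cannot reject the null hypothesis of a unit root: the closing price itself is not stationary. That is expected for a price series, and it is why the models below use one order of differencing (d = 1). Now look at the ACF (autocorrelation function). Since this is a univariate analysis, the ACF shows how the series correlates with its own lagged values, which helps us understand the data.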

The data shows strong autocorrelation, which is what an ARIMA model needs to work with.

Now all the bars in the plot are within the blue lines (the significance bounds), so it is good for an ARIMA model.
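The plots themselves are not reproduced here. As a rough sketch of what they measure, the code below computes the ACF of a simulated random walk and of its first difference with base R's acf(), together with the approximate 95% bounds that acf() draws as blue dashed lines. The simulated series is an assumption and stands in for the Nifty closing prices.

```r
set.seed(123)
# Simulated random walk standing in for a price series (not the Nifty data)
walk <- 9000 + cumsum(rnorm(250, sd = 60))

acf_level <- acf(walk, lag.max = 20, plot = FALSE)
acf_diff <- acf(diff(walk), lag.max = 20, plot = FALSE)

# Approximate 95% significance bounds for the differenced series
bound <- qnorm(0.975) / sqrt(length(diff(walk)))

# Element 1 is lag 0 (always 1), so lag 1 is the second element
print(acf_level$acf[2])
print(acf_diff$acf[2])
print(bound)
```

A lag-1 value close to 1 for the level and small values for the differences is the typical pattern for a series that needs d = 1.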

Now we will fit a few ARIMA models individually, see how each performs, and select the best parameter combination using the AIC (Akaike information criterion).

Now look at the various ARIMA models and their performance.
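The fitting step is not shown in the output. Here is a sketch of how three such models might be fitted and compared with base R's arima(), using a simulated series in place of mydata$Close. The series, the object names and the choice of method = "ML" are assumptions for illustration.

```r
set.seed(123)
# Simulated ARIMA(1,1,0) series standing in for mydata$Close
x <- arima.sim(list(order = c(1, 1, 0), ar = 0.5), n = 300)

# Candidate orders (p, d, q), the same three compared in this article
orders <- list("arima(1,1,1)" = c(1, 1, 1),
               "arima(1,1,0)" = c(1, 1, 0),
               "arima(2,1,0)" = c(2, 1, 0))

# Fit each candidate by maximum likelihood
fits <- lapply(orders, function(ord) arima(x, order = ord, method = "ML"))

# One row per model with its AIC
aic_table <- data.frame(model = names(fits),
                        AIC = sapply(fits, AIC),
                        row.names = NULL,
                        stringsAsFactors = FALSE)
print(aic_table)

# The model with the smallest AIC is preferred
best <- aic_table$model[which.min(aic_table$AIC)]
print(best)
```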

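# Collect the AIC of each fitted model (element 6 of an arima fit is `aic`)
a <- ARIMA111$aic
b <- ARIMA110$aic
c <- ARIMA210$aic
result <- rbind(a, b, c)
d <- cbind(c("arima(1,1,1)", "arima(1,1,0)", "arima(2,1,0)"))
e <- cbind(d, result)  # model label next to its AIC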

ARIMA(1,1,1) has the lowest AIC among these models, so we select it. Let us plot its diagnostics and see how they look.

tsdiag(ARIMA111)
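tsdiag() plots the standardized residuals, their ACF, and Ljung-Box p-values. The same residual check can be done numerically with Box.test(), as sketched below on a simulated series. The series, the lag of 10 and the object names are assumptions.

```r
set.seed(123)
# Simulated ARIMA(1,1,1) series standing in for the closing prices
x <- arima.sim(list(order = c(1, 1, 1), ar = 0.5, ma = 0.3), n = 300)
fit <- arima(x, order = c(1, 1, 1), method = "ML")

# Ljung-Box test on the residuals; fitdf = number of estimated ARMA coefficients
lb <- Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = 2)
print(lb$p.value)
```

A p-value above 0.05 suggests the residuals behave like white noise, which is what a well-fitting model should leave behind.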

We select the model with the lowest AIC. I intentionally left one model out of the comparison above: humans miss things, and that is where the computer plays the smarter role. The forecast package has a powerful function called auto.arima() that avoids most of this manual work by searching over candidate models and comparing their information criteria (AICc by default, with AIC and BIC also available). Let’s go ahead and apply it.

Auto.Model<-auto.arima(mydata$Close)
summary(Auto.Model)
## Series: mydata$Close 
## ARIMA(0,1,0)                    
## 
## sigma^2 estimated as 3678:  log likelihood=-1353.37
## AIC=2708.73   AICc=2708.75   BIC=2712.23
## 
## Training set error measures:
##                    ME     RMSE      MAE        MPE      MAPE      MASE
## Training set 5.218005 60.52221 44.49768 0.05403156 0.5023453 0.9967207
##                   ACF1
## Training set 0.0184851

This is the magic of auto.arima(): the AIC is the lowest among all the models mentioned above. Now go ahead and forecast.

q<-forecast(Auto.Model,h=10)
plot(q)

Now that we have plotted the forecast, let us look at the forecasted values.

print(q)
##     Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 247        9904.15 9826.430  9981.87 9785.287 10023.01
## 248        9904.15 9794.237 10014.06 9736.052 10072.25
## 249        9904.15 9769.534 10038.77 9698.273 10110.03
## 250        9904.15 9748.709 10059.59 9666.424 10141.88
## 251        9904.15 9730.362 10077.94 9638.364 10169.94
## 252        9904.15 9713.775 10094.53 9612.996 10195.30
## 253        9904.15 9698.521 10109.78 9589.668 10218.63
## 254        9904.15 9684.323 10123.98 9567.954 10240.35
## 255        9904.15 9670.989 10137.31 9547.560 10260.74
## 256        9904.15 9658.376 10149.92 9528.272 10280.03
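The forecast is flat because ARIMA(0,1,0) without a drift term is a random walk: every point forecast equals the last observed value, and only the intervals widen, in proportion to the square root of the horizon. The sketch below rebuilds the intervals from the numbers printed above (sigma^2 = 3678 and the point forecast 9904.15). The variable names are assumptions, and small differences in the last decimals come from the rounded sigma^2.

```r
last_close <- 9904.15   # point forecast printed above
sigma2 <- 3678          # sigma^2 from summary(Auto.Model), rounded
h <- 1:10               # forecast horizons

# Forecast standard error of a random walk grows with sqrt(h)
se <- sqrt(sigma2 * h)

intervals <- data.frame(
  h = h,
  lo80 = last_close - qnorm(0.90) * se,
  hi80 = last_close + qnorm(0.90) * se,
  lo95 = last_close - qnorm(0.975) * se,
  hi95 = last_close + qnorm(0.975) * se
)
print(round(intervals, 2))
```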

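If you find that your data is non-stationary, there is no need to fear. Either let the model difference the data through the middle parameter, as in arima(1,1,1), or difference the data yourself first. In the second case make sure the middle parameter is zero, i.e. use arima(1,0,1) instead of arima(1,1,1), so that the data is not differenced twice.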
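The two routes can be compared directly. The sketch below fits both on a simulated series with base R's arima(). The series, the object names, and the use of include.mean = FALSE and method = "ML" are assumptions for illustration.

```r
set.seed(123)
# Simulated ARIMA(1,1,1) series standing in for a price series
x <- arima.sim(list(order = c(1, 1, 1), ar = 0.5, ma = 0.3), n = 500)

# Option 1: let the model difference the data (d = 1)
fit_d1 <- arima(x, order = c(1, 1, 1), method = "ML")

# Option 2: difference by hand, then fit with d = 0 and no mean term
fit_d0 <- arima(diff(x), order = c(1, 0, 1), include.mean = FALSE, method = "ML")

# The AR and MA coefficients should be very close in both fits
print(coef(fit_d1))
print(coef(fit_d0))
```

One practical difference: forecasts from the second fit are on the differenced scale (daily changes), so they have to be cumulated back onto the last observed level, while the first fit forecasts the level directly.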