Hi everyone, I hope you are all doing well. I am Tharun, and today I am going to discuss the most important topic in Stationarity. I covered the basics of Stationarity in my previous post, so please refer to it if you haven't gone through it yet. Let's start the topic without wasting time.
To confirm whether a Time Series is Stationary or not, we need to check three aspects:
- Constant Mean.
- Constant Variance.
- No Autocorrelation.
Correlation :
Correlation describes the linear relationship between two variables.
Let's say x and y are two variables. Then the correlation coefficient ( r ) is given by:
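Assuming the standard Pearson correlation coefficient is what is meant here, the formula can be written as follows. The symbols n (number of observations), x̄ and ȳ (the sample means of x and y) are notation introduced for this sketch.

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$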
r can take values from -1 to 1.
If the r-value is close to -1, it indicates x and y are negatively correlated, i.e. if x increases y decreases and vice versa.
If the r-value is close to +1, it indicates x and y are positively correlated, i.e. if x increases y increases, and if x decreases y decreases.
If the r-value is close to 0 it indicates x and y are not linearly dependent on each other.
Examples :
- The more time you spend running on a treadmill ( x ), the more calories (y) you will burn ( positively correlated, r-value lies close to 1 )
- If a train increases speed ( x ), the length of time ( y ) to get to the final point decreases. ( negatively correlated, r-value lies close to -1 )
I plotted a graph between the Number of hours spent on a treadmill vs calories burnt.
In the graph, you can clearly see that as the number of hours spent on the treadmill increases, the number of calories burnt also increases. These variables exhibit a strong positive correlation.
I think you now have the basic idea of Correlation, so let's move on to Lags.
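To make the two examples above concrete, here is a small sketch that computes r with NumPy. All the numbers (hours, calories, speeds, and the 240 km distance) are made up for illustration and are not taken from the graph in this post.

```python
import numpy as np

# Hypothetical treadmill data: hours spent vs calories burnt.
hours = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
calories = np.array([210, 390, 620, 790, 1010, 1180])

# Hypothetical train data: speed (km/h) vs time (hours) to cover 240 km.
speed = np.array([40, 60, 80, 100, 120])
travel_time = 240 / speed

# np.corrcoef returns a 2x2 correlation matrix; entry [0, 1] is r between the two inputs.
r_treadmill = np.corrcoef(hours, calories)[0, 1]
r_train = np.corrcoef(speed, travel_time)[0, 1]

print("treadmill r:", round(r_treadmill, 3))  # close to +1
print("train r:", round(r_train, 3))          # close to -1
```

Note that the train example is not perfectly linear (time is distance divided by speed), so r is close to -1 but not exactly -1.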
Time Lags:
A time lag generally means shifting your time series by n time units. So if you say time lag 1, you shift your entire target column by 1 unit. This can be better explained by taking an example dataset.
Lag 1 means shifting the entire passengers column by 1 unit, and lag 2 means shifting the entire passengers column by 2 units.
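Here is a minimal sketch of how lagged columns can be created with the pandas `shift` method. The passenger counts below are placeholder values chosen for illustration, not the actual example dataset, and the column names `lag_1` and `lag_2` are my own choice.

```python
import pandas as pd

# Placeholder passenger counts, for illustration only.
df = pd.DataFrame({"passengers": [112, 118, 132, 129, 121, 135]})

# shift(n) moves every value down by n rows; the first n rows become NaN
# because there is no earlier observation to fill them with.
df["lag_1"] = df["passengers"].shift(1)
df["lag_2"] = df["passengers"].shift(2)

print(df)
```

Each row of `lag_1` holds the value the series had one step earlier, and each row of `lag_2` holds the value from two steps earlier.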
Autocorrelation:
Autocorrelation is very similar to correlation. The only difference is that it measures the linear relationship between a time series and lagged values of the same time series.
Autocorrelation is evaluated using the formula :
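Assuming the usual sample autocorrelation definition (the one used in the first reference listed at the end of this post), the formula can be written as follows. The symbols T (length of the series), y_t (the observation at time t) and ȳ (the mean of the series) are notation introduced for this sketch.

$$
r_k = \frac{\sum_{t=k+1}^{T} (y_t - \bar{y})(y_{t-k} - \bar{y})}{\sum_{t=1}^{T} (y_t - \bar{y})^2}
$$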
Here k represents the time lags.
"Autocorrelation basically checks whether the previous observations influence the later ones."
If there is any Autocorrelation present in the Time Series, we can say that the Time Series is not Stationary.
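To see what the autocorrelation at lag k actually computes, here is a small NumPy sketch. It assumes the usual sample definition (the sum of products of deviations from the mean, k steps apart, divided by the total sum of squared deviations). The function name `autocorr` and both test series are hypothetical and only for illustration.

```python
import numpy as np

def autocorr(y, k):
    """Sample autocorrelation of y at lag k (k >= 0), using the overall mean."""
    y = np.asarray(y, dtype=float)
    dev = y - y.mean()                            # deviations from the mean
    n = len(dev)
    numerator = np.sum(dev[k:] * dev[: n - k])    # pairs that are k steps apart
    denominator = np.sum(dev ** 2)                # total squared deviation
    return float(numerator / denominator)

# A steadily increasing series: each value is strongly tied to the previous one.
trend = np.arange(50)

# A series that flips sign at every step: each value is the opposite of the previous one.
alternating = np.array([1.0, -1.0] * 25)

print(autocorr(trend, 1))        # strongly positive
print(autocorr(alternating, 1))  # strongly negative
```

In both cases the previous observation tells you a lot about the next one, which is exactly what autocorrelation measures.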
Let's take a dataset that contains Monthly milk production from the year 1962 to 1975.
The head of the dataset :
Next, I generated lagged Time Series at different lag values. Here I considered up to 4 lags.
Now we have to check the Autocorrelation between the Production column and the lagged columns. If the Autocorrelation is close to +1 or -1, we can say that the Time Series exhibits Autocorrelation and it can be termed a non-stationary Time Series.
So let's check the Correlation values.
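The steps above can be sketched roughly like this. Since the milk production file is not included here, the sketch uses a synthetic monthly series with a trend and a yearly pattern as a stand-in, so the numbers it prints will not match the values reported in this post. The column names `lag_1` to `lag_4` are my own choice.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a monthly production series: trend + yearly cycle + noise.
rng = np.random.default_rng(0)
t = np.arange(120)
production = 600 + 1.5 * t + 40 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 5, 120)
df = pd.DataFrame({"Production": production})

# Create lagged copies of the series for lags 1 to 4.
for lag in range(1, 5):
    df[f"lag_{lag}"] = df["Production"].shift(lag)

# DataFrame.corr() computes pairwise Pearson correlations and skips the NaN rows
# introduced by shifting.
corr = df.corr()
print(corr["Production"])
```

The `Production` column of the correlation matrix then gives the correlation between the original series and each lagged copy.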
You can clearly see that the Autocorrelation is close to 1 ( 0.900796 ) between the Production column ( Original Time Series ) and lag 1 ( Lagged Time Series ). This clearly shows Autocorrelation inside the Time Series, so this Time Series is non-stationary.
Inference through graphs:
This plot shows positive Autocorrelation between Original Time Series and Lagged Time Series ( lag = 1).
Some Time Series may exhibit autocorrelation at lag = 10 and some at lag = 20. It is difficult to plot every lag and check the Autocorrelation one by one.
In practice, we take the help of the ACF ( Autocorrelation Function ) and ACF plots from the statsmodels library, which make it much easier to see the relationship between the original Time Series and its lagged versions. By looking at the plot, we can tell whether the Time Series exhibits Autocorrelation or not.
Let's check the acf and plot_acf functions in the statsmodels library.
Evaluating the correlation coefficient at different lags:
Plotting ACF plot for the Time Series
From this plot, we can tell that there is high Autocorrelation at several lags ( lag = 1, 12 ). So we can say that this Time Series is non-stationary.
To remove autocorrelation and make the Time Series stationary, we have several transformation methods, such as differencing and logarithmic transformations, which I will discuss in later posts.
Let's take another example and check the Stationarity using acf and plot_acf functions.
I took a dataset that records the daily Female births in the year 1959.
Applying the acf function from the statsmodels library
You can see the autocorrelation coefficient values are very close to zero, indicating that the Time Series doesn't exhibit Autocorrelation.
Applying plot_acf function
We can see a sudden drop at lag 1, indicating that there is no Autocorrelation inside the Time Series and that the Time Series is Stationary.
Cheers! Thanks for joining me. Happy Learning!
You can find the code here:
https://github.com/TharunAts/Time-Series-Forecasting-with-Tharun
References :
- https://otexts.com/fpp2/stationarity.html
- https://app.pluralsight.com/library/courses/r-time-series-analysis-forecasting/table-of-contents
- https://www.udemy.com/course/python-for-time-series-data-analysis/learn/lecture/13772656?start=345#notes