Hi everyone, I am Tharun, and today I am going to discuss the basic statistics to look at when working with time series data:
- Trend
- Seasonality
- Cyclic
- Mean
- Variance
- Stationarity
I am sharing the head of the dataset for your reference.
I plotted the time series using pandas as shown below:
Let's examine this graph and discuss each of the statistics listed above.
Trend :
A trend exists when there is a long-term increase or decrease in the data. It does not have to be linear. In the above series, we can clearly see an upward trend over time. This indicates that the number of airline passengers increased from 1949 to 1961.
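If the trend is hard to see by eye, one common way to bring it out is a rolling mean over one full year. Here is a minimal sketch, assuming monthly data. The series below is synthetic (a steady rise plus a repeating 12-month pattern that I made up), not the airline dataset, and the name `seasonal` is just for illustration.

```python
import pandas as pd

# Synthetic monthly series (NOT the airline data): a steady rise plus a
# repeating 12-month pattern, used only to illustrate the technique.
index = pd.date_range("1949-01-01", periods=48, freq="MS")
seasonal = [0, 1, 3, 2, 4, 8, 12, 12, 6, 2, -2, 0]  # hypothetical monthly offsets
values = [100 + 2 * i + seasonal[i % 12] for i in range(48)]
series = pd.Series(values, index=index)

# A centered 12-month rolling mean averages over one full seasonal cycle,
# so the seasonal ups and downs cancel and an estimate of the trend remains.
trend = series.rolling(window=12, center=True).mean().dropna()

print(trend.head())
print("Trend is rising:", trend.is_monotonic_increasing)
```

The window of 12 is chosen to match the yearly pattern in monthly data. For daily or weekly data you would pick a window that covers one full seasonal cycle instead.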
Seasonality :
A seasonal pattern occurs when a time series is affected by seasonal factors such as the time of the year or the day of the week. In the above time series, you can see a peak every year in the months of July and August. This is likely because many people fly back home after their summer vacation. So we can clearly say that the airline time series has seasonality.
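A quick way to check for a yearly pattern is to average the values by calendar month and see which months stand out. This is only a rough sketch on a synthetic monthly series (not the airline dataset), and the offsets in `seasonal` are made up so that July and August are the high months.

```python
import pandas as pd

# Synthetic monthly series (NOT the airline data) with a bump in July/August.
index = pd.date_range("1949-01-01", periods=48, freq="MS")
seasonal = [0, 1, 3, 2, 4, 8, 12, 12, 6, 2, -2, 0]  # hypothetical monthly offsets
values = [100 + 2 * i + seasonal[i % 12] for i in range(48)]
series = pd.Series(values, index=index)

# Average all Januaries together, all Februaries together, and so on.
monthly_avg = series.groupby(series.index.month).mean()

print(monthly_avg)
print("Two highest months:", sorted(monthly_avg.nlargest(2).index))
```

Note that the upward trend also leaks into these averages (later months in a year sit slightly higher), so on real data it can help to remove the trend before comparing months.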
Cyclic:
A cycle occurs when the data exhibit rises and falls that are not of a fixed frequency. These fluctuations are usually due to economic conditions.
Mean :
The mean is simply the average: add all the data points in the dataset and divide the total by the number of points. We need to know whether this statistic is constant over the entire time series.
To check this, we can compare the mean of an early portion of the data (for example, the first quarter) with the mean of a late portion (the last quarter). If they are roughly the same, we can say that the mean is constant over the entire time series.
I checked the mean of the first 2 years (head part) and the last 2 years (tail part) of the current time series. You can see the results here.
Clearly, the mean of the above time series is not constant.
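The head versus tail comparison can be sketched in a few lines of pandas. I am using a synthetic monthly series here (not the airline dataset), so the numbers it prints are not the results from the post. They only show how the check works.

```python
import pandas as pd

# Synthetic monthly series (NOT the airline data): upward trend plus a
# made-up repeating 12-month pattern.
index = pd.date_range("1949-01-01", periods=48, freq="MS")
seasonal = [0, 1, 3, 2, 4, 8, 12, 12, 6, 2, -2, 0]  # hypothetical monthly offsets
values = [100 + 2 * i + seasonal[i % 12] for i in range(48)]
series = pd.Series(values, index=index)

# With monthly data, 24 rows correspond to 2 years.
head_mean = series.iloc[:24].mean()   # first 2 years
tail_mean = series.iloc[-24:].mean()  # last 2 years

print("Mean of first 2 years:", head_mean)
print("Mean of last 2 years: ", tail_mean)
```

Because the synthetic series has an upward trend, the tail mean comes out well above the head mean, which is the same signal of a non-constant mean described above.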
Variance :
Variance is an indication of the variability of the data points, that is, how far they spread out around the mean. If the swings away from the mean become wider (or narrower) as time goes on, then the variance is not constant.
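The same head versus tail idea used for the mean works for the variance too. Below is a minimal sketch on a synthetic monthly series (not the airline dataset) where the seasonal swings grow with the level of the series. The multipliers in `factor` are made up for illustration.

```python
import pandas as pd

# Synthetic monthly series (NOT the airline data): the seasonal effect is
# multiplicative, so the swings get wider as the level rises.
index = pd.date_range("1949-01-01", periods=48, freq="MS")
factor = [1.0, 1.0, 1.1, 1.0, 1.1, 1.2, 1.3, 1.3, 1.1, 1.0, 0.9, 1.0]  # hypothetical
values = [(100 + 2 * i) * factor[i % 12] for i in range(48)]
series = pd.Series(values, index=index)

# Sample variance of the first and last 2 years (24 monthly rows each).
head_var = series.iloc[:24].var()
tail_var = series.iloc[-24:].var()

print("Variance of first 2 years:", round(head_var, 2))
print("Variance of last 2 years: ", round(tail_var, 2))
print("Variance grows over time: ", tail_var > head_var)
```

If the two variances are close, the spread is roughly constant. A large gap, as in this synthetic example, suggests the variance changes over time.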
Stationarity :
Stationarity is an important aspect of a time series. If a time series is stationary, then its statistical properties such as mean, variance, and autocorrelation remain constant throughout the entire series. If they vary, then we say that the time series is non-stationary.
If a time series contains trend or seasonality, then it is non-stationary, since its statistical properties (such as the mean) change over time. The airline passenger dataset is a clear example of a non-stationary time series. Before choosing any statistical model for a time series, we have to identify whether the series is stationary or non-stationary and then select a suitable model for the dataset.
- Some algorithms can only handle stationary time series.
- Some algorithms can handle both stationary and non-stationary series.
The airline series above falls in the non-stationary group, since it contains both trend and seasonality.
Let us look into another example.
This time series contains the total number of female births every day in the year 1959.
Looking at this time series, we can clearly say that there is no trend (no long-term increase or decrease in the data) and no seasonality (no pattern that repeats over a fixed interval of time). The mean and variance are constant over time. So we can say this time series is stationary.
However, for some time series it is difficult to tell from the graph alone whether they are stationary or non-stationary. There are statistical techniques, such as the Augmented Dickey-Fuller test, to check for stationarity, which I will discuss in upcoming blog posts. So stay connected for updates.
I am attaching another example. Try to identify whether it is stationary or non-stationary, and please justify your answer in the comments.
References:
- https://otexts.com/fpp2/stationarity.html
- https://app.pluralsight.com/library/courses/r-time-series-analysis-forecasting/table-of-contents
- https://www.udemy.com/course/python-for-time-series-data-analysis/learn/lecture/13772656?start=345#notes