
Basic statistics required while working with Time Series data


Hi everyone, I am Tharun and today I am going to discuss the basic statistics to be observed while working with Time Series data.
  1. Trend
  2. Seasonality
  3. Cyclic
  4. Mean
  5. Variance
  6. Stationarity
I use the airlines dataset, which contains the number of passengers who travelled from 1949 to 1961, to discuss these topics briefly.

Here is the head of the dataset for your reference.




I plotted the time series using pandas, as shown below:




Let's examine this graph and discuss the statistics listed above.
 
Trend :

A trend exists when there is a long-term increase or decrease in the data. It does not have to be linear. In the above series, we can clearly see an upward trend over time. This indicates that the number of airline passengers increased from 1949 to 1961.
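One common way to make a trend easier to see is to smooth the series with a moving average. The following is only a minimal sketch: the series is synthetic (a straight-line trend plus a 12-month wave), not the airline data, and the helper name `moving_average` is made up for illustration.

```python
import math


def moving_average(values, window):
    """Average of each run of `window` consecutive points."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Synthetic monthly series (illustrative only, NOT the airline data):
# a linear upward trend plus a wave that repeats every 12 months.
series = [100 + 2 * t + 20 * math.sin(2 * math.pi * t / 12) for t in range(48)]

# A 12-month window averages out the yearly wave and leaves the trend.
trend = moving_average(series, window=12)

# The smoothed values rise at every step, i.e. the trend is upward.
is_upward = all(later > earlier for earlier, later in zip(trend, trend[1:]))
print("upward trend?", is_upward)
```

The window length matches the length of the seasonal pattern on purpose, so that the seasonal ups and downs cancel inside each window.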

Seasonality : 

A seasonal pattern occurs when a time series is affected by seasonal factors such as the time of the year or the day of the week. In the above time series, you can see a peak every year in July and August. This is due to many people flying back home after their summer vacation. So we can clearly tell that the airlines time series has seasonality.
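A simple way to look for such a pattern is to compare each month with the average of its own year and then average those differences across years. This is a rough sketch under stated assumptions: the data is synthetic with made-up seasonal offsets that peak in July and August (it is not the airline data), and `seasonal_profile` is a hypothetical helper name.

```python
from statistics import mean

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Made-up seasonal offsets (illustrative only), highest in Jul and Aug.
seasonal = [-10, -12, 0, -2, 0, 12, 30, 25, 8, -8, -20, -23]

# Four years of synthetic monthly data: upward trend plus the offsets.
series = [100 + 2 * t + seasonal[t % 12] for t in range(48)]


def seasonal_profile(values, period=12):
    """Average deviation from the yearly mean for each month position."""
    n_periods = len(values) // period
    deviations = [[] for _ in range(period)]
    for p in range(n_periods):
        year = values[p * period:(p + 1) * period]
        year_mean = mean(year)
        for position, value in enumerate(year):
            deviations[position].append(value - year_mean)
    return [mean(d) for d in deviations]


profile = seasonal_profile(series)
peak_month = MONTHS[profile.index(max(profile))]
print("peak month:", peak_month)
```

Subtracting each year's own mean keeps the long-term growth from hiding the repeating monthly pattern.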

Cyclic: 

A cycle occurs when the data exhibit rises and falls that are not of a fixed frequency. These fluctuations are usually due to economic conditions.

Mean :

The mean is simply the average: add up all the data points in the dataset and divide the total by the number of points. We need to know whether this statistic is constant over the entire time series.

To check this, we can compare the mean of the first quarter of the data with the mean of the last quarter. If they are roughly the same, we can say that the mean is constant over the entire time series.

I checked the mean of the first 2 years (head part) and the last 2 years (tail part) of the current time series. You can see the results here.



Clearly, the mean of the above time series is not constant.
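The head-versus-tail comparison described above can be sketched as follows. The series here is synthetic (it is not the airline data), and the 5% threshold used to call two means "the same" is an arbitrary assumption for illustration.

```python
from statistics import mean

# Synthetic monthly series (illustrative only, NOT the airline data):
# 12 years of steady growth plus a repeating seasonal offset.
seasonal = [-10, -12, 0, -2, 0, 12, 30, 25, 8, -8, -20, -23]
series = [100 + 2 * t + seasonal[t % 12] for t in range(144)]

head_mean = mean(series[:24])   # first 2 years
tail_mean = mean(series[-24:])  # last 2 years
print(f"head mean: {head_mean:.1f}, tail mean: {tail_mean:.1f}")

# Assumed rule of thumb: treat the mean as constant only if the two
# values differ by less than 5% of the overall mean.
mean_is_constant = abs(tail_mean - head_mean) < 0.05 * mean(series)
print("mean constant?", mean_is_constant)
```

Using whole years for both slices keeps the seasonal ups and downs from distorting the comparison.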

Variance : 

Variance indicates how widely the data points are spread around the mean. If the size of the swings around the mean changes over time (for example, the fluctuations get wider as the series progresses), then the variance is not constant.
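The same head-versus-tail idea works for variance. This is a minimal sketch on a synthetic series whose seasonal swings grow with the level of the series; it is not the airline data, and the "more than double" threshold is an arbitrary assumption.

```python
import math
from statistics import pvariance

# Synthetic monthly series (illustrative only, NOT the airline data):
# the seasonal swing is a fixed 20% of the current level, so the
# swings get wider as the level rises.
series = [
    (100 + 2 * t) * (1 + 0.2 * math.sin(2 * math.pi * t / 12))
    for t in range(144)
]

head_var = pvariance(series[:24])   # first 2 years
tail_var = pvariance(series[-24:])  # last 2 years
print(f"head variance: {head_var:.1f}, tail variance: {tail_var:.1f}")

# Assumed rule of thumb: flag the variance as growing if the tail
# variance is more than double the head variance.
variance_grows = tail_var > 2 * head_var
print("variance grows?", variance_grows)
```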

Stationarity :

Stationarity is an important aspect of a time series. If a time series is stationary, then its statistical properties such as mean, variance and autocorrelation remain constant throughout the entire series. If they vary, then we say that the time series is non-stationary.

If a time series contains trend or seasonality, then it is non-stationary, since the mean and variance are not constant. The airline passenger dataset is a clear example of a non-stationary time series. Before choosing any statistical model for a time series, we have to identify whether the series is stationary or non-stationary and then select a suitable model for the dataset.
  • Some algorithms can only handle stationary time series.
  • Some algorithms can handle both stationary and non-stationary series.
So we should be very clear whether the time series is Stationary or not. I will discuss this in detail in the upcoming blog posts.

The above airlines dataset is clearly a non-stationary series, since it contains both trend and seasonality.

Let us look into another example.


This time series contains the total number of female births on each day of the year 1959.

Looking at this time series, we can clearly say that there is no trend (no long-term increase or decrease in the data) and no seasonality (no repeating pattern over a fixed interval of time). The mean and variance are constant over time. So we can say this time series is stationary.
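The two visual checks used so far (constant mean, constant variance) can be combined into a rough helper. This is only a heuristic sketch and not a formal statistical test: it ignores autocorrelation, the tolerances are arbitrary assumptions, the function name `looks_stationary` is made up, and both series are synthetic rather than the datasets shown in this post.

```python
import random
from statistics import mean, pvariance


def looks_stationary(values, mean_tol=0.1, var_ratio_tol=2.0):
    """Rough check: do the two halves have similar mean and variance?

    Heuristic only. The tolerances are illustrative assumptions.
    """
    half = len(values) // 2
    first, second = values[:half], values[half:]
    # Shift in mean between the halves, relative to the overall mean.
    mean_shift = abs(mean(second) - mean(first)) / abs(mean(values))
    # Ratio of the larger half-variance to the smaller one.
    var_first, var_second = pvariance(first), pvariance(second)
    var_ratio = max(var_first, var_second) / min(var_first, var_second)
    return mean_shift < mean_tol and var_ratio < var_ratio_tol


rng = random.Random(0)  # fixed seed so the example is repeatable

# Synthetic daily series that fluctuates around a fixed level.
flat = [40 + rng.gauss(0, 7) for _ in range(365)]

# Synthetic monthly series with a steady upward trend.
trending = [100 + 2 * t + rng.gauss(0, 7) for t in range(144)]

print("flat series looks stationary?", looks_stationary(flat))
print("trending series looks stationary?", looks_stationary(trending))
```

A check like this can only rule out obvious cases; borderline series still need a proper statistical test.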

However, for some time series it is difficult to tell whether they are stationary or non-stationary just by looking at the graph. There are mathematical techniques, such as the Augmented Dickey-Fuller test, for checking stationarity, which I will discuss in upcoming blog posts. So stay connected for updates.

I am attaching another example. Try to identify whether it is stationary or non-stationary. Please justify your answer in the comments.
 

References:
  • https://otexts.com/fpp2/stationarity.html
  • https://app.pluralsight.com/library/courses/r-time-series-analysis-forecasting/table-of-contents
  • https://www.udemy.com/course/python-for-time-series-data-analysis/learn/lecture/13772656?start=345#notes













 
