
Benchmark Methods and Techniques one should know while Forecasting

Hi everyone, I am Tharun, and today I am going to discuss benchmark methods for forecasting that are simple to understand and effective for some time series problems. So let's start the topic!

Benchmark Methods for Forecasting:
  • Average Method
  • Naive Method
  • Seasonal Naive Method
  • Drift Method

Average Method:

In the Average Method, the forecasts of all future values are equal to the average (mean) of the historical data:

$$\hat{y}_{T+h|T} = \bar{y} = \frac{y_1 + y_2 + \dots + y_T}{T}$$

  • h is the forecast horizon, i.e. how far ahead you want to forecast (next 6 months, 1 year, etc.)
  • y1, y2, ..., yT are the historical data points.

So let's take an example dataset and discuss this method.

I have taken a dataset that contains monthly milk production from 1962 to 1975.

Let's have a quick look at the head of the dataset.



Now let's plot the dataset:



Now I split the dataset into train and test as shown below.



I will build the model on the years 1962 to 1974 and test it on the year 1975.
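Here is a minimal sketch of that kind of split, written in plain Python so it runs without the dataset. The function name `split_last_year` and the toy series are my own illustrative assumptions, not the code from the notebook. The key point is that a time series is split by time order (the last 12 months are held out), never at random.

```python
def split_last_year(values, test_size=12):
    """Hold out the final `test_size` observations as the test set.

    `values` must already be sorted in time order.
    """
    if not 0 < test_size < len(values):
        raise ValueError("test_size must be between 1 and len(values) - 1")
    return values[:-test_size], values[-test_size:]


# Made-up stand-in for a monthly series: 36 values = 3 years (not the milk data).
series = [100 + i for i in range(36)]
train, test = split_last_year(series)
print(len(train), len(test))  # 24 12
```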

As discussed in the definition, we compute the mean of the entire training dataset (1962 to 1974) and use it as the prediction for every month of the forecasting period (1975). I computed it with the mean() method in pandas (df.mean()), as shown below.
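The idea can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions (the name `average_forecast` and the toy numbers are hypothetical); with pandas, the mean of the training column plays the same role.

```python
from statistics import mean


def average_forecast(train, horizon):
    """Average method: every future value is forecast as the training mean."""
    if not train:
        raise ValueError("train must not be empty")
    return [mean(train)] * horizon


train = [10.0, 12.0, 14.0, 16.0]  # toy values, not the milk data
print(average_forecast(train, horizon=3))  # [13.0, 13.0, 13.0]
```

Note that the forecast is a flat line: it does not depend on how far ahead you forecast.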


Now let's plot the predictions with original values and see how well our model fits the data.


The blue line shows the predicted values and the green line shows the actual values. We can clearly see that the Average model captures neither the trend nor the seasonality.

Let's check MAPE and MAE errors for the Time Series:


You can clearly see that the MAPE and MAE errors are very high, indicating that the Average Method is not the right fit for this time series. Still, it is always good to know these methods, because the advanced methods we develop later should achieve better MAE or MAPE values than this benchmark. If they do not, we can simply use these simple methods for forecasting.

(I explain MAE and MAPE in the bonus section at the end of the post.)

Naive Method:

In the Naive Method, we simply set all forecast values to the value of the most recent observation (the last observation).

I used the same dataset and the same train/test split as for the Average Method. I am taking you directly to the forecasting part because all the remaining steps are quite similar.



You can clearly see that all the forecast values are equal to the most recent observation in the training dataset, i.e. the production at '1974-12-01', which is 813.
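A minimal sketch of the Naive Method in plain Python, assuming the training values are in time order. The function name `naive_forecast` is my own, and apart from the final 813 the training values below are made up for illustration.

```python
def naive_forecast(train, horizon):
    """Naive method: repeat the last observed value for every future step."""
    if not train:
        raise ValueError("train must not be empty")
    return [train[-1]] * horizon


train = [790, 805, 813]  # toy values; only the last one affects the forecast
print(naive_forecast(train, horizon=4))  # [813, 813, 813, 813]
```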

Now let's plot the Actual vs Forecasted values for the Naive Method.


The blue line shows the predicted values and the green line shows the actual values.

Let's quickly check the MAE and MAPE metrics:


We can clearly say that the MAE and MAPE values are better than those of the Average Method. Still, this method does not follow the actual trend of the dataset.

Seasonal Naive Method:

In the Seasonal Naive Method, we set each forecast equal to the last observed value from the same season (for monthly data, the same month of the previous year):

$$\hat{y}_{T+h|T} = y_{T+h-m(k+1)}$$

Here m represents the seasonal period (m = 12 for monthly data) and k is the integer part of (h-1)/m, i.e. the number of complete seasonal cycles in the forecast period before time T+h.

The Seasonal Naive Method works well with highly seasonal data. We simply reuse the previous year's values as forecasts. For example, the forecast for January 2021 is the observed value for January 2020. Now let's apply this to our milk production dataset.



The forecast values are equal to the values from the same month of the previous year.
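A minimal sketch of the Seasonal Naive Method in plain Python. The function name `seasonal_naive_forecast` and the toy quarterly series are my own illustrative assumptions; for the monthly milk data the seasonal period would be m = 12. If the horizon is longer than one season, the last observed season is simply repeated.

```python
def seasonal_naive_forecast(train, horizon, m=12):
    """Seasonal naive: reuse the value from the same season of the last cycle."""
    if len(train) < m:
        raise ValueError("need at least one full season of training data")
    last_season = train[-m:]  # the most recent m observations
    # Step h (1-based) maps to position (h - 1) % m within the last season.
    return [last_season[(h - 1) % m] for h in range(1, horizon + 1)]


# Toy quarterly series (m = 4), two full years; not the milk data.
train = [1, 2, 3, 4, 5, 6, 7, 8]
print(seasonal_naive_forecast(train, horizon=6, m=4))  # [5, 6, 7, 8, 5, 6]
```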

Now let's plot Actual vs Forecasted for Seasonal Naive Method:



You can clearly see that the Seasonal Naive Method almost matches the actual pattern and performs well. 

Let's quickly check the MAE and MAPE error metrics:



The MAPE and MAE values are very low, which strongly indicates that our model performed well. So we can use this method as a benchmark and then move on to advanced models. If those models achieve better MAE/MAPE values, you can choose them; otherwise you can stay with this simple method.

The reason is that simple models often generalize better to upcoming data than complex models.

Let's finally check the Drift Method and close the session!

Drift Method:

In simple terms, the drift method draws a straight line between the first and last observations and extrapolates it into the future.
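In formula terms, the forecast h steps ahead is y_T + h * (y_T - y_1) / (T - 1), where the fraction is the average change per step between the first and last observations. A minimal sketch in plain Python follows; the function name `drift_forecast` and the toy numbers are my own illustrative assumptions, not the notebook code.

```python
def drift_forecast(train, horizon):
    """Drift method: extend the line through the first and last observations."""
    if len(train) < 2:
        raise ValueError("need at least two observations to estimate the drift")
    # Average change per time step between the first and last observations.
    slope = (train[-1] - train[0]) / (len(train) - 1)
    return [train[-1] + h * slope for h in range(1, horizon + 1)]


train = [10.0, 13.0, 11.0, 16.0]  # toy values; slope = (16 - 10) / 3 = 2.0
print(drift_forecast(train, horizon=3))  # [18.0, 20.0, 22.0]
```

Only the first and last training values matter here; everything in between is ignored.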

Here is the plot of Actual vs Forecasted values using the Drift Method.


I generated the predictions from the Drift Method and evaluated the MAE and MAPE errors. You can check the attached notebook for complete details.

All benchmark predictions vs actual values over the test period (1975):



You can clearly see that the red line (Seasonal Naive Method) fits the actual values in the test year (1975) well. So the Seasonal Naive Method can be used as a benchmark when working with this dataset.


Cheers! Happy learning, and thanks for joining me. Feel free to use the comment section for any questions. In the bonus section below, I discuss the error metrics MAE and MAPE.

You can find the code at:

https://github.com/TharunAts/Time-Series-Forecasting-with-Tharun/blob/master/Benchmark_Models.ipynb


Bonus:

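Mean Absolute Error (MAE):

The MAE is the average of all absolute errors.

MAE can be calculated as:

$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|$$

where y_i is the actual value, \hat{y}_i is the predicted value, and n is the number of data points.

This can be explained in 3 steps:
  • Find the absolute difference between each actual and predicted value (the absolute errors).
  • Sum them up.
  • Divide the sum by the number of data points.

MAE is a widely used error metric in the world of Data Science.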
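A minimal sketch of MAE in plain Python. The function name `mae` and the toy numbers are my own illustrative assumptions.

```python
def mae(actual, predicted):
    """Mean Absolute Error: the average of |actual - predicted|."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


actual = [100, 110, 120]
predicted = [90, 115, 120]
# Absolute errors are 10, 5 and 0, so MAE = 15 / 3.
print(mae(actual, predicted))  # 5.0
```

MAE is expressed in the same units as the data, which makes it easy to interpret.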

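Mean Absolute Percentage Error (MAPE):

MAPE expresses forecast accuracy as a percentage: it is the average, over all time periods, of the absolute difference between the actual and predicted values divided by the actual value.

MAPE can be calculated as:

$$\text{MAPE} = \frac{100}{n}\sum_{i=1}^{n} \left|\frac{y_i - \hat{y}_i}{y_i}\right|$$

where y_i is the actual value, \hat{y}_i is the predicted value, and n is the number of data points.

MAPE has its pitfalls (for example, it is undefined when an actual value is zero), but it is widely used in Time Series Forecasting problems, so it is worth knowing.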
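A minimal sketch of MAPE in plain Python. The function name `mape` and the toy numbers are my own illustrative assumptions. Raising an error when an actual value is zero is one possible design choice, since the percentage error is not defined in that case.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, returned as a percentage."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100 * total / len(actual)


actual = [100, 200, 400]
predicted = [110, 180, 400]
# Percentage errors are 10%, 10% and 0%, so MAPE is 20% / 3.
print(round(mape(actual, predicted), 2))  # 6.67
```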


References:
  1. Hyndman, R.J., & Athanasopoulos, G. (2018). Forecasting: Principles and Practice, 2nd edition. OTexts: Melbourne, Australia. OTexts.com/fpp2. Accessed on 26 April 2020.
  2. https://www.statisticshowto.com/absolute-error/










