How to choose initial, period, horizon and cutoffs with Facebook Prophet?
I have around 23300 hourly datapoints in my dataset and I am trying to forecast with Facebook Prophet. To fine-tune the hyperparameters, one can use cross-validation:
from fbprophet.diagnostics import cross_validation
The whole procedure is shown here: https://facebook.github.io/prophet/docs/diagnostics.html
Using cross_validation, one needs to specify initial, period and horizon:
df_cv = cross_validation(m, initial='xxx', period='xxx', horizon = 'xxx')
I am now wondering how to configure these three values in my case. As stated, I have about 23300 hourly datapoints. Should the horizon be a particular fraction of the data, or is the exact fraction unimportant, so that I can pick whatever value seems appropriate?
Furthermore, cutoffs can also be defined explicitly, as below:
cutoffs = pd.to_datetime(['2013-02-15', '2013-08-15', '2014-02-15'])
df_cv2 = cross_validation(m, cutoffs=cutoffs, horizon='365 days')
Should these cutoffs be equally distributed as above, or can the cutoffs be set individually, however one likes?
Solution 1:[1]
initial is the first training period. It is the minimum amount of data needed to begin training on.
horizon is the length of time you want to evaluate your forecast over. Let's say that a retail outlet is building their model so that they can predict sales over the next month. A horizon set to 30 days would make sense here, so that they evaluate their model under the same setting in which they intend to use it.
period is the amount of time between each fold. It can be greater than, less than, or equal to the horizon.
cutoffs are the dates where each horizon will begin.
You can understand these terms by looking at this image -
credits: Forecasting Time Series Data with Facebook Prophet by Greg Rafferty
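To get a feel for how these settings interact on hourly data like that in the question, here is a rough sketch that counts how many folds would fit. It assumes folds are placed backwards from the end of the data, one every period, for as long as at least initial of history remains before the cutoff, which is how Prophet's automatic cutoffs are understood to work. The 48-hour horizon, the one-year initial, the 30-day period and the helper count_folds are hypothetical choices for illustration. The defaults used (initial is three times the horizon, period is half the horizon) are the ones stated in the Prophet diagnostics documentation.

```python
from datetime import timedelta


def count_folds(span, initial, period, horizon):
    """Count the cutoffs that fit when folds are placed backwards from the end.

    Hypothetical helper: assumes regularly spaced data covering `span`.
    """
    room = span - horizon - initial  # time left to slide the cutoff around
    if room < timedelta(0):
        return 0  # not enough data for even one fold
    return room // period + 1


# 23300 hourly points cover 23299 hours from first to last timestamp
span = timedelta(hours=23300 - 1)
horizon = timedelta(hours=48)  # assumption: we care about a two-day forecast

# Documented defaults: initial = 3 * horizon, period = 0.5 * horizon
default_folds = count_folds(span, 3 * horizon, horizon / 2, horizon)

# A sparser, hypothetical setup: one year of initial training, a fold every 30 days
sparse_folds = count_folds(span, timedelta(days=365), timedelta(days=30), horizon)

print("folds with defaults:", default_folds)
print("folds with sparse setup:", sparse_folds)
```

Under these assumptions the defaults would produce 963 folds and the sparser setup 21. Since the model is refit for every fold, a short horizon on a long hourly series usually calls for setting initial and period explicitly rather than relying on the defaults.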
Let's imagine that a retail outlet wants a model that can predict the next month of daily sales, and they plan on running the model at the beginning of each quarter. They have 3 years of data.
They would then set their initial training data to be 2 years. They want to predict the next month of sales, and so would set horizon to 30 days. They plan to run the model each business quarter, and so would set the period to 90 days. This is the setup shown in the image above.
Let's apply these parameters to our model:
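# model must already be fitted before it is cross-validated
df_cv = cross_validation(model,
                         horizon='30 days',   # evaluate one month ahead
                         period='90 days',    # start a new fold every quarter
                         initial='730 days')  # first two years are training only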
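To see which dates such a call would end up evaluating, the cutoffs can be worked out by hand. The sketch below is not Prophet's own code. It assumes three years of daily data running from 2013-01-01 to 2015-12-31 (a made-up date range) and assumes cutoffs are placed backwards from the last date minus the horizon, one every period, as long as at least initial of history precedes them. The helper sketch_cutoffs is hypothetical.

```python
from datetime import date, timedelta


def sketch_cutoffs(first_day, last_day, initial, period, horizon):
    """Place cutoffs backwards from the end of the data (illustrative helper)."""
    cutoffs = []
    cutoff = last_day - horizon  # the last fold must still fit a full horizon
    while cutoff >= first_day + initial:  # keep at least `initial` of training data
        cutoffs.append(cutoff)
        cutoff -= period
    return sorted(cutoffs)


horizon = timedelta(days=30)
cutoffs = sketch_cutoffs(
    first_day=date(2013, 1, 1),   # assumed start of the 3 years of data
    last_day=date(2015, 12, 31),  # assumed end of the data
    initial=timedelta(days=730),
    period=timedelta(days=90),
    horizon=horizon,
)

for c in cutoffs:
    # each fold trains on everything up to the cutoff, then scores the next 30 days
    print(f"train up to {c}, evaluate {c + timedelta(days=1)} .. {c + horizon}")
```

With these assumed dates there is room for four folds, all in the third year. Dates like these could also be passed directly through the cutoffs argument shown in the question if specific evaluation points are preferred.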
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | JATIN |