In this short tutorial, we are going to see how to use boostime
to fit two boosted models: an ARIMA + CatBoost model and a Prophet + CatBoost model.
First, we load the libraries that we are going to use during this tutorial.
library(tidymodels)
library(boostime)
library(modeltime)
library(tidyverse)
library(timetk)
library(lubridate)
# This toggles plots from plotly (interactive) to ggplot (static)
interactive <- FALSE
Next, we filter the dataset down to the single series we will use (M750) and visualize it:
m750 <- m4_monthly %>% filter(id == "M750")
m750 %>%
  plot_time_series(date, value, .interactive = interactive)
Let’s split the data into training and test sets using the initial_time_split() function:
splits <- initial_time_split(m750, prop = 0.8)
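If you want to sanity-check the split before modeling, here is a quick optional sketch that visualizes the training and test windows with timetk's cross-validation plan helpers (already loaded above):
# Visualize the train/test plan (optional check)
splits %>%
  tk_time_series_cv_plan() %>%
  plot_time_series_cv_plan(date, value, .interactive = interactive)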
In the first model, we will use an ARIMA model whose orders are selected automatically (the degree of differencing via KPSS unit root tests, and the remaining orders by minimizing the AICc). The residuals of this first model are then passed to a CatBoost model, and finally the outputs of both models are summed.
model_arima_catboost <- boost_arima() %>%
  set_engine("auto_arima_catboost", verbose = 0) %>%
  fit(value ~ date + month(date), data = training(splits))
model_arima_catboost
#> parsnip model object
#>
#> Fit time: 26.1s
#> ARIMA(1,0,1)(1,1,1)[12] with drift w/ Catboost Errors
#> ---
#> Model 1: Auto ARIMA
#> Series: outcome
#> ARIMA(1,0,1)(1,1,1)[12] with drift
#>
#> Coefficients:
#>          ar1      ma1   sar1     sma1    drift
#>       0.9600  -0.3275  0.195  -0.7330  17.4466
#> s.e.  0.0203   0.0706  0.121   0.1025   4.8051
#>
#> sigma^2 estimated as 23896: log likelihood=-1499.59
#> AIC=3011.18 AICc=3011.56 BIC=3031.87
#>
#> ---
#> Model 2: Catboost Errors
#>
#> CatBoost model (1000 trees)
#> Loss function: RMSE
#> Fit to 1 features
The second model will use Prophet, with CatBoost modeling the residuals:
model_prophet_catboost <- boost_prophet() %>%
  set_engine("prophet_catboost", verbose = 0) %>%
  fit(value ~ date + month(date), data = training(splits))
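As with the first model, printing the fitted object shows both the Prophet component and the CatBoost component (output omitted here):
model_prophet_catboost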
Here’s the general process and where the functions fit:

The Modeltime Workflow (diagram)

Since our models are already fitted, we will continue from step three.
The next step is to add each of the models to a Modeltime Table using modeltime_table(). This step does some basic checking to make sure each of the models is fitted, and organizes them into a scalable structure called a “Modeltime Table” that is used as part of our forecasting workflow. We have two models to add.
models_tbl <- modeltime_table(
  model_arima_catboost,
  model_prophet_catboost
)
models_tbl
#> # Modeltime Table
#> # A tibble: 2 x 3
#> .model_id .model .model_desc
#> <int> <list> <chr>
#> 1 1 <fit[+]> ARIMA(1,0,1)(1,1,1)[12] WITH DRIFT W/ CATBOOST ERRORS
#> 2 2 <fit[+]> PROPHET MODEL W/ CATBOOST ERROR SPECIFICATION
Calibrating adds a new column, .calibration_data, containing the test-set predictions and residuals. Calibration is how confidence intervals and accuracy metrics are determined, and the calibration data follows the models through the rest of the forecasting workflow.
calibration_tbl <- models_tbl %>%
  modeltime_calibrate(new_data = testing(splits))
calibration_tbl
#> # Modeltime Table
#> # A tibble: 2 x 5
#> .model_id .model .model_desc .type .calibration_da~
#> <int> <list> <chr> <chr> <list>
#> 1 1 <fit[+]> ARIMA(1,0,1)(1,1,1)[12] WITH DRIFT ~ Test <tibble [62 x 4~
#> 2 2 <fit[+]> PROPHET MODEL W/ CATBOOST ERROR SPE~ Test <tibble [62 x 4~
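To inspect the calibration data itself, one option is to unnest it with tidyr (loaded via the tidyverse); this is just an optional sketch for exploring the stored predictions and residuals:
# Unnest the per-model calibration tibbles into one flat table
calibration_tbl %>%
  select(.model_id, .calibration_data) %>%
  unnest(.calibration_data)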
There are two critical parts to an evaluation: visualizing the forecast against the test set and evaluating the test-set accuracy.

Visualizing the forecast versus the test data is easy to do using the interactive plotly visualization (just toggle the visibility of the models using the legend).
calibration_tbl %>%
  modeltime_forecast(
    new_data    = testing(splits),
    actual_data = m750
  ) %>%
  plot_modeltime_forecast(
    .legend_max_width = 25, # For mobile screens
    .interactive      = interactive
  )
We can use modeltime_accuracy() to collect common accuracy metrics. The default reports the following metrics using yardstick functions:
mae()
mape()
mase()
smape()
rmse()
rsq()
These of course can be customized following the rules for creating new yardstick metrics, but the defaults are very useful. Refer to default_forecast_accuracy_metrics() to learn more.
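For example, here is a minimal sketch of supplying a custom metric set via the metric_set argument of modeltime_accuracy() (the particular metrics chosen below are just an illustration):
# Build a custom yardstick metric set and use it for accuracy reporting
custom_metrics <- metric_set(mae, rmse, rsq)

calibration_tbl %>%
  modeltime_accuracy(metric_set = custom_metrics)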
To make table-creation a bit easier, I’ve included table_modeltime_accuracy() for outputting results in either interactive (reactable) or static (gt) tables.
calibration_tbl %>%
  modeltime_accuracy() %>%
  table_modeltime_accuracy(
    .interactive = interactive
  )
Accuracy Table

| .model_id | .model_desc | .type | mae | mape | mase | smape | rmse | rsq |
|---|---|---|---|---|---|---|---|---|
| 1 | ARIMA(1,0,1)(1,1,1)[12] WITH DRIFT W/ CATBOOST ERRORS | Test | 640.45 | 6.24 | 1.91 | 6.02 | 683.07 | 0.84 |
| 2 | PROPHET MODEL W/ CATBOOST ERROR SPECIFICATION | Test | 271.50 | 2.74 | 0.81 | 2.67 | 364.09 | 0.81 |
The final step is to refit the models to the full dataset using modeltime_refit() and forecast them forward.
refit_tbl <- calibration_tbl %>%
  modeltime_refit(data = m750)

refit_tbl %>%
  modeltime_forecast(h = "3 years", actual_data = m750) %>%
  plot_modeltime_forecast(
    .legend_max_width = 25, # For mobile screens
    .interactive      = interactive
  )
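If you prefer to keep the forecast as a tibble instead of plotting it (for example, to export the values), modeltime_forecast() simply returns a data frame; here is a minimal sketch, assuming you only want the predicted rows flagged by the .key column:
# Store the forecast and keep only the predicted (non-historical) rows
forecast_tbl <- refit_tbl %>%
  modeltime_forecast(h = "3 years", actual_data = m750)

forecast_tbl %>%
  filter(.key == "prediction") %>% # drop the historical "actual" rows
  glimpse()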