The goal of Hyper Parameter Tuning is to find a configuration that maximizes or minimizes a specific objective. Hyperparameter tuning can be performed in several ways: brute force (grid search), random search, Bayesian search, or model-based methods.
I have previously advocated the use of model-based methods, which proved to be very effective on some of the problems I encountered. You can find more information about SMAC, a powerful library for HPO, here:
You may also be interested in better understanding how these methods work and building your own Hyper Parameter Optimization library.
Here’s a fun way to do it using your knowledge of standard models like XGBoost, CatBoost or RandomForest:
Model-based hyperparameter optimization may sound confusing (using ML models to tune ML models), but it offers a very interesting advantage over other approaches: it supports categorical parameters. This means that we can encode the model type itself as a categorical parameter.
Let’s see how we can hack the standard Hyper Parameter Tuning function to speed up model selection and model building.
As mentioned in the introduction, the idea behind this hack is simple: can we treat model_type (i.e. XGBoost, Prophet, (S)ARIMA, etc.) as a parameter like any other and let the Hyper Parameter Tuning method do the work for us?
Fortunately, as noted above, model-based hyperparameter optimization supports categorical parameters. After all, its surrogate model is usually a (boosted or not) decision tree ensemble.
The direct consequence of this feature is that we can create a HyperModel whose parameters include, among others, a model type. model_type encodes the underlying model used for training. In Python, this gives:
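Below is a minimal sketch of what such a class could look like (the parameter names, whitelists and default values are illustrative, not the exact ones from the original code):

```python
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor


class HyperModel(BaseEstimator, RegressorMixin):
    """A regressor whose underlying model is chosen by the categorical
    hyperparameter ``model_type``."""

    # Whitelist of the hyperparameters that apply to each underlying model.
    WHITELISTS = {
        "XGBoost": {"n_estimators", "max_depth", "learning_rate", "gamma"},
        "RandomForest": {"n_estimators", "max_depth", "max_features"},
    }

    def __init__(self, model_type="XGBoost", n_estimators=100, max_depth=3,
                 learning_rate=0.1, gamma=0.0, max_features=1.0):
        self.model_type = model_type
        self.n_estimators = n_estimators
        self.max_depth = max_depth
        self.learning_rate = learning_rate
        self.gamma = gamma
        self.max_features = max_features

    def _make_model(self):
        # Keep only the parameters relevant to the selected model type.
        params = {name: value for name, value in self.get_params().items()
                  if name in self.WHITELISTS[self.model_type]}
        if self.model_type == "XGBoost":
            return XGBRegressor(**params)
        return RandomForestRegressor(**params)

    def fit(self, X, y):
        self.model_ = self._make_model()
        self.model_.fit(X, y)
        return self

    def predict(self, X):
        return self.model_.predict(X)
```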
This implementation of HyperModel follows the scikit-learn estimator interface, i.e. it provides fit, predict, set_params and get_params methods. For simplicity, we only support two models, XGBoost and RandomForest, but adding a new one only takes a few lines. You could try SVR, for example.
As you’ve probably noticed, we simply use a whitelist to retain the parameters that are applicable to a particular model. This is not the best way to handle the fact that parameters differ from one model to another. The correct way would be to use a conditional specification. The ConfigSpace Python library supports this, but it does not work with scikit-learn.
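For illustration, here is roughly what a conditional specification could look like with the classic ConfigSpace API (the parameters and ranges are illustrative, and the API may differ slightly depending on the ConfigSpace version):

```python
import ConfigSpace as CS
import ConfigSpace.hyperparameters as CSH
from ConfigSpace.conditions import EqualsCondition

cs = CS.ConfigurationSpace(seed=0)
model_type = CSH.CategoricalHyperparameter("model_type", ["XGBoost", "RandomForest"])
learning_rate = CSH.UniformFloatHyperparameter("learning_rate", 1e-3, 1.0, log=True)
max_features = CSH.UniformFloatHyperparameter("max_features", 0.1, 1.0)
cs.add_hyperparameters([model_type, learning_rate, max_features])

# learning_rate only exists for XGBoost, max_features only for RandomForest.
cs.add_conditions([
    EqualsCondition(learning_rate, model_type, "XGBoost"),
    EqualsCondition(max_features, model_type, "RandomForest"),
])
print(cs.sample_configuration())
```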
Using the new class for training and prediction is straightforward. Say we want to use XGBoost as the underlying model. This gives:
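A hypothetical usage on the Boston housing data could look like this (note that load_boston has been removed from recent scikit-learn versions, so this assumes an older release):

```python
from sklearn.datasets import load_boston          # removed in scikit-learn >= 1.2
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

X, y = load_boston(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pick XGBoost as the underlying model, with illustrative hyperparameters.
model = HyperModel(model_type="XGBoost", n_estimators=200, max_depth=4)
model.fit(X_train, y_train)
print(mean_absolute_error(y_test, model.predict(X_test)))
```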
Now all we have to do is use this model in the HPT / HPO phase and let it choose the best candidate model type.
Armed with our HyperModel, whose main parameter is the model type, we can use standard Hyper Parameter Tuning methods to identify the best model.
The important thing to remember is that there is absolutely no “best model” in general. When I write “best model,” I mean the best model for a given score. In this article, we are going to use the mean absolute error as the score.
Although I have used (and recommended) SMAC or a custom Hyper Parameter Optimization implementation to perform HP tuning in the two previous articles, we will try another method here. Never miss the opportunity to try something new 🙂
This time, we are going to use BayesSearchCV to explore the configuration space. The principle of Bayesian search is to build a surrogate model, using Gaussian processes, that estimates the score of the model.
Each new training run updates the surrogate model. The surrogate is then evaluated on randomly sampled configurations, and the one with the best predicted score is selected for the next actual training.
Because it uses a Gaussian process to learn the relationship between hyperparameters and the score of candidate models, it can be considered a model-based approach.
When everything is combined, we get the following lines of code:
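Here is a sketch of what that combination could look like with skopt’s BayesSearchCV (the exact search ranges are illustrative; the data and HyperModel come from the previous snippets):

```python
from skopt import BayesSearchCV
from skopt.space import Categorical, Integer, Real

# Joint search space: the model type is just another (categorical) parameter.
# Uniform priors for max_features, n_estimators and max_depth; log-uniform
# priors for gamma and learning_rate, which span several orders of magnitude.
search_space = {
    "model_type": Categorical(["XGBoost", "RandomForest"]),
    "n_estimators": Integer(10, 500),
    "max_depth": Integer(2, 10),
    "max_features": Real(0.1, 1.0, prior="uniform"),
    "learning_rate": Real(1e-3, 1.0, prior="log-uniform"),
    "gamma": Real(1e-6, 1.0, prior="log-uniform"),
}

opt = BayesSearchCV(
    HyperModel(),
    search_space,
    n_iter=50,                          # 50 configurations in total
    scoring="neg_mean_absolute_error",  # our metric: mean absolute error
    cv=3,
    random_state=0,
)
opt.fit(X_train, y_train)
print(opt.best_params_)   # model_type is expected to come out as "XGBoost"
print(-opt.best_score_)   # best cross-validated mean absolute error
```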
The configuration space is mostly defined using uniform distributions over the parameters. This means that the probability of picking any value within a given range is the same everywhere; this is the case for max_features, n_estimators or max_depth, for example. Conversely, gamma and learning_rate, which span several orders of magnitude, are sampled using a log-uniform distribution.
Executing this snippet of code shows that XGBoost seems to be the best option for this dataset.
Like me, you probably won’t trust the result of the algorithm until you’ve done some checking.
Fortunately, many other practitioners have already studied the Boston housing challenge. More specifically, it has been studied on Kaggle here by Shreayan Chaudhary, who reaches the same conclusion as our algorithm. That is good news.
However, one can never be too careful. Let’s do another simple check to make sure RandomForest can’t beat XGBoost if we give the HP Tuning study more iterations and optimize only for Random Forest:
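One way to sketch this check is to fix model_type on the estimator and keep only the Random Forest dimensions in the search space (reusing the imports and data from the previous snippets):

```python
# Restrict the search to RandomForest: fix model_type on the estimator and
# keep only the RandomForest-relevant dimensions (illustrative ranges).
rf_space = {
    "n_estimators": Integer(10, 500),
    "max_depth": Integer(2, 10),
    "max_features": Real(0.1, 1.0, prior="uniform"),
}

rf_opt = BayesSearchCV(
    HyperModel(model_type="RandomForest"),
    rf_space,
    n_iter=50,                          # 50 iterations for Random Forest alone
    scoring="neg_mean_absolute_error",
    cv=3,
    random_state=0,
)
rf_opt.fit(X_train, y_train)
print(-rf_opt.best_score_)              # compare with the XGBoost score above
```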
We simply reuse HyperModel, but this time we force the search to focus on a single model: RandomForestRegressor. We also allow more iterations: 50 for Random Forest alone, whereas previously 50 iterations covered both models.
The conclusion is the same: XGBoost remains more accurate in terms of mean absolute error: 2.68 for Random Forest versus 2.57 for XGBoost.
We may also have been “lucky”: the fact that XGBoost outperforms RandomForestRegressor may be purely random and tied to the seed used to initialize the Bayesian search: random_state=0.
Running the code several times with different random seeds should reassure you that we were not just lucky and that, in this case, XGBoost is the best option for the metric we chose: mean absolute error.
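For example, a quick (purely illustrative) way to run this check:

```python
# Repeat the search with several seeds and look at which model_type wins.
for seed in range(5):
    check = BayesSearchCV(HyperModel(), search_space, n_iter=50,
                          scoring="neg_mean_absolute_error", cv=3,
                          random_state=seed)
    check.fit(X_train, y_train)
    print(seed, check.best_params_["model_type"], -check.best_score_)
```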
Another very interesting aspect is how quickly our model selection discards a lame duck. To illustrate this point, we are going to add another model to the list supported by HyperModel: LinearRegression.
Intuitively, this is the worst candidate. Let’s see how long automatic model selection spends exploring this possibility. First, let’s add it to HyperModel:
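One quick way to extend the earlier HyperModel sketch (monkey-patching is used here only to keep the example short; in real code you would simply edit the class):

```python
from sklearn.linear_model import LinearRegression

# Register the new candidate: LinearRegression has no tunable parameter here,
# so its whitelist is empty.
HyperModel.WHITELISTS["LinearRegression"] = set()

def _make_model(self):
    # Same whitelist logic as before, extended with the new model type.
    params = {name: value for name, value in self.get_params().items()
              if name in self.WHITELISTS[self.model_type]}
    if self.model_type == "XGBoost":
        return XGBRegressor(**params)
    if self.model_type == "RandomForest":
        return RandomForestRegressor(**params)
    return LinearRegression(**params)

HyperModel._make_model = _make_model

# And add the new value to the search space.
search_space["model_type"] = Categorical(
    ["XGBoost", "RandomForest", "LinearRegression"])
```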
We are going to use a dirty hack to count how many times each configuration is explored: we’ll simply print it on line 42 (believe it or not), and running a few greps gives the following statistics (a minimal sketch of this counting hack is shown after the list below):
- Linear regression was explored 3 times.
- Random Forest, 17 times.
- XGBoost, 30 times.
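A minimal, hypothetical version of this counting hack could look like the following (the Counter and wrapper below are illustrative, not the original print-on-line-42 trick):

```python
from collections import Counter

model_type_counts = Counter()
_original_fit = HyperModel.fit

def counting_fit(self, X, y):
    # Tally the model type every time a configuration is actually trained.
    model_type_counts[self.model_type] += 1
    return _original_fit(self, X, y)

HyperModel.fit = counting_fit

# After running the 50-iteration search again:
# print(model_type_counts)
# -> something like Counter({'XGBoost': 30, 'RandomForest': 17, 'LinearRegression': 3})
```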
It is remarkable that, with only 50 iterations, our model selection has learned to focus on the most promising model, XGBoost, and abandoned linear regression after only three trials.
We could have ruled out linear regression even faster if we had used a conditional configuration in ConfigSpace instead of a whitelist of allowed parameters; the whitelist artificially inflates the number of unnecessary experiments.
Extending hyperparameter tuning to model selection works, and it was fairly easy to show in a few lines of code.
We have used Bayesian search to explore the configuration space, but we could have used any other efficient Hyper Parameter Tuning method.
Another idea worth trying is to use Hyper Parameter Optimization methods to select features. It would probably work effectively as well.