Output of PyCaret compare_models() function shows fewer models than PyCaret supports
I'm new to PyCaret, and there are several things I don't understand about this library:
- According to this tutorial, https://github.com/pycaret/pycaret/blob/master/tutorials/Regression%20Tutorial%20Level%20Beginner%20-%20REG101.ipynb, there are 25 regressors available in PyCaret's model library.
I'm using PyCaret for a regression.
But the output of the compare_models() function shows fewer models. More precisely, I compared the two lists and the following models are missing: ard, tr, kr, svm, mlp, xgboost, catboost.
Perhaps these models are not suitable for regression? Or do I have to create individual models with:
ard = create_model('ard', fold = 5)
tr = create_model('tr', fold = 5)
kr = create_model('kr', fold = 5)
svm = create_model('svm', fold = 5)
mlp = create_model('mlp', fold = 5)
xgboost = create_model('xgboost', fold = 5)
catboost = create_model('catboost', fold = 5)
I initialize the PyCaret environment with the setup() function:
regression = setup(data = dataset_predictions_meteo, target = 'TEMPERATURE_OBSERVEE', categorical_features = ['MonthNumber' , 'origine' , 'LIB_SOURCE'], numeric_features = ['DIFF_HOURS' , 'TEMPERATURE_PREDITE'],
session_id=123, train_size=0.8, normalize=True, #transform_target=True,
remove_perfect_collinearity = True)
Before the pre-processing pipeline, my original dataset is:
LIB_SOURCE TEMPERATURE_PREDITE DIFF_HOURS TEMPERATURE_OBSERVEE MonthNumber origine
gfs_025 10.376662 348.0 5.9500 12 Sencrop
gfs_025 8.688105 351.0 6.6200 12 Sencrop
gfs_025 5.323708 354.0 1.1250 12 Sencrop
gfs_025 5.271800 357.0 -1.5425 12 Sencrop
gfs_025 6.889182 324.0 5.9500 12 Sencrop
gfs_025 15.815905 336.0 23.7150 5 Visiogreen
gfs_025 15.294277 339.0 19.5925 5 Visiogreen
gfs_025 19.515454 342.0 25.3750 5 Visiogreen
gfs_025 25.983438 345.0 34.1500 5 Visiogreen
gfs_025 28.534859 348.0 37.6650 5 Visiogreen
After the pre-processing pipeline, my dataset is:
get_config('X')
TEMPERATURE_PREDITE DIFF_HOURS LIB_SOURCE_arome_001 LIB_SOURCE_arpege_01 LIB_SOURCE_gfs_025 MonthNumber_1 MonthNumber_10 MonthNumber_11 MonthNumber_12 MonthNumber_2 MonthNumber_3 MonthNumber_4 MonthNumber_5 MonthNumber_8 origine_Sencrop
-0.142182 2.887928 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0
-0.446260 2.921703 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0
-1.052127 2.955477 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0
-1.061474 2.989251 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0
-0.770213 2.617735 0.0 0.0 1.0 0.0 0.
0.837327 2.752832 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0
0.743391 2.786606 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0
1.503548 2.820380 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0
2.668314 2.854154 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0
3.127778 2.887928 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0
Because I set remove_perfect_collinearity to True in the setup() function, I expected that for the categorical variable LIB_SOURCE, which has 3 values, PyCaret would keep only 2 columns, but that is not the case. It seems to work better for the origine variable, which has 2 values: after the pre-processing pipeline, PyCaret keeps only 1 column.
Additionally, PyCaret transforms the numeric features DIFF_HOURS and TEMPERATURE_PREDITE because normalize is set to True in the setup() function. But as I understand it, normalization should map values to between 0 and 1, which is not the case here.
Thanks.
Solution 1:[1]
The models that show up depend on the model libraries that are installed in your environment, plus your setup. The list may differ from the tutorial because many of the model libraries that you mentioned are not installed by default.
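As a quick way to test this explanation, the sketch below checks whether the third-party packages behind some of the missing models can be imported at all. The helper name missing_optional_packages is hypothetical (it is not part of PyCaret), and the list of package names is an assumption based on the model IDs in the question.

```python
from importlib.util import find_spec

# Hypothetical helper (not part of PyCaret): report packages that cannot be imported.
def missing_optional_packages(names):
    """Return the names from `names` that are not importable in this environment."""
    return [name for name in names if find_spec(name) is None]

# Assumption: the xgboost and catboost models rely on pip packages of the same name.
optional = ["xgboost", "catboost"]
print("Not installed:", missing_optional_packages(optional))
```

If a name is printed, installing that package and re-running setup() should, if this answer is right, make the corresponding model appear. Note that ard, tr, kr, svm and mlp are not separate packages, so this check does not explain them. As far as I know, compare_models() also has a turbo argument (default True) that skips slower estimators, which may be worth checking in the PyCaret documentation.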
Solution 2:[2]
According to the PyCaret documentation, the import below gives you only the regression models, and as you can see in the tutorial, it includes all of them.
from pycaret.regression import *
Set up your experiment as below:
exp_reg101 = setup(data = data, target = 'Price', session_id=123)
You don't need to split your data into X and y; just pass your whole dataset, including the target variable, and specify the target. Then compare your models with the line below:
best = compare_models()
This is the most basic usage, without any extra parameters (you can add parameters later); it should give you a comparison of all the regression models. I don't think the problem is your environment, but just in case, try it on Colab.
For your other question, I think there is confusion between StandardScaler and MinMaxScaler. You should look at MinMaxScaler if you want your data to range between 0 and 1. I can't say much more, but check the scaler types; as far as I know there are three: StandardScaler, MinMaxScaler and RobustScaler.
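To make the difference between the two scalers concrete, here is a minimal pure-Python sketch of both transformations. It is not PyCaret's or scikit-learn's implementation; the function names and the sample values are made up for illustration, and it assumes the input values are not all identical (otherwise it would divide by zero).

```python
from statistics import mean, pstdev

def standard_scale(values):
    """Z-score: subtract the mean, divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def min_max_scale(values):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

sample = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up values, not from the question's dataset
print(standard_scale(sample))  # centered on 0, so it contains negative values
print(min_max_scale(sample))   # bounded to the range [0, 1]
```

The negative values in the question's get_config('X') output are consistent with a z-score style scaling. As far as I know, setup() accepts a normalize_method argument (for example 'minmax') to change this, but check the PyCaret documentation for your version.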
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Nikhil Gupta |
Solution 2 | elandil2 |