Output of PyCaret's compare_models() function shows fewer models than PyCaret supports

I'm new to PyCaret, and there are several things about this library that I don't understand:

  1. According to this tutorial, https://github.com/pycaret/pycaret/blob/master/tutorials/Regression%20Tutorial%20Level%20Beginner%20-%20REG101.ipynb, there are 25 regressors available in PyCaret's model library.

I'm using PyCaret for a regression, but the output of the compare_models() function shows fewer models. More precisely, comparing the two lists, I found that the following models are missing: ard, tr, kr, svm, mlp, xgboost, catboost.

Perhaps these models are not suitable for regression? Or do I have to create individual models with:

    from pycaret.regression import create_model

    # 'fold' sets the number of cross-validation folds
    ard = create_model('ard', fold=5)
    tr = create_model('tr', fold=5)
    kr = create_model('kr', fold=5)
    svm = create_model('svm', fold=5)
    mlp = create_model('mlp', fold=5)
    xgboost = create_model('xgboost', fold=5)
    catboost = create_model('catboost', fold=5)
  2. I initialize the PyCaret environment with the setup() function:

    from pycaret.regression import setup
    regression = setup(data=dataset_predictions_meteo, target='TEMPERATURE_OBSERVEE',
                       categorical_features=['MonthNumber', 'origine', 'LIB_SOURCE'], numeric_features=['DIFF_HOURS', 'TEMPERATURE_PREDITE'],
                       session_id=123, train_size=0.8, normalize=True, remove_perfect_collinearity=True)  # transform_target=True (disabled)

Before the preprocessing pipeline, my original dataset is:

    LIB_SOURCE  TEMPERATURE_PREDITE  DIFF_HOURS  TEMPERATURE_OBSERVEE  MonthNumber  origine
    gfs_025               10.376662       348.0                5.9500           12  Sencrop
    gfs_025                8.688105       351.0                6.6200           12  Sencrop
    gfs_025                5.323708       354.0                1.1250           12  Sencrop
    gfs_025                5.271800       357.0               -1.5425           12  Sencrop
    gfs_025                6.889182       324.0                5.9500           12  Sencrop
    gfs_025               15.815905       336.0               23.7150            5  Visiogreen
    gfs_025               15.294277       339.0               19.5925            5  Visiogreen
    gfs_025               19.515454       342.0               25.3750            5  Visiogreen
    gfs_025               25.983438       345.0               34.1500            5  Visiogreen
    gfs_025               28.534859       348.0               37.6650            5  Visiogreen

After the preprocessing pipeline, my dataset is:

    get_config('X')

    TEMPERATURE_PREDITE  DIFF_HOURS  LIB_SOURCE_arome_001  LIB_SOURCE_arpege_01  LIB_SOURCE_gfs_025  MonthNumber_1  MonthNumber_10  MonthNumber_11  MonthNumber_12  MonthNumber_2  MonthNumber_3  MonthNumber_4  MonthNumber_5  MonthNumber_8  origine_Sencrop
              -0.142182    2.887928                   0.0                   0.0                 1.0            0.0             0.0             0.0             1.0            0.0            0.0            0.0            0.0            0.0              1.0
              -0.446260    2.921703                   0.0                   0.0                 1.0            0.0             0.0             0.0             1.0            0.0            0.0            0.0            0.0            0.0              1.0
              -1.052127    2.955477                   0.0                   0.0                 1.0            0.0             0.0             0.0             1.0            0.0            0.0            0.0            0.0            0.0              1.0
              -1.061474    2.989251                   0.0                   0.0                 1.0            0.0             0.0             0.0             1.0            0.0            0.0            0.0            0.0            0.0              1.0
              -0.770213    2.617735                   0.0                   0.0                 1.0            0.0             0.0             0.0             1.0            0.0            0.0            0.0            0.0            0.0              1.0
               0.837327    2.752832                   0.0                   0.0                 1.0            0.0             0.0             0.0             0.0            0.0            0.0            0.0            1.0            0.0              0.0
               0.743391    2.786606                   0.0                   0.0                 1.0            0.0             0.0             0.0             0.0            0.0            0.0            0.0            1.0            0.0              0.0
               1.503548    2.820380                   0.0                   0.0                 1.0            0.0             0.0             0.0             0.0            0.0            0.0            0.0            1.0            0.0              0.0
               2.668314    2.854154                   0.0                   0.0                 1.0            0.0             0.0             0.0             0.0            0.0            0.0            0.0            1.0            0.0              0.0
               3.127778    2.887928                   0.0                   0.0                 1.0            0.0             0.0             0.0             0.0            0.0            0.0            0.0            1.0            0.0              0.0

Because I set remove_perfect_collinearity=True in the setup() function, I expected PyCaret to keep only 2 columns for the categorical variable LIB_SOURCE, which has 3 values, but that is not the case. It does seem to work for the origine variable, which has 2 values: after the preprocessing pipeline, PyCaret keeps only 1 column.
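
One possible explanation, assuming remove_perfect_collinearity compares features pairwise (the setup() documentation describes it as removing one of two features that are 100% correlated) and treats a correlation of -1 like +1: the two dummy columns of a 2-level variable always satisfy one = 1 - other, so their correlation is -1, while no pair of dummies from a 3-level variable is perfectly correlated, even though the three of them still sum to 1. The plain-Python sketch below checks that idea; the helper names and the toy rows are hypothetical, not the question's data, and this is not PyCaret's actual implementation.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def one_hot(values):
    """Full one-hot encoding: one 0/1 column per category, none dropped."""
    return {c: [1.0 if v == c else 0.0 for v in values] for c in sorted(set(values))}


def perfectly_correlated_pairs(columns, tol=1e-12):
    """Pairs of columns whose absolute correlation is 1 (a pairwise check only)."""
    names = list(columns)
    return [(a, b)
            for i, a in enumerate(names)
            for b in names[i + 1:]
            if abs(abs(pearson(columns[a], columns[b])) - 1.0) < tol]


# Hypothetical toy rows: a 2-level and a 3-level categorical variable.
origine = ["Sencrop", "Sencrop", "Visiogreen", "Visiogreen", "Sencrop"]
lib_source = ["gfs_025", "arome_001", "arpege_01", "gfs_025", "arome_001"]

print(perfectly_correlated_pairs(one_hot(origine)))     # [('Sencrop', 'Visiogreen')]
print(perfectly_correlated_pairs(one_hot(lib_source)))  # []
```

Under that assumption, a pairwise check can drop one of origine's two dummies but none of LIB_SOURCE's three, because their collinearity only shows up when all three columns are considered together.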

Additionally, PyCaret transforms the numeric features DIFF_HOURS and TEMPERATURE_PREDITE because I set normalize=True in the setup() function. But as I understand it, normalization should map values to the range 0 to 1, and that is not the case here.
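
The negative values are consistent with z-score standardization, which, as far as I know, is what normalize=True applies by default in PyCaret (normalize_method='zscore'). Here is a plain-Python sketch on the ten TEMPERATURE_PREDITE values from the original dataset; PyCaret fits its scaler on the whole training set, so these numbers will not match get_config('X') exactly. The zscore() helper is hypothetical.

```python
import statistics


def zscore(values):
    """z = (x - mean) / std, using the population standard deviation
    (the same convention as scikit-learn's StandardScaler)."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(x - mean) / std for x in values]


# TEMPERATURE_PREDITE values from the question's 10-row sample.
predicted = [10.376662, 8.688105, 5.323708, 5.271800, 6.889182,
             15.815905, 15.294277, 19.515454, 25.983438, 28.534859]

z = zscore(predicted)
print([round(v, 3) for v in z])
# Values below the mean become negative, so the result is not confined to [0, 1].
print(min(z) < 0, max(z) > 1)  # True True
```

After this transform the column has mean 0 and standard deviation 1, which is a different goal from squeezing it into [0, 1].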

Thanks.



Solution 1:[1]

The models that show up depend on which model libraries are installed in your environment, as well as on your setup. The list may differ because many of the model libraries you mentioned are not installed by default.
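
A quick way to see whether the optional libraries behind the 'xgboost' and 'catboost' model IDs are importable in the current environment, using only the standard library's importlib.util.find_spec. This does not query PyCaret itself; the helper name installed() is hypothetical, and the package names are the usual import names of XGBoost and CatBoost.

```python
import importlib.util


def installed(packages):
    """Return {package: True/False} depending on whether it can be imported here.

    find_spec() locates a top-level module without importing it and returns None
    when the module cannot be found.
    """
    return {name: importlib.util.find_spec(name) is not None for name in packages}


# Import names of the optional libraries behind the 'xgboost' and 'catboost' model IDs.
status = installed(["xgboost", "catboost"])
for name, ok in status.items():
    print(f"{name}: {'installed' if ok else 'NOT installed'}")
```

If a package is reported as missing, installing it (for example with pip install xgboost catboost) and restarting the Python session is presumably needed before PyCaret can list the corresponding model.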

Solution 2:[2]

According to the PyCaret documentation, the import below loads only the regression module, but as you can see in the tutorial, it includes all the regression models.

    from pycaret.regression import *

Set up your experiment like this:

    exp_reg101 = setup(data=data, target='Price', session_id=123)

This doesn't require splitting your data into X and y: just pass your whole dataset, including the target variable, and specify the target column. Then compare your models with the code below:

    best = compare_models()

This is the most basic usage, without any parameters (you can add parameters as needed), and it should give you a comparison of all the regression models. I don't think it's an environment issue, but just in case, try it out on Colab.

For your other question, I think there is some confusion between standard scaling and min-max scaling. If you want your data to lie between 0 and 1, you should look at min-max scaling. I can't tell you more, but check the scaler types: as far as I know, there are 3 of them: StandardScaler, MinMaxScaler and RobustScaler.
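
To make the difference concrete, here is a plain-Python sketch of min-max and robust scaling (z-score is the formula StandardScaler uses). In PyCaret itself the method is, to my knowledge, chosen with setup()'s normalize_method parameter (default 'zscore'; 'minmax', 'maxabs' and 'robust' are the alternatives), so normalize=True together with normalize_method='minmax' should give the 0-to-1 range. The helper names are hypothetical, and the sample is the TEMPERATURE_OBSERVEE column from the question, which includes a negative value.

```python
import statistics


def min_max(values):
    """Min-max scaling: (x - min) / (max - min); the result spans [0, 1]."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]


def robust(values):
    """Robust scaling: (x - median) / IQR, with quartiles from statistics.quantiles.

    statistics.quantiles (Python 3.8+) uses its default 'exclusive' method, so the
    quartiles may differ slightly from scikit-learn's RobustScaler.
    """
    q1, median, q3 = statistics.quantiles(values, n=4)
    return [(x - median) / (q3 - q1) for x in values]


# TEMPERATURE_OBSERVEE values from the question's sample (note the negative one).
observed = [5.95, 6.62, 1.125, -1.5425, 5.95, 23.715, 19.5925, 25.375, 34.15, 37.665]

print([round(v, 3) for v in min_max(observed)])  # every value lies in [0, 1]
print([round(v, 3) for v in robust(observed)])   # centred on the median, not bounded
```

Min-max scaling maps the training data to [0, 1] even when it contains negative values; new data outside the training minimum and maximum will still fall outside that range.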

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Nikhil Gupta
Solution 2 elandil2