Feature importance in mlr3 with iml for classification forests
I calculate feature importance for two different types of machine learning models (an SVM and a classification forest). I cannot post the data here, but I will describe what I do:
My (classification) task has about 400 observations of 70 variables. Some of the variables are highly, but not perfectly, correlated.
- I fit the models with

```r
learner_1$train(task)
learner_2$train(task)
```

where learner_1 is an SVM and learner_2 is a classification forest (see the setup sketch after the next code block).
- Now I want to calculate feature importance with iml, so for each of the learners I use the following code (here, the code for learner_1):

```r
model_analyzed <- Predictor$new(
  learner_1,
  data = dplyr::select(task$data(), task$feature_names),
  y = dplyr::select(task$data(), task$target_names)
)

used_features <- task$feature_names
effect <- FeatureImp$new(model_analyzed, loss = "ce", n.repetitions = 10, compare = "ratio")
print(effect$plot(features = used_features))
```
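For completeness, here is a minimal, self-contained sketch of the setup above. Since I cannot share the real data, the built-in sonar task stands in for it, and the concrete learner choices (an e1071 SVM and a ranger forest via mlr3learners) are only illustrative stand-ins for "an SVM" and "a classification forest":

```r
# Stand-in setup: the real data cannot be shared, so the built-in sonar
# task (binary classification, numeric features) is used instead; the
# exact learner variants are illustrative, not the ones from my analysis.
library(mlr3)
library(mlr3learners)

task <- tsk("sonar")

learner_1 <- lrn("classif.svm")      # SVM (e1071 backend)
learner_2 <- lrn("classif.ranger")   # classification forest (ranger backend)

learner_1$train(task)
learner_2$train(task)
```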
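To put the plotted numbers in context: with compare = "ratio", FeatureImp reports the error after permuting a feature divided by the original model error, so the raw results table and the unpermuted error are worth checking. A small diagnostic sketch, using the stand-in objects from above (the analogous effect object for learner_2 is built the same way):

```r
library(iml)

# Raw numbers behind the plot for the learner wrapped by model_analyzed:
# one row per feature, with the importance ratio and the error measured
# after permuting that feature.
effect$results

# Unpermuted error of the forest on the same (training) data, computed
# directly in mlr3; this should be close to the denominator of the ratio.
pred <- learner_2$predict(task)
pred$score(msr("classif.ce"))
```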
My results are the following:
a) For the SVM: [plot not reproduced here]
b) For the classification forest: [plot not reproduced here]
I do not understand the second picture:
a) Should the "anchor" point not be around 1, as I observe for the SVM? If the ce is not made worse by shuffling any feature, shouldn't the graph show a 1 rather than a 0?
b) If all features show a value very close to zero, as I see in the second graph, does that mean the classification error is zero when the feature is shuffled? In other words, for each single feature, would I get a perfect model if just that one feature were omitted or shuffled?
I am really confused here; can someone help me understand what is happening?
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow