Custom objective function for XGBoost including an external data column
I am using XGBoost for sales forecasting. I need a custom objective function because the value of a prediction depends on the sales price of the item. I am struggling to feed the sales price into the loss function alongside the labels and predictions. This is my approach:
from typing import Tuple, Union

import numpy as np
import xgboost as xgb

def monetary_value_objective(predt: np.ndarray, dtrain: Union[xgb.DMatrix, np.ndarray]) -> Tuple[np.ndarray, np.ndarray]:
    """
    predt = model prediction
    dtrain = labels
    Currently, dtrain is a numpy array.
    """
    y = dtrain
    mask1 = predt <= y  # Predict too few
    mask2 = predt > y  # Predict too much
    price = train[0]["salesPrice"]  # `train` is a global defined elsewhere
    grad = price **2 * (predt - y)
    # Gradient is negative if prediction is too low, and positive if it is too high
    # Here scale it (0.72 = 0.6**2 * 2)
    grad[mask1] = 2 * grad[mask1]
    grad[mask2] = 0.72 * grad[mask2]
    hess = np.empty_like(grad)
    hess[mask1] = 2 * price[mask1]**2
    hess[mask2] = 0.72 * price[mask2]**2
    grad = -grad
    return grad, hess
I get the following error when hyperparameter tuning:
[09:11:35] WARNING: /workspace/src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.
0%| | 0/1 [00:00<?, ?it/s, best loss: ?]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-34-2c64dc1b5a76> in <module>()
1 # set runtime environment to GPU at: Runtime -> Change runtime type
----> 2 trials, best_hyperparams = hyperpara_tuning(para_space)
3 final_xgb_model = trials.best_trial['result']['model']
4 assert final_xgb_model is not None, "Oooops there is no model created :O "
5
17 frames
/usr/local/lib/python3.6/dist-packages/pandas/core/indexers.py in check_array_indexer(array, indexer)
399 if len(indexer) != len(array):
400 raise IndexError(
--> 401 f"Boolean index has wrong length: "
402 f"{len(indexer)} instead of {len(array)}"
403 )
IndexError: Boolean index has wrong length: 1 instead of 136019
Does anyone have an idea how to use the sales price in the objective function? Is this possible at all?
Thanks!
Solution 1:[1]
You can use the weights vector in your custom objective function. If you encode your external variable in the weights, it could work. However, I don't know whether the weights are used only in the objective function itself or also elsewhere, for example during data sampling. If they are, the situation becomes much more complicated.
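As a rough sketch of this idea, assuming the sales price is stored in the DMatrix weight slot (for example `xgb.DMatrix(X, label=y, weight=price)`), the objective can read it back with `dtrain.get_weight()`. The function name is hypothetical, the factors 2 and 0.72 are taken from the question, and the sign convention (gradient of a price-weighted squared error, without the question's final sign flip) is an assumption about the intended loss:

```python
import numpy as np


def price_weighted_objective(predt, dtrain):
    """Custom objective with XGBoost's (predt, dtrain) callback signature.

    Assumes the per-row sales price was stored as the DMatrix weight.
    """
    y = dtrain.get_label()
    price = dtrain.get_weight()  # assumption: weight slot holds the sales price

    # Under-prediction is penalised with factor 2, over-prediction with 0.72
    scale = np.where(predt <= y, 2.0, 0.72) * price ** 2

    grad = scale * (predt - y)  # first derivative of 0.5 * scale * (predt - y)**2
    hess = scale                # second derivative
    return grad, hess
```

The caveat above still applies: XGBoost may also use the weights outside the objective, so this can change more than just the loss.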
Solution 2:[2]
A bit late, but this answers the OP, https://datascience.stackexchange.com/questions/74780/how-to-implement-custom-loss-function-that-has-more-parameters-with-xgbclassifie
Write an outer function that returns the objective function. The returned function keeps the callback signature XGBoost expects, but as a closure it can also use the outer function's data.
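A minimal sketch of that closure pattern, applied to the loss from the question. The names `make_monetary_objective` and `train_prices` are hypothetical, and the sign convention (no final `grad = -grad`) is an assumption about the intended loss. The price array must be aligned row for row with the training data:

```python
import numpy as np


def make_monetary_objective(price):
    """Return an objective with the (predt, dtrain) signature XGBoost calls.

    `price` is captured by the closure, so it does not need to be a
    callback argument.
    """
    # A plain NumPy array avoids pandas index-alignment surprises
    price = np.asarray(price, dtype=float)

    def objective(predt, dtrain):
        y = dtrain.get_label()
        # Factors from the question: 2 when predicting too few, 0.72 when too many
        scale = np.where(predt <= y, 2.0, 0.72) * price ** 2
        grad = scale * (predt - y)  # first derivative of 0.5 * scale * (predt - y)**2
        hess = scale                # second derivative
        return grad, hess

    return objective
```

With the native API this would be passed as `xgb.train(params, dtrain, obj=make_monetary_objective(train_prices))`. Because the captured array has a fixed length, each training split (for example every fold during hyperparameter tuning) needs its own objective built from the matching prices.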
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Qbik |
Solution 2 | Chris |