Custom objective function for XGBoost including an external data column

I am using XGBoost for sales forecasting. I need a custom objective function because the value of a prediction depends on the item's sales price. I am struggling to pass the sales price into the loss function alongside the labels and predictions. This is my approach:

def monetary_value_objective(predt: np.ndarray, dtrain: Union[xgb.DMatrix, np.ndarray]) -> Tuple[np.ndarray, np.ndarray]:
  """
  predt = model prediction
  dtrain = labels 
  Currently, dtrain is a numpy array.
  """

  y = dtrain

  mask1 = predt <= y  # Predict too few
  mask2 = predt > y  # Predict too much

  price = train[0]["salesPrice"]

  grad = price **2 * (predt - y)  
  # Gradient is negative if prediction is too low, and positive if it is too high
  # Here scale it (0.72 = 0.6**2 * 2)
  grad[mask1] = 2 * grad[mask1]
  grad[mask2] = 0.72 * grad[mask2]

  hess = np.empty_like(grad)
  hess[mask1] = 2 * price[mask1]**2
  hess[mask2] = 0.72 * price[mask2]**2

  grad = -grad

  return grad, hess

I get the following error during hyperparameter tuning:

[09:11:35] WARNING: /workspace/src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.
  0%|          | 0/1 [00:00<?, ?it/s, best loss: ?]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-34-2c64dc1b5a76> in <module>()
      1 # set runtime environment to GPU at: Runtime -> Change runtime type
----> 2 trials, best_hyperparams = hyperpara_tuning(para_space)
      3 final_xgb_model = trials.best_trial['result']['model']
      4 assert final_xgb_model is not None, "Oooops there is no model created :O "
      5 

17 frames
/usr/local/lib/python3.6/dist-packages/pandas/core/indexers.py in check_array_indexer(array, indexer)
    399         if len(indexer) != len(array):
    400             raise IndexError(
--> 401                 f"Boolean index has wrong length: "
    402                 f"{len(indexer)} instead of {len(array)}"
    403             )

IndexError: Boolean index has wrong length: 1 instead of 136019

Does anyone have an idea how to use the sales price in the objective function? Is this possible at all?

Thanks!



Solution 1:[1]

You can use the weights vector in your custom objective function. If you encode your external variable as the sample weights, it could work. However, I don't know whether the weights are used only in the objective function itself or also elsewhere, for example during data sampling. If they are, the situation becomes much more complicated.
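
A minimal sketch of the weights idea, under stated assumptions: the sales price is stored as the sample weight when the data is built (for example `xgb.DMatrix(X, label=y, weight=price)`), and the objective reads it back with `get_weight()`. The function name and the plain price-weighted squared-error loss are hypothetical and chosen only for illustration. Whether XGBoost also uses the weights elsewhere is exactly the open question raised in this answer, so this is not a verified recipe.

```python
import numpy as np


def price_from_weight_objective(predt, dtrain):
    """Objective that treats the sample weights as per-row sales prices.

    Assumed loss per row: price**2 * (predt - y)**2.
    dtrain only needs get_label() and get_weight(), as xgb.DMatrix provides.
    """
    y = dtrain.get_label()
    price = dtrain.get_weight()  # assumption: weights hold the sales price
    if len(price) != len(y):
        # get_weight() returns an empty array when no weights were set
        raise ValueError("expected one weight (price) per training row")
    residual = predt - y
    grad = 2.0 * price**2 * residual  # first derivative w.r.t. predt
    hess = 2.0 * price**2             # second derivative w.r.t. predt
    return grad, hess
```

With the native API this would be passed as `xgb.train(params, dtrain, obj=price_from_weight_objective)`.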

Solution 2:[2]

A bit late, but this answers the OP: https://datascience.stackexchange.com/questions/74780/how-to-implement-custom-loss-function-that-has-more-parameters-with-xgbclassifie

You write a function that returns a function. The returned callback keeps the signature XGBoost expects, but it can also use the data that was passed to the enclosing function.
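
A minimal sketch of that closure pattern, applied to the question's loss. The factory name `make_monetary_objective` is hypothetical. The loss is an assumed reading of the question's constants (factor 2 when under-predicting, 0.72 when over-predicting), and the gradient is returned without the final sign flip, because XGBoost minimizes the loss using its derivatives with respect to the prediction. The price column is converted to a plain NumPy array so that boolean masks and arithmetic are not subject to pandas index alignment.

```python
import numpy as np


def make_monetary_objective(price):
    """Return an XGBoost-style objective that closes over per-row prices."""
    price = np.asarray(price, dtype=float)  # one price per training row

    def objective(predt, dtrain):
        # dtrain is expected to expose get_label(), as xgb.DMatrix does
        y = dtrain.get_label()
        if len(price) != len(y):
            raise ValueError("price must have one entry per training row")
        residual = predt - y
        # Assumed loss: w * price**2 * residual**2,
        # with w = 1.0 when predt <= y and w = 0.36 (= 0.6**2) otherwise
        scale = np.where(residual <= 0, 2.0, 0.72)
        grad = scale * price**2 * residual
        hess = scale * price**2
        return grad, hess

    return objective
```

With the native API the result would be used as `xgb.train(params, dtrain, obj=make_monetary_objective(prices))`, where `prices` must be in the same row order as the rows in `dtrain`.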

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Qbik
Solution 2 Chris