GPU runs out of memory when training an ML model

I am trying to train an ML model using Dask. I am training on my local machine with a single GPU, which has 24 GiB of memory.

from dask_cuda import LocalCUDACluster
from dask.distributed import Client, LocalCluster

import dask.dataframe as dd
import pandas as pd
import numpy as np
import os
import xgboost as xgb

np.random.seed(42)

# Placeholders -- set these to your own file, target column, and feature count.
FILENAME = "data.csv"
TARGET = "target"
NUM_FEATURES = 100


def get_columns(filename):
    return pd.read_csv(filename, nrows=10).iloc[:, :NUM_FEATURES].columns


def get_data(filename, target):
    import dask_cudf
    X = dask_cudf.read_csv(filename)
    # X = dd.read_csv(filename, assume_missing=True)
    y = X[[target]]
    X = X.iloc[:, :NUM_FEATURES]
    return X, y


def main(client: Client) -> None:
    X, y = get_data(FILENAME, TARGET)
    model = xgb.dask.DaskXGBRegressor(
        tree_method="gpu_hist",
        objective="reg:squarederror",
        seed=42,
        max_depth=5,
        eta=0.01,
        n_estimators=10)

    model.client = client
    model.fit(X, y, eval_set=[(X, y)])
    print("Saving the model..")
    model.get_booster().save_model("xgboost.model")

    print("Doing model importance..")
    columns = get_columns(FILENAME)
    pd.Series(model.feature_importances_, index=columns).sort_values(ascending=False).to_pickle("~/yolo.pkl")


if __name__ == "__main__":
    os.environ["MALLOC_TRIM_THRESHOLD_"]="65536"
    with LocalCUDACluster(device_memory_limit="15 GiB", rmm_pool_size="20 GiB") as cluster:
    # with LocalCluster() as cluster:
        with Client(cluster) as client:
            print(client)
            main(client)

The error is as follows:

MemoryError: std::bad_alloc: out_of_memory: RMM failure at:/workspace/.conda-bld/work/include/rmm/mr/device/pool_memory_resource.hpp:192: Maximum pool size exceeded

Basically, my GPU runs out of memory when I call model.fit. Training works with a CSV of 64,100 rows and fails with a CSV of 128,198 rows (2× the rows). These aren't large files, so I assume I am doing something wrong.

I have tried fiddling around with

  • LocalCUDACluster: device_memory_limit and rmm_pool_size
  • dask_cudf.read_csv: chunksize

Nothing has worked.
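
For reference, the attempts looked roughly like the sketch below (the values are purely illustrative, and depending on the dask_cudf version the read_csv argument may be chunksize or blocksize):

from dask_cuda import LocalCUDACluster
import dask_cudf

# Illustrative values only.
FILENAME = "data.csv"  # placeholder path

cluster = LocalCUDACluster(device_memory_limit="10 GiB",  # spill device data to host earlier
                           rmm_pool_size="16 GiB")        # smaller pre-allocated RMM pool

X = dask_cudf.read_csv(FILENAME, chunksize="128 MiB")     # smaller partitions per read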

I have been stuck on this all day so any help would be much appreciated.



Solution 1: [1]

You cannot train an XGBoost model on a GPU when the training data and the model grow larger than the remaining GPU memory. You can scale out with Dask-XGBoost, but you need to make sure the total GPU memory across the cluster is sufficient.

Here is a great blog on this by Coiled: https://coiled.io/blog/dask-xgboost-python-example/
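
As a rough sketch of what scaling out looks like, assuming a machine with two GPUs (the device list, RMM pool size, file path, and column constants below are placeholders to adapt):

from dask_cuda import LocalCUDACluster
from dask.distributed import Client
import dask_cudf
import xgboost as xgb

# Placeholders -- adjust to your own GPUs, data and columns.
GPUS = "0,1"
FILENAME = "data.csv"
TARGET = "target"
NUM_FEATURES = 100


def main() -> None:
    # One Dask-CUDA worker per visible GPU, each with its own RMM pool.
    with LocalCUDACluster(CUDA_VISIBLE_DEVICES=GPUS, rmm_pool_size="20 GiB") as cluster:
        with Client(cluster) as client:
            X = dask_cudf.read_csv(FILENAME)
            y = X[[TARGET]]
            X = X.iloc[:, :NUM_FEATURES]

            model = xgb.dask.DaskXGBRegressor(
                tree_method="gpu_hist",
                objective="reg:squarederror",
                max_depth=5,
                eta=0.01,
                n_estimators=10)
            model.client = client
            # Training is partitioned across every worker in the cluster,
            # so the data only needs to fit in the GPUs' combined memory.
            model.fit(X, y)
            model.get_booster().save_model("xgboost.model")


if __name__ == "__main__":
    main()

The only real change from the single-GPU script is pointing the cluster at more than one device; xgboost.dask distributes the partitions across all workers, so the 24 GiB limit of a single card stops being the ceiling.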

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

[1] Solution 1: TaureanDyerNV