
Check failed: (best_split_info.left_count) > (0) #4946

@chixujohnny

Description

Hi, I found a bug when training on a large X_train.

LightGBM (GPU) version: 3.3.2
CUDA: 11.1
OS: CentOS
RAM: 2 TB
GPU: A100 40 GB

When X_train is larger than (18,000,000, 1000), i.e. more than 18 million rows by 1,000 features, LightGBM GPU fails with this error:
[LightGBM] [Fatal] Check failed: (best_split_info.left_count) > (0) at LightGBM/src/treelearner/serial_tree_learner.cpp, line 686

With LightGBM 3.2.1, I hit the same problem as #4480: once GPU memory usage exceeds 8.3 GB, training fails with "Memory Object Allocation Failure".

With LightGBM 3.3.2, training cannot use more than 17 GB of GPU memory, even though the GPU has 40 GB. The problem seems to occur in the tree split step, and it only happens when more than 17 GB of GPU memory is in use.

A colleague has the same problem with LightGBM 3.3.1.
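For scale, here is a rough back-of-the-envelope estimate of how large a dense matrix of the reported shape is. This is a sketch under stated assumptions: dense storage with no compression, and the "uint8 (binned)" row assumes one byte per binned value, which simplifies how LightGBM actually stores binned features. The helper `matrix_gib` is hypothetical and only for illustration. It also checks the total cell count against the signed 32-bit integer limit. Nothing here establishes the cause of the failure.

```python
# Rough size estimates for a dense matrix of the shape reported in this issue.
# Assumptions: dense storage, no compression; bytes per value are illustrative,
# and "uint8 (binned)" is a simplification of LightGBM's real bin storage.
ROWS = 18_000_000  # "1800w" = 1800 x 10,000 rows
COLS = 1_000


def matrix_gib(rows: int, cols: int, bytes_per_value: int) -> float:
    """Hypothetical helper: size of a dense rows x cols matrix in GiB."""
    return rows * cols * bytes_per_value / 2**30


cells = ROWS * COLS
for label, nbytes in [("float64", 8), ("float32", 4), ("uint8 (binned)", 1)]:
    print(f"{label:>15}: {matrix_gib(ROWS, COLS, nbytes):7.1f} GiB")

# The total number of cells also exceeds what a signed 32-bit index can address.
print("cells:", cells, "exceeds 2**31 - 1:", cells > 2**31 - 1)
```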

Labels: bug; gpu (OpenCL), for issues related to the OpenCL-based GPU variant.
