RuntimeError: CUDF failure at: /__w/cudf/cudf/cpp/src/io/parquet/reader_impl_helpers.cpp:379: Invalid rowgroup index [BUG] #756

Open
Oussamakhammassi opened this issue Nov 3, 2023 · 10 comments
Labels: bug (Something isn't working), status/needs-triage

Comments

@Oussamakhammassi

I tried to run the Transformers4Rec tutorial and got this error:

RuntimeError Traceback (most recent call last)
in

/usr/local/lib/python3.10/dist-packages/transformers/trainer.py in evaluate(self, eval_dataset, ignore_keys, metric_key_prefix)
3005 self._memory_tracker.start()
3006
-> 3007 eval_dataloader = self.get_eval_dataloader(eval_dataset)
3008 start_time = time.time()
3009

16 frames
/usr/local/lib/python3.10/dist-packages/cudf/io/parquet.py in _read_parquet(filepaths_or_buffers, engine, columns, row_groups, use_pandas_metadata, *args, **kwargs)
819 f"following positional arguments: {list(args)}"
820 )
--> 821 return libparquet.read_parquet(
822 filepaths_or_buffers,
823 columns=columns,

parquet.pyx in cudf._lib.parquet.read_parquet()

parquet.pyx in cudf._lib.parquet.read_parquet()

RuntimeError: CUDF failure at: /__w/cudf/cudf/cpp/src/io/parquet/reader_impl_helpers.cpp:379: Invalid rowgroup index

@Oussamakhammassi Oussamakhammassi added bug Something isn't working status/needs-triage labels Nov 3, 2023
@rnyak
Contributor

rnyak commented Nov 3, 2023

@Oussamakhammassi can you please tell us how you installed transformers4rec? Are you using the merlin-pytorch image?

Please also start with the examples at https://github.com/NVIDIA-Merlin/Transformers4Rec/tree/main/examples/getting-started-session-based, since the tutorial notebooks have not been updated recently.

@Oussamakhammassi
Author

Hi rnyak!

I installed it with:

pip install transformers4rec[nvtabular]

No, I'm not using the merlin-pytorch image.

@rnyak
Contributor

rnyak commented Nov 6, 2023

@Oussamakhammassi I'd recommend using the Docker image. Installing only transformers4rec[nvtabular] won't install cudf, dask_cudf, etc.

If you want to install via pip, you need to install RAPIDS cudf and dask_cudf first (see their docs here: https://docs.rapids.ai/install) and then install the other Merlin libraries as well:

  • models
  • dataloader
  • systems
  • core
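
To see which of these packages are actually present in a given environment, a minimal sketch like the one below may help. It only uses the standard library. The distribution names in the list are assumptions (RAPIDS wheels are often published with a CUDA suffix such as "cudf-cu11"), so adjust them to match how you installed things.

```python
from importlib import metadata

# Distribution names are assumptions; edit them to match your install
# (for example, RAPIDS wheels may carry a different CUDA suffix).
PACKAGES = [
    "cudf-cu11",
    "dask-cudf-cu11",
    "merlin-core",
    "merlin-dataloader",
    "merlin-models",
    "merlin-systems",
    "nvtabular",
    "transformers4rec",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Pasting the printed list into the issue makes it easier to spot a missing or mismatched library.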

@Oussamakhammassi
Author

Yes, I did all that, but it still doesn't work!

@rnyak
Contributor

rnyak commented Nov 8, 2023

@Oussamakhammassi you need a compatible GPU and a properly installed CUDA driver to be able to import and use the cudf library. What are your GPU specs? Can you share the output of nvidia-smi and nvcc --version?
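
If it is easier to collect both outputs from inside the notebook, a rough sketch along these lines should work. The helper name run_tool is hypothetical, and the script only reports what the two commands print; it does not check compatibility itself.

```python
import shutil
import subprocess


def run_tool(command):
    """Return the tool's output, or a short note if it is missing or fails."""
    if shutil.which(command[0]) is None:
        return f"{command[0]}: not found on PATH"
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"{command[0]}: failed to run ({exc})"
    # Some tools write their report to stderr, so fall back to it.
    return (result.stdout or result.stderr).strip()


if __name__ == "__main__":
    for command in (["nvidia-smi"], ["nvcc", "--version"]):
        print(f"$ {' '.join(command)}")
        print(run_tool(command))
        print()
```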

@rnyak
Contributor

rnyak commented Nov 8, 2023

@Oussamakhammassi
Author

Here's the output of nvcc --version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

As for the example you sent me: yes, I ran it and it works well, but I don't know why the other examples produce this error.

@Oussamakhammassi
Author

Wed Nov 8 15:51:59 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   39C    P8     9W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

@Bharathjpv

I've worked with this. The example notebooks run fine, but when I run with custom data, it throws this error when I call the trainer.evaluate() method.

@rnyak
Contributor

rnyak commented Jan 19, 2024

@Bharathjpv please share your error and a reproducible toy example. We need to see what you are doing in your NVT, model training, and eval pipeline to help you. Thanks.
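
When putting the toy example together, it may also be worth reporting how many row groups the eval parquet file has, since the error message mentions an invalid rowgroup index. This is only a guess at a useful diagnostic, not a confirmed cause. The sketch below assumes pyarrow is installed; the helper names and the file path in the usage comment are hypothetical.

```python
def count_row_groups(path):
    """Read the row-group count from a parquet file's footer (needs pyarrow)."""
    import pyarrow.parquet as pq  # lazy import; pyarrow is an assumption here

    return pq.ParquetFile(path).metadata.num_row_groups


def out_of_range(requested, num_row_groups):
    """Return the requested row-group indices that the file does not contain."""
    return [i for i in requested if not 0 <= i < num_row_groups]


# Example usage (the path is a placeholder for your own eval file):
#   n = count_row_groups("path/to/eval.parquet")
#   print("row groups:", n, "invalid requests:", out_of_range([0, 1], n))
```

If a reader asks for an index that out_of_range flags, that would at least be consistent with the message above, and the row-group count is a helpful detail to include in the report either way.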
