cuDNN, cuFFT, and cuBLAS Plugin Registration Errors #20803

Open
3 tasks
johnnkp opened this issue Dec 21, 2024 · 0 comments
Open
3 tasks

cuDNN, cuFFT, and cuBLAS Plugin Registration Errors #20803

johnnkp opened this issue Dec 21, 2024 · 0 comments

Comments

@johnnkp
Copy link

johnnkp commented Dec 21, 2024

This issue is copied from tensorflow/tensorflow#62075 (comment).

Since 2023, TensorFlow CUDA users have encountered the errors below and have been unable to use GPU acceleration:

2023-10-09 13:36:23.355516: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-10-09 13:36:23.355674: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-10-09 13:36:23.355933: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered

A possible cause is a missing check before the factory is registered. In xla\stream_executor\rocm\rocm_*.cc, the plugin registration functions call PluginRegistry::Instance()->HasFactory() before PluginRegistry::Instance()->RegisterFactory<>():

auto rocBlasAlreadyRegistered = PluginRegistry::Instance()->HasFactory(
    rocm::kROCmPlatformId, PluginKind::kBlas);

if (!rocBlasAlreadyRegistered) {
  absl::Status status =
      PluginRegistry::Instance()
          ->RegisterFactory<PluginRegistry::BlasFactory>(...);

  if (!status.ok()) {
    LOG(ERROR) << "Unable to register rocBLAS factory: " << status.message();
  }
}

Currently, initialize_cublas(), initialize_cudnn(), and initialize_cufft() in xla\stream_executor\cuda\cuda_*.cc register their factories directly, without checking whether a factory already exists. Could the XLA team please add this check so that TensorFlow contributors can merge it, allowing CUDA users to install a nightly build and test the fix soon?
