TensorFlow CUDA users have been hitting the errors below since 2023 and are unable to use GPU acceleration:
2023-10-09 13:36:23.355516: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-10-09 13:36:23.355674: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-10-09 13:36:23.355933: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
A possible cause is a missing check before the factory is registered. In xla\stream_executor\rocm\rocm_*.cc, the plugin registration functions call PluginRegistry::Instance()->HasFactory() before PluginRegistry::Instance()->RegisterFactory<>():
auto rocBlasAlreadyRegistered = PluginRegistry::Instance()->HasFactory(
    rocm::kROCmPlatformId, PluginKind::kBlas);
if (!rocBlasAlreadyRegistered) {
  absl::Status status =
      PluginRegistry::Instance()
          ->RegisterFactory<PluginRegistry::BlasFactory>(...);
  if (!status.ok()) {
    LOG(ERROR) << "Unable to register rocBLAS factory: " << status.message();
  }
}
Currently, initialize_cublas(), initialize_cudnn() and initialize_cufft() in xla\stream_executor\cuda\cuda_*.cc register their factories directly, without checking for an existing one. XLA team, please add this check so that TensorFlow contributors can merge it and CUDA users can install a nightly build to test the fix soon.
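To make the difference between the two registration styles concrete, here is a minimal, self-contained sketch. It does not use the real XLA types: `ToyPluginRegistry`, `ToyPluginKind`, `InitializeUnguarded` and `InitializeGuarded` are hypothetical stand-ins invented for illustration, and the real `PluginRegistry` API may differ. It only shows why a second unguarded registration logs an error while a guarded one stays silent.

```cpp
#include <cassert>
#include <iostream>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-ins for illustration only; these are NOT the XLA types.
enum class ToyPluginKind { kBlas, kDnn, kFft };

class ToyPluginRegistry {
 public:
  // True if a factory is already stored for this (platform, kind) pair.
  bool HasFactory(const std::string& platform, ToyPluginKind kind) const {
    return factories_.count(std::make_pair(platform, kind)) > 0;
  }

  // Returns an empty string on success, or an error message when a factory
  // already exists for this (platform, kind) pair.
  std::string RegisterFactory(const std::string& platform, ToyPluginKind kind,
                              const std::string& name) {
    const auto key = std::make_pair(platform, kind);
    if (factories_.count(key) > 0) {
      return "Attempting to register factory for plugin " + name +
             " when one has already been registered";
    }
    factories_[key] = name;
    return "";
  }

 private:
  std::map<std::pair<std::string, ToyPluginKind>, std::string> factories_;
};

// Unguarded style: always tries to register.
// Returns true if no error was logged.
bool InitializeUnguarded(ToyPluginRegistry& registry) {
  const std::string error =
      registry.RegisterFactory("CUDA", ToyPluginKind::kBlas, "cuBLAS");
  if (!error.empty()) {
    std::cerr << "Unable to register cuBLAS factory: " << error << "\n";
    return false;
  }
  return true;
}

// Guarded style: checks HasFactory() first and skips the registration
// when a factory is already present. Returns true if no error was logged.
bool InitializeGuarded(ToyPluginRegistry& registry) {
  if (registry.HasFactory("CUDA", ToyPluginKind::kBlas)) {
    return true;  // Already registered; nothing to do, nothing to log.
  }
  const std::string error =
      registry.RegisterFactory("CUDA", ToyPluginKind::kBlas, "cuBLAS");
  if (!error.empty()) {
    std::cerr << "Unable to register cuBLAS factory: " << error << "\n";
    return false;
  }
  assert(registry.HasFactory("CUDA", ToyPluginKind::kBlas));
  return true;
}
```

In this toy model, calling `InitializeUnguarded` twice on the same registry logs an error on the second call, whereas calling `InitializeGuarded` twice does not. Whether the real CUDA initializers are actually invoked twice, and why, is not established here; the sketch only illustrates the guard itself.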
This issue is copied from tensorflow/tensorflow#62075 (comment).