Crash when calling MacenkoNormalizer.fit with tensorflow backend #37

Open
bertrandchauveau opened this issue Feb 25, 2023 · 12 comments

@bertrandchauveau

Hi,
I am sorry if my question is trivial but I have trouble using this package with the tensorflow backend.
Using torchstain 1.2.0, I have no problem performing a Macenko normalization with numpy. But when I try with tensorflow, it crashes on normalizer.fit

import cv2
import torchstain

target_path = '/XXX.jpg'
target = cv2.cvtColor(cv2.imread(target_path), cv2.COLOR_BGR2RGB)
tf_normalizer = torchstain.normalizers.MacenkoNormalizer(backend='tensorflow')

The only thing that I am doing differently from the provided example is the tensor conversion of the numpy array.
That is, I am not doing this

T = transforms.Compose([
    transforms.ToTensor(),
    transforms.Lambda(lambda x: x*255)
])

But rather tried this to match the transformation:

import tensorflow as tf

target = tf.constant(target, dtype=tf.float32)  # convert to tensor
target = tf.transpose(target, perm=[2, 0, 1])  # channel first

tf_normalizer.fit(target)

Is this why it crashes? Is there a way to run this without torchvision.transforms, on a pure TF basis?

I am using Tensorflow_2.10.0 and have installed torchstain using pip install torchstain[tf].
I currently do not use nor have installed torchvision in my TF environment.

Thank you for your advice
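
For reference, the torchvision pipeline quoted above amounts to a channel-first float32 array in the 0-255 range: `ToTensor()` moves the channels first and scales uint8 input to [0, 1], and the `Lambda` multiplies back by 255. A minimal NumPy-only sketch of that conversion follows. `to_chw_float32` is a hypothetical helper name, and the synthetic array only stands in for an image loaded with OpenCV.

```python
import numpy as np


def to_chw_float32(image_hwc: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 image to a 3 x H x W float32 array, values unchanged."""
    if image_hwc.ndim != 3 or image_hwc.shape[-1] != 3:
        raise ValueError("expected an H x W x 3 image")
    # Move the channel axis to the front and cast; no rescaling is applied,
    # so uint8 input stays in the 0-255 range.
    return np.moveaxis(image_hwc, -1, 0).astype("float32")


# Synthetic stand-in for an RGB image (assumption: real data would come from cv2).
image = np.arange(4 * 5 * 3, dtype=np.uint8).reshape(4, 5, 3)
chw = to_chw_float32(image)
```

The resulting array could then be wrapped with `tf.convert_to_tensor` before being passed to the normalizer, so torchvision is not needed for this step.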

@andreped andreped self-assigned this Feb 25, 2023
@andreped andreped added good first issue Good for newcomers documentation Improvements or additions to documentation labels Feb 25, 2023
@andreped
Collaborator

andreped commented Feb 25, 2023

> I am sorry if my question is trivial but I have trouble using this package with the tensorflow backend.

Hello, @bertrandchauveau! I had this issue when making this myself, so no worries :]

You can take a look at what is done in the tests here.

Basically, do this instead:

import tensorflow as tf
import torchstain
import numpy as np

T = lambda x: tf.convert_to_tensor(np.moveaxis(x, -1, 0).astype("float32"))
t_to_transform = T(to_transform)

normalizer = torchstain.normalizers.MacenkoNormalizer(backend='tensorflow')
normalizer.fit(T(target))
result, _, _ = normalizer.normalize(I=t_to_transform, stains=True)

result = result.numpy().astype("float32")

Could you try this first to see if it resolves your issue? I'm a bit occupied right now, but I could take another look tomorrow if you are still having issues.

This will be better documented in the upcoming release, which includes some new and interesting stain normalization techniques and new backends (see here).

BTW: What is the status on the release, @carloalbertobarbano? Shall we aim to get it released by next week? I have a master student who would be interested in the new modified reinhard implementation.

@bertrandchauveau
Author

Thank you for your quick response!

Sadly the same problem occurs, i.e. crashes when running:

normalizer.fit(T(target))

The "T" conversion does the same thing as my own attempt at the tf tensor conversion.

@andreped
Collaborator

> Sadly the same problem occurs, i.e. crashes when running:

Hmm, well, what I described above is what we do in the unit test, so that should work. Could you show me the error log from the terminal?

Also, could you try downloading the test data that we used for the unit tests here and here, and running them through your code? I believe that should work. If it does, then the intensity range of your image after imread is probably wrong. You can see the intensity range by running print(np.unique(image))
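
Since `np.unique` can print a very long list of values, a compact summary is often enough. The following is a minimal sketch, assuming the expected input range is 0-255 (as in the conversion snippets earlier in this thread). `describe_intensity_range` is a hypothetical helper, and the `max <= 1.0` check is only a heuristic: a very dark image could trigger it even though it is scaled correctly.

```python
import numpy as np


def describe_intensity_range(image) -> dict:
    """Summarize dtype and value range, and flag a likely [0, 1] scaling."""
    image = np.asarray(image)
    lo, hi = float(image.min()), float(image.max())
    return {
        "dtype": str(image.dtype),
        "min": lo,
        "max": hi,
        # Heuristic only: values within [0, 1] suggest the image was rescaled.
        "looks_like_0_1": lo >= 0.0 and hi <= 1.0,
        "within_0_255": lo >= 0.0 and hi <= 255.0,
    }


# Synthetic examples (assumption: real images would come from cv2.imread).
raw = np.linspace(0, 255, 48).astype(np.uint8).reshape(4, 4, 3)
report_raw = describe_intensity_range(raw)
report_scaled = describe_intensity_range(raw.astype("float32") / 255.0)
```

If `looks_like_0_1` is true for an image read from disk, multiplying by 255 before fitting would be the first thing to try.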

Also, I noticed that you are a pathologist. If you just want to get a method working, I would recommend trying the command line tool fast-stain-normalization, which is based on torchstain. It enables you to normalize an entire folder without needing to code. Just provide arguments to a CLI and run it from the terminal. You can see how to use it here.

@bertrandchauveau
Author

I had the same issue with the test images that you provided.

This is the error message from the terminal:

2023-02-26 16:20:02.217992: I tensorflow/stream_executor/cuda/cuda_blas.cc:1614] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
2023-02-26 16:20:02.225717: I tensorflow/core/util/cuda_solvers.cc:179] Creating GpuSolver handles for stream 000001CE08BFD700
2023-02-26 16:20:03.039762: F tensorflow/core/util/cuda_solvers.cc:114] Check failed: cusolverDnCreate(&cusolver_dn_handle) == CUSOLVER_STATUS_SUCCESS Failed to create cuSolverDN instance.
[I 16:20:28.009 NotebookApp] KernelRestarter: restarting kernel (1/5), keep random ports

I tried this kind of thing, based on what I saw on Stack Overflow, but the kernel still crashes:

gpu = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(device=gpu[0], enable=True)

As I understand it, Tensorflow tries to place the tensors on the GPU, but for whatever reason it does not work (as you said, I'm a pathologist). Note that I have an RTX 4090 in a Windows setup and I have not encountered similar issues when training deep learning models.

So by forcing Tensorflow to use the CPU with:

with tf.device('/CPU:0'):
    tf_normalizer.fit(T(target))
    result_tf, _, _ = tf_normalizer.normalize(I=t_to_transform, stains=True)

It works as intended.

Should it also work with the GPU?
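
One way to narrow this down is to check whether the crash is independent of torchstain. The log above shows the failure while creating a cuSOLVER handle, so a small linear-algebra op placed on a chosen device may be enough to reproduce it. The sketch below is an assumption-laden probe, not a confirmed diagnosis: `eigh_on` is a hypothetical helper, and it assumes that `tf.linalg.eigh` on the GPU goes through cuSOLVER on this setup.

```python
import tensorflow as tf


def eigh_on(device: str):
    """Run a tiny symmetric eigendecomposition on the given device."""
    with tf.device(device):
        matrix = tf.constant([[2.0, 0.0], [0.0, 3.0]])
        eigenvalues, _ = tf.linalg.eigh(matrix)
    return eigenvalues.numpy()


# The CPU path is expected to work everywhere.
cpu_values = eigh_on("/CPU:0")

# On the affected machine, calling eigh_on("/GPU:0") in a fresh process is the
# actual probe. It is left out here because a failed cuSOLVER check aborts
# the whole process instead of raising a Python exception.
```

If the GPU call alone kills the kernel, the problem would lie in the TensorFlow/CUDA installation rather than in torchstain.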

@andreped
Collaborator

andreped commented Feb 26, 2023

I was unable to reproduce your issue. See gist.
As you can see from the gist, it works just fine with GPU, also for TF backend.

What you are observing is, I'm guessing, related to the TensorFloat-32 message you are seeing, which I have not seen before. This likely happens because you have a very new GPU (RTX 4090), which I would think might cause some issues.

First, I would try disabling TensorFloat-32 by adding this to the top of your script (after the tf import): tf.config.experimental.enable_tensor_float_32_execution(False)

If that did not fix the issue, try installing the nightly release of TF to see if this has been fixed recently:

pip uninstall tensorflow && pip install tf-nightly

@bertrandchauveau
Author

Thank you for your response. I agree that it works nicely in Colab.

On my local machine, I disabled TensorFloat-32 with:
tf.config.experimental.enable_tensor_float_32_execution(False)

But the kernel still crashes when fitting the normalizer.
Upgrading tensorflow won’t be as simple as that, since I am currently running on native Windows and tf 2.10.0 was the last version that allowed this according to the tf documentation. Upgrading would require using WSL2, but I am not ready for that right now.

My initial idea (perhaps not a good one) for my project was to use torchstain to normalize images on the fly using a custom data generator, this to avoid the duplication of the dataset (normalized and non-normalized).

For now, I will duplicate my dataset, as relying on the CPU for normalization slows down batch preparation considerably. I’ll give it another try when I’m ready to upgrade tensorflow, or I will try pytorch, which seems less Windows-phobic.

@carloalbertobarbano
Member

Hi @bertrandchauveau, what version of CUDA and cuDNN are you using?

@andreped
Collaborator

andreped commented Mar 2, 2023

> My initial idea (perhaps not a good one) for my project was to use torchstain to normalize images on the fly using a custom data generator, this to avoid the duplication of the dataset (normalized and non-normalized).

That's exactly what I do in my training frameworks and it works just fine. As long as you are using tf.data.Dataset and take advantage of multithreading, there is barely any lag :] But I guess it depends on how much lag you expect and can tolerate, how large the images are, which CPU and SSD/HDD you have, and whatnot.
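
A minimal sketch of that pattern, under stated assumptions: `stain_normalize` is a hypothetical stand-in that only rescales intensities (in practice it would call an already fitted CPU-side normalizer), and the random array stands in for image patches loaded from disk. The point is the pipeline shape: a parallel `map` followed by `batch` and `prefetch`, so normalization overlaps with training.

```python
import numpy as np
import tensorflow as tf


def stain_normalize(image: np.ndarray) -> np.ndarray:
    """Hypothetical placeholder: rescale to 0-255 instead of real stain normalization."""
    image = image.astype("float32")
    span = image.max() - image.min()
    if span == 0:
        return np.zeros_like(image)
    return (image - image.min()) / span * 255.0


def tf_stain_normalize(image):
    # Wrap the NumPy function so it can run inside the tf.data pipeline.
    out = tf.numpy_function(stain_normalize, [image], tf.float32)
    out.set_shape(image.shape)  # numpy_function drops static shape information
    return out


# Synthetic patches (assumption: real data would be read from files).
images = np.random.default_rng(0).integers(0, 256, size=(8, 32, 32, 3), dtype=np.uint8)

dataset = (
    tf.data.Dataset.from_tensor_slices(images)
    .map(tf_stain_normalize, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(4)
    .prefetch(tf.data.AUTOTUNE)
)

batches = [batch.numpy() for batch in dataset]
```

Because the mapped function runs outside the TensorFlow graph via `tf.numpy_function`, how well it parallelizes depends on the normalizer and the platform, so the actual overhead should be measured on the target setup.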

I don't really work on Windows for training models anymore. Note that multithreading does not work as well on Windows as on UNIX-based systems.

> Hi @bertrandchauveau, what version of CUDA and cuDNN are you using?

I guess as you seem to be using anaconda, you have installed CUDA through something like this. As I said, I don't have that much experience with conda, as I don't use it myself, but I guess @carloalbertobarbano can help you on that.

@bertrandchauveau
Author

Hi @carloalbertobarbano,
cudatoolkit 11.2.2
cudnn 8.1.0.77
Exactly, installed via conda

@andreped
Collaborator

@bertrandchauveau Are you still experiencing issues?

@bertrandchauveau
Author

Hi,
Thank you for your message and sorry for my late reply. Since my last message:

  • I installed torchstain 1.3.0
  • kernel still crashes when using the Macenko approach, in fact now when calling:
    torchstain.normalizers.MacenkoNormalizer(backend='tensorflow')
    Same error message as before.
  • Using the modified Reinhard method on a single image, it sometimes worked with the GPU and sometimes crashed. I did not have time to explore this more.

It works when I force torchstain to work on the CPU. With tf.data.Dataset, it is true that there is not much lag during pure training (about +10% for me as compared to no stain normalization) but the validation step after each training epoch is much longer.

  • As you suggested, I tried to install the latest tf 2.12 on WSL, but so far I have failed: I get seemingly endless error messages just trying to get tf to work and recognize the GPU...

I should have a bit more time this week to see why sometimes it seems to work with the modified Reinhard method.

@andreped
Collaborator

andreped commented Apr 23, 2023

> As you suggested, I tried to install the latest tf 2.12 on WSL, but so far I have failed: I get seemingly endless error messages just trying to get tf to work and recognize the GPU...

AFAIK, there does not yet exist a precompiled binary of tf 2.12 on windows, so I believe that might result in some issues. But if you are using WSL it should work better. You could post the error messages you are getting and I could try to debug it for you. Note that I believe you need a nightly release, as the GPU you have might be too new, as discussed above.

> I should have a bit more time this week to see why sometimes it seems to work with the modified Reinhard method.

Why it sometimes works and sometimes fails does not make much sense to me. Have you tried not using Anaconda and just regular Python virtual environments? You will need to set up CUDA yourself then.
