Apply transforms in PreProcessor #2467

Open · wants to merge 23 commits into base: release/v2.0.0

Conversation

djdameln (Contributor)

📝 Description

  • Delegate the responsibility of applying the input transforms to the PreProcessor. This brings it in line with the other auxiliary components, such as post-processing and evaluation, where the full functionality of the auxiliary operation is contained in a single class.
  • To account for datasets with heterogeneous image shapes, we need to perform an additional check in the collate method and resize the images before collating if not all images have the same shape (see the sketch after this list).
  • Thorough manual testing may be required to check for any unwanted side-effects.
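
A rough sketch of the collate-time check described above (hypothetical names and structure; the PR's actual collate logic may differ):

import torch
from torchvision.transforms.v2.functional import resize


def collate_images(images: list[torch.Tensor]) -> torch.Tensor:
    """Stack images into a batch, resizing first if shapes are heterogeneous."""
    # Use the first image's spatial size as the common target.
    target_size = images[0].shape[-2:]
    if any(image.shape[-2:] != target_size for image in images):
        images = [resize(image, list(target_size)) for image in images]
    return torch.stack(images)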

✨ Changes

Select what type of change your PR is:

  • 🐞 Bug fix (non-breaking change which fixes an issue)
  • 🔨 Refactor (non-breaking change which refactors the code base)
  • 🚀 New feature (non-breaking change which adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📚 Documentation update
  • 🔒 Security update

✅ Checklist

Before you submit your pull request, please make sure you have completed the following steps:

  • 📋 I have summarized my changes in the CHANGELOG and followed the guidelines for my type of change (skip for minor changes, documentation updates, and test enhancements).
  • 📚 I have made the necessary updates to the documentation (if applicable).
  • 🧪 I have written tests that support my changes and prove that my fix is effective or my feature works (if applicable).

For more information about code review checklists, see the Code Review Checklist.

@alexriedel1 (Contributor)

Yes, I also thought about the problem, and what you now did (adding augmentations to the dataset) is the only solution to this. I would also say it's common and good practice in other frameworks.

@djdameln (Contributor, Author)

Yes, I also thought about the problem, and what you now did (adding augmentations to the dataset) is the only solution to this. I would also say it's common and good practice in other frameworks.

@alexriedel1 As you've seen, I re-introduced transforms on the dataset/datamodule side, now named augmentations.

Motivation:

In addition to the transforms applied by the PreProcessor, we now also have augmentation transforms on the dataset/datamodule side. The two serve different purposes: augmentations are optional and can be used to enrich the dataset, e.g. random transforms on the training set or test-time augmentation on the test set. PreProcessor transforms are model-specific and ensure that the input images are read correctly by the model. Usually this involves resizing and normalization, but other transforms are possible as well.

This way, we keep the model-specific transforms on the model side, which has the advantages mentioned in my earlier comment. At the same time, the augmentation transforms on the datamodule side make it easier for the user to define their custom augmentation pipeline. The main advantage over earlier designs is that the user now only has to define the augmentations, and does not need to include the model-specific transforms such as resizing and normalization (because these are handled separately by the PreProcessor).
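
To illustrate the split from the user's perspective, a sketch only: the argument names (augmentations, pre_processor, transform) and import paths follow this discussion and may differ in the final API.

from torchvision.transforms.v2 import RandomHorizontalFlip, Resize

from anomalib.data import MVTec
from anomalib.models import Padim
from anomalib.pre_processing import PreProcessor

# User-defined augmentations: optional, applied by the dataset/datamodule.
datamodule = MVTec(augmentations=RandomHorizontalFlip(p=0.5))

# Model-specific transforms: applied by the PreProcessor inside the model.
model = Padim(pre_processor=PreProcessor(transform=Resize((256, 256))))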

Concerning resizing:

  • When no augmentations are supplied, resizing only happens on the PreProcessor side, using the model-specific transforms defined for the model, or any custom resize/transform passed to the PreProcessor by the user. When the user wants to change the image size as seen by the model, this is the resize transform they need to change.
  • Applying augmentations to the images at full resolution may be costly, so the user may want to include a Resize transform in their augmentation pipeline to speed up data loading. In this case, resizing happens twice: once when the dataset applies the augmentations before collating, and once when the PreProcessor applies its transforms after collating. This may be computationally expensive, as you mentioned earlier, but only when the two resize transforms use different output sizes. When they use the same size, the second one is skipped.
  • In the case of heterogeneous image sizes in the dataset, the user can now just pass a Resize transform to the augmentations argument of the datamodule (or dataset). This ensures that the dataset resizes the images before collating, so the additional resize operation in the collate method is not called. (We still keep the resize operation in the collate method as an additional safeguard.) This also gives the user control over the interpolation method used for resizing the images; see the sketch below.
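
For example, with a dataset of mixed image sizes, the user could resize during augmentation and pick the interpolation method explicitly (same naming assumptions as the sketch above):

from torchvision.transforms.v2 import InterpolationMode, Resize

# Resizing as an augmentation: images reach the collate method with a uniform
# shape, and when the size matches the PreProcessor's Resize, the second
# resize is skipped.
augmentations = Resize((256, 256), interpolation=InterpolationMode.BICUBIC)
datamodule = MVTec(augmentations=augmentations)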

Please let me know what you think, any suggestions are more than welcome :)

@alexriedel1 (Contributor) commented Dec 19, 2024

[quoting djdameln's comment above in full]

Yes, I (and most users too, I guess) am happy with this! You should make it clear in the docs that preprocessing transforms are stored in the model after export while augmentations are not, so people know exactly which one to use for their use case.

codecov bot commented Dec 19, 2024

Codecov Report

Attention: Patch coverage is 96.21212% with 5 lines in your changes missing coverage. Please review.

Project coverage is 78.62%. Comparing base (58453ef) to head (bebe1ea).
Report is 3 commits behind head on release/v2.0.0.

Files with missing lines                                Patch %   Missing
src/anomalib/data/datasets/base/video.py                66.66%    2 ⚠️
src/anomalib/data/datasets/base/image.py                75.00%    1 ⚠️
...malib/models/image/efficient_ad/lightning_model.py   66.66%    1 ⚠️
...c/anomalib/models/image/winclip/lightning_model.py   0.00%     1 ⚠️
Additional details and impacted files
@@                Coverage Diff                 @@
##           release/v2.0.0    #2467      +/-   ##
==================================================
+ Coverage           78.50%   78.62%   +0.11%     
==================================================
  Files                 303      306       +3     
  Lines               12955    12947       -8     
==================================================
+ Hits                10170    10179       +9     
+ Misses               2785     2768      -17     
Flag                 Coverage Δ
integration_py3.10   ?

Flags with carried forward coverage won't be shown.


@ashwinvaidya17 (Collaborator) left a comment

Thanks for the massive efforts here. Looks good to me.

from typing import Any


def get_nested_attr(obj: Any, attr_path: str, default: Any | None = None) -> Any: # noqa: ANN401
@samet-akcay (Contributor) commented Jan 3, 2025

Can you elaborate why this is needed? Maybe add more description/example to the docstring?

@djdameln (Contributor, Author)

I updated the description and added some examples
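
For context, a minimal sketch of what such a helper might look like (illustrative; the implementation and docstring in the PR may differ):

from typing import Any


def get_nested_attr(obj: Any, attr_path: str, default: Any | None = None) -> Any:
    """Follow a dot-separated attribute path, returning default if any step is missing."""
    for attr_name in attr_path.split("."):
        obj = getattr(obj, attr_name, default)
        if obj is default:
            break
    return obj


# Example: reach a nested attribute without chained hasattr/getattr checks.
class _Inner:
    value = 42


class _Outer:
    inner = _Inner()


assert get_nested_attr(_Outer(), "inner.value") == 42
assert get_nested_attr(_Outer(), "inner.missing") is None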

from torchvision.transforms.v2 import Compose, Transform


def get_transforms_of_type(input_transform: Transform | None, transform_type: type[Transform]) -> list[type[Transform]]:
@samet-akcay (Contributor) commented Jan 3, 2025

Suggested change:
- def get_transforms_of_type(input_transform: Transform | None, transform_type: type[Transform]) -> list[type[Transform]]:
+ def get_transforms_by_type(transforms: Transform | None, transform_type: type[Transform]) -> list[type[Transform]]:

or filter_transforms_by_type?

@djdameln (Contributor, Author)

For filter_transforms_by_type I would expect the input to be a sequence of transform objects.

This is not entirely accurate, since the function extracts the transforms from a single other transform, which could be an actual transform or a Compose.

Maybe extract_transforms_by_type?
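
A sketch of how extract_transforms_by_type could work, recursing into Compose (illustrative; note it returns transform instances, which the return annotation above arguably should reflect as well):

import torch
from torchvision.transforms.v2 import Compose, Resize, ToDtype, Transform


def extract_transforms_by_type(
    input_transform: Transform | None,
    transform_type: type[Transform],
) -> list[Transform]:
    """Flatten any Compose and collect the transforms of the requested type."""
    transforms: list[Transform] = []
    if isinstance(input_transform, Compose):
        # Recurse into the container so nested Compose objects are handled too.
        for transform in input_transform.transforms:
            transforms.extend(extract_transforms_by_type(transform, transform_type))
    elif isinstance(input_transform, transform_type):
        transforms.append(input_transform)
    return transforms


pipeline = Compose([Resize((256, 256)), ToDtype(torch.float32)])
assert len(extract_transforms_by_type(pipeline, Resize)) == 1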
