Apply transforms in PreProcessor #2467

Open · wants to merge 23 commits into base: release/v2.0.0

Conversation

djdameln (Contributor)

📝 Description

  • Delegate the responsibility of applying the input transforms to the PreProcessor. This brings it in line with the other auxiliary components, such as post-processing and evaluation, where the full functionality of the auxiliary operation is contained in a single class.
  • To account for datasets with heterogeneous image shapes, we need to perform an additional check in the collate method and resize the images before collating if not all images have the same shape (see the sketch after this list).
  • Thorough manual testing may be required to check for any unwanted side-effects.
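
A rough sketch of the collate-time check described above (hypothetical names and structure; the PR's actual collate logic may differ):

import torch
from torchvision.transforms.v2.functional import resize


def collate_images(images: list[torch.Tensor]) -> torch.Tensor:
    """Stack images into a batch, resizing first if shapes are heterogeneous."""
    # Use the first image's spatial size as the common target.
    target_size = images[0].shape[-2:]
    if any(image.shape[-2:] != target_size for image in images):
        images = [resize(image, list(target_size)) for image in images]
    return torch.stack(images)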

✨ Changes

Select what type of change your PR is:

  • 🐞 Bug fix (non-breaking change which fixes an issue)
  • 🔨 Refactor (non-breaking change which refactors the code base)
  • 🚀 New feature (non-breaking change which adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📚 Documentation update
  • 🔒 Security update

✅ Checklist

Before you submit your pull request, please make sure you have completed the following steps:

  • 📋 I have summarized my changes in the CHANGELOG and followed the guidelines for my type of change (skip for minor changes, documentation updates, and test enhancements).
  • 📚 I have made the necessary updates to the documentation (if applicable).
  • 🧪 I have written tests that support my changes and prove that my fix is effective or my feature works (if applicable).

For more information about code review checklists, see the Code Review Checklist.

@alexriedel1 (Contributor)

Yes, I also thought about the problem, and what you now did (adding augmentations to the dataset) is the only solution to this. I would also say it's common and good practice in other frameworks.

@djdameln (Contributor, Author)

Yes, I also thought about the problem, and what you now did (adding augmentations to the dataset) is the only solution to this. I would also say it's common and good practice in other frameworks.

@alexriedel1 As you've seen, I re-introduced transforms on the dataset/datamodule side, now named augmentations.

Motivation:

In addition to the transforms applied by the PreProcessor, we now also have augmentation transforms on the dataset/datamodule side. The two serve different purposes: augmentations are optional and can be used to enrich the dataset, e.g. random transforms on the training set or test-time augmentation on the test set. PreProcessor transforms are model-specific and ensure that the input images are read correctly by the model. Usually this involves resizing and normalization, but other transforms are possible as well.

This way, we keep the model-specific transforms on the model side, which has the advantages mentioned in my earlier comment. At the same time, the augmentation transforms on the datamodule side make it easier for the user to define their custom augmentation pipeline. The main advantage over earlier designs is that the user now only has to define the augmentations, and does not need to include the model-specific transforms such as resizing and normalization (because these are handled separately by the PreProcessor).
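
To illustrate the split from the user's perspective, a sketch only: the argument names (augmentations, pre_processor, transform) and import paths follow this discussion and may differ in the final API.

from torchvision.transforms.v2 import RandomHorizontalFlip, Resize

from anomalib.data import MVTec
from anomalib.models import Padim
from anomalib.pre_processing import PreProcessor

# User-defined augmentations: optional, applied by the dataset/datamodule.
datamodule = MVTec(augmentations=RandomHorizontalFlip(p=0.5))

# Model-specific transforms: applied by the PreProcessor inside the model.
model = Padim(pre_processor=PreProcessor(transform=Resize((256, 256))))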

Concerning resizing:

  • When no augmentations are supplied, resizing only happens on the PreProcessor side, using the model-specific transforms defined for the model, or any custom resize/transform passed to the PreProcessor by the user. When the user wants to change the image size as seen by the model, this is the resize transform they need to change.
  • Applying augmentations to the images at full resolution may be costly, so the user may want to include a Resize transform in their augmentation pipeline to speed up data loading. In this case, resizing happens twice: once when the dataset applies the augmentations before collating, and once when the PreProcessor applies its transforms after collating. This may be computationally expensive, as you mentioned earlier, but only when the two resize transforms use different output sizes. When they use the same size, the second one is skipped.
  • In the case of heterogeneous image sizes in the dataset, the user can now just pass a Resize transform to the augmentations argument of the datamodule (or dataset). This ensures that the dataset resizes the images before collating, so the additional resize operation in the collate method is not called. (We still keep the resize operation in the collate method as an additional safeguard.) This also gives the user control over the interpolation method used for resizing the images; see the sketch below.
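
For example, with a dataset of mixed image sizes, the user could resize during augmentation and pick the interpolation method explicitly (same naming assumptions as the sketch above):

from torchvision.transforms.v2 import InterpolationMode, Resize

# Resizing as an augmentation: images reach the collate method with a uniform
# shape, and when the size matches the PreProcessor's Resize, the second
# resize is skipped.
augmentations = Resize((256, 256), interpolation=InterpolationMode.BICUBIC)
datamodule = MVTec(augmentations=augmentations)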

Please let me know what you think, any suggestions are more than welcome :)

@alexriedel1 (Contributor) commented Dec 19, 2024

[quoting djdameln's comment above in full]

Yes, I (and most users too, I guess) am happy with this! You should make it clear in the docs that preprocessing transforms are stored in the model after export while augmentations are not, so people know exactly which one to use for their use case.

codecov bot commented Dec 19, 2024

Codecov Report

Attention: Patch coverage is 96.21212% with 5 lines in your changes missing coverage. Please review.

Project coverage is 78.62%. Comparing base (58453ef) to head (bebe1ea).
Report is 3 commits behind head on release/v2.0.0.

Files with missing lines                                Patch %   Missing
src/anomalib/data/datasets/base/video.py                66.66%    2 ⚠️
src/anomalib/data/datasets/base/image.py                75.00%    1 ⚠️
...malib/models/image/efficient_ad/lightning_model.py   66.66%    1 ⚠️
...c/anomalib/models/image/winclip/lightning_model.py   0.00%     1 ⚠️
Additional details and impacted files
@@                Coverage Diff                 @@
##           release/v2.0.0    #2467      +/-   ##
==================================================
+ Coverage           78.50%   78.62%   +0.11%     
==================================================
  Files                 303      306       +3     
  Lines               12955    12947       -8     
==================================================
+ Hits                10170    10179       +9     
+ Misses               2785     2768      -17     
Flag                 Coverage Δ
integration_py3.10   ?

Flags with carried forward coverage won't be shown.


@ashwinvaidya17 (Collaborator) left a comment

Thanks for the massive efforts here. Looks good to me.

from typing import Any


def get_nested_attr(obj: Any, attr_path: str, default: Any | None = None) -> Any: # noqa: ANN401
@samet-akcay (Contributor) commented Jan 3, 2025

Can you elaborate why this is needed? Maybe add more description/example to the docstring?

@djdameln (Contributor, Author)

I updated the description and added some examples
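
For context, a minimal sketch of what such a helper might look like (illustrative; the implementation and docstring in the PR may differ):

from typing import Any


def get_nested_attr(obj: Any, attr_path: str, default: Any | None = None) -> Any:
    """Follow a dot-separated attribute path, returning default if any step is missing."""
    for attr_name in attr_path.split("."):
        obj = getattr(obj, attr_name, default)
        if obj is default:
            break
    return obj


# Example: reach a nested attribute without chained hasattr/getattr checks.
class _Inner:
    value = 42


class _Outer:
    inner = _Inner()


assert get_nested_attr(_Outer(), "inner.value") == 42
assert get_nested_attr(_Outer(), "inner.missing") is None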

from torchvision.transforms.v2 import Compose, Transform


def get_transforms_of_type(input_transform: Transform | None, transform_type: type[Transform]) -> list[type[Transform]]:
@samet-akcay (Contributor) commented Jan 3, 2025

Suggested change:
- def get_transforms_of_type(input_transform: Transform | None, transform_type: type[Transform]) -> list[type[Transform]]:
+ def get_transforms_by_type(transforms: Transform | None, transform_type: type[Transform]) -> list[type[Transform]]:

or filter_transforms_by_type?

@djdameln (Contributor, Author)

For filter_transforms_by_type I would expect the input to be a sequence of transform objects.

This is not entirely accurate, since the function extracts the transforms from a single other transform, which could be an actual transform or a Compose.

Maybe extract_transforms_by_type?
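
A sketch of how extract_transforms_by_type could work, recursing into Compose (illustrative; note it returns transform instances, which the return annotation above arguably should reflect as well):

import torch
from torchvision.transforms.v2 import Compose, Resize, ToDtype, Transform


def extract_transforms_by_type(
    input_transform: Transform | None,
    transform_type: type[Transform],
) -> list[Transform]:
    """Flatten any Compose and collect the transforms of the requested type."""
    transforms: list[Transform] = []
    if isinstance(input_transform, Compose):
        # Recurse into the container so nested Compose objects are handled too.
        for transform in input_transform.transforms:
            transforms.extend(extract_transforms_by_type(transform, transform_type))
    elif isinstance(input_transform, transform_type):
        transforms.append(input_transform)
    return transforms


pipeline = Compose([Resize((256, 256)), ToDtype(torch.float32)])
assert len(extract_transforms_by_type(pipeline, Resize)) == 1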
