
Whitening Update #79

Merged · 21 commits · Dec 2, 2024
Conversation

HarrisonSantiago
Collaborator

  • Whiten.py operates on data at the [N, D] level
  • Images.py is a high-level wrapper with additional support for whitening images
  • Renamed Patches.py -> Images.py
  • Updated paths in misc example notebooks
  • Created a new example notebook showing how to use whitening

@HarrisonSantiago HarrisonSantiago added the enhancement New feature or request label Nov 23, 2024
This was linked to issues Nov 23, 2024
@belsten
Collaborator

belsten commented Nov 27, 2024

I've noticed an issue with how the ZCA/PCA whitening is done.

As of right now, compute_whitening_stats masks out the eigenvalues/vectors based on n_components. That masking should occur after the inversion of the eigenvalues (i.e. in whiten), and the mask only needs to be applied to the eigenvalues. Right now, zeros are being inverted in whiten, resulting in NaNs.

One way to solve this is to pass n_components to whiten, which would also be better because it would then sit alongside epsilon, a parameter with a similar function. whiten would then perform the logic of keeping however many PCs via masking.

Here is a minimal working example that demonstrates the issue:

import numpy as np
import torch

A = torch.from_numpy(np.asarray([
    [1, 0.2, 0.4],
    [0.2, 1, 0.2],
    [0.4, 0.2, 1],
])).float()
data = torch.randn(10000, 3) @ A.T

# set n_components less than 3 and epsilon=0
whitened_stats = compute_whitening_stats(data, n_components=2)
whiten(data, algorithm="zca", stats=whitened_stats)

Output:

tensor([[nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan],
        ...,
        [nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan]])
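A minimal sketch of the proposed fix (names and signatures here are illustrative, not the library's actual API): keep the full eigendecomposition in compute_whitening_stats, and apply the n_components mask to the already-inverted eigenvalues inside whiten, so dropped components become zeros instead of divisions by zero:

```python
import torch

def compute_whitening_stats(X):
    # Keep the full eigendecomposition; defer any truncation to whiten().
    eigvals, eigvecs = torch.linalg.eigh(torch.cov(X.T))
    # eigh returns ascending eigenvalues; flip to descending PC order.
    return {"mean": X.mean(0),
            "eigenvalues": eigvals.flip(0),
            "eigenvectors": eigvecs.flip(1)}

def whiten(X, stats, n_components=None, epsilon=0.0):
    vals, vecs = stats["eigenvalues"], stats["eigenvectors"]
    inv_sqrt = 1.0 / torch.sqrt(vals + epsilon)
    if n_components is not None:
        # Mask only the (already inverted) eigenvalues: discarded
        # components become 0 instead of 1/0 = inf, so no NaNs appear.
        mask = torch.zeros_like(inv_sqrt)
        mask[:n_components] = 1.0
        inv_sqrt = inv_sqrt * mask
    Xc = X - stats["mean"]
    # ZCA: rotate into the eigenbasis, rescale, rotate back.
    return Xc @ vecs @ torch.diag(inv_sqrt) @ vecs.T
```

With this ordering, the n_components=2 example above returns finite values: the dropped component is projected out rather than turned into NaNs.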

Collaborator

@belsten belsten left a comment

See latest comment regarding PCA/ZCA whitening.

@HarrisonSantiago
Collaborator Author

Thanks for catching that! Moved the masking into whiten, and it now masks only the eigenvalues. Made corresponding changes in the accompanying notebook and images.py.

Collaborator

I think this could be unstable if the covariance matrix is ill-conditioned? Since L is triangular, could you use https://pytorch.org/docs/stable/generated/torch.linalg.solve_triangular.html?

Collaborator Author

@HarrisonSantiago HarrisonSantiago Nov 27, 2024

Is switching to cholesky-stable what you were thinking? https://colab.research.google.com/drive/1W8FdF-K6hYE3W0wOmRSGVQv9kPLaRH18?usp=sharing

Collaborator

Thanks for making the notebook, I should've been a good reviewer like Alex and given you an example to start with 😅.

What I had in mind initially was
X_whitened = torch.linalg.solve_triangular(L, X.T, upper=False).T
to solve for the whitened data directly, but I think what you have works as well, plus it's consistent with the other implementations. And maybe it's nice to compute W explicitly if we want to return it in the future (e.g. to whiten other similar data).

Looks like the errors are actually not too far off for the two methods, but using solve_triangular is ~2x faster (added some timing code): https://colab.research.google.com/drive/1CIfNaWEbOfU9R8WeZYEQ4XepJU6ZWIRy?usp=sharing

And here's a nice writeup I found about matrix inversions in practice: https://gregorygundersen.com/blog/2020/12/09/matrix-inversion/
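For reference, the solve_triangular approach can be sketched end to end (a standalone demo using the covariance from the earlier example, not the package's API):

```python
import torch

torch.manual_seed(0)
A = torch.tensor([[1.0, 0.2, 0.4], [0.2, 1.0, 0.2], [0.4, 0.2, 1.0]])
X = torch.randn(10000, 3) @ A.T
X = X - X.mean(0)

# Cholesky factor of the covariance: cov = L @ L.T, with L lower-triangular.
L = torch.linalg.cholesky(torch.cov(X.T))

# Whiten by solving L @ Y = X.T rather than forming L^-1 explicitly;
# a triangular solve is cheaper and avoids an explicit matrix inverse.
X_whitened = torch.linalg.solve_triangular(L, X.T, upper=False).T

# The covariance of the whitened data should be close to the identity.
cov_w = torch.cov(X_whitened.T)
```

Since the whitener is built from the same empirical covariance it is applied to, cov_w matches the identity up to floating-point error.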

Collaborator

I also agree that an option to return the whitening matrix would be useful. It's helpful if you're converting between whitened/unwhitened data or want to visualize the whitening transform.

Collaborator Author

Thanks for sharing the write-up! I went with solve_triangular, but not solving for X_whitened directly; I like having the option of returning the whitening matrix for all of the methods.
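One way to get the whitening matrix explicitly while still avoiding a general matrix inverse (an illustrative sketch, not necessarily the merged implementation) is to run the triangular solve against the identity:

```python
import torch

torch.manual_seed(0)
A = torch.tensor([[1.0, 0.2, 0.4], [0.2, 1.0, 0.2], [0.4, 0.2, 1.0]])
X = torch.randn(5000, 3) @ A.T
X = X - X.mean(0)
L = torch.linalg.cholesky(torch.cov(X.T))

# Solve L @ W = I with a triangular solve to get W = L^-1 without
# calling torch.linalg.inv; W can be returned and reused on new data.
W = torch.linalg.solve_triangular(L, torch.eye(3), upper=False)
X_whitened = X @ W.T
```

This keeps the numerical benefit of the triangular solve while making W available to callers, e.g. to whiten other similar data or to invert the transform.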

def whiten(X: torch.Tensor,
           algorithm: str = 'zca',
           stats: Dict = None,
           n_components=None,
Collaborator

nit

Suggested change:
- n_components=None,
+ n_components = None,

Collaborator Author

Done!

----------
Whitened data of shape [N, D]

Notes
Collaborator

I think it would be helpful for documentation to

  • add a few words (super short) on why someone would choose PCA or Cholesky over the default ZCA, then direct them to the paper or stackexchange for more details.
  • add a short example usage in a code block that calls compute_whitening_stats and then whiten (not sure how to format this for Sphinx @belsten). Maybe this is excessive, but I think we should be clear that these are separate functions. Or we could mention first calling compute_whitening_stats in the stats parameter field.

What do you think?
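On the Sphinx question: one common option is a NumPy-style Examples section in the docstring, which sphinx.ext.napoleon renders directly (the signature below is assumed from this thread, and the body is omitted):

```python
def whiten(X, algorithm="zca", stats=None, n_components=None):
    """Whiten data of shape [N, D].

    Examples
    --------
    >>> stats = compute_whitening_stats(data)
    >>> whitened = whiten(data, algorithm="zca", stats=stats)
    """
```

The doctest-style lines make the two-step call sequence explicit right in the rendered docs.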

Collaborator Author

I cover all of that in the whitening notebook that'll be added to the documentation. What about adding that notebook to the references to encourage people to check it out?

Collaborator

A pointer to the notebook would work instead IMO! And it avoids bloat in the docstring.

Collaborator Author

Done!

Collaborator

@9q9q 9q9q Nov 27, 2024

Please also add tests for compute_whitening_stats and whitening using PCA and Cholesky.
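A property such tests might check (a standalone sketch that inlines a reference Cholesky whitener rather than importing the package): the covariance of whitened data should be close to the identity.

```python
import torch

def cholesky_whiten(X):
    # Reference whitener used only for this sketch; the real tests would
    # import compute_whitening_stats/whiten from the package instead.
    Xc = X - X.mean(0)
    L = torch.linalg.cholesky(torch.cov(Xc.T))
    return torch.linalg.solve_triangular(L, Xc.T, upper=False).T

def test_whitened_covariance_is_identity():
    torch.manual_seed(0)
    A = torch.tensor([[1.0, 0.2, 0.4], [0.2, 1.0, 0.2], [0.4, 0.2, 1.0]])
    X = torch.randn(10000, 3) @ A.T
    Xw = cholesky_whiten(X)
    assert torch.allclose(torch.cov(Xw.T), torch.eye(3), atol=1e-4)
```

The same assertion works for ZCA and PCA whitening, since all three should produce unit covariance (PCA up to the retained components).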

Collaborator Author

Done!

@HarrisonSantiago HarrisonSantiago merged commit 08068b8 into main Dec 2, 2024
2 checks passed
@HarrisonSantiago HarrisonSantiago deleted the harrison/whiten branch December 2, 2024 23:57
Labels
enhancement New feature or request

Successfully merging this pull request may close these issues:
Image Whitening implement torchvision data transform