Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What does compressed_axes = None mean? #535

Open
hameerabbasi opened this issue Jan 6, 2022 · 4 comments
Open

What does compressed_axes = None mean? #535

hameerabbasi opened this issue Jan 6, 2022 · 4 comments
Assignees

Comments

@hameerabbasi
Copy link
Collaborator

I was going through the GCXS code trying to fix bugs when I came across a few places where compressed_axes was being set to None. That, in my opinion, should never happen. It should always be a tuple of integers.

@daletovar Could you specify the context here, and what could be needed to remove this?

@daletovar
Copy link
Collaborator

There are a few cases where compressed_axes is set to None. In the case of 0- or 1-dimensional GCXS array, there are no axes to compress so we set compressed_axes to None. In the constructor and many of the array methods like resize and reshape, compressed_axes defaults to None. This is a way for the user to indicate that they don't have a preference. This does not mean that the resulting array will have arr.compressed_axes is None. Instead, we generally do something like the following:

if compressed_axes is None:
    compressed_axes = (np.argmin(shape),) # use compressed_axes that results in best compression

I think that is makes sense to have a default value for the cases that users don't have a preference. I used None for that value.

On the one hand I don't think it's really an issue because any array with ndim > 1 is going to have a compressed_axes that is a tuple of integers:

>>> a = sparse.random((5,6,7), format='gcxs', compressed_axes=None) 
>>> a.compressed_axes
(0,)

On the other hand I can see how such behavior could be confusing. In terms of removing it, I think that it should stay for 0- and 1-dimensional arrays. For higher dimensional arrays, we could pick a different default value - something like compressed_axes='opt' to indicate compressed_axes with the optimal compression.

@hameerabbasi
Copy link
Collaborator Author

There are a few cases where compressed_axes is set to None. In the case of 0- or 1-dimensional GCXS array, there are no axes to compress so we set compressed_axes to None.

I think this may mislead people, and make it hard to get at the underlying format. Would compressed_axes in this case be () or (0,)? I think you mean there's very little choice in how things are stored, but that doesn't necessarily mean that there isn't a valid representation in terms of tuples of integers.

In the constructor and many of the array methods like resize and reshape, compressed_axes defaults to None. This is a way for the user to indicate that they don't have a preference. This does not mean that the resulting array will have arr.compressed_axes is None.

This seems reasonable to me, as long as there is a concrete compressed_axes specified in the underlying array.

I think that it should stay for 0- and 1-dimensional arrays.

As said above we could use either () or (0,) as appropriate.

For higher dimensional arrays, we could pick a different default value - something like compressed_axes='opt' to indicate compressed_axes with the optimal compression.

I think None is a perfectly reasonable value to indicate "I don't have a preference here."

@daletovar
Copy link
Collaborator

Okay, that makes sense to me. I like () for 0- and 1-dimensional arrays.

@mangecoeur
Copy link
Contributor

Just bumped into this, there are cases e.g. in to_scipy_sparse that assume non-None compressed axes, but a GCXS array can still be constructed with compressed_axes=None

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants