Constructing GCXS from non-canonical scipy.sparse.csr_matrix results in wrong results
#602
Comments
Thanks for the detailed report! Since this is silent invalid results, I'd be inclined to fix it soon.
Hello, what exactly do you mean by canonical here? Can this be reproduced without |
It can be reproduced if the data within a row is not sorted by column index.
See https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.has_canonical_format.html#scipy.sparse.csr_matrix.has_canonical_format for the definition of "canonical"!
This code is a compact reproducer without cupy: https://github.com/LiberTEM/sparseconverter/blob/4cfc0ee2ad4c37b07742db8f3643bcbd858a4e85/src/sparseconverter/__init__.py#L154-L183
Describe the bug
GCXS seems to require a canonical CSR data structure, i.e. column indices sorted within a row and without duplicates. Slicing a GCXS array with non-canonical internal data structure gives wrong results. The requirement for canonical data structures is not documented: https://sparse.pydata.org/en/stable/generated/sparse.GCXS.html
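To make "column indices sorted within a row and without duplicates" concrete, here is a minimal pure-Python sketch of that check on raw CSR index arrays. The helper name is_canonical_csr is made up for illustration and is not part of sparse or SciPy; it only covers the two properties named above.

```python
from typing import Sequence


def is_canonical_csr(indptr: Sequence[int], indices: Sequence[int]) -> bool:
    """Return True if every row's column indices are strictly increasing.

    Strictly increasing means sorted and free of duplicates. This is an
    illustrative helper, not an API of sparse or SciPy.
    """
    for row in range(len(indptr) - 1):
        start, stop = indptr[row], indptr[row + 1]
        cols = indices[start:stop]
        # Any neighbouring pair that is equal or decreasing breaks the rule.
        if any(a >= b for a, b in zip(cols, cols[1:])):
            return False
    return True


# Two CSR encodings of the same 2x3 matrix [[5, 0, 7], [0, 9, 0]].
indptr = [0, 2, 3]
canonical_indices = [0, 2, 1]  # matching data would be [5, 7, 9]
unsorted_indices = [2, 0, 1]   # matching data would be [7, 5, 9]

print(is_canonical_csr(indptr, canonical_indices))
print(is_canonical_csr(indptr, unsorted_indices))
```

Both encodings describe the same matrix; only the second one stores row 0 with column 2 before column 0, which is the kind of layout this report is about.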
scipy.sparse.csr_matrix doesn't require its data structure to be canonical. The GCXS constructor doesn't seem to check whether a passed scipy.sparse.csr_matrix is canonical. In effect, constructing a GCXS array from a perfectly valid scipy.sparse.csr_matrix may create a broken GCXS array which gives wrong numerical results. Using the explicit GCXS.from_scipy_sparse() gives the same behavior.
To Reproduce
This example uses cupyx.scipy.sparse.csr_matrix since that produces non-canonical CSR data structures when constructed from dense arrays. This highlights the severity of this bug, where normal use of the APIs of these packages gives numerically wrong results without causing any warning or error. In my understanding the bug is mostly in GCXS, since scipy.sparse.csr_matrix and cupyx.scipy.sparse.csr_matrix work perfectly fine with non-canonical data structures.
Edit: Include output
Expected behavior
The expected behavior can be split into two aspects, of which one is preventing a bug with serious impact on downstream code, and the other a feature request.
First, constructing a GCXS array from a valid scipy.sparse.csr_matrix that GCXS can't handle correctly should throw an error instead of producing a GCXS array that gives wrong results.
Second, it would be good if GCXS could be constructed successfully from a non-canonical scipy.sparse.csr_matrix.
System
sparse version (sparse.__version__): '0.14.0'
NumPy version (np.__version__): '1.24.4'
Numba version (numba.__version__): '0.57.1'
CuPy version (cupy.__version__): '12.1.0'
Additional context
At LiberTEM/sparseconverter#14 I am currently working on tests and workarounds. Perhaps the test matrix in https://github.com/LiberTEM/sparseconverter/blob/main/tests/test_sparseconverters.py could be useful here?
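As a rough idea of what a caller-side workaround could look like (not necessarily what sparseconverter does), the sketch below canonicalizes a SciPy CSR matrix before it is handed to GCXS. The function name canonical_csr and its strict flag are assumptions made for this example; has_canonical_format, copy() and sum_duplicates() are existing SciPy members.

```python
import numpy as np
from scipy.sparse import csr_matrix


def canonical_csr(matrix: csr_matrix, strict: bool = False) -> csr_matrix:
    """Hypothetical guard: return a canonical matrix, or raise if strict."""
    if matrix.has_canonical_format:
        return matrix
    if strict:
        raise ValueError("CSR matrix is not in canonical format")
    fixed = matrix.copy()   # leave the caller's matrix untouched
    fixed.sum_duplicates()  # in place: sorts indices per row, merges duplicates
    return fixed


# Hand-built non-canonical CSR for [[5, 0, 7], [0, 9, 0]]:
# row 0 stores column 2 before column 0.
data = np.array([7.0, 5.0, 9.0])
indices = np.array([2, 0, 1])
indptr = np.array([0, 2, 3])
raw = csr_matrix((data, indices, indptr), shape=(2, 3))

fixed = canonical_csr(raw)
# `fixed` is what one would then pass to sparse.GCXS.from_scipy_sparse().
```

With strict=True the same helper mirrors the first expectation above (fail loudly); the default mirrors the second (accept non-canonical input by normalizing it first).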