Investigate returning ndarray when lazy_load=False #1787
Comments
+1, I have a use case where this is important for validation on read:
import asdf
import numpy as np

arr = np.arange(10)
af = asdf.AsdfFile({"array": np.arange(10)})
af.write_to("array.asdf")
with asdf.open("array.asdf") as af:
    # Fails today: the value read back is an NDArrayType, not a numpy.ndarray
    assert isinstance(af['array'], type(arr)), f"{type(af['array'])} is not {type(arr)}"
# AssertionError: <class 'asdf.tags.core.ndarray.NDArrayType'> is not <class 'numpy.ndarray'>
Although I would not implement validation like this, this is typically the assumption of serializers (e.g., Pydantic). |
Would a type union work? |
Yes it would, but it exposes If there are other lazy types, it won't scale well if I have to union every lazy-able tag. |
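To make the trade-off in this exchange concrete, here is a minimal standard-library sketch of the type-union approach. `EagerArray`, `LazyArray`, `ArrayLike`, and `validate` are hypothetical stand-ins invented for illustration (they are not asdf, NumPy, or Pydantic APIs); the point is only that a lazy wrapper which is not a subclass of the eager type fails a plain `isinstance` check and passes once it is added to a union.

```python
from typing import Union, get_args


class EagerArray(list):
    """Hypothetical stand-in for an eager array type such as numpy.ndarray."""


class LazyArray:
    """Hypothetical stand-in for a lazy wrapper; NOT a subclass of EagerArray."""

    def __init__(self, loader):
        self._loader = loader

    def materialize(self):
        # Load the data and return it as the eager type.
        return EagerArray(self._loader())


# Every additional lazy type would have to be appended to this union,
# which is the scaling concern raised above.
ArrayLike = Union[EagerArray, LazyArray]


def validate(value, annotation):
    """Return True if value is an instance of the annotation (class or Union)."""
    members = get_args(annotation) or (annotation,)
    return isinstance(value, members)


lazy = LazyArray(lambda: range(3))
assert not validate(lazy, EagerArray)            # plain check rejects the wrapper
assert validate(lazy, ArrayLike)                 # union accepts it
assert validate(lazy.materialize(), EagerArray)  # materialized value is eager
```

The union works, but the validating code has to know about every lazy wrapper by name, which is why it is described above as not scaling well.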
Thanks for bringing up the point about using a separate lazy type having scalability issues. I am currently hopeful that the |
Thanks for the suggestion, I agree lazy tree would work but the lazy benefits are lost with pydantic. Much appreciate the informative response at #1789 (comment). My takeaway is Pydantic validates on instantiation, instantiation occurs on read, ndarray tagged objects get converted to NDArrayType, and NDArrayType is not a subclass of ndarray. If we can change the last part, so that NDArrayType is a subclass of ndarray, then it wouldn't matter if I had lazy loading on/off. If NDArrayType is a subclass of ndarray, then these two scenarios show where it is appropriate for users to think about laziness.
If written with ndarray, the class read in must be a subclass of ndarray
af = asdf.AsdfFile({"array": np.arange(10)})
af.write_to("array.asdf")
with asdf.open("array.asdf") as af:
    assert isinstance(af['array'], np.ndarray)
Users never think about laziness, could use NDArrayType as if it's ndarray.
May copy an ndarray from one tree to another, but lazy objects must be materialized before writing
copy_af = asdf.AsdfFile()
with asdf.open("array.asdf") as af:
    array = af.tree['array']  # BAD: is NDArrayType, causes OSError: ASDF file has already been closed
    assert isinstance(af['array'], np.ndarray)
    array = af.tree['array'].copy()  # OK: is ndarray
    assert isinstance(af['array'], np.ndarray)
copy_af.tree['array'] = array
copy_af.write_to('copy_array.asdf')
Current behavior raises OSError on the |
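The second scenario above (materialize before the source file closes) can be sketched without asdf at all. `ToyFile` and `LazyHandle` below are hypothetical classes written only for illustration; they mimic a lazy handle that reads from its owning file on demand and therefore fails once that file has been closed. This is an assumption about the general pattern, not a description of asdf internals.

```python
class ToyFile:
    """Hypothetical stand-in for an open file that hands out lazy handles."""

    def __init__(self, data):
        self._data = data
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.closed = True
        return False  # do not swallow exceptions

    def lazy(self, key):
        return LazyHandle(self, key)


class LazyHandle:
    """Hypothetical lazy value: data is only read when copy() is called."""

    def __init__(self, owner, key):
        self._owner = owner
        self._key = key

    def copy(self):
        if self._owner.closed:
            raise OSError("file has already been closed")
        return list(self._owner._data[self._key])


with ToyFile({"array": [0, 1, 2]}) as f:
    handle = f.lazy("array")
    eager = handle.copy()  # materialized while the file is still open

assert eager == [0, 1, 2]  # the eager copy outlives the file

try:
    handle.copy()  # the lazy handle does not
except OSError as err:
    late_error = str(err)
else:
    late_error = None
assert late_error == "file has already been closed"
```

In this toy model the caller has to remember to copy inside the `with` block, which is the point at which laziness becomes something users must think about.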
Thanks for the suggestion. I don't think making Would you give this branch a test to see if it works with your pydantic code (without any type union for It is far from "complete" as a solution and constitutes a breaking change (so wouldn't be possible until asdf 4, coming this fall at the soonest) but I would be curious to hear if it makes the pydantic integration easier. |
Sorry! I did not realize I didn't update this. This change does pass this test This test in v3 would've failed at |
After spending quite a bit of time in this, I'm convinced some sort of union types is needed rather than forcing users to lazy load to guarantee an ndarray is returned. Would it be appropriate for this asdf project to maintain a list of types (e.g., |
The main purpose of NDArrayType is to allow "lazy loading" of array data. When this feature is disabled, the converter "realizes" the underlying array but still returns an NDArrayType:
asdf/asdf/_core/_converters/ndarray.py
Lines 187 to 189 in 019c6a4
Investigate if returning the ndarray (instead of the NDArrayType) causes any downstream issues.
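As a rough illustration of the change being investigated, the sketch below contrasts the two behaviors with a toy converter. Everything here is hypothetical: `LazyWrapper`, `from_tree`, and the `return_plain` flag are invented for this example and are not the code at the referenced lines of `ndarray.py`.

```python
class LazyWrapper:
    """Hypothetical wrapper that defers loading until realize() is called."""

    def __init__(self, loader):
        self._loader = loader
        self._array = None

    def realize(self):
        # Load once and cache the result.
        if self._array is None:
            self._array = list(self._loader())
        return self._array


def from_tree(loader, lazy_load, return_plain=False):
    """Toy converter: return_plain=False models the behavior described above,
    return_plain=True models the proposal under investigation."""
    wrapper = LazyWrapper(loader)
    if lazy_load:
        return wrapper  # nothing loaded yet
    array = wrapper.realize()  # eager: data is loaded immediately
    return array if return_plain else wrapper


current = from_tree(lambda: [1, 2, 3], lazy_load=False)
proposed = from_tree(lambda: [1, 2, 3], lazy_load=False, return_plain=True)

assert isinstance(current, LazyWrapper)  # realized, but still wrapped
assert isinstance(proposed, list)        # the plain underlying value
```

Downstream code that relies on wrapper-only attributes would break under the proposed behavior, which is the kind of issue this investigation would need to surface.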