Describe the bug, including details regarding any error messages, version, and platform.
Hi Arrow Friends,
I was pointed to: https://arrow.apache.org/docs/dev/format/Security.html#ipc-format on the Arrow Community call today.
This is a great document.
In this section:
Advice for users
Arrow libraries will typically ensure IPC streams are structurally valid but may not also validate the underlying Array data. It is extremely recommended that you use the appropriate APIs to validate the Arrow data read from an untrusted IPC stream.
As a reasonably experienced Arrow C++/PyArrow user, I didn't know which APIs this refers to.
It seems like this text is talking about these methods:
https://arrow.apache.org/docs/python/generated/pyarrow.RecordBatch.html#pyarrow.RecordBatch.validate
https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.validate
Are those sufficient for the validation?
Would it be a good idea to add an always_validate flag to IpcReadOptions for reading from untrusted data sources?
https://arrow.apache.org/docs/python/generated/pyarrow.ipc.IpcReadOptions.html#pyarrow.ipc.IpcReadOptions
Thanks for your consideration,
Rusty
Component(s)
Documentation