[Docs] Improve Security Considerations Documentation - include pointer to validation functions for validating IPC streams. #49241

@rustyconover

Describe the bug, including details regarding any error messages, version, and platform.

Hi Arrow Friends,

I was pointed to: https://arrow.apache.org/docs/dev/format/Security.html#ipc-format on the Arrow Community call today.

This is a great document.

In this section:

Advice for users

Arrow libraries will typically ensure IPC streams are structurally valid but may not also validate the underlying Array data. It is extremely recommended that you use the appropriate APIs to validate the Arrow data read from an untrusted IPC stream.

As a reasonably experienced Arrow C++/PyArrow user, I didn't know what APIs were referenced here.

It seems like this text is talking about these methods:

https://arrow.apache.org/docs/python/generated/pyarrow.RecordBatch.html#pyarrow.RecordBatch.validate
https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.validate

Are those sufficient for the validation?

Would it be a good idea to add an always_validate flag to IpcReadOptions for reading from untrusted data sources?

https://arrow.apache.org/docs/python/generated/pyarrow.ipc.IpcReadOptions.html#pyarrow.ipc.IpcReadOptions

Thanks for your consideration,

Rusty

Component(s)

Documentation
