[C++][Python] Add a version byte to tables #45277

Open
MaxiBoether opened this issue Jan 16, 2025 · 1 comment

MaxiBoether commented Jan 16, 2025

Describe the enhancement requested

I installed pyarrow via pip without pinning a concrete version (which was my fault). Another requirement forced pip to download pyarrow 17. At the same time, I installed libarrow 18 via apt in my build container. I have a custom C++ Python extension, which CMake compiled against the system libarrow 18. In Python, I read a pyarrow table and passed it to the C++ extension, where my application segfaulted on GetColumnByName. It took me a while to realize that I had a version mismatch between Python and C++. I presume the in-memory layouts of the tables differ slightly between the two versions, which probably caused the segfault. Now it is all working fine again.

I wonder whether there should be something like a magic version byte that gets updated when the in-memory layout changes. This way, I could have avoided debugging this and gotten a better error message instead. While this might not be a common problem, it could help avoid issues like the segfault I encountered.
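
Until something like that exists in the library, a similar guard can be approximated on the extension side. The following is only a minimal sketch: `check_arrow_versions` and `parse_major` are hypothetical helpers, and it assumes the extension exposes the Arrow version it was compiled against as a string (for example, populated from the `ARROW_VERSION_STRING` macro), to be compared with `pyarrow.__version__` at import time. Comparing only major versions is a simplification; a stricter check could require exact equality.

```python
def parse_major(version: str) -> int:
    """Return the major component of a version string such as '18.1.0'."""
    return int(version.split(".", 1)[0])


def check_arrow_versions(python_arrow: str, extension_arrow: str) -> None:
    """Raise ImportError if the two Arrow builds differ in major version.

    python_arrow:    version of the installed pyarrow package
                     (assumed to come from pyarrow.__version__).
    extension_arrow: Arrow C++ version the extension was compiled against
                     (assumed to be exported by the extension itself).
    """
    if parse_major(python_arrow) != parse_major(extension_arrow):
        raise ImportError(
            f"pyarrow {python_arrow} does not match the Arrow C++ "
            f"{extension_arrow} this extension was built against"
        )
```

Called as `check_arrow_versions("17.0.0", "18.0.0")`, this raises an `ImportError` when the module loads, so the mismatch surfaces before any table crosses the language boundary.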

Component(s)

C++, Python

Member

amoeba commented Jan 17, 2025

Hi @MaxiBoether and thanks for taking the time to write this up.

While passing a Table from Python to a C/C++ extension can work (as you've found), it comes with the downside you ran into here. The preferred way to share Arrow structures is the C Data Interface, which is ABI-stable. See #36274 for more discussion on this topic, too. Could that work for your use case?
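
For illustration, the C Data Interface is ABI-stable because it exchanges data through a small set of plain C structs with a fixed layout, instead of C++ objects whose layout can change between releases. The sketch below mirrors `struct ArrowSchema` from the specification using `ctypes`. The `describe` helper and the hand-built `example` value are hypothetical and only there to show the fields; in a real application, pyarrow and the Arrow C++ bridge (`arrow/c/bridge.h`) would fill and consume these structs.

```python
import ctypes


class ArrowSchema(ctypes.Structure):
    """ctypes mirror of `struct ArrowSchema` from the Arrow C Data Interface."""


# Type of the `release` member: void (*)(struct ArrowSchema*).
ReleaseFn = ctypes.CFUNCTYPE(None, ctypes.POINTER(ArrowSchema))

# Fields are assigned after the class exists because the struct refers to itself.
ArrowSchema._fields_ = [
    ("format", ctypes.c_char_p),       # type format string, e.g. b"i" for int32
    ("name", ctypes.c_char_p),         # field name (may be NULL)
    ("metadata", ctypes.c_void_p),     # binary-encoded metadata, not a C string
    ("flags", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowSchema))),
    ("dictionary", ctypes.POINTER(ArrowSchema)),
    ("release", ReleaseFn),            # producer-supplied cleanup callback
    ("private_data", ctypes.c_void_p),
]


def describe(schema: ArrowSchema) -> str:
    """Hypothetical helper: summarize the top-level fields of a schema."""
    name = schema.name.decode() if schema.name else ""
    fmt = schema.format.decode() if schema.format else ""
    return f"{name}: format={fmt!r}, children={schema.n_children}"


# Hand-built schema for a single int32 column named "x" (illustration only;
# a real producer would also set `release`).
example = ArrowSchema(format=b"i", name=b"x", n_children=0)
```

Because both sides agree only on this C layout, a pyarrow 17 producer and an extension built against libarrow 18 can exchange data without depending on each other's C++ class layouts.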

@amoeba amoeba changed the title Add a version byte to tables [C++][Python] Add a version byte to tables Jan 22, 2025