I installed pyarrow via pip without pinning a concrete version (which was my fault). Another requirement forced pip to download pyarrow 17. At the same time, I installed libarrow 18 via apt in my build container. My custom C++ Python extension was compiled via CMake against the system libarrow 18. In Python, I read a pyarrow table and passed it to the C++ extension, and my application then segfaulted on GetColumnByName. It took me a while to realize that I had a version mismatch between Python and C++. I presume the in-memory layouts of the tables differ slightly between the two versions, which probably caused the segfault. Now that the versions match, everything works again.
I wonder whether there should be something like a magic version byte that gets updated when the in-memory layout changes. This way, I could have avoided debugging this and gotten a better error message instead. While this might not be a common problem, it could help avoid issues like the segfault I encountered.
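Until something like that exists, a check at import time could at least turn the segfault into a readable error. Below is a minimal sketch in plain Python. The function names are made up for illustration, and treating a differing major version as the condition to reject is my assumption, not a documented Arrow rule. It also assumes the extension exposes the libarrow version it was compiled against as a string, for example one filled in from the `ARROW_VERSION_STRING` macro on the C++ side.

```python
def major_version(version: str) -> int:
    """Return the leading integer of a version string such as '17.0.0'."""
    return int(version.split(".")[0])


def check_arrow_versions(python_side: str, extension_side: str) -> None:
    """Raise ImportError if the two Arrow versions differ in major version.

    python_side:    version of the Arrow library loaded by pyarrow
    extension_side: version of libarrow the C++ extension was built against
                    (hypothetical: the extension has to expose this itself)
    """
    if major_version(python_side) != major_version(extension_side):
        raise ImportError(
            f"Arrow version mismatch: pyarrow uses {python_side}, but the "
            f"extension was built against libarrow {extension_side}"
        )
```

In the extension's `__init__.py` this could be called as `check_arrow_versions(pyarrow.__version__, my_ext.ARROW_BUILD_VERSION)`, where `my_ext.ARROW_BUILD_VERSION` is a hypothetical attribute the extension would need to define.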
Component(s)
C++, Python
Hi @MaxiBoether and thanks for taking the time to write this up.
While passing a Table from Python to a C/C++ extension can work (as you've found), it comes with the downside you ran into here. The preferred way to share Arrow structures is the C Data Interface, which is ABI-stable. See #36274 for more discussion on this topic. Could that work for your use case?
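To illustrate why the C Data Interface does not depend on the libarrow version, here is a rough sketch that mirrors its `struct ArrowSchema` with `ctypes`. The field names and order follow the published C Data Interface definition. The Python class and the small example at the end are only for illustration and are not part of pyarrow. The struct contains nothing but plain C pointers and 64-bit integers, so producer and consumer only need to agree on this layout, not on the C++ class layout of `arrow::Table`.

```python
import ctypes


class ArrowSchema(ctypes.Structure):
    """ctypes mirror of `struct ArrowSchema` from the Arrow C Data Interface."""


# Fields are assigned after the class exists because the struct refers to itself.
ArrowSchema._fields_ = [
    ("format", ctypes.c_char_p),       # type format string, e.g. b"i" for int32
    ("name", ctypes.c_char_p),         # field name
    ("metadata", ctypes.c_void_p),     # binary-encoded metadata, not plain text
    ("flags", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowSchema))),
    ("dictionary", ctypes.POINTER(ArrowSchema)),
    ("release", ctypes.CFUNCTYPE(None, ctypes.POINTER(ArrowSchema))),
    ("private_data", ctypes.c_void_p),
]

# Illustration only: describe an int32 field named "x" with no children.
schema = ArrowSchema(format=b"i", name=b"x", flags=0, n_children=0)
```

In practice you would not fill these structs by hand. Recent pyarrow versions can export a Table through this interface (for example via the `__arrow_c_stream__` method), and the C++ side can import it with the helpers in `arrow/c/bridge.h`.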
amoeba changed the title from "Add a version byte to tables" to "[C++][Python] Add a version byte to tables" on Jan 22, 2025