Tensorflow 2.9 Upgrade #37

Open
charleslparker opened this issue Aug 4, 2022 · 1 comment

@charleslparker (Member)

Starting with TensorFlow 2.9.x, the TensorFlow builds set the compiler flag _GLIBCXX_USE_CXX11_ABI by default. This causes linker errors in the CI builds on GitHub, although the same build works for me locally on a Mac and on the wintermute Linux build server. On the GitHub CI builds, everything appears to build fine, but running pytest -sv tests/test_tree.py produces the following error:

  ==================================== ERRORS ====================================
  _____________________ ERROR collecting tests/test_tree.py ______________________
  tests/test_tree.py:9: in <module>
      import sensenet.importers
  sensenet/importers.py:42: in <module>
      bigml_tf_module = tensorflow.load_op_library(treelib[0])
  /tmp/tmp.8RK8W1x4vs/venv/lib/python3.8/site-packages/tensorflow/python/framework/load_library.py:54: in load_op_library
      lib_handle = py_tf.TF_LoadLibrary(library_filename)
  E   tensorflow.python.framework.errors_impl.NotFoundError: /tmp/tmp.8RK8W1x4vs/venv/lib/python3.8/site-packages/bigml_tf_tree.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZNK10tensorflow8OpKernel11TraceStringERKNS_15OpKernelContextEb

The problem is that the custom TensorFlow extension that handles the internal trees sometimes generated by deepnets has been built with "old ABI" compatibility, whereas TF 2.9.x uses the "new ABI" (see https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_dual_abi.html).
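One way to see which ABI a symbol refers to is to look for GCC's cxx11 markers in its mangled name. The following is a minimal sketch of that check; `has_cxx11_marker` is a hypothetical helper (not part of sensenet or TensorFlow), and the reading of the result assumes that `OpKernel::TraceString` returns a `std::string`, so that a new-ABI build of TensorFlow would export it with the `B5cxx11` tag.

```python
# Hypothetical helper, not part of sensenet or TensorFlow.
# GCC marks new-ABI entities in mangled names in two ways:
#   "B5cxx11"  -> the [abi:cxx11] tag attached to a function or variable
#   "7__cxx11" -> the std::__cxx11 inline namespace (e.g. std::__cxx11::basic_string)
NEW_ABI_MARKERS = ("B5cxx11", "7__cxx11")


def has_cxx11_marker(mangled_symbol: str) -> bool:
    """Return True if the mangled name carries a new-ABI marker."""
    return any(marker in mangled_symbol for marker in NEW_ABI_MARKERS)


# The symbol the extension failed to resolve in the traceback above.
missing = "_ZNK10tensorflow8OpKernel11TraceStringERKNS_15OpKernelContextEb"
print(has_cxx11_marker(missing))
```

The symbol from the traceback carries no marker, which is consistent with the extension asking for the old-ABI form of the function. Note that a symbol whose signature involves no `std::string` or `std::list` has no marker under either ABI, so a missing marker alone is not proof.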

It's odd, because I verified that the correct flag gets passed to the compile step here (when using TF 2.9.x):

https://github.com/charleslparker/sensenet/blob/master/setup.py#L54

and deliberately overriding it (by replacing the =1 with =0 for the flag in compile_args) causes the same test to break with different linker errors on my local machine and on the Linux server. So something strange is going on with the compile step on GitHub specifically. Maybe the Docker images used by cibuildwheel on GitHub have an old version of libstdc++?
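A check that could be run inside the CI container is to compare the ABI value TensorFlow reports against the one in the extension's compile arguments. The sketch below is an illustration under assumptions: `abi_value`, `tensorflow_abi`, and the sample `compile_args` list are hypothetical and do not reproduce the actual setup.py; `tf.sysconfig.get_compile_flags()` is the only TensorFlow call used.

```python
import re
from typing import Iterable, Optional

# Matches the preprocessor define that selects the libstdc++ ABI.
ABI_FLAG = re.compile(r"^-D_GLIBCXX_USE_CXX11_ABI=([01])$")


def abi_value(flags: Iterable[str]) -> Optional[int]:
    """Return the last _GLIBCXX_USE_CXX11_ABI value in flags, or None if unset."""
    value = None
    for flag in flags:
        match = ABI_FLAG.match(flag)
        if match:
            value = int(match.group(1))  # a later define overrides an earlier one
    return value


def tensorflow_abi() -> Optional[int]:
    """ABI value of the installed TensorFlow, read from its reported compile flags."""
    import tensorflow as tf  # imported lazily so abi_value works without TensorFlow

    return abi_value(tf.sysconfig.get_compile_flags())


# Hypothetical stand-in for the compile_args list built in setup.py.
compile_args = ["-std=c++14", "-fPIC", "-D_GLIBCXX_USE_CXX11_ABI=1"]
print("extension ABI:", abi_value(compile_args))
```

Calling `tensorflow_abi()` in the environment where the wheel is built and comparing it with `abi_value(compile_args)` would show whether the two sides agree at compile time. If they do agree on the CI machine, the mismatch is more likely to come from the toolchain in the build image than from setup.py.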

The exact linker error we get is documented here:
https://pgaleone.eu/tensorflow/bazel/abi/c++/2021/04/01/tensorflow-custom-ops-bazel-abi-compatibility/

where they say you have to rebuild tensorflow to fix it. I refuse to believe this!

Popping up the stack a bit: this op is only used when deepnets generate these internal trees (e.g., when you set "tree embedding = True" while training a deepnet, or use "Automatic structure search"). This extension has been a pain so many times that maybe we should remove it.
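Since the op is only needed for models that contain these trees, a less drastic option than removal might be to defer loading it until a model actually needs it, so that a broken extension does not fail at import time. The following is a rough sketch of that idea, not the current sensenet code: `lazy_op_library`, `fake_loader`, and the file name are hypothetical, and the stand-in loader takes the place of `tensorflow.load_op_library` so the example runs without TensorFlow.

```python
from functools import lru_cache
from typing import Any, Callable, List


def lazy_op_library(loader: Callable[[str], Any], path: str) -> Callable[[], Any]:
    """Return a zero-argument function that loads the library on first call only."""

    @lru_cache(maxsize=None)
    def get() -> Any:
        # Runs once on success; if the loader raises, nothing is cached
        # and the next call tries again.
        return loader(path)

    return get


# Stand-in loader that records each call. In sensenet the loader would be
# tensorflow.load_op_library and the path would be treelib[0].
calls: List[str] = []


def fake_loader(path: str) -> dict:
    calls.append(path)
    return {"loaded_from": path}


get_tree_module = lazy_op_library(fake_loader, "bigml_tf_tree.so")
```

With this shape, importing the module does no loading at all, and only code paths that call `get_tree_module()` would hit the undefined-symbol error.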
