Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Different MSVC versions may be ABI incompatible, guard with _MSC_VER (fix #2898) #4779

Merged
merged 1 commit into from
Aug 23, 2023

Conversation

pwuertz
Copy link
Contributor

@pwuertz pwuertz commented Aug 4, 2023

Description

Pybind11, currently by default, shares C++ data structures across DLLs built with different MSVC versions, for which there is no documented guarantee for ABI compatibility. This has caused difficult-to-track crashes on several occasions for quite some time now (#1562, #1618, #2898, onnx/onnx#3493, pytorch/pytorch#62090, pytorch/pytorch#84457, NVIDIA/TensorRT#2853, IntelRealSense/librealsense#8570).

This PR aims to mitigate the risk of an ABI mismatch in pybind11-module interaction by guarding against different MSVC compiler/lib/toolchain versions, without causing significant changes for pybind11 (see also discussion in #2898).

Suggested changelog entry:

* Guard against crashes/corruptions caused by modules built with different MSVC versions

Copy link
Collaborator

@rwgk rwgk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for writing the great PR description!

@rwgk rwgk requested a review from wjakob August 4, 2023 16:12
@rwgk
Copy link
Collaborator

rwgk commented Aug 4, 2023

@EthanSteinberg
Copy link
Collaborator

@rwgk Won't this break ABI between already compiled extensions and new extensions compiled under this change?

@rwgk
Copy link
Collaborator

rwgk commented Aug 5, 2023

@rwgk Won't this break ABI between already compiled extensions and new extensions compiled under this change?

Yes, but I believe it's the lesser of two evils.

Based on @pwuertz's input I got to believe the current situation is very bad. Peter listed 8 bugs. I assume those are only a tiny fraction of cases where the ABI incompatibility caused crashes or confusion. The sooner we fix it the better. I think it's unavoidable that initially the next release might break some workflows, but once it's baked in everybody will be better off.

@rwgk
Copy link
Collaborator

rwgk commented Aug 5, 2023

Oh! We probably should up the internals version for Windows at the same time.

@pwuertz
Copy link
Contributor Author

pwuertz commented Aug 5, 2023

Won't this break ABI between already compiled extensions and new extensions compiled under this change?

No, this PR doesn't change the ABI of pybind11 modules at all. It just changes the namespace in which modules are communicating in.

Or is your concern that any two independently compiled modules are now ending up in different namespaces and thus unexpectedly won't be sharing struct internals? Note that pybind11 never gave you a guarantee for this in the first place. There is always a chance for two independently authored, compiled and distributed modules to unknowingly or accidentally end up in different namespaces due to a change in someone's build infrastructure.

Pybind11 currently doesn't provide a way of declaring and asserting that for example a some_module_extra is critically dependent on some_module's internals. Shouldn't be that difficult do implement though. some_module_extra could use something like a PYBIND11_DEPENDS_ON("some_module") macro, which causes some_module_extra to import some_module when loaded and check for matching PYBIND11_INTERNALS_IDs or raise.

We probably should up the internals version for Windows at the same time.

If I understood its intent correctly, the PYBIND11_INTERNALS_VERSION section of the PYBIND11_INTERNALS_ID is a guarantee for the struct internals {} definition being the same for two modules. We didn't touch struct internals {} in this PR, so there should be no need for changing PYBIND11_INTERNALS_VERSION?

@rwgk
Copy link
Collaborator

rwgk commented Aug 5, 2023

To answer the simple question first:

If I understood its intent correctly, the PYBIND11_INTERNALS_VERSION section of the PYBIND11_INTERNALS_ID is a guarantee for the struct internals {} definition being the same for two modules. We didn't touch struct internals {} in this PR, so there should be no need for changing PYBIND11_INTERNALS_VERSION?

That's correct, but it's kind-of the other way around:

  • Currently PYBIND11_INTERNALS_VERSION defaults to 4 except for Python 3.12. We already have 5 for a while, which unlocks two new behaviors/features.
  • This PR is changing the PYBIND11_BUILD_ABI of which PYBIND11_INTERNALS_VERSION is one component. What really matters for cross-extension interop is PYBIND11_BUILD_ABI. Since that's changing anyway, we might just as well unlock the features behind PYBIND11_INTERNALS_VERSION 5.

More later.

@rwgk
Copy link
Collaborator

rwgk commented Aug 6, 2023

Or is your concern that any two independently compiled modules are now ending up in different namespaces and thus unexpectedly won't be sharing struct internals?

Yes. — That's what I had in mind with "lesser of two evils." — Currently we have a many-to-many mapping (MSVC version X to Y). A subset of those are "good", the rest is "bad". I don't think we have a reasonable way to tell those apart. The only choice we have is how quickly we transition from many-to-many to one-to-one.

Everything below is beyond the scope of this PR for sure.

Pybind11 currently doesn't provide a way of declaring and asserting that for example a some_module_extra is critically dependent on some_module's internals.

Hm ... it doesn't tell you what you need, only that you need something. In the case of stl.h conversions it tries to be more helpful:

E       TypeError: missing_header_arg(): incompatible function arguments. The following argument types are supported:
E           1. (arg0: std::vector<float, std::allocator<float> >) -> None
E
E       Invoked with: [1.0, 2.0, 3.0]
E
E       Did you forget to `#include <pybind11/stl.h>`? Or <pybind11/complex.h>,
E       <pybind11/functional.h>, <pybind11/chrono.h>, etc. Some automatic
E       conversions are optional and require extra headers to be included
E       when compiling your pybind11 module.

We could probably go a long way by pointing to a new doc section in that message, explaining the common situations. That would have no compile-time or runtime overhead.

Another approach might be to keep track of what types have registered conversions anywhere in the process, e.g. based on a simple python dict with key typeid(T) for all class_-wrapped types, value a Python list of all modules with converters. When the error above is generated, we know we don't have the conversions in that extension, but if we have a match in the dict we could append that information to the error message. That has compile-time and runtime overhead, but I think it would be worth it.

Since we're hitting on PYBIND11_INTERNALS_VERSION here, I want to add another thought: I believe we could do much better by using Python capsules to pass pointers between any pair of binding systems, not only different pybind11 versions, but also nanobind or even SWIG. The idea is super simple: we need a standardized key for the C++ ABI, and a raw pointer, std::shared_ptr, or std::unique_ptr. If the ABI key matches we know we can pass them around, no matter how they are wrapped. (We've been using capsules like that for years for pybind11, SWIG, PyCLIF interop, although we don't require ABI keys because we build everything from sources Google-internally.) I think unshackling us from the PYBIND11_INTERNALS_VERSION will open up a lot of creative potential, without dragging the community into excessive copy-pasting of Python bindings.

wjakob added a commit to wjakob/nanobind that referenced this pull request Aug 15, 2023
Copy link
Collaborator

@henryiii henryiii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunate (and would love it if we could come up with something looser), but it's what we already do, and ppl can override if they know what they are doing.

wjakob added a commit to wjakob/nanobind that referenced this pull request Aug 23, 2023
@rwgk
Copy link
Collaborator

rwgk commented Aug 23, 2023

Thanks @henryiii and @wjakob for supporting/adopting this change.
I'll merge this here, than make a separate PR to bump the PYBIND11_INTERNALS_VERSION for MSVC (see #4779 (comment)).

@rwgk rwgk merged commit 76b8858 into pybind:master Aug 23, 2023
82 checks passed
@github-actions github-actions bot added the needs changelog Possibly needs a changelog entry label Aug 23, 2023
rwgk pushed a commit to rwgk/pybind11 that referenced this pull request Aug 26, 2023
rwgk pushed a commit that referenced this pull request Aug 30, 2023
@isuruf
Copy link
Contributor

isuruf commented Nov 28, 2023

This is wrong. I looked at most of the issues mentioned and it is about mixing MSVC modules built with /MT vs modules built with /MD. Please add a guard of checking _MT macro. See https://learn.microsoft.com/en-us/cpp/build/reference/md-mt-ld-use-run-time-library?view=msvc-170

@rwgk
Copy link
Collaborator

rwgk commented Nov 28, 2023

This is wrong.

Well, sorry. What we need is constructive feedback, concretely a PR with how to do it right. pybind11 is a community project.

Also, maybe you meant: this isn't complete enough yet?

@rwgk
Copy link
Collaborator

rwgk commented Nov 28, 2023

This is wrong.

Well, sorry. What we need is constructive feedback, concretely a PR with how to do it right. pybind11 is a community project.

Also, maybe you meant: this isn't complete enough yet?

Oh you did: #4953

Looking.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants