Regex \B doesn't match empty string #124130
Comments
The current behavior ( Lines 896 to 898 in aba42c0
Strictly speaking, changing this behavior would be a feature request (not a bug), and would likely require a deprecation period. I'd recommend opening a thread on https://discuss.python.org/c/ideas/6 to see if there's community interest in changing this.
For reference, JavaScript treats the empty string as containing a non-word boundary:
>> /\b/.test("")
false
>> /\B/.test("")
true
Another Perl reference:
$ perl -E 'say "Boundary" if "" =~ /\b/; say "Non-boundary" if "" =~ /\B/;'
Non-boundary
It seems to be taking Perl as a baseline to some extent, as indicated in Lines 896 to 901 in 9017b95
However the comments really confuse me since the behavior for |
I think both comments apply to the |
This test was added in 5a045b9 (bpo-10713/gh-54922). It was not an assertion of the intended behavior; it was added to ensure that the current behavior would not change. Strictly speaking, the current behavior contradicts the documentation. Of course, it may be that the documentation is wrong.
But the current behavior differs from that of many (if not all) other RE engines, the code is most likely a copying error (it was not properly tested), we have already made several breaking changes related to zero-width matches in the past, and the change affects only very specific cases. Taking all of this into account, I think that we can and should change this behavior. It would be preferable to emit a FutureWarning first, but I do not know how difficult it is to do this without producing false positives. If it is too difficult or impossible, we will have no option other than to change the behavior without a warning.
We've been reluctant to make any changes to the |
Bug report
Bug description:
Apparently the empty string neither is nor isn't a word boundary. Is that supposed to happen? \B matches the empty string in every other language I can think of.
Online reproducer: https://godbolt.org/z/8q6fehss7
CPython versions tested on:
3.11, 3.12
Operating systems tested on:
Linux
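The report above can be checked locally with a short script. This is only a minimal sketch: the helper name `probe_empty_string` is made up for illustration and is not taken from the linked reproducer. It reports whether `\b` and `\B` each match the empty string on the running interpreter. The result for `\B` is deliberately not hard-coded, since the report says it does not match on 3.11 and 3.12 and the thread discusses changing that.

```python
import re
import sys


def probe_empty_string():
    r"""Return whether \b and \B each find a match in the empty string.

    Hypothetical helper for illustration; it only calls re.search.
    """
    return {
        r"\b": re.search(r"\b", "") is not None,
        r"\B": re.search(r"\B", "") is not None,
    }


if __name__ == "__main__":
    # Print the interpreter version so results can be compared across releases.
    print(sys.version)
    for pattern, matched in probe_empty_string().items():
        print(f"{pattern} matches the empty string: {matched}")
```

If both lines print `False`, the interpreter shows the behavior described in this report, where the empty string is treated as neither a word boundary nor a non-boundary.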
Linked PRs
\B
#124133