Add utf8_hop functions that return error detection #22689

Merged: khwilliamson merged 6 commits into Perl:blead from the overshoot branch on Oct 28, 2024.

khwilliamson (Contributor)

These commits add versions of the utf8_hop functions that also return extra information telling the caller whether the request could not be completely fulfilled. This will help simplify several portions of the Perl core.

The new function names end in _overshoot, indicating that they return how far the call would have overshot the edge of the string had that been allowed to happen. I'm open to a better name, but I didn't find one in a thesaurus.

  • This set of changes requires a perldelta entry, and I need help writing it.

Once this is settled, I can write the perldelta entry.

Prior to this commit, these were illegal. This commit allows them when there is
no pesky thread context in the way.

This causes embed.fnc to generate a macro 'Perl_foo' #defined to be the
macro 'foo'. This could be used to easily give existing macros long
names, should name-space pollution become a problem.

More immediately, there are various one-line functions scattered around
that simply call something else. These were typically created to
preserve the pre-existing API, in case someone was using the long name,
when the implementation changed.

This commit allows those that don't use thread context to be
conveniently replaced by macros. The next couple of commits will do that
for a couple of them.
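
A minimal before/after sketch of the idea, assuming a hypothetical function `foo()` that takes no thread-context (`pTHX_`) parameter; `Perl_foo_old()` is likewise a hypothetical name, and the exact text that embed.fnc generates is not reproduced here. It only shows why a long-name macro can stand in for a one-line wrapper function.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical function with the short name.  It takes no thread
 * context (no pTHX_ parameter), which is the case this commit allows. */
static int foo(int x)
{
    assert(x < INT_MAX);    /* avoid signed overflow in x + 1 */
    return x + 1;
}

/* Before: a one-line function kept only so that callers using the
 * long name keep working after the implementation moved to foo(). */
static int Perl_foo_old(int x)
{
    return foo(x);
}

/* After: the long name is simply a macro for the short one, in the
 * spirit of the 'Perl_foo' -> 'foo' #define described above.  No extra
 * function body or call is needed. */
#define Perl_foo foo
```

With a thread context in the way, the macro would have to insert the `aTHX_` argument, which is why that case is excluded.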

This is like plain utf8_hop_back(), except that it also returns how many
characters the request would have overshot the edge had it been
allowed to go beyond the edge.

This allows the caller to do error handling.

The code had to be made more careful than before this commit about
counting the actual number of characters consumed in the hop.
There is a subtle difference from the existing behavior. Most of the
time, a zero-length hop already does nothing; but if the hop started out
already past the edge, a runtime error was raised, even though the
action was a no-op. It's arguable what to do in this case, but I believe
the new behavior is more correct, and it paves the way for future
commits where it is more obviously correct.
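
A minimal sketch of the backward-hop semantics described above, not Perl's implementation: the name `hop_back_overshoot()`, its unsigned `count` parameter (characters to move back), and its `remaining` out-parameter are hypothetical, and the input is assumed to be well-formed UTF-8. It consumes one whole character per iteration, stops at `start`, and reports how many requested characters did not fit. A zero-character hop returns `s` unchanged with no error, even if `s` is already before `start`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, NOT Perl's utf8_hop_back_overshoot(): move back
 * 'count' characters from 's' without crossing 'start'.  Assumes the
 * buffer holds well-formed UTF-8.  On return, '*remaining' is the number
 * of characters the hop would have overshot 'start' (0 on full success). */
static const unsigned char *
hop_back_overshoot(const unsigned char *s, size_t count,
                   const unsigned char *start, size_t *remaining)
{
    assert(s != NULL && start != NULL && remaining != NULL);

    /* Each iteration consumes exactly one character.  A zero-character
     * hop never enters the loop, so it is a no-op even when 's' is
     * already before 'start'. */
    while (count > 0 && s > start) {
        s--;
        /* Back up over continuation bytes (10xxxxxx) to the start byte. */
        while (s > start && (*s & 0xC0) == 0x80)
            s--;
        count--;
    }

    *remaining = count;   /* characters that could not be consumed */
    return s;
}
```

A caller can treat a nonzero `remaining` as an error instead of silently accepting a hop that was clamped at the edge.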

This adds a conditional and indents the code within the new block,
removing now-redundant conditionals.
This is in preparation for a future commit where we will need to do
finish-up work before returning.

This is like plain utf8_hop_forward(), except that it also returns how
many characters the request would have overshot the edge had it been
allowed to go beyond the edge.

This allows the caller to do error handling.

The code had to be made more careful than before this commit about
counting the actual number of characters consumed in the hop.
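
A minimal sketch of the forward direction, again hypothetical (`hop_forward_overshoot()`, `count`, and `remaining` are illustrative names, not Perl's API) and assuming well-formed UTF-8. The point the commit stresses is visible in the loop: `count` is decremented once per whole character, never per byte, which is what makes the reported overshoot meaningful.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, NOT Perl's utf8_hop_forward_overshoot(): move
 * forward 'count' characters from 's' without passing 'end'.  Assumes
 * well-formed UTF-8.  '*remaining' receives the number of characters
 * the hop would have overshot 'end' (0 on full success). */
static const unsigned char *
hop_forward_overshoot(const unsigned char *s, size_t count,
                      const unsigned char *end, size_t *remaining)
{
    assert(s != NULL && end != NULL && remaining != NULL);

    while (count > 0 && s < end) {
        s++;                              /* step past the start byte */
        while (s < end && (*s & 0xC0) == 0x80)
            s++;                          /* and its continuation bytes */
        count--;                          /* one whole character consumed */
    }

    *remaining = count;
    return s;
}
```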

This is like utf8_hop_safe(), but also returns the number of characters
that would have overshot the edge had it been allowed to go beyond
the edge.
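
A sketch of the two-way variant, assuming a signed character offset bounded by both `start` and `end` in the manner of utf8_hop_safe(). The name `hop_overshoot()`, its parameter order, and the convention that `remaining` keeps the sign of the unconsumed part of `off` are all assumptions of this sketch, not a description of the real function. The small caller `hop_in_range()` (also hypothetical) shows the kind of error handling the overshoot value enables.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, NOT Perl's utf8_hop_overshoot(): hop 'off'
 * characters from 's' (forward if positive, backward if negative),
 * staying within [start, end].  Assumes well-formed UTF-8.
 * '*remaining' gets the part of 'off' that could not be consumed:
 * 0 on full success, otherwise its sign matches the direction. */
static const unsigned char *
hop_overshoot(const unsigned char *s, long off,
              const unsigned char *start, const unsigned char *end,
              long *remaining)
{
    assert(start <= s && s <= end && remaining != NULL);

    while (off > 0 && s < end) {          /* hop forward */
        s++;
        while (s < end && (*s & 0xC0) == 0x80)
            s++;
        off--;
    }
    while (off < 0 && s > start) {        /* hop backward */
        s--;
        while (s > start && (*s & 0xC0) == 0x80)
            s--;
        off++;
    }

    *remaining = off;
    return s;
}

/* Example caller: report an out-of-range request instead of clamping. */
static int hop_in_range(const unsigned char *s, long off,
                        const unsigned char *start,
                        const unsigned char *end)
{
    long overshoot;
    (void) hop_overshoot(s, off, start, end, &overshoot);
    return overshoot == 0;
}
```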

khwilliamson merged commit 9b82965 into Perl:blead on Oct 28, 2024.
33 of 34 checks passed.
khwilliamson deleted the overshoot branch on October 28, 2024 15:34.