Combine the conversion of UTF-8 to bytes into a single base function #22703

Closed
wants to merge 18 commits into from
2 changes: 1 addition & 1 deletion autodoc.pl
@@ -1899,7 +1899,7 @@ ($fh, $section_name, $element_name, $docref)
# Here, has a long name and we didn't create one just
# above. Check that there really is a long name entry.
my $real_proto = delete $protos{"Perl_$name"};
if ($real_proto) {
if ($real_proto || $flags =~ /m/) {

# Set up to redo the loop at the end. This iteration
# adds the short form; the redo causes its long form
135 changes: 84 additions & 51 deletions embed.fnc
@@ -14,7 +14,8 @@
: real (full) name, with any appropriate thread context parameters, thus hiding
: that detail from the typical code.
:
: Most macros (as opposed to functions) listed here are the complete full name.
: Many macros (as opposed to functions) listed here are the complete full name,
: though we may want to start converting those to have long 'Perl_' names.
:
: All non-static functions defined by perl need to be listed in this file.
: embed.pl uses the entries here to construct:
@@ -157,8 +158,9 @@
: and you know at a glance that the macro actually has documentation. It
: doesn't by itself create any documentation; instead the other apidoc lines
: pull in information specified by these lines. Many of the lines in this
: file for macros could be pulled out of here and replaced by these lines
: throughout the source. It is a goal to do that as convenience dictates.
: file for macros that don't also have the 'p' flag (described below) could be
: pulled out of here and replaced by these lines throughout the source. It is
: a goal to do that as convenience dictates.
:
: The other apidoc lines either have the usage data as part of the line, or
: pull in the data from this file or apidoc_defn lines.
@@ -210,10 +212,10 @@
: line that begins with an '='. In particular, an '=cut' line ends that
: documentation without introducing something new.
:
: Various macros and other elements aren't listed here in embed.fnc. They are
: documented in the same manner, but since they don't have this file to get
: information from, the defining lines have the syntax and meaning they do in
: this file, so it can be specified:
: Various macros and other elements aren't listed here in embed.fnc (though
: they could be). They are documented in the same manner, but since they don't
: have this file to get information from, the defining lines have the syntax
: and meaning they do in this file, so it can be specified:
:
: =for apidoc flags|return_type|name|arg1|arg2|...|argN
: =for apidoc_item flags|return_type|name|arg1|arg2|...|argN
@@ -301,22 +303,21 @@
: functions flagged with this, the installation can run Configure with
: the -Accflags='-DNO_MATHOMS' parameter to not even compile them.
:
: Sometimes the function has been subsumed by a more general one (say,
: by adding a flags parameter), and a macro exists with the original
: short name API, and it calls the new function, bypassing this one, and
: the original 'Perl_' form is being deprecated. In this case also
: specify the 'M' flag.
: If the function can be implemented as a macro (that evaluates its
: arguments exactly once), use the 'm' and 'p' flags together to implement
: this. (See the discussion under 'm'.) Another option for this is to
: use the 'M' flag.
:
: Without the M flag, these functions should be deprecated, and it is an
: error to not also specify the 'D' flag.
: Without the m or M flags, these functions should be deprecated, and it
: is an error to not also specify the 'D' flag.
:
: The 'b' functions are normally moved to mathoms.c, but if
: circumstances dictate otherwise, they can be anywhere, provided the
: whole function is wrapped with
:
: #ifndef NO_MATHOMS
: ...
: #endif
: #ifndef NO_MATHOMS
: ...
: #endif
:
: Note that this flag no longer automatically adds a 'Perl_' prefix to
: the name. Additionally specify 'p' to do that.
@@ -370,10 +371,10 @@
: then it is assumed to take a strftime-style format string as the 1st
: arg; otherwise it's assumed to take a printf style format string, not
: necessarily the 1st arg. All the arguments following the second form
: (including possibly '...') are assumed to be for the format.
: (including possibly '...') are assumed to be for the format.
:
: embed.h: any entry in here for the second form is suppressed because
: of varargs
: of varargs
: proto.h: add __attribute__format__ (or ...null_ok__)
:
: 'F' Function has a '...' parameter, but don't assume it is a format. This
@@ -396,7 +397,7 @@
: one NN argument.
:
: proto.h: PERL_ARGS_ASSERT macro is not defined unless the function
: has NN arguments
: has NN arguments
:
: 'h' Hide any documentation that would normally go into perlapi or
: perlintern. This is typically used when the documentation is actually
@@ -427,7 +428,7 @@
: particular C file(s) or in the perl core.) Therefore, all non-guarded
: functions should also have the 'p' flag specified to avoid polluting
: the XS code name space. Otherwise, this flag also turns on the 'S'
: flag.
: flag.
:
: proto.h: function is declared as PERL_STATIC_INLINE
:
@@ -439,23 +440,46 @@
: __attribute__always_inline__ is added
:
: 'm' Implemented as a macro; there is no function associated with this
: name, and hence no long Perl_ or S_ name. However, if the macro name
: itself begins with 'Perl_', autodoc.pl will show a thread context
: parameter unless the 'T' flag is specified.
: name. There is no long S_ name.
:
: However, you may #define the macro with a long name like 'Perl_foo',
: and specify the 'p' flag. This will cause an embed.h entry to be
: created that #defines 'foo' as 'Perl_foo'. This can be used to make
: any macro have a long name, perhaps to avoid name collisions. It is
: particularly useful to preserve backward compatibility when a function
: is converted to be a macro. Most of mathoms.c could be converted to
: use this facility. When there is no thread context involved, you just
: do something like
:
: #define Perl_foo(a, b, c) Perl_bar(a, b, 0, c)
:
: Otherwise consider this general case where there is a series of macros
: that build on the previous ones by calling something with a different
: name or with an extra parameter beyond what the previous one did:
:
: #define Perl_foo(mTHX, a) Perl_bar1(aTHX, a)
: #define Perl_bar1(mTHX, a) Perl_bar2(aTHX, a, 0)
: #define Perl_bar2(mTHX, a, b) Perl_bar3(aTHX, a, b, 0)
: #define Perl_bar3(mTHX, a, b, c) Perl_func(aTHX_ a, b, c, 0)
:
: Use the formal parameter name 'mTHX,' (which stands for "macro thread
: context") as the first in each macro definition, and call the next
: macro in the sequence with 'aTHX,' (Note the commas). Eventually, the
: sequence will end with a function call (or else there would be no need
: for thread context). For that instead call it with 'aTHX_' (with an
: underscore instead of a comma).
:
: suppress proto.h entry (actually, not suppressed, but commented out)
: suppress entry in the list of exported symbols available on all platforms
: suppress embed.h entry, as the implementation should furnish the macro
: suppress entry in the list of exported symbols available on all
: platforms
: suppress embed.h entry (when no 'p' flag), as the implementation
: should furnish the macro
:
: 'M' The implementation is furnishing its own macro instead of relying on
: the automatically generated short name macro (which simply expands to
: call the real name function). One reason to do this is if the
: parameters need to be cast from what the caller has, or if there is a
: macro that bypasses this function (whose long name is being retained
: for backward compatibility for those who call it with that name). An
: example is when a new function is created with an extra parameter and
: a wrapper macro is added that has the old API, but calls the new one
: with the exta parameter set to a default.
: parameters need to be cast from what the caller has. There is less
: need to do this now that 'm' and 'p' together is supported.
:
: This flag requires the 'p' flag to be specified, as there would be no
: need to do this if the function weren't publicly accessible before.
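To make the 'mTHX' / 'aTHX,' / 'aTHX_' convention described under the 'm' flag concrete, here is a minimal, self-contained C sketch. It is not perl source: interp_t, the simplified definitions of aTHX and aTHX_, the body of Perl_func, and the helpers call_foo and call_bar2 are assumptions made only for illustration. The four chained macros are copied from the comment text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the interpreter struct; not perl's real type. */
typedef struct { int id; } interp_t;

/* Simplified stand-ins (assumptions) for the thread-context argument macros:
 * 'aTHX' names the context variable in scope, 'aTHX_' adds a trailing comma. */
#define aTHX  my_perl
#define aTHX_ aTHX,

/* The real function at the end of the chain takes the context explicitly. */
static int Perl_func(interp_t *my_perl, int a, int b, int c, int d)
{
    assert(my_perl != NULL);
    return my_perl->id * 10000 + a * 1000 + b * 100 + c * 10 + d;
}

/* The chain from the comment. 'mTHX' is only a formal parameter name; each
 * body passes 'aTHX,' to the next macro and 'aTHX_' to the final function. */
#define Perl_foo(mTHX, a)         Perl_bar1(aTHX, a)
#define Perl_bar1(mTHX, a)        Perl_bar2(aTHX, a, 0)
#define Perl_bar2(mTHX, a, b)     Perl_bar3(aTHX, a, b, 0)
#define Perl_bar3(mTHX, a, b, c)  Perl_func(aTHX_ a, b, c, 0)

/* Short name in the shape that embed.h uses for 'm' + 'p' entries. */
#define foo(a) Perl_foo(aTHX,a)

/* Callers need a variable named my_perl in scope, as in threaded perl code. */
static int call_foo(interp_t *my_perl, int a)
{
    return foo(a);                    /* becomes Perl_func(my_perl, a, 0, 0, 0) */
}

static int call_bar2(interp_t *my_perl, int a, int b)
{
    return Perl_bar2(aTHX, a, b);     /* becomes Perl_func(my_perl, a, b, 0, 0) */
}
```

Because 'mTHX' is an unused formal parameter, whatever the caller passes in that slot is discarded, and the 'aTHX' written in each macro body picks up the my_perl that is in scope at the call site. Only the last step needs 'aTHX_', since it is the only one that produces a real function argument list.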
@@ -489,10 +513,10 @@
:
: 'o' Has no Perl_foo or S_foo compatibility macro:
:
: This is used for whatever reason to force the function to be called
: with the long name. Perhaps there is a varargs issue. Use the 'M'
: flag instead for wrapper macros, and legacy-only functions should
: also use 'b'.
: This is used for whatever reason to force the function to be called
: with the long name. Perhaps there is a varargs issue. Use the 'M'
: or 'm' flags instead for wrapper macros, and legacy-only functions
: should also use 'b'.
:
: embed.h: suppress "#define foo Perl_foo"
:
@@ -517,9 +541,10 @@
:
: proto.h: add __attribute__pure__
:
: 'p' Function in source code has a Perl_ prefix:
: 'p' Function or macro in source code has a Perl_ prefix:
:
: proto.h: function is declared as Perl_foo rather than foo
: proto.h: function or macro is declared as Perl_foo rather than foo
: (though the entries for macros will be commented out)
: embed.h: "#define foo Perl_foo" entries added
:
: 'R' Return value must not be ignored (also implied by 'a' and 'P' flags):
@@ -543,8 +568,8 @@
:
: 's' Static function, but function in source code has a Perl_ prefix:
:
: This is used for functions that have always had a Perl_ prefix, but
: have been moved to a header file and declared static.
: This is used for functions that have always had a Perl_ prefix, but
: have been moved to a header file and declared static.
:
: proto.h: function is declared as Perl_foo rather than foo
: STATIC is added to declaration;
@@ -579,11 +604,11 @@
: compatibility issues.
:
: 'W' Add a comma_pDEPTH argument to function prototypes, and a comma_aDEPTH
: argument to the function calls. This means that under DEBUGGING a
: depth argument is added to the functions, which is used for example by
: the regex engine for debugging and trace output. A non DEBUGGING build
: will not pass the unused argument. Currently restricted to functions
: with at least one argument.
: argument to the function calls. This means that under DEBUGGING a
: depth argument is added to the functions, which is used for example by
: the regex engine for debugging and trace output. A non DEBUGGING build
: will not pass the unused argument. Currently restricted to functions
: with at least one argument.
:
: 'X' Explicitly exported:
:
@@ -762,14 +787,9 @@ Adp |int |bytes_cmp_utf8 |NN const U8 *b \
|STRLEN blen \
|NN const U8 *u \
|STRLEN ulen
AMdp |U8 * |bytes_from_utf8|NN const U8 *s \
Adp |U8 * |bytes_from_utf8|NN const U8 *s \
|NN STRLEN *lenp \
|NN bool *is_utf8p
CTdp |U8 * |bytes_from_utf8_loc \
|NN const U8 *s \
|NN STRLEN *lenp \
|NN bool *is_utf8p \
|NULLOK const U8 **first_unconverted
Adp |U8 * |bytes_to_utf8 |NN const U8 *s \
|NN STRLEN *lenp
AOdp |SSize_t|call_argv |NN const char *sub_name \
@@ -3657,6 +3677,19 @@ CDbdp |UV |utf8n_to_uvuni |NN const U8 *s \
|U32 flags
Adpx |U8 * |utf8_to_bytes |NN U8 *s \
|NN STRLEN *lenp
Cp |PL_utf8_to_bytes_ret|utf8_to_bytes_ \
|NN U8 **s_ptr \
|NN STRLEN *lenp \
|Perl_utf8_to_bytes_arg result_as
Admp |PL_utf8_to_bytes_ret|utf8_to_bytes_new_pv \
|NN U8 const **s_ptr \
|NN STRLEN *lenp
Admp |PL_utf8_to_bytes_ret|utf8_to_bytes_overwrite \
|NN U8 **s_ptr \
|NN STRLEN *lenp
Admp |PL_utf8_to_bytes_ret|utf8_to_bytes_temp_pv \
|NN U8 const **s_ptr \
|NN STRLEN *lenp
EMXp |U8 * |utf16_to_utf8 |NN U8 *p \
|NN U8 *d \
|Size_t bytelen \
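The new entries above declare one base function, utf8_to_bytes_, that takes a 'result_as' argument, plus three 'Admp' macro wrappers around it. The following C sketch illustrates that general pattern only. It does not reproduce the PR's implementation: the definitions of PL_utf8_to_bytes_ret and Perl_utf8_to_bytes_arg are not shown in this diff, so every name prefixed with sketch_ or SKETCH_ is hypothetical, a plain bool stands in for the return type, and the code assumes an ASCII platform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mode selector; the PR's real argument type is not shown. */
typedef enum { SKETCH_OVERWRITE, SKETCH_NEW_BUFFER } sketch_result_as;

/* Compute the downgraded length. Fails on malformed input or on any
 * character above U+00FF (only lead bytes 0xC2 and 0xC3 fit in one byte). */
static bool sketch_downgraded_len(const unsigned char *s, size_t len,
                                  size_t *outlen)
{
    size_t i = 0, n = 0;
    while (i < len) {
        if (s[i] < 0x80) {
            i += 1;
        } else if ((s[i] == 0xC2 || s[i] == 0xC3) && i + 1 < len
                   && (s[i + 1] & 0xC0) == 0x80) {
            i += 2;
        } else {
            return false;
        }
        n++;
    }
    *outlen = n;
    return true;
}

/* Single base routine. On failure *s_ptr and *lenp are left untouched.
 * Assumes the input buffer holds len + 1 bytes (a NUL-terminated string). */
static bool sketch_utf8_to_bytes(unsigned char **s_ptr, size_t *lenp,
                                 sketch_result_as result_as)
{
    assert(s_ptr != NULL && *s_ptr != NULL && lenp != NULL);
    const unsigned char *src = *s_ptr;
    size_t len = *lenp, outlen;

    if (!sketch_downgraded_len(src, len, &outlen))
        return false;

    unsigned char *dst = *s_ptr;          /* overwrite in place by default */
    if (result_as == SKETCH_NEW_BUFFER) {
        dst = malloc(outlen + 1);         /* caller frees */
        if (dst == NULL)
            return false;
    }

    size_t i = 0, o = 0;
    while (i < len) {                     /* o never gets ahead of i */
        if (src[i] < 0x80) {
            dst[o++] = src[i];
            i += 1;
        } else {
            dst[o++] = (unsigned char)(((src[i] & 0x03) << 6)
                                       | (src[i + 1] & 0x3F));
            i += 2;
        }
    }
    dst[o] = '\0';
    *s_ptr = dst;
    *lenp = outlen;
    return true;
}

/* Thin wrappers: each variant only fixes the mode argument. */
#define sketch_utf8_to_bytes_overwrite(s_ptr, lenp) \
    sketch_utf8_to_bytes(s_ptr, lenp, SKETCH_OVERWRITE)
#define sketch_utf8_to_bytes_new_pv(s_ptr, lenp) \
    sketch_utf8_to_bytes(s_ptr, lenp, SKETCH_NEW_BUFFER)
```

The point of the design is that validation and conversion live in one place, and each public variant is a one-line macro that selects a mode, so the variants cannot drift apart.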
6 changes: 5 additions & 1 deletion embed.h
@@ -156,7 +156,7 @@
# define block_gimme() Perl_block_gimme(aTHX)
# define block_start(a) Perl_block_start(aTHX_ a)
# define bytes_cmp_utf8(a,b,c,d) Perl_bytes_cmp_utf8(aTHX_ a,b,c,d)
# define bytes_from_utf8_loc Perl_bytes_from_utf8_loc
# define bytes_from_utf8(a,b,c) Perl_bytes_from_utf8(aTHX_ a,b,c)
# define bytes_to_utf8(a,b) Perl_bytes_to_utf8(aTHX_ a,b)
# define call_argv(a,b,c) Perl_call_argv(aTHX_ a,b,c)
# define call_atexit(a,b) Perl_call_atexit(aTHX_ a,b)
@@ -791,6 +791,10 @@
# define utf8_hop_safe Perl_utf8_hop_safe
# define utf8_length(a,b) Perl_utf8_length(aTHX_ a,b)
# define utf8_to_bytes(a,b) Perl_utf8_to_bytes(aTHX_ a,b)
# define utf8_to_bytes_(a,b,c) Perl_utf8_to_bytes_(aTHX_ a,b,c)
# define utf8_to_bytes_new_pv(a,b) Perl_utf8_to_bytes_new_pv(aTHX,a,b)
# define utf8_to_bytes_overwrite(a,b) Perl_utf8_to_bytes_overwrite(aTHX,a,b)
# define utf8_to_bytes_temp_pv(a,b) Perl_utf8_to_bytes_temp_pv(aTHX,a,b)
# define utf8_to_uvchr_buf_helper(a,b,c) Perl_utf8_to_uvchr_buf_helper(aTHX_ a,b,c)
# define utf8n_to_uvchr_msgs Perl_utf8n_to_uvchr_msgs
# define uvoffuni_to_utf8_flags_msgs(a,b,c,d) Perl_uvoffuni_to_utf8_flags_msgs(aTHX_ a,b,c,d)
10 changes: 5 additions & 5 deletions mathoms.c
@@ -22,15 +22,15 @@
/*
* This file contains mathoms, various binary artifacts from previous
* versions of Perl which we cannot completely remove from the core
* code. There are two reasons functions should be here:
* code. There is only one reason these days that functions should be here:
*
* 1) A function has been replaced by a macro within a minor release,
* so XS modules compiled against an older release will expect to
* still be able to link against the function
* 2) A function Perl_foo(...) with #define foo Perl_foo(aTHX_ ...)
* has been replaced by a macro, e.g. #define foo(...) foo_flags(...,0)
* but XS code may still explicitly use the long form, i.e.
* Perl_foo(aTHX_ ...)
*
* It used to be that this was the way to handle the case where a function
* Perl_foo(...) had been replaced by a macro. But see the 'm' flag discussion
* in embed.fnc for a better way to handle this.
*
* This file can't just be cleaned out periodically, because that would break
* builds with -DPERL_NO_SHORT_NAMES
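The alternative that the comment above points to (a macro carrying the long name, so no function needs to stay in mathoms.c) can be sketched for the case with no thread context as follows. Perl_foo, Perl_bar and the Perl_foo definition follow the example in embed.fnc; the body of Perl_bar, the '#define foo Perl_foo' line (the shape embed.h gives an entry with no thread context), and the two helper functions are assumptions for illustration.

```c
#include <assert.h>

/* Hypothetical newer function that gained a 'flags' parameter. */
static int Perl_bar(int a, int b, int flags, int c)
{
    assert(flags >= 0);
    return a + b + c + 100 * flags;
}

/* The old entry point survives only as a macro under its long name. */
#define Perl_foo(a, b, c) Perl_bar(a, b, 0, c)

/* Short-name mapping of the kind embed.h supplies (assumed shape). */
#define foo Perl_foo

/* Code written against either spelling still compiles and behaves alike. */
static int via_short_name(void) { return foo(1, 2, 3); }
static int via_long_name(void)  { return Perl_foo(1, 2, 3); }
```

Both helpers end up calling Perl_bar(1, 2, 0, 3), so source code using the long name keeps compiling without a compatibility function.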
19 changes: 14 additions & 5 deletions proto.h

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

24 changes: 15 additions & 9 deletions regen/embed.pl
@@ -137,11 +137,14 @@ sub generate_proto_h {

die_at_end "$plain_func: S and p flags are mutually exclusive"
if $flags =~ /S/ && $flags =~ /p/;
die_at_end "$plain_func: m and $1 flags are mutually exclusive"
if $has_mflag && $flags =~ /([pS])/;

die_at_end "$plain_func: u flag only usable with m" if $flags =~ /u/
&& ! $has_mflag;
if ($has_mflag) {
if ($flags =~ /S/) {
die_at_end "$plain_func: m and S flags are mutually exclusive";
}
}
else {
die_at_end "$plain_func: u flag only usable with m" if $flags =~ /u/;
}

my ($static_flag, @extra_static_flags)= $flags =~/([SsIi])/g;
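The revised flag checks in this hunk can be restated as a small C sketch. This is a simplified model, not a port of regen/embed.pl: has_flag and check_flag_combination are hypothetical names, the sketch assumes $has_mflag simply means the 'm' flag is present, and it returns only the first problem found, whereas die_at_end in the Perl code collects every message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Nonzero if the single-letter flag c appears in the flag string. */
static int has_flag(const char *flags, char c)
{
    return strchr(flags, c) != NULL;
}

/* Returns NULL when the combination is allowed, else a description of the
 * first conflict. After the change, 'm' with 'p' is no longer an error. */
static const char *check_flag_combination(const char *flags)
{
    assert(flags != NULL);

    if (has_flag(flags, 'S') && has_flag(flags, 'p'))
        return "S and p flags are mutually exclusive";

    if (has_flag(flags, 'm')) {
        if (has_flag(flags, 'S'))
            return "m and S flags are mutually exclusive";
    }
    else if (has_flag(flags, 'u')) {
        return "u flag only usable with m";
    }
    return NULL;
}
```

With this model, a flag string such as "Admp" (used by the new utf8_to_bytes_* entries) passes, while "mS" and a 'u' without 'm' are still rejected.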

@@ -500,9 +503,9 @@ sub embed_h {
my $ind= $level ? " " : "";
$ind .= " " x ($level-1) if $level>1;
my $inner_ind= $ind ? " " : " ";
unless ($flags =~ /[omM]/) {
if ($flags !~ /[omM]/ or ($flags =~ /m/ && $flags =~ /p/)) {
my $argc = scalar @$args;
if ($flags =~ /T/) {
if ($flags =~ /[T]/) {
my $full_name = full_name($func, $flags);
next if $full_name eq $func; # Don't output a no-op.
$ret = indent_define($func, $full_name, $ind);
@@ -525,8 +528,11 @@ sub embed_h {
$use_va_list ? ("__VA_ARGS__") : ());
$ret = "#${ind}define $func($paramlist) ";
add_indent($ret,full_name($func, $flags) . "(aTHX");
$ret .= "_ " if $replacelist;
$ret .= $replacelist;
if ($replacelist) {
$ret .= ($flags =~ /m/) ? "," : "_ ";
$ret .= $replacelist;
}

if ($flags =~ /W/) {
if ($replacelist) {
$ret .= " comma_aDEPTH";
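The effect of the embed_h changes on the generated lines can be modeled with a short C sketch. It is a simplified illustration under stated assumptions, not the generator itself: has_flag, wants_embed_define and format_embed_define are hypothetical names, the full name is taken to be 'Perl_' plus the short name when 'p' is present, and the 'W' flag, varargs handling and indentation are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static bool has_flag(const char *flags, char c)
{
    return strchr(flags, c) != NULL;
}

/* Mirrors the new condition: 'o', 'm' and 'M' suppress the short-name
 * macro, except that 'm' together with 'p' now gets one after all. */
static bool wants_embed_define(const char *flags)
{
    bool suppressed = has_flag(flags, 'o') || has_flag(flags, 'm')
                   || has_flag(flags, 'M');
    return !suppressed || (has_flag(flags, 'm') && has_flag(flags, 'p'));
}

/* Writes a simplified embed.h line into out. Returns false if no line
 * would be produced. Parameters are named a, b, c, ... as in embed.h. */
static bool format_embed_define(char *out, size_t outsz, const char *func,
                                const char *flags, int argc)
{
    assert(out != NULL && outsz > 0);
    out[0] = '\0';
    if (!wants_embed_define(flags))
        return false;

    const char *prefix = has_flag(flags, 'p') ? "Perl_" : "";

    if (has_flag(flags, 'T')) {           /* no thread context at all */
        if (prefix[0] == '\0')
            return false;                 /* would be a no-op define */
        snprintf(out, outsz, "#define %s %s%s", func, prefix, func);
        return true;
    }

    char params[64] = "";
    size_t used = 0;
    for (int i = 0; i < argc && i < 26; i++)
        used += (size_t)snprintf(params + used, sizeof params - used,
                                 "%s%c", i ? "," : "", 'a' + i);

    /* Macros receive the context as an ordinary argument ("aTHX,");
     * functions receive it through "aTHX_ ". */
    const char *sep = argc == 0 ? "" : has_flag(flags, 'm') ? "," : "_ ";
    snprintf(out, outsz, "#define %s(%s) %s%s(aTHX%s%s)",
             func, params, prefix, func, sep, params);
    return true;
}
```

Fed the flags from the new embed.fnc entries, this model produces the same two shapes seen in the embed.h diff: 'aTHX_ a,b,c' for the 'Cp' function and 'aTHX,a,b' for the 'Admp' macros.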