Description
Describe the bug
When initializing a basic_charset with character arrays such as those returned by digits() and others, the last element of the array is incorrectly discarded.
StringZilla/include/stringzilla/stringzilla.hpp
Lines 192 to 195 in 152ed04
inline carray<10> const &digits() noexcept {
    static carray<10> const all = {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'};
    return all;
}
This is due to the use of count_characters - 1 as the bound of the loop that populates the basic_charset bitset.
StringZilla/include/stringzilla/stringzilla.hpp
Lines 283 to 289 in 152ed04
template <std::size_t count_characters>
explicit basic_charset(char_type const (&chars)[count_characters]) noexcept : basic_charset() {
    static_assert(count_characters > 0, "Character array cannot be empty");
    for (std::size_t i = 0; i < count_characters - 1; ++i) { // count_characters - 1 to exclude the null terminator
        char_type c = chars[i];
        bitset_._u64s[sz_bitcast(sz_u8_t, c) >> 6] |= (1ull << (sz_bitcast(sz_u8_t, c) & 63u));
    }
While this prevents including the null terminator of string literals like sz::char_set("x"), it causes incorrect behavior for character arrays that have no null terminator: the final character is excluded.
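To make the mechanism concrete, here is a minimal standalone sketch that applies the same loop bound to a string literal and to an unterminated array. It is not StringZilla code: `collect_skipping_last`, `literal_keeps_nine`, `array_keeps_nine`, and `run_demo` are hypothetical names, and a `std::bitset<256>` stands in for the library's internal bitset.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the quoted constructor loop; not the library's class.
template <std::size_t count_characters>
std::bitset<256> collect_skipping_last(char const (&chars)[count_characters]) {
    std::bitset<256> bits;
    // Same bound as in the quoted code: the final element is always skipped.
    for (std::size_t i = 0; i < count_characters - 1; ++i)
        bits.set(static_cast<unsigned char>(chars[i]));
    return bits;
}

// A string literal has type char const[11] here: ten digits plus '\0'.
// Skipping the last element drops only the terminator.
inline bool literal_keeps_nine() {
    return collect_skipping_last("0123456789").test(static_cast<unsigned char>('9'));
}

// A plain array of ten characters has no terminator,
// so the skipped element is the real character '9'.
inline bool array_keeps_nine() {
    static char const digits[10] = {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'};
    return collect_skipping_last(digits).test(static_cast<unsigned char>('9'));
}

// Checks both cases: the literal keeps '9', the unterminated array loses it.
inline void run_demo() {
    assert(literal_keeps_nine());
    assert(!array_keeps_nine());
}
```

Calling `run_demo()` from `main` exercises both cases. The array size is the only information the template sees, so it cannot tell a terminated literal from an unterminated array of the same length.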
I believe this local block is the full extent of the affected code.
StringZilla/include/stringzilla/stringzilla.hpp
Lines 330 to 341 in 152ed04
inline char_set ascii_letters_set() { return char_set {ascii_letters()}; }
inline char_set ascii_lowercase_set() { return char_set {ascii_lowercase()}; }
inline char_set ascii_uppercase_set() { return char_set {ascii_uppercase()}; }
inline char_set ascii_printables_set() { return char_set {ascii_printables()}; }
inline char_set ascii_controls_set() { return char_set {ascii_controls()}; }
inline char_set digits_set() { return char_set {digits()}; }
inline char_set hexdigits_set() { return char_set {hexdigits()}; }
inline char_set octdigits_set() { return char_set {octdigits()}; }
inline char_set punctuation_set() { return char_set {punctuation()}; }
inline char_set whitespaces_set() { return char_set {whitespaces()}; }
inline char_set newlines_set() { return char_set {newlines()}; }
inline char_set base64_set() { return char_set {base64()}; }
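One possible direction for a fix, sketched under assumptions rather than taken from the library: treat C arrays and fixed-size array objects separately, and drop a trailing '\0' only when one is actually present. The class `tiny_char_set` and the helper `digits_from_array` below are hypothetical, and the sketch assumes carray behaves like `std::array<char, N>`.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical miniature character set; names are illustrative only.
class tiny_char_set {
    std::bitset<256> bits_;

  public:
    tiny_char_set() = default;

    // C arrays, including string literals: drop a trailing '\0' only if present.
    template <std::size_t count_characters>
    explicit tiny_char_set(char const (&chars)[count_characters]) {
        std::size_t length = count_characters;
        if (length > 0 && chars[length - 1] == '\0') --length;
        for (std::size_t i = 0; i < length; ++i) add(chars[i]);
    }

    // std::array carries no implicit terminator, so every element is stored.
    template <std::size_t count_characters>
    explicit tiny_char_set(std::array<char, count_characters> const &chars) {
        for (char c : chars) add(c);
    }

    void add(char c) { bits_.set(static_cast<unsigned char>(c)); }
    bool contains(char c) const { return bits_.test(static_cast<unsigned char>(c)); }
};

// Builds a set from an unterminated array of the ten decimal digits.
inline tiny_char_set digits_from_array() {
    std::array<char, 10> const digits = {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'};
    return tiny_char_set {digits};
}
```

With this split, `digits_from_array().contains('9')` holds, while `tiny_char_set("x")` still excludes the terminator. The trade-off is that a C array intentionally ending in '\0' would lose that element, so a real fix may prefer a different rule.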
Steps to reproduce
#include <cassert>
#include <initializer_list>
#include "stringzilla/stringzilla.hpp"
namespace sz = ashvardanian::stringzilla;
int main() {
    sz::string haystack = "239";
    // Test with null-terminated string
    assert(haystack.contains_only(sz::char_set("0123456789")));
    // Passes: null terminator is correctly discarded
    // Test with initializer list
    static std::initializer_list<char> all = {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'};
    assert(haystack.contains_only(sz::char_set {all}));
    // Passes: constructor for initializer list is called
    // Test with carray
    sz::carray<10> all_digits = {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'};
    assert(haystack.contains_only(sz::char_set {all_digits}));
    assert(haystack.is_digit());
    // Fails: '9' is incorrectly discarded
}
Expected behavior
No assertion fails.
StringZilla version
v3.11.0
Operating System
Ubuntu 22.04.5
Hardware architecture
x86
Which interface are you using?
C++ bindings
Contact Details
No response
Are you open to being tagged as a contributor?
- I am open to being mentioned in the project .git history as a contributor
Is there an existing issue for this?
- I have searched the existing issues
Code of Conduct
- I agree to follow this project's Code of Conduct