endswith returns the wrong result when unicode characters are part of the string. The example below should paint a clear picture. This may be fixed in one of the open utf8 PRs already, but I wanted to raise this issue just in case.
Steps to reproduce
Repro example:
def main():
    var test: String = "│\n"
    print(test.removesuffix("\n"))  # Does not strip the newline.
    print(test.endswith("\n"))  # Returns False
System information
- What OS did you install Mojo on? macOS 14.6.1
- Provide version information for Mojo by pasting the output of `mojo -v` 24.6
- Provide Magic CLI version by pasting the output of `magic -V` or `magic --version` magic 0.5.1 (based on pixi 0.37.0)
- Optionally, provide more information with `magic info`.
@thatstoasty I haven't gotten around to those functions yet, but I think I found the problem. StringSlice.__len__() works by unicode codepoints and String doesn't (it should in the future). StringSlice.find() works by byte offset, and it should be by unicode codepoints.
All of that context to explain:
fn endswith(
    self, suffix: StringSlice, start: Int = 0, end: Int = -1
) -> Bool:
    """Verify if the `StringSlice` end with the specified suffix between
    start and end positions.

    Args:
        suffix: The suffix to check.
        start: The start offset from which to check.
        end: The end offset from which to check.

    Returns:
        True if the `self[start:end]` is suffixed by the input suffix.
    """
    if len(suffix) > len(self):
        return False
    if end == -1:
        return self.rfind(suffix, start) + len(suffix) == len(self)
    return StringSlice[origin](
        ptr=self.unsafe_ptr() + start, length=end - start
    ).endswith(suffix)
The line self.rfind(suffix, start) + len(suffix) == len(self) is to blame, since rfind returns a byte offset while len() counts unicode codepoints. Or at least I think that line is the problem. This might get fixed when we switch to full unicode support (find(), __getitem__ and len() for strings should all work by unicode codepoints).
If that's not the case then I'm as lost as you are 😅
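To make the suspected mismatch concrete, here is a minimal sketch in Python rather than Mojo. It only models the arithmetic described above and is not the stdlib implementation. The names `buggy_endswith` and `fixed_endswith` are hypothetical, and the "fix" shown (comparing byte lengths on both sides) is just one possible way to keep the units consistent, not necessarily what the eventual patch will do.

```python
def buggy_endswith(s: str, suffix: str) -> bool:
    """Model of the suspected bug: a byte offset is compared to codepoint counts."""
    data = s.encode("utf-8")
    suffix_bytes = suffix.encode("utf-8")
    if len(suffix) > len(s):
        return False
    # bytes.rfind returns a byte offset, like rfind in the comment above.
    byte_offset = data.rfind(suffix_bytes)
    # len() on a Python str counts codepoints, like StringSlice.__len__().
    return byte_offset + len(suffix) == len(s)


def fixed_endswith(s: str, suffix: str) -> bool:
    """Hypothetical fix: keep every quantity in bytes."""
    data = s.encode("utf-8")
    suffix_bytes = suffix.encode("utf-8")
    byte_offset = data.rfind(suffix_bytes)
    return byte_offset != -1 and byte_offset + len(suffix_bytes) == len(data)


if __name__ == "__main__":
    test = "│\n"  # "│" (U+2502) is 1 codepoint but 3 bytes in UTF-8
    print(buggy_endswith(test, "\n"))  # False: 3 + 1 != 2
    print(fixed_endswith(test, "\n"))  # True: 3 + 1 == 4
```

For pure ASCII input both versions agree, because byte offsets and codepoint counts coincide. That would explain why the problem only shows up once a multi-byte character appears before the suffix.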