Replies: 6 comments 19 replies
-
Types, as in the famous examples of languages adding more formal types like StringBuilder or Python 3, or Voldemort types where it's inlined as a "range starter".
-
Why support UTF-16 and UTF-32 at all? ASCII and Unicode are the main choices.
-
Oh goodness, this is a hell of a topic. I was genuinely hoping that we would never do this. First let's start with the different storage representations:
Appender I didn't bother much about, although it does have behavior specific to characters and to my strings. My read-only strings and builders are one source file each, which has some replacements done on it to form three of each. Don't forget we'll need normalization support for comparison (note: no default here is correct, even if everyone uses NFC today).
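To make the normalization point concrete, here is a minimal sketch using the existing `std.uni.normalize`. The helper name `equalNormalized` is hypothetical, and it takes the normalization form as a required template argument to reflect the remark that no default is correct.

```d
import std.uni : normalize, NormalizationForm, NFC, NFD;

/// Hypothetical helper: compares two strings after normalizing both.
/// The form has no default, so the caller must choose one explicitly.
bool equalNormalized(NormalizationForm form)(string a, string b)
{
    return normalize!form(a) == normalize!form(b);
}

unittest
{
    string composed = "\u00E9";    // é as a single code point
    string decomposed = "e\u0301"; // e followed by a combining acute accent

    // A plain code-unit comparison treats these as different strings.
    assert(composed != decomposed);

    // After normalization to the same form, they compare equal.
    assert(equalNormalized!NFC(composed, decomposed));
    assert(equalNormalized!NFD(composed, decomposed));
}
```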
-
Meh. What's the point of this? I always thought that constantly checking string validity was a bit slow and pedantic, and wished that the

The existing built-in types are more interoperable with existing D code, already represent UTF-8/16/32 strings, and can have syntactic sugar that's obviated at compile-time, and so on.
-
A type-level guarantee against invalid UTF would be nice per se, but I feel this is pretty little gain compared to the effort required. I don't think we should have any other work waiting for the type to happen. But it can be done: the string functions can continue to parse ranges of characters regardless of their source, including good old arrays of them. Then we can devise a checked string type any time we wish and it can work with existing functions - even V2 ones - out of the box.
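A minimal sketch of that idea, written against current Phobos rather than any V3 API: a function constrained only on "range of characters" accepts built-in arrays and a checked type alike. `countCodePoint`, `CheckedString`, and `codePoints` are hypothetical names used for illustration.

```d
import std.range.primitives : ElementType, isInputRange;
import std.traits : isSomeChar;
import std.utf : byDchar, validate;

/// Counts occurrences of the code point `c` in any range of characters.
size_t countCodePoint(R)(R r, dchar c)
if (isInputRange!R && isSomeChar!(ElementType!R))
{
    size_t n;
    foreach (dchar ch; r.byDchar)
        if (ch == c)
            ++n;
    return n;
}

/// Hypothetical checked type: validates once, then exposes a range.
struct CheckedString
{
    private string data;

    this(string s) @safe pure
    {
        validate(s); // throws UTFException on invalid UTF-8
        data = s;
    }

    /// Range of code points over the validated data.
    auto codePoints()
    {
        return data.byDchar;
    }
}

unittest
{
    assert(countCodePoint("a b c", ' ') == 2);  // built-in UTF-8 array
    assert(countCodePoint("a b c"w, ' ') == 2); // built-in UTF-16 array
    assert(countCodePoint(CheckedString("a b c").codePoints, ' ') == 2);
}
```

The function never names the checked type, so adding such a type later would not require changing it.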
-
As far as the general Phobos API goes, I don't think that a string type is particularly relevant. The vast majority of the code will be written to work on ranges of characters (and probably just ranges of

It's the stuff where you actually store a string where it becomes an issue, and that doesn't come up all that often in Phobos. So, with something like that, we would need to decide on what string type to use, but it'll be the simplest to just use

That being the case, a string type that handles string comparisons and normalization and whatnot (and potentially does stuff like have small string optimizations) might be useful to have for folks who want that sort of thing, but Phobos as a whole wouldn't need to know or care. Anything involving string building would be up to the string type itself to deal with, and for the rest, you just need to be able to get a range of

So, I don't know how good or bad an idea it is to create a new string type for Phobos (and there are certainly arguments in favor and against), but I think that it's the kind of thing that can mostly be restricted to its own module without it needing to affect the rest of Phobos, and as such, I don't think that it's terribly relevant to much of the string-handling in Phobos.
-
Instead of using `string`, `wstring`, and `dstring` to represent Unicode strings, Phobos v3 should include library types representing Unicode strings encoded in UTF-8, UTF-16, and UTF-32.

These types should validate their data upon construction, and their public interfaces should be designed to ensure that their data remains valid at all times (at least in `@safe` code). For example, the UTF-8 and UTF-16 types must not allow slicing in the middle of a code point.

This approach is an application of the principles described in "Parse, don't validate."