Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix utf8 handling in C++ with the default Std IO streams #1965

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

kdt3rd
Copy link
Contributor

@kdt3rd kdt3rd commented Jan 29, 2025

The use of the new std::filesystem::path to cleanup messy filebuf manipulations to create a stream was incorrectly not using u8path under windows. This also adds future support for C++20 where they change the API.

Adds tests to test the utf-8 support which was previously incomplete

@@ -27,15 +27,25 @@ namespace
inline ifstream*
make_ifstream (const char* filename)
{
return new ifstream (std::filesystem::path (filename),
#if __cplusplus >= 202002L
return new ifstream (std::filesystem::path (std::u8string ((const char8_t*)filename)),
Copy link

@kmilos kmilos Jan 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure this is a safe cast, the sign of char is platform specific. See e.g. this discussion, it would seem there is still some clarifications to be done in the standard re UTF-8; it seems somehow broken/incomplete in C++20: https://stackoverflow.com/questions/57603013/how-to-safely-convert-const-char-to-const-char8-t-in-c20

It seems to say it's best to just keep using the C++17 u8path...

Re C++20 and future OpenEXR API, they seem to suggest adding a UTF-8 specific make_ifstream (const char8_t* filename).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am quite disappointed what the standards committee has done in this regard. While I don't think the signedness is a particular concern, the strict aliasing rules violation probably does have some semantic effect for the optimizers. As a result, until there is more clarity, I'll just remove the c++20 path for now, as there is no VFX reference platform specifying it yet anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, just switched it to a view w/ transform, which is still disappointing, but at least avoids the aliasing issue and actually an extra string copy from what was there.

@kdt3rd kdt3rd force-pushed the fix_utf8_handling_cpp branch 2 times, most recently from eddf7a6 to fcd1678 Compare January 31, 2025 22:20
the std::filesystem::path mechanism is reasonable, but needs to be told
it to look for a utf-8 string, and is not automatically that, and the
standards committee must have realized there were issues and change the
api in c++20

Signed-off-by: Kimball Thurston <[email protected]>
@kdt3rd kdt3rd force-pushed the fix_utf8_handling_cpp branch from fcd1678 to 3f19ca4 Compare January 31, 2025 23:45
Copy link
Contributor

@meshula meshula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as far as I can tell the CI failures are meaningless with respect to this PR. lgtm.

Comment on lines -83 to +84
// class IStream
// class IStre
// am
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert?


void testUpdateMeta (const std::string& tempdir);

void testWriteScans (const std::string& tempdir);
void testWriteTiles (const std::string& tempdir);
void testWriteMultiPart (const std::string& tempdir);


Copy link

@kmilos kmilos Feb 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert unnecessary change?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants