Avifenc fails to process files with non UTF8 file names #1238

Closed
hlad12 opened this issue Dec 5, 2022 · 7 comments

@hlad12

hlad12 commented Dec 5, 2022

avifenc fails to process files with non-UTF-8 file names, that is, file names containing non-ASCII characters such as Russian or Hindi letters.

@y-guyon
Collaborator

y-guyon commented Dec 5, 2022

Thank you for the report.
May I know on which platform you are encountering the issue?
Do you have a file that you can upload here (keeping the file name or zipping it) to reproduce the issue?

@wantehchang
Collaborator

@hlad12 Since you didn't mention avifdec in this bug report, I only inspected avifenc.c.

One thing that may go wrong is the first line in the avifReadImage() function:

    const avifAppFileFormat format = avifGuessFileFormat(filename);

The bug may be in the following code in avifGuessFileFormat():

    // If we get here, the file header couldn't be read for some reason. Guess from the extension.

    const char * fileExt = strrchr(filename, '.');
    if (!fileExt) { 
        return AVIF_APP_FILE_FORMAT_UNKNOWN;
    }
    ++fileExt; // skip past the dot

    char lowercaseExt[8]; // This only needs to fit up to "jpeg", so this is plenty
    const size_t fileExtLen = strlen(fileExt);
    if (fileExtLen >= sizeof(lowercaseExt)) { // >= accounts for NULL terminator
        return AVIF_APP_FILE_FORMAT_UNKNOWN;
    }
    
    for (size_t i = 0; i < fileExtLen; ++i) {
        lowercaseExt[i] = (char)tolower((unsigned char)fileExt[i]);
    }
    lowercaseExt[fileExtLen] = 0;
    
    if (!strcmp(lowercaseExt, "avif")) {
        return AVIF_APP_FILE_FORMAT_AVIF;
    } else if (!strcmp(lowercaseExt, "y4m")) {
        return AVIF_APP_FILE_FORMAT_Y4M;
    } else if (!strcmp(lowercaseExt, "jpg") || !strcmp(lowercaseExt, "jpeg")) {
        return AVIF_APP_FILE_FORMAT_JPEG;
    } else if (!strcmp(lowercaseExt, "png")) {
        return AVIF_APP_FILE_FORMAT_PNG;
    } 
    return AVIF_APP_FILE_FORMAT_UNKNOWN;
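
One way to test this hypothesis is a small standalone check. In UTF-8, every byte of a multi-byte character is 0x80 or above, so strrchr(filename, '.') cannot match inside a non-ASCII character. The sketch below mirrors the extension logic quoted above and assumes the file name reaches it as UTF-8; the Sketch* names are hypothetical and are not part of libavif.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

// Hypothetical stand-in for avifAppFileFormat, for illustration only.
typedef enum
{
    SKETCH_FORMAT_UNKNOWN = 0,
    SKETCH_FORMAT_AVIF,
    SKETCH_FORMAT_JPEG,
    SKETCH_FORMAT_PNG,
    SKETCH_FORMAT_Y4M
} SketchFileFormat;

// Guesses the format from the extension only, following the quoted fallback code.
SketchFileFormat sketchGuessFromExtension(const char * filename)
{
    const char * fileExt = strrchr(filename, '.');
    if (!fileExt) {
        return SKETCH_FORMAT_UNKNOWN;
    }
    ++fileExt; // skip past the dot

    char lowercaseExt[8];
    const size_t fileExtLen = strlen(fileExt);
    if (fileExtLen >= sizeof(lowercaseExt)) { // >= leaves room for the terminator
        return SKETCH_FORMAT_UNKNOWN;
    }
    for (size_t i = 0; i < fileExtLen; ++i) {
        lowercaseExt[i] = (char)tolower((unsigned char)fileExt[i]);
    }
    lowercaseExt[fileExtLen] = '\0';

    if (!strcmp(lowercaseExt, "avif")) {
        return SKETCH_FORMAT_AVIF;
    } else if (!strcmp(lowercaseExt, "y4m")) {
        return SKETCH_FORMAT_Y4M;
    } else if (!strcmp(lowercaseExt, "jpg") || !strcmp(lowercaseExt, "jpeg")) {
        return SKETCH_FORMAT_JPEG;
    } else if (!strcmp(lowercaseExt, "png")) {
        return SKETCH_FORMAT_PNG;
    }
    return SKETCH_FORMAT_UNKNOWN;
}
```

For example, calling sketchGuessFromExtension("./\xF0\x9F\x90\xBE.jpeg") (the UTF-8 bytes of 🐾 followed by ".jpeg") yields SKETCH_FORMAT_JPEG, which suggests that the extension fallback by itself is not sensitive to non-ASCII bytes in the name.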

@mintommm

mintommm commented Oct 9, 2023

I also faced this issue.
Encoding seems to fail if the path contains an emoji.

OS: Windows 11 Pro 22H2
Terminal: PowerShell 7.3.7
Binary: avifenc.exe from https://ci.appveyor.com/project/louquillio/libavif/build/artifacts
Original Image: Jy0O0q0mLXl668HAo43n.jpeg from https://web.dev/compress-images-avif/

If I run it with the original name, it succeeds.

PS T:\emoji> .\avifenc.exe --version
Version: 1.0.1 (dav1d [dec]:1.2.1-0-g8a6f054, aom [enc/dec]:3.7.0-457-g2603b3a2d)
libyuv : available (1864)

PS T:\emoji> .\avifenc.exe .\Jy0O0q0mLXl668HAo43n.jpeg .\Jy0O0q0mLXl668HAo43n.avif
Directly copied JPEG pixel data (no YUV conversion): .\Jy0O0q0mLXl668HAo43n.jpeg
Successfully loaded: .\Jy0O0q0mLXl668HAo43n.jpeg
AVIF to be written: (Lossy)
 * Resolution     : 840x1120
 * Bit Depth      : 8
 * Format         : YUV420
 * Chroma Sam. Pos: 0
 * Alpha          : Absent
 * Range          : Full
 * Color Primaries: 1
 * Transfer Char. : 13
 * Matrix Coeffs. : 6
 * ICC Profile    : Absent
 * XMP Metadata   : Absent
 * Exif Metadata  : Absent
 * Transformations: None
 * Progressive    : Unavailable
Encoding with AV1 codec 'aom' speed [6], color quality [60 (Medium)], alpha quality [100 (Lossless)], tileRowsLog2 [0], tileColsLog2 [0], 1 worker thread(s), please wait...
Encoded successfully.
 * Color AV1 total size: 29981 bytes
 * Alpha AV1 total size: 0 bytes
Wrote AVIF: .\Jy0O0q0mLXl668HAo43n.avif

If I change the name to 🐾.jpeg and run it, it fails.

PS T:\emoji> .\avifenc.exe .\🐾.jpeg .\🐾.avif
Can't open JPEG file for read: .\??.jpeg
Cannot read input file: .\??.jpeg

Attaching the image to reproduce the issue.
GitHub does not allow attaching the file under its original name, so please rename this file to 🐾.jpeg before running it.
🐾

@vrabaud
Collaborator

vrabaud commented Oct 9, 2023

Hi, thanks for the details. I am preparing a patch using @y-guyon's excellent work to support UTF-8 on Windows in libwebp: https://chromium.googlesource.com/webm/libwebp/+/refs/heads/main/examples/unicode.h

@wantehchang
Collaborator

wantehchang commented Oct 10, 2023

Vincent: I assigned this issue to you based on your comment. Thank you for volunteering.

I debugged this a little today. The command ./avifenc ./🐾.jpeg ./🐾.avif works on my Linux computer, but I can reproduce the bug on my Windows computer.

After some experiments and Web searches, I got it to work as follows:

  • Use wmain() instead of main() to receive command-line arguments in UTF-16.
  • Call WideCharToMultiByte() to convert the UTF-16 command-line arguments to UTF-8.
  • Call setlocale(LC_ALL, ".UTF8") to change the code page of avifenc.c to UTF-8. According to my Web searches, this only works on Windows 10 version 1803 or later. See Microsoft's documentation on setlocale().
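
The three steps above could be sketched roughly as follows. This is only an illustration, assuming an MSVC build on Windows; it is not the patch that was eventually merged. utf16ToUtf8() is a hypothetical helper, and each argument is opened only to show that the converted name is usable.

```c
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

#ifdef _WIN32
#include <windows.h>

// Hypothetical helper (not from libavif): returns a malloc'ed UTF-8 copy of a
// NUL-terminated UTF-16 string, or NULL on failure. The caller frees it.
char * utf16ToUtf8(const wchar_t * wide)
{
    // With cchWideChar == -1, the returned size includes the NUL terminator.
    const int size = WideCharToMultiByte(CP_UTF8, 0, wide, -1, NULL, 0, NULL, NULL);
    if (size <= 0) {
        return NULL;
    }
    char * utf8 = (char *)malloc((size_t)size);
    if (utf8 && WideCharToMultiByte(CP_UTF8, 0, wide, -1, utf8, size, NULL, NULL) == 0) {
        free(utf8);
        utf8 = NULL;
    }
    return utf8;
}

// wmain() receives the command-line arguments as UTF-16.
int wmain(int argc, wchar_t * argv[])
{
    // Ask the C runtime to treat narrow strings as UTF-8
    // (reported above to need Windows 10 version 1803 or later).
    if (!setlocale(LC_ALL, ".UTF8")) {
        fprintf(stderr, "UTF-8 locale is not available\n");
        return 1;
    }
    for (int i = 1; i < argc; ++i) {
        char * path = utf16ToUtf8(argv[i]);
        if (!path) {
            fprintf(stderr, "Cannot convert argument %d to UTF-8\n", i);
            return 1;
        }
        FILE * f = fopen(path, "rb");
        printf("argument %d: %s\n", i, f ? "opened" : "cannot open");
        if (f) {
            fclose(f);
        }
        free(path);
    }
    return 0;
}
#endif // _WIN32
```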

@wantehchang
Collaborator

Note that in CMakeLists.txt, we have the following:

    add_compile_options(
        ...
        # This tells MSVC to read source code as UTF-8 and assume console can only use ASCII (minimal safe).
        # libavif uses ANSI API to print to console, which is not portable between systems using different
        # languages and results in mojibake unless we only use codes shared by every code page: ASCII.
        # A C4556 warning will be generated on violation.
        # Commonly used /utf-8 flag assumes UTF-8 for both source and console, which is usually not the case.
        # Warnings can be suppressed but there will still be random characters printed to the console.
        /source-charset:utf-8
        /execution-charset:us-ascii
    )

I found that the /execution-charset:us-ascii compiler flag does not seem to have an effect on this issue.

@wantehchang
Collaborator

This has been fixed by a series of commits from Vincent, including #1693. Eventually we switched to a solution using a manifest file (#1900).
