New TOC CD-TEXT string decoding #633

Open
wants to merge 1 commit into
base: develop

Commits on Aug 27, 2024

  1. New TOC CD-TEXT string decoding

    This patch replaces the previous broken approach to TOC string decoding
    that used `.encode().decode('unicode_escape')` with proper parsing of
    the escape sequences cdrdao is known to generate.
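
To illustrate why the old approach is broken, here is a minimal sketch using only the standard library. It is not code from the patch. The variable names are made up for illustration. It shows that round-tripping a non-ASCII title through `unicode_escape` reinterprets its UTF-8 bytes as Latin-1:

```python
# Illustration only: the name below is the band mentioned in this commit message.
title = "MØL"

# str.encode() defaults to UTF-8, so "Ø" (U+00D8) becomes the two bytes C3 98.
# The unicode_escape codec then decodes every non-escape byte as Latin-1,
# turning those two bytes into two unrelated characters.
broken = title.encode().decode("unicode_escape")

# Pure ASCII text survives, which is why the bug is easy to miss.
ascii_title = "Diorama"
ascii_roundtrip = ascii_title.encode().decode("unicode_escape")

print(repr(broken))
print(ascii_roundtrip)
```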
    
    The new parser is also lenient with invalid escape sequences, which can
    occur due to improper escaping in cdrdao. See:
    cdrdao/cdrdao#32
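
A lenient parser along these lines could look like the sketch below. It is not the code in this patch: the function name `decode_toc_string` is hypothetical, and the set of escapes handled (three-digit octal, `\"` and `\\`) is an assumption about what cdrdao emits. Anything else following a backslash is left untouched rather than raising an error.

```python
import re

# Assumed escape forms: backslash + three octal digits, \" and \\.
_ESCAPE = re.compile(r'\\([0-7]{3}|["\\])')


def decode_toc_string(raw: str) -> str:
    """Decode assumed cdrdao-style escapes, leaving invalid ones as-is."""

    def replace(match: re.Match) -> str:
        body = match.group(1)
        if len(body) == 3:
            # Octal escape: map the value to the code point of the same
            # number (for values below 256 this matches Latin-1).
            return chr(int(body, 8))
        # Escaped quote or backslash: drop the leading backslash.
        return body

    # The regex scans left to right, so "\\101" is read as an escaped
    # backslash followed by the literal text "101", not as an octal escape.
    return _ESCAPE.sub(replace, raw)


print(decode_toc_string(r'M\330L'))
print(decode_toc_string(r'bad\q escape'))
```

Because unknown sequences simply fail to match the pattern, a stray backslash passes through unchanged instead of aborting the whole read.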
    
    Latin-1:
    
    This new parsing method should work for Latin-1 strings for both old and
    new versions of cdrdao, as long as those strings don't trigger the
    improper escaping issues in upstream cdrdao.
    
    This has been verified with the album Diorama from the Danish black
    metal band MØL.
    
    MS-JIS:
    
    This new parsing method should also work for MS-JIS strings as long as
    the .toc file was generated by cdrdao 1.2.5+ and the strings don't
    trigger improper escaping issues in upstream cdrdao.
    
    Unfortunately, I don't have any CD with CD-Text in MS-JIS, so I could
    not verify this.
    
    cdrdao versions before 1.2.5 will still cause whipper to produce
    mojibake (garbled characters) when reading MS-JIS CD-Text, as those
    versions do not encode strings in UTF-8.
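
As a rough illustration of how this kind of mojibake arises (not taken from whipper or cdrdao), the sketch below encodes a Japanese string with Python's `cp932` codec, used here as a stand-in for MS-JIS, and then reads the bytes back as Latin-1. The sample text is arbitrary.

```python
# Arbitrary sample text; cp932 is an assumption standing in for MS-JIS.
original = "テスト"

# Bytes as they might sit on disc in a Shift-JIS family encoding.
raw_bytes = original.encode("cp932")

# Interpreting those bytes with the wrong codec yields garbled text.
garbled = raw_bytes.decode("latin-1")

# The damage is reversible only if the original encoding is known.
recovered = garbled.encode("latin-1").decode("cp932")

print(repr(garbled))
print(recovered)
```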
    
    Other encodings:
    
    As far as I know, CD-Text officially supports only ASCII, Latin-1 and
    MS-JIS, but I wouldn't be surprised if there are unofficial encodings
    out there, given the strange strings I've seen in some bug reports.
    
    If you have a CD with garbled CD-Text, please submit a bug report
    indicating the performer, album name and language, and attach the .toc
    file so that the produced strings can be compared to the expected text.
    
    Fixes whipper-team#169
    
    Signed-off-by: Alicia Boya García <[email protected]>
    ntrrgc committed Aug 27, 2024
    4719c74