Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This patch replaces the previous broken approach to TOC string decoding that used
.encode().decode('unicode_escape')
with proper parsing of the escape sequences cdrdao is known to generate.The new parser is also lenient with invalid escape sequences, that can occur due to improper escaping in cdrdao. See:
cdrdao/cdrdao#32
Latin-1:
This new parsing method should work for Latin-1 strings for both old and new versions of cdrdao, as long as those strings don't trigger the improper escaping issues in upstream cdrdao.
This has been verified with the album Diorama from the Danish black metal band MØL.
MS-JIS:
This new parsing method should also work for MS-JIS strings as long as the .toc file was generated by cdrdao 1.2.5+ and the strings don't trigger improper escaping issues in upstream cdrdao.
Unfortunately, I don't have any CD with CD-Text in MS-JIS, so I could not verify this.
cdrdao versions before 1.2.5 will still cause whipper to produce mojibake (garbled characters) when reading MS-JIS CD-Text, as those versions do not encode strings in UTF-8.
Other encodings:
As far as I know, CD-Text only supports officially ASCII, Latin-1 and MS-JIS, but I wouldn't be surprised if there are unofficial encodings out there, given the strange strings I've seen in some bug reports.
If you have a CD with garbled CD-Text, please submit a bug report indicating the performer, album name, language and attach the .toc file so that the produced strings can be compared to the expected text.
Fixes #169
Fixes #183