Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs), type-bug (An unexpected behavior, bug, or error)
Description
Bug report
Bug description:
OSS-Fuzz has found a heap buffer overflow in _PyTokenizer_ensure_utf8. Link to OSS-Fuzz bug report.
The root cause is that valid_utf8() in Parser/tokenizer/helpers.c checks the continuation bytes in reverse order, reading s[expected] before s[1], on these lines:
cpython/Parser/tokenizer/helpers.c, lines 497 to 499 in 8b7b5a9:

    for (; expected; expected--)
        if (s[expected] < 0x80 || s[expected] >= 0xC0)
            return 0;
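The effect of the loop order can be seen with a small instrumented sketch. The helpers `trace_backward()` and `trace_forward()` are hypothetical, written only for this illustration and not part of CPython; they record which index each loop order reads, stopping where the quoted loop would `return 0`. The demo buffer is assumed to be padded past its terminator so the tracing itself stays in bounds.

```c
#include <assert.h>

/* Hypothetical tracer for the backward loop quoted above: records in
   `order` each index read for a lead byte announcing `expected`
   continuation bytes, and stops on the first non-continuation byte
   (where the real loop would return 0). Returns how many bytes were read. */
static int trace_backward(const unsigned char *s, int expected, int order[])
{
    int n = 0;
    assert(expected >= 1 && expected <= 3);
    for (; expected; expected--) {
        order[n++] = expected;              /* s[expected] is read here */
        if (s[expected] < 0x80 || s[expected] >= 0xC0)
            break;
    }
    return n;
}

/* Same check in forward order: s[1] is read first, so a NUL right after
   the lead byte ends the scan before anything beyond it is touched. */
static int trace_forward(const unsigned char *s, int expected, int order[])
{
    int n = 0;
    assert(expected >= 1 && expected <= 3);
    for (int i = 1; i <= expected; i++) {
        order[n++] = i;                     /* s[i] is read here */
        if (s[i] < 0x80 || s[i] >= 0xC0)
            break;
    }
    return n;
}
```

For the padded buffer `{0xEA, 0x00, 0x41, 0x00}` (the logical string "\xEA"), `trace_backward(buf, 2, order)` records only index 2, which lies past the terminator at index 1, while `trace_forward(buf, 2, order)` records only index 1.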
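When a multi-byte UTF-8 sequence is truncated, for example a 3-byte lead byte \xEA (so expected = 2) followed immediately by the null terminator, the backward loop first reads s[2], which lies past the terminator at s[1] and thus past the end of the valid data, before it ever checks s[1]. Checking in forward order would stop at s[1], because a null byte is not a valid continuation byte.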
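One way to avoid the out-of-bounds read is to check the continuation bytes in forward order. The sketch below is a hypothetical standalone `valid_utf8_forward()`, not the actual CPython patch (which this issue does not show); it assumes the same lead-byte ranges as the classic structural check and validates only lead/continuation structure, not overlong forms or surrogates. The hypothetical `check_truncated_heap_case()` mirrors the fuzz input shape: a heap buffer holding exactly 0xEA and its terminator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical forward-order check. `s` must be NUL-terminated. Returns the
   sequence length (1-4) if s starts with a lead byte followed by the right
   number of continuation bytes (0x80..0xBF), otherwise 0. */
static int valid_utf8_forward(const unsigned char *s)
{
    int expected;
    assert(s != NULL);
    if (s[0] < 0x80)
        return 1;               /* single-byte code (including the NUL) */
    if (s[0] < 0xC0)
        return 0;               /* stray continuation byte */
    if (s[0] < 0xE0)
        expected = 1;
    else if (s[0] < 0xF0)
        expected = 2;
    else if (s[0] < 0xF8)
        expected = 3;
    else
        return 0;
    /* Check s[1], s[2], ... in order. The NUL terminator is not a
       continuation byte, so a truncated sequence returns 0 at the
       terminator and never reads beyond it. */
    for (int i = 1; i <= expected; i++)
        if (s[i] < 0x80 || s[i] >= 0xC0)
            return 0;
    return expected + 1;
}

/* Allocates exactly two bytes, 0xEA and '\0'. A backward loop would read
   buf[2], one byte past the allocation; the forward loop stops at buf[1]. */
static int check_truncated_heap_case(void)
{
    unsigned char *buf = malloc(2);
    if (buf == NULL)
        return -1;
    memcpy(buf, "\xEA", 2);     /* 0xEA followed by the terminator */
    int result = valid_utf8_forward(buf);
    free(buf);
    return result;
}
```

With this ordering, `check_truncated_heap_case()` returns 0, and a complete sequence such as "\xEA\xB0\x80" still returns 3.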
This is not a security-critical issue.
CPython versions tested on:
CPython main branch
Operating systems tested on:
No response
Linked PRs