Fix printing of emoji on Windows when stdout is redirected #3374
Thanks for your contribution!
Monkeypatching `sys.stdout` feels quite risky though. A few questions:
- What will happen on Windows terminals that use an encoding incompatible with UTF-8? I'm worried we'll get mojibake in that case.
- Could this be fixed in `click` instead? We use `click.echo` to output all emoji, and `click` is supposed to take care of output encoding for us.
- The `pycln` issue reported getting `UnicodeEncodeError`, while the OP of the Black issue saw literal `\u1234` output instead. Do you know why that is?
This isn't possible on "modern" Windows terminals (by which I mean Windows 10+; even the built-in CMD.exe understands UTF-8). Also, from Python 3.7+ the output is UTF-8 when stdout is connected to a TTY, so this disparity only occurs when not in a TTY. So in short, I think the risk of mojibake is tiny. And whatever risk there is already applies without this change, where you'd get an encoding error on trying to convert the extended Unicode emoji character into cp1252. One option here is to limit this to Windows 10+. That should remove the risk entirely.
It can, but for some reason it felt like a more disruptive change to make there rather than in a single project (Black) that we already know is expected to print UTF-8/high Unicode.
I suspect the Unicode escape has something to do with how the output is captured by pre-commit. I can dig into that more if you'd like confirmation.
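For readers who want to see the two symptoms side by side, they can be reproduced on any platform by encoding a character explicitly. This is only a minimal sketch: it assumes cp1252 as the legacy code page and uses U+2728 as a stand-in emoji, and the helper names are made up for illustration. Whether pre-commit's capture path really goes through an escaping error handler is not confirmed in this thread.

```python
# Sketch: why an emoji can either fail or turn into a literal escape.
# SPARKLES, can_encode and encode_escaped are illustrative names only.
SPARKLES = "\u2728"  # stand-in emoji


def can_encode(text: str, encoding: str = "cp1252") -> bool:
    """Return True if text survives the default 'strict' error handler."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def encode_escaped(text: str, encoding: str = "cp1252") -> bytes:
    """Encode with 'backslashreplace': unencodable characters become escapes."""
    return text.encode(encoding, errors="backslashreplace")


if __name__ == "__main__":
    print("cp1252 strict ok:", can_encode(SPARKLES))
    print("utf-8 strict ok:", can_encode(SPARKLES, "utf-8"))
    print("cp1252 escaped:", encode_escaped(SPARKLES))
```

A strict handler would explain the `UnicodeEncodeError` report, and an escaping handler would explain literal `\uXXXX` text, but which one applies depends on how the stream was opened.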
This is funny to even mention, but since Black works on Python 3.7+, it will run on Windows 7. Python 3.9 and newer don't support Windows 7 but still support So, while decreasingly likely, it is certainly possible for a user of an older Windows to run Black. I would add the version check to avoid regressions. Skipping an emoji and showing edit: turns out Python's requirements are Visual Studio 2017 requirements, which means no Windows Vista and Windows 8 for you. Windows 7 and 8.1 are fine.
So adding a
>>> platform.uname()
uname_result(system='Windows', node='sinope', release='10', version='10.0.22621', machine='AMD64')
>>> platform.release()
'10'
Ah, I've done a bit of digging and technically it's not even all versions of Windows 10, but only those since the Creators update, build 1809, so I'll make the check even more detailed. (And I've checked that this will still work on GitHub Actions too. On Server 2022 it returns
if (
    "pytest" not in sys.modules
    and platform.system() == "Windows"
    and tuple(map(int, platform.version().split("."))) >= (10, 0, 1809)
Can't decide if I love or hate that last line. Advantages:
- it's correct (as it uses `int` comparisons);
- it performs well (as the `and` short circuits for non-Windows users);
- fits in a single line.
The disadvantage is that it reads pretty clumsily ("make a tuple out of an int map of the string that `platform.version()` returns split by the period... and then compare it to (10, 0, 1809)"). Especially that int map bit.
But I guess the advantages outweigh my personal preference here.
Pretty much sums up my feelings when writing it!
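If readability ever outweighs the one-liner, the same comparison could be split into named helpers. The following is a sketch only, not what the PR does: `parse_version` and `is_supported_windows` are hypothetical names, and the `(10, 0, 1809)` threshold is simply copied from the diff under review.

```python
import platform
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '10.0.22621' into a tuple of ints.

    Raises ValueError if any component is not a plain integer.
    """
    return tuple(int(part) for part in version.split("."))


def is_supported_windows(minimum: Tuple[int, ...] = (10, 0, 1809)) -> bool:
    """Hypothetical helper: True only on Windows at or above `minimum`."""
    if platform.system() != "Windows":
        # Short-circuit for non-Windows users, like the `and` chain in the diff.
        return False
    return parse_version(platform.version()) >= minimum


if __name__ == "__main__":
    # Integer tuples compare numerically; raw strings compare character by
    # character, which would rank '10.0.9' above '10.0.22621'.
    print(parse_version("10.0.22621") >= (10, 0, 1809))
    print("10.0.9" > "10.0.22621")
    print(is_supported_windows())
```

The cost is two extra functions for a single call site, so keeping the inline version is a reasonable choice too.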
This is most apparent when black is run via pre-commit which does `subprocess.Popen(stdout=PIPE)`. One option around this would be to instruct users to set `PYTHONUTF8=1`, but that is not a very good approach, as basically everything supports UTF-8 these days. This fix was inspired by pycln and hadialqattan/pycln#54
On older versions of Windows, cmd.exe can't display UTF-8, so this would likely result in mojibake. By limiting it to Windows 10 we _know_ that displaying UTF-8 will work.
It _might_ work on earlier versions, but this is where `CreatePseudoConsole` and the ConHost overhaul were released, which definitely has universal UTF-8 support. See https://learn.microsoft.com/en-us/windows/console/createpseudoconsole
Co-authored-by: Łukasz Langa <[email protected]>
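The body of the patch is not visible in this thread, so the following is only a sketch of one way a text stream can be switched to UTF-8, assuming `io.TextIOWrapper.reconfigure` (Python 3.7+) as the mechanism. `force_utf8` and `demo` are hypothetical names, and the demo uses an in-memory stream rather than the real `sys.stdout`.

```python
import io
from typing import TextIO


def force_utf8(stream: TextIO) -> bool:
    """Switch `stream` to UTF-8 if it supports reconfigure(); report success."""
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is None:
        # e.g. io.StringIO or a replacement object installed by a test runner
        return False
    reconfigure(encoding="utf-8")
    return True


def demo() -> bytes:
    """Write a stand-in emoji through a stream that starts out as cp1252."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding="cp1252")
    force_utf8(stream)
    stream.write("\u2728")
    stream.flush()
    return raw.getvalue()


if __name__ == "__main__":
    print(demo())
```

Guarding on the presence of `reconfigure` keeps the helper harmless when the stream has already been replaced by something that is not a `TextIOWrapper`.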
Any other comments or changes requested @ambv @JelleZijlstra?
Thank you! I appreciate all of your in-depth investigation. I don't understand what's wrong here myself, but this seems safe, having read your comments.
I'll reach out to the click folks tomorrow, but I agree it's very possible click won't fix this for us, as it (seems to) require monkeypatching the standard streams.
Sorry for taking so long to get back to you!
@@ -1411,6 +1411,22 @@ def patch_click() -> None:


def patched_main() -> None:
    #: Fixes errors with emoji in Windows terminals when output is redirected
#: Fixes errors with emoji in Windows terminals when output is redirected
#: Fixes errors with emoji in Windows terminals when output is redirected
Coming back to this, I'm not enthusiastic about including this in Black:
If we don't accept this (and I entirely understand why), is there perhaps a place in the docs where I could document the workaround for users (setting
Mentioning that workaround in the docs seems fine.
One comment on the "We're monkeypatching the standard library, always a risky thing to do" -- if that is the only concern then I could replace the
This is most apparent when black is run via pre-commit, which does `subprocess.Popen(stdout=PIPE)`.
One option around this would be to instruct users to set `PYTHONUTF8=1`, but that is not a very scalable approach, as basically everything supports UTF-8 these days.
This fix was inspired by pycln and hadialqattan/pycln#54
Fixes #3156
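To make the `PYTHONUTF8=1` workaround concrete, here is a small sketch of a parent process that captures a child's output through a pipe, roughly the situation described for pre-commit. The child script, the stand-in emoji, and the `run_with_utf8_mode` helper are assumptions for illustration, and `PYTHONUTF8` requires Python 3.7 or newer.

```python
import os
import subprocess
import sys

# Child program: prints a stand-in emoji (U+2728) to its redirected stdout.
CHILD = "print('\\u2728')"


def run_with_utf8_mode(code: str) -> bytes:
    """Run `code` in a child interpreter with PYTHONUTF8=1 and capture stdout."""
    env = dict(os.environ)
    env["PYTHONUTF8"] = "1"
    # PYTHONIOENCODING would override the UTF-8 mode stream settings, so drop it.
    env.pop("PYTHONIOENCODING", None)
    completed = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        env=env,
        check=True,
    )
    return completed.stdout


if __name__ == "__main__":
    # The raw bytes should be the UTF-8 encoding of the emoji plus a newline.
    print(run_with_utf8_mode(CHILD))
```

The drawback noted above still applies: every user, or every tool that spawns Black, has to remember to set the variable.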