Fix printing of emoji on Windows when stdout is redirected #3374

ashb · 2022-11-03T21:25:11Z

This is most apparent when black is run via pre-commit which does
subprocess.Popen(stdout=PIPE).

One option around this would be to instruct users to set PYTHONUTF8=1,
but that is not a very scalable approach, as basically everything supports
UTF-8 these days.

This fix was inspired by pycln and hadialqattan/pycln#54

Fixes #3156
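For readers unfamiliar with the technique, here is a minimal sketch of forcing a text stream to UTF-8. It is not the actual diff of this PR: `force_utf8` is a hypothetical helper, and the in-memory stream stands in for a redirected `sys.stdout` whose default encoding would be cp1252.

```python
import io


def force_utf8(stream):
    """Switch a text stream to UTF-8 if it supports reconfigure() (Python 3.7+).

    Hypothetical helper for illustration; returns True if the stream was changed.
    """
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is None:
        return False
    reconfigure(encoding="utf-8")
    return True


# Stand-in for a piped stdout on a machine whose locale encoding is cp1252.
raw = io.BytesIO()
piped_stdout = io.TextIOWrapper(raw, encoding="cp1252")

changed = force_utf8(piped_stdout)
piped_stdout.write("All done! ✨ 🍰 ✨\n")
piped_stdout.flush()
output_bytes = raw.getvalue()  # UTF-8 encoded bytes, no encoding error
```

Without the `force_utf8` call, the `write` would fail because cp1252 has no mapping for these characters.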

Checklist - did you ...

Add an entry in CHANGES.md if necessary?

github-actions · 2022-11-03T21:48:50Z

diff-shades reports zero changes comparing this PR (de02031) to main (5d0d593).

JelleZijlstra

Thanks for your contribution!

Monkeypatching sys.stdout feels quite risky though. A few questions:

What will happen on Windows terminals that use an encoding incompatible with UTF-8? I'm worried we'll get mojibake in that case.
Could this be fixed in click instead? We use click.echo to output all emoji, and click is supposed to take care of output encoding for us.
The pycln issue reported getting UnicodeEncodeError, while the OP of the Black issue saw literal \u1234 output instead. Do you know why that is?

ashb · 2022-11-04T13:36:13Z

> Thanks for your contribution!
>
> Monkeypatching sys.stdout feels quite risky though. A few questions:
>
> What will happen on Windows terminals that use an encoding incompatible with UTF-8? I'm worried we'll get mojibake in that case.

This isn't possible on "modern" Windows terminals (by which I mean Windows 10+; even the built-in CMD.exe understands UTF-8).

Also, from Python 3.7+ the output is UTF-8 when stdout is connected to a TTY, so this disparity only occurs when not in a TTY.

So in short I think the risk of mojibake is tiny. And whatever risk there is already applies without this change, where you'd get an encoding error on trying to convert the extended Unicode emoji character into cp1252.

One option I could do here is to limit this to Windows 10+. That should remove the risk entirely.

> Could this be fixed in click instead? We use click.echo to output all emoji, and click is supposed to take care of output encoding for us.

It can, but for some reason it felt like a more disruptive change to make there rather than just to a single project (black) that we already know expects to be able to print UTF-8/high Unicode.

> The pycln issue reported getting UnicodeEncodeError, while the OP of the Black issue saw literal \u1234 output instead. Do you know why that is?

I suspect the Unicode escape is something to do with how the output is captured by pre-commit. I can dig into that more if you'd like confirmation
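One plausible explanation, offered as an assumption rather than a confirmed diagnosis: Python's text streams differ in their default error handlers. `sys.stdout` normally uses `strict`, which raises `UnicodeEncodeError`, while `sys.stderr` uses `backslashreplace`, which writes escape sequences. The sketch below reproduces both outcomes with an in-memory cp1252 stream; `encode_through` is a hypothetical helper.

```python
import io


def encode_through(text, encoding, errors):
    """Write text through a TextIOWrapper and return the bytes produced."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, errors=errors, newline="")
    stream.write(text)
    stream.flush()
    return raw.getvalue()


# errors="strict" mirrors sys.stdout's default: unencodable characters raise.
try:
    encode_through("✨", "cp1252", "strict")
    strict_result = "no error"
except UnicodeEncodeError:
    strict_result = "UnicodeEncodeError"

# errors="backslashreplace" mirrors sys.stderr's default: escapes are written.
escaped = encode_through("✨ 🍰", "cp1252", "backslashreplace")
```

Here `strict_result` ends up as `"UnicodeEncodeError"` and `escaped` holds the literal escapes `\u2728 \U0001f370`, so which symptom a tool shows may simply depend on which stream it writes to.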

ambv · 2022-11-04T14:07:19Z

This is funny to even mention but since Black works on Python 3.7+, it will run on Windows 7, 8, 8.1, 10, and 11. ~~Maybe even Vista 😂~~

Python 3.9 and newer don't support Windows 7 but still support 8.1+.

So, while decreasingly likely, it is certainly possible for a user of an older Windows to run Black. I would add the version check to avoid regressions. Skipping an emoji and showing \Uxxx is less disruptive than mojibake.

edit: turns out Python's requirements are Visual Studio 2017 requirements, which means no Windows Vista and Windows 8 for you. Windows 7 and 8.1 are fine.

ashb · 2022-11-04T14:31:41Z

So adding an int(platform.release()) >= 10 check sounds like a safer way of doing this then?

>>> platform.uname()
uname_result(system='Windows', node='sinope', release='10', version='10.0.22621', machine='AMD64')
>>> platform.release()
'10'

ashb · 2022-11-04T15:01:56Z

Ah, I've done a bit of digging and technically it's not even all versions of Windows 10, but only those since the Creators update, build 1809, so I'll make the check even more detailed.

(And I've checked that this will still work on GitHub Actions too. On Server 2022 it returns 10.0.20348 for the version.)

ambv · 2022-11-05T11:55:54Z

src/black/__init__.py

+    if (
+        "pytest" not in sys.modules
+        and platform.system() == "Windows"
+        and tuple(map(int, platform.version().split("."))) >= (10, 0, 1809)


Can't decide if I love or hate that last line. Advantages:

it's correct (as it uses int comparisons);

it performs well (as the and short circuits for non-Windows users);

fits in a single line.

The disadvantage is that it reads pretty clumsily ("make a tuple out of an int map of the string that platform.version() returns split by the period... and then compare it to (10, 0, 1809)"). Especially that int map bit.

But I guess the advantages outweigh my personal preference here.

ashb

Pretty much sums up my feelings when writing it!
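For comparison, the same comparison can be spelled out in a small function at the cost of a few more lines. This is only a sketch: `windows_version_at_least` is a hypothetical name, not something in Black, and the sample strings are the `platform.version()` values quoted in this thread plus a Windows 7 style string.

```python
def windows_version_at_least(version, minimum):
    """Return True if a dotted version string is >= the minimum tuple.

    Hypothetical helper; `version` is expected to look like the
    platform.version() output quoted above, e.g. "10.0.22621".
    """
    try:
        parts = tuple(int(part) for part in version.split("."))
    except ValueError:
        return False  # unexpected format: treat as "not new enough"
    return parts >= minimum


MINIMUM = (10, 0, 1809)  # threshold taken from the diff under review

checks = {
    "10.0.22621": windows_version_at_least("10.0.22621", MINIMUM),
    "10.0.20348": windows_version_at_least("10.0.20348", MINIMUM),
    "6.1.7601": windows_version_at_least("6.1.7601", MINIMUM),
}
```

One caveat worth checking: the third field in the outputs quoted in this thread (22621, 20348) is a build number, so the 1809 threshold is being compared against build numbers rather than release names.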

This is most apparent when black is run via pre-commit which does `subprocess.Popen(stdout=PIPE)`. One option around this would be to instruct users to set `PYTHONUTF8=1`, but that is not a very good approach, as basically everything supports UTF-8 these days. This fix was inspired by pycln and hadialqattan/pycln#54

On older versions of Windows the cmd.exe can't display UTF-8, so this would likely result in mojibake. By limiting it to Windows 10 we _know_ that displaying UTF-8 will work.

It _might_ work on earlier versions, but this is where `CreatePseudoConsole` and the ConHost overhaul were released, which definitely have universal UTF-8 support. See https://learn.microsoft.com/en-us/windows/console/createpseudoconsole

Co-authored-by: Łukasz Langa

ashb · 2022-11-15T17:00:07Z

Any other comments or changes requested @ambv @JelleZijlstra?

ichard26

Thank you! I appreciate all of your in-depth investigation. I don't understand what's wrong here myself, but this seems safe having read your comments.

I'll reach out to the click folks tomorrow, but I agree it's very possible click won't fix this for us as it (seems to) require monkeypatching the standard streams.

Sorry for taking so long to get back to you!

ichard26 · 2023-01-03T05:08:08Z

src/black/__init__.py

@@ -1411,6 +1411,22 @@ def patch_click() -> None:


 def patched_main() -> None:
+    #: Fixes errors with emoji in  Windows terminals when output is redirected


Suggested change

-    #: Fixes errors with emoji in  Windows terminals when output is redirected
+    #: Fixes errors with emoji in Windows terminals when output is redirected

JelleZijlstra · 2023-01-03T05:44:50Z

Coming back to this, I'm not enthusiastic about including this in Black:

We're monkeypatching the standard library, always a risky thing to do, and we're doing it only in a particular set of obscure conditions (when pytest is absent and we're on a particular version of Windows). That feels like a future maintainability nightmare.
Printing unicode isn't a core part of what Black is about. We're using Click to create Unicode-compatible output, so I feel like it should be Click's job to fix Unicode issues here. If Click doesn't think it's worth fixing, maybe that's a hint that we shouldn't either.
The bug this fixes isn't particularly bad. Emojis get printed as "\u1234" sequences, which is ugly but doesn't make Black unusable.

ashb · 2023-01-04T15:18:32Z

If we don't accept this (and I entirely understand why), is there perhaps a place in the docs where I could document the workaround for users (setting PYTHONUTF8=1)?

JelleZijlstra · 2023-01-04T15:31:01Z

Mentioning that workaround in the docs seems fine.
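As a rough illustration of what the workaround changes, the sketch below starts a child Python with piped stdout and `PYTHONUTF8=1` set, then reports the child's `sys.stdout.encoding`. `child_stdout_encoding` is a hypothetical helper; removing `PYTHONIOENCODING` from the environment is an assumption made only to keep the result predictable.

```python
import os
import subprocess
import sys


def child_stdout_encoding(extra_env):
    """Run a child Python with piped stdout and report its sys.stdout.encoding."""
    env = {**os.environ, **extra_env}
    env.pop("PYTHONIOENCODING", None)  # avoid an unrelated override
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        capture_output=True,  # stdout is a pipe, as under pre-commit
        text=True,
        check=True,
    )
    return result.stdout.strip()


with_utf8_mode = child_stdout_encoding({"PYTHONUTF8": "1"})
```

With UTF-8 mode enabled the child should report a UTF-8 encoding even though its stdout is a pipe rather than a terminal.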

ashb · 2023-01-05T12:48:02Z

One comment on the "We're monkeypatching the standard library, always a risky thing to do" -- if that is the only concern then I could replace the sys.stdout = ... with contextlib.redirect_stdout
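A minimal sketch of that alternative, assuming a UTF-8 `TextIOWrapper` around the underlying binary stream; `utf8_stdout` is a hypothetical name and the `BytesIO` stands in for `sys.stdout.buffer`. Note that `contextlib.redirect_stdout` still swaps `sys.stdout` for the duration of the block, it just restores the original on exit.

```python
import contextlib
import io


@contextlib.contextmanager
def utf8_stdout(binary_stream):
    """Temporarily route print() output through a UTF-8 text wrapper."""
    wrapper = io.TextIOWrapper(binary_stream, encoding="utf-8", line_buffering=True)
    try:
        with contextlib.redirect_stdout(wrapper):
            yield wrapper
    finally:
        wrapper.flush()
        wrapper.detach()  # leave the underlying binary stream open


# Stand-in for sys.stdout.buffer when output is piped.
raw = io.BytesIO()
with utf8_stdout(raw):
    print("All done! ✨ 🍰 ✨")

captured = raw.getvalue().decode("utf-8").strip()
```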

ashb mentioned this pull request Nov 3, 2022

No Emojis on PowerShell #3156

Open

ashb force-pushed the force-utf8-terminals branch from 8407f29 to e12fb7b Compare November 4, 2022 09:34

JelleZijlstra reviewed Nov 4, 2022

ambv reviewed Nov 5, 2022

ambv reviewed Nov 5, 2022

ashb and others added 4 commits November 15, 2022 16:59

Only force utf-8 stdout/err on Windows 10+

08cc65d

On older versions of Windows the cmd.exe can't display UTF-8, so this would likely result in mojibake. By limiting it to Windows 10 we _know_ that displaying UTF-8 will work.

Apply suggestions from code review

bd81114

Co-authored-by: Łukasz Langa

ashb force-pushed the force-utf8-terminals branch from 6cbe299 to bd81114 Compare November 15, 2022 17:00

ashb commented Nov 19, 2022

ashb added 2 commits November 19, 2022 11:00

Update CHANGES.md

3cf2c67

Merge branch 'main' into force-utf8-terminals

de02031

ichard26 approved these changes Jan 3, 2023

ichard26 added this to the Release 23.1.0 milestone Jan 3, 2023

ichard26 removed this from the Release 23.1.0 milestone Jan 31, 2023
