Translate windows-gotchas #118

Merged
merged 13 commits into from
May 24, 2018
12 changes: 6 additions & 6 deletions preprocessed-site/posts/2017/windows-gotchas.md
@@ -33,7 +33,7 @@ tags:

GHCがファイルを読み書きする時に使う[`Handle`](https://www.stackage.org/haddock/lts-10.0/base-4.10.1.0/System-IO.html#t:Handle)というオブジェクトには、文字コードの情報が含まれています。

これはRubyの[`IO`](https://docs.ruby-lang.org/ja/latest/class/IO.html)やPerlのファイルハンドラーにあるような仕組みと大体似ていて、`Handle`といったデータの「入り口」を表すオブジェクトに文字コードを紐付けることで、外から入ってくる文字列の文字コードを確実に内部の統一された文字コードに変換する変換できるようにしてくれます
これはRubyの[`IO`](https://docs.ruby-lang.org/ja/latest/class/IO.html)やPerlのファイルハンドラーにあるような仕組みと大体似ていて、`Handle`といったデータの「入り口」を表すオブジェクトに文字コードを紐付けることで、外から入ってくる文字列の文字コードを確実に内部の統一された文字コードに変換してくれます
Haskellの`Char`型の場合はUTF-32(この場合その言い方でよかったっけ?)のはずです。

この`Handle`に紐付ける文字コード、当然のごとくデフォルトではOSのロケール設定に従って設定されるようになってまして、日本語版のWindowsではそう、Windows-31J(またの名をCP932)ですね。
@@ -75,7 +75,7 @@ UTF-8とWindows-31Jは全然違う体系の文字コードなので、UTF-8な
/c/Windows/System32/chcp.com 932
```

### それでもダメな場合、あるいはライブラリーや開発者として出くわした場合
### それでもダメな場合、あるいはライブラリーなどの開発者として出くわした場合

残念ながら、`chcp 65001`してもこのエラーが消えないことはあります[^eta-20127]。
私の推測なんですが、どうも`chcp 65001`は`chcp 65001`したコマンドプロンプト(とかbash)の孫プロセス(つまり、あなたが入力したコマンドの子プロセス)には届かないことがあるようです。
@@ -128,7 +128,7 @@ hSetEncoding stdout $ mkLocaleEncoding TransliterateCodingFailure
一つ一つ解説しましょう。
まず`hSetEncoding`は先ほども触れたとおり指定した`Handle`の文字コードを変更する関数です。
そして`stdout`は名前の通り標準出力を表す`Handle`です。
最後の`mkLocaleEncoding TransliterateCodingFailure`ですが、これはWindowsで設定された文字コード(この場合`chcp`された文字コードと同じ)に対して、「もし(Unicodeから、あるいはUnicodeに)変換できない文字があった場合、エラーにせず、それっぽい文字に変換する」という設定にすることができます
最後の`mkLocaleEncoding TransliterateCodingFailure`ですが、これはWindowsで設定された文字コード(`chcp`された文字コードと同じ)を作って、「もし(Unicodeから、あるいはUnicodeに)変換できない文字があった場合、エラーにせず、それっぽい文字に変換する」という設定で返す、という意味です

結果、`chcp 932`な状態でGHCのエラーメッセージにも使われる

@@ -144,9 +144,9 @@ hSetEncoding stdout $ mkLocaleEncoding TransliterateCodingFailure
? No instance for (Transformation Nagisa CardCommune_Mepple)
```

のように、クエスチョンマークに変換されるようになります。そう、WindowsでGHCをお使いの方は一度は目にした「?」ではないでしょうか😅
つまりGHCはデフォルトで`mkLocaleEncoding TransliterateCodingFailure`しているものと推測されます。
いずれにせよ、エラーが起きないだけマシですね
のように、クエスチョンマークに変換されるようになります。そう、日本語のWindowsでGHCをお使いの方は一度は目にした「?」ではないでしょうか😅
つまりGHCはデフォルトで`hSetEncoding stderr $ mkLocaleEncoding TransliterateCodingFailure`しているものと推測されます。
いずれにせよ、エラーでプログラムが異常終了しないだけマシですね

更に補足すると、GHCの文字コードについてより詳しい情報は、[GHC.IO.Encodingのドキュメント](https://hackage.haskell.org/package/base-4.10.1.0/docs/GHC-IO-Encoding.html)をご覧ください。

201 changes: 201 additions & 0 deletions preprocessed-site/posts/2018/windows-gotchas-en.md
@@ -0,0 +1,201 @@
---
title: Errors frequently encountered when dealing with Haskell on Windows, and their workarounds
headingBackgroundImage: ../../img/background.png
headingDivClass: post-heading
subHeading: Quick-and-dirty checklist
author: Yuji Yamamoto
postedBy: <a href="http://the.igreque.info/">Yuji Yamamoto(@igrep)</a>
date: May 25, 2018
tags: Windows
...
---

This is the English version of [WindowsでHaskellを扱う時によく遭遇するエラーと対処法](https://haskell.jp/blog/posts/2017/windows-gotchas.html).
The original article is the 4th article of [Haskell (その4) Advent Calendar 2017 (Japanese)](https://qiita.com/advent-calendar/2017/haskell4).


What I'm going to tell you is summarized in [just one tweet (originally in Japanese)](https://twitter.com/igrep/status/938056578934042626):

> What I've learned:
>
> - chcp65001 if 'Invalid character'
> - rebuild if 'Permission Denied'
> - Don't mix Japanese characters in file paths.
> - Some libraries in C are available, and others are not.
>
> Perhaps they're helpful in other languages.

Let me add more details.

# chcp 65001 if "Invalid character"

You have probably encountered this error frequently, especially if you don't know how to avoid or fix it.
Oh, here it is again, this time while building a site with Hakyll!


```
> stack exec -- site rebuild
...
[ERROR] preprocessed-site\posts/2017/01-first.md: hGetContents: invalid argument (invalid byte sequence)
```

The [`Handle`](https://www.stackage.org/haddock/lts-10.0/base-4.10.1.0/System-IO.html#t:Handle) object, which GHC uses to read and write files, carries its own character encoding.


This resembles Ruby's [`IO`](https://ruby-doc.org/core-2.5.0/IO.html) and Perl's file handles.
Objects like these represent the "gateway" for data, and assigning a character encoding to them lets the runtime convert incoming data, so that the program only ever handles strings in one consistent internal encoding.
For Haskell's `Char` type, that internal encoding is UTF-32 (is this the right name in this case?).


The character encoding assigned to a `Handle` by default depends on the locale settings of the OS: on Japanese Windows, that is Windows-31J (a.k.a. CP932).
But it's almost 2018 (as of writing the original article). Most of the files you create should be in UTF-8, unless you write programs in notepad.exe[^notepad].
Reading a UTF-8 file as if it were a Windows-31J file doesn't work, because the two are very different encoding systems.
The `invalid byte sequence` error shown at the head of this section is caused by that inconsistency.
Remember that this kind of error often occurs when reading or writing stdin/stdout, as well as plain files.
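
To check which encoding GHC has actually picked, you can ask the handles directly. The following is only a minimal sketch and not part of the original article: `describeEncoding` and `reportEncodings` are names made up for this example, while `hGetEncoding` and `localeEncoding` are exported by `System.IO`.

```haskell
import System.IO (Handle, hGetEncoding, localeEncoding, stdin, stdout)

-- Hypothetical helper: render a Handle's encoding as text.
-- 'hGetEncoding' returns Nothing for a handle in binary mode.
describeEncoding :: Handle -> IO String
describeEncoding h = maybe "binary (no encoding)" show <$> hGetEncoding h

-- Print the locale's default encoding and the encodings of stdin/stdout.
reportEncodings :: IO ()
reportEncodings = do
  putStrLn ("locale: " ++ show localeEncoding)
  describeEncoding stdin  >>= putStrLn . ("stdin:  " ++)
  describeEncoding stdout >>= putStrLn . ("stdout: " ++)
```

Calling `reportEncodings` at the start of a program shows what the handles will assume. On a Japanese Windows console I would expect it to report the CP932 code page rather than UTF-8, though I have not checked the exact spelling of the name it prints.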


[^notepad]: Translator's note: In the Japanese locale, notepad.exe saves files in Windows-31J. This is going to change (to UTF-8) in a future release of Windows 10.

## Workaround

### If you encounter as a user

In many cases you can avoid this kind of error by running the command below in advance.


```
> chcp 65001
> stack exec -- site rebuild
... Should work!
```

This command temporarily changes the character encoding of the current Command Prompt session.
The number `65001` is the code page identifier that Windows assigns to UTF-8.
To roll it back, run `chcp 932`.


```
> chcp 932
```

It seems that the "932" of "CP932" is the same "932" entered here!


The `chcp` command is also available in MSYS2's bash (which surprises me a little. How does it work?).
But note that `chcp` lives in `C:\Windows\System32\`, which MSYS2 users usually don't want to include in their `PATH`.
That directory contains many incompatible commands whose names conflict with the tools loved by Unix people (e.g. `find.exe`)!


So I drop `C:\Windows\System32\` from `PATH` when using MSYS2.
If you've done the same, run `chcp` by its full path:


```
/c/Windows/System32/chcp.com 932
```

### If it still doesn't work, or if you're the developer of a library etc.

Unfortunately, the error can persist even after running `chcp 65001`[^eta-20127].
My guess is that `chcp 65001` sometimes doesn't reach the grandchild processes of the Command Prompt (or bash etc.) in which `chcp` was run (i.e. the child processes of the command you enter).

[^eta-20127]: By the way, when I once tried to build the compiler of [Eta](http://eta-lang.org/), (as far as I remember) `chcp 65001` didn't fix the problem, but `chcp 20127` did.
    Since `chcp 20127` switches to US-ASCII, I suspect the local environment of Eta's developers is US-ASCII...

If the error still happens, you can either report it to the developer or fix it yourself!
When reporting, asking the developer to run the program after `chcp 932` could help them reproduce the bug (sorry, I've never tried it).
When fixing it yourself, perhaps the best and most reliable way is to switch the character encoding of the `Handle` object.


This problem is caused by the inconsistency between the `Handle`'s character encoding and the encoding of the bytes actually transferred. So switching to the proper encoding should fix it.
If the error happens when reading or writing an ordinary UTF-8 file via the `Handle`, code like the following can avoid it:


```haskell
import System.IO (hSetEncoding)
import GHC.IO.Encoding (utf8)

hSetEncoding handle utf8
```
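
For context, here is how that call might fit into complete read and write helpers. This is only a sketch under my own assumptions: the names `readFileUtf8` and `writeFileUtf8` are invented for this example, and `utf8` is imported from `System.IO`, which also exports it.

```haskell
import System.IO
  (IOMode(ReadMode, WriteMode), hGetContents, hPutStr, hSetEncoding, utf8, withFile)

-- Hypothetical helper: read a whole file as UTF-8,
-- whatever the locale's default encoding is.
readFileUtf8 :: FilePath -> IO String
readFileUtf8 path = withFile path ReadMode $ \handle -> do
  hSetEncoding handle utf8
  contents <- hGetContents handle
  -- hGetContents is lazy: force the whole string before withFile closes the handle.
  length contents `seq` return contents

-- Hypothetical helper: write a String to a file as UTF-8.
writeFileUtf8 :: FilePath -> String -> IO ()
writeFileUtf8 path text = withFile path WriteMode $ \handle -> do
  hSetEncoding handle utf8
  hPutStr handle text
```

The important point is that `hSetEncoding` runs before any data passes through the handle, so the locale default (Windows-31J on Japanese Windows) never gets a chance to decode or encode the bytes.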

As a bonus, I'll show you an example of how [I myself addressed a problem caused by the standard output (or standard error output), and fixed a bug in haddock](https://github.com/haskell/haddock/pull/566).
In short, pasting the code below before your program uses the `Handle` at least suppresses the error (copied from [this commit](https://github.com/haskell/haddock/pull/566/commits/855118ee45e323fd9b2ee32103c7ba3eb1fbe4f2)).


```haskell
{-# LANGUAGE CPP #-}

import System.IO (hSetEncoding, stdout)

#if defined(mingw32_HOST_OS)
import GHC.IO.Encoding.CodePage (mkLocaleEncoding)
import GHC.IO.Encoding.Failure (CodingFailureMode(TransliterateCodingFailure))
#endif

...

#if defined(mingw32_HOST_OS)
liftIO $ hSetEncoding stdout $ mkLocaleEncoding TransliterateCodingFailure
#endif
```

The CPP macros, needed to `import` modules that are only available on Windows, make this code hard to read, so let's strip it down to the essential line:


```haskell
hSetEncoding stdout $ mkLocaleEncoding TransliterateCodingFailure
```


Here are the details.
First of all, `hSetEncoding` is the function that changes a `Handle`'s character encoding, as I mentioned before.
Then `stdout` is, as its name suggests, the `Handle` for the standard output.
Finally, the function call `mkLocaleEncoding TransliterateCodingFailure` returns a character encoding object for the current character encoding of Windows (i.e. the one set by `chcp`), configured so that "if the `Handle` meets a character which can't be converted to or from Unicode, don't raise an error; convert it into some similar-looking character instead".
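
`mkLocaleEncoding` lives in a Windows-only module, which is why the haddock fix needs CPP. As a rough, portable approximation (an assumption of mine, not what the haddock commit does), `mkTextEncoding` from `System.IO` is documented to accept a `//TRANSLIT` suffix on an encoding name. The helper names `setTransliterating` and `setupConsole` below are made up for this sketch, and whether the suffix is honoured for a given encoding depends on the platform's conversion backend.

```haskell
import System.IO
  (Handle, hGetEncoding, hSetEncoding, mkTextEncoding, stderr, stdout)

-- Hypothetical helper: keep the handle's current encoding, but ask for
-- transliteration instead of an exception on unconvertible characters.
setTransliterating :: Handle -> IO ()
setTransliterating h = do
  current <- hGetEncoding h
  case current of
    Nothing  -> return ()  -- binary-mode handle: nothing to change
    Just enc -> do
      -- 'show enc' is the encoding's name; drop any existing "//..." suffix.
      let baseName = takeWhile (/= '/') (show enc)
      lenient <- mkTextEncoding (baseName ++ "//TRANSLIT")
      hSetEncoding h lenient

-- Apply the lenient setting to both console output handles.
setupConsole :: IO ()
setupConsole = mapM_ setTransliterating [stdout, stderr]
```

Calling `setupConsole` first thing in `main` is the intended use. Note that `mkTextEncoding` throws if the backend does not recognise the requested name, so treat this as a starting point rather than a drop-in replacement for the Windows-specific code.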

As a result of the `hSetEncoding` above, when the current character encoding is Windows-31J, the bullet character used in GHC's compilation errors:

```
↓This character
• No instance for (Transformation Nagisa CardCommune_Mepple)
```

is converted into


```
? No instance for (Transformation Nagisa CardCommune_Mepple)
```

a question mark. Yeah, this is the "?" that I bet most users of GHC on Japanese Windows have seen at least once 😅
This makes me guess that GHC executes `hSetEncoding stderr $ mkLocaleEncoding TransliterateCodingFailure` by default before printing out the compilation error.
Anyway, it's good that the program doesn't abort due to the error!


As a last note for this section: read [the documentation of GHC.IO.Encoding](https://hackage.haskell.org/package/base-4.10.1.0/docs/GHC-IO-Encoding.html) for the details of how GHC handles various character encodings.

# Rebuild if "Permission Denied"

I've made the first section too long for a "quick-and-dirty checklist", so I'll keep it short from here on.
We often encounter errors like "Permission Denied", "Directory not empty", and similar ones when running `stack build`, `ghc`, `elm-make`, and other commands written in Haskell.
To tell the truth, I'm not sure of the cause at all, but those errors disappear after running the same command several times.
The key is to repeat it many times. Don't give up after only one or two tries 😅
Turning off your antivirus software's scanning of the problematic directory, Dropbox's synchronisation, etc. might also fix such errors.
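
If you drive such commands from your own Haskell code, the "just run it again" advice can be automated. The helper below is a hypothetical sketch (the name `retrying` is mine, and it is not taken from any of the tools mentioned above): it reruns an `IO` action whenever it throws an `IOException`, up to a fixed number of extra attempts.

```haskell
import Control.Exception (IOException, try)

-- Hypothetical helper: run an action, retrying up to 'n' more times
-- whenever it throws an IOException; the last failure is rethrown.
retrying :: Int -> IO a -> IO a
retrying n action = do
  result <- try action
  case result of
    Right value -> return value
    Left err
      | n > 0     -> retrying (n - 1) action
      | otherwise -> ioError (err :: IOException)
```

For instance, `retrying 5 (callProcess "stack" ["build"])` would rerun the build up to five more times, assuming `callProcess` from the `process` package, which raises an `IOError` when the command exits with a failure code. The retry count is arbitrary, and this does nothing about the underlying cause.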


# Try hard to build libraries in C...

On Windows, installing packages that depend on libraries written in C (registered as `lib***` in your OS's package manager) is frequently troublesome.
This isn't specific to Haskell, though.


The fix depends on the case, so let me give you some examples as external links (sorry, all pages are written in Japanese!).


- HDBC-sqlite3:
    - [Windows版stackでもHDBC-sqlite3をビルドする - Qiita](https://qiita.com/igrep/items/d947ab871eb5b20b57e4)
    - [MSYS2でHDBC-sqlite3をコンパイル - 北海道苫小牧市出身の初老PGが書くブログ](http://hiratara.hatenadiary.jp/entry/2017/01/29/110100)
- [Haskell - Haskellにてstackでiconvパッケージを利用する方法【Windows環境】(102462)|teratail](https://teratail.com/questions/102462)

That's all!
Now, happy hacking in Haskell on Windows 10!! I don't know about WSL!🏁🏁🏁