evil-digraph: How to add a digraph for '🙂' character #1947

Open
ju-sh opened this issue Dec 25, 2024 · 2 comments

Comments

ju-sh commented Dec 25, 2024

Issue type

  • Question: How do I add a digraph for the '🙂' character? I was trying to use ':)' as the key sequence.

I think this is a 4-byte unicode character.
Is it possible to have digraphs for 4-byte unicode?

Environment

Emacs version: 27.1
Operating System: Debian 11
Evil version: 1.15.0
Evil installation type: MELPA
Graphical/Terminal: X
Tested in a make emacs session (see CONTRIBUTING.md): No

Reproduction steps

Add this to init file:

((?: ?\)) . ?\xd83dde42)

Expected behavior

Digraph should be loaded

Actual behavior

Error message: error: Hex character out of range: \xd83dde42...

Further notes

Not sure if I got the unicode hex for '🙂' wrong..

@ju-sh ju-sh changed the title evil-digraph: Add digraph for '🙂' character evil-digraph: How to add a digraph for '🙂' character Dec 25, 2024
Member

tomdl89 commented Dec 27, 2024

On my machine, that emoji looks to have a hex value of #x1f642 so this should work:

(push '((?: ?\)) . ?\x1f642) evil-digraphs-table-user)

but then so should just using the emoji literally in your init file:

(push '((?: ?\)) . ?🙂) evil-digraphs-table-user)

Member

tomdl89 commented Dec 27, 2024

ChatGPT explanation if you are interested:

In short, they were looking at the UTF-16 “surrogate pair” rather than the single Unicode code point.

Single Code Point

  • The “slightly smiling face” emoji is U+1F642 in Unicode.

UTF-16 Surrogate Pair

  • In UTF-16 encoding, characters outside the Basic Multilingual Plane (BMP) need two 16-bit code units.
  • For 🙂, the surrogate pair is 0xD83D 0xDE42.

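So, when someone writes the value as d83dde42, they are simply concatenating the two surrogate halves (D83D and DE42) into one hex number. The pair still encodes the same emoji in UTF-16, but the concatenated number is not a code point: it is far above the largest Unicode code point, U+10FFFF, which is consistent with the “Hex character out of range” error. The ?\x syntax in Emacs Lisp expects the code point itself, 1F642.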
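
To make the arithmetic concrete, here is a minimal sketch of how a surrogate pair maps back to its code point. The function name my/surrogates-to-codepoint is hypothetical (it is not part of Evil or Emacs); it only applies the standard UTF-16 decoding formula, and it assumes both arguments are valid surrogates.

```elisp
;; Hypothetical helper, not part of Evil or Emacs: combine a UTF-16
;; surrogate pair into the single code point it encodes.
;; Assumes HIGH is in #xD800..#xDBFF and LOW is in #xDC00..#xDFFF.
(defun my/surrogates-to-codepoint (high low)
  "Return the code point encoded by surrogate code units HIGH and LOW."
  (+ #x10000
     (ash (- high #xD800) 10)   ; upper 10 bits come from the high surrogate
     (- low #xDC00)))           ; lower 10 bits come from the low surrogate

(my/surrogates-to-codepoint #xD83D #xDE42)
;; => 128578, i.e. #x1f642
```

Evaluating the last form (for example with M-: or in the *scratch* buffer) should give the same number as evaluating ?\x1f642, which is the value the digraph entry needs.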
