fix: handle incomplete UTF-8 sequences and add test for reproduction #1166

monochromegane · 2024-09-24T05:47:07Z

I'd like to start by thanking you for releasing such an amazing TUI framework 🧋.
This pull request improves input handling by detecting incomplete UTF-8 sequences at the end of a read and completing them with an additional read.

Background

Currently, when a read is cut off in the middle of a multibyte UTF-8 character, the bytes of that character are reported as unknownInputByteMsg instead of tea.KeyMsg. As a result, the character is corrupted and cannot be input correctly. Fortunately, the first byte of a UTF-8 sequence tells us how long the sequence is, so we can determine whether more bytes are needed. We believe this can be resolved by invoking an additional read to complete the sequence.
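
To illustrate how the first byte reveals whether more bytes are needed, here is a minimal sketch. The helper name missingBytes is hypothetical and is not part of Bubble Tea or of this pull request's diff; it only inspects lead-byte bit patterns and does not reject overlong or out-of-range lead bytes.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// missingBytes reports how many more bytes are needed to complete the UTF-8
// sequence at the end of buf. It returns 0 when buf ends on a complete
// sequence or on a byte that cannot be a lead byte.
// Hypothetical helper for illustration only.
func missingBytes(buf []byte) int {
	// The lead byte of an incomplete sequence is at most utf8.UTFMax-1
	// positions from the end of the buffer.
	for i := len(buf) - 1; i >= 0 && i >= len(buf)-(utf8.UTFMax-1); i-- {
		b := buf[i]
		if !utf8.RuneStart(b) {
			continue // continuation byte (10xxxxxx): keep looking back
		}
		var want int
		switch {
		case b < 0x80: // 0xxxxxxx
			want = 1
		case b&0xE0 == 0xC0: // 110xxxxx
			want = 2
		case b&0xF0 == 0xE0: // 1110xxxx
			want = 3
		case b&0xF8 == 0xF0: // 11110xxx
			want = 4
		default:
			return 0 // not a valid lead byte pattern
		}
		if have := len(buf) - i; have < want {
			return want - have
		}
		return 0
	}
	return 0
}

func main() {
	// 一 (0xe4 0xb8 0x80) followed by only the lead byte of the next character.
	fmt.Println(missingBytes([]byte{0xe4, 0xb8, 0x80, 0xe4}))
}
```

For a buffer ending in a lone 0xe4, the helper reports that two more bytes are required, which is the situation where an additional read would be issued.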

Reproduction

This issue can occasionally be reproduced by repeatedly inputting multiple multibyte characters using the code below. My environment is macOS 14.6.1, go version go1.23.1 darwin/arm64, tmux 3.4.

package main

import (
	"fmt"
	"log"
	"strings"

	tea "github.com/charmbracelet/bubbletea"
)

type model struct {
	msgs []string
}

func (m model) Init() tea.Cmd { return nil }

func (m model) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
	m.msgs = append(m.msgs, fmt.Sprintf("%T %#v", msg, msg))
	switch msg := msg.(type) {
	case tea.KeyMsg:
		switch msg.Type {
		case tea.KeyCtrlC:
			return m, tea.Quit
		}
	}
	return m, nil
}

func (m model) View() string {
	return strings.Join(m.msgs, "\n")
}

func main() {
	prog := tea.NewProgram(model{msgs: []string{}})
	if _, err := prog.Run(); err != nil { // Run replaces the deprecated Start
		log.Fatal(err)
	}
}

The log during reproduction is as follows.
I am repeatedly inputting 一二三 (one, two, three in Japanese). After several inputs, the bytes of one character are reported as unknownInputByteMsg.

$ go run .
tea.WindowSizeMsg tea.WindowSizeMsg{Width:117, Height:25}
tea.KeyMsg tea.KeyMsg{Type:-1, Runes:[]int32{19968, 20108, 19977}, Alt:false, Paste:false}
tea.KeyMsg tea.KeyMsg{Type:-1, Runes:[]int32{19968, 20108, 19977}, Alt:false, Paste:false}
tea.KeyMsg tea.KeyMsg{Type:-1, Runes:[]int32{19968, 20108, 19977}, Alt:false, Paste:false}
tea.KeyMsg tea.KeyMsg{Type:-1, Runes:[]int32{19968, 20108, 19977}, Alt:false, Paste:false}
tea.KeyMsg tea.KeyMsg{Type:-1, Runes:[]int32{19968, 20108, 19977}, Alt:false, Paste:false}
tea.KeyMsg tea.KeyMsg{Type:-1, Runes:[]int32{19968}, Alt:false, Paste:false}
tea.unknownInputByteMsg 0xe4
tea.unknownInputByteMsg 0xba
tea.unknownInputByteMsg 0x8c
tea.KeyMsg tea.KeyMsg{Type:-1, Runes:[]int32{19977}, Alt:false, Paste:false}

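This happens because the expected 9 bytes (3 characters × 3 bytes) arrive in two reads: 0xe4, 0xb8, 0x80, 0xe4 and then 0xba, 0x8c, 0xe4, 0xb8, 0x89. The first read ends with 0xe4, the lead byte of 二 (0xe4, 0xba, 0x8c), so that character is split across the two reads.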
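
The split can be replayed outside the terminal with the exact bytes above. The sketch below is an assumption-laden illustration, not the code in this pull request: decodeChunks is a hypothetical function that carries an incomplete trailing sequence over to the next chunk, whereas the actual change issues an additional read.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decodeChunks decodes runes from successive reads. An incomplete sequence at
// the end of a chunk is held back and prepended to the next chunk instead of
// being decoded byte by byte.
// Hypothetical function for illustration only.
func decodeChunks(chunks [][]byte) [][]rune {
	var out [][]rune
	var pending []byte
	for _, chunk := range chunks {
		buf := append(pending, chunk...)
		pending = nil
		var runes []rune
		for len(buf) > 0 {
			// FullRune is false only when buf starts with a valid but
			// truncated encoding, i.e. more bytes are needed.
			if !utf8.FullRune(buf) {
				pending = append(pending, buf...)
				break
			}
			r, size := utf8.DecodeRune(buf)
			runes = append(runes, r)
			buf = buf[size:]
		}
		out = append(out, runes)
	}
	return out
}

func main() {
	chunks := [][]byte{
		{0xe4, 0xb8, 0x80, 0xe4},       // 一 plus the lead byte of 二
		{0xba, 0x8c, 0xe4, 0xb8, 0x89}, // rest of 二, then 三
	}
	for _, runes := range decodeChunks(chunks) {
		fmt.Println(string(runes))
	}
}
```

Decoding each chunk on its own would instead leave 0xe4, 0xba and 0x8c without a valid character, which matches the three unknownInputByteMsg entries in the log.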

Fix

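This fix resolves the missing-character issue. Input may still be split across several KeyMsg values, but every character arrives intact and no unknownInputByteMsg appears.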

$ go run .
tea.WindowSizeMsg tea.WindowSizeMsg{Width:117, Height:25}
tea.KeyMsg tea.KeyMsg{Type:-1, Runes:[]int32{19968, 20108, 19977}, Alt:false, Paste:false}
tea.KeyMsg tea.KeyMsg{Type:-1, Runes:[]int32{19968}, Alt:false, Paste:false}
tea.KeyMsg tea.KeyMsg{Type:-1, Runes:[]int32{20108, 19977}, Alt:false, Paste:false}
tea.KeyMsg tea.KeyMsg{Type:-1, Runes:[]int32{19968, 20108}, Alt:false, Paste:false}
tea.KeyMsg tea.KeyMsg{Type:-1, Runes:[]int32{19977}, Alt:false, Paste:false}
tea.KeyMsg tea.KeyMsg{Type:-1, Runes:[]int32{19968, 20108}, Alt:false, Paste:false}
tea.KeyMsg tea.KeyMsg{Type:-1, Runes:[]int32{19977}, Alt:false, Paste:false}
tea.KeyMsg tea.KeyMsg{Type:-1, Runes:[]int32{19968, 20108}, Alt:false, Paste:false}
tea.KeyMsg tea.KeyMsg{Type:-1, Runes:[]int32{19977}, Alt:false, Paste:false}
tea.KeyMsg tea.KeyMsg{Type:-1, Runes:[]int32{19968, 20108}, Alt:false, Paste:false}
tea.KeyMsg tea.KeyMsg{Type:-1, Runes:[]int32{19977}, Alt:false, Paste:false}
tea.KeyMsg tea.KeyMsg{Type:-1, Runes:[]int32{19968, 20108, 19977}, Alt:false, Paste:false}

fix: handle incomplete UTF-8 sequences and add test for reproduction

dd5c773

monochromegane requested review from meowgorithm and aymanbagabas as code owners September 24, 2024 05:47
