Take surrogate pairs into account for caretPositions #306

kalegd · 2024-02-22T07:33:37Z

For Issue #304

Adds a check for whether a glyph comes from a surrogate pair, in which case we set the same caret position for each char (UTF-16 code unit) that makes up the glyph.
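For reference, here is a minimal sketch of how such a check can be done in plain JavaScript. A code point above U+FFFF is stored as a high surrogate (0xD800 to 0xDBFF) followed by a low surrogate (0xDC00 to 0xDFFF). The helper names are hypothetical and not taken from the PR's diff.

```javascript
// Hypothetical helpers (not from the PR): detect UTF-16 surrogate pairs so
// both code units of one code point can be given the same caret position.

function isHighSurrogate(code) {
  return code >= 0xd800 && code <= 0xdbff;
}

function isLowSurrogate(code) {
  return code >= 0xdc00 && code <= 0xdfff;
}

// True when text[index] is a high surrogate immediately followed by a low one.
function startsSurrogatePair(text, index) {
  return (
    index + 1 < text.length &&
    isHighSurrogate(text.charCodeAt(index)) &&
    isLowSurrogate(text.charCodeAt(index + 1))
  );
}

// "a😀b" is 3 code points but 4 UTF-16 chars.
const text = "a\u{1F600}b";
console.log(text.length); // 4
console.log(startsSurrogatePair(text, 0)); // false
console.log(startsSurrogatePair(text, 1)); // true
console.log(startsSurrogatePair(text, 2)); // false (this is the low surrogate)
```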

In the issue, you mentioned we might need to update selectionUtils. From a cursory glance, I suspect it isn't necessary: getCaretAtPoint seems to loop through the chars in order, so it should always return the first char of a surrogate pair.
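To illustrate that in-order argument, here is a deliberately simplified, hypothetical hit test; troika's actual getCaretAtPoint is not reproduced here. It assumes a layout of STRIDE = 2 floats per char (left x, right x). Because both halves of a surrogate pair carry the same entry and the scan returns the first match, the high surrogate always wins.

```javascript
// Hypothetical, simplified hit test (not troika's getCaretAtPoint).
// Assumed layout: STRIDE = 2 floats per UTF-16 char, [leftX, rightX].
const STRIDE = 2;

function hitTestChar(caretPositions, x) {
  const charCount = caretPositions.length / STRIDE;
  for (let i = 0; i < charCount; i++) {
    const left = caretPositions[i * STRIDE];
    const right = caretPositions[i * STRIDE + 1];
    if (x >= left && x < right) {
      return i; // first match wins, so a duplicated entry resolves to the earlier char
    }
  }
  return -1; // no char under x
}

// "a😀b": the emoji's two code units share the entry [10, 30].
const caretPositions = new Float32Array([
  0, 10,  // "a"
  10, 30, // high surrogate
  10, 30, // low surrogate (duplicated entry)
  30, 40, // "b"
]);
console.log(hitTestChar(caretPositions, 20)); // 1 (the high surrogate)
```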

kalegd · 2024-02-22T08:09:38Z

A quick note on testing: I wasn't able to get the text example working before these changes, so I tested with the in-place editing example from the README.

lojjic

Thank you for the PR! Just one point of discussion about ligatures made up of surrogate-pair codepoints.

lojjic · 2024-02-25T15:13:28Z

packages/troika-three-text/src/Typesetter.js

@@ -677,6 +689,25 @@ export function createTypesetter(resolveFonts, bidi) {
    }
  }

+  function fillSurrogateCaretPositions(caretPositions, charStartIndex, charCount) {


This could be simplified to just a single caretPositions.copyWithin() call. (Edit: if this function stays; see other comment.)
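Roughly what that single copyWithin() call could look like for a surrogate pair. This assumes, for illustration only, STRIDE = 2 floats per char (left x, right x); troika's real caretPositions array may store more values per char, and copyCaretToNextChar is a hypothetical name.

```javascript
// Hypothetical helper: duplicate the high surrogate's caret entry onto the
// low surrogate that follows it. STRIDE (floats per char) is an assumption.
const STRIDE = 2;

function copyCaretToNextChar(caretPositions, charIndex) {
  const from = charIndex * STRIDE;
  // copyWithin(target, start, end): copy one char's entry into the next slot.
  caretPositions.copyWithin(from + STRIDE, from, from + STRIDE);
}

const caretPositions = new Float32Array([
  0, 10,  // "a"
  10, 30, // high surrogate (filled by layout)
  0, 0,   // low surrogate (not yet filled)
  30, 40, // "b"
]);
copyCaretToNextChar(caretPositions, 1);
console.log(Array.from(caretPositions)); // [0, 10, 10, 30, 10, 30, 30, 40]
```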

lojjic · 2024-02-25T15:16:08Z

packages/troika-three-text/src/Typesetter.js

+                    fillSurrogateCaretPositions(caretPositions, prevCharIndex, charCount)
+                  } else {
+                    fillLigatureCaretPositions(caretPositions, prevCharIndex, charCount)
+                  }


I feel like this may not be a strict if/else situation: you could potentially have ligatures formed from codepoints that are encoded as surrogate pairs. Maybe a better approach would be to modify fillLigatureCaretPositions to check for surrogates stepwise within its loop?
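A rough sketch of that stepwise idea, under stated assumptions: it is not troika's fillLigatureCaretPositions, the function name is hypothetical, and it assumes STRIDE = 2 floats per char (left x, right x). The ligature's width is split evenly across code points rather than UTF-16 chars, and both halves of a surrogate pair share one slice.

```javascript
// Hypothetical sketch (not troika's fillLigatureCaretPositions).
// Assumed layout: STRIDE = 2 floats per UTF-16 char, [leftX, rightX].
const STRIDE = 2;

// Number of UTF-16 chars the code point at index i occupies, clamped to `end`.
function codeUnitsAt(text, i, end) {
  return Math.min(text.codePointAt(i) > 0xffff ? 2 : 1, end - i);
}

function fillLigatureCaretsByCodePoint(text, caretPositions, startIndex, charCount, left, right) {
  const end = startIndex + charCount;

  // First pass: count code points in the ligature's char range.
  let codePointCount = 0;
  for (let i = startIndex; i < end; i += codeUnitsAt(text, i, end)) {
    codePointCount++;
  }

  // Second pass: give each code point an equal slice of the glyph's width,
  // writing the same entry to every char of that code point.
  const width = (right - left) / codePointCount;
  let i = startIndex;
  let n = 0;
  while (i < end) {
    const units = codeUnitsAt(text, i, end);
    const cpLeft = left + n * width;
    for (let u = 0; u < units; u++) {
      caretPositions[(i + u) * STRIDE] = cpLeft;
      caretPositions[(i + u) * STRIDE + 1] = cpLeft + width;
    }
    i += units;
    n++;
  }
}

// Imagine "a😀b" shaped as one ligature glyph spanning x = 0..30.
const caretPositions = new Float32Array(4 * STRIDE);
fillLigatureCaretsByCodePoint("a\u{1F600}b", caretPositions, 0, 4, 0, 30);
console.log(Array.from(caretPositions)); // [0, 10, 10, 20, 10, 20, 20, 30]
```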

kalegd · 2024-03-25

You're correct. I took a deeper look and hadn't realized how many characters besides emoji are encoded as surrogate pairs, and many of them could very likely be part of ligature substitutions. Thanks for calling this out.

kalegd · 2024-03-22T14:10:55Z

Sorry for disappearing like that! I was off on a tech-free cycling trip for a few weeks. I'll take a look at your feedback on Monday.

kalegd · 2024-03-25T07:19:11Z

Just looked at this again with fresh eyes today, and I have a thought I'd like to discuss before I redo the changes for this PR.

I'm wondering whether the current implementation, which subdivides the caret within ligature substitutions, is non-standard.

Example: "न्य" is 1 grapheme but 3 code points. Despite the three code points, none of the text editors I've used subdivide the grapheme for the caret: the caret is either at the beginning or at the end of the whole grapheme. Maybe what I wanted to do for surrogate pairs should actually be done for all graphemes? In that case, we'd just update fillLigatureCaretPositions to use caretPositions.copyWithin() as you suggested.

lojjic · 2024-03-25T15:35:48Z

It's definitely nonstandard how I've done it. The intent is to handle stylistic ligatures, as in many fonts where certain sequences of Latin characters are replaced by a single streamlined glyph; in those cases I wanted to allow selection and caret placement between the individual characters. A common example is "fi, ffi, fl, ffl" in Roboto.

I believe the fonts provide caret placement data for this sort of situation, but that's not currently parsed, hence the current simplistic approach.

I definitely see how this doesn't play nicely with glyph substitutions for non-Latin languages. I'm not sure there's a simple way to distinguish the two scenarios so we can make both work; I'd have to do some research.

Interestingly, in the text field I'm writing this comment in (Firefox on Mac), I'm able to place a caret in the middle of "न्य", but not between all three codepoints (see image).

kalegd · 2024-03-26T06:46:26Z

Ah, thanks for clarifying; I wasn't aware of stylistic ligatures specified by fonts. The more you know :)

My next suggestion, then, is to use Intl.Segmenter, since it should still split stylistic ligatures into separate graphemes. We'd need to wait until it ships in Firefox, though (it's been in Chrome and Safari for a few years now and is supposed to reach Firefox this year). With it, we could check for graphemes within fillLigatureCaretPositions and adjust the caret subdivision accordingly whenever a font's stylistic ligature contains a compound grapheme, as identified by the Segmenter. If there is no compound grapheme, everything should keep working the way it does now, as in your example of "fi, ffi, fl, ffl".

Alternatively, we could leverage the caret data provided in font files, but I'm guessing that would be a much more labor-intensive PR?

I feel like Intl.Segmenter is an easy-to-implement fallback that doesn't prevent using the font's caret data in the future.
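A sketch of the Intl.Segmenter idea, under stated assumptions: the function name is hypothetical, the layout is assumed to be STRIDE = 2 floats per char (left x, right x), and the engine must support Intl.Segmenter. The ligature's width is split evenly across grapheme clusters, so "ffi" keeps its interior carets while every char of a compound grapheme shares one entry. How a conjunct like "न्य" segments depends on the engine's Unicode data: the Indic conjunct rule that keeps it together was added to UAX #29 in Unicode 15.1, and older data should split it as "न्" + "य".

```javascript
// Hypothetical sketch: split a ligature glyph's width evenly across grapheme
// clusters. Assumed layout: STRIDE = 2 floats per UTF-16 char, [leftX, rightX].
const STRIDE = 2;
const graphemeSegmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

function fillLigatureCaretsByGrapheme(text, caretPositions, startIndex, charCount, left, right) {
  // Segment only the ligature's own chars; this assumes the ligature does not
  // start or end in the middle of a grapheme cluster.
  const slice = text.slice(startIndex, startIndex + charCount);
  const graphemes = Array.from(graphemeSegmenter.segment(slice));
  const width = (right - left) / graphemes.length;

  graphemes.forEach(({ segment, index }, n) => {
    const gLeft = left + n * width;
    // `index` is the grapheme's char offset within `slice`.
    for (let u = 0; u < segment.length; u++) {
      const base = (startIndex + index + u) * STRIDE;
      caretPositions[base] = gLeft;
      caretPositions[base + 1] = gLeft + width;
    }
  });
}

// A "ffi" ligature keeps its interior carets: three graphemes, three slices.
const caretPositions = new Float32Array(3 * STRIDE);
fillLigatureCaretsByGrapheme("ffi", caretPositions, 0, 3, 0, 30);
console.log(Array.from(caretPositions)); // [0, 10, 10, 20, 20, 30]
```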

lojjic · 2024-03-27T16:53:40Z

Intl.Segmenter is new to me, and an interesting possibility for a short-term workaround.

I think, though, that I'm leaning toward just killing the current auto-spacing of interior carets and going with the single-position copy. That will leave stylistic ligatures "broken" in the short term, of course, but TBH that's more likely to motivate me to implement the true solution using the font-provided ligature caret positions, for which I've opened #308.
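A sketch of what the single-position copy could look like, assuming for illustration STRIDE = 2 floats per char (left x, right x) and a hypothetical function name. One subtlety: copyWithin() behaves as if the source range were first copied to a temporary buffer, so a single call over an overlapping range shifts the data instead of repeating the first entry. Filling N chars therefore takes one call per char, as in this loop.

```javascript
// Hypothetical sketch: every UTF-16 char of a ligature gets the glyph's full
// caret entry, so no interior carets remain. STRIDE is an assumption.
const STRIDE = 2;

function fillLigatureCaretsSingle(caretPositions, startIndex, charCount) {
  const src = startIndex * STRIDE;
  // copyWithin() does not "smear" one entry across an overlapping range,
  // so copy the first char's entry once per following char.
  for (let i = 1; i < charCount; i++) {
    caretPositions.copyWithin(src + i * STRIDE, src, src + STRIDE);
  }
}

// "ffi" shaped as one glyph spanning x = 0..30; only the first entry is filled.
const caretPositions = new Float32Array([0, 30, 0, 0, 0, 0]);
fillLigatureCaretsSingle(caretPositions, 0, 3);
console.log(Array.from(caretPositions)); // [0, 30, 0, 30, 0, 30]
```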

kalegd · 2024-03-29T07:03:49Z

Sounds good to me. Would you prefer that I make the single-position copy change in this PR, or would you rather do it yourself separately as part of that new issue?

lojjic · 2024-05-31T03:22:56Z

Oops! Forgot to reply. Feel free to do that in this PR, I'd merge it. 😄

Take surrogate pairs into account for caretPositions

8a0a1a0

lojjic reviewed Feb 25, 2024


lojjic mentioned this pull request Mar 27, 2024

Support ligature caret positioning provided in font #308

Open
