-
Notifications
You must be signed in to change notification settings - Fork 124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Take surrogate pairs into account for caretPositions #306
base: main
Are you sure you want to change the base?
Conversation
A quick note on testing. I wasn't able to get the text example working prior to the changes, so I tested with the in-place editing example from the README |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR! Just one point of discussion about ligatures made up of surrogate-pair codepoints.
@@ -677,6 +689,25 @@ export function createTypesetter(resolveFonts, bidi) { | |||
} | |||
} | |||
|
|||
function fillSurrogateCaretPositions(caretPositions, charStartIndex, charCount) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be simplified to just a single caretPositions.copyWithin()
call. (Edit: if this function stays; see other comment.)
fillSurrogateCaretPositions(caretPositions, prevCharIndex, charCount) | ||
} else { | ||
fillLigatureCaretPositions(caretPositions, prevCharIndex, charCount) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this may not be a strict if/else situation ... you could potentially have ligatures formed from codepoints that are surrogates. Maybe a better approach would be to modify fillLigatureCaretPositions
to check for surrogates stepwise within its loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're correct, took a deeper look into things and I didn't realize the tons of other characters that are considered surrogate pairs that aren't emojis and could very likely be part of ligature substitutions. Thanks for calling this out
Sorry for disappearing like that! Was off tech-free cycling for a few weeks. Will take a look at your feedback Monday |
Just looked at this again with fresh eyes today and have a thought I'd like to discuss before I redo the changes for this PR I'm wondering if the current implementation of caret subdividing ligature substitutions is non-standard? Example: "न्य" is 1 grapheme, but 3 code points. Despite being three code points, every text editor I've used has not subdivided the grapheme for the caret. The caret is either at the beginning of the whole grapheme or the end of the whole grapheme. Maybe what I wanted to do for surrogate pairs should actually be done for all graphemes? In this case we just update |
It's definitely nonstandard how I've done it. The intent is to handle stylistic ligatures, like in many fonts where certain sequences of latin characters are replaced by a single streamlined glyph -- in those cases I wanted to allow selection/caret placement in between characters. A common example is "fi, ffi, fl, ffl" in Roboto. I believe the fonts provide caret placement data for this sort of situation, but that's not currently parsed, hence the current simplistic approach. I definitely see how this doesn't play nicely with glyph substitutions for non-Latin languages. I'm not sure if there's a simple way to distinguish the two scenarios so we can make both work? Would have to do some research. Interestingly, in this text field I'm writing this comment in (Firefox on Mac) I'm able to place a caret in the middle of "न्य", but not in between all three codepoints - see image: |
Ah thanks for clarifying, I wasn't aware of stylistic ligatures specified by fonts. The more you know :) My next suggestion then is to use Intl.Segmenter since stylistic ligatures should still be parsed as separate graphemes by that. We'd need to wait until that hits Firefox though (it's been on Chrome and Safari for a few years now and is supposed to reach Firefox this year). With that we could check for graphemes within Or we leverage the caret data that is provided in font files, but I'm guessing that's a much more labor intensive PR? I feel like using Intl.Segmenter is an easy to implement fallback that doesn't prevent using the Font's caret data in the future |
I think, though, that I'm leaning toward just killing the current auto-spacing of interior carets and going with the single position copy. That will leave stylistic ligatures "broken" in the short term of course, but TBH that's more likely to motivate me to implement the true solution using the font-provided ligature caret positions, for which I've opened #308. |
Sounds good to me. Would you prefer I make that change to copy position in this PR, or do you prefer to do this yourself separately as part of that new issue? |
Oops! Forgot to reply. Feel free to do that in this PR, I'd merge it. 😄 |
For Issue #304
Adds a check to see if a glyph is a surrogate pair, in which case we set the same caret position for each char that makes up the glyph
In the issue you mentioned we might need to update selectionUtils. From a cursory glance I felt like it might not be necessary. It seems like
getCaretAtPoint
loops through the chars in order, so it should return the first char in a surrogate pair no matter what