Take surrogate pairs into account for caretPositions #306

Status: Open. Merging 1 commit into base: main.

49 changes: 40 additions & 9 deletions packages/troika-three-text/src/Typesetter.js
@@ -105,6 +105,10 @@ export function createTypesetter(resolveFonts, bidi) {
 // In the future we may consider a full Unicode line breaking algorithm impl: https://www.unicode.org/reports/tr14
 const BREAK_AFTER_CHARS = new RegExp(`${lineBreakingWhiteSpace}|[\\-\\u007C\\u00AD\\u2010\\u2012-\\u2014\\u2027\\u2056\\u2E17\\u2E40]`)

+// glyphs that start with a code point within the surrogate range should be treated as surrogate pairs
+const HIGH_SURROGATE_START = 0xd800;
+const HIGH_SURROGATE_END = 0xdbff;
+
 /**
  * Load and parse all the necessary fonts to render a given string of text, then group
  * them into consecutive runs of characters sharing a font.
@@ -517,13 +521,17 @@ export function createTypesetter(resolveFonts, bidi) {
   caretPositions[charIndex * 4 + 2] = line.baseline + fontData.caretBottom + anchorYOffset //common bottom y
   caretPositions[charIndex * 4 + 3] = line.baseline + fontData.caretTop + anchorYOffset //common top y

-  // If we skipped any chars from the previous glyph (due to ligature subs), fill in caret
-  // positions for those missing char indices; currently this uses a best-guess by dividing
+  // If we skipped any chars from the previous glyph (due to ligature subs/surrogates), fill in caret
+  // positions for those missing char indices; currently ligatures uses a best-guess by dividing
   // the ligature's width evenly. In the future we may try to use the font's LigatureCaretList
   // table to get better interior caret positions.
-  const ligCount = charIndex - prevCharIndex
-  if (ligCount > 1) {
-    fillLigatureCaretPositions(caretPositions, prevCharIndex, ligCount)
+  const charCount = charIndex - prevCharIndex
+  if (charCount > 1) {
+    if (isSurrogate(text.slice(prevCharIndex, charIndex))) {
+      fillSurrogateCaretPositions(caretPositions, prevCharIndex, charCount)
+    } else {
+      fillLigatureCaretPositions(caretPositions, prevCharIndex, charCount)
+    }
Collaborator

I feel like this may not be a strict if/else situation ... you could potentially have ligatures formed from codepoints that are surrogates. Maybe a better approach would be to modify fillLigatureCaretPositions to check for surrogates stepwise within its loop?

Author

You're correct. I took a deeper look and hadn't realized how many characters besides emoji are encoded as surrogate pairs and could very likely be part of ligature substitutions. Thanks for calling this out.
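
The stepwise idea raised in this thread could look roughly like the sketch below. It is not code from the PR: `fillCaretPositionsStepwise`, `codeUnitsAt`, and the extra `text` parameter are hypothetical names, and it assumes `caretPositions` stores four numbers per UTF-16 code unit (start x, end x, bottom, top), as the diff suggests. The glyph's width is still divided evenly, but per code point rather than per code unit, and both halves of a surrogate pair receive the same caret box.

```javascript
// Hypothetical sketch, not code from the PR. Names and signatures are assumptions.
const isHighSurrogate = (codeUnit) => codeUnit >= 0xd800 && codeUnit <= 0xdbff
const isLowSurrogate = (codeUnit) => codeUnit >= 0xdc00 && codeUnit <= 0xdfff

// Number of UTF-16 code units (1 or 2) in the code point starting at index i,
// never reading past `end`.
function codeUnitsAt(text, i, end) {
  const isPair = i + 1 < end &&
    isHighSurrogate(text.charCodeAt(i)) &&
    isLowSurrogate(text.charCodeAt(i + 1))
  return isPair ? 2 : 1
}

function fillCaretPositionsStepwise(caretPositions, text, charStartIndex, charCount) {
  const end = charStartIndex + charCount
  const startX = caretPositions[charStartIndex * 4]
  const endX = caretPositions[charStartIndex * 4 + 1]
  const bottom = caretPositions[charStartIndex * 4 + 2]
  const top = caretPositions[charStartIndex * 4 + 3]

  // Count code points so a surrogate pair gets one share of the width, not two.
  let codePointCount = 0
  for (let i = charStartIndex; i < end; i += codeUnitsAt(text, i, end)) {
    codePointCount++
  }
  const advance = (endX - startX) / codePointCount

  let codePointIndex = 0
  for (let i = charStartIndex; i < end; ) {
    const units = codeUnitsAt(text, i, end)
    // Both halves of a pair share the same caret box.
    for (let u = 0; u < units; u++) {
      const offset = (i + u) * 4
      caretPositions[offset] = startX + advance * codePointIndex
      caretPositions[offset + 1] = startX + advance * (codePointIndex + 1)
      caretPositions[offset + 2] = bottom
      caretPositions[offset + 3] = top
    }
    i += units
    codePointIndex++
  }
}
```

With this shape, a glyph covering a lone surrogate pair degenerates to copying one box onto both code units, so a separate surrogate-only branch would no longer be needed.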

   }
   prevCharIndex = charIndex
 }
@@ -597,11 +605,15 @@ export function createTypesetter(resolveFonts, bidi) {
     }
   })

-  // Fill in remaining caret positions in case the final character was a ligature
+  // Fill in remaining caret positions in case the final character was a ligature/surrogate
   if (caretPositions) {
-    const ligCount = text.length - prevCharIndex;
-    if (ligCount > 1) {
-      fillLigatureCaretPositions(caretPositions, prevCharIndex, ligCount)
+    const charCount = text.length - prevCharIndex
+    if (charCount > 1) {
+      if (isSurrogate(text.slice(prevCharIndex, charIndex))) {
+        fillSurrogateCaretPositions(caretPositions, prevCharIndex, charCount)
+      } else {
+        fillLigatureCaretPositions(caretPositions, prevCharIndex, charCount)
+      }
     }
   }
 }
@@ -677,6 +689,25 @@ export function createTypesetter(resolveFonts, bidi) {
   }
 }

+function fillSurrogateCaretPositions(caretPositions, charStartIndex, charCount) {
Collaborator

This could be simplified to just a single caretPositions.copyWithin() call. (Edit: if this function stays; see other comment.)
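
A minimal sketch of what that `copyWithin()` simplification might look like, assuming `caretPositions` is a typed array (or plain array) holding four values per code unit. The name `copySurrogateCaretPosition` is hypothetical. Because `copyWithin()` copies overlapping ranges as if through a temporary buffer, a single call only duplicates one caret box onto the next slot, which covers a single surrogate pair (two code units); a longer run would still need a loop.

```javascript
// Hypothetical sketch, not code from the PR: duplicate the caret box of the
// high surrogate onto the low surrogate that immediately follows it.
function copySurrogateCaretPosition(caretPositions, charStartIndex) {
  const from = charStartIndex * 4
  // copyWithin(target, start, end): copy the 4 values at [from, from + 4)
  // into the 4 slots beginning at from + 4.
  caretPositions.copyWithin(from + 4, from, from + 4)
}
```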

+  const charStartX = caretPositions[charStartIndex * 4]
+  const charEndX = caretPositions[charStartIndex * 4 + 1]
+  const charBottom = caretPositions[charStartIndex * 4 + 2]
+  const charTop = caretPositions[charStartIndex * 4 + 3]
+  for (let i = 0; i < charCount; i++) {
+    const startIndex = (charStartIndex + i) * 4
+    caretPositions[startIndex] = charStartX;
+    caretPositions[startIndex + 1] = charEndX;
+    caretPositions[startIndex + 2] = charBottom
+    caretPositions[startIndex + 3] = charTop
+  }
+}
+
+function isSurrogate(text) {
+  const firstCodeUnit = text.charCodeAt(0);
+  return firstCodeUnit >= HIGH_SURROGATE_START && firstCodeUnit <= HIGH_SURROGATE_END;
+}
+
 function now() {
   return (self.performance || Date).now()
 }