-
Notifications
You must be signed in to change notification settings - Fork 12.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Write path normalization without array allocations #60812
base: main
Are you sure you want to change the base?
Conversation
@typescript-bot perf test this |
@@ -624,27 +624,103 @@ export function getNormalizedPathComponents(path: string, currentDirectory: stri | |||
} | |||
|
|||
/** @internal */ | |||
export function getNormalizedAbsolutePath(fileName: string, currentDirectory: string | undefined): string { | |||
return getPathFromPathComponents(getNormalizedPathComponents(fileName, currentDirectory)); | |||
export function getNormalizedAbsolutePath(path: string, currentDirectory: string | undefined): string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if currentDirectory
is almost always normalized.
@andrewbranch Here they are:
tscComparison Report - baseline..pr
System info unknown
Hosts
Scenarios
tsserverComparison Report - baseline..pr
System info unknown
Hosts
Scenarios
startupComparison Report - baseline..pr
System info unknown
Hosts
Scenarios
Developer Information: |
const root = path.substring(0, rootLength); | ||
const normalizedRoot = root && normalizeSlashes(root); | ||
// `normalized` is only initialized once `path` is determined to be non-normalized | ||
let normalized = normalizedRoot === root ? undefined : normalizedRoot; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should just use lastIndexOf
to see if altDirectorySeparator
occurs. That way you can avoid creating a substring for root
at all.
I’m puzzled by the perf results |
Changing the scenarios to have long paths that are already normalized shows the old |
@typescript-bot perf test |
const sepIndex = path.indexOf(directorySeparator, index + 1); | ||
const altSepIndex = path.indexOf(altDirectorySeparator, index + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is quadratic because you have to find at least one of the other kinds of slashes over and over. You should just tight-loop on !isAnyDirectorySeparator
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or use findIndex(path, isAnyDirectorySeparator, index + 1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only concern with findIndex
is that it might not optimize well it passed in both arrays and strings. Maybe I'm superstitious.
// At beginning of segment | ||
segmentStart = index; | ||
let ch = path.charCodeAt(index); | ||
while (isAnyDirectorySeparator(ch) && index + 1 < path.length) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took me a bit to see why this works, but it's because index
starts immediately after the root, or after the first separator following the prior segment. Figuring out if one of those is a backslash is handled below.
normalizedUpTo = index + 2; | ||
} | ||
} | ||
else if (normalized === undefined) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that in this branch it is technically possible for you to avoid extra substrings by adjusting normalizedUpTo
instead of initializing normalized
.
@andrewbranch Here they are:
tscComparison Report - baseline..pr
System info unknown
Hosts
Scenarios
tsserverComparison Report - baseline..pr
System info unknown
Hosts
Scenarios
startupComparison Report - baseline..pr
System info unknown
Hosts
Scenarios
Developer Information: |
// the root is `file://Users/` | ||
assert.strictEqual(ts.getNormalizedAbsolutePath("file://Users/matb/projects/san/../../../../../../typings/@epic/Core.d.ts", ""), "file://Users/typings/@epic/Core.d.ts"); | ||
// this is real | ||
assert.strictEqual(ts.getNormalizedAbsolutePath("file:///Users/matb/projects/san/../../../../../../typings/@epic/Core.d.ts", ""), "file:///typings/@epic/Core.d.ts"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
idk if we want to preserve specific examples in these, but that might be fine.
Maybe we should have tests on untitled:
?
return path; | ||
} | ||
// Some paths only require cleanup of `/./` or leading `./` | ||
const simplified = path.replace(/\/\.\//g, "/").replace(/^\.\//, ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're already here, I'd also recommend
const simplified = path.replace(/\/\.\//g, "/").replace(/^\.\//, ""); | |
let simplified = path.replace(/\/\.\//g, "/"); | |
if (simplified.startsWith("./") { | |
simplified = simplified.slice(2); | |
} |
let segmentStart = index; | ||
let normalizedUpTo = index; | ||
let seenNonDotDotSegment = rootLength !== 0; | ||
while (index < path.length) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since path.length
doesn't change, it may be more efficient to store it in a local and reuse that rather than look it up each time (both here and elsewhere in the function).
normalized = path.substring(0, normalizedUpTo); | ||
} | ||
} | ||
else if (segmentLength === 2 && path.charCodeAt(index) === CharacterCodes.dot && path.charCodeAt(index + 1) === CharacterCodes.dot) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit: If the previous segmentLength
was 1
, but the segment was not .
, this performs an unnecessary recheck of segmentLength === 2
.
if (normalized === undefined) { | ||
// Seen superfluous separator | ||
normalized = path.substring(0, segmentStart - 1); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (normalized === undefined) { | |
// Seen superfluous separator | |
normalized = path.substring(0, segmentStart - 1); | |
} | |
// Seen superfluous separator | |
normalized ??= path.substring(0, segmentStart - 1); |
if (segmentEnd === altSepIndex && normalized === undefined) { | ||
// Seen backslash | ||
normalized = path.substring(0, segmentStart); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (segmentEnd === altSepIndex && normalized === undefined) { | |
// Seen backslash | |
normalized = path.substring(0, segmentStart); | |
} | |
if (segmentEnd === altSepIndex) { | |
// Seen backslash | |
normalized ??= path.substring(0, segmentStart); | |
} |
if (normalized === undefined) { | ||
normalized = path.substring(0, normalizedUpTo); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (normalized === undefined) { | |
normalized = path.substring(0, normalizedUpTo); | |
} | |
normalized ??= path.substring(0, normalizedUpTo); |
let normalized = simpleNormalizePath(path); | ||
if (normalized !== undefined) { | ||
return normalized; | ||
} | ||
normalized = getNormalizedAbsolutePath(path, ""); | ||
return normalized && hasTrailingDirectorySeparator(path) ? ensureTrailingDirectorySeparator(normalized) : normalized; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The call to hasTrailingDirectorySeparator
is redundant as that check is also performed in ensureTrailingDirectorySeparator
. This could be simplified to:
let normalized = simpleNormalizePath(path); | |
if (normalized !== undefined) { | |
return normalized; | |
} | |
normalized = getNormalizedAbsolutePath(path, ""); | |
return normalized && hasTrailingDirectorySeparator(path) ? ensureTrailingDirectorySeparator(normalized) : normalized; | |
return simpleNormalizePath(path) ?? | |
ensureTrailingDirectorySeparator(getNormalizedAbsolutePath(path, "")); |
const sepIndex = path.indexOf(directorySeparator, index + 1); | ||
const altSepIndex = path.indexOf(altDirectorySeparator, index + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or use findIndex(path, isAnyDirectorySeparator, index + 1)
From instrumenting the compilation of VS Code, I learned two things:
Optimizing I'm going to repeat the instrumentation for a TS Server program update and see if the ratios are similar. |
During a TS Server update, literally every call is a no-op.
|
Related: #60633
Alternative to #60755
Despite the good benchmarks, I didn’t see any measurable impact in real-world
updateGraphWorker
duration on my M2 Mac Mini. However, since that was a GC hot spot for @dbaeumer and #60755 improved things some, this may do even better for him.bench.js