perf(bundle-source): cut multi-entry agoric bundling time and add detailed profiling #3099
Conversation
Pull request overview
This PR introduces significant performance optimizations for @endo/bundle-source targeting multi-entry bundling workloads, achieving a reported ~43% speedup on agoric-sdk bundling tasks. The changes focus on three main areas: profiling infrastructure, caching mechanisms, and algorithmic optimizations.
Changes:
- Adds comprehensive Chrome trace profiling with tooling for aggregation and analysis
- Implements multiple caching layers: process-global read cache, parse result cache, canonical path memoization, and import hook read cache
- Optimizes graph shortest-path computation by computing Dijkstra once from source
- Adds fast-path optimization in evasive-transform to skip parsing when no risky patterns are detected
- Enhances ZIP writing with pre-allocation and profiling instrumentation
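The single-source shortest-path change above can be sketched as follows. This is a hypothetical reconstruction for illustration, not the actual `generic-graph.js` code: the name `makeShortestPathFromSource` comes from the file summary below, but the body here uses a simple linear scan instead of a priority queue for brevity.

```javascript
// Sketch of single-source Dijkstra over an adjacency map: run the traversal
// once from the source, then answer every destination query from the
// resulting distance table instead of re-running per destination pair.
const makeShortestPathFromSource = (graph, source) => {
  // graph: Map<node, Array<{ target, weight }>>
  const distances = new Map([[source, 0]]);
  const visited = new Set();
  for (;;) {
    // Select the unvisited node with the smallest tentative distance.
    let nearest;
    let nearestDistance = Infinity;
    for (const [node, distance] of distances) {
      if (!visited.has(node) && distance < nearestDistance) {
        nearest = node;
        nearestDistance = distance;
      }
    }
    if (nearest === undefined) {
      break; // Every reachable node has been finalized.
    }
    visited.add(nearest);
    // Relax each outgoing edge of the finalized node.
    for (const { target, weight } of graph.get(nearest) ?? []) {
      const candidate = nearestDistance + weight;
      if (candidate < (distances.get(target) ?? Infinity)) {
        distances.set(target, candidate);
      }
    }
  }
  // All distances are final; lookups are O(1) per destination.
  return node => distances.get(node);
};
```

The point of the optimization is amortization: one O(V²) (or O(E log V) with a heap) traversal replaces one traversal per (source, destination) pair.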
Reviewed changes
Copilot reviewed 33 out of 34 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| packages/bundle-source/src/profile.js | New profiling module with Chrome trace output |
| packages/bundle-source/src/zip-base64.js | Adds process-global read cache and profiling integration |
| packages/bundle-source/src/script.js | Integrates profiling into script bundling path |
| packages/bundle-source/src/endo.js | Adds profiling spans to transform and type-erasure steps |
| packages/bundle-source/tools/profile-agoric-bundling.mts | Tool for profiling agoric-sdk bundle workloads |
| packages/bundle-source/tools/trace-merge.js | Merges and summarizes multiple trace files |
| packages/bundle-source/test/profiling.test.js | Basic profiling functionality test |
| packages/compartment-mapper/src/parse-archive-mjs.js | Adds parse result caching with 20k entry limit |
| packages/compartment-mapper/src/node-powers.js | Adds canonical path memoization and concurrent read limiting |
| packages/compartment-mapper/src/import-hook.js | Adds per-operation read cache and suffix expansion optimization |
| packages/compartment-mapper/src/generic-graph.js | New makeShortestPathFromSource for single-source shortest paths |
| packages/compartment-mapper/src/node-modules.js | Uses optimized single-source Dijkstra algorithm |
| packages/compartment-mapper/src/archive-lite.js | Adds profiling and optimizes synchronous write detection |
| packages/evasive-transform/src/index.js | Adds fast-path to skip transform when no risky patterns detected |
| packages/zip/src/writer.js | Pre-allocates buffer and adds profiling support |
| packages/zip/src/format-writer.js | Adds profiling spans and optimizes record building |
| packages/zip/tools/benchmark-writer.mjs | New ZIP writer microbenchmark tool |
Comments suppressed due to low confidence (6)
packages/compartment-mapper/src/import-hook.js:669
- The cached read promise is removed from cache on rejection (line 666), but successful reads are never removed. This cache is scoped to a single mapping run via makeImportHookMaker, so it's bounded by the lifetime of a single bundle operation. However, there's no maximum size limit, so a bundle with thousands of modules could accumulate all successful reads in memory without bound during that operation.
const cachedMaybeRead = location => {
const cached = maybeReadCache.get(location);
if (cached !== undefined) {
return cached;
}
const pending = Promise.resolve(maybeRead(location));
maybeReadCache.set(location, pending);
pending.catch(() => {
maybeReadCache.delete(location);
});
return pending;
};
packages/bundle-source/src/zip-base64.js:62
- The cache eviction in cacheReadValue uses FIFO (first-in-first-out) by removing the first entry from the Map iterator. However, JavaScript Map iteration order is insertion order, not access order. This means frequently accessed entries can be evicted before rarely accessed ones. Consider using an LRU (Least Recently Used) eviction policy for better cache hit rates, or at minimum document this FIFO behavior.
while (cachedReadBytes > readCacheMaxBytes && cachedReads.size > 0) {
const oldestKey = cachedReads.keys().next().value;
if (oldestKey === undefined) {
break;
}
const value = cachedReads.get(oldestKey);
cachedReads.delete(oldestKey);
if (value !== undefined) {
cachedReadBytes -= value.length;
}
}
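A minimal sketch of the suggested LRU alternative, exploiting the fact that `Map` iteration follows insertion order: deleting and re-setting a key on access moves it to the back, so the iterator's first key is always the least recently used. The factory name and shape here are illustrative, not from the PR:

```javascript
// Sketch: byte-budgeted LRU cache over a Map, per the review suggestion.
// Re-inserting on access keeps iteration order equal to recency order.
const makeLruByteCache = maxBytes => {
  const entries = new Map(); // location -> Uint8Array
  let totalBytes = 0;
  const get = location => {
    const value = entries.get(location);
    if (value !== undefined) {
      // Refresh recency: move this entry to the back of iteration order.
      entries.delete(location);
      entries.set(location, value);
    }
    return value;
  };
  const set = (location, bytes) => {
    const prior = entries.get(location);
    if (prior !== undefined) {
      totalBytes -= prior.length;
      entries.delete(location);
    }
    entries.set(location, bytes);
    totalBytes += bytes.length;
    // Evict least-recently-used entries until the byte budget is met.
    while (totalBytes > maxBytes && entries.size > 1) {
      const lruKey = entries.keys().next().value;
      totalBytes -= entries.get(lruKey).length;
      entries.delete(lruKey);
    }
  };
  return { get, set };
};
```

Whether the extra delete/re-set per hit pays for itself depends on access patterns; for a mostly-scan workload, the existing FIFO may be equivalent in practice.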
packages/bundle-source/src/script.js:112
- Inconsistent indentation: lines 111-112 have different indentation from surrounding lines (lines 110 and 113-129). Line 111 should be indented to align with line 110, and line 112 should be indented to align with line 113.
commonDependencies,
profileStartSpan: profiler.startSpan,
packages/compartment-mapper/src/archive-lite.js:102
- The duck-typing check for Promise (lines 97-101) is fragile. It checks for an object with a `then` property that's a function, but this will incorrectly treat any thenable (including user-defined thenables or objects that happen to have a `then` method) as a Promise. Use `maybeWrite instanceof Promise` or `Promise.resolve(maybeWrite) === maybeWrite` for more robust Promise detection, or check for `maybeWrite?.constructor?.name === 'Promise'`.
if (
maybeWrite &&
typeof maybeWrite === 'object' &&
'then' in maybeWrite &&
typeof maybeWrite.then === 'function'
) {
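A quick standalone illustration of the gap the comment describes, independent of the actual `archive-lite.js` code (the `isThenable` helper is hypothetical):

```javascript
// Demonstrates how thenable duck-typing is broader than instanceof Promise.
const isThenable = value =>
  Boolean(value) &&
  typeof value === 'object' &&
  typeof value.then === 'function';

// A user-defined thenable that is not a real Promise.
const userThenable = { then: onFulfilled => onFulfilled('not a Promise') };

console.log(isThenable(userThenable)); // true: duck-typing accepts it
console.log(userThenable instanceof Promise); // false: instanceof rejects it
console.log(Promise.resolve(42) instanceof Promise); // true
```

One caveat worth weighing: `instanceof` can fail for promises created in another realm (e.g. a `vm` context), which is a common reason code reaches for duck-typing in the first place.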
packages/bundle-source/src/profile.js:10
- The `nextTraceFileId` counter at module scope will never reset and could theoretically overflow after 2^53 trace files in a long-running process. While extremely unlikely in practice, consider resetting it periodically or using a timestamp-based approach instead. Alternatively, document that this is acceptable given the improbability of reaching `Number.MAX_SAFE_INTEGER` trace files.
let nextTraceFileId = 0;
packages/zip/src/writer.js:20
- The size estimation assumes ASCII-only filenames (the comment on lines 18-19 acknowledges this). However, for non-ASCII paths, the BufferWriter may need to grow significantly. Consider using `new TextEncoder().encode(file.name).length` for more accurate size estimation of UTF-8 encoded paths, especially since the actual encoding uses UTF-8.
    : typeof sourceMap === 'string'
      ? sourceMap
      : JSON.stringify(sourceMap);
const cacheKey = `${source}\n//# sourceMappingURL=${sourceMapKey}`;
Cache key construction is vulnerable to collision. If the source code contains the exact string `\n//# sourceMappingURL=`, a different source with the same content up to that point but different subsequent code could collide with an entry that has a sourceMap. A more robust approach would be to use separate cache key components or a hash-based key.
- const cacheKey = `${source}\n//# sourceMappingURL=${sourceMapKey}`;
+ const cacheKey = JSON.stringify([source, sourceMapKey]);
It does seem like a very unwieldy cache key; can we somehow do better?
parseArchiveMjsCacheEntries += 1;
if (parseArchiveMjsCacheEntries > MAX_PARSE_ARCHIVE_MJS_CACHE_ENTRIES) {
  parseArchiveMjsCache.clear();
  parseArchiveMjsCacheEntries = 0;
  byLocation = new Map();
  parseArchiveMjsCache.set(sourceUrl, byLocation);
}
Cache eviction logic clears the entire cache when limit is exceeded, but then immediately recreates the byLocation map and adds the new entry without checking if the limit would still be exceeded. If a single location URL has more than MAX_PARSE_ARCHIVE_MJS_CACHE_ENTRIES source variations, this will thrash the cache continuously. Consider evicting entries more selectively or preventing a single URL from dominating the cache.
- parseArchiveMjsCacheEntries += 1;
- if (parseArchiveMjsCacheEntries > MAX_PARSE_ARCHIVE_MJS_CACHE_ENTRIES) {
-   parseArchiveMjsCache.clear();
-   parseArchiveMjsCacheEntries = 0;
-   byLocation = new Map();
-   parseArchiveMjsCache.set(sourceUrl, byLocation);
- }
+ const byLocationSize = byLocation.size;
+ if (parseArchiveMjsCacheEntries + 1 > MAX_PARSE_ARCHIVE_MJS_CACHE_ENTRIES) {
+   // Evict only entries for this location to prevent one URL
+   // from dominating and thrashing the entire cache.
+   parseArchiveMjsCacheEntries -= byLocationSize;
+   byLocation = new Map();
+   parseArchiveMjsCache.set(sourceUrl, byLocation);
+ }
+ parseArchiveMjsCacheEntries += 1;
I agree with copilot, must we clear the entire cache rather than e.g. evicting all entries under the oldest key?
- parseArchiveMjsCacheEntries += 1;
- if (parseArchiveMjsCacheEntries > MAX_PARSE_ARCHIVE_MJS_CACHE_ENTRIES) {
-   parseArchiveMjsCache.clear();
-   parseArchiveMjsCacheEntries = 0;
-   byLocation = new Map();
-   parseArchiveMjsCache.set(sourceUrl, byLocation);
- }
+ parseArchiveMjsCacheEntries += 1;
+ if (parseArchiveMjsCacheEntries > MAX_PARSE_ARCHIVE_MJS_CACHE_ENTRIES) {
+   // Find the oldest key in parseArchiveMjsCache and evict all of its
+   // sub-entries, either by deleting its entry or (if we need it here)
+   // by clearing its associated map.
+   const oldest = parseArchiveMjsCache.entries().next().value;
+   const oldestByLocation = oldest?.[1];
+   const size = oldestByLocation?.size;
+   assert(size > 0);
+   if (byLocation === oldestByLocation) {
+     byLocation.clear();
+   } else {
+     parseArchiveMjsCache.delete(oldest[0]);
+   }
+   parseArchiveMjsCacheEntries -= size;
+ }
 * @param {string} location
 * @param {Uint8Array | undefined} bytes
 */
const cacheReadValue = (location, bytes) => {
  const prior = cachedReads.get(location);
  if (prior !== undefined) {
    cachedReadBytes -= prior.length;
  }
  cachedReads.set(location, bytes);
The read cache (cachedReads, pendingReads, cachedReadBytes) is declared at module scope, making it shared across all bundleZipBase64 calls in the same process. This could lead to cache poisoning if different bundle operations read the same file path but expect different content (e.g., if files are modified between bundle calls, or if the same relative path resolves differently in different working directories). Consider scoping the cache to individual bundle operations or including additional context (like working directory or file modification time) in the cache key.
- * @param {string} location
- * @param {Uint8Array | undefined} bytes
- */
- const cacheReadValue = (location, bytes) => {
-   const prior = cachedReads.get(location);
-   if (prior !== undefined) {
-     cachedReadBytes -= prior.length;
-   }
-   cachedReads.set(location, bytes);
+ * Derive a cache key for a read location that includes contextual information
+ * about the current working directory, so that the same location string used
+ * in different working directories does not collide in the cache.
+ *
+ * @param {string} location
+ * @returns {string}
+ */
+ const makeReadCacheKey = location => `${process.cwd()}:${location}`;
+ /**
+  * @param {string} location
+  * @param {Uint8Array | undefined} bytes
+  */
+ const cacheReadValue = (location, bytes) => {
+   const key = makeReadCacheKey(location);
+   const prior = cachedReads.get(key);
+   if (prior !== undefined) {
+     cachedReadBytes -= prior.length;
+   }
+   cachedReads.set(key, bytes);
const nodejsConventionSearchSuffixes = [
  // LOAD_AS_FILE(X)
  '.js',
  '.json',
  '.node',
  // LOAD_INDEX(X)
  '/index.js',
  '/index.json',
  '/index.node',
];
Removal of '.node' and '/index.node' from nodejsConventionSearchSuffixes could break compatibility with native Node.js addons. While endo/compartment-mapper may not support native modules in compartments, this changes documented Node.js resolution behavior without explanation in the PR description or code comments. If this is intentional, add a comment explaining why native addon resolution is excluded.
const hasExplicitExtension = leaf.includes('.');
if (!endsWithSlash && hasExplicitExtension && moduleSpecifier !== '.') {
  return candidates;
The fast-path check leaf.includes('.') will incorrectly skip suffix expansion for specifiers with dots in directory names (e.g., @scope/pkg.v2/module or src/2.0/handler). This could break module resolution for packages with dots in their path components. Consider using a more precise check like verifying the extension starts after the last slash and checking against known file extensions.
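One hedged sketch of the more precise check the comment asks for, restricting the test to the final path segment and a known-extension list (names and the extension list here are illustrative, not taken from the PR):

```javascript
// Sketch: a specifier counts as "has explicit extension" only when its final
// path segment ends with a known file extension. Dots that live in directory
// names (e.g. 'src/2.0/handler', '@scope/pkg.v2/module') no longer disable
// suffix expansion.
const knownExtensions = ['.js', '.json', '.node'];
const hasExplicitKnownExtension = moduleSpecifier => {
  const lastSlash = moduleSpecifier.lastIndexOf('/');
  const leaf = moduleSpecifier.slice(lastSlash + 1);
  return knownExtensions.some(ext => leaf.endsWith(ext));
};
```

This keeps the fast path cheap (two string scans, no allocation beyond the leaf slice) while avoiding the false positives the comment describes.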
const parseBoolean = value => {
  if (value === undefined) {
    return false;
  }
  return truthy.has(value.toLowerCase());
};
Suggestion:
- const parseBoolean = value => {
-   if (value === undefined) {
-     return false;
-   }
-   return truthy.has(value.toLowerCase());
- };
+ const parseBoolean = value => truthy.has(value?.toLowerCase());
const enabled =
  profile?.enabled !== undefined
    ? profile.enabled
    : parseBoolean(env.ENDO_BUNDLE_SOURCE_PROFILE);
Suggestion:
- const enabled =
-   profile?.enabled !== undefined
-     ? profile.enabled
-     : parseBoolean(env.ENDO_BUNDLE_SOURCE_PROFILE);
+ const enabled =
+   profile?.enabled ?? parseBoolean(env.ENDO_BUNDLE_SOURCE_PROFILE);
const logToStderr = parseBoolean(env.ENDO_BUNDLE_SOURCE_PROFILE_STDERR);

if (!enabled) {
  const noop = () => {};
Is there any reason not to define noop at top level?
/**
 * @param {number} ms
 * @returns {number}
 */
const toMicros = ms => Math.round(ms * 1000);
This also seems like a top-level function.
? {
    name,
    cat: 'bundle-source',
    ph: 'X',
    ts: toMicros(startMs),
    dur: toMicros(endMs - startMs),
    pid,
    tid: 0,
    args: payload,
  }
: {
    name,
    cat: 'bundle-source',
    ph: 'X',
    ts: toMicros(startMs),
    dur: toMicros(endMs - startMs),
    pid,
    tid: 0,
  },
It looks like this is trying to conform with an external specification, in which case that should be linked to and ideally also given specific typing. But if not, let's strive for comprehensibility instead of super-abbreviated field names. Or if raw size is the predominant concern, then we'll want something array-based (cf. SenML).
 * Cache both hits and misses for module reads during a mapping run.
 * This avoids repeated filesystem probes for the same candidate path.
This seems like a good idea, but the function body doesn't implement it AFAICT (on error from maybeRead, it deletes the entry from maybeReadCache).
url = undefined,
crypto = undefined,
path = undefined,
maxConcurrentReads = 32,
/**
 * @param {string} source
 * @param {boolean} elideComments
 * @returns {boolean}
 */
const shouldRunTransform = (source, elideComments) => {
  if (elideComments) {
    return true;
  }
  // Fast path: if none of the risky comment payload tokens appear anywhere in
  // the source, the transform cannot change semantics-relevant content.
  return (
    source.includes('import(') ||
    source.includes('<!--') ||
    source.includes('-->')
  );
};
This seems very likely to conflict with #3026; if changes to evasive-transform are on the table then I think this PR should be blocked on that one and can therefore be rebased on top of it. For example, this triple-scan would probably be faster if it leveraged the introduction of evadeRegexp:
- /**
-  * @param {string} source
-  * @param {boolean} elideComments
-  * @returns {boolean}
-  */
- const shouldRunTransform = (source, elideComments) => {
-   if (elideComments) {
-     return true;
-   }
-   // Fast path: if none of the risky comment payload tokens appear anywhere in
-   // the source, the transform cannot change semantics-relevant content.
-   return (
-     source.includes('import(') ||
-     source.includes('<!--') ||
-     source.includes('-->')
-   );
- };
+ /**
+  * Efficiently classify source text as might-need-transformation vs.
+  * definitely-does-not, to avoid the cost of parsing/traversal/transformation/
+  * reserialization whenever possible. The former category includes all source
+  * text when `elideComments` is true, but otherwise includes only source text
+  * that contains a transformable substring.
+  *
+  * @param {string} source
+  * @param {boolean} elideComments
+  * @returns {boolean}
+  */
+ const shouldRunTransform = (source, elideComments) =>
+   elideComments || source.search(evadeRegexp) !== -1;
Refs: (none)
Description
This PR speeds up `@endo/bundle-source` for multi-entrypoint workloads (notably `agoric-sdk`) and improves profiling visibility for further tuning.

Primary changes:
- `packages/bundle-source/tools/profile-agoric-bundling.mts`
- `packages/bundle-source/tools/trace-merge.js`
- Graph computation (`finalizeGraph`) and canonical/read behavior:
  - `packages/compartment-mapper/src/generic-graph.js`
  - `packages/compartment-mapper/src/node-modules.js`
  - `packages/compartment-mapper/src/node-powers.js`
  - `packages/compartment-mapper/src/import-hook.js`
  - `packages/compartment-mapper/src/map-parser.js`
  - `packages/compartment-mapper/src/parse-archive-mjs.js`
  - `packages/compartment-mapper/src/archive-lite.js`
- `packages/zip/src/writer.js`
- `packages/zip/src/format-writer.js`
- `packages/zip/tools/benchmark-writer.mjs`
- `packages/bundle-source/src/zip-base64.js`

Measured result from
`yarn workspace @endo/bundle-source profile:agoric-bundling`:

- Before (`/var/folders/v2/309w97f90q5d0r47d3np62v40000gn/T/profile-agoric-bundling-2026-02-19T21-21-58-060Z`): `bundleSource.total: 13180.267ms`
- After (`/var/folders/v2/309w97f90q5d0r47d3np62v40000gn/T/profile-agoric-bundling-2026-02-19T23-00-54-856Z`): `bundleSource.total: 7538.092ms`

Total win: -42.8% (`-5642.175ms`)

Optimization contribution checkpoints:
- (`21:21:58` -> `21:33:22`): `13180.267ms -> 10021.350ms` (-23.97%)
- (`21:33:22` -> `21:53:12`): `10021.350ms -> 9884.006ms` (-1.37%)
- (`21:53:12` -> `23:00:54`): `9884.006ms -> 7538.092ms` (-23.74%)
- `3887` hits / `357` misses (~91.6% hit rate)
No new external runtime dependency is introduced.
Changes are internal performance/instrumentation improvements in bundling and zip writing code paths. New behavior is primarily caching and profiling spans. Cache scope remains process-local and only stores parser output for identical transformed source/module URL combinations.
Scaling Considerations
This PR is intended to reduce CPU and wall-clock time for repeated bundling of overlapping dependency graphs.
Tradeoffs:
- The `parse-archive-mjs` cache is bounded by an entry cap and resets when exceeding the cap.

Net effect on the profiled workload is significantly lower bundling latency.
Documentation Considerations
No end-user API change.
Operationally useful additions:
- `@endo/zip` benchmark script: `yarn workspace @endo/zip bench:writer`

No migration instructions needed.
Testing Considerations
Executed tests include:
- `yarn workspace @endo/zip test`
- `yarn workspace @endo/compartment-mapper test test/map-node-modules.test.js`
- `yarn workspace @endo/compartment-mapper test test/node-powers.test.js`
- `yarn workspace @endo/compartment-mapper test test/parse-archive-mjs.test.js`
- `yarn workspace @endo/compartment-mapper test test/bundle.test.js` (known expected failure remains)
- `yarn workspace @endo/bundle-source test test/profiling.test.js`
- `yarn workspace @endo/bundle-source test test/comment.test.js`
- `yarn workspace @endo/bundle-source profile:agoric-bundling`

Compatibility Considerations
No intentional breaking changes.
Behavior should remain compatible; changes are performance-oriented and profiling-oriented with conservative cache bypass conditions where side effects may exist.
Upgrade Considerations
No special upgrade action required.
No `*BREAKING*` change in commit message and no `NEWS.md` entry added, since this is internal perf and tooling behavior.