Use database of partial paths to speed up bindings resolution #1198

ggiraldez · 2024-12-19T21:36:41Z

Builds on top of #1195

Resolve all references at once using a database of minimal partial paths. This speeds up resolution considerably (since it avoids a lot of duplicated work) at the expense of higher memory consumption.
Change Definition and Reference to hold an Rc<> to the BindingGraph as opposed to a normal reference. This should allow easier integration with WASM since there are already wrappers for reference-counted objects.
Split the implementation of BindingGraph by splitting off a BindingGraphBuilder, to which user files and built-ins are added. Calling resolve() then consumes the builder and returns a leaner BindingGraph with all bindings resolved.
The changes here require using our fork of stack-graphs which adds the ability to rewind the arena allocator used for partial paths after resolving each reference, but still allows using the default database to hold the set of minimal partial paths.
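As a rough illustration of the builder/graph split and the Rc<> ownership described above, here is a minimal sketch. It is not the code from this PR: the fields, the `add_definition`/`add_reference` helpers and the name-matching "resolution" are simplified assumptions; only the names BindingGraphBuilder, BindingGraph, Definition, Reference, resolve() and definitions() come from the description and commit list.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical builder: collects input and cannot be queried for bindings.
pub struct BindingGraphBuilder {
    definitions: Vec<String>,
    references: Vec<String>,
}

/// Hypothetical resolved graph: holds only what is needed after resolution.
pub struct BindingGraph {
    definitions: Vec<String>,
    references: Vec<String>,
    /// Reference index -> indices of all definitions it resolves to.
    resolved: HashMap<usize, Vec<usize>>,
}

/// Handles keep the graph alive through an Rc instead of borrowing it,
/// so they carry no lifetime parameter.
pub struct Definition {
    owner: Rc<BindingGraph>,
    index: usize,
}

pub struct Reference {
    owner: Rc<BindingGraph>,
    index: usize,
}

impl BindingGraphBuilder {
    pub fn new() -> Self {
        Self { definitions: Vec::new(), references: Vec::new() }
    }

    pub fn add_definition(&mut self, name: &str) {
        self.definitions.push(name.to_owned());
    }

    pub fn add_reference(&mut self, name: &str) {
        self.references.push(name.to_owned());
    }

    /// Consumes the builder and resolves every reference in one pass.
    /// Matching by name is a stand-in for the real path-stitching logic.
    pub fn resolve(self) -> Rc<BindingGraph> {
        let mut resolved = HashMap::new();
        for (ref_index, name) in self.references.iter().enumerate() {
            let targets: Vec<usize> = self
                .definitions
                .iter()
                .enumerate()
                .filter(|(_, def)| *def == name)
                .map(|(def_index, _)| def_index)
                .collect();
            resolved.insert(ref_index, targets);
        }
        Rc::new(BindingGraph {
            definitions: self.definitions,
            references: self.references,
            resolved,
        })
    }
}

impl BindingGraph {
    pub fn references(self: &Rc<Self>) -> Vec<Reference> {
        (0..self.references.len())
            .map(|index| Reference { owner: Rc::clone(self), index })
            .collect()
    }
}

impl Reference {
    pub fn name(&self) -> &str {
        &self.owner.references[self.index]
    }

    /// Returns every candidate definition; ambiguity is not ranked away.
    pub fn definitions(&self) -> Vec<Definition> {
        self.owner.resolved[&self.index]
            .iter()
            .map(|&index| Definition { owner: Rc::clone(&self.owner), index })
            .collect()
    }
}

impl Definition {
    pub fn name(&self) -> &str {
        &self.owner.definitions[self.index]
    }
}
```

Because resolve() takes `self` by value, the builder is gone once resolution has run, and a Definition or Reference can only be obtained from the resolved graph.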


changeset-bot · 2024-12-19T21:36:45Z

⚠️ No Changeset found

Latest commit: 1b36243

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


OmarTawfik · 2024-12-20T19:10:08Z

crates/codegen/runtime/cargo/crate/src/extensions/bindings/mod.rs

@@ -1,11 +1,11 @@
 use semver::Version;


> which will consume the builder and return a leaner BindingGraph with all bindings resolved.

I assume that means speeding up resolving (all) defs/refs in the file, at the expense of a slower initialization time. Is that correct? Do we have rough figures on how much this is changing? Or a benchmark run for before/after?

> at the expense of higher memory consumption

Do we have a rough figure of the increased memory as well?

I queued a run for it now: https://github.com/NomicFoundation/slang/actions/runs/12438290472/

> I assume that means speeding up resolving (all) defs/refs in the file, at the expense of a slower initialization time. Is that correct? Do we have rough figures on how much this is changing? or a benchmark run for before/after?

Yes, that's the expectation. I ran a couple of sanctuary shards locally with

infra run --bin solidity_testing_sanctuary --release -- test --shards-count 256 --shard-index INDEX --check-bindings ethereum mainnet

and while this is not exhaustive by any means, the results are quite significant:

For INDEX = 1, total execution time went down from 3'32" to 1'40"

For INDEX = 120, total execution time went down from 4'24" to 1'26"

I expect similar results for other shards. Overall, for very small contracts we may see a slight increase in time due to the overhead of creating the database, initial population and the increased number of memory allocations. But I expect the overhead to be quickly amortized for larger contracts.
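To put the two shard timings above in relative terms: 3'32" to 1'40" is 212 s to 100 s, roughly a 2.1x speedup, and 4'24" to 1'26" is 264 s to 86 s, roughly 3.1x. The helper below is only a sketch for reproducing that arithmetic; the function names are made up for illustration and are not part of the test tooling.

```rust
/// Parses a duration written as M'SS" (for example 3'32") into seconds.
fn parse_min_sec(text: &str) -> Option<u32> {
    let (minutes, rest) = text.split_once('\'')?;
    let seconds = rest.strip_suffix('"')?;
    let minutes: u32 = minutes.trim().parse().ok()?;
    let seconds: u32 = seconds.trim().parse().ok()?;
    Some(minutes * 60 + seconds)
}

/// Ratio of the old running time to the new one.
fn speedup(before: &str, after: &str) -> Option<f64> {
    let before = parse_min_sec(before)?;
    let after = parse_min_sec(after)?;
    if after == 0 {
        return None;
    }
    Some(f64::from(before) / f64::from(after))
}
```

These ratios cover only two of the 256 shards, so they are indicative rather than a benchmark.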

> at the expense of higher memory consumption
>
> Do we have a rough figure of the increased memory as well?

This is tough to estimate because it should vary with contract complexity. Empirically I've seen peak memory to be twice as large when using the database. YMMV.

> I queued a run for it now: https://github.com/NomicFoundation/slang/actions/runs/12438290472/

I've seen the results, and we may need to modify the structure of the test slightly. binding_graph_builder.resolve() is called during the definitions test, because in order to access the definitions the bindings need to be resolved already. That means all the cost of resolution is added to the definitions test, while previously it was tallied in the references test (which now has negligible cost).

> and while this is not exhaustive by any means, the results are quite significant:

Looks great!

> I've seen the results, and we may need to modify the structure of the test slightly.

Should we modify it in the same PR, to make sure that the benchmark results are reported correctly for this commit?

I updated the tests and moved the call to .resolve() into the references test, so all resolution happens in that last test. The other test name, definitions, is now a bit misleading though, as only ingestion of user source files happens at that stage. But execution costs should be comparable.

OmarTawfik · 2024-12-20T19:13:48Z

scripts/_common.sh

@@ -46,3 +46,9 @@ if ! output=$(

  exit 1
 fi
+
+if [[ ! -f submodules/stack-graphs/stack-graphs/Cargo.toml ]]; then


This should already be taken care of by the infra setup git command.

This doesn't work, because to build infra we need to have all the dependencies available, and stack-graphs being in a submodule means they're not.

I see. Thanks for explaining!

OmarTawfik · 2024-12-20T19:16:52Z

Cargo.toml

@@ -130,7 +130,7 @@ serde = { version = "1.0.216", features = ["derive", "rc"] }
 serde_json = { version = "1.0.133", features = ["preserve_order"] }
 similar-asserts = { version = "1.6.0" }
 smallvec = { version = "1.7.0", features = ["union"] }
-stack-graphs = { version = "0.13.0" }
+stack-graphs = { path = "submodules/stack-graphs/stack-graphs", version = "0.14.0" }


If we only need the Cargo reference to build this, without any infra/pre-build steps, I wonder why we are adding the submodule altogether?

We can just add the crate as a direct git reference, and it will be cloned/built automatically by Cargo:

Suggested change

-stack-graphs = { path = "submodules/stack-graphs/stack-graphs", version = "0.14.0" }
+stack-graphs = { git = "https://github.com/NomicFoundation/stack-graphs", rev = "SPECIFIC_REF_TO_UPDATE_TO" }

Oh, that's a good idea. And it would handle the previous comment as well.

OmarTawfik · 2024-12-20T19:25:47Z

Cargo.toml

@@ -130,7 +130,7 @@ serde = { version = "1.0.216", features = ["derive", "rc"] }
 serde_json = { version = "1.0.133", features = ["preserve_order"] }
 similar-asserts = { version = "1.6.0" }
 smallvec = { version = "1.7.0", features = ["union"] }
-stack-graphs = { version = "0.13.0" }
+stack-graphs = { path = "submodules/stack-graphs/stack-graphs", version = "0.14.0" }
 string-interner = { version = "0.17.0", features = [
    "std",


Since we are forking/editing NomicFoundation/stack-graphs, I suggest making a few changes there first:

Keeping the main branch pure for upstream changes.

Adding a nomic branch that contains both upstream and our changes. We can regularly merge changes from main into it.

Sending PR(s) to the nomic branch with the intended changes.

This will make sure at least one person reviews the changes there, and that the fork is kept up to date with, and separate from, upstream.
To help with this, I'm creating the nomic branch now, and will add CI checks/validation to it, so you can just send the PR.

Sounds good. I'll set this up.

Here's the PR. Since we're no longer using the crate as a submodule, we can also remove the linter/formatting configuration options that we had added to ignore warnings.

OmarTawfik · 2024-12-20T19:35:04Z

Cargo.toml

@@ -130,7 +130,7 @@ serde = { version = "1.0.216", features = ["derive", "rc"] }
 serde_json = { version = "1.0.133", features = ["preserve_order"] }
 similar-asserts = { version = "1.6.0" }
 smallvec = { version = "1.7.0", features = ["union"] }
-stack-graphs = { version = "0.13.0" }
+stack-graphs = { path = "submodules/stack-graphs/stack-graphs", version = "0.14.0" }
 string-interner = { version = "0.17.0", features = [


> the ability to rewind the arena allocator used for partial paths after resolving each reference

Do you think this would be useful for the upstream project? Maybe we could suggest it as a PR, in case they accept it? Then we wouldn't have to maintain the fork at all.

We can try, but I doubt it's useful for their normal use cases. The main problem is it's not exactly safe, since you have no direct control over the mutability of the database and a mutable reference is required to do anything meaningful with it. That means you can accidentally allocate new objects in the partial paths arena (which are invalidated when you reset) that you'll be referencing in the database.

I think it may be possible to change the design to take an immutable database reference (inside stack-graphs), but it's probably a much bigger change.
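The hazard described above can be shown with a toy arena. This is only a sketch under stated assumptions: `Arena`, `Handle`, `Checkpoint` and `Database` are hypothetical types, and the index-based rewind is a simplification; only the method names `save_checkpoint` and `restore_checkpoint` echo the fork's additions mentioned in this PR, and the real signatures may differ.

```rust
/// Toy bump arena: a handle is just an index into the backing vector.
pub struct Arena<T> {
    items: Vec<T>,
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Handle(usize);

#[derive(Clone, Copy)]
pub struct Checkpoint(usize);

impl<T> Arena<T> {
    pub fn new() -> Self {
        Self { items: Vec::new() }
    }

    pub fn alloc(&mut self, value: T) -> Handle {
        self.items.push(value);
        Handle(self.items.len() - 1)
    }

    pub fn get(&self, handle: Handle) -> Option<&T> {
        self.items.get(handle.0)
    }

    /// Remembers the current allocation position.
    pub fn save_checkpoint(&self) -> Checkpoint {
        Checkpoint(self.items.len())
    }

    /// Rewinds the arena; everything allocated after the checkpoint is gone.
    pub fn restore_checkpoint(&mut self, checkpoint: Checkpoint) {
        self.items.truncate(checkpoint.0);
    }
}

/// Toy database that stores handles into the arena.
pub struct Database {
    stored: Vec<Handle>,
}

pub fn stale_handle_demo() -> (Option<&'static str>, Option<&'static str>) {
    let mut arena: Arena<&'static str> = Arena::new();
    let mut database = Database { stored: Vec::new() };

    // Populating the database before the checkpoint is safe.
    database.stored.push(arena.alloc("minimal partial path"));
    let checkpoint = arena.save_checkpoint();

    // Scratch work for one reference accidentally mutates the database.
    database.stored.push(arena.alloc("scratch path"));
    arena.restore_checkpoint(checkpoint);

    // A later allocation reuses the slot the stale handle points to.
    arena.alloc("unrelated path");

    let safe = arena.get(database.stored[0]).copied();
    let stale = arena.get(database.stored[1]).copied();
    (safe, stale)
}
```

In this toy version the stale handle silently resolves to unrelated data instead of failing, which is the kind of mistake a mutable database reference makes easy to commit.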

OmarTawfik · 2024-12-20T19:48:19Z

crates/metaslang/bindings/src/lib.rs

    parents: Vec<GraphHandle>,
 }

-pub struct BindingGraph<KT: KindTypes + 'static> {
+pub struct BindingGraphBuilder<KT: KindTypes + 'static> {


I'm a bit confused by the hierarchy here:

BindingGraphBuilder is exposed from lib.rs, and is different than Builder, which is exposed from builder/mod.rs.

BindingGraph is exposed from resolved/mod.rs, and is different than Graph, which is exposed from metaslang_graph_builder::graph.

WDYT of restructuring it a bit to clarify the relationships between them? If I can suggest, ordering it by the public API/use cases:

builder/mod.rs exposes the public BindingGraphBuilder:

Has the internal Builder and Resolver under it.

graph/mod.rs exposes the public BindingGraph, and the related public APIs, like:

graph/definition.rs

graph/reference.rs

graph/location.rs

Not blocking for this PR of course.

Yeah, I'm not happy about the module structure either. What we currently have in the builder module should probably be called loader, since it builds a graph and loads it into our stack graph. Then the resolver and BindingGraphBuilder could live in a builder module.

I reorganized the code and put all the builder, resolver and loader code under a builder module.
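For readers trying to picture the layout being discussed, here is a hypothetical skeleton using inline modules. Only the module names builder, loader and resolver and the type names BindingGraphBuilder and BindingGraph come from this thread; the graph module follows the reviewer's suggestion, and every method and field is invented for illustration.

```rust
pub mod graph {
    /// Public, resolved result.
    pub struct BindingGraph {
        pub(super) nodes: Vec<String>,
    }

    impl BindingGraph {
        pub fn node_count(&self) -> usize {
            self.nodes.len()
        }
    }
}

pub mod builder {
    use super::graph::BindingGraph;

    // Internal helpers stay private to the builder module.
    mod loader {
        pub(super) struct Loader;

        impl Loader {
            /// Stand-in for loading a source file into the stack graph.
            pub(super) fn load(&self, source: &str) -> Vec<String> {
                source.split_whitespace().map(str::to_owned).collect()
            }
        }
    }

    mod resolver {
        pub(super) struct Resolver;

        impl Resolver {
            /// Stand-in for resolution: here it only removes duplicates.
            pub(super) fn resolve(&self, mut nodes: Vec<String>) -> Vec<String> {
                nodes.sort();
                nodes.dedup();
                nodes
            }
        }
    }

    /// The only public entry point of the module.
    pub struct BindingGraphBuilder {
        nodes: Vec<String>,
    }

    impl BindingGraphBuilder {
        pub fn new() -> Self {
            Self { nodes: Vec::new() }
        }

        pub fn add_user_file(&mut self, source: &str) {
            self.nodes.extend(loader::Loader.load(source));
        }

        pub fn resolve(self) -> BindingGraph {
            BindingGraph { nodes: resolver::Resolver.resolve(self.nodes) }
        }
    }
}
```

With this shape, callers see only BindingGraphBuilder and BindingGraph, while the loader and resolver remain implementation details.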

OmarTawfik · 2024-12-20T19:52:28Z

crates/codegen/runtime/cargo/wasm/src/runtime/wrappers/bindings/mod.rs

@@ -25,8 +25,8 @@ mod rust {
        pub definiens_location: BindingLocation,
    }

-    impl From<crate::rust_crate::bindings::Definition<'_>> for Definition {
-        fn from(definition: crate::rust_crate::bindings::Definition<'_>) -> Self {
+    impl From<crate::rust_crate::bindings::Definition> for Definition {


> This should allow easier integration with WASM since there are already wrappers for ref counted objects.

Given this, I don't think we need these Definition/Reference types here any longer, and can just reuse the types you added in resolved/mod.rs.

I removed both wrapper classes and changed the code to use the metaslang_bindings Definition and Reference.

This uses the added `save_checkpoint`/`restore_checkpoint` which rewind the allocation pointer in the `PartialPaths` arenas. For this to work properly, we also first `ensure_both_directions` in the database so that after that it doesn't need further mutation.


ggiraldez · 2024-12-23T22:57:28Z

After #1195 is merged, I'll rebase this PR and the conflicts should be resolved.


ggiraldez added 21 commits December 17, 2024 16:46

Remove bindings target context {set,get}_context methods

2117d83

Use Reference::definitions() instead of resolve_definition()

76501df

This results in some references being resolved to many ambiguous definitions, some of which we were able to resolve via ranking.

Migrate some binding assertions tests to snapshots

9ed010d

Migrate binding assertions for constants, control and arrays to snapshots

cd7eb08

Migrate events and errors binding assertions tests to snapshots

ffc1c19

Migrate functions/function types/scoping/imports bindings tests from assertions to snapshots

c7113d7

Finish migration of bindings assertions to snapshots

6cc223c

Most remaining assertion tests were redundant as there were already snapshots that cover those cases.

Remove bindings assertions completely

5f5575b

Remove parsing of context in bindings test files

e695f07

Update public_api.txt

0fbd092

Remove resolve_definition() in favor of definitions()

862f382

This also removes the ranking algorithm for resolution results, since it's no longer needed.

Remove C3 linearization algorithm implementation

a33b0e7

Remove tag graph attribute

009e9f3

Update perf tests resolved reference count

02905e4

Formatting fixes

cb802ad

Mark extension hooks and scopes with is_exported and is_endpoint attributes

842ba72

This *should* make it easier to construct a partial paths database in which these nodes are endpoints.

Basic resolver using a database of minimal partial paths

d403c52

Disable graph debugging info (improving processing times)

b66e21e

Add method to resolve all at once and save results

c59033b

Perform scope extension with a database of partial paths

44ee978

Fix linter warnings

78ad197

ggiraldez requested review from a team as code owners December 19, 2024 21:36

ggiraldez force-pushed the hooks-database-stitching branch 2 times, most recently from 5d3a2e6 to 156722b, December 19, 2024 23:10

OmarTawfik reviewed Dec 20, 2024


ggiraldez added 10 commits December 23, 2024 16:02

Encapsulate building the database when constructing the `DatabaseResolver`

ad0ec96

Refactor: move most reference/definition getters to the Bindings owner

73c870e

Expose DatabaseResolver type as public

3acf4ce

Update public_api.txt

db02560

Replace old resolver with database resolver

25239f1

The database resolver will resolve all references at once by using a database of minimal partial paths.

Remove obsolete ResolutionError

e90a6e5

Definition and Reference take an Rc<> instead of a reference to the `BindingGraph`

9b1cc3f

Add a bit of documentation

203eda7

Split builder part of BindingGraph to use before calling resolve()

8c55602

This makes it impossible to try to access definitions/references before resolving, and allows dropping the entire stack graph and database of partial paths used for resolution after they are no longer necessary.

ggiraldez force-pushed the hooks-database-stitching branch from 15f35da to 8c55602, December 23, 2024 21:16

ggiraldez added 3 commits December 23, 2024 16:51

Rename bindings Builder to Loader

70e9607

Refactor: move bindings builder into its own module along with resolver and loader

edf12f2

Remove wrappers for Definition/Reference used in WASM

7cc333c

ggiraldez added 3 commits December 24, 2024 11:09

Update stack-graphs dependency to the nomic branch of our fork

bff3daa

Refactor performance to try to keep consistent with previous bencher records

4b18696

The `definitions` test name is now a bit misleading since no definitions are retrieved there, but it's still where user source files are ingested.

Update public_api.txt

1b36243
