Use database of partial paths to speed up bindings resolution #1198
This results in some references being resolved to many ambiguous definitions, some of which we were able to resolve via ranking.
…assertions to snapshots
Most remaining assertion tests were redundant as there were already snapshots that cover those cases.
This also removes the ranking algorithm for resolution results, since it's no longer needed.
…attributes This *should* make it easier to construct a partial paths database in which these nodes are endpoints.
5d3a2e6 to 156722b
@@ -1,11 +1,11 @@
use semver::Version;
> which will consume the builder and return a leaner BindingGraph with all bindings resolved.

I assume that means speeding up resolving (all) defs/refs in the file, at the expense of a slower initialization time. Is that correct? Do we have rough figures on how much this is changing, or a benchmark run for before/after?

> at the expense of higher memory consumption

Do we have a rough figure of the increased memory as well?
I queued a run for it now: https://github.com/NomicFoundation/slang/actions/runs/12438290472/
> I assume that means speeding up resolving (all) defs/refs in the file, at the expense of a slower initialization time. Is that correct? Do we have rough figures on how much this is changing? or a benchmark run for before/after?

Yes, that's the expectation. I ran a couple of sanctuary shards locally with
`infra run --bin solidity_testing_sanctuary --release -- test --shards-count 256 --shard-index INDEX --check-bindings ethereum mainnet`
and while this is not exhaustive by any means, the results are quite significant:
- For `INDEX = 1`, total execution time went down from 3'32" to 1'40"
- For `INDEX = 120`, total execution time went down from 4'24" to 1'26"
I expect similar results for other shards. Overall, for very small contracts we may see a slight increase in time due to the overhead of creating the database, initial population and the increased number of memory allocations. But I expect the overhead to be quickly amortized for larger contracts.
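For reference, the two timings above work out to roughly a 2.1x and a 3.1x speedup. A minimal sketch of that arithmetic follows; the helper names are made up for illustration and are not part of slang.

```rust
/// Converts a duration written as minutes'seconds" into whole seconds.
fn to_seconds(minutes: u32, seconds: u32) -> u32 {
    minutes * 60 + seconds
}

/// Ratio of the old running time to the new one (values above 1.0 mean faster).
fn speedup(before_secs: u32, after_secs: u32) -> f64 {
    f64::from(before_secs) / f64::from(after_secs)
}

/// Speedups for the two shards quoted above, in the order INDEX = 1, INDEX = 120.
fn shard_speedups() -> [f64; 2] {
    [
        speedup(to_seconds(3, 32), to_seconds(1, 40)), // 212 s -> 100 s
        speedup(to_seconds(4, 24), to_seconds(1, 26)), // 264 s -> 86 s
    ]
}
```

These are only two shards out of 256, so the ratios say nothing yet about the small-contract case where the database overhead may dominate.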
> at the expense of higher memory consumption
> Do we have a rough figure of the increased memory as well?

This is tough to estimate because it should vary with contract complexity. Empirically I've seen peak memory to be twice as large when using the database. YMMV.
> I queued a run for it now: https://github.com/NomicFoundation/slang/actions/runs/12438290472/

I've seen the results, and we may need to modify the structure of the test slightly. `binding_graph_builder.resolve()` is called during the `definitions` test, because in order to access the definitions the bindings need to be resolved already. That means all the cost of resolution is added to the `definitions` test, while previously it was tallied in the `references` test (which now has negligible cost).
> and while this is not exhaustive by any means, the results are quite significant:

Looks great!

> I've seen the results, and we may need to modify the structure of the test slightly.

Should we modify it in the same PR, to make sure that the benchmark results are reported correctly for this commit?
I updated the tests and moved the call to `.resolve()` into the `references` test. So all resolution happens in that last test. The other name, `definitions`, is now a bit misleading though, as only ingestion of user source files happens at that stage. But execution costs should be comparable.
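To illustrate the idea of tallying each phase under its own name, here is a rough sketch that times ingestion and resolution separately. It is not the actual benchmark code: `ingest`, `resolve` and `run_phases` are hypothetical stand-ins, and only `std::time` is real.

```rust
use std::time::{Duration, Instant};

/// Runs `phase` once and returns its result together with the elapsed wall-clock time.
fn timed<T>(phase: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let result = phase();
    (result, start.elapsed())
}

/// Hypothetical stand-in for ingesting user files: it only collects the file names.
fn ingest(files: &[&str]) -> Vec<String> {
    files.iter().map(|f| f.to_string()).collect()
}

/// Hypothetical stand-in for `.resolve()`: it only counts the ingested files.
fn resolve(ingested: Vec<String>) -> usize {
    ingested.len()
}

/// Measures the two phases separately so each cost is reported under its own name.
fn run_phases(files: &[&str]) -> (usize, Duration, Duration) {
    let (ingested, ingestion_time) = timed(|| ingest(files));
    let (resolved, resolution_time) = timed(|| resolve(ingested));
    (resolved, ingestion_time, resolution_time)
}
```

With this split, moving the resolution call from one phase to the other changes which timer absorbs the cost, which is the effect described above.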
scripts/_common.sh
Outdated
@@ -46,3 +46,9 @@ if ! output=$(

exit 1
fi

if [[ ! -f submodules/stack-graphs/stack-graphs/Cargo.toml ]]; then
This should already be taken care of by the `infra setup git` command.
This doesn't work, because to build `infra` we need to have all the dependencies available, and `stack-graphs` being in a submodule means it isn't available yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Thanks for explaining!
Cargo.toml
Outdated
@@ -130,7 +130,7 @@ serde = { version = "1.0.216", features = ["derive", "rc"] }
serde_json = { version = "1.0.133", features = ["preserve_order"] }
similar-asserts = { version = "1.6.0" }
smallvec = { version = "1.7.0", features = ["union"] }
stack-graphs = { version = "0.13.0" }
stack-graphs = { path = "submodules/stack-graphs/stack-graphs", version = "0.14.0" }
If we only need the Cargo reference to build this, without any infra/pre-build steps, I wonder why we are adding the submodule altogether?
We can just add the crate as a direct `git` reference, and it will be cloned/built automatically by `Cargo`:
stack-graphs = { path = "submodules/stack-graphs/stack-graphs", version = "0.14.0" }
stack-graphs = { git = "https://github.com/NomicFoundation/stack-graphs", rev = "SPECIFIC_REF_TO_UPDATE_TO" }
Oh, that's a good idea. And it would handle the previous comment as well.
@@ -130,7 +130,7 @@ serde = { version = "1.0.216", features = ["derive", "rc"] }
serde_json = { version = "1.0.133", features = ["preserve_order"] }
similar-asserts = { version = "1.6.0" }
smallvec = { version = "1.7.0", features = ["union"] }
stack-graphs = { version = "0.13.0" }
stack-graphs = { path = "submodules/stack-graphs/stack-graphs", version = "0.14.0" }
string-interner = { version = "0.17.0", features = [
"std",
Since we are forking/editing `NomicFoundation/stack-graphs`, I suggest doing a few changes there first:
- keeping the `main` branch pure for upstream changes.
- adding a `nomic` branch that contains both upstream+our changes. We can regularly merge changes from `main` to it.
- send PR(s) to the `nomic` branch with the intended changes.

This will make sure at least one person reviews the changes there, and that it is kept up to date/separate from upstream.
To help with this, I'm creating the `nomic` branch now, and will add CI checks/validation to it, so you can just send the PR.
Sounds good. I'll set this up.
Here's the PR. Since we're no longer using the crate as a submodule, we can also remove the linter/formatting configuration options that we had added to ignore warnings.
@@ -130,7 +130,7 @@ serde = { version = "1.0.216", features = ["derive", "rc"] }
serde_json = { version = "1.0.133", features = ["preserve_order"] }
similar-asserts = { version = "1.6.0" }
smallvec = { version = "1.7.0", features = ["union"] }
stack-graphs = { version = "0.13.0" }
stack-graphs = { path = "submodules/stack-graphs/stack-graphs", version = "0.14.0" }
string-interner = { version = "0.17.0", features = [
> the ability to rewind the arena allocator used for partial paths after resolving each reference

Do you think this would be useful for the upstream project? Maybe we could suggest it as a PR, in case they accept it? Then we don't have to maintain the fork at all.
We can try, but I doubt it's useful for their normal use cases. The main problem is it's not exactly safe, since you have no direct control over the mutability of the database and a mutable reference is required to do anything meaningful with it. That means you can accidentally allocate new objects in the partial paths arena (which are invalidated when you reset) that you'll be referencing in the database.
I think it may be possible to change the design to take an immutable database reference (inside `stack-graphs`), but it's probably a much bigger change.
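A minimal sketch of the hazard, using a toy `Vec`-backed arena rather than the real `stack-graphs` types. The method names mirror the `save_checkpoint`/`restore_checkpoint` pair from this PR, but the signatures and everything else here are assumptions made for illustration.

```rust
/// A toy bump arena: values are appended and addressed by index handles.
struct Arena<T> {
    items: Vec<T>,
}

/// Index into an `Arena`; it carries no lifetime, so nothing stops it outliving a rewind.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Handle(usize);

/// Remembers how many items the arena held when the checkpoint was taken.
struct Checkpoint(usize);

impl<T> Arena<T> {
    fn new() -> Self {
        Arena { items: Vec::new() }
    }

    fn alloc(&mut self, value: T) -> Handle {
        self.items.push(value);
        Handle(self.items.len() - 1)
    }

    fn get(&self, handle: Handle) -> Option<&T> {
        self.items.get(handle.0)
    }

    fn save_checkpoint(&self) -> Checkpoint {
        Checkpoint(self.items.len())
    }

    /// Rewinds the arena: everything allocated after the checkpoint is discarded.
    fn restore_checkpoint(&mut self, checkpoint: Checkpoint) {
        self.items.truncate(checkpoint.0);
    }
}

/// Shows the hazard: a handle stored elsewhere (standing in for the database)
/// silently changes meaning once the arena is rewound and reused.
fn stale_handle_demo() -> (Option<&'static str>, Option<&'static str>) {
    let mut arena = Arena::new();
    arena.alloc("long-lived path");
    let checkpoint = arena.save_checkpoint();

    // Imagine the database allocating this as a side effect of a `&mut` call.
    let kept_by_database = arena.alloc("temporary path");
    arena.restore_checkpoint(checkpoint);
    let after_rewind = arena.get(kept_by_database).copied();

    // The next resolution reuses the slot, so the old handle now points elsewhere.
    arena.alloc("unrelated path");
    let after_reuse = arena.get(kept_by_database).copied();
    (after_rewind, after_reuse)
}
```

In this toy version the stale handle is at least bounds-checked; the point is only that the type system does not prevent keeping it around after the rewind.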
crates/metaslang/bindings/src/lib.rs
Outdated
parents: Vec<GraphHandle>,
}

pub struct BindingGraph<KT: KindTypes + 'static> {
pub struct BindingGraphBuilder<KT: KindTypes + 'static> {
I'm a bit confused by the hierarchy here:
- `BindingGraphBuilder` is exposed from `lib.rs`, and is different than `Builder`, which is exposed from `builder/mod.rs`.
- `BindingGraph` is exposed from `resolved/mod.rs`, and is different than `Graph`, which is exposed from `metaslang_graph_builder::graph`.

WDYT of restructuring it a bit to clarify the relationships between them? If I can suggest, ordering it by the public API/use cases:
- `builder/mod.rs` exposes the public `BindingGraphBuilder`:
  - Has the internal `Builder` and `Resolver` under it.
- `graph/mod.rs` exposes the public `BindingGraph`, and the related public APIs, like:
  - `graph/definition.rs`
  - `graph/reference.rs`
  - `graph/location.rs`

Not blocking for this PR of course.
Yeah, I'm not happy about the module structure either. What we currently have in the `builder` module should probably be called `loader`, since it builds a graph and loads it into our stack graph. Then the `resolver` and `BindingGraphBuilder` could live in a `builder` module.
I reorganized the code and put all the builder, resolver and loader code under a `builder` module.
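A rough sketch of what such a layout could look like, written with inline modules so it fits in one file. The function bodies are placeholders, and `load`, `add_user_file` and `layout_demo` are hypothetical names; only the nesting and visibility are the point.

```rust
/// Everything needed to construct a graph lives under one `builder` module.
mod builder {
    /// Loads source text into the underlying graph (internal to `builder`).
    mod loader {
        pub fn load(source: &str) -> usize {
            // Placeholder: pretend each line contributes one node.
            source.lines().count()
        }
    }

    /// Resolves references once everything is loaded (internal to `builder`).
    mod resolver {
        pub fn resolve(loaded_nodes: usize) -> usize {
            // Placeholder: pretend every loaded node resolves.
            loaded_nodes
        }
    }

    /// The only item visible outside the `builder` module.
    pub struct BindingGraphBuilder {
        loaded_nodes: usize,
    }

    impl BindingGraphBuilder {
        pub fn new() -> Self {
            Self { loaded_nodes: 0 }
        }

        pub fn add_user_file(&mut self, source: &str) {
            self.loaded_nodes += loader::load(source);
        }

        pub fn resolve(self) -> usize {
            resolver::resolve(self.loaded_nodes)
        }
    }
}

/// Callers outside `builder` can only reach `BindingGraphBuilder`.
fn layout_demo() -> usize {
    let mut graph_builder = builder::BindingGraphBuilder::new();
    graph_builder.add_user_file("contract A {}\ncontract B {}");
    graph_builder.resolve()
}
```

Because `loader` and `resolver` are private modules, code outside `builder` cannot name them, which keeps the public surface down to the builder type.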
@@ -25,8 +25,8 @@ mod rust {
pub definiens_location: BindingLocation,
}

impl From<crate::rust_crate::bindings::Definition<'_>> for Definition {
fn from(definition: crate::rust_crate::bindings::Definition<'_>) -> Self {
impl From<crate::rust_crate::bindings::Definition> for Definition {
> This should allow easier integration with WASM since there are already wrappers for ref counted objects.

Given this, I don't think we need these `Definition`/`Reference` types here any longer, and can just reuse the types you added in `resolved/mod.rs`.
I removed both wrapper classes and changed the code to use the `metaslang_bindings` `Definition` and `Reference`.
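For context, a minimal sketch of why an `Rc`-holding handle no longer needs a lifetime parameter. The types below are simplified stand-ins and not the actual `metaslang_bindings` definitions; the field and function names are assumptions.

```rust
use std::rc::Rc;

/// Hypothetical stand-in for the resolved graph: just a list of definition names.
struct BindingGraph {
    names: Vec<String>,
}

/// A handle that co-owns the graph through `Rc`, so it needs no lifetime parameter.
#[derive(Clone)]
struct Definition {
    owner: Rc<BindingGraph>,
    index: usize,
}

impl Definition {
    fn name(&self) -> &str {
        &self.owner.names[self.index]
    }
}

/// Looks up a definition by position, handing out a shared-ownership handle.
fn definition_at(graph: &Rc<BindingGraph>, index: usize) -> Option<Definition> {
    (index < graph.names.len()).then(|| Definition {
        owner: Rc::clone(graph),
        index,
    })
}

/// The handle stays valid after the caller's own `Rc` is dropped.
fn handle_outlives_caller() -> String {
    let graph = Rc::new(BindingGraph {
        names: vec!["Counter".to_string(), "increment".to_string()],
    });
    let definition = definition_at(&graph, 1).expect("index 1 exists");
    drop(graph);
    definition.name().to_string()
}
```

A handle like this can be passed across an FFI or WASM boundary as an owned value, at the cost of one reference-count increment per handle.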
This uses the added `save_checkpoint`/`restore_checkpoint` which rewind the allocation pointer in the `PartialPaths` arenas. For this to work properly, we also first `ensure_both_directions` in the database so that after that it doesn't need further mutation.
The database resolver will resolve all references at once by using a database of minimal partial paths.
… the `BindingGraph`
This makes it impossible to try to access definitions/references before resolving, and allows dropping the entire stack graph and database of partial paths used for resolution after they are no longer necessary.
15f35da to 8c55602
After #1195 is merged, I'll rebase this PR and the conflicts should be resolved.
…records The `definitions` test name is now a bit misleading since no definitions are retrieved there, but it's still where user source files are ingested.
Builds on top of #1195
- … `Definition` and `Reference` to hold an `Rc<>` to the `BindingGraph` as opposed to a normal reference. This should allow easier integration with WASM since there are already wrappers for ref counted objects.
- … `BindingGraph` and split off a `BindingGraphBuilder` in which to add user files and built-ins and then call `resolve()`, which will consume the builder and return a leaner `BindingGraph` with all bindings resolved.
- … `stack-graphs` which adds the ability to rewind the arena allocator used for partial paths after resolving each reference, but still allows using the default database to hold the set of minimal partial paths.
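The consuming `resolve()` described above can be sketched roughly as follows. This is a simplified illustration, not the real API: apart from `resolve()`, the method names are hypothetical, and name matching stands in for actual stack-graph resolution.

```rust
use std::collections::HashMap;

/// Hypothetical builder: owns the intermediate state needed only during resolution.
struct BindingGraphBuilder {
    definitions: Vec<String>,
    references: Vec<String>,
}

/// Lean result: only the resolved bindings survive.
struct BindingGraph {
    resolved: HashMap<String, usize>,
}

impl BindingGraphBuilder {
    fn new() -> Self {
        Self { definitions: Vec::new(), references: Vec::new() }
    }

    fn add_definition(&mut self, name: &str) {
        self.definitions.push(name.to_string());
    }

    fn add_reference(&mut self, name: &str) {
        self.references.push(name.to_string());
    }

    /// Takes `self` by value, so the builder cannot be used after resolving.
    fn resolve(self) -> BindingGraph {
        let mut resolved = HashMap::new();
        for reference in &self.references {
            if let Some(index) = self.definitions.iter().position(|d| d == reference) {
                resolved.insert(reference.clone(), index);
            }
        }
        // `self` is dropped when this function returns, freeing the builder state.
        BindingGraph { resolved }
    }
}

impl BindingGraph {
    fn definition_index(&self, reference: &str) -> Option<usize> {
        self.resolved.get(reference).copied()
    }
}

/// Queries are only possible on the resolved graph, never on the builder.
fn builder_demo() -> (Option<usize>, Option<usize>) {
    let mut builder = BindingGraphBuilder::new();
    builder.add_definition("Token");
    builder.add_definition("transfer");
    builder.add_reference("transfer");
    builder.add_reference("missing");
    let graph = builder.resolve();
    // `builder` has been moved; touching it here would be a compile error.
    (graph.definition_index("transfer"), graph.definition_index("missing"))
}
```

Since the query methods exist only on `BindingGraph`, accessing definitions or references before resolution is ruled out at compile time rather than checked at runtime.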