Create an arena for package names #242
Awesome idea!
We'll try it and report back.
My impression is that it gives us a small but consistent speedup, e.g., on a large resolution with a filled cache (so minimal IO):
Or, with a pre-filled cache but no existing lockfile:
I can probably do a bit better with more work on our side to use the IDs everywhere.
10% on resolution-only micro benchmarks leading to 1% on an end-to-end test makes sense if we are spending about 10% of the time inside resolution code.
Yeah, seems like a clear improvement.
CodSpeed Performance Report: merging #242 will degrade performance by 3.5%.
(the perf numbers in uv shouldn't have changed)
Here are the main differences between #274 and this PR:
I'm going to merge; we can discuss breaking changes in follow-up PRs.
I'll upgrade uv to use this behavior.
We have long known that resolution time is proportional to the cost of `P::Clone`. Most real-world resolution problems contain at least one `String` in their `P`. If performance is anywhere on the priority list, then `P::Clone` should not allocate. Because our library is generic, this is easy for users to achieve and control: `P` can be a wrapper around `Rc<P>`, which does not allocate on clone; or, with various interning strategies, `P` can use `&str` for an even faster clone; or, with hashconsing (or any de-duplicating interning strategy), `P` can be a wrapper around `usize`. So for our benchmarks we used `P=u32`, because it was the simplest type that met our trait bounds, or `P=&str`, because it was similar to what was easily available to a user. Now that we have production users, we see that their `P` tends to have a more expensive clone (`Rc` for the cargo benchmarks, `Arc` for the uv use case, `String` for gleam and elm).

It also turns out that resolution time is proportional to
`P::Hash`, because `P` ends up being the key in a large number of hash tables. The hottest of these tables tends to be `PartialSolution::package_assignments`. Only the hashconsing strategy (or its relatives) allows a user to provide a very fast `P::Hash` where `P` has a string in it. None of our production users are using this approach. So the performance of our internal benchmarks that use `&str` is far more realistic than that of the ones that use `u32`.

Luckily we can provide a hashconsing-like wrapper around
`P` for our users. We already have an arena such that if two IDs are equal then the data they point at must be equal; we just need to make the converse true, so that if the data is equal then any IDs for that data are equal. Instead of allocating by just pushing new items to a `Vec` (and returning the new index), this arena returns the previous ID if the value has already been added and does the normal thing otherwise. Using the indexmap crate, which is already one of our dependencies, this does not take a lot of code.

This is a pessimization for benchmarks that use
`u32`, or for any (theoretical) users who are already using hashconsing. Unfortunately Rust does not have specialization, and even if it did there is no trait bound for "`Hash` is cheap". Based on the `large_case_u16_NumberVersion` benchmark (whose name is hilariously out of date), this is a ~9.5% regression, and similarly ~10.5% for the synthetic `slow_135_0_u16_NumberVersion`. A sudoku problem, which is not synthetic but also not the intended use case, sees a similar regression.

Real-world benchmarks see significant improvements.
`elm_str_SemanticVersion` improves by 10.2%, `zuse_str_SemanticVersion` by 19.9%, and all of crates.io by 14% without lock files and 11% with.

I'd love to hear the impact on
`uv` benchmarks, but I expect them to be in a similar range. Perhaps infinitesimally bigger, because they are already collecting this data for their implementation of prioritize, which can now just be `Id<P>::as_raw()`.

There is potential follow-up work because
a `Map` keyed by `Id<P>` (especially when densely filled) can be replaced with a `Vec`, decreasing the size and removing the calculation of `Hash` for `Id`. But it's a lot of code for a much smaller win, so I left it out for now.

Unfortunately, a bunch of log and panic messages now refer to
`Id(1)` instead of the name of that package. Similarly, they use `Debug` instead of `Display`. This is hard to fix because the relevant `impl`s do not have access to the arena in which the actual package name is stored.
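
For readers who want a concrete picture of the de-duplicating arena described above, here is a minimal sketch. It is not the code in this PR: the names `HashArena`, `alloc`, and `get` and the non-generic `Id` are assumptions made for illustration, and it uses only `std` (a `Vec` plus a `HashMap`) rather than the indexmap crate, so each value is stored twice. With indexmap, something like `IndexSet::insert_full` could presumably do the lookup and the insert in one step while keeping a single copy.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical handle type: a small integer that is cheap to clone and hash.
/// (Simplified here; the `Id<P>` mentioned above is generic over `P`.)
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct Id(u32);

impl Id {
    /// Expose the underlying index, e.g. for use as a priority or a `Vec` index.
    pub fn as_raw(self) -> u32 {
        self.0
    }
}

/// Hypothetical de-duplicating arena: equal values always get equal IDs.
pub struct HashArena<T> {
    items: Vec<T>,       // ID -> value
    ids: HashMap<T, Id>, // value -> ID (second copy; indexmap would avoid this)
}

impl<T: Clone + Eq + Hash> HashArena<T> {
    pub fn new() -> Self {
        Self { items: Vec::new(), ids: HashMap::new() }
    }

    /// Return the existing ID if `value` was added before,
    /// otherwise push it and return the new index as its ID.
    pub fn alloc(&mut self, value: T) -> Id {
        if let Some(&id) = self.ids.get(&value) {
            return id;
        }
        let id = Id(self.items.len() as u32);
        self.items.push(value.clone());
        self.ids.insert(value, id);
        id
    }

    /// Look up the value behind an ID (panics on an ID from another arena).
    pub fn get(&self, id: Id) -> &T {
        &self.items[id.0 as usize]
    }
}
```

With this shape, allocating the same package name twice yields the same `Id`, so hot tables can be keyed by the integer handle instead of hashing a string each time. It also shows why the log messages are hard to fix: an `Id` on its own carries no name, and printing one requires a reference to the arena via something like `get`.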