Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create an arena for package names #242

Merged
merged 1 commit into from
Nov 20, 2024
Merged

Create an arena for package names #242

merged 1 commit into from
Nov 20, 2024

Conversation

Eh2406
Copy link
Member

@Eh2406 Eh2406 commented Jul 22, 2024

We have long known that resolution time is proportional to P::Clone. Most real-world resolution problems contain at least one String in there P. If performance is anywhere on the priority list then P::Clone should not allocate. Thanks to our library being generic this is easy for users to achieve and control. P can be a wrapper around Rc<P> and it does not allocate, or using various interning strategies use &str for even faster clone, or by using hashconsing (or any de-duplicating interning strategy) P can be a wrapper around usize. So for our benchmarks we used P=u32 because it was the simplest type that met our trait pounds OR P=&str because it was similar to what was easily available to a user. Now that we have production users, we see that there P tends to have a more expensive clone. (Rc for the cargo benchmarks, Arc for the uv use case, String for gleam and elm.)

It also turns out that resolution time is proportional to P::Hash. As P ends up being the key in a large number of hash tables. The hottest of these tables tends to be PartialSolution::package_assignments. Only the hashconsing (or its relatives) strategy allows a user to provide very fast P::Hash where P has a string in it. None of our production users are using this approach. So the performance of our internal benchmarks that use &str are far more realistic than the ones that use u32.

Luckily we can provide a hashconsing like wrapper around P for our users. We already have an arena such that if two IDs are equal then the data they point at must be equal, we just need to make the inverse true that if the data is equal then any IDs for that data will be equal. This arena instead of allocating by just pushing new items to a Vec (and returning the new index) would return a previous ID if the value had already been added and do the normal thing otherwise. This does not involve a lot of code by using the indexmap crate, which is already one of our dependencies.

This is a pessimization four benchmarks that use u32, or for any (theoretical) users who are already using hashconsing. Unfortunately rust does not have specialization, and even if it did there is no trait bound for "Hash is cheap". Based on the (hilariously out of date named) large_case_u16_NumberVersion this is a ~9.5% regression, similarly ~10.5% for the synthetic slow_135_0_u16_NumberVersion. A sudoku problem, which is not synthetic but also not the intended use case, sees a similar regression.

Real-world benchmarks see significant improvements. elm_str_SemanticVersion 10.2%, zuse_str_SemanticVersion 19.9%, all of crates.io 14% without lock files and 11% with.

I'd love to hear the impact on uv benchmarks, but I expect them to be in a similar range. Perhaps infinitesimally bigger because they are already collecting this data for their implementation of prioritize which can now just be Id<P>::as_raw().

There is potential follow-up work because Map<Id<P (especially when densely filled) can be replaced with a Vec decreasing the size and removing the calculation of Hash for Id. But it's a lot of code for a much smaller when so I left it out for now.

Unfortunately bunch of log and panic messages now refer to Id(1) instead of the name of that package. Similarly they use Debug instead of display. This is hard to fix because the relevant impls do not have access to the arena in which the actual package name is stored.

@charliermarsh
Copy link
Contributor

Awesome idea!

@charliermarsh
Copy link
Contributor

We'll try it and report back.

@charliermarsh
Copy link
Contributor

My impression is that it gives us a small but consistent speedup, e.g., on a large resolution with a filled cache (so minimal IO):

❯ hyperfine "../target/release/main lock" "../target/release/uv lock" --warmup 100 --runs 100
Benchmark 1: ../target/release/main lock
  Time (mean ± σ):      27.3 ms ±   0.5 ms    [User: 21.3 ms, System: 5.2 ms]
  Range (min … max):    26.5 ms …  30.6 ms    100 runs

Benchmark 2: ../target/release/uv lock
  Time (mean ± σ):      26.7 ms ±   0.3 ms    [User: 20.8 ms, System: 5.1 ms]
  Range (min … max):    26.1 ms …  27.3 ms    100 runs

Summary
  '../target/release/uv lock' ran
    1.02 ± 0.02 times faster than '../target/release/main lock'

Or, with a pre-filled cache but no existing lockfile:

❯ hyperfine "../target/release/main lock" "../target/release/uv lock" --warmup 100 --runs 100 --prepare "rm uv.lock"
Benchmark 1: ../target/release/main lock
  Time (mean ± σ):     157.9 ms ±   4.7 ms    [User: 175.9 ms, System: 176.6 ms]
  Range (min … max):   147.0 ms … 174.1 ms    100 runs

Benchmark 2: ../target/release/uv lock
  Time (mean ± σ):     155.1 ms ±   4.3 ms    [User: 169.5 ms, System: 169.9 ms]
  Range (min … max):   146.6 ms … 169.9 ms    100 runs

Summary
  '../target/release/uv lock' ran
    1.02 ± 0.04 times faster than '../target/release/main lock'

@charliermarsh
Copy link
Contributor

I can probably do a bit better with more work on our side to use the IDs everywhere.

@Eh2406
Copy link
Member Author

Eh2406 commented Jul 23, 2024

10% on only resolution micro benchmarks leading to 1% on a end-to-end test makes sense if we are spending about 10% inside resolution code.

@charliermarsh
Copy link
Contributor

Yeah, seems like a clear improvement.

Copy link

codspeed-hq bot commented Nov 19, 2024

CodSpeed Performance Report

Merging #242 will degrade performances by 3.5%

Comparing Eh2406:pid_old (824ee77) with dev (4bde6d6)

Summary

⚡ 3 improvements
❌ 1 regressions
✅ 2 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark dev Eh2406:pid_old Change
backtracking_singletons 4.3 s 4.2 s +4.21%
large_case_u16_NumberVersion.ron 20.2 ms 21 ms -3.5%
sudoku-easy 15.9 ms 15.3 ms +4.32%
sudoku-hard 16.3 ms 15.5 ms +5.34%

src/internal/arena.rs Outdated Show resolved Hide resolved
Copy link
Member

@konstin konstin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(the perf numbers in uv shouldn't have changed)

@x-hgg-x
Copy link
Contributor

x-hgg-x commented Nov 20, 2024

Here are the main differences between #274 and this PR:

  • I used a PackageId type instead of Id<P> to remove the generic type.
  • I added the package store inside the error type to avoid cloning packages in the derivation tree (breaking change).
  • I added a display() method for Incompatibility, External and PartialSolution using the package store to get better logs and reports.
  • I added the package store in the DP methods to avoid hashing packages several times (breaking change).

src/internal/core.rs Outdated Show resolved Hide resolved
@Eh2406
Copy link
Member Author

Eh2406 commented Nov 20, 2024

The display() method is a very good idea. I copied that for Incompatibility, and PartialSolution. Thank you. So far External still holds P so does not need special handling. I audited every place where this PR switch from display to debug, where possible I included both. With the mapping regularly enough reported in the logs, I'm not worried about the remaining cases.

I'm going to merge, we can discuss breaking changes in follow-up PR's.

@Eh2406 Eh2406 added this pull request to the merge queue Nov 20, 2024
Merged via the queue into pubgrub-rs:dev with commit 4ac6c42 Nov 20, 2024
5 of 6 checks passed
@Eh2406 Eh2406 deleted the pid_old branch November 20, 2024 20:41
@charliermarsh
Copy link
Contributor

I'll upgrade uv to use this behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants