Get rid of unsafe pointer magic WithSidebands (cost: high)
What needs to be done is to transform the &mut StreamingPeekableIter into a child future, and when exhausted, it must be transformed back into the &mut _ that created it. That way, only a single mutable reference to said Iter is present at any time. Unfortunately the generated futures (using async) don't support that as we would have to keep both the future and the parent that created it inside of our own struct. Instead of hiding this using pointers, one could implement the magical part by hand, a custom future, which happily dissolves into its mutable parent iter ref. That would be quite some work though.
- Finding an object in a loose object store costs an extra disk IO operation to pacify the borrow checker. This wouldn't be an issue with polonius.
- tree-parsing performance
- when diffing trees parsing can take substantial time. Maybe optimizations are possible here.
- Once polonius is available with production-ready performance, we should simplify the
locate(…)
code in(compound|linked)::Db()
respectively. Currently these first have to obtain an index, and when found, access the data to avoid having the borrowchecker fail to understand our buffer usage within a loop correctly. Performance itself it probably not reasonably affected.
- Pack decoding takes 5x more memory than git on the android-base repository.
- On ARM64 on MacOS the SHA1 implementation of the
sha-1
crate is capped at about 550MB/s, half the speed of what I saw on Intel and about 50% slower than what's implemented inlibcorecrypto.dylib
. Get that fast and the decoding stage will be able to beat git on fewer cores. See this comment for more. Right now we only do when scaling beyond whatgit
can do due to lock contention.- This should work once the
asm
feature can be enabled in thesha-1
crate, which currently fails but is tracked in this issue.- If it's not fast enough, one might hope that ARM8 instructions can improve performance, but right now they aren't available.
- Maybe the path forward for that crate is to use system or openssl dylibs.
- This should work once the
-
pack::cache::lru::Memory
all copy input data in individual allocations. Could a pre-allocated arena or slab be faster?- Probably not, as allocation performance might not be the issue here. Even though there definitely is a lot of effectively useless copying of data and deallocation happening if caches are not used after all.
- Add more control over the amount of memory used for the
less-memory
algorithm ofpack-verify
to increase cache hit rate at the cost of memory. Note that depending on this setting, it might not be needed anymore to iterated over sorted offsets, freeing 150MB of memory in the process that could be used for the improved cache. With the current cache and no sorted offsets, the time nearly triples. - progress measuring costs when using 96 cores (see this comment)
- potential savings: low
- Add '--chunk|batch-size' flag to
pack-verify
andpack-index-from-data
to allow tuning sizes for large amounts of cores- @joshtriplett write: "I did find that algorithm when I was looking for the chunk size, though I didn't dig into the details. As a quick hack, I tried dropping the upper number from 1000 to 250, which made no apparent difference in performance."
- potential savings:
mediumunclear
- On 96 core machines, it takes visible time until all threads are started and have work. Is it because starting 100 threads takes so long? Or is it contention to get work?
- Improve cache hit rate of
lookup
pack traversal by using partial DAGs build with help of the index- @joshtriplett writes: "Would it be possible, with some care, to use the index to figure out in advance which objects will be needed again and which ones won't? Could you compute a small DAG of objects you need for deltas (without storing the objects themselves), and use that to decide the order you process objects in?"
- Note that there is tension between adding more latency to build such tree and the algorithms ability to (otherwise) start instantly.
- potential savings: unknown