add DictionaryEncodedValueIndex.getValueIterator and use it for ExpressionPredicateIndexSupplier#19023
Conversation
…ssionPredicateIndexSupplier changes: * Added `getValueIterator` method to `DictionaryEncodedValueIndex` to give an easy way for consumers to iterate the dictionary values in order * `ExpressionPredicateIndexSupplier` now uses `getValueIterator` to scan the dictionary values, offering a performance improvement, particularly when using front-coding * fixed a few other places that were iterating the dictionary using get to use iterator instead
|
I wonder if it's worth adding some sort of perf section to the CI. Assuming we mandate that all perf-sensitive changes add a corresponding benchmark, presumably we can create a unit test that checks for statistically significant regressions in most core areas of the code? This might help catch more issues before they hit a release. Just would need to limit the execution time of said benchmarks. |
…erator-for-expression-predicate-index-supplier
I think that is a lot easier said than done, we'd have to be very strategic about what we run if we are talking about benchmarks like was added here, since the benchmarks require quite an absurd amount of time to run them all for all combinations. Just running which to be fair is quite a bit of an over estimate since many combinations are not run together (like we only test compression stuff on MMAP segments), so it wouldn't really be that long, but it would still be a very long time. If i restrict it to 1 set of combinations (and still run both non-vectorized and vectorized), e.g. it still estimates almost 11 hours And this is just 1 benchmark. There are a ton of other files, many with just as many combinations. This is also ignoring that these benchmarks are only as good as their coverage. The thing this PR is improving basically only applied to front-coding with sufficiently large buckets along with queries using enough virtual columns in filters so that it became slower than not using the indexes. There were very many cases where using the indexes was an improvement, and even this query was faster when not using front-coding, so we would need to have the right combinations to be able to catch stuff before. |
capistrant
left a comment
There was a problem hiding this comment.
left one comment that is maybe more of a teaching moment for me when it comes code in the query path
There was a problem hiding this comment.
as this logic is the same as getValue is there sense in having something like private String getUsingGlobalIndex(int index) that both can share? I'm not as used to the hot query path though, would that risk perf or would the compiler inline it and result in no runtime difference?
There was a problem hiding this comment.
ah they probably could use a shared method, i could measure just to be sure; this sort of code is copied in a lot of places all across the nested column stuff since quite a lot of places have to do a similar translation from local field ids to global ids, i had a dream that someday i would try to consolidate them, but haven't really got around to it yet 😅
|
CI failure seems unrelated (but also failing a lot, ive retried many times) |
changes:
getValueIteratormethod toDictionaryEncodedValueIndexto give an easy way for consumers to iterate the dictionary values in orderExpressionPredicateIndexSuppliernow usesgetValueIteratorto scan the dictionary values, offering a performance improvement, particularly when using front-codingCredit to #19004 for the added benchmark query and bringing this issue to attention, where when using front-coding it was causing computing the indexes to be slower than just doing a full scan (at least in some cases, such as this query)
before:
after:
I also considered adding a
getBitmapsIteratortoDictionaryEncodedValueIndex, but ultimately decided against it because most of the bitmapgetmethods do some coercion of null values to empty bitmaps so they can't just use the underlyingIndexediterator directly... which sounded a bit more tedious than i wanted to deal with. Perhaps can consider doing this as a follow-up so the places that are iterating both dictionaries and collecting the corresponding bitmaps can just both use iterators instead of keeping a counter, or making some convenient structure to iterate both things at the same time so we don't even need to keep in sync...