-
Notifications
You must be signed in to change notification settings - Fork 228
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement shotgun4Intersect #239
base: master
Are you sure you want to change the base?
Conversation
Implement shotgun4Intersect
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lemire Amazing trick with branchlessComparator
and if idx4+1 < nL { // common case
optimization!
Thanks! Let us finish this! |
I am leaving this PR intentionally open. |
Tried to follow-up on this branch here: puzpuzpuz@3ac5959 The changes are the following:
The real data benchmark I've added looks like the following: func BenchmarkRealDataAnd(b *testing.B) {
benchmarkRealDataAggregate(b, func(bitmaps []*Bitmap) uint64 {
b := bitmaps[0]
for i := 1; i < len(bitmaps); i++ {
b = And(b, bitmaps[i])
}
return b.GetCardinality()
})
} I'm not sure if it's good enough though. Looking forward to hear your thoughts. The results on my machine are the following. Environment: Ubuntu 20.04 x86-64, Go 1.16.5, i5-8300h CPU onesidedgallopingintersect2by2: $ BENCH_REAL_DATA=1 go test -bench BenchmarkRealDataAnd -run -
goos: linux
goarch: amd64
pkg: github.com/RoaringBitmap/roaring
cpu: Intel(R) Core(TM) i5-8300H CPU @ 2.30GHz
BenchmarkRealDataAnd/census-income_srt-8 112808 10660 ns/op
BenchmarkRealDataAnd/census-income-8 116253 10362 ns/op
BenchmarkRealDataAnd/census1881_srt-8 135080 8795 ns/op
BenchmarkRealDataAnd/census1881-8 130599 9168 ns/op
BenchmarkRealDataAnd/dimension_003-8 1376 885094 ns/op
BenchmarkRealDataAnd/dimension_008-8 4058 308777 ns/op
BenchmarkRealDataAnd/dimension_033-8 25941 46678 ns/op
BenchmarkRealDataAnd/uscensus2000-8 121574 9786 ns/op
BenchmarkRealDataAnd/weather_sept_85_srt-8 97990 12082 ns/op
BenchmarkRealDataAnd/weather_sept_85-8 43671 28879 ns/op
BenchmarkRealDataAnd/wikileaks-noquotes_srt-8 128178 14734 ns/op
BenchmarkRealDataAnd/wikileaks-noquotes-8 133525 9009 ns/op
PASS
ok github.com/RoaringBitmap/roaring 39.732s shotgun4Intersect: $ BENCH_REAL_DATA=1 go test -bench BenchmarkRealDataAnd -run -
goos: linux
goarch: amd64
pkg: github.com/RoaringBitmap/roaring
cpu: Intel(R) Core(TM) i5-8300H CPU @ 2.30GHz
BenchmarkRealDataAnd/census-income_srt-8 111625 10605 ns/op
BenchmarkRealDataAnd/census-income-8 117244 10248 ns/op
BenchmarkRealDataAnd/census1881_srt-8 133972 8770 ns/op
BenchmarkRealDataAnd/census1881-8 131566 9120 ns/op
BenchmarkRealDataAnd/dimension_003-8 1359 871526 ns/op
BenchmarkRealDataAnd/dimension_008-8 4050 292773 ns/op
BenchmarkRealDataAnd/dimension_033-8 26175 44918 ns/op
BenchmarkRealDataAnd/uscensus2000-8 127453 9230 ns/op
BenchmarkRealDataAnd/weather_sept_85_srt-8 97687 12014 ns/op
BenchmarkRealDataAnd/weather_sept_85-8 44283 26967 ns/op
BenchmarkRealDataAnd/wikileaks-noquotes_srt-8 135856 8815 ns/op
BenchmarkRealDataAnd/wikileaks-noquotes-8 134486 8959 ns/op
PASS
ok github.com/RoaringBitmap/roaring 38.144s No significant difference except for the |
Thanks. This is useful and seems to go toward my own bias: there may simply be not enough gain to make it worth it. |
Yes, so far I'm also under impression that shotgun intersection doesn't make the weather on operations with real data sets. I'm going to stop at this point and won't make further analysis. Going to check other GH issues and find the ones I could help with. |
@puzpuzpuz That was helpful! |
This is a second attempt at fixing #215
It is based on a previous PR by @alldroll. The changes are as follows:
The benchmark, though still unrealistic, is a tiny bit more realistic. Important: which benchmarking data processing code with branches, you don't want to run the same benchmark over the same data in sequence because the processor might get the ability to predict the branches with an unrealistic ability that is not achievable in the real world.
It looks like the Go compiler is not smart enough to use conditional moves, as it should. So it does many, many branches when compiling the shotgun4 approach. We want to avoid that. A solution (which is kind of a hack) is to replace the branches ("if") by some arithmetic. The arithmetic is more expensive (many more instructions) than a simple conditional move, but saving on mispredicted branches and useless speculative loads is important.
Results should vary but here is what I get after making these changes...
So shotgun4 is "almost" twice as fast. Not bad!
Of course, more work is needed...
I propose that the work be done in the shotgun4 branch directly in the main repository.