Add RFGPBenchmark (#5010) #5074
Conversation
Cool! How does it compare to before your PR? :) |
@gf712 , Updated it :) |
@jonpsy great work! |
Alright! This is quite a speed up! I suspect the speed ups will be even larger for much larger problems. Maybe add one bigger one that takes a second or so? You only benchmark the transform here, not the fit. Might be worth doing that too (as it will be faster as well with the code you wrote, interesting to know by how much) |
Right, we're planning to benchmark
We've cleared part 1. Two more to go ✌ |
I think 1 and 3 are the important ones. BTW does transform call apply_to_matrix once or apply_to_vector in a loop? |
The former |
ok good. I think there is not really a need to benchmark the vector version. But fit is definitely interesting |
Updated benchmark results: After PR / Before PR. This is the maximum load my already fragile laptop can achieve before causing a house fire. I'm still not sure about the fit misbehaving in the before-PR run. |
🤣 |
No need to go bigger :) Not sure about the fit thingy... but we cannot compare like this, obviously... |
Is it maybe that fit doesn't do anything before the PR because stuff was done lazily? This looks like timings for a no-op... |
In that case might be best to just compare fit + transform? |
yes that might be easiest.... maybe best to start with that. Though it is also interesting how this decomposes ... |
I can assure you both transform methods start on an equal footing. The prior transform method does not use fit inside it. We are winning, there's no doubting that. About the misbehaving fit method, I think I've cracked the code. Thing is, |
In that case wouldn't it be better to move the object instantiation to the loop? It will add a bit of noise in the benchmark, but imo the result would be more comparable |
Or.... We could tweak the prior method a little ;) (by removing test_rfinited() ) Here's the result of the prior PR fit method : https://pastebin.com/7hNSAPwq Another clear win. Victory feels nice, ain't it? |
\
SGMatrix<float64_t> matrix(num_dims, num_vecs); \
linalg::range_fill(matrix, 1.0); \
auto feats = std::make_shared<DenseFeatures<float64_t>>(matrix); \
why not auto feats = std::make_shared<DenseFeatures<float64_t>>(mat), like below?
We're fitting and transforming two different matrices.
That is great, the relative gains in fit are even stronger than in transform. @jonpsy make sure to take a moment to remember what the reasons for those performance gains were: |
@jonpsy great work! :) |
Here's the benchmark result in a release build at max CPU power (using sudo cpupower frequency-set --governor performance).
After the PR: https://pastebin.com/xefubPQ6
Before the PR: https://pastebin.com/PqneD4Nb
Benchmark for the PR: #5010
@karlnapf @gf712
EDIT: For future readers, here's the final draft for the performance comparison.
Before PR:
https://pastebin.com/2ETBQgZL
After PR:
https://pastebin.com/rBwWPbqX