Add RFGPBenchmark (#5010) #5074

Merged · 3 commits merged into shogun-toolbox:develop on Jun 24, 2020

Conversation

@jonpsy (Contributor) commented Jun 22, 2020

Here are the benchmark results from a release build at max CPU power (set with sudo cpupower frequency-set --governor performance).

After the PR:
https://pastebin.com/xefubPQ6

Before the PR
https://pastebin.com/PqneD4Nb

Benchmark for the PR: #5010
@karlnapf @gf712

EDIT: For future readers, here's the final draft of the performance comparison.

Before PR:
https://pastebin.com/2ETBQgZL

After PR:
https://pastebin.com/rBwWPbqX

@gf712 (Member) commented Jun 22, 2020

Cool! How does it compare to before your PR? :)

@jonpsy (Contributor, Author) commented Jun 22, 2020

@gf712, updated it :)

@vigsterkr (Member)

@jonpsy great work!

@karlnapf (Member)

Alright, this is quite a speed-up! I suspect the speed-ups will be even larger for much larger problems. Maybe add one bigger case that takes a second or so?

You only benchmark the transform here, not the fit. Might be worth doing that too (it will be faster as well with the code you wrote; interesting to know by how much).

@jonpsy (Contributor, Author) commented Jun 22, 2020

Right, we're planning to benchmark

  1. transform
  2. apply_to_feature_vector
  3. fit

We've cleared part 1. Two more to go ✌ (a sketch of what such a benchmark looks like follows below).
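
For reference, a minimal sketch of such a transform benchmark, assuming Google Benchmark and shogun's RandomFourierGaussPreproc API; the PR's actual fixture may differ:

// Illustrative sketch, not the PR's exact code: times transform() only,
// with fit() done once up front.
#include <benchmark/benchmark.h>
#include <shogun/features/DenseFeatures.h>
#include <shogun/mathematics/linalg/LinalgNamespace.h>
#include <shogun/preprocessor/RandomFourierGaussPreproc.h>

using namespace shogun;

static void BM_RFGP_Transform(benchmark::State& state)
{
    const index_t num_dims = state.range(0);
    const index_t num_vecs = state.range(1);

    SGMatrix<float64_t> matrix(num_dims, num_vecs);
    linalg::range_fill(matrix, 1.0);
    auto feats = std::make_shared<DenseFeatures<float64_t>>(matrix);

    auto preproc = std::make_shared<RandomFourierGaussPreproc>();
    preproc->fit(feats); // fit once so only transform is measured

    for (auto _ : state)
        benchmark::DoNotOptimize(preproc->transform(feats));
}
// Problem sizes are illustrative; the PR picks its own.
BENCHMARK(BM_RFGP_Transform)->Args({16, 1000})->Args({512, 10000});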

@karlnapf (Member)

I think 1 and 3 are the important ones.

BTW does transform call apply_to_matrix once or apply_to_vector in a loop?

@jonpsy (Contributor, Author) commented Jun 22, 2020

The former.

@karlnapf (Member)

OK, good. I think there is not really a need to benchmark the vector version, but fit is definitely interesting.

@jonpsy (Contributor, Author) commented Jun 22, 2020

Updated benchmark results:

After PR
https://pastebin.com/rBwWPbqX

Before PR
https://pastebin.com/XJ1vjwd9

This is the maximum load my already fragile laptop can achieve before causing a house fire. I'm still not sure about fit misbehaving in the before-PR run.

@karlnapf (Member)

This is the maximum load my already fragile laptop can achieve before causing a house fire. I'm still not sure about fit misbehaving in the before-PR run.

🤣

@karlnapf (Member)

No need to go bigger :)
Really great improvements. You see, it was worth taking the time to do this properly.

Not sure about the fit thingy... but we cannot compare like this, obviously...

@karlnapf (Member)

Is it maybe that fit doesn't do anything before the PR because stuff was done lazily? This looks like timings for a no-op...
If that is the case, then the numbers above might be wrong, in the sense that they are not comparing apples to apples: before the PR, transform included computing a fit.

@gf712 (Member) commented Jun 22, 2020

Is it maybe that fit doesn't do anything before the PR because stuff was done lazily? This looks like timings for a no-op...
If that is the case, then the numbers above might be wrong, in the sense that they are not comparing apples to apples: before the PR, transform included computing a fit.

In that case, might it be best to just compare fit + transform?

@karlnapf (Member)

Yes, that might be easiest... maybe best to start with that. Though it is also interesting how this decomposes...

@jonpsy (Contributor, Author) commented Jun 24, 2020

I can assure you both transform methods start on an equal footing. The prior transform method does not call fit inside it. We are winning, there's no doubting that.

About the misbehaving fit method, I think I've cracked the code. The thing is, test_rfinited() does not let you compute fit again for the same set of parameters; it just returns the old ones. Due to this, we get, say, ~2 ms on the first iteration and then ~0 ms for the next 200+ iterations, the average of which is ~0 ms. Hence the erratic behaviour, voilà!
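
A self-contained toy illustration of why the average collapses (hypothetical names and simplification, not the actual shogun source):

// Toy illustration: a guard like test_rfinited() makes every call after
// the first a near-free no-op, so the benchmark mean rounds to ~0 ms.
static bool rf_initialized = false;

static bool test_rfinited() { return rf_initialized; }

static void fit()
{
    if (test_rfinited()) // parameters unchanged: return the old coefficients
        return;          // ~0 ms on iterations 2..N
    // ... expensive sampling of random Fourier coefficients (~2 ms, once)
    rf_initialized = true;
}

int main()
{
    for (int i = 0; i < 200; ++i)
        fit(); // only i == 0 does real work; the mean rounds down to 0 ms
}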

@gf712 (Member) commented Jun 24, 2020

I can assure you both transform methods start on an equal footing. The prior transform method does not call fit inside it. We are winning, there's no doubting that.

About the misbehaving fit method, I think I've cracked the code. The thing is, test_rfinited() does not let you compute fit again for the same set of parameters; it just returns the old ones. Due to this, we get, say, ~2 ms on the first iteration and then ~0 ms for the next 200+ iterations, the average of which is ~0 ms. Hence the erratic behaviour, voilà!

In that case, wouldn't it be better to move the object instantiation into the loop? It will add a bit of noise to the benchmark, but IMO the result would be more comparable.
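
A sketch of that suggestion, under the same assumptions as the earlier snippet; Google Benchmark's PauseTiming/ResumeTiming keeps the instantiation itself out of the measurement (and is where the extra noise comes from):

// Sketch: a fresh preprocessor per iteration, so fit() can never be
// short-circuited by cached state.
static void BM_RFGP_Fit(benchmark::State& state)
{
    SGMatrix<float64_t> matrix(state.range(0), state.range(1));
    linalg::range_fill(matrix, 1.0);
    auto feats = std::make_shared<DenseFeatures<float64_t>>(matrix);

    for (auto _ : state)
    {
        state.PauseTiming(); // exclude construction from the timing
        auto preproc = std::make_shared<RandomFourierGaussPreproc>();
        state.ResumeTiming();
        preproc->fit(feats); // every iteration performs a real fit
    }
}
BENCHMARK(BM_RFGP_Fit)->Args({16, 1000});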

@jonpsy (Contributor, Author) commented Jun 24, 2020

Or... we could tweak the prior method a little ;) (by removing test_rfinited()).

Here's the result of the prior PR's fit method: https://pastebin.com/7hNSAPwq

Another clear win. Victory feels nice, doesn't it?

SGMatrix<float64_t> matrix(num_dims, num_vecs); \
linalg::range_fill(matrix, 1.0); \
auto feats = std::make_shared<DenseFeatures<float64_t>>(matrix); \
Review comment (Member):

why not auto feats = std::make_shared<DenseFeatures<float64_t>>(mat); like below?

Reply from @jonpsy (Contributor, Author):

We're fitting and transforming two different matrices.

@karlnapf (Member)

That is neat, the relative gains in fit are even stronger than in transform.
Let's merge it then?

@jonpsy make sure to take a moment to remember what the reasons for those performance gains were (see the sketch after this list):

  • SIMD vectorized operations rather than loops over components
  • Matrix-vector products rather than loops over vectors
  • And much simpler code
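
As a generic illustration of the middle bullet (plain Eigen here for brevity; shogun's linalg backend differs in detail):

#include <Eigen/Dense>

// Loop over vectors (the slower pattern): many small matrix-vector products.
Eigen::MatrixXd project_loop(const Eigen::MatrixXd& W, const Eigen::MatrixXd& X)
{
    Eigen::MatrixXd out(W.rows(), X.cols());
    for (Eigen::Index i = 0; i < X.cols(); ++i)
        out.col(i) = W * X.col(i); // one feature vector at a time
    return out;
}

// One matrix product (the faster pattern): the backend can vectorize and
// cache-block it, and the code is also much simpler.
Eigen::MatrixXd project_matrix(const Eigen::MatrixXd& W, const Eigen::MatrixXd& X)
{
    return W * X;
}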

@gf712 (Member) commented Jun 24, 2020

@jonpsy great work! :)

@karlnapf merged commit bb84313 into shogun-toolbox:develop on Jun 24, 2020.
@jonpsy deleted the benchmark branch on June 24, 2020 at 15:43.