Add RFGPBenchmark (#5010) #5074

Merged · 3 commits merged into shogun-toolbox:develop on Jun 24, 2020

Conversation

@jonpsy (Contributor) commented Jun 22, 2020

Here are the benchmark results from a release build at max CPU power (set with sudo cpupower frequency-set --governor performance).

After the PR:
https://pastebin.com/xefubPQ6

Before the PR
https://pastebin.com/PqneD4Nb

Benchmark for the PR: #5010
@karlnapf @gf712

EDIT: For future readers, here's the final draft of the performance comparison.

Before PR:
https://pastebin.com/2ETBQgZL

After PR:
https://pastebin.com/rBwWPbqX

@gf712 (Member) commented Jun 22, 2020

Cool! How does it compare to before your PR? :)

@jonpsy (Contributor, Author) commented Jun 22, 2020

@gf712, updated it :)

@vigsterkr (Member)

@jonpsy great work!

@karlnapf (Member)

Alright, this is quite a speed-up! I suspect the speed-ups will be even larger for much larger problems. Maybe add one bigger case that takes a second or so?

You only benchmark the transform here, not the fit. Might be worth doing that too (it will be faster as well with the code you wrote; interesting to know by how much).

@jonpsy (Contributor, Author) commented Jun 22, 2020

Right, we're planning to benchmark

  1. transform
  2. apply_to_feature_vector
  3. fit

We've cleared part 1. Two more to go ✌ (a sketch of what such a benchmark looks like follows below).
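
For reference, a minimal sketch of such a transform benchmark, assuming Google Benchmark and shogun's RandomFourierGaussPreproc API; the PR's actual fixture may differ:

// Illustrative sketch, not the PR's exact code: times transform() only,
// with fit() done once up front.
#include <benchmark/benchmark.h>
#include <shogun/features/DenseFeatures.h>
#include <shogun/mathematics/linalg/LinalgNamespace.h>
#include <shogun/preprocessor/RandomFourierGaussPreproc.h>

using namespace shogun;

static void BM_RFGP_Transform(benchmark::State& state)
{
    const index_t num_dims = state.range(0);
    const index_t num_vecs = state.range(1);

    SGMatrix<float64_t> matrix(num_dims, num_vecs);
    linalg::range_fill(matrix, 1.0);
    auto feats = std::make_shared<DenseFeatures<float64_t>>(matrix);

    auto preproc = std::make_shared<RandomFourierGaussPreproc>();
    preproc->fit(feats); // fit once so only transform is measured

    for (auto _ : state)
        benchmark::DoNotOptimize(preproc->transform(feats));
}
// Problem sizes are illustrative; the PR picks its own.
BENCHMARK(BM_RFGP_Transform)->Args({16, 1000})->Args({512, 10000});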

@karlnapf (Member)

I think 1 and 3 are the important ones.

BTW does transform call apply_to_matrix once or apply_to_vector in a loop?

@jonpsy (Contributor, Author) commented Jun 22, 2020

The former.

@karlnapf (Member)

OK, good. I think there is not really a need to benchmark the vector version, but fit is definitely interesting.

@jonpsy (Contributor, Author) commented Jun 22, 2020

Updated benchmark results:

After PR
https://pastebin.com/rBwWPbqX

Before PR
https://pastebin.com/XJ1vjwd9

This is the maximum load my already fragile laptop can achieve before causing a house fire. I'm still not sure about fit misbehaving in the before-PR run.

@karlnapf (Member)

This is the maximum load my already fragile laptop can achieve before causing a house fire. I'm still not sure about fit misbehaving in the before-PR run.

🤣

@karlnapf (Member)

No need to go bigger :)
Really great improvements. You see, it was worth taking the time to do this properly.

Not sure about the fit thingy... but we cannot compare like this, obviously...

@karlnapf (Member)

Is it maybe that fit doesn't do anything before the PR because stuff was done lazily? This looks like timings for a no-op...
If that is the case, then the numbers above might be wrong, in the sense that they are not comparing apples to apples: before the PR, transform included computing a fit.

@gf712 (Member) commented Jun 22, 2020

Is it maybe that fit doesn't do anything before the PR because stuff was done lazily? This looks like timings for a no-op...
If that is the case, then the numbers above might be wrong, in the sense that they are not comparing apples to apples: before the PR, transform included computing a fit.

In that case, might it be best to just compare fit + transform?

@karlnapf (Member)

Yes, that might be easiest... maybe best to start with that. Though it is also interesting how this decomposes...

@jonpsy (Contributor, Author) commented Jun 24, 2020

I can assure you both transform methods start on an equal footing. The prior transform method does not call fit inside it. We are winning, there's no doubting that.

About the misbehaving fit method, I think I've cracked the code. The thing is, test_rfinited() does not let you compute fit again for the same set of parameters; it just returns the old ones. Due to this, we get, say, ~2 ms on the first iteration and then ~0 ms for the next 200+ iterations, the average of which is ~0 ms. Hence the erratic behaviour, voilà!
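
A self-contained toy illustration of why the average collapses (hypothetical names and simplification, not the actual shogun source):

// Toy illustration: a guard like test_rfinited() makes every call after
// the first a near-free no-op, so the benchmark mean rounds to ~0 ms.
static bool rf_initialized = false;

static bool test_rfinited() { return rf_initialized; }

static void fit()
{
    if (test_rfinited()) // parameters unchanged: return the old coefficients
        return;          // ~0 ms on iterations 2..N
    // ... expensive sampling of random Fourier coefficients (~2 ms, once)
    rf_initialized = true;
}

int main()
{
    for (int i = 0; i < 200; ++i)
        fit(); // only i == 0 does real work; the mean rounds down to 0 ms
}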

@gf712 (Member) commented Jun 24, 2020

I can assure you both transform methods start on an equal footing. The prior transform method does not call fit inside it. We are winning, there's no doubting that.

About the misbehaving fit method, I think I've cracked the code. The thing is, test_rfinited() does not let you compute fit again for the same set of parameters; it just returns the old ones. Due to this, we get, say, ~2 ms on the first iteration and then ~0 ms for the next 200+ iterations, the average of which is ~0 ms. Hence the erratic behaviour, voilà!

In that case, wouldn't it be better to move the object instantiation into the loop? It will add a bit of noise to the benchmark, but IMO the result would be more comparable.
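
A sketch of that suggestion, under the same assumptions as the earlier snippet; Google Benchmark's PauseTiming/ResumeTiming keeps the instantiation itself out of the measurement (and is where the extra noise comes from):

// Sketch: a fresh preprocessor per iteration, so fit() can never be
// short-circuited by cached state.
static void BM_RFGP_Fit(benchmark::State& state)
{
    SGMatrix<float64_t> matrix(state.range(0), state.range(1));
    linalg::range_fill(matrix, 1.0);
    auto feats = std::make_shared<DenseFeatures<float64_t>>(matrix);

    for (auto _ : state)
    {
        state.PauseTiming(); // exclude construction from the timing
        auto preproc = std::make_shared<RandomFourierGaussPreproc>();
        state.ResumeTiming();
        preproc->fit(feats); // every iteration performs a real fit
    }
}
BENCHMARK(BM_RFGP_Fit)->Args({16, 1000});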

@jonpsy (Contributor, Author) commented Jun 24, 2020

Or... we could tweak the prior method a little ;) (by removing test_rfinited()).

Here's the result of the prior PR's fit method: https://pastebin.com/7hNSAPwq

Another clear win. Victory feels nice, doesn't it?

SGMatrix<float64_t> matrix(num_dims, num_vecs); \
linalg::range_fill(matrix, 1.0); \
auto feats = std::make_shared<DenseFeatures<float64_t>>(matrix); \
Review comment (Member):

why not auto feats = std::make_shared<DenseFeatures<float64_t>>(mat); like below?

Reply from @jonpsy (Contributor, Author):

We're fitting and transforming two different matrices.

@karlnapf (Member)

That is neat, the relative gains in fit are even stronger than in transform.
Let's merge it then?

@jonpsy make sure to take a moment to remember what the reasons for those performance gains were (see the sketch after this list):

  • SIMD vectorized operations rather than loops over components
  • Matrix-vector products rather than loops over vectors
  • And much simpler code
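
As a generic illustration of the middle bullet (plain Eigen here for brevity; shogun's linalg backend differs in detail):

#include <Eigen/Dense>

// Loop over vectors (the slower pattern): many small matrix-vector products.
Eigen::MatrixXd project_loop(const Eigen::MatrixXd& W, const Eigen::MatrixXd& X)
{
    Eigen::MatrixXd out(W.rows(), X.cols());
    for (Eigen::Index i = 0; i < X.cols(); ++i)
        out.col(i) = W * X.col(i); // one feature vector at a time
    return out;
}

// One matrix product (the faster pattern): the backend can vectorize and
// cache-block it, and the code is also much simpler.
Eigen::MatrixXd project_matrix(const Eigen::MatrixXd& W, const Eigen::MatrixXd& X)
{
    return W * X;
}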

@gf712 (Member) commented Jun 24, 2020

@jonpsy great work! :)

@karlnapf merged commit bb84313 into shogun-toolbox:develop on Jun 24, 2020.
@jonpsy deleted the benchmark branch on June 24, 2020 at 15:43.