
sketch of how default kernel will change #435

Draft · wants to merge 10 commits into main
Conversation

@R-M-Lee commented Aug 27, 2024

We started discussing a little bit here:
#430

Briefly: we will change the default priors (actually it's a bit more than that: the default kernel, really) to reflect the findings of the paper "Vanilla Bayesian Optimization Performs Great in High Dimensions" and the corresponding changes being made by the botorch team ("Update the default SingleTaskGP prior" and "Use Standardize by default for SingleTaskGP").

Here is a first, very rough commit to check; I think it would have the desired effect:

  • dimensionality-scaled lengthscale priors
  • new noise prior
  • no outputscale, but the output is standardized by default (sketched below)
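
For orientation, here is a minimal sketch of what that setup could look like in gpytorch/botorch terms. The dimension-scaled LogNormal priors and the RBF base kernel follow my reading of the Hvarfner paper and current botorch main; the exact constants and names are assumptions on my part, not taken from this PR's diff:

import math

from botorch.models.transforms.outcome import Standardize
from gpytorch.constraints import GreaterThan
from gpytorch.kernels import RBFKernel
from gpytorch.likelihoods import GaussianLikelihood
from gpytorch.priors import LogNormalPrior

d = 20  # input dimensionality (example value)

# dimensionality-scaled lengthscale prior, roughly LogNormal(sqrt(2) + log(d)/2, sqrt(3))
lengthscale_prior = LogNormalPrior(loc=math.sqrt(2) + 0.5 * math.log(d), scale=math.sqrt(3))

# new noise prior, roughly LogNormal(-4, 1), keeping the usual lower bound on the inferred noise
likelihood = GaussianLikelihood(
    noise_prior=LogNormalPrior(loc=-4.0, scale=1.0),
    noise_constraint=GreaterThan(1e-4),
)

# no ScaleKernel (outputscale) around the base kernel; instead the outputs are
# standardized by default, e.g. via botorch's Standardize outcome transform
covar_module = RBFKernel(ard_num_dims=d, lengthscale_prior=lengthscale_prior)
outcome_transform = Standardize(m=1)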

If this makes sense (@jduerholt), then I can clean it up and make the associated changes, e.g., in SingleTaskGPHyperconfig.

Proposal: we just change the values of BOTORCH_NOISE_PRIOR and BOTORCH_LENGTHCALE_PRIOR, use them as defaults, and remove the HVARFNER stuff

@jduerholt

Hi Robert,

Looks good to me.

Let us rename the botorch priors to ThreeSixPrior, as they are often called that in the literature.

Why not keep the Hvarfner name? It is from his paper ;)

The question would then be which priors to include in the hyperopt. We could go for all three (MBO, ThreeSix, Hvarfner) and combine them with an additional input called outputscale that toggles in the hyperopt whether the kernel should be wrapped in an outputscale or not. But this could also be done in a separate PR.

The current PR is failing due to the lengthscale feature importance calculation, as it expects a lengthscale-based kernel inside a scale kernel. It has to be adjusted so that it also handles the case where the kernel is a bare lengthscale kernel. But this is just one further if expression.
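
A minimal sketch of that extra branch (illustrative only; the helper name is made up and the actual importance code in BoFire looks different):

from gpytorch.kernels import Kernel, ScaleKernel

def _lengthscale_kernel(kernel: Kernel) -> Kernel:
    # with the old default the lengthscale-based kernel sits inside a ScaleKernel,
    # with the new default it is the covariance module itself
    if isinstance(kernel, ScaleKernel):
        return kernel.base_kernel
    return kernel

# the feature importances would then be read from _lengthscale_kernel(covar_module).lengthscale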

Best,

Johannes

lengthscale_prior=BOTORCH_LENGTHCALE_PRIOR(),
),
outputscale_prior=BOTORCH_SCALE_PRIOR(),
default_factory=lambda: MaternKernel(

Contributor:

In the original paper, and I think also in botorch, they are using an RBF kernel, right?

@R-M-Lee commented Aug 29, 2024

Thanks for the feedback. OK, then I'll clean this up and ask you to have another look later.

@R-M-Lee commented Sep 5, 2024

Hi @jduerholt (maybe also interesting for @bertiqwerty),

I have not yet modified the lengthscale feature importance calculation or the hyperopt config; I just wanted to convince myself first that the new priors work well. So I used the new multinorm benchmark problem with 20 dimensions to see whether the new priors give a performance boost. In this one test they actually make it worse (this is a maximization problem):

[image: benchmark results on the 20-dimensional multinorm problem, new vs. old priors]

Ideas? Either a different problem to confirm the performance, or something I am missing in the surrogate definition (although it seems simple enough to me). Maybe I should use one of the test functions from the Hvarfner paper.

The script I used to generate the data in the picture is here

@jduerholt

Hi @R-M-Lee,

very interesting finding.

When I implemented the priors originally (which could of course be buggy), I tested them only on ZDT1 and saw some improvement there in comparison to the usual priors. You can find it here: https://github.com/experimental-design/bofire/blob/main/tutorials/benchmarks/011-ZDT1.ipynb

SAASBO still outperforms it there. Another high-dimensional function that we have in BoFire is the thirty-dimensional Branin function (https://github.com/experimental-design/bofire/blob/main/tutorials/benchmarks/006-30dimBranin.ipynb). I am currently running some tests there. Unfortunately, they use other functions in the paper, so I think it makes sense to run our implementation on one of the functions from the paper.

Another possibility would be to run plain botorch on the benchmark from above. What we can definitely check is that the botorch model created by our surrogate looks the same (kernel, priors, etc.) as the one that is currently set up by default in botorch (using the Hvarfner priors).
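
A rough sketch of that structural check on the plain-botorch side (random data just to instantiate the model; pulling the fitted gpytorch model out of the BoFire surrogate for the comparison is left out here):

import torch
from botorch.models import SingleTaskGP

train_X = torch.rand(20, 25, dtype=torch.double)
train_Y = train_X[:, :1].sin()

model = SingleTaskGP(train_X, train_Y)
# printing the model shows the kernel, priors, and constraints that botorch sets up by
# default; this is what the model built by our surrogate should match
print(model)
print(model.covar_module)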

Best,

Johannes

@jduerholt

For the 30-dim Branin, the new priors perform slightly better:
[image: benchmark results on the 30-dim Branin, new vs. old priors]

@R-M-Lee commented Sep 10, 2024

Hmm, I'm still not confident enough in the new priors to finalize this PR. Here is the result of using the Hartmann function in 25D (only the first 6 dimensions are influential; the others are just there to inflate the dimensionality). I have plotted the distribution of responses at each iteration, not the "best so far". It's clear from this that the Hvarfner priors, at least as we have them implemented, lead to a lot more exploration. Maybe this is the desired behavior; there's an argument in the Hvarfner paper about EI plus short lengthscale priors leading to proposals close to the incumbent, so maybe that's just what we see in this plot:

[image: per-iteration response distributions on the 25D Hartmann problem, Hvarfner vs. previous priors]

Here is the typical staircase (cummin) plot. Now the new and old priors are pretty indistinguishable:

[image: best-so-far (cummin) staircase plot on the 25D Hartmann problem]
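
For reference, a minimal sketch of the 25D setup described above (just an illustration of "only the first 6 dimensions are influential"; not the actual benchmark code used for these plots):

import torch
from botorch.test_functions import Hartmann

hartmann6 = Hartmann(dim=6)

def hartmann_25d(x: torch.Tensor) -> torch.Tensor:
    # only the first six inputs enter the objective; the remaining 19 dimensions
    # just inflate the dimensionality of the search space
    return hartmann6(x[..., :6])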

I should probably use the same initial random samples to compare the methods; I will try to remember that next time.

Next I will compare the BoFire surrogate with the botorch surrogate, as you suggested.

@R-M-Lee commented Sep 13, 2024

Update (mostly for myself):

The new priors are not yet in the PyPI version of botorch (0.11.3). There, you indeed get

SingleTaskGP(
  (likelihood): GaussianLikelihood(
    (noise_covar): HomoskedasticNoise(
      (noise_prior): GammaPrior()
      (raw_noise_constraint): GreaterThan(1.000E-04)
    )
  )
  (mean_module): ConstantMean()
  (covar_module): ScaleKernel(
    (base_kernel): MaternKernel(
      (lengthscale_prior): GammaPrior()
      (raw_lengthscale_constraint): Positive()
    )
    (outputscale_prior): GammaPrior()
    (raw_outputscale_constraint): Positive()
  )
)

and the lengthscale prior in the Matern kernel is a Gamma(3, 6):

>>> [_ for _ in inferred_noise_model.covar_module.sub_kernels()][0].lengthscale_prior.state_dict()
OrderedDict([('concentration', tensor(3., dtype=torch.float64)),
             ('rate', tensor(6., dtype=torch.float64))])

I'll compare with the botorch main branch

@R-M-Lee commented Sep 13, 2024

OK, I did the comparison now with some random train and test data. I think the results are pretty convincing that we have implemented it correctly (or at least in line with the botorch main branch):

[image: predictions of the BoFire surrogate vs. the botorch main-branch model on random test data]

So I'll clean up, take care of the feature importance functions and the hyperconfig, and then you can review.

@jduerholt

So I'll clean up, take care of the feature importance functions and the hyperconfig, and then you can review.

Thank you very much!

@R-M-Lee commented Sep 26, 2024

The question would then be which priors to include in the hyperopt. We could go for all three (MBO, ThreeSix, Hvarfner) and combine them with an additional input called outputscale that toggles in the hyperopt whether the kernel should be wrapped in an outputscale or not. But this could also be done in a separate PR.

I think your suggestion is good and we should do it like that. So I put that in too.

@R-M-Lee commented Nov 7, 2024

Hi @jduerholt, quick reminder: this is still waiting for your review

@jduerholt

Oh, sorry, I totally forgot. I will review it tomorrow. Can you bring the branch up to date with the recent changes in BoFire?

@R-M-Lee commented Nov 8, 2024

OK, I think it's up to date now, so feel free to give your thoughts when you can, @jduerholt.

@jduerholt left a comment

Looks good to me. Thank you! Only some minor comments. Best, Johannes

self.gaussians = gaussians
self.prefactors = prefactors

Contributor:

What is the purpose of these changes to Branin?

BOTORCH_SCALE_PRIOR(),
THREESIX_NOISE_PRIOR(),
THREESIX_LENGTHSCALE_PRIOR(),
THREESIX_SCALE_PRIOR(),

Contributor:

We could also add the Hvarfner priors to this GP and its hyperconfig. But this can also be done when needed, in a separate PR.

)
surrogate_data.noise_prior = noise_prior

# Define a kernel that wraps the base kernel in a scale kernel if necessary
def outer_kernel(base_kernel, outputscale_prior, use_scale) -> AnyKernel:

Contributor:

Can you also add type hints to this function?
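
A possible version with hints, sketched here (AnyPrior and the ScaleKernel field names are assumptions on my side; imports are taken to be the ones already available in the module):

def outer_kernel(
    base_kernel: AnyKernel,
    outputscale_prior: AnyPrior,
    use_scale: bool,
) -> AnyKernel:
    # wrap the base kernel in a ScaleKernel only when an outputscale is requested
    if use_scale:
        return ScaleKernel(base_kernel=base_kernel, outputscale_prior=outputscale_prior)
    return base_kernel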
