
sketch of how default kernel will change #435

Draft · wants to merge 10 commits into main
Conversation

@R-M-Lee commented Aug 27, 2024

We started discussing a little bit here:
#430

Briefly: we will change the default priors (actually it's a bit more than that: the default kernel, really) to reflect the findings of the paper "Vanilla Bayesian Optimization Performs Great in High Dimensions" and the corresponding changes being made by the botorch team ("Update the default SingleTaskGP prior" and "Use Standardize by default for SingleTaskGP").

Here is a first, very rough commit to check; I think it would have the desired effect:

  • dimensionality-scaled lengthscale priors
  • new noise prior
  • no outputscale, but the output is standardized by default (sketched below)
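
For orientation, here is a minimal sketch of what that setup could look like in gpytorch/botorch terms. The dimension-scaled LogNormal priors and the RBF base kernel follow my reading of the Hvarfner paper and current botorch main; the exact constants and names are assumptions on my part, not taken from this PR's diff:

import math

from botorch.models.transforms.outcome import Standardize
from gpytorch.constraints import GreaterThan
from gpytorch.kernels import RBFKernel
from gpytorch.likelihoods import GaussianLikelihood
from gpytorch.priors import LogNormalPrior

d = 20  # input dimensionality (example value)

# dimensionality-scaled lengthscale prior, roughly LogNormal(sqrt(2) + log(d)/2, sqrt(3))
lengthscale_prior = LogNormalPrior(loc=math.sqrt(2) + 0.5 * math.log(d), scale=math.sqrt(3))

# new noise prior, roughly LogNormal(-4, 1), keeping the usual lower bound on the inferred noise
likelihood = GaussianLikelihood(
    noise_prior=LogNormalPrior(loc=-4.0, scale=1.0),
    noise_constraint=GreaterThan(1e-4),
)

# no ScaleKernel (outputscale) around the base kernel; instead the outputs are
# standardized by default, e.g. via botorch's Standardize outcome transform
covar_module = RBFKernel(ard_num_dims=d, lengthscale_prior=lengthscale_prior)
outcome_transform = Standardize(m=1)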

If this makes sense (@jduerholt), then I can clean it up and make the associated changes, e.g., in SingleTaskGPHyperconfig.

Proposal: we just change the values of BOTORCH_NOISE_PRIOR and BOTORCH_LENGTHCALE_PRIOR, use them as defaults, and remove the HVARFNER stuff

@jduerholt

Hi Robert,

Looks good to me.

Let us rename the botorch priors to ThreeSixPrior, as they are often called that in the literature.

Why not keep the Hvarfner name? It is from his paper ;)

The question would then be which priors to include in the hyperopt. We could go for all three (MBO, ThreeSix, Hvarfner) and combine them with an additional input called outputscale that toggles in the hyperopt whether the kernel should be wrapped in an outputscale or not. But this could also be done in a separate PR.

The current PR is failing due to the lengthscale feature importance calculation, as it expects a lengthscale-based kernel inside a scale kernel. It has to be adjusted so that it also handles the case where the kernel is a bare lengthscale kernel. But this is just one further if expression.
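
A minimal sketch of that extra branch (illustrative only; the helper name is made up and the actual importance code in BoFire looks different):

from gpytorch.kernels import Kernel, ScaleKernel

def _lengthscale_kernel(kernel: Kernel) -> Kernel:
    # with the old default the lengthscale-based kernel sits inside a ScaleKernel,
    # with the new default it is the covariance module itself
    if isinstance(kernel, ScaleKernel):
        return kernel.base_kernel
    return kernel

# the feature importances would then be read from _lengthscale_kernel(covar_module).lengthscale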

Best,

Johannes

lengthscale_prior=BOTORCH_LENGTHCALE_PRIOR(),
),
outputscale_prior=BOTORCH_SCALE_PRIOR(),
default_factory=lambda: MaternKernel(

Contributor:

In the original paper, and I think also in botorch, they are using an RBF kernel, right?

@R-M-Lee commented Aug 29, 2024

Thanks for the feedback. OK, then I'll clean this up and ask you to have another look later.

@R-M-Lee commented Sep 5, 2024

Hi @jduerholt (maybe also interesting for @bertiqwerty),

I have not yet modified the lengthscale feature importance calculation or the hyperopt config; I just wanted to convince myself first that the new priors work well. So I used the new multinorm benchmark problem with 20 dimensions to see whether the new priors give a performance boost. In this one test they actually make it worse (this is a maximization problem):

[image: benchmark results on the 20-dimensional multinorm problem, new vs. old priors]

Ideas? Either a different problem to confirm the performance, or something I am missing in the surrogate definition (although it seems simple enough to me). Maybe I should use one of the test functions from the Hvarfner paper.

The script I used to generate the data in the picture is here

@jduerholt

Hi @R-M-Lee,

very interesting finding.

When I implemented the priors originally (which could of course be buggy), I tested them only on ZDT1 and saw some improvement there in comparison to the usual priors. You can find it here: https://github.com/experimental-design/bofire/blob/main/tutorials/benchmarks/011-ZDT1.ipynb

SAASBO still outperforms it there. Another high-dimensional function that we have in BoFire is the thirty-dimensional Branin function (https://github.com/experimental-design/bofire/blob/main/tutorials/benchmarks/006-30dimBranin.ipynb). I am currently running some tests there. Unfortunately, they use other functions in the paper, so I think it makes sense to run our implementation on one of the functions from the paper.

Another possibility would be to run plain botorch on the benchmark from above. What we can definitely check is that the botorch model created by our surrogate looks the same (kernel, priors, etc.) as the one that is currently set up by default in botorch (using the Hvarfner priors).
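
A rough sketch of that structural check on the plain-botorch side (random data just to instantiate the model; pulling the fitted gpytorch model out of the BoFire surrogate for the comparison is left out here):

import torch
from botorch.models import SingleTaskGP

train_X = torch.rand(20, 25, dtype=torch.double)
train_Y = train_X[:, :1].sin()

model = SingleTaskGP(train_X, train_Y)
# printing the model shows the kernel, priors, and constraints that botorch sets up by
# default; this is what the model built by our surrogate should match
print(model)
print(model.covar_module)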

Best,

Johannes

@jduerholt

For the 30-dim Branin, the new priors perform slightly better:
[image: benchmark results on the 30-dim Branin, new vs. old priors]

@R-M-Lee commented Sep 10, 2024

Hmm, I'm still not confident enough in the new priors to finalize this PR. Here is the result of using the Hartmann function in 25D (only the first 6 dimensions are influential; the others are just there to inflate the dimensionality). I have plotted the distribution of responses at each iteration, not the "best so far". It's clear from this that the Hvarfner priors, at least as we have them implemented, lead to a lot more exploration. Maybe this is the desired behavior; there's an argument in the Hvarfner paper about EI plus short lengthscale priors leading to proposals close to the incumbent, so maybe that's just what we see in this plot:

[image: per-iteration response distributions on the 25D Hartmann problem, Hvarfner vs. previous priors]

Here is the typical staircase (cummin) plot. Now the new and old priors are pretty indistinguishable:

[image: best-so-far (cummin) staircase plot on the 25D Hartmann problem]
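
For reference, a minimal sketch of the 25D setup described above (just an illustration of "only the first 6 dimensions are influential"; not the actual benchmark code used for these plots):

import torch
from botorch.test_functions import Hartmann

hartmann6 = Hartmann(dim=6)

def hartmann_25d(x: torch.Tensor) -> torch.Tensor:
    # only the first six inputs enter the objective; the remaining 19 dimensions
    # just inflate the dimensionality of the search space
    return hartmann6(x[..., :6])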

I should probably use the same initial random samples to compare the methods; I will try to remember that next time.

Next I will compare the BoFire surrogate with the botorch surrogate, as you suggested.

@R-M-Lee commented Sep 13, 2024

Update (mostly for myself):

The new priors are not yet in the PyPI version of botorch (0.11.3). There, you indeed get

SingleTaskGP(
  (likelihood): GaussianLikelihood(
    (noise_covar): HomoskedasticNoise(
      (noise_prior): GammaPrior()
      (raw_noise_constraint): GreaterThan(1.000E-04)
    )
  )
  (mean_module): ConstantMean()
  (covar_module): ScaleKernel(
    (base_kernel): MaternKernel(
      (lengthscale_prior): GammaPrior()
      (raw_lengthscale_constraint): Positive()
    )
    (outputscale_prior): GammaPrior()
    (raw_outputscale_constraint): Positive()
  )
)

and the lengthscale prior in the Matern kernel is a Gamma(3, 6):

>>> [_ for _ in inferred_noise_model.covar_module.sub_kernels()][0].lengthscale_prior.state_dict()
OrderedDict([('concentration', tensor(3., dtype=torch.float64)),
             ('rate', tensor(6., dtype=torch.float64))])

I'll compare with the botorch main branch

@R-M-Lee commented Sep 13, 2024

OK, I did the comparison now with some random train and test data. I think the results are pretty convincing that we have implemented it correctly (or at least in line with the botorch main branch):

[image: predictions of the BoFire surrogate vs. the botorch main-branch model on random test data]

So I'll clean up, take care of the feature importance functions and the hyperconfig, and then you can review.

@jduerholt

So I'll clean up, take care of the feature importance functions and the hyperconfig, and then you can review.

Thank you very much!

@R-M-Lee commented Sep 26, 2024

The question would then be which priors to include in the hyperopt. We could go for all three (MBO, ThreeSix, Hvarfner) and combine them with an additional input called outputscale that toggles in the hyperopt whether the kernel should be wrapped in an outputscale or not. But this could also be done in a separate PR.

I think your suggestion is good and we should do it like that. So I put that in too.

@R-M-Lee commented Nov 7, 2024

Hi @jduerholt, quick reminder: this is still waiting for your review

@jduerholt

Oh, sorry, I totally forgot. I will review it tomorrow. Can you bring the branch up to date with the recent changes in BoFire?

@R-M-Lee commented Nov 8, 2024

OK, I think it's up to date now, so feel free to give your thoughts when you can, @jduerholt.

@jduerholt left a comment

Looks good to me. Thank you! Only some minor comments. Best, Johannes

self.gaussians = gaussians
self.prefactors = prefactors

Contributor:

What is the purpose of these changes to Branin?

BOTORCH_SCALE_PRIOR(),
THREESIX_NOISE_PRIOR(),
THREESIX_LENGTHSCALE_PRIOR(),
THREESIX_SCALE_PRIOR(),

Contributor:

We could also add the Hvarfner priors to this GP and its hyperconfig. But this can also be done when needed, in a separate PR.

)
surrogate_data.noise_prior = noise_prior

# Define a kernel that wraps the base kernel in a scale kernel if necessary
def outer_kernel(base_kernel, outputscale_prior, use_scale) -> AnyKernel:

Contributor:

Can you also add type hints to this function?
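
A possible version with hints, sketched here (AnyPrior and the ScaleKernel field names are assumptions on my side; imports are taken to be the ones already available in the module):

def outer_kernel(
    base_kernel: AnyKernel,
    outputscale_prior: AnyPrior,
    use_scale: bool,
) -> AnyKernel:
    # wrap the base kernel in a ScaleKernel only when an outputscale is requested
    if use_scale:
        return ScaleKernel(base_kernel=base_kernel, outputscale_prior=outputscale_prior)
    return base_kernel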
