sketch of how default kernel will change #435
base: main
Conversation
Hi Robert, looks good to me. Let us rename the botorch priors to ... Why not keep the ...? The question would then be which priors to include in the hyperopt. We could go for all three (MBO, ThreeSix, Hvarfner) and combine it with an additional input called ...

The current PR is failing due to the lengthscale feature importance calculation, as it expects a lengthscale-based kernel inside a scale kernel. It has to be adjusted so that it also understands the case of a pure kernel without the scale wrapper, but this is just one further if expression (see the sketch below).

Best, Johannes
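A minimal sketch of the extra branch described above, assuming a gpytorch-style kernel layout; the helper name and its surroundings are hypothetical, not BoFire's actual feature-importance code:

```python
from gpytorch.kernels import Kernel, ScaleKernel


def extract_lengthscales(covar_module: Kernel):
    """Hypothetical helper: return the ARD lengthscales of a fitted kernel."""
    if isinstance(covar_module, ScaleKernel):
        # old default: a lengthscale-based kernel wrapped in a ScaleKernel
        return covar_module.base_kernel.lengthscale
    # new default: a "pure" kernel without an outer ScaleKernel
    return covar_module.lengthscale
```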
lengthscale_prior=BOTORCH_LENGTHCALE_PRIOR(),
),
outputscale_prior=BOTORCH_SCALE_PRIOR(),
default_factory=lambda: MaternKernel( |
In the original paper, and I think also in botorch, they are using an RBF kernel, right?
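For reference, a hedged sketch of what the paper and the botorch change use, as far as I understand them: an RBF kernel without an outer ScaleKernel and a lengthscale prior whose location grows with the input dimension. The constants below are my reading of the paper, not values taken from this PR.

```python
import math

from gpytorch.kernels import RBFKernel
from gpytorch.priors import LogNormalPrior

d = 25  # number of input dimensions
# dimension-scaled lengthscale prior: loc = sqrt(2) + log(d) / 2, scale = sqrt(3)
lengthscale_prior = LogNormalPrior(
    loc=math.sqrt(2.0) + 0.5 * math.log(d),
    scale=math.sqrt(3.0),
)
# note: no ScaleKernel around it, i.e. no learned outputscale
kernel = RBFKernel(ard_num_dims=d, lengthscale_prior=lengthscale_prior)
```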
Thanks for the feedback. OK, then I'll clean this up and ask you to have another look later.
Hi @jduerholt (maybe also interesting for @bertiqwerty),

I have not yet modified the lengthscale feature importance calculation or the hyperopt config... I just wanted to convince myself that the new priors work well. So I used the new multinorm benchmark problem with 20 dimensions to see whether the new priors give a performance boost. In this one test they actually make it worse (this is a maximization problem).

Ideas? Either a different problem to confirm the performance, or something I am missing in the surrogate definition (although it seems simple enough to me). Maybe I should use one of the test functions from the Hvarfner paper. The script I used to generate the data in the picture is here.
Hi @R-M-Lee, very interesting finding. When I implemented the priors originally (which could of course be buggy), I tested them only on ZDT1 and saw some improvement there in comparison to the usual priors. You can find it here: https://github.com/experimental-design/bofire/blob/main/tutorials/benchmarks/011-ZDT1.ipynb SAASBO still outperforms it there. Another high-dimensional function that we have in BoFire is the thirty-dimensional Branin function (https://github.com/experimental-design/bofire/blob/main/tutorials/benchmarks/006-30dimBranin.ipynb). I am currently running some tests there. Unfortunately, they use other functions in the paper, so I think it makes sense to run our implementation on one of the functions from the paper. Another possibility would be to run plain botorch on the benchmark from above. What we can definitely check is that the botorch model created by our surrogate looks the same (kernel, priors, etc.) as the one which is currently set up by default in botorch (using the Hvarfner priors); see the sketch below. Best, Johannes
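A rough sketch of such a check, assuming the fitted BoFire surrogate exposes its underlying gpytorch/botorch model (the `surrogate.model` attribute is an assumption on my side): build a plain `SingleTaskGP` on the same data and compare kernel structure, priors, and likelihood by eye.

```python
import torch
from botorch.models import SingleTaskGP

train_X = torch.rand(20, 25, dtype=torch.float64)
train_Y = train_X[:, :1].sin()

# reference model: whatever the installed botorch uses by default
reference = SingleTaskGP(train_X, train_Y)
print(reference.covar_module)  # kernel structure incl. registered priors
print(reference.likelihood)    # noise model and its prior

# assumed attribute: the gpytorch model held by the fitted BoFire surrogate
# print(surrogate.model.covar_module)
# print(surrogate.model.likelihood)
```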
hmm, I'm still not sure enough about the new priors to finalize this PR... Here is the result of using the Hartmann function in 25D (only the first 6 dimensions are influential; the others are just there to inflate the dimensionality). I have plotted the distribution of responses at each iteration, not the "best so far". So it's clear from this that the Hvarfner priors, at least as we have them implemented, lead to a lot more exploration. Maybe this is the desired behavior; there's an argument in the Hvarfner paper about EI + short lengthscale priors leading to proposals close to the incumbent - maybe that's just what we see in this plot.

Here is the typical staircase (cummin) plot. Now the new and old priors are pretty indistinguishable. I should probably use the same initial random samples to compare the methods; will try to remember that next time.

Next I will compare the bofire surrogate with the botorch surrogate as you suggest.
Update (mostly for myself): the new priors are not yet in the PyPI version of botorch (0.11.3). There you indeed still get the old defaults, and the lengthscale prior in the Matern kernel is Gamma(3, 6).

I'll compare with the botorch main branch.
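A quick way to check which defaults the installed botorch version actually uses (a sketch; the attribute path assumes the 0.11.x ScaleKernel-around-Matern layout):

```python
import botorch
import torch
from botorch.models import SingleTaskGP

print(botorch.__version__)  # e.g. 0.11.3 from PyPI

model = SingleTaskGP(
    torch.rand(10, 5, dtype=torch.float64),
    torch.rand(10, 1, dtype=torch.float64),
)
print(model.covar_module)  # 0.11.3: ScaleKernel wrapping a MaternKernel
prior = model.covar_module.base_kernel.lengthscale_prior
print(prior.concentration, prior.rate)  # expected Gamma(3.0, 6.0) on 0.11.3
```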
OK, I did the comparison now with some random train and test data. I think the results are pretty convincing that we have it implemented correctly (or at least in line with the botorch main branch). So I'll clean up, take care of the feature importance functions and the hyperconfig, and then you can review.
Thank you very much!
I think your suggestion is good and we should do it like that. So I put that in too.
Hi @jduerholt, quick reminder: this is still waiting for your review.
Oh, sorry. Totally forgot it. I will review it tomorrow. Can you make the branch up to date with the recent changes in BoFire?
OK, I think it's up to date now, so feel free to give your thoughts when you can, @jduerholt.
Looks good to me. Thank you! Only some minor comments. Best, Johannes
self.gaussians = gaussians
self.prefactors = prefactors
What is the purpose of these changes to Branin?
BOTORCH_SCALE_PRIOR(),
THREESIX_NOISE_PRIOR(),
THREESIX_LENGTHSCALE_PRIOR(),
THREESIX_SCALE_PRIOR(),
We could also add the Hvarfner priors to this GP and its hyperconfig, but that can also be done when needed in a separate PR.
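A hedged sketch of what extending the hyperconfig could look like; the keys, categories, and import paths below follow my recollection of BoFire's SingleTaskGPHyperconfig and are not checked against the code:

```python
# Assumed layout: a CategoricalInput "prior" choosing the prior family,
# extended with a new "hvarfner" category.
from bofire.data_models.domain.api import Inputs
from bofire.data_models.features.api import CategoricalInput

hyperopt_inputs = Inputs(
    features=[
        CategoricalInput(key="kernel", categories=["rbf", "matern_1.5", "matern_2.5"]),
        CategoricalInput(key="prior", categories=["mbo", "threesix", "hvarfner"]),
        CategoricalInput(key="ard", categories=["True", "False"]),
    ]
)
```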
)
surrogate_data.noise_prior = noise_prior

# Define a kernel that wraps the base kernel in a scale kernel if necessary
def outer_kernel(base_kernel, outputscale_prior, use_scale) -> AnyKernel:
Can you also add type hints to this function?
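A possible type-hinted version of the helper from the diff above. This is only a sketch: `AnyKernel` appears in the diff, while `AnyPrior` and the import paths are my assumptions about BoFire's data models.

```python
# Assumed import paths and the AnyPrior union type; adjust to the actual code.
from bofire.data_models.kernels.api import AnyKernel, ScaleKernel
from bofire.data_models.priors.api import AnyPrior


def outer_kernel(base_kernel: AnyKernel, outputscale_prior: AnyPrior, use_scale: bool) -> AnyKernel:
    # wrap the base kernel in a ScaleKernel only when an output scale is wanted
    if use_scale:
        return ScaleKernel(base_kernel=base_kernel, outputscale_prior=outputscale_prior)
    return base_kernel
```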
We started discussing a little bit here:
#430
Briefly: we will change the default priors (actually it's a bit more than that... the default kernel, really) to reflect the findings of the paper Vanilla Bayesian Optimization Performs Great in High Dimensions and the corresponding changes being made by the botorch team (Update the default SingleTaskGP prior and Use Standardize by default for SingleTaskGP).
Here is a first very rough commit to check; I think that this would have the desired effect.
If this makes sense (@jduerholt), then I can clean it up and make the associated changes, e.g., in SingleTaskGPHyperconfig.
Proposal: we just change the values of BOTORCH_NOISE_PRIOR and BOTORCH_LENGTHCALE_PRIOR, use them as defaults, and remove the HVARFNER stuff.
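For concreteness, a hedged sketch in plain gpytorch terms of what "changing the values" of the noise prior could mean, based on my reading of the botorch update (BoFire wraps priors in its own data models, so the actual change would live there; the numbers are assumptions, not taken from this PR). The lengthscale prior would take the dimension-scaled LogNormal form sketched earlier in the thread.

```python
from gpytorch.priors import GammaPrior, LogNormalPrior

# old value behind BOTORCH_NOISE_PRIOR (pre-update botorch default, to my knowledge)
old_noise_prior = GammaPrior(concentration=1.1, rate=0.05)

# proposed new value: the LogNormal noise prior from the botorch update
new_noise_prior = LogNormalPrior(loc=-4.0, scale=1.0)
```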