This repository has been archived by the owner on Mar 20, 2021. It is now read-only.

Training data for successful tuning run #256

Open
mjain2 opened this issue Oct 15, 2019 · 0 comments

mjain2 commented Oct 15, 2019

Hi,

I'm testing out OtterTune with MySQL and TPC-C. I have seen some conflicting issues and documentation about training data, so I wanted to get some clarity on a few questions.

I have forked OtterTune and am using it with Azure MySQL (I made the necessary changes to the parser, code, and configs to enable that). I used LHS (Latin hypercube sampling) to generate ~50 samples for 6 different knobs, then ran a few loops, which resulted in errors or 'not enough training data found' issues. Overall, I have about 100 points in my no_tuning_session, but when I run loops in a new tuning session, it always either (1) generates no recommendation or a blank recommendation, or (2) says there is not enough training data.
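For reference, here is a minimal sketch of the kind of LHS generation described above (50 samples over 6 knobs). It is not OtterTune's own LHS script: the knob names and ranges in `KNOB_RANGES` are hypothetical placeholders, and the helpers `lhs_unit` and `scale_to_knobs` are names made up for this illustration.

```python
import random

# Hypothetical knobs and ranges, for illustration only; the six knobs
# actually tuned are not listed in this issue.
KNOB_RANGES = {
    "innodb_buffer_pool_size": (128 * 2**20, 8 * 2**30),
    "innodb_log_file_size": (48 * 2**20, 2 * 2**30),
    "innodb_io_capacity": (100, 20000),
    "innodb_thread_concurrency": (0, 64),
    "max_connections": (50, 2000),
    "table_open_cache": (400, 8000),
}


def lhs_unit(n_dims, n_samples, rng):
    """Latin hypercube sample in [0, 1)^n_dims, returned as a list of rows."""
    columns = []
    for _ in range(n_dims):
        # One point drawn uniformly inside each of n_samples equal-width strata.
        points = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so strata are paired randomly across dimensions.
        rng.shuffle(points)
        columns.append(points)
    return [[col[i] for col in columns] for i in range(n_samples)]


def scale_to_knobs(unit_rows, knob_ranges):
    """Map unit-cube rows onto integer knob values within each range."""
    names = list(knob_ranges)
    configs = []
    for row in unit_rows:
        cfg = {}
        for name, p in zip(names, row):
            low, high = knob_ranges[name]
            cfg[name] = round(low + p * (high - low))
        configs.append(cfg)
    return configs


rng = random.Random(0)  # fixed seed so the sample set is reproducible
unit_rows = lhs_unit(len(KNOB_RANGES), 50, rng)
samples = scale_to_knobs(unit_rows, KNOB_RANGES)
```

Each knob's range is split into 50 equal strata and every stratum is hit exactly once, so the marginal distribution of each knob is roughly uniform over its range rather than normal.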

1: How do we evaluate whether our training data is good? We are currently using LHS to generate configurations. Are the training points expected to follow a normal distribution?
2: How many training data points are needed, on average? I have seen the tuning pipeline run with 44 training points, even though I had uploaded ~100. I'm assuming this is because there were duplicate knob configurations? If duplicates are filtered out, what's the best way to ensure we have the right amount of data before starting a tuning session?
3: Is there a way to make the upload process faster for LHS samples? With a 5-minute observation period, one upload currently takes ~8-10 minutes, so uploading ~100 points would take a day or two.
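
To make question 2 concrete, below is a rough sketch of a pre-flight check that counts distinct knob configurations among the uploaded points. Whether OtterTune actually filters duplicates this way is only my assumption, and `count_unique_configs` plus the sample data are hypothetical, not part of OtterTune.

```python
def count_unique_configs(configs):
    """Count distinct knob configurations, ignoring key order."""
    seen = set()
    for cfg in configs:
        # Sort items so two dicts with the same knob values compare equal.
        seen.add(tuple(sorted(cfg.items())))
    return len(seen)


# Made-up example data: the first two entries are the same configuration.
uploaded = [
    {"innodb_buffer_pool_size": 1073741824, "max_connections": 200},
    {"max_connections": 200, "innodb_buffer_pool_size": 1073741824},
    {"innodb_buffer_pool_size": 2147483648, "max_connections": 200},
]

print(count_unique_configs(uploaded), "unique of", len(uploaded))
# prints "2 unique of 3"
```

Running a check like this on the configurations before starting a session would show how many distinct points remain if exact duplicates are dropped.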

In general, the tuning pipeline has been failing for me, and it's hard to catch the issues before actually running a tuning session.

Thanks in advance!
