Training data for successful tuning run #256

mjain2 · 2019-10-15T21:40:31Z

Hi,

I'm testing OtterTune with MySQL and TPC-C. I have seen some conflicting issues and documentation about training data, so I wanted to get some clarity on a few questions.

I have forked OtterTune and am using it with Azure MySQL (I made the necessary changes to the parser, code, and configs to support it). I used Latin hypercube sampling (LHS) to generate ~50 samples over 6 knobs, then ran a few loops, which ended either in errors or in 'not enough training data found' messages. Overall I have about 100 data points in my no_tuning_session, but when I run loops in a new tuning session, it always either (1) generates no recommendation or a blank one, or (2) reports that there is not enough training data.
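For reference, the LHS step described above can be sketched as follows. This is a minimal, stdlib-only sketch, not OtterTune's own sampling code; the bounds in the usage line are hypothetical. Each knob's range is cut into as many equal strata as there are samples, every stratum gets exactly one value, and the strata are shuffled independently per knob. As a result, each knob's values are spread roughly uniformly over its range rather than following a normal distribution.

```python
import random


def latin_hypercube(n_samples, bounds, seed=None):
    """Latin hypercube sample of n_samples points over (low, high) bounds.

    Each knob's range is split into n_samples equal-width strata; every
    stratum receives exactly one value, and the order of the strata is
    shuffled independently for each knob.
    """
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        width = (high - low) / n_samples
        # One uniform draw inside stratum i: [low + i*width, low + (i+1)*width)
        column = [low + (i + rng.random()) * width for i in range(n_samples)]
        # Shuffle so strata of different knobs are paired at random
        rng.shuffle(column)
        columns.append(column)
    # Transpose: one row per sample, one value per knob
    return [list(row) for row in zip(*columns)]


# Hypothetical usage: 50 samples over 6 knobs, each ranging over [0, 50)
samples = latin_hypercube(50, [(0.0, 50.0)] * 6, seed=1)
```

Real knobs usually need their sampled values rounded to a valid granularity (for example whole megabytes), and that rounding can make two rows identical, which may matter for the duplicate question below.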

1: How do we evaluate whether our training data is good? We are currently using LHS to generate configurations. Are the training points expected to follow a normal distribution?
2: How many training data points are needed, on average? I have seen the tuning pipeline run with 44 training points, but I had uploaded ~100. I'm assuming this is because some knob configurations were duplicates? If duplicate configurations are filtered out, what's the best way to make sure we have enough data before starting a tuning session?
3: Is there a way to speed up uploading the LHS samples? With a 5-minute observation period, one upload currently takes ~8-10 min, so uploading ~100 points takes a day or two.
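For a rough sense of scale on question 3, using only the numbers quoted above (~100 uploads at 8-10 minutes each, assuming they run back to back with no idle time):

```math
100 \times (8\ \text{to}\ 10)\ \text{min} = 800\ \text{to}\ 1000\ \text{min} \approx 13.3\ \text{to}\ 16.7\ \text{h}
```

So the per-upload cost alone already accounts for most of a day before any gaps between runs.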

In general, the tuning pipeline keeps failing for me, and it's hard to catch problems before actually running a tuning session.
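One way to catch some of these problems before a tuning session starts is a quick pre-flight summary of the uploaded samples. This is a hypothetical helper, not an OtterTune API: it assumes the samples have been exported as a list of dicts mapping knob name to numeric value, and that the `(low, high)` bounds are the ones used for sampling. It counts distinct knob configurations (relevant if duplicates really are dropped, which question 2 only guesses at) and lists equal-width bins of each knob's range that no sample landed in. In a complete LHS design, an empty bin suggests samples were lost or never uploaded.

```python
from collections import Counter


def summarize_training_samples(samples, knob_bounds, n_bins=10):
    """Pre-flight summary of uploaded knob configurations.

    samples: list of dicts mapping knob name -> numeric value (assumed format)
    knob_bounds: dict mapping knob name -> (low, high) used for sampling
    Values are assumed to lie within their knob's bounds.
    """
    names = sorted(knob_bounds)

    # Count how many samples share exactly the same knob configuration
    configs = Counter(tuple(s[name] for name in names) for s in samples)
    duplicates = sum(count - 1 for count in configs.values())

    # For each knob, list equal-width bins of its range that hold no sample
    empty = {}
    for name in names:
        low, high = knob_bounds[name]
        width = (high - low) / n_bins
        # min(...) puts a value equal to `high` into the last bin
        filled = {min(int((s[name] - low) / width), n_bins - 1) for s in samples}
        empty[name] = [i for i in range(n_bins) if i not in filled]

    return {
        "samples": len(samples),
        "distinct_configs": len(configs),
        "duplicates": duplicates,
        "empty_bins": empty,
    }
```

If `distinct_configs` is well below `samples`, or many bins are empty, the data that actually reaches the tuning pipeline may be much smaller or less spread out than the raw upload count suggests.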

Thanks in advance!
