Hi,

I'm testing out OtterTune with MySQL and TPC-C. I have seen some conflicting issues/documentation on training data, so I just wanted to get some clarity on a few questions I had.
I have forked OtterTune and am using it with Azure MySQL (I made the necessary changes to the parser, code, and configs to enable that). I used LHS to generate ~50 samples across 6 different knobs, then ran a few tuning loops (which ended in errors or "not enough training data found" messages). Overall, I have about 100 data points in my no_tuning_session, but when I run loops in a new tuning session, it always either (1) generates no recommendation / a blank recommendation or (2) says there is not enough training data.
1: How do we evaluate whether our training data is good? We are currently using LHS to generate configurations. Are the training points expected to follow a normal distribution?
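For context, this is roughly how I'm generating the samples, using SciPy's `qmc` module (the knob names and bounds below are placeholders; my real run covers 6 knobs):

```python
# Rough sketch of my LHS sample generation with SciPy's qmc module.
# Knob names and bounds are placeholders for illustration only.
from scipy.stats import qmc

knob_bounds = {
    "innodb_buffer_pool_size":   (128 * 2**20, 8 * 2**30),  # bytes
    "innodb_log_file_size":      (48 * 2**20, 2 * 2**30),   # bytes
    "innodb_thread_concurrency": (0, 64),
}

sampler = qmc.LatinHypercube(d=len(knob_bounds))
unit_points = sampler.random(n=50)             # 50 points in [0, 1)^d
lows = [lo for lo, _ in knob_bounds.values()]
highs = [hi for _, hi in knob_bounds.values()]
configs = qmc.scale(unit_points, lows, highs)  # scale to each knob's range
configs = configs.round().astype(int)          # these knobs are integer-valued
```

My understanding is that LHS stratifies each knob's range uniformly, so I wouldn't expect the points to look normally distributed, but I'd like to confirm what the pipeline actually expects.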
2: How many training data points are needed, on average? I have seen the tuning pipeline run with 44 training points even though I had uploaded ~100. I'm assuming this is because duplicate knob configurations were filtered out? If so, what's the best way to make sure we have enough distinct data before starting a tuning session?
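For example, would a pre-upload check like this (a hypothetical helper, not part of OtterTune) be the right way to verify the distinct-configuration count?

```python
# Hypothetical pre-upload sanity check: count how many *distinct* knob
# configurations are in a batch of samples before uploading them.
import json

def distinct_configs(samples):
    """samples: list of dicts mapping knob name -> value."""
    return len({json.dumps(s, sort_keys=True) for s in samples})

# e.g. generate more LHS points if distinct_configs(samples) < target
```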
3: Is there a way to make the upload process faster for LHS samples? One upload currently takes ~8-10 minutes (with a 5-minute observation period), so ~100 points works out to roughly 13-17 hours of wall-clock time.
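If I collected the results on disk first, would batching the HTTP uploads like this work? (A hypothetical sketch; I'm assuming the driver's usual `new_result` endpoint and result-file layout, which would need adjusting for a real setup.)

```python
# Hypothetical batch uploader for results already collected on disk.
# Endpoint URL, upload code, and file layout below are assumptions.
import concurrent.futures
import glob
import os
import requests

UPLOAD_URL = "http://localhost:8000/new_result/"
UPLOAD_CODE = "YOUR_UPLOAD_CODE"

def upload_one(result_dir):
    # Each result directory holds the four JSON files the driver uploads.
    files = {name: open(os.path.join(result_dir, name + ".json"), "rb")
             for name in ("summary", "knobs",
                          "metrics_before", "metrics_after")}
    try:
        resp = requests.post(UPLOAD_URL, files=files,
                             data={"upload_code": UPLOAD_CODE})
        return result_dir, resp.status_code
    finally:
        for f in files.values():
            f.close()

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    for rdir, status in pool.map(upload_one, sorted(glob.glob("results/*"))):
        print(rdir, status)
```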
In general, the tuning pipeline has been failing for me, and it's hard to catch these issues until a tuning session is already running.
Thanks in advance!