Flaky test TestAuthLeaseTimeToLive
#18585
Comments
Discussed during the sig-etcd triage meeting. The next step for this would be to try and establish a flake rate percentage using `stress`.
I keep reading about this `stress` tool, but I'm not sure how to install or run it.

Might also be good to add this to the contributing guide afterwards so other newbies like me aren't lost 🙂 (I can take care of that!) Sorry for the question being a bit off-topic, I'd really like to help a bit with this issue, but once again I'm confused about the referred-to `stress` tool.
@ghouscht this is the `stress` tool. You can install it by running `go install golang.org/x/tools/cmd/stress@latest`. It would be helpful to clarify this in the contributing guide.
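For context, a minimal way to point `stress` at this particular test could look like the sketch below. The package path and build tag are assumptions about where `TestAuthLeaseTimeToLive` lives, so adjust them to the actual location in the repo:

```sh
# Build a test binary once (package path and -tags value are assumptions;
# adjust to the package that actually contains TestAuthLeaseTimeToLive).
go test -c -tags e2e -o /tmp/auth.test ./tests/common/

# Run the single test over and over in parallel processes; stress prints a
# running count of runs and failures plus the output of each failing run.
stress /tmp/auth.test -test.run 'TestAuthLeaseTimeToLive$'
```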
/assign

Thank you, I opened a small PR that improves the contributing guide with the above info. Now that I know which tool was meant, I'll try to establish the flake rate next.
While preparing to run the tests with `stress`, I ran into a problem: running the tests in e2e mode in parallel doesn't work out of the box, because every test process starts its cluster from the same base port.

To be able to run the tests in e2e mode with `stress`, I made the following modifications in `NewCluster` (this also needs `net` and `strconv` in the imports):

```diff
 func (e e2eRunner) NewCluster(ctx context.Context, t testing.TB, opts ...config.ClusterOption) intf.Cluster {
 	cfg := config.NewClusterConfig(opts...)
+
+	// Ask the kernel for a free port and use it as this cluster's base port,
+	// so parallel test processes don't all start from the same default.
+	lis, err := net.Listen("tcp", ":0")
+	if err != nil {
+		panic(err)
+	}
+	lis.Close()
+
+	_, port, err := net.SplitHostPort(lis.Addr().String())
+	if err != nil {
+		panic(err)
+	}
+
+	t.Logf("using base port: %s", port)
+
+	p, _ := strconv.Atoi(port)
+
 	e2eConfig := NewConfig(
 		WithClusterSize(cfg.ClusterSize),
 		WithQuotaBackendBytes(cfg.QuotaBackendBytes),
 		WithStrictReconfigCheck(cfg.StrictReconfigCheck),
 		WithAuthTokenOpts(cfg.AuthToken),
 		WithSnapshotCount(cfg.SnapshotCount),
+		WithBasePort(p),
 	)
 	if cfg.ClusterContext != nil {
```

In 1021 runs, I saw 37 failures in total; 3 of them were due to the described issue and the remaining 34 were due to port conflicts (a race condition between asking the kernel for a free port and actually using it in the parallel test runs). So that is roughly a 0.3% failure rate for the given test.
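As an aside, those port-conflict failures come from the harness itself rather than from the flake under investigation, so they could be kept out of the numbers. A sketch of how that might look, building on the earlier `stress` invocation (the "address already in use" pattern is an assumption about the error text, not verified against the actual runs):

```sh
# Ignore failures whose output only shows the harness racing for ports
# (failure pattern is an assumption; adjust to the real bind error text).
stress -ignore 'address already in use' /tmp/auth.test -test.run 'TestAuthLeaseTimeToLive$'

# Or run fewer processes in parallel so the free-port race is less likely.
stress -p 2 /tmp/auth.test -test.run 'TestAuthLeaseTimeToLive$'
```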
Some more observations I made in the meantime. When this happens, both the client and the etcd server log errors related to the auth token.

This can be seen in the prow job output as well.

Edit: apparently the token TTL is too short here. I think increasing the token TTL a bit should solve the issue. We should probably also think of improving the …
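If the fix ends up being a longer token TTL, one knob for that (assuming the tests use the simple token provider) is etcd's `--auth-token-ttl` flag, which controls how long an auth token stays valid. This is only a sketch of the idea, not the change that was actually merged:

```sh
# Start etcd with simple auth tokens that stay valid for 10 minutes;
# whether the e2e test framework exposes this knob is an assumption,
# so treat this purely as an illustration of the setting.
etcd --auth-token simple --auth-token-ttl 600
```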
Hey @lucasrod16, now that my PR was merged I think we can close this issue, do you agree? I also had a quick look at the etcd test grid and to me the test seems to be stable now.
Which Github Action / Prow Jobs are flaking?
e2e-386
Which tests are flaking?
TestAuthLeaseTimeToLive
Github Action / Prow Job link
https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/etcd-io_etcd/18574/pull-etcd-e2e-386/1834665788858961920
Reason for failure (if possible)
Anything else we need to know?
No response