Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(e2e): extend logging test to 3 node cluster #18806

Merged
merged 1 commit into from
Oct 30, 2024

Conversation

ghouscht
Copy link
Contributor

See also #17329

This is how I would extend the logging test to include both a single node and a three node cluster. If it is ok like that I'd submit PRs for backporting as written in the linked issue

Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.

@k8s-ci-robot
Copy link

Hi @ghouscht. Thanks for your PR.

I'm waiting for a etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ghouscht ghouscht force-pushed the issue-17329 branch 2 times, most recently from cd7b0f9 to 7b37673 Compare October 29, 2024 11:45
@codecov-commenter
Copy link

codecov-commenter commented Oct 29, 2024

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 68.75%. Comparing base (3696b32) to head (8d094d2).

Current head 8d094d2 differs from pull request most recent head cb19f64

Please upload reports for the commit cb19f64 to get more accurate results.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

see 21 files with indirect coverage changes

@@            Coverage Diff             @@
##             main   #18806      +/-   ##
==========================================
+ Coverage   68.73%   68.75%   +0.02%     
==========================================
  Files         420      420              
  Lines       35512    35512              
==========================================
+ Hits        24410    24418       +8     
+ Misses       9675     9663      -12     
- Partials     1427     1431       +4     

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3696b32...cb19f64. Read the comment docs.


var lines []string
for i := range tc.clusterSize {
lines = append(lines, epc.Procs[i].Logs().Lines()...)
Copy link
Member

@serathius serathius Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think we need to collect log lines after we close the server, did you encounter error with that?

If I remember safely get logs from closed process you need to get handle of logs epc.Procs[i].Logs() before closing process and call .Lines() after closing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did encounter the allowed error but apparently not all the time. It indeed does make a difference if .Lines() is called after closing. By doing so the resulting log lines seem to be constant. Will update the PR.

{
name: "single node cluster",
clusterSize: 1,
allowedErrors: map[string]bool{"setting up serving from embedded etcd failed.": true},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to followup why we get this log.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, saw the comment in the issue. From what I saw in release-3.5 branch this does not happen. I'll have a look where exactly it comes from.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, thanks @ghouscht !

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that there are multiple sources for this error. One example is:

{"level":"error","ts":"2024-10-30T09:17:44.048443+0100","caller":"embed/etcd.go:875","msg":"setting up serving from embedded etcd failed.","error":"http: Server closed" ...

and that is because the errHandler function (https://github.com/ghouscht/etcd/blob/issue-17329/server/embed/etcd.go#L873-L875) does not check for the http.ErrServerClosed error. This error is always returned by Serve methods after a call to Close or Shutdown: https://pkg.go.dev/net/http#ErrServerClosed.

After ignoring the error mentioned above I also saw this:

{"level":"error","ts":"2024-10-30T09:17:44.048443+0100","caller":"embed/etcd.go:875","msg":"setting up serving from embedded etcd failed.","error":"accept tcp 127.0.0.1:20000: use of closed network connection" ...

Currently trying to figure out where this comes from.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error logs are generated when closing/shutting down etcd. The listeners are closed firstly, then the accepter will return errors. They should be expected. One possible additional test case is to verify that there is no any error logs before closing etcd, which of course can be discussed & implemented separately.

{"level":"info","ts":"2024-10-30T08:25:35.363864Z","caller":"osutil/interrupt_unix.go:65","msg":"received signal; shutting down","signal":"interrupt"}
{"level":"info","ts":"2024-10-30T08:25:35.363960Z","caller":"embed/etcd.go:398","msg":"closing etcd server","name":"default","data-dir":"default.etcd","advertise-peer-urls":["http://localhost:2380"],"advertise-client-urls":["http://localhost:2379"]}
......
{"level":"error","ts":"2024-10-30T08:25:35.364157Z","caller":"embed/etcd.go:874","msg":"setting up serving from embedded etcd failed.","error":"http: Server closed","stacktrace":"go.etcd.io/etcd/server/v3/embed.(*Etcd).errHandler\n\tgo.etcd.io/etcd/server/v3/embed/etcd.go:874\ngo.etcd.io/etcd/server/v3/embed.(*serveCtx).serve.func3\n\tgo.etcd.io/etcd/server/v3/embed/serve.go:176"}
{"level":"error","ts":"2024-10-30T08:25:35.364250Z","caller":"embed/etcd.go:874","msg":"setting up serving from embedded etcd failed.","error":"accept tcp 127.0.0.1:2379: use of closed network connection","stacktrace":"go.etcd.io/etcd/server/v3/embed.(*Etcd).errHandler\n\tgo.etcd.io/etcd/server/v3/embed/etcd.go:874\ngo.etcd.io/etcd/server/v3/embed.(*Etcd).serveClients.func1\n\tgo.etcd.io/etcd/server/v3/embed/etcd.go:776"}
...
{"level":"error","ts":"2024-10-30T08:25:35.394822Z","caller":"embed/etcd.go:874","msg":"setting up serving from embedded etcd failed.","error":"accept tcp 127.0.0.1:2380: use of closed network connection","stacktrace":"go.etcd.io/etcd/server/v3/embed.(*Etcd).errHandler\n\tgo.etcd.io/etcd/server/v3/embed/etcd.go:874\ngo.etcd.io/etcd/server/v3/embed.(*Etcd).servePeers.func3\n\tgo.etcd.io/etcd/server/v3/embed/etcd.go:614"}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is what I figured out as well. In that case these errors are expected and we could simply adapt the errorHandler function to ignore them, e.g. like this:

	if err != nil &&
		!errors.Is(err, http.ErrServerClosed) &&
		!errors.Is(err, net.ErrClosed) {
		e.GetLogger().Error("setting up serving from embedded etcd failed.", zap.Error(err))
	}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's difficult to be certain whether http.ErrServerClose or net.ErrClosed is expected by just one error itself. Suggest not to change the production code for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ahrtr
Copy link
Member

ahrtr commented Oct 30, 2024

/ok-to-test

Copy link
Member

@ahrtr ahrtr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Thanks

@k8s-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahrtr, ghouscht, serathius

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

5 participants