Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move tls handshake errors to debug #5766

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

ripienaar
Copy link
Contributor

This turned out to be a much bigger undertaking due to how the logs are used in tests

Signed-off-by: R.I.Pienaar [email protected]

@ripienaar ripienaar requested a review from a team as a code owner August 8, 2024 16:51
@derekcollison
Copy link
Member

This feels like it should be a server opt-in to downgrade. I personally would not want this behavior in a production system I would run..

@ripienaar
Copy link
Contributor Author

Yeah its hard to say, Phil and others have indicated these logs dont add any value other than being annoying so feature group decided to downgrade their level, but we could also make it opt in I suppose - though I do not find these useful due to the nature of modern networks with scanners, load balancers and all kind of other annoyances

DummyLogger
errCh chan string
logCh chan string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would prefer instead to have two channels, leaving the errCh as it was and add another debugCh which can be used to capture only those entries on that level

Copy link
Contributor Author

@ripienaar ripienaar Aug 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

considered that but all the tests only enable debugging when there are logs to match - so doing this didnt seem like it would improve anything

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aside these log based checks are universally awful, some of these tests only acts on a message being logged regardless of content - incredibly brittle as we add logs over time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

select {
case <-l.errCh:
gotIt = true
case <-time.After(time.Second):
// Try again
}

@@ -2444,7 +2443,7 @@ func TestClientClampMaxSubsErrReport(t *testing.T) {
s1 := RunServer(o1)
defer s1.Shutdown()

l := &captureErrorLogger{errCh: make(chan string, 10)}
l := &captureLogger{logCh: make(chan string, 10)}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fail to see why all tests that used captureErrorLogger{} need to be changed. For instance, I did just the client modification in client.go, and run this test (without modification) and it passes. Ditto for the gateway test, etc.. I think only tests that did check for TLS handshake should be changed to go to debug, and one can very well create a dedicated capture logger to check that only a given err/debug/whatever message matches before inserting in the channel (to avoid the test itself to have to deal with err/debug/whatever traces that are not relevant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only reason most of them changed is because I renamed the struct as its not anymore a error logger.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated this - didnt rename the struct so I think its incorrectly named now but at least the change set is smaller and added a suggested filter to the one

@ripienaar ripienaar force-pushed the tls_error_debug branch 3 times, most recently from afb0322 to d97757f Compare August 9, 2024 08:28
Copy link
Member

@kozlovic kozlovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ripienaar Sorry for the back and forth. I think it is simpler to create a "dedicated" logger that captures TLS handshake errors and have the tests that were looking for those type of errors (now debug traces, but still an "error") to use that logger instead. Does that make sense?

@@ -337,12 +337,32 @@ func TestLeafNodeTLSRemoteWithNoCerts(t *testing.T) {

type captureErrorLogger struct {
DummyLogger
errCh chan string
filter func(string) bool
Copy link
Member

@kozlovic kozlovic Aug 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not what I meant. We can create a specific logger for the TLS handshake error and modify only the tests that were checking for the handshake error (when using the generic error capture logger). I would say that in most cases we really only care that we got an error (now a debug trace, but still an handshake error), so a channel of struct{} would suffice, but since in one of the test TestLeafNodeTLSConfigReloadForRemote, we check that we get bad certificate, we could still have string.
So, I would add a dedicated logger, and modify the tests. Here is a diff on the addition of the logger and modification of only 2 of the tests (the others would be similar):

+type captureTLSHandshakeError struct {
+       DummyLogger
+       ch chan string
+}
+
+func (c *captureTLSHandshakeError) Debugf(format string, v ...any) {
+       msg := fmt.Sprintf(format, v...)
+       if strings.Contains(msg, "handshake error") {
+               select {
+               case c.ch <- msg:
+               default:
+               }
+       }
+}
+
 func TestLeafNodeTLSConfigReload(t *testing.T) {
        template := `
                listen: 127.0.0.1:-1
@@ -2623,8 +2638,8 @@ func TestLeafNodeTLSConfigReload(t *testing.T) {
        srvA, optsA := RunServerWithConfig(confA)
        defer srvA.Shutdown()
 
-       lg := &captureErrorLogger{errCh: make(chan string, 10)}
-       srvA.SetLogger(lg, false, false)
+       lg := &captureTLSHandshakeError{ch: make(chan string, 10)}
+       srvA.SetLogger(lg, true, false)
 
        confB := createConfFile(t, []byte(fmt.Sprintf(`
                listen: -1
@@ -2654,14 +2669,7 @@ func TestLeafNodeTLSConfigReload(t *testing.T) {
 
        // Wait for the error
        select {
-       case err := <-lg.errCh:
-               // Since Go 1.18, we had to regenerate certs to not have to use GODEBUG="x509sha1=1"
-               // But on macOS, with our test CA certs, no SCTs included, it will fail
-               // for the reason "x509: “localhost” certificate is not standards compliant"
-               // instead of "unknown authority".
-               if !strings.Contains(err, "unknown") && !strings.Contains(err, "compliant") {
-                       t.Fatalf("Unexpected error: %v", err)
-               }
+       case <-lg.ch:
        case <-time.After(2 * time.Second):
                t.Fatalf("Did not get TLS error")
        }
@@ -2696,8 +2704,8 @@ func TestLeafNodeTLSConfigReloadForRemote(t *testing.T) {
        srvA, optsA := RunServerWithConfig(confA)
        defer srvA.Shutdown()
 
-       lg := &captureErrorLogger{errCh: make(chan string, 10)}
-       srvA.SetLogger(lg, false, false)
+       lg := &captureTLSHandshakeError{ch: make(chan string, 10)}
+       srvA.SetLogger(lg, true, false)
 
        template := `
                listen: -1
@@ -2721,7 +2729,7 @@ func TestLeafNodeTLSConfigReloadForRemote(t *testing.T) {
 
        // Wait for the error
        select {
-       case err := <-lg.errCh:
+       case err := <-lg.ch:
                if !strings.Contains(err, "bad certificate") {
                        t.Fatalf("Unexpected error: %v", err)
                }

@@ -117,7 +117,7 @@ func TestClusterTLSInsecure(t *testing.T) {
defer srvA.Shutdown()

l := &captureTLSError{ch: make(chan struct{}, 1)}
srvA.SetLogger(l, false, false)
srvA.SetLogger(l, true, false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to modify the test in the /test package that uses this logger: TestClusterTLSInsecure.

@github-actions github-actions bot added the stale This issue has had no activity in a while label Oct 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale This issue has had no activity in a while
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants