Pool triggerHealthCheck leaves stranded goroutines for 500ms #1641

Open
bgentry opened this issue Jun 13, 2023 · 4 comments

@bgentry
Contributor

bgentry commented Jun 13, 2023

Describe the bug
After releasing all conns back to a pgxpool.Pool and closing the pool, there can still be goroutines left running. That's because triggerHealthCheck gets called upon Release(), and it spawns a goroutine that unconditionally sleeps for 500ms before doing anything else.

This means there is no way to guarantee that a pgxpool.Pool is fully shut down with no lingering goroutines, at least not without sleeping a non-trivial amount of time after Close().
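
For context, the goroutine in question looks roughly like this (paraphrased from pgxpool/pool.go in v5.3.1, not an exact copy):

// Approximate shape of pgxpool's triggerHealthCheck; paraphrased, not verbatim.
func (p *Pool) triggerHealthCheck() {
	go func() {
		// The sleep is unconditional: nothing here watches p.closeChan, so the
		// goroutine can outlive Pool.Close() by up to 500ms.
		time.Sleep(500 * time.Millisecond)
		select {
		case p.healthCheckChan <- struct{}{}: // request a health check
		default: // one is already pending; drop this request
		}
	}()
}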

To Reproduce

I can try to put together a reproduction if it helps. I believe this is triggered by the following steps (a minimal test sketch follows the list):

  1. Running goleak in your test program
  2. Starting up a pgxpool
  3. Acquiring a conn
  4. Releasing the conn
  5. Closing the pool
  6. Observing that goleak sometimes detects open/leaked goroutines when the program exits.
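
A minimal sketch of such a test might look like this (the connection string env var and test name are illustrative, and it assumes a reachable Postgres):

package pgxpoolleak_test

import (
	"context"
	"os"
	"testing"

	"github.com/jackc/pgx/v5/pgxpool"
	"go.uber.org/goleak"
)

func TestPoolCloseLeavesNoGoroutines(t *testing.T) {
	// goleak fails the test if any unexpected goroutines are still alive
	// when this deferred check runs.
	defer goleak.VerifyNone(t)

	ctx := context.Background()
	pool, err := pgxpool.New(ctx, os.Getenv("DATABASE_URL"))
	if err != nil {
		t.Fatal(err)
	}

	conn, err := pool.Acquire(ctx)
	if err != nil {
		t.Fatal(err)
	}
	conn.Release() // Release triggers triggerHealthCheck, which sleeps for 500ms.
	pool.Close()   // The pool is closed, but the sleeping goroutine may still be alive.
}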

Expected behavior
Closing a pgxpool.Pool should leave behind no remnants and no open goroutines.

Actual behavior
Because there is a hardcoded 500ms sleep that doesn't terminate early when, e.g., the pool closes sooner, goroutines are sometimes left running after the pool is closed.

Version

  • Go: $ go version -> go version go1.20.4 darwin/amd64
  • PostgreSQL: $ psql --no-psqlrc --tuples-only -c 'select version()' -> PostgreSQL 14.8 (Homebrew) on x86_64-apple-darwin22.4.0, compiled by Apple clang version 14.0.3 (clang-1403.0.22.14.1), 64-bit
  • pgx: $ grep 'github.com/jackc/pgx/v[0-9]' go.mod -> github.com/jackc/pgx/v5 v5.3.1

Additional context
I did not see any past issues discussing this, so I wanted to at least confirm that it was an intentional design choice and not merely an oversight or bug.

bgentry added the bug label Jun 13, 2023
@jackc
Owner

jackc commented Jun 14, 2023

Because there is a hardcoded 500ms sleep that doesn't terminate early when, e.g., the pool closes sooner, goroutines are sometimes left running after the pool is closed.

But the goroutine terminates after that 500ms, right? Does this behavior cause a problem?

@bgentry
Contributor Author

bgentry commented Jun 14, 2023

@jackc I think the primary issues are:

  1. This goes against a Go programmer's expectation that when they close a resource, it is fully closed. This expectation is IMO strengthened by the docs for Pool.Close(), which imply that all resources have been released by the time the method returns.

  2. When using goleak in test suites to verify that no goroutines are being leaked by the code being tested, pgx generates false positives with one "leaked" goroutine per conn pool that was recently closed:

    goleak: Errors on successful test run: found unexpected goroutines:
    [Goroutine 21898 in state sleep, with time.Sleep on top of the stack:
    goroutine 21898 [sleep]:
    time.Sleep(0x1dcd6500)
    	/opt/hostedtoolcache/go/1.20.4/x64/src/runtime/time.go:195 +0x135
    github.com/jackc/pgx/v5/pgxpool.(*Pool).triggerHealthCheck.func1()
    	/home/runner/go/pkg/mod/github.com/jackc/pgx/v5@v5.3.1/pgxpool/pool.go:379 +0x31
    created by github.com/jackc/pgx/v5/pgxpool.(*Pool).triggerHealthCheck
    	/home/runner/go/pkg/mod/github.com/jackc/pgx/v5@v5.3.1/pgxpool/pool.go:376 +0x8f

     Goroutine 21883 in state sleep, with time.Sleep on top of the stack:
    goroutine 21883 [sleep]:
    time.Sleep(0x1dcd6500)
    	/opt/hostedtoolcache/go/1.20.4/x64/src/runtime/time.go:195 +0x135
    github.com/jackc/pgx/v5/pgxpool.(*Pool).triggerHealthCheck.func1()
    	/home/runner/go/pkg/mod/github.com/jackc/pgx/v5@v5.3.1/pgxpool/pool.go:379 +0x31
    created by github.com/jackc/pgx/v5/pgxpool.(*Pool).triggerHealthCheck
    	/home/runner/go/pkg/mod/github.com/jackc/pgx/v5@v5.3.1/pgxpool/pool.go:376 +0x8f
    

    To avoid these false positives, a user of both pgx and goleak needs to either (a) add a 500ms+ sleep at the end of their tests, or (b) pass this exception when invoking goleak:

    var knownGoroutineLeaks = []goleak.Option{ //nolint:gochecknoglobals
     	// This goroutine contains a 500 ms uninterruptible sleep that may still be
    	// running by the time the test suite finishes and cause a failure.
    	goleak.IgnoreTopFunction("github.com/jackc/pgx/v5/pgxpool.(*Pool).backgroundHealthCheck"),
    }

    This exception is also prone to breaking due to pgx internal refactors, so it's a hacky solution.
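
    For anyone using workaround (b), one way to wire the exception in (assuming the knownGoroutineLeaks variable from the snippet above) is via goleak.VerifyTestMain:

     func TestMain(m *testing.M) {
     	// Apply the pgx-specific ignore option to every test in the package.
     	goleak.VerifyTestMain(m, knownGoroutineLeaks...)
     }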

From a brief look at the code in pool.go, I'm wondering if, instead of an uninterruptible time.Sleep(500*time.Millisecond), the goroutine in triggerHealthCheck() could be refactored to use a timer that is properly canceled & drained when that goroutine exits, and then select on both that timer and p.closeChan? Something like:

t := time.NewTimer(500 * time.Millisecond)

select {
case <-p.closeChan:
	// The pool is closing: stop the timer, draining its channel if it has
	// already fired, and exit without triggering a health check.
	if !t.Stop() {
		<-t.C
	}
	return
case <-t.C:
	// The timer fired normally: request a health check, but don't block if
	// one is already pending.
	select {
	case p.healthCheckChan <- struct{}{}:
	default:
	}
}

This of course doesn't fully close the window in which the goroutine could still be running for a brief moment after Pool.Close() returns; that would probably require a deeper refactor, in particular a way to confirm that this and any other background goroutines have exited.

I also see that elsewhere in the code you're doing something similar to the above pattern, except using <-time.After(). This is another place where a goroutine can be left in a leaked state for up to 500ms. It could also benefit from the above pattern, or you could extract it into a slightly cleaner form with something like this: https://go.dev/play/p/pH0jHo2RZaL
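
The playground code isn't reproduced here, but one possible shape of such a helper (the name and signature are illustrative, not taken from the link) is:

// sleepOrClosed waits for d to elapse or for done to be closed, whichever
// comes first, and reports whether the full duration elapsed.
func sleepOrClosed(d time.Duration, done <-chan struct{}) bool {
	t := time.NewTimer(d)
	defer t.Stop()
	select {
	case <-t.C:
		return true
	case <-done:
		return false
	}
}

The triggerHealthCheck goroutine could then send on p.healthCheckChan only when sleepOrClosed(500*time.Millisecond, p.closeChan) returns true.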

@jackc
Owner

jackc commented Jun 14, 2023

🤷‍♂️

This goes against a Go programmer's expectation that when they close a resource, it is fully closed. This expectation is IMO strengthened by the docs for Pool.Close(), which imply that all resources have been released by the time the method returns.

The docs say all connections are closed, and they are. The fact that health checks are implemented via a background goroutine is an implementation detail that isn't exposed through the public interface.

When using goleak in test suites to verify that no goroutines are being leaked by the code being tested, pgx generates false positives with one "leaked" goroutine per conn pool that was recently closed:

That is unfortunate, but I view it as akin to checking memory usage as soon as a value has gone out of scope: the GC may not have run yet. Once a variable is out of scope or a Pool is closed, it's not visible to the user application unless runtime introspection is used.

I don't mind if someone wants to refactor / improve this, but it's not something I personally plan on working on.

@bgentry
Contributor Author

bgentry commented Jun 14, 2023

@jackc I opened #1642 to at least try to improve these situations, although it doesn't fully resolve the potential goleak issue. At least the workaround is documented here in case anybody goes looking for it.

If in spite of the PR you don't feel this is worth fixing, no worries, feel free to close this and the PR. Thanks for your great work on pgx regardless ✌️

bgentry added a commit to bgentry/pgx that referenced this issue Jul 3, 2023
When a pool is closed, some background goroutines may be left open,
particularly for health checks as detailed in jackc#1641. Two specific
examples have been refactored here to avoid a blocking sleep and instead
also select on the pool being closed to potentially return/continue
sooner.