Connection contention #356

Open
ltagliamonte opened this issue Jan 25, 2025 · 3 comments

ltagliamonte commented Jan 25, 2025

I was doing a bit of perf tuning, and by doubling the number of connections my latencies got much better.
I understand that connections are shared and that more connections is "better", but I'd rather have a way to monitor contention on the connections (goroutines waiting for a connection from the pool) than do guesswork, especially since the size of my worker pool is parameterized.

Is there a way to expose connection contention? Maybe a log enabled via config?

Thanks a lot for the great project!

mediocregopher (Owner) commented

Hi @ltagliamonte , can you clarify if you're using v3 or v4? The way the Pool works in each is extremely different, so it's not possible to answer without knowing which we're talking about.

ltagliamonte (Author) commented

Hello @mediocregopher, I'm using v4.

mediocregopher commented Jan 26, 2025

Nice, thanks. So in v4 this question is a bit tricky because there are potentially two places which could be blocking:

  1. Getting a Conn from the Pool. This can block if all Conns have been removed from the Pool and it's currently empty. A Conn is only removed from the Pool if the Action being performed is not shareable, which 99% of the time means it's a blocking command like BRPOP; otherwise the Conn is left in the Pool and shared with other shareable Actions. So for the Pool to be empty (and therefore blocking) you'd have to be performing more non-shareable Actions than there are Conns in the Pool.

If you want to know how many non-shareable Actions are taking place within your Pool, you could inspect it using a very simple wrapper:

import (
	"context"
	"sync/atomic"

	"github.com/mediocregopher/radix/v4"
)

type poolWrapper struct {
	radix.Client
	nonShareableActionsGauge atomic.Int64 // Int64 so it can be decremented below
}

func (pw *poolWrapper) Do(ctx context.Context, a radix.Action) error {
	if !a.Properties().CanShareConn {
		pw.nonShareableActionsGauge.Add(1)
		defer pw.nonShareableActionsGauge.Add(-1)
		// Or however you want to measure it
	}
	return pw.Client.Do(ctx, a)
}

// Spin up a go-routine to periodically log nonShareableActionsGauge
  2. For Actions which are shareable, their EncodeDecode calls will be automatically pipelined within the Conn. In effect, any blocking which happens at this stage is a result of network congestion, where either the time it takes to write to the socket or to read responses back from it is preventing subsequent Actions from having their turn. If you want to know how many Actions are blocked at this stage you could essentially do the opposite of the example above: increment a counter for every active shareable Action. Dividing that by the Pool size gives you roughly the current number of Actions which are blocked per Conn (a rough sketch follows below).
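
A minimal sketch of that second counter, using the same imports as the wrapper above; the inFlightShareable / poolSize names and the perConnLoad helper are just placeholders:

type shareableTracker struct {
	radix.Client
	inFlightShareable atomic.Int64
	poolSize          int64 // whatever size you configured the Pool with
}

func (st *shareableTracker) Do(ctx context.Context, a radix.Action) error {
	if a.Properties().CanShareConn {
		st.inFlightShareable.Add(1)
		defer st.inFlightShareable.Add(-1)
	}
	return st.Client.Do(ctx, a)
}

// perConnLoad returns roughly how many shareable Actions are currently
// in flight per Conn.
func (st *shareableTracker) perConnLoad() float64 {
	return float64(st.inFlightShareable.Load()) / float64(st.poolSize)
}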

What you asked for, a log message like "Action is blocked because the Pool is too small", is unfortunately not something which can easily be determined, because all Actions block for some amount of time; the only question is how long is acceptable. If you're using a metrics server like Prometheus then a wrapper like the above can be a great place to record Action times on a histogram (a rough example follows), and once the time it takes to Do an Action has gotten too high you increase the Pool size some more. If you're not using Prometheus you could use an in-memory histogram library to the same effect.
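
For example, a timing wrapper along these lines would record per-Action latency to a Prometheus histogram; the metric name and buckets are only placeholders:

import (
	"context"
	"time"

	"github.com/mediocregopher/radix/v4"
	"github.com/prometheus/client_golang/prometheus"
)

var actionDuration = prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    "radix_action_duration_seconds",
	Buckets: prometheus.DefBuckets,
})

func init() {
	prometheus.MustRegister(actionDuration)
}

type timingWrapper struct {
	radix.Client
}

func (tw *timingWrapper) Do(ctx context.Context, a radix.Action) error {
	start := time.Now()
	err := tw.Client.Do(ctx, a)
	actionDuration.Observe(time.Since(start).Seconds())
	return err
}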

One final note, which doesn't answer your question but might help, is to check out the WriteFlushInterval field of the Dialer if you haven't yet. By setting that to something like 150 microseconds you can increase the overall throughput of Conns, as it will reduce the number of system calls being made even further.
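
For reference, setting that when constructing the Pool looks roughly like this (if I'm remembering the v4 constructor correctly; the address is a placeholder and ctx comes from wherever you do your setup):

cfg := radix.PoolConfig{
	Dialer: radix.Dialer{
		WriteFlushInterval: 150 * time.Microsecond,
	},
}
client, err := cfg.New(ctx, "tcp", "127.0.0.1:6379")
if err != nil {
	// handle the error
}
defer client.Close()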
