
add support to batch metric and send on a strict interval #169

Closed

Conversation

maciuszek
Contributor

@maciuszek commented Dec 19, 2024

Add more definite send-interval control than what the flush interval provided, restricting when any metric is sent to this interval. This will implicitly introduce batching for timers.

@maciuszek marked this pull request as draft December 19, 2024 21:09
@maciuszek force-pushed the mattkuzminski/enforce-batching-for-all-metrics branch from 01f9e4d to 384f462 on December 20, 2024 00:51
@sokada1221 left a comment

A few non-binding comments, as I may not have full context/understanding:

  • I think it's generally a good idea to implement batching with 2 parameters: batch size and interval (see the sketch after this list). Since the interval is already configurable, it'd be nice to have the batch size configurable too - with the default being the magic approxMaxMemBytes/bufSize lol
  • How do you plan to test the change? Should we add some metrics/logs to verify the batching behavior?
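
For illustration, the size-plus-interval idea can be sketched as a small send loop: flush when the batch reaches the configured size, or when the interval fires, whichever comes first. This is a minimal standalone sketch with hypothetical names (runBatcher, flush), not the structure of this sink's run loop.

```go
package main

import (
	"fmt"
	"time"
)

// runBatcher is a hypothetical sketch: it flushes whenever the batch
// reaches batchSize or the interval elapses, whichever happens first.
func runBatcher(in <-chan string, batchSize int, interval time.Duration, flush func([]string)) {
	batch := make([]string, 0, batchSize)
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case stat, ok := <-in:
			if !ok { // input closed: flush whatever is left and stop
				if len(batch) > 0 {
					flush(batch)
				}
				return
			}
			batch = append(batch, stat)
			if len(batch) >= batchSize { // size threshold reached
				flush(batch)
				batch = batch[:0]
			}
		case <-ticker.C: // interval elapsed, send a possibly partial batch
			if len(batch) > 0 {
				flush(batch)
				batch = batch[:0]
			}
		}
	}
}

func main() {
	in := make(chan string)
	done := make(chan struct{})
	go func() {
		runBatcher(in, 3, 500*time.Millisecond, func(b []string) { fmt.Println("flush:", b) })
		close(done)
	}()
	for i := 0; i < 5; i++ {
		in <- fmt.Sprintf("stat-%d", i)
	}
	close(in)
	<-done
}
```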

	counter = sink.record
	if !strings.Contains(counter, expected) {
		t.Error("wanted counter value of test.___f=i:1|c, got", counter)
	}
	expected = "test.__host=i:1|c"
Contributor Author

@maciuszek Dec 23, 2024

Note: this test is out of scope for this work, but it was previously flaky because the order of reserved_tag vs test.__host was not deterministic.
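
One way to keep assertions like this order-independent is to check each expected tag fragment on its own. A minimal sketch, assuming a hypothetical helper (the name and package are illustrative, not part of the existing tests):

```go
package sketch

import (
	"strings"
	"testing"
)

// assertContainsAll checks each expected fragment independently, so the
// assertion no longer depends on the serialization order of the tags.
func assertContainsAll(t *testing.T, got string, wants ...string) {
	t.Helper()
	for _, want := range wants {
		if !strings.Contains(got, want) {
			t.Errorf("wanted %q to contain %q", got, want)
		}
	}
}
```

It could then be called as, for example, assertContainsAll(t, sink.record, "test.__host=i", ":1|c"), with the exact fragments depending on the test.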

net_sink.go Outdated
@@ -118,8 +118,8 @@ func NewNetSink(opts ...SinkOption) FlushableSink {
 		bufSize = defaultBufferSizeTCP
 	}
 
-	s.outc = make(chan *bytes.Buffer, approxMaxMemBytes/bufSize)
+	s.retryc = make(chan *bytes.Buffer, 1) // It should be okay to limit this given we preferentially process from this over outc.
+	s.outc = make(chan *bytes.Buffer, approxMaxMemBytes/bufSize) // todo: need to understand why/how this number was chosen and probably elevate it
Contributor Author

@maciuszek Dec 23, 2024

Forking #169 (review) to here to make threading possible, @sokada1221.

So this doesn't restrict the memory allocated, but the number of slots per metric/string.
If it's exceeded, it will block further stats from being written until some are read, so it wouldn't act as a batching mechanism 🤔. Buffered channels are a bit strange - I think in practice this buffer will always be full, and if we changed it to a normal channel with no buffer (always blocking), I don't think we would see an impact.
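
For context, here is a tiny standalone example of the blocking behaviour described above; the capacity and values are arbitrary and unrelated to approxMaxMemBytes/bufSize:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// A buffered channel only provides capacity "slots": the first two sends
	// below return immediately, the third blocks until the receiver starts
	// draining. The buffer bounds pending items; it does not batch them.
	c := make(chan string, 2)

	go func() {
		time.Sleep(200 * time.Millisecond) // let the sender fill the buffer first
		for s := range c {
			fmt.Println("received:", s)
		}
	}()

	for i := 0; i < 3; i++ {
		start := time.Now()
		c <- fmt.Sprintf("stat-%d", i)
		fmt.Printf("send %d returned after %v\n", i, time.Since(start).Round(time.Millisecond))
	}
	close(c)
	time.Sleep(100 * time.Millisecond) // give the receiver time to print
}
```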

@maciuszek force-pushed the mattkuzminski/enforce-batching-for-all-metrics branch from 972503c to 5a6293c on December 24, 2024 19:58
net_sink.go Outdated
		}
	}
	return batch[:0]
Contributor

If there is an error in send, this will cause any following batched metrics to be dropped. It might be better to keep them queued. Additionally, this prevents us from putting any buffers received after the error back into the buffer pool (which is only a problem if we decide to keep the existing logic).

Contributor Author

@maciuszek Dec 30, 2024

If there's an error in send, the metrics will be written to the retryc channel, so nothing should be dropped I think, despite clearing them from the batch. In the current state, that retry handling escapes batching altogether; I'll think on this.

Never mind, I see that any failure prevents all subsequent sends in the current iteration. Good catch, thanks.
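
For the record, the behaviour being converged on here could look roughly like the following sketch: on a failed send, hand the unsent buffers to the retry queue instead of discarding them, then reset the batch so later metrics keep flowing. The function and its signature are illustrative, not the sink's actual API.

```go
package sketch

import "bytes"

// flushBatch is a hypothetical sketch of the failure path under discussion:
// if send fails, the unsent buffers are pushed onto the retry queue rather
// than dropped, and the batch is reset so subsequent metrics still flow.
func flushBatch(batch []*bytes.Buffer, send func([]*bytes.Buffer) error, retryc chan<- *bytes.Buffer) []*bytes.Buffer {
	if err := send(batch); err != nil {
		for _, buf := range batch {
			select {
			case retryc <- buf: // keep the data queued for a later attempt
			default:
				// retry queue is full; dropping here is the last resort
			}
		}
	}
	return batch[:0]
}
```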

@maciuszek force-pushed the mattkuzminski/enforce-batching-for-all-metrics branch from 76f2891 to c3b9a08 on December 30, 2024 15:21
@maciuszek force-pushed the mattkuzminski/enforce-batching-for-all-metrics branch from 9e755e1 to b7f0a3d on December 31, 2024 03:58
add more comments (will need to cleanup later)
fix bug with channel block when batching
@maciuszek marked this pull request as ready for review January 2, 2025 19:08
settings.go Outdated
@@ -20,6 +20,8 @@ const (
 	DefaultFlushIntervalS = 5
 	// DefaultLoggingSinkDisabled is the default behavior of logging sink suppression, default is false.
 	DefaultLoggingSinkDisabled = false
+	// DefaultBatchSize is the default maximum amount of stats we batch before sending, default is 0 which disables batching.

So we're batching based on a number of stats here instead of the size in MB? In that case I would think a batch size of 1 would be equivalent to no batching, not 0; a 0 batch size doesn't really make sense.

But it would be even better if we could explicitly enable/disable batching via config; I think that would make things clearer.

Contributor Author

Exactly, yeah - the batch size specifies the number of stats we batch.

A batch size of 1 will effectively be the same as no batching, but 0 specifically reverts back to the old behaviour, hence my use of it - as in, don't go through the batch code at all.

We could have 2 configurations for this, since I'm effectively using 0 like a feature flag anyway. But would we really ever clean it up? And having one configuration hard-dependent on the other isn't great either.
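
A minimal sketch of the dispatch being described (all names are hypothetical, not the sink's API): 0 bypasses the batching path entirely, so it doubles as a feature flag, while 1 still goes through it but flushes after every stat.

```go
package sketch

// startSender illustrates the 0-as-feature-flag semantics: a batch size of 0
// keeps the original, unbatched loop, while any positive value routes through
// the batching loop (a size of 1 still takes that path, flushing every stat).
func startSender(batchSize int, run func(), runBatched func(batchSize int)) {
	if batchSize == 0 {
		go run() // old behaviour: skip the batching code entirely
		return
	}
	go runBatched(batchSize)
}
```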

Contributor Author

Done. Made the configuration more granular

add test for batch send failure
remove some unnecessary comments
@maciuszek force-pushed the mattkuzminski/enforce-batching-for-all-metrics branch from 68e9ecb to 51ee282 on January 2, 2025 21:09
fix bug with batch interval
reclarify some comments
improved code structure and comments
@maciuszek changed the title from "support a strict flush interval for all metrics" to "add support to batch metric and send on a strict interval" Jan 3, 2025
net_sink.go Outdated
	batchTimeout := time.Duration(s.conf.BatchSendIntervalS) * time.Second
	batchInterval := time.After(batchTimeout)

	sendBatch := false

nit: rename to doSendBatch or shouldSendBatch

net_sink.go Outdated
		default:
			// Drop through in case retryc has nothing.
		}

		// send batched outc data anytime indicated or the batch is full
		if sendBatch || len(batch) >= batchSize {

The amount of nesting and conditions here is a bit hard to follow. Hopefully there's some way to refactor that would improve readability.

Contributor Author

Refactored the code a bit; I think it's more readable now.

The only alternative I can think of is forking another run function for batching and determining which one to use when launching the goroutine.

Doing that may indeed be more efficient, since there would be fewer conditions per loop, but the drawback is we'd have duplicate code that would be hard to maintain in 2 places 🤔

What are your thoughts? I could modularize some of the internals, but the structure would effectively be duplicated.
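
For what it's worth, the "modularize some of the internals" option could look roughly like this: keep a single run loop but pull each per-iteration decision into a small named helper so the select body stays flat. These helpers are purely illustrative; none of the names exist in net_sink.go.

```go
package sketch

import "bytes"

// shouldFlush reports whether the loop should send now: either it was told
// to (interval fired / flush requested) or the batch hit its size limit.
func shouldFlush(doSendBatch bool, batchLen, batchSize int) bool {
	return doSendBatch || batchLen >= batchSize
}

// appendToBatch adds one buffer and reports whether this append crossed the
// size threshold, so the caller's branch stays a single condition.
func appendToBatch(batch []*bytes.Buffer, buf *bytes.Buffer, batchSize int) ([]*bytes.Buffer, bool) {
	batch = append(batch, buf)
	return batch, len(batch) >= batchSize
}
```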

paulinjo previously approved these changes Jan 6, 2025
@maciuszek closed this Jan 7, 2025