add support to batch metric and send on a strict interval #169
Conversation
… metrics is sent to this interval
improve comments
01f9e4d
to
384f462
Compare
add some tests for batching
A few non-binding comments, as I may not have full context/understanding:
- I think it's generally a good idea to implement batching with two parameters: batch size and interval. Since the interval is already configurable, it'd be nice to have the batch size configurable too, with the default being the magic approxMaxMemBytes/bufSize.
- How do you plan to test the change? Should we add some metrics/logs to verify the batching behavior?
counter = sink.record
if !strings.Contains(counter, expected) {
	t.Error("wanted counter value of test.___f=i:1|c, got", counter)
expected = "test.__host=i:1|c"
Note: this test is out of scope of this work, but it was previously flaky, with the ordering of reserved_tag vs test.__host not being deterministic.
net_sink.go
Outdated
@@ -118,8 +118,8 @@ func NewNetSink(opts ...SinkOption) FlushableSink {
	bufSize = defaultBufferSizeTCP
}

s.outc = make(chan *bytes.Buffer, approxMaxMemBytes/bufSize)
s.retryc = make(chan *bytes.Buffer, 1) // It should be okay to limit this given we preferentially process from this over outc.
s.outc = make(chan *bytes.Buffer, approxMaxMemBytes/bufSize) // todo: need to understand why/how this number was chosen and probably elevate it
Forking #169 (review) to here to make threading possible. @sokada1221:
So this doesn't restrict the memory allocated, but the number of slots per metric/string.
If it's exceeded, it will block more stats from being written until some are read; it wouldn't act as a batching mechanism 🤔. Buffered channels are a bit strange. I think in actuality this buffer will always be full, and if we changed it to a normal channel with no buffer (always block), I don't think we would see an impact.
972503c
to
5a6293c
Compare
net_sink.go
Outdated
	}
}
return batch[:0]
If there is an error in send, this will cause any following batched metrics to be dropped. It might be better to keep them queued. Additionally, this prevents us from putting any buffers received after the error back into the buffer pool (which is only a problem if we decide to keep the existing logic).
If there's an error in send, the metrics will be written to the retryc channel, so nothing should be dropped, I think, despite clearing them from the batch. In the current state, that retry handling escapes batching altogether; I'll think on this.
Never mind, I see that any failure prevents all subsequent sends in the current iteration. Good catch, thanks.
76f2891
to
c3b9a08
Compare
9e755e1
to
b7f0a3d
Compare
add more comments (will need to cleanup later)
fix bug with channel block when batching
settings.go
Outdated
@@ -20,6 +20,8 @@ const (
	DefaultFlushIntervalS = 5
	// DefaultLoggingSinkDisabled is the default behavior of logging sink suppression, default is false.
	DefaultLoggingSinkDisabled = false
	// DefaultBatchSize is the default maximum amount of stats we batch before sending, default is 0 which disables batching.
So we're batching based on a number of stats here instead of the size in MB? In that case I would think a batch size of 1 would be equivalent to no batching, not 0; a batch size of 0 doesn't really make sense.
But it would be even better if we could explicitly enable/disable batching via config; I think that would make things clearer.
Exactly, yeah: the batch size specifies the number of stats we batch.
A batch size of 1 is effectively the same as no batching, but 0 specifically reverts to the old behaviour, hence my use of it: don't go through the batching code at all.
We could have two configurations for this, since I'm effectively using 0 as a feature flag anyway. But will we ever really clean it up? And having one configuration hard-dependent on the other isn't great.
Done. Made the configuration more granular
add test for batch send failure
remove some unnecessary comments
68e9ecb
to
51ee282
Compare
fix bug with batch interval
reclarify some comments
improved code structure and comments
net_sink.go
Outdated
batchTimeout := time.Duration(s.conf.BatchSendIntervalS) * time.Second
batchInterval := time.After(batchTimeout)

sendBatch := false
nit: rename to doSendBatch or shouldSendBatch
net_sink.go
Outdated
default:
	// Drop through in case retryc has nothing.
}

// send batched outc data anytime indicated or the batch is full
if sendBatch || len(batch) >= batchSize {
The amount of nesting and conditions here is a bit hard to follow. Hopefully there's some way to refactor that would improve readability.
Refactored the code a bit; I think it's more readable now.
The only alternative I can think of is forking another run function for batching and determining which to use when launching the goroutine.
Doing that may indeed be more efficient, since there would be fewer conditions per loop, but the drawback is we'd have duplicate code that's hard to maintain in two places 🤔
What are your thoughts? I could modularize some of the internals, but the structure would effectively be duplicated.
Add more definite send-interval control than what the flush interval provided, restricting when any metric is sent to this interval. This will implicitly introduce batching for timers.