-
Notifications
You must be signed in to change notification settings - Fork 152
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added support for persistent buffering in receiver/gateway #1342
base: main
Are you sure you want to change the base?
Added support for persistent buffering in receiver/gateway #1342
Conversation
Could also add separate settings for logs and metrics, something like this: |
helm-charts/splunk-otel-collector/templates/config/_otel-k8s-cluster-receiver-config.tpl
Outdated
Show resolved
Hide resolved
helm-charts/splunk-otel-collector/templates/config/_otel-k8s-cluster-receiver-config.tpl
Outdated
Show resolved
Hide resolved
setfacl -n -Rm d:m::rx,m::rx,d:g:{{ $clusterReceiver.securityContext.runAsGroup | default 999 }}:rx,g:{{ $clusterReceiver.securityContext.runAsGroup | default 999 }}:rx {{ .Values.splunkPlatform.sendingQueue.persistentQueue.storagePath }}/receiver; | ||
{{- end }}'] | ||
securityContext: | ||
runAsUser: 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: We probably need to document this runAsUser 0 and above used permissions in https://github.com/signalfx/splunk-otel-collector-chart/blob/main/docs/advanced-configuration.md#data-persistence as well as customer documentation. This would be done for the agent, cluster receiver, and gateway. A number of customers request information like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest adding documentation primarily in the Advanced Configuration: Data Persistence section. Then, include a brief note with a hyperlink in the Running the Container in Non-Root User Mode section that points to the data persistence section.
@@ -104,6 +104,31 @@ spec: | |||
mountPath: /splunk-messages | |||
- mountPath: /conf | |||
name: collector-configmap | |||
- name: patch-log-dirs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: We might be outgrowing "log" functionality here for patch-log-dirs. If it works for now I'm fine with it. It might be time to refactor the names and images a little to make them more abstract to better address these new use cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, something like patch-dirs would work for all configurations I guess?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I'd go with patch-dirs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does persistent buffering require the .Values.logsCollection.checkpointPath directory for checkpoints? I don't think it does. I recommend removing the code that sets up the logsCollection checkpoint directory for the cluster receiver and gateway to avoid potential race conditions. Since the agent is deployed to all nodes and the cluster receiver and gateway are only deployed to some, they may attempt to set permissions on the same directory simultaneously.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To move forward with this
- I would just remove lines 111-113 for now so the agent and cluster receiver are not sharing on disk resources.
- Make sure to only include any of this new cluster receiver code if
.Values.splunkPlatform.sendingQueue.persistentQueue.enabled=true
Please update the docs at https://github.com/signalfx/splunk-otel-collector-chart/blob/main/docs/advanced-configuration.md#data-persistence for these changes as well. |
…luster-receiver-config.tpl Co-authored-by: jvoravong <[email protected]>
…luster-receiver-config.tpl Co-authored-by: jvoravong <[email protected]>
@@ -16,6 +15,12 @@ extensions: | |||
observe_nodes: true | |||
{{- end }} | |||
|
|||
{{- if .Values.splunkPlatform.sendingQueue.persistentQueue.enabled }} | |||
file_storage/persistent_queue_receiver: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit
file_storage/persistent_queue_receiver: | |
file_storage/persistent_queue_cluster_receiver: |
- k8s_observer | ||
{{- end }} | ||
{{- if .Values.splunkPlatform.sendingQueue.persistentQueue.enabled }} | ||
- file_storage/persistent_queue_receiver |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit
- file_storage/persistent_queue_receiver | |
- file_storage/persistent_queue_cluster_receiver |
@@ -62,6 +62,24 @@ spec: | |||
securityContext: | |||
{{- include "splunk-otel-collector.securityContext" (dict "isWindows" .Values.isWindows "securityContext" $gateway.securityContext) | nindent 8 }} | |||
{{- end }} | |||
initContainers: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We ill only want to include the initContainers if persistent queue is enabled
{{- if .Values.splunkPlatform.sendingQueue.persistentQueue.enabled }}
initContainers:
...
{{- end }}
Can you also update the enable-persistence-queue example to include |
POC - Added support for persistent buffering in receiver/gateway