-
Notifications
You must be signed in to change notification settings - Fork 152
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added support for persistent buffering in receiver/gateway #1342
base: main
Are you sure you want to change the base?
Changes from all commits
058014d
503912f
57a97b4
25b578e
9a8b3f7
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -7,7 +7,6 @@ The values can be overridden in .Values.clusterReceiver.config | |||||
extensions: | ||||||
health_check: | ||||||
|
||||||
|
||||||
{{- if eq (include "splunk-otel-collector.distribution" .) "eks/fargate" }} | ||||||
# k8s_observer w/ pod and node detection for eks/fargate deployment | ||||||
k8s_observer: | ||||||
|
@@ -16,6 +15,12 @@ extensions: | |||||
observe_nodes: true | ||||||
{{- end }} | ||||||
|
||||||
{{- if .Values.splunkPlatform.sendingQueue.persistentQueue.enabled }} | ||||||
file_storage/persistent_queue_receiver: | ||||||
directory: {{ .Values.splunkPlatform.sendingQueue.persistentQueue.storagePath }}/clusterReceiver | ||||||
timeout: 0 | ||||||
{{- end }} | ||||||
|
||||||
receivers: | ||||||
# Prometheus receiver scraping metrics from the pod itself | ||||||
{{- include "splunk-otel-collector.prometheusInternalMetrics" "k8s-cluster-receiver" | nindent 2}} | ||||||
|
@@ -205,11 +210,14 @@ service: | |||||
telemetry: | ||||||
metrics: | ||||||
address: 0.0.0.0:8889 | ||||||
{{- if eq (include "splunk-otel-collector.distribution" .) "eks/fargate" }} | ||||||
extensions: [health_check, k8s_observer] | ||||||
{{- else }} | ||||||
extensions: [health_check] | ||||||
{{- end }} | ||||||
extensions: | ||||||
- health_check | ||||||
{{- if eq (include "splunk-otel-collector.distribution" .) "eks/fargate" }} | ||||||
- k8s_observer | ||||||
{{- end }} | ||||||
{{- if .Values.splunkPlatform.sendingQueue.persistentQueue.enabled }} | ||||||
- file_storage/persistent_queue_receiver | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nit
Suggested change
|
||||||
{{- end }} | ||||||
pipelines: | ||||||
{{- if or (eq (include "splunk-otel-collector.o11yMetricsEnabled" $) "true") (eq (include "splunk-otel-collector.platformMetricsEnabled" $) "true") }} | ||||||
# k8s metrics pipeline | ||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -104,6 +104,31 @@ spec: | |
mountPath: /splunk-messages | ||
- mountPath: /conf | ||
name: collector-configmap | ||
- name: patch-log-dirs | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nit: We might be outgrowing "log" functionality here for patch-log-dirs. If it works for now I'm fine with it. It might be time to refactor the names and images a little to make them more abstract to better address these new use cases. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, something like patch-dirs would work for all configurations I guess? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah I'd go with patch-dirs. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Does persistent buffering require the .Values.logsCollection.checkpointPath directory for checkpoints? I don't think it does. I recommend removing the code that sets up the logsCollection checkpoint directory for the cluster receiver and gateway to avoid potential race conditions. Since the agent is deployed to all nodes and the cluster receiver and gateway are only deployed to some, they may attempt to set permissions on the same directory simultaneously. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To move forward with this
|
||
image: {{ template "splunk-otel-collector.image.initPatchLogDirs" . }} | ||
imagePullPolicy: {{ .Values.image.initPatchLogDirs.pullPolicy }} | ||
command: ['sh', '-c', ' | ||
mkdir -p {{ .Values.logsCollection.checkpointPath }}; | ||
chown -Rv {{ $clusterReceiver.securityContext.runAsUser | default 999 }}:{{ $clusterReceiver.securityContext.runAsGroup | default 999 }} {{ .Values.logsCollection.checkpointPath }}; | ||
chmod -v g+rwxs {{ .Values.logsCollection.checkpointPath }}; | ||
|
||
{{- if .Values.splunkPlatform.sendingQueue.persistentQueue.enabled }} | ||
mkdir -p {{ .Values.splunkPlatform.sendingQueue.persistentQueue.storagePath }}/clusterReceiver; | ||
chown -Rv {{ $clusterReceiver.securityContext.runAsUser | default 999 }}:{{ $clusterReceiver.securityContext.runAsGroup | default 999 }} {{ .Values.splunkPlatform.sendingQueue.persistentQueue.storagePath }}/clusterReceiver; | ||
chmod -v g+rwxs {{ .Values.splunkPlatform.sendingQueue.persistentQueue.storagePath }}/clusterReceiver; | ||
setfacl -n -Rm d:m::rx,m::rx,d:g:{{ $clusterReceiver.securityContext.runAsGroup | default 999 }}:rx,g:{{ $clusterReceiver.securityContext.runAsGroup | default 999 }}:rx {{ .Values.splunkPlatform.sendingQueue.persistentQueue.storagePath }}/clusterReceiver; | ||
{{- end }}'] | ||
securityContext: | ||
runAsUser: 0 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nit: We probably need to document this runAsUser 0 and above used permissions in https://github.com/signalfx/splunk-otel-collector-chart/blob/main/docs/advanced-configuration.md#data-persistence as well as customer documentation. This would be done for the agent, cluster receiver, and gateway. A number of customers request information like this. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd suggest adding documentation primarily in the Advanced Configuration: Data Persistence section. Then, include a brief note with a hyperlink in the Running the Container in Non-Root User Mode section that points to the data persistence section. |
||
resources: | ||
{{- toYaml $clusterReceiver.resources | nindent 12 }} | ||
volumeMounts: | ||
- name: checkpoint | ||
mountPath: {{ .Values.logsCollection.checkpointPath }} | ||
{{- if .Values.splunkPlatform.sendingQueue.persistentQueue.enabled }} | ||
- name: persistent-queue | ||
mountPath: {{ .Values.splunkPlatform.sendingQueue.persistentQueue.storagePath }}/clusterReceiver | ||
{{- end }} | ||
{{- end }} | ||
containers: | ||
- name: otel-collector | ||
|
@@ -189,6 +214,10 @@ spec: | |
mountPath: /otel/etc | ||
readOnly: true | ||
{{- end }} | ||
{{- if .Values.splunkPlatform.sendingQueue.persistentQueue.enabled }} | ||
- name: persistent-queue | ||
mountPath: {{ .Values.splunkPlatform.sendingQueue.persistentQueue.storagePath }}/clusterReceiver | ||
{{- end }} | ||
- mountPath: {{ .Values.isWindows | ternary "C:\\conf" "/conf" }} | ||
name: collector-configmap | ||
{{- if eq (include "splunk-otel-collector.distribution" .) "eks/fargate" }} | ||
|
@@ -217,6 +246,12 @@ spec: | |
secret: | ||
secretName: {{ template "splunk-otel-collector.secret" . }} | ||
{{- end }} | ||
{{- if .Values.splunkPlatform.sendingQueue.persistentQueue.enabled }} | ||
- name: persistent-queue | ||
hostPath: | ||
path: {{ .Values.splunkPlatform.sendingQueue.persistentQueue.storagePath }}/clusterReceiver | ||
type: DirectoryOrCreate | ||
{{- end }} | ||
{{- if eq (include "splunk-otel-collector.distribution" .) "eks/fargate" }} | ||
- name: init-eks-fargate-cluster-receiver-script | ||
configMap: | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -62,6 +62,24 @@ spec: | |
securityContext: | ||
{{- include "splunk-otel-collector.securityContext" (dict "isWindows" .Values.isWindows "securityContext" $gateway.securityContext) | nindent 8 }} | ||
{{- end }} | ||
initContainers: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We ill only want to include the initContainers if persistent queue is enabled
|
||
- name: patch-log-dirs | ||
image: {{ template "splunk-otel-collector.image.initPatchLogDirs" . }} | ||
imagePullPolicy: {{ .Values.image.initPatchLogDirs.pullPolicy }} | ||
command: ['sh', '-c', ' | ||
{{- if .Values.splunkPlatform.sendingQueue.persistentQueue.enabled }} | ||
mkdir -p {{ .Values.splunkPlatform.sendingQueue.persistentQueue.storagePath }}/gateway; | ||
chown -Rv {{ $gateway.securityContext.runAsUser | default 999 }}:{{ $gateway.securityContext.runAsGroup | default 999 }} {{ .Values.splunkPlatform.sendingQueue.persistentQueue.storagePath }}/gateway; | ||
chmod -v g+rwxs {{ .Values.splunkPlatform.sendingQueue.persistentQueue.storagePath }}/gateway; | ||
setfacl -n -Rm d:m::rx,m::rx,d:g:{{ $gateway.securityContext.runAsGroup | default 999 }}:rx,g:{{ $gateway.securityContext.runAsGroup | default 999 }}:rx {{ .Values.splunkPlatform.sendingQueue.persistentQueue.storagePath }}/gateway; | ||
{{- end }}'] | ||
securityContext: | ||
runAsUser: 0 | ||
jvoravong marked this conversation as resolved.
Show resolved
Hide resolved
|
||
volumeMounts: | ||
{{- if .Values.splunkPlatform.sendingQueue.persistentQueue.enabled }} | ||
- name: persistent-queue | ||
mountPath: {{ .Values.splunkPlatform.sendingQueue.persistentQueue.storagePath }}/gateway | ||
{{- end }} | ||
containers: | ||
- name: otel-collector | ||
command: | ||
|
@@ -151,6 +169,10 @@ spec: | |
{{- end }} | ||
- mountPath: {{ .Values.isWindows | ternary "C:\\conf" "/conf" }} | ||
name: collector-configmap | ||
{{- if .Values.splunkPlatform.sendingQueue.persistentQueue.enabled }} | ||
- name: persistent-queue | ||
mountPath: {{ .Values.splunkPlatform.sendingQueue.persistentQueue.storagePath }}/gateway | ||
{{- end }} | ||
{{- if $gateway.extraVolumeMounts }} | ||
{{- toYaml $gateway.extraVolumeMounts | nindent 8 }} | ||
{{- end }} | ||
|
@@ -167,6 +189,12 @@ spec: | |
secret: | ||
secretName: {{ template "splunk-otel-collector.secret" . }} | ||
{{- end }} | ||
{{- if .Values.splunkPlatform.sendingQueue.persistentQueue.enabled }} | ||
- name: persistent-queue | ||
hostPath: | ||
path: {{ .Values.splunkPlatform.sendingQueue.persistentQueue.storagePath }}/gateway | ||
type: DirectoryOrCreate | ||
{{- end }} | ||
{{- if $gateway.extraVolumes }} | ||
{{- toYaml $gateway.extraVolumes | nindent 6 }} | ||
{{- end }} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit