Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[chore][otel] Enable persistence in otel.yml #5549

Merged
merged 31 commits into from
Oct 21, 2024

Conversation

VihasMakwana
Copy link
Contributor

@VihasMakwana VihasMakwana commented Sep 17, 2024

  • Enhancement
  • Docs

What does this PR do?

This PR adds the filestorage extension in otel.yaml and updates the documentation.
Updates otel dependencies to 0.110.0

Why is it important?

By default, the OpenTelemetry Collector operates in a stateless mode, meaning it does not save file offsets to disk. To resolve this, we enable persistence in the default configuration files.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

How to test this PR locally

  • Build from this branch
  • Keep the config default and run elastic-agent otel --config otel.yaml
  • Ingest some data and stop the process.
  • Restart the process and ingest some more data.
  • You shouldn't see any data loss/duplication.

Related issues

@VihasMakwana VihasMakwana added documentation Improvements or additions to documentation enhancement New feature or request Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team opentelemetry Related to the Elastic Distribution of the OpenTelemetry Collector labels Sep 17, 2024
@VihasMakwana VihasMakwana self-assigned this Sep 17, 2024
Copy link
Contributor

mergify bot commented Sep 17, 2024

This pull request does not have a backport label. Could you fix it @VihasMakwana? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-./d./d is the label to automatically backport to the 8./d branch. /d is the digit

Copy link
Contributor

mergify bot commented Sep 17, 2024

backport-v8.x has been added to help with the transition to the new branch 8.x.
If you don't need it please use backport-skip label and remove the backport-8.x label.

@mergify mergify bot added the backport-8.x Automated backport to the 8.x branch with mergify label Sep 17, 2024
@VihasMakwana VihasMakwana marked this pull request as ready for review September 18, 2024 10:19
@VihasMakwana VihasMakwana requested a review from a team as a code owner September 18, 2024 10:19
@elasticmachine
Copy link
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@VihasMakwana
Copy link
Contributor Author

VihasMakwana commented Sep 18, 2024

@elastic/elastic-agent-control-plane

I propose two options for default directory:

  1. Set it as per current sample configurations and modify them during the onboarding process. Currently, the sample configs use STORAGE_DIR env as the default directory option
  2. Follow Filebeat's directory layout and set the default to a path under path.data.

For the latter option, we need to implement a config converter which maps path.data to /opt/Elastic/Agent/data (in case of Linux).

Let me know how do you feel about this proposal.

cc: @ycombinator @pierrehilbert @andrzej-stencel

@VihasMakwana
Copy link
Contributor Author

Introducing a new config converter would be straightforward and could be easily integrated into our distribution.

@cmacknz
Copy link
Member

cmacknz commented Sep 19, 2024

As discussed let's leave this as an environment variable. How the collector is run and installed (if it is installed as a service at all) is currently completely up to the user and this goes along with that.

The only place we likely want a default is in our reference Kubernetes configurations, which can set the environment variable or set a default in the configuration it includes directly.

In general we do not want to use any of the paths that are also used by Beats or Agent, because a standalone OTel collector must be able to run alongside them without issue.

In our OTel distribution I think we solve this by having an install command that puts this in whatever base install directory we pick (/opt/Elastic, C:\Program Files\Elastic, or /Library/Elastic) but we need an installer that runs as a privileged user to do that. You can't do it with config alone.

Copy link
Contributor

mergify bot commented Sep 27, 2024

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b enabl-file-storage-update-docs upstream/enabl-file-storage-update-docs
git merge upstream/main
git push upstream enabl-file-storage-update-docs

otel.yml Outdated Show resolved Hide resolved
internal/pkg/otel/README.md Show resolved Hide resolved
go.mod Outdated Show resolved Hide resolved
otel.yml Outdated
Comment on lines 28 to 30
file_storage/filelogreceiver:
create_directory: true
directory: ${env:STORAGE_DIR}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The addition of the File Storage extension may cause problems for users using this configuration with read-only filesystems, like in Docker, in Kubernetes, unless they configure a writable storage layer in their containers/pods. Should we be concerned about this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The addition of the File Storage extension may cause problems for users using this configuration with read-only filesystems, like in Docker, in Kubernetes, unless they configure a writable storage layer in their containers/pods. Should we be concerned about this?

I believe we will have a separate recommendation in our k8s configs based on @cmacknz's comment.

Copy link
Contributor

@andrzej-stencel andrzej-stencel Sep 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change breaks running Elatic Agent as OTel Collector in Docker. I'd rather have this change applied in a way that doesn't break running the EDOT Collector in Docker. If we want to enable persistence by default, it should "just work", like it "just works" with the regular Elastic Agent.

$ docker run --rm -ti --entrypoint="elastic-agent" docker.elastic.co/beats/elastic-agent:9.0.0-SNAPSHOT otel
Starting in otel mode
2024-09-30T09:06:39.260Z	info	[email protected]/service.go:137	Setting up own telemetry...
2024-09-30T09:06:39.261Z	info	[email protected]/service.go:186	Skipped telemetry setup.
2024-09-30T09:06:39.261Z	info	builders/builders.go:26	Development component. May change in the future.	{"kind": "exporter", "data_type": "logs", "name": "debug"}
2024-09-30T09:06:39.262Z	info	builders/extension.go:50	Development component. May change in the future.	{"kind": "extension", "name": "memory_limiter"}
2024-09-30T09:06:39.262Z	info	memorylimiter/memorylimiter.go:75	Memory limiter configured	{"kind": "extension", "name": "memory_limiter", "limit_mib": 700, "spike_limit_mib": 180, "check_interval": 1}
2024-09-30T09:06:39.252Z	warn	[email protected]/provider.go:59	Configuration references unset environment variable	{"name": "STORAGE_DIR"}
failed to build extensions: failed to create extension "file_storage/filelogreceiver": mkdir /var/lib/otelcol: permission denied

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrzej-stencel I see what you mean. The error is due to an empty env:STORAGE_DIR and hence, the filestorage extension defaults to /var/lib/otelcol. Since the default user in the Docker container is elastic-agent, this causes a permission denied error.

If I use any of following commands, it works.

docker run --rm -ti --entrypoint="elastic-agent" --mount type=bind,source=$(pwd)/data,target=/usr/share/otel -e STORAGE_DIR=/usr/share/otel/  docker.elastic.co/beats/elastic-agent:9.0.0-SNAPSHOT otel

or

docker run --user root --rm -ti --entrypoint="elastic-agent" docker.elastic.co/beats/elastic-agent:9.0.0-SNAPSHOT otel

I wonder how should we approach this. Maybe add documentation? or update EDOT according to @cmacknz's comment. Or do both.

In our OTel distribution I think we solve this by having an install command that puts this in whatever base install directory we pick (/opt/Elastic, C:\Program Files\Elastic, or /Library/Elastic) but we need an installer that runs as a privileged user to do that. You can't do it with config alone.

@cmacknz @andrzej-stencel I'd appreciate your opinions over this!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you that is should "just work".

Copy link
Member

@cmacknz cmacknz Oct 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the container we control the paths, and can pick a default that is writable by our user in the container. Elastic agent uses the STATE_PATH environment variable and puts it under /usr/share/elastic-agent.

agentBaseDirectory = "/usr/share/elastic-agent" // directory that holds all elastic-agent related files
defaultStateDirectory = agentBaseDirectory + "/state" // directory that will hold the state data

We could add an /otelcol sub-directory to /usr/share/elastic-agent for one possibility.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly we can reuse the STATE_PATH variable and change which sub-directory we use based on whether the entrypoint is the otel command or the elastic-agent container command.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense.

Copy link
Contributor

mergify bot commented Oct 7, 2024

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b enabl-file-storage-update-docs upstream/enabl-file-storage-update-docs
git merge upstream/main
git push upstream enabl-file-storage-update-docs

@VihasMakwana
Copy link
Contributor Author

@andrzej-stencel @cmacknz PTAL 😅

Copy link
Contributor

@andrzej-stencel andrzej-stencel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not fond of the prepareEnv function setting the STORAGE_DIR env var irrespective of whether the env var is used in configuration or not. On the other hand, I don't have an alternative "perfect" solution, so let's go with this approach now and perhaps change if we come up with a better way.

Copy link
Contributor

mergify bot commented Oct 11, 2024

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b enabl-file-storage-update-docs upstream/enabl-file-storage-update-docs
git merge upstream/main
git push upstream enabl-file-storage-update-docs

internal/pkg/otel/templates/README.md.tmpl Outdated Show resolved Hide resolved
internal/pkg/otel/templates/README.md.tmpl Outdated Show resolved Hide resolved
// The filestorage extension will handle directory creation since create_directory: true is set by default.
// If the user hasn’t specified the env:STORAGE_DIR in filestorage, they may have opted for a custom path, and the extension will create the directory accordingly.
// In this case, setting env:STORAGE_DIR will have no effect.
if err := os.Setenv("STORAGE_DIR", filepath.Join(paths.Top(), "otel_registry")); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would just call the path otel instead of otel_registry since there is no guarantee we only use this directory to store file log registries long term.

// The filestorage extension will handle directory creation since create_directory: true is set by default.
// If the user hasn’t specified the env:STORAGE_DIR in filestorage, they may have opted for a custom path, and the extension will create the directory accordingly.
// In this case, setting env:STORAGE_DIR will have no effect.
if err := os.Setenv("STORAGE_DIR", filepath.Join(paths.Top(), "otel_registry")); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, why can't we just reuse the STATE_PATH environment variable the container already supports so we don't need to add a new one? The default value is also writable by the elastic agent container user.

One reason to do this is it won't be a breaking change if we decide to get rid of STORAGE_DIR later.

@VihasMakwana
Copy link
Contributor Author

@cmacknz I've reused the STATE_PATH and changed directory name. PTAL!

@@ -118,3 +123,22 @@ func runCollector(cmdCtx context.Context, configFiles []string) error {
return goerrors.Join(errs...)

}

func prepareEnv() error {
if _, ok := os.LookupEnv("STORAGE_DIR"); !ok {
Copy link
Member

@cmacknz cmacknz Oct 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use STATE_PATH here as well (and in the collector examples)?

What I am trying to do is avoid introducing a new environment variable we may decide to remove in the future, trying to keep our public API as small as possible while we figure out how everything fits together.

STATE_PATH isn't going anywhere without making it a breaking change for many people, so it doesn't increase the surface area of configuration we need to maintain.

Copy link
Contributor

mergify bot commented Oct 15, 2024

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b enabl-file-storage-update-docs upstream/enabl-file-storage-update-docs
git merge upstream/main
git push upstream enabl-file-storage-update-docs

Copy link
Member

@cmacknz cmacknz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@pchila pchila removed their request for review October 18, 2024 12:56
@VihasMakwana VihasMakwana merged commit 49579ae into main Oct 21, 2024
14 checks passed
@VihasMakwana VihasMakwana deleted the enabl-file-storage-update-docs branch October 21, 2024 10:43
mergify bot pushed a commit that referenced this pull request Oct 21, 2024
* chore: update config, docs

* fix: spelling mistake

* fix: go tidy

* remove comments

* fix: go tidy

* fix: add filestorage to extension

* Update changelog/fragments/1726572104-enable-persistence-by-default.yaml

Co-authored-by: Andrzej Stencel <[email protected]>

* chore: update readme

* chore: update readme

* go.mod

* chore: go.mod

* comments

* comments

* chore: update notice

* chore: go.sum and notice

* chore: go.sum and notice

* Update internal/pkg/otel/templates/README.md.tmpl

Co-authored-by: Craig MacKenzie <[email protected]>

* Update internal/pkg/otel/templates/README.md.tmpl

Co-authored-by: Craig MacKenzie <[email protected]>

* chore: address comments

* chore: use STATE_PATH

* chore: update readme

* fix: add missing import

---------

Co-authored-by: Andrzej Stencel <[email protected]>
Co-authored-by: Craig MacKenzie <[email protected]>
(cherry picked from commit 49579ae)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-8.x Automated backport to the 8.x branch with mergify documentation Improvements or additions to documentation enhancement New feature or request opentelemetry Related to the Elastic Distribution of the OpenTelemetry Collector Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants