Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

otel: trim non-otel configuration from when validating the otel configuration #6235

Closed
mauri870 opened this issue Dec 6, 2024 · 8 comments · Fixed by #6531
Closed

otel: trim non-otel configuration from when validating the otel configuration #6235

mauri870 opened this issue Dec 6, 2024 · 8 comments · Fixed by #6531
Assignees
Labels
good first issue Good for newcomers Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team

Comments

@mauri870
Copy link
Member

mauri870 commented Dec 6, 2024

Now with the groundwork for the hybrid agent done in #5796 it is possible to have a yaml config file with both beats and otel configuration, for example:

hybrid-agent-config.yml
# normal agent config
filebeat.inputs:
  - type: filestream
    id: filestream-filebeat
    enabled: true
    paths:
      - /tmp/filebeat.log
output.elasticsearch:
  hosts:
    - https://esendpoint
  api_key: api-key
  index: filebeat-index
# and otel config on the same file
receivers:
  filebeatreceiver:
    filebeat:
      inputs:
        - type: filestream
          id: filestream-fbreceiver
          enabled: true
          paths:
            - /tmp/fbreceiver.log
    output:
      otelconsumer:
    logging:
      level: info
      selectors:
        - '*'
    path.home: /tmp
    queue.mem.flush.timeout: 0s
exporters:
  elasticsearch/log:
    endpoints:
      - https://esendpoint
    api_key: api-key
    logs_index: fbreceiver-index
    batcher:
      enabled: true
      flush_timeout: 1s
    mapping:
      mode: bodymap
service:
  pipelines:
    logs:
      receivers:
        - filebeatreceiver
      exporters:
        - elasticsearch/log

When running ./elastic-agent otel validate --config=./hybrid-agent-config.yml I get the following error:

failed to get config: cannot unmarshal the configuration: decoding failed due to the following error(s):

'' has invalid keys: filebeat.inputs, output.elasticsearch

When validating this file, we need to remove any top-level Beats-specific configuration to ensure that only the OpenTelemetry-specific components are validated.

@mauri870 mauri870 added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label Dec 6, 2024
@elasticmachine
Copy link
Contributor

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@cmacknz cmacknz changed the title otel: trim beats specific configuration from when validating the configuration otel: trim non-otel configuration from when validating the otel configuration Dec 6, 2024
@ycombinator ycombinator added the good first issue Good for newcomers label Dec 9, 2024
@mauri870
Copy link
Member Author

mauri870 commented Jan 2, 2025

@kaanyalti I updated the description with more details.

Thinking more about this, I think it is a bit more complicated than I initially thought. Looking at the otel configuration docs there are multiple protocols supported for the config (env, file, http). For our use case, I don't think we need to support all of these, I'd be happy with just validating file paths and the file protocol for now (--config=./hybrid-agent-config.yml and file:hybrid-agent-config.yml).

If we expect the agent collector to support loading agent config from different protocols like otel does, then we should probably have a follow up ticket for it.

@kaanyalti
Copy link
Contributor

@mauri870 looking at the otel collector code, I don't think we can enforce our clients to only use local files. The config provider interface allows users to provide config in any supported format

// Provider is an interface that helps to retrieve a config map and watch for any
// changes to the config map. Implementations may load the config from a file,
// a database or any other source.
//
// The typical usage is the following:
//
//	r, err := provider.Retrieve("file:/path/to/config")
//	// Use r.Map; wait for watcher to be called.
//	r.Close()
//	r, err = provider.Retrieve("file:/path/to/config")
//	// Use r.Map; wait for watcher to be called.
//	r.Close()
//	// repeat retrieve/wait/close cycle until it is time to shut down the Collector process.
//	// ...
//	provider.Shutdown()
type Provider interface {
	// Retrieve goes to the configuration source and retrieves the selected data which
	// contains the value to be injected in the configuration and the corresponding watcher that
	// will be used to monitor for updates of the retrieved value.
	//
	// `uri` must follow the "<scheme>:<opaque_data>" format. This format is compatible
	// with the URI definition (see https://datatracker.ietf.org/doc/html/rfc3986). The "<scheme>"
	// must be always included in the `uri`. The "<scheme>" supported by any provider:
	//   - MUST consist of a sequence of characters beginning with a letter and followed by any
	//     combination of letters, digits, plus ("+"), period ("."), or hyphen ("-").
	//     See https://datatracker.ietf.org/doc/html/rfc3986#section-3.1.
	//   - MUST be at least 2 characters long to avoid conflicting with a driver-letter identifier as specified
	//     in https://tools.ietf.org/id/draft-kerwin-file-scheme-07.html#syntax.
	//   - For testing, all implementation MUST check that confmaptest.ValidateProviderScheme returns no error.
	//
	// `watcher` callback is called when the config changes. watcher may be called from
	// a different go routine. After watcher is called Retrieved.Get should be called
	// to get the new config. See description of Retrieved for more details.
	// watcher may be nil, which indicates that the caller is not interested in
	// knowing about the changes.
	//
	// If ctx is cancelled should return immediately with an error.
	// Should never be called concurrently with itself or with Shutdown.
	Retrieve(ctx context.Context, uri string, watcher WatcherFunc) (*Retrieved, error)

	// Scheme returns the location scheme used by Retrieve.
	Scheme() string

	// Shutdown signals that the configuration for which this Provider was used to
	// retrieve values is no longer in use and the Provider should close and release
	// any resources that it may have created.
	//
	// This method must be called when the Collector service ends, either in case of
	// success or error. Retrieve cannot be called after Shutdown.
	//
	// Should never be called concurrently with itself or with Retrieve.
	// If ctx is cancelled should return immediately with an error.
	Shutdown(ctx context.Context) error
}

I may be missing some config option that may make implementation of this issue simpler. I have a meeting with @michalpristas tomorrow to discuss the implementation details here. Hopefully we can figure something out.

@kaanyalti
Copy link
Contributor

kaanyalti commented Jan 2, 2025

Looking at the otel cli flags we have, I think we are relaying whatever config path is provided directly to otel, so we are not putting any restrictions here either. I am guessing this is the intended behaviour as we want the otel collector to behave the same as it does outside of the agent

func getConfigFiles(cmd *cobra.Command, useDefault bool) ([]string, error) {
	configFiles, err := cmd.Flags().GetStringArray(otelConfigFlagName)
	if err != nil {
		return nil, fmt.Errorf("failed to retrieve config flags: %w", err)
	}

@kaanyalti
Copy link
Contributor

After discussing with @michalpristas, here's what I will implement

  • Use otel resolver to et all the config
  • Strip away non-otel components from the config files
  • Save the configs to a new dir
  • Pass this new dir and the config files into the otel command

@kaanyalti
Copy link
Contributor

ok, another quick update. I spoke too soon, what I mentioned above seems to be not possible without reimplementing what otel has as the functions that may make this implementation possible are not exported. I am going to prepare a document explaining why this implementation is not possible at the moment, and will possibly propose a change in otel.

@kaanyalti
Copy link
Contributor

Ok found a way to implement this, will post a PR as soon as I have a draft that demonstrates what I'm doing

@kaanyalti
Copy link
Contributor

Discussed with the team and decided not to support stripping non-otel components for otel validation. Added a more detailed help message in the command explaining that hybrid configurations will fail if validated by otel.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team
Projects
None yet
4 participants