Add ability to enroll with a specific ID #4290

blakerouse · 2025-01-08T01:41:32Z

What is the problem this PR solves?

This solves an issue where an Elastic Agent is being replaced with a new Elastic Agent instance for the same host, pod, or workload. The enrolling Elastic Agent can now specify the ID it wants to use. That ID may already be in use, in which case this enrollment takes over the record of the existing Elastic Agent. To take over the existing Elastic Agent, both the original and the new enrollment must use the same replace_token during enrollment. This ensures that the original enrollment informs Fleet Server that it can be replaced, and that the replacement holds the same token needed to perform the replacement.
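The token rule described above can be illustrated with a minimal sketch. It is not the code in this PR: `canReplace` is a hypothetical helper, and comparing the raw tokens in constant time is an assumption about how the check might be done (the server may well store or compare the token differently).

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// canReplace is a hypothetical helper, not a function from this PR.
// storedToken is the replace_token recorded by the original enrollment
// ("" if the original enrollment did not provide one).
// presentedToken is the replace_token sent by the new enrollment.
func canReplace(storedToken, presentedToken string) bool {
	// An agent that never supplied a token did not opt in to being replaced,
	// and a replacement without a token cannot prove it is allowed to take over.
	if storedToken == "" || presentedToken == "" {
		return false
	}
	// Constant-time comparison avoids leaking how much of the token matched.
	return subtle.ConstantTimeCompare([]byte(storedToken), []byte(presentedToken)) == 1
}

func main() {
	fmt.Println("same token:", canReplace("s3cret", "s3cret"))
	fmt.Println("different token:", canReplace("s3cret", "other"))
	fmt.Println("original had no token:", canReplace("", "s3cret"))
}
```

Under this sketch, a replacement succeeds only when both enrollments carried the same non-empty token, which matches the intent stated in the description.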

How does this PR solve the problem?

It solves the issue by taking a new id field in the enroll HTTP request. That id is then used as the Elastic Agent ID and determines if this is a new Elastic Agent or if it should take over an existing Elastic Agent record.
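As a rough illustration of the branching described above, the sketch below decodes only the `id` and `replace_token` fields and decides which path an enrollment would take. The struct, the `enrollAction` values, and the `exists` callback are all hypothetical names introduced for this example. The real enroll request carries more fields, and the replace_token check is left out here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// enrollRequest models only the two fields discussed in this PR.
// The struct itself is an assumption for illustration.
type enrollRequest struct {
	ID           string `json:"id,omitempty"`
	ReplaceToken string `json:"replace_token,omitempty"`
}

// enrollAction is a hypothetical result type for this sketch.
type enrollAction int

const (
	actionGenerateID      enrollAction = iota // no id given: server picks an ID
	actionCreateWithID                        // id given, no existing record
	actionReplaceExisting                     // id given, record already exists
)

// decideEnroll decodes the request body and picks a path.
// exists is a stand-in for the lookup of an existing agent record.
func decideEnroll(body []byte, exists func(id string) bool) (enrollAction, error) {
	var req enrollRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return 0, fmt.Errorf("decode enroll request: %w", err)
	}
	switch {
	case req.ID == "":
		return actionGenerateID, nil
	case exists(req.ID):
		return actionReplaceExisting, nil
	default:
		return actionCreateWithID, nil
	}
}

func main() {
	exists := func(id string) bool { return id == "agent-1" }
	for _, body := range []string{`{}`, `{"id":"agent-2"}`, `{"id":"agent-1","replace_token":"t"}`} {
		action, err := decideEnroll([]byte(body), exists)
		fmt.Println(body, action, err)
	}
}
```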

How to test this PR locally

At the moment the integration tests are the best way to test this, as the ability to use this field has not been exposed yet on the Elastic Agent.

Design Checklist

I have ensured my design is stateless and will work when multiple fleet-server instances are behind a load balancer.
I have or intend to scale test my changes, ensuring it will work reliably with 100K+ agents connected.
I have included fail safe mechanisms to limit the load on fleet-server: rate limiting, circuit breakers, caching, load shedding, etc. (already covered by enroll handle)

Checklist

I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
~~[ ] I have made corresponding change to the default configuration files~~ (no config changes)
I have added tests that prove my fix is effective or that my feature works
I have added an entry in ./changelog/fragments using the changelog tool

Related issues

Closes Add ability to provide the agent-id in enroll API #4226

jlind23 · 2025-01-08T14:38:43Z

After chatting with @blakerouse over Slack, we should fail the enrollment of this "new" agent if the policy has tamper protection enabled.

kaanyalti · 2025-01-08T16:18:22Z

Changes so far look good to me, waiting for updates relevant to @jlind23's comment above to approve

michel-laterman · 2025-01-08T16:26:57Z

model/openapi.yml

+            If another agent is enrolled with the same ID the other agent will no longer be able to communicate,
+            this new agent is considered a replacement of the other agent.


This is a pretty big change from our discussion.
Please also add a sentence saying the (replaced) agent will still be able to send data into ES

It is a slight change, because it is not possible to get an API key token again after the initial create. That left me no choice but to change the behavior.

As requested, I added more to the description of this field to state that data will still be allowed to flow.

jlind23 · 2025-01-08T20:56:02Z

Changes so far look good to me, waiting for updates relevant to @jlind23's comment above to approve

See #4226 (comment)

blakerouse · 2025-01-08T22:35:12Z

@jlind23 @michel-laterman @kaanyalti I have updated this PR based on the discussion I had with @jlind23 about security with this feature. This PR now includes an additional replace_token in the enrollment API. I have updated both the PR description and the API specification to describe it.

internal/pkg/api/handleEnroll.go

pkoutsovasilis · 2025-01-09T04:49:09Z

Except for a small issue with ending a trace span, the code changes look good to me. I understand the potential pitfalls with this feature and definitely see how the replace_token helps in minimising some of them, but to me this feature still serves only special-case scenarios and is not meant for streamlined usage 🙂

internal/pkg/api/handleEnroll.go

blakerouse · 2025-01-09T16:15:48Z

@pkoutsovasilis @michalpristas I updated the PR with the requested fixes. Thanks for the reviews.

internal/pkg/api/handleEnroll.go

elastic-sonarqube · 2025-01-09T18:34:43Z

Quality Gate passed

Issues
2 New issues
0 Fixed issues
0 Accepted issues

Measures
0 Security Hotspots
87.3% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube

pkoutsovasilis

LGTM

michalpristas

Thanks for resolving the comments. I have one question about behavior; other than that I'm OK with the change.
I'll let you decide how you want to address the point I raised in this iteration.

michalpristas · 2025-01-10T08:02:49Z

internal/pkg/api/handleEnroll.go

+			return nil, err
+		}
+
+		if agent.Id != "" {


Can it return agent.id == ""? If not, this if statement is not needed.
If so, we don't have this case handled, as this should not be treated the same as the empty ID we get when req.ID is not used.
We probably should not continue with an empty ID; we should probably fail. Generating a new one defeats the purpose of providing it via req.id.

It absolutely will return an `agent.id == ""`. This happens when we check whether an agent already exists with that ID. ErrNotFound is not returned from _checkAgent; it returns a nil error and the result has `agent.id == ""`.

blakerouse added 2 commits January 7, 2025 20:34

Add ability to enroll and provide the agent id.

c489556

Fix lint.

01bc665

blakerouse added the Team:Elastic-Agent-Control-Plane and backport-8.x labels Jan 8, 2025

blakerouse self-assigned this Jan 8, 2025

blakerouse requested a review from a team as a code owner January 8, 2025 01:41

blakerouse requested review from pkoutsovasilis and kaanyalti January 8, 2025 01:41

Add changelog entry.

4e383f5

michel-laterman reviewed Jan 8, 2025

blakerouse added 4 commits January 8, 2025 17:14

Add replace_token.

1e0b16a

Add crypto dep.

7e78f58

More fixes.

7a8ef63

Fix lint.

0a59a2b

blakerouse mentioned this pull request Jan 8, 2025

Add ability to enroll with defined ID and replace_token elastic/elastic-agent#6498

Draft

5 tasks

pkoutsovasilis reviewed Jan 9, 2025

michalpristas added the enhancement New feature or request label Jan 9, 2025

michalpristas reviewed Jan 9, 2025

Updates from code review.

fec4f23

pkoutsovasilis reviewed Jan 9, 2025

Use now variable.

8e43c86

pkoutsovasilis approved these changes Jan 9, 2025

michalpristas approved these changes Jan 10, 2025

kaanyalti approved these changes Jan 10, 2025
