
Remove node IP from excluded list when draining fails #423

Draft: A-Kamaee wants to merge 4 commits into master from eviction-error-cleanup

Conversation

@A-Kamaee (Contributor) commented on Jun 11, 2024

In case of an unsuccessful drain of a node, we remove the node's IP from the Elasticsearch exclude list.

Issues: #128, #404
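
For reference, the exclude list in question is the `cluster.routing.allocation.exclude._ip` cluster setting, which holds a comma-separated list of IPs. A minimal sketch of the cleanup step, with hypothetical names and none of the actual es-operator wiring:

```go
package es

import (
	"bytes"
	"context"
	"fmt"
	"net/http"
	"strings"
)

// undoExcludeIP is a hypothetical helper: it rewrites the exclude list
// without podIP and PUTs it back via the cluster settings API.
func undoExcludeIP(ctx context.Context, esURL, podIP string, excluded []string) error {
	kept := make([]string, 0, len(excluded))
	for _, ip := range excluded {
		if ip != podIP {
			kept = append(kept, ip)
		}
	}
	// cluster.routing.allocation.exclude._ip takes a comma-separated list.
	body := fmt.Sprintf(`{"transient":{"cluster.routing.allocation.exclude._ip":%q}}`,
		strings.Join(kept, ","))
	req, err := http.NewRequestWithContext(ctx, http.MethodPut,
		esURL+"/_cluster/settings", bytes.NewBufferString(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("updating exclude list failed: %s", resp.Status)
	}
	return nil
}
```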

Description

When draining a node fails, remove the node's IP from the Elasticsearch exclude list again, so that unsuccessful drains do not leave stale entries behind (#128, #404).

Types of Changes

  • Bug fix (non-breaking change which fixes an issue)

Tasks

List of tasks you will do to complete the PR

  • Created Task 1
  • Created Task 2
  • To-do Task 3

Review

List of tasks the reviewer must do to review the PR

  • Tests
  • Documentation
  • CHANGELOG

Deployment Notes

These should highlight any db migrations, feature toggles, etc.

@girishc13 commented

The draining annotation needs to be removed; otherwise, the next run will start draining the pod again.
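
As an illustration of that cleanup, a pod annotation can be removed with a JSON merge patch, where setting the key to null deletes it (RFC 7386). The function below is made up; the real es-operator has its own annotation key and update path:

```go
package esdrain

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// removeAnnotation drops a single annotation from a pod. A JSON merge
// patch with a null value deletes the key.
func removeAnnotation(ctx context.Context, client kubernetes.Interface, namespace, pod, key string) error {
	patch := []byte(fmt.Sprintf(`{"metadata":{"annotations":{%q:null}}}`, key))
	_, err := client.CoreV1().Pods(namespace).Patch(ctx, pod, types.MergePatchType, patch, metav1.PatchOptions{})
	return err
}
```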

@juhovuori commented

This fixes the clean variant of the described bug. However, we are still left with a dirty exclude list in the messier case where the process crashes during the draining loop. That is a fairly realistic scenario, as the draining loop can run for hours.

@otrosien (Member) commented

Would it make sense to have es-operator log a warning and kill the node in case each of the shards has another copy outside this node?

Commit: In case of an unsuccessful drain of a node, remove the node IP from the Elasticsearch excluded IPs.
Signed-off-by: Abouzar Kamaee <[email protected]>
@A-Kamaee (Contributor, Author) commented

> Would it make sense to have es-operator log a warning and kill the node in case each of the shards has another copy outside this node?

I don't think that's a good idea. There's a reason Elasticsearch can't empty the node. If es-operator abruptly kills the node, even if the shards have copies on other nodes, the cluster might end up in an unknown state. WDYT?

@A-Kamaee (Contributor, Author) commented

> This fixes the clean variant of the described bug. However, we are still left with a dirty exclude list in the messier case where the process crashes during the draining loop. That is a fairly realistic scenario, as the draining loop can run for hours.

Can you elaborate on the specific scenario you have in mind? Please also consider that when the context is cancelled (ctx.Done()), we clean up the exclude list.
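
A rough sketch of that control flow (only `excludePodIP` is from this PR; the other method names and the simplified signatures are invented for illustration):

```go
package esdrain

import (
	"context"
	"log"
)

// drainer captures just the calls this sketch needs; excludePodIP is the
// method touched by this PR, the other two names are made up.
type drainer interface {
	excludePodIP(ctx context.Context, ip string) error
	undoExcludeIP(ctx context.Context, ip string) error
	waitUntilEmpty(ctx context.Context, ip string) error
}

// drainPod shows the control flow under discussion: whether the drain
// fails or the context is cancelled, the deferred cleanup removes the
// pod IP from the exclude list again.
func drainPod(ctx context.Context, es drainer, podIP string) (err error) {
	if err = es.excludePodIP(ctx, podIP); err != nil {
		return err
	}
	defer func() {
		if err != nil {
			// Best effort: use a fresh context so cleanup still runs after
			// cancellation, and don't let a cleanup error mask the drain error.
			if cerr := es.undoExcludeIP(context.Background(), podIP); cerr != nil {
				log.Printf("failed to clean up exclude list for %s: %v", podIP, cerr)
			}
		}
	}()
	// waitUntilEmpty returns ctx.Err() when the context is cancelled.
	return es.waitUntilEmpty(ctx, podIP)
}
```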

A-Kamaee force-pushed the eviction-error-cleanup branch 2 times, most recently from 067b299 to 0367cbf on Jun 12, 2024 at 16:29
…y and add tests for ESClient.excludePodIP function

Signed-off-by: Abouzar Kamaee <[email protected]>
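
One plausible shape for such a test, using `httptest` as a stand-in Elasticsearch; the constructor and the method signature here are assumptions, and the tests actually added in this commit may differ:

```go
package es

import (
	"context"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"testing"
)

// TestExcludePodIP sketches one way to test the exclude call: spin up a
// fake Elasticsearch, record what the client PUTs to /_cluster/settings,
// and assert the pod IP ends up in the body.
func TestExcludePodIP(t *testing.T) {
	var gotBody string
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		switch r.Method {
		case http.MethodGet:
			// Pretend the current exclude list is empty.
			w.Write([]byte(`{"transient":{},"persistent":{}}`))
		case http.MethodPut:
			b, _ := io.ReadAll(r.Body)
			gotBody = string(b)
			w.Write([]byte(`{"acknowledged":true}`))
		}
	}))
	defer srv.Close()

	// Hypothetical construction and signature; the real ESClient is wired
	// up differently.
	client := NewESClient(srv.URL)
	if err := client.excludePodIP(context.Background(), "10.2.3.4"); err != nil {
		t.Fatalf("excludePodIP: %v", err)
	}
	if !strings.Contains(gotBody, "10.2.3.4") {
		t.Errorf("exclude settings update does not mention the pod IP: %q", gotBody)
	}
}
```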
@juhovuori commented

> This fixes the clean variant of the described bug. However, we are still left with a dirty exclude list in the messier case where the process crashes during the draining loop. That is a fairly realistic scenario, as the draining loop can run for hours.

Actually, I just learned that this is not an issue in es-operator, precisely because of the annotation mentioned above. Ignore my previous comment; the PR looks good IMO.
