Caution users that they must reduce replica count by 1 when scaling in #963

Open
wants to merge 9 commits into base: main
8 changes: 8 additions & 0 deletions local-antora-playbook.yml
@@ -49,6 +49,14 @@ antora:
filter: docker-compose
env_type: Docker
attribute_name: docker-labs-index
- require: '@sntke/antora-mermaid-extension'
mermaid_library_url: https://cdn.jsdelivr.net/npm/mermaid@10/dist/mermaid.esm.min.mjs
script_stem: mermaid-scripts
mermaid_initialize_options:
start_on_load: true
theme: base
theme_variables:
line_color: '#e2401b'
- require: '@redpanda-data/docs-extensions-and-macros/extensions/collect-bloblang-samples'
- require: '@redpanda-data/docs-extensions-and-macros/extensions/generate-rp-connect-categories'
- require: '@redpanda-data/docs-extensions-and-macros/extensions/modify-redirects'
1 change: 1 addition & 0 deletions modules/ROOT/nav.adoc
@@ -133,6 +133,7 @@
*** xref:manage:kubernetes/k-remote-read-replicas.adoc[Remote Read Replicas]
*** xref:manage:kubernetes/k-manage-resources.adoc[Manage Pod Resources]
*** xref:manage:kubernetes/k-scale-redpanda.adoc[Scale]
*** xref:manage:kubernetes/k-nodewatcher.adoc[]
*** xref:manage:kubernetes/k-decommission-brokers.adoc[Decommission Brokers]
*** xref:manage:kubernetes/k-recovery-mode.adoc[Recovery Mode]
*** xref:manage:kubernetes/monitoring/index.adoc[Monitor]
309 changes: 205 additions & 104 deletions modules/manage/pages/kubernetes/k-decommission-brokers.adoc
@@ -1,12 +1,14 @@
= Decommission Brokers in Kubernetes
:description: Remove a broker so that it is no longer considered part of the cluster.
:description: Remove a Redpanda broker from the cluster without risking data loss or causing instability.
:page-context-links: [{"name": "Linux", "to": "manage:cluster-maintenance/decommission-brokers.adoc" },{"name": "Kubernetes", "to": "manage:kubernetes/k-decommission-brokers.adoc" } ]
:tags: ["Kubernetes"]
:page-aliases: manage:kubernetes/decommission-brokers.adoc
:page-categories: Management
:env-kubernetes: true

When you decommission a broker, its partition replicas are reallocated across the remaining brokers and it is removed from the cluster. You may want to decommission a broker in the following circumstances:
Decommissioning a broker is the *safe and controlled* way to remove a Redpanda broker from the cluster without risking data loss or causing instability. By decommissioning, you ensure that partition replicas are reallocated across the remaining brokers so that you can then safely shut down the broker.

You may want to decommission a broker in the following situations:

* You are removing a broker to decrease the size of the cluster, also known as scaling down.
* The broker has lost its storage and you need a new broker with a new node ID (broker ID).
@@ -222,15 +224,204 @@ So the primary limitation consideration is the replication factor of five, meaning

To decommission a broker, you can use one of the following methods:

- <<Automated>>: Use the Decommission controller to automatically decommission brokers whenever you reduce the number of StatefulSet replicas.
- <<Manual>>: Use `rpk` to decommission one broker at a time.
- <<Automated>>: Use the Decommission controller to automatically decommission brokers whenever you reduce the number of StatefulSet replicas.

[[Manual]]
=== Manually decommission a broker

Follow this workflow to manually decommission a broker before reducing the number of StatefulSet replicas:

[mermaid]
....
flowchart TB
%% Define classes
classDef userAction stroke:#374D7C, fill:#E2EBFF, font-weight:bold,rx:5,ry:5

A[Start Manual Scale-In]:::userAction --> B["Identify Broker to Remove<br/>(Highest Pod Ordinal)"]:::userAction
B --> C[Decommission Broker Running on Pod with Highest Ordinal]:::userAction
C --> D[Monitor Decommission Status]:::userAction
D --> E{Is Broker Removed?}:::userAction
E -- No --> D
E -- Yes --> F[Decrease StatefulSet Replicas by 1]:::userAction
F --> G[Wait for Rolling Update and Cluster Health]:::userAction
G --> H{More Brokers to Remove?}:::userAction
H -- Yes --> B
H -- No --> I[Done]:::userAction
....

. List your brokers and their associated broker IDs:
+
```bash
kubectl --namespace <namespace> exec -ti redpanda-0 -c redpanda -- \
rpk cluster info
```
+
.Example output
[%collapsible]
====
```
CLUSTER
=======
redpanda.560e2403-3fd6-448c-b720-7b456d0aa78c

BROKERS
=======
ID HOST PORT RACK
0 redpanda-0.testcluster.local 32180 A
1 redpanda-1.testcluster.local 32180 A
4 redpanda-3.testcluster.local 32180 B
5* redpanda-2.testcluster.local 32180 B
6 redpanda-4.testcluster.local 32180 C
8 redpanda-6.testcluster.local 32180 C
9 redpanda-5.testcluster.local 32180 D
```
====
+
The output shows that the broker IDs don't necessarily match the StatefulSet ordinals, which appear in the hostnames. In this example, two brokers will be decommissioned: `redpanda-6` (ID 8) and `redpanda-5` (ID 9).
+
NOTE: When scaling in a cluster, you cannot choose which broker is removed. Redpanda is deployed as a StatefulSet in Kubernetes. The StatefulSet controls which Pods are destroyed and always starts with the Pod that has the highest ordinal. So the first broker to be removed when updating the StatefulSet in this example is `redpanda-6` (ID 8).
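+
If you want to identify the Pod with the highest ordinal programmatically, you can use something like the following. This is a minimal sketch that assumes the Pods carry the `app.kubernetes.io/name=redpanda` label, as in a default Helm chart deployment; adjust the label selector to match your deployment.
+
```bash
# List the Redpanda Pods and print the one with the highest ordinal
kubectl --namespace <namespace> get pods -l app.kubernetes.io/name=redpanda -o name | sort -V | tail -n 1
```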

. Decommission the broker with the highest Pod ordinal:
+
```bash
kubectl --namespace <namespace> exec -ti <pod-name> -c <container-name> -- \
rpk redpanda admin brokers decommission <broker-id>
```
+
This message is displayed as soon as the request is accepted; the decommission process may still be in progress:
+
```
Success, broker <broker-id> has been decommissioned!
```
+
TIP: If the broker is not running, use the `--force` flag.
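+
For example, if the Pod backing the broker has already been deleted and the broker can no longer respond, the call might look like the following sketch. As the tip above notes, `--force` tells `rpk` to proceed even though the broker is not running:
+
```bash
kubectl --namespace <namespace> exec -ti <pod-name> -c <container-name> -- \
  rpk redpanda admin brokers decommission <broker-id> --force
```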

. Monitor the decommissioning status:
+
```bash
kubectl --namespace <namespace> exec -ti <pod-name> -c <container-name> -- \
rpk redpanda admin brokers decommission-status <broker-id>
```
+
The output uses cached cluster health data that is refreshed every 10 seconds. When the completion column for all rows is 100%, the broker is decommissioned.
+
Another way to verify that the decommission is complete is to run the following command:
+
```bash
kubectl --namespace <namespace> exec -ti <pod-name> -c <container-name> -- \
rpk cluster health
```
+
Verify that the decommissioned broker's ID does not appear in the list of IDs. In this example, ID 8 is missing, which means the decommission of `redpanda-6` is complete.
+
```
CLUSTER HEALTH OVERVIEW
=======================
Healthy: true
Controller ID: 0
All nodes: [4 1 0 5 6 9]
Nodes down: []
Leaderless partitions: []
Under-replicated partitions: []
```
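+
If you prefer to poll rather than re-run the command manually, a simple approach is to wrap the status check in `watch`. This is a sketch that assumes the `watch` utility is available on the machine running `kubectl`:
+
```bash
# Re-check the decommission status every 10 seconds until all rows report 100% completion
watch -n 10 "kubectl --namespace <namespace> exec <pod-name> -c <container-name> -- rpk redpanda admin brokers decommission-status <broker-id>"
```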

. Decrease the number of replicas *by one* to remove the Pod with the highest ordinal (the one you just decommissioned).
+
:caution-caption: Reduce replicas by one
[CAUTION]
====
When scaling in (removing brokers), remove only one broker at a time. If you reduce the StatefulSet replicas by more than one, Kubernetes can terminate multiple Pods simultaneously, causing quorum loss and cluster unavailability.
====
:caution-caption: Caution
+
[tabs]
======
Helm + Operator::
+
--
.`redpanda-cluster.yaml`
[,yaml]
----
apiVersion: cluster.redpanda.com/v1alpha2
kind: Redpanda
metadata:
name: redpanda
spec:
chartRef: {}
clusterSpec:
statefulset:
replicas: <number-of-replicas>
----

Apply the Redpanda resource:

```bash
kubectl apply -f redpanda-cluster.yaml --namespace <namespace>
```

--
Helm::
+
--

[tabs]
====
--values::
+
.`decommission.yaml`
[,yaml]
----
statefulset:
replicas: <number-of-replicas>
----

--set::
+
[,bash]
----
helm upgrade redpanda redpanda/redpanda --namespace <namespace> --wait --reuse-values --set statefulset.replicas=<number-of-replicas>
----
====
--
======
+
This process triggers a rolling restart of each Pod so that each broker has an up-to-date `seed_servers` configuration that reflects the new list of brokers.
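+
Before repeating the procedure for the next broker, you can confirm that the rolling update has finished and that the cluster is healthy. This is a sketch that assumes the StatefulSet is named `redpanda`:
+
```bash
# Wait for the StatefulSet rollout to complete
kubectl --namespace <namespace> rollout status statefulset/redpanda --watch
# Confirm that the remaining brokers report a healthy cluster
kubectl --namespace <namespace> exec -ti redpanda-0 -c redpanda -- rpk cluster health
```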

This example shows how to scale a cluster from seven brokers to five brokers.
You can repeat this procedure to continue to scale down.

[[Automated]]
=== Use the Decommission controller

The Decommission controller is responsible for monitoring the StatefulSet for changes in the number of replicas. When the number of replicas is reduced, the controller decommissions brokers, starting from the highest Pod ordinal, until the number of brokers matches the number of replicas. For example, you have a Redpanda cluster with the following brokers:
The Decommission controller monitors the StatefulSet for changes in the number of replicas. When the number of replicas is reduced, the controller decommissions brokers, starting from the highest Pod ordinal, until the number of brokers matches the number of replicas.

[mermaid]
....
flowchart TB
%% Define classes
classDef userAction stroke:#374D7C, fill:#E2EBFF, font-weight:bold,rx:5,ry:5
    classDef systemEvent fill:#F6FBF6,stroke:#25855a,stroke-width:2px,color:#20293c,rx:5,ry:5

%% Main workflow
A[Start Automated Scale-In]:::userAction --> B[Decrease StatefulSet<br/>Replicas by 1]:::userAction
B --> C[Decommission Controller<br/>Detects Reduced Replicas]:::systemEvent
C --> D[Controller Marks<br/>Highest Ordinal Pod for Removal]:::systemEvent
D --> E[Controller Orchestrates<br/>Broker Decommission]:::systemEvent
E --> F[Partitions Reallocate<br/>Under Controller Supervision]:::systemEvent
F --> G[Check Cluster Health]:::systemEvent
G --> H{Broker Fully Removed?}:::systemEvent
H -- No --> F
H -- Yes --> I[Done,<br/>or Repeat if Further Scale-In Needed]:::userAction

%% Legend
subgraph Legend
direction TB
UA([User Action]):::userAction
SE([System Event]):::systemEvent
end
....

For example, suppose you have a Redpanda cluster with the following brokers:

[.no-copy]
----
@@ -402,7 +593,14 @@ helm upgrade --install redpanda redpanda/redpanda \
kubectl exec redpanda-0 --namespace <namespace> -- rpk cluster health
```

. Decrease the number of replicas by one:
. Decrease the number of replicas *by one*.
+
:caution-caption: Reduce replicas by one
[CAUTION]
====
When scaling in (removing brokers), remove only one broker at a time. If you reduce the StatefulSet replicas by more than one, Kubernetes can terminate multiple Pods simultaneously, causing quorum loss and cluster unavailability.
====
:caution-caption: Caution
+
[tabs]
======
@@ -493,104 +691,7 @@ If you're running the Decommission controller as a sidecar:
kubectl logs <pod-name> --namespace <namespace> -c redpanda-controllers
----
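
To narrow the output to decommission-related events, a simple filter (a sketch that assumes the controller's log lines mention the word "decommission") is:

[,bash]
----
kubectl logs <pod-name> --namespace <namespace> -c redpanda-controllers | grep -i decommission
----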

You can repeat this procedure to scale down to 5 brokers.

[[Manual]]
=== Manually decommission a broker

If you don't want to use the <<Automated, Decommission controller>>, follow these steps to manually decommission a broker before reducing the number of StatefulSet replicas:

. List your brokers and their associated broker IDs:
+
```bash
kubectl --namespace <namespace> exec -ti redpanda-0 -c redpanda -- \
rpk cluster info
```
+
.Example output
[%collapsible]
====
```
CLUSTER
=======
redpanda.560e2403-3fd6-448c-b720-7b456d0aa78c

BROKERS
=======
ID HOST PORT RACK
0 redpanda-0.testcluster.local 32180 A
1 redpanda-1.testcluster.local 32180 A
4 redpanda-3.testcluster.local 32180 B
5* redpanda-2.testcluster.local 32180 B
6 redpanda-4.testcluster.local 32180 C
8 redpanda-6.testcluster.local 32180 C
9 redpanda-5.testcluster.local 32180 D
```
====
+
The output shows that the IDs don't match the StatefulSet ordinal, which appears in the hostname. In this example, two brokers will be decommissioned: `redpanda-6` (ID 8) and `redpanda-5` (ID 9).
+
NOTE: When scaling in a cluster, you cannot choose which broker is decommissioned. Redpanda is deployed as a StatefulSet in Kubernetes. The StatefulSet controls which Pods are destroyed and always starts with the Pod that has the highest ordinal. So the first broker to be destroyed when updating the StatefulSet in this example is `redpanda-6` (ID 8).

. Decommission the broker with your selected broker ID:
+
```bash
kubectl --namespace <namespace> exec -ti <pod-name> -c <container-name> -- \
rpk redpanda admin brokers decommission <broker-id>
```
+
This message is displayed before the decommission process is complete.
+
```
Success, broker <broker-id> has been decommissioned!
```
+
TIP: If the broker is not running, use the `--force` flag.

. Monitor the decommissioning status:
+
```bash
kubectl --namespace <namespace> exec -ti <pod-name> -c <container-name> -- \
rpk redpanda admin brokers decommission-status <broker-id>
```
+
The output uses cached cluster health data that is refreshed every 10 seconds. When the completion column for all rows is 100%, the broker is decommissioned.
+
Another way to verify decommission is complete is by running the following command:
+
```bash
kubectl --namespace <namespace> exec -ti <pod-name> -c <container-name> -- \
rpk cluster health
```
+
Be sure to verify that the decommissioned broker's ID does not appear in the list of IDs. In this example, ID 9 is missing, which means the decommission is complete.
+
```
CLUSTER HEALTH OVERVIEW
=======================
Healthy: true
Controller ID: 0
All nodes: [4 1 0 5 6 8]
Nodes down: []
Leaderless partitions: []
Under-replicated partitions: []
```

. Decommission any other brokers.
+
After decommissioning one broker and verifying that the process is complete, continue decommissioning another broker by repeating the previous two steps.
+
NOTE: Be sure to take into account everything in <<should-you-decommission-brokers, this section>>, and verify that your cluster and use cases will not be negatively impacted by losing brokers.

. Update the StatefulSet replica value.
+
The last step is to update the StatefulSet replica value to reflect the new broker count. In this example the count was updated to five. If you deployed with the Helm chart, then run the following command:
+
```bash
helm upgrade redpanda redpanda/redpanda --namespace <namespace> --wait --reuse-values --set statefulset.replicas=5
```
+
This process triggers a rolling restart of each Pod so that each broker has an up-to-date `seed_servers` configuration to reflect the new list of brokers.
You can repeat this procedure to continue to scale down.

== Troubleshooting
