10 changes: 10 additions & 0 deletions .github/workflows/test-docs.yml
@@ -10,6 +10,7 @@ jobs:
runs-on: ubuntu-latest
outputs:
data-transforms: ${{ steps.filter.outputs.data-transforms }}
envoy-shadowing: ${{ steps.filter.outputs.envoy-shadowing }}
redpanda-migrator: ${{ steps.filter.outputs.redpanda-migrator }}
steps:
- name: Checkout code
@@ -22,6 +23,8 @@ jobs:
filters: |
data-transforms:
- 'data-transforms/**'
envoy-shadowing:
- 'docker-compose/envoy-shadowing/**'
redpanda-migrator:
- 'docker-compose/redpanda-migrator-demo/**'
run-tests:
@@ -45,6 +48,10 @@ jobs:
run: npm install

- name: Increase AIO limit for Redpanda
if: needs.setup.outputs.envoy-shadowing == 'true'
run: |
# Redpanda uses Linux AIO. With 6 brokers each needing ~10000 events,
# we exceed the default limit of 65536. Increase it.
if: needs.setup.outputs.redpanda-migrator == 'true'
run: |
# Redpanda uses Linux AIO. Increase limit to support multiple brokers.
@@ -56,6 +63,9 @@ jobs:
# Run the tests for the data transforms
npm run test-transforms

- name: Test envoy shadowing
if: needs.setup.outputs.envoy-shadowing == 'true'
run: npm run test-envoy-shadowing
- name: Test redpanda migrator
if: needs.setup.outputs.redpanda-migrator == 'true'
run: npm run test-migrator
202 changes: 202 additions & 0 deletions docker-compose/envoy-shadowing/README.adoc
@@ -0,0 +1,202 @@
= Disaster Recovery with Envoy and Shadowing
:env-docker: true
:page-categories: High Availability, Disaster Recovery, Integration
:description: Combine Redpanda Shadowing for data replication with Envoy proxy for transparent client routing during disaster recovery.
:page-layout: lab
:page-topic-type: lab
:personas: platform_operator, streaming_developer
// Learning objectives
:learning-objective-1: Set up Shadowing for offset-preserving data replication
:learning-objective-2: Configure Envoy for automatic client routing during failover
:learning-objective-3: Execute a complete disaster recovery failover

// (test start {"id":"envoy-shadowing-dr", "description": "Envoy + Shadowing disaster recovery"})
// (step {"action":"runShell", "command": "docker compose down -v 2>/dev/null || true", "workingDirectory": "../docker-compose/envoy-shadowing"})

ifdef::env-site[]
This lab demonstrates a disaster recovery setup that combines xref:ROOT:manage:disaster-recovery/shadowing/index.adoc[Redpanda Shadowing] with https://www.envoyproxy.io/[Envoy proxy^].
endif::[]
ifndef::env-site[]
This lab demonstrates a disaster recovery setup that combines https://docs.redpanda.com/current/manage/disaster-recovery/[Redpanda Shadowing^] with https://www.envoyproxy.io/[Envoy proxy^].
endif::[]

* **Shadowing** provides offset-preserving, byte-for-byte data replication between clusters
* **Envoy** provides transparent client routing without requiring client reconfiguration

https://www.envoyproxy.io/[Envoy^] is a high-performance proxy that can route traffic intelligently based on backend health. In this setup, Envoy routes Kafka clients to the active cluster and automatically fails over to the shadow cluster when the source becomes unavailable. This eliminates the need to reconfigure clients during disaster recovery.

In this lab, you will:

* {learning-objective-1}
* {learning-objective-2}
* {learning-objective-3}

== Prerequisites

You need https://docs.docker.com/compose/install/[Docker and Docker Compose^].

== Run the lab

. Clone this repository:
+
[,bash]
----
git clone https://github.com/redpanda-data/redpanda-labs.git
cd redpanda-labs/docker-compose/envoy-shadowing
----

. Start the environment:
+
[,bash]
----
docker compose up -d --wait
----
// (step {"action":"runShell", "command": "docker compose up -d --wait", "workingDirectory": "../docker-compose/envoy-shadowing", "timeout": 180000})

. Verify both clusters are healthy:
+
[,bash]
----
docker exec redpanda-source rpk cluster health
docker exec redpanda-shadow rpk cluster health
----
// (step {"action":"runShell", "command": "docker exec redpanda-source rpk cluster health", "output": "/Healthy/"})
// (step {"action":"runShell", "command": "docker exec redpanda-shadow rpk cluster health", "output": "/Healthy/"})

. Create a topic on the source cluster:
+
[,bash]
----
docker exec redpanda-source rpk topic create demo-topic --partitions 3 --replicas 1
----
// (step {"action":"runShell", "command": "docker exec redpanda-source rpk topic create demo-topic --partitions 3 --replicas 1", "output": "/OK/"})

. Create a shadow link to replicate data from source to shadow:
+
[,bash]
----
docker exec redpanda-shadow rpk shadow create \
--config-file /config/shadow-link.yaml \
--no-confirm \
-X admin.hosts=redpanda-shadow:9644
----
// (step {"action":"runShell", "command": "docker exec redpanda-shadow rpk shadow create --config-file /config/shadow-link.yaml --no-confirm -X admin.hosts=redpanda-shadow:9644", "output": "/Successfully created/"})

. Verify the shadow link is active:
+
[,bash]
----
docker exec redpanda-shadow rpk shadow status demo-shadow-link -X admin.hosts=redpanda-shadow:9644
----
// (step {"action":"runShell", "command": "docker exec redpanda-shadow rpk shadow status demo-shadow-link -X admin.hosts=redpanda-shadow:9644", "output": "/ACTIVE/"})

. Produce messages through Envoy, which routes them to the source cluster (a sketch of a comparable producer script appears after this list):
+
[,bash]
----
docker exec python-client python3 /scripts/test-producer.py
----
// (step {"action":"runShell", "command": "docker exec python-client python3 /scripts/test-producer.py", "output": "/OK/"})
// (step {"action":"wait", "duration": 5000})

. Verify that the data replicated to the shadow cluster (lag should be 0):
+
[,bash]
----
docker exec redpanda-shadow rpk shadow status demo-shadow-link -X admin.hosts=redpanda-shadow:9644 | grep -A5 "demo-topic"
----
// (step {"action":"runShell", "command": "docker exec redpanda-shadow rpk shadow status demo-shadow-link -X admin.hosts=redpanda-shadow:9644", "output": "/demo-topic/"})
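
The producer and consumer scripts referenced in these steps live in `./scripts` and are mounted into the `python-client` container; they are not included in this diff. As a rough idea of what the producer could look like with kafka-python (the library the `python-client` container installs), here is a hypothetical sketch. The real `test-producer.py` may differ in topic name, payload, and error handling.

[,python]
----
# Hypothetical sketch only -- not the lab's actual /scripts/test-producer.py.
# Connects to Envoy (envoy:9092) rather than to a broker directly, so the same
# script keeps working after Envoy fails over to the shadow cluster.
import json

from kafka import KafkaProducer  # kafka-python

producer = KafkaProducer(
    bootstrap_servers="envoy:9092",  # Envoy proxy, not a broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

for i in range(10):
    # demo-topic is the topic created on the source cluster earlier in this lab
    producer.send("demo-topic", {"sequence": i, "message": f"hello {i}"})

producer.flush()
producer.close()
print("OK: produced 10 messages through Envoy")
----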

== Simulate disaster and failover

. Stop the source cluster to simulate a disaster:
+
[,bash]
----
docker stop redpanda-source
----
+
Envoy detects the failure within 10 to 15 seconds and routes traffic to the shadow cluster.
// (step {"action":"runShell", "command": "docker stop redpanda-source"})
// (step {"action":"wait", "duration": 30000})

. Read replicated data from shadow through Envoy:
+
[,bash]
----
docker exec python-client python3 /scripts/test-consumer.py
----
+
Consumers can read from shadow topics immediately after Envoy fails over. A sketch of a comparable consumer script appears after this list.
// (step {"action":"runShell", "command": "docker exec python-client python3 /scripts/test-consumer.py", "output": "/OK/"})

. Execute shadow failover to enable writes:
+
[,bash]
----
docker exec redpanda-shadow rpk shadow failover demo-shadow-link --all --no-confirm \
-X admin.hosts=redpanda-shadow:9644
----
+
Shadow topics are read-only until you run the failover command. This prevents split-brain scenarios where both clusters accept writes.
// (step {"action":"runShell", "command": "docker exec redpanda-shadow rpk shadow failover demo-shadow-link --all --no-confirm -X admin.hosts=redpanda-shadow:9644", "output": "/Successfully initiated/"})
// (step {"action":"wait", "duration": 5000})

. Produce new messages to the failed-over shadow cluster:
+
[,bash]
----
docker exec python-client python3 /scripts/test-producer.py
----
// (step {"action":"runShell", "command": "docker exec python-client python3 /scripts/test-producer.py", "output": "/OK/"})
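
The consumer side is symmetric. A hypothetical sketch of what `test-consumer.py` could look like follows; because Shadowing preserves offsets, the consumer needs no special handling after the failover. The real script in `./scripts` may differ.

[,python]
----
# Hypothetical sketch only -- not the lab's actual /scripts/test-consumer.py.
# Reads through Envoy at the same address used before the failover.
import json

from kafka import KafkaConsumer  # kafka-python

consumer = KafkaConsumer(
    "demo-topic",
    bootstrap_servers="envoy:9092",    # unchanged client configuration
    auto_offset_reset="earliest",
    consumer_timeout_ms=10000,         # stop iterating once no new records arrive
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

count = 0
for record in consumer:
    count += 1
    print(record.partition, record.offset, record.value)

consumer.close()
print(f"OK: consumed {count} replicated messages through Envoy")
----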

== Clean up

Stop and remove the demo environment:

[,bash]
----
docker compose down -v
----
// (step {"action":"runShell", "command": "docker compose down -v", "workingDirectory": "../docker-compose/envoy-shadowing"})

// (test end)

== What you explored

In this lab, you:

* Set up Shadowing between source and shadow clusters with offset-preserving replication
* Configured Envoy for automatic client routing based on cluster health
* Simulated a disaster by stopping the source cluster
* Verified consumers can read replicated data through Envoy immediately after failover
* Executed `rpk shadow failover` to enable writes on the shadow cluster
* Produced new messages to the failed-over cluster without client reconfiguration

The following table summarizes the role of each component in this disaster recovery setup:

|===
| Component | Role | Automatic?

| Shadowing
| Data replication with preserved offsets
| Yes

| Envoy
| Client routing to healthy cluster
| Yes

| `rpk shadow failover`
| Enable writes on shadow topics
| No (manual)
|===
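
Envoy's automatic routing can be observed directly while the lab runs: the compose file publishes Envoy's admin interface on port 9901, and its `/clusters` endpoint reports per-host health. The following is a small sketch for watching that from the host while you stop and restart `redpanda-source`; it is not part of the lab, and the cluster and host names in the output depend on `envoy-proxy/envoy.yaml`, which is not included in this diff.

[,python]
----
# Sketch: poll Envoy's admin API to watch backend health change during failover.
# Assumes the admin interface is published at localhost:9901, as in this lab's
# docker-compose.yml.
import time
import urllib.request

def health_lines():
    """Return the /clusters lines that carry a health flag for an upstream host."""
    with urllib.request.urlopen("http://localhost:9901/clusters") as resp:
        body = resp.read().decode()
    return [line for line in body.splitlines() if "health_flags" in line]

for _ in range(6):  # roughly one minute of observation
    for line in health_lines():
        print(line)
    print("---")
    time.sleep(10)
----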

== Suggested reading

ifdef::env-site[]
* xref:ROOT:manage:disaster-recovery/shadowing/index.adoc[Shadowing for Disaster Recovery]
endif::[]
ifndef::env-site[]
* https://docs.redpanda.com/current/manage/disaster-recovery/[Shadowing for Disaster Recovery^]
endif::[]
* https://www.envoyproxy.io/docs/envoy/latest/configuration/listeners/network_filters/kafka_broker_filter[Envoy Kafka Broker Filter^]
30 changes: 30 additions & 0 deletions docker-compose/envoy-shadowing/config/shadow-link.yaml
@@ -0,0 +1,30 @@
name: demo-shadow-link

# Source cluster connection
client_options:
bootstrap_servers:
- redpanda-source:9092

# Topic replication settings
topic_metadata_sync_options:
interval: 10s
auto_create_shadow_topic_filters:
- pattern_type: PREFIX
filter_type: INCLUDE
name: demo-
synced_shadow_topic_properties:
- retention.ms
- segment.ms
start_at_earliest: {}

# Consumer group offset replication
consumer_offset_sync_options:
interval: 10s
group_filters:
- pattern_type: LITERAL
filter_type: INCLUDE
name: '*'

# Schema registry replication
schema_registry_sync_options:
shadow_schema_registry_topic: {}
123 changes: 123 additions & 0 deletions docker-compose/envoy-shadowing/docker-compose.yml
@@ -0,0 +1,123 @@
# Envoy + Shadowing Combined Lab
# ===============================
#
# Combines Redpanda Shadowing (data replication) with Envoy proxy (transparent client routing)
# for production-ready disaster recovery without client reconfiguration.
#
# ARCHITECTURE:
# Clients → Envoy (9092) → Source Cluster (priority 0) / Shadow Cluster (priority 1)
# ↓ Shadowing replication ↓
# Shadow Cluster (offset-preserving replica)
#
# ENVIRONMENT VARIABLES:
# REDPANDA_VERSION - Redpanda image tag (default: v25.3.4)
# ENVOY_VERSION - Envoy image tag (default: contrib-v1.31-latest)
#
# PORTS:
# 9092 - Envoy Kafka proxy (clients connect here)
# 9901 - Envoy admin interface
# 19092 - Source cluster Kafka (direct access)
# 29092 - Shadow cluster Kafka (direct access)

services:
# Source Cluster (Primary)
redpanda-source:
image: docker.redpanda.com/redpandadata/redpanda:${REDPANDA_VERSION:-v25.3.4}
container_name: redpanda-source
hostname: redpanda-source
command:
- redpanda
- start
- --kafka-addr internal://0.0.0.0:9092,external://0.0.0.0:19092
- --advertise-kafka-addr internal://redpanda-source:9092,external://localhost:19092
- --schema-registry-addr internal://0.0.0.0:8081,external://0.0.0.0:18081
- --rpc-addr redpanda-source:33145
- --advertise-rpc-addr redpanda-source:33145
- --mode dev-container
- --smp 1
- --set redpanda.enable_shadow_linking=true
ports:
- "19092:19092"
- "18081:18081"
- "19644:9644"
networks:
- redpanda-network
healthcheck:
test: ["CMD-SHELL", "rpk cluster health | grep -E 'Healthy:.+true' || exit 1"]
interval: 10s
timeout: 5s
retries: 5
start_period: 30s

# Shadow Cluster (DR Target)
redpanda-shadow:
image: docker.redpanda.com/redpandadata/redpanda:${REDPANDA_VERSION:-v25.3.4}
container_name: redpanda-shadow
hostname: redpanda-shadow
command:
- redpanda
- start
- --kafka-addr internal://0.0.0.0:9092,external://0.0.0.0:29092
- --advertise-kafka-addr internal://redpanda-shadow:9092,external://localhost:29092
- --schema-registry-addr internal://0.0.0.0:8081,external://0.0.0.0:28081
- --rpc-addr redpanda-shadow:33145
- --advertise-rpc-addr redpanda-shadow:33145
- --mode dev-container
- --smp 1
- --set redpanda.enable_shadow_linking=true
ports:
- "29092:29092"
- "28081:28081"
- "29644:9644"
volumes:
- ./config:/config
networks:
- redpanda-network
healthcheck:
test: ["CMD-SHELL", "rpk cluster health | grep -E 'Healthy:.+true' || exit 1"]
interval: 10s
timeout: 5s
retries: 5
start_period: 30s

# Envoy Proxy for transparent client routing
# NOTE: We use the 'contrib' image variant because it includes the Kafka broker filter
# (envoy.filters.network.kafka_broker) which is not included in the standard Envoy image.
# This filter enables Envoy to understand Kafka protocol for intelligent routing.
envoy:
image: envoyproxy/envoy:${ENVOY_VERSION:-contrib-v1.31-latest}
container_name: envoy-proxy
ports:
- "9092:9092"
- "9901:9901"
volumes:
- ./envoy-proxy/envoy.yaml:/etc/envoy/envoy.yaml
networks:
- redpanda-network
depends_on:
redpanda-source:
condition: service_healthy
redpanda-shadow:
condition: service_healthy
healthcheck:
test: ["CMD-SHELL", "/bin/bash -c 'echo > /dev/tcp/localhost/9901' || exit 1"]
interval: 5s
timeout: 3s
retries: 3
start_period: 10s

# Python client for testing (kafka-python works with Envoy)
python-client:
image: python:3.11-slim
container_name: python-client
command: ["/bin/bash", "-c", "pip install kafka-python && tail -f /dev/null"]
volumes:
- ./scripts:/scripts
networks:
- redpanda-network
depends_on:
- envoy

networks:
redpanda-network:
driver: bridge
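
One way to sanity check the wiring above is to confirm that each Kafka entry point answers a metadata request. The sketch below is illustrative only and not part of the lab: it is written to run inside the `python-client` container (for example, saved under `./scripts` and run with `docker exec`), and the in-network addresses come from the service definitions in this file.

[,python]
----
# Sketch: list topics via Envoy and via each cluster's internal listener from
# inside the Docker network. Hypothetical helper, not shipped with the lab.
from kafka import KafkaConsumer  # kafka-python is installed in python-client

ENTRY_POINTS = {
    "via Envoy": "envoy:9092",
    "source (direct)": "redpanda-source:9092",
    "shadow (direct)": "redpanda-shadow:9092",
}

for label, bootstrap in ENTRY_POINTS.items():
    try:
        consumer = KafkaConsumer(bootstrap_servers=bootstrap)
        print(f"{label}: {bootstrap} -> topics {sorted(consumer.topics())}")
        consumer.close()
    except Exception as exc:  # for example, redpanda-source is stopped mid-demo
        print(f"{label}: {bootstrap} unreachable ({exc})")
----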