A complete real-time Change Data Capture (CDC) pipeline using Apache Flink, MariaDB, and Docker Compose. This project demonstrates how to build a modern streaming analytics system that processes database changes in real-time.
This project creates a database-to-database CDC pipeline that:
- Captures changes from a MariaDB
sales_records
table in real-time - Processes data using Apache Flink to calculate running totals and analytics
- Sinks results back to a MariaDB
sales_analytics
table - Provides monitoring via Flink Web UI and cAdvisor metrics
When you insert, update, or delete records in the source table, the analytics are automatically recalculated and updated within seconds.
graph LR
A[MariaDB<br/>sales_records] -->|CDC<br/>Real-time Changes| B[Apache Flink<br/>Stream Processing]
B -->|JDBC<br/>Analytics Results| C[MariaDB<br/>sales_analytics]
style A fill:#e1f5fe
style B fill:#f3e5f5
style C fill:#e8f5e8
- Apache Flink 1.19.3 - Stream processing engine with Java 17
- MariaDB 11.8 LTS - Source and sink database with binlog enabled
- MySQL CDC Connector 3.1.0 - Captures database changes
- JDBC Connector 3.2.0 - Writes results back to database
- Custom Docker Images - All dependencies built-in for reliability
- Docker and Docker Compose installed
- 8GB+ RAM recommended for Flink processing
git clone <this-repo>
cd apache_flink_and_docker_compose
docker compose up -d --build
This will start:
- 1 MariaDB container (source + sink)
- 1 Flink JobManager
- 2 Flink TaskManagers
docker ps
You should see 4 containers running:
CONTAINER ID IMAGE PORTS NAMES
abc123... apache_flink_and_docker_compose-jobmanager 0.0.0.0:8081->8081/tcp jobmanager
def456... apache_flink_and_docker_compose-taskmanager 6123/tcp, 8081/tcp taskmanager-1
ghi789... apache_flink_and_docker_compose-taskmanager 6123/tcp, 8081/tcp taskmanager-2
jkl012... mariadb:11.8 0.0.0.0:3306->3306/tcp mariadb
docker exec jobmanager /opt/flink/bin/sql-client.sh embedded -f job.sql
You should see:
[INFO] Execute statement succeed.
[INFO] Execute statement succeed.
[INFO] Execute statement succeed.
[INFO] Submitting SQL update statement to the cluster...
[INFO] SQL update statement has been successfully submitted to the cluster:
Job ID: <job-id>
Check that analytics were calculated:
docker exec mariadb mariadb -u root -prootpassword -e "USE sales_database; SELECT * FROM sales_analytics;"
Expected output:
metric_name | metric_value | calculated_at
total_sales | 5500.00 | 2025-09-06 09:29:39
docker exec mariadb mariadb -u root -prootpassword -e "
USE sales_database;
INSERT INTO sales_records (sale_id, product_id, product_name, sale_date, sale_amount)
VALUES (11, 111, 'Product K', '2023-01-11', 1100.00);"
docker exec mariadb mariadb -u root -prootpassword -e "USE sales_database; SELECT * FROM sales_analytics;"
The metric_value
should update from 5500.00
to 6600.00
within seconds!
The beauty of this CDC pipeline is that the Flink job runs continuously, monitoring the source database and updating analytics in real-time as data changes occur.
This shows:
- 1 Running Job - The CDC pipeline actively processing data
- Job Status: RUNNING - Continuously monitoring for database changes
- Duration: 14m+ - Job has been running and will continue indefinitely
- 2 Available Task Slots - Ready to process new data immediately
The job graph reveals the complete CDC pipeline:
- Source:
sales_records_table[1]
βCalc[2]
(CDC from MariaDB) - Processing:
GroupAggregate[4]
βCalc[5]
βConstraintEnforcer[6]
(Real-time aggregation) - Sink:
mariadb_analytics_sink[6]
(Write results back to database)
Key Metrics:
- Records Processed: 10 records from source table
- Status: Both source and sink tasks showing RUNNING
- Latency: Sub-second processing from database change to analytics update
- Throughput: Ready to process new records as they arrive
π‘ The job continues running in the background, automatically detecting any INSERT, UPDATE, or DELETE operations on the
sales_records
table and recalculating analytics within seconds.
- URL: http://localhost:8081
- Features: Job monitoring, metrics, task manager status
# Connect to MariaDB
docker exec -it mariadb mariadb -u root -prootpassword sales_database
# View source data
SELECT * FROM sales_records;
# View analytics
SELECT * FROM sales_analytics;
.
βββ docker-compose.yml # Container orchestration
βββ Dockerfile # Custom Flink image with all dependencies
βββ jobs/
β βββ job.sql # Flink SQL CDC job definition
βββ sql/
β βββ init.sql # MariaDB schema and sample data
β βββ mariadb.cnf # MariaDB configuration with binlog
βββ images/
β βββ flink_screenshot_1.png # Flink Web UI overview
β βββ flink_screenshot_2.png # Job execution graph
βββ README.md # This file
- MariaDB with binlog enabled captures all row changes
- Sample data includes 10 sales records totaling $5,500
- Flink MySQL CDC connector reads binlog in real-time
- Creates a streaming table
sales_records_table
- Builds a continuous query to sum
sale_amount
- Results written to
sales_analytics
table via JDBC - Updates happen automatically when source data changes
- Includes timestamp of calculation
- All JARs downloaded during Docker build
- No version conflicts or classpath issues
- Consistent environment across deployments
Edit jobs/job.sql
to add more analytics:
-- Example: Average sale amount
CREATE TEMPORARY VIEW avg_sales AS
SELECT AVG(sale_amount) AS avg_sales_amount
FROM sales_records_table;
-- Add to sink
INSERT INTO mariadb_analytics_sink
SELECT 'avg_sales' AS metric_name,
avg_sales_amount AS metric_value,
CURRENT_TIMESTAMP AS calculated_at
FROM avg_sales;
In docker-compose.yml
:
taskmanager:
# ...
deploy:
replicas: 4 # Increase for more parallelism
# Check Flink logs
docker logs jobmanager
# Verify database connectivity
docker exec mariadb mariadb -u root -prootpassword -e "SHOW DATABASES;"
# Verify binlog is enabled
docker exec mariadb mariadb -u root -prootpassword -e "SHOW VARIABLES LIKE 'log_bin';"
# Should show: log_bin | ON
- Increase Docker memory allocation (8GB+ recommended)
- Scale task managers up via
docker compose up --scale taskmanager=4
- Monitor via Flink Web UI at http://localhost:8081
- Flink Version: 1.19.3 with Java 17
- CDC Latency: Typically <1 second for simple aggregations
- Fault Tolerance: Flink checkpoints enabled for exactly-once processing
- Database Compatibility: Works with MySQL 5.6+, MariaDB 10.x+ (tested with 11.8 LTS)
Built using modern Docker practices with dependency management inspired by successful production deployments.