A collection of example Apache Flink streaming jobs demonstrating various data processing patterns and best practices for the Cheetah data platform.
This repository contains multiple real-world examples of Apache Flink jobs, each showcasing different streaming data processing concepts:
- AvroToJson - Convert Avro-serialized messages to JSON format
- EnrichStream - Enrich streaming data with additional information
- ExternalLookup - Perform lookups against external APIs during stream processing
- FlinkStates - Demonstrate Flink's stateful processing capabilities (ValueState, ReducingState, AggregatingState, ListState, MapState)
- JsonToAvro - Convert JSON messages to Avro format with schema registry integration
- KeySerializationSchema - Custom serialization for Kafka message keys
- MultipleSideOutput - Split streams into multiple outputs based on conditions (see the sketch after this list)
- Observability - Monitor and instrument Flink jobs with metrics and logging
- SerializationErrorCatch - Handle serialization errors gracefully
- SerializationErrorSideOutput - Route serialization errors to a separate output stream
- TransformAndStore - Transform data and store results in multiple sinks
- TumblingWindow - Implement time-based windowing operations
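To make one of these patterns concrete, here is a minimal, self-contained sketch of Flink's side-output mechanism, the technique behind MultipleSideOutput and the SerializationError* examples. It is illustrative only: the class name, tag name, and element values are invented for this sketch and are not taken from the repository.

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

public class SideOutputSketch {
    // The OutputTag is an anonymous subclass so Flink can capture its generic type.
    private static final OutputTag<String> ERRORS = new OutputTag<String>("errors") {};

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> input = env.fromElements("ok:1", "bad:2", "ok:3");

        SingleOutputStreamOperator<String> mainStream =
            input.process(new ProcessFunction<String, String>() {
                @Override
                public void processElement(String value, Context ctx, Collector<String> out) {
                    if (value.startsWith("ok")) {
                        out.collect(value);        // emitted on the main output
                    } else {
                        ctx.output(ERRORS, value); // routed to the side output
                    }
                }
            });

        // In the real jobs these would be Kafka sinks rather than print().
        mainStream.getSideOutput(ERRORS).print();
        mainStream.print();
        env.execute("side-output-sketch");
    }
}
```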
All examples require the local development infrastructure from the cheetah-development-infrastructure repository. This provides:
- Apache Kafka for message streaming
- Schema Registry for Avro schemas
- Keycloak for authentication
- Redpanda Console for Kafka inspection
Start the infrastructure:
```sh
# Clone the infrastructure repo
git clone https://github.com/trifork/cheetah-development-infrastructure
cd cheetah-development-infrastructure

# Start Kafka and related services
docker compose --profile kafka up -d
```

The Flink jobs depend on Maven packages from the Cheetah Maven Repository on GitHub. You'll need to:
- Create a GitHub Personal Access Token with the `read:packages` scope at https://github.com/settings/tokens/new
- Set the following environment variables:

```sh
export GITHUB_ACTOR=your-github-username
export GITHUB_TOKEN=your-github-token
```

Each example requires specific Kafka topics to be created before running. Refer to each job's README for the required topics and setup instructions.
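If you build with Maven directly on the host rather than through Docker Compose, the same credentials typically also need to be wired into `~/.m2/settings.xml`. The following is a minimal sketch, not this repository's verified configuration; in particular, the server id `github` is an assumption and must match the repository id declared in each example's `pom.xml`:

```xml
<!-- ~/.m2/settings.xml — hypothetical sketch -->
<settings>
  <servers>
    <server>
      <!-- Must match the repository id in the example's pom.xml;
           "github" is assumed here. -->
      <id>github</id>
      <!-- Maven resolves ${env.*} placeholders from the shell environment,
           so the variables exported above are picked up at build time. -->
      <username>${env.GITHUB_ACTOR}</username>
      <password>${env.GITHUB_TOKEN}</password>
    </server>
  </servers>
</settings>
```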
Each example is self-contained with:
- `/src` - Java source code for the Flink job
- `/ComponentTest` - .NET integration tests
- `docker-compose.yaml` - Configuration for running locally
- `README.md` - Detailed documentation and setup instructions
To run an example:
```sh
# Navigate to an example directory
cd AvroToJson

# Build and start the Flink job
docker compose up --build
```

For detailed information about each job, including specific configuration, testing procedures, and implementation details, see the README in each example's directory.
Each example includes:
- Unit tests - JUnit tests in `/src/test` (run automatically during the Maven build)
- Component tests - .NET integration tests in `/ComponentTest` that verify end-to-end behavior

Run component tests via Docker Compose or directly with `dotnet test`.
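For example, to run the component tests of one example directly with the .NET CLI (assuming the directory layout described above):

```sh
# Run the .NET component tests for the AvroToJson example
cd AvroToJson/ComponentTest
dotnet test
```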