Description
What would you like to happen?
Hello and good evening;
I've been trying to reverse engineer a pure Docker deployment of a Flink cluster (and workers) for portable Python Beam. I say reverse engineer because all of the documents I've come across deploy either fully in Kubernetes or only partially in Docker (Kafka in Docker with a local Flink cluster, or similar).
For a bit more context, I'm not bothering with streaming (yet), so I'm not too worried about that. I just want to read from S3 and write to a custom DB sink. That part is pretty basic and well documented (thanks for that, btw!).
It's taken me a while, and I was getting close. The DOCKER environment type doesn't work in a Docker-deployed cluster, which makes sense. So I went the EXTERNAL route, sending jobs to a Python worker pool (-p 50000:50000 apache/beam_python3.11_sdk:latest --worker_pool). Since that worker lives in Docker and is referenced by the job server and the job manager, it cannot be addressed as localhost. And there's no evident way to change the control host.
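To make the localhost problem concrete, here is a minimal diagnostic sketch in plain Python (standard library only, not part of Beam). It parses a host:port value like the one passed to --environment_config, flags loopback hosts, and tries a TCP connection. The helper names parse_endpoint, is_loopback, and can_connect are hypothetical, and the sketch assumes it is run from inside a container on the same Docker network as the worker pool.

```python
import ipaddress
import socket


def parse_endpoint(environment_config: str) -> tuple[str, int]:
    """Split a 'host:port' string such as 'beamworker:50000'."""
    host, sep, port = environment_config.rpartition(":")
    if not sep or not host:
        raise ValueError(f"expected host:port, got {environment_config!r}")
    return host, int(port)


def is_loopback(host: str) -> bool:
    """True if host is 'localhost' or a literal loopback IP address."""
    if host.lower() == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal, e.g. a Docker Compose service name.
        return False


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Try to open a TCP connection; return False on any socket error."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, calling can_connect(*parse_endpoint("beamworker:50000")) from inside the job manager container would show whether the worker pool is reachable under that name. A True result from is_loopback means the address only resolves to the container that is asking, which is the situation described above.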
In this issue the following occurs:

When given this command, which is valid according to the SDK Harness docs and the Flink Runner docs:
python script-beam-bench.py --runner FlinkRunner --flink_master jobmanager:8081 --environment_type EXTERNAL --environment_config beamworker:50000
In the event this is actually not an issue, please let me know. I've attached my cluster configuration to this issue, as well as the pipeline I'm testing with.
portable-flink-cluster.docker-compose.yaml
Issue Priority
Priority: 2 (default / most feature requests should be filed as P2)
Issue Components
- Component: Python SDK
- Component: Java SDK
- Component: Go SDK
- Component: Typescript SDK
- Component: IO connector
- Component: Beam YAML
- Component: Beam examples
- Component: Beam playground
- Component: Beam katas
- Component: Website
- Component: Infrastructure
- Component: Spark Runner
- Component: Flink Runner
- Component: Samza Runner
- Component: Twister2 Runner
- Component: Hazelcast Jet Runner
- Component: Google Cloud Dataflow Runner