
[Feature Request]: Worker Control Host Naming argument/parameter #35943

@EDigioacchinoTBL

What would you like to happen?

Hello and good evening;

I've been trying to reverse engineer a pure Docker deployment of the Flink cluster (and workers) for portable Python Beam. I say reverse engineer because all of the documentation I've come across deploys either fully in Kubernetes or only partially in Docker (Kafka in Docker with a local Flink cluster, or similar).

For a bit more context, I'm not bothering with streaming (yet), so I'm not too worried there. I just want to read from S3 and write to a custom DB sink. Pretty basic and well documented (thanks for that, btw!).

It's taken me a while, and I was getting close. Using the DOCKER environment type doesn't work in a Docker-deployed cluster, which makes sense. So I went the EXTERNAL route, sending jobs to a Python worker pool. That worker (-p 50000:50000 apache/beam_python3.11_sdk:latest --worker_pool) lives in its own container and is reachable from the job server and the job manager, so it cannot be addressed as localhost. There is no evident way to change the control host:

https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/fn_api_runner/worker_handlers.py#L629
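
To illustrate why localhost is a problem in this setup: inside a container, localhost resolves to that container itself, so any address one container hands to another has to be a name the other container can resolve. Below is a minimal sketch of that check using only the standard library. The helper names `parse_endpoint` and `is_loopback_host` are hypothetical and are not part of Beam.

```python
import ipaddress


def parse_endpoint(endpoint: str) -> tuple[str, int]:
    """Split a 'host:port' string into (host, port)."""
    host, _, port = endpoint.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {endpoint!r}")
    return host, int(port)


def is_loopback_host(host: str) -> bool:
    """Return True for 'localhost' or a literal loopback IP address."""
    if host.lower() == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal, e.g. a Docker Compose service name.
        return False


# 'beamworker' is the service name used in this issue; the localhost
# entry is a hypothetical control address for comparison.
for endpoint in ("beamworker:50000", "localhost:50000"):
    host, port = parse_endpoint(endpoint)
    status = "loopback only" if is_loopback_host(host) else "usable across containers"
    print(f"{host}:{port} is {status}")
```

A control host that fails this check can only be reached from inside the container that advertised it, which is the situation described above.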

With this setup, the following occurs (see the image attached to the original issue):

This happens when running the following command, which is valid according to the SDK Harness docs and the Flink Runner docs:
python script-beam-bench.py --runner FlinkRunner --flink_master jobmanager:8081 --environment_type EXTERNAL --environment_config beamworker:50000
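
For reference, the same flags can be assembled programmatically. This is only a sketch: `build_beam_args` is a hypothetical helper, and the host names are the Compose service names used in this issue.

```python
def build_beam_args(flink_master: str, worker_pool: str) -> list[str]:
    """Build the pipeline flags for an EXTERNAL worker pool on Flink."""
    return [
        "--runner", "FlinkRunner",
        "--flink_master", flink_master,
        "--environment_type", "EXTERNAL",
        "--environment_config", worker_pool,
    ]


# Service names as they appear in the command above.
args = build_beam_args("jobmanager:8081", "beamworker:50000")
command = " ".join(["python", "script-beam-bench.py", *args])
print(command)
```

If `apache_beam` is installed, the `args` list should be usable as the `flags` argument of `PipelineOptions`. Note that none of these flags sets the control host, which is the gap this request is about.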

If this turns out not to be an actual issue, please let me know. I've attached my cluster definition to this issue, as well as the pipeline I'm testing with.

script-beam-bench.py

portable-flink-cluster.docker-compose.yaml

Issue Priority

Priority: 2 (default / most feature requests should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
