Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support finer-grained control over parallelism #57

Open
kkrugler opened this issue Apr 27, 2016 · 1 comment
Open

Support finer-grained control over parallelism #57

kkrugler opened this issue Apr 27, 2016 · 1 comment

Comments

@kkrugler
Copy link
Contributor

Currently you can only control parallelism for all sources and groups on the entire job, via:

  • flink.num.sourceTasks to specify the parallelism of source tasks.
  • flink.num.shuffleTasks to specify the parallelism of all shuffling tasks (GroupBy, CoGroup).

Though as Fabian says:

Non-shuffling operators such as Each/Map and HashJoin take the parallelism of their predecessor (for HashJoin the first input) to avoid random shuffling.
So an Each/Map or Join that immediately follows a source runs with the source parallelism.
Effectively, most operators will run with the shuffle parallelism, because Each and HashJoin pick it up once their input was shuffled.

Since the job is a single Cascading Step, you can't use the M-R approach of getting the conf for the step and setting the number of reduce tasks.

Chris Wensel recommends calling pipe.getNodeConfigDef(), as each pipe is uniquely named, and this lets the author of the workflow control how much granularity they need on setting parallelism.

Though I'm not sure what should happen if you set the explicit parallelism of a pipe such that it's in conflict with the implicit parallelism from say an upstream shuffle parallelism.

@fhueske
Copy link
Contributor

fhueske commented Apr 28, 2016

Hi Ken,
looks like pipe.getNodeConfigDef() (and the other getConfigDef() methods on the other levels) allow more fine-grained specification of task parallelism.
I'll have to allocate a bit more time to make a safe assessment and finally to implement it.

Thanks for pointing into this direction!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants