
Conversation

@Eliaaazzz
Contributor

[Stateful] Implement length-aware keying to minimize padding in BatchElements (Part 2/3)

Rationale

Issue: #37531 (Stateful Core - Part 2)
Part 1: #37532

This PR adds length-aware keying to BatchElements to improve batching efficiency for variable-length inputs (for example, NLP inference workloads).

Today, stateful BatchElements uses one shared key (WithSharedKey). That causes short and long sequences to be mixed in the same batch, so padding is dictated by the longest item and compute is wasted. This PR addresses that by routing elements into length buckets before stateful batching.

What changed

  1. New DoFn: WithLengthBucketKey
  • Implemented in apache_beam/transforms/util.py
  • Uses bisect-based bucket lookup
  • Routes elements into length buckets (for example, 0-16, 16-32, etc.) so similarly sized elements are batched together
  • Uses a composite key: (worker_uuid, bucket_index)
  2. API updates
  • BatchElements now accepts length_fn and bucket_boundaries
  • ModelHandler.__init__ now accepts length_fn and bucket_boundaries
  • Default boundaries: [16, 32, 64, 128, 256, 512]
  3. Stateful-path integration
  • Length-aware routing is enabled automatically on the stateful path when max_batch_duration_secs is set and length_fn is provided
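The routing step above can be sketched in a few lines. This is a hypothetical illustration, not the code merged into apache_beam/transforms/util.py; in particular, the boundary inclusion (a length of exactly 16 landing in the first bucket) follows from using bisect_left and is an assumption.

```python
import bisect

# Default boundaries stated in the PR description.
DEFAULT_BUCKET_BOUNDARIES = [16, 32, 64, 128, 256, 512]


def bucket_index(length, boundaries=DEFAULT_BUCKET_BOUNDARIES):
    """Map a sequence length to a bucket index via binary search.

    With bisect_left, lengths 0..16 map to bucket 0, 17..32 to bucket 1,
    and anything above the last boundary to bucket len(boundaries).
    """
    return bisect.bisect_left(boundaries, length)


def length_bucket_key(element, length_fn, worker_uuid,
                      boundaries=DEFAULT_BUCKET_BOUNDARIES):
    """Attach the composite (worker_uuid, bucket_index) key to an element."""
    return ((worker_uuid, bucket_index(length_fn(element), boundaries)), element)
```

Because the key pairs the per-worker UUID with the bucket index, each worker keeps separate state cells per length bucket, so short and long elements never accumulate in the same batch buffer.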

Testing and results

Added test_padding_efficiency_bimodal in util_test.py to represent a bimodal workload:

  • 500 short elements (length 5-30)
  • 500 long elements (length 200-512)

Observed result:

  • Unbucketed baseline padding efficiency: about 68%
  • Bucketed padding efficiency (this PR): about 77%
  • Improvement: about +9 percentage points

Interpretation:

  • Unbucketed path mixes short and long elements, increasing padding waste
  • Bucketed path separates short/long cohorts, reducing wasted compute and memory
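To make the efficiency numbers concrete, here is one plausible way to compute padding efficiency, i.e. the fraction of a padded batch occupied by real tokens. The actual metric in test_padding_efficiency_bimodal may be defined differently; this is a sketch under that assumption.

```python
def padding_efficiency(batches):
    """Fraction of padded slots that hold real data.

    Each batch is padded to its longest element, so its padded size is
    len(batch) * max(batch); efficiency is real tokens / padded slots.
    """
    real = sum(sum(batch) for batch in batches)
    padded = sum(len(batch) * max(batch) for batch in batches)
    return real / padded


# Mixing a short and a long cohort in one batch wastes most of the tensor;
# bucketing the same four elements recovers much of that waste.
mixed = [[5, 30, 200, 512]]
bucketed = [[5, 30], [200, 512]]
```

In this toy case the mixed batch pads everything to 512, while the bucketed layout pads the short cohort only to 30, which is exactly the effect the benchmark measures.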

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.

@gemini-code-assist
Contributor

Summary of Changes

Hello @Eliaaazzz, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant optimization to the BatchElements transform by implementing length-aware keying. This enhancement allows elements of similar lengths to be batched together, drastically reducing padding waste, especially beneficial for workloads with variable-length inputs such as NLP inference. The changes involve a new DoFn for length-based routing, API updates to configure this behavior, and automatic integration into the stateful processing path, leading to improved resource utilization and overall efficiency.

Highlights

  • Length-Aware Keying Implementation: Introduced a new DoFn, WithLengthBucketKey, which enables elements to be keyed based on their length, routing similarly sized elements to the same state partition for more efficient batching.
  • API Enhancements: The BatchElements transform and ModelHandler.__init__ now accept length_fn and bucket_boundaries parameters, allowing users to configure length-aware batching. Default bucket boundaries are provided if length_fn is set without explicit boundaries.
  • Automatic Stateful Path Integration: Length-aware routing is automatically activated on the stateful path when both max_batch_duration_secs and length_fn are provided, ensuring optimized batching for streaming contexts.
  • Improved Padding Efficiency: New benchmark tests demonstrate a significant improvement in padding efficiency, increasing from approximately 68% to 77% for bimodal length distributions, by separating short and long elements into distinct batches.
  • Robust Validation: Added validation checks for bucket_boundaries to ensure they are non-empty, sorted, positive, and are only used when length_fn is also provided.
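The validation rules in the last bullet could look roughly like this. It is an illustrative sketch only; the real checks in BatchElements.__init__ may differ in wording, order, and error messages.

```python
def validate_bucket_boundaries(bucket_boundaries, length_fn):
    """Reject invalid bucketing configurations, per the rules above."""
    if bucket_boundaries is None:
        return  # nothing to validate
    if length_fn is None:
        raise ValueError("bucket_boundaries requires length_fn to be set")
    if not bucket_boundaries:
        raise ValueError("bucket_boundaries must be non-empty")
    if any(b <= 0 for b in bucket_boundaries):
        raise ValueError("bucket_boundaries must be positive")
    if list(bucket_boundaries) != sorted(set(bucket_boundaries)):
        raise ValueError("bucket_boundaries must be strictly increasing")
```

Raising ValueError at construction time surfaces misconfiguration during pipeline building rather than at runtime, which matches the ValueError tests listed in the changelog below.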


Changelog
  • sdks/python/apache_beam/ml/inference/base.py
    • Added length_fn and bucket_boundaries parameters to the ModelHandler.__init__ method.
    • Updated docstrings to describe the new length_fn and bucket_boundaries parameters.
    • Stored the new batching parameters in _batching_kwargs for configuration.
  • sdks/python/apache_beam/ml/inference/base_test.py
    • Added test_length_fn_and_bucket_boundaries to verify correct parameter passing.
    • Added test_length_fn_only to confirm length_fn is passed without bucket_boundaries when only length_fn is provided.
  • sdks/python/apache_beam/transforms/util.py
    • Imported the bisect module for efficient bucket lookup.
    • Implemented the WithLengthBucketKey DoFn for length-based element routing to specific state partitions.
    • Modified BatchElements.__init__ to accept length_fn and bucket_boundaries.
    • Added validation logic for bucket_boundaries to ensure they are sorted, positive, and require length_fn.
    • Defined _DEFAULT_BUCKET_BOUNDARIES for length-aware batching when custom boundaries are not provided.
    • Updated BatchElements.expand to conditionally use WithLengthBucketKey for stateful, length-aware batching.
  • sdks/python/apache_beam/transforms/util_test.py
    • Imported is_not_empty for new test assertions.
    • Added test_length_bucket_assignment to verify the correct bucket indexing by WithLengthBucketKey.
    • Added test_stateful_length_aware_constant_batch to ensure elements in distinct length groups produce separate batches.
    • Added test_stateful_length_aware_default_boundaries to confirm the application of default bucket boundaries.
    • Added tests for ValueError conditions: test_length_aware_requires_length_fn, test_bucket_boundaries_must_be_sorted, and test_bucket_boundaries_must_be_positive.
    • Added test_length_fn_without_stateful_is_ignored to confirm that length_fn is ignored if stateful batching is not enabled.
    • Added test_padding_efficiency_bimodal benchmark to compare padding efficiency with and without length-aware bucketing.
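The behavior exercised by test_stateful_length_aware_constant_batch can be modeled with a small, Beam-free simulation: elements that share a composite (worker, bucket) key accumulate in the same buffer, so distinct length cohorts end up in separate batches. All names here are illustrative, not taken from the actual test.

```python
import bisect
from collections import defaultdict


def simulate_stateful_batching(lengths, boundaries, worker_uuid="worker-0"):
    """Group element lengths into buffers keyed by (worker, bucket)."""
    buffers = defaultdict(list)
    for n in lengths:
        bucket = bisect.bisect_left(boundaries, n)
        buffers[(worker_uuid, bucket)].append(n)
    return dict(buffers)
```

With boundaries [16, 32, 64, 128, 256, 512], a bimodal stream of short (5, 12) and long (200, 512) lengths produces disjoint buffers rather than one mixed batch, which is the property the test asserts on the real transform.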

@github-actions
Contributor

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

@Eliaaazzz force-pushed the feature/stateful-core-length-aware-keying branch from 9f6b1c2 to 92b546a on February 11, 2026 at 14:01
@codecov

codecov bot commented Feb 11, 2026

Codecov Report

❌ Patch coverage is 37.50000% with 20 lines in your changes missing coverage. Please review.
✅ Project coverage is 35.88%. Comparing base (195cc59) to head (92b546a).
⚠️ Report is 1 commit behind head on master.

Files with missing lines                     | Patch % | Lines
sdks/python/apache_beam/transforms/util.py   | 42.85%  | 16 Missing ⚠️
sdks/python/apache_beam/ml/inference/base.py | 0.00%   | 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##             master   #37565   +/-   ##
=========================================
  Coverage     35.88%   35.88%           
  Complexity     1691     1691           
=========================================
  Files          1063     1063           
  Lines        166721   166752   +31     
  Branches       1227     1227           
=========================================
+ Hits          59832    59844   +12     
- Misses       104694   104713   +19     
  Partials       2195     2195           
Flag   | Coverage Δ
python | 39.69% <37.50%> (-0.01%) ⬇️


@github-actions
Contributor

Assigning reviewers:

R: @tvalentyn for label python.

Note: If you would like to opt out of this review, comment assign to next reviewer.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

@Eliaaazzz force-pushed the feature/stateful-core-length-aware-keying branch 2 times, most recently from 8926e75 to 77b23a7 on February 12, 2026 at 02:06
- Add length_fn and bucket_boundaries parameters to ModelHandler.__init__
  to support length-aware bucketed keying for ML inference batching
- Add WithLengthBucketKey DoFn to route elements by length buckets
- Update BatchElements to support length-aware batching when
  max_batch_duration_secs is set, reducing padding waste for
  variable-length sequences (e.g., NLP workloads)
- Default bucket boundaries: [16, 32, 64, 128, 256, 512]
- Add comprehensive tests validating bucket assignment, mixed-length
  batching, and padding efficiency improvements (77% vs 68% on bimodal data)
- All formatting (yapf) and lint (pylint 10/10) checks passed
@Eliaaazzz force-pushed the feature/stateful-core-length-aware-keying branch from 77b23a7 to 0aa8bcb on February 12, 2026 at 05:44
@Eliaaazzz force-pushed the feature/stateful-core-length-aware-keying branch from 2f68510 to 8713eac on February 12, 2026 at 08:12