
Conversation

@Eliaaazzz
Contributor

[Stateful] Implement length-aware keying to minimize padding in BatchElements (Part 2/3)

Rationale

Issue: #37531 (Stateful Core - Part 2)
Part 1: #37532

This PR adds length-aware keying to BatchElements to improve batching efficiency for variable-length inputs (for example, NLP inference workloads).

Today, stateful BatchElements uses one shared key (WithSharedKey). That causes short and long sequences to be mixed in the same batch, so padding is dictated by the longest item and compute is wasted. This PR addresses that by routing elements into length buckets before stateful batching.

What changed

  1. New DoFn: WithLengthBucketKey
  • Implemented in apache_beam/transforms/util.py
  • Uses bisect-based bucket lookup
  • Routes elements into length buckets (for example, 0-16, 16-32, etc.) so similarly sized elements are batched together
  • Uses a composite key: (worker_uuid, bucket_index)
  2. API updates
  • BatchElements now accepts length_fn and bucket_boundaries
  • ModelHandler.__init__ now accepts length_fn and bucket_boundaries
  • Default boundaries: [16, 32, 64, 128, 256, 512]
  3. Stateful-path integration
  • Length-aware routing is enabled automatically on the stateful path when max_batch_duration_secs is set and length_fn is provided
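The routing step above can be sketched in a few lines. This is a hypothetical illustration, not the code merged into apache_beam/transforms/util.py; in particular, the boundary inclusion (a length of exactly 16 landing in the first bucket) follows from using bisect_left and is an assumption.

```python
import bisect

# Default boundaries stated in the PR description.
DEFAULT_BUCKET_BOUNDARIES = [16, 32, 64, 128, 256, 512]


def bucket_index(length, boundaries=DEFAULT_BUCKET_BOUNDARIES):
    """Map a sequence length to a bucket index via binary search.

    With bisect_left, lengths 0..16 map to bucket 0, 17..32 to bucket 1,
    and anything above the last boundary to bucket len(boundaries).
    """
    return bisect.bisect_left(boundaries, length)


def length_bucket_key(element, length_fn, worker_uuid,
                      boundaries=DEFAULT_BUCKET_BOUNDARIES):
    """Attach the composite (worker_uuid, bucket_index) key to an element."""
    return ((worker_uuid, bucket_index(length_fn(element), boundaries)), element)
```

Because the key pairs the per-worker UUID with the bucket index, each worker keeps separate state cells per length bucket, so short and long elements never accumulate in the same batch buffer.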

Testing and results

Added test_padding_efficiency_bimodal in util_test.py to represent a bimodal workload:

  • 500 short elements (length 5-30)
  • 500 long elements (length 200-512)

Observed result:

  • Unbucketed baseline padding efficiency: about 68%
  • Bucketed padding efficiency (this PR): about 77%
  • Improvement: about +9 percentage points

Interpretation:

  • Unbucketed path mixes short and long elements, increasing padding waste
  • Bucketed path separates short/long cohorts, reducing wasted compute and memory
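To make the efficiency numbers concrete, here is one plausible way to compute padding efficiency, i.e. the fraction of a padded batch occupied by real tokens. The actual metric in test_padding_efficiency_bimodal may be defined differently; this is a sketch under that assumption.

```python
def padding_efficiency(batches):
    """Fraction of padded slots that hold real data.

    Each batch is padded to its longest element, so its padded size is
    len(batch) * max(batch); efficiency is real tokens / padded slots.
    """
    real = sum(sum(batch) for batch in batches)
    padded = sum(len(batch) * max(batch) for batch in batches)
    return real / padded


# Mixing a short and a long cohort in one batch wastes most of the tensor;
# bucketing the same four elements recovers much of that waste.
mixed = [[5, 30, 200, 512]]
bucketed = [[5, 30], [200, 512]]
```

In this toy case the mixed batch pads everything to 512, while the bucketed layout pads the short cohort only to 30, which is exactly the effect the benchmark measures.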

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.

@gemini-code-assist
Contributor

Summary of Changes

Hello @Eliaaazzz, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant optimization to the BatchElements transform by implementing length-aware keying. This enhancement allows elements of similar lengths to be batched together, drastically reducing padding waste, especially beneficial for workloads with variable-length inputs such as NLP inference. The changes involve a new DoFn for length-based routing, API updates to configure this behavior, and automatic integration into the stateful processing path, leading to improved resource utilization and overall efficiency.

Highlights

  • Length-Aware Keying Implementation: Introduced a new DoFn, WithLengthBucketKey, which enables elements to be keyed based on their length, routing similarly sized elements to the same state partition for more efficient batching.
  • API Enhancements: The BatchElements transform and ModelHandler.__init__ now accept length_fn and bucket_boundaries parameters, allowing users to configure length-aware batching. Default bucket boundaries are provided if length_fn is set without explicit boundaries.
  • Automatic Stateful Path Integration: Length-aware routing is automatically activated on the stateful path when both max_batch_duration_secs and length_fn are provided, ensuring optimized batching for streaming contexts.
  • Improved Padding Efficiency: New benchmark tests demonstrate a significant improvement in padding efficiency, increasing from approximately 68% to 77% for bimodal length distributions, by separating short and long elements into distinct batches.
  • Robust Validation: Added validation checks for bucket_boundaries to ensure they are non-empty, sorted, positive, and are only used when length_fn is also provided.
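The validation rules in the last bullet could look roughly like this. It is an illustrative sketch only; the real checks in BatchElements.__init__ may differ in wording, order, and error messages.

```python
def validate_bucket_boundaries(bucket_boundaries, length_fn):
    """Reject invalid bucketing configurations, per the rules above."""
    if bucket_boundaries is None:
        return  # nothing to validate
    if length_fn is None:
        raise ValueError("bucket_boundaries requires length_fn to be set")
    if not bucket_boundaries:
        raise ValueError("bucket_boundaries must be non-empty")
    if any(b <= 0 for b in bucket_boundaries):
        raise ValueError("bucket_boundaries must be positive")
    if list(bucket_boundaries) != sorted(set(bucket_boundaries)):
        raise ValueError("bucket_boundaries must be strictly increasing")
```

Raising ValueError at construction time surfaces misconfiguration during pipeline building rather than at runtime, which matches the ValueError tests listed in the changelog below.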


Changelog
  • sdks/python/apache_beam/ml/inference/base.py
    • Added length_fn and bucket_boundaries parameters to the ModelHandler.__init__ method.
    • Updated docstrings to describe the new length_fn and bucket_boundaries parameters.
    • Stored the new batching parameters in _batching_kwargs for configuration.
  • sdks/python/apache_beam/ml/inference/base_test.py
    • Added test_length_fn_and_bucket_boundaries to verify correct parameter passing.
    • Added test_length_fn_only to confirm length_fn is passed without bucket_boundaries when only length_fn is provided.
  • sdks/python/apache_beam/transforms/util.py
    • Imported the bisect module for efficient bucket lookup.
    • Implemented the WithLengthBucketKey DoFn for length-based element routing to specific state partitions.
    • Modified BatchElements.__init__ to accept length_fn and bucket_boundaries.
    • Added validation logic for bucket_boundaries to ensure they are sorted, positive, and require length_fn.
    • Defined _DEFAULT_BUCKET_BOUNDARIES for length-aware batching when custom boundaries are not provided.
    • Updated BatchElements.expand to conditionally use WithLengthBucketKey for stateful, length-aware batching.
  • sdks/python/apache_beam/transforms/util_test.py
    • Imported is_not_empty for new test assertions.
    • Added test_length_bucket_assignment to verify the correct bucket indexing by WithLengthBucketKey.
    • Added test_stateful_length_aware_constant_batch to ensure elements in distinct length groups produce separate batches.
    • Added test_stateful_length_aware_default_boundaries to confirm the application of default bucket boundaries.
    • Added tests for ValueError conditions: test_length_aware_requires_length_fn, test_bucket_boundaries_must_be_sorted, and test_bucket_boundaries_must_be_positive.
    • Added test_length_fn_without_stateful_is_ignored to confirm that length_fn is ignored if stateful batching is not enabled.
    • Added test_padding_efficiency_bimodal benchmark to compare padding efficiency with and without length-aware bucketing.
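The behavior exercised by test_stateful_length_aware_constant_batch can be modeled with a small, Beam-free simulation: elements that share a composite (worker, bucket) key accumulate in the same buffer, so distinct length cohorts end up in separate batches. All names here are illustrative, not taken from the actual test.

```python
import bisect
from collections import defaultdict


def simulate_stateful_batching(lengths, boundaries, worker_uuid="worker-0"):
    """Group element lengths into buffers keyed by (worker, bucket)."""
    buffers = defaultdict(list)
    for n in lengths:
        bucket = bisect.bisect_left(boundaries, n)
        buffers[(worker_uuid, bucket)].append(n)
    return dict(buffers)
```

With boundaries [16, 32, 64, 128, 256, 512], a bimodal stream of short (5, 12) and long (200, 512) lengths produces disjoint buffers rather than one mixed batch, which is the property the test asserts on the real transform.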

@github-actions
Contributor

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

@Eliaaazzz force-pushed the feature/stateful-core-length-aware-keying branch from 9f6b1c2 to 92b546a on February 11, 2026 at 14:01
@codecov

codecov bot commented Feb 11, 2026

Codecov Report

❌ Patch coverage is 37.50000% with 20 lines in your changes missing coverage. Please review.
✅ Project coverage is 35.88%. Comparing base (195cc59) to head (92b546a).
⚠️ Report is 1 commit behind head on master.

Files with missing lines                     | Patch % | Lines
sdks/python/apache_beam/transforms/util.py   | 42.85%  | 16 Missing ⚠️
sdks/python/apache_beam/ml/inference/base.py | 0.00%   | 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##             master   #37565   +/-   ##
=========================================
  Coverage     35.88%   35.88%           
  Complexity     1691     1691           
=========================================
  Files          1063     1063           
  Lines        166721   166752   +31     
  Branches       1227     1227           
=========================================
+ Hits          59832    59844   +12     
- Misses       104694   104713   +19     
  Partials       2195     2195           
Flag   | Coverage Δ
python | 39.69% <37.50%> (-0.01%) ⬇️


@github-actions
Contributor

Assigning reviewers:

R: @tvalentyn for label python.

Note: If you would like to opt out of this review, comment assign to next reviewer.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

@Eliaaazzz force-pushed the feature/stateful-core-length-aware-keying branch 2 times, most recently from 8926e75 to 77b23a7 on February 12, 2026 at 02:06
- Add length_fn and bucket_boundaries parameters to ModelHandler.__init__
  to support length-aware bucketed keying for ML inference batching
- Add WithLengthBucketKey DoFn to route elements by length buckets
- Update BatchElements to support length-aware batching when
  max_batch_duration_secs is set, reducing padding waste for
  variable-length sequences (e.g., NLP workloads)
- Default bucket boundaries: [16, 32, 64, 128, 256, 512]
- Add comprehensive tests validating bucket assignment, mixed-length
  batching, and padding efficiency improvements (77% vs 68% on bimodal data)
- All formatting (yapf) and lint (pylint 10/10) checks passed
@Eliaaazzz force-pushed the feature/stateful-core-length-aware-keying branch from 77b23a7 to 0aa8bcb on February 12, 2026 at 05:44
@Eliaaazzz force-pushed the feature/stateful-core-length-aware-keying branch from 2f68510 to 8713eac on February 12, 2026 at 08:12