Skip to content

Fix BestEffort race condition and Fallback safety#2

Open
wilbeibi wants to merge 3 commits intomainfrom
fix-best-effort-race-condition-4609469893830213266
Open

Fix BestEffort race condition and Fallback safety#2
wilbeibi wants to merge 3 commits intomainfrom
fix-best-effort-race-condition-4609469893830213266

Conversation

@wilbeibi
Copy link
Owner

Fixed race condition in ActBestEffort for PutObject by splitting the body stream properly and starting operations concurrently. Prevented data corruption in ActFallback for large streams by explicitly failing fast. Improved tests.


PR created automatically by Jules for task 4609469893830213266 started by @wilbeibi

…n in Fallback

This commit addresses several critical issues with the `BestEffort` and `Fallback` routing strategies for `PutObject` and `UploadPart` operations:

1.  **BestEffort Race Condition & Deadlock**:
    -   Previously, `ActBestEffort` used a shared `io.Reader` for both primary and secondary uploads, leading to a race condition where one upload would consume the stream, causing the other to fail or upload partial data.
    -   Also fixed a deadlock in `doParallel` where the primary operation would block waiting for the secondary operation to start consuming the piped stream, but the secondary operation was only started *after* the primary finished. The secondary operation is now started concurrently.
    -   Introduced `tolerantWriter` in `router.go` and updated `teeBody` to support a tolerant mode where write errors to the secondary stream are ignored (for `ActBestEffort`), allowing the primary stream to proceed uninterrupted.

2.  **Fallback Safety**:
    -   `ActFallback` for large streaming bodies is unsafe because the stream cannot be rewound after a failed primary attempt. This would result in an empty body being sent to the secondary.
    -   Added explicit checks in `ops.go` and `multipart_ops.go` to return an error if `ActFallback` is used with a large or unknown-length stream, preventing silent data corruption.
    -   Small bodies (buffered via `drainBody`) continue to support fallback correctly.

3.  **Test Improvements**:
    -   Added `router_ops_test.go` with integration tests for `ActBestEffort` (concurrent write verification) and `ActFallback` (small body retry).
    -   Fixed test stability by using `sync.WaitGroup` instead of `time.Sleep`.
    -   Updated `router_test.go` to match new `teeBody` signature.

Co-authored-by: wilbeibi <659203+wilbeibi@users.noreply.github.com>
@google-labs-jules
Copy link

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

google-labs-jules bot and others added 2 commits February 14, 2026 19:15
The CI workflow was failing because:
1.  `enable-cache: true` was set for `astral-sh/setup-uv`, but no `uv.lock` file exists in the repository, causing a failure.
2.  `python-version` was provided as an input, but `astral-sh/setup-uv` does not support this input.

This change removes the invalid configuration to allow the workflow to proceed. `uv run` will handle Python dependencies as specified in `scripts/validate_config.py`.

Co-authored-by: wilbeibi <659203+wilbeibi@users.noreply.github.com>
This commit addresses several critical issues with the `BestEffort` and `Fallback` routing strategies for `PutObject` and `UploadPart` operations:

1.  **BestEffort Race Condition & Deadlock**:
    -   Previously, `ActBestEffort` used a shared `io.Reader` for both primary and secondary uploads, leading to a race condition where one upload would consume the stream, causing the other to fail or upload partial data.
    -   Also fixed a deadlock in `doParallel` where the primary operation would block waiting for the secondary operation to start consuming the piped stream, but the secondary operation was only started *after* the primary finished. The secondary operation is now started concurrently.
    -   Introduced `tolerantWriter` in `router.go` and updated `teeBody` to support a tolerant mode where write errors to the secondary stream are ignored (for `ActBestEffort`), allowing the primary stream to proceed uninterrupted.

2.  **Fallback Safety**:
    -   `ActFallback` for large streaming bodies is unsafe because the stream cannot be rewound after a failed primary attempt. This would result in an empty body being sent to the secondary.
    -   Added explicit checks in `ops.go` and `multipart_ops.go` to return an error if `ActFallback` is used with a large or unknown-length stream, preventing silent data corruption.
    -   Small bodies (buffered via `drainBody`) continue to support fallback correctly.

3.  **Test Improvements**:
    -   Added `router_ops_test.go` with integration tests for `ActBestEffort` (concurrent write verification) and `ActFallback` (small body retry).
    -   Fixed test stability by using `sync.WaitGroup` instead of `time.Sleep`.
    -   Updated `router_test.go` to match new `teeBody` signature.

4.  **CI Fix**:
    -   Removed invalid `enable-cache` and `python-version` inputs from the `astral-sh/setup-uv` step in `.github/workflows/go.yml` to fix CI failures.

Co-authored-by: wilbeibi <659203+wilbeibi@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant