@rescrv rescrv commented Sep 4, 2025

Description of changes

This PR will make the push_logs call initialize the log. Existing logs
are unaffected. If a log doesn't exist, a not_found error will be
returned or the response will be empty, as appropriate.

This deletes the paths that push/pull logs in the go log service.
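
Read against the diff hunks quoted later in this thread, "as appropriate" appears to differ per call. The table below is a minimal sketch of that reading, not the service's code: `Call`, `MissingLogBehavior`, and `on_missing_log` are hypothetical names introduced only for illustration.

```rust
/// Hypothetical labels for the three calls discussed in this thread.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Call {
    PushLogs,
    PullLogs,
    UpdateCollectionLogOffset,
}

/// Hypothetical summary of what each call does when no log exists yet.
#[derive(Debug, PartialEq)]
pub enum MissingLogBehavior {
    /// The write path creates the log and then appends to it.
    InitializeLog,
    /// The read path answers with zero records.
    EmptyResponse,
    /// The call is rejected with a not_found status.
    NotFound,
}

/// Decision table for a log that has not been initialized.
/// This is an interpretation of the quoted hunks, not confirmed behavior.
pub fn on_missing_log(call: Call) -> MissingLogBehavior {
    match call {
        Call::PushLogs => MissingLogBehavior::InitializeLog,
        Call::PullLogs => MissingLogBehavior::EmptyResponse,
        Call::UpdateCollectionLogOffset => MissingLogBehavior::NotFound,
    }
}
```

For example, `on_missing_log(Call::PullLogs)` yields `MissingLogBehavior::EmptyResponse`, which corresponds to the `vec![]` arm in the `pull_logs` hunk.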

Test plan

CI

Migration plan

This is part of the migration to the rust log service.

We need to plan for how to roll it out such that the rust log service
rolls before Go or the Frontends.

Observability plan

Watch staging.

Documentation Changes

N/A


propel-code-bot bot commented Sep 4, 2025

Remove Go Log Service, Finalize Full Cutover to Rust Log Service

This PR completes the deprecation and removal of the Go log service, switching all log-related responsibilities entirely to the Rust log service. All RPC handlers in the Go log server now return explicit errors indicating their deprecation, and the Rust log service is responsible for log initialization and all operations. Test files, configs, and supporting code paths connected to Go log logic are removed or updated, ensuring clean migration and preventing accidental use of the old Go code paths.

Key Changes

• All log RPC endpoints in go/pkg/log/server/server.go now return an error (Go log service doesn't support ...; migrated to Rust); Go log routing is removed.
• All forwarding/legacy migration/shim logic in the Rust log service for bridging Go and Rust log services is deleted; push_logs in Rust log service now initializes logs on demand.
• Test files specifically targeting Go log failover and migration (chromadb/test/distributed/test_log_failover.py) are removed.
• Tilt/Kubernetes and YAML config files (tilt_config.yaml, sample_configs/tilt_config.yaml) switched to point exclusively to rust-log-service, removing dual-host/threshold logic.
• Property and integration tests and proptest seeds are updated: property-based and integration tests that checked Go/Rust interop are dropped or refocused.
• Fork, failover, and property-based testing paths related to Go log or migration logic removed.
• Go property-based log server tests remove invariants related to compaction and log state that are no longer meaningful under the Rust-only architecture.

Affected Areas

• go/pkg/log/server
• rust/log-service
• chromadb/test/distributed (test_log_failover.py)
• rust/frontend sample configs
• Build/test workflows (.github/workflows/_python-tests.yml)
• Tilt/Kubernetes deployment configs
• Property/proptest test regressions
• Cross-service fork/failover handling in Python/Rust integration tests

This summary was automatically generated by @propel-code-bot


github-actions bot commented Sep 4, 2025

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of an unexpectedly high quality (Readability, Modularity, Intuitiveness)?

Comment on lines 615 to +617
 if let Err(wal3::Error::UninitializedLog) = res {
-    return self
-        .forward_update_collection_log_offset(Request::new(request))
-        .await;
+    return Err(Status::not_found(format!(
+        "collection {collection_id} not found"

[**BestPractice**]

**Error handling inconsistency**: The function now returns `Err(Status::not_found(...))` for uninitialized logs, but callers may expect different error semantics. This breaking change could cause issues if other parts of the system rely on the previous forwarding behavior.

Consider:
```rust
// Add clear documentation about the new behavior
// Or provide a migration path for existing callers
if let Err(wal3::Error::UninitializedLog) = res {
    tracing::warn!("Collection {} not found in rust log service", collection_id);
    return Err(Status::not_found(format!(
        "collection {collection_id} not found"
    )));
}
```

File: rust/log-service/src/lib.rs
Line: 617

Tiltfile Outdated
@@ -259,7 +259,6 @@ k8s_resource('postgres', resource_deps=['k8s_setup'], labels=["infrastructure"],
 # Jobs are suffixed with the image tag to ensure they are unique. In this context, the image tag is defined in k8s/distributed-chroma/values.yaml.
 k8s_resource('sysdb-migration-latest', resource_deps=['postgres'], labels=["infrastructure"])
 k8s_resource('logservice-migration-latest', resource_deps=['postgres'], labels=["infrastructure"])
-k8s_resource('logservice', resource_deps=['sysdb-migration-latest'], labels=["chroma"], port_forwards='50052:50051')
 k8s_resource('rust-log-service', labels=["chroma"], port_forwards='50054:50051')
 k8s_resource('sysdb', resource_deps=['sysdb-migration-latest'], labels=["chroma"], port_forwards='50051:50051')
 k8s_resource('rust-frontend-service', resource_deps=['sysdb', 'logservice', 'rust-log-service'], labels=["chroma"], port_forwards='8000:8000')

[**CriticalError**]

The `logservice` resource is removed in this PR, but it's still listed as a dependency for `rust-frontend-service`. This will likely cause Tilt to fail. The dependency should be removed to prevent this.

```suggestion
k8s_resource('rust-frontend-service', resource_deps=['sysdb', 'rust-log-service'], labels=["chroma"], port_forwards='8000:8000')
```


File: Tiltfile
Line: 264
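
A check of this kind can be mechanized. The sketch below is an illustration only and is not part of Tilt or this repository: `dangling_deps` is a hypothetical helper that takes each resource name with its `resource_deps` and reports every dependency that names no declared resource.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: returns (resource, missing_dep) pairs for every
/// dependency that does not name a declared resource.
pub fn dangling_deps<'a>(
    resources: &[(&'a str, Vec<&'a str>)],
) -> Vec<(&'a str, &'a str)> {
    // Collect the names of all declared resources first.
    let declared: BTreeSet<&str> = resources.iter().map(|(name, _)| *name).collect();

    // Then walk every dependency edge and keep the ones with no target.
    let mut missing = Vec::new();
    for (name, deps) in resources {
        for dep in deps {
            if !declared.contains(dep) {
                missing.push((*name, *dep));
            }
        }
    }
    missing
}
```

Fed a trimmed version of the resources above with `logservice` no longer declared, it reports the pair `("rust-frontend-service", "logservice")`.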

-Err(wal3::Error::UninitializedLog) => {
-    return self.forward_pull_logs(Request::new(pull_logs)).await;
-}
+Err(wal3::Error::UninitializedLog) => vec![],

[**BestPractice**]

This change in error handling, from forwarding the request to returning empty results, could mask legitimate errors. Consider whether returning an empty vector for `UninitializedLog` is the correct behavior, as it might make it difficult for clients to distinguish between "no records exist" and "collection doesn't exist".

```rust
// Consider returning a more explicit error or status
Err(wal3::Error::UninitializedLog) => vec![], // This might hide collection existence issues
```

File: rust/log-service/src/lib.rs
Line: 1246
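
One way to keep the two cases apart is to carry the distinction in the type and flatten it only at the edge. This is a sketch under assumptions: `ToyWalError` stands in for `wal3::Error` (only the variant named above is modeled), and `PullOutcome`, `classify_pull`, and `into_records` are hypothetical names, not part of the log service.

```rust
/// Stand-in for wal3::Error; only the variant discussed here is modeled.
#[derive(Debug, PartialEq)]
pub enum ToyWalError {
    UninitializedLog,
    Other(String),
}

/// Hypothetical result type that keeps "empty" and "missing" distinct.
#[derive(Debug, PartialEq)]
pub enum PullOutcome {
    /// The log exists; it may hold zero records.
    Records(Vec<String>),
    /// No log was ever initialized for this collection.
    Uninitialized,
}

/// Classify a read result without collapsing the two cases.
pub fn classify_pull(res: Result<Vec<String>, ToyWalError>) -> Result<PullOutcome, String> {
    match res {
        Ok(records) => Ok(PullOutcome::Records(records)),
        Err(ToyWalError::UninitializedLog) => Ok(PullOutcome::Uninitialized),
        Err(ToyWalError::Other(msg)) => Err(msg),
    }
}

/// Flatten to a plain vector, as the merged arm does with `vec![]`.
pub fn into_records(outcome: PullOutcome) -> Vec<String> {
    match outcome {
        PullOutcome::Records(records) => records,
        PullOutcome::Uninitialized => Vec::new(),
    }
}
```

A handler built this way could log or count the `Uninitialized` case before calling `into_records`, so the wire response stays empty while the distinction remains observable on the server.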

@@ -248,7 +248,7 @@ async fn get_log_from_handle_with_mutex_held<'a>(
 _phantom: std::marker::PhantomData,
 });
 }
-let opened = LogWriter::open(
+let opened = LogWriter::open_or_initialize(

[**CriticalError**]

Using `LogWriter::open_or_initialize` instead of `LogWriter::open` changes initialization behavior. If this replaces logs that were migrated from another service, this could create new empty logs instead of properly opening existing ones. Consider verifying this won't cause data loss for migrated collections.

```rust
// Verify the log exists before initializing a new one
let opened = match LogWriter::open(options.clone(), Arc::clone(storage), prefix, mark_dirty).await {
    Ok(writer) => writer,
    Err(wal3::Error::UninitializedLog) => {
        return Err(Status::not_found(format!("collection {collection_id} not found")));
    }
    Err(err) => return Err(Status::unknown(err.to_string())),
};
```

File: rust/log-service/src/lib.rs
Line: 251
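
The concern hinges on what `open_or_initialize` does when a log already exists. The toy model below assumes it initializes only when nothing is there, which the name suggests but this thread does not confirm; `ToyStorage` and its methods are hypothetical stand-ins backed by a `HashMap`, not the wal3 API.

```rust
use std::collections::HashMap;

/// Stand-in error; mirrors only the variant named in this review.
#[derive(Debug, PartialEq)]
pub enum ToyError {
    UninitializedLog,
}

/// Toy storage: one record vector per log prefix.
#[derive(Default)]
pub struct ToyStorage {
    logs: HashMap<String, Vec<String>>,
}

impl ToyStorage {
    /// Strict open: fails when the log was never initialized.
    pub fn open(&mut self, prefix: &str) -> Result<&mut Vec<String>, ToyError> {
        self.logs.get_mut(prefix).ok_or(ToyError::UninitializedLog)
    }

    /// Assumed semantics: create an empty log only if none exists,
    /// otherwise return the existing log untouched.
    pub fn open_or_initialize(&mut self, prefix: &str) -> &mut Vec<String> {
        self.logs.entry(prefix.to_string()).or_default()
    }
}
```

Under that assumption an existing log keeps its records across repeated `open_or_initialize` calls. Whether the real implementation behaves the same way for migrated collections is the question the review comment raises.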

@@ -1574,19 +1327,6 @@ impl LogServer {
Arc::clone(&storage),

[**CriticalError**]

The `ForkLogs` method removes the check for uninitialized logs and proceeds with copying. This could cause issues if the source collection doesn't exist. The copy operation might create an empty target log even when the source is invalid.

```rust
// Add validation for source log existence
if let Err(err) = log_reader.next_write_timestamp().await {
    match err {
        wal3::Error::UninitializedLog => {
            return Err(Status::not_found(format!(
                "source collection {source_collection_id} not found"
            )));
        }
        _ => {
            return Err(Status::new(
                err.code().into(),
                format!("Failed to load log: {}", err),
            ));
        }
    }
}
```

File: rust/log-service/src/lib.rs
Line: 1327
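
The shape of the suggested check can be shown without wal3. In this sketch a `HashMap` stands in for storage, and `fork_log` and `ForkError` are hypothetical names; it only illustrates validating the source before creating the target.

```rust
use std::collections::HashMap;

/// Hypothetical error for the toy fork operation.
#[derive(Debug, PartialEq)]
pub enum ForkError {
    SourceNotFound(String),
}

/// Copy the source log to the target, refusing to create a target
/// when the source log does not exist. Returns the number of records copied.
pub fn fork_log(
    logs: &mut HashMap<String, Vec<String>>,
    source: &str,
    target: &str,
) -> Result<usize, ForkError> {
    // Validate the source first; nothing is written on failure.
    let records = logs.get(source).cloned().ok_or_else(|| {
        ForkError::SourceNotFound(format!("source collection {source} not found"))
    })?;
    let copied = records.len();
    logs.insert(target.to_string(), records);
    Ok(copied)
}
```

Because the lookup happens before the insert, a fork from a missing source leaves the map unchanged instead of producing an empty target.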

Comment on lines 1079 to 1083
 Err(wal3::Error::UninitializedLog) => {
-    tracing::info!("forwarding because log uninitialized");
-    return self
-        .forward_push_logs(collection_id, Request::new(push_logs))
-        .await;
+    return Err(Status::not_found(format!(
+        "collection {collection_id} not found"
+    )));
 }

[**BestPractice**]

With the change from `LogWriter::open` to `LogWriter::open_or_initialize` (line 251), it seems that `get_log_from_handle` will now create the log if it doesn't exist, and should no longer return `wal3::Error::UninitializedLog`. If that's correct, this match arm might be dead code.

This also seems to contradict a part of the PR description which says for `push_logs` "If a log doesn't exist, a not_found error will be returned". The code change to `open_or_initialize` seems to implement "This PR will make the push_logs call initialize the log." instead, which feels more correct for a write operation.

Could you clarify if this error path is still reachable? If not, removing this arm would make the new behavior clearer.

File: rust/log-service/src/lib.rs
Line: 1083

@blacksmith-sh blacksmith-sh bot deleted a comment from rescrv Sep 5, 2025
@rescrv rescrv merged commit 4de20e4 into main Sep 5, 2025
97 of 115 checks passed
chroma-droid pushed a commit that referenced this pull request Sep 5, 2025
rescrv added a commit that referenced this pull request Sep 5, 2025
This PR cherry-picks the commit 4de20e4
onto rc/2025-09-05. If there are unresolved conflicts, please resolve
them manually.

Co-authored-by: Robert Escriva <[email protected]>