Skip to content

Conversation

@jackieismpc
Copy link
Contributor

Summary:
This PR refactors the underlying storage module of Libra, introducing a multi-cloud adapter based on object_store and an intelligent tiered storage architecture. By using a "Synchronous Facade" pattern to encapsulate underlying asynchronous I/O, it achieves seamless support for S3/R2 without breaking existing business logic.

Key Changes:

  1. Dependencies:

    • Added object_store, async-trait, lru-mem, and tokio.
  2. Storage Architecture Refactor (src/utils/storage/):

    • StorageTrait: Defined standard async storage interfaces (get, put, exist, search).

    • LocalStorage: Migrated and optimized existing file system storage logic, resolving HashKind context loss issues in multi-threaded environments.

    • RemoteStorage: Implemented object access based on S3/R2, handling Git object compression and header encapsulation.

    • TieredStorage: Implemented tiered routing logic and LRU cache management.

  3. Client Adaptation (src/utils/client_storage.rs):

    • Refactored ClientStorage to maintain an internal tokio::runtime::Runtime, bridging synchronous API calls to underlying asynchronous implementations.

    • Supported dynamic storage mode switching via environment variables (LIBRA_STORAGE_TYPE, LIBRA_STORAGE_BUCKET, etc.).

  4. Tests:

Signed-off-by: jackie <[email protected]>
Signed-off-by: jackieismpc <[email protected]>
@genedna
Copy link
Member

genedna commented Jan 18, 2026

@codex review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6b7a7301b8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces an async-aware, pluggable storage architecture with a new Storage trait, local/remote/tiered backends, and R2/S3 support via object_store, while keeping the existing synchronous ClientStorage API by bridging through an internal Tokio runtime. It also wires in the necessary Buck/Cargo dependencies and adds tests for the new storage behavior, including in-memory and Cloudflare R2 integration tests.

Changes:

  • Added Storage trait plus LocalStorage, RemoteStorage (via object_store), and TieredStorage (local+remote with size-based LRU caching) under src/utils/storage/.
  • Refactored ClientStorage to delegate to an async Storage backend through a dedicated Tokio runtime and environment-driven backend selection (LIBRA_STORAGE_*).
  • Extended build metadata (Cargo, BUCK, third-party crates) and added tests for remote/tiered behavior and R2 integration, plus a small fix to index_pack temp-dir usage.

Reviewed changes

Copilot reviewed 15 out of 16 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
third-party/rust/crates/ring/0.17.14/BUCK Enables the std feature for ring to satisfy TLS requirements from object_store.
third-party/rust/crates/quick-xml/0.38.4/BUCK Adds Buck rule for quick-xml, used transitively by object_store.
third-party/rust/crates/object_store/0.12.4/BUCK Adds Buck rule for object_store with cloud/back-end features needed for S3/R2-style storage.
third-party/rust/crates/humantime/2.3.0/BUCK Adds Buck rule for humantime, a dependency of object_store.
tests/storage_r2_test.rs New tests exercising RemoteStorage with in-memory backend, TieredStorage size-based behavior, and (ignored-by-default) Cloudflare R2 integration using ClientStorage.
src/utils/storage/tiered.rs Implements TieredStorage that routes reads/writes between LocalStorage and RemoteStorage, caching large objects locally in an LRU that deletes files on eviction.
src/utils/storage/remote.rs Implements RemoteStorage using object_store::ObjectStore, handling Git object header construction, zlib compression, and path mapping.
src/utils/storage/mod.rs Defines the async Storage trait (get/put/exist/search) and re-exports the new storage backends.
src/utils/storage/local.rs Refactors and encapsulates the previous on-disk Git object storage logic (loose + pack, including index building and caches) behind the Storage trait, adding hash-kind capture for multi-threaded contexts.
src/utils/mod.rs Exposes the new storage module from utils.
src/utils/client_storage.rs Reworks ClientStorage into a thin synchronous facade over async Storage implementations, introduces a dedicated Tokio runtime, environment-based backend selection, and adjusts tests accordingly.
src/command/index_pack.rs Changes pack index v2 building to use the system temp directory for Pack scratch space to avoid DirectoryNotEmpty errors during cleanup.
buckal.snap Updates Buckal fingerprints to account for new dependencies (object_store, quick-xml, humantime, modified ring features).
Cargo.toml Adds object_store and async-trait dependencies to the libra crate.
Cargo.lock Locks in versions for object_store, quick-xml, humantime, and updates dependency graph accordingly.
BUCK Wires async-trait and object_store into the main binary/library/test targets and adds a Buck target for the new storage_r2_test.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Signed-off-by: jackie <[email protected]>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 16 changed files in this pull request and generated 2 comments.

.inner
.get(&path)
.await
.map_err(|e| GitError::ObjectNotFound(format!("Remote object not found: {}", e)))?;
Copy link

Copilot AI Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In RemoteStorage::get, all errors from inner.get(&path).await are mapped to GitError::ObjectNotFound, which conflates genuine "not found" cases with network, authentication, or other backend failures and makes debugging operational issues much harder. Consider preserving the underlying error type (e.g., using a dedicated GitError variant or attaching the original error as context) so callers can distinguish between missing objects and remote I/O/configuration problems.

Suggested change
.map_err(|e| GitError::ObjectNotFound(format!("Remote object not found: {}", e)))?;
.map_err(|e| match e {
object_store::Error::NotFound { .. } => {
GitError::ObjectNotFound(format!("Remote object not found at path {}", path))
}
other => GitError::IOError(std::io::Error::other(other)),
})?;

Copilot uses AI. Check for mistakes.
Comment on lines +175 to +186
unsafe {
env::set_var("LIBRA_STORAGE_TYPE", "r2");
env::set_var("LIBRA_STORAGE_BUCKET", "libra"); // Non-existent bucket (initially)
env::set_var(
"LIBRA_STORAGE_ENDPOINT",
format!("https://{}.r2.cloudflarestorage.com", R2_ACCOUNT_ID),
);
env::set_var("LIBRA_STORAGE_ACCESS_KEY", R2_ACCESS_KEY);
env::set_var("LIBRA_STORAGE_SECRET_KEY", R2_SECRET_KEY);
env::set_var("LIBRA_STORAGE_REGION", "auto");
env::set_var("LIBRA_STORAGE_THRESHOLD", "10");
}
Copy link

Copilot AI Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These integration tests wrap a series of env::set_var calls in an unsafe block even though set_var is safe, which is misleading and may suggest that undefined behavior is involved where there is none. You can drop the unsafe block entirely and call env::set_var directly, keeping the tests clearer and avoiding unnecessary use of unsafe.

Suggested change
unsafe {
env::set_var("LIBRA_STORAGE_TYPE", "r2");
env::set_var("LIBRA_STORAGE_BUCKET", "libra"); // Non-existent bucket (initially)
env::set_var(
"LIBRA_STORAGE_ENDPOINT",
format!("https://{}.r2.cloudflarestorage.com", R2_ACCOUNT_ID),
);
env::set_var("LIBRA_STORAGE_ACCESS_KEY", R2_ACCESS_KEY);
env::set_var("LIBRA_STORAGE_SECRET_KEY", R2_SECRET_KEY);
env::set_var("LIBRA_STORAGE_REGION", "auto");
env::set_var("LIBRA_STORAGE_THRESHOLD", "10");
}
env::set_var("LIBRA_STORAGE_TYPE", "r2");
env::set_var("LIBRA_STORAGE_BUCKET", "libra"); // Non-existent bucket (initially)
env::set_var(
"LIBRA_STORAGE_ENDPOINT",
format!("https://{}.r2.cloudflarestorage.com", R2_ACCOUNT_ID),
);
env::set_var("LIBRA_STORAGE_ACCESS_KEY", R2_ACCESS_KEY);
env::set_var("LIBRA_STORAGE_SECRET_KEY", R2_SECRET_KEY);
env::set_var("LIBRA_STORAGE_REGION", "auto");
env::set_var("LIBRA_STORAGE_THRESHOLD", "10");

Copilot uses AI. Check for mistakes.
@genedna
Copy link
Member

genedna commented Jan 19, 2026

@claude

@github-actions
Copy link
Contributor

github-actions bot commented Jan 19, 2026

Claude finished @genedna's task in 2m 23s —— View job


Code Review Complete

I've completed a comprehensive code review of this PR introducing tiered object storage with S3/R2 support. This is a significant architectural improvement that adds multi-cloud storage capabilities while maintaining backward compatibility.

Progress

  • Check repository guidelines (CLAUDE.md)
  • Review storage architecture (trait and implementations)
  • Review sync/async bridging in client_storage.rs
  • Evaluate test coverage and quality
  • Assess security implications
  • Review error handling
  • Check documentation
  • Validate bot feedback and provide additional insights

Summary

This is a well-structured refactoring that introduces a clean storage abstraction layer. The implementation demonstrates good architectural decisions with the async trait pattern and tiered storage approach. However, there are several critical issues and improvements needed before merging.

Critical Issues 🔴

1. LRU Cache Size Accounting Bug (src/utils/storage/tiered.rs:22-24)

The automated review correctly identified this. CachedFile::heap_size() returns self.size, but the actual implementation uses the file size correctly. However, there's a more fundamental issue: the LRU cache is configured with 200MB as the size limit, but HeapSize should track memory overhead, not disk usage.

Problem: The cache tracks file sizes correctly (line 95: size: data.len()), but mixing disk space management with memory-based LRU is semantically confusing. If a 100MB blob is cached, it counts as 100MB against the LRU limit, which is correct for disk cache but the naming suggests memory accounting.

Impact: Medium - The code works as intended for disk cache management, but the semantic mismatch could cause confusion.

Recommendation: Either rename cache_size parameter to cache_disk_size_bytes or consider a count-based cache with explicit disk space tracking. Fix this →

2. Remote Search Not Implemented (src/utils/storage/remote.rs:104-137)

The bot reviews identified that RemoteStorage::search returns an empty list in tiered mode. Looking at the code, this is incorrect - the implementation at lines 104-137 does implement search functionality using object_store::list(). However, there's a performance concern:

Issue: TieredStorage::search (line 145) only searches local storage and ignores remote. This means objects that only exist remotely cannot be found via prefix search.

Impact: High - Abbreviated object IDs and ref navigation (e.g., abc123~1) will fail if objects are only in remote storage.

Test Coverage: The test at tests/storage_r2_test.rs:95-123 validates remote search works correctly with the in-memory backend.

Recommendation: In TieredStorage::search, merge results from both local and remote searches, deduplicating by hash. Fix this →

3. Panic on Invalid Storage Type (src/utils/client_storage.rs:73)

_ => panic!("Unsupported storage type: {}", storage_type),

Problem: Configuration errors cause hard crashes instead of graceful error handling.

Impact: High - Production misconfiguration leads to panics rather than recoverable errors.

Recommendation: Return a Result or fall back to local storage with a warning. The bot review correctly identified this. Fix this →

4. Error Conflation in Remote Storage (src/utils/storage/remote.rs:41)

.map_err(|e| GitError::ObjectNotFound(format!("Remote object not found: {}", e)))?;

Problem: All errors (network failures, auth errors, timeouts, actual 404s) are mapped to ObjectNotFound. The latest bot review suggested preserving error types.

Impact: High - Debugging operational issues becomes very difficult when all failures look like missing objects.

Recommendation: Add error context preservation. Consider the suggestion from the bot review to match on object_store::Error::NotFound separately. Fix this →

High Priority Issues 🟡

5. Unnecessary Unsafe Blocks (tests/storage_r2_test.rs:130, 175)

unsafe {
    env::set_var("LIBRA_STORAGE_TYPE", "r2");
    // ...
}

Problem: env::set_var is not unsafe in Rust. These unsafe blocks are misleading and suggest UB where none exists.

Impact: Medium - Code clarity and false sense of danger around safe operations.

Note: Multiple bot reviews flagged this consistently.

Recommendation: Remove all unsafe blocks around env::set_var calls. Fix this →

6. Hardcoded Test Credentials (tests/storage_r2_test.rs:13-15)

const R2_ACCOUNT_ID: &str = "1234";
const R2_ACCESS_KEY: &str = "12345678";
const R2_SECRET_KEY: &str = "dbedudygydgwedqudgwedqugwedqugwedqugwedqugwedqugwedqugwedqugwedqug";

Problem: Even though these are dummy values, having credential patterns in source code is a security anti-pattern.

Impact: Medium - Could lead to accidental credential commits in the future.

Recommendation: Load from environment variables or use clearly fake values like "REPLACE_WITH_ACTUAL_KEY".

7. Hash Kind Thread Safety (src/utils/storage/local.rs:52, 387-389)

hash_kind: Option<HashKind>, // Capture hash kind from creation thread
// ...
if let Some(kind) = self_clone.hash_kind {
    set_hash_kind(kind);
}

Problem: The code captures hash_kind at creation time and re-sets it in spawned tasks. This is a workaround for thread-local state issues but could cause problems if different storages with different hash kinds are used concurrently.

Impact: Medium - Potential race conditions in multi-threaded scenarios with mixed hash types.

Architecture: The PR description mentions this fixes "HashKind context loss issues in multi-threaded environments," which is good. However, relying on thread-local state is brittle.

Recommendation: Consider making HashKind part of the ObjectHash type itself, or passing it explicitly through all operations. This is likely a larger refactor but would be more robust.

8. Index Pack Temp Directory Change (src/command/index_pack.rs:200-201)

// Use system temp dir to avoid DirectoryNotEmpty error if Pack tries to clean up the dir
let tmp_path = std::env::temp_dir();

Problem: This changes behavior to use system temp instead of pack directory. While it fixes the DirectoryNotEmpty issue, it could cause problems:

  • Temp files might persist across crashes
  • Cross-filesystem operations might be slower
  • Cleanup responsibility is unclear

Impact: Medium - Could affect reliability and performance.

Recommendation: Document why this change was needed and ensure proper cleanup. Consider using tempfile::TempDir for automatic cleanup.

Medium Priority Issues 🟢

9. Missing Documentation for Storage Trait

The Storage trait (src/utils/storage/mod.rs:12-31) has minimal documentation. The interface is clear, but documenting expected behavior for each method would help:

  • What happens if get is called on a non-existent object?
  • Are put operations idempotent?
  • What are the performance characteristics of search?

10. No Validation of Environment Variables

The storage configuration (src/utils/client_storage.rs:46-90) reads environment variables but doesn't validate them:

  • Bucket names have format requirements
  • Endpoints should be valid URLs
  • Keys should have minimum lengths

Recommendation: Add basic validation and return clear error messages.

11. Blocking Operations in Async Context

The block_on_storage helper (src/utils/client_storage.rs:94-105) uses channels to bridge async/sync boundaries. While functional, this creates overhead:

let (tx, rx) = mpsc::channel();
RUNTIME.spawn(async move {
    let res = future.await;
    let _ = tx.send(res);
});
rx.recv().unwrap()

Issue: Each storage operation spawns a task and uses a channel. For high-frequency operations, this could be expensive.

Recommendation: Consider using RUNTIME.block_on() directly for simpler cases, or document why the channel approach is necessary.

Security Considerations 🔒

12. HTTP Allowed by Default

Line 69: builder = builder.with_allow_http(true);

Problem: This allows unencrypted HTTP connections to object storage, which could expose credentials and data.

Impact: High if used in production, Low if only for local testing.

Recommendation: Make this configurable via environment variable and default to false. Only enable for explicit local development scenarios.

13. No Credential Validation

The code accepts any credential values without validation. Consider:

  • Checking for empty credentials
  • Warning if credentials look like placeholder values
  • Supporting credential providers (IAM roles, etc.)

Test Coverage Assessment ✅

Strengths:

  • Good mock testing with in-memory backend
  • Tests cover basic operations (get/put/exist)
  • Search functionality is tested
  • Tiered storage logic (size-based routing) is validated

Gaps:

  1. No tests for error conditions (network failures, invalid credentials)
  2. No tests for LRU eviction behavior
  3. No tests for concurrent access patterns
  4. Integration tests are #[ignore]d by default (expected, but ensure CI runs them somewhere)
  5. No tests for the search limitation in TieredStorage

Recommendation: Add negative test cases and stress tests for cache eviction.

Code Quality 👍

Positives:

  1. Clean trait-based abstraction
  2. Good separation of concerns
  3. Comprehensive inline documentation in implementation files
  4. Smart use of async_trait for uniform interface
  5. Proper use of Arc for shared ownership
  6. Good test structure with mock backends

Areas for Improvement:

  1. Error handling could be more granular
  2. Some magic numbers (e.g., cache sizes) should be constants
  3. Consider using tracing for observability instead of println! in tests

Performance Considerations ⚡

  1. Pack File Caching: The static PACK_FILE_CACHE loads entire pack files into memory. For large repos, this could be problematic.

  2. Remote Search Performance: The RemoteStorage::search implementation lists objects from remote storage, which could be slow for large buckets. Consider implementing pagination or limiting results.

  3. Synchronous Blocking: Every storage operation blocks a thread waiting for async completion. For high-throughput scenarios, this might not scale well.

Documentation 📚

The PR description is excellent and clearly explains the changes. The code-level documentation is good in implementation files but sparse in the trait definition. Consider adding:

  • Architecture documentation explaining when to use each storage type
  • Configuration guide for environment variables
  • Migration guide for existing deployments
  • Performance tuning guide

Overall Assessment

This is a well-architected PR that makes meaningful improvements to Libra's storage layer. The core design is sound, and the implementation is mostly clean. However, the critical issues around error handling, search functionality in tiered mode, and configuration validation should be addressed before merging.

Recommendation: Approve with Changes Required

The automated bot reviews correctly identified several issues. My additional findings focus on architectural concerns and subtle bugs that could cause production issues.

Must Fix Before Merge:

Should Fix Before Merge:

Can Address in Follow-up:

Great work on this significant refactoring! The tiered storage architecture is a solid foundation for multi-cloud support. 🎉


Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants