Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: introduce LogStore trait #1706

Closed
wants to merge 7 commits into from
Closed
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions rust/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -84,6 +84,7 @@ compile_error!(

pub mod data_catalog;
pub mod errors;
pub mod logstore;
pub mod operations;
pub mod protocol;
pub mod schema;
Expand Down
36 changes: 36 additions & 0 deletions rust/src/logstore/mod.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
//! Delta log store.
use crate::errors::DeltaResult;
use bytes::Bytes;

use crate::protocol::Action;

/// Trait for critical operations required to read and write commit entries in Delta logs.
///
/// The correctness is predicated on the atomicity and durability guarantees of
/// the implementation of this interface. Specifically,
///
/// - Atomic visibility: Any commit created via `write_commit_entry` must become visible atomically.
/// - Mutual exclusion: Only one writer must be able to create a commit for a specific version.
/// - Consistent listing: Once a commit entry for version `v` has been written, any future call to
/// `get_latest_version` must return a version >= `v`, i.e. the underlying file system entry must
/// become visible immediately.
#[async_trait::async_trait]
pub trait LogStore {
/// Read data for commit entry with the given version.
/// TODO: return the actual commit data, i.e. Vec<Action>, instead?
async fn read_commit_entry(&self, version: i64) -> DeltaResult<Bytes>;

/// Write list of actions as delta commit entry for given version.
///
/// This operation can be retried with a higher version in case the write
/// fails with `TransactionError::VersionAlreadyExists`.
async fn write_commit_entry(
&self,
version: i64,
actions: Vec<Action>,
overwrite: bool,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does overwrite do? Why is it necessary?(I would think overwriting a commit would be not compatible with consistency.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point. I copied that over from LogStore::write from the reference implementation, but couldn't trace back any justification for the overwrite flag. The design doc linked above has the following to say:

If overwrite=true, then write directly into S3 with no DynamoDB interaction
else

which doesn't help me understand the "why" either. As you correctly point out, overwrite is violating any of the consistency the delta log is supposed to deliver in the first place.

If you don't see a use case / call to that method that would set overwrite = true, my proposal would be to drop that argument from write_commit_entry and potentially add it when we can justify its existence.

) -> DeltaResult<()>;

/// Find latest version currently stored in the delta log.
async fn get_latest_version(&self) -> DeltaResult<i64>;
}