-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[wip] object store metastore #2309
base: main
Are you sure you want to change the base?
Conversation
be3e74c
to
638607a
Compare
ab8beca
to
2f76ae8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for creating our object store based metadata store @igalshilman :-) It looks really nice! The implementation looks correct to me :-)
I left a few minor comments and questions. The one thing that is not fully clear to me is whether we want to clean up older versions or not. If not, would this be problematic for the get_latest_version
which needs to list more and more versions?
|
||
let mut shutdown = pin::pin!(cancellation_watcher()); | ||
|
||
let delegate = match builder.build().await { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this await take a long time? If yes, then maybe select over the shutdown
to also support shutdowns during this time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this await take a long time?
not any more.
} | ||
}; | ||
|
||
let mut refresh = tokio::time::interval(Duration::from_secs(2)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably something we want to make configurable at some point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
outdated, it doesn't exists any more.
|
||
#[derive(Debug, Clone)] | ||
pub struct Client { | ||
sender: tokio::sync::mpsc::UnboundedSender<Commands>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would a bounded sender also work?
.map_err(|_| WriteError::Internal("Object store fetch channel ".into()))?; | ||
|
||
rx.await.map_err(|_| { | ||
WriteError::Internal("Object store fetch channel disconnected".to_string()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error messages seem to be copied from the get
call.
// serialize as json | ||
serde_json::to_vec(self).map(Into::into).map_err(Into::into) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the json encoding would be more helpful if the values were also encoded in json?
crates/core/src/metadata_store/providers/objstore/optimistic_store.rs
Outdated
Show resolved
Hide resolved
} | ||
|
||
#[async_trait::async_trait] | ||
impl MetadataStore for OptimisticLockingMetadataStore { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does the OptimisticLockingMetadataStore
implements the MetadataStore
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed, it is not any more.
|
||
pub struct OptimisticLockingMetadataStore { | ||
version_repository: Arc<dyn VersionRepository>, | ||
latest_store_cache: ArcSwap<Option<CachedStore>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this also be a CachedStore
(w/o ArcSwap<Option<>>
) and change the methods to take a mutable borrow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it won't implement the metastore trait then I can do that.
Let me try this out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
worked out beautifully 🧑🍳
// refresh and try again | ||
let delay = { | ||
let mut rng = rand::thread_rng(); | ||
rng.gen_range(50..300) | ||
}; | ||
tokio::time::sleep(Duration::from_millis(delay)).await; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might be able to replace this with a RetryPolicy
.
|
||
match self | ||
.version_repository | ||
.try_create_version(new_store.current_version(), new_store_bytes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The correctness of the implementation relies on the fact that our version space won't have any wholes, right? I guess this makes cleaning older versions up a bit more tricky.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe assert that new_store.current_version() == cached_store.store.version().next()
to state that there mustn't be any wholes.
6d99302
to
7397d7b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we might have a better path to implement this, using Object Versioning. A quick search shows that this is supported across MinIO / Azure / GCP, here's the S3 documentation: https://docs.aws.amazon.com/AmazonS3/latest/userguide/Versioning.html.
Here's a sketch of the conditional PUT path:
async fn put(
&self,
key: ByteString,
value: VersionedValue,
precondition: Precondition,
) -> Result<(), WriteError> {
let key = object_store::path::Path::from(key.to_string());
match precondition {
Precondition::MatchesVersion(version) => {
// If we have a cached version, we can also add an if-modified-since condition to the GET.
// Unfortunately we can't set the version explicitly, and a HEAD request doesn't return enough
// information to determine our own metadata value version.
let get_result = self.object_store.get(&key).await.map_err(|e| {
WriteError::Internal(format!("Failed to check precondition: {}", e))
})?;
if extract_object_version(&get_result.payload) != version {
return Err(WriteError::FailedPrecondition(
"Version mismatch".to_string(),
));
}
self.object_store
.put_opts(
&key,
PutPayload::from_bytes(serialize_versioned_value(value)),
PutOptions::from(PutMode::Update(UpdateVersion::from(&get_result))),
)
.await
.map_err(|e| WriteError::Internal(format!("Failed to update value: {}", e)))?;
Ok(())
}
_ => todo!(),
}
}
Unfortunately we don't control the versions; in S3 they are a monotonic numbered sequence but we don't get to set them directly - rather, S3 will set them for us. Our own API relies on explicit versions so I'm assuming we'll just serialize them along with the rest of the payload as the object body.
A normal GET request implicitly returns the latest version, no further tricks required. It's also possible to request a particular previous version, but that's not needed for our API. Might be useful for troubleshooting though! (And I'd seriously consider serializing the values as JSON for operations friendliness.)
The only requirement for this all to work is that we must use a bucket with object versioning enabled, which is not the default but is trivial to set. S3 and other stores also support automatic cleanup of old versions using object lifecycle policies, so we don't need to implement that ourselves.
return Ok(()); | ||
}; | ||
|
||
if maybe_cached_version.is_none() || maybe_cached_version.unwrap() < global_latest_version { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer the !is_some_and(...)
version personally.
let mut stream = self.object_store.list(None); | ||
|
||
// TODO: we should switch to a create internal Version | ||
let mut current = 0; | ||
while let Some(version) = stream | ||
.try_next() | ||
.await | ||
.map_err(|e| VersionRepositoryError::Network(e.into()))? | ||
{ | ||
if let Some(name) = version.location.filename() { | ||
if !name.starts_with("v") { | ||
continue; | ||
} | ||
if let Ok(v) = name[1..].parse::<u32>() { | ||
if v > current { | ||
current = v; | ||
} | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can do this without LIST operations but rather using object versioning.
Wouldn't the |
No description provided.