Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(core): Implement list with deleted and versions for gcs #5548

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

hoslo
Copy link
Contributor

@hoslo hoslo commented Jan 15, 2025

Which issue does this PR close?

Part of #5496

Rationale for this change

Implement the gcs in RFC we need.

What changes are included in this PR?

Implement list with deleted for gcs services
Implement versions for gcs services

Are there any user-facing changes?

Users can now list deleted files and user versions from gcs services.

Copy link

codspeed-hq bot commented Jan 15, 2025

CodSpeed Performance Report

Merging #5548 will not alter performance

Comparing gcs-list-with-deleted (f712cf0) with main (ef94a68)

Summary

✅ 73 untouched benchmarks

@hoslo hoslo force-pushed the gcs-list-with-deleted branch 3 times, most recently from 2daea28 to 4e9999d Compare January 15, 2025 11:14
@@ -193,6 +197,10 @@ impl GcsCore {

let mut req = Request::get(&url);

if let Some(version) = args.version() {
req = req.header(constants::GENERATION, version);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I misread that, it's been fixed.

@@ -372,6 +384,10 @@ impl GcsCore {

let mut req = Request::get(&url);

if let Some(version) = args.version() {
req = req.header(constants::IF_GENERATION_MATCH, version);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IF_GENERATION_MATCH is used for conditional check, it's not the version.

@@ -397,6 +413,10 @@ impl GcsCore {

let mut req = Request::head(&url);

if let Some(version) = args.version() {
req = req.header(constants::X_GOOG_IF_GENERATION_MATCH, version);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same.

let mut req = Request::delete(&url);

if let Some(version) = args.version() {
req = req.header(constants::GENERATION, version);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generation is a query parameter.

@@ -534,6 +561,54 @@ impl GcsCore {
self.send(req).await
}

pub async fn gcs_list_object_versions(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The output is the same for GCS. Perhaps we don't need to split it into two functions?

@hoslo hoslo force-pushed the gcs-list-with-deleted branch 2 times, most recently from 084af79 to c75ca31 Compare January 16, 2025 03:34
@hoslo hoslo force-pushed the gcs-list-with-deleted branch from c75ca31 to f712cf0 Compare January 16, 2025 04:10
@hoslo
Copy link
Contributor Author

hoslo commented Jan 22, 2025

@Xuanwo Any other questions?

@Xuanwo
Copy link
Member

Xuanwo commented Jan 22, 2025

@Xuanwo Any other questions?

There are two things in my mind now:

The frist, do we really need an extra GcsVersionsLister? It seems nearly the same with GcsLister.

Then, to support list_with_deleted, we may need to use gcs's softDeleted.

However, this API is

If true, only returns soft-deleted objects as part of the objects list response.

And

The softDeleted parameter can only be used successfully if the bucket has a soft delete policy. Otherwise, the request fails with a 400 Bad Request error with the reason invalidArgument. If true, versions cannot be set to true.

For GCS, list_with_deleted is a separate feature that needs to be enabled independently. I believe it would be better to remove this feature for now and reconsider it in the future.

@hoslo
Copy link
Contributor Author

hoslo commented Jan 22, 2025

There are two things in my mind now:

  1. GcsVersionsLister and GcsLister have different processing strategies.
  2. Gcs's Object Versioning can also control deletion, and is more in line with OpenDAL's semantics (softDeleted has an expiration time).

@Xuanwo
Copy link
Member

Xuanwo commented Jan 22, 2025

  1. GcsVersionsLister and GcsLister have different processing strategies.

Hi, s3 split them because we need two different API calls, requiring distinct parsing logic. However, GCS doesn’t have this issue; we can simply unify them based on different OpList inputs.

2. Gcs's Object Versioning can also control deletion, and is more in line with OpenDAL's semantics (softDeleted has an expiration time).

Oh, you are right. soft_delete is another new feature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants