Add support to AWS Java SDKs s3Client.doesBucketExistV2(bucketName) call #8187

Open
tpang-cxl opened this issue Sep 18, 2024 · 2 comments

Comments

@tpang-cxl

Currently, we use Apache's Dolphin Scheduler for scheduling our pipelines. But when we set the S3 endpoint to our LakeFS server, we get the following error:

com.amazonaws.services.s3.model.AmazonS3Exception: This operation is not supported in LakeFS (Service: Amazon S3; Status Code: 405; Error Code: ERRLakeFSNotSupported; Request ID: 40a1e78b-f23e-4b04-8294-4338345d7c74; S3 Extended Request ID: CB07BFE5E44B0E5F; Proxy: null)

This error doesn't happen if we switch our storage directly to MinIO, which is the workaround we are using now. However, this workaround is not desirable, as we don't want to expose our MinIO storage directly (bypassing LakeFS) to our API calls. If LakeFS can support this call, we can disallow direct access to MinIO again.

@itaiad200
Contributor

Root cause is probably as described here

@arielshaqed
Contributor

Root cause is probably as described here

If true, that root cause says the issue is the getBucketAcl call. Uggh.

The doesBucketExistV2 docs also state that it performs this call.

So I would like to scope this clearly: we can add a workaround that makes doesBucketExistV2 of the deprecated AWS Java SDK v1 work. It will involve sending a minimal getBucketAcl response back to the client. Of course, that response may cause other S3 clients to give strange results when used with lakeFS, but I feel reasonably confident that a response of "this bucket allows anything to authorized users" or similar will give good results.
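To make the idea of a "minimal response" concrete, here is a rough sketch of what such a getBucketAcl body could look like, following the standard S3 AccessControlPolicy XML shape with a single FULL_CONTROL grant to the owner. This is only an illustration and not lakeFS code: the class name, the helper methods, and the owner ID and display name are all hypothetical, and the actual response lakeFS would send is not decided in this thread.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical helper: renders a minimal S3-style bucket ACL document.
class MinimalBucketAcl {

    // Builds an AccessControlPolicy with one FULL_CONTROL grant to the owner.
    // ownerId and ownerName are placeholder values chosen by the caller.
    static String render(String ownerId, String ownerName) {
        String id = escape(ownerId);
        String name = escape(ownerName);
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<AccessControlPolicy xmlns=\"http://s3.amazonaws.com/doc/2006-03-01/\">\n"
            + "  <Owner><ID>" + id + "</ID><DisplayName>" + name + "</DisplayName></Owner>\n"
            + "  <AccessControlList>\n"
            + "    <Grant>\n"
            + "      <Grantee xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\""
            + " xsi:type=\"CanonicalUser\">\n"
            + "        <ID>" + id + "</ID><DisplayName>" + name + "</DisplayName>\n"
            + "      </Grantee>\n"
            + "      <Permission>FULL_CONTROL</Permission>\n"
            + "    </Grant>\n"
            + "  </AccessControlList>\n"
            + "</AccessControlPolicy>\n";
    }

    // Escapes the characters that would break XML element content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Sanity check: parses the document with the JDK's XML parser.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "lakefs-owner-id" and "lakefs" are made-up placeholder values.
        String xml = render("lakefs-owner-id", "lakefs");
        System.out.println(xml);
        System.out.println("well-formed: " + isWellFormed(xml));
    }
}
```

Whether a single owner grant is enough for the v1 SDK's existence check, and how other clients would react to it, would still need to be verified against real clients.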

The AWS Java SDK v2 does not have such a method, and instead says to use headBucket. So we believe that code using the v2 SDK will work today.
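For anyone porting a v1 doesBucketExistV2 call to a HEAD-based check, the part that needs care is interpreting the outcome of the HEAD request. The sketch below is SDK-independent and only shows one plausible mapping from HTTP status codes to a bucket state; the class, enum, and method names are hypothetical, and treating 301 and 403 as "exists but not accessible" is an assumption of this example, not something confirmed in this thread.

```java
// Hypothetical helper: interprets the HTTP status of a HEAD-bucket request.
class HeadBucketOutcome {

    enum BucketState { EXISTS, EXISTS_NO_ACCESS, MISSING, UNKNOWN }

    // Maps a HEAD-bucket status code to a coarse bucket state.
    static BucketState fromStatus(int status) {
        switch (status) {
            case 200:
                return BucketState.EXISTS;           // bucket exists and is accessible
            case 301:
            case 403:
                return BucketState.EXISTS_NO_ACCESS; // assumption: redirected or forbidden
            case 404:
                return BucketState.MISSING;          // no such bucket
            default:
                return BucketState.UNKNOWN;          // anything else is left to the caller
        }
    }

    // Boolean view in the style of doesBucketExistV2; refuses to guess on UNKNOWN.
    static boolean doesBucketExist(int status) {
        BucketState state = fromStatus(status);
        if (state == BucketState.UNKNOWN) {
            throw new IllegalStateException("Unexpected HEAD bucket status: " + status);
        }
        return state != BucketState.MISSING;
    }

    public static void main(String[] args) {
        for (int status : new int[] {200, 403, 404}) {
            System.out.println(status + " -> " + fromStatus(status));
        }
    }
}
```

With the v2 SDK itself, the status would typically surface as a successful headBucket call or a thrown service exception rather than a raw integer, so real code would catch the exception and inspect its status code instead.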
