
feat: support custom AWS S3 Endpoint #895

Open: wants to merge 4 commits into master
Conversation

aminya (Contributor) commented Oct 8, 2024

This PR allows setting the AWS S3 endpoint URL, which enables using S3-compatible services like Tigris or self-hosting the storage.

https://www.tigrisdata.com/docs/sdks/s3/aws-ruby-sdk/

ezekg (Member) commented Oct 8, 2024

Can you add some background here? What's the use case? Why is this valuable to you?

aminya (Contributor, Author) commented Oct 9, 2024

> Can you add some background here? What's the use case? Why is this valuable to you?

This allows using S3-compatible services like Tigris or self-hosting the storage.

https://www.tigrisdata.com/docs/sdks/s3/aws-ruby-sdk/

hrueger commented Nov 22, 2024

Hi all,
we're also very interested in this. We've tried this PR; however, we have the problem that it automatically prepends the bucket name as a subdomain: `storage.ourserver.com`, where we host a Minio instance, becomes `keygen-bucket.storage.ourserver.com`. For Minio it would have to be `storage.ourserver.com/keygen-bucket`, though. Any idea why that happens?

aminya (Contributor, Author) commented Nov 22, 2024

> We've tried this PR; however, we have the problem that it automatically prepends the bucket name as a subdomain: `storage.ourserver.com`, where we host a Minio instance, becomes `keygen-bucket.storage.ourserver.com`. For Minio it would have to be `storage.ourserver.com/keygen-bucket`, though.

It seems Minio doesn't follow the standard virtual-hosted-style addressing convention. It works with Tigris.

lukas-runge commented

> We've tried this PR; however, we have the problem that it automatically prepends the bucket name as a subdomain: `storage.ourserver.com`, where we host a Minio instance, becomes `keygen-bucket.storage.ourserver.com`. For Minio it would have to be `storage.ourserver.com/keygen-bucket`, though.

Using the flag `force_path_style: true` in the client config fixes this. @aminya, maybe add the option to set this flag via an env variable. 🙌
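For reference, here's a minimal sketch of what that flag does in an aws-sdk-s3 client config; the endpoint URL and bucket name are placeholders taken from the example above, not code from this PR:

```ruby
require 'aws-sdk-s3'

# With force_path_style: true the client addresses buckets as
#   https://storage.ourserver.com/keygen-bucket   (path-style)
# instead of
#   https://keygen-bucket.storage.ourserver.com   (virtual-hosted-style),
# which is what a self-hosted Minio expects.
s3 = Aws::S3::Client.new(
  access_key_id:     ENV['AWS_ACCESS_KEY_ID'],
  secret_access_key: ENV['AWS_SECRET_ACCESS_KEY'],
  region:            ENV['AWS_REGION'],
  endpoint:          'https://storage.ourserver.com',
  force_path_style:  true,
)

s3.list_objects_v2(bucket: 'keygen-bucket') # resolves to a path-style URL
```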

aminya (Contributor, Author) commented Nov 24, 2024

@lukas-runge Great, does this cover your use case?
f9d387c (#895)

lukas-runge commented

> @lukas-runge Great, does this cover your use case? f9d387c (#895)

Yes - just tested it. 👍

Ty @aminya! 🙏

ezekg (Member) left a comment

First off, I appreciate the PR, and the discussion from everybody here.

As-is this PR works, but the implementation is not ideal, because it doesn't account for existing users wanting to migrate from one storage provider to another, e.g. from S3 to Tigris. What I mean is that if an existing Keygen CE user with artifacts already stored in S3 wanted to move from S3 to Tigris, they couldn't do so safely without replicating their entire S3 bucket into Tigris, which can be very costly for large objects. Without that replication, existing artifacts would no longer be accessible: they would exist in S3 but not in Tigris, while the S3 backend now points at Tigris instead of S3 (which is confusing in itself, given there is already an R2 backend that is also S3-compatible). This also doesn't support scenarios where some accounts want to store artifacts in S3 while others are fine with Tigris; with this PR, S3 points to Tigris, breaking accounts that want S3.

In the case of Keygen Cloud, when we moved from S3 to R2, we did so seamlessly, without replicating TBs of data from S3 to R2. We were able to do this because existing artifacts remained in S3, while new artifacts were stored in R2. And this is still the case to this day. Some accounts are still on S3, per their internal requirements, while most are on R2. I want the same story for Keygen CE/EE users moving from S3 or R2 to Tigris or Minio.

So with that said —

Rather than introduce customization/overloads for the S3 storage backend, e.g. `endpoint` and `force_path_style` (a config that the user has to understand), I'd rather we introduce dedicated Tigris and Minio backends. The Minio client can set `force_path_style: true`, so that it works out of the box without the user needing to know about that config.

Both new backends could be configured via `TIGRIS_` and `MINIO_` env vars, just like `AWS_` and `CF_` are used to configure S3 and R2, respectively. The code is already designed to be extensible, so TIGRIS and MINIO, and even more storage providers, can be added in the future. Each backend would have its own specific environment variables for configuring the client so that it "Just Works."
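To make that concrete, here is a hypothetical sketch of what a dedicated Minio initializer might look like, mirroring the shape of the existing S3 config; every `MINIO_*` name below is illustrative, not a settled interface:

```ruby
# config/initializers/minio.rb (hypothetical sketch)
MINIO_ACCESS_KEY_ID     = ENV['MINIO_ACCESS_KEY_ID']
MINIO_SECRET_ACCESS_KEY = ENV['MINIO_SECRET_ACCESS_KEY']
MINIO_BUCKET            = ENV['MINIO_BUCKET']
MINIO_REGION            = ENV['MINIO_REGION']
MINIO_ENDPOINT          = ENV['MINIO_ENDPOINT']

MINIO_CLIENT_OPTIONS = {
  access_key_id:     MINIO_ACCESS_KEY_ID,
  secret_access_key: MINIO_SECRET_ACCESS_KEY,
  region:            MINIO_REGION,
  endpoint:          MINIO_ENDPOINT,
  force_path_style:  true, # baked in, so users never see this config
}.compact_blank
 .freeze
```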

In addition to the new Tigris and Minio backends, we should add a `KEYGEN_STORAGE_BACKEND` environment variable to set the default storage backend for new accounts, since there would be more than just the current S3 and R2 backends.

Migrating an existing account would be as simple as running:

```ruby
account = Account.sole

account.update!(backend: 'TIGRIS')
```

Existing artifacts, e.g. in R2, would still be accessible, since artifacts have their own backend, and new artifacts would go to Tigris. No replication needed!

Would you be open to adjusting the PR?

aminya (Contributor, Author) commented Nov 25, 2024

The idea of having separate classes for each provider sounds good at first glance; however, it doesn't match the original promise of these services. As you can see, Tigris simply needs the endpoint provided, and it uses the same AWS environment variables. When you set up a Fly.io Tigris machine, the exact environment variables I used in this code are set automatically by Fly. So if we change these names, it's going to make deployment harder, not easier.

https://www.tigrisdata.com/docs/sdks/s3/aws-ruby-sdk/

Another thing is that we can't simply find every S3-compatible provider in the world and provide wrappers for them. For example, this is my first time hearing about Minio.

ezekg left a comment that was later marked as outdated.

ezekg (Member) commented Nov 25, 2024

I understand that's how S3-compatible services work, but what I want to avoid here is env vars like `AWS_FORCE_PATH_STYLE_S3` and others like it. It's easier to add support for backends like Minio (which I'm also unfamiliar with) that have `force_path_style` baked in, so that the end-user doesn't have to think about it. Like I said, a proper backend is best, because although something may be 'S3-compatible', that doesn't mean it's compatible with the current client options we have for the S3 backend.

I'm fine with introducing a single `AWS_ENDPOINT` env var (you can try `AWS_ENDPOINT` first and then fall back to `AWS_ENDPOINT_URL_S3`), for general compatibility with S3-compatible object stores in the simplest case, but I don't want to introduce anything else configurable via the environment. And like I said, I'd love to see a PR adding a proper Tigris backend that users can migrate to and from, in a backwards-compatible way, just like the current S3 and R2 backends.

#895 (comment)

Quick follow up. I changed my mind.

After looking at what other open source projects do, I'm fine with introducing `AWS_ENDPOINT` (which can fall back to `AWS_ENDPOINT_URL_S3`) as well as an `AWS_FORCE_PATH_STYLE` environment variable:

```ruby
AWS_ACCESS_KEY_ID     = ENV['AWS_ACCESS_KEY_ID']
AWS_SECRET_ACCESS_KEY = ENV['AWS_SECRET_ACCESS_KEY']
AWS_BUCKET            = ENV['AWS_BUCKET']
AWS_REGION            = ENV['AWS_REGION']
AWS_ENDPOINT          = ENV['AWS_ENDPOINT'] || ENV['AWS_ENDPOINT_URL_S3']
AWS_FORCE_PATH_STYLE  = ENV['AWS_FORCE_PATH_STYLE'].in?(%w[true t 1])

AWS_CLIENT_OPTIONS = {
  access_key_id: AWS_ACCESS_KEY_ID,
  secret_access_key: AWS_SECRET_ACCESS_KEY,
  region: AWS_REGION,
  endpoint: AWS_ENDPOINT,
  force_path_style: AWS_FORCE_PATH_STYLE,
}.compact_blank
 .freeze
```

These seem to be the two most commonly used environment variables for configuring S3-compatible object stores. For users that want proper support for a Tigris/Minio/etc. backend, they can implement support in a separate PR.
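For illustration, constructing the client from that frozen options hash could look like the following sketch; the usage lines are assumptions, not code from this PR:

```ruby
require 'aws-sdk-s3'

# Because the options hash is compact_blank'd, unset env vars simply
# fall through to the SDK's defaults (e.g. the real AWS endpoint), and
# force_path_style: false is dropped as blank as well.
s3 = Aws::S3::Client.new(**AWS_CLIENT_OPTIONS)

# e.g. AWS_ENDPOINT=https://storage.example.com AWS_FORCE_PATH_STYLE=true
# yields object URLs like https://storage.example.com/<bucket>/<key>.
```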

Thanks for the discussion here, and for your patience. Would you be open to updating the PR, @aminya?
