
REST Catalog S3 Signer Endpoint should be Catalog specific #11608

Open
1 of 3 tasks
c-thiel opened this issue Nov 20, 2024 · 1 comment
Labels: bug (Something isn't working)

Comments

c-thiel (Contributor) commented Nov 20, 2024

Apache Iceberg version

1.7.0 (latest release)

Query engine

Spark

Please describe the bug 🐞

Currently, when configuring two REST catalogs in Spark, the s3.signer.uri of the first catalog is also used for the second catalog.

During the initial connection to the REST catalog, the catalog may return an s3.signer.uri attribute as part of the overrides of the /v1/config endpoint. This property appears to be applied globally to the Spark session: whichever catalog I use first, sign requests for the second catalog are sent to the signer endpoint of the first. Using each catalog separately works perfectly fine.

I tested with a single Lakekeeper instance, where we use a different signer endpoint for each warehouse, as well as with two Nessie instances. In my tests the warehouses share the same bucket but use different path prefixes.
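For reference, here is a minimal sketch of how the override is surfaced at catalog load time. The catalog URL and warehouse name are placeholders, and the exact response contents depend on the catalog implementation; the only point is that s3.signer.uri arrives per catalog via /v1/config and should therefore stay scoped to that catalog:

    import requests

    # Hypothetical endpoint and warehouse, for illustration only.
    CATALOG_URL = "http://localhost:8181/catalog"
    WAREHOUSE = "warehouse_1"

    # The REST catalog exposes GET /v1/config; implementations can ship
    # per-catalog settings such as s3.signer.uri in the "overrides" map.
    resp = requests.get(f"{CATALOG_URL}/v1/config", params={"warehouse": WAREHOUSE})
    resp.raise_for_status()
    config = resp.json()

    # With the bug described above, the value returned here by the *first*
    # catalog ends up being used for sign requests of every catalog in the session.
    print(config.get("overrides", {}).get("s3.signer.uri"))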

My Spark configuration looks like this:

    "spark.sql.catalog.catalog1": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.catalog1.type": "rest",
    "spark.sql.catalog.catalog1.uri": CATALOG_1_URL,
    "spark.sql.catalog.catalog1.warehouse": "warehouse_1",
    "spark.sql.catalog.catalog1.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    "spark.sql.catalog.catalog1.s3.remote-signing-enabled": "true",
    "spark.sql.catalog.catalog2": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.catalog2.type": "rest",
    "spark.sql.catalog.catalog2.uri": CATALOG_2_URL,
    "spark.sql.catalog.catalog2.warehouse": "warehouse_2",
    "spark.sql.catalog.catalog2.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    "spark.sql.catalog.catalog1.s3.remote-signing-enabled": "true",

If required, I can add a Docker Compose example as well.
If someone could point me in the right direction, I might be able to create a fix PR.
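As a rough reproduction, here is a PySpark sketch under a few assumptions: placeholder catalog URLs, a namespace named db in each catalog, and the Iceberg Spark runtime jar on the classpath. It is meant to show the ordering that triggers the behavior, not to be a definitive test case:

    from pyspark.sql import SparkSession

    # Placeholder endpoints for two independent REST catalogs (assumptions).
    CATALOG_1_URL = "http://catalog-1:8181/catalog"
    CATALOG_2_URL = "http://catalog-2:8181/catalog"

    spark = (
        SparkSession.builder.appName("signer-repro")
        .config("spark.sql.catalog.catalog1", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.catalog1.type", "rest")
        .config("spark.sql.catalog.catalog1.uri", CATALOG_1_URL)
        .config("spark.sql.catalog.catalog1.warehouse", "warehouse_1")
        .config("spark.sql.catalog.catalog1.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
        .config("spark.sql.catalog.catalog1.s3.remote-signing-enabled", "true")
        .config("spark.sql.catalog.catalog2", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.catalog2.type", "rest")
        .config("spark.sql.catalog.catalog2.uri", CATALOG_2_URL)
        .config("spark.sql.catalog.catalog2.warehouse", "warehouse_2")
        .config("spark.sql.catalog.catalog2.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
        .config("spark.sql.catalog.catalog2.s3.remote-signing-enabled", "true")
        .getOrCreate()
    )

    # Touch catalog1 first so its s3.signer.uri override is loaded.
    spark.sql("CREATE NAMESPACE IF NOT EXISTS catalog1.db")
    spark.sql("CREATE TABLE IF NOT EXISTS catalog1.db.t1 (id BIGINT) USING iceberg")
    spark.sql("INSERT INTO catalog1.db.t1 VALUES (1)")

    # With the bug, the writes below send their S3 sign requests to
    # catalog1's signer endpoint instead of catalog2's and fail.
    spark.sql("CREATE NAMESPACE IF NOT EXISTS catalog2.db")
    spark.sql("CREATE TABLE IF NOT EXISTS catalog2.db.t2 (id BIGINT) USING iceberg")
    spark.sql("INSERT INTO catalog2.db.t2 VALUES (1)")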

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
c-thiel added the bug label on Nov 20, 2024
c-thiel (Contributor, Author) commented Nov 21, 2024

This is not only a problem with Spark; it also affects at least StarRocks.
According to a user on our Discord, they see the same behavior as I describe for Spark above:

I can confirm that both catalogs (lake and lake2) work perfectly fine when set up and used individually in StarRocks. I can create tables, insert data, and query without any issues when only one catalog is active at a time.

However, the problem arises when both catalogs are configured simultaneously. At that point, operations on the second catalog (like INSERT) fail.
