
Add multi-level namespace support in the Iceberg REST catalog to allow direct access to Snowflake's Iceberg tables #52451

Open
marc-marketparts opened this issue Oct 29, 2024 · 4 comments


@marc-marketparts

Feature request

Is your feature request related to a problem? Please describe.
Snowflake's internal Iceberg catalog can be synced automatically to Snowflake Open Catalog (a Polaris REST catalog), which could enable direct access to those tables from StarRocks and therefore eliminate any need to transfer data from Snowflake to StarRocks.

However, unlike MySQL (and StarRocks?), where database and schema are the same concept, Snowflake uses a two-level structure: each table belongs to a schema, and each schema belongs to a database (https://docs.snowflake.com/en/sql-reference/ddl-database).
Snowflake replicates this structure in its Iceberg catalog through namespaces, so a table always lives in the namespace "database.schema".

After successfully connecting StarRocks to Snowflake Open Catalog, I tried to access some tables, but every attempt to select the right namespace or table with USE database.schema or SELECT * FROM database.schema.table fails.

So StarRocks does not seem to support multi-level namespaces.

Describe the solution you'd like
Allow fully qualified object identifiers in SQL commands that match the namespace structure, as Spark does: https://iceberg.apache.org/docs/1.6.0/spark-configuration/#using-catalogs.

At a minimum, let users select the right namespace with the USE command.
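For illustration, here is a minimal Python sketch of the Spark-style resolution requested above, under simplified assumptions: a multipart identifier is split into an optional leading catalog name, one or more namespace levels, and a final table name, and there is no default namespace. The names `resolve`, `ResolvedTable`, and the catalog name `snowflake` are hypothetical; this is not StarRocks or Spark code.

```python
from typing import NamedTuple, Sequence, Set, Tuple


class ResolvedTable(NamedTuple):
    catalog: str
    namespace: Tuple[str, ...]  # one entry per namespace level, e.g. ("PROD_APP", "PRSANA")
    table: str


def resolve(parts: Sequence[str], current_catalog: str, known_catalogs: Set[str]) -> ResolvedTable:
    """Split a multipart identifier into catalog, namespace levels, and table.

    Assumption: the first part is treated as a catalog only if it names a known
    catalog and at least a namespace and a table follow it.
    """
    if len(parts) < 2:
        raise ValueError("expected at least namespace.table")
    if parts[0] in known_catalogs and len(parts) >= 3:
        catalog, rest = parts[0], list(parts[1:])
    else:
        catalog, rest = current_catalog, list(parts)
    # Everything between the catalog and the last part is the namespace path.
    return ResolvedTable(catalog, tuple(rest[:-1]), rest[-1])


print(resolve(["snowflake", "PROD_APP", "PRSANA", "MY_TEST_TABLE_ICEBERG"],
              "default_catalog", {"snowflake"}))
# ResolvedTable(catalog='snowflake', namespace=('PROD_APP', 'PRSANA'), table='MY_TEST_TABLE_ICEBERG')
```

The key point is that the namespace is kept as a tuple of levels rather than a single string, so a two-level Snowflake path and a one-level MySQL-style database fit the same structure.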

Describe alternatives you've considered
An ETL process to transfer the data instead, which would add effort and cost, and would be error-prone and inefficient for millions of rows.

@DorianZheng
Contributor

DorianZheng commented Oct 30, 2024

@marc-marketparts Hi, thanks for filing this issue. Could you try

use `database.schema`

or

select * from `database.schema`.table

We have run into the same issue with Unity Catalog, where we treat the entire multi-level namespace as a single database object, so this should work.

@marc-marketparts
Author

Hi @DorianZheng

Unfortunately it does not work; I get the error "Unknown database" in both cases.

I also tried the official namespace separator %1F (use database%1Fschema), but that fails too.
https://github.com/apache/iceberg/blob/6319712b612b724fedbc5bed41942ac3426ffe48/open-api/rest-catalog-open-api.yaml#L225
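In the spec linked above, %1F is the URL-encoded form of the 0x1F unit separator that sits between namespace levels in the request path. A minimal Python sketch of that encoding, reproducing the path segment used in the Postman request later in this thread (`encode_namespace` and `decode_namespace` are hypothetical helper names, not part of any Iceberg client):

```python
from typing import List, Sequence
from urllib.parse import quote, unquote

UNIT_SEPARATOR = "\x1f"  # the 0x1F byte placed between namespace levels


def encode_namespace(levels: Sequence[str]) -> str:
    """Join namespace levels with 0x1F and percent-encode them for a URL path segment."""
    return quote(UNIT_SEPARATOR.join(levels), safe="")


def decode_namespace(segment: str) -> List[str]:
    """Reverse of encode_namespace: percent-decode, then split on 0x1F."""
    return unquote(segment).split(UNIT_SEPARATOR)


print(encode_namespace(["PROD_APP", "PRSANA"]))  # PROD_APP%1FPRSANA
```

Because the encoding lives at the HTTP path level, a catalog client would presumably need to build it from a namespace split into levels rather than receive it as a literal database name in SQL.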

@DorianZheng
Contributor

Could you send me the fe.log?

@marc-marketparts
Author

Here is the FE log:
fe.log

Queries were run from a MySQL client after launching StarRocks locally with "docker run -p 9030:9030 -p 8030:8030 -p 8040:8040 -itd starrocks/allin1-ubuntu:latest".

FYI, here is the Postman response for GET {{catalog_uri}}/v1/SNOWFLAKE_PROD/namespaces/PROD_APP%1FPRSANA/tables:

{
    "identifiers": [
        {
            "namespace": [
                "PROD_APP",
                "PRSANA"
            ],
            "name": "MY_TEST_TABLE_ICEBERG"
        }
    ],
    "next-page-token": null
}
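To show how this response maps onto the identifier a user would type, here is a small Python sketch (the function name `qualified_names` is hypothetical) that turns each entry into a dotted database.schema.table name. Joining with dots is only a display convention and becomes ambiguous if a level itself contains a dot.

```python
import json
from typing import List

# The response body shown above, copied verbatim.
SAMPLE = """
{
    "identifiers": [
        {"namespace": ["PROD_APP", "PRSANA"], "name": "MY_TEST_TABLE_ICEBERG"}
    ],
    "next-page-token": null
}
"""


def qualified_names(response_body: str) -> List[str]:
    """Build dotted names from the namespace levels plus the table name."""
    payload = json.loads(response_body)
    return [".".join([*ident["namespace"], ident["name"]])
            for ident in payload.get("identifiers", [])]


print(qualified_names(SAMPLE))  # ['PROD_APP.PRSANA.MY_TEST_TABLE_ICEBERG']
```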

"SHOW DATABASES" displays only the first namespace level. I think it should list all levels recursively, so we can see which sub-namespaces are available (for example "database1", "database2", "database1.schema1", "database1.schema2", etc.) and then select one with the USE command.
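A recursive listing like this could be built on the spec's list-namespaces endpoint (GET {catalog_uri}/v1/{prefix}/namespaces with an optional `parent` query parameter holding the %1F-encoded parent). The Python sketch below assumes a hypothetical `list_children` callable that wraps that call (including authentication and next-page-token handling) and returns child namespaces as tuples of levels; an in-memory dictionary stands in for the catalog so the walk itself can run without a server.

```python
from typing import Callable, Dict, List, Tuple

Namespace = Tuple[str, ...]


def list_all_namespaces(list_children: Callable[[Namespace], List[Namespace]]) -> List[str]:
    """Depth-first walk from the top level; parents are listed before their children."""
    result: List[str] = []
    # () stands for "no parent", i.e. the top-level namespaces.
    stack: List[Namespace] = list(reversed(list_children(())))
    while stack:
        ns = stack.pop()
        result.append(".".join(ns))
        # Reverse so children are popped in the order the catalog returned them.
        stack.extend(reversed(list_children(ns)))
    return result


# In-memory stand-in for the catalog (hypothetical data mirroring the example above).
TREE: Dict[Namespace, List[Namespace]] = {
    (): [("database1",), ("database2",)],
    ("database1",): [("database1", "schema1"), ("database1", "schema2")],
}

print(list_all_namespaces(lambda ns: TREE.get(ns, [])))
# ['database1', 'database1.schema1', 'database1.schema2', 'database2']
```

Using an explicit stack instead of recursion keeps deep namespace trees from hitting Python's recursion limit; the resulting dotted names are exactly the list suggested above for SHOW DATABASES.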


2 participants