Can't iceberg_scan specific manifest version from S3: No such file or directory #62

Open
ssanchozz opened this issue Aug 2, 2024 · 3 comments

Comments

@ssanchozz

I'm trying to use the iceberg extension to read Iceberg data from S3.
As test data I'm using the example attached to the iceberg extension doc page.
As S3 storage I'm using MinIO in Docker.
I execute my queries from Java code, using org.duckdb:duckdb_jdbc:1.0.0.

  1. When I query the whole table with allow_moved_paths = true, it works fine.

  2. When I query a specific metadata version (e.g. v1), following the example from the docs, I get an error:

java.sql.SQLException: IO Error: Cannot open file "lineitem_iceberg/metadata/snap-3776207205136740581-1-cf3d0be5-cf70-453d-ad8f-48fdc412e608.avro": No such file or directory
java.sql.SQLException: java.sql.SQLException: IO Error: Cannot open file "lineitem_iceberg/metadata/snap-3776207205136740581-1-cf3d0be5-cf70-453d-ad8f-48fdc412e608.avro": No such file or directory

Query looks like this:
SELECT * FROM iceberg_scan('s3://bucketname/lineitem_iceberg/metadata/v1.metadata.json');

  3. If I try another example:
    SELECT * FROM iceberg_scan('s3://bucketname/lineitem_iceberg/metadata/02701-1e474dc7-4723-4f8d-a8b3-b5f0454eb7ce.metadata.json', allow_moved_paths = true);

I get the exception "Enabling allow_moved_paths is not enabled for directly scanning metadata files.", because of this line.

The questions are:

  1. Why is allow_moved_paths prohibited when querying a specific metadata version? Could this check be removed so that allow_moved_paths is allowed there too?
  2. Any other ideas about what's wrong here and how it can be fixed?
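
The error above names a path with no bucket or absolute prefix ("lineitem_iceberg/metadata/snap-..."), which suggests the metadata records file locations relative to where the table was originally written. As a rough illustration of the kind of re-rooting that allow_moved_paths implies, here is a minimal Java sketch. The class and method names are hypothetical, and this is an assumption about the idea, not the extension's actual code.

```java
class MovedPathSketch {
    // Hypothetical helper (not part of DuckDB or the iceberg extension):
    // re-root a path recorded in the metadata under the location the table
    // is actually being read from.
    static String reRoot(String tableRoot, String recordedPath) {
        // Drop a trailing slash so the pieces join cleanly.
        String root = tableRoot.endsWith("/")
                ? tableRoot.substring(0, tableRoot.length() - 1)
                : tableRoot;
        // Assumption: the last segment of the root is the table directory name.
        String tableName = root.substring(root.lastIndexOf('/') + 1);
        int idx = recordedPath.indexOf(tableName + "/");
        if (idx < 0) {
            return recordedPath; // table name not found, leave the path unchanged
        }
        // Keep everything after the table directory and prepend the new root.
        return root + recordedPath.substring(idx + tableName.length());
    }

    public static void main(String[] args) {
        String recorded = "lineitem_iceberg/metadata/"
                + "snap-3776207205136740581-1-cf3d0be5-cf70-453d-ad8f-48fdc412e608.avro";
        System.out.println(reRoot("s3://bucketname/lineitem_iceberg", recorded));
    }
}
```

Without such a step, the recorded path only resolves if the process happens to run from a directory that contains lineitem_iceberg, which would be consistent with the "No such file or directory" error.
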
@mike-luabase

does this work for you locally?

@ssanchozz
Author

ssanchozz commented Aug 3, 2024

does this work for you locally?

What do you mean by locally? I am running this locally, but querying data stored in MinIO, which runs in Docker on my local machine.

However, I also tried storing the data on the local filesystem and querying it there, and got the same error:

java.sql.SQLException: IO Error: Cannot open file "lineitem_iceberg/metadata/snap-3776207205136740581-1-cf3d0be5-cf70-453d-ad8f-48fdc412e608.avro": No such file or directory

The query I've used for this is:
SELECT count(*) FROM iceberg_scan('/<absolute_path_on_my_local_machine>/lineitem_iceberg/metadata/v1.metadata.json');

And if I query like this, it works fine:
SELECT count(*) FROM iceberg_scan('/<absolute_path_on_my_local_machine>/lineitem_iceberg', allow_moved_paths = true);
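
Since the table-root form of the query works, one stopgap is to derive the table root from a metadata file path and scan that instead. The Java sketch below builds such a query string; the class and method names are hypothetical, and it assumes the metadata file sits under a "/metadata/" directory directly below the table root. Note that this scans whatever version the table directory resolves to, so it does not select v1 specifically.

```java
class TableRootQuery {
    // Hypothetical helper: strip "/metadata/<file>" to get the table root.
    static String tableRoot(String metadataFile) {
        int idx = metadataFile.lastIndexOf("/metadata/");
        if (idx < 0) {
            throw new IllegalArgumentException("no /metadata/ segment in: " + metadataFile);
        }
        return metadataFile.substring(0, idx);
    }

    // Build the form of the query that worked in the thread: scan the table
    // root with allow_moved_paths = true.
    static String scanQuery(String metadataFile) {
        // Double any single quotes so the path stays a valid SQL string literal.
        String root = tableRoot(metadataFile).replace("'", "''");
        return "SELECT count(*) FROM iceberg_scan('" + root + "', allow_moved_paths = true);";
    }

    public static void main(String[] args) {
        System.out.println(scanQuery("s3://bucketname/lineitem_iceberg/metadata/v1.metadata.json"));
    }
}
```

The resulting string can be passed to a JDBC Statement as usual.
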

@fabito

fabito commented Nov 28, 2024

I was facing the same issue.
Creating a secret solved the issue:

CREATE SECRET secret3 (
    TYPE S3,
    PROVIDER CREDENTIAL_CHAIN,
    CHAIN 'env;config'
);
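
For a MinIO setup like the one in the original report, the credential chain may not pick anything up, so an explicit secret may be needed instead. The Java sketch below builds such a statement for execution over JDBC. KEY_ID, SECRET, ENDPOINT, URL_STYLE and USE_SSL are DuckDB S3 secret parameters; the class name, secret name, endpoint and credentials are placeholders, and whether this resolves the original error is not confirmed, since the reporter saw the same failure on the local filesystem.

```java
class MinioSecretSketch {
    // Quote a value as a SQL string literal, doubling embedded single quotes.
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // Hypothetical helper: build a CREATE SECRET statement for a
    // path-style, non-TLS S3 endpoint such as a local MinIO container.
    static String createSecretSql(String name, String keyId, String secret, String endpoint) {
        return String.join("\n",
                "CREATE SECRET " + name + " (",
                "    TYPE S3,",
                "    KEY_ID " + quote(keyId) + ",",
                "    SECRET " + quote(secret) + ",",
                "    ENDPOINT " + quote(endpoint) + ",",
                "    URL_STYLE 'path',",
                "    USE_SSL false",
                ");");
    }

    public static void main(String[] args) {
        // Placeholder values; substitute the ones for your own MinIO instance.
        System.out.println(createSecretSql("minio_secret", "my_key", "my_secret", "localhost:9000"));
    }
}
```
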
