Write Support #37

Open
ibotty opened this issue Jan 19, 2024 · 17 comments

Comments

@ibotty

ibotty commented Jan 19, 2024

Is there write support on the roadmap? If so, is there an ETA? I'll have to plan a future project and would like to keep using duckdb.

@samansmink
Collaborator

Hi @ibotty! It's something we want to add, but we cannot give an ETA at the moment.

@ibotty
Author

ibotty commented Jan 22, 2024

Thank you for the answer. I might consider DuckDB when the time comes.

@rustyconover
Contributor

Incrementally, this is getting closer with the COPY function in DuckDB adding support for writing field id metadata for the columns.

There should be an iceberg_schema() function that will return the schema of a table and the appropriate field ids. Then you could write a Parquet file with the right field ids to be added to the Iceberg table.
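
To make the idea above concrete, here is a minimal sketch in Python that only builds the SQL text for such a COPY statement. It assumes DuckDB's Parquet `FIELD_IDS` COPY option takes a struct mapping column names to ids. The helper name `build_copy_with_field_ids`, the table name, the file path, and the field ids are all hypothetical, and the ids are supplied by hand because this comment does not say where to get them. Check the exact option syntax against the DuckDB documentation before relying on it.

```python
def quote_literal(value: str) -> str:
    """Quote a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def quote_ident(name: str) -> str:
    """Quote a SQL identifier, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_copy_with_field_ids(source_table: str, target_path: str,
                              field_ids: dict[str, int]) -> str:
    """Build a COPY statement that writes Parquet with explicit field ids.

    field_ids maps column name -> Iceberg field id (hypothetical values,
    supplied by the caller).
    """
    if not field_ids:
        raise ValueError("field_ids must not be empty")
    # Render the mapping as a struct literal: {'col': id, ...}
    struct_body = ", ".join(
        f"{quote_literal(col)}: {int(fid)}" for col, fid in field_ids.items()
    )
    return (
        f"COPY {quote_ident(source_table)} TO {quote_literal(target_path)} "
        f"(FORMAT PARQUET, FIELD_IDS {{{struct_body}}});"
    )


# Hypothetical table, path, and ids, for illustration only.
sql = build_copy_with_field_ids("events", "events.parquet", {"id": 1, "name": 2})
print(sql)
```

The resulting string could then be passed to a DuckDB connection. Registering the written file with the Iceberg table's metadata is a separate step that this sketch does not cover.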

@NielsKorschinsky

Hi @rustyconover,

That is great news!
Do I understand correctly that this iceberg_schema() function is already available (in a pre-release)?
What is the plan for writing the associated Parquet file? Could this also be done in sync with writing the schema file via DuckDB?

@rustyconover
Contributor

Hi @NielsKorschinsky,

At this time the iceberg_schema() function doesn't exist in the extension, but I can see it being a useful step on the road to writing changes to Iceberg tables.

Rusty

@NicolasPA

NicolasPA commented Aug 24, 2024

I'd like to use DuckDB on Iceberg for my ELT SQL processing engine on cloud storage for our DWH migration, and I bet many more people do. So here's a more precise use case to motivate the prioritization of this feature.

@vitorcarra

vitorcarra commented Sep 4, 2024

@NicolasPA thanks for raising this specific use case. I would like to see this feature available so I can implement the same thing you have described. I am commenting here to emphasize how amazing it would be to have such a feature.

@NicolasPA

NicolasPA commented Sep 26, 2024

Hi @samansmink, I believe this is the most upvoted request across all the DuckDB repos.
Would you have an updated ETA motivated by this demand?
We would like to know if we can count on DuckDB for our new data platform in a couple of months.

@samansmink
Collaborator

@NicolasPA I can assure you that the interest in this feature has not gone unnoticed. However, at this time we do not have an ETA for it yet.

I hope you understand that, as a relatively small non-VC-backed company, we don't always have the resources to build everything we want right away, especially in the case of a feature as complex as this one.

For more info on our support policy, check out https://duckdblabs.com/news/2023/10/02/support-policy.html

@NicolasPA

@samansmink, thank you for confirming you have been noticing the interest.
I fully understand the resources issue.
I don't want to put unnecessary pressure on you guys.
I think I was voicing the data-architecture wet dreams I've been having with DuckDB lately, hoping this kind of testimony will help you prioritize in the future.

@chlimaferreira

I'm sure that if DuckDB can write to Iceberg or Delta, it will become a big player in the market.

@ani-panda

This issue has been open for a very long time. I wonder if the Tabular (https://www.tabular.io/) folks are doing anything? That said, Databricks' acquisition of Tabular may not motivate Databricks to cannibalize its own (Spark) engine.

@soumilshah1995

+1

@pantonis

+1

@thenaturalist

+1

@2tony2

2tony2 commented Dec 4, 2024

As far as I'm concerned, this is the biggest factor holding back the use of DuckDB as an actual data warehouse replacement.

@beeing

beeing commented Dec 4, 2024

I noticed that https://github.com/apache/iceberg-rust is under active development. Perhaps it will be easier to wait for this library and integrate it into this extension.
