Delta-rs version:
0.21.0
0.20.0
0.19.0
I can't test with 0.18.0
In 0.17.0 it works fine
Binding:
Python
Environment:
Cloud provider:
Local and S3
OS:
MacOS and Amazon Linux
Other:
Bug
What happened:
When overwriting a table, the whole schema gets rewritten (already reported in #2923), AND, I think because of how the JSON metadata is encoded/decoded, all \ characters get escaped again (these characters come from Spark comments/metadata, for example, or from my own comments).
One of my "development" tables' JSON log files grew to 350 MB each, and now delta can't scan them anymore (thrift buffer size limits :) )
What you expected to happen:
When rewriting metadata, existing escape characters should not be escaped again.
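To illustrate the mechanism I suspect, here is a minimal sketch using only Python's standard json module. It is not delta-rs code, and the reencode helper and the sample comment are made up for illustration. It only shows that serializing an already-serialized string again, without decoding it first, at least doubles the backslashes each round:

```python
import json


def reencode(value: str, rounds: int) -> str:
    """Serialize `value` as a JSON string `rounds` times, never decoding in between."""
    for _ in range(rounds):
        value = json.dumps(value)
    return value


# Hypothetical column comment containing a single literal backslash.
comment = "path C:\\data"

# Backslash count after 0, 1, 2, ... rounds of re-encoding.
counts = [reencode(comment, n).count("\\") for n in range(6)]

for n, count in enumerate(counts):
    print(f"rounds={n}: {count} backslashes")
```

If something like this happens to the metadata on every overwrite, the growth is exponential in the number of overwrites, which would fit a log file reaching hundreds of MB after a modest number of commits.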
TinoSM
changed the title
Delta Table written with rust
_delta_log of table written with rust engine+overwrite grows and grows (upto 350mb per file)
Nov 19, 2024
TinoSM
changed the title
_delta_log of table written with rust engine+overwrite grows and grows (upto 350mb per file)
>0.17.0 _delta_log of table written with rust engine+overwrite grows and grows (upto 350mb per file)
Nov 19, 2024
TinoSM
changed the title
>0.17.0 _delta_log of table written with rust engine+overwrite grows and grows (upto 350mb per file)
>0.17.0 _delta_log gets corrupted after overwrite (log files grows and grows upto 350mb per file)
Nov 20, 2024
How to reproduce it:
I'm sorry but I can only test with polars :(
https://docs.pola.rs/api/python/stable/reference/api/polars.DataFrame.write_delta.html
More details:
test_table.zip contains the empty Delta table with the active and id columns.
test_table_broken.zip contains the table whose log files contain many \\\
Image: output of cat on 00008.json and 0000.json, showing how the \\ grew
test_table_broken.zip
test_table.zip
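For anyone who wants to check their own table for the same growth, a small sketch that reports the size and backslash count of each commit file in a _delta_log directory. The backslash_report helper is hypothetical (not part of delta-rs or polars), and the demo runs against synthetic files in a temporary directory, not the attached tables:

```python
import tempfile
from pathlib import Path


def backslash_report(log_dir):
    """Return (file name, size in bytes, backslash count) per commit file, oldest first."""
    report = []
    for path in sorted(Path(log_dir).glob("*.json")):
        data = path.read_bytes()
        report.append((path.name, len(data), data.count(b"\\")))
    return report


# Synthetic stand-in for a _delta_log directory (made-up contents).
with tempfile.TemporaryDirectory() as tmp:
    log_dir = Path(tmp) / "_delta_log"
    log_dir.mkdir()
    # Early commit: the comment holds one escaped backslash (2 characters on disk).
    (log_dir / "00000000000000000000.json").write_text('{"comment": "a\\\\b"}')
    # Later commit: the same comment after repeated re-escaping.
    (log_dir / "00000000000000000008.json").write_text('{"comment": "a' + "\\" * 512 + 'b"}')
    report = backslash_report(log_dir)

for name, size, count in report:
    print(f"{name}: {size} bytes, {count} backslashes")
```

To use it on a real table, call backslash_report with the path to that table's _delta_log directory. A backslash count that multiplies from one overwrite commit to the next would match the behaviour described above.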