Hello,
Delta Lake 4.0 introduces support for a new VariantType (as outlined in the Delta 4.0 blog), and we’re excited about its potential! Specifically, Spark 4.0 now supports writing VariantType data to Parquet files, which opens up new possibilities for managing complex nested data.
We have a use case where we would like to query this VariantType from DuckDB.
Example:
Here is a simple Spark 4.0 Python code snippet that creates a DataFrame with a complex JSON string, converts it to a VariantType column, and writes it to a Parquet file:
# write.py
from pyspark.sql import SparkSession
from pyspark.sql.functions import parse_json

# Initialize Spark session with Spark 4.0
spark = SparkSession.builder \
    .appName("VariantTypeSupportApp") \
    .getOrCreate()

# Create a DataFrame with a complex JSON string
df1 = spark.createDataFrame([{
    'json_string': '''{
        "user": {
            "name": "Alice",
            "age": 30,
            "hobbies": ["reading", "swimming", "hiking"],
            "address": {
                "city": "Wonderland",
                "zip": "12345"
            }
        },
        "status": "active",
        "tags": ["admin", "user", "editor"]
    }'''
}])

# Parse the JSON string to a VariantType column
df2 = df1.select(
    parse_json(df1.json_string).alias("json_var")
)

# Write the DataFrame with VariantType to Parquet
df2.write.mode("append").parquet("variant_data.parquet")
print("Data written to Parquet file: variant_data.parquet")
Request:
We would love to have the ability to query this VariantType data in DuckDB. Since DuckDB already supports a variety of complex data types and Parquet files, adding support for VariantType would help improve interoperability and allow users to seamlessly analyze data written by Delta 4.0 and Spark 4.0.
We look forward to hearing your thoughts on adding this feature!
Thanks,