Request for DuckDB Support for Querying Delta 4.0 VariantType #126

soumilshah1995 · 2024-12-04T01:43:08Z

Hello,
Delta Lake 4.0 introduces support for a new VariantType (as outlined in the Delta 4.0 blog), and we’re excited about its potential! Specifically, Spark 4.0 now supports writing VariantType data to Parquet files, which opens up new possibilities for managing complex nested data.
Our use case requires querying this VariantType data from DuckDB.
Example:
Here is a simple Spark 4.0 Python code snippet that creates a DataFrame with a complex JSON string, converts it to a VariantType column, and writes it to a Parquet file:


# write.py
from pyspark.sql import SparkSession
from pyspark.sql.functions import parse_json

# Initialize Spark session with Spark 4.0
spark = SparkSession.builder \
    .appName("VariantTypeSupportApp") \
    .getOrCreate()

# Create a DataFrame with a complex JSON string
df1 = spark.createDataFrame([{
    'json_string': '''{
        "user": {
            "name": "Alice",
            "age": 30,
            "hobbies": ["reading", "swimming", "hiking"],
            "address": {
                "city": "Wonderland",
                "zip": "12345"
            }
        },
        "status": "active",
        "tags": ["admin", "user", "editor"]
    }'''
}])

# Parse the JSON string to a VariantType column
df2 = df1.select(
    parse_json(df1.json_string).alias("json_var")
)

# Write the DataFrame with VariantType to Parquet
df2.write.mode("append").parquet("variant_data.parquet")

print("Data written to Parquet file: variant_data.parquet")

Request:
We would love to have the ability to query this VariantType data in DuckDB. Since DuckDB already supports a variety of complex data types and Parquet files, adding support for VariantType would help improve interoperability and allow users to seamlessly analyze data written by Delta 4.0 and Spark 4.0.
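To make the request more concrete, here is a rough sketch of the kind of path lookups we would like to run against the variant column. It uses plain Python on the same JSON document rather than any DuckDB or Spark API. The `get_path` helper is hypothetical and exists only for illustration. Its `$.a.b[0]` path style is modeled on the paths accepted by Spark's `variant_get`, and it does not validate malformed paths.

```python
import json
import re

# Same document that write.py stores in the json_var column.
DOC = json.loads('''{
    "user": {
        "name": "Alice",
        "age": 30,
        "hobbies": ["reading", "swimming", "hiking"],
        "address": {"city": "Wonderland", "zip": "12345"}
    },
    "status": "active",
    "tags": ["admin", "user", "editor"]
}''')


def get_path(value, path):
    """Hypothetical helper: resolve a '$.a.b[0]'-style path against parsed JSON.

    Returns None when any step of the path is missing.
    """
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    # Each token is either .name (object key) or [n] (array index).
    tokens = re.findall(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]", path[1:])
    for key, index in tokens:
        if key:
            if not isinstance(value, dict) or key not in value:
                return None
            value = value[key]
        else:
            i = int(index)
            if not isinstance(value, list) or i >= len(value):
                return None
            value = value[i]
    return value


print(get_path(DOC, "$.user.address.city"))  # Wonderland
print(get_path(DOC, "$.user.hobbies[1]"))    # swimming
print(get_path(DOC, "$.tags[0]"))            # admin
```

Ideally the equivalent lookups would be expressible in DuckDB SQL directly against the Parquet files, without first converting the variant column back to a JSON string.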
We look forward to hearing your thoughts on adding this feature!
Thanks,
