Adding _dlt_load_id to Arrow fails when using adbc #3551

@KijkEr

Description

dlt version

1.20.0

Describe the problem

I have a source that yields a PyArrow Table. I'm trying to load this data into an MSSQL destination using the ADBC driver, and I want to add the _dlt_load_id column to the data I'm loading. That's why I've added:

[normalize.parquet_normalizer]
add_dlt_load_id = true

to config.toml.
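
The same option should also be settable through dlt's environment-variable configuration convention; a minimal sketch in Python (an assumption based on dlt's usual double-underscore naming, set before the pipeline is created):

import os

# equivalent of the config.toml entry above, using dlt's
# double-underscore environment variable naming
os.environ["NORMALIZE__PARQUET_NORMALIZER__ADD_DLT_LOAD_ID"] = "true"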
When I run the pipeline, I get the following error:

adbc_driver_manager.NotSupportedError: NOT_IMPLEMENTED: [mssql] Unsupported bind parameter or bulk ingest type dictionary<values=utf8, indices=int8, ordered=false> for field _dlt_load_id
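
The type in the error shows that the normalizer adds _dlt_load_id as a dictionary-encoded utf8 column, which the ADBC MSSQL driver reports as unsupported for bulk ingest. Below is a minimal PyArrow sketch (purely illustrative, not dlt's actual normalizer code; the load id value is made up) that builds such a column and shows how decoding it back to plain utf8 changes the schema:

import pyarrow as pa

# table shaped like the repro below, with a dictionary-encoded load id
# column mimicking the type from the error message
tbl = pa.table({"a": [1, 2], "b": [3, 4]})
load_id = pa.DictionaryArray.from_arrays(
    indices=pa.array([0, 0], type=pa.int8()),
    dictionary=pa.array(["1700000000.000000"]),  # hypothetical load id value
)
tbl = tbl.append_column("_dlt_load_id", load_id)
print(tbl.schema)  # _dlt_load_id: dictionary<values=string, indices=int8, ordered=0>

# decoding dictionary columns back to their value type yields a plain string
# column, which is the representation the driver would accept
decoded = pa.table(
    {
        name: col.cast(col.type.value_type) if pa.types.is_dictionary(col.type) else col
        for name, col in zip(tbl.column_names, tbl.columns)
    }
)
print(decoded.schema)  # _dlt_load_id: string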

Expected behavior

I expect that the data gets loaded to a table in the destination, containing the _dlt_load_id column.

Steps to reproduce

You need a SQL Server database or an Azure SQL Database as the destination.
Add the adbc-driver-manager and dbc packages to the project and install the MSSQL driver:

pip install adbc-driver-manager dbc
dbc install mssql

With the config.toml setting shown above in place, the following example demonstrates the issue:

import dlt
import polars as pl

CONN_STR = "mssql://username:password@host/database?trust_server_certificate=true"  # placeholder credentials


def data():
    # yields a PyArrow Table built from a Polars DataFrame
    df = pl.from_dict({"a": [1, 2], "b": [3, 4]})
    yield df.to_arrow()


test_pipeline = dlt.pipeline(
    pipeline_name="test",
    destination=dlt.destinations.mssql(CONN_STR),
    dataset_name="landing",
    progress="log",
    dev_mode=True,
)

# Run
info = test_pipeline.run(data=data(), loader_file_format="parquet")

print(info)
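
If the failure surfaces as an exception, a debugging variant of the run call (a sketch; it assumes the pipeline trace is available via last_trace) makes it easier to see which load job failed and why:

# Run, catching the load failure and printing the pipeline trace
try:
    info = test_pipeline.run(data=data(), loader_file_format="parquet")
    print(info)
except Exception:
    # the trace records the executed steps and the failed job's error message
    print(test_pipeline.last_trace)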

Operating system

macOS

Runtime environment

Local

Python version

3.13

dlt data source

Azure SQL Database

dlt destination

No response

Other deployment details

No response

Additional information

No response

Metadata

Assignees: no one assigned
Labels: bug (Something isn't working)
Projects status: Todo
Milestone: no milestone