Fix unionByName to properly handle missing columns from both DataFrames by mariotaddeucci · Pull Request #243 · duckdb/duckdb-python

mariotaddeucci · 2026-01-02T23:27:00Z

When allowMissingColumns=True, the method now correctly handles missing columns from both the left and right DataFrames by:

- Adding missing columns from the right DataFrame to the left as NULL
- Ensuring all columns from the left DataFrame are present in the right
- Properly aligning column order to match Spark's behavior

This ensures the union result contains all columns from both DataFrames, with NULL values where columns are missing, matching PySpark behavior.
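
The alignment described above can be sketched in plain Python. This is a minimal illustration, not the code from this PR: `union_by_name` is a hypothetical helper, tables are modeled as a list of column names plus a list of row tuples, and `None` stands in for NULL. It assumes the result schema is the left DataFrame's columns followed by the right-only columns in the right's order, which is how Spark documents `allowMissingColumns=True`.

```python
from typing import Any, Sequence

Row = tuple[Any, ...]


def union_by_name(
    left_cols: Sequence[str],
    left_rows: Sequence[Row],
    right_cols: Sequence[str],
    right_rows: Sequence[Row],
    allow_missing_columns: bool = False,
) -> tuple[list[str], list[Row]]:
    """Union two tables by column name (hypothetical helper, not the duckdb API)."""
    left_set, right_set = set(left_cols), set(right_cols)
    if not allow_missing_columns and left_set != right_set:
        raise ValueError("column sets differ and allow_missing_columns is False")

    # Left schema first, then right-only columns in the right's own order.
    out_cols = list(left_cols) + [c for c in right_cols if c not in left_set]

    def project(cols: Sequence[str], rows: Sequence[Row]) -> list[Row]:
        index = {name: i for i, name in enumerate(cols)}
        # None stands in for SQL NULL wherever a side lacks the column.
        return [
            tuple(row[index[c]] if c in index else None for c in out_cols)
            for row in rows
        ]

    return out_cols, project(left_cols, left_rows) + project(right_cols, right_rows)


# Wider DataFrame's extra column on the right.
cols, rows = union_by_name(
    ["a", "b"], [(1, 2)], ["b", "c"], [(3, 4)], allow_missing_columns=True
)
print(cols, rows)  # ['a', 'b', 'c'] [(1, 2, None), (None, 3, 4)]

# Reversed inputs: the left side is now the one missing column "a".
rev_cols, rev_rows = union_by_name(
    ["b", "c"], [(3, 4)], ["a", "b"], [(1, 2)], allow_missing_columns=True
)
print(rev_cols, rev_rows)  # ['b', 'c', 'a'] [(3, 4, None), (2, None, 1)]
```

The second call shows the reversed scenario this PR addresses: both sides are padded with NULLs, and the left schema still determines the leading column order.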

Copilot

Pull request overview

This PR fixes the unionByName method to properly handle missing columns from both DataFrames when allowMissingColumns=True. Previously, the method only handled missing columns from the right DataFrame, but not from the left one.

Key Changes:

- Updated the logic to add NULL columns for missing columns from both DataFrames
- Column order now matches Spark's behavior by prioritizing the left DataFrame's schema
- Added a test case to verify the reversed scenario works correctly

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
duckdb/experimental/spark/sql/dataframe.py	Rewrote the `unionByName` implementation to handle missing columns bidirectionally and align columns properly before performing the union
tests/fast/spark/test_spark_union_by_name.py	Added test case `test_union_by_name_allow_missing_cols_rev` to verify the fix works when the DataFrame with fewer columns is on the left side

evertlammerts · 2026-01-06T17:32:43Z

Can you fix the linting and formatting errors please? See https://duckdb.org/docs/stable/dev/building/python#3-enable-pre-commit-hooks for guidance.

evertlammerts

formatting

mariotaddeucci · 2026-01-09T02:03:00Z

Hey @evertlammerts, I believe the failing test is unrelated to this change, since it is in the pyarrow tests and not in the PySpark API.

evertlammerts · 2026-01-12T15:30:09Z

Thanks @mariotaddeucci !

Copilot AI review requested due to automatic review settings January 2, 2026 23:27

Copilot started reviewing on behalf of mariotaddeucci January 2, 2026 23:27

Copilot AI reviewed Jan 2, 2026

mariotaddeucci and others added 3 commits January 2, 2026 20:33

Update duckdb/experimental/spark/sql/dataframe.py

b8767db

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update duckdb/experimental/spark/sql/dataframe.py

cabba0d

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Merge branch 'main' into fix-unionbyname-missing-columns

6794f0a

evertlammerts requested changes Jan 6, 2026

mariotaddeucci and others added 2 commits January 6, 2026 21:38

fix formatting

8956d0e

Merge branch 'main' into fix-unionbyname-missing-columns

654b5b5

mariotaddeucci requested a review from evertlammerts January 7, 2026 01:31

evertlammerts and others added 2 commits January 12, 2026 10:54

disable release on main

89ed9a1

Merge branch 'main' into fix-unionbyname-missing-columns

b84e4fe

evertlammerts changed the base branch from main to v1.5-variegata January 12, 2026 12:59

Merge branch 'v1.5-variegata' into fix-unionbyname-missing-columns

b94529d

evertlammerts merged commit 1b6480b into duckdb:v1.5-variegata Jan 12, 2026
15 checks passed

mariotaddeucci deleted the fix-unionbyname-missing-columns branch January 12, 2026 16:05

evertlammerts merged 9 commits into duckdb:v1.5-variegata from mariotaddeucci:fix-unionbyname-missing-columns
