Implement FFI table provider factory #20326

Merged
timsaucer merged 6 commits into apache:main from davisp:17942-ffi-table-provider-factory
Feb 21, 2026

Member

@davisp davisp commented Feb 12, 2026

Which issue does this PR close?

This PR reopens PR #17994 and updates it to match the current FFI approach (i.e., I modeled it on FFI_TableProvider in various places).

Rationale for this change

Expose TableProviderFactory via FFI to enable external languages (e.g., Python) to implement custom table provider factories and extend DataFusion with new data source types.

What changes are included in this PR?

  • Added datafusion/ffi/src/table_provider_factory.rs with:

    • FFI_TableProviderFactory: Stable C ABI struct with function pointers for create, clone, release, and version
    • ForeignTableProviderFactory: Wrapper implementing TableProviderFactory trait
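
The struct described above follows the usual "table of function pointers plus opaque state" layout for a stable C ABI. Below is a minimal sketch of that layout, not the code from this PR: `FfiFactory`, `State`, and the `u64`-based `create` signature are hypothetical simplifications (the real `create` takes a session and a serialized command), and only the four entry points named in the list are modeled.

```rust
use std::ffi::c_void;

/// Hypothetical provider-side state hidden behind the opaque pointer.
struct State {
    base: u64,
}

/// Hypothetical stand-in for the FFI struct: a `#[repr(C)]` table of function
/// pointers plus an opaque pointer that only the producing library interprets.
#[repr(C)]
pub struct FfiFactory {
    create: unsafe extern "C" fn(factory: &FfiFactory, arg: u64) -> u64,
    clone: unsafe extern "C" fn(factory: &FfiFactory) -> FfiFactory,
    release: unsafe extern "C" fn(factory: &mut FfiFactory),
    version: extern "C" fn() -> u64,
    private_data: *mut c_void,
}

unsafe extern "C" fn create_fn(factory: &FfiFactory, arg: u64) -> u64 {
    // Only the library that built the struct knows the concrete state type.
    let state = unsafe { &*(factory.private_data as *const State) };
    state.base + arg
}

unsafe extern "C" fn clone_fn(factory: &FfiFactory) -> FfiFactory {
    let state = unsafe { &*(factory.private_data as *const State) };
    FfiFactory::new(state.base)
}

unsafe extern "C" fn release_fn(factory: &mut FfiFactory) {
    if !factory.private_data.is_null() {
        // Reclaim the boxed state exactly once, then null the pointer.
        drop(unsafe { Box::from_raw(factory.private_data as *mut State) });
        factory.private_data = std::ptr::null_mut();
    }
}

extern "C" fn version_fn() -> u64 {
    1
}

impl FfiFactory {
    pub fn new(base: u64) -> Self {
        Self {
            create: create_fn,
            clone: clone_fn,
            release: release_fn,
            version: version_fn,
            private_data: Box::into_raw(Box::new(State { base })) as *mut c_void,
        }
    }

    pub fn create(&self, arg: u64) -> u64 {
        unsafe { (self.create)(self, arg) }
    }

    pub fn version(&self) -> u64 {
        (self.version)()
    }
}

impl Clone for FfiFactory {
    fn clone(&self) -> Self {
        // Always go through the function pointer so the producer does the copy.
        unsafe { (self.clone)(self) }
    }
}

impl Drop for FfiFactory {
    fn drop(&mut self) {
        unsafe { (self.release)(self) }
    }
}
```

Routing `Clone` and `Drop` through the stored pointers means the library that allocated the state is also the one that copies and frees it, which is the point of carrying `clone` and `release` in the struct.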

Are these changes tested?

Yes

I've also added the integration tests as requested in the original PR.

Are there any user-facing changes?

Yes - new FFI API that enables custom TableProviderFactory implementations in foreign languages. This is an additive change with no breaking changes to existing APIs.

Also, I'd like to thank @Weijun-H for the initial version of this PR as it simplified getting up to speed on the serialization logic that I hadn't encountered yet.

@github-actions bot added the ffi (Changes to the ffi crate) label Feb 12, 2026
@timsaucer timsaucer self-requested a review February 12, 2026 21:36
Member

@timsaucer timsaucer left a comment

Really nice work and I appreciate the code coverage!

I have a few comments, but nothing I see as really blocking. Thanks for the work!


/// Used to create a clone of the factory. This should only need to be called
/// by the receiver of the factory.
pub clone: unsafe extern "C" fn(factory: &Self) -> Self,
Member

minor: I've come to realize that many of the structs in this crate make all of these functions pub when they don't need to be. I'm doing drive-by cleanup when I come across them.

Member Author

Fixed here.

));
column_defaults.insert(col_name.clone(), expr);
}

Member

Within this function there's a lot of rresult_return! macro calls. I bet we can just move the bulk of this function into a helper that returns a Result<_, DataFusionError> and then do the rresult_return! one time.

More generally, this feels like it should be lifted into the proto crate, but I'm not familiar enough with the CreateExternalTable command to see if there is a difference in the usage here as opposed to in LogicalPlanNode::try_into_logical_plan.
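
The first suggestion above (do the fallible work in a helper that returns a `Result`, then convert to the FFI result once) can be sketched as follows. This is an illustration under stated assumptions, not the crate's code: `FfiResult`, `parse_column_defaults`, and `column_defaults_ffi` are hypothetical names, and a plain enum stands in for the FFI-safe result type and the `rresult_return!` macro.

```rust
use std::collections::HashMap;
use std::num::ParseIntError;

/// Hypothetical stand-in for an FFI-safe result type.
#[derive(Debug, PartialEq)]
pub enum FfiResult<T> {
    Ok(T),
    Err(String),
}

/// Every fallible step lives here and uses `?`, so no per-step macro is needed.
fn parse_column_defaults(
    raw: &[(&str, &str)],
) -> Result<HashMap<String, i64>, ParseIntError> {
    let mut defaults = HashMap::new();
    for (name, value) in raw {
        let parsed: i64 = value.trim().parse()?;
        defaults.insert(name.to_string(), parsed);
    }
    Ok(defaults)
}

/// Boundary function: a single conversion from `Result` to the FFI result.
/// (A real boundary function would be `extern "C"` with FFI-safe arguments.)
pub fn column_defaults_ffi(raw: &[(&str, &str)]) -> FfiResult<usize> {
    match parse_column_defaults(raw) {
        Ok(defaults) => FfiResult::Ok(defaults.len()),
        Err(e) => FfiResult::Err(e.to_string()),
    }
}
```

Keeping the conversion at one place also keeps the helper reusable from ordinary Rust callers that never cross the FFI boundary.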

Member Author

I eyeballed it and near as I can tell they were exactly the same. And to be more specific, we don't want there to be a difference so I just swapped to using the builtin versions.

Fixed here.

    session: &dyn Session,
    cmd: &CreateExternalTable,
) -> Result<Arc<dyn TableProvider>> {
    let session = FFI_SessionRef::new(session, None, self.0.logical_codec.clone());
Member

Just to share more widely, when using FFI_SessionRef we do not maintain guarantees about lifetime when it gets shared over the FFI boundary. It is essential that your function here never keeps the session around any longer than it's valid. This looks like it's good to me.

Member Author

Hurms. Is that "essential" as in "UB if broken", or as in a "we panic with a note that you can't keep a reference to FFI sessions" message? I'm pretty sure there's nothing stopping extensions from keeping references, and how would that code even know the difference?
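
To make the concern in this thread concrete, here is a simplified model of a session reference that crosses an FFI boundary as a raw pointer. All names (`Session`, `SessionRef`, `Table`, `create_table`) are hypothetical, and the sketch does not settle the question above for the real `FFI_SessionRef`. It only shows why the compiler cannot enforce the rule: the wrapper carries no lifetime, so the safe pattern is to copy what is needed during the call and let the reference go.

```rust
/// Hypothetical session type owned by the calling library.
pub struct Session {
    pub batch_size: usize,
}

/// Hypothetical FFI reference: a raw pointer with no lifetime attached, so the
/// borrow checker cannot prevent a callee from storing it past its validity.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct SessionRef {
    ptr: *const Session,
}

impl SessionRef {
    pub fn new(session: &Session) -> Self {
        Self { ptr: session }
    }

    /// # Safety
    /// The session this reference was created from must still be alive.
    pub unsafe fn batch_size(&self) -> usize {
        unsafe { (*self.ptr).batch_size }
    }
}

/// Hypothetical table that keeps only owned data, never the reference itself.
pub struct Table {
    pub batch_size: usize,
}

/// Reads everything it needs while the call is in progress, then drops the
/// reference instead of storing it in the returned value.
pub fn create_table(session: SessionRef) -> Table {
    // SAFETY: the caller guarantees the session outlives this call.
    let batch_size = unsafe { session.batch_size() };
    Table { batch_size }
}
```

In this model, a callee that stored the `SessionRef` and read through it after the session was dropped would be dereferencing a dangling pointer, and nothing in the type would flag it.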

Comment on lines 346 to 362
let proto_cmd = CreateExternalTableNode {
    name: Some(cmd.name.clone().into()),
    location: cmd.location.clone(),
    file_type: cmd.file_type.clone(),
    schema: Some(cmd.schema.as_ref().try_into()?),
    table_partition_cols: cmd.table_partition_cols.clone(),
    if_not_exists: cmd.if_not_exists,
    or_replace: cmd.or_replace,
    temporary: cmd.temporary,
    order_exprs: converted_order_exprs,
    definition: cmd.definition.clone().unwrap_or_default(),
    unbounded: cmd.unbounded,
    options: cmd.options.clone(),
    constraints: Some(cmd.constraints.clone().into()),
    column_defaults: converted_column_defaults,
};

Member

Similar to the comment above, it feels like this should be lifted into a helper function and in the proto crate, but it's also a very real possibility there's something I'm missing in why it needs to be here.

Member Author

Also fixed by using the builtin AsLogicalPlan conversions.

Fixed here.

async fn create(
    &self,
    _session: &dyn Session,
    _cmd: &CreateExternalTable,
Member

Since this PR does include the proto conversion back and forth of the command, we probably want to at least test this command in some way.

Member Author

I assume you mean since those conversions were reimplemented that we should test them. I've since reverted that and used the builtin ones so I assume you're ok with not adding a test for what I assume is already tested code.

Comment on lines 428 to 429
let ffi_factory =
    FFI_TableProviderFactory::new(factory, None, task_ctx_provider, None);
Member

We should overwrite the library marker method in this test so that it does create the Foreign struct.
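
The reason for overriding the marker can be sketched as follows. This is a simplified model with hypothetical names (`FfiFactory`, `Imported`, `local_marker_id`, `mock_foreign_marker_id`, `import`), not the crate's actual types: it assumes the FFI struct carries a function pointer that identifies the library that produced it, and that the importing side skips the foreign wrapper when the marker matches its own library.

```rust
/// Signature of the hypothetical marker function carried by the FFI struct.
pub type MarkerFn = extern "C" fn() -> usize;

// Distinct statics give each marker function a distinct address to report.
static LOCAL_MARKER: u8 = 0;
static MOCK_FOREIGN_MARKER: u8 = 1;

/// Identifies "this" library by the address of one of its statics.
pub extern "C" fn local_marker_id() -> usize {
    &LOCAL_MARKER as *const u8 as usize
}

/// Test-only marker that pretends the struct came from another library.
pub extern "C" fn mock_foreign_marker_id() -> usize {
    &MOCK_FOREIGN_MARKER as *const u8 as usize
}

/// Hypothetical FFI struct, reduced to a name and its marker.
pub struct FfiFactory {
    pub name: String,
    pub marker: MarkerFn,
}

/// What the importing side ends up holding.
#[derive(Debug, PartialEq)]
pub enum Imported {
    /// Same library: the original object is used directly.
    Local(String),
    /// Different library: every call goes through the FFI wrapper.
    Foreign(String),
}

pub fn import(ffi: &FfiFactory) -> Imported {
    if (ffi.marker)() == local_marker_id() {
        Imported::Local(ffi.name.clone())
    } else {
        Imported::Foreign(ffi.name.clone())
    }
}
```

Under these assumptions, a test that leaves the marker untouched only exercises the local shortcut; swapping in the mock marker is what forces the foreign path, including the serialization round trip, to run.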

Member Author

Fixed here.

I'd also note that this actually does end up round-tripping a CreateExternalTable command.

@davisp davisp force-pushed the 17942-ffi-table-provider-factory branch from 4912f5e to 92f352b Compare February 20, 2026 21:27
@davisp davisp requested a review from timsaucer February 20, 2026 21:39
Member

@timsaucer timsaucer left a comment

Thank you for the work on this! Much appreciated!

@timsaucer timsaucer added this pull request to the merge queue Feb 21, 2026
@timsaucer timsaucer removed this pull request from the merge queue due to a manual request Feb 21, 2026
@timsaucer timsaucer changed the title from "17942 ffi table provider factory" to "Implement FFI table provider factory" Feb 21, 2026
@timsaucer timsaucer added this pull request to the merge queue Feb 21, 2026
Merged via the queue into apache:main with commit 0d63ced Feb 21, 2026
29 checks passed

Labels

ffi Changes to the ffi crate

Development

Successfully merging this pull request may close these issues.

expose TableProviderFactory via FFI

3 participants