Implement FFI table provider factory#20326
Conversation
timsaucer
left a comment
There was a problem hiding this comment.
Really nice work and I appreciate the code coverage!
I have a few comments, but nothing I see as really blocking. Thanks for the work!
|
|
||
| /// Used to create a clone of the factory. This should only need to be called | ||
| /// by the receiver of the factory. | ||
| pub clone: unsafe extern "C" fn(factory: &Self) -> Self, |
There was a problem hiding this comment.
minor: I've come to realize that many of the structs in this crate have these functions all pub but don't need to be. I'm doing drive by cleanup when I come across them.
| )); | ||
| column_defaults.insert(col_name.clone(), expr); | ||
| } | ||
|
|
There was a problem hiding this comment.
Within this function there's a lot of rresult_return! macro calls. I bet we can just move the bulk of this function into a helper that returns a Result<_, DataFusionError> and then do the rresult_return! one time.
More generally, this feels like it should be lifted into the proto crate, but I'm not familiar enough with the CreateExternalTable command to see if there is a difference in the usage here as opposed to in LogicalPlanNode::try_into_logical_plan.
There was a problem hiding this comment.
I eyeballed it and near as I can tell they were exactly the same. And to be more specific, we don't want there to be a difference so I just swapped to using the builtin versions.
Fixed here.
| session: &dyn Session, | ||
| cmd: &CreateExternalTable, | ||
| ) -> Result<Arc<dyn TableProvider>> { | ||
| let session = FFI_SessionRef::new(session, None, self.0.logical_codec.clone()); |
There was a problem hiding this comment.
Just to share more widely, when using FFI_SessionRef we do not maintain guarantees about lifetime when it gets shared over the FFI boundary. It is essential that your function here never keeps the session around any longer than it's valid. This looks like it's good to me.
There was a problem hiding this comment.
Hurms. Is that "essential" as in "UB if broken" or "we panic with a note that you can't keep a reference to FFI sessions" message. Because I'm pretty sure there's nothing stopping extensions from keeping references and how would that code even know the difference?
| let proto_cmd = CreateExternalTableNode { | ||
| name: Some(cmd.name.clone().into()), | ||
| location: cmd.location.clone(), | ||
| file_type: cmd.file_type.clone(), | ||
| schema: Some(cmd.schema.as_ref().try_into()?), | ||
| table_partition_cols: cmd.table_partition_cols.clone(), | ||
| if_not_exists: cmd.if_not_exists, | ||
| or_replace: cmd.or_replace, | ||
| temporary: cmd.temporary, | ||
| order_exprs: converted_order_exprs, | ||
| definition: cmd.definition.clone().unwrap_or_default(), | ||
| unbounded: cmd.unbounded, | ||
| options: cmd.options.clone(), | ||
| constraints: Some(cmd.constraints.clone().into()), | ||
| column_defaults: converted_column_defaults, | ||
| }; | ||
|
|
There was a problem hiding this comment.
Similar to the comment above, it feels like this should be lifted into a helper function and in the proto crate, but it's also a very real possibility there's something I'm missing in why it needs to be here.
There was a problem hiding this comment.
Also fixed by using the builtin AsLogicalPlan conversions.
Fixed here.
| async fn create( | ||
| &self, | ||
| _session: &dyn Session, | ||
| _cmd: &CreateExternalTable, |
There was a problem hiding this comment.
Since this PR does include the proto conversion back and forth of the command, we probably want to at least test this command in some way.
There was a problem hiding this comment.
I assume you mean since those conversions were reimplemented that we should test them. I've since reverted that and used the builtin ones so I assume you're ok with not adding a test for what I assume is already tested code.
| let ffi_factory = | ||
| FFI_TableProviderFactory::new(factory, None, task_ctx_provider, None); |
There was a problem hiding this comment.
We should overwrite the library marker method in this test so that it does create the Foreign struct.
There was a problem hiding this comment.
Fixed here.
I'd also note, that this actually does end up round tripping a CreateExternalCommand.
4912f5e to
92f352b
Compare
timsaucer
left a comment
There was a problem hiding this comment.
Thank you for the work on this! Much appreciated!
This PR is re-opening PR #17994 and updating it to match the current FFI approach (I.e., I made it look like the FFI_TableProvider in various places).
I've also added the integration tests as requested in the original PR.
Also, I'd like to thank @Weijun-H for the initial version of this PR as it simplified getting up to speed on the serialization logic that I hadn't encountered yet.