[substrait] Add support for ExtensionTable #13772

Draft
ccciudatu wants to merge 1 commit into main

@ccciudatu ccciudatu commented Dec 13, 2024

Which issue does this PR close?

Closes #13771.
Addresses the first bullet in #13318.

Rationale for this change

Adds support for encoding/decoding custom table providers as ExtensionTables in Substrait.

What changes are included in this PR?

Two more methods on SerializerRegistry to handle TableSources, plus the necessary changes in from_substrait_plan and to_substrait_plan to transparently map custom tables to ExtensionTable nodes.

Are these changes tested?

A round trip test is included.

Are there any user-facing changes?

No breaking changes, only a couple of convenience changes, such as default implementations for trait methods.

@github-actions github-actions bot added logical-expr Logical plan and expressions substrait labels Dec 13, 2024
@ccciudatu ccciudatu force-pushed the substrait-extension-tables branch from a65e926 to bac38cc Compare December 13, 2024 22:05
@github-actions github-actions bot added the core Core DataFusion crate label Dec 14, 2024
"Deserializing user defined logical plan node `{name}` is not supported"
)
}
}
Contributor Author

The SerializerRegistry trait now has two more methods for handling tables (with default implementations for backwards compatibility), so it makes sense for the existing methods to have default implementations as well.
This will allow implementors to conveniently implement the trait for user-defined logical nodes only or for tables only.
Since the implementations here are perfect as trait defaults, this PR just moves them into the trait itself.
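To illustrate the point about defaults, here is a minimal sketch of a registry trait where every method has a default implementation, so an implementor can override only the table-related pair. The types and signatures are simplified stand-ins (the real trait works with DataFusion's plan node and table source types), and the method names `serialize_custom_table` / `deserialize_custom_table` are assumptions made for illustration, not necessarily the names used in this PR.

```rust
// Hypothetical, simplified stand-ins for DataFusion types; only the shape matters.
pub type Result<T> = std::result::Result<T, String>;

#[derive(Debug, PartialEq)]
pub struct CustomTable {
    pub name: String,
    pub rows: u32,
}

pub trait SerializerRegistry {
    // Defaults for user-defined logical nodes: report "not supported".
    fn serialize_logical_plan(&self, node_name: &str) -> Result<Vec<u8>> {
        Err(format!(
            "Serializing user defined logical plan node `{node_name}` is not supported"
        ))
    }

    fn deserialize_logical_plan(&self, name: &str, _bytes: &[u8]) -> Result<String> {
        Err(format!(
            "Deserializing user defined logical plan node `{name}` is not supported"
        ))
    }

    // Defaults for tables (assumed names): `None` means "not a custom table".
    fn serialize_custom_table(&self, _table: &CustomTable) -> Result<Option<Vec<u8>>> {
        Ok(None)
    }

    fn deserialize_custom_table(&self, name: &str, _bytes: &[u8]) -> Result<CustomTable> {
        Err(format!("Deserializing custom table `{name}` is not supported"))
    }
}

// An implementor that only cares about tables overrides just two methods.
pub struct TablesOnly;

impl SerializerRegistry for TablesOnly {
    fn serialize_custom_table(&self, table: &CustomTable) -> Result<Option<Vec<u8>>> {
        // Toy encoding: the row count as decimal text.
        Ok(Some(table.rows.to_string().into_bytes()))
    }

    fn deserialize_custom_table(&self, name: &str, bytes: &[u8]) -> Result<CustomTable> {
        let text = String::from_utf8(bytes.to_vec()).map_err(|e| e.to_string())?;
        let rows = text.parse::<u32>().map_err(|e| e.to_string())?;
        Ok(CustomTable { name: name.to_string(), rows })
    }
}
```

With this shape, `TablesOnly` round-trips a table through the two overridden methods, while calls for user-defined logical nodes fall through to the "not supported" defaults.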

@ccciudatu ccciudatu force-pushed the substrait-extension-tables branch from 8db92b8 to a12007b Compare December 16, 2024 14:16
@@ -994,8 +994,34 @@ pub async fn from_substrait_rel(
)
.await
}
_ => {
not_impl_err!("Unsupported ReadType: {:?}", &read.as_ref().read_type)
Some(ReadType::ExtensionTable(ext)) => {
Contributor

I didn't actually get to handling ExtensionTables in #13803, but I think I've got most of what we would need in place for it.

The way I would approach it, based on the work in that PR, is to add a method to the SubstraitConsumer trait like:

async fn consume_extension_table(&self, extension_table: &ExtensionTable) -> Result<LogicalPlan>

and wire it in here. Then, as a user, you would be able to provide your own implementation of the decoder. This might use the SerializerRegistry, but it doesn't necessarily need to.
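A rough sketch of that wiring, under stated assumptions: the types below are simplified stand-ins rather than the real Substrait or DataFusion ones, the method is synchronous here for brevity (the proposed one is `async`), and `from_read_type`, `DefaultConsumer`, and `MyConsumer` are hypothetical names introduced only for this example.

```rust
// Hypothetical, simplified stand-ins; not the real Substrait/DataFusion types.
pub type Result<T> = std::result::Result<T, String>;

#[derive(Debug)]
pub struct ExtensionTable {
    pub type_url: String,
    pub payload: Vec<u8>,
}

pub enum ReadType {
    NamedTable(String),
    ExtensionTable(ExtensionTable),
}

#[derive(Debug, PartialEq)]
pub enum LogicalPlan {
    TableScan { table_name: String },
}

pub trait SubstraitConsumer {
    // Default: extension tables are unsupported unless the user overrides this.
    fn consume_extension_table(&self, extension_table: &ExtensionTable) -> Result<LogicalPlan> {
        Err(format!("Unsupported ExtensionTable: {}", extension_table.type_url))
    }
}

// Stand-in for the ReadType match: the ExtensionTable arm delegates to the consumer.
pub fn from_read_type(
    consumer: &impl SubstraitConsumer,
    read_type: Option<&ReadType>,
) -> Result<LogicalPlan> {
    match read_type {
        Some(ReadType::NamedTable(name)) => Ok(LogicalPlan::TableScan {
            table_name: name.clone(),
        }),
        Some(ReadType::ExtensionTable(ext)) => consumer.consume_extension_table(ext),
        None => Err("Unsupported ReadType: None".to_string()),
    }
}

// Keeps the default behaviour.
pub struct DefaultConsumer;
impl SubstraitConsumer for DefaultConsumer {}

// A user-provided decoder; here the payload is simply the table name as UTF-8.
pub struct MyConsumer;
impl SubstraitConsumer for MyConsumer {
    fn consume_extension_table(&self, extension_table: &ExtensionTable) -> Result<LogicalPlan> {
        let table_name =
            String::from_utf8(extension_table.payload.clone()).map_err(|e| e.to_string())?;
        Ok(LogicalPlan::TableScan { table_name })
    }
}
```

The decoding logic lives entirely in the user's consumer, so it could call into a SerializerRegistry or skip it altogether.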

Contributor Author

That's great! I like where that is going.
My goal here was to add the missing support for reading & writing extension tables leveraging only what's available (to keep the patch as small as possible).
But I do agree that uniform handling of Substrait extensions would make more sense and would overcome some of the current limitations.

I think the code here is really easy to migrate to the new SubstraitConsumer (and SubstraitProducer?) interfaces once they're available, by just replacing the SerializerRegistry calls with the new dedicated read/write methods. But the rest would be mostly unchanged by this migration.
FWIW, once your refactoring is complete, I think there would be no place for SerializerRegistry anymore, and it should be removed (or at least deprecated).

Contributor

> (and SubstraitProducer?)

I haven't gotten around to the producer yet, but if the SubstraitConsumer pattern makes sense to folks it should be easy enough to hammer it out.

> I think the code here is really easy to migrate to the new SubstraitConsumer

Would you be open to having #13803 merged first, and then porting your code over?

Contributor Author

Sure, I can do that.

Contributor

alamb commented Dec 21, 2024

@alamb alamb marked this pull request as draft December 21, 2024 02:57