Is your feature request related to a problem or challenge?
Custom TableProvider implementations cannot currently be encoded as ExtensionTables in Substrait. The Substrait plan will only retain the table names, which are hardly ever enough to restore the table definition on the consumer side. For UDTFs in particular, the name is completely useless, as it's always tmp_table.
Describe the solution you'd like
Add two more methods in SerializerRegistry for serializing/deserializing TableSource instances and use these new extensions in to_substrait_plan and from_substrait_plan to let users encode/decode custom table definitions as ExtensionTables.
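To make the shape of the proposal concrete, here is a minimal, self-contained sketch of what such a pair of methods could look like. Everything in it is an assumption for illustration: the `TableSource` trait below is a simplified stand-in rather than DataFusion's real trait, and `TableSourceSerializer`, `serialize_custom_table`, `deserialize_custom_table`, `CsvTable` and `MyRegistry` are hypothetical names, not an existing API.

```rust
use std::collections::HashMap;

/// Simplified stand-in for DataFusion's `TableSource` (assumption: the real
/// trait looks different; these two methods exist only for this sketch).
pub trait TableSource {
    /// Hypothetical tag used to pick the right decoder on the consumer side.
    fn type_name(&self) -> &str;
    /// Hypothetical self-encoding of the table definition.
    fn encode(&self) -> Vec<u8>;
}

/// Hypothetical custom table whose whole definition is a file path.
pub struct CsvTable {
    pub path: String,
}

impl TableSource for CsvTable {
    fn type_name(&self) -> &str {
        "csv_table"
    }
    fn encode(&self) -> Vec<u8> {
        self.path.clone().into_bytes()
    }
}

/// Sketch of the two proposed additions (hypothetical names and signatures).
pub trait TableSourceSerializer {
    /// Returns a type tag plus opaque bytes that could be stored in an ExtensionTable.
    fn serialize_custom_table(&self, table: &dyn TableSource) -> Result<(String, Vec<u8>), String>;
    /// Rebuilds the table definition from the tag and bytes.
    fn deserialize_custom_table(&self, kind: &str, bytes: &[u8]) -> Result<Box<dyn TableSource>, String>;
}

type Decoder = fn(&[u8]) -> Result<Box<dyn TableSource>, String>;

fn decode_csv(bytes: &[u8]) -> Result<Box<dyn TableSource>, String> {
    let path = String::from_utf8(bytes.to_vec()).map_err(|e| e.to_string())?;
    Ok(Box::new(CsvTable { path }))
}

/// User-provided registry that knows how to decode each custom table kind.
pub struct MyRegistry {
    decoders: HashMap<String, Decoder>,
}

impl MyRegistry {
    pub fn new() -> Self {
        let mut decoders: HashMap<String, Decoder> = HashMap::new();
        decoders.insert("csv_table".to_string(), decode_csv as Decoder);
        Self { decoders }
    }
}

impl TableSourceSerializer for MyRegistry {
    fn serialize_custom_table(&self, table: &dyn TableSource) -> Result<(String, Vec<u8>), String> {
        Ok((table.type_name().to_string(), table.encode()))
    }

    fn deserialize_custom_table(&self, kind: &str, bytes: &[u8]) -> Result<Box<dyn TableSource>, String> {
        let decode = self
            .decoders
            .get(kind)
            .ok_or_else(|| format!("no decoder registered for table kind '{kind}'"))?;
        decode(bytes)
    }
}
```

In this sketch the producer would call `serialize_custom_table` while building the plan and the consumer would call `deserialize_custom_table` when it meets the extension table, so the table name no longer has to carry any meaning.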
Describe alternatives you've considered
For the cases where the user controls the table name, one hideous workaround would be to encode the whole table definition in the table name and register a custom schema provider to decode it (e.g. some_catalog.custom_schema."base64(proto_binary)"). This is a horrible hack as it requires using those names in SQL queries and it doesn't work for UDTFs.
A far better alternative is to leverage the already supported Substrait extensions (in particular, ExtensionLeaf), by implementing the SerializerRegistry trait and forcing the table to fit into a UserDefinedLogicalNode.
However, this approach is both limited and unnatural:
Logical plans have to be preprocessed in order to replace TableScans with Extension nodes before converting to Substrait.
The logical plan resulting from decoding Substrait can only be executed if an ExtensionPlanner is registered for handling the user-defined nodes, but in this case it would not benefit from the special treatment that tables get in DataFusion (projections and filter pushdowns for scan, various knobs to instruct the engine about the table capabilities etc.). Rewriting the decoded plan to convert the user-defined node back to a TableScan is the only way to benefit from all that.
Substrait also encodes projections only for tables (i.e. ReadRels), so an ExtensionLeaf can't make use of that. Rewriting the Substrait plan itself to overcome this limitation is way more tedious than it should be.
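The round trip described in the points above can be illustrated with a toy model. This is only a sketch under stated assumptions: the `Plan` enum and the two rewrite functions are hypothetical and far simpler than DataFusion's `LogicalPlan`; they only show why swapping scans for extension leaves and back loses table-specific information such as projections.

```rust
/// Toy logical plan (assumption: a heavily simplified stand-in for `LogicalPlan`).
#[derive(Debug, Clone, PartialEq)]
pub enum Plan {
    /// A table scan that can carry a projection, as tables do.
    TableScan { table: String, projection: Option<Vec<usize>> },
    /// An opaque extension leaf: it only carries bytes.
    Extension { payload: Vec<u8> },
    Filter { predicate: String, input: Box<Plan> },
}

/// Producer-side preprocessing: replace every scan with an extension leaf.
pub fn scans_to_extensions(plan: Plan) -> Plan {
    match plan {
        // The projection has nowhere to go in the extension leaf, so it is dropped here.
        Plan::TableScan { table, .. } => Plan::Extension { payload: table.into_bytes() },
        Plan::Filter { predicate, input } => Plan::Filter {
            predicate,
            input: Box::new(scans_to_extensions(*input)),
        },
        other => other,
    }
}

/// Consumer-side postprocessing: turn extension leaves back into scans.
pub fn extensions_to_scans(plan: Plan) -> Plan {
    match plan {
        Plan::Extension { payload } => Plan::TableScan {
            table: String::from_utf8_lossy(&payload).into_owned(),
            // Nothing to restore: the projection was never encoded.
            projection: None,
        },
        Plan::Filter { predicate, input } => Plan::Filter {
            predicate,
            input: Box::new(extensions_to_scans(*input)),
        },
        other => other,
    }
}
```

In this toy model a scan with a projection comes back without it after the two rewrites, which mirrors the complaint that an ExtensionLeaf cannot reuse what Substrait already encodes for ReadRels.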
Additional context
This should be a child issue of #13318.