-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: full e2e pipeline #26
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Yuchen Liang <[email protected]>
Signed-off-by: Yuchen Liang <[email protected]>
Signed-off-by: Yuchen Liang <[email protected]>
Signed-off-by: Yuchen Liang <[email protected]>
Signed-off-by: Yuchen Liang <[email protected]>
Signed-off-by: Yuchen Liang <[email protected]>
Signed-off-by: Yuchen Liang <[email protected]>
Signed-off-by: Yuchen Liang <[email protected]>
Signed-off-by: Yuchen Liang <[email protected]>
Signed-off-by: Yuchen Liang <[email protected]>
apache/datafusion#14330 Signed-off-by: Yuchen Liang <[email protected]>
07ec3e3
to
7f9449b
Compare
Signed-off-by: Yuchen Liang <[email protected]>
) -> Result<Vec<(PhysicalExpressionId, Arc<PhysicalExpression>)>>; | ||
|
||
/// Adds a physical expression to an existing group in the memo table. | ||
async fn add_physical_expr_to_group( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note that it's possible for a physical expr not to belong to any group with the optd fork design, so probably we need to think twice about the physexpr related apis
} | ||
|
||
#[async_recursion] | ||
pub async fn match_any_partial_logical_plan( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider codegen this with the facilities in #25
} | ||
|
||
/// Creates a physical project operator. | ||
pub fn project<Relation, Scalar>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
naming: need to distinguish from logical project?
|
||
/// A scalar operator that adds two values. | ||
#[derive(Debug, Clone, PartialEq, Deserialize)] | ||
pub struct And<Scalar> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should collapse binary ops and logical ops into a single class like
pub struct LogicalOp<Scalar> {
type: Or/And
}
pub struct BinaryOp<Scalar> {
type: Add/Minus/...
}
which helps reduce the num of lines of code
/// Inserts an entry into the `physical_expressions` table. | ||
async fn insert_into_physical_expressions( | ||
txn: &mut SqliteConnection, | ||
logical_expr_id: PhysicalExpressionId, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename to physical_expr_id
@@ -544,7 +757,9 @@ const fn get_all_scalar_exprs_in_group_query() -> &'static str { | |||
" UNION ALL ", | |||
"SELECT scalar_expression_id, json_object('Add', json_object('left', left_group_id, 'right', right_group_id)) as data FROM scalar_adds WHERE group_id = $1", | |||
" UNION ALL ", | |||
"SELECT scalar_expression_id, json_object('Equal', json_object('left', left_group_id, 'right', right_group_id)) as data FROM scalar_equals WHERE group_id = $1" | |||
"SELECT scalar_expression_id, json_object('Equal', json_object('left', left_group_id, 'right', right_group_id)) as data FROM scalar_equals WHERE group_id = $1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought And
will be represented as a list of expressions instead of left/right
DFLogicalPlan::Join(join) => { | ||
let mut join_cond = Vec::new(); | ||
for (left, right) in &join.on { | ||
let left = Self::conv_df_to_optd_scalar(left, join.left.schema(), 0)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would recommend make conversion + column ref rewrite into two passes because they are two different things
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise every scalar conversion function has to take an offset with it. Or you can make such offset into the context.
pub mod converter; | ||
pub mod planner; | ||
|
||
pub async fn run_queries(queries: String) -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider vendor datafusion-cli instead of implementing this by ourselves
} | ||
/// Utility function to create a session context for datafusion + optd. | ||
/// The allow deprecated is for the RuntimeConfig | ||
#[allow(deprecated)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use the new API from datafusion instead of marking allow deprecated
use optd_datafusion::run_queries; | ||
use std::io; | ||
|
||
#[tokio::main] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
vendor datafusion-cli instead of implementing a less functional cli?
Problem
With the initial representation and storage added in #4 and #22, we now want to support the full pipeline going from parsing SQL, optimizing the plan using optd, and executing the query in Datafusion.
Summary of changes