Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: full e2e pipeline #26

Open
wants to merge 13 commits into
base: main
Choose a base branch
from
Open

feat: full e2e pipeline #26

wants to merge 13 commits into from

Conversation

yliang412
Copy link
Member

@yliang412 yliang412 commented Feb 11, 2025

Problem

With the initial representation and storage added in #4 and #22, we now want to support the full pipeline going from parsing SQL, optimizing the plan using optd, and executing the query in Datafusion.

Summary of changes

SarveshOO7 and others added 9 commits February 10, 2025 12:53
@codecov-commenter
Copy link

codecov-commenter commented Feb 11, 2025

Codecov Report

Attention: Patch coverage is 88.73500% with 122 lines in your changes missing coverage. Please review.

Project coverage is 83.5%. Comparing base (44d26e2) to head (cd7ef16).

Files with missing lines Patch % Lines
optd-datafusion/src/converter/into_optd.rs 69.1% 37 Missing ⚠️
optd-core/src/storage/memo.rs 91.0% 16 Missing ⚠️
optd-datafusion/src/converter/from_optd.rs 90.4% 12 Missing ⚠️
optd-datafusion/src/lib.rs 87.8% 10 Missing ⚠️
optd-core/src/operators/relational/physical/mod.rs 0.0% 8 Missing ⚠️
optd-core/src/values/mod.rs 53.3% 7 Missing ⚠️
...d-core/src/operators/relational/logical/project.rs 33.3% 6 Missing ⚠️
...-core/src/operators/relational/physical/project.rs 33.3% 6 Missing ⚠️
optd-datafusion/src/main.rs 80.7% 5 Missing ⚠️
optd-core/src/cascades/mod.rs 98.8% 4 Missing ⚠️
... and 4 more
Additional details and impacted files
Files with missing lines Coverage Δ
...c/operators/relational/physical/scan/table_scan.rs 50.0% <ø> (+50.0%) ⬆️
optd-core/src/test_utils.rs 100.0% <100.0%> (ø)
optd-datafusion/src/converter/mod.rs 100.0% <100.0%> (ø)
optd-core/src/operators/relational/logical/mod.rs 78.4% <77.7%> (-0.2%) ⬇️
optd-core/src/operators/scalar/mod.rs 70.3% <75.0%> (+0.8%) ⬆️
optd-core/src/operators/scalar/and.rs 50.0% <50.0%> (ø)
optd-core/src/cascades/mod.rs 98.9% <98.8%> (-0.3%) ⬇️
optd-datafusion/src/planner.rs 94.5% <94.5%> (ø)
optd-datafusion/src/main.rs 80.7% <80.7%> (ø)
...d-core/src/operators/relational/logical/project.rs 33.3% <33.3%> (ø)
... and 7 more

... and 3 files with indirect coverage changes

Signed-off-by: Yuchen Liang <[email protected]>
) -> Result<Vec<(PhysicalExpressionId, Arc<PhysicalExpression>)>>;

/// Adds a physical expression to an existing group in the memo table.
async fn add_physical_expr_to_group(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note that it's possible for a physical expr not to belong to any group with the optd fork design, so probably we need to think twice about the physexpr related apis

}

#[async_recursion]
pub async fn match_any_partial_logical_plan(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider codegen this with the facilities in #25

}

/// Creates a physical project operator.
pub fn project<Relation, Scalar>(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming: need to distinguish from logical project?


/// A scalar operator that adds two values.
#[derive(Debug, Clone, PartialEq, Deserialize)]
pub struct And<Scalar> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should collapse binary ops and logical ops into a single class like

pub struct LogicalOp<Scalar> {
  type: Or/And
}

pub struct BinaryOp<Scalar> {
  type: Add/Minus/...
}

which helps reduce the num of lines of code

/// Inserts an entry into the `physical_expressions` table.
async fn insert_into_physical_expressions(
txn: &mut SqliteConnection,
logical_expr_id: PhysicalExpressionId,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename to physical_expr_id

@@ -544,7 +757,9 @@ const fn get_all_scalar_exprs_in_group_query() -> &'static str {
" UNION ALL ",
"SELECT scalar_expression_id, json_object('Add', json_object('left', left_group_id, 'right', right_group_id)) as data FROM scalar_adds WHERE group_id = $1",
" UNION ALL ",
"SELECT scalar_expression_id, json_object('Equal', json_object('left', left_group_id, 'right', right_group_id)) as data FROM scalar_equals WHERE group_id = $1"
"SELECT scalar_expression_id, json_object('Equal', json_object('left', left_group_id, 'right', right_group_id)) as data FROM scalar_equals WHERE group_id = $1",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought And will be represented as a list of expressions instead of left/right

DFLogicalPlan::Join(join) => {
let mut join_cond = Vec::new();
for (left, right) in &join.on {
let left = Self::conv_df_to_optd_scalar(left, join.left.schema(), 0)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend make conversion + column ref rewrite into two passes because they are two different things

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise every scalar conversion function has to take an offset with it. Or you can make such offset into the context.

pub mod converter;
pub mod planner;

pub async fn run_queries(queries: String) -> Result<()> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider vendor datafusion-cli instead of implementing this by ourselves

}
/// Utility function to create a session context for datafusion + optd.
/// The allow deprecated is for the RuntimeConfig
#[allow(deprecated)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use the new API from datafusion instead of marking allow deprecated

use optd_datafusion::run_queries;
use std::io;

#[tokio::main]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vendor datafusion-cli instead of implementing a less functional cli?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants