Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor parsing pipeline #1200

Open
wants to merge 24 commits into
base: main
Choose a base branch
from
Open

Conversation

ericvergnaud
Copy link
Contributor

@ericvergnaud ericvergnaud commented Nov 13, 2024

Refactors the LogicalPlan pipeline such that it's possible to use the original TokenStream to fetch the comment tokens and apply them appropriately to the plan items.

Progresses #869

Supersedes #1191

Copy link

github-actions bot commented Nov 13, 2024

Coverage tests results

448 tests  +448   414 ✅ +414   4s ⏱️ +4s
  6 suites +  6    34 💤 + 34 
  6 files   +  6     0 ❌ ±  0 

Results for commit c8b5af8. ± Comparison against base commit 7601ced.

♻️ This comment has been updated with latest results.

@ericvergnaud ericvergnaud marked this pull request as draft November 14, 2024 07:46
@ericvergnaud ericvergnaud changed the base branch from refactor-Origin-in-TreeNode to main November 14, 2024 08:08
@ericvergnaud ericvergnaud marked this pull request as ready for review November 14, 2024 08:29
@jimidle
Copy link
Contributor

jimidle commented Nov 14, 2024

I'm not quite sure about this. We now have more code in the individual PlanParsers and I am not sure that we need to pass the token stream around as we could just call one or more processors on it after it is created. Sort of like the optimizer applies many rules to the LogicalPlan.

@nfx nfx requested a review from jimidle November 14, 2024 14:43
Copy link
Contributor

@jimidle jimidle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that this is the correct approach. The separate steps are now one big step and we have moved code from the generic PlanParser into each of the two implementations. Eventually we will have N implementations and any changes to the parsing sequence would have to be made in every instance.

I think that we need to start again with the comment processing and first write a design/strategy document, stealing from:

https://docs.google.com/document/d/1s3nvTklaFgt4a-u_lUhR5tQO_L-S_Br-YAtbF-XxhSw/template/preview

@ericvergnaud
Copy link
Contributor Author

I'm not quite sure about this. We now have more code in the individual PlanParsers and I am not sure that we need to pass the token stream around as we could just call one or more processors on it after it is created. Sort of like the optimizer applies many rules to the LogicalPlan.

The existing implementation forbids that since the CommonTokenStream is transient. We need simultaneous access to the CommonTokenStream and the generated LogicalPlan.

@ericvergnaud
Copy link
Contributor Author

ericvergnaud commented Nov 14, 2024

I don't think that this is the correct approach. The separate steps are now one big step and we have moved code from the generic PlanParser into each of the two implementations. Eventually we will have N implementations and any changes to the parsing sequence would have to be made in every instance.

I'm not sure this is a problem since ANTLR's parsing sequence cannot be changed.
The logic is indeed replicated, but not the code itself.

@ericvergnaud
Copy link
Contributor Author

I think that we need to start again with the comment processing and first write a design/strategy document, stealing from:

https://docs.google.com/document/d/1s3nvTklaFgt4a-u_lUhR5tQO_L-S_Br-YAtbF-XxhSw/template/preview

I'm working on that, experimenting an approach that lets catalyst intact.

@jimidle
Copy link
Contributor

jimidle commented Nov 14, 2024

I'm not quite sure about this. We now have more code in the individual PlanParsers and I am not sure that we need to pass the token stream around as we could just call one or more processors on it after it is created. Sort of like the optimizer applies many rules to the LogicalPlan.

The existing implementation forbids that since the CommonTokenStream is transient. We need simultaneous access to the CommonTokenStream and the generated LogicalPlan.

I don't think we do - I think we process the token stream as soon as we get it.

@jimidle
Copy link
Contributor

jimidle commented Nov 28, 2024

Can we close this now? We have no need for this change

@ericvergnaud
Copy link
Contributor Author

It's marked on hold in the project. A change is needed, we haven't decided which one yet.

@jimidle
Copy link
Contributor

jimidle commented Nov 28, 2024

It's marked on hold in the project. A change is needed, we haven't decided which one yet.

OK - but please remeber that you think a change is needed; I don't believe anyone else thinks that. This has nothing to do with allowing different transpilers in the project, if that is where you are coming from here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants