Refactor parsing pipeline #1200

ericvergnaud · 2024-11-13T16:56:47Z

Refactors the LogicalPlan pipeline such that it's possible to use the original TokenStream to fetch the comment tokens and apply them appropriately to the plan items.

Progresses #869

Supersedes #1191

This reverts commit da36fcb.

…such that it can be populated at construction time

…line
# Conflicts:
# core/src/main/scala/com/databricks/labs/remorph/intermediate/expressions.scala
# core/src/main/scala/com/databricks/labs/remorph/intermediate/plans.scala
# core/src/main/scala/com/databricks/labs/remorph/intermediate/trees.scala
# core/src/main/scala/com/databricks/labs/remorph/intermediate/workflows/JobNode.scala

This reverts commit f955172.

github-actions · 2024-11-13T17:00:24Z

Coverage tests results

448 tests (+448): 414 ✅ passed (+414), 34 💤 skipped (+34), 0 ❌ failed (±0)
6 suites (+6), 6 files (+6), 4s ⏱️ (+4s)

Results for commit c8b5af8. ± Comparison against base commit 7601ced.

♻️ This comment has been updated with latest results.

jimidle · 2024-11-14T13:50:28Z

I'm not quite sure about this. We now have more code in the individual PlanParsers and I am not sure that we need to pass the token stream around as we could just call one or more processors on it after it is created. Sort of like the optimizer applies many rules to the LogicalPlan.
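For illustration only (this is not code from the PR): a minimal sketch of the "processors applied to the token stream" idea, modelled loosely on how an optimizer folds a list of rules over a plan. Every name here (`Tok`, `TokenContext`, `TokenRule`, the channel number `2`) is a hypothetical stand-in and does not reflect remorph's or ANTLR's actual API.

```scala
// Hypothetical, simplified stand-ins; none of these types come from remorph or ANTLR.
final case class Tok(text: String, channel: Int, line: Int)
final case class TokenContext(tokens: Seq[Tok], commentsByLine: Map[Int, String] = Map.empty)

// A rule takes the current context and returns an updated one.
trait TokenRule {
  def apply(ctx: TokenContext): TokenContext
}

object TokenRules {
  // Assumed channel number for comment tokens (illustrative only).
  val CommentChannel = 2

  // Gathers comment tokens, keyed by the line they appear on.
  object CollectComments extends TokenRule {
    def apply(ctx: TokenContext): TokenContext =
      ctx.copy(commentsByLine =
        ctx.tokens.filter(_.channel == CommentChannel).map(t => t.line -> t.text).toMap)
  }

  // Removes the leading "--" marker and surrounding whitespace from each comment.
  object StripCommentMarkers extends TokenRule {
    def apply(ctx: TokenContext): TokenContext =
      ctx.copy(commentsByLine =
        ctx.commentsByLine.map { case (line, text) => line -> text.stripPrefix("--").trim })
  }

  // Applies the rules in order, threading the context through each one.
  def runAll(rules: Seq[TokenRule], ctx: TokenContext): TokenContext =
    rules.foldLeft(ctx)((acc, rule) => rule(acc))
}

object TokenRulesDemo {
  val tokens: Seq[Tok] = Seq(
    Tok("-- pick one", TokenRules.CommentChannel, 1),
    Tok("SELECT", 0, 2),
    Tok("1", 0, 2)
  )
  val result: TokenContext = TokenRules.runAll(
    Seq(TokenRules.CollectComments, TokenRules.StripCommentMarkers),
    TokenContext(tokens)
  )
}
```

In this sketch the rules only ever see the tokens; whether that is enough for comment handling is exactly what the rest of the thread debates.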

jimidle

I don't think that this is the correct approach. The separate steps are now one big step and we have moved code from the generic PlanParser into each of the two implementations. Eventually we will have N implementations and any changes to the parsing sequence would have to be made in every instance.
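For illustration only: a minimal sketch of the concern above, where the parsing sequence lives once in a generic base class and each dialect supplies only its own steps. `SketchPlanParser`, its methods, and `SketchDialectA` are hypothetical and are not the signatures of remorph's PlanParser.

```scala
// Hypothetical sketch: names and signatures are illustrative, not remorph's API.
abstract class SketchPlanParser {
  // Dialect-specific steps, implemented once per dialect.
  protected def tokenize(sql: String): Seq[String]
  protected def buildPlan(tokens: Seq[String]): String

  // Shared step with a default; changing it here changes every dialect.
  protected def dropComments(tokens: Seq[String]): Seq[String] =
    tokens.filterNot(_.startsWith("--"))

  // The parsing sequence is defined in exactly one place.
  final def parse(sql: String): String =
    buildPlan(dropComments(tokenize(sql)))
}

// One of potentially N dialect implementations; it does not restate the sequence.
object SketchDialectA extends SketchPlanParser {
  protected def tokenize(sql: String): Seq[String] =
    sql.split("\\s+").toSeq.filter(_.nonEmpty)

  protected def buildPlan(tokens: Seq[String]): String =
    tokens.mkString("Plan(", ", ", ")")
}
```

With this shape, adding a step to `parse` touches one class rather than every dialect, which is the maintenance property the comment is asking to keep.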

I think that we need to start again with the comment processing and first write a design/strategy document, stealing from:

https://docs.google.com/document/d/1s3nvTklaFgt4a-u_lUhR5tQO_L-S_Br-YAtbF-XxhSw/template/preview

ericvergnaud · 2024-11-14T17:37:39Z

> I'm not quite sure about this. We now have more code in the individual PlanParsers and I am not sure that we need to pass the token stream around as we could just call one or more processors on it after it is created. Sort of like the optimizer applies many rules to the LogicalPlan.

The existing implementation forbids that since the CommonTokenStream is transient. We need simultaneous access to the CommonTokenStream and the generated LogicalPlan.
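For illustration only: a minimal sketch of what "simultaneous access" could mean in practice, assuming the parse step hands back both the tokens and the plan so that comments can be attached afterwards. `SrcToken`, `PlanNode`, `ParseResult`, and the "attach to the next node at or after the comment's line" policy are all hypothetical; none of this is remorph's or ANTLR's API.

```scala
// Hypothetical stand-ins for a token, a plan item, and a combined parse result.
final case class SrcToken(text: String, isComment: Boolean, line: Int)
final case class PlanNode(name: String, startLine: Int, comments: Seq[String] = Nil)
final case class ParseResult(tokens: Seq[SrcToken], plan: Seq[PlanNode])

object CommentAttacher {
  // Attaches each comment to the first plan node starting on or after the comment's line.
  // Comments that follow the last node are dropped in this simplified policy.
  def attach(result: ParseResult): Seq[PlanNode] = {
    val ordered = result.plan.sortBy(_.startLine)
    val byTarget: Map[PlanNode, Seq[String]] = result.tokens
      .filter(_.isComment)
      .flatMap(c => ordered.find(_.startLine >= c.line).map(n => n -> c.text))
      .groupBy(_._1)
      .map { case (node, pairs) => node -> pairs.map(_._2) }
    result.plan.map(n => n.copy(comments = n.comments ++ byTarget.getOrElse(n, Nil)))
  }
}

object AttachDemo {
  val result: ParseResult = ParseResult(
    tokens = Seq(
      SrcToken("-- load users", isComment = true, line = 1),
      SrcToken("SELECT", isComment = false, line = 2),
      SrcToken("-- archive", isComment = true, line = 3),
      SrcToken("INSERT", isComment = false, line = 4)
    ),
    plan = Seq(PlanNode("Select", 2), PlanNode("Insert", 4))
  )
  val annotated: Seq[PlanNode] = CommentAttacher.attach(result)
}
```

The attachment step needs both inputs at the same time, so in this sketch the tokens have to outlive the step that builds the plan.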

ericvergnaud · 2024-11-14T17:40:16Z

> I don't think that this is the correct approach. The separate steps are now one big step and we have moved code from the generic PlanParser into each of the two implementations. Eventually we will have N implementations and any changes to the parsing sequence would have to be made in every instance.

I'm not sure this is a problem since ANTLR's parsing sequence cannot be changed.
The logic is indeed replicated, but not the code itself.

ericvergnaud · 2024-11-14T17:41:35Z

> I think that we need to start again with the comment processing and first write a design/strategy document, stealing from:
>
> https://docs.google.com/document/d/1s3nvTklaFgt4a-u_lUhR5tQO_L-S_Br-YAtbF-XxhSw/template/preview

I'm working on that, experimenting with an approach that leaves catalyst intact.

jimidle · 2024-11-14T21:12:41Z

> > I'm not quite sure about this. We now have more code in the individual PlanParsers and I am not sure that we need to pass the token stream around as we could just call one or more processors on it after it is created. Sort of like the optimizer applies many rules to the LogicalPlan.
>
> The existing implementation forbids that since the CommonTokenStream is transient. We need simultaneous access to the CommonTokenStream and the generated LogicalPlan.

I don't think we do - I think we process the token stream as soon as we get it.

jimidle · 2024-11-28T15:26:17Z

Can we close this now? We have no need for this change.

ericvergnaud · 2024-11-28T17:47:59Z

It's marked on hold in the project. A change is needed, we haven't decided which one yet.

jimidle · 2024-11-28T18:54:30Z

> It's marked on hold in the project. A change is needed, we haven't decided which one yet.

OK - but please remember that you think a change is needed; I don't believe anyone else thinks that. This has nothing to do with allowing different transpilers in the project, if that is where you are coming from here?

ericvergnaud added 18 commits November 12, 2024 15:41

refactor Origin

56c0fb2

drop CurrentOrigin

f29386e

make 'TreeNode.origin' field mandatory

03afc1f

formatting and merge issues

5aa0f5f

add test

da36fcb

send LINE_COMMENT to dedicated channel

f955172

propagate mandatory field 'origin' down to project

254352f

Revert "add test"

a3751e5

This reverts commit da36fcb.

propagate mandatory field 'origin' down to project

e0d28dd

propagate mandatory field 'origin' down to project

725f675

refactor parsing pipeline

d87ef33

fix failing tests

6af25c2

refactor parsing pipeline

eeaf24b

fix issues

08ab045

make all Origin fields mandatory

40570f8

refactor TreeNode.origin to a method returning a defaulted parameter …

72a4df7

…such that it can be populated at construction time

fix internal API

d8c7742

ericvergnaud mentioned this pull request Nov 13, 2024

Refactor parsing pipeline #1191

Closed

Revert "send LINE_COMMENT to dedicated channel"

ef1eafe

This reverts commit f955172.

ericvergnaud mentioned this pull request Nov 13, 2024

Drop plan parser dependency on antlr4 #1201

Open

patches optimizer rule for NameOrPosition

4de84d8

ericvergnaud marked this pull request as draft November 14, 2024 07:46

rollback catalyst changes

e0aaa11

ericvergnaud changed the base branch from refactor-Origin-in-TreeNode to main November 14, 2024 08:08

ericvergnaud added 3 commits November 14, 2024 09:14

drop stale test

1734bf6

Merge branch 'main' into refactor-parsing-pipeline

b6b8c4f

Merge branch 'patch/nameOrPosition' into refactor-parsing-pipeline

c8b5af8

ericvergnaud marked this pull request as ready for review November 14, 2024 08:29

nfx requested a review from jimidle November 14, 2024 14:43

jimidle requested changes Nov 14, 2024

ericvergnaud mentioned this pull request Nov 15, 2024

[FEATURE]: IR to Listen and Generate Code Comments #869

Open

1 task

ericvergnaud self-assigned this Nov 25, 2024
