Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More efficiently seek to position #171

Open
mainej opened this issue Mar 26, 2022 · 4 comments
Open

More efficiently seek to position #171

mainej opened this issue Mar 26, 2022 · 4 comments

Comments

@mainej
Copy link
Contributor

mainej commented Mar 26, 2022

Problem/Opportunity

find-last-by-pos is slower than it could be, especially when used to find zlocs at the end of files.

Suppose for example that you have a zipper at the root of the following code, and know the row and column of b-b-a.

(a
 (a-a
  (a-a-a)
  (a-a-b)
  (a-a-c)))

(b
 (b-a
  (b-a-a)
  (b-a-b)
  (b-a-c))
 (b-b
  (b-b-a))) ;; <-- we want to get here, to b-b-a

(c)

The problem with using find-last-by-pos to find the node is that it uses z/next*, which means it does a depth first traversal of the tree. But, since a's end row precedes b-b-a's start row, neither a nor any of its children or grandchildren will be b-b-a. Similarly, neither b-a nor its children can be b-b-a. Therefore traversing a's or b-a's branches is wasted time. To find a node near the end of a large file ends up traversing hundreds or thousands of nodes that cannot possibly be the desired node.

Proposed Solution

I propose that either find-last-by-pos be made more efficient, or that new tools be introduced, tools specialized to the constraints of navigation by position.

The essential idea is to change the navigation algorithm to skip z/right* over nodes that can't contain the position, before descending z/down* and recursing in nodes that do contain the position. You can see an implementation of this idea here.

Aside: The idea for this request was developed in clojure-lsp/clojure-lsp#793. I encourage you to look at that PR for additional context, noting especially the performance analysis of large files. Be aware that the original implementation used a function that, though it was named find-last-by-pos, wasn't rewrite-clj's version. That function was however essentially the same algorithm as rewrite-clj's, and so I expect the performance characteristics to be similar.

Additional context

In regards to whether to replace z/find-last-by-pos, note that the clojure-lsp version accepts a row and col, but, z/find-last-by-pos accepts a full range. I haven't thought too carefully about the consequences of that difference. I think that using the end row/col of the range would return the same results, but perhaps that isn't considering every case.

There are a few variants of this algorithm that rewrite-clj may want to support.

One is to find both the start and end zloc in a range (see, for example, this clojure-lsp code). This can be optimized beyond simply searching from the initial node twice, by taking advantage of the fact that we know the end must follow the start. A conservative version of this algorithm looks like the following pseudocode. There may be more efficient algorithms which don't search all the way to the top before beginning the search for the end loc.

(let [start-loc (z/find-at-pos root start-row start-col)]
  [start-loc (z/find-at-pos (z/top start-loc) end-row end-col)])

Another variant is to find the minimal set of nodes that span a range. That is, find the deepest, shortest set of sibling nodes that contain the start and end of the range. Admittedly, I don't have a use-case for this scenario and it may be too specialized for rewrite-clj.

Action
I contributed the clojure-lsp code, and would be willing to port it to rewrite-clj. I've talked with @ericdallo, one of the clojure-lsp maintainers, who said that'd be OK.

To use an optimized z/find-last-by-pos, clojure-lsp would also need #170, since it doesn't use position tracking. Alternatively, if something like find-by-heritability were exposed as a public method in rewrite-clj, it could use that. But either way, that consideration shouldn't affect this discussion.

@ericdallo
Copy link

Pinging @lread as may be interested on discussing this

@lread
Copy link
Collaborator

lread commented Mar 26, 2022

Two issues in one day? @mainej, you are spoiling me!

This sounds like an interesting idea, but I'll have to take a deeper peek.

Lucky for me, you've done a great job describing the issue.

@mainej
Copy link
Contributor Author

mainej commented Mar 26, 2022

Haha, sorry—I've had these on my list for a long time and finally got around them. I hope I'm not inundating you. :)

@lread
Copy link
Collaborator

lread commented Mar 26, 2022

Not at all, I am genuinely delighted!

@lread lread added this to rewrite-clj Jul 3, 2024
@lread lread moved this to Medium Priority in rewrite-clj Jul 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Medium Priority
Development

No branches or pull requests

3 participants