-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More efficiently seek to position #171
Comments
Pinging @lread as may be interested on discussing this |
Two issues in one day? @mainej, you are spoiling me! This sounds like an interesting idea, but I'll have to take a deeper peek. Lucky for me, you've done a great job describing the issue. |
Haha, sorry—I've had these on my list for a long time and finally got around them. I hope I'm not inundating you. :) |
Not at all, I am genuinely delighted! |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Problem/Opportunity
find-last-by-pos
is slower than it could be, especially when used to find zlocs at the end of files.Suppose for example that you have a zipper at the root of the following code, and know the row and column of
b-b-a
.The problem with using
find-last-by-pos
to find the node is that it usesz/next*
, which means it does a depth first traversal of the tree. But, sincea
's end row precedesb-b-a
's start row, neithera
nor any of its children or grandchildren will beb-b-a
. Similarly, neitherb-a
nor its children can beb-b-a
. Therefore traversinga
's orb-a
's branches is wasted time. To find a node near the end of a large file ends up traversing hundreds or thousands of nodes that cannot possibly be the desired node.Proposed Solution
I propose that either
find-last-by-pos
be made more efficient, or that new tools be introduced, tools specialized to the constraints of navigation by position.The essential idea is to change the navigation algorithm to skip
z/right*
over nodes that can't contain the position, before descendingz/down*
and recursing in nodes that do contain the position. You can see an implementation of this idea here.Aside: The idea for this request was developed in clojure-lsp/clojure-lsp#793. I encourage you to look at that PR for additional context, noting especially the performance analysis of large files. Be aware that the original implementation used a function that, though it was named
find-last-by-pos
, wasn't rewrite-clj's version. That function was however essentially the same algorithm as rewrite-clj's, and so I expect the performance characteristics to be similar.Additional context
In regards to whether to replace
z/find-last-by-pos
, note that the clojure-lsp version accepts a row and col, but,z/find-last-by-pos
accepts a full range. I haven't thought too carefully about the consequences of that difference. I think that using the end row/col of the range would return the same results, but perhaps that isn't considering every case.There are a few variants of this algorithm that rewrite-clj may want to support.
One is to find both the start and end zloc in a range (see, for example, this clojure-lsp code). This can be optimized beyond simply searching from the initial node twice, by taking advantage of the fact that we know the end must follow the start. A conservative version of this algorithm looks like the following pseudocode. There may be more efficient algorithms which don't search all the way to the top before beginning the search for the end loc.
Another variant is to find the minimal set of nodes that span a range. That is, find the deepest, shortest set of sibling nodes that contain the start and end of the range. Admittedly, I don't have a use-case for this scenario and it may be too specialized for rewrite-clj.
Action
I contributed the clojure-lsp code, and would be willing to port it to rewrite-clj. I've talked with @ericdallo, one of the clojure-lsp maintainers, who said that'd be OK.
To use an optimized
z/find-last-by-pos
, clojure-lsp would also need #170, since it doesn't use position tracking. Alternatively, if something likefind-by-heritability
were exposed as a public method in rewrite-clj, it could use that. But either way, that consideration shouldn't affect this discussion.The text was updated successfully, but these errors were encountered: