-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Get Some Idea of Performance Cost of Using Position Tracking Zipper #172
Comments
Not sure how many of these I'll hit, but environments of interest:
|
I think measuring 1 and 2 is the most important. If those get faster, then I expect that 3 and 4 also get faster (although sometimes 3 optimizes more aggressively. In short, I would optimize for 1 (JVM) and 2 (but even then I think 1 is, given the areas where rewrite-clj is most used, is by far the most important). |
Yeah, that sounds very reasonable, thanks! |
@lread I don't mean to steal your thunder, but I had to know... Here's my analysis of cljs/core.cljs, the file you suggested. That file has 12K lines, which is way outside the norm for a Clojure file. But, it makes for interesting analysis... It takes approximately the same amount of time to parse the file either as a tracked or untracked zipper. An untracked zipper can traverse all the nodes in about a quarter as much time.
This equivalent parsing time and faster traversing time is explained by the fact that a tracked zipper calculates its position data on the fly, as you traverse it. But I think this overstates the performance differences between tracked and untracked zippers. What if you were to do a little work at each node, and in particular, what if you were to ask each node about its position? Forcing both zippers to deal with position data would be a more apples-to-apples comparison. A good candidate for that kind of comparison is
The first thing to notice is that While we're doing this, we might as well compare
I think it's clear this algorithm is faster than Even with the optimized algorithm it takes 107ms to get to the bottom of the file in a tracked zipper, versus 2ms for an untracked zipper. This suggests the tracked zipper isn't zippy enough for Let me know if you want the code that generated these stats. Or, I could pretty easily re-run it with smaller files, if you'd like that for comparison. All benchmarking done with criterium. JVM 11, Clojure 1.11. 2.5 GHz Dual-Core Intel Core i7, 16 GB RAM. |
FWIW, when I started building clj-kondo I discovered that not using zippers at all was much better for performance, so I went that route. |
It appears that |
Hey ho! Thanks for the thoughtful analysis @mainej. I barely started looking at this. I was also using My next test was to try a
I think this is very interesting, but I don't see it as the focus of this specific issue (still relevant to #171 tho!) Here I'm just trying to get a rough idea of the cost of choosing a position tracking zipper today. |
That's reasonable. So, to focus on that case a little bit more, I ran clj-async-profiler on a traversal of every node: (let [tracked (z/of-file "/path/to/clojurescript/src/main/cljs/cljs/core.cljs"
{:track-position? true})]
(println "\n\n\ntracked, traverse")
(time
(profiler/profile
{:min-width 5}
(dotimes [_ 25]
(->> tracked
(iterate z/next*)
(take-while (complement z/end?))
last
z/node))))) I'll attach the flamegraph here, but Github always "helpfully" converts the nicely interactive SVG where you can drilldown into subsections into a static image. If you want, I can share the original file somewhere else. My summary is: the profiler was in rewrite-clj code about 92% of the time. That time breaks down roughly like this:
So, obviously,
|
Q: To what Ya, that would have been my guess, but nice to see it in flames. |
The best I've been able to do so far is the following, which benchmarks around 250ms, cutting the difference between tracked and untracked cursors by about 55% (with all the usual caveats: on this file, under this one workload of seeking through every node, on my machine). (let [s (string node)
last-c (subs s (dec (count s)))
lines (string/split s #"\n" -1) ;; -1 includes trailing empty lines
rows (dec (count lines)) ;; all on one line moves the cursor zero rows
cols (cond-> (count (last lines))
;; count trailing newline when elided by split
(= last-c "\n") inc)]
[rows cols]) I tried I also experimented a little with I suspect there's some lower-level string manipulation that would be faster than a regex. I'll keep exploring. |
Uh wait... Why does the original algorithm return the same thing for both of these strings? "\n" ;; => [1 1]
"\na" ;; => [1 1] I'd expect it to be: "\n" ;; => [1 0]
"\na" ;; => [1 1] But maybe not... I guess in an editor, when you're on a new blank line, your cursor is on the first column. And after you add a character, that character is also in the first column. Not very intuitive to my brain. |
OK, last note for today. Here's a lower-level version. It benchmarks at 231ms, a (somewhat disappointingly) small improvement. (let [s (string node)
;; indexes of newlines
is (loop [idx -1
idxs []]
(if-let [i (string/index-of s \newline (inc idx))]
(recur i (conj idxs i))
idxs))
s-length (count s)]
[(count is)
(if (seq is)
(let [trailing-newline? (= "\n" (subs s (dec s-length)))]
(cond-> (- s-length (last is))
(not trailing-newline?) dec))
s-length)]) That cuts the difference between the tracked and untracked zippers by 61%. Or to think about it a different way, originally the tracked zipper took 3.8 times as long as the tracked zipper, and now it takes 2.1 times. Breaking down times... of the 98% of the total time that the profiler spent in rewrite-clj, 53% was in
There may be more to eke out from newline counting—13% is still a sizable chunk—but I'm out of techniques. The next bottleneck in I'll also point out that something like 25% of the time is doing other custom-zipper maintenance (mostly in |
Thanks for taking a peek @mainej, I'll let this simmer in my noggin for a bit before diving in any deeper. |
Yeah, back to the hammock. |
@borkdude what does clj-kondo do instead of using zippers? |
I just manually process nodes, no zipping, too much waste of performance |
I've been meaning to do this for a while.
When asked, I have typically told folks that I expect the position tracking zipper incurs a performance hit, but maybe I should get at least some vague idea of what that hit is.
And document it.
Inspired by #170, #171
The text was updated successfully, but these errors were encountered: