Optimize bytecode amount #206

luontola · 2023-11-30T13:28:57Z

Fixes #205

This PR does two optimizations which reduce the amount of generated bytecode:

Optimize hiccup.compiler/build-string to concatenate consecutive string literals at compile time
Optimize hiccup.compiler/compile-element ::literal-tag to generate only once the code for an element's children and ending tag

Example of the generated code:

I did a little benchmarking and this improved the performance of my code by 7-14%. This improvement was due to concatenating the strings at compile time, which led to fewer StringBuilder.append() calls. It could be optimized further by using only one StringBuilder, avoiding the multiple intermediate StringBuilder->String->StringBuilder conversions. I'll have a look at that later in another PR.

luontola · 2023-11-30T13:53:40Z

test/hiccup2/core_test.clj

+  (testing "runtime both absent"
+    ;; TODO: this might not be desired behavior (use case: the user has a function which returns a map of attributes or nil)
+    (is (= (str (html {:mode :xhtml} [:br (identity nil)])) "<br></br>"))
+    (is (= (str (html {:mode :html} [:br (identity nil)])) "<br></br>"))
+    (is (= (str (html {:mode :xml} [:br (identity nil)])) "<br></br>"))
+    (is (= (str (html {:mode :sgml} [:br (identity nil)])) "<br></br>")))
+  (testing "compile-time both absent: nil child"
+    ;; TODO: same as above, but more of the user's fault to write a nil literal inside a void element
+    (is (= (str (html {:mode :xhtml} [:br nil])) "<br></br>"))
+    (is (= (str (html {:mode :html} [:br nil])) "<br></br>"))
+    (is (= (str (html {:mode :xml} [:br nil])) "<br></br>"))
+    (is (= (str (html {:mode :sgml} [:br nil])) "<br></br>")))


When writing tests for the old version of the code, I noticed this weird behavior. I didn't change it in this PR, but it might be good to change Hiccup to generate   or   instead of  .

If you think this is a bug that should be fixed, I volunteer to work on it.

It does look like a bug, and I'd very much appreciate the help - but let's address that in a separate PR. Can you remove this particular testing block for now, so we can include it in a later PR that's focused on this issue?

Also, instead of  , would  work instead? IIRC that would have the same behavior, but generate more correct HTML.

I'll change this test when I write the fix. For now it's useful for it to document the current behavior, and it was also critical in preventing my optimizations from changing the current behavior. It covers edge cases that the existing tests didn't cover, and it saved me a already couple of times during my refactoring.

weavejester

Thank you very much for your work on this, I have some comments/changes needed before it can be merged.

weavejester · 2023-11-30T13:52:39Z

src/hiccup/compiler.clj

@@ -314,22 +333,24 @@
 (defmethod compile-element ::literal-tag
  [[tag attrs & content]]
  (let [[tag tag-attrs _] (normalize-element-form [tag])
-        attrs-sym         (gensym "attrs")]
+        attrs-sym (gensym "attrs")]


Looks like the spacing here was accidentally removed.

What code formatter do you use? I wasn't able to configure Cursive to produce exactly the same style.

I'm using Emacs/CIDER. Does Cursive try to revert aligned let forms?

Cursive aligns let forms, but ignores destructurings when deciding how deep to align them.

You could configure a Leiningen alias for formatting the whole project. That would end all discussion about code formatting and PRs could focus on more important things. For example add the zprint plugin and configure it to use the community style.

src/hiccup/compiler.clj

weavejester · 2023-11-30T13:59:22Z

test/hiccup2/core_test.clj

+  (testing "runtime both absent"
+    ;; TODO: this might not be desired behavior (use case: the user has a function which returns a map of attributes or nil)
+    (is (= (str (html {:mode :xhtml} [:br (identity nil)])) "<br></br>"))
+    (is (= (str (html {:mode :html} [:br (identity nil)])) "<br></br>"))
+    (is (= (str (html {:mode :xml} [:br (identity nil)])) "<br></br>"))
+    (is (= (str (html {:mode :sgml} [:br (identity nil)])) "<br></br>")))
+  (testing "compile-time both absent: nil child"
+    ;; TODO: same as above, but more of the user's fault to write a nil literal inside a void element
+    (is (= (str (html {:mode :xhtml} [:br nil])) "<br></br>"))
+    (is (= (str (html {:mode :html} [:br nil])) "<br></br>"))
+    (is (= (str (html {:mode :xml} [:br nil])) "<br></br>"))
+    (is (= (str (html {:mode :sgml} [:br nil])) "<br></br>")))


It does look like a bug, and I'd very much appreciate the help - but let's address that in a separate PR. Can you remove this particular testing block for now, so we can include it in a later PR that's focused on this issue?

Also, instead of  , would  work instead? IIRC that would have the same behavior, but generate more correct HTML.

weavejester · 2023-11-30T14:03:09Z

test/hiccup/compiler_test.clj

+(deftest test-concatenate-strings
+  (let [concatenate-strings #'compiler/concatenate-strings]
+    (testing "empty collection"
+      (is (= [] (concatenate-strings nil)))
+      (is (= [] (concatenate-strings []))))
+    (testing "concatenates consecutive strings, ignores other types"
+      (is (= ["abc"] (concatenate-strings ["a" "b" "c"])))
+      (is (= [1 "ab" 2 "cd" 3] (concatenate-strings [1 "a" "b" 2 "c" "d" 3]))))))


We shouldn't be testing private functions on their own, as it should be possible to refactor the internals of a namespace without causing the tests to fail.

Can we perform the same effective test in hiccup.core_test, using html?

It's only used by the macro, so an integration test would not be able check whether concatenate-strings works or whether it's just an alias for identity.

But we can check the generated code with macroexpand-1, right?

Checking the outcome of macroexpand would produce extremely fragile tests. It would hinder refactoring instead of helping it, because it would cause refactoring to break tests. Tests should test what the code does, not how the code does it.

It might be possible to write a non-fragile macroexpand-based test, if it would extract all string literals from the generated code and check that only the concatenated string literal exists there. But that's an order of magnitude more complex than a simple unit test that calls the helper function directly.

The goal is to reduce the amount of bytecode, since that's the thing that has a 64KB limit. Clojure forms are only an indirect indicator for that.

One way to test that would be to have a few example inputs of pathological cases, compile them to .class files, parse the .class files with ASM, and calculate how many bytecode instructions they have. Knowing exactly how many instructions one example input produces, is not useful information. Instead it would be better to know at what rate the number of instructions increases as number of problematic elements in the input increases. I'm not sure whether that is worth the effort.

This seems quite similar to performance testing. The exact values are not important, but it's useful to see if there is a big change from one release to another. Maybe this could be approached as a performance test using a set of standard examples, and we would record their bytecode size and performance, and then compare it to the previous releases?

We could do it that way, but it seems simpler just to count the forms and use that as an approximation for the size of the generated bytecode.

To put it another way, it's unlikely that the bytecode is going to significantly increase in size if the number of forms remains the same. The point of tests is to alert us of future regressions, after all. We just need a sufficiently good canary in our coalmine; it doesn't need to be perfect.

Yes, that seems like a good compromise. Counting forms is easier.

I've now written such tests.

Thanks! The new tests look good. Can you remove this compiler test and namespace from the PR?

weavejester · 2023-11-30T16:37:36Z

src/hiccup/compiler.clj

+         (when (or ~(boolean (container-tag? tag content))
+                   (not (map? ~attrs-sym)))


Could this be written as:

~(if (container-tag? tag content) `(build-string ~@(compile-seq content) ~(str "</" tag ">")) `(when (not (map? ~attrs-sym)) (build-string ~@(compile-seq content) ~(str "</" tag ">"))))

Unless I've misread the code, this should do the same thing, but potentially shave off another form in some circumstances.

I found a way to simplify it further: 95a8b0f

luontola · 2023-12-03T13:36:32Z

I found another bug while improving the test coverage.

  (is (= (str (html [(identity :p) [:span "x"]]))
         "<p>&lt;span&gt;x&lt;/span&gt;</p>"))

The fix is in https://github.com/luontola/hiccup/tree/fix-runtime-tags

Can we merge this current PR now, so that I can create a new PR about that fix?

weavejester

Thank you for the update and for all your work so far. This all looks good. I've added a couple of comments for code that can be removed/made more concise.

Once that's all done, can you ensure that all added/altered lines are under 80 character in length, as this project follows the Clojure Style Guide (I notice Hiccup is missing my standard "CONTRIBUTING.md", probably because it's an older project.)

Then the commits can be squashed down, since we don't need the intermediate commits in the final history. I follow the seven rules of a great git commit message for writing commit messages.

test/hiccup2/optimizations_test.clj

luontola · 2023-12-04T18:22:33Z

Do you mean I should squash them to one commit, or will you use github's squash merge to do it yourself?

weavejester · 2023-12-04T23:56:46Z

Unfortunately GitHub's "squash and merge" doesn't maintain a merge commit - it's more like a "squash and rebase" - so I'm afraid you need to squash the commits manually.

It was possible for the macros to generate so much bytecode that they would exceed Java's 64KB limit for method code size. In such situations the Clojure compiler would fail with the message "Method code too large!" See weavejester#205 This commit does the following optimizations: 1. Concatenate string literals at compile time, so that we can replace multiple StringBuilder.append() calls with just one. This reduces the generated code from O(n) to O(1). This also improves performance by 10-20%, since copying one long string is faster than many short strings. 2. When a runtime check is needed to determine whether a value is the element's attribute map or its first child element, avoid duplicating the code for the element's content. This reduces the generated code from O(n^2) to O(n). While improving the test coverage, some edge cases of generating bad HTML were detected. This commit doesn't change the existing behavior, but only documents it in the new tests. Fixing that behavior will be done in future commits.

luontola · 2023-12-06T11:17:31Z

Squashed.

I moved the tests from hiccup2.core-test to hiccup.compiler-test, because they're focused on covering the edge cases in individual functions in hiccup.compiler.

luontola commented Nov 30, 2023

View reviewed changes

weavejester reviewed Nov 30, 2023

View reviewed changes

luontola force-pushed the optimize-bytecode-amount branch from dfa8a19 to 95a8b0f Compare December 1, 2023 14:27

weavejester requested changes Dec 3, 2023

View reviewed changes

luontola requested a review from weavejester December 4, 2023 16:35

luontola force-pushed the optimize-bytecode-amount branch from 71ed868 to 00ddf99 Compare December 6, 2023 11:10

weavejester approved these changes Dec 7, 2023

View reviewed changes

weavejester merged commit 19c7531 into weavejester:master Dec 7, 2023
1 check passed

luontola deleted the optimize-bytecode-amount branch December 8, 2023 13:49

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize bytecode amount #206

Optimize bytecode amount #206

luontola commented Nov 30, 2023 •

edited

Loading

luontola Nov 30, 2023

luontola Nov 30, 2023

weavejester Nov 30, 2023

luontola Nov 30, 2023

weavejester left a comment

weavejester Nov 30, 2023

luontola Nov 30, 2023

weavejester Nov 30, 2023

luontola Nov 30, 2023

weavejester Nov 30, 2023

weavejester Nov 30, 2023

luontola Nov 30, 2023

weavejester Nov 30, 2023

luontola Nov 30, 2023 •

edited

Loading

luontola Nov 30, 2023 •

edited

Loading

luontola Nov 30, 2023 •

edited

Loading

weavejester Nov 30, 2023

luontola Nov 30, 2023

luontola Dec 1, 2023

weavejester Dec 3, 2023

weavejester Nov 30, 2023

luontola Dec 1, 2023

luontola commented Dec 3, 2023

weavejester left a comment

luontola commented Dec 4, 2023

weavejester commented Dec 4, 2023

luontola commented Dec 6, 2023

		(when (or ~(boolean (container-tag? tag content))
		(not (map? ~attrs-sym)))

Optimize bytecode amount #206

Optimize bytecode amount #206

Conversation

luontola commented Nov 30, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

weavejester left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

luontola Nov 30, 2023 • edited Loading

Choose a reason for hiding this comment

luontola Nov 30, 2023 • edited Loading

Choose a reason for hiding this comment

luontola Nov 30, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

luontola commented Dec 3, 2023

weavejester left a comment

Choose a reason for hiding this comment

luontola commented Dec 4, 2023

weavejester commented Dec 4, 2023

luontola commented Dec 6, 2023

luontola commented Nov 30, 2023 •

edited

Loading

luontola Nov 30, 2023 •

edited

Loading

luontola Nov 30, 2023 •

edited

Loading

luontola Nov 30, 2023 •

edited

Loading