-
Notifications
You must be signed in to change notification settings - Fork 177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize bytecode amount #206
Optimize bytecode amount #206
Conversation
test/hiccup2/core_test.clj
Outdated
(testing "runtime both absent" | ||
;; TODO: this might not be desired behavior (use case: the user has a function which returns a map of attributes or nil) | ||
(is (= (str (html {:mode :xhtml} [:br (identity nil)])) "<br></br>")) | ||
(is (= (str (html {:mode :html} [:br (identity nil)])) "<br></br>")) | ||
(is (= (str (html {:mode :xml} [:br (identity nil)])) "<br></br>")) | ||
(is (= (str (html {:mode :sgml} [:br (identity nil)])) "<br></br>"))) | ||
(testing "compile-time both absent: nil child" | ||
;; TODO: same as above, but more of the user's fault to write a nil literal inside a void element | ||
(is (= (str (html {:mode :xhtml} [:br nil])) "<br></br>")) | ||
(is (= (str (html {:mode :html} [:br nil])) "<br></br>")) | ||
(is (= (str (html {:mode :xml} [:br nil])) "<br></br>")) | ||
(is (= (str (html {:mode :sgml} [:br nil])) "<br></br>"))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When writing tests for the old version of the code, I noticed this weird behavior. I didn't change it in this PR, but it might be good to change Hiccup to generate <br>
or <br />
instead of <br></br>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you think this is a bug that should be fixed, I volunteer to work on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does look like a bug, and I'd very much appreciate the help - but let's address that in a separate PR. Can you remove this particular testing block for now, so we can include it in a later PR that's focused on this issue?
Also, instead of <br>
, would <p>
work instead? IIRC that would have the same behavior, but generate more correct HTML.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll change this test when I write the fix. For now it's useful for it to document the current behavior, and it was also critical in preventing my optimizations from changing the current behavior. It covers edge cases that the existing tests didn't cover, and it saved me a already couple of times during my refactoring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much for your work on this, I have some comments/changes needed before it can be merged.
src/hiccup/compiler.clj
Outdated
@@ -314,22 +333,24 @@ | |||
(defmethod compile-element ::literal-tag | |||
[[tag attrs & content]] | |||
(let [[tag tag-attrs _] (normalize-element-form [tag]) | |||
attrs-sym (gensym "attrs")] | |||
attrs-sym (gensym "attrs")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the spacing here was accidentally removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What code formatter do you use? I wasn't able to configure Cursive to produce exactly the same style.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm using Emacs/CIDER. Does Cursive try to revert aligned let forms?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cursive aligns let forms, but ignores destructurings when deciding how deep to align them.
You could configure a Leiningen alias for formatting the whole project. That would end all discussion about code formatting and PRs could focus on more important things. For example add the zprint plugin and configure it to use the community style.
test/hiccup2/core_test.clj
Outdated
(testing "runtime both absent" | ||
;; TODO: this might not be desired behavior (use case: the user has a function which returns a map of attributes or nil) | ||
(is (= (str (html {:mode :xhtml} [:br (identity nil)])) "<br></br>")) | ||
(is (= (str (html {:mode :html} [:br (identity nil)])) "<br></br>")) | ||
(is (= (str (html {:mode :xml} [:br (identity nil)])) "<br></br>")) | ||
(is (= (str (html {:mode :sgml} [:br (identity nil)])) "<br></br>"))) | ||
(testing "compile-time both absent: nil child" | ||
;; TODO: same as above, but more of the user's fault to write a nil literal inside a void element | ||
(is (= (str (html {:mode :xhtml} [:br nil])) "<br></br>")) | ||
(is (= (str (html {:mode :html} [:br nil])) "<br></br>")) | ||
(is (= (str (html {:mode :xml} [:br nil])) "<br></br>")) | ||
(is (= (str (html {:mode :sgml} [:br nil])) "<br></br>"))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does look like a bug, and I'd very much appreciate the help - but let's address that in a separate PR. Can you remove this particular testing block for now, so we can include it in a later PR that's focused on this issue?
Also, instead of <br>
, would <p>
work instead? IIRC that would have the same behavior, but generate more correct HTML.
test/hiccup/compiler_test.clj
Outdated
(deftest test-concatenate-strings | ||
(let [concatenate-strings #'compiler/concatenate-strings] | ||
(testing "empty collection" | ||
(is (= [] (concatenate-strings nil))) | ||
(is (= [] (concatenate-strings [])))) | ||
(testing "concatenates consecutive strings, ignores other types" | ||
(is (= ["abc"] (concatenate-strings ["a" "b" "c"]))) | ||
(is (= [1 "ab" 2 "cd" 3] (concatenate-strings [1 "a" "b" 2 "c" "d" 3])))))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't be testing private functions on their own, as it should be possible to refactor the internals of a namespace without causing the tests to fail.
Can we perform the same effective test in hiccup.core_test
, using html
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's only used by the macro, so an integration test would not be able check whether concatenate-strings
works or whether it's just an alias for identity
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we can check the generated code with macroexpand-1
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking the outcome of macroexpand would produce extremely fragile tests. It would hinder refactoring instead of helping it, because it would cause refactoring to break tests. Tests should test what the code does, not how the code does it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be possible to write a non-fragile macroexpand-based test, if it would extract all string literals from the generated code and check that only the concatenated string literal exists there. But that's an order of magnitude more complex than a simple unit test that calls the helper function directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal is to reduce the amount of bytecode, since that's the thing that has a 64KB limit. Clojure forms are only an indirect indicator for that.
One way to test that would be to have a few example inputs of pathological cases, compile them to .class files, parse the .class files with ASM, and calculate how many bytecode instructions they have. Knowing exactly how many instructions one example input produces, is not useful information. Instead it would be better to know at what rate the number of instructions increases as number of problematic elements in the input increases. I'm not sure whether that is worth the effort.
This seems quite similar to performance testing. The exact values are not important, but it's useful to see if there is a big change from one release to another. Maybe this could be approached as a performance test using a set of standard examples, and we would record their bytecode size and performance, and then compare it to the previous releases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could do it that way, but it seems simpler just to count the forms and use that as an approximation for the size of the generated bytecode.
To put it another way, it's unlikely that the bytecode is going to significantly increase in size if the number of forms remains the same. The point of tests is to alert us of future regressions, after all. We just need a sufficiently good canary in our coalmine; it doesn't need to be perfect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that seems like a good compromise. Counting forms is easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've now written such tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! The new tests look good. Can you remove this compiler test and namespace from the PR?
src/hiccup/compiler.clj
Outdated
(when (or ~(boolean (container-tag? tag content)) | ||
(not (map? ~attrs-sym))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be written as:
~(if (container-tag? tag content)
`(build-string ~@(compile-seq content) ~(str "</" tag ">"))
`(when (not (map? ~attrs-sym))
(build-string ~@(compile-seq content) ~(str "</" tag ">"))))
Unless I've misread the code, this should do the same thing, but potentially shave off another form in some circumstances.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found a way to simplify it further: 95a8b0f
dfa8a19
to
95a8b0f
Compare
I found another bug while improving the test coverage. (is (= (str (html [(identity :p) [:span "x"]]))
"<p><span>x</span></p>")) The fix is in https://github.com/luontola/hiccup/tree/fix-runtime-tags Can we merge this current PR now, so that I can create a new PR about that fix? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the update and for all your work so far. This all looks good. I've added a couple of comments for code that can be removed/made more concise.
Once that's all done, can you ensure that all added/altered lines are under 80 character in length, as this project follows the Clojure Style Guide (I notice Hiccup is missing my standard "CONTRIBUTING.md", probably because it's an older project.)
Then the commits can be squashed down, since we don't need the intermediate commits in the final history. I follow the seven rules of a great git commit message for writing commit messages.
Do you mean I should squash them to one commit, or will you use github's squash merge to do it yourself? |
Unfortunately GitHub's "squash and merge" doesn't maintain a merge commit - it's more like a "squash and rebase" - so I'm afraid you need to squash the commits manually. |
It was possible for the macros to generate so much bytecode that they would exceed Java's 64KB limit for method code size. In such situations the Clojure compiler would fail with the message "Method code too large!" See weavejester#205 This commit does the following optimizations: 1. Concatenate string literals at compile time, so that we can replace multiple StringBuilder.append() calls with just one. This reduces the generated code from O(n) to O(1). This also improves performance by 10-20%, since copying one long string is faster than many short strings. 2. When a runtime check is needed to determine whether a value is the element's attribute map or its first child element, avoid duplicating the code for the element's content. This reduces the generated code from O(n^2) to O(n). While improving the test coverage, some edge cases of generating bad HTML were detected. This commit doesn't change the existing behavior, but only documents it in the new tests. Fixing that behavior will be done in future commits.
71ed868
to
00ddf99
Compare
Squashed. I moved the tests from |
Fixes #205
This PR does two optimizations which reduce the amount of generated bytecode:
hiccup.compiler/build-string
to concatenate consecutive string literals at compile timehiccup.compiler/compile-element ::literal-tag
to generate only once the code for an element's children and ending tagExample of the generated code:
I did a little benchmarking and this improved the performance of my code by 7-14%. This improvement was due to concatenating the strings at compile time, which led to fewer
StringBuilder.append()
calls. It could be optimized further by using only one StringBuilder, avoiding the multiple intermediate StringBuilder->String->StringBuilder conversions. I'll have a look at that later in another PR.