Make invalid Unicode data raise when encoding through Oj::Rails::Encoder #912
I'll have to spend a little time looking at the changes. It makes me uncomfortable that some of the existing tests were removed. Did you run any benchmarks to see what impact the change had on performance?
I definitely shuffled some tests around but I didn’t mean to remove any. What did I delete? I’ll fix that for sure :) Good call on benchmarking - I’ll put something together today.
Maybe it was the moving around part that made it appear as if some tests were removed. I will look more carefully.
Were you able to put together a benchmark and fix the clang formatting issues?
Sorry, I haven’t gotten to that - I do plan to in the next couple of days!
These tests were not even loading Oj::Rails; they were definitely not actually testing the Oj rails shim.
ActiveSupport and the JSON gem will raise an exception when trying to encode an object containing a string with byte sequences that are invalid for the string's encoding. Oj correctly raises if escape_html_entities_in_json is enabled, but if that's disabled, the invalid byte sequence is copied directly to the output. Use the same logic to validate Unicode in that case as well.
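For readers who want to see what "invalid byte sequences for the string's encoding" means in practice, here is a minimal plain-Ruby sketch. It is not the code in this patch (the real check lives in Oj's C encoder); `assert_valid_strings!` is a hypothetical helper name, and raising `EncodingError` is an assumption chosen for illustration rather than the exception class Oj or the json gem actually uses.

```ruby
# Hypothetical helper, not part of Oj or ActiveSupport: walks a value and
# raises if any string holds bytes that are invalid for its declared encoding.
def assert_valid_strings!(value)
  case value
  when String
    unless value.valid_encoding?
      # Assumption: EncodingError is used here purely for illustration.
      raise EncodingError, "invalid byte sequence in #{value.encoding}"
    end
  when Symbol
    assert_valid_strings!(value.to_s)
  when Array
    value.each { |item| assert_valid_strings!(item) }
  when Hash
    value.each do |key, item|
      assert_valid_strings!(key)
      assert_valid_strings!(item)
    end
  end
  value
end

valid   = { "name" => "caf\u00e9" } # well-formed UTF-8
invalid = { "name" => "caf\xE9" }   # a lone 0xE9 byte is not valid UTF-8

assert_valid_strings!(valid) # returns the hash unchanged

begin
  assert_valid_strings!(invalid)
rescue EncodingError => e
  warn "rejected: #{e.message}"
end
```

The point of the sketch is only that the check does not depend on any escaping setting: the same validation runs whether or not HTML entities are going to be escaped afterwards.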
ad29f92 to eb3febb
OK, I've fixed the clang-format problems, and I put together a benchmark: https://gist.github.com/KJTsanaktsidis/f85be084d61aca54f8493ab63fe0707f
Without this patch:
With this patch:
The only substantial difference is that the "lots of multibyte characters" case with this patch now takes the same amount of time regardless of using |
Benchmarks look good. I'll start a more detailed review to get this merged.
LGTM other than one open question.
Thanks for the work. I know I was a little picky. Maybe too much so, sorry.
Not at all! Thanks for your attention on this.
Do you want me to open another PR with those changes?
This is a potential fix for #911. Currently, whether or not Oj::Rails::Encoder raises on invalid unicode data depends on the value of ActiveSupport.escape_html_entities_in_json. In order to accurately mimic the behaviour of stock Rails with the stock json gem, it should in fact raise an exception regardless.
I've so far deliberately copied rather than shared functionality that's shared between RailsEsc and RailsXEsc mode, because I wasn't quite sure how to factor the similarities out. We can leave it like this, or I'm happy to take pointers on a way to factor this down better.
I added a testcase for invalid Unicode to the Rails 6 & 7 encoding tests, and also parameterised the existing unicode-related tests to make sure they work correctly with both settings of ActiveSupport.escape_html_entities_in_json.