-
-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use "interned" (frozen and deduplicated) Strings in Ruby 3.0+ to minimize object allocations. #275
Comments
…cation. Fixes ohler55#275 - Background info: https://en.wikipedia.org/wiki/String_interning - Available to C extensions since MRI Ruby 3.0: - https://bugs.ruby-lang.org/issues/13381 - https://bugs.ruby-lang.org/issues/16029 This change makes `Ox` use "interned" (frozen and deduplicated) `String`s anywhere possible, the same effect as calling `String#-@` except without the heap churn of `RVALUE` allocation: https://ruby-doc.org/core/String.html#method-i-2B-40 - Uses the `HAVE_<funcname>` preprocessor macros to detect this functionality and avoid breaking older Rubies (confirmed with at least MRI 2.7.2). - I explicitly use interned `String`s anywhere an allocated `String` seemed entirely internal to `Ox`, e.g. in `sax_value_as_time` when allocating the `String` argument to `ox_time_class`. - Adds a new user-facing option `intern_strings` to control the behavior of `String` return values from user-facing functions, e.g. from `sax_as_s`. - Results in a ~25% reduction in startup GC pressure in my library! :D Inspired by similar changes to `json` and `msgpack`: - ruby/json#451 - msgpack/msgpack-ruby#196 My goal is `Object` allocation reduction in the `Ox::Sax` handler of my MIME Type file-identification library caused by the large number of duplicate `String`s in the `shared-mime-info`-format XML it uses as a data source. I was already calling `#-@` (or using the `-some_str_variable` syntax) everywhere possible in my handler, but that only freezes and deduplicates an already-allocated `String`: irb(main):068:0> oid = proc { puts "#{_1} is object_id #{_1.object_id}" } => #<Proc:0x0000564d53eb6850 (irb):66> irb(main):069:0> lol = 'lol'.tap(&oid)[email protected](&oid) lol is object_id 19800 lol is object_id 19740 => "lol" irb(main):070:0> -lol.tap(&oid)[email protected](&oid) lol is object_id 19740 lol is object_id 19740 => "lol" That avoids duplicate retained `String`s but still causes my library to take a big GC hit right at startup when it loads its data. All the stats below were collected using SamSaffron's `memory_profiler`: https://github.com/SamSaffron/memory_profiler First, a sanity-check to make sure I didn't break MRI < 3.0, using MRI 2.7 + latest official `Ox` from RubyGems (2.14.5): [okeeblow@emi#CHECKING-YOU-OUT] ruby -v ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-linux] [okeeblow@emi#CHECKING-YOU-OUT] ./bin/are-we-unallocated-yet|grep Total Total allocated: 20561351 bytes (401372 objects) Total retained: 1680971 bytes (22102 objects) versus same MRI 2.7 but with my patched `Ox`: [okeeblow@emi#CHECKING-YOU-OUT] gem install ../../ox/ox-2.14.5.gem [okeeblow@emi#CHECKING-YOU-OUT] ./bin/are-we-unallocated-yet|grep Total Total allocated: 20561111 bytes (401366 objects) Total retained: 1680971 bytes (22102 objects) No difference between the two, showing that this patch is a no-op for pre-3.0. Sorry I did not test older versions of MRI or other Rubies like J/Truffle/etc since I don't have them on my system. Now the real gainz can be seen when comparing MRI 3.0 with unpatched `Ox`: [okeeblow@emi#CHECKING-YOU-OUT] bundle install Fetching gem metadata from https://rubygems.org/...... … Installing ox 2.14.5 with native extensions Using checking-you-out 0.7.0 from source at `.` Bundle complete! 11 Gemfile dependencies, 19 gems now installed. [okeeblow@emi#CHECKING-YOU-OUT] ./bin/are-we-unallocated-yet|grep Total Total allocated: 20081080 bytes (390133 objects) Total retained: 1713209 bytes (22095 objects) against same MRI 3.0 with my patched `Ox`: [okeeblow@emi#CHECKING-YOU-OUT] gem install ../../ox/ox-2.14.5.gem Building native extensions. This could take a while... Successfully installed ox-2.14.5 Parsing documentation for ox-2.14.5 unknown encoding name ""UTF-8"" for README.md, skipping Installing ri documentation for ox-2.14.5 Done installing documentation for ox after 0 seconds 1 gem installed [okeeblow@emi#CHECKING-YOU-OUT] ./bin/are-we-unallocated-yet|grep Total Total allocated: 17298804 bytes (322860 objects) Total retained: 1713202 bytes (22073 objects) against same MRI 3.0, my patched `Ox`, and `Ox.parse_sax(intern_strings: true)`: [okeeblow@emi#CHECKING-YOU-OUT] ./bin/are-we-unallocated-yet|grep Total Total allocated: 16414938 bytes (301382 objects) Total retained: 1713226 bytes (22073 objects) This shows that just having Ruby 3.0 available results in a huge win, and opting into immutable `String`s from `Value#as_s` helps me even more! Unit tests pass: [okeeblow@emi#test] ruby tests.rb Loaded suite tests Started ................. 16 tests, 40 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications 100% passed ..... 18 tests, 42 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications 100% passed ............................................................................................................................... 151 tests, 249 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications 100% passed Apologies for possible indentation issues in this patch. There seem to be lots of existing lines with a mix of tabs and spaces — nbd, it happens — so I tried to match the surrounding lines everywhere I made an addition to make sure things line up and look okay in `git diff`. I only use `Ox` for Sax parsing, not for marshalling, so please scrutinize my changes in `gen_load.c` and friends extra hard in case I omitted any `intern_strings` option checks that could result in an unwary user getting an immutable `String` where there wasn't one before. The one change I had to make to a `String#force_encoding`-using test case is an example of what I want to avoid surprising anyone with.
For what it's worth Ox is already the fastest Ruby XML parser for my use case even without this change. My library wouldn't be feasible without it since it's the only one that can parse |
I'll look over this tonight. Good to know about the latests string optimizations. |
…ax` example. For ohler55#277 I've been experimenting with `Ractor`s since upgrading to Ruby 3.0 for ohler55#275 but quickly ran into `Ractor::UnsafeError` when trying to call `Ox::sax_parse` in anything except the main `Ractor`. Per Ruby's `Ractor` C extension documentation (link below): "By default, all C extensions are recognized as `Ractor`-unsafe. If C extension becomes `Ractor`-safe, the extension should call `rb_ext_ractor_safe(true)` at the `Init_` function and all defined method marked as `Ractor`-safe. `Ractor`-unsafe C-methods only been called from main-ractor. If non-main ractor calls it, then `Ractor::UnsafeError` is raised." I don't like to open seemingly-large feature requests like this without making some attempt at it myself first, and luckily it seems like `Ox::sax_parse` Just Works™ since I marked it `Ractor`-safe, even with the `class_cache`. Confirming this safety, making any remaining changes to `Ox::Sax`, and expanding this to the non-`Sax` parts of `Ox` are all unfortunately out of my depth as a n00b C coder, so I would appreciate if you could take this over if it interests you. I am happy with just `Sax` support since I have no current need for marshalling, but I imagine other `Ox` users wouldn't be satisfied if stratified. In this commit: - Enable `rb_ext_ractor_safe` preprocessor macro via `have_func` in `extconf.rb`. - Mark `Init_Ox` and `ox_sax_parse` as `Ractor` -safe. - Add a new `Ractor`-based `Ox::Sax` example exercising both parallel and serial `Ox::Sax` handler `Ractor`s to parse data from `shared-mime-info` XML files many users likely already have on their systems. Official `Ractor` info: - "Ractor: a proposal for a new concurrent abstraction without thread-safety issues": https://bugs.ruby-lang.org/issues/17100 (ruby/ruby#3365) - Ruby's official `Ractor` documentation: https://docs.ruby-lang.org/en/master/doc/ractor_md.html - "A way to mark C extensions as thread-safe, Ractor-safe, or unsafe": https://bugs.ruby-lang.org/issues/17307 (ruby/ruby#3824) - Ruby's C Extension `Ractor` documention covering `rb_ext_ractor_safe`: https://docs.ruby-lang.org/en/master/doc/extension_rdoc.html#label-Appendix+F.+Ractor+support - A `Ractor` C Extension from the creator of `Ractor` that might serve as a useful example: https://github.com/ko1/ractor-tvar Blogs: - "Ractors: Multi-Core Parallel Processing Comes to Ruby 3": https://www.ruby3.dev/ruby-3-fundamentals/2021/01/27/ractors-multi-core-parallel-processing-in-ruby-3/ - "Ruby Ractor Experiments: Safe async communication" :https://ivoanjo.me/blog/2021/02/14/ractor-experiments-safe-async/ - "Playing with Ruby Ractors": https://billy-ruffian.co.uk/playing-with-ruby-ractors/ - "How Fast are Ractors?": https://www.fastruby.io/blog/ruby/performance/how-fast-are-ractors.html (https://github.com/noahgibbs/ractor_basic_benchmarks/tree/main/benchmarks) Before this change: ``` [okeeblow@emi#CHECKING-YOU-OUT] time ./bin/checking-you-out ~/224031-dot-jpg Received /home/okeeblow/.local/share/mime/packages/user-extension-rsrc.xml /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/ghost_revival/mr_mime.rb:375:in `sax_parse': ractor unsafe method called from not main ractor (Ractor::UnsafeError) from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/ghost_revival/mr_mime.rb:375:in `block in open' from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/ghost_revival/mr_mime.rb:331:in `open' from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/ghost_revival/mr_mime.rb:331:in `open' from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/ghost_revival.rb:24:in `block (2 levels) in remember_me' from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/ghost_revival.rb:19:in `loop' from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/ghost_revival.rb:19:in `block in remember_me' <internal:ractor>:583:in `send': The incoming-port is already closed (Ractor::ClosedError) from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/ghost_revival.rb:86:in `block in extended' from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/ghost_revival.rb:63:in `each' from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/ghost_revival.rb:63:in `each_with_object' from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/ghost_revival.rb:63:in `extended' from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/inner_spirit.rb:216:in `extend' from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/inner_spirit.rb:216:in `<top (required)>' from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out.rb:10:in `require_relative' from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out.rb:10:in `<top (required)>' from ./bin/checking-you-out:3:in `require_relative' from ./bin/checking-you-out:3:in `<main>' ./bin/checking-you-out ~/224031-dot-jpg 0.12s user 0.05s system 100% cpu 0.168 total ``` My new `Ox::Sax` Ractor example script's usage: ``` [okeeblow@emi#ox] ./examples/sax_ractor.rb Please provide the path to a `shared-mime-info` XML package and some media-type query arguments (e.g. 'image/jpeg') ``` ``` [okeeblow@emi#ox] ./examples/sax_ractor.rb /usr/share/mime/packages/freedesktop.org.xml Please provide some media-type query arguments (e.g. 'image/jpeg') ``` Finding all-extant types: ``` [okeeblow@emi#ox] ./examples/sax_ractor.rb /usr/share/mime/packages/freedesktop.org.xml image/jpeg font/ttf application/xhtml+xml image/x-pict Parallel Ractors ["Worker 0 gave us JPEG 影像 (image/jpeg) [.jpeg,.jpg,.jpe]", "Worker 1 gave us TrueType 字型 (font/ttf) [.ttf]", "Worker 2 gave us XHTML 網頁 (application/xhtml+xml) [.xhtml,.xht]", "Worker 3 gave us Macintosh Quickdraw/PICT 繪圖 (image/x-pict) [.pct,.pict,.pict1,.pict2]"] Serial Ractor "ONLY ONE OX gave us [#<CYO JPEG 影像 (image/jpeg) [.jpeg,.jpg,.jpe]>, #<CYO TrueType 字型 (font/ttf) [.ttf]>, #<CYO XHTML 網頁 (application/xhtml+xml) [.xhtml,.xht]>, #<CYO Macintosh Quickdraw/PICT 繪圖 (image/x-pict) [.pct,.pict,.pict1,.pict2]>]" ``` …and not finding invalid ones: ``` [okeeblow@emi#ox] ./examples/sax_ractor.rb /usr/share/mime/packages/freedesktop.org.xml lol/rofl fart/butt image/jpeg Parallel Ractors ["Worker 0 gave us nothing", "Worker 1 gave us nothing", "Worker 2 gave us JPEG 影像 (image/jpeg) [.jpeg,.jpg,.jpe]"] Serial Ractor "ONLY ONE OX gave us [nil, nil, #<CYO JPEG 影像 (image/jpeg) [.jpeg,.jpg,.jpe]>]" ``` Unit tests pass.
…ax` example. (#278) For #277 I've been experimenting with `Ractor`s since upgrading to Ruby 3.0 for #275 but quickly ran into `Ractor::UnsafeError` when trying to call `Ox::sax_parse` in anything except the main `Ractor`. Per Ruby's `Ractor` C extension documentation (link below): "By default, all C extensions are recognized as `Ractor`-unsafe. If C extension becomes `Ractor`-safe, the extension should call `rb_ext_ractor_safe(true)` at the `Init_` function and all defined method marked as `Ractor`-safe. `Ractor`-unsafe C-methods only been called from main-ractor. If non-main ractor calls it, then `Ractor::UnsafeError` is raised." I don't like to open seemingly-large feature requests like this without making some attempt at it myself first, and luckily it seems like `Ox::sax_parse` Just Works™ since I marked it `Ractor`-safe, even with the `class_cache`. Confirming this safety, making any remaining changes to `Ox::Sax`, and expanding this to the non-`Sax` parts of `Ox` are all unfortunately out of my depth as a n00b C coder, so I would appreciate if you could take this over if it interests you. I am happy with just `Sax` support since I have no current need for marshalling, but I imagine other `Ox` users wouldn't be satisfied if stratified. In this commit: - Enable `rb_ext_ractor_safe` preprocessor macro via `have_func` in `extconf.rb`. - Mark `Init_Ox` and `ox_sax_parse` as `Ractor` -safe. - Add a new `Ractor`-based `Ox::Sax` example exercising both parallel and serial `Ox::Sax` handler `Ractor`s to parse data from `shared-mime-info` XML files many users likely already have on their systems. Official `Ractor` info: - "Ractor: a proposal for a new concurrent abstraction without thread-safety issues": https://bugs.ruby-lang.org/issues/17100 (ruby/ruby#3365) - Ruby's official `Ractor` documentation: https://docs.ruby-lang.org/en/master/doc/ractor_md.html - "A way to mark C extensions as thread-safe, Ractor-safe, or unsafe": https://bugs.ruby-lang.org/issues/17307 (ruby/ruby#3824) - Ruby's C Extension `Ractor` documention covering `rb_ext_ractor_safe`: https://docs.ruby-lang.org/en/master/doc/extension_rdoc.html#label-Appendix+F.+Ractor+support - A `Ractor` C Extension from the creator of `Ractor` that might serve as a useful example: https://github.com/ko1/ractor-tvar Blogs: - "Ractors: Multi-Core Parallel Processing Comes to Ruby 3": https://www.ruby3.dev/ruby-3-fundamentals/2021/01/27/ractors-multi-core-parallel-processing-in-ruby-3/ - "Ruby Ractor Experiments: Safe async communication" :https://ivoanjo.me/blog/2021/02/14/ractor-experiments-safe-async/ - "Playing with Ruby Ractors": https://billy-ruffian.co.uk/playing-with-ruby-ractors/ - "How Fast are Ractors?": https://www.fastruby.io/blog/ruby/performance/how-fast-are-ractors.html (https://github.com/noahgibbs/ractor_basic_benchmarks/tree/main/benchmarks) Before this change: ``` [okeeblow@emi#CHECKING-YOU-OUT] time ./bin/checking-you-out ~/224031-dot-jpg Received /home/okeeblow/.local/share/mime/packages/user-extension-rsrc.xml /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/ghost_revival/mr_mime.rb:375:in `sax_parse': ractor unsafe method called from not main ractor (Ractor::UnsafeError) from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/ghost_revival/mr_mime.rb:375:in `block in open' from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/ghost_revival/mr_mime.rb:331:in `open' from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/ghost_revival/mr_mime.rb:331:in `open' from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/ghost_revival.rb:24:in `block (2 levels) in remember_me' from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/ghost_revival.rb:19:in `loop' from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/ghost_revival.rb:19:in `block in remember_me' <internal:ractor>:583:in `send': The incoming-port is already closed (Ractor::ClosedError) from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/ghost_revival.rb:86:in `block in extended' from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/ghost_revival.rb:63:in `each' from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/ghost_revival.rb:63:in `each_with_object' from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/ghost_revival.rb:63:in `extended' from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/inner_spirit.rb:216:in `extend' from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out/inner_spirit.rb:216:in `<top (required)>' from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out.rb:10:in `require_relative' from /home/okeeblow/Works/DistorteD/CHECKING-YOU-OUT/lib/checking-you-out.rb:10:in `<top (required)>' from ./bin/checking-you-out:3:in `require_relative' from ./bin/checking-you-out:3:in `<main>' ./bin/checking-you-out ~/224031-dot-jpg 0.12s user 0.05s system 100% cpu 0.168 total ``` My new `Ox::Sax` Ractor example script's usage: ``` [okeeblow@emi#ox] ./examples/sax_ractor.rb Please provide the path to a `shared-mime-info` XML package and some media-type query arguments (e.g. 'image/jpeg') ``` ``` [okeeblow@emi#ox] ./examples/sax_ractor.rb /usr/share/mime/packages/freedesktop.org.xml Please provide some media-type query arguments (e.g. 'image/jpeg') ``` Finding all-extant types: ``` [okeeblow@emi#ox] ./examples/sax_ractor.rb /usr/share/mime/packages/freedesktop.org.xml image/jpeg font/ttf application/xhtml+xml image/x-pict Parallel Ractors ["Worker 0 gave us JPEG 影像 (image/jpeg) [.jpeg,.jpg,.jpe]", "Worker 1 gave us TrueType 字型 (font/ttf) [.ttf]", "Worker 2 gave us XHTML 網頁 (application/xhtml+xml) [.xhtml,.xht]", "Worker 3 gave us Macintosh Quickdraw/PICT 繪圖 (image/x-pict) [.pct,.pict,.pict1,.pict2]"] Serial Ractor "ONLY ONE OX gave us [#<CYO JPEG 影像 (image/jpeg) [.jpeg,.jpg,.jpe]>, #<CYO TrueType 字型 (font/ttf) [.ttf]>, #<CYO XHTML 網頁 (application/xhtml+xml) [.xhtml,.xht]>, #<CYO Macintosh Quickdraw/PICT 繪圖 (image/x-pict) [.pct,.pict,.pict1,.pict2]>]" ``` …and not finding invalid ones: ``` [okeeblow@emi#ox] ./examples/sax_ractor.rb /usr/share/mime/packages/freedesktop.org.xml lol/rofl fart/butt image/jpeg Parallel Ractors ["Worker 0 gave us nothing", "Worker 1 gave us nothing", "Worker 2 gave us JPEG 影像 (image/jpeg) [.jpeg,.jpg,.jpe]"] Serial Ractor "ONLY ONE OX gave us [nil, nil, #<CYO JPEG 影像 (image/jpeg) [.jpeg,.jpg,.jpe]>]" ``` Unit tests pass.
Can this be closed? Is it still relevant? |
I have a PR ready to go for this feature but this can be for discussion of the general concept.
The text was updated successfully, but these errors were encountered: