Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

source: Remove unnecessary string length comparisons in the case of string comparisons #116

Merged
merged 1 commit into from
Mar 3, 2024

Conversation

naitoh
Copy link
Contributor

@naitoh naitoh commented Mar 2, 2024

Why

rexml/lib/rexml/source.rb

Lines 208 to 234 in 370666e

def readline
str = @source.readline(@line_break)
if @pending_buffer
if str.nil?
str = @pending_buffer
else
str = @pending_buffer + str
end
@pending_buffer = nil
end
return nil if str.nil?
if @to_utf
decode(str)
else
str.force_encoding(::Encoding::UTF_8) if @force_utf8
str
end
end
def encoding_updated
case @encoding
when "UTF-16BE", "UTF-16LE"
@source.binmode
@source.set_encoding(@encoding, @encoding)
end
@line_break = encode(">")

Because @line_break = encode(">"), the end of @scanner << readline is one of the following.

  1. ">"
  2. "X>"
  3. "X" (eof)

This will not be matched by additional reads in the following cases.

  • @source.match("<?")
  • @source.match("--")
  • @source.match("DOCTYPE")

In the following cases, additional reads may result in a match, but explicitly prohibiting such a specification with a comment makes the string length check unnecessary.

  • @source.match(">>")
  • @source.match(">X")

Benchmark

RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/naitoh/.rbenv/versions/3.3.0/bin/ruby -v -S benchmark-driver /Users/naitoh/ghq/github.com/naitoh/rexml/benchmark/parse.yaml
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin22]
Calculating -------------------------------------
                         before       after  before(YJIT)  after(YJIT)
                 dom     10.689      10.736        18.484       18.108 i/s -     100.000 times in 9.355754s 9.314792s 5.409984s 5.522527s
                 sax     30.793      31.583        52.965       52.641 i/s -     100.000 times in 3.247486s 3.166258s 1.888036s 1.899660s
                pull     36.308      37.182        63.773       64.669 i/s -     100.000 times in 2.754203s 2.689440s 1.568069s 1.546325s
              stream     34.936      35.991        56.830       57.729 i/s -     100.000 times in 2.862361s 2.778467s 1.759632s 1.732238s

Comparison:
                              dom
        before(YJIT):        18.5 i/s
         after(YJIT):        18.1 i/s - 1.02x  slower
               after:        10.7 i/s - 1.72x  slower
              before:        10.7 i/s - 1.73x  slower

                              sax
        before(YJIT):        53.0 i/s
         after(YJIT):        52.6 i/s - 1.01x  slower
               after:        31.6 i/s - 1.68x  slower
              before:        30.8 i/s - 1.72x  slower

                             pull
         after(YJIT):        64.7 i/s
        before(YJIT):        63.8 i/s - 1.01x  slower
               after:        37.2 i/s - 1.74x  slower
              before:        36.3 i/s - 1.78x  slower

                           stream
         after(YJIT):        57.7 i/s
        before(YJIT):        56.8 i/s - 1.02x  slower
               after:        36.0 i/s - 1.60x  slower
              before:        34.9 i/s - 1.65x  slower
  • YJIT=ON : 0.98x - 1.02x faster
  • YJIT=OFF : 1.00x - 1.03x faster

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@@ -161,6 +161,9 @@ def read
end
end

# Note: "When specifying a string for 'pattern', it must not include '>' except in the following format."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Note: "When specifying a string for 'pattern', it must not include '>' except in the following format."
# Note: When specifying a string for 'pattern', it must not include '>' except in the following formats:

@kou kou changed the title Remove unnecessary string length comparisons in the case of string comparisons source: Remove unnecessary string length comparisons in the case of string comparisons Mar 3, 2024
…tring comparisons

## Why
https://github.com/ruby/rexml/blob/370666e314816b57ecd5878e757224c3b6bc93f5/lib/rexml/source.rb#L208-L234

Because `@line_break = encode(">")`, the end of `@scanner << readline` is one of the following.

1. ">"
2. "X>"
3. "X" (eof)

This will not be matched by additional reads in the following cases.

- `@source.match("<?")`
- `@source.match("--")`
- `@source.match("DOCTYPE")`

In the following cases, additional reads may result in a match, but explicitly prohibiting such a specification with a comment makes the string length check unnecessary.
- `@source.match(">>")`
- `@source.match(">X")`

## Benchmark

```
RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/naitoh/.rbenv/versions/3.3.0/bin/ruby -v -S benchmark-driver /Users/naitoh/ghq/github.com/naitoh/rexml/benchmark/parse.yaml
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin22]
Calculating -------------------------------------
                         before       after  before(YJIT)  after(YJIT)
                 dom     10.689      10.736        18.484       18.108 i/s -     100.000 times in 9.355754s 9.314792s 5.409984s 5.522527s
                 sax     30.793      31.583        52.965       52.641 i/s -     100.000 times in 3.247486s 3.166258s 1.888036s 1.899660s
                pull     36.308      37.182        63.773       64.669 i/s -     100.000 times in 2.754203s 2.689440s 1.568069s 1.546325s
              stream     34.936      35.991        56.830       57.729 i/s -     100.000 times in 2.862361s 2.778467s 1.759632s 1.732238s

Comparison:
                              dom
        before(YJIT):        18.5 i/s
         after(YJIT):        18.1 i/s - 1.02x  slower
               after:        10.7 i/s - 1.72x  slower
              before:        10.7 i/s - 1.73x  slower

                              sax
        before(YJIT):        53.0 i/s
         after(YJIT):        52.6 i/s - 1.01x  slower
               after:        31.6 i/s - 1.68x  slower
              before:        30.8 i/s - 1.72x  slower

                             pull
         after(YJIT):        64.7 i/s
        before(YJIT):        63.8 i/s - 1.01x  slower
               after:        37.2 i/s - 1.74x  slower
              before:        36.3 i/s - 1.78x  slower

                           stream
         after(YJIT):        57.7 i/s
        before(YJIT):        56.8 i/s - 1.02x  slower
               after:        36.0 i/s - 1.60x  slower
              before:        34.9 i/s - 1.65x  slower
```

- YJIT=ON : 0.98x - 1.02x faster
- YJIT=OFF : 1.00x - 1.03x faster
@naitoh naitoh force-pushed the del_bytesize_check branch from 84ba8bd to 261e10b Compare March 3, 2024 09:17
@naitoh naitoh requested a review from kou March 3, 2024 09:17
@kou kou merged commit 19975fe into ruby:master Mar 3, 2024
39 checks passed
@naitoh naitoh deleted the del_bytesize_check branch March 3, 2024 09:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants