Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ByteSubstring: add an option to used an owned needle (not borrowed) #50

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

tsnoam
Copy link

@tsnoam tsnoam commented Sep 29, 2021

This PR is a suggestion. I have done a minimal implementation, just so we have something to discuss.

The problem I was facing was that I couldn't use a borrowed ByteSubstring. So, I've modified jetscii's code to accept a Cow object.

Let me know what you think and if there're are any more modifications you would like to see as part of this PR.

@tsnoam
Copy link
Author

tsnoam commented Jun 28, 2022

@shepmaster
kind ping on that.

i would be happy to rebase on current main but I would like to know if you'd be interested in this change

@shepmaster
Copy link
Owner

shepmaster commented Jul 6, 2022

Oh, wow, I'm sorry that this totally dropped off my radar!

I think the idea has merit, but I think I would suggest a different implementation.

Right now, we have crate::ByteSubstring which delegates to crate::fallback::ByteSubstring and crate::simd::ByteSubstring. In order to support your goal, I'd suggest:

  1. refactoring commit
    • Stop storing the needle in {fallback,simd}::ByteSubstring, instead passing it into find.
    • Start storing the needle in crate::ByteSubstring instead.
  2. feature commit
    • Add a new crate::OwnedByteSubstring that holds a Vec<u8>
    • This calls into {fallback,simd}::ByteSubstring::find.

This way, you can choose if you need ownership or just a reference.

Does that make sense and seem plausible?

@shepmaster
Copy link
Owner

After musing on it for a while I wonder if we could change ByteSubstring to be generic with T: AsRef<[u8]>.

@tsnoam
Copy link
Author

tsnoam commented Jul 14, 2022

Hi @shepmaster
Sorry for the late response. Yes, I like your suggestion. I can contribute these changes and obviously base it on the latest "main" branch.

@shepmaster
Copy link
Owner

Sorry for the late response

i think you’ve got a few months longer before you need to start apologizing for anything like that 😅

@tsnoam
Copy link
Author

tsnoam commented Aug 25, 2022

@shepmaster I've updated the PR in accordance with our discussion:

  1. ByteSubstring takes T: AsRef<[u8]
  2. I also added what I thought was an appropriate way to do the same thing for Substring, without breaking ergonomics or backward compat

let me know what you think.

@tsnoam
Copy link
Author

tsnoam commented Aug 28, 2022

I just realized that I only compiled it on my macos m1. So only the fallback code was tested 🤦‍♂️

I will fix and update here...

@tsnoam
Copy link
Author

tsnoam commented Aug 28, 2022

Ok. Fixed.

@shepmaster would really love to see your comments.

@tsnoam
Copy link
Author

tsnoam commented Sep 5, 2022

@shepmaster kind ping on that

@tsnoam
Copy link
Author

tsnoam commented Sep 12, 2022

@shepmaster kind ping :)

@shepmaster shepmaster force-pushed the bytesubstring_owned branch 2 times, most recently from 4c42031 to 1248666 Compare September 13, 2022 17:55
@shepmaster
Copy link
Owner

I did one force-push to tidy up some formatting things, and a second force-push to rebase on the current main branch (which has some soundness fixes).

Overall, things seem reasonable, but I'm a bit hesitant about the dispatch macro vs enum solution. The new code has to check the enum discriminant each call (e.g. to find). The old code would use the feature test, which I believe would do some trickery to perform the test just once and then effectively set a global value that could be checked in the future. I'll need to do some performance tests.

Have you tried this code locally with a String? Does it work for your original case?

shepmaster and others added 6 commits September 14, 2022 13:08
This allows to either own (consume) the needle or keep a reference to it.
This allows to either own (consume) the needle or keep a reference to it.
This gains two purposes:
1. Fix ownership issue of trying to own needle `T` by both simd & fallback
   modules.

2. Simplifies the runtime checks done by the code when running in `maybe` mode.
@tsnoam
Copy link
Author

tsnoam commented Sep 14, 2022

Hi @shepmaster

Thanks for the review.

  1. ACK for the formatting and rebasing over master.
  2. Basically, I wanted to hear your initial feedback about the code before working harder to make things perfect ;)
  3. However, I did not realize that unitests should be executed with all features turned on (I did that and fixed the Substring code).
  4. I did notice that Substring might have backward-incompatible type definition. I'm not 100% sure that this is the way to go, but I think it is quite clean and obvious, will be glad to know what you think of it.
  5. Regarding the dispatch macro: there is no indication in rust documentation that this is cached in any way ( https://doc.rust-lang.org/std/macro.is_x86_feature_detected.html ), so I decided to check it out - first by benchmarks (I couldn't find any significant difference in either direction) but then I disassembled the compiled executable (using objdump -d). The new approach was shorter in about 50% for the ByteSubstring::find code.

Bottom line, I fixed the code a little more (now for all features), benchmarks are running properly - and obviously the new code was forced pushed into the branch.

@shepmaster
Copy link
Owner

I don't understand exactly how, but the benchmarks all show a marked decrease in speed with this branch. That's even including the benchmarks for the standard library versions, which obviously shouldn't change!

 name                                         before ns/iter         after ns/iter          diff ns/iter  diff %  speedup 
 bench::space_ascii_chars                     356,001 (14727 MB/s)   362,121 (14478 MB/s)          6,120   1.72%   x 0.98 
 bench::space_stdlib_find_char                279,225 (18776 MB/s)   278,723 (18810 MB/s)           -502  -0.18%   x 1.00 
 bench::space_stdlib_find_char_set            2,608,215 (2010 MB/s)  2,626,160 (1996 MB/s)        17,945   0.69%   x 0.99 
 bench::space_stdlib_find_closure             2,318,368 (2261 MB/s)  2,390,488 (2193 MB/s)        72,120   3.11%   x 0.97 
 bench::space_stdlib_find_string              2,153,673 (2434 MB/s)  2,268,436 (2311 MB/s)       114,763   5.33%   x 0.95 
 bench::space_stdlib_iterator_position        1,160,877 (4516 MB/s)  1,198,881 (4373 MB/s)        38,004   3.27%   x 0.97 
 bench::substring_stdlib_find                 463,832 (11303 MB/s)   469,920 (11156 MB/s)          6,088   1.31%   x 0.99 
 bench::substring_with_created_searcher       358,074 (14641 MB/s)   361,645 (14497 MB/s)          3,571   1.00%   x 0.99 
 bench::xml_delim_3_ascii_chars               361,004 (14523 MB/s)   362,989 (14443 MB/s)          1,985   0.55%   x 0.99 
 bench::xml_delim_3_stdlib_find_char_closure  3,516,929 (1490 MB/s)  3,459,551 (1515 MB/s)       -57,378  -1.63%   x 1.02 
 bench::xml_delim_3_stdlib_find_char_set      2,612,870 (2006 MB/s)  3,456,608 (1516 MB/s)       843,738  32.29%   x 0.76 
 bench::xml_delim_3_stdlib_iterator_position  1,131,135 (4635 MB/s)  1,155,312 (4538 MB/s)        24,177   2.14%   x 0.98 
 bench::xml_delim_5_ascii_chars               360,298 (14551 MB/s)   360,735 (14533 MB/s)            437   0.12%   x 1.00 
 bench::xml_delim_5_stdlib_find_char_closure  2,715,305 (1930 MB/s)  3,458,322 (1516 MB/s)       743,017  27.36%   x 0.79 
 bench::xml_delim_5_stdlib_find_char_set      3,449,795 (1519 MB/s)  3,473,528 (1509 MB/s)        23,733   0.69%   x 0.99 
 bench::xml_delim_5_stdlib_iterator_position  1,138,696 (4604 MB/s)  1,158,930 (4523 MB/s)        20,234   1.78%   x 0.98 

@shepmaster
Copy link
Owner

might have backward-incompatible type definition

Yeah, I think that this change will warrant a new semver-incompatible version to be released; that's not a big deal IMO.

there is no indication in rust documentation that this is cached in any way

Yeah, I read through that before posting my comment (that's what led to the "I believe"). It must have been made up in my head.

@tsnoam
Copy link
Author

tsnoam commented Sep 14, 2022

I don't understand exactly how, but the benchmarks all show a marked decrease in speed with this branch. That's even including the benchmarks for the standard library versions, which obviously shouldn't change!

That is to be expected. Benchmarks will usually show differences of +-3% depending on their nature and whatever the benchmarking server is doing at the moment.

What I usually did when I was running benchmarks using nightly-rust was to externally run them again and again and look at the avg/max/min of each individual benchmark.

If you would like, it is possible to migrate the benchmarks to use criterion, which saves you the hassle of what described above. There is also a nice tool called critcmp which helps in comparing the results of two different criterion executions.

@tsnoam
Copy link
Author

tsnoam commented Sep 14, 2022

oh, and criterion also runs with stable rust, which is a bonus...

@tsnoam
Copy link
Author

tsnoam commented Sep 14, 2022

sorry, i'm looking again at the stdlib difference, and it is weird. too high of a diff.

let me look into it a little more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants