Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reduce single_char_pattern to only lint on ascii chars #11852

Merged
merged 2 commits into from
Apr 22, 2024

Conversation

llogiq
Copy link
Contributor

@llogiq llogiq commented Nov 21, 2023

This should mostly fix the single_char_pattern lint, because with a single byte, the optimizer will usually see through the char-to-string-expansion and single loop iteration. This fixes #11675 and #8111.

Update: As per the meeting on November 28th, 2023, we voted to also downgrade the lint to pedantic.


changelog: downgrade [single_char_pattern] to pedantic

@rustbot
Copy link
Collaborator

rustbot commented Nov 21, 2023

r? @Alexendoo

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties label Nov 21, 2023
@Alexendoo
Copy link
Member

Maybe it's time to retire this lint? Which is faster seems unpredictable and subject to change across versions, it seems like a better avenue would be opening issues in rustc for any case where one is slower than the other

Comment on lines 1133 to 1134
/// Performing these methods using a `char` can be faster than
/// using a `str` because it needs one less indirection.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not true in the current implementation. If you want to be implementation agnostic, you might as well write the opposite:

Suggested change
/// Performing these methods using a `char` can be faster than
/// using a `str` because it needs one less indirection.
/// Performing these methods using a `str` can be faster than
/// using a `char` because it needs one less conversion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It apparently depends on a lot of factors. The char based version stores the UTF8 data inline, whereas the &str one stores a ref to the caller-supplied string slice instead, so if the Searcher instantiation is const, it gets by with one less pointer.

Of course e.g. the fastest starts_with for an ascii character is haystack.as_bytes().get(0) == Some(&(c as u8)), dropping below a nanosecond in my benchmarks.

@flip1995
Copy link
Member

Maybe it's time to retire this lint?

At least I think we should move it to style, as it no longer affects performance IIUC.

@Alexendoo Alexendoo added the I-nominated Issue: Nominated to be discussed at the next Clippy meeting label Nov 27, 2023
@llogiq llogiq force-pushed the single-char-pattern-ascii-only branch from d69a594 to 3d98b67 Compare November 29, 2023 07:11
@flip1995 flip1995 removed the I-nominated Issue: Nominated to be discussed at the next Clippy meeting label Dec 11, 2023
@bors
Copy link
Contributor

bors commented Jan 3, 2024

☔ The latest upstream changes (presumably #12030) made this pull request unmergeable. Please resolve the merge conflicts.

@llogiq
Copy link
Contributor Author

llogiq commented Jan 6, 2024

@Alexendoo should I rebase or close?

@Alexendoo
Copy link
Member

Sorry for the delay, given the decision to move it to pedantic do we still want to make the multibyte change? If we're dropping the perf aspect of the lint to be more a stylistic one I think it would make sense to lint multibyte characters still & modify the lint description to be less perf focused

@llogiq
Copy link
Contributor Author

llogiq commented Jan 7, 2024

The problem I see with that is that for multibyte inputs using a char may actually hurt performance. So yes, I believe we should reduce the lint scope even as we move it to pedantic.

@xFrednet
Copy link
Member

xFrednet commented Apr 1, 2024

Hey @llogiq, this is a ping from triage, since there hasn't been any activity in some time. Are you still planning to continue this implementation?

If you have any questions, you're always welcome to ask them in this PR or on Zulip.

@rustbot author

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action from the author. (Use `@rustbot ready` to update this status) and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties labels Apr 1, 2024
@llogiq
Copy link
Contributor Author

llogiq commented Apr 1, 2024

I'll take another look during the next week.

@Alexendoo
Copy link
Member

Ignoring multibyte chars as a known issue sounds reasonable to avoid a perf regression if it's still there, ideally we'd have a rustc issue to link to

I still think the description should be changed though, the perf angle doesn't hold up to me

@llogiq llogiq force-pushed the single-char-pattern-ascii-only branch from 3d98b67 to 1a69f84 Compare April 11, 2024 18:32
@llogiq llogiq force-pushed the single-char-pattern-ascii-only branch from 1a69f84 to 54de78a Compare April 11, 2024 20:24
@llogiq
Copy link
Contributor Author

llogiq commented Apr 11, 2024

@Alexendoo / @xFrednet I rebased the implementation and updated the docs. r?

@rust-lang rust-lang deleted a comment from rustbot Apr 13, 2024
@xFrednet
Copy link
Member

Looks good to me.

Thank you for the update :)

@bors r+

@bors
Copy link
Contributor

bors commented Apr 22, 2024

📌 Commit 54de78a has been approved by xFrednet

It is now in the queue for this repository.

@xFrednet xFrednet assigned xFrednet and unassigned Alexendoo Apr 22, 2024
@bors
Copy link
Contributor

bors commented Apr 22, 2024

⌛ Testing commit 54de78a with merge fc6dfeb...

@bors
Copy link
Contributor

bors commented Apr 22, 2024

☀️ Test successful - checks-action_dev_test, checks-action_remark_test, checks-action_test
Approved by: xFrednet
Pushing fc6dfeb to master...

@bors bors merged commit fc6dfeb into master Apr 22, 2024
6 checks passed
@llogiq llogiq deleted the single-char-pattern-ascii-only branch April 22, 2024 16:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action from the author. (Use `@rustbot ready` to update this status)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

single_char_pattern suggests using char instead of str for starts_with
7 participants