chore: Improve containsLower performance using quick rejection #15076

Open
wants to merge 3 commits into
base: main

Conversation

cyriltovena
Contributor

@cyriltovena cyriltovena commented Nov 22, 2024

This pull request includes improvements to the containsLower function in pkg/logql/log/filter.go for better performance and adds benchmark tests to ensure the changes are effective. The key changes include optimizing the case-insensitive substring search and introducing comprehensive benchmark tests.

Optimizations to containsLower function:

  • pkg/logql/log/filter.go: Optimized the containsLower function by implementing a more efficient search algorithm that first locates the potential starting byte of the substring and then verifies the rest of the substring with both fast ASCII and slower Unicode comparisons.
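The two-phase idea described above (cheaply reject positions that cannot start a match, then verify the remainder with an ASCII fast path and a slower Unicode path) might look roughly like the following. This is a minimal sketch, not the code from this PR: the names `containsLowerSketch` and `hasPrefixLower` are hypothetical, and it assumes `substr` has already been lowercased by the caller.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// containsLowerSketch reports whether line contains substr, ignoring case.
// Assumption (not taken from the PR): substr is already lowercased.
func containsLowerSketch(line, substr []byte) bool {
	if len(substr) == 0 {
		return true
	}
	first, _ := utf8.DecodeRune(substr)
	for i := 0; i < len(line); i++ {
		c := line[i]
		if c < utf8.RuneSelf {
			// Quick rejection: lowercase the ASCII byte with arithmetic
			// and skip positions that cannot start a match.
			if 'A' <= c && c <= 'Z' {
				c += 'a' - 'A'
			}
			if rune(c) != first {
				continue
			}
		} else if !utf8.RuneStart(c) {
			// A UTF-8 continuation byte never starts a rune.
			continue
		}
		// Candidate start found: verify the whole substring here.
		if hasPrefixLower(line[i:], substr) {
			return true
		}
	}
	return false
}

// hasPrefixLower reports whether line starts with substr, ignoring case.
func hasPrefixLower(line, substr []byte) bool {
	for len(substr) > 0 {
		if len(line) == 0 {
			return false
		}
		// Fast path: both bytes are ASCII.
		if c, s := line[0], substr[0]; c < utf8.RuneSelf && s < utf8.RuneSelf {
			if 'A' <= c && c <= 'Z' {
				c += 'a' - 'A'
			}
			if c != s {
				return false
			}
			line, substr = line[1:], substr[1:]
			continue
		}
		// Slow path: decode one rune from each side and compare lowercased.
		lr, ln := utf8.DecodeRune(line)
		sr, sn := utf8.DecodeRune(substr)
		if unicode.ToLower(lr) != sr {
			return false
		}
		line, substr = line[ln:], substr[sn:]
	}
	return true
}

func main() {
	fmt.Println(containsLowerSketch([]byte("level=ERROR msg=timeout"), []byte("error")))
	fmt.Println(containsLowerSketch([]byte("ПРИВЕТ мир"), []byte("привет")))
	fmt.Println(containsLowerSketch([]byte("Hello World"), []byte("xyz")))
}
```

Most positions in a typical log line fail the single-byte check, so the more expensive verification runs only at plausible starting points.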

Addition of benchmark tests:

  • pkg/logql/log/filter_test.go: Added a new benchmark function BenchmarkContainsLower with various test cases to evaluate the performance of the containsLower function under different scenarios, including short and long lines, matches and non-matches, and lines with Unicode characters.
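A table-driven benchmark of the kind described above could be sketched as follows. Everything here is illustrative: the case inputs are made up, `containsNaive` is a stand-in for the function under test rather than Loki's `containsLower`, and `BenchmarkContainsSketch` is a hypothetical name. The `main` function uses `testing.Benchmark` so the sketch can run outside `go test`.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// benchCase is a hypothetical table entry. The names mirror some of the
// sub-benchmarks in the PR, but the inputs are invented for illustration.
type benchCase struct {
	name   string
	line   string
	substr string // assumed to be already lowercased
}

func benchCases() []benchCase {
	long := strings.Repeat("level=info msg=\"request served\" ", 20)
	return []benchCase{
		{"short_line_no_match", "Hello World", "xyz"},
		{"short_line_with_match", "Hello World", "world"},
		{"long_line_match_end", long + "ERROR", "error"},
		{"short_unicode_line_with_match", "ПРИВЕТ мир", "привет"},
	}
}

// containsNaive is a stand-in for the function under test.
func containsNaive(line, substr string) bool {
	return strings.Contains(strings.ToLower(line), substr)
}

// sink keeps the compiler from discarding the benchmarked call.
var sink bool

// BenchmarkContainsSketch is the form that would live in a _test.go file
// and run under `go test -bench`.
func BenchmarkContainsSketch(b *testing.B) {
	for _, c := range benchCases() {
		b.Run(c.name, func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				sink = containsNaive(c.line, c.substr)
			}
		})
	}
}

func main() {
	// Run each case individually so the sketch works as a plain program.
	for _, c := range benchCases() {
		c := c
		r := testing.Benchmark(func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				sink = containsNaive(c.line, c.substr)
			}
		})
		fmt.Printf("%-32s %s\n", c.name, r)
	}
}
```

Covering both matching and non-matching inputs matters because they exercise different code paths: a non-match scans the whole line, while an early match returns almost immediately.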

Result:

benchstat before.txt before2.txt
name                                             old time/op    new time/op    delta
ContainsLower/short_line_no_match-16               45.6ns ± 2%    17.6ns ± 5%  -61.28%  (p=0.008 n=5+5)
ContainsLower/short_line_with_match-16             29.3ns ±12%    27.6ns ± 9%     ~     (p=0.222 n=5+5)
ContainsLower/long_line_no_match-16                 486ns ± 5%     247ns ± 2%  -49.14%  (p=0.008 n=5+5)
ContainsLower/long_line_match_start-16             7.70ns ± 3%    8.99ns ± 5%  +16.82%  (p=0.008 n=5+5)
ContainsLower/long_line_match_middle-16             205ns ± 3%     103ns ± 7%  -49.89%  (p=0.008 n=5+5)
ContainsLower/long_line_match_end-16                469ns ± 2%     299ns ±36%  -36.18%  (p=0.008 n=5+5)
ContainsLower/short_unicode_line_no_match-16        203ns ± 4%      46ns ± 9%  -77.27%  (p=0.008 n=5+5)
ContainsLower/short_unicode_line_with_match-16     69.0ns ± 1%    42.1ns ±17%  -38.99%  (p=0.008 n=5+5)
ContainsLower/long_unicode_line_no_match-16        2.03µs ± 1%    0.22µs ± 1%  -89.14%  (p=0.016 n=5+4)
ContainsLower/long_unicode_line_match_start-16      724ns ± 2%     290ns ± 0%  -59.89%  (p=0.016 n=5+4)
ContainsLower/long_unicode_line_match_middle-16     805ns ± 0%      86ns ± 8%  -89.36%  (p=0.016 n=4+5)
ContainsLower/long_unicode_line_match_end-16       3.70µs ± 1%    0.39µs ±11%  -89.35%  (p=0.008 n=5+5)

What this PR does / why we need it:

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • Title matches the required conventional commits format, see here
    • Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

@cyriltovena cyriltovena requested a review from a team as a code owner November 22, 2024 15:20
@cyriltovena
Contributor Author

Looks like there is a test failure, so I will need to check.

@cyriltovena cyriltovena marked this pull request as draft November 22, 2024 15:21
@cyriltovena cyriltovena marked this pull request as ready for review November 26, 2024 14:06
Comment on lines -450 to -457
j := 0
for len(line) > 0 {
	// ascii fast case
	if c := line[0]; c < utf8.RuneSelf && substr[j] < utf8.RuneSelf {
		if c == substr[j] || c+'a'-'A' == substr[j] || c == substr[j]+'a'-'A' {
			j++
			if j == len(substr) {
				return true
Contributor

Interesting. I had always thought Loki was using strings.Index, but I guess one cannot ignore character casing there.

Anyhow, during my work on a SIMD substring search I found that the underlying Rabin-Karp algorithm is so much faster. I'm curious whether it could be adapted to be case-insensitive 🤷
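One way the Rabin-Karp idea might be adapted, as a rough sketch rather than anything proposed in this PR: fold every byte to lowercase before it enters the rolling hash, then confirm hash hits with a case-insensitive comparison. The sketch below folds ASCII letters only (non-ASCII bytes are compared exactly), and the names `indexFoldRK`, `lowerASCII`, and `equalFoldASCII` are hypothetical.

```go
package main

import "fmt"

// primeRK is the multiplier for the rolling hash (the 32-bit FNV prime).
const primeRK = 16777619

// lowerASCII folds 'A'..'Z' to 'a'..'z' and leaves other bytes unchanged.
func lowerASCII(c byte) byte {
	if 'A' <= c && c <= 'Z' {
		return c + ('a' - 'A')
	}
	return c
}

// equalFoldASCII compares two equal-length strings, ignoring ASCII case.
func equalFoldASCII(a, b string) bool {
	for i := 0; i < len(a); i++ {
		if lowerASCII(a[i]) != lowerASCII(b[i]) {
			return false
		}
	}
	return true
}

// indexFoldRK returns the byte index of the first ASCII-case-insensitive
// occurrence of substr in s, or -1 if there is none.
func indexFoldRK(s, substr string) int {
	n := len(substr)
	if n == 0 {
		return 0
	}
	if n > len(s) {
		return -1
	}
	// Hash the folded pattern and compute pow = primeRK^n (mod 2^32).
	var hashPat, pow uint32 = 0, 1
	for i := 0; i < n; i++ {
		hashPat = hashPat*primeRK + uint32(lowerASCII(substr[i]))
		pow *= primeRK
	}
	// Hash the first window of s.
	var h uint32
	for i := 0; i < n; i++ {
		h = h*primeRK + uint32(lowerASCII(s[i]))
	}
	if h == hashPat && equalFoldASCII(s[:n], substr) {
		return 0
	}
	// Roll the window: add the incoming byte, remove the outgoing one.
	for i := n; i < len(s); i++ {
		h = h*primeRK + uint32(lowerASCII(s[i]))
		h -= pow * uint32(lowerASCII(s[i-n]))
		if h == hashPat && equalFoldASCII(s[i-n+1:i+1], substr) {
			return i - n + 1
		}
	}
	return -1
}

func main() {
	fmt.Println(indexFoldRK("level=ERROR msg=timeout", "error"))
	fmt.Println(indexFoldRK("The Quick Brown Fox", "QUICK"))
	fmt.Println(indexFoldRK("abc", "d"))
}
```

Extending this to full Unicode folding is less straightforward, because a rune and its lowercase form can differ in encoded byte length, so the window size is no longer fixed.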

Contributor Author

We don't for case-sensitive, yes. What do you think of the PR, though? That would help me get a review.
