feat: add lines skipped metric to pattern ingesters #14997

Merged
merged 7 commits into from
Nov 22, 2024

@trevorwhitney trevorwhitney commented Nov 18, 2024

What this PR does / why we need it:

This adds a lines_skipped metric to the pattern ingesters, which counts log lines that were skipped for pattern ingestion. It also adds logic to skip lines with too many (> 80) tokens.

Reasons for skipping:

  • too few tokens
  • too many tokens
  • line too long
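The three skip reasons can be illustrated with a minimal sketch. This is not the code from this PR: the whitespace tokenizer, the `minTokens` and `maxLineLen` values, the reason strings, and the map-based counter are all assumptions made for illustration. Only the 80-token ceiling comes from the diff under review.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical limits. Only maxTokens (80) appears in the diff under review;
// minTokens and maxLineLen are assumed values for illustration.
const (
	minTokens  = 2
	maxTokens  = 80
	maxLineLen = 3000 // bytes
)

// skipReason returns a non-empty reason when a line should be skipped,
// and the empty string when the line is eligible for pattern ingestion.
func skipReason(line string) string {
	if len(line) > maxLineLen {
		return "line_too_long"
	}
	// strings.Fields is a stand-in for the real tokenizer.
	tokens := strings.Fields(line)
	switch {
	case len(tokens) < minTokens:
		return "too_few_tokens"
	case len(tokens) > maxTokens:
		return "too_many_tokens"
	}
	return ""
}

// linesSkipped is a plain map standing in for a counter labelled by reason.
type linesSkipped map[string]int

// observe increments the counter for the line's skip reason, if any,
// and reports whether the line was skipped.
func (c linesSkipped) observe(line string) bool {
	if r := skipReason(line); r != "" {
		c[r]++
		return true
	}
	return false
}

func main() {
	c := linesSkipped{}
	lines := []string{
		"ok",                        // one token
		"level=info msg=started",    // two tokens, kept
		strings.Repeat("a ", 81),    // 81 tokens
	}
	for _, l := range lines {
		c.observe(l)
	}
	fmt.Println(c)
}
```

Labelling the counter by reason keeps the three cases distinguishable on a dashboard, which helps when deciding whether a limit needs tuning.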

Which issue(s) this PR fixes:
Fixes #14882

Special notes for your reviewer:

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • Title matches the required conventional commits format, see here
    • Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

@trevorwhitney trevorwhitney requested a review from a team as a code owner November 18, 2024 21:17
@pull-request-size pull-request-size bot added size/L and removed size/M labels Nov 20, 2024
}
return nil
}
if len(tokens) > 80 {

80 is the value Adaptive Logs uses, so it seems reasonable to do the same. The difference is that Adaptive Logs truncates the tokens slice at 80, whereas we drop the line. My reason for dropping is the integration between pattern ingester patterns and pattern search in Explore Logs: searching by a truncated set of tokens won't yield the same result unless we know the pattern is truncated and insert a wildcard at the end of it.
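To make the trade-off concrete, here is a rough sketch of the two strategies side by side. Both function names are hypothetical, and the `<_>` wildcard token is assumed here as the marker a truncated pattern would need; neither function is taken from this PR or from Adaptive Logs.

```go
package main

import "fmt"

const maxTokens = 80 // ceiling from the diff under review

// dropIfTooLong mirrors the approach described above: a token slice over
// the ceiling produces no pattern at all.
func dropIfTooLong(tokens []string) ([]string, bool) {
	if len(tokens) > maxTokens {
		return nil, false
	}
	return tokens, true
}

// truncateWithWildcard sketches the alternative: keep the first maxTokens
// tokens and append a wildcard so a later search still matches the tail.
// The "<_>" wildcard is an assumption for illustration.
func truncateWithWildcard(tokens []string) []string {
	if len(tokens) <= maxTokens {
		return tokens
	}
	out := make([]string, 0, maxTokens+1)
	out = append(out, tokens[:maxTokens]...)
	return append(out, "<_>")
}

func main() {
	tokens := make([]string, 100)
	for i := range tokens {
		tokens[i] = fmt.Sprintf("t%d", i)
	}

	_, kept := dropIfTooLong(tokens)
	fmt.Println("kept by drop strategy:", kept)

	truncated := truncateWithWildcard(tokens)
	fmt.Println("truncated length:", len(truncated), "last token:", truncated[len(truncated)-1])
}
```

Without the trailing wildcard, the truncated pattern describes only a prefix of the line, which is why a plain truncation would not round-trip through pattern search.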

@benclive benclive left a comment

LGTM! We might have to tweak the 80 value judging by the tests as I think these tokenizers generate more tokens than the adaptive logs ones do.
I'm happy to judge that once it has rolled out - we might need to make this a per-tenant config eventually.

return nil
}
if len(tokens) > 80 {
print(tokens)

debug leftovers

@poyzannur poyzannur left a comment

lgtm

@trevorwhitney trevorwhitney merged commit dea5d78 into main Nov 22, 2024
57 checks passed
@trevorwhitney trevorwhitney deleted the limit-patterns-by-tokens branch November 22, 2024 16:42
Successfully merging this pull request may close these issues.

Limit number of tokens in pattern ingester
3 participants