detect-secrets not identifying all GitHub token occurrences in a file #858
karamuz added a commit to karamuz/detect-secrets that referenced this issue on Jun 20, 2024
karamuz added a commit to karamuz/detect-secrets that referenced this issue on Jun 21, 2024
I'm submitting a ...
What is the current behavior?
For example, given the file test_ghp.txt:
When I scan the file, I get these results:
As referenced in #493, if a secret is written into a file at multiple locations, only the first occurrence is identified by detect-secrets. The problem here is that when a file contains multiple GitHub tokens with different values, they are still interpreted as if they were the same secret.
In the regular expression used here:
(ghp|gho|ghu|ghs|ghr)_[A-Za-z0-9_]{36}
This pattern contains one capturing group, (ghp|gho|ghu|ghs|ghr), which is designed to match and capture the prefix of a GitHub token. Because of this capturing group, when findall() processes a string that matches the pattern, it does not return the entire match ("ghp_" followed by 36 characters). Instead, it returns only the part of the match captured by the group, which in the test cases would be "ghp", "gho", etc., depending on the token.
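To see the effect of the capturing group in isolation, the sketch below runs Python's re.findall() on a single made-up token, once with the pattern quoted above and once with the group rewritten as a non-capturing group (?:...). The non-capturing variant is an alternative swapped in purely for comparison; it is not the fix proposed in this issue, and the token value is fabricated test data.

```python
import re

# Pattern quoted in the issue: one capturing group for the token prefix.
CAPTURING = re.compile(r"(ghp|gho|ghu|ghs|ghr)_[A-Za-z0-9_]{36}")
# Same pattern with a non-capturing group (?:...), shown only for comparison.
NON_CAPTURING = re.compile(r"(?:ghp|gho|ghu|ghs|ghr)_[A-Za-z0-9_]{36}")

# Made-up token: "ghp_" followed by exactly 36 characters from [A-Za-z0-9_].
token = "ghp_" + "a" * 36
line = f"token = {token}"

capturing_result = CAPTURING.findall(line)
non_capturing_result = NON_CAPTURING.findall(line)

print(capturing_result)      # ['ghp']: only the group's text is returned
print(non_capturing_result)  # a one-element list holding the full 40-character token
```

With no capturing groups in the pattern, findall() falls back to returning whole matches, which is why the second call recovers the full token.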
Example:
If you were to run findall() on a string like "Test ghp_abc123...", given the regex above, the output would be:
['ghp'] # Instead of ['ghp_abc123...']
This happens because, when a pattern contains exactly one capturing group, findall() returns the text matched by that group rather than the entire match.
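A minimal reproduction of the symptom, assuming two made-up tokens that share the ghp prefix and sit on different lines of the same file. The final set() step is a hypothetical stand-in for merging results that have identical values (the behavior described in #493); it is not detect-secrets' actual code.

```python
import re

# Pattern quoted in the issue, with its single capturing group.
GITHUB_TOKEN = re.compile(r"(ghp|gho|ghu|ghs|ghr)_[A-Za-z0-9_]{36}")

# Two different, made-up tokens on separate lines of the same file.
first = "ghp_" + "A" * 36
second = "ghp_" + "B" * 36
lines = [f"first = {first}", f"second = {second}"]

# Scan line by line, collecting whatever findall() returns.
found = [value for line in lines for value in GITHUB_TOKEN.findall(line)]
print(found)  # ['ghp', 'ghp']: both tokens reduce to the same prefix

# Hypothetical merge of results by value: two distinct secrets become one.
unique_values = set(found)
print(unique_values)  # {'ghp'}
```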
The expected behavior would be to capture all the different secrets in a file.
Please tell us about your environment:
Other information
In the analyze_string function, replacing findall() with finditer() may solve the issue, because it allows the entire matching string to be retrieved.
finditer() yields match objects, from which you can extract specific groups or the entire match (via match.group(0)), regardless of how many capturing groups the pattern contains.
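A minimal sketch of the finditer() approach, written as a standalone function. The name analyze_string mirrors the function mentioned above, but this signature (taking the compiled pattern as an argument) is hypothetical; in detect-secrets the method belongs to the plugin class and is not reproduced here. The tokens are made up.

```python
from __future__ import annotations

import re
from collections.abc import Iterator


def analyze_string(pattern: re.Pattern[str], string: str) -> Iterator[str]:
    """Yield every full match of `pattern` in `string` (hypothetical helper)."""
    for match in pattern.finditer(string):
        # group(0) is the entire match, regardless of any capturing groups.
        yield match.group(0)


GITHUB_TOKEN = re.compile(r"(ghp|gho|ghu|ghs|ghr)_[A-Za-z0-9_]{36}")

# Two different, made-up tokens on the same line.
first = "ghp_" + "A" * 36
second = "gho_" + "B" * 36
line = f"a = {first}  b = {second}"

results = list(analyze_string(GITHUB_TOKEN, line))
print(results == [first, second])  # True: both full tokens are returned
```

Because match.group(0) does not depend on how the pattern is grouped, this approach would keep returning whole tokens even if a detector's regex gained more capturing groups.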