Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adjacent replaced multiline matches result in wrong line numbers #2779

Open
1 task done
meedstrom opened this issue Apr 11, 2024 · 6 comments
Open
1 task done

Adjacent replaced multiline matches result in wrong line numbers #2779

meedstrom opened this issue Apr 11, 2024 · 6 comments
Labels
bug A bug.

Comments

@meedstrom
Copy link

meedstrom commented Apr 11, 2024

Please tick this box to confirm you have reviewed the above.

  • I have a different issue.

What version of ripgrep are you using?

ripgrep 13.0.0
-SIMD -AVX (compiled)
+SIMD +AVX (runtime)

How did you install ripgrep?

APT

What operating system are you using ripgrep on?

Kubuntu 23.10

Describe your bug.

This is similar to #2420, and I understand why that's WONTFIX. This is different though.

Using a multiline regexp, when the regexp matches strings that come immediately one after another, it bungles the line numbers of all of them. Easier to show you with a reproduction example:

What are the steps to reproduce the behavior?

Save a file test.txt containing:

:properties:
:id: fnord
:end:
:properties:
:id: boccob
:end:
:properties:
:id: d321fdddffff
:end:
:properties:
:id: clowns
:end:

Then run

rg -nU '^:properties:\n:id: (.*)\n:end:' -r '$1' test.txt

What is the actual behavior?

The result is

1:fnord
2:boccob
3:d321fdddffff
4:clowns

Only the first hit is correct.

You can see that the line numbers will be correctly reported if you modify the file to add a newline after each instance of ":end:".

What is the expected behavior?

Expected the result:

1:fnord
4:boccob
7:d321fdddffff
10:clowns
@BurntSushi
Copy link
Owner

Using a multiline regexp, when the regexp matches strings that come immediately one after another, it bungles the line numbers of all of them.

This is an incomplete description of the problem you're reporting. It isn't just when a regex matches strings that are adjacent, it's also required that you use the -r/--replace flag to replace matches with something else. This can be easily demonstrated by omitting the -r/--replace flag and observing that the line numbers are correct:

$ rg -nU '^:properties:\n:id: (.*)\n:end:' test.txt
1::properties:
2::id: fnord
3::end:
4::properties:
5::id: boccob
6::end:
7::properties:
8::id: d321fdddffff
9::end:
10::properties:
11::id: clowns
12::end:

Indeed though, the adjacency part of this seems important. If instead I use the following haystack as test2.txt:

:properties:
:id: fnord
:end:
ZZZ
:properties:
:id: boccob
:end:
ZZZ
:properties:
:id: d321fdddffff
:end:
ZZZ
:properties:
:id: clowns
:end:

Then the line numbers change:

$ rg -nU '^:properties:\n:id: (.*)\n:end:' -r '$1' test2.txt
1:fnord
5:boccob
9:d321fdddffff
13:clowns

@BurntSushi BurntSushi added the bug A bug. label Apr 11, 2024
@meedstrom
Copy link
Author

meedstrom commented Apr 25, 2024

As you say, I forgot to mertion the --replace. Changed the title.

@meedstrom meedstrom changed the title Consecutive multiline matches mis-report line numbers Adjacent replaced multiline matches result in wrong line numbers Apr 25, 2024
@meedstrom
Copy link
Author

I'm not proficient with Rust, but I could try to look for the problem. Any pointers on where to look?

@BurntSushi
Copy link
Owner

Likely in grep-printer. Look at the "standard" printer.

That's where I would start anyway.

@WalterScottYoung
Copy link

It seems that the problem is we didn't keep track of line number when doing replace using regex package.

Within a call of StandardSink::replace, if there are multiple matches we didn't record the line number corresponding to each matches, instead we output the data after replace to sink, in which the line number was counted by number of '\n' which is wrong because some '\n' was modified when we do StandardSink::replace.

@WalterScottYoung
Copy link

WalterScottYoung commented Nov 29, 2024

With that said, it minght be hard to get the line number of the replaced result, and it requires some changes in apis to pass the corresponding line number of each replaced result to sink to be used in printing. I wonder would it be easier to do what the the documentation says (#2852), and don't print line number but print index of matches in multi-line search.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A bug.
Projects
None yet
Development

No branches or pull requests

3 participants