Skip to content

Conversation

gechr
Copy link
Contributor

@gechr gechr commented Sep 9, 2025

Attempt to implement the replace(pattern, replacement, [limit]) method on strings, with support for capture groups only when the pattern is specified using either the regex: or regex-i prefixes.

This was made trickier by the fact that the replacement string should not attempt to substitute $1, $2 etc when the pattern is not a regular expression (see the tests for example).

I'm not convinced my implementation is the best here. I tried a few alternatives to handle the conditional capture groups and settled on using NoExpand, but I'm sure the code can be improved here - very open to feedback!

@gechr gechr requested a review from a team as a code owner September 9, 2025 00:41
@gechr gechr force-pushed the gc/vroxqwnozmul branch 3 times, most recently from 036db58 to 3c4da11 Compare September 9, 2025 00:58
Copy link
Contributor

@yuja yuja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

Comment on lines 1084 to 1089
(false, None) => {
regex.replace_all(haystack_bytes, regex::bytes::NoExpand(replace_bytes))
}
(false, Some(n)) => {
regex.replacen(haystack_bytes, n, regex::bytes::NoExpand(replace_bytes))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does .to_regex() of string/glob insert numbered/named capture groups? If not, it's probably better to remove NoExpand so $0 always expands to the matched part.

Copy link
Contributor Author

@gechr gechr Sep 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For non-regexes, capture groups are not inserted, but the replacement is the issue - it expands $1, $2 to empty strings, which might be confusing for the average user trying to do a simple substring replacement like .replace("$500", "$1000"), which was my rationale for this special logic.

Current implementation (since pattern is not a regex):

$ jj show --no-patch -T '"(a) $100 (b)".replace("$100", "$2 $500 $1")'
(a) $2 $500 $1 (b)

But if we remove NoExpand:

$ jj show --no-patch -T '"(a) $100 (b)".replace("$100", "$2 $500 $1")'
(a)    (b)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't that the same for regex substitution? replace(regex:'[a-z]+', '$500') where user should escape $.

Since we don't add separate replace_regex(), replace_str() functions, I don't expect the substitution argument is parsed differently depending on the pattern argument.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah but I suppose my thought process was: if the user is explicitly opting in to use regular expressions (via the regex: prefix), then they'll also know that they also need to escape the substitution, because they made that choice themselves.

The average user attempting a simple substring replacement without the regex: prefix might find it surprising that they need to escape the replacement string, but I can also see your point that choosing to escape or not based on the pattern is a bit weird.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even when opted in to regex:, user might not always expect $n substitution. We could add an explicit syntax/option to do variable substitution (e.g. replace(pattern, |m| "foo" ++ m.get(0) ++ "bar")), but that would be complicated in both implementation and usage. So I thought it would be simpler to just enable $n substitution.

Another option is to add expand: boolean argument, but I'm not sure if that's intuitive.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe there is a case for having Regex/Pattern be a distinct type rather than being just a string?

Copy link
Contributor Author

@gechr gechr Sep 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I personally feel it's kinda clunky and bad UX, I'm happy to add the expand parameter previously suggested to get this over the line. To be clear, I'm not entirely happy with the regex: prefix triggering special handling of capture groups either, but it just feels, to me, like the least surprising of all the ideas so far.

@yuja What I want to avoid is that the regex: prefix has a special behaviour in very specific call sites.

Technically, the regex: prefix doesn't have special behaviour for the pattern - it still essentially behaves the same way. It's the replacement that would have the special behaviour in this instance. As far as I know there is nowhere else in the jj template system where capture groups are currently used in such a way? I may be wrong as I'm still fairly new to jj.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, the regex: prefix doesn't have special behaviour for the pattern - it still essentially behaves the same way. It's the replacement that would have the special behaviour in this instance.

Yeah, that's the problem I want to avoid. The pattern prefix applies to both pattern and replacement, which seems odd. I also agree expand=true|false is clunky. Maybe we could add expand: prefix to the substitution part, but I'm not sure if that would be better overall.

Copy link
Contributor Author

@gechr gechr Sep 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had an another idea, just to throw it into the mix, I'm not sure if I'm a fan of it, but just to stir things up a bit...

Since @yuja's concern is different regex semantics based on call site, then perhaps we could just have separate functions entirely, e.g. string_replace and regex_replace, or substitute and replace?

The former would support StringPattern (but disallow regex:, with an error suggesting to use regex_replace instead), and not expand the replacement.

The latter would support StringPattern but only allow regex: (or a string that is treated as a regex), and expand the replacement.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's also good if we can find good names for these two functions. I would make the one take pattern as Stringify (any template-able substring) and the other take as StringPattern, though.

@gechr gechr force-pushed the gc/vroxqwnozmul branch 5 times, most recently from f52e329 to 8f12050 Compare September 9, 2025 09:53
@gechr gechr added the 🕚 waiting to give others chance to review We want to give people a chance to comment before merging label Sep 10, 2025
@gechr gechr removed the 🕚 waiting to give others chance to review We want to give people a chance to comment before merging label Sep 13, 2025
Comment on lines +1095 to +1080
let limit_property =
expect_integer_expression(language, diagnostics, build_ctx, limit_node)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: just use expect_usize_expression(). It also handles overflow.

Copy link
Contributor Author

@gechr gechr Sep 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally, I did. But I wasn't a fan of the ambiguous error:

<Error: out of range integral type conversion attempted>

Using expect_integer_expression allowed me to return a useful error:

<Error: replace() limit must be non-negative, got -1>

Up to you though; let me know which you prefer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just use the helper function because it's obvious that negative limit is invalid. If needed, we can make expect_usize_expression() return error message for negative integers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants