-
Notifications
You must be signed in to change notification settings - Fork 188
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for UTF8 string literals #2940
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your first contribution! I've got some suggestions
""")] | ||
public void ShouldMutateUtf8StringLiteral(string original, string expected) | ||
{ | ||
var syntaxTree = CSharpSyntaxTree.ParseText($"""var test = "{original}"u8;"""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you add the u8 token here? I'd put it in the inline data to keep it transparent what is being mutated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, good idea. Will add to inline data.
integrationtest/TargetProjects/NetCoreTestProject.XUnit/String/Utf8StringMagicTests.cs
Show resolved
Hide resolved
src/Stryker.Core/Stryker.Core.UnitTest/Mutators/StringMutatorTests.cs
Outdated
Show resolved
Hide resolved
src/Stryker.Core/Stryker.Core.UnitTest/Mutators/StringMutatorTests.cs
Outdated
Show resolved
Hide resolved
{ | ||
public ReadOnlySpan<byte> HelloWorld() | ||
{ | ||
return "Hello"u8 + " "u8 + "World!"u8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've run the integration test locally and saw this line isn't mutated correctly. The mutation causes compile errors. I think this isn't related to the mutator but related to the mutation placing logic. I'd say we create a separate defect for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be fixed in this PR, not as a separate defect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've investigated why this test is failing. It fails during mutated code compilation. Stryker by default add ternary expression to each operand:
So this statement
return "Hello"u8 + " "u8 + "World!"u8;
is transformed into this:
return (StrykerqBEu3bP1rxHLxTW.MutantControl.IsActive(99)?""u8 :"Hello"u8 )+ (StrykerqBEu3bP1rxHLxTW.MutantControl.IsActive(100)?""u8 :" "u8 )+ (StrykerqBEu3bP1rxHLxTW.MutantControl.IsActive(101)?""u8:"World!"u8);
It boxes each UTF-8 value to ReadOnlySpan<byte>()
type. The problem lies with C# and how it allows to concatenate UTF-8 string literals. It allows to concatenate UTF-8 string directly: "Hello"u8 + " "u8 + "World!"u8
, but not when they are represented as ReadOnlySpan<byte>
: .new ReadOnlySpan<byte>() + new ReadOnlySpan<byte>()
Hence, mutated code fails to compile with:
CS9047 Operator '+' cannot be applied to operands of type 'ReadOnlySpan<byte>' and 'ReadOnlySpan<byte>' that are not UTF-8 byte representations
Do you think it's worth fixing this bug in current PR? It also touches core MutantControl
injection logic. If so, any new ideas are welcome here. So far, I've only thought of converting UTF-8 strings into arrays and concatenating them manually using LINQ when ternary conditions are used:
public ReadOnlySpan<byte> HelloWorld()
{
return (true ? ""u8 : "Hello"u8).ToArray()
.Concat((true ? ""u8 : " "u8).ToArray()
.Concat((true ? ""u8 : "World!"u8).ToArray()))
.ToArray();
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. In that case, we could maybe include some extra code in the compilation that contains an operator overload for +
on ReadOnlySpan<byte>
. During the compilation we already pass some custom code to be compiled together with the source project, so that should be fairly easy to do.
Do you think it's worth fixing this bug in current PR?
Yes, as this seems like an easy fix we should add to this PR. Otherwise, we risk that this will never be fixed once this PR has been merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure adding this operator is needed.
"Hello"u8 + " "u8 + "World!"u8;
is actually a compile time constant so the compiler concatenate the strings and does not compile any expression. See the actual result:
internal static readonly __StaticArrayInitTypeSize=13 5A09E8FA9C77807B24E99C9CF999DEBFAD8441E269EB960E201F61FC3DE20D5A/* Not supported: data(48 65 6C 6C 6F 20 57 6F 72 6C 64 21 00) */;
An array of 13 bytes. This is seen as a single string here. As such, it does not really make sense to mutate sub parts.- In any case, there is no interest in mutating this expression more than once. It is almost certain that if removing
Hello
is killed by a failed test, that very same test will kill removing the space orworld!
. So there is no benefit keeping each mutation. Note that this remarks is also valid for classical constant string concatenations. - As such, it would make more sense to mutate the whole expression once, which would remove any problems with the missing operator.
- Adding this operator requires injecting a dedicated
using
statement at the start of any files needing it; meaning it would have to be added everywhere. Furthermore, it could result in compilation errors (ambiguous call
) that would require a new rollback logic (to detect it and remove the unnecessaryusing
statement). It does not appear easy to me. - The only consequence of not adding this operand is simply that concatenated u8 strings will not be mutated; which looks alright to me.
TLDR;
I recommend: doing nothing for now and contemplate detecting concatenated constant strings (u8 or otherwise) to be mutated at once. But it requires specific logic within the ExpressionOrchestrator
test = ""u8; | ||
} | ||
|
||
public bool IsNullOrEmpty(ReadOnlySpan<byte> myString) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has nothing to do with utf8 strings and can be removed
I tested your PR on the following code and it broke: public string Test()
{
return "Hello " + " " + "World";
} So there is some issue with the mutator logic |
Can you specify what exactly fails here? I have noticed it doesn't compile when concatenation operator
If that's the case, should it be fixed somehow? Because I don't see how current or previous string mutator code has solution for that. My best guess would be to fix BinaryExpressionMutator implementation to avoid this. |
This isn't the problem, we prevent this mutation from being placed.
The above displays the issue. The interesting part is that the code should compile, but it doesn't because of how stryker places the mutation. I haven't investigate yet what exactly goes wrong, but my guess is that there is a flaw in our mutation placing logic |
Thank you. Now it's clear, I will look into this use-case. |
@iamdmitrij The integration tests have been updated and now also include my previous example. |
…ests.cs Co-authored-by: Richard Werkman <[email protected]>
…ests.cs Co-authored-by: Richard Werkman <[email protected]>
Fixes #2923.
Implemented the same logic as in plain strings:
""u8
->"Stryker was here!"u8
""something8
->""u8
SyntaxFactory.Literal(string)
method used for string mutations doesn't work with UTF8 string literal type (ReadOnlySpan<byte>
), so I've taken inspiration on how to make it work from dotnet/roslyn analyzers' code.Tasks:
StringMutator
to support UTF8 string literals