Add support for UTF8 string literals #2940

iamdmitrij · 2024-05-22T05:33:30Z

Implemented the same mutation logic as for plain strings:

""u8 -> "Stryker was here!"u8
"something"u8 -> ""u8

The SyntaxFactory.Literal(string) method used for string mutations doesn't work with the UTF-8 string literal type (ReadOnlySpan<byte>), so I took inspiration from the dotnet/roslyn analyzers' code on how to make it work.
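For illustration, here is a minimal sketch of how a UTF-8 replacement literal can be built with Roslyn. It assumes the Microsoft.CodeAnalysis.CSharp package (4.4 or later, which ships C# 11 and SyntaxKind.Utf8StringLiteralToken). The helper name MutateUtf8Literal is hypothetical and this is not necessarily the code in this PR: it only applies the two rules listed above and builds the token by hand, because SyntaxFactory.Literal(string) always produces an ordinary string literal token.

```csharp
// Assumes the Microsoft.CodeAnalysis.CSharp NuGet package (4.4+ for C# 11 syntax kinds).
using System;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

var original = (LiteralExpressionSyntax)SyntaxFactory.ParseExpression("\"something\"u8");
var mutated = MutateUtf8Literal(original);
Console.WriteLine($"{original} -> {mutated}");

// Hypothetical helper: the plain-string mutation rules applied to a u8 literal.
static LiteralExpressionSyntax MutateUtf8Literal(LiteralExpressionSyntax node)
{
    // ValueText is the literal's content without the quotes and the u8 suffix.
    var replacement = node.Token.ValueText.Length == 0 ? "Stryker was here!" : "";

    // SyntaxFactory.Literal(string) would create a regular StringLiteralToken,
    // so the Utf8StringLiteralToken is built from its source text and value instead.
    // Plain interpolation is safe here: neither replacement needs escaping.
    var token = SyntaxFactory.Token(
        node.Token.LeadingTrivia,
        SyntaxKind.Utf8StringLiteralToken,
        $"\"{replacement}\"u8",
        replacement,
        node.Token.TrailingTrivia);

    return SyntaxFactory.LiteralExpression(SyntaxKind.Utf8StringLiteralExpression, token);
}
```

Keeping the original token's leading and trailing trivia means the mutated literal keeps the surrounding whitespace and comments.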

Tasks:

Modify StringMutator to support UTF8 string literals
Unit tests
Integration tests
Edit docs

src/Stryker.Core/Stryker.Core/Mutators/StringMutator.cs

richardwerkman

Thanks for your first contribution! I've got some suggestions

richardwerkman · 2024-05-24T07:19:18Z

src/Stryker.Core/Stryker.Core.UnitTest/Mutators/StringMutatorTests.cs

+                           """)]
+        public void ShouldMutateUtf8StringLiteral(string original, string expected)
+        {
+            var syntaxTree = CSharpSyntaxTree.ParseText($"""var test = "{original}"u8;""");


Why do you add the u8 suffix here? I'd put it in the inline data to make it clear what is being mutated.

Ok, good idea. Will add to inline data.

integrationtest/TargetProjects/NetCoreTestProject.XUnit/String/Utf8StringMagicTests.cs

src/Stryker.Core/Stryker.Core.UnitTest/Mutators/StringMutatorTests.cs

richardwerkman · 2024-05-24T07:35:38Z

integrationtest/TargetProjects/TargetProject/String/Utf8StringMagic.cs

+    {
+        public ReadOnlySpan<byte> HelloWorld()
+        {
+            return "Hello"u8 + " "u8 + "World!"u8;


I've run the integration test locally and saw that this line isn't mutated correctly: the mutation causes compile errors. I think this isn't related to the mutator but to the mutation placement logic. I'd say we create a separate defect for that.

This should be fixed in this PR, not as a separate defect.

I've investigated why this test is failing. It fails during compilation of the mutated code. By default, Stryker wraps each operand in a ternary expression:

So this statement

return "Hello"u8 + " "u8 + "World!"u8;

is transformed into this:

return (StrykerqBEu3bP1rxHLxTW.MutantControl.IsActive(99)?""u8 :"Hello"u8 )+ (StrykerqBEu3bP1rxHLxTW.MutantControl.IsActive(100)?""u8 :" "u8 )+ (StrykerqBEu3bP1rxHLxTW.MutantControl.IsActive(101)?""u8:"World!"u8);

This turns each UTF-8 literal into a conditional expression of type ReadOnlySpan<byte>. The problem lies in how C# allows UTF-8 string literals to be concatenated: it accepts "Hello"u8 + " "u8 + "World!"u8 directly, but not the same operator on ReadOnlySpan<byte> values (new ReadOnlySpan<byte>() + new ReadOnlySpan<byte>() is rejected).

Hence, mutated code fails to compile with:

CS9047 Operator '+' cannot be applied to operands of type 'ReadOnlySpan<byte>' and 'ReadOnlySpan<byte>' that are not UTF-8 byte representations
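For comparison, a sketch of the same per-operand placement on plain string literals, which compiles fine because + on string values is a regular predefined operator, whereas u8 concatenation is only allowed between the literals themselves. IsActive and activeMutant are hypothetical stand-ins for StrykerqBEu3bP1rxHLxTW.MutantControl.IsActive.

```csharp
using System;

// Hypothetical stand-in for MutantControl.IsActive: only activeMutant is switched on.
int activeMutant = -1;
bool IsActive(int id) => id == activeMutant;

// Same per-operand placement as above, but on plain strings: this compiles.
string HelloWorld() =>
    (IsActive(99) ? "" : "Hello") + (IsActive(100) ? "" : " ") + (IsActive(101) ? "" : "World!");

Console.WriteLine(HelloWorld()); // no mutant active
activeMutant = 100;
Console.WriteLine(HelloWorld()); // mutant 100 removes the space

// The u8 equivalent is rejected with CS9047, because each operand is now a
// ReadOnlySpan<byte>-typed conditional expression rather than a u8 literal:
// ReadOnlySpan<byte> Broken() => (IsActive(99) ? ""u8 : "Hello"u8) + (IsActive(100) ? ""u8 : " "u8);
```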

Do you think it's worth fixing this bug in the current PR? It also touches the core MutantControl injection logic. If so, any new ideas are welcome. So far, I've only thought of converting the UTF-8 strings into arrays and concatenating them manually with LINQ whenever ternary conditions are used:

public ReadOnlySpan<byte> HelloWorld()
{
    return (true ? ""u8 : "Hello"u8).ToArray()
        .Concat((true ? ""u8 : " "u8).ToArray()
            .Concat((true ? ""u8 : "World!"u8).ToArray()))
        .ToArray();
}
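A runnable version of the LINQ workaround, assuming .NET 7 or later (C# 11 for u8 literals). A hypothetical IsActive stub replaces the literal true conditions, and the nested Concat calls are flattened into a chain, which yields the same bytes.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical stub for MutantControl.IsActive: no mutant is active.
static bool IsActive(int id) => false;

// LINQ workaround: copy every operand to a byte[] and concatenate the arrays,
// so '+' is never applied to ReadOnlySpan<byte> values.
static ReadOnlySpan<byte> HelloWorld() =>
    (IsActive(99) ? ""u8 : "Hello"u8).ToArray()
        .Concat((IsActive(100) ? ""u8 : " "u8).ToArray())
        .Concat((IsActive(101) ? ""u8 : "World!"u8).ToArray())
        .ToArray();

Console.WriteLine(Encoding.UTF8.GetString(HelloWorld()));
```

Each call allocates one array per operand plus the final array, whereas the original u8 literals point at static data, so this rewrite changes the performance profile of the mutated code.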

Interesting. In that case, we could maybe include some extra code in the compilation that contains an operator overload for + on ReadOnlySpan<byte>. During the compilation we already pass some custom code to be compiled together with the source project, so that should be fairly easy to do.
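One caveat on the operator idea, as far as I know: in the C# versions current at the time (C# 12), a user-defined binary operator must be declared inside one of its operand types (otherwise CS0563), so + cannot be added to ReadOnlySpan<byte> from injected code. A helper method is a possible substitute. The sketch below swaps in that technique; the Concat helper and the IsActive stub are both hypothetical and assume .NET 7 or later.

```csharp
using System;
using System.Text;

// Hypothetical helper that could be compiled alongside the project, used
// instead of an operator + on ReadOnlySpan<byte>.
static ReadOnlySpan<byte> Concat(ReadOnlySpan<byte> left, ReadOnlySpan<byte> right)
{
    var result = new byte[left.Length + right.Length];
    left.CopyTo(result);
    right.CopyTo(result.AsSpan(left.Length));
    return result;
}

// Hypothetical stub for MutantControl.IsActive: pretend mutant 99 is active.
static bool IsActive(int id) => id == 99;

// Placement would emit nested Concat calls instead of '+'.
ReadOnlySpan<byte> bytes = Concat(
    Concat(IsActive(99) ? ""u8 : "Hello"u8, IsActive(100) ? ""u8 : " "u8),
    IsActive(101) ? ""u8 : "World!"u8);

Console.WriteLine(Encoding.UTF8.GetString(bytes)); // prints " World!"
```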

Do you think it's worth fixing this bug in current PR?

Yes, as this seems like an easy fix, we should add it to this PR. Otherwise, we risk that this will never be fixed once this PR has been merged.

I am not sure adding this operator is needed.

"Hello"u8 + " "u8 + "World!"u8; is actually a compile-time constant, so the compiler concatenates the strings itself and emits no concatenation expression. See the actual result:
internal static readonly __StaticArrayInitTypeSize=13 5A09E8FA9C77807B24E99C9CF999DEBFAD8441E269EB960E201F61FC3DE20D5A/* Not supported: data(48 65 6C 6C 6F 20 57 6F 72 6C 64 21 00) */;
An array of 13 bytes (the 12 bytes of "Hello World!" followed by a null terminator). This is seen as a single string here, so it does not really make sense to mutate its sub-parts.
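A quick check of this, assuming .NET 7 or later: the concatenation produces exactly the same 12 bytes as the single literal "Hello World!"u8. The null terminator visible in the decompiled data is stored after the literal but is not counted in the span's Length.

```csharp
using System;

// The compiler folds the u8 concatenation into a single literal.
ReadOnlySpan<byte> concatenated = "Hello"u8 + " "u8 + "World!"u8;
ReadOnlySpan<byte> single = "Hello World!"u8;

Console.WriteLine(concatenated.Length);                // 12: the trailing null byte is not counted
Console.WriteLine(concatenated.SequenceEqual(single)); // True
```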

In any case, there is little value in mutating this expression more than once. If the mutant that removes "Hello" is killed by a failing test, that same test will almost certainly also kill the mutants that remove the space or "World!". So there is no benefit in keeping each mutation. Note that this remark also applies to classic constant string concatenations.

As such, it would make more sense to mutate the whole expression once, which would remove any problems with the missing operator.
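Mutating the expression once could look like this sketch (hypothetical IsActive stub, .NET 7 or later). Because ?: binds more loosely than +, the whole u8 concatenation stays a single operand made of literals, so no CS9047 error arises.

```csharp
using System;

// Hypothetical stand-in for MutantControl.IsActive.
int activeMutant = -1;
bool IsActive(int id) => id == activeMutant;

// A single mutant covering the whole constant expression.
ReadOnlySpan<byte> HelloWorld() => IsActive(99) ? ""u8 : "Hello"u8 + " "u8 + "World!"u8;

Console.WriteLine(HelloWorld().Length); // original: 12 bytes
activeMutant = 99;
Console.WriteLine(HelloWorld().Length); // mutant 99 active: 0 bytes
```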

Adding this operator would require injecting a dedicated using statement at the start of every file needing it, which means it would have to be added everywhere. Furthermore, it could cause compilation errors (ambiguous calls) that would require new rollback logic (to detect them and remove the unnecessary using statement). It does not look easy to me.

The only consequence of not adding this operator is that concatenated u8 strings will not be mutated, which looks acceptable to me.

TL;DR:
I recommend doing nothing for now, and considering detecting concatenated constant strings (u8 or otherwise) so they can be mutated at once. This requires specific logic within the ExpressionOrchestrator.
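A rough sketch of such a detection, assuming the Microsoft.CodeAnalysis.CSharp package (4.4 or later). The function names are hypothetical, and the check is purely syntactic: a + chain of string or u8 literals, optionally parenthesized. It does not look at const fields or interpolated strings.

```csharp
// Assumes the Microsoft.CodeAnalysis.CSharp NuGet package (4.4+ for C# 11 syntax kinds).
using System;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

var utf8 = SyntaxFactory.ParseExpression("\"Hello\"u8 + \" \"u8 + \"World!\"u8");
var withVariable = SyntaxFactory.ParseExpression("\"Hello \" + name");
Console.WriteLine(IsConstantStringConcatenation(utf8));         // True
Console.WriteLine(IsConstantStringConcatenation(withVariable)); // False

// Hypothetical check: true for a '+' chain made only of string or u8 literals.
static bool IsConstantStringConcatenation(ExpressionSyntax expression) =>
    expression is BinaryExpressionSyntax binary
    && binary.IsKind(SyntaxKind.AddExpression)
    && IsConstantStringOperand(binary.Left)
    && IsConstantStringOperand(binary.Right);

// An operand qualifies if it is a string/u8 literal, a parenthesized operand,
// or itself a constant concatenation (left-nested chains such as a + b + c).
static bool IsConstantStringOperand(ExpressionSyntax operand) => operand switch
{
    LiteralExpressionSyntax literal =>
        literal.IsKind(SyntaxKind.StringLiteralExpression)
        || literal.IsKind(SyntaxKind.Utf8StringLiteralExpression),
    ParenthesizedExpressionSyntax parenthesized => IsConstantStringOperand(parenthesized.Expression),
    _ => IsConstantStringConcatenation(operand),
};
```

If the check returns true, the orchestrator could place one mutant on the whole expression instead of one per operand.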

richardwerkman · 2024-05-24T07:43:28Z

integrationtest/TargetProjects/TargetProject/String/Utf8StringMagic.cs

+            test = ""u8;
+        }
+
+        public bool IsNullOrEmpty(ReadOnlySpan<byte> myString)


This has nothing to do with UTF-8 strings and can be removed.

richardwerkman · 2024-05-24T08:40:21Z

I tested your PR on the following code and it broke:

        public string Test()
        {
            return "Hello " + " " + "World";
        }

So there is some issue with the mutator logic

iamdmitrij · 2024-05-28T07:43:28Z

I tested your PR on the following code and it broke:
        public string Test()
        {
            return "Hello " + " " + "World";
        }
So there is some issue with the mutator logic

Can you specify what exactly fails here?

I have noticed it doesn't compile when the concatenation operator + is mutated to -.

var a1 = "Hello"u8 + " "u8 + "World"u8; // OK
var a2 = "Hello"u8 - " "u8 - "World"u8; // Doesn't compile

var a3 = "Hello" + " " + "World"; // OK
var a4 = "Hello" - " " - "World"; // Doesn't compile

If that's the case, should it be fixed somehow? I don't see how the current or previous string mutator code could solve that. My best guess would be to fix the BinaryExpressionMutator implementation to avoid this.

richardelekta · 2024-05-28T08:17:16Z

I have noticed it doesn't compile when the concatenation operator + is mutated to -.

This isn't the problem, we prevent this mutation from being placed.

var a1 = "Hello"u8 + " "u8 + "World"u8; // OK
var a2 = ""u8 + " "u8 + "World"u8; // Doesn't compile

var a3 = "Hello" + " " + "World"; // OK
var a4 = "" + " " + "World"; // OK

The above displays the issue. The interesting part is that the code should compile, but it doesn't because of how Stryker places the mutation. I haven't investigated yet what exactly goes wrong, but my guess is that there is a flaw in our mutation placement logic.

iamdmitrij · 2024-05-28T08:47:48Z

I have noticed it doesn't compile when the concatenation operator + is mutated to -.

This isn't the problem, we prevent this mutation from being placed.
var a1 = "Hello"u8 + " "u8 + "World"u8; // OK
var a2 = ""u8 + " "u8 + "World"u8; // Doesn't compile

var a3 = "Hello" + " " + "World"; // OK
var a4 = "" + " " + "World"; // OK
The above displays the issue. The interesting part is that the code should compile, but it doesn't because of how Stryker places the mutation. I haven't investigated yet what exactly goes wrong, but my guess is that there is a flaw in our mutation placement logic.

Thank you, now it's clear. I will look into this use case.

richardwerkman · 2024-06-10T09:26:16Z

@iamdmitrij The integration tests have been updated and now also include my previous example.


iamdmitrij added 12 commits May 21, 2024 23:03

705bc66 Add support for Utf8 string literals
4430981 Change integration test results
b09a3a8 Fix test results
efcfa2f Refactor ApplyMutations method
c73643b Remove new lines
bf247d2 Fix ShouldMutateUtf8 unit test
9ac57f4 Remove plain string from integration test
6841729 Update ValidateStrykerResults.cs
54653f8 Update ValidateStrykerResults.cs
318b569 Update ValidateStrykerResults.cs
5ed4a7f Update ValidateStrykerResults.cs
e8e2a3e Merge branch 'master' into utf-8-literals

iamdmitrij marked this pull request as ready for review May 22, 2024 07:56

iamdmitrij commented May 22, 2024


src/Stryker.Core/Stryker.Core/Mutators/StringMutator.cs (outdated, resolved)

iamdmitrij added 2 commits May 22, 2024 10:57

6d7678d Update src/Stryker.Core/Stryker.Core/Mutators/StringMutator.cs
3574f4d Merge branch 'master' into utf-8-literals

rouke-broersma requested a review from richardwerkman May 24, 2024 08:07

richardwerkman requested changes May 24, 2024


iamdmitrij and others added 2 commits June 11, 2024 15:51

d7d764b Update src/Stryker.Core/Stryker.Core.UnitTest/Mutators/StringMutatorT…
…ests.cs Co-authored-by: Richard Werkman <[email protected]>

a9fe1a7 Update src/Stryker.Core/Stryker.Core.UnitTest/Mutators/StringMutatorT…
…ests.cs Co-authored-by: Richard Werkman <[email protected]>


iamdmitrij commented May 22, 2024 (edited)
richardwerkman left a comment
richardwerkman May 24, 2024
iamdmitrij Jun 11, 2024
richardwerkman May 24, 2024
rouke-broersma May 24, 2024
iamdmitrij Jun 11, 2024 (edited)
richardwerkman Jun 11, 2024
dupdob Jul 20, 2024 (edited)
richardwerkman May 24, 2024
richardwerkman commented May 24, 2024
iamdmitrij commented May 28, 2024 (edited)
richardelekta commented May 28, 2024
iamdmitrij commented May 28, 2024
richardwerkman commented Jun 10, 2024
