Drawing performance tweaks #290

JimBobSquarePants · 2023-09-07T10:40:12Z

Prerequisites

I have written a descriptive pull-request title
I have verified that there are no overlapping pull-requests open
I have verified that I am following matches the existing coding patterns and practice as demonstrated in the repository. These follow strict Stylecop rules 👮.
I have provided test coverage for my change (where applicable)

Description

Some performance tweaks to help with our drawing performance.

Avx/Sse41/Arm64 rounding in ScanEdgeCollection
Remove unnecessary scaling in ClipperOffset since we don't do fixed precision maths. This should actually make the clipper more accurate.

~~Given that Avx does not have an exact equivalent to MidpointRounding.AwayFromZero there are minor changes to the test output, but they are nonvisible. This is why there are so many files listed.~~

Only 6 reference files are changed now (due to improved accuracy of the clipper). However, GitHub is marking 30 files as changed with several LFS files marked as invalid. I have no idea why. I've not touched those files and they are present in our CI tests.

Here's the difference in performance with the new rounding method testing a buffer of 1000 vertices. (I can't test the ARM version).

|      Method |       Mean |     Error |    StdDev | Ratio |
|------------ |-----------:|----------:|----------:|------:|
|   RoundYAvx |   224.4 ns |   2.43 ns |   2.03 ns |  0.03 |
| RoundYSse41 | 2,639.2 ns |  10.78 ns |   9.56 ns |  0.35 |
|      RoundY | 7,449.2 ns | 142.53 ns | 119.02 ns |  1.00 |

Here's some comparison benchmarks indicating where we stand against System.Drawing and SkiaSharp.

I've included the benchmarks from #96 to give us a rough idea of how much better we are looking compared to v1. Bear in mind though that the difference is massively influenced by the improved blending performance in ImageSharp v3.

FillPolygonMedium (Mississippi):

v1

|                 Method |         Mean |       Error |      StdDev |    Ratio | RatioSD |
|----------------------- |-------------:|------------:|------------:|---------:|--------:|
|          SystemDrawing |     358.2 us |     7.14 us |    10.68 us |     2.27 |    0.06 |
|    ImageSharp_Original | 187,078.3 us | 3,699.54 us | 6,181.11 us | 1,186.12 |   37.40 |
| ImageSharp_ActiveEdges |   2,884.7 us |    56.27 us |   109.75 us |    18.84 |    0.67 |
|              SkiaSharp |     156.8 us |     2.91 us |     2.43 us |     1.00 |    0.00 |

This PR

|        Method |     Mean |    Error |   StdDev | Ratio | RatioSD |
|-------------- |---------:|---------:|---------:|------:|--------:|
| SystemDrawing | 253.3 us |  2.50 us |  1.95 us |  2.26 |    0.03 |
|    ImageSharp | 766.4 us | 13.54 us | 16.12 us |  6.84 |    0.14 |
|     SkiaSharp | 112.5 us |  1.25 us |  1.17 us |  1.00 |    0.00 |

DrawPolygonMediumThick:

v1

|                 Method |       Mean |     Error |    StdDev |     Median |  Ratio | RatioSD |
|----------------------- |-----------:|----------:|----------:|-----------:|-------:|--------:|
|          SystemDrawing |   1.539 ms | 0.0304 ms | 0.0621 ms |   1.515 ms |   0.97 |    0.04 |
|    ImageSharp_Original | 169.697 ms | 3.3095 ms | 4.7463 ms | 168.241 ms | 104.43 |    2.09 |
| ImageSharp_ActiveEdges |  38.795 ms | 0.7716 ms | 1.0562 ms |  38.591 ms |  24.16 |    0.74 |
|              SkiaSharp |   1.625 ms | 0.0158 ms | 0.0132 ms |   1.623 ms |   1.00 |    0.00 |

This PR

|        Method |      Mean |     Error |    StdDev | Ratio | RatioSD |
|-------------- |----------:|----------:|----------:|------:|--------:|
| SystemDrawing |  1.352 ms | 0.0094 ms | 0.0083 ms |  0.92 |    0.01 |
|    ImageSharp | 14.104 ms | 0.2779 ms | 0.3089 ms |  9.61 |    0.25 |
|     SkiaSharp |  1.473 ms | 0.0068 ms | 0.0053 ms |  1.00 |    0.00 |

Drawing, though much faster is still way off the pace so I did a little digging. It turns out almost half the overall time of the actual operation is spent clipping away self-intersections. Here's the benchmark if I remove that.

|        Method |     Mean |     Error |    StdDev | Ratio | RatioSD |
|-------------- |---------:|----------:|----------:|------:|--------:|
| SystemDrawing | 1.352 ms | 0.0103 ms | 0.0080 ms |  0.90 |    0.02 |
|    ImageSharp | 8.518 ms | 0.0815 ms | 0.0681 ms |  5.65 |    0.14 |
|     SkiaSharp | 1.498 ms | 0.0297 ms | 0.0342 ms |  1.00 |    0.00 |

Only 11 tests fail when I disable clipping, some results are actually better. (Note the ghosting disappearing #106)

However, we cannot remove clipping as the offsetter depends on it.

I've been looking at both Blend2d and Skia and it appears to me (I could be reading it wrong) that they work differently. I cannot find any trace of clipping in either stroke operation. Perhaps someone would like to have a closer look?

codecov · 2023-09-07T11:46:51Z

Codecov Report

Merging #290 (d15be1c) into main (3f1b4fa) will decrease coverage by 1%.
The diff coverage is 69%.

@@         Coverage Diff         @@
##           main   #290   +/-   ##
===================================
- Coverage    80%    80%   -1%     
===================================
  Files        97     97           
  Lines      4873   4937   +64     
  Branches    869    878    +9     
===================================
+ Hits       3929   3975   +46     
- Misses      752    769   +17     
- Partials    192    193    +1

Flag	Coverage Δ
unittests	`80% <69%> (-1%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed	Coverage Δ
...g/Shapes/Rasterization/ScanEdgeCollection.Build.cs	`83% <67%> (-13%)`	⬇️
...arp.Drawing/Shapes/PolygonClipper/ClipperOffset.cs	`100% <100%> (ø)`

... and 1 file with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

📢 Have feedback on the report? Share it here.

antonfirsov

there are minor changes to the test output, but they are nonvisible. This is why there are so many files listed.

Do we have at least one test configuration that excercises scalar runs primarily? We may see failures there. If the difference is minor, it could be better to tweak the tolerance than to replace all the files.

antonfirsov · 2023-09-07T13:44:00Z

src/ImageSharp.Drawing/Shapes/Rasterization/ScanEdgeCollection.Build.cs

@@ -82,12 +85,85 @@ internal static ScanEdgeCollection Create(TessellatedMultipolygon multipolygon,

        static void RoundY(ReadOnlySpan<PointF> vertices, Span<float> destination, float subsamplingRatio)
        {
-            for (int i = 0; i < vertices.Length; i++)
+            int ri = 0;
+            if (Avx.IsSupported)


We need to make sure all 3 cases (AVX, SSE, scalar) are covered by tests for various input sizes. Given that running an extensive set of integration tests out of process is rather expensive, it would be much better to have the method unit-tested with FeatureTestRunner against random input of various sizes. Here's an example for this approach:

https://github.com/SixLabors/ImageSharp/blob/54b7e04f7a3c2921af3c769bd6c27fd3d5156f04/tests/ImageSharp.Tests/Common/SimdUtilsTests.cs#L155-L166

I've ported FeatureTestRunner across and added a wrapper around existing ScanEdgeCollection tests to cover changes. We now use the same rounding throughout as I figured out an easy way to replicate midpoint rounding.

Coverage is better than reported (actually better than main) because we can't provide a coverage report for the ARM code.

JimBobSquarePants · 2023-09-08T02:58:21Z

There is something funky going on with Git LFS. If I switch the branch to main I see the following 15 changes which I then can do absolutely nothing with (can't switch branch or discard).

It might well be main that's the issue actually as I can see the files are hosted in Git LFS in my feature branch.

https://github.com/SixLabors/ImageSharp.Drawing/blob/a7481bb4cbe160a02ab8793ef46b87d4a260db2a/tests/Images/ReferenceOutput/Drawing/DrawComplexPolygonTests/DrawComplexPolygon__Overlap.png

I would suggest we ignore and merge the new files. I can follow up with a PR to blanket normalize the images if there are issues.

antonfirsov · 2023-09-08T15:37:22Z

When trying to copy back files from main I'm seeing a diff of 6 files 🤷

antonfirsov

Apart from the small overlook with maxIterations LGTM.

I would still prefer to see SIMD optimized utility methods being directly unit tested, since that enables fine grained testing of various element counts and edge cases and may catch bugs we could miss otherwise, but won't block the PR on it.

antonfirsov · 2023-09-08T14:51:38Z

src/ImageSharp.Drawing/Shapes/Rasterization/ScanEdgeCollection.Build.cs

+                if (verticesLength > 0)
+                {
+                    ri = vertices.Length - (remainder / 2);
+                    float maxIterations = verticesLength / (Vector256<float>.Count * 2);


This should be nint. (Also for Sse41 and AdvSimd.)

Thanks, fixed here and in benchmarks

Are you sure you've pushed the change? I don't see the update in the PR diff.

🫢Nope.... committed but didn't push. I'll open a PR.

antonfirsov · 2023-09-08T15:43:29Z

src/ImageSharp.Drawing/Shapes/Rasterization/ScanEdgeCollection.Build.cs

+                    }
+                }
+            }
+            else if (AdvSimd.IsSupported)


Do we know if this is true on the BuildJet image?

Yeah, we've specified the ARM images. These are the ones we've used previously to diagnose issues in ImageSharp. https://buildjet.com/for-github-actions/docs/runners/hardware#arm

JimBobSquarePants added 3 commits September 7, 2023 16:41

Add Avx rounding

3a50436

Remove unneeded clipper scaling

d244b89

Update test output

560c671

JimBobSquarePants requested review from tocsoft and antonfirsov September 7, 2023 10:40

JimBobSquarePants added the area:performance label Sep 7, 2023

JimBobSquarePants added 4 commits September 7, 2023 20:58

Tweak tolerance for Win .NET 7

c18494b

Tweak tolerance for Win .NET 7

ac69b4c

Windows boxes must be shonky

ce4d2ac

Add Sse41 rounding

775c95e

antonfirsov reviewed Sep 7, 2023

View reviewed changes

JimBobSquarePants added 4 commits September 8, 2023 11:11

Use midpoint rounding for SIMD. And add feature tests

20fb019

Revert previous test reference changes

d2d51a2

Replace test output images.

36209ff

Revert tolerance changes

a7481bb

Attempt AdvSimd implementation

9d4d0ac

JimBobSquarePants added the arch:arm64 label Sep 8, 2023

JimBobSquarePants added 3 commits September 8, 2023 17:12

Update ScanEdgeCollection.Build.cs

e15175e

Bump Magick

1fb4acf

Tweak tolerance for ARM64

d15be1c

antonfirsov approved these changes Sep 8, 2023

View reviewed changes

JimBobSquarePants merged commit e050b92 into main Sep 9, 2023
8 of 9 checks passed

JimBobSquarePants deleted the js/draw-perf branch September 9, 2023 03:15

JimBobSquarePants mentioned this pull request Sep 11, 2023

Additional drawing performance fixes #292

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Drawing performance tweaks #290

Drawing performance tweaks #290

JimBobSquarePants commented Sep 7, 2023 •

edited

Loading

codecov bot commented Sep 7, 2023 •

edited

Loading

antonfirsov left a comment

antonfirsov Sep 7, 2023 •

edited

Loading

JimBobSquarePants Sep 8, 2023

JimBobSquarePants Sep 8, 2023

JimBobSquarePants commented Sep 8, 2023 •

edited

Loading

antonfirsov commented Sep 8, 2023

antonfirsov left a comment

antonfirsov Sep 8, 2023

JimBobSquarePants Sep 9, 2023

antonfirsov Sep 10, 2023

JimBobSquarePants Sep 11, 2023

antonfirsov Sep 8, 2023

JimBobSquarePants Sep 9, 2023

Drawing performance tweaks #290

Drawing performance tweaks #290

Conversation

JimBobSquarePants commented Sep 7, 2023 • edited Loading

Prerequisites

Description

codecov bot commented Sep 7, 2023 • edited Loading

Codecov Report

antonfirsov left a comment

Choose a reason for hiding this comment

antonfirsov Sep 7, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

JimBobSquarePants commented Sep 8, 2023 • edited Loading

antonfirsov commented Sep 8, 2023

antonfirsov left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

JimBobSquarePants commented Sep 7, 2023 •

edited

Loading

codecov bot commented Sep 7, 2023 •

edited

Loading

antonfirsov Sep 7, 2023 •

edited

Loading

JimBobSquarePants commented Sep 8, 2023 •

edited

Loading