-
-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Drawing performance tweaks #290
Conversation
Codecov Report
@@ Coverage Diff @@
## main #290 +/- ##
===================================
- Coverage 80% 80% -1%
===================================
Files 97 97
Lines 4873 4937 +64
Branches 869 878 +9
===================================
+ Hits 3929 3975 +46
- Misses 752 769 +17
- Partials 192 193 +1
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 1 file with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more 📢 Have feedback on the report? Share it here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are minor changes to the test output, but they are nonvisible. This is why there are so many files listed.
Do we have at least one test configuration that excercises scalar runs primarily? We may see failures there. If the difference is minor, it could be better to tweak the tolerance than to replace all the files.
@@ -82,12 +85,85 @@ internal static ScanEdgeCollection Create(TessellatedMultipolygon multipolygon, | |||
|
|||
static void RoundY(ReadOnlySpan<PointF> vertices, Span<float> destination, float subsamplingRatio) | |||
{ | |||
for (int i = 0; i < vertices.Length; i++) | |||
int ri = 0; | |||
if (Avx.IsSupported) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to make sure all 3 cases (AVX, SSE, scalar) are covered by tests for various input sizes. Given that running an extensive set of integration tests out of process is rather expensive, it would be much better to have the method unit-tested with FeatureTestRunner
against random input of various sizes. Here's an example for this approach:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've ported FeatureTestRunner
across and added a wrapper around existing ScanEdgeCollection
tests to cover changes. We now use the same rounding throughout as I figured out an easy way to replicate midpoint rounding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Coverage is better than reported (actually better than main) because we can't provide a coverage report for the ARM code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from the small overlook with maxIterations
LGTM.
I would still prefer to see SIMD optimized utility methods being directly unit tested, since that enables fine grained testing of various element counts and edge cases and may catch bugs we could miss otherwise, but won't block the PR on it.
if (verticesLength > 0) | ||
{ | ||
ri = vertices.Length - (remainder / 2); | ||
float maxIterations = verticesLength / (Vector256<float>.Count * 2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be nint
. (Also for Sse41
and AdvSimd
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, fixed here and in benchmarks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure you've pushed the change? I don't see the update in the PR diff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🫢Nope.... committed but didn't push. I'll open a PR.
} | ||
} | ||
} | ||
else if (AdvSimd.IsSupported) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we know if this is true on the BuildJet image?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we've specified the ARM images. These are the ones we've used previously to diagnose issues in ImageSharp. https://buildjet.com/for-github-actions/docs/runners/hardware#arm
Prerequisites
Description
Some performance tweaks to help with our drawing performance.
ScanEdgeCollection
ClipperOffset
since we don't do fixed precision maths. This should actually make the clipper more accurate.Given that Avx does not have an exact equivalent toMidpointRounding.AwayFromZero
there are minor changes to the test output, but they are nonvisible. This is why there are so many files listed.Only 6 reference files are changed now (due to improved accuracy of the clipper). However, GitHub is marking 30 files as changed with several LFS files marked as invalid. I have no idea why. I've not touched those files and they are present in our CI tests.
Here's the difference in performance with the new rounding method testing a buffer of 1000 vertices. (I can't test the ARM version).
Here's some comparison benchmarks indicating where we stand against System.Drawing and SkiaSharp.
I've included the benchmarks from #96 to give us a rough idea of how much better we are looking compared to v1. Bear in mind though that the difference is massively influenced by the improved blending performance in ImageSharp v3.
FillPolygonMedium (Mississippi):
v1
This PR
DrawPolygonMediumThick:
v1
This PR
Drawing, though much faster is still way off the pace so I did a little digging. It turns out almost half the overall time of the actual operation is spent clipping away self-intersections. Here's the benchmark if I remove that.
Only 11 tests fail when I disable clipping, some results are actually better. (Note the ghosting disappearing #106)
However, we cannot remove clipping as the offsetter depends on it.
I've been looking at both Blend2d and Skia and it appears to me (I could be reading it wrong) that they work differently. I cannot find any trace of clipping in either stroke operation. Perhaps someone would like to have a closer look?