Memory diagnoser fix for Tiered Compilation #1543
…iteration, the Tiered JIT might kick in, allocate some memory, and affect the results. As a workaround, we can put the thread to sleep for more than 200 ms so that the TC thread kicks in before we start memory measurements. It's far from perfect, but it works. Fixes #1542
Hmm, you had mentioned on the other issue that 5.0 is free of this issue. Is this really specific to 3.1? Timings have changed between 3.1 and 5.0 such that the rejits would happen sooner in 5.0, but it seems unlikely that this would affect microbenchmarks (the change is easily visible in real-world cases with many thousands of methods, but there would only be a few methods here). For allocation measurements, especially with long-running methods and a strong assertion about allocation (that there shouldn't be any), my general suggestion is to disable tiered compilation. Tier 0 jitted code may allocate where optimized code would not, and the former may allocate much more than the latter. An alternative to the delay may be to set … Aside from that, I wonder what is actually causing allocation during jitting. There are some small allocation-dependent things I'm aware of regarding virtual methods that can be fixed, but otherwise, on a rejit, since all static construction etc. should have already been done, I'm not sure what would be allocating. Do you have an idea?
Also, I don't think there should be any guarantee that there would not be any allocation happening in the background. As we move more stuff to managed code, like the thread pool, things may happen in the background unrelated to the app's code that may cause allocation, and that should be ok.
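For readers who want to try the suggestion above to disable tiered compilation, a minimal sketch is the documented MSBuild property shown below. This fragment is not part of the PR. It assumes the setting is placed in the project file of the application being measured, and whether a benchmark harness that generates its own project picks it up is not covered here. The runtime also reads the environment variable `COMPlus_TieredCompilation=0`, which has the same effect.

```xml
<!-- Illustrative fragment, not taken from this PR: turns tiered compilation off for the project. -->
<PropertyGroup>
  <TieredCompilation>false</TieredCompilation>
</PropertyGroup>
```

With tiering off, methods are compiled straight to fully optimized code, so no background promotion should occur during measurement. As noted in the discussion, the results may then differ from what users with default settings see.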
Yes, it looks like it's specific to 3.1 (I did not test 2.2 and 3.0 as they are not supported anymore):
BenchmarkDotNet=v0.12.1.20201002-develop, OS=Windows 10.0.18363.1082 (1909/November2019Update/19H2)
Intel Xeon CPU E5-1650 v4 3.60GHz, 1 CPU, 12 logical and 6 physical cores
.NET SDK=5.0.100-rc.1.20452.10
[Host] : .NET Core 2.1.21 (CoreCLR 4.6.29130.01, CoreFX 4.6.29130.02), X64 RyuJIT
Job-WTPPRD : .NET 5.0.0 (5.0.20.45114), X64 RyuJIT
Job-MUUDGX : .NET Core 2.1.21 (CoreCLR 4.6.29130.01, CoreFX 4.6.29130.02), X64 RyuJIT
Job-TXYPKS : .NET Core 3.1.7 (CoreCLR 4.700.20.36602, CoreFX 4.700.20.37001), X64 RyuJIT
Job-IEOLEB : .NET Framework 4.8 (4.8.4220.0), X64 RyuJIT
| Method | Runtime | Allocated |
|----------- |--------------------- |----------:|
| Benchmark1 | .NET 5.0 | - |
| Benchmark1 | .NET Core 2.1 | - |
| Benchmark1 | .NET Core 3.1 | 9 B |
| Benchmark1 | .NET Framework 4.8 | - |
| | | |
| Benchmark2 | .NET 5.0 | - |
| Benchmark2 | .NET Core 2.1 | - |
| Benchmark2 | .NET Core 3.1 | 20 B |
| Benchmark2 | .NET Framework 4.8 | - |
| | | |
| Benchmark3 | .NET 5.0 | - |
| Benchmark3 | .NET Core 2.1 | - |
| Benchmark3 | .NET Core 3.1 | 21 B |
| Benchmark3 | .NET Framework 4.8 | - |
| | | |
| Benchmark4 | .NET 5.0 | - |
| Benchmark4 | .NET Core 2.1 | - |
| Benchmark4 | .NET Core 3.1 | 178 B |
| Benchmark4 | .NET Framework 4.8 | - |
| | | |
| Benchmark5 | .NET 5.0 | - |
| Benchmark5 | .NET Core 2.1 | - |
| Benchmark5 | .NET Core 3.1 | 101 B |
| Benchmark5 | .NET Framework 4.8 | - |
Ok, I think it is the virtual slot backpatching storage. It's probably the rejit timing that is causing the allocation to happen earlier and not show up in the benchmark in 5.0.
I am afraid that this could lead to BDN reporting "too perfect" results that could differ from what end-users with default settings are experiencing
I was also curious and tried to use the VS Memory Profiler to find out, but I failed. In this particular case the VS Profiler shows me memory allocated for JITting of the methods that are executed for the first time. But it does not show me anything attributed to the TP Thread and …
I totally agree. But I also expect users to be quite surprised when they see that BDN reports allocated memory for code that clearly does not allocate anything explicitly.
Possibly. With aggressive tiering, it still goes through the normal tiering stages, just more quickly. It would change rejit timings and more methods may be rejitted, though likely the code quality would be similar to default mode (no guarantee though since it can depend on timing and code paths hit in the calls before rejit). The other option is to give the test more warmup time to stabilize.
@@ -183,6 +183,13 @@ public Measurement RunIteration(IterationData data)
// it does not matter, because we have already obtained the results!
EnableMonitoring();

if (RuntimeInformation.IsNetCore) |
Since this problem is relevant only for .NET Core 3.x, could we add an additional check?
{
// we put the current thread to sleep so Tiered Compiler can kick in, compile it's stuff
// and NOT allocate anything on the background thread when we are measuring allocations
System.Threading.Thread.Sleep(TimeSpan.FromMilliseconds(250));
Let's make it 500 (just to be extra sure).
What if you just run …
@timcassell This would increase the time it takes to get the stats, which would be bad for long-running benchmarks.
@adamsitnik I noticed you showed that the issue is not apparent in .NET 5.0, which I find strange because I posted an issue in runtime (dotnet/runtime#45446) that demonstrates memory measurement issues in both .NET 5.0 and .NET Core 3.1 runtimes. If that same thing affects this issue, this may not be solvable until the fix is in the runtime (it was placed on the 6.0.0 milestone).
Also @adamsitnik I was looking at …
@timcassell The list of methods that get promoted to Tier 1 in the meantime is not fixed. It's not just the benchmarked code, it's also BDN Engine code and the BCL types that it's using. For example, in my investigation I've found the following methods to be "problematic" (the TC marked with yellow): There is no way to ensure that all of them are going to get promoted to Tier 1 before we start measuring GC stats.
I can confirm that I am seeing the same issue (sometimes shows allocations, sometimes doesn't) in .NET 5.0, so it's not isolated to Core 3.1.
@kouvel Is dotnet/runtime#45901 going to affect this behavior? Will Tiered JIT stop allocating managed memory in the background thread?
No, it wouldn't affect the allocation. The allocation is from virtual slot backpatching, which currently uses some GC memory for tracking virtual slots and is somewhat separate from tiering. An eventual goal is to remove the dependency on the GC for that tracking, as it has other issues, but it's lower on the priority list at the moment.
@kouvel Thank you for a very detailed answer!
Looks like this was fixed in .NET 6 preview 6. dotnet/runtime#45446 (comment)
@timcassell The underlying issue is not fixed, it just may not be reproducing anymore by chance. When measuring GC allocations I would for now still suggest turning off tiered compilation to avoid the noise.
@kouvel Curious if this was actually fixed in dotnet/runtime#67160?
Yes, this should be fixed now in .NET 7.0 by dotnet/runtime#67160
I've used the latest .NET Runtime (7.0.100-rc.1.22364.28) and BDN still reports that some memory is being allocated. Most likely there are still some background allocations. I currently don't have the time to dig deeper into that.
Superseded by #2562.
In #1542 @ronbrogan has reported a very unusual bug: code that was CPU-bound and not allocating at all was reporting allocations for .NET Core 3.1 (it works fine for 2.1, as there we use `GC.GetAllocatedBytesForCurrentThread` instead of `GC.GetTotalAllocatedBytes`).

After some investigation, I've narrowed down the problem to the Tiered JIT thread, which from time to time promotes methods from Tier 0 to Tier 1 and allocates memory during the iteration where we call `GC.GetTotalAllocatedBytes`.

I had a few ideas, but the only one that worked was putting the thread to sleep for 250 ms before we call `GC.GetTotalAllocatedBytes`. In this time the TC thread kicks in and promotes the methods. It's of course far from perfect, as TC might not finish the promotion before we make the first call to `GC.GetTotalAllocatedBytes`. I don't want to prolong the sleeping period because it would increase the time we need to run the benchmarks.

@kouvel do you have a better idea of how we could prevent TC from working at a given point of time?

I've confirmed that it works as expected by modifying the `GetExtraStats` method to emit some extra events and filtering the TC events in PerfView to this particular period of time.

Fixes #1542
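The workaround described above can be sketched outside of BenchmarkDotNet roughly as follows. This is a minimal illustration, not the engine's actual code: `AllocationProbe` and `MeasureAllocatedBytes` are hypothetical names, and the 250 ms delay simply mirrors the value used in this PR. Because `GC.GetTotalAllocatedBytes` counts allocations from every thread in the process, any background work that allocates between the two reads shows up in the result, which is why the sketch pauses first.

```csharp
using System;
using System.Threading;

// Hypothetical helper for illustration only; not part of BenchmarkDotNet.
public static class AllocationProbe
{
    // Returns the number of bytes allocated process-wide while `workload` runs.
    public static long MeasureAllocatedBytes(Action workload, TimeSpan settleDelay)
    {
        // Pause so that background work (for example tiered JIT promotion)
        // has a chance to run before the measurement window opens.
        Thread.Sleep(settleDelay);

        // precise: true asks the runtime for an exact, more expensive reading.
        long before = GC.GetTotalAllocatedBytes(precise: true);
        workload();
        long after = GC.GetTotalAllocatedBytes(precise: true);

        return after - before;
    }

    public static void Main()
    {
        long bytes = MeasureAllocatedBytes(
            () => GC.KeepAlive(new byte[4096]),
            TimeSpan.FromMilliseconds(250));

        Console.WriteLine($"Allocated: {bytes} B");
    }
}
```

The delay only makes background allocations inside the window less likely. As the discussion notes, it cannot guarantee that every promotion has finished before the first read.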