Fix memory callback coverage #1163
Open
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
What this PR does:
qemu_ld_helpers_pandabased onpanda_use_memcbhelper_[endian]_[ld/st]_namefunctions into the inline static functions of the formhelper_[endian]_[ld/st]_name_inner(and changes internal calls to this form as well)helper_[endian]_[ld/st]_namefunctions with PANDA equivalent that checkspanda_use_memcbat runtimehelper_[endian]_[ld/st]_name_pandafunctions for compatibility.The issue:
See #1119 for context.
A bug was reported that indicated that the x86 instruction
cmpxchg(32 or 64-bit versions) did not generate memory callbacks. The examination in the PR is somewhat different from my analysis. First, the relevant functions are not:because when generating for single core we generate
_unlockedversions.panda/target/i386/translate.c
Lines 5269 to 5273 in 0c15599
The unlocked versions use
cpu_ldstfunctions like @AndrewFasano mentioned.panda/target/i386/mem_helper.c
Lines 28 to 52 in 0c15599
It seems to me that there are two problems:
helper_[endian]_st[size]_mmu_pandaare only replaced herepanda/tcg/i386/tcg-target.inc.c
Lines 1223 to 1242 in d2ee3d1
ld[size]_pis called directly on translated memory. https://github.com/panda-re/panda/blob/dev/include/exec/cpu_ldst_template.h#L105-L106Advantages
cpu_ldstfunctions, but also includes direct calls from other instruction such as:panda/target/i386/fpu_helper.c
Lines 72 to 80 in 6ed3f19
panda/target/i386/mpx_helper.c
Lines 138 to 148 in 6ed3f19
panda/target/i386/mpx_helper.c
Lines 105 to 120 in 6ed3f19
panda/target/ppc/mem_helper.c
Lines 56 to 66 in 6ed3f19
Potential issues
panda_use_memcbfor every memory access. The function makes the case for not using memory callbacks as fast as possible. It inlines the internal version, but it's still a change that could impact performance.panda/softmmu_template.h
Lines 536 to 543 in 6ed3f19
Design decisions
Instead of replacing
helper_[endian]_[ld/st]_namewith their PANDA equivalents and moving the old to the_internalversions I could have instead changed thecpu_[ld/st][size]_data_retfunctions to use the_pandaversions based on the status ofpanda_use_memcb. I chose not to because I think covering the default case ofhelper_[endian]_[ld/st]_namewith the PANDA version covers far more cases we may have not considered. I also wanted the ability to dynamically enable/disable memory callbacks. If it turns out that that is hugely detrimental to performance I'm happy to consider other options.Coverage decisions
This PR changes the PANDA model to cover virtual memory read/writes from all system mmu modes. This means every time data is written/read in a helper it will be recorded. Examples include #1179 and #1119. However, it also includes things not previously considered part of the model such as reads made by the FPU, register store/restore, segment reads, and SIMD.
Performance implications
Does this PR make a difference compare to two systems with
memcbdisabled?The performance difference here is almost indistinguishable.
Does this PR make a difference compare to two systems with
memcbenabled?The relatively minimal performance impact here seems to result from additional reads/writes being hit.
In a relatively short recording on x86 there were about 1500 more calls out of around 7.3 million total calls.
In an ARM recording there were exactly the same number of calls. ARM doesn't seem to use these calls.
CLOSES: #1119