[AArch64] Variety of incorrect disassembly info (mainly SVE) #2472
Comments
I am so grateful that you started using AArch64 so early and are checking it so thoroughly. I cannot thank you enough! SVE and SME2 added so much more complexity; this part of the module will be the best tested of all Capstone :D Now the comments on the issues (working backwards):
Good idea! Going to add and document it. But this also means that the system operand is moved out of the union of operands, which is probably a good idea anyway. Maybe people will have similar requests for other sys ops in the future.
Something seems to be broken in how the shift details are added. Will fix it.
Those are (almost always) faulty definitions in the LLVM files. Found many of those in the last months (especially for AArch64 :( ). Will patch it in our fork. So I really really appreciate that you went through the effort to list all affected instructions! I will fix these separately from the ones above, because they all belong to the same problem category. |
I've been eagerly awaiting the LLVM 18 AArch64 update to Capstone for a while as I do research into SME, so will let you guys know if I run into anything else! As for the access issues (wrong implicit read/write), I'm sure there are many more instructions affected than the ones I've listed. If there isn't a universal fix then we will be sure to let you know as and when we find more! |
Also, an alternative to moving the sysop outside the union could be to add a |
Just to ensure we are on the same page. AArch64 on the
Yeah. Minimizing memory usage with enabled details is currently out of scope. There was some thinking about a v2 API which tries to save memory where possible. But this is something for v7. |
LLVM apparently doesn't distinguish between MSR instructions which set it and those which do not:
I removed it, because we are more often right than wrong. But are you aware of any way to test this? |
Some more comments regarding the flawed access information:
The memory operand should be
Weird one. LLVM defines them with the incorrect instruction format, as a three-vector instruction instead of as a two-vector instruction.
Fixing this requires tracking which As described in the chat, we currently check for tied operands and their access attributes. This is necessary because write-back registers are given the For a register which is read and written there are effectively two distinct logical operands. One is an "invisible" "written" register (which is never printed into the asm text); the other is the register which is printed/added to the details (and marked as read). Because of this, instructions which use the same register twice, once for reading and once for writing, AND have the name twice in the asm text, are now marked both as Implementing this would also fix the issue that the memory can be written, but none of the operands forming the address is. But up until now, the registers nonetheless appear in the I would fix (partly) the
-> Seems to have happened because this specific one was added with the 2023 extension. There are others which are correct. But I'd like to move it as well until after the Alpha release and fix it in a patch release. Do you have experience with TableGen? Maybe you could fix it faster? Search for Sorry that these are not quick fixes. I will list them as bugs in the release guide for now. |
RE MRS and NZCV, I'm not aware of any MRS instruction that sets the NZCV and I couldn't find anything in the spec either... For As for the rest, thanks for explaining how it all works, as it gives us a better understanding of how these issues have come about. For our use case we have workarounds, so if it takes a while to formulate a fix then that is fine. We are happy to log these errors so that the project can be made better in the long run. I unfortunately have no experience with TableGen.... nor the capacity at this time to learn how to implement a fix (sorry!). I appreciate your effort on this and do not mind non-quick fixes too much, given the good communication around why and when (roughly) they can be expected |
Especially for AArch64 I would be super grateful if you could continue with these very detailed bug reports. Testing of the details was basically nonexistent up until now. And AArch64 has something around 100+ unique operand types. So we will slowly catch up to 100% test coverage over the next releases. |
0 cb 3e 80 05 and z11.b, z11.b, #0xfe
ID: 32 (and)
Is alias: 1408 (and) with REAL operand set
op_count: 3
operands[0].type: REG = z11
operands[0].access: READ | WRITE
Vector Arrangement Specifier: 0x40
operands[1].type: REG = z11
operands[1].access: READ | WRITE
Vector Arrangement Specifier: 0x40
operands[2].type: IMM = 0xfefefefefefefefe
operands[2].access: READ
Write-back: True
Registers read: z11
Registers modified: z11
Groups: HasSVEorSME
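The listing above reports the immediate as 0xfefefefefefefefe although the assembly text shows #0xfe. A minimal sketch of the presumed relationship, assuming the detail value is simply the 8-bit element pattern repeated across 64 bits (the helper name `replicate_element` is hypothetical and not part of Capstone):

```python
def replicate_element(pattern: int, elem_bits: int, total_bits: int = 64) -> int:
    """Repeat an elem_bits-wide pattern until total_bits are filled."""
    if total_bits % elem_bits != 0:
        raise ValueError("element size must divide the total width")
    pattern &= (1 << elem_bits) - 1  # keep only the bits of one element
    result = 0
    for lane in range(total_bits // elem_bits):
        result |= pattern << (lane * elem_bits)
    return result


# .b elements are 8 bits wide, so #0xfe fills every byte of the 64-bit value.
imm = replicate_element(0xFE, 8)
print(hex(imm))  # 0xfefefefefefefefe
```

Anyone comparing the operand string against the detail value would need to apply (or undo) this replication first.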
I'm currently re-implementing the latest version of |
This comment was marked as resolved.
Please open a new issue if you find more, so that this one does not get too long. It can be a collection issue again. |
Work environment
git clone
Sorry for not using the template fully, and sorry in advance for the long issue, but I have identified a fair few instructions (mainly SVE) with incorrect access types, and others with incorrect implicit register reads / writes or no immediate encoding.
Incorrect implicit destinations
AArch64_MRS (0x42d03bd5) currently has the NZCV register as an implicit write register. This isn't correct.
AArch64_BLR (0x20003fd6) has an implicit read of SP. Again, this isn't correct.
Incorrect access permissions
For SVE instructions which have the format
fsub zdn, pg, zdn, zm
(that is, where the 1st and 2nd z-regs must be the same), both operands have their access set to READ | WRITE.
Although this is technically correct, as register zdn is both written to and read from, I think it can be confusing. My reasoning is that operand[0] represents the register being written to, so its access should be just WRITE. Then operand[2] (in the example above) is the first source vector and should be READ.
If someone does not know that the destination vector register and the first source vector are mandated by the ISA spec to be the same register, then it could be confusing to see 2 registers being written to.
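The proposal can be sketched as a small post-processing step. This is only an illustration under stated assumptions: `Operand` is a hypothetical stand-in for an entry of the operand detail array, and the flag values are assumed to mirror Capstone's `CS_AC_READ` / `CS_AC_WRITE` bit flags.

```python
from dataclasses import dataclass, replace
from typing import List

# Assumed to mirror Capstone's CS_AC_READ / CS_AC_WRITE bit flags.
CS_AC_READ = 1
CS_AC_WRITE = 2
READ_WRITE = CS_AC_READ | CS_AC_WRITE


@dataclass(frozen=True)
class Operand:  # hypothetical stand-in for one operand detail entry
    reg: str
    access: int


def split_tied_access(ops: List[Operand]) -> List[Operand]:
    """Mark operand[0] as WRITE and the first later repeat of it as READ."""
    if not ops or ops[0].access != READ_WRITE:
        return list(ops)
    dest = ops[0]
    out = [dest]
    found_source = False
    for op in ops[1:]:
        if not found_source and op.reg == dest.reg and op.access == READ_WRITE:
            out.append(replace(op, access=CS_AC_READ))  # tied source: read only
            found_source = True
        else:
            out.append(op)
    if found_source:
        out[0] = replace(dest, access=CS_AC_WRITE)  # destination: write only
    return out


# fsub z0.d, p0/m, z0.d, z1.d as currently reported (both z0 entries READ | WRITE)
current = [
    Operand("z0", READ_WRITE),
    Operand("p0", CS_AC_READ),
    Operand("z0", READ_WRITE),
    Operand("z1", CS_AC_READ),
]
proposed = split_tied_access(current)
print([(op.reg, op.access) for op in proposed])
```

The destination is only narrowed to WRITE when a matching tied source is found, so instructions that genuinely read and write a single operand are left untouched.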
Here is an example of this occurring:
Which could instead be this:
I don't have a comprehensive list of all instructions that are affected by this, but it generally seems to be SVE only and with the format
Zdn, pg, zdn, <zm|#imm>
Below is a list of opcode enums (and some bytecodes where I've made a note of them) of the ones I have run into so far:
Similar behaviour has also been seen with unpredicated SVE instructions where operand[0] and operand[1] must be the same SVE vector register:
Incorrect access permissions pt. 2
There are some other instructions I have found with wrong access information.
AArch64_CASALX and AArch64_CASALW // Example bytecode - 02fce188
All permissions should be READ as no register is updated with CASAL. Also writeback should be False:
AArch64_FCVTNv4i32 // Example bytecode - 0168614e
operands[0] should be WRITE only. More variants of this instruction may be affected; I just haven't verified this:
Imm not set when a shift is present
For many instructions that take an immediate, a shift can also optionally be provided. When the shift is not provided, the instructions work fine.
However, when the shift is provided, the shift amount is often fixed or in a range. As such, the Capstone / LLVM disassembler automatically works out the shifted value. The shifted immediate is given correctly in the operand string, but is not in the disassembly info.
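To make the expected value concrete, here is a minimal sketch that works out the shifted immediate by hand for the CPY example bytecode quoted in this report (01215005). The field positions (imm8 in bits 12:5, shift flag in bit 13) are my reading of the CPY (immediate, zeroing) encoding and should be treated as an assumption; `decode_cpy_imm` is a hypothetical helper, not a Capstone function.

```python
def decode_cpy_imm(raw: bytes) -> dict:
    """Extract the immediate and optional LSL #8 from a CPY (immediate) word."""
    word = int.from_bytes(raw, "little")  # A64 instruction words are little-endian
    imm8 = (word >> 5) & 0xFF             # assumed position of the 8-bit immediate
    sh = (word >> 13) & 1                 # assumed position of the shift flag
    signed = imm8 - 0x100 if imm8 & 0x80 else imm8  # imm8 is a signed value
    shift = 8 if sh else 0                # the only available shift is LSL #8
    return {"imm": signed, "shift": shift, "value": signed << shift}


info = decode_cpy_imm(bytes.fromhex("01215005"))
print(info)  # {'imm': 8, 'shift': 8, 'value': 2048}
```

Under these assumptions the detail info would be expected to carry either the pair (8, LSL #8) or the combined value 2048 (0x800), rather than leaving the immediate unset.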
Example: AArch64_CPY_ZPzI_H: // Example bytecode - 01215005
Here, there is an extra operand in operand[3], and the imm is not set:
An alternative assembly for this instruction (and the one I used to generate the bytecode) is
cpy z1.h, p0/z, #8, lsl #8
, where the only LSL available is by #8.
The instructions we have found to be affected are:
This issue is likely to affect all instructions which use immediates and optional shifts in this way.
FP immediate not shown in disassembly information
For instructions which take a fixed floating point immediate value, it is correctly identified that one exists, and the EXACTFPIMM field is populated. But we also have the .fp field in the cs_aarch64_op union. It could be useful to populate this field as well as the enum, for better clarity and improved in-project usage.
Example: AArch64_FADD_ZPmI_D // Example bytecode - 0584d865
Could be
Thanks in advance!