Skip to content

Initial support for symbolic opcodes#145

Merged
katrinafyi merged 6 commits intoUQ-PAC:partial_evalfrom
mmcloughlin:symaslp
Jan 23, 2026
Merged

Initial support for symbolic opcodes#145
katrinafyi merged 6 commits intoUQ-PAC:partial_evalfrom
mmcloughlin:symaslp

Conversation

@mmcloughlin
Copy link

Add initial support for generating semantics of symbolic opcodes.

These extensions were a crucial ingredient of the ISA specification generation for instruction-selection verification in Cranelift, recently presented at OOPSLA.

Scaling Instruction-Selection Verification against Authoritative ISA Semantics
https://doi.org/10.1145/3764383

Thank you for all your work on this project, it was a great help to us.

Updates #52

Experimental support for symbolic opcodes in ASLp.

Opcode semantics generation now accepts an "opcode" template, for
example:

```
ASLi> :sem A64 0x122:9|sh:1|imm:12|0xa4:10
Decoding instruction A64 0x122:9|sh:1|imm:12|0xa4:10
bits ( 64 ) imm__3 ;
if eq_bits.0 {{ 1 }} ( sh [ 0 +: 1 ],'0' ) then {
imm__3 = ZeroExtend.0 {{ 12,64 }} ( imm [ 0 +: 12 ],64 ) ;
}  else {
if eq_bits.0 {{ 1 }} ( sh [ 0 +: 1 ],'1' ) then {
imm__3 = ZeroExtend.0 {{ 24,64 }} ( append_bits.0 {{ 12,12 }} ( imm [ 0 +: 12 ],'000000000000' ),64 ) ;
}  else {
assert FALSE ;
}
}
__array _R [ 4 ] = add_bits.0 {{ 64 }} ( __array _R [ 5 ],imm__3 ) ;
```
Fix a problem with the treatment of symbolic field variables by
common-subexpression-elimination.

When a common subexpression has been identified, it needs to place it in
the statement list. It does this by placing it in the first position
after all its dependencies have been assigned. The problem comes because
a common subexpression that depends on a symbolic field won't ever find
an assignment to the symbolic field.

This is what the algorithm was probably supposed to do, but as they
explain in a comment what they actually do is place the definition of
the common subexpression in the first place after all its dependencies
have _appeared_ (whether assigned to or not). Therefore in practice,
common expressions involving symbolic fields would end up placed after
their first use.

This PR fixes it by:

* Passing a list of globals into CSE which are treated as predefined
* Appending symbolic fields to the global identifier set
Comment on lines +1568 to +1571
(* Declare symbolic fields as globals *)
let fields = fields_of_sym_bits op in
let field_names = List.map fst fields in
let globals = List.fold_left (fun idents name -> IdentSet.add name idents) globals field_names in
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I took the best approach to ensuring that symbolic variables in the opcode were available as globals. Please let me know if there was a better way?

@katrinafyi
Copy link
Member

Thanks for the PR! From a very quick look, the changes look reasonable.

I'm on leave rn, so if someone else wants to review this, they should feel free. Otherwise, I'll take a look when I get back!

@mmcloughlin
Copy link
Author

Otherwise, I'll take a look when I get back!

No rush, take your time! Thanks.

@mmcloughlin
Copy link
Author

Following up on this. Would love to get your review and integrate this into ASLp. Thanks!

Copy link
Member

@katrinafyi katrinafyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few comments, I can look into these and make any needed changes.

Small output difference in 0x3a410ac4 (and others). ITESimp seems to be slightly less effective:

 constant bits ( 32 ) Cse0__5 = add_bits.0 {{ 32 }} ( __array _R [ 22 ] [ 0 +: 32 ],'00000000000000000000000000000001' ) ;
 constant bits ( 1 ) ITESimp_0 = PSTATE . Z ;
 PSTATE . V = and_bits.0 {{ 1 }} ( ITESimp_0,not_bits.0 {{ 1 }} ( cvt_bool_bv.0 {{  }} ( eq_bits.0 {{ 64 }} ( SignExtend.0 {{ 32,64 }} ( Cse0__5,64 ),add_bits.0 {{ 64 }} ( SignExtend.0 {{ 32,64 }} ( __array _R [ 22 ] [ 0 +: 32 ],64 ),'0000000000000000000000000000000000000000000000000000000000000001' ) ) ) ) ) ;
 PSTATE . C = and_bits.0 {{ 1 }} ( ITESimp_0,not_bits.0 {{ 1 }} ( cvt_bool_bv.0 {{  }} ( eq_bits.0 {{ 64 }} ( ZeroExtend.0 {{ 32,64 }} ( Cse0__5,64 ),add_bits.0 {{ 64 }} ( ZeroExtend.0 {{ 32,64 }} ( __array _R [ 22 ] [ 0 +: 32 ],64 ),'0000000000000000000000000000000000000000000000000000000000000001' ) ) ) ) ) ;
-PSTATE . Z = or_bits.0 {{ 1 }} ( cvt_bool_bv.0 {{  }} ( eq_bits.0 {{ 32 }} ( Cse0__5,'00000000000000000000000000000000' ) ),not_bits.0 {{ 1 }} ( ITESimp_0 ) ) ;
+PSTATE . Z = or_bits.0 {{ 1 }} ( and_bits.0 {{ 1 }} ( ITESimp_0,cvt_bool_bv.0 {{  }} ( eq_bits.0 {{ 32 }} ( Cse0__5,'00000000000000000000000000000000' ) ) ),not_bits.0 {{ 1 }} ( ITESimp_0 ) ) ;
 PSTATE . N = and_bits.0 {{ 1 }} ( ITESimp_0,Cse0__5 [ 31 +: 1 ] ) ;

libASL/dis.ml Outdated
stmts

let dis_decode_entry (env: Eval.Env.t) ((lenv,globals): env) (decode: decode_case) (op: Primops.bigint): stmt list =
let dis_decode_entry (env: Eval.Env.t) ((lenv,globals): env) (decode: decode_case) (op: sym_bits): stmt list =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to reduce the extent of the sym_bits usage, just because we have some downstream projects which call these functions with the old APIs. Maybe symbolic features should be provided as dis_decode_entry_sym or similar

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added dis_decode_entry_sym and defined dis_decode_entry using it.

Comment on lines 592 to 599
| (Exp (Expr_TApply (FIdent ("append_bits", 0), [lw; rw], [l; r])), y) ->
let lw' = sym_of_expr lw in
let rw' = sym_of_expr rw in
let yl = sym_extract_bits loc y rw' lw' in
let yr = sym_extract_bits loc y (sym_of_int 0) rw' in
let l' = sym_and_bits loc (sym_expr lw') (sym_of_expr l) yl in
let r' = sym_and_bits loc (sym_expr rw') (sym_of_expr r) yr in
Exp (Expr_TApply (FIdent ("append_bits", 0), [lw; rw], [sym_expr l'; sym_expr r']))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note that lw and rw should always be LitInt, and so we should be able to use sym_slice. also, can use sym_append_bits

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the case to match on Expr_LitInt and used the sym_slice and sym_append_bits functions.

@katrinafyi katrinafyi merged commit 25bc43e into UQ-PAC:partial_eval Jan 23, 2026
4 checks passed
@katrinafyi
Copy link
Member

katrinafyi commented Jan 23, 2026

Thanks for doing the comments and sorry for the lack of communication.

I was looking at this and doing some of the comments myself, but I got stuck on this comment of yours - https://github.com/UQ-PAC/aslp/pull/145/changes#r2469441528. I think that the better way would be to construct some ASL statements declaring those variables and then run them through the partial evaluator to perform all the necessary state modifications. But I was running into problems with this approach (which I now cannot recall...).

But I'm happy for it to be merged as-is and we just keep that improvement in the back of our minds. Since you're the main user of this feature for now, you should feel free to do fixes or improvements to this feature and I'll endeavour to merge rapidly.

@mmcloughlin
Copy link
Author

Great, thank you!

I think I figured out why the output got worse for a few cases. Fix in #147.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants