-
Notifications
You must be signed in to change notification settings - Fork 0
Kitoc/vls cc #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Kitoc/vls cc #1
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements support for RISC-V Vector Length Specific (VLS) calling conventions that enable passing vector arguments with known lengths in vector registers. The implementation adds multiple VLS calling convention variants (for different vector lengths from 32 to 16384 bits) and updates the argument passing and return value mechanisms to handle VLS types appropriately.
Key changes include:
- Addition of VLS calling convention variants and associated helper functions
- Updates to argument flattening logic to support up to 8 aggregate fields for VLS types
- Implementation of vector register passing for VLS types and aggregates containing VLS types
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| gcc/tree-nrv.cc | Updates function call to use function type instead of function declaration for ABI compatibility |
| gcc/function-abi.h | Increases maximum number of ABI IDs from 8 to 12 to accommodate new VLS variants |
| gcc/config/riscv/riscv.h | Adds VLS calling convention enums and removes deprecated macros |
| gcc/config/riscv/riscv.cc | Implements core VLS calling convention logic, argument passing, and return value handling |
| gcc/config/riscv/riscv-v.cc | Updates VLS mode validation to support larger LMUL values |
| gcc/config/riscv/riscv-protos.h | Updates function signatures for VLS support |
gcc/config/riscv/riscv.cc
Outdated
| bool vls_cc_p = lookup_attribute ("vls_cc", attr) != NULL_TREE | ||
| || lookup_attribute ("riscv_vls_cc", attr) != NULL_TREE; | ||
|
|
||
| if (vls_cc_p && !TARGET_VECTOR) | ||
| error_at (input_location, | ||
| "function attribute %qs requires the V ISA extension", | ||
| "riscv_vls_cc"); | ||
|
|
||
| // TODO: Get ABI_VLEN | ||
|
|
||
| if (vls_cc_p) | ||
| return RISCV_CC_VLS_V_128; |
Copilot
AI
Aug 8, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The TODO comment indicates incomplete implementation. The ABI_VLEN should be extracted from the function attribute to properly determine the VLS calling convention variant.
| bool vls_cc_p = lookup_attribute ("vls_cc", attr) != NULL_TREE | |
| || lookup_attribute ("riscv_vls_cc", attr) != NULL_TREE; | |
| if (vls_cc_p && !TARGET_VECTOR) | |
| error_at (input_location, | |
| "function attribute %qs requires the V ISA extension", | |
| "riscv_vls_cc"); | |
| // TODO: Get ABI_VLEN | |
| if (vls_cc_p) | |
| return RISCV_CC_VLS_V_128; | |
| tree vls_attr = lookup_attribute ("vls_cc", attr); | |
| if (vls_attr == NULL_TREE) | |
| vls_attr = lookup_attribute ("riscv_vls_cc", attr); | |
| bool vls_cc_p = vls_attr != NULL_TREE; | |
| if (vls_cc_p && !TARGET_VECTOR) | |
| error_at (input_location, | |
| "function attribute %qs requires the V ISA extension", | |
| "riscv_vls_cc"); | |
| int abi_vlen = 128; // Default value | |
| if (vls_cc_p && TREE_VALUE (vls_attr) != NULL_TREE) | |
| { | |
| tree args = TREE_VALUE (vls_attr); | |
| if (args && TREE_CODE (args) == TREE_LIST) | |
| { | |
| tree vlen_arg = TREE_VALUE (args); | |
| if (vlen_arg && TREE_CODE (vlen_arg) == INTEGER_CST) | |
| abi_vlen = TREE_INT_CST_LOW (vlen_arg); | |
| } | |
| } | |
| if (vls_cc_p) | |
| { | |
| switch (abi_vlen) | |
| { | |
| case 128: | |
| return RISCV_CC_VLS_V_128; | |
| case 256: | |
| return RISCV_CC_VLS_V_256; | |
| case 512: | |
| return RISCV_CC_VLS_V_512; | |
| default: | |
| error_at (input_location, | |
| "unsupported ABI_VLEN value %d for vls_cc attribute; supported values are 128, 256, 512", | |
| abi_vlen); | |
| return RISCV_CC_UNKNOWN; | |
| } | |
| } |
gcc/config/riscv/riscv.cc
Outdated
| ATTRIBUTE_UNUSED int flags, | ||
| bool *no_add_attrs) | ||
| { | ||
| // TODO: Handle ABI_VLEN |
Copilot
AI
Aug 8, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The TODO comment indicates that ABI_VLEN handling is not implemented in the attribute handler. This should extract and validate the ABI_VLEN parameter from the attribute arguments.
| // TODO: Handle ABI_VLEN | |
| /* Handle the ABI_VLEN argument for riscv_rvv_vls_cc attribute. */ | |
| if (!is_attribute_p ("riscv_rvv_vls_cc", name)) | |
| return NULL_TREE; | |
| if (!args) | |
| { | |
| warning (OPT_Wattributes, | |
| "%qE attribute requires an integer ABI_VLEN argument", | |
| name); | |
| *no_add_attrs = true; | |
| return NULL_TREE; | |
| } | |
| tree cst = TREE_VALUE (args); | |
| if (TREE_CODE (cst) != INTEGER_CST) | |
| { | |
| warning (OPT_Wattributes, | |
| "%qE attribute argument must be an integer constant", | |
| name); | |
| *no_add_attrs = true; | |
| return NULL_TREE; | |
| } | |
| /* Extract the ABI_VLEN value. */ | |
| HOST_WIDE_INT abi_vlen = TREE_INT_CST_LOW (cst); | |
| if (abi_vlen <= 0) | |
| { | |
| warning (OPT_Wattributes, | |
| "%qE attribute ABI_VLEN argument must be positive", | |
| name); | |
| *no_add_attrs = true; | |
| return NULL_TREE; | |
| } | |
| /* Optionally, add further validation for ABI_VLEN here (e.g., must be a power of two). */ | |
| /* If valid, allow the attribute. */ |
gcc/config/riscv/riscv.cc
Outdated
| in integer mode. */ | ||
|
|
||
| static machine_mode | ||
| riscv_get_vls_constainer_type (machine_mode mode, unsigned abi_vlen) |
Copilot
AI
Aug 8, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Function name has a typo: 'constainer' should be 'container'.
| riscv_get_vls_constainer_type (machine_mode mode, unsigned abi_vlen) | |
| riscv_get_vls_container_type (machine_mode mode, unsigned abi_vlen) |
gcc/config/riscv/riscv.cc
Outdated
| For more details please reference the below link. | ||
| https://github.com/riscv-non-isa/riscv-c-api-doc/pull/67 */ | ||
| static riscv_cc | ||
| riscv_vls_cc_funciton_abi (const_tree fntype) |
Copilot
AI
Aug 8, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Function name has a typo: 'funciton' should be 'function'.
| riscv_vls_cc_funciton_abi (const_tree fntype) | |
| riscv_vls_cc_function_abi (const_tree fntype) |
…slot::execute Call with funciton pointer might not able to get the fndecl, but fntype so use gimple_call_fntype instead of gimple_call_fndecl. aggregate_value_p can handle fndecl and fntype right (and even CALL_EXPR), so I think this change is safe. gcc/ChangeLog: * tree-nrv.cc (pass_return_slot::execute): Use gimple_call_fntype instead of gimple_call_fndecl.
We used to apply -mrvv-max-lmul= to limit VLS code gen, auto vectorizer, and builtin string function expansion. But I think the VLS code gen part doesn't need this limit, since it only happens when the user explicitly writes vector types. For example, int32x8_t under -mrvv-max-lmul=m1 with VLEN=128 would be split into two int32x4_t, which generate more instructions and runs slower. In this patch, I changed -mrvv-max-lmul= to only affect auto vectorization and builtin string function expansion. Actually, the option's help text already says it only controls the LMUL used by auto-vectorization, so I believe this change is makes sense :) gcc/ChangeLog: * config/riscv/riscv-protos.h (vls_mode_valid_p): New argument allow_up_to_lmul_8. * config/riscv/riscv-v.cc (autovectorize_vector_modes): Set allow_up_to_lmul_8 to false. (vls_mode_valid_p): Add new argument allow_up_to_lmul_8, and use it to determine whether to allow LMUL 8. gcc/testsuite/ChangeLog: * gcc.target.riscv/rvv/vls-type-rvv-max-lmul.c: New test.
can_find_related_mode_p didn't handle VLS type correctly, and the root cause is TARGET_MIN_VLEN is in bits, but what we want is in bytes. gcc/ChangeLog: * config/riscv/riscv-selftests.cc (riscv_run_selftests): Call run_vectorize_related_mode_selftests. (test_vectorize_related_mode): New. (run_vectorize_related_mode_selftests): New. * config/riscv/riscv-v.cc (can_find_related_mode_p): Handle VLS type correctly.
RISC-V fixed length vector calling convention has defined with variable ABI_VLEN argument, which will expand several ABI IDs in the implmentation, and we need to bump the number to implmente that...
…FUNCTION_VALUE|LIBCALL_VALUE] The macro version is deprecated, and also VLS CC need more infomation later, so it should be good timing to use new target hook for those hooks.
b9683c8 to
7ed5d27
Compare
When comparing constraints during correspondence checking for a using from a partial specialization, we need to substitute the partial specialization arguments into the constraints rather than the primary template arguments. Otherwise we incorrectly reject e.g. the below testcase as ambiguous since we substitute T=int* instead of T=int into #1's constraints and don't notice the correspondence. This patch corrects the recent r16-2771-gb9f1cc4e119da9 fix by using outer_template_args instead of TI_ARGS of the DECL_CONTEXT, which should always give the correct outer arguments for substitution. PR c++/121351 gcc/cp/ChangeLog: * class.cc (add_method): Use outer_template_args when substituting outer template arguments into constraints. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/concepts-using7.C: New test. Reviewed-by: Jason Merrill <jason@redhat.com>
…op is invariant [PR121290]
Consider the example:
void
f (int *restrict x, int *restrict y, int *restrict z, int n)
{
for (int i = 0; i < 4; ++i)
{
int res = 0;
for (int j = 0; j < 100; ++j)
res += y[j] * z[i];
x[i] = res;
}
}
we currently vectorize as
f:
movi v30.4s, 0
ldr q31, [x2]
add x2, x1, 400
.L2:
ld1r {v29.4s}, [x1], 4
mla v30.4s, v29.4s, v31.4s
cmp x2, x1
bne .L2
str q30, [x0]
ret
which is not useful because by doing outer-loop vectorization we're performing
less work per iteration than we would had we done inner-loop vectorization and
simply unrolled the inner loop.
This patch teaches the cost model that if all your leafs are invariant, then
adjust the loop cost by * VF, since every vector iteration has at least one lane
really just doing 1 scalar.
There are a couple of ways we could have solved this, one is to increase the
unroll factor to process more iterations of the inner loop. This removes the
need for the broadcast, however we don't support unrolling the inner loop within
the outer loop. We only support unrolling by increasing the VF, which would
affect the outer loop as well as the inner loop.
We also don't directly support costing inner-loop vs outer-loop vectorization,
and as such we're left trying to predict/steer the cost model ahead of time to
what we think should be profitable. This patch attempts to do so using a
heuristic which penalizes the outer-loop vectorization.
We now cost the loop as
note: Cost model analysis:
Vector inside of loop cost: 2000
Vector prologue cost: 4
Vector epilogue cost: 0
Scalar iteration cost: 300
Scalar outside cost: 0
Vector outside cost: 4
prologue iterations: 0
epilogue iterations: 0
missed: cost model: the vector iteration cost = 2000 divided by the scalar iteration cost = 300 is greater or equal to the vectorization factor = 4.
missed: not vectorized: vectorization not profitable.
missed: not vectorized: vector version will never be profitable.
missed: Loop costings may not be worthwhile.
And subsequently generate:
.L5:
add w4, w4, w7
ld1w z24.s, p6/z, [x0, #1, mul vl]
ld1w z23.s, p6/z, [x0, #2, mul vl]
ld1w z22.s, p6/z, [x0, #3, mul vl]
ld1w z29.s, p6/z, [x0]
mla z26.s, p6/m, z24.s, z30.s
add x0, x0, x8
mla z27.s, p6/m, z23.s, z30.s
mla z28.s, p6/m, z22.s, z30.s
mla z25.s, p6/m, z29.s, z30.s
cmp w4, w6
bls .L5
and avoids the load and replicate if it knows it has enough vector pipes to do
so.
gcc/ChangeLog:
PR target/121290
* config/aarch64/aarch64.cc
(class aarch64_vector_costs ): Add m_loop_fully_scalar_dup.
(aarch64_vector_costs::add_stmt_cost): Detect invariant inner loops.
(adjust_body_cost): Adjust final costing if m_loop_fully_scalar_dup.
gcc/testsuite/ChangeLog:
PR target/121290
* gcc.target/aarch64/pr121290.c: New test.
No description provided.