Skip to content

Conversation

@kito-cheng
Copy link
Owner

No description provided.

@kito-cheng kito-cheng requested a review from Copilot August 8, 2025 05:37

This comment was marked as outdated.

@kito-cheng kito-cheng requested a review from Copilot August 8, 2025 05:41
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements support for RISC-V Vector Length Specific (VLS) calling conventions that enable passing vector arguments with known lengths in vector registers. The implementation adds multiple VLS calling convention variants (for different vector lengths from 32 to 16384 bits) and updates the argument passing and return value mechanisms to handle VLS types appropriately.

Key changes include:

  • Addition of VLS calling convention variants and associated helper functions
  • Updates to argument flattening logic to support up to 8 aggregate fields for VLS types
  • Implementation of vector register passing for VLS types and aggregates containing VLS types

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
gcc/tree-nrv.cc Updates function call to use function type instead of function declaration for ABI compatibility
gcc/function-abi.h Increases maximum number of ABI IDs from 8 to 12 to accommodate new VLS variants
gcc/config/riscv/riscv.h Adds VLS calling convention enums and removes deprecated macros
gcc/config/riscv/riscv.cc Implements core VLS calling convention logic, argument passing, and return value handling
gcc/config/riscv/riscv-v.cc Updates VLS mode validation to support larger LMUL values
gcc/config/riscv/riscv-protos.h Updates function signatures for VLS support

Comment on lines 7091 to 7102
bool vls_cc_p = lookup_attribute ("vls_cc", attr) != NULL_TREE
|| lookup_attribute ("riscv_vls_cc", attr) != NULL_TREE;

if (vls_cc_p && !TARGET_VECTOR)
error_at (input_location,
"function attribute %qs requires the V ISA extension",
"riscv_vls_cc");

// TODO: Get ABI_VLEN

if (vls_cc_p)
return RISCV_CC_VLS_V_128;
Copy link

Copilot AI Aug 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TODO comment indicates incomplete implementation. The ABI_VLEN should be extracted from the function attribute to properly determine the VLS calling convention variant.

Suggested change
bool vls_cc_p = lookup_attribute ("vls_cc", attr) != NULL_TREE
|| lookup_attribute ("riscv_vls_cc", attr) != NULL_TREE;
if (vls_cc_p && !TARGET_VECTOR)
error_at (input_location,
"function attribute %qs requires the V ISA extension",
"riscv_vls_cc");
// TODO: Get ABI_VLEN
if (vls_cc_p)
return RISCV_CC_VLS_V_128;
tree vls_attr = lookup_attribute ("vls_cc", attr);
if (vls_attr == NULL_TREE)
vls_attr = lookup_attribute ("riscv_vls_cc", attr);
bool vls_cc_p = vls_attr != NULL_TREE;
if (vls_cc_p && !TARGET_VECTOR)
error_at (input_location,
"function attribute %qs requires the V ISA extension",
"riscv_vls_cc");
int abi_vlen = 128; // Default value
if (vls_cc_p && TREE_VALUE (vls_attr) != NULL_TREE)
{
tree args = TREE_VALUE (vls_attr);
if (args && TREE_CODE (args) == TREE_LIST)
{
tree vlen_arg = TREE_VALUE (args);
if (vlen_arg && TREE_CODE (vlen_arg) == INTEGER_CST)
abi_vlen = TREE_INT_CST_LOW (vlen_arg);
}
}
if (vls_cc_p)
{
switch (abi_vlen)
{
case 128:
return RISCV_CC_VLS_V_128;
case 256:
return RISCV_CC_VLS_V_256;
case 512:
return RISCV_CC_VLS_V_512;
default:
error_at (input_location,
"unsupported ABI_VLEN value %d for vls_cc attribute; supported values are 128, 256, 512",
abi_vlen);
return RISCV_CC_UNKNOWN;
}
}

Copilot uses AI. Check for mistakes.
ATTRIBUTE_UNUSED int flags,
bool *no_add_attrs)
{
// TODO: Handle ABI_VLEN
Copy link

Copilot AI Aug 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TODO comment indicates that ABI_VLEN handling is not implemented in the attribute handler. This should extract and validate the ABI_VLEN parameter from the attribute arguments.

Suggested change
// TODO: Handle ABI_VLEN
/* Handle the ABI_VLEN argument for riscv_rvv_vls_cc attribute. */
if (!is_attribute_p ("riscv_rvv_vls_cc", name))
return NULL_TREE;
if (!args)
{
warning (OPT_Wattributes,
"%qE attribute requires an integer ABI_VLEN argument",
name);
*no_add_attrs = true;
return NULL_TREE;
}
tree cst = TREE_VALUE (args);
if (TREE_CODE (cst) != INTEGER_CST)
{
warning (OPT_Wattributes,
"%qE attribute argument must be an integer constant",
name);
*no_add_attrs = true;
return NULL_TREE;
}
/* Extract the ABI_VLEN value. */
HOST_WIDE_INT abi_vlen = TREE_INT_CST_LOW (cst);
if (abi_vlen <= 0)
{
warning (OPT_Wattributes,
"%qE attribute ABI_VLEN argument must be positive",
name);
*no_add_attrs = true;
return NULL_TREE;
}
/* Optionally, add further validation for ABI_VLEN here (e.g., must be a power of two). */
/* If valid, allow the attribute. */

Copilot uses AI. Check for mistakes.
in integer mode. */

static machine_mode
riscv_get_vls_constainer_type (machine_mode mode, unsigned abi_vlen)
Copy link

Copilot AI Aug 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function name has a typo: 'constainer' should be 'container'.

Suggested change
riscv_get_vls_constainer_type (machine_mode mode, unsigned abi_vlen)
riscv_get_vls_container_type (machine_mode mode, unsigned abi_vlen)

Copilot uses AI. Check for mistakes.
For more details please reference the below link.
https://github.com/riscv-non-isa/riscv-c-api-doc/pull/67 */
static riscv_cc
riscv_vls_cc_funciton_abi (const_tree fntype)
Copy link

Copilot AI Aug 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function name has a typo: 'funciton' should be 'function'.

Suggested change
riscv_vls_cc_funciton_abi (const_tree fntype)
riscv_vls_cc_function_abi (const_tree fntype)

Copilot uses AI. Check for mistakes.
…slot::execute

Call with funciton pointer might not able to get the fndecl, but fntype
so use gimple_call_fntype instead of gimple_call_fndecl.

aggregate_value_p can handle fndecl and fntype right (and even CALL_EXPR), so I
think this change is safe.

gcc/ChangeLog:

	* tree-nrv.cc (pass_return_slot::execute): Use
	gimple_call_fntype instead of gimple_call_fndecl.
We used to apply -mrvv-max-lmul= to limit VLS code gen, auto vectorizer,
and builtin string function expansion. But I think the VLS code gen part doesn't
need this limit, since it only happens when the user explicitly writes vector
types.

For example, int32x8_t under -mrvv-max-lmul=m1 with VLEN=128 would be split into
two int32x4_t, which generate more instructions and runs slower.

In this patch, I changed -mrvv-max-lmul= to only affect auto vectorization and
builtin string function expansion. Actually, the option's help text already
says it only controls the LMUL used by auto-vectorization, so I believe this
change is makes sense :)

gcc/ChangeLog:

	* config/riscv/riscv-protos.h (vls_mode_valid_p): New argument
	allow_up_to_lmul_8.
	* config/riscv/riscv-v.cc (autovectorize_vector_modes): Set
	allow_up_to_lmul_8 to false.
	(vls_mode_valid_p): Add new argument allow_up_to_lmul_8, and use
	it to determine whether to allow LMUL 8.

gcc/testsuite/ChangeLog:

	* gcc.target.riscv/rvv/vls-type-rvv-max-lmul.c: New test.
can_find_related_mode_p didn't handle VLS type correctly, and the root
cause is TARGET_MIN_VLEN is in bits, but what we want is in bytes.

gcc/ChangeLog:

	* config/riscv/riscv-selftests.cc (riscv_run_selftests):
	Call run_vectorize_related_mode_selftests.
	(test_vectorize_related_mode): New.
	(run_vectorize_related_mode_selftests): New.
	* config/riscv/riscv-v.cc (can_find_related_mode_p):
	Handle VLS type correctly.
RISC-V fixed length vector calling convention has defined with variable
ABI_VLEN argument, which will expand several ABI IDs in the
implmentation, and we need to bump the number to implmente that...
…FUNCTION_VALUE|LIBCALL_VALUE]

The macro version is deprecated, and also VLS CC need more infomation later,
so it should be good timing to use new target hook for those hooks.
kito-cheng pushed a commit that referenced this pull request Sep 3, 2025
When comparing constraints during correspondence checking for a using
from a partial specialization, we need to substitute the partial
specialization arguments into the constraints rather than the primary
template arguments.  Otherwise we incorrectly reject e.g. the below
testcase as ambiguous since we substitute T=int* instead of T=int
into #1's constraints and don't notice the correspondence.

This patch corrects the recent r16-2771-gb9f1cc4e119da9 fix by using
outer_template_args instead of TI_ARGS of the DECL_CONTEXT, which
should always give the correct outer arguments for substitution.

	PR c++/121351

gcc/cp/ChangeLog:

	* class.cc (add_method): Use outer_template_args when
	substituting outer template arguments into constraints.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp2a/concepts-using7.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
kito-cheng pushed a commit that referenced this pull request Sep 3, 2025
…op is invariant [PR121290]

Consider the example:

void
f (int *restrict x, int *restrict y, int *restrict z, int n)
{
  for (int i = 0; i < 4; ++i)
    {
      int res = 0;
      for (int j = 0; j < 100; ++j)
        res += y[j] * z[i];
      x[i] = res;
    }
}

we currently vectorize as

f:
        movi    v30.4s, 0
        ldr     q31, [x2]
        add     x2, x1, 400
.L2:
        ld1r    {v29.4s}, [x1], 4
        mla     v30.4s, v29.4s, v31.4s
        cmp     x2, x1
        bne     .L2
        str     q30, [x0]
        ret

which is not useful because by doing outer-loop vectorization we're performing
less work per iteration than we would had we done inner-loop vectorization and
simply unrolled the inner loop.

This patch teaches the cost model that if all your leafs are invariant, then
adjust the loop cost by * VF, since every vector iteration has at least one lane
really just doing 1 scalar.

There are a couple of ways we could have solved this, one is to increase the
unroll factor to process more iterations of the inner loop.  This removes the
need for the broadcast, however we don't support unrolling the inner loop within
the outer loop.  We only support unrolling by increasing the VF, which would
affect the outer loop as well as the inner loop.

We also don't directly support costing inner-loop vs outer-loop vectorization,
and as such we're left trying to predict/steer the cost model ahead of time to
what we think should be profitable.  This patch attempts to do so using a
heuristic which penalizes the outer-loop vectorization.

We now cost the loop as

note:  Cost model analysis:
  Vector inside of loop cost: 2000
  Vector prologue cost: 4
  Vector epilogue cost: 0
  Scalar iteration cost: 300
  Scalar outside cost: 0
  Vector outside cost: 4
  prologue iterations: 0
  epilogue iterations: 0
missed:  cost model: the vector iteration cost = 2000 divided by the scalar iteration cost = 300 is greater or equal to the vectorization factor = 4.
missed:  not vectorized: vectorization not profitable.
missed:  not vectorized: vector version will never be profitable.
missed:  Loop costings may not be worthwhile.

And subsequently generate:

.L5:
        add     w4, w4, w7
        ld1w    z24.s, p6/z, [x0, #1, mul vl]
        ld1w    z23.s, p6/z, [x0, #2, mul vl]
        ld1w    z22.s, p6/z, [x0, #3, mul vl]
        ld1w    z29.s, p6/z, [x0]
        mla     z26.s, p6/m, z24.s, z30.s
        add     x0, x0, x8
        mla     z27.s, p6/m, z23.s, z30.s
        mla     z28.s, p6/m, z22.s, z30.s
        mla     z25.s, p6/m, z29.s, z30.s
        cmp     w4, w6
        bls     .L5

and avoids the load and replicate if it knows it has enough vector pipes to do
so.

gcc/ChangeLog:

	PR target/121290
	* config/aarch64/aarch64.cc
	(class aarch64_vector_costs ): Add m_loop_fully_scalar_dup.
	(aarch64_vector_costs::add_stmt_cost): Detect invariant inner loops.
	(adjust_body_cost): Adjust final costing if m_loop_fully_scalar_dup.

gcc/testsuite/ChangeLog:

	PR target/121290
	* gcc.target/aarch64/pr121290.c: New test.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants