Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL][Graph] Explicit update with index only #350

Closed
wants to merge 857 commits into from
Closed

Conversation

reble
Copy link
Owner

@reble reble commented Jan 12, 2024

This patch removes the match by value functionality,
thus an index is always required to update a kernel parameter.

lukel97 and others added 28 commits January 17, 2024 13:32
…attribute (#77426)

This patch reworks RISCVTargetInfo::initFeatureMap to fix the issue
described
in
llvm/llvm-project#74889 (review)
(and is an alternative to #75804)

When a full arch string is specified, a "full" list of extensions is now
passed
after the __RISCV_TargetAttrNeedOverride marker feature, which includes
any
negative features that disable ISA extensions.

In initFeatureMap, there are now two code paths:

1. If the arch string was overriden, use the "full" list of override
features,
only adding back any non-isa features that were specified.

Using the full list of positive and negative features will mean that the
target-cpu will have no effect on the final arch, e.g.
__attribute__((target("arch=rv64i"))) with -mcpu=sifive-x280 will have
the
features for rv64i, not a mix of both.

2. Otherwise, parse and *append* the list of implied features. By
appending, we
turn back on any features that might have been disabled by a negative
extension, i.e. this handles the case fixed in #74889.
## Why

The documentation is written relatively to `clang-llvm`, not the root
repository directory. However, some steps in the documentation are
relative to the repository root, which is not correct.

## What
Documentation steps have been modified to make them correct and outdated
ones were updated. Some details:
* Correct paths in documentation
* Change `bootstrap.py` -> `configure.py` since bootstraping Ninja has
[slightly
changed](https://github.com/ninja-build/ninja/tree/master?tab=readme-ov-file#python)
This removes the entire modules testing infrastructure.

The current infrastructure uses CMake to generate the std and std.compat
module. This requires quite a bit of plumbing and uses CMake. Since
CMake introduced module support in CMake 3.26, modules have a higher
CMake requirement than the rest of the LLVM project. (The LLVM project
requires 3.20.) The main motivation for this approach was how libc++
generated its modules. Every header had its own module partition. This
was changed to improve performance and now only two modules remain. The
code to build these can be manually crafted.

A followup patch will reenable testing modules, using a different
approach.
This adds a new module test infrastructure. This requires tagging tests
using modules. The test runner uses this information to determine the
compiler flags needed to build and use the module.

Currently modules are build per test, which allows testing them for
tests with ADDITIONAL_COMPILE_FLAGS. At the moment only 4 tests use
modules. Therefore the performance penalty is not measurable. If in the
future more tests use modules it would be good to measure the overhead
and determine whether it's acceptable.
As suggested in #71438 we should use
  export import std;
in the std.compat module.

Testing this locally failed when building with the clang-tidy-17 plugin.
The std module was considered corrupt in the test
  libcxx/test/libcxx/module_std_compat.gen.py
however the test
  libcxx/test/libcxx/module_std.gen.py
passed. Both test generated identical std.pcm files. Using the
clang-tidy-18 plugin solves the issue.
Clang-tidy 18 no longer has false positives with the spaceship operator.
Note that I'm quite sure there are more occurrences in our headers that
are not caught.
As suggested in #71438 we should use
  export import std;
in the std.compat module.

Using this exports some named declarations from functions and records,
adding them to the global namespace. Clang correctly does not export
these and it's an issue in the declaration filtering. Declarations in
function or record context are not considered a global named
declaration.
This commit simplifies ASan helper functions in `std::vector` by
removing arguments which can be calculated later.

Short term it improves readability of helper functions in `std::vector`.

Long term it aims to help with a bigger refactor of container
annotations.
… reference (#77886)

Printing the raw symbol is useful in inline asm (e.g. getting the C++
mangled name, referencing a symbol in a custom way while ensuring it is
not optimized out even if internal). Similar constraints are available
in other targets (e.g. "S" for aarch64/riscv, "Cs" for m68k).

```
namespace ns { extern int var, a[4]; }
void foo() {
  asm(".pushsection .xxx,\"aw\"; .dc.a %p0; .popsection" :: "Ws"(&ns::var));
  asm(".reloc ., BFD_RELOC_NONE, %p0" :: "Ws"(&ns::a[3]));
}
```

Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105576
… NFC

This illustrates #78352. There are other tests for gathers/scatters and masked
load/stores which likely have the same issue, but I've left them out for now
since they don't seem to be affected by fixing RISCVTTIImpl::getMemoryOpCost.
In https://reviews.llvm.org/D151292 I added the ability to track address
masks separately for high and low memory addresses, a capability of
AArch64. I did my testing with manual address mask settings (via
target.process.highmem-virtual-addressable-bits) but didn't have a real
corefile that included this metadata and required it.

My intention is that when the high address mask isn't specified, by the
user (via the setting) or the Process plugin, we fall back to using the
low address mask. The low and high address mask is the same for almost
all environments.

But the patch I wrote never uses the Process plugin high address mask if
it was set, e.g. from corefile metadata. This patch corrects that.

I also have an old patch in Phabractor that was approved to add
FixAddress methods to SBProcess; I need to pick that patch up and finish
it (I wanted to add an enum to specify which mask is being requested
iirc), so I can do address masks tests in API tests.

rdar://120926000
…R<->FPR moves

The fmv can be removed through appropriate logic in
RISCVInstrInfo::foldMemoryOperandImpl.
…… (#74125)

…tructions into account

Hint instructions like G_ASSERT_ZEXT cann be viewed as a copy. Including
this fact into the combiner allows the match more patterns involving
such instructions.
BasicAA currently says that any Constant cannot alias an identified
local object. This is not correct if the local object escaped, as it's
possible to create a pointer to the escaped object using an inttoptr
constant expression base.

To compensate for this, make sure that inttoptr constant expressions are
treated as escape sources, just like inttoptr instructions. This ensures
that the optimization can still be applied if the local object is
non-escaping. This is sufficient to still optimize the original
motivating case from c53e2ec.

Fixes llvm/llvm-project#76789.
Add a debug counter for DAGCombine. This can help with bisecting which
DAG combine introduced a miscompile.
Add a debug counter that allows forcing an sdag fallback after a certain
number of functions.

The intended use-case is to bisect which function gets miscompiled by
global isel using `-debug-counter=globalisel-count=N` (in cases where
sdag doesn't also miscompile it, of course).

The "falling back" debug line is printed unconditionally, because using
`-debug-only` is usually too spammy for the intended purpose.
We currently require that all referenced globals have an explicit
declaration or definition in the IR. For intrinsics, this requirement is
redundant, because they cannot be called indirectly (including "direct"
calls with mismatched function type). The function type used in the call
directly determines the function type of the intrinsic declaration.

Relax this requirement, and implicitly declare any intrinsics that do
not have an explicit declaration. This will remove a common annoyance
when writing tests and alive2 proofs.

(I also plan to introduce a mode where declarations for all missing
symbols will be automatically added, to make working with incomplete IR
easier -- but that will be behind a default-disabled flag.)
  CONFLICT (content): Merge conflict in clang/lib/Driver/ToolChains/Clang.cpp
Remove Python 2 support and clean up code that conditions based on
version.

Issue #76664.
This enables the libclang python binding test to check
the oldest version of Python supported in addition
to the normal python version.

It is important to check this for issue #76664, since
many new mainstream python type annotation features
and best practices are not compatible with older
versions of python.

Additionally, frustration around ever increasing
platform dependencies and versions has been raised.
This will help ensure that python maintains reasonable
backwards compatibility.

Adding this additional build step will increase the
run time, but this should always be minimal, since
the additional libclang compilation should see 100%
cache hit rate.

Issue #76664.
Fixes #76601.
The wording "fails silently" has been sometimes used to indicate that a
silenceable failure was emitted by the operation. The meaning is exactly
the opposite: silenceable failure is _not_ silent unless silenced.
…#78409)

This reverts commit 020ea3e.

This seems to break `test/libcxx/clang_tidy.gen.py/regex.sh.cpp` (in
C++03 mode for some reason):

```
2024-01-17T07:32:22.1759374Z # RUN: at line 12
2024-01-17T07:32:22.1773919Z clang-tidy-18 /home/runner/_work/llvm-project/llvm-project/build/generic-cxx03/test/libcxx/clang_tidy.gen.py/regex.sh.cpp --warnings-as-errors=* -header-filter=.* --config-file=/home/runner/_work/llvm-project/llvm-project/libcxx/.clang-tidy -- -Wweak-vtables -nostdinc++ -I /home/runner/_work/llvm-project/llvm-project/build/generic-cxx03/include/c++/v1 -I /home/runner/_work/llvm-project/llvm-project/build/generic-cxx03/include/c++/v1 -I /home/runner/_work/llvm-project/llvm-project/libcxx/test/support -std=c++03 -Werror -Wall -Wctad-maybe-unsupported -Wextra -Wshadow -Wundef -Wunused-template -Wno-unused-command-line-argument -Wno-attributes -Wno-pessimizing-move -Wno-noexcept-type -Wno-atomic-alignment -Wno-reserved-module-identifier -Wdeprecated-copy -Wdeprecated-copy-dtor -Wno-user-defined-literals -Wno-tautological-compare -Wsign-compare -Wunused-variable -Wunused-parameter -Wunreachable-code -Wno-unused-local-typedef -Wno-local-type-template-args -Wno-c++11-extensions -Wno-unknown-pragmas -Wno-pass-failed -Wno-mismatched-new-delete -Wno-redundant-move -Wno-self-move -D_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER -D_LIBCPP_ENABLE_EXPERIMENTAL -D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_NONE -Werror=thread-safety -Wuser-defined-warnings  -fno-modules
2024-01-17T07:32:22.1803227Z # executed command: clang-tidy-18 /home/runner/_work/llvm-project/llvm-project/build/generic-cxx03/test/libcxx/clang_tidy.gen.py/regex.sh.cpp '--warnings-as-errors=*' '-header-filter=.*' --config-file=/home/runner/_work/llvm-project/llvm-project/libcxx/.clang-tidy -- -Wweak-vtables -nostdinc++ -I /home/runner/_work/llvm-project/llvm-project/build/generic-cxx03/include/c++/v1 -I /home/runner/_work/llvm-project/llvm-project/build/generic-cxx03/include/c++/v1 -I /home/runner/_work/llvm-project/llvm-project/libcxx/test/support -std=c++03 -Werror -Wall -Wctad-maybe-unsupported -Wextra -Wshadow -Wundef -Wunused-template -Wno-unused-command-line-argument -Wno-attributes -Wno-pessimizing-move -Wno-noexcept-type -Wno-atomic-alignment -Wno-reserved-module-identifier -Wdeprecated-copy -Wdeprecated-copy-dtor -Wno-user-defined-literals -Wno-tautological-compare -Wsign-compare -Wunused-variable -Wunused-parameter -Wunreachable-code -Wno-unused-local-typedef -Wno-local-type-template-args -Wno-c++11-extensions -Wno-unknown-pragmas -Wno-pass-failed -Wno-mismatched-new-delete -Wno-redundant-move -Wno-self-move -D_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER -D_LIBCPP_ENABLE_EXPERIMENTAL -D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_NONE -Werror=thread-safety -Wuser-defined-warnings -fno-modules
2024-01-17T07:32:22.1817757Z # .---command stdout------------
2024-01-17T07:32:22.1820160Z # | /home/runner/_work/llvm-project/llvm-project/build/generic-cxx03/include/c++/v1/__iterator/iterator_traits.h:124:50: error: use nullptr [modernize-use-nullptr,-warnings-as-errors]
2024-01-17T07:32:22.1822498Z # |   124 |   static const bool value = decltype(__test<_Tp>(0, 0, 0, 0, 0))::value;
2024-01-17T07:32:22.1823337Z # |       |                                                  ^
2024-01-17T07:32:22.1824052Z # |       |                                                  nullptr
2024-01-17T07:32:22.1826651Z # | /home/runner/_work/llvm-project/llvm-project/build/generic-cxx03/include/c++/v1/__iterator/iterator_traits.h:124:53: error: use nullptr [modernize-use-nullptr,-warnings-as-errors]
2024-01-17T07:32:22.1829325Z # |   124 |   static const bool value = decltype(__test<_Tp>(0, 0, 0, 0, 0))::value;
2024-01-17T07:32:22.1831137Z # |       |                                                     ^
2024-01-17T07:32:22.1831963Z # |       |                                                     nullptr
2024-01-17T07:32:22.1834149Z # | /home/runner/_work/llvm-project/llvm-project/build/generic-cxx03/include/c++/v1/__iterator/iterator_traits.h:124:56: error: use nullptr [modernize-use-nullptr,-warnings-as-errors]
2024-01-17T07:32:22.1848711Z # |   124 |   static const bool value = decltype(__test<_Tp>(0, 0, 0, 0, 0))::value;
2024-01-17T07:32:22.1849506Z # |       |                                                        ^
2024-01-17T07:32:22.1849997Z # |       |                                                        nullptr
2024-01-17T07:32:22.1851528Z # | /home/runner/_work/llvm-project/llvm-project/build/generic-cxx03/include/c++/v1/__iterator/iterator_traits.h:124:59: error: use nullptr [modernize-use-nullptr,-warnings-as-errors]
2024-01-17T07:32:22.1853391Z # |   124 |   static const bool value = decltype(__test<_Tp>(0, 0, 0, 0, 0))::value;
2024-01-17T07:32:22.1854277Z # |       |                                                           ^
2024-01-17T07:32:22.1854841Z # |       |                                                           nullptr
2024-01-17T07:32:22.1856314Z # | /home/runner/_work/llvm-project/llvm-project/build/generic-cxx03/include/c++/v1/__iterator/iterator_traits.h:124:62: error: use nullptr [modernize-use-nullptr,-warnings-as-errors]
2024-01-17T07:32:22.1857768Z # |   124 |   static const bool value = decltype(__test<_Tp>(0, 0, 0, 0, 0))::value;
2024-01-17T07:32:22.1858493Z # |       |                                                              ^
2024-01-17T07:32:22.1858998Z # |       |                                                              nullptr
2024-01-17T07:32:22.1859503Z # `-----------------------------
2024-01-17T07:32:22.1859900Z # .---command stderr------------
2024-01-17T07:32:22.1860259Z # | 43 warnings generated.
2024-01-17T07:32:22.1860710Z # | Suppressed 33 warnings (33 in non-user code).
2024-01-17T07:32:22.1861809Z # | Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.
2024-01-17T07:32:22.1862750Z # | 5 warnings treated as errors
2024-01-17T07:32:22.1863229Z # `-----------------------------
2024-01-17T07:32:22.1863622Z # error: command failed with exit status: 1
```

(see
https://github.com/llvm/llvm-project/actions/runs/7552394884/job/20561192096)
…#76641)

Fixes llvm/llvm-project#76549

The cause of the optimization miss was -
1. `optimizePow` converting almost integer FP exponents to integer, and
turning `pow` to `powi`.
2. `optimizeLog` not accepting `Intrinsic::powi` as a target.

This patch converts constantInt back to constantFP where applicable and
adds a test.
This fixes #71892 allowing us to check magled names in the IR verifier.
…. (#71556)

Substitute with zero-extended to i64 ballot.i32 intrinsic.
@EwanC
Copy link
Collaborator

EwanC commented Jan 19, 2024

  1. I think we need to change how the dynamic_parameter class is templated to as suggested by Greg in [SYCL][Graph] Add API for updating single node parameters #340 (comment)
template<typename T>
class dynamic_parameter {
 public:
  dynamic_parameter(const T& param, command_graph<graph_state::modifiable> graph, std::string label = "");
  register(const handler &cgh);
  update(const T& newval, command_graph<graph_state::executable> execGraph);
};
  1. I think we need to remove wording about changing concurrent execution management and revisit this later. We haven't fully thought this through and the implementation change will drag out the time to implement this spec change
If `graph` is submitted multiple times and
`property::graph::updatable_graph` is not set, dependencies are automatically
added by the runtime to prevent concurrent executions of an identical graph.

sys-ce-bb and others added 7 commits January 19, 2024 07:06
This PR preserves unsigned return type of image read and unsigned texel
type of image write builtins as ZeroExtend image operand in SPIRV.

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@0aab124
This functionality was added in SPIR-V 1.2 and allows using an <id> to
set the execution modes SubgroupsPerWorkgroupId, LocalSizeId, and
LocalSizeHintI, and others.

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@10b0aab
@Bensuo Bensuo changed the base branch from ben/explicit-update to sycl January 22, 2024 18:13
Comment on lines 2087 to 2088
Having a more unified approach to updating nodes/graphs may provide a more
concise and easy to use API.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Been thinking about this since we chatted yesterday, and I'm not mad about going forward with an API that we think we could do better with and will change very soon after delivering. If it'll only take a day or two to modify this so that nd_range is updated on the node, and nodes are passed to a graph to update, then it might be worth doing.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new commit looks good, think we can remove this issue item

mfrancepillois and others added 3 commits January 23, 2024 13:15
…ntel#12433)

When a host-task is submitted to in-order queue, dependency between this
host-task and the successor is explicitly handled.
However, when we record an in-order queue, the recorded CG are not part
of the regular in-order queue execution sequence.
But inter-CG dependencies are managed by the graph implementation.
This PR implements this point and ensures that recording an in-order
does not impact the normal execution sequence.
Tests (e2e and unitest) have been added to check it.

---------

Co-authored-by: Marcos Maronas <[email protected]>
The method was deprecated in favor of `StringRef::starts_with` in
llvm/llvm-project#75491
pre-commit Pr for
oneapi-src/unified-runtime#1135

---------

Signed-off-by: Spruit, Neil R <[email protected]>
Co-authored-by: Kenneth Benzie (Benie) <[email protected]>
@Bensuo
Copy link
Collaborator

Bensuo commented Jan 23, 2024

I've pushed a commit wihch proposes some API changes to (IMO) improve the usage of the API, and make things a bit clearer in what's happening.

The main change is the ND-ranges are updated on the Node itself, and that updates to the executable graph are triggered by explicitly passing updates nodes to the graph. This to me serves two purposes:

  • Unifies how Node param and ND-range updates are applied to the executable graph
  • Makes explicit for the user where the update actually occurs. Previously dynamic_parameter had a lot of responsibility, both updating nodes and also applying those updates to the executable graph. IMO this is clearer and easier to use.

Copy link
Collaborator

@EwanC EwanC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments are minor nits, LGTM

Comment on lines 2087 to 2088
Having a more unified approach to updating nodes/graphs may provide a more
concise and easy to use API.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new commit looks good, think we can remove this issue item

fineg74 and others added 8 commits January 23, 2024 11:55
…erations with simd vectors (intel#12089)" (intel#12462)

This reverts commit 8c92df9 to address
test failures
…ntel#12470)

Those test cases fail even with the most recent GPU drivers, it seems
the issue with such N was fixed for slm_block_load(), but not for
slm_block_store().

Signed-off-by: Klochkov, Vyacheslav N <[email protected]>
…l name (intel#12393)

Implement a custom string class (backed by a `DynArray<char>`, hence
owning its memory) to represent kernel names on the fusion interface.

*This PR is part of a series of changes to remove uses of STL classes in
the kernel fusion interface to prevent ABI issues in the future.*

Signed-off-by: Julian Oppermann <[email protected]>
…intel#12445)

Untangle the interaction between options and the `Config` class, and
hide `Config` (which needs to store options in a map) from the KF
interface.

The idea is to introduce free functions to interact with the
`thread_local` instance of `Config` held by the existing `ConfigHelper`
class, instead of letting the client construct the `Config` object and
handing it to over `ConfigHelper` for storing it.

_This PR is part of a series of changes to remove uses of STL classes in
the kernel fusion interface to prevent ABI issues in the future._

Signed-off-by: Julian Oppermann <[email protected]>
Dynamic parameters are arguments to a node's command-group which can be updated
by the user after the node has been added to a graph. Updating the value of a
dynamic parameter will be reflected in the modifiable graph which contains this
node. Executable graphs can then be updated with these update nodes
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
node. Executable graphs can then be updated with these update nodes
node.

Sentence doesn't make sense to me. Maybe we can be more specific or just remove it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried to rework this sentence to make it read better. I think it's important to keep something here that explains how the update makes it's way to the executable graph in the end though.

bb-sycl and others added 3 commits January 24, 2024 08:15
Scheduled drivers uplift

Co-authored-by: GitHub Actions <[email protected]>
It's failing in postcommit, we need this to fix the CI.

Signed-off-by: Sarnie, Nick <[email protected]>
- Adds an API to the spec for updating graph nodes using explicit indices
- Also includes functionality for updating ND-range of kernel nodes
- Note: Current design is only for kernel execution nodes
@Bensuo Bensuo force-pushed the pablo/explicit-update branch from dff51db to d7f5d51 Compare January 24, 2024 18:30
@Bensuo
Copy link
Collaborator

Bensuo commented Jan 24, 2024

Closing in favour of upstream PR: intel#12486

@Bensuo Bensuo closed this Jan 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.