-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL][Graph] Explicit update with index only #350
Conversation
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
…attribute (#77426) This patch reworks RISCVTargetInfo::initFeatureMap to fix the issue described in llvm/llvm-project#74889 (review) (and is an alternative to #75804) When a full arch string is specified, a "full" list of extensions is now passed after the __RISCV_TargetAttrNeedOverride marker feature, which includes any negative features that disable ISA extensions. In initFeatureMap, there are now two code paths: 1. If the arch string was overriden, use the "full" list of override features, only adding back any non-isa features that were specified. Using the full list of positive and negative features will mean that the target-cpu will have no effect on the final arch, e.g. __attribute__((target("arch=rv64i"))) with -mcpu=sifive-x280 will have the features for rv64i, not a mix of both. 2. Otherwise, parse and *append* the list of implied features. By appending, we turn back on any features that might have been disabled by a negative extension, i.e. this handles the case fixed in #74889.
## Why The documentation is written relatively to `clang-llvm`, not the root repository directory. However, some steps in the documentation are relative to the repository root, which is not correct. ## What Documentation steps have been modified to make them correct and outdated ones were updated. Some details: * Correct paths in documentation * Change `bootstrap.py` -> `configure.py` since bootstraping Ninja has [slightly changed](https://github.com/ninja-build/ninja/tree/master?tab=readme-ov-file#python)
This removes the entire modules testing infrastructure. The current infrastructure uses CMake to generate the std and std.compat module. This requires quite a bit of plumbing and uses CMake. Since CMake introduced module support in CMake 3.26, modules have a higher CMake requirement than the rest of the LLVM project. (The LLVM project requires 3.20.) The main motivation for this approach was how libc++ generated its modules. Every header had its own module partition. This was changed to improve performance and now only two modules remain. The code to build these can be manually crafted. A followup patch will reenable testing modules, using a different approach.
This adds a new module test infrastructure. This requires tagging tests using modules. The test runner uses this information to determine the compiler flags needed to build and use the module. Currently modules are build per test, which allows testing them for tests with ADDITIONAL_COMPILE_FLAGS. At the moment only 4 tests use modules. Therefore the performance penalty is not measurable. If in the future more tests use modules it would be good to measure the overhead and determine whether it's acceptable.
As suggested in #71438 we should use export import std; in the std.compat module. Testing this locally failed when building with the clang-tidy-17 plugin. The std module was considered corrupt in the test libcxx/test/libcxx/module_std_compat.gen.py however the test libcxx/test/libcxx/module_std.gen.py passed. Both test generated identical std.pcm files. Using the clang-tidy-18 plugin solves the issue.
Clang-tidy 18 no longer has false positives with the spaceship operator. Note that I'm quite sure there are more occurrences in our headers that are not caught.
As suggested in #71438 we should use export import std; in the std.compat module. Using this exports some named declarations from functions and records, adding them to the global namespace. Clang correctly does not export these and it's an issue in the declaration filtering. Declarations in function or record context are not considered a global named declaration.
This commit simplifies ASan helper functions in `std::vector` by removing arguments which can be calculated later. Short term it improves readability of helper functions in `std::vector`. Long term it aims to help with a bigger refactor of container annotations.
… reference (#77886) Printing the raw symbol is useful in inline asm (e.g. getting the C++ mangled name, referencing a symbol in a custom way while ensuring it is not optimized out even if internal). Similar constraints are available in other targets (e.g. "S" for aarch64/riscv, "Cs" for m68k). ``` namespace ns { extern int var, a[4]; } void foo() { asm(".pushsection .xxx,\"aw\"; .dc.a %p0; .popsection" :: "Ws"(&ns::var)); asm(".reloc ., BFD_RELOC_NONE, %p0" :: "Ws"(&ns::a[3])); } ``` Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105576
… NFC This illustrates #78352. There are other tests for gathers/scatters and masked load/stores which likely have the same issue, but I've left them out for now since they don't seem to be affected by fixing RISCVTTIImpl::getMemoryOpCost.
In https://reviews.llvm.org/D151292 I added the ability to track address masks separately for high and low memory addresses, a capability of AArch64. I did my testing with manual address mask settings (via target.process.highmem-virtual-addressable-bits) but didn't have a real corefile that included this metadata and required it. My intention is that when the high address mask isn't specified, by the user (via the setting) or the Process plugin, we fall back to using the low address mask. The low and high address mask is the same for almost all environments. But the patch I wrote never uses the Process plugin high address mask if it was set, e.g. from corefile metadata. This patch corrects that. I also have an old patch in Phabractor that was approved to add FixAddress methods to SBProcess; I need to pick that patch up and finish it (I wanted to add an enum to specify which mask is being requested iirc), so I can do address masks tests in API tests. rdar://120926000
…R<->FPR moves The fmv can be removed through appropriate logic in RISCVInstrInfo::foldMemoryOperandImpl.
…… (#74125) …tructions into account Hint instructions like G_ASSERT_ZEXT cann be viewed as a copy. Including this fact into the combiner allows the match more patterns involving such instructions.
BasicAA currently says that any Constant cannot alias an identified local object. This is not correct if the local object escaped, as it's possible to create a pointer to the escaped object using an inttoptr constant expression base. To compensate for this, make sure that inttoptr constant expressions are treated as escape sources, just like inttoptr instructions. This ensures that the optimization can still be applied if the local object is non-escaping. This is sufficient to still optimize the original motivating case from c53e2ec. Fixes llvm/llvm-project#76789.
Add a debug counter for DAGCombine. This can help with bisecting which DAG combine introduced a miscompile.
Add a debug counter that allows forcing an sdag fallback after a certain number of functions. The intended use-case is to bisect which function gets miscompiled by global isel using `-debug-counter=globalisel-count=N` (in cases where sdag doesn't also miscompile it, of course). The "falling back" debug line is printed unconditionally, because using `-debug-only` is usually too spammy for the intended purpose.
We currently require that all referenced globals have an explicit declaration or definition in the IR. For intrinsics, this requirement is redundant, because they cannot be called indirectly (including "direct" calls with mismatched function type). The function type used in the call directly determines the function type of the intrinsic declaration. Relax this requirement, and implicitly declare any intrinsics that do not have an explicit declaration. This will remove a common annoyance when writing tests and alive2 proofs. (I also plan to introduce a mode where declarations for all missing symbols will be automatically added, to make working with incomplete IR easier -- but that will be behind a default-disabled flag.)
CONFLICT (content): Merge conflict in clang/lib/Driver/ToolChains/Clang.cpp
Remove Python 2 support and clean up code that conditions based on version. Issue #76664.
This enables the libclang python binding test to check the oldest version of Python supported in addition to the normal python version. It is important to check this for issue #76664, since many new mainstream python type annotation features and best practices are not compatible with older versions of python. Additionally, frustration around ever increasing platform dependencies and versions has been raised. This will help ensure that python maintains reasonable backwards compatibility. Adding this additional build step will increase the run time, but this should always be minimal, since the additional libclang compilation should see 100% cache hit rate. Issue #76664. Fixes #76601.
The wording "fails silently" has been sometimes used to indicate that a silenceable failure was emitted by the operation. The meaning is exactly the opposite: silenceable failure is _not_ silent unless silenced.
…#78409) This reverts commit 020ea3e. This seems to break `test/libcxx/clang_tidy.gen.py/regex.sh.cpp` (in C++03 mode for some reason): ``` 2024-01-17T07:32:22.1759374Z # RUN: at line 12 2024-01-17T07:32:22.1773919Z clang-tidy-18 /home/runner/_work/llvm-project/llvm-project/build/generic-cxx03/test/libcxx/clang_tidy.gen.py/regex.sh.cpp --warnings-as-errors=* -header-filter=.* --config-file=/home/runner/_work/llvm-project/llvm-project/libcxx/.clang-tidy -- -Wweak-vtables -nostdinc++ -I /home/runner/_work/llvm-project/llvm-project/build/generic-cxx03/include/c++/v1 -I /home/runner/_work/llvm-project/llvm-project/build/generic-cxx03/include/c++/v1 -I /home/runner/_work/llvm-project/llvm-project/libcxx/test/support -std=c++03 -Werror -Wall -Wctad-maybe-unsupported -Wextra -Wshadow -Wundef -Wunused-template -Wno-unused-command-line-argument -Wno-attributes -Wno-pessimizing-move -Wno-noexcept-type -Wno-atomic-alignment -Wno-reserved-module-identifier -Wdeprecated-copy -Wdeprecated-copy-dtor -Wno-user-defined-literals -Wno-tautological-compare -Wsign-compare -Wunused-variable -Wunused-parameter -Wunreachable-code -Wno-unused-local-typedef -Wno-local-type-template-args -Wno-c++11-extensions -Wno-unknown-pragmas -Wno-pass-failed -Wno-mismatched-new-delete -Wno-redundant-move -Wno-self-move -D_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER -D_LIBCPP_ENABLE_EXPERIMENTAL -D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_NONE -Werror=thread-safety -Wuser-defined-warnings -fno-modules 2024-01-17T07:32:22.1803227Z # executed command: clang-tidy-18 /home/runner/_work/llvm-project/llvm-project/build/generic-cxx03/test/libcxx/clang_tidy.gen.py/regex.sh.cpp '--warnings-as-errors=*' '-header-filter=.*' --config-file=/home/runner/_work/llvm-project/llvm-project/libcxx/.clang-tidy -- -Wweak-vtables -nostdinc++ -I /home/runner/_work/llvm-project/llvm-project/build/generic-cxx03/include/c++/v1 -I /home/runner/_work/llvm-project/llvm-project/build/generic-cxx03/include/c++/v1 -I /home/runner/_work/llvm-project/llvm-project/libcxx/test/support -std=c++03 -Werror -Wall -Wctad-maybe-unsupported -Wextra -Wshadow -Wundef -Wunused-template -Wno-unused-command-line-argument -Wno-attributes -Wno-pessimizing-move -Wno-noexcept-type -Wno-atomic-alignment -Wno-reserved-module-identifier -Wdeprecated-copy -Wdeprecated-copy-dtor -Wno-user-defined-literals -Wno-tautological-compare -Wsign-compare -Wunused-variable -Wunused-parameter -Wunreachable-code -Wno-unused-local-typedef -Wno-local-type-template-args -Wno-c++11-extensions -Wno-unknown-pragmas -Wno-pass-failed -Wno-mismatched-new-delete -Wno-redundant-move -Wno-self-move -D_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER -D_LIBCPP_ENABLE_EXPERIMENTAL -D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_NONE -Werror=thread-safety -Wuser-defined-warnings -fno-modules 2024-01-17T07:32:22.1817757Z # .---command stdout------------ 2024-01-17T07:32:22.1820160Z # | /home/runner/_work/llvm-project/llvm-project/build/generic-cxx03/include/c++/v1/__iterator/iterator_traits.h:124:50: error: use nullptr [modernize-use-nullptr,-warnings-as-errors] 2024-01-17T07:32:22.1822498Z # | 124 | static const bool value = decltype(__test<_Tp>(0, 0, 0, 0, 0))::value; 2024-01-17T07:32:22.1823337Z # | | ^ 2024-01-17T07:32:22.1824052Z # | | nullptr 2024-01-17T07:32:22.1826651Z # | /home/runner/_work/llvm-project/llvm-project/build/generic-cxx03/include/c++/v1/__iterator/iterator_traits.h:124:53: error: use nullptr [modernize-use-nullptr,-warnings-as-errors] 2024-01-17T07:32:22.1829325Z # | 124 | static const bool value = decltype(__test<_Tp>(0, 0, 0, 0, 0))::value; 2024-01-17T07:32:22.1831137Z # | | ^ 2024-01-17T07:32:22.1831963Z # | | nullptr 2024-01-17T07:32:22.1834149Z # | /home/runner/_work/llvm-project/llvm-project/build/generic-cxx03/include/c++/v1/__iterator/iterator_traits.h:124:56: error: use nullptr [modernize-use-nullptr,-warnings-as-errors] 2024-01-17T07:32:22.1848711Z # | 124 | static const bool value = decltype(__test<_Tp>(0, 0, 0, 0, 0))::value; 2024-01-17T07:32:22.1849506Z # | | ^ 2024-01-17T07:32:22.1849997Z # | | nullptr 2024-01-17T07:32:22.1851528Z # | /home/runner/_work/llvm-project/llvm-project/build/generic-cxx03/include/c++/v1/__iterator/iterator_traits.h:124:59: error: use nullptr [modernize-use-nullptr,-warnings-as-errors] 2024-01-17T07:32:22.1853391Z # | 124 | static const bool value = decltype(__test<_Tp>(0, 0, 0, 0, 0))::value; 2024-01-17T07:32:22.1854277Z # | | ^ 2024-01-17T07:32:22.1854841Z # | | nullptr 2024-01-17T07:32:22.1856314Z # | /home/runner/_work/llvm-project/llvm-project/build/generic-cxx03/include/c++/v1/__iterator/iterator_traits.h:124:62: error: use nullptr [modernize-use-nullptr,-warnings-as-errors] 2024-01-17T07:32:22.1857768Z # | 124 | static const bool value = decltype(__test<_Tp>(0, 0, 0, 0, 0))::value; 2024-01-17T07:32:22.1858493Z # | | ^ 2024-01-17T07:32:22.1858998Z # | | nullptr 2024-01-17T07:32:22.1859503Z # `----------------------------- 2024-01-17T07:32:22.1859900Z # .---command stderr------------ 2024-01-17T07:32:22.1860259Z # | 43 warnings generated. 2024-01-17T07:32:22.1860710Z # | Suppressed 33 warnings (33 in non-user code). 2024-01-17T07:32:22.1861809Z # | Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well. 2024-01-17T07:32:22.1862750Z # | 5 warnings treated as errors 2024-01-17T07:32:22.1863229Z # `----------------------------- 2024-01-17T07:32:22.1863622Z # error: command failed with exit status: 1 ``` (see https://github.com/llvm/llvm-project/actions/runs/7552394884/job/20561192096)
…#76641) Fixes llvm/llvm-project#76549 The cause of the optimization miss was - 1. `optimizePow` converting almost integer FP exponents to integer, and turning `pow` to `powi`. 2. `optimizeLog` not accepting `Intrinsic::powi` as a target. This patch converts constantInt back to constantFP where applicable and adds a test.
This fixes #71892 allowing us to check magled names in the IR verifier.
…. (#71556) Substitute with zero-extended to i64 ballot.i32 intrinsic.
|
This PR preserves unsigned return type of image read and unsigned texel type of image write builtins as ZeroExtend image operand in SPIRV. Original commit: KhronosGroup/SPIRV-LLVM-Translator@0aab124
This functionality was added in SPIR-V 1.2 and allows using an <id> to set the execution modes SubgroupsPerWorkgroupId, LocalSizeId, and LocalSizeHintI, and others. Original commit: KhronosGroup/SPIRV-LLVM-Translator@10b0aab
intel#11840 is resolved by usptreaming the fix in intel#12068 to llvm/llvm-project#74437.
This reverts commit 8c2b0d4.
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
Having a more unified approach to updating nodes/graphs may provide a more | ||
concise and easy to use API. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Been thinking about this since we chatted yesterday, and I'm not mad about going forward with an API that we think we could do better with and will change very soon after delivering. If it'll only take a day or two to modify this so that nd_range
is updated on the node, and nodes are passed to a graph to update, then it might be worth doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new commit looks good, think we can remove this issue item
…ntel#12433) When a host-task is submitted to in-order queue, dependency between this host-task and the successor is explicitly handled. However, when we record an in-order queue, the recorded CG are not part of the regular in-order queue execution sequence. But inter-CG dependencies are managed by the graph implementation. This PR implements this point and ensures that recording an in-order does not impact the normal execution sequence. Tests (e2e and unitest) have been added to check it. --------- Co-authored-by: Marcos Maronas <[email protected]>
The method was deprecated in favor of `StringRef::starts_with` in llvm/llvm-project#75491
pre-commit Pr for oneapi-src/unified-runtime#1135 --------- Signed-off-by: Spruit, Neil R <[email protected]> Co-authored-by: Kenneth Benzie (Benie) <[email protected]>
I've pushed a commit wihch proposes some API changes to (IMO) improve the usage of the API, and make things a bit clearer in what's happening. The main change is the ND-ranges are updated on the Node itself, and that updates to the executable graph are triggered by explicitly passing updates nodes to the graph. This to me serves two purposes:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments are minor nits, LGTM
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc
Outdated
Show resolved
Hide resolved
Having a more unified approach to updating nodes/graphs may provide a more | ||
concise and easy to use API. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new commit looks good, think we can remove this issue item
…erations with simd vectors (intel#12089)" (intel#12462) This reverts commit 8c92df9 to address test failures
LLVM: llvm/llvm-project@110e1717b31e SPIRV-LLVM-Translator: KhronosGroup/SPIRV-LLVM-Translator@10b0aab
…l#12414) Signed-off-by: Vyacheslav N Klochkov <[email protected]>
…ntel#12470) Those test cases fail even with the most recent GPU drivers, it seems the issue with such N was fixed for slm_block_load(), but not for slm_block_store(). Signed-off-by: Klochkov, Vyacheslav N <[email protected]>
…l name (intel#12393) Implement a custom string class (backed by a `DynArray<char>`, hence owning its memory) to represent kernel names on the fusion interface. *This PR is part of a series of changes to remove uses of STL classes in the kernel fusion interface to prevent ABI issues in the future.* Signed-off-by: Julian Oppermann <[email protected]>
CI testing for oneapi-src/unified-runtime#1070 --------- Co-authored-by: Kenneth Benzie (Benie) <[email protected]>
…intel#12445) Untangle the interaction between options and the `Config` class, and hide `Config` (which needs to store options in a map) from the KF interface. The idea is to introduce free functions to interact with the `thread_local` instance of `Config` held by the existing `ConfigHelper` class, instead of letting the client construct the `Config` object and handing it to over `ConfigHelper` for storing it. _This PR is part of a series of changes to remove uses of STL classes in the kernel fusion interface to prevent ABI issues in the future._ Signed-off-by: Julian Oppermann <[email protected]>
Dynamic parameters are arguments to a node's command-group which can be updated | ||
by the user after the node has been added to a graph. Updating the value of a | ||
dynamic parameter will be reflected in the modifiable graph which contains this | ||
node. Executable graphs can then be updated with these update nodes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
node. Executable graphs can then be updated with these update nodes | |
node. |
Sentence doesn't make sense to me. Maybe we can be more specific or just remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tried to rework this sentence to make it read better. I think it's important to keep something here that explains how the update makes it's way to the executable graph in the end though.
Scheduled drivers uplift Co-authored-by: GitHub Actions <[email protected]>
It's failing in postcommit, we need this to fix the CI. Signed-off-by: Sarnie, Nick <[email protected]>
- Adds an API to the spec for updating graph nodes using explicit indices - Also includes functionality for updating ND-range of kernel nodes - Note: Current design is only for kernel execution nodes
dff51db
to
d7f5d51
Compare
Closing in favour of upstream PR: intel#12486 |
This patch removes the match by value functionality,
thus an index is always required to update a kernel parameter.