Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use pkgimages for coverage & malloc tracking by ignoring native code in tracked packages #52123

Merged

Conversation

IanButterworth
Copy link
Member

@IanButterworth IanButterworth commented Nov 11, 2023

Fixes #48071
Fixes #51412

Alternative to #48183

This enables and defaults to using pkgimage caches for code-coverage & malloc tracking by ignoring the native code in the pkgimage, so that code must be JIT compiled, at which point the code tracking mechanism will be compiled into the native code.

This is controlled specifically at the package level, rather than globally, meaning that if the --code-coverage=@path approach is used (which is the default for Pkg.test), pkgimage cached native code will be used for all packages outside of path, meaning faster stdlibs in CI.

Benefits of this approach

  • We don't need to ship --pkgimage=no stdlib caches (faster julia builds).
  • Ecosystem CI will be running in a closer to default state with pkgimages being generated

Based on a suggestion by @vchuravy

Waiting for:

src/init.c Outdated Show resolved Hide resolved
Comment on lines +8453 to +8465
// If a pkgimage or sysimage is being generated, disable tracking.
// This means sysimage build or pkgimage precompilation workloads aren't tracked.
auto do_coverage = [&] (bool in_user_code, bool is_tracked) {
return (coverage_mode == JL_LOG_ALL ||
return (jl_generating_output() == 0 &&
(coverage_mode == JL_LOG_ALL ||
(in_user_code && coverage_mode == JL_LOG_USER) ||
(is_tracked && coverage_mode == JL_LOG_PATH));
(is_tracked && coverage_mode == JL_LOG_PATH)));
};
auto do_malloc_log = [&] (bool in_user_code, bool is_tracked) {
return (malloc_log_mode == JL_LOG_ALL ||
return (jl_generating_output() == 0 &&
(malloc_log_mode == JL_LOG_ALL ||
(in_user_code && malloc_log_mode == JL_LOG_USER) ||
(is_tracked && malloc_log_mode == JL_LOG_PATH));
(is_tracked && malloc_log_mode == JL_LOG_PATH)));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to avoid baking tracking-enabled native code into pkgimages, which is likely undesirable (though arguably reasonable in CI environments)

Copy link
Member Author

@IanButterworth IanButterworth Nov 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps it could be allowed if --check-bounds=yes is also forced, as it is by Pkg.test, given that is unlikely in non-test sessions.

And in that case we wouldn't need to zero out the fptrs from the pkgimage.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe @Keno has an idea on how to handle --check-bounds=yes.

IIUC the problem is that --check-bounds doesn't just change the LLVM code, but actually the behaviour of inference and co. See #50239 (comment)

So right now I don't see away around us needing at least those two copies.

base/loading.jl Outdated Show resolved Hide resolved
Comment on lines -246 to -247
else
# If pkgimage is set, malloc_log and code_coverage should not
@assert opts.malloc_log == 0 && opts.code_coverage == 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should check that malloc_log/code_coverage don't reach the generating-ouput process?

@vchuravy vchuravy added this to the 1.11 milestone Nov 11, 2023
@vchuravy
Copy link
Member

We may consider backporting this to 1.10.1

@IanButterworth

This comment was marked as resolved.

@giordano

This comment was marked as resolved.

@IanButterworth IanButterworth merged commit b1c8e12 into JuliaLang:master Nov 16, 2023
2 checks passed
@IanButterworth IanButterworth deleted the ib/ignore_pkgimage_code branch November 16, 2023 19:52
@vtjnash
Copy link
Member

vtjnash commented Feb 22, 2024

Oh, I think I missed what this PR was doing. This needs be reverted as it is entirely illegal to zero this memory here and it cannot be permitted to do so without some difficult changes to codegen:

julia/src/staticdata.c

Lines 3771 to 3773 in a96726b

if (ignore_native){
memset(&pkgimage.fptrs, 0, sizeof(pkgimage.fptrs));
}

It is causing the following test to fail an assert, which is why CI has been red for awhile:

$ ./julia -C native -J./usr/lib/julia/sys.so --code-coverage=user  --code-coverage=temp-tracefile.info --startup-file=no   --eval 'using Test'
julia: /home/vtjnash/julia/src/staticdata.c:2266: jl_root_new_gvars: Assertion `codeinst && (jl_atomic_load_relaxed(&codeinst->specsigflags) & 0b01) && jl_atomic_load_relaxed(&codeinst->specptr.fptr)' failed.

Where the offending CodeInstance was created by StyledStrings and later attempted to be accessed by Test.

@vchuravy
Copy link
Member

Isn't this very similar to 02699bb ?

But I agree we need to "taint" any package that depends on this package and also ignore their native code, but that seems complicated (also what happens if we have a code instance duplication and one of them is coming from a pkgimg with disabled native code)

Right now pkgimage assumes that the fptr inside the codeinstance is valid and we can load and jump to it. We would need to maybe change calls across images to something like #52797 so that we can support calling a CI with deleted fptrs.

Oof.

vtjnash added a commit that referenced this pull request Feb 23, 2024
This should fix the assertion failure that has been plaguing the Pkg
tests, as discussed in #52123 (comment)
vchuravy added a commit that referenced this pull request Feb 24, 2024
When we use options like code coverage, we can't use the native code
present in the cache file since it is not instrumented.

PR #52123 introduced the capability of skipping the native
code during loading, but created the issue that subsequent packages
could have an explicit or implicit dependency on the native code.

PR #53439 tainted the current process by setting
`use_sysimage_native_code`, but this flag is propagated to subprocesses
and lead to a regression in test time.

Move this to a process local flag to avoid the regression.
In the future we might be able to change the calling convention for
cross-image calls to `invoke(ci::CodeInstance, args...)` instead of
`ci.fptr(args...)` to handle native code not being present.
vchuravy added a commit that referenced this pull request Feb 25, 2024
When we use options like code coverage, we can't use the native code
present in the cache file since it is not instrumented.

PR #52123 introduced the capability of skipping the native
code during loading, but created the issue that subsequent packages
could have an explicit or implicit dependency on the native code.

PR #53439 tainted the current process by setting
`use_sysimage_native_code`, but this flag is propagated to subprocesses
and lead to a regression in test time.

Move this to a process local flag to avoid the regression.
In the future we might be able to change the calling convention for
cross-image calls to `invoke(ci::CodeInstance, args...)` instead of
`ci.fptr(args...)` to handle native code not being present.

---------

Co-authored-by: Jameson Nash <[email protected]>
KristofferC pushed a commit that referenced this pull request Feb 26, 2024
This should fix the assertion failure that has been plaguing the Pkg
tests, as discussed in #52123 (comment)

(cherry picked from commit 6cbed31)
KristofferC pushed a commit that referenced this pull request Feb 26, 2024
This should fix the assertion failure that has been plaguing the Pkg
tests, as discussed in #52123 (comment)

(cherry picked from commit 6cbed31)
KristofferC pushed a commit that referenced this pull request Feb 26, 2024
When we use options like code coverage, we can't use the native code
present in the cache file since it is not instrumented.

PR #52123 introduced the capability of skipping the native
code during loading, but created the issue that subsequent packages
could have an explicit or implicit dependency on the native code.

PR #53439 tainted the current process by setting
`use_sysimage_native_code`, but this flag is propagated to subprocesses
and lead to a regression in test time.

Move this to a process local flag to avoid the regression.
In the future we might be able to change the calling convention for
cross-image calls to `invoke(ci::CodeInstance, args...)` instead of
`ci.fptr(args...)` to handle native code not being present.

---------

Co-authored-by: Jameson Nash <[email protected]>
(cherry picked from commit b8a0a39)
tecosaur pushed a commit to tecosaur/julia that referenced this pull request Mar 4, 2024
…#53439)

This should fix the assertion failure that has been plaguing the Pkg
tests, as discussed in JuliaLang#52123 (comment)
tecosaur pushed a commit to tecosaur/julia that referenced this pull request Mar 4, 2024
When we use options like code coverage, we can't use the native code
present in the cache file since it is not instrumented.

PR JuliaLang#52123 introduced the capability of skipping the native
code during loading, but created the issue that subsequent packages
could have an explicit or implicit dependency on the native code.

PR JuliaLang#53439 tainted the current process by setting
`use_sysimage_native_code`, but this flag is propagated to subprocesses
and lead to a regression in test time.

Move this to a process local flag to avoid the regression.
In the future we might be able to change the calling convention for
cross-image calls to `invoke(ci::CodeInstance, args...)` instead of
`ci.fptr(args...)` to handle native code not being present.

---------

Co-authored-by: Jameson Nash <[email protected]>
mkitti pushed a commit to mkitti/julia that referenced this pull request Mar 7, 2024
…#53439)

This should fix the assertion failure that has been plaguing the Pkg
tests, as discussed in JuliaLang#52123 (comment)
mkitti pushed a commit to mkitti/julia that referenced this pull request Mar 7, 2024
When we use options like code coverage, we can't use the native code
present in the cache file since it is not instrumented.

PR JuliaLang#52123 introduced the capability of skipping the native
code during loading, but created the issue that subsequent packages
could have an explicit or implicit dependency on the native code.

PR JuliaLang#53439 tainted the current process by setting
`use_sysimage_native_code`, but this flag is propagated to subprocesses
and lead to a regression in test time.

Move this to a process local flag to avoid the regression.
In the future we might be able to change the calling convention for
cross-image calls to `invoke(ci::CodeInstance, args...)` instead of
`ci.fptr(args...)` to handle native code not being present.

---------

Co-authored-by: Jameson Nash <[email protected]>
fatteneder added a commit to fatteneder/PkgCacheInspector.jl that referenced this pull request May 12, 2024
- JuliaLang/julia#52123: Added the
  `int ignore_native` argument to jl_restore_package_image_from_file
- JuliaLang/julia#49538: Added the
  `const char *pkgname` argument to jl_restore_package_image_from_file
  and jl_restore_incremental.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants