Prevent tainting native code loading from propagating (#53457)
When we use options like code coverage, we can't use the native code
present in the cache file since it is not instrumented.

PR #52123 introduced the capability of skipping the native
code during loading, but created the issue that subsequent packages
could have an explicit or implicit dependency on the native code.

PR #53439 tainted the current process by setting
`use_sysimage_native_code`, but this flag is propagated to subprocesses
and led to a regression in test time.

Move this to a process-local flag to avoid the regression.
In the future we might be able to change the calling convention for
cross-image calls to `invoke(ci::CodeInstance, args...)` instead of
`ci.fptr(args...)` to handle native code not being present.

---------

Co-authored-by: Jameson Nash <[email protected]>
vchuravy and vtjnash authored Feb 25, 2024
1 parent 714c6d0 commit b8a0a39
Showing 1 changed file with 8 additions and 2 deletions.
10 changes: 8 additions & 2 deletions src/staticdata.c

@@ -3066,6 +3066,11 @@ JL_DLLEXPORT void jl_set_sysimg_so(void *handle)
 extern void rebuild_image_blob_tree(void);
 extern void export_jl_small_typeof(void);
 
+// When an image is loaded with ignore_native, all subsequent image loads must ignore
+// native code in the cache file since we can't guarantee that there are no call edges
+// into the native code of the image. See https://github.com/JuliaLang/julia/pull/52123#issuecomment-1959965395.
+int IMAGE_NATIVE_CODE_TAINTED = 0;
+
 static void jl_restore_system_image_from_stream_(ios_t *f, jl_image_t *image, jl_array_t *depmods, uint64_t checksum,
                                                  /* outputs */ jl_array_t **restored, jl_array_t **init_order,
                                                  jl_array_t **extext_methods, jl_array_t **internal_methods,
@@ -3092,9 +3097,10 @@ static void jl_restore_system_image_from_stream_(ios_t *f, jl_image_t *image, jl
 
     // in --build mode only use sysimg data, not precompiled native code
     int imaging_mode = jl_generating_output() && !jl_options.incremental;
-    if (imaging_mode || jl_options.use_sysimage_native_code != JL_OPTIONS_USE_SYSIMAGE_NATIVE_CODE_YES) {
+    if (imaging_mode || jl_options.use_sysimage_native_code != JL_OPTIONS_USE_SYSIMAGE_NATIVE_CODE_YES || IMAGE_NATIVE_CODE_TAINTED) {
         memset(&image->fptrs, 0, sizeof(image->fptrs));
         image->gvars_base = NULL;
+        IMAGE_NATIVE_CODE_TAINTED = 1;
     }
 
     // step 1: read section map
@@ -3772,7 +3778,7 @@ JL_DLLEXPORT jl_value_t *jl_restore_package_image_from_file(const char *fname, j
         // Must disable using native code in possible downstream users of this code:
         // https://github.com/JuliaLang/julia/pull/52123#issuecomment-1959965395.
         // The easiest way to do that is to disable it in all of them.
-        jl_options.use_sysimage_native_code = JL_OPTIONS_USE_SYSIMAGE_NATIVE_CODE_NO;
+        IMAGE_NATIVE_CODE_TAINTED = 1;
     }
 
     jl_value_t* mod = jl_restore_incremental_from_buf(pkgimg_handle, pkgimg_data, &pkgimage, *plen, depmods, completeinfo, pkgname, 0);

9 comments on commit b8a0a39

@nanosoldier
Collaborator

Executing the daily package evaluation, I will reply here when finished:

@nanosoldier runtests(isdaily = true)

@nanosoldier
Collaborator

The package evaluation job you requested has completed - possible new issues were detected.
The full report is available.

@vtjnash
Member

@nanosoldier runbenchmarks(ALL, isdaily = true)

@nanosoldier
Collaborator

Your job failed.

@vtjnash
Member

@aviatesk can you fix the bug in inference here:

      From worker 3:    ERROR: LoadError: reprocess_instruction!: unhandled expression found                                                                                                                                                      
      From worker 3:    Stacktrace:                                                                                                                                                                                                               
      From worker 3:       [1] error(s::String)                                                                                                                                                                                                   
      From worker 3:         @ Core.Compiler ./error.jl:35                                                                                                                                                                                        
      From worker 3:       [2] reprocess_instruction!(interp::BaseBenchmarks.InferenceBenchmarks.InferenceBenchmarker, inst::Core.Compiler.Instruction, idx::Int64, bb::Int64, irsv::Core.Compiler.IRInterpretationState)                         
      From worker 3:         @ Core.Compiler ./compiler/ssair/irinterp.jl:166                                                                                                                                                                     
      From worker 3:       [3] (::Core.Compiler.var"#565#568"{Nothing, BaseBenchmarks.InferenceBenchmarks.InferenceBenchmarker, Core.Compiler.IRInterpretationState, Core.Compiler.var"#check_ret!#567"{Vector{Int64}}, Core.Compiler.BitSet, Core.Compiler.TwoPhaseDefUseMap, Core.Compiler.IRCode})(inst::Core.Compiler.Instruction, lstmt::Int64, bb::Int64)
      From worker 3:         @ Core.Compiler ./compiler/ssair/irinterp.jl:326
      From worker 3:       [4] scan!(callback::Core.Compiler.var"#565#568"{Nothing, BaseBenchmarks.InferenceBenchmarks.InferenceBenchmarker, Core.Compiler.IRInterpretationState, Core.Compiler.var"#check_ret!#567"{Vector{Int64}}, Core.Compiler.BitSet, Core.Compiler.TwoPhaseDefUseMap, Core.Compiler.IRCode}, scanner::Core.Compiler.BBScanner, forwards_only::Bool)
      From worker 3:         @ Core.Compiler ./compiler/ssair/irinterp.jl:248
      From worker 3:       [5] _ir_abstract_constant_propagation(interp::BaseBenchmarks.InferenceBenchmarks.InferenceBenchmarker, irsv::Core.Compiler.IRInterpretationState; externally_refined::Nothing)
      From worker 3:         @ Core.Compiler ./compiler/ssair/irinterp.jl:294
      From worker 3:       [6] _ir_abstract_constant_propagation(interp::BaseBenchmarks.InferenceBenchmarks.InferenceBenchmarker, irsv::Core.Compiler.IRInterpretationState)
      From worker 3:         @ Core.Compiler ./compiler/ssair/irinterp.jl:280
      From worker 3:       [7] ir_abstract_constant_propagation(interp::BaseBenchmarks.InferenceBenchmarks.InferenceBenchmarker, irsv::Core.Compiler.IRInterpretationState)
      From worker 3:         @ Core.Compiler ./compiler/ssair/irinterp.jl:443
      From worker 3:       [8] semi_concrete_eval_call(interp::BaseBenchmarks.InferenceBenchmarks.InferenceBenchmarker, mi::Core.MethodInstance, result::Core.Compiler.MethodCallResult, arginfo::Core.Compiler.ArgInfo, sv::Core.Compiler.InferenceState)
      From worker 3:         @ Core.Compiler ./compiler/abstractinterpretation.jl:1201
      From worker 3:       [9] abstract_call_method_with_const_args(interp::BaseBenchmarks.InferenceBenchmarks.InferenceBenchmarker, result::Core.Compiler.MethodCallResult, f::Any, arginfo::Core.Compiler.ArgInfo, si::Core.Compiler.StmtInfo, match::Core.MethodMatch, sv::Core.Compiler.InferenceState, invokecall::Nothing)
      From worker 3:         @ Core.Compiler ./compiler/abstractinterpretation.jl:837
      From worker 3:      [10] abstract_call_method_with_const_args(interp::BaseBenchmarks.InferenceBenchmarks.InferenceBenchmarker, result::Core.Compiler.MethodCallResult, f::Any, arginfo::Core.Compiler.ArgInfo, si::Core.Compiler.StmtInfo, match::Core.MethodMatch, sv::Core.Compiler.InferenceState)
      From worker 3:         @ Core.Compiler ./compiler/abstractinterpretation.jl:813
      From worker 3:      [11] abstract_call_gf_by_type(interp::BaseBenchmarks.InferenceBenchmarks.InferenceBenchmarker, f::Any, arginfo::Core.Compiler.ArgInfo, si::Core.Compiler.StmtInfo, atype::Any, sv::Core.Compiler.InferenceState, max_methods::Int64)
...
      From worker 3:     [263] typeinf                                                                                   
      From worker 3:         @ ./compiler/typeinfer.jl:219 [inlined]                                                                                                                                                                              
      From worker 3:     [264] #inf_method_instance!#6                                                                   
      From worker 3:         @ /home/nanosoldier/.julia/dev/BaseBenchmarks/src/inference/InferenceBenchmarks.jl:139 [inlined]
      From worker 3:     [265] inf_method_instance!                                                                      
      From worker 3:         @ /home/nanosoldier/.julia/dev/BaseBenchmarks/src/inference/InferenceBenchmarks.jl:135 [inlined]
      From worker 3:     [266] inf_method_signature!(interp::BaseBenchmarks.InferenceBenchmarks.InferenceBenchmarker, m::Method, atype::Any, sparams::Core.SimpleVector; kwargs::@Kwargs{run_optimizer::Bool})                                    
      From worker 3:         @ BaseBenchmarks.InferenceBenchmarks /home/nanosoldier/.julia/dev/BaseBenchmarks/src/inference/InferenceBenchmarks.jl:132
      From worker 3:     [267] inf_method_signature!                                                                     
      From worker 3:         @ /home/nanosoldier/.julia/dev/BaseBenchmarks/src/inference/InferenceBenchmarks.jl:132 [inlined]                                                                                                                     
      From worker 3:     [268] #inf_gf_by_type!#3                                                                        
      From worker 3:         @ /home/nanosoldier/.julia/dev/BaseBenchmarks/src/inference/InferenceBenchmarks.jl:95 [inlined]                                                                                                                      
      From worker 3:     [269] inf_gf_by_type!                                                                           
      From worker 3:         @ /home/nanosoldier/.julia/dev/BaseBenchmarks/src/inference/InferenceBenchmarks.jl:93 [inlined]                                                                                                                      
      From worker 3:     [270] inf_call(f::Any, types::Any; interp::BaseBenchmarks.InferenceBenchmarks.InferenceBenchmarker, run_optimizer::Bool, is_errorneous::Bool)
      From worker 3:         @ BaseBenchmarks.InferenceBenchmarks /home/nanosoldier/.julia/dev/BaseBenchmarks/src/inference/InferenceBenchmarks.jl:157
      From worker 3:     [271] inf_call                                                                                  
      From worker 3:         @ /home/nanosoldier/.julia/dev/BaseBenchmarks/src/inference/InferenceBenchmarks.jl:146 [inlined]                                                                                                                     
      From worker 3:     [272] #abs_call#8                                                                               
      From worker 3:         @ /home/nanosoldier/.julia/dev/BaseBenchmarks/src/inference/InferenceBenchmarks.jl:169 [inlined]                                                                                                                     
      From worker 3:     [273] abs_call(f::Any, types::Any)                                                              
      From worker 3:         @ BaseBenchmarks.InferenceBenchmarks /home/nanosoldier/.julia/dev/BaseBenchmarks/src/inference/InferenceBenchmarks.jl:166                                                                                            

@vtjnash
Member

It looks like this was a bug exposed by #53219. Prior to that commit, a CodeInfo that had not been optimized would not be copied into the cache, because it did not get the inferred flag set. That PR removed the flag in favor of assuming it was always set, a faulty assumption that lets the IRInterpreter encounter unoptimized IR, a situation it was not written defensively to handle, which causes the crash seen here.

The most direct fix may be to revert the deletion of that field to undo this regression, which might also fix the performance issues noticed on that PR (#53459).

@aviatesk
Member

Thanks for the debugging effort. That explanation makes sense. I've run into a similar problem myself (JuliaDebug/JuliaInterpreter.jl#611 (comment)). And I confirmed the benchmark runs without problems by applying the following diff and revising afterwards:

diff --git a/base/compiler/inferencestate.jl b/base/compiler/inferencestate.jl
index 663fd78c90..d2456096ad 100644
--- a/base/compiler/inferencestate.jl
+++ b/base/compiler/inferencestate.jl
@@ -804,6 +804,7 @@ function IRInterpretationState(interp::AbstractInterpreter,
     else
         isa(src, CodeInfo) || return nothing
     end
+    src.slottypes === nothing && return nothing
     method_info = MethodInfo(src)
     ir = inflate_ir(src, mi)
     return IRInterpretationState(interp, method_info, ir, mi, argtypes, world,

Meanwhile, I see no issue with reviving the inferred field. Moreover, we could use a format like state::UInt8, where 0x00 represents "lowered", 0x01 "inferred" (after abstract interpretation only), and 0x02 "optimized"?

@aviatesk
Member

I was mistaken. slottypes turns out to always be nothing in a CodeInfo decompressed from jl_uncompress_ir, meaning the above diff effectively disabled irinterp, which is why there seemed to be no issue.

The real issue stems from abs_call disabling may_optimize. Due to the CodeInstance refactor, the removal of the ci.inferred check at

cache_the_tree = ci.inferred && (is_inlineable(ci) || isa_compileable_sig(linfo.specTypes, linfo.sparam_vals, def))

allows non-optimized CodeInfo to be cached for InferenceBenchmarker. However, irinterp fundamentally requires optimized IR, creating a problem, since it is currently assumed that all cached CodeInstances are optimized.

@vtjnash
Member

I think there are basically two issues here. The first is that cache_the_tree is true here when it should be false (because the IR is not optimized). The second, semi-related issue is that irinterp doesn't have a state bit to check whether running irinterp is possible. Bringing back inferred would fix the first issue (although there might be other fields equivalent to checking whether the IR is optimized, that punning makes later changes more complicated). The second issue is perhaps best suited to having a dedicated bit of state as well, but it also happens that inferred::Bool is currently exactly equivalent anyway.
