Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring Fusion Executor, pulling out compiled kernel #3468

Draft
wants to merge 23 commits into
base: main
Choose a base branch
from

Conversation

csarofeen
Copy link
Collaborator

@csarofeen csarofeen commented Nov 24, 2024

Pull out kernel compilation from the KernelExecutor, trying to separate out the two concepts as we will move towards a world where the execution of a kernel is done through HostIr.

Moved code:

  • from runtime/executor.h to rutnime/compiled_kernel.h
  • from runtime/executor.cpp to rutnime/compiled_kernel.cpp
  • from runtime/executor_utils.cpp to runtime/compiled_kernel.cpp (these are functions only used in compiled_kernel)
  • from sys/utils.cpp to runtime/compiled_kernel.cpp (these are functions only used in compiled_kernel)

@csarofeen
Copy link
Collaborator Author

!test

Copy link
Collaborator Author

@csarofeen csarofeen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite a few TODO's in this PR, I might not take on all of them in this PR.

csrc/fusion.cpp Outdated
@@ -231,6 +231,8 @@ void Fusion::removeVal(Val* val) {
void Fusion::addInput(Val* input) {
assertInContainer(input, "Cannot register input ");

std::cout << "Registering input: " << input->toString() << std::endl;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove

buffer << cuda_src.rdbuf();
return buffer.str();
}
} // namespace
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything above this is only code motion.

: compile_params_(compile_params),
lowered_(std::make_unique<GpuLower>(fusion, compile_params)) {
FUSER_PERF_SCOPE("CompiledKernel::CompiledKernel");
// TODO: No hooks can be sent because this is in the constructor
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO

lowered_->run();
}

// TODO:Rename to "compile"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO

// between them.
return val->isFusionInput() && !val->isA<TensorView>();
})) {
// TODO: parameter cache is too big a hammer here. We should consider
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an old TODO I don't plan to address here. I'm not sure what the challenge of caching with scalar inputs is.

// This could be refactored.
struct CompiledKernel : public NonCopyable {
NVF_API ~CompiledKernel();
struct CudaExecutable : public NonCopyable {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO can this be merged into compiled_kernel.h?

@@ -58,20 +47,11 @@ struct CompiledKernel : public NonCopyable {
int register_spills = -1;
};

// Returns executable function and the ptxas log from compilation
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to compiled_kernel.cpp

@@ -253,12 +233,5 @@ void validateCircularBuffering(
kir::Kernel* kernel,
ExpressionEvaluator& expr_eval);

//! Query the target GPU version number NVRTC compiles CUDA kernels for
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to compiled_kernel.cpp

Copy link
Collaborator Author

@csarofeen csarofeen Dec 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually these are likely just removed as they should now be contained in runtime/compiled_kernel.cpp

@@ -32,117 +30,6 @@
#include <cstdlib>

namespace nvfuser {

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to compiled_kernel.cpp

@@ -194,14 +81,6 @@ bool detectComputeSanitizer() {

namespace nvfuser {

namespace executor_utils {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to compiled_kernel.cpp

@csarofeen
Copy link
Collaborator Author

csarofeen commented Dec 22, 2024

TODO list:

  • Move pre lowering hooks to constructor of compiled kernel (cleanup executor.cpp 279)
  • Move post lowering hooks to the constructor of compiled kernel as well
  • rename compileFusion to compile
  • When compiled kernel checks when to disable the parameter cache change the check to make sure when an extent depends on a TensorView input it goes through metadata op, or it throws an error.
  • Not sure what this TODO means it's on compiled_kernel.cpp 1131 ("TODO: These properties should be set as part of the constructor so that it can be const")
  • uncertain in compiled_kernel.cpp 1168 ("TODO: pass in kernel name?")
  • Check that compiled_kernel.cpp 1233 ("TODO: high water mark should be computed via occupancy API after") is an old todo.
  • Remove compiled_kernel.h::CompileOptions it only holds device and should likely be in compile params
  • Check if the default constructor of CompiledKernel can be removed
  • Remove the TODO "TODO: Consider split out compileRtc and runRtc to a different" it will need to be evaluated in the future if that makes sense when compilation and execution are completely separate concepts
  • Check if executor_utils::CudaExecutable can be merged with CompiledKernel
  • executor.cpp 241 "TODO: Is this necessary?"
  • Check if executor.h still needs the function disableLaunchParamCache since CompiledKernel has one
  • executor.h 384 "TODO: Should this be removed?" SchedulerType scheduler_type_ = SchedulerType::None;
  • Don't disable parameter cache completely when a scalar input is used in a computation of an ID extent

@csarofeen
Copy link
Collaborator Author

!test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant