Support LLVM large code model #1091

Open
matthiaskoenig opened this issue Mar 23, 2023 · 11 comments
@matthiaskoenig
Collaborator

Hi all,
we have the following issue when running hundreds to thousands of roadrunner instances in C++ at the same time on a cluster. The roadrunner instances are part of a large FEM simulation that solves ODEs at different mesh points.

The runtime error is

llvm-13.x/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:319: void llvm::RuntimeDyldELF::resolveX86_64Relocation(const llvm::SectionEntry&, uint64_t, uint64_t, uint32_t, int64_t, uint64_t): Assertion `isInt<32>(RealOffset)' failed.

Probably the same issue discussed here:
https://lists.llvm.org/pipermail/llvm-dev/2015-May/085793.html

As I understand it, the problem arises because LLVM's default code model assumes all relocations fit within a 2 GB range. To be safe during JIT compilation, the medium or large code model should be used.
See here
https://stackoverflow.com/questions/40493448/what-does-the-codemodel-in-clang-llvm-refer-to

In short: the majority of the offsets inside x86-64 instructions are PC-relative, but the immediate field inside instructions is only 32 bits long. Therefore, if the data is located "far" from the code (more than a 32-bit offset away), one cannot use the immediate field to efficiently encode the offset and must calculate the address explicitly. The code model places various restrictions on the relative location of code and data.
If you're compiling everything statically, then 'small' is safe (and the default). If you're JIT'ing, then anything is possible, especially if ASLR is enabled, and you'd need to use the medium or large code model.

So in short: because we have a large application with many instances, we require more than 2 GB of memory, and not all addresses can be resolved as 32-bit PC-relative offsets any more.

The code model seems to be part of the LLVM CodeGen options:
https://clang.llvm.org/doxygen/classclang_1_1CodeGenOptions.html

It would be great if this option could be supported (or enabled by default) in roadrunner's code generation. I am not sure how we could fix the issue otherwise.

Best Matthias

@hsauro

hsauro commented Mar 23, 2023 via email

@matthiaskoenig
Collaborator Author

Hi all,
I looked through the code, and the fix most likely has to be applied where the JITTargetMachineBuilder JTMB is created, close to the following line:

JTMB.setCodeGenOptLevel(convertRRCodeGenOptLevelToLLVM(options));

Various options can be set on the JTMB, such as the optimization level for code generation.
As I understand it, the code model can also be set here.

The possible values, as seen at
https://llvm.org/doxygen/JITTargetMachineBuilder_8cpp_source.html
around line 108, are:

CodeModel::Tiny
CodeModel::Small
CodeModel::Kernel
CodeModel::Medium
CodeModel::Large

We would need an option to set CodeModel::Large here (most likely there is no large speed penalty for that so this could also be the default for roadrunner).

The code model can be set via

  /// Set the code model.
  JITTargetMachineBuilder &setCodeModel(Optional<CodeModel::Model> CM) {
    this->CM = std::move(CM);
    return *this;
  }

I.e. the only thing to do here is to add the line

JTMB.setCodeModel(convertRRCodeModelToLLVM(options));

and support the additional option via the roadrunner options.
Unfortunately I am not a C++ expert and have no idea how the options are managed in roadrunner, otherwise I would make a pull request for that. But the fix is straight forward and requires one line of setting the codemodel on the JTMB and support for an additional option.

It would be great if someone could implement this fix, because we can't run our simulations at the moment without the option.

Best Matthias

@matthiaskoenig
Collaborator Author

Hi all,

it would be great if this could be implemented. If you need any support with that let me know. I basically know what has to be done, but would need a core developer to help with this.
This is not a big change, but we would urgently need a bugfix for this.

Best Matthias

@hsauro

hsauro commented Apr 4, 2023 via email

@adelhpour
Member

Hi Mathias,

I just saw your solution for this issue. Implementing it does make sense to me, but I believe setting the CodeModel to 'large' as the default value may have consequences for roadrunner performance; let's keep it as our last resort. Instead, I have made some changes to our llvm-13.x code based on a proposed patch. As I am not able to reproduce your error, I couldn't test whether it works. You can now rebuild the llvm dependency on your system using the https://github.com/sys-bio/llvm-13.x/tree/Win64CallCheckStackLLVMTrunk branch.

Let me know about the outcome.

Adel

@steffenger

Hi @adelhpour,

I used your branch for compiling roadrunner

/zhome/academic/HLRS/isd/isdsg/llvm_test/llvm-13.x
isd01701 cl1fr4 229$ git branch

  • Win64CallCheckStackLLVMTrunk

But the error is still the same,

febio3: /zhome/academic/HLRS/isd/isdsg/llvm_test/llvm-13.x/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:319: void llvm::RuntimeDyldELF::resolveX86_64Relocation(const llvm::SectionEntry&, uint64_t, uint64_t, uint32_t, int64_t, uint64_t): Assertion `isInt<32>(RealOffset)' failed.

Of course, I'm not sure whether something also has to be changed in roadrunner itself; as I said, I only changed the llvm dependency. If you need more information, please let me know.

best regards,

Steffen

@adelhpour
Member

adelhpour commented Apr 13, 2023

Hi Steffen,

Since this didn't work for you, I have set the default value of CodeModel to "large" in roadrunner so that you can use it on your system. We need to check whether using "large" as the default has performance consequences before merging it into the develop branch. Here is the link to the updated branch that you can now use:
https://github.com/sys-bio/roadrunner/tree/LargeCodeModel

Simply build this branch instead of the develop branch. No further configuration is required.

Let me know if it worked for you so that we can start checking its impact on roadrunner performance.

Regards,
Adel

@luciansmith
OK, the best guess from the LLVM people is that the large code model is not, in fact, active. The assertion you are getting is designed to fire with the small code model, not the large one.

https://discourse.llvm.org/t/getting-assert-error-due-to-32-64-bit-issues-but-on-a-64-bit-os-in-runtimedyldelf-resolvex86-64relocation/70595/4

Also suggested was Compiler Explorer, cf https://godbolt.org/z/84qWbr71d

@matthiaskoenig
Collaborator Author

matthiaskoenig commented May 13, 2023 via email

@adelhpour
Member

adelhpour commented May 13, 2023

Steffen and I tested setting the code model to large, as in 144ecd0, but that didn't work.

Also, I changed the getCodeModel function so that it always returns llvm::CodeModel::Large, as in sys-bio/llvm-13.x@08e5049, and that didn't work either.

One way to check whether this really is a "memory size" problem is to run a larger or smaller number of roadrunner instances on machines with larger or smaller memory capacity. You could then check at what instance count the error starts to appear, and whether that number scales up or down with the available memory in a way that makes sense.
