Support LLVM large code model #1091

Open
matthiaskoenig opened this issue Mar 23, 2023 · 11 comments
@matthiaskoenig
Collaborator

Hi all,
we have the following issue when running hundreds to thousands of roadrunner instances in C++ at the same time on a cluster. The roadrunner instances are part of a large FEM simulation that solves ODEs at different mesh points.

The runtime error is

llvm-13.x/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:319: void llvm::RuntimeDyldELF::resolveX86_64Relocation(const llvm::SectionEntry&, uint64_t, uint64_t, uint32_t, int64_t, uint64_t): Assertion `isInt<32>(RealOffset)' failed.

Probably the same issue discussed here:
https://lists.llvm.org/pipermail/llvm-dev/2015-May/085793.html

As I understand it, the problem arises because LLVM's default code model assumes all relocations fit within a 2 GB range. To be safe during JIT compilation, the medium or large code model should be used.
See here
https://stackoverflow.com/questions/40493448/what-does-the-codemodel-in-clang-llvm-refer-to

In short: the majority of the offsets inside x86-64 instructions are PC-relative, but the immediate field inside instructions is only 32 bits long. Therefore, if the data is located "far" from the code (more than a 32-bit offset away), one cannot use the immediate field to efficiently encode the offset and must calculate the address explicitly. The code model places various restrictions on the relative location of code and data.
If you're compiling everything statically, then 'small' is safe (and the default). If you're JIT'ing, then anything is possible, especially if ASLR is enabled, and you'd need to use the medium or large code model.

So in short: because we have a large application with many instances, we require more than 2 GB of memory, and not all addresses can be resolved as 32-bit PC-relative offsets any more.

The code model seems to be part of the LLVM CodeGen options:
https://clang.llvm.org/doxygen/classclang_1_1CodeGenOptions.html

It would be great if this option could be supported (or enabled by default) in roadrunner's code generation. I am not sure how we could fix the issue otherwise.

Best Matthias

@hsauro

hsauro commented Mar 23, 2023 via email

@matthiaskoenig
Collaborator Author

Hi all,
I looked through the code, and the fix most likely has to be applied where the JITTargetMachineBuilder JTMB is created, close to the following line:

JTMB.setCodeGenOptLevel(convertRRCodeGenOptLevelToLLVM(options));

Various options can be set on the JTMB, such as the optimization level for code generation.
As I understand it, the code model can also be set here.

The possible values, as seen at
https://llvm.org/doxygen/JITTargetMachineBuilder_8cpp_source.html
around line 108, are:

CodeModel::Tiny
CodeModel::Small
CodeModel::Kernel
CodeModel::Medium
CodeModel::Large

We would need an option to set CodeModel::Large here (most likely there is no large speed penalty for that so this could also be the default for roadrunner).

The code model can be set via

  /// Set the code model.
  JITTargetMachineBuilder &setCodeModel(Optional<CodeModel::Model> CM) {
    this->CM = std::move(CM);
    return *this;
  }

I.e. the only thing to do here is to add the line

JTMB.setCodeModel(convertRRCodeModelToLLVM(options));

and support the additional option via the roadrunner options.
Unfortunately I am not a C++ expert and have no idea how the options are managed in roadrunner, otherwise I would make a pull request for that. But the fix is straight forward and requires one line of setting the codemodel on the JTMB and support for an additional option.

It would be great if someone could implement this fix, because we can't run our simulations at the moment without the option.

Best Matthias

@matthiaskoenig
Collaborator Author

Hi all,

it would be great if this could be implemented. If you need any support with that let me know. I basically know what has to be done, but would need a core developer to help with this.
This is not a big change, but we would urgently need a bugfix for this.

Best Matthias

@hsauro

hsauro commented Apr 4, 2023 via email

@adelhpour
Member

Hi Mathias,

I just saw your solution for this issue. Implementing it does make sense to me, but I believe setting the CodeModel to 'large' as the default value may have consequences for roadrunner performance; let's keep it as our last resort. Instead, I have made some changes to our llvm-13.x code based on a proposed patch. As I am not able to reproduce your error, I couldn't test whether it works. You can now rebuild the llvm dependency on your system using the https://github.com/sys-bio/llvm-13.x/tree/Win64CallCheckStackLLVMTrunk branch.

Let me know about the outcome.

Adel

@steffenger

Hi @adelhpour,

I used your branch for compiling roadrunner

/zhome/academic/HLRS/isd/isdsg/llvm_test/llvm-13.x
isd01701 cl1fr4 229$ git branch

  • Win64CallCheckStackLLVMTrunk

But the error is still the same,

febio3: /zhome/academic/HLRS/isd/isdsg/llvm_test/llvm-13.x/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:319: void llvm::RuntimeDyldELF::resolveX86_64Relocation(const llvm::SectionEntry&, uint64_t, uint64_t, uint32_t, int64_t, uint64_t): Assertion `isInt<32>(RealOffset)' failed.

Of course, I'm not sure whether something also has to be changed in roadrunner itself; as I said, I only changed the llvm dependency. If you need more information, please let me know.

best regards,

Steffen

@adelhpour
Member

adelhpour commented Apr 13, 2023

Hi Steffen,

Since this didn't work for you, I have set the default value of CodeModel to "large" in roadrunner so that you can use it on your system. We need to check whether using "large" as the default has performance consequences before merging it into the develop branch. Here is the link to the updated branch that you can now use:
https://github.com/sys-bio/roadrunner/tree/LargeCodeModel

Simply build this branch instead of the develop branch. No further configuration is required.

Let me know if it worked for you so that we can start checking its impact on roadrunner performance.

Regards,
Adel

@luciansmith
OK, the best guess from the LLVM people is that the large code model is not, in fact, active. The assertion you are getting is designed to fire with the small code model, not the large one.

https://discourse.llvm.org/t/getting-assert-error-due-to-32-64-bit-issues-but-on-a-64-bit-os-in-runtimedyldelf-resolvex86-64relocation/70595/4

Also suggested was Compiler Explorer, cf https://godbolt.org/z/84qWbr71d

@matthiaskoenig
Collaborator Author

matthiaskoenig commented May 13, 2023 via email

@adelhpour
Member

adelhpour commented May 13, 2023

Steffen and I tested setting the code model to large, as in 144ecd0, but that didn't work.

Also, I changed the getCodeModel function so that it always returns llvm::CodeModel::Large, as in sys-bio/llvm-13.x@08e5049, and that didn't work either.

One way to check whether this really is a "memory size" problem is to run a larger or smaller number of roadrunner instances on machines with larger or smaller memory capacity. You could then check at what instance count the error starts to appear, and whether that number scales up or down with the available memory in a way that makes sense.
