This repository has been archived by the owner on Feb 24, 2023. It is now read-only.

Add documentation #6

Open
ZuseZ4 opened this issue Dec 9, 2021 · 2 comments
ZuseZ4 commented Dec 9, 2021

To manage expectations, we should document what is already expected to work, what we will possibly fix, and what is probably
not going to work until we have finished a better Enzyme integration. All issues should be solved in the next iteration.
This overview should be moved into real documentation.

Not Working, unlikely to be fixed:

  • Differentiating a function that has one or more f32 parameters (f64 works, and f32 values are allowed inside the function).
  • Types that are not FFI-safe.
  • Differentiating a function that accepts or uses a dyn trait object anywhere.
  • AD across language barriers.
  • CUDA and HIP support.
  • Using oxide-enzyme in a dependency rather than in your current main project.

Likely to be fixed

  • Only a combined forward+reverse pass of ReverseAD is supported at the moment. Enzyme does support ForwardAD and splitting
    the forward and reverse passes of ReverseAD. (Working on it.)
  • Differentiating a function that uses parallelism (e.g. rayon) other than OpenMP / MPI.
  • Differentiating a function that uses BLAS / LAPACK routines. If you compile them with debug symbols, it already works.
    If you use a pre-compiled version, it might or might not work, since Enzyme doesn't cover all functions yet.

Fixed

  • Generating functions that return one f64 parameter in the return struct.
  • Generating functions that return three or more parameters in the return struct.

Workarounds:

  • There is no real workaround for the dyn trait issue, since it requires creating a new vtable. You might be lucky if that object
    does not interact with your active values, but that is not guaranteed.
  • You can cast f32 values to f64 before passing them to the function and cast them back to f32 inside your function.
  • If you want to differentiate functions calling C/C++/Julia/Fortran/... routines, first compile the code in those other languages to .bc bitcode files. Look at the build script to see where to place them. You might run into issues due to missing symbols when using other languages; let me know about your experiences.
  • CUDA and HIP are similar to differentiating across language barriers. Technically it will work if you use Rust wrappers around them or if you use the clang version which we build in ~/.cache/enzyme/rustc-<version>-src, but that is probably too messy to set up and also requires looking at https://enzyme.mit.edu/getting_started/CUDAGuide/
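
The f32 workaround from the list above could look roughly like the following. This is only a sketch: the function names are hypothetical, and no oxide-enzyme API is shown, since the point is just the shape of the signature (f64 at the boundary, f32 inside).

```rust
// Hypothetical function to be differentiated. Its signature only uses f64,
// which is the case reported as working in this issue.
pub fn scaled_square(x: f64) -> f64 {
    // Cast back to f32 inside the function; f32 values are allowed here.
    let x32 = x as f32;
    let y32 = x32 * x32 * 0.5_f32;
    // Widen the result again so the return type stays f64.
    f64::from(y32)
}

// Caller side: widen the f32 input before the call and narrow the result after.
pub fn call_with_f32(x: f32) -> f32 {
    scaled_square(f64::from(x)) as f32
}
```

Note that the arithmetic inside still happens at f32 precision; only the function boundary changes.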

ZuseZ4 self-assigned this Dec 9, 2021

strasdat commented Jan 4, 2022

"Using oxide-enzyme in a dependency, rather than in your current main project."

@ZuseZ4 - this does sound like a significant limitation. Can you share a little background about it? What would it take to get this enabled? I'd assume some substantial changes to cargo, right?

ZuseZ4 commented Jan 4, 2022

Sure, @strasdat.

Our main issue is that the cargo team discussed post-build.rs scripts, which would run after compilation, and rejected them because they want to keep cargo focused and not turn it into a full CMake alternative. It seems very unlikely to me that they are going to reconsider that just for this project. So all we have are build.rs files, which run before compilation. That is unfortunate for us, since Enzyme requires llvm-bc or llvm-ir files, which are only generated towards the end of the compilation process. There is no official way to register a function that runs after that. There are a few solutions out there from people with related issues, but all have their drawbacks. The dependency limitation is the drawback of my solution.
The cargo enzyme command currently does little more than call

RUSTFLAGS="--emit=llvm-bc" cargo +enzyme -Z build-std rustc --target x86_64-unknown-linux-gnu -- --emit=llvm-bc -g -C opt-level=3 -Zno-link 

followed by

RUSTFLAGS="--emit=llvm-bc" cargo +enzyme -Z build-std rustc --target x86_64-unknown-linux-gnu -- --emit=llvm-bc -g -C opt-level=3

Notice the -Zno-link in the first run. This is necessary because Enzyme has not had a chance to create the functions yet. Without -Zno-link, compilation would fail, since cargo would be missing the definitions of those functions.
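
To make the difference between the two runs explicit, here is a small sketch that builds both invocations with std::process::Command. The helper names are hypothetical and this is not how cargo enzyme is actually implemented; only the flags and the RUSTFLAGS value are taken from the commands quoted above.

```rust
use std::process::Command;

/// Arguments for one of the two cargo runs quoted above.
/// The first run additionally passes `-Zno-link`.
pub fn enzyme_pass_args(first_pass: bool) -> Vec<String> {
    let mut args: Vec<String> = [
        "+enzyme", "-Z", "build-std", "rustc",
        "--target", "x86_64-unknown-linux-gnu",
        "--", "--emit=llvm-bc", "-g", "-C", "opt-level=3",
    ]
    .iter()
    .map(|s| s.to_string())
    .collect();
    if first_pass {
        // Skip linking: the Enzyme-generated functions do not exist yet.
        args.push("-Zno-link".to_string());
    }
    args
}

/// Wraps the arguments in a `Command` with the same RUSTFLAGS as above.
/// The command is only constructed here, not executed.
pub fn enzyme_pass_command(first_pass: bool) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.env("RUSTFLAGS", "--emit=llvm-bc");
    cmd.args(enzyme_pass_args(first_pass));
    cmd
}
```

Calling `enzyme_pass_command(true).status()` followed by `enzyme_pass_command(false).status()` would then mirror the two-step sequence, assuming the enzyme toolchain is installed.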

Inside my library I do some simple checks to see whether this is the first or the second compilation run. If it is the first run, I just return from my build script and let cargo do its compilation. If it is the second run, I look for all *.bc files, run llvm-link on them, read the merged.bc file and run Enzyme on it. After some symbol magic I create an archive which contains only the functions generated by Enzyme. Afterwards I hand the compilation process back to cargo and just ask it to link the new archive.
When compiling your crate, cargo first downloads and compiles all of your dependencies. You can tell cargo to compile your dependencies with some extra flags. There is, however, no way to have some (or any) of your dependencies compiled twice, which is what Enzyme needs.
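
The "look for all *.bc files, run llvm-link on them" step of the second run can be sketched as follows. This is an illustration under assumptions, not the actual build script: the function names are hypothetical, the search is non-recursive, and the real first/second-run check and the Enzyme invocation are not shown because the thread does not describe them.

```rust
use std::ffi::OsString;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Collects all files with a `.bc` extension directly inside `dir`,
/// sorted so the result is deterministic.
pub fn find_bitcode_files(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_file() && path.extension().map_or(false, |ext| ext == "bc") {
            files.push(path);
        }
    }
    files.sort();
    Ok(files)
}

/// Builds the argument list for `llvm-link <inputs...> -o <output>`.
pub fn llvm_link_args(inputs: &[PathBuf], output: &Path) -> Vec<OsString> {
    let mut args: Vec<OsString> = inputs
        .iter()
        .map(|p| p.clone().into_os_string())
        .collect();
    args.push(OsString::from("-o"));
    args.push(output.as_os_str().to_os_string());
    args
}
```

The resulting arguments could be passed to `std::process::Command::new("llvm-link")`; the merged output would then be the file handed to Enzyme.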

Some alternatives we considered (and dropped) include:

  1. Calling cargo inside our build script to manually take over the compilation of dependencies. Cargo places a lock, so you can't run cargo in the same location; it would deadlock. Don't ask me how I know.
  2. Automatically copying dependencies that use Enzyme into a temporary directory, spawning a cargo process to compile them there (twice), and moving the relevant artifacts back, hoping that the main cargo process will pick them up.
  3. Calling rustc from our build script (since it doesn't create a lock, it won't interfere with cargo), and along the way re-implementing cargo's resolution of dependency chains and the like.

If you happen to know a better alternative to this setup, I'd be happy to switch.
I should note, however, that the issues around the C ABI are at least similarly severe (in my opinion), and we also rely on incomplete debug output (the -g flag) to estimate the memory layout of Rust types.
With the current implementation we have no clear path to solving any of these three issues, so we are currently working on
a pre-RFC to discuss an alternative implementation.
