
Graph custom gradient support #292

Merged: 27 commits from rnett:rn_custom_gradients into tensorflow:master on Nov 11, 2021

Conversation

rnett
Contributor

@rnett rnett commented Apr 19, 2021

This PR adds support for custom gradients for graphs, using the legacy gradient setup. Eventually it will be superseded by the gradient tape API in #283, but there is no timeline for when that will happen.

@rnett
Contributor Author

rnett commented Apr 19, 2021

@saudet I'm getting a bunch of JavaCPP errors from GradFunc that seem to be related to the std::vector adapter.

See here. Do you have any idea what could be causing it? I'm not doing any special mapping around those classes.

@saudet
Contributor

saudet commented Apr 19, 2021

We probably need to "define" a wrapper class for the std::vector<tensorflow::Output> class, with something like this:
https://github.com/saudet/tensorflow-java/blob/add-gradient-tape/tensorflow-core/tensorflow-core-api/src/main/java/org/tensorflow/internal/c_api/presets/tensorflow.java#L268
BTW, you don't need to give them special names like NativeOutput. They are already in a different package. Also, are you sure we can actually define custom gradients that way? It didn't seem possible the last time I looked at that, or it had limitations like it didn't work with eager mode, or something like that.

@zaleslaw
Contributor

@rnett looks very cool, especially your test with registered grads for the Concat function.
Do you have any plans to add these gradients as part of the library, not only in tests or examples?
I could help with testing and writing a few gradients too.

Did you take grad formulas from C++ code or from Python?

@rnett
Contributor Author

rnett commented Apr 19, 2021

> @rnett looks very cool, especially your test with registered grads for the Concat function.
> Do you have any plans to add these gradients as part of the library, not only in tests or examples?
> I could help with testing and writing a few gradients too.

I don't have any specific plans, but I think it would be good to add missing grads in the TensorFlow init, maybe extracted out to another class. It's definitely something that should live in this repo, IMO.

Unfortunately, writing gradients is rather hard at the moment, since there are no good ways to access attributes or inputs of ops (say, axis for Concat) without using GraphOperation and native code. I'm going to try to add those to the op generator.

> Did you take grad formulas from C++ code or from Python?

I'm not taking grad formulas from anywhere yet; this approach adds the gradient alongside those in tensorflow/cc/gradients.

@rnett
Contributor Author

rnett commented Apr 19, 2021

> BTW, you don't need to give them special names like NativeOutput. They are already in a different package.

Yeah, I know, but IMO it's cleaner than just relying on different packages. There are a number of methods (GradientHelpers mostly) that need both.

> Also, are you sure we can actually define custom gradients that way? It didn't seem possible the last time I looked at that, or it had limitations like it didn't work with eager mode, or something like that.

Yeah, it's graph-only; these are the legacy graph gradients. But until the graph backend starts using the gradient tape API, it's the only way to add gradients for graphs.

@rnett
Contributor Author

rnett commented Apr 26, 2021

cc @saudet

Hmm, ok, I still get errors with a vector adapter: (new Info("std::vector<tensorflow::Output>").valueTypes("@StdMove NativeOutputVector").pointerTypes("NativeOutputVector").define())

/windows/Users/jimne/Desktop/OtherStuff/tensorflow_java/tensorflow-core/tensorflow-core-api/target/native/org/tensorflow/internal/c_api/linux-x86_64/jnitensorflow.cpp: In member function ‘tensorflow::Status JavaCPP_org_tensorflow_internal_c_1api_GradFunc::operator()(const tensorflow::Scope&, const tensorflow::Operation&, std::vector<tensorflow::Output>*, std::vector<tensorflow::Output>*)’:
/windows/Users/jimne/Desktop/OtherStuff/tensorflow_java/tensorflow-core/tensorflow-core-api/target/native/org/tensorflow/internal/c_api/linux-x86_64/jnitensorflow.cpp:1999:65: error: no matching function for call to ‘MoveAdapter<std::vector<tensorflow::Output> >::MoveAdapter(std::vector<tensorflow::Output>*&)’
     MoveAdapter< std::vector<tensorflow::Output> > adapter2(arg2);
                                                                 ^
/windows/Users/jimne/Desktop/OtherStuff/tensorflow_java/tensorflow-core/tensorflow-core-api/target/native/org/tensorflow/internal/c_api/linux-x86_64/jnitensorflow.cpp:723:5: note: candidate: MoveAdapter<T>::MoveAdapter(T&&) [with T = std::vector<tensorflow::Output>]
     MoveAdapter(T&& ptr) : ptr(&movedPtr), size(0), owner(0), movedPtr((T&&)ptr) { }
     ^~~~~~~~~~~
/windows/Users/jimne/Desktop/OtherStuff/tensorflow_java/tensorflow-core/tensorflow-core-api/target/native/org/tensorflow/internal/c_api/linux-x86_64/jnitensorflow.cpp:723:5: note:   no known conversion for argument 1 from ‘std::vector<tensorflow::Output>*’ to ‘std::vector<tensorflow::Output>&&’
/windows/Users/jimne/Desktop/OtherStuff/tensorflow_java/tensorflow-core/tensorflow-core-api/target/native/org/tensorflow/internal/c_api/linux-x86_64/jnitensorflow.cpp:722:5: note: candidate: MoveAdapter<T>::MoveAdapter(const T&) [with T = std::vector<tensorflow::Output>]
     MoveAdapter(const T& ptr) : ptr(&movedPtr), size(0), owner(0), movedPtr(std::move((T&)ptr)) { }
     ^~~~~~~~~~~

Making the adapter type a pointer doesn't help either.

@saudet
Contributor

saudet commented Apr 27, 2021

Adapters don't work for defining function types like that, whether it's @StdVector, @StdMove, or anything else. In this case the declaration doesn't use an rvalue reference (&&), so we don't need @StdMove there anyway, and you can remove it.

@rnett
Contributor Author

rnett commented Apr 27, 2021

I had to remove valueTypes as well, but that worked.
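
For context, the mapping that ends up working is along these lines: a minimal sketch of the relevant part of a JavaCPP presets class (the class name TensorflowPresetSketch is illustrative; the real mapping lives in the tensorflow presets file linked earlier):

```java
import org.bytedeco.javacpp.tools.Info;
import org.bytedeco.javacpp.tools.InfoMap;
import org.bytedeco.javacpp.tools.InfoMapper;

// Sketch only: wrap std::vector<tensorflow::Output> as a NativeOutputVector class.
// No @StdMove and no valueTypes, since adapters are not supported when defining
// function types like GradFunc.
public class TensorflowPresetSketch implements InfoMapper {
  @Override
  public void map(InfoMap infoMap) {
    infoMap.put(new Info("std::vector<tensorflow::Output>")
        .pointerTypes("NativeOutputVector")
        .define());
  }
}
```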

@saudet
Contributor

saudet commented Apr 27, 2021

Sounds like memory corruption; something is probably getting deallocated too early. We can set the "org.bytedeco.javacpp.nopointergc" system property to "true" to check whether GC is responsible.
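
A quick sketch of that check (illustrative only; the property needs to be in place before JavaCPP's Pointer class is loaded, so passing -Dorg.bytedeco.javacpp.nopointergc=true to the JVM is the safest route):

```java
// Disable JavaCPP's GC-driven deallocation to test whether premature
// deallocation by the garbage collector is causing the crash.
// This must run before org.bytedeco.javacpp.Pointer is first loaded.
public class DisablePointerGc {
  public static void main(String[] args) {
    System.setProperty("org.bytedeco.javacpp.nopointergc", "true");
    // ... then load and run the failing gradient test ...
  }
}
```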

@rnett
Contributor Author

rnett commented Apr 27, 2021

Yeah, my function pointers were getting GC'd; I forgot to keep references to them.
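
The usual fix is to hold a strong reference to the function-pointer object for as long as native code may call it. A minimal sketch, assuming the generated GradFunc class from this PR (the registry class and field names here are illustrative):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.tensorflow.internal.c_api.GradFunc;

// Illustrative only: keep gradient callbacks strongly referenced so the Java
// object (and the native thunk JavaCPP created for it) is not deallocated
// while TensorFlow still holds a raw pointer to it.
public class GradFuncRegistry {
  // Keyed by op type here, but any long-lived structure works.
  private static final Map<String, GradFunc> RETAINED = new ConcurrentHashMap<>();

  public static void retain(String opType, GradFunc fn) {
    RETAINED.put(opType, fn);
  }
}
```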

@rnett
Contributor Author

rnett commented Apr 27, 2021

Next question: I'm generating a wrapper for std::unordered_map here, but I need the erase method and I'm not seeing it. Is it inherited or something? We're using C++11, so I would think it should be there.

@saudet
Contributor

saudet commented Apr 27, 2021

Right, the way that works is by mapping a minimal set of functions that are usually available in these kinds of templates. There should be some other way to erase an element though, isn't there? In any case, let me figure out some way to customize the output of that a bit...

@rnett
Contributor Author

rnett commented Apr 27, 2021

> There should be some other way to erase an element though, isn't there?

Doesn't seem like it; if there is, it's not coming up on Google.

@saudet
Contributor

saudet commented Apr 28, 2021

OK, I'm confident enough that pretty much all "map" containers have an erase(iterator) method, so I've added that in commit bytedeco/javacpp@dcc06df. You'll have to use JavaCPP 1.5.6-SNAPSHOT for it to appear. If you need to remove by key, we can also add overloads with something like the following in this case:

.put(new Info("std::unordered_map<tensorflow::string,tensorflow::Node*>").pointerTypes("NameMap").define().javaText("public native long erase(@StdString BytePointer key);"))
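
Assuming that NameMap mapping generates the key-based overload shown above, using it from Java might look like this (the package and exact class name of the generated NameMap are assumptions):

```java
import org.bytedeco.javacpp.BytePointer;
import org.tensorflow.internal.c_api.NameMap; // generated class name assumed from the Info above

public class NameMapEraseSketch {
  // Remove an entry by key via the erase overload injected through javaText.
  // For an unordered_map this returns the number of elements removed (0 or 1).
  static long removeNode(NameMap nameMap, String nodeName) {
    return nameMap.erase(new BytePointer(nodeName));
  }
}
```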

BTW, it looks like you're starting to map all of the legacy C++ API. You could pick up from what has already been done for TF 1.x:
https://github.com/bytedeco/javacpp-presets/blob/master/tensorflow/src/main/java/org/bytedeco/tensorflow/presets/tensorflow.java

@rnett
Contributor Author

rnett commented Apr 28, 2021

Thanks, that works nicely.

> BTW, it looks like you're starting to map all of the legacy C++ API. You could pick up from what has already been done for TF 1.x:
> https://github.com/bytedeco/javacpp-presets/blob/master/tensorflow/src/main/java/org/bytedeco/tensorflow/presets/tensorflow.java

I'm trying not to, but it's gotten pretty big; that should help.

@rnett
Contributor Author

rnett commented Apr 28, 2021

Now I'm getting a segfault from TF_OperationNumControlOutputs. Do you have any idea what would cause this? It's from GraphOperationTest.controlConsumers, which works fine on master, and it fails regardless of whether the gradient test is run. The only change to TF_Operation is adding the node getter and removing @Opaque; I'm not sure how that would cause a segfault.

dump file

@saudet
Contributor

saudet commented Apr 29, 2021

If those TF_Operation objects end up with a deallocator, they may be getting deallocated prematurely. If that's the case, we can use PointerScope as appropriate to make sure that doesn't happen, and to prevent those objects from sticking around longer than necessary, slowing things down as well.

@rnett
Contributor Author

rnett commented Apr 29, 2021

Yeah, that's what I thought it could be, but I didn't change anything around those methods, and the gradient methods all use PointerScopes. Also, the failing methods all verify the pointer is not null before calling.

I cherry-picked the JavaCPP generation changes (presets/tensorflow.java, Maven configuration) to master, and that alone causes it to happen, so it doesn't seem like a deallocation issue.

@rnett
Contributor Author

rnett commented Apr 29, 2021

Another note: only controlConsumers and consumers fail; all the other tests work fine.

@rnett
Contributor Author

rnett commented Apr 29, 2021

The method looks like this:

try (PointerScope scope = new PointerScope()) {
  TF_Output output = new TF_Output().oper(getUnsafeNativeHandle()).index(index);
  return TF_OperationOutputNumConsumers(output);
}

If I set a breakpoint right before the TF_OperationOutputNumConsumers call, output is valid: I can get output.oper().node().name().getString(), and I can also get the consumer's inputs manually.

@rnett
Contributor Author

rnett commented Apr 29, 2021

OK, I've found the cause. It only happens when I include tensorflow/c/c_api.cc, even if I skip parsing it. I need a few functions from there (ToOperation, TF_FinishOperationLocked, etc.) that aren't exposed in the header file. I assumed that including a non-header file would be supported; is it not?

@saudet
Contributor

saudet commented Apr 29, 2021 via email

@Craigacp
Collaborator

Let's ask upstream first before ad-hoc expanding the C API. There may be a reason those functions aren't part of the C API. Did you check to see if libtensorflow exports those symbols?

@rnett
Contributor Author

rnett commented Apr 29, 2021

It doesn't; some of them are static and others are in an anonymous namespace. The namespaced ones are just helpers that would be nice to have; the static ones skip taking the graph's lock, which is necessary for defining gradient functions via the C API (the lock is already held at that point).

I can open an issue in TensorFlow, but I'm not sure what to ask for other than a full custom-gradient C API, which wouldn't be worth doing for the legacy gradients. These are functions that shouldn't really be public; the C API just isn't designed with custom gradients in mind.

@rnett
Contributor Author

rnett commented Apr 29, 2021

OK, things work now, but it's a bit hacky. I'm going to open a TensorFlow issue asking for the necessary functions, but even if they approve exporting them, we may want to merge this with the patch instead of waiting for 2.6 or whenever they make it in.

I need three functions:

  • TF_Operation* ToOperation(Node* node), to work with the C++ API for gradient definitions.
  • TF_NewOperationLocked and TF_FinishOperationLocked, because both the C API gradient-definition function and the normal versions of those functions lock the graph's mutex, which prevents using the C API op-definition functions inside a gradient definition.

@rnett
Contributor Author

rnett commented Apr 29, 2021

Also, can you mark this with CI Build?

@rnett rnett requested a review from Craigacp November 5, 2021 01:32
@Craigacp
Collaborator

Craigacp commented Nov 5, 2021

OK, I think the scopes should be named if possible, and the docs on CustomGradient and RawCustomGradient need to be tidied up a bit, but otherwise it's fine. If there are issues with throwing exceptions through the TypedGradientAdapter, then let's use TF's status signalling mechanism instead; it will still result in a Java exception on the other end.
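
For readers landing here, registering a custom gradient through these classes could look roughly like the sketch below. This is a hypothetical illustration only: the TensorFlow.registerCustomGradient entry point, the CustomGradient lambda shape, and Concat.Inputs are assumptions extrapolated from the class names discussed in this thread, not a verbatim copy of the merged API.

```java
import java.util.Arrays;
import org.tensorflow.Operand;
import org.tensorflow.TensorFlow;
import org.tensorflow.op.core.Concat;

// Hypothetical sketch: attach a custom gradient to an op type in a Graph.
// Names and signatures here are assumptions, not the confirmed API.
public class CustomGradientSketch {
  public static void register() {
    TensorFlow.registerCustomGradient(Concat.Inputs.class,
        (tf, op, gradInputs) -> {
          // Return one gradient per op input; as a placeholder this just
          // passes the upstream gradient through for the first input.
          Operand<?> upstream = (Operand<?>) gradInputs.get(0);
          return Arrays.asList(upstream /* , ... one entry per remaining input */);
        });
  }
}
```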

@rnett
Contributor Author

rnett commented Nov 5, 2021

Can I get someone to re-run the CI jobs? The cache needs to be populated.

@Craigacp added and later removed the CI build label (triggers a full native build on a pull request) Nov 5, 2021
@karllessard
Collaborator

@Craigacp, @rnett: is this ready to be merged now?

Craigacp previously approved these changes Nov 10, 2021
@Craigacp
Collaborator

I think so.

@rnett
Contributor Author

rnett commented Nov 10, 2021

I'll document the rawtypes and then push the generation later today.

@karllessard
Collaborator

All right, merging this now. Thanks again for that great contribution, @rnett!

@karllessard karllessard merged commit e0eec4a into tensorflow:master Nov 11, 2021
@rnett
Contributor Author

rnett commented Nov 11, 2021

@karllessard @Craigacp generation is pushed; we're good to go.

Edit: Welp I got ninja'd.

@rnett rnett deleted the rn_custom_gradients branch November 11, 2021 03:12