Nice to see there's already an implementation of this!
I just stumbled across TensorFlow's `stop_gradient` function. Among the examples of where it might be needed, the docs mention "The EM algorithm where the M-step should not involve backpropagation through the output of the E-step."
Does this also apply when using the EM algorithm for routing? I don't think I read anything about this in the paper, but then again the paper is very sparse on details about backpropagation...
Not calculating the gradients for the E-step might considerably speed up training, I believe.
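For what it's worth, here is a minimal pure-Python sketch of the idea (a toy 1-D, two-component Gaussian mixture, *not* the capsule-routing code from this repo): the E-step responsibilities are treated as constants by the M-step, which is exactly what wrapping the E-step output in `tf.stop_gradient` would enforce in a differentiable implementation.

```python
# Toy illustration only: EM for a 1-D mixture of two unit-variance
# Gaussians with equal priors. In an autodiff framework, the E-step
# output would be wrapped in tf.stop_gradient(...) so the M-step does
# not backpropagate through it.
import math
import random

def e_step(data, means):
    """Responsibility of component 0 for each data point.
    In TensorFlow one would return tf.stop_gradient(resp) here."""
    resp = []
    for x in data:
        p0 = math.exp(-0.5 * (x - means[0]) ** 2)
        p1 = math.exp(-0.5 * (x - means[1]) ** 2)
        resp.append(p0 / (p0 + p1))
    return resp

def m_step(data, resp):
    """Update the means, treating the responsibilities as constants."""
    w0 = sum(resp)
    w1 = len(data) - w0
    m0 = sum(r * x for r, x in zip(resp, data)) / w0
    m1 = sum((1 - r) * x for r, x in zip(resp, data)) / w1
    return (m0, m1)

random.seed(0)
data = [random.gauss(-3.0, 1.0) for _ in range(200)] + \
       [random.gauss(3.0, 1.0) for _ in range(200)]
means = (-1.0, 1.0)
for _ in range(20):
    means = m_step(data, e_step(data, means))
print(means)  # converges to roughly (-3, 3)
```

The point is just structural: the gradient of the M-step update with respect to the model parameters never flows through `e_step`, so in a TensorFlow graph those E-step ops would need no gradient computation at all.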
Thoughts?