
Conversation

mratsim (Owner) commented Dec 12, 2020

This replaces the use of mapX_inline and applyX_inline with the forEach / forEachContiguous / forEachParallel / forEachSerial laser iterators.

This is particularly valuable for recurrent neural networks like GRU because we can implement the equations in a straightforward manner, with meaningful variable names instead of the magic x, y, z, and we would have needed an apply11_inline anyway (with the correct var/non-var parameters).
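For illustration, here is a minimal sketch of what the element-wise part of a GRU gate could look like with forEach. The proc name, its signature, and the assumption that all operands share the same shape (bias already broadcast) are mine, not taken from this PR; the forEach syntax is the one from laser's strided iteration module, assumed to be exported by Arraymancer after this change.

```nim
import std/math
import arraymancer   # assumption: forEach is re-exported after this PR

# Element-wise part of the GRU update gate z = sigmoid(W_z*x + U_z*h + b_z),
# with the matrix products precomputed. Every operand gets a meaningful name
# instead of being squeezed into applyN_inline's implicit x, y, z.
proc updateGate[T: SomeFloat](z: var Tensor[T], Wzx, Uzh, bz: Tensor[T]) =
  forEach zi in z, wx in Wzx, uh in Uzh, b in bz:
    zi = 1.T / (1.T + exp(-(wx + uh + b)))   # logistic sigmoid, element-wise
```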

TODO:

  • There is a significant parallel performance regression on GRU when running test_nnp_gru. While before this PR the parallel version was "only" 2x slower than serial, it is now 13x slower than serial, which probably signals a false sharing issue (a generic sketch of the usual mitigation follows this list).
    Even then, RNNs are a kind of iterative stencil computation that requires special care and is often used for polyhedral benchmarking, so another approach using tiling is probably needed to properly speed them up. (see https://github.com/numforge/laser/blob/master/research/automatic_loop_nest_scheduling.md#polyhedral-approaches)
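For context on the false-sharing suspicion above, a generic (non-Arraymancer) sketch of the usual mitigation: give each thread its own accumulator, padded to a full cache line, so neighbouring threads never write to the same line. Everything below (the names, the OpenMP setup, the 64-byte line size) is an illustrative assumption, not code from this PR.

```nim
# Standalone illustration; compile with:
#   nim c --passC:-fopenmp --passL:-fopenmp false_sharing.nim
import std/cpuinfo

proc omp_get_thread_num(): cint {.importc, header: "omp.h".}

const CacheLine = 64

type PaddedAcc = object
  val: float32
  pad: array[CacheLine - sizeof(float32), byte]  # keep each accumulator on its own cache line

proc parallelSum(data: seq[float32]): float32 =
  var partials = newSeq[PaddedAcc](max(1, countProcessors()))
  for i in 0 || (data.len - 1):                  # OpenMP parallel for
    partials[omp_get_thread_num().int].val += data[i]
  for p in partials:                             # cheap serial reduction of per-thread sums
    result += p.val
```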

Not in scope:

  • using forEach for backward propagation in GRU: this is a headache-inducing refactoring

mratsim (Owner, Author) commented Jan 3, 2021

Some changes in the BLAS L1 operators require changing the autograd to +.= because the apply2_inline-based += was somehow doing an implicit broadcast. But then we get cascading issues that require changing +.= to use broadcast2 plus broadcast2 fixes, and then the Stack autograd layer fails its tests. See 2929cb5 in the laser-iterators-stashed branch
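For readers following along, a minimal sketch of the behavioural difference described above; the shapes and variable names are illustrative assumptions, not taken from the autograd code.

```nim
import arraymancer

var nodeGrad = zeros[float32](4, 3)   # gradient buffer being accumulated into
let upstream = ones[float32](1, 3)    # incoming gradient with a broadcastable shape

# The old apply2_inline-based `+=` accepted this shape mismatch and broadcast
# implicitly; the laser-iterator `+=` requires identical shapes, so the
# accumulation must spell out the broadcast with the in-place operator:
nodeGrad +.= upstream                 # adds the row to every row of nodeGrad
```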
