This is a companion tutorial to the re-implementation of "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention" (Xu et al., 2016) (see here).
We added comments to the code, refactored it slightly, added a few notebooks with examples and debugging, and analyzed the connection between the paper and the code. See the Medium post for details.