Hi @jiaqizhai
We can find some previous issues,
#148
#36
which mention the specific implementation of relative attention bias.

For the exact rab_{p, t}(i, j) setting in this particular codebase, the relative timespan between tokens i and j is computed from timestamp[i] and timestamp[j+1] (not timestamp[j]). Furthermore, the relative positional gap between tokens i and j is computed as N - (j - i), rather than j - i as in the source code of the "google text-to-text transformers" (T5) paper. This rab implementation puzzles me: what are its advantages?

Thank you in advance for more information about the implementation details.
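To make the comparison concrete, here is a minimal sketch of the two indexing conventions I am describing. This is illustrative only, not code from this repository; the function name and variable names are hypothetical, and I assume timestamps has N + 1 entries so that timestamp[j + 1] is defined for the last token.

```python
import torch

def relative_bias_indices(timestamps: torch.Tensor, N: int):
    """Illustrative sketch of the two indexing conventions discussed above.

    timestamps: tensor of shape (N + 1,) with per-token timestamps (one extra
    entry so that timestamps[j + 1] exists for the last key position).
    Returns the pairwise time deltas and the two positional-gap variants.
    """
    i = torch.arange(N).unsqueeze(1)   # query positions, shape (N, 1)
    j = torch.arange(N).unsqueeze(0)   # key positions, shape (1, N)

    # Timespan variant described in this issue: the key side uses
    # timestamps[j + 1] rather than timestamps[j].
    time_delta = timestamps[i] - timestamps[j + 1]

    # Positional-gap variants being compared:
    t5_style_gap = j - i             # as in the T5 (text-to-text) reference code
    this_repo_gap = N - (j - i)      # as described for this codebase

    return time_delta, t5_style_gap, this_repo_gap
```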