Hi @jiaqizhai
We can find some previous issues,
#148
#36
which mention the specific implementation of relative attention bias.

For the exact rab_{p, t}(i, j) setting in this particular codebase, the relative timespan between tokens i and j is computed from timestamp[i] and timestamp[j+1] (not timestamp[j]). Furthermore, the relative positional gap between tokens i and j is computed as N - (j - i), rather than j - i as in the source code of the "google text-to-text transformers" (T5) paper. This rab implementation puzzles me: what are its advantages?

Thank you in advance for more information about the implementation details.
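To make the comparison concrete, here is a minimal sketch of the two indexing conventions I am describing. This is illustrative only, not code from this repository; the function name and variable names are hypothetical, and I assume timestamps has N + 1 entries so that timestamp[j + 1] is defined for the last token.

```python
import torch

def relative_bias_indices(timestamps: torch.Tensor, N: int):
    """Illustrative sketch of the two indexing conventions discussed above.

    timestamps: tensor of shape (N + 1,) with per-token timestamps (one extra
    entry so that timestamps[j + 1] exists for the last key position).
    Returns the pairwise time deltas and the two positional-gap variants.
    """
    i = torch.arange(N).unsqueeze(1)   # query positions, shape (N, 1)
    j = torch.arange(N).unsqueeze(0)   # key positions, shape (1, N)

    # Timespan variant described in this issue: the key side uses
    # timestamps[j + 1] rather than timestamps[j].
    time_delta = timestamps[i] - timestamps[j + 1]

    # Positional-gap variants being compared:
    t5_style_gap = j - i             # as in the T5 (text-to-text) reference code
    this_repo_gap = N - (j - i)      # as described for this codebase

    return time_delta, t5_style_gap, this_repo_gap
```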