In your paper, you mention that "In practice, the projection layer can transform the queries to any desired output, making the self-attention module redundant." However, self-attention contains a softmax, so it is non-linear in general, whereas the projection layer can only apply linear transformations. I don't understand how the projection layer alone could transform the queries to any desired output.
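To make the contrast I mean concrete, here is a minimal numpy sketch (the dimensions, weights, and function names are arbitrary placeholders, not taken from the paper). The additivity check passes for a projection layer but fails for softmax attention, which is the non-linearity I am asking about:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d, n = 8, 5                        # hypothetical head dim and sequence length
K = rng.standard_normal((n, d))
V = rng.standard_normal((n, d))
W = rng.standard_normal((d, d))    # stand-in for the projection layer's weight

def self_attention(Q, K, V):
    # softmax over scaled dot products -> non-linear in Q
    A = softmax(Q @ K.T / np.sqrt(d), axis=-1)
    return A @ V

def projection(Q, W):
    # a projection layer is a single linear map of the queries
    return Q @ W

# Linearity (additivity) check: does f(Q1 + Q2) == f(Q1) + f(Q2)?
Q1 = rng.standard_normal((n, d))
Q2 = rng.standard_normal((n, d))

print("projection is additive:",
      np.allclose(projection(Q1 + Q2, W),
                  projection(Q1, W) + projection(Q2, W)))        # True
print("self-attention is additive:",
      np.allclose(self_attention(Q1 + Q2, K, V),
                  self_attention(Q1, K, V) + self_attention(Q2, K, V)))  # False
```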