Attention as matrix multiplication

2023-02-27 20:38 | Author: 学的很杂的一个人


Source: https://e2eml.school/transformers.html#softmax

A bilingual Chinese-English version, with Chinese annotations produced by various translation tools plus a little of my own understanding.


Related articles are collected in the series: Transformers from Scratch (Chinese annotations)

--------------------------------------------------------------------------------------------------------------------


Feature weights could be straightforward to build by counting how often each word pair/next word transition occurs in training, but attention masks are not.
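
To make the counting idea concrete, here is a minimal sketch in plain Python. The toy corpus and the names transition_counts and pair_totals are illustrative, not from the original article:

```python
from collections import defaultdict

# Toy corpus; a real model would count over the whole training set.
corpus = "check the battery program check the program run".split()

# Count how often each (word pair) -> next word transition occurs.
transition_counts = defaultdict(int)
for i in range(len(corpus) - 2):
    pair = (corpus[i], corpus[i + 1])
    next_word = corpus[i + 2]
    transition_counts[(pair, next_word)] += 1

# Normalize the counts into per-pair transition probabilities,
# which serve as the feature weights.
pair_totals = defaultdict(int)
for (pair, _), count in transition_counts.items():
    pair_totals[pair] += count

transition_probs = {
    (pair, nxt): count / pair_totals[pair]
    for (pair, nxt), count in transition_counts.items()
}
print(transition_probs[(("check", "the"), "battery")])  # -> 0.5
```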

Up to this point, we've pulled the mask vector out of thin air.

How transformers find the relevant mask matters.

It would be natural to use some sort of lookup table, but now we are focusing hard on expressing everything as matrix multiplications.

We can use the same lookup method we introduced above by stacking the mask vectors for every word into a matrix and using the one-hot representation of the most recent word to pull out the relevant mask.
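
Here is a small sketch of that lookup with numpy. The five-word vocabulary and the 0/1 mask values are invented for illustration; in a real transformer the masks are learned:

```python
import numpy as np

vocab = ["battery", "program", "ran", "down", "please"]
n = len(vocab)

# Stack one mask vector per word as the rows of a matrix:
# mask_matrix[i] is the attention mask associated with vocab[i].
mask_matrix = np.array([
    [1, 0, 0, 1, 0],   # mask for "battery"
    [0, 1, 1, 0, 0],   # mask for "program"
    [0, 1, 1, 0, 0],   # mask for "ran"
    [1, 0, 0, 1, 0],   # mask for "down"
    [0, 0, 0, 0, 1],   # mask for "please"
], dtype=float)

# One-hot representation of the most recent word, e.g. "program".
query = np.zeros(n)
query[vocab.index("program")] = 1.0

# Matrix multiplication pulls out exactly one row: the relevant mask.
relevant_mask = query @ mask_matrix
print(relevant_mask)   # -> [0. 1. 1. 0. 0.]
```

Because the query is one-hot, the product selects a single row of the matrix, so the lookup table is expressed entirely as a matrix multiplication.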

In the matrix showing the collection of mask vectors, we've only shown the one we're trying to pull out, for clarity.

We're finally getting to the point where we can start tying into the paper.

This mask lookup is represented by the QK^T term in the attention equation.

The query Q represents the feature of interest and the matrix K represents the collection of masks.

Because K stores its masks in columns, rather than rows, it needs to be transposed (with the T operator) before multiplying.
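
For reference, the full equation from "Attention Is All You Need" is Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V; the softmax and the scaling by sqrt(d_k) come up later. The sketch below shows just the QK^T lookup, reusing the invented masks from above but now stored as the columns of K:

```python
import numpy as np

n = 5  # toy vocabulary size, matching the example above

# The same invented masks, one per column: K[:, i] is the mask for word i.
K = np.array([
    [1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
], dtype=float).T

# Q holds one-hot queries as rows; here a single query for word index 1.
Q = np.zeros((1, n))
Q[0, 1] = 1.0

# Transposing K puts the masks back into rows, so Q @ K.T performs
# the same one-hot lookup as before and pulls out mask number 1.
print(Q @ K.T)   # -> [[0. 1. 1. 0. 0.]]
```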

By the time we're all done, we'll make some important modifications to this, but at this level it captures the concept of a differentiable lookup table that transformers make use of.
