
《一阶序列模型》First order sequence model

2023-02-20 23:53 Author: 学的很杂的一个人

Source: https://e2eml.school/transformers.html#softmax

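Chinese-English bilingual edition. The Chinese annotations come from Microsoft Translator, plus a few of my own interpretations.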


We can set aside matrices for a minute and get back to what we really care about, sequences of words.

我们可以先把矩阵放在一边,回到我们真正关心的问题上来,词的序列。

Imagine that as we start to develop our natural language computer interface we want to handle just three possible commands:

想象一下,当我们开始开发我们的自然语言计算机界面时,我们只想处理三种可能的命令。

  • Show me my directories please. (请给我看看我的目录。)

  • Show me my files please. (请给我看看我的文件。)

  • Show me my photos please. (请给我看看我的照片。)

Our vocabulary size is now seven: {directories, files, me, my, photos, please, show}.

我们的词汇量现在是7个: {directories, files, me, my, photos, please, show}.
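
As a small illustration of where the seven-word vocabulary comes from, the sketch below collects the distinct words of the three commands. Lower-casing the commands, dropping the final period, and assigning indices in alphabetical order are assumptions of this sketch, and the names `commands`, `vocabulary`, and `word_to_index` are made up for the example.

```python
# The three commands from the text, lower-cased and without punctuation.
commands = [
    "show me my directories please",
    "show me my files please",
    "show me my photos please",
]

# Collect the distinct words and sort them so each word gets a stable index.
vocabulary = sorted({word for command in commands for word in command.split()})
word_to_index = {word: i for i, word in enumerate(vocabulary)}

print(len(vocabulary))  # 7
print(vocabulary)
# ['directories', 'files', 'me', 'my', 'photos', 'please', 'show']
```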

One useful way to represent sequences is with a transition model.

表示序列的一个有用方法是使用转移模型。

For every word in the vocabulary, it shows what the next word is likely to be.

对于词汇中的每一个单词,它显示下一个词可能是什么。

If users ask about photos half the time, files 30% of the time, and directories the remaining 20% of the time,

如果用户一半时间询问照片,30% 时间询问文件,其余 20% 时间询问目录,

the transition model will look like this. The probabilities of the transitions away from any word always add up to one.

转移模型将如下所示。从任何一个单词出发的转移概率之和始终为 1。
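
The transition model described above can be sketched as a dictionary that maps each word to its possible next words. This is only an illustration: the probabilities come from the text (photos 50%, files 30%, directories 20%), while treating "please" as the end of a command with no outgoing transitions is an assumption, and `transitions` and `generate_command` are hypothetical names.

```python
import random

# Next-word probabilities for each word, taken from the text.
# Assumption: "please" ends a command, so it has no outgoing transitions.
transitions = {
    "show": {"me": 1.0},
    "me": {"my": 1.0},
    "my": {"directories": 0.2, "files": 0.3, "photos": 0.5},
    "directories": {"please": 1.0},
    "files": {"please": 1.0},
    "photos": {"please": 1.0},
    "please": {},
}


def generate_command(start="show", rng=random):
    """Walk the chain from `start`, sampling one next word at a time."""
    words = [start]
    while transitions[words[-1]]:
        candidates = transitions[words[-1]]
        next_word = rng.choices(
            list(candidates), weights=list(candidates.values())
        )[0]
        words.append(next_word)
    return words


# For every word that has a successor, the outgoing probabilities sum to one.
for word, candidates in transitions.items():
    if candidates:
        assert abs(sum(candidates.values()) - 1.0) < 1e-9

print(" ".join(generate_command()))
```

Each call produces one of the three commands. Over many calls, roughly half of them should end in "photos please".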

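[Figure: Markov chain diagram. show → me (1.0); me → my (1.0); my → directories (0.2), files (0.3), photos (0.5); directories, files, photos → please (1.0).]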

This particular transition model is called a Markov chain,

这种特殊的转移模型称为马尔可夫链,

because it satisfies the Markov property that the probabilities for the next word depend only on recent words.

因为它满足马尔可夫性质,即下一个单词的概率仅取决于最近的单词。

More specifically, it is a first order Markov model because it only looks at the single most recent word.

更具体地说,它是一个一阶马尔可夫模型,因为它只查看一个最近的单词。

If it considered the two most recent words it would be a second order Markov model.

如果它考虑最近的两个词,它将是一个二阶马尔可夫模型。
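
To make the difference between first and second order concrete, the sketch below records which word follows each context, where the context is the last one word or the last two words. The helper `count_transitions` is hypothetical, and the counts come from the three example sentences counted once each, so they do not reflect the 50/30/20 usage mix described earlier.

```python
from collections import Counter, defaultdict

commands = [
    "show me my directories please",
    "show me my files please",
    "show me my photos please",
]


def count_transitions(sentences, order):
    """Count which word follows each context of `order` preceding words."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i in range(order, len(words)):
            context = tuple(words[i - order:i])
            counts[context][words[i]] += 1
    return counts


first_order = count_transitions(commands, order=1)
second_order = count_transitions(commands, order=2)

# A first order model conditions on a single word ...
print(dict(first_order[("my",)]))
# ... while a second order model conditions on a pair of words.
print(dict(second_order[("me", "my")]))
```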

Our break from matrices is over.

我们回到矩阵。

It turns out that Markov chains can be expressed conveniently in matrix form. 

事实证明,马尔可夫链可以方便地以矩阵形式表示。

Using the same indexing scheme that we used when creating one-hot vectors,

使用我们在创建独热向量时使用的相同索引方案,

each row represents one of the words in our vocabulary.

每一行代表我们词汇表中的一个单词。

So does each column. The matrix transition model treats a matrix as a lookup table.

每列也是如此。矩阵转移模型将矩阵视为查找表。

Find the row that corresponds to the word you’re interested in.

找到与您感兴趣的字词对应的行。

The value in each column shows the probability of that word coming next.

每列中的值显示该单词接下来出现的概率。

Because each element in the matrix represents a probability, the values all fall between zero and one.

由于矩阵中每个元素的值表示一个概率,因此它们都将介于 0 和 1 之间。

Because probabilities always sum to one, the values in each row will always add up to one.

由于概率总和始终为 1,因此每行中的值总和始终为 1。
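
The lookup-table idea can be sketched with a plain nested list, using the alphabetical word order as the index for both rows and columns. This is a minimal sketch rather than a definitive implementation: `set_transition` is a hypothetical helper, and leaving the "please" row at all zeros is an assumption, because the text gives no word that follows "please".

```python
vocabulary = ["directories", "files", "me", "my", "photos", "please", "show"]
index = {word: i for i, word in enumerate(vocabulary)}
n = len(vocabulary)

# Rows are the current word, columns are the next word.
transition_matrix = [[0.0] * n for _ in range(n)]


def set_transition(current, following, probability):
    transition_matrix[index[current]][index[following]] = probability


set_transition("show", "me", 1.0)
set_transition("me", "my", 1.0)
set_transition("my", "directories", 0.2)
set_transition("my", "files", 0.3)
set_transition("my", "photos", 0.5)
for noun in ("directories", "files", "photos"):
    set_transition(noun, "please", 1.0)

# Lookup: find the row for a word, then read the probability in each column.
row = transition_matrix[index["my"]]
print(dict(zip(vocabulary, row)))

# Every entry is a probability, so it lies between zero and one.
assert all(0.0 <= value <= 1.0 for r in transition_matrix for value in r)

# Row sums: one for every word with a successor, zero for "please" (assumed).
row_sums = {word: sum(transition_matrix[index[word]]) for word in vocabulary}
print(row_sums)
```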

In the transition matrix here we can see the structure of our three sentences clearly.

在这里的转移矩阵中,我们可以清楚地看到三个句子的结构。

Almost all of the transition probabilities are zero or one.

几乎所有的转移概率都是零或一。

There is only one place in the Markov chain where branching happens.

马尔可夫链中只有一个地方发生分支。

After my, the words directories, files, or photos might appear, each with a different probability.

在 my 之后,可能会出现 directories、files 或 photos 这几个词,每个词都有不同的概率。

Other than that, there’s no uncertainty about which word will come next.

除此之外,下一个词是什么没有任何不确定性。

That certainty is reflected by having mostly ones and zeros in the transition matrix.

这种确定性反映在转移矩阵中大部分元素都是 1 和 0。
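
One way to check the single branching point is to look for rows with more than one nonzero entry. The matrix below is written out by hand from the probabilities in the text, with the alphabetical word order and an all-zero "please" row as assumptions of this sketch.

```python
vocabulary = ["directories", "files", "me", "my", "photos", "please", "show"]

# Rows are the current word, columns are the next word (same order as above).
transition_matrix = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0],  # directories -> please
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0],  # files -> please
    [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0],  # me -> my
    [0.2, 0.3, 0.0, 0.0, 0.5, 0.0, 0.0],  # my -> directories, files, photos
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0],  # photos -> please
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # please (assumed end of command)
    [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0],  # show -> me
]

# A word branches if more than one next word has nonzero probability.
branching_words = [
    word
    for word, row in zip(vocabulary, transition_matrix)
    if sum(1 for probability in row if probability > 0) > 1
]

print(branching_words)  # ['my']
```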

We can revisit our trick of using matrix multiplication with a one-hot vector to pull out the transition probabilities associated with any given word.

我们可以重新审视我们的技巧,即使用矩阵乘法和 one-hot 向量来提取与任何给定单词相关的转移概率。

For instance, if we just wanted to isolate the probabilities of which word comes after my,

例如,如果我们只是想分离出哪个词出现在 my 之后的概率,

we can create a one-hot vector representing the word my and multiply it by our transition matrix.

我们可以创建一个表示单词 my 的 one-hot 向量,并将其乘以我们的转移矩阵。

This pulls out the relevant row and shows us the probability distribution of what the next word will be.

这将取出相关的行,并向我们显示下一个单词的概率分布。
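
The one-hot lookup can be sketched in plain Python as a row vector multiplied by the matrix. The helpers `one_hot` and `vector_times_matrix` are hypothetical names for this example, and the alphabetical word order and the all-zero "please" row are assumptions.

```python
vocabulary = ["directories", "files", "me", "my", "photos", "please", "show"]

transition_matrix = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0],  # directories
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0],  # files
    [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0],  # me
    [0.2, 0.3, 0.0, 0.0, 0.5, 0.0, 0.0],  # my
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0],  # photos
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # please (assumed end of command)
    [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0],  # show
]


def one_hot(word):
    """Vector of zeros with a single one at the word's index."""
    vector = [0.0] * len(vocabulary)
    vector[vocabulary.index(word)] = 1.0
    return vector


def vector_times_matrix(vector, matrix):
    """Row vector times matrix: result[j] = sum over i of vector[i] * matrix[i][j]."""
    return [
        sum(v * row[j] for v, row in zip(vector, matrix))
        for j in range(len(matrix[0]))
    ]


next_word_probabilities = vector_times_matrix(one_hot("my"), transition_matrix)
print(dict(zip(vocabulary, next_word_probabilities)))
```

Because the one-hot vector is zero everywhere except at the index of my, every row except the my row is multiplied by zero, so the product is exactly that row.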


