Random Text Generation with Markov Chains
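These past few days I've been playing Caves of Qud, a traditional roguelike. It is set in a post-apocalyptic world where people use water as the universal medium of exchange, and they also perform water rituals with it, which improve your standing with the various factions. The contents of the in-game books are randomly generated; a friend in a chat group told me they use a hidden Markov chain. Out of curiosity I looked up some material and tried generating text with a Markov chain myself. Below is text generated by a model I wrote, using Soul Music (the Discworld novel by Terry Pratchett) as the corpus: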
You are certain if you can be an immortal and took the Death of his own, and took the Discworld, or Mort for the Discworld, on the Discworld, someone she'd known to become sixteen but she knew how to his own, and took the dimensions. But if it were, well, nothing against horses, or later had Eg and took the Death started to become real sorry. Or what, for the Death started to his home yet. Er. Got it!' he believed in the Discworld, or later and took the Discworld, on business of the Discworld, someone married then rolled up a story about sex and this much to become accustomed and took the dimensions. And another tooth.'
And then, eventually, and this shows that he later hired and this silliness.'
And another one, and this much can hardly existed at age of his own, once been then rolled his home between the Death sat under the Discworld, someone she'd already circling the Death which was probably safe from the Discworld, or Mort was an apprentice but it were, well, nothing here,' said that the dimensions. But first thing regardless then got up a story or Mort lost in the Death of his own, and still sitting there are interviewing or Mort was probably true. But the Discworld, or later take a story about memory. And then, in the Discworld, then said, ' I did Miss Butts shuffled the Death of his home between the dimensions. And another direction. His brow or later take a story but it is a story about sex and took her feel better. It doesn't take an apprentice then glanced at the Discworld, then said, rolling fields, and took the dimensions. But first thing regardless or later another river that you ran smoothly or later another direction.
It reads like utterly unserious nonsense, but if you ever need to generate random nonsense text for something, this post might help.
How it works
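A Markov chain is memoryless: the next state depends only on the current state. Put informally, what happens today depends only on yesterday, and what happens tomorrow depends only on today. For text, each word is a state. Suppose we start with just the word I. Many words can follow I, such as and or am. If we pick and, the text so far is "I and". Next we look only at and: words that can follow it include you, him and her, and at this point the candidates have nothing to do with I any more. If we pick you, the text becomes "I and you", and we then look for words that can follow you.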
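To make the walk concrete, here is a tiny Lua sketch of it. The transition table is hand-written and hypothetical (it only contains the words from the example), and nextWord is an illustrative helper name. Each step consults only the current word, which is exactly the Markov property described above.

```lua
-- Hypothetical hand-written transition table mirroring the example:
-- each key is the current word, each value lists words that may follow it.
local transitions = {
  ["I"]   = { "and", "am" },
  ["and"] = { "you", "him", "her" },
}

-- Pick the next word using ONLY the current word (the Markov property).
-- Returns nil when the current word has no known successors.
local function nextWord(current)
  local options = transitions[current]
  if options == nil or #options == 0 then
    return nil
  end
  return options[math.random(#options)]
end

-- Walk the chain from "I" until we reach a word with no successors.
local words = { "I" }
local current = "I"
while true do
  local nxt = nextWord(current)
  if nxt == nil then break end
  table.insert(words, nxt)
  current = nxt
end
print(table.concat(words, " "))
```

Because am, you, him and her have no entries of their own, this walk stops after two or three words, printing something like "I am" or "I and her".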

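Next comes the corpus: to generate anything, we need some existing material to learn from. For example, given the sentence "I am Feishiko, I like play games.", we can split it on spaces into I/am/Feishiko,/I/like/play/games.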
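We then feed these words to our model. In this sentence I is followed once by am and once by like, so starting from I the model might generate: I like play games.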
Code
As before, I'll use Lua for the examples, since that's what I wrote the model in.
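First, we create a function. It takes three inputs: a text source to build the corpus from, the number of words to output, and the first keyword to start from.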
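Next, we split the text source into keywords and store them in the table listKey, that is, we break the text into words and make each word one element of a one-dimensional array.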
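A minimal sketch of this step, assuming whitespace is the only separator; splitWords is a hypothetical helper name, while listKey follows the text. The Lua pattern %S+ matches runs of non-whitespace characters, so tabs and newlines also act as separators, which is convenient when the source is a whole book.

```lua
-- Split a text source into an array of words, breaking on whitespace.
-- Punctuation stays attached to its word (e.g. "Feishiko,"), as in the example.
local function splitWords(source)
  local listKey = {}
  for word in string.gmatch(source, "%S+") do
    table.insert(listKey, word)
  end
  return listKey
end

local listKey = splitWords("I am Feishiko, I like play games.")
print(table.concat(listKey, " / "))
-- prints: I / am / Feishiko, / I / like / play / games.
```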
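Then we insert the keywords into a new table, which plays the role of a dictionary in other languages: each key is a string, and each value is an array whose elements are the words that can follow that key. (This is still written inside the same function.)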
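One way to build that dictionary, as a sketch: buildChain is a hypothetical name, and keeping duplicate successors is my assumption rather than something stated above. With duplicates kept, a word that often follows a given key appears several times in that key's array, so a uniform random pick later favours it proportionally.

```lua
-- Build the "corpus" table: key = a word, value = array of every word
-- that directly follows it in the source. Duplicates are kept on purpose,
-- so frequent successors are proportionally more likely to be picked later.
local function buildChain(listKey)
  local chain = {}
  for i = 1, #listKey - 1 do
    local key, following = listKey[i], listKey[i + 1]
    if chain[key] == nil then
      chain[key] = {}
    end
    table.insert(chain[key], following)
  end
  return chain
end

local listKey = { "I", "am", "Feishiko,", "I", "like", "play", "games." }
local chain = buildChain(listKey)
for _, w in ipairs(chain["I"]) do
  print("I -> " .. w)
end
-- prints:
-- I -> am
-- I -> like
```

The last word ("games.") never becomes a key, because nothing follows it in the source.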
Next comes the code that generates text from the corpus we just trained.
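A sketch of the generation step, assuming a chain table like the one described in the previous step and a uniform random pick among the recorded successors; generate and the sample chain are hypothetical. It stops early when the current word has no recorded successor, for example when it only appears at the very end of the source.

```lua
math.randomseed(os.time())

-- Generate up to `count` words starting from `firstWord`, choosing each next
-- word uniformly at random from the successors recorded for the current word.
local function generate(chain, firstWord, count)
  local output = { firstWord }
  local current = firstWord
  while #output < count do
    local options = chain[current]
    if options == nil or #options == 0 then
      break -- dead end: this word has no recorded successor
    end
    current = options[math.random(#options)]
    table.insert(output, current)
  end
  return table.concat(output, " ")
end

-- Hypothetical chain built from "I am Feishiko, I like play games."
local chain = {
  ["I"] = { "am", "like" },
  ["am"] = { "Feishiko," },
  ["Feishiko,"] = { "I" },
  ["like"] = { "play" },
  ["play"] = { "games." },
}
print(generate(chain, "I", 20))
```

Note that the chain can loop (I, am, Feishiko, I, ...), which is why the word limit matters.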
That completes the function. Next, we need to pass our text in.
Then we call the function.
Putting everything together gives the complete code.
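For reference, here is one way the steps could fit together as a single self-contained Lua function. It is a sketch, not the original program: markovGenerate, the sample source and the corpus file name in the comment are hypothetical, and the parameters follow the three inputs listed earlier (text source, word count, first word).

```lua
-- Minimal self-contained Markov chain text generator (a sketch).
-- source = text to learn from, count = max number of words to output,
-- firstWord = the word to start from.
local function markovGenerate(source, count, firstWord)
  -- 1. Split the source into words on whitespace.
  local listKey = {}
  for word in string.gmatch(source, "%S+") do
    table.insert(listKey, word)
  end

  -- 2. Map each word to the array of words that follow it.
  local chain = {}
  for i = 1, #listKey - 1 do
    local key = listKey[i]
    chain[key] = chain[key] or {}
    table.insert(chain[key], listKey[i + 1])
  end

  -- 3. Walk the chain, picking a random successor at each step.
  local output = { firstWord }
  local current = firstWord
  while #output < count do
    local options = chain[current]
    if options == nil then break end -- no known successor: stop early
    current = options[math.random(#options)]
    table.insert(output, current)
  end
  return table.concat(output, " ")
end

math.randomseed(os.time())

-- Pass the text in. To use a whole book as the corpus, read it from a file:
-- local f = assert(io.open("corpus.txt", "r")); local source = f:read("*a"); f:close()
local source = "I am Feishiko, I like play games."

print(markovGenerate(source, 30, "I"))
```

With a one-sentence corpus the output mostly echoes or loops over the input; the kind of nonsense shown at the top of this post needs a large corpus such as a whole novel.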
When will this site support Markdown? Posting articles here is torture.