欢迎光临散文网 会员登陆 & 注册

2023阿里巴巴全球数学竞赛预选赛题/决赛部分题个人解 (五)

2023-06-26 08:59 作者:saqatl  | 我要投稿

应用与计算数学题 2. LLM Reward Collapse

给定一个特定的 prompt,让 LLM 生成 n 个回答。标注者将这 n 个回答从最好到最差排序。设 G 在 %5B-1%2C1%5D 上强凹且单调递增,我们想训练一个 reward model,为第 i 个回答分配一个 0%20%5Cle%20r_i%20%5Cle%201 的分数。奖励 r_i 应为如下优化问题的解:

%5Cmax%5Climits_%7B0%20%5Cle%20r_1%2C%20%5Cldots%2C%20r_n%20%5Cle%201%7D%20%5Csum%5Climits_%7Bi%3Cj%7D%20G(r_i%20-%20r_j)

(a) 证明

L(r_1%2C%20%5Cldots%2C%20r_n)%20%3A%3D%20%5Csum%5Climits_%7Bi%3Cj%7D%20G(r_%7Bi%7D%20-%20r_%7Bj%7D)

是凹的,并证明上述优化问题有唯一解 (r_1%5E*%2C%20%5Cldots%2C%20r_n%5E*)%20%5Cin%20%5B0%2C1%5D%5En

(b) 证明上述解满足:(1) 1%20%3D%20r_%7B1%7D%5E*%20%5Cge%20r_%7B2%7D%5E*%20%5Cge%20%5Ccdots%20%5Cge%20r_%7Bn%7D%5E*%20%3D%200;(2) 对任意 %20i%3D%201%2C%5Cldots%2Cnr_%7Bi%7D%5E*%20%2B%20r_%7Bn%2B1-i%7D%5E*%20%3D%201

(c) 设当 n%20%5Cto%20%5Cinfty 时,r_1%2C%20%5Cldots%2C%20r_n 的经验分布收敛于区间 %5B0%2C1%5D 上的概率测度 %5Cmu。此时优化问题即为

%5Csup%5Climits_%7B%5Cmu%7D%20%5Cmathbb%7BE%7D_%7BX%2CX'%20%5Cmathop%20%20%5Csim%20%5Climits%5E%7Bi.i.d.%7D%20%5Cmu%7D%20G(%7CX%20-%20X'%7C)

若 %5Cmathbb%7BE%7D_%7BX%2CX'%20%5Cmathop%20%20%5Csim%20%5Climits%5E%7Bi.i.d.%7D%20%5Cmu%7D%20G(%7CX%20-%20X'%7C) 在一个概率测度 %5Cmu%5E* 上取得最大值,证明:%5Cmathbb%7BE%7D_%7BX%20%5Csim%20%5Cmu%5E*%7DG(%7CX%20-%20c%7C) 不依赖于 c%20%5Cin%20%5B0%2C1%5D

(a) 与 (b):该优化问题的解记为 %5Ctextbf%20r%5E*。直接由定义得

%5Cbegin%7Baligned%7D%0A%0AL(%5Ctextbf%20r)%20%2B%20L(%5Ctextbf%20r')%20%26%3D%20%5Csum%5Climits_%7Bi%3Cj%7D%20G(r_i%20-%20r_j)%20%2B%20G(r_i'%20-%20r_j')%20%5C%5C%0A%0A%26%20%5Cle%20%5Csum%5Climits_%7Bi%3Cj%7D%202G%5Cleft(%5Cfrac%7Br_i%20%2B%20r_i'%20-%20r_j%20-%20r_j'%7D%7B2%7D%5Cright)%20%3D%202L%5Cleft(%5Cfrac%7B%5Ctextbf%20r%20%2B%20%5Ctextbf%20r'%7D%7B2%7D%5Cright)%0A%0A%5Cend%7Baligned%7D

因此 L 是凹的。特别地,上述不等式取等当且仅当对任意 1%20%5Cle%20i%20%3C%20j%20%5Cle%20n 都有 r_i%20-%20r_j%20%3D%20r_i'%20-%20r_j'。由于 G 是递增的,因此 r_1%5E*%20%3D%201,这说明优化问题的解是唯一的(若有两个不同的解 %5Ctextbf%20r_1%2C%20%5Ctextbf%20r_2,则 %5Cfrac%7B%5Ctextbf%20r_1%20%2B%20%5Ctextbf%20r_2%7D%7B2%7D 使得 L 更大)。

下面再证明最优解满足 r_1%5E*%20%5Cge%20r_2%5E*%20%5Cge%20%5Ccdots%20%5Cge%20r_n%5E*。若不然,则存在 k%20%5Cge%200 使得 r_1%5E*%20%5Cge%20r_2%5E*%20%5Cge%20%5Ccdots%20%5Cge%20r_k%5E* 但 r_k%5E*%20%3C%20r_%7Bk%2B1%7D%5E*。但此时考察

%5Ctilde%7Br%7D_i%20%3D%20%5Cbegin%7Bcases%7D%0A%0Ar_i%5E*%20%26%20i%20%5Cne%20k%2C%20k%2B1%5C%5C%0A%0Ar_%7Bk%2B1%7D%5E*%20%26%20i%20%3D%20k%5C%5C%0A%0Ar_k%5E*%20%26%20i%20%3D%20k%2B1%0A%0A%5Cend%7Bcases%7D

此时

%5Csum%5Climits_%7Bi%3Cj%7DG(r_i%5E*%20-%20r_j%5E*)%20-%20%5Csum%5Climits_%7Bi%3Cj%7DG(%5Ctilde%20r_i%20-%20%5Ctilde%20r_j)%20%3D%20G(r_k%5E*%20-%20r_%7Bk%2B1%7D%5E*)%20-%20G(r_%7Bk%2B1%7D%5E*%20-%20r_k%5E*)%20%3C%200

这与 %5Ctextbf%20r%5E* 的最优性矛盾。

最后,令 %5Ctilde%20r_i%20%3D%201%20-%20r_%7Bn-i%2B1%7D%5E*,则 %5Ctilde%20r_i%20-%20%5Ctilde%20r_j%20%3D%20r_%7Bn-j%2B1%7D%5E*%20-%20r_%7Bn-i%2B1%7D%5E*,故 L(%5Ctextbf%20r%5E*)%20%3D%20L(%5Ctilde%20%7B%5Ctextbf%20r%7D)。根据解的唯一性可知 %5Ctextbf%20r%5E*%20%3D%20%5Ctilde%20%7B%5Ctextbf%20r%7D,因此 r_%7Bi%7D%5E*%20%2B%20r_%7Bn%2B1-i%7D%5E*%20%3D%201

(c) %5Cmu%5E* 为最优解 %5Ctextbf%20r%5E* 在 n%20%5Cto%20%5Cinfty 时所收敛的概率测度,设其对应的概率密度为 f%20%5Cge%20a%20%3E%200,其中 a 为常数。令

%5Cmathcal%20F%5Bf%5D%20%3D%20%5Cmathbb%7BE%7D_%7BX%2CX'%20%5Cmathop%20%20%5Csim%20%5Climits%5E%7Bi.i.d.%7D%20%5Cmu%5E*%7D%20G(%7CX%20-%20X'%7C)%20%3D%20%5Cint_%7B%5B0%2C1%5D%5E2%7D%20G(%7Cx%20-%20y%7C)f(x)f(y)dxdy

%5Cmathbb%7BE%7D_%7BX%20%5Csim%20%5Cmu%5E*%7DG(%7CX%20-%20c%7C)%20%3D%20%5Cfrac%7B%5Cdelta%20%5Cmathcal%7BF%7D%7D%7B%5Cdelta%20f%7D(c)%20%3D%20%5Cint_%7B%5B0%2C1%5D%7D%20G(%7Cx%20-%20c%7C)f(x)dx

由于 G 在紧集 %5B0%2C1%5D 上连续,那么 %5Cint_%7B%5B0%2C1%5D%7D%20G(%7Cx%20-%20c%7C)f(x)dx 也是连续的。因此,如果 %5Cint_%7B%5B0%2C1%5D%7D%20G(%7Cx%20-%20c%7C)f(x)dx 不是常数,那么我们可以考虑 f%20%2B%20%5Cdelta%20f 使得 %5Cint%20%5Cdelta%20f%20%3D%200%5C%7C%5Cdelta%20f%5C%7C_%7B%5Cinfty%7D%20%3C%20a%2F2,此时 %5Cmathcal%20F%5Bf%20%2B%20%5Cdelta%20f%5D%20%3E%20%5Cmathcal%20F%5Bf%5D,矛盾。

下面的题赛时并没有选,只是出于个人喜好和某人一起做的。

组合与概率题 5. 证明:对任意 %5Cvarepsilon%20%3E%200,存在 n_0 使得任取 %20n%20%5Cge%20n_0,所有边数不少于 n%5E%7B1%20%2B%20%5Cvarepsilon%7D 的 n 阶简单图包含一个圈 C 满足:其至少有 %7CE(C)%7C 条弦。这里 %7C%5Ccdot%7C 是集合的势。(圈 C是一条边,其连接 C 中两个顶点但不属于 C 的边集 E(C)。)

先来吐槽这个题,一方面是因为专栏公式又要插不下了,另一方面是这是个 open problem,直到开赛前 3 天有一篇文章挂在 arxiv 上解决了这个问题(https://arxiv.org/abs/2306.09157),就很离谱。


2023阿里巴巴全球数学竞赛预选赛题/决赛部分题个人解 (五)的评论 (共 条)

分享到微博请遵守国家法律