
Derivation of the Density Transformation under a Change of Variables (PRML Eq. 1.27)

2022-11-23 16:00 Author: 淡蓝小点Bluedotdot

This article is excerpted from note point 1-020 in Chapter 1 of the 淡蓝小点 PRML Page-by-page project notes (corresponding to PRML page 18). The Chapter 1 videos have not yet been uploaded to Bilibili (some editing and proofreading remains). If you need the notes, please add me on WeChat (WeChat name: 淡蓝小点Bluedotdot, WeChat ID: bluedotdot_cn). The 淡蓝小点 websites (https://bluedotdot.cn, https://bluedotdot.com.cn) are still under construction and cannot be visited yet.

Let p_x(x) be a probability density function of x, and suppose there is a change of variable g such that x = g(y), with g a bijection. How do we obtain the probability density function p_y(y) of y? The most tempting idea is simply to substitute g(y) for x in p_x(x), giving p_x(g(y)). But is that already the density p_y(y) of y? The answer is no. The correct result of replacing x by g(y) is the following (this is Eq. 1.27 of PRML):

p_y(y) = p_x(x)\left|\frac{\mathrm{d}x}{\mathrm{d}y}\right| = p_x(g(y))\,|g'(y)|

Compared with direct substitution, it carries an extra factor: the absolute value of the derivative. As an example, let x be uniformly distributed on (0, 1), so p_x(x) = 1, and let x = 2y, i.e. g(y) = 2y. Direct substitution gives

p_y(y) = p_x(g(y)) = p_x(2y) = 1

Note that p_x(x) = 1 is a constant function, independent of its argument, so after the substitution we still have p_x(2y) = 1. Since x \in (0, 1), it follows that y \in (0, 1/2). Integrating the expression above therefore gives

\int p_y(y)\,\mathrm{d}y = \int_0^{1/2} 1\,\mathrm{d}y = y\Big|_0^{1/2} = \frac{1}{2}

The integral is not 1, so this p_y(y) cannot be a valid probability density. The result given by Eq. 1.27, on the other hand, is

p_y(y) = p_x(g(y))\left|\frac{\mathrm{d}x}{\mathrm{d}y}\right| = 1 \times 2 = 2

so the integral is

\int p_y(y)\,\mathrm{d}y = \int_0^{1/2} 2\,\mathrm{d}y = 2y\Big|_0^{1/2} = 1
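This toy example is easy to check numerically. Below is a minimal sketch (assuming NumPy is available; the sample size and bin count are arbitrary choices of mine): sampling x ~ U(0, 1) and setting y = x/2 produces an empirical density close to the constant 2 on (0, 1/2), matching Eq. 1.27 rather than the naive substitution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample x ~ Uniform(0, 1); the change of variables x = 2y means y = x / 2.
x = rng.uniform(0.0, 1.0, size=1_000_000)
y = x / 2.0

# Empirical density of y on (0, 1/2); density=True rescales the histogram
# so that it integrates to 1 over the binned range.
hist, edges = np.histogram(y, bins=50, range=(0.0, 0.5), density=True)

# Every bin is close to 2, matching p_y(y) = p_x(2y) * |dx/dy| = 1 * 2,
# not the naive p_x(2y) = 1.
print(hist.min(), hist.max())
```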

So how is Eq. 1.27 in the book obtained (see also Section 2.6.2 of MLAPP)? The transformation above was written as x = g(y); now write y = f(x), i.e. f = g^{-1}. Let F_y(Y) be the cumulative distribution function of y, that is,

F_y(Y) = P_y(y \le Y) = \int_{-\infty}^{Y} p_y(y)\,\mathrm{d}y

and therefore, differentiating the CDF with respect to its argument,

\frac{\mathrm{d} F_y(y)}{\mathrm{d} y} = p_y(y)

Note that this p_y(y) is exactly the density of y that we want. Substituting f(x) for y gives

F_y(Y) = P_y(y \le Y) = P_y(f(x) \le Y)

When y = f(x) is increasing, f(x) \le Y is equivalent to x \le f^{-1}(Y): applying the increasing function f preserves the inequality, so x \le f^{-1}(Y) gives f(x) \le f(f^{-1}(Y)) = Y, and conversely. When y = f(x) is decreasing, f(x) \le Y is equivalent to x \ge f^{-1}(Y): applying the decreasing function f reverses the inequality, so x \ge f^{-1}(Y) gives f(x) \le f(f^{-1}(Y)) = Y. The expression above can therefore be written as

F_y(Y) =
\begin{cases}
P_x(x \le f^{-1}(Y)) & f\ \text{increasing} \\
P_x(x \ge f^{-1}(Y)) & f\ \text{decreasing}
\end{cases}
=
\begin{cases}
P_x(x \le f^{-1}(Y)) & f\ \text{increasing} \\
1 - P_x(x \le f^{-1}(Y)) & f\ \text{decreasing}
\end{cases}

Therefore, evaluating the CDF at the point Y = y (so that x = f^{-1}(y)) and differentiating, we get

\begin{align*}
p_y(y) = \frac{\mathrm{d} F_y(y)}{\mathrm{d} y}
&=
\begin{cases}
\frac{\mathrm{d} P_x(x \le f^{-1}(y))}{\mathrm{d} x}\,\frac{\mathrm{d}x}{\mathrm{d}y} & f\ \text{increasing} \\
\frac{\mathrm{d} \{1 - P_x(x \le f^{-1}(y))\}}{\mathrm{d} x}\,\frac{\mathrm{d}x}{\mathrm{d}y} & f\ \text{decreasing}
\end{cases} \\
&=
\begin{cases}
p_x(x)\,\frac{\mathrm{d}x}{\mathrm{d}y} & f\ \text{increasing} \\
-\,p_x(x)\,\frac{\mathrm{d}x}{\mathrm{d}y} & f\ \text{decreasing}
\end{cases} \\
&= p_x(x)\left|\frac{\mathrm{d}x}{\mathrm{d}y}\right|
\end{align*}

The last step holds because if f(x) is increasing then dy/dx > 0 (a positive derivative means the function increases), and hence dx/dy > 0 as well (the latter can be viewed as the reciprocal of the former); if f(x) is decreasing then dy/dx < 0, and hence dx/dy < 0 as well. Therefore

\left|\frac{\mathrm{d}x}{\mathrm{d}y}\right| =
\begin{cases}
\frac{\mathrm{d}x}{\mathrm{d}y} & f\ \text{increasing} \\
-\,\frac{\mathrm{d}x}{\mathrm{d}y} & f\ \text{decreasing}
\end{cases}
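As a numerical sanity check of the CDF argument, the sketch below (assuming SciPy is available; the standard-normal p_x and the decreasing map y = f(x) = exp(-x) are my own illustrative choices, not from the book) builds F_y(Y) = 1 - P_x(x \le f^{-1}(Y)), differentiates it numerically, and compares the result against p_x(g(y))|g'(y)|.

```python
import numpy as np
from scipy.stats import norm

# Illustrative choices: p_x is the standard normal density and y = f(x) = exp(-x),
# a decreasing map, so x = g(y) = -log(y) and |dx/dy| = 1/y.
g = lambda y: -np.log(y)

y = np.linspace(0.2, 3.0, 400)

# CDF route (decreasing case): F_y(Y) = 1 - P_x(x <= f^{-1}(Y)).
F_y = 1.0 - norm.cdf(g(y))
p_y_cdf = np.gradient(F_y, y)     # numerical derivative dF_y/dy

# Change-of-variables route (Eq. 1.27): p_y(y) = p_x(g(y)) |g'(y)|.
p_y_127 = norm.pdf(g(y)) / y

# The two agree up to finite-difference error.
print(np.max(np.abs(p_y_cdf - p_y_127)))
```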

This establishes Eq. 1.27 from the book. We can also look at the derivation of Eq. 1.27 from another angle. Let the range of x be (a, b) and the range of y be (\alpha, \beta). Substituting x = g(y), and noting that an increasing g maps (\alpha, \beta) onto (a, b) in the same orientation while a decreasing g maps it onto (b, a), we have, depending on whether g(y) is increasing or decreasing,

\begin{align*}
\int_{\alpha}^{\beta} p_y(y)\,\mathrm{d}y &= \int_{a}^{b} p_x(x)\,\mathrm{d}x \\
&=
\begin{cases}
\int_{\alpha}^{\beta} p_x(g(y))\,g'(y)\,\mathrm{d}y & g\ \text{increasing} \\
\int_{\beta}^{\alpha} p_x(g(y))\,g'(y)\,\mathrm{d}y & g\ \text{decreasing}
\end{cases} \\
&=
\begin{cases}
\int_{\alpha}^{\beta} p_x(g(y))\,g'(y)\,\mathrm{d}y & g\ \text{increasing} \\
-\int_{\alpha}^{\beta} p_x(g(y))\,g'(y)\,\mathrm{d}y & g\ \text{decreasing}
\end{cases} \\
&= \int_{\alpha}^{\beta} p_x(g(y))\,|g'(y)|\,\mathrm{d}y
\end{align*}

The same substitution works over any subinterval (\alpha, Y) with Y \in (\alpha, \beta), since the probability mass of y in (\alpha, Y) equals that of x in the corresponding interval of x values. Differentiating both sides with respect to the upper limit Y then gives

p_y(Y) = \frac{\mathrm{d}}{\mathrm{d}Y}\int_{\alpha}^{Y} p_y(y)\,\mathrm{d}y = \frac{\mathrm{d}}{\mathrm{d}Y}\int_{\alpha}^{Y} p_x(g(y))\,|g'(y)|\,\mathrm{d}y = p_x(g(Y))\,|g'(Y)|

which, after renaming Y back to y, is again p_y(y) = p_x(g(y))|g'(y)|, i.e. Eq. 1.27.
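The substitution identity can also be checked by direct quadrature. In the sketch below (my own illustrative choice: x ~ Exponential(1) on (0, \infty) and the decreasing map x = g(y) = -log y, which sends y \in (0, 1) to x \in (0, \infty)), the integrand p_x(g(y))|g'(y)| again integrates to 1:

```python
import numpy as np
from scipy.integrate import quad

# Illustrative choices: p_x(x) = exp(-x) on (0, inf) and x = g(y) = -log(y),
# a decreasing bijection from (0, 1) onto (0, inf), with |g'(y)| = 1/y.
p_x = lambda x: np.exp(-x)
g = lambda y: -np.log(y)
abs_dg = lambda y: 1.0 / y

lhs, _ = quad(lambda y: p_x(g(y)) * abs_dg(y), 0.0, 1.0)   # integral of p_y over (alpha, beta)
rhs, _ = quad(p_x, 0.0, np.inf)                            # integral of p_x over (a, b)

print(lhs, rhs)   # both ~1.0: p_x(g(y))|g'(y)| is again a valid density
```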

Having derived the density of the transformed random variable y, we now turn to the question of extrema. Suppose y^\ast is the mode of p_y(y) (that is, p_y(y^\ast) is the maximum of p_y(y), so p_y'(y^\ast) = 0).

Can we then conclude that x^\ast = g(y^\ast) is the mode of p_x(x)? The answer is again no: in general p_x(x^\ast) need not be the maximum, i.e. in general p_x'(x^\ast) \neq 0. Put more plainly, you cannot obtain y^\ast by maximising p_y(y) (e.g. as a maximum-likelihood estimate) and then push it through g to treat x^\ast = g(y^\ast) as the maximiser of p_x(x).

Note, however, that this is only the general case. In the special case where g is a linear transformation, the conclusion does hold: when g is linear, x^\ast = g(y^\ast) is also the maximiser of p_x(x). We now prove this.

From Eq. 1.27 we have p_y(y) = p_x(g(y))\, s\, g'(y), where s \in \{-1, +1\} is the sign picked up when the absolute value is removed. Differentiating this expression gives:

p_y'(y) = s\,p_x'(g(y))\,\{g'(y)\}^2 + s\,p_x(g(y))\,g''(y)
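This is just the product and chain rules applied to p_x(g(y))\, s\, g'(y). A quick symbolic check of the expansion, under a concrete and arbitrary choice of p_x and g (a standard normal p_x and the increasing map g(y) = exp(y), so s = +1), can be done with SymPy:

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
s = 1                                    # g(y) = exp(y) is increasing, so s = +1

# Illustrative choices: p_x standard normal, x = g(y) = exp(y).
p_x = sp.exp(-x**2 / 2) / sp.sqrt(2 * sp.pi)
g = sp.exp(y)

p_y = s * p_x.subs(x, g) * sp.diff(g, y)          # p_y(y) = s * p_x(g(y)) * g'(y)

lhs = sp.diff(p_y, y)                             # p_y'(y) computed directly
rhs = (s * sp.diff(p_x, x).subs(x, g) * sp.diff(g, y)**2
       + s * p_x.subs(x, g) * sp.diff(g, y, 2))   # the expansion above

print(sp.simplify(lhs - rhs))                     # 0: the expansion holds
```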

Suppose x^\ast is an extremum of p_x(x), and let y^\ast be the corresponding point under x = g(y) (so that x^\ast = g(y^\ast)). Is y^\ast necessarily an extremum of p_y(y), i.e. do we necessarily have p_y'(y^\ast) = 0? Substituting x^\ast and y^\ast into the expression above gives

p_y'(y^\ast) = s\,p_x'(g(y^\ast))\,\{g'(y^\ast)\}^2 + s\,p_x(g(y^\ast))\,g''(y^\ast)

The first term on the right-hand side is necessarily 0, since p_x'(g(y^\ast)) = p_x'(x^\ast) = 0, but the second term need not be 0 in general (when g is linear, g''(y) \equiv 0, so the second term also vanishes and p_y'(y^\ast) = 0, which is exactly the linear special case claimed above)...
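To see the second term at work, here is a short numerical illustration (the Gaussian p_x and the particular nonlinear map are my own arbitrary choices for illustration): with a nonlinear g the maximiser of p_y(y) does not map onto the maximiser of p_x(x), whereas with a linear g it does.

```python
import numpy as np
from scipy.stats import norm

# Illustrative choice: p_x is N(6, 1); its mode is at x = 6.
mu = 6.0
p_x = lambda x: norm.pdf(x, loc=mu, scale=1.0)

def p_y(y, g, g_prime):
    """Density of y under x = g(y), via Eq. 1.27."""
    return p_x(g(y)) * np.abs(g_prime(y))

# Nonlinear case: x = g(y) = log(y) - log(1 - y) + 5 (a shifted logit).
g_nl = lambda y: np.log(y) - np.log(1.0 - y) + 5.0
dg_nl = lambda y: 1.0 / y + 1.0 / (1.0 - y)
y = np.linspace(0.01, 0.99, 100_000)
y_star = y[np.argmax(p_y(y, g_nl, dg_nl))]
print(g_nl(y_star), mu)        # g(y*) is around 6.7, not the mode of p_x at 6

# Linear case: x = g(y) = 2y + 4; here the mode does carry over.
g_lin = lambda y: 2.0 * y + 4.0
dg_lin = lambda y: np.full_like(y, 2.0)
y_lin = np.linspace(0.5, 1.5, 100_000)
y_star_lin = y_lin[np.argmax(p_y(y_lin, g_lin, dg_lin))]
print(g_lin(y_star_lin), mu)   # both ~6.0
```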

[This topic is not yet finished, but the number of formulas that can be inserted in a single Bilibili article has reached its limit, so it cannot be continued here. If you need the rest, please contact me for the notes...]
