
A Brief Look at Stable Diffusion (Part 1½)

2023-03-20 22:33 · Author: FDX01

Preface

Let's first review the generative models that were common before: GANs, VAEs, and flow-based models.

They have achieved great success at generating high-quality samples, but each has its own limitations. GANs are prone to unstable training and limited sample diversity because of their adversarial training; VAEs rely on a surrogate loss; flow models must use specialized architectures to construct invertible transforms.

Diffusion models are inspired by non-equilibrium thermodynamics. They define a Markov chain of diffusion steps that gradually adds random noise to the data, and then learn to reverse this diffusion process so as to construct the desired data samples from noise. Unlike VAEs or flow models, diffusion models learn with a fixed procedure, and the latent variables are high-dimensional (the same dimensionality as the original data).

[Figure: Overview of different types of generative models]


So what exactly is a diffusion model?

Several important works:

Diffusion probabilistic models (Sohl-Dickstein et al., 2015)

Noise-conditioned score networks (NCSN) (Song & Ermon, 2019)

Denoising diffusion probabilistic models (DDPM) (Ho et al., 2020)


The forward process of diffusion models

Given a real image $\mathbf{x}_0 \sim q(\mathbf{x})$, the forward diffusion process adds Gaussian noise to it over $T$ cumulative steps, producing $x_1, x_2, \ldots, x_T$ (the $q$ process in the figure below). This requires a schedule of Gaussian variance hyperparameters $\{\beta_t \in (0,1)\}_{t=1}^T$. Since each step $t$ of the forward process depends only on step $t-1$, it can be viewed as a Markov process:


$$q\left(\mathbf{x}_t \mid \mathbf{x}_{t-1}\right)=\mathcal{N}\left(\mathbf{x}_t ; \sqrt{1-\beta_t}\, \mathbf{x}_{t-1}, \beta_t \mathbf{I}\right) \qquad q\left(\mathbf{x}_{1:T} \mid \mathbf{x}_0\right)=\prod_{t=1}^T q\left(\mathbf{x}_t \mid \mathbf{x}_{t-1}\right)$$



During this process, $x_t$ gets closer and closer to pure noise as $t$ grows. As $T \rightarrow \infty$, $x_T$ becomes pure Gaussian noise (this is proved below, and also depends on the choice of the mean coefficient $\sqrt{1-\beta_t}$). In practice, $\beta_t$ increases with $t$, i.e. $\beta_1 < \beta_2 < \ldots < \beta_T$. In the GLIDE code, $\beta_t$ is linearly interpolated from 0.0001 to 0.02 (with $T=1000$ as the baseline; as $T$ increases, $\beta_t$ is scaled down accordingly).

[Figure: The Markov chain of forward (reverse) diffusion process of generating a sample by slowly adding (removing) noise. (Image source: Ho et al. 2020, with a few additional annotations)]
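To make the schedule concrete, here is a minimal PyTorch sketch of the linear $\beta_t$ schedule described above (the helper name `linear_beta_schedule` is hypothetical), together with the quantities $\alpha_t = 1-\beta_t$ and $\bar{\alpha}_t = \prod_{s \le t} \alpha_s$ used throughout the rest of this post:

```python
import torch

def linear_beta_schedule(T: int = 1000) -> torch.Tensor:
    # Linear schedule from 0.0001 to 0.02 (defined at T = 1000); the
    # endpoints are rescaled by 1000/T so that a larger T gives a
    # correspondingly smaller per-step beta_t, as noted above.
    scale = 1000.0 / T
    return torch.linspace(scale * 1e-4, scale * 0.02, T)

betas = linear_beta_schedule(1000)         # beta_1 ... beta_T, increasing
alphas = 1.0 - betas                       # alpha_t = 1 - beta_t
alpha_bars = torch.cumprod(alphas, dim=0)  # bar(alpha)_t = prod_{s<=t} alpha_s
print(alpha_bars[-1].item())               # ~4e-5: x_T is nearly pure noise
```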


A nice property of the above process is that we can sample $\mathbf{x}_t$ at any arbitrary timestep $t$ in closed form, using the reparameterization trick. Let $\alpha_t = 1 - \beta_t$ and $\bar{\alpha}_t = \prod_{i=1}^t \alpha_i$:


$$\begin{aligned} \mathbf{x}_t &=\sqrt{\alpha_t}\, \mathbf{x}_{t-1}+\sqrt{1-\alpha_t}\, \boldsymbol{\epsilon}_{t-1} & \text{where } \boldsymbol{\epsilon}_{t-1}, \boldsymbol{\epsilon}_{t-2}, \cdots \sim \mathcal{N}(\mathbf{0}, \mathbf{I}) \\ &=\sqrt{\alpha_t \alpha_{t-1}}\, \mathbf{x}_{t-2}+\sqrt{1-\alpha_t \alpha_{t-1}}\, \overline{\boldsymbol{\epsilon}}_{t-2} & \text{where } \overline{\boldsymbol{\epsilon}}_{t-2} \text{ merges two Gaussians } (*) \\ &=\ldots \\ &=\sqrt{\bar{\alpha}_t}\, \mathbf{x}_0+\sqrt{1-\bar{\alpha}_t}\, \boldsymbol{\epsilon} \\ q\left(\mathbf{x}_t \mid \mathbf{x}_0\right) &=\mathcal{N}\left(\mathbf{x}_t ; \sqrt{\bar{\alpha}_t}\, \mathbf{x}_0,\left(1-\bar{\alpha}_t\right) \mathbf{I}\right) \end{aligned}$$
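This closed form translates directly into a few lines of code; a sketch assuming the `alpha_bars` tensor from the schedule sketch above (`q_sample` is a hypothetical helper name):

```python
import torch

def q_sample(x0, t, alpha_bars, noise=None):
    # x_t = sqrt(bar(alpha)_t) * x_0 + sqrt(1 - bar(alpha)_t) * eps,
    # i.e. one draw from q(x_t | x_0) without iterating t times.
    if noise is None:
        noise = torch.randn_like(x0)
    # Gather bar(alpha)_t per batch element and broadcast over image dims.
    ab = alpha_bars[t].view(-1, *([1] * (x0.dim() - 1)))
    return ab.sqrt() * x0 + (1.0 - ab).sqrt() * noise
```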



Recall that when we merge two Gaussians with different variances, $\mathcal{N}\left(\mathbf{0}, \sigma_1^2 \mathbf{I}\right)$ and $\mathcal{N}\left(\mathbf{0}, \sigma_2^2 \mathbf{I}\right)$, the new distribution is $\mathcal{N}\left(\mathbf{0},\left(\sigma_1^2+\sigma_2^2\right) \mathbf{I}\right)$. Here the merged standard deviation is $\sqrt{\left(1-\alpha_t\right)+\alpha_t\left(1-\alpha_{t-1}\right)}=\sqrt{1-\alpha_t \alpha_{t-1}}$.
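A quick Monte-Carlo sanity check of this merging rule, with arbitrary example values for $\alpha_t$ and $\alpha_{t-1}$:

```python
import torch

a_t, a_tm1 = 0.98, 0.99                 # arbitrary example values
eps1 = torch.randn(1_000_000)           # noise injected at step t-1
eps2 = torch.randn(1_000_000)           # noise injected at step t
merged = (a_t * (1 - a_tm1)) ** 0.5 * eps1 + (1 - a_t) ** 0.5 * eps2
print(merged.std().item())              # empirical std of the merged noise
print((1 - a_t * a_tm1) ** 0.5)         # predicted sqrt(1 - a_t * a_{t-1})
```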

Consequently, as the sample becomes noisier we can afford larger update steps, which is why $\beta_1<\beta_2<\cdots<\beta_T$ and accordingly $\bar{\alpha}_1>\cdots>\bar{\alpha}_T$.



The reverse process of diffusion models

If we could reverse the above process and sample from $q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t\right)$, we would be able to recreate a real sample from Gaussian noise input $\mathbf{x}_T \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$. Note that if $\beta_t$ is small enough, $q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t\right)$ will also be Gaussian. Unfortunately, we cannot easily estimate $q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t\right)$, because doing so would require the entire dataset; therefore we need to learn a model $p_\theta$ that approximates these conditional probabilities in order to run the reverse diffusion process.

$$p_\theta\left(\mathbf{x}_{0:T}\right)=p\left(\mathbf{x}_T\right) \prod_{t=1}^T p_\theta\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t\right) \qquad p_\theta\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t\right)=\mathcal{N}\left(\mathbf{x}_{t-1} ; \boldsymbol{\mu}_\theta\left(\mathbf{x}_t, t\right), \boldsymbol{\Sigma}_\theta\left(\mathbf{x}_t, t\right)\right)$$


[Figure: An example of training a diffusion model for modeling 2D swiss-roll data. (Image source: Sohl-Dickstein et al., 2015)]


Although we cannot obtain the reversed distribution $q\left(x_{t-1} \mid x_t\right)$ directly, if we also condition on $x_0$, Bayes' rule gives us $q\left(x_{t-1} \mid x_t, x_0\right)$:

$$q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0\right)=\mathcal{N}\left(\mathbf{x}_{t-1} ; \tilde{\boldsymbol{\mu}}\left(\mathbf{x}_t, \mathbf{x}_0\right), \tilde{\beta}_t \mathbf{I}\right)$$


Derivation:

$$\begin{aligned} q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0\right) &=q\left(\mathbf{x}_t \mid \mathbf{x}_{t-1}, \mathbf{x}_0\right) \frac{q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_0\right)}{q\left(\mathbf{x}_t \mid \mathbf{x}_0\right)} \\ &\propto \exp \left(-\frac{1}{2}\left(\frac{\left(\mathbf{x}_t-\sqrt{\alpha_t}\, \mathbf{x}_{t-1}\right)^2}{\beta_t}+\frac{\left(\mathbf{x}_{t-1}-\sqrt{\bar{\alpha}_{t-1}}\, \mathbf{x}_0\right)^2}{1-\bar{\alpha}_{t-1}}-\frac{\left(\mathbf{x}_t-\sqrt{\bar{\alpha}_t}\, \mathbf{x}_0\right)^2}{1-\bar{\alpha}_t}\right)\right) \\ &=\exp \left(-\frac{1}{2}\left(\frac{\mathbf{x}_t^2-2 \sqrt{\alpha_t}\, \mathbf{x}_t \mathbf{x}_{t-1}+\alpha_t \mathbf{x}_{t-1}^2}{\beta_t}+\frac{\mathbf{x}_{t-1}^2-2 \sqrt{\bar{\alpha}_{t-1}}\, \mathbf{x}_0 \mathbf{x}_{t-1}+\bar{\alpha}_{t-1} \mathbf{x}_0^2}{1-\bar{\alpha}_{t-1}}-\frac{\left(\mathbf{x}_t-\sqrt{\bar{\alpha}_t}\, \mathbf{x}_0\right)^2}{1-\bar{\alpha}_t}\right)\right) \\ &=\exp \left(-\frac{1}{2}\left(\left(\frac{\alpha_t}{\beta_t}+\frac{1}{1-\bar{\alpha}_{t-1}}\right) \mathbf{x}_{t-1}^2-\left(\frac{2 \sqrt{\alpha_t}}{\beta_t} \mathbf{x}_t+\frac{2 \sqrt{\bar{\alpha}_{t-1}}}{1-\bar{\alpha}_{t-1}} \mathbf{x}_0\right) \mathbf{x}_{t-1}+C\left(\mathbf{x}_t, \mathbf{x}_0\right)\right)\right) \end{aligned}$$



Following the standard Gaussian density function, the mean and variance can be parameterized as follows:

$$\begin{aligned} \tilde{\beta}_t &=1 \Big/\left(\frac{\alpha_t}{\beta_t}+\frac{1}{1-\bar{\alpha}_{t-1}}\right)=1 \Big/\left(\frac{\alpha_t-\bar{\alpha}_t+\beta_t}{\beta_t\left(1-\bar{\alpha}_{t-1}\right)}\right)=\frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t} \cdot \beta_t \\ \tilde{\boldsymbol{\mu}}_t\left(\mathbf{x}_t, \mathbf{x}_0\right) &=\left(\frac{\sqrt{\alpha_t}}{\beta_t} \mathbf{x}_t+\frac{\sqrt{\bar{\alpha}_{t-1}}}{1-\bar{\alpha}_{t-1}} \mathbf{x}_0\right) \Big/\left(\frac{\alpha_t}{\beta_t}+\frac{1}{1-\bar{\alpha}_{t-1}}\right) \\ &=\left(\frac{\sqrt{\alpha_t}}{\beta_t} \mathbf{x}_t+\frac{\sqrt{\bar{\alpha}_{t-1}}}{1-\bar{\alpha}_{t-1}} \mathbf{x}_0\right) \frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t} \cdot \beta_t \\ &=\frac{\sqrt{\alpha_t}\left(1-\bar{\alpha}_{t-1}\right)}{1-\bar{\alpha}_t} \mathbf{x}_t+\frac{\sqrt{\bar{\alpha}_{t-1}}\, \beta_t}{1-\bar{\alpha}_t} \mathbf{x}_0 \end{aligned}$$
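The two expressions above can be computed directly; a minimal sketch for a scalar timestep `t` (0-based indexing, so `t = 0` is the first step, where $\bar{\alpha}_{t-1}=1$; `q_posterior` is a hypothetical helper name):

```python
import torch

def q_posterior(x0, xt, t, betas, alphas, alpha_bars):
    # Mean and variance of q(x_{t-1} | x_t, x_0) per the formulas above.
    ab_t = alpha_bars[t]
    ab_prev = alpha_bars[t - 1] if t > 0 else torch.tensor(1.0)
    var = (1 - ab_prev) / (1 - ab_t) * betas[t]          # tilde(beta)_t
    mean = (alphas[t].sqrt() * (1 - ab_prev) / (1 - ab_t) * xt
            + ab_prev.sqrt() * betas[t] / (1 - ab_t) * x0)
    return mean, var
```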



Thanks to the property of the forward process, we have $\mathbf{x}_0=\frac{1}{\sqrt{\bar{\alpha}_t}}\left(\mathbf{x}_t-\sqrt{1-\bar{\alpha}_t}\, \boldsymbol{\epsilon}_t\right)$; substituting this into the equation above gives:

$$\begin{aligned} \tilde{\boldsymbol{\mu}}_t &=\frac{\sqrt{\alpha_t}\left(1-\bar{\alpha}_{t-1}\right)}{1-\bar{\alpha}_t} \mathbf{x}_t+\frac{\sqrt{\bar{\alpha}_{t-1}}\, \beta_t}{1-\bar{\alpha}_t} \cdot \frac{1}{\sqrt{\bar{\alpha}_t}}\left(\mathbf{x}_t-\sqrt{1-\bar{\alpha}_t}\, \boldsymbol{\epsilon}_t\right) \\ &=\frac{1}{\sqrt{\alpha_t}}\left(\mathbf{x}_t-\frac{1-\alpha_t}{\sqrt{1-\bar{\alpha}_t}}\, \boldsymbol{\epsilon}_t\right) \end{aligned}$$


Here the Gaussian noise $\boldsymbol{\epsilon}_t$ is exactly what the deep network is trained to predict (for denoising); writing the prediction as $\boldsymbol{\epsilon}_\theta\left(\mathbf{x}_t, t\right)$, we obtain:

$$\boldsymbol{\mu}_\theta\left(\mathbf{x}_t, t\right)=\frac{1}{\sqrt{\alpha_t}}\left(\mathbf{x}_t-\frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\, \boldsymbol{\epsilon}_\theta\left(\mathbf{x}_t, t\right)\right)$$
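Putting the pieces together yields one reverse sampling step. In this sketch, `model(x, t)` stands for an assumed noise-prediction network $\boldsymbol{\epsilon}_\theta$, and the reverse variance is fixed to $\beta_t$, one of the choices used in DDPM:

```python
import torch

@torch.no_grad()
def p_sample(model, xt, t, betas, alphas, alpha_bars):
    # One reverse step x_t -> x_{t-1} using the predicted noise eps_theta.
    t_batch = torch.full((xt.shape[0],), t, device=xt.device, dtype=torch.long)
    eps = model(xt, t_batch)                               # eps_theta(x_t, t)
    mean = (xt - betas[t] / (1 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
    if t == 0:
        return mean                     # no noise is added at the final step
    return mean + betas[t].sqrt() * torch.randn_like(xt)
```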



This setup is very similar to a VAE, so we can use the variational lower bound to optimize the negative log-likelihood.

$$\begin{aligned} -\log p_\theta\left(\mathbf{x}_0\right) &\leq-\log p_\theta\left(\mathbf{x}_0\right)+D_{\mathrm{KL}}\left(q\left(\mathbf{x}_{1:T} \mid \mathbf{x}_0\right) \,\|\, p_\theta\left(\mathbf{x}_{1:T} \mid \mathbf{x}_0\right)\right) \\ &=-\log p_\theta\left(\mathbf{x}_0\right)+\mathbb{E}_{\mathbf{x}_{1:T} \sim q\left(\mathbf{x}_{1:T} \mid \mathbf{x}_0\right)}\left[\log \frac{q\left(\mathbf{x}_{1:T} \mid \mathbf{x}_0\right)}{p_\theta\left(\mathbf{x}_{0:T}\right) / p_\theta\left(\mathbf{x}_0\right)}\right] \\ &=-\log p_\theta\left(\mathbf{x}_0\right)+\mathbb{E}_q\left[\log \frac{q\left(\mathbf{x}_{1:T} \mid \mathbf{x}_0\right)}{p_\theta\left(\mathbf{x}_{0:T}\right)}+\log p_\theta\left(\mathbf{x}_0\right)\right] \\ &=\mathbb{E}_q\left[\log \frac{q\left(\mathbf{x}_{1:T} \mid \mathbf{x}_0\right)}{p_\theta\left(\mathbf{x}_{0:T}\right)}\right] \\ \text{Let } L_{\mathrm{VLB}} &=\mathbb{E}_{q\left(\mathbf{x}_{0:T}\right)}\left[\log \frac{q\left(\mathbf{x}_{1:T} \mid \mathbf{x}_0\right)}{p_\theta\left(\mathbf{x}_{0:T}\right)}\right] \geq-\mathbb{E}_{q\left(\mathbf{x}_0\right)} \log p_\theta\left(\mathbf{x}_0\right) \end{aligned}$$


It is also straightforward to obtain the same result using Jensen's inequality.

$$\begin{aligned} L_{\mathrm{CE}} &=-\mathbb{E}_{q\left(\mathbf{x}_0\right)} \log p_\theta\left(\mathbf{x}_0\right) \\ &=-\mathbb{E}_{q\left(\mathbf{x}_0\right)} \log \left(\int p_\theta\left(\mathbf{x}_{0:T}\right) d \mathbf{x}_{1:T}\right) \\ &=-\mathbb{E}_{q\left(\mathbf{x}_0\right)} \log \left(\int q\left(\mathbf{x}_{1:T} \mid \mathbf{x}_0\right) \frac{p_\theta\left(\mathbf{x}_{0:T}\right)}{q\left(\mathbf{x}_{1:T} \mid \mathbf{x}_0\right)} d \mathbf{x}_{1:T}\right) \\ &=-\mathbb{E}_{q\left(\mathbf{x}_0\right)} \log \left(\mathbb{E}_{q\left(\mathbf{x}_{1:T} \mid \mathbf{x}_0\right)} \frac{p_\theta\left(\mathbf{x}_{0:T}\right)}{q\left(\mathbf{x}_{1:T} \mid \mathbf{x}_0\right)}\right) \\ &\leq-\mathbb{E}_{q\left(\mathbf{x}_{0:T}\right)} \log \frac{p_\theta\left(\mathbf{x}_{0:T}\right)}{q\left(\mathbf{x}_{1:T} \mid \mathbf{x}_0\right)} \\ &=\mathbb{E}_{q\left(\mathbf{x}_{0:T}\right)}\left[\log \frac{q\left(\mathbf{x}_{1:T} \mid \mathbf{x}_0\right)}{p_\theta\left(\mathbf{x}_{0:T}\right)}\right]=L_{\mathrm{VLB}} \end{aligned}$$


To convert each term in the equation into something analytically computable, the objective can be further rewritten as a sum of an entropy term and several KL-divergence terms:

$$\begin{aligned} L_{\mathrm{VLB}} &=\mathbb{E}_{q\left(\mathbf{x}_{0:T}\right)}\left[\log \frac{q\left(\mathbf{x}_{1:T} \mid \mathbf{x}_0\right)}{p_\theta\left(\mathbf{x}_{0:T}\right)}\right] \\ &=\mathbb{E}_q\left[\log \frac{\prod_{t=1}^T q\left(\mathbf{x}_t \mid \mathbf{x}_{t-1}\right)}{p_\theta\left(\mathbf{x}_T\right) \prod_{t=1}^T p_\theta\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t\right)}\right] \\ &=\mathbb{E}_q\left[-\log p_\theta\left(\mathbf{x}_T\right)+\sum_{t=1}^T \log \frac{q\left(\mathbf{x}_t \mid \mathbf{x}_{t-1}\right)}{p_\theta\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t\right)}\right] \\ &=\mathbb{E}_q\left[-\log p_\theta\left(\mathbf{x}_T\right)+\sum_{t=2}^T \log \frac{q\left(\mathbf{x}_t \mid \mathbf{x}_{t-1}\right)}{p_\theta\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t\right)}+\log \frac{q\left(\mathbf{x}_1 \mid \mathbf{x}_0\right)}{p_\theta\left(\mathbf{x}_0 \mid \mathbf{x}_1\right)}\right] \\ &=\mathbb{E}_q\left[-\log p_\theta\left(\mathbf{x}_T\right)+\sum_{t=2}^T \log \left(\frac{q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0\right)}{p_\theta\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t\right)} \cdot \frac{q\left(\mathbf{x}_t \mid \mathbf{x}_0\right)}{q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_0\right)}\right)+\log \frac{q\left(\mathbf{x}_1 \mid \mathbf{x}_0\right)}{p_\theta\left(\mathbf{x}_0 \mid \mathbf{x}_1\right)}\right] \\ &=\mathbb{E}_q\left[-\log p_\theta\left(\mathbf{x}_T\right)+\sum_{t=2}^T \log \frac{q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0\right)}{p_\theta\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t\right)}+\sum_{t=2}^T \log \frac{q\left(\mathbf{x}_t \mid \mathbf{x}_0\right)}{q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_0\right)}+\log \frac{q\left(\mathbf{x}_1 \mid \mathbf{x}_0\right)}{p_\theta\left(\mathbf{x}_0 \mid \mathbf{x}_1\right)}\right] \\ &=\mathbb{E}_q\left[-\log p_\theta\left(\mathbf{x}_T\right)+\sum_{t=2}^T \log \frac{q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0\right)}{p_\theta\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t\right)}+\log \frac{q\left(\mathbf{x}_T \mid \mathbf{x}_0\right)}{q\left(\mathbf{x}_1 \mid \mathbf{x}_0\right)}+\log \frac{q\left(\mathbf{x}_1 \mid \mathbf{x}_0\right)}{p_\theta\left(\mathbf{x}_0 \mid \mathbf{x}_1\right)}\right] \\ &=\mathbb{E}_q\left[\log \frac{q\left(\mathbf{x}_T \mid \mathbf{x}_0\right)}{p_\theta\left(\mathbf{x}_T\right)}+\sum_{t=2}^T \log \frac{q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0\right)}{p_\theta\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t\right)}-\log p_\theta\left(\mathbf{x}_0 \mid \mathbf{x}_1\right)\right] \\ &=\mathbb{E}_q\big[\underbrace{D_{\mathrm{KL}}\left(q\left(\mathbf{x}_T \mid \mathbf{x}_0\right) \,\|\, p_\theta\left(\mathbf{x}_T\right)\right)}_{L_T}+\sum_{t=2}^T \underbrace{D_{\mathrm{KL}}\left(q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0\right) \,\|\, p_\theta\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t\right)\right)}_{L_{t-1}}-\underbrace{\log p_\theta\left(\mathbf{x}_0 \mid \mathbf{x}_1\right)}_{L_0}\big] \end{aligned}$$


Or, written out term by term:

$$\begin{aligned} \mathcal{L}_{\mathrm{VLB}} &=L_T+L_{T-1}+\ldots+L_0 \\ L_T &=D_{\mathrm{KL}}\left(q\left(x_T \mid x_0\right) \,\|\, p_\theta\left(x_T\right)\right) \\ L_t &=D_{\mathrm{KL}}\left(q\left(x_t \mid x_{t+1}, x_0\right) \,\|\, p_\theta\left(x_t \mid x_{t+1}\right)\right); \quad 1 \leq t \leq T-1 \\ L_0 &=-\log p_\theta\left(x_0 \mid x_1\right) \end{aligned}$$



Each term $L_t$ for $1 \leq t \leq T-1$ is a KL divergence between two Gaussians, so it can be computed in closed form as a weighted $\ell_2$ distance between the two means:


$$L_t=\mathbb{E}_q\left[\frac{1}{2\left\|\Sigma_\theta\left(x_t, t\right)\right\|_2^2}\left\|\tilde{\mu}_t\left(x_t, x_0\right)-\mu_\theta\left(x_t, t\right)\right\|^2\right]+C$$


$$\begin{aligned} L_t &=\mathbb{E}_{\mathbf{x}_0, \boldsymbol{\epsilon}}\left[\frac{1}{2\left\|\boldsymbol{\Sigma}_\theta\left(\mathbf{x}_t, t\right)\right\|_2^2}\left\|\tilde{\boldsymbol{\mu}}_t\left(\mathbf{x}_t, \mathbf{x}_0\right)-\boldsymbol{\mu}_\theta\left(\mathbf{x}_t, t\right)\right\|^2\right] \\ &=\mathbb{E}_{\mathbf{x}_0, \boldsymbol{\epsilon}}\left[\frac{1}{2\left\|\boldsymbol{\Sigma}_\theta\right\|_2^2}\left\|\frac{1}{\sqrt{\alpha_t}}\left(\mathbf{x}_t-\frac{1-\alpha_t}{\sqrt{1-\bar{\alpha}_t}} \boldsymbol{\epsilon}_t\right)-\frac{1}{\sqrt{\alpha_t}}\left(\mathbf{x}_t-\frac{1-\alpha_t}{\sqrt{1-\bar{\alpha}_t}} \boldsymbol{\epsilon}_\theta\left(\mathbf{x}_t, t\right)\right)\right\|^2\right] \\ &=\mathbb{E}_{\mathbf{x}_0, \boldsymbol{\epsilon}}\left[\frac{\left(1-\alpha_t\right)^2}{2 \alpha_t\left(1-\bar{\alpha}_t\right)\left\|\boldsymbol{\Sigma}_\theta\right\|_2^2}\left\|\boldsymbol{\epsilon}_t-\boldsymbol{\epsilon}_\theta\left(\mathbf{x}_t, t\right)\right\|^2\right] \\ &=\mathbb{E}_{\mathbf{x}_0, \boldsymbol{\epsilon}}\left[\frac{\left(1-\alpha_t\right)^2}{2 \alpha_t\left(1-\bar{\alpha}_t\right)\left\|\boldsymbol{\Sigma}_\theta\right\|_2^2}\left\|\boldsymbol{\epsilon}_t-\boldsymbol{\epsilon}_\theta\left(\sqrt{\bar{\alpha}_t}\, \mathbf{x}_0+\sqrt{1-\bar{\alpha}_t}\, \boldsymbol{\epsilon}_t, t\right)\right\|^2\right] \end{aligned}$$



However, later experimental results showed that training the diffusion model with a simplified objective that ignores the weighting term works better:

$$\begin{aligned} L_t^{\text{simple}} &=\mathbb{E}_{t \sim[1, T], \mathbf{x}_0, \boldsymbol{\epsilon}_t}\left[\left\|\boldsymbol{\epsilon}_t-\boldsymbol{\epsilon}_\theta\left(\mathbf{x}_t, t\right)\right\|^2\right] \\ &=\mathbb{E}_{t \sim[1, T], \mathbf{x}_0, \boldsymbol{\epsilon}_t}\left[\left\|\boldsymbol{\epsilon}_t-\boldsymbol{\epsilon}_\theta\left(\sqrt{\bar{\alpha}_t}\, \mathbf{x}_0+\sqrt{1-\bar{\alpha}_t}\, \boldsymbol{\epsilon}_t, t\right)\right\|^2\right] \end{aligned}$$


which finally simplifies to:



$$L_{\text{simple}}=L_t^{\text{simple}}+C$$
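In code, this simplified objective is just a mean-squared error between the injected noise and the predicted noise at a uniformly sampled timestep; a sketch, again treating `model` as an assumed $\boldsymbol{\epsilon}_\theta$ network:

```python
import torch
import torch.nn.functional as F

def simple_loss(model, x0, alpha_bars):
    # L_simple: sample t uniformly, noise x_0 to x_t in closed form, and
    # regress the predicted noise onto the true noise.
    b = x0.shape[0]
    t = torch.randint(0, len(alpha_bars), (b,), device=x0.device)
    noise = torch.randn_like(x0)
    ab = alpha_bars[t].view(-1, *([1] * (x0.dim() - 1)))
    xt = ab.sqrt() * x0 + (1 - ab).sqrt() * noise   # q(x_t | x_0) sample
    return F.mse_loss(model(xt, t), noise)
```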



Accelerating diffusion model sampling


Generating samples from a DDPM by following the Markov chain of the reverse diffusion process is very slow, because $T$ must be several thousand steps to obtain high-quality results.

DDIM proposes a way to trade off some diversity for much faster inference.

This article omits the mathematical derivation and presents the conclusions directly.


Compared with DDPM, DDIM can:

  1. Generate higher-quality samples using many fewer steps.

  2. Exhibit a "consistency" property, because the generation process is deterministic, meaning that multiple samples conditioned on the same latent variable should share similar high-level features.

  3. Thanks to this consistency, perform semantically meaningful interpolation in the latent variable (see the sketch after this list).
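For reference, here is a sketch of the DDIM update (notation follows Song et al., 2020): `t_prev` may skip many steps rather than being $t-1$, and `eta = 0` yields the deterministic sampler behind properties 2 and 3 above; `model` is again an assumed $\boldsymbol{\epsilon}_\theta$ network.

```python
import torch

@torch.no_grad()
def ddim_step(model, xt, t, t_prev, alpha_bars, eta=0.0):
    # One DDIM step x_t -> x_{t_prev}; t_prev < t need not be t - 1.
    ab_t = alpha_bars[t]
    ab_prev = alpha_bars[t_prev] if t_prev >= 0 else torch.tensor(1.0)
    t_batch = torch.full((xt.shape[0],), t, device=xt.device, dtype=torch.long)
    eps = model(xt, t_batch)
    x0_pred = (xt - (1 - ab_t).sqrt() * eps) / ab_t.sqrt()  # predicted x_0
    sigma = eta * ((1 - ab_prev) / (1 - ab_t) * (1 - ab_t / ab_prev)).sqrt()
    dir_xt = (1 - ab_prev - sigma**2).sqrt() * eps          # points back toward x_t
    noise = sigma * torch.randn_like(xt) if eta > 0 else 0.0
    return ab_prev.sqrt() * x0_pred + dir_xt + noise
```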


LDM instead runs the diffusion process in a latent space rather than in pixel space, which further reduces training cost and speeds up inference. The detailed procedure can be found in the previous post.


DPM-Solver is a higher-order counterpart of DDIM; built on DPM-Solver, the sampling speed of diffusion models doubles outright.



This is part 1.5, released ahead of schedule; revisions are still pending.