Denoising Score Matching for Gaussian Probability Path

18 June 2025

Key Points

  • Research suggests that Denoising Score Matching is a technique for estimating the score function of Gaussian-noised data in diffusion models.
  • A Gaussian probability path is the trajectory along which data is gradually transformed into Gaussian noise; each step of the diffusion process is most likely a Gaussian distribution.
  • The technique is effective for data-generation tasks such as image synthesis and may hold strong potential for complex, high-dimensional data.


Overview

Denoising Score Matching for Gaussian Probability Path is a technique used in diffusion models to generate data: Gaussian noise is added to the data progressively, forming a specific path, which is then traversed in reverse to generate new samples. The process is particularly effective for tasks such as image generation.

What Is Denoising Score Matching?

Denoising Score Matching (DSM) is a method for estimating the score function of a data distribution, i.e., the gradient of the log probability density. The score function points in the direction in which the data becomes more likely. DSM adds Gaussian noise \(\epsilon \sim \mathcal{N}(0, \sigma^2 \mathbf{I})\) to clean data \(\mathbf{x}\) to produce a noised sample \(\tilde{\mathbf{x}} = \mathbf{x} + \epsilon\).
The resulting conditional distribution \(q_\sigma(\tilde{\mathbf{x}} \mid \mathbf{x})\) is Gaussian with mean \(\mathbf{x}\) and variance \(\sigma^2\). A model \(s_\theta\) is trained to estimate the score of the noised data by minimizing the objective:
\[J_{DSM_\sigma}(\theta) = \mathbb{E}_{q_\sigma(\tilde{\mathbf{x}}, \mathbf{x})}\left[ \frac{1}{2} \lVert s_\theta(\tilde{\mathbf{x}}) - \nabla_{\tilde{\mathbf{x}}} \log q_\sigma(\tilde{\mathbf{x}} \mid \mathbf{x}) \rVert_2^2 \right]\]
where \(\nabla_{\tilde{\mathbf{x}}} \log q_\sigma(\tilde{\mathbf{x}} \mid \mathbf{x}) = \frac{\mathbf{x} - \tilde{\mathbf{x}}}{\sigma^2}\), so the model learns the direction that removes the noise.
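To make the objective concrete, here is a minimal PyTorch sketch of the single-noise-scale DSM loss. The network name `score_model` and the batched input layout are illustrative assumptions, not part of the original formulation:

```python
import torch

def dsm_loss(score_model, x, sigma):
    """Single-noise-scale denoising score matching loss (a sketch).

    x: batch of clean data, shape (batch, ...)
    score_model: network mapping a noised sample to a score estimate
                 of the same shape.
    """
    eps = sigma * torch.randn_like(x)        # epsilon ~ N(0, sigma^2 I)
    x_tilde = x + eps                        # noised sample
    target = (x - x_tilde) / sigma**2        # true conditional score
    pred = score_model(x_tilde)              # s_theta(x_tilde)
    # 1/2 || s_theta(x_tilde) - target ||_2^2, averaged over the batch
    return 0.5 * ((pred - target) ** 2).flatten(1).sum(dim=1).mean()
```

Note that the target equals \(-\epsilon / \sigma^2\), which is why minimizing this loss amounts to learning to predict (and remove) the noise.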

What Is a Gaussian Probability Path?

A Gaussian probability path is the sequence of probability distributions that the data passes through in a diffusion model. The forward diffusion process progressively adds Gaussian noise, transforming the data distribution into a pure Gaussian noise distribution. For example, at each step \(t\) the data is updated as \(\mathbf{x}_t = \sqrt{\alpha_t}\, \mathbf{x}_{t-1} + \sqrt{1 - \alpha_t}\, \epsilon_t\), where \(\epsilon_t \sim \mathcal{N}(0, \mathbf{I})\).
At each step, the distribution \(q(\mathbf{x}_t \mid \mathbf{x}_0)\) conditioned on the data is Gaussian, and linking all of these distributions together yields the Gaussian probability path. The path starts at the original data distribution at \(t = 0\) and ends at \(\mathcal{N}(0, \mathbf{I})\) at \(t = T\).
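The forward process is straightforward to simulate. The sketch below, assuming a hypothetical list `alphas` of per-step coefficients, samples one trajectory along the path:

```python
import torch

def forward_path(x0, alphas):
    """Sample one trajectory [x_0, x_1, ..., x_T] of the forward process."""
    xs = [x0]
    for alpha_t in alphas:                    # each alpha_t in (0, 1)
        eps_t = torch.randn_like(xs[-1])      # eps_t ~ N(0, I)
        # x_t = sqrt(alpha_t) * x_{t-1} + sqrt(1 - alpha_t) * eps_t
        xs.append(alpha_t ** 0.5 * xs[-1] + (1.0 - alpha_t) ** 0.5 * eps_t)
    return xs  # for long schedules, xs[-1] is approximately N(0, I)
```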

Connecting DSM and the Gaussian Probability Path

DSM estimates the score function at each step \(t\), which is what makes it possible to traverse the Gaussian probability path in reverse. Starting from pure Gaussian noise \(\mathbf{x}_T\), methods such as Langevin dynamics progressively remove noise, following the path back toward a sample \(\mathbf{x}_0\) from the original data distribution.
This procedure performs especially well in image generation and holds strong potential for complex, high-dimensional data.
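As a rough illustration of this reverse traversal, the following sketch implements annealed Langevin dynamics in the style of Song & Ermon's score-based models. Here `score_model(x, sigma)` is assumed to be a noise-conditional score network, and the step-size rule is a common heuristic, not the only choice:

```python
import torch

@torch.no_grad()
def annealed_langevin_sample(score_model, shape, sigmas,
                             n_steps=100, base_step=1e-5):
    """Annealed Langevin dynamics over a decreasing ladder of noise levels."""
    x = torch.randn(shape)                           # x_T ~ N(0, I)
    for sigma in sigmas:                             # sigmas ordered large -> small
        step = base_step * (sigma / sigmas[-1]) ** 2  # shrink step with sigma
        for _ in range(n_steps):
            z = torch.randn_like(x)
            # Langevin update: drift along the score plus injected noise
            x = x + step * score_model(x, sigma) + (2.0 * step) ** 0.5 * z
    return x
```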


Survey Note: Detailed Analysis of Denoising Score Matching for Gaussian Probability Path

This section provides a comprehensive exploration of the concept “Denoising Score Matching for Gaussian Probability Path,” based on available information and interpretations. The analysis aims to cover theoretical foundations, practical applications, and areas of uncertainty, ensuring a thorough understanding for researchers and practitioners.

1. Theoretical Framework and Interpretation

“Denoising Score Matching for Gaussian Probability Path” is not a widely recognized standalone term but can be inferred as a combination of Denoising Score Matching (DSM) and the concept of a Gaussian Probability Path within the context of diffusion models. DSM, first introduced by Vincent (2011) [2], is a technique for estimating the score function (the gradient of the log probability density) of a noised data distribution by training a model to denoise the data. The score function is crucial for generative modeling, as it indicates the direction in which data should move to increase its likelihood under the true distribution.

The “Gaussian Probability Path” likely refers to the sequence of probability distributions that data passes through in a diffusion process, where Gaussian noise is progressively added. In diffusion models, such as those proposed by Ho et al. (2020) [1], the forward process systematically adds Gaussian noise to the data over multiple timesteps, transforming the data distribution into a pure Gaussian noise distribution. Each step in this process can be modeled as a Gaussian distribution, forming a path from the original data distribution to the noise distribution.

Research suggests that DSM is particularly effective when the noise added is Gaussian, as seen in the blog post by Johannes Schusterbauer (Denoising Score Matching Explained), which details how DSM links to denoising autoencoders and uses Gaussian noise perturbations. The DSM objective is formalized as:

\[J_{DSM_\sigma}(\theta) = \mathbb{E}_{q_\sigma(\tilde{\mathbf{x}}, \mathbf{x})}\left[ \frac{1}{2} \lVert s_\theta(\tilde{\mathbf{x}}) - \nabla_{\tilde{\mathbf{x}}} \log q_\sigma(\tilde{\mathbf{x}} \mid \mathbf{x}) \rVert_2^2 \right],\]

where \(q_\sigma(\tilde{\mathbf{x}} \mid \mathbf{x}) = \mathcal{N}(\tilde{\mathbf{x}}; \mathbf{x}, \sigma^2 \mathbf{I})\), and the gradient is derived as \(\nabla_{\tilde{\mathbf{x}}} \log q_\sigma(\tilde{\mathbf{x}} \mid \mathbf{x}) = \frac{\mathbf{x} - \tilde{\mathbf{x}}}{\sigma^2}\). This formulation aligns with the Gaussian nature of the noise, supporting the idea that the probability path is composed of Gaussian distributions.
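The derivation is a single line: the conditional log-density is quadratic in \(\tilde{\mathbf{x}}\), so differentiating it gives

\[\log q_\sigma(\tilde{\mathbf{x}} \mid \mathbf{x}) = -\frac{\lVert \tilde{\mathbf{x}} - \mathbf{x} \rVert_2^2}{2\sigma^2} + \text{const} \quad\Longrightarrow\quad \nabla_{\tilde{\mathbf{x}}} \log q_\sigma(\tilde{\mathbf{x}} \mid \mathbf{x}) = \frac{\mathbf{x} - \tilde{\mathbf{x}}}{\sigma^2}.\]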

2. Diffusion Models and Gaussian Probability Path

Diffusion models, such as Denoising Diffusion Probabilistic Models (DDPMs) by Ho et al. (2020) [1], operate on a forward process that adds Gaussian noise progressively. At each timestep \(t\), the data \(\mathbf{x}_t\) is updated as:

\[\mathbf{x}_t = \sqrt{\alpha_t} \mathbf{x}_{t-1} + \sqrt{1 - \alpha_t} \epsilon_t, \quad \epsilon_t \sim \mathcal{N}(0, \mathbf{I}),\]

where \(\alpha_t\) is a time-dependent coefficient. Conditioned on the original data, each \(\mathbf{x}_t\) follows a Gaussian distribution whose noise level grows as \(t\) increases. The sequence of distributions \(q(\mathbf{x}_t)\) for \(t = 0\) to \(T\) forms the "Gaussian Probability Path," starting from the original data distribution \(q(\mathbf{x}_0)\) and ending at \(q(\mathbf{x}_T) \approx \mathcal{N}(0, \mathbf{I})\).
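Unrolling the recursion makes the Gaussian structure explicit: conditioned on \(\mathbf{x}_0\),

\[\mathbf{x}_t = \sqrt{\bar{\alpha}_t}\, \mathbf{x}_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon, \qquad \bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s, \quad \epsilon \sim \mathcal{N}(0, \mathbf{I}),\]

so \(q(\mathbf{x}_t \mid \mathbf{x}_0) = \mathcal{N}(\sqrt{\bar{\alpha}_t}\, \mathbf{x}_0, (1 - \bar{\alpha}_t)\mathbf{I})\), and as \(\bar{\alpha}_t \to 0\) the marginal approaches \(\mathcal{N}(0, \mathbf{I})\).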

The evidence leans toward this interpretation, as seen in resources like What is Denoising Score Matching, which describe how diffusion models handle noise-corrupted data with Gaussian noise at each timestep. The path is crucial for the reverse process, where DSM is used to estimate the score at each step to denoise and generate new data.

3. Connection Between DSM and Gaussian Probability Path

DSM is integral to the reverse process in diffusion models, where the model learns to estimate the score function at each timestep \(t\) to move \(\mathbf{x}_t\) back toward \(\mathbf{x}_{t-1}\). This is achieved by minimizing the DSM objective, which, as shown in Schusterbauer's blog, involves matching the estimated score \(s_\theta(\tilde{\mathbf{x}})\) to the true score \(\nabla_{\tilde{\mathbf{x}}} \log q_\sigma(\tilde{\mathbf{x}} \mid \mathbf{x})\). Given that the noise is Gaussian, the path through which the data evolves is naturally Gaussian, reinforcing the concept of a Gaussian Probability Path.

Practical implementations, such as those discussed in Generative Modeling by Estimating Gradients of the Data Distribution, highlight that DSM allows for flexible model architectures and exact log-likelihood computation, particularly when dealing with continuous data distributions perturbed by Gaussian noise. The blog also notes challenges, such as slow sampling speed due to Langevin-type iterations, but these are mitigated by numerical ODE solvers in modern approaches.

4. Applications and Practical Implications

Denoising Score Matching for Gaussian Probability Path is particularly effective in generative tasks such as image generation, as evidenced by applications to benchmarks like CIFAR-10 (see ICLR 2022 Denoising Likelihood Score Matching). The technique enables high-quality sample generation by leveraging the Gaussian nature of the diffusion path, allowing models to handle varying levels of noise corruption iteratively.

A trade-off exists in choosing the noise scale \(\sigma^2\), as noted in Schusterbauer’s blog. Small \(\sigma^2\) approximates the score well but may miss low-density regions, while large \(\sigma^2\) covers low-density regions but is less accurate. This is mitigated in DDPMs by using multiple noise scales, ensuring robust performance across the Gaussian Probability Path.
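One common way to realize "multiple noise scales" in practice is a geometric ladder between a large and a small \(\sigma\); the endpoints below are illustrative defaults, not values taken from the cited papers:

```python
import numpy as np

def geometric_sigmas(sigma_max=50.0, sigma_min=0.01, num_scales=10):
    """Geometric sequence of noise scales, largest first.

    Large sigmas cover low-density regions of the data distribution;
    small sigmas give accurate scores near the data manifold.
    """
    return np.exp(np.linspace(np.log(sigma_max), np.log(sigma_min), num_scales))
```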

5. Challenges and Uncertainties

The main challenge is the lack of a standardized definition for “Gaussian Probability Path,” as searches for this term yielded no direct results, suggesting it is an interpretive concept within diffusion models. The interpretation provided here is based on the logical extension of DSM and diffusion processes, supported by academic papers and blogs. However, users should be aware that the exact terminology may vary by context, and further research may refine this understanding.

Comparative Analysis and Tables

To organize the concepts, let’s compare DSM and Gaussian Probability Path based on their definitions, applications, and level of standardization:

| Concept | Definition | Applications | Standardization |
|---|---|---|---|
| Denoising Score Matching | Estimates the score function of a noised data distribution, typically with Gaussian noise | Image generation, audio synthesis, representation learning | Standard in diffusion models |
| Gaussian Probability Path | Sequence of Gaussian distributions in the diffusion process, from data to noise | Data generation in diffusion models | Not standard; interpretive |

This table highlights the varying levels of standardization, with DSM being well-established, while Gaussian Probability Path requires contextual interpretation.

Conclusion

This analysis provides a detailed exploration of Denoising Score Matching for Gaussian Probability Path, suggesting it is a combination of DSM applied within the Gaussian diffusion process of diffusion models. The evidence leans toward DSM being effective for estimating scores along a path of Gaussian distributions, with applications in generative modeling. Researchers are encouraged to consider the context and potential interdisciplinary applications when using these concepts.


Key Citations

[1] Ho, J., Jain, A., & Abbeel, P. (2020). Denoising Diffusion Probabilistic Models. Advances in Neural Information Processing Systems (NeurIPS).
[2] Vincent, P. (2011). A Connection Between Score Matching and Denoising Autoencoders. Neural Computation, 23(7), 1661-1674.