Chapter 16 - Normalizing Flows

 
  • Warning : If I explain every equation fully, we won’t get lunch. So I will share this document later.

Intro

  • [[Articles/Scalar/CycleGANscalarGAN|GANs]]: samples should be as close to real data as possible, but the learned distribution is implicit and hard to identify
  • Normalizing flows take a simple input distribution and convert it into a more complicated one using a NN, so we learn an explicit probability model

Algebra

What is a Normalizing Flow

  • Density estimation is hard
    • we need to run backprop in deep learning models, so the posterior should be easy to differentiate. So, we use a Gaussian distribution when we can. (Of course this is not always a good fit)
  • NF models offer better and more powerful distribution approximation
  • Transforms a simple distribution into a complex one by applying a sequence of invertible transformation functions
    • Flowing through a chain of transformations, we repeatedly substitute the variable for the new one according to the change of variables theorem and eventually obtain a probability distribution of the final target variable (a small numeric check of this follows the list below)
  • only works with continuous data (the change of variables formula needs densities); discrete data, e.g. integer pixel values, has to be dequantized first
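
A quick sanity check of the change of variables idea above: a minimal 1-D sketch in Python (numpy only; the affine map and its constants are illustrative assumptions, not from these notes). A standard Gaussian is pushed through an invertible map, and the Jacobian factor is exactly what keeps the transformed density integrating to 1.

```python
import numpy as np

# Base density: standard Gaussian p_z(z).
def p_z(z):
    return np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)

# Invertible map x = f(z) = a*z + b (hypothetical choice for illustration).
a, b = 2.0, 1.0

# Change of variables: p_x(x) = p_z(f^{-1}(x)) * |d f^{-1}(x)/dx| = p_z((x - b)/a) / |a|
def p_x(x):
    return p_z((x - b) / a) / abs(a)

# Without the 1/|a| Jacobian factor the "density" would integrate to |a|, not 1.
xs = np.linspace(-20.0, 20.0, 200001)
print(np.trapz(p_x(xs), xs))            # ~= 1.0 (valid pdf)
print(np.trapz(p_z((xs - b) / a), xs))  # ~= 2.0 (forgot the Jacobian)
```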

Uses of Normalizing Flows

  • Approximating complex densities (density estimation with an exact likelihood)

Steps for a Normalizing Flow

  • Consider a continuous random variable $z_0 \sim p_0$ with a simple distribution (e.g. a spherical Gaussian)
  • Transform this distribution into a more complicated one using a composition of functions: $z_i = f_i(z_{i-1})$ for $i = 1, \dots, K$, with $x = z_K$
  • each $f_i$ is invertible (a bijective function), so the composed transformation is also invertible
  • So what is the PDF of x?
    • Maybe $p_x(x) = p_0(f^{-1}(x))$?
    • but this is not always a valid density; it depends on the behavior of the function $f$
    • why?
      • (figure: Pasted image 20241119181152.png)
      • $z = f(x)$ tells us where the point $x$ moves to
      • the Jacobian determinant measures how much probability mass in the neighborhood of the point gets stretched or compressed by the map
    • so we need both the forward mapping (latent to data, for sampling) and the backward mapping (data to latent, for density evaluation)
  • From the change of variables theorem and the Jacobian matrix, we get
    • Assume $f$ is invertible and differentiable; then, with $z_0 = f^{-1}(x)$, $p_x(x) = p_0(z_0)\,\left|\det \dfrac{\partial f^{-1}(x)}{\partial x}\right| = p_0(z_0)\,\left|\det \dfrac{\partial f(z_0)}{\partial z_0}\right|^{-1}$
    • the determinant factor is exactly what makes this a valid pdf: a valid pdf must always integrate to 1
  • So what is a normalizing flow?
    • A sequence of bijective transformations is a normalizing flow. After z is initially sampled, it flows through these steps.
      • The density is re-normalized (by the Jacobian determinant factor) at each step to ensure it remains valid.
  • forward mapping for deep neural networks: $x = z_K = f_K \circ f_{K-1} \circ \dots \circ f_1(z_0)$
  • Loss function
    • Let us parameterize the likelihood function as a flow
      • We have $p_x(x) = p_0(z_0)\,\left|\det \dfrac{\partial f(z_0)}{\partial z_0}\right|^{-1}$ with $z_0 = f^{-1}(x)$
      • Applying log on both sides: $\log p_x(x) = \log p_0(z_0) - \log \left|\det \dfrac{\partial f(z_0)}{\partial z_0}\right|$
    • The overall transformation $f$ is a composition of a sequence of functions, $f = f_K \circ \dots \circ f_1$
      • replace the log determinant with the sum of the log determinants of the intermediate Jacobians: $\log p_x(x) = \log p_0(z_0) - \sum_{i=1}^{K} \log \left|\det \dfrac{\partial f_i(z_{i-1})}{\partial z_{i-1}}\right|$
      • Using this, we can train the model with Maximum Likelihood, since we can compute the log marginal likelihood exactly
      • Exact posterior inference (a unique $z$ for a given $x$, obtained via $z = f^{-1}(x)$)
    • This determinant is in general very expensive to compute (of course it is :P), roughly $O(D^3)$ for $D$-dimensional data, so we design the transformations so that the Jacobian is triangular and the determinant reduces to the product of its diagonal entries.
    • Using coupling flows (e.g. a RealNVP-style affine coupling, see the sketch after this list), we split the input into $(z_{1:d}, z_{d+1:D})$, keep the first part unchanged, and set $y_{d+1:D} = z_{d+1:D} \odot \exp(s(z_{1:d})) + t(z_{1:d})$; the Jacobian is then triangular and $\log\left|\det J\right| = \sum_j s(z_{1:d})_j$
  • Now it is annoying that the latent space must have the same dimensionality as the data (we need this for a bijection)
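
To tie the steps above together, here is a minimal sketch of a coupling-flow density model trained by maximum likelihood (PyTorch; the class name, toy data, and hyperparameters are my own illustrative assumptions, not from these notes). Each layer is a RealNVP-style affine coupling written in the normalizing direction $x \to z$, so the per-layer Jacobian is triangular and its log-determinant is just the sum of the predicted log-scales.

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """RealNVP-style affine coupling layer, written in the normalizing direction
    (data x -> latent z). Assumes an even input dimension."""
    def __init__(self, dim, hidden=64, flip=False):
        super().__init__()
        self.flip = flip
        half = dim // 2
        # Small net predicting a log-scale s and a shift t from one half of the input.
        self.net = nn.Sequential(nn.Linear(half, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * half))

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=-1)
        if self.flip:                       # alternate which half gets transformed
            x1, x2 = x2, x1
        s, t = self.net(x1).chunk(2, dim=-1)
        s = torch.tanh(s)                   # keep scales bounded for stability
        z1, z2 = x1, x2 * torch.exp(s) + t  # elementwise, hence triangular Jacobian
        if self.flip:
            z1, z2 = z2, z1
        log_det = s.sum(dim=-1)             # log|det J| = sum of log-scales
        return torch.cat([z1, z2], dim=-1), log_det

def flow_log_prob(layers, x):
    """Exact log p(x): push x through every layer towards the base distribution,
    accumulating the log-determinants from the change of variables formula."""
    z, log_det = x, torch.zeros(x.shape[0])
    for layer in layers:
        z, ld = layer(z)
        log_det = log_det + ld
    log_pz = torch.distributions.Normal(0.0, 1.0).log_prob(z).sum(dim=-1)  # spherical Gaussian base
    return log_pz + log_det

# Toy 2-D data: mixture of two Gaussians (a hypothetical stand-in for real data).
def sample_data(n):
    centers = torch.tensor([[-2.0, 0.0], [2.0, 0.0]])
    return centers[torch.randint(0, 2, (n,))] + 0.3 * torch.randn(n, 2)

flow = nn.ModuleList([AffineCoupling(2, flip=(i % 2 == 1)) for i in range(4)])
opt = torch.optim.Adam(flow.parameters(), lr=1e-3)
for step in range(2000):
    nll = -flow_log_prob(flow, sample_data(256)).mean()  # exact NLL, no ELBO needed
    opt.zero_grad(); nll.backward(); opt.step()
    if step % 500 == 0:
        print(step, nll.item())
```

Because each layer only rescales and shifts half of the coordinates, inverting it (for sampling) is as cheap as the forward pass, and the latent space keeps the same dimensionality as the data, as noted in the last bullet above.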

Other Detailed Information (if time permits)

Vs Others

  • vs. VAE
    • can only get the lower bound on log-likelihood (ELBO loss)
    • approximate posterior
  • vs. GAN
    • no log-likelihood at all, only a min-max objective
    • no latent variable inference

References + Resources for the Math