Puzzle Mix

@kimPuzzleMixExploiting2020
learns to augment two images optimally based on saliency.
Images are divided into regions for the mixup
The algorithm learns to transport the salient region of one image such that the output image has the maximized saliency from both images.
$h (x_{0}, x_{1}) = (1 - z) ⊙ Π_{0}^{T} x_{0} + z ⊙ Π_{1}^{T} x_{1}$
where $z_{i}$ is a binary mask, $λ = \frac{1}{n} Σ_{i} z_{i}$ is the mixing ratio and $Π_{0}, Π_{1}$ are represent $n \times n$ grids that denote the amount of mass that is transported during transport of the image patch to another location.

Subhaditya's KB