learns to augment two images optimally based on saliency.
Images are divided into regions for the mixup
The algorithm learns to transport the salient region of one image such that the output image has the maximized saliency from both images.
h(x0,x1)=(1−z)⊙Π0Tx0+z⊙Π1Tx1
where zi is a binary mask, λ=n1Σizi is the mixing ratio and Π0,Π1 are represent n×n grids that denote the amount of mass that is transported during transport of the image patch to another location.