instead of deleting a patch, the patch is replaced with some other image region
y this approach, an image shares multiple class labels, whereas the major class label belongs to the original class label
Hence, the model learns to differentiate between two classes within a single image.
CutMix can be defined by the following operations x∼=M⊙xA+(1−M)⊙xB
y∼=λyA+(1−λ)yB
where x is an RGB image, y is the respective label, M is a binary mask of the patch of the image that will be dropped and ⊙ represents element wise multiplication. The new training sample x∼,y∼ is created by combining two other training samples xA,yA and xB,yB. To control the combination ratio λ, a sample from the β(1,1) distribution is chosen. This combination is quite similar to Mixup.