only penalizes structure at the scale of local image patches
tries to classify if each N×N patch in an image is real or fake
discriminator is run convolutionally across the image, averaging all responses to provide the ultimate output of D
effectively models the image as a Markov random field
assuming independence between pixels separated by more than a patch diameter
type of texture/style loss
rather the regular GAN maps from a 256×256 image to a single scalar output, which signifies “real” or “fake”, whereas the PatchGAN maps from 256×256 to an NxN (here 70×70) array of outputs X, where each Xij signifies whether the patch ij in the image is real or fake.