Deep Neural Networks Are Easily Fooled: High Confidence Predictions for Unrecognizable Images

  • @nguyenDeepNeuralNetworks2015
  • Current study
    • False positives
    • MAP-Elites algorithm
    • Parallel generation
    • Direct encodings
    • Indirect encodings
    • Gradient ascent generation
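MAP-Elites differs from a standard GA in that it keeps one "elite" per niche; here each class is a niche, so the archive holds the image the classifier scores highest for every class. A minimal sketch of that idea (the function names, selection probability, and signature are illustrative, not the paper's code):

```python
import random

def map_elites(fitness, mutate, random_individual, n_classes, iters, seed=0):
    # Minimal MAP-Elites sketch: `fitness(ind)` returns a list of per-class
    # confidences; an offspring replaces the elite of class c whenever it
    # scores higher for c than the current elite does.
    rng = random.Random(seed)
    elites = {}  # class index -> (confidence, individual)
    for _ in range(iters):
        if elites:
            _, parent = rng.choice(list(elites.values()))
            child = mutate(parent, rng)
        else:
            child = random_individual(rng)
        scores = fitness(child)
        for c in range(n_classes):
            if c not in elites or scores[c] > elites[c][0]:
                elites[c] = (scores[c], child)
    return elites
```

Because every offspring is scored against all classes at once, a single run fills in fooling images for every class in parallel.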

GA

  • Population of individuals
  • Each individual has a fitness
  • Mutation makes small edits to specific individuals
  • Recombination (not used here)
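The loop above can be sketched as a mutation-only GA (truncation selection and the mutation scale are illustrative choices, not the paper's exact setup):

```python
import random

def evolve(fitness, mutate, population, generations, rng):
    # Mutation-only GA sketch: each generation, keep the fitter half of the
    # population and refill it with mutated copies of the survivors.
    # Recombination is deliberately omitted, matching the notes.
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        survivors = ranked[: len(population) // 2]
        population = survivors + [mutate(ind, rng) for ind in survivors]
    return max(population, key=fitness)
```

Keeping the survivors unmutated makes the loop elitist: the best fitness found so far can never decrease.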

Direct Encoding

  • Individuals are images in pixel space
  • Fitness is the confidence of the DNN that the individual is a class
  • Mutations make edits to the pixel values
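A direct-encoding mutation operator is then just a per-pixel perturbation (the mutation rate and step size below are illustrative assumptions):

```python
import random

def mutate_image(pixels, rate, rng):
    # Direct encoding sketch: an individual is a flat list of 8-bit pixel
    # values; mutation perturbs each pixel independently with probability
    # `rate`, clamping the result to the valid [0, 255] range.
    return [min(255, max(0, p + rng.randint(-25, 25))) if rng.random() < rate else p
            for p in pixels]
```

Fitness would then be the DNN's softmax confidence for the target class, evaluated on the mutated image.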

Indirect Encoding

  • Individuals are Compositional Pattern-Producing Networks (CPPNs)
  • The CPPN generates an image
  • All individuals initially have no hidden neurons
  • Mutations add new neurons to the networks
  • Fitness is the DNN's confidence for the target class, as in the direct encoding
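A heavily simplified CPPN sketch: real CPPNs are arbitrary graphs evolved with NEAT, whereas this linear chain of nodes only illustrates the two bullets above, starting with no hidden neurons and adding one per mutation. The activation set and input choices are typical but assumed here:

```python
import math
import random

ACTIVATIONS = [math.sin, math.tanh, abs, lambda v: v]  # common CPPN basis functions

class CPPN:
    # Toy CPPN: a chain of (weight, activation) nodes applied per pixel to a
    # function of the pixel's coordinates. Regular inputs like x, y, and the
    # distance from center are what give CPPN images their geometric,
    # repetitive patterns.
    def __init__(self, nodes=()):
        self.nodes = list(nodes)  # starts empty: no hidden neurons

    def add_neuron(self, rng):
        # Mutation: return a child with one extra node; the parent is unchanged.
        return CPPN(self.nodes + [(rng.uniform(-2, 2), rng.choice(ACTIVATIONS))])

    def render(self, size):
        img = []
        for yi in range(size):
            row = []
            for xi in range(size):
                x = 2 * xi / (size - 1) - 1  # coordinates scaled to [-1, 1]
                y = 2 * yi / (size - 1) - 1
                v = x + y + math.hypot(x, y)  # inputs: x, y, distance from center
                for w, act in self.nodes:
                    v = act(w * v)
                row.append(v)
            img.append(row)
        return img
```

Because the genome is the network rather than the pixels, one small mutation can change the whole rendered image at once, unlike the direct encoding.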

Gradient Ascent

  • Take the gradient with respect to the image pixel values
  • Modify the image by moving it in the direction of the gradient
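The update is x ← x + η ∇ₓ f_c(x) for target class c. A real attack backpropagates through the DNN; the sketch below estimates the gradient by finite differences instead so it stays self-contained (step count, learning rate, and the toy confidence are illustrative):

```python
def gradient_ascent(confidence, image, steps, lr, eps=1e-4):
    # Gradient-ascent sketch: repeatedly nudge each pixel in the direction
    # that increases the target class's confidence, using a forward-difference
    # estimate of the partial derivative per pixel.
    x = list(image)
    for _ in range(steps):
        base = confidence(x)
        grad = []
        for i in range(len(x)):
            bumped = list(x)
            bumped[i] += eps
            grad.append((confidence(bumped) - base) / eps)
        x = [xi + lr * g for xi, g in zip(x, grad)]
    return x
```

Starting from random noise and ascending this gradient is what produces the unrecognizable yet confidently classified images in the results below.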

MNIST Results - EAs

ImageNet Results - EAs

  • Harder to fool
  • Different runs produce different patterns
  • Removing repetitive patterns does not cause a dramatic confidence drop
  • Global structures are not learned

What about a Fooling Class?

  • MNIST
    • Added an 11th fooling class.
    • Evolved unrecognizable images were still recognized as digits.
    • Number of misclassifications did not decrease.
  • ImageNet
    • Added a 1001st fooling class.
    • No confidence decrease for directly evolved images, though their confidence was already low.
    • Confidence decreased from 88.1% to 11.7% for indirectly evolved images.
  • Indirectly evolved images are easier to differentiate.

Gradient Ascent Results

  • Maximize softmax output
  • Produced unrecognizable images classified with 99.99% confidence