Smooth-Grad

@smilkovSmoothGradRemovingNoise2017 propose Smooth-Grad, a method that reduces visual noise in sensitivity maps and hence improves visual explanations of how a DNN makes a classification decision. The authors compare their work to several gradient-based sensitivity map methods, such as LRP, [DeepLift](DeepLift.md), and [Integrated Gradients](Integrated Gradients.md) (IG) [96], which estimate the global importance of each pixel and create saliency maps. In contrast, Smooth-Grad focuses on local sensitivity: it averages the sensitivity maps of several slightly perturbed copies of an input image, which has a smoothing effect. The effect is enhanced by additionally training the model on such noisy images, which further sharpens the resulting sensitivity maps.
A separate local, post hoc approach gave visual and textual justifications of its predictions with the help of two novel explanation datasets collected through crowdsourcing.
Smooth-Grad involves adding random noise to the input and computing the attribution map once for each noisy copy.
The final attribution map is the average of the maps obtained from the noisy inputs. The idea behind this technique is that each noisy copy causes the model to respond to slightly different features of the input, so averaging suppresses fluctuations specific to any single sample and results in a more stable and interpretable attribution map.

Technical Details

Consider an image classification task in which an input image $x$ is to be assigned a single class from a set $C$. For every class $c \in C$, the network computes a score $S_c(x)$, and the output class is $\mathrm{class}(x) = \arg\max_{c \in C} S_c(x)$. For a class $c$, a sensitivity map $M_c(x)$ can be generated by differentiating the score with respect to $x$: $M_c(x) = \frac{\partial S_c}{\partial x}$. Being a sensitivity map, $M_c$ represents the regions of the image that most influence the prediction. Since these maps are noisy in nature, Smilkov et al. propose SmoothGrad, a modification of this method in which, instead of using $\partial S_c$ directly, $\partial S_c$ is smoothed with a Gaussian kernel. Because the input is high-dimensional, this smoothing cannot be computed directly, so the authors approximate it by averaging multiple maps computed at randomly sampled points in the neighborhood of $x$. The final SmoothGrad equation then becomes $\hat{M}_c(x) = \frac{1}{n} \sum_{i=1}^{n} M_c\left(x + \mathcal{N}(0, \sigma^2)\right)$, where $n$ is the number of samples, $\mathcal{N}(0, \sigma^2)$ is Gaussian noise drawn independently for each sample, and $\sigma$ is its standard deviation.
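
The averaging in the SmoothGrad equation can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. The toy score $S_c(x) = \sum_j w_j \max(x_j, 0)$, the helper names `make_toy_gradient` and `smoothgrad`, and the default values of `n_samples` and `sigma` are all assumptions chosen for illustration; in practice $M_c$ would be obtained from a network through automatic differentiation.

```python
import random


def make_toy_gradient(weights):
    """Return M_c for the hypothetical score S_c(x) = sum_j w_j * max(x_j, 0).

    The gradient with respect to x_j is w_j where x_j > 0 and 0 elsewhere,
    so it jumps abruptly near x_j = 0, mimicking a noisy sensitivity map.
    """
    def gradient(x):
        return [w if xj > 0 else 0.0 for xj, w in zip(x, weights)]
    return gradient


def smoothgrad(gradient, x, n_samples=50, sigma=0.1, seed=0):
    """Average gradient(x + noise) over n_samples Gaussian perturbations of x."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    total = [0.0] * len(x)
    for _ in range(n_samples):
        # Draw independent N(0, sigma^2) noise for every input dimension.
        noisy_x = [xj + rng.gauss(0.0, sigma) for xj in x]
        sample_map = gradient(noisy_x)
        total = [t + g for t, g in zip(total, sample_map)]
    return [t / n_samples for t in total]


# Toy "image" with three pixels; the last one sits close to the kink at 0.
toy_gradient = make_toy_gradient([2.0, 3.0, 4.0])
x = [1.0, -1.0, 0.01]
plain_map = toy_gradient(x)
smooth_map = smoothgrad(toy_gradient, x, n_samples=200, sigma=0.1)
print("plain gradient:", plain_map)
print("SmoothGrad    :", smooth_map)
```

For pixels far from the discontinuity, the smoothed map agrees with the plain gradient. For the pixel near zero, the plain gradient takes the full value $w_j$, while the averaged map lies between 0 and $w_j$, which reflects how sensitive the gradient is to small perturbations at that point.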