In Score-CAM, each activation map is weighted by the score the model assigns to a specific target class.
The first step is to pass the image through a CNN model and perform a forward pass. After the forward pass, the activations are extracted from the last convolutional layer in the network.
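The forward pass and activation extraction can be illustrated with a minimal NumPy sketch. The model below is entirely hypothetical (random weights, a 2×2 average pool, and a 1×1 convolution + ReLU standing in for the last convolutional layer). In a real framework you would typically capture the last convolutional layer's output with a forward hook (for example `register_forward_hook` in PyTorch) or by returning it from the model.

```python
import numpy as np

# Hypothetical toy "CNN", for illustration only: a 2x2 average pool, then a
# 1x1 convolution + ReLU (standing in for the last convolutional layer),
# then global average pooling and a linear classifier head.
rng = np.random.default_rng(0)
NUM_MAPS, NUM_CLASSES = 4, 3
CONV_W = rng.normal(size=(NUM_MAPS, 3))            # 1x1 conv: 3 input channels -> K maps
HEAD_W = rng.normal(size=(NUM_CLASSES, NUM_MAPS))  # head: K pooled features -> C logits


def forward(image):
    """Run the toy model on a (3, H, W) image.

    Returns the last-conv-layer activations, shape (K, H/2, W/2),
    and the class logits, shape (C,).
    """
    c, h, w = image.shape
    # 2x2 average pooling (assumes H and W are even).
    pooled = image.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))
    # A 1x1 convolution is per-pixel channel mixing; follow it with ReLU.
    activations = np.maximum(np.einsum("kc,chw->khw", CONV_W, pooled), 0.0)
    # Global average pooling + linear head produce the logits.
    logits = HEAD_W @ activations.mean(axis=(1, 2))
    return activations, logits


image = rng.random((3, 8, 8))
activations, logits = forward(image)
print(activations.shape, logits.shape)  # (4, 4, 4) (3,)
```

Each of the K slices `activations[k]` is one m × n activation map (here 4 × 4) that the following steps operate on.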
Each activation map from the last layer, with shape 1 × m × n, is then upsampled with bilinear interpolation to the size of the input image.
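A minimal NumPy sketch of bilinear upsampling for a single m × n map follows. It assumes the "align corners" convention, where the corner pixels of the input and output grids coincide; library resize functions may use a different pixel-alignment convention, so their values can differ slightly near the borders. The function name `bilinear_upsample` is hypothetical.

```python
import numpy as np


def bilinear_upsample(act_map, out_h, out_w):
    """Upsample a 2-D (m, n) map to (out_h, out_w) with bilinear interpolation,
    aligning the corner pixels of input and output."""
    m, n = act_map.shape
    # Fractional source coordinates for every output row and column.
    ys = np.linspace(0, m - 1, out_h)
    xs = np.linspace(0, n - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, m - 1)
    x1 = np.minimum(x0 + 1, n - 1)
    wy = (ys - y0)[:, None]  # vertical weights, shape (out_h, 1)
    wx = (xs - x0)[None, :]  # horizontal weights, shape (1, out_w)
    # Gather the four neighbours of every output pixel.
    top_left = act_map[np.ix_(y0, x0)]
    top_right = act_map[np.ix_(y0, x1)]
    bottom_left = act_map[np.ix_(y1, x0)]
    bottom_right = act_map[np.ix_(y1, x1)]
    # Interpolate horizontally, then vertically.
    top = top_left * (1 - wx) + top_right * wx
    bottom = bottom_left * (1 - wx) + bottom_right * wx
    return top * (1 - wy) + bottom * wy


small = np.array([[0.0, 1.0],
                  [2.0, 3.0]])
big = bilinear_upsample(small, 3, 3)
# rows of big: [0, 0.5, 1], [1, 1.5, 2], [2, 2.5, 3]
```

For a stack of K maps, apply it per map, e.g. `np.stack([bilinear_upsample(a, H, W) for a in maps])`.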
After upsampling, each activation map is normalized independently so that every pixel lies in the range [0, 1], which preserves the relative intensities between pixels:
$$A^k_{i,j} = \frac{A^k_{i,j} - \min A^k}{\max A^k - \min A^k}$$
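The min-max normalization can be sketched in NumPy as follows, applied to each map independently. The small `eps` term is an added assumption that is not part of the formula; it only prevents division by zero when a map is constant (for example, all zeros after a ReLU).

```python
import numpy as np


def normalize_maps(maps, eps=1e-8):
    """Min-max normalize each map of a (K, H, W) stack to [0, 1] independently."""
    mins = maps.min(axis=(1, 2), keepdims=True)
    maxs = maps.max(axis=(1, 2), keepdims=True)
    # eps guards against division by zero for a constant map.
    return (maps - mins) / (maxs - mins + eps)


maps = np.array([[[2.0, 4.0], [6.0, 10.0]],
                 [[0.0, 0.0], [0.0, 0.0]]])
norm = normalize_maps(maps)
# norm[0] is approximately [[0, 0.25], [0.5, 1.0]]; norm[1] stays all zeros
```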
Once the activation maps are normalized, the regions they highlight are projected onto the input space by multiplying each normalized activation map (1 × W × H) element-wise with the original input image I (3 × W × H), with the map broadcast across the three color channels. This yields a masked image $M^k$ of shape 3 × W × H:
$$M^k = A^k \odot I$$
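In NumPy, this element-wise product is a broadcasting operation: adding a channel axis to each normalized map lets a single multiplication produce all K masked images at once. Random arrays stand in for a real image and real activation maps here, and the code uses the common (channels, height, width) ordering.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((3, 8, 8))      # original input image I, shape (3, H, W)
norm_maps = rng.random((4, 8, 8))  # K = 4 normalized, upsampled maps in [0, 1]

# Insert a channel axis so each map has shape (1, H, W); broadcasting then
# multiplies every color channel of I by the same map, element-wise.
masked = norm_maps[:, None, :, :] * image[None, :, :, :]  # shape (K, 3, H, W)
print(masked.shape)  # (4, 3, 8, 8)
```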
Each masked image $M^k$ is then passed through the same CNN, F, and a softmax is applied to its output:
$$S^k = \mathrm{Softmax}\left(F(M^k)\right)$$
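The scoring step can be sketched as follows. `F` here is a hypothetical stand-in for the CNN (per-channel average pooling followed by a random linear layer), chosen only so the example runs; in Score-CAM it is the same trained network used for the first forward pass. Subtracting the row maximum before exponentiating is a standard trick that leaves the softmax unchanged but avoids overflow.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


# Hypothetical stand-in for the CNN F: global average pooling over each
# color channel followed by a random linear layer (3 channels -> C classes).
rng = np.random.default_rng(0)
NUM_CLASSES = 5
W = rng.normal(size=(NUM_CLASSES, 3))


def F(batch):
    """Map a batch of images, shape (K, 3, H, W), to logits, shape (K, C)."""
    return batch.mean(axis=(2, 3)) @ W.T


masked = rng.random((4, 3, 8, 8))  # K = 4 masked images M^k
S = softmax(F(masked))             # S[k] is the class distribution for M^k
print(S.shape)                     # (4, 5)
```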
After obtaining the scores for every class, we take the score of the target class c as the importance of the k-th activation map:
$$w^c_k = S^k_c$$
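With the scores for all K masked images stacked into a K × C matrix, extracting the weights is a single column lookup. The scores below are made-up numbers used for illustration only.

```python
import numpy as np

# Made-up softmax outputs for K = 3 masked images over C = 4 classes;
# row k is S^k = Softmax(F(M^k)).
S = np.array([
    [0.10, 0.60, 0.20, 0.10],
    [0.25, 0.25, 0.25, 0.25],
    [0.70, 0.05, 0.05, 0.20],
])
target_class = 1

# w^c_k = S^k_c: the target-class column gives one weight per activation map.
weights = S[:, target_class]
# weights for maps k = 0, 1, 2: 0.60, 0.25, 0.05
```

In this made-up example the first activation map gets the largest weight, because the image masked by it keeps the most evidence for class 1.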
The saliency map is then a linear combination of the activation maps, each weighted by its target-class score and summed over all k. Finally, a pixel-wise ReLU is applied to the combined map:
$$L^c_{\text{Score-CAM}} = \mathrm{ReLU}\left(\sum_k w^c_k A^k\right)$$
The ReLU is applied because we are interested only in the features that have a positive influence on the class of interest.
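Given the weights $w^c_k$ and the normalized, upsampled maps $A^k$, the final combination and ReLU take only a couple of NumPy calls. `score_cam_combine` is a hypothetical helper name, and the small maps below are made up so the result is easy to check by hand.

```python
import numpy as np


def score_cam_combine(weights, maps):
    """Weighted sum of activation maps followed by a pixel-wise ReLU.

    weights: shape (K,), the target-class scores w^c_k.
    maps:    shape (K, H, W), the normalized, upsampled activation maps A^k.
    Returns the saliency map L^c, shape (H, W).
    """
    # Contract over k: sum_k w_k * A^k[i, j] for every pixel (i, j).
    combined = np.tensordot(weights, maps, axes=1)
    return np.maximum(combined, 0.0)  # ReLU keeps only positive evidence


weights = np.array([0.6, 0.25, 0.05])
maps = np.array([[[1.0, 0.0], [0.0, 0.0]],
                 [[0.0, 1.0], [0.0, 0.0]],
                 [[0.0, 0.0], [1.0, 1.0]]])
saliency = score_cam_combine(weights, maps)
# saliency is [[0.6, 0.25], [0.05, 0.05]]
```

With softmax weights and maps normalized to [0, 1], every term in the sum is non-negative, so the ReLU only changes the result when the weights or maps can take negative values.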
Advantages
Unlike CAM, it can be applied to any convolutional neural network architecture without retraining the model to produce saliency maps.
It is class-discriminative: the saliency map highlights the regions relevant to the chosen target class.
It removes irrelevant noise, producing a cleaner and more meaningful saliency map.
It uses softmax scores as weights instead of gradients, which removes the dependence on unstable gradients.