Multimodal Explanation
- The visual explanation was created by an attention mechanism that conveyed knowledge about what region of the image was important for the decision. This explanation guides the generation of the textual justification out of a LSTM feature, which is a prediction of a classification problem over all possible justifications.