Grad-CAM, ViTs, model decisions, model visualization, deep learning
Abstract
As deep learning advances rapidly, model interpretability has become a priority for understanding and improving the decision-making processes of complex neural networks. As these models grow deeper and more sophisticated, ensuring transparency and interpretability becomes vital, particularly in applications such as medical image analysis, autonomous driving, and security systems. Gradient-weighted Class Activation Mapping (Grad-CAM) is a powerful tool for visualizing these decisions. The rationale behind this work stems from the growing concern over the interpretability of deep learning models, with the aim of systematically evaluating how different models attend to specific image regions during classification. In this study, three deep neural network architectures, Convolutional Neural Networks (CNNs), Vision Transformers (ViTs), and the Swin Transformer, were used for image classification, and Grad-CAM was applied to generate heatmaps highlighting the input regions most important to each model's decision. The experimental results demonstrate that Grad-CAM improves the interpretability of these deep networks irrespective of the underlying architecture. This work also extends Grad-CAM, demonstrating its capacity to provide insight into model behavior and to support the development of more interpretable AI models.
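For reference, the sketch below outlines how a Grad-CAM heatmap can be computed for a CNN classifier, assuming a PyTorch/torchvision setup; the ResNet-50 backbone, target layer, and preprocessing are illustrative assumptions, not the exact configuration used in this study.

```python
# Minimal Grad-CAM sketch (assumes PyTorch + torchvision; the ResNet-50 backbone
# and target layer are illustrative, not the paper's exact setup).
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
target_layer = model.layer4[-1]          # last conv block: spatial feature maps

activations, gradients = {}, {}

def fwd_hook(module, inputs, output):
    activations["value"] = output.detach()

def bwd_hook(module, grad_input, grad_output):
    gradients["value"] = grad_output[0].detach()

target_layer.register_forward_hook(fwd_hook)
target_layer.register_full_backward_hook(bwd_hook)

def grad_cam(image, class_idx=None):
    """Return an [H, W] heatmap in [0, 1] for the predicted (or given) class."""
    logits = model(image)                              # image: [1, 3, 224, 224]
    if class_idx is None:
        class_idx = logits.argmax(dim=1).item()
    model.zero_grad()
    logits[0, class_idx].backward()                    # gradients w.r.t. target layer

    acts = activations["value"]                        # [1, C, h, w]
    grads = gradients["value"]                         # [1, C, h, w]
    weights = grads.mean(dim=(2, 3), keepdim=True)     # global-average-pooled gradients
    cam = F.relu((weights * acts).sum(dim=1, keepdim=True))  # weighted sum + ReLU
    cam = F.interpolate(cam, size=image.shape[-2:],
                        mode="bilinear", align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1]
    return cam[0, 0]

# Usage: heatmap = grad_cam(preprocessed_image_tensor); overlay it on the input image.
```

For transformer backbones such as ViT or Swin, the same weighted-gradient principle applies, but the hooked layer and the reshaping of token features into a spatial map differ by architecture.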