Malware classification is a critical problem in cybersecurity, made difficult by the complexity and diversity of malware variants. In this study, we propose a novel approach that transforms bytecode into image representations and employs the Vision Transformer (ViT) architecture for malware family classification. The proposed preprocessing method preserves essential structural information of the malware while simplifying feature extraction. ViT uses self-attention to model complex, long-range dependencies across the entire image, which gives it an advantage over traditional CNN-based models whose convolutions capture mainly local patterns. Experiments on the Microsoft Malware Classification Challenge dataset show that the proposed model achieves high accuracy and F1-scores, particularly for malware families such as Kelihos_ver3 and Lollipop. Confusion matrix analysis reveals strong discriminative capability across malware families, while also exposing difficulties in distinguishing families with structurally similar or heavily obfuscated patterns. The study also discusses current limitations, including computational cost and the lack of integration of dynamic behavioral data, and outlines future research directions to improve performance and real-world applicability. Overall, the results demonstrate the potential of Vision Transformer architectures for malware classification and point to a promising avenue for further research in cybersecurity.
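To make the bytecode-to-image idea concrete, the following is a minimal sketch of one common way to do it. It is not the authors' exact preprocessing. It assumes that each byte becomes one grayscale pixel and that rows have a fixed width of 256 (a hypothetical choice), with zero padding for the last row. It then splits the image into non-overlapping 16x16 patches, the token format a ViT consumes. The helper names `bytes_to_image` and `to_patches` are illustrative only. The Microsoft challenge `.bytes` files are hex dumps and would first need to be parsed into raw bytes, which is not shown here.

```python
import numpy as np


def bytes_to_image(data: bytes, width: int = 256) -> np.ndarray:
    """Reshape a raw byte sequence into a 2D grayscale image (one byte per pixel).

    Assumption: fixed row width; the final row is zero-padded.
    """
    arr = np.frombuffer(data, dtype=np.uint8)
    height = -(-len(arr) // width)  # ceiling division
    padded = np.zeros(height * width, dtype=np.uint8)
    padded[: len(arr)] = arr
    return padded.reshape(height, width)


def to_patches(img: np.ndarray, patch: int = 16) -> np.ndarray:
    """Split an image into non-overlapping patch x patch tiles, one flattened row per tile.

    Rows are zero-padded to a multiple of `patch`; the width must already be divisible.
    """
    h, w = img.shape
    if w % patch:
        raise ValueError("image width must be a multiple of the patch size")
    img = np.pad(img, ((0, (-h) % patch), (0, 0)))  # constant (zero) padding by default
    h = img.shape[0]
    tiles = img.reshape(h // patch, patch, w // patch, patch)
    tiles = tiles.transpose(0, 2, 1, 3)  # (grid_rows, grid_cols, patch, patch)
    return tiles.reshape(-1, patch * patch)


if __name__ == "__main__":
    sample = bytes(range(256)) * 3 + b"\x90" * 10  # 778 synthetic bytes
    img = bytes_to_image(sample)
    tokens = to_patches(img)
    print(img.shape, tokens.shape)
```

With 778 input bytes, the image is 4 x 256. Padding it to 16 rows yields a 1 x 16 grid of patches, so there are 16 tokens of 256 values each. In practice, ViT pipelines often resize such images to a fixed resolution (e.g. 224 x 224) before patching so that every sample has the same number of tokens. That design choice trades exact byte layout for a uniform input size.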