Cybersecurity has become crucial in daily life because of the many attacks carried out by malicious users. Over the years, researchers and practitioners have proposed several solutions that leverage artificial intelligence to counter these threats. This work proposes a machine-learning detector that classifies Windows executable files as malware, ransomware, or trusted. As a first step, we built a dataset of approximately 15,000 Portable Executable (PE) files, extracted the opcodes from each file, and computed feature vectors such as the frequency and distribution of each opcode per file. We then trained and evaluated multiple classifiers; Gradient Boosting achieved the highest accuracy, 0.870. To assess robustness, we performed 5-fold cross-validation. We also selected the two best models and applied Local Interpretable Model-Agnostic Explanations (LIME) to better understand which features were most relevant to a specific classification. Finally, we analyzed the most frequently used opcode classes to aid classification.
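The per-file opcode frequency features mentioned in the abstract can be illustrated with a minimal sketch. It assumes the opcodes have already been extracted from a file as a list of mnemonics; the vocabulary `VOCABULARY` and the helper `opcode_features` are hypothetical names chosen for illustration, not the authors' actual feature set or code.

```python
from collections import Counter

# Hypothetical opcode vocabulary; the feature set used in the paper is not given here.
VOCABULARY = ["mov", "push", "call", "jmp", "xor"]


def opcode_features(opcodes, vocabulary=VOCABULARY):
    """Return (counts, relative_frequencies) over a fixed opcode vocabulary.

    `opcodes` is the list of mnemonics already extracted from one file,
    for example by a disassembler (that step is outside this sketch).
    """
    counts = Counter(op.lower() for op in opcodes)
    total = sum(counts.values())
    count_vector = [counts[op] for op in vocabulary]
    # Relative frequency over all opcodes in the file, including those
    # outside the vocabulary; an empty file yields all zeros.
    freq_vector = [c / total if total else 0.0 for c in count_vector]
    return count_vector, freq_vector


sample = ["push", "mov", "mov", "call", "xor", "mov", "ret", "push"]
count_vector, freq_vector = opcode_features(sample)
print(count_vector)  # [3, 2, 1, 0, 1]
print(freq_vector)   # [0.375, 0.25, 0.125, 0.0, 0.125]
```

Vectors of this kind, one per file, would form the rows of the feature matrix on which classifiers are trained and cross-validated.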
Explainable ransomware detection through static analysis and machine learning / Ciaramella, Giovanni; Martinelli, Fabio; Santone, Antonella; Mercaldo, Francesco. - (2025), pp. 91-98. (CSR 2025 - 5th IEEE International Conference on Cyber Security and Resilience, Chania, Greece, 4-6 August 2025) [10.1109/csr64739.2025.11130044].
Explainable ransomware detection through static analysis and machine learning
Ciaramella Giovanni
2025
| File | Description | Type | License | Size | Format |
|---|---|---|---|---|---|
| Explainable_Ransomware_Detection_through_Static_Analysis_and_Machine_Learning.pdf (not available) | Explainable Ransomware Detection through Static Analysis and Machine Learning | Publisher's version (PDF) | Publisher's copyright | 721.3 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

