Explainable ransomware detection through static analysis and machine learning



Explainable ransomware detection through static analysis and machine learning / Ciaramella, G., Martinelli, F., Santone, A., Mercaldo, F. - (2025), pp. 91-98. (CSR 2025 - 5th IEEE International Conference on Cyber Security and Resilience, Chania, Greece, 4-6 August 2025) [10.1109/csr64739.2025.11130044].


Ciaramella Giovanni;Martinelli Fabio;Santone Antonella;Mercaldo Francesco

2025

Abstract

Cybersecurity has become a daily concern because of attacks carried out by malicious users. Over the years, researchers and practitioners have proposed several solutions that leverage artificial intelligence to counter these threats. This research proposes a machine learning detector that classifies Windows executable files as malware, ransomware, or trusted. As a first step, we created a dataset of approximately 15,000 Portable Executable files, extracted the opcodes from each file, and computed feature vectors such as the frequency and distribution of each opcode. We then trained and evaluated multiple classifiers; Gradient Boosting achieved the highest accuracy, 0.870. To assess robustness, we performed 5-fold cross-validation. We also selected the two best models and applied Local Interpretable Model-Agnostic Explanations (LIME) to better understand which features were most relevant to a specific classification. Finally, we analyzed the most frequently used opcode classes to aid in classification.
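The opcode feature step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the opcodes have already been extracted from a file by a disassembler, it uses a small hypothetical vocabulary (the record does not list the opcodes considered), and it interprets "frequency" as a raw count and "distribution" as a count normalized over the vocabulary.

```python
from collections import Counter

# Hypothetical opcode vocabulary, chosen only for illustration;
# the record does not state which opcodes the authors used.
VOCABULARY = ["mov", "push", "pop", "call", "jmp", "xor"]


def opcode_features(opcodes, vocabulary=VOCABULARY):
    """Return (raw counts, relative frequencies) for each vocabulary opcode.

    Assumption: `opcodes` is a list of mnemonic strings already extracted
    from one executable. Opcodes outside the vocabulary are ignored.
    """
    counts = Counter(op.lower() for op in opcodes)
    raw = [counts[op] for op in vocabulary]
    total = sum(raw)
    # Guard against files that contain no vocabulary opcodes at all.
    rel = [c / total if total else 0.0 for c in raw]
    return raw, rel


# Toy opcode sequence standing in for one disassembled file.
sample = ["push", "mov", "mov", "call", "xor", "MOV", "pop", "ret"]
raw, rel = opcode_features(sample)
print(raw)
print([round(r, 3) for r in rel])
```

Vectors of this kind, one per file, would then form the rows of the table on which the classifiers are trained and cross-validated.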


Year: 2025
ISBN: 979-8-3315-3591-9
OpenAlex ID: W4413680158
Keywords: Artificial intelligence; Cybersecurity; Explainability; Machine Learning; Malware detection
Appears in types: 4.1 Conference proceedings contribution

Files in this record:

File	Description	Type	License	Size	Format
Explainable_Ransomware_Detection_through_Static_Analysis_and_Machine_Learning.pdf (not available)	Explainable Ransomware Detection through Static Analysis and Machine Learning	Publisher's version (PDF)	Publisher's copyright	721.3 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11771/39642
