Obfuscation-resistant feature extraction for macro-based office malware detection

Over the years, the widespread adoption of the Microsoft (MS) Office suite, a productivity tool used by millions of people worldwide, has attracted attackers intent on exploiting its vulnerabilities. The best-known attack vector is documents containing macros, which are built in increasingly complex ways to keep hidden malicious behavior from being detected. In this context, this paper presents a novel technique that uses Large Language Models (LLMs) to extract a set of linguistic features that could reveal the presence of malicious code embedded within macros, even when the code is obfuscated. The experimental evaluation, conducted on a publicly available dataset of MS Office files, indicates that the proposed system achieves robust detection of obfuscated malicious macros. The performance of a lighter, purely statistical method is also evaluated, offering analysts a choice between a high-precision, resource-intensive model and a more time-efficient alternative.

Obfuscation-resistant feature extraction for macro-based office malware detection / Imperiale, Sergio; Morana, Marco; Lo Re, Giuseppe. - 4198:(2026). (ITASEC & SERICS 2026 - Joint National Conference on Cybersecurity, Cagliari, Italy, 09-13/02/2026).

Imperiale, Sergio; Morana, Marco; Lo Re, Giuseppe

2026

Year: 2026
Keywords: Macros security, Malware detection, Machine learning
Appears in types: 4.1 Contribution in conference proceedings

Files in this record:

File	Size	Format
Obfuscation_Resistant_Feature_Extraction_for_Macro_based_Office_Malware_Detection.pdf (open access; Description: Obfuscation-Resistant Feature Extraction for Macro-based Office Malware Detection; Type: Publisher's version (PDF); License: Creative Commons)	2.84 MB	Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11771/40698
