Obfuscation-resistant feature extraction for macro-based office malware detection / Imperiale, Sergio; Morana, Marco; Lo Re, Giuseppe. - 4198:(2026). (ITASEC & SERICS 2026 - Joint National Conference on Cybersecurity, Cagliari, Italy, 09-13/02/2026).

Obfuscation-resistant feature extraction for macro-based office malware detection

Imperiale Sergio
2026

Abstract

Over the years, the widespread adoption of the Microsoft (MS) Office suite, a productivity tool used by millions of people worldwide, has attracted attackers interested in exploiting its vulnerabilities. The best-known attack vector is the document containing macros, whose construction is becoming increasingly complex in order to hide malicious behavior and avoid detection. In this context, this paper presents a novel technique that uses Large Language Models (LLMs) to extract a set of linguistic features that could reveal the presence of malicious code embedded within macros, even when the code is obfuscated. The experimental evaluation, conducted on a publicly available dataset of MS Office files, indicates that the proposed system achieves robust detection of obfuscated malicious macros. Moreover, the performance of a lighter, purely statistical method is also evaluated, so as to offer analysts a choice between a high-precision, resource-intensive model and a more time-efficient alternative.
Keywords: Macros security, Malware detection, Machine learning
Files in this item:
Obfuscation_Resistant_Feature_Extraction_for_Macro_based_Office_Malware_Detection.pdf (open access)

Description: Obfuscation-Resistant Feature Extraction for Macro-based Office Malware Detection
Type: Publisher's version (PDF)
License: Creative Commons
Size: 2.84 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11771/40698