Enhancing malware classification with symbolic features

Costa G.; de Nicola R.
2021-01-01

Abstract

Malware is a constant threat to the security of devices and users, and successful, automatic malware detection is a critical necessity [1]. Existing malware detection solutions cannot accurately characterize malware behavior and therefore rely on other indicators, e.g., digital signatures. Nevertheless, behavior-based detection is an active field of research because it can deal with zero-day malware. Although many proposals leveraging machine learning (ML) classifiers have been put forward, finding proper behavioral features is still an open problem. Existing solutions typically consider either static or dynamic software features: static features are derived from the program syntax, while dynamic features are observed at runtime. However, both suffer from limitations that affect the effectiveness of ML classification. Here we follow a different approach. We use symbolic execution to model features that denote malware behavior more precisely. To this end, we introduce a novel feature specification language called Symbolic Feature Specification Language (SFSL). Each SFSL rule precisely models a specific malicious behavior that has been documented in past malware samples. Then, we apply local, bounded symbolic exploration to establish whether a binary under analysis matches the defined rules. Finally, the result of the rule matching process is used to generate feature vectors for an ML classifier. Our current experiments with different ML classifiers show that this technique can improve classification accuracy. Moreover, since behavioral features do not depend on the program syntax, our methodology can even detect threats in new malware samples.
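
The last step of the pipeline, turning rule-matching results into feature vectors, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the rule names, the sample names, and the choice of one binary entry per rule are all assumptions made for illustration, since the abstract does not specify the SFSL rule set or the vector encoding.

```python
from typing import Dict, List, Set

# Hypothetical rule names; the abstract does not list the actual SFSL rules.
RULES: List[str] = ["self_replication", "registry_persistence", "anti_debug_check"]


def to_feature_vector(matched: Set[str], rules: List[str] = RULES) -> List[int]:
    """Return one 0/1 entry per rule, in a fixed rule order (assumed encoding)."""
    return [1 if rule in matched else 0 for rule in rules]


def build_dataset(matches: Dict[str, Set[str]]) -> Dict[str, List[int]]:
    """Map each analyzed binary to its feature vector."""
    return {name: to_feature_vector(found) for name, found in matches.items()}


# Hypothetical output of the rule matching step: binary name -> matched rules.
matches = {
    "sample_a.bin": {"registry_persistence", "anti_debug_check"},
    "sample_b.bin": set(),
}
dataset = build_dataset(matches)
```

Vectors of this shape could then be passed to any standard ML classifier. Keeping the rule order fixed ensures that each position in the vector always refers to the same behavior.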
ML based malware detection
Static program analysis
Symbolic program features

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11771/21018