Does Asm2Vec Reduce Drift on Malware Classification?
Rocha, Rafael; Rosa, Stefano de; Castagno, Paolo; Drago, Idilio
2023-01-01
Abstract
Asm2Vec is an algorithm that learns representations of binary files using word embedding techniques. Researchers have employed this approach for binary analysis as well as malware classification. Malware classification is, however, known to be widely affected by drift, i.e., models built to identify a particular malware family rapidly become obsolete. We ask whether representation learning approaches such as Asm2Vec help reduce the impact of drift in malware classification. To answer this question, we design an experiment using two public malware datasets and train classic machine learning models with (i) static features extracted from malware headers and (ii) features obtained using Asm2Vec. Our results show little difference in the effect of drift between the two feature sets, and the classifiers trained with Asm2Vec features achieve worse classification performance. We provide initial insights into the effects of representation learning on drift in malware classification.

File | Size | Format
---|---|---
SBSEG_27207-877-22277-1-10-20240118.pdf (open access; postprint, author's final version) | 295.33 kB | Adobe PDF