Does Asm2Vec Reduce Drift on Malware Classification?

Rocha, Rafael; Rosa, Stefano de; Castagno, Paolo; Drago, Idilio; Pereira Junior, Lourenço Alves
2023-01-01

Abstract

Asm2Vec is an algorithm that learns representations of binary files using word embedding techniques. Researchers have employed this approach for binary analysis as well as malware classification. Malware classification is, however, known to be strongly affected by drift, i.e., models built to identify a particular malware family rapidly become obsolete. We ask whether representation learning approaches such as Asm2Vec help reduce the impact of drift in malware classification. To answer this question, we design an experiment using two public malware datasets and train classic machine learning models with (i) static features extracted from malware headers and (ii) features obtained using Asm2Vec. Our results show that the two feature sets differ little in how they are affected by drift, and that the classifiers trained with Asm2Vec features achieve worse classification performance. We provide initial insights into the effects of representation learning on drift in malware classification.
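To make the notion of drift concrete, the following is a minimal sketch of a time-aware evaluation: a model is fitted on samples from an initial period and then scored on later periods. It is not the authors' pipeline. The data are synthetic 2-D points, a nearest-centroid classifier stands in for the "classic machine learning models", and all names (`make_period`, `fit_centroids`, `evaluate_drift`, `drift_per_period`) are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def make_period(period, n_per_class=50, drift_per_period=1.0, seed=0):
    """Generate synthetic samples for one time period (assumption: toy data).

    Class 0 stays centred at (0, 0); class 1 starts at (4, 4) and moves
    toward class 0 by `drift_per_period` per period, imitating drift.
    """
    rng = random.Random(seed + period)
    shift = 4.0 - drift_per_period * period
    samples = []
    for _ in range(n_per_class):
        samples.append(((rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)), 0))
        samples.append(((rng.gauss(shift, 0.5), rng.gauss(shift, 0.5)), 1))
    return samples


def fit_centroids(samples):
    """Fit a nearest-centroid classifier: one mean point per class."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for (x, y), label in samples:
        s = sums[label]
        s[0] += x
        s[1] += y
        s[2] += 1
    return {label: (s[0] / s[2], s[1] / s[2]) for label, s in sums.items()}


def predict(centroids, point):
    """Return the label whose centroid is closest to `point`."""
    return min(
        centroids,
        key=lambda lab: (point[0] - centroids[lab][0]) ** 2
        + (point[1] - centroids[lab][1]) ** 2,
    )


def accuracy(centroids, samples):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(predict(centroids, p) == label for p, label in samples)
    return correct / len(samples)


def evaluate_drift(n_periods=4):
    """Train on period 0 only, then score the frozen model on later periods."""
    model = fit_centroids(make_period(0, seed=0))
    return [accuracy(model, make_period(t, seed=100)) for t in range(1, n_periods + 1)]


if __name__ == "__main__":
    for t, acc in enumerate(evaluate_drift(), start=1):
        print(f"period {t}: accuracy = {acc:.2f}")
```

Comparing two feature sets in this style would mean running the same loop once per feature representation and comparing how quickly each accuracy curve decays over the later periods.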
Year: 2023
Conference: XXIII Simpósio Brasileiro de Segurança da Informação e de Sistemas Computacionais
Location: Juiz de Fora, Brazil
Conference date: 18/09/2023
Proceedings: Anais do XXIII Simpósio Brasileiro de Segurança da Informação e de Sistemas Computacionais
Publisher: Sociedade Brasileira de Computação
Pages: 195-208
Rocha, Rafael; Rosa, Stefano de; Castagno, Paolo; Drago, Idilio; Pereira Junior, Lourenço Alves
Files in this item:
File: SBSEG_27207-877-22277-1-10-20240118.pdf
Access: Open access
File type: Postprint (author's final version)
Size: 295.33 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/2007475