Honeypots are active sensors deployed to obtain information about attacks. In their search for vulnerabilities, attackers generate large volumes of logs, whose analysis is time consuming and cumbersome. We here evaluate whether Natural Language Processing (NLP) approaches can provide meaningful representations to find common traits in attackers' activity. We consider a widely used SSH/Telnet honeypot to record more than 200 000 sessions, including 61 000 unique shell scripts, some containing sequences of more than 100 Bash commands. We first parse the sessions to separate Bash commands, options and parameters. Next, we project each session in a metric space opposing two common tools used in NLP: Bag of Words and Word2Vec. Last, we leverage a clustering algorithm to aggregate the sessions while offering an instrumental representation of the clustering process. In the end, we obtain few tens of clusters that we analyze to explain the attackers' goals, i.e., obtain system information, inject malicious accounts, download and run executables, etc. Our work is a first step towards automatically identifying attack patterns on honeypots, thus effectively supporting security activities.

Towards NLP-based Processing of Honeypot Logs

Drago, I;
2022-01-01

Abstract

Honeypots are active sensors deployed to obtain information about attacks. In their search for vulnerabilities, attackers generate large volumes of logs, whose analysis is time consuming and cumbersome. We here evaluate whether Natural Language Processing (NLP) approaches can provide meaningful representations to find common traits in attackers' activity. We consider a widely used SSH/Telnet honeypot to record more than 200 000 sessions, including 61 000 unique shell scripts, some containing sequences of more than 100 Bash commands. We first parse the sessions to separate Bash commands, options and parameters. Next, we project each session in a metric space opposing two common tools used in NLP: Bag of Words and Word2Vec. Last, we leverage a clustering algorithm to aggregate the sessions while offering an instrumental representation of the clustering process. In the end, we obtain few tens of clusters that we analyze to explain the attackers' goals, i.e., obtain system information, inject malicious accounts, download and run executables, etc. Our work is a first step towards automatically identifying attack patterns on honeypots, thus effectively supporting security activities.
2022
IEEE European Symposium on Security and Privacy Workshops (EuroS&PW)
Genova, Italy
06-10 June 2022
Proceedings of the 2022 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW)
IEEE
314
321
978-1-6654-9560-8
Honeypots; NLP; Word2Vec; Bag of Words
Boffa, M; Milan, G; Vassio, L; Drago, I; Mellia, M; Ben Houidi, Z
File in questo prodotto:
File Dimensione Formato  
0_main.pdf

Accesso riservato

Descrizione: Articolo
Tipo di file: POSTPRINT (VERSIONE FINALE DELL’AUTORE)
Dimensione 738.47 kB
Formato Adobe PDF
738.47 kB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2318/1876801
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 7
  • ???jsp.display-item.citation.isi??? 4
social impact