
Explainable Stacking Models based on Complementary Traffic Embeddings


Abstract

Network security relies on effective measurements and analysis for identifying malicious traffic. Recent proposals aim at automatically learning compact and informative representations (i.e., embeddings) of network traffic that capture salient features. These representations can serve multiple downstream tasks, streamlining the machine learning pipeline. Researchers have proposed techniques borrowed from Natural Language Processing (NLP) and Graph Neural Networks (GNN) to learn such embeddings, with both lines delivering promising results. This paper investigates the benefits of combining complementary sources of information represented by embeddings learnt via different techniques and from different data. We rely on classifiers based on traditional feature engineering and on automatic embedding generation (borrowing from NLP and GNN) to classify hosts observed from darknets and honeypots. We then stack these base classifiers, each trained on one embedding, through meta-learning to combine the complementary information sources and improve performance. Our results show that meta-learning outperforms each single classifier. Importantly, the proposed meta-learner provides explainability on the importance of the embedding types and the impact of each data source on the outcome. All in all, this work is a step forward in the search for more effective, general, understandable, and practical representations that could carry multiple traffic characteristics.
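The stacking approach the abstract describes — training one base classifier per embedding view and combining their predictions with a meta-learner — can be sketched with scikit-learn's `StackingClassifier`. This is a minimal illustration, not the paper's implementation: the synthetic data and the two base models stand in for the paper's NLP/GNN embeddings and per-embedding classifiers, which are not included in this record.

```python
# Hedged sketch of model stacking with a linear meta-learner, using
# synthetic data as a stand-in for the paper's darknet/honeypot embeddings.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for host features; in the paper each base model
# would instead see a different embedding of the same hosts.
X, y = make_classification(n_samples=500, n_features=40, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One base classifier per "view" (names are illustrative).
base = [
    ("view_a", make_pipeline(StandardScaler(), LogisticRegression())),
    ("view_b", RandomForestClassifier(n_estimators=100, random_state=0)),
]

# A linear meta-learner keeps its coefficients inspectable, echoing the
# paper's point that stacking can expose the importance of each source.
stack = StackingClassifier(estimators=base,
                           final_estimator=LogisticRegression())
stack.fit(X_train, y_train)
print(f"stacked accuracy: {stack.score(X_test, y_test):.2f}")
```

After fitting, `stack.final_estimator_.coef_` gives one weight per base-model output, which is one simple way to read off how much each information source contributes to the final decision.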
2024
IEEE European Symposium on Security and Privacy Workshops (EuroS&PW)
Vienna, Austria
08-12 July 2024
Proceedings of the 2024 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW)
IEEE COMPUTER SOC
pp. 261-272
Representation learning; traffic classification; meta-learning; model stacking
Gioacchini, Luca; Santos, Welton; Lopes, Barbara; Drago, Idilio; Mellia, Marco; Almeida, Jussara M.; Gonçalves, Marcos André
Open-access preprint file: 2024_WTMC_Stacking.pdf (Adobe PDF, 581.23 kB)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/2031935
Citations
  • PubMed Central: ND
  • Scopus: 0
  • Web of Science: 0