On Using LLMs for Vulnerability Classification
Talibzade, Rustam; Drago, Idilio; Bergadano, Francesco
2025-01-01
Abstract
Vulnerability management requires identifying, classifying, and prioritizing security threats. Recent research has explored using Large Language Models (LLMs) to analyze Common Vulnerabilities and Exposures (CVEs), generating metadata to categorize vulnerabilities (e.g., into CWEs) and to determine severity ratings. This has led some studies to use CVE datasets as benchmarks for LLM-based threat analysis. We reproduce and extend one such benchmark, testing three approaches: (i) TF-IDF embeddings with shallow classifiers, (ii) LLM-generated embeddings with shallow classifiers, and (iii) direct prompting of LLMs for vulnerability metadata extraction. For the latter, we replicate the exact prompts from a recent benchmark and evaluate multiple state-of-the-art LLMs. Results show that classic TF-IDF classifiers still win the benchmark, followed by the generative approach. The best model (TF-IDF) achieves 74% accuracy on the classification task. This appears to be caused by the heavily schematic text of CVEs, where keywords already determine the key characteristics of a vulnerability. General-purpose LLMs with generic prompts fail to capture that. These results call for more careful community evaluation of LLM applications to cybersecurity problems.
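
Approach (i) from the abstract can be approximated with a standard text-classification pipeline. The sketch below is illustrative only: the dataset file, column names, classifier choice (a linear SVM), and hyperparameters are assumptions for a generic CVE-description-to-CWE task, not the authors' actual setup.

```python
# Minimal sketch of approach (i): TF-IDF features + a shallow classifier
# mapping CVE description text to CWE categories.
# "cve_dataset.csv" and its columns are hypothetical placeholders.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

# One CVE description per row, labeled with its CWE identifier.
df = pd.read_csv("cve_dataset.csv")  # assumed columns: "description", "cwe"
X_train, X_test, y_train, y_test = train_test_split(
    df["description"], df["cwe"],
    test_size=0.2, random_state=42, stratify=df["cwe"],
)

# TF-IDF converts the schematic CVE texts into sparse keyword weights;
# a shallow linear classifier then separates the CWE classes.
clf = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True, ngram_range=(1, 2), min_df=2)),
    ("svm", LinearSVC()),
])
clf.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```

Because CVE descriptions follow a schematic template, the discriminative keywords that TF-IDF weights heavily often map almost directly to CWE categories, which is consistent with the abstract's explanation for why this baseline remains competitive.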