
Evaluating large language models on Italian tasks

Bernardo Magnini; Marco Madeddu; Viviana Patti
2025-01-01

Abstract

The rapid advancement of Large Language Models (LLMs) has highlighted the need for robust tools to evaluate them reliably. A major challenge in developing models that serve non-English speakers lies in the predominance of benchmarks that are either in English or machine-translated from it. Evaluating the performance of multilingual or language-specific models requires native-language resources. In this paper, we present EVALITA-LLM, a benchmark entirely composed of datasets in native Italian and designed to assess the capabilities of LLMs. The benchmark consists of 10 tasks that cover key aspects of NLP. We also provide prompts for all tasks, designed according to specific criteria. To mitigate prompt sensitivity, the evaluation of the models considers different methodologies for combining the scores obtained on different prompts.
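To illustrate the abstract's last point, the following minimal Python sketch shows one way per-prompt scores could be combined into a single task score. The aggregation strategies ("mean", "best", "worst") and all names here are illustrative assumptions, not the methodology actually defined in the EVALITA-LLM paper.

    from statistics import mean

    def aggregate_prompt_scores(scores, method="mean"):
        # `scores` holds one metric value (e.g., accuracy) per prompt template.
        # NOTE: these aggregation strategies are hypothetical examples,
        # not the procedure used in the EVALITA-LLM paper.
        if method == "mean":    # average over all prompts
            return mean(scores)
        if method == "best":    # score of the best-performing prompt
            return max(scores)
        if method == "worst":   # most pessimistic estimate
            return min(scores)
        raise ValueError(f"unknown aggregation method: {method}")

    # Example: one model's accuracy on one task under five different prompts
    per_prompt = [0.71, 0.64, 0.69, 0.73, 0.66]
    print(aggregate_prompt_scores(per_prompt, "mean"))   # 0.686
    print(aggregate_prompt_scores(per_prompt, "best"))   # 0.73

Averaging rewards models that are robust across prompt formulations, while the best-prompt score measures peak capability; reporting more than one aggregate helps separate prompt sensitivity from task ability.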
Year: 2025
Venue: Thematic Workshops at Ital-IA 2025, Trieste, Italy, June 23-24, 2025
Published in: Joint Proceedings of the Thematic Workshops at Ital-IA 2025, co-located with the 5th National Conference on Artificial Intelligence, organized by CINI (Ital-IA 2025)
Series: CEUR Workshop Proceedings, Vol. 4121
Pages: 1-6
URL: https://ceur-ws.org/Vol-4121/Ital-IA_2025_paper_112.pdf
Keywords: Benchmark, Italian, Evaluation, Large Language Models
Authors: Bernardo Magnini, Roberto Zanoli, Michele Resta, Martin Cimmino, Paolo Albano, Marco Madeddu, Viviana Patti
Files in this product:
Ital-IA_2025_paper_112.pdf (publisher's PDF, open access, Adobe PDF, 950.75 kB)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/2121312
Citations
  • PMC: ND
  • Scopus: ND
  • Web of Science: ND