Evaluating large language models on Italian tasks
Bernardo Magnini; Marco Madeddu; Viviana Patti
2025-01-01
Abstract
The rapid advancement of Large Language Models (LLMs) has highlighted the need for robust tools to evaluate them correctly. A major challenge in developing models that serve non-English speakers lies in the predominance of benchmarks that are either in English or machine-translated from English. Evaluating the performance of multilingual or language-specific models requires native-language resources. In this paper, we present EVALITA-LLM, a benchmark composed entirely of datasets in native Italian and adapted to assess the capabilities of LLMs. The benchmark consists of 10 tasks that cover key aspects of NLP. We also provide prompts for all tasks, designed to follow specific criteria. To reduce the impact of prompt sensitivity, the evaluation of the models considers different methods for combining the scores obtained on different prompts.

| File | Size | Format |
|---|---|---|
| Ital-IA_2025_paper_112.pdf (open access; publisher's PDF) | 950.75 kB | Adobe PDF |
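The abstract states that the scores a model obtains with different prompts for the same task are combined in several ways, but it does not name the methods. As a minimal sketch, assuming per-prompt scores on a 0 to 100 scale, the hypothetical helper `aggregate_prompt_scores` below shows three illustrative aggregations (the mean over prompts, the best single prompt, and the max-min spread as a rough proxy for prompt sensitivity). These choices are our assumptions, not the definitions used in EVALITA-LLM.

```python
from statistics import mean


def aggregate_prompt_scores(scores: dict[str, float]) -> dict[str, float]:
    """Combine one model's scores on a single task across several prompts.

    Hypothetical helper: the three aggregations are illustrative choices,
    not necessarily the methods used in EVALITA-LLM.
    `scores` maps a prompt identifier to the score obtained with that prompt.
    """
    if not scores:
        raise ValueError("at least one prompt score is required")
    values = list(scores.values())
    return {
        # Average over prompts: less dependent on one lucky prompt.
        "average": mean(values),
        # Best prompt: the most favourable phrasing for this model.
        "best": max(values),
        # Spread between best and worst prompt: a simple sensitivity proxy.
        "spread": max(values) - min(values),
    }


if __name__ == "__main__":
    # Hypothetical scores for one task, evaluated with three prompts.
    print(aggregate_prompt_scores({"p1": 62.0, "p2": 70.0, "p3": 58.0}))
```

Reporting the average alongside the best score makes it visible when a model's result depends heavily on one particular prompt; a large spread flags exactly that case.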
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.



