
PERSEVAL: A Framework for Perspectivist Classification Evaluation

Lo, Soda Marem; Casola, Silvia; Basile, Valerio; Sansonetti, Franco

Abstract

Data perspectivism goes beyond majority-vote label aggregation by recognizing various perspectives as legitimate ground truths. However, current evaluation practices remain fragmented, making it difficult to compare perspectivist approaches and analyze their impact on different users and demographic subgroups. To address this gap, we introduce PersEval, the first unified framework for evaluating perspectivist models in NLP. A key innovation is its evaluation at the individual annotator level and its treatment of annotators and users as distinct entities, consistent with real-world scenarios. We demonstrate PersEval's capabilities through experiments with both encoder-based and decoder-based approaches, as well as an analysis of the effect of sociodemographic prompting. By considering global, text-, trait- and user-level evaluation metrics, we show that PersEval is a powerful tool for examining how models are influenced by user-specific information and for identifying the biases this information may introduce.
2025
Conference on Empirical Methods in Natural Language Processing
Suzhou, China
November 2025
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Association for Computational Linguistics
Pages: 22345–22370
ISBN: 979-8-89176-332-6
https://aclanthology.org/2025.emnlp-main.1137.pdf
Lo, Soda Marem; Casola, Silvia; Sezerer, Erhan; Basile, Valerio; Sansonetti, Franco; Uva, Antonio; Bernardi, Davide
Files in this record:

File: 2025.emnlp-main.1137.pdf
Access: Open access
File type: Publisher's PDF
Size: 487.82 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/2117259