
PERSEVAL: A Framework for Perspectivist Classification Evaluation

Lo, Soda Marem; Casola, Silvia; Basile, Valerio; Sansonetti, Franco

Abstract

Data perspectivism goes beyond majority-vote label aggregation by recognizing various perspectives as legitimate ground truths. However, current evaluation practices remain fragmented, making it difficult to compare perspectivist approaches and analyze their impact on different users and demographic subgroups. To address this gap, we introduce PersEval, the first unified framework for evaluating perspectivist models in NLP. A key innovation is its evaluation at the individual annotator level and its treatment of annotators and users as distinct entities, consistent with real-world scenarios. We demonstrate PersEval's capabilities through experiments with both encoder-based and decoder-based approaches, as well as an analysis of the effect of sociodemographic prompting. By considering global, text-, trait- and user-level evaluation metrics, we show that PersEval is a powerful tool for examining how models are influenced by user-specific information and for identifying the biases this information may introduce.
2025
Conference on Empirical Methods in Natural Language Processing
Suzhou, China
November 2025
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Association for Computational Linguistics
Pages: 22345–22370
ISBN: 979-8-89176-332-6
https://aclanthology.org/2025.emnlp-main.1137.pdf
Lo, Soda Marem; Casola, Silvia; Sezerer, Erhan; Basile, Valerio; Sansonetti, Franco; Uva, Antonio; Bernardi, Davide
Files in this record:

File: 2025.emnlp-main.1137.pdf
Access: Open access
File type: Publisher's PDF
Size: 487.82 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/2117259