MultiPICo: Multilingual Perspectivist Irony Corpus

Casola S.; Frenda S.; Lo S. M.; Sezerer E.; Uva A.; Basile V.; Bosco C.; Pedrani A.; Rubagotti C.; Patti V.; Bernardi D.

2024-01-01

Abstract

Recently, several scholars have contributed to the growth of a new theoretical framework in NLP called perspectivism. This approach aims to leverage data annotated by different individuals to model diverse perspectives that affect their opinions on subjective phenomena such as irony. In this context, we propose MultiPICo, a multilingual perspectivist corpus of ironic short conversations in different languages and linguistic varieties extracted from Twitter and Reddit. The corpus includes sociodemographic information about its annotators. Our analysis of the annotated corpus shows how different demographic cohorts may significantly disagree on their annotation of irony and how certain cultural factors influence the perception of the phenomenon and the agreement on the annotation. Moreover, we show how disaggregated annotations and rich annotator metadata can be exploited to benchmark the ability of large language models to recognize irony, their positionality with respect to sociodemographic groups, and the efficacy of perspective-taking prompting for irony detection in multiple languages.

Year	2024
Event title	Association for Computational Linguistics
Event location	Bangkok
Event date	2024
Volume title	Proceedings of the Annual Meeting of the Association for Computational Linguistics
Publisher	Association for Computational Linguistics (ACL)
Volume no.	1
First page	16008
Last page	16021
Product URL (open-access archives, full text on publisher's site, etc.)	https://aclanthology.org/2024.acl-long.849/
All authors	Casola S.; Frenda S.; Lo S.M.; Sezerer E.; Uva A.; Basile V.; Bosco C.; Pedrani A.; Rubagotti C.; Patti V.; Bernardi D.
Appears in types	04A-Conference paper in volume

Files in this item:

File	Size	Format
2024.acl-long.849.pdf (open access; file type: publisher's PDF)	394.22 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/2029669
