Hierarchical Clustering of Label-based Annotator Representations for Mining Perspectives

Lo S. M. (first author); Basile V. (last author)
2023-01-01

Abstract

Modeling annotator perspectives has emerged as a way to capture subjective linguistic phenomena more accurately. The NLP community has approached this problem with perspective-aware and personalized models, which require demographic data or previous annotations. In this paper, we explore two methods that represent annotators solely on the basis of the labels they assigned: label agreement and Kernel PCA (KPCA). We computed 5 clusters with the former and 4 with the latter, trained a perspective-aware model on each cluster, and combined the models in majority-vote ensembles. The results show that the clusters obtained with label agreement are more balanced and more homogeneous in terms of annotators' demographic traits, while those obtained with KPCA tend to correlate more with annotators' nationalities. Despite these differences, both ensemble models outperform the baseline, confirming that clustering annotators by their annotations is advantageous for classifying a subjective phenomenon such as irony. We argue that this approach is useful for taking annotators' perspectives into account when demographic data are unavailable, and that it allows for the possibility that annotations are influenced by factors other than the recorded demographics.
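The abstract does not give implementation details, so the following is only an illustrative sketch of the label-agreement route, not the authors' code. It assumes binary labels stored in an annotators-by-items matrix with NaN for unannotated items, pairwise agreement measured as the share of matching labels on co-annotated items, average-linkage hierarchical clustering on the distance 1 - agreement, and a strict majority vote over the per-cluster model predictions. All function names are hypothetical, and the Kernel PCA variant is not shown.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def agreement_matrix(labels):
    """Pairwise agreement between annotators.

    labels: (n_annotators, n_items) float array, np.nan where an item
    was not annotated. Pairs with no shared items get agreement 0
    (an assumption made for this sketch).
    """
    n = labels.shape[0]
    agree = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            shared = ~np.isnan(labels[i]) & ~np.isnan(labels[j])
            if shared.any():
                score = np.mean(labels[i, shared] == labels[j, shared])
                agree[i, j] = agree[j, i] = score
    return agree


def cluster_annotators(labels, n_clusters):
    """Group annotators by hierarchical clustering on 1 - agreement."""
    dist = 1.0 - agreement_matrix(labels)
    # linkage expects a condensed distance vector, not a square matrix
    tree = linkage(squareform(dist, checks=False), method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


def majority_vote(predictions):
    """Combine binary predictions of shape (n_models, n_items).

    Ties (possible with an even number of models) resolve to 0,
    which is another assumption of this sketch.
    """
    return (np.asarray(predictions).mean(axis=0) > 0.5).astype(int)
```

In a full pipeline, one classifier would be trained on the annotations of each resulting cluster, and `majority_vote` would be applied to the stacked predictions of those classifiers.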
Year: 2023
Conference: 2nd Workshop on Perspectivist Approaches to NLP, NLPerspectives 2023
Country: Poland (pol)
Proceedings: Proceedings of the 2nd Workshop on Perspectivist Approaches to NLP co-located with 26th European Conference on Artificial Intelligence (ECAI 2023)
Publisher: CEUR-WS
Volume: 3494
Pages: 1-10
URL: https://ceur-ws.org/Vol-3494/paper8.pdf
Keywords: clustering; irony detection; perspectivism
Authors: Lo S.M.; Basile V.
Use this identifier to cite or link to this document: https://hdl.handle.net/2318/2080774