In this paper we propose an approach to exploit the fine-grained knowledge expressed by individual human annotators during a hate speech (HS) detection task, before the aggregation of single judgments in a gold standard dataset eliminates non-majority perspectives. We automatically divide the annotators into groups, aiming at grouping them by similar personal characteristics (ethnicity, social background, culture etc.). To serve a multi-lingual perspective, we performed classification experiments on three different Twitter datasets in English and Italian languages. We created different gold standards, one for each group, and trained a state-of-the-art deep learning model on them, showing that supervised models informed by different perspectives on the target phenomena outperform a baseline represented by models trained on fully aggregated data. Finally, we implemented an ensemble approach that combines the single perspective-aware classifiers into an inclusive model. The results show that this strategy further improves the classification performance, especially with a significant boost in the recall of HS prediction.

Modeling Annotator Perspective and Polarized Opinions to Improve Hate Speech Detection

Sohail Akhtar
;
Valerio Basile;Viviana Patti
2020-01-01

Abstract

In this paper we propose an approach to exploit the fine-grained knowledge expressed by individual human annotators during a hate speech (HS) detection task, before the aggregation of single judgments in a gold standard dataset eliminates non-majority perspectives. We automatically divide the annotators into groups, aiming at grouping them by similar personal characteristics (ethnicity, social background, culture etc.). To serve a multi-lingual perspective, we performed classification experiments on three different Twitter datasets in English and Italian languages. We created different gold standards, one for each group, and trained a state-of-the-art deep learning model on them, showing that supervised models informed by different perspectives on the target phenomena outperform a baseline represented by models trained on fully aggregated data. Finally, we implemented an ensemble approach that combines the single perspective-aware classifiers into an inclusive model. The results show that this strategy further improves the classification performance, especially with a significant boost in the recall of HS prediction.
2020
AAAI Conference on Human Computation and Crowdsourcing
Hilversum
25-29 ottobre 2020
Proceedings of the AAAI Conference on Human Computation and Crowdsourcing
Association for the Advancement of Artificial Intelligence
8
151
154
978-1-57735-848-0
https://ojs.aaai.org/index.php/HCOMP/article/view/7473
Sohail Akhtar, Valerio Basile, Viviana Patti
File in questo prodotto:
File Dimensione Formato  
7473-Article Text-10855-1-10-20200925.pdf

Accesso aperto

Descrizione: Articolo principale
Tipo di file: PDF EDITORIALE
Dimensione 442.6 kB
Formato Adobe PDF
442.6 kB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2318/1759758
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 50
  • ???jsp.display-item.citation.isi??? ND
social impact