CINECA IRIS Institutional Research Information System

A New Measure of Polarization in the Annotation of Hate Speech

Akhtar, Sohail; Basile, Valerio; Patti, Viviana

2019-01-01

Abstract

The number of social media users is ever-increasing. Unfortunately, this has also resulted in a massive rise of uncensored online hate against vulnerable communities such as immigrants, LGBT people, and women. Current work on the automatic detection of various forms of hate speech (HS) typically employs supervised learning, which requires manually annotated data. The highly polarizing nature of the topics involved raises concerns about the quality of the annotations these systems rely on, because not all annotators are equally sensitive to different kinds of hate speech. We propose an approach that leverages the fine-grained knowledge expressed by individual annotators before their subjectivity is averaged out by the gold standard creation process. This helps us refine the quality of training sets for hate speech detection. We introduce a measure of polarization at the level of single instances in the data, which we use to manipulate the training set and reduce the impact of the most polarizing texts on the learning process. We test our approach on three datasets, in English and Italian, annotated by experts and by workers hired on a crowdsourcing platform. We classify instances of sexist, racist, and homophobic hate speech in tweets and show how our approach improves the prediction performance of a supervised classifier. Moreover, the proposed polarization measure supports the manual exploration of individual tweets in our datasets.

Year: 2019
Event title: XVIIIth International Conference of the Italian Association for Artificial Intelligence
Event location: Rende, Italy
Event date: November 19–22, 2019
Volume title: AI*IA 2019 – Advances in Artificial Intelligence
Publisher: Springer International Publishing
Volume number: 11946
Pages (from): 588
Pages (to): 603
ISBN: 978-3-030-35165-6; 978-3-030-35166-3
DOI: https://dx.doi.org/10.1007/978-3-030-35166-3_41
Product URL (open-access archives, full text on the publisher's site, etc.): https://link.springer.com/content/pdf/10.1007/978-3-030-35166-3_41.pdf
Keywords: Hate speech detection, Linguistic annotation, Inter-rater agreement, Data augmentation
All authors: Akhtar, Sohail; Basile, Valerio; Patti, Viviana
Appears in types: 04A-Conference paper in volume

Files in this item:

File	Size	Format
2019_Chapter_SoValViviaixia2019.pdf (restricted access; file type: publisher's PDF)	492.14 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/1715940
