CINECA IRIS Institutional Research Information System

Resources and benchmark corpora for hate speech detection: a systematic review

Fabio Poletto; Valerio Basile; Manuela Sanguinetti; Cristina Bosco; Viviana Patti

2021-01-01

Abstract

Hate speech on social media is a complex phenomenon whose detection has recently gained significant traction in the Natural Language Processing community, as attested by several recent review works. Annotated corpora and benchmarks are key resources, given the vast number of supervised approaches that have been proposed. Lexica also play an important role in the development of hate speech detection systems. In this review, we systematically analyze the resources made available by the community at large, including their development methodology, topical focus, language coverage, and other factors. The results of our analysis highlight a heterogeneous, growing landscape, marked by several issues and avenues for improvement.

Year: 2021
Journal title: LANGUAGE RESOURCES AND EVALUATION
Volume: 55
Issue: 2
Pages (from): 477
Pages (to): 523
DOI: https://dx.doi.org/10.1007/s10579-020-09502-8
Product URL (open-access archives, full text on publisher's site, etc.): https://link.springer.com/article/10.1007/s10579-020-09502-8
Keywords: Hate speech detection, Benchmark corpora, Natural Language Processing shared tasks, Systematic review
All authors: Fabio Poletto, Valerio Basile, Manuela Sanguinetti, Cristina Bosco, Viviana Patti
Appears in types: 03A-Journal article

Files in this record:

File	Description	Access	File type	Size	Format
Poletto2020_Article_ResourcesAndBenchmarkCorporaFo.pdf	Main article	Open access	Publisher's PDF	571.23 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/1757913
