CINECA IRIS Institutional Research Information System

HurtBERT: Incorporating Lexical Features with BERT for the Detection of Abusive Language

Koufakou, Anna; Pamungkas, Endang Wahyu; Basile, Valerio; Patti, Viviana

2020-01-01

Abstract

The detection of abusive or offensive remarks in social texts has received significant attention in research. In several related shared tasks, BERT has been shown to be the state of the art. In this paper, we propose to utilize lexical features derived from a hate lexicon to improve the performance of BERT in such tasks. We explore different ways to utilize the lexical features, in the form of lexicon-based encodings at the sentence level or embeddings at the word level. We provide an extensive dataset evaluation that addresses in-domain as well as cross-domain detection of abusive content to render a complete picture. Our results indicate that our proposed models, which combine BERT with lexical features, improve over a baseline BERT model in many of our in-domain and cross-domain experiments.

Year: 2020
Event title: Fourth Workshop on Online Abuse and Harms
Event location: Online
Event date: November 2020
Volume title: Proceedings of the Fourth Workshop on Online Abuse and Harms
Publisher: Association for Computational Linguistics
Pages (from): 34
Pages (to): 43
DOI: https://dx.doi.org/10.18653/v1/2020.alw-1.5
Product URL (open-access archives, full text on publisher site, etc.): https://www.aclweb.org/anthology/2020.alw-1.5
Keywords: abusive language detection, linguistically informed deep learning, social media
All authors: Koufakou, Anna; Pamungkas, Endang Wahyu; Basile, Valerio; Patti, Viviana
Appears in types: 04A-Conference paper in volume

Files in this item:

File	Size	Format
2020.alw-1.5.pdf (open access; file type: publisher's PDF)	444.21 kB	Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/1769037
