CINECA IRIS Institutional Research Information System

Dissecting Biases in Relation Extraction: A Cross-Dataset Analysis on People’s Gender and Origin

Stranisci M. A. (co-first author; member of the Collaboration Group); Bassignana E.; Cabot P.-L. H.; Navigli R.

2024-01-01

Abstract

Relation Extraction (RE) is at the core of many Natural Language Understanding tasks, including knowledge-base population and Question Answering. However, any Natural Language Processing system is exposed to biases, and the analysis of such biases has received little attention in RE. We propose a new method for inspecting bias in the RE pipeline that is fully transparent in terms of interpretability. Specifically, in this work we analyze biases related to gender and place of birth. Our methodology consists of (i) obtaining semantic triplets (subject, object, semantic relation) involving ‘person’ entities from RE resources, (ii) collecting meta-information (‘gender’ and ‘place of birth’) using Entity Linking technologies, and (iii) analyzing the distribution of triplets across different groups (e.g., men versus women). We investigate bias at two levels: in the training data of three commonly used RE datasets (SREDFM, CrossRE, NYT), and in the predictions of a state-of-the-art RE approach (ReLiK). To enable cross-dataset analysis, we introduce a taxonomy of relation types that maps the label sets of different RE datasets to a unified label space. Our findings reveal that bias is a compounded issue affecting underrepresented groups both in the data and in the predictions of RE systems.
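Step (iii) of the methodology can be illustrated with a minimal sketch. It combines a mapping of dataset-specific labels into a unified label space with a per-group relation distribution.

This is not the authors' implementation. Several elements are hypothetical placeholders and are not the paper's taxonomy or data:

* the label mapping `UNIFIED_LABELS`;
* the function name `relation_distribution`;
* the entity IDs `e1`..`e5`;
* the toy triplets.

The sketch also assumes that entity-linked metadata is already available as a dictionary. For simplicity, it only inspects the subject slot of each triplet.

```python
from collections import Counter, defaultdict

# Hypothetical mapping (dataset, original label) -> unified relation type.
# Labels and unified types are illustrative placeholders only.
UNIFIED_LABELS = {
    ("NYT", "/people/person/place_lived"): "location",
    ("CrossRE", "physical"): "location",
    ("SREDFM", "employer"): "work",
}


def relation_distribution(triplets, person_meta, attribute="gender"):
    """Return, for each group, the share of each unified relation type.

    triplets: iterable of (dataset, subject, relation, object)
    person_meta: dict mapping a linked person entity to its metadata,
                 e.g. {"gender": "female"}
    Triplets whose subject lacks the attribute, or whose label is not in
    the mapping, are skipped.
    """
    counts = defaultdict(Counter)
    for dataset, subject, relation, _obj in triplets:
        meta = person_meta.get(subject, {})
        unified = UNIFIED_LABELS.get((dataset, relation))
        if attribute not in meta or unified is None:
            continue
        counts[meta[attribute]][unified] += 1
    # Normalize counts within each group so groups of different size compare.
    return {
        group: {label: n / sum(c.values()) for label, n in c.items()}
        for group, c in counts.items()
    }


# Toy usage with hypothetical entity IDs.
triplets = [
    ("NYT", "e1", "/people/person/place_lived", "e2"),
    ("CrossRE", "e3", "physical", "e4"),
    ("SREDFM", "e3", "employer", "e5"),
]
meta = {"e1": {"gender": "male"}, "e3": {"gender": "female"}}
print(relation_distribution(triplets, meta))
# {'male': {'location': 1.0}, 'female': {'location': 0.5, 'work': 0.5}}
```

The sketch normalizes within each group rather than across groups. This lets a relation type be compared between, for example, men and women even when one group has far fewer triplets.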

Year: 2024
Event title: 5th Workshop on Gender Bias in Natural Language Processing, GeBNLP 2024, held in conjunction with the 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024
Event location: Thailand
Event date: 2024
Volume title: GeBNLP 2024 - 5th Workshop on Gender Bias in Natural Language Processing, Proceedings of the Workshop
Publisher: Association for Computational Linguistics (ACL)
Pages: 190–202
All authors: Stranisci M.A.; Bassignana E.; Cabot P.-L.H.; Navigli R.
Item type: 04A-Conference paper in volume

Files in this item:

File	Access	Size	Format
2024.gebnlp-1.12.pdf	Open access	346.16 kB	Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/2034310
