
Interpretable Fair Distance Learning for Categorical Data

A. Famiani
Co-first author
Member of the Collaboration Group
;
F. Peiretti
Co-first author
Member of the Collaboration Group
;
R. G. Pensa
Last author
Member of the Collaboration Group

Abstract

Categorical features are widespread in many decision support systems that rely on personal and sensitive data, such as credit scoring or personalized medicine, and they are not exempt from bias and fairness concerns. Unfortunately, bias mitigation techniques based on representation learning for categorical data are poorly studied, and most solutions are limited to applying the same approaches designed for numeric data to one-hot encoded features. To fill this gap, we propose FairDILCA, a fair extension of a known framework for learning distances on categorical data, which exploits co-distributions of attribute values to compute distances. FairDILCA considers the correlation of the features w.r.t. the protected one to create an unbiased representation of the data, making any subsequent analysis and learning task fairer. Furthermore, it also represents a more interpretable option than typical representation learning approaches, since it relies on deterministic and clear computational steps. Through extensive experiments, we show the effectiveness of our framework, also when applied to a classification task and in comparison with a state-of-the-art method pursuing a similar objective.
In press
4th Workshop on Bias and Fairness in AI (BIAS 2024), co-located with ECML PKDD 2024
Vilnius (Lithuania)
September 13, 2024
Machine Learning and Principles and Practice of Knowledge Discovery in Databases - International Workshops of ECML PKDD 2024, Vilnius, Lithuania, September 9-13, 2024
Springer Nature
Pages: 1-16
https://drive.google.com/file/d/1d7oDmA2BEPGymK_3tqL4VPAhGXXmkNMx/view?usp=sharing
Categorical features, Distance learning, Fairness
A. Famiani, F. Peiretti, R.G. Pensa
Files in this product:

bias2024_author.pdf

Open access

Description: PDF author copy
File type: POSTPRINT (AUTHOR'S FINAL VERSION)
Size: 5.21 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/2032190
Citations
  • PMC: not available
  • Scopus: not available
  • Web of Science (ISI): not available