
Interpretable Fair Distance Learning for Categorical Data

A. Famiani
Co-first author
Member of the Collaboration Group
;
F. Peiretti
Co-first author
Member of the Collaboration Group
;
R. G. Pensa
Last author
Member of the Collaboration Group

Abstract

Categorical features are widespread in many decision support systems that rely on personal and sensitive data, such as credit scoring or personalized medicine, and they are not exempt from bias and fairness concerns. Unfortunately, bias mitigation techniques based on representation learning for categorical data are poorly studied, and most solutions are limited to applying the same approaches designed for numeric data to one-hot encoded features. To fill this gap, we propose FairDILCA, a fair extension of a known framework for learning distances on categorical data, which exploits co-distributions of attribute values to compute distances. FairDILCA considers the correlation of the features w.r.t. the protected one to create an unbiased representation of the data, making any subsequent analysis and learning task fairer. Furthermore, it also represents a more interpretable option than typical representation learning approaches, since it relies on deterministic and clear computational steps. Through extensive experiments, we show the effectiveness of our framework, also when applied to a classification task and in comparison with a state-of-the-art method pursuing a similar objective.
In press
4th Workshop on Bias and Fairness in AI (BIAS 2024), co-located with ECML PKDD 2024
Vilnius (Lithuania)
September 13, 2024
Machine Learning and Principles and Practice of Knowledge Discovery in Databases - International Workshops of ECML PKDD 2024, Vilnius, Lithuania, September 9-13, 2024
Springer Nature
Pages: 1-16
https://drive.google.com/file/d/1d7oDmA2BEPGymK_3tqL4VPAhGXXmkNMx/view?usp=sharing
Categorical features, Distance learning, Fairness
A. Famiani, F. Peiretti, R.G. Pensa
Files in this product:

bias2024_author.pdf

Open access

Description: PDF author copy
File type: POSTPRINT (AUTHOR'S FINAL VERSION)
Size: 5.21 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/2032190
Citations
  • PMC: not available
  • Scopus: not available
  • Web of Science (ISI): not available