Interpretable Fair Distance Learning for Categorical Data
A. Famiani (co-first author), Member of the Collaboration Group;
F. Peiretti (co-first author), Member of the Collaboration Group;
R. G. Pensa (last author), Member of the Collaboration Group
In press
Abstract
Categorical features are widespread in many decision support systems relying on personal and sensitive data, such as credit scoring or personalized medicine, and are not exempt from bias and fairness concerns. Unfortunately, bias mitigation techniques based on representation learning for categorical data are poorly studied, and most solutions are limited to applying the same approaches designed for numeric data to one-hot encoded features. To fill this gap, we propose FairDILCA, a fair extension of a known framework for learning distances on categorical data, which exploits the co-distributions of attribute values to compute distances. FairDILCA considers the correlation of the features w.r.t. the protected one to create an unbiased representation of the data, making any subsequent analysis and learning task fairer. Furthermore, it also represents a more interpretable option than typical representation learning approaches, since it relies on deterministic and clear computational steps. Through extensive experiments, we show the effectiveness of our framework, also when applied to a classification task and in comparison with a state-of-the-art method pursuing a similar objective.

File | Description | File type | Size | Format | Access
---|---|---|---|---|---
bias2024_author.pdf | PDF author copy | Postprint (author's final version) | 5.21 MB | Adobe PDF | Open access
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
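The abstract's core idea, computing distances between categorical values from the co-distributions of attribute values, can be sketched as follows. This is a minimal illustration of the co-distribution principle, not the authors' implementation: the function names are assumptions, and a single context attribute stands in for the framework's selected context set (which, in the fair variant, would exclude features correlated with the protected one).

```python
import numpy as np

def conditional_distribution(X, Y, x):
    """Estimate P(Y | X = x) from two aligned categorical columns."""
    mask = (X == x)
    values, counts = np.unique(Y[mask], return_counts=True)
    return {v: c / counts.sum() for v, c in zip(values, counts)}

def value_distance(X, Y, x1, x2):
    """Distance between two values x1, x2 of attribute X, computed as the
    Euclidean distance between the conditional distributions P(Y | X = x1)
    and P(Y | X = x2) over a context attribute Y.  A sketch: the actual
    framework aggregates over a set of context attributes, and FairDILCA
    restricts that set based on correlation with the protected attribute."""
    d1 = conditional_distribution(X, Y, x1)
    d2 = conditional_distribution(X, Y, x2)
    support = set(d1) | set(d2)
    return np.sqrt(sum((d1.get(v, 0.0) - d2.get(v, 0.0)) ** 2
                       for v in support))

# Toy example: attribute X with values a/b/c, context attribute Y with p/q.
X = np.array(["a", "a", "b", "b", "c", "c"])
Y = np.array(["p", "q", "p", "p", "q", "q"])
print(round(value_distance(X, Y, "a", "b"), 4))  # → 0.7071
```

Values whose conditional co-distributions coincide get distance 0, while values that induce disjoint context distributions are maximally far apart; this is what makes the resulting distances deterministic and inspectable, as opposed to embeddings learned by opaque optimization.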