Combining SHAP-driven Co-clustering and Shallow Decision Trees to Explain XGBoost
R. G. Pensa
First
Member of the Collaboration Group
In press
Abstract
Transparency is a non-functional requirement of machine learning that promotes interpretable or easily explainable outcomes. Unfortunately, interpretable classification models, such as linear, rule-based, and decision tree models, are superseded by more accurate but complex learning paradigms, such as deep neural networks and ensemble methods. For tabular data classification, in particular, models based on gradient-boosted tree ensembles, such as XGBoost, remain competitive with deep learning models and are therefore often preferred to the latter. However, they share the same interpretability issues, due to the complexity of the learnt model and, consequently, of its predictions. While the problem of computing local explanations has been largely addressed, the problem of extracting global explanations remains scarcely investigated. Existing solutions consist of computing some feature importance score, extracting approximate surrogate trees from the learnt forest, or using a black-box explainability method. However, those methods either suffer from poor fidelity or questionable comprehensibility. In this paper, we propose to fill this gap by leveraging the strong theoretical basis of the SHAP framework in the context of co-clustering and feature selection. As a result, we are able to extract shallow decision trees that explain XGBoost with competitive fidelity and higher comprehensibility compared to two recent state-of-the-art competitors.
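The following is a minimal, hypothetical Python sketch of the ingredients the abstract combines: TreeSHAP values computed on an XGBoost model, SHAP-based feature selection, and a shallow surrogate decision tree assessed by its fidelity to the black box. It is not the paper's algorithm; in particular, the SHAP-driven co-clustering step is replaced here by a simple mean-|SHAP| feature ranking, and the dataset and all parameter values are purely illustrative.

```python
# Hypothetical sketch, not the method proposed in the paper: explain an XGBoost
# classifier with a shallow surrogate decision tree built on SHAP-selected features.
import numpy as np
import shap
import xgboost as xgb
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)  # illustrative dataset

# 1. Train the black-box model to be explained.
model = xgb.XGBClassifier(n_estimators=200, max_depth=4)
model.fit(X, y)

# 2. Compute SHAP values with TreeSHAP (exact for tree ensembles).
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # shape: (n_samples, n_features)

# 3. Simple SHAP-based feature selection: keep the k features with the highest
#    mean absolute SHAP value (a stand-in for the paper's co-clustering step).
k = 5
top_features = np.argsort(np.abs(shap_values).mean(axis=0))[::-1][:k]

# 4. Fit a shallow decision tree on the selected features, using the black-box
#    predictions as targets, and measure fidelity (agreement with XGBoost).
black_box_pred = model.predict(X)
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X[:, top_features], black_box_pred)
fidelity = accuracy_score(black_box_pred, surrogate.predict(X[:, top_features]))
print(f"Surrogate fidelity w.r.t. XGBoost: {fidelity:.3f}")
```

The surrogate's accuracy against the black-box predictions (rather than the true labels) is the usual fidelity measure for global surrogate explanations, while the small depth keeps the extracted tree comprehensible.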
| File | Description | Type | Access | Size | Format |
|---|---|---|---|---|---|
| ds2024_author.pdf | Author's PDF | Postprint (author's final version) | Open access | 581.28 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.