A review of methods for imbalanced multi-label classification

Adane Nega Tarekegn (first author); Mario Giacobini
2021-01-01

Abstract

Multi-Label Classification (MLC) is an extension of standard single-label classification in which each data instance is associated with several labels simultaneously. MLC has gained much importance in recent years due to its wide range of application domains. However, class imbalance has become an inherent characteristic of many multi-label datasets, where the samples and their corresponding labels are non-uniformly distributed over the data space. The imbalance problem in MLC poses challenges to multi-label data analytics, which can be viewed from three perspectives: imbalance within labels, among labels, and among label-sets. In this paper, we review the approaches for handling the imbalance problem in multi-label data by collecting the existing research work. As the first systematic study of approaches addressing the imbalance problem in MLC, this paper provides a comprehensive survey of the state-of-the-art methods for imbalanced MLC, including the characteristics of imbalanced multi-label datasets, evaluation measures, and a comparative analysis of the proposed methods. The study also discusses important results reported so far in the literature and highlights some of their strengths and limitations to guide future research.
Year: 2021
Volume: 118
Article number: 107965
Pages: 1-12
URL: https://www.sciencedirect.com/science/article/pii/S0031320321001527
Keywords: Imbalanced Data, Multi-label Classification, Imbalanced Classification, Machine learning
Authors: Adane Nega Tarekegn, Mario Giacobini, Krzysztof Michalak
Files in this item:

Final published version.pdf
Access: restricted
File type: publisher's PDF
Size: 1.07 MB
Format: Adobe PDF

Final editable author version.docx
Access: open
File type: postprint (author's final version)
Size: 210.08 kB
Format: Microsoft Word XML

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/1807891
Citations
  • PMC: not available
  • Scopus: 177
  • Web of Science (ISI): 140