Organizing the Unorganized: A Novel Approach for Transferring a Taxonomy of Labels into Flat-Labeled Document Collections
Michele Colombino; Laurentiu Jr Marius Zaharia; Giorgia Iacobellis; Rachele Mignone; Ivan Spada; Chiara Bonfanti; Emilio Sulis; Luigi Di Caro; Guido Boella
2023-01-01
Abstract
This paper presents a novel pipeline for transforming flat-labeled text collections into a hierarchical structure. The pipeline associates labels from disparate sources using simple yet effective similarity methods that account for both lexical and semantic criteria. Our approach employs a custom similarity measure, the Reinforced Edit Similarity, to identify probable correspondences based on lexical similarity. A subsequent semantic alignment and validation phase is then performed using an automatic classification mechanism. Preliminary results attest to the effectiveness of our proposal. These results were obtained by the research group of the University of Torino in the NGUPP project.

File | Access | Size | Format
---|---|---|---
Organizing the Unorganized- A Novel Approach for Transferring a Taxonomy of Labels into Flat-Labeled Document Collections_ASAIL@ICAIL2023.pdf | Open access | 898.57 kB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.