Organizing the Unorganized: A Novel Approach for Transferring a Taxonomy of Labels into Flat-Labeled Document Collections

Michele Colombino (first author); Laurentiu Jr Marius Zaharia; Giorgia Iacobellis; Rachele Mignone; Ivan Spada; Chiara Bonfanti; Emilio Sulis; Luigi Di Caro; Guido Boella
2023-01-01

Abstract

This paper presents a novel pipeline for transforming flat-labeled text collections into a hierarchical structure. The pipeline relies on simple yet effective similarity methods that account for both lexical and semantic criteria to associate labels from disparate sources. Our approach first applies a custom similarity measure, the Reinforced Edit Similarity, to identify probable correspondences based on lexical similarity. A subsequent semantic alignment and validation phase is then performed using an automatic classification mechanism. Preliminary results attest to the effectiveness of our proposal. These results were obtained by the research group of the University of Torino within the NGUPP project.
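
The lexical matching step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not define the Reinforced Edit Similarity, so the code below substitutes plain normalized Levenshtein similarity as a stand-in. The function names, the example labels, and the 0.8 threshold are all assumptions made for illustration and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical after lowercasing."""
    a, b = a.lower().strip(), b.lower().strip()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def candidate_matches(
    flat_labels: List[str],
    taxonomy_labels: List[str],
    threshold: float = 0.8,  # hypothetical cut-off, not from the paper
) -> Dict[str, Tuple[str, float]]:
    """Map each flat label to its most similar taxonomy label, if similar enough."""
    matches: Dict[str, Tuple[str, float]] = {}
    if not taxonomy_labels:
        return matches
    for flat in flat_labels:
        best = max(taxonomy_labels, key=lambda t: edit_similarity(flat, t))
        score = edit_similarity(flat, best)
        if score >= threshold:
            matches[flat] = (best, score)
    return matches


if __name__ == "__main__":
    # Hypothetical labels from a flat collection and from a taxonomy.
    flat = ["taxation law", "environment"]
    taxonomy = ["Taxation laws", "Family law"]
    print(candidate_matches(flat, taxonomy))
```

In this sketch, labels that pass the lexical threshold would only be candidate correspondences. The semantic alignment and validation phase described in the abstract is a separate step and is not modeled here.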
Proceedings of the 6th Workshop on Automated Semantic Analysis of Information in Legal Text (ASAIL), Braga, 23 September 2023. CEUR, pp. 83-92.
Legal informatics, Legal document classification, Legal taxonomies, Taxonomy alignment, Text embeddings
Use this identifier to cite or link to this document: https://hdl.handle.net/2318/1945611