The automated identification of national implementations (NIMs) of European directives by text similarity techniques has shown promising preliminary results. Previous works have proposed and utilized unsupervised lexical and semantic similarity techniques based on vector space models, latent semantic analysis and topic models. However, these techniques were evaluated on a small multilingual corpus of directives and NIMs. In this paper, we utilize word and paragraph embedding models learned by shallow neural networks from a multilingual legal corpus of European directives and national legislation (from Ireland, Luxembourg and Italy) to develop unsupervised semantic similarity systems to identify transpositions. We evaluate these models and compare their results with the previous unsupervised methods on a multilingual test corpus of 43 Directives and their corresponding NIMs. We also develop supervised machine learning models to identify transpositions and compare their performance with different feature sets.
Unsupervised and supervised text similarity systems for automated identification of national implementing measures of European directives
NANDA, ROHAN;SIRAGUSA, GIOVANNI;Di Caro L.;Boella G.;Grossio L.;GERBAUDO, Marco;Costamagna F.
2019-01-01
Abstract
The automated identification of national implementations (NIMs) of European directives by text similarity techniques has shown promising preliminary results. Previous works have proposed and utilized unsupervised lexical and semantic similarity techniques based on vector space models, latent semantic analysis and topic models. However, these techniques were evaluated on a small multilingual corpus of directives and NIMs. In this paper, we utilize word and paragraph embedding models learned by shallow neural networks from a multilingual legal corpus of European directives and national legislation (from Ireland, Luxembourg and Italy) to develop unsupervised semantic similarity systems to identify transpositions. We evaluate these models and compare their results with the previous unsupervised methods on a multilingual test corpus of 43 Directives and their corresponding NIMs. We also develop supervised machine learning models to identify transpositions and compare their performance with different feature sets.File | Dimensione | Formato | |
---|---|---|---|
Nanda2019_Article_UnsupervisedAndSupervisedTextS.pdf
Accesso riservato
Tipo di file:
PDF EDITORIALE
Dimensione
1.47 MB
Formato
Adobe PDF
|
1.47 MB | Adobe PDF | Visualizza/Apri Richiedi una copia |
preprint_AILaw_Nanda.pdf
Accesso aperto
Tipo di file:
PREPRINT (PRIMA BOZZA)
Dimensione
857 kB
Formato
Adobe PDF
|
857 kB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.