Numerous NLP applications rely on the accessibility to multilingual, diversified, context-sensitive, and broadly shared lexical semantic information. Standard lexical resources tend to first encode monolithic language-bounded senses which are eventually translated and linked across repositories and languages. In this paper, we propose a novel approach for the representation of lexical-semantic knowledge in - and shared from the origin by - multiple languages, based on the idea of k-Multilingual Concept (MCk). MC(k)s consist of multilingual alignments of semantically equivalent words in k different languages, that are generated through a defined linguistic context and linked via empirically determined semantic relations without the use of any sense disambiguation process. The MCk model allows to uncover novel layers of lexical knowledge in the form of multifaceted conceptual links between naturally disambiguated sets of words. We first present the conceptualization of the MC(k)s, along with the word alignment methodology that generates them. Secondly, we describe a large-scale automatic acquisition of MC(k)s in English, Italian and German based on the exploitation of corpora. Finally, we introduce MultiAlignNet, an original lexical resource built using the data gathered from the extraction task. Results from both qualitative and quantitative assessments on the generated knowledge demonstrate both the quality and the novelty of the proposed model.
MultiAligNet: Cross-lingual Knowledge Bridges Between Words and Senses
Grasso, F;Lovera Rulfi, V;Di Caro, L
2022-01-01
Abstract
Numerous NLP applications rely on the accessibility to multilingual, diversified, context-sensitive, and broadly shared lexical semantic information. Standard lexical resources tend to first encode monolithic language-bounded senses which are eventually translated and linked across repositories and languages. In this paper, we propose a novel approach for the representation of lexical-semantic knowledge in - and shared from the origin by - multiple languages, based on the idea of k-Multilingual Concept (MCk). MC(k)s consist of multilingual alignments of semantically equivalent words in k different languages, that are generated through a defined linguistic context and linked via empirically determined semantic relations without the use of any sense disambiguation process. The MCk model allows to uncover novel layers of lexical knowledge in the form of multifaceted conceptual links between naturally disambiguated sets of words. We first present the conceptualization of the MC(k)s, along with the word alignment methodology that generates them. Secondly, we describe a large-scale automatic acquisition of MC(k)s in English, Italian and German based on the exploitation of corpora. Finally, we introduce MultiAlignNet, an original lexical resource built using the data gathered from the extraction task. Results from both qualitative and quantitative assessments on the generated knowledge demonstrate both the quality and the novelty of the proposed model.File | Dimensione | Formato | |
---|---|---|---|
EKAW2022_MultiAlignet-2.pdf
Open Access dal 27/09/2023
Descrizione: postprint
Tipo di file:
POSTPRINT (VERSIONE FINALE DELL’AUTORE)
Dimensione
419.32 kB
Formato
Adobe PDF
|
419.32 kB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.