MultiAligNet: Cross-lingual Knowledge Bridges Between Words and Senses

Grasso, F; Lovera Rulfi, V; Di Caro, L
2022-01-01

Abstract

Numerous NLP applications rely on access to multilingual, diversified, context-sensitive, and broadly shared lexical-semantic information. Standard lexical resources tend to first encode monolithic, language-bounded senses, which are only later translated and linked across repositories and languages. In this paper, we propose a novel approach for the representation of lexical-semantic knowledge in, and shared from the origin by, multiple languages, based on the idea of the k-Multilingual Concept (MCk). MC(k)s consist of multilingual alignments of semantically equivalent words in k different languages that are generated through a defined linguistic context and linked via empirically determined semantic relations, without the use of any sense disambiguation process. The MCk model makes it possible to uncover novel layers of lexical knowledge in the form of multifaceted conceptual links between naturally disambiguated sets of words. We first present the conceptualization of the MC(k)s, along with the word alignment methodology that generates them. Second, we describe a large-scale automatic acquisition of MC(k)s in English, Italian, and German based on the exploitation of corpora. Finally, we introduce MultiAligNet, an original lexical resource built using the data gathered from the extraction task. Qualitative and quantitative assessments of the generated knowledge demonstrate both the quality and the novelty of the proposed model.
Year: 2022
Conference: 23rd International Conference on Knowledge Engineering and Knowledge Management (EKAW)
Location: Bolzano, Italy
Date: September 2022
Publisher: SPRINGER INTERNATIONAL PUBLISHING AG
Volume: 13514
Pages: 36-50
ISBN: 978-3-031-17104-8; 978-3-031-17105-5
Keywords: Lexical Semantics; Multilingual alignments
Files in this item:
File: EKAW2022_MultiAlignet-2.pdf

Open Access since 27/09/2023

Description: postprint
File type: POSTPRINT (AUTHOR'S FINAL VERSION)
Size: 419.32 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/1891705