Towards Semantic Interoperability: Parallel Corpora as Linked Data Incorporating Named Entity Linking

Olja Perišić
2024-01-01

Abstract

The paper presents the results of research on the preparation of parallel corpora, focusing on their transformation into RDF graphs using the NLP Interchange Format (NIF) for linguistic annotation. We give an overview of the parallel corpus used in this case study, as well as the processes of POS tagging, lemmatization, and named entity recognition (NER). Next, we describe named entity linking (NEL), data conversion to RDF, and the incorporation of NIF annotations. The produced NIF files were evaluated by exploring the triplestore with SPARQL queries. Finally, we discuss the bridging of Linked Data and Digital Humanities research, as well as some drawbacks related to the verbosity of the transformation. In the context of linked data and parallel corpora, semantic interoperability ensures that data exchanged between systems carries shared, well-defined meanings, enabling effective communication and understanding.
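As a rough illustration of what a NIF annotation layer can look like, the sketch below serializes one sentence as NIF 2.0 Turtle: POS tags and lemmas on word spans, and a Wikidata link on a named-entity span. It is not the authors' conversion pipeline. The base URI, the English example sentence, the Universal-Dependencies-style tags, and the helper names `nif_sentence`, `turtle_literal`, `offsets`, and `resource` are hypothetical. The vocabulary terms (`nif:Context`, `nif:isString`, `nif:anchorOf`, `nif:referenceContext`, `nif:posTag`, `nif:lemma`, `itsrdf:taIdentRef`) come from NIF 2.0 Core and ITS 2.0.

```python
# Hypothetical sketch: serialize one annotated sentence as NIF 2.0 Turtle.
# Base URI, sentence, tags, lemmas, and the Wikidata QID are illustrative
# assumptions, not data or code from the paper.

NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
ITSRDF = "http://www.w3.org/2005/11/its/rdf#"
XSD = "http://www.w3.org/2001/XMLSchema#"
WD = "http://www.wikidata.org/entity/"


def turtle_literal(text):
    """Quote text as a Turtle string literal, escaping special characters."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def offsets(begin, end):
    """Predicate-object pairs for NIF begin/end indices (xsd:nonNegativeInteger)."""
    return [f'nif:beginIndex "{begin}"^^xsd:nonNegativeInteger',
            f'nif:endIndex "{end}"^^xsd:nonNegativeInteger']


def resource(uri, props):
    """Render one subject with its predicate-object list as a Turtle block."""
    return f"{uri} " + " ;\n    ".join(props) + " ."


def nif_sentence(base, text, tokens, entities=()):
    """Return Turtle for a nif:Context, its word spans, and linked entities.

    tokens:   (begin, end, pos_tag, lemma) character offsets into text
    entities: (begin, end, wikidata_qid) spans to link via itsrdf:taIdentRef
    """
    ctx = f"<{base}#char=0,{len(text)}>"
    blocks = [
        f"@prefix nif: <{NIF}> .\n@prefix itsrdf: <{ITSRDF}> .\n"
        f"@prefix xsd: <{XSD}> .",
        resource(ctx, ["a nif:Context , nif:RFC5147String",
                       f"nif:isString {turtle_literal(text)}",
                       *offsets(0, len(text))]),
    ]

    def span(begin, end, cls, extra):
        # Every substring repeats its anchor, offsets, and context link.
        return resource(f"<{base}#char={begin},{end}>",
                        [f"a {cls} , nif:RFC5147String",
                         f"nif:referenceContext {ctx}",
                         f"nif:anchorOf {turtle_literal(text[begin:end])}",
                         *offsets(begin, end), *extra])

    for begin, end, pos, lemma in tokens:
        blocks.append(span(begin, end, "nif:Word",
                           [f"nif:posTag {turtle_literal(pos)}",
                            f"nif:lemma {turtle_literal(lemma)}"]))
    for begin, end, qid in entities:
        blocks.append(span(begin, end, "nif:Phrase",
                           [f"itsrdf:taIdentRef <{WD}{qid}>"]))
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    sentence = "She moved to New York."
    words = [(0, 3, "PRON", "she"), (4, 9, "VERB", "move"),
             (10, 12, "ADP", "to"), (13, 16, "PROPN", "New"),
             (17, 21, "PROPN", "York"), (21, 22, "PUNCT", ".")]
    # Q60 is used here as the Wikidata item for New York City.
    print(nif_sentence("http://example.org/corpus/s1", sentence, words,
                       [(13, 21, "Q60")]))
```

Each annotated substring gets an RFC 5147-style fragment URI (`#char=begin,end`) and is tied to its sentence through `nif:referenceContext`. Because the anchor text, offsets, and context link are repeated for every token, even a six-token sentence yields several dozen triples. This gives one plausible sense of the verbosity the abstract mentions.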
Year: 2024
Conference: The 9th Workshop on Linked Data in Linguistics: Resources, Applications, Best Practices (LDL-2024) @LREC-COLING-2024
Place: Torino
Dates: 20-25 May 2024
Publisher: ELRA Language Resources Association and the International Committee on Computational Linguistics
Pages: 115-125
ISBN: 978-2-493814-38-8
PDF: https://aclanthology.org/2024.ldl-1.15.pdf
Keywords: parallel corpora, named entity linking, named entity recognition, NER, NEL, linked data, NIF, Wikidata
Authors: Ranka Stanković, Milica Ikonić Nešić, Olja Perišić, Mihailo Škorić, Olivera Kitanović
File in this record: 2024.ldl-1.15.pdf (LREC publication); open access; file type: publisher's PDF; size: 3.88 MB; format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/1979230