
It-Sr-NER: CLARIN compatible NER and geoparsing web services for parallel texts: case study Italian and Serbian

Perisic Olja (first author)
2022-01-01

Abstract

It-Sr-NER-corp is an Italian/Serbian bilingual corpus of 10,000 aligned sentences, compiled within the scope of the It-Sr-NER project from samples of several Italian novels translated into Serbian and vice versa. It was built to support the development of a CLARIN-compatible NER web service for parallel texts, with Italian and Serbian as the case study. The 10,000 natural language segments are split into four files: one of 1,000 segments and three of 3,000 segments each. The corpus comprises: 1) plain-text versions, Italian and Serbian, with one segment per line; 2) TMX (Translation Memory eXchange) files with bilingual aligned segments; 3) monolingual text and TMX files with automatically annotated named entities for six NER classes: demonyms (DEMO), works of art (WORK), person names (PERS), places (LOC), events (EVENT) and organizations (ORG). The It-Sr-NER annotation uses a convolutional neural network architecture within the spaCy tool: for Italian, WikiNER (Joel Nothman, Nicky Ringland, Will Radford, Tara Murphy, James R Curran), and for Serbian, SrpCNNER (Cvetana Krstev, Ranka Stanković, Milica Ikonić Nešić, Branislava Šandrih Todorović).
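As an illustration of how the TMX part of such a corpus might be consumed, the following minimal sketch reads aligned Italian/Serbian segment pairs with Python's standard library. It is not taken from the project: the function name read_tmx_pairs, the language codes "it" and "sr", and the sample sentences are assumptions made for the example, and the actual files in the repository may use different language codes or additional markup.

```python
import io
import xml.etree.ElementTree as ET

# xml:lang belongs to the reserved XML namespace; TMX 1.4 puts it on <tuv>.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx_pairs(tmx_source, src_lang="it", tgt_lang="sr"):
    """Return (source, target) segment pairs from a TMX file or file object.

    The language codes are assumptions; region or script suffixes
    (for example "it-IT") are reduced to the primary subtag.
    """
    root = ET.parse(tmx_source).getroot()
    pairs = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.iter("tuv"):
            # Older TMX versions use a plain "lang" attribute instead.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline tags.
                text = "".join(seg.itertext()).strip()
                segments[lang.split("-")[0]] = text
        if src_lang in segments and tgt_lang in segments:
            pairs.append((segments[src_lang], segments[tgt_lang]))
    return pairs


# Made-up sample data for illustration only, not taken from the corpus.
SAMPLE_TMX = b"""<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header srclang="it" datatype="plaintext" segtype="sentence"
          adminlang="en" creationtool="example" creationtoolversion="0"
          o-tmf="none"/>
  <body>
    <tu>
      <tuv xml:lang="it"><seg>Buongiorno.</seg></tuv>
      <tuv xml:lang="sr"><seg>Dobar dan.</seg></tuv>
    </tu>
  </body>
</tmx>"""

for italian, serbian in read_tmx_pairs(io.BytesIO(SAMPLE_TMX)):
    print(italian, "|", serbian)
```

Passing a file path instead of the in-memory sample works the same way, since ET.parse accepts either a filename or a file object.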
2022
Italian
Serbian
https://github.com/rankastankovic/It-Sr-NER/tree/main/corpus
parallel corpus, Serbian, Italian, literary texts
SERBIA
   It-Sr-NER: CLARIN compatible NER and geoparsing web services for parallel texts: case study Italian and Serbian
   It-Sr-NER
   CLARIN ERIC
   CE-2022-2070
4 – product already available in another Open Access archive (arXiv, RePEc…)
07-OTHER SCIENTIFIC PRODUCT::07L-Database
Perisic Olja; Stankovic Ranka; Vitas Dusko; Krstev Cvetana; Moderc Sasa
info:eu-repo/semantics/other

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/1998710