
Sense identification data: a dataset for lexical semantics

Davide Colla, Enrico Mensa, Daniele P. Radicioni
2020-01-01

Abstract

Sense Identification is a newly proposed task: when considering a pair of terms to assess their conceptual similarity, human raters are postulated to preliminarily select a sense pair, and the senses involved in this pair are those actually subject to the similarity rating. The sense identification task consists in searching for the senses selected during the similarity rating. The sense individuation task is important for investigating the strategies and sense inventories underlying human lexical access and, moreover, it is a relevant complement to the semantic similarity task. Individuating which senses are involved in the similarity rating is also crucial in order to fully assess those ratings: if we have no idea which two senses were retrieved, on what basis can we assess the score expressing their semantic proximity? The Sense Identification Dataset (SID) has been built to provide a common experimental ground for systems and approaches dealing with the sense identification task; it is the first dataset specifically designed for experimenting on this task. SID was created by manually annotating with sense identifiers the term pairs from an existing dataset, the SemEval-2017 Task 2 English dataset, which was originally conceived for experimenting on the semantic similarity task and contains a score expressing the human similarity rating for each term pair. To each such term pair we added a pair of annotated senses; in particular, senses were annotated so as to be compatible with (explicative of) the existing similarity ratings. The SID dataset contains BabelNet sense identifiers. This sense inventory is a broadly adopted ‘naming convention’ for word senses, and such identifiers can be easily mapped onto further resources such as WordNet and WikiData, thereby enabling …
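The abstract describes each SID entry as a term pair carrying a human similarity score plus two BabelNet sense identifiers. The sketch below illustrates that record shape in Python; the class name, field names, terms, score, and the specific identifiers are all hypothetical (not taken from the dataset), while the `bn:` + eight digits + part-of-speech-letter shape of BabelNet synset identifiers is the inventory's actual convention.

```python
import re
from dataclasses import dataclass


# Hypothetical layout of a SID-style entry; field names are illustrative,
# not the dataset's actual column names.
@dataclass
class SenseAnnotatedPair:
    term1: str
    term2: str
    similarity: float  # human similarity rating inherited from SemEval-2017 Task 2
    sense1: str        # BabelNet sense identifier annotated for term1
    sense2: str        # BabelNet sense identifier annotated for term2


def is_babelnet_id(sense_id: str) -> bool:
    """Loose shape check for a BabelNet synset identifier:
    'bn:' followed by 8 digits and a POS letter (n, v, a, r)."""
    return re.fullmatch(r"bn:\d{8}[nvar]", sense_id) is not None


# Illustrative entry: terms, score, and identifiers are made up for the example.
pair = SenseAnnotatedPair("bank", "river", 2.1, "bn:00008364n", "bn:00067549n")
assert is_babelnet_id(pair.sense1) and is_babelnet_id(pair.sense2)
```

Keeping the sense identifiers as plain `bn:`-prefixed strings makes it straightforward to later map them onto WordNet or WikiData entries, as the abstract notes.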
Year: 2020
Volume: 32
Pages: 106267–106272
URL: https://www.sciencedirect.com/science/article/pii/S2352340920311616
Keywords: Sense annotation, Sense individuation, Lexical processing, Lexical semantics, Word embeddings, Sense embeddings, Semantic similarity, Similarity metrics
Files in this record:
- colla2020novel.pdf — publisher's PDF, 713.58 kB, Adobe PDF format, restricted access (copy available on request)
- 1-s2.0-S2352340920311616-main.pdf — publisher's PDF, 285.88 kB, Adobe PDF format, open access

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this record: https://hdl.handle.net/2318/1762846
Citations
  • PMC: not available
  • Scopus: 1
  • Web of Science (ISI): 1