Sense identification data: a dataset for lexical semantics
Davide Colla;Enrico Mensa;Daniele P. Radicioni
2020-01-01
Abstract
Sense identification is a newly proposed task: when considering a pair of terms to assess their conceptual similarity, human raters are postulated to preliminarily select a pair of senses, and it is these senses that are actually subject to the similarity rating. The sense identification task consists in searching for the senses selected during the similarity rating. The sense individuation task is important for investigating the strategies and sense inventories underlying human lexical access and, moreover, it is a relevant complement to the semantic similarity task. Individuating which senses are involved in the similarity rating is also crucial for fully assessing those ratings: if we have no idea which two senses were retrieved, on what basis can we assess the score expressing their semantic proximity? The Sense Identification Dataset (SID) has been built to provide a common experimental ground for systems and approaches dealing with the sense identification task; it is the first dataset specifically designed for experimenting on this task. SID was created by manually annotating with sense identifiers the term pairs from an existing dataset, the SemEval-2017 Task 2 English dataset. The original dataset was conceived for experimenting on the semantic similarity task, and it contains a score expressing the human similarity rating for each term pair. For each such term pair we added a pair of annotated senses; in particular, senses were annotated so as to be compatible with (explicative of) the existing similarity ratings. The SID dataset contains BabelNet sense identifiers. This sense inventory is a broadly adopted ‘naming convention’ for word senses, and such identifiers can be easily mapped onto further resources such as WordNet and WikiData, thereby enabling …
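To make the shape of such a record concrete, the following is a minimal sketch of one possible in-memory representation of the dataset described above: a term pair inherited from SemEval-2017 Task 2, its human similarity score, and the two annotated BabelNet sense identifiers. The tab-separated column layout and the helper names (`SIDRecord`, `load_sid`) are illustrative assumptions, not the dataset's published file format.

```python
# Sketch of a SID-style record and a loader, assuming a hypothetical
# tab-separated layout: term1, term2, similarity, sense1, sense2.
import csv
from dataclasses import dataclass

@dataclass
class SIDRecord:
    term1: str          # first term of the pair (from SemEval-2017 Task 2 English)
    term2: str          # second term of the pair
    similarity: float   # human similarity rating inherited from the original dataset
    sense1: str         # BabelNet sense identifier annotated for term1, e.g. "bn:00000001n"
    sense2: str         # BabelNet sense identifier annotated for term2

def load_sid(path: str) -> list[SIDRecord]:
    """Read annotated term pairs from a tab-separated file (hypothetical layout)."""
    records = []
    with open(path, encoding="utf-8") as f:
        for term1, term2, score, sense1, sense2 in csv.reader(f, delimiter="\t"):
            records.append(SIDRecord(term1, term2, float(score), sense1, sense2))
    return records
```

Because BabelNet identifiers can be mapped onto WordNet and WikiData, a record of this kind can be linked to those resources once the corresponding mapping is available.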
File | Access | File type | Size | Format
---|---|---|---|---
colla2020novel.pdf | Restricted access | Publisher's PDF | 713.58 kB | Adobe PDF
1-s2.0-S2352340920311616-main.pdf | Open access | Publisher's PDF | 285.88 kB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.