CINECA IRIS Institutional Research Information System

Semantically Aware Text Categorisation for Metadata Annotation

Carducci, Giulio; Leontino, Marco; Radicioni, Daniele P.; Bonino, Guido; Pasini, Enrico; Tripodi, Paolo

2019-01-01

Abstract

In this paper we present a system that addresses a long-standing and challenging problem: learning a classifier that automatically annotates bibliographic records, starting from a very large set of unbalanced and unlabelled data. We describe the main features of the dataset, the learning algorithm adopted, and how it was used to discriminate philosophical documents from documents of other disciplines. One strength of our approach lies in the novel combination of a standard learning approach with a semantic one: the results of the acquired classifier are improved by accessing a semantic network containing conceptual information. We describe the experimentation, including the rationale for constructing the training and test sets, report and discuss the results obtained, and conclude by outlining future work.

Year: 2019
Event title: 15th Italian Research Conference on Digital Libraries, IRCDL 2019
Event location: ita
Event date: 2019
Volume title: Communications in Computer and Information Science
Publisher: Springer Verlag
Volume number: 988
First page: 315
Last page: 330
ISBN: 9783030112257
DOI: https://dx.doi.org/10.1007/978-3-030-11226-4_25
Product URL (open-access archives, full text on the publisher's site, etc.): http://www.springer.com/series/7899
Keywords: Language models; Lexical resources; NLP; Semantics; Text categorization; Computer Science (all); Mathematics (all); Knowledge Graphs
All authors: Carducci, Giulio*; Leontino, Marco; Radicioni, Daniele P.; Bonino, Guido; Pasini, Enrico; Tripodi, Paolo
Appears in types: 04A-Conference paper in volume

Files in this item:

File	Size	Format
carducci2019categorization.pdf (open access; file type: postprint, author's final version)	359.24 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/1693870
