Semantically Aware Text Categorisation for Metadata Annotation

Carducci, Giulio; Leontino, Marco; Radicioni, Daniele P.; Bonino, Guido; Pasini, Enrico; Tripodi, Paolo
2019-01-01

Abstract

This paper presents a system that addresses a long-standing and challenging problem: learning a classifier that automatically annotates bibliographic records, starting from a very large set of unbalanced and unlabelled data. We describe the main features of the dataset, the learning algorithm adopted, and how it was used to discriminate philosophical documents from documents of other disciplines. One strength of our approach lies in the novel combination of a standard learning approach with a semantic one: the results of the acquired classifier are improved by accessing a semantic network containing conceptual information. We describe the rationale behind the construction of the training and test sets, report and discuss the results obtained, and conclude by outlining future work.
2019
15th Italian Research Conference on Digital Libraries, IRCDL 2019
ita
2019
Communications in Computer and Information Science, vol. 988, pp. 315-330
Springer Verlag
ISBN 9783030112257
http://www.springer.com/series/7899
Language models; Lexical resources; NLP; Semantics; Text categorization; Computer Science (all); Mathematics (all); Knowledge Graphs
Carducci, Giulio*; Leontino, Marco; Radicioni, Daniele P.; Bonino, Guido; Pasini, Enrico; Tripodi, Paolo
Files in this item:
File: carducci2019categorization.pdf
Access: open access
File type: postprint (author's final version)
Size: 359.24 kB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/1693870
Citations
  • Scopus 9