Classification-based Content Sensitivity Analysis

Battaglia, Elena (co-first author); Bioglio, Livio (co-first author); Pensa, Ruggero G. (last author)
2020-01-01

Abstract

With the availability of user-generated content on the Web, malicious users have access to huge repositories of private (and often sensitive) information about a large part of the world’s population. In this paper, we propose a way to evaluate the harmfulness of text content by defining a new data mining task called content sensitivity analysis. Under our definition, any text sample can be assigned a score that reflects its degree of sensitivity. Although the task is similar to sentiment analysis, we show that it has its own peculiarities and may open a new branch of research. Through preliminary experiments, we show that content sensitivity analysis cannot be addressed as a simple binary classification task.
Year: 2020
Conference: 28th Symposium on Advanced Database Systems (SEBD 2020)
Location: Villasimius, Italy
Conference dates: June 21-24, 2020
Published in: Proceedings of the 28th Italian Symposium on Advanced Database Systems, Villasimius, Sud Sardegna, Italy (virtual due to the Covid-19 pandemic), June 21-24, 2020
Publisher: CEUR-WS.org
Volume: 2646
Pages: 326–333
URL: http://ceur-ws.org/Vol-2646/12-paper.pdf
Keywords: privacy, text mining, text categorization
Files in this record: sebd2020_2_online.pdf (publisher's PDF, open access, Adobe PDF, 423.41 kB)

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/1749119
Citations: Scopus 2; PubMed Central and ISI Web of Science: not available