As many psychological and sociological study reveal, many people disclose too much privacy-harming information in social media in the form of text and multimedia posts, thus exposing themselves and other persons to several security risks. Consequently, many researchers have addressed this problem by investigating on the detection and analysis of the so-called self-disclosure behavior in social media and blogging platforms. Among the others, content sensitivity analysis has emerged as a promising research direction, but, so far, it has only focused on English text posts, although it is well-known that people tend to disclose mostly in their own native languages. Therefore, in this paper, we address this limitation by proposing a new text corpus of Italian posts that we have annotated following to the anonymity assumption. We then apply several language models based on transformers to classify them according to their sensitivity. Moreover, since Italian is a lower-resource language compared to English, we also apply some multilingual zero-shot transfer learning architectures trained on a rich and manually annotated English corpus and tested on the Italian one. We show experimentally that the approaches trained directly on the Italian corpus, still outperform multilingual ones trained on the English data and tested on Italian, although some of them exhibit promising prediction performances.

Detection of Privacy-Harming Social Media Posts in Italian

Peiretti, Federico
First
;
Pensa, Ruggero G.
Last
2023-01-01

Abstract

As many psychological and sociological study reveal, many people disclose too much privacy-harming information in social media in the form of text and multimedia posts, thus exposing themselves and other persons to several security risks. Consequently, many researchers have addressed this problem by investigating on the detection and analysis of the so-called self-disclosure behavior in social media and blogging platforms. Among the others, content sensitivity analysis has emerged as a promising research direction, but, so far, it has only focused on English text posts, although it is well-known that people tend to disclose mostly in their own native languages. Therefore, in this paper, we address this limitation by proposing a new text corpus of Italian posts that we have annotated following to the anonymity assumption. We then apply several language models based on transformers to classify them according to their sensitivity. Moreover, since Italian is a lower-resource language compared to English, we also apply some multilingual zero-shot transfer learning architectures trained on a rich and manually annotated English corpus and tested on the Italian one. We show experimentally that the approaches trained directly on the Italian corpus, still outperform multilingual ones trained on the English data and tested on Italian, although some of them exhibit promising prediction performances.
2023
9th International Symposium on Security and Privacy in Social Networks and Big Data (SocialSec 2023)
University of Kent, Canterbury, UK
August 14-16, 2023
SocialSec 2023: Security and Privacy in Social Networks and Big Data
Springer
14097
203
223
978-981-99-5176-5
978-981-99-5177-2
https://link.springer.com/chapter/10.1007/978-981-99-5177-2_12
Privacy, Neural language models, Social media
Peiretti, Federico; Pensa, Ruggero G.
File in questo prodotto:
File Dimensione Formato  
main.pdf

Accesso aperto

Descrizione: preprint
Tipo di file: PREPRINT (PRIMA BOZZA)
Dimensione 448.44 kB
Formato Adobe PDF
448.44 kB Adobe PDF Visualizza/Apri
socialsec2023_printed.pdf

Accesso riservato

Descrizione: PDF editoriale
Tipo di file: PDF EDITORIALE
Dimensione 212.68 kB
Formato Adobe PDF
212.68 kB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2318/1925050
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? ND
social impact