Detection of Privacy-Harming Social Media Posts in Italian

Peiretti, Federico (first author); Pensa, Ruggero G. (last author)
2023-01-01

Abstract

As many psychological and sociological studies reveal, many people disclose too much privacy-harming information on social media in the form of text and multimedia posts, thus exposing themselves and others to several security risks. Consequently, many researchers have addressed this problem by investigating the detection and analysis of so-called self-disclosure behavior on social media and blogging platforms. Among these efforts, content sensitivity analysis has emerged as a promising research direction but, so far, it has focused only on English text posts, although it is well known that people tend to disclose mostly in their own native languages. Therefore, in this paper, we address this limitation by proposing a new text corpus of Italian posts that we have annotated following the anonymity assumption. We then apply several transformer-based language models to classify the posts according to their sensitivity. Moreover, since Italian is a lower-resource language than English, we also apply some multilingual zero-shot transfer learning architectures trained on a rich, manually annotated English corpus and tested on the Italian one. We show experimentally that the approaches trained directly on the Italian corpus still outperform the multilingual ones trained on the English data and tested on Italian, although some of the latter exhibit promising prediction performance.
2023
English
contribution
1 - Conference
9th International Symposium on Security and Privacy in Social Networks and Big Data (SocialSec 2023)
University of Kent, Canterbury, UK
August 14-16, 2023
International
Arief, B., Monreale, A., Sirivianos, M., Li, S.
SocialSec 2023: Security and Privacy in Social Networks and Big Data
Scientific committee
Springer
Singapore
SINGAPORE
14097
203
223
21
978-981-99-5176-5
978-981-99-5177-2
https://link.springer.com/chapter/10.1007/978-981-99-5177-2_12
Privacy, Neural language models, Social media
no
   Bando CRT I Tornata 2022 - "Social4School 4.0 - L'intelligenza artificiale al servizio dell'educazione civica digitale nelle scuole" - CDD 12/09/2022
   Social4School 4.0
   FONDAZIONE CRT
   Pensa R. -
1 – product with a file in Open Access version (I will attach the file at step 6 - Upload)
2
info:eu-repo/semantics/conferenceObject
04-CONTRIBUTO IN ATTI DI CONVEGNO::04A-Conference paper in volume
Peiretti, Federico; Pensa, Ruggero G.
273
partially_open
Files in this product:
File Size Format
main.pdf

Open access

Description: preprint
File type: PREPRINT (FIRST DRAFT)
Size: 448.44 kB
Format: Adobe PDF
socialsec2023_printed.pdf

Restricted access

Description: publisher's PDF
File type: PUBLISHER'S PDF
Size: 212.68 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/1925050
Citations
  • PMC: n/a
  • Scopus: 1
  • ISI: n/a