CINECA IRIS Institutional Research Information System

Anonymity preserving sequential pattern mining

Anna Monreale; Dino Pedreschi; Ruggero Gaetano Pensa; Fabio Pinelli

2014-01-01

Abstract

The increasing availability of personal data of a sequential nature, such as time-stamped transaction or location data, enables increasingly sophisticated sequential pattern mining techniques. However, privacy is at risk if the identity of individuals can be reconstructed from sequential data. It is therefore important to develop privacy-preserving techniques that support the publication of truly anonymous data without significantly altering the analysis results. In this paper we propose applying the privacy-by-design paradigm to design a technological framework that counters the undesirable and unlawful effects of privacy violations on sequence data, without obstructing the knowledge discovery opportunities of data mining technologies. First, we introduce a k-anonymity framework for sequence data by defining the sequence linking attack model and its associated countermeasure, a k-anonymity notion for sequence datasets that provides formal protection against the attack. Second, we instantiate this framework and provide a specific method for constructing the k-anonymous version of a sequence dataset; the method preserves the results of sequential pattern mining, together with several basic statistics and other analytical properties of the original data, including the clustering structure. A comprehensive experimental study on realistic datasets of process logs, web logs and GPS tracks shows empirically how the proposed method reconciles privacy protection with analytical utility.
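
The abstract names a sequence linking attack and a k-anonymity notion for sequence datasets but does not state their formal definitions. The sketch below is therefore only an illustration under an assumed reading: an attacker who knows a subsequence of someone's data can single that person out when fewer than k sequences in the published dataset contain that subsequence. The function names (`subsequences`, `harmful_subsequences`, `is_k_anonymous`) and the `max_len` bound on the attacker's knowledge are hypothetical and are not taken from the paper. The code only checks a dataset; it does not implement the authors' method for constructing a k-anonymous version.

```python
from collections import Counter
from itertools import combinations


def subsequences(seq, max_len):
    """Distinct, order-preserving (not necessarily contiguous) subsequences
    of seq with between 1 and max_len items."""
    found = set()
    for n in range(1, min(max_len, len(seq)) + 1):
        for idx in combinations(range(len(seq)), n):
            found.add(tuple(seq[i] for i in idx))
    return found


def harmful_subsequences(dataset, k, max_len=3):
    """Map each subsequence occurring in fewer than k sequences to its support.

    Assumption: 'support' is the number of sequences in the dataset that
    contain the subsequence at least once."""
    support = Counter()
    for seq in dataset:
        # Each sequence contributes at most 1 to the support of a subsequence.
        for sub in subsequences(seq, max_len):
            support[sub] += 1
    return {sub: n for sub, n in support.items() if n < k}


def is_k_anonymous(dataset, k, max_len=3):
    """True if no observed subsequence (up to max_len items) has support below k."""
    return not harmful_subsequences(dataset, k, max_len)


if __name__ == "__main__":
    # Toy location sequences, invented purely for illustration.
    toy = [("a", "b", "c"), ("a", "b", "c"), ("a", "c")]
    print(is_k_anonymous(toy, k=2))
    print(sorted(harmful_subsequences(toy, k=3)))
```

Enumerating subsequences grows combinatorially with sequence length, which is why the sketch caps the attacker's knowledge at `max_len` items; a practical check would need a more efficient support-counting structure.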

Year	2014
Journal title	ARTIFICIAL INTELLIGENCE AND LAW
Volume	22
Issue	2
First page	141
Last page	173
DOI	https://dx.doi.org/10.1007/s10506-014-9154-6
Product URL (open-access archives, full text on publisher site, etc.)	http://link.springer.com/article/10.1007/s10506-014-9154-6
Keywords	privacy-by-design; sequence data; k-anonymity
All authors	Anna Monreale; Dino Pedreschi; Ruggero G. Pensa; Fabio Pinelli
Appears in types	03A-Journal Article

Files in this product:

File	Access	File type	Size	Format
arti2014_draft.pdf	Restricted access	Postprint (author's final version)	1.12 MB	Adobe PDF
arti2014_draft_4aperto_1280022.pdf	Open access from 1 July 2015	Postprint (author's final version)	1.13 MB	Adobe PDF
arti2014_printed.pdf	Restricted access	Publisher's PDF (description: print version PDF)	1.61 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/141980
