A Semisupervised Approach to the Detection and Characterization of Outliers in Categorical Data

Ienco, Dino; Pensa, Ruggero Gaetano; Meo, Rosa
2017-01-01

Abstract

In this paper, we introduce a new approach to semisupervised anomaly detection for categorical data. Given a training set of instances (all belonging to the normal class), we analyze the relationships among features to extract a discriminative characterization of the anomalous instances. Our key idea is to build a model that characterizes the features of the normal instances and then to use a set of distance-based techniques to discriminate between normal and anomalous instances. We compare our approach with state-of-the-art methods for semisupervised anomaly detection. We show empirically that a technique designed specifically for categorical data outperforms general-purpose approaches. We also show that, in contrast with other approaches, which are opaque because their decisions cannot be easily understood, our approach produces a discriminative model that can be easily interpreted and used to explore the data.
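The general setting described in the abstract (train on normal instances only, then score unseen instances by distance) can be illustrated with a minimal sketch. This is not the authors' method: it assumes a plain Hamming-style mismatch distance and a k-nearest-neighbor score, whereas the paper learns its distances from the relationships among features. All names here (`mismatch_distance`, `anomaly_score`, the toy data, and `k`) are hypothetical and chosen only for illustration.

```python
def mismatch_distance(a, b):
    """Hamming-style distance: fraction of features on which two instances differ.

    Assumption: a simple stand-in for a learned categorical distance.
    """
    return sum(x != y for x, y in zip(a, b)) / len(a)


def anomaly_score(row, normal_rows, k=2):
    """Mean distance from `row` to its k nearest normal training instances.

    Higher scores mean the instance looks less like the normal class.
    """
    dists = sorted(mismatch_distance(row, t) for t in normal_rows)
    return sum(dists[:k]) / k


# Toy training set: every instance belongs to the normal class.
normal = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("green", "small", "round"),
    ("red", "medium", "round"),
]

typical = ("red", "small", "round")      # matches a training instance
unusual = ("blue", "large", "square")    # shares no value with any training instance

score_typical = anomaly_score(typical, normal)
score_unusual = anomaly_score(unusual, normal)
print(score_typical, score_unusual)
```

In this sketch an instance would be flagged as anomalous when its score exceeds a threshold chosen on held-out normal data; the threshold itself is left out because the abstract does not specify one.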
Year: 2017
Volume: 28
Issue: 5
Pages: 1017-1029
http://ieeexplore.ieee.org/document/7412753/
Keywords: Anomaly detection, categorical data, distance learning, semisupervised learning
Ienco, Dino; Pensa, Ruggero G.; Meo, Rosa
Files in this record:

tnnls2016_online.pdf
Access: restricted
Description: Printed version
File type: Publisher's PDF
Size: 2.57 MB
Format: Adobe PDF

tnnls2016_author_4aperto.pdf
Access: open
File type: Postprint (author's final version)
Size: 531.24 kB
Format: Adobe PDF

tnnls2017_printed.pdf
Access: restricted
File type: Publisher's PDF
Size: 2.22 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/1558955
Citations
  • PMC 0
  • Scopus 35
  • ISI 28