Tensor decomposition is used for many web and user data analysis operations from clustering, trend detection, anomaly detection, to correlation analysis. However, many of the tensor decomposition schemes are sensitive to noisy data, an inevitable problem in the real world that can lead to false conclusions. The problem is compounded by over-fitting when the user data is sparse. Recent research has shown that it is possible to avoid over-fitting by relying on probabilistic techniques. However, these have two major deficiencies: (a) firstly, they assume that all the data and intermediary results can fit in the main memory, and (b) they treat the entire tensor uniformly, ignoring potential non-uniformities in the noise distribution. In this paper, we propose a Noise-Profile Adaptive Tensor Decomposition (nTD) method, which aims to tackle both of these challenges. In particular, nTD leverages a grid-based two-phase decomposition strategy for two complementary purposes: firstly, the grid partitioning helps ensure that the memory footprint of the decomposition is kept low; secondly (and perhaps more importantly) any a priori knowledge about the noise profiles of the grid partitions enable us to develop a sample assignment strategy (or s-strategy) that best suits the noise distribution of the given tensor. Experiments show that nTD's performance is significantly better than conventional CP decomposition techniques on noisy user data tensors.

nTD: Noise-Profile Adaptive Tensor Decomposition.

SAPINO, Maria Luisa
2017-01-01

Abstract

Tensor decomposition is used for many web and user data analysis operations from clustering, trend detection, anomaly detection, to correlation analysis. However, many of the tensor decomposition schemes are sensitive to noisy data, an inevitable problem in the real world that can lead to false conclusions. The problem is compounded by over-fitting when the user data is sparse. Recent research has shown that it is possible to avoid over-fitting by relying on probabilistic techniques. However, these have two major deficiencies: (a) firstly, they assume that all the data and intermediary results can fit in the main memory, and (b) they treat the entire tensor uniformly, ignoring potential non-uniformities in the noise distribution. In this paper, we propose a Noise-Profile Adaptive Tensor Decomposition (nTD) method, which aims to tackle both of these challenges. In particular, nTD leverages a grid-based two-phase decomposition strategy for two complementary purposes: firstly, the grid partitioning helps ensure that the memory footprint of the decomposition is kept low; secondly (and perhaps more importantly) any a priori knowledge about the noise profiles of the grid partitions enable us to develop a sample assignment strategy (or s-strategy) that best suits the noise distribution of the given tensor. Experiments show that nTD's performance is significantly better than conventional CP decomposition techniques on noisy user data tensors.
2017
WWW '17 - 26th International Conference on World Wide Web
Perth - Australia
3-7 Aprile 2017
WWW '17 Proceedings of the 26th International Conference on World Wide Web
International World Wide Web Conferences Steering Committee
243
252
978-1-4503-4913-0
http://dl.acm.org/citation.cfm?doid=3038912.3052641
Xinsheng Li ; K. Selçuk Candan; Maria Luisa Sapino
File in questo prodotto:
File Dimensione Formato  
WWW2017.pdf

Accesso aperto

Descrizione: Articolo principale
Tipo di file: POSTPRINT (VERSIONE FINALE DELL’AUTORE)
Dimensione 3.52 MB
Formato Adobe PDF
3.52 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2318/1647177
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 3
  • ???jsp.display-item.citation.isi??? 2
social impact