
Matrix Factorization with Interval-Valued Data

Francesco Di Mauro; Maria Luisa Sapino
2019-01-01

Abstract

With many applications relying on multi-dimensional datasets for decision making, matrix factorization (or decomposition) is becoming the basis for many knowledge discovery and machine learning tasks, including clustering, trend detection, anomaly detection, and correlation analysis. Unfortunately, a major shortcoming of matrix analysis operations is that, although they are effective on scalar data, they are not designed for data that include non-scalar observations, such as intervals, and are therefore difficult to apply to such data. Yet, in many applications, the available data are inherently non-scalar for various reasons, including imprecision in data collection, conflicts in aggregated data, data summarization, or privacy concerns, where one is given a reduced, clustered, or intentionally noisy and obfuscated version of the data to hide information. In this paper, we propose matrix decomposition techniques that account for interval-valued data. We show that naive ways of dealing with such imperfect data may introduce errors in the analysis, and we present factorization techniques that are especially effective when the amount of imprecise information is large.
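As a point of reference only, the sketch below illustrates one naive strategy of the kind the abstract warns about. The choice of strategy is our assumption and is not necessarily one studied in the paper. An interval-valued matrix is stored as two hypothetical bound arrays, `lo` and `hi`. Each interval is collapsed to its midpoint, and a truncated SVD (NumPy's `np.linalg.svd`) is applied. The helper name `midpoint_svd` is hypothetical. Two matrices with very different amounts of imprecision end up with identical factors, so the interval widths are lost before factorization even starts.

```python
import numpy as np


def midpoint_svd(lo, hi, rank):
    """Naive baseline (assumed for illustration, not the paper's method):
    collapse each interval [lo, hi] to its midpoint, then apply a truncated SVD."""
    lo, hi = np.asarray(lo, dtype=float), np.asarray(hi, dtype=float)
    if lo.shape != hi.shape or np.any(lo > hi):
        raise ValueError("lo and hi must have the same shape with lo <= hi")
    mid = (lo + hi) / 2.0
    u, s, vt = np.linalg.svd(mid, full_matrices=False)
    # W (m x rank) and H (rank x n) such that W @ H approximates the midpoints.
    return u[:, :rank] * s[:rank], vt[:rank, :]


# Two hypothetical interval matrices with identical midpoints but
# different widths (hi - lo): the first one is far more imprecise.
lo_a, hi_a = np.array([[1, 2], [3, 4]]), np.array([[1, 4], [3, 8]])
lo_b, hi_b = np.array([[0, 3], [3, 6]]), np.array([[2, 3], [3, 6]])

w_a, h_a = midpoint_svd(lo_a, hi_a, rank=1)
w_b, h_b = midpoint_svd(lo_b, hi_b, rank=1)

# The midpoint reduction yields the same factors for both inputs, so any
# information carried by the interval widths is discarded.
print(np.allclose(w_a, w_b) and np.allclose(h_a, h_b))  # True
```

A method meant to account for imprecision has to use the widths (`hi - lo`) as well, rather than discard them in a preprocessing step.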
https://ieeexplore.ieee.org/document/8844796
Keywords: Matrix decomposition, Semantics, Probabilistic logic, Principal component analysis, Singular value decomposition
Authors: Mao-Lin Li, Francesco Di Mauro, K. Selcuk Candan, Maria Luisa Sapino
Files in this item:

tkde18_revised_submission.pdf
Open access
Description: article and appendix
File type: postprint (author's final version)
Size: 1.31 MB
Format: Adobe PDF

Sapino-08844796.pdf
Restricted access
File type: publisher's PDF
Size: 2.85 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/1726448
Citations
  • PMC: n/a
  • Scopus: 4
  • ISI: 0