With many applications relying on multi-dimensional datasets for decision making, matrix factorization (or decomposition) is becoming the basis for many knowledge discovery and machine learning tasks, from clustering, trend detection, anomaly detection, to correlation analysis. Unfortunately, a major shortcoming of matrix analysis operations is that, despite their effectiveness when the data is scalar, these operations become difficult to apply in the presence of non-scalar data, as they are not designed for data that include non-scalar observations, such as intervals. Yet, in many applications, the available data are inherently non-scalar for various reasons, including imprecision in data collection, conflicts in aggregated data, data summarization, or privacy issues, where one is provided with a reduced, clustered, or intentionally noisy and obfuscated version of the data to hide information. In this paper, we propose matrix decomposition techniques that consider the existence of interval-valued data. We show that naive ways to deal with such imperfect data may introduce errors in analysis and present factorization techniques that are especially effective when the amount of imprecise information is large.
Matrix Factorization with Interval-Valued Data
Francesco Di Mauro;Maria Luisa Sapino
2021-01-01
Abstract
With many applications relying on multi-dimensional datasets for decision making, matrix factorization (or decomposition) is becoming the basis for many knowledge discovery and machine learning tasks, from clustering, trend detection, anomaly detection, to correlation analysis. Unfortunately, a major shortcoming of matrix analysis operations is that, despite their effectiveness when the data is scalar, these operations become difficult to apply in the presence of non-scalar data, as they are not designed for data that include non-scalar observations, such as intervals. Yet, in many applications, the available data are inherently non-scalar for various reasons, including imprecision in data collection, conflicts in aggregated data, data summarization, or privacy issues, where one is provided with a reduced, clustered, or intentionally noisy and obfuscated version of the data to hide information. In this paper, we propose matrix decomposition techniques that consider the existence of interval-valued data. We show that naive ways to deal with such imperfect data may introduce errors in analysis and present factorization techniques that are especially effective when the amount of imprecise information is large.File | Dimensione | Formato | |
---|---|---|---|
tkde18_revised_submission.pdf
Accesso aperto
Descrizione: articolo e appendice
Tipo di file:
POSTPRINT (VERSIONE FINALE DELL’AUTORE)
Dimensione
1.31 MB
Formato
Adobe PDF
|
1.31 MB | Adobe PDF | Visualizza/Apri |
Sapino-08844796.pdf
Accesso riservato
Tipo di file:
PDF EDITORIALE
Dimensione
2.85 MB
Formato
Adobe PDF
|
2.85 MB | Adobe PDF | Visualizza/Apri Richiedi una copia |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.