Comparing Robust Regression Estimators to Detect Data Clusters: A Case Study
DURIO, Alessandra; ISAIA, Ennio Davide
2014-01-01
Abstract
It is well known that, in any study of large data sets in which a substantial number of outliers or clustered data are present, regression models based on $M$-estimators are likely to be unstable. Relying on the inherent robustness of estimates based on the Integrated Square Error criterion, we propose a regression analysis technique that consists in comparing the results arising from $L_2$ estimates with those obtained by applying some common $M$-estimators. The discrepancy between the estimated regression models is measured by means of a new concept of similarity between functions, and a system of statistical hypotheses, based on a Monte Carlo significance test, is introduced to verify the similarity of the estimates. Whenever the hypothesis of similarity between models is rejected, a careful investigation of the data structure is necessary to check for the presence of clusters, which may lead one to consider a mixture of regression models. In this regard, we show how the $L_2$ criterion can be applied to fit a finite mixture of regression models. The theory is outlined and the whole procedure is applied to a case study concerning the evaluation of the risk of fire and the risk of electric shock of electronic transformers.
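As a rough illustration of why an estimate based on the Integrated Square Error criterion can behave differently from a classical fit when clusters are present, the sketch below evaluates the standard $L_2$ criterion for a straight-line model with normal errors, $\frac{1}{2\sigma\sqrt{\pi}} - \frac{2}{n}\sum_i \phi(r_i \mid 0, \sigma)$, on synthetic two-cluster data. It is only a minimal sketch under stated assumptions, not the authors' procedure: the function names (`l2e_criterion`, `ols_fit`, `l2e_grid_fit`), the synthetic data, the fixed value of $\sigma$, and the coarse grid search are all hypothetical choices made for illustration. Ordinary least squares is used as the simplest stand-in for the comparison estimator, since the abstract does not say which $M$-estimators are used, and neither the similarity measure nor the Monte Carlo significance test is reproduced here.

```python
import math
import random


def l2e_criterion(a, b, sigma, xs, ys):
    """L2 (Integrated Square Error) criterion for y = a + b*x with N(0, sigma^2) errors."""
    n = len(xs)
    # Integral of the squared normal density: 1 / (2 * sigma * sqrt(pi)).
    integral_term = 1.0 / (2.0 * sigma * math.sqrt(math.pi))
    # Normal density evaluated at each residual, summed over the sample.
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    density_sum = sum(
        norm * math.exp(-0.5 * ((y - a - b * x) / sigma) ** 2)
        for x, y in zip(xs, ys)
    )
    return integral_term - 2.0 * density_sum / n


def ols_fit(xs, ys):
    """Ordinary least squares intercept and slope (stand-in comparison estimator)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def l2e_grid_fit(xs, ys, sigma):
    """Crude grid search for the (a, b) minimising the L2 criterion; sigma is held fixed."""
    best = None
    for i in range(49):            # a = 0.00, 0.25, ..., 12.00
        a = 0.25 * i
        for j in range(41):        # b = -1.0, -0.9, ..., 3.0
            b = -1.0 + 0.1 * j
            value = l2e_criterion(a, b, sigma, xs, ys)
            if best is None or value < best[0]:
                best = (value, a, b)
    return best[1], best[2]


# Synthetic data (an assumption, not the paper's case study):
# 70 points around y = 1 + 2x and a second cluster of 30 points around y = 30 - 2x.
random.seed(0)
xs, ys = [], []
for _ in range(70):
    x = random.uniform(0.0, 10.0)
    xs.append(x)
    ys.append(1.0 + 2.0 * x + random.gauss(0.0, 0.5))
for _ in range(30):
    x = random.uniform(0.0, 10.0)
    xs.append(x)
    ys.append(30.0 - 2.0 * x + random.gauss(0.0, 0.5))

SIGMA = 0.5  # assumed known here; in practice it would be estimated together with a and b
a_ols, b_ols = ols_fit(xs, ys)
a_l2e, b_l2e = l2e_grid_fit(xs, ys, SIGMA)

print("OLS fit:  intercept=%.2f slope=%.2f" % (a_ols, b_ols))
print("L2E fit:  intercept=%.2f slope=%.2f" % (a_l2e, b_l2e))
```

In this toy setting the least squares line is pulled towards a compromise between the two clusters, whereas the $L_2$ fit stays close to the line followed by the majority of the points. A marked disagreement of this kind between the two fitted models is the sort of discrepancy that, according to the abstract, prompts a closer look at the data for clusters.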
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.