Comparing Robust Regression Estimators to Detect Data Clusters: A Case Study

DURIO, Alessandra;ISAIA, Ennio Davide
2014-01-01

Abstract

It is well known that whenever large data sets contain a substantial number of outliers or clustered data, regression models based on $M$-estimators are likely to be unstable. Relying on the inherent robustness of estimates based on the Integrated Square Error criterion, we propose a technique of regression analysis that consists of comparing the results of $L_2$ estimates with those obtained from some common $M$-estimators. The discrepancy between the estimated regression models is measured by means of a new concept of similarity between functions, and a system of statistical hypotheses, based on a Monte Carlo significance test, is introduced to verify the similarity of the estimates. Whenever the hypothesis of similarity between models is rejected, a careful investigation of the data structure is necessary to check for the presence of clusters, which may lead one to consider a mixture of regression models. In this regard, we show how the $L_2$ criterion can be applied to fit a finite mixture of regression models. The theory is outlined and the whole procedure is applied to a case study concerning the evaluation of the risk of fire and the risk of electric shock of electronic transformers.
2014
Statistical Modelling in Biostatistics and Bioinformatics
Springer
Contributions to Statistics
125
138
9783319045788
http://www.springer.com/statistics/statistical+theory+and+methods/book/978-3-319-04578-8
A. Durio; E.D. Isaia

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/158644