Regression Models and Cluster Detection. An Application to Anthropometric Measurements
ISAIA, Ennio Davide;DURIO, Alessandra
2007-01-01
Abstract
The purpose of this paper is to investigate the use of the Integrated Square Error, or L2 distance, as a practical estimation tool for building useful regression models. Exploiting its robustness properties, we show how it can be particularly helpful in the study of large data sets, where regression models based on M-estimators are likely to be unstable because of a substantial number of outliers or clustered data. We propose a regression analysis technique that compares the results of L2 estimates with those obtained from some common M-estimators. The discrepancy between the estimated regression models is measured by a new concept of similarity between functions, and a system of statistical hypotheses, based on a Monte Carlo significance test, is introduced to verify the similarity of the estimates. The theory is outlined and a case study is presented, based on the Health Professionals Follow-Up Study (Harvard School of Public Health), on estimating waist circumference as a predictor of type 2 diabetes risk.
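
As a rough illustration of the L2 criterion mentioned in the abstract, the sketch below evaluates the form commonly used in the L2E literature for a linear model with normal errors: the integral of the squared error density, 1/(2 sqrt(pi) sigma), minus twice the average of the model density at the residuals. The abstract itself does not give this formula, so it is an assumption here, as are the function names, the toy data, and the fixed value of sigma (in practice sigma would be estimated together with the coefficients by numerical minimization).

```python
import math


def l2e_criterion(xs, ys, intercept, slope, sigma):
    """L2E criterion for y = intercept + slope * x + e, e ~ N(0, sigma^2).

    Assumed form (not quoted from the paper):
    1 / (2 * sqrt(pi) * sigma) - (2 / n) * sum of N(0, sigma^2) densities
    evaluated at the residuals.
    """
    n = len(xs)
    # Closed-form integral of the squared N(0, sigma^2) density.
    integral_term = 1.0 / (2.0 * math.sqrt(math.pi) * sigma)
    # Sum of the normal density evaluated at each residual.
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    density_sum = sum(
        norm * math.exp(-0.5 * ((y - intercept - slope * x) / sigma) ** 2)
        for x, y in zip(xs, ys)
    )
    return integral_term - 2.0 * density_sum / n


def ols_line(xs, ys):
    """Ordinary least squares fit, used only as a non-robust comparison."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up toy data: 15 points close to y = 1 + 2x, then a cluster of
# 5 points shifted down by 30 (hypothetical outliers).
xs = [float(i) for i in range(20)]
ys = [
    1.0 + 2.0 * x + (0.1 if i % 2 == 0 else -0.1) - (30.0 if i >= 15 else 0.0)
    for i, x in enumerate(xs)
]

ols_intercept, ols_slope = ols_line(xs, ys)
# sigma = 0.5 is fixed by hand purely for illustration.
clean_score = l2e_criterion(xs, ys, 1.0, 2.0, 0.5)
ols_score = l2e_criterion(xs, ys, ols_intercept, ols_slope, 0.5)
print(clean_score, ols_score)
```

On this toy data the line followed by the majority of points gets a lower (better) criterion value than the least squares line, which is pulled toward the shifted cluster. This is the kind of behaviour the abstract relies on when it contrasts L2 estimates with less robust fits.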