A large number of variables does not always lead to better classification performance, because some variables can be redundant, irrelevant, or a source of noise. For this reason, a Feature Selection (FS) step is often applied to high-dimensional datasets. Correlation-based FS relies on the idea that 'good feature subsets contain features highly correlated with the class yet uncorrelated with each other'. The main difficulty with this approach is choosing the threshold above which two variables are considered correlated. In this study, we evaluated the impact of different thresholds on the performance of two classifiers trained to predict the response to neoadjuvant chemotherapy (graded from 1 to 5) of 44 patients with breast cancer. First, 27 texture features were computed on the largest slices of the segmented tumor in the pretreatment dynamic contrast-enhanced MRI. Then, we applied an FS algorithm that identifies the pairs of variables whose linear correlation coefficient has an absolute value above a given threshold and removes, from each pair, the variable less correlated with the response to neoadjuvant chemotherapy. We tested correlation thresholds from 1 down to 0.8 in steps of 0.01, and used each resulting subset to build a Decision Tree (DT) classifier and a Linear Regression Model (LRM). Our results showed that removing highly correlated variables (absolute value of the correlation coefficient >0.97) reduced DT performance by about 10%. Although the LRM did not reach acceptable results in predicting chemotherapy response (accuracy=40.9%), its intrinsic linearity made it more stable to the removal of linear redundancy.
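The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions the abstract does not state: pairs are visited in a fixed order, a pair is skipped once one of its members has already been removed, ties are resolved by dropping the second variable of the pair, and relevance is taken as the absolute Pearson correlation with the response grade. The function names (`pearson`, `correlation_filter`) and the toy data are hypothetical, and classifier training is left out.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a constant variable counts as uncorrelated
    return sxy / sqrt(sxx * syy)


def correlation_filter(features, response, threshold):
    """Return the names of the features kept at the given threshold.

    features: dict mapping feature name -> list of values (one per patient)
    response: list of response grades (one per patient)
    """
    # Relevance of each feature: |correlation| with the response (assumption).
    relevance = {name: abs(pearson(col, response))
                 for name, col in features.items()}
    kept = set(features)
    for a, b in combinations(features, 2):
        if a not in kept or b not in kept:
            continue  # assumption: skip pairs with an already-removed member
        if abs(pearson(features[a], features[b])) > threshold:
            # Drop the member of the pair less correlated with the response;
            # on a tie the second variable is dropped (assumption).
            kept.discard(a if relevance[a] < relevance[b] else b)
    return [name for name in features if name in kept]


# Hypothetical toy data: three features for six patients, graded 1 to 5.
features = {
    "f1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "f2": [1.1, 2.0, 2.9, 4.2, 5.0, 6.1],  # nearly a copy of f1
    "f3": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
}
response = [1, 2, 2, 4, 4, 5]

# Thresholds from 1 down to 0.8 in steps of 0.01, as in the abstract.
thresholds = [round(1 - 0.01 * k, 2) for k in range(21)]
subsets = {t: correlation_filter(features, response, t) for t in thresholds}

for t in (1.0, 0.9, 0.8):
    print(t, subsets[t])
```

In this toy case nothing is removed at a threshold of 1, `f1` is removed at 0.9 because it is almost identical to `f2` and slightly less correlated with the response, and `f3` is also removed at 0.8. Each subset in `subsets` would then be passed to whichever classifier is being evaluated.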

Correlation based Feature Selection impact on the classification of breast cancer patients response to neoadjuvant chemotherapy

Giannini V.; Regge D.
2018

Conference: 13th IEEE International Symposium on Medical Measurements and Applications, MeMeA 2018
Location: Università La Sapienza, Italy
Year: 2018
Proceedings: MeMeA 2018 - 2018 IEEE International Symposium on Medical Measurements and Applications, Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1-5
ISBN: 978-1-5386-3392-2
Keywords: breast cancer; correlation; decision tree; feature selection; linear model regression; neoadjuvant chemotherapy; texture features
Authors: Rosati S.; Gianfreda C.M.; Balestra G.; Martincich L.; Giannini V.; Regge D.
Use this identifier to cite or link to this document: http://hdl.handle.net/2318/1789060