Importance of feature selection in decision-tree and artificial-neural-network ecological applications. Alburnus alburnus alborella: a practical example.

TIRELLI, Santina; PESSANI, Daniela
2011-01-01

Abstract

Recent advances in computing technology have increased interest in applying data mining to ecology, and machine learning is among the methods most often used in these applications. It is commonly estimated that about 80% of the resources in a data mining application go to cleaning and preprocessing the data, yet few studies address the preprocessing of ecological data used as input to such systems. In this study, we apply four feature selection methods (χ2, Information Gain, Gain Ratio, and Symmetrical Uncertainty) and evaluate their effectiveness in preprocessing the input data used to induce artificial neural networks (ANNs) and decision trees (DTs). Fish presence/absence data are used to illustrate the models. Feature selection proved fundamental to improving model performance: classification accuracy improves when a small set of optimally selected features is used. Both DTs and ANNs are useful tools for modeling the presence/absence of Alburnus alburnus alborella, and ANNs generally performed better than DTs.
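The four feature selection scores named in the abstract can be illustrated with a minimal sketch. The abstract does not give formulas or the software used, so the code below assumes the standard textbook definitions for discrete features: information gain IG = H(C) + H(F) - H(C, F), gain ratio IG / H(F), symmetrical uncertainty 2 IG / (H(C) + H(F)), and the Pearson χ2 statistic on the feature/class contingency table. The toy data (`presence`, `substrate`, `shade`) and the helper `rank_features` are hypothetical and are not taken from the study.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def information_gain(feature, target):
    """IG = H(F) + H(C) - H(F, C), i.e. the mutual information."""
    joint = list(zip(feature, target))
    return entropy(feature) + entropy(target) - entropy(joint)


def gain_ratio(feature, target):
    """Information gain normalised by the entropy of the feature."""
    h_f = entropy(feature)
    return information_gain(feature, target) / h_f if h_f > 0 else 0.0


def symmetrical_uncertainty(feature, target):
    """2 * IG / (H(F) + H(C)); ranges from 0 to 1."""
    denom = entropy(feature) + entropy(target)
    return 2.0 * information_gain(feature, target) / denom if denom > 0 else 0.0


def chi_square(feature, target):
    """Pearson chi-square statistic of the feature/class contingency table."""
    n = len(feature)
    f_counts, t_counts = Counter(feature), Counter(target)
    observed = Counter(zip(feature, target))
    stat = 0.0
    for f, fc in f_counts.items():
        for t, tc in t_counts.items():
            expected = fc * tc / n
            stat += (observed[(f, t)] - expected) ** 2 / expected
    return stat


def rank_features(features, target, score):
    """Return feature names sorted from highest to lowest score."""
    return sorted(features, key=lambda name: score(features[name], target), reverse=True)


# Hypothetical toy data: 1 = species present, 0 = absent, at eight sites.
presence = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "shade": ["yes", "no", "yes", "no", "yes", "no", "yes", "no"],
    "substrate": ["gravel"] * 4 + ["mud"] * 4,
}

for score in (chi_square, information_gain, gain_ratio, symmetrical_uncertainty):
    print(score.__name__, rank_features(features, presence, score))
```

In this toy case `substrate` separates presence from absence perfectly while `shade` is independent of it, so all four scores rank `substrate` first. On real ecological data the measures can disagree, which is one reason to compare them; continuous predictors would also need to be discretised before these scores apply.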
Year: 2011
Volume: 6
Issue: 5
Pages: 309-315
Keywords: Data preprocessing; Species prediction; Performance comparison; Prediction accuracy
Authors: TIRELLI T.; PESSANI D.
Use this identifier to cite or link to this document: https://hdl.handle.net/2318/101347