In the field of Machine Learning, one of the most common and discussed questions is how to choose an adequate number of data observations, in order to train our models satisfactorily. In other words, find what is the right amount of data needed to create a model, that is neither underfitted nor overfitted, but instead is able to achieve a reasonable generalization ability. The problem grows in importance when we consider Genetic Programming, where fitness evaluation is often rather slow. Therefore, finding the minimum amount of data that enables us to discover the solution to a given problem could bring significant benefits. Using the notion of entropy in a dataset, we seek to understand the information gain obtainable from each additional data point. We then look for the smallest percentage of data that corresponds to enough information to yield satisfactory results. We present, as a first step, an example derived from the state of art. Then, we question a relevant part of our procedure and introduce two case studies to experimentally validate our theoretical hypothesis.

Reducing the Number of Training Cases in Genetic Programming

Giacomo Zoppi
;
Mario Giacobini
2022-01-01

Abstract

In the field of Machine Learning, one of the most common and discussed questions is how to choose an adequate number of data observations, in order to train our models satisfactorily. In other words, find what is the right amount of data needed to create a model, that is neither underfitted nor overfitted, but instead is able to achieve a reasonable generalization ability. The problem grows in importance when we consider Genetic Programming, where fitness evaluation is often rather slow. Therefore, finding the minimum amount of data that enables us to discover the solution to a given problem could bring significant benefits. Using the notion of entropy in a dataset, we seek to understand the information gain obtainable from each additional data point. We then look for the smallest percentage of data that corresponds to enough information to yield satisfactory results. We present, as a first step, an example derived from the state of art. Then, we question a relevant part of our procedure and introduce two case studies to experimentally validate our theoretical hypothesis.
2022
IEEE World Congress on Computational Intelligence - Congress on Evolutionary Computation
Padua, Italy
18-23 July 2022
2022 IEEE Congress on Evolutionary Computation, CEC 2022 - Conference Proceedings
Institute of Electrical and Electronics Engineers Inc.
1
8
978-1-6654-6708-7
https://ieeexplore.ieee.org/document/9870327
Genetic Programming, Entropy
Giacomo Zoppi, Leonardo Vanneschi, Mario Giacobini
File in questo prodotto:
File Dimensione Formato  
Preprint_CEC_Reducing.pdf

Accesso aperto

Tipo di file: PREPRINT (PRIMA BOZZA)
Dimensione 497.76 kB
Formato Adobe PDF
497.76 kB Adobe PDF Visualizza/Apri
2385.pdf

Accesso riservato

Tipo di file: PDF EDITORIALE
Dimensione 1.02 MB
Formato Adobe PDF
1.02 MB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2318/1875940
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? 0
social impact