
Reducing the Number of Training Cases in Genetic Programming

Giacomo Zoppi; Mario Giacobini
2022-01-01

Abstract

In the field of Machine Learning, one of the most commonly discussed questions is how to choose an adequate number of data observations in order to train our models satisfactorily; in other words, how to find the right amount of data needed to create a model that is neither underfitted nor overfitted, but instead achieves a reasonable generalization ability. The problem grows in importance in Genetic Programming, where fitness evaluation is often rather slow. Therefore, finding the minimum amount of data that still enables us to discover the solution to a given problem could bring significant benefits. Using the notion of entropy of a dataset, we seek to quantify the information gain obtainable from each additional data point, and we then look for the smallest percentage of data that carries enough information to yield satisfactory results. We first present an example derived from the state of the art; we then question a relevant part of our procedure and introduce two case studies to experimentally validate our theoretical hypothesis.
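The abstract's core idea (estimate the entropy of a dataset and keep the smallest fraction of training cases that preserves most of that information) can be sketched as follows. This is an illustrative sketch, not the procedure used in the paper: the histogram-based entropy estimator, the function names, the candidate fractions, and the 5% relative tolerance are all assumptions of ours.

```python
import math
import random
from collections import Counter

def empirical_entropy(values, bins=10):
    """Shannon entropy (in bits) of a numeric sample, estimated by histogram binning."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # degenerate sample -> single bin
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def smallest_sufficient_fraction(targets,
                                 fractions=(0.1, 0.2, 0.3, 0.4, 0.5,
                                            0.6, 0.7, 0.8, 0.9, 1.0),
                                 tol=0.05):
    """Smallest fraction of the data whose subset entropy is within a relative
    tolerance `tol` of the full dataset's entropy."""
    full = empirical_entropy(targets)
    shuffled = random.sample(list(targets), len(targets))  # random subset order
    for f in fractions:
        subset = shuffled[: max(2, int(f * len(targets)))]
        if full == 0 or abs(empirical_entropy(subset) - full) <= tol * full:
            return f
    return 1.0
```

In GP terms, training on only the returned fraction of cases would shrink each fitness evaluation proportionally, which is where the speed-up discussed in the paper would come from.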
2022
IEEE World Congress on Computational Intelligence - Congress on Evolutionary Computation
Padua, Italy
18-23 July 2022
2022 IEEE Congress on Evolutionary Computation, CEC 2022 - Conference Proceedings
Institute of Electrical and Electronics Engineers Inc.
pp. 1-8
978-1-6654-6708-7
https://ieeexplore.ieee.org/document/9870327
Genetic Programming, Entropy
Giacomo Zoppi, Leonardo Vanneschi, Mario Giacobini
Files in this item:

Preprint_CEC_Reducing.pdf
Open access
File type: PREPRINT (FIRST DRAFT)
Size: 497.76 kB
Format: Adobe PDF
2385.pdf
Restricted access
File type: PUBLISHER'S PDF
Size: 1.02 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this item: https://hdl.handle.net/2318/1875940
Citations
  • Scopus: 0