PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep Learning Clusters

Isabelly Rocha; Nathaniel Morris; Lydia Y. Chen; Pascal Felber; Robert Birke; Valerio Schiavoni

Abstract

DNN learning jobs are common in today's clusters due to the advances in AI-driven services such as machine translation and image recognition. The most critical phase of these jobs for model performance and learning cost is the tuning of hyperparameters. Existing approaches rely on techniques such as early-stopping criteria to reduce the impact of tuning on learning cost. However, these strategies do not consider the impact that certain hyperparameters and system parameters have on training time. This paper presents PipeTune, a framework for DNN learning jobs that addresses the trade-offs between these two types of parameters. PipeTune takes advantage of the high parallelism and recurring characteristics of such jobs to minimize the learning cost via pipelined simultaneous tuning of both hyper and system parameters. Our experimental evaluation using three different types of workloads indicates that PipeTune achieves up to a 22.6% reduction in tuning time and up to a 1.7x speedup in training time. PipeTune not only improves performance but also lowers energy consumption by up to 29%.
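
The abstract does not spell out the tuning loop, so the short Python sketch below illustrates one possible reading of "pipelined simultaneous tuning": while one hyperparameter configuration trains, the system-parameter tuning stage for the next configuration already runs. This is only a minimal sketch under assumed placeholders; the grids, train_and_score, and profile_system_params are hypothetical names and do not reflect PipeTune's actual API or implementation.

# Hypothetical illustration of overlapping hyper- and system-parameter tuning stages.
from concurrent.futures import ThreadPoolExecutor
import random

HYPER_GRID = [{"lr": lr, "batch_size": bs} for lr in (0.1, 0.01) for bs in (32, 128)]
SYSTEM_GRID = [{"num_workers": w, "cpu_cores": c} for w in (2, 4) for c in (4, 8)]

def train_and_score(hyper, system):
    # Stand-in for a (shortened) training run; returns a mock accuracy.
    random.seed(str((hyper, system)))
    return random.random()

def profile_system_params(hyper):
    # Stand-in for the system-parameter tuning stage (e.g. choosing worker/core counts).
    return max(SYSTEM_GRID, key=lambda sys_cfg: train_and_score(hyper, sys_cfg))

best = None
with ThreadPoolExecutor(max_workers=1) as pool:
    # Pipeline overlap: system-parameter tuning for config i+1 runs while config i trains.
    next_sys = pool.submit(profile_system_params, HYPER_GRID[0])
    for i, hyper in enumerate(HYPER_GRID):
        sys_cfg = next_sys.result()
        if i + 1 < len(HYPER_GRID):
            next_sys = pool.submit(profile_system_params, HYPER_GRID[i + 1])
        score = train_and_score(hyper, sys_cfg)
        if best is None or score > best[0]:
            best = (score, hyper, sys_cfg)

print("best mock accuracy %.3f with hyper=%s system=%s" % best)

In a real cluster the two stages would run as separate jobs or processes rather than threads; the sketch only shows how the stages can be interleaved so that neither tuning step sits idle.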
Year: 2020
Conference: The 21st International Middleware Conference, Virtual, 7-11 December 2020
Published in: Proceedings of the 21st International Middleware Conference, Association for Computing Machinery, pp. 89-104
ISBN: 9781450381536
Keywords: Parameter tuning; Deep Neural Networks training; accuracy-time trade-off

Use this identifier to cite or link to this item: https://hdl.handle.net/2318/1891345
Citations: Scopus 7; Web of Science 7