
Edge AI on Constrained IoT Devices: Quantization Strategies for Model Optimization

Shabir M. Y.; Torta G.; Damiani F.
2024-01-01

Abstract

In the field of deep learning (DL), the deployment of complex Neural Network (NN) models on memory-constrained devices presents a significant challenge. TinyML focuses on optimizing DL models for such environments, where computational and storage resources are limited. A key aspect of this optimization is reducing model size without unduly compromising performance. We investigate the efficacy of various quantization techniques in optimizing DL models for deployment on memory-constrained devices. To understand the memory requirements of standard DL models, we conducted a comprehensive literature review and identified quantization as a potent approach to model size reduction. Our study targets popular NN architectures, namely ResNetV1 and V2 and MobileNetV1 and V2, and introduces a custom-designed model, examining their suitability for TinyML constraints. Using the CIFAR-10 and MNIST datasets, we assess the impact of four distinct quantization techniques on model size and accuracy: Dynamic Range Quantization, Full Integer Quantization, Float16 Quantization, and Integer 16×8 Quantization. Our aim is to contribute valuable insights into model optimization for efficient deployment in resource-limited environments.
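The four techniques differ in how weights and activations are represented at inference time. The core idea behind dynamic range quantization, for instance, is mapping float32 weights to int8 with a per-tensor scale, cutting weight storage by 4×. The following is a minimal NumPy sketch of that mapping; it is an illustration of the general technique, not code from the paper:

```python
import numpy as np

def quantize_dynamic_range(w):
    """Symmetric per-tensor int8 quantization of a float32 weight tensor,
    the weight-side transformation used in dynamic range quantization."""
    scale = np.max(np.abs(w)) / 127.0  # map [-max|w|, max|w|] onto [-127, 127]
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float32 tensor from int8 values and a scale."""
    return q.astype(np.float32) * scale

# Example: 4x storage reduction (float32 -> int8) with bounded rounding error
w = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_dynamic_range(w)
w_hat = dequantize(q, scale)
max_err = float(np.max(np.abs(w - w_hat)))  # at most scale / 2 per element
```

Full integer quantization extends the same idea to activations (which requires a calibration dataset), while Float16 halves storage with no integer arithmetic, and 16×8 keeps activations at 16 bits for accuracy-sensitive models.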
Year: 2024
Conference: Intelligent Systems Conference, IntelliSys 2024
Language: nld
Series: Lecture Notes in Networks and Systems, vol. 1066
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 556-574
ISBN: 9783031664274; 9783031664281
URL: https://link.springer.com/chapter/10.1007/978-3-031-66428-1_35
Keywords: AI; Deep learning; Machine learning; Memory constrained devices; Optimization; Quantization
Authors: Shabir M.Y.; Torta G.; Damiani F.
Files in this item:

intellisys24.pdf
Open Access from 02/08/2025
File type: POSTPRINT (author's final version)
Size: 353.18 kB
Format: Adobe PDF
Shabir-et-al-Intellisys24-editorial.pdf
Restricted access
Description: Main article (conference)
File type: PUBLISHER'S PDF
Size: 862.76 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/2067155
Citations:
  • PMC: ND
  • Scopus: 8
  • Web of Science: 5