
TinyML model compression: A comparative study of pruning and quantization on selected standard and custom neural networks

Shabir M. Y.; Torta G.; Damiani F.
2025-01-01

Abstract

In Machine Learning (ML), the deployment of complex Neural Network (NN) models on memory-constrained Internet of Things (IoT) devices presents a significant challenge. Tiny Machine Learning (TinyML) focuses on optimizing NN models for such environments, where computational and storage resources are limited. A major aspect of this optimization involves reducing model size without substantially compromising accuracy. We conducted a systematic literature review to identify pruning and quantization techniques suitable for the optimization of NN models. In addition, this study investigates the efficiency of pruning and 8-bit integer (INT8) quantization in optimizing NN models for deployment on memory-constrained devices. The study evaluates widely used NN architectures such as ResNet50/101, VGG16, and MobileNet, alongside a custom-designed model, using the CIFAR-100, CIFAR-10, MNIST, and Fashion-MNIST datasets. The results show that combining pruning with INT8 quantization reduced the size of MobileNet by 77.01% and the custom model by 94.38%. Notably, the custom model achieved improved accuracy, while MobileNet retained competitive accuracy with minimal loss on CIFAR-100. The main contribution of this work lies in systematically analyzing and comparing pruning, INT8 quantization, and hybrid optimization methods across multiple architectures and datasets, with performance evaluated in terms of recall, latency, and memory requirements before and after optimization. Overall, pruning and INT8 quantization reduced model size and inference time while preserving accuracy, highlighting practical approaches for efficient TinyML deployment in real-world IoT applications.
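The two optimizations combined in the abstract can be illustrated with a toy sketch: magnitude pruning zeroes the smallest-magnitude weights, and symmetric per-tensor INT8 quantization maps the survivors to 8-bit integers with a single scale factor. This is a minimal, self-contained illustration of the general techniques, not the authors' implementation; the weight values and the `sparsity` parameter are made up for demonstration.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest |w|."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

def quantize_int8(weights):
    """Symmetric INT8 quantization: w ≈ scale * q with q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

# Toy weight tensor (flattened); real models have millions of weights.
weights = [0.02, -1.3, 0.7, -0.04, 2.1, -0.9, 0.05, 1.6]
pruned = prune_by_magnitude(weights, sparsity=0.5)   # half the weights -> 0
q, scale = quantize_int8(pruned)                     # 8-bit codes + scale
dequant = [scale * v for v in q]                     # approximate recovery
```

Pruning makes the tensor sparse (compressible), and quantization shrinks each remaining weight from 32-bit float to 8-bit integer — the combination is what yields the large size reductions (77–94%) the abstract reports.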
Year: 2025
Volume: 88
Issue: 4
Pages: 1-21
https://ieeexplore.ieee.org/document/11154242
Edge Computing; IoT; Neural Network Optimization; Pruning; Quantization; TinyML
Files in this record:

File: Shabir-et-al-SN-TelSys-2025.pdf
Access: Open access
Description: Article
File type: Publisher's PDF
Size: 1.98 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/2117410
Citations
  • PMC: n/a
  • Scopus: 1
  • Web of Science: 0