Age of Information for Machine Learning Tasks With Mobile Edge Computing Offloading
Castagno, Paolo; Sereno, Matteo
2025-01-01
Abstract
We investigate the minimization of the age of information (AoI) of an AI-powered application requiring timely processing of data generated by a multitude of users. We consider that sequences of inference tasks generated at individual terminals can either be processed locally with a tiny machine learning (ML) model or be offloaded to a more powerful ML model residing on an edge computing facility shared by all users. Since the local ML model is less powerful, its inferences may have low confidence. When this happens, the user is forced to repeat the inference with the more powerful edge ML model. The choice between local processing and offloading follows a randomized-alpha policy, where the local ML model, while less powerful, offers the advantage of alleviating congestion at the edge server. The AoI model follows the frameworks presented in the literature for multiple sources sharing the same queue. Local processing, instead, is modeled as a dedicated single-server queue, but we account for the imperfections of the tiny ML model by including a failure probability at the local server. Tasks that are processed locally but fail to achieve a minimum confidence level are offloaded to the edge server, resulting in a longer overall processing time. We derive a queueing model of the entire system, extending existing investigations with an entirely new contribution. Our results show the trade-offs between processing latency, inference accuracy, and system congestion, highlighting the importance of optimizing task allocation strategies.
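The system described in the abstract lends itself to a quick numerical illustration. The following is a minimal simulation sketch, not code or parameters from the paper: each terminal generates tasks as a Poisson process, a task is processed locally with probability alpha or offloaded to the shared edge server otherwise, a locally processed task fails to reach the confidence threshold with probability `P_FAIL` and is then re-submitted to the edge, and the per-user time-average AoI is estimated from the resulting delivery times. All parameter names and values (`N_USERS`, `LAMBDA`, `MU_LOCAL`, `MU_EDGE`, `P_FAIL`), the exponential service-time assumption, and the reading of alpha as the local-processing probability are illustrative assumptions.

```python
import heapq
import random

random.seed(1)

# Toy parameters (illustrative assumptions, not values from the paper).
N_USERS  = 5         # terminals generating inference tasks
LAMBDA   = 0.3       # per-user task generation rate (Poisson)
MU_LOCAL = 1.0       # service rate of the tiny local ML model
MU_EDGE  = 2.0       # service rate of the shared edge ML model
P_FAIL   = 0.2       # probability a local inference has low confidence
T_END    = 50_000.0  # simulated time horizon

GEN, EDGE_ARRIVAL = 0, 1

def simulate(alpha):
    """Average (over users) time-average AoI when a task is processed locally w.p. alpha."""
    local_free = [0.0] * N_USERS               # instant each dedicated local server becomes idle
    edge_free = 0.0                            # instant the shared edge server becomes idle
    deliveries = [[] for _ in range(N_USERS)]  # (completion time, generation time) per user

    events = []                                # (time, kind, user, generation time)
    for u in range(N_USERS):
        t = random.expovariate(LAMBDA)
        while t < T_END:
            heapq.heappush(events, (t, GEN, u, t))
            t += random.expovariate(LAMBDA)

    while events:
        t, kind, u, t_gen = heapq.heappop(events)
        if kind == GEN and random.random() < alpha:
            # Local processing on the user's dedicated single-server FCFS queue.
            done = max(t, local_free[u]) + random.expovariate(MU_LOCAL)
            local_free[u] = done
            if random.random() < P_FAIL:
                # Low-confidence result: the task is re-submitted to the edge server.
                heapq.heappush(events, (done, EDGE_ARRIVAL, u, t_gen))
            else:
                deliveries[u].append((done, t_gen))
        else:
            # Shared edge FCFS queue (direct offload, or retry after a local failure).
            done = max(t, edge_free) + random.expovariate(MU_EDGE)
            edge_free = done
            deliveries[u].append((done, t_gen))

    # Time-average AoI from each user's sawtooth sample path.
    aoi = []
    for recs in deliveries:
        last_gen = last_done = area = 0.0
        for done, t_gen in sorted(recs):
            if t_gen > last_gen:  # only updates fresher than the last delivered one reset the age
                area += (done - last_done) * ((last_done - last_gen) + (done - last_gen)) / 2
                last_gen, last_done = t_gen, done
        aoi.append(area / last_done if last_done > 0 else float("inf"))
    return sum(aoi) / N_USERS

if __name__ == "__main__":
    for a in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"alpha = {a:.2f}  ->  mean AoI ~= {simulate(a):6.2f}")
```

Sweeping alpha in this toy setting gives a rough feel for the trade-off the abstract highlights: a small alpha pushes all traffic onto the shared edge queue, while a large alpha exposes more tasks to low-confidence local inferences that must be reprocessed at the edge.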
| File | Size | Format |
|---|---|---|
| AoI4AI.pdf (open access with embargo until 01/01/2028) | 465.35 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.