Toward sensible learned image compression. Closing the gap with standard codecs
PRESTA, ALBERTO
2025-05-08
Abstract
With the advent of the deep learning age, a novel paradigm for image compression known as Learned Image Compression (LIC) has emerged. LIC models are designed around an autoencoder: the encoder projects the image into a latent representation that undergoes quantization and entropy coding, and the decoder then maps it back into the pixel domain, yielding the final reconstructed image. Although LIC models have surpassed or equaled standardized codecs in compression efficiency, some limitations still persist that jeopardize their practical applicability. First, a single LIC model is inherently unsuited to serving a range of different bitrates, because attaining a different rate-distortion (RD) tradeoff requires training and deploying a separate model. Moreover, such models typically lack any progressive coding ability that would allow the bitstream quality to improve as more bits are received, so a new bitstream must be encoded whenever a quality change is sought. Furthermore, the performance of these models may be suboptimal when compressing content from domains that are far from the training dataset. These limitations of LIC models, which do not affect standardized codecs, motivate the research work described in this thesis. As a preliminary investigation, we introduce a parameter-free and differentiable mathematical formulation of the latent probability distribution. We also explore a novel context estimation scheme based on a local graph to optimize the allocation of bits. Regarding the variable-rate coding problem, we propose two distinct approaches. The first introduces a pluggable quantization layer called STanH, designed around a weighted aggregation of parametrized hyperbolic tangents, to tailor bitstreams of different quality from a pretrained anchor model. The second targets transformer-based architectures and is built around LoRA adapters for adaptable output.
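The abstract does not spell out the STanH layer, but the core idea of a "weighted aggregation of parametrized hyperbolic tangents" can be illustrated as a soft staircase: each tanh term contributes one smooth quantization step, and the step centers, heights, and slope are learnable. The following is a minimal sketch under that assumption; the function name `stanh` and its parameters `centers`, `steps`, and `steepness` are illustrative, not the thesis's actual interface.

```python
import numpy as np

def stanh(x, centers, steps, steepness):
    """Soft staircase built from parametrized hyperbolic tangents.

    Each tanh contributes one smooth step of height steps[i]
    centered at centers[i]. Larger steepness values push the
    curve toward a hard quantizer, while the function remains
    differentiable everywhere, so it can be trained end-to-end.
    """
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)
    for c, w in zip(centers, steps):
        # 0.5 * (1 + tanh) ramps from 0 to 1 around center c
        y += 0.5 * w * (1.0 + np.tanh(steepness * (x - c)))
    return y

# Three unit steps at -1, 0, 1: with a steep slope, inputs between
# step centers land near the discrete levels 0, 1, 2, 3.
levels = stanh([-2.0, -0.5, 0.5, 2.0], [-1.0, 0.0, 1.0], [1.0, 1.0, 1.0], 50.0)
```

A gentle slope during training keeps gradients informative; sharpening the slope (or re-tuning centers and heights) changes the effective quantization granularity, which is how a single pretrained model could be steered toward different quality bitstreams.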
Namely, we show that plugging a LoRA module into the multilayer perceptron of the visual attention blocks enables a pre-trained model to yield bitstreams at different RD points. Concerning the progressive compression property, we propose a novel architecture that maps the image into two latent representations, referred to as base and top: following a variance-based masking scheme, the latter is decomposed into complementary parts that can be delivered separately to the decoder, achieving progressive coding. Regarding the domain adaptation problem, we propose to tackle it by plugging domain-specific adapter modules into the decoder, improving the RD efficiency of image compression for a target domain without the need for retraining. We also successfully apply the same paradigm to video compression, showing that improvements to intra-coded frames propagate to predicted frames as well, improving the RD compression efficiency of the whole video sequence. We believe that the insights in this thesis will contribute to advancing the field of LIC and to bridging the gap with standardized codecs towards practical applications.
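The general mechanics of a LoRA adapter on a linear layer, as used in the variable-rate and domain-adaptation approaches above, can be sketched as follows. This is a generic low-rank-update sketch, not the thesis's implementation: the function name `lora_linear`, the shapes, and the `alpha` scaling convention are assumptions borrowed from common LoRA usage. The frozen weight W stays shared, and only the small factors A and B would be swapped per RD point or per target domain.

```python
import numpy as np

def lora_linear(x, W, A, B, alpha=1.0):
    """Linear layer with a LoRA-style low-rank update.

    W: (d_out, d_in) frozen pre-trained weight.
    A: (r, d_in) and B: (d_out, r) trainable low-rank factors, r << d_in.
    The effective weight is W + (alpha / r) * B @ A, but the update is
    applied in factored form, so each adapter stores only
    r * (d_in + d_out) extra parameters.
    """
    r = A.shape[0]
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T
```

Initializing B to zero makes the adapted layer start out identical to the frozen one, so adaptation begins from the anchor model's behavior and only gradually departs from it as A and B are trained.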