Improving Generative Data Augmentation with Prior-Knowledge for Dysplasia Grading of Colorectal Polyps
Craparotta R.;Ivanov D.;Barbano C. A.;Tartaglione E.;Gambella A.;Cavallo L.;Cassoni P.;Bertero L.;Grangetto M.
2026-01-01
Abstract
Generative models such as GANs and diffusion models have achieved state-of-the-art performance in image synthesis across various domains, including the medical field. Many works have shown that deep models (e.g., classifiers) can be trained entirely on synthesized data, and that augmenting real data with synthesized samples can improve discriminative power. However, training realistic generative models requires large datasets. This is an issue in domains such as medical imaging, where datasets are typically smaller and more costly to annotate. A small sample size poses a fundamental problem for generative data augmentation, because in a low-data regime a generative model will not necessarily approximate the real data distribution better than a simple classifier. To tackle this issue, we introduce prior information into the generative process to compensate for the lack of data, unlocking generative augmentation in low-data settings. In this work, we focus on computational pathology, specifically on the sensitive task of classifying colorectal polyp dysplasia. To guide the generative process, we draw on medical knowledge of tissue morphology taken from the World Health Organization (WHO) guidelines for the classification of dysplasia. By incorporating our proposed generative pipeline into a contrastive learning framework, we achieve state-of-the-art results in the detection of high-grade dysplasia on the UnitoPatho dataset.

| File | Size | Format |
|---|---|---|
| ICIAP_2025___Histopatho_Gen_Aug.pdf (open access; file type: postprint, author's final version) | 5.29 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.



