Generative AI pipeline with model-guided filtering for sim-to-real transfer in surgical imaging
Pietro Leoncini (first author); Francesco Marzola; Matteo Pescio; Lorenzo Revello; Federica Barontini; Giovanni Distefano; Alberto Arezzo
All authors are listed as members of the Collaboration Group.
2026-01-01
Abstract
Automating surgical suturing requires reliable computer vision systems, yet annotated real surgical datasets remain scarce, costly, and difficult to obtain. To address this challenge, we introduce a data-centric pipeline that combines synthetic data generation, generative realism boosting, and model-guided filtering to improve sim-to-real transfer without relying on real annotated surgical footage. Synthetic images were created in Unity with both type-based and part-based instrument annotations, then enhanced using CycleGAN-TURBO for unpaired image-to-image translation and Real-ESRGAN for high-resolution restoration. A YOLO-based selector model, trained on synthetic images, assessed the quality of the generatively enhanced data through Dice similarity scoring and discarded samples with distortions or misalignments. In the part-based configuration, evaluated on a real test set, the baseline model trained solely on synthetic images achieved a Dice score of 0.17, while combining synthetic with unfiltered enhanced data reached 0.24. Filtering proved decisive: combining the accepted enhanced images with synthetic images (the hybrid curated dataset) raised the score to 0.44. Fine-tuning strategies yielded only marginal gains, indicating that the improvements were driven primarily by data quality rather than training variations. In the type-based setup, the hybrid curated dataset achieved a mean Dice score of 0.65, a substantial improvement over previous fully synthetic baselines (0.384), without requiring real training annotations. These results demonstrate that curation of generative outputs is critical for sim-to-real transfer in surgical vision. By uniting synthetic generation, generative realism, and automated filtering, the pipeline enables scalable, low-cost dataset creation; resources are provided on GitHub as a reproducible foundation for developing reliable perception systems and advancing autonomy in surgical robotics.

| File | Access | File type | Size | Format |
|---|---|---|---|---|
| CMIG-Paper.pdf | Open access | Publisher's PDF | 2.77 MB | Adobe PDF |
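The model-guided filtering step described in the abstract can be illustrated with a minimal sketch. It assumes that the selector's predicted mask on each enhanced image is compared, via the Dice coefficient, with the annotation of the synthetic image it was generated from, and that samples scoring below a cut-off are discarded. The abstract does not state the comparison target or the cut-off, so both are assumptions here; the function names `dice_score` and `filter_enhanced` and the threshold of 0.5 are hypothetical, and the selector's predictions are passed in as precomputed binary masks rather than produced by a real YOLO call.

```python
import numpy as np


def dice_score(pred, target):
    """Dice coefficient 2|A∩B| / (|A| + |B|) between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    total = pred.sum() + target.sum()
    if total == 0:
        # Both masks empty: treat as perfect agreement (a convention, assumed here).
        return 1.0
    intersection = np.logical_and(pred, target).sum()
    return float(2.0 * intersection / total)


def filter_enhanced(samples, threshold=0.5):
    """Split samples into accepted and rejected by Dice score.

    samples: iterable of (name, predicted_mask, reference_mask).
    threshold: hypothetical cut-off; the value used in the paper is not given.
    """
    accepted, rejected = [], []
    for name, pred_mask, ref_mask in samples:
        score = dice_score(pred_mask, ref_mask)
        bucket = accepted if score >= threshold else rejected
        bucket.append((name, score))
    return accepted, rejected


# Toy example: a 2x2 instrument region in a 4x4 image.
reference = np.zeros((4, 4), dtype=bool)
reference[1:3, 1:3] = True

aligned = reference.copy()           # enhancement preserved the geometry
shifted = np.zeros((4, 4), dtype=bool)
shifted[2:4, 2:4] = True             # enhancement displaced the instrument

accepted, rejected = filter_enhanced([
    ("aligned", aligned, reference),
    ("shifted", shifted, reference),
])
print("accepted:", accepted)
print("rejected:", rejected)
```

In this toy case the displaced mask overlaps the reference in a single pixel, so its Dice score is 2·1/(4+4) = 0.25 and it is rejected under the assumed threshold, while the aligned mask scores 1.0 and is kept.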
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.



