Automated segmentation of historical archaeological photographs: A CNN-based approach applied to the Tachara palace of Persepolis

Andreucci, Domenico

doi:10.19272/202503501002

This article presents a methodological study focused on the application of convolutional neural networks (CNNs) for the automatic segmentation of historical archaeological photographs, exploring ways to mitigate long-standing interpretative challenges related to ambiguous or poorly preserved visual elements. The research centers on the Tachara palace of Persepolis (Iran), chosen as a case study due to its rich architectural and iconographic programme, documented through an extensive corpus of historical and modern photographs from various archival collections, including those of the IsMEO Italian mission (1964-1979). A custom dataset, named PeRSeg14 (Persepolis Restoration activities Segmentation), was developed through manual annotation of 14 visual classes representing significant architectural, decorative, and contextual features. The classes were identified through direct visual analysis of the photographic corpus and established terminologies from the Art & Architecture Thesaurus (Getty) and authoritative Persepolis scholarship. The YOLOv8n-seg model was trained and evaluated following a reproducible pipeline using open-source tools. The quantitative evaluation showed moderate but promising results (Precision=0.649, Recall=0.425, mAP@0.5=0.445, mAP@0.5:0.95=0.264), with higher precision and mAP for recurrent and clearly defined architectural elements, while highlighting significant difficulties in segmenting visually ambiguous or underrepresented categories, such as Scaffolding Poles or Human figures. The qualitative analysis confirmed the model’s capacity to produce semantically coherent segmentation masks for both validation and test images, demonstrating some generalisation to external photographic archives, though with notable limitations due to visual variability and preservation states.
The study critically addresses methodological issues related to dataset imbalance, photographic degradation, and iconographic ambiguity, underscoring the essential role of human interpretation and post-processing validation. Despite the intrinsic limitations of the lightweight YOLOv8n-seg architecture, constrained by the complexity inherent in archaeological data, this research offers a replicable analytical framework and provides a structured, richly annotated dataset designed to facilitate future deep learning applications in archaeology. The PeRSeg project thus shows the potential of CNNs as analytical tools for archaeological documentation, proposing both a controlled visual vocabulary and a defined annotation protocol, designed to address the enduring challenges posed by fragmentation and limited accessibility within historical photographic archives.
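The mAP figures quoted in the abstract rest on the intersection-over-union (IoU) between a predicted mask and its annotated counterpart: a prediction counts as a match at a given threshold only if its IoU reaches that threshold, and the 0.5:0.95 variant averages over thresholds from 0.50 to 0.95 in steps of 0.05. The following is a minimal illustrative sketch of that matching criterion, not the evaluation code used in the study; the function names and the toy masks are hypothetical and chosen only for this example.

```python
from typing import List

# A binary mask as rows of 0/1 values (hypothetical toy representation).
Mask = List[List[int]]


def mask_iou(pred: Mask, truth: Mask) -> float:
    """Intersection-over-union of two binary masks of equal shape."""
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks have no defined overlap; treat as 0.0 here.
    return inter / union if union else 0.0


def iou_thresholds() -> List[float]:
    """The ten thresholds 0.50, 0.55, ..., 0.95 averaged in mAP at 0.5:0.95."""
    return [round(0.5 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(iou: float) -> List[float]:
    """Thresholds at which a prediction with this IoU counts as a match."""
    return [t for t in iou_thresholds() if iou >= t]


# Toy example: a 2x3 annotated region and a prediction shifted by one column.
truth = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
pred = [
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

iou = mask_iou(pred, truth)
print(iou, thresholds_passed(iou))
```

In this toy case the shifted prediction overlaps the annotation on 4 of 8 covered pixels, so it counts as a match only at the loosest threshold. This illustrates why a score averaged over stricter thresholds is typically lower than the score at 0.5 alone, particularly for thin or irregular shapes where small boundary errors remove a large share of the overlap.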