On the Minimum Dataset Requirements for Fine-Tuning an Object Detector for Arable Crop Plant Counting: A Case Study on Maize Seedlings

Bumbaca S.; Borgogno-Mondino E.
2025-01-01

Abstract

Object detection is essential for precision agriculture applications such as automated plant counting, but the minimum dataset requirements for effective model deployment remain poorly understood for arable crop seedling detection on orthomosaics. This study investigated how much annotated data is required to achieve standard counting accuracy (R² = 0.85) for maize seedlings across different object detection approaches. We systematically evaluated traditional deep learning models requiring many training examples (YOLOv5, YOLOv8, YOLO11, RT-DETR), newer approaches requiring few examples (CD-ViTO), and methods requiring zero labeled examples (OWLv2), using drone-captured orthomosaic RGB imagery. We also implemented a handcrafted computer graphics algorithm as a baseline. Models were tested with varying training sources (in-domain vs. out-of-distribution data), training dataset sizes (10–150 images), and annotation quality levels (10–100%). Our results demonstrate that no model trained on out-of-distribution data achieved acceptable performance, regardless of dataset size. In contrast, models trained on in-domain data reached the benchmark with as few as 60–130 annotated images, depending on architecture. Transformer-based models (RT-DETR) required significantly fewer samples (60) than CNN-based models (110–130), though they showed different tolerances to annotation quality reduction. Models maintained acceptable performance with only 65–90% of the original annotation quality. Despite recent advances, neither few-shot nor zero-shot approaches met minimum performance requirements for precision agriculture deployment. These findings provide practical guidance for developing maize seedling detection systems, demonstrating that successful deployment requires in-domain training data, with minimum dataset requirements varying by model architecture.
2025, Vol. 17, Issue 13, pp. 1–28
https://www.mdpi.com/2673-7418/5/3/31
Keywords: agricultural computer vision; annotation quality; dataset requirements; few-shot learning; maize seedling counting; object detection; orthomosaic imagery; precision agriculture; transformer models
Files in this product:
remotesensing-17-02190-v2_opt.pdf — Open access — Published PDF (Adobe PDF, 1.78 MB)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/2102514
Citations
  • Scopus: 2
  • Web of Science: 2