DNA-based identification of plants and the genomic nature of plant species differences

Dexter, Kyle G.
2026-01-01

Abstract

Telling species apart using DNA sequence data plays a key role in understanding, monitoring, and managing biodiversity. However, plant species discrimination is often difficult because plant species boundaries are complex. To inform future strategies for DNA-based identification of plants using the nuclear genome, and to gain fundamental insights into the genomic nature of differences between plant species, we conducted a large-scale analysis mining data from 151 studies. Of the 1713 multiple-sampled species evaluated, 1202 (70.2%) resolved as monophyletic. We then assessed the density of species-specific SNPs (SSSNPs) in the DNA sequence data: of the 462 species from 27 genera assessed in detail, the median density was 193 SSSNPs per Mb, and 412 species (89.2%) had at least one SSSNP. Randomly sub-sampling the SNP data showed that species discrimination reached an asymptote at around 3000 randomly selected SNPs. Finally, we resampled 6 target-capture datasets and showed that 1-9 pre-selected loci provided levels of species discrimination equivalent to those obtained with hundreds of nuclear loci. These findings provide an important quantitative assessment of the genomic nature of differences between plant species and lay foundations for developing enhanced approaches to high-resolution DNA-based plant species discrimination.
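The abstract reports SSSNP counts and a density per Mb without spelling out how they are computed. The sketch below shows one way such figures could be derived from a multiple sequence alignment. It assumes a strict definition that may differ from the study's: a site is species-specific for species S when every sampled individual of S with a called base (A, C, G or T) carries the same allele and no individual of any other species carries it. It also assumes that density is counted over the aligned length in bp. The function names, the toy alignment and both assumptions are hypothetical illustrations, not the authors' pipeline.

```python
from collections import defaultdict


def species_specific_snps(alignment, species_of):
    """Return {species: [site indices]} under an assumed SSSNP definition.

    Hypothetical definition: a site counts for species S if all called bases
    of S's individuals are one allele that no other species carries.
    Gaps and ambiguous bases are skipped (treated as missing data).
    """
    length = len(next(iter(alignment.values())))
    result = defaultdict(list)
    for i in range(length):
        # Collect the set of called alleles per species at this site.
        alleles = defaultdict(set)
        for sample, seq in alignment.items():
            base = seq[i].upper()
            if base in "ACGT":
                alleles[species_of[sample]].add(base)
        for sp, bases in alleles.items():
            if len(bases) != 1:
                continue  # polymorphic within the species: not fixed
            (allele,) = bases
            others = set().union(*(b for s, b in alleles.items() if s != sp))
            # Require data from at least one other species before calling it private.
            if others and allele not in others:
                result[sp].append(i)
    return dict(result)


def density_per_mb(n_sites, aligned_bp):
    """SSSNPs per Mb, assuming the denominator is aligned length in bp."""
    return n_sites / aligned_bp * 1_000_000


# Toy alignment (hypothetical data): three species, five individuals.
toy_alignment = {
    "a1": "ACGTAC",
    "a2": "ACGTAC",
    "b1": "ATGTAC",
    "b2": "ATGTTC",
    "c1": "ACGAAC",
}
toy_species = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C"}

sites = species_specific_snps(toy_alignment, toy_species)
print(sites)  # {'B': [1], 'C': [3]}
print(density_per_mb(len(sites["B"]), 6))
```

In this toy case species A has no private allele, which mirrors the abstract's observation that not every species (412 of 462) had at least one SSSNP. A real analysis would also need to handle minimum sample sizes per species and missing-data thresholds, which this sketch leaves out.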
2026
9
1
1
11
Huang, Wu; Li, De-Zhu; Antonelli, Alexandre; Bacon, Christine D.; Gao, Lian-Ming; Kidner, Catherine; Pennington, R. Toby; Soltis, Douglas E.; Soltis, ...
Files in this item:
Huang.etal.2026_DNAid.pdf (open access, 1.47 MB, Adobe PDF)

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/2142170