Genome Annotation
Sergio Lanteri; Lorenzo Barchi
2019-01-01
Abstract
Genome annotation makes it possible to identify the coding and non-coding regions of a genome, such as exons and introns, regulatory elements and repeats, as well as gene functions and locations. The newly developed eggplant genome sequence (see Chap. 7) was masked with RepeatMasker, combining homology-based and de novo approaches, and ~73% of the eggplant genome was found to consist of transposable elements (TEs). In total, 34,916 protein-coding genes were predicted, confirming that the diploid gene number in the Solanaceae is around 35,000, as previously reported for tomato (Solanum lycopersicum L.), potato (S. tuberosum L.) and pepper (Capsicum spp.). A total of 108,360 protein sequences from eggplant, tomato, pepper and potato were clustered into 22,337 gene families (excluding singletons) using OrthoMCL. Of these, 12,568 gene families (comprising 76,920 genes) were shared by all four Solanaceae crops, while 674 eggplant-specific clusters containing 1999 genes were identified. The high-quality eggplant genome sequence enables comparative genomic studies both within the species, to find variation across individuals for genetic association and linkage analyses, and between species, for evolutionary studies. Furthermore, it provides a key resource for understanding Solanaceae biology and a valuable tool for future breeding programmes. The genome was also surveyed for single-locus SSR (simple sequence repeat, or microsatellite) markers: nearly 133,000 perfect SSRs, a density of 125.5 SSRs/Mbp, and about 178,400 imperfect SSRs were identified. Using these data, a public dynamic microsatellite database was developed (www.eggplantmicrosatellite.org), which represents a one-stop resource for the global community of scientists and breeders.
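To make the idea of a "perfect SSR" and of an SSR density in SSRs/Mbp concrete, here is a minimal Python sketch. It scans a sequence left to right for uninterrupted tandem copies of a 1 to 6 bp motif and divides the hit count by the sequence length in megabase pairs. The chapter does not state which SSR-mining tool or thresholds were used. The helper names (`MIN_REPEATS`, `is_primitive`, `find_perfect_ssrs`, `ssr_density_per_mbp`) and the minimum copy numbers are hypothetical, chosen only for illustration. The sketch also ignores imperfect and compound SSRs and does not standardize motifs.

```python
"""Minimal sketch: find perfect SSRs (microsatellites) and report their density.

Hypothetical assumptions, not taken from the chapter: motifs of 1 to 6 bp,
the minimum copy numbers in MIN_REPEATS, no imperfect or compound SSRs,
and no motif standardization (each tract is reported in the phase where
the left-to-right scan first meets it).
"""

# Hypothetical thresholds: minimum number of tandem copies per motif length.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """True unless the motif is itself a repeat of a shorter unit (e.g. 'ATAT')."""
    k = len(motif)
    return all(
        motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0
    )


def find_perfect_ssrs(
    seq: str, min_repeats: dict[int, int] = MIN_REPEATS
) -> list[tuple[int, str, int]]:
    """Return (0-based start, motif, copy number) for each perfect SSR found."""
    seq = seq.upper()
    ssrs = []
    i, n = 0, len(seq)
    while i < n:
        found = False
        for k in sorted(min_repeats):  # try the shortest motifs first
            motif = seq[i:i + k]
            if len(motif) < k or "N" in motif or not is_primitive(motif):
                continue
            # Count uninterrupted tandem copies of the motif starting at i.
            copies = 1
            while seq[i + copies * k:i + (copies + 1) * k] == motif:
                copies += 1
            if copies >= min_repeats[k]:
                ssrs.append((i, motif, copies))
                i += copies * k  # skip past the whole repeat tract
                found = True
                break
        if not found:
            i += 1
    return ssrs


def ssr_density_per_mbp(n_ssrs: int, length_bp: int) -> float:
    """Number of SSRs per megabase pair of surveyed sequence."""
    return n_ssrs / (length_bp / 1_000_000)


if __name__ == "__main__":
    # Toy sequence with three planted SSRs: (A)12, (AT)7 and (CAG)5.
    toy = "GC" + "A" * 12 + "GGC" + "AT" * 7 + "CCT" + "CAG" * 5 + "T"
    hits = find_perfect_ssrs(toy)
    print(hits)  # → [(2, 'A', 12), (17, 'AT', 7), (34, 'CAG', 5)]
    print(f"{ssr_density_per_mbp(len(hits), len(toy)):.1f} SSRs/Mbp")  # → 60000.0 SSRs/Mbp
```

On a real assembly the sequence would be read chromosome by chromosome from FASTA, and the counts summed before computing the density. Taken at face value, the figures above (nearly 133,000 perfect SSRs at 125.5 SSRs/Mbp) imply roughly 133,000 / 125.5 ≈ 1,060 Mbp, or about 1.06 Gbp, of surveyed sequence.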