Multiple-breed genomic evaluation by principal component analysis in small size populations

Gaspa, G.; Jorjani, H.; Dimauro, C.; Cellesi, M.; Ajmone-Marsan, P.; Stella, A.; Macciotta, N.P.P.

doi:10.1017/S1751731114002973

This study investigated the effects of breed composition and predictor dimensionality on the accuracy of direct genomic values (DGV) in a multiple-breed (MB) cattle population. A total of 3559 bulls of three breeds were genotyped at 54 001 single nucleotide polymorphisms (SNPs): 2093 Holstein (H), 749 Brown Swiss (B) and 717 Simmental (S). DGV were calculated using a principal component (PC) approach in either single-breed (SB) or MB scenarios. For comparison, DGV were also computed with an SNPBLUP model that fitted all SNP genotypes simultaneously. Seven data sets were used: three with a single breed each, three with different pairs of breeds (HB, HS and BS), and one with all three breeds together (HBS). Editing was performed separately for each scenario. Reference populations differed in breed composition, whereas the validation bulls were the same for all scenarios. The number of SNPs retained after data editing ranged from 36 521 to 41 360. PCs were extracted from actual genotypes. The total number of retained PCs ranged from 4029 in Brown Swiss to 7284 in HBS, reducing the number of predictors by about 85% (range 82% to 89%). Three traits were considered: milk, fat and protein yield. Correlations between deregressed proofs and DGV were used to assess prediction accuracy in validation animals. In the SB scenarios, average DGV accuracy did not change substantially whether SNPBLUP or PC was used. Improvements in DGV accuracy were observed for some traits in Brown Swiss, but only when MB reference populations and the PC approach were used instead of SB-SNPBLUP (+10% HBS and +16% HB for milk yield; +3% HBS and +7% HB for protein yield). Apart from these cases, the PC and SNPBLUP models gave similar accuracies with MB reference populations. Random variation owing to sampling effects, or the size and composition of the reference population, may explain the difficulty in finding a defined pattern in the results.
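
As an illustration of the general idea of a PC-based prediction (not the authors' actual pipeline), the following minimal Python sketch extracts PCs from a centred genotype matrix, regresses phenotypes on the retained PC scores, and measures accuracy as the correlation between observed values and predicted DGV in validation animals. Everything specific here is an assumption made for the example: the function names `pc_dgv` and `accuracy`, the 99% explained-variance rule for retaining PCs, the ridge penalty, the choice to compute loadings from reference animals only, and the simulated genotypes and phenotypes.

```python
import numpy as np


def pc_dgv(geno_ref, y_ref, geno_val, var_explained=0.99, ridge=1.0):
    """Predict DGV for validation animals from PC scores of SNP genotypes.

    Hypothetical helper: the variance threshold and ridge penalty are
    illustrative assumptions, not values taken from the study.
    """
    # Centre both sets on the reference-population SNP means.
    mu = geno_ref.mean(axis=0)
    x_ref = geno_ref - mu
    x_val = geno_val - mu

    # PCA through SVD of the centred reference genotypes.
    _, s, vt = np.linalg.svd(x_ref, full_matrices=False)
    ratio = np.cumsum(s**2) / np.sum(s**2)

    # Keep the smallest number of PCs reaching the variance threshold.
    k = min(int(np.searchsorted(ratio, var_explained)) + 1, len(s))
    loadings = vt[:k].T

    # PC scores replace SNP genotypes as predictors.
    t_ref = x_ref @ loadings
    t_val = x_val @ loadings

    # Ridge regression of centred phenotypes on the PC scores.
    y_mean = y_ref.mean()
    lhs = t_ref.T @ t_ref + ridge * np.eye(k)
    beta = np.linalg.solve(lhs, t_ref.T @ (y_ref - y_mean))
    return y_mean + t_val @ beta, k


def accuracy(observed, dgv):
    """Pearson correlation between observed values (e.g. DRP) and DGV."""
    return float(np.corrcoef(observed, dgv)[0, 1])


# Simulated data, purely for illustration (no real genotypes involved).
rng = np.random.default_rng(0)
n_ref, n_val, n_snp = 200, 50, 500
geno = rng.integers(0, 3, size=(n_ref + n_val, n_snp)).astype(float)
effects = rng.normal(scale=0.1, size=n_snp)
pheno = geno @ effects + rng.normal(scale=0.5, size=n_ref + n_val)

dgv, n_pcs = pc_dgv(geno[:n_ref], pheno[:n_ref], geno[n_ref:])
print("retained PCs:", n_pcs, "of", n_snp, "SNP predictors")
print("validation accuracy:", round(accuracy(pheno[n_ref:], dgv), 3))
```

Working on PC scores shrinks the system to be solved from one equation per SNP to one per retained PC, which is the kind of predictor reduction the abstract reports.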