Use of different statistical models to predict direct genomic values for productive and functional traits in Italian Holsteins

Pintus, M. A.; Nicolazzi, E. L.; Van Kaam, J. B. C. H. M.; Biffani, S.; Stella, A.; Gaspa, Giustino; Dimauro, Corrado; Macciotta, Nicolò Pietro Paolo

doi:10.1111/j.1439-0388.2012.01019.x

The large number of markers relative to the number of available phenotypes is one of the main issues in genomic selection. In this work, principal component analysis (PCA) was used to reduce the number of predictors for calculating direct genomic breeding values (DGV) and genomic enhanced estimated breeding values (GEBV). Bulls of two cattle breeds in Italy (749 Brown and 479 Simmental) were genotyped with the 54K Illumina BeadChip. After data editing, 37,254 and 40,179 SNP were retained for Brown and Simmental, respectively. Principal component analysis carried out on the SNP genotype matrix extracted 2,257 and 3,596 new variables in the two breeds, respectively. Bulls were either sorted by birth year or randomly shuffled to create reference and prediction populations. The effect of principal components on de-regressed proofs in reference animals was estimated with a BLUP model. Results were compared with those obtained by using SNP genotypes as predictors with either BLUP or BayesA methods. Traits considered were milk, fat and protein yield, fat and protein percentage, somatic cell score, and udder score. GEBV were obtained for the prediction population by blending DGV and parent average (PA). No substantial differences in correlations between DGV and estimated breeding values (EBV) were observed among the three methods in the two breeds. The PCA-based approach showed the lowest prediction bias. The PCA method reduced the number of independent variables by about 90% when predicting DGV, with a large decrease in calculation time and no loss of accuracy.
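
The general idea of replacing SNP genotypes with principal component scores before a shrinkage regression can be illustrated with a minimal sketch on simulated data. Everything in it is an assumption made for illustration and not taken from the study: the simulated genotypes and phenotypes (a stand-in for de-regressed proofs), the 90% variance-retention rule for choosing the number of components, the shrinkage parameter `lam`, and the use of ridge regression as a stand-in for a BLUP model with a fixed variance ratio.

```python
import numpy as np

rng = np.random.default_rng(0)
n_ref, n_pred, n_snp = 200, 50, 1000  # hypothetical sizes, far smaller than in the study

# Simulated genotypes coded 0/1/2 and a simulated phenotype (stand-in for de-regressed proofs).
genotypes = rng.integers(0, 3, size=(n_ref + n_pred, n_snp)).astype(float)
snp_effects = rng.normal(0.0, 0.1, size=n_snp)
phenotypes = genotypes @ snp_effects + rng.normal(0.0, 1.0, size=n_ref + n_pred)

# PCA of the centred genotype matrix via singular value decomposition.
centred = genotypes - genotypes.mean(axis=0)
_, singular_values, vt = np.linalg.svd(centred, full_matrices=False)
explained = np.cumsum(singular_values**2) / np.sum(singular_values**2)

# Assumed retention rule: keep the smallest number of components reaching 90% of variance.
n_pc = min(int(np.searchsorted(explained, 0.90)) + 1, len(singular_values))
scores = centred @ vt[:n_pc].T  # animals x retained components

# Reference animals are used for estimation, prediction animals only for prediction.
ref_x, pred_x = scores[:n_ref], scores[n_ref:]
ref_y = phenotypes[:n_ref]

# Ridge regression on the scores; lam is a hypothetical shrinkage parameter.
lam = 10.0
mu = ref_y.mean()
pc_effects = np.linalg.solve(
    ref_x.T @ ref_x + lam * np.eye(n_pc), ref_x.T @ (ref_y - mu)
)

# Predicted genomic values for the prediction animals.
dgv = mu + pred_x @ pc_effects

print(f"predictors reduced from {n_snp} SNP to {n_pc} components")
print("correlation with simulated phenotype:",
      np.corrcoef(dgv, phenotypes[n_ref:])[0, 1])
```

Here the number of effects to estimate drops from `n_snp` to `n_pc`, so the linear system that is solved is much smaller. With independent simulated genotypes the reduction is modest compared with real SNP data, where linkage disequilibrium between markers concentrates variance in fewer components.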