A Bayesian nonparametric approach to species sampling problems with ordering

Cecilia Balocchi; Stefano Favaro
In press

Abstract

Species-sampling problems (SSPs) refer to a vast class of statistical problems calling for the estimation of (discrete) functionals of the unknown species composition of an unobservable population. A common feature of SSPs is their invariance with respect to species labeling, which is at the core of the Bayesian nonparametric (BNP) approach to SSPs under the popular Pitman-Yor process (PYP) prior. In this paper, we develop a BNP approach to SSPs that are not “invariant” to species labeling, in the sense that an ordering or ranking is assigned to species’ labels. Inspired by the population genetics literature on age-ordered alleles’ compositions, we study the following SSP with ordering: given an observable sample from an unknown population of individuals belonging to species (alleles), with species’ labels being ordered according to weights (ages), estimate the frequencies of the first r order species’ labels in an enlarged sample obtained by including additional unobservable samples. By relying on an ordered PYP prior, we obtain an explicit posterior distribution of the first r order frequencies, with estimates being of easy implementation and computationally efficient. We apply our approach to the analysis of genetic variation, showing its effectiveness in estimating the frequency of the oldest allele, and then we discuss other potential applications.
Pages: 1-26
Keywords: Bayesian nonparametrics, exchangeable partition probability function, first r order frequency, ordered Pitman-Yor process prior, species sampling problems, population genetics.
Cecilia Balocchi; Federico Camerlenghi; Stefano Favaro
Files in this item:
File: 24-BA1418-4.pdf
Access: open access
File type: postprint (author's final version)
Size: 1.21 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/2042350