A Bayesian nonparametric approach to species sampling problems with ordering
Cecilia Balocchi;Stefano Favaro
In press
Abstract
Species-sampling problems (SSPs) refer to a vast class of statistical problems calling for the estimation of (discrete) functionals of the unknown species composition of an unobservable population. A common feature of SSPs is their invariance with respect to species labeling, which is at the core of the Bayesian nonparametric (BNP) approach to SSPs under the popular Pitman-Yor process (PYP) prior. In this paper, we develop a BNP approach to SSPs that are not “invariant” to species labeling, in the sense that an ordering or ranking is assigned to species’ labels. Inspired by the population genetics literature on age-ordered alleles’ compositions, we study the following SSP with ordering: given an observable sample from an unknown population of individuals belonging to species (alleles), with species’ labels being ordered according to weights (ages), estimate the frequencies of the first r order species’ labels in an enlarged sample obtained by including additional unobservable samples. By relying on an ordered PYP prior, we obtain an explicit posterior distribution of the first r order frequencies, with estimates that are easy to implement and computationally efficient. We apply our approach to the analysis of genetic variation, showing its effectiveness in estimating the frequency of the oldest allele, and then we discuss other potential applications.

| File | Size | Format |
|---|---|---|
| 24-BA1418-4.pdf (open access; file type: postprint, author’s final version) | 1.21 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.



