Bayesian nonparametric inference for "species-sampling" problems
Cecilia Balocchi; Stefano Favaro
In press
Abstract
Given an observed sample from a population of individuals belonging to species, “species-sampling” problems (SSPs) call for estimating some features of the unknown species composition of additional unobservable samples from the same population. Within SSPs, the problems of estimating coverage probabilities, the number of unseen species and coverages of prevalences have emerged over the past three decades as the subject of numerous methodological and applied works, mostly in biological sciences but also in statistical machine learning, electrical engineering, theoretical computer science, information theory and forensic statistics. In this paper, we focus on these popular SSPs, and present an overview of their Bayesian nonparametric (BNP) analysis under the Pitman–Yor process (PYP) prior. While reviewing the literature, we improve on computation and interpretability of existing posterior inferences, typically expressed through complicated combinatorial numbers, by establishing novel posterior representations in terms of simple compound Binomial and Hypergeometric distributions. We also consider the problem of estimating the discount and scale parameters of the PYP prior, showing a property of Bayesian consistency with respect to estimation through the hierarchical Bayes and empirical Bayes approaches, that is: the discount parameter can be estimated consistently, whereas the scale parameter cannot be estimated consistently, thus advising caution in posterior inference. We conclude our work by discussing some generalizations of SSPs, mostly in the field of biological sciences, which deal with “feature-sampling”, multiple populations of individuals sharing species and classes of Markov chains.

| File | Access | Type | Size | Format |
|---|---|---|---|---|
| STS2203-015R2A0-3.pdf | Open access | Postprint (author's final version) | 669.23 kB | Adobe PDF |