Bayesian nonparametric inference for "species-sampling" problems

Cecilia Balocchi; Stefano Favaro; Zacharie Naulet
In press

Abstract

Given an observed sample from a population of individuals belonging to species, “species-sampling” problems (SSPs) call for estimating some features of the unknown species composition of additional unobservable samples from the same population. Within SSPs, the problems of estimating coverage probabilities, the number of unseen species and coverages of prevalences have, over the past three decades, been the subject of numerous methodological and applied works, mostly in biological sciences but also in statistical machine learning, electrical engineering, theoretical computer science, information theory and forensic statistics. In this paper, we focus on these popular SSPs and present an overview of their Bayesian nonparametric (BNP) analysis under the Pitman–Yor process (PYP) prior. While reviewing the literature, we improve on the computation and interpretability of existing posterior inferences, typically expressed through complicated combinatorial numbers, by establishing novel posterior representations in terms of simple compound Binomial and Hypergeometric distributions. We also consider the problem of estimating the discount and scale parameters of the PYP prior, and show a Bayesian consistency property for estimation through the hierarchical Bayes and empirical Bayes approaches: the discount parameter can be estimated consistently, whereas the scale parameter cannot, which calls for caution in posterior inference. We conclude by discussing some generalizations of SSPs, mostly in the field of biological sciences, which deal with “feature-sampling”, multiple populations of individuals sharing species, and classes of Markov chains.
Pages: 1-20
Keywords: Bayesian nonparametrics, Bayesian consistency, coverage of prevalences, coverage probabilities, empirical Bayes, hierarchical Bayes, Pitman–Yor process prior, “species-sampling” problems, unseen species.
Cecilia Balocchi; Stefano Favaro; Zacharie Naulet
Files in this item:
STS2203-015R2A0-3.pdf (open access; postprint, author's final version; 669.23 kB, Adobe PDF)

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/2042331