Given a sample of size n from a population of individuals belonging to different species with unknown proportions, a popular problem of practical interest consists in making inference on the probability Dn(l) that the (n+1)-th draw coincides with a species with frequency l in the sample, for any l=0,1,…,n. This paper contributes to the methodology of Bayesian nonparametric inference for Dn(l). Specifically, under the general framework of Gibbs-type priors we show how to derive credible intervals for a Bayesian nonparametric estimation of Dn(l), and we investigate the large n asymptotic behaviour of such an estimator. Of particular interest are special cases of our results obtained under the specification of the two parameter Poisson--Dirichlet prior and the normalized generalized Gamma prior, which are two of the most commonly used Gibbs-type priors. With respect to these two prior specifications, the proposed results are illustrated through a simulation study and a benchmark Expressed Sequence Tags dataset. To the best our knowledge, this illustration provides the first comparative study between the two parameter Poisson--Dirichlet prior and the normalized generalized Gamma prior in the context of Bayesian nonparemetric inference for Dn(l).

Bayesian nonparametric inference for discovery probabilities: credible intervals and large sample asymptotics

FAVARO, STEFANO;NIPOTI, BERNARDO;
2017-01-01

Abstract

Given a sample of size n from a population of individuals belonging to different species with unknown proportions, a popular problem of practical interest consists in making inference on the probability Dn(l) that the (n+1)-th draw coincides with a species with frequency l in the sample, for any l=0,1,…,n. This paper contributes to the methodology of Bayesian nonparametric inference for Dn(l). Specifically, under the general framework of Gibbs-type priors we show how to derive credible intervals for a Bayesian nonparametric estimation of Dn(l), and we investigate the large n asymptotic behaviour of such an estimator. Of particular interest are special cases of our results obtained under the specification of the two parameter Poisson--Dirichlet prior and the normalized generalized Gamma prior, which are two of the most commonly used Gibbs-type priors. With respect to these two prior specifications, the proposed results are illustrated through a simulation study and a benchmark Expressed Sequence Tags dataset. To the best our knowledge, this illustration provides the first comparative study between the two parameter Poisson--Dirichlet prior and the normalized generalized Gamma prior in the context of Bayesian nonparemetric inference for Dn(l).
2017
27
2
839
858
Asymptotics, Bayesian nonparametrics, credible intervals, discovery probability, Gibbs-type priors, Good-Turing estimator, normalized generalized Gamma prior, smoothing technique, two-parameter Poisson-Dirichlet.
Arbel, Julyan; Favaro, Stefano; Nipoti, Bernardo; Teh, Yee Whye
File in questo prodotto:
File Dimensione Formato  
sinica_final.pdf

Accesso aperto

Tipo di file: POSTPRINT (VERSIONE FINALE DELL’AUTORE)
Dimensione 2.46 MB
Formato Adobe PDF
2.46 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2318/1591192
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 6
  • ???jsp.display-item.citation.isi??? 6
social impact