
On a class of random probability measures with general predictive structure

Favaro, Stefano; Pruenster, Igor
2008-01-01

Abstract

In this paper we investigate a class of random probability measures, termed generalized Dirichlet processes, which has recently been introduced in the literature. Such processes induce exchangeable random partitions characterized by a more elaborate clustering structure than those arising from Gibbs-type random probability measures. A natural area of application of these random probability measures is species sampling problems and, in particular, prediction problems in genomics. To this end we study both the distribution of the number of distinct species present in a sample and the distribution of the number of new species conditional on an observed sample. We also provide the nonparametric Bayesian estimator, under quadratic loss, for the number of new species in an additional sample of given size and for the discovery probability as a function of the size of the additional sample. Finally, we complete the study of the conditional structure of these processes by determining their posterior distribution. As a by-product, we also obtain an interesting generalization of the Chu-Vandermonde convolution formula involving the fourth Lauricella hypergeometric function.
Year: 2008
Series: Carlo Alberto Notebooks
Number: 161
URL: http://www.carloalberto.org/research/working-papers/2010
Keywords: Bayesian Nonparametrics; Bell Polynomials; Dirichlet process; Exchangeable random partitions; Generalized gamma process; Generalized gamma convolutions; Lauricella hypergeometric function; Population genetics; Species sampling models
Authors: S. Favaro; I. Pruenster; S.G. Walker

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/58935