Flexible clustering via hidden hierarchical Dirichlet priors

Rebaudo, G
2023-01-01

Abstract

The Bayesian approach to inference stands out for naturally allowing the borrowing of information across heterogeneous populations, with different samples possibly sharing the same distribution. A popular Bayesian nonparametric model for clustering probability distributions is the nested Dirichlet process, which, however, has the drawback of collapsing distinct distributions into a single cluster when ties are observed across samples. With the goal of achieving a flexible and effective clustering method for both samples and observations, we investigate a nonparametric prior that arises as the composition of two different discrete random structures and derive a closed-form expression for the induced distribution of the random partition, the fundamental tool regulating the clustering behavior of the model. On the one hand, this allows us to gain deeper insight into the theoretical properties of the model; on the other hand, it yields an MCMC algorithm for evaluating Bayesian inferences of interest. Moreover, we single out the limitations of this algorithm when working with more than two populations and, consequently, devise an alternative, more efficient sampling scheme which, as a by-product, allows testing homogeneity between different populations. Finally, we compare the model with the nested Dirichlet process and provide illustrative examples on both synthetic and real data.
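To make the composition described in the abstract concrete, the following is a minimal truncated stick-breaking sketch, not the authors' exact construction or algorithm: a shared discrete base measure lets distinct random distributions share atoms (the hierarchical layer), while an outer discrete distribution over those distributions clusters the populations themselves (the nested layer). All truncation levels, concentration parameters, and the Gaussian kernel below are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    def stick_breaking(alpha, K, rng):
        # Truncated stick-breaking weights of length K for concentration alpha.
        betas = rng.beta(1.0, alpha, size=K)
        remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
        w = betas * remaining
        w[-1] = 1.0 - w[:-1].sum()  # absorb the truncation remainder into the last weight
        return w

    # Hidden hierarchical layer: a common discrete base measure G0 whose atoms
    # are shared by every distribution-level random measure (assumed Gaussian base).
    L = 50
    w0 = stick_breaking(1.0, L, rng)
    atoms0 = rng.normal(0.0, 3.0, size=L)

    # Distributional layer: K distinct random measures G_1, ..., G_K drawing their
    # atoms from G0, so ties in atoms are possible without forcing equal distributions.
    K = 3
    Gk_weights = np.vstack([stick_breaking(1.0, L, rng) for _ in range(K)])
    Gk_atom_idx = np.vstack([rng.choice(L, size=L, p=w0) for _ in range(K)])

    # Outer (nested) layer: each of d populations is assigned to one of the K
    # distributions, inducing a clustering of the populations themselves.
    d = 4
    outer_w = stick_breaking(1.0, K, rng)
    pop_to_dist = rng.choice(K, size=d, p=outer_w)

    # Observations: population j draws locations from its assigned measure and
    # adds Gaussian noise (a toy mixture kernel, purely illustrative).
    n = 20
    samples = {}
    for j in range(d):
        k = pop_to_dist[j]
        idx = rng.choice(L, size=n, p=Gk_weights[k])
        samples[j] = rng.normal(atoms0[Gk_atom_idx[k, idx]], 0.5)

    print("population -> distributional cluster:", dict(enumerate(pop_to_dist)))

In this sketch, populations assigned to the same distributional cluster share a distribution exactly, while populations in different clusters can still share atoms through the common base measure; the paper's contributions concern the induced random-partition distribution and the MCMC sampling schemes, which this toy simulation does not attempt to reproduce.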
Year: 2023
Volume: 50
Issue: 1
Pages: 213-234
URL: http://onlinelibrary.wiley.com/doi/10.1111/sjos.12578
Keywords: Bayesian nonparametrics; clustering; dependent random partitions; hierarchical Dirichlet process; mixture models; nested Dirichlet process; vectors of random probabilities
Authors: Lijoi, A; Prunster, I; Rebaudo, G
Files for this record:
File: Scandinavian J Statistics - 2022 - Lijoi - Flexible clustering via hidden hierarchical Dirichlet priors.pdf
Access: Open access
File type: Publisher's PDF
Size: 2.49 MB
Format: Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/2318/1898301
Citations
  • PubMed Central: not available
  • Scopus: 7
  • Web of Science: 8