CINECA IRIS Institutional Research Information System

Clustering consistency with Dirichlet process mixtures

Filippo Ascolani; Antonio Lijoi; Giovanni Rebaudo; Giacomo Zanella

2023-01-01

Abstract

Dirichlet process mixtures are flexible nonparametric models, particularly suited to density estimation and probabilistic clustering. In this work we study the posterior distribution induced by Dirichlet process mixtures as the sample size increases, and more specifically focus on consistency for the unknown number of clusters when the observed data are generated from a finite mixture. Crucially, we consider the situation where a prior is placed on the concentration parameter of the underlying Dirichlet process. Previous findings in the literature suggest that Dirichlet process mixtures are typically not consistent for the number of clusters if the concentration parameter is held fixed and data come from a finite mixture. Here we show that consistency for the number of clusters can be achieved if the concentration parameter is adapted in a fully Bayesian way, as commonly done in practice. Our results are derived for data coming from a class of finite mixtures, with mild assumptions on the prior for the concentration parameter and for a variety of choices of likelihood kernels for the mixture.
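
As a rough illustration of why the concentration parameter matters for the number of clusters, the sketch below simulates the prior on the number of clusters induced by a Dirichlet process through its Chinese restaurant process representation. It is not the authors' analysis, which concerns the posterior under a prior on the concentration parameter; the function names `crp_num_clusters` and `expected_num_clusters` and the values of `alpha` are assumptions chosen for illustration.

```python
import random


def crp_num_clusters(n, alpha, rng):
    """Simulate the number of occupied clusters after n observations from a
    Chinese restaurant process with concentration parameter alpha."""
    k = 0
    for i in range(n):
        # Observation i + 1 opens a new cluster with probability alpha / (alpha + i);
        # for i = 0 this probability is 1, so the first observation always does.
        if rng.random() < alpha / (alpha + i):
            k += 1
    return k


def expected_num_clusters(n, alpha):
    """Exact prior mean of the number of clusters: sum over i of alpha / (alpha + i)."""
    return sum(alpha / (alpha + i) for i in range(n))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is reproducible
    n = 1000
    for alpha in (0.1, 1.0, 10.0):  # illustrative values only
        simulated = crp_num_clusters(n, alpha, rng)
        mean = expected_num_clusters(n, alpha)
        print(f"alpha={alpha}: simulated K_n={simulated}, prior mean={mean:.2f}")
```

For a fixed concentration parameter the prior mean of the number of clusters grows roughly like alpha times log n, so the choice of alpha strongly affects how many clusters are favoured a priori. This is the sensitivity that placing a prior on the concentration parameter is meant to address.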

Year: 2023
Journal: BIOMETRIKA
Volume: 110
Issue: 2
Pages: 551-558
DOI: https://dx.doi.org/10.1093/biomet/asac051
URL (open-access archive, full text on the publisher's site, etc.): https://academic.oup.com/biomet/advance-article/doi/10.1093/biomet/asac051/6696237
Keywords: Asymptotics; Bayesian nonparametrics; Consistency; Clustering; Dirichlet process mixture; Number of components
Authors: Filippo Ascolani; Antonio Lijoi; Giovanni Rebaudo; Giacomo Zanella
Appears in types: 03A-Articolo su Rivista (journal article)

Files in this item:

File	Access	File type	Size	Format
DPM_Cons_neutral.pdf	Open access	Postprint (author's final version)	498.53 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/1898303
