On Exchangeable Partition Ensembles

DALLA PRIA, MARCO
2026-04-28

Abstract

This thesis studies ensembles of exchangeable random partitions, that is, collections of partition-valued random objects considered jointly rather than in isolation. While the classical theory, rooted in Kingman’s work on partition structures, focuses mainly on a single exchangeable random partition, the thesis investigates the probabilistic and inferential structure that emerges at the level of the ensemble. This shift in perspective leads to new combinatorial constraints and posterior structures that are not visible at the level of a single partition.

The first part of the thesis develops a framework for posterior inference from ensembles of exchangeable partitions sampled from a common latent population. The resulting posterior distributions are supported on, or depend on, structured subsets of the set of integer partitions, determined by the possible coagulations of the observed ensemble. Their analysis relies on tools such as matching numbers, hypergeometric fragmentations, and related combinatorial coefficients. It also reveals a structural obstruction: unlike classical exchangeable partition models, these posterior laws do not admit a straightforward sequential constructive representation. To address the resulting computational challenges, the thesis develops exact samplers for auxiliary combinatorial objects together with Monte Carlo methods for posterior inference.

The thesis then considers a related but distinct problem, namely consensus clustering. Here, multiple observed clustering solutions of the same dataset are treated as data, and the consensus partition, that is, a set partition that best summarizes the common structure of the ensemble, is the inferential target. Rather than relying on a fully specified generative model for the ensemble, a generalized Bayesian approach is adopted, in which prior beliefs on the space of set partitions are updated through a loss-based mechanism.
This yields a principled posterior formulation of consensus clustering, together with a computational strategy tailored to the combinatorial structure of the problem. The approach thus connects the classical optimization-based perspective with a fully inferential one, while also providing a natural form of uncertainty quantification.

Finally, the ensemble perspective is extended to a dynamic setting by introducing a time-evolving latent population structure driven by a two-parameter Poisson–Dirichlet diffusion. Observations consist of the random partitions induced by repeated finite samples collected at discrete times, in which sampled individuals are grouped according to whether they share the same latent type; this yields a time-indexed ensemble of integer partitions. The resulting inference problem is formulated as a filtering problem for an evolving, infinite-dimensional Bayesian nonparametric model. By exploiting stochastic duality, the thesis derives an exact filtering framework in which the posterior remains a finite mixture at each time step, together with explicit smoothing and predictive procedures. This provides one of the few examples of exact online inference for a diffusion-driven Bayesian nonparametric model.

Overall, this work shows that the ensemble viewpoint leads naturally to new probabilistic structures, new combinatorial posterior representations, and inferential methods tailored to the structure of the problem, in both static and dynamic settings.
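The loss-based update described above for consensus clustering can be illustrated on a toy example. The sketch below is not the method developed in the thesis: it assumes a Gibbs-type generalized posterior π(ρ | ρ_1, …, ρ_M) ∝ π(ρ) exp{−λ Σ_m d(ρ, ρ_m)}, a Binder-type distance d that counts item pairs on which two partitions disagree about co-clustering, a uniform prior by default, and an arbitrary learning rate `lam`. All function names are hypothetical. Because it enumerates every set partition, it is only feasible for a handful of items.

```python
import math
from itertools import combinations


def set_partitions(items):
    """Yield every set partition of `items` as a list of blocks (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        # Insert `first` into each existing block in turn ...
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        # ... or open a new singleton block for it.
        yield [[first]] + smaller


def to_labels(partition, n):
    """Convert a list of blocks over items 0..n-1 into a label vector."""
    labels = [0] * n
    for b, block in enumerate(partition):
        for i in block:
            labels[i] = b
    return labels


def binder_distance(a, b):
    """Number of item pairs co-clustered in one labeling but not the other."""
    return sum(
        (a[i] == a[j]) != (b[i] == b[j])
        for i, j in combinations(range(len(a)), 2)
    )


def gibbs_posterior(ensemble, lam=1.0, log_prior=lambda labels: 0.0):
    """Generalized (Gibbs) posterior over all set partitions of n items.

    Assumed form: log weight = log_prior(rho) - lam * sum_m d(rho, rho_m),
    with d the Binder-type distance; the default prior is uniform.
    """
    n = len(ensemble[0])
    candidates = [to_labels(p, n) for p in set_partitions(list(range(n)))]
    log_w = [
        log_prior(c) - lam * sum(binder_distance(c, obs) for obs in ensemble)
        for c in candidates
    ]
    top = max(log_w)  # subtract the max for numerical stability
    w = [math.exp(x - top) for x in log_w]
    z = sum(w)
    return [(c, wi / z) for c, wi in zip(candidates, w)]


# Toy ensemble: three clusterings of the same four items.
ensemble = [[0, 0, 1, 1], [0, 0, 1, 2], [0, 0, 0, 1]]
posterior = gibbs_posterior(ensemble, lam=1.0)
best, prob = max(posterior, key=lambda pair: pair[1])
# best groups items 0 and 1 together and leaves 2 and 3 as singletons
```

As `lam` grows, the posterior concentrates on the loss-minimizing partition, recovering an optimization-based consensus; smaller values spread mass over nearby partitions, which is where the uncertainty quantification comes from.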
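The observation model of the dynamic setting described above rests on a standard fact: when the type frequencies of a population are drawn from a two-parameter Poisson–Dirichlet law PD(α, θ), the partition of an i.i.d. sample of size n by shared type has the law given by Pitman's sampling formula. With the population integrated out, it can be simulated sequentially by the two-parameter Chinese restaurant process. The sketch below assumes 0 ≤ α < 1 and θ > −α. The function name and parameter values are illustrative. It does not simulate the Poisson–Dirichlet diffusion: each call integrates out its own population, so the draws in the usage line are independent rather than sharing a common, time-evolving population as in the thesis setting.

```python
import random


def sample_partition(n, alpha, theta, rng=None):
    """Integer partition (block sizes, descending) of n individuals drawn
    from a PD(alpha, theta) population, via the two-parameter Chinese
    restaurant process (the latent population is integrated out)."""
    if not (0 <= alpha < 1 and theta > -alpha):
        raise ValueError("need 0 <= alpha < 1 and theta > -alpha")
    rng = rng or random.Random()
    sizes = []
    for m in range(n):  # m individuals have already been grouped
        if m == 0:
            sizes.append(1)  # the first individual always has a new type
            continue
        k = len(sizes)
        u = rng.random() * (theta + m)  # total weight is theta + m
        acc = theta + k * alpha          # weight of a previously unseen type
        if u < acc:
            sizes.append(1)
            continue
        for j in range(k):
            acc += sizes[j] - alpha      # weight of joining existing type j
            if u < acc:
                sizes[j] += 1
                break
        else:
            sizes[-1] += 1               # guard against floating-point round-off
    return sorted(sizes, reverse=True)


# Five independent samples of size 20 (hypothetical parameter values).
rng = random.Random(0)
draws = [sample_partition(20, alpha=0.3, theta=1.0, rng=rng) for _ in range(5)]
```

Each element of `draws` is an integer partition of 20, i.e. the multiset of type counts in one sample, which is the kind of observation that makes up the time-indexed ensemble.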
Date: 28-apr-2026
Doctoral cycle: 38
Doctoral programme: MODELING AND DATA SCIENCE
Supervisor: RUGGIERO, Matteo
Files in this item:
Tesi-Dalla Pria-Marco.pdf
Open access
Description: Thesis
Size: 5.07 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/2138158