Relabel partitions by decreasing \(\lambda\), compute point estimates, and quantify uncertainty via credible balls
Source: R/utils.R
Given posterior samples of labels x_samples (S x N) and the corresponding
cluster-level intensities lambda_samples for each iteration, this function:
(i) relabels each draw so that cluster 1 has the largest \(\lambda\),
cluster 2 the second largest, and so on; (ii) computes the posterior similarity
matrix (PSM) and partition point estimates (minVI and Binder); and
(iii) constructs a VI credible ball around the minVI partition and returns the
extremal partitions on its surface (lower/upper), relabelled by decreasing
strength for interpretability. It also returns per-item assignment probabilities
and the posterior distribution of the number of clusters.
Arguments
- x_samples
Integer matrix (S x N) of sampled partitions; rows index MCMC iterations and columns index items. Cluster labels are arbitrary across iterations.
- lambda_samples
Either:
  - a list of length S, where element [[s]] is a numeric vector of cluster intensities \(\lambda_\ell\) indexed by the raw label id used at iteration s (NAs allowed for non-occupied ids); or
  - a matrix (S x L) whose row s gives \(\lambda_\ell\) for label \(\ell\) at iteration s (sparse columns with NAs permitted).
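Both forms carry the same information, so one can be converted into the other. The sketch below assumes the list form and pads each vector with NA to a common length L, keeping raw label ids as column positions; lambda_list_to_matrix is a hypothetical helper for illustration, not part of the package.

```r
# Hypothetical helper (not part of the package): convert the list form of
# lambda_samples into the equivalent S x L matrix form, padding with NA.
lambda_list_to_matrix <- function(lam_list) {
  L <- max(lengths(lam_list))              # widest raw-label range across draws
  padded <- lapply(lam_list, function(v) {
    out <- rep(NA_real_, L)
    out[seq_along(v)] <- v                 # raw label id = column position
    out
  })
  matrix(unlist(padded), nrow = length(lam_list), ncol = L, byrow = TRUE)
}

# Iteration 1 uses ids 1-2; iteration 2 uses ids 1, 3 and 4.
lam_list <- list(c(2.0, 0.5, NA), c(1.2, NA, 3.1, 0.7))
lam_mat <- lambda_list_to_matrix(lam_list)
lam_mat  # row s holds lambda for raw label ids 1..4 at iteration s
```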
Value
A named list with components:
- x_samples_relabel
Integer matrix (S x N) of relabelled draws (labels 1..K_s at iteration s, ordered by decreasing \(\lambda\)).
- lambda_samples_relabel
Numeric matrix (S x N) assigning each item its cluster's \(\lambda\) after relabelling.
- co_clustering
Posterior similarity matrix (N x N).
- minVI_partition
Partition estimated by minimizing the posterior expected VI (first solution returned by mcclust.ext::minVI).
- partition_binder
Partition estimated by minimizing Binder's loss (mcclust.ext::minbinder.ext).
- credible_ball_lower_partition
Extremal partition on the surface of the 95% VI credible ball around minVI_partition, relabelled by decreasing posterior-mean item strength.
- credible_ball_upper_partition
Analogous extremal partition on the credible-ball surface (relabelled in the same way).
- K_VI_lower
Number of clusters in credible_ball_lower_partition.
- K_VI_upper
Number of clusters in credible_ball_upper_partition.
- n_clusters_each_iter
Integer vector (length S) of occupied cluster counts per iteration.
- block_count_distribution
Data frame with columns num_blocks, count and prob summarizing the posterior of the number of clusters.
- item_cluster_assignment_probs
Data frame (N x Kmax) of per-item marginal assignment probabilities (columns Cluster_1, Cluster_2, ...).
- avg_top_block_count
Average size of the top-\(\lambda\) cluster across iterations.
- top_block_count_per_iter
Integer vector (length S) with the size of the top-\(\lambda\) cluster per iteration.
- cluster_lambda_ordered
List of length S; each element is the vector of cluster \(\lambda\) values for that iteration, in decreasing order.
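Several of these components are simple tabulations of the relabelled draws. The following base-R sketch shows one way to obtain n_clusters_each_iter, block_count_distribution, item_cluster_assignment_probs and the top-cluster sizes from a relabelled S x N matrix. summarise_relabelled is a hypothetical helper, and the package's own code may differ; in particular, Kmax is assumed here to be the largest K_s.

```r
# Hypothetical sketch: summaries from a relabelled S x N label matrix `xr`
# (labels 1..K_s per row, 1 = strongest). Kmax is assumed to be max(K_s).
summarise_relabelled <- function(xr) {
  S <- nrow(xr); N <- ncol(xr)
  K_s <- apply(xr, 1, function(x) length(unique(x)))  # occupied clusters per draw
  tab <- table(K_s)
  block_count_distribution <- data.frame(
    num_blocks = as.integer(names(tab)),
    count = as.integer(tab),
    prob = as.numeric(tab) / S)
  Kmax <- max(K_s)
  # Row j: share of draws in which item j carries label 1, 2, ..., Kmax
  probs <- matrix(unlist(lapply(seq_len(N), function(j)
    tabulate(xr[, j], nbins = Kmax) / S)), nrow = N, byrow = TRUE)
  colnames(probs) <- paste0("Cluster_", seq_len(Kmax))
  top_sizes <- as.integer(rowSums(xr == 1))         # size of cluster 1 per draw
  list(n_clusters_each_iter = K_s,
       block_count_distribution = block_count_distribution,
       item_cluster_assignment_probs = as.data.frame(probs),
       top_block_count_per_iter = top_sizes,
       avg_top_block_count = mean(top_sizes))
}

xr <- rbind(c(1, 1, 2), c(1, 2, 3), c(1, 1, 1), c(2, 1, 1))
res <- summarise_relabelled(xr)
res$block_count_distribution  # K = 1, 2, 3 with probabilities 0.25, 0.5, 0.25
```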
Details
Relabelling. For each iteration, occupied labels are compacted to 1..K_s
and reordered by decreasing \(\lambda\), producing a canonical “1 = strongest” labelling.
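A minimal sketch of the per-iteration relabelling, assuming lam is indexed by raw label id as in the list form of lambda_samples. relabel_one_draw is a hypothetical helper; ties, and NA intensities on occupied labels, are resolved here by R's order() (stable, NAs last) rather than by any rule the package may apply.

```r
# Hypothetical sketch of relabelling one draw: occupied raw labels are
# compacted to 1..K_s and ordered by decreasing lambda (1 = strongest).
relabel_one_draw <- function(x, lam) {
  occupied <- sort(unique(x))                               # raw ids used in this draw
  ord <- occupied[order(lam[occupied], decreasing = TRUE)]  # strongest first
  new_x <- match(x, ord)                                    # raw id -> new label
  list(x = new_x,
       cluster_lambda = lam[ord],          # cluster lambdas in decreasing order
       item_lambda = lam[ord][new_x])      # each item's cluster lambda
}

x   <- c(3, 1, 3, 4, 1, 3)            # raw labels for N = 6 items
lam <- c(0.4, NA, 2.5, 1.1, NA, NA)   # indexed by raw id; NA = unoccupied
r <- relabel_one_draw(x, lam)
r$x               # 1 3 1 2 3 1
r$cluster_lambda  # 2.5 1.1 0.4
```

Applying this row by row gives the analogues of x_samples_relabel, lambda_samples_relabel (from item_lambda) and cluster_lambda_ordered.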
Point estimation. The posterior similarity matrix is computed from relabelled draws; minVI and Binder partitions are obtained via mcclust.ext.
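Entry (i, j) of the PSM is the proportion of draws in which items i and j share a cluster, so it does not depend on how labels are named; relabelling leaves it unchanged. A base-R sketch follows (psm_from_draws is hypothetical; mcclust::comp.psm computes the same quantity).

```r
# Hypothetical sketch: posterior similarity matrix from an S x N label matrix.
# Entry (i, j) = fraction of draws in which items i and j share a label.
psm_from_draws <- function(x_samples) {
  N <- ncol(x_samples)
  psm <- matrix(0, N, N)
  for (s in seq_len(nrow(x_samples))) {
    row_s <- x_samples[s, ]
    psm <- psm + outer(row_s, row_s, "==")   # 1 where labels coincide
  }
  psm / nrow(x_samples)
}

draws   <- rbind(c(1, 1, 2), c(1, 2, 2))
renamed <- rbind(c(2, 2, 1), c(5, 1, 1))  # same partitions, different labels
p <- psm_from_draws(draws)
all.equal(p, psm_from_draws(renamed))     # TRUE: the PSM ignores label names
```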
Credible ball and extremal partitions. A 95% credible ball
(under VI) is constructed around the minVI partition. We report the extremal
partitions on the ball's surface (in the sense of maximal VI distance from the centre),
as returned by mcclust.ext::credibleball. These are then relabelled by decreasing
posterior-mean item strength to ensure a consistent “strength ordering” across summaries.
The associated cluster counts K_VI_lower and K_VI_upper characterize the
local structural uncertainty around the point estimate; they are not marginal
posterior quantiles of \(K\).
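The construction can be sketched in base R under stated assumptions: VI uses log base 2, the ball is restricted to the posterior draws, its radius is the smallest distance covering at least the requested share of draws, and the two extremal partitions are the draws furthest from the centre among those in the ball with the fewest clusters (called "upper" here) and the most clusters (called "lower"). This mirrors our reading of the vertical bounds reported by mcclust.ext::credibleball, but vi_dist, credible_ball_sketch and relabel_by_item_strength are hypothetical helpers, and tie handling may differ. The last helper assumes item_strength is the per-item posterior mean of \(\lambda\) (for example, colMeans of lambda_samples_relabel).

```r
# Hypothetical helpers; a sketch, not the mcclust.ext implementation.
# Variation of information between two partitions, using log base 2.
vi_dist <- function(a, b) {
  tab <- table(a, b) / length(a)          # joint distribution of labels
  pa <- rowSums(tab); pb <- colSums(tab)  # marginals
  ent <- function(p) -sum(p[p > 0] * log2(p[p > 0]))
  nz <- tab > 0
  mi <- sum(tab[nz] * log2(tab[nz] / outer(pa, pb)[nz]))  # mutual information
  ent(pa) + ent(pb) - 2 * mi
}

# Empirical credible ball around c_star, restricted to the posterior draws.
credible_ball_sketch <- function(x_samples, c_star, level = 0.95) {
  d <- apply(x_samples, 1, vi_dist, b = c_star)  # VI of each draw to the centre
  eps <- sort(d)[ceiling(level * length(d))]     # smallest radius covering >= level
  in_ball <- which(d <= eps)
  K <- apply(x_samples, 1, function(x) length(unique(x)))
  furthest_with_k <- function(k) {
    cand <- in_ball[K[in_ball] == k]
    cand[which.max(d[cand])]                     # first such draw on ties
  }
  up <- furthest_with_k(min(K[in_ball]))         # fewest clusters ("upper")
  lo <- furthest_with_k(max(K[in_ball]))         # most clusters ("lower")
  list(radius = eps, upper = x_samples[up, ], lower = x_samples[lo, ],
       K_upper = K[[up]], K_lower = K[[lo]])
}

# Relabel a partition by decreasing mean item strength (1 = strongest).
relabel_by_item_strength <- function(partition, item_strength) {
  u <- sort(unique(partition))
  m <- vapply(u, function(k) mean(item_strength[partition == k]), numeric(1))
  match(partition, u[order(m, decreasing = TRUE)])
}

draws  <- rbind(c(1, 1, 2, 2), c(1, 1, 2, 2), c(1, 1, 1, 1),
                c(1, 2, 3, 4), c(1, 1, 2, 3))
c_star <- c(1, 1, 2, 2)
cb <- credible_ball_sketch(draws, c_star, level = 0.5)
cb$radius; cb$K_upper; cb$K_lower  # 0.5, 2, 3
```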
Input requirements
- x_samples must be integer-valued, with no missing items in any row.
- lambda_samples may be sparse (NAs for non-occupied labels).
- mcclust and mcclust.ext must be available; credibleball is expected to return the lower/upper partitions in either c.lower/c.upper or c.lowervert/c.uppervert.
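A sketch of the input checks and of the field fallback for credibleball results. check_x_samples and get_bound are hypothetical helpers, and the mock list only mimics the field names listed above; it is not real credibleball output. Because `[[` matches list names exactly by default, asking for c.lower never silently returns c.lowervert.

```r
# Hypothetical helpers illustrating the input requirements above.
check_x_samples <- function(x_samples) {
  stopifnot(is.matrix(x_samples),
            !anyNA(x_samples),                      # no missing items
            all(x_samples == round(x_samples)))     # integer-valued labels
  invisible(TRUE)
}

# Return the first field present in a credibleball-style result.
# `[[` uses exact name matching, so "c.lower" does not match "c.lowervert".
get_bound <- function(cb, candidates) {
  for (nm in candidates) {
    if (!is.null(cb[[nm]])) return(cb[[nm]])
  }
  stop("none of these fields found: ", paste(candidates, collapse = ", "))
}

# Mock object that only mimics the field names; not real credibleball output.
cb_mock <- list(c.lowervert = c(1, 1, 2, 3), c.uppervert = c(1, 1, 1, 1))
check_x_samples(matrix(c(1L, 2L, 2L, 1L), nrow = 2))
get_bound(cb_mock, c("c.lower", "c.lowervert"))   # 1 1 2 3
```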
References
Wade, S., 2023. Bayesian cluster analysis. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 381, 20220149. https://doi.org/10.1098/rsta.2022.0149
Examples
if (FALSE) { # \dontrun{
set.seed(42)
S <- 50; N <- 15
x_samps <- matrix(sample(1:4, S*N, TRUE), S, N)
# Sparse lambda per iteration: vectors over label ids 1..6; only ids 1..4 can be occupied
lam_list <- replicate(S, { v <- rep(NA_real_, 6); v[1:4] <- rexp(4, 1); v }, simplify = FALSE)
out <- relabel_by_lambda(x_samps, lam_list)
out$minVI_partition[1:10]
out$K_VI_lower; out$K_VI_upper
} # }