TY - JOUR
T1 - Group-Wise Shrinkage Estimation in Penalized Model-Based Clustering
AU - Casa, Alessandro
AU - Cappozzo, Andrea
AU - Fop, Michael
PY - 2022
Y1 - 2022
N2 - Finite Gaussian mixture models provide a powerful and widely employed probabilistic approach for clustering multivariate continuous data. However, the practical usefulness of these models is jeopardized in high-dimensional spaces, where they tend to be over-parameterized. As a consequence, different solutions have been proposed, often relying on matrix decompositions or variable selection strategies. Recently, a methodological link between Gaussian graphical models and finite mixtures has been established, paving the way for penalized model-based clustering in the presence of large precision matrices. Notwithstanding, current methodologies implicitly assume similar levels of sparsity across the classes, not accounting for different degrees of association between the variables across groups. We overcome this limitation by deriving group-wise penalty factors, which automatically enforce under- or over-connectivity in the estimated graphs. The approach is entirely data-driven and does not require additional hyper-parameter specification. Analyses on synthetic and real data showcase the validity of our proposal.
KW - Model-based clustering
KW - Penalized likelihood
KW - EM algorithm
KW - Gaussian graphical models
KW - Graphical lasso
KW - Sparse precision matrices
UR - http://hdl.handle.net/10807/304038
UR - https://link.springer.com/article/10.1007/s00357-022-09421-z
DO - 10.1007/s00357-022-09421-z
M3 - Article
SN - 0176-4268
VL - 39
SP - 648
EP - 674
JO - Journal of Classification
JF - Journal of Classification
ER -