TY - JOUR
T1 - Parameter Choice, Stability and Validity for Robust Cluster Weighted Modeling
AU - Cappozzo, Andrea
AU - García-Escudero, Luis Angel
AU - Greselin, Francesca
AU - Mayo-Iscar, Agustín
PY - 2021
Y1 - 2021
N2 - Statistical inference based on the cluster weighted model often requires some subjective judgment from the modeler. Many features influence the final solution, such as the number of mixture components, the shape of the clusters in the explanatory variables, and the degree of heteroscedasticity of the errors around the regression lines. Moreover, to deal with outliers and contamination that may appear in the data, hyper-parameter values ensuring robust estimation are also needed. In principle, this freedom gives rise to a variety of “legitimate” solutions, each derived by a specific set of choices and their implications in modeling. Here we introduce a method for identifying a “set of good models” to cluster a dataset, considering the whole panorama of choices. In this way, we enable the practitioner, or the scientist who needs to cluster the data, to make an educated choice. They will be able to identify the most appropriate solutions for the purposes of their own analysis, in light of their stability and validity.
KW - cluster-weighted modeling
KW - constrained estimation
KW - eigenvalue constraint
KW - model-based clustering
KW - monitoring
KW - outliers
KW - robust estimation
KW - trimmed BIC
UR - https://publicatt.unicatt.it/handle/10807/309180
UR - https://www.scopus.com/inward/citedby.uri?partnerID=HzOxMe3b&scp=85123411718&origin=inward
UR - https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85123411718&origin=inward
U2 - 10.3390/stats4030036
DO - 10.3390/stats4030036
M3 - Article
SN - 2571-905X
VL - 4
SP - 602
EP - 615
JO - Stats
JF - Stats
IS - 3
ER -