TY - JOUR
T1 - Concept drift estimation with graphical models
AU - Riso, Luigi
AU - Guerzoni, Marco
PY - 2022
Y1 - 2022
N2 - This paper deals with the issue of concept-drift in machine learning in the context of high dimensional problems. In contrast to previous concept drift detection methods, this application does not depend on the machine learning model in use for a specific target variable, but rather, it attempts to assess the concept drift as an independent characteristic of the evolution of a dataset. This major achievement enables data to be tested for the presence of drift, independently of the specific problem at hand. This is extremely useful when the same dataset is utilized for different classifications simultaneously, as it is often the case in a business environment. Moreover, unlike previous approaches, this method does not require the re-testing of each new model; a strategy which could prove expensive in computational terms. The fundamental intention of this work is to make use of graphical models to elicit the visible structure of data and represent it as a network. Specifically, we investigate how a graphical model evolves by looking at the creation of new links, and the disappearance of existing ones, in different time periods. We perform this task in four steps. We compute the adjacency matrix of a graph in each period, we apply a function that maps each possible state of the adjacency matrix over time into a transition matrix. We use the information in the transition matrix to produce a metric to estimate the presence of a drift in the data. Eventually, we evaluate this method with both three real-world datasets and a synthetic one.(c) 2022 Elsevier Inc. All rights reserved.
AB - This paper deals with the issue of concept-drift in machine learning in the context of high dimensional problems. In contrast to previous concept drift detection methods, this application does not depend on the machine learning model in use for a specific target variable, but rather, it attempts to assess the concept drift as an independent characteristic of the evolution of a dataset. This major achievement enables data to be tested for the presence of drift, independently of the specific problem at hand. This is extremely useful when the same dataset is utilized for different classifications simultaneously, as it is often the case in a business environment. Moreover, unlike previous approaches, this method does not require the re-testing of each new model; a strategy which could prove expensive in computational terms. The fundamental intention of this work is to make use of graphical models to elicit the visible structure of data and represent it as a network. Specifically, we investigate how a graphical model evolves by looking at the creation of new links, and the disappearance of existing ones, in different time periods. We perform this task in four steps. We compute the adjacency matrix of a graph in each period, we apply a function that maps each possible state of the adjacency matrix over time into a transition matrix. We use the information in the transition matrix to produce a metric to estimate the presence of a drift in the data. Eventually, we evaluate this method with both three real-world datasets and a synthetic one.(c) 2022 Elsevier Inc. All rights reserved.
KW - Bayesian logistic regression
KW - Drift estimation
KW - Graphical models
KW - Unsupervised learning
KW - Bayesian logistic regression
KW - Drift estimation
KW - Graphical models
KW - Unsupervised learning
UR - http://hdl.handle.net/10807/228811
U2 - 10.1016/j.ins.2022.05.056
DO - 10.1016/j.ins.2022.05.056
M3 - Article
SN - 0020-0255
VL - 606
SP - 786
EP - 804
JO - Information Sciences
JF - Information Sciences
ER -