TY - JOUR
T1 - A methodological approach for time series analysis and forecasting of web dynamics
AU - Calzarossa, Maria Carla
AU - Della Vedova, Marco Luigi
AU - Massari, Luisa
AU - Nebbione, Giuseppe
AU - Tessera, Daniele
PY - 2019
Y1 - 2019
N2 - The web is a complex information ecosystem that provides a large variety of content changing over time as a consequence of the combined effects of management policies, user interactions and external events. These highly dynamic scenarios challenge technologies dealing with discovery, management and retrieval of web content. In this paper, we address the problem of modeling and predicting web dynamics in the framework of time series analysis and forecasting. We present a general methodological approach that allows the identification of the patterns describing the behavior of the time series, the formulation of suitable models and the use of these models for predicting the future behavior. Moreover, to improve the forecasts, we propose a method for detecting and modeling the spiky patterns that might be present in a time series. To test our methodological approach, we analyze the temporal patterns of page uploads of the Reuters news agency website over one year. We discover that the upload process is characterized by a diurnal behavior and by a much larger number of uploads during weekdays with respect to weekend days. Moreover, we identify several sudden spikes and a daily periodicity. The overall model of the upload process – obtained as a superposition of the models of its individual components – accurately fits the data, including most of the spikes.
KW - ARMA models
KW - Forecasting
KW - Performance modeling
KW - Search engines
KW - Temporal patterns
KW - Time series analysis
KW - Web dynamics
UR - http://hdl.handle.net/10807/142862
UR - https://www.springer.com/series/558
U2 - 10.1007/978-3-662-59540-4_7
DO - 10.1007/978-3-662-59540-4_7
M3 - Article
SN - 2190-9288
VL - 11610
SP - 128
EP - 143
JO - TRANSACTIONS ON COMPUTATIONAL COLLECTIVE INTELLIGENCE
JF - TRANSACTIONS ON COMPUTATIONAL COLLECTIVE INTELLIGENCE
ER -