TY - JOUR
T1 - Predicting the performance of big data applications on the cloud
AU - Ardagna, D.
AU - Barbierato, Enrico
AU - Gianniti, E.
AU - Gribaudo, M.
AU - Pinto, T. B.M.
AU - Da Silva, A. P.C.
AU - Almeida, J. M.
PY - 2020
Y1 - 2020
N2 - Data science applications have become widespread as a means to extract knowledge from large datasets. Such applications are often characterized by highly heterogeneous and irregular data access patterns, thus often being referred to as big data applications. Such characteristics make the application execution quite challenging for existing software and hardware infrastructures to meet their resource demands. The cloud computing paradigm, in turn, offers a natural hosting solution to such applications since its on-demand pricing model allows allocating effectively computing resources according to application’s needs. However, these properties impose extra challenge to the accurate performance prediction of cloud-based applications, which is a key step to adequate capacity planning and managing of the hosting infrastructure. In this article, we tackle this challenge by exploring three modeling approaches for predicting the performance of big data applications running on the cloud. We evaluate two queuing-based analytical models and dagSim, a fast ad-hoc simulator, in various scenarios based on different applications and infrastructure setups. The considered approaches are compared in terms of prediction accuracy and execution time. Our results indicate that our two best approaches, one analytical model and dagSim, can predict average application execution times with only up to a 7% relative error, on average. Moreover, a comparison with the widely used event-based simulator available with the Java Modeling Tool (JMT) suite demonstrates that both the analytical model and dagSim run very fast, requiring at least two orders of magnitude lower execution time than JMT while providing slightly better accuracy, being thus practical for online prediction.
KW - Apache Spark
KW - Performance prediction
UR - http://hdl.handle.net/10807/155056
U2 - 10.1007/s11227-020-03307-w
DO - 10.1007/s11227-020-03307-w
M3 - Article
SN - 0920-8542
SP - N/A-N/A
JO - THE JOURNAL OF SUPERCOMPUTING
JF - THE JOURNAL OF SUPERCOMPUTING
ER -