TY - JOUR
T1 - Optimal Resource Allocation of Cloud-Based Spark Applications
AU - Lattuada, Marco
AU - Barbierato, Enrico
AU - Gianniti, Eugenio
AU - Ardagna, Danilo
PY - 2020
Y1 - 2020
N2 - Nowadays, the big data paradigm is consolidating its central position in the industry, as well as in society at large. Lots of
applications, across disparate domains, operate on huge amounts of data and offer great advantages both for business and research.
According to analysts, cloud computing adoption is steadily increasing to support big data analyses and Spark is expected to take a
prominent market position for the next decade.
As big data applications gain more and more importance over time and given the dynamic nature of cloud resources, it is fundamental
to develop an intelligent resource management system to provide Quality of Service guarantees to end-users.
This paper presents a set of run-time optimization-based resource management policies for advanced big data analytics. Users submit
Spark applications characterized by a priority and by a hard or soft deadline. Optimization policies address two scenarios:
i) identification of the minimum capacity to run a Spark application within the deadline; ii) re-balance of the cloud resources in case
of heavy load, minimizing the weighted tardiness of soft-deadline applications. The solution relies on an initial non-linear programming
model formulation and a search space exploration based on simulation-optimization procedures. Spark application execution times are
estimated by relying on a gamut of techniques, including machine learning, approximated analyses, and simulation. The benefits of
the approach are evaluated on Microsoft Azure HDInsight and on a private cloud cluster based on POWER8 by considering the
TPC-DS industry benchmark and SparkBench. The results obtained in the first scenario demonstrate that the percentage error of the
prediction of the optimal resource usage with respect to system measurement and exhaustive search is in the range 4%-29% while
literature-based techniques present an average error in the range 6%-63%. Moreover, in the second scenario, the proposed algorithms
can address complex problems like computing the optimal redistribution of resources among tens of applications in less than a minute
with an error of 8% on average. On the same considered tests, literature-based approaches obtain an average error of about 57%.
KW - Big Data
KW - Quality of Service
UR - http://hdl.handle.net/10807/155055
U2 - 10.1109/TCC.2020.2985682
DO - 10.1109/TCC.2020.2985682
M3 - Article
SN - 2168-7161
SP - N/A-N/A
JO - IEEE Transactions on Cloud Computing
JF - IEEE Transactions on Cloud Computing
ER -