Nowadays, the big data paradigm is consolidating its central position in the industry, as well as in society at large. Lots of applications, across disparate domains, operate on huge amounts of data and offer great advantages both for business and research. According to analysts, cloud computing adoption is steadily increasing to support big data analyses and Spark is expected to take a prominent market position for the next decade. As big data applications gain more and more importance over time and given the dynamic nature of cloud resources, it is fundamental to develop an intelligent resource management system to provide Quality of Service guarantees to end-users. This paper presents a set of run-time optimization-based resource management policies for advanced big data analytics. Users submit Spark applications characterized by a priority and by a hard or soft deadline. Optimization policies address two scenarios: i) identification of the minimum capacity to run a Spark application within the deadline; ii) re-balance of the cloud resources in case of heavy load, minimising the weighted soft deadline application tardiness. The solution relies on an initial non-linear programming model formulation and a search space exploration based on simulation-optimization procedures. Spark application execution times are estimated by relying on a gamut of techniques, including machine learning, approximated analyses, and simulation. The benefits of the approach are evaluated on Microsoft Azure HDInsight and on a private cloud cluster based on POWER8 by considering the TPC-DS industry benchmark and SparkBench. The results obtained in the first scenario demonstrate that the percentage error of the prediction of the optimal resource usage with respect to system measurement and exhaustive search is the range 4%-29% while literature-based techniques present an average error in the range 6%-63%. Moreover, in the second scenario, the proposed algorithms can address complex problems like computing the optimal redistribution of resources among tens of applications in less than a minute with an error of 8% on average. On the same considered tests, literature-based approaches obtain an average error of about 57%.
- Big Data
- Quality of Service