Abstract :
[en] Due to serving several purposes simultaneously, running scientific workflows on dynamic environments such as cloud computing, has become multi-objective scheduling. Among these purposes, Cost and Makespan are probably the most two primitive objectives. Another critical factor in a large-scale scientific workflow is tremendous amount of data during execution. Therefore, this work also includes Data Movement as an additional objective as it has a major impact on network utilization and energy consumption in network equipment in cloud data center. In considering these three objectives, this work proposes a framework for scheduling solutions which combines a new nodes clustering technique in Directed Acyclic Graph (DAG) model known as Multilevel Dependent Node Clustering (MDNC) and the multiobjective optimization, Extreme Nondominated Sorting Genetic Algorithm-III (E-NSGA-III). E-NSGAIII is the recent extension of Nondominated Sorting Genetic Algorithm (NSGA-III). Five well-known scientific workflows, CyberShake, Epigenomics, LIGO, Montage, and SIPHT are selected as testbeds, while the commonly known Hypervolume is chosen as the performance metric. In this work, MDNC is also experimented with both NSGA-III. Comparison among three approaches, E-NAGA-III alone, E-NAGA-III with Peer-to-Peer clustering and E-NAGA-III with MDNC are carried out. The superiority of the proposed framework among them and its limitation are discussed.
Scopus citations®
without self-citations
16