Keywords:
Computational modeling, Scheduling algorithms, Data models, Engines, Virtual environments, Prefetching, Tools
Abstract:
Data locality is a significant factor that directly affects the performance of the MapReduce framework. Several previous works have proposed alternative scheduling algorithms that improve performance by increasing data locality. However, these studies focused on data locality in physical MapReduce clusters. As MapReduce clusters are increasingly deployed in virtual environments, an evaluation better suited to such environments may be necessary. This study adopts a simulation-based approach in which five scheduling algorithms were simulated. WorkflowSim was extended with three implemented modules to assess a new performance measure called "data locality ratio". Comparison of the simulation results reveals interesting findings. The proposed implementation can be used to assess the data locality ratio, allowing users to efficiently select and tune a scheduler and system configuration suitable for an environment prior to its actual physical MapReduce deployment.
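The abstract does not define "data locality ratio" precisely. The sketch below assumes one plausible reading: the fraction of map tasks that the scheduler places on a node holding a replica of the task's input block. It also assumes that, in a virtual environment, a task on a different VM but the same physical host may be worth counting separately. All names here (`MapTask`, `data_locality_ratio`, `locality_breakdown`, `host_of`) are hypothetical illustrations, not the modules or API of the authors' WorkflowSim extension.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MapTask:
    # Hypothetical record of one scheduled map task.
    task_id: str
    replica_nodes: frozenset  # VMs holding a replica of this task's input block
    assigned_node: str        # VM the scheduler placed the task on


def data_locality_ratio(tasks):
    """Assumed definition: data-local map tasks / total map tasks."""
    if not tasks:
        return 0.0
    local = sum(1 for t in tasks if t.assigned_node in t.replica_nodes)
    return local / len(tasks)


def locality_breakdown(tasks, host_of):
    """Classify each task as VM-local, host-local, or remote.

    host_of is an assumed mapping from VM name to physical host name.
    """
    counts = {"vm_local": 0, "host_local": 0, "remote": 0}
    for t in tasks:
        if t.assigned_node in t.replica_nodes:
            counts["vm_local"] += 1
        elif host_of[t.assigned_node] in {host_of[n] for n in t.replica_nodes}:
            # Different VM, but co-located on the same physical machine.
            counts["host_local"] += 1
        else:
            counts["remote"] += 1
    return counts


host_of = {"vm1": "h1", "vm2": "h1", "vm3": "h2", "vm4": "h2"}
tasks = [
    MapTask("m1", frozenset({"vm1", "vm2"}), "vm1"),  # VM-local
    MapTask("m2", frozenset({"vm2", "vm3"}), "vm4"),  # host-local via vm3 on h2
    MapTask("m3", frozenset({"vm3"}), "vm3"),         # VM-local
    MapTask("m4", frozenset({"vm1"}), "vm3"),         # remote (h1 vs h2)
]

print(data_locality_ratio(tasks))          # 0.5
print(locality_breakdown(tasks, host_of))  # {'vm_local': 2, 'host_local': 1, 'remote': 1}
```

Comparing this ratio across schedulers on the same simulated workload is one way to rank them before a physical deployment. The separate host-local count is an illustrative extension for virtualized clusters, not something the abstract states.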