Varrette, Sébastien, in 20th IEEE/ACM Intl. Symp. on Cluster, Cloud and Internet Computing (CCGrid'20) (2020, May)

With renewed global interest for Artificial Intelligence (AI) methods, the past decade has seen a myriad of new programming models and tools that enable better and faster Machine Learning (ML). More recently, a subset of ML known as Deep Learning (DL) has raised increased interest due to its inherent ability to tackle novel cognitive computing applications efficiently. DL allows computational models composed of multiple processing layers to learn, in an automated way, representations of data with multiple levels of abstraction, and can deliver higher predictive accuracy when trained on larger data sets. Based on Artificial Neural Networks (ANN), DL is now at the core of state-of-the-art voice recognition systems (which enable easy control over, e.g., Internet-of-Things (IoT) smart home appliances), self-driving car engines and online recommendation systems. The ecosystem of DL frameworks is fast evolving, as are the DL architectures that are shown to perform well on specialized tasks and to exploit GPU accelerators. For this reason, frequent performance evaluation of the DL ecosystem is required, especially since the advent of novel distributed training frameworks such as Horovod, which allow for scalable training across multiple computing resources. In this paper, the scalability of the reference DL frameworks (TensorFlow, Keras, MXNet and PyTorch) is evaluated on up-to-date High Performance Computing (HPC) resources to compare the efficiency of different implementations across several hardware architectures (CPU and GPU). Experimental results demonstrate that the DistributedDataParallel features of the PyTorch library seem to be the most efficient approach for distributing the training process across many devices, reaching a throughput speedup of 10.11 when training ResNet44 on the CIFAR10 dataset with 12 NVIDIA Tesla V100 GPUs.
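The CCGrid'20 study above singles out PyTorch's DistributedDataParallel as the most efficient mechanism in its benchmarks for spreading training across many GPUs. As a minimal, hypothetical sketch of that mechanism (not the paper's benchmarking code), the snippet below wraps a tiny placeholder model in DistributedDataParallel with one process per GPU; the model, the synthetic data standing in for ResNet44/CIFAR10 and the hyperparameters are all assumptions.

```python
# Minimal single-node DistributedDataParallel sketch (not the paper's code).
# Launch with: torchrun --nproc_per_node=<num_gpus> ddp_sketch.py
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

def main():
    # torchrun sets RANK, LOCAL_RANK and WORLD_SIZE in the environment.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model and synthetic data standing in for ResNet44 / CIFAR10.
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    data = TensorDataset(torch.randn(2048, 3, 32, 32), torch.randint(0, 10, (2048,)))
    sampler = DistributedSampler(data)          # shards the dataset across ranks
    loader = DataLoader(data, batch_size=128, sampler=sampler)

    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)                # reshuffle consistently across ranks
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            opt.zero_grad()
            loss_fn(model(x), y).backward()     # DDP all-reduces gradients here
            opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Throughput speedup in the paper's sense would then be measured by comparing the samples processed per second with N GPUs against the single-GPU baseline of the same script.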
Pinel, Frédéric, in Communications in Computer and Information Science (2020, February)

We present a procedure for the design of a Deep Neural Network (DNN) that estimates the execution time per batch for training a deep neural network on GPU accelerators. The estimator is destined to be embedded in the scheduler of a shared GPU infrastructure, capable of providing estimated training times for a wide range of network architectures when the user submits a training job. To this end, a very short and simple representation of a given DNN is chosen. In order to compensate for the limited degree of description of this basic network representation, a novel co-evolutionary approach is taken to fit the estimator. The training set for the estimator, i.e. DNNs, is evolved by an evolutionary algorithm that optimizes the accuracy of the estimator. In the process, the genetic algorithm evolves DNNs, generates Python-Keras programs and projects them onto the simple representation. The genetic operators are dynamic: they change with the estimator's accuracy in order to balance accuracy with generalization. Results show that, despite the low degree of information in the representation and the simple initial design of the predictor, co-evolving the training set performs better than a near-randomly generated population of DNNs.

Thanapol, Panissara, in 5th International Conference on Information Technology, Bangsaen, 21-22 October 2020 (2020)

Varrette, Sébastien, in Proc. of 13th Intl. Conf. on Parallel Processing and Applied Mathematics (PPAM 2019) (2019, December)

For large-scale systems such as data centers, energy efficiency has proven to be key for reducing capital and operational expenses as well as environmental impact. Power drainage of a system is closely related to the type and characteristics of the workload that the device is running. For this reason, this paper presents an automatic software tuning method for parallel program generation able to adapt to, and exploit, the hardware features available on a target computing system (such as an HPC facility or a cloud system) better than traditional compiler infrastructures. We propose a search-based approach combining both exact methods and approximated heuristics that evolves programs in order to find optimized configurations relying on an ever-increasing number of tunable knobs, i.e., code transformations and execution options (such as the number of OpenMP threads and/or the CPU frequency settings). The main objective is to outperform the configurations generated by traditional compiling infrastructures for selected KPIs, i.e., performance, energy and power usage (for both the CPU and DRAM), as well as the runtime. First experimental results tied to the local optimization phase of the proposed framework are encouraging, demonstrating between 8% and 41% improvement for all considered metrics on a reference benchmarking application (i.e., Linpack). This brings novel perspectives for the global optimization step currently under investigation within the presented framework, with the ambition to pave the way toward automatic tuning of energy-aware applications beyond the performance of current state-of-the-art compiler infrastructures.
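The PPAM 2019 abstract above describes searching over tunable knobs such as the number of OpenMP threads or the CPU frequency settings to optimize runtime and energy. The snippet below is a minimal, hypothetical illustration of such a knob sweep, not the paper's framework: it times a benchmark command under different OMP_NUM_THREADS values and keeps the fastest configuration. The benchmark command and the candidate thread counts are assumptions.

```python
# Hypothetical brute-force sweep over a single tuning knob (OMP_NUM_THREADS).
# The real framework searches many knobs with exact methods and heuristics;
# this only illustrates the measure-and-compare loop.
import os
import subprocess
import time

BENCHMARK_CMD = ["./xhpl"]            # placeholder for a Linpack-style benchmark
THREAD_COUNTS = [1, 2, 4, 8, 16, 28]  # candidate values for the knob

def run_once(num_threads: int) -> float:
    """Run the benchmark with a given thread count and return wall-clock time."""
    env = dict(os.environ, OMP_NUM_THREADS=str(num_threads))
    start = time.perf_counter()
    subprocess.run(BENCHMARK_CMD, env=env, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start

def sweep() -> int:
    results = {n: run_once(n) for n in THREAD_COUNTS}
    for n, t in sorted(results.items()):
        print(f"OMP_NUM_THREADS={n:3d}  runtime={t:8.2f} s")
    best = min(results, key=results.get)
    print(f"best configuration: OMP_NUM_THREADS={best}")
    return best

if __name__ == "__main__":
    sweep()
```

Extending the objective to the energy and power KPIs mentioned in the abstract would mean sampling CPU/DRAM power counters (for example RAPL on Intel platforms) around each run and comparing configurations on several metrics instead of runtime alone.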
Pinel, Frédéric, in 2017 3rd IEEE International Conference on Cybernetics (CYBCONF) (2017)

Pinel, Frédéric, Doctoral thesis (2014)

Plugaru, Valentin, Report (2014)

The increasing demand for High Performance Computing (HPC), paired with the higher power requirements of ever-faster systems, has led to the search for both performant and more energy-efficient architectures. This article compares and contrasts the performance and energy efficiency of two modern clusters, a traditional Intel Xeon cluster and a low-power ARM-based cluster, which are tested with the recently developed High Performance Conjugate Gradient (HPCG) benchmark and the ABySS, FASTA and MrBayes bioinformatics applications. We show that the ARM cluster achieves a higher Performance-per-Watt value and lower energy usage during the tests, but this does not offset the much faster job completion rate obtained by the Intel cluster, which makes the latter more suitable for the considered workloads given the disparity in the performance results.
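The report above ranks the two clusters on Performance-per-Watt and on energy used during the tests as well as on time-to-solution. The short sketch below shows how such a comparison is typically computed; every number in it is an invented placeholder, not a measurement from the report.

```python
# Illustrative energy-efficiency comparison; the figures are invented
# placeholders, not measurements from the report.
from dataclasses import dataclass

@dataclass
class RunResult:
    name: str
    gflops: float        # sustained performance during the benchmark
    avg_power_w: float   # average power draw of the cluster
    runtime_s: float     # time to complete the job

    @property
    def perf_per_watt(self) -> float:
        return self.gflops / self.avg_power_w            # GFLOPS per Watt

    @property
    def energy_to_solution_kj(self) -> float:
        return self.avg_power_w * self.runtime_s / 1e3   # kilojoules per job

runs = [
    RunResult("Intel Xeon cluster", gflops=90.0, avg_power_w=800.0, runtime_s=600.0),
    RunResult("ARM cluster",        gflops=12.0, avg_power_w=80.0,  runtime_s=3600.0),
]

for r in runs:
    print(f"{r.name:20s} {r.perf_per_watt:6.3f} GFLOPS/W  "
          f"{r.energy_to_solution_kj:7.1f} kJ  {r.runtime_s:6.0f} s")
```

With these made-up figures the ARM system wins on GFLOPS/W and on energy per job while the Xeon system finishes six times earlier, which mirrors the qualitative trade-off the report describes.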
Pinel, Frédéric, in Wyrzykowski, Roman; Dongarra, Jack (Eds.) Parallel Processing and Applied Mathematics, 10th International Conference, PPAM 2013, Warsaw, Poland, September 8-11, 2013 (2014)

Pinel, Frédéric, in International Journal of Hybrid Intelligent Systems (2014), 11(4), 287-302

Pinel, Frédéric, in Proc. of the 3rd Intl. Conf. on Cloud and Green Computing (CGC'13) (2013, October)

Pinel, Frédéric, in Nature and Biologically Inspired Computing (NaBIC), 2013 World Congress on (2013, August 13)

Pinel, Frédéric, in Journal of Parallel & Distributed Computing (2013), 73(1), 101-110

et al., in Parallel Computing (2013), 39(11), 709-736

Efficient resource allocation is a fundamental requirement in high performance computing (HPC) systems. Many projects dedicated to large-scale distributed computing systems have designed and developed resource allocation mechanisms with a variety of architectures and services. Our study reports a comprehensive survey describing resource allocation in various HPC systems. The aim of the work is to aggregate under a joint framework the existing solutions for HPC, and to provide a thorough analysis and characterization of the resource management and allocation strategies. Resource allocation mechanisms and strategies play a vital role in the performance improvement of all classes of HPC systems; a comprehensive discussion of the resource allocation strategies widely deployed in HPC environments is therefore required, which is one of the motivations of this survey. Moreover, we classify HPC systems into three broad categories, namely (a) cluster, (b) grid and (c) cloud systems, and define the characteristics of each class by extracting sets of common attributes. All of the aforementioned systems are cataloged into pure software and hybrid/hardware solutions. The system classification is used to identify the approaches followed by the implementations of existing resource allocation strategies that are widely presented in the literature.

Pinel, Frédéric, in Cluster Computing (2012)

Ruiz, Patricia, in Journal of Supercomputing (2012), 62(3), 1213-1240

Pecero, Johnatan, in International Congress on Computer Science Research (2011, October 28)

et al., in Cluster Computing (2011), 16(1), 3-15

Pinel, Frédéric, in Green Computing and Communications (GreenCom), 2011 IEEE/ACM International Conference on (2011)

Today's datacenters and large-scale enterprise computing are power hungry. A lot of research effort is devoted in industry and academia to addressing this challenging issue. In this context, a new type of enterprise computing platform is being investigated. This computing platform is composed of hundreds of millicomputers, each requiring orders of magnitude less power. However, this approach brings challenges that must be met in order to compete with current practice. This paper addresses two such critical challenges. First, it suggests how to decompose large applications into smaller tasks better suited to millicomputers. Then, it casts the performance-oriented and energy-efficient problem into a soft real-time scheduling problem, for which several algorithms are proposed and evaluated. Sensitivity analysis is used to provide insights into the model and to plan the evaluation of the scheduling algorithms. The contention found in multi-core millicomputing processors is also accounted for.
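The GreenCom'11 abstract above casts energy-efficient task placement on millicomputers as a soft real-time scheduling problem. As a minimal, hypothetical illustration of that framing rather than one of the paper's evaluated algorithms, the sketch below greedily assigns deadline-ordered tasks to the core that becomes free first and counts soft-deadline misses; the task set and deadlines are invented.

```python
# Toy soft real-time scheduler: greedy earliest-available-core assignment.
# Invented example, not one of the algorithms evaluated in the paper.
import heapq
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    duration: float   # estimated execution time on a millicomputer core
    deadline: float   # soft deadline; missing it degrades service quality

def schedule(tasks: list[Task], num_cores: int) -> int:
    """Assign each task to the core that frees up first; return deadline misses."""
    core_free_times = [0.0] * num_cores      # next instant each core is idle
    heapq.heapify(core_free_times)
    misses = 0
    # Serving the most urgent tasks first is a common soft real-time heuristic.
    for task in sorted(tasks, key=lambda t: t.deadline):
        start = heapq.heappop(core_free_times)
        finish = start + task.duration
        if finish > task.deadline:
            misses += 1
        heapq.heappush(core_free_times, finish)
    return misses

tasks = [Task("t1", 2.0, 4.0), Task("t2", 1.0, 2.0),
         Task("t3", 3.0, 5.0), Task("t4", 2.5, 4.5)]
print("deadline misses on 2 cores:", schedule(tasks, num_cores=2))
```

A real millicomputing scheduler would add per-core power models and the multi-core contention effects the paper accounts for, but the miss count above is the kind of soft real-time objective such algorithms trade off against energy.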
Pecero, Johnatan, in Bouvry, Pascal; González-Vélez, Horacio; Kolodziej, Joanna (Eds.) Intelligent Decision Systems in Large-Scale Distributed Environments, 362 (2011)

Pinel, Frédéric, in Computer and Information Technology (CIT), 2011 IEEE 11th International Conference on (2011)