Chen D, Zhang R, Qiu R G. Noninvasive MapReduce performance tuning using multiple tuning methods on Hadoop[J]. IEEE Systems Journal, 2021, 15(2): 2906-2917.
Representative paper summary:
There are more than 190 configuration parameters affecting the performance of MapReduce jobs on Hadoop. For general users without deep knowledge of Hadoop configuration, tuning the parameters of a MapReduce job for optimal performance is time-consuming and tedious. Therefore, a self-tuning system is needed that improves MapReduce performance automatically and efficiently in complex Hadoop environments. This article explores multiple tuning methods to improve the efficiency of tuning MapReduce performance on Hadoop. The proposed Catla system employs succinct templates and suitable MapReduce algorithm schemes that facilitate the tuning and optimization of MapReduce performance. A comprehensive evaluation of the Catla system, supported by multiple tuning approaches, is discussed in this article. Direct search-based and derivative-free optimization-based tuning techniques, aimed at improving efficiency and usability, are evaluated through a series of tuning experiments. The experimental results reveal that our work can identify optimal Hadoop parameters for deployed MapReduce jobs in a noninvasive, flexible, automated, and comprehensive manner.
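To make the idea of direct-search tuning concrete, here is a minimal sketch in Python. It is not Catla's code or API. It enumerates every combination in a small, hypothetical grid over three real Hadoop properties (`mapreduce.job.reduces`, `mapreduce.task.io.sort.mb`, `mapreduce.map.output.compress`) and scores each one with `simulated_runtime`, a made-up cost model standing in for actually submitting the job and timing it. The candidate values and coefficients are assumptions chosen only for illustration.

```python
import itertools

# Hypothetical search space: three real Hadoop MapReduce properties,
# with candidate values chosen arbitrarily for illustration.
SEARCH_SPACE = {
    "mapreduce.job.reduces": [2, 4, 8, 16],
    "mapreduce.task.io.sort.mb": [100, 200, 400],
    "mapreduce.map.output.compress": [False, True],
}


def simulated_runtime(config):
    """Toy cost model standing in for submitting and timing a real job.

    Returns a made-up runtime in seconds; the coefficients are arbitrary.
    """
    reduces = config["mapreduce.job.reduces"]
    sort_mb = config["mapreduce.task.io.sort.mb"]
    compress = config["mapreduce.map.output.compress"]
    runtime = 300.0 / reduces + 3.0 * reduces    # toy trade-off: parallelism vs. per-task overhead
    runtime += 4000.0 / sort_mb + 0.1 * sort_mb  # toy trade-off: fewer spills vs. memory pressure
    if compress:
        runtime -= 15.0                          # toy effect: less shuffle traffic
    return runtime


def grid_search(space, cost):
    """Exhaustive direct search: evaluate every combination, keep the cheapest."""
    names = list(space)
    best_config, best_cost = None, float("inf")
    for values in itertools.product(*(space[name] for name in names)):
        config = dict(zip(names, values))
        c = cost(config)
        if c < best_cost:
            best_config, best_cost = config, c
    return best_config, best_cost


if __name__ == "__main__":
    best, runtime = grid_search(SEARCH_SPACE, simulated_runtime)
    print(best, runtime)
```

Exhaustive enumeration is the simplest form of direct search and needs no gradient information, but the number of job runs grows multiplicatively with each added parameter (4 × 3 × 2 = 24 here), which is why it quickly becomes impractical when more than 190 parameters are in play.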
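Performance tuning of big data analytics platforms has a significant impact on big data analytics applications across many fields. For example, more than 190 configuration parameters affect the performance of MapReduce jobs on the Hadoop platform. For general users without in-depth knowledge of Hadoop configuration, tuning the parameters of a MapReduce job for optimal performance is time-consuming and tedious. We therefore need an automatic performance tuning method that configures and improves MapReduce job performance automatically and efficiently in complex distributed Hadoop environments. The open-source Catla system proposed in this article uses concise configuration templates and improved MapReduce design schemes to effectively support the autonomous tuning and optimization of MapReduce job performance. We comprehensively evaluate how job performance changes when Catla is supported by multiple tuning methods. Through a series of tuning experiments, we compare tuning techniques based on direct search and on derivative-free optimization, with the aim of improving the efficiency and usability of parameter configuration. The experimental results show that our work achieves noninvasive, flexible, automated, and comprehensive automatic optimization of Hadoop platform parameters.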
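The following Python sketch illustrates one classic derivative-free method, compass (coordinate pattern) search, which relies only on observed cost values: it probes each parameter up and down by a step, moves on any improvement, and halves all steps when no probe helps. This is an assumption-laden illustration, not the algorithm Catla uses. `toy_cost` is a hypothetical smooth function of two real Hadoop properties, `mapreduce.task.io.sort.mb` and `mapreduce.job.reduce.slowstart.completedmaps`, with its minimum placed arbitrarily at (256, 0.8); the starting point, step sizes, and bounds are likewise invented.

```python
def toy_cost(x):
    """Hypothetical smooth stand-in for a measured job runtime (seconds).

    x[0] ~ mapreduce.task.io.sort.mb, x[1] ~ mapreduce.job.reduce.slowstart.completedmaps.
    The minimum (60 s at 256 MB, 0.8) is arbitrary and chosen for illustration.
    """
    sort_mb, slowstart = x
    return (sort_mb - 256.0) ** 2 / 1000.0 + 100.0 * (slowstart - 0.8) ** 2 + 60.0


def compass_search(cost, x0, steps, lower, upper, tol=1e-4, max_evals=10_000):
    """Derivative-free compass search with box bounds.

    Returns (best_point, best_cost, number_of_cost_evaluations).
    """
    x = list(x0)
    fx = cost(x)
    evals = 1
    init_steps = list(steps)
    steps = list(steps)
    # Stop when steps have shrunk below tol (relative to their initial size)
    # or when the evaluation budget is used up (checked once per sweep).
    while max(s / s0 for s, s0 in zip(steps, init_steps)) > tol and evals < max_evals:
        improved = False
        for i in range(len(x)):
            for sign in (1.0, -1.0):
                trial = list(x)
                # Probe one coordinate, clipped to its bounds.
                trial[i] = min(upper[i], max(lower[i], x[i] + sign * steps[i]))
                if trial[i] == x[i]:
                    continue  # clipped back onto the current point; skip it
                f_trial = cost(trial)
                evals += 1
                if f_trial < fx:
                    x, fx = trial, f_trial
                    improved = True
                    break  # accept the first improving probe for this coordinate
        if not improved:
            # No probe helped: refine the search by halving every step.
            steps = [s / 2.0 for s in steps]
    return x, fx, evals


if __name__ == "__main__":
    best, best_cost, n_evals = compass_search(
        toy_cost, x0=[100.0, 0.05], steps=[64.0, 0.2],
        lower=[50.0, 0.0], upper=[1000.0, 1.0],
    )
    print(f"sort.mb={best[0]:.2f}, slowstart={best[1]:.3f}, cost={best_cost:.3f}, evals={n_evals}")
```

In a real tuner each call to the cost function would mean running the MapReduce job once, and measured runtimes are noisy, so the evaluation budget (`max_evals` here) matters far more than it does with a cheap synthetic function.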