Representative paper
SMART: Speedup Job Completion Time by Scheduling Reduce Tasks
Abstract:
Distributed computing systems have been widely used as the amount of data grows exponentially in the era of information explosion. Job completion time (JCT) is a major metric for assessing their effectiveness. How to reduce the JCT for these systems through reasonable scheduling has become a hot issue in both industry and academia. Data skew is a common phenomenon that can compromise the performance of such distributed computing systems. This paper proposes SMART, which can effectively reduce the JCT through handling the data skew during the reducing phase. SMART predicts the size of reduce tasks based on part of the completed map tasks and then enforces largest-first scheduling in the reducing phase according to the predicted reduce task size. SMART makes minimal modifications to the original Hadoop with only 20 additional lines of code and is readily deployable. The robustness and the effectiveness of SMART have been evaluated with a real-world cluster against a large number of datasets. Experiments show that SMART reduces JCT by up to 6.47%, 9.26%, and 13.66% for Terasort, WordCount and InvertedIndex respectively with the Purdue MapReduce benchmarks suite (PUMA) dataset.
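The two steps named in the abstract (predicting reduce task sizes from part of the completed map tasks, then scheduling the largest tasks first) can be illustrated with a small sketch. This is not the paper's Hadoop patch: the function names, the per-partition byte matrix, the linear extrapolation, and the assumption that a reduce task's run time is proportional to its input size are all hypothetical choices made for illustration.

```python
import heapq


def predict_reduce_sizes(completed_map_outputs, total_map_tasks):
    """Extrapolate each reduce partition's size from the finished map tasks.

    completed_map_outputs[m][r] is the number of bytes that finished map
    task m emitted for reduce partition r (hypothetical layout).
    Assumes the unfinished map tasks resemble the finished ones.
    """
    done = len(completed_map_outputs)
    if done == 0:
        raise ValueError("need at least one completed map task")
    num_reducers = len(completed_map_outputs[0])
    scale = total_map_tasks / done
    return [
        scale * sum(row[r] for row in completed_map_outputs)
        for r in range(num_reducers)
    ]


def largest_first_order(predicted_sizes):
    """Return reduce task indices sorted by predicted size, largest first."""
    return sorted(
        range(len(predicted_sizes)),
        key=lambda r: predicted_sizes[r],
        reverse=True,
    )


def makespan(sizes, order, slots):
    """Simulate list scheduling: each task runs on the earliest-free slot.

    Assumes run time is proportional to task size (an illustrative
    simplification, not a claim about real Hadoop behaviour).
    """
    free_at = [0.0] * slots
    heapq.heapify(free_at)
    finish = 0.0
    for r in order:
        start = heapq.heappop(free_at)
        end = start + sizes[r]
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish


# One of two map tasks has finished; partition 3 is skewed.
sizes = predict_reduce_sizes([[1, 1, 1, 3]], total_map_tasks=2)
fifo = makespan(sizes, [0, 1, 2, 3], slots=2)
smart_like = makespan(sizes, largest_first_order(sizes), slots=2)
print(sizes, fifo, smart_like)
```

In this toy setting, launching the skewed partition first keeps it from being the last task to start, which shortens the overall reduce phase compared with launching tasks in index order.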
Keywords:
Authors:
Jia-Qing Dong; Ze-Hao He; Yuan-Yuan Gong; Pei-Wen Yu; Chen Tian; Wan-Chun Dou; Gui-Hai Chen; Nai Xia; Hao-Ran Guan
Affiliations:
State Key Laboratory of Media Convergence and Communication, Communication University of China, Beijing 100024, China; State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China; School of Computer Science, The University of Sydney, Sydney NSW 2006, Australia
Citation:
[1] Jia-Qing Dong; Ze-Hao He; Yuan-Yuan Gong; Pei-Wen Yu; Chen Tian; Wan-Chun Dou; Gui-Hai Chen; Nai Xia; Hao-Ran Guan. SMART: Speedup Job Completion Time by Scheduling Reduce Tasks[J]. Journal of Computer Science and Technology, 2022(04): 763-778
Class A terms:
Terasort,WordCount,InvertedIndex
Class B terms:
SMART,Speedup,Job,Completion,Time,by,Scheduling,Tasks,Distributed,computing,systems,have,been,widely,used,amount,grows,exponentially,information,explosion,completion,JCT,major,metric,assessing,their,effectiveness,How,these,through,reasonable,scheduling,become,hot,issue,both,industry,academia,Data,skew,common,phenomenon,that,can,compromise,performance,such,distributed,This,paper,proposes,which,effectively,handling,during,reducing,phase,predicts,size,tasks,part,completed,map,then,enforces,largest,first,according,predicted,makes,minimal,modifications,original,Hadoop,only,additional,lines,code,readily,deployable,robustness,evaluated,real,world,cluster,against,number,datasets,Experiments,show,reduces,respectively,Purdue,MapReduce,benchmarks,suite,PUMA
AB value:
0.599394
Similar papers
Annotating TSSs in Multiple Cell Types Based on DNA Sequence and RNA-seq Data via DeeReCT-TSS
Authors: Juexiao Zhou; Bin Zhang; Haoyang Li; Longxi Zhou; Zhongxiao Li; Yongkang Long; Wenkai Han; Mengran Wang; Huanhuan Cui; Jingjing Li; Wei Chen; Xin Gao
Affiliations: Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia; Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia; Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China; Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China; Academy for Advanced Interdisciplinary Studies, Southern University of Science and Technology, Shenzhen 518055, China