Representative Literature
A Primal-Dual SGD Algorithm for Distributed Nonconvex Optimization
Abstract:
The distributed nonconvex optimization problem of minimizing a global cost function, formed as a sum of n local cost functions, by using only local information exchange is considered. This problem is an important component of many machine learning techniques with data parallelism, such as deep learning and federated learning. We propose a distributed primal-dual stochastic gradient descent (SGD) algorithm, suitable for arbitrarily connected communication networks and any smooth (possibly nonconvex) cost functions. We show that the proposed algorithm achieves the linear speedup convergence rate O(1/√(nT)) for general nonconvex cost functions and the linear speedup convergence rate O(1/(nT)) when the global cost function satisfies the Polyak-Łojasiewicz (P-Ł) condition, where T is the total number of iterations. We also show that the output of the proposed algorithm with constant parameters linearly converges to a neighborhood of a global optimum. We demonstrate through numerical experiments the efficiency of our algorithm in comparison with the baseline centralized SGD and recently proposed distributed SGD algorithms.
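The update the abstract describes couples a primal consensus step over the communication network with a dual variable that accumulates disagreement between neighbors. Below is a minimal NumPy sketch of a primal-dual SGD update of this general form; the ring topology, the quadratic local costs, the gradient-noise model, and the step-size parameters eta, alpha, beta are illustrative assumptions, not the paper's exact algorithm or tuning.

```python
# Hedged sketch of a distributed primal-dual SGD update: each node i keeps a
# primal copy X[i] and a dual variable V[i], mixes with neighbors through the
# graph Laplacian, and takes noisy gradient steps. All constants below are
# assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)

n, d, T = 8, 5, 2000               # nodes, dimension, iterations
eta, alpha, beta = 0.05, 1.0, 1.0  # assumed constant step-size parameters

# Ring communication graph; Laplacian L = D - A (any connected graph works).
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

# Smooth local costs f_i(x) = 0.5 * ||x - c_i||^2, so the global optimum is
# the mean of the c_i; a nonconvex f_i would only change grad_i below.
C = rng.normal(size=(n, d))

def stoch_grad(i, x):
    """Unbiased noisy gradient of f_i at x (additive Gaussian noise)."""
    return (x - C[i]) + 0.1 * rng.normal(size=d)

X = rng.normal(size=(n, d))   # primal iterates, one row per node
V = np.zeros((n, d))          # dual iterates enforcing consensus

for _ in range(T):
    G = np.array([stoch_grad(i, X[i]) for i in range(n)])
    LX = L @ X                # one round of neighbor information exchange
    X, V = X - eta * (alpha * LX + beta * V + G), V + eta * beta * LX

print("consensus error:", np.linalg.norm(X - X.mean(axis=0)))
print("distance to optimum:", np.linalg.norm(X.mean(axis=0) - C.mean(axis=0)))
```

With the constant parameters above, the iterates settle into a noise-limited neighborhood of the global optimum rather than converging exactly, consistent with the abstract's statement about the algorithm's behavior under constant parameters.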
Keywords:
Authors:
Xinlei Yi;Shengjun Zhang;Tao Yang;Tianyou Chai;Karl Henrik Johansson
Author Affiliations:
Division of Decision and Control Systems, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, and also affiliated with Digital Futures, Stockholm 10044, Sweden; Department of Electrical Engineering, University of North Texas, Denton, TX 76203, USA; State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, China
Citation:
[1] Xinlei Yi; Shengjun Zhang; Tao Yang; Tianyou Chai; Karl Henrik Johansson. A Primal-Dual SGD Algorithm for Distributed Nonconvex Optimization[J]. IEEE/CAA Journal of Automatica Sinica, 2022(05): 812-833.
Class A:
Primal,Polyak,Łojasiewicz
Class B:
Dual,SGD,Algorithm,Distributed,Nonconvex,Optimization,distributed,nonconvex,optimization,problem,minimizing,global,cost,formed,by,sum,local,functions,using,information,exchange,considered,This,important,component,many,machine,learning,techniques,data,parallelism,such,deep,federated,We,primal,dual,stochastic,gradient,descent,suitable,arbitrarily,connected,communication,networks,smooth,possibly,show,that,proposed,achieves,speedup,convergence,nT,general,when,satisfies,condition,where,total,number,iterations,also,output,constant,parameters,linearly,converges,neighborhood,optimum,demonstrate,through,numerical,experiments,efficiency,our,comparison,baseline,centralized,recently,algorithms
AB Value:
0.570459
Similar Literature
Toward High-Performance Delta-Based Iterative Processing with a Group-Based Approach
Hui Yu; Xin-Yu Jiang; Jin Zhao; Hao Qi; Yu Zhang; Xiao-Fei Liao; Hai-Kun Liu; Fu-Bing Mao; Hai Jin - National Engineering Research Center for Big Data Technology and System, Huazhong University of Science and Technology, Wuhan 430074, China; Service Computing Technology and System Laboratory, Huazhong University of Science and Technology, Wuhan 430074, China; Cluster and Grid Computing Laboratory, Huazhong University of Science and Technology, Wuhan 430074, China; School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China