Featured Article
Increasing Momentum-Like Factors: A Method for Reducing Training Errors on Multiple GPUs
Abstract:
In distributed training, increasing the batch size can improve parallelism, but it can also bring many difficulties to the training process and cause training errors. In this work, we investigate the occurrence of training errors in theory and train ResNet-50 on CIFAR-10 using Stochastic Gradient Descent (SGD) and Adaptive Moment Estimation (Adam), while keeping the total batch size in the parameter server constant and lowering the batch size on each Graphics Processing Unit (GPU). A new method that considers momentum to eliminate training errors in distributed training is proposed. We define a Momentum-like Factor (MF) to represent the influence of former gradients on parameter updates in each iteration. Then, we modify the MF values and conduct experiments to explore how different MF values influence the training performance based on SGD, Adam, and Nesterov accelerated gradient. Experimental results reveal that increasing MFs is a reliable method for reducing training errors in distributed training. An analysis of convergence conditions in distributed training, with consideration of a large batch size and multiple GPUs, is also presented.
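The abstract does not define the Momentum-like Factor (MF) precisely. As a rough, hedged illustration, assume MF plays the role of the momentum coefficient `mu` in SGD with momentum, which weights how strongly former gradients (accumulated in a velocity term) influence each parameter update; "increasing MFs" would then correspond to raising `mu`. The function name and all parameter values below are hypothetical, chosen only for the sketch:

```python
def sgd_momentum_step(theta, velocity, grad, lr=0.1, mu=0.9):
    """One SGD-with-momentum update on a scalar parameter.

    `mu` (the momentum coefficient, standing in for the paper's MF)
    controls the influence of former gradients, accumulated in
    `velocity`, on the current update.
    """
    velocity = mu * velocity + grad   # blend past gradients with the new one
    theta = theta - lr * velocity     # step against the accumulated direction
    return theta, velocity

# Toy usage: minimize f(x) = x^2 (gradient 2x), starting from x = 1.0.
x, v = 1.0, 0.0
for _ in range(100):
    x, v = sgd_momentum_step(x, v, 2 * x)
# x has decayed toward the minimizer 0; with mu = 0.9 the iterates
# oscillate while shrinking in magnitude.
```

In a multi-GPU data-parallel setting, `grad` would be the gradient averaged across workers for the global batch; this sketch only shows the single-parameter update rule, not the distributed aggregation.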
Keywords:
CLC Number:
Authors:
Yu Tang;Zhigang Kan;Lujia Yin;Zhiquan Lai;Zhaoning Zhang;Linbo Qiao;Dongsheng Li
Author Affiliation:
Science and Technology on Parallel and Distributed Processing Laboratory, and College of Computer Science and Technology, National University of Defense Technology, Changsha 473000, China
Source:
Citation:
[1] Yu Tang; Zhigang Kan; Lujia Yin; Zhiquan Lai; Zhaoning Zhang; Linbo Qiao; Dongsheng Li. Increasing Momentum-Like Factors: A Method for Reducing Training Errors on Multiple GPUs [J]. Tsinghua Science and Technology, 2022(01): 114-126.