Representative paper
LIDAR: learning from imperfect demonstrations with advantage rectification
Abstract:
In actor-critic reinforcement learning (RL) algorithms, function estimation errors are known to cause ineffective random exploration at the beginning of training, and lead to overestimated value estimates and suboptimal policies. In this paper, we address the problem by executing advantage rectification with imperfect demonstrations, thus reducing the function estimation errors. Pretraining with expert demonstrations has been widely adopted to accelerate the learning process of deep reinforcement learning when simulations are expensive to obtain. However, existing methods, such as behavior cloning, often assume the demonstrations contain other information or labels with regard to performances, such as optimal assumption, which is usually incorrect and useless in the real world. In this paper, we explicitly handle imperfect demonstrations within the actor-critic RL frameworks, and propose a new method called learning from imperfect demonstrations with advantage rectification (LIDAR). LIDAR utilizes a rectified loss function to merely learn from selective demonstrations, which is derived from a minimal assumption that the demonstrating policies have better performances than our current policy. LIDAR learns from contradictions caused by estimation errors, and in turn reduces estimation errors. We apply LIDAR to three popular actor-critic algorithms, DDPG, TD3 and SAC, and experiments show that our method can observably reduce the function estimation errors, effectively leverage demonstrations far from the optimal, and outperform state-of-the-art baselines consistently in all the scenarios.
Keywords:
Authors:
Xiaoqin ZHANG; Huimin MA; Xiong LUO; Jian YUAN
Affiliations:
Department of EE, Tsinghua University, Beijing 100084, China; School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
Source:
Citation format:
[1] Xiaoqin ZHANG; Huimin MA; Xiong LUO; Jian YUAN. LIDAR: learning from imperfect demonstrations with advantage rectification [J]. Frontiers of Computer Science, 2022(01): 53-62
Class A:
ineffec,rectifi,Pretraining,observably
Class B:
LIDAR,learning,from,imperfect,demonstrations,advantage,rectification,In,actor,critic,reinforcement,RL,function,estimation,errors,are,known,random,exploration,beginning,lead,overestimated,value,estimates,suboptimal,policies,this,paper,address,problem,by,executing,thus,reducing,expert,has,been,widely,adopted,accelerate,process,deep,when,simulations,expensive,obtain,However,existing,methods,such,behavior,cloning,often,assume,contain,other,information,labels,regard,performances,assumption,which,usually,incorrect,useless,real,world,explicitly,handle,within,frameworks,propose,new,called,utilizes,rectified,loss,merely,selective,derived,minimal,that,demonstrating,have,better,than,our,current,policy,learns,contradictions,caused,turn,reduces,We,apply,three,popular,algorithms,DDPG,TD3,SAC,experiments,show,can,effectively,leverage,far,outperform,state,art,baselines,consistently,scenarios
AB value:
0.519248
Similar papers
Soliton formation and spectral translation into visible on CMOS-compatible 4H-silicon-carbide-on-insulator platform
Chengli Wang; Jin Li; Ailun Yi; Zhiwei Fang; Liping Zhou; Zhe Wang; Rui Niu; Yang Chen; Jiaxiang Zhang; Ya Cheng; Junqiu Liu; Chun-Hua Dong; Xin Ou - State Key Laboratory of Functional Materials for Informatics, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, 200050 Shanghai, China; The Center of Materials Science and Optoelectronics Engineering, University of Chinese Academy of Sciences, 100049 Beijing, China; CAS Key Laboratory of Quantum Information, University of Science and Technology of China, 230026 Hefei, China; CAS Center for Excellence in Quantum Information and Quantum Physics, University of Science and Technology of China, 230026 Hefei, China; The Extreme Optoelectromechanics Laboratory (XXL), School of Physics and Electronic Science, East China Normal University, 200241 Shanghai, China; State Key Laboratory of High Field Laser Physics and CAS Center for Excellence in Ultra-intense Laser Science, Shanghai Institute of Optics and Fine Mechanics, Chinese Academy of Sciences, 201800 Shanghai, China; International Quantum Academy, 518048 Shenzhen, China; Hefei National Laboratory, University of Science and Technology of China, Hefei 230026, China