Featured Article
LIDAR: learning from imperfect demonstrations with advantage rectification
Abstract:
In actor-critic reinforcement learning (RL) algorithms, function estimation errors are known to cause ineffective random exploration at the beginning of training and lead to overestimated value estimates and suboptimal policies. In this paper, we address this problem by performing advantage rectification with imperfect demonstrations, thereby reducing function estimation errors. Pretraining with expert demonstrations has been widely adopted to accelerate deep reinforcement learning when simulations are expensive to obtain. However, existing methods such as behavior cloning often assume that the demonstrations carry additional information or performance labels, such as an optimality assumption, which is usually incorrect and unavailable in the real world. In this paper, we explicitly handle imperfect demonstrations within the actor-critic RL framework and propose a new method called learning from imperfect demonstrations with advantage rectification (LIDAR). LIDAR uses a rectified loss function to learn only from selected demonstrations, derived from the minimal assumption that the demonstrating policies perform better than the current policy. LIDAR learns from the contradictions caused by estimation errors and in turn reduces those errors. We apply LIDAR to three popular actor-critic algorithms, DDPG, TD3, and SAC, and experiments show that our method observably reduces function estimation errors, effectively leverages demonstrations far from optimal, and consistently outperforms state-of-the-art baselines in all scenarios.
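The rectified loss described in the abstract can be sketched as follows. This is a minimal illustration under the assumption that "rectification" means masking out demonstration actions whose critic-estimated advantage over the current policy is non-positive, so only demonstrations believed to outperform the current policy contribute an imitation term. All names here (`rectified_demo_loss`, its arguments) are hypothetical and not taken from the paper's implementation.

```python
import numpy as np

def rectified_demo_loss(q_demo, q_pi, actions_demo, actions_pi):
    """Advantage-rectified imitation loss (sketch).

    q_demo       : Q(s, a_demo), critic value of each demonstration action
    q_pi         : Q(s, pi(s)), critic value of the current policy's action
    actions_demo : demonstration actions, shape (batch, action_dim)
    actions_pi   : current policy's actions, same shape

    Demonstrations with non-positive estimated advantage
    A(s, a_demo) = Q(s, a_demo) - Q(s, pi(s)) are ignored; the rest
    contribute a behavior-cloning-style squared error.
    """
    advantage = q_demo - q_pi                       # estimated A(s, a_demo)
    mask = (advantage > 0).astype(float)            # keep only "better" demos
    sq_err = np.sum((actions_demo - actions_pi) ** 2, axis=-1)
    n_kept = max(mask.sum(), 1.0)                   # avoid division by zero
    return float(np.sum(mask * sq_err) / n_kept)
```

In a full actor-critic setup this term would be added to the policy loss, so the actor is pulled toward demonstration actions only where the critic judges them advantageous.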
Keywords:
CLC Number:
Authors:
Xiaoqin ZHANG;Huimin MA;Xiong LUO;Jian YUAN
Affiliations:
Department of EE, Tsinghua University, Beijing 100084, China; School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
Source:
Citation:
[1] Xiaoqin ZHANG; Huimin MA; Xiong LUO; Jian YUAN. LIDAR: learning from imperfect demonstrations with advantage rectification [J]. Frontiers of Computer Science, 2022(01): 53-62.