Choice of discount rate in reinforcement learning with long-delay rewards | LIN Xiangyang; XING Qinghua; LIU Fuxian

Featured article

Choice of discount rate in reinforcement learning with long-delay rewards

Abstract:

In the world, most successes are the result of long-term effort. The reward of success is extremely high, but it requires a long period of investment first. People who are "myopic" value only short-term rewards and are unwilling to make early-stage investments, so they rarely achieve ultimate success and the corresponding high rewards. Similarly, for a reinforcement learning (RL) model with long-delay rewards, the discount rate determines the strength of the agent's "farsightedness". To enable the trained agent to make a chain of correct choices and finally succeed, this paper first derives mathematically the feasible region of the discount rate, which satisfies the agent's "farsightedness" requirement. Afterwards, to avoid the complicated problem of solving implicit equations when choosing feasible solutions, a simple method is explored and verified by theoretical demonstration and mathematical experiments. Then, a series of RL experiments is designed and implemented to verify the validity of the theory. Finally, the model is extended from the finite process to the infinite process, and the validity of the extended model is verified by theory and experiments. The research not only reveals the significance of the discount rate, but also provides a theoretical basis and a practical method for choosing the discount rate in future research.
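The role of the discount rate described in the abstract can be illustrated with a toy comparison. The sketch below is not the paper's derivation or its feasible region; it assumes a hypothetical two-option setting (take a small reward now, or wait a fixed number of steps for a large one), and the function names and reward values are invented for illustration only. It shows that the delayed reward is preferred only when the discount rate exceeds a threshold.

```python
def discounted_return(rewards, gamma):
    """Return the sum of gamma**t * r_t over a reward sequence."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def min_farsighted_gamma(immediate_reward, delayed_reward, delay):
    """Smallest gamma for which gamma**delay * delayed_reward >= immediate_reward.

    Hypothetical helper for this toy setting; assumes both rewards are
    positive and delay >= 1.
    """
    return (immediate_reward / delayed_reward) ** (1.0 / delay)


# Assumed, illustrative numbers (not taken from the paper).
myopic = [1.0]                   # take a reward of 1 immediately
patient = [0.0] * 50 + [100.0]   # invest for 50 steps, then receive 100

threshold = min_farsighted_gamma(1.0, 100.0, 50)
print(f"threshold gamma: {threshold:.4f}")

for gamma in (0.90, 0.95, 0.99):
    g_myopic = discounted_return(myopic, gamma)
    g_patient = discounted_return(patient, gamma)
    choice = "patient" if g_patient > g_myopic else "myopic"
    print(f"gamma={gamma:.2f}  myopic={g_myopic:.3f}  "
          f"patient={g_patient:.3f}  prefers={choice}")
```

Under these assumed numbers, a discount rate below the threshold makes the agent prefer the immediate reward, while a rate above it makes the long-delay reward worth pursuing.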

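Keywords:

Chinese Library Classification (CLC) numbers:

[1] Medicine and Health (R) / Basic Medicine (R3) / Pathology (R36) / Pathological Processes (R364)

[2] Medicine and Health (R) / Neurology and Psychiatry (R74) / Neurology (R741)

[3] Medicine and Health (R) / Pharmacy (R9) / Pharmacology (R96) / Experimental Pharmacology (R965)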

Authors:

LIN Xiangyang; XING Qinghua; LIU Fuxian

Affiliation:

Department of Air Defense and Anti-Missile, Air Force Engineering University, Xi'an 710051, China

Source:

Journal of Systems Engineering and Electronics (系统工程与电子技术（英文版）)

Citation format:

[1] LIN Xiangyang, XING Qinghua, LIU Fuxian. Choice of discount rate in reinforcement learning with long-delay rewards[J]. Journal of Systems Engineering and Electronics, 2022(02): 381-392.

Similar articles

Towards autonomous and optimal excavation of shield machine:a deep reinforcement learning-based approach

Ya-kun ZHANG;Guo-fang GONG;Hua-yong YANG;Yu-xi CHEN;Geng-lin CHEN-State Key Laboratory of Fluid Power and Mechatronic Systems,Zhejiang University,Hangzhou 310027,China;School of Electrical and Power Engineering,China University of Mining and Technology,Xuzhou 221116,China

Review of elemental mercury (Hg⁰) removal by CuO-based materials

Dong YE;Xiao-xiang WANG;Run-xian WANG;Xin LIU;Hui LIU;Hai-ning WANG-College of Quality&Safety Engineering,China Jiliang University,Hangzhou 310018,China;Key Laboratory of Biomass Chemical Engineering of Ministry of Education,Institute of Industrial Ecology and Environment,College of Chemical and Biological Engineering,Zhejiang University,Hangzhou 310027,China

Low-loss belief propagation decoder with Tanner graph in quantum error-correction codes

Dan-Dan Yan;Xing-Kui Fan;Zhen-Yu Chen;Hong-Yang Ma-School of Sciences,Qingdao University of Technology,Qingdao 266033,China

Influence fast or later:Two types of influencers in social networks

Fang Zhou;Chang Su;Shuqi Xu;Linyuan Lv-Yangtze Delta Region Institute(Huzhou)&Institute of Fundamental and Frontier Sciences,University of Electronic Science and Technology of China,Huzhou 313001,China;Beijing Computational Science Research Center,Beijing 100193,China

Simulation of crowd dynamics in pedestrian evacuation concerning panic contagion:A cellular automaton approach