Efficient Exploration for Multi-Agent Reinforcement Learning via Transferable Successor Features

Abstract:

In multi-agent reinforcement learning (MARL), the behavior of each agent can influence the learning of the others, and the agents have to search an exponentially enlarged joint-action space. Hence, it is challenging for multi-agent teams to explore the environment: agents may converge to suboptimal policies and fail to solve some complex tasks. To improve both the exploration efficiency and the performance on MARL tasks, in this paper we propose a new approach that transfers knowledge across tasks. Unlike traditional MARL algorithms, we first assume that the reward functions can be computed as linear combinations of a shared feature function and a set of task-specific weights. Then, we define a set of basic MARL tasks in the source domain and pre-train them as basic knowledge for further use. Finally, once the weights for the target tasks are available, it becomes easier to obtain a well-performing policy for exploring the target domain. Hence, the agents' learning process on target tasks is sped up by making full use of the previously learned basic knowledge. We evaluate the proposed algorithm on two challenging MARL tasks: cooperative box-pushing and non-monotonic predator-prey. The experimental results demonstrate improved performance compared with state-of-the-art MARL algorithms.
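The linear-reward assumption in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it is a single-state, single-decision toy in NumPy, and the function names, the array `psi`, and the weights `w_target` are all hypothetical. It also assumes the usual successor-feature relation Q = psi · w and an action choice by maximising over the pre-trained source policies (generalized policy improvement). Both come from the general successor-feature literature, not from this abstract.

```python
import numpy as np

def linear_reward(phi, w):
    """Reward as a linear combination of shared features phi and task weights w."""
    return float(np.dot(phi, w))

def q_values(psi, w):
    """Action values for every pre-trained source policy under task weights w.

    psi has shape (num_source_policies, num_actions, feature_dim) and holds
    hypothetical successor features at one fixed state; the result has shape
    (num_source_policies, num_actions).
    """
    return psi @ w

def transfer_action(psi, w):
    """Pick the action whose best value across source policies is largest.

    This is the generalized-policy-improvement rule (an assumption here).
    """
    q = q_values(psi, w)
    return int(np.argmax(q.max(axis=0)))

# Hypothetical successor features: 2 source policies, 2 actions, 2 features.
psi = np.array([
    [[1.0, 0.0], [0.5, 0.5]],  # policy pre-trained on source task 0
    [[0.0, 1.0], [0.2, 0.9]],  # policy pre-trained on source task 1
])

# Hypothetical weights of a new target task.
w_target = np.array([0.2, 0.8])

print(linear_reward(np.array([1.0, 0.0]), w_target))
print(q_values(psi, w_target))
print(transfer_action(psi, w_target))
```

Under these assumptions, only the weight vector changes between tasks, so the pre-trained `psi` can be reused to get an initial exploration policy for the target task without retraining. In the multi-agent setting the action index would range over joint actions, which this toy does not model.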

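Chinese Library Classification (CLC):

[1] Automation Technology, Computer Technology (TP) / Computing Technology, Computer Technology (TP3) / Computer Applications (TP39) / Information Processing (TP391)

[2] Automation Technology, Computer Technology (TP) / Basic Theory of Automation (TP1) / Artificial Intelligence Theory (TP18) / Automated Reasoning, Machine Learning (TP181)

[3] Automation Technology, Computer Technology (TP) / Computing Technology, Computer Technology (TP3) / Computer Applications (TP39) / Information Processing (TP391) / Pattern Recognition and Devices (TP391.4)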

Authors:

Wenzhang Liu; Lu Dong; Dan Niu; Changyin Sun

Affiliations:

School of Artificial Intelligence, Anhui University, Hefei 230039, and also with the Peng Cheng Laboratory, Shenzhen 518055, China; School of Cyber Science and Engineering, Southeast University, Nanjing 211189, China; School of Automation, Southeast University, Nanjing 210096, China; School of Automation, Southeast University, Nanjing 210096; Peng Cheng Laboratory, Shenzhen 518055, China

Source:

IEEE/CAA Journal of Automatica Sinica (自动化学报（英文版）)

Citation format:

[1] Wenzhang Liu; Lu Dong; Dan Niu; Changyin Sun. Efficient Exploration for Multi-Agent Reinforcement Learning via Transferable Successor Features [J]. IEEE/CAA Journal of Automatica Sinica (自动化学报（英文版）), 2022(09): 1673-1686

Similar literature

Joint Task Scheduling, Resource Allocation, and UAV Trajectory under Clustering for FANETs

Wenjing You; Chao Dong; Qihui Wu; Yuben Qu; Yulei Wu; Rong He - The Key Laboratory of Dynamic Cognitive System of Electromagnetic Spectrum Space, Ministry of Industry and Information Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China; AVIC Chengdu Aircraft Design and Research Institute, China

Joint Scheduling and Resource Allocation for Federated Learning in SWIPT-Enabled Micro UAV Swarm Networks

Wanli Wen; Yunjian Jia; Wenchao Xia - School of Microelectronics and Communication Engineering, Chongqing University, Chongqing 400044, China; National Mobile Communications Research Laboratory, Southeast University, Nanjing 210009, China; Jiangsu Key Laboratory of Wireless Communications, Nanjing University of Posts and Telecommunications, Nanjing 210003, China

E3GCAPS: Efficient EEG-Based Multi-Capsule Framework with Dynamic Attention for Cross-Subject Cognitive State Detection

Yue Zhao; Guojun Dai; Xin Fang; Zhengxuan Wu; Nianzhang Xia; Yanping Jin; Hong Zeng - School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China; Key Laboratory of Brain Machine Collaborative Intelligence of Zhejiang Province, Hangzhou 310018, China

Multi-Agent Few-Shot Meta Reinforcement Learning for Trajectory Design and Channel Selection in UAV-Assisted Networks

Shiyang Zhou; Yufan Cheng; Xia Lei; Huanhuan Duan - National Key Laboratory of Science and Technology on Communications, University of Electronic Science and Technology of China, Chengdu 611731, China

Efficient Multi-User for Task Offloading and Server Allocation in Mobile Edge Computing Systems