
Featured article

Hybrid Q-learning for data-based optimal control of non-linear switching system

Abstract:

In this paper, the optimal control of a non-linear switching system is investigated without knowing the system dynamics. First, the Hamilton-Jacobi-Bellman (HJB) equation is derived with the consideration of a hybrid action space. Then, a novel data-based hybrid Q-learning (HQL) algorithm is proposed to find the optimal solution in an iterative manner. In addition, a theoretical analysis is provided to establish the convergence and optimality of the proposed algorithm. Finally, the algorithm is implemented with the actor-critic (AC) structure, and two linear-in-parameter neural networks are utilized to approximate the functions. Simulation results validate the effectiveness of the data-driven method.
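
As a rough illustration of the kind of iteration the abstract describes, the following Python sketch runs a data-based Q-iteration over a hybrid action (a discrete mode plus a continuous control) on a toy scalar switched system. It is not the authors' HQL algorithm: the plant parameters in MODES, the weight R, the quadratic features, and every function name are assumptions made for illustration, and a batch least-squares fit stands in for the paper's actor-critic networks.

```python
import random

# Hypothetical toy plant (assumption): x' = a_i * x + b_i * u, one (a_i, b_i) per mode.
MODES = [(1.2, 1.0), (0.8, 0.2)]
R = 1.0  # control weight in the assumed stage cost x^2 + R * u^2


def step(x, mode, u):
    """Black-box simulator; the learner only uses its inputs and outputs."""
    a, b = MODES[mode]
    return a * x + b * u


def features(x, u):
    """Quadratic features, so Q(x, mode, u) = w_mode . [x^2, x*u, u^2]."""
    return [x * x, x * u, u * u]


def solve3(A, y):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, y)]
    for c in range(3):
        piv = max(range(c, 3), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, 3):
            f = M[r][c] / M[c][c]
            for k in range(c, 4):
                M[r][k] -= f * M[c][k]
    w = [0.0, 0.0, 0.0]
    for r in reversed(range(3)):
        w[r] = (M[r][3] - sum(M[r][k] * w[k] for k in range(r + 1, 3))) / M[r][r]
    return w


def greedy_value(weights, x):
    """min over (mode, u) of Q(x, mode, u); u is minimized in closed form."""
    best = None
    for hxx, hxu, huu in weights:
        if huu > 1e-12:
            v = (hxx - hxu * hxu / (4.0 * huu)) * x * x
        else:
            v = hxx * x * x  # only hit by the all-zero initial Q
        best = v if best is None else min(best, v)
    return best


def greedy_action(weights, x):
    """Return the (mode, u) pair with the smallest fitted Q-value at state x."""
    best = None
    for mode, (hxx, hxu, huu) in enumerate(weights):
        u = -hxu * x / (2.0 * huu)
        q = hxx * x * x + hxu * x * u + huu * u * u
        if best is None or q < best[0]:
            best = (q, mode, u)
    return best[1], best[2]


def hybrid_q_iteration(n_samples=60, n_iters=50, seed=0):
    """Fit one weight vector per mode from sampled transitions only."""
    rng = random.Random(seed)
    data = []
    for mode in range(len(MODES)):
        for _ in range(n_samples):
            x, u = rng.uniform(-1, 1), rng.uniform(-1, 1)
            data.append((x, mode, u, step(x, mode, u)))
    weights = [[0.0, 0.0, 0.0] for _ in MODES]  # start from Q_0 = 0
    for _ in range(n_iters):
        new_weights = []
        for mode in range(len(MODES)):
            A = [[0.0] * 3 for _ in range(3)]
            y = [0.0] * 3
            for x, m, u, x_next in data:
                if m != mode:
                    continue
                phi = features(x, u)
                # Bellman target: stage cost plus greedy value of the next state.
                target = x * x + R * u * u + greedy_value(weights, x_next)
                for i in range(3):
                    y[i] += phi[i] * target
                    for j in range(3):
                        A[i][j] += phi[i] * phi[j]
            new_weights.append(solve3(A, y))  # least-squares normal equations
        weights = new_weights
    return weights
```

Calling hybrid_q_iteration() returns one weight vector per mode; greedy_action(weights, x) then selects the mode and control with the smallest fitted Q-value at state x.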

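Keywords:

Chinese Library Classification (CLC) number:

[1] Automation technology, computer technology (TP) / Computing technology, computer technology (TP3) / Computer applications (TP39) / Information processing (TP391)

[2] Medicine, health (R) / Basic medicine (R3) / Pathology (R36) / Pathological processes (R364)

[3] Mathematical sciences and chemistry (O) / Mechanics (O3) / Vibration theory (O32) / Non-linear vibration (O322)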

Authors:

LI Xiaofeng; DONG Lu; SUN Changyin

Author affiliations:

School of Automation, Southeast University, Nanjing 210096, China; School of Artificial Intelligence, Anhui University, Hefei 230601, China; School of Cyber Science and Engineering, Southeast University, Nanjing 211189, China

Source:

Journal of Systems Engineering and Electronics (系统工程与电子技术, English edition)

Citation format:

[1] LI Xiaofeng; DONG Lu; SUN Changyin. Hybrid Q-learning for data-based optimal control of non-linear switching system[J]. Journal of Systems Engineering and Electronics, 2022(05): 1186-1194

Class A terms:

HQL

Class B terms:

Hybrid,learning,data,control,linear,switching,system,In,this,paper,investigated,without,knowing,dyna,mics,First,Hamilton,Jacobi,Bellman,HJB,equation,derived,consideration,hybrid,action,space,Then,novel,algorithm,proposed,find,solution,iterative,manner,addition,theoretical,analysis,provided,illustrate,convergence,optimality,Finally,implemented,actor,critic,AC,structure,parameter,neural,networks,are,utilized,approximate,functions,Simulation,results,validate,effectiveness,driven,method

AB value:

0.616504

Similar articles

Towards autonomous and optimal excavation of shield machine: a deep reinforcement learning-based approach

Ya-kun ZHANG; Guo-fang GONG; Hua-yong YANG; Yu-xi CHEN; Geng-lin CHEN - State Key Laboratory of Fluid Power and Mechatronic Systems, Zhejiang University, Hangzhou 310027, China; School of Electrical and Power Engineering, China University of Mining and Technology, Xuzhou 221116, China

Sparse identification method of extracting hybrid energy harvesting system from observed data

Ya-Hui Sun; Yuan-Hui Zeng; Yong-Ge Yang - School of Mathematics and Statistics, Guangdong University of Technology, Guangzhou 510520, China; State Key Laboratory for Strength and Vibration of Mechanical Structures, Xi'an Jiaotong University, Xi'an 710049, China

Minimax Q-learning design for H∞ control of linear discrete-time systems

Xinxing LI; Lele XI; Wenzhong ZHA; Zhihong PENG - Information Science Academy, China Electronics Technology Group Corporation, Beijing 100086, China; School of Automation, Beijing Institute of Technology, Beijing 100081, China; Peng Cheng Laboratory, Shenzhen 518052, China

Multi-agent differential game based cooperative synchronization control using a data-driven method

Yu SHI; Yongzhao HUA; Jianglong YU; Xiwang DONG; Zhang REN - School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China; Institute of Artificial Intelligence, Beihang University, Beijing 100191, China

Training time minimization for federated edge learning with optimized gradient quantization and bandwidth allocation