Representative literature
Efficient policy evaluation by matrix sketching
Abstract:
In reinforcement learning, policy evaluation aims to predict the long-term value of a state under a given policy. Since high-dimensional representations are becoming increasingly common in reinforcement learning, reducing the computational cost of policy evaluation has become a significant problem. Many recent works adopt matrix sketching methods to accelerate least-squares temporal difference (TD) algorithms and quasi-Newton temporal difference algorithms. Among these sketching methods, the truncated incremental SVD shows better performance because it is stable and efficient. However, the convergence properties of the incremental SVD are still an open question. In this paper, we first show that conventional incremental SVD algorithms can have enormous approximation errors in the worst case. We then propose a variant of incremental SVD with better theoretical guarantees by shrinking the singular values periodically. Moreover, we employ our improved incremental SVD to accelerate least-squares TD and quasi-Newton TD algorithms. The experimental results verify the correctness and effectiveness of our methods.
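The idea of periodically shrinking singular values can be illustrated with a minimal sketch. This record does not reproduce the paper's algorithm, so the code below is not the authors' method; it assumes the well-known Frequent Directions scheme, which likewise shrinks the squared singular values of a small buffer whenever it fills up. The function name `frequent_directions` and the parameter `ell` are hypothetical, chosen only for this illustration.

```python
import numpy as np


def frequent_directions(rows, ell):
    """Stream d-dimensional rows into a (2*ell, d) sketch buffer.

    Assumption: this is the standard Frequent Directions scheme, used
    here only to illustrate periodic singular-value shrinking. It is
    not the algorithm proposed in the paper.
    """
    rows = np.asarray(rows, dtype=float)
    d = rows.shape[1]
    buf = np.zeros((2 * ell, d))
    filled = 0  # number of buffer rows currently in use
    for a in rows:
        if filled == 2 * ell:
            # Buffer is full: take its SVD and shrink the spectrum.
            _, s, vt = np.linalg.svd(buf, full_matrices=False)
            # Subtract the (ell+1)-th largest squared singular value,
            # which zeroes out every direction from that index onward.
            delta = s[ell] ** 2 if len(s) > ell else 0.0
            shrunk = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
            keep = min(ell, len(s))
            buf[:] = 0.0
            buf[:keep] = shrunk[:keep, None] * vt[:keep]
            filled = keep
        buf[filled] = a
        filled += 1
    return buf


# Usage: sketch 200 random feature vectors of dimension 20.
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 20))
B = frequent_directions(A, ell=5)
# Spectral error of the covariance approximation A^T A ~ B^T B.
err = np.linalg.norm(A.T @ A - B.T @ B, 2)
bound = np.linalg.norm(A, "fro") ** 2 / 5
```

Under this scheme the spectral error `err` should not exceed `bound`, that is, the squared Frobenius norm of `A` divided by `ell`. A plain truncated incremental SVD, which discards small singular values without shrinking the rest, has no such guarantee in general.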
Keywords:
Authors:
Cheng CHEN; Weinan ZHANG; Yong YU
Affiliation:
Department of Computer Science, Shanghai Jiao Tong University, Shanghai 200240, China
Source:
Citation format:
[1] Cheng CHEN; Weinan ZHANG; Yong YU. Efficient policy evaluation by matrix sketching[J]. 计算机科学前沿 (Frontiers of Computer Science), 2022(05): 92-100