Compressed page walk cache | Dunbo ZHANG; Chaoyang JIA; Li SHEN

Featured article

Compressed page walk cache

Abstract:

GPUs are widely used in modern high-performance computing systems. To reduce the burden on GPU programmers, operating systems and GPU hardware provide strong support for shared virtual memory, which enables the GPU and CPU to share the same virtual address space. Unfortunately, the current SIMT execution model of GPUs poses great challenges for virtual-to-physical address translation on the GPU side, mainly because a huge number of virtual addresses are generated simultaneously and these addresses have poor locality. Thus, the excessive TLB accesses increase the TLB miss ratio. As an attractive solution, the Page Walk Cache (PWC) has received wide attention for its ability to reduce the memory accesses caused by TLB misses. However, the current PWC mechanism suffers from heavy redundancy, which significantly limits its efficiency. In this paper, we first investigate the factors leading to this issue by evaluating the performance of PWC with typical GPU benchmarks. We find that the repeated L4 and L3 indices of virtual addresses increase the redundancy in PWC, and the low locality of L2 indices causes the low hit ratio in PWC. Based on these observations, we propose a new PWC structure, namely the Compressed Page Walk Cache (CPWC), to resolve the redundancy burden in current PWC. Our CPWC can be organized in either direct-mapped mode or set-associative mode. Experimental results show that CPWC increases the number of page table entries by 3 times over TPC, increases the L2 index hit ratio by 38.3% over PWC, and reduces the memory accesses to page tables by 26.9%. The average number of memory accesses caused by each TLB miss is reduced to 1.13. Overall, the average IPC improves by 25.3%.
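The redundancy observation in the abstract can be illustrated with a small sketch. It is not the authors' CPWC design: it assumes an x86-64 style 4-level page table (9-bit indices, 4 KiB pages), which the abstract does not state, and the function names and the access stream are hypothetical. It only compares how many tag bits are needed when every cached entry repeats its full (L4, L3, L2) prefix versus when each distinct (L4, L3) pair is stored once and shared by its L2 entries.

```python
from collections import defaultdict

# Assumed layout (not stated in the abstract): 4 KiB pages, 9 index bits per level.
PAGE_SHIFT = 12
INDEX_BITS = 9
INDEX_MASK = (1 << INDEX_BITS) - 1


def split_indices(vaddr):
    """Return the (l4, l3, l2, l1) page-table indices of a 48-bit virtual address."""
    l1 = (vaddr >> PAGE_SHIFT) & INDEX_MASK
    l2 = (vaddr >> (PAGE_SHIFT + INDEX_BITS)) & INDEX_MASK
    l3 = (vaddr >> (PAGE_SHIFT + 2 * INDEX_BITS)) & INDEX_MASK
    l4 = (vaddr >> (PAGE_SHIFT + 3 * INDEX_BITS)) & INDEX_MASK
    return l4, l3, l2, l1


def flat_tag_bits(vaddrs):
    """Tag bits when each distinct (l4, l3, l2) prefix stores a full 27-bit tag."""
    prefixes = {split_indices(v)[:3] for v in vaddrs}
    return len(prefixes) * 3 * INDEX_BITS


def compressed_tag_bits(vaddrs):
    """Tag bits when each distinct (l4, l3) pair is stored once,
    followed by one 9-bit l2 tag per cached entry under that pair."""
    groups = defaultdict(set)
    for v in vaddrs:
        l4, l3, l2, _ = split_indices(v)
        groups[(l4, l3)].add(l2)
    upper = len(groups) * 2 * INDEX_BITS
    lower = sum(len(l2_set) for l2_set in groups.values()) * INDEX_BITS
    return upper + lower


# Hypothetical access stream: 8 addresses spaced 2 MiB apart,
# so the L4 and L3 indices repeat and only the L2 index changes.
BASE = 0x7F00_0000_0000
addrs = [BASE + i * (1 << 21) for i in range(8)]
```

For this stream, `flat_tag_bits(addrs)` is 216 while `compressed_tag_bits(addrs)` is 90, because the shared 18-bit (L4, L3) prefix is stored once instead of eight times. Real hardware would also need valid bits, replacement state, and the cached physical pointers, which the sketch ignores.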

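Keywords:

Chinese Library Classification (CLC) numbers:

[1] Automation and Computer Technology (TP) / Computing and Computer Technology (TP3) / Computer Software (TP31) / Operating Systems (TP316) / Real-Time Operating Systems (TP316.2)

[2] Culture, Science, Education, Sports (G) / Education at All Levels (G6) / Higher Education (G64) / Teaching Theory and Methods (G642)

[3] Culture, Science, Education, Sports (G) / Education at All Levels (G6) / Higher Education (G64) / Ideological and Political Education, Moral Education (G641)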

Authors:

Dunbo ZHANG; Chaoyang JIA; Li SHEN

Affiliation:

School of Computer, National University of Defense Technology, Changsha 410000, China

Source:

Frontiers of Computer Science (计算机科学前沿)

Citation format:

[1] Dunbo ZHANG; Chaoyang JIA; Li SHEN. Compressed page walk cache[J]. Frontiers of Computer Science, 2022(03): 40-51

Similar articles

Knowledge transfer in multi-agent reinforcement learning with incremental number of agents

LIU Wenzhang; DONG Lu; LIU Jian; SUN Changyin - School of Automation, Southeast University, Nanjing 210096, China; School of Cyber Science and Engineering, Southeast University, Nanjing 211189, China

Joint uplink and downlink resource allocation for low-latency mobile virtual reality delivery in fog radio access networks

Tian DANG; Chenxi LIU; Xiqing LIU; Shi YAN - State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China

Self-deployed execution environment for high performance computing

Mingtian SHAO; Kai LU; Wenzhe ZHANG - College of Computer, National University of Defense Technology, Changsha 410073, China

SA-RSR: a read-optimal data recovery strategy for XOR-coded distributed storage systems

Xingjun ZHANG; Ningjing LIANG; Yunfei LIU; Changjiang ZHANG; Yang LI - School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China; Beijing Electronic Engineering General Research Institute, Beijing 100854, China

Efficient decoding self-attention for end-to-end speech synthesis