Compressed page walk cache
Abstract:
GPUs are widely used in modern high-performance computing systems. To reduce the burden on GPU programmers, operating systems and GPU hardware provide strong support for shared virtual memory, which enables the GPU and CPU to share the same virtual address space. Unfortunately, the SIMT execution model of current GPUs poses great challenges for virtual-to-physical address translation on the GPU side, mainly because a huge number of virtual addresses are generated simultaneously and these addresses have poor locality. Consequently, the excessive TLB accesses increase the TLB miss ratio. As an attractive solution, the Page Walk Cache (PWC) has received wide attention for its ability to reduce the memory accesses caused by TLB misses. However, the current PWC mechanism suffers from heavy redundancy, which significantly limits its efficiency. In this paper, we first investigate the factors behind this issue by evaluating PWC performance on typical GPU benchmarks. We find that the repeated L4 and L3 indices of virtual addresses increase the redundancy in the PWC, and that the low locality of L2 indices causes a low PWC hit ratio. Based on these observations, we propose a new PWC structure, the Compressed Page Walk Cache (CPWC), to relieve the redundancy burden of the current PWC. CPWC can be organized in either direct-mapped or set-associative mode. Experimental results show that CPWC increases the number of page table entries by 3 times compared with TPC, increases the L2 index hit ratio by 38.3% over PWC, and reduces page table memory accesses by 26.9%. The average number of memory accesses per TLB miss is reduced to 1.13. Overall, the average IPC improves by 25.3%.
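The redundancy argument above can be made concrete with a small model. The sketch below assumes x86-64 four-level paging (48-bit virtual addresses, 4 KiB pages, a 9-bit index per level). The helper names `split_va`, `flat_tag_bits`, `compressed_tag_bits` and `walk_accesses` are hypothetical, and the "compressed" layout simply stores each shared (L4, L3) prefix once; it illustrates the idea of removing repeated upper-level indices and is not the authors' actual CPWC organization.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple

# Assumption: x86-64 four-level paging with 48-bit virtual addresses and
# 4 KiB pages, so each level is indexed by 9 bits (512 entries per table).
INDEX_BITS = 9
INDEX_MASK = (1 << INDEX_BITS) - 1


def split_va(va: int) -> Tuple[int, int, int, int]:
    """Return the (L4, L3, L2, L1) table indices of a virtual address."""
    return (
        (va >> 39) & INDEX_MASK,  # L4: bits 47..39
        (va >> 30) & INDEX_MASK,  # L3: bits 38..30
        (va >> 21) & INDEX_MASK,  # L2: bits 29..21
        (va >> 12) & INDEX_MASK,  # L1: bits 20..12
    )


def flat_tag_bits(vas: Iterable[int]) -> int:
    """Tag bits if every cached L2-level entry stores its full (L4, L3, L2) prefix."""
    prefixes = {split_va(va)[:3] for va in vas}
    return len(prefixes) * 3 * INDEX_BITS


def compressed_tag_bits(vas: Iterable[int]) -> int:
    """Tag bits if entries sharing an (L4, L3) prefix store that prefix only once.

    Hypothetical layout for illustration, not the paper's CPWC organization.
    """
    groups = defaultdict(set)  # (L4, L3) -> set of L2 indices under it
    for va in vas:
        l4, l3, l2, _ = split_va(va)
        groups[(l4, l3)].add(l2)
    return sum(2 * INDEX_BITS + INDEX_BITS * len(l2s) for l2s in groups.values())


def walk_accesses(va: int, cached_prefixes: Set[tuple]) -> int:
    """Memory accesses for one page walk, given the index prefixes held in a PWC.

    A hit on the full (L4, L3, L2) prefix leaves only the final L1 (PTE) read;
    missing at every level costs all four reads.
    """
    idx = split_va(va)
    for depth in (3, 2, 1):  # use the longest cached prefix first
        if idx[:depth] in cached_prefixes:
            return 4 - depth
    return 4


if __name__ == "__main__":
    # Eight addresses 2 MiB apart: same L4 and L3 index, consecutive L2 indices.
    vas = [0x7F0000000000 + i * 0x200000 for i in range(8)]
    print(flat_tag_bits(vas), compressed_tag_bits(vas))  # 216 90
    print(walk_accesses(vas[0], {(254, 0, 0)}))  # 1
```

In this toy case every flat tag repeats the same (254, 0) L4/L3 pair, which is the kind of redundancy the abstract attributes to repeated L4 and L3 indices; storing the shared prefix once cuts tag storage from 216 to 90 bits. `walk_accesses` shows why hits on the deepest cached prefix push the average number of memory accesses per TLB miss toward 1.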
Authors:
Dunbo ZHANG; Chaoyang JIA; Li SHEN
Affiliation:
School of Computer, National University of Defense Technology, Changsha 410000, China
Citation:
[1] Dunbo ZHANG, Chaoyang JIA, Li SHEN. Compressed page walk cache[J]. Frontiers of Computer Science (计算机科学前沿), 2022(03): 40-51.