Featured paper
Design of high parallel CNN accelerator based on FPGA for AIoT
Abstract:
To tackle the challenge of applying convolutional neural network (CNN) in field-programmable gate array (FPGA) due to its computational complexity, a high-performance CNN hardware accelerator based on Verilog hardware description language was designed, which utilizes a pipeline architecture with three parallel dimensions including input channels, output channels, and convolution kernels. Firstly, two multiply-and-accumulate (MAC) operations were packed into one digital signal processing (DSP) block of FPGA to double the computation rate of the CNN accelerator. Secondly, strategies of feature map block partitioning and special memory arrangement were proposed to optimize the total amount of off-chip access memory and reduce the pressure on FPGA bandwidth. Finally, an efficient computational array combining multiplicative-additive tree and Winograd fast convolution algorithm was designed to balance hardware resource consumption and computational performance. The high parallel CNN accelerator was deployed in ZU3EG of Alinx, using the YOLOv3-tiny algorithm as the test object. The average computing performance of the CNN accelerator is 127.5 giga operations per second (GOPS). The experimental results show that the hardware architecture effectively improves the computational power of CNN and provides better performance compared with other existing schemes in terms of power consumption and the efficiency of DSPs and block random access memory (BRAMs).
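The abstract names the Winograd fast convolution algorithm without stating which tile size the accelerator uses. As a minimal sketch of the general idea, and not the authors' implementation, the block below shows the standard 1-D Winograd F(2,3) transform, which produces two outputs of a 3-tap filter with four multiplications instead of six. The choice of F(2,3) and the function names `winograd_f23` and `direct_f23` are assumptions made for illustration.

```python
def direct_f23(d, g):
    """Reference: two outputs of a 3-tap filter g over a 4-element input d (6 multiplies)."""
    y0 = d[0] * g[0] + d[1] * g[1] + d[2] * g[2]
    y1 = d[1] * g[0] + d[2] * g[1] + d[3] * g[2]
    return [y0, y1]


def winograd_f23(d, g):
    """Winograd F(2,3): the same two outputs using only 4 element-wise multiplies.

    The filter-side terms (g0+g1+g2)/2 and (g0-g1+g2)/2 depend only on the
    weights, so in practice they can be precomputed once per filter.
    """
    # Filter transform (can be done offline).
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]

    # Input transform (additions and subtractions only).
    v0 = d[0] - d[2]
    v1 = d[1] + d[2]
    v2 = d[2] - d[1]
    v3 = d[1] - d[3]

    # The four multiplications.
    m1, m2, m3, m4 = v0 * u0, v1 * u1, v2 * u2, v3 * u3

    # Output transform.
    return [m1 + m2 + m3, m2 - m3 - m4]


if __name__ == "__main__":
    d = [1, 2, 3, 4]
    g = [1, 0, -1]
    print(direct_f23(d, g), winograd_f23(d, g))
```

The saving in multiplications is what makes the method attractive when DSP blocks are the scarce resource; the extra additions are cheap by comparison in FPGA fabric.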
Keywords:
Authors:
Lin Zhijian; Gao Xuewei; Chen Xiaopei; Zhu Zhipeng; Du Xiaoyong; Chen Pingping
Author affiliations:
School of Advanced Manufacturing, Fuzhou University, Quanzhou 362251, China; College of Physics and Information Engineering, Fuzhou University, Fuzhou 350108, China
Citation format:
[1] Lin Zhijian; Gao Xuewei; Chen Xiaopei; Zhu Zhipeng; Du Xiaoyong; Chen Pingping. Design of high parallel CNN accelerator based on FPGA for AIoT [J]. The Journal of China Universities of Posts and Telecommunications, 2022(05): 1-9, 61
Similar papers
Photonic matrix multiplication lights up photonic accelerator and beyond
Hailong Zhou; Jianji Dong; Junwei Cheng; Wenchan Dong; Chaoran Huang; Yichen Shen; Qiming Zhang; Min Gu; Chao Qian; Hongsheng Chen; Zhichao Ruan; Xinliang Zhang - Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan 430074, China; Department of Electronic Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong, China; Lightelligence, Hangzhou 311121, China; Institute of Photonic Chips, University of Shanghai for Science and Technology, Shanghai 200093, China; Centre for Artificial-Intelligence Nanophotonics, School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China; Interdisciplinary Center for Quantum Information, State Key Laboratory of Modern Optical Instrumentation, ZJU-Hangzhou Global Scientific and Technological Innovation Center, ZJU-UIUC Institute, Zhejiang University, Hangzhou 310027, China; Interdisciplinary Center of Quantum Information, State Key Laboratory of Modern Optical Instrumentation, and Zhejiang Province Key Laboratory of Quantum Technology and Device, Department of Physics, Zhejiang University, Hangzhou 310027, China