XIANG Boqiang,LING Weiwei,LI Li,et al.FPGA-based Hardware Accelerator for RNN[J].Journal of Chengdu University of Information Technology,2022,37(04):374-378.[doi:10.16836/j.cnki.jcuit.2022.04.002]
- Title:
- FPGA-based Hardware Accelerator for RNN
- Article ID:
- 2096-1618(2022)04-0374-05
- Keywords:
- microelectronics and solid-state electronics; hardware accelerator; programmable logic device; recurrent neural network; instruction set architecture
- CLC Number:
- TP183
- Document Code:
- A
- Abstract:
- To address the low computing efficiency of recurrent neural networks (RNN) in edge computing scenarios, caused by excessive consumption of computing resources and a relatively complex computing process, the authors propose a hardware acceleration method for RNN models and verify it on an FPGA platform. To maximize processing speed while keeping computing resources reusable, the accelerator uses a SIMD instruction set that configures the operation flow through software programming and adapts to RNNs and related models with different numbers of layers and dimensions. Furthermore, exploiting the characteristics of the RNN model's data flow, the authors optimized the accelerator architecture, providing on-chip caches and parallel logic to make full use of memory bandwidth while reducing resource overhead. Experiments show that the accelerator achieves 6.7 GOPS at a 100 MHz operating frequency with a power consumption of 2.15 W. Two network models implemented through the instruction set with hardware/software co-design run 230 times faster than on a microcontroller.
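The abstract describes a SIMD instruction set that configures the RNN computation flow through software programming. As a rough illustration of the idea only (the opcode names, register-file layout, and program encoding below are hypothetical and not the paper's actual ISA), a software model of such an instruction-driven datapath might look like:

```python
# Hypothetical sketch of an instruction-driven RNN accelerator model.
# Opcodes (MATVEC, VADD, VTANH) and the register layout are illustrative
# assumptions, not the ISA described in the paper.
import math

def matvec(W, x):
    """Matrix-vector product, the core operation of an RNN layer."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def execute(program, regs):
    """Interpret a list of (opcode, dst, *srcs) SIMD-style instructions."""
    for op, dst, *srcs in program:
        if op == "MATVEC":   # dst <- W @ x
            regs[dst] = matvec(regs[srcs[0]], regs[srcs[1]])
        elif op == "VADD":   # elementwise vector add
            regs[dst] = [a + b for a, b in zip(regs[srcs[0]], regs[srcs[1]])]
        elif op == "VTANH":  # elementwise activation
            regs[dst] = [math.tanh(v) for v in regs[srcs[0]]]
    return regs

# One vanilla-RNN step, h = tanh(Wx @ x + Wh @ h + b), expressed as a
# software-configurable instruction sequence rather than fixed hardware.
regs = {
    "Wx": [[0.5, 0.0], [0.0, 0.5]],
    "Wh": [[0.1, 0.0], [0.0, 0.1]],
    "b":  [0.0, 0.0],
    "x":  [1.0, -1.0],
    "h":  [0.0, 0.0],
}
program = [
    ("MATVEC", "t0", "Wx", "x"),
    ("MATVEC", "t1", "Wh", "h"),
    ("VADD",   "t2", "t0", "t1"),
    ("VADD",   "t2", "t2", "b"),
    ("VTANH",  "h",  "t2"),
]
execute(program, regs)
print(regs["h"])  # updated hidden state
```

Because the layer structure lives in the instruction sequence rather than in fixed logic, the same datapath can be reprogrammed for RNN variants with different layer counts and dimensions, which is the flexibility the abstract claims for the accelerator.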
References:
[1] Nurvitadhi E, Sim J, Sheffield D, et al. Accelerating recurrent neural networks in analytics servers: Comparison of FPGA, CPU, GPU, and ASIC [C]. International Conference on Field Programmable Logic & Applications. IEEE, 2016.
[2] Rybalkin V, Wehn N, Yousefi M R, et al. Hardware architecture of Bidirectional Long Short-Term Memory Neural Network for Optical Character Recognition [C]. 2017 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 2017.
[3] Han S, Kang J, Mao H, et al. ESE: Efficient speech recognition engine with sparse LSTM on FPGA [C]. Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017: 75-84.
[4] DENG Liang, CHEN Zhangjin, QIAO Dong, et al. Design and verification of an instruction-set-architecture neural network coprocessor based on FPGA [J]. Journal of Chinese Computer Systems, 2021, 42(6): 1129-1135. (in Chinese)
[5] Guan Y, Yuan Z, Sun G, et al. FPGA-based accelerator for long short-term memory recurrent neural networks [C]. 2017 22nd Asia and South Pacific Design Automation Conference (ASP-DAC). IEEE, 2017.
[6] GAO Chen, ZHANG Fan. Research progress of FPGA-based recurrent neural network accelerators [J]. Chinese Journal of Network and Information Security, 2019, 5(4): 1-13. (in Chinese)
[7] Yu J, Hu Y, Ning X, et al. Instruction driven cross-layer CNN accelerator with Winograd transformation on FPGA [C]. 2017 International Conference on Field Programmable Technology (ICFPT), 2018.
[8] Sun Z, Zhu Y, Zheng Y, et al. FPGA acceleration of LSTM based on data for test flight [C]. 2018 IEEE International Conference on Smart Cloud (SmartCloud). IEEE, 2018: 1-6.
[9] FAN Jun, GONG Jie, WU Qianfeng, et al. Design and implementation of an FPGA-based RNN acceleration SoC [J]. Microelectronics & Computer, 2020, 37(11): 1-5. (in Chinese)
[10] Han S, Mao H, Dally W J. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding [J]. arXiv preprint arXiv:1510.00149, 2015.
[11] Lee M, Hwang K, Park J, et al. FPGA-Based Low-Power Speech Recognition with Recurrent Neural Networks [C]. 2016 IEEE International Workshop on Signal Processing Systems (SiPS). IEEE, 2016.
[12] Han S, Mao H, Dally W J. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding [C]. ICLR, 2016.
[13] PENG Jingtong, ZHU Yongxin, WANG Hui, et al. Flight data anomaly detection with a GRU neural network based on FPGA [J]. Microelectronics & Computer, 2021, 38(11). (in Chinese)
Memo:
Received: 2021-11-03
Funding: National Natural Science Foundation of China (61201094)