Hardware Acceleration of YOLOv5s Network Model Based on Aerospace-grade FPGA

JIANG Kangning; ZHOU Hai; BIAN Chunjiang; WANG Ling

doi:10.11728/cjss2023.05.2022-0044

cstr: 32142.14.cjss2023.05.2022-0044

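JIANG Kangning^{1, 2}, ZHOU Hai^{1}, BIAN Chunjiang^{1}, WANG Ling^{1, 2}

1. National Space Science Center, Chinese Academy of Sciences, Beijing 100190
2. University of Chinese Academy of Sciences, Beijing 100049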

Funding: Supported by the Youth Innovation Promotion Association of the Chinese Academy of Sciences (E0293401)

Author information: JIANG Kangning, E-mail: 18838980607@163.com

CLC number: V557
Publication history
- Received: 2022-08-19
- Revised: 2022-11-25
- Accepted: 2023-06-25
- Published online: 2023-06-25

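Abstract

Abstract: Remote sensing images have high resolution and complex background information, so they place ever higher demands on the accuracy and robustness of target detection, and convolutional neural network algorithms have therefore gradually been introduced into remote sensing image processing. However, such algorithms usually have complex models and a heavy computational load, making them difficult to run efficiently on on-board platforms with limited space and resources. To address this problem, a convolutional neural network hardware acceleration architecture based on an aerospace-grade Field Programmable Gate Array (FPGA) is proposed. YOLOv5s is selected as the target network, and the architecture is designed around parallel unrolling of input and output channels together with data pipeline control. Experimental results show that, when this architecture accelerates the inference stage of YOLOv5s, the convolution module can operate at 200 MHz with a computing performance of 394.4 GOPS (Giga Operations Per Second), the FPGA power consumption is 14.662 W, and the average computational efficiency of the Digital Signal Processing (DSP) calculation matrix reaches 96.29%.
- On-board system /
- Convolutional neural network /
- Hardware acceleration /
- Field programmable gate array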
Abstract: With the rapid development of China's remote sensing technology, the resolution of available remote sensing images keeps increasing and their background information grows more complex, which poses great challenges to the accuracy and robustness of traditional target detection methods. With the development of deep learning, convolutional neural network algorithms outperform traditional methods in detection accuracy and robustness, so detection algorithms based on convolutional neural networks have been applied to high-resolution remote sensing images with complex backgrounds. However, such algorithms usually have complex models and a large amount of computation, making them difficult to run efficiently on on-board platforms with limited space and resources. To address this problem, a hardware acceleration architecture for convolutional neural network forward inference based on an aerospace-grade FPGA (Field Programmable Gate Array) is proposed, and the YOLOv5s network model is selected as the target algorithm for the architecture design. Since the main body of the YOLOv5s network consists of a large number of convolutional layers, the accelerator design focuses on the convolutional layer. The architecture adopts parallel unrolling of input and output channels together with data pipeline control to improve the real-time processing performance of the inference stage. The experimental results show that, when this architecture is used to accelerate the inference stage of YOLOv5s, the operating frequency of the convolution module can reach 200 MHz and its computing performance can reach 394.4 GOPS (Giga Operations Per Second). The power consumption is 14.662 W, and the average computational efficiency of the DSP (Digital Signal Processing) calculation matrix is as high as 96.29%. These results suggest that FPGA-based hardware acceleration of convolutional neural networks has significant advantages on resource- and power-constrained on-board platforms.
- On-board system /
- Convolutional Neural Network (CNN) /
- Hardware acceleration /
- Field programmable gate array

Figure 1. Network structure of YOLOv5s

Figure 2. Overall architecture of the accelerator

Figure 3. Parallel computation matrix for the YOLOv5s convolution layers

Figure 4. Blocking of the input feature map data (the colored bands are the data being partitioned)

Figure 5. Data reuse produced by sliding the convolution window across the feature map (the yellow part represents the convolution window)

Figure 6. Loop tiling of the weight data (parts of different colors represent the individual weight blocks)

Figure 7. Loop computation order
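
The blocking, weight tiling, and loop order named in the figure captions can be illustrated with a minimal NumPy sketch. Only the captions survive in this page, so everything concrete here is an assumption: the tile sizes `TM` and `TN`, the function names, and the particular loop order are hypothetical and are not taken from the authors' design. The innermost matrix-vector product stands in for a TM×TN multiply-accumulate array that hardware would evaluate in parallel across output and input channels.

```python
import numpy as np

TM, TN = 4, 4  # hypothetical output/input channel tile sizes (degree of parallelism)

def conv_direct(x, w):
    """Reference convolution. x: (C_in, H, W); w: (C_out, C_in, K, K); stride 1, no padding."""
    c_out, c_in, k, _ = w.shape
    h_out, w_out = x.shape[1] - k + 1, x.shape[2] - k + 1
    y = np.zeros((c_out, h_out, w_out))
    for i in range(k):
        for j in range(k):
            # Accumulate the contribution of kernel position (i, j) over all channels.
            y += np.einsum("mn,nhw->mhw", w[:, :, i, j], x[:, i:i + h_out, j:j + w_out])
    return y

def conv_tiled(x, w, tm=TM, tn=TN):
    """Same convolution with channel tiling; the loop order is an illustrative choice."""
    c_out, c_in, k, _ = w.shape
    h_out, w_out = x.shape[1] - k + 1, x.shape[2] - k + 1
    y = np.zeros((c_out, h_out, w_out))
    for m0 in range(0, c_out, tm):            # loop over output-channel tiles
        for n0 in range(0, c_in, tn):         # loop over input-channel tiles
            w_tile = w[m0:m0 + tm, n0:n0 + tn]    # weight block for this tile pair
            x_tile = x[n0:n0 + tn]                # input channels feeding this tile
            for r in range(h_out):
                for c in range(w_out):
                    window = x_tile[:, r:r + k, c:c + k]
                    for i in range(k):
                        for j in range(k):
                            # One step of the MAC array: up to tm x tn products,
                            # summed over the input-channel dimension.
                            y[m0:m0 + tm, r, c] += w_tile[:, :, i, j] @ window[:, i, j]
    return y
```

Calling `conv_tiled(x, w)` on random inputs and comparing against `conv_direct(x, w)` with `np.allclose` is a quick way to confirm that the tiling changes only the order of accumulation, not the result.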

Figure 8. Multi-stage cascaded optimization of the 5×5 pooling window
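
Only the caption of Figure 8 is available here, so the exact decomposition the authors use is not known. As a hedged illustration of why a cascaded design is possible at all, the sketch below shows one common decomposition: a stride-1 5×5 max pooling computed as a 1×5 row stage followed by a 5×1 column stage. The function names are hypothetical, and this should not be read as the pooling module described in the paper.

```python
import numpy as np

def maxpool_direct(x, k=5):
    """Stride-1 k x k max pooling over a 2-D map, no padding."""
    h, w = x.shape
    out = np.empty((h - k + 1, w - k + 1), dtype=x.dtype)
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = x[r:r + k, c:c + k].max()
    return out

def maxpool_cascaded(x, k=5):
    """Same result from two cascaded 1-D stages: 1 x k along rows, then k x 1 along columns."""
    h, w = x.shape
    # Stage 1: horizontal maximum over k neighbouring columns, shape (h, w - k + 1).
    row_stage = np.stack([x[:, c:c + k].max(axis=1) for c in range(w - k + 1)], axis=1)
    # Stage 2: vertical maximum over k neighbouring rows, shape (h - k + 1, w - k + 1).
    return np.stack([row_stage[r:r + k].max(axis=0) for r in range(h - k + 1)], axis=0)
```

The cascaded form needs 2(k - 1) comparisons per output instead of k² - 1, which is the usual motivation for splitting a large pooling window into stages.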

Figure 9. Implementation of the pooling module

Figure 10. Slice operation

Figure 11. Implementation flow of the slice operation
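
A minimal sketch of the slice operation, assuming it follows the Focus layer of the public YOLOv5 implementation: every second pixel is taken in each spatial direction, and the four resulting sub-maps are concatenated along the channel axis. The function name `focus_slice` and the (C, H, W) layout are illustrative assumptions, and the ordering of the four sub-maps in the accelerator may differ.

```python
import numpy as np

def focus_slice(x):
    """x: (C, H, W) with even H and W. Returns (4C, H/2, W/2)."""
    return np.concatenate(
        [
            x[:, ::2, ::2],    # even rows, even columns
            x[:, 1::2, ::2],   # odd rows, even columns
            x[:, ::2, 1::2],   # even rows, odd columns
            x[:, 1::2, 1::2],  # odd rows, odd columns
        ],
        axis=0,
    )
```

The operation only rearranges data: spatial resolution is halved and the channel count is quadrupled without discarding any pixel.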

Figure 12. Residual structure and data flow

Table 1. Comparison of different CNN implementations on FPGA

Method	Ref. [16]	Ref. [17]	Ref. [18]	Ref. [19]	This work
FPGA	ZC709	XC7K325T	ZCU102	VC707	VC709
Network	YOLOv2-tiny	YOLOv2	YOLOv2	YOLOv2-tiny	YOLOv5s
Precision/bit	8	32	16	BNN	16
Frequency/MHz	200	100	300	200	200
DSP	610/900	N/A	609	168/2800	1024/1200
BRAM	256/545	N/A	491	1026/1030	1094/1470
LUT	84000/219000	N/A	95000	86000/304000	166000/433000
FF	65000/437000	N/A	90000	60000/607000	228000/866000
Throughput/GOPS	464.5	6.222	102.2	464.7	394.4
Power/W	10.25	2.555	11.8	8.72	14.662
Energy efficiency/(GOPS·W^-1)	45.3	2.435	8.66	53.29	26.90
E_DSP	95.2%	N/A	N/A	N/A	96.29%
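
The derived rows of Table 1 can be checked with a few lines of arithmetic. The energy efficiency is simply throughput divided by power. The page does not give the definition of E_DSP, so the formula below is an assumption: achieved throughput divided by a peak of two operations (one multiply, one add) per DSP per clock cycle. The function names and the `macs_per_dsp` parameter are hypothetical.

```python
# (throughput in GOPS, power in W, reported energy efficiency, decimals reported), from Table 1.
ROWS = {
    "Ref. [16]": (464.5, 10.25, 45.3, 1),
    "Ref. [17]": (6.222, 2.555, 2.435, 3),
    "Ref. [18]": (102.2, 11.8, 8.66, 2),
    "Ref. [19]": (464.7, 8.72, 53.29, 2),
    "This work": (394.4, 14.662, 26.90, 2),
}

def energy_efficiency(gops, watts):
    """Energy efficiency in GOPS/W."""
    return gops / watts

def dsp_efficiency(gops, n_dsp, freq_mhz, macs_per_dsp=1):
    """Assumed definition: achieved throughput over peak, counting 2 operations per MAC."""
    peak_gops = 2 * macs_per_dsp * n_dsp * freq_mhz / 1000.0
    return gops / peak_gops

for name, (gops, watts, reported, decimals) in ROWS.items():
    print(name, round(energy_efficiency(gops, watts), decimals), "reported:", reported)

# This work: 1024 DSPs at 200 MHz give an assumed peak of 409.6 GOPS.
print("E_DSP (this work):", dsp_efficiency(394.4, 1024, 200))
```

Under this assumed definition, 394.4 GOPS against a 409.6 GOPS peak reproduces the reported 96.29%. With the further assumption that the 8-bit design of Ref. [16] packs two multiply-accumulates into each DSP (`macs_per_dsp=2`), the same formula gives about 95.2% for its 610 DSPs.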

References (26)

[1]	ZAIDI S S A, ANSARI M S, ASLAM A, et al. A survey of modern deep learning based object detection models[J]. Digital Signal Processing, 2022, 126: 103514 doi: 10.1016/j.dsp.2022.103514
[2]	KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90 doi: 10.1145/3065386
[3]	HUANG M Y, XU Y, QIAN L X, et al. A bridge neural network-based optical-SAR image joint intelligent interpretation framework[J]. Space: Science & Technology, 2021, 2021: 9841456
[4]	KONG T, SUN F C, LIU H P, et al. FoveaBox: Beyound anchor-based object detection[J]. IEEE Transactions on Image Processing, 2020, 29: 7389-7398 doi: 10.1109/TIP.2020.3002345
[5]	XU Y C, FU M T, WANG Q M, et al. Gliding vertex on the horizontal bounding box for multi-oriented object detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(4): 1452-1459 doi: 10.1109/TPAMI.2020.2974745
[6]	LIU Z, CAI Y F, WANG H, et al. Robust target recognition and tracking of self-driving cars with radar and camera information fusion under severe weather conditions[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(7): 6640-6653 doi: 10.1109/TITS.2021.3059674
[7]	SINGH B, LI H D, SHARMA A, et al. R-FCN-3000 at 30 fps: Decoupling detection and classification[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 1081-1090
[8]	HE K M, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 2980-2988
[9]	LIU W, ANGUELOV D, ERHAN D, et al. SSD: Single shot multiBox detector[C]//14th European Conference on Computer Vision. Amsterdam: Springer, 2016: 21-37
[10]	LI Z G, WANG J T. An improved algorithm for deep learning YOLO network based on Xilinx ZYNQ FPGA[C]//2020 International Conference on Culture-oriented Science & Technology (ICCST). Beijing: IEEE, 2020: 447-451
[11]	WANG Z X, XU K, WU S X, et al. Sparse-YOLO: Hardware/software Co-design of An FPGA accelerator for YOLOv2[J]. IEEE Access, 2020, 8: 116569-116585 doi: 10.1109/ACCESS.2020.3004198
[12]	WANG J, GU S S. FPGA implementation of object detection accelerator based on Vitis-AI[C]//2021 11th International Conference on Information Science and Technology (ICIST). Chengdu: IEEE, 2021: 571-577
[13]	CAI Y F, LUAN T Y, GAO H B, et al. YOLOv4-5D: An effective and efficient object detector for autonomous driving[J]. IEEE Transactions on Instrumentation and Measurement, 2021, 70: 4503613
[14]	TAN Xiandong, PENG Hui. Improved YOLOv5 ship target detection in SAR image[J]. Computer Engineering and Applications, 2022, 58(4): 247-254 doi: 10.3778/j.issn.1002-8331.2108-0308
[15]	GAREA A S, HERAS D B, ARGÜELLO F. Caffe CNN-based classification of hyperspectral images on GPU[J]. The Journal of Supercomputing, 2019, 75(3): 1065-1077 doi: 10.1007/s11227-018-2300-2
[16]	ZHANG J M, CHENG L F, LI C F, et al. A low-latency FPGA implementation for real-time object detection[C]//2021 IEEE International Symposium on Circuits and Systems (ISCAS). Daegu: IEEE, 2021: 1-5
[17]	BI F H, YANG J. Target detection system design and FPGA implementation based on YOLO v2 algorithm[C]//2019 3rd International Conference on Imaging, Signal Processing and Communication (ICISPC). Singapore: IEEE, 2019: 10-14
[18]	ZHANG S G, CAO J, ZHANG Q, et al. An FPGA-based reconfigurable CNN accelerator for YOLO[C]//2020 IEEE 3rd International Conference on Electronics Technology (ICET). Chengdu: IEEE, 2020: 74-78
[19]	NGUYEN D T, NGUYEN T N, KIM H, et al. A high-throughput and power-efficient FPGA implementation of YOLO CNN for object detection[J]. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2019, 27(8): 1861-1873 doi: 10.1109/TVLSI.2019.2905242
[20]	CHEN Haomin, YAO Senjing, XI Yu, et al. Design and FPGA implementation of YOLOv3-tiny hardware acceleration[J]. Computer Engineering and Science, 2021, 43(12): 2139-2149 doi: 10.3969/j.issn.1007-130X.2021.12.007
[21]	ZHOU Qikai, ZHANG Wei, LI Dongjin, et al. Ship classification and detection method for optical remote sensing images based on improved YOLOv5s[J]. Laser and Optoelectronics Progress, 2022, 59(16): 1628008 doi: 10.3788/LOP202259.1628008
[22]	ZHOU Hai, HOU Qingyu, BIAN Chunjiang, et al. An infrared small target detection network under various complex backgrounds realized on FPGA[J]. Journal of Beijing University of Aeronautics and Astronautics, 2023, 49(2): 295-310 doi: 10.13700/j.bh.1001-5965.2021.0221
[23]	DODD P E, SHANEYFELT M R, SCHWANK J R, et al. Current and future challenges in radiation effects on CMOS electronics[J]. IEEE Transactions on Nuclear Science, 2010, 57(4): 1747-1763 doi: 10.1109/TNS.2010.2042613
[24]	BINDER D, SMITH E C, HOLMAN A B. Satellite anomalies from galactic cosmic rays[J]. IEEE Transactions on Nuclear Science, 1975, 22(6): 2675-2680 doi: 10.1109/TNS.1975.4328188
[25]	HU Kongyang, HU Haisheng, LIU Xiaoming. The application of TMR on the high performance and anti radiation DSP[J]. Microelectronics & Computer, 2019, 36(3): 58-60
[26]	ZHANG X F, WANG J S, ZHU C, et al. DNNBuilder: An automated tool for building high-performance DNN hardware accelerators for FPGAs[C]//2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). San Diego: IEEE, 2018: 1-8