Research on the Design and Optimization Method of CNN Accelerator Based on HLS Tools

Application of Electronic Technique, 2021, No. 3

CLC number: TN108.1
Document code: A
DOI: 10.16157/j.issn.0258-7998.200841
Citation: Cheng Jiafeng, Wang Hongliang. Research on the design and optimization method of CNN accelerator based on HLS tools[J]. Application of Electronic Technique, 2021, 47(3): 18-21, 26.

Cheng Jiafeng, Wang Hongliang

National Key Laboratory for Electronic Measurement Technology, North University of China, Taiyuan 030051, China

Abstract: This paper follows the idea of hardware/software co-design. It uses high-level synthesis (HLS) tools to design and implement a convolutional neural network accelerator on the PYNQ-Z2 platform. The convolution operations are optimized with a matrix-cutting method, which balances resource consumption against the available computing resources to get the best performance from the accelerator. The accelerator IP core was evaluated on the MNIST dataset. Compared with the ARM platform, the accelerator achieves a speedup of 5.785 when testing a single image and 9.72 when testing 1,000 images. Its advantage is expected to keep growing as the number of test images increases.

Key words: convolutional neural network (CNN); PYNQ-Z2; HLS tool; accelerator
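The abstract reports a larger speedup for 1,000 images than for one image but does not say why. A simple model reproduces this trend. It is an assumption, not the authors' analysis. Suppose each accelerator run pays a fixed one-off cost $T_0$ (for example, configuring the IP core and setting up buffers) plus $t_h$ per image. Suppose also that the ARM baseline needs $t_a$ per image. The speedup $S(N)$ for $N$ images then behaves as follows:

```latex
\begin{aligned}
S(N) &= \frac{N\,t_a}{T_0 + N\,t_h},\\
\frac{dS}{dN} &= \frac{T_0\,t_a}{(T_0 + N\,t_h)^2} > 0,\\
\lim_{N\to\infty} S(N) &= \frac{t_a}{t_h}.
\end{aligned}
```

Under this assumed model, the speedup rises monotonically with $N$. That matches the direction of the reported figures (5.785 for one image, 9.72 for 1,000 images). The speedup levels off at the per-image ratio $t_a/t_h$ once the fixed cost is amortized.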

0 Introduction

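In recent years, convolutional neural networks (CNNs) have been applied ever more widely and in increasingly complex scenarios. Their compute-intensive and memory-intensive nature has become more prominent and now limits fast, efficient CNN implementation. Accelerator platforms based on GPUs^[1], ASICs^[2] and FPGAs^[3] have therefore been proposed one after another to improve CNN performance. Each platform has drawbacks:

- GPUs draw a great deal of power and have a fixed hardware structure, which limits the use of CNNs on embedded devices.
- ASICs are extremely expensive to develop and inflexible, so they are ill-suited to hosting complex and frequently changing CNN models.
- FPGAs combine low power consumption, high performance and good flexibility, which makes them better suited to research on CNN hardware acceleration. However, development in Verilog HDL has a high barrier to entry and a relatively long development cycle. This has slowed the adoption of FPGAs for CNN applications^[4-5].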

Following the idea of hardware/software co-design, this paper uses HLS tools to implement a CNN accelerator on the PYNQ-Z2. It optimizes the convolution-kernel computation with a matrix-cutting design method.
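The C code below is a minimal sketch. It assumes that "matrix cutting" means partitioning the output feature map into fixed-size tiles, a common loop-tiling scheme in FPGA CNN accelerators; the authors' exact partitioning may differ. The code computes one single-channel convolution twice, once directly and once tile by tile. Each tile copies only the input window it depends on into a small local buffer. That buffer is the part that would map onto on-chip memory in an HLS design. The function names (`conv_ref`, `conv_tiled`, `conv_self_check`) are hypothetical illustrative choices. So are the sizes: a 28×28 input, a 5×5 kernel and 8×8 tiles.

```c
#include <assert.h>

/* Hypothetical sizes: a 28x28 single-channel input (MNIST-sized) and a 5x5 kernel. */
#define H  28
#define W  28
#define K  5
#define OH (H - K + 1)   /* 24 output rows */
#define OW (W - K + 1)   /* 24 output columns */

/* Hypothetical tile sizes; this sketch requires OH % TR == 0 and OW % TC == 0. */
#define TR 8
#define TC 8

/* Reference: untiled "valid" convolution (cross-correlation, as CNN layers compute it). */
static void conv_ref(float in[H][W], float k[K][K], float out[OH][OW]) {
    for (int r = 0; r < OH; r++)
        for (int c = 0; c < OW; c++) {
            float acc = 0.0f;
            for (int i = 0; i < K; i++)
                for (int j = 0; j < K; j++)
                    acc += in[r + i][c + j] * k[i][j];
            out[r][c] = acc;
        }
}

/* Tiled: each TR x TC output tile needs only a (TR+K-1) x (TC+K-1) input window,
   copied into a small local array that stands in for on-chip BRAM. */
static void conv_tiled(float in[H][W], float k[K][K], float out[OH][OW]) {
    assert(OH % TR == 0 && OW % TC == 0);
    float in_buf[TR + K - 1][TC + K - 1];
    float out_buf[TR][TC];
    for (int tr = 0; tr < OH; tr += TR)
        for (int tc = 0; tc < OW; tc += TC) {
            /* 1. Load the input window this tile depends on. */
            for (int i = 0; i < TR + K - 1; i++)
                for (int j = 0; j < TC + K - 1; j++)
                    in_buf[i][j] = in[tr + i][tc + j];
            /* 2. Compute the tile from local buffers only. In an HLS flow this is
                  where pipelining/unrolling directives would typically be applied. */
            for (int r = 0; r < TR; r++)
                for (int c = 0; c < TC; c++) {
                    float acc = 0.0f;
                    for (int i = 0; i < K; i++)
                        for (int j = 0; j < K; j++)
                            acc += in_buf[r + i][c + j] * k[i][j];
                    out_buf[r][c] = acc;
                }
            /* 3. Write the finished tile back into the full output map. */
            for (int r = 0; r < TR; r++)
                for (int c = 0; c < TC; c++)
                    out[tr + r][tc + c] = out_buf[r][c];
        }
}

/* Runs both versions on small-integer test data (so every sum is exact in float)
   and returns how many output elements differ; 0 means they agree. */
int conv_self_check(void) {
    static float in[H][W], k[K][K], ref[OH][OW], til[OH][OW];
    for (int r = 0; r < H; r++)
        for (int c = 0; c < W; c++)
            in[r][c] = (float)((r * 5 + c * 3) % 7 - 3);
    for (int i = 0; i < K; i++)
        for (int j = 0; j < K; j++)
            k[i][j] = (float)((i + 2 * j) % 5 - 2);
    conv_ref(in, k, ref);
    conv_tiled(in, k, til);
    int mismatches = 0;
    for (int r = 0; r < OH; r++)
        for (int c = 0; c < OW; c++)
            if (ref[r][c] != til[r][c])
                mismatches++;
    return mismatches;
}
```

Calling `conv_self_check()` from a test harness should return 0. Both versions add the same products in the same order, and the integer-valued test data keep every sum exact. Changing `TR` and `TC` changes only the size of `in_buf` and `out_buf`, not the result. In a tiled design, this is the knob that trades on-chip memory against how much work each tile exposes to pipelining.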
