Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/34409
Title: Acceleration-aware Fine-grained Channel Pruning for Deep Neural Networks via Residual Gating
Authors: Huang, Kai
Chen, Siang
Li, Bowen
Claesen, Luc
Yao, Hao
Chen, Junjian
Jiang, Xiaowen
Liu, Zhili
Xiong, Dongliang
Issue Date: 2022
Publisher: IEEE
Source: IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 41(6), p. 1902-1915.
Abstract: Deep neural networks have achieved remarkable advances in various intelligence tasks. However, their massive computation and storage requirements limit applications on resource-constrained devices. While channel pruning has been widely applied to compress models, it is challenging to reach very deep compression ratios with such a coarse-grained pruning structure without significant performance degradation. In this article, we propose an acceleration-aware fine-grained channel pruning (AFCP) framework for accelerating neural networks, which optimizes trainable gate parameters by estimating residual errors between pruned and original channels together with hardware characteristics. Our fine-grained concept covers both the algorithm and structure levels. Unlike existing methods that rely on a pre-defined pruning criterion, AFCP explicitly considers both zero-out and similarity criteria for each channel and adaptively selects the suitable one via residual gate parameters. At the structure level, AFCP adopts a fine-grained channel pruning strategy for residual neural networks and a decomposition-based structure, which further extends the pruning optimization space. Moreover, instead of using theoretical computation costs such as FLOPs, we propose a hardware predictor that bridges the gap between realistic acceleration and the pruning procedure to guide the learning of pruning, which improves the efficiency of pruned models when deployed on accelerators. Extensive evaluation results demonstrate that AFCP outperforms state-of-the-art methods and achieves a favorable balance between model performance and computation cost.
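
The gated-channel idea described in the abstract can be illustrated with a short sketch. The snippet below is a minimal, hypothetical PyTorch example and is not the authors' released implementation: the class name ResidualChannelGate, the sigmoid gating form, and the sparsity penalty weight are assumptions made for illustration. It shows the general mechanism of attaching one trainable gate per output channel so that gates driven toward zero mark channels as candidates for pruning.

# Minimal, illustrative sketch of per-channel gating for pruning.
# Hypothetical code: names and the penalty form are assumptions, not the paper's AFCP implementation.
import torch
import torch.nn as nn

class ResidualChannelGate(nn.Module):
    """Scales each output channel by a trainable gate in (0, 1).

    Gates pushed toward 0 mark prunable channels; training the gates jointly
    with the network lets the optimizer trade accuracy against sparsity.
    """
    def __init__(self, num_channels: int):
        super().__init__()
        # One trainable logit per channel; sigmoid keeps the gate in (0, 1).
        self.logits = nn.Parameter(torch.zeros(num_channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate = torch.sigmoid(self.logits).view(1, -1, 1, 1)
        return x * gate

    def sparsity_loss(self) -> torch.Tensor:
        # Penalty that encourages gates toward 0 so channels become prunable.
        return torch.sigmoid(self.logits).sum()

if __name__ == "__main__":
    conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
    gate = ResidualChannelGate(16)
    x = torch.randn(2, 3, 32, 32)
    y = gate(conv(x))                      # gated feature map
    loss = y.pow(2).mean() + 1e-3 * gate.sparsity_loss()
    loss.backward()                        # gate logits receive gradients jointly with the conv
    print(y.shape, float(gate.sparsity_loss()))

In the paper's framework the gates additionally select between zero-out and similarity criteria and are guided by a hardware predictor rather than a plain sparsity penalty; the sketch above only captures the basic trainable-gate mechanism.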
Keywords: Deep learning systems; model compression and acceleration; pruning; neural networks
Document URI: http://hdl.handle.net/1942/34409
ISSN: 0278-0070
e-ISSN: 1937-4151
DOI: 10.1109/TCAD.2021.3093835
ISI #: 000799624800028
Category: A1
Type: Journal Contribution
Validations: ecoom 2023
Appears in Collections:Research publications

Files in This Item:
FINAL VERSION.pdf (peer-reviewed author version, 2.04 MB, Adobe PDF)
Acceleration-Aware_Fine-Grained_Channel_Pruning_for_Deep_Neural_Networks_via_Residual_Gating.pdf (published version, restricted access, 2.44 MB, Adobe PDF)

Web of Science citations: 6 (checked on Apr 24, 2024)
Page views: 160 (checked on Sep 7, 2022)
Downloads: 62 (checked on Sep 7, 2022)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.