This work is licensed under a Creative Commons Attribution 4.0 International License.
Comprehensive Survey of Model Compression and Speed up for Vision Transformers
Vision Transformers (ViTs) have marked a paradigm shift in computer vision, outperforming state-of-the-art models across diverse tasks. However, their practical deployment is hampered by high computational and memory demands. This study addresses that challenge by evaluating four primary model compression techniques: quantization, low-rank approximation, knowledge distillation, and pruning. We methodically analyze and compare the efficacy of these techniques, individually and in combination, for optimizing ViTs in resource-constrained environments. Our comprehensive experimental evaluation demonstrates that these methods strike a balanced trade-off between model accuracy and computational efficiency, paving the way for wider deployment on edge computing devices.
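To make the first of these techniques concrete, the sketch below applies post-training dynamic int8 quantization to a Vision Transformer in PyTorch. The specific model (torchvision's vit_b_16), the use of dynamic quantization, and the restriction to nn.Linear layers are illustrative assumptions for this example, not the exact configuration evaluated in the study.

```python
# Minimal post-training dynamic-quantization sketch (illustrative only; the model
# choice and quantization scheme are assumptions, not the paper's exact setup).
import torch
from torchvision.models import vit_b_16

# Build a ViT-B/16; pretrained weights are optional for this sketch.
model = vit_b_16(weights=None).eval()

# Store the weights of every nn.Linear layer in int8; activations remain float
# and are quantized/dequantized on the fly at inference time (CPU backend).
quantized = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Sanity check: the quantized model still maps a 224x224 image to 1000 class logits.
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    print(quantized(x).shape)  # torch.Size([1, 1000])
```

Quantization of this kind needs no retraining; the other techniques surveyed (pruning, knowledge distillation, low-rank approximation) can be layered on top of such a model when further compression is required.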
Supporting Agencies
- Funding: Not applicable.