Optimizing Transformer Models for Resource-Constrained Environments: A Study on Model Compression Techniques
Recent progress in computer vision has been driven by transformer-based models, which consistently outperform traditional methods across various tasks. However, their high computational and memory demands limit their use in resource-constrained environments. This research addresses these challenges by investigating four key model compression techniques: quantization, low-rank approximation, knowledge distillation, and pruning. We thoroughly evaluate the effects of these techniques, both individually and in combination, on optimizing transformers for resource-limited settings. Our experimental findings show that these methods can successfully strike a balance between accuracy and efficiency, enhancing the feasibility of transformer models for edge computing.
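To make two of the four techniques concrete, the sketch below illustrates symmetric per-tensor int8 quantization and magnitude-based weight pruning on a plain Python list of weights. It is a minimal, framework-free illustration of the underlying ideas, not the implementation evaluated in this study; the function names and the toy weight values are our own for exposition.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w ~ scale * q, q in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # guard against all-zero tensors
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 codes."""
    return [scale * v for v in q]

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights (unstructured pruning)."""
    k = int(len(weights) * sparsity)
    threshold = sorted(abs(w) for w in weights)[k] if k else 0.0
    return [0.0 if abs(w) < threshold else w for w in weights]

# Toy example: quantize, reconstruct, and prune a small weight vector.
w = [0.8, -0.05, 0.3, -0.9, 0.01, 0.4]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)           # reconstruction error is bounded by scale / 2
pruned = magnitude_prune(w, 0.5)   # half the weights are set to zero
```

Quantization trades a small, bounded reconstruction error for a 4x reduction in weight storage (int8 vs. float32), while magnitude pruning removes the weights that contribute least to the output; the study combines such techniques and measures the joint accuracy/efficiency trade-off.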