Optimizing Transformer Models for Resource-Constrained Environments: A Study on Model Compression Techniques
Recent progress in computer vision has been driven by transformer-based models, which consistently outperform traditional methods across various tasks. However, their high computational and memory demands limit their use in resource-constrained environments. This research addresses these challenges by investigating four key model compression techniques: quantization, low-rank approximation, knowledge distillation, and pruning. We thoroughly evaluate the effects of these techniques, both individually and in combination, on optimizing transformers for resource-limited settings. Our experimental findings show that these methods can successfully strike a balance between accuracy and efficiency, enhancing the feasibility of transformer models for edge computing.
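To make two of the four techniques concrete, the sketch below illustrates symmetric per-tensor int8 quantization and magnitude-based weight pruning on a plain Python list of weights. It is a minimal, framework-free illustration of the underlying ideas, not the implementation evaluated in this study; the function names and the toy weight values are our own for exposition.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w ~ scale * q, q in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # guard against all-zero tensors
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 codes."""
    return [scale * v for v in q]

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights (unstructured pruning)."""
    k = int(len(weights) * sparsity)
    threshold = sorted(abs(w) for w in weights)[k] if k else 0.0
    return [0.0 if abs(w) < threshold else w for w in weights]

# Toy example: quantize, reconstruct, and prune a small weight vector.
w = [0.8, -0.05, 0.3, -0.9, 0.01, 0.4]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)           # reconstruction error is bounded by scale / 2
pruned = magnitude_prune(w, 0.5)   # half the weights are set to zero
```

Quantization trades a small, bounded reconstruction error for a 4x reduction in weight storage (int8 vs. float32), while magnitude pruning removes the weights that contribute least to the output; the study combines such techniques and measures the joint accuracy/efficiency trade-off.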