Fine-Tuning SAM2 for Generalizable Polyp Segmentation with a Channel Attention-Enhanced Decoder

Authors

  • Yixiao Liu, Sichuan University, Chengdu 610000, China

DOI:

https://doi.org/10.62836/amr.v4i1.311

Keywords:

polyp segmentation; vision foundation model; SAM2; fine-tuning; generalizability

Abstract

Polyp segmentation is a critical task in medical image analysis, particularly in colonoscopy, where it plays a vital role in the early detection and treatment of colorectal cancer. In recent years, advances in deep learning, especially Convolutional Neural Networks (CNNs) and Transformer models, have significantly improved segmentation performance. Despite these advances, the generalizability of these models across different datasets is often limited. Recently, Meta released the Segment Anything Model 2 (SAM2), which has demonstrated exceptional performance in both video and image segmentation tasks. This paper develops a universal polyp segmentation model by fine-tuning the pre-trained encoder of SAM2. We introduce a learnable prompt layer within the Transformer blocks and employ a full-scale skip-connection structure as a decoder to integrate multi-scale semantic features. Our model outperforms state-of-the-art methods on the Kvasir-SEG and CVC-ClinicDB datasets. Additionally, our experiments show that the model transfers well to unseen datasets, making it a robust and generalizable model for polyp segmentation.
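The learnable prompt layer described in the abstract can be understood as prepending a small set of trainable prompt embeddings to the patch-token sequence entering each frozen Transformer block, so that only the prompts (and the decoder) receive gradient updates. A minimal NumPy sketch of the token-level mechanics follows; the shapes, the function name, and the initialization scale are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def inject_prompts(patch_tokens: np.ndarray, prompt_tokens: np.ndarray) -> np.ndarray:
    """Prepend trainable prompt tokens to a frozen encoder block's input.

    patch_tokens:  (N, D) image patch tokens from the pre-trained SAM2 encoder
    prompt_tokens: (P, D) learnable embeddings -- the only tuned encoder parameters
    Returns a (P + N, D) sequence that self-attention mixes, letting the
    prompts steer the frozen block toward the polyp-segmentation domain.
    """
    return np.concatenate([prompt_tokens, patch_tokens], axis=0)

# Illustrative shapes: 196 patch tokens, 8 prompt tokens, 256-dim embeddings
patches = np.zeros((196, 256))
prompts = np.random.randn(8, 256) * 0.02  # small random init, as is typical
tokens = inject_prompts(patches, prompts)
assert tokens.shape == (204, 256)
```

After the block's attention and MLP, the prompt positions would be stripped (or re-injected per block), so downstream feature maps keep their spatial layout for the full-scale skip-connection decoder.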

References

Ronneberger O, Fischer P, Brox T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Springer: Cham, Switzerland, 2015; pp. 234–241.

Badrinarayanan V, Kendall A, Cipolla R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 2017; 39(12): 2481–2495.

Dosovitskiy A, Beyer L, Kolesnikov A, et al. An Image Is Worth 16 × 16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929.

Wang J, Huang Q, Tang F, et al. Stepwise Feature Fusion: Local Guides Global. In Proceedings of the 2022 International Conference on Medical Image Computing and Computer-Assisted Intervention, Singapore, 18–22 September 2022; Springer: Cham, Switzerland, 2022; pp. 110–120.

Duc NT, Oanh NT, Thuy NT, et al. ColonFormer: An Efficient Transformer Based Method for Colon Polyp Segmentation. IEEE Access 2022; 10: 80575–80586.

Chen J, Lu Y, Yu Q, et al. TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation. arXiv 2021, arXiv:2102.04306.

Zhang Y, Liu H, Hu Q. TransFuse: Fusing Transformers and CNNs for Medical Image Segmentation. In Proceedings of the Medical Image Computing and Computer Assisted Intervention—MICCAI 2021: 24th International Conference, Strasbourg, France, 27 September–1 October 2021; Springer: Cham, Switzerland, 2021; pp. 14–24.

Zhao X, Ding W, An Y, et al. Fast Segment Anything. arXiv 2023, arXiv:2306.12156.

Xiong Y, Varadarajan B, Wu L, et al. EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 16111–16121.

Ravi N, Gabeur V, Hu YT, et al. SAM 2: Segment Anything in Images and Videos. arXiv 2024, arXiv:2408.00714.

Chen T, Lu A, Zhu L, et al. SAM2-Adapter: Evaluating & Adapting Segment Anything 2 in Downstream Tasks: Camouflage, Shadow, Medical Image Segmentation, and More. arXiv 2024, arXiv:2408.04579.

Ma J, Kim S, Li F, et al. Segment Anything in Medical Images and Videos: Benchmark and Deployment. arXiv 2024, arXiv:2408.03322.

Zhang M, Wang L, Chen Z, et al. Path-SAM2: Transfer SAM2 for Digital Pathology Semantic Segmentation. arXiv 2024, arXiv:2408.03651.

Mansoori M, Shahabodini S, Abouei J, et al. Self-Prompting Polyp Segmentation in Colonoscopy Using Hybrid YOLO-SAM 2 Model. arXiv 2024, arXiv:2409.09484.

Chen T, Zhu L, Deng C, et al. SAM-Adapter: Adapting Segment Anything in Underperformed Scenes. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 3367–3375.

Qiu Z, Hu Y, Li H, et al. Learnable Ophthalmology SAM. arXiv 2023, arXiv:2304.13425.

Huang H, Lin L, Tong R, et al. UNet 3+: A Full-Scale Connected UNet for Medical Image Segmentation. In Proceedings of the ICASSP 2020—2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 1055–1059.

Sanderson E, Matuszewski BJ. FCN-Transformer Feature Fusion for Polyp Segmentation. In Proceedings of the Annual Conference on Medical Image Understanding and Analysis, Cambridge, UK, 27–29 July 2022; pp. 892–907.

Fan D-P, Ji G-P, Zhou T, et al. PraNet: Parallel Reverse Attention Network for Polyp Segmentation. In Proceedings of the 2020 International Conference on Medical Image Computing and Computer-Assisted Intervention, Lima, Peru, 4–8 October 2020; pp. 263–273.

Dumitru R-G, Peteleaza D, Craciun C. Using DUCK-Net for Polyp Image Segmentation. Scientific Reports 2023; 13(1): 9803.

Srivastava A, Jha D, Chanda S, et al. MSRF-Net: A Multi-Scale Residual Fusion Network for Biomedical Image Segmentation. IEEE Journal of Biomedical and Health Informatics 2021; 26(5): 2252–2263.

Jha D, Smedsrud PH, Riegler MA, et al. ResUNet++: An Advanced Architecture for Medical Image Segmentation. In Proceedings of the 2019 IEEE International Symposium on Multimedia (ISM), San Diego, CA, USA, 9–11 December 2019; pp. 225–230.

Published

02/25/2025

How to Cite

Yixiao Liu. (2025). Fine-Tuning SAM2 for Generalizable Polyp Segmentation with a Channel Attention-Enhanced Decoder. Advanced Medical Research, 4(1), 1–9. https://doi.org/10.62836/amr.v4i1.311

Issue

Section

Medical Theory Research