Advancing Segment Anything Model for Efficient Salient Object Detection in Remote Sensing Images

Salient object detection in optical remote sensing images (ORSI-SOD) often relies on leveraging pretrained knowledge from natural images to achieve high accuracy with limited training data. Traditional methods typically employ vision backbones [e.g., convolutional neural networks (CNNs) or vision transformers (ViTs)] pretrained on ImageNet to extract features from ORSI scenes. However, these backbones generalize less well across diverse scenarios than recent vision foundation models. To this end, we propose ORSI-SAM, a novel ORSI-SOD framework built on the Segment Anything Model (SAM), leveraging its superior generalization to achieve an exceptional efficiency–accuracy tradeoff. Specifically, ORSI-SAM adopts a lightweight SAM as its backbone, reducing parameter count and computational overhead to enable efficient deployment on satellite devices while retaining the rich knowledge learned from large-scale natural image datasets. Because user prompts are unavailable in ORSI-SOD, which degrades the prediction capability of the SAM decoder, we introduce a hierarchical interaction prompt generator (HIPG) that aggregates hierarchical features and generates mask prompts tailored to salient objects, guiding the decoder to produce high-quality saliency maps. Furthermore, to address the recognition challenges posed by the inherent characteristics of ORSIs, we propose a semantic-aware refinement decoder (SARD). SARD integrates structural details from low-level features to enrich fine-grained object information while using high-level features to suppress redundant interference in shallow layers, thereby sharpening detail in the predicted saliency map. ORSI-SAM is the first work to explore the accuracy–efficiency tradeoff for ORSI-SOD based on a SAM architecture.
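To make the prompt-generation idea concrete, the following is a minimal, speculative sketch of what a hierarchical prompt generator might compute: multi-scale backbone features are upsampled to a common resolution, concatenated, and projected into a single-channel mask prompt. All function names, shapes, and the simple channel-weighted projection are illustrative assumptions, not the paper's actual HIPG implementation.

```python
import numpy as np

def upsample_nearest(x, factor):
    # Nearest-neighbour upsampling of a (C, H, W) feature map (illustrative).
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def hierarchical_prompt(features, weights):
    # Stand-in for HIPG: align multi-scale features to the finest
    # resolution, concatenate along channels, and project to one channel.
    target_h = max(f.shape[1] for f in features)
    aligned = [upsample_nearest(f, target_h // f.shape[1]) for f in features]
    stacked = np.concatenate(aligned, axis=0)           # (sum C_i, H, W)
    # 1x1 "convolution": weighted sum over channels, then a sigmoid
    logits = np.tensordot(weights, stacked, axes=([0], [0]))
    return 1.0 / (1.0 + np.exp(-logits))                # (H, W), values in (0, 1)

# Toy three-level feature pyramid at strides 1, 2, and 4 (hypothetical sizes).
feats = [np.random.rand(4, 32, 32), np.random.rand(8, 16, 16), np.random.rand(16, 8, 8)]
w = np.random.rand(4 + 8 + 16)
prompt = hierarchical_prompt(feats, w)
print(prompt.shape)  # (32, 32)
```

The resulting dense map could then play the role of the mask prompt that SAM's decoder would otherwise expect from user interaction.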
Extensive experiments on benchmark datasets show that ORSI-SAM achieves superior performance compared to recent state-of-the-art methods with 12.2 M parameters and 8.9 G FLOPs.
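The semantic-aware refinement idea, i.e., using deep semantic features to gate shallow detail features, can be sketched as follows. This is a hedged toy illustration under assumed shapes and a simple sigmoid gate; the names `sard_fuse` and the gating formula are inventions for exposition, not the paper's SARD.

```python
import numpy as np

def sard_fuse(low, high):
    # Sketch of semantic-aware refinement: a high-level semantic map
    # gates low-level detail to suppress shallow-layer interference.
    factor = low.shape[1] // high.shape[1]
    high_up = high.repeat(factor, axis=1).repeat(factor, axis=2)  # align scales
    # Collapse channels of the deep features into one semantic gate in (0, 1).
    gate = 1.0 / (1.0 + np.exp(-high_up.mean(axis=0, keepdims=True)))
    return low * gate + high_up  # gated detail plus semantic context

low = np.random.rand(8, 32, 32)   # shallow, detail-rich features (hypothetical)
high = np.random.rand(8, 8, 8)    # deep, semantic features (hypothetical)
fused = sard_fuse(low, high)
print(fused.shape)  # (8, 32, 32)
```

The design intuition matches the abstract: fine structure survives where the semantic gate is high, while background responses in shallow features are attenuated.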

Zhang Jiehua, Liu Li, Su Zhuo, Liu Tianpeng, Liu Zhen, Pietikäinen Matti

A1 Journal article (refereed), original research


J. Zhang, L. Liu, Z. Su, T. Liu, Z. Liu and M. Pietikäinen, "Advancing Segment Anything Model for Efficient Salient Object Detection in Remote Sensing Images," in IEEE Transactions on Geoscience and Remote Sensing, vol. 63, pp. 1-16, 2025, Art no. 5642216, doi: 10.1109/TGRS.2025.3604856. Keywords: remote sensing; decoding; training; object detection; image segmentation; generators; training data; semantics; interference; computational modeling; lightweight salient object detection (SOD); optical remote sensing image; segment anything model (SAM).

https://doi.org/10.1109/TGRS.2025.3604856
