Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/43606
Title: SMVT: Spectrum-Driven Multi-scale Vision Transformer for Referring Image Segmentation
Authors: Li, Tianxiao
Chen, Junhong
Huang, Yiheng
Huang, Kesi
Xia, Qiqiang
Asim, Muhammad
Liu, Wenyin
Issue Date: 2024
Source: Advanced Intelligent Computing Technology and Applications: Proceedings, Part VI, p. 193-206
Series/Report: Lecture Notes in Computer Science
Series/Report no.: 14867
Abstract: Referring image segmentation is a challenging task at the intersection of computer vision and natural language processing, aiming to segment the object referred to by a natural language expression from an image. Despite significant recent progress on this task, existing methods still struggle to integrate visual and linguistic information effectively and to capture fine-grained details within images. These challenges stem primarily from the lack of a mechanism that can deeply and comprehensively fuse visual features with language features and effectively exploit cross-modal features. To address these problems, we propose the Spectrum-driven Multi-scale Vision Transformer (SMVT), which incorporates two novel designs: Spectrum-driven Fusion Attention (SFA) and the Cross-modal Feature Refinement Enhancement (CFRE) module. SFA guides the fusion of visual and linguistic features in the spectral domain, effectively capturing fine-grained image features and enhancing the model's sensitivity to local spectral information, so that it responds more accurately to the detail requirements of language descriptions. The CFRE module refines and enhances cross-modal features at different layers, strengthening their complementarity and the model's ability to capture fine-grained cross-modal features across layers, thereby promoting precise alignment of visual and language features. Together, these two modules enable the SMVT to process visual and language information more effectively. Experiments on three benchmark datasets show that our method surpasses state-of-the-art approaches.
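The paper's SFA implementation is not reproduced in this record; the following is only a minimal NumPy sketch of the general idea the abstract describes: moving visual features into the spectral domain and modulating them with a language-conditioned gate before transforming back. The shapes, the sigmoid gating scheme, and the function name `spectral_fusion` are illustrative assumptions, not the actual SFA design.

```python
import numpy as np

def spectral_fusion(visual, text):
    """Hypothetical sketch of spectral-domain vision-language fusion.

    visual: (H, W, C) feature map; text: (C,) sentence embedding.
    The visual features are taken into the frequency domain with a 2-D FFT,
    scaled per channel by a language-conditioned gate, and transformed back.
    This is an assumed scheme for illustration, not the paper's SFA module.
    """
    # 2-D FFT over the spatial dimensions, independently per channel
    freq = np.fft.fft2(visual, axes=(0, 1))      # (H, W, C), complex
    # Language-conditioned per-channel gate in (0, 1)
    gate = 1.0 / (1.0 + np.exp(-text))           # sigmoid, shape (C,)
    fused = freq * gate                          # broadcasts over H and W
    # Back to the spatial domain; imaginary parts are numerically ~0
    return np.fft.ifft2(fused, axes=(0, 1)).real

H, W, C = 8, 8, 4
visual = np.random.default_rng(0).standard_normal((H, W, C))
text = np.zeros(C)                               # sigmoid(0) gives a 0.5 gate
out = spectral_fusion(visual, text)
print(out.shape)                                 # (8, 8, 4)
```

With a zero text embedding the gate is uniformly 0.5, so the output is simply the visual features halved; a learned text encoder would instead produce frequency-selective gates.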
Document URI: http://hdl.handle.net/1942/43606
ISBN: 978-981-97-5596-7
978-981-97-5597-4
DOI: 10.1007/978-981-97-5597-4_17
Rights: © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024. In: D.-S. Huang et al. (Eds.): ICIC 2024, LNCS 14867, pp. 193–206, 2024.
Category: C1
Type: Proceedings Paper
Appears in Collections:Research publications

Files in This Item:
978-981-97-5597-4_17.pdf — Published version, 1.82 MB, Adobe PDF (Restricted Access)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.