Please use this identifier to cite or link to this item:
http://hdl.handle.net/1942/43607

Field | Value
---|---
Title | Cross-modal Spectral Fusion Model for Referring Video Object Segmentation
Authors | Huang, Kesi; Li, Tianxiao; Xia, Qiqiang; CHEN, Junhong; Asim, Muhammad; Liu, Wenyin
Issue Date | 2024
Publisher | IEEE
Source | 2024 IEEE 14th International Conference on Electronics Information and Emergency Communication (ICEIEC), IEEE, p. 133-137
Abstract | Referring Video Object Segmentation (R-VOS) demands precise visual comprehension and sophisticated cross-modal reasoning to segment objects in videos based on descriptions from natural language. Addressing this challenge, we introduce the Cross-modal Spectral Fusion Model (CSF). Our model incorporates a Multi-Scale Spectral Fusion Module (MSFM), which facilitates robust global interactions between the modalities, and a Consensus Fusion Module (CFM) that dynamically balances multiple prediction vectors based on text features and spectral cues for accurate mask generation. Additionally, the Dual-stream Mask Decoder (DMD) enhances the segmentation accuracy by capturing both local and global information through parallel processing. Tested on three datasets, CSF surpasses existing methods in R-VOS, proving its efficacy and potential for advanced video understanding tasks.
Keywords | referring video object segmentation; cross-modal; multi-scale alignment
Document URI | http://hdl.handle.net/1942/43607
ISBN | 979-8-3503-6189-6
DOI | 10.1109/ICEIEC61773.2024.10561688
Rights | 2024 IEEE
Category | C1
Type | Proceedings Paper

Appears in Collections: Research publications

Files in This Item:
File | Description | Size | Format | Access
---|---|---|---|---
Cross-modal_Spectral_Fusion_Model_for_Referring_Video_Object_Segmentation.pdf | Published version | 2.34 MB | Adobe PDF | Restricted Access
ICEIEC+17_Cross-modal Spectral Fusion Model for Referring Video Object Segmentation.pdf | Peer-reviewed author version | 1.39 MB | Adobe PDF | Until 2025-08-29