Please use this identifier to cite or link to this item:
http://hdl.handle.net/1942/43607
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Huang, Kesi | - |
dc.contributor.author | Li, Tianxiao | - |
dc.contributor.author | Xia, Qiqiang | - |
dc.contributor.author | CHEN, Junhong | - |
dc.contributor.author | Asim, Muhammad | - |
dc.contributor.author | Liu, Wenyin | - |
dc.date.accessioned | 2024-08-29T11:13:05Z | - |
dc.date.available | 2024-08-29T11:13:05Z | - |
dc.date.issued | 2024 | - |
dc.date.submitted | 2024-08-14T18:19:50Z | - |
dc.identifier.citation | 2024 IEEE 14th International Conference on Electronics Information and Emergency Communication (ICEIEC), IEEE, p. 133-137 | - |
dc.identifier.isbn | 979-8-3503-6189-6 | - |
dc.identifier.uri | http://hdl.handle.net/1942/43607 | - |
dc.description.abstract | Referring Video Object Segmentation (R-VOS) demands precise visual comprehension and sophisticated cross-modal reasoning to segment objects in videos based on descriptions from natural language. Addressing this challenge, we introduce the Cross-modal Spectral Fusion Model (CSF). Our model incorporates a Multi-Scale Spectral Fusion Module (MSFM), which facilitates robust global interactions between the modalities, and a Consensus Fusion Module (CFM) that dynamically balances multiple prediction vectors based on text features and spectral cues for accurate mask generation. Additionally, the Dual-stream Mask Decoder (DMD) enhances the segmentation accuracy by capturing both local and global information through parallel processing. Tested on three datasets, CSF surpasses existing methods in R-VOS, proving its efficacy and potential for advanced video understanding tasks. | - |
dc.description.sponsorship | This work is supported by the National Natural Science Foundation of China (No. 91748107), the Special Research Fund (BOF) of Hasselt University (No. BOF23DOCBL11), and the Guangdong Innovative Research Team Program (No. 2014ZT05G157). Chen Junhong was sponsored by the China Scholarship Council (No. 202208440309). | - |
dc.language.iso | en | - |
dc.publisher | IEEE | - |
dc.rights | 2024 IEEE | - |
dc.subject.other | referring video object segmentation | - |
dc.subject.other | cross-modal | - |
dc.subject.other | multi-scale alignment | - |
dc.title | Cross-modal Spectral Fusion Model for Referring Video Object Segmentation | - |
dc.type | Proceedings Paper | - |
local.bibliographicCitation.conferencedate | 2024, 24-25 May | - |
local.bibliographicCitation.conferencename | 14th International Conference on Electronics Information and Emergency Communication | - |
local.bibliographicCitation.conferenceplace | Beijing, China | - |
dc.identifier.epage | 137 | - |
dc.identifier.spage | 133 | - |
local.bibliographicCitation.jcat | C1 | - |
local.type.refereed | Refereed | - |
local.type.specified | Proceedings Paper | - |
dc.identifier.doi | 10.1109/ICEIEC61773.2024.10561688 | - |
local.provider.type | CrossRef | - |
local.bibliographicCitation.btitle | 2024 IEEE 14th International Conference on Electronics Information and Emergency Communication (ICEIEC) | - |
local.uhasselt.international | yes | - |
item.fulltext | With Fulltext | - |
item.contributor | Huang, Kesi | - |
item.contributor | Li, Tianxiao | - |
item.contributor | Xia, Qiqiang | - |
item.contributor | CHEN, Junhong | - |
item.contributor | Asim, Muhammad | - |
item.contributor | Liu, Wenyin | - |
item.embargoEndDate | 2025-08-29 | - |
item.fullcitation | Huang, Kesi; Li, Tianxiao; Xia, Qiqiang; CHEN, Junhong; Asim, Muhammad & Liu, Wenyin (2024) Cross-modal Spectral Fusion Model for Referring Video Object Segmentation. In: 2024 IEEE 14th International Conference on Electronics Information and Emergency Communication (ICEIEC), IEEE, p. 133-137. | - |
item.accessRights | Embargoed Access | - |
Appears in Collections: | Research publications |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Cross-modal_Spectral_Fusion_Model_for_Referring_Video_Object_Segmentation.pdf (Restricted Access) | Published version | 2.34 MB | Adobe PDF | |
ICEIEC+17_Cross-modal Spectral Fusion Model for Referring Video Object Segmentation.pdf (Until 2025-08-29) | Peer-reviewed author version | 1.39 MB | Adobe PDF | |