| Summary: | The heterogeneity and complexity of multi-modal data in high-resolution remote sensing imagery pose a severe challenge to existing cross-modal networks that fuse complementary information from high-resolution optical imagery and elevation data (digital surface models, DSMs) to achieve accurate semantic segmentation. To address this problem, we propose LTFCNet, a weighted feature fusion network based on large-kernel convolution and Transformer. The model uses two parallel encoders to extract features from the two modalities, an improved cross-fusion module to strengthen the encoders' feature extraction, and a gate module based on large-kernel convolution and Transformer to fuse the multi-modal features. Finally, a Difference information Feature Fusion Module (DFFM), which attends to differential regions, performs cross-level feature fusion and improves small-object detection. To evaluate the network, we compare it with several state-of-the-art (SOTA) models on the Potsdam and Vaihingen datasets. The experimental results demonstrate that the proposed model outperforms the other SOTA models by approximately 2% in mIoU, validating its effectiveness in multi-modal feature fusion.
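
To make the dual-encoder, gated weighted-fusion idea concrete, the following is a minimal PyTorch sketch. Every name and design detail in it (the GatedFusion and DualEncoderSegNet modules, channel sizes, and the use of a depthwise large-kernel convolution with a sigmoid gate) is an illustrative assumption, not the paper's actual LTFCNet implementation; the cross-fusion module, Transformer branch, and DFFM are omitted.

```python
# Hypothetical sketch of a two-stream (optical + DSM) segmentation network
# with gated weighted fusion; not the authors' code.
import torch
import torch.nn as nn


class GatedFusion(nn.Module):
    """Weighted fusion of optical and DSM features via a learned gate.

    A large-kernel depthwise convolution stands in for the large-kernel
    branch; the Transformer branch is omitted for brevity.
    """

    def __init__(self, channels: int, kernel_size: int = 7):
        super().__init__()
        # Depthwise large-kernel conv gathers wide spatial context.
        self.large_kernel = nn.Conv2d(
            channels * 2, channels * 2, kernel_size,
            padding=kernel_size // 2, groups=channels * 2,
        )
        # 1x1 conv + sigmoid produces a per-pixel fusion weight in (0, 1).
        self.gate = nn.Sequential(
            nn.Conv2d(channels * 2, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, f_opt: torch.Tensor, f_dsm: torch.Tensor) -> torch.Tensor:
        ctx = self.large_kernel(torch.cat([f_opt, f_dsm], dim=1))
        w = self.gate(ctx)  # how much to trust the optical branch per pixel
        return w * f_opt + (1.0 - w) * f_dsm


class DualEncoderSegNet(nn.Module):
    """Two parallel encoders (3-band optical, 1-band DSM) + gated fusion."""

    def __init__(self, num_classes: int = 6, channels: int = 64):
        super().__init__()
        # Toy single-stage encoders; a real model would use deep backbones.
        self.enc_opt = nn.Sequential(
            nn.Conv2d(3, channels, 3, stride=2, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
        )
        self.enc_dsm = nn.Sequential(
            nn.Conv2d(1, channels, 3, stride=2, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
        )
        self.fuse = GatedFusion(channels)
        # Decoder stub: upsample back to input resolution, then classify.
        self.head = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(channels, num_classes, 1),
        )

    def forward(self, rgb: torch.Tensor, dsm: torch.Tensor) -> torch.Tensor:
        return self.head(self.fuse(self.enc_opt(rgb), self.enc_dsm(dsm)))


if __name__ == "__main__":
    model = DualEncoderSegNet()
    rgb = torch.randn(2, 3, 256, 256)  # optical image batch
    dsm = torch.randn(2, 1, 256, 256)  # elevation (DSM) batch
    print(model(rgb, dsm).shape)       # torch.Size([2, 6, 256, 256])
```

The sigmoid gate makes the fusion a per-pixel convex combination of the two modality features, so the network can lean on elevation where optical cues are ambiguous (e.g., buildings vs. roads) and vice versa; this is one common way to realize the "weighted feature fusion" the summary describes.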
|