| Summary: | High-resolution remote sensing imagery plays an essential role in urban management and environmental monitoring, providing detailed insights for applications ranging from land cover mapping to disaster response. Semantic segmentation methods are among the most effective techniques for comprehensive land cover mapping, and they commonly employ ImageNet-based pre-training semantics. However, traditional fine-tuning processes exhibit poor transferability across different downstream tasks and require large amounts of labeled data. To address these challenges, we introduce Denoising Diffusion Probabilistic Models (DDPMs) as a generative pre-training approach for semantic features extraction in remote sensing imagery. We pre-trained a DDPM on extensive unlabeled imagery, obtaining features at multiple noise levels and resolutions. In order to integrate and optimize these features efficiently, we designed a multi-layer perceptron module with residual connections. It performs channel-wise optimization to suppress feature redundancy and refine representations. Additionally, we froze the feature extractor during fine-tuning. This strategy significantly reduces computational consumption and facilitates fast transfer and deployment across various interpretation tasks on homogeneous imagery. Our comprehensive evaluation on the sparsely labeled dataset MiniFrance-S and the fully labeled Gaofen Image Dataset achieved mean intersection over union scores of 42.7% and 66.5%, respectively, outperforming previous works. This demonstrates that our approach effectively reduces reliance on labeled imagery and increases transferability to downstream remote sensing tasks.
|