MR-Former: Improving universal image segmentation via refined masked-attention transformer


Bibliographic Details
Published in: Alexandria Engineering Journal
Main Authors: Xingliang Zhu, Weiwei Yu, Xiaoyu Dong, Wei Zhang, Bin Kong
Format: Article
Language: English
Published: Elsevier 2025-11-01
Subjects: Universal image segmentation; Masked transformer; Refined masked attention
Online Access:http://www.sciencedirect.com/science/article/pii/S1110016825010361
author Xingliang Zhu
Weiwei Yu
Xiaoyu Dong
Wei Zhang
Bin Kong
collection DOAJ
description Mask-based universal image segmentation architectures, which focus attention within the mask region, have significantly enhanced both segmentation accuracy and speed, becoming a key component of universal image segmentation frameworks. However, existing methods often generate masks from the predicted probability map using a hard threshold, which may introduce biased guidance due to missing or inaccurate object predictions. To address this issue, we propose a novel mask refinement method, called MR-Former. The method divides the segmentation output into two categories: ‘object’ and ‘non-object’ masks. For ‘object’ masks, we further refine the mask into interior and edge regions, and apply independent attention branches to learn their respective attention distributions, which are then adaptively fused. For ‘non-object’ masks, we introduce a mask focusing technique that selects key regions to ensure attention is concentrated on high-probability areas, effectively eliminating irrelevant distractions. Experimental results demonstrate that MR-Former, with a ResNet-50 backbone, significantly improves segmentation performance across multiple state-of-the-art architectures, including Mask2Former, OneFormer, and PEM. On the Cityscapes dataset, MR-Former achieves improvements of up to 0.9 PQ, 0.7 AP, and 1.3 mIoU with a maximum Params increase of only 1.4%; on the ADE20K dataset, it shows improvements of up to 1.1 PQ, 1.4 AP, and 1.3 mIoU accompanied by a maximum FLOPs increment of just 1G; and on the Mapillary Vistas dataset, it demonstrates improvements of up to 1.1 PQ and 1.2 mIoU with a maximum FPS loss of merely 3.4.
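The mechanism described above can be sketched in a minimal NumPy illustration. This is not the authors' implementation: the interior/edge split via two probability thresholds (`lo`, `hi`), the fixed fusion weight `alpha`, and the top-k fallback for 'non-object' masks are all simplifying assumptions standing in for the paper's learned, adaptively fused components.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def masked_attention(logits, mask):
    """Standard masked attention: keys outside the boolean mask are
    suppressed with a large negative logit before the softmax."""
    return softmax(np.where(mask, logits, -1e9), axis=-1)

def refined_masked_attention(logits, prob, lo=0.5, hi=0.8, alpha=0.5):
    """Sketch of the refinement idea: split an 'object' mask into interior
    (high-probability) and edge (mid-probability) regions, run a separate
    masked-attention branch on each, and fuse the two distributions.
    If no object region is found, fall back to a 'mask focusing' step that
    keeps only the top-k highest-probability keys."""
    interior = prob >= hi
    edge = (prob >= lo) & (prob < hi)
    if not interior.any() or not edge.any():
        # 'Non-object' case: concentrate attention on high-probability keys.
        k = max(1, prob.size // 10)
        focus = np.zeros_like(prob, dtype=bool)
        focus[np.argsort(prob)[-k:]] = True
        return masked_attention(logits, focus)
    a_int = masked_attention(logits, interior)
    a_edge = masked_attention(logits, edge)
    # Convex combination of the two branch distributions (fixed here;
    # adaptive in the paper).
    return alpha * a_int + (1 - alpha) * a_edge
```

With uniform logits, keys below the `lo` threshold receive effectively zero attention weight, while interior and edge keys share the weight according to `alpha`; the fused output remains a valid distribution because it is a convex combination of two softmaxes.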
id doaj-art-fd0e2355a86e4c028cdf3cdedc52cfcd
issn 1110-0168
doi 10.1016/j.aej.2025.09.072
citation Alexandria Engineering Journal, vol. 131 (2025-11-01), pp. 232-244
affiliation (Xingliang Zhu, Weiwei Yu, Xiaoyu Dong, Wei Zhang) Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, 230031, China; University of Science and Technology of China, Hefei, 230026, China
affiliation (Bin Kong) Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, 230031, China; University of Science and Technology of China, Hefei, 230026, China; Anhui Engineering Laboratory for Intelligent Driving Technology and Application, Hefei, Anhui, China; Innovation Research Institute of Robotics and Intelligent Manufacturing (Hefei), Chinese Academy of Sciences, Hefei, Anhui, China
corresponding_author Bin Kong (Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, 230031, China)
topic Universal image segmentation
Masked transformer
Refined masked attention