给定一张输入图像,输出全景分割掩膜,类别,分数(虚拟分数)。
全景分割是要分割出图像中的stuff,things。stuff是指天空,草地等不规则区域,things是指可数的物体,例如人,车,猫等。
请前往Mask2Former-R50全景分割 体验模型finetune
本模型使用swin large为backbone,Mask2Former为分割头。COCO全景分割数据库上训练。
Swin transformer是一种具有金字塔结构的transformer架构,其表征通过shifted windows计算。Shifted windows方案将自注意力的计算限制在不重叠的局部窗口上,同时还允许跨窗口连接,从而带来更高的计算效率。分层的金字塔架构则让其具有在各种尺度上建模的灵活性。这些特性使swin transformer与广泛的视觉任务兼容,并在密集预测任务如COCO实例分割上达到SOTA性能。其结构如下图所示。
Mask2Former是一种能够解决任何图像分割任务(全景、实例或语义)的新架构。它包含了一个masked attention结构,通过将交叉注意力计算内来提取局部特征。
本模型适用范围较广,能对图片中包含的大部分感兴趣物体(COCO things 80类,stuff 53类)进行分割。
在ModelScope框架上,提供输入图片,即可通过简单的Pipeline调用来使用。
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
from modelscope.utils.cv.image_utils import panoptic_seg_masks_to_image
from modelscope.outputs import OutputKeys
import cv2
segmentor = pipeline(Tasks.image_segmentation, model='damo/cv_swinL_panoptic-segmentation_cocopan')
input_url = 'https://modelscope.oss-cn-beijing.aliyuncs.com/test/images/image_panoptic_segmentation.jpg'
result = segmentor(input_url)
draw_img = panoptic_seg_masks_to_image(result[OutputKeys.MASKS])
cv2.imwrite('result.jpg', draw_img)
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks,DownloadMode
from modelscope.utils.cv.image_utils import panoptic_seg_masks_to_image
from modelscope.outputs import OutputKeys
from modelscope.msdatasets import MsDataset
import cv2
segmentor = pipeline(Tasks.image_segmentation, model='damo/cv_swinL_panoptic-segmentation_cocopan')
ms_ds_val = MsDataset.load("COCO_segmentation_inference", namespace="modelscope", split="validation", download_mode=DownloadMode.FORCE_REDOWNLOAD)
result = segmentor(ms_ds_val[0]["InputImage:FILE"])
draw_img = panoptic_seg_masks_to_image(result[OutputKeys.MASKS])
cv2.imwrite('result.jpg', draw_img)
测试时主要的预处理如下:
Backbone | Pretrain | box mAP | mask mAP | PQ |
---|---|---|---|---|
Swin-L | ImageNet-21K | 52.2 | 48.5 | 57.6 |
@inproceedings{cheng2022masked,
title={Masked-attention mask transformer for universal image segmentation},
author={Cheng, Bowen and Misra, Ishan and Schwing, Alexander G and Kirillov, Alexander and Girdhar, Rohit},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={1290--1299},
year={2022}
}
@inproceedings{liu2021Swin,
title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
year={2021}
}