MaskDINO is a unified framework for detection and segmentation; this model uses Swin Transformer as its backbone.
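Swin Transformer is a transformer architecture with a pyramid structure whose representations are computed with shifted windows. The shifted-window scheme restricts self-attention to non-overlapping local windows while still allowing cross-window connections, which improves computational efficiency. The hierarchical pyramid design gives it the flexibility to model at multiple scales. These properties make Swin Transformer compatible with a wide range of vision tasks, and it achieves state-of-the-art performance on dense prediction tasks such as COCO instance segmentation. Its structure is shown in the figure below.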
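To make the shifted-window idea concrete, here is a minimal sketch in plain Python. It is not the Swin Transformer implementation: the function names are hypothetical, the feature map is a toy grid of token ids, and the cost formulas are the per-layer estimates given in the Swin Transformer paper (the softmax is ignored).

```python
def window_partition(feature, window_size):
    """Split an H x W grid (list of rows) into non-overlapping windows.

    Returns the windows in row-major order; each window is a
    window_size x window_size list of rows.
    """
    height, width = len(feature), len(feature[0])
    if height % window_size or width % window_size:
        raise ValueError("grid size must be divisible by window_size")
    windows = []
    for top in range(0, height, window_size):
        for left in range(0, width, window_size):
            windows.append([row[left:left + window_size]
                            for row in feature[top:top + window_size]])
    return windows


def cyclic_shift(feature, shift):
    """Roll the grid up and to the left by `shift` cells, wrapping around."""
    height, width = len(feature), len(feature[0])
    rows = [feature[(r + shift) % height] for r in range(height)]
    return [[row[(c + shift) % width] for c in range(width)] for row in rows]


def global_attention_cost(h, w, c):
    """Approximate cost of global self-attention on an h x w map with c channels."""
    return 4 * h * w * c ** 2 + 2 * (h * w) ** 2 * c


def window_attention_cost(h, w, c, m):
    """Approximate cost of self-attention restricted to m x m windows."""
    return 4 * h * w * c ** 2 + 2 * m ** 2 * h * w * c


# Toy 4 x 4 map whose cells hold token ids 0..15.
grid = [[r * 4 + c for c in range(4)] for r in range(4)]
regular = window_partition(grid, 2)                   # first window: [[0, 1], [4, 5]]
shifted = window_partition(cyclic_shift(grid, 1), 2)  # first window: [[5, 6], [9, 10]]
```

After the shift, the first window holds tokens 5, 6, 9 and 10, which belonged to four different windows before the shift. This is how information crosses window boundaries, while the attention cost stays linear in the number of tokens rather than quadratic.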
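MaskDINO is a unified architecture for object detection and for panoptic, instance, and semantic segmentation. It enables task and data cooperation between detection and segmentation, and achieves state-of-the-art performance under the same settings. A schematic of its structure is shown below.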
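As a rough illustration of how detection and segmentation can share one set of queries, the Mask DINO paper describes a mask branch in which each query embedding is dot-multiplied with a high-resolution pixel embedding map to produce a mask. The sketch below shows only that dot-product step on toy data; the function name, the data layout, and the zero threshold are assumptions made for illustration, not the model's actual code.

```python
def predict_masks(query_embeddings, pixel_embeddings, threshold=0.0):
    """Dot each query embedding with the embedding at every pixel.

    query_embeddings: list of Q vectors, each of length C.
    pixel_embeddings: H x W grid whose cells are vectors of length C.
    Returns Q binary masks (H x W lists of 0/1). A logit above `threshold`
    marks a pixel as foreground; 0.0 corresponds to a sigmoid probability of 0.5.
    """
    masks = []
    for query in query_embeddings:
        mask = [[int(sum(q * p for q, p in zip(query, pixel)) > threshold)
                 for pixel in row]
                for row in pixel_embeddings]
        masks.append(mask)
    return masks


# Two toy queries and a 2 x 2 pixel embedding map with C = 2 channels.
queries = [[1.0, 0.0], [0.0, 1.0]]
pixels = [[[2.0, -1.0], [-1.0, 3.0]],
          [[0.5, 0.5], [-2.0, -2.0]]]
masks = predict_masks(queries, pixels)
# masks[0] is [[1, 0], [1, 0]] and masks[1] is [[0, 1], [1, 0]]
```

Because the same queries also feed the box and class heads, each predicted mask comes paired with a box and a label, which is one way to read the "task cooperation" described above.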
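This model is broadly applicable: it can recognize and segment most objects of interest in an image (the 80 COCO categories).
On the ModelScope framework, you only need to provide an input image and call a simple pipeline: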
```python
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

input_img = 'https://modelscope.oss-cn-beijing.aliyuncs.com/test/images/image_instance_segmentation.jpg'
output = './result.jpg'

# Build the instance segmentation pipeline and run it on the input image.
segmentation_pipeline = pipeline(Tasks.image_segmentation, 'damo/cv_maskdino-swin-l_image-instance-segmentation_coco')
result = segmentation_pipeline(input_img)

# Optional: draw the predictions on the image and display the result.
from modelscope.preprocessors.image import LoadImage
from modelscope.models.cv.image_instance_segmentation.postprocess_utils import show_result
from PIL import Image

numpy_image = LoadImage.convert_to_ndarray(input_img)[:, :, ::-1]  # in BGR order
show_result(numpy_image, result, out_file=output, show_box=True, show_label=True, show_score=False)
Image.open(output).show()
```
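The pipeline call returns the predictions for one image. The sketch below shows one way to keep only confident instances before further processing. It assumes the result is a dict with parallel sequences under the keys `'scores'`, `'labels'` and `'masks'`; that layout, the helper name `filter_instances`, and the 0.5 threshold are assumptions, so check them against the actual output of your ModelScope version.

```python
def filter_instances(result, score_threshold=0.5):
    """Keep only the instances whose score is at least `score_threshold`.

    Assumes `result` holds parallel sequences under 'scores', 'labels'
    and 'masks' (an assumed layout, not confirmed by this model card).
    """
    keep = [i for i, score in enumerate(result['scores'])
            if score >= score_threshold]
    return {key: [result[key][i] for i in keep]
            for key in ('scores', 'labels', 'masks')}


# Stand-in for a pipeline result, with tiny 1 x 2 masks for brevity.
fake_result = {
    'scores': [0.92, 0.31, 0.77],
    'labels': ['person', 'dog', 'bicycle'],
    'masks': [[[1, 0]], [[0, 1]], [[1, 1]]],
}
confident = filter_instances(fake_result, score_threshold=0.5)
# confident['labels'] is ['person', 'bicycle']
```

The filtered dict keeps the same keys, so under the same assumption it can be handed to downstream code in place of the raw result.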
The main preprocessing steps at test time are as follows:
The model was evaluated on COCO 2017 val, with the following results:
| Backbone | Pretrain | Epochs | box mAP | mask mAP | #params | FLOPs |
|---|---|---|---|---|---|---|
| Swin-L | ImageNet-21k | 50 | 59.0 | 52.3 | 223M | 1326G |
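For readers unfamiliar with the mask mAP column: assuming the standard COCO protocol, a predicted mask counts as a match when its intersection over union (IoU) with a ground-truth mask exceeds a threshold, and precision is averaged over IoU thresholds from 0.50 to 0.95. The sketch below computes only the mask IoU for two toy binary masks; the helper name is hypothetical, and a real evaluation would use the official COCO tooling rather than this function.

```python
def mask_iou(pred_mask, gt_mask):
    """Intersection over union of two binary masks of equal shape.

    Masks are H x W lists of 0/1 values. Returns 0.0 when both are empty.
    """
    intersection = 0
    union = 0
    for pred_row, gt_row in zip(pred_mask, gt_mask):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                intersection += 1
            if p or g:
                union += 1
    return intersection / union if union else 0.0


pred = [[1, 1],
        [0, 0]]
gt = [[1, 0],
      [1, 0]]
iou = mask_iou(pred, gt)  # 1 shared pixel out of 3 covered pixels
```

With an IoU of one third, this prediction would not count as a match at any of the COCO thresholds, the lowest of which is 0.50.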
Visualization results:
If you find this model helpful, please consider citing the following papers:
@inproceedings{liu2021Swin,
title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
year={2021}
}
@misc{li2022mask,
title={Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation},
author={Feng Li and Hao Zhang and Huaizhe Xu and Shilong Liu and Lei Zhang and Lionel M. Ni and Heung-Yeung Shum},
year={2022},
eprint={2206.02777},
archivePrefix={arXiv},
primaryClass={cs.CV}
}