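M2FP (Mask2Former for Parsing; official code) builds on the Mask2Former architecture, with several modifications that adapt it to human parsing. M2FP handles almost all human parsing tasks and achieves strong results (see the CIHP evaluation below).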
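Mask2Former's defining component is masked attention: in each Transformer decoder layer, a query attends only to the pixels inside the mask it predicted in the previous layer. The NumPy sketch below illustrates that idea as described in the Mask2Former paper cited at the end of this card. It is not M2FP's implementation, and the function names and array shapes are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def masked_attention(queries, keys, values, mask_logits, threshold=0.5):
    """Cross-attention where each query only attends to pixels inside its
    own predicted mask (masked attention, as in the Mask2Former paper).

    queries:     (Q, D) query features
    keys/values: (N, D) and (N, Dv) flattened pixel features, N = H * W
    mask_logits: (Q, N) mask predictions from the previous decoder layer
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)  # (Q, N) scaled dot products
    # Sigmoid, then keep only pixels whose mask probability reaches the threshold
    foreground = 1.0 / (1.0 + np.exp(-mask_logits)) >= threshold
    # A query with an empty mask would attend to nothing; let it attend everywhere
    empty = ~foreground.any(axis=1, keepdims=True)
    foreground = foreground | empty
    # Pixels outside the mask get -inf, i.e. zero weight after the softmax
    scores = np.where(foreground, scores, -np.inf)
    weights = softmax(scores, axis=-1)
    return weights @ values, weights
```

Each query's output is therefore a weighted mix of only the pixels in its current mask, which is what lets the decoder refine one body part at a time.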
The overall model architecture is shown below:
This model is intended for images containing multiple people: it parses and segments the body parts of every person in the image.
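Mask2Former-style models do not label pixels directly; they predict a fixed set of queries, each with a class distribution and a mask. The sketch below shows the semantic-segmentation inference rule from MaskFormer/Mask2Former (the papers cited at the end), which turns those per-query outputs into one part label per pixel. It is illustrative only: M2FP also produces per-person (instance-level) results, and its actual post-processing may differ.

```python
import numpy as np

def semantic_inference(class_logits, mask_logits):
    """Turn per-query predictions into a per-pixel label map.

    class_logits: (Q, C + 1) class scores per query; the last column is "no object"
    mask_logits:  (Q, H, W) mask scores per query
    Returns an (H, W) array of class indices in [0, C).
    """
    # Softmax over classes, then drop the "no object" column
    z = class_logits - class_logits.max(axis=1, keepdims=True)
    cls_prob = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    cls_prob = cls_prob[:, :-1]  # (Q, C)
    mask_prob = 1.0 / (1.0 + np.exp(-mask_logits))  # (Q, H, W)
    # A pixel's score for class c sums class-prob * mask-prob over all queries
    seg_scores = np.einsum("qc,qhw->chw", cls_prob, mask_prob)
    return seg_scores.argmax(axis=0)
```

Summing over queries means several queries can jointly support the same part label, so no explicit matching between queries and parts is needed at inference time.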
With the ModelScope framework, you only need to provide an input image; the model can then be called through a simple Pipeline:
```python
from modelscope.outputs import OutputKeys
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# Input image (here a URL)
input_img = 'https://modelscope.oss-cn-beijing.aliyuncs.com/test/images/mog_face_detection.jpg'
# Build an image-segmentation pipeline backed by the multi-person human parsing model
segmentation_pipeline = pipeline(Tasks.image_segmentation, 'damo/cv_resnet101_image-multiple-human-parsing')
# Run inference on the image
result = segmentation_pipeline(input_img)
# Print the predicted part labels
print(result[OutputKeys.LABELS])
```
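The pipeline returns a dict. As an assumption (verify it against the actual output of your ModelScope version), the sketch below treats `result[OutputKeys.MASKS]` as a list of binary part masks aligned one-to-one with `result[OutputKeys.LABELS]`, and merges them into a single index map for visualization or downstream use. `masks_to_label_map` is a hypothetical helper, not part of ModelScope.

```python
import numpy as np

def masks_to_label_map(masks, labels):
    """Merge a list of binary masks into one (H, W) index map (hypothetical helper).

    Assumes masks[i] is an (H, W) array of 0/1 values belonging to labels[i].
    Returns (label_map, names): label_map[y, x] == k means the pixel belongs to
    names[k - 1]; 0 means no mask covers it. Where masks overlap, later masks win.
    """
    names = sorted(set(labels))
    index = {name: i + 1 for i, name in enumerate(names)}
    label_map = np.zeros(np.asarray(masks[0]).shape, dtype=np.int32)
    for mask, name in zip(masks, labels):
        label_map[np.asarray(mask) > 0] = index[name]
    return label_map, names
```

Under the stated assumption, usage would be `label_map, names = masks_to_label_map(result[OutputKeys.MASKS], result[OutputKeys.LABELS])`. Masks with the same label (for example, the same part on two different people) share one index, which gives a semantic rather than per-person view.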
The model was evaluated on the CIHP validation set, with the following results:
| Backbone | Pretraining | Epochs | mIoU (%) | APr (%) | #Params | FLOPs |
|---|---|---|---|---|---|---|
| ResNet-101 | ImageNet-1k | 150 | 69.15 | 60.47 | 63.0M | 206.2G |
Note: test-time augmentation (TTA) was used during evaluation.
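mIoU averages, over part classes, the intersection-over-union between predicted and ground-truth pixels. A minimal NumPy sketch of the usual confusion-matrix computation follows. Common practice is to accumulate the confusion matrix over the whole validation set and take the mean once at the end; the official CIHP evaluation script may differ in details, and the `ignore_index=255` default is an assumption.

```python
import numpy as np

def confusion_matrix(pred, gt, num_classes, ignore_index=255):
    """Rows = ground-truth class, columns = predicted class; ignored pixels skipped."""
    valid = gt != ignore_index
    g = gt[valid].astype(np.int64)
    p = pred[valid].astype(np.int64)
    counts = np.bincount(g * num_classes + p, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def mean_iou(conf):
    """Mean IoU over classes that occur in the prediction or the ground truth."""
    inter = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - inter
    present = union > 0  # skip classes absent from both
    return float((inter[present] / union[present]).mean())
```

To evaluate a dataset, sum the per-image matrices, e.g. `total = sum(confusion_matrix(p, g, C) for p, g in pairs)`, and call `mean_iou(total)` once; averaging per-image mIoU instead would weight small images as heavily as large ones.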
Visualization results:
If you find this model helpful, please consider citing the following papers:
```bibtex
@article{yang2023humanparsing,
  title={Deep Learning Technique for Human Parsing: A Survey and Outlook},
  author={Lu Yang and Wenhe Jia and Shan Li and Qing Song},
  journal={arXiv preprint arXiv:2301.00394},
  year={2023}
}
@inproceedings{cheng2021mask2former,
  title={Masked-attention Mask Transformer for Universal Image Segmentation},
  author={Bowen Cheng and Ishan Misra and Alexander G. Schwing and Alexander Kirillov and Rohit Girdhar},
  booktitle={CVPR},
  year={2022}
}
@inproceedings{cheng2021maskformer,
  title={Per-Pixel Classification is Not All You Need for Semantic Segmentation},
  author={Bowen Cheng and Alexander G. Schwing and Alexander Kirillov},
  booktitle={NeurIPS},
  year={2021}
}
```